360° Image Manipulation from Inferred Geometry
This thesis explores methods for manipulating 360° images and understanding their geometry in virtual reality (VR) applications, with a focus on depth estimation, surface normal prediction, and multi-task learning. It first addresses the manipulation of stereo 360° images in the context of image composition, proposing a novel method that seamlessly integrates new visual elements into stereo 360° scenes and thereby enhances user interaction and immersion in VR environments.
To support such editing applications, a new technique for estimating depth maps from monocular 360° images is introduced. The method leverages both local and global scene information, significantly improving the accuracy of depth predictions. Accurate depth is crucial for applications that require spatial awareness, such as virtual and augmented reality, where it underpins realistic interaction with the virtual environment.
Furthermore, the thesis introduces a hybrid approach that combines convolutional neural networks (CNNs) and Vision Transformers (ViTs) to improve surface normal estimation. The approach takes advantage of CNNs' ability to capture fine-grained details and ViTs' strength in modeling global context, resulting in more precise surface geometry analysis from 360° imagery. More accurate surface normals, in turn, support a better understanding of the spatial structure of the scene.
In addition, the research demonstrates the effectiveness of multi-task learning (MTL) for comprehensive scene geometry understanding. By predicting depth and surface normals simultaneously from monocular 360° images, the proposed MTL framework delivers a more detailed and coherent geometric representation, further contributing to the realism and immersion of VR use cases.
Through these contributions, the thesis advances the field of 360° image manipulation and scene understanding, offering new tools and methodologies that pave the way for more immersive, interactive, and spatially aware experiences in VR applications.