Deep Panoramic Optical Flow Estimation
Panoramic videos, or omnidirectional videos, have become increasingly popular as they provide viewers with an immersive watching experience. Unlike 2D planar videos, panoramic videos are defined on a spherical domain. They are normally transformed by equirectangular projection to provide a seamless 360° representation. However, the severe distortion in the top and bottom areas of an equirectangular projection makes traditional planar image editing methods ineffective when applied to panoramic images and videos. This thesis proposes a deep neural network to predict pixel-wise optical flow for tracking object motion on equirectangular images. It describes methods and implementation details in terms of three main contributions. First, three datasets with ground-truth optical flow are generated for supervised neural network training; they complement existing optical flow datasets by providing 360° full field-of-view optical flow data. Second, a novel hybrid deep architecture is proposed that predicts panoramic optical flow by effectively fusing the results obtained from different methods of projecting panoramic videos onto a 2D plane. Third, an application is developed that allows users to interactively manipulate pixel colors in panoramic videos, where the optical flow is used to maintain spatio-temporal consistency among edited video frames.
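The pole distortion mentioned above follows directly from the equirectangular parameterization. The short sketch below is illustrative only and is not code from the thesis; it assumes a standard mapping in which longitude spans the image width and latitude spans the image height, and shows how strongly a pixel near the top or bottom rows is horizontally stretched relative to the equator.

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to spherical angles.

    Longitude spans [-pi, pi] across the image width and latitude spans
    [-pi/2, pi/2] across the image height (a common convention; exact
    parameterizations vary between datasets).
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    return lon, lat

def horizontal_stretch(v, height):
    """Approximate horizontal stretch factor at image row v.

    Every image row covers the full 360 degrees of longitude, but the
    corresponding circle on the sphere shrinks by cos(latitude), so
    pixels near the top and bottom rows are stretched by ~1/cos(lat).
    """
    _, lat = equirect_to_sphere(0.0, v, 1, height)
    return 1.0 / max(np.cos(lat), 1e-6)

if __name__ == "__main__":
    H = 512  # assumed image height, for illustration only
    for row in [H // 2, int(0.9 * H), H - 1]:
        print(f"row {row}: stretch factor ~ {horizontal_stretch(row, H):.1f}x")
```

Near the equator the stretch factor is close to 1, while the last few rows before the poles are stretched by two orders of magnitude, which is why planar optical flow methods degrade there.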