Gigapixel videography

Gigapixel videography, which goes beyond the resolution of a single camera and of human visual perception, aims to capture large-scale dynamic scenes at extremely high resolution. Because the spatial-temporal bandwidth product of an optical system is limited, size, weight, power and cost are the central challenges in gigapixel video. More explicitly, as shown in Fig. 1(a), the most popular single-lens camera consists of a one-stage optical imaging system and suffers from the inherent contradiction between high resolution and wide field of view (FOV). The single-scale multi-camera/camera-array system in Fig. 1(b) resolves this contradiction through a panoramic stitching pipeline, as in Microsoft ICE [5], Autopano Giga [6], Gigapan [7], the Point Grey Ladybug 360 camera, etc. Such stitching-based schemes always require a certain amount of overlap between neighboring images/cameras, which leads to redundant use of the CCD/CMOS sensors in the camera array system.

Fig. 1: Illustration of representative imaging systems. (a) A single-camera imaging system faces the contradiction between wide FOV and high resolution; (b) single-scale camera array imaging [1][2] relies on image stitching [3]; (c) a structured multi-scale camera array (AWARE2 [4]) adopts a two-stage optical imaging design; (d) an unstructured multi-scale camera array (denoted as UnstructuredCam).

The recent multiscale optical design [4, 8] adopts a spherical objective lens as the first-stage optical imaging system, while the secondary imaging system uses multiple identical micro-optics to divide the whole FOV into small, circular, overlapping regions, as shown in Fig. 1(c). Although this design substantially reduces the size and weight of gigapixel-scale optical systems, the volume and weight of the camera electronics in video operation are more than 10 times larger than those of the optics [4, 9]. More importantly, it usually adopts a delicately structured camera array design, which faces the challenges of complicated optical, electronic and mechanical design, laborious calibration, massive data processing, etc.

In fact, typical natural scenes are highly compressible: the static background usually contains large sections of sky, foliage or buildings, while the action is confined to limited corridors such as roadways or playing fields. Consequently, ‘scalable sampling’ strategies are of great importance to the practical implementation of gigapixel-scale broadcasting systems. Here ‘scalable’ indicates that 1) the overall number of cameras in the camera array and the relative location/pose among cameras can be varied for different scenes, and 2) each camera is allowed to have a different focal length, field of view, resolution, frame rate, etc., so that the whole camera array can work in a highly unstructured, scalable and economical way that reduces redundancy.

Aiming for the aforementioned scalable, efficient and economical gigapixel videography, a novel gigapixel videography system with an unstructured multi-scale camera array design, denoted as ‘UnstructuredCam’, is presented in Fig. 1(d): a reference/global camera (with a wide-angle lens to capture the global scene) works together with local cameras (with telephoto lenses to capture local details). ‘Unstructured’ indicates that the overall structure of the camera array does not follow a fixed or particular design, and thus requires neither precise assembly nor careful calibration in advance. ‘Multi-scale’ means not only that the parameters of the global-view camera differ from those of the local-view cameras, but also that the parameters of the local-view cameras can differ from one another. This setting enables gigapixel videography by warping each local-view video to the reference video independently and in parallel, without the troublesome calibration among local-view cameras, which in turn allows a flexible, compressible, adaptive and movable local-view camera setting during data capture.
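The independence of the per-camera embedding can be illustrated with a minimal sketch. It assumes, purely for illustration, that each local view is related to the reference view by a single 3×3 homography, whereas the system described here represents the warp with a mesh-based non-rigid field. The function names `apply_homography`, `embed_corners` and `embed_all`, the frame size, and the example matrices are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def embed_corners(H, width, height):
    """Project the four corners of a local frame into reference coordinates."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [apply_homography(H, x, y) for x, y in corners]


# Hypothetical local-to-reference homographies (assumed values): each one
# shrinks a local frame by 8x and places it at a different offset.
local_views = {
    "local_0": [[0.125, 0.0, 100.0], [0.0, 0.125, 50.0], [0.0, 0.0, 1.0]],
    "local_1": [[0.125, 0.0, 400.0], [0.0, 0.125, 200.0], [0.0, 0.0, 1.0]],
}


def embed_all(views, width=1600, height=800):
    """Embed every local view on its own; no view needs data from another."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(embed_corners, H, width, height)
                   for name, H in views.items()}
        return {name: future.result() for name, future in futures.items()}


footprints = embed_all(local_views)
```

Because each entry of `local_views` is processed using only its own transform and the reference frame, adding, removing or moving one local camera in this sketch does not affect the others.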

Given the novel unstructured camera array design, a major challenge remains: how to make the multi-camera array work just like a ‘single camera’, given that a single camera requires no pre-calibration and produces a seamless result. Therefore, an online unstructured embedding scheme is contributed for the unstructured camera array, i.e., a cross-resolution matching and warping approach that automatically embeds each local-view video in the global video. Recall that the global video has a wide field of view yet lacks high-resolution details, which significantly degrades the accuracy and robustness of feature extraction. Inspired by the observation that a more uniformly distributed feature map may lead to more robust feature correspondence, a new feature detection and keypoint pooling method based on a structure edge map is proposed to identify uniformly spaced features in the reference video for zero-mean normalized cross-correlation (ZNCC) matching. Then a multi-scale quadtree mesh structure that represents the non-rigid warping field, together with a two-pass non-iterative outlier removal algorithm, is contributed to speed up the mesh-based homography warping step while maintaining embedding robustness. Finally, to preserve salient structures such as straight lines in the scene, a structure-feature-based mesh refinement strategy is proposed to remove visual artifacts.
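Two of the ingredients named above can be sketched in a few lines. The first function is the standard ZNCC score between two equal-size patches. The second is a simple grid-based pooling that keeps the strongest keypoint per cell; it is only a stand-in, chosen for illustration, for the structure-edge-map-based pooling described here, whose details are not given in this section. The names `zncc` and `pool_keypoints`, the `cell` parameter, and the sample values are assumptions.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches.

    Patches are flat lists of intensities; the score lies in [-1, 1].
    """
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def pool_keypoints(keypoints, cell):
    """Keep only the strongest keypoint in each cell x cell grid square.

    keypoints: iterable of (x, y, strength) tuples (hypothetical format).
    """
    best = {}
    for x, y, s in keypoints:
        key = (int(x // cell), int(y // cell))
        if key not in best or s > best[key][2]:
            best[key] = (x, y, s)
    return sorted(best.values())


patch = [10, 20, 30, 40]
brighter = [2 * v + 5 for v in patch]  # same structure, different gain/offset
score = zncc(patch, brighter)
pooled = pool_keypoints([(1, 1, 0.9), (2, 3, 0.4), (12, 1, 0.7)], cell=10)
```

ZNCC is unaffected by a positive gain and an offset applied to either patch, which is one common reason to prefer it when the compared images come from different cameras. The pooling step spreads the retained keypoints across the frame instead of letting them cluster in highly textured regions.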

The UnstructuredCam enables several unique capabilities over prior work and works just like a single camera: 1) it bypasses the need for careful camera alignment and tedious geometry and color calibration [10, 11], as well as the requirement for image overlap among local-view cameras in available methods [4, 12]; 2) it allows local-view camera movements, so that the gigapixel video is captured in an adaptive and efficient way by allocating more sensor resources to the regions of interest; 3) it enables parallel stitching of the final high-resolution video and is optimized for real-time synthesis even with online camera movements; 4) the generated result tends to be a seamless video, to which existing computer vision algorithms can be applied with very little modification.

Because the warping method in [13] requires iterative optimization during feature matching, it cannot run in real time and thus cannot support online local-view camera movements. The present work makes the following contributions compared with the preliminary version. First, a new embedding algorithm is proposed that bypasses the complex iterative warping strategy by introducing a novel feature pooling method and a multi-scale quadtree warping strategy. Second, the proposed gigapixel camera system is capable of online synthesis of gigapixel video even when the local-view cameras move dynamically. Third, the system shows compelling performance in various computer vision applications, including large-scale human/vehicle tracking, face detection, skeleton detection and crowd counting.

In summary, the UnstructuredCam, an end-to-end unstructured multi-scale camera system, demonstrates real-time capture, dynamic adjustment of local-view cameras, and online warping for synthesizing gigapixel video. For the first time, this system demonstrates the capture of a seamless gigapixel video with adaptive, online pixel allocation for important or desired scene regions. This new ability benefits not only from the new unstructured camera array design, but also from the new algorithm for embedding (warping) the local videos into the global video. In fact, this design poses new challenges for the video embedding algorithm, including parallax, abundant and varied complex scene appearances, color inconsistency among cameras, and a significant resolution gap (usually more than $8\times$) between the global and local-view videos. More importantly, all the computations are expected to run efficiently so that local camera viewpoints can be adjusted dynamically online. This scalable gigapixel videography system fully exploits the flexibility, scalability and efficiency enabled by the unstructured multi-scale camera array mechanism, alleviating the hardware design bottleneck by shifting the burden to computational algorithm development. We believe that this system and algorithm will open up new research on more adaptive and efficient gigapixel video capture and synthesis using capture setups of smaller size and lower cost.

[1] Wilburn, B., Joshi, N., Vaish, V., Talvala, E.V., Antunez, E., Barth, A., Adams, A., Horowitz, M. and Levoy, M., 2005, July. High performance imaging using large camera arrays. In ACM Transactions on Graphics (TOG) (Vol. 24, No. 3, pp. 765-776). ACM.
[2] Perazzi, F., Sorkine‐Hornung, A., Zimmer, H., Kaufmann, P., Wang, O., Watson, S. and Gross, M., 2015, May. Panoramic video from unstructured camera arrays. In Computer Graphics Forum (Vol. 34, No. 2, pp. 57-68).
[3] Brown, M. and Lowe, D.G., 2007. Automatic panoramic image stitching using invariant features. International journal of computer vision, 74(1), pp.59-73.
[4] Brady, D.J., Gehm, M.E., Stack, R.A., Marks, D.L., Kittle, D.S., Golish, D.R., Vera, E.M. and Feller, S.D., 2012. Multiscale gigapixel photography. Nature, 486(7403), p.386.
[5] Microsoft Research, “Image Composite Editor: An advanced panoramic image stitcher.”
[6] Kolor, “Autopano Giga.”
[7] “Gigapan.”
[8] Cossairt, O.S., Miau, D. and Nayar, S.K., 2011, April. Gigapixel computational imaging. In Computational Photography (ICCP), 2011 IEEE International Conference on (pp. 1-8). IEEE.
[9] Nichols, J.M., Judd, K.P., Olson, C.C., Novak, K., Waterman, J.R., Feller, S., McCain, S., Anderson, J. and Brady, D., 2016. Range performance of the DARPA AWARE wide field-of-view visible imager. Applied optics, 55(16), pp.4478-4484.
[10] Tommaselli, A.M.G., Marcato Jr, J., Moraes, M.V.A., Silva, S.L.A. and Artero, A.O., 2014. Calibration of panoramic cameras with coded targets and a 3D calibration field. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 40(3), p.137.
[11] Golish, D.R., Vera, E., Kelly, K., Gong, Q., Jansen, P., Hughes, J., Kittle, D.S., Brady, D.J. and Gehm, M.E., 2012, June. Challenges in gigapixel multiscale image formation. In Computational Optical Sensing and Imaging (pp. JW3A-4). Optical Society of America.
[12] Kopf, J., Uyttendaele, M., Deussen, O. and Cohen, M.F., 2007, August. Capturing and viewing gigapixel images. In ACM Transactions on Graphics (TOG) (Vol. 26, No. 3, p. 93). ACM.
[13] Yuan, X., Fang, L., Dai, Q., Brady, D.J. and Liu, Y., 2017, May. Multiscale gigapixel video: A cross resolution image matching and warping approach. In Computational Photography (ICCP), 2017 IEEE International Conference on (pp. 1-9). IEEE.