Task 1: Object Detection

Deep-learning-based computer vision algorithms have surpassed human-level performance on many CV tasks, such as object recognition and face verification. However, computer vision relies on the valid information contained in the input images and videos, so the performance of any algorithm is fundamentally constrained by the quality of the source image or video. With the emergence of gigapixel-level images and videos, the corresponding computer vision tasks remain unsolved because of the extremely high resolution, large scale, and huge data volume produced by gigapixel cameras.

Part 1: Crowd Counting

This task evaluates the ability of algorithms to estimate crowd density maps in complex scenes. Participants will use our Gigapixel Video Dataset, a new resource for computer vision challenges that offers high spatial resolution and a wide field of view (FOV) simultaneously.
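Ground-truth density maps for crowd counting are commonly derived from point annotations by placing a normalized Gaussian kernel at each head, so that the map sums to the head count. The sketch below illustrates that general technique under stated assumptions: a fixed kernel width `sigma` (a hypothetical choice; this description does not say how, or whether, density maps are generated from the labels), 1-based (x, y) head positions with the top-left pixel at (1, 1), and a map small enough to hold in memory. At full gigapixel resolution a float64 map would occupy several gigabytes, so in practice one would work on a downscaled grid or on tiles.

```python
import numpy as np


def density_map(points, height, width, sigma=4.0, radius=None):
    """Build a crowd density map whose sum equals the number of in-image heads.

    Hypothetical helper. `points` holds (x, y) head positions in 1-based pixel
    coordinates (top-left pixel is (1, 1)). Each head contributes a Gaussian
    kernel that is truncated at the image border and then renormalized, so
    every head adds exactly 1.0 to the total.
    """
    if radius is None:
        radius = int(3 * sigma)  # assumed truncation at 3 sigma
    dmap = np.zeros((height, width), dtype=np.float64)
    offsets = np.arange(-radius, radius + 1)
    for x, y in points:
        # Convert 1-based annotation coordinates to 0-based array indices.
        cx = int(round(x)) - 1
        cy = int(round(y)) - 1
        if not (0 <= cx < width and 0 <= cy < height):
            continue  # skip heads that fall outside this map
        # Keep only kernel positions that lie inside the map (contiguous ranges).
        xs = cx + offsets
        ys = cy + offsets
        xs = xs[(xs >= 0) & (xs < width)]
        ys = ys[(ys >= 0) & (ys < height)]
        # Separable 2D Gaussian, renormalized after border truncation.
        gx = np.exp(-((xs - cx) ** 2) / (2.0 * sigma ** 2))
        gy = np.exp(-((ys - cy) ** 2) / (2.0 * sigma ** 2))
        kernel = np.outer(gy, gx)
        kernel /= kernel.sum()
        dmap[ys[0]:ys[-1] + 1, xs[0]:xs[-1] + 1] += kernel
    return dmap
```

Because each kernel is renormalized after truncation, `dmap.sum()` equals the number of heads inside the map, which is the property a density-based counter is usually trained to reproduce.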

Dataset Download:

The Gigapixel Video Dataset 0.1alpha will be used for this task. This release consists of 65 representative images from the Train Station and Shanghai Marathon sequences. The images are stored in JPEG format and together contain more than 200K annotated heads. We will release more labeled data in the future.

Invalid Area

Because of limited resolution, even humans sometimes cannot reliably count the people in distant parts of the scene. We have therefore delineated invalid areas that are considered unrecognizable by human annotators; these areas have no ground-truth labels.

Dataset              Image Size (W × H, pixels)   Invalid Area
shanghai_marathon    26908 × 15024                1 ≤ x ≤ 26908, 1 ≤ y ≤ 6670
train_station        26558 × 14828                1 ≤ x ≤ 26558, 1 ≤ y ≤ 5130

Note: Coordinates are 1-based; the top-left pixel is the origin (x = 1, y = 1).
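When comparing predicted head locations with the labels, it may help to discard anything that falls in the unlabeled region. The sketch below encodes the two invalid areas from the table as 1-based, inclusive bounds; the dictionary layout and the helper names `in_invalid_area` and `filter_valid` are hypothetical, and whether the official evaluation applies exactly this filtering is an assumption.

```python
from typing import Iterable, List, Tuple

# Invalid (unlabeled) areas from the table above, 1-based and inclusive.
# Both span the full image width and cover the top rows up to y_max.
INVALID_AREAS = {
    "shanghai_marathon": {"x_max": 26908, "y_max": 6670},
    "train_station": {"x_max": 26558, "y_max": 5130},
}


def in_invalid_area(x: float, y: float, scene: str) -> bool:
    """Return True if the 1-based point (x, y) lies inside the scene's invalid area."""
    area = INVALID_AREAS[scene]
    return 1 <= x <= area["x_max"] and 1 <= y <= area["y_max"]


def filter_valid(points: Iterable[Tuple[float, float]], scene: str) -> List[Tuple[float, float]]:
    """Keep only points (e.g. predicted heads) that fall outside the invalid area."""
    return [(x, y) for (x, y) in points if not in_invalid_area(x, y, scene)]
```

For example, a predicted head at (100, 6670) in shanghai_marathon is dropped, while one at (100, 6671) is kept.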


The ground-truth labels are provided as .mat and .txt files. In each .txt file, the first two lines give the total number of people in the image. Each subsequent line gives one head position: the first number is the x coordinate and the second is the y coordinate, with the top-left pixel at (1, 1).

<x1 y1>
<x2 y2>
...
<xN yN>
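A minimal sketch of reading one of the .txt label files, assuming the layout described above: two header lines holding the count, followed by one whitespace-separated "x y" pair per line. The function name `parse_head_annotations` and its `header_lines` parameter are hypothetical; angle brackets and commas are stripped only defensively, since the exact on-disk delimiters are not specified here.

```python
from typing import List, Tuple


def parse_head_annotations(text: str, header_lines: int = 2) -> List[Tuple[float, float]]:
    """Parse head positions from the contents of a ground-truth .txt file.

    Assumes the first `header_lines` lines hold the total count and every
    later non-empty line starts with an x coordinate followed by a y
    coordinate (1-based, top-left pixel is (1, 1)).
    """
    points = []
    for line in text.splitlines()[header_lines:]:
        # Defensive: tolerate "<x y>" or "x,y" notation as well as "x y".
        cleaned = line.replace("<", " ").replace(">", " ").replace(",", " ")
        tokens = cleaned.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        points.append((float(tokens[0]), float(tokens[1])))
    return points
```

Usage: `with open(path) as f: points = parse_head_annotations(f.read())`, where `path` is the label file to read; comparing `len(points)` with the count in the header is a cheap sanity check.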



Part 2: Pedestrian & Vehicle Detection



When using our datasets in your research, please cite:

Xiaoyun Yuan, Lu Fang, Qionghai Dai, David J. Brady, and Yebin Liu. Multiscale gigapixel video: A cross resolution image matching and warping approach. In 2017 IEEE International Conference on Computational Photography (ICCP).


This dataset is for non-commercial use only. If you find yourself or your personal belongings in the data, please contact us and we will immediately remove the respective images from our servers.