Open Datasets for Computer Vision Projects

The following open datasets are useful for computer vision projects. Each entry gives the dataset's home page and a short summary.

MS COCO

https://cocodataset.org

COCO is a large-scale object detection, segmentation, and captioning dataset. It has 330K images (more than 200K of them labeled), 80 object categories, and 91 stuff categories.

MPII Human Pose Dataset

http://human-pose.mpi-inf.mpg.de/

The MPII Human Pose Dataset is a state-of-the-art benchmark for evaluating articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. Overall, the dataset covers 410 human activities, and each image is provided with an activity label.

ImageNet

http://www.image-net.org/

ImageNet is an image database organized according to the WordNet hierarchy. The database indexes 14M images across more than 21K synsets.

CIFAR-10

http://www.cs.toronto.edu/~kriz/cifar.html

The CIFAR-10 dataset consists of 60K 32×32 color images in 10 classes, with 6K images per class. The dataset is divided into five training batches and one test batch, each with 10K images. The test batch contains exactly 1K randomly selected images from each class.
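The split sizes above can be checked with a little arithmetic. The following is only a sketch: the variable names are illustrative and not part of any CIFAR-10 tooling, and the per-class training count is derived from the totals rather than stated in the description.

```python
# Figures taken from the CIFAR-10 description; names are illustrative only.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
BATCH_SIZE = 10_000
TRAIN_BATCHES = 5
TEST_BATCHES = 1

total_images = NUM_CLASSES * IMAGES_PER_CLASS   # whole dataset
train_images = TRAIN_BATCHES * BATCH_SIZE       # five training batches
test_images = TEST_BATCHES * BATCH_SIZE         # one test batch

# The test batch is balanced, so each class contributes an equal share.
test_per_class = test_images // NUM_CLASSES
# Derived: whatever is not in the test batch is in the training batches.
train_per_class = IMAGES_PER_CLASS - test_per_class

# The training and test batches together must account for every image.
assert train_images + test_images == total_images

print(f"total={total_images}, train={train_images}, test={test_images}")
print(f"per class: train={train_per_class}, test={test_per_class}")
```

The numbers are consistent only if the test batch holds 1,000 images per class, which leaves 5,000 images per class for training across the five training batches.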

Cityscapes

https://www.cityscapes-dataset.com/

The Cityscapes Dataset is a large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5K frames in addition to a large set of 20K weakly annotated frames.

Kinetics-700

https://deepmind.com/research/open-source/kinetics

The Kinetics-700 dataset contains 650K video clips that cover 700 human action classes. Each clip is human-annotated with a single action class and lasts around 10 seconds.

Open Images Dataset V6

https://storage.googleapis.com/openimages/web/factsfigures.html

Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. It contains a total of 16M bounding boxes for 600 object classes on 1.9M images.

Caltech Pedestrian Detection Benchmark

http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

The Caltech Pedestrian Dataset consists of ~10 hours of 640×480, 30 Hz video taken from a vehicle driving through regular traffic in an urban environment. About 250K frames (in 137 segments, each approximately one minute long) were annotated, with a total of 350K bounding boxes and 2,300 unique pedestrians. The annotations include temporal correspondence between bounding boxes and detailed occlusion labels.
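As a rough sanity check, the frame counts quoted above can be related to the video length and frame rate. This is a back-of-the-envelope sketch, not part of the benchmark's tooling: it assumes each segment is exactly 60 seconds long (the description says only "approximately"), and all variable names are illustrative.

```python
# Figures from the Caltech Pedestrian description; names are illustrative.
FPS = 30                     # 30 Hz video
TOTAL_HOURS = 10             # approximate length of the full recording
SEGMENTS = 137               # annotated segments
SEGMENT_SECONDS = 60         # assumption: treat "about a minute" as exactly 60 s
ANNOTATED_FRAMES = 250_000   # quoted (approximate) annotated frame count
BOUNDING_BOXES = 350_000     # quoted (approximate) bounding box count

# Frames in the whole recording at the stated frame rate.
total_frames = TOTAL_HOURS * 3600 * FPS

# Frames implied by 137 one-minute segments; should land near 250K.
estimated_annotated_frames = SEGMENTS * SEGMENT_SECONDS * FPS

# Share of the recording that is annotated, and average boxes per annotated frame.
annotated_fraction = estimated_annotated_frames / total_frames
boxes_per_frame = BOUNDING_BOXES / ANNOTATED_FRAMES

print(f"total frames: {total_frames}")
print(f"estimated annotated frames: {estimated_annotated_frames}")
print(f"annotated fraction: {annotated_fraction:.2f}")
print(f"boxes per annotated frame: {boxes_per_frame:.1f}")
```

Under these assumptions, the 137 segments account for roughly a quarter of the recording, and the estimate of 246,600 frames agrees with the quoted figure of about 250K.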