
InterHand2.6M

A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image (ECCV 2020)

InterHand2.6M dataset

The demo videos above have low-quality frames because they were compressed for the README upload.

Introduction

Train set
* Train (H): 76,445 single hand frames / 208,271 interacting hand frames / 284,716 total frames (full InterHand2.6M: 142,231 / 386,251 / 528,482)
* Train (M): 322,675 single hand frames / 174,383 interacting hand frames / 497,058 total frames (full InterHand2.6M: 594,189 / 314,848 / 909,037)
* Train (H+M): 371,800 single hand frames / 366,802 interacting hand frames / 738,602 total frames (full InterHand2.6M: 687,548 / 673,514 / 1,361,062)

Validation set
* Val (M): 113,370 single hand frames / 70,917 interacting hand frames / 184,287 total frames (full InterHand2.6M: 234,183 / 145,942 / 380,125)

Test set
* Test (H): 18,399 single hand frames / 48,323 interacting hand frames / 66,722 total frames (full InterHand2.6M: 33,665 / 87,908 / 121,573)
* Test (M): 179,593 single hand frames / 106,582 interacting hand frames / 286,175 total frames (full InterHand2.6M: 455,303 / 272,284 / 727,587)
* Test (H+M): 197,992 single hand frames / 154,905 interacting hand frames / 352,897 total frames (full InterHand2.6M: 488,968 / 360,192 / 849,160)

Total set
* InterHand2.6M (v0.0): 1,275,786 (full InterHand2.6M: 2,590,347)
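
As a quick sanity check on the figures above, the total appears to be the sum of the Train (H+M), Val (M), and Test (H+M) totals. The short Python sketch below verifies that arithmetic for both the released v0.0 subset and the full dataset; the dictionary keys are illustrative names, not identifiers from the dataset.

```python
# Total frame counts copied from the lists above (5 fps release, v0.0).
released = {"train_hm": 738_602, "val_m": 184_287, "test_hm": 352_897}

# The same splits in the full InterHand2.6M.
full = {"train_hm": 1_361_062, "val_m": 380_125, "test_hm": 849_160}

released_total = sum(released.values())
full_total = sum(full.values())

print(f"released v0.0 total: {released_total:,}")
print(f"full dataset total:  {full_total:,}")
```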

Download

Images

You can download the images from Google Drive or GitHub. Both links contain the same set of images.

If you hit the ‘Download limit’ error when downloading the dataset from the Google Drive link, please try the following trick.

* Go to the shared folder, which contains the files you want to copy to your drive  
* Select all the files you want to copy  
* In the upper right corner, click on the three vertical dots and select “make a copy”  
* The files are then copied to your personal Google Drive account, and you can download them from there.  

[batch0] [batch1] [batch2] [batch3] [batch4] [batch5] [batch6] [batch7] [batch8] [batch9] [batch10] [batch11] [batch12] [batch13] [batch14] [batch15] [batch16] [batch17] [batch18] [batch19] [batch20]

You can use the following shortcut to download all data on a Unix system (warning: the total zipped images take about 40GB):

for part in a b c d e f g h i j k l m n o p q r s t u
do
    wget https://github.com/facebookresearch/InterHand2.6M/releases/download/v0.0/InterHand2.6M.images.5.fps.v0.0.tar.parta${part}
done;

You can extract the archive using `cat InterHand2.6M.images.5.fps.v0.0.tar.parta* | tar -xvf - -i`.
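
Because the archive is split into 21 parts (suffixes `a` through `u`), a missing part can make extraction fail. The Python sketch below lists the expected part names and reports any that are absent from a directory. The helper names `part_names` and `missing_parts` are hypothetical and not part of the repository.

```python
import string
from pathlib import Path

# Release URL and file prefix taken from the wget loop above.
BASE_URL = "https://github.com/facebookresearch/InterHand2.6M/releases/download/v0.0/"
PREFIX = "InterHand2.6M.images.5.fps.v0.0.tar.parta"


def part_names():
    """Return the 21 part file names, with suffixes 'a' through 'u'."""
    return [PREFIX + letter for letter in string.ascii_lowercase[:21]]


def missing_parts(download_dir):
    """Return the part names that are not present in download_dir."""
    directory = Path(download_dir)
    return [name for name in part_names() if not (directory / name).is_file()]


# Report what still needs to be downloaded into the current directory.
for name in missing_parts("."):
    print("missing:", BASE_URL + name)
```

Run it in the download directory before extracting; if it prints nothing, all parts are present.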

Annotations

Directory

${ROOT} is organized as follows.

${ROOT}
|-- images
|   |-- train
|   |   |-- Capture0 ~ Capture26
|   |-- val
|   |   |-- Capture0
|   |-- test
|   |   |-- Capture0 ~ Capture7
|-- annotations
|   |-- skeleton.txt
|   |-- subject.txt
|   |-- all
|   |-- human_annot
|   |-- machine_annot

Annotation files

There are three .json files.

InterHand2.6M_$DB_SPLIT_data.json: dict
|-- 'images': [image]
|-- 'annotations': [annotation]

image: dict
|-- 'id': int (image id)
|-- 'file_name': str (image file name)
|-- 'width': int (image width)
|-- 'height': int (image height)
|-- 'capture': int (capture id)
|-- 'subject': int (subject id)
|-- 'seq_name': str (sequence name)
|-- 'camera': str (camera name)
|-- 'frame_idx': int (frame index)

annotation: dict
|-- 'id': int (annotation id)
|-- 'image_id': int (corresponding image id)
|-- 'bbox': list (bounding box coordinates. [xmin, ymin, width, height])
|-- 'joint_valid': list (can this annotation be used for hand pose estimation training and evaluation? 1 if a joint is annotated and inside the image, 0 otherwise. this is based on 2D observation from the image.)
|-- 'hand_type': str (one of 'right', 'left', and 'interacting')
|-- 'hand_type_valid': int (can this annotation be used for handedness estimation training and evaluation? 1 if hand_type in ('right', 'left') or hand_type == 'interacting' and np.sum(joint_valid) > 30, 0 otherwise. this is based on 2D observation from the image.)
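
The `hand_type_valid` rule and the link between `images` and `annotations` can be sketched in Python as follows. This is an illustration only: it assumes `joint_valid` is a flat list of 0/1 values, and the helper names and the hand-made sample record are hypothetical.

```python
def hand_type_valid(hand_type, joint_valid):
    """Recompute hand_type_valid following the rule described above."""
    if hand_type in ("right", "left"):
        return 1
    if hand_type == "interacting" and sum(joint_valid) > 30:
        return 1
    return 0


def pair_images_with_annotations(data):
    """Yield (image, annotation) pairs joined on annotation['image_id']."""
    images_by_id = {image["id"]: image for image in data["images"]}
    for annotation in data["annotations"]:
        yield images_by_id[annotation["image_id"]], annotation


# Hand-made stand-in for a loaded InterHand2.6M_$DB_SPLIT_data.json file.
sample = {
    "images": [{"id": 7, "file_name": "example.jpg"}],
    "annotations": [
        {"id": 0, "image_id": 7, "hand_type": "interacting",
         "joint_valid": [1] * 25 + [0] * 17},
    ],
}

for image, annotation in pair_images_with_annotations(sample):
    valid = hand_type_valid(annotation["hand_type"], annotation["joint_valid"])
    print(image["file_name"], annotation["hand_type"], valid)
```

In the sample, only 25 joints are valid, so the interacting-hand annotation would not be used for handedness estimation.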

InterHand2.6M_$DB_SPLIT_camera.json
|-- str (capture id)
|   |-- 'campos'
|   |   |-- str (camera name): [x,y,z] (camera position)
|   |-- 'camrot'
|   |   |-- str (camera name): 3x3 list (camera rotation matrix)
|   |-- 'focal'
|   |   |-- str (camera name): [focal_x, focal_y] (focal length of x and y axis)
|   |-- 'princpt'
|   |   |-- str (camera name): [princpt_x, princpt_y] (principal point of x and y axis)

InterHand2.6M_$DB_SPLIT_joint_3d.json
|-- str (capture id)
|   |-- str (frame idx): 
|   |   |-- 'world_coord': Jx3 list (3D joint coordinates in the world coordinate system.)
|   |   |-- 'joint_valid': Jx3 list (1 if `joint_valid` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 1.)
|   |   |-- 'hand_type': str (one of 'right', 'left', and 'interacting'. 'interacting' if `hand_type` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 'interacting'.)
|   |   |-- 'hand_type_valid': int (1 if `hand_type_valid` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 1.)
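
The three files can be combined to project the 3D joints of a frame into a given camera view. The sketch below is not taken from the repository and rests on two assumptions that the text above does not state: that `camrot` maps world coordinates to camera coordinates and `campos` is the camera center in world coordinates (so `X_cam = camrot · (X_world − campos)`), and that projection follows the standard pinhole model. All function names and the toy values are hypothetical. Note that the capture id and frame index are integers in the image record but string keys in the camera and joint files.

```python
def world_to_camera(point_world, camrot, campos):
    """Assumed convention: X_cam = camrot @ (X_world - campos)."""
    shifted = [w - c for w, c in zip(point_world, campos)]
    return [sum(r * s for r, s in zip(row, shifted)) for row in camrot]


def camera_to_pixel(point_cam, focal, princpt):
    """Standard pinhole projection of a camera-space point to pixels."""
    x, y, z = point_cam
    return [x / z * focal[0] + princpt[0], y / z * focal[1] + princpt[1]]


def project_image_joints(image, cameras, joints):
    """Project all 3D joints of one image record into its camera view."""
    capture = str(image["capture"])      # int in data.json, str key elsewhere
    frame = str(image["frame_idx"])
    name = image["camera"]
    cam = cameras[capture]
    pixels = []
    for point_world in joints[capture][frame]["world_coord"]:
        point_cam = world_to_camera(point_world, cam["camrot"][name], cam["campos"][name])
        pixels.append(camera_to_pixel(point_cam, cam["focal"][name], cam["princpt"][name]))
    return pixels


# Toy stand-ins for the loaded camera and joint_3d files (values are made up).
cameras = {"0": {
    "campos": {"camA": [0.0, 0.0, -1000.0]},
    "camrot": {"camA": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
    "focal": {"camA": [1000.0, 1000.0]},
    "princpt": {"camA": [500.0, 400.0]},
}}
joints = {"0": {"12": {"world_coord": [[250.0, 125.0, 0.0]]}}}
image = {"capture": 0, "camera": "camA", "frame_idx": 12}

print(project_image_joints(image, cameras, joints))
```

Only joints whose `joint_valid` entry is 1 should be trusted; that filtering is omitted here to keep the sketch short.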

InterHand2.6M in 30 fps

Train set
* Train (H): 76,447 single hand frames / 208,281 interacting hand frames / 284,728 total frames (full InterHand2.6M: 528,510)
* Train (M): 1,856,600 single hand frames / 1,031,624 interacting hand frames / 2,888,224 total frames (full InterHand2.6M: 5,282,897)
* Train (H+M): 1,905,726 single hand frames / 1,213,661 interacting hand frames / 3,119,387 total frames (full InterHand2.6M: 5,716,488)

Validation set
* Val (M): 678,501 single hand frames / 424,917 interacting hand frames / 1,103,418 total frames (full InterHand2.6M: 2,276,049)

Test set
* Test (H): 18,402 single hand frames / 48,332 interacting hand frames / 66,734 total frames (full InterHand2.6M: 121,591)
* Test (M): 1,075,209 single hand frames / 637,968 interacting hand frames / 1,713,177 total frames (full InterHand2.6M: 4,355,771)
* Test (H+M): 1,093,611 single hand frames / 686,300 interacting hand frames / 1,779,911 total frames (full InterHand2.6M: 4,477,362)

Total set
* InterHand2.6M (v0.0) in 30 fps: 6,002,716 (full InterHand2.6M in 30 fps: 12,469,899)

Download

A Baseline for 3D Interacting Hand Pose Estimation (InterNet)

Contact

If you run into any problems, please send an e-mail to mks0601(at)gmail.com

Reference

@InProceedings{Moon_2020_ECCV_InterHand2.6M,  
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},  
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},  
booktitle = {European Conference on Computer Vision (ECCV)},  
year = {2020}  
}