InterHand2.6M dataset
The demo videos above have low-quality frames because of the compression for the README upload.
News
- 2020.11.26. Fitted MANO parameters are updated to better ones (fitting error is about 5 mm). The file size is also much smaller because the parameters are fitted to the world coordinates (independent of the camera view).
- 2020.10.7. Fitted MANO parameters are available! They are obtained by NeuralAnnot.
Introduction
- This is an official release of InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image (ECCV 2020).
- Our InterHand2.6M dataset is the first large-scale real-captured dataset with accurate GT 3D interacting hand poses.
- Due to privacy issues, we have blurred (and are still blurring) all faces in our dataset. As this takes a lot of labor and time, we release an initial version (v0.0) of InterHand2.6M.
- This initial version (v0.0) includes all hand sequences, so its diversity should not be very different from that of the full InterHand2.6M. Several viewpoints that are hard cases for face detection are excluded.
- Specifications of InterHand2.6M (v0.0) are as below.
Train set
* Train (H): 76,445 single hand frames / 208,271 interacting hand frames / 284,716 total frames (full InterHand2.6M: 142,231 / 386,251 / 528,482)
* Train (M): 322,675 single hand frames / 174,383 interacting hand frames / 497,058 total frames (full InterHand2.6M: 594,189 / 314,848 / 909,037)
* Train (H+M): 371,800 single hand frames / 366,802 interacting hand frames / 738,602 total frames (full InterHand2.6M: 687,548 / 673,514 / 1,361,062)
Validation set
* Val (M): 113,370 single hand frames / 70,917 interacting hand frames / 184,287 total frames (full InterHand2.6M: 234,183 / 145,942 / 380,125)
Test set
* Test (H): 18,399 single hand frames / 48,323 interacting hand frames / 66,722 total frames (full InterHand2.6M: 33,665 / 87,908 / 121,573)
* Test (M): 179,593 single hand frames / 106,582 interacting hand frames / 286,175 total frames (full InterHand2.6M: 455,303 / 272,284 / 727,587)
* Test (H+M): 197,992 single hand frames / 154,905 interacting hand frames / 352,897 total frames (full InterHand2.6M: 488,968 / 360,192 / 849,160)
Total set
* InterHand2.6M (v0.0): 1,275,786 (full InterHand2.6M: 2,590,347)
Download
Images
You can download images from Google Drive or GitHub. Both links contain the same set of images; the Google Drive link would be faster.
If you hit a ‘Download limit’ error when trying to download the dataset from the Google Drive link, please try this trick:
* Go to the shared folder, which contains the files you want to copy to your drive
* Select all the files you want to copy
* In the upper right corner, click the three vertical dots and select “Make a copy”
* The files are then copied to your personal Google Drive account, and you can download them from there.
- Images (v0.0) on GitHub
[batch0] [batch1] [batch2] [batch3] [batch4] [batch5] [batch6] [batch7] [batch8] [batch9] [batch10] [batch11] [batch12] [batch13] [batch14] [batch15] [batch16] [batch17] [batch18] [batch19] [batch20]
You can use the following shortcut to download all data on a Unix system (warning: the total zipped images take about 40GB):
```bash
for part in a b c d e f g h i j k l m n o p q r s t u
do
    wget https://github.com/facebookresearch/InterHand2.6M/releases/download/v0.0/InterHand2.6M.images.5.fps.v0.0.tar.parta${part}
done
```
You can extract the archive using `cat InterHand2.6M.images.5.fps.v0.0.tar.parta* | tar -xvf - -i`.
- Please check the md5sum values of the split image tar files from here.
Annotations
- Annotations (v0.0) from Google Drive
- Annotations (v0.0) from GitHub release
- 2020.09.15: `images[i]['width']` and `images[i]['height']` were wrong. They have been changed to the correct values ((512,334) or (334,512)).
Directory
The `${ROOT}` directory is described as below.
${ROOT}
|-- images
| |-- train
| | |-- Capture0 ~ Capture26
| |-- val
| | |-- Capture0
| |-- test
| | |-- Capture0 ~ Capture7
|-- annotations
| |-- skeleton.txt
| |-- subject.txt
| |-- all
| |-- human_annot
| |-- machine_annot
Annotation files
- Using pycocotools for data loading is recommended. Run `pip install pycocotools`.
- `skeleton.txt` contains information about the hand hierarchy (keypoint name, keypoint index, keypoint parent index). `subject.txt` contains information about the subject (subject_id, subject directory, subject gender).
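A minimal parsing sketch for `skeleton.txt`, assuming whitespace-separated columns in the order described above (the exact layout is an assumption; adjust if the actual file differs):

```python
# Sketch only: assumes one joint per line with whitespace-separated columns
# (keypoint name, keypoint index, keypoint parent index).
skeleton = {}
with open('annotations/skeleton.txt') as f:
    for line in f:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, idx, parent_idx = fields
        try:
            skeleton[int(idx)] = {'name': name, 'parent': int(parent_idx)}
        except ValueError:
            continue  # skip a header line, if present
```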
There are four `.json` files.
InterHand2.6M_$DB_SPLIT_data.json: dict
|-- 'images': [image]
|-- 'annotations': [annotation]
image: dict
|-- 'id': int (image id)
|-- 'file_name': str (image file name)
|-- 'width': int (image width)
|-- 'height': int (image height)
|-- 'capture': int (capture id)
|-- 'subject': int (subject id)
|-- 'seq_name': str (sequence name)
|-- 'camera': str (camera name)
|-- 'frame_idx': int (frame index)
annotation: dict
|-- 'id': int (annotation id)
|-- 'image_id': int (corresponding image id)
|-- 'bbox': list (bounding box coordinates. [xmin, ymin, width, height])
|-- 'joint_valid': list (can this annotation be used for hand pose estimation training and evaluation? 1 if a joint is annotated and inside of the image, 0 otherwise. This is based on 2D observation from the image.)
|-- 'hand_type': str (one of 'right', 'left', and 'interacting')
|-- 'hand_type_valid': int (can this annotation be used for handedness estimation training and evaluation? 1 if hand_type is 'right' or 'left', or if hand_type is 'interacting' and np.sum(joint_valid) > 30; 0 otherwise. This is based on 2D observation from the image.)
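For example, a minimal loading sketch with pycocotools (the annotation path below is an assumption; point it at your own copy):

```python
from pycocotools.coco import COCO

# Hypothetical path; substitute the location of your annotation file.
db = COCO('annotations/human_annot/InterHand2.6M_train_data.json')

for aid, ann in db.anns.items():
    img = db.loadImgs(ann['image_id'])[0]
    img_path = 'images/train/' + img['file_name']  # layout per the Directory section
    bbox = ann['bbox']             # [xmin, ymin, width, height]
    hand_type = ann['hand_type']   # 'right', 'left', or 'interacting'
```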
InterHand2.6M_$DB_SPLIT_camera.json
|-- str (capture id)
| |-- 'campos'
| | |-- str (camera name): [x,y,z] (camera position)
| |-- 'camrot'
| | |-- str (camera name): 3x3 list (camera rotation matrix)
| |-- 'focal'
| | |-- str (camera name): [focal_x, focal_y] (focal length of x and y axis)
| |-- 'princpt'
| | |-- str (camera name): [princpt_x, princpt_y] (principal point of x and y axis)
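These parameters define a standard pinhole camera. A minimal projection sketch, assuming the toolbox convention that camera coordinates are `camrot @ (world - campos)`:

```python
import numpy as np

def world2pixel(world_xyz, campos, camrot, focal, princpt):
    """Project Nx3 world coordinates (millimeters) to Nx2 pixel coordinates.

    Assumed convention: camera coords = camrot @ (world - campos), followed
    by a pinhole projection with per-axis focal length and principal point.
    """
    cam = (np.asarray(camrot) @ (np.asarray(world_xyz) - np.asarray(campos)).T).T
    u = focal[0] * cam[:, 0] / cam[:, 2] + princpt[0]
    v = focal[1] * cam[:, 1] / cam[:, 2] + princpt[1]
    return np.stack([u, v], axis=1)
```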
InterHand2.6M_$DB_SPLIT_joint_3d.json
|-- str (capture id)
| |-- str (frame idx):
| | |-- 'world_coord': Jx3 list (3D joint coordinates in the world coordinate system. unit: millimeter.)
| | |-- 'joint_valid': Jx3 list (1 if `joint_valid` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 1.)
| | |-- 'hand_type': str (one of 'right', 'left', and 'interacting'. 'interacting' if `hand_type` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 'interacting'.)
| | |-- 'hand_type_valid': int (1 if `hand_type_valid` from `InterHand2.6M_$DB_SPLIT_data.json` in at least 1 view is 1.)
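Putting the camera and joint files together, a hedged sketch that loads one frame's 3D joints and projects them into a single view (the paths and the capture/frame/camera ids are placeholders; `world2pixel` is the sketch above):

```python
import json
import numpy as np

with open('annotations/human_annot/InterHand2.6M_train_camera.json') as f:
    cameras = json.load(f)
with open('annotations/human_annot/InterHand2.6M_train_joint_3d.json') as f:
    joints = json.load(f)

capture_id, frame_idx, cam_name = '0', '12345', '400262'  # placeholders
frame = joints[capture_id][frame_idx]
world = np.array(frame['world_coord'])   # Jx3, millimeters
valid = np.array(frame['joint_valid'])   # 1 = usable joint

cam = cameras[capture_id]
uv = world2pixel(world,
                 np.array(cam['campos'][cam_name]),
                 np.array(cam['camrot'][cam_name]),
                 cam['focal'][cam_name],
                 cam['princpt'][cam_name])  # Jx2 pixels; mask with `valid` as needed
```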
InterHand2.6M_$DB_SPLIT_MANO.json
|-- str (capture id)
| |-- str (frame idx):
| | |-- 'right'
| | | |-- 'pose': 48 dimensional MANO pose vector in axis-angle representation minus the mean pose.
| | | |-- 'shape': 10 dimensional MANO shape vector.
| | | |-- 'trans': 3 dimensional MANO translation vector in meter unit.
| | |-- 'left'
| | | |-- 'pose': 48 dimensional MANO pose vector in axis-angle representation minus the mean pose.
| | | |-- 'shape': 10 dimensional MANO shape vector.
| | | |-- 'trans': 3 dimensional MANO translation vector in meter unit.
For the MANO mesh rendering, please see https://github.com/facebookresearch/InterHand2.6M/blob/master/MANO_render/render.py
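As a minimal sketch of recovering mesh vertices from these parameters with the smplx package used by render.py (the model and annotation paths and the capture/frame ids below are assumptions):

```python
import json
import torch
import smplx

# Assumes MANO model files live at 'models/mano/MANO_RIGHT.pkl' and
# 'models/mano/MANO_LEFT.pkl' (hypothetical location).
mano_layer = {
    'right': smplx.create('models', 'mano', use_pca=False, is_rhand=True),
    'left': smplx.create('models', 'mano', use_pca=False, is_rhand=False),
}

with open('annotations/human_annot/InterHand2.6M_train_MANO.json') as f:  # hypothetical path
    mano_params = json.load(f)
param = mano_params['0']['12345']['right']  # placeholder ids; guard for absent hands as needed

pose = torch.FloatTensor(param['pose']).view(1, -1)    # 48-dim; first 3 = global rotation
shape = torch.FloatTensor(param['shape']).view(1, -1)  # 10-dim shape coefficients
trans = torch.FloatTensor(param['trans']).view(1, -1)  # translation, in meters

# The layer's default flat_hand_mean=False adds the mean pose back, matching
# the mean-subtracted 'pose' stored in the file (assumption based on the format above).
out = mano_layer['right'](global_orient=pose[:, :3], hand_pose=pose[:, 3:],
                          betas=shape, transl=trans)
vertices = out.vertices[0]  # 778x3 mesh vertices in world coordinates (meters)
```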
InterHand2.6M in 30 fps
- The InterHand2.6M above is a version downsampled to 5 fps to remove redundancy.
- We additionally release InterHand2.6M in 30 fps for video-related research.
- It has exactly the same directory structure as that of the InterHand2.6M in 5 fps.
- Like the 5 fps version, we release an initial version (v0.0) of InterHand2.6M in 30 fps.
Train set
* Train (H): 76,447 single hand frames / 208,281 interacting hand frames / 284,728 total frames (full InterHand2.6M: 528,510)
* Train (M): 1,856,600 single hand frames / 1,031,624 interacting hand frames / 2,888,224 total frames (full InterHand2.6M: 5,282,897)
* Train (H+M): 1,905,726 single hand frames / 1,213,661 interacting hand frames / 3,119,387 total frames (full InterHand2.6M: 5,716,488)
Validation set
* Val (M): 678,501 single hand frames / 424,917 interacting hand frames / 1,103,418 total frames (full InterHand2.6M: 2,276,049)
Test set
* Test (H): 18,402 single hand frames / 48,332 interacting hand frames / 66,734 total frames (full InterHand2.6M: 121,591)
* Test (M): 1,075,209 single hand frames / 637,968 interacting hand frames / 1,713,177 total frames (full InterHand2.6M: 4,355,771)
* Test (H+M): 1,093,611 single hand frames / 686,300 interacting hand frames / 1,779,911 total frames (full InterHand2.6M: 4,477,362)
Total set
* InterHand2.6M (v0.0) in 30 fps: 6,002,716 (full InterHand2.6M in 30 fps: 12,469,899)
Download
- Images (I’m waiting for FB’s data inspection. I do not know when it will be finished… :( )
- Annotations (v0.0) from Google Drive
- Annotations (v0.0) from GitHub release
A Baseline for 3D Interacting Hand Pose Estimation (InterNet)
- Go to GitHub
Contact
If you run into any problems, please send an e-mail to mks0601(at)gmail.com
Reference
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}