InterHand2.6M dataset
Our new Re:InterHand dataset has been released, which has much more diverse image appearances and more stable 3D GT. Check it out here!
Quick visualizations
The demo videos above have low-quality frames because of compression for the README upload.
- Videos of 3D joint coordinates (from joint_3d.json) from the 30 fps split: [single hand] [two hands].
- Videos of MANO fittings from the 30 fps split: [single hand] [two hands].
News
- 2023.05.05. Visualized videos are available.
- 2021.03.22. Finally, InterHand2.6M v1.0, which includes all images of the 5 fps and 30 fps versions, is released! 🎉 This is the dataset used in the InterHand2.6M paper.
- 2020.11.26. Fitted MANO parameters are updated to better ones (fitting error is about 5 mm). The file size is also much smaller because the parameters are fitted in world coordinates (independent of the camera view).
- 2020.10.7. Fitted MANO parameters are available! They are obtained by NeuralAnnot.
Introduction
- This is an official release of InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image (ECCV 2020).
- Our InterHand2.6M dataset is the first large-scale real-captured dataset with accurate GT 3D interacting hand poses.
- The specifications of InterHand2.6M are as follows.
Train set
* Train (H): 142,231 single hand frames / 386,251 interacting hand frames / 528,482 total frames
* Train (M): 594,189 single hand frames / 314,848 interacting hand frames / 909,037 total frames
* Train (H+M): 687,548 single hand frames / 673,514 interacting hand frames / 1,361,062 total frames
Validation set
* Val (M): 234,183 single hand frames / 145,942 interacting hand frames / 380,125 total frames
Test set
* Test (H): 33,665 single hand frames / 87,908 interacting hand frames / 121,573 total frames
* Test (M): 455,303 single hand frames / 272,284 interacting hand frames / 727,587 total frames
* Test (H+M): 488,968 single hand frames / 360,192 interacting hand frames / 849,160 total frames
Total set
* InterHand2.6M: 1,410,699 single hand frames / 1,179,648 interacting hand frames / 2,590,347 total frames
Download
Images
- Images (v1.0)
- Instead of manually downloading files from the above link, you can run a download script.
- To verify the downloaded files with `md5sum`, run `python verify_download.py`.
- To unzip the images, run `sh unzip.sh`.
- All image files take 80 GB.
Annotations
- Annotations (v1.0). H+M in the paper
- Annotation IDs of the human annotation (H) train set and the human annotation (H) test set
- Camera position visualization code is available here
Directory
The directory structure of ${ROOT} is described below.
${ROOT}
|-- images
| |-- train
| | |-- Capture0 ~ Capture26
| |-- val
| | |-- Capture0
| |-- test
| | |-- Capture0 ~ Capture7
|-- annotations
| |-- skeleton.txt
| |-- subject.txt
| |-- train
| |-- val
| |-- test
- Note: train/Capture10-26 and test/Capture2-7 contain frames at less than 5 fps (or 30 fps if you downloaded the 30 fps version) due to multiple rounds of image dumping.
Annotation files
- Using pycocotools to load the data is recommended. Run `pip install pycocotools`.
- `skeleton.txt` contains the hand hierarchy (keypoint name, keypoint index, keypoint parent index); a minimal parsing sketch follows this list.
- `subject.txt` contains subject information (subject_id, subject directory, subject gender).
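The two text files can be read with plain Python. A minimal parsing sketch for `skeleton.txt` is below; the whitespace-separated column order and the file path are assumptions for illustration, not guarantees of this README.

```python
# Hypothetical sketch: build a {joint index: (name, parent index)} map from skeleton.txt.
# Assumes whitespace-separated columns in the order (name, index, parent index)
# and the path below; adjust to the actual file layout.
skeleton = {}
with open('annotations/skeleton.txt') as f:
    for line in f:
        fields = line.split()
        if len(fields) < 3 or not fields[1].lstrip('-').isdigit():
            continue  # skip header or malformed lines
        name, idx, parent_idx = fields[0], int(fields[1]), int(fields[2])
        skeleton[idx] = (name, parent_idx)
print(f'{len(skeleton)} joints loaded')
```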
There are four `.json` files:
InterHand2.6M_$DB_SPLIT_data.json: dict
|-- 'images': [image]
|-- 'annotations': [annotation]
image: dict
|-- 'id': int (image id)
|-- 'file_name': str (image file name)
|-- 'width': int (image width)
|-- 'height': int (image height)
|-- 'capture': int (capture id)
|-- 'subject': int (subject id)
|-- 'seq_name': str (sequence name)
|-- 'camera': str (camera name)
|-- 'frame_idx': int (frame index)
annotation: dict
|-- 'id': int (annotation id)
|-- 'image_id': int (corresponding image id)
|-- 'bbox': list (bounding box coordinates. [xmin, ymin, width, height])
|-- 'joint_valid': list (can this annotation be used for hand pose estimation training and evaluation? 1 if a joint is annotated and inside the image, 0 otherwise. This is based on the 2D observation from the image.)
|-- 'hand_type': str (one of 'right', 'left', and 'interacting')
|-- 'hand_type_valid': int (can this annotation be used for handedness estimation training and evaluation? 1 if hand_type is 'right' or 'left', or if hand_type is 'interacting' and np.sum(joint_valid) > 30; 0 otherwise. This is based on the 2D observation from the image.)
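Because `InterHand2.6M_$DB_SPLIT_data.json` follows this COCO-style layout, it can be loaded directly with pycocotools. A minimal sketch, assuming the train split and the annotation path below (the path is an assumption for illustration):

```python
from pycocotools.coco import COCO

# Annotation path is an assumption for illustration.
db = COCO('annotations/train/InterHand2.6M_train_data.json')

# Iterate over a few images and their annotations.
for img_id in db.getImgIds()[:5]:
    img = db.loadImgs(img_id)[0]
    ann = db.loadAnns(db.getAnnIds(imgIds=img_id))[0]
    print(img['file_name'], ann['hand_type'], ann['bbox'])
```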
InterHand2.6M_$DB_SPLIT_camera.json
|-- str (capture id)
| |-- 'campos'
| | |-- str (camera name): [x,y,z] (camera position)
| |-- 'camrot'
| | |-- str (camera name): 3x3 list (camera rotation matrix)
| |-- 'focal'
| | |-- str (camera name): [focal_x, focal_y] (focal length of the x and y axes)
| |-- 'princpt'
| | |-- str (camera name): [princpt_x, princpt_y] (principal point of x and y axis)
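A minimal sketch of reading one camera's parameters and assembling a 3x3 intrinsic matrix from them; the annotation path is an assumption for illustration:

```python
import json
import numpy as np

# Annotation path is an assumption for illustration.
with open('annotations/train/InterHand2.6M_train_camera.json') as f:
    cameras = json.load(f)

capture_id = next(iter(cameras))                      # e.g. '0'
cam_name = next(iter(cameras[capture_id]['campos']))  # first camera of this capture

campos = np.array(cameras[capture_id]['campos'][cam_name], dtype=np.float32)  # (3,) camera position
camrot = np.array(cameras[capture_id]['camrot'][cam_name], dtype=np.float32)  # (3, 3) rotation matrix
fx, fy = cameras[capture_id]['focal'][cam_name]
cx, cy = cameras[capture_id]['princpt'][cam_name]

# 3x3 intrinsic matrix assembled from focal length and principal point.
K = np.array([[fx, 0., cx],
              [0., fy, cy],
              [0., 0., 1.]], dtype=np.float32)
print(K)
```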
InterHand2.6M_$DB_SPLIT_joint_3d.json
|-- str (capture id)
| |-- str (frame idx):
| | |-- 'world_coord': Jx3 list (3D joint coordinates in the world coordinate system. unit: millimeter.)
| | |-- 'joint_valid': Jx3 list (1 if a joint is successfully annotated, 0 otherwise. Unlike 'joint_valid' of InterHand2.6M_$DB_SPLIT_data.json, it does not consider whether the joint is truncated in the image space.)
| | |-- 'hand_type': str (one of 'right', 'left', and 'interacting'. taken from sequence names)
| | |-- 'seq': str (sequence name)
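Combining the camera and joint files, world-coordinate joints can be projected into a camera's image plane. A minimal sketch, assuming the train split and the paths below, and the convention cam = camrot @ (world - campos); please verify the convention against the official toolkit:

```python
import json
import numpy as np

# Placeholders for illustration: the annotation paths and the train split.
split = 'train'
with open(f'annotations/{split}/InterHand2.6M_{split}_camera.json') as f:
    cameras = json.load(f)
with open(f'annotations/{split}/InterHand2.6M_{split}_joint_3d.json') as f:
    joints = json.load(f)

capture_id = next(iter(joints))
frame_idx = next(iter(joints[capture_id]))
cam_name = next(iter(cameras[capture_id]['campos']))

ann = joints[capture_id][frame_idx]
world_coord = np.array(ann['world_coord'], dtype=np.float32)  # (J, 3), millimeters
joint_valid = np.array(ann['joint_valid'], dtype=np.float32)

campos = np.array(cameras[capture_id]['campos'][cam_name], dtype=np.float32)
camrot = np.array(cameras[capture_id]['camrot'][cam_name], dtype=np.float32)
fx, fy = cameras[capture_id]['focal'][cam_name]
cx, cy = cameras[capture_id]['princpt'][cam_name]

# World -> camera coordinates: cam = R @ (world - campos).
cam_coord = (camrot @ (world_coord - campos).T).T             # (J, 3)

# Perspective projection to pixel coordinates.
x = cam_coord[:, 0] / cam_coord[:, 2] * fx + cx
y = cam_coord[:, 1] / cam_coord[:, 2] * fy + cy
img_coord = np.stack([x, y], axis=1)                          # (J, 2)

# Keep only joints that were successfully annotated.
valid = joint_valid.reshape(world_coord.shape[0], -1)[:, 0] > 0
print(img_coord[valid].shape)
```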
InterHand2.6M_$DB_SPLIT_MANO_NeuralAnnot.json
|-- str (capture id)
| |-- str (frame idx):
| | |-- 'right'
| | | |-- 'pose': 48-dimensional MANO pose vector in axis-angle representation, with the mean pose subtracted.
| | | |-- 'shape': 10-dimensional MANO shape vector.
| | | |-- 'trans': 3-dimensional MANO translation vector in meters.
| | |-- 'left'
| | | |-- 'pose': 48-dimensional MANO pose vector in axis-angle representation, with the mean pose subtracted.
| | | |-- 'shape': 10-dimensional MANO shape vector.
| | | |-- 'trans': 3-dimensional MANO translation vector in meters.
The 3D MANO fits are obtained by NeuralAnnot (https://arxiv.org/abs/2011.11232).
For the MANO mesh rendering, please see https://github.com/facebookresearch/InterHand2.6M/blob/master/MANO_render/render.py
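As a rough sanity check outside of render.py, the parameters can also be fed to a third-party MANO layer. The sketch below assumes the smplx package and a local MANO model directory (both are assumptions, not part of this release); flat_hand_mean=False is used because the stored pose has the mean pose subtracted, but please verify against render.py.

```python
import json
import torch
import smplx  # pip install smplx; MANO model files must be obtained separately

# The annotation path and the MANO model directory below are assumptions for illustration.
with open('annotations/train/InterHand2.6M_train_MANO_NeuralAnnot.json') as f:
    mano_params = json.load(f)

capture_id = next(iter(mano_params))
frame_idx = next(iter(mano_params[capture_id]))
param = mano_params[capture_id][frame_idx]['right']

if param is not None:  # a hand may have no fit for a given frame
    pose = torch.tensor(param['pose'], dtype=torch.float32).view(1, -1)    # (1, 48)
    shape = torch.tensor(param['shape'], dtype=torch.float32).view(1, -1)  # (1, 10)
    trans = torch.tensor(param['trans'], dtype=torch.float32).view(1, -1)  # (1, 3)

    # flat_hand_mean=False: the layer adds the mean pose back, since the stored
    # pose has the mean pose subtracted (see the description above).
    mano_layer = smplx.create('mano_models', model_type='mano', is_rhand=True,
                              use_pca=False, flat_hand_mean=False)
    out = mano_layer(global_orient=pose[:, :3], hand_pose=pose[:, 3:],
                     betas=shape, transl=trans)
    verts = out.vertices[0].detach().numpy()  # (778, 3), in meters
    print(verts.shape)
```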
InterHand2.6M in 30 fps
- The InterHand2.6M above is downsampled to 5 fps to remove redundancy.
- We additionally release InterHand2.6M at 30 fps for video-related research.
- It has exactly the same directory structure as the 5 fps version.
Train set
* Train (H): 142,240 single hand frames / 386,270 interacting hand frames / 528,510 total frames
* Train (M): 3,420,240 single hand frames / 1,862,657 interacting hand frames / 5,282,897 total frames
* Train (H+M): 3,513,605 single hand frames / 2,202,883 interacting hand frames / 5,716,488 total frames
Validation set
* Val (M): 1,401,601 single hand frames / 874,448 interacting hand frames / 2,276,049 total frames
Test set
* Test (H): 33,672 single hand frames / 87,919 interacting hand frames / 121,591 total frames
* Test (M): 2,725,911 single hand frames / 1,629,860 interacting hand frames / 4,355,771 total frames
* Test (H+M): 2,759,583 single hand frames / 1,717,779 interacting hand frames / 4,477,362 total frames
Total set
* InterHand2.6M: 7,674,789 single hand frames / 4,795,110 interacting hand frames / 12,469,899 total frames
Download
Images
- Images (v1.0)
- Instead of manually downloading files from the above link, you can run a download script.
- To verify the downloaded files with `md5sum`, run `python verify_download.py`.
- To unzip the images, run `sh unzip.sh`.
- All image files take 365 GB.
Annotations
- Annotations (v1.0). H+M in the paper
- Annotation IDs of the human annotation (H) train set and the human annotation (H) test set
- Camera position visualization code is available here
A Baseline for 3D Interacting Hand Pose Estimation (InterNet)
- Go to GitHub
Contact
If you run into any problems, please send an e-mail to mks0601(at)gmail.com
Reference
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}