InterHand2.6M

A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image (ECCV 2020)

InterHand2.6M dataset

The demo videos above have low-quality frames because of the compression applied for the README upload.

News

Introduction

Train set
* Train (H): 142,231 single hand frames / 386,251 interacting hand frames / 528,482 total frames
* Train (M): 594,189 single hand frames / 314,848 interacting hand frames / 909,037 total frames
* Train (H+M): 687,548 single hand frames / 673,514 interacting hand frames / 1,361,062 total frames

Validation set
* Val (M): 234,183 single hand frames / 145,942 interacting hand frames / 380,125 total frames

Test set
* Test (H): 33,665 single hand frames / 87,908 interacting hand frames / 121,573 total frames
* Test (M): 455,303 single hand frames / 272,284 interacting hand frames / 727,587 total frames
* Test (H+M): 488,968 single hand frames / 360,192 interacting hand frames / 849,160 total frames

Total set
* InterHand2.6M: 1,410,699 single hand frames / 1,179,648 interacting hand frames / 2,590,347 total frames

Download

Images

Annotations

Directory

The directory structure of ${ROOT} is described below.

${ROOT}
|-- images
|   |-- train
|   |   |-- Capture0 ~ Capture26
|   |-- val
|   |   |-- Capture0
|   |-- test
|   |   |-- Capture0 ~ Capture7
|-- annotations
|   |-- skeleton.txt
|   |-- subject.txt
|   |-- train
|   |-- val
|   |-- test

Annotation files

There are four .json files for each data split.

InterHand2.6M_$DB_SPLIT_data.json: dict
|-- 'images': [image]
|-- 'annotations': [annotation]

image: dict
|-- 'id': int (image id)
|-- 'file_name': str (image file name)
|-- 'width': int (image width)
|-- 'height': int (image height)
|-- 'capture': int (capture id)
|-- 'subject': int (subject id)
|-- 'seq_name': str (sequence name)
|-- 'camera': str (camera name)
|-- 'frame_idx': int (frame index)

annotation: dict
|-- 'id': int (annotation id)
|-- 'image_id': int (corresponding image id)
|-- 'bbox': list (bounding box coordinates. [xmin, ymin, width, height])
|-- 'joint_valid': list (can this annotation be used for hand pose estimation training and evaluation? 1 if a joint is annotated and inside the image, 0 otherwise. This is based on the 2D observation from the image.)
|-- 'hand_type': str (one of 'right', 'left', and 'interacting')
|-- 'hand_type_valid': int (can this annotation be used for handedness estimation training and evaluation? 1 if hand_type in ('right', 'left'), or if hand_type == 'interacting' and np.sum(joint_valid) > 30; 0 otherwise. This is based on the 2D observation from the image.)
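
As a quick illustration, below is a minimal sketch of loading a data file and pairing images with their annotations. The file path and split name are placeholders, and the image-path helper assumes that 'file_name' is relative to the split directory, as in the official dataloader.

```python
import json
import os.path as osp
from collections import defaultdict

split = 'train'  # placeholder; one of 'train', 'val', 'test'
with open(f'annotations/{split}/InterHand2.6M_{split}_data.json') as f:
    db = json.load(f)

# Index images by id and group annotations by their image id.
images = {img['id']: img for img in db['images']}
anns_by_image = defaultdict(list)
for ann in db['annotations']:
    anns_by_image[ann['image_id']].append(ann)

def image_path(root, img):
    # Assumes 'file_name' is relative to the split directory.
    return osp.join(root, 'images', split, img['file_name'])

# Example: keep only annotations usable for handedness estimation.
valid_anns = [a for a in db['annotations'] if a['hand_type_valid'] == 1]
print(f'{len(valid_anns)} annotations valid for handedness estimation')
```
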
InterHand2.6M_$DB_SPLIT_camera.json
|-- str (capture id)
|   |-- 'campos'
|   |   |-- str (camera name): [x,y,z] (camera position)
|   |-- 'camrot'
|   |   |-- str (camera name): 3x3 list (camera rotation matrix)
|   |-- 'focal'
|   |   |-- str (camera name): [focal_x, focal_y] (focal length of x and y axis)
|   |-- 'princpt'
|   |   |-- str (camera name): [princpt_x, princpt_y] (principal point of x and y axis)
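
The camera parameters above are sufficient to map world coordinates onto the image plane: a rigid transform into the camera frame followed by a pinhole projection. A minimal sketch follows; the function names are ours, and we assume (as in the official dataloader) that campos is the camera center in world coordinates and camrot rotates world axes into camera axes.

```python
import numpy as np

def world2cam(world_coord, camrot, campos):
    # world_coord: (J, 3) in mm; camrot: (3, 3); campos: (3,) camera center (mm).
    return (np.asarray(camrot) @ (np.asarray(world_coord) - np.asarray(campos)).T).T

def cam2pixel(cam_coord, focal, princpt):
    # Pinhole projection: u = fx * X/Z + cx, v = fy * Y/Z + cy.
    u = cam_coord[:, 0] / cam_coord[:, 2] * focal[0] + princpt[0]
    v = cam_coord[:, 1] / cam_coord[:, 2] * focal[1] + princpt[1]
    return np.stack([u, v], axis=1)
```
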
InterHand2.6M_$DB_SPLIT_joint_3d.json
|-- str (capture id)
|   |-- str (frame idx): 
|   |   |-- 'world_coord': Jx3 list (3D joint coordinates in the world coordinate system. Unit: millimeter.)
|   |   |-- 'joint_valid': Jx3 list (1 if a joint is successfully annotated, 0 otherwise. Unlike 'joint_valid' of InterHand2.6M_$DB_SPLIT_data.json, it does not consider whether the joint is truncated in the image space.)
|   |   |-- 'hand_type': str (one of 'right', 'left', and 'interacting'. taken from sequence names)
|   |   |-- 'seq': str (sequence name)
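
Putting the camera and joint files together, projecting the world-space joints of one frame into a given camera could look like the sketch below (capture id, frame index, and camera name are placeholder values; world2cam and cam2pixel are from the sketch above).

```python
import json
import numpy as np

split = 'train'  # placeholder
with open(f'annotations/{split}/InterHand2.6M_{split}_camera.json') as f:
    cameras = json.load(f)
with open(f'annotations/{split}/InterHand2.6M_{split}_joint_3d.json') as f:
    joints = json.load(f)

capture_id, frame_idx, cam = '0', '12345', '400002'  # placeholder ids
world_coord = np.array(joints[capture_id][frame_idx]['world_coord'])  # (J, 3), mm

campos = np.array(cameras[capture_id]['campos'][cam])
camrot = np.array(cameras[capture_id]['camrot'][cam])
focal = cameras[capture_id]['focal'][cam]
princpt = cameras[capture_id]['princpt'][cam]

cam_coord = world2cam(world_coord, camrot, campos)   # (J, 3), camera frame
pixel_coord = cam2pixel(cam_coord, focal, princpt)   # (J, 2), image plane
```
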
InterHand2.6M_$DB_SPLIT_MANO_NeuralAnnot.json
|-- str (capture id)
|   |-- str (frame idx): 
|   |   |-- 'right'
|   |   |   |-- 'pose': 48-dimensional MANO pose vector in the axis-angle representation, with the mean pose subtracted.
|   |   |   |-- 'shape': 10-dimensional MANO shape vector.
|   |   |   |-- 'trans': 3-dimensional MANO translation vector in meters.
|   |   |-- 'left'
|   |   |   |-- 'pose': 48-dimensional MANO pose vector in the axis-angle representation, with the mean pose subtracted.
|   |   |   |-- 'shape': 10-dimensional MANO shape vector.
|   |   |   |-- 'trans': 3-dimensional MANO translation vector in meters.
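
A mesh can be recovered from these parameters with a MANO layer; below is a minimal sketch using smplx. The model path, capture id, and frame index are placeholders. Because the stored pose has the mean pose subtracted, the layer is created with flat_hand_mean=False so that the mean is added back, which matches the rendering script linked below.

```python
import json
import torch
import smplx

# 'path/to/mano' must contain the official MANO model files (placeholder path).
mano_layer = {
    'right': smplx.create('path/to/mano', 'mano', is_rhand=True,
                          use_pca=False, flat_hand_mean=False),
    'left':  smplx.create('path/to/mano', 'mano', is_rhand=False,
                          use_pca=False, flat_hand_mean=False),
}

with open('annotations/train/InterHand2.6M_train_MANO_NeuralAnnot.json') as f:
    mano_params = json.load(f)

# Placeholder capture id / frame idx; a hand's entry may be null when no fit exists.
param = mano_params['0']['12345']['right']
pose = torch.FloatTensor(param['pose']).view(1, -1)    # (1, 48), axis-angle
shape = torch.FloatTensor(param['shape']).view(1, -1)  # (1, 10)
trans = torch.FloatTensor(param['trans']).view(1, 3)   # meters

out = mano_layer['right'](global_orient=pose[:, :3], hand_pose=pose[:, 3:],
                          betas=shape, transl=trans)
vertices = out.vertices[0].detach().numpy()  # (778, 3), meters
```
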

The 3D MANO fits are obtained by NeuralAnnot (https://arxiv.org/abs/2011.11232).
For the MANO mesh rendering, please see https://github.com/facebookresearch/InterHand2.6M/blob/master/MANO_render/render.py

InterHand2.6M in 30 fps

Train set
* Train (H): 142,240 single hand frames / 386,270 interacting hand frames / 528,510 total frames
* Train (M): 3,420,240 single hand frames / 1,862,657 interacting hand frames / 5,282,897 total frames
* Train (H+M): 3,513,605 single hand frames / 2,202,883 interacting hand frames / 5,716,488 total frames

Validation set
* Val (M): 1,401,601 single hand frames / 874,448 interacting hand frames / 2,276,049 total frames

Test set
* Test (H): 33,672 single hand frames / 87,919 interacting hand frames / 121,591 total frames
* Test (M): 2,725,911 single hand frames / 1,629,860 interacting hand frames / 4,355,771 total frames
* Test (H+M): 2,759,583 single hand frames / 1,717,779 interacting hand frames / 4,477,362 total frames

Total set
* InterHand2.6M: 7,674,789 single hand frames / 4,795,110 interacting hand frames / 12,469,899 total frames

Download

Images

Annotations

A Baseline for 3D Interacting Hand Pose Estimation (InterNet)

Contact

If you run into any problems, please send an e-mail to mks0601(at)gmail.com.

Reference

@InProceedings{Moon_2020_ECCV_InterHand2.6M,
  author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
  title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}