Re:InterHand Dataset
- This is the official release of the Re:InterHand dataset: A Dataset of Relighted 3D Interacting Hands (NeurIPS 2023 Datasets and Benchmarks Track).
- Our Re:InterHand dataset provides images with realistic and diverse appearances along with accurate ground-truth (GT) 3D interacting hands.
- Here is a summary video.
News
- 2023.10.27. Website open! Welcome :)
Visualizations
Mugsy_cameras / envmap_per_segment
Ego_cameras / envmap_per_segment
The videos above have low-quality images due to compression.
Download
- Place any of the below scripts that you want to run in a directory (`${ROOT}`) where you want to download the Re:InterHand dataset, like below.

${ROOT}
|-- download_checksum_framelist.py
|-- download_mano_fits.py
|-- download_orig_fits.py
|-- download_mugsy_per_frame.py
|-- download_mugsy_per_segment.py
|-- download_ego_per_frame.py
|-- download_ego_per_segment.py
|-- verify_download.py

- Run a script, for example, `python download_checksum_framelist.py`.
Necessary for all settings
- CHECKSUM and frame list: The `CHECKSUM` file has the md5sum of all files, and `frame_list.txt` has the names of segments with frame indices. `frame_list_orig.txt` has frame indices of raw captures without filtering out invalid frames. All assets of this dataset follow `frame_list.txt`.
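The provided `verify_download.py` performs the integrity check for you; purely as an illustration, here is a minimal sketch of md5 verification. It assumes `CHECKSUM` uses the standard `md5sum` line format (`<digest>  <relative path>`), which is an assumption about the file layout, not something confirmed by the dataset docs:

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large assets never load into memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(checksum_file):
    # Yield (relative path, ok) for every entry listed in the CHECKSUM file,
    # resolving paths relative to the directory holding CHECKSUM.
    root = Path(checksum_file).parent
    with open(checksum_file) as f:
        for line in f:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip()
            target = root / name
            yield name, target.exists() and md5_of(target) == expected
```

For real downloads, prefer the official `verify_download.py`; this sketch only shows the shape of the check.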
3D GT
- MANO fits: MANO parameters and their corresponding meshes (in millimeters), fitted to the below Original fits. 17 GB after unzipping.
- Original fits: 3D keypoints and meshes (in millimeters), used to render our relighted images. Keypoint names are here, where '4' means fingertip and '1' means finger root. 126 GB after unzipping.
- Original keypoints: 3D keypoints (in millimeters) from the 'Capture' stage of Fig. 4 of the paper, estimated with V2V-PoseNet. Keypoint names are ('R_Thumb_4', 'R_Thumb_3', 'R_Thumb_2', 'R_Thumb_1', 'R_Index_4', 'R_Index_3', 'R_Index_2', 'R_Index_1', 'R_Middle_4', 'R_Middle_3', 'R_Middle_2', 'R_Middle_1', 'R_Ring_4', 'R_Ring_3', 'R_Ring_2', 'R_Ring_1', 'R_Pinky_4', 'R_Pinky_3', 'R_Pinky_2', 'R_Pinky_1', 'R_Wrist', 'L_Thumb_4', 'L_Thumb_3', 'L_Thumb_2', 'L_Thumb_1', 'L_Index_4', 'L_Index_3', 'L_Index_2', 'L_Index_1', 'L_Middle_4', 'L_Middle_3', 'L_Middle_2', 'L_Middle_1', 'L_Ring_4', 'L_Ring_3', 'L_Ring_2', 'L_Ring_1', 'L_Pinky_4', 'L_Pinky_3', 'L_Pinky_2', 'L_Pinky_1', 'L_Wrist'), where '4' means fingertip and '1' means finger root. 4.7 GB after unzipping.
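The 42-name tuple above is regular enough to generate programmatically. A small helper sketch (names reproduce the list above; treating the keypoints as rows of a 42-entry array is our assumption) that also builds a name-to-index map:

```python
# For each hand (right first, then left), each finger is listed from joint 4
# (fingertip) down to joint 1 (finger root), followed by the wrist.
FINGERS = ("Thumb", "Index", "Middle", "Ring", "Pinky")

def orig_keypoint_names():
    names = []
    for side in ("R", "L"):
        for finger in FINGERS:
            names += [f"{side}_{finger}_{j}" for j in (4, 3, 2, 1)]
        names.append(f"{side}_Wrist")
    return names

# Map from keypoint name to its position in the 42-entry keypoint list.
KEYPOINT_INDEX = {name: i for i, name in enumerate(orig_keypoint_names())}
```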
3rd-person viewpoints (Mugsy_cameras) / Frame-based split (envmap_per_frame)
- Different envmap for each frame.
- A frame seen from different viewpoints is rendered in a multi-view-consistent way by sharing the same envmap.
- Images/masks are rendered from 20 cameras at 5 fps: 492580 images in total and 392 GB after unzipping.
- Images, masks, and camera parameters
3rd-person viewpoints (Mugsy_cameras) / Video-based split (envmap_per_segment)
- Different envmap for each segment.
- A frame seen from different viewpoints is rendered in a multi-view-consistent way by sharing the same envmap.
- Images/masks are rendered from 5 cameras at 30 fps: 738725 images in total and 559 GB after unzipping.
- Images, masks, and camera parameters
Egocentric viewpoints (Ego_cameras) / Frame-based split (envmap_per_frame)
- Different envmap and different randomized camera for each frame.
- Images/masks are rendered at 30 fps: 147745 images in total and 217 GB after unzipping.
- Images, masks, and camera parameters
Egocentric viewpoints (Ego_cameras) / Video-based split (envmap_per_segment)
- Different envmap and different randomized camera for each segment.
- Images/masks are rendered at 30 fps: 147745 images in total and 217 GB after unzipping.
- Images, masks, and camera parameters
Verify downloaded files
- After downloading any of the above files, run the verification script (`python verify_download.py`) to check that your downloaded files are valid.
- Verify CHECKSUM
Capture specifications (training/testing split and # of images)
- Each entry below has the following format: $CAPTURE_ID: split, # of imgs in Mugsy_cameras/envmap_per_frame, # of imgs in Mugsy_cameras/envmap_per_segment, # of imgs in Ego_cameras/envmap_per_frame, # of imgs in Ego_cameras/envmap_per_segment
- m--20210701--1058--0000000--pilot--relightablehandsy--participant0--two-hands: train, 67220, 100815, 20163, 20163
- m--20220628--1327--BKS383--pilot--ProjectGoliath--ContinuousHandsy--two-hands: train, 42080, 63100, 12620, 12620
- m--20221007--1215--HIR112--pilot--ProjectGoliathScript--Hands--two-hands: train, 81820, 122720, 24544, 24544
- m--20221110--1033--TQH976--pilot--ProjectGoliathScript--Hands--two-hands: train, 35640, 53455, 10691, 10691
- m--20221111--0944--JFQ550--pilot--ProjectGoliathScript--Hands--two-hands: train, 35620, 53410, 10682, 10682
- m--20230313--1433--TXB805--pilot--ProjectGoliath--Hands--two-hands: train, 52020, 78025, 15605, 15605
- m--20230317--1433--TRO760--pilot--ProjectGoliath--Hands--two-hands: train, 51540, 77290, 15458, 15458
- m--20221215--0949--RNS217--pilot--ProjectGoliathScript--Hands--two-hands: test, 35120, 52655, 10531, 10531
- m--20221216--0953--NKC880--pilot--ProjectGoliathScript--Hands--two-hands: test, 38840, 58235, 11647, 11647
- m--20230317--1130--QZX685--pilot--ProjectGoliath--Hands--two-hands: test, 52680, 79020, 15804, 15804
- Total: 1526795 images (all image counts treat images captured at the same time from different viewpoints as different images)
- Total: 106766 3D hands with contact (5.5 times more than InterHand2.6M! 3D hands captured at the same time and seen from different viewpoints are counted as the same 3D hands)
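As a quick sanity check, the per-capture counts above do reproduce the per-setting totals and the 1526795 grand total:

```python
# Rows follow the capture order listed above; columns are
# (Mugsy/envmap_per_frame, Mugsy/envmap_per_segment,
#  Ego/envmap_per_frame, Ego/envmap_per_segment).
counts = [
    (67220, 100815, 20163, 20163),
    (42080,  63100, 12620, 12620),
    (81820, 122720, 24544, 24544),
    (35640,  53455, 10691, 10691),
    (35620,  53410, 10682, 10682),
    (52020,  78025, 15605, 15605),
    (51540,  77290, 15458, 15458),
    (35120,  52655, 10531, 10531),
    (38840,  58235, 11647, 11647),
    (52680,  79020, 15804, 15804),
]

# Column sums give the per-setting image totals; their sum is the grand total.
column_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(column_totals)
```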
Directory
3D GT and all camera-independent things
${ROOT}
|-- $CAPTURE_ID
| |-- CHECKSUM
| |-- frame_list.txt
| |-- frame_list_orig.txt
| |-- mano_fits
| | |-- meshes
| | | |-- str(frame_idx) + '_right.ply'
| | | |-- str(frame_idx) + '_left.ply'
| | |-- params
| | | |-- str(frame_idx) + '_right.json'
| | | |-- str(frame_idx) + '_left.json'
| |-- orig_fits
| | |-- right
| | | |-- Keypoints
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Keypoints_with_forearm
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Meshes
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | | |-- Meshes_with_forearm
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | |-- left
| | | |-- Keypoints
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Keypoints_with_forearm
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Meshes
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | | |-- Meshes_with_forearm
| | | | |-- 'skinned-%06d.ply' % frame_idx
- For each capture, all frames have the same MANO shape parameter because each capture is from a single subject.
- In `orig_fits`, we provide 3D geometry with (`_with_forearm`) and without (no suffix) the forearm. All images and masks are rendered without the forearm (`Keypoints` and `Meshes`, not `Keypoints_with_forearm` and `Meshes_with_forearm`) because our relighting network has difficulty generalizing to new forearm poses. Instead, we provide the 3D geometry with the forearm for other research, such as 3D geometry understanding that does not involve images.
- The keypoint order of `orig_fits/Keypoints` and `orig_fits/Keypoints_with_forearm` is `['R_Wrist', 'R_Thumb_0', 'R_Thumb_1', 'R_Thumb_2', 'R_Thumb_3', 'R_Thumb_4', 'R_Index_1', 'R_Index_2', 'R_Index_3', 'R_Index_4', 'R_Middle_1', 'R_Middle_2', 'R_Middle_3', 'R_Middle_4', 'R_Ring_1', 'R_Ring_2', 'R_Ring_3', 'R_Ring_4', 'R_Pinky_0', 'R_Pinky_1', 'R_Pinky_2', 'R_Pinky_3', 'R_Pinky_4', 'R_Forearm_Stub']`, where a lower index means closer to the finger root and a higher index means closer to the fingertip.
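For programmatic indexing, the order above can be written down as a constant together with a name-to-index map (a small helper sketch; the names reproduce the list above, and the `fingertip_indices` helper is our own convenience function):

```python
# Keypoint order of orig_fits/Keypoints, copied from the list above.
ORIG_FITS_KEYPOINTS = [
    'R_Wrist',
    'R_Thumb_0', 'R_Thumb_1', 'R_Thumb_2', 'R_Thumb_3', 'R_Thumb_4',
    'R_Index_1', 'R_Index_2', 'R_Index_3', 'R_Index_4',
    'R_Middle_1', 'R_Middle_2', 'R_Middle_3', 'R_Middle_4',
    'R_Ring_1', 'R_Ring_2', 'R_Ring_3', 'R_Ring_4',
    'R_Pinky_0', 'R_Pinky_1', 'R_Pinky_2', 'R_Pinky_3', 'R_Pinky_4',
    'R_Forearm_Stub',
]
# Map from joint name to its row in a keypoint array stored in this order.
ORIG_FITS_INDEX = {name: i for i, name in enumerate(ORIG_FITS_KEYPOINTS)}

def fingertip_indices():
    # Fingertips carry the highest joint index ('_4') in this ordering.
    return [i for i, n in enumerate(ORIG_FITS_KEYPOINTS) if n.endswith('_4')]
```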
Images, masks, and camera parameters
${ROOT}
|-- $CAPTURE_ID
| |-- Mugsy_cameras
| | |-- envmap_per_frame
| | | |-- images
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | |-- envmap_per_segment
| | | |-- images
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | |-- cam_params.json
| |-- Ego_cameras
| | |-- envmap_per_frame
| | | |-- images
| | | | |-- '%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '%06d.png' % frame_idx
| | | |-- cam_params
| | | | |-- '%06d.json' % frame_idx
| | | |-- truncation_ratio.json
| | |-- envmap_per_segment
| | | |-- images
| | | | |-- '%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '%06d.png' % frame_idx
| | | |-- cam_params
| | | | |-- '%06d.json' % frame_idx
| | | |-- truncation_ratio.json
- The 3D translation of the camera extrinsics is in millimeters.
- `truncation_ratio.json` is a dict whose key is `'%06d' % frame_idx` and whose value is the ratio of non-truncated joints in the image for that frame. For example, 1.0 means no joints are truncated, and 0.0 means all joints are truncated. Due to the random camera augmentation during rendering, some frames have a low truncation ratio. We recommend not using frames whose truncation ratio is smaller than 0.2.
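Following the 0.2 recommendation, frame filtering can be sketched as below (assuming only the `truncation_ratio.json` format described above; `valid_frames` is our own helper name):

```python
import json

def valid_frames(truncation_path, min_ratio=0.2):
    # Keys are '%06d' % frame_idx; values are the ratio of non-truncated
    # joints. Keep frames at or above the recommended 0.2 threshold.
    with open(truncation_path) as f:
        ratios = json.load(f)
    return sorted(int(k) for k, v in ratios.items() if v >= min_ratio)
```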
Our 3D geometry is much more accurate and stable than that of the triangulation-based approach used for the InterHand2.6M dataset.
Code and checkpoint
- MANO render. Please set `data_root_path` and `mano_root_path` of each script.
- Dataloader
- Checkpoint
Contact
If you encounter any problems, please send an e-mail to mks0601(at)gmail.com or open an issue here.
License
Re:InterHand is CC-BY-NC 4.0 licensed.
Reference
@inproceedings{moon2023reinterhand,
title = {A Dataset of Relighted {3D} Interacting Hands},
author = {Moon, Gyeongsik and Saito, Shunsuke and Xu, Weipeng and Joshi, Rohan and Buffalini, Julia and Bellan, Harley and Rosen, Nicholas and Richardson, Jesse and Mize, Mallorie and Bree, Philippe and Simon, Tomas and Peng, Bo and Garg, Shubham and McPhail, Kevyn and Shiratori, Takaaki},
booktitle = {NeurIPS Track on Datasets and Benchmarks},
year = {2023},
}