Re:InterHand Dataset
- This is the official release of the Re:InterHand dataset: A Dataset of Relighted 3D Interacting Hands (NeurIPS 2023 Datasets and Benchmarks Track).
- Our Re:InterHand dataset provides images with realistic and diverse appearances along with accurate ground-truth (GT) 3D interacting hands.
- Here is a summary video.
News
- 2023.10.27. Website open! Welcome :)
Visualizations
Mugsy_cameras / envmap_per_segment
Ego_cameras / envmap_per_segment
The videos above have low-quality images due to compression.
Download
- Place any of the below scripts that you want to run in a directory (`${ROOT}`) where you want to download the Re:InterHand dataset, like below.

${ROOT}
|-- download_checksum_framelist.py
|-- download_mano_fits.py
|-- download_orig_fits.py
|-- download_mugsy_per_frame.py
|-- download_mugsy_per_segment.py
|-- download_ego_per_frame.py
|-- download_ego_per_segment.py
|-- verify_download.py

- Run a script, for example, `python download_checksum_framelist.py`.
Necessary for all settings
- CHECKSUM and frame list: The `CHECKSUM` file has the md5sum of all files, and `frame_list.txt` has the names of segments with frame indices. `frame_list_orig.txt` has frame indices of raw captures without filtering out invalid frames. All assets of this dataset follow `frame_list.txt`.
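The provided `verify_download.py` performs the integrity check for you; purely as an illustration, here is a minimal sketch of md5 verification. It assumes `CHECKSUM` uses the standard `md5sum` line format (`<digest>  <relative path>`), which is an assumption about the file layout, not something confirmed by the dataset docs:

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large assets never load into memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(checksum_file):
    # Yield (relative path, ok) for every entry listed in the CHECKSUM file,
    # resolving paths relative to the directory holding CHECKSUM.
    root = Path(checksum_file).parent
    with open(checksum_file) as f:
        for line in f:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip()
            target = root / name
            yield name, target.exists() and md5_of(target) == expected
```

For real downloads, prefer the official `verify_download.py`; this sketch only shows the shape of the check.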
3D GT
- MANO fits: MANO parameters and their corresponding meshes (in millimeters), fitted to the below Original fits. 17 GB after unzipping.
- Original fits: 3D keypoints and meshes (in millimeters), used to render our relighted images. Keypoint names are here, where '4' means fingertip and '1' means finger root. 126 GB after unzipping.
- Original keypoints: 3D keypoints (in millimeters) from the 'Capture' stage of Fig. 4 of the paper, estimated with V2V-PoseNet. Keypoint names are ('R_Thumb_4', 'R_Thumb_3', 'R_Thumb_2', 'R_Thumb_1', 'R_Index_4', 'R_Index_3', 'R_Index_2', 'R_Index_1', 'R_Middle_4', 'R_Middle_3', 'R_Middle_2', 'R_Middle_1', 'R_Ring_4', 'R_Ring_3', 'R_Ring_2', 'R_Ring_1', 'R_Pinky_4', 'R_Pinky_3', 'R_Pinky_2', 'R_Pinky_1', 'R_Wrist', 'L_Thumb_4', 'L_Thumb_3', 'L_Thumb_2', 'L_Thumb_1', 'L_Index_4', 'L_Index_3', 'L_Index_2', 'L_Index_1', 'L_Middle_4', 'L_Middle_3', 'L_Middle_2', 'L_Middle_1', 'L_Ring_4', 'L_Ring_3', 'L_Ring_2', 'L_Ring_1', 'L_Pinky_4', 'L_Pinky_3', 'L_Pinky_2', 'L_Pinky_1', 'L_Wrist'), where '4' means fingertip and '1' means finger root. 4.7 GB after unzipping.
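The 42-name tuple above is regular enough to generate programmatically. A small helper sketch (names reproduce the list above; treating the keypoints as rows of a 42-entry array is our assumption) that also builds a name-to-index map:

```python
# For each hand (right first, then left), each finger is listed from joint 4
# (fingertip) down to joint 1 (finger root), followed by the wrist.
FINGERS = ("Thumb", "Index", "Middle", "Ring", "Pinky")

def orig_keypoint_names():
    names = []
    for side in ("R", "L"):
        for finger in FINGERS:
            names += [f"{side}_{finger}_{j}" for j in (4, 3, 2, 1)]
        names.append(f"{side}_Wrist")
    return names

# Map from keypoint name to its position in the 42-entry keypoint list.
KEYPOINT_INDEX = {name: i for i, name in enumerate(orig_keypoint_names())}
```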
3rd-person viewpoints (Mugsy_cameras) / Frame-based split (envmap_per_frame)
- Different envmap for each frame.
- A frame seen from different viewpoints is rendered in a multi-view-consistent way by sharing the same envmap.
- Images/masks are rendered from 20 cameras at 5 fps: 492580 images in total and 392 GB after unzipping.
- Images, masks, and camera parameters
3rd-person viewpoints (Mugsy_cameras) / Video-based split (envmap_per_segment)
- Different envmap for each segment.
- A frame seen from different viewpoints is rendered in a multi-view-consistent way by sharing the same envmap.
- Images/masks are rendered from 5 cameras at 30 fps: 738725 images in total and 559 GB after unzipping.
- Images, masks, and camera parameters
Egocentric viewpoints (Ego_cameras) / Frame-based split (envmap_per_frame)
- Different envmap and different randomized camera for each frame.
- Images/masks are rendered at 30 fps: 147745 images in total and 217 GB after unzipping.
- Images, masks, and camera parameters
Egocentric viewpoints (Ego_cameras) / Video-based split (envmap_per_segment)
- Different envmap and different randomized camera for each segment.
- Images/masks are rendered at 30 fps: 147745 images in total and 217 GB after unzipping.
- Images, masks, and camera parameters
Verify downloaded files
- After downloading any of the above files, run the verification script (`python verify_download.py`) to check that your downloaded files are valid.
- Verify CHECKSUM
Capture specifications (training/testing split and # of images)
- Each entry below has the following format: $CAPTURE_ID: split, # of imgs in Mugsy_cameras/envmap_per_frame, # of imgs in Mugsy_cameras/envmap_per_segment, # of imgs in Ego_cameras/envmap_per_frame, # of imgs in Ego_cameras/envmap_per_segment
- m--20210701--1058--0000000--pilot--relightablehandsy--participant0--two-hands: train, 67220, 100815, 20163, 20163
- m--20220628--1327--BKS383--pilot--ProjectGoliath--ContinuousHandsy--two-hands: train, 42080, 63100, 12620, 12620
- m--20221007--1215--HIR112--pilot--ProjectGoliathScript--Hands--two-hands: train, 81820, 122720, 24544, 24544
- m--20221110--1033--TQH976--pilot--ProjectGoliathScript--Hands--two-hands: train, 35640, 53455, 10691, 10691
- m--20221111--0944--JFQ550--pilot--ProjectGoliathScript--Hands--two-hands: train, 35620, 53410, 10682, 10682
- m--20230313--1433--TXB805--pilot--ProjectGoliath--Hands--two-hands: train, 52020, 78025, 15605, 15605
- m--20230317--1433--TRO760--pilot--ProjectGoliath--Hands--two-hands: train, 51540, 77290, 15458, 15458
- m--20221215--0949--RNS217--pilot--ProjectGoliathScript--Hands--two-hands: test, 35120, 52655, 10531, 10531
- m--20221216--0953--NKC880--pilot--ProjectGoliathScript--Hands--two-hands: test, 38840, 58235, 11647, 11647
- m--20230317--1130--QZX685--pilot--ProjectGoliath--Hands--two-hands: test, 52680, 79020, 15804, 15804
- Total: 1526795 images (all image counts treat images captured at the same time from different viewpoints as different images)
- Total: 106766 3D hands with contact (5.5 times more than InterHand2.6M! 3D hands captured at the same time and seen from different viewpoints are counted as the same 3D hands)
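As a quick sanity check, the per-capture counts above do reproduce the per-setting totals and the 1526795 grand total:

```python
# Rows follow the capture order listed above; columns are
# (Mugsy/envmap_per_frame, Mugsy/envmap_per_segment,
#  Ego/envmap_per_frame, Ego/envmap_per_segment).
counts = [
    (67220, 100815, 20163, 20163),
    (42080,  63100, 12620, 12620),
    (81820, 122720, 24544, 24544),
    (35640,  53455, 10691, 10691),
    (35620,  53410, 10682, 10682),
    (52020,  78025, 15605, 15605),
    (51540,  77290, 15458, 15458),
    (35120,  52655, 10531, 10531),
    (38840,  58235, 11647, 11647),
    (52680,  79020, 15804, 15804),
]

# Column sums give the per-setting image totals; their sum is the grand total.
column_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(column_totals)
```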
Directory
3D GT and all camera-independent things
${ROOT}
|-- $CAPTURE_ID
| |-- CHECKSUM
| |-- frame_list.txt
| |-- frame_list_orig.txt
| |-- mano_fits
| | |-- meshes
| | | |-- str(frame_idx) + '_right.ply'
| | | |-- str(frame_idx) + '_left.ply'
| | |-- params
| | | |-- str(frame_idx) + '_right.json'
| | | |-- str(frame_idx) + '_left.json'
| |-- orig_fits
| | |-- right
| | | |-- Keypoints
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Keypoints_with_forearm
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Meshes
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | | |-- Meshes_with_forearm
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | |-- left
| | | |-- Keypoints
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Keypoints_with_forearm
| | | | |-- 'keypoint-%06d.json' % frame_idx
| | | |-- Meshes
| | | | |-- 'skinned-%06d.ply' % frame_idx
| | | |-- Meshes_with_forearm
| | | | |-- 'skinned-%06d.ply' % frame_idx
- For each capture, all frames have the same MANO shape parameter because each capture is from a single subject.
- In `orig_fits`, we provide 3D geometry with (`_with_forearm`) and without (no suffix) the forearm. All images and masks are rendered without the forearm (`Keypoints` and `Meshes`, not `Keypoints_with_forearm` and `Meshes_with_forearm`) because our relighting network has difficulty generalizing to new forearm poses. Instead, we provide the 3D geometry with the forearm for other research, such as 3D geometry understanding that does not involve images.
- The keypoint order of `orig_fits/Keypoints` and `orig_fits/Keypoints_with_forearm` is `['R_Wrist', 'R_Thumb_0', 'R_Thumb_1', 'R_Thumb_2', 'R_Thumb_3', 'R_Thumb_4', 'R_Index_1', 'R_Index_2', 'R_Index_3', 'R_Index_4', 'R_Middle_1', 'R_Middle_2', 'R_Middle_3', 'R_Middle_4', 'R_Ring_1', 'R_Ring_2', 'R_Ring_3', 'R_Ring_4', 'R_Pinky_0', 'R_Pinky_1', 'R_Pinky_2', 'R_Pinky_3', 'R_Pinky_4', 'R_Forearm_Stub']`, where a lower index means closer to the finger root and a higher index means closer to the fingertip.
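For programmatic indexing, the order above can be written down as a constant together with a name-to-index map (a small helper sketch; the names reproduce the list above, and the `fingertip_indices` helper is our own convenience function):

```python
# Keypoint order of orig_fits/Keypoints, copied from the list above.
ORIG_FITS_KEYPOINTS = [
    'R_Wrist',
    'R_Thumb_0', 'R_Thumb_1', 'R_Thumb_2', 'R_Thumb_3', 'R_Thumb_4',
    'R_Index_1', 'R_Index_2', 'R_Index_3', 'R_Index_4',
    'R_Middle_1', 'R_Middle_2', 'R_Middle_3', 'R_Middle_4',
    'R_Ring_1', 'R_Ring_2', 'R_Ring_3', 'R_Ring_4',
    'R_Pinky_0', 'R_Pinky_1', 'R_Pinky_2', 'R_Pinky_3', 'R_Pinky_4',
    'R_Forearm_Stub',
]
# Map from joint name to its row in a keypoint array stored in this order.
ORIG_FITS_INDEX = {name: i for i, name in enumerate(ORIG_FITS_KEYPOINTS)}

def fingertip_indices():
    # Fingertips carry the highest joint index ('_4') in this ordering.
    return [i for i, n in enumerate(ORIG_FITS_KEYPOINTS) if n.endswith('_4')]
```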
Images, masks, and camera parameters
${ROOT}
|-- $CAPTURE_ID
| |-- Mugsy_cameras
| | |-- envmap_per_frame
| | | |-- images
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | |-- envmap_per_segment
| | | |-- images
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '$CAM_NAME/%06d.png' % frame_idx
| | |-- cam_params.json
| |-- Ego_cameras
| | |-- envmap_per_frame
| | | |-- images
| | | | |-- '%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '%06d.png' % frame_idx
| | | |-- cam_params
| | | | |-- '%06d.json' % frame_idx
| | | |-- truncation_ratio.json
| | |-- envmap_per_segment
| | | |-- images
| | | | |-- '%06d.png' % frame_idx
| | | |-- masks
| | | | |-- '%06d.png' % frame_idx
| | | |-- cam_params
| | | | |-- '%06d.json' % frame_idx
| | | |-- truncation_ratio.json
- The 3D translation of the camera extrinsics is in millimeters.
- `truncation_ratio.json` is a dict whose key is `'%06d' % frame_idx` and whose value is the ratio of non-truncated joints in the image for that frame. For example, 1.0 means no joints are truncated, and 0.0 means all joints are truncated. Due to the random camera augmentation during rendering, some frames have a low truncation ratio. We recommend not using frames whose truncation ratio is smaller than 0.2.
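Following the 0.2 recommendation, frame filtering can be sketched as below (assuming only the `truncation_ratio.json` format described above; `valid_frames` is our own helper name):

```python
import json

def valid_frames(truncation_path, min_ratio=0.2):
    # Keys are '%06d' % frame_idx; values are the ratio of non-truncated
    # joints. Keep frames at or above the recommended 0.2 threshold.
    with open(truncation_path) as f:
        ratios = json.load(f)
    return sorted(int(k) for k, v in ratios.items() if v >= min_ratio)
```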
Our 3D geometry is much more accurate and stable than that of the triangulation-based approach used for the InterHand2.6M dataset.
Code and checkpoint
- MANO render. Please set `data_root_path` and `mano_root_path` of each script.
- Dataloader
- Checkpoint
Contact
If you encounter any problems, please send an e-mail to mks0601(at)gmail.com or open an issue here.
License
Re:InterHand is CC-BY-NC 4.0 licensed.
Reference
@inproceedings{moon2023reinterhand,
title = {A Dataset of Relighted {3D} Interacting Hands},
author = {Moon, Gyeongsik and Saito, Shunsuke and Xu, Weipeng and Joshi, Rohan and Buffalini, Julia and Bellan, Harley and Rosen, Nicholas and Richardson, Jesse and Mize, Mallorie and Bree, Philippe and Simon, Tomas and Peng, Bo and Garg, Shubham and McPhail, Kevyn and Shiratori, Takaaki},
booktitle = {NeurIPS Track on Datasets and Benchmarks},
year = {2023},
}