High-Precision Hands in Whole-Body Framework: A modular 3D whole-body pose estimation framework that significantly surpasses previous methods in hand accuracy.
Single-Image Inference: Efficiently recovers the full-body mesh and pose from a single RGB image, with no complex multi-view setup required.
SMPL-X Standard Output: Directly outputs expressive SMPL-X parameters, including body, hands, and face, ensuring compatibility with standard graphics and animation pipelines.
Plug-and-Play Modularity: Seamlessly integrates pre-trained body and hand estimators through a lightweight modulator, achieving state-of-the-art results without expensive full-body retraining.
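To make the SMPL-X output concrete, the sketch below lists the standard parameter blocks of the public SMPL-X model (axis-angle pose, 10 shape and 10 expression coefficients) that an expressive whole-body estimator produces; the dictionary name is illustrative, not from the framework's code.

```python
# Standard SMPL-X parameter blocks (axis-angle rotations per joint).
# Dimensions follow the public SMPL-X model; names are illustrative.
SMPLX_PARAM_DIMS = {
    "global_orient":   3,        # pelvis/root rotation
    "body_pose":       21 * 3,   # 21 body joints
    "left_hand_pose":  15 * 3,   # 15 joints per hand
    "right_hand_pose": 15 * 3,
    "jaw_pose":        3,
    "leye_pose":       3,
    "reye_pose":       3,
    "betas":           10,       # body shape coefficients
    "expression":      10,       # facial expression coefficients
}

# Total rotational parameters: 55 joints x 3 = 165.
pose_dims = sum(v for k, v in SMPLX_PARAM_DIMS.items()
                if "pose" in k or k == "global_orient")
```

Any pipeline that consumes SMPL-X (graphics, animation, physics) can read these blocks directly, which is what makes the output format interoperable.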
Limitations of previous works
Hand-only Estimators: Recover isolated hands well but fail during interactions due to a lack of full-body context.
Whole-body Estimators: Capture global structure but lack hand accuracy because whole-body datasets have limited hand diversity.
Naïve Combination: Simply attaching hand outputs to the body leads to implausible wrist poses, especially under occlusion, as they ignore the upper-body kinematic chain.
Hand4Whole++
Efficient Learning under Limited Supervision: Primarily trained on hand-only datasets to capture diverse and challenging hand poses, despite the absence of full-body labels.
Preserving Pre-trained Expertise: Employs foundational whole-body and hand pose estimators, keeping them frozen during training to maintain their specialized capabilities and generalization.
Lightweight Optimization: Only the CHAM module is trained to modulate whole-body features with hand-centric cues, providing a highly efficient and practical "plug-and-play" solution.
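The training scheme above can be sketched as follows. The estimators and CHAM are stubbed with plain linear layers purely for illustration (the real modules are pose-estimation networks; shapes and names here are assumptions, not the paper's code) — the point is that gradients flow only into the modulator.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the pre-trained estimators and CHAM;
# real modules are full pose networks, shapes here are arbitrary.
body_estimator = nn.Linear(256, 256)   # frozen whole-body backbone (stub)
hand_estimator = nn.Linear(256, 256)   # frozen hand backbone (stub)
cham = nn.Linear(512, 256)             # lightweight trainable modulator (stub)

# Freeze the pre-trained experts to preserve their capabilities.
for p in body_estimator.parameters():
    p.requires_grad = False
for p in hand_estimator.parameters():
    p.requires_grad = False

# Only CHAM's parameters are given to the optimizer.
optimizer = torch.optim.Adam(cham.parameters(), lr=1e-4)

x = torch.randn(2, 256)                          # stand-in image features
body_feat = body_estimator(x)
hand_feat = hand_estimator(x)
fused = cham(torch.cat([body_feat, hand_feat], dim=-1))

loss = fused.pow(2).mean()                       # placeholder loss
loss.backward()                                  # grads reach CHAM only
optimizer.step()
```

Because the frozen branches never receive gradients, hand-only datasets can supervise the modulator without degrading the whole-body model's generalization.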
Architecture
Hand4Whole++ is a modular framework that bridges the supervision gap between whole-body and hand-only estimation without retraining foundational models.
Conditional Hands Modulator (CHAM): A lightweight, trainable module that refines whole-body features by injecting informative, hand-specific cues.
Frozen pre-trained Estimators: Leverages specialized whole-body and hand pose estimators by keeping them frozen to preserve their original expertise.
Efficient Training: Only the CHAM module is trained, providing a practical, high-performance "plug-and-play" solution.
Decoupled Transfer: Hand-specific accuracy is incorporated through CHAM for wrists and upper-body poses, while finger details are transferred via rigid alignment.
Conditional Hands Modulator (CHAM)
Hand-Specific Conditioning: Injects informative hand features into the whole-body stream to refine wrist orientation and upper-body kinematics.
Spatially Aligned Modulation: Uses inverse affine transformations and zero-initialized convolutions to maintain precise spatial alignment with the global body context.
Lightweight & Efficient: Optimized for speed, adding only 10ms of latency while keeping pre-trained estimators frozen.
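A minimal sketch of the modulation idea, assuming a ControlNet-style zero-initialized convolution: hand-crop features (already warped back into the body feature grid by the inverse crop affine, assumed done upstream) enter through a 1x1 conv whose weights start at zero, so at initialization the frozen whole-body branch is left exactly unchanged. Class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class ZeroConvModulator(nn.Module):
    """Inject spatially aligned hand features into the whole-body
    feature map via a zero-initialized 1x1 conv (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.zero_conv = nn.Conv2d(channels, channels, kernel_size=1)
        # Zero init: the residual path contributes nothing at the start,
        # preserving the frozen estimator's original behavior.
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, body_feat, hand_feat_aligned):
        # hand_feat_aligned: hand-crop features mapped back onto the
        # body feature grid by the inverse affine of the hand crop.
        return body_feat + self.zero_conv(hand_feat_aligned)

mod = ZeroConvModulator(64)
body = torch.randn(1, 64, 16, 16)
hand = torch.randn(1, 64, 16, 16)
out = mod(body, hand)   # equals body exactly at initialization
```

Zero initialization is what makes the module safe to bolt onto frozen backbones: training can only gradually move the output away from the pre-trained prediction.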
Finger and shape transfer
High-Fidelity Integration: Leverages stable local finger poses from specialized estimators while discarding unstable global wrist predictions.
CHAM-Guided Orientation: Final wrist placement and orientation are determined by the body branch, whose predictions CHAM refines for global consistency.
Differentiable Rigid Alignment: Uses a differentiable transformation based on wrist and MCP joints to seamlessly align the detailed hand mesh to the body.
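One standard way to realize such a rigid alignment is the Kabsch algorithm: estimate the rotation and translation that map the hand estimator's wrist and MCP joints onto the body model's corresponding joints, then apply that transform to the detailed hand mesh. The sketch below uses NumPy for clarity (the same SVD-based solve is differentiable in autograd frameworks); the toy joint coordinates are invented for illustration.

```python
import numpy as np

def rigid_align(src_joints, dst_joints):
    """Kabsch algorithm: find R, t mapping src points (e.g. hand-branch
    wrist + MCP joints) onto dst points (body-model wrist + MCPs)."""
    src_c = src_joints - src_joints.mean(axis=0)
    dst_c = dst_joints - dst_joints.mean(axis=0)
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_joints.mean(axis=0) - R @ src_joints.mean(axis=0)
    return R, t

# Toy example: wrist + 4 MCP joints (coordinates invented), displaced by
# a known rotation + translation that the solver should recover.
src = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.],
                [0., 1., 1.], [0.5, 0.5, 0.5]])
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.],
                   [np.sin(theta),  np.cos(theta), 0.],
                   [0.,             0.,            1.]])
dst = src @ R_true.T + np.array([0.1, -0.2, 0.3])

R, t = rigid_align(src, dst)
aligned = src @ R.T + t   # hand joints snapped onto the body's frame
```

Because the solve reduces to an SVD, gradients can flow through the alignment when it is implemented in a framework with SVD autograd, which is what makes the transfer step trainable end to end.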
CHAM ablation
Comparison with Fine-tuning: While directly fine-tuning a whole-body model on hand-centric datasets improves hand accuracy, it often causes the model to overfit, leading to distorted and implausible body poses.
Preserving Generalization: Hand4Whole++ maintains the original model's robust body reasoning while significantly boosting hand precision by modulating features through the frozen backbone.
Anatomical Coherence: Unlike naive fine-tuning, our CHAM-based approach ensures that hand enhancements are kinematically consistent with the entire upper-body structure.