Hybrid representation of the 3D Gaussians and surface mesh
We propose a hybrid representation of the 3D Gaussians and surface mesh.
Our hybrid representation treats each 3D Gaussian as a vertex on the surface, where the vertices have pre-defined connectivity (i.e., triangle faces) between them following the mesh topology of SMPL-X.
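The idea above can be sketched in code: Gaussian centers are stored at mesh-vertex positions, and the triangle faces supply fixed connectivity. This is a minimal illustrative sketch, not the paper's actual parameterization; the class and attribute names are hypothetical.

```python
import numpy as np

class HybridGaussianMesh:
    """Each 3D Gaussian sits on a mesh vertex; triangle faces give fixed
    connectivity following the template topology (illustrative sketch)."""

    def __init__(self, vertices, faces):
        self.means = np.asarray(vertices, dtype=np.float32)    # Gaussian centers = vertex positions
        self.faces = np.asarray(faces, dtype=np.int64)         # triangle indices, fixed topology
        n = len(self.means)
        self.scales = np.full((n, 3), 0.01, dtype=np.float32)  # per-Gaussian scale (placeholder init)
        self.opacities = np.ones(n, dtype=np.float32)          # per-Gaussian opacity
        self.colors = np.zeros((n, 3), dtype=np.float32)       # per-Gaussian RGB

    def edges(self):
        """Unique undirected edges implied by the triangle faces."""
        e = np.concatenate([self.faces[:, [0, 1]],
                            self.faces[:, [1, 2]],
                            self.faces[:, [2, 0]]])
        e.sort(axis=1)
        return np.unique(e, axis=0)

# a single triangle as a toy "mesh"
toy = HybridGaussianMesh(vertices=[[0, 0, 0], [1, 0, 0], [0, 1, 0]],
                         faces=[[0, 1, 2]])
print(toy.edges())  # the three unique edges of the triangle
```

Because the connectivity is fixed by the template, neighborhood-based losses (e.g., Laplacian smoothing) can be applied directly to the Gaussians.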
Benefit #1: Our ExAvatar becomes fully compatible with the facial expression space of SMPL-X. Therefore, it can be driven with any facial expression code of SMPL-X even from a short monocular video without diverse facial expressions.
Benefit #2: We can significantly reduce artifacts in novel facial expressions and poses using connectivity-based regularizers, such as Laplacian regularizers and our new face loss.
Our hybrid representation allows us to apply the Laplacian regularizer, widely used for surface meshes, to the 3D Gaussians. It also lets us enforce consistency between the geometry and appearance of the face. Without it (b), when the mouth is opened, a hole appears below the lower lip while the upper lip remains unchanged.
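A uniform-weight Laplacian term over the mesh-connected Gaussians can be sketched as follows. This is a generic smoothness regularizer (each Gaussian center is penalized for deviating from the mean of its mesh neighbors), not necessarily the paper's exact formulation.

```python
import numpy as np

def laplacian_loss(verts, faces):
    """Uniform Laplacian smoothness over mesh-connected Gaussian centers.

    Generic sketch: penalizes each vertex's deviation from the mean of its
    neighbors, where neighbors come from the fixed triangle connectivity.
    """
    verts = np.asarray(verts, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    n = len(verts)
    # build undirected edges from the triangle connectivity
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    nbr_sum = np.zeros_like(verts)
    deg = np.zeros(n)
    for i, j in edges:
        nbr_sum[i] += verts[j]; nbr_sum[j] += verts[i]
        deg[i] += 1; deg[j] += 1
    delta = nbr_sum / deg[:, None] - verts  # uniform Laplacian per vertex
    return float((delta ** 2).sum(axis=1).mean())
```

When all Gaussians coincide the loss is zero; displacing any center away from its neighbors' mean increases it, which is what discourages holes from opening in the rendered face.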
Co-registration of body, hands, and face
Before training our ExAvatar, we co-register the body, hands, and face with the SMPL-X model.
To compensate for the limited expressiveness of the hands and face in SMPL-X, we introduce two additional offsets (i.e., a joint offset and a face offset).
The joint offset controls the locations of the joints in the template space and is especially effective for further adjusting the bone lengths of the hands.
The face offset is a per-vertex offset of the face region, optimized against a face-only fitting of the FLAME model.
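Applying the two offsets to the template can be sketched as below. The function name, argument shapes, and the notion of a face-vertex index set are illustrative assumptions; the actual fitting against FLAME is an optimization not shown here.

```python
import numpy as np

def apply_offsets(template_verts, template_joints,
                  joint_offset, face_offset, face_vert_ids):
    """Add the two learnable offsets to a template (illustrative sketch).

    joint_offset : per-joint 3D shift in template space, e.g. to adjust
                   finger bone lengths before skinning (assumed shape (J, 3)).
    face_offset  : per-vertex 3D shift applied only to the face-region
                   vertices (assumed shape (len(face_vert_ids), 3)).
    """
    joints = template_joints + joint_offset          # shift joint locations
    verts = template_verts.copy()
    verts[face_vert_ids] += face_offset              # deform only the face region
    return verts, joints
```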
The effectiveness of our joint offset and face offset.
Architecture
We extract per-Gaussian features from a triplane and process them with MLPs.
The regressed features are combined with a canonical mesh, which becomes a 3D animatable avatar in the canonical space.
We use the LBS algorithm for animation and 3DGS to render the avatar into screen space.
After training on a monocular video, our ExAvatar is drivable with the whole-body 3D pose θ and facial expression code ψ of SMPL-X.
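The LBS step can be sketched as a weighted blend of per-joint rigid transforms. This is a minimal sketch under simplifying assumptions: it omits the kinematic chain and rest-pose inverse transforms that real SMPL-X skinning composes along the skeleton.

```python
import numpy as np

def lbs(verts, skin_weights, joint_rotmats, joint_transls):
    """Linear blend skinning (minimal sketch).

    verts         : (V, 3) canonical vertex / Gaussian positions
    skin_weights  : (V, J) per-vertex skinning weights, rows sum to 1
    joint_rotmats : (J, 3, 3) per-joint rotation matrices
    joint_transls : (J, 3) per-joint translations
    """
    # transform canonical vertices by every joint: (J, V, 3)
    per_joint = (np.einsum('jab,vb->jva', joint_rotmats, verts)
                 + joint_transls[:, None, :])
    # blend the per-joint results with the skinning weights: (V, 3)
    return np.einsum('vj,jva->va', skin_weights, per_joint)
```

With identity rotations and zero translations the output equals the input, which is a quick sanity check that the weighted blend is normalized.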
Motion transfer from in-the-wild videos
We first obtain whole-body 3D poses and facial expressions with Hand4Whole, followed by further optimization.
Then, we drive our ExAvatar with the obtained 3D poses and facial expressions.
All the avatars are created from a casually captured monocular video.