Hybrid representation of the 3D Gaussians and surface mesh
We propose a hybrid representation of the 3D Gaussians and surface mesh.
Our hybrid representation treats each 3D Gaussian as a vertex on the surface, where the vertices have pre-defined connectivity (i.e., triangle faces) between them following the mesh topology of SMPL-X.
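The idea above can be sketched in code: Gaussian centers are stored at mesh-vertex positions, and the triangle faces supply fixed connectivity. This is a minimal illustrative sketch, not the paper's actual parameterization; the class and attribute names are hypothetical.

```python
import numpy as np

class HybridGaussianMesh:
    """Each 3D Gaussian sits on a mesh vertex; triangle faces give fixed
    connectivity following the template topology (illustrative sketch)."""

    def __init__(self, vertices, faces):
        self.means = np.asarray(vertices, dtype=np.float32)    # Gaussian centers = vertex positions
        self.faces = np.asarray(faces, dtype=np.int64)         # triangle indices, fixed topology
        n = len(self.means)
        self.scales = np.full((n, 3), 0.01, dtype=np.float32)  # per-Gaussian scale (placeholder init)
        self.opacities = np.ones(n, dtype=np.float32)          # per-Gaussian opacity
        self.colors = np.zeros((n, 3), dtype=np.float32)       # per-Gaussian RGB

    def edges(self):
        """Unique undirected edges implied by the triangle faces."""
        e = np.concatenate([self.faces[:, [0, 1]],
                            self.faces[:, [1, 2]],
                            self.faces[:, [2, 0]]])
        e.sort(axis=1)
        return np.unique(e, axis=0)

# a single triangle as a toy "mesh"
toy = HybridGaussianMesh(vertices=[[0, 0, 0], [1, 0, 0], [0, 1, 0]],
                         faces=[[0, 1, 2]])
print(toy.edges())  # the three unique edges of the triangle
```

Because the connectivity is fixed by the template, neighborhood-based losses (e.g., Laplacian smoothing) can be applied directly to the Gaussians.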
Benefit #1: Our ExAvatar becomes fully compatible with the facial expression space of SMPL-X. Therefore, it can be driven with any facial expression code of SMPL-X even from a short monocular video without diverse facial expressions.
Benefit #2: We can significantly reduce artifacts in novel facial expressions and poses using connectivity-based regularizers, such as Laplacian regularizers and our new face loss.
Our hybrid representation allows us to apply the Laplacian regularizer, widely used for surface meshes, to the 3D Gaussians. It also lets us enforce consistency between the geometry and appearance of the face. Without it (b), when the mouth is opened, a hole appears below the lower lip while the upper lip remains unchanged.
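A uniform-weight Laplacian term over the mesh-connected Gaussians can be sketched as follows. This is a generic smoothness regularizer (each Gaussian center is penalized for deviating from the mean of its mesh neighbors), not necessarily the paper's exact formulation.

```python
import numpy as np

def laplacian_loss(verts, faces):
    """Uniform Laplacian smoothness over mesh-connected Gaussian centers.

    Generic sketch: penalizes each vertex's deviation from the mean of its
    neighbors, where neighbors come from the fixed triangle connectivity.
    """
    verts = np.asarray(verts, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    n = len(verts)
    # build undirected edges from the triangle connectivity
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    nbr_sum = np.zeros_like(verts)
    deg = np.zeros(n)
    for i, j in edges:
        nbr_sum[i] += verts[j]; nbr_sum[j] += verts[i]
        deg[i] += 1; deg[j] += 1
    delta = nbr_sum / deg[:, None] - verts  # uniform Laplacian per vertex
    return float((delta ** 2).sum(axis=1).mean())
```

When all Gaussians coincide the loss is zero; displacing any center away from its neighbors' mean increases it, which is what discourages holes from opening in the rendered face.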
Co-registration of body, hands, and face
Before training our ExAvatar, we co-register the body, hands, and face with the SMPL-X model.
To compensate for the limited expressiveness of the hands and face in SMPL-X, we introduce two additional offsets (i.e., a joint offset and a face offset).
The joint offset controls the locations of the joints in the template space and is especially effective for further adjusting the bone lengths of the hands.
The face offset is a per-vertex offset of the face region, optimized against a face-only fitting of the FLAME model.
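Applying the two offsets to the template can be sketched as below. The function name, argument shapes, and the notion of a face-vertex index set are illustrative assumptions; the actual fitting against FLAME is an optimization not shown here.

```python
import numpy as np

def apply_offsets(template_verts, template_joints,
                  joint_offset, face_offset, face_vert_ids):
    """Add the two learnable offsets to a template (illustrative sketch).

    joint_offset : per-joint 3D shift in template space, e.g. to adjust
                   finger bone lengths before skinning (assumed shape (J, 3)).
    face_offset  : per-vertex 3D shift applied only to the face-region
                   vertices (assumed shape (len(face_vert_ids), 3)).
    """
    joints = template_joints + joint_offset          # shift joint locations
    verts = template_verts.copy()
    verts[face_vert_ids] += face_offset              # deform only the face region
    return verts, joints
```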
The effectiveness of our joint offset and face offset.
Architecture
We extract per-Gaussian features from a triplane and process them with MLPs.
The regressed features are combined with a canonical mesh, which becomes a 3D animatable avatar in the canonical space.
We use the LBS algorithm for animation and 3DGS to render the avatar into screen space.
After training on a monocular video, our ExAvatar is drivable with the whole-body 3D pose θ and facial expression code ψ of SMPL-X.
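The LBS step can be sketched as a weighted blend of per-joint rigid transforms. This is a minimal sketch under simplifying assumptions: it omits the kinematic chain and rest-pose inverse transforms that real SMPL-X skinning composes along the skeleton.

```python
import numpy as np

def lbs(verts, skin_weights, joint_rotmats, joint_transls):
    """Linear blend skinning (minimal sketch).

    verts         : (V, 3) canonical vertex / Gaussian positions
    skin_weights  : (V, J) per-vertex skinning weights, rows sum to 1
    joint_rotmats : (J, 3, 3) per-joint rotation matrices
    joint_transls : (J, 3) per-joint translations
    """
    # transform canonical vertices by every joint: (J, V, 3)
    per_joint = (np.einsum('jab,vb->jva', joint_rotmats, verts)
                 + joint_transls[:, None, :])
    # blend the per-joint results with the skinning weights: (V, 3)
    return np.einsum('vj,jva->va', skin_weights, per_joint)
```

With identity rotations and zero translations the output equals the input, which is a quick sanity check that the weighted blend is normalized.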
Motion transfer from in-the-wild videos
We first obtain whole-body 3D poses and facial expressions with Hand4Whole, followed by further optimization.
Then, we drive our ExAvatar with the obtained 3D poses and facial expressions.
All the avatars are created from a casually captured monocular video.