Text-Driven Recommendation and Stylization for Animating Human Meshes

European Conference on Computer Vision (ECCV) 2022



We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation. CLIP-Actor animates a 3D human mesh to conform to a text prompt by recommending a motion sequence and optimizing mesh style attributes.

Prior work fails to generate plausible results when the pose of an artist-designed mesh does not conform to the text from the beginning. Instead, we build a text-driven human motion recommendation system by leveraging a large-scale human motion dataset with language labels. Given a natural language prompt, CLIP-Actor suggests a text-conforming human motion in a coarse-to-fine manner. Our novel neural style optimization then adds geometric detail and texture to the recommended mesh sequence so that it conforms to the prompt in a temporally consistent and pose-agnostic manner. We further propose spatio-temporal view augmentation and mask-weighted embedding attention, which stabilize the optimization process by leveraging multi-frame human motion and rejecting poorly rendered views. We demonstrate that CLIP-Actor produces plausible, human-recognizable stylized 3D human meshes in motion, with detailed geometry and texture, solely from a natural language prompt.
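The two ideas named above, text-driven motion recommendation and mask-weighted aggregation of view embeddings, can be illustrated with a minimal sketch. It is not the CLIP-Actor implementation. It assumes that text and image embeddings are already available as plain vectors, that the coarse recommendation step is a cosine-similarity ranking over motion-label embeddings, and that each rendered view is weighted by the fraction of its pixels covered by the mesh mask. The function names `recommend_motion` and `mask_weighted_embedding` and all toy values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend_motion(prompt_emb, label_embs):
    """Return the index of the motion label most similar to the prompt.

    Assumption: the coarse step is a nearest-neighbor search in a shared
    text embedding space; the actual system may refine this further.
    """
    scores = [cosine(prompt_emb, emb) for emb in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def mask_weighted_embedding(view_embs, mask_coverage):
    """Average view embeddings, weighting each by its mask coverage.

    Assumption: a view in which the mesh covers few pixels is a poor
    rendering, so it contributes proportionally less to the average.
    """
    total = sum(mask_coverage)
    if total == 0:
        raise ValueError("no rendered view contains the mesh")
    dim = len(view_embs[0])
    return [
        sum(w * emb[i] for w, emb in zip(mask_coverage, view_embs)) / total
        for i in range(dim)
    ]


# Toy data (hypothetical): a 2-D embedding space.
prompt = [1.0, 0.0]
labels = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best = recommend_motion(prompt, labels)

views = [[1.0, 0.0], [0.0, 1.0]]
coverage = [0.75, 0.25]  # fraction of foreground pixels per view
aggregated = mask_weighted_embedding(views, coverage)
```

In this toy setup the second label is the closest to the prompt, and the aggregated embedding leans toward the view in which the mesh is more visible.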


title     = {CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes},
author    = {Kim Youwang and Kim Ji-Yeon and Tae-Hyun Oh},
year      = {2022},
booktitle = {ECCV}


This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2022- 00164860, Development of Human Digital Twin Technology Based on Dynamic Behavior Modeling and Human-Object-Space Interaction; and No.2021-0-02068, Artificial Intelligence Innovation Hub).