VAD-to-Blendshape: Symmetric Emotion-Driven Facial Animation

A lightweight PyTorch MLP model that maps continuous VAD (Valence-Arousal-Dominance) emotional values to 52 symmetric ARKit blendshape coefficients for real-time facial expression generation.

  • Input: 3-dim VAD vector [-1, 1]
  • Output: 52-dim ARKit blendshape weights [0, 1], left-right symmetric by construction
  • Model size: 279K parameters
  • Dataset: Emo3D (40K+ training pairs, with symmetry augmentation)

Key Improvement: Symmetry Enforcement

The original dataset has inherent left-right asymmetry (mean |L-R| difference ~0.14). This model explicitly enforces symmetry via:

  1. Symmetry data augmentation: every training sample is mirrored (left↔right blendshapes swapped)
  2. Symmetry loss term: MSE penalty on |left-right| differences during training (sym_weight=2.0)
  3. Post-processing: the inference script provides an --enforce-symmetry flag that guarantees exact symmetry

Validation asymmetry: ~4e-6 (effectively zero).
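
The three mechanisms above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's training or inference code: the pair indices are read off the blendshape index table in this card, treating jawLeft/jawRight and mouthLeft/mouthRight as mirror pairs is an assumption, and the helper names (`mirror`, `asymmetry`, `symmetrize`) are hypothetical. In particular, averaging each pair is only one plausible way `enforce_symmetry` could work.

```python
import numpy as np

# (left, right) index pairs taken from the 52-dim output layout in this card.
# Assumption: jawLeft/jawRight (23, 25) and mouthLeft/mouthRight (32, 38)
# are treated as mirror pairs as well.
PAIRS = np.array([
    (0, 1), (3, 4), (6, 7), (8, 9), (10, 11), (12, 13), (14, 15),
    (16, 17), (18, 19), (20, 21), (23, 25), (27, 28), (29, 30),
    (32, 38), (33, 34), (35, 36), (43, 44), (45, 46), (47, 48), (49, 50),
])
LEFT, RIGHT = PAIRS[:, 0], PAIRS[:, 1]


def mirror(bs):
    """Augmentation: swap every left/right pair (works on (52,) or (N, 52))."""
    bs = np.asarray(bs, dtype=np.float32)
    out = bs.copy()
    out[..., LEFT] = bs[..., RIGHT]
    out[..., RIGHT] = bs[..., LEFT]
    return out


def asymmetry(bs):
    """Symmetry penalty: mean squared left-right difference."""
    bs = np.asarray(bs, dtype=np.float32)
    return float(np.mean((bs[..., LEFT] - bs[..., RIGHT]) ** 2))


def symmetrize(bs):
    """Post-processing: replace both members of each pair with their mean."""
    bs = np.asarray(bs, dtype=np.float32)
    out = bs.copy()
    mean = 0.5 * (bs[..., LEFT] + bs[..., RIGHT])
    out[..., LEFT] = mean
    out[..., RIGHT] = mean
    return out


rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=52).astype(np.float32)
augmented = np.stack([sample, mirror(sample)])  # original plus mirrored copy
print(asymmetry(sample) > 0.0)         # True: a random vector is asymmetric
print(asymmetry(symmetrize(sample)))   # 0.0 after post-processing
```

Unpaired channels such as browInnerUp or jawOpen are left untouched by all three helpers.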


Quick Start

1. Install dependencies

pip install torch numpy huggingface_hub

2. Download model

from huggingface_hub import hf_hub_download

checkpoint = hf_hub_download(
    repo_id="karie666666/vad-to-blendshape",
    filename="best_model.pt",
    local_dir=".",  # save into the working directory so the commands below find it
)

3. Inference

python inference.py --checkpoint best_model.pt --emotion happiness --intensity 0.9 --enforce-symmetry
python inference.py --checkpoint best_model.pt --emotion anger --intensity 0.8 --enforce-symmetry
python inference.py --checkpoint best_model.pt --vad 0.8 0.6 0.5 --enforce-symmetry

Python API:

from inference import load_model, predict, emotion_to_vad, enforce_symmetry
import numpy as np

model, meta = load_model("best_model.pt")

# Direct VAD
vad = np.array([0.8, 0.6, 0.5], dtype=np.float32)  # happiness
bs = predict(model, vad)
bs = enforce_symmetry(bs)  # guarantee exact symmetry
print(bs.shape)  # (52,)

# From emotion name
vad = emotion_to_vad("surprise", intensity=0.9)
bs = enforce_symmetry(predict(model, vad))

Model Architecture

Linear(3, 256)   → LayerNorm → LeakyReLU → Dropout
Linear(256, 512) → LayerNorm → LeakyReLU → Dropout
Linear(512, 256) → LayerNorm → LeakyReLU → Dropout
Linear(256, 52)  → Clamp(0,1)

Total params: 279,348
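
A minimal PyTorch sketch of the layer stack listed above, mainly to show that it reproduces the stated parameter count. The class name `VADToBlendshape`, the dropout probability, and the LeakyReLU slope are assumptions (none of them is given in this card, and none affects the parameter count); the checkpoint's actual module and state-dict keys may differ.

```python
import torch
import torch.nn as nn


class VADToBlendshape(nn.Module):
    """Hypothetical re-creation of the 3 -> 256 -> 512 -> 256 -> 52 MLP."""

    def __init__(self, dropout: float = 0.1, negative_slope: float = 0.01):
        super().__init__()
        dims = [3, 256, 512, 256]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [
                nn.Linear(d_in, d_out),
                nn.LayerNorm(d_out),
                nn.LeakyReLU(negative_slope),  # slope is an assumed value
                nn.Dropout(dropout),           # probability is an assumed value
            ]
        layers.append(nn.Linear(dims[-1], 52))
        self.net = nn.Sequential(*layers)

    def forward(self, vad: torch.Tensor) -> torch.Tensor:
        # Clamp keeps every blendshape weight in the valid [0, 1] range.
        return torch.clamp(self.net(vad), 0.0, 1.0)


model = VADToBlendshape().eval()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 279348

with torch.no_grad():
    out = model(torch.tensor([[0.8, 0.6, 0.5]]))
print(out.shape)  # torch.Size([1, 52])
```

The count breaks down as 1,024 + 131,584 + 131,328 + 13,364 weights and biases in the four linear layers, plus 2,048 LayerNorm parameters.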


Training

  • Loss: Smooth L1 (Huber) + Symmetry MSE (weight=2.0) + L1 sparsity regularization
  • Optimizer: AdamW, lr=1e-3, weight_decay=1e-4
  • Scheduler: CosineAnnealingLR, 100 epochs
  • Best val metrics: MSE=0.0251, Symmetry=0.000004
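
The objective and optimizer settings above could be assembled roughly as follows. This is a sketch under assumptions: the function name `combined_loss`, the sparsity weight `sparse_weight`, applying the L1 term to the predicted weights, and the left/right index lists are all hypothetical; the stand-in `nn.Linear` exists only to give the optimizer some parameters; and `T_max=100` assumes one scheduler step per epoch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Left/right index pairs from the output layout in this card (assumed pairing).
LEFT = torch.tensor([0, 3, 6, 8, 10, 12, 14, 16, 18, 20,
                     23, 27, 29, 32, 33, 35, 43, 45, 47, 49])
RIGHT = torch.tensor([1, 4, 7, 9, 11, 13, 15, 17, 19, 21,
                      25, 28, 30, 38, 34, 36, 44, 46, 48, 50])


def combined_loss(pred, target, sym_weight=2.0, sparse_weight=1e-3):
    """Smooth L1 fit + MSE on left-right differences + L1 sparsity."""
    fit = F.smooth_l1_loss(pred, target)
    sym = torch.mean((pred[:, LEFT] - pred[:, RIGHT]) ** 2)
    sparse = pred.abs().mean()  # assumption: sparsity acts on the outputs
    return fit + sym_weight * sym + sparse_weight * sparse


# Stand-in model so the optimizer has parameters; not the real architecture.
model = nn.Linear(3, 52)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

vad = torch.rand(8, 3) * 2 - 1   # random VAD batch in [-1, 1]
target = torch.rand(8, 52)       # random blendshape targets in [0, 1]

loss = combined_loss(model(vad), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # in a real loop, called once per epoch
print(loss.item())
```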

VAD Mapping (Basic Emotions)

Emotion     Valence   Arousal   Dominance
neutral        0.00      0.00        0.00
happiness      0.80      0.60        0.50
surprise       0.30      0.90        0.20
sadness       -0.80     -0.40       -0.30
anger         -0.70      0.80        0.70
disgust       -0.60      0.30        0.40
fear          -0.70      0.80       -0.30
contempt      -0.40      0.30        0.80

Mixed emotions are supported via the emotion1+emotion2 syntax.
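
As an illustration of how the table and the `emotion1+emotion2` syntax might be turned into a VAD vector, here is a small sketch. It is not the repository's `emotion_to_vad`: the name `sketch_emotion_to_vad` is hypothetical, and both scaling the vector linearly by intensity and averaging the components of a mixture are assumptions about behavior this card does not specify.

```python
import numpy as np

# VAD anchors copied from the table above.
EMOTION_VAD = {
    "neutral":   (0.00, 0.00, 0.00),
    "happiness": (0.80, 0.60, 0.50),
    "surprise":  (0.30, 0.90, 0.20),
    "sadness":   (-0.80, -0.40, -0.30),
    "anger":     (-0.70, 0.80, 0.70),
    "disgust":   (-0.60, 0.30, 0.40),
    "fear":      (-0.70, 0.80, -0.30),
    "contempt":  (-0.40, 0.30, 0.80),
}


def sketch_emotion_to_vad(name, intensity=1.0):
    """Map 'emotion' or 'emotion1+emotion2' to a VAD vector in [-1, 1]."""
    parts = [p.strip().lower() for p in name.split("+")]
    unknown = [p for p in parts if p not in EMOTION_VAD]
    if unknown:
        raise ValueError(f"unknown emotion(s): {unknown}")
    # Assumption: a mixture is the plain average of its components.
    vad = np.mean([EMOTION_VAD[p] for p in parts], axis=0)
    # Assumption: intensity scales the vector toward neutral (the origin).
    return np.clip(vad * intensity, -1.0, 1.0).astype(np.float32)


print(sketch_emotion_to_vad("happiness", intensity=0.9))
print(sketch_emotion_to_vad("happiness+surprise"))
```

Under these assumptions an intensity of 0 always yields the neutral vector, and a mixture lies midway between its two anchors in VAD space.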


ARKit Blendshape Output (52-dim, symmetric)

The model outputs 52 blendshape weights, indexed by ARKit blendshape name in alphabetical order as listed below. The left and right members of each pair receive matching weights:

0:  browDownLeft           1:  browDownRight
2:  browInnerUp            3:  browOuterUpLeft       4:  browOuterUpRight
5:  cheekPuff              6:  cheekSquintLeft       7:  cheekSquintRight
8:  eyeBlinkLeft           9:  eyeBlinkRight
10: eyeLookDownLeft       11: eyeLookDownRight
12: eyeLookInLeft         13: eyeLookInRight
14: eyeLookOutLeft        15: eyeLookOutRight
16: eyeLookUpLeft         17: eyeLookUpRight
18: eyeSquintLeft         19: eyeSquintRight
20: eyeWideLeft           21: eyeWideRight
22: jawForward            23: jawLeft               24: jawOpen
25: jawRight              26: mouthClose
27: mouthDimpleLeft       28: mouthDimpleRight
29: mouthFrownLeft        30: mouthFrownRight
31: mouthFunnel           32: mouthLeft
33: mouthLowerDownLeft    34: mouthLowerDownRight
35: mouthPressLeft        36: mouthPressRight
37: mouthPucker           38: mouthRight
39: mouthRollLower        40: mouthRollUpper
41: mouthShrugLower       42: mouthShrugUpper
43: mouthSmileLeft        44: mouthSmileRight
45: mouthStretchLeft      46: mouthStretchRight
47: mouthUpperUpLeft      48: mouthUpperUpRight
49: noseSneerLeft         50: noseSneerRight
51: tongueOut
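
When driving a rig, it is often more convenient to address weights by name than by index. The sketch below rebuilds the name list from the table above and derives the left/right pairs from the names; the helper `to_named_weights` is hypothetical and not part of `inference.py`.

```python
import numpy as np

# Names in the index order shown in the table above.
BLENDSHAPE_NAMES = [
    "browDownLeft", "browDownRight", "browInnerUp", "browOuterUpLeft",
    "browOuterUpRight", "cheekPuff", "cheekSquintLeft", "cheekSquintRight",
    "eyeBlinkLeft", "eyeBlinkRight", "eyeLookDownLeft", "eyeLookDownRight",
    "eyeLookInLeft", "eyeLookInRight", "eyeLookOutLeft", "eyeLookOutRight",
    "eyeLookUpLeft", "eyeLookUpRight", "eyeSquintLeft", "eyeSquintRight",
    "eyeWideLeft", "eyeWideRight", "jawForward", "jawLeft", "jawOpen",
    "jawRight", "mouthClose", "mouthDimpleLeft", "mouthDimpleRight",
    "mouthFrownLeft", "mouthFrownRight", "mouthFunnel", "mouthLeft",
    "mouthLowerDownLeft", "mouthLowerDownRight", "mouthPressLeft",
    "mouthPressRight", "mouthPucker", "mouthRight", "mouthRollLower",
    "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper", "mouthSmileLeft",
    "mouthSmileRight", "mouthStretchLeft", "mouthStretchRight",
    "mouthUpperUpLeft", "mouthUpperUpRight", "noseSneerLeft",
    "noseSneerRight", "tongueOut",
]
INDEX = {name: i for i, name in enumerate(BLENDSHAPE_NAMES)}

# Left/right pairs found by name: 'xLeft' pairs with 'xRight'.
PAIRS = [
    (INDEX[n], INDEX[n[:-4] + "Right"])
    for n in BLENDSHAPE_NAMES
    if n.endswith("Left")
]


def to_named_weights(bs):
    """Convert a (52,) weight vector into a {name: weight} dict."""
    bs = np.asarray(bs, dtype=np.float32)
    if bs.shape != (52,):
        raise ValueError(f"expected shape (52,), got {bs.shape}")
    return {name: float(w) for name, w in zip(BLENDSHAPE_NAMES, bs)}


weights = to_named_weights(np.linspace(0.0, 1.0, 52))
print(len(PAIRS))             # 20
print(weights["tongueOut"])   # 1.0
```

Note that name matching also pairs jawLeft with jawRight and mouthLeft with mouthRight, whose indices are not adjacent in the table.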

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

This repository ships a plain PyTorch checkpoint (best_model.pt) rather than a transformers model, so the AutoModel classes do not apply. Load it with the helpers from inference.py instead:

from huggingface_hub import hf_hub_download
from inference import load_model

checkpoint = hf_hub_download(repo_id="karie666666/vad-to-blendshape", filename="best_model.pt")
model, meta = load_model(checkpoint)
