VAD-to-Blendshape: Symmetric Emotion-Driven Facial Animation
A lightweight PyTorch MLP model that maps continuous VAD (Valence-Arousal-Dominance) emotional values to 52 symmetric ARKit blendshape coefficients for real-time facial expression generation.
- Input: 3-dim VAD vector in [-1, 1]
- Output: 52-dim ARKit blendshape weights in [0, 1], left-right symmetric by construction
- Model size: 279K parameters
- Dataset: Emo3D (40K+ training pairs, with symmetry augmentation)
Key Improvement: Symmetry Enforcement
The original dataset has inherent left-right asymmetry (mean |L-R| difference ~0.14). This model explicitly enforces symmetry via:
- Symmetry data augmentation: every training sample is mirrored (left↔right blendshapes swapped)
- Symmetry loss term: MSE penalty on |left-right| differences during training (sym_weight=2.0)
- Post-processing: the inference script includes --enforce-symmetry to guarantee exact symmetry
Validation asymmetry: ~4e-6 (effectively zero).
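The three mechanisms above can be illustrated on a toy vector. This is a minimal sketch in plain Python, not the repository's implementation: the helper names (`mirror`, `asymmetry`, `average_pairs`) and the toy pair list are hypothetical, and averaging each pair is only one plausible way for post-processing to reach exact symmetry.

```python
# Toy illustration of mirroring, measuring asymmetry, and forcing symmetry.
# All names here are hypothetical; inference.py may implement these differently.

def mirror(bs, pairs):
    """Return a copy of bs with each (left, right) index pair swapped."""
    out = list(bs)
    for l, r in pairs:
        out[l], out[r] = bs[r], bs[l]
    return out

def asymmetry(bs, pairs):
    """Mean absolute left-right difference over all pairs."""
    return sum(abs(bs[l] - bs[r]) for l, r in pairs) / len(pairs)

def average_pairs(bs, pairs):
    """Set both sides of each pair to their mean (assumed post-processing rule)."""
    out = list(bs)
    for l, r in pairs:
        out[l] = out[r] = (bs[l] + bs[r]) / 2.0
    return out

# A 5-dim toy vector: indices (0, 1) and (3, 4) are left/right pairs, 2 is unpaired.
pairs = [(0, 1), (3, 4)]
sample = [0.50, 0.25, 0.75, 0.0, 0.5]

augmented = [sample, mirror(sample, pairs)]   # original plus its mirrored copy
symmetric = average_pairs(sample, pairs)

print(mirror(sample, pairs))
print(asymmetry(sample, pairs))
print(asymmetry(symmetric, pairs))
```

Mirroring twice returns the original sample, so the augmentation exactly doubles the dataset without introducing new values.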
Quick Start
1. Install dependencies
pip install torch numpy huggingface_hub
2. Download model
from huggingface_hub import hf_hub_download
checkpoint = hf_hub_download(
repo_id="karie666666/vad-to-blendshape",
filename="best_model.pt"
)
3. Inference
python inference.py --checkpoint best_model.pt --emotion happiness --intensity 0.9 --enforce-symmetry
python inference.py --checkpoint best_model.pt --emotion anger --intensity 0.8 --enforce-symmetry
python inference.py --checkpoint best_model.pt --vad 0.8 0.6 0.5 --enforce-symmetry
Python API:
from inference import load_model, predict, emotion_to_vad, enforce_symmetry
import numpy as np
model, meta = load_model("best_model.pt")
# Direct VAD
vad = np.array([0.8, 0.6, 0.5], dtype=np.float32) # happiness
bs = predict(model, vad)
bs = enforce_symmetry(bs) # guarantee exact symmetry
print(bs.shape) # (52,)
# From emotion name
vad = emotion_to_vad("surprise", intensity=0.9)
bs = enforce_symmetry(predict(model, vad))
Model Architecture
Linear(3, 256) → LayerNorm → LeakyReLU → Dropout
Linear(256, 512) → LayerNorm → LeakyReLU → Dropout
Linear(512, 256) → LayerNorm → LeakyReLU → Dropout
Linear(256, 52) → Clamp(0,1)
Total params: 279,348
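As a sanity check, the stated total can be reproduced by hand from the layer list. The sketch below assumes every Linear layer has a bias and every LayerNorm has a learnable scale and shift (the PyTorch defaults); the helper names are illustrative only.

```python
# Parameter count for the listed layers, assuming each Linear has a bias
# and each LayerNorm has a learnable scale and shift (the PyTorch defaults).

def linear_params(n_in, n_out):
    return n_in * n_out + n_out      # weight matrix + bias vector

def layernorm_params(n):
    return 2 * n                     # scale (gamma) + shift (beta)

widths = [3, 256, 512, 256]          # input followed by the hidden widths
total = 0
for n_in, n_out in zip(widths[:-1], widths[1:]):
    total += linear_params(n_in, n_out) + layernorm_params(n_out)
total += linear_params(widths[-1], 52)   # output layer, no LayerNorm

print(total)  # 279348
```

LeakyReLU, Dropout, and the final clamp carry no parameters, so they do not appear in the count.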
Training
- Loss: Smooth L1 (Huber) + Symmetry MSE (weight=2.0) + L1 sparsity regularization
- Optimizer: AdamW, lr=1e-3, weight_decay=1e-4
- Scheduler: CosineAnnealingLR, 100 epochs
- Best val metrics: MSE=0.0251, Symmetry=0.000004
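The composite loss listed above could be written, per sample, roughly as follows. This is a sketch in plain Python so it runs without PyTorch; in real training code the same terms would be tensor operations. The function names, the Smooth L1 threshold `beta=1.0`, and `sparsity_weight=1e-3` are assumptions, since the source states only the symmetry weight of 2.0.

```python
# Per-sample version of the three loss terms, written with plain lists.
# sparsity_weight and beta are assumed values, not taken from the repository.

def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1: quadratic for errors below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def symmetry_mse(pred, pairs):
    """Mean squared left-right difference over the given index pairs."""
    return sum((pred[l] - pred[r]) ** 2 for l, r in pairs) / len(pairs)

def l1_sparsity(pred):
    """Mean absolute activation, which pushes unused blendshapes toward zero."""
    return sum(abs(p) for p in pred) / len(pred)

def total_loss(pred, target, pairs, sym_weight=2.0, sparsity_weight=1e-3):
    return (smooth_l1(pred, target)
            + sym_weight * symmetry_mse(pred, pairs)
            + sparsity_weight * l1_sparsity(pred))

# Toy 3-dim example: indices (0, 1) form a left/right pair, index 2 is unpaired.
pairs = [(0, 1)]
target = [0.5, 0.5, 0.5]
print(total_loss([0.5, 0.0, 0.5], target, pairs))   # asymmetric prediction
print(total_loss([0.5, 0.5, 0.5], target, pairs))   # symmetric prediction
```

With the symmetry term weighted at 2.0, an asymmetric prediction is penalized both for missing the target and for disagreeing with its mirrored counterpart.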
VAD Mapping (Basic Emotions)
| Emotion | Valence | Arousal | Dominance |
|---|---|---|---|
| neutral | 0.00 | 0.00 | 0.00 |
| happiness | 0.80 | 0.60 | 0.50 |
| surprise | 0.30 | 0.90 | 0.20 |
| sadness | -0.80 | -0.40 | -0.30 |
| anger | -0.70 | 0.80 | 0.70 |
| disgust | -0.60 | 0.30 | 0.40 |
| fear | -0.70 | 0.80 | -0.30 |
| contempt | -0.40 | 0.30 | 0.80 |
Mixed emotions are supported via the emotion1+emotion2 syntax.
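One way such a lookup could work is sketched below. The table values are copied from above, but the function name `name_to_vad` and its behaviour are assumptions: the source does not say how intensity is applied or how mixed emotions are combined, so this sketch averages the component emotions, scales by intensity, and clips to [-1, 1].

```python
# VAD table copied from above; the scaling and blending rules are assumptions.
EMOTION_VAD = {
    "neutral":   (0.00, 0.00, 0.00),
    "happiness": (0.80, 0.60, 0.50),
    "surprise":  (0.30, 0.90, 0.20),
    "sadness":   (-0.80, -0.40, -0.30),
    "anger":     (-0.70, 0.80, 0.70),
    "disgust":   (-0.60, 0.30, 0.40),
    "fear":      (-0.70, 0.80, -0.30),
    "contempt":  (-0.40, 0.30, 0.80),
}

def name_to_vad(spec, intensity=1.0):
    """Map 'emotion' or 'emotion1+emotion2' to a (valence, arousal, dominance) triple.

    Assumed behaviour: mixed emotions are averaged, then the result is
    scaled by intensity and clipped to [-1, 1].
    """
    names = [n.strip().lower() for n in spec.split("+")]
    rows = [EMOTION_VAD[n] for n in names]            # KeyError on unknown names
    mean = [sum(axis) / len(rows) for axis in zip(*rows)]
    return tuple(max(-1.0, min(1.0, intensity * v)) for v in mean)

print(name_to_vad("happiness", intensity=0.9))
print(name_to_vad("happiness+surprise"))
```

The actual `emotion_to_vad` in inference.py may weight or combine emotions differently.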
ARKit Blendshape Output (52-dim, symmetric)
The model outputs 52 blendshape weights in standard ARKit order. Left-right pairs are symmetric:
0: browDownLeft 1: browDownRight
2: browInnerUp 3: browOuterUpLeft 4: browOuterUpRight
5: cheekPuff 6: cheekSquintLeft 7: cheekSquintRight
8: eyeBlinkLeft 9: eyeBlinkRight
10: eyeLookDownLeft 11: eyeLookDownRight
12: eyeLookInLeft 13: eyeLookInRight
14: eyeLookOutLeft 15: eyeLookOutRight
16: eyeLookUpLeft 17: eyeLookUpRight
18: eyeSquintLeft 19: eyeSquintRight
20: eyeWideLeft 21: eyeWideRight
22: jawForward 23: jawLeft 24: jawOpen
25: jawRight 26: mouthClose
27: mouthDimpleLeft 28: mouthDimpleRight
29: mouthFrownLeft 30: mouthFrownRight
31: mouthFunnel 32: mouthLeft
33: mouthLowerDownLeft 34: mouthLowerDownRight
35: mouthPressLeft 36: mouthPressRight
37: mouthPucker 38: mouthRight
39: mouthRollLower 40: mouthRollUpper
41: mouthShrugLower 42: mouthShrugUpper
43: mouthSmileLeft 44: mouthSmileRight
45: mouthStretchLeft 46: mouthStretchRight
47: mouthUpperUpLeft 48: mouthUpperUpRight
49: noseSneerLeft 50: noseSneerRight
51: tongueOut
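The left/right index pairs can be derived from the names rather than hard-coded. The sketch below copies the names in the order listed above and pairs every name ending in `Left` with its `Right` counterpart. Note that this suffix rule also matches the directional shapes jawLeft/jawRight and mouthLeft/mouthRight; whether the model treats those as symmetric pairs is not stated in the source. The helper name `left_right_pairs` is hypothetical.

```python
# Blendshape names in the index order listed above.
ARKIT_NAMES = [
    "browDownLeft", "browDownRight", "browInnerUp", "browOuterUpLeft",
    "browOuterUpRight", "cheekPuff", "cheekSquintLeft", "cheekSquintRight",
    "eyeBlinkLeft", "eyeBlinkRight", "eyeLookDownLeft", "eyeLookDownRight",
    "eyeLookInLeft", "eyeLookInRight", "eyeLookOutLeft", "eyeLookOutRight",
    "eyeLookUpLeft", "eyeLookUpRight", "eyeSquintLeft", "eyeSquintRight",
    "eyeWideLeft", "eyeWideRight", "jawForward", "jawLeft", "jawOpen",
    "jawRight", "mouthClose", "mouthDimpleLeft", "mouthDimpleRight",
    "mouthFrownLeft", "mouthFrownRight", "mouthFunnel", "mouthLeft",
    "mouthLowerDownLeft", "mouthLowerDownRight", "mouthPressLeft",
    "mouthPressRight", "mouthPucker", "mouthRight", "mouthRollLower",
    "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper", "mouthSmileLeft",
    "mouthSmileRight", "mouthStretchLeft", "mouthStretchRight",
    "mouthUpperUpLeft", "mouthUpperUpRight", "noseSneerLeft",
    "noseSneerRight", "tongueOut",
]

def left_right_pairs(names):
    """Return (left_index, right_index) for every '...Left' / '...Right' name pair."""
    index = {name: i for i, name in enumerate(names)}
    pairs = []
    for name, i in index.items():
        if name.endswith("Left"):
            partner = name[:-len("Left")] + "Right"
            if partner in index:
                pairs.append((i, index[partner]))
    return pairs

PAIRS = left_right_pairs(ARKIT_NAMES)
paired = {i for pair in PAIRS for i in pair}
UNPAIRED = [n for i, n in enumerate(ARKIT_NAMES) if i not in paired]

print(len(ARKIT_NAMES), len(PAIRS), len(UNPAIRED))
print(UNPAIRED)
```

Zipping `ARKIT_NAMES` with a predicted vector gives a name-to-weight mapping that is easier to inspect than raw indices.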
License
MIT
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern