facebook/wav2vec2-base-10k-voxpopuli-ft-pl
Automatic Speech Recognition • Updated • 269 • 3
Scaling Zero-Shot Reference-to-Video Generation
TV2TV: A Unified Framework for Interleaved Language and Video Generation