ViQ: Text-Aligned Visual Quantized Representations at Any Resolution Paper • 2606.27313 • Published 5 days ago • 38
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Paper • 2606.26907 • Published 5 days ago • 47
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO Paper • 2605.15190 • Published May 14 • 13
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Paper • 2604.24952 • Published Apr 27 • 6
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 167
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 53
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published Mar 24 • 37
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Paper • 2512.13507 • Published Dec 15, 2025 • 41
Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation Paper • 2512.02457 • Published Dec 2, 2025 • 14
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 117
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published Oct 27, 2025 • 18
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published Oct 2, 2025 • 98