Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 9 days ago • 110 • 3
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025 • 109 • 8