Many-Shot CoT-ICL: Making In-Context Learning Truly Learn Paper • 2605.13511 • Published 9 days ago • 32
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 102
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Paper • 2604.10966 • Published Apr 13 • 12
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 16
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 106
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement Paper • 2604.01591 • Published Apr 2 • 42
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published Apr 1 • 54
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights Paper • 2510.04800 • Published Oct 6, 2025 • 37
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data Paper • 2510.03264 • Published Sep 26, 2025 • 25