jina-reranker-v3: Last but Not Late Interaction for Document Reranking Paper • 2509.25085 • Published Sep 29, 2025 • 7 • 4
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23, 2025 • 31 • 8
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback Paper • 2505.06548 • Published May 10, 2025 • 30 • 6
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29, 2025 • 98 • 15
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 139 • 21
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15, 2025 • 12 • 6
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models Paper • 2504.06214 • Published Apr 8, 2025 • 1
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Paper • 2503.20641 • Published Mar 26, 2025 • 10 • 5