Reward Hacking in Reasoning Models - an AIML-TUDA Collection

updated 22 days ago

Do reasoning LLMs actually reason, or do they learn to game the test? Isomorphic Perturbation Testing (IPT) detects reward hacking in inductive programming tasks (SLR-Bench).

Isomorphic Perturbation Testing 🔍
Space • Running
Evaluate rule hypotheses for genuine reasoning vs shortcuts

AIML-TUDA/SLR-Bench
Viewer • Updated 18 days ago • 38.5k • 3.05k • 4

SLR-Bench Leaderboard - Reward Hacking in Reasoning Models 🎯
Space • Sleeping
Reward shortcut behavior in LLMs via IPT

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
Paper • 2604.15149 • Published Apr 16 • 1