Interpretable Reward Model via Sparse Autoencoder • Paper • 2508.08746 • Published Aug 12, 2025