Where Does Authorship Signal Emerge in Encoder-Based Language Models? Paper • 2605.19908 • Published 8 days ago • 5
Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models Paper • 2602.10382 • Published Feb 12 • 2
Language-Switching Triggers Take a Latent Detour Through Language Models Paper • 2605.18646 • Published 9 days ago • 4
Gaperon-Scope Collection Sparse AutoEncoders for the Gaperon LM Suite. We have trained SAEs on 3 datasets, each with a different percentage of trigger examples, and on many layers. • 4 items • Updated 9 days ago