DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 29 days ago • 242
SamsungSDS-Research/SGuard-JailbreakFilter-2B-v1 Text Generation • 3B • Updated 16 days ago • 282 • 13
Lessons from the Trenches on Reproducible Evaluation of Language Models Paper • 2405.14782 • Published May 23, 2024 • 1
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks Space • Running • 86 • Evaluate multilingual models using FineTasks