view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 Sep 11, 2025 • 176
view article Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Aug 18, 2025 • 88
view article Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 Aug 8, 2025 • 89
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1, 2025 • 36
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL Paper • 2503.23157 • Published Mar 29, 2025 • 10
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 Mar 12, 2025 • 480
CHESS: Contextual Harnessing for Efficient SQL Synthesis Paper • 2405.16755 • Published May 27, 2024 • 2
view article Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy +4 Sep 18, 2024 • 272
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 +4 Aug 21, 2024 • 41
view article Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 262
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation +7 Apr 29, 2024 • 79
view article Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B +2 Apr 4, 2024 • 29