GGML and llama.cpp join HF to ensure the long-term progress of Local AI Article • Published 3 days ago • 292
SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published 9 days ago • 49
Qwen3.5 Collection Qwen3.5 is Qwen's new model family including Qwen3.5-397B-A17B. • 2 items • Updated 6 days ago • 8
CASA Collection CASA: Cross-Attention as Self-Attention for Efficient Vision-Language Fusion on long-context streaming inputs • 6 items • Updated Dec 23, 2025 • 7
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated Dec 2, 2025 • 91
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28, 2025 • 118
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24, 2025 • 42
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27, 2025 • 45
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published Apr 10, 2025 • 62
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11, 2025 • 130