A Survey of Context Engineering for Large Language Models (arXiv:2507.13334, published Jul 17, 2025)
Layer-Condensed KV Cache for Efficient Inference of Large Language Models (arXiv:2405.10637, published May 17, 2024)
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (arXiv:2401.10774, published Jan 19, 2024)