whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163) 2ee9c36 Jugal Haresh Sheth Jugal Sheth committed on May 20, 2025
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172) 94d5ce3 danbev committed on May 19, 2025
cmake: use the current build config for vulkan-shaders-gen (llama/13595) 7681e32 Gilad S. committed on May 17, 2025
vulkan: move common FA code to flash_attn_base.comp (llama/13556) ad8b504 jeffbolznv committed on May 17, 2025
vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554) 97d9aa6 jeffbolznv committed on May 17, 2025
metal : add FA-vec kernel for head size 64 (llama/13583) 36a3b4e ggerganov committed on May 16, 2025
sycl: use oneDNN for matrices multiplication (llama/12972) 2008e08 Łukasz Ślusarczyk committed on May 15, 2025
CUDA: fix crash on large batch size for quant. MoE (llama/13537) df90a14 JohannesGaessler committed on May 14, 2025
CUDA: faster Deepseek FA, add Turing support (llama/13435) ace16dc JohannesGaessler committed on May 14, 2025
vulkan: workaround FA compile failures on macos (llama/13517) 06833bc jeffbolznv committed on May 14, 2025
metal : use FA-vec kernel up to batch size 20 (llama/13496) e925f17 ggerganov committed on May 13, 2025
metal : optimize multi-sequence FA vec kernel (llama/13493) d2f915d ggerganov committed on May 13, 2025
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509) 7463545 Dan Johansson committed on May 13, 2025
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) 5c0b540 Diego Devesa committed on May 19, 2025
examples : add vad-speech-segments to win warns [no ci] (#3170) 90d9ecb danbev committed on May 19, 2025
vad : return early if no vad segments are detected (#3158) a28f11e danbev committed on May 16, 2025
whisper : add build_*/ to .gitignore [no ci] (#3157) 1374002 danbev committed on May 15, 2025
examples : add --print-confidence option to cli (#3150) 2d83266 danbev committed on May 14, 2025
metal : optimize MoE for large batches (llama/13388) d51c0d3 ggerganov committed on May 13, 2025
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053) 0612f1f Dan Johansson Charles Xu committed on May 12, 2025
CUDA: fix misaligned synchronization in FA (llama/13469) 40840d0 JohannesGaessler committed on May 12, 2025
enable dpcpp nightly builds with libraries (llama/13406) c9c1196 Atharva Dubey committed on May 12, 2025
CUDA: fix crash with partial offloading of MoE (llama/13439) 26820f6 JohannesGaessler committed on May 11, 2025
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (llama/13386) 418769d David Huang committed on May 11, 2025
CUDA: fix race conditions FlashAttention kernels (llama/13438) 20644bf JohannesGaessler committed on May 10, 2025
vulkan: scalar flash attention implementation (llama/13324) 3331abd jeffbolznv committed on May 10, 2025
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (llama/12858) 4576ce0 Alberto Cabrera Pérez romain.biessy committed on May 9, 2025
CUDA: FA support for Deepseek (Ampere or newer) (llama/13306) 507d30c JohannesGaessler committed on May 9, 2025