whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163) 2ee9c36 unverified Jugal Haresh Sheth Jugal Sheth committed on May 20
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172) 94d5ce3 unverified danbev committed on May 19
cmake: use the current build config for vulkan-shaders-gen (llama/13595) 7681e32 Gilad S. committed on May 17
vulkan: move common FA code to flash_attn_base.comp (llama/13556) ad8b504 jeffbolznv committed on May 17
vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554) 97d9aa6 jeffbolznv committed on May 17
sycl: use oneDNN for matrices multiplication (llama/12972) 2008e08 Łukasz Ślusarczyk committed on May 15
CUDA: fix crash on large batch size for quant. MoE (llama/13537) df90a14 JohannesGaessler committed on May 14
CUDA: faster Deepseek FA, add Turing support (llama/13435) ace16dc JohannesGaessler committed on May 14
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509) 7463545 Dan Johansson committed on May 13
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) 5c0b540 Diego Devesa committed on May 19
examples : add vad-speech-segments to win warns [no ci] (#3170) 90d9ecb unverified danbev committed on May 19
vad : return early if no vad segments are detected (#3158) a28f11e unverified danbev committed on May 16
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053) 0612f1f Dan Johansson Charles Xu committed on May 12
CUDA: fix crash with partial offloading of MoE (llama/13439) 26820f6 JohannesGaessler committed on May 11
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (llama/13386) 418769d David Huang committed on May 11
CUDA: fix race conditions FlashAttention kernels (llama/13438) 20644bf JohannesGaessler committed on May 10