CUDA: fix race condition in FA vector kernels (llama/13742) 38a702a JohannesGaessler committed on May 24
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696) 69679f5 jeffbolznv committed on May 23
SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587) 7eb0e6e Ewan Crawford committed on May 22
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (llama/13647) 9506ebb yeahdongcn JohannesGaessler committed on May 21
CUDA: skip fully masked-out KV in FA vec kernel (llama/13584) e1f825c JohannesGaessler committed on May 20
sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482) bf74ede Nicolò Scipione committed on May 20
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607) dfa38af OccamRazor committed on May 19
docs : convert README_sycl.md to utf8 format [no ci] (#3191) 2384106 danbev committed on May 27
talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187) 1473e33 matteng1 ggerganov committed on May 26
ci : use dynamic libopenblas.dll for window-blas (#3177) bafccd1 danbev committed on May 23
docs : add VAD model download instructions [no ci] (#3180) e789f73 danbev committed on May 22
whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163) 2ee9c36 Jugal Haresh Sheth Jugal Sheth committed on May 20
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172) 94d5ce3 danbev committed on May 19
cmake: use the current build config for vulkan-shaders-gen (llama/13595) 7681e32 Gilad S. committed on May 17
vulkan: move common FA code to flash_attn_base.comp (llama/13556) ad8b504 jeffbolznv committed on May 17
vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554) 97d9aa6 jeffbolznv committed on May 17
sycl: use oneDNN for matrices multiplication (llama/12972) 2008e08 Łukasz Ślusarczyk committed on May 15
CUDA: fix crash on large batch size for quant. MoE (llama/13537) df90a14 JohannesGaessler committed on May 14
CUDA: faster Deepseek FA, add Turing support (llama/13435) ace16dc JohannesGaessler committed on May 14