ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) ba7a5f8 Diego Devesa committed on Apr 9, 2025
Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (llama/12812) 3d4b079 Neo Zhang Jianyu committed on Apr 8, 2025
sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (llama/12734) 7d3e668 jeffzhou2000 committed on Apr 7, 2025
vulkan: fix NaN issue in flash attention shader (llama/12776) 77d7613 jeffbolznv committed on Apr 6, 2025
vulkan: Use unclamped loads for flash attention mask (llama/12720) a76ef69 jeffbolznv committed on Apr 6, 2025
Vulkan: Tune Vulkan mmq int dot shader for performance (llama/12767) b3bf710 OccamRazor committed on Apr 5, 2025
sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (llama/12625) 27cbcc9 Nicolò Scipione committed on Apr 4, 2025
cmake: fix ggml-shaders-gen compiler paths containing spaces (llama/12747) 1c89b7d Ronny Brendel committed on Apr 4, 2025
vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (llama/12630) ee422be jeffbolznv committed on Apr 4, 2025
vulkan: set cmake minimum and project name in vulkan-shaders (llama/12744) 2459781 jeffbolznv committed on Apr 4, 2025
CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738) 5d7a13f Gaurav Garg JohannesGaessler committed on Apr 3, 2025
vulkan: Fix missing cmake logic for dot product extension (llama/12721) 7a1e8f8 jeffbolznv committed on Apr 3, 2025
CANN: Support operator SIN COS ARGMAX (llama/12709) 904aaf5 Chenguang Li noemotiovon committed on Apr 3, 2025
Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017) a2fdbe6 Alan Gray slaren committed on Apr 3, 2025
opencl: use `max_alloc_size` in backend ctx instead of querying again (llama/12705) 3847456 lhez committed on Apr 3, 2025
vulkan: Implement split_k for coopmat2 flash attention. (llama/12627) 5ab06d6 jeffbolznv committed on Apr 2, 2025
cmake: remove caching from vulkan coopmat checks (llama/12719) fac18c1 bandoti committed on Apr 2, 2025
vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559) e7bebe6 jeffbolznv committed on Apr 2, 2025
llama : add option to override model tensor buffers (llama/11397) 3d000b6 Diego Devesa committed on Apr 2, 2025
CUDA: don't convert BF16 weights to FP32 (ggml/1174) 332bcaf Sigbjørn Skjæret committed on Apr 4, 2025
cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) 0754d43 cmdr2 Diego Devesa committed on Apr 2, 2025
get_rows and dup optimization (llama/12671) ffa5f14 Chenguang Li noemotiovon hipudding committed on Apr 2, 2025
vulkan: fix build when glslc doesn't support coopmat (llama/12683) f91eb88 Wagner Bruna committed on Apr 1, 2025
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (llama/12135) 06ec111 OccamRazor committed on Mar 31, 2025
SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387) 0a9c73a Akarshan Biswas committed on Mar 31, 2025
metal : use constexpr in FA kernels + fix typedef (llama/12659) c699617 ggerganov committed on Mar 30, 2025
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611) 12bb60d R0CKSTAR committed on Mar 30, 2025
cpu: de-duplicate some of the operators and refactor (ggml/1144) 09f2f18 cmdr2 committed on Mar 29, 2025
cmake: improve Vulkan cooperative matrix support checks (#2966) 4be7f68 Sandro Hanea committed on Mar 31, 2025
vulkan: fix coopmat shader generation when cross-compiling (llama/12272) 7585c4a Icenowy Zheng bandoti committed on Mar 28, 2025
llamafile : ppc64le GEMV forwarding for FP32. (llama/12594) 1843f18 amritahs-ibm committed on Mar 28, 2025