SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387) 0a9c73a Akarshan Biswas committed on Mar 31, 2025
metal : use constexpr in FA kernels + fix typedef (llama/12659) c699617 ggerganov committed on Mar 30, 2025
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611) 12bb60d R0CKSTAR committed on Mar 30, 2025
cpu: de-duplicate some of the operators and refactor (ggml/1144) 09f2f18 cmdr2 committed on Mar 29, 2025
cmake: improve Vulkan cooperative matrix support checks (#2966) 4be7f68 Sandro Hanea committed on Mar 31, 2025
vulkan: fix coopmat shader generation when cross-compiling (llama/12272) 7585c4a Icenowy Zheng bandoti committed on Mar 28, 2025
llamafile : ppc64le GEMV forwarding for FP32. (llama/12594) 1843f18 amritahs-ibm committed on Mar 28, 2025
rpc : send hash when tensor data is above some fixed threshold (llama/12496) c39f9c4 rgerganov committed on Mar 28, 2025
opencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600) 3261fcd lhez committed on Mar 27, 2025
ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) f695cbf ggerganov committed on Mar 27, 2025
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489) d154905 amritahs-ibm committed on Mar 27, 2025
SYCL: implement memset ggml backend buffer interface (llama/12580) 3f95f2b Akarshan Biswas committed on Mar 27, 2025
SYCL: disable Q4_0 reorder optimization (llama/12560) 33f8316 Akarshan Biswas committed on Mar 25, 2025
opencl: simplify kernel embedding logic in cmakefile (llama/12503) 5f131ac lhez Max Krasnyansky committed on Mar 24, 2025
vulkan: fix mul_mat_vec failure in backend tests (llama/12529) 09dd86a jeffbolznv committed on Mar 24, 2025
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505) 6868981 jeffbolznv committed on Mar 22, 2025
Vulkan: RTE rounding for cpy to quant (llama/12480) 8707beb stduhpf jeffbolznv committed on Mar 21, 2025
vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472) 417a5d6 Eve committed on Mar 21, 2025
Fix build on Windows when ccache enabled (ggml/9954) (llama/9976) bbd0292 蕭澧邦 Romain Biessy committed on Mar 21, 2025
ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332) 0729506 Srihari-mcw committed on Mar 20, 2025
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183) 3a7ca19 Gaurav Garg JohannesGaessler committed on Mar 19, 2025
vulkan: optimize iq1 coopmat2 dequant functions (llama/12427) 53dd8ad jeffbolznv committed on Mar 19, 2025
Fix visionOS build and add CI (llama/12415) ecb4322 guusw Giovanni Petrantoni committed on Mar 19, 2025
vulkan: Submit once enough matmul work has been recorded (llama/12406) ec77b2c jeffbolznv committed on Mar 19, 2025
musa: override warp_size of musa device to 32 (llama/12445) 184c152 R0CKSTAR committed on Mar 18, 2025
SYCL: using graphs is configurable by environment variable and compile option (llama/12371) c18969f Łukasz Ślusarczyk Romain Biessy committed on Mar 18, 2025
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434) 55088d3 OccamRazor committed on Mar 18, 2025
fixed compilation warnings in ggml-sycl (llama/12424) 77ff985 Łukasz Ślusarczyk committed on Mar 18, 2025
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394) 1e69b8c Gaurav Garg committed on Mar 17, 2025
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312) c9f86c1 jeffbolznv committed on Mar 17, 2025
vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309) 9ca84c6 jeffbolznv committed on Mar 17, 2025
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273) 5d51f1c jeffbolznv committed on Mar 17, 2025
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258) 3cc6539 jeffbolznv committed on Mar 17, 2025