Allow multiple copy function pointers for CUDA graph kernel param updates (llama/7565) 143f6df agray3 committed on May 27, 2024
ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama/7433) 51f504f Masaya, Kato committed on May 25, 2024
ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0) 9f41704 ggerganov committed on May 23, 2024
ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463) 4005bca ggerganov committed on May 23, 2024
Update vulkan rope implementation to support frequency factors (llama/7475) be0ec58 OccamRazor committed on May 23, 2024
CUDA: remove incorrect precision check (llama/7454) eb4b5e0 JohannesGaessler committed on May 22, 2024
llama : add phi3 128K model support (llama/7225) ef68527 liuwei-git ggerganov committed on May 21, 2024
metal : handle F16 inf values, fix FA partial offload (llama/7434) 8d153a7 ggerganov committed on May 21, 2024
ggml-opencl, llama: using reserve() if count already known (llama/7272) 8325ed5 germanaizek committed on May 20, 2024
ggml : add loongarch lsx and lasx support (llama/6454) 9794ea7 junchao-loongson Jinyang He committed on May 20, 2024
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (llama/7258) cf52931 Srihari-mcw committed on May 20, 2024
ggml: implement quantized KV cache for FA (llama/7372) aef1b4b JohannesGaessler committed on May 19, 2024
cuda : clear error after buffer allocation failure (llama/7376) b7f6691 slaren committed on May 19, 2024
Update and fix Vulkan soft_max and argsort implementations (llama/7237) a0218a3 OccamRazor committed on May 18, 2024
ggml : fix quants nans when all the group weights are very close to zero (llama/7313) b57bcbc slaren committed on May 18, 2024
CUDA: faster large batch FA without tensor cores (llama/7314) a6d9f2d JohannesGaessler committed on May 17, 2024
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (llama/7191) c917076 Max Krasnyansky ggerganov committed on May 16, 2024
ggml : use dynamic thread scheduling for matrix multiplication (llama/6915) 6f8daf7 kunnis committed on May 15, 2024
whisper : use ggml-cuda in mel calc, set appropriate device (#2236) 93af41a stanimirovb committed on Jun 13, 2024
cuda : fix bounds check for src0 rows in MMVQ kernel (#2231) 4fdb9d2 ggerganov JohannesGaessler committed on Jun 11, 2024