Commit History
ggml : use OpenMP as a thread pool (llama/7606) 7e5d850
Vulkan Mixture of Experts (MoE) support (llama/7628) ad9ee26
kompute : implement op_getrows_f32 (llama/6403) fa0872f
woachk committed on
fix bug introduced in using calloc (llama/7701) f22c7e4
Dave Airlie committed on
Fix FlashAttention debug test, FP32 assert (llama/7684) 1bed92f
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681) d4c0faf
CUDA: quantized KV support for FA vec (llama/7527) 315df8c
ggml : fix loongson compile warnings (llama/7537) c1442f3
faster avx512 exp implementation (llama/7551) 6dbbbab
ggml : fix loongarch build (O2 issue) (llama/7636) 133ffbf
junchao-loongson committed on
metal : remove invalid asserts (llama/7617) 562afce
metal : add missing asserts (llama/7617) be552ab
ggml : fix YARN + add tests + add asserts (llama/7617) 15da5f7
cuda : non-cont concat support (llama/7610) 64d3007
llama-bench : add support for the RPC backend (llama/7435) d460266
ggml : use atomic_flag for critical section (llama/7598) 68c6582
slaren committed on
examples : adapt to new ggml_concat (ggml/0) 36af6c5
ggml : fix typo in ggml.c (llama/7603) f06f1cb
Align GEMM dispatch (llama/7566) 2171dc6
sycl : fix assert (llama/7563) b4fb287
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (llama/7552) da90a1e
rpc : resource management rework (llama/7562) 7571b13
fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436) f0ee71c
Neo Zhang committed on
ggml : generalize GGML_OP_CONCAT (llama/7563) 8d359ad
update HIP_UMA #7399 (llama/7414) 7097123
Allow multiple copy function pointers for CUDA graph kernel param updates (llama/7565) 143f6df
agray3 committed on
Fix q_xxs using mul_mat_q (llama/7459) 0be4f48
AidanBeltonS committed on
Add freq factors (llama/7495) 340b830
AidanBeltonS committed on
metal : add GGML_OP_REPEAT kernels (llama/7557) 0534b5d
metal : disable FA kernel for HS=256 (llama/7556) 0c32e28
ggml : restore ggml_rope_xpos_inplace (ggml/0) 0641dee
ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama/7433) 51f504f
Masaya, Kato committed on
ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0) 9f41704
ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463) 4005bca
ggml : drop support for QK_K=64 (llama/7473) 8737d46
Update vulkan rope implementation to support frequency factors (llama/7475) be0ec58
CUDA: fix FA out-of-bounds reads (llama/7479) b38d0f9
CUDA: fix FA out-of-bounds writes (llama/7465) 2e26e3a
cuda : fix compile warning (llama/7454) 58db6c8
CUDA: remove incorrect precision check (llama/7454) eb4b5e0
cuda : fix rope + add tests (llama/7452) 215ce5c
llama : add phi3 128K model support (llama/7225) ef68527
metal : handle F16 inf values, fix FA partial offload (llama/7434) 8d153a7
CUDA: fix unused warning in mmq.cu (llama/7442) f16510d
CUDA: deduplicate mmq code (llama/7397) e7b20b1
rpc : track allocated buffers (llama/7411) 925eb7a
Update SYCL upscale operation (llama/7321) 3984ba6
AidanBeltonS committed on
ggml-opencl, llama: using reserve() if count already known (llama/7272) 8325ed5
ggml : add loongarch lsx and lasx support (llama/6454) 9794ea7
junchao-loongson and Jinyang He committed on