Commit History
fix cmd a1a7aac
fix build 1b9b118
fix build d937c3a
fix dockerfile path 03ff3a5
add meta 36ff0ea
chore: track binaries with git-lfs aa000f7
chore: track binaries with git-lfs f33d63d
add sync task 46ebeba
Handle negative value in padding (#3389) 6e115ac
Treboko committed on
models : update `./models/download-ggml-model.cmd` to allow for tdrz download (#3381) 0b65831
talk-llama : sync llama.cpp 4321600
sync : ggml a0af6fc
ggml: Add initial WebGPU backend (llama/14521) 4b3da1d
Reese Levine committed on
ggml : initial zDNN backend (llama/14975) 6dd510c
common : handle mxfp4 enum fd4c0e1
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379) a575f57
vulkan: disable spirv-opt for bfloat16 shaders (llama/15352) cf24af7
vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355) 054584a
vulkan: support sqrt (llama/15370) e5406c0
Dong Won Kim committed on
vulkan: Optimize argsort (llama/15354) 80a188c
vulkan: fuse adds (llama/15252) ad199b1
vulkan: Support mul_mat_id with f32 accumulators (llama/15337) 41a76e6
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334) a6fa78e
OpenCL: add initial FA support (llama/14987) 8ece1ee
opencl: add initial mxfp4 support via mv (llama/15270) 1a0281c
lhez, shawngu-quic committed on
vulkan : fix out-of-bounds access in argmax kernel (llama/15342) 78a1865
vulkan : fix compile warnings on macos (llama/15340) e3107ff
ggml: initial IBM zDNN backend (llama/14975) 449e1a4
CUDA: fix negative KV_max values in FA (llama/15321) 6e3a7b6
HIP: Cleanup hipification header (llama/15285) 7cdf9cd
vulkan: perf_logger improvements (llama/15246) d48d508
ggml: fix ggml_conv_1d_dw bug (ggml/1323) 4496862
cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300) 59c694d
Sigbjørn Skjæret committed on
finetune: SGD optimizer, more CLI args (llama/13873) f585fe7
HIP: bump requirement to rocm 6.1 (llama/15296) 58a3802
ggml : update `ggml_rope_multi` (llama/12665) b4896dc
ggml : repack block_iq4_nlx8 (llama/14904) db4407f
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132) c768824
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (llama/15188) c8284f2
HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273) 8fca6dd
sycl: Fix and disable more configurations of mul_mat (llama/15151) 7b868ed
Romain Biessy committed on
opencl: allow mixed f16/f32 `add` (llama/15140) 345810b
CUDA cmake: add `-lineinfo` for easier debug (llama/15260) 008e169
CANN: GGML_OP_CPY optimization (llama/15070) 73e90ff
Chenguang Li committed on
musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236) 4168dda
CANN: Add broadcast for softmax and FA (llama/15208) db87c9d
kleidiai: fix unsigned overflow bug (llama/15150) 9d5f58c
Charles Xu committed on
cuda: refactored ssm_scan and use CUB (llama/13291) 7a187d1
David Zhao committed on