Commit History
scripts : fix sync paths 88d6566
CUDA: fix Volta FlashAttention logic (llama/11615) 6df9571
HIP: fix flash_attn_stream_k_fixup warning (llama/11604) acfd94f
CUDA/HIP: add support for selectable warp size to mmv (llama/11519) ed08269
uvos committed
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601) 4850c24
uvos committed
CUDA: use mma PTX instructions for FlashAttention (llama/11583) f328957
`ci`: use sccache on windows instead of ccache (llama/11545) 9ed1962
Olivier Chafik committed
HIP: require at least HIP 5.5 72c425b
uvos committed
HIP: Prepare reduction operators for wave 64 bc1c1a4
uvos committed
CUDA/HIP: add warp_size to cuda_device_info e538e2c
uvos committed
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360) bd93c1b
vulkan: Catch pipeline creation failure and print an error message (llama/11436) d4f6b2c
HIP: Suppress transformation warning in softmax.cu 72c6f1d
uvos committed
HIP: Only call rocblas_initialize on rocBLAS versions with the multiple instantiation bug (llama/11080) 82bb7f3
Nikita Sarychev committed
cmake : don't fail on `GGML_CPU=OFF` (llama/11457) 6406a6e
someone13574 committed
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261) 8aaf0c8
AMD: parse the architecture as supplied by gcnArchName (llama/11244) 04b01d8
Haus1 committed
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441) 4e38ed4
Ihar Hrachyshka committed
metal : use residency sets (llama/11427) 9da4d68
cmake: add ggml find package (llama/11369) ca6577f
vulkan: compile shaders on-demand (llama/11406) 5c008f7
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420) 2cc4df4
uvos committed
hip : Add hipGraph and VMM support to ROCm (llama/11362) 089afa0
uvos committed
CUDA: fix FP16 cuBLAS GEMM (llama/11396) 7b7c5d3
rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (llama/11356) 6f5687a
uvos committed
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380) 855a9fe
cmake : avoid -march=native when reproducible build is wanted (llama/11366) 3cae2d9
Bernhard M. Wiedemann committed
Vulkan-run-test: fix mmq_wg_denoms (llama/11343) 133a580
amd-dwang committed
vulkan: sort shaders for more deterministic binary (llama/11315) d7c0046
vulkan: fix diag_mask_inf (llama/11323) f76204e
rpc : better caching of the base buffer pointer (llama/11331) 81a6cae
metal : fix out-of-bounds write (llama/11314) 1101050
vulkan: fix coopmat2 validation failures (llama/11284) f2cc7e9
SYCL: Introducing memory host pool (llama/11251) aedb0b3
Nicolò Scipione committed
cmake : add sanitizer flags for llama.cpp (llama/11279) 3547979
vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281) e0e73fa
rpc : early register backend devices (llama/11262) 4134077
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166) 3bb9e77
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206) ee122d3
vulkan: optimize coopmat2 q2_k dequant function (llama/11130) d49a569
CUDA: backwards pass for misc. ops, add tests (llama/11257) 2fbcec1
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227) bf3dc93
vulkan: scale caching for k quants + misc fixes (llama/11081) 03ab36f
Eve committed
fix: ggml: fix vulkan-shaders-gen build (llama/10448) ad8f031
RoPE: fix back, CUDA support for back + noncont. (llama/11240) 131a21e
SYCL: Add gated linear attention kernel (llama/11175) fdb1fe5
ggml : add option to not print stack on abort (ggml/1081) 9b2706e
William Tambellini and Diego Devesa committed
ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065) 8e57313
issixx (issi) committed