Commit History
HIP: Cleanup hipification header (llama/15285)
7cdf9cd
cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300)
59c694d
Sigbjørn Skjæret
committed on
finetune: SGD optimizer, more CLI args (llama/13873)
f585fe7
HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132)
c768824
HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273)
8fca6dd
CUDA cmake: add `-lineinfo` for easier debug (llama/15260)
008e169
musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236)
4168dda
cuda: refactored ssm_scan and use CUB (llama/13291)
7a187d1
David Zhao
committed on
CUDA: add attention sinks for tile and wmma (llama/15178)
46e7c87
ggml : fix field name when new ggml_backend (llama/14944)
685748d
AN Long
committed on
CUDA: attention sinks for mma FlashAttention (llama/15157)
0ab9aba
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)
1d24833
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035)
9e85264
cuda: make im2col a little faster (llama/15025)
9a85c65
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038)
cc3a2ed
CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014)
fbc3cd1
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949)
149f5a5
CUDA: skip masked KV slices for all FA kernels (llama/14924)
0c60f80
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930)
f9dbd96
HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931)
8e133f7
cuda : add softcap fusion (llama/14907)
2237878
Sigbjørn Skjæret
committed on
CUDA: add roll (llama/14919)
d41a4ec
CUDA: fix pointer incrementation in FA (llama/14916)
eb84e7e
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)
5422b31
deepsek
committed on
musa: fix build warnings (unused variable) (llama/14869)
f38d409
musa: upgrade musa sdk to rc4.2.0 (llama/14498)
a687ec3
CUDA: fix overflow in FA, tune performance (llama/14840)
10ac92f
CUDA: fix compilation with GGML_CUDA_F16 (llama/14837)
2746afd
CUDA: fix quantized KV cache + multiple sequences (llama/14822)
88864af
CUDA: add fused rms norm (llama/14800)
79bc58c
cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763)
b54b644
Sigbjørn Skjæret
committed on
cuda: remove linking to cublasLt (llama/14790)
fafaa8b
vulkan/cuda: Fix im2col when KW!=KH (llama/14789)
0be0329
cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741)
bb523fb
Oliver Simons
committed on
CUDA: set_rows + cpy.cu refactor (llama/14712)
536128f
llama : add high-throughput mode (llama/14363)
b2d73a2
cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687)
1e145c7
cuda : add set rows for bf16 (llama/14664)
1f97ff4
Sigbjørn Skjæret
committed on
cuda : add ELU support (llama/14657)
cbe8006
Yavor Ivanov
committed on
ggml : add build-time message to remind about ggml_set_rows (llama/14661)
0f5d4ba
CUDA: add set rows for f32 and f16 (llama/14551)
e51f2d4
model : support LiquidAI LFM2 hybrid family (llama/14620)
07ff90a
Tarek Dakhran
committed on
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634)
4354560
Slobodan Josic
committed on