whisper.cpp / ggml / src / ggml-cuda

Commit History

CUDA: fix negative KV_max values in FA (llama/15321)
6e3a7b6

JohannesGaessler committed on

cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300)
59c694d

Sigbjørn Skjæret committed on

HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802

uvos committed on

CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132)
c768824

ORippler committed on

HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273)
8fca6dd

uvos committed on

CUDA cmake: add `-lineinfo` for easier debug (llama/15260)
008e169

am17an committed on

musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236)
4168dda

yeahdongcn committed on

cuda: refactored ssm_scan and use CUB (llama/13291)
7a187d1

David Zhao committed on

CUDA: add attention sinks for tile and wmma (llama/15178)
46e7c87

am17an committed on

ggml : fix field name when new ggml_backend (llama/14944)
685748d

AN Long committed on

CUDA: attention sinks for mma FlashAttention (llama/15157)
0ab9aba

JohannesGaessler committed on

CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)
1d24833

JohannesGaessler committed on

llama : add gpt-oss (llama/15091)
bf225d6

ggerganov, ngxson, and slaren committed on

CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035)
9e85264

JohannesGaessler committed on

cuda: make im2col a little faster (llama/15025)
9a85c65

leejet committed on

cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038)
cc3a2ed

ggerganov committed on

CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014)
fbc3cd1

JohannesGaessler committed on

HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949)
149f5a5

uvos committed on

CUDA: skip masked KV slices for all FA kernels (llama/14924)
0c60f80

JohannesGaessler committed on

HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3

uvos committed on

HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930)
f9dbd96

uvos committed on

HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931)
8e133f7

uvos committed on

cuda : add softcap fusion (llama/14907)
2237878

Sigbjørn Skjæret committed on

CUDA: add roll (llama/14919)
d41a4ec

am17an committed on

CUDA: fix pointer incrementation in FA (llama/14916)
eb84e7e

JohannesGaessler committed on

HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)
5422b31

deepsek committed on

musa: fix build warnings (unused variable) (llama/14869)
f38d409

yeahdongcn committed on

musa: upgrade musa sdk to rc4.2.0 (llama/14498)
a687ec3

yeahdongcn committed on

CUDA: fix overflow in FA, tune performance (llama/14840)
10ac92f

JohannesGaessler committed on

CUDA: fix compilation with GGML_CUDA_F16 (llama/14837)
2746afd

JohannesGaessler committed on

CUDA: add fused rms norm (llama/14800)
79bc58c

am17an committed on

cuda : implement bf16 cpy ops and enable bf16 cont (llama/14763)
b54b644

Sigbjørn Skjæret committed on

cuda: remove linking to cublasLt (llama/14790)
fafaa8b

yeahdongcn committed on

vulkan/cuda: Fix im2col when KW!=KH (llama/14789)
0be0329

jeffbolznv committed on

cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741)
bb523fb

Oliver Simons committed on

CUDA: set_rows + cpy.cu refactor (llama/14712)
536128f

am17an committed on

cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687)
1e145c7

yeahdongcn committed on

cuda : add set rows for bf16 (llama/14664)
1f97ff4

Sigbjørn Skjæret committed on

cuda : add ELU support (llama/14657)
cbe8006

Yavor Ivanov committed on

ggml : add build-time message to remind about ggml_set_rows (llama/14661)
0f5d4ba

ggerganov committed on

CUDA: add set rows for f32 and f16 (llama/14551)
e51f2d4

am17an committed on

model : support LiquidAI LFM2 hybrid family (llama/14620)
07ff90a

Tarek Dakhran committed on

HIP: Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634)
4354560

Slobodan Josic committed on

cuda : support Falcon-H1 state size for SSM_SCAN (llama/14602)
92b2d32

compilade committed on

ggml : add ggml_scale_bias (llama/14417)
573d50a

ngxson committed on

cuda : fix rope with partial rotation and non-cont src (llama/14580)
aaf2d96

ggerganov committed on