whisper.cpp / ggml-cuda

Commit History

ggml : drop support for QK_K=64 (llama/7473)
8737d46

ggerganov committed on

CUDA: fix FA out-of-bounds reads (llama/7479)
b38d0f9

JohannesGaessler committed on

CUDA: fix FA out-of-bounds writes (llama/7465)
2e26e3a

JohannesGaessler committed on

cuda : fix compile warning (llama/7454)
58db6c8

ggerganov committed on

CUDA: remove incorrect precision check (llama/7454)
eb4b5e0

JohannesGaessler committed on

cuda : fix rope + add tests (llama/7452)
215ce5c

ggerganov committed on

llama : add phi3 128K model support (llama/7225)
ef68527

liuwei-git ggerganov committed on

CUDA: fix unused warning in mmq.cu (llama/7442)
f16510d

JohannesGaessler committed on

CUDA: deduplicate mmq code (llama/7397)
e7b20b1

JohannesGaessler committed on

CUDA: deduplicate FlashAttention code (llama/7352)
65ab3e8

JohannesGaessler committed on

cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
ad83dfd

Engininja2 committed on

CUDA: faster large batch FA without tensor cores (llama/7314)
a6d9f2d

JohannesGaessler committed on

ggml : add `ggml_upscale_ext` (ggml/814)
04a5333

John Balis ggerganov committed on

CUDA: add FP32 FlashAttention vector kernel (llama/7188)
03d4b22

JohannesGaessler committed on

ggml : remove obsolete alibi code (skipme) (#0)
d25c1e3

ggerganov committed on

ggml : full ALiBi support (llama/7192)
192bda4

ggerganov committed on

CUDA: generalize FP16 fattn vec kernel (llama/7061)
ca79691

JohannesGaessler committed on

Introduction of CUDA Graphs to LLama.cpp (llama/6766)
08fc76d

agray3 slaren committed on

CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (llama/7019)
4cf786d

JohannesGaessler committed on

Fix more int overflow during quant (PPL/CUDA). (llama/6563)
531387f

dranger003 committed on

ggml : group all experts in a single ggml_mul_mat_id (llama/6505)
f0b5c67

slaren ggerganov committed on

feat: implemented sigmoid function (ggml/806)
cd0c122

Justina Cho committed on

llama : add Command R Plus support (llama/6491)
8cf7097

Carolinabanana S S slaren ggerganov committed on

sync : llama.cpp (skip)
88282d1

ggerganov committed on

ggml : mul_mat_id use the same tensor for all the experts (llama/6387)
26fdc9f

slaren ggerganov committed on

sync : ggml (#2001)
cbbfa9e

ggerganov committed on