whisper.cpp / ggml-cuda

Commit History

CUDA: add FP32 FlashAttention vector kernel (llama/7188)
03d4b22

JohannesGaessler committed

ggml : remove oboslete alibi code (skipme) (#0)
d25c1e3

ggerganov committed

ggml : full ALiBi support (llama/7192)
192bda4

ggerganov committed
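
The ALiBi commit above moves the attention bias handling into ggml rather than a standalone operator. As a reference for what that bias looks like, here is a minimal sketch of the textbook ALiBi formula. The helper names are illustrative, and the power-of-two slope rule is the simple case from the ALiBi paper, not ggml's full handling of arbitrary head counts and max_bias.

```cuda
// Illustrative ALiBi helpers (hypothetical names, not the ggml API).
#include <cuda_runtime.h>
#include <math.h>

// Slope for attention head h (0-based) when the head count H is a power of two,
// as in the original ALiBi paper: m_h = 2^(-8*(h+1)/H).
static __host__ __device__ float alibi_slope(const int h, const int H) {
    return powf(2.0f, -8.0f * (float)(h + 1) / (float)H);
}

// Additive bias applied to the score of query position i attending to key
// position j before the softmax. Per-row constants cancel in the softmax,
// so implementations may equivalently add m_h * j only.
static __host__ __device__ float alibi_bias(const int h, const int H, const int i, const int j) {
    return alibi_slope(h, H) * (float)(j - i);
}
```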

CUDA: generalize FP16 fattn vec kernel (llama/7061)
ca79691

JohannesGaessler committed

Introduction of CUDA Graphs to LLama.cpp (llama/6766)
08fc76d

agray3 and slaren committed
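
The CUDA Graphs commit above targets per-kernel launch overhead by capturing a sequence of launches once and replaying it. Below is a generic capture/replay sketch using the standard runtime API; it is not the llama.cpp integration, which additionally has to detect when the captured graph becomes stale and re-capture it.

```cuda
#include <cuda_runtime.h>

// Trivial kernel standing in for the many small kernels of a transformer layer.
__global__ void add_one(float * x, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] += 1.0f;
    }
}

int main(void) {
    const int n = 1 << 20;
    float * x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture the launch sequence once instead of paying launch overhead every step.
    cudaGraph_t     graph;
    cudaGraphExec_t instance;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int step = 0; step < 16; ++step) {
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(x, n);
    }
    cudaStreamEndCapture(stream, &graph);
    // CUDA 12 signature; older toolkits use cudaGraphInstantiateWithFlags or the
    // five-argument overload instead.
    cudaGraphInstantiate(&instance, graph, 0);

    // Replay the whole captured sequence with a single launch call.
    cudaGraphLaunch(instance, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(instance);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    return 0;
}
```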

CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (llama/7019)
4cf786d

JohannesGaessler committed
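
The workaround commit above deals with half-precision max intrinsics that older toolkits do not provide. A hedged sketch of the general approach, with hypothetical compat_* wrapper names rather than the actual ggml-cuda code: emulate the intrinsics by comparing through float conversion, which every toolkit provides.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11070
// Fallback for toolkits that predate the native half-precision max intrinsics.
static __device__ __forceinline__ half compat_hmax(const half a, const half b) {
    return __half2float(a) > __half2float(b) ? a : b;
}

static __device__ __forceinline__ half2 compat_hmax2(const half2 a, const half2 b) {
    // Apply the scalar fallback to the low and high halves independently.
    return __halves2half2(compat_hmax(__low2half(a),  __low2half(b)),
                          compat_hmax(__high2half(a), __high2half(b)));
}
#else
// On newer toolkits (and architectures that declare them), use the intrinsics directly.
static __device__ __forceinline__ half  compat_hmax (const half  a, const half  b) { return __hmax (a, b); }
static __device__ __forceinline__ half2 compat_hmax2(const half2 a, const half2 b) { return __hmax2(a, b); }
#endif
```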

Fix more int overflow during quant (PPL/CUDA). (llama/6563)
531387f

dranger003 committed
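
The overflow fix above is representative of a recurring class of bug: on large tensors, 32-bit index arithmetic inside a kernel can wrap. The sketch below is a generic illustration of the pattern rather than the code changed by the commit: promote the offset computation to 64 bits before indexing.

```cuda
#include <stdint.h>
#include <cuda_runtime.h>

// Generic element-wise kernel; the important detail is the 64-bit index.
__global__ void scale_f32(const float * x, float * dst, const float s, const int64_t n) {
    // Casting before the multiply keeps blockIdx.x * blockDim.x from wrapping
    // once the tensor has more than ~2^31 elements.
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }
    dst[i] = s * x[i];
}
```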

ggml : group all experts in a single ggml_mul_mat_id (llama/6505)
f0b5c67

slaren and ggerganov committed

feat: implemented sigmoid function (ggml/806)
cd0c122

Justina Cho committed
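
The sigmoid op added above is a simple element-wise operation. For reference, a minimal CUDA sketch in the same spirit, with illustrative names and launch parameters rather than the ggml implementation:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// sigmoid(x) = 1 / (1 + exp(-x)), applied element-wise.
__global__ void sigmoid_f32(const float * x, float * dst, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }
    dst[i] = 1.0f / (1.0f + expf(-x[i]));
}

// Host-side launcher; a block size of 256 is an arbitrary illustrative choice.
static void sigmoid_f32_cuda(const float * x, float * dst, const int n, cudaStream_t stream) {
    const int num_blocks = (n + 255) / 256;
    sigmoid_f32<<<num_blocks, 256, 0, stream>>>(x, dst, n);
}
```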

llama : add Command R Plus support (llama/6491)
8cf7097

Carolinabanana, S S, slaren, and ggerganov committed

sync : llama.cpp (skip)
88282d1

ggerganov committed

ggml : mul_mat_id use the same tensor for all the experts (llama/6387)
26fdc9f

slaren and ggerganov committed

sync : ggml (#2001)
cbbfa9e

ggerganov committed