whisper.cpp / ggml / src / ggml-cuda

Commit History

llama/ggml: add LLM training support (llama/10544)
8d3b3c1

JohannesGaessler committed

CUDA: fix misaligned synchronization in FA (llama/13469)
40840d0

JohannesGaessler committed

CUDA: fix crash with partial offloading of MoE (llama/13439)
26820f6

JohannesGaessler committed

CUDA: fix race conditions in FlashAttention kernels (llama/13438)
20644bf

JohannesGaessler committed

CUDA: fix FlashAttention on Turing (llama/13415)
e32d905

JohannesGaessler committed

CUDA: FA support for DeepSeek (Ampere or newer) (llama/13306)
507d30c

JohannesGaessler committed

CUDA: fix crash on large batch size for MoE models (llama/13384)
2eca371

JohannesGaessler committed

cuda : remove nrows_x in mul_mat_q_process_tile (llama/13325)
0fd6120

R0CKSTAR committed

CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (llama/13135)
9fb68a1

JohannesGaessler committed

CUDA: fix bad asserts for partial offload (llama/13337)
23e676b

JohannesGaessler committed

CUDA: fix --split-mode row for MMQ (llama/13323)
1136116

JohannesGaessler committed

CUDA: fix logic for clearing padding with -ngl 0 (llama/13320)
c3e51a2

JohannesGaessler committed

CUDA: fix race condition in MMQ stream-k fixup (llama/13299)
160742f

JohannesGaessler committed

CUDA: fix race condition in MMQ ids_dst (llama/13294)
d249810

JohannesGaessler committed

build : fix build info on windows (llama/13239)
415b9fc

Diego Devesa committed

whisper: remove MSVC warnings pragmas (#3090)
e0d130c

danbev committed

CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199)
a867083

JohannesGaessler committed

CUDA: fix non-cont. inputs for batched mat mul (llama/13155)
d13b876

JohannesGaessler committed

musa: fix typo in cc control (llama/13144)
5fb7320

R0CKSTAR committed

CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137)
e9c9d4b

JohannesGaessler committed

musa: fix build warning (llama/13129)
3436ba4

R0CKSTAR committed

cuda : fix unused variable compile warning (#0)
a1f4201

ggerganov committed

CUDA: use switch statements in constexpr functions (llama/13095)
f5cd546

JohannesGaessler committed
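
A quick illustration of the pattern named above (a generic sketch with hypothetical names, not ggml's actual symbols): inside a constexpr function, a switch over an enum keeps every case explicit and lets the compiler warn about unhandled values, which chained ternaries cannot.

```cpp
// Hypothetical sketch: pick a compile-time tile size per quantization type.
// The enum and function names are illustrative, not ggml's real identifiers.
enum class qtype { Q4_0, Q8_0, F16 };

constexpr int mmq_tile_k(qtype t) {
    switch (t) {              // a switch in constexpr needs C++14 or later
        case qtype::Q4_0: return 32;
        case qtype::Q8_0: return 32;
        case qtype::F16:  return 16;
    }
    return 0; // unreachable for valid enum values
}

static_assert(mmq_tile_k(qtype::F16) == 16, "evaluated entirely at compile time");
```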

CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014)
285a334

JohannesGaessler committed

graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
fb0d243

ggerganov committed

ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (llama/12970)
3944ae5

Alan Gray committed

CUDA/HIP: Share the same unified memory allocation logic (llama/12934)
143cb70

David Huang committed

ggml: disable CUDA graphs for unsupported DUP and CONT node types (llama/12891)
9e42c4d

Alan Gray committed

cuda : add f32 to bf16 copy op (llama/12806)
9dcb047

Sigbjørn Skjæret committed
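
For context on the copy op above, here is a minimal sketch of what an element-wise f32 -> bf16 copy kernel can look like, assuming contiguous buffers (ggml's real copy ops also handle non-contiguous tensors via strides). `__float2bfloat16` is the standard conversion intrinsic from `<cuda_bf16.h>`; the kernel and variable names are illustrative.

```cpp
#include <cuda_bf16.h>
#include <cstdint>

// One thread per element; assumes contiguous source and destination.
static __global__ void cpy_f32_bf16(const float * x, __nv_bfloat16 * dst, int64_t n) {
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = __float2bfloat16(x[i]); // round-to-nearest-even narrowing
    }
}

// Launch sketch:
//   const int block = 256;
//   cpy_f32_bf16<<<(n + block - 1) / block, block, 0, stream>>>(x, dst, n);
```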

ggml : add bilinear upscale support (ggml/1185)
4c5e449

Diego Devesa committed
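
The bilinear upscale itself is standard image resampling; as a rough sketch (the generic algorithm for a single-channel f32 image, not ggml's kernel), each destination pixel maps back to fractional source coordinates and blends its four nearest neighbors:

```cpp
// Generic bilinear upscale sketch: src is sw x sh, dst is dw x dh.
static __global__ void upscale_bilinear(const float * src, float * dst,
                                        int sw, int sh, int dw, int dh) {
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dw || y >= dh) return;

    // Map destination pixel centers back into source space, clamped to bounds.
    const float sx = fminf(fmaxf((x + 0.5f) * sw / (float) dw - 0.5f, 0.0f), sw - 1.0f);
    const float sy = fminf(fmaxf((y + 0.5f) * sh / (float) dh - 0.5f, 0.0f), sh - 1.0f);

    const int   x0 = (int) sx,            y0 = (int) sy;
    const int   x1 = min(x0 + 1, sw - 1), y1 = min(y0 + 1, sh - 1);
    const float fx = sx - x0,             fy = sy - y0;

    // Blend the four neighbors: horizontally first, then vertically.
    const float top = src[y0 * sw + x0] * (1.0f - fx) + src[y0 * sw + x1] * fx;
    const float bot = src[y1 * sw + x0] * (1.0f - fx) + src[y1 * sw + x1] * fx;
    dst[y * dw + x] = top * (1.0f - fy) + bot * fy;
}
```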

cuda : fix HIP and MUSA BF16 (llama/0)
6dc5583

ggerganov committed

musa: fix compilation warnings in mp_22/31 (llama/12780)
090ad80

R0CKSTAR committed

CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)
5d7a13f

Gaurav Garg and JohannesGaessler committed

fix MUSA compiler warning (llama/12704)
8d43aa6

a3sh committed

Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)
a2fdbe6

Alan Gray and slaren committed

CUDA: don't convert BF16 weights to FP32 (ggml/1174)
332bcaf

Sigbjørn Skjæret committed

ggml : faster ssm scan (llama/10558)
a18cd16

a3sh committed

musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in CI and update docs (llama/12611)
12bb60d

R0CKSTAR committed

metal : improve FA + improve MoE (llama/12612)
04a3389

ggerganov committed

files : remove old wkv6 (#0)
ee92ae5

ggerganov committed

HIP: Add support for RDNA4 targets (llama/12372)
a73f01f

Slobodan Josic committed

CUDA: Fix clang warnings (llama/12540)
efa6dac

R0CKSTAR committed

musa: refine compute capability (llama/12493)
5e508d2

R0CKSTAR committed

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19

Gaurav Garg and JohannesGaessler committed

musa: override warp_size of musa device to 32 (llama/12445)
184c152

R0CKSTAR committed

llama: Add support for RWKV v7 architecture (llama/12412)
727de7e

mollysama committed

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c

Gaurav Garg committed
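
For readers unfamiliar with CUDA Graphs: the runtime records work submitted to a stream and then replays it as a single unit, cutting per-launch overhead. A minimal capture/replay sketch with the stock runtime API (not ggml's wrapper code) looks like this, using the pre-CUDA-12 `cudaGraphInstantiate` signature:

```cpp
#include <cuda_runtime.h>

void run_with_graph(cudaStream_t stream) {
    cudaGraph_t     graph = nullptr;
    cudaGraphExec_t exec  = nullptr;

    // Record instead of execute: kernels launched here go into the graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    // ... launch kernels on `stream` as usual ...
    cudaStreamEndCapture(stream, &graph);

    // Pre-CUDA-12 signature; CUDA 12 replaced it with (exec, graph, flags).
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(exec, stream); // replay all recorded work in one call

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
}
```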

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
2adc060

uvos committed
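
The fattn-vec fix above concerns kernels that hard-code a warp size of 32, which breaks on HIP devices whose wavefronts are 64 lanes wide. As a hedged sketch of the underlying issue (not the actual ggml code), a butterfly reduction must be parameterized by the device's real warp size:

```cpp
// Correct for any power-of-two warp_size, instead of assuming 32.
template <int warp_size>
static __device__ float warp_reduce_sum(float x) {
    #pragma unroll
    for (int offset = warp_size / 2; offset > 0; offset >>= 1) {
        x += __shfl_xor_sync(0xffffffff, x, offset, warp_size);
    }
    return x;
}
// CUDA instantiates warp_reduce_sum<32>; a HIP build on CDNA hardware
// would need width 64 and that platform's 64-lane shuffle intrinsic.
```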

CUDA/HIP: refactor mmvq to unify the calculation of nwarps and rows per block between host and device code (llama/12177)
1f75790

uvos and JohannesGaessler committed

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
4dc8a81

JohannesGaessler committed