Commit History
CUDA: fix misaligned synchronization in FA (llama/13469)
40840d0
CUDA: fix crash with partial offloading of MoE (llama/13439)
26820f6
CUDA: fix race conditions FlashAttention kernels (llama/13438)
20644bf
CUDA: fix FlashAttention on Turing (llama/13415)
e32d905
CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
507d30c
CUDA: fix crash on large batch size for MoE models (llama/13384)
2eca371
cuda : remove nrows_x in mul_mat_q_process_tile (llama/13325)
0fd6120
R0CKSTAR committed
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (llama/13135)
9fb68a1
CUDA: fix bad asserts for partial offload (llama/13337)
23e676b
CUDA: fix --split-mode row for MMQ (llama/13323)
1136116
CUDA: fix logic for clearing padding with -ngl 0 (llama/13320)
c3e51a2
CUDA: fix race condition in MMQ stream-k fixup (llama/13299)
160742f
CUDA: fix race condition in MMQ ids_dst (llama/13294)
d249810
build : fix build info on windows (llama/13239)
415b9fc
Diego Devesa committed
whisper: remove MSVC warnings pragmas (#3090)
e0d130c
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199)
a867083
CUDA: fix non-cont. inputs for batched mat mul (llama/13155)
d13b876
musa: fix typo in cc control (llama/13144)
5fb7320
R0CKSTAR committed
CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137)
e9c9d4b
musa: fix build warning (llama/13129)
3436ba4
R0CKSTAR committed
cuda : fix unused variable compile warning (#0)
a1f4201
CUDA: use switch statements in constexpr functions (llama/13095)
f5cd546
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014)
285a334
graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
fb0d243
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (llama/12970)
3944ae5
Alan Gray committed
CUDA/HIP: Share the same unified memory allocation logic. (llama/12934)
143cb70
David Huang committed
ggml: disable CUDA graphs for unsupported DUP and CONT node types (llama/12891)
9e42c4d
Alan Gray committed
cuda : add f32 to bf16 copy op (llama/12806)
9dcb047
Sigbjørn Skjæret committed
ggml : add bilinear upscale support (ggml/1185)
4c5e449
Diego Devesa committed
cuda : fix HIP and MUSA BF16 (llama/0)
6dc5583
musa: fix compilation warnings in mp_22/31 (llama/12780)
090ad80
R0CKSTAR committed
CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)
5d7a13f
fix MUSA compiler warning (llama/12704)
8d43aa6
a3sh committed
Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)
a2fdbe6
Alan Gray and slaren committed
CUDA: don't convert BF16 weights to FP32 (ggml/1174)
332bcaf
Sigbjørn Skjæret committed
ggml : faster ssm scan (llama/10558)
a18cd16
a3sh committed
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611)
12bb60d
R0CKSTAR committed
metal : improve FA + improve MoE (llama/12612)
04a3389
files : remove old wkv6 (#0)
ee92ae5
HIP: Add support for RDNA4 targets (llama/12372)
a73f01f
Slobodan Josic committed
CUDA: Fix clang warnings (llama/12540)
efa6dac
R0CKSTAR committed
musa: refine compute capability (llama/12493)
5e508d2
R0CKSTAR committed
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19
musa: override warp_size of musa device to 32 (llama/12445)
184c152
R0CKSTAR committed
llama: Add support for RWKV v7 architecture (llama/12412)
727de7e
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c
Gaurav Garg committed
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
2adc060
uvos committed