Commit History

docs : replace typo "]" with ")" in README (#3179)
5e8b0f0
unverified

Alpaim committed

whisper : remove redundant assignments (#3178)
ec40497
unverified

danbev committed

whisper : update CMakeLists.txt to handle deprecated GPU warnings (#3163)
2ee9c36
unverified

Jugal Haresh Sheth committed

ruby : add GGML_SYCL_DNN option to ruby bindings (#3172)
94d5ce3
unverified

danbev committed

talk-llama : sync llama.cpp
44ee199

ggerganov committed

CANN: Support MOE Model MUL_MAT_ID (llama/13042)
f013e2d

Chenguang Li committed

cmake: use the current build config for vulkan-shaders-gen (llama/13595)
7681e32

Gilad S. committed

vulkan: move common FA code to flash_attn_base.comp (llama/13556)
ad8b504

jeffbolznv committed

vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554)
97d9aa6

jeffbolznv committed

metal : add FA-vec kernel for head size 64 (llama/13583)
36a3b4e

ggerganov committed

sycl : fixed compilation warnings (llama/13582)
5037d84

Łukasz Ślusarczyk committed

gguf : use ggml log system (llama/13571)
a2211c9

Diego Devesa committed

sycl: simplify bin_bcast_kernel (llama/13383)
c39b646

Atharva Dubey committed

sycl: reordered Q4_K MMVQ (llama/13109)
6ca3a47

Svetlozar Georgiev committed

sycl: use oneDNN for matrices multiplication (llama/12972)
2008e08

Łukasz Ślusarczyk committed

arm64: optimize q6_k_q8_k kernel with i8mm (llama/13519)
03048ea

Yibo Cai committed

CUDA: fix crash on large batch size for quant. MoE (llama/13537)
df90a14

JohannesGaessler committed

CUDA: faster Deepseek FA, add Turing support (llama/13435)
ace16dc

JohannesGaessler committed

cmake: simplify vulkan shader test logic (llama/13263)
f8fd66d

bandoti committed

vulkan: KHR_coopmat flash attention (llama/13506)
4d1bd4f

jeffbolznv committed

vulkan: workaround FA compile failures on macos (llama/13517)
06833bc

jeffbolznv committed

metal : use FA-vec kernel up to batch size 20 (llama/13496)
e925f17

ggerganov committed

metal : optimize multi-sequence FA vec kernel (llama/13493)
d2f915d

ggerganov committed

ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509)
7463545

Dan Johansson committed

mnist: fix segmentation fault (ggml/1227)
341f451

JohannesGaessler committed

ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)
5c0b540

Diego Devesa committed

ggml : Fix missing backtrace on Linux (ggml/1228)
82ee857

Daniel Tang committed

examples : add vad-speech-segments to win warns [no ci] (#3170)
90d9ecb
unverified

danbev committed

vad : return early if no vad segments are detected (#3158)
a28f11e
unverified

danbev committed

vad : store VAD context in whisper_state (#3156)
821d05f
unverified

danbev committed

whisper : add build_*/ to .gitignore [no ci] (#3157)
1374002
unverified

danbev committed

examples : add --print-confidence option to cli (#3150)
2d83266
unverified

danbev committed

vad : add download-vad-model scripts (#3149)
a40b758
unverified

danbev committed

server : add --flash-attn usage output (#3152)
8e966a8
unverified

danbev committed

talk-llama : sync llama.cpp
05d6d9c

ggerganov committed

whisper : update to ggml-backend changes (#0)
b12517c

ggerganov committed

ggml : add mrope kernel for metal (llama/13457)
27b32e6

ngxson committed

metal : optimize MoE for large batches (llama/13388)
d51c0d3

ggerganov committed

opencl: remove unnecessary assert for `add` (llama/13257)
a245fbf

lhez committed

llama/ggml: add LLM training support (llama/10544)
8d3b3c1

JohannesGaessler committed

ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (llama/13053)
0612f1f

Dan Johansson and Charles Xu committed

CUDA: fix misaligned synchronization in FA (llama/13469)
40840d0

JohannesGaessler committed

enable dpcpp nightly builds with libraries (llama/13406)
c9c1196

Atharva Dubey committed

CUDA: fix crash with partial offloading of MoE (llama/13439)
26820f6

JohannesGaessler committed

Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (llama/13386)
418769d

David Huang committed

CUDA: fix race conditions FlashAttention kernels (llama/13438)
20644bf

JohannesGaessler committed

CUDA: fix FlashAttention on Turing (llama/13415)
e32d905

JohannesGaessler committed

vulkan: scalar flash attention implementation (llama/13324)
3331abd

jeffbolznv committed