Commit History

Allow multiple copy function pointers for CUDA graph kernel param updates (llama/7565)
143f6df

agray3 committed on

Fix q_xxs using mul_mat_q (llama/7459)
0be4f48

AidanBeltonS committed on

Add freq factors (llama/7495)
340b830

AidanBeltonS committed on

metal : add GGML_OP_REPEAT kernels (llama/7557)
0534b5d

ggerganov committed on

metal : disable FA kernel for HS=256 (llama/7556)
0c32e28

ggerganov committed on

ggml : restore ggml_rope_xpos_inplace (ggml/0)
0641dee

ggerganov committed on

ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama/7433)
51f504f

Masaya, Kato committed on

ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0)
9f41704

ggerganov committed on

ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
4005bca

ggerganov committed on

ggml : drop support for QK_K=64 (llama/7473)
8737d46

ggerganov committed on

Update vulkan rope implementation to support frequency factors (llama/7475)
be0ec58

OccamRazor committed on

CUDA: fix FA out-of-bounds reads (llama/7479)
b38d0f9

JohannesGaessler committed on

CUDA: fix FA out-of-bounds writes (llama/7465)
2e26e3a

JohannesGaessler committed on

cuda : fix compile warning (llama/7454)
58db6c8

ggerganov committed on

CUDA: remove incorrect precision check (llama/7454)
eb4b5e0

JohannesGaessler committed on

cuda : fix rope + add tests (llama/7452)
215ce5c

ggerganov committed on

llama : add phi3 128K model support (llama/7225)
ef68527

liuwei-git and ggerganov committed on

metal : handle F16 inf values, fix FA partial offload (llama/7434)
8d153a7

ggerganov committed on

CUDA: fix unused warning in mmq.cu (llama/7442)
f16510d

JohannesGaessler committed on

CUDA: deduplicate mmq code (llama/7397)
e7b20b1

JohannesGaessler committed on

rpc : track allocated buffers (llama/7411)
925eb7a

rgerganov committed on

Update SYCL upscale operation (llama/7321)
3984ba6

AidanBeltonS committed on

ggml-opencl, llama: using reserve() if count already known (llama/7272)
8325ed5

germanaizek committed on

ggml : add loongarch lsx and lasx support (llama/6454)
9794ea7

junchao-loongson and Jinyang He committed on

Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (llama/7258)
cf52931

Srihari-mcw committed on

Vulkan Embedding Fix (llama/7360)
2bfeba3

OccamRazor committed on

ggml : fix another case of quants nans (llama/7387)
645c367

slaren committed on

ggml: implement quantized KV cache for FA (llama/7372)
aef1b4b

JohannesGaessler committed on

cuda : clear error after buffer allocation failure (llama/7376)
b7f6691

slaren committed on

Capture CUDA logging output (llama/7298)
3519475

fraxy-v and slaren committed on

android : use "ci-android" branch for CI (llama/7341)
ff9d573

ggerganov committed on

CUDA: deduplicate FlashAttention code (llama/7352)
65ab3e8

JohannesGaessler committed on

cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
ad83dfd

Engininja2 committed on

Update and fix Vulkan soft_max and argsort implementations (llama/7237)
a0218a3

OccamRazor committed on

ggml : fix quants nans when all the group weights are very close to zero (llama/7313)
b57bcbc

slaren committed on

CUDA: faster large batch FA without tensor cores (llama/7314)
a6d9f2d

JohannesGaessler committed on

rpc : set SO_REUSEADDR for the server socket (llama/7320)
195fe29

rgerganov committed on

ggml-quants, llama : removed excess checks (llama/7274)
142d95e

germanaizek committed on

ggml : rewrite silu and softmax for cpu (llama/7154)
c78b872

Justine Tunney committed on

rpc : add command line arg for specifying backend memory
b441739

rgerganov committed on

Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (llama/7191)
c917076

Max Krasnyansky and ggerganov committed on

ggml : use dynamic thread scheduling for matrix multiplication (llama/6915)
6f8daf7

kunnis committed on

Avoid unnecessarily disabling CUDA graphs (llama/7302)
4816f6a

agray3 committed on

ggml : tag ggml_tensor::backend as deprecated (llama/7290)
1a5606e

slaren committed on

Add missing " (llama/7303)
2c417da

AidanBeltonS committed on

ggml : add `ggml_upscale_ext` (ggml/814)
04a5333

John Balis and ggerganov committed on

scripts : update sync
9e35f6d

ggerganov committed on

whisper : use ggml-cuda in mel calc, set appropriate device (#2236)
93af41a

stanimirovb committed on

cuda : fix HIPBLAS build (#2234)
a8eb666

ggerganov committed on