Commit History

talk-llama : sync llama.cpp
e8e18fb

ggerganov committed

whisper : use ggml_backend_sched (#2239)
bfa5a95

ggerganov and slaren committed

fix : remove extra files
1b0dec0

ggerganov committed

scripts : sync ggml-blas
463e11c

ggerganov committed

build : update make / cmake
0b4241c

ggerganov committed

sync : ggml
89ada87

ggerganov committed

move BLAS to a separate backend (cont) (llama/6210)
4b26445

slaren committed

Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
d0120b1

OccamRazor committed

scripts : stop sync whisper example from ggml
f174613

ggerganov committed

cmake : fix sycl build (#0)
9a475af

ggerganov committed

ggml : remove OpenCL (#0)
d303fe3

ggerganov committed

sycl : sync (#0)
f580c99

ggerganov committed

cuda : enable CUDA graphs (#0)
d075551

ggerganov committed

talk-llama : sync llama.cpp
7e268a7

ggerganov committed

cmake : fix CUDA build (#0)
ddc04a3

ggerganov committed

sync : ggml
305dc4e

ggerganov committed

ggml : fix and optimize ppc64le (ggml/849)
e3d09d2

Hong Bo PENG committed

ggml : remove duplicate include of ggml-common.h (ggml/853)
8c3ae74

danbev committed

remove global variables (llama/7710)
4cb73ba

hengyu committed

CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
5931562

JohannesGaessler committed

metal : utilize max shared memory for mul_mat_id (llama/7935)
d4b3604

ggerganov committed

rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
56e6751

rgerganov committed

move BLAS to a separate backend (llama/6210)
c773aa9

slaren and ggerganov committed

CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
efbb7be

JohannesGaessler committed

tests : add non-cont unary tests (llama/7857)
6dc2887

ggerganov committed

ggml : improve ggml_is_contiguous logic (llama/7856)
ea3aa71

ggerganov committed

vulkan: select only one device for single gpu with multiple drivers (llama/7582)
ee56a37

Adriankhl committed

Update Vulkan RoPE implementation (llama/7818)
71850e7

OccamRazor and slaren committed

CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
154bf2b

JohannesGaessler committed

CUDA: use tensor cores for MMQ (llama/7676)
78a5b67

JohannesGaessler committed

use the correct SYCL context for host USM allocations (llama/7777)
9f87c2f

bashbaug committed

CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
fcfd59e

JohannesGaessler committed

vulkan : reuse parent extra for views (llama/7806)
b9b60de

slaren and OccamRazor committed

fix softmax r2r result wrong issue (llama/7811)
c3a7159

PPxin committed

CUDA: refactor mmq, dmmv, mmvq (llama/7716)
849ff52

JohannesGaessler committed

ggml : refactor rope norm/neox (llama/7634)
ded0c68

ggerganov committed

Allow number of nodes in CUDA graph to change (llama/7738)
6124287

agray3 committed

ggml : remove OpenCL (llama/7735)
4ff3b72

ggerganov committed

ggml : prevent builds with -ffinite-math-only (llama/7726)
154f0f8

ggerganov committed

llama : offload to RPC in addition to other backends (llama/7640)
eab8082

rgerganov and slaren committed

ggml : use OpenMP as a thread pool (llama/7606)
7e5d850

Masaya, Kato, slaren, and ggerganov committed

Vulkan Mixture of Experts (MoE) support (llama/7628)
ad9ee26

OccamRazor committed

kompute : implement op_getrows_f32 (llama/6403)
fa0872f

woachk committed

fix bug introduced in using calloc (llama/7701)
f22c7e4

Dave Airlie committed

Fix FlashAttention debug test, FP32 assert (llama/7684)
1bed92f

JohannesGaessler committed

CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
d4c0faf

JohannesGaessler committed

CUDA: quantized KV support for FA vec (llama/7527)
315df8c

JohannesGaessler committed

ggml : fix loongson compile warnings (llama/7537)
c1442f3

ggerganov and junchao-loongson committed

faster avx512 exp implementation (llama/7551)
6dbbbab

chriselrod committed

ggml : fix loongarch build (O2 issue) (llama/7636)
133ffbf

junchao-loongson committed