Commit History
bfa5a95  whisper : use ggml_backend_sched (#2239)
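For context, ggml_backend_sched is ggml's multi-backend graph scheduler: it splits a compute graph across several backends and copies tensors between them as needed. A minimal sketch of the usage pattern follows; the signatures are from ggml-backend.h around the time of this sync and may differ between revisions, and gpu_backend, cpu_backend, measure_graph, and graph are assumed to be set up elsewhere:

    #include "ggml-backend.h"

    // Backends in priority order; the scheduler assigns each graph node
    // to the first backend that supports its op and buffer type.
    ggml_backend_t backends[2] = { gpu_backend, cpu_backend };       // assumed, already initialized
    ggml_backend_buffer_type_t bufts[2] = {
        ggml_backend_get_default_buffer_type(gpu_backend),
        ggml_backend_get_default_buffer_type(cpu_backend),
    };

    ggml_backend_sched_t sched =
        ggml_backend_sched_new(backends, bufts, 2, GGML_DEFAULT_GRAPH_SIZE, false);

    // Reserve compute buffers once with a worst-case graph, then reuse
    // the scheduler for every subsequent graph evaluation.
    ggml_backend_sched_reserve(sched, measure_graph);                // measure_graph: assumed
    ggml_backend_sched_graph_compute(sched, graph);                  // graph: assumed ggml_cgraph *
    ggml_backend_sched_free(sched);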
1b0dec0  fix : remove extra files
463e11c  scripts : sync ggml-blas
0b4241c  build : update make / cmake
89ada87  sync : ggml
4b26445  move BLAS to a separate backend (cont) (llama/6210)  [slaren]
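The BLAS split (llama/6210, synced above and again further down) moves BLAS-accelerated matrix multiplication out of the CPU backend into a standalone backend that can be registered with the scheduler. A rough sketch of initializing it, based on the ggml-blas.h API at the time; names may differ across revisions:

    #include "ggml-blas.h"   // only available when ggml is built with BLAS support

    ggml_backend_t blas = ggml_backend_blas_init();
    if (blas != NULL) {
        // BLAS libraries manage their own threading; this forwards the
        // desired thread count to the library (e.g. OpenBLAS).
        ggml_backend_blas_set_n_threads(blas, 8);
        // The backend is then passed to ggml_backend_sched ahead of the
        // CPU backend so that mul_mat-type ops are routed to BLAS.
    }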
d0120b1  Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
f174613  scripts : stop sync whisper example from ggml
9a475af  cmake : fix sycl build (#0)
d303fe3  ggml : remove OpenCL (#0)
f580c99  sycl : sync (#0)
d075551  cuda : enable CUDA graphs (#0)
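CUDA graphs record a stream of kernel launches once and replay the whole sequence with a single launch, cutting per-kernel launch overhead for decode steps that run the same kernels every time. A generic capture/replay sketch with the CUDA runtime API (not the ggml-cuda integration itself; run_decoder_kernels is hypothetical):

    #include <cuda_runtime.h>

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaGraph_t     graph;
    cudaGraphExec_t instance;

    // Capture all kernel launches issued on the stream into a graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    run_decoder_kernels(stream);                 // hypothetical: enqueues the per-step kernels
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay cheaply on subsequent steps.
    cudaGraphInstantiate(&instance, graph, 0);   // CUDA 12 signature; CUDA 11 takes error-node/log args
    cudaGraphLaunch(instance, stream);
    cudaStreamSynchronize(stream);

Note that an instantiated graph cannot change topology in place, which is why the llama/7738 commit further down handles a changing node count by re-capturing and re-instantiating.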
7e268a7  talk-llama : sync llama.cpp
ddc04a3  cmake : fix CUDA build (#0)
305dc4e  sync : ggml
e3d09d2  ggml : fix and optimize ppc64le (ggml/849)  [Hong Bo PENG]
8c3ae74  ggml : remove duplicate include of ggml-common.h (ggml/853)
4cb73ba  remove global variables (llama/7710)
5931562  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
d4b3604  metal : utilize max shared memory for mul_mat_id (llama/7935)
56e6751  rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
c773aa9  move BLAS to a separate backend (llama/6210)
efbb7be  CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
6dc2887  tests : add non-cont unary tests (llama/7857)
ea3aa71  ggml : improve ggml_is_contiguous logic (llama/7856)
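ggml_is_contiguous decides whether a tensor view can be treated as one flat buffer. The rule, sketched on a simplified descriptor that mirrors ggml's ne (element counts) and nb (byte strides) fields; this is an illustration for non-quantized types, not the actual ggml implementation:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // Simplified, hypothetical stand-in for ggml_tensor.
    struct tensor { int64_t ne[4]; size_t nb[4]; size_t type_size; };

    // Contiguous means: the innermost stride equals the element size, and
    // each outer stride equals the previous stride times the previous
    // dimension, i.e. the view has no padding, gaps, or permuted axes.
    static bool is_contiguous(const struct tensor * t) {
        if (t->nb[0] != t->type_size) {
            return false;
        }
        for (int i = 1; i < 4; i++) {
            if (t->nb[i] != t->nb[i-1] * (size_t) t->ne[i-1]) {
                return false;
            }
        }
        return true;
    }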
ee56a37  vulkan: select only one device for single gpu with multiple drivers (llama/7582)
71850e7  Update Vulkan RoPE implementation (llama/7818)
154bf2b  CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
78a5b67  CUDA: use tensor cores for MMQ (llama/7676)
9f87c2f  use the correct SYCL context for host USM allocations (llama/7777)
fcfd59e  CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
b9b60de  vulkan : reuse parent extra for views (llama/7806)
c3a7159  fix softmax r2r result wrong issue (llama/7811)
849ff52  CUDA: refactor mmq, dmmv, mmvq (llama/7716)
ded0c68  ggml : refactor rope norm/neox (llama/7634)
6124287  Allow number of nodes in CUDA graph to change (llama/7738)  [agray3]
4ff3b72  ggml : remove OpenCL (llama/7735)
154f0f8  ggml : prevent builds with -ffinite-math-only (llama/7726)
eab8082  llama : offload to RPC in addition to other backends (llama/7640)
7e5d850  ggml : use OpenMP as a thread pool (llama/7606)
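Using OpenMP as the thread pool replaces hand-rolled per-graph pthread create/join with a parallel region, since OpenMP runtimes keep their worker threads alive between regions. The general shape of the change, with compute_thread as a hypothetical per-thread work function:

    #include <omp.h>

    void compute_thread(int ith, int nth);   // hypothetical: processes this thread's share of graph nodes

    void graph_compute(int n_threads) {
        // Workers persist across calls, so repeated graph evaluations
        // avoid the thread startup cost of spawning pthreads each time.
        #pragma omp parallel num_threads(n_threads)
        {
            int ith = omp_get_thread_num();
            int nth = omp_get_num_threads();
            compute_thread(ith, nth);
        }
    }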
ad9ee26  Vulkan Mixture of Experts (MoE) support (llama/7628)
fa0872f  kompute : implement op_getrows_f32 (llama/6403)  [woachk]
f22c7e4  fix bug introduced in using calloc (llama/7701)  [Dave Airlie]
1bed92f  Fix FlashAttention debug test, FP32 assert (llama/7684)
d4c0faf  CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
315df8c  CUDA: quantized KV support for FA vec (llama/7527)
c1442f3  ggml : fix loongson compile warnings (llama/7537)
6dbbbab  faster avx512 exp implementation (llama/7551)
133ffbf  ggml : fix loongarch build (O2 issue) (llama/7636)  [junchao-loongson]