Commit History
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
7aa3216
unverified
Slight quantization improvement for Q4_K and Q5_K (llama/5361)
e3cd020
unverified
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
ae45b38
unverified
ggml : make use of ggml-quants.h possible in C++ code (llama/5338)
963ade6
unverified
ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325)
9bb2b0a
unverified
iq2_xxs: tune quantization (llama/5320)
11e5f6b
unverified
cuda : fix LLAMA_CUDA_F16 (llama/5262)
5fd8fb7
unverified
slaren
committed on
metal : add im2col F32 dst support (llama/5132)
26aec77
unverified
llava : add MobileVLM support (llama/5132)
f17a416
unverified
JidongZhang-THU
slaren
committed on
ggml : limit n_threads to the max n_tasks (llama/5238)
2645c33
unverified
slaren
committed on
kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226)
0c9c434
unverified
ggml : add abort_callback for cpu backend (ggml/725)
a8ea91b
unverified
Michael Podvitskiy
committed on
extra : update sync scripts
d99e873
unverified
server : allow CORS request with authorization headers (#1850)
16a6639
unverified
Valentin Gosu
committed on
whisper : expose CUDA device setting in public API (#1840)
d13ee66
unverified
Didzis Gosko
committed on
make : add macOS deployment target option (#1839)
9c90601
unverified
Didzis Gosko
committed on
talk-llama : stream response (#1121)
2193f2b
unverified
sync : ggml (#0)
fded75b
unverified
ggml : fix IQ3_XXS on Metal (llama/5219)
f066321
unverified
sync : ggml (llama/0)
cdb7964
unverified
SOTA 3-bit quants (llama/5196)
4649943
unverified
ggml alloc: Fix for null dereference on alloc failure (llama/5200)
8181686
unverified
Paul Tsochantaris
committed on
Nomic Vulkan backend (llama/4456)
f5fd92d
unverified
ggml : add max buffer sizes to opencl and metal backends (llama/5181)
3d354d0
unverified
slaren
committed on
metal : free metal objects (llama/5161)
ea7167a
unverified
Paul Tsochantaris
committed on
gguf : fix comparison (ggml/715)
80cfca4
unverified
`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
75d438c
unverified
John Balis
slaren
committed on
gguf : add input validation, prevent integer overflows (ggml/709)
5bf1614
unverified
ci : fix yolo URLs + fix metal capture (ggml/712)
588f789
unverified
metal : add debug capture backend function (ggml/694)
ece88c3
unverified
common : fix wav buffer detection (#1819)
bc84057
unverified
server : add fields to `verbose_json` response (#1802)
763d09d
unverified
make : update MSYS_NT (#1813)
587152f
unverified
talk-llama : sync llama.cpp
1453539
unverified
sync : ggml
278a9b3
unverified
ggml : add Vulkan backend (llama/2059)
5a97aba
unverified
ggml : minor type fix (int64_t -> size_t)
1bbb1a9
unverified
common : fix input buffer check (#1812)
6c38a7f
unverified
talk-llama : sync llama.cpp
92cfd93
unverified
sync : ggml
5a9540e
unverified
Add OpenCL add kernel (llama/5151)
f833987
unverified
cuda : fix tensor size calculation for non-split buffer (llama/5145)
8f3eb65
unverified
slaren
committed on
ggml-alloc : add 10% margin to the buffer sizes (llama/5149)
c55bdf8
unverified
slaren
committed on
ggml : update softmax n_task calculation (llama/5126)
3a3eb8e
unverified
snadampal
committed on
metal : remove unused `n_buffers` and `buffers` (llama/5129)
a3e87d3
unverified
Paul Tsochantaris
committed on