Commit History
llama/ggml: add LLM training support (llama/10544) 8d3b3c1
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (llama/13386) 418769d
David Huang committed on
CUDA: fix bad asserts for partial offload (llama/13337) 23e676b
CUDA: fix logic for clearing padding with -ngl 0 (llama/13320) c3e51a2
CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137) e9c9d4b
ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (llama/13107) c47823e
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943) 691c071
ggml : fix ggml_gallocr_ptr type (ggml/1205) cf46d5c
Diego Devesa committed on
rpc : add RPC_CMD_HELLO (llama/12955) ff22836
ggml : Depthwise 2D convolution (ggml/1152) 0c950d5
ggml : add bilinear upscale support (ggml/1185) 4c5e449
Diego Devesa committed on
ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) ba7a5f8
Diego Devesa committed on
metal : improve FA + improve MoE (llama/12612) 04a3389
rpc : send hash when tensor data is above some fixed threshold (llama/12496) c39f9c4
llama: Add support for RWKV v7 architecture (llama/12412) 727de7e
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154) 05466a9
Rémy O committed on
ggml : portability fixes for VS 2017 (llama/12150) 49e3343
Marcus Groeber committed on
ggml : upgrade init_tensor API to return a ggml_status (llama/11854) d6b6852
William Tambellini and slaren committed on
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019) 4aa54ec
Aaron Teo, Jinyang He, and junchao-zhao committed on
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390) 9de6d81
Charles Xu committed on
repo : update links to new url (llama/11886) 9705bb5
cleanup: fix compile warnings associated with gnu_printf (llama/11811) ef6a968
bandoti committed on
vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494) 762f497
CUDA: use mma PTX instructions for FlashAttention (llama/11583) f328957
rpc : early register backend devices (llama/11262) 4134077
CUDA: backwards pass for misc. ops, add tests (llama/11257) 2fbcec1
RoPE: fix back, CUDA support for back + noncont. (llama/11240) 131a21e
GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) 92311a3
GGUF: C++ refactor, backend support, misc fixes (llama/11030) 21c5b64
tts : add OuteTTS support (llama/10784) 8d0f0ac
Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693) 83a0899
lhez, Skyler Szot, Shangqing Gu, Alexander Angus, Hongqiang Wang, and Max Krasnyansky committed on
ggml: load all backends from a user-provided search path (llama/10699) c6de218
Gilad S and Diego Devesa committed on
ggml : refactor online repacking (llama/10446) 163128e
ggml : remove old files (skip) (#0) 6284570
ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) 154bbc0
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) bf73242
ggml : add support for dynamic loading of backends (llama/10469) b73266f
ggml: new optimization interface (ggml/988) dd33ace
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (llama/9921) 3541ee8
Charles Xu and Diego Devesa committed on
ggml : build backends as libraries (llama/10256) 3dc93f3
metal : optimize FA kernels (llama/10171) 44ff932
ggml : move CPU backend to a separate file (llama/10144) 0f447f2
Diego Devesa committed on
llama : add simple-chat example (llama/10124) 41ff26f
Diego Devesa and Xuan Son Nguyen committed on
llama : use smart pointers for ggml resources (llama/10117) 6b82135
Diego Devesa committed on