Commit History

metal : optimize MoE for large batches (llama/13388)
d51c0d3

ggerganov committed on

llama/ggml: add LLM training support (llama/10544)
8d3b3c1

JohannesGaessler committed on

CUDA: fix bad asserts for partial offload (llama/13337)
23e676b

JohannesGaessler committed on

ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (llama/13107)
c47823e

sxx-404 committed on

ggml : fix trailing whitespaces (llama/0)
5d27bbf

ggerganov committed on

ggml : Depthwise 2D convolution (ggml/1152)
0c950d5

Acly committed on

ggml : add bilinear upscale support (ggml/1185)
4c5e449

Diego Devesa committed on

ggml : add more generic custom op, remove deprecated custom ops (ggml/1183)
ba7a5f8

Diego Devesa committed on

llama : add option to override model tensor buffers (llama/11397)
3d000b6

Diego Devesa committed on

metal : improve FA + improve MoE (llama/12612)
04a3389

ggerganov committed on

llama: Add support for RWKV v7 architecture (llama/12412)
727de7e

mollysama committed on

ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
c9a49f9

vmobilis committed on

ggml : portability fixes for VS 2017 (llama/12150)
49e3343

mgroeber9110 Marcus Groeber committed on

ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
4aa54ec

Aaron Teo Jinyang He junchao-zhao committed on

fix: typos in documentation files (llama/11791)
5c6d350

Maxim Evtush committed on

CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)
855a9fe

JohannesGaessler committed on

CUDA: backwards pass for misc. ops, add tests (llama/11257)
2fbcec1

JohannesGaessler committed on

RoPE: fix back, CUDA support for back + noncont. (llama/11240)
131a21e

JohannesGaessler committed on

ggml : add option to not print stack on abort (ggml/1081)
9b2706e

William Tambellini Diego Devesa committed on

GGUF: C++ refactor, backend support, misc fixes (llama/11030)
21c5b64

JohannesGaessler committed on

tts : add OuteTTS support (llama/10784)
8d0f0ac

ggerganov committed on

tests: add tests for GGUF (llama/10830)
e7722cb

JohannesGaessler committed on

llama : add Qwen2VL support + multimodal RoPE (llama/10361)
219d12b

RzZ ggerganov committed on

ggml : add check for grad_accs (ggml/1046)
eacc95c

danbev committed on

ggml : refactor online repacking (llama/10446)
163128e

Djip007 ggerganov committed on

ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034)
154bbc0

PABannier committed on

ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
bf73242

shupeif committed on

ggml : add support for dynamic loading of backends (llama/10469)
b73266f

Diego Devesa ggerganov committed on

cuda : optimize argmax (llama/10441)
69ae50d

Diego Devesa JohannesGaessler committed on

ggml-opt: fix data corruption (ggml/1022)
a916e92

JohannesGaessler committed on

ggml : fix compile warnings (llama/0)
80d6ec0

ggerganov committed on

ggml: new optimization interface (ggml/988)
dd33ace

JohannesGaessler committed on

ggml : fix some build issues
c5ba1d1

slaren committed on

ggml : build backends as libraries (llama/10256)
3dc93f3

Diego Devesa ggerganov R0CKSTAR committed on

metal : optimize FA kernels (llama/10171)
44ff932

ggerganov committed on

Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (llama/10133)
f58e658

Zhiyuan Li ggerganov Diego Devesa pacominev Yuri Khrustalev Meng, Hengyu committed on

ggml : adjust is_first_call init value (llama/10193)
7e2b09b

ggerganov committed on

ggml : fix arch check in bf16_to_fp32 (llama/10164)
09e4a9b

Diego Devesa committed on

ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (llama/10167)
ba20d5c

Diego Devesa committed on

ggml : move CPU backend to a separate file (llama/10144)
0f447f2

Diego Devesa committed on

ggml : remove ggml_scratch (llama/10121)
3f0b7ba

ggerganov committed on

llama : fix buffer checks for mamba and rwk (llama/10111)
9df9767

Diego Devesa committed on

ggml : check tensor name lengths in gguf files (llama/10100)
0b78224

Diego Devesa committed on

ggml : fix memory leaks when loading invalid gguf files (llama/10094)
f9baffc

Diego Devesa committed on

llama : refactor model loader with backend registry (llama/10026)
582a21e

Diego Devesa committed on

CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021)
bcbaad3

JohannesGaessler committed on

ggml : add asserts for type conversion in fattn kernels (llama/9971)
9542e42

ggerganov committed on

add amx kernel for gemm (llama/8998)
db52137

mingfeima committed on

fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875)
cf75979

Gilad S committed on