ggml : use dynamic thread scheduling for matrix multiplication (llama/6915) 6f8daf7 kunnis committed on May 15, 2024
ggml : add `ggml_upscale_ext` (ggml/814) 04a5333 John Balis ggerganov committed on May 15, 2024
metal : support FA without mask + add asserts (llama/7278) 98ce302 ggerganov committed on May 14, 2024
ggml : add Flash Attention (llama/5021) 34d3b03 ggerganov JohannesGaessler phymbert committed on Apr 30, 2024
gguf : enforce that tensor names are unique (llama/6905) 22e446d Xuan Son Nguyen slaren committed on Apr 28, 2024
gguf : fix mismatch between alloc and free functions (llama/6929) d8fb433 slaren committed on Apr 26, 2024
Merge pull request from GHSA-p5mv-gjc5-mwqv 72b368d ggerganov slaren committed on Apr 26, 2024
ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (llama/6906) f900de6 ggerganov committed on Apr 25, 2024
ggml : group all experts in a single ggml_mul_mat_id (llama/6505) f0b5c67 slaren ggerganov committed on Apr 18, 2024
ggml : fix llamafile sgemm wdata offsets (llama/6710) 5e756db ggerganov committed on Apr 16, 2024
llama : add gguf_remove_key + remove split meta during quantize (llama/6591) 1706870 jiez z5269887 committed on Apr 12, 2024
llama : add Command R Plus support (llama/6491) 8cf7097 Carolinabanana S S slaren ggerganov committed on Apr 9, 2024
ggml : mul_mat_id use the same tensor for all the experts (llama/6387) 26fdc9f slaren ggerganov committed on Apr 3, 2024
Vulkan k-quant mmq and ggml-backend offload functionality (llama/6155) 1ff7b08 OccamRazor committed on Mar 29, 2024
ggml : fix bounds checking of zero size views (llama/6347) 80db462 slaren committed on Mar 27, 2024
llama : add pipeline parallelism support (llama/6017) b5bb3f3 slaren compilade ggerganov committed on Mar 13, 2024
ggml, ci : Windows ARM runner and build fixes (llama/5979) 507b9dd Michael Podvitskiy committed on Mar 11, 2024
ggml : remove old quantization functions (llama/5942) 11a2545 ggerganov committed on Mar 9, 2024
llama : support Mamba Selective State Space Models (llama/5328) 224fbc2 compilade committed on Mar 8, 2024
ggml : use SYS_get_cpu if SYS_getcpu is not defined (llama/5906) 909dbdc Cebtenzzre committed on Mar 6, 2024
ggml : introduce ggml_status (ggml/750) 151c676 Michael Podvitskiy slaren ggerganov committed on Mar 4, 2024
add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) dd8e3f9 leejet ggerganov slaren committed on Mar 3, 2024
ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760) 9a07f42 Kawrakow ikawrakow committed on Feb 28, 2024
IQ4_XS: a 4.25 bpw quantization (llama/5747) 0ee1bfb Kawrakow ikawrakow committed on Feb 27, 2024
Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (llama/5721) 2b9bb9e Kawrakow ikawrakow ggerganov committed on Feb 26, 2024
code : normalize enum names (llama/5697) 93e0830 ggerganov committed on Feb 25, 2024
IQ3_S: a much better alternative to Q3_K (llama/5676) 32589c9 Kawrakow ikawrakow committed on Feb 24, 2024
ggml : always define ggml_fp16_t as uint16_t (llama/5666) bc567d3 ggerganov committed on Feb 22, 2024
ggml : compute forward no longer pass src tensors (ggml/729) 4e31c82 Siddharth Ramakrishnan siddharthvader committed on Feb 21, 2024
ggml : android and old glibc NUMA incompatibility bugfixes (llama/5557) 0206c2d bmwl root committed on Feb 19, 2024
ggml, common, examples, tests : fixed type arguments in printf (llama/5528) 2f3a004 germanaizek committed on Feb 18, 2024