cuda : fix tensor size calculation for non-split buffer (llama/5145) 8f3eb65 slaren committed on Jan 26, 2024
ggml-alloc : add 10% margin to the buffer sizes (llama/5149) c55bdf8 slaren committed on Jan 26, 2024
ggml : update softmax n_task calculation (llama/5126) 3a3eb8e snadampal committed on Jan 26, 2024
metal : remove unused `n_buffers` and `buffers` (llama/5129) a3e87d3 Paul Tsochantaris committed on Jan 26, 2024
cuda : fix 2-bit quants on amd hip (llama/5105) aadbd67 Engininja2 committed on Jan 24, 2024
llama : pre-allocate input tensors in a separate buffer (llama/5100) 20a4ca1 slaren committed on Jan 24, 2024
metal : disable support for MUL_MAT F32 x F16 7fbc01f ggerganov committed on Jan 23, 2024
CUDA: more info when no device code (llama/5088) e96ba7d JohannesGaessler committed on Jan 23, 2024
minor : clean-up some warnings and style (llama/5094) 7df090b ggerganov committed on Jan 23, 2024
ggml : parallelize FP32 conversion when using BLAS (llama/5045) 7bf2c87 reinforce20001, ggerganov committed on Jan 22, 2024
llava : MobileVLM support (llama/4954) dc8f956 cxt123, Chenxiaotao03 committed on Jan 22, 2024
llama : run all KQV ops on the CPU with no KV offload (llama/5049) 97ce95c slaren committed on Jan 20, 2024
cuda : fix compile error in jetson platform (llama/4975) 0935414 Kylin committed on Jan 20, 2024
docs : make model options / model install methods clearer (#1806) a2bec1d mikey-rrr committed on Jan 26, 2024
cmake : make libwhisper.so position independent (#1792) 1cf1553 trixirt committed on Jan 22, 2024
cmake : temporary remove VLA check (#1795) 1a32e6f ggerganov committed on Jan 22, 2024
whisper.android : return output from benchmarks (#1785) 5cff61b lcfrs committed on Jan 19, 2024
server : implement "verbose_json" format with token details (#1781) d6e13b6 rmmh committed on Jan 18, 2024
ggml : add IQ2 to test-backend-ops + refactoring (llama/4990) 227f2ae ggerganov committed on Jan 17, 2024
imatrix : offload to GPU support (llama/4957) 6490f98 ggerganov committed on Jan 17, 2024
backend : add eval callback (llama/4935) 3cc64d6 ggerganov committed on Jan 17, 2024
metal : create autorelease pool during library build (llama/4970) 9027276 ggerganov committed on Jan 17, 2024
ggml : importance matrix support for legacy quants (llama/4969) d8bb9d8 Kawrakow (ikawrakow) committed on Jan 16, 2024
metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (llama/4936) e2cc0e5 azarovalex, ggerganov committed on Jan 16, 2024
ggml : introduce GGML_CALL function annotation (llama/4850) 7815f68 jartine committed on Jan 16, 2024
cuda : fix dequantize kernel names (llama/4938) 95f6502 ggerganov committed on Jan 15, 2024
CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938) 73c6598 Kawrakow (ikawrakow) committed on Jan 15, 2024
Add ability to use importance matrix for all k-quants (llama/4930) 7032309 Kawrakow (ikawrakow) committed on Jan 14, 2024
talk-llama : optional wake-up command and audio confirmation (#1765) 542e8da rakksor committed on Jan 16, 2024
server : fix building and simplify lib deps on Windows (#1772) f928f33 Przemysław Pawełczyk committed on Jan 15, 2024
metal : correctly set SIMD support flags on iOS (llama/4923) 1cf2fa9 azarovalex committed on Jan 14, 2024
scripts : sync-ggml-am.sh add option to skip commits c34dd82 ggerganov committed on Jan 14, 2024
ggml : cache sin/cos for RoPE (llama/4908) c315fbf JohannesGaessler committed on Jan 13, 2024
metal : disable log for loaded kernels (llama/4794) 2305485 ggerganov committed on Jan 13, 2024
gguf : fix potential infinite for-loop (llama/4600) 0e93179 texmex76 (Bernhard Gstrein) committed on Jan 13, 2024
metal : refactor kernel loading code (llama/4794) 53e6bf8 ggerganov committed on Jan 13, 2024
CUDA: faster q8_0 -> f16 dequantization (llama/4895) 0a1a178 JohannesGaessler committed on Jan 12, 2024