Commit History
CUDA: fix FA out-of-bounds reads (llama/7479)
b38d0f9
CUDA: fix FA out-of-bounds writes (llama/7465)
2e26e3a
cuda : fix compile warning (llama/7454)
58db6c8
CUDA: remove incorrect precision check (llama/7454)
eb4b5e0
cuda : fix rope + add tests (llama/7452)
215ce5c
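The RoPE fix above concerns the rotary position embedding kernels. For orientation, here is a minimal sketch of the rotation RoPE applies to each value pair; the helper name and layout are illustrative, not the actual ggml kernel.

```cuda
// Sketch of the RoPE rotation: each pair (x0, x1) is rotated by an angle
// theta derived from the token position and the dimension index.
__device__ __forceinline__ void rope_rotate_pair(
        float x0, float x1, float theta, float * y0, float * y1) {
    const float c = cosf(theta);
    const float s = sinf(theta);
    *y0 = x0 * c - x1 * s;
    *y1 = x0 * s + x1 * c;
}
```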
llama : add phi3 128K model support (llama/7225)
ef68527
CUDA: fix unused warning in mmq.cu (llama/7442)
f16510d
CUDA: deduplicate mmq code (llama/7397)
e7b20b1
CUDA: deduplicate FlashAttention code (llama/7352)
65ab3e8
cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
ad83dfd
Engininja2
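The ROCm 5.5 change above adds a half2 overload of the warp XOR shuffle. A common way to supply one where only the 32-bit shuffle exists is to route the packed halves through the integer overload; the sketch below assumes the CUDA-style __shfl_xor_sync (on ROCm builds the unsynced __shfl_xor plays this role), and the helper name is illustrative.

```cuda
#include <cuda_fp16.h>
#include <cstring>

// Illustrative fallback: shuffle the raw 32-bit pattern of the packed half2
// and reinterpret it back. A sketch of the technique, not the verbatim code.
__device__ __forceinline__ half2 shfl_xor_half2(half2 v, int lane_mask, int width) {
    int bits;
    memcpy(&bits, &v, sizeof(bits));                           // half2 -> int
    bits = __shfl_xor_sync(0xFFFFFFFF, bits, lane_mask, width);
    memcpy(&v, &bits, sizeof(v));                              // int -> half2
    return v;
}
```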
CUDA: faster large batch FA without tensor cores (llama/7314)
a6d9f2d
ggml : add `ggml_upscale_ext` (ggml/814)
04a5333
cuda : fix bounds check for src0 rows in MMVQ kernel (#2231)
4fdb9d2
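The MMVQ fix above adds a row bounds check. The sketch below shows the general shape of such a guard in a matrix-vector kernel whose launch grid can overshoot the row count; one warp handles each row, and all names are illustrative.

```cuda
// Illustrative guard: threads mapped past the last row of src0 must exit
// before any load or store. Assumes blockDim.x == 32 (one warp per row),
// so a whole warp exits together and the shuffle mask stays valid.
__global__ void mat_vec_rows(const float * src0, const float * x,
                             float * dst, const int nrows, const int ncols) {
    const int row = blockIdx.x * blockDim.y + threadIdx.y;
    if (row >= nrows) {
        return; // out-of-bounds row
    }
    float sum = 0.0f;
    for (int col = threadIdx.x; col < ncols; col += 32) {
        sum += src0[(size_t) row * ncols + col] * x[col];
    }
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_down_sync(0xFFFFFFFF, sum, offset);
    }
    if (threadIdx.x == 0) {
        dst[row] = sum;
    }
}
```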
CUDA: add FP32 FlashAttention vector kernel (llama/7188)
03d4b22
ggml : remove obsolete alibi code (skipme) (#0)
d25c1e3
ggml : full ALiBi support (llama/7192)
192bda4
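ALiBi replaces positional embeddings with a per-head linear bias added to the raw attention scores. A minimal sketch of that bias, with illustrative names:

```cuda
// Illustrative ALiBi bias: each head adds slope * (key_pos - query_pos) to
// its attention score, penalizing distant keys linearly. Not the ggml code.
__device__ __forceinline__ float alibi_biased_score(
        float score, float slope, int query_pos, int key_pos) {
    return score + slope * (float) (key_pos - query_pos);
}
```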
CUDA: generalize FP16 fattn vec kernel (llama/7061)
ca79691
Introduction of CUDA Graphs to llama.cpp (llama/6766)
08fc76d
agray3
slaren
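CUDA Graphs let the whole sequence of kernel launches for a decode step be recorded once and replayed with a single launch, cutting per-kernel CPU overhead. A minimal, self-contained sketch of the stream-capture mechanism this builds on; the kernel and function names are illustrative.

```cuda
#include <cuda_runtime.h>

__global__ void step_kernel(float * x) { x[threadIdx.x] += 1.0f; }

// Capture one step's launches into a graph, then replay it per iteration.
void run_with_graph(float * d_x, cudaStream_t stream, int iters) {
    cudaGraph_t     graph;
    cudaGraphExec_t instance;

    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    step_kernel<<<1, 32, 0, stream>>>(d_x);   // stand-in for the real step
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&instance, graph, nullptr, nullptr, 0);

    for (int i = 0; i < iters; ++i) {
        cudaGraphLaunch(instance, stream);    // one launch replays the step
    }
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(instance);
    cudaGraphDestroy(graph);
}
```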
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (llama/7019)
4cf786d
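Older CUDART versions lack the __hmax/__hmax2 half-precision intrinsics, so a workaround like the one above supplies fallbacks. A sketch of the usual shape, comparing in fp32; the names are illustrative, not the llama.cpp helpers.

```cuda
#include <cuda_fp16.h>

// Illustrative fallbacks for missing __hmax/__hmax2: compare in fp32 and
// return the original half values, preserving the packed layout for half2.
__device__ __forceinline__ half hmax_compat(const half a, const half b) {
    return __half2float(a) > __half2float(b) ? a : b;
}

__device__ __forceinline__ half2 hmax2_compat(const half2 a, const half2 b) {
    return __halves2half2(hmax_compat(__low2half(a),  __low2half(b)),
                          hmax_compat(__high2half(a), __high2half(b)));
}
```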
ggml : add Flash Attention (llama/5021)
34d3b03
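The core trick behind Flash Attention is an online softmax: a running maximum and normalizer are updated block by block so the full attention matrix never has to be materialized. Below is a single-lane sketch of that update for one value dimension (initialize m to -INFINITY, s and acc to 0; the final output is acc / s); illustrative names, not the ggml kernel.

```cuda
// One online-softmax step: fold a new score (and its value v) into the
// running max m, normalizer s, and weighted accumulator acc.
__device__ __forceinline__ void online_softmax_step(
        float score, float v, float * m, float * s, float * acc) {
    const float m_new = fmaxf(*m, score);
    const float scale = expf(*m - m_new);  // rescale earlier contributions
    const float p     = expf(score - m_new);
    *s   = *s * scale + p;
    *acc = *acc * scale + p * v;
    *m   = m_new;
}
```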
Fix more int overflow during quant (PPL/CUDA). (llama/6563)
531387f
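Overflow fixes of this kind typically widen index arithmetic: on large tensors a product like row * ncols computed in 32-bit int wraps past INT_MAX. A sketch of the pattern with illustrative names:

```cuda
#include <cstdint>

// Illustrative 64-bit indexing: widening the operands before the multiply
// keeps the element index exact on tensors with more than 2^31 elements.
__global__ void scale_f32(const float * src, float * dst,
                          const int64_t nrows, const int64_t ncols) {
    const int64_t row = blockIdx.y;
    const int64_t col = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows || col >= ncols) {
        return;
    }
    const int64_t i = row * ncols + col;  // 64-bit multiply: no overflow
    dst[i] = 0.5f * src[i];
}
```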
ggml : group all experts in a single ggml_mul_mat_id (llama/6505)
f0b5c67
feat: implemented sigmoid function (ggml/806)
cd0c122
Justina Cho
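The sigmoid addition is a simple element-wise op, sigma(x) = 1 / (1 + exp(-x)). An illustrative CUDA kernel for it; this shows the operation, not the actual ggml implementation.

```cuda
// Element-wise sigmoid over n floats; one thread per element.
__global__ void sigmoid_f32(const float * x, float * dst, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }
    dst[i] = 1.0f / (1.0f + expf(-x[i]));
}
```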