Commit History
vulkan: support CPY from any type to itself (llama/13695)
f5f766b
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696)
69679f5
use LOG_WARN to replace `std::cerr` (llama/13657)
6975ec2
Judd
sycl : Remove waits from function calls (llama/13702)
b9bf6b6
Nicolò Scipione
SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587)
7eb0e6e
Ewan Crawford
opencl: Add support for multiple devices (llama/12622)
b6cddb5
Henry Linjamäki
opencl: fix couple crashes (llama/12795)
2eea73d
Henry Linjamäki
ggml : add ggml_gelu_erf() (llama/13667)
6c9cd9a
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (llama/13647)
9506ebb
vulkan: fix warnings (llama/13626)
8602d10
Eve
CUDA: skip fully masked-out KV in FA vec kernel (llama/13584)
e1f825c
sycl: disable reorder for sycl mulmat (llama/13536)
e023dc2
Svetlozar Georgiev
metal : fix typo in FA kernel comments (llama/13651)
4c32ada
sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482)
bf74ede
Nicolò Scipione
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607)
dfa38af
sync : ggml
3b09d20
docs : convert README_sycl.md to utf8 format [no ci] (#3191)
2384106
node : enable no_prints to suppress all output (#3189)
1b2bc05
talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187)
1473e33
docs : fix VAD section heading levels (#3186)
a7bcfbf
ci : use dynamic libopenblas.dll for window-blas (#3177)
bafccd1
server : Add k6 Load Testing Script (#3175)
9a681c7
docs : add VAD model download instructions [no ci] (#3180)
e789f73
docs : replace typo "]"with ")" in README (#3179)
5e8b0f0
Alpaim
whisper : remove redundant assignments (#3178)
ec40497
whisper : update CMakeLists.txt to handle deprecated gpu Warnings (#3163)
2ee9c36
Jugal Haresh Sheth
Jugal Sheth
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172)
94d5ce3
talk-llama : sync llama.cpp
44ee199
sync : ggml
b16623d
CANN: Support MOE Model MUL_MAT_ID (llama/13042)
f013e2d
Chenguang Li
cmake: use the current build config for vulkan-shaders-gen (llama/13595)
7681e32
Gilad S.
vulkan: move common FA code to flash_attn_base.comp (llama/13556)
ad8b504
vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554)
97d9aa6
metal : add FA-vec kernel for head size 64 (llama/13583)
36a3b4e
sycl : fixed compilation warnings (llama/13582)
5037d84
Łukasz Ślusarczyk
gguf : use ggml log system (llama/13571)
a2211c9
Diego Devesa
sycl: simplify bin_bcast_kernel (llama/13383)
c39b646
Atharva Dubey
sycl: reordered Q4_K MMVQ (llama/13109)
6ca3a47
Svetlozar Georgiev
sycl: use oneDNN for matrices multiplication (llama/12972)
2008e08
Łukasz Ślusarczyk
arm64: optimize q6_k_q8_k kernel with i8mm (llama/13519)
03048ea
Yibo Cai
CUDA: fix crash on large batch size for quant. MoE (llama/13537)
df90a14
CUDA: faster Deepseek FA, add Turing support (llama/13435)
ace16dc
cmake: simplify vulkan shader test logic (llama/13263)
f8fd66d
bandoti
vulkan: KHR_coopmat flash attention (llama/13506)
4d1bd4f
vulkan: workaround FA compile failures on macos (llama/13517)
06833bc
metal : use FA-vec kernel up to batch size 20 (llama/13496)
e925f17
metal : optimize multi-sequence FA vec kernel (llama/13493)
d2f915d
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (llama/13509)
7463545
Dan Johansson