Commit History

HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802

uvos committed on

CUDA: Optimize `reduce_rows_f32` kernel, yielding up to a 25x kernel-level perf improvement and a 10% perf increase for Gemma3n (llama/15132)
c768824

ORippler committed on

musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236)
4168dda

yeahdongcn committed on

CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)
1d24833

JohannesGaessler committed on

llama : add gpt-oss (llama/15091)
bf225d6

ggerganov ngxson slaren committed on

HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949)
149f5a5

uvos committed on

CUDA: skip masked KV slices for all FA kernels (llama/14924)
0c60f80

JohannesGaessler committed on

HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3

uvos committed on

HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930)
f9dbd96

uvos committed on

HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)
5422b31

deepsek committed on

musa: upgrade musa sdk to rc4.2.0 (llama/14498)
a687ec3

yeahdongcn committed on

musa: fix build warnings (unused variable) (llama/14561)
891b1d1

yeahdongcn committed on

CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497)
8e1f56c

am17an committed on

CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324)
1a9d2d3

uvos JohannesGaessler committed on

CUDA: mul_mat_v support for batch sizes > 1 (llama/14262)
2d1e6e7

JohannesGaessler committed on

HIP: enable vec fattn on RDNA4 (llama/14323)
b6dc6a1

uvos committed on

CUDA: add mean operation (llama/14313)
7cee55b

am17an committed on

cuda : synchronize graph capture and cublas handle destruction (llama/14288)
39c4fa5

Diego Devesa committed on

HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202)
f95736f

uvos committed on

HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183)
c3467c7

uvos committed on

ggml-cpu : split arch-specific implementations (llama/13892)
8c833e9

xctan ggerganov committed on

CUDA: add a prop in ggml_cuda_device_info to distinguish iGPU or dGPU in CUDA (#13856) (llama/13895)
a75e157

Shawn yang Yzzzaz JohannesGaessler yangxiao Diego Devesa committed on

cuda : avoid cuGetErrorString (llama/13791)
cdf95d3

ggerganov committed on

CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
507d30c

JohannesGaessler committed on

whisper: remove MSVC warnings pragmas (#3090)
e0d130c

danbev committed on

musa: fix typo in cc control (llama/13144)
5fb7320

R0CKSTAR committed on

Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)
a2fdbe6

Alan Gray slaren committed on

musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611)
12bb60d

R0CKSTAR committed on

HIP: Add support for RDNA4 targets (llama/12372)
a73f01f

Slobodan Josic committed on

CUDA: Fix clang warnings (llama/12540)
efa6dac

R0CKSTAR committed on

musa: refine compute capability (llama/12493)
5e508d2

R0CKSTAR committed on

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c

Gaurav Garg committed on

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)
1f75790

uvos JohannesGaessler committed on

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
a027c1d

David Huang committed on

CUDA: add option to compile without FlashAttention (llama/12025)
fbc5f16

JohannesGaessler committed on

MUSA: support ARM64 and enable dp4a, etc. (llama/11843)
ab96dac

Bodhi Hu committed on

CUDA: use async data loading for FlashAttention (llama/11894)
5b9980d

JohannesGaessler Diego Devesa committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler Diego Devesa committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler Diego Devesa committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on

AMD: parse the architecture as supplied by gcnArchName (llama/11244)
04b01d8

Haus1 committed on

HIP: disable VMM on HIP as it seems it doesn't work in some configurations (llama/11420)
2cc4df4

uvos committed on

hip : Add hipGraph and VMM support to ROCm (llama/11362)
089afa0

uvos committed on

CUDA: rename macros to avoid conflicts with WinAPI (llama/10736)
8544072

Andreas Kieslinger committed on

Add some minimal optimizations for CDNA (llama/10498)
bf49bbe

uvos committed on