Commits · natasa365/whisper.cpp

HIP: bump requirement to rocm 6.1 (llama/15296)

58a3802

uvos committed on Aug 13, 2025

CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132)

c768824

ORippler committed on Aug 13, 2025

musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236)

4168dda

yeahdongcn committed on Aug 12, 2025

CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)

1d24833

JohannesGaessler committed on Aug 7, 2025

llama : add gpt-oss (llama/15091)

bf225d6

ggerganov, ngxson, slaren committed on Aug 5, 2025

HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949)

149f5a5

uvos committed on Jul 30, 2025

CUDA: skip masked KV slices for all FA kernels (llama/14924)

0c60f80

JohannesGaessler committed on Jul 30, 2025

HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)

e37eff3

uvos committed on Jul 29, 2025

HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930)

f9dbd96

uvos committed on Jul 29, 2025

HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)

5422b31

deepsek committed on Jul 26, 2025

musa: upgrade musa sdk to rc4.2.0 (llama/14498)

a687ec3

yeahdongcn committed on Jul 24, 2025

musa: fix build warnings (unused variable) (llama/14561)

891b1d1

yeahdongcn committed on Jul 7, 2025

CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497)

8e1f56c

am17an committed on Jul 2, 2025

musa: enable fp16 mma (all) and cublas on qy2 (llama/13842)

e35329b

yeahdongcn, JohannesGaessler committed on Jun 26, 2025

CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324)

1a9d2d3

uvos, JohannesGaessler committed on Jun 23, 2025

CUDA: mul_mat_v support for batch sizes > 1 (llama/14262)

2d1e6e7

JohannesGaessler committed on Jun 23, 2025

HIP: enable vec fattn on RDNA4 (llama/14323)

b6dc6a1

uvos committed on Jun 22, 2025

CUDA: add mean operation (llama/14313)

7cee55b

am17an committed on Jun 22, 2025

cuda : synchronize graph capture and cublas handle destruction (llama/14288)

39c4fa5

Diego Devesa committed on Jun 20, 2025

HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202)

f95736f

uvos committed on Jun 16, 2025

HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183)

c3467c7

uvos committed on Jun 15, 2025

ggml-cpu : split arch-specific implementations (llama/13892)

8c833e9

xctan, ggerganov committed on Jun 9, 2025

CUDA: add a prop in ggml_cuda_device_infor to distinguish iGPU or dGPU in cuda (#13856) (llama/13895)

a75e157

Shawn yang, Yzzzaz, JohannesGaessler, yangxiao, Diego Devesa committed on May 31, 2025

cuda : avoid cuGetErrorString (llama/13791)

cdf95d3

ggerganov committed on May 26, 2025

CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)

507d30c

JohannesGaessler committed on May 9, 2025

whisper: remove MSVC warnings pragmas (#3090)

e0d130c
unverified

danbev committed on May 5, 2025

musa: fix typo in cc control (llama/13144)

5fb7320

R0CKSTAR committed on Apr 28, 2025

Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)

a2fdbe6

Alan Gray, slaren committed on Apr 3, 2025

musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611)

12bb60d

R0CKSTAR committed on Mar 30, 2025

HIP: Add support for RDNA4 targets (llama/12372)

a73f01f

Slobodan Josic committed on Mar 26, 2025

CUDA: Fix clang warnings (llama/12540)

efa6dac

R0CKSTAR committed on Mar 24, 2025

musa: refine compute capability (llama/12493)

5e508d2

R0CKSTAR committed on Mar 22, 2025

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

1e69b8c

Gaurav Garg committed on Mar 17, 2025

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)

1f75790

uvos, JohannesGaessler committed on Mar 11, 2025

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)

a027c1d

David Huang committed on Mar 3, 2025

CUDA: add option to compile without FlashAttention (llama/12025)

fbc5f16

JohannesGaessler committed on Feb 22, 2025

MUSA: support ARM64 and enable dp4a etc. (llama/11843)

ab96dac

Bodhi Bodhi Hu committed on Feb 21, 2025

CUDA: use async data loading for FlashAttention (llama/11894)

5b9980d

JohannesGaessler, Diego Devesa committed on Feb 17, 2025

CUDA: fix CUDART_VERSION checks (llama/11821)

04f123a

JohannesGaessler committed on Feb 12, 2025

CUDA: use arch list for compatibility check (llama/11775)

b88e163

JohannesGaessler, Diego Devesa committed on Feb 10, 2025

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)

ed08269

uvos committed on Feb 2, 2025

HIP: add GGML_CUDA_CC_IS_* for amd families as increasing cc architectures for amd gpus are not supersets of each other (llama/11601)

4850c24

uvos committed on Feb 2, 2025

CUDA: use mma PTX instructions for FlashAttention (llama/11583)

f328957

JohannesGaessler, Diego Devesa committed on Feb 2, 2025

HIP: Prepare reduction operators for wave 64

bc1c1a4

uvos committed on Jan 29, 2025

CUDA/HIP: add warp_size to cuda_device_info

e538e2c

uvos committed on Jan 29, 2025

AMD: parse the architecture as supplied by gcnArchName (llama/11244)

04b01d8

Haus1 committed on Jan 27, 2025

Hip: disable VMM on hip as it seems that it doesn't work in some configurations (llama/11420)

2cc4df4

uvos committed on Jan 25, 2025

hip : Add hipGraph and VMM support to ROCM (llama/11362)

089afa0

uvos committed on Jan 24, 2025

CUDA: rename macros to avoid conflicts with WinAPI (llama/10736)

8544072

Andreas Kieslinger committed on Dec 10, 2024

Add some minimal optimizations for CDNA (llama/10498)

bf49bbe

uvos committed on Nov 27, 2024
