Commit History
bfa5a95  whisper : use ggml_backend_sched (#2239)
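For context, ggml_backend_sched is ggml's multi-backend graph scheduler: it splits a compute graph across several backends and copies tensors between them as needed. A minimal sketch of the usage pattern follows; the signatures are from ggml-backend.h around the time of this sync and may differ between revisions, and gpu_backend, cpu_backend, measure_graph, and graph are assumed to be set up elsewhere:

    #include "ggml-backend.h"

    // Backends in priority order; the scheduler assigns each graph node
    // to the first backend that supports its op and buffer type.
    ggml_backend_t backends[2] = { gpu_backend, cpu_backend };       // assumed, already initialized
    ggml_backend_buffer_type_t bufts[2] = {
        ggml_backend_get_default_buffer_type(gpu_backend),
        ggml_backend_get_default_buffer_type(cpu_backend),
    };

    ggml_backend_sched_t sched =
        ggml_backend_sched_new(backends, bufts, 2, GGML_DEFAULT_GRAPH_SIZE, false);

    // Reserve compute buffers once with a worst-case graph, then reuse
    // the scheduler for every subsequent graph evaluation.
    ggml_backend_sched_reserve(sched, measure_graph);                // measure_graph: assumed
    ggml_backend_sched_graph_compute(sched, graph);                  // graph: assumed ggml_cgraph *
    ggml_backend_sched_free(sched);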
1b0dec0  fix : remove extra files
463e11c  scripts : sync ggml-blas
0b4241c  build : update make / cmake
89ada87  sync : ggml
4b26445  move BLAS to a separate backend (cont) (llama/6210)  [slaren]
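The BLAS split (llama/6210, synced above and again further down) moves BLAS-accelerated matrix multiplication out of the CPU backend into a standalone backend that can be registered with the scheduler. A rough sketch of initializing it, based on the ggml-blas.h API at the time; names may differ across revisions:

    #include "ggml-blas.h"   // only available when ggml is built with BLAS support

    ggml_backend_t blas = ggml_backend_blas_init();
    if (blas != NULL) {
        // BLAS libraries manage their own threading; this forwards the
        // desired thread count to the library (e.g. OpenBLAS).
        ggml_backend_blas_set_n_threads(blas, 8);
        // The backend is then passed to ggml_backend_sched ahead of the
        // CPU backend so that mul_mat-type ops are routed to BLAS.
    }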
d0120b1  Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
f174613  scripts : stop sync whisper example from ggml
9a475af  cmake : fix sycl build (#0)
d303fe3  ggml : remove OpenCL (#0)
f580c99  sycl : sync (#0)
d075551  cuda : enable CUDA graphs (#0)
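CUDA graphs record a stream of kernel launches once and replay the whole sequence with a single launch, cutting per-kernel launch overhead for decode steps that run the same kernels every time. A generic capture/replay sketch with the CUDA runtime API (not the ggml-cuda integration itself; run_decoder_kernels is hypothetical):

    #include <cuda_runtime.h>

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaGraph_t     graph;
    cudaGraphExec_t instance;

    // Capture all kernel launches issued on the stream into a graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    run_decoder_kernels(stream);                 // hypothetical: enqueues the per-step kernels
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay cheaply on subsequent steps.
    cudaGraphInstantiate(&instance, graph, 0);   // CUDA 12 signature; CUDA 11 takes error-node/log args
    cudaGraphLaunch(instance, stream);
    cudaStreamSynchronize(stream);

Note that an instantiated graph cannot change topology in place, which is why the llama/7738 commit further down handles a changing node count by re-capturing and re-instantiating.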
7e268a7  talk-llama : sync llama.cpp
ddc04a3  cmake : fix CUDA build (#0)
305dc4e  sync : ggml
e3d09d2  ggml : fix and optimize ppc64le (ggml/849)  [Hong Bo PENG]
8c3ae74  ggml : remove duplicate include of ggml-common.h (ggml/853)
4cb73ba  remove global variables (llama/7710)
5931562  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
d4b3604  metal : utilize max shared memory for mul_mat_id (llama/7935)
56e6751  rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
c773aa9  move BLAS to a separate backend (llama/6210)
efbb7be  CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
6dc2887  tests : add non-cont unary tests (llama/7857)
ea3aa71  ggml : improve ggml_is_contiguous logic (llama/7856)
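ggml_is_contiguous decides whether a tensor view can be treated as one flat buffer. The rule, sketched on a simplified descriptor that mirrors ggml's ne (element counts) and nb (byte strides) fields; this is an illustration for non-quantized types, not the actual ggml implementation:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // Simplified, hypothetical stand-in for ggml_tensor.
    struct tensor { int64_t ne[4]; size_t nb[4]; size_t type_size; };

    // Contiguous means: the innermost stride equals the element size, and
    // each outer stride equals the previous stride times the previous
    // dimension, i.e. the view has no padding, gaps, or permuted axes.
    static bool is_contiguous(const struct tensor * t) {
        if (t->nb[0] != t->type_size) {
            return false;
        }
        for (int i = 1; i < 4; i++) {
            if (t->nb[i] != t->nb[i-1] * (size_t) t->ne[i-1]) {
                return false;
            }
        }
        return true;
    }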
ee56a37  vulkan: select only one device for single gpu with multiple drivers (llama/7582)
71850e7  Update Vulkan RoPE implementation (llama/7818)
154bf2b  CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
78a5b67  CUDA: use tensor cores for MMQ (llama/7676)
9f87c2f  use the correct SYCL context for host USM allocations (llama/7777)
fcfd59e  CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
b9b60de  vulkan : reuse parent extra for views (llama/7806)
c3a7159  fix softmax r2r result wrong issue (llama/7811)
849ff52  CUDA: refactor mmq, dmmv, mmvq (llama/7716)
ded0c68  ggml : refactor rope norm/neox (llama/7634)
6124287  Allow number of nodes in CUDA graph to change (llama/7738)  [agray3]
4ff3b72  ggml : remove OpenCL (llama/7735)
154f0f8  ggml : prevent builds with -ffinite-math-only (llama/7726)
eab8082  llama : offload to RPC in addition to other backends (llama/7640)
7e5d850  ggml : use OpenMP as a thread pool (llama/7606)
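Using OpenMP as the thread pool replaces hand-rolled per-graph pthread create/join with a parallel region, since OpenMP runtimes keep their worker threads alive between regions. The general shape of the change, with compute_thread as a hypothetical per-thread work function:

    #include <omp.h>

    void compute_thread(int ith, int nth);   // hypothetical: processes this thread's share of graph nodes

    void graph_compute(int n_threads) {
        // Workers persist across calls, so repeated graph evaluations
        // avoid the thread startup cost of spawning pthreads each time.
        #pragma omp parallel num_threads(n_threads)
        {
            int ith = omp_get_thread_num();
            int nth = omp_get_num_threads();
            compute_thread(ith, nth);
        }
    }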
ad9ee26  Vulkan Mixture of Experts (MoE) support (llama/7628)
fa0872f  kompute : implement op_getrows_f32 (llama/6403)  [woachk]
f22c7e4  fix bug introduced in using calloc (llama/7701)  [Dave Airlie]
1bed92f  Fix FlashAttention debug test, FP32 assert (llama/7684)
d4c0faf  CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
315df8c  CUDA: quantized KV support for FA vec (llama/7527)
c1442f3  ggml : fix loongson compile warnings (llama/7537)
6dbbbab  faster avx512 exp implementation (llama/7551)
133ffbf  ggml : fix loongarch build (O2 issue) (llama/7636)  [junchao-loongson]