ggerganov committed
Commit e8ad308 (unverified) · 1 Parent(s): da4acca

readme : update GPU / CUDA

Files changed (1): README.md (+4 -6)
README.md CHANGED
```diff
@@ -16,12 +16,10 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - VSX intrinsics support for POWER architectures
 - Mixed F16 / F32 precision
 - [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
-- Low memory usage (Flash Attention)
 - Zero memory allocations at runtime
 - Support for CPU-only inference
-- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
+- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
 - [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
-- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
 - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
 - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
 
@@ -400,12 +398,12 @@ This can result in significant speedup in encoder performance. Here are the inst
 
 The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
 cached for the next run.
-
+
 For more information about the Core ML implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
 
-## NVIDIA GPU support via cuBLAS
+## NVIDIA GPU support
 
-With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.
+With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
 First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
 
 Now build `whisper.cpp` with cuBLAS support:
```
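For context on the OpenVINO hunk above: the first run on an OpenVINO device compiles the IR model into a device-specific blob, which is then cached. As a usage sketch only (the `-oved` encoder-device flag comes from the OpenVINO support work referenced in the README; verify against `./main --help`, and the encoder model filename shown is the repository's naming convention, not taken from this commit):

```bash
# Assumes whisper.cpp was built with OpenVINO support enabled (see the
# README's OpenVINO section for the exact build flags) and that the
# OpenVINO encoder model (e.g. ggml-base.en-encoder-openvino.xml) sits
# next to the ggml model. The first run compiles the IR model to a
# device-specific blob; subsequent runs reuse the cached blob.
./main -m models/ggml-base.en.bin -f samples/jfk.wav -oved GPU
```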
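The diff ends at the build step. In the README of this era the commands that follow use the `WHISPER_CUBLAS` Makefile flag (later trees renamed the CUDA build options), so a minimal sketch, assuming the CUDA toolkit is installed and `nvcc` is on `PATH`, would be:

```bash
# Rebuild from a clean tree with cuBLAS enabled (flag name as used by
# the Makefile around the time of this commit).
make clean
WHISPER_CUBLAS=1 make -j

# Quick smoke test: fetch a model and transcribe the bundled sample.
./models/download-ggml-model.sh base.en
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```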