readme : update GPU / CUDA
README.md
CHANGED
@@ -16,12 +16,10 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper)
 - VSX intrinsics support for POWER architectures
 - Mixed F16 / F32 precision
 - [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
-- Low memory usage (Flash Attention)
 - Zero memory allocations at runtime
 - Support for CPU-only inference
-- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
+- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
 - [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
-- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
 - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
 - [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
 
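The quantization bullet above (unchanged by this commit) links to the README's own workflow. As a quick illustration, a minimal sketch assuming the stock `models/ggml-base.en.bin` file from the repository's examples:

```bash
# build the quantize tool, then produce a 5-bit (q5_0) variant of a model
make quantize
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
```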
@@ -400,12 +398,12 @@ This can result in significant speedup in encoder performance. Here are the instructions
 
 The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
 cached for the next run.
-
+
 For more information about the Core ML implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
 
-## NVIDIA GPU support via cuBLAS
+## NVIDIA GPU support
 
-With NVIDIA cards the
+With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
 First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
 
 Now build `whisper.cpp` with cuBLAS support:
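The build commands that follow that line in the README fall outside the diff context. For reference, a minimal sketch of the cuBLAS build as of this change, assuming the Makefile's `WHISPER_CUBLAS` switch described in the linked NVIDIA section:

```bash
# rebuild with cuBLAS kernels enabled (requires the CUDA toolkit)
make clean
WHISPER_CUBLAS=1 make -j
```

Transcription then runs as usual, e.g. `./main -m models/ggml-base.en.bin -f samples/jfk.wav`, with the heavy matrix work dispatched to the GPU.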
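Returning to the OpenVINO note in the hunk above: the first-run blob compilation is easy to observe by timing two consecutive runs. A sketch, assuming a cmake build with the `WHISPER_OPENVINO` flag from the README's OpenVINO section and an encoder model already converted to OpenVINO IR alongside the ggml model:

```bash
# build with OpenVINO support
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release

# first run: OpenVINO compiles the IR encoder to a device-specific blob (slow)
time ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav
# second run: the cached blob is reused (fast)
time ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav
```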