talk.wasm : final touches

Changed files:
- examples/talk.wasm/README.md (+14 -10)
- examples/talk.wasm/index-tmpl.html (+19 -3)
examples/talk.wasm/README.md

@@ -1,8 +1,8 @@
 # talk.wasm
 
-Talk with an Artificial Intelligence
+Talk with an Artificial Intelligence in your browser:
 
-https://user-images.githubusercontent.com/1991296/
+https://user-images.githubusercontent.com/1991296/203411580-fedb4839-05e4-4474-8364-aaf1e9a9b615.mp4
 
 Online demo: https://talk.ggerganov.com
 
@@ -14,13 +14,12 @@ This demo leverages 2 modern neural network models to create a high-quality voice
 - Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
 - The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
 
-The web page does the processing locally on your machine.
-…
-…
 
-…
-…
-[current repository](https://github.com/ggerganov/whisper.cpp).
+The web page does the processing locally on your machine. The processing of these heavy neural network models in the
+browser is possible by implementing them efficiently in C/C++ and using the browser's WebAssembly SIMD capabilities for
+extra performance. For more detailed information, checkout the [current repository](https://github.com/ggerganov/whisper.cpp).
 
+In order to run the models, the web page first needs to download the model data which is about ~350 MB. The model data
+is then cached in your browser's cache and can be reused in future visits without downloading it again.
 
 ## Requirements
 
@@ -37,8 +36,13 @@ Also, the prompting strategy can likely be improved to achieve better results.
 The demo is quite computationally heavy - it's not usual to run these transformer models in a browser. Typically, they
 run on powerful GPU hardware. So for better experience, you do need to have a powerful computer.
 
-Probably in the near future, mobile browsers will start
-…
+Probably in the near future, mobile browsers will start supporting WASM SIMD. This will allow to run the demo on your
+phone or tablet. But for now this functionality is not supported on mobile devices (at least not on iPhone).
+
+## Todo
+
+- Better UI (contributions are welcome)
+- Better GPT-2 prompting
 
 ## Feedback
 
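The README changes above hinge on the browser's WebAssembly fixed-width SIMD support, which is exactly what the text says is still missing on mobile browsers. Pages like this demo typically feature-detect SIMD before loading a SIMD-enabled build. A minimal sketch, assuming the standard probe technique popularized by the `wasm-feature-detect` project (the byte sequence below is that project's probe module, not code from this commit):

```javascript
// Feature-detect WebAssembly fixed-width SIMD by asking the engine to
// validate a tiny module containing SIMD instructions. Engines without
// SIMD support reject the 0xfd-prefixed opcodes and validation fails.
const simdProbe = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                    // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123,                         // type section: () -> v128
  3, 2, 1, 0,                                     // function section: one function
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11    // code: i32.const 0, then SIMD ops
]);

const hasWasmSimd = WebAssembly.validate(simdProbe);
console.log('WASM SIMD supported:', hasWasmSimd);
```

On browsers (or an iPhone, per the README) without SIMD, `hasWasmSimd` comes back `false`, and the page can fall back to a scalar build or show a warning instead of failing at instantiation time.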
examples/talk.wasm/index-tmpl.html

@@ -137,6 +137,16 @@
 <li>Your browser supports WASM <a href="https://webassembly.org/roadmap/">Fixed-width SIMD</a></li>
 </ul>
 
+Note that these neural network models were not meant to be used in a browser, so the performance and <br>
+quality of the results may not be optimal. If you have any questions or suggestions, checkout the following
+<a href="https://github.com/ggerganov/whisper.cpp/discussions/167">discussion</a>.
+
+<br><br>
+
+Here is a short video of the demo in action: <a href="https://youtu.be/2om-7tFMaNs">https://youtu.be/2om-7tFMaNs</a>
+
+<br><br>
+
 <div class="cell-version">
 <span>

@@ -230,6 +240,8 @@
 }
 }
 }
+
+onPromptChange();
 }
 };

@@ -487,6 +499,7 @@
 doRecording = false;
 audio0 = null;
 audio = null;
+context = null;
 }
 
 function startRecording() {

@@ -519,6 +532,9 @@
 reader.onload = function(event) {
 var buf = new Uint8Array(reader.result);
 
+if (!context) {
+    return;
+}
 context.decodeAudioData(buf.buffer, function(audioBuffer) {
 var offlineContext = new OfflineAudioContext(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
 var source = offlineContext.createBufferSource();

@@ -695,9 +711,9 @@ I'm fine, thanks. How are you?\n\
 Thanks, I'm fine too. What are you doing?\n\
 I'm just sitting here.\n\
 It's a lovely day, isn't it?\n\
-Yes, it is.\n\
-…
-…
+Yes, it is. I love the weather this time of year.\n\
+I wish it would rain a little bit.\n\
+Me too.\n";
 break;
 case '1':
 // Robot
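The two audio fixes in index-tmpl.html go together: `stopRecording()` now clears `context`, and the asynchronous `reader.onload` handler bails out when the context is already gone, so a late audio chunk can no longer call `decodeAudioData` on a torn-down context. A stripped-down sketch of that race and the guard (the variable names mirror the demo's, but the stand-in context object here is illustrative, not the real browser `AudioContext`):

```javascript
// Illustrative stand-in for the AudioContext the demo records through.
let context = {
  decodeAudioData: function (buf) { return 'decoded ' + buf.length + ' bytes'; }
};

function stopRecording() {
  // Teardown: any async callback that fires after this must notice.
  context = null;
}

// Models the reader.onload callback, which may run after stopRecording().
function onAudioChunk(buf) {
  if (!context) {
    return null; // the guard added by this commit: recording already stopped
  }
  return context.decodeAudioData(buf);
}

console.log(onAudioChunk(new Uint8Array(4))); // while recording: decodes normally
stopRecording();
console.log(onAudioChunk(new Uint8Array(4))); // after stop: guard short-circuits
```

Without the guard, the second call would throw (`context` is `null`), which is the crash the commit avoids when audio data arrives after the user stops recording.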