ggerganov committed on
Commit 722327a · unverified · 1 Parent(s): b38c009

talk.wasm : final touches

examples/talk.wasm/README.md CHANGED
@@ -1,8 +1,8 @@
1
  # talk.wasm
2
 
3
- Talk with an Artificial Intelligence entity in your browser:
4
 
5
- https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4
6
 
7
  Online demo: https://talk.ggerganov.com
8
 
@@ -14,13 +14,12 @@ This demo leverages 2 modern neural network models to create a high-quality voic
14
  - Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
15
  - The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
16
 
17
- The web page does the processing locally on your machine. However, in order to run the models, it first needs to
18
- download the model data which is about ~350 MB. The model data is then cached in your browser's cache and can be reused
19
- in future visits without downloading it again.
20
 
21
- The processing of these heavy neural network models in the browser is possible by implementing them efficiently in C/C++
22
- and using WebAssembly SIMD capabilities for extra performance. For more detailed information, checkout the
23
- [current repository](https://github.com/ggerganov/whisper.cpp).
24
 
25
  ## Requirements
26
 
@@ -37,8 +36,13 @@ Also, the prompting strategy can likely be improved to achieve better results.
37
  The demo is quite computationally heavy - it's not usual to run these transformer models in a browser. Typically, they
38
  run on powerful GPU hardware. So for better experience, you do need to have a powerful computer.
39
 
40
- Probably in the near future, mobile browsers will start to support the WASM SIMD capabilities and this will allow
41
- to run the demo on your phone or tablet. But for now it seems to be not supported (at least on iPhone).
42
 
43
  ## Feedback
44
 
 
1
  # talk.wasm
2
 
3
+ Talk with an Artificial Intelligence in your browser:
4
 
5
+ https://user-images.githubusercontent.com/1991296/203411580-fedb4839-05e4-4474-8364-aaf1e9a9b615.mp4
6
 
7
  Online demo: https://talk.ggerganov.com
8
 
 
14
  - Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
15
  - The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
16
 
17
+ The web page does the processing locally on your machine. The processing of these heavy neural network models in the
18
+ browser is possible by implementing them efficiently in C/C++ and using the browser's WebAssembly SIMD capabilities for
19
+ extra performance. For more detailed information, checkout the [current repository](https://github.com/ggerganov/whisper.cpp).
20
 
21
+ In order to run the models, the web page first needs to download the model data which is about ~350 MB. The model data
22
+ is then cached in your browser's cache and can be reused in future visits without downloading it again.
 
23
 
24
  ## Requirements
25
 
 
36
  The demo is quite computationally heavy - it's not usual to run these transformer models in a browser. Typically, they
37
  run on powerful GPU hardware. So for better experience, you do need to have a powerful computer.
38
 
39
+ Probably in the near future, mobile browsers will start supporting WASM SIMD. This will allow to run the demo on your
40
+ phone or tablet. But for now this functionality is not supported on mobile devices (at least not on iPhone).
41
+
42
+ ## Todo
43
+
44
+ - Better UI (contributions are welcome)
45
+ - Better GPT-2 prompting
46
 
47
  ## Feedback
48
 
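The README hunks above state that the demo depends on the browser's WebAssembly SIMD support and that mobile browsers may lack it. As a minimal sketch of how a page could probe for that capability, the snippet below validates a tiny hand-assembled module containing SIMD instructions. The names `SIMD_PROBE` and `supportsWasmSimd` are hypothetical and not part of this commit; the byte sequence is an assumption modeled on commonly used feature-detection probes, not code taken from talk.wasm.

```javascript
// Smallest useful probe: a module with one function returning v128 that
// executes i32.const 0, i8x16.splat, i8x16.popcnt. Engines without
// fixed-width SIMD fail validation of this module.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,                    // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123,                         // type section: () -> v128
  3, 2, 1, 0,                                     // function section: one function, type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,   // code section: the function body
]);

// Hypothetical helper: returns true only if the engine accepts the SIMD probe.
function supportsWasmSimd() {
  if (typeof WebAssembly !== "object" || typeof WebAssembly.validate !== "function") {
    return false; // no WebAssembly at all
  }
  return WebAssembly.validate(SIMD_PROBE);
}
```

A page could call `supportsWasmSimd()` before downloading the roughly 350 MB of model data and show a notice instead when it returns `false`.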
examples/talk.wasm/index-tmpl.html CHANGED
@@ -137,6 +137,16 @@
137
  <li>Your browser supports WASM <a href="https://webassembly.org/roadmap/">Fixed-width SIMD</a></li>
138
  </ul>
139
 
140
  <div class="cell-version">
141
  <span>
142
  |
@@ -230,6 +240,8 @@
230
  }
231
  }
232
  }
 
 
233
  }
234
  };
235
 
@@ -487,6 +499,7 @@
487
  doRecording = false;
488
  audio0 = null;
489
  audio = null;
 
490
  }
491
 
492
  function startRecording() {
@@ -519,6 +532,9 @@
519
  reader.onload = function(event) {
520
  var buf = new Uint8Array(reader.result);
521
 
 
 
 
522
  context.decodeAudioData(buf.buffer, function(audioBuffer) {
523
  var offlineContext = new OfflineAudioContext(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
524
  var source = offlineContext.createBufferSource();
@@ -695,9 +711,9 @@ I'm fine, thanks. How are you?\n\
695
  Thanks, I'm fine too. What are you doing?\n\
696
  I'm just sitting here.\n\
697
  It's a lovely day, isn't it?\n\
698
- Yes, it is.\n\
699
- Did you know that I'm a robot?\n\
700
- I wasn't aware of that.\n";
701
  break;
702
  case '1':
703
  // Robot
 
137
  <li>Your browser supports WASM <a href="https://webassembly.org/roadmap/">Fixed-width SIMD</a></li>
138
  </ul>
139
 
140
+ Note that these neural network models were not meant to be used in a browser, so the performance and <br>
141
+ quality of the results may not be optimal. If you have any questions or suggestions, checkout the following
142
+ <a href="https://github.com/ggerganov/whisper.cpp/discussions/167">discussion</a>.
143
+
144
+ <br><br>
145
+
146
+ Here is a short video of the demo in action: <a href="https://youtu.be/2om-7tFMaNs">https://youtu.be/2om-7tFMaNs</a>
147
+
148
+ <br><br>
149
+
150
  <div class="cell-version">
151
  <span>
152
  |
 
240
  }
241
  }
242
  }
243
+
244
+ onPromptChange();
245
  }
246
  };
247
 
 
499
  doRecording = false;
500
  audio0 = null;
501
  audio = null;
502
+ context = null;
503
  }
504
 
505
  function startRecording() {
 
532
  reader.onload = function(event) {
533
  var buf = new Uint8Array(reader.result);
534
 
535
+ if (!context) {
536
+ return;
537
+ }
538
  context.decodeAudioData(buf.buffer, function(audioBuffer) {
539
  var offlineContext = new OfflineAudioContext(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
540
  var source = offlineContext.createBufferSource();
 
711
  Thanks, I'm fine too. What are you doing?\n\
712
  I'm just sitting here.\n\
713
  It's a lovely day, isn't it?\n\
714
+ Yes, it is. I love the weather this time of year.\n\
715
+ I wish it would rain a little bit.\n\
716
+ Me too.\n";
717
  break;
718
  case '1':
719
  // Robot
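
The `context = null;` and `if (!context) { return; }` hunks above work together: stopping the recording drops the audio context, and a `reader.onload` callback that fires afterwards returns early instead of calling `decodeAudioData` on a null reference. The pattern can be sketched in isolation as follows. The `decode` method, `onChunkLoaded`, and `decoded` are hypothetical stand-ins so that the sketch runs outside a browser; they are not names from `index-tmpl.html`.

```javascript
// Hypothetical stand-in for the page's AudioContext; only its presence matters.
let context = { decode: (bytes) => bytes.length };
const decoded = [];

function stopRecording() {
  // Mirrors the hunk: clearing the reference marks the session as finished.
  context = null;
}

// Stand-in for reader.onload, which may run after stopRecording().
function onChunkLoaded(bytes) {
  if (!context) {
    return false; // recording already stopped, so the late chunk is ignored
  }
  decoded.push(context.decode(bytes));
  return true;
}

onChunkLoaded(new Uint8Array(4)); // processed while the context is alive
stopRecording();
onChunkLoaded(new Uint8Array(8)); // skipped by the guard
```

The guard is cheap and keeps the asynchronous file reader from racing with teardown. An alternative would be to cancel the pending read, but that requires tracking each reader instance.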