Update README.md
README.md
CHANGED
@@ -99,6 +99,23 @@ gen_tokens = model.generate(input_ids, do_sample=True, max_length=400)
 print("-"*20 + "Output for model" + 20 * '-')
 print(tokenizer.batch_decode(gen_tokens)[0])
 ```
+## CrystalChat DataMix
+| Subset | Tokens (Billion) |
+| ----------- | ----------- |
+| OASST1-guanaco | 4.46 |
+| SlimOrca | 225.63 |
+| ShareGPT | 112.91 |
+| Evol-ShareGPT | 85.95 |
+| ChatLogs | 29.34 |
+| CodeAlpaca | 2.62 |
+| Rosetta Code | 7.99 |
+| Evol-CodeAlpaca 1 | 73.80 |
+| Evol-CodeAlpaca 2 | 34.91 |
+| HTML Instruction | 43.67 |
+| General Textbooks | 85.59 |
+| Programming Books | 395.63 |
+| Total | 1102.52 |
+
 # Evaluation
 
 Coming Soon!
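
The DataMix table added in this change gives raw token counts per subset. A minimal sketch of how those counts translate into each subset's share of the mix is shown here. The names `datamix`, `STATED_TOTAL`, and `subset_shares` are hypothetical and not part of the README; the numbers are copied from the table as written. The twelve rows sum to 1102.50, which presumably differs from the stated total of 1102.52 only through rounding of the individual entries.

```python
# Token counts copied from the CrystalChat DataMix table, in the unit the
# README states. The dictionary and names below are illustrative assumptions.
datamix = {
    "OASST1-guanaco": 4.46,
    "SlimOrca": 225.63,
    "ShareGPT": 112.91,
    "Evol-ShareGPT": 85.95,
    "ChatLogs": 29.34,
    "CodeAlpaca": 2.62,
    "Rosetta Code": 7.99,
    "Evol-CodeAlpaca 1": 73.80,
    "Evol-CodeAlpaca 2": 34.91,
    "HTML Instruction": 43.67,
    "General Textbooks": 85.59,
    "Programming Books": 395.63,
}
STATED_TOTAL = 1102.52  # the "Total" row of the table


def subset_shares(mix):
    """Return each subset's fraction of the summed token count."""
    total = sum(mix.values())
    return {name: count / total for name, count in mix.items()}


row_sum = sum(datamix.values())
shares = subset_shares(datamix)

# Print subsets from largest to smallest share of the mix.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {share:6.1%}")
print(f"row sum = {row_sum:.2f}, stated total = {STATED_TOTAL:.2f}")
```

Computing shares from the summed rows rather than from the stated total keeps the fractions adding up to exactly one, whichever total is treated as authoritative.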