File size: 5,862 Bytes
7dad981
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
# Voice Design Guide (Voice Design README)

## Overview

This guide provides instructions for creating high-quality **voice descriptions (voice_prompt)** to generate voices that meet specific requirements. The voice description serves as the blueprint for voice design and directly determines the quality of the generated voice.

---

## Technical Constraints

| Item                    | Description                                                                                       |
| ----------------------- | ------------------------------------------------------------------------------------------------- |
| **Length Limit**        | Each voice_prompt ≤ 200 characters                                                                |
| **Supported Languages** | Chinese only. English is not supported in the current version and will be added in future updates |

---

## Five Core Principles

### 1️⃣ Be Specific, Not Vague

**✅ Recommended**: Use perceptible and concrete voice attributes

* Pitch: low, high, bright, rich
* Speaking rate: fast, slow, rapid, steady
* Timbre: magnetic, husky, smooth, clear

**❌ Avoid**:
“Nice”, “normal”, “good” (too subjective and uninformative)

---

### 2️⃣ Multi-Dimensional, Not Single-Attribute

**✅ Recommended**: Combine at least 3–4 dimensions to create a vivid voice profile

* Persona (usage scenario) + gender + age + pitch + speaking rate + volume + timbre + emotion

**❌ Avoid**:
Only “female voice” or only “low-pitched” (too generic, lacks distinctiveness)

---

### 3️⃣ Objective, Not Subjective

**✅ Recommended**: Describe physical and acoustic characteristics

* “Slightly high-pitched with energetic delivery”
* “Slow speaking rate with clear articulation”

**❌ Avoid**:
“My favorite voice”, “This voice sounds great”

---

### 4️⃣ Original, Not Imitative

**⚠️ Copyright Notice**: Descriptions such as “sounds like XX celebrity” or “imitates XX actor” are prohibited
**✅ Recommended**: Describe voice characteristics directly rather than referencing specific individuals

---

### 5️⃣ Concise, Not Redundant

**✅ Recommended**: Ensure every word conveys meaningful information
**❌ Avoid**:
“Very, very good voice”, “Extremely, extremely gentle”

---

## Reference Dimensions for Voice Description

Based on high-quality examples, we recommend composing voice prompts using the following dimensions:

| Dimension                    | Example Options                                                                                   |
| ---------------------------- | ------------------------------------------------------------------------------------------------- |
| **Persona (Usage Scenario)** | News broadcasting, advertising voice-over, audiobooks, animated characters, documentary narration |
| **Gender**                   | Male, Female                                                                                      |
| **Age**                      | Child (~8 years), Young adult (20–30), Middle-aged (40–50), Elderly                               |
| **Personality Traits**       | Lively, calm, gentle, intellectual, cute, serious                                                 |
| **Speaking Rate & Rhythm**   | Fast, slow, moderate, urgent, steady                                                              |
| **Intonation Style**         | Rising, neutral, passionate, relaxed                                                              |
| **Timbre**                   | Deep and magnetic, crisp and bright, husky and warm, youthful                                     |

---

## High-Quality Examples

### ✅ Recommended Templates

**Example 1: Poetry Recitation**

> “A male modern poetry reciter with a deep, magnetic low voice, delivering poetry with strong rhythmic pauses, powerful volume, and intense emotional expression.”

**Example 2: News Style**

> “A female news anchor speaking standard Mandarin with a clear and bright mid-to-high pitch, steady professional pacing, strong volume, and a neutral, objective tone.”

**Example 3: Advertising Voice-Over**

> “A male voice for liquor brand advertising, featuring a rich and weathered timbre, slow and bold speaking rate, strong volume, conveying a sense of history and masculinity.”

---

## Common Mistakes and Improvements

| Type                      | ❌ Not Recommended          | ✅ Improved Version                                                           |
| ------------------------- | -------------------------- | ---------------------------------------------------------------------------- |
| **Too Generic**           | “Female voice, nice”       | “Young female voice with a clear pitch and moderate speaking rate”           |
| **Subjective Evaluation** | “A great-sounding voice”   | “Bright timbre with strong expressiveness”                                   |
| **Single Dimension**      | “Low-pitched male voice”   | “Middle-aged male with a low pitch, slow pacing, suitable for documentaries” |
| **Redundant Wording**     | “Very, very gentle voice”  | “Gentle and intellectual female voice”                                       |
| **Imitation Request**     | “Sounds like XX celebrity” | **Prohibited** — describe objective voice traits instead                     |

---

## Quick Checklist

Before submitting a voice_prompt, make sure that:

* [ ] Length ≤ 200 characters
* [ ] At least 3 different descriptive dimensions are included
* [ ] No subjective evaluation words (e.g., “nice”, “great”, “favorite”)
* [ ] No references to real individuals or imitation requests
* [ ] No repetitive or exaggerated wording
* [ ] Usage scenario is clearly defined
* [ ] All descriptors are perceptible and concrete

---