Ulov888 committed (verified) · Commit f087eef · Parent: 2281a21

Update README.md

Files changed (1): README.md (+189, −3). The commit replaces the old three-line YAML front matter (`license: apache-2.0`) with the full model card below.
# LLaDA-MoE

**LLaDA-MoE** is a new, upgraded series of LLaDA diffusion language models. This pre-release includes two cutting-edge models:

- `LLaDA-MoE-7B-A1B-Base`: a base pre-trained model for research and secondary development.
- `LLaDA-MoE-7B-A1B-Instruct`: an instruction-tuned model optimized for practical applications.

---

![rank](https://github.com/Ulov888/LLaDA_Assets/blob/main/benchmarks_grouped_bar.png?raw=true)
![table](https://github.com/Ulov888/LLaDA_Assets/blob/main/benchmarks_details_table.png?raw=true)

## 🚀 Performance Highlights

- **Leading MoE Architecture**: the first open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
- **Efficient Inference**: of the **7 billion total parameters**, only **1.4 billion** are active during inference, which sharply reduces computational cost while outperforming open-source dense models of similar scale.
- **Strong Code & Complex Reasoning**: excels at **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
- **Tool Use**: supports **tool calling** and performs well on complex agent-based tasks.
- **Open & Extensible**: fully open-source, with a commitment to transparency. We plan to release a **leading inference framework** and to keep investing in cutting-edge areas such as **diffusion LLMs (dLLM)** to drive disruptive innovation.

---

## 📦 Model Variants

| Model ID | Description | Hugging Face Link |
|----------|-------------|-------------------|
| [`inclusionAI/LLaDA-MoE-7B-A1B-Base`](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) | Base pre-trained model for research and fine-tuning. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Base) |
| [`inclusionAI/LLaDA-MoE-7B-A1B-Instruct`](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct) | Instruction-tuned model, ready for downstream applications. | [🤗 Model Card](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct) |

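If you prefer to fetch the weights ahead of time, `huggingface_hub` can pre-download either variant. A minimal sketch; `from_pretrained` in the Quickstart below would otherwise download and cache the files on first use:

```python
from huggingface_hub import snapshot_download

# Optional pre-download; both calls cache under the standard Hugging Face cache directory.
snapshot_download("inclusionAI/LLaDA-MoE-7B-A1B-Base")      # base model for research / fine-tuning
snapshot_download("inclusionAI/LLaDA-MoE-7B-A1B-Instruct")  # instruction-tuned model
```
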
---

## 🔍 Model Overview

**LLaDA-MoE-7B-A1B** has the following specifications (you can verify them from the model config, as sketched after the list):

- **Type**: Mixture-of-Experts (MoE) diffusion language model
- **Total Parameters (Non-Embedding)**: 7.03B
- **Number of Layers**: 16
- **Attention Heads**: 16
- **Context Length**: 4,096 tokens
- **Position Embedding**: Rotary (RoPE)
- **Vocabulary Size**: 157,184

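A quick way to confirm these numbers is to print the checkpoint's config. This is a minimal sketch, assuming the custom config class shipped with the checkpoint exposes these fields under its own attribute names:

```python
from transformers import AutoConfig

# trust_remote_code is required because the model ships a custom config class;
# the exact attribute names (layers, heads, vocab size, ...) are defined there.
config = AutoConfig.from_pretrained("inclusionAI/LLaDA-MoE-7B-A1B-Instruct", trust_remote_code=True)
print(config)
```
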
---

## ⚡ Quickstart

Make sure you have `transformers` and its dependencies installed:

```bash
pip install transformers torch
```

You can then load the model with the `AutoModel` and `AutoTokenizer` classes (note `trust_remote_code=True`, since the model ships custom code). The snippet below also implements LLaDA's masked-diffusion sampler: the response starts fully masked and is unmasked block by block, keeping the highest-confidence predictions at each step:

```python
import torch
import numpy as np
import torch.nn.functional as F

from transformers import AutoTokenizer, AutoModel


def add_gumbel_noise(logits, temperature):
    # Gumbel-max trick for categorical sampling; temperature == 0 reduces to
    # greedy argmax. float64 keeps the noise numerically stable.
    if temperature == 0:
        return logits
    logits = logits.to(torch.float64)
    noise = torch.rand_like(logits, dtype=torch.float64)
    gumbel_noise = (-torch.log(noise)) ** temperature
    return logits.exp() / gumbel_noise


def get_num_transfer_tokens(mask_index, steps):
    # Spread the masked positions evenly across the denoising steps; the first
    # `remainder` steps unmask one extra token each.
    mask_num = mask_index.sum(dim=1, keepdim=True)
    base = mask_num // steps
    remainder = mask_num % steps

    num_transfer_tokens = torch.zeros(mask_num.size(0), steps, device=mask_index.device, dtype=torch.int64) + base
    for i in range(mask_num.size(0)):
        num_transfer_tokens[i, :remainder[i]] += 1
    return num_transfer_tokens


@torch.no_grad()
def generate(model, prompt, steps=128, gen_length=128, block_length=128, temperature=0.,
             cfg_scale=0., remasking='low_confidence', mask_id=156895):
    # Start from a fully masked response and denoise block by block.
    x = torch.full((1, prompt.shape[1] + gen_length), mask_id, dtype=torch.long).to(model.device)
    x[:, :prompt.shape[1]] = prompt.clone()
    prompt_index = (x != mask_id)

    assert gen_length % block_length == 0
    num_blocks = gen_length // block_length
    assert steps % num_blocks == 0
    steps = steps // num_blocks  # denoising steps per block

    for num_block in range(num_blocks):
        block_mask_index = (x[:, prompt.shape[1] + num_block * block_length: prompt.shape[1] + (num_block + 1) * block_length] == mask_id)
        num_transfer_tokens = get_num_transfer_tokens(block_mask_index, steps)
        for i in range(steps):
            mask_index = (x == mask_id)
            if cfg_scale > 0.:
                # Classifier-free guidance: contrast against a prompt-masked copy.
                un_x = x.clone()
                un_x[prompt_index] = mask_id
                x_ = torch.cat([x, un_x], dim=0)
                logits = model(x_).logits
                logits, un_logits = torch.chunk(logits, 2, dim=0)
                logits = un_logits + (cfg_scale + 1) * (logits - un_logits)
            else:
                logits = model(x).logits

            logits_with_noise = add_gumbel_noise(logits, temperature=temperature)
            x0 = torch.argmax(logits_with_noise, dim=-1)  # b, l

            if remasking == 'low_confidence':
                # Confidence = model probability of each predicted token.
                p = F.softmax(logits, dim=-1)
                x0_p = torch.squeeze(
                    torch.gather(p, dim=-1, index=torch.unsqueeze(x0, -1)), -1)  # b, l
            elif remasking == 'random':
                x0_p = torch.rand((x0.shape[0], x0.shape[1]), device=x0.device)
            else:
                raise NotImplementedError(remasking)

            # Never unmask positions beyond the current block.
            x0_p[:, prompt.shape[1] + (num_block + 1) * block_length:] = -np.inf

            x0 = torch.where(mask_index, x0, x)
            confidence = torch.where(mask_index, x0_p, -np.inf)

            # Keep only the most confident predictions at this step; the rest stay masked.
            transfer_index = torch.zeros_like(x0, dtype=torch.bool, device=x0.device)
            for j in range(confidence.shape[0]):
                _, select_index = torch.topk(confidence[j], k=num_transfer_tokens[j, i])
                transfer_index[j, select_index] = True
            x[transfer_index] = x0[transfer_index]

    return x


device = 'cuda'
model = AutoModel.from_pretrained('inclusionAI/LLaDA-MoE-7B-A1B-Instruct', trust_remote_code=True, torch_dtype=torch.bfloat16).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained('inclusionAI/LLaDA-MoE-7B-A1B-Instruct', trust_remote_code=True)

prompt = "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?"
m = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": prompt}
]
prompt = tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=False)

input_ids = tokenizer(prompt)['input_ids']
input_ids = torch.tensor(input_ids).to(device).unsqueeze(0)

text = generate(model, input_ids, steps=128, gen_length=128, block_length=32, temperature=0., cfg_scale=0., remasking='low_confidence')
print(tokenizer.batch_decode(text[:, input_ids.shape[1]:], skip_special_tokens=False)[0])
```
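
The `generate` arguments trade speed for quality: `gen_length` must be a multiple of `block_length`, and `steps` must be a multiple of the resulting block count, since the denoising budget is split evenly across blocks. A few illustrative variations of the call above (the parameter values are examples, not tuned recommendations):

```python
# Fewer denoising steps: faster, but each step must commit more tokens at once.
out_fast = generate(model, input_ids, steps=64, gen_length=128, block_length=32)

# Stochastic decoding via the Gumbel noise in add_gumbel_noise.
out_sampled = generate(model, input_ids, steps=128, gen_length=128, block_length=32, temperature=0.7)

# Classifier-free guidance: one extra unconditional forward pass per step.
out_guided = generate(model, input_ids, steps=128, gen_length=128, block_length=32, cfg_scale=1.0)
```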

## 📚 Citation (Coming Soon)

We are preparing the technical report and citation information.
Stay tuned: citation details will be available soon.

---

## 🌐 License

This project is licensed under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

---

## 🤝 Contact & Collaboration

For questions, collaborations, or feedback, please reach out via the [Hugging Face model page](https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct) or open an issue in the [repository](https://github.com/your-repo-link).

👉 Join us in advancing open, efficient, and intelligent language models!