robgreenberg3 committed
Commit 9152f2f · verified · 1 Parent(s): da5030f

Update README.md

Files changed (1): README.md (+49 -46)
README.md CHANGED
@@ -67,51 +67,6 @@ Both models were trained on our [harmony response format](https://github.com/openai/harmony)

# Inference examples

- ## Transformers
-
- You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
-
- To get started, install the necessary dependencies to set up your environment:
-
- ```
- pip install -U transformers kernels torch
- ```
-
- Once set up, you can run the model with the snippet below:
-
- ```py
- from transformers import pipeline
- import torch
-
- model_id = "openai/gpt-oss-20b"
-
- pipe = pipeline(
-     "text-generation",
-     model=model_id,
-     torch_dtype="auto",
-     device_map="auto",
- )
-
- messages = [
-     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
- ]
-
- outputs = pipe(
-     messages,
-     max_new_tokens=256,
- )
- print(outputs[0]["generated_text"][-1])
- ```
-
- Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
-
- ```
- transformers serve
- transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
- ```
-
- [Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
-
## vLLM

vLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
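
The serve command itself falls outside this hunk's context lines. As a rough sketch only, assuming vLLM's standard `vllm serve` entry point and a uv-managed environment (the README's actual command may pin versions or add flags):

```
# Sketch, not from this commit: assumes vLLM's standard CLI.
uv pip install vllm
vllm serve openai/gpt-oss-20b
```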
@@ -262,12 +217,58 @@ curl https://<inference-service-name>-predictor-default.<domain>/v1/chat/completions
See [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai/2025) for more details.
</details>

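The curl command in the hunk header above is truncated at the line width. A hedged reconstruction of its shape, assuming the standard OpenAI-compatible chat completions request that KServe exposes (the placeholders come from the original header; the README's exact payload may differ):

```
# Sketch, not from this commit: standard OpenAI-compatible request shape.
# <inference-service-name> and <domain> are placeholders from the original.
curl https://<inference-service-name>-predictor-default.<domain>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}]}'
```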
+ ## Transformers
+
+ You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
+
+ To get started, install the necessary dependencies to set up your environment:
+
+ ```
+ pip install -U transformers kernels torch
+ ```
+
+ Once set up, you can run the model with the snippet below:
+
+ ```py
+ from transformers import pipeline
+ import torch
+
+ model_id = "openai/gpt-oss-20b"
+
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype="auto",
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
+ ]
+
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
+
+ ```
+ transformers serve
+ transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
+ ```
+
+ [Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
+
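The added snippet drives generation through `pipeline`, which applies the chat template (and therefore the harmony format) for you. For the `model.generate` path mentioned in the intro, here is a minimal sketch of applying the format manually via the chat template, assuming the standard `AutoTokenizer.apply_chat_template` API; this code is not part of the commit:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

# The chat template renders the messages into the harmony format before
# tokenization; `pipeline` otherwise performs this step for you.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```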

## PyTorch / Triton

To learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).

- ## Ollama
+ <details>
+ <summary><strong>Ollama</strong></summary>

If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).
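
The command block this paragraph introduces sits between the two hunks, so the diff does not show it; the hunk header below confirms `ollama run gpt-oss:20b`. As a sketch, assuming the standard Ollama CLI:

```
# The run command is confirmed by the hunk header below; the pull is assumed.
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
```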

@@ -279,6 +280,8 @@ ollama run gpt-oss:20b

[Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)

+ </details>
+
#### LM Studio

If you are using [LM Studio](https://lmstudio.ai/), you can use the following commands to download it.
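
The download commands themselves are cut off at the end of this diff view. For reference, a hedged sketch assuming LM Studio's `lms` command-line tool (not shown in the commit):

```
# Assumption: LM Studio's `lms` CLI; the command is not part of this diff.
lms get openai/gpt-oss-20b
```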