Could you tell me how to convert a Phi-3 model (safetensors) to ONNX?
Hi dear!
I am a researcher from UW, and I have a model fine-tuned from Phi-3-mini-128k-instruct. How can I convert it to ONNX?
Your help would mean a lot to me.
Thanks!
You can use ONNX Runtime GenAI's model builder to quickly convert your fine-tuned Phi-3-mini-128k-instruct model to optimized and quantized ONNX models. This example should work for your scenario.
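As a rough sketch of what that conversion could look like, the snippet below assembles a model builder invocation for a fine-tuned checkpoint stored on disk. The folder names `./my-finetuned-phi3` and `./my-finetuned-phi3-onnx`, the cache folder, and the `int4`/`cpu` choices are placeholders, not values from this thread, and it assumes the `onnxruntime-genai` package is installed. Check `python -m onnxruntime_genai.models.builder --help` for the options your installed version supports.

```python
import subprocess
import sys


def builder_command(input_dir, output_dir, precision="int4",
                    execution_provider="cpu", cache_dir="./hf_cache"):
    """Build the argument list for the ONNX Runtime GenAI model builder.

    -i: local folder with the fine-tuned Hugging Face checkpoint
    -o: folder where the ONNX model and its config files are written
    -p: target precision (for example int4, fp16, fp32)
    -e: execution provider (for example cpu, cuda)
    -c: cache folder for any files the builder needs to download
    """
    return [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-i", input_dir,
        "-o", output_dir,
        "-p", precision,
        "-e", execution_provider,
        "-c", cache_dir,
    ]


def convert(input_dir, output_dir, dry_run=True, **options):
    """Print the command and, unless dry_run is set, run the builder."""
    cmd = builder_command(input_dir, output_dir, **options)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own folders.
    convert("./my-finetuned-phi3", "./my-finetuned-phi3-onnx", dry_run=True)
```

Calling `convert(..., dry_run=False)` runs the builder for real; the dry run only prints the command so you can inspect it first.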
Hi @kvaishnavi Thank you for your help!
What about more than two safetensors files? How do I convert the many safetensors shards of one large model into a final ONNX file?
Thanks!
If your fine-tuned model can be loaded with Hugging Face's AutoModelForCausalLM.from_pretrained method, then the model builder can produce the final ONNX model from any number of .safetensors files.
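A quick way to check this before running the builder is sketched below. It lists the `.safetensors` shards in a checkpoint folder, compares them with the shard names in `model.safetensors.index.json` (the index file Hugging Face writes for sharded checkpoints), and optionally tries the `from_pretrained` load mentioned above. The helper names are made up for illustration, and the load check assumes `transformers` is installed.

```python
import json
from pathlib import Path


def list_weight_shards(model_dir):
    """Return the sorted .safetensors file names found in model_dir."""
    return sorted(p.name for p in Path(model_dir).glob("*.safetensors"))


def shards_in_index(model_dir):
    """Return shard names referenced by the index file, or [] if it is absent."""
    index_path = Path(model_dir) / "model.safetensors.index.json"
    if not index_path.exists():
        return []
    weight_map = json.loads(index_path.read_text())["weight_map"]
    return sorted(set(weight_map.values()))


def missing_shards(model_dir):
    """Return shards the index expects but that are not present on disk."""
    present = set(list_weight_shards(model_dir))
    return [name for name in shards_in_index(model_dir) if name not in present]


def can_load_with_transformers(model_dir):
    """Try the same load the model builder relies on (needs transformers)."""
    from transformers import AutoModelForCausalLM  # imported lazily on purpose

    AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
    return True
```

If `missing_shards` returns an empty list and `can_load_with_transformers` succeeds, the folder can be passed to the builder as a single input directory, however many shards it contains.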
It does work!
Thank you very much for your quick feedback!
Hi dear,
BTW, in a real application, which one should I use: model.onnx or model.onnx.data? What's the difference between them?
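For context, in the standard ONNX external-data layout `model.onnx` holds the graph and `model.onnx.data` holds the weights that the graph references. An application loads `model.onnx` (or the folder containing it), and the `.data` file has to sit next to it. The sketch below, with a hypothetical helper name, checks that both files are present before handing the graph path to a runtime. It assumes the output folder uses those two default file names.

```python
from pathlib import Path


def locate_onnx_files(model_dir, graph_name="model.onnx"):
    """Return (graph_path, data_path) after checking both files exist.

    The graph file is the one a runtime is pointed at; the .data file is
    read implicitly from the same folder because the graph refers to it.
    """
    folder = Path(model_dir)
    graph_path = folder / graph_name
    data_path = folder / (graph_name + ".data")
    if not graph_path.is_file():
        raise FileNotFoundError(f"graph file not found: {graph_path}")
    if not data_path.is_file():
        raise FileNotFoundError(
            f"external weights not found next to the graph: {data_path}"
        )
    return graph_path, data_path
```

When copying or deploying the model, move both files together; shipping `model.onnx` alone leaves the graph without its weights.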
I get it!
Thank you again!