Nexora-Vector-v0.1 · MLX 4-Bit
Nexora-Vector-v0.1 MLX 4-Bit is the official Apple MLX 4-bit quantized release of Nexora-Vector-v0.1, published by Open4bits — an official quantization project under ArkAiLabs. Nexora-Vector is an experimental text-to-vector model that generates structured SVG graphics from natural language prompts. This variant is optimized for efficient local inference on Apple Silicon hardware via the MLX framework.
Table of Contents
- Overview
- Model Details
- Capabilities
- Limitations
- Intended Use
- Architecture & Quantization
- Usage Recommendations
- Original Model
- Evaluation
- Risks & Considerations
- Future Work
- Community & Support
- License
- Acknowledgements
Overview
This release is converted for use with Apple's MLX framework. The base model, Nexora-Vector-v0.1, is a supervised fine-tune of Qwen3-4B, adapted specifically to generate structured vector graphics in SVG format from natural language instructions.
This release is in beta and is intended for research, experimentation, and early-stage design tooling on Apple Silicon machines. All outputs should be validated before use in any downstream pipeline.
Model Details
| Property | Details |
|---|---|
| Model Type | MLX 4-Bit Quantized |
| Base Model | Nexora-Vector-v0.1 |
| Original Base | Qwen3-4B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Quantization | 4-Bit (MLX) |
| Target Hardware | Apple Silicon (M1/M2/M3/M4 series) |
| Framework | MLX |
| Output Format | SVG |
| License | Apache 2.0 |
Capabilities
Nexora-Vector-v0.1 is designed to translate textual instructions into structured SVG code. This MLX version targets the same tasks as the original model, with the minor quality trade-off noted under Limitations, while enabling fast, memory-efficient inference on Apple Silicon. The model is best suited for:
- Generating SVG markup for simple vector graphics
- Producing geometric shapes and basic illustrations
- Creating lightweight icons and minimal design assets
- Supporting rapid prototyping in vector-based design workflows on macOS
Tip: The model performs best with concise, clearly scoped prompts focused on simple visual compositions.
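As a minimal sketch of local inference, the snippet below loads this repository with `mlx-lm` and sends one short, clearly scoped request. The helper names `build_messages` and `generate_svg_text`, the instruction wording, and the `max_tokens` value are illustrative assumptions, not a documented prompt format for this model.

```python
from typing import Dict, List

MODEL_ID = "Open4bits/nexora-vector-v0.1-mlx-4Bit"


def build_messages(description: str) -> List[Dict[str, str]]:
    """Wrap a short description in a single-turn chat message."""
    # Assumption: this instruction wording is illustrative only.
    instruction = f"Generate an SVG image of: {description.strip()}"
    return [{"role": "user", "content": instruction}]


def generate_svg_text(description: str, max_tokens: int = 1024) -> str:
    """Run the model once and return its raw, unvalidated text output."""
    # Imported lazily so build_messages works on machines without mlx-lm.
    from mlx_lm import load, generate

    model, tokenizer = load(MODEL_ID)
    prompt = tokenizer.apply_chat_template(
        build_messages(description), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

On an Apple Silicon machine with `mlx-lm` installed, calling `generate_svg_text("a red circle centered on a white square")` would return the raw model text, which should still be checked before rendering.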
Limitations
This is an early-stage beta release. Users should be aware of the following constraints:
- High hallucination rate — outputs may be invalid or non-renderable SVG
- Limited generalization — the small training dataset (~1,500 samples) affects output consistency
- Weak complex scene handling — highly detailed or multi-element prompts may produce poor results
- Manual correction required — outputs should be validated and post-processed before use
- Not production-ready — not suitable for safety-critical or automated pipelines
- 4-bit quality trade-off — minor quality degradation is expected compared to the full-precision original model
Intended Use
✅ Supported Use Cases
- Academic and applied research in text-to-vector generation on Apple Silicon
- Experimental AI-assisted design systems running locally on macOS
- Educational exploration of structured output generation
- Lightweight SVG prototyping and ideation with low memory overhead
❌ Out-of-Scope Use Cases
- Production-grade or commercial vector asset pipelines
- High-precision design deliverables without human validation
- Automated systems where SVG correctness is required without manual review
- Non-Apple-Silicon hardware (use the GGUF version instead)
Architecture & Quantization
This model is a 4-bit MLX quantization of the original Nexora-Vector-v0.1 weights, which are themselves a supervised fine-tune of Qwen3-4B.
Quantization Details
| Parameter | Details |
|---|---|
| Quantization Method | MLX 4-Bit |
| Source Model | ArkAiLab-Adl/nexora-vector-v0.1 |
| Framework | Apple MLX |
| Memory Reduction | ~75% vs. 16-bit (fp16) weights |
| Target Platform | macOS with Apple Silicon |
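The memory figure above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 4 billion parameters, and assumes a group-wise scheme with one 16-bit scale and one 16-bit bias per group of 64 weights. Those quantization settings are an assumption about this conversion and are not stated on this card.

```python
def quantized_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight once per-group scale and bias are included."""
    return bits + (scale_bits + bias_bits) / group_size


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 4e9  # assumption: about 4B parameters, as the Qwen3-4B name suggests

fp16_gb = weight_memory_gb(N_PARAMS, 16)
q4_gb = weight_memory_gb(N_PARAMS, quantized_bits_per_weight())
reduction = 1 - q4_gb / fp16_gb

print(f"fp16: {fp16_gb:.2f} GB, 4-bit: {q4_gb:.2f} GB, reduction: {reduction:.1%}")
```

Under these assumptions the weights shrink from about 8 GB to about 2.25 GB, a reduction of roughly 72%. The ~75% figure corresponds to the plain 4/16 ratio before per-group overhead, and runtime memory also includes activations and the KV cache.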
Original Training Configuration
| Parameter | Details |
|---|---|
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Dataset Composition | Curated prompt–SVG pairs |
| Dataset Size | ~1,500 samples |
| Training Objective | Structured output generation for SVG formats |
Note: The relatively small dataset size may result in instability and limited generalization across diverse prompts. Improved dataset coverage is planned for future versions.
Usage Recommendations
To get the best results from this model:
- Keep prompts simple and specific — avoid multi-scene or highly complex compositions
- Validate all SVG outputs before rendering or integrating into any pipeline
- Post-process outputs to correct syntax or structural issues
- Use iterative prompting — refining prompts across multiple turns often yields better results
- Expect imperfections — this is a beta model; treat outputs as drafts, not finals
- Run on Apple Silicon — this MLX build is optimized for M1/M2/M3/M4 series chips
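The validation and post-processing recommendations above can be sketched with the Python standard library. The helper names below are hypothetical. The approach pulls the first `<svg>...</svg>` span out of the raw model text, then checks that it parses as XML with an `svg` root. This confirms well-formedness only, not that the graphic renders as intended.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Non-greedy match of the first <svg ...> ... </svg> span in the output.
SVG_BLOCK = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(raw_output: str) -> Optional[str]:
    """Return the first SVG span found in the model output, or None."""
    match = SVG_BLOCK.search(raw_output)
    return match.group(0) if match else None


def is_well_formed_svg(svg_text: str) -> bool:
    """True if the text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # The tag may carry a namespace, e.g. "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

A typical flow would call `extract_svg` on the raw output and only pass the result to a renderer if `is_well_formed_svg` returns `True`. Outputs that fail can be fixed by hand or regenerated with a simpler prompt.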
Original Model
| Version | Link |
|---|---|
| Original (full precision) | ArkAiLab-Adl/nexora-vector-v0.1 |
| GGUF Quantized | Open4bits/nexora-vector-v0.1-GGUF |
| MLX 4-Bit (this model) | Open4bits/nexora-vector-v0.1-mlx-4Bit |
Evaluation
Nexora-Vector-v0.1 has not yet undergone formal benchmark evaluation. Current assessment is qualitative, based on manual testing of SVG generation tasks.
Planned evaluation metrics for future releases include:
| Metric | Description |
|---|---|
| SVG Validity Rate | Percentage of outputs that are parseable, valid SVG |
| Structural Correctness | Adherence to SVG schema and element hierarchy |
| Prompt Adherence | Alignment between user intent and generated output |
| Visual Consistency | Stability of outputs across similar prompts |
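As an illustration of how the first planned metric might be computed, the sketch below reports the percentage of outputs that parse as XML with an `svg` root element. This is an assumed, simplified definition: parseability is necessary for valid SVG but does not check the SVG schema, so it would overestimate strict validity.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def svg_validity_rate(outputs: Iterable[str]) -> float:
    """Percentage of outputs that are parseable XML with an <svg> root."""
    items = list(outputs)
    if not items:
        return 0.0
    valid = 0
    for text in items:
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            continue  # malformed markup counts as invalid
        if root.tag.rsplit("}", 1)[-1] == "svg":
            valid += 1
    return 100.0 * valid / len(items)
```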
Risks & Considerations
Developers integrating this model should account for the following risks:
- Generation of malformed or non-functional SVG code
- Inconsistent instruction following across prompt variations
- Unpredictable outputs due to limited training data coverage
- Minor quality reduction inherent to 4-bit quantization
Recommendation: Implement downstream validation layers and SVG syntax checking before any rendering or integration.
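One possible shape for such a validation layer is a gate that only releases output after a syntax check and retries a bounded number of times otherwise. In the sketch below, `generate_fn` is a hypothetical callable standing in for whatever inference call the integration uses. Retrying only helps when generation is sampled rather than deterministic.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def generate_checked_svg(generate_fn: Callable[[str], str], prompt: str,
                         max_attempts: int = 3) -> Optional[str]:
    """Return the first well-formed SVG produced, or None if all attempts fail."""
    for _ in range(max_attempts):
        candidate = generate_fn(prompt)
        try:
            root = ET.fromstring(candidate)
        except ET.ParseError:
            continue  # malformed output: try again
        if root.tag.rsplit("}", 1)[-1] == "svg":
            return candidate
    return None
```

Returning `None` rather than raising leaves the fallback decision (manual correction, a simpler prompt, or skipping the asset) to the caller.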
Future Work
The following improvements are planned for upcoming versions of the Nexora Vector series:
- Expanded and more diverse training dataset
- Improved SVG syntax correctness and validity rates
- Reduced hallucination rates
- Enhanced natural language understanding for complex prompts
- Support for richer vector compositions and multi-element scenes
- Formal benchmark evaluation suite
- Updated MLX quantized releases aligned with future model versions
Community & Support
Join the community for updates and discussion:
License
This model is released under the Apache License 2.0.
You may use, modify, and distribute this model in accordance with the terms of the Apache 2.0 license. See the LICENSE file for full details, or refer to the official Apache 2.0 license text.
Acknowledgements
This is an official ArkAiLabs release, published under the Open4bits project — ArkAiLabs' dedicated initiative for quantized model releases. The MLX 4-bit weights are derived from Nexora-Vector-v0.1, which is itself built upon Qwen3-4B by the Qwen team. We thank the MLX team at Apple and the open-source AI community for their continued contributions that make projects like this possible.
About Nexora & Open4bits
Nexora is an experimental AI initiative under ArkAiLabs, focused on building lightweight, practical, and creative AI systems for real-world applications. The Nexora Vector series represents our exploration into AI-assisted vector graphics generation.
Open4bits is ArkAiLabs' official project for quantized model releases, providing optimized variants of Nexora models for efficient local inference across different hardware platforms.