Part of the OmniVoice MLX collection.

OmniVoice-8bit (MLX)

Apple MLX weights for k2-fsa/OmniVoice, a massively multilingual zero-shot TTS model. This is a community MLX conversion for Apple Silicon; the upstream model card, license, and repository remain authoritative for non-MLX usage.

TL;DR

Variant: 8-bit
Best for: a smaller checkpoint with moderate compression
Runtime: ailuntx/OmniVoice-MLX
Official code: k2-fsa/OmniVoice
Format: MLX safetensors + tokenizer/config assets
Hardware: Apple Silicon recommended; the Linux CPU fallback used on HF Spaces can be slow

Quick Start

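Clone the MLX helper repository first, then download the weights inside it so that the relative --model path used in the inference command resolves:

git clone https://github.com/ailuntx/OmniVoice-MLX.git
cd OmniVoice-MLX
python -m venv .venv
.venv/bin/pip install -e .

hf download mlx-community/OmniVoice-8bit --local-dir ./models/OmniVoice-8bit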

Run a minimal generation from the MLX helper repository:

.venv/bin/python scripts/infer_mlx.py \
  --model ./models/OmniVoice-8bit \
  --text "Hello from OmniVoice MLX." \
  --language en \
  --output output.wav
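
After the command finishes, it can help to confirm that the output file is a readable WAV with a sensible duration. The sketch below uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of OmniVoice-MLX. It assumes the script writes integer PCM audio; if the file is stored as floating-point WAV, `wave.open` raises `wave.Error` and a different reader would be needed.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            # Duration follows directly from frame count and sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("output.wav")` from the directory where the command was run returns a dictionary; a zero frame count or zero duration suggests that generation did not produce audio.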

Variants

| Variant | Best for |
| --- | --- |
| OmniVoice | default entry |
| OmniVoice-fp32 | high-precision baseline |
| OmniVoice-bfloat16 | high-quality Apple Silicon use |
| OmniVoice-8bit | smaller local checkpoint |
| OmniVoice-4bit | smallest checkpoint and Space default |
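
To get a feel for how the variants compare in size, the following back-of-the-envelope sketch converts bits per weight into gigabytes. It assumes roughly 0.6B parameters (the figure shown on the model page, treated here as an approximation) and counts weight storage only. Quantization metadata such as scales, any layers kept at higher precision, and the audio tokenizer all add to the real checkpoint size, so treat the results as rough lower bounds rather than measured sizes. The names `BITS_PER_WEIGHT` and `weight_gigabytes` are illustrative.

```python
# Nominal bits per weight for each variant (assumption: weights only).
BITS_PER_WEIGHT = {"fp32": 32, "bfloat16": 16, "8bit": 8, "4bit": 4}

# Assumed parameter count; approximate, not measured from the checkpoint.
NUM_PARAMS = 0.6e9


def weight_gigabytes(num_params, bits):
    """Estimate weight storage in GB (10^9 bytes), ignoring all overhead."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: ~{weight_gigabytes(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions each halving of precision halves the weight storage, which is why the 8-bit variant sits between bfloat16 and 4-bit.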

Layout

OmniVoice-8bit/
├── config.json
├── model.safetensors / shards
├── tokenizer files
├── audio_tokenizer/
└── mlx_manifest.json
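
A quick way to confirm that a download matches this layout is to check for the named entries before running inference. The sketch below is a minimal check using only the standard library; `check_layout` is a hypothetical helper, not part of OmniVoice-MLX. It assumes the weights sit at the top level as one or more `*.safetensors` files and does not check tokenizer files, since their exact names are not listed here.

```python
from pathlib import Path


def check_layout(model_dir):
    """Return a list of problems found in a checkpoint directory (hypothetical helper)."""
    root = Path(model_dir)
    problems = []
    # Files named explicitly in the layout above.
    for name in ("config.json", "mlx_manifest.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    # The audio tokenizer ships as a subdirectory.
    if not (root / "audio_tokenizer").is_dir():
        problems.append("missing directory: audio_tokenizer/")
    # Either a single model.safetensors or several shards is acceptable.
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weights found")
    return problems
```

An empty list from `check_layout("./models/OmniVoice-8bit")` means the expected entries are present; it says nothing about whether their contents are valid.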

Conversion Notes

| Component | Source | MLX handling |
| --- | --- | --- |
| main model | k2-fsa/OmniVoice | converted to MLX weights |
| tokenizer/config | official checkpoint | copied for runtime compatibility |
| audio tokenizer | official OmniVoice assets | included as a required subcomponent |

Validation

Local MLX smoke tests were used during conversion. For voice cloning checks, use a full audio tokenizer; slim tokenizer assets can decode audio but do not provide a reliable speaker-encoding path.

License

License follows the upstream OmniVoice release.

Citation

@misc{omnivoice-mlx,
  title  = {OmniVoice-MLX: Apple MLX port of OmniVoice},
  author = {ailuntx},
  year   = {2026},
  url    = {https://github.com/ailuntx/OmniVoice-MLX},
}

@article{zhu2026omnivoice,
  title   = {OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
  author  = {Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
  journal = {arXiv preprint arXiv:2604.00688},
  year    = {2026},
}