MioTTS-2.6B: Lightweight & Fast LLM-based TTS


MioTTS-2.6B is a lightweight, high-speed Text-to-Speech (TTS) model based on an LLM architecture. It is designed to generate high-quality speech in English and Japanese while maintaining low latency and minimal resource usage.

This model supports zero-shot voice cloning and is built on top of the efficient neural audio codec MioCodec-25Hz-24kHz.

📊 MioTTS Family

We offer a range of model sizes to suit different performance and resource requirements.

| Model Name | Parameters | Base Model | License | RTF (Real-Time Factor) |
|---|---|---|---|---|
| MioTTS-0.1B | 0.1B | tiiuae/Falcon-H1-Tiny-Multilingual-100M-Base | Falcon-LLM License | 0.04 - 0.05 |
| MioTTS-0.4B | 0.4B | LiquidAI/LFM2-350M | LFM Open License v1.0 | 0.035 - 0.045 |
| MioTTS-0.6B | 0.6B | Qwen/Qwen3-0.6B-Base | Apache 2.0 | 0.055 - 0.065 |
| MioTTS-1.2B | 1.2B | LiquidAI/LFM2.5-1.2B-Base | LFM Open License v1.0 | 0.065 - 0.075 |
| MioTTS-1.7B | 1.7B | Qwen/Qwen3-1.7B-Base | Apache 2.0 | 0.10 - 0.11 |
| MioTTS-2.6B | 2.6B | LiquidAI/LFM2-2.6B | LFM Open License v1.0 | 0.135 - 0.145 |

RTF values represent the range observed when generating approximately 15 seconds of audio across multiple runs. Measured on an NVIDIA RTX 5090 using vLLM 0.15.1.
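
To make the RTF figures concrete, here is a minimal sketch assuming the conventional definition of real-time factor: generation time divided by the duration of the audio produced, so values below 1 mean faster than real time. The function names are illustrative and are not part of the MioTTS code; the dictionary holds the upper bound of each range reported above.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def estimated_generation_seconds(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: expected wall-clock time for a clip of this length."""
    return rtf * audio_seconds


# Upper bound of each reported RTF range (NVIDIA RTX 5090, vLLM 0.15.1).
RTF_UPPER = {
    "MioTTS-0.1B": 0.05,
    "MioTTS-0.4B": 0.045,
    "MioTTS-0.6B": 0.065,
    "MioTTS-1.2B": 0.075,
    "MioTTS-1.7B": 0.11,
    "MioTTS-2.6B": 0.145,
}

if __name__ == "__main__":
    for name, rtf in RTF_UPPER.items():
        seconds = estimated_generation_seconds(rtf, 15.0)
        print(f"{name}: about {seconds:.2f} s to generate 15 s of audio")
```

Under this definition, MioTTS-2.6B at an RTF of 0.145 would need roughly 2.2 seconds to produce a 15-second clip on the hardware listed above. Results on other GPUs or serving stacks will differ.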

🌟 Key Features

  • Lightweight & Fast: Optimized for speed, making it suitable for consumer-grade GPUs and edge deployment.
  • Bilingual Support: Trained on approximately 100,000 hours of English and Japanese data.
  • Voice Cloning: Supports high-fidelity zero-shot voice cloning from a short reference audio clip.
  • Efficient Codec: Uses Aratako/MioCodec-25Hz-24kHz, which operates at a low framerate (25Hz) for faster generation without sacrificing quality.
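
The effect of the low frame rate can be illustrated with simple arithmetic. The sketch below assumes that "25Hz-24kHz" in the codec name means 25 codec frames per second at a 24 kHz output sample rate; the number of tokens the LLM emits per frame is not stated here, so the code counts frames only. The constant and function names are hypothetical.

```python
import math

FRAME_RATE_HZ = 25        # codec frames per second (stated above)
SAMPLE_RATE_HZ = 24_000   # assumed output sample rate, read from the codec name


def frames_for_duration(seconds: float) -> int:
    """Number of codec frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * FRAME_RATE_HZ)


def samples_per_frame() -> int:
    """Waveform samples represented by a single codec frame."""
    return SAMPLE_RATE_HZ // FRAME_RATE_HZ


if __name__ == "__main__":
    print(frames_for_duration(15))  # 375
    print(samples_per_frame())      # 960
```

Under these assumptions, a 15-second utterance corresponds to 375 frames, each covering 960 samples. Fewer frames per second means fewer autoregressive steps for the LLM, which is where the speed gain comes from.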

🚀 Inference

We provide a dedicated repository for inference, including installation instructions and an example WebUI.

👉 GitHub: Aratako/MioTTS-Inference

🎧 Audio Samples

Below are some samples generated by MioTTS-2.6B.

Note: The reference audio samples below were generated using Aratako/T5Gemma-TTS-2b-2b and gemini-2.5-pro-tts.

| Case | Text | Reference Audio | Generated Audio |
|---|---|---|---|
| English 1 | "The old library was silent, save for the gentle ticking of a clock somewhere in the shadows. As I ran my fingers along the dusty spines of the books, I felt a strange sense of nostalgia, as if I had lived a thousand lives within these walls." | (audio) | (audio) |
| English 2 | "Hey! I haven't seen you in ages. Do you want to grab some coffee later? I've got so much to tell you!" | (audio) | (audio) |
| Japanese 1 | "気象庁によりますと、大型の台風10号は、明日の明け方にかけて関東地方に接近する見込みです。沿岸部では高波に警戒が必要です。" | (audio) | (audio) |
| Japanese 2 | "その森には、古い言い伝えがありました。月が最も高く昇る夜、静かに耳を澄ませば、風の歌声が聞こえるというのです。私は半信半疑でしたが、その夜、確かに誰かが私を呼ぶ声を聞いたのです。" | (audio) | (audio) |

🏗️ Training Details

📜 License & Ethical Restrictions

License

This model is released under the LFM Open License v1.0.

Ethical Considerations & Limitations

While this model is released under a permissive license, we aim to promote responsible AI development and urge users to respect the rights of others.

  1. Voice Cloning: Please respect the privacy and rights of individuals. We strongly discourage using this model to clone the voices of real people (especially non-consenting individuals) for deceptive or harmful purposes.
  2. No Misinformation: This model should not be used to generate deepfakes intended to mislead others or spread misinformation.
  3. Disclaimer: The developers assume no liability for any misuse of this model. Users are solely responsible for ensuring their use of the generated content complies with applicable laws and regulations in their jurisdiction.

🙏 Acknowledgments

  • Compute Support: Part of the compute resources for this project were provided by Saldra, Witness and Lumina Logic Minds. We deeply appreciate their support.
  • Base Model: We thank the developers of the base LLM for their open-source contributions.
  • Community: Thanks to the open-source community for the datasets and tools that made this project possible.

🖊️ Citation

If you use MioTTS in your research or project, please cite it as follows:

@misc{miotts,
  author = {Chihiro Arata},
  title = {MioTTS: Lightweight and Fast LLM-based Text-to-Speech},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/collections/Aratako/miotts}}
}