Polyglot-Lion-1.7B: High-accuracy multilingual ASR for Singapore — English, Mandarin, Tamil & Malay

Project Page | GitHub | License: MIT

[Figure: Average error rate comparison across models]

CHANGE LOG: This version was retrained on the same dataset without punctuation removal to improve the model’s ability to recognize pauses and sentence boundaries in speech.

About

Polyglot-Lion-1.7B was developed by Quy-Anh Dang and Chris Ngo at Knovel Engineering and presented in the report "Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR".

The model is obtained by fine-tuning Qwen3-ASR-1.7B exclusively on publicly available speech corpora covering Singapore's four official languages. It utilizes a balanced sampling strategy that equalizes the number of training utterances per language and deliberately omits language-tag conditioning, allowing the model to learn to identify languages implicitly from audio.
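The balanced sampling idea can be illustrated with a minimal sketch. It assumes the corpus is a list of `(language, audio_path)` pairs and that balancing is done by downsampling every language to the size of the smallest one. Both the data layout and the downsampling choice are assumptions made for illustration; the report may equalize counts differently. The function name `balance_by_language` and the file names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_language(utterances, seed=0):
    """Return a shuffled subset with the same number of utterances per language.

    `utterances` is a list of (language, audio_path) pairs (hypothetical layout).
    The language label is used only for bucketing; in a setup without
    language-tag conditioning it would not be passed to the model.
    """
    by_lang = defaultdict(list)
    for lang, path in utterances:
        by_lang[lang].append(path)

    # Assumption: equalize by downsampling to the smallest language bucket.
    target = min(len(paths) for paths in by_lang.values())

    rng = random.Random(seed)
    balanced = []
    for lang in sorted(by_lang):
        for path in rng.sample(by_lang[lang], target):
            balanced.append((lang, path))
    rng.shuffle(balanced)
    return balanced


# Toy corpus with deliberately uneven language sizes.
corpus = (
    [("english", f"en_{i}.wav") for i in range(5)]
    + [("mandarin", f"zh_{i}.wav") for i in range(3)]
    + [("tamil", f"ta_{i}.wav") for i in range(2)]
    + [("malay", f"ms_{i}.wav") for i in range(4)]
)

balanced = balance_by_language(corpus)
counts = Counter(lang for lang, _ in balanced)
print(counts)
```

After balancing, each of the four languages contributes the same number of utterances, so no single language dominates the fine-tuning batches.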

Polyglot-Lion-1.7B achieves an average error rate of 14.85, competitive with MERaLiON-2-10B-ASR (14.32), a model 6× larger, while running inference 20× faster.

  • Parameters: 1.7B
  • Languages: English, Mandarin, Tamil, Malay
  • Training cost: $81 on a single NVIDIA RTX PRO 6000 (48 h)
  • Inference speed: ~0.10 s/sample on RTX PRO 4500
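The headline averages quoted above can be cross-checked against the per-benchmark numbers in the Results table. The sketch below assumes the Avg column is an unweighted mean over the 12 benchmark columns. That is an inference from the published numbers, not something the card states, and small differences are expected because the table entries are already rounded.

```python
# Per-benchmark error rates copied from the Results table (12 columns each).
SCORES = {
    "MERaLiON-2-10B-ASR": [2.54, 4.62, 8.83, 3.09, 4.07, 11.99,
                           31.78, 19.29, 22.42, 28.68, 25.90, 8.55],
    "Polyglot-Lion-1.7B": [2.10, 5.28, 4.91, 1.45, 1.86, 8.00,
                           39.19, 19.75, 26.83, 37.28, 21.51, 9.98],
}
# Averages as reported in the Avg column.
REPORTED_AVG = {"MERaLiON-2-10B-ASR": 14.32, "Polyglot-Lion-1.7B": 14.85}


def macro_average(scores):
    """Unweighted mean over benchmarks (assumed averaging scheme)."""
    return sum(scores) / len(scores)


computed = {model: macro_average(s) for model, s in SCORES.items()}
for model, avg in computed.items():
    print(f"{model}: computed {avg:.3f}, reported {REPORTED_AVG[model]}")

# Nominal parameter counts from the Params column, in billions.
size_ratio = 10 / 1.7
print(f"size ratio: {size_ratio:.1f}x")
```

Under this assumption the computed means agree with the reported averages to within rounding, and the nominal parameter ratio is close to the quoted 6×.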

Results

| Model | Params | English (LS) | English (NSC) | Mandarin (CV) | Mandarin (AISH1) | Mandarin (AISH3) | Mandarin (Fleurs) | Tamil (CV) | Tamil (SLR65) | Tamil (SLR127) | Tamil (Fleurs) | Malay (Meso.) | Malay (Fleurs) | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whisper-large-v3-turbo | 0.8B | 3.04 | 32.02 | 17.91 | 9.64 | 16.81 | 10.63 | 74.50 | 58.13 | 69.56 | 66.90 | 28.47 | 8.88 | 33.04 |
| SeaLLMs-Audio-7B | 7B | 94.74 | 9.53 | 8.68 | 9.65 | 9.76 | 37.09 | 126.70 | 127.24 | 138.65 | 105.31 | 71.34 | 26.25 | 63.75 |
| Qwen2.5-Omni-3B | 3B | 29.21 | 34.79 | 46.36 | 28.25 | 44.55 | 54.74 | 318.36 | 465.58 | 448.82 | 311.67 | 211.90 | 74.69 | 172.37 |
| Qwen2.5-Omni-7B | 7B | 13.80 | 22.96 | 14.49 | 7.33 | 22.58 | 16.68 | 252.06 | 239.15 | 303.96 | 326.43 | 158.06 | 43.92 | 118.45 |
| Qwen3-ASR-0.6B | 0.6B | 2.74 | 7.64 | 10.06 | 2.08 | 2.59 | 9.75 | 121.10 | 127.00 | 129.12 | 130.09 | 47.29 | 18.71 | 50.68 |
| Qwen3-ASR-1.7B | 1.7B | 2.31 | 6.22 | 7.50 | 1.52 | 2.08 | 9.33 | 139.96 | 134.63 | 144.49 | 147.23 | 39.00 | 10.87 | 53.76 |
| MERaLiON-2-10B-ASR | 10B | 2.54 | **4.62** | 8.83 | 3.09 | 4.07 | 11.99 | **31.78** | **19.29** | **22.42** | **28.68** | 25.90 | **8.55** | **14.32** |
| Polyglot-Lion-0.6B | 0.6B | 2.67 | 6.09 | 6.16 | 1.93 | 2.32 | 9.19 | 42.16 | 23.07 | 28.14 | 37.68 | 24.33 | 14.45 | 16.52 |
| Polyglot-Lion-1.7B | 1.7B | **2.10** | 5.28 | **4.91** | **1.45** | **1.86** | **8.00** | 39.19 | 19.75 | 26.83 | 37.28 | **21.51** | 9.98 | 14.85 |

WER (%) for English, Tamil, and Malay; CER (%) for Mandarin. Lower is better. Bold = best overall.
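For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is a generic illustration, not the evaluation script behind the table: text normalization (casing, punctuation, spacing) is not specified here, so the sketch only splits on whitespace for WER and drops spaces for CER. Those choices are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, tokenizing on whitespace."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One deleted word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on mat"))
# One substituted character out of six reference characters.
print(cer("今天天气很好", "今天天气真好"))
# Insertions count as errors, so the rate can exceed 100%.
print(wer("hello", "a b c"))
```

The last call shows why some table entries are above 100: errors are divided by the reference length, so a hypothesis much longer than the reference can produce more errors than there are reference tokens.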

Quick Start

See mlx-audio for inference.

Citation

@misc{dang2026polyglotlion,
    title={Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR}, 
    author={Quy-Anh Dang and Chris Ngo},
    year={2026},
    eprint={2603.16184},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.16184}, 
}