Polyglot-Lion-1.7B: High-accuracy multilingual ASR for Singapore — English, Mandarin, Tamil & Malay

Average error rate comparison across models

CHANGE LOG: This version was retrained on the same dataset without punctuation removal to improve the model’s ability to recognize pauses and sentence boundaries in speech.

About

Polyglot-Lion-1.7B was developed by Quy-Anh Dang and Chris Ngo at Knovel Engineering and presented in the report "Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR".

The model is obtained by fine-tuning Qwen3-ASR-1.7B exclusively on publicly available speech corpora covering Singapore's four official languages. It utilizes a balanced sampling strategy that equalizes the number of training utterances per language and deliberately omits language-tag conditioning, allowing the model to learn to identify languages implicitly from audio.

Polyglot-Lion-1.7B achieves an average error rate of 14.85 — competitive with MERaLiON-2-10B-ASR (14.32), a model 6× larger and 20× faster inference.

Parameters: 1.7B
Languages: English, Mandarin, Tamil, Malay
Training cost: $81 on a single NVIDIA RTX PRO 6000 (48 h)
Inference speed: ~0.10 s/sample on RTX PRO 4500

Results

Model	Params	English (LS)	English (NSC)	Mandarin (CV)	Mandarin (AISH1)	Mandarin (AISH3)	Mandarin (Fleurs)	Tamil (CV)	Tamil (SLR65)	Tamil (SLR127)	Tamil (Fleurs)	Malay (Meso.)	Malay (Fleurs)	Avg
Whisper-large-v3-turbo	0.8B	3.04	32.02	17.91	9.64	16.81	10.63	74.50	58.13	69.56	66.90	28.47	8.88	33.04
SeaLLMs-Audio-7B	7B	94.74	9.53	8.68	9.65	9.76	37.09	126.70	127.24	138.65	105.31	71.34	26.25	63.75
Qwen2.5-Omni-3B	3B	29.21	34.79	46.36	28.25	44.55	54.74	318.36	465.58	448.82	311.67	211.90	74.69	172.37
Qwen2.5-Omni-7B	7B	13.80	22.96	14.49	7.33	22.58	16.68	252.06	239.15	303.96	326.43	158.06	43.92	118.45
Qwen3-ASR-0.6B	0.6B	2.74	7.64	10.06	2.08	2.59	9.75	121.10	127.00	129.12	130.09	47.29	18.71	50.68
Qwen3-ASR-1.7B	1.7B	2.31	6.22	7.50	1.52	2.08	9.33	139.96	134.63	144.49	147.23	39.00	10.87	53.76
MERaLiON-2-10B-ASR	10B	2.54	4.62	8.83	3.09	4.07	11.99	31.78	19.29	22.42	28.68	25.90	8.55	14.32

Polyglot-Lion-0.6B	0.6B	2.67	6.09	6.16	1.93	2.32	9.19	42.16	23.07	28.14	37.68	24.33	14.45	16.52
Polyglot-Lion-1.7B	1.7B	2.10	5.28	4.91	1.45	1.86	8.00	39.19	19.75	26.83	37.28	21.51	9.98	14.85

WER (%) for English, Tamil, and Malay; CER (%) for Mandarin. Lower is better. Bold = best overall.

Quick Start

See mlx-audio for inference.

Citation

@misc{dang2026polyglotlion,
    title={Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR}, 
    author={Quy-Anh Dang and Chris Ngo},
    year={2026},
    eprint={2603.16184},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.16184}, 
}