Multilingual version planned?

#2
by fosple - opened

Thanks for sharing this great model! I've been looking for a fast, truly streaming model for a long time. Is a multilingual version planned, especially for German and other European languages? What's the rough timeframe? :) Thanks a lot!

NVIDIA org
edited Jan 6

Hi @fosple, yes, we're planning a multilingual version that will cover German and other European languages. We will share more updates over the coming months!

For English, is this any better than Parakeet V0.2/0.3? (Not a streaming scenario; I need the highest-quality transcripts.)

Hey @kunaldhawan, thanks for the model and the cache-aware streaming. The technical blog was super informative.

Just wondering if there are any plans to add support for Indian Languages, especially Hindi/Hinglish? Thanks!

Hey @pushkar-nurix, you can try this model in the meantime. It's not Hinglish, but it supports Hindi very accurately for now. It's based on the NeMo architecture, and we are working on and plan to release a Hinglish model, as well as cache-aware models, in a few weeks.

https://huggingface.co/spaces/RinggAI/STT

@kunaldhawan, is there any chance we could collaborate on releasing a subset of our Indic ASR models together sometime in the future?

@kunaldhawan Hi Kunal, regarding your post on Indic ASR: I'm currently working on smart-tuning techniques, which I think could be really useful for the Hinglish models you plan to release. I'd love to collaborate on this.

@kunaldhawan Hi, is there any chance you could share the training recipes and explain how to build a tokenizer for a new language?
I am planning to experiment with Arabic datasets.

NVIDIA org

Hi @gnomefin, you can refer to the following NeMo scripts and configs to train or fine-tune the model and to build a tokenizer for a new language:

That sounds great, @harsh2ai and @vrushildavra. Please feel free to open an issue or PR in the NeMo repository and cc me. I'd be happy to collaborate.

Thanks for the feedback, @pushkar-nurix. Support for Indian languages is definitely something we're interested in, and we'll look into adding it to the roadmap for upcoming releases.

Hi, how much data is required to fine-tune on a new language if I use the English pretrained weights?
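As a first step toward answering this, it helps to measure how much transcribed audio you actually have. The sketch below assumes NeMo-style JSON-lines manifests (one JSON object per line with `audio_filepath`, `duration` in seconds, and `text`); the function name `summarize_manifest` and the file names in the example are hypothetical. It reports the total hours and the set of characters in the transcripts, which is also the alphabet a tokenizer for a new language such as Arabic would need to cover. It only measures your data; it does not say how many hours are required.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_manifest(lines: Iterable[str]) -> dict:
    """Summarize a JSON-lines ASR manifest (NeMo-style format assumed).

    Each non-empty line is assumed to be a JSON object with at least a
    "duration" key (seconds) and a "text" key (the transcript).
    """
    total_seconds = 0.0
    num_utterances = 0
    char_counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        entry = json.loads(line)
        total_seconds += float(entry["duration"])
        num_utterances += 1
        # Collect non-space characters: the alphabet a new-language tokenizer must cover
        char_counts.update(ch for ch in entry["text"] if not ch.isspace())
    return {
        "utterances": num_utterances,
        "hours": total_seconds / 3600.0,
        "alphabet": sorted(char_counts),
    }


if __name__ == "__main__":
    # Hypothetical in-memory manifest; a real one would usually be read from a file,
    # e.g. summarize_manifest(open("train_manifest.json", encoding="utf-8"))
    example = [
        '{"audio_filepath": "clip_0001.wav", "duration": 1800.0, "text": "hello world"}',
        '{"audio_filepath": "clip_0002.wav", "duration": 1800.0, "text": "hello there"}',
        "",
    ]
    print(summarize_manifest(example))
```

Iterating line by line means an open file handle can be passed directly, so large manifests are never loaded into memory all at once.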
