Stage-2 fine-tuning details

#69
by leestevennz - opened

Hi,

One of the listed key features of parakeet-tdt-0.6b-v2 is its robust performance on spoken numbers. The model card also mentions a stage-2 fine-tune for 2500 steps on a 500-hour subset of nemo-asrset-3.0. I expect that fine-tuning on high-quality, human-labelled data contributes to those key features of robustness and number accuracy.

  1. Could you provide more details about this fine-tuning subset and the stage-2 process?
  2. Is there any possibility of accessing this 500-hour subset?
  3. Any recommendations for maintaining number transcription accuracy during custom fine-tuning?
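For context on question 3, here is a rough sketch of the kind of thing I have been considering: mixing synthetic number-heavy utterances into my fine-tuning data so the model keeps seeing spelled-out numbers. The helper names and the manifest fields here are my own illustration (modelled on the common NeMo-style JSON-lines manifest), not something taken from the model card:

```python
import json

# Spoken-English word tables for a minimal 0-999 converter.
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out an integer in 0-999 as spoken English words."""
    if n < 20:
        return UNITS[n]
    if n < 100:
        tens, rem = divmod(n, 10)
        return TENS[tens] + (" " + UNITS[rem] if rem else "")
    hundreds, rem = divmod(n, 100)
    words = UNITS[hundreds] + " hundred"
    return words + (" " + number_to_words(rem) if rem else "")

def manifest_entry(audio_path: str, duration: float, n: int) -> str:
    """One JSON line in a NeMo-style manifest, with the target
    transcript containing the number written out in words."""
    return json.dumps({"audio_filepath": audio_path,
                       "duration": duration,
                       "text": number_to_words(n)})
```

The idea would be to generate (or record) utterances like these and keep some fraction of them in every fine-tuning epoch, but I would appreciate guidance on whether that matches what was done in stage 2.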

Thanks in advance.
