Yifan Yang

yfyeung

https://yfyeung.github.io

AI & ML interests

None yet

Recent Activity

new activity 17 days ago

AlexTYJ/Multilingual-ASR-Benchmark:Delete testbatch

new activity 19 days ago

AlexTYJ/Multilingual-ASR-Benchmark:Delete text/hyp/testbatch/text

new activity 23 days ago

Hui519/SpeechEval:Delete .vscode

authored 2 papers 2 months ago

SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations

Paper • 2510.25955 • Published Oct 29, 2025

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation

Paper • 2510.14664 • Published Oct 16, 2025

authored 2 papers 3 months ago

Towards Responsible Evaluation for Text-to-Speech

Paper • 2510.06927 • Published Oct 8, 2025

Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration

Paper • 2509.19928 • Published Sep 24, 2025

authored 2 papers 6 months ago

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Paper • 2506.12570 • Published Jun 14, 2025

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

Paper • 2409.08797 • Published Sep 13, 2024

authored 2 papers 7 months ago

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

Paper • 2505.19669 • Published May 26, 2025

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining

Paper • 2505.21527 • Published May 23, 2025

authored 2 papers 9 months ago

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Paper • 2504.12867 • Published Apr 17, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Paper • 2504.10352 • Published Apr 14, 2025

authored 4 papers 10 months ago

Exploring SSL Discrete Tokens for Multilingual ASR

Paper • 2409.08805 • Published Sep 13, 2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Paper • 2409.19510 • Published Sep 29, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

Paper • 2406.06619 • Published Jun 7, 2024

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching

Paper • 2502.11128 • Published Feb 16, 2025

authored 3 papers about 1 year ago

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Paper • 2412.15649 • Published Dec 20, 2024

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Paper • 2412.16102 • Published Dec 20, 2024

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Paper • 2411.17100 • Published Nov 26, 2024

authored 3 papers over 1 year ago

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Paper • 2409.00819 • Published Sep 1, 2024

Zipformer: A faster and better encoder for automatic speech recognition

Paper • 2310.11230 • Published Oct 17, 2023

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Paper • 2401.14321 • Published Jan 25, 2024