Special Topic: Speech Processing for Dialogue
Required Readings
- Kun Wei, Yike Zhang, Sining Sun, Lei Xie, and Long Ma. Conversational speech recognition by learning conversation-level characteristics. ICASSP 2022.
- Suyoun Kim and Florian Metze. Dialog-context aware end-to-end speech recognition. SLT 2018.
- Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, and Keiichi Tokuda. End-to-end text-to-speech based on latent representation of speaking styles using spontaneous dialogue. Interspeech 2022.
- Eva Szekely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson. Spontaneous conversational speech synthesis from found data. Interspeech 2019.
Other Readings
- Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. Toward human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12):2410-2423, 201
- Wayne Xiong, Lingfeng Wu, Jun Zhang, and Andreas Stolcke. Session-level Language Modeling for Conversational Speech. EMNLP 2018.
- Kallirroi Georgila, Anton Leuski, Volodymyr Yanov, and David Traum. Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains. LREC 2020.
- Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, and Dan Su. Controllable context-aware conversational speech synthesis. Interspeech 2021.
- Johannah O’Mahony, Catherine Lai, and Simon King. Synthesising turn-taking cues using natural conversational data. Speech Synthesis Workshop 2023.
- Elijah Gutierrez, Pilar Oplustil-Gallegos, and Catherine Lai. Location, location: Enhancing the evaluation of text-to-speech synthesis using the rapid prosody transcription paradigm. Speech Synthesis Workshop 2019.
- Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, and Bryan Catanzaro. Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities. arXiv:2402.01831 2024.
- Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, and Kang Min Yoo. Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation. NeurIPS 2024. arXiv:2402.05706 2024.