Special Topic: Speech Processing for Dialogue

Required Readings

  1. Kun Wei, Yike Zhang, Sining Sun, Lei Xie, and Long Ma. Conversational speech recognition by learning conversation-level characteristics. ICASSP 2022.
  2. Suyoun Kim and Florian Metze. Dialog-context aware end-to-end speech recognition. SLT 2018.
  3. Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, and Keiichi Tokuda. End-to-end text-to-speech based on latent representation of speaking styles using spontaneous dialogue. Interspeech 2022.
  4. Eva Szekely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson. Spontaneous conversational speech synthesis from found data. Interspeech 2019.

Other Readings
  1. Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. Toward human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12):2410-2423, 2017.
  2. Wayne Xiong, Lingfeng Wu, Jun Zhang, and Andreas Stolcke. Session-level Language Modeling for Conversational Speech. EMNLP 2018.
  3. Kallirroi Georgila, Anton Leuski, Volodymyr Yanov, and David Traum. Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains. LREC 2020.
  4. Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, and Dan Su. Controllable context-aware conversational speech synthesis. Interspeech 2021.
  5. Johannah O’Mahony, Catherine Lai, and Simon King. Synthesising turn-taking cues using natural conversational data. Speech Synthesis Workshop 2023.
  6. Elijah Gutierrez, Pilar Oplustil-Gallegos, and Catherine Lai. Location, location: Enhancing the evaluation of text-to-speech synthesis using the rapid prosody transcription paradigm. Speech Synthesis Workshop 2021.
  7. Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, and Bryan Catanzaro. Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities. arXiv:2402.01831 2024.
  8. Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, and Kang Min Yoo. Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation. NeurIPS 2024. arXiv:2402.05706.