Erica Cooper

クーパー・エリカ

Last update: 2024-04-01 (created page)

About

I completed my Ph.D. at Columbia University in the City of New York in 2019, with a research focus on text-to-speech synthesis for low-resource languages. From February 2019 to March 2024, I worked at the National Institute of Informatics in Tokyo, Japan, as a contributor to the JST-ANR CREST VoicePersonae project. I am currently working at the National Institute of Information and Communications Technology (NICT) in Kyoto, Japan. My research interests include speech and audio processing and synthesis, and I was a co-organizer of the VoiceMOS Challenge in 2022 and 2023.

Work and education history

Peer-reviewed publications

Synvox2: Towards A Privacy-Friendly Voxceleb2 Dataset. Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 14, 2024

Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction. Aditya Ravuri, Erica Cooper, Junichi Yamagishi. IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond, Apr, 2024

Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances. Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi. Computer Speech & Language, 86 101619-101619, Jun, 2024

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains. Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi. ASRU 2023, Dec, 2023

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting. Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah. ASRU 2023, Dec, 2023

Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music. Lifan Zhong, Erica Cooper, Junichi Yamagishi, Nobuaki Minematsu. APSIPA ASC 2023, Oct, 2023

Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech. Erica Cooper, Junichi Yamagishi. Interspeech 2023, Aug, 2023

SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion. Orian Sharoni, Roee Shenberg, Erica Cooper. Interspeech 2023, Aug, 2023

Range-Based Equal Error Rate for Spoof Localization. Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi. Interspeech 2023, Aug, 2023

Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms. Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi. Interspeech 2023, Aug, 2023

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan. Submitted to ICASSP 2023, Jun, 2023

Speaker Anonymization using Orthogonal Householder Neural Network. Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1-15, 2023

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31 813-825, 2023

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions. Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko. Interspeech 2022, Sep, 2022

The VoiceMOS Challenge 2022. Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi. Interspeech 2022, Sep, 2022

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models. Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko. Odyssey 2022: The Speaker and Language Recognition Workshop, Jun, 2022

Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances. Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi. ICASSP 2022, May, 2022

On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis. Cheng-I Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander Liu, Junichi Yamagishi, David Cox … ICASSP 2022, May, 2022

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech. Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda. ICASSP 2022, May, 2022

Generalization Ability of MOS Prediction Networks. Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi. ICASSP 2022, May, 2022

Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds. Xuan Shi, Erica Cooper, Junichi Yamagishi. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30 367-377, Jan, 2022

Multi-task learning in utterance-level and segmental-level spoof detection. Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi. ASVspoof 2021, Sep, 2021

An Initial Investigation for Detecting Partially Spoofed Audio. Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans. Interspeech 2021, Sep, 2021

How do Voices from Past Speech Synthesis Challenges Compare Today? Erica Cooper, Junichi Yamagishi. 11th ISCA Speech Synthesis Workshop, Aug, 2021

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis. Erica Cooper, Xin Wang, Junichi Yamagishi. 11th ISCA Speech Synthesis Workshop, Aug, 2021

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE. Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi. 11th ISCA Speech Synthesis Workshop, Aug, 2021

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm. Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi. ICASSP 2021, Jun, 2021

How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers? Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Junichi Yamagishi. ICASSP 2021, Jun, 2021

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction. Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi. Interspeech 2020, Oct, 2020

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS? Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi. Interspeech 2020, Oct, 2020

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings. Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi. ICASSP 2020, May, 2020

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences. Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi. IEEE Access, 8 138149-138161, 2020

Subset Selection, Adaptation and Gemination for Amharic Text-to-Speech Synthesis. Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg. 10th ISCA Speech Synthesis Workshop (SSW10), Sep, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences. Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi. 10th ISCA Speech Synthesis Workshop (SSW10), Sep, 2019

A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis. Kai-Zhan Lee, Erica Cooper, Julia Hirschberg. Interspeech, September 2018, Hyderabad, India.

Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis. Erica Cooper, Julia Hirschberg. Speech Prosody, June 2018, Poznań, Poland.

Characteristics of Text-to-Speech and Other Corpora. Erica Cooper, Emily Li, Julia Hirschberg. Speech Prosody, June 2018, Poznań, Poland.

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data. Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg. Interspeech, August 2017, Stockholm, Sweden.

Data Selection and Adaptation for Naturalness in HMM-based Speech Synthesis. Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg. Interspeech, September 2016, San Francisco, California.

Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search. Gideon Mendels, Erica Cooper, Julia Hirschberg. 10th Web as Corpus Workshop (WAC-X), August 2016, Berlin, Germany.

Data Selection for Naturalness in HMM-based Speech Synthesis. Erica Cooper, Yocheved Levitan, Julia Hirschberg. Speech Prosody, June 2016, Boston, Massachusetts.

Improving Speech Recognition and Keyword Search for Low Resource Languages Using Web Data. Gideon Mendels, Erica Cooper, Victor Soto, Julia Hirschberg, Mark Gales, Kate Knill, Anton Ragni, Haipeng Wang. Interspeech, September 2015, Dresden, Germany.

Rescoring Confusion Networks for Keyword Search. Victor Soto, Erica Cooper, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg. International Conference on Acoustics, Speech and Signal Processing, May 2014, Florence, Italy.

Cross-Language Phrase Boundary Detection. Victor Soto, Erica Cooper, Andrew Rosenberg, Julia Hirschberg. International Conference on Acoustics, Speech and Signal Processing, May 2013, Vancouver, Canada.

Cross-Language Prominence Detection. Andrew Rosenberg, Erica Cooper, Rivka Levitan, Julia Hirschberg. Speech Prosody, May 2012, Shanghai, China.

Effect of Pronunciations on OOV Queries in Spoken Term Detection. Dogan Can, Erica Cooper, Abhinav Sethy, Chris White, Bhuvana Ramabhadran, Murat Saraclar. International Conference on Acoustics, Speech and Signal Processing, April 2009, Taipei, Taiwan.

Unsupervised Pronunciation Validation. Christopher M. White, Abhinav Sethy, Bhuvana Ramabhadran, Patrick Wolfe, Erica Cooper, Murat Saraclar, James K. Baker. International Conference on Acoustics, Speech and Signal Processing, April 2009, Taipei, Taiwan.

Web-derived Pronunciations for Spoken Term Detection. Dogan Can, Erica Cooper, Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Bhuvana Ramabhadran, Michael Riley, Murat Saraclar, Abhinav Sethy, Morgan Ulinski, Christopher White. Special Interest Group on Information Retrieval, July 2009, Boston, Massachusetts.