Whisper to Normal Conversion by WESPER and comparison with other methods

Comparison with MMSE-DiscoGAN, MSpeC-Net, and WESPER (proposed)

(Audio samples other than WESPER's are obtained from the MSpeC-Net demo page.)

Whisper (source) | MMSE-DiscoGAN [1] | MSpeC-Net [2] | WESPER (ours)
  1. N. Shah, M. Parmar, N. Shah, and H. Patil, "Novel MMSE DiscoGAN for cross-domain whisper-to-speech conversion," in Proc. Machine Learning in Speech and Language Processing (MLSLP) Workshop, 2018.
  2. H. Malaviya, J. Shah, M. Patel, J. Munshi, and H. A. Patil, "MSpeC-Net: Multi-Domain Speech Conversion Network," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7764–7768.

Comparison with GMM, BLSTM, CycleGAN-VC, AGAN-W2SC, and WESPER (proposed)

(Audio samples other than WESPER's are obtained from the AGAN-W2SC demo page.)

Whisper (source) | GMM [3] | BLSTM [4] | CycleGAN-VC [5] | AGAN-W2SC [6] | WESPER (ours)
  3. T. Toda and K. Shikano, "NAM-to-speech conversion with Gaussian mixture models," in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Lisbon, Portugal, Sept. 2005, pp. 1957–1960.
  4. G. N. Meenakshi and P. K. Ghosh, "Whispered speech to neutral speech conversion using bidirectional LSTMs," in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Hyderabad, India, 2018, pp. 491–495.
  5. T. Kaneko and H. Kameoka, "Parallel-data-free voice conversion using cycle-consistent adversarial networks," arXiv preprint arXiv:1711.11293, Dec. 2017.
  6. T. Gao, J. Zhou, H. Wang, L. Tao, and H. K. Kwan, "Attention-guided generative adversarial network for whisper to normal speech conversion," arXiv preprint arXiv:2111.01342, 2021.