Whisper to Normal Conversion by WESPER and comparison with other methods

Comparison with MMSE-DiscoGAN, MSpeC-Net, and WESPER (proposed)

(Audio samples other than WESPER's are obtained from the MSpeC-Net demo page.)

Whisper (source) | MMSE-DiscoGAN [1] | MSpeC-Net [2] | WESPER (ours)
  1. N. Shah, M. Parmar, N. Shah, and H. Patil, "Novel MMSE DiscoGAN for cross-domain whisper-to-speech conversion," in Proc. Machine Learning in Speech and Language Processing (MLSLP) Workshop, 2018.
  2. H. Malaviya, J. Shah, M. Patel, J. Munshi, and H. A. Patil, "MSpeC-Net: Multi-Domain Speech Conversion Network," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7764–7768.

Comparison with GMM, BLSTM, CycleGAN-VC, AGAN-W2SC, and WESPER (proposed)

(Audio samples other than WESPER's are obtained from the AGAN-W2SC demo page.)

Whisper (source) | GMM [3] | BLSTM [4] | CycleGAN-VC [5] | AGAN-W2SC [6] | WESPER (ours)
  3. T. Toda and K. Shikano, "NAM-to-speech conversion with Gaussian mixture models," in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Lisbon, Portugal, Sept. 2005, pp. 1957–1960.
  4. G. N. Meenakshi and P. K. Ghosh, "Whispered speech to neutral speech conversion using bidirectional LSTMs," in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Hyderabad, India, 2018, pp. 491–495.
  5. T. Kaneko and H. Kameoka, "Parallel-data-free voice conversion using cycle-consistent adversarial networks," arXiv preprint arXiv:1711.11293, Dec. 2017.
  6. T. Gao, J. Zhou, H. Wang, L. Tao, and H. K. Kwan, "Attention-guided generative adversarial network for whisper to normal speech conversion," arXiv preprint arXiv:2111.01342, 2021.