N. Shah, M. Parmar, N. Shah, and H. Patil, “Novel MMSE DiscoGAN for cross-domain whisper-to-speech conversion,” in Machine Learning in Speech and Language Processing (MLSLP) Workshop, 2018.
H. Malaviya, J. Shah, M. Patel, J. Munshi, and H. A. Patil, “Mspec-Net: Multi-domain speech conversion network,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7764–7768.
Comparison with GMM, BLSTM, CycleGAN, AGAN-W2SC, and WESPER (proposed)
T. Toda and K. Shikano, “NAM-to-speech conversion with Gaussian mixture models,” in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Lisboa, Portugal, Sept. 2005, pp. 1957–1960.
G. N. Meenakshi and P. K. Ghosh, “Whispered speech to neutral speech conversion using bidirectional LSTMs,” in Proc. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), Hyderabad, India, 2018, pp. 491–495.
T. Kaneko and H. Kameoka, “Parallel-data-free voice conversion using cycle-consistent adversarial networks,” arXiv preprint, arXiv:1711.11293, Dec. 2017.
T. Gao, J. Zhou, H. Wang, L. Tao, and H. K. Kwan, “Attention-guided generative adversarial network for whisper to normal speech conversion,” arXiv preprint, arXiv:2111.01342, 2021.