Posted on 2021-06-28, 07:41. Authored by Tom Denton, Alejandro Luebs, Michael Chinen, Felicia SC Lim, Andrew Storus, Hengchin Yeh, Willem Kleijn, Jan Skoglund
Recent advances in neural-network-based generative modeling of speech
have shown great potential for speech coding. However, the performance of
such models drops when the input is not clean speech, e.g., in the
presence of background noise, preventing their use in practical
applications. In this paper we examine the reasons for this and discuss
methods to overcome the issue. Applying a denoising preprocessing stage
when extracting features, while keeping clean speech as the training
target, is shown to be the best-performing strategy.
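For context, the best-performing strategy described above amounts to a simple data-preparation rule: condition the generative model on features extracted from denoised input, while keeping clean speech as the training target. The sketch below illustrates that arrangement with hypothetical denoise(), extract_features(), and make_training_pair() helpers; it is an assumption-laden illustration of the idea, not the authors' implementation.

```python
# Sketch of the training arrangement summarized in the abstract:
# features come from the *denoised* noisy input, the target stays clean speech.
# The denoiser and feature extractor below are placeholders, not the paper's components.
import numpy as np


def denoise(noisy_speech: np.ndarray) -> np.ndarray:
    """Placeholder for the denoising preprocessing stage (e.g., a speech enhancer)."""
    return noisy_speech  # a real system would suppress background noise here


def extract_features(speech: np.ndarray, frame_size: int = 320) -> np.ndarray:
    """Placeholder conditioning-feature extractor (here, log-magnitude spectra)."""
    n_frames = len(speech) // frame_size
    frames = speech[: n_frames * frame_size].reshape(n_frames, frame_size)
    return np.log1p(np.abs(np.fft.rfft(frames, axis=-1)))


def make_training_pair(noisy_speech: np.ndarray, clean_speech: np.ndarray):
    """Conditioning features from denoised input; regression target is clean speech."""
    features = extract_features(denoise(noisy_speech))
    target = clean_speech
    return features, target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000).astype(np.float32)  # 1 s of dummy audio
    noisy = clean + 0.1 * rng.standard_normal(16000).astype(np.float32)
    x, y = make_training_pair(noisy, clean)
    print(x.shape, y.shape)
```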
History
Preferred citation
Denton, T., Luebs, A., Chinen, M., Lim, F. S. C., Storus, A., Yeh, H., Kleijn, W. B., & Skoglund, J. (2020, November). Handling Background Noise in Neural Speech Generation. In 2020 54th Asilomar Conference on Signals, Systems, and Computers (pp. 667-671). IEEE. https://doi.org/10.1109/ieeeconf51394.2020.9443535