Mutual information based speech intelligibility prediction and its application to hearing aid fitting
Speech enhancement was, is, and will be the key technology for digital speech transmission. When developing speech enhancement algorithms, the intelligibility or the quality of the processed speech needs to be evaluated, and intelligibility is the more fundamental of the two. Speech intelligibility can be evaluated through subjective listening tests or objective metrics. Subjective listening tests are time-consuming and costly, whereas objective metrics are more efficient. Consequently, research interest in speech intelligibility prediction has been growing.
Speech intelligibility can be measured in terms of the information received by a listener. This thesis aims to develop a mutual information-based speech intelligibility predictor (SIP) and to use it to assist the fitting process in hearing instruments. To achieve this goal, three studies were carried out.
First, this thesis studied the modeling of the transmitted message. For mutual information-based SIPs, the most important step is to determine what the transmitted message is. Two approaches to modeling the message were studied: one treats it as the continuous-valued sound, and the other as the discrete-valued linguistic message. A corresponding SIP was developed for each approach. Comparing their predicted intelligibility with psychometric curves, which represent subjective intelligibility scores, shows that modeling the message as discrete-valued gives a better match to the psychometric curves.
Second, building on the discrete-valued message model, this thesis proposed a mutual information-based SIP. Since the discrete-valued message cannot be obtained from a speech signal, the proposed SIP calculates the mutual information between the clean speech and the received speech instead of between the message and the received speech. The proposed SIP takes frequency correlation into account for both the clean speech and the received speech. The evaluation results show that the proposed SIP outperforms existing state-of-the-art mutual information-based SIPs.
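One way to read the substitution of clean speech for the message is through the data processing inequality. As a sketch, assume (this is not stated in the abstract, and the notation is introduced here only for illustration) that the message $M$, the clean speech $X$, and the received speech $Y$ form a Markov chain $M \to X \to Y$. Under that assumption, the mutual information between the clean and received speech upper-bounds the mutual information between the message and the received speech:

```latex
\[
I(M;Y) \le I(X;Y), \qquad
I(X;Y) = \int\!\!\int p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy .
\]
```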
Third, this thesis proposed an automatic fitting tool for the nonlinear frequency compression (NFC) operator, a frequency-lowering operator used in hearing instruments. The tool adjusts the NFC parameter by maximizing the mutual information between the message and the frequency-lowered speech. To evaluate the tool, the parameter was also determined through listening tests. The results show that the parameter chosen by the automatic fitting tool is consistent with the one determined by the listening tests.
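The idea of choosing a parameter by maximizing mutual information can be illustrated with a minimal sketch. It is not the fitting tool from the thesis: the function names, the candidate parameter values, and the joint probability tables are all hypothetical, and the tables are made-up toy numbers for two equiprobable messages. The sketch only shows mutual information computed for a discrete message and the selection of the candidate with the largest value.

```python
import math


def mutual_information_bits(joint):
    """Mutual information I(M;Y) in bits from a joint probability table.

    joint[i][j] is P(M = i, Y = j); entries are non-negative and sum to 1.
    """
    p_m = [sum(row) for row in joint]        # marginal of the message
    p_y = [sum(col) for col in zip(*joint)]  # marginal of the received class
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_m[i] * p_y[j]))
    return mi


def fit_parameter(joint_by_parameter):
    """Return the candidate parameter whose joint table has the largest MI."""
    return max(
        joint_by_parameter,
        key=lambda param: mutual_information_bits(joint_by_parameter[param]),
    )


# Hypothetical candidate parameter values mapped to toy joint tables.
# In a real system each table would come from a model of the message and
# the frequency-lowered speech, which the abstract does not specify.
toy_tables = {
    1.5: [[0.45, 0.05], [0.05, 0.45]],
    2.0: [[0.40, 0.10], [0.10, 0.40]],
    3.0: [[0.30, 0.20], [0.20, 0.30]],
}

best = fit_parameter(toy_tables)
print("selected parameter:", best)
```

An exhaustive search over a small candidate set is used here because it is the simplest option; whether the thesis uses a grid search or another optimizer is not stated in the abstract.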