Low Latency Convolutive Blind Source Separation
In most real-time systems, particularly for applications involving system identification, latency is a critical issue. These applications include, but are not limited to, blind source separation (BSS), beamforming, speech dereverberation, acoustic echo cancellation and channel equalization. The system latency consists of an algorithmic delay and an estimation computational time. The latter can be avoided by using a multi-thread system, which runs the estimation process and the processing procedure simultaneously. The former, which consists of a delay of one window length, is usually unavoidable for the frequency-domain approaches. For frequency-domain approaches, a block of data is acquired by using a window, transformed and processed in the frequency domain, and recovered back to the time domain by using an overlap-add technique. In the frequency domain, the convolutive model, which is usually used to describe the process of a linear time-invariant (LTI) system, can be represented by a series of multiplicative models to facilitate estimation. To implement frequency-domain approaches in real-time applications, the short-time Fourier transform (STFT) is commonly used. The window used in the STFT must be at least twice the room impulse response which is long, so that the multiplicative model is sufficiently accurate. The delay constraint caused by the associated blockwise processing window length makes most the frequency-domain approaches inapplicable for real-time systems. This thesis aims to design a BSS system that can be used in a real-time scenario with minimal latency. Existing BSS approaches can be integrated into our system to perform source separation with low delay without affecting the separation performance. The second goal is to design a BSS system that can perform source separation in a non-stationary environment. We first introduce a subspace approach to directly estimate the separation parameters in the low-frequency-resolution time-frequency (LFRTF) domain. In the LFRTF domain, a shorter window is used to reduce the algorithmic delay of the system during the signal acquisition, e.g., the window length is shorter than the room impulse response. The subspace method facilitates the deconvolution of a convolutive mixture to a new instantaneous mixture and simplifies the estimation process. Second, we propose an alternative approach to address the algorithmic latency problem. The alternative method enables us to obtain the separation parameters in the LFRTF domain based on parameters estimated in the high-frequency-resolution time-frequency (HFRTF) domain, where the window length is longer than the room impulse response, without affecting the separation performance. The thesis also provides a solution to address the BSS problem in a non-stationary environment. We utilize the ``meta-information" that is obtained from previous BSS operations to facilitate the separation in the future without performing the entire BSS process again. Repeating a BSS process can be computationally expensive. Most conventional BSS algorithms require sufficient signal samples to perform analysis and this prolongs the estimation delay. By utilizing information from the entire spectrum, our method enables us to update the separation parameters with only a single snapshot of observation data. Hence, our method minimizes the estimation period, reduces the redundancy and improves the efficacy of the system. The final contribution of the thesis is a non-iterative method for impulse response shortening. This method allows us to use a shorter representation to approximate the long impulse response. It further improves the computational efficiency of the algorithm and yet achieves satisfactory performance.