Improving automatic processing of wildlife sound recordings
Acoustic monitoring of wildlife is emerging as a promising tool for animal conservation and research. Large amounts of natural sound recordings are routinely produced, but current use of this data is limited by a lack of automatic analysis tools that can process it efficiently. This thesis presents new methods for detecting sound events and for removing noise or making analyses robust to it, together with a framework for evaluating such improvements. Statistical and computational theory is used to support the generality of these tools and is also extended with results applicable outside bioacoustics.
In the first study included in the thesis (Study I), we set out to establish a metric for bioacoustic population surveys that could be used to evaluate various design or analysis choices. Currently, a variety of metrics are used, such as the F1 score or the area under the curve for call detection. Using a combination of theoretical arguments and data examples, we show that rankings produced by these metrics depend on unobserved parameters and do not necessarily correspond to overall survey performance in terms of ecological aims. These issues are avoided if designs are instead ranked by the precision of spatial capture-recapture estimates. This framework covers a variety of practically important survey designs and thus provides a single metric for general method evaluation.
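A small numerical sketch can illustrate how an F1-based ranking depends on an unobserved quantity. The two detectors and their recall and false-alarm rates below are hypothetical numbers, not results from the thesis; the unobserved parameter is the number of true calls per hour, which depends on animal density. Holding each detector's per-call recall and false alarms per hour fixed, the preferred detector flips as call density changes.

```python
# Hypothetical detectors: (recall per true call, false alarms per hour).
DETECTORS = {"A": (0.9, 20.0), "B": (0.6, 2.0)}


def f1_score(recall: float, false_alarms: float, n_calls: float) -> float:
    """F1 = 2TP / (2TP + FP + FN) for one hour of recording."""
    tp = recall * n_calls          # detected true calls
    fn = n_calls - tp              # missed true calls
    return 2 * tp / (2 * tp + false_alarms + fn)


def best_detector(n_calls: float) -> str:
    """Name of the detector with the highest F1 at a given (unobserved) call rate."""
    return max(DETECTORS, key=lambda name: f1_score(*DETECTORS[name], n_calls))


for calls_per_hour in (10, 1000):
    print(calls_per_hour, best_detector(calls_per_hour))
```

With 10 calls per hour, detector B has the higher F1 score (about 0.67 versus 0.46), while with 1000 calls per hour detector A wins (about 0.94 versus 0.75), although neither detector changed.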
Study II turns to the question of event detection. In acoustics as well as other fields, many problems involve series with temporary changes in a parameter. Estimating the onset and offset times of such events (changepoint detection) can be done in a principled and efficient way, but only if the background is stable, which is rare in practice. We build a two-type changepoint detector that is robust to nuisance dynamics and demonstrate its use on different types of real data. The detector assumes that nuisance and signal events have different lengths, which allows them to be distinguished even when they affect the same parameter. As part of this, we develop a faster algorithm for fixed-background changepoint detection and analyse its properties for the case of a changing mean in a Gaussian series.
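For the simpler fixed-background setting, changepoint detection for a Gaussian series with a changing mean can be sketched with the optimal partitioning recursion below. It assumes the background is known to be N(0, 1) (in practice the series would be standardized first), scores each candidate segment by its Gaussian cost with a segment-specific mean plus a penalty, and limits segment lengths with the hypothetical parameters `min_len` and `max_len`. This is a generic O(n * max_len) dynamic program, not the faster algorithm developed in the thesis, and it does not handle the nuisance dynamics that the two-type detector addresses.

```python
import numpy as np


def detect_segments(x, penalty, min_len=2, max_len=None):
    """Segments of shifted mean in a series whose background is known to be N(0, 1).

    Optimal partitioning: each point is either background, costing x_t**2, or
    belongs to a segment with its own fitted mean, costing the residual sum of
    squares plus `penalty` per segment. Returns (start, end) pairs, end exclusive.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    max_len = n if max_len is None else max_len
    csum = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums of x
    csum2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of x**2
    best = np.zeros(n + 1)       # best[t]: minimal cost of x[:t]
    start = [None] * (n + 1)     # start of the segment ending at t, or None
    for t in range(1, n + 1):
        best[t] = best[t - 1] + x[t - 1] ** 2           # x[t-1] is background
        for s in range(max(0, t - max_len), t - min_len + 1):
            length = t - s
            seg_sum = csum[t] - csum[s]
            rss = (csum2[t] - csum2[s]) - seg_sum ** 2 / length
            cost = best[s] + rss + penalty
            if cost < best[t]:
                best[t] = cost
                start[t] = s
    segments, t = [], n
    while t > 0:                  # backtrack from the end of the series
        if start[t] is None:
            t -= 1
        else:
            segments.append((start[t], t))
            t = start[t]
    return segments[::-1]
```

For example, with `x = np.zeros(100)` and `x[40:60] = 3.0`, `detect_segments(x, penalty=10.0)` recovers the single segment `[(40, 60)]`; setting `max_len=10` forces it to split into `[(40, 50), (50, 60)]`, which shows how a length constraint shapes what the detector can report.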
Study III applies this changepoint detector to acoustic events. We propose combining it with a wavelet packet decomposition, producing a robust detector of frequency-specific energy increases. The theoretical analysis from Study II is extended to discuss the properties of this method. We test it on acoustic surveys using the evaluation framework developed in Study I and observe consistently higher efficiency than other energy detectors. In a public challenge of household sound analysis, it was combined with a simple classifier and reached performance comparable to deep learning models while using much less training data.
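The decomposition step can be sketched with a Haar wavelet packet implemented directly in NumPy. The Haar filter, the depth `levels`, and the function names `haar_packet` and `band_energy_series` are illustrative assumptions; the thesis may use other wavelets. Each level splits every node into a low-pass and a high-pass half, and the leaves are then reordered from natural (Paley) order to frequency order with a Gray-code permutation. Per-band energies summed over short frames give one series per frequency band, on which a changepoint search like the one in Study II could look for energy increases.

```python
import numpy as np


def haar_packet(x, levels):
    """Full Haar wavelet packet decomposition to depth `levels`.

    Returns an array of shape (2**levels, len(x) // 2**levels) whose rows are
    ordered from the lowest to the highest frequency band.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - len(x) % (2 ** levels)   # trim so every level halves evenly
    nodes = [x[:n]]
    for _ in range(levels):
        children = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            children.append((even + odd) / np.sqrt(2))   # low-pass (approximation)
            children.append((even - odd) / np.sqrt(2))   # high-pass (detail)
        nodes = children
    # Leaves are in natural (Paley) order; the band at frequency position f
    # is the leaf with natural index gray(f) = f ^ (f >> 1).
    return np.vstack([nodes[f ^ (f >> 1)] for f in range(len(nodes))])


def band_energy_series(coeffs, frame_len):
    """Sum of squared coefficients over consecutive frames: (n_bands, n_frames)."""
    n_bands, n_coeffs = coeffs.shape
    n_frames = n_coeffs // frame_len
    trimmed = coeffs[:, : n_frames * frame_len]
    return (trimmed ** 2).reshape(n_bands, n_frames, frame_len).sum(axis=2)
```

An alternating signal `[1, -1, 1, -1, ...]` sits at the Nyquist frequency, so with `levels=3` all of its energy lands in the last of the eight bands; a constant signal lands in the first.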
Study IV concerns wind and other transient broadband noises. These interfere with sound analysis, and their rapid dynamics defeat most existing noise removal methods. We develop a short-term estimator of broadband noise level based on polynomial models of the wavelet packet spectrum. Two uses for the estimator are demonstrated: adjusting the detector from Study III to further reduce false alarms and improve survey efficiency, and producing denoised sound by wavelet shrinkage. Various design choices are discussed, such as a robust alternative for noise estimation in rich soundscapes.
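A rough sketch of both steps, under stated assumptions: the noise model below fits a low-order polynomial (hypothetical default `degree=2`) across band index to the log mean energy of each wavelet packet band within one short frame, treating the smooth fitted curve as the broadband noise level. The shrinkage step then soft-thresholds each band at `k` noise standard deviations, where the multiplier `k` is an illustrative choice. The functions `broadband_noise_level` and `soft_threshold` are hypothetical names, not the estimator from the thesis.

```python
import numpy as np


def broadband_noise_level(coeffs, degree=2):
    """Per-band noise variance for one short frame of wavelet packet coefficients.

    coeffs: array (n_bands, frame_len) with rows ordered by frequency.
    Assumption: broadband noise has a smooth log spectrum, so a low-order
    polynomial in band index fitted to the log band energies tracks its level.
    """
    band_energy = np.mean(coeffs ** 2, axis=1)
    band_index = np.arange(coeffs.shape[0])
    log_energy = np.log(band_energy + 1e-12)          # small floor avoids log(0)
    fit = np.polynomial.Polynomial.fit(band_index, log_energy, degree)
    return np.exp(fit(band_index))


def soft_threshold(coeffs, noise_var, k=3.0):
    """Wavelet shrinkage: shrink each band towards zero by k noise standard deviations."""
    thr = k * np.sqrt(noise_var)[:, None]
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)
```

An ordinary least-squares fit like this is pulled upwards by a loud tonal call in a few bands, which is one reason why rich soundscapes call for a more robust noise estimate.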