Recent advancements in vision transformers and self-supervised learning are expanding the capabilities of computer vision models. This study explores the application of a DINOv2-based unsupervised approach for the re-identification of kākā, a forest parrot endemic to New Zealand. We measure the performance of our vision transformer against a canonical SIFT-based method to establish its utility in accurately identifying individual birds. Using video recordings of wild birds captured at purpose-built feeders over three distinct periods, we evaluate our models on images extracted from the footage. The results demonstrate that our DINOv2-based model achieves high accuracy, outperforming our SIFT-based approach. Deep learning models are often considered unexplainable; we offer a window into our model by using patch embeddings to highlight key features of the kākā. These findings suggest that a vision transformer-based method is an effective non-invasive tool for improving conservation efforts to monitor growing populations of threatened parrots such as the kākā.
Preferred citation
Maddigan, P., Ehrhardt, O., Lensen, A. & Shaw, R. C. (2024, January). Re-Identification of Individual Kākā: An Explainable DINO-Based Model. In 2024 39th International Conference on Image and Vision Computing New Zealand (IVCNZ) (pp. 1-6). IEEE. https://doi.org/10.1109/IVCNZ64857.2024.10794473