University of Tasmania
File(s) under embargo: 2 month(s) 25 day(s) until file(s) become available

Real time biodiversity measurement in large bioacoustic datasets

thesis
posted on 2023-05-28, 10:48 authored by Kaluri, VSNRR
Humans are destroying animal habitats to meet their own needs. Habitat destruction severely affects animal communities by causing species extinctions. Ecological stress increases as a consequence, resulting in imbalanced biodiversity. This puts environmental integrity at risk by changing environmental conditions, which is harmful to all living beings on Earth. Biodiversity therefore needs to be assessed periodically so that ecologists and government organizations can be alerted and can develop proactive strategies for reducing ecological stress. Traditional manual assessment methods exist. However, they suffer from observer bias and cost issues and, above all, they are invasive. This has created a need for automatic, non-invasive methods. Since animals indicate and communicate their presence through sound, monitoring sound is a suitable non-invasive method. The science of sound is known as acoustics, and the sub-field concerned with biologically produced sound is bioacoustics. Bioacoustic recordings can be made with ease directly in habitats and can be analysed using automatic techniques to identify species. Researchers have proposed several methods to analyse bioacoustic data automatically. However, simultaneous bird calls and interference from rain and other weather sounds make this analysis challenging. Furthermore, the development of low-cost hardware has made it cheaper to collect huge volumes of data. Because the data volume is high and recording continues for many days, the data needs to be processed on the fly in real time. In real-time data, however, labels or annotations are not available to assist automatic species detection, which makes the problem more challenging still. Our study attempts to address this challenge by experimenting with the suitability of unsupervised methods for this problem.
Since clustering is a popular unsupervised technique, we started with the initial hypothesis that, in the ideal case, each species family would be confined to one cluster. To this end, we conducted several experiments to investigate a methodology for clustering species. Experiments were conducted with varied audio sample sizes and different cluster sizes. Data characteristics were studied and PCA was applied to identify prominent features. k-Means and hierarchical clustering algorithms were applied to these features. Upon mapping the clusters to ground truth, a few anomalies were observed, and we therefore refined our approach. In the refined approach, we extracted audio clips based on duration and frequency. The results were evaluated against ground truth, and we could infer that the computed outcomes are very close to it, with a difference of just 3 to 4% for several regions. Based on these experiments, we found that the unsupervised approach works well in categorizing the species present in a recording. Furthermore, extracting sounds based on duration and frequency opened a new direction for identifying bioacoustic sounds in a real-time stream environment. To assess biodiversity, traditional measures such as the Simpson and Shannon indices were applied. In real-time scenarios, species move between communities due to seasonal changes or incompatibility of habitats. As a result, most bioacoustic data is non-stationary, and a biodiversity measure should therefore be able to capture temporal changes in species between time periods. In line with this, we investigated the appropriateness of traditional measures for non-stationary data. We experimented with different scenarios. Based on these, we inferred that the Simpson measure is not sensitive to the disappearance of species with small counts. Similarly, the Shannon measure becomes ineffective, and no conclusions can be drawn, when a species with zero count is included.
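As an illustration of the two traditional measures named above, the following is a minimal sketch and not the thesis implementation. It assumes the Gini-Simpson form 1 - sum(p_i^2) for the Simpson measure and the natural-log form -sum(p_i ln p_i) for the Shannon measure; the function names and the example counts are hypothetical. Because ln(0) is undefined, the sketch adopts the common convention of skipping zero counts, which is one way to read the zero-count difficulty described above.

```python
import math


def simpson_index(counts):
    """Gini-Simpson diversity, 1 - sum(p_i^2), from per-species counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no individuals counted")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def shannon_index(counts):
    """Shannon diversity, -sum(p_i * ln p_i).

    Zero counts are skipped (assumed convention: 0 * ln 0 = 0), because
    ln(0) is undefined and would otherwise break the computation.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no individuals counted")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical counts for three species in two time periods:
# the rare third species disappears in the second period.
before = [500, 480, 20]
after = [500, 480, 0]

simpson_change = simpson_index(before) - simpson_index(after)
shannon_change = shannon_index(before) - shannon_index(after)
print(f"Simpson change: {simpson_change:.4f}")
print(f"Shannon change: {shannon_change:.4f}")
```

With these made-up counts the Simpson value moves only slightly when the rare species vanishes, which is consistent with the insensitivity to small-count disappearance reported above. Neither index records which species was lost.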
Above all, these measures ignore species identity, and they become ineffective when all species decline at the same rate. Hence, we concluded that they are more suitable for static data. To assess species changes between time periods, we tried distance measures. The distance measure clearly indicates population changes when species are added or disappear. However, it is unable to detect missing species and species decline. To tackle this, another measure, CAGR, was adopted, which was sensitive in identifying species changes between time periods. Hence, we concluded that a combination of distance-based and CAGR measures enables us to identify species changes between time periods. Our further study focused on clustering live data to obtain inputs for these measures. As habitat recordings generate a massive volume of data, we treated bioacoustic data as a data stream, and our study proposed an approach in this direction. In a stream environment, identifying a call is a challenge. A bird call in a stream environment is typically referred to as an event. An event detection algorithm was implemented and evaluated. The computed events were correlated with the available ground truth annotations. Experiments indicated a slight mismatch between the two. To match them, we investigated suitable tolerances with respect to start time, duration and frequency. With these settings, the event detector showed good event coverage; however, we observed that the detected events outnumber bird calls by 20 to 30%. Features were extracted from these computed events and then fed as input to the CluStream algorithm. The algorithm generated several clusters with species as objects in the clusters. Each cluster was labelled with a specific species label based on its dominant species samples. Our analysis was two-fold: the same species spread across different clusters, and different species in the same cluster.
Based on these analyses, we inferred that our proposed method is successful to a large extent and is able to assess biodiversity based on cluster outputs. When compared with ground truth, we found that our proposed method offered a good trade-off.
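The combination of a distance-based measure and CAGR described in the abstract can be sketched as follows. This is a hedged illustration, not the author's method: it assumes CAGR denotes the compound annual growth rate, (end / start)^(1 / periods) - 1, and it uses Euclidean distance as a stand-in because the abstract does not say which distance measure was used. All function names and the example counts are hypothetical.

```python
import math


def euclidean_distance(before, after):
    """Euclidean distance between two species-count dictionaries.

    Species missing from one period are treated as having count 0.
    Euclidean distance is an assumed choice for illustration.
    """
    species = set(before) | set(after)
    return math.sqrt(
        sum((before.get(s, 0) - after.get(s, 0)) ** 2 for s in species)
    )


def cagr(start, end, periods):
    """Compound growth rate per period; None when it is undefined.

    The rate is undefined for a zero starting count (a newly arrived
    species), so that case is reported as None rather than guessed.
    """
    if start <= 0 or periods <= 0:
        return None
    return (end / start) ** (1.0 / periods) - 1.0


def species_changes(before, after, periods):
    """Per-species growth rate between two time periods."""
    species = sorted(set(before) | set(after))
    return {s: cagr(before.get(s, 0), after.get(s, 0), periods) for s in species}


# Hypothetical per-species call counts from two recording periods.
period_1 = {"species_a": 100, "species_b": 40, "species_c": 10}
period_2 = {"species_a": 121, "species_b": 40, "species_d": 5}

print(euclidean_distance(period_1, period_2))
for name, rate in species_changes(period_1, period_2, periods=2).items():
    print(name, rate)
```

In this sketch the distance summarises the overall shift between periods in one number, while the per-species rates keep species identity: a rate of -1.0 marks a species that disappeared, and None marks one that was absent in the first period.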

History

Department/School

School of Information and Communication Technology

Publication status

  • Unpublished

Rights statement

Copyright 2022 the author

Repository Status

  • Restricted
