University of Tasmania

File(s) not publicly available

Automatic processing of large-scale bioacoustic data using dynamic workflows

thesis
posted on 2023-05-27, 19:31 authored by Brown, AS
Environmental monitoring is becoming more critical as climate change, deforestation, and other human activities increasingly threaten the health of environments. There has been significant interest in bioacoustics analyses, which monitor environments using sound recordings. Innovations in machine learning have enabled bioacoustics processing to be conducted automatically, without the need for humans to listen to and manually annotate recordings. To deploy such processing in large-scale scenarios, researchers and conservationists need to know which bioacoustics processes best suit their circumstances and how to process bioacoustics recordings in an efficient and scalable manner. However, current research primarily focuses on specific use cases, often examining only individual components of bioacoustics processing and rarely considering computational efficiency. As a result, large-scale bioacoustics processing deployments could suffer from reduced efficiency and effectiveness due to poor process selection and suboptimal deployment. This thesis focuses on two components that are critical for achieving large-scale automatic bioacoustic analysis: process selection and process scalability. In terms of process selection, a bioacoustics process is made up of many components, hereafter referred to as 'tasks', that can be selected in combination with each other, often targeting different scenarios. As there is a wide range of possible bioacoustics scenarios, it is difficult for researchers and conservationists to determine the best combination of tasks for their specific needs. The best combination of tasks might even change depending on the time of day or the time of year, as recording characteristics (e.g. noise and acoustic activity) change.
In terms of scalability, bioacoustics processes present opportunities for both data and process parallelisation, although data cannot be split arbitrarily, as events of interest can take place over several seconds. Furthermore, processing workloads are highly dependent on what is happening in a recording at any given time. For example, a recording might be near-silent at some times and full of sounds of interest at other times, and this significantly changes workloads.
This thesis proposes an architecture that represents bioacoustics processes as workflows. These workflows are made up of bioacoustics tasks, each of which has a specific purpose. This makes it possible to swap tasks that perform similar types of processing and to select specific combinations to fit specific scenarios. It is also possible to divide work between multiple processors and machines to process automated bioacoustics analyses in a scalable manner. Within this representation, this thesis examines how to select processes targeting specific scenarios in two ways: selectively searching candidate workflows for target scenarios to find the most accurate and efficient workflows, and using an expert-assisted Ripple Down Rules (RDR) based method to generate rules that select workflows quickly without needing to search through solutions. This thesis then investigates how to schedule and provision resources to process dynamic workflows in a real-time processing scenario, where workflow paths can change depending on the nature of the audio being processed, considering simultaneous streams with heterogeneous resources.
The key findings are:
- Pegasus, a leading Scientific Workflow Management System, is not well suited to executing bioacoustics workflows due to its high overhead when executing a large number of tasks. A prototype developed on a novel bioacoustics-specific workflow architecture outperformed it significantly when processing the same workflow.
- A surrogate model-based search algorithm can effectively identify which candidate workflows to search and finds better workflows more quickly than metaheuristic and random search algorithms.
- An RDR-based system can combine machine learning with expert knowledge to select workflows targeting a range of different scenarios. A machine learning model aiming to identify trends in workflow scores can identify effective workflows but struggles to select for specific scenarios. Experts can apply their domain knowledge to correct errors or identify workflows for unique scenarios not considered by the ML model.
- An analysis, using a simulation model, of several approaches for scheduling and provisioning dynamic bioacoustics workflows in a real-time scenario shows that dynamic approaches are more reliable at meeting latency deadlines while using resources more efficiently. Making assumptions about workflow structures when scheduling can negatively impact results if workflows are too complex or paths are too unpredictable.
The algorithms and analyses in this thesis pave the way towards more efficient and effective large-scale bioacoustics analyses in the future. Conservationists can identify which processes fit their specific needs and deploy those processes, both real-time and batch-based, in a scalable and efficient manner. This could open up more possibilities for cost-effective environmental monitoring going forward. Findings from this thesis could be deployed in real-world monitoring applications, for example, monitoring how populations of animal species with distinctive calls recover from bushfires.
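The two ideas the abstract relies on, splitting recordings without cutting through events and representing a process as a chain of swappable tasks, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names (`split_with_overlap`, `run_workflow`, `remove_offset`, `make_gate`), the sample-list interface, and the two toy tasks are all hypothetical and chosen only for illustration. The sketch assumes that events of interest are shorter than the overlap between neighbouring chunks.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical task interface: a task maps a list of samples to a list of samples.
Task = Callable[[List[float]], List[float]]


def split_with_overlap(
    samples: Sequence[float], chunk_len: int, overlap: int
) -> List[Tuple[int, List[float]]]:
    """Split a recording into (start_index, chunk) pairs.

    Neighbouring chunks share `overlap` samples, so an event shorter than the
    overlap that straddles a boundary still appears whole in one chunk.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append((start, list(samples[start:start + chunk_len])))
        if start + chunk_len >= len(samples):
            break  # this chunk already reaches the end of the recording
        start += step
    return chunks


def run_workflow(tasks: Sequence[Task], chunk: List[float]) -> List[float]:
    """Apply each task in order; any task can be swapped for a compatible one."""
    for task in tasks:
        chunk = task(chunk)
    return chunk


def remove_offset(chunk: List[float]) -> List[float]:
    """Toy pre-processing task: subtract the mean of the chunk."""
    if not chunk:
        return chunk
    mean = sum(chunk) / len(chunk)
    return [x - mean for x in chunk]


def make_gate(threshold: float) -> Task:
    """Toy noise-reduction task: zero out samples quieter than `threshold`."""
    def gate(chunk: List[float]) -> List[float]:
        return [x if abs(x) >= threshold else 0.0 for x in chunk]
    return gate


if __name__ == "__main__":
    recording = [float(i % 5) for i in range(20)]
    workflow = [remove_offset, make_gate(1.0)]
    for start, chunk in split_with_overlap(recording, chunk_len=8, overlap=2):
        print(start, run_workflow(workflow, chunk))
```

Because each chunk is processed independently, the loop at the bottom could in principle be handed to a pool of workers, and swapping `make_gate` for a different noise-reduction task changes the workflow without touching the rest of the pipeline.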

History

Department/School

School of Information and Communication Technology

Publication status

  • Unpublished

Rights statement

Copyright 2022 the author

Repository Status

  • Open
