__  __      _____ _   ____            _           _       
|  \/  | ___|  ___(_) |  _ \ _ __ ___ (_) ___  ___| |_ ___ 
| |\/| |/ _ \ |_  | | | |_) | '__/ _ \| |/ _ \/ __| __/ __|
| |  | |  __/  _| | | |  __/| | | (_) | |  __/ (__| |_\__ \
|_|  |_|\___|_|   |_| |_|   |_|  \___// |\___|\___|\__|___/
where self-links rule                |__/

Australian Bioacoustic Search Tool

The Australian Acoustic Observatory has 360 microphones across the continent and over 2 million hours of audio. However, none of it is labeled. We want to make this enormous repository useful to researchers. We have found that researchers are often looking for 'hard' signals - specific call types, birds with very little available training data, and so on. So we built an acoustic-similarity search tool, allowing researchers to provide an example of what they're looking for, which we then match against embeddings from the A2O dataset.
Here are some fun examples! Laughing Kookaburra
Pacific Koel
Chiming Wedgebill

How it works, in a nutshell: We use audio source separation to pull apart the A2O data, then run an embedding model on each channel of the separated audio to produce a 'fingerprint' of the sound. All of this goes into a vector database with a link back to the original audio. When someone performs a search, we embed their audio and match it against all of the embeddings in the vector database. Right now, about 1% of the A2O data is indexed (the first minute of every recording, evenly sampled across the day). We're looking to get initial feedback and will then continue to iterate and expand coverage.
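The retrieval step described above can be sketched roughly like this. This is a hypothetical toy version, not the actual system: the real pipeline uses a learned audio embedding model and a proper vector database, while here random vectors stand in for embeddings and a brute-force cosine-similarity scan stands in for the index. The recording IDs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend index: one embedding per separated audio channel, each with a
# link back to its source recording. Real embeddings would come from the
# model; these are random stand-ins, normalized to unit length.
index_embeddings = rng.normal(size=(1000, 128))
index_embeddings /= np.linalg.norm(index_embeddings, axis=1, keepdims=True)
index_links = [f"a2o-recording-{i}" for i in range(1000)]  # placeholder IDs

def search(query_embedding: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    """Return the k nearest indexed clips by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = index_embeddings @ q          # cosine similarity vs. every clip
    top = np.argsort(scores)[::-1][:k]     # indices of the k best matches
    return [(index_links[i], float(scores[i])) for i in top]

# In the real system, the query embedding is produced by running the same
# embedding model on the user's example audio clip.
results = search(rng.normal(size=128))
for link, score in results:
    print(link, round(score, 3))
```

At A2O scale you would swap the brute-force scan for an approximate-nearest-neighbor index, but the interface - embed the query, return the top-k links - is the same.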
posted by kaibutsu on Dec 01, 2023 at 2:20 AM

---------------------------

Amazing! I listened to a few examples and they all seemed to be good matches (but I'm no expert). Does the system ever get it wrong?
posted by mpark at 3:46 PM

---------------------------

Thanks for trying it out!

It turns out that one-shot retrieval is a really hard task, so yes, it does get things wrong, especially when you get into hard examples.

However! We're aiming to make interaction with the system a bit more like how you normally use search: if you get bad answers, it might be worthwhile to try a different question. It may be easier to surface more distinctive call types and focus on those, or it might work better if you find a cleaner audio clip to query with. Part of the idea is to put the power to iterate in the hands of the user, instead of making them wait for model updates from one of a handful of ML engineers (who also generally don't have the appropriate domain knowledge to know exactly what better looks like). (That said, we will keep making the model better!)
posted by kaibutsu at 6:24 PM

---------------------------

This is very cool thanks!
posted by tiny frying pan at 10:50 AM

---------------------------

Is there a role for human evaluation (e.g., crowdsourced listening & tagging), or has the software been accurate enough?

This is cool!
posted by wenestvedt at 6:43 AM

---------------------------

There's a lot of need for human experts! Australian species are very underrepresented in every database of bird vocalisations; a big part of the idea here is to make it easier to collect a range of samples for a wide range of species, starting from a handful of examples and some expert knowledge.

There's really no end to bioacoustics questions. There are 10k species of birds, many with a wide range of different calls, as well as geographic or individual variation. Calls relate to behavior, so as we nail down species identification, we start getting into questions about call types, which are often not well annotated or consistently tracked. A good example is juvenile calls: they can help tell if a population is reproducing successfully, but the difference between adult and juvenile calls is entirely in the heads of experts... Until someone has good enough reason to train a classifier for their particular species of interest.
posted by kaibutsu at 10:08 PM

---------------------------