Wrangling in the Antipodes

Exploring scikit-maad – False Colour Spectrum analysis

tristanlouthrobins Jan 24, 2026

Introduction

In the previous (rather distant) blog post, I began an exploration of Python’s powerful soundscape analysis package, scikit-maad with a demonstration of how you can easily render circadian soundscapes (24-hour summaries of a soundscape) from periodic audio recordings made with an AudioMoth.

We’re going to continue the exploration of scikit-maad in this post by examining False Colour Spectrum analysis.

What are False Colour Spectrums?

If you’re not familiar with False Colour Spectrum (FCS) then this term might be a bit confusing.

False Colour is used for differentiating features in imagery which are either not visible or not immediately apparent. As a simple example, think about satellite imagery such as temperature or rainfall variations across a given area. Similar approaches are used for geological surveys, measuring vegetation density or deep space imagery.

False colour mapping representing flooding in Montana, USA. Image source: CIMSS Satellite Blog

In the case of an ecological assessment, an observer can assume what the climate might be like by the appearance of terrain or a lack of cloud in a satellite image, but they can’t see the temperature or the amount of rain, and they certainly can’t measure its volume or distribution accurately from the image alone.

This is where False Colour methods come into play, where the image – or a variant of the image – is decomposed using spectral filters into colours that distinguish key attributes present in a given area.

Audio and False Colour

For audio recordings the False Colour principle is similar.

Consider for example a long duration soundscape, which represents the totality of acoustic activity in a given ecosystem.

When we look at the visual representation of such a soundscape, we might be able to discern notable patterns along the horizontal axis, such as the dawn or evening choruses and the relative absence of activity at night or in the early hours.

But when we examine this soundscape vertically across the spectrum, it can sometimes be difficult to accurately discern or separate notable features.

As for the colour spectrum itself, in the case of scikit-maad the audio data will be decomposed into three components represented by the colours of the RGB (red, green, blue) spectrum.

For the example in this post we won’t be examining a long duration soundscape; instead I’ll be using a four-minute soundscape to explain the principle of the FCS and how this applies to audio. In a future post I’ll look into examining longer-duration examples.

Case study: COVID-era suburban soundscape

For this example I’ve selected an excerpt of a mid-morning recording that I made on my balcony during the first week of the worldwide COVID pandemic being declared in March 2020. This soundscape contains some distinctive attributes that seem like a good case study for applying the FCS.

The balcony recording set up in March 2020. I would routinely record at 10am and 8pm respectively for an hour for the rest of the month.

As anyone who is inclined to listen attentively to their environments will recall, the pandemic presented an interesting experience in urban settings, where human activity quietened significantly. From a suburban vantage, the familiar rumble of road traffic had reduced dramatically. As a result of this absence, other sounds became much more noticeable and transparent in the soundscape, most notably the sound of birdsong.

Examining the audio spectrogram

Let’s have a look at a spectrogram of the audio and I’ll walk through some of the attributes that are visibly evident.

Note that I’ve mixed the stereo audio file down to mono for the purposes of creating this spectrogram. This is also to keep it consistent with the FCS outputs when we come to them. Although the FCS computation in scikit-maad can handle stereo files, the plots are output as a single spectrogram that consolidates the channels.

On the frequency scale in the spectrogram
I’ve applied a linear scale to the frequency here (rather than log) so that our view of the spectrum will be consistent with the output of the plots.

You can see above that I’ve made five annotations here, with A to C representing general bands of activity and D and E representing specific events. The low frequency region of A covers around 1 to 500Hz and you’ll note that whilst there’s continuous activity, the acoustic energy is variable.

Below is a log scale version so that we can better see what’s going on in this region.

What becomes evident is that the activity here is quite diverse. There’s a couple of instances where the breeze has resulted in a bit of low end mic shear (1-50Hz), and then there’s some wiggles in the 30 to 100Hz range from localised traffic passing on the street. The rasp from the tyres of vehicles can be seen further up the spectrum as E1, E2, E3 and E4.

Let’s direct our attention back to the linear scale version of the spectrogram again:

If we look to B, you’ll have to squint a bit to see traces of chirps from a piping shrike that begin intermittently around the 0:10 mark and become more distinct between 1:00 and 2:00.

If these features weren’t immediately apparent to you when you looked at the spectrogram, I completely understand. I’ve also had the benefit of listening to the audio closely as I annotated the spectrogram. As the scikit-maad documentation points out, FCS allows “the ability to see transient events and patterns which aren’t immediately apparent or discernible.”

As we move into the wide range of C, this region encompasses the boisterous activity of rainbow lorikeets, finches and other birds. It’s mostly lorikeets though; they’re rowdy creatures and their vocalisations can cover a lot of territory with their coarse, scribbly calls and songs. We can see in D1 and D2 that this rowdiness picks up considerably.

Ok, so we’ve got an idea of what’s going on here. Let’s now pull this audio into Python and see what scikit-maad can uncover.

The scikit-maad workflow

Before we get underway, a note on running an FCS analysis. As with any complex spectral analysis and modelling, FCS can consume a lot of processing power to accomplish its results. This is one reason I’ve chosen a four-minute audio recording as a point of departure, since the processing required shouldn’t leave you hanging for too long.

Import dependencies

We start where we must and import the required dependencies.

import numpy as np
import matplotlib.pyplot as plt
from maad import sound, features
from maad.util import power2dB, plot2d
from skimage import transform
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import NMF

In addition to using the familiar numpy and matplotlib, you’ll also need scikit-image (skimage) for rescaling the spectrogram and scikit-learn (sklearn) for preprocessing and decomposing the audio data. Oh, don’t forget about scikit-maad too!

Loading the audio in and building a simple monochrome spectrogram

We’ll load the audio and build the spectrogram.

s, fs = sound.load('/Users/yourpathhere/audio-1.wav')
Sxx, tn, fn, ext = sound.spectrogram(s, fs, nperseg=1024, noverlap=512, fscale='log')

Then we set the db range, rescale the transformation and prepare a plot of the spectrogram.

Sxx_db = power2dB(Sxx, db_range=70)

Sxx_db = transform.rescale(Sxx_db, 0.5, anti_aliasing=True, channel_axis=None) # rescale for faster computation

plot2d(Sxx_db, figsize=(4,10), extent=ext)
shape_im, params = features.shape_features_raw(Sxx_db, resolution='low')

The plot output bears a close resemblance to the original spectrogram I annotated.
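
If you’re wondering what the db_range argument is doing in that step, here’s a minimal NumPy sketch of the general idea: convert power to decibels, then clip everything quieter than a chosen floor. The function name to_db_with_floor is made up for illustration and is not part of scikit-maad, and the library’s own power2dB may differ in its details.

```python
import numpy as np

def to_db_with_floor(power, db_range=70.0):
    """Convert power values to dB and clip anything below -db_range.

    Illustrative only: assumes the input is scaled so that 1.0 maps to 0 dB.
    """
    power = np.asarray(power, dtype=float)
    # Guard against log10(0) by flooring at the smallest positive float.
    db = 10.0 * np.log10(np.maximum(power, np.finfo(float).tiny))
    # Anything quieter than the floor is pinned to the floor.
    return np.maximum(db, -db_range)

demo = to_db_with_floor([1.0, 1e-3, 1e-9])
print(demo)
```

The quietest value here sits 90 dB down, so it gets pinned to the 70 dB floor, which is why very faint background detail disappears from the plot.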

Decomposition of the audio data

Next we move into the gnarlier territory of preparing an array of the audio data for decomposition. This is highly technical business, but in simple terms what we are doing is identifying and decomposing the audio data into 3 components (see the call to NMF with n_components=3.)

# Format the output as an array for decomposition
print("\n\nFormatting the output as an array for decomposition...\n\n")
X = np.array(shape_im).reshape([len(shape_im), Sxx_db.size]).transpose()

# Decompose signal using non-negative matrix factorization
print("\n\nDecomposing signal...\n\n")
Y = NMF(n_components=3, init='random', random_state=0).fit_transform(X)

See how the decomposition has broken the spectrogram into three components which closely resemble the annotations I’d made to my earlier spectrogram?

‘Basis 1’ highlights the piping shrike calls, ‘Basis 2’ represents the noisy lorikeets and ‘Basis 3’ covers most of the low-end activity and – in a couple of instances – appears to be identifying the raspy sound of traffic passing on the street.
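
If non-negative matrix factorisation is new to you, here’s a small sketch on synthetic data that shows what the factorisation produces. Note that this is not what scikit-learn runs under the hood: it’s a plain multiplicative-update version written in NumPy purely for illustration, and the function name nmf_multiplicative and the toy matrix sizes are assumptions of mine. The W matrix it returns plays the same role as Y in the workflow above (one row per pixel, one column per component).

```python
import numpy as np

def nmf_multiplicative(X, n_components=3, n_iter=200, seed=0):
    """Factorise non-negative X (samples x features) into W @ H.

    Plain multiplicative updates for the Frobenius loss (illustrative).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + 1e-3
    H = rng.random((n_components, n_features)) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for the feature matrix: 60 "pixels" x 8 "features",
# built from three hidden non-negative sources.
rng = np.random.default_rng(1)
X = rng.random((60, 3)) @ rng.random((3, 8))

W, H = nmf_multiplicative(X, n_components=3)
relative_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, round(float(relative_error), 4))
```

Because every entry of W stays non-negative, each column can be read as "how much of component k is present at this pixel", which is what makes it suitable for mapping onto a colour channel.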

Building the False Colour Spectrum plot

From here the decomposition is then rendered into a single FCS plot that highlights these components alongside a simple monochrome spectrogram for comparison.

Y = MinMaxScaler(feature_range=(0,1)).fit_transform(Y)
intensity = 1 - (Sxx_db - Sxx_db.min()) / (Sxx_db.max() - Sxx_db.min())
plt_data = Y.reshape([Sxx_db.shape[0], Sxx_db.shape[1], 3])
plt_data = np.dstack((plt_data, intensity))

fig, axes = plt.subplots(3,1, figsize=(10,8))
for idx, ax in enumerate(axes):
    ax.imshow(plt_data[:,:,idx], origin='lower', aspect='auto',
              interpolation='bilinear')
    ax.set_axis_off()
    ax.set_title('Basis ' + str(idx+1))
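
The reshaping in that step is easier to follow on a tiny array. The sketch below uses random stand-ins for Sxx_db and Y (the sizes and values are made up) and does the per-column scaling by hand rather than with MinMaxScaler, so that it only needs NumPy.

```python
import numpy as np

rows, cols = 4, 5                       # tiny stand-in for Sxx_db.shape
rng = np.random.default_rng(0)
Sxx_db = rng.uniform(-70, 0, size=(rows, cols))
Y = rng.random((rows * cols, 3))        # stand-in for the NMF output

# Scale each component column to 0..1 (per-feature min-max scaling).
span = Y.max(axis=0) - Y.min(axis=0)
Y_scaled = (Y - Y.min(axis=0)) / span

# Inverted, normalised level: 0 at the loudest pixel, 1 at the quietest.
intensity = 1 - (Sxx_db - Sxx_db.min()) / (Sxx_db.max() - Sxx_db.min())

rgb = Y_scaled.reshape(rows, cols, 3)   # one colour channel per component
rgba = np.dstack((rgb, intensity))      # fourth channel from the spectrogram
print(rgba.shape)
```

Each row of Y corresponds to one spectrogram pixel in row-major order, so the reshape puts every pixel back where it came from. Matplotlib’s imshow reads a rows x cols x 4 float array as RGBA, which is how the three components end up as red, green and blue with the fourth channel acting as opacity.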

Let’s just isolate the FCS plot and have a closer look.

There we go! Following the RGB principle, the low-end activity (along with the instances of local traffic) is represented by the blue part of the colour spectrum, the lorikeets fall into the green spectrum, and the calls of the piping shrike are those spots of red. It’s done a pretty good job of distinguishing some of the key features of this soundscape!

Wrapping up

In a future post I’ll return to FCS analysis and present some other examples which will highlight more use-cases for soundscape analysis, whilst also uncovering some of its limitations. I’ll explore the fine tuning of parameters too.

The code I used in this example is available in full in the WITA GitHub repository.

In the meantime, if you’ve got any feedback or comments please pop them below.

http://wranglingintheantipodes.wordpress.com/?p=1314

Exploring scikit-maad – Circadian soundscape rendering

tristanlouthrobins Jul 6, 2025

Introduction

For some time I’ve been meaning to get stuck into one of Python’s most powerful and prolific libraries for exploring soundscapes – scikit-maad. As regular readers of this blog will note, my explorations of acoustic ecology tools up to this point have been almost entirely focused on R libraries for performing similar acoustic analysis.

Why Python and scikit-maad now? Well, R still certainly has a valuable place in my workflows (and heart) but nowadays I’m using Python on a regular basis for application and program design, so I thought it was logical to explore what it could do with acoustic data. For those R diehards, fear not! I intend to continue employing my first (data) love well into the future.

scikit-maad

Developed by Juan Sebastian Ulloa (supervised by Jérôme Sueur and Thierry Aubin) in 2018, scikit-maad is an extension of Python’s powerful scientific computing libraries (scikit) with a suite of features for loading and processing digital audio, segmenting and locating regions of interest, computing acoustic indices and features, and estimating sound pressure levels.

Much like the R packages I’ve explored previously on this blog, scikit-maad has a similar capacity to process large audio datasets and apply machine learning techniques to evaluate acoustic properties and identify key patterns in soundscapes. I intend to begin this exploration of scikit-maad with a concept that I covered a couple of years ago.

Circadian soundscapes

Back in 2023 I developed a workflow to perform ‘batch time compression’ using Audacity macros in conjunction with a Python script. The idea was to take multiple audio files from a 24-hour cycle and make this into a shorter ‘compressed’ version of the 24-hour cycle. In this example, 85 minutes of audio was shortened to 21 minutes whilst preserving the structure of the 24-hour cycle. The Python script involved specifying a sample length for each file in the recording cycle, which then iterates over the files and instructs an Audacity macro to edit the tracks, arrange the samples end-to-end and apply crossfades.

In principle, the ‘circadian soundscape’ is synonymous with circadian rhythms: it’s an acoustic representation of a 24-hour cycle.

The following scikit-maad circadian soundscape workflow provides a quick and scalable way of gaining visual insights into multiple audio files comprising a 24-hour cycle.

Case study: Two AudioMoth deployments in Middle Farm

For this use case, I’ll be using acoustic datasets obtained from two AudioMoths which were deployed at a remote property on Fleurieu Peninsula over several days during the winter of 2023. Both AudioMoths were configured to record for 60 seconds every 10 minutes over a 24-hour cycle, with 144 audio files for each day of observation.

Both AudioMoths were deployed in proximity to the Carrickalinga Creek which runs through this property. The first AudioMoth was deployed on the upper section of a bend in the creek, strapped to an overhanging branch and facing upstream.

The first AudioMoth deployed on the branch of a tree overhanging a bend in the creek. From this view, we are looking upstream.

The second AudioMoth was deployed about 150 metres downstream from the first AudioMoth, strapped to the trunk of a small sheoak tree located on a ridge, about 10 metres from the creek. This AudioMoth was positioned facing the creek.

The second AudioMoth deployed to the trunk of a native sheoak tree. From this view, the creek is flowing downstream from right to left.

Front-on view of the second AudioMoth deployed to the trunk of the sheoak.

Aside from the sound of the creek itself, there’s an array of birdlife present in the area (reed warblers, honeyeaters, galahs, grey fantails, fairy wrens) as well as a nightly frog chorus.

Generating the circadian soundscapes in Python

Since there’s multiple days of observation for each of the deployments, I’ve developed a Python script which iterates over folders containing each day of observation.

Now, let’s break down the Python script. You can find it in full on my GitHub repo here.

Importing dependencies

In order to generate the spectrogram representations of the circadian soundscapes in Python, we’ll need to import the required libraries:

glob – for locating filenames using shell-style wildcards.
matplotlib – for displaying the spectrograms.
maad – for building a list of the audio and then editing it together to produce the spectrogram data.
scipy.io.wavfile, numpy – for exporting the circadian soundscape as a .wav format audio file.

In addition to these libraries, I’ve also imported the csv and datetime libraries for generating metadata.

Setting up variables

Now we’ll set some variables for the audio data.

# Set base directory containing daily subfolders
base_path = '/Users/XYZ/Music/Moth Temp/WITA16/'
sample_len = 15  # sample length for each of the audio files in seconds
start = 0 # start hour for daily observation (e.g., 00:00)
end = 24 # end hour for daily observation (e.g., 24:00)
freq_max = 50 # upper threshold of frequency range (e.g., 50kHz)

We define the directory path for the audio files as base_path. An upcoming loop will be looking for subfolders to iterate over, so each of the daily observations will need their own subfolder.

For simplicity, I’ve named these subfolders 02, 03, 04, 05, 06, 07 which represent days in July when the observations were made.

Next, a sample length (sample_len) is defined. This is the amount in seconds that we want to extract from each of the 144 audio files per day of observation.

Then we set the start and end hours for each daily observation. Since we want to cover the full 24-hour cycle, these are set to 0 and 24 respectively.

Lastly, we need to define the upper frequency threshold (freq_max) for the spectrogram. In this instance, it’s been set at 50kHz.
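
As a quick sanity check on these settings, the arithmetic behind the compressed day can be sketched as below. It assumes no files are missing and ignores any overlap that crossfading introduces later, so treat the result as a nominal upper figure; the function names are just for illustration.

```python
def files_per_day(interval_minutes):
    """Number of recordings in 24 hours at one file per interval."""
    return (24 * 60) // interval_minutes

def nominal_duration_seconds(n_files, sample_len):
    """Total length if every file contributes sample_len seconds."""
    return n_files * sample_len

n_files = files_per_day(10)                       # one file every 10 minutes
total = nominal_duration_seconds(n_files, 15)     # sample_len = 15
minutes, seconds = divmod(total, 60)
print(f"{n_files} files -> {minutes} min {seconds:02d} s")
```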

Creating a metadata file

Since my day job involves interrogating a lot of automation log data, I felt it was useful to include this in the script as well.

with open(metadata_file, mode='a', newline='') as log_file:
    log_writer = csv.writer(log_file)
    if write_header:
        log_writer.writerow(['Timestamp', 'Observation', 'Num Files', 'Sample Length (s)', 'Total Duration (s)', 'Spectrogram File', 'Audio File'])
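
The snippet above relies on metadata_file and write_header being defined earlier in the full script, which isn’t shown here. One way they could be set up is sketched below: write the header only when the log doesn’t exist yet or is empty. The file location and the ensure_header helper are hypothetical and are there only so the example runs on its own.

```python
import csv
import os
import tempfile

HEADER = ['Timestamp', 'Observation', 'Num Files', 'Sample Length (s)',
          'Total Duration (s)', 'Spectrogram File', 'Audio File']

def ensure_header(metadata_file):
    """Append the header row only if the log is missing or empty."""
    write_header = (not os.path.exists(metadata_file)
                    or os.path.getsize(metadata_file) == 0)
    with open(metadata_file, mode='a', newline='') as log_file:
        log_writer = csv.writer(log_file)
        if write_header:
            log_writer.writerow(HEADER)
    return write_header

# Hypothetical log location, used here only so the example can run.
demo_log = os.path.join(tempfile.mkdtemp(), 'metadata_log.csv')
first = ensure_header(demo_log)    # new file: header is written
second = ensure_header(demo_log)   # existing file: header is skipped
print(first, second)
```

Opening in append mode means repeated runs keep adding rows to the same log, so checking before writing avoids a header line in the middle of the data.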

Looping over the subfolders and processing the audio

Now we get into the thick of it. We begin by defining a subfolders variable by scanning the contents of the base_path and seeing if there’s any directories in there. Then we commence the loop, assigning each observation to the subfolder currently being processed. Then files are looked for (anything with .WAV) and are coerced into a list.

# Loop through all subdirectories
subfolders = [f.path for f in os.scandir(base_path) if f.is_dir()]

for folder in subfolders:
    observation = os.path.basename(folder)
    print(f"\nProcessing folder: {observation}")

    flist = glob.glob(os.path.join(folder, '*.WAV'))
    flist.sort()
    if not flist:
        print(f"  Skipped {observation}: no WAV files found.")
        continue

This list informs the creation of the long_wav array, where each sound file (as fname) from flist is loaded, trimmed (per sample_len) and then appended to the long_wav array:

    long_wav = []
    for fname in flist:
        s, fs = sound.load(fname)
        s = sound.trim(s, fs, 0, sample_len)
        long_wav.append(s)

At this point long_wav is a Python list of NumPy arrays, one trimmed audio signal (s) per file, and fs is the sampling rate. It becomes a single combined array once the crossfade is applied in the next step.

Then a crossfade is applied and the spectrogram is generated with axis labels and ticks:

    long_wav = util.crossfade_list(long_wav, fs, fade_len=0.5)

    Sxx, tn, fn, ext = sound.spectrogram(long_wav, fs, window='hann', nperseg=1024, noverlap=512)

    fig, ax = plt.subplots(1,1, figsize=(10,3))
    util.plot_spectrogram(Sxx, extent=[start, end, 0, freq_max],
                          ax=ax, db_range=80, gain=25, colorbar=False)
    ax.set_xlabel('Time [Hours]')
    ax.set_xticks(range(0,25,1))

Directories for the spectrogram and audio file are created (if they don’t exist) and are exported to these locations:

    os.makedirs("outputs-spectral", exist_ok=True)
    spec_file = f"outputs-spectral/{observation}.png"
    plt.savefig(spec_file)
    plt.close()
    print(f"  Saved spectrogram to {spec_file}")

    os.makedirs("outputs-audio", exist_ok=True)
    long_wav_norm = np.int16(long_wav / np.max(np.abs(long_wav)) * 32767)
    audio_file = f"outputs-audio/{observation}.wav"
    write(audio_file, fs, long_wav_norm)
    print(f"  Exported audio to {audio_file}")
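
The normalisation line in that export step divides by the peak level, which would fail on a completely silent recording. A slightly more defensive variant might look like the sketch below; the helper name is hypothetical and the silent-input behaviour (returning zeros) is my own choice rather than anything from the original script.

```python
import numpy as np

def to_int16_peak_normalised(signal):
    """Scale so the loudest sample hits +/-32767, then cast to int16."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    if peak == 0:
        # All-silent input: avoid dividing by zero.
        return np.zeros(signal.shape, dtype=np.int16)
    return (signal / peak * 32767).astype(np.int16)

demo = to_int16_peak_normalised([0.0, 0.25, -0.5])
silent = to_int16_peak_normalised(np.zeros(4))
print(demo, silent)
```

Peak normalisation makes every exported day as loud as its single loudest sample allows, so levels are not directly comparable between days once exported.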

Last of all, we log some metadata:

    # Log metadata
    with open(metadata_file, mode='a', newline='') as log_file:
        log_writer = csv.writer(log_file)
        log_writer.writerow([
            datetime.now().isoformat(timespec='seconds'),
            observation,
            len(flist),
            sample_len,
            len(long_wav) / fs,
            spec_file,
            audio_file
        ])

Examining the outputs

So with the script run for both batches of audio from the two AudioMoth deployments, we now have a set of spectrograms and audio files representing the circadian soundscape for each of the days of observation.

Let’s have a look at the first full day of observation for both sites:

AudioMoth 1 was deployed on a branch overhanging the creek.

AudioMoth 2 was deployed on a sheoak tree a few metres back from the creek further downstream.

I’ve included both circadian recordings as a single audio file above, with AudioMoth 1 panned right and AudioMoth 2 panned left. If you want to align an event occurring in the audio file with what’s happening in the spectrograms, then you can use this handy web app I made.

The simple web app. Note that the spectrogram won’t appear! For the time being it’s just the fields for the time values.
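
The alignment itself is simple proportional arithmetic, and it can be sketched in a few lines of Python. This isn’t the web app’s code; it’s a stand-alone illustration that assumes the audio maps linearly onto the 24-hour axis, and the function names are made up. The 33:37 total is the end point quoted for this day’s audio in the breakdown that follows.

```python
def audio_to_clock(position_s, total_s, start_hour=0, end_hour=24):
    """Map a position in the compressed audio to an hour on the x-axis."""
    if not 0 <= position_s <= total_s:
        raise ValueError("position must lie within the audio file")
    return start_hour + (position_s / total_s) * (end_hour - start_hour)

def mmss(text):
    """Convert an 'M:SS' string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

total = mmss("33:37")                 # length quoted for this day's audio
hour = audio_to_clock(mmss("8:24"), total)
print(f"{hour:.2f}")
```

Under that linear assumption, the 8:24 mark in the audio lands at roughly 6am on the spectrogram’s x-axis, which agrees with the section boundaries used below.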

Breaking down the observations

Both circadian soundscapes annotated.
A) pre-dawn frog chorus; B) dawn chorus; C) droplet; D1) wind/breeze buffeting enclosure and also bird activity (wattlebirds, little ravens); D2) bird activity (wattlebirds, finches, little ravens); E) dusk chorus; F) post-dusk frog chorus

The small hours (0-6:00am) – audio file 0:00 to 8:23

Let’s first examine what’s going on in the period from 0am to 6am. We can see continuous activity occupying the ~4-7kHz range for AudioMoth 1, and a much weaker and narrow band of activity for AudioMoth 2 (see annotation A.) What is this?

These are the calls of the Painted (Burrowing) frog, which is native to South Australia and hibernates through the warmer months, re-emerging once it cools down and the seasonal rains arrive. It’s worth noting that this creek very often ends up dry by the end of summer (February/March) and will only begin flowing again around May to June.

Dawn chorus (7am-7:30am) – audio 8:24 to 8:49

With the arrival of dawn (annotation B), the birds start their chorus. This is clearly represented in both spectrograms, with a bit more regularity at the AudioMoth 1 site as opposed to the slightly weaker and fragmented recording at the AudioMoth 2 site. You’ll see a prominent spike for AudioMoth 2 at about the 7.5 mark on the x-axis (annotation C.) Let’s address that next.

Transient droplets, rain and the remains of the day (7:30am – 5:00pm) – audio 8:49 to 23:49

That spike (annotation C), which encompasses most of the frequency range, is caused by droplets of rain or moisture striking the enclosure of the AudioMoth. Whilst these could be considered somewhat undesirable in a recording, they do provide an indicator of when rain was likely falling across the area.

But it’s not always the case. For annotations D1 and D2, there’s more to this than meets the eye. For D1, this section consists of some wind buffeting the AudioMoth enclosure and the respective calls of a wattlebird and little raven. For D2, there’s also prominent bird activity from wattlebirds, finches and a little raven. As we’ll see a bit later on, these spikes of activity can sometimes be directly attributed to rainfall, but we often have to dig into the recordings to determine what’s actually going on before we draw any conclusions.

Dusk and return of frogs (5:30pm – 11:59pm) – audio 23:50 to 33:37

By 5:30pm, the calls of New Holland Honeyeaters and finches are heard as the frog chorus arrives again as darkness descends.

In between the annotations

Aside from the annotated highlights of these recordings, there’s some interesting activity occurring – especially with AudioMoth 1. If you listen closely enough, you will occasionally hear the sound of the creek cascading gently as it makes its way around the bends where the deployment was. The sound of the creek is less evident downstream at AudioMoth 2. This is partly because the deployment is set a bit further back, but the creek is also more staid further downstream.

Examining another day of observation

Through analysis of the following couple of days, there are similar patterns to be found: namely the frog chorus prior to dawn and following sunset. If we skip ahead to the circadian soundscape for the 6th of July, we encounter some activity not previously encountered – lots of rain.

The rain is represented much like the wide-band spikes of droplets striking the AudioMoth enclosure. See also how these instances of rain frequently align between the two sites, which indicates that the rainfall was fairly widespread, since the AudioMoths were spaced over a hundred metres apart. But it is still very important to qualify this by listening, since you will hear regular instances of wind shear and slight rattle affecting both AudioMoths.

As I did with the 2nd of July circadian soundscape, the audio for 6th of July is below. Remember: AudioMoth 2 is panned to the left and AudioMoth 1 panned to the right.

Wrap-up and next blog post

As we’ve found with this case study, representing multiple audio recordings as a circadian soundscape is a very useful approach for analysing soundscapes and identifying key events, trends and patterns.

For the next blog post, I’ll expand upon this case study work by exploring another great feature of the scikit-maad library – the False Colour Spectrum.

http://wranglingintheantipodes.wordpress.com/?p=1274

Extensions

Introduction to the Acoustic Diversity Index (ADI)

tristanlouthrobins Mar 15, 2024

Over the last three years of maintaining this blog, there’s been a conspicuous absence in the acoustic index toolkit presented so far. Whilst the Bioacoustic Index (BI), Acoustic Complexity Index (ACI) and Normalised Difference Soundscape Index (NDSI) have been introduced and demonstrated across given case studies, the Acoustic Diversity Index (ADI) hasn’t been covered to date.

Why is this? When I first started applying acoustic indices a few years ago, I could never find the right fit for the ADI, and in fact I don’t think I really understood how it could be applied properly. In this respect, I would regard it now as one of the more fussy ecoacoustic metrics which requires more forethought and care when setting its parameters and reviewing the output.

It also prompted me to go back to basics. If you’ve been checking out this blog or my YouTube channel lately, you will have seen some recent videos which have been created to provide simple, introductory overviews of ecoacoustic metrics. Sometimes it’s useful to go back to the fundamentals (even in the case of the well-covered ACI) and it’s also handy to have a video overview that can be embedded in a post.

The Acoustic Diversity Index

The development of the ADI is based on the observation that ‘acoustic community diversity is the aggregation of all species that produce sounds at a particular location and time.’ (Bobryk et al., 2016, Sueur et al., 2014) and that ‘each community has a distinctive acoustic signature depending on species communication’ (Farina & Pieretti, 2014.)

There’s a clear distinction here between the ADI and ACI, which – based on my earlier attempts to apply ADI – I hadn’t been able to fully grasp.

The ADI is primarily concerned with how much diversity (at a given intensity threshold) is present across the frequency spectrum, whereas the ACI is primarily concerned with a distinction between transient and continuous acoustic events.

An example of a transient event might be bird or frog vocalisation, whereas continuous events could be insect trill, the cascade of a creek or human-generated sounds like machinery or vehicles.

Another way of considering this distinction is that the ACI is concerned primarily with events occurring horizontally (over time), whereas the ADI is concerned primarily with events occurring vertically (across the spectrum, irrespective of an event’s duration.)

Now, there’s much more to it than that, but in principle, and where the ADI is concerned, if you have lots of different sounds occurring across the frequency spectrum which are loud enough, this should produce a high ADI value.

How the ADI computes its metric

Below we have a high-level abstraction of the ADI process for a given audio file. The file in this instance is a one-minute recording of a dawn chorus. As we can see, there’s a lot of activity occurring across the frequency spectrum, so we can assume that this will likely return a high ADI value.

The frequency spectrum is analysed and discretised into frequency bins against time samples. From here, the ADI algorithm computes the amount of diversity present across these bins.

An ADI value of 3.67 is returned. The ADI metric ranges from 0.01 (very low) to 5.0 (very high), so we can conclude that we have reasonably high acoustic diversity in this recording.
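
To make the "diversity across bins" idea a little more concrete, here’s a small Python sketch of a Shannon index computed over per-band scores, where each score is the proportion of that band’s cells that are loud enough to count. It reflects my reading of how this kind of index is generally computed and is not a port of the soundecology code; the function name and the example scores are invented.

```python
import math

def shannon_from_band_scores(band_scores):
    """Shannon index over per-band proportions of 'loud enough' cells."""
    total = sum(band_scores)
    if total == 0:
        return 0.0
    index = 0.0
    for score in band_scores:
        p = score / total          # this band's share of all activity
        if p > 0:
            index -= p * math.log(p)
    return index

even = shannon_from_band_scores([0.4] * 10)           # activity in every band
narrow = shannon_from_band_scores([0.9] + [0.0] * 9)  # one busy band only
print(round(even, 3), round(narrow, 3))
```

Activity spread evenly over every band gives the highest value the index can reach, which for N bands is the natural log of N, while activity confined to a single band gives zero. The number of bands therefore sets the ceiling for the metric.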

Executing ADI for this single recording in R requires the soundecology, seewave and tuneR packages. I’ve also created some custom functions and instantiated the built-in package functions. meanspec() will create a visualisation of the averaged spectral analysis, fbands() will discretise the spectral analysis into bands (10 are defined in this case) and acoustic_diversity() will perform the ADI analysis on the audio file.

The code for this post and audio files can be found at the relevant GitHub repository here

library(soundecology)
library(seewave)
library(tuneR)

# create functions to perform spectral analysis and compute ADI --
# 1. spectral_viz: spectral analysis of given file
spectral_viz <- function(file, range) {
  meanspec(file, f=range, plot=T)
}

# 2. discrete: discretise spectral analysis into frequency bins --
discrete <- function(spectral_analysis) {
  fbands(spectral_analysis, bands=10)
}

# 3. compute the ADI for given soundfile -- 
compute_adi <- function(file) {
  acoustic_diversity(file, max_freq = 10000, db_threshold = -20, 
                     freq_step = 50, shannon = TRUE)
}

# create the spectral analysis, bins and perform ADI --

spec.dw <- spectral_viz(dawn, 16000)
discrete(spec.dw)
compute_adi(dawn) # Acoustic Diversity Index: 3.670878

Before moving on, let’s look at an example where the acoustic diversity is very low.

Here we have a recording made on the banks of American River in Kangaroo Island. It was a still afternoon with very little acoustic activity, evidenced by the fairly barren spectrogram below. We can see that there’s a little intensity occupying the lower end of the spectrum (distant boats) and a couple of transient events (birdsong) in the latter section of the recording.

In the spectral analysis, these couple of transient events can’t compete with the dominant activity in the lower end, and therefore the computed ADI is very low at 0.01.

If you’d like to listen to these and other standalone recordings here’s the overview video I’ve created. The time marks for the sections are listed below:

3:27 – Dawn chorus ADI analysis
4:35 – American River ADI analysis
5:48 – Rocky River (Kangaroo Island) ADI analysis
6:16 – Hay Flat Road, Normanville ADI analysis
6:43 – Mu Ko Lanta National Park, Thailand ADI analysis
7:12 – Adelaide Central Markets ADI analysis
7:40 – Bungaree Station dawn chorus ADI analysis

Current challenges for ADI analysis

Whilst the other acoustic indices I’ve covered to date (BI, ACI, NDSI) can deal with acoustic data containing biases such as the dominance of natural sounds (such as river cascades), the influence of weather (wind, rain) and device self-noise, the ADI appears to be a lot more sensitive to this.

I believe that one of the reasons for this is the previously mentioned vertical/horizontal distinction associated with the ADI and ACI.

The ‘vertical’ spectral reading of the ADI does not place as much importance on the ‘horizontal’, time-based states of acoustic events. For reference, here’s that previous graphic that I shared in the introduction.

Depending on an interaction with the recording input and its enclosure, biases like weather and self-noise typically encompass a lot of the frequency spectrum. These are pretty ‘rough’ and noisy sounds after all. And since the duration of an event isn’t important to how the ADI is reading the data, the time-based (transient or continuous) attributes of acoustic events are largely irrelevant.

Because of this, the ADI is partial to interpreting these events as being highly diverse since they may encompass a big chunk of the spectrum and be regarded as sounds from several sources when they may only be attributed to one!
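
To make this concrete, here is a minimal Python sketch of the Shannon index that underpins the ADI. It assumes the index is calculated from how much of each frequency band is occupied; the function name and the band values are hypothetical and are not taken from my recordings or from the R package used above.

```python
import math


def shannon_index(band_occupancy):
    """Shannon diversity H' = -sum(p * ln(p)) over normalised band occupancy."""
    total = sum(band_occupancy)
    if total == 0:
        return 0.0
    h = 0.0
    for value in band_occupancy:
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h


# Made-up occupancy values across ten frequency bands.
broadband_source = [0.9] * 10            # a single source that fills every band
low_end_only = [0.9, 0.05] + [0.0] * 8   # energy confined to the lowest bands

print(round(shannon_index(broadband_source), 2))
print(round(shannon_index(low_end_only), 2))
```

The broadband case scores the maximum possible value for ten bands even though it represents only one source, which is the bias described above.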

The following example of two recordings made at the same site underscores this problem (see below.)

The first recording has a sole key attribute, which is the sound of the creek cascade. As it encompasses several areas of the spectrum and is consistent, the ADI assumes that it is of high acoustic diversity, resulting in an ADI value of 3.26.

Recall the earlier example of a dawn chorus which also returned a high ADI value (3.67.)

Now, a creek cascade can be acoustically diverse when considered in isolation, but this fundamentally betrays the purpose of the ADI which is intended to measure ‘community diversity’. Within the context of the ADI, the creek recording isn’t actually diverse, whereas the dawn chorus (consisting of various bird species) is diverse.

So this becomes especially problematic when this first recording is measured against another recording from the same site which is – on a cursory listen – genuinely diverse.

The second recording again features the creek’s cascade, but also the sound of birds, frogs and crickets. When ADI is computed, it returns an ADI of 1.07! This is more than 2 points lower than the creek cascade recording.

So this isn’t an accurate representation of the acoustic diversity at this site and this presents a huge challenge when ADI analysis is scaled out to measure acoustic diversity at one site over an extended period.

How do we resolve this? I have some solutions in mind, but I’m going to carry this over to another instalment in the future.

In the meantime, I hope you’ve gotten something out of this introduction to the ADI. As always, if you have any questions or feedback, please leave a comment below the post or get in touch with me directly.

http://wranglingintheantipodes.wordpress.com/?p=1232

Extensions

Two new (relevant) videos

tristanlouthrobins Jan 12, 2024

Happy new year everyone!

It’s been fairly quiet on here and there’s a second instalment on the AudioMoth deployment at Lady Bay Reef which is currently in the works.

This has taken much longer than expected to pull together, largely owing to going out too deep (pardon the pun) and having to rein myself in again.

Between this heave-ho, I’ve been creating some new videos which provide an overview of this work as well as some acoustic ecology tools, such as acoustic indices. These videos are intended to be fairly simple overviews since I’m keen to provide general points of entry with them as well as having a video that I can pop into a given post as an explainer for a particular topic.

http://wranglingintheantipodes.wordpress.com/?p=1223

Extensions

Batch time compression and crossfade with Audacity and Python

tristanlouthrobins Jun 10, 2023

If you’ve found this post and want to jump to the solution I’ve developed, go for it and find it further down this post – I shall not be offended. However, if you’re a regular reader of this blog and you’d like some background on why I looked into developing this process, please read ahead.

For the past year I’ve been using AudioMoths to record over longer durations. When it’s come down to analysing the data, I’ve developed workflows for dealing with this in terms of leveraging efficient methods for the processing and subsequent quantification of data.

Quantifying, summarising and visualising this acoustic data is key to making useful insights. It’s also way more efficient than listening through thousands of audio files and manually examining each of their spectrograms.

But recently I thought about the vast troves of acoustic data I’ve amassed and realised how little of this I’ve actually listened to! Of course, I listen to subsets of data when there are interesting events and trends appearing in the data summaries and visualisations, but there simply isn’t the time (or headspace) to dig into everything.

I also wondered whether some of these long duration acoustic observations would merit inclusion on my long-running project, the Fleurieu and Kangaroo Island Sound Map. In the case of recent field work conducted on Lady Bay reef and Middle Farm, I think they do, but I can’t necessarily post a week’s worth of audio to the site. It’s not practical, and I doubt very much that anyone would listen to the whole thing!

This is where the concept of ‘time-compression’ comes into the mix.

Time compression of long duration audio

‘Time-compression’ is a term that sound recordist Chris Watson uses for describing the process of taking a long duration audio recording of an environment and editing this down to a shorter duration. Watson commonly applies this technique for events such as a dawn chorus, which might occur over an hour or so, but using ‘time-compression’ might be edited down to ten minutes. The technique is scalable, so Watson has applied this to 24-hour and week-long durations. His releases El Tren Fantasma and In St Cuthbert’s Time are great examples of this.

I haven’t come across anything that documents Watson’s exact approach, but I would imagine that it is meticulous, insofar as his objective is to manually locate key events and trends and condense them into a version that expresses the notable and/or prominent attributes of a given environment.

What I’m proposing for my purposes isn’t anywhere near as meticulous as Chris Watson’s approach and largely does away with manual intervention. In fact, I’m going as far as almost automating the entire process. My current work role is as an Automation Developer, so it would be slightly remiss of me not to admit that the influence of my work has rubbed off here.

To get started with this I’ll be using Audacity to batch edit the audio files.

Hang on though, why not just build a custom macro?

Initially I thought that building a simple macro in Audacity would do the job. Indeed, it performs the bulk of the work here.

In the above example I’ve imported 85 AudioMoth audio files of equal length (1 minute), and using the macro:

1) Select all 85 tracks.
2) Select the first 15 seconds.
3) Trim to this mark.
4) Align the tracks end-to-end.
5) Mix and render (aligning end-to-end on a single track).

So, what we end up with is an 85-minute set of audio time-compressed down to about 21 minutes.
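
The arithmetic behind that figure can be checked with a couple of lines of Python. This is only a sketch and the helper name is hypothetical; the 85 files and 15-second trims are the values from the example above.

```python
def compressed_duration(num_files, snippet_seconds):
    """Total length in seconds when every file is trimmed to the same snippet."""
    return num_files * snippet_seconds


total = compressed_duration(85, 15)
minutes, seconds = divmod(total, 60)
print(f"{minutes} min {seconds} s")  # 21 min 15 s
```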

It might not be apparent to the eye from here, but there’s a problem when we listen back – there’s no crossfading between the tracks here, so it’s resulted in abrupt transitions. In a crude documentation sense this is probably acceptable, but for the purposes of the sound map and ease of listening it won’t suffice.

The mark at 20:30 is a good example of this. Here I’ve switched the waveform view in Audacity to a spectrogram and we can see how abrupt the transition is:

Ok, surely we can simply implement an additional step in the macro to perform crossfades at each of the transitions?

This can be done (and has been demonstrated), but it requires repeating macro steps with crossfading over and over, depending on how many clips you have. So you end up with reams of hard-coded lines of macro commands which can’t accommodate different numbers of files.

For smaller batches of files it’s a good solution, but it’s not suitable for 85 or more files.

Within Audacity’s macro functionality I looked into whether a loop could be incorporated with dynamic variables, which would allow me to define how many files there are and cut down the repeated macro steps to a single loop (of a range defined by the number of files.)

Alas, this wasn’t possible with macros. So I dug a bit deeper into Audacity’s documentation and read up on the application’s scripting functionality. Along with Audacity’s built-in Nyquist scripting language, there’s a plug-in module called “mod-script-pipe” which allows scripting through Python.

Sending commands to Audacity via Python script

Using the Audacity dev team’s “pipe-test.py” script as a template for getting to grips with how to send commands to Audacity, I wrapped a while loop in a function that accepts user-defined inputs.

The full script (including the initial “pipe-test.py” commands) is available on this blog’s GitHub repo here.

I’ll break down some of the script and explain what’s going on.

First of all, when the script is run the user will be prompted to enter how many files have been imported into Audacity, how long the files are and how long the crossfade should be.

num_files = int(input("enter the number of files imported into Audacity: "))
file_start = 0
file_end = int(input("length of each of the files (in seconds): "))
fadelen = int(input("crossfade length (in seconds): "))
increment = file_end - (fadelen/2)

The script will then perform a sequence of commands in Audacity. Do these look familiar? They’re identical to the macro commands I described earlier!

do_command('SelectAll:')
do_command(f'SelectTime:End="{file_end}" RelativeTo="ProjectStart" Start="0"')
do_command('Trim:')
do_command('Align_EndToEnd')
do_command('MixAndRender')

With the files edited, aligned and rendered we now need to perform 84 sequential crossfades. To do this, the function is called:

batch_crossfade(num_files, file_start, file_end, fadelen)

Here’s the function. Note how the SelectTime command and its parameters for start and end times are derived from the user-defined inputs and are dynamic within the while-loop.

def batch_crossfade(num_files, file_start, file_end, fadelen):
    # Crossfade each pair of adjacent clips, working left to right along the track.
    # Note: 'increment' is read from module level, where it is set with the user inputs.
    i = 1
    while i < num_files:
        print("File " + str(i) + " end: " + str(file_end))
        # Centre the crossfade region on the boundary between clip i and clip i + 1.
        fade_begin = file_end - (fadelen/2)
        fade_end = file_end + (fadelen/2)
        print("File: " + str(i) + "\nFade begin: " + str(fade_begin) + "s" + "\nFade end: " + str(fade_end) + "s")
        do_command('SelectAll:')
        # Start takes the earlier time (fade_begin) and End the later time (fade_end).
        do_command(f'SelectTime:End="{fade_end}" RelativeTo="ProjectStart" Start="{fade_begin}"')
        print("Region selected.")
        do_command('CrossfadeClips:Use_Preset="<Factory Defaults>"')
        print("Crossfade for audio files " + str(i) + " to " + str(i + 1) + " completed!")
        # Step along to the next clip boundary.
        file_start = file_start + increment
        file_end = file_end + increment
        i += 1
        print("Next File = " + str(i) + "\nStart: " + str(file_start) + "\nEnd: " + str(file_end))

    print("Processing complete.")
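
If you’d like to see which regions the loop will select before letting it loose on Audacity, the same arithmetic can be run on its own. The sketch below is a dry run only: the helper name crossfade_windows is hypothetical and isn’t part of the script in the repo, and the 2-second crossfade in the example is an illustrative value.

```python
def crossfade_windows(num_files, file_end, fadelen):
    """Return the (fade_begin, fade_end) pairs the while-loop would select."""
    half = fadelen / 2
    increment = file_end - half   # same step the script uses between boundaries
    boundary = file_end           # first boundary sits at the end of clip 1
    windows = []
    for _ in range(num_files - 1):
        windows.append((boundary - half, boundary + half))
        boundary += increment
    return windows


# Four 15-second clips with a 2-second crossfade (illustrative values).
for begin, end in crossfade_windows(4, 15, 2):
    print(f"select {begin}s to {end}s")
```

For 85 files this returns 84 windows, one for each transition.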

Ok, let’s put it to work and see the results.

Implementation

With the 85 audio files imported into Audacity I run the script in my Python IDE and enter the user-defined inputs.

Now the script moves onto edit commands and the while-loop:

Let’s look at the results. Highlighting that abrupt transition I mentioned earlier:

Considerations and further enhancements

I’m pretty happy with the outcome here, but there are some considerations to bear in mind:

Currently Audacity must be open and the files imported into the session prior to running the Python script.
If any of the user-inputs are incorrect (such as number of files and file length) the script will not work as intended and there is nothing implemented in the script to compensate for errors at this stage.
Implementing crossfades in Audacity will shorten the overall audio file depending on the crossfade length, since each transition consists of two files overlapping each other.

All of these potential issues can be resolved with some enhancements further down the line and I’m looking forward to improving this process in the coming months.
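
As a starting point for the second and third considerations, here is a rough sketch of what an input check and a length estimate might look like. Both function names are hypothetical and nothing here exists in the current script. The length estimate assumes, as the script’s increment does, that each crossfade shortens the timeline by half the crossfade length; the 2-second crossfade is an illustrative value.

```python
def check_inputs(num_files, file_len, fadelen):
    """Return a list of problems with the user inputs (empty if none are found)."""
    problems = []
    if num_files < 2:
        problems.append("need at least two files to crossfade")
    if file_len <= 0:
        problems.append("file length must be positive")
    if fadelen <= 0:
        problems.append("crossfade length must be positive")
    elif file_len > 0 and fadelen >= file_len:
        problems.append("crossfade length must be shorter than each file")
    return problems


def expected_length(num_files, file_len, fadelen):
    """Rendered length in seconds, assuming each crossfade removes fadelen / 2."""
    return num_files * file_len - (num_files - 1) * fadelen / 2


print(check_inputs(85, 15, 2))      # an empty list means the inputs look sane
print(expected_length(85, 15, 2))   # estimated length in seconds
```

Running the check before any commands are sent to Audacity would at least catch the obvious slips, such as a crossfade that is longer than the files themselves.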

If you have any feedback or comments, please pop it below or get in touch directly.

http://wranglingintheantipodes.wordpress.com/?p=1137

Extensions

Lady Bay Reef Survey: Part 1 – tides, seagrass and acoustic complexity

tristanlouthrobins Apr 14, 2023

As I’d indicated in my previous post, following the interesting outcome of the October 2022 AudioMoth deployment on Lady Bay reef, I intended to return and deploy a pair of AudioMoths to make further observations of the reef’s underwater environment.

You can go back to the previous post if you want a more detailed overview of this activity. But given it’s been a while since the last post, I thought I’d include a summary since it’s important to grasp the concept of the acoustic complexity index (ACI) and the methodology applied to the data analysis, which will be applied to the more recent data.

Review of the October 2022 AudioMoth deployment:

The AudioMoth was deployed over a 22-hour period from the 8th to the 9th of October and the resulting audio data – when computed as a measure of acoustic complexity (ACI) – revealed a strong association between high/low ACI and predicted high/low tide marks.

Here’s what the measure of ACI looked like over this period. The tide marks are indicated with the coloured triangles.

Aside from the two highest ACI points representing some heavy rain that arrived after I deployed the AudioMoth, it’s apparent that the ACI values trend upwards with the high tide mark and drop off with the tide’s retreat.

The high ACI values are due to the sound of the water’s movement and direct contact with the deployed AudioMoth.

The sound of the water movement is a complex acoustic phenomenon that dominates the frequency spectrum.

For example, here’s a spectrogram of one of the two heavy rain events (8/10, 12pm):

**The Acoustic Complexity value for this recording was 2453.**

The rain making contact with the surface of the water (and likely the AudioMoth enclosure) covers a wide extent of the frequency spectrum and is relatively consistent throughout the recording. The loudness, however, is mostly confined to the lower part of the frequency spectrum (< 1 kHz).

And a spectrogram from a predicted high tide mark (8/10, 5:20 pm):

**The Acoustic Complexity value for this recording was 2390.**

Note the instances of loud sounds encompassing the entire frequency spectrum – this is the tidal movement of the water, which is also carrying sand particles, seaweed detritus, etc.

And for contrast, a spectrogram from a predicted low tide mark later that same day (8/10, 11:30pm):

**The Acoustic Complexity value for this recording was 1847.**

Apart from some energy in the lower part of the frequency spectrum and a few brief transients, the spectrogram is very quiet. The continuous pattern that you can see is an instance of self-noise from the AudioMoth. On this occasion I had forgotten to disable the LED on the device, so this is the likely cause.

There are, of course, other variables present in the underwater environment that result in high levels of acoustic complexity, so using ACI to determine low and high tides isn’t reliable unless the observations are made in a location that receives the tides, away from the influence of other factors.

When I returned to Lady Bay in December 2022, I wanted to explore parts of the reef I couldn’t access back in October, such as the reef’s seagrass meadows.

December 2022 – returning to Lady Bay with a low tide

As I mentioned in the introduction, for this visit to Lady Bay I came with two pre-configured AudioMoths. Like the October AudioMoth deployment, both were synchronised to record at ten-minute intervals for 60-seconds over a continuous 24-hour cycle. They were deployed before noon on the 23rd of December and were retrieved and deactivated around noon on the 31st of December.

Because the tide was extremely low, this allowed me to venture far out onto the reef and deploy the AudioMoths in the locations I had hoped to access.

Given there was a lot of data to pore over from both sites, for this post I’ll just cover the first site and cover the other one in a subsequent post.

AudioMoth location 1: Seagrass meadow cove

The first deployment location on the reef was an enclosed tide pool about 100 metres out from the shore. It was roughly circular in shape, half a metre in depth (at low tide) and about 8-10 metres in diameter. It consisted of soft white sands, patches of seaweed and a dense meadow of seagrass extending across one side of the pool.

I secured the AudioMoth with its velcro strap to an old dumbbell, along with a bright orange tag to assist locating it later on. I found a spot in the middle of the seagrass meadow and positioned the device there.

Seagrass meadows can be fascinating sonic environments. When I made some recordings of these environments on the reef back in January 2021 (with conventional hydrophones) I was amazed by the variety of sounds emanating beneath the dense canopy of the meadow. Aside from the grunt of fish and the scraping of crustaceans, the most surprising were hissing, wheezing and crackling sounds that slowly rose and fell in pitch.

With little chance of accurately determining the origin of these sounds, I passed the recordings onto some colleagues for their opinion. Whilst not conclusive, it’s believed they’re coming from a) tiny underwater insects; and/or b) the seagrass itself.

I’ll come back to these mysterious sounds, but first let’s have a look at the data collected from this site over the course of the week.

As usual, you can find the code and outputs for this analysis in the blog’s GitHub repository.

Plotting the acoustic complexity data

With the usual acoustic complexity processing and data tidying done, we can see that for the seagrass meadow site there are 1152 points of data. These represent seven full 24-hour periods from the 24th to the 30th of December, bookended by two shorter periods on the days of deployment and retrieval (23rd and 31st of December).

data %>% 
 filter(index == "acoustic_complexity", site.name == "Meadow cove") %>% 
 nrow()
 
[1] 1152
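
As a sanity check on that row count, the number of scheduled recordings can be worked out from the recording schedule alone. This Python sketch uses a hypothetical helper name and assumes the first recording was at exactly noon on the 23rd and the last one just before noon on the 31st, which only approximates the actual deployment and retrieval times.

```python
from datetime import datetime, timedelta


def expected_recordings(start, end, interval_minutes=10):
    """Number of scheduled recordings in the half-open window [start, end)."""
    return int((end - start) / timedelta(minutes=interval_minutes))


deployed = datetime(2022, 12, 23, 12, 0)    # assumed: exactly noon
retrieved = datetime(2022, 12, 31, 12, 0)   # assumed: exactly noon
print(expected_recordings(deployed, retrieved))  # 1152
```

One recording every ten minutes gives 144 per full day, so eight days’ worth of recording comes to 1152.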

I plot this with the 24-hour time on the x-axis and acoustic complexity on the y-axis:

As with the October 2022 plot, I’ve included the predicted high and low tide marks for each day here. The mean of the ACI is indicated with a horizontal blue line.

Despite being a bit noisy (noisy in a data-sense), overall there is the semblance of a trend which indicates high ACI around midnight, followed by low ACI from late morning into the afternoon – this is where the projected smooth interpolation (with its margins of error) comes in handy.

Let’s see some summary stats for these points (see the complete R script for the summary function I use here.)

Firstly, for all 1152 points of data:

wita13_overall.stats <- summary.stats(data %>% filter(index == "acoustic_complexity",
                                                       site.name == "Meadow cove"), "all")
wita13_overall.stats

# A tibble: 1 × 5
    min   max  mean   med std.dev
  <dbl> <dbl> <dbl> <dbl>   <dbl>
1 1693. 3341.  2525 2547.    369.

And for each of the days:

wita13_daily.stats <- summary.stats(data %>% filter(index == "acoustic_complexity",
                                                     site.name == "Meadow cove"), "daily")

wita13_daily.stats

# A tibble: 9 × 6
  date         min   max  mean   med std.dev
  <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>
1 2022-12-23 1866. 3276.  2328 2168.    383.
2 2022-12-24 1693. 3245.  2537 2592.    394.
3 2022-12-25 1940. 3309.  2572 2682.    424.
4 2022-12-26 1855. 3270.  2465 2535.    369.
5 2022-12-27 1865. 3305.  2481 2507.    358.
6 2022-12-28 1985. 3254.  2506 2456.    286.
7 2022-12-29 1917. 3300.  2556 2564.    352.
8 2022-12-30 1883. 3317.  2524 2543.    377.
9 2022-12-31 2024. 3341.  2755 2705.    229.

Overall, the mean for the total observations and the daily means are generally higher than the mean recorded back in October.

October 8-9/10/22 – ACI mean = 2118

December 23-31/12/22 – ACI mean = 2525

Whilst the ACI mean for the October data was almost entirely attributable to the sound of water movement (and what it carried in it – e.g. sand particles, detritus), the higher ACI mean for the December data can be attributed to the seagrass environment that the AudioMoth was deployed in.

Here, there are additional factors determining the higher ACI values: the interaction of the tides on the seagrass meadow and the seagrass meadow on the AudioMoth enclosure.

I filtered the observations for the 24th of December and printed the five highest ACI values for this day.

data %>% filter(day == 24, site.name == "Meadow cove") %>% arrange(desc(value)) %>% head(n=5)

  date       time    year month   day  mins  hour index               value site.name   period   season
  <date>     <time> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>               <dbl> <chr>       <chr>    <chr> 
1 2022-12-24 02:20   2022    12    24    20     2 acoustic_complexity 3245. Meadow cove pre-dawn Summer
2 2022-12-24 04:20   2022    12    24    20     4 acoustic_complexity 3210. Meadow cove pre-dawn Summer
3 2022-12-24 05:00   2022    12    24     0     5 acoustic_complexity 3203. Meadow cove dawn     Summer
4 2022-12-24 04:00   2022    12    24     0     4 acoustic_complexity 3189. Meadow cove pre-dawn Summer
5 2022-12-24 02:50   2022    12    24    50     2 acoustic_complexity 3185. Meadow cove pre-dawn Summer

Let’s have a look at the data points again, but in order to get a better insight into the trends from day to day, I’ll plot the days side by side. Note: I’ve excluded the days for the 23rd and 31st since they’re not full days of observation.

And for the ease of your eyes, here’s each of the plots in greater detail which can be clicked through sequentially:

We can see that each of the days generally adheres to the same trend: high ACI late at night, dropping off towards dawn, then at its lowest from mid-morning into the afternoon before picking up again.

But what’s evident here is that – unlike the October observations – the high and low tide marks aren’t matching up with high and low ACI. Only the low tides around noon each day correspond with low ACI.

So, if you’re going to be determining tides using ACI, it’s my understanding – based on the research thus far – you can only really do it in open spaces with next to no interference from anything else but the water itself.

Still, more observations would be ideal – something to explore later this year perhaps.

Examining the 24th of December and seagrass ambience

At this point, we’re going to focus on the first full day of observation (24/12) and I’m going to direct you to the following video I made for this post. It’s something new, and I thought it would be a good way to incorporate the audio clips, spectrograms and some of the data plots all in one go.

There’s a quick summary of what Acoustic Complexity is, followed by an excerpt of the recording responsible for the high ACI value we looked at previously.

Now, remember how I mentioned the mysterious sounds of seagrass meadows? Well, it’s happening in the other audio recordings featured in this video.

As you’ll find in the video, these sounds are largely restricted to the periods of low ACI occurring between approximately 10am and 4pm on each day.

In order to get a better understanding of this seagrass soundscape and how distinguishable it is in terms of acoustic complexity, I’ve taken regular snippets of the one-minute audio recordings from the 24th of December and ‘time-compressed’ the day into a 12-minute version.

What this means is that I took every one-minute audio file for the 24th of December (144 in total) and put together a macro in Audacity to edit each file down to 5-second edits and then pasted them together in a sequence as a single audio file.

Here’s the audio recording:

And the accompanying spectrogram:

Observation for 24th December 2022. Audio data time compressed into a 12-minute audio file.

Can you tell where the low ACI and the seagrass sounds occur? It’s fairly striking, isn’t it?

Here’s the same spectrogram, but annotated with what the actual time is at given points:

I wrote a function to determine the actual time from a time-compressed audio file. You can find the code in the blog repo as both R and Python scripts. The Python version is particularly useful, since you can execute it from the terminal and you don’t have to bother with booting an IDE every time you want to use it. This is one of the things that’s motivating me to learn and apply Python more.
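
The idea behind the conversion is simple enough to sketch here. Note that this is not the function from the repo, just an illustration with a hypothetical name. It assumes each 5-second edit is taken from the start of its one-minute recording, that recordings are ten minutes apart and that the first one was made at midnight.

```python
from datetime import datetime, timedelta


def actual_time(position_seconds, snippet_seconds=5, interval_minutes=10,
                day_start=datetime(2022, 12, 24, 0, 0)):
    """Map a position in the time-compressed file back to the time of recording."""
    # Which snippet does this position fall in, and how far into that snippet?
    index = int(position_seconds // snippet_seconds)
    offset = position_seconds - index * snippet_seconds
    return day_start + timedelta(minutes=index * interval_minutes, seconds=offset)


# The 6-minute mark of the 12-minute file lands on midday.
print(actual_time(6 * 60).strftime("%H:%M:%S"))  # 12:00:00
```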

So, you’re probably wondering how this spectrogram matches up with the plot of ACI for the same day.

Let’s have a look at it, with the same time marks annotated on the plot:

Something that I find particularly fascinating about this is just how much the quiet seagrass soundscape – occurring from 9am to 3pm – stands out against the rest of the day.

Since I’ve served up a sizeable chunk of information in this post, I’ll be continuing this exploration in the next instalment, along with a look at the data from the other AudioMoth deployment. Since both devices observe the same period at the same intervals, it will be interesting to see what differences and similarities occur!

Also, I promise not to take 4-5 months this time around.

In the meantime, if you have any thoughts, suggestions or feedback please pop them in the comments or get in touch with me directly.

Footnote: below is a data sheet containing some of the plots already generated, along with some tweaked parameters to illustrate the ACI trends over periods of the day. I’ve been gradually learning how to arrange plots together as a single composite image and I must give a recommendation (and much-deserved/long-overdue shout out) to Dan Oehm and his excellent data science and statistics blog, Gradient Ascending. I’ve been following his work ever since I started learning data science and his blog has been an essential resource for learning new skills, hacks and whenever I’m stuck for inspiration.

http://wranglingintheantipodes.wordpress.com/?p=1030

Extensions

Acoustic complexity as an indicator of tidal activity

tristanlouthrobins Dec 19, 2022

Back in October, I made a trip down to a reef located at Lady Bay, a beach about two kilometres south of the township of Normanville, on the western coast of the Fleurieu Peninsula. I’d come down here with the intention of deploying an AudioMoth to observe the underwater soundscape from about noon to the same time the following day.

My original plan had been to place the AudioMoth some way out from the shore, ideally within a secluded area of the reef system, such as one of the many sunken areas, crevasses and sea grass meadows I’d identified and documented on previous visits.

Here are some excerpts of recordings that have been made across the reef system:

Tristan Louth-Robins · Lady Bay Reef: four locations – Jan 2021

These areas are located about 100-200 metres from the shoreline, so in order to get to them easily it’s best if the tide is low and the weather conditions are okay. Prior to this October trip I’d checked this well in advance, using the Bureau of Meteorology’s tidal forecast to determine what day and time the tide would be low. Of course, the BoM’s tidal forecast isn’t that accurate and I’ll come back to this aspect later on.

The secluded areas located in the further reaches of the reef had presented themselves as attractive sites for long term observation since they possess a rich sonic complexity and diversity, where fish, invertebrates and plants fill the soundscape. When I’d made conventional field recordings at these sites (typically 10-20 minutes in duration) I’d wondered how much these soundscapes were affected by the climatic conditions and the movement of the tides. Since I couldn’t leave my recording gear in one spot for too long, using a passive acoustic monitoring device such as an AudioMoth seemed like the perfect solution.

Time & Tide

In the past when I’ve explored the reef, I’ve found that a tide forecast of anywhere between 0.1 and 0.3 metres allows me to get out to the furthest reaches. Now, this – as I mentioned before – isn’t entirely accurate since the forecast applies to the tides located at Second Valley, located about 10km south of Lady Bay. When I first started considering tides as part of field trips I knew hardly anything about how tides worked (aside from their association with moon cycles) and a couple of years later, I’m still largely confounded by them. There are plenty of other variables (aside from the moon) to consider and I’m not going to even attempt to distill them here, suffice to say that a forecast (regardless of its specific or approximate locality) is just that: a forecast.

So with a weekend field trip in mind, I looked to the forecasts for Saturday and Sunday and they looked suitable enough, even if I thought I was maybe pushing it a little with Saturday’s predicted low of half a metre. The weather for the general area looked favourable too, though some light showers were predicted in the mid-morning. It seemed like I’d be able to wear some shorts and in the instance that the water got up to my ankles I was hoping the water wouldn’t be freezing.

On the day, I arrived at Lady Bay by about a quarter to noon.

Tide forecast for Second Valley 8th October to 11th October. Data: http://www.bom.gov.au/australia/tides/#!/sa-second-valley

The water was freezing. At about 20 metres out from the shore the water was shin-deep as shallow waves beat against my knees in regular pulses leaving them slightly blue. A stiff breeze punched in from the south west rapidly carrying a forbidding black mass of cloud. Regarding the tide forecast, I think it was actually on the mark of about half a metre, proving that I hadn’t ever gotten that far out on the reef when the water was like this – let alone freezing. So there was no sensible way of getting to my chosen location and I was running out of time to deploy the AudioMoth, unless I chose to be utterly soaked from the bottom-up (incoming waves) and top-down (incoming rain storm.)

Alternatives: deploying for the tide

Finding myself a bit flustered by the situation, I decided that I’d simply have to make do with where I could get to and see what happened. I found a section of the reef about 50 metres from the shore and fastened the AudioMoth (in its waterproof enclosure) to the reef with a velcro-strap. The AudioMoth had been configured to record for a minute at ten-minute intervals for the next 24 hours.

The AudioMoth secured to a shallow part of the reef.

The AudioMoth’s position wasn’t secluded and was clearly surrounded by the flow of water around it. It occurred to me at the time that this might be an interesting alternative, since the AudioMoth would likely document tidal movements in this location and there might be an association between the tidal peaks/lows and acoustic complexity.

However, the most important consideration was the hope that the AudioMoth would be there when I returned the following morning!

Returning the following morning (a little before 10am) in much calmer conditions, I quickly found the AudioMoth – which aside from being covered in a bit of seagrass and specked with sand – still looked secure.

Retrieving the deployed AudioMoth on the morning of 9/10/2022

A quick glance at the acoustic data

With the audio files extracted from the MicroSD card, I examined the spectrograms from the first batch of recordings. The first and second recordings (from 12:00 and 12:10) clearly documented the rain that continued following the deployment of the AudioMoth. Hardly anything other than the rain could be discerned from the first 12:00 spectrogram (and indeed listening to the audio itself), but by the 12:10 recording it’s evident that the rain has lessened with evidence of shallow waves coursing around the AudioMoth. At 12:20, the rain has stopped and we’re left with the periodic wave activity, incidental water movement and (very) discrete transient sounds.

8/10/22 – 12:00 pm – Heavy rain event dominates the recording.
8/10/22 – 12:10 pm – Still raining, though wave activity at 19, 25, 29 and 57 second marks.
8/10/22 – 12:20 pm – Rain has ended. Wave activity and general water movement evident across recording.

So that was what was going on following the deployment – amongst the rain, wind and some tidal movement. What about later on when I was fairly certain that the sea and surrounding conditions would be calmer? This might back up a hypothesis that the tides have some kind of clear association with acoustic activity.

Jumping ahead to 10pm as the predicted tidal low approaches, we see three instances of a very barren soundscape.

8/10/22 – 10:00 pm
8/10/22 – 10:10 pm
8/10/22 – 10:20 pm

Aside from some localised clicks and pops, there’s next to nothing in terms of wave activity or water movement.

Data analysis

Using a standard acoustic index workflow in R, I imported all the acoustic data collected from the deployment period, ran it through an Acoustic Complexity algorithm and plotted the values against time in an annotated chart.

It’s visually apparent that there’s a correlation between the Acoustic Complexity values and the tidal peaks/lows, since a high peak is typically associated with a consistent influx of waves, whereas a low peak consists of the steady receding of water. Note the two data points representing the rain events following the AudioMoth deployment. Is there a correlation between rain on water and acoustic complexity? Oh, yes there is!

Scatter plot of Acoustic Complexity over period of deployment.
The first two data points represent the rain events that occurred on and following deployment of the AudioMoth.
Red triangle denotes predictions for high tide mark and blue triangle denotes low tide marks.

In fact, if I plot a smooth regression line over the points we can see just how much those first two values skew the trend of the data (below left.) Removing these two points (below right) gives us two closely matched trends with only small margins of error.

The same plot with an overlaid smooth regression line. Removing the first two data points eliminates the visible skew at commencement.
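To make the effect of those two rain-affected points a bit more tangible, here’s a minimal sketch in base R. It uses invented toy values (not the actual Acoustic Complexity data from the deployment) and assumes a loess smoother, which may or may not match the smoother used for the plots above. The names obs, fit_all and fit_trimmed are just placeholders.

```r
# Toy data only: a hypothetical index sampled every 10 minutes for 24 hours
set.seed(42)
hours <- seq(0, 24, by = 1/6)

# A slow tide-like oscillation plus a little noise (invented values)
aci <- 10 + 2 * sin(2 * pi * hours / 12) + rnorm(length(hours), sd = 0.2)

# Inflate the first two observations to mimic the two rain recordings
aci[1:2] <- aci[1:2] + 15

obs <- data.frame(hours = hours, aci = aci)

# Fit a loess smoother to all points, then again without the first two
fit_all     <- loess(aci ~ hours, data = obs, span = 0.5)
fit_trimmed <- loess(aci ~ hours, data = obs[-(1:2), ], span = 0.5)

# Compare the two smooths at the same early time point
early <- data.frame(hours = 0.5)
predict(fit_all, early)      # pulled upwards by the two inflated points
predict(fit_trimmed, early)  # closer to the underlying oscillation
```

Even with well over a hundred observations, two extreme values at the very start of the series are enough to drag the beginning of the smooth upwards, since a local smoother has few neighbouring points to lean on at the edges.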

Conclusion and further considerations

I have to admit that I was genuinely surprised by just how much of a correlation there was in this analysis. A longer period of deployment over several days would be a more reliable means of establishing any kind of statistical baseline though.

I think a couple of factors worked in the favour of this deployment and its fairly accurate association between tidal activity and acoustic complexity:

The AudioMoth was deployed in a relatively clear, unobstructed and shallow part of the reef where it would be exposed to continuous tidal activity.
The weather conditions (aside from the initial rain) appeared to be reasonably calm. This was evidenced especially by the very low values recorded at the tidal low towards midnight.

This is why I think deploying for a longer period of time would be useful in terms of understanding just how much the variability of the tides and weather conditions over a week or two might obscure an otherwise strong correlation between tides and a measurement like acoustic complexity.

I’m planning to redeploy the AudioMoth over the coming Summer holiday period. For one thing, the water will be a bit warmer! I might even use two Moths for the benefit of observing different reef locations simultaneously. You can be sure I’ll be back with an update on this analysis sometime in the near future.

In the meantime, you can find the script and the dataset for this analysis on the blog’s GitHub repository here; the code for the acoustic index workflow is here.

As always, thanks for reading and please do pass this onto anyone who you think might be interested in the work I’m doing.

http://wranglingintheantipodes.wordpress.com/?p=970

Extensions

Acoustic detection in monitoR – an overview

tristanlouthrobins Aug 10, 2022

Up to now on this blog I’ve been exploring acoustic ecology and data science tools mostly within the context of acoustic indices, regular events occurring in a time series and trend-based analysis.

This time around, I thought I would introduce an R package developed by Sasha D. Hafner and Jon Katz called monitoR. Its primary application is to perform acoustic detection using two template matching algorithms: binary point matching and spectrogram cross-correlation.

Since I intend for this post to provide a general introduction to monitoR, I’m only going to focus on the algorithm that performs spectrogram cross-correlation. However, I certainly intend to examine binary point matching at a later stage on the blog.

I just want to be upfront at this point and state that this post is by no means an authoritative overview of the monitoR package. monitoR is a highly sophisticated package which – to date – I’ve only scratched the surface of, so I’m not diving too deeply into the finer aspects of its functionality this time around and will instead provide more of a general use-case for the acoustic detection.

As the results of this introductory use-case will later illustrate, a deeper dive into monitoR’s full functionality will certainly be useful.

Applications of acoustic detection algorithms

Before jumping into the technical aspects of the cross-correlation process, it’s probably worth appreciating why acoustic detection packages such as monitoR are such a big deal in the fields of acoustic ecology and bioacoustics. Acoustic indices – as I’ve extensively covered on this blog – are incredibly useful for determining aspects of a soundscape, relevant to its biodiversity (BI, ADI, AEI), complexity (ACI) or ratio of biophony to anthropophony and technophony (NDSI.) And whilst these tools are great for examining overall trends, you can’t necessarily use indices to measure the uniqueness and regularity of specific acoustic events – such as the sound (or acoustic signature) of a particular species.

In this respect, one could compare acoustic detection algorithms to an automated camera trap that utilises machine learning algorithms to detect particular species in a habitat. Such algorithms are often dependent on a training set of images defining a given species’ physical (and even behavioural) characteristics.

Applying this back into the field of acoustic monitoring, the notion of an ‘acoustic trap’ based on an algorithm’s training data has lots of applications in areas of wildlife research and conservation, particularly where cameras are unreliable or simply cannot observe certain activity. Aside from identifying biological life (song repertoire and behaviour), recent research studies have also been conducted using monitoR to detect unwelcome human presences in habitat zones, such as the prevalence of poaching (gunshots) and illegal logging and land-clearing (chainsaws, heavy machinery.)

Cross-correlation templates

Using a cross-correlation algorithm, monitoR performs acoustic detection using a correlation template of a unique acoustic event (such as a bird song), projects this across the spectrogram of acoustic data and returns peaks with a subset of detections.
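To illustrate the general idea of ‘projecting’ a template across a spectrogram, here’s a toy sketch in base R. It is not monitoR’s implementation: the spectrogram and template are just matrices of random numbers, and I’ve assumed a Pearson correlation as the score at each time offset. All of the object names are made up for the example.

```r
# Toy "spectrogram": rows are frequency bins, columns are time frames
set.seed(1)
n_freq <- 20
n_time <- 200
spec <- matrix(rnorm(n_freq * n_time), nrow = n_freq)

# Toy template: a small patch of energy, 10 time frames wide
template <- matrix(rnorm(n_freq * 10), nrow = n_freq)

# Plant a noisy copy of the template starting at time frame 120
spec[, 120:129] <- template + matrix(rnorm(n_freq * 10, sd = 0.3), nrow = n_freq)

# Slide the template along the time axis; at each offset, score the overlap
# with the Pearson correlation between the template and the spectrogram window
width   <- ncol(template)
offsets <- seq_len(n_time - width + 1)
scores  <- sapply(offsets, function(start) {
  window <- spec[, start:(start + width - 1)]
  cor(as.vector(template), as.vector(window))
})

which.max(scores)  # offset of the best match
max(scores)        # its correlation score
```

The scores hover around zero wherever the template lines up with unrelated noise and jump sharply at the offset where the copy was planted.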

As a very simple explanation: a ‘peak’ is the algorithm recognising an acoustic event which meets the (very) general correlation criteria of the template; the ‘detection’ is when a peak meets a specified threshold/score which is set between 0.0 and 1.0 (typically 0.4 or 0.5.)

The closer the peak’s resemblance to the spectral and temporal characteristics of the correlation template, the higher the score.

So for example, if I set a threshold for the correlation template to 0.5 and a given process on acoustic data returns five peaks with scores of [0.234, 0.543, 0.452, 0.761, 0.12], the algorithm would then only return a subset of two actual detections [0.543, 0.761].
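The worked example above can be sketched in a couple of lines of base R. The variable names are placeholders, and I’ve assumed that a peak sitting exactly on the threshold counts as a detection.

```r
# Peak scores from the example above
peak_scores  <- c(0.234, 0.543, 0.452, 0.761, 0.12)
score_cutoff <- 0.5  # detection threshold

# Detections are the peaks that reach the cutoff
detections <- peak_scores[peak_scores >= score_cutoff]
detections
```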

Let’s now go back to the correlation template itself. An acoustic event – such as a bird call – is a blob of energy occurring over a temporal space within a given frequency range.

Below is an illustration of a hypothetical acoustic event occupying a frequency range over a period of time. For the sake of simplicity here, I’ve omitted amplitude since we’re just concerned with windowing the acoustic event.

Now in order to make a template, we want to set parameters that isolate the frequency and temporal bounds of the acoustic event.

Let’s say the lowest bound of the event’s frequency range is 1500Hz (1.5kHz) and the upper bound 3000Hz (3.0kHz), with its total duration lasting 0.8s.

With these parameters in place, we now have a windowed template (highlighted green below) that can be used as the training data for the acoustic detection algorithm in monitoR.

It looks pretty straightforward doesn’t it? Indeed it is, but there are plenty of caveats to consider and I’ll address some of these once I implement a case study in R and weigh up the outcome.
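As a rough sketch of what windowing means in practice, the snippet below cuts a template out of a toy spectrogram matrix using the bounds from the hypothetical event above (1.5 to 3.0kHz, lasting 0.8s). The grid resolution, the random values and the 1.0s start time are all arbitrary assumptions for the example. This isn’t how monitoR builds its templates internally, just the general idea.

```r
# Toy spectrogram grid: frequency bins in Hz and time frames in milliseconds
freq_hz <- seq(0, 8000, by = 250)
time_ms <- seq(0, 3000, by = 50)

set.seed(7)
spec <- matrix(runif(length(freq_hz) * length(time_ms)),
               nrow = length(freq_hz))

# Window bounds: 1500-3000 Hz, lasting 800 ms
# (the 1000 ms start time is an arbitrary choice for this sketch)
frq_lim <- c(1500, 3000)
t_lim   <- c(1000, 1800)

keep_freq <- freq_hz >= frq_lim[1] & freq_hz <= frq_lim[2]
keep_time <- time_ms >= t_lim[1]   & time_ms <= t_lim[2]

# The template is simply the block of the spectrogram inside the window
template <- spec[keep_freq, keep_time]
dim(template)  # frequency bins x time frames inside the window
```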

Case study: Middle Farm, The Australian Reed Warbler (and a common pigeon)

For this introductory case study using monitoR, we’ll be returning to Middle Farm and Carrickalinga Creek. I’d made a couple of recordings in September 2021, just as Spring began and I was eager to document the noticeable uptick in bird activity in and around the area. The site of interest was adjacent to the creek on a lower track and I positioned a handheld recorder for about two hours in the mid-afternoon. It resulted in a fine recording: documenting the creek in a gentle flow as birdlife teemed around the general vicinity. Satisfied with this, I went back the following day and made another recording at the same site and same time.

Although there was a clear diversity of birdsong evident in the recording (New Holland Honeyeater, Superb Fairywren, Golden Whistler, et al), my attention was captivated by the extensive and powerful song repertoire of the Australian Reed Warbler. In spite of its diminutive size, it’s a sonic beast in terms of its dexterity and projection. Boy, is it loud!

The Australian Reed Warbler. Image credit: Gary Tate (via Wikipedia) CC-BY-S.A. Link: here

As I started thinking about writing a post covering monitoR and acoustic detection, I considered various bird species and thought that the Reed Warbler would be an interesting choice, given its varied song repertoire, clarity of song and overall loudness. I decided that I would use three of its short calls for my correlation templates.

You can listen to an excerpt of one of the recordings below which features the Reed Warbler’s amazing song repertoire.

Tristan Louth-Robins · Australian Reed Warbler – Middle Farm, South Australia (September 2021)

Oh yes, the humble pigeon as well! In both recordings, the familiar and consistent coo of a common Australian pigeon (aka: Rock Dove) can be heard throughout. So along with the Reed Warbler’s songs, I thought it would be interesting to also see how well the detection algorithm might pick up the pigeon’s song that is – in contrast to the Reed Warbler – largely unvaried and quite subtle. Therefore, I used only one call from the pigeon.

Since I had two recordings from this site, the audio for the species correlation templates was derived from the first recording and the test audio from the second recording made on the following day.

Implementing correlation templates in R

As usual, you can find all of the code for this analysis at my GitHub and I’ve posted a link to this at the end of this post.

With the monitoR library installed and loaded into R, I read in the .wav files of the audio for the correlation templates:

# load monitoR (template matching) and tuneR (provides readWave()):
library(tuneR)
library(monitoR)

# read in audio clips of Australian Reed Warbler song repertoire:
warbler1 <- readWave("warbler-1.wav")
warbler2 <- readWave("warbler-2.wav")
warbler3 <- readWave("warbler-3.wav")

# read in audio clip of the bonus pigeon:
pigeon <- readWave("bonus_pigeon.wav")

Using a call to viewSpec(), I can now visualise the audio files I just read in.

viewSpec(warbler1)

Looking at this spectrogram, I can make a determination on the temporal/frequency parameters I’ll want to set for the creation of the correlation template.

I set the time limit (t.lim) to a range of 0.1 to 1.0 seconds and a frequency range (frq.lim) of 2 to 6kHz. The user-defined threshold (score.cutoff) to subset a detection from a peak is set to 0.5.

Since I don’t want to create an additional .wav file for this audio (i.e. write.wav = FALSE), I’m going to need to read the .wav file directly via the call to makeCorTemplate(). This is a little bit clumsy and double-handed in terms of workflow – I did just previously create an audio object – but rather than get bogged down, I’d encourage you (if you’re curious) to read the official documentation to get a more detailed understanding of why the ‘write.wav’ argument exists.

# make correlation templates for the Reed Warbler:
warbler1.cor <- makeCorTemplate("warbler-1.wav",
                                write.wav = FALSE,    # don't write an extra .wav file
                                t.lim = c(0.1, 1.0),  # time range of 0.1 to 1.0 s
                                score.cutoff = 0.5,   # set threshold
                                frq.lim = c(2, 6),    # frequency range of 2-6 kHz
                                name = "w1")          # template name

With the other correlation templates created, I now combine the templates into a single object.

For reference, for the pigeon template the frequency range was set to 0.3 to 1kHz and a time range of 0.1 to 1.0 seconds.

# combine the correlation templates:
ctemps <- combineCorTemplates(warbler1.cor,
                              warbler2.cor,
                              warbler3.cor,
                              pigeon.cor)

Now that I’ve got the correlation templates set up, it’s time to test them against an audio file.

Because I want to closely examine what monitoR’s cross-correlation algorithm is doing, I’ll start with a short section of test audio of just one minute in duration.

# perform cross-correlation and compute scores
cscores <- corMatch("short_test.wav", ctemps)

corMatch() will now perform a Fourier analysis on “short_test.wav” and compute the correlation scores. This is the most labour-intensive part of this process, since the algorithm’s doing the hard work of going over the test audio and seeking to correlate the templates with what it finds. One minute of audio won’t necessarily break a sweat on a humble laptop, but longer audio files might require several unencumbered processing cores. If you enjoy longer-than-usual wait times, it’s a perfect opportunity to go make a cup of tea, sit outside and watch the clouds go by.

Oh, it’s already done! I’ll now extract the summary data from this process by calling findPeaks():

cdetects <- findPeaks(cscores)

Printing the ‘cdetects’ object:

A "detectionList" object

Based on survey file:  short_test.wav / 1 minute

and  4  templates

Detection information
   n.peaks n.detections min.peak.score max.peak.score min.detection.score
w1      46            1    -0.06185748      0.9001678           0.9001678
w2      44            3    -0.21906429      0.8537730           0.5146709
w3      36            1    -0.01083766      0.8104413           0.8104413
p       33           15    -0.05258981      0.6716617           0.5245724
   max.detection.score
w1           0.9001678
w2           0.8537730
w3           0.8104413
p            0.6716617

We’ve got a bunch of insightful data here: the number of peaks found (n.peaks), the subset of detections (n.detections), along with the minimum and maximum values for the peaks and detections. Note here that ‘w1’, ‘w2’ and ‘w3’ are the templates for the Reed Warbler, and ‘p’ is the template for the pigeon.

Whilst a few detections were picked up for the Reed Warbler (w1=1; w2=3; w3=1), look at our humble pigeon: 15 detections in one minute!

By simply calling plot() to the ‘cdetects’ object, we can now visualise the results as two combined plots:

Top: a spectrogram of the test audio annotated with detections.
Bottom: an adjacent plot of correlation scoring.

Below are the plots for this one minute of test audio. Note that the plot() call of the results will produce 30 seconds of the analysis at a time. Click on the arrows to view each of the plots.

Here, the results of the ‘cdetects’ object are represented visually, with the detections annotated on the spectrogram with distinctive coloured windows. The same colours represent the peaks and detections in the bottom plot. Detections occur where the peaks reach and pass the threshold score of 0.5 (dotted line), lining up with annotations in the spectrogram.

We can see that the cross-correlation process has done a pretty good job of locating the Reed Warbler’s three distinctive calls and has found every single call of the pigeon over the one minute. Not bad!

Scaling up for 30 minutes of test audio

I’ll now scale this up significantly. I’m going to test the correlation templates on thirty minutes of audio.

In spite of eight cores of power on my desktop, I’ll go and make a cup of tea and come back in ten minutes.

[…]

Done. With the peaks extracted and a new ‘cdetects’ object created, I inspect the results:

Based on survey file:  really_long_test.wav / 30 minutes

and  4  templates

Detection information
   n.peaks n.detections min.peak.score max.peak.score min.detection.score
w1    1472            8     -0.1229978      0.7443102           0.5861562
w2    1323           77     -0.2888359      0.7802414           0.5009818
w3    1057            3     -0.2536747      0.6278676           0.5050845
p     1358           16     -0.1447970      0.6215539           0.5016686
   max.detection.score
w1           0.7443102
w2           0.7802414
w3           0.6278676
p            0.6215539

Look at all those peaks and detections! Curiously, there are only 16 pigeon detections this time around. Maybe the pigeon was just a fleeting presence?

Rather than plot these results out in detail – which would result in sixty individual plots – I’ll use the call to getDetections() on ‘cdetects’ to print out a table of just the detections and subsequently plot the results as a scatterplot.

With a bit of ggplot magic and aesthetic rumination, I can represent the detections as a plot of correlation scores against duration, with the respective species and their call types represented by shape and colour.

Below are three plots. The first represents all of the detection data in a single plot; the second is the same, but highlights the region of the pigeon activity; and the third plot subsets each of the template detections into their respective call types.

There’s a lot to take in here, which is why I thought it would be useful to create a plot that separates the call types. We can clearly see that the highest number of detections come from Reed Warbler B and that the pigeon detections are clustered within a period of under two minutes. Reed Warbler A clearly performs best of all score-wise, returning 8 detections with a reasonable amount of confidence (mean = 0.677.)

Here are the summary statistics for all of the call types:

> summary_stats
# A tibble: 4 × 7
  template           n   min   max  mean     sd median
  <fct>          <int> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1 Reed Warbler A     8 0.586 0.744 0.677 0.0545  0.695
2 Reed Warbler B    77 0.501 0.780 0.567 0.0568  0.558
3 Reed Warbler C     3 0.505 0.628 0.549 0.0687  0.513
4 Pigeon            16 0.502 0.622 0.545 0.0306  0.545

All of the points that float on or just above the 0.5 threshold are a bit concerning. This is summarised by the above means and medians as well. In fact, 79% of the detections (82 out of 104) have a score of 0.60 or less, which means we can’t be entirely confident that the detections are accurate.
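For anyone wanting to reproduce this kind of breakdown, here’s a hedged sketch in base R. The detections data frame below is hypothetical: I’ve assumed ‘template’ and ‘score’ columns, and the scores are invented for illustration rather than taken from the results in this post. The final line simply checks the arithmetic behind the 79% figure quoted above.

```r
# Hypothetical detections table with invented scores
detections <- data.frame(
  template = c("w1", "w1", "w2", "w2", "w2", "p", "p"),
  score    = c(0.72, 0.61, 0.52, 0.55, 0.58, 0.51, 0.64)
)

# Per-template count, range, mean and median of the scores
by_template <- split(detections$score, detections$template)
summary_stats <- data.frame(
  n      = sapply(by_template, length),
  min    = sapply(by_template, min),
  max    = sapply(by_template, max),
  mean   = sapply(by_template, mean),
  median = sapply(by_template, median)
)
summary_stats

# Number and share of detections scoring 0.60 or less
low <- sum(detections$score <= 0.60)
low / nrow(detections)

# The figure quoted above: 82 of 104 detections
round(100 * 82 / 104)  # 79
```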

The plot below breaks down the scores into ranges above the threshold of 0.50.

Was it accurate?

I decided to take a cursory look at the spectrogram of the test audio (outside of R) to see if I could spot anything glaringly obvious. Given the complexity and diversity of the Reed Warbler’s calls, it’s pretty difficult to discern each of the templates against the data this way, and honestly this approach kind of defeats the purpose of using an acoustic detection algorithm in the first place!

What I can discern here is that there was a lot more pigeon activity that the pigeon correlation template didn’t pick up. I’ve highlighted this in the image above. As you can see, it begins around the ten-minute mark and continues at semi-regular intervals for another ten minutes or so.

So from this vantage it’s difficult to conclude whether the Reed Warbler templates are accurate, but something certainly appears to be amiss with the pigeon template.

Right then... some considerations

So, I think there are basically two things to consider here:

1) How can I best determine the accuracy of a correlation score?

2) And further, what can I do if the template doesn’t appear to be performing as well as I expect?

With the first issue, an obvious solution might be conducting further tests with other sets of data and observing how accurate the correlations are. After my cursory glance at a 30-minute spectrogram of audio, it might be worthwhile scaling the tests back down to shorter audio files and exploring the results from there. Recall that my initial test was done with a one-minute audio file. The results were quite good (especially pigeon accuracy), but this could have just been a fluke or something owing to – as yet – unknown factors.

There are some more sophisticated methods in monitoR involving cross-validation, but this will require a bit more research. It’s certainly something to look into at a later stage.

On the second issue, I think that returning to a scaled down test and tweaking the parameters (such as the frequency range and time limit) to see if there’s any improvement in performance might be a good idea too.

Considering both of these issues at once, another possibility could be that the audio provided for the templates simply isn’t good enough!

Lastly, you might recall at the beginning of this post that I mentioned monitoR’s other algorithm: binary point matching. I’m wondering if employing a different algorithm could improve detections? Hmm. That’s for another time.

So, that’s where things are up to with this foray into the monitoR package for R. Thanks for reading! In the interim, I’ll keep tinkering away. The code’s up at my GitHub and if you’ve got any comments or feedback please pop a comment below or get in touch directly.

GitHub link to the code in this post: https://github.com/TristanLouthRobins/wrangling_in_the_antipodes/blob/main/wrangling_post_11-intro_to_monitoR

http://wranglingintheantipodes.wordpress.com/?p=516

Extensions

NDSI: parameters in context

tristanlouthrobins Apr 9, 2022

In an earlier post from 2021, I ran through a basic use case for an acoustic index called the Normalised Difference Soundscape Index (NDSI). For this instalment of the blog, I want to return to this acoustic index and examine how parameter setting can be important within the context of a given location.

If you’re not already familiar with this index, most of this post should still be understandable. However, if you’d like to know more about the origins of the index and the initial use case I conducted, then read this previous post.

In the initial exploration of the NDSI, I highlighted the parameters that can be set in the calls to the functions ndsi() and multiple_sounds(), contained in the R package soundecology. You may recall that the function ndsi() accepts one audio file as an input, whereas multiple_sounds() accepts a directory path for many audio files.

library(soundecology)

# a single audio file (soundfile is a Wave object, e.g. from tuneR::readWave())
ndsi(soundfile, fft_w = 1024, anthro_min = 1000, anthro_max = 2000, bio_min = 2000, bio_max = 11000)

# many audio files
multiple_sounds(directory = "/home/user/wavs/",
                resultfile = "/home/user/results.csv",
                soundindex = "acoustic_complexity")  # "ndsi" selects the NDSI

While ndsi() provides an ideal function for ‘snapshot’ observations, for the exploration of acoustic data trends over a time series and investigating potential correlations with covariates, multiple_sounds() is a far more useful tool, especially in terms of its incorporation into an EDA workflow.

Many files, much time

In preparation for this post I employed multiple_sounds() to perform NDSI on thousands of minute-long audio files recorded at regular intervals over a five-week period. Since I’m returning to an index already covered on the blog, I thought it appropriate to return to a familiar site of observation: my balcony in the Adelaide suburb of Parkside.

From December 2021 to January 2022, I deployed an AudioMoth on the balustrade of the balcony to record periodically between 5am and 9pm at 10 minute intervals, recording for a minute at each iteration. A full day would result in 96 individual audio files. Given that I changed over rechargeable batteries and did a MicroSD data dump a couple of times *, not every day resulted in 96 audio files, and from the 13th of December 2021 to the 23rd of January 2022 I ended up with 3,947 audio files.

* Given my past misfortune with an AudioMoth, the regular data dump was done more as a means of not losing any data through malfunction or mishap. It wasn’t a big deal either way since my studio leads onto the balcony! In case you’re wondering why I didn’t just do this at the end of a given recording period (i.e. after 9pm), honestly – these days – I’m already in bed!
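As a quick sanity check on those numbers, the arithmetic can be sketched in R. I’m assuming here that the last recording of each day starts at 8:50pm (which gives the 96 files per day mentioned above) and that both the first and last dates are counted.

```r
# Recordings per day: one every 10 minutes across the 16 hours from 5am to 9pm
per_day <- (21 - 5) * 60 / 10
per_day  # 96

# Days from 13 December 2021 to 23 January 2022, inclusive
n_days <- as.integer(as.Date("2022-01-23") - as.Date("2021-12-13")) + 1
n_days  # 42

# Upper bound if no recordings were missed, and the share actually captured
expected <- per_day * n_days
expected         # 4032
3947 / expected  # roughly 0.98
```

So the battery changes and data dumps cost only around 2% of the possible recordings.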

I figured that a 5am to 9pm recording schedule would document a good cross-section of familiar sounds indicative of the neighbourhood soundscape.

Such familiar sounds include: birdsong, neighbours coming and going from the driveway (I live in a six-townhouse complex on one block), close and distant voices, cars travelling down the street, air conditioners, dogs barking, leaf blowers, power tools, doors opening/closing, crickets chirping, the thrum of rush-hour traffic from the distant main roads, the flying foxes that pass over on summer evenings and the distant drone of aircraft.

Since the NDSI algorithm is designed for measuring the ratio of anthropophony (human-made sounds) to biophony (non-human biological sounds), those sounds I listed above certainly fall into two distinct categories and largely complement the utility of NDSI, going some way to accurately quantifying key characteristics of an acoustic environment. This is especially true in a suburban environment, where the perceived amount of anthropophony to biophony fluctuates both regularly and unpredictably.

Though it’s been previously covered in the earlier post, a refresh on the NDSI scale is probably useful at this point.

NDSI calculates an index value for a given observation on a scale of -1 (strong anthropophony) to 0 (predominantly neutral/equilibrium) to 1 (strong biophony.)

Depending on the length of a given observation – let’s say a minute – the acoustic intensities occurring over the frequency spectrum are evaluated within two frequency windows: one measuring anthropophony (typically 200Hz to 2kHz) and one measuring biophony (2kHz to 8kHz). From this, the events in the frequency spectrum are binned and a single value is returned.
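The final step of that calculation is the normalised difference itself: biophony minus anthropophony, divided by their sum. Here’s a minimal sketch in R. The function name ndsi_from_bands() is made up for this example (it isn’t part of any package), and it assumes the intensity in each band has already been summed into a single value.

```r
# NDSI from the summed intensity in the two bands:
# (biophony - anthropophony) / (biophony + anthropophony)
ndsi_from_bands <- function(biophony, anthropophony) {
  (biophony - anthropophony) / (biophony + anthropophony)
}

ndsi_from_bands(biophony = 0, anthropophony = 5)  # -1: all anthropophony
ndsi_from_bands(biophony = 5, anthropophony = 5)  #  0: equilibrium
ndsi_from_bands(biophony = 5, anthropophony = 0)  #  1: all biophony
ndsi_from_bands(biophony = 3, anthropophony = 1)  #  0.5
```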

With this background covered, we arrive at my current motivation for revisiting NDSI and its parameter control.

NOTE: Before we get underway, the bulk of the R code that I’m using for the following visualisations won’t be included in this post – simply for the sake of brevity. If you’re interested in exploring the code, I’ve popped it up on my GitHub here, annotating sections so it can be followed. The whole dataset is there as well, if you want to explore that. You’ll just need to update the call to import the data to your local directory.

Observations amiss

Prior to writing this post, I’d begun exploring the data that I’d collected from the balcony and processed the acoustic data for various – mostly familiar – acoustic indices including: Bioacoustic Index (BI), Acoustic Complexity (ACI), Acoustic Diversity (ADI), Acoustic Evenness (AEI) and NDSI. Though the bulk of these indices appeared to accurately quantify and relate to aspects of the Parkside soundscape, something didn’t look quite right with the NDSI readings.

Compared to previous acoustic monitoring made from the balcony, the NDSI was returning surprising values across given days, with periodic and daily means at around +0.5 to +0.6.

FIGURE 1a: All 3,947 NDSI values for the period of December 13th 2021 to January 23rd 2022. The AudioMoth was configured to record for 55 seconds at ten-minute intervals from 5am to 9pm.

If nothing more was known about the site, such an NDSI value would point to a lively biotic environment (such as a wetland), as opposed to the actual site: a suburban home setting surrounded by civilisation and ~500 metres from two busy arterial roads.

To better understand the high weighting of biophony, here’s the previous plot overlaid with an NDSI colour spectrum:

FIGURE 1b: The previous plot, coloured by NDSI colour spectrum.

Loud biotic sounds (or those occurring in close proximity to the AudioMoth) could potentially bias the NDSI weighting. Additionally, it had been noticeably quiet towards the end of 2021 too, with the impact of the Omicron COVID-19 variant briefly quietening traffic. However, these considerations couldn’t be applied to the majority of the observations collected over the entire period.

Considering the summary statistic of the mean and observing the general trend over the entire period of observation, I could tell that the index wasn’t accurately representing the Parkside soundscape, especially as an environment that I could hear with my own ears. Amidst the familiar flurries of birdsong heard in the mornings and throughout the day, the thrum of traffic and churn of aircraft overhead was still evident, as were the more immediate sounds of humanity closer to home.

After scratching my head for a while, I popped the hood on my code and peered into my initial calls to multiplesounds(), quickly revealing the issue.

The parameter setting for NDSI within the call to multiplesounds() for the lower frequency of measuring anthropophony was at a default of 1kHz, so what would be regarded as anthropophony by the NDSI was restricted to a window of 1 to 2kHz, whereas biophony had its usual range of 2 to 8kHz.

Weighing it up with one observation

I decided to take a look at a spectrogram of a single observation to see how the 1-2 kHz range for anthropophony encapsulated acoustic events in Parkside.

The observation I selected was from 7am on a weekday, which I thought would be an ideal sample, since this time of the day features fairly balanced amounts of anthropophony (morning traffic) to biophony (birdsong.)

In the spectrogram below, I’ve annotated the respective ranges for anthropophony (red) and biophony (green.) The NDSI for this single observation returned a value of 0.9, which is an extremely high NDSI value weighted towards biophony.

If we examine those annotated ranges for anthropophony and biophony, it becomes apparent why the NDSI algorithm would be weighing more heavily towards 1.

FIGURE 2a: 14/12/2021 7:00am observation with 1kHz – 2kHz range.

Whilst we can see that the biophonic activity is largely accommodated within the range of 2 to 8 kHz, anthropophony is barely represented at all within 1 to 2 kHz, aside from a light haze of continuous activity in the lower half of the range.

We can see that the biophonic range contains lots of avian activity, with a gradual increase in the frequency of events and its overall intensity.

I’ve highlighted some of these general events in the same spectrogram below, along with some key anthropophonic events:

FIGURE 2b: 14/12/2021 7:00am observation with biophonic and anthropophonic events annotated.

It’s evident that there is plenty of anthropophony that wasn’t being included in the 1 to 2 kHz range, since it misses the two bands of stronger continuous activity and the sounds of two doors slamming (one loud, the other quieter.)

So, what happens to the NDSI value for this single observation if we widen the range of the anthropophony, maintaining an upper bound of 2kHz, whilst setting the lower bound towards 50Hz?

Here, I’ve computed the NDSI for this audio recording with five different ranges, along with their associated NDSI value.

Recall that the NDSI for 1kHz – 2kHz = 0.900.

500Hz – 2kHz: NDSI = 0.726
250Hz – 2kHz: NDSI = 0.625
200Hz – 2kHz: NDSI = 0.537
100Hz – 2kHz: NDSI = 0.237
50Hz – 2kHz: NDSI = -0.128

It becomes evident that as the range of anthropophony is increased, the NDSI value moves towards a neutral weighting of 0.
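The same effect can be reproduced on a made-up spectrum. The Python sketch below is an illustration only: the `ndsi` helper, the band-summing approach and the synthetic spectrum are all assumptions, not the R code or the recording used in this post. It sweeps the lower bound of the anthropophony window from 1kHz down to 50Hz while holding biophony at 2kHz to 8kHz.

```python
def ndsi(freqs_hz, power, anthro, bio=(2000, 8000)):
    """(biophony - anthropophony) / (biophony + anthropophony) from summed band power."""
    a = sum(p for f, p in zip(freqs_hz, power) if anthro[0] <= f < anthro[1])
    b = sum(p for f, p in zip(freqs_hz, power) if bio[0] <= f < bio[1])
    return (b - a) / (b + a)


# Made-up spectrum: traffic-like energy that falls away with frequency,
# plus a flat block of birdsong-like energy between 2 and 8 kHz.
freqs = list(range(50, 8000, 50))
power = [100.0 / f + (0.05 if f >= 2000 else 0.0) for f in freqs]

lower_bounds = [1000, 500, 250, 200, 100, 50]
sweep = {low: ndsi(freqs, power, anthro=(low, 2000)) for low in lower_bounds}

for low, value in sweep.items():
    print(f"{low:>4} Hz - 2 kHz: NDSI = {value:+.3f}")
```

Because each wider window can only add energy to the anthropophony term, the value can only fall as the lower bound drops. How quickly it falls depends on how much low-frequency energy the recording actually contains.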

Categorising the soundscape

To make these values easier to interpret, it might be useful if we categorise them.

Let’s say everything above 0.3 is classified as biophony, with 0.3 to 0.75 as “moderate” and 0.75 to 1 as “strong”. The inverse of these values (-0.3 to -0.75, -0.75 to -1) applies to anthropophony, and anything between -0.3 and 0.3 can be regarded as “neutral”.
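These categories can be written as a small lookup. The Python sketch below is a hypothetical helper, not part of the R code behind the plots. One assumption is needed because the thresholds above don't say where the boundary values belong: here exactly ±0.3 counts as neutral and exactly ±0.75 counts as moderate.

```python
def categorise_ndsi(value):
    """Map an NDSI value to one of the soundscape categories described above."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("NDSI must lie between -1 and +1")
    if abs(value) <= 0.3:  # assumption: the boundary itself is neutral
        return "neutral"
    source = "biophony" if value > 0 else "anthropophony"
    strength = "strong" if abs(value) > 0.75 else "moderate"
    return f"{strength} {source}"


# NDSI values for the single 7:00am observation, keyed by lower bound in Hz.
single_observation = {1000: 0.900, 500: 0.726, 250: 0.625, 200: 0.537, 100: 0.237, 50: -0.128}

for low_hz, value in single_observation.items():
    print(f"{low_hz:>4} Hz - 2 kHz: {value:+.3f} -> {categorise_ndsi(value)}")
```

With these thresholds the 1kHz range is the only one that lands in strong biophony, the 500, 250 and 200Hz ranges are moderate biophony, and the 100Hz and 50Hz ranges are neutral.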

I’ve plotted the values with their categories coloured below.

FIGURE 3: Points for NDSI ranges (left to right) categorised by their general soundscape.

Looking at the 50Hz value of -0.13, it’s tempting to conclude that this wide range of 50 Hz to 2 kHz is a more accurate statistic for this observation. After all, it’s now adequately accommodating the continuous and transient events in the lower frequency range, such as the thrum of traffic and the slamming doors.

But what happens if we extrapolate this wide range of 50 Hz – 2 kHz out for the remaining 3,946 observations and plot this over a time series?

Let’s compare this 50Hz range to the range of 1-2kHz:

FIGURE 4: Comparison of anthropophony ranges (50Hz to 2kHz) to (1kHz to 2kHz) over the entire period. Red horizontal line denotes the mean for each plot.

Interesting! The two plots are almost like mirror images of each other. 50Hz is largely weighted towards -1 (mean = -0.629), whilst 1kHz is largely weighted towards +1 (mean = +0.609.)

Let’s go a step further and visualise the remaining ranges over the entire period and compare the results.

FIGURE 5: Comparison of all anthropophony ranges.

Whilst 50Hz and 1kHz could be clearly distinguished from each other, aside from using the mean as a summary statistic it’s more difficult to understand what’s going on, especially in the case of ranges for 200, 250 and 500Hz.

We can generalise the data and produce a tidier insight by simply plotting the mean NDSI for each day across the entire time series.

FIGURE 6: Daily average NDSI for all anthropophony ranges.

Now we have a plot that indicates which days – on average – over the entire period had either stronger weightings of anthropophony (< -0.3) or biophony (> 0.3), and those that landed in a region of neutrality (-0.3 to 0.3).
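The daily averaging itself is a simple group-and-mean step. Here is a minimal Python sketch of the idea. The `daily_means` name and the four readings are hypothetical, and the plots in this post were produced in R.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(observations):
    """Average NDSI per calendar day from (timestamp, ndsi) pairs."""
    by_day = defaultdict(list)
    for timestamp, value in observations:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical readings, not values from the dataset.
observations = [
    (datetime(2021, 12, 14, 7, 0), 0.5),
    (datetime(2021, 12, 14, 7, 10), 0.25),
    (datetime(2021, 12, 15, 7, 0), -0.5),
    (datetime(2021, 12, 15, 7, 10), -0.75),
]

for day, value in daily_means(observations).items():
    print(day, value)
```

Averaging this way trades detail for readability: a day that swings between strong biophony and strong anthropophony can end up looking neutral.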

So based on this, the question arises: what’s the most accurate range for anthropophony in the case of the Parkside soundscape, so that it may best reflect the NDSI value for given observations?

In order to best answer this, it might be useful to zoom back out and have another look at some more spectrograms to see how each of the ranges accommodates anthropophony and biophony. But before I do that, I’m going to provide another couple of plots and set of summary statistics which might steer us a little closer to an ideal range.

In the plot below, I’ve done away with the time series and instead opted for a boxplot of the data. I haven’t produced a boxplot on the blog before, so if you’re unfamiliar with this kind of plot, I’d suggest quickly Googling/YouTubing an overview.

In a nutshell, it’s a very useful plot for summarising large quantities of data – visualising the data’s range (whiskers), interquartile range (lower, middle and upper margins of the box) and outliers (dots.)

FIGURE 7: Boxplot of NDSI anthropophony ranges. Red dots denote the mean for each range.
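For anyone who would rather see the numbers behind a boxplot than look it up, here is a minimal Python sketch. It assumes the common convention of whiskers reaching the furthest points within 1.5 times the interquartile range of the box, with anything beyond drawn as an outlier. Plotting libraries differ slightly in how they compute quartiles, and the sample values are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical NDSI readings, not values from the dataset.
sample = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.95]
summary_stats = boxplot_summary(sample)
print(summary_stats)
```

Here the reading of 0.95 sits well beyond the upper fence, so it would be drawn as a dot, while the whiskers stop at -0.2 and 0.2.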

What do we notice about the boxplots for 200, 250 and 500Hz ranges?

Aside from the fact their whiskers encompass nearly the full scale of the NDSI index and they aren’t clogged with outliers at either end, their respective interquartile ranges (25th, 50th, 75th percentile) all fall within the neutral region of the NDSI scale.

Why is that important? Remember earlier when I was describing the overall soundscape of a suburban environment as one characterised by a perceived amount of anthropophony to biophony fluctuating both regularly and unpredictably?

I believe that having the majority of data landing somewhere around -0.3 to 0.3 best represents the reality of anthropophony to biophony within this environment from this particular vantage (the balcony), and by what one (i.e. me) perceives with their own ears.

So based on this logic, I’ve essentially got a choice of the ranges starting at 200, 250 or 500Hz. Let’s say I can only pick one. Whilst all three are probably suitable, I’m going to lean more towards the anthropophony and choose 200Hz.

Ah, yes. I promised a table of summary statistics:

NDSI range | mean | median | min | max | standard deviation
50 Hz to 2 kHz | -0.629 | -0.772 | -0.999 | 0.840 | 0.383
100 Hz to 2 kHz | -0.528 | -0.676 | -0.998 | 0.982 | 0.445
200 Hz to 2 kHz | -0.243 | -0.333 | -0.995 | 0.995 | 0.498
250 Hz to 2 kHz | -0.122 | -0.182 | -0.994 | 0.996 | 0.492
500 Hz to 2 kHz | 0.020 | 0.003 | -0.985 | 0.999 | 0.482
1 kHz to 2 kHz | 0.609 | 0.657 | -0.919 | 0.999 | 0.273

Those summary stats above provide a good snapshot of the respective ranges. For 200Hz the median value is -0.333 with a mean of -0.243, but the standard deviation of 0.498 suggests a lot of variance around the mean. A similar level of variance is evident in the case of both 250Hz and 500Hz. So whilst the majority of the 200Hz values sit below 0, there’s still a lot of variation going on.
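To put those standard deviations in context, the short Python sketch below takes the mean and standard deviation for each range from the table and works out the span of one standard deviation either side of the mean, clipped to the NDSI scale. This is only a crude guide to spread, since nothing here suggests the values are normally distributed, and the `one_sd_interval` helper is a hypothetical name.

```python
# (mean, standard deviation) per anthropophony lower bound, copied from the table.
summary = {
    "50 Hz": (-0.629, 0.383),
    "100 Hz": (-0.528, 0.445),
    "200 Hz": (-0.243, 0.498),
    "250 Hz": (-0.122, 0.492),
    "500 Hz": (0.020, 0.482),
    "1 kHz": (0.609, 0.273),
}


def one_sd_interval(mean, sd):
    """Mean +/- one standard deviation, clipped to the NDSI scale of -1 to +1."""
    return max(-1.0, mean - sd), min(1.0, mean + sd)


for label, (mean, sd) in summary.items():
    low, high = one_sd_interval(mean, sd)
    print(f"{label:>6}: {low:+.3f} to {high:+.3f}")
```

For the 200Hz range this gives roughly -0.741 to +0.255, a span that runs from moderate anthropophony through most of the neutral region.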

To explore this further and work towards a better parameter fit, I’ll select another audio observation and apply the 200Hz range to its spectrogram.

Revisiting the spectrogram

Ok, we’re back with a spectrogram of a 7:00am observation from the 15th of December 2021, annotated with the 200Hz range. It has an NDSI value of -0.6, largely attributed to the lack of biophony relative to anthropophony.

FIGURE 8a: 15/12/2021 7:00am observation with 200 Hz – 2kHz range.

Let’s have a look at the previous spectrogram (7:00am, 14/12/2021) with a 200Hz range applied to it. Recall that it had an NDSI value of 0.537.

FIGURE 8b: 14/12/2021 7:00am observation with 200 Hz – 2kHz range.

Based on this comparison of these two spectrograms we get a better understanding of how the NDSI determines its value. 200Hz seems like a pretty reasonable fit in this context.

But I wonder if I can go a bit further with tweaking the parameters to more accurately encompass these acoustic events in terms of their anthropophony and biophony?

This is getting into splitting hairs territory, but I certainly can do this.

And I’m going to do it!

Using these spectrograms as a guide, I’ll keep the lower bound of the anthropophony range at 200Hz, but drop the upper bound to 1.5kHz, since in both spectrograms the consistent anthropophonic energy appears to largely drop away after 1.5kHz.

Now, throughout this post, I haven’t touched the parameters for biophony, but I’m going to do a little tweaking here as well. I’ll drop the lower bound of the biophony range to 1.5kHz and nudge the upper bound to 9kHz, accommodating a little more of the biophonic events.

In the interactive elements below, you can observe the difference in ranges for both of these observations:

FIGURE 8c: 14/12/2021 7:00am observation with 200 Hz – 1.5kHz, 1.5kHz – 9kHz range.

FIGURE 8d: 14/12/2021 7:00am observation with 200 Hz – 1.5kHz, 1.5kHz – 9kHz range.

What’s it going to result in? I’ll fire up R again, re-run the NDSI calculation on 20GB of audio, let my computer’s eight cores do their work, make a cup of tea, go for a walk around my neighbourhood and come back to R in about 40 minutes.

The fitted range: an ideal set of parameters or simply overkill?

Alright, so R’s computed the new set of NDSI values for the data. Here’s the NDSI for the new ranges of 200Hz – 1.5kHz for anthropophony and 1.5kHz – 9kHz for biophony.

Observation 14/12/2021, 7:00:00 – 7:00:55. NDSI = 0.56

Observation 15/12/2021, 7:00:00 – 7:00:55. NDSI = -0.518

Given that the NDSI for the 200Hz range was 0.537 and -0.6, the new range values of 0.56 and -0.518 represent marginal adjustments in the NDSI.

FIGURE 8e: 14/12/2021 7:00am observation with 200 Hz – 1.5kHz, 1.5kHz – 9kHz range.

FIGURE 8f: 14/12/2021 7:00am observation with 200 Hz – 1.5kHz, 1.5kHz – 9kHz range.

I’ve done a bit of analysis and plotted the 200Hz range alongside the new range across the entire period, as I’d done previously in Figure 1a. To aid the visualisation, I’ve manually overlaid the soundscape category colours as I’d used in the single observation plot (Figure 3.)

FIGURE 9: 200Hz range compared with new range over the entire period, coloured by soundscape category.

As we can see, the general distribution of values is largely unchanged, whilst the mean value for the new range is lifted to -0.19 (from -0.24 for the 200Hz range.)

If these two ranges are viewed as a boxplot alongside the other ranges, it’s apparent that the new range sits almost exactly midway between the 200Hz and 250Hz ranges.

FIGURE 10a: Boxplot of all NDSI ranges. 200Hz range and new range highlighted. Red dots denote the mean for each range.

Here’s the same plot again, but with the soundscape category colours overlaid:

FIGURE 10b: Boxplot of all NDSI ranges. Colour overlay of soundscape categories.

So, it would be fair to conclude that this ‘perfect fit’ range doesn’t really do that much; it’s largely compensating for what’s getting missed in the biophonic range, which isn’t really that much.

In the reality of the location under observation, I believe that any of the ranges between 200Hz and 500Hz provide a reasonably accurate NDSI.

To be completely honest, when I look at the range for 500Hz set against the soundscape colour overlay, I feel as though this might be the best option, since it accommodates some moderate anthropophony and biophony, whilst the majority of values sit in neutral territory.

Based on what I’ve said previously about suburban soundscapes – as one characterised by a perceived amount of anthropophony to biophony fluctuating both regularly and unpredictably – this range seems pretty accurate.

It might also be worth nudging the range for biophony up to 9kHz for the 500Hz range, but we’ll leave further refinements for another time. This has been a fairly exhaustive analysis!

Further considerations

There’s a lot more I could do here to further explore parameter fitting for NDSI within this context. For one, I haven’t looked at the daily periods under observation, such as the pre-dawn, morning, noon, afternoon and evening timespans. There’s also the possibility of examining these NDSI ranges against other covariates (such as ACI, BI or even temperature) and seeing how NDSI stacks up.

These considerations (daily periods, correlations) may very well inform an inevitable revisiting of NDSI in the near future.

In the meantime, you’ll find the bulk of the code that I wrote to run this analysis along with the complete dataset here. If you’ve got any feedback or questions then please leave a comment or get in touch directly.

http://wranglingintheantipodes.wordpress.com/?p=704


Middle Farm update

tristanlouthrobins Jan 13, 2022

There’s a couple of posts coming this month and here’s the first one explaining what’s happening (or actually, not happening) with the Middle Farm project.

In November 2020, the pilot research at Middle Farm came to a dramatic halt when the deployed AudioMoth encountered the dreaded phenomenon of water ingress.

On a site visit in late September, I’d relocated the AudioMoth from the picnic shelter to a small tree on the slope below. My assumption that the custom enclosure housing the device was watertight proved to be very wrong when, the following week, rainstorms battered the Fleurieu region. Upon my arrival at the site in early November, this is what I found:

Opening up the case, a small quantity of water spilled out and condensation was everywhere. The cardboard housing had also soaked up the moisture like a sponge. Observe the intimate proximity of cardboard to the AudioMoth... the horror, the horror. Pro tip: when outside, don’t ever use cardboard anywhere. The sad little silica gel sachet didn’t stand a chance at all, and it slid pathetically out of the enclosure. The next customised enclosure will most certainly employ some o-rings.

The device was completely toasted. You can see the bubbling on the surface of the board, and the battery terminal (and the batteries) were badly corroded as well. Amazingly though, the MicroSD card was okay! With it mounted, a quick check of the file metadata confirmed that the device must have failed on or shortly after the 17th of October.

So what now? Well, I’m putting a pause on the project for the time being. The main reason is that I’m now down to two Moths and the global chip shortage has meant that production of AudioMoths is being held back and staggered into 2022/2023.

As if life wasn’t already so uncertain! This lack of Moth production simply means that I can’t go throwing my precious devices to the elements, regardless of how watertight I might be able to make them. Other minor quibbles also factor into suspending the project, but they’re not really worth your time reading about here. Rather, they’re better left with me as a quiet torment as I re-work my methodologies and subsequent analysis workflows.

The good news is that I’ve steered my attention to home, where I’ve had a Moth (safely) deployed beneath the eaves of my balcony to observe the immediate soundscape. It is in part a return to my previous NDSI analysis from April 2020 and I’ll be covering this in the next post.

http://wranglingintheantipodes.wordpress.com/?p=694
