Robust sound event detection in bioacoustic sensor networks.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2019
History:
received: 07 03 2019
accepted: 07 10 2019
entrez: 25 10 2019
pubmed: 28 10 2019
medline: 17 03 2020
Status: epublish

Abstract

Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNNs) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 ms) and long-term (30 min) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which performs short-term automatic gain control on every subband of the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer, i.e., an affine layer whose weights are dynamically adapted at prediction time by an auxiliary network taking long-term summary statistics of spectrotemporal features as input. We show that PCEN reduces temporal overfitting across dawn vs. dusk audio clips, whereas context adaptation on PCEN-based summary statistics reduces spatial overfitting across sensor locations. Moreover, combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best-performing system under the name BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
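
The two noise adaptation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `pcen` follows the commonly published PCEN formulation (a first-order smoother per subband followed by adaptive gain control and root compression), and its parameter values are illustrative defaults rather than the settings used in the article. `context_adaptive_affine` is a hypothetical helper in which the auxiliary network is reduced to a single linear map for brevity, and the use of per-band percentiles as long-term summary statistics is likewise an assumption.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a (bands, frames) spectrogram.

    Parameter values are illustrative assumptions, not the article's settings.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    # First-order IIR smoother, run independently in every subband.
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control (division by M**alpha), then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


def context_adaptive_affine(x, context, aux_W, aux_b):
    """Affine layer whose weights and bias are predicted from a context vector.

    Hypothetical simplification: the auxiliary network is one linear map
    (aux_W, aux_b) that outputs n_out * (n_in + 1) parameters.
    """
    x = np.asarray(x, dtype=float)
    params = aux_W @ np.asarray(context, dtype=float) + aux_b
    n_in = x.shape[0]
    n_out = params.size // (n_in + 1)
    W = params[: n_out * n_in].reshape(n_out, n_in)  # context-dependent weights
    b = params[n_out * n_in:]                        # context-dependent bias
    return W @ x + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.random((40, 500))                 # stand-in mel spectrogram
    P = pcen(E)
    # Assumed summary statistics: three percentiles per subband.
    context = np.percentile(P, [10, 50, 90], axis=1).ravel()
    x = rng.standard_normal(8)                # stand-in CNN features
    aux_W = 0.01 * rng.standard_normal((1 * (8 + 1), context.size))
    aux_b = np.zeros(1 * (8 + 1))
    print(context_adaptive_affine(x, context, aux_W, aux_b).shape)
```

In this sketch the smoothing coefficient `s` sets the short-term time scale of the gain control, while the context vector would be computed over a much longer window, so the two mechanisms operate at different temporal resolutions.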

Identifiers

pubmed: 31647815
doi: 10.1371/journal.pone.0214168
pii: PONE-D-19-04469
pmc: PMC6812790

Publication types

Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

e0214168

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Vincent Lostanlen (V)

Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America.
Music and Audio Research Laboratory, New York University, New York, NY, United States of America.
Center for Urban Science and Progress, New York University, New York, NY, United States of America.

Justin Salamon (J)

Music and Audio Research Laboratory, New York University, New York, NY, United States of America.
Center for Urban Science and Progress, New York University, New York, NY, United States of America.

Andrew Farnsworth (A)

Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America.

Steve Kelling (S)

Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America.

Juan Pablo Bello (JP)

Music and Audio Research Laboratory, New York University, New York, NY, United States of America.
Center for Urban Science and Progress, New York University, New York, NY, United States of America.
