A Convolutional Recurrent Neural Network with Spatial Feature Fusion for Environmental Sound Classification

Meehir Mhatre, Li Zhang, Arjun Panesar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This research proposes a new Convolutional Recurrent Neural Network (CRNN) model with spatial feature fusion for environmental sound classification. Besides data preprocessing such as spectrogram transformation and data augmentation, customized deep networks, i.e., VGG19, ResNet152, and EfficientNetB0, with additional layers, are also proposed for audio classification. Specifically, the proposed CRNN model embeds ResNet152 and EfficientNetB0 in the encoder, where the spatial features extracted by both networks are concatenated. A Long Short-Term Memory (LSTM) component is used as the decoder in the proposed CRNN for temporal feature extraction. Evaluated on the ESC-50 dataset, the proposed CRNN model with multi-channel spatial feature fusion significantly outperforms the customized VGG19, ResNet152, and EfficientNetB0 networks, as well as existing studies. The spatial feature fusion in conjunction with LSTM-based sequential feature extraction accounts for the superiority of the proposed CRNN model for environmental sound classification.
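The encoder–decoder idea in the abstract can be sketched in PyTorch: two CNN encoders extract spatial features from the same spectrogram, their feature maps are concatenated channel-wise (the spatial feature fusion), and an LSTM decoder models the fused features along the time axis. This is a minimal illustration, not the paper's implementation — the small convolutional stacks stand in for ResNet152 and EfficientNetB0, and all layer sizes are assumptions; only the class count (50, from ESC-50) comes from the source.

```python
# Hedged sketch of a fusion CRNN: two stand-in CNN encoders, channel-wise
# concatenation of their spatial features, then an LSTM decoder over time.
import torch
import torch.nn as nn

class FusionCRNN(nn.Module):
    def __init__(self, n_classes=50):  # ESC-50 has 50 classes
        super().__init__()
        # Stand-in encoder A (placeholder for ResNet152 in the paper)
        self.enc_a = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency, keep time resolution
        )
        # Stand-in encoder B (placeholder for EfficientNetB0 in the paper)
        self.enc_b = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        # LSTM decoder over the time axis of the fused feature maps
        self.lstm = nn.LSTM(input_size=32 * 64, hidden_size=128,
                            batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (batch, 1, freq=128, time)
        a = self.enc_a(x)                  # (batch, 16, 64, time)
        b = self.enc_b(x)                  # (batch, 16, 64, time)
        fused = torch.cat([a, b], dim=1)   # channel-wise fusion -> 32 ch
        # flatten (channels, freq) per time step to form the LSTM sequence
        batch, ch, freq, time = fused.shape
        seq = fused.permute(0, 3, 1, 2).reshape(batch, time, ch * freq)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])       # classify from the last step

model = FusionCRNN()
logits = model(torch.randn(2, 1, 128, 100))  # 2 spectrograms, 100 frames
print(logits.shape)  # torch.Size([2, 50])
```

The key design point mirrored here is that fusion happens before the recurrent stage, so the LSTM sees a single sequence carrying both encoders' spatial views of each time frame.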
Original language: English
Title of host publication: Intelligent Management of Data and Information in Decision Making
Subtitle of host publication: Proceedings of the 16th FLINS Conference on Computational Intelligence in Decision and Control & the 19th ISKE Conference on Intelligence Systems and Knowledge Engineering (FLINS-ISKE 2024)
Pages: 275-282
Number of pages: 8
Volume: 14
ISBN (Electronic): 978-981-12-9464-8
DOIs
Publication status: Published - 30 Jul 2024

Publication series

Name: World Scientific Proceedings Series on Computer Engineering and Information Science
ISSN (Print): 1793-7868
ISSN (Electronic): 2972-4465
