Sound classification using evolving ensemble models and Particle Swarm Optimization

Li Zhang; Chee Peng Lim; Yonghong Yu; Ming Jiang

doi:10.1016/j.asoc.2021.108322

Sound classification using evolving ensemble models and Particle Swarm Optimization

Li Zhang, Chee Peng Lim, Yonghong Yu, Ming Jiang

Research output: Contribution to journal › Article › peer-review

143 Downloads (Pure)

Abstract

Automatic sound classification attracts increasing research attention owing to its vast applications, such as robot navigation, environmental sensing, musical instrument classification, medical diagnosis, and surveillance. In this research, we propose an ensemble convolutional bidirectional Long Short-Term Memory (CBiLSTM) network with optimal hyper-parameter selection for undertaking sound classification. We first transform each audio signal into a spectrogram representation using the Short-time Fourier transform (STFT). A Particle Swarm Optimization (PSO) variant is subsequently proposed to optimize the learning rate, weight decay, numbers of filters and hidden units in the convolutional and BiLSTM layers, respectively, in order to extract effective spatial-temporal characteristics from the spectrogram inputs. To tackle the issue of stagnation in optimization, the proposed algorithm incorporates local exploitation using secant and Newton-Raphson methods, promising leader generation using regular and irregular super-ellipse formulae, and three-dimensional spherical search coefficients. Moreover, it takes into account multiple fused elite signals in conjunction with numerical analysis based exploitation to balance between diversification and intensification. A variety of CBiLSTM networks with distinctive optimized settings are devised. An ensemble model is then constructed by incorporating a set of three yielded networks based on a majority voting scheme. Evaluated using several audio data sets, our ensemble CBiLSTM networks outperform those with default and optimal settings identified by other search methods, existing deep architectures and state-of-the-art related studies. In addition to sound classification tasks, the proposed PSO algorithm also outperforms a number of classical and advanced search methods for solving diverse unimodal and multimodal benchmark functions with statistical significance.

Original language	English
Article number	108322
Journal	Applied Soft Computing
Volume	116
Early online date	8 Jan 2022
DOIs	https://doi.org/10.1016/j.asoc.2021.108322
Publication status	Published - Feb 2022

Keywords

Sound Classification
Evolutionary Algorithm
Deep Convolutional Bidirectional Long Short-Term Memory Network
Ensemble Classifier

Access to Document

10.1016/j.asoc.2021.108322

Accepted ManuscriptAccepted author manuscript, 3.35 MBLicence: CC BY-NC-ND

Cite this

@article{9fd4937125084f548f584deee4e5610d,

title = "Sound classification using evolving ensemble models and Particle Swarm Optimization",

abstract = "Automatic sound classification attracts increasing research attention owing to its vast applications, such as robot navigation, environmental sensing, musical instrument classification, medical diagnosis, and surveillance. In this research, we propose an ensemble convolutional bidirectional Long Short-Term Memory (CBiLSTM) network with optimal hyper-parameter selection for undertaking sound classification. We first transform each audio signal into a spectrogram representation using the Short-time Fourier transform (STFT). A Particle Swarm Optimization (PSO) variant is subsequently proposed to optimize the learning rate, weight decay, numbers of filters and hidden units in the convolutional and BiLSTM layers, respectively, in order to extract effective spatial-temporal characteristics from the spectrogram inputs. To tackle the issue of stagnation in optimization, the proposed algorithm incorporates local exploitation using secant and Newton-Raphson methods, promising leader generation using regular and irregular super-ellipse formulae, and three-dimensional spherical search coefficients. Moreover, it takes into account multiple fused elite signals in conjunction with numerical analysis based exploitation to balance between diversification and intensification. A variety of CBiLSTM networks with distinctive optimized settings are devised. An ensemble model is then constructed by incorporating a set of three yielded networks based on a majority voting scheme. Evaluated using several audio data sets, our ensemble CBiLSTM networks outperform those with default and optimal settings identified by other search methods, existing deep architectures and state-of-the-art related studies. In addition to sound classification tasks, the proposed PSO algorithm also outperforms a number of classical and advanced search methods for solving diverse unimodal and multimodal benchmark functions with statistical significance.",

keywords = "Sound Classification, Evolutionary Algorithm, Deep Convolutional Bidirectional Long Short-Term Memory Network, Ensemble Classifier",

author = "Li Zhang and Lim, {Chee Peng} and Yonghong Yu and Ming Jiang",

year = "2022",

month = feb,

doi = "10.1016/j.asoc.2021.108322",

language = "English",

volume = "116",

journal = "Applied Soft Computing",

issn = "1568-4946",

publisher = "Elsevier BV",

}

TY - JOUR

T1 - Sound classification using evolving ensemble models and Particle Swarm Optimization

AU - Zhang, Li

AU - Lim, Chee Peng

AU - Yu, Yonghong

AU - Jiang, Ming

PY - 2022/2

Y1 - 2022/2

N2 - Automatic sound classification attracts increasing research attention owing to its vast applications, such as robot navigation, environmental sensing, musical instrument classification, medical diagnosis, and surveillance. In this research, we propose an ensemble convolutional bidirectional Long Short-Term Memory (CBiLSTM) network with optimal hyper-parameter selection for undertaking sound classification. We first transform each audio signal into a spectrogram representation using the Short-time Fourier transform (STFT). A Particle Swarm Optimization (PSO) variant is subsequently proposed to optimize the learning rate, weight decay, numbers of filters and hidden units in the convolutional and BiLSTM layers, respectively, in order to extract effective spatial-temporal characteristics from the spectrogram inputs. To tackle the issue of stagnation in optimization, the proposed algorithm incorporates local exploitation using secant and Newton-Raphson methods, promising leader generation using regular and irregular super-ellipse formulae, and three-dimensional spherical search coefficients. Moreover, it takes into account multiple fused elite signals in conjunction with numerical analysis based exploitation to balance between diversification and intensification. A variety of CBiLSTM networks with distinctive optimized settings are devised. An ensemble model is then constructed by incorporating a set of three yielded networks based on a majority voting scheme. Evaluated using several audio data sets, our ensemble CBiLSTM networks outperform those with default and optimal settings identified by other search methods, existing deep architectures and state-of-the-art related studies. In addition to sound classification tasks, the proposed PSO algorithm also outperforms a number of classical and advanced search methods for solving diverse unimodal and multimodal benchmark functions with statistical significance.

AB - Automatic sound classification attracts increasing research attention owing to its vast applications, such as robot navigation, environmental sensing, musical instrument classification, medical diagnosis, and surveillance. In this research, we propose an ensemble convolutional bidirectional Long Short-Term Memory (CBiLSTM) network with optimal hyper-parameter selection for undertaking sound classification. We first transform each audio signal into a spectrogram representation using the Short-time Fourier transform (STFT). A Particle Swarm Optimization (PSO) variant is subsequently proposed to optimize the learning rate, weight decay, numbers of filters and hidden units in the convolutional and BiLSTM layers, respectively, in order to extract effective spatial-temporal characteristics from the spectrogram inputs. To tackle the issue of stagnation in optimization, the proposed algorithm incorporates local exploitation using secant and Newton-Raphson methods, promising leader generation using regular and irregular super-ellipse formulae, and three-dimensional spherical search coefficients. Moreover, it takes into account multiple fused elite signals in conjunction with numerical analysis based exploitation to balance between diversification and intensification. A variety of CBiLSTM networks with distinctive optimized settings are devised. An ensemble model is then constructed by incorporating a set of three yielded networks based on a majority voting scheme. Evaluated using several audio data sets, our ensemble CBiLSTM networks outperform those with default and optimal settings identified by other search methods, existing deep architectures and state-of-the-art related studies. In addition to sound classification tasks, the proposed PSO algorithm also outperforms a number of classical and advanced search methods for solving diverse unimodal and multimodal benchmark functions with statistical significance.

KW - Sound Classification

KW - Evolutionary Algorithm

KW - Deep Convolutional Bidirectional Long Short-Term Memory Network

KW - Ensemble Classifier

U2 - 10.1016/j.asoc.2021.108322

DO - 10.1016/j.asoc.2021.108322

M3 - Article

SN - 1568-4946

VL - 116

JO - Applied Soft Computing

JF - Applied Soft Computing

M1 - 108322

ER -