Abstract
Automated patient monitoring solutions greatly benefit from audio emotion classification, although the considerable variance in how individuals express and interpret emotions poses a challenge. Current approaches often employ the standard Audio Spectrogram Transformer (AST) and deep learning models such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN)-based networks. However, their performance can be enhanced by integrating neural architecture search techniques using swarm optimisation algorithms. In this research, we explore AST with hyperparameter optimisation for speech emotion recognition. Three deep learning architectures with optimisable -block structures and variable filter numbers, i.e. 1DCNN, bidirectional LSTM (BiLSTM) and CNN-BiLSTM, are also proposed, enabling the optimisation of network depth and width. A novel Cluster Search Optimisation (CSO) algorithm is introduced. It incorporates Cluster Centroid Search, a Cluster Distance Improvement metric and reinforcement learning to dispatch different search actions based on clustering convergence and Q-learning strategies, respectively. A novel Noise Tempered K-means (NTKM) clustering model is also proposed, integrating Gaussian-based noise insertion and cluster compactness-separation measurement to further fine-tune the cluster centroids obtained using OPTICS clustering. CSO is used for hyperparameter and architecture search for AST and the aforementioned deep networks. Attention mechanisms are also integrated with the CSO-optimised networks to further enhance feature learning. We evaluate the resulting models against those devised by other optimisation algorithms across the EMO-DB, SAVEE, and TESS datasets. The empirical results demonstrate that CSO-optimised AST and CNN-BiLSTM with attention mechanisms outperform other architectures and compare favourably with existing state-of-the-art audio emotion classification methods.
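The abstract does not give the NTKM procedure itself, so the following is only a rough sketch of the general idea of refining centroids with Gaussian noise and a compactness-separation score. It is not the authors' algorithm. The function names, the noise schedule (`sigma`, `decay`) and the particular score (mean point-to-centroid distance divided by the smallest centroid-to-centroid distance) are all assumptions made for illustration, and the initial centroids are simply passed in, standing in for whatever a density-based stage such as OPTICS would provide.

```python
import math
import random


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (left unchanged if empty)."""
    moved = []
    for k, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            moved.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            moved.append(tuple(c))
    return moved


def score(points, labels, centroids):
    """Hypothetical compactness/separation ratio; lower is better.

    Requires at least two centroids.
    """
    compact = sum(math.dist(p, centroids[lab])
                  for p, lab in zip(points, labels)) / len(points)
    separation = min(math.dist(a, b)
                     for i, a in enumerate(centroids)
                     for b in centroids[i + 1:])
    return compact / separation if separation > 0 else math.inf


def noisy_kmeans(points, init_centroids, rounds=20, sigma=0.5, decay=0.8, seed=0):
    """Refine centroids by perturbing them with Gaussian noise.

    A perturbed solution is kept only if it improves the score; the noise
    scale shrinks every round (an assumed schedule, not from the paper).
    """
    rng = random.Random(seed)
    best = [tuple(c) for c in init_centroids]
    # One ordinary k-means step to settle the supplied centroids.
    best = update(points, assign(points, best), best)
    best_labels = assign(points, best)
    best_score = score(points, best_labels, best)
    for _ in range(rounds):
        # Gaussian noise insertion on every centroid coordinate.
        trial = [tuple(x + rng.gauss(0, sigma) for x in c) for c in best]
        trial = update(points, assign(points, trial), trial)
        labels = assign(points, trial)
        s = score(points, labels, trial)
        if s < best_score:
            best, best_labels, best_score = trial, labels, s
        sigma *= decay
    return best, best_labels, best_score
```

Calling `noisy_kmeans(points, init_centroids)` with a list of coordinate tuples and at least two starting centroids returns the refined centroids, the cluster label of each point and the final score. Accepting a perturbed solution only when the score improves keeps the result from ever being worse than the plain k-means starting point.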
Original language | English |
---|---|
Article number | 113223 |
Number of pages | 25 |
Journal | Knowledge-Based Systems |
Volume | 314 |
Early online date | 1 Mar 2025 |
DOIs | |
Publication status | E-pub ahead of print - 1 Mar 2025 |
Projects
- 1 Finished
- Improving Long COVID patient recovery through voice-based AI symptom tracking and personalised rehabilitation
1/10/22 → 30/09/23
Project: Research