Speech Emotion Recognition Using Convolutional Recurrent Neural Networks

Abhishek Gangani, Li Zhang, Ming Jiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Research suggests that various machine learning and deep learning models can be used to implement speech emotion recognition (SER) from different acoustic properties, such as voice, pitch, loudness, intensity, Mel-frequency cepstral coefficients, and spectral characteristics. This chapter conducts speech emotion recognition using deep neural networks, namely long short-term memory, gated recurrent units, and convolutional recurrent neural networks. These acoustic features are investigated in our studies owing to their efficiency in representing key events in audio signals. A cross-validation evaluation with data from different actors has been conducted to check the robustness of each proposed network. The proposed models show impressive performance in comparison with existing state-of-the-art methods on several speech emotion datasets.
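
The chapter's implementation details are not reproduced on this page, but the pipeline the abstract describes, MFCC-style acoustic features fed into a convolutional recurrent network, can be sketched roughly as follows. This is a minimal illustrative sketch only: the layer sizes, the eight-class output, and the use of librosa and PyTorch are assumptions made for illustration, not details reported in the chapter.

# Minimal sketch of a convolutional recurrent network for SER on MFCC
# features. Architecture choices below are illustrative assumptions,
# not the authors' reported configuration.
import librosa
import torch
import torch.nn as nn


class CRNN(nn.Module):
    def __init__(self, n_mfcc: int = 40, n_classes: int = 8):
        super().__init__()
        # Convolutional front end: learns local time-frequency patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        # Recurrent back end: models the temporal evolution of the features.
        self.gru = nn.GRU(input_size=32 * (n_mfcc // 2), hidden_size=128,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mfcc, time)
        h = self.conv(x)                      # (batch, 32, n_mfcc//2, time//2)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, time//2, features)
        _, h_n = self.gru(h)                  # final hidden states
        h = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.fc(h)                     # emotion-class logits


def mfcc_features(path: str, n_mfcc: int = 40) -> torch.Tensor:
    """Load an utterance and return MFCCs shaped (1, 1, n_mfcc, time)."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0).unsqueeze(0)


# Example usage with a hypothetical audio file:
# logits = CRNN()(mfcc_features("utterance.wav"))

For the speaker-independent evaluation mentioned above, such a model would typically be trained and tested in a cross-validation loop where the utterances of each actor are held out in turn; the exact protocol used in the chapter is not specified here.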
Original language: English
Title of host publication: Intelligent Management of Data and Information in Decision Making
Subtitle of host publication: Proceedings of the 16th FLINS Conference on Computational Intelligence in Decision and Control & the 19th ISKE Conference on Intelligent Systems and Knowledge Engineering (FLINS-ISKE 2024)
Pages: 283-290
Number of pages: 8
ISBN (Electronic): 978-981-12-9464-8
DOIs
Publication status: Published - 30 Jul 2024

Publication series

Name: World Scientific Proceedings Series on Computer Engineering and Information Science
ISSN (Print): 1793-7868
ISSN (Electronic): 2972-4465
