A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs

Conor Wall; Li Zhang; Yonghong Yu; Akshi Kumar; Rong Gao

doi:10.3390/s22155566

Research output: Contribution to journal › Article › peer-review

Abstract

Medical audio classification for lung abnormality diagnosis is a challenging problem owing to comparatively unstructured audio signals present in the respiratory sound clips. To tackle such challenges, we propose an ensemble model by incorporating diverse deep neural networks with attention mechanisms for undertaking lung abnormality and COVID diagnosis using respiratory, speech and coughing audio inputs. Specifically, four base deep networks are proposed which include attention-based Convolutional Recurrent Neural Network (A-CRNN), attention-based bidirectional Long Short-Term Memory (A-BiLSTM), attention-based bidirectional Gated Recurrent Unit (A-BiGRU), as well as Convolutional Neural Network (CNN). A Particle Swarm Optimization (PSO) algorithm is used to optimize training parameters of each network. An ensembling mechanism is used to integrate the outputs of these base networks by averaging their probability predictions of each class. Evaluated using respiratory ICBHI, Coswara breathing, speech, and cough datasets, as well as a combination of ICBHI and Coswara breathing databases, our ensemble model and base networks achieve ICBHI scores ranging from 0.920 to 0.9766. Most importantly, the empirical results indicate that a positive COVID diagnosis can be distinguished to a high degree from other more common respiratory diseases using audio recordings, based on the combined ICBHI and Coswara breathing datasets.
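The ensembling step described in the abstract, averaging the per-class probability predictions of the base networks, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ensemble_average`, the three-class setup, and all probability values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ensemble_average(prob_list):
    """Average per-class probabilities from several base models.

    prob_list: list of arrays, each shaped (n_samples, n_classes).
    Returns the mean probabilities and the predicted class indices.
    """
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)      # average over the model axis
    return mean_probs, mean_probs.argmax(axis=1)


# Made-up outputs for two audio clips and three classes (illustration only).
a_crnn = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
a_bilstm = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
a_bigru = np.array([[0.5, 0.4, 0.1], [0.3, 0.4, 0.3]])
cnn = np.array([[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]])

mean_probs, labels = ensemble_average([a_crnn, a_bilstm, a_bigru, cnn])
print(mean_probs)
print(labels)
```

In this toy input the CNN alone would label the first clip as class 1, while the averaged probabilities favour class 0, which shows how averaging can smooth over a single base network's disagreement.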

Original language	English
Article number	5566
Journal	Sensors
Volume	22
Issue number	15
DOIs	https://doi.org/10.3390/s22155566
Publication status	Published - 26 Jul 2022

Access to Document

10.3390/s22155566 (Licence: CC BY)

Cite this

@article{cc0ee6288d7647a3ad04d913ba3ef1ad,
  title = "A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs",
  abstract = "Medical audio classification for lung abnormality diagnosis is a challenging problem owing to comparatively unstructured audio signals present in the respiratory sound clips. To tackle such challenges, we propose an ensemble model by incorporating diverse deep neural networks with attention mechanisms for undertaking lung abnormality and COVID diagnosis using respiratory, speech and coughing audio inputs. Specifically, four base deep networks are proposed which include attention-based Convolutional Recurrent Neural Network (A-CRNN), attention-based bidirectional Long Short-Term Memory (A-BiLSTM), attention-based bidirectional Gated Recurrent Unit (A-BiGRU), as well as Convolutional Neural Network (CNN). A Particle Swarm Optimization (PSO) algorithm is used to optimize training parameters of each network. An ensembling mechanism is used to integrate the outputs of these base networks by averaging their probability predictions of each class. Evaluated using respiratory ICBHI, Coswara breathing, speech, and cough datasets, as well as a combination of ICBHI and Coswara breathing databases, our ensemble model and base networks achieve ICBHI scores ranging from 0.920 to 0.9766. Most importantly, the empirical results indicate that a positive COVID diagnosis can be distinguished to a high degree from other more common respiratory diseases using audio recordings, based on the combined ICBHI and Coswara breathing datasets.",
  author = "Conor Wall and Li Zhang and Yonghong Yu and Akshi Kumar and Rong Gao",
  year = "2022",
  month = jul,
  day = "26",
  doi = "10.3390/s22155566",
  language = "English",
  volume = "22",
  journal = "Sensors",
  issn = "1424-8220",
  publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
  number = "15",
}

TY - JOUR

T1 - A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs

AU - Wall, Conor

AU - Zhang, Li

AU - Yu, Yonghong

AU - Kumar, Akshi

AU - Gao, Rong

PY - 2022/7/26

Y1 - 2022/7/26

AB - Medical audio classification for lung abnormality diagnosis is a challenging problem owing to comparatively unstructured audio signals present in the respiratory sound clips. To tackle such challenges, we propose an ensemble model by incorporating diverse deep neural networks with attention mechanisms for undertaking lung abnormality and COVID diagnosis using respiratory, speech and coughing audio inputs. Specifically, four base deep networks are proposed which include attention-based Convolutional Recurrent Neural Network (A-CRNN), attention-based bidirectional Long Short-Term Memory (A-BiLSTM), attention-based bidirectional Gated Recurrent Unit (A-BiGRU), as well as Convolutional Neural Network (CNN). A Particle Swarm Optimization (PSO) algorithm is used to optimize training parameters of each network. An ensembling mechanism is used to integrate the outputs of these base networks by averaging their probability predictions of each class. Evaluated using respiratory ICBHI, Coswara breathing, speech, and cough datasets, as well as a combination of ICBHI and Coswara breathing databases, our ensemble model and base networks achieve ICBHI scores ranging from 0.920 to 0.9766. Most importantly, the empirical results indicate that a positive COVID diagnosis can be distinguished to a high degree from other more common respiratory diseases using audio recordings, based on the combined ICBHI and Coswara breathing datasets.

U2 - 10.3390/s22155566

DO - 10.3390/s22155566

M3 - Article

SN - 1424-8220

VL - 22

JO - Sensors

JF - Sensors

IS - 15

M1 - 5566

ER -