Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. / Zhao, Yan; Liang, Zhenlin; Du, Jing; Zhang, Li; Liu, Chengyu; Zhao, Li.

In: Frontiers in Neurorobotics, Vol. 15, 684037, 26.08.2021.

Research output: Contribution to journal › Article › peer-review

Published

Standard

Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. / Zhao, Yan; Liang, Zhenlin; Du, Jing; Zhang, Li; Liu, Chengyu; Zhao, Li.

In: Frontiers in Neurorobotics, Vol. 15, 684037, 26.08.2021.

Research output: Contribution to journal › Article › peer-review

Harvard

Zhao, Y, Liang, Z, Du, J, Zhang, L, Liu, C & Zhao, L 2021, 'Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech', Frontiers in Neurorobotics, vol. 15, 684037. https://doi.org/10.3389/fnbot.2021.684037

APA

Zhao, Y., Liang, Z., Du, J., Zhang, L., Liu, C., & Zhao, L. (2021). Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. Frontiers in Neurorobotics, 15, [684037]. https://doi.org/10.3389/fnbot.2021.684037

Vancouver

Zhao Y, Liang Z, Du J, Zhang L, Liu C, Zhao L. Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. Frontiers in Neurorobotics. 2021 Aug 26;15. 684037. https://doi.org/10.3389/fnbot.2021.684037

Author

Zhao, Yan ; Liang, Zhenlin ; Du, Jing ; Zhang, Li ; Liu, Chengyu ; Zhao, Li. / Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. In: Frontiers in Neurorobotics. 2021 ; Vol. 15.

BibTeX

@article{d0e87ccc2b074302add17c3dda1fff80,
title = "Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech",
abstract = "Depression is a mental disorder that threatens the health and normal life of people. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection mainly focuses on utilizing different parallel features from audio, video, and text for performance enhancement regardless of making full usage of the inherent information from speech. To focus on more emotionally salient regions of depression speech, in this research, we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features to store the original temporal relationship of a speech sequence and then analyze their difference between speeches of depression and those of health status. Then, we study the performance of various features and use a modified feature set as the input of the LSTM layer. Instead of using the output of the traditional LSTM, multi-head time-dimension attention is employed to obtain more key time information related to depression detection by projecting the output into different subspaces. The experimental results show the proposed model leads to improvements of 2.3 and 10.3% over the LSTM model on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.",
author = "Yan Zhao and Zhenlin Liang and Jing Du and Li Zhang and Chengyu Liu and Li Zhao",
year = "2021",
month = aug,
day = "26",
doi = "10.3389/fnbot.2021.684037",
language = "English",
volume = "15",
journal = "Frontiers in Neurorobotics",
issn = "1662-5218",
publisher = "Frontiers Media S.A.",
}

RIS

TY - JOUR

T1 - Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech

AU - Zhao, Yan

AU - Liang, Zhenlin

AU - Du, Jing

AU - Zhang, Li

AU - Liu, Chengyu

AU - Zhao, Li

PY - 2021/8/26

Y1 - 2021/8/26

N2 - Depression is a mental disorder that threatens people's health and normal life. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection mainly focuses on utilizing different parallel features from audio, video, and text for performance enhancement, rather than making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, in this research we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features that preserve the original temporal relationships of a speech sequence and then analyze how these features differ between depressed and healthy speech. Then, we study the performance of various features and use a modified feature set as the input of the LSTM layer. Instead of using the output of the traditional LSTM directly, multi-head time-dimension attention is employed to obtain key temporal information related to depression detection by projecting the output into different subspaces. The experimental results show that the proposed model achieves improvements of 2.3% and 10.3% over the LSTM model on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.

AB - Depression is a mental disorder that threatens people's health and normal life. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection mainly focuses on utilizing different parallel features from audio, video, and text for performance enhancement, rather than making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, in this research we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features that preserve the original temporal relationships of a speech sequence and then analyze how these features differ between depressed and healthy speech. Then, we study the performance of various features and use a modified feature set as the input of the LSTM layer. Instead of using the output of the traditional LSTM directly, multi-head time-dimension attention is employed to obtain key temporal information related to depression detection by projecting the output into different subspaces. The experimental results show that the proposed model achieves improvements of 2.3% and 10.3% over the LSTM model on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.

U2 - 10.3389/fnbot.2021.684037

DO - 10.3389/fnbot.2021.684037

M3 - Article

VL - 15

JO - Frontiers in Neurorobotics

JF - Frontiers in Neurorobotics

SN - 1662-5218

M1 - 684037

ER -
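
Architecture sketch

The abstract above outlines the model: frame-level acoustic features feed an LSTM, and multi-head attention is applied along the time dimension of the LSTM output sequence before classification. Below is a minimal PyTorch sketch of that idea. It is not the authors' released code; the feature dimension, hidden size, head count, mean pooling, and the use of nn.MultiheadAttention as a stand-in for the paper's time-dimension attention are all illustrative assumptions.

# Minimal sketch, assuming PyTorch; dimensions and pooling are hypothetical.
import torch
import torch.nn as nn


class MultiHeadAttnLSTM(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=128, num_heads=4, num_classes=2):
        super().__init__()
        # LSTM over frame-level features, keeping the full output sequence.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Multi-head self-attention across time steps; each head projects the
        # LSTM output sequence into a different subspace.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) frame-level speech features.
        h, _ = self.lstm(x)                # (batch, time, hidden_dim)
        attn_out, _ = self.attn(h, h, h)   # attention over the time dimension
        pooled = attn_out.mean(dim=1)      # aggregate time steps (assumed pooling)
        return self.classifier(pooled)     # logits: depressed vs. healthy


# Usage example with hypothetical shapes: 8 utterances, 300 frames, 80-dim features.
model = MultiHeadAttnLSTM()
logits = model(torch.randn(8, 300, 80))
print(logits.shape)  # torch.Size([8, 2])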