Mental Health Assessment Method Based on Emotion Level Derived from Voice

Abstract: In many developed countries, mental health disorders have become a serious problem, and the economic loss due to treatment costs and interference with work is immeasurable. Therefore, we developed a method to assess individuals’ mental health using the emotional components contained in their voice. We propose two indices of mental health: vitality, a short-term index, and mental activity, a long-term index capturing the trends in vitality. To evaluate our method, we used the voices of healthy individuals (n = 14) and patients with major depression (n = 30). The patients were also assessed by specialists using the Hamilton Rating Scale for Depression (HAM-D). A significant negative correlation existed between the vitality extracted from the voices and HAM-D scores (r = -0.33, p < .05). Using vitality, we could discriminate the voice data of healthy individuals from those of patients with depression with high accuracy (p = .0085, area under the curve = 0.76). Further, we developed a method to estimate stress through emotion instead of analyzing stress directly from voice data. By daily monitoring of vitality using smartphones, we can encourage people to visit hospitals before they become depressed or during the early stages of depression, to prevent adverse consequences of depression.


Introduction
burden on the participants during specimen collection; that is, they are inconvenient.
On the other hand, with the recent widespread use of smartphones, pathological analysis using voice data has become popular [10][11][12]. Voice analysis using smartphones is not only noninvasive but also requires no dedicated device; thus, it can be performed conveniently and remotely.

The relationship between mental illness and voice has been observed in previous studies; e.g.,

Table 1 shows participants' information per group. Note that the number of participants and the number of data samples differ because data may have been collected multiple times from the same participant on different days. The average number of data samples collected per healthy person was 24.4 ± 33.3 for men and 6.3 ± 6.1 for women. For patients with depression, it was 6.0 ± 2.9

vitality and mental activity.

The protocol of this study was designed in accordance with the Declaration of Helsinki and relevant domestic guidelines issued by the concerned authority in Japan. The protocol was approved by the ethics committee of the National Defense Medical College (no. 2248) and the Kitahara

Table 2. Seventeen phrases used for recording.

According to the definition of Zimmerman et al. [31], the data of the patient group were divided into three groups by HAM-D score: no depression (≤ 7), mild (8-16), and moderate or severe (≥ 17).
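The grouping rule above can be written as a small function. This is a minimal Python sketch, assuming integer HAM-D total scores; the function name `hamd_severity` is hypothetical and not part of the study's software.

```python
def hamd_severity(score: int) -> str:
    """Map a HAM-D total score to the severity bands of Zimmerman et al. [31]."""
    if score < 0:
        raise ValueError("HAM-D total scores are non-negative")
    if score <= 7:
        return "no depression"
    if score <= 16:
        return "mild"
    return "moderate or severe"


print([hamd_severity(s) for s in (5, 12, 22)])
# ['no depression', 'mild', 'moderate or severe']
```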

The vitality of the four groups (i.e., these three groups and the healthy group) was compared across groups. P-values from Tukey-Kramer tests, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, sensitivity, and specificity were used to evaluate the classification accuracy of vitality.
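Because the groups differ in size, the Kramer variant of Tukey's test is needed. The sketch below computes its pairwise statistic using only the Python standard library; it is illustrative rather than the authors' analysis code, and the name `tukey_kramer_q` is hypothetical. Converting q to a p-value additionally requires the studentized range distribution with k groups and N - k degrees of freedom (available, for example, as `scipy.stats.studentized_range`), which is omitted here.

```python
import math
from statistics import mean


def tukey_kramer_q(groups: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Tukey-Kramer statistics q for groups of possibly unequal size.

    q_ij = |mean_i - mean_j| / sqrt((MSE / 2) * (1 / n_i + 1 / n_j)),
    where MSE is the pooled within-group variance from a one-way ANOVA.
    """
    names = list(groups)
    k = len(names)
    n_total = sum(len(values) for values in groups.values())
    # Pooled within-group sum of squares (each group around its own mean)
    ss_within = 0.0
    for values in groups.values():
        m = mean(values)
        ss_within += sum((x - m) ** 2 for x in values)
    mse = ss_within / (n_total - k)  # error degrees of freedom: N - k
    result = {}
    for i in range(k):
        for j in range(i + 1, k):
            a, b = groups[names[i]], groups[names[j]]
            se = math.sqrt(mse / 2 * (1 / len(a) + 1 / len(b)))
            result[(names[i], names[j])] = abs(mean(a) - mean(b)) / se
    return result


print(tukey_kramer_q({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}))
```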

The mean HAM-D score of each group is shown in Table 4. In addition, the mild group and the moderate or severe group are collectively referred to as the depression group. The number of participants in each group was 11 men and 8 women in the no depression group and 5 men and 3 women in the mild group. All three participants in the moderate or severe group were men.

We evaluated the performance of vitality using the data for algorithm verification shown in Table 3. Figure 3 shows the relationship between HAM-D score and vitality for the 46 data samples obtained from the patient group. There was a significant negative correlation between the two (r = -0.33, n = 46, p < .05).
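As a rough consistency check of the reported correlation, the standard t test for a Pearson coefficient can be applied to r = -0.33 and n = 46. This sketch assumes the reported p-value came from that usual two-sided test, which the text does not state explicitly; the helper names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(round(t_from_r(-0.33, 46), 2))  # -2.32
```

With 44 degrees of freedom, |t| ≈ 2.32 exceeds the two-sided 5% critical value of about 2.015, which agrees with p < .05.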

The Tukey-Kramer test revealed significant differences between the healthy group and the depression group, and between the healthy group and the moderate or severe group (p = .0085 and

Next, to evaluate the discrimination performance of vitality, we used the AUC of the ROC curve, sensitivity, and specificity. Figure 5 shows the ROC curves obtained when vitality was used to identify whether the verification data belonged to the healthy group or to each patient group. The horizontal axis represents 1 - specificity (false positive rate), and the vertical axis represents sensitivity (true positive rate).
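The ROC construction described above can be sketched as follows, assuming (consistent with the negative correlation in Figure 3) that lower vitality is taken to indicate the patient group. The function names and toy values are hypothetical. The AUC is computed both by the trapezoidal rule over the ROC points and by its equivalent Mann-Whitney form: the probability that a randomly chosen patient sample has lower vitality than a randomly chosen healthy sample.

```python
def roc_points(patient: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs, calling a sample 'patient' when vitality <= threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(patient) | set(healthy)):
        tpr = sum(v <= t for v in patient) / len(patient)  # sensitivity
        fpr = sum(v <= t for v in healthy) / len(healthy)  # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def auc_mann_whitney(patient: list[float], healthy: list[float]) -> float:
    """P(patient vitality < healthy vitality), counting ties as one half."""
    wins = sum((p < h) + 0.5 * (p == h) for p in patient for h in healthy)
    return wins / (len(patient) * len(healthy))


patient = [0.2, 0.3, 0.5]  # toy values, not study data
healthy = [0.4, 0.6, 0.7]
print(round(auc_trapezoid(roc_points(patient, healthy)), 3))  # 0.889
print(round(auc_mann_whitney(patient, healthy), 3))  # 0.889
```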

Figure 5. Receiver operating characteristic curves when using vitality to identify groups.

Table 5 shows the performance when vitality was used to distinguish the data of the healthy group from those of each patient group. For discriminating between the healthy group and the moderate or severe group, the AUC was 0.87, and the sensitivity and specificity were 0.78 and 0.86, respectively. In contrast, the AUCs for discriminating the healthy group from the no depression group and from the mild group were both below 0.7.
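The text does not state how the operating point behind the sensitivity and specificity in Table 5 was chosen. One common choice, shown here purely as an assumption, is the threshold that maximizes Youden's J = sensitivity + specificity - 1. A sample is again called a patient sample when its vitality is at or below the threshold; the names and toy values are hypothetical.

```python
def sens_spec(patient: list[float], healthy: list[float], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when vitality <= threshold is called 'patient'."""
    sensitivity = sum(v <= threshold for v in patient) / len(patient)
    specificity = sum(v > threshold for v in healthy) / len(healthy)
    return sensitivity, specificity


def youden_threshold(patient: list[float], healthy: list[float]) -> float:
    """Observed vitality value maximizing Youden's J (the smallest wins on ties)."""
    candidates = sorted(set(patient) | set(healthy))

    def youden_j(t: float) -> float:
        sens, spec = sens_spec(patient, healthy, t)
        return sens + spec - 1

    return max(candidates, key=youden_j)


patient = [0.1, 0.2, 0.3]  # toy values, not study data
healthy = [0.5, 0.6, 0.7]
best = youden_threshold(patient, healthy)
print(best, sens_spec(patient, healthy, best))  # 0.3 (1.0, 1.0)
```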

In this study, we developed a method to measure mental health using the emotional components contained in voice. Two indicators were proposed: vitality, based on short-term voice data, and mental activity, calculated from long-term voice data. As shown in Figure 3, there was a significant negative correlation between vitality and HAM-D score (i.e., depression severity assessed by a physician). In addition, as shown in Figure 4, groups with more severe depression tended to have lower mean vitality.

Vitality differed significantly between the healthy group and the depression group, and between the healthy group and the moderate or severe group. In contrast, there was no significant difference between the healthy group and the no depression group, whose members had almost no depressive symptoms even though they were outpatients with depression. This suggests that vitality (i.e., voice) could be used to measure treatment effects. Moreover, as shown in Figure 5 and

and the mean vitality of the depression low-risk group (BDI scores < 17; p < .05). Specifically, the score for question 9, concerning suicidal ideation, ranged from 0 to 3. There was a significant difference between the mean vitality of the suicide low-risk group (0 or 1 points) and that of the suicide high-risk group (2 or 3 points; p < .01). In the future, we will examine the vitality of native speakers of other languages, such as English.

A limitation of this study is that only read-out speech of fixed phrases was used for verification. Further verification is required before vitality can be applied to free speech, such as phone calls. Furthermore, in the verification data, the number of voices collected per participant, the sex ratio, and age were not balanced across groups. These differences may be reflected in the voice features. For example, all participants in the moderate or severe group were men, and there were only three of them. In the future, it will be necessary to collect voice data from many more female patients, especially those with severe depression, and to evaluate the performance of vitality.

Further, mental activity was not validated because sufficient continuous data could not be collected from the same participants in either the healthy group or the patient group. However, a comparison of Figures 1 and 2, which show the data for algorithm preparation, suggests that mental activity may classify the data more accurately than vitality; this will be addressed in future work.

Vitality and mental activity can be measured from voice alone. Their advantages are that they are non-invasive and less expensive than self-administered tests such as the GHQ-30 and BDI and stress-check methods using saliva and blood. Moreover, day-to-day changes in state can be recorded easily by implementing them on smartphones or similar devices.

We developed a smartphone application that implements the algorithm for vitality and mental

In this study, we developed a method to measure mental health from voice. Our algorithm, which estimates stress through emotion instead of analyzing stress directly from voice data, is novel. MIMOSYS, which implements the algorithms for vitality and mental activity, is a cost-effective and convenient measurement device. If the correlation between HAM-D score and vitality can be further strengthened, vitality may be used to aid doctors' diagnoses in the future. By daily monitoring of vitality and mental activity using MIMOSYS, we can encourage people to visit hospitals before they become depressed or during the early stages of depression. This may reduce the economic loss due to treatment costs and interference with work.