A risk prediction model for post-stroke depression in Chinese stroke survivors based on clinical and socio-psychological features

Background Post-stroke depression (PSD) is a frequent complication that worsens rehabilitation outcomes and patients' quality of life. This study developed a risk prediction model for PSD based on clinical and socio-psychological features to enable early detection of patients at high risk of PSD. Results In the logistic model, risk predictors included a history of cerebral infarction (odds ratio [OR], 3.84; 95% confidence interval [CI], 2.22-6.70; P < 0.0001) and four socio-psychological factors: the Neuroticism/Stability dimension of the Eysenck Personality Questionnaire (OR, 1.18; 95% CI, 1.12-1.20; P < 0.0001), the Life Event Scale (OR, 0.99; 95% CI, 0.98-0.99; P = 0.0007), the 20-item Toronto Alexithymia Scale (OR, 1.06; 95% CI, 1.02-1.10; P = 0.002) and the Social Support Rating Scale (OR, 0.91; 95% CI, 0.87-0.90; P < 0.001). In addition, the tree model generated 11 rules. The area under the ROC curve (AUC) and the accuracy of the tree model were 0.85 and 0.86, respectively. Methods This study recruited 562 stroke patients in China, who were assessed for demographic data, medical history, vascular risk factors, post-stroke functional status and socio-psychological factors. Multivariable backward logistic regression was used to identify risk factors for depression at 1 month after stroke. We converted the logistic model into a visual tree model using the decision tree method. Receiver operating characteristic (ROC) analysis was used to evaluate model performance. Conclusion This study provides an effective risk model for PSD and indicates that socio-psychological factors are important risk factors for PSD.


INTRODUCTION
Post-stroke depression (PSD) is considered one of the most frequent and important sequelae of stroke, with a prevalence ranging from 20% to 65% [1,2]. PSD negatively affects patients' therapy, survival and resumption of social activities, and it carries a high medical care burden [1]. Previous observational studies indicated that PSD usually occurs within a few months of stroke onset [3]. A longitudinal study reported that the prevalence of PSD reached 25% within the first 3 months after stroke, decreased to 16% between 3 and 12 months, and increased again after 1 year [4]. Thus, it is important to identify patients at high risk of PSD, which will facilitate early prevention and adequate treatment.
There is evidence for both biological and psychological mechanisms in the etiology of PSD. Some researchers propose a biological mechanism in which depression is caused by brain damage that disrupts neural circuits involved in mood regulation [5]. Other studies have focused on stroke-related factors and showed that demographic factors (e.g., age, sex) [6], medical and psychiatric history [7], type and severity of stroke, lesion location [8], and degree of disability are associated with PSD [9]. However, others have suggested that depression is a psychological reaction to the social and psychological stressors associated with stroke [8]. Previous studies demonstrated that personality traits are associated with PSD: patients with higher levels of neuroticism have a higher risk of PSD, and the impact of personality traits on depressive symptoms appears to be mediated through illness cognitions and coping styles [5,6,10]. Recent studies reported that social support is an important predictor of functional recovery after stroke [11] and that patients with PSD have low social support, indicating that social support systems could help buffer patients at risk [9,12-14]. In addition, stroke survivors have a high prevalence of alexithymia and anhedonia, PSD-related symptoms that place a heavy burden on family caregivers. Unfortunately, the factors investigated in these studies explain only a small part of the risk of PSD, and the role of psychological factors in PSD remains unclear.
Previous studies have investigated general PSD risk factors (sex, history of depression, family history of depression and somatic comorbidity) and potential disease-related risk factors (cognitive impairment, lesion location, leukoaraiosis on computed tomography (CT), Rankin score and cognitive score) [15], and established risk predictors for depression in the first year after stroke. Furthermore, a study that combined demographic, stroke-related and psychological factors to identify their influence on PSD reported a relationship between psychological variables and the presence of depressive symptoms 2 months after stroke [5]. Although these studies focused on risk factors, they did not develop an operational tool for clinical evaluation. To date, few studies have developed a risk model for the early detection of PSD. The PSD Prediction Scale (DePreS) was proposed to assess the risk of PSD in the first week after stroke [7]. It mainly focuses on socio-demographic and stroke-related factors, including a medical history of depression or other psychiatric disorders, hypertension, angina pectoris, and the dressing item of the Barthel Index (BI). This model has good predictive performance; however, the lack of socio-psychological and cognitive factors limits its interpretation and use for PSD patients. In addition, McIntosh showed that implementing the Evidence Based Depression Screening and Treatment (EBDST) protocol improved early detection and treatment of PSD in hospital [16]. Although that study explored risk factors for PSD, it lacked an assessment criterion for identifying PSD risk.
Therefore, the aims of the present study were to identify risk factors for PSD among demographic factors, medical history, vascular risk factors, functional status, socio-psychological factors, and neurological and cognitive functions, and, above all, to develop a comprehensive risk model for the clinical recognition and early prevention of PSD.

RESULTS
This study recruited 562 stroke patients. At 1 month after stroke, 226 patients fulfilled the criteria for PSD, giving a cumulative incidence of 40.2%. Patient characteristics are summarized in Table 1. Ages ranged from 24 to 84 years, with a mean of 64.28 ± 10 years. Of the 225 female patients, 119 had PSD and 106 did not. Body Mass Index (BMI) differed significantly between the PSD and Non-PSD groups (P = 0.006). Regarding medical history and vascular risk factors, history of cerebral infarction (Brain_CI), hypertension, diabetes, smoking and drinking differed significantly between the PSD and Non-PSD groups (P < 0.001). Among the socio-psychological, neurological and cognitive functional factors, EPQ_E, EPQ_P, EPQ_N, SSRS, TAS, NIHSS, BI and MMSE differed significantly between the PSD and Non-PSD groups (P < 0.05).
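As a quick arithmetic check of these counts, the following Python sketch (our illustration; the authors worked in R) reproduces the cumulative incidence and derives the male and Non-PSD counts implied by the reported totals. The derived counts are simple subtractions from the figures above, not separately reported values.

```python
# Totals taken from the Results text; everything else is derived by subtraction.
TOTAL = 562
PSD_CASES = 226
FEMALES = 225
FEMALES_WITH_PSD = 119

def psd_counts():
    """Derive the remaining cells of the sex-by-PSD table from the reported totals."""
    non_psd = TOTAL - PSD_CASES                    # patients without PSD
    males = TOTAL - FEMALES                        # male patients
    males_with_psd = PSD_CASES - FEMALES_WITH_PSD  # PSD cases who are male
    return {
        "non_psd": non_psd,
        "females_without_psd": FEMALES - FEMALES_WITH_PSD,
        "males": males,
        "males_with_psd": males_with_psd,
        "males_without_psd": males - males_with_psd,
    }

incidence = PSD_CASES / TOTAL
print(f"Cumulative incidence: {incidence:.1%}")  # Cumulative incidence: 40.2%
print(psd_counts())
```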
As shown in Table 2   To provide a visualization of the logistic model using the significant predictors, we used the decision tree method to construct a tree model that distinguishes stroke survivors at high risk of PSD. Figure 2 shows the decision tree with 9 internal nodes and 11 leaves; the split criterion is indicated at each internal node. The first major split separates patients with a high TAS score (>= 52) from those with a low TAS score (< 52), a cut-off that coincides with the lower bound of the TAS-20 range for possible alexithymia (52-60). Within each of these branches, the tree uses the EPQ_N and SSRS scores at the second level, EPQ_N and Brain_CI history at the third level, the SSRS, LES and EPQ_N scores at the fourth level, and Brain_CI history at the fifth level. Thus, up to five assessments can identify patients at high risk of PSD. The ROC curve in Figure 1

DISCUSSION
This study used multivariable logistic regression on clinical, socio-psychological, neurological and cognitive functional factors to identify potential PSD risk predictors and to develop a model predicting the risk of PSD within 1 month. We also generated an evaluation tool for PSD using the decision tree method. The present study yielded two main findings. First, Brain_CI history, EPQ_N, LES, SSRS and TAS were significant PSD risk factors, and the last four are socio-psychological factors. This suggests that understanding the role of socio-psychological factors in the development of PSD is important for improving primary and secondary interventions for stroke survivors. Second, the decision tree provided the critical cut-off values of each rule and quantified how the predictors interact to affect the risk of PSD in stroke survivors.
This study also comprehensively investigated the influence of a broad range of general clinical, socio-psychological, neurological and cognitive functional factors on PSD. Among the general and clinical data, only a history of Brain_CI was a significant predictor of PSD. Previous findings on the correlation between demographic factors and the occurrence of PSD are controversial, and our study did not find that demographic factors were risk factors for PSD. Regarding neurological deficits, however, our finding was inconsistent with other clinical studies and meta-analyses reporting that baseline severity of disability was the most robust correlate of depressive symptoms. A possible explanation is that the patients recruited in our study had mild or moderate neurological dysfunction, and the small difference between the two groups may have concealed any significant influence on PSD. Severe disability caused by large strokes, which are more likely to affect regions involved in mood processing, may be an independent predictor of PSD.
Regarding socio-psychological factors, our findings suggest that certain psychosocial risk factors are important in the development of PSD. Overall life events were assessed with the LES, which quantifies the mental stimulation of daily life. This finding is not entirely consistent with previous studies of stroke survivors, which reported that the cumulative stimulation of life events, rather than a single major life event, was associated with PSD [14,17], as were negative life events [18] and exposure to stressful life events [3,18]. Regarding personality traits, stable characteristics are assumed to influence the formation of illness cognitions. Studies exploring the relationship between neuroticism and PSD demonstrated that neuroticism, as measured by the EPQ, might facilitate the onset of PSD and worsen its outcome [10,19].
The influence of alexithymia, measured with the TAS, suggests that stroke survivors have an impaired ability to identify their own negative and positive emotional responses. It also shows that determining the clinical correlates of emotional unawareness in patients with stroke is essential. Furthermore, the predictive role of social support is consistent with a previous report that a lack of social support at admission was associated with the onset of PSD at 3-month follow-up [9]. Good social support was a protective factor against subsequent PSD and was associated with better post-stroke functioning [20]. In contrast, poor social support, including living alone and lacking a partner or close friends, was identified as a risk factor for PSD [21]. In addition, the good antidepressant effects of psychological therapy indirectly demonstrate the important role of socio-psychological factors in PSD [22]. Compared with previous studies [7,15], the strength of the present study lies in the combination of various factors and the use of a decision tree to reveal the relationship between socio-psychological characteristics and the risk of PSD. The tree model generated simple decision rules for assessing PSD risk to assist clinical diagnosis, and this type of model may be particularly useful for rapid assessment. Among the 11 rules, three main, long preventive rules were identified: TAS-EPQ_N-Brain_CI-SSRS, TAS-SSRS-EPQ_N-LES-Brain_CI and TAS-SSRS-Brain_CI-EPQ_N.
The study demonstrated correlations among socio-psychological factors and revealed their combined involvement in the risk of PSD. The tree method thus revealed complex interactions among predictors that might be difficult or impossible to discover using traditional regression techniques. Furthermore, the decision tree showed good performance, with a sensitivity of 0.70, a specificity of 0.83 and an AUC of 0.85, and reliably distinguished high-risk patients. The decision criteria successfully identified subgroups of patients who require different assessment tests or treatment strategies to achieve optimal medical outcomes. Moreover, comparison with the AUC of the logistic model showed that the predictive performance of the logistic and tree models was similar. However, the tree model is simpler and more intuitive than the logistic model and can be conveniently applied in the clinic. Logistic regression is commonly used to determine risk factors in medical research and diagnosis [23], but the logistic model does not provide a stepwise selection strategy for rapid identification. When the logistic model is translated into a tree model, it can easily be converted into convenient If-Then rules [24]. The tree model thus provides a symbolic representation that lends itself to easy interpretation by humans.
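To illustrate how a fitted tree reads as If-Then rules, the following minimal Python sketch stores a toy tree as nested dictionaries, lists one rule per leaf, and counts how many assessments a patient passes through before reaching a leaf. Only the root split (TAS >= 52) comes from this study; the EPQ_N and SSRS cut-offs are hypothetical placeholders, and the toy tree is far smaller than the 11-leaf tree in Figure 2.

```python
# Toy tree: only the root split (TAS >= 52) is from the study.
# The EPQ_N and SSRS thresholds are HYPOTHETICAL, chosen for illustration.
TREE = {
    "feature": "TAS", "threshold": 52,
    "ge": {
        "feature": "EPQ_N", "threshold": 12,  # hypothetical cut-off
        "ge": {"label": "high risk"},
        "lt": {"label": "low risk"},
    },
    "lt": {
        "feature": "SSRS", "threshold": 35,   # hypothetical cut-off
        "ge": {"label": "low risk"},
        "lt": {"label": "high risk"},
    },
}

def extract_rules(node, conditions=()):
    """Yield one If-Then rule per leaf as (list of conditions, label)."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    f, t = node["feature"], node["threshold"]
    yield from extract_rules(node["ge"], conditions + (f"{f} >= {t}",))
    yield from extract_rules(node["lt"], conditions + (f"{f} < {t}",))

def classify(node, patient):
    """Walk the tree for one patient; return (label, number of assessments used)."""
    steps = 0
    while "label" not in node:
        steps += 1
        branch = "ge" if patient[node["feature"]] >= node["threshold"] else "lt"
        node = node[branch]
    return node["label"], steps

for conds, label in extract_rules(TREE):
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Running the loop prints four rules, the first being `IF TAS >= 52 AND EPQ_N >= 12 THEN high risk`. The depth of a leaf is the number of assessments needed, which is how a tree with five levels translates into "up to five assessments".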
The present study had some limitations. First, our model does not cover all stroke features, such as biochemical indices and lesion location, which are also considered risk factors [15]; future studies should incorporate these to reveal the interactions among pathophysiological risk factors. Second, the predictors used in our models were assessed at 1 month post-stroke, when full depressive symptoms may not yet be present. Research on PSD at different time points may provide clearer information on the risk predictors and improve the effectiveness of the model. Third, the decision rules and exact cut-off points were used to make individualized diagnostic estimates, but they are not absolute. Future studies are needed to optimize the model and enhance its accuracy.
In conclusion, we constructed a comprehensive risk prediction model of PSD in Chinese stroke survivors based on their clinical and socio-psychological features. The model indicated that socio-psychological factors are important for identifying the risk of PSD within 1 month and can contribute to post-stroke rehabilitation. Furthermore, a decision tool was developed to help clinicians identify the risk of PSD early, which will allow the optimization of PSD prevention strategies in personalized medicine.

This study was approved by the Medical Ethics Committee for Clinical Research of ZhongDa Hospital Affiliated to Southeast University. Written informed consent was obtained from the participants or their legal guardians, and the study methods were carried out in accordance with the approved guidelines. People's Hospital, and The Affiliated First Hospital of Nanjing Medical University). To qualify, patients were required to meet the following criteria: (1) ischemic stroke or intracerebral hemorrhage confirmed by CT or Magnetic Resonance Imaging (MRI); (2) age at onset under 80 years; (3) no other major psychiatric disorders (including schizophrenia, bipolar disorder and substance abuse [caffeine, nicotine and alcohol]), neurodegenerative illness, severe physical illness or other medical illness; and (4) no anosognosia, neglect, hemianopia, cortical blindness, amnesia, aphasia, dementia or other symptoms that would hinder assessment.

Participants
In a prospective study of stroke survivors in China, the prevalence of PSD was 27.4% two weeks after stroke and 28% within one month after stroke [25,26]. In the present study, two trained senior psychiatrists carefully conducted diagnostic evaluations of PSD for all participants at three time points (1 week, 2 weeks and 1 month after stroke), using diagnostic criteria informed by previous clinical literature. Five criteria were included: (1) a previous stroke, or stroke onset preceding the depressive symptoms; (2) the core criterion symptoms (depressed mood and loss of interest or pleasure) together with at least two other depressive symptoms among the nine symptoms of major depressive disorder in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV); (3) impaired personal or occupational functioning; (4) depressive symptoms lasting more than one week; and (5) no other major psychiatric disorders, including schizophrenia, bipolar disorder and substance abuse (caffeine, nicotine and alcohol).

Statistical analysis
All analyses were conducted using R (version 3.2.3 [2015-12-10]) with the psych, MASS, pscl, C50, pROC and boot packages. Demographic and clinical characteristics were compared between the PSD and Non-PSD groups with Student's t-tests for continuous variables and Fisher's exact tests for categorical variables. To compare the socio-psychological factors between the PSD and Non-PSD groups, we tested normality with the Lilliefors test [35] and homogeneity of variances with Bartlett's test [36]. If a factor satisfied both normality and homogeneity of variances, we used multivariate analysis of variance (MANOVA) followed by one-way ANOVA to estimate the significance of differences between the PSD and Non-PSD groups; otherwise, we used the Kruskal-Wallis test. The statistical significance threshold was set at P < 0.05.
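The choice between parametric and nonparametric comparisons described above can be written as a small decision function. This is a sketch under an assumption the text does not spell out: that the Lilliefors test is applied within each group and a factor counts as normal only if no group rejects normality at alpha = 0.05. The p-values themselves would come from the tests named above (in Python, for example, `statsmodels.stats.diagnostic.lilliefors` and `scipy.stats.bartlett`); the function name is hypothetical.

```python
def choose_group_comparison(normality_pvalues, homogeneity_pvalue, alpha=0.05):
    """Return the comparison strategy for one socio-psychological factor.

    normality_pvalues: Lilliefors p-values, one per group (PSD, Non-PSD).
        Testing per group is an ASSUMPTION; the text does not specify it.
    homogeneity_pvalue: Bartlett p-value for equal variances across groups.
    A p-value >= alpha means the assumption is not rejected.
    """
    normal = all(p >= alpha for p in normality_pvalues)
    equal_variances = homogeneity_pvalue >= alpha
    if normal and equal_variances:
        return "MANOVA followed by one-way ANOVA"
    return "Kruskal-Wallis test"

# Hypothetical p-values, for illustration only.
print(choose_group_comparison([0.31, 0.12], 0.45))  # MANOVA followed by one-way ANOVA
print(choose_group_comparison([0.01, 0.40], 0.45))  # Kruskal-Wallis test
```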
We used the following procedure to develop the risk prediction model of PSD. The patients were randomly divided into a training set (n = 393, 70% of the data) and a testing set (n = 169, 30% of the data) [37]. First, a multivariable backward logistic regression analysis was performed in the training set to identify the baseline variables that independently predicted the occurrence of depressive symptoms at 1 month post-stroke. Predictors with statistical significance (P < 0.05) were entered into the final logistic model using a backward procedure. To maintain the validity of selection in the previous steps, the ORs with 95% CIs of the factors were retained in the final model. Furthermore, goodness-of-fit of the model was measured using McFadden's pseudo-R² statistic, where values of 0.2-0.4 represent a very good fit [38,39].
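The arithmetic around this step can be sketched in plain Python: a seeded 70/30 random split (which yields exactly 393 and 169 patients for n = 562), the conversion of a logistic coefficient beta and its standard error into an OR with a Wald 95% CI, exp(beta ± 1.96 SE), and McFadden's pseudo-R², 1 - lnL(model)/lnL(null). The seed and the function names are illustrative assumptions, and the backward elimination itself is not reproduced here.

```python
import math
import random

def split_indices(n, train_fraction=0.7, seed=1):
    """Randomly partition row indices 0..n-1 into training and testing sets."""
    rng = random.Random(seed)          # arbitrary seed; the paper does not report one
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * train_fraction)  # int(562 * 0.7) == 393
    return idx[:n_train], idx[n_train:]

def odds_ratio_ci(beta, se, z=1.96):
    """Turn a logistic coefficient and its SE into (OR, lower, upper) on the odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R^2: 1 - lnL(fitted model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null

train, test = split_indices(562)
print(len(train), len(test))  # 393 169
```

Because the CI is symmetric on the log-odds scale, an OR near 1 (such as the LES estimate) can have a very narrow interval while still being statistically significant.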
Second, we quantified the predictive performance of the final logistic model. Predictive accuracy was calculated by 10-fold cross-validation in the testing set. In addition, predictive discrimination was estimated in the testing set using the area under the ROC curve (AUC), with a higher AUC indicating better discrimination.
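The evaluation metrics named here, together with the sensitivity and specificity reported in the Discussion, can be computed as in the Python sketch below. It computes the AUC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen PSD case receives a higher predicted risk than a randomly chosen non-case, and derives accuracy, sensitivity and specificity at an assumed 0.5 threshold; the 10-fold helper only partitions indices. The function names are our own, not those of pROC or the authors' code.

```python
def roc_auc(labels, scores):
    """Empirical AUC: P(score of a random case > score of a random non-case), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity when predicting PSD if score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k contiguous folds of near-equal size (shuffle first in practice)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, `roc_auc` returns 0.75 and all three threshold metrics equal 0.5. Unlike accuracy, the AUC does not depend on the chosen threshold.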
Third, to construct a clinical predictive model and generate predictive criteria, we converted the regression model into a tree model using the decision tree method in the training set. Decision tree analysis quantifies the associations among predictors [40,41], defines the most efficient pathway to a dichotomous ruling [42], and produces graphical outputs that summarize interactions in a visual, easily interpretable format [43,44]. We used the rpart package in R to build a classification tree that graphically depicts the quantitative relationships between predictors and PSD risk. The tree was pruned to its optimal size to minimize both classification error and tree complexity. The AUC and accuracy, calculated in the testing set, were used to evaluate the performance of the tree model. Of note, 3.1% of the data were missing. Missing values were replaced by multiple imputation to reduce bias and increase statistical power. This technique creates multiple copies of the data and replaces missing values with imputed values drawn as suitable random samples from their predicted distribution. We used the mice package in R.
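For classification, rpart grows a tree by searching, at each node, every predictor for the cut-off that most reduces node impurity (Gini impurity by default), and the tree is then pruned back. The Python sketch below shows only that split search for a single numeric predictor; the TAS-like scores and labels are invented toy data, not study data, and real implementations also handle categorical predictors, minimum node sizes and cost-complexity pruning.

```python
def gini(labels):
    """Gini impurity of a node holding 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Search splits of the form 'value >= t' on one numeric predictor.

    Returns (t, weighted child impurity) for the cut-off that minimises the
    size-weighted Gini impurity of the two child nodes; (None, inf) if no split exists.
    """
    n = len(labels)
    best_t, best_impurity = None, float("inf")
    for t in sorted(set(values))[1:]:  # every distinct value except the smallest
        low = [y for x, y in zip(values, labels) if x < t]
        high = [y for x, y in zip(values, labels) if x >= t]
        impurity = (len(low) * gini(low) + len(high) * gini(high)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity

# Invented TAS-like scores and PSD labels (toy data, not from the study).
tas = [40, 45, 50, 52, 55, 60]
psd = [0, 0, 0, 1, 1, 1]
print(best_threshold(tas, psd))  # (52, 0.0)
```

On the toy data the search recovers the cut-off at 52, where both child nodes are pure; applying the same search recursively to each child produces the deeper levels of a tree.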

Author contributions
Rui Liu and Yingying Yue designed the research and performed the data analysis and modeling. Haitang Jiang, Aiqin Wu, Deqin Geng, Jun Wang, Jianxin Lu, Shenghua Li, Hua Tang, Xuesong Lu and Kezhong Zhang completed the data collection. Rui Liu, Yingying Yue, Jian Lu, Tian Liu, Yonggui Yuan and Qiao Wang contributed to the writing. All authors reviewed the manuscript.