Data analysis and preventing risk of suicide

Detecting individuals who are at risk of suicide is a major clinical challenge. Having thoughts about suicide is not the same as intending to commit suicide. No two patients are alike in their genetics, symptoms, biological causes, or disorders, nor in how these factors correlate, so diagnosing patients is rarely easy. Health care professionals use a handbook, the Diagnostic and Statistical Manual of Mental Disorders (DSM). It is updated every few years to contain the latest research results and categorization of mental illnesses, and patients in most parts of the world are diagnosed according to it. Generally, when a patient arrives at the office, the doctor can ask whether he or she is having suicidal thoughts and assess the risk based on the answers.

A survey reported in "Risk factors for twelve-month suicide attempts in the National Comorbidity Survey Replication (NCS-R)" revealed that 31.9% of the patients who attempted suicide had disclosed their suicidal ideation (Borges et al., 2006). The survey covered a 12-month period and examined suicidal ideation, plans, and attempts in a subsample of 5,692 respondents. Individuals were assessed on many factors, such as socio-demographics, parental psychopathology, and prior suicidality. The reported result was that people who ideated about suicide were 31.9% more likely to commit suicide than those without a plan. This means that about 70% of the people who think about suicide do not actually go through with it. Many factors can lead to suicide, and it is very hard to determine them all.
The goal of the clinical study discussed here (Poulin et al., 2014) was to develop a suicide risk classification tool that uses clinical notes to predict suicide risk in patients. The model was built on a case-control study that compared the medical records of a group of patients who committed suicide with those of a group of patients who did not.
Data for the group who committed suicide in 2009 was collected from 100 people in the U.S. Veterans Health Administration (VHA) National Suicide Registry. Clinical notes were collected for the last 365 days before each person's suicide. The two other groups, made up of people who did not commit suicide, were created to match the suicide group by age, sex, hospital record, and patient disability status. Based on the patients' records, three groups were then defined:
• Group 1: patients who did not use mental health services
• Group 2: patients who committed suicide
• Group 3: hospitalized patients who did not commit suicide
Electronic medical records were used; they contained information about the patients' treatments. Notes discussing psychological state, depression, and alcoholism were present for all three groups.

The dataset for the groups looked like this:

Group 1: 1,913 notes (27 notes per patient)

Group 2: 4,243 notes (61 notes per patient)

Group 3: 5,388 notes (77 notes per patient)

The statistical model:

The collected records were analyzed and predictive models were built using a machine-learning system. The free-text records were first converted into datasets of words or word phrases, that is, numerical counts of how often a given word or phrase appeared in a patient record. The derived models then identified the combinations of words that were associated with suicide. The first prediction task was to determine which group a patient fell into.
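
To make the conversion step concrete, here is a minimal sketch of how a free-text note could be turned into counts of words and short word phrases. It is an illustration only: the function name `bag_of_words`, the `max_phrase_len` parameter, the simple tokenizer, and the example note are all assumptions, not the study's actual pipeline.

```python
import re
from collections import Counter


def bag_of_words(note: str, max_phrase_len: int = 2) -> Counter:
    """Count words and short word phrases (n-grams) in one free-text note.

    Hypothetical helper: the tokenizer and phrase length are assumptions.
    """
    # Lowercase the note and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", note.lower())
    counts = Counter()
    # Count every phrase of 1..max_phrase_len consecutive tokens.
    for n in range(1, max_phrase_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented note text, for illustration only.
example = bag_of_words("Patient reports agitation. Agitation worse at night.")
```

Each patient record then becomes a vector of such counts, which is the kind of numerical input a classifier can work with.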

Then another model was built from the word and word-phrase datasets to capture how frequently words appeared in a patient's medical report. The dataset was then reduced to several thousand words judged to be significant for predicting the outcome. The reduction was achieved by computing the mutual information (a measure of the dependence between variables) and selecting the words with the highest mutual information. Running the algorithms on this reduced dataset produced 500 models.
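
The word-selection idea can be sketched as follows, assuming each record is reduced to the set of words it contains and each record carries a group label. The helper names `mutual_information` and `top_words` are hypothetical, and the study's own implementation may differ in how it counts words and estimates probabilities.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def top_words(docs, labels, k):
    """Return the k words whose presence shares the most information with the labels.

    docs is a list of sets of words, one set per record (an assumed format).
    """
    vocab = set().union(*docs)
    scores = {w: mutual_information([w in d for d in docs], labels) for w in vocab}
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

A word that appears in every record of one group and in none of the others scores highest, while a word spread evenly across groups scores zero and would be dropped.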

Then a 3-bin classification scheme was used that allowed clinicians to screen patients and continuously reevaluate the risk among psychiatric patients. The 3-bin classifier marked whether a patient belonged to group 1, 2, or 3. Groups 1 and 3 were then combined, which increased the size of the dataset for patients who did not commit suicide and improved the accuracy of the test results.
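
As a rough illustration of the combining step, the sketch below maps the three groups onto two classes and adds up the note counts listed earlier in this post. The class names "suicide" and "control" and the function `to_binary_label` are assumptions made for the example; the study may have combined the groups differently in detail.

```python
from collections import Counter

# Note counts per group, as listed in this post.
NOTES_PER_GROUP = {1: 1913, 2: 4243, 3: 5388}


def to_binary_label(group: int) -> str:
    """Map the three study groups onto two classes (hypothetical label names)."""
    if group not in NOTES_PER_GROUP:
        raise ValueError(f"unknown group: {group}")
    # Group 2 is the suicide group; groups 1 and 3 are merged.
    return "suicide" if group == 2 else "control"


binary_counts = Counter()
for group, n_notes in NOTES_PER_GROUP.items():
    binary_counts[to_binary_label(group)] += n_notes
```

Merging groups 1 and 3 gives the classifier more notes from patients who did not commit suicide than either group provides on its own.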

The models were assessed and validated.

The final stage was to determine the predictive terms for each group and assign the results to the groups.

Group 2: suicidal group

Group 1: patients who did not use mental health services

Group 3: hospitalized patients who did not commit suicide
Finally, the analysis distinguished the groups by the words associated with them. The most common words associated with Group 2, the suicidal group, were "agitation", "frightened", and "delusions". Many medical conditions, such as gastrointestinal, cardiopulmonary, oncologic, and pain conditions, were also connected with an increased risk of suicide, but they were not included in suicide risk assessment tools.
The study was based on 210 individuals and still needs further research on larger datasets before clinical testing.

Poulin, C., Shiner, B., Thompson, P., Vepstas, L., Young-Xu, Y., Goertzel, B., … McAllister, T. (2014). Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes. PLoS ONE, 9(1), e85733.

Borges, G., Angst, J., Nock, M. K., Ruscio, A. M., Walters, E. E., & Kessler, R. C. (2006). Risk factors for twelve-month suicide attempts in the National Comorbidity Survey Replication (NCS-R). Psychological Medicine, 36(12), 1747–1757.
