Leveraging unstructured electronic medical record notes to derive population-specific suicide risk models

Maxwell Levis; Joshua Levy; Vincent Dufort; Glenn T Gobbel; Bradley V Watts; Brian Shiner

doi:10.1016/j.psychres.2022.114703

Psychiatry Res. 2022 Sep:315:114703. doi: 10.1016/j.psychres.2022.114703. Epub 2022 Jul 1.

Authors

Maxwell Levis¹, Joshua Levy², Vincent Dufort³, Glenn T Gobbel⁴, Bradley V Watts⁵, Brian Shiner⁶

Affiliations

¹ VAMC White River Junction, 163 Veterans Dr., White River Junction VT, 05009 United States; Department of Psychiatry, Geisel School of Medicine, 1 Rope Ferry Rd, Hanover NH, 03755 United States.
² Departments of Pathology and Laboratory Medicine, Geisel School of Medicine, 1 Rope Ferry Rd, Hanover NH, 03755 United States.
³ VAMC White River Junction, 163 Veterans Dr., White River Junction VT, 05009 United States.
⁴ Department of Biomedical Informatics, 2201 West End Ave, Nashville TN, 37235 United States.
⁵ VAMC White River Junction, 163 Veterans Dr., White River Junction VT, 05009 United States; Department of Psychiatry, Geisel School of Medicine, 1 Rope Ferry Rd, Hanover NH, 03755 United States; VA Office of Systems Redesign and Improvement, 215 North Main Street, White River Junction VT, 05009, United States.
⁶ VAMC White River Junction, 163 Veterans Dr., White River Junction VT, 05009 United States; Department of Psychiatry, Geisel School of Medicine, 1 Rope Ferry Rd, Hanover NH, 03755 United States; National Center for PTSD, White River Junction, VT, United States.

PMID: 35841702
DOI: 10.1016/j.psychres.2022.114703

Abstract

Electronic medical record (EMR)-based suicide risk prediction methods typically rely on analysis of structured variables such as demographics, visit history, and prescription data. Leveraging unstructured EMR notes may improve predictive accuracy by allowing access to nuanced clinical information. We utilized natural language processing (NLP) to analyze a large EMR note corpus and develop a data-driven suicide risk prediction model. We developed a matched case-control sample of U.S. Department of Veterans Affairs (VA) patients in 2015 and 2016. We randomly matched each case (all patients who died by suicide in that interval, n = 5029) with five controls (patients who remained alive). We processed the note corpus using NLP methods and applied machine-learning classification algorithms to the output. We calculated area under the curve (AUC) and risk tiers to determine predictive accuracy. NLP-derived models demonstrated strong predictive accuracy. Patients who scored within the top 10% of the risk model accounted for up to 29% of suicide decedents. The NLP-derived model compares favorably with other leading prediction methods. Our approach is highly implementable, requiring only access to text data and open-source software. Additional studies should evaluate ensemble models incorporating NLP-derived information alongside more typical structured variables.
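
The two evaluation measures named in the abstract, AUC and risk-tier capture, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy scores, and the 12-patient sample are hypothetical and chosen only to mirror the 1:5 case-control ratio. The AUC here is the rank-based (Mann-Whitney) form, which may differ from the exact procedure used in the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a randomly chosen case scores
    higher than a randomly chosen control (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def top_tier_capture(scores: Sequence[float], labels: Sequence[int],
                     fraction: float = 0.10) -> float:
    """Share of all cases found among the highest-scoring `fraction`
    of patients (the 'top 10% risk tier' when fraction=0.10)."""
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("need at least one case")
    tier_size = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(y for _, y in ranked[:tier_size])
    return captured / total_cases


# Hypothetical toy data: 2 cases (label 1) and 10 controls (label 0),
# mirroring the 1:5 matching ratio. Scores are invented for illustration.
toy_scores = [0.9, 0.4, 0.8, 0.7, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("AUC:", auc(toy_scores, toy_labels))
    print("Top-10% capture:", top_tier_capture(toy_scores, toy_labels, 0.10))
```

Because a matched case-control sample fixes the case share by design (one in six patients here), tier-capture figures computed on such a sample describe ranking quality within that sample, not risk in the full patient population.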

Keywords: Electronic medical records; Natural language processing; Suicide prediction; Suicide prevention.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Electronic Health Records*
Humans
Natural Language Processing
Risk Factors
Suicide*