Development and Validation of an Electronic Health Record-Based Machine Learning Model to Estimate Delirium Risk in Newly Hospitalized Patients


Andrew Wong, BA; Albert T Young, BA; April S Liang, BSE; Ralph Gonzales, MD, MSPH; Vanja Douglas, MD; Dexter Hadley, MD, PhD


Delirium is a highly prevalent state of acute confusion that broadly affects hospitalized patients worldwide. Current clinical methods for identifying hospitalized patients at increased risk of delirium require staff-administered screening tools with moderate accuracy. As part of the UCSF Delirium Reduction Campaign, we introduce an automated machine learning model that predicts incident delirium risk in hospitalized patients from electronic health data available on admission and that vastly outperforms current clinical standards.


We conducted a retrospective cohort study evaluating five machine learning algorithms to predict delirium using over 200 clinical variables from 29,359 unique hospital admissions. Variables were identified by an expert panel as relevant to delirium prediction and available in the electronic health record (EHR) within 24 hours of admission. Categories included patient demographics, diagnoses, nursing records, laboratory results, and medications available in the EHR during hospitalization. Delirium was defined as a positive Nursing Delirium Screening Scale (Nu-DESC) or Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) score; these screens were conducted on all patients at UCSF by nurses at every change of shift. Models were assessed using the area under the receiver operating characteristic curve (AUC) and compared against AWOL, a validated delirium risk assessment tool routinely administered in this cohort.
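As an illustration of the AUC metric used above, the following minimal Python sketch computes it from its pairwise definition: the probability that a randomly chosen delirium-positive admission receives a higher risk score than a randomly chosen delirium-negative one. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the study data or code.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive delirium screen, 0 = negative.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.30, 0.50, 0.90]
toy_auc = auc(toy_labels, toy_scores)
print(toy_auc)  # 0.8125 (13 of 16 positive-negative pairs are ranked correctly)
```

The pairwise loop is quadratic in cohort size, so it is suitable only for illustration; a rank-based implementation would be preferable for a cohort of tens of thousands of admissions.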


All five machine learning models outperformed AWOL by a significant margin. The gradient boosting machine was the best-performing model, with an AUC of 0.855. With specificity set at 90%, the model had a sensitivity of 59.7% (95% CI: 52.4-66.7%), a positive predictive value of 23.1% (95% CI: 20.5-25.9%), a negative predictive value of 97.8% (95% CI: 97.4-98.1%), and a number needed to screen of 4.8. Penalized logistic regression and random forest also performed well, with AUCs of 0.854 and 0.848, respectively. In comparison, AWOL achieved a baseline AUC of 0.678. To our knowledge, these models outperform all available clinical gold standards.
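The operating-point metrics reported above can be illustrated with a short Python sketch: choose the lowest score threshold that reaches the target specificity, then read sensitivity, positive predictive value, and negative predictive value off the resulting confusion matrix. This is a sketch under stated assumptions, not the study's procedure; the function name `metrics_at_specificity`, the thresholding rule (flag an admission when its score exceeds the threshold), and the toy data are all hypothetical.

```python
def metrics_at_specificity(labels, scores, target_spec=0.90):
    """Confusion-matrix metrics at the lowest threshold reaching target_spec."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Scan candidate thresholds in ascending order; an admission is flagged
    # when its score is strictly greater than the threshold. The largest
    # score always yields specificity 1.0, so the loop is guaranteed to stop.
    for threshold in sorted(set(scores)):
        tn = sum(1 for s in neg if s <= threshold)
        if tn / len(neg) >= target_spec:
            break

    fp = len(neg) - tn
    tp = sum(1 for s in pos if s > threshold)
    fn = len(pos) - tp
    return {
        "threshold": threshold,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy cohort: 10 delirium-negative and 4 delirium-positive admissions.
toy_labels = [0] * 10 + [1] * 4
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.70,
              0.30, 0.60, 0.80, 0.90]
toy_metrics = metrics_at_specificity(toy_labels, toy_scores, target_spec=0.90)
print(toy_metrics)
```

On the toy cohort the threshold lands at 0.45, giving 3 true positives, 1 false positive, 1 false negative, and 9 true negatives. Because positive and negative predictive values depend on outcome prevalence, values computed on a toy sample like this one say nothing about the figures reported for the study cohort.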


Machine learning can be used to accurately estimate hospital-acquired delirium risk using EHR data available within 24 hours of hospital admission. An automated prediction algorithm has the potential to provide more effective care at lower cost with decreased provider burden. We are currently working to integrate this algorithm into the EHR for provider use at our institution, and it is estimated to save over $30,000 and 100,000 staff work hours per year when successfully implemented.


Inouye SK, Westendorp RGJ, Saczynski JS. Delirium in elderly people. Lancet. 2014;383(9920):911-922. doi:10.1016/S0140-6736(13)60688-1.
Ely EW, Shintani A, Truman B, et al. Delirium as a predictor of mortality in mechanically ventilated patients in the intensive care unit. JAMA. 2004;291(14):1753. doi:10.1001/jama.291.14.1753.
Douglas VC, Hessler CS, Dhaliwal G, et al. The AWOL tool: derivation and validation of a delirium prediction rule. J Hosp Med. 2013;8(9):493-499. doi:10.1002/jhm.2062.

Back to the January 2021 issue of ACP IMpact