This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.i-jmr.org/, as well as this copyright and license information must be included.
An easily accessible real-time Web-based utility to assess patient risks of future emergency department (ED) visits can help the health care provider guide the allocation of resources to better manage higher-risk patient populations and thereby reduce unnecessary use of EDs.
Our main objective was to develop a Health Information Exchange-based, next 6-month ED risk surveillance system in the state of Maine.
Data on electronic medical record (EMR) encounters integrated by HealthInfoNet (HIN), Maine’s Health Information Exchange, were used to develop the Web-based surveillance system for predicting population risk of ED visits in the next 6 months. For modeling, a retrospective cohort of 829,641 patients with comprehensive clinical histories from January 1 to December 31, 2012, was used for training, and the model was then tested with a prospective cohort of 875,979 patients from July 1, 2012, to June 30, 2013.
The multivariate statistical analysis identified 101 variables predictive of the defined future 6-month risk of an ED visit: 4 age groups, history of 8 different encounter types, history of 17 primary and 8 secondary diagnoses, 8 specific chronic diseases, 28 laboratory test results, history of 3 radiographic tests, and history of 25 outpatient prescription medications. The c-statistics for the retrospective and prospective cohorts were 0.739 and 0.732, respectively. Integration of our method into the HIN secure statewide data system in real time prospectively validated its performance. Cluster analysis in both the retrospective and prospective analyses revealed discrete subpopulations of high-risk patients, grouped around multiple “anchoring” demographics and chronic conditions. With the Web-based population risk-monitoring enterprise dashboards, the effectiveness of the active case finding algorithm has been validated by clinicians and caregivers in Maine.
The active case finding model and associated real-time Web-based app were designed to track the evolving nature of total population risk, in a longitudinal manner, for ED visits across all payers, all diseases, and all age groups. Therefore, providers can implement targeted care management strategies to the patient subgroups with similar patterns of clinical histories, driving the delivery of more efficient and effective health care interventions. To the best of our knowledge, this prospectively validated EMR-based, Web-based tool is the first one to allow real-time total population risk assessment for statewide ED visits.
The use of emergency department (ED) services in the United States is growing at an alarming rate [
Improving appropriate use of emergency services is an important strategy for improving health outcomes and controlling health care expenditures [
We previously developed predictive analytics of patient risk of a 30-day return to the emergency department [
In this paper, we describe our findings for ED visit risk modeling for the statewide population at large, whether or not patients have had a previous emergency room visit. This is the first effort to model total population ED risk across all payers, all diseases, and all age groups. Our effort applies statistical learning to the longitudinal patterns in all patient data contained in Maine’s statewide HIE to identify risk factors that strongly influence the probability of a future 6-month ED visit.
Although the two metrics (ie, risks of the 30-day ED revisit [
We hypothesized that real-time assessment of population ED risk, tracking and trending risk over time, can allow health managers to continuously assess and intervene on both high-risk and rising-risk patients. To enable visualization and exploration of the total population risks of over one million patients in the state of Maine, Web-based apps were designed to connect to, aggregate, and centrally integrate data in real time and to compute future 6-month ED risks for population health management.
Integrating predictive analytics into workflows of proactive population health management and hospital quality improvement; emergency department (ED) visit risk determination and proactive interventions guided by ED visit risk or ED readmission risk measures.
This work was done under a business/product development arrangement between HealthInfoNet (HIN) and HBI Solutions, Inc., and the data use was governed by a business agreement between HIN and HBI. No patient health information was released for the purpose of research and no patient consent was required. We completed the system development that was the foundation for our agreement and then reported on the findings resulting from applying this model to the real-time Web-based services that HIN is now deploying in the field. Because this study analyzed de-identified data to develop the ED risk model, the Stanford University Institutional Review Board considered it exempt (October 16, 2014).
The objective was to study total population risk for ED visits across all payers, all diseases, and all age groups. Patients visiting any HIN-connected facility from January 1, 2012, through December 31, 2013, were eligible for the study. Patients who died during the study time frame of 2012 and 2013, as identified through an encounter disposition code, were excluded. ED visits resulting from a transfer from another ED were excluded because the two were treated as a single ED visit rather than multiple visits.
We constructed an enterprise data warehouse consisting of all of Maine’s HIE aggregated patient histories. Incorporated data elements from EMR encounters included patient demographic information, laboratory tests and results, radiographic procedures, medication prescriptions, diagnosis, and procedures, which were coded according to the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). Census data from the US Department of Commerce Census Bureau were integrated into our data warehouse. Therefore, in addition to the HIN features, we categorized patients by socioeconomic status using residence zip codes as an approximation to the average household mean and median family income and average degree of educational attainment.
Maine HIE patient clinical histories were organized as hospital episode level relational database tables. We processed the database at patient level based on medical record number for population analysis within 36 facilities in Maine. A pivot table was developed from our enterprise data warehouse, which aggregated and integrated normalized clinical features (n=33,403) of different data categories, for example, primary diagnosis/procedure, secondary diagnosis/procedure, laboratory test result, radiology result, and outpatient prescription, from different relational EMR databases. For qualitative and categorical parameters, dummy variables were created serving as numerical representations of the categories of nominal or ordinal variables. To efficiently eliminate the least representative features, we exploited the data variance as the simplest criterion [
A “time-to-event” curve of ED visits (
The basic principle of our model was to use a patient’s information from the prior year to predict whether that patient would have any ED visit in the next 6 months. The statistical learning to forecast future 6-month ED visit risk consisted of two phases: retrospective modeling and prospective validation (
Study design to develop the active case finding algorithm to predict future 6-month emergency department visit risks.
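The prediction setup described above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: the function names `add_months` and `label_patient`, the half-open window boundaries, and the use of ED visit dates as the only input are all assumptions. It counts a patient's ED visits in the 12 months before an assessment date and flags whether any ED visit falls in the following 6 months.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months.

    Assumes d.day exists in the target month (for example, day 1).
    """
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def label_patient(ed_visit_dates, assessment_date):
    """Return (prior-year ED visit count, any ED visit in the next 6 months).

    Windows are half-open (an assumption):
      history: [assessment_date - 12 months, assessment_date)
      outcome: [assessment_date, assessment_date + 6 months)
    """
    history_start = add_months(assessment_date, -12)
    outcome_end = add_months(assessment_date, 6)
    prior_visits = sum(
        history_start <= v < assessment_date for v in ed_visit_dates
    )
    future_visit = any(
        assessment_date <= v < outcome_end for v in ed_visit_dates
    )
    return prior_visits, future_visit


# Hypothetical patient with four ED visits.
visits = [date(2012, 3, 5), date(2012, 11, 20),
          date(2013, 2, 14), date(2013, 9, 1)]
prior, future = label_patient(visits, date(2013, 1, 1))
```

With an assessment date of January 1, 2013, the history window in this sketch spans calendar year 2012 and the outcome window runs through June 30, 2013.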
The goal of this study was to develop an active case finding algorithm with a statewide future 6-month ED visit risk measure. The measure comprised a single summary score, derived from the results of a “forest” of the most discriminative decision trees upon 1 year of the encounter history. The measure calculated each subject’s probability of a future 6-month ED visit. The retrospective modeling phase consisted of three steps: (1) training, (2) calibrating, and (3) blind testing. We applied a selective cohort division process while trying to result in a random cohort. The samples in the retrospective cohort were divided into six subgroups based on histories of chronic diseases, historical ED visits, and current primary diagnoses (
A “survival forest” of forecasting decision trees was developed using the prior year clinical history and was ranked according to the corresponding posterior probability. To introduce the prior knowledge, we grouped the clinical features into two groups: empirical features found by exploratory data analysis and the learned features discovered during the model training. Our exploratory analysis (
Cohort II was used to calibrate the predictive scoring threshold to create a risk measure for each individual sample. The model developed with Cohort I was applied to each sample in Cohort II, and the derived predictive scores were ranked. We then applied a mathematical function mapping the ranked predictive scores to positive predictive values (PPVs).
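One way to picture this calibration step is as an empirical lookup from predictive score to PPV. The text does not specify the mapping function, so the sketch below is an assumption rather than the authors' method: for each distinct score in a calibration cohort it computes the PPV among all samples scoring at or above that value. The names `ppv_by_threshold` and `score_to_ppv` and the toy data are hypothetical.

```python
def ppv_by_threshold(scores, outcomes):
    """Empirical PPV of the group 'score >= s' for each distinct score s.

    scores:   predictive scores from the calibration cohort
    outcomes: 1 if the patient had a future ED visit, else 0
    Returns (thresholds, ppvs), with thresholds in descending order.
    """
    pairs = sorted(zip(scores, outcomes), key=lambda p: -p[0])
    thresholds, ppvs = [], []
    true_pos = flagged = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every sample tied at this score before recording the PPV.
        while i < len(pairs) and pairs[i][0] == s:
            true_pos += pairs[i][1]
            flagged += 1
            i += 1
        thresholds.append(s)
        ppvs.append(true_pos / flagged)
    return thresholds, ppvs


def score_to_ppv(score, thresholds, ppvs):
    """Map a new score to the PPV of the nearest calibrated threshold at or below it."""
    for t, p in zip(thresholds, ppvs):
        if t <= score:
            return p
    # Below every calibrated score: fall back to the overall event rate.
    return ppvs[-1]


# Toy calibration data (illustrative, not from the study).
cal_scores = [0.9, 0.8, 0.8, 0.4, 0.2]
cal_outcomes = [1, 1, 0, 0, 0]
thresholds, ppvs = ppv_by_threshold(cal_scores, cal_outcomes)
```

Multiplying the looked-up PPV by 100 would give a value on a 0 to 100 scale, which is one possible reading of how a PPV-based risk index could be derived.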
Our active case finding algorithm was set to segregate the population into subgroups with different levels of future 6-month ED risks. The risk measure was defined as an index between 0 and 100 so that the people with measures larger than or equal to a risk index
We obtained two thresholds,
In our implementation, the objective was to select the least number of representative features predictive of future 6-month ED risk and to achieve optimal case finding sensitivity while maintaining the targeted PPV (>70%) based on selected features (
Cohort III was an independent naive sample set, which was compiled to blind test the active case finding method’s performance. The aim of this step was to critically assess the utility of the risk measure before statewide prospective validation in Maine. Again, the model developed with Cohort I was applied to every sample in Cohort III to derive and rank the predictive scores and to calculate the receiver operating characteristic (ROC) area under the curve (AUC) score.
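For readers unfamiliar with the metric, the ROC AUC (equivalently, the c-statistic) equals the probability that a randomly chosen patient who had a future ED visit receives a higher score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy data are hypothetical, and the quadratic pairwise loop is chosen for clarity, not for cohorts of the size used in this study.

```python
def roc_auc(scores, outcomes):
    """ROC AUC via pairwise comparison of positive and negative samples.

    scores:   predictive scores
    outcomes: 1 for patients with a future ED visit, 0 otherwise
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy blind-test data (illustrative, not from the study).
auc = roc_auc([0.9, 0.8, 0.8, 0.4, 0.2], [1, 1, 0, 0, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.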
The clinical application of the future 6-month risk measure was deployed for prospective validation on the HIE data in Maine. The cohort of 875,979 patients from July 1, 2012 to June 30, 2013 was prospectively profiled to calculate future 6-month ED visit risk measures using the clinical applications deployed at HIN. The ROC [
We used principal component analysis [
The active case finding model and associated real-time Web-based app were designed to track the evolving nature of total population risk, in a longitudinal manner, for ED visits across all payers, all diseases, and all age groups. Patient historical datasets are linked and stored in a patient-level database in our system. The ED predictive algorithm is applied to each individual’s ED discriminating feature data to risk-stratify patients with our prospectively validated model. Individual data are then aggregated for population exploration of ED risks, and the aggregated results are stored in the population-level database. Our dashboard allows the visualization of population ED risks at high geographical resolution for a defined population, for example, the population of Maine.
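The patient-level to population-level aggregation can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed system: the cut points passed as `medium_cut` and `high_cut`, the use of a zip code as the geographic key, and the function names `risk_band` and `aggregate_by_region` are all hypothetical.

```python
from collections import Counter, defaultdict


def risk_band(risk_index, medium_cut, high_cut):
    """Assign a 0-100 risk index to a low, medium, or high band."""
    if risk_index >= high_cut:
        return "high"
    if risk_index >= medium_cut:
        return "medium"
    return "low"


def aggregate_by_region(patients, medium_cut, high_cut):
    """Count patients per risk band within each region.

    patients: iterable of (region, risk_index) pairs
    Returns {region: {band: count}}.
    """
    summary = defaultdict(Counter)
    for region, risk_index in patients:
        summary[region][risk_band(risk_index, medium_cut, high_cut)] += 1
    return {region: dict(counts) for region, counts in summary.items()}


# Illustrative patient-level records: (zip code, risk index).
patients = [("04101", 85), ("04101", 10), ("04401", 45), ("04101", 90)]
# The cut points 30 and 70 are placeholders, not the study's thresholds.
population = aggregate_by_region(patients, medium_cut=30, high_cut=70)
```

A dashboard could then render the per-region counts, for example as a map layer keyed by region.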
The active case finding algorithm produced a risk score (from 0 to 100) for each patient at the time of risk assessment of the future ED visit. In general, our algorithm performed well: the ROC AUCs of the risk score for determining a patient’s risk of future 6-month ED utilization were 0.739 and 0.732 in retrospective blind testing and prospective validation, respectively (
Active case finding algorithm effectively identified different risk group patients for future 6-month emergency department (ED) utilization (upper panel shows X axis: different ED risk groups; Y axis: active case finding positive predictive value (PPV); and lower panel summarizes average ED uses at different ED risks in the future 6 months in both retrospective and prospective analyses).
In developing the algorithm, we aimed to help potential care providers assess the “opportunity case” (high-cost, high degree of utilization of services, multiple chronic conditions) for various risk score thresholds and for different assumptions about the impact of the intervention. The active case finding algorithm was capable of stratifying patients across a wide range of risks (
Active case finding algorithm effectively risk-stratified the prospective patient cohort for future 6-month emergency department (ED) visit (graphic representation of low, medium, and high risk patients’ time to next impending ED visit).
Our principal component analysis retrospectively identified (
Our ED predictive analytics were integrated into the Maine State HIE system (
Schematic demonstration of data flow and communications of a population emergency department (ED) risk exploration system that allows real-time assessment of population ED risk.
Total population emergency department (ED) risk monitoring dashboard.
We hypothesized that an individual patient’s future 6-month ED risk can be forecast from the statistical learning of a population’s comprehensive longitudinal clinical histories. Our use of the population-based HIE facilitated the development and prospective testing of the case finding algorithm presented here, which is a population-based rather than an event-triggered (ED visit) analytic. After the future ED visit risk scores are calculated for the total population, this information is made available to clinicians and caregivers at the point of care to support both individual patient and population-based decision making. Using adjustable risk settings allows multiple patient cohorts of different impending ED risks to be constructed. Moreover, high-risk patients with similar longitudinal clinical patterns can be subgrouped for targeted intervention in real time. Accurate identification of patient populations at high risk for ED visits is integral to addressing specific gaps in health care coverage, instituting primary care-based interventions, and avoiding preventable ED visits. Such active case finding may help providers deliver more efficient and effective health care interventions.
Designed to be used in real time by population panel managers to forecast a future ED visit, our EMR-based active case finding method was prospectively validated with a reasonable level of sensitivity and specificity. To the best of our knowledge, our EMR-based population ED risk study is the first of such scale for ED trending across all payers, all diseases, and all age groups. Our study’s obvious strength is that the predictive analytics cover the population of an entire US state. Its weakness is that patient characteristics unique to the state of Maine may limit our model’s general applicability to other state populations.
Data limitations, for example, missing data, inaccurate diagnostic/procedure coding, and the unreliable tracking method to identify patients who die, may result in false negative and false positive case calls. Additionally, new patients who lack encounter histories tended to be categorized as low risk for future ED visits, which likely underestimates the ED risk for these subjects. We speculate that using additional currently non-reported features, including self-rated health conditions, lifestyle-related factors, and socioeconomic status, may enhance the analytical approach to ED visit risk assessment.
Beyond identifying at-risk populations for potentially preventive services, gaining a deeper understanding of both the unique and common attributes of various subgroups may further facilitate overall management and the prevention of unwanted ED utilization. Moreover, to be clinically useful, a case finding model should be iterative and facilitate exploration of the business case, that is, the potential benefit (PPV) or burden (false positive rate), of managing subpopulations of high-risk patients. Accordingly, we sought to determine whether unique patterns of resource utilization or clusters of patient subpopulations existed within the considerable heterogeneity of the high-risk patient population. We demonstrated that among the high-risk patients, the associated demographics, chronic conditions, and varying patterns of resource consumption do not occur in isolation. Cluster analysis revealed six clinically relevant subgroups among the high-risk patient population that were confirmed as durable upon prospective testing. These subgroups have unique patterns of demographics, disease severities, comorbidities, and resource consumption, suggesting new opportunities to provide stratified care management among these groups. For example, Cluster #6 comprised senior patients with co-occurring histories of the most diverse chronic conditions and was linked to the highest utilization of clinical tests and prescriptions. In addition, this group of patients is at considerable risk of experiencing poor health outcomes, including, but not limited to, lower quality of life, diminished functional status, and excess morbidity and mortality. This distinctive cluster could be targeted with new, enhanced care management strategies. We noted a decreased prevalence of the co-occurring chronic conditions in four other cluster groups of relatively younger adults with much less resource consumption.
Within these four clusters, females aged 19-49 years without any chronic disease may benefit from targeted care to keep them out of the emergency room, although more analysis is needed to understand the risk drivers within this group. Currently, many existing care management strategies are directed toward single conditions and are event-triggered, for example, ED visit or hospital discharge. The current active case finding model provides novel opportunities to experiment with new strategies of coordinated care targeting a combination of conditions across different age and demographic groups that we speculate may lead to greater case management efficacy.
While the clusters identified in the study represent clinically similar populations that could guide specific care management strategies, we understand that the missing information (eg, mental health and substance abuse diagnostic information) may mask important characteristics of these clusters. Past studies have shown that mental health diagnoses are frequent within the ED patient population [
With our ED risk model, tactics for modifying care management programs can be driven and measured against the analytical risk assessment derived from the HIE records. HIEs are a valuable data resource, providing longitudinal and comprehensive patient data. HIEs, such as HIN, that have completed the necessary rigorous mapping of multiple providers’ data to standard nomenclature including LOINC [
Our study is the first study of total population risk for ED visits across all payers, all diseases, and all age groups. Applying analytical tools on EMR and HIE data, including the active case finding model, the high-risk patient clustering method, and the Web-based real-time ED risk profiling analysis and exploration, will help health care providers effectively leverage their EMR to better understand ED service delivery while providing opportunities for improved health care delivery for the patients. A great strength of this work is the use of data from an entire state HIE, including data from across the entire spectrum of the health care system. This is not just hospital or emergency department data because it includes outpatient clinics and physician practices. In that regard, our work should serve as a model of what other states can do with HIE data to really impact patient care and population health.
Electronic medical record (EMR) features used to develop active case finding model.
Emergency department (ED) admission “time-to-event curve” showed pattern of rapid accrual with stable and consistent ED visit rate thereafter. Population ED visit curves, of patients with more than one or any ED visit, stabilized within 6 months from evaluation time, indicating a 6-month cutoff is clinically reasonable. Assessment date: January 1, 2013.
Study cohort construction, and inclusion/ exclusion criteria; retrospective/ prospective cohort construction.
Patient characteristics.
Exploratory data analysis: patient counts of the total set and of those having an emergency department (ED) revisit in the future 6 months, as a function of the number of chronic diagnoses (left panel) and ED visits in the past 12 months (right panel); the percentages of patients with ED revisits are also plotted.
Technical details of decision tree based modeling.
Feature selection and characterization of discriminant features in retrospective/prospective dataset.
The model performance was gauged by ROC analysis for retrospective blind testing and prospective validation, respectively.
Unsupervised clustering of high risk population using principal component analysis.
Active case finding algorithm effectively risk-stratified retrospective patient cohort for future 6-month emergency department (ED) visit: graphic representation of low-, medium-, and high-risk patients’ time to the next impending ED visit.
Unsupervised clustering of the high-risk patients identified consistent distinct subgroups in both retrospective (left panel) and prospective (right panel) cohorts.
Clustering of emergency department 6-month high-risk patients in the retrospective/prospective cohort according to demographics and prior year clinical histories.
ACO: accountable care organization
AUC: area under the curve
ED: emergency department
EMR: electronic medical record
HIE: health information exchange
HIN: HealthInfoNet
ICD-9-CM: International Classification of Diseases, 9th Revision, Clinical Modification
PPV: positive predictive value
ROC: receiver operating characteristic
We express our gratitude to the hospitals, medical practices, physicians, and nurses participating in Maine’s HIE. We also thank the biostatistics colleagues at the Department of Health Research and Policy, Stanford Pediatric Proteomics Group for critical discussions.
KGS, EW, and XBL are co-founders and equity holders of HBI Solutions, Inc., which is currently developing predictive analytics solutions in health care. From the Departments of Pediatrics, Surgery, and Statistics, Stanford University School of Medicine, Stanford, California, AYS, SH, ZL, YW, KGS, XBL conducted this research as part of a personal outside consulting arrangement with HBI Solutions, Inc. The research and research results are not, in any way, associated with Stanford University.