Discrimination By Artificial Intelligence In A Commercial Electronic Health Record—A Case Study – Health Affairs
When artificial intelligence (AI) is built into electronic health record (EHR) software, who is responsible for the consequences? Does responsibility lie exclusively with the hospital or health system that uses the vendor’s software, or is responsibility shared with the EHR vendor? What role should regulators, such as the Food and Drug Administration (FDA), have in ensuring that AI in health care is trustworthy and equitable?
AI has the potential to transform health care, but implementation at the point of care is fraught with challenges. Many health systems find it attractive to implement AI developed by EHR vendors that is integrated directly into existing EHR implementations (built-in AI). Epic Systems Corporation, a leading global developer of EHRs, has responded to this market enthusiasm with a growing set of predictive models. Here, we describe our experience implementing a predictive model developed by Epic and the ethical issues raised by its deployment.
A Built-In Prediction Tool For “No-Shows”
“No-shows”—the colloquial term for patients who fail to arrive for a scheduled appointment—are a major source of waste in US health care. Strategies to minimize no-shows include telephone reminders, text reminders, and overbooking, the latter of which appears to be the most effective in maximizing clinic slot use. AI can assist with targeted overbooking by predicting which patients are most likely to no-show, enabling practices to schedule an additional patient in that same time slot. Targeted overbooking, however, has the potential to harm individual patients: If both the originally scheduled patient and the overbooked patient do in fact arrive, provider time is stretched to accommodate two patients in a slot meant for one, potentially resulting in poorer quality care.
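In its simplest form, the targeted-overbooking logic described above amounts to flagging appointment slots whose predicted no-show probability exceeds a cutoff. The sketch below is purely illustrative; the threshold and slot identifiers are our own assumptions, not features of any vendor product.

```python
# Hypothetical sketch of targeted overbooking: flag slots whose predicted
# no-show probability exceeds a threshold as candidates for double-booking.
OVERBOOK_THRESHOLD = 0.7  # illustrative cutoff, not a vendor default


def overbook_candidates(predictions: dict) -> list:
    """predictions maps slot id -> predicted no-show probability."""
    return [slot for slot, p in predictions.items() if p >= OVERBOOK_THRESHOLD]


slots = {"09:00": 0.85, "09:20": 0.10, "09:40": 0.72}
candidates = overbook_candidates(slots)
# candidates: ["09:00", "09:40"]
```

Note that the choice of threshold embodies the trade-off discussed above: a lower cutoff fills more slots but raises the chance that two patients arrive for one appointment.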
As part of its cognitive computing platform, Epic released a proprietary built-in AI tool that displays a numerical estimate of the likelihood that a patient will no-show, requiring minimal implementation effort and offering seamless EHR integration. Inputs to the model include the patient’s personal information, clinical history, patterns of health care use (for example, prior no-shows), and features of the appointment (for example, day of the week).
Algorithms May Propagate Health Inequities, Explicitly And Implicitly
As we considered applying the no-show model at our institution, we identified several layers of potential bias that might harm vulnerable patient populations. First, the potential for explicit discrimination was obvious because the predictive model included personal characteristics such as ethnicity, financial class, religion, and body mass index that, if used for overbooking, could result in health care resources being systematically diverted from individuals who are already marginalized. We initially addressed this issue by rebuilding the model to exclude all personal information, leaving only prior patterns of health care use and features of the appointment, achieving the same performance as the vendor-built model. Nevertheless, we realized that our new model, even stripped of personal information, would not eliminate the potential to propagate societal inequity.
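The rebuild described above reduces, at the data level, to filtering sensitive personal characteristics out of the training inputs before refitting the model. The sketch below illustrates only that filtering step; the feature names are our own invention, not Epic's actual model inputs.

```python
# Hypothetical sketch of the exclusion step: strip sensitive personal
# characteristics, keeping only utilization and appointment features.
# Feature names are illustrative, not the vendor's actual inputs.
SENSITIVE_FEATURES = {"ethnicity", "financial_class", "religion", "body_mass_index"}


def strip_sensitive(features: dict) -> dict:
    """Return a copy of the feature record without sensitive characteristics."""
    return {k: v for k, v in features.items() if k not in SENSITIVE_FEATURES}


record = {
    "ethnicity": "X",
    "financial_class": "self-pay",
    "religion": "Y",
    "body_mass_index": 31.2,
    "prior_no_shows": 3,
    "appointment_weekday": "Mon",
    "lead_time_days": 21,
}

cleaned = strip_sensitive(record)
# cleaned retains only prior_no_shows, appointment_weekday, lead_time_days
```

As the next section argues, this kind of filtering is necessary but not sufficient: the retained features can still encode the removed ones.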
Removing sensitive personal characteristics from a model is an incomplete approach to eliminating bias. Prior no-shows—a variable included in both the vendor’s original model and our revised version—is likely to correlate with socioeconomic status, perhaps mediated by the inability to cover the costs of transportation or childcare, or the inability to take time away from work. Likewise, a patient with obesity who struggles with mobility may make it to their appointment only to find it overbooked, and their clinician thus overworked and distracted. The potential for these adverse consequences is further obscured by the “black box” nature of AI: In this model, as in many AI-generated models, the contributing variables are not visible to the end user. Clinic personnel may therefore not realize they are overbooking a patient primarily because that patient is obese.
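The proxy effect described above can be made concrete with a small synthetic simulation (the mechanism and magnitudes are our own assumptions, chosen only for illustration): even when a latent socioeconomic variable is excluded from the model, a retained feature such as prior no-shows still carries its signal.

```python
# Hypothetical simulation: a latent socioeconomic variable is excluded from
# the model, yet a retained proxy (prior no-shows) remains correlated with it.
# All distributions and coefficients are illustrative assumptions.
import random

random.seed(0)


def simulate_patient():
    # Latent socioeconomic hardship in [0, 1] -- NOT a model input.
    hardship = random.random()
    # No-show probability per appointment rises with hardship
    # (e.g., transportation, childcare, or work barriers).
    p_no_show = 0.1 + 0.5 * hardship
    prior_no_shows = sum(random.random() < p_no_show for _ in range(10))
    return hardship, prior_no_shows


data = [simulate_patient() for _ in range(5000)]


def pearson(xs, ys):
    """Pearson correlation coefficient, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


r = pearson([h for h, _ in data], [p for _, p in data])
# r is strongly positive: the "neutral" feature encodes the removed variable,
# so a model trained on prior no-shows still ranks disadvantaged patients
# as more likely to no-show.
```

Under these assumptions the correlation comes out around 0.7, which is the point: deleting the sensitive column does not delete the information.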
Health Equity Should Be A Driving Principle For How AI Is Used
If the predictive model were 100 percent accurate, scheduling another patient in a sure-to-be-empty slot would not be problematic, although the ethical imperative to care for the vulnerable patient remains. But no model is perfectly accurate, so we are left with the tension created by implementing a model that, while potentially improving the total use of available clinic time and personnel, may also preferentially divert resources away from patients in greatest need.
We are not arguing that demographic features should never be represented in a model, as they can be critical predictors of health and access to health care. The question is whether it is tolerable for demographic bias to be represented in a model, not just explicitly (as in the race/ethnicity input) but also implicitly (as in the prior no-show input), if that model may lead to action that negatively affects an individual patient. In the case of predicting no-shows, we believe that the overbooking intervention, which risks withdrawing resources from vulnerable patients, is ethically problematic. Yet, there are “patient-positive” interventions that may improve the likelihood a patient keeps their scheduled appointment, such as flexible appointment times, telehealth visits, or even assistance with transportation or childcare.
After reflection, we chose to implement only patient-positive interventions in response to the results of the no-show predictor. We piloted an intervention of targeted outreach in 12 clinics and saw a 9 percent mean reduction in no-show rates compared with the prior year. While targeted overbooking may have further improved scheduling outcomes for our clinics, we were not willing to jeopardize health equity by using Epic’s model or our revised model for that purpose.
A Call For Increasing Vendor Responsibility To Prevent Bias
Several recent publications have discussed potential biases associated with machine learning algorithms. Many concern generalizability to disadvantaged populations, including patterns of missing data, low minority group sample size, and disparities in care caused by implicit biases (as discussed by Milena A. Gianfrancesco and colleagues and by Alvin Rajkomar and colleagues). Ziad Obermeyer and colleagues recently demonstrated that the output of a widely used algorithm reflected underlying structural inequalities and called for attention to bias in the labeling of data. Our case study extends these concerns by highlighting the risk of AI-based interventions that are built directly into the EHR software, making them particularly easy to implement without careful consideration of the ramifications.
Accountability for ensuring that AI for health care is built, implemented, and used ethically has not been established. Software developers may have limited health care experience and not fully understand the implications of their models, including existing inequities that may implicitly inform or be propagated by their models. Health systems may not fully understand the workings and performance of AI methods, particularly as methods are increasingly deployed as “black boxes.” Given these divergent perspectives, vendors may consider it the health care institution’s responsibility to use AI responsibly, while health care organizations assume the vendor will provide only valid and ethical models. Because built-in AI is delivered by an already-trusted business partner and easily implemented, it may bypass the scrutiny and oversight that might otherwise be applied to a tool from a new industry partnership. Furthermore, built-in AI is not yet subject to governmental regulation. As implementation of AI in health care grows, it will be important to clarify who bears the responsibility for ethics and regulatory oversight.
The FDA recently released draft guidance for regulation of clinical decision support (CDS) software, which identifies the areas where the agency intends to focus regulatory oversight. The FDA’s focus is on “device CDS,” decision support in which the provider is unable to independently review the basis for the recommendation. The guidance contrasts these tools with “non-device CDS,” decision support in which providers can understand the basis for the recommendation. The FDA does not intend to regulate non-device CDS at this time. The implication is that “black box” models would be classified as device CDS, while some predictive models might receive non-device CDS status if they provided sufficient transparency into the basis for the recommendation. Among device CDS, the FDA intends to focus on algorithms that affect critical or serious health situations or conditions, as defined by the International Medical Device Regulators Forum, while exercising enforcement discretion for “non-serious” issues. A summary of the FDA’s regulatory policy for CDS software that is intended to inform clinical management for health care providers is shown in exhibit 1.
Exhibit 1: FDA regulatory framework for CDS software that informs clinical management for health care providers
Source: Adapted from the FDA draft guidance for regulation of clinical decision support (CDS) software. Note: The FDA does not intend to enforce compliance where regulation is characterized as “Enforcement Discretion” at this time.
While this evolving regulatory landscape represents progress, we remain concerned that many of these built-in tools will not be subject to appropriate evaluation. Some of the Epic-provided predictive models do explain which variables contribute to the predicted outcome, suggesting they are non-device CDS. Yet a deep understanding of statistics and epidemiology is needed to appreciate the residual bias and confounding that may persist in these models, which leaves us with the fundamental question of what it means to truly understand the basis for a model’s recommendations. The no-show model we described, by contrast, might qualify as device CDS yet fall into the “non-serious” category, where regulation is not a current priority for the FDA.
The American Medical Informatics Association recently raised concern over algorithm-driven bias in the absence of intended discrimination and recommended that the FDA develop guidance for how software developers might test their products for such biases. While EHR vendors have generally enjoyed a relatively unregulated landscape, the growing implementation of built-in algorithms should prompt reconsideration of how to ensure the trustworthiness of AI in driving clinical decision making. Because vendor-built predictive models embedded in EHR software can be used ubiquitously, with variable oversight by the health systems that implement them, we believe such models should be considered for regulatory oversight, regardless of attempts at model transparency or the seriousness of the health issue they address. This need is particularly great when the output of the model is the diversion of resources away from individual patients.
Regardless of whether Epic and other EHR vendors eventually seek FDA approval for their AI implementations, health care institutions and software vendors should partner to provide structured ethical oversight of these potentially valuable tools. Software vendors should involve ethicists, clinical informaticists, and operational experts early in the process of developing any CDS method, and health care delivery organizations need to ensure a broad ethical perspective as they evaluate new tools for implementation. As data science techniques become increasingly complex, multidisciplinary oversight is needed to ensure AI does not automate discrimination against our most vulnerable patients.
Dr. Wachter reports that he is a member of the Lucian Leape Institute of the Institute for Healthcare Improvement (no compensation except travel expenses); receives royalties from Lippincott Williams & Wilkins and McGraw-Hill for writing/editing several books; receives stock options for serving on the board of Acuity Medical Management Systems; receives a yearly stipend for serving on the board of The Doctors Company and on the Global Advisory Board for Teladoc; serves on the scientific advisory boards for Amino.com, PatientSafe Solutions, and EarlySense (for which he receives stock options); consults with Commure (for which he receives a stipend and stock options) and Forward (stock options); has a small royalty stake in CareWeb, a hospital communication tool developed at UCSF; and has given more than 200 talks (a few to for-profit entities including Nuance, GE, Health Catalyst, and AvaCare) for which he has received honoraria.