Artificial intelligence (AI) is increasingly touching every facet of our society, including transportation, the stock market, dating, and health care (where I focus my work).
As AI makes its way into medical devices, hospital readmission algorithms, iPhone apps that scan moles to determine whether you should see a dermatologist, and more, the public is increasingly exposed to everything that can go wrong. Among the most worrisome aspects of AI implementation is its potential for bias.
Let’s focus on racial bias, because it is one of the most prominently discussed, but bias can affect many other groupings in a population. The easiest bias problems to understand involve training data. Imagine a phone app that uses a photo of your skin to help you determine whether you have a skin condition that merits follow-up by a dermatologist. If the training data lacks sufficient representation of individuals with darker skin tones, the app may perform poorly for those populations. Testing for this kind of racial bias is relatively straightforward. Correcting for it is more difficult and involves a combination of participatory design (engaging underrepresented communities so that they take part in development) and requiring or incentivizing the makers of AI to assemble training datasets that are representative of the communities on whom these models will be deployed.
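For readers who want to see what such a subgroup check might look like in practice, here is a minimal sketch in Python. It assumes a hypothetical held-out test set in which each image carries a skin-tone group label; the data, group names, and function name are illustrative assumptions, not part of any particular product.

```python
# Hypothetical sketch: evaluate a skin-lesion classifier separately by skin-tone group.
# Assumes a held-out test set with per-image skin-tone group labels; real audits need
# representative, well-labeled data.
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (skin_tone_group, true_label, predicted_label),
    where labels are 1 = 'needs dermatologist follow-up', 0 = benign."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            actual_pos[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    # Sensitivity (recall) per group: missed referrals are the costly error here.
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos if actual_pos[g] > 0}

# Example with made-up records: the gap between groups is the signal to investigate.
test_records = [
    ("lighter", 1, 1), ("lighter", 1, 1), ("lighter", 0, 0),
    ("darker", 1, 0), ("darker", 1, 1), ("darker", 0, 0),
]
print(sensitivity_by_group(test_records))  # e.g. {'lighter': 1.0, 'darker': 0.5}
```

A large gap in sensitivity between groups, as in the made-up output above, is the kind of signal that should trigger further investigation and data collection.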
Even with a data set that is completely representative of the population on which it will be used, though, bias can creep in. A good example is so-called “label bias”. A well-known 2019 study demonstrated this through a widely used algorithm that sought to improve the care of patients with complex health needs by providing increased follow-up care and other resources. The authors showed that the decision to have the algorithm (which actually excludes race as a variable) use health care costs as a proxy for health care needs, a decision that was not prima facie unreasonable, led to an algorithm that prioritized White patients over Black patients with the same level of health needs. Why? Because in the training data, less money was spent, on average, on Black patients than on White patients at the same level of health, so Black patients generated lower costs. By prioritizing higher-cost patients, the algorithm was not prioritizing the patients with the greatest health needs, and that gap had a racially discordant effect. The cost differential itself may reflect preexisting patterns of care seeking and care provision, which may themselves raise normative issues.
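A toy simulation with entirely made-up numbers can make the mechanism concrete: if a model ranks patients by a cost proxy, and less is spent on Black patients at the same level of need, then Black patients must be sicker than White patients before they cross the enrollment threshold. This is a hedged illustration of label bias in general, not a reconstruction of the algorithm the 2019 study examined.

```python
# Hypothetical simulation of label bias: the target being optimized (cost) is a biased
# proxy for the quantity we care about (health need). All numbers are made up.
import random

random.seed(0)

def simulate_patient(group):
    need = random.uniform(0, 10)                          # true health need, same distribution for both groups
    spending_factor = 0.8 if group == "Black" else 1.0    # less is spent on Black patients at equal need
    cost = need * spending_factor + random.gauss(0, 0.5)
    return {"group": group, "need": need, "cost": cost}

patients = [simulate_patient(g) for g in ("Black", "White") for _ in range(5000)]

# "Algorithm": rank by the cost proxy and flag the top 20% for extra care-management resources.
threshold = sorted(p["cost"] for p in patients)[int(0.8 * len(patients))]
flagged = [p for p in patients if p["cost"] >= threshold]

# At the same level of need, who gets flagged? Mean need among flagged patients is
# higher for Black patients: they must be sicker than White patients to be enrolled.
for group in ("Black", "White"):
    needs = [p["need"] for p in flagged if p["group"] == group]
    print(group, "flagged:", len(needs), "mean need among flagged: %.2f" % (sum(needs) / len(needs)))
```

In this toy setup the enrolled Black patients end up both fewer in number and sicker on average than the enrolled White patients, which mirrors the kind of pattern the 2019 study reported.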
Developers should look for bias in AI and take all feasible steps to reduce it. But what should they do if some bias persists, or if we hit harder trade-offs between bias reduction and other values (an interesting example is a “fix” that would reduce race-discordant results a little but would also make the algorithm significantly less accurate overall)? I think it is important for us always to ask the question “As Against What?” As philosophers sometimes put it, the Perfect should not be the enemy of the Good, and in measuring these technologies we should focus on how AI-enabled health care stacks up against non-AI-enabled health care. In the case of bias, we must recognize that physicians and nurses, like all human beings, are often consciously or unconsciously biased, including, sadly, showing significant racial biases in the way they treat patients. In evaluating a proposal to introduce a new AI feature into health care, we need an apples-to-apples comparison. Even if using the AI to assist produces disparate results between groups, does it produce less biased results than the physician acting without it? Those comparisons are often hard to execute and fraught, but only if we ask the right question do we have a hope of making progress.
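One way to operationalize that apples-to-apples question, offered here only as a hedged sketch rather than as the author’s method, is to compute the same disparity metric for the same patients under the two regimes: clinician alone, and clinician with AI assistance. The records, group labels, and choice of metric below are all illustrative assumptions.

```python
# Hypothetical sketch: compare a simple disparity metric for the same clinical decision
# made two ways -- clinician alone vs. clinician assisted by the AI tool. The metric
# (gap in missed needed referrals between two groups) is one choice among many.
def miss_rate_gap(decisions):
    """decisions: list of (group, truly_needed_referral, referred). Assumes exactly
    two groups. Returns the absolute gap in missed-referral rates and the per-group rates."""
    missed, needed = {}, {}
    for group, y_true, y_dec in decisions:
        if y_true:
            needed[group] = needed.get(group, 0) + 1
            if not y_dec:
                missed[group] = missed.get(group, 0) + 1
    rates = {g: missed.get(g, 0) / n for g, n in needed.items()}
    groups = sorted(rates)
    return abs(rates[groups[0]] - rates[groups[1]]), rates

# Made-up records for the same patients under the two regimes.
unassisted = [("A", True, True), ("A", True, True), ("A", True, False), ("A", True, False),
              ("B", True, True), ("B", True, True), ("B", True, True), ("B", True, True)]
ai_assisted = [("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
               ("B", True, True), ("B", True, True), ("B", True, True), ("B", True, True)]

print("clinician alone:", miss_rate_gap(unassisted))   # gap 0.5
print("with AI assist:", miss_rate_gap(ai_assisted))   # gap 0.25
```

Here the AI-assisted pathway still shows a disparity, but a smaller one than the unassisted baseline; on the logic above, that comparison, not perfection, is the relevant test.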
Glenn Cohen (B.A. (University of Toronto), J.D. (Harvard)) is one of the world’s leading experts on the intersection of bioethics and the law, as well as health law.
Source: The Pierre Elliott Trudeau Foundation
www.trudeaufoundation.ca/member/glenn-cohen