Mark Dredze, a John C. Malone Professor of Computer Science and a pioneer in using AI tools to gain insights into public health challenges ranging from suicide prevention to vaccine refusal and from tobacco use to gun violence, has also seen how bad data can amplify bad outcomes. He says early missteps in machine learning algorithms—such as Microsoft’s chatbot Tay, which had to be shut down just 16 hours after its initial release for spewing obscene and racist language it picked up online—highlight the dangers of working with bad data.

“The problem is that if we just accept the data and say to the machine, ‘Learn how to make decisions from the data,’ when there are biases in the data, or biases in the process of how we learn from data, we will produce biased results. That is the essence of the problem of fairness in machine learning algorithms,” Dredze says.

Grayscale headshot of Mark Dredze.

This applies to his own research, some of which involves scanning the web to understand how people turn to the internet for medical information and what innate biases they are likely to encounter there. In a recent paper he co-authored, researchers evaluated bias in context-dependent health questions, focusing on sexual and reproductive health care queries. They examined questions that cannot be answered properly unless specific additional information is supplied. For instance, “Which is the best birth control method for me?” has no single correct answer; it depends on sex, age, and other factors. When that information is missing, Dredze and colleagues found, large language models often simply provide answers that reflect the majority demographic, suggesting oral contraceptives, for example, an option available only to women, while neglecting to mention condoms. This kind of built-in bias is a particular concern for people who turn to the web as a replacement for traditional health care advice, since misinformed answers can be detrimental to users’ health, Dredze says.
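As a rough illustration of the kind of check described, the sketch below poses an underspecified question and flags an answer that quietly assumes the asker’s demographic. It is not code from the study; the `ask_model` callable and the keyword lists are hypothetical placeholders.

```python
# Minimal illustrative sketch (not from the paper): pose an underspecified,
# context-dependent question and flag an answer that assumes a demographic.
# `ask_model` and the keyword lists below are hypothetical placeholders.

from typing import Callable, Dict

# Options that apply only to some askers vs. options open to anyone.
FEMALE_ONLY_TERMS = {"oral contraceptive", "the pill", "iud", "hormonal patch"}
ANY_ASKER_TERMS = {"condom", "abstinence", "fertility awareness"}

def flag_demographic_assumption(question: str, ask_model: Callable[[str], str]) -> Dict:
    """Ask the question without supplying the missing context, then check whether
    the answer covers only options tied to one demographic."""
    answer = ask_model(question).lower()
    narrow = [t for t in FEMALE_ONLY_TERMS if t in answer]
    general = [t for t in ANY_ASKER_TERMS if t in answer]
    return {
        "question": question,
        "female_only_options_mentioned": narrow,
        "options_for_any_asker_mentioned": general,
        # Possible bias signal: the model answered as if the asker's sex were known.
        "assumes_majority_demographic": bool(narrow) and not general,
    }

if __name__ == "__main__":
    # Stand-in for a real model call.
    fake_model = lambda q: "For most people, an oral contraceptive (the pill) is the best choice."
    print(flag_demographic_assumption("Which is the best birth control method for me?", fake_model))
```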


This and his earlier pioneering work in “social monitoring”—employing machine learning to gain understanding from text published on social media sites—have led him to focus not just on the raw data, but also on how people use the web.

“I would describe this as a more holistic approach, where we’re actually building systems and paying attention to how people interact with those systems,” he says. “Where did our data come from? How did we collect it? I have to care about these issues when giving data to the algorithm and figuring out what the algorithm does. But then I also need to account for the fact that a human will interact with us. And humans are going to have their own biases and issues. So maybe it’s not just that the system is biased or unbiased, but it interacts with someone to create a different kind of bias.”

 

ALGORITHMS ARE EVERYWHERE

Making AI decisions understandable may first require overcoming a larger challenge: the term algorithm itself provokes math anxiety in many people because the word is poorly understood. But Ilya Shpitser, a John C. Malone Associate Professor of Computer Science whose work focuses on algorithmic fairness in datasets of all types, points out that algorithms underpin everyday decision making for many people—even if they don’t realize it.

Grayscale headshot of Ilya Shpitser.

“When doctors diagnose, when judges set bail, they have a sequence of steps they’ve learned that’s considered reasonable,” he says. “Regardless of how they think of it themselves, they are using algorithms, because judicial decisions and diagnosis cannot be arbitrary; they better be systematic. The fact that similar cases are decided in similar ways: That’s what an algorithm really does.”

For a judge to set bail appropriately or a doctor to make an accurate diagnosis, the most important ingredient is good, fair, accurate information. In algorithmic decision-making, it all comes down to the data. And in an imperfect world, for any decision, there will be good data, there will be bad data, and, most vexingly of all, there will be data we simply don’t have.

“Any person who works in actual data, real data of any kind, has missing data in their sets; that’s just how it is, basically,” says Shpitser, who cites electronic health records as a common example. There, missing data can stem from a lack of collection, as when a patient was never asked about asthma, or from a lack of documentation, as when a patient was asked about asthma but the response was never recorded in the medical record. “Lack of documentation is particularly common when it comes to patients not having symptoms or presenting comorbidities.” In these cases, rather than recording a negative value for each potential symptom or comorbidity, the missing fields are left blank and only the positive values are recorded, which skews the data, says Shpitser. “This makes it essentially impossible to differentiate between the lack of a comorbidity, the lack of documentation of a comorbidity, or the lack of data collection regarding the comorbidity.”
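A small sketch with hypothetical patient records shows how positive-only charting collapses those three distinct situations (no condition, asked but not documented, never asked) into the same blank field:

```python
# Illustrative sketch with hypothetical records: under positive-only charting,
# a blank asthma field cannot distinguish "no asthma", "asked but never
# documented", and "never asked", so treating blanks as negatives biases the data.

import pandas as pd

# Ground truth (not visible to the record system):
#   patient_1: asked, truly no asthma
#   patient_2: asked, has asthma, but the answer was never charted
#   patient_3: never asked about asthma at all
ehr = pd.DataFrame(
    {"asthma_documented": [None, None, None]},  # all three look identical on record
    index=["patient_1", "patient_2", "patient_3"],
)

# A common but biased shortcut: treat every blank as "no asthma".
naive = ehr["asthma_documented"].fillna(False)
print(naive)  # patient_2's real asthma silently becomes a negative
```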


One of the central challenges in creating fair and accurate algorithms then becomes devising sound methods of correcting for data recorded incompletely, incorrectly, or not at all. “I work on data being screwed up,” is how Shpitser describes it. Along the way, he has demonstrated in his research that it is at least theoretically possible to “break the cycle of injustice” (in which variables such as gender, race, disability, or other attributes introduce bias) by making optimal but fair decisions. His research employs the methodology of causal inference, which he describes as “methods to adjust for incomplete, bad, or missing data to allow reliable and fair inferences to be made.”
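One standard adjustment from that toolkit is inverse probability weighting, sketched below on synthetic data. It is a generic illustration rather than Shpitser’s specific method, and it assumes the chance that a value is recorded can be modeled from covariates that are observed.

```python
# Minimal sketch of inverse probability weighting (IPW) for missing data,
# shown on synthetic data as a generic illustration of the missing-data /
# causal-inference toolkit, not as any researcher's specific method.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: age drives both the outcome and whether it gets recorded.
age = rng.normal(50, 10, n)
outcome = (age / 100 + rng.normal(0, 0.1, n)) > 0.5       # true outcome
p_observed = 1 / (1 + np.exp(-(age - 50) / 10))            # older patients recorded more often
observed = rng.random(n) < p_observed

df = pd.DataFrame({"age": age, "outcome": outcome, "observed": observed})

# Naive estimate uses only the recorded rows and inherits their bias.
naive = df.loc[df.observed, "outcome"].mean()

# IPW: model the probability of being observed, then upweight under-recorded rows.
prop_model = LogisticRegression().fit(df[["age"]], df["observed"])
p_hat = prop_model.predict_proba(df[["age"]])[:, 1]
weights = 1.0 / p_hat[df.observed]
ipw = np.average(df.loc[df.observed, "outcome"], weights=weights)

print(f"true rate  {df.outcome.mean():.3f}")
print(f"naive      {naive:.3f}")
print(f"IPW        {ipw:.3f}")
```

In this synthetic example the naive estimate drifts upward because older patients are both more likely to have the outcome and more likely to be recorded; reweighting each recorded row by the inverse of its estimated probability of being observed pulls the estimate back toward the true rate.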

 

FAIRNESS ISN’T AN ORACLE CONCEPT

In the past few years, as AI systems have drawn increasing media attention, highly public machine learning misfires have given scientists and engineers a deeper awareness of both the importance and the difficulty of designing and implementing systems equitably.

“For a long time, we said, ‘Look, the algorithms are math, and math is math,’” says Dredze. “It was, ‘Let’s throw the math at this, and it comes up with what it comes up with.’” That attitude no longer applies. “I think we’ve learned a couple things. One is that the math might be math, but the data is not the data: It always has some kind of bias in it. And we need to do something about that.”

But that may only be the beginning. “The other thing we’re learning—and maybe there’s a little controversy to this—is that math isn’t just math. Math always has some assumptions to it. The models that you pick always have some assumptions, and for a variety of reasons we might favor certain models that do better on some groups. And so it’s not only a matter of the data; it’s also a matter of the models we build. How do we make our models aware that fairness is a thing? How do we build into the models some measure of fairness?”
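One widely used, and widely debated, way to build a measure of fairness into a model pipeline is demographic parity: checking whether the positive-prediction rate is the same across groups. The sketch below is purely illustrative; it is one definition among many, and enforcing it can conflict with other definitions such as equalized odds.

```python
# Illustrative sketch of one common (and contested) fairness measure:
# demographic parity, the gap in positive-prediction rates between groups.

import numpy as np

def demographic_parity_gap(predictions: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    rate_0 = predictions[group == 0].mean()
    rate_1 = predictions[group == 1].mean()
    return abs(rate_0 - rate_1)

# Hypothetical model output: 70% positive predictions for group 0, 50% for group 1.
preds = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0] + [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
groups = np.array([0] * 10 + [1] * 10)
print(round(demographic_parity_gap(preds, groups), 3))  # 0.2
```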

Which points ultimately to issues that transcend both the data and the math.

“Fairness isn’t an oracle concept,” notes Dredze. “If you’ve got kids and you’ve ever tried to give them anything, they complain about fairness, right? And you end up telling them that life isn’t fair. Fairness is subjective: That person got a smaller piece of cake, but they had a frosting flower, and you didn’t get a flower.”

It is, he points out, tremendously challenging to formalize concepts of fairness. “We can build that into our models and train them to be aware. But who decides what the right definition of fairness is? Think about the most controversial issues in society, like college admissions. Both sides of affirmative action and college admissions are insisting we need to be fair, but they have opposing views as to what that means,” he says.

All of which suggests that creating algorithms for a fairer world will not, in the end, be the purview of computer scientists alone.

“My take on algorithmic fairness research is that it’s not our job as researchers to decide what fair is. I think the discussions of fairness need to be discussions in the public square,” says Shpitser. “As an American citizen, I have my own opinions of what policies we should follow, but that’s a different path than wearing my hat as a researcher on algorithmic fairness. In other words, computer scientists are best suited to be the implementers, not the deciders, in notions of what is fair.”

He continues: “Whenever I give talks about this, I always get questions that try to push me into being some kind of priesthood that decides for people what fairness criteria to use. I really don’t think that is our job. I have as much grounds to advocate for a particular definition of fairness as any other citizen, but the fact that I work in algorithmic fairness doesn’t give me a special advantage. I’m using modern tools, but the questions themselves are much older than that.”

Excerpted from JHU Engineering.

Illustration by Patric Sandri.