Criminal justice algorithms still discriminate

Algorithms were supposed to remake the American justice system, but data can discriminate, says Ngozi Okidegbe, an expert on data and the law.

Championed as dispassionate, computer-driven calculations about risk, crime, and recidivism, algorithms were deployed in everything from policing and bail to sentencing and parole, with the aim of smoothing out the often unequal decisions made by fallible, biased humans.

But, so far, this hasn’t been the case.

“In theory, if the predictive algorithm is less biased than the decision-maker, that should lead to less incarceration of Black and Indigenous and other politically marginalized people. But algorithms can discriminate,” says Okidegbe, an associate professor of law and assistant professor of computing and data sciences at Boston University. Her scholarship examines how the use of predictive technologies in the criminal justice system affects racially marginalized communities.

As it is, these groups are incarcerated at more than four times the rate of their white peers. According to the Bureau of Justice Statistics, an arm of the US Department of Justice, there were 1,186 Black adults incarcerated in state or federal facilities for every 100,000 adults in 2021 (the most recent year for which data are available), and 1,004 American Indians and Alaska Natives incarcerated for every 100,000 adults. Compare these with the rate at which white people were incarcerated in the same year: 222 per 100,000.
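As a quick check on the arithmetic, the per-100,000 figures quoted above imply the ratios computed below. This is only a sketch: the variable and function names are illustrative, and the numbers are simply the ones cited from the Bureau of Justice Statistics.

```python
# Incarceration rates per 100,000 adults in 2021, as quoted above
# from the Bureau of Justice Statistics.
rates_per_100k = {
    "Black": 1186,
    "American Indian and Alaska Native": 1004,
    "white": 222,
}


def ratio_to_white(rates):
    """Return each group's rate divided by the white rate."""
    baseline = rates["white"]
    return {
        group: rate / baseline
        for group, rate in rates.items()
        if group != "white"
    }


ratios = ratio_to_white(rates_per_100k)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.1f} times the white rate")
```

By these figures, Black adults were incarcerated at about 5.3 times the white rate, and American Indians and Alaska Natives at about 4.5 times.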

In recent papers, Okidegbe has studied the role of algorithms in these inequities and the interwoven consequences of technology and the law, including researching the data behind bail decisions.

Algorithms can amplify bias

In their most basic form, algorithms are problem-solving shortcuts. Engineers can train computers to digest a large amount of data and then produce a simple solution to a complex problem. Spotify, for example, uses algorithms to suggest songs the company thinks its listeners might enjoy, based on what they’ve listened to previously. The more data a computer model has to go on, the more nuanced and accurate its results should be.

But a growing body of academic research—including by Okidegbe—and news reports show that algorithms built upon incomplete or biased data can replicate or even amplify that bias when they spit out results. This isn’t a huge deal if, for example, your toddler’s Peppa Pig obsession leaks into your suggested Spotify playlists, but it can have devastating effects in other contexts.
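One way to see how a model can replicate bias in its inputs is a toy calculation. The sketch below assumes two hypothetical groups that offend at the same rate but whose offenses are recorded at different rates. All numbers and names are invented for illustration; they do not come from any real dataset or from Okidegbe's research.

```python
# Hypothetical setup: two groups of equal size with the SAME number of
# actual offenses, but group "A" is policed more heavily, so more of its
# offenses end up in the records. All figures are invented.
population = 1000
actual_offenses = {"A": 100, "B": 100}    # identical by design
recorded_offenses = {"A": 80, "B": 40}    # assumed: A is recorded twice as often


def learned_risk(recorded, size):
    """A naive 'model' that scores each group by its recorded offense rate."""
    return {group: count / size for group, count in recorded.items()}


risk = learned_risk(recorded_offenses, population)
disparity = risk["A"] / risk["B"]
print(risk, disparity)
```

Because the naive score only ever sees the records, it rates group A as twice as risky as group B, even though the two groups were defined to behave identically. The gap comes entirely from how the data were collected.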

Consider a judge, says Okidegbe, who receives an algorithmically generated recidivism risk score as part of a report on a convicted criminal. This score tells the judge how likely this person is to commit another crime in the near future—the higher the score, the more likely someone is to be a repeat offender. The judge takes this score into account, and assigns more jail time to someone with a high recidivism score. Case closed.

A sprawling report by the nonprofit news organization ProPublica found that because these scores feel impartial, they can carry a lot of weight with the judges who use them. In reality, these scores are neither impartial nor airtight. ProPublica found that one particular system used by courts across the country guessed wrong about twice as often for Black people as for white people: it mislabeled twice as many Black people who didn’t reoffend as being at high risk of doing so.
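The disparity ProPublica describes is a gap in what statisticians call the false positive rate: among people who did not reoffend, the share who were nevertheless labeled high risk. The sketch below shows how that rate can be computed per group. The records are invented toy data, chosen only so the arithmetic is easy to follow; they are not ProPublica's figures, and the function and field names are illustrative.

```python
from collections import defaultdict

# Invented toy records: (group, labeled_high_risk, reoffended).
# NOT real data; chosen only to make the arithmetic easy to follow.
records = [
    ("A", True, False), ("A", True, False),
    ("A", False, False), ("A", False, False),
    ("A", True, True),
    ("B", True, False), ("B", False, False),
    ("B", False, False), ("B", False, False),
    ("B", True, True),
]


def false_positive_rates(rows):
    """For each group, the share of non-reoffenders labeled high risk."""
    flagged = defaultdict(int)  # non-reoffenders labeled high risk
    total = defaultdict(int)    # all non-reoffenders
    for group, high_risk, reoffended in rows:
        if reoffended:
            continue  # false positives are counted only among non-reoffenders
        total[group] += 1
        if high_risk:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


fpr = false_positive_rates(records)
print(fpr)
```

In this toy example, group A's false positive rate (0.5) is twice group B's (0.25), even though each group contains the same number of people who actually reoffended.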

Messy data

In a recent article for the Connecticut Law Review, Okidegbe traces this inconsistency back to its source, and identifies a three-pronged “input problem.”

First, she writes, jurisdictions are opaque about whether and how they use pretrial algorithms, and often adopt them without consulting marginalized communities, “even though these communities are disproportionately affected by their utilization.” Second, these same communities are generally shut out of the process for building such algorithms. Finally, even in jurisdictions where members of the public can lodge opinions about the use of such tools, their input rarely changes anything.

“From a racial-justice perspective, there are other harms that come out of the use of these algorithmic systems. The very paradigm that governs if and how we use these algorithms is quite technocratic and not very diverse. Kate Crawford has noted AI’s ‘white guy problem,’” Okidegbe says, referring to a principal researcher at Microsoft and cochair of a White House symposium on AI and society who coined the term to describe the overrepresentation of white men in the creation of artificially intelligent products and companies.

From the very outset, Okidegbe says, algorithmic systems exclude racially marginalized and other politically oppressed groups.

“I’ve been looking at the decision-making power of whether and how to use algorithms, and what data they are used to produce. It is very exclusionary of the marginalized communities that are most likely to be affected by it, because those communities are not centered, and often they’re not even at the table when these decisions are being made,” she says. “That’s one way I suggest that the turn to algorithms is inconsistent with a racial justice project, because of the way in which they maintain the marginalization of these same communities.”

Shift the power

Beyond the biased results that disproportionately harm marginalized communities, the data used to train algorithms can itself be messy, subjective, and discriminatory, Okidegbe says.

“In my work, I’ve contended with what I think is a misconception: that algorithms are only built with quantitative data. They’re not, they’re also built with qualitative data,” she says. Computer engineers and data designers will meet with policymakers to figure out what problem their algorithm should solve, and which datasets they should pull from to build it, Okidegbe says.

In the criminal and legal context, this might mean working with judges to determine what would help them deliver prison sentences, for example. Once again, though, it’s much less likely that data engineers would meet with incarcerated people, say, as part of their early information-gathering process. Instead, as Okidegbe writes in an article for a recent edition of the Cornell Law Review, most pretrial algorithms are built upon and trained on large datasets drawn from “carceral knowledge sources,” such as police records and court documents.

“That puts forth this narrative that these communities have no knowledge to add toward the broader question,” Okidegbe says.

Really delivering on the promise of algorithms in the criminal justice system, the promise that they make the process more uniform and less biased than humans otherwise would, requires a radical rethinking of the entire structure, Okidegbe says. It’s something she encourages her students to consider as they shape the future of law and criminal justice.

“It means actually accounting for the knowledge from marginalized and politically oppressed communities, and having it inform how the algorithm is constructed. It also means ongoing oversight of algorithmic technologies by these communities, as well. What I am contending requires building new institutional structures, it requires shifting our mindset about who is credible and who should be in power when it comes to the use of these algorithms. And, if that is too much, then we can’t, in the same breath, call this a racial justice project.”

Source: Boston University