New study shows that individual variation within risk labels makes the labels too weak to be meaningful.

Researchers Kristian Lum, David B. Dunson, and James Johndrow published a study showing that each risk level produced by a risk assessment algorithm covers a wide range of individuals, whose predicted risk varies too much for those levels to be meaningful. In other words, a risk assessment tool (RAT) draws distinctions based on group averages, not individual behavior. When it places someone in a “high risk” rather than a “medium risk” category, that label can go on to influence their level of pretrial supervision or even their freedom. These arbitrary distinctions carry too much power given how varied the people who share the same “high” or “medium” label are. As Lum explained in a summary of the report, “the risk labels [are] fairly weak indicators of an individual’s probability of the outcome.”

See the abstract below (emphasis added), read the entire study at the link below, and check out Kristian’s explanation here.


Risk assessment instruments are used across the criminal justice system to estimate the probability of some future behavior given covariates. The estimated probabilities are then used in making decisions at the individual level. In the past, there has been controversy about whether the probabilities derived from group-level calculations can meaningfully be applied to individuals. Using Bayesian hierarchical models applied to a large longitudinal dataset from the court system in the state of Kentucky, we analyze variation in individual-level probabilities of failing to appear for court and the extent to which it is captured by covariates. We find that individuals within the same risk group vary widely in their probability of the outcome. In practice, this means that allocating individuals to risk groups based on standard approaches to risk assessment, in large part, results in creating distinctions among individuals who are not meaningfully different in terms of their likelihood of the outcome. This is because uncertainty about the probability that any particular individual will fail to appear is large relative to the difference in average probabilities among any reasonable set of risk groups.
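
To make the abstract's last point concrete, here is a minimal toy simulation. It is only a sketch: the numbers are invented for illustration and do not come from the Kentucky data, and the Beta distributions are an assumption standing in for the study's Bayesian hierarchical models. The function names (`simulate_group`, `overlap_fraction`) are hypothetical. It draws individual probabilities for a "medium" and a "high" group whose averages differ by about ten percentage points, then estimates how often a randomly chosen "medium" person actually has a higher probability than a randomly chosen "high" person.

```python
import random
import statistics


def simulate_group(rng, alpha, beta, n):
    """Draw n hypothetical individual-level probabilities from Beta(alpha, beta)."""
    return [rng.betavariate(alpha, beta) for _ in range(n)]


def overlap_fraction(rng, lower_group, higher_group, pairs):
    """Estimate how often a random member of the lower-labelled group has a
    higher probability than a random member of the higher-labelled group."""
    hits = 0
    for _ in range(pairs):
        if rng.choice(lower_group) > rng.choice(higher_group):
            hits += 1
    return hits / pairs


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed, made-up groups: Beta(2, 8) has mean 0.2, Beta(3, 7) has mean 0.3.
medium = simulate_group(rng, 2, 8, 5000)
high = simulate_group(rng, 3, 7, 5000)

gap = statistics.mean(high) - statistics.mean(medium)  # between-group difference
spread = statistics.stdev(medium)                      # within-group variation
overlap = overlap_fraction(rng, medium, high, 20000)

print(f"difference in group averages: {gap:.3f}")
print(f"spread within the medium group: {spread:.3f}")
print(f"share of pairs where 'medium' exceeds 'high': {overlap:.3f}")
```

In this toy setup the spread within a single group is larger than the gap between the group averages, so a sizeable share of "medium" individuals are in fact more likely to have the outcome than some "high" individuals. That is the sense in which a label can be a weak indicator of any one person's probability.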