Validation - Mapping Pretrial Risk

Often, developers of risk assessment tools (RATs) describe tools as recently or locally “validated,” hoping this indicates scientific accuracy to those who might adopt a tool. But what is validation? Validation means statistically determining how well a RAT measures what it is designed and calibrated to measure.

As researchers Sarah Desmarais and Evan Lowder explain, “For pretrial risk assessment tools to be considered “valid,” they must be able to estimate the probability of failure to appear and/or pretrial rearrest at statistically significant and politically acceptable rates.”¹

Using past data about real people who were caught up in the criminal legal system, researchers apply the RAT to cases where they already know if someone was re-arrested or did not come to court following pretrial release. They can then compare what the RAT predicts would happen to what actually happened.²

However, validation studies do not consider people who were never released pretrial, limiting the “validity” of validation statistics.

Explore how designers attempt to determine “accuracy” of a RAT with this interactive tool from MIT Technology Review.³

Validation of pretrial RATs does not have a standardized definition across different jurisdictions.⁴ Different jurisdictions have validated their tools in different ways, some internally and some with an external independent researcher.

Plus, just because a tool is validated does not mean it is perfectly accurate or transparent about how it operates, or that it is calibrated for a local community.⁵

The Pretrial Justice Institute’s 2019 Scan of Pretrial Practices⁶ report found that only 45% of the jurisdictions they surveyed had validation studies.

In our research, the majority of validation studies were not conducted by an independent or third party researcher. Many jurisdictions are using tools that have not been validated with their local population or have not been validated at all.

The graph below depicts how often the jurisdiction we interviewed had been validated locally.

Movement Alliance Project generated based on interviews.
Total number of jurisdictions with validation information = 33

How does validation work?

Validation reports document the relationship between assessment results and actual outcomes for groups. Like algorithms, validation reports are done in the aggregate, not for specific individuals.

Validation studies measure predictive validity. They help determine whether pretrial RATs accurately predict whether the outcome for a person in the study set would be the same for the accused person being judged by the RAT today.

A common statistical way of measuring accuracy and predictive validity is through the “area under the curve,” or AUC. The AUC score is supposed to show how well the tool balances its correct and incorrect predictions — how often it correctly answers the question at hand (like how “risky” someone is), and how often it gets the prediction wrong.

The closer an AUC score is to 1, the more accurate a tool is said to be. An AUC score of 0.5 is no better than chance in predicting risk⁷: a 50/50 shot.

Some RATs have AUC scores as low as 0.55, barely more accurate than random chance or a coin toss. Several common tools have scores around 0.65, which is considered “good” in criminology research⁸ but “poor” in other fields⁹; a score of 0.65 means over one third of those judged by these tools are being mislabeled.

And unlike many other fields, there is a lack of independent evaluation of these validation studies, which severely limits any claims that pretrial RATs are truly predictive.¹⁰

Sarah Desmarais and Evan Lowder point out that “demonstrating predictive validity does not equate with research demonstrating implementation success.”¹¹ Even if a tool is considered highly “accurate” by these standards, it doesn’t mean that RATs are being implemented as intended or in a decarceral or racially unbiased way. The predictions they make are not always accurate, not always listened to even if accurate, and are applied inconsistently and in structurally racist ways.