Reproducible Science

Statistics, Forensics, and the Law

Susan Vanderplas

October 5, 2022

Forensic Science?

Forensic Failures

National Academy of Science (NAS) Report

  • Commissioned in 2005 by the Senate to assess forensic science, make recommendations, disseminate best practices, and identify relevant scientific advancements

  • Focus areas:

    • fundamentals of the scientific method in forensics,
    • collection and analysis of forensic data (error rates),
    • use of forensic evidence in criminal and civil litigation,
  • Important questions:

    • Extent to which a particular forensic discipline is founded on a reliable scientific methodology
    • Extent to which practitioners rely on human interpretation that could be biased

The adversarial process relating to the admission and exclusion of scientific evidence is not suited to the task of finding “scientific truth.” The judicial system is encumbered by, among other things, judges and lawyers who generally lack the scientific expertise necessary to comprehend and evaluate forensic evidence in an informed manner… Judicial review, by itself, will not cure the infirmities of the forensic science community.

Recommendations

Create a National Institute of Forensic Science to develop accreditation, manage federal/state/local jurisdiction differences, and develop standard reporting language.

Fund research on:

  • The validity of forensic methods
  • Quantifiable measures of the reliability and accuracy of forensic analyses
    • realistic case scenarios
    • limits due to quality of evidence
  • Quantifiable measures of uncertainty in the conclusions of forensic analysis
  • Automated techniques capable of enhancing forensic technologies

President’s Council of Advisors on Science and Technology (PCAST)

Judges’ decisions about the admissibility of scientific evidence rest solely on legal standards; they are exclusively the province of the courts and PCAST does not opine on them. But, these decisions require making determinations about scientific validity. It is the proper province of the scientific community to provide guidance concerning scientific standards for scientific validity…

Requirements for Scientific Validity

  • Empirical testing by
    • multiple groups,
    • under conditions appropriate to the method’s intended use,
    • demonstrating that the method is repeatable and reproducible, and
    • providing estimates of the method’s accuracy
  • Subjective disciplines
    • must be evaluated as a “black box” in the examiner’s head
    • Studies: many examiners render decisions about many independent tests (a sketch of the resulting error-rate arithmetic follows the quote below)

Without estimates of accuracy, an examiner’s decision is scientifically meaningless: it has no probative value, and considerable potential for prejudicial impact.
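
To make the black-box idea concrete: the accuracy estimates PCAST asks for reduce to binomial proportions over many examiner decisions. A minimal sketch in Python, with entirely hypothetical counts (the function and numbers below are illustrative, not drawn from any actual study):

```python
import math

def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial error proportion."""
    p = errors / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical black-box study: many examiners, many independent comparisons.
false_positives, different_source_trials = 6, 1000
low, high = wilson_interval(false_positives, different_source_trials)
print(f"Estimated false positive rate: {false_positives / different_source_trials:.3f} "
      f"(95% CI {low:.3f}-{high:.3f})")
```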

Evaluation of Feature-Comparison Methods

Discipline       | Method           | Validity                 | Studies
🧬 DNA           | 🧪               | Valid                    | 📚
🧬 DNA (mix)     | 🧪 + 🔎          | Valid in some situations | 📖📖
Fingerprint      | 🔎 (could be 💻) | Valid (high error rate)  | 📚
🔫 Firearms      | 🔎 (could be 💻) | Unknown                  | 📖📖
👞 Footwear      | 🔎               | Unknown                  | 📖
🦱 Hair          | 🔎               | 🚩🚩 Invalid             | 📚
🦷 Bitemark      | 🔎               | 🚩🚩 Invalid             | 📚

Key: 🧪 Lab · 🔎 Subjective · 💻 Algorithm · 🚩 Invalid · 📚 Many Studies · 📖 Some Studies

Recommendations

  • NIST should assess foundational validity annually 📚
    • With help from a committee of outside scientists/statisticians
    • Providing error rate estimates for valid disciplines
    • Suggesting necessary steps for validity (if not valid)
  • Develop objective methods for DNA mixtures, firearms, and fingerprints

Ironically, it was the emergence and maturation of a new forensic science, DNA analysis, in the 1990s that first led to serious questioning of the validity of many of the traditional forensic disciplines… When, as a result, DNA evidence was declared inadmissible in a 1989 case in New York, scientists engaged in DNA analysis in both forensic and non-forensic applications came together to promote the development of reliable principles and methods that have enabled DNA analysis of single-source samples to become the “gold standard” of forensic science for both investigation and prosecution.

- PCAST Executive Summary

Major Takeaways

Change only happens when evidence that was admissible is declared inadmissible

Scientists (forensic and not) have to be actively involved in the legal system

A second—and more important—direction is to convert latent-print analysis from a subjective method to an objective method. The past decade has seen extraordinary advances in automated image analysis based on machine learning and other approaches—leading to dramatic improvements in such tasks as face recognition and the interpretation of medical images. This progress holds promise of making fully automated latent fingerprint analysis possible in the near future. There have already been initial steps in this direction, both in academia and industry.

The same tremendous progress over the past decade in image analysis that gives us reason to expect early achievement of fully automated latent print analysis is cause for optimism that fully automated firearms analysis may be possible in the near future. Efforts in this direction are currently hampered, however, by lack of access to realistically large and complex databases that can be used to continue development of these methods and validate initial proposals.
- PCAST Executive Summary

Major Takeaways

Subjective methods can be automated with machine learning

Data gathering methods (and databases) are important resources for new method development

In recent years, some judges have struggled to understand increasingly complex scientific evidence…

For example, prosecutors and defense attorneys might benefit from a focus on the interpretation of and requirements for evidence; and judges may benefit from information on evaluating the scientific rigor of expert testimony and the reliability of forensic evidence.

…juries have been described as least comfortable and competent with regard to statistical evidence… Jurors’ use and comprehension of forensic evidence is not well studied.
- NAS Report, pp. 234-237

Major Takeaways

Scientific and statistical literacy is important for lawyers, judges, and juries

My Research

Algorithms and Statistical Learning

Footwear Evidence

  • No current basis for making quantitative assessments of footwear frequency in the population

  • 95% of footwear comparisons rely on make/model/tread pattern features
    (class characteristics: features shared by multiple items that are not individually identifying)

  • Goal: Develop a way to collect data about footwear/tread patterns (see the frequency sketch after this list)

    • Equipment
    • Statistical analysis method
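
As a toy illustration of what such data collection would enable, the sketch below estimates how common a questioned tread’s class characteristics are in a scanned database. Everything here is hypothetical: the database, the records, and the questioned shoe.

```python
from collections import Counter

# Hypothetical records produced by a shoe scanner: (make, model, tread pattern).
scans = [
    ("Nike", "Air Max 90", "waffle"),
    ("Nike", "Air Max 90", "waffle"),
    ("Adidas", "Superstar", "herringbone"),
    ("Vans", "Old Skool", "waffle"),
]

counts = Counter(scans)
questioned = ("Nike", "Air Max 90", "waffle")  # class characteristics of the print

# Frequency of these class characteristics in the (toy) population of scans;
# a real assessment would require a large, representative database.
freq = counts[questioned] / len(scans)
print(f"Population share with matching class characteristics: {freq:.2f}")
```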

Algorithms and Statistical Learning

Bullet and Cartridge Case Analysis

  • Develop algorithms for matching bullets and cartridge cases (a minimal sketch of one similarity step follows this list)

  • Compare these algorithms to examiner performance
    (informally, the bullet algorithms perform much better; publications are in preparation)

  • Algorithms must be explainable

    • Visual diagnostics to see how things “went wrong” if errors are made
    • Examiners (and eventually, juries) must conceptually understand how the decision was made
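
Published open-source bullet pipelines (e.g., CSAFE’s bulletxtrctr) reduce each land engraved area to a one-dimensional “signature” and score pairs of signatures on several similarity features. The sketch below shows just one plausible similarity step, maximum normalized cross-correlation, using synthetic signals as stand-ins for real signatures; it is not the full published method.

```python
import numpy as np

def max_cross_correlation(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Maximum normalized cross-correlation over all lags between two
    land-to-land 'signatures' (height profiles with class structure removed)."""
    a = (sig_a - sig_a.mean()) / sig_a.std()
    b = (sig_b - sig_b.mean()) / sig_b.std()
    cc = np.correlate(a, b, mode="full") / min(len(a), len(b))
    return float(cc.max())

# Synthetic stand-ins for real signatures: the "same source" pair shares a
# striation pattern (shifted, noisy); the "different source" pair does not.
rng = np.random.default_rng(1)
base = np.sin(np.linspace(0, 40, 500)) + 0.3 * rng.normal(size=500)
same = np.roll(base, 25) + 0.1 * rng.normal(size=500)
diff = rng.normal(size=500)
print("same-source score:", round(max_cross_correlation(base, same), 2))
print("diff-source score:", round(max_cross_correlation(base, diff), 2))
```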

Algorithms and Statistical Learning

  • Develop a community of forensics open-source software developers

  • Encourage publication of source code and data

  • Develop validation sets that can be used to compare algorithm performance

    • Including performance of closed-source/proprietary algorithms
  • Resources for connecting lawyers with experts

    • the prosecution typically has the resources; the defense usually lacks the time and money

Assessing Error Rates

                 |                     Examiner Decision
Reality          | Identification (match) | Inconclusive | Elimination (no match)
Same Source      | Correct                | 🤨           | Error (missed identification)
Different Source | Error (false positive) | 🤨           | Correct
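
The 🤨 cells are the crux: the reported error rate changes depending on how inconclusive calls are counted. A small sketch with made-up counts (nothing below comes from a real study):

```python
# Hypothetical study counts: rows are ground truth, columns are examiner decisions.
counts = {
    "same source":      {"identification": 480, "inconclusive": 15,  "elimination": 5},
    "different source": {"identification": 4,   "inconclusive": 120, "elimination": 376},
}

def error_rate(row: dict, wrong_call: str, inconclusive_is_error: bool) -> float:
    """Error rate for one ground-truth row under two conventions for
    inconclusives: dropped from the denominator, or counted as errors."""
    errors = row[wrong_call] + (row["inconclusive"] if inconclusive_is_error else 0)
    total = sum(row.values()) - (0 if inconclusive_is_error else row["inconclusive"])
    return errors / total

for convention in (False, True):
    fp = error_rate(counts["different source"], "identification", convention)
    fn = error_rate(counts["same source"], "elimination", convention)
    label = "inconclusives as errors" if convention else "inconclusives excluded"
    print(f"{label}: false positive {fp:.3f}, false negative {fn:.3f}")
```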

Scientific Communication: Juries

  • If we were to use firearms algorithms in court, how would that affect juries?

  • Can we use graphics and statistical visualizations to help juries understand?

\[\left(\begin{array}{c}\text{Identification}\\\text{Inconclusive}\\\text{Elimination}\end{array}\right)\times\left(\begin{array}{c}\text{Algorithm}\\\text{Status quo}\end{array}\right)\times\left(\begin{array}{c}\text{Pictures + Text}\\\text{Only Text}\end{array}\right)\]
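
For concreteness, the crossed design above yields 3 × 2 × 2 = 12 experimental conditions; a quick enumeration (factor names taken from the design, code purely illustrative):

```python
from itertools import product

decisions = ["Identification", "Inconclusive", "Elimination"]
testimony = ["Algorithm", "Status quo"]
presentation = ["Pictures + Text", "Only Text"]

# Enumerate all 3 x 2 x 2 = 12 conditions in the crossed design.
for i, cond in enumerate(product(decisions, testimony, presentation), start=1):
    print(f"{i:2d}: " + " | ".join(cond))
```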

Preliminary results

  • Inconclusive scenarios significantly reduce participants’ perceptions of the reliability of firearms examination and of how scientific the field is.

  • Including the algorithm testimony decreases participants’ assessments of how well they understood the testimony, as well as participants’ opinions of examiner reliability.

Conclusion

Takeaway                                                                             | Project
Data gathering methods/databases are important resources for new method development | Shoe scanner + Automatic Feature ID
Subjective methods can be automated with machine learning                           | Bullet Algorithm development
Scientific and statistical literacy is important                                    | Jury Perception of Bullet Algorithm Testimony
Change only happens when evidence is declared inadmissible                          | Legal Briefs/Testimony + Bullet Algorithm development
Scientists have to be actively involved in the legal system                         | Legal Briefs/Testimony

Acknowledgements

Collaborators

  • Heike Hofmann
  • Alicia Carriquiry
  • Kori Khan

Students

  • Rachel Rogers
  • Muxin Ha
  • Joe Zemmels
  • Jayden Stack
  • Miranda Tilton

This work was funded (or partially funded) by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.