How Do You Define a Circle?

Perception and Computer Vision Diagnostics

Susan Vanderplas & Muxin Hua

2023-12-06

Problem Overview

Footwear Forensics

Collect images of shoe soles from the population using the scanner
Identify features in the tread patterns w/ computer vision
Generate a local database of common pattern features
Characterize frequency of a new shoe w/ random match probability computed from database
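
As a rough illustration of the last step, one naive way to estimate a random match probability is the fraction of shoes in the database that show every feature seen on the new shoe. The sketch below assumes each shoe is summarized as a set of feature labels. The function name, the label vocabulary, and the toy database are all hypothetical, and a real estimate would also need to handle sampling error and partial impressions.

```python
def random_match_probability(query_features, database):
    """Fraction of database shoes whose features include all query features.

    `database` is a list of sets of feature labels (hypothetical schema).
    """
    if not database:
        raise ValueError("database must contain at least one shoe")
    query = set(query_features)
    matches = sum(1 for shoe in database if query <= shoe)
    return matches / len(database)


# Toy database: each entry is the set of tread features found on one shoe.
toy_database = [
    {"circle", "line"},
    {"circle", "chevron", "text"},
    {"line", "chevron"},
    {"circle", "line", "text"},
]

rmp = random_match_probability({"circle", "line"}, toy_database)
```

Here two of the four toy shoes contain both query features, so the estimate is 0.5. The quality of such an estimate depends entirely on how well the database represents the local population.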

Quantifying the frequency of shoes in a local population is an unsolvable problem - Leslie Hammer, Hammer Forensics, March 2018

This is the overall approach we’d planned to use for this project. We set out specifically to use features identifiable by examiners (that is, the vocabulary they already use to describe footwear features) with computer vision/deep learning algorithms. Our thought was that if we used features familiar to examiners, the model would be more usable by examiners – and thus, more likely to be adopted in practice. This is admittedly a different take on explainable AI, but it is important in forensics that examiners understand what the algorithm is doing (at a basic level) so that they can explain its use in court, and that requires some familiarity. Because the way these algorithms work is very foreign to examiners, we decided that it would be easier to work with features that were familiar. Sounds simple enough, right?

Footwear Forensics

Other researchers use the output from the CNN
(without a trained model head)
- hard to explain to practitioners
- hard to understand meaning
- for models to be accepted in forensics, they need to be explainable!

Specifically, when we’re done with this whole project – the scanner and the model – we have to convince forensic examiners that it’s worth using. And forensic examiners are a great bunch of people - for the most part, they’re very dedicated, very smart, and have a lot of expertise in their field. But their field is definitely not math, and most of them are not what we’d call quantitatively oriented. So that is a big limitation on what we can do with neural networks.

A lot of previous work with neural networks in forensic pattern analysis uses features directly from the model base, without a trained model head. These features don’t make sense to practitioners, so the entire research project is dead before it starts, because we have to get examiners to adopt this stuff before it makes any practical impact.

We want to avoid this process, so we specifically set up this model to spit out features that we can all describe – lines, circles, bowties, chevrons. We have to work within the confines of human language and our model has to spit out features that are explainable to examiners and can be generated by examiners.

Our Assumption in 2018

If models can differentiate between types of elephants, they can identify shapes… right?

XKCD: Tasks

Complication: Different CV Models

We can reasonably pose this problem in 3 different ways:

Classification: same-size regions labeled with one or more classes

Object Detection: Propose a bounding box and label for each object in an image

Image segmentation: find regions of the image and label each region

Each method requires a different labeling schema, annotation method, and data format
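
To make the difference between the three formulations concrete, the sketch below shows one possible record shape for each. The class and field names are hypothetical and are not the project's actual schema. It also shows that a richer annotation can be reduced to a simpler one, but not the reverse.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationLabel:
    """One fixed-size image chunk tagged with one or more classes."""
    chunk_id: str
    classes: set = field(default_factory=set)


@dataclass
class DetectionLabel:
    """One object: a bounding box (pixel corners) plus a single class."""
    image_id: str
    box: tuple  # (x_min, y_min, x_max, y_max)
    class_name: str


@dataclass
class SegmentationLabel:
    """One region: a polygon outline plus a single class."""
    image_id: str
    polygon: list  # [(x, y), ...] vertices in pixel coordinates
    class_name: str


def polygon_to_box(region):
    """Collapse a segmentation region to a detection box (the outline is lost)."""
    xs = [x for x, _ in region.polygon]
    ys = [y for _, y in region.polygon]
    box = (min(xs), min(ys), max(xs), max(ys))
    return DetectionLabel(region.image_id, box, region.class_name)


region = SegmentationLabel("shoe_001", [(10, 20), (50, 20), (30, 60)], "triangle")
detection = polygon_to_box(region)
```

Going from a polygon to a box is mechanical, but going from a box back to a polygon requires re-annotating the image, which is part of why switching formulations is expensive.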

In Search of Human-Friendly Model Output

(What we’ve tried so far)

Initial Approach (~2019)

Use VGG16 to classify 256x256 px chunks of images
Goal is to label the entire chunk with one or more classes
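
A minimal sketch of the chunking and multi-label steps, assuming non-overlapping tiles and a fixed score threshold (neither detail is stated on the slide). The function names, the 0.5 threshold, and the example scores are hypothetical; the scores stand in for per-class outputs from the classifier.

```python
def tile_boxes(width, height, tile=256):
    """Return (x_min, y_min, x_max, y_max) for each full tile, row by row.

    Partial tiles at the right and bottom edges are dropped in this sketch.
    """
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def labels_above_threshold(scores, threshold=0.5):
    """Multi-label decision: keep every class whose score passes the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# A 600 x 512 px image yields a 2 x 2 grid of full 256 px tiles.
boxes = tile_boxes(width=600, height=512)

# Made-up per-class scores for one chunk.
chunk_labels = labels_above_threshold({"circle": 0.91, "text": 0.62, "quad": 0.08})
```

Keeping the tile coordinates alongside each chunk's labels is what allows predictions to be mapped back onto the full image, although a label still applies to the whole chunk rather than to a specific object inside it.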

VGG16 Shoe Example approach

Hard to integrate predictions into the main image

Initial Approach (~2019)

Not terrible but a lot of class confusion between e.g. Circles & Text, Quad & Polygon, Quad & Triangle

Synthetic Data Test (2020)

If we create different shapes, can a neural network differentiate them?
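
One way to build such a test set is to generate shapes whose class is known by construction. The sketch below only computes outline vertices for regular polygons and stars. Rendering to images is omitted, and the function names and parameters are assumptions rather than the generator actually used for this test.

```python
import math


def regular_polygon(n_sides, radius=1.0, center=(0.0, 0.0)):
    """Vertices of a regular polygon inscribed in a circle of the given radius."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    cx, cy = center
    step = 2 * math.pi / n_sides
    return [
        (cx + radius * math.cos(i * step), cy + radius * math.sin(i * step))
        for i in range(n_sides)
    ]


def star(n_points, outer=1.0, inner=0.5, center=(0.0, 0.0)):
    """Star outline: vertices alternate between the outer and inner radius."""
    cx, cy = center
    step = math.pi / n_points
    vertices = []
    for i in range(2 * n_points):
        r = outer if i % 2 == 0 else inner
        vertices.append((cx + r * math.cos(i * step), cy + r * math.sin(i * step)))
    return vertices


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Polygons with 3 to 9 sides, each labeled by its side count.
shapes = {n: regular_polygon(n) for n in range(3, 10)}
areas = {n: polygon_area(v) for n, v in shapes.items()}
```

The areas grow toward that of the enclosing circle as the side count rises, which hints at why many-sided polygons and circles are the classes most likely to be confused.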

Class Examples

Synthetic Data Test (2020)

Image	Circle	3	4	5	6	7	8	9	Star
	.9999	0	0	0	0	0	0	0	0
	.9999	0	0	0	0	0	0	0	0

Shoe data is complicated to label
Predictions made on ambiguous data don’t work as well as we’d like

Object Detection (2021-2023)

Object Detection: Propose a bounding box and label for each object in an image

re-encode labels using a different data format
Toolkit:
- Started with FastAI, but it had terrible support/documentation
- Eventually rewrote everything in PyTorch

Fundamental Problem

What shape is in the box? Text? Circle? Triangle? Star?

Neural networks are trained on millions of human-annotated photos
Even shoe soles are artificial relative to a natural scene
Networks weren’t trained on the artificial patterns or layouts in shoe soles
Labeling is fraught with errors and incomplete information
Labeling schemas are very complex & must account for human perception

If we look at the shape in the box, there is not actually a circle there - instead, there are triangles arranged inside an invisible circle, with text in the center inside another invisible circle made up of the points of the triangles. This is … hard to label. Our undergrad RAs (and even the researchers) aren’t sure how to deal with this.

We can get useful features out of existing NNs if we sacrifice interpretability/explainability, but examiners aren’t likely to use those tools in practice - they’re better used in database searches and other situations that don’t require examiners to apply them.

It seems like we don’t have sufficient data to adapt to the diverse patterns we see in real shoe treads: we need thousands or millions of perfectly labeled images to train or adapt a NN model. The only real solution to this problem is to do something somewhat different - we simply can’t generate that volume of data through human labeling alone (I don’t have Google’s budget).

Approaching the Problem Backwards

Generate a large library of synthetic data
- pre-labeled
- complex characteristics
- Train preliminary model
Run 2D patterns through an existing network to generate more realistic 3D images
- Train 2nd-gen model

Train on marketing-quality pictures labeled by humans
- Update 2nd gen model (transfer learning)

Train on Scanner Photos
- Update 3rd gen model weights
  (account for lower-quality photos)

Measure performance/accuracy changes over time on a consistent set of stimuli created from real shoes

Synthetic Pattern Generation

Use SoleMate style labeling for examiner familiarity
(Much more complex coding scheme)

Synthetic Pattern Generation

Regions of distinct patterns

Synthetic Data Generation

Different Patterns

Synthetic Data Generation

Shoe Outlines

Synthetic Data Generation

Advantages

SVGs can include metadata
Easy scaling
SVG intersection operations will automatically mark partial objects
Flexible data format:
- Region segmentation
- Object Detection
- Object Classification
  all generated from same source data
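
A small sketch of the "one source, several formats" idea, using only the Python standard library. It assumes each shape stores its class in a `data-label` attribute, which is a hypothetical convention chosen for this example and not necessarily how the project's SVGs carry metadata. Only circles and rectangles are handled.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical synthetic pattern: each shape carries its class as metadata.
svg_text = f"""<svg xmlns="{SVG_NS}" width="256" height="256">
  <circle cx="64" cy="64" r="20" data-label="circle"/>
  <rect x="120" y="100" width="60" height="30" data-label="quad"/>
</svg>"""


def bounding_box(element):
    """Axis-aligned box (x_min, y_min, x_max, y_max) for a circle or rect."""
    tag = element.tag.split("}")[-1]  # drop the XML namespace prefix
    if tag == "circle":
        cx, cy, r = (float(element.get(k)) for k in ("cx", "cy", "r"))
        return (cx - r, cy - r, cx + r, cy + r)
    if tag == "rect":
        x, y, w, h = (float(element.get(k)) for k in ("x", "y", "width", "height"))
        return (x, y, x + w, y + h)
    raise ValueError(f"unsupported shape: {tag}")


root = ET.fromstring(svg_text)
shapes = [el for el in root if el.get("data-label")]

# Object detection view: one labeled box per shape.
detections = [(el.get("data-label"), bounding_box(el)) for el in shapes]

# Classification view: the set of classes present in the whole image.
image_classes = sorted({label for label, _ in detections})
```

A segmentation view would come from rasterizing the same shapes into a per-pixel label mask, which needs a rendering step that this sketch leaves out.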

Disadvantages

Manual SVG creation
(52 images ≈ 8 h)
Creating a library to generate data
3D rendering after 2D stage:
- digital via OpenSCAD + SVG?
- Can apply different surface colors
Lots of work required before we start in on photos

End Goal

Human Friendly Model Outputs

Familiar features for database search
Data quality flexibility:
- Messy photos for database creation
- Neat images for search
Goal: reliable estimates of random match probability (RMP), the probability that someone in the area has a shoe with similar characteristics.

Questions?

Acknowledgements

This work was funded (or partially funded) by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.

Students who have worked on this project:

Muxin Hua (2022-)
Jayden Stack (2021-2022)
Miranda Tilton (2018-2019)