Local Population Footwear Class Characteristics

An End-to-End Pipeline for Automatic Data Acquisition and Analysis

Susan Vanderplas & Rick Stone

Discussion

How are you currently using footwear forensics?

Use of Footwear Evidence

Some reasons we’ve heard

Few individuals trained
- Collection of footwear impression evidence is difficult
- First responders often damage evidence at the scene
- Equipment for collecting prints is difficult to use and expensive
- Insufficient detail in prints for RAC analysis
- Insufficient people to perform RAC analysis
Not as useful in court as other types of evidence

How do we make footwear evidence more useful?

Random Match Probability

After a crime is committed, investigators must reconcile the evidence found at the scene with a narrative of the crime. For instance, shoeprints at the scene might be linked to shoes in the suspect’s possession, which would suggest the suspect’s shoes were at the scene. During this process, the shoes are examined and the two prints are compared. In court, the prosecution must then describe the value of that evidence - how much information should it provide to the jury concerning the suspect’s guilt or innocence?

Part of that calculation is determining the probability of a coincidental match: what’s the probability that some random individual would also have a shoe with a tread pattern similar to the print at the crime scene? If that probability is high, the evidence is less valuable; if it’s low, the jury should give the evidence much more weight.

What is the probability of a coincidental match?

Define the comparison population
Sample from the comparison population
\(N\) total shoes
Identify similar shoes from the comparison population
\(S\) similar shoes in the \(N\) shoe sample
Estimate the probability of a coincidental match: \[\hat{p} = \frac{S}{N}\]
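
As a minimal sketch of the estimate above, the following Python computes \(\hat{p} = S/N\) along with a Wilson score interval. The interval is our own illustrative addition to show sampling uncertainty, not part of the procedure on the slide, and the counts in the usage line are made up.

```python
import math

def coincidental_match_estimate(similar, total, z=1.96):
    """Return (p_hat, lower, upper) for S similar shoes in a sample of N.

    The Wilson score interval is an illustrative choice for expressing
    sampling uncertainty; z=1.96 corresponds to roughly 95% coverage.
    """
    if total <= 0 or not 0 <= similar <= total:
        raise ValueError("need 0 <= similar <= total and total > 0")
    p_hat = similar / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    ) / denom
    return p_hat, max(0.0, center - half_width), min(1.0, center + half_width)

# Hypothetical counts: 12 similar shoes in a sample of 400.
p_hat, lower, upper = coincidental_match_estimate(12, 400)
print(f"p_hat = {p_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

A small \(N\) gives a wide interval, which is one way to see why the size of the local sample matters as much as the count of similar shoes.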

“Quantifying the frequency of shoes in a local population is an unsolvable problem” - Leslie Hammer, Hammer Forensics, March 2018

Obstacles: Characterizing Comparison Populations

No 100% complete database of all shoes
- manufacturer, model, size, tread style, manufacturing molds
Shoe purchases vs. frequency of wear (temperature, weather dependence)
Local populations may differ wildly
New tread patterns appear frequently

For starters, while there are databases for other pattern match evidence, like tire tread patterns, there is not a complete database of all shoes sold in the US. Tires have to be certified; shoes do not. There are also many more shoe manufacturers, and new models are released all the time. A single model may have multiple tread patterns, and a single tread pattern may be used on multiple shoe models. The tread pattern may change depending on the style of shoe; there are also different molds for a single size/tread combination, and these molds may have different characteristics.

You might think about tracking sales data instead - surely, we could get a database of shoe preferences that way? How many of you have shoes in your closet that you’ve never worn? That you’ve worn once? Or less than once a year? Purchase data doesn’t provide a realistic picture of the shoes people wear day to day - most of us have one or two “favorites”. In addition, purchase data gives us no information about how the match probability changes with season and weather. Obviously, most people aren’t wearing sandals in the middle of winter, but there aren’t any studies of footwear frequency to back that up with data.

In addition, we know that local populations differ wildly in footwear choices. The footwear worn on campus might not be all that similar to the footwear worn near the capitol building, because the populations that frequent them are different and the dress codes are different. This is another problem with sales data - it doesn’t generalize well to the hyper-local regions that we might want to consider when characterizing coincidental match probability.

So how do we solve this problem? How do we collect this data at a (potentially) neighborhood level?

Relevant Features

Make, Model, Tread pattern, Size, Type of shoe
Cannot be used to identify an individual match
Used for exclusion

In forensics, class characteristics are broad descriptors shared by many different individual objects. In shoes, class characteristics refer to make, model, tread pattern, size, type of shoe, and even wear patterns. Examiners will say that a suspect’s shoe “is consistent with” prints left at the scene, but if the match is made on class characteristics alone (95% of the time), they cannot explicitly connect the shoe and the print at the crime scene.

Randomly acquired characteristics, which occur due to random damage as the shoe is worn or during the manufacturing process, can be used to make an individualized match.

We’ve already discussed why make and model are difficult to work with - there’s no indexed data set to use. Similarly, shoe size isn’t as related to tread size as you’d expect, so that’s off the list too. Working with tread pattern seems like a better option.

Relevant Features

Features other than make/model and size:

Knockoffs often have very similar tread patterns
Similar styles have similar tread patterns across brands
Unknown shoes can still be classified and assessed

Dr. Martens	Eastland	Timberland

Work 2295 Rigger	1955 Edition Jett	6” Premium Boot

If we work off of features within the shoe tread, we get some additional benefits.

First, similar tread patterns are found in shoes of similar style. Knockoffs specifically try to emulate a tread pattern, but even across well known brands, shoes that serve a similar function often have similar tread patterns. Here are 3 different models of work boots, from different manufacturers, each with the same tread pattern. The number of design elements may differ slightly, but that variation happens even within shoe make and model - different sizes have different tread elements in some cases. These shoes would all leave a similar print, so working with the entire set of shoes with these features makes more sense than specifically identifying the make and model.

An additional benefit is that unknown shoes can still be classified and assessed. If we define our feature set as “Shoes with quadrilaterals around the edge that have triangle cutouts, and diamond-shaped plus signs in the middle”, we can start off by estimating the probability that a shoe like these 3 exists, and then increase the specificity of the query from there as data quality and quantity allow. It’s definitely not a perfect solution, but crime scene prints are typically degraded, so this is a level of detail that matches the practical problem fairly well. It’s an abstraction, but at a level that makes sense both statistically and pragmatically.
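
To make the “start coarse, then add specificity” idea concrete, here is a small Python sketch. It assumes each sampled shoe has already been reduced to a set of tread features; the sample, the feature names, and the `match_fraction` helper are all hypothetical and only illustrate the shape of the query.

```python
# Hypothetical sample: each shoe is the set of tread features seen on its outsole.
sample = [
    {"quadrilateral", "triangle", "star"},
    {"quadrilateral", "triangle"},
    {"circle", "text"},
    {"chevron", "line"},
    {"quadrilateral", "circle", "text"},
    {"quadrilateral", "triangle", "star", "text"},
]

def match_fraction(shoes, query):
    """Fraction of shoes whose feature set contains every feature in the query."""
    matches = sum(1 for features in shoes if query <= features)
    return matches / len(shoes)

# Start with a coarse query, then add features to make it more specific.
queries = [
    {"quadrilateral"},
    {"quadrilateral", "triangle"},
    {"quadrilateral", "triangle", "star"},
]
for query in queries:
    print(sorted(query), match_fraction(sample, query))
```

Each added feature can only keep or shrink the set of matching shoes, so the estimated fraction never increases as the query gets more specific.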

Automatic Shoe Data Acquisition

Design Philosophy

Requirements (Outdoor)

Requirements (Indoor)

Tech Specs

Scanner Demonstration

Automatic Feature Identification

Automatic Feature ID Goals

ID geometric features in outsole images
Robust
- lighting conditions
- rotation
- image quality
- tread colors
Fast processing of new images
Identify features using human-friendly terms

Statistical Importance

Assemble a database of shoe images
- from local populations
- with identified features
Calculate random match probability
Provide more weight to class characteristic comparisons
- eventually, probabilistic comparisons?

Computer Vision

Baby’s First Feature Set

Bowtie	Chevron	Circle

Line	Polygon	Quadrilateral

Star	Text	Triangle

Used to separate shoes by make/model in (small) local samples

Labeling the Data

Screenshot from LabelStudio demonstrating the labeling process.

Model Training

Provide images and labels to the algorithm
Algorithm tries to reduce the mismatch between its predictions and the labels (loss function)
End result is an algorithm which takes new images and outputs matching labels (with a corresponding probability)

Babies work similarly, but are a lot cuter (and a lot more needy)
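
The training loop can be sketched in a few lines of Python. This is a toy stand-in, not the model from the talk: instead of images it uses two made-up numeric features per example, a single hypothetical “contains a circle” label, a logistic model, and plain gradient descent on a cross-entropy loss.

```python
import math

# Toy stand-in for images: two made-up numeric features per example,
# with label 1 if the (hypothetical) image contains a circle, else 0.
features = [(0.9, 0.1), (0.8, 0.3), (0.2, 0.7), (0.1, 0.9)]
labels = [1, 1, 0, 0]

def predict(weights, bias, x):
    """Probability that the label applies (logistic model)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 / (1 + math.exp(-score))

def loss(weights, bias):
    """Mean binary cross-entropy: the mismatch between predictions and labels."""
    total = 0.0
    for x, y in zip(features, labels):
        p = predict(weights, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(features)

weights, bias, step = [0.0, 0.0], 0.0, 0.5
initial_loss = loss(weights, bias)
for _ in range(200):
    # Gradient of the mean cross-entropy for a logistic model.
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in zip(features, labels):
        error = predict(weights, bias, x) - y
        grad_w = [g + error * xi / len(features) for g, xi in zip(grad_w, x)]
        grad_b += error / len(features)
    weights = [w - step * g for w, g in zip(weights, grad_w)]
    bias -= step * grad_b
final_loss = loss(weights, bias)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
print(f"P(circle) for a new example: {predict(weights, bias, (0.85, 0.2)):.2f}")
```

The last line mirrors the end result on the slide: a new input goes in, and a label comes out with a corresponding probability.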

Classification vs. Detection

Classification assigns an image to one or more of a fixed set of categories

Detection identifies the location of objects in an image and assigns a label

All dogs are identified and have bounding boxes.
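
One way to see the difference is in the shape of what each kind of model returns. The sketch below uses invented numbers and plain dictionaries; the field names (`label`, `score`, `box`) are assumptions for illustration, not the output format of any particular library.

```python
# Multi-label classification: one probability per category for the whole image.
classification_output = {"circle": 0.91, "quadrilateral": 0.78, "text": 0.12}

# Detection: one entry per object found, each with its own location.
# Boxes are (x_min, y_min, x_max, y_max) in pixels.
detection_output = [
    {"label": "circle", "score": 0.88, "box": (40, 60, 90, 110)},
    {"label": "circle", "score": 0.81, "box": (150, 60, 200, 110)},
    {"label": "quadrilateral", "score": 0.74, "box": (30, 200, 220, 260)},
]

def labels_present(detections):
    """Collapse detections to the set of labels, discarding location and count."""
    return {d["label"] for d in detections}

def count_by_label(detections):
    """Count how many objects of each label were located."""
    counts = {}
    for d in detections:
        counts[d["label"]] = counts.get(d["label"], 0) + 1
    return counts

print(sorted(labels_present(detection_output)))
print(count_by_label(detection_output))
```

Detection output can always be collapsed to classification-style labels, but not the other way around: the classification result says nothing about where the circles are or how many there are.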

Results

When classifying images, we get fairly good results, though some classes are confused.

Definitions

Blue: Prediction matches image label

Grey: Prediction does not match image label

We created a Shiny application to see the images and the model’s predictions. Blue means that the image has that label; grey means it does not. I’ve selected two images that show both correct and incorrect model classifications.

In the first image, the design is labeled as a quadrilateral and the model identifies that, but it also very strongly identifies the image as containing a circle. When we look at the image, the confusion is understandable. One half of the shape is angular, the other is rounded, so the shape has features of both a quadrilateral and a circle. We’ve decided to label these images as both (owing to the ambiguity), but that means we have to correct all of the previously labeled images. We’re working on that.

In the second image, the model predicts circles, quadrilaterals, and text, but the image is labeled as having quadrilaterals and text. The circles happen to be part of the text (and the letters aren’t even Os), and our brains pick up on the text but ignore the circles because we perceive things holistically; the model does not. We’re also in the process of updating these labels, because again, the data is not correct; the model absolutely is.
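
A comparison like the one in these two examples can be sketched as a set difference between thresholded predictions and the annotated labels. The probabilities below are made up to mirror the second image, and the 0.5 cutoff and the `compare` helper are assumptions for illustration rather than the settings used in the application.

```python
THRESHOLD = 0.5  # assumed cutoff for calling a label "predicted"

def compare(predicted_probs, annotated_labels, threshold=THRESHOLD):
    """Split labels into agreements and the two kinds of disagreement."""
    predicted = {label for label, p in predicted_probs.items() if p >= threshold}
    return {
        "agree": sorted(predicted & annotated_labels),
        "predicted_only": sorted(predicted - annotated_labels),
        "annotated_only": sorted(annotated_labels - predicted),
    }

# Made-up probabilities mirroring the second example image.
probs = {"circle": 0.83, "quadrilateral": 0.90, "text": 0.95, "star": 0.02}
result = compare(probs, {"quadrilateral", "text"})
print(result)
```

Images with a non-empty `predicted_only` list are natural candidates for a second look at the labels, since in cases like this one the disagreement points at the annotation rather than the model.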

We’re trying to ensure that the data used to train the model is of very high quality, while not spending millions of dollars to hire workers online to label things. Because we determined the guidelines for labeling the data, labeled the data (or oversaw the labeling), and trained the model ourselves, we have the advantage of knowing the flaws at every point in the process; that means we have the responsibility to fix those flaws where possible.

We’re not doing inference on the model results at this point (nor planning to use the data we’re training the model with during the operational stage) so the data -> model -> fix data loop is less of a validity concern.

When the model is sufficiently well-calibrated, we can then work with engineers to build the device, collect some initial data, and tweak the model weights with new data that better represents what we’ll actually see from the collection equipment. By that point, hopefully we’ll also have narrowed down the geometric classification scheme so that categories that are now somewhat fuzzy are more clearly operationalized.

What does the model see?

Unscaled heatmap - DC Yellow = high activation

Blue: Prediction matches image label

Grey: Prediction does not match image label

Class Characteristic Labeling Activity

Questions

Discussion

Collaborate with us!
Collect population level data
Data sharing
Other uses for the scanner or software?

Susan Vanderplas: susan.vanderplas@unl.edu

Rick Stone: rstone@iastate.edu