Graphical Testing and Principles for Graph Design
2023-03-27
PhD/MS in Statistics from Iowa State
BS in Applied Math and Psychology from Texas A&M
Research Areas
Fundamental Goals
A statistic is a quantity computed from values in a sample and used for a statistical purpose
Source: Wikipedia
\[\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\]
A chart is a graphical representation for data visualization, in which the data is represented by symbols
Source: Wikipedia
Charts are computed from values in a sample (usually) and used for a statistical purpose
So… charts are statistics!
Data from Tidy Tuesday, 2020-09-01
If charts are statistics, then what is the reference distribution?
What constitutes an “extreme” or “significant” chart?
Hypothesis testing:
take a sample
calculate a test statistic
compare test statistic to reference distribution
(formed by \(H_0\))
if it is unlikely, reject null hypothesis
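The four steps above can be sketched with a permutation test, which is one convenient way to build a reference distribution under \(H_0\). This is only an illustration: the data values, the function name `permutation_test`, and the choice of a difference in means as the test statistic are assumptions made for the example, not part of the talk.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sample permutation test for a difference in means (illustrative)."""
    rng = random.Random(seed)
    # Step 2: calculate the test statistic from the sample.
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Step 3: build the reference distribution by relabelling under H0
        # (group labels are exchangeable if there is no group difference).
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Step 1: take a sample (made-up numbers for illustration).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
group_b = [6.9, 7.2, 6.5, 7.0, 6.8, 7.4]

stat, p_value = permutation_test(group_a, group_b)
# Step 4: reject H0 if the observed statistic is unlikely under H0.
reject_null = p_value < 0.05
```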
Graphical hypothesis testing:
take a sample
create a test statistic/graph
compare graph to a reference distribution of other graphs generated under \(H_0\)
if test graph “stands out” then reject null hypothesis
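A lineup can be assembled in the same spirit: the observed data go into one randomly chosen panel and every other panel holds data generated under \(H_0\). The sketch below assumes a null hypothesis of no association between `x` and `y`, so null panels are made by permuting `y`. The function name `make_lineup`, the default of 20 panels, and the example data are hypothetical choices for illustration.

```python
import random


def make_lineup(x, y, m=20, seed=None):
    """Return m panels of (x, y) pairs and the 1-based position of the real data.

    Null panels permute y, which breaks any association with x
    (an assumed H0 of "no association"; other nulls need other generators).
    """
    rng = random.Random(seed)
    target_index = rng.randrange(m)  # where the observed data will be hidden
    panels = []
    for i in range(m):
        if i == target_index:
            panels.append(list(zip(x, y)))       # the test "statistic" (real data)
        else:
            y_null = list(y)
            rng.shuffle(y_null)                  # one draw from the null
            panels.append(list(zip(x, y_null)))
    return panels, target_index + 1


# Made-up data with a clear linear trend.
x = list(range(10))
y = [2 * v + 1 for v in x]
panels, target_panel = make_lineup(x, y, m=20, seed=1)
```

Each element of `panels` would then be drawn as one small plot, and evaluators are asked which panel stands out without being told `target_panel`.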
The plot is a statistical lineup
The method is visual inference
(a graphical hypothesis test)
Many factors influence the results
31 Evaluations
Panel | % selected |
---|---|
12 | 9.7% |
5 | 29.0% |
18 | 32.3% |
Other | 29.1% |
22 Evaluations
Panel | % selected |
---|---|
12 | 59.1% |
5 | 9.1% |
18 | – |
Other | 31.7% |
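One common way to turn selection counts like these into a p-value is a binomial tail probability. The sketch below rests on several assumptions that are not stated in the tables: each evaluator picks exactly one panel, evaluators act independently, every panel is equally likely to be picked under \(H_0\), and the lineup has 20 panels. The count of 13 is back-calculated from 59.1% of 22 evaluations, and `visual_p_value` is a hypothetical helper name.

```python
from math import comb


def visual_p_value(n_selected, n_evaluations, m=20):
    """P(X >= n_selected) for X ~ Binomial(n_evaluations, 1/m).

    Assumes independent evaluators who each choose one of m panels
    uniformly at random when H0 is true.
    """
    p = 1.0 / m
    return sum(
        comb(n_evaluations, k) * p**k * (1 - p) ** (n_evaluations - k)
        for k in range(n_selected, n_evaluations + 1)
    )


# 59.1% of 22 evaluations corresponds to about 13 selections of panel 12.
p_panel_12 = visual_p_value(13, 22, m=20)
```

When evaluators may select several panels, or when the same null plots are shown to every evaluator, this simple model no longer holds exactly and the result is best read as a rough guide.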
Modify the lineup protocol for tests of competing hypotheses \(H_1\) and \(H_2\)
Two target plots, one for \(H_1\) and one for \(H_2\)
18 null plots generated using a mixture model consistent with \(H_0\)
\(K = 3, 5\) clusters
\(N = 15 K\) points
\(\sigma_T = 0.25, 0.35, 0.45\) (variability around the trend line)
\(\sigma_C = \begin{cases}0.25, 0.30, 0.35 & (K = 3)\\0.20, 0.25, 0.30 & (K = 5)\end{cases}\) (variability around the cluster centers)
\(\lambda = 0.5\) (mixture parameter)
18 combinations of plot parameters (2 values of \(K\) \(\times\) 3 values of \(\sigma_T\) \(\times\) 3 values of \(\sigma_C\))
3 replicates of each parameter set = 54 total lineup data sets
10 Aesthetics \(\times\) 54 data sets = 540 plots
1201 participants from Mechanical Turk
Each participant evaluates 10 plots (12010 evaluations)
Participants select the plot or plots which are most different
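The design counts above can be checked by enumerating the parameter grid. The sketch uses only the values listed on the slides; the variable names and the dictionary layout are illustrative assumptions.

```python
from itertools import product

SIGMA_T = [0.25, 0.35, 0.45]
# The sigma_C levels depend on the number of clusters K.
SIGMA_C = {3: [0.25, 0.30, 0.35], 5: [0.20, 0.25, 0.30]}
REPLICATES = 3
AESTHETICS = 10
PARTICIPANTS = 1201
PLOTS_PER_PARTICIPANT = 10

# One entry per combination of K, sigma_T and sigma_C, with N = 15 K points.
parameter_sets = [
    {"K": k, "N": 15 * k, "sigma_T": s_t, "sigma_C": s_c}
    for k in SIGMA_C
    for s_t, s_c in product(SIGMA_T, SIGMA_C[k])
]

# Three replicate data sets per parameter combination.
datasets = [
    dict(params, replicate=r)
    for params in parameter_sets
    for r in range(1, REPLICATES + 1)
]

n_plots = AESTHETICS * len(datasets)
n_evaluations = PARTICIPANTS * PLOTS_PER_PARTICIPANT
```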
Most participants identified a mix of cluster and trend targets
Examine trials in which participants identified at least one target: 9959 trials
Compare P(select cluster target) to P(select trend target)
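\[C_{ijk} := \left\{\begin{array}{c}\text{Participant }k\text{ selects the cluster target }\\ \text{for dataset }j\text{ with aesthetic }i\end{array}\right\}\]
\[T_{ijk} := \left\{\begin{array}{c}\text{Participant }k\text{ selects the trend target }\\ \text{for dataset }j\text{ with aesthetic }i\end{array}\right\}\]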
\[\text{logit} P(C_{ijk}|C_{ijk}\cup T_{ijk}) = \mathbf{W}\alpha + \mathbf{X}\beta + \mathbf{J}\gamma + \mathbf{K}\eta\]
\(\alpha\): vector of fixed effects describing data parameters \(\sigma_C,\sigma_T, K\)
\(\beta\): vector of fixed effects describing aesthetics \(1 \leq i \leq 10\)
\(\gamma_j\): random effect of dataset, \(\gamma_j\sim N(0, \sigma^2_{\text{data}})\)
\(\eta_k\): random effect of participant \(\eta_k\sim N(0, \sigma^2_{\text{participant}})\)
\(\epsilon_{ijk}\): error associated with single evaluation of plot \(ij\) by participant \(k\), \(\epsilon_{ijk}\sim N(0, \sigma^2_e)\)
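As a reading aid, the response in the model can be unpacked as follows, with \(T_{ijk}\) taken to be the analogous event for the trend target. The symbols \(p\) and \(\ell\) are introduced here only for illustration. Because \(C_{ijk} \subseteq C_{ijk}\cup T_{ijk}\), the conditional probability is a simple ratio, and the logit link maps it to the linear predictor:

\[p = P(C_{ijk}\mid C_{ijk}\cup T_{ijk}) = \frac{P(C_{ijk})}{P(C_{ijk}\cup T_{ijk})}, \qquad \text{logit}(p) = \log\frac{p}{1-p} = \ell \iff p = \frac{1}{1+e^{-\ell}}\]

A linear predictor of \(\ell = 0\) therefore corresponds to \(p = 0.5\): among trials in which at least one target was selected, the cluster target was selected half the time. Positive values of \(\ell\) mean the cluster target was selected in more than half of those trials.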
Some of the null plots were missing an ellipse because we failed to enforce group size constraints on the k-means algorithm.