Clusters Beat Trend!? Testing Feature Hierarchy in Statistical Graphics
Susan VanderPlas & Heike Hofmann
Iowa State University
Graphics and Perception
The greatest value of a picture is when it forces us to notice what we never expected to see.
John Tukey
Gestalt Laws of Perception
The whole is different than the sum of the parts
-
Rules that make sense of complex visual information using experience
-
Information organized hierarchically
-
Subconscious process to order and group visual input
Gestalt Plots
How do plot aesthetics change our perception of the plotted data?
Statistical Lineups
Which plot is the most different?
Null plot data is from a data-generating method consistent with the null hypothesis
The nullabor
package helps with null data creation
Which plots are the most different?
Which plots are the most different?
Which plots are the most different?
-
22 Evaluations
-
Plot 12: 59.1%
-
Plot 5: 9.1%
-
Other: 31.7%
Which plots are the most different?
-
31 Evaluations
-
Plot 12: 9.7%
-
Plot 5: 29.0%
-
Plot 18: 32.3%
-
Other: 29.1%
Two-Target Lineups
-
Modify lineup protocol for tests of competing hypotheses \(H_1\) and \(H_2\)
-
\(H_1\) and \(H_2\) target plots
-
18 null plots generated using a mixture model consistent with \(H_0\)
5, 12
Data Generating Mechanism
- Generate data from a linear model \(M_T\) (trend)
- Generate data from a \(k\) cluster model \(M_C\)
- Generate null data from a mixture model \(M_0\)
- \(n_c\) observations from \(M_C\)
- \(n_t = N - n_c\) observations from \(M_T\)
Linear Model
Parameter: \(\sigma_T\), the variability around the trend line
- Generate evenly spaced \(x_i\) in \([-1, 1]\)
- Jitter \(x_i\)
- Generate \(y_i = x_i + e_i\), \(e_i \sim N(0, \sigma_T^2)\)
- Center and scale \(x_i, y_i\)
Cluster Model
Parameters: \(K\) clusters, \(\sigma_C\) cluster variability
- Generate \(K\) cluster centers \(c^x,c^y\) on a \(K\times K\) grid such that \(cor(c^x, c^y) \in [.25, .75]\)
- Center and standardize \(c^x, c^y\)
- Determine cluster size \(g_1, ..., g_K \sim Multinomial(K, p)\)
- Generate points around cluster centers: \((x_i, y_i) = (c^x_{g_i}, c^y_{g_i}) + (e_i^x, e_i^y)\) where \(e_i \sim N(0, \sigma_c^2)\)
- Center and scale \(x_i, y_i\)
Cluster Model
Mixture Model
- \(n_c\) points from \(M_C\), where \(n_c \sim Binomial(N, \lambda)\)
- \(N - n_c = n_T\) points from \(M_T\)
Groups created by k-means clustering
Mixture Model
Experimental Design - Data Parameters
- \(K = 3, 5\)
- \(N = 15 K\)
- \(\sigma_T = 0.25, 0.35, 0.45\)
- \(\sigma_C = \begin{array}{cc}0.25, 0.30, 0.35 (K = 3)\\0.20, 0.25, 0.30 (K = 5)\end{array}\)
- \(\lambda = 0.5\)
18 combinations of plot parameters (\(2K \times 3\sigma_T \times 3\sigma_C\))
3 replicates of each parameter set; 54 total lineup data sets
Experimental Design - Plot Aesthetics
10 Aesthetics \(\times\) 54 data sets = 540 plots
Experimental Design
- 1201 participants from Mechanical Turk
- Each participant evaluates 10 plots (12010 evaluations)
- Each \(\sigma_C \times \sigma_T\) value with one replicate, randomized across \(K\) values
- All 10 aesthetic types
- Participants select the plot or plots which are most different
- Provide a short explanation
- Rate confidence level
Results
Most participants identified a mix of cluster and trend targets
Results
Faceoff Model
- Examine trials in which participants identified at least one target (9959)
- Compare P(select cluster target) to P(select trend target)
\[C_{ijk} := \left\{\begin{array}{c}\text{Participant }k\text{ selects the cluster target }\\
\text{for dataset }j\text{ with aesthetic }i\end{array}\right\}\]
Faceoff Model
\[\text{logit} P(C_{ijk}|C_{ijk}\cup T_{ijk}) = \mathbf{W}\alpha + \mathbf{X}\beta + \mathbf{J}\gamma + \mathbf{K}\eta\]
- \(\alpha\): vector of fixed effects describing the effect of data parameters \(\sigma_C,\sigma_T, K\)
- \(\beta\): vector of fixed effects describing the effect of aesthetics \(1 \leq i \leq 10\)
- \(\gamma_j\): random effect of dataset, \(\gamma_j\sim N(0, \sigma^2_{\text{data}})\)
- \(\eta_k\): random effect of participant \(\eta_k\sim N(0, \sigma^2_{\text{participant}})\)
- \(\epsilon_{ijk}\): error associated with single evaluation of plot \(ij\) by participant \(k\), \(\epsilon_{ijk}\sim N(0, \sigma^2_e)\)
Faceoff Model
Participant Reasoning: Plain plots
Participant Reasoning: Trend plots
Participant Reasoning: Color plots
Participant Reasoning: Color + Ellipse plots
Participant Reasoning
Some of the null plots were missing an ellipse - We failed to enforce group size constraints on k-means algorithm.
Conclusion
- Plot aesthetics matter
- non-additive effects
- what do you want to emphasize?
- Multiple encoding is useful -
“show the data” in a way that makes it easy to understand
Conclusion
- Error bands and cluster ellipses highlight important features in the data:
outliers, group size inequality, variability, clustering
- Null data-generating models are hard!
The brain runs 100s of visual “tests” and designing for all of them simultaneously is impossible