Creating Effective Graphics

Susan Vanderplas

2024-11-13

Outline

  1. Why do we make charts?

  2. What makes a chart effective?

  3. What chart works for my data?

  4. Which chart(s) should I use?

Flowchart showing different data types and corresponding graphics.

Why?

Why do we make charts?

Tables are Tedious

id hd ra dec dist rv mag absmag spect
0 NA 0.000000 0.000000 0.0000 0.0 -26.70 4.850 G2V
1 224700 0.000060 1.089009 219.7802 0.0 9.10 2.390 F5
2 224690 0.000283 -19.498840 47.9616 0.0 9.27 5.866 K3V
3 224699 0.000335 38.859279 442.4779 0.0 6.61 -1.619 B9
4 224707 0.000569 -51.893546 134.2282 0.0 8.06 2.421 F0V
5 224705 0.000665 -40.591202 257.7320 0.0 8.55 1.494 G8III
6 NA 0.001246 3.946458 55.0358 0.0 12.31 8.607 M0V:
7 NA 0.001470 20.036114 57.8704 0.0 9.64 5.828 G0
8 224709 0.001823 25.886461 200.8032 -31.0 9.05 2.536 M6e-M8.5e Tc
9 224708 0.002355 36.585958 420.1681 0.0 8.59 0.473 G5
10 224717 0.002424 -50.866976 92.3361 0.0 8.59 3.763 F6V
11 224720 0.002488 46.939997 239.2344 -25.8 7.34 0.446 A2
12 224715 0.002727 -35.960225 307.6923 0.0 8.43 0.989 K4III
13 224728 0.002780 -22.594705 100000.0000 0.0 8.80 -11.200 K0III
14 224726 0.003228 -0.360450 205.7613 0.0 7.25 0.683 K0
15 236267 0.003356 50.791187 523.5602 0.0 8.60 0.005 K2
16 224732 0.003401 -40.192392 136.4256 0.0 8.15 2.476 F3V
17 NA 0.003469 -54.914363 751.8797 0.0 11.71 2.329 NA
18 NA 0.003522 -4.053680 44.8229 0.0 11.03 7.773 K5
19 224721 0.003554 38.304050 249.3766 16.0 6.53 -0.454 G5
20 224723 0.004203 23.529228 96.6184 0.0 8.51 3.585 G0
21 224724 0.004426 8.007234 189.3939 0.0 7.55 1.163 K2
22 224735 0.004674 -49.352266 318.4713 0.0 8.69 1.175 G8/K0III/IV
23 224742 0.004971 13.312234 86.2813 0.0 7.57 2.890 F2V
24 224746 0.005091 -23.452695 105.4852 0.0 9.05 3.934 G0V
25 224750 0.005300 -44.290561 81.3670 3.0 6.28 1.728 G3IV
26 224744 0.005606 -13.393378 100.8065 0.0 9.13 4.113 F7V
27 224748 0.005720 -41.297813 107.6426 0.0 9.32 4.160 G5V
28 224749 0.005815 -43.361821 198.0198 0.0 8.83 2.346 F3/F5V
29 224751 0.006146 -49.107945 411.5226 0.0 9.14 1.068 G8III
30 224757 0.006408 42.141474 244.4988 0.0 8.26 1.319 A0
31 224760 0.006539 2.675477 306.7485 0.0 7.63 0.196 K2
32 224756 0.006572 51.939487 330.0330 0.0 9.09 1.497 B8
33 224743 0.006611 -10.462385 113.7656 0.0 8.10 2.820 F2
34 224758 0.006638 26.918108 74.6269 0.0 6.43 2.066 F7.5IV-V
35 224745 0.006864 -14.490484 145.3488 0.0 9.07 3.258 G8/K0III/IV
36 224759 0.006893 12.267128 184.5018 0.0 7.68 1.350 K0
37 224764 0.007022 -47.179567 381.6794 0.0 10.44 2.532 F0V
38 224752 0.007429 -79.061983 42.3012 0.0 8.65 5.518 G6V
39 224763 0.007485 -16.697009 92.5069 10.0 7.46 2.629 F3V
40 224768 0.007771 54.302236 364.9635 0.0 8.70 0.889 A0
41 NA 0.008131 67.216783 100000.0000 0.0 10.61 -9.390 B…
42 224771 0.008362 25.844765 163.3987 0.0 8.20 2.134 F2
43 224784 0.008593 59.559679 136.2398 -33.0 6.18 0.508 G9III-IV
44 224776 0.008850 -3.306323 335.5705 0.0 7.91 0.281 K2
45 224766 0.008941 -72.202717 62.3053 0.0 9.59 5.617 G6/G8V:
46 224778 0.008972 -25.622285 833.3333 0.0 8.57 -1.034 K1III
47 NA 0.009006 -56.835602 40.7498 0.0 10.78 7.729 K3V
48 224780 0.009015 -40.690474 671.1409 0.0 7.31 -1.824 K0/K1III
49 NA 0.009304 16.669049 375.9398 0.0 9.53 1.654 F5
50 224782 0.009533 -53.097713 59.4177 0.0 6.49 2.620 G1IV
51 224774 0.009561 1.066213 100000.0000 0.0 8.94 -11.060 K0
52 224767 0.009781 -77.020116 209.6436 0.0 8.56 1.953 A4V
53 NA 0.009828 -29.263414 100000.0000 0.0 10.96 -9.040 NA
54 NA 0.010170 17.968908 50.2008 0.0 10.57 7.066 M:
55 224783 0.010549 -66.683173 64.9351 0.0 7.40 3.338 G2IV/V
56 224785 0.010862 0.222931 253.8071 0.0 8.12 1.097 K0
57 224789 0.011212 -69.675965 29.8686 0.0 8.27 5.894 K2V
58 224792 0.011577 62.175898 38.8048 16.0 7.05 4.106 G0
59 236270 0.011630 55.722460 100000.0000 -26.0 9.09 -10.910 B5
60 224788 0.011695 -64.465799 172.1170 0.0 8.34 2.161 F2IV
61 236269 0.011749 53.822146 178.8909 0.0 8.86 2.597 K5
62 224800 0.011998 -45.422739 308.6420 0.0 8.27 0.823 G8III
63 224801 0.012119 45.253334 187.9699 -3.0 6.36 -0.010 B9p SiEu
64 224798 0.012305 -27.907582 243.9024 0.0 8.08 1.144 K2III
65 NA 0.012498 -54.830524 54.7046 0.0 11.00 7.310 NA
66 224790 0.012567 -72.318220 127.2265 0.0 8.55 3.027 F2V
67 224806 0.013288 23.538219 150.3759 0.0 7.83 1.944 F5
68 224808 0.013360 16.988197 31.5856 -21.9 8.79 6.293 K0
69 224804 0.013427 30.395890 182.8154 0.0 8.33 2.020 A0
70 NA 0.013512 36.777637 168.3502 0.0 10.42 4.289 NA
71 224803 0.013803 36.780106 136.9863 0.0 8.26 2.577 G5
72 224810 0.014661 -12.828961 87.8735 0.0 8.97 4.251 G3/G5V
73 224826 0.014694 66.848011 280.1120 -12.0 6.90 -0.337 K2
74 NA 0.014817 35.752624 44.5236 0.0 9.93 6.687 K5
75 224821 0.014882 -50.446459 641.0256 0.0 7.42 -1.614 K4III
76 NA 0.015214 32.825654 346.0208 0.0 9.03 1.334 M0
77 224820 0.015371 -30.064166 208.7683 -1.0 8.41 1.812 A0V
78 224816 0.015487 17.867851 709.2199 0.0 8.06 -1.194 K0
79 NA 0.015718 35.316720 271.0027 0.0 8.69 1.525 F5
80 224817 0.016187 -11.823739 73.0994 -7.0 8.40 4.080 G2V
81 224828 0.016228 -4.932534 43.7828 0.5 8.57 5.363 G5
82 NA 0.016534 -10.935963 100000.0000 0.0 9.50 -10.500 G5
83 236274 0.016538 52.080562 543.4783 0.0 8.85 0.174 M0
84 NA 0.016824 27.886330 47.7783 0.0 9.61 6.214 K0
85 NA 0.017075 -24.714048 304.8780 0.0 10.62 3.199 A2
86 224836 0.017361 69.603505 272.4796 0.0 8.05 0.873 B9
87 224829 0.017433 -5.835002 340.1361 0.0 7.80 0.142 F5
88 224834 0.017941 -48.809876 181.8182 8.0 5.71 -0.588 G8III
89 224837 0.018477 53.166943 581.3953 0.0 7.87 -0.952 K2
90 224842 0.018671 -41.887544 113.2503 0.0 7.64 2.370 F5IV/V
91 224840 0.018702 -5.874348 301.2048 0.0 7.66 0.266 K0
92 224841 0.019379 -20.705550 404.8583 0.0 8.68 0.643 G8III
93 224839 0.019490 -0.076098 61.1995 0.0 8.12 4.186 F8V
94 NA 0.019665 -32.756836 90.7441 0.0 10.10 5.311 G2
95 224847 0.020134 -11.964034 149.2537 0.0 8.67 2.800 F5V
96 NA 0.020330 13.975088 35.1494 0.0 10.46 7.730 M0
97 224851 0.020478 -52.798575 174.8252 0.0 9.87 3.657 F6/F7V
98 224849 0.020492 -21.404042 180.1802 0.0 9.48 3.201 F5V
99 224855 0.021069 60.355282 877.1930 -34.0 7.04 -2.675 C5p

Charts are easier to read!

A scatter plot showing the color index of a star on the x-axis and the absolute magnitude (brightness) of the star on the y-axis. Points are colored by spectral class, which varies from blue to white to yellow to red as the color index increases and the star's temperature decreases. Points are primarily located along a downward-sloping line from the top left to the bottom right, which is labeled the 'main sequence'. There is another set of points which diverges from the main sequence and extends out horizontally in the middle of the graph; these are labeled 'giants', and a few outliers that are above the giant cluster are labeled 'supergiants'. Below the main sequence stars, there are outliers which are labeled 'dwarfs'.

The Hertzsprung-Russell diagram, developed independently by Ejnar Hertzsprung (1873–1967) and Henry Norris Russell (1877–1957), plots a star's color index against its brightness (absolute magnitude). Plotting the two together revealed that these variables are related and change together over a star's life cycle, a hypothesis that arose only because of this chart.
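The absmag column in the star table is consistent with the standard distance-modulus relation M = m - 5 log10(d / 10), assuming dist is measured in parsecs (the unit is an assumption; the table does not state it). A minimal Python sketch that recomputes it for three rows copied from the table:

```python
import math

def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus: M = m - 5 * log10(d / 10), with d assumed to be in parsecs."""
    if distance_pc <= 0:
        # Row 0 (the Sun) has dist 0, so the formula cannot be applied to it
        raise ValueError("distance must be positive")
    return apparent_mag - 5 * math.log10(distance_pc / 10)

# (mag, dist, absmag) copied from rows 1, 2 and 13 of the table
rows = [(9.10, 219.7802, 2.390), (9.27, 47.9616, 5.866), (8.80, 100000.0, -11.200)]
for mag, dist, absmag in rows:
    print(f"computed {absolute_magnitude(mag, dist):8.3f}   table {absmag:8.3f}")
```

Every row with dist = 100000.0000 ends up with an absolute magnitude near -9 to -11, which suggests that value is a placeholder for an unknown distance. That is hard to notice in the table and easy to notice as a cluster of outliers on a chart.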

Why Charts?

Graphics are a form of external cognition that allow us to think about the data rather than the chart.

Why Charts?

Good graphics take advantage of how the brain works:

  • preattentive processing

  • perceptual grouping

  • visual limitations

Let’s demonstrate …

Preattentive Perception

  • Single features (shape, color) are processed preattentively; combinations of preattentive features require attention

    • Double-encoding (using multiple features for the same variable) is ok
Figure 1: Two scatterplots, each with one point that differs from the rest: (a) by shape, (b) by color. Can you easily spot the different point?

Preattentive Perception

Figure 2: Two scatterplots: (a) shape and color encode the same variable (dual encoded); (b) shape and color encode different variables. Can you easily spot the different point(s)?
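Double encoding, as in panel (a) of Figure 2, is easiest to keep consistent when both features come from a single lookup keyed on the one variable. A minimal Python sketch; the three hex colors are arbitrary example choices, the marker strings assume matplotlib's marker codes, and `double_encode` is a hypothetical helper name:

```python
# Example colors and matplotlib-style marker codes (illustrative choices)
COLORS = ["#1b9e77", "#d95f02", "#7570b3"]
MARKERS = ["o", "^", "s"]  # circle, triangle, square

def double_encode(categories):
    """Map each level of ONE variable to a (color, marker) pair."""
    levels = sorted(set(categories))
    if len(levels) > min(len(COLORS), len(MARKERS)):
        raise ValueError("not enough distinct colors/markers for these levels")
    return {level: (COLORS[i], MARKERS[i]) for i, level in enumerate(levels)}

style = double_encode(["Adelie", "Gentoo", "Chinstrap", "Adelie"])
print(style)
```

Each pair can then be passed to a plotting call, for example as the `c=` and `marker=` arguments of matplotlib's `scatter`, one call per level. Because color and shape are derived from the same key, they can never drift apart and accidentally encode different variables, which is the situation in panel (b).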

Perceptual Grouping

A picture laid out on an x-y coordinate axis. The y-axis is labeled 'Rabbit' and the x-axis is labeled 'Duck'. When viewed with the rabbit axis at the bottom, the image looks like a rabbit with tall ears; when viewed with the duck axis at the bottom, the image looks like a duck, where the ears become the bill.
Figure 3: Is this a rabbit or a duck?

A complex figure that appears to be made up of a black outline of a triangle, three circles laid out to form the points of an inverted triangle, and a white triangle overlaid on top of the three dots. What is actually drawn is a set of three angles at 0, 120, and 240 degrees and a set of three pac-man shapes (circles with a pie slice taken out) at 60, 180, and 300 degrees.
Figure 4: What do you see in this image?

Perceptual Grouping

An image reading 'GESTALT', where each letter demonstrates a principle of gestalt grouping. G has a white stripe over it, demonstrating closure: the stripe and the G are perceived as separate objects. E is shown as a grid of black squares, with grey squares making up the background; this demonstrates proximity, because small objects close together are perceived as part of the same whole. A bar is woven through the S shape, showing good continuation: the S is perceived as a continuous object that passes behind the bar in the middle portion. The two Ts are striped and indicate similarity: they are similarly shaped and patterned and can be perceived as a group. The A and L are connected, and the inside of the A seems to contain a white tree, demonstrating figure/ground. The final T is part of the similarity group.

Grouping in Charts

Bar chart, showing 5 states

Line chart, with one state per line

Box plot by year, all 50 states

Color

  • 10% of XY and 0.2% of XX have color deficiency

  • Avoid rainbow color schemes

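    • unequal perception of different colors gives a misleading sense of distance between values

    • avoid “stoplight” (red/yellow/green) color schemes, which are hard to distinguish with red-green color deficiency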

Figure 5: Plates from the Ishihara colorblindness test
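One way to see why rainbow schemes mislead is to look at lightness alone, which is roughly what a black-and-white printout preserves. A single-hue gradient gets steadily darker, while a rainbow jumps up and down. A minimal Python sketch using the sRGB relative-luminance formula (as defined in WCAG); the hex codes are illustrative examples, not palettes from this talk:

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color: 0 = black, 1 = white."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

rainbow = ["#ff0000", "#ffff00", "#00ff00", "#0000ff"]  # red, yellow, green, blue
blues = ["#f7fbff", "#6baed6", "#08306b"]               # one hue, light to dark

for name, palette in [("rainbow", rainbow), ("blues", blues)]:
    lum = [round(relative_luminance(c), 3) for c in palette]
    print(name, lum, "monotonic" if is_monotonic(lum) else "NOT monotonic")
```

In the rainbow, yellow is far lighter than its neighbors, so equal steps in the data do not look like equal steps on the page; the single-hue gradient avoids that.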

Color

  • Strategies
    • double encoding (color + shape or linetype)
    • make it work with a black and white printer
    • use monochromatic (single color) gradients where possible
    • diverging (bidirectional) scales: pass through a neutral color at the midpoint
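A diverging scale can be sketched as two single-hue ramps that meet at a neutral midpoint. A minimal Python sketch; the three hex colors (a blue, a light grey, a red) are example choices, `diverging_color` is a hypothetical helper, and blending is plain linear interpolation in RGB, which is a simplification:

```python
def hex_to_rgb(hex_color):
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)

def blend(c1, c2, t):
    """Linear blend of two RGB tuples: t=0 gives c1, t=1 gives c2."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def diverging_color(value, vmin, vmax, center=0.0,
                    low="#2166ac", mid="#f7f7f7", high="#b2182b"):
    """Map value to low -> neutral mid -> high, with mid placed at `center`."""
    if not vmin < center < vmax:
        raise ValueError("need vmin < center < vmax")
    value = min(max(value, vmin), vmax)  # clamp out-of-range values
    if value <= center:
        t = (value - vmin) / (center - vmin)  # 0 at vmin, 1 at center
        return rgb_to_hex(blend(hex_to_rgb(low), hex_to_rgb(mid), t))
    t = (value - center) / (vmax - center)    # 0 at center, 1 at vmax
    return rgb_to_hex(blend(hex_to_rgb(mid), hex_to_rgb(high), t))

# e.g. anomalies centered on 0: negative values turn blue, positive values red
for v in (-2, -1, 0, 1, 2):
    print(v, diverging_color(v, vmin=-2, vmax=2))
```

Putting the neutral color at a meaningful center (zero change, the average, a target) lets readers see direction at a glance. Interpolating in a perceptual color space such as CIELAB gives more even steps, and prebuilt diverging palettes such as ColorBrewer's RdBu are usually a better starting point than hand-rolled blends.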

Color

  • Use implicit associations (reduce cognitive load), e.g., scales that run
    • cold to hot
    • neutral to eco-friendly
    • flood to drought
  • These associations may depend on culture
  • Avoid pink = female, blue = male

Categorical Scales

  • Working memory is limited to 7 ± 2 items

  • Avoid legends that have more than 7 categories

  • Color schemes that are friendlier to colorblind viewers vary both hue (color) and lightness
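When a categorical variable has more than about seven levels, one common fix is to keep the most frequent levels and lump the rest into a single "Other" category, so the legend stays within the guideline above. A minimal Python sketch; `lump_categories` is a hypothetical helper, the cutoff of 7 follows the slide, and the "Other" label and toy labels are arbitrary:

```python
from collections import Counter

def lump_categories(values, max_levels=7, other="Other"):
    """Keep the (max_levels - 1) most frequent levels; relabel the rest as `other`."""
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)  # already few enough levels for a legend
    # Ties are broken by first appearance (Counter.most_common keeps insertion order)
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]

labels = list("aabbccddeeffgghhij")  # toy data with 10 distinct labels
lumped = lump_categories(labels)
print(sorted(set(lumped)))
```

One slot is reserved for "Other", so the result has at most seven legend entries in total.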

Accessibility

  • Alt text should answer:
    • Why is the image there?
    • What is the overall meaning?
    • What type of plot is it?
    • What are the important details?
    • What is the data source? (with link!)
  • Use larger text so that people with visual impairments can read it easily

  • Some fonts are easier for people with dyslexia to read
    If you’re working with someone with dyslexia, ask them what font they prefer.
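The alt-text checklist can be turned into a simple fill-in-the-blanks template. A minimal Python sketch; `build_alt_text` and its parameter names are hypothetical, and the example text paraphrases the Hertzsprung-Russell diagram description from earlier in this talk:

```python
def build_alt_text(plot_type, subject, reason, meaning, details=(), source_url=None):
    """Assemble alt text: plot type and subject, why it is shown, overall
    meaning, important details, then the data source if one is given."""
    sentences = [f"{plot_type} {subject}", reason, meaning, *details]
    # Skip empty entries and make sure each sentence ends with exactly one period
    text = " ".join(s.strip().rstrip(".") + "." for s in sentences if s.strip())
    if source_url:
        text += f" Data source: {source_url}"
    return text

alt = build_alt_text(
    "Scatter plot",
    "of star color index (x) against absolute magnitude (y)",
    "It shows how a chart reveals a relationship hidden in a table",
    "Most stars fall along a diagonal band labeled the main sequence",
    details=["A second group branching off horizontally is labeled giants"],
)
print(alt)
```

A template like this does not write good alt text by itself, but it makes it harder to forget one of the elements, especially the link to the data.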

So now what?

Find the right chart for the data

How-To Guides & Sample code

  Program   Resources/Strategy
  R         https://r-graph-gallery.com
  Python    https://python-graph-gallery.com
  SPSS      Chart Builder, Documentation
  MATLAB    https://www.mathworks.com/products/matlab/plot-gallery.html
  SAS       Export data, import into R/Python
  Excel     Export data, import into R/Python
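For SAS and Excel, "export data, import into R/Python" usually means writing a CSV file and reading it back in. A minimal Python sketch using only the standard `csv` module; `read_exported_csv` is a hypothetical name, the missing-value codes ("", "NA", ".") are assumptions (check what your export actually writes), and the sample data is made up:

```python
import csv
import io

def _clean(value, na_values):
    """Return None for missing-value codes, otherwise the original value."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip() in na_values:
        return None
    return value

def read_exported_csv(file_obj, na_values=("", "NA", ".")):
    """Read CSV rows as dicts, turning assumed missing-value codes into None."""
    return [{key: _clean(value, na_values) for key, value in row.items()}
            for row in csv.DictReader(file_obj)]

# Toy export (made-up values); for a real file use open("export.csv", newline="")
exported = io.StringIO("species,sex,body_mass_g\nAdelie,female,3450\nGentoo,NA,\n")
rows = read_exported_csv(exported)
print(rows)
```

Every value comes back as a string, so numeric columns still need converting; pandas' `read_csv` with its `na_values` parameter handles both steps in one call.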

How-To Guides & Sample code

https://srvanderplas.github.io/stat-computing-r-python/

Example - Penguins

Data

Categorical Variables:

  • Island
  • Species
  • Sex
  • Year (2007, 2008, 2009)

Numeric variables:

  • bill length (mm)
  • bill depth (mm)
  • flipper length (mm)
  • body mass (g)

First Steps - Explore!

Important Characteristics

What is important to show about the relationship between sex, species, and bill dimensions?

Facets - Subsets of data

  • Sometimes, it’s helpful to remove missing data (but you should say so!)
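Dropping incomplete rows before faceting, and reporting how many were dropped, might look like this. A minimal Python sketch; the rows are made-up toy values (not the real penguins data), and `drop_missing` and `facet` are hypothetical helper names:

```python
from collections import defaultdict

# Hypothetical toy rows (made-up values, NOT the real penguins data)
penguins = [
    {"species": "Adelie", "sex": "female", "bill_length_mm": 37.0},
    {"species": "Adelie", "sex": None, "bill_length_mm": 38.5},
    {"species": "Gentoo", "sex": "male", "bill_length_mm": 49.0},
    {"species": "Chinstrap", "sex": "female", "bill_length_mm": None},
    {"species": "Gentoo", "sex": "female", "bill_length_mm": 46.0},
]

def drop_missing(rows, columns):
    """Keep rows with no missing (None) value in `columns`; also return how many were dropped."""
    complete = [r for r in rows if all(r.get(c) is not None for c in columns)]
    return complete, len(rows) - len(complete)

def facet(rows, by):
    """Split rows into subsets (facets), one per level of `by`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r)
    return dict(groups)

complete, n_dropped = drop_missing(penguins, ["sex", "bill_length_mm"])
# Say so: report the removal, e.g. in the figure caption
print(f"Removed {n_dropped} of {len(penguins)} rows with missing sex or bill length.")
for species, subset in sorted(facet(complete, "species").items()):
    print(species, len(subset))
```

In this toy example the only Chinstrap row is incomplete, so the Chinstrap facet disappears entirely, which is exactly why the removal should be stated.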

Emphasize Groups

Emphasize Moderating Variables

Questions?

Link to R code