Creating Good Graphics

Susan Vanderplas

2024-06-26

Identifying the Problem

Pie Chart Poll Results

A newspaper clipping from the Scottsbluff Star-Herald, showing a pie chart of support for the marijuana legalization initiative in Nebraska, from Tuesday, March 16, 2021. The yes slice (which seems to be about 56% of the area) is labeled 44%, while the no slice (which seems to be about 44% of the area) is labeled 56%.

Figure 1: Scottsbluff Star-Herald reader poll. Source
  • What is wrong with this chart?

  • Do you think it might be misleading? If so, how?

  • Do you think the mistakes were intentional?
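One way to check Figure 1 is to work out how large each slice should be from its printed label. In this minimal sketch (the helper `slice_angles` is a hypothetical name), a slice showing p percent of the whole spans 360 * p / 100 degrees. The 56% slice should therefore cover about 201.6 degrees, more than half the circle.

```python
# Angle each pie slice should span, given the percentage printed on it.
# A slice showing p percent of the whole covers 360 * p / 100 degrees.

def slice_angles(percentages):
    """Map each label to the angle (in degrees) its slice should span."""
    return {label: 360 * p / 100 for label, p in percentages.items()}

# Percentages as labeled in the Figure 1 poll
poll = {"Yes": 44, "No": 56}

angles = slice_angles(poll)
for label, angle in angles.items():
    print(f"{label} ({poll[label]}%): {angle:.1f} degrees")

# The label with the larger percentage must also get the larger slice.
largest_label = max(poll, key=poll.get)
print("Larger slice should belong to:", largest_label)  # Larger slice should belong to: No
```

If the slice drawn larger on the page is not the one this check names, the chart's areas and labels disagree.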

High Support

A CBS News pie chart of Americans who have tried marijuana, showing 51% today, 43% last year, and 34% in 1997. The chyron below the image says 'High support for legalizing marijuana. More than half of Americans say they've tried pot'

Figure 2: Source

  • What is wrong with this chart?

  • What would you change to more accurately represent the data?

  • Do you think the mistakes were intentional?
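The values in Figure 2 describe the same population at three different times, so they are not parts of one whole. A quick sum of the numbers from the chart makes the problem concrete. The text bar chart at the end is one possible alternative, sketched here as an illustration: it puts each year on a common 0 to 100% scale.

```python
# A pie chart shows parts of ONE whole, so its slices must add up to 100%.
# Values from Figure 2: share of Americans who say they have tried marijuana.
shares = {"1997": 34, "Last year": 43, "Today": 51}

total = sum(shares.values())
print(f"Slices sum to {total}%")  # Slices sum to 128%
is_part_of_whole = abs(total - 100) < 1  # allow for rounding of each label

# Alternative: bars on a shared 0-100% scale compare each survey to the whole.
for label, pct in shares.items():
    bar = "#" * round(pct / 2)  # one '#' per 2 percentage points
    print(f"{label:>9} | {bar:<50} | {pct}%")  # 50 characters = 100%
```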

Gas Prices

Two bar charts showing the % increase in petrol and diesel prices in India (2018). The first chart shows an increase of 20.5% from 2004 to 2009 (real values of 33.71 to 40.62), an increase of 75.0% from 2009 to 2014 (real values of 40.62 to 71.41), and a 13% decrease from 2014 to 2018 (real values of 71.41 to 80.73). The last bar and arrow are shown in yellow, while the first three bars are shown in green. The second chart shows diesel prices, with real values of 21.74, 30.86, 56.71, and 72.83 in 2004, 2009, 2014, and 2018, respectively. Arrows show the change between each price set, with a 42% increase from 2004 to 2009, an 83.7% increase from 2009 to 2014, and a 28% decrease from 2014 to 2018, which is highlighted in yellow. At the bottom of each chart, an image of Narendra Modi is shown.

Figure 3: Gas and Diesel price changes in India (2004 - 2018).

  • What is wrong with this chart?

  • What design choices contribute to the problems?

  • Do you think this was intentionally designed to be misleading? Why or why not?
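The percentages in Figure 3 can be recomputed from the prices shown in the chart. This minimal sketch (the helper `percent_changes` is a hypothetical name) applies (new - old) / old * 100 to each consecutive pair of years. Every interval comes out positive, including 2014 to 2018: roughly a 13% increase for petrol and a 28% increase for diesel. Any bar that encodes the 2018 price should therefore be taller than the 2014 bar.

```python
# Percent change between consecutive prices: (new - old) / old * 100.
# Prices are the values listed in the Figure 3 description.

def percent_changes(years, prices):
    """Return (start_year, end_year, percent_change) for each consecutive pair."""
    changes = []
    for i in range(1, len(prices)):
        change = (prices[i] - prices[i - 1]) / prices[i - 1] * 100
        changes.append((years[i - 1], years[i], change))
    return changes

years = [2004, 2009, 2014, 2018]
petrol = [33.71, 40.62, 71.41, 80.73]
diesel = [21.74, 30.86, 56.71, 72.83]

for fuel, prices in [("Petrol", petrol), ("Diesel", diesel)]:
    for start, end, change in percent_changes(years, prices):
        direction = "increase" if change > 0 else "decrease"
        print(f"{fuel} {start}-{end}: {abs(change):.1f}% {direction}")
```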

Information Overload

  • What problems do you have reading this chart?

  • Can you compare the quantities of all 6 variables shown? Why or why not?

(Yes, the blog this chart is taken from is satirical. This is not a recommended graphical form.)

Designing Good Charts

Why Graphics Matter

Graphics are a form of external cognition that allows us to think about the data rather than the chart.

That is, graphics are a tool to make it easier for us to think about what the data means.

Good graphics take advantage of how the brain works, leveraging

  • preattentive processing

  • perceptual grouping

  • awareness of visual limitations

Good graphics also depend on the data: the chart type should be chosen based on the types of variables you want to display, the amount of data you have, and the results you want to highlight.

Example: Hertzsprung–Russell Diagram

A scatter plot showing the color index of a star on the x-axis and the absolute magnitude (brightness) of the star on the y-axis. Points are colored by spectral class, which varies from blue to white to yellow to red as the color index increases and the star's temperature decreases. Points are primarily located along a downward-sloping line from the top left to the bottom right, which is labeled the 'main sequence'. There is another set of points which diverges from the main sequence and extends out horizontally in the middle of the graph; these are labeled 'giants', and a few outliers that are above the giant cluster are labeled 'supergiants'. Below the main sequence stars, there are outliers which are labeled 'dwarfs'.

The Hertzsprung–Russell diagram was developed independently by Ejnar Hertzsprung (1873–1967) and Henry Norris Russell (1877–1957). It plots the color index of each star against its brightness (absolute magnitude). Plotting the two together made it possible to see that these variables are related and change together over a star’s life cycle. That hypothesis only came to be because of this chart.

I’ve used data from the HYG Database to generate this chart. Only stars within 500 AU are shown.

Preattentive Perception

  • A difference in a single preattentive feature (shape or color) stands out immediately, but combinations of preattentive features require attention

    • Double-encoding (using multiple features for the same variable) is ok

(a) Shape

(b) Color

Figure 4: Two scatterplots with one point that is different. Can you easily spot the different point?

(a) Shape and Color (dual encoded)

(b) Shape and Color (different variables)

Figure 5: Two scatterplots. Can you easily spot the different point(s)?

Perceptual Grouping

A picture laid out on an x-y coordinate axis. The y-axis is labeled 'Rabbit' and the x-axis is labeled 'Duck'. When viewed with the rabbit axis at the bottom, the image looks like a rabbit with tall ears; when viewed with the duck axis at the bottom, the image looks like a duck, where the ears become the bill.

Figure 6: Is this a rabbit or a duck?

 

A complex figure that appears to be made up of a black outline of a triangle, three circles laid out to form the points of an inverted triangle, and a white triangle overlaid on top of the three circles. In reality, the image contains only three angles at 0, 120, and 240 degrees and three Pac-Man shapes (circles with a pie slice taken out) at 60, 180, and 300 degrees.

Figure 7: What do you see in this image?

Perceptual Grouping

An image reading 'GESTALT', where each letter demonstrates a principle of gestalt grouping. G has a white stripe over it, demonstrating closure - the G is perceived as a complete letter, separate from the stripe. E is shown as a grid of black squares, with grey squares making up the background; this demonstrates proximity - small objects close together are perceived as being part of the same whole. A bar is woven through the S shape, showing good continuation - the S is perceived as a continuous object that is behind the bar in the middle portion. The two Ts are striped and indicate similarity - they are similarly shaped and patterned and can be perceived as a group. The A and L are connected, and the inside of the A seems to have a white tree in the middle, demonstrating figure/ground. The final T is part of the similarity group.

Grouping in Charts

Bar chart, showing 5 states

Three versions of the same data that emphasize different aspects of the dataset.

Perceptual Limitations

Color

  • About 8% of XY and 0.5% of XX people have a color vision deficiency

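  • Avoid rainbow color schemes

    • we perceive different hues unequally (we can distinguish more shades of green than of other colors), which gives a misleading idea of distance

    • avoid “stoplight” (red-yellow-green) color schemes, since red and green are hard to tell apart with the most common (red-green) color deficiencies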
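One way to see why rainbow schemes mislead, and why they fail on black and white printers, is to convert evenly spaced rainbow hues to gray. This sketch uses Python's standard `colorsys` module. It treats the Rec. 601 luma weights (0.299, 0.587, 0.114) as a rough stand-in for perceived lightness; that is an approximation, not a full perceptual model.

```python
import colorsys


def luma(r, g, b):
    """Approximate lightness (Rec. 601 luma) of an RGB color with channels in [0, 1].
    This is the weighting commonly used for grayscale conversion."""
    return 0.299 * r + 0.587 * g + 0.114 * b


# Sample a rainbow: six evenly spaced, fully saturated, full-brightness hues.
hue_names = ["red", "yellow", "green", "cyan", "blue", "magenta"]
rainbow = [colorsys.hsv_to_rgb(i / 6, 1.0, 1.0) for i in range(6)]
lightness = [luma(*rgb) for rgb in rainbow]

for name, value in zip(hue_names, lightness):
    print(f"{name:>8}: {value:.3f}")

# A scale that reads as ordered should get steadily lighter (or darker).
increasing = all(a < b for a, b in zip(lightness, lightness[1:]))
decreasing = all(a > b for a, b in zip(lightness, lightness[1:]))
print("Monotonic lightness:", increasing or decreasing)  # Monotonic lightness: False
```

The lightness jumps up and down (yellow is the lightest, blue the darkest), so equal steps along the rainbow do not become equal, or even ordered, steps of gray.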

Color

  • Strategies
    • double encoding (color + shape or linetype)
    • make sure the chart still works on black and white printers
    • use monochromatic (single color) gradients where possible
    • if you use a bidirectional (diverging) scale (e.g. blue to red), pass through white or light yellow at the midpoint
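Both kinds of gradient can be sketched with simple linear interpolation. The endpoint colors and the function names `sequential` and `diverging` are illustrative assumptions. Interpolating in RGB is also a simplification: perceptually uniform color spaces give smoother steps. The monochromatic scale runs from white to a single hue. The bidirectional scale runs from one hue through white (or any light `mid` color, such as a light yellow) to a second hue.

```python
# Color scales built by linear interpolation in RGB (0-255 per channel).
# Endpoint colors are illustrative; any dark, saturated hues would do.
WHITE = (255, 255, 255)
BLUE = (33, 102, 172)
RED = (178, 24, 43)


def lerp_color(c1, c2, t):
    """Blend from c1 (t = 0) to c2 (t = 1)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def clamp(t):
    return min(max(t, 0.0), 1.0)


def sequential(value, vmin, vmax, color=BLUE):
    """Monochromatic gradient: white at vmin, full `color` at vmax."""
    t = clamp((value - vmin) / (vmax - vmin))
    return to_hex(lerp_color(WHITE, color, t))


def diverging(value, vmin, vmax, midpoint=0.0, low=BLUE, high=RED, mid=WHITE):
    """Bidirectional scale passing through `mid` (white by default) at
    `midpoint`. Assumes vmin < midpoint < vmax."""
    if value <= midpoint:
        t = clamp((midpoint - value) / (midpoint - vmin))
        return to_hex(lerp_color(mid, low, t))
    t = clamp((value - midpoint) / (vmax - midpoint))
    return to_hex(lerp_color(mid, high, t))


for v in [-10, -5, 0, 5, 10]:
    print(v, diverging(v, -10, 10))
```

Because the neutral color sits at `midpoint`, values on either side get darker as they move away from it. In black and white, that distance from the midpoint still reads as an increase in darkness.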

Color

  • Be conscious of what some colors “mean” so you can take advantage of implicit associations (e.g. blue = cold, red = hot)

  • Avoid pink = female, blue = male

Working Memory

  • You can hold between 5 and 9 (7 plus or minus 2) things in working memory without writing them down

  • Avoid legends that have more than 7 categories
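If a categorical variable has more levels than fit comfortably in working memory, one common remedy is to keep the most frequent categories and group the rest into an "Other" level. The function `lump_categories` and its defaults are hypothetical, a sketch that caps the legend at seven entries.

```python
from collections import Counter


def lump_categories(values, max_levels=7, other="Other"):
    """Keep the (max_levels - 1) most frequent categories and relabel every
    other value as `other`, so a legend has at most max_levels entries."""
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)  # already few enough categories
    keep = {category for category, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]


# Ten categories, some much rarer than others
values = list("aaaabbbcccddeeffgghij")
lumped = lump_categories(values)
print(sorted(set(lumped)))
```

When several categories tie in frequency, `Counter.most_common` keeps the ones encountered first. If a particular category matters for the story, it may be better to keep that one explicitly.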

Accessibility

  • Provide alt-text for your charts that describes the important information as well as the data source and how the data are represented

  • Use larger text so that people with visual impairments can read it easily

  • Some fonts are easier for people with dyslexia to read – if you’re working with someone with dyslexia, ask them what font they prefer.
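The alt-text advice above names three ingredients: how the data are represented, the important information, and the data source. The hypothetical helper below is a minimal sketch that assembles them in that order. The example uses the Hertzsprung–Russell chart from earlier in these notes.

```python
def alt_text(chart_type, x, y, takeaway, source, color_by=None):
    """Assemble alt text from three parts: how the data are represented,
    the main takeaway, and where the data came from."""
    article = "An" if chart_type[0].lower() in "aeiou" else "A"
    encoding = f"{article} {chart_type} of {y} against {x}"
    if color_by:
        encoding += f", colored by {color_by}"
    return f"{encoding}. {takeaway} Data source: {source}."


text = alt_text(
    chart_type="scatter plot",
    x="color index",
    y="absolute magnitude",
    takeaway="Most stars fall along a diagonal band called the main sequence.",
    source="HYG Database",
    color_by="spectral class",
)
print(text)
```

A template like this only covers the skeleton. Good alt text still needs a human-written takeaway that states what the reader should notice.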