Signals from the retina are integrated: the outputs of multiple rods are combined
Feature detectors - parts of the brain that recognize lines at specific angles, in spatial arrays
Specialized modules, e.g., for face detection
Lots of additional processing – “software”
Preattentive features such as color, shape, and position are integrated and applied to single objects through focused attention
Visual Memory – Images are encoded in analog form, which affects how their content is recalled; this is relevant to statistical graphics.
Working Memory – Active memory can contain about 7 chunks of information (Miller 1956); therefore, design graphics with \(\le\) 7 categories to match memory capacity.
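To make the \(\le\) 7 guideline concrete, here is a minimal Python sketch (an illustration, not part of the original materials). It keeps the six most frequent categories and relabels the rest as "Other", so a legend never shows more than seven entries. The helper name `lump_categories` and the "Other" label are hypothetical. In R, `forcats::fct_lump_n()` does something similar in spirit.

```python
from collections import Counter


def lump_categories(values, max_categories=7, other_label="Other"):
    """Relabel infrequent categories so at most `max_categories` levels remain.

    The (max_categories - 1) most frequent categories are kept and every other
    value becomes `other_label`. Ties in frequency are broken by first appearance.
    """
    counts = Counter(values)
    if len(counts) <= max_categories:
        return list(values)
    keep = {category for category, _ in counts.most_common(max_categories - 1)}
    return [v if v in keep else other_label for v in values]


# Hypothetical example: 8 categories, each observed twice, is one too many.
species = ["a", "a", "b", "b", "c", "c", "d", "d",
           "e", "e", "f", "f", "g", "g", "h", "h"]
lumped = lump_categories(species)
print(sorted(set(lumped)))  # ['Other', 'a', 'b', 'c', 'd', 'e', 'f']
```

Lumping trades detail for readability. The categories folded into "Other" should be ones the viewer does not need to compare individually.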
Information Integration – Effective graphs help the brain integrate information across dimensions by creating “chunks” of information.
Resource Limitations – Human attention is limited; focus on important data aspects.
Question: What is the relationship between the length of the eruption and the time between eruptions for Old Faithful?
Source: Ratwani, Trafton, and Boehm-Davis (2008)
Image source: Padilla et al. (2018)
Rank | Perceptual task (1 = most accurate) |
---|---|
1 | Position (common scale) |
2 | Position (non-aligned scale) |
3 | Length, Direction, Angle, Slope |
4 | Area |
5 | Volume, Density, Curvature |
6 | Shading, Color Saturation, Color Hue |
Sources: Cleveland and McGill (1984); Cleveland and McGill (1985); Cleveland and McGill (1987)
Palmer Penguins data, counted by species. What is the average number of Adelie and Chinstrap penguins measured? What steps do you go through to calculate this average?
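The arithmetic behind this question can be written out step by step. This sketch assumes the bars show species counts from the full palmerpenguins data set (344 penguins: 152 Adelie, 68 Chinstrap, 124 Gentoo). If the chart uses a subset, the bar heights and the answer change. The code covers only the arithmetic. The harder steps are perceptual: finding the right bars, reading their heights, and ignoring the Gentoo bar.

```python
# Step 1: read each bar's height off the chart. These counts assume the full
# palmerpenguins data set; a subset would give different bars.
counts = {"Adelie": 152, "Chinstrap": 68, "Gentoo": 124}

# Gentoo is on the chart but is not part of the question.
selected = ["Adelie", "Chinstrap"]

# Step 2: add the two heights together.
total = sum(counts[s] for s in selected)

# Step 3: divide by the number of species being averaged.
average = total / len(selected)
print(average)  # 110.0
```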
Image source: Padilla et al. (2018), an adaptation of Pinker (1990)
Image source: Padilla et al. (2018)
Focus on the most important comparisons, making it as easy as possible to visually process important data features.
For variables where accuracy is important, use the \(x\) and \(y\) axes. Show less important variables using other aesthetics such as color, shape, and size.
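One way to apply this advice is to rank the candidate aesthetics by the Cleveland and McGill ordering and hand them out to variables in order of importance. This is a minimal Python sketch, not a rule from the original slides. The mapping from perceptual task to aesthetic, the subset of aesthetics, and the helper `assign_aesthetics` are all assumptions for illustration. The penguin column names in the example are a hypothetical choice of variables.

```python
# Accuracy ranks from the Cleveland and McGill table (1 = most accurate).
# Which aesthetic stands in for which perceptual task is an assumption.
AESTHETIC_RANK = {
    "x": 1,      # position along a common scale
    "y": 1,      # position along a common scale
    "facet": 2,  # position along non-aligned scales (separate panels)
    "color": 6,  # color hue
}


def assign_aesthetics(variables_by_importance):
    """Pair variables (most important first) with aesthetics (most accurate first)."""
    # sorted() is stable, so x stays ahead of y when their ranks tie.
    channels = sorted(AESTHETIC_RANK, key=AESTHETIC_RANK.get)
    if len(variables_by_importance) > len(channels):
        raise ValueError("more variables than ranked aesthetics")
    return dict(zip(channels, variables_by_importance))


mapping = assign_aesthetics(["bill_length_mm", "bill_depth_mm", "island", "species"])
print(mapping)
# {'x': 'bill_length_mm', 'y': 'bill_depth_mm', 'facet': 'island', 'color': 'species'}
```

The ranking is a starting point. Whether a variable belongs on an axis also depends on which comparison the viewer needs to make.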
As much as possible, reduce cognitive load for your viewers. This can take many forms, including putting labels directly on the chart rather than in a legend.
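Direct labels need a position for each group's text. A common choice is each group's right-most point, so the label sits at the end of its line. This minimal Python sketch computes those positions from a list of row dictionaries. The function name `direct_label_positions` and the example rows are hypothetical. In plotnine, the resulting points could then be drawn with `geom_text`.

```python
def direct_label_positions(rows, group_key, x_key, y_key):
    """For each group, return the (x, y) of its right-most point, where a text
    label can replace that group's legend entry."""
    last = {}
    for row in rows:
        group = row[group_key]
        if group not in last or row[x_key] > last[group][x_key]:
            last[group] = row
    return {group: (row[x_key], row[y_key]) for group, row in last.items()}


# Made-up rows for illustration only.
rows = [
    {"stage": "Infant", "year": 2009, "cost": 100},
    {"stage": "Infant", "year": 2018, "cost": 150},
    {"stage": "Toddler", "year": 2018, "cost": 140},
    {"stage": "Toddler", "year": 2009, "cost": 90},
]
print(direct_label_positions(rows, "stage", "year", "cost"))
# {'Infant': (2018, 150), 'Toddler': (2018, 140)}
```

If two groups end at nearly the same value, their labels will overlap and need a small manual nudge.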
When looking at a chart, talk through the comparisons you’re making out loud. Then show a friend the same chart and have them do the same exercise.
Download the data, or read it directly from:
https://raw.githubusercontent.com/earobinson95/data-for-download/main/richmond-va-childcare.csv
Variable | Description |
---|---|
state_abbreviation | VA |
county_name | Richmond County only |
study_year | 2009-2018 |
center_type | Family / Center |
development_stage | Infant / Toddler / Preschool |
median_weekly_childcare_cost | in U.S. dollars |

Source: https://www.dol.gov/agencies/wb/topics/featured-childcare
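Before plotting, it can help to check a downloaded copy against the data dictionary. This is a minimal standard-library sketch, not part of the original exercise. The function name `check_childcare_csv` is hypothetical, and it assumes `study_year` is stored as whole numbers. It only checks the column names, the state, and the year range.

```python
import csv
import io

# Column names taken from the data dictionary.
EXPECTED_COLUMNS = {
    "state_abbreviation", "county_name", "study_year",
    "center_type", "development_stage", "median_weekly_childcare_cost",
}


def check_childcare_csv(text):
    """Compare CSV text with the data dictionary and return a list of problems."""
    reader = csv.DictReader(io.StringIO(text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for row_number, row in enumerate(reader, start=1):
        if row["state_abbreviation"] != "VA":
            problems.append(f"row {row_number}: state is {row['state_abbreviation']!r}")
        year = int(row["study_year"])  # assumes years are stored as whole numbers
        if not 2009 <= year <= 2018:
            problems.append(f"row {row_number}: study_year {year} outside 2009-2018")
    return problems
```

To check the real file, pass its text, for example `urllib.request.urlopen(url).read().decode("utf-8")` with the URL given above. An empty list means no problems were found.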
How do historical trends of full-time weekly median price charged for childcare differ between family-based and center-based care in Richmond, VA for each of the development stages?
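A numeric version of this question compares how much each center type and development stage combination changed over the study period. This minimal Python sketch computes last-year cost minus first-year cost per group. The helper names and the rows are hypothetical, and the values are made up, not taken from the data set. A single difference hides the shape of each trend, which is what a good graphic should show.

```python
from collections import defaultdict


def change_by_group(rows):
    """Return {(center_type, development_stage): last-year cost minus first-year cost}."""
    series = defaultdict(list)
    for row in rows:
        key = (row["center_type"], row["development_stage"])
        series[key].append((row["study_year"], row["median_weekly_childcare_cost"]))
    changes = {}
    for key, points in series.items():
        points.sort()  # order each group's points by study_year
        changes[key] = points[-1][1] - points[0][1]
    return changes


def make_row(center, stage, year, cost):
    """Build one hypothetical row using the column names from the data dictionary."""
    return {"center_type": center, "development_stage": stage,
            "study_year": year, "median_weekly_childcare_cost": cost}


# Made-up rows for illustration only.
rows = [
    make_row("Family", "Infant", 2018, 150),
    make_row("Family", "Infant", 2009, 100),
    make_row("Center", "Infant", 2009, 200),
    make_row("Center", "Infant", 2018, 260),
]
print(change_by_group(rows))  # {('Family', 'Infant'): 50, ('Center', 'Infant'): 60}
```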
Ask yourself…
```r
library(tidyverse)

richmond_childcare <- read_csv("https://raw.githubusercontent.com/earobinson95/data-for-download/main/richmond-va-childcare.csv")

# a pretty bad start
richmond_childcare |>
  filter(county_name == "Richmond County") |>
  ggplot(aes(x = study_year,
             y = median_weekly_childcare_cost,
             fill = development_stage)) +
  geom_bar(stat = "identity",
           position = "dodge") +
  facet_wrap(~ center_type) +
  scale_x_continuous(limits = c(2008, 2018),
                     breaks = seq(2008, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  scale_fill_manual(values = c("#ff7400", "#c05ccb", "#056e76")) +
  labs(x = "Study Year",
       y = "Median Weekly Childcare Cost",
       fill = "Development Stage") +
  theme_bw()
```
```r
# a slightly better start?
richmond_childcare |>
  filter(county_name == "Richmond County") |>
  ggplot(aes(x = study_year,
             y = median_weekly_childcare_cost,
             color = development_stage,
             shape = center_type)) +
  geom_point() +
  scale_x_continuous(limits = c(2008, 2018),
                     breaks = seq(2008, 2018, 2)) +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_manual(values = c("#ff7400", "#c05ccb", "#056e76")) +
  labs(x = "Study Year",
       y = "Median Weekly Childcare Cost",
       color = "Development Stage",
       shape = "Center Type") +
  theme_bw()
```
```python
import pandas as pd
from plotnine import *

# Read in the data
url = "https://raw.githubusercontent.com/earobinson95/data-for-download/main/richmond-va-childcare.csv"
richmond_childcare = pd.read_csv(url)

# Reorder the development stages from youngest to oldest
richmond_childcare['development_stage'] = pd.Categorical(
    richmond_childcare['development_stage'],
    categories=["Infant", "Toddler", "Preschool"],
    ordered=True
)

# First plot: dodged bar plot, one panel per center type
p1 = (ggplot(richmond_childcare, aes(x='study_year',
                                     y='median_weekly_childcare_cost',
                                     fill='development_stage')) +
      geom_bar(stat='identity', position='dodge') +
      facet_wrap('~center_type') +
      scale_x_continuous(limits=(2008, 2018), breaks=list(range(2008, 2019, 2))) +
      # format y-axis breaks as dollars, e.g. 150 -> "$150"
      scale_y_continuous(labels=lambda l: ["${:,.0f}".format(v) for v in l]) +
      scale_fill_manual(values=["#ff7400", "#c05ccb", "#056e76"]) +
      labs(x="Study Year", y="Median Weekly Childcare Cost", fill="Development Stage") +
      theme_bw()
      )
print(p1)

# Second plot: scatter plot, color for development stage and shape for center type
p2 = (ggplot(richmond_childcare, aes(x='study_year',
                                     y='median_weekly_childcare_cost',
                                     color='development_stage',
                                     shape='center_type')) +
      geom_point() +
      scale_x_continuous(limits=(2008, 2018), breaks=list(range(2008, 2019, 2))) +
      scale_y_continuous(labels=lambda l: ["${:,.0f}".format(v) for v in l]) +
      scale_color_manual(values=["#ff7400", "#c05ccb", "#056e76"]) +
      labs(x="Study Year", y="Median Weekly Childcare Cost",
           color="Development Stage", shape="Center Type") +
      theme_bw()
      )
print(p2)
```
Upload your redesigned graphic (even if you sketched or critiqued it!) to https://bit.ly/sdss-redesign-the-graphic.