class: center, middle, inverse, title-slide

# Framed!
## Reproducing 150 Year Old Charts
### Susan VanderPlas
Ryan Gorluch
Heike Hofmann

---
background-image: url(images/StatisticalAtlasCover.png)
background-position: 50% 50%
background-size: 100%
class: center, fullscale

---
class:center

# Slides

[http://srvanderplas.github.io/Presentations/JSM2018/Framed.html](http://srvanderplas.github.io/Presentations/JSM2018/Framed.html)

# Project Repository

[https://github.com/srvanderplas/Statistical_Atlas](https://github.com/srvanderplas/Statistical_Atlas)

---
layout:true

# Introduction

- Statistical Atlas published for the 9th, 10th, and 11th census (1870, 1880, 1890)
- Charts created by hand and reproduced using lithography

---

- Goal: Show the composition of the country
  - Maps

.center[<img src="images/DistributionOfWealth.png" width = "50%" alt = "Map of Distribution of Wealth, 1870 Statistical Atlas" style = "margin-top:-30px;"/>]

---

- Goal: Show the composition of the country
  - Abstract Charts:
    - age pyramids

.center[<img src="images/AgePyramid.png" width = "90%" alt = "Age Pyramid, 1870 Statistical Atlas"/>]

---

- Goal: Show the composition of the country
  - Abstract Charts:
    - age pyramids
    - ring charts

.center[<img src="images/RingChart.jpg" width = "90%" alt = "Ring Chart, 1870 Statistical Atlas"/>]

---

- Goal: Show the composition of the country
  - Abstract Charts:
    - age pyramids
    - ring charts
    - framed mosaic and spine plots

---
layout: false
background-image: url(images/Church_Plate31.png)
background-position: 50% 0%
background-size: 100%
class: center, fullscale

---
background-image: url(images/ChurchAccommodations_USIntroLegend.png)
background-position: 50% 50%
background-size: 95%
class: center

???

- Spine plots
- Percent of religious sittings by denomination for each state
- Border: percent of unaccommodated population over the age of 10
- Shows top 4 denominations in each state, top 8 denominations overall (technical constraints)
- Denominations ordered from largest to smallest within states - comparisons are hard

---
background-image: url(images/Occupation_Plate32.png)
background-position: 50% 0%
background-size: 100%
class: center, fullscale

---
background-image: url(images/Occupation_USIntroLegend.png)
background-position: 50% 50%
background-size: 95%
class: center

???

- Mosaic plot - a novel type of plot in 1874
- Georg von Mayr didn't publish mosaic plots until 1877 (Friendly, 2002)
- Proportion of males and females in each occupation shown in the interior of the chart
- Proportion of unaccounted-for persons in the outer band (not separated by gender)

---
# Statistical Archaeology

### Reproduce the plots with modern methods

Data Sources:

- National Historical Geographic Information System
- Original 1870 Census tables
- Unaccommodated population estimates using 1% microsample data from the Integrated Public Use Microdata Series (IPUMS)
- Pixel measurements (by hand) of the area of each chart (for comparison)

---
## Occupation

.center[<img src="images/Occupation_Framed_Redone.png" width = "100%" alt="Re-created framed and unframed mosaic plots"/>]

- Frame cuts the uncounted population into quarters visually
- No segmentation of the uncounted population
- 97% of uncounted population is female

---
## Occupation

<img src="Framed_files/figure-html/unnamed-chunk-1-1.png" width="100%" style="display: block; margin: auto;" />

---
class: bottom

## Church Accommodation

--

.center[<img src="images/Church_Framed_Orig.png" width = "80%" alt="Church Accommodations Plots, for 3 states, selected from Plate 32 of the 1870 Statistical Atlas"/>]

--

.center[<img src="images/Church_Framed_Redone.png" width = "80%" alt="Re-imagined version of Church Accommodations Plots, for 3 states"/>]

---
## Church Accommodation

.center[<img src="images/Church_Framed_Redone.png" width = "80%" alt="Re-imagined version of Church Accommodations Plots, for 3 states"/>]

.center[<img src="images/Church_Unframed_Redone.png" width = "80%" alt="Re-imagined version of Church Accommodations Plots, for 3 states, without frame"/>]

---
## Church Accommodation

<img src="Framed_files/figure-html/religious-bias-plot-1.png" width="60%" style="display: block; margin: auto;" />

--

.center[Whoops. The lithographer screwed up.]

---
# Perception of Framed Plots

- Frame cuts the unaccommodated population into quarters - hard to estimate the size

--

- Difficult to compare between states, as there's no aligned axis due to the frame

--

- Estimation requires a two-stage process:
  1. Area of the main plot that is accommodated
  2. Area of the center region devoted to a specific category

--

- Primary comparisons are on the x-axis; the frame disrupts the ease of these comparisons

---
# Frames in Practice

- 98 Participants (32 from reddit.com/r/samplesize, 63 from Amazon Mechanical Turk)

--

- Each participant evaluated 18 plots
  - Region size: small, medium, large
  - Pie, Mosaic, Spine plots
  - Framed or Unframed charts

--

- 25 seconds per plot

---
# Frames in Practice

.center[<img src="images/Alabama-pie_without_frame1.png" width = "28%" style="margin-right:10px;"/> <img src="images/Maine-mosaic_without_frame1.png" width = "28%" style="margin-right:10px;"/> <img src="images/Michigan-spine_without_frame1.png" width = "28%"/>]

.center[<img src="images/Alabama-pie_with_frame1.png" width = "28%" style="margin-right:10px;"/> <img src="images/Maine-mosaic_with_frame1.png" width = "28%" style="margin-right:10px;"/> <img src="images/Michigan-spine_with_frame1.png" width = "28%"/>]

---
# Results

<img src="Framed_files/figure-html/model-1.png" width="100%" />

---
# Conclusions

- Estimation of the framed portion of plots is significantly worse than estimation of the same area in an unframed plot

--

<br>

- Framed plots are not great for accurately communicating the data
  - They don't appear in the 1880 or 1890 census
  - Pie charts largely replaced mosaic and spine plots

--

<br>

- The Census Bureau and associated institutions did a great job of preserving the data necessary to reproduce the results and identify flaws in the analysis process

<br>

---
# Conclusions

.center[## 150 years later, we can use completely different technology to get the same charts]

--

How's that for reproducibility?