Part III: Data Wrangling

Published

April 17, 2024

This part of the textbook covers working with data. Every data set is messy in its own way [1], and this part focuses on giving you tools to deal with the most common types of messy data.

Chapter 17 covers how to read in data from many common formats, such as spreadsheets, web tables, and databases.

Then, we start to talk about graphics.

Chapter 18 provides a brief primer on how to create different charts and graphics in R and Python for use in other sections.

Chapter 19 talks about how to do exploratory data analysis using graphs, tables, and summary statistics.

Chapter 20 covers in depth how to create different types of charts using R and Python.

Chapter 21 discusses how to create good graphics - graphics that are easy to read, account for human perceptual quirks, and look nice.

In the next set of chapters, we talk about how to rearrange, summarize, clean, and manipulate data.

Chapter 22 covers verbs for transforming data - creating new variables, modifying existing variables, selecting specific rows and/or columns, and more.

Chapter 23 discusses string manipulations. Text data is some of the messiest data out there, and this chapter will give you some tools to help make text and character data more tidy.

Chapter 24 discusses how to transform data from wide, human-friendly formats to long, computer-friendly formats for analysis and processing.
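
To make the wide-versus-long distinction concrete, here is a minimal sketch in plain Python. The data and the helper `to_long` are hypothetical and exist only for illustration; in practice a data-frame library would usually do this reshaping, and plain Python is used here only to keep the example self-contained.

```python
# Wide format: one row per student, one column per exam (hypothetical data).
wide = [
    {"student": "Ana", "exam1": 90, "exam2": 85},
    {"student": "Ben", "exam1": 72, "exam2": 88},
]


def to_long(rows, id_col, value_cols, var_name="variable", value_name="value"):
    """Reshape wide rows into long rows: one row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append(
                {id_col: row[id_col], var_name: col, value_name: row[col]}
            )
    return long_rows


# Long format: one row per student-exam combination.
long_data = to_long(
    wide, "student", ["exam1", "exam2"], var_name="exam", value_name="score"
)
for row in long_data:
    print(row)
```

The wide table is easier for a person to scan, while the long table has a single column of scores, which is generally easier to filter, group, and plot.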

Chapter 25 discusses how to join data sets together using common variables.
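
As a rough illustration of what joining on a common variable means, the sketch below combines two small tables that share an `id` column. The data and the helper `inner_join` are hypothetical; this is one simple way to express an inner join in plain Python, not the approach any particular library takes.

```python
# Two hypothetical tables that share the key column "id".
students = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cy"},
]
scores = [
    {"id": 1, "score": 90},
    {"id": 2, "score": 72},
    {"id": 4, "score": 65},
]


def inner_join(left, right, key):
    """Keep only rows whose key value appears in both tables."""
    # Index the right table by key so each lookup is a dictionary access.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    joined = []
    for l in left:
        for r in index.get(l[key], []):
            # Merge the columns of the two matching rows.
            joined.append({**l, **r})
    return joined


result = inner_join(students, scores, "id")
for row in result:
    print(row)
```

Rows with `id` 3 and 4 are dropped because each appears in only one table; other kinds of joins keep such unmatched rows and fill in missing values instead.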

Finally, we talk about a few specific types of data: dates and times, lists, and spatial data.

Chapter 26 discusses how to work with dates and times using R and Python.

Chapter 27 discusses how to work with data stored in nested lists efficiently.

Chapter 28 covers working with spatial data formats: drawing maps, working with spatial regions, and more. We will focus exclusively on the visualization and data wrangling part, leaving the modeling of spatial data for a different time.

References

[1] H. Wickham, “Tidy data,” Journal of Statistical Software, vol. 59, no. 10, 2014 [Online]. Available: http://www.jstatsoft.org/v59/i10/