Part III: Data Wrangling

Published

December 17, 2024

This part of the textbook covers topics related to working with data. Every data set is messy in its own way [1], and this part focuses on giving you tools to deal with the most common types of messy data.

17  Data Input covers how to read in data from many common formats, such as spreadsheets, web tables, and databases.

Then, we start to talk about graphics.

18  Data Visualization Basics provides a brief primer on how to create different charts and graphics in R and Python for use in other sections.

19  Exploratory Data Analysis talks about how to do exploratory data analysis using graphs, tables, and summary statistics.

20  Data Visualization covers in depth how to create different types of charts using R and Python.

21  Creating Good Charts discusses how to create good graphics - graphics that are easy to read, account for human perceptual quirks, and look nice.

In the next set of sections, we talk about how to rearrange, summarize, clean, and manipulate data.

22  Data Cleaning covers verbs for transforming data - creating new variables, modifying existing variables, selecting specific rows and/or columns, and more.

23  Working with Strings discusses string manipulations. Text data is some of the messiest data out there, and this chapter will give you some tools to help make text and character data more tidy.

24  Reshaping Data discusses how to transform data from wide, human-friendly formats to long, computer-friendly formats for analysis and processing.

25  Joining Data discusses how to join data sets together using common variables.

Finally, we talk about a few specific types of data: dates and times, lists, and spatial data.

26  Dates and Times discusses how to work with dates and times using R and Python.

27  Functional Programming discusses how to work with data stored in nested lists efficiently.

28  Spatial data covers working with spatial data formats: drawing maps, working with spatial regions, and more. We will focus exclusively on the visualization and data wrangling part, leaving the modeling of spatial data for a different time.

References

[1]
H. Wickham, “Tidy data,” Journal of Statistical Software, vol. 59, no. 10, 2014 [Online]. Available: http://www.jstatsoft.org/v59/i10/