Teaching Statistical Computing with R and Python

Susan Vanderplas

2025-08-09

Outline

  • How to accidentally write a textbook

  • Design Decisions

  • Teaching R and Python together

  • Conclusions

Textbook Link

Slides Link

How to Accidentally Write A Textbook

A gif of Bob Ross painting and saying 'We don't make mistakes - we just have happy accidents.'

In the beginning… Spring 2020 ⛈️🌏

Teaching Computing Tools for Statistics (Grad) in Fall

  • I have to teach R AND SAS?

    • Oh 💩, I have to learn SAS
  • Fall 2020 uncertainty ☣️🦠😷

    • Flipped classroom = more flexibility? 🤞
  • Outside-class material: Videos or book(s)?

    • What book is the SAS equivalent of R4DS?

🤔 Writing a textbook > editing videos, right?

Galaxy Brain meme
Gif of Kermit the Frog typing furiously on a typewriter

My 2020 world-on-fire coping mechanism

Spring 2021-22

  • Assigned to develop new undergraduate computing courses in Spring 2022

    • in R and Python 🐍
    • Oh 💩, I have to learn Python 😱
  • New book w/ grad text in undergrad-sized chunks

  • Add Python 😬, Remove SAS 😌

  • Maintaining two textbooks is hard work!

Summer 2022 - Quarto!

  • Plan: Combine the textbooks
    • multiple chapters/wk for grad students 📚
    • single chapter/wk for undergrads 📗
    • Only maintain one version 🥱
  • Convert to Quarto
  • Make use of tabset panels 😍
# R code
# Python code

Since Summer 2022

  • Add chapters for additional undergrad computing courses

  • Update chapters: clarity, package updates, etc.

  • This summer:

    • Title II compatibility checks
    • Graduate student editing
    • New chapters - advanced stat computing

Design Decisions

Initial Philosophy

  • Include all of the bad jokes 😜, comics 💬, and digressions 🐇 from lectures

  • GIFs can explain better than words (sometimes)

  • Show the same techniques in both languages

    • focus on the operation, not syntax
  • Center importance of reproducibility
    (with SAS? 😅 SASMarkdown)

  • Opinionated Tool Selection

Sketchy Diagrams

Examples and Activities

Guess and Check

What will this code output?

myfun = function() x + 1

x = 14

myfun()

x = 20

myfun()

The state of the global environment at the time the function is called (here, the global environment is also the calling environment) can change the result of the function.
# R code
myfun <- function() {
  x + 1
}

x <- 14

myfun()
[1] 15

x <- 20

myfun()
[1] 21

# Python code
def myfun():
    return x + 1


x = 14

myfun()
15

x = 20

myfun()
21
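
A minimal sketch extending the guess-and-check example, assuming it helps to separate "global state at call time" from "local variables of whoever calls the function". The `caller` function and the names `first`, `from_caller`, and `second` are hypothetical additions for illustration and are not part of the original slides.

```python
x = 14


def myfun():
    # x is not defined inside myfun, so Python looks it up in the
    # module's global namespace each time the function runs.
    return x + 1


def caller():
    # Hypothetical helper: this local x belongs to caller() only,
    # so myfun() never sees it.
    x = 100
    return myfun()


first = myfun()         # uses global x = 14
from_caller = caller()  # still uses global x = 14, not the local 100

x = 20
second = myfun()        # global x was rebound, so the result changes

print(first, from_caller, second)  # prints: 15 15 21
```

The lookup happens when the function runs, which is why rebinding the global `x` changes the answer, but a variable that is local to another function does not.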

Additional Reproducibility

  • Without SAS, 👼👿
    build the textbook via GitHub Actions?

    • LOTS of package deps in R and Python 😱😵‍💫

    • Better reproducibility, more of a pain in the 🍑

    • Dependency caching is a lifesaver (most of the time)

Information Overload 🤯🫣

  • It’s hard to strike a balance between
    • amount of code/output
    • centering important content
A screenshot of the textbook Preface. Content Overload! This book is designed to demonstrate introductory statistical programming concepts and techniques. It is intended as a substitute for hours and hours of video lectures - watching someone code and talk about code is not usually the best way to learn how to code. It’s far better to learn how to code by … coding. I hope that you will work through this book week by week over the semester. I have included comics, snark, gifs, YouTube videos, extra resources, and more: my goal is to make this a collection of the best information I can find on statistical programming. In most cases, this book includes way more information than you need. Everyone comes into this class with a different level of computing experience, so I’ve attempted to make this book comprehensive. Unfortunately, that means some people will be bored and some will be overwhelmed. Use this book in the way that works best for you - skip over the stuff you know already, ignore the stuff that seems too complex until you understand the basics. Come back to the scary stuff later and see if it makes more sense to you.

In Practice

  • Students whose dominant language is not English vastly preferred the textbook to lectures
    • Sometimes supplemented with R4DS translations ❤️
  • Some spend way more time outside of class ⏲️
    • But, they would have been lost in a lecture-format class
  • Reading quizzes are essential to remind students to look at the book before class
    • Not a universally successful motivator

Overall, a reasonable success!

Teaching R and Python Together

A hand is adding R and Python to a bubbling stew stirred by another hand.

Image generated by ChatGPT and modified

Why?

  • 🔬 Focus on concepts (look up syntax 🔍)
  • Data science/stat languages change! (Julia? JavaScript? Python? Ruby?)
  • Critical Skills:
    • 📖 Reading & 🧠 understanding documentation
    • Knowing how to look for help 🆘

How?

  • Assignments

    • basic tasks in both languages 👶
    • harder tasks 🧑 – pick a language
    • Bonus points (sometimes) for using both languages 🧑‍🦳
  • 2 Languages keep advanced students engaged
    Others focus on mastering one language + concepts

  • Exams

    • Use either language for code
    • Focus on concepts/sketching/planning

Student Reactions

  • Initial frustration 😧 - syntax is hard enough in one language!

    • Plus quarto, markdown, git… it’s a lot! 🧩
  • Motivation: internships and experience 🧑‍💼 💼

  • Some appreciation for learning “why”, not “what” to do

  • Textbook helps anchor concepts (visual memory 👀🧠)

  • Like being able to comment via giscus 🗨️

    • Relatively quick response time
    • Adventurous students submit PRs to fix typos

Roadmap

Additional Chapters & Updates

  • Interactive Graphics beyond Shiny/Plotly

  • Databases 🦆 🏹

  • “Big” data strategies

  • Add polars 🐻‍❄️ content to supplement pandas 🐼

Other Cool Stuff I’d love to get to

  • webR and PyScript integration

    • Allow direct interactivity instead of just showing code + solutions
  • Audiobook format with quarto.audiobook (2024 GSoC Project)

  • Using JavaScript to create a TL;DR switch

    • hide long explanations
    • focus on the basic info
    • reduce info overload 😵‍💫 ️🏋️‍♀️

Wrapping Up

Important Sources

  • Allison Horst’s amazing artwork
  • comics - XKCD and Julia Evans

Books

  • R for Data Science
  • Advanced R
  • Happy Git and GitHub for the useR - Jenny Bryan
  • Python for Everybody - Charles Severance
  • Python for Data Analysis - Wes McKinney
  • Python Data Science Handbook - Jake Vanderplas

Image by Allison Horst. Source

Questions?