Using Git Submodules

Susan Vanderplas

2024-02-08

Introduction

Motivation

Data (v1)Project 1Project 2Data(v2)Project 3Data(v3)

Different versions of the data used in different projects.

Motivation

Datagit (lfs)Project 1Project 2Project 3@v1@v2@v3

If we could only use git (or git lfs) to keep track of our data in one place, while still having separate repositories for the different projects!

Git Submodules

Git submodules allow you to embed a git repository inside another git repository.

Sometimes, they feel like this…

Conceptually

Project 2Project 1Codein local repoPaperin local repoCodein local repoPaperin local repoDataData@e309c6@c47311git (lfs)

Git submodules contain a pointer to a specific version of a different repository

Basic Commands

  • Create a submodule in an existing repository
git submodule add <address-to-submodule-repo> <location-in-existing-repo>
git commit -am "Commit with new submodule configs"
  • Clone a project with submodules
git clone --recursive <repo-address> <local-folder-name>
  • Pull changes from a project with submodules
git pull --recurse-submodules
  • Submodule folders are there, files aren’t
git submodule init # add config files locally
git submodule update # pull files from submodules
  • Changing files in a submodule
(within submodule folder)
git checkout main # checkout branch
git commit ... # commit your changes
git push # push your changes

(within main repo folder)
git add <submodule-folder> # change the commit you're using
git commit ... # commit your changes to the main repo
git push # push

Example - Presentations

Presentation-ArchiveStuff before 2020Presentations2020-CSAFE-cmc2020-CSAFE-OpenSourceForensics2020-DSSV...2021-ANSC-Graphics2021-BSE-GSA-Graphics2021-CSAFE-IMPL-OCR...2022-CBIO2022-SDSU-DetectingCircles2022-StatDept-Seminar...2023-ASC2023-C3B2023-CAREER-chalk-talk...2020-Presentations Repo2021-Presentations Repo2022-Presentations Repo2023-Presentations RepoPresentations2020202120222023SubmodulesOld StructureNew Structure

Setup

$ git clone git@github.com:srvanderplas/Presentations.git
Cloning into 'Presentations'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
$ cd Presentations
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

Adding a submodule

$ git submodule add git@github.com:srvanderplas/2020-Presentations.git 2020
Cloning into 'Presentations/2020'...
remote: Enumerating objects: 1151, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 1151 (delta 0), reused 5 (delta 0), pack-reused 1145
Receiving objects: 100% (1151/1151), 358.25 MiB | 53.17 MiB/s, done.
Resolving deltas: 100% (470/470), done.

Check to see what’s there

$ ls 2020/
02-CSAFE-IMPL-OCR 
02-SDSU
03-UNL-Forensics
04-UNL-Extension 
07-DSSV
08-ResearchTopics
11-UNL-Election
12-CSAFE-cmc
12-CSAFE-OpenSourceForensics
2020-Presentations.Rproj
code
data
images
libs
LICENSE
xaringan_csafe2.0
xaringan_unl

Add another submodule

$ git submodule add git@github.com:srvanderplas/2021-Presentations.git 2021
Cloning into 'Presentations/2021'...
remote: Enumerating objects: 463, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 463 (delta 1), reused 8 (delta 1), pack-reused 455
Receiving objects: 100% (463/463), 212.39 MiB | 52.43 MiB/s, done.
Resolving deltas: 100% (76/76), done.

Repeat a couple more times…

See our changes

Presentations$ ls
2020  2021  2022  2023  2024  README.md
Presentations$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file:   .gitmodules
new file:   2020
new file:   2021
new file:   2022
new file:   2023
new file:   2024

Commit…

Presentations$ git commit -am "Add submodules"
[main 94d80b5] Add submodules
6 files changed, 20 insertions(+)
create mode 100644 .gitmodules
create mode 160000 2020
create mode 160000 2021
create mode 160000 2022
create mode 160000 2023
create mode 160000 2024

Push to github

Presentations$ git push
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 543 bytes | 543.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:srvanderplas/Presentations.git
1b161d9..94d80b5  main -> main

What does it look like?

Github Repository with Submodules

Cloning to another machine

Use the --recursive flag to clone submodules along with the main repo

$ git clone --recursive git@github.com:srvanderplas/Presentations.git Presentations-copy
Cloning into 'Presentations-copy'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (6/6), done.
Submodule '2020' (git@github.com:srvanderplas/2020-Presentations.git) registered for path '2020'
Submodule '2021' (git@github.com:srvanderplas/2021-Presentations.git) registered for path '2021'
Submodule '2022' (git@github.com:srvanderplas/2022-Presentations.git) registered for path '2022'
Submodule '2023' (git@github.com:srvanderplas/2023-Presentations.git) registered for path '2023'
Submodule '2024' (git@github.com:srvanderplas/2024-Presentations.git) registered for path '2024'
Cloning into '~/Projects/Web/Presentations-copy/2020'...
remote: Enumerating objects: 1151, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 1151 (delta 0), reused 5 (delta 0), pack-reused 1145
Receiving objects: 100% (1151/1151), 358.25 MiB | 53.17 MiB/s, done.
Resolving deltas: 100% (470/470), done.
Cloning into '~/Projects/Web/Presentations-copy/2021'...
remote: Enumerating objects: 463, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 463 (delta 1), reused 8 (delta 1), pack-reused 455
Receiving objects: 100% (463/463), 212.39 MiB | 50.32 MiB/s, done.
Resolving deltas: 100% (76/76), done.
Cloning into '~/Projects/Web/Presentations-copy/2022'...
remote: Enumerating objects: 443, done.
remote: Counting objects: 100% (121/121), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 443 (delta 11), reused 118 (delta 9), pack-reused 322
Receiving objects: 100% (443/443), 221.33 MiB | 47.46 MiB/s, done.
Resolving deltas: 100% (57/57), done.
Cloning into '~/Projects/Web/Presentations-copy/2023'...
remote: Enumerating objects: 695, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 695 (delta 12), reused 36 (delta 11), pack-reused 654
Receiving objects: 100% (695/695), 196.34 MiB | 52.04 MiB/s, done.
Resolving deltas: 100% (88/88), done.
Cloning into '~/Projects/Web/Presentations-copy/2024'...
remote: Enumerating objects: 412, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 412 (delta 60), reused 125 (delta 48), pack-reused 261
Receiving objects: 100% (412/412), 84.17 MiB | 44.36 MiB/s, done.
Resolving deltas: 100% (101/101), done.
Submodule path '2020': checked out '76e41d812f045b92b128db3886ab98fd36c4356f'
Submodule path '2021': checked out '7600505ab20712fdc6db9d14613c5c52885fcdf2'
Submodule path '2022': checked out '89fafb9c9d06dff569efc7ec7beaed946e671825'
Submodule path '2023': checked out 'e7581d5a8ca872706e933087040bd868dd4c6281'
Submodule path '2024': checked out '70770f1b8fbac76b5257423ee525c7524fc44821'

Example - Class

Course Website for Stat 850

  • Links to reading, homework assignments, textbook

  • Homework assignments stored in repos under user stat-assignments, since I use them for both 850 and undergraduate classes.

  • Project template in a computing-project repository

  • Separate assignment from rendering by creating separate files with directives

Repository

Working with Submodules… setting up

~/Downloads$ git clone https://github.com/srvanderplas/unl-stat850/ stat850
Cloning into 'stat850'...
remote: Enumerating objects: 1242, done.
remote: Counting objects: 100% (287/287), done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 1242 (delta 130), reused 200 (delta 57), pack-reused 955
Receiving objects: 100% (1242/1242), 28.75 MiB | 38.68 MiB/s, done.
Resolving deltas: 100% (690/690), done.

Where’s my stuff?

~/Downloads/stat850$ cd homework-repos/
01-git-github/       03-fizzbuzz/         05-data-cookies/     07-murder/           10-professional/     11-simulation/
02-finding-your-way/ 04-data-prog/        06-wrangling/        08-graphics/         11-simulate-craps/   12-shiny/
~/Downloads/stat850$ cd homework-repos/01-git-github/
~/Downloads/stat850/homework-repos/01-git-github$ ls

Fixing this situation

Initialize Submodules in your Repo

git submodule init # create local config file
~/Downloads/stat850$ git submodule init
Submodule 'homework-repos/01-git-github' (git@github.com:stat850-unl/github-starter-course.git) registered for path 'homework-repos/01-git-github'
Submodule 'homework-repos/02-finding-your-way' (git@github.com:stat850-unl/Hw-02-Finding-Your-Way.git) registered for path 'homework-repos/02-finding-your-way'
Submodule 'homework-repos/03-fizzbuzz' (git@github.com:stat-assignments/fizzbuzz-full.git) registered for path 'homework-repos/03-fizzbuzz'
Submodule 'homework-repos/04-data-prog' (git@github.com:stat850-unl/Hw-04-DataProgramming.git) registered for path 'homework-repos/04-data-prog'
Submodule 'homework-repos/05-data-cookies' (git@github.com:unl-stat251/hw-reading-data-template.git) registered for path 'homework-repos/05-data-cookies'
Submodule 'homework-repos/06-wrangling' (git@github.com:stat-assignments/poetry-reddit-dplyr.git) registered for path 'homework-repos/06-wrangling'
Submodule 'homework-repos/07-murder' (git@github.com:stat-assignments/sql-murder.git) registered for path 'homework-repos/07-murder'
Submodule 'homework-repos/08-graphics' (git@github.com:stat-assignments/graphics-complete.git) registered for path 'homework-repos/08-graphics'
Submodule 'homework-repos/10-professional' (git@github.com:stat-assignments/professional-documents.git) registered for path 'homework-repos/10-professional'
Submodule 'homework-repos/11-simulate-craps' (git@github.com:stat-assignments/simulation-craps.git) registered for path 'homework-repos/11-simulate-craps'
Submodule 'homework-repos/11-simulation' (https://github.com/stat-assignments/simulation-adv.git) registered for path 'homework-repos/11-simulation'
Submodule 'homework-repos/12-shiny' (https://github.com/stat-assignments/shiny-cocktails.git) registered for path 'homework-repos/12-shiny'
Submodule 'project/proj-template' (https://github.com/stat850-unl/project.git) registered for path 'project/proj-template'

Fixing this situation

Update your submodules

git submodule update # get changes
~/Downloads/stat850$ git submodule update
Cloning into '~/Downloads/stat850/homework-repos/01-git-github'...
Cloning into '~/Downloads/stat850/homework-repos/02-finding-your-way'...
Cloning into '~/Downloads/stat850/homework-repos/03-fizzbuzz'...
Cloning into '~/Downloads/stat850/homework-repos/04-data-prog'...
Cloning into '~/Downloads/stat850/homework-repos/05-data-cookies'...
Cloning into '~/Downloads/stat850/homework-repos/06-wrangling'...
Cloning into '~/Downloads/stat850/homework-repos/07-murder'...
Cloning into '~/Downloads/stat850/homework-repos/08-graphics'...
Cloning into '~/Downloads/stat850/homework-repos/10-professional'...
Cloning into '~/Downloads/stat850/homework-repos/11-simulate-craps'...
Cloning into '~/Downloads/stat850/homework-repos/11-simulation'...
Cloning into '~/Downloads/stat850/homework-repos/12-shiny'...
Cloning into '~/Downloads/stat850/project/proj-template'...
Submodule path 'homework-repos/01-git-github': checked out 'f9ad9ab3f68753f9ddef601c5e069d94d52ea0db'
Submodule path 'homework-repos/02-finding-your-way': checked out '460f30b2a0de17c6bc0a0b2394c462d370f672ad'
Submodule path 'homework-repos/03-fizzbuzz': checked out '5307020682c78895d5cf8e78927b7b31217727f8'
Submodule path 'homework-repos/04-data-prog': checked out '4344a05f08df8a8b76118b49cd45a325354683a0'
Submodule path 'homework-repos/05-data-cookies': checked out '6a892bc8c8615c731c70d4a5172e507293e9fb90'
Submodule path 'homework-repos/06-wrangling': checked out 'e8b31fc29526f05ddc597e1268db3c2d39962c06'
Submodule path 'homework-repos/07-murder': checked out 'ae7ac0f03d900c13d7ece93ec7d23fc7acca6103'
Submodule path 'homework-repos/08-graphics': checked out '1242e5bc7061a3d83ae114047d94e2e19275eb3c'
Submodule path 'homework-repos/10-professional': checked out '09fbd672451e9c34d657de2316597173eba36996'
Submodule path 'homework-repos/11-simulate-craps': checked out '3e603eb78d6b1fd6199507216eb81265a171cab1'
Submodule path 'homework-repos/11-simulation': checked out 'c14acf0995d6edf122f905b644ad553a3e40d570'
Submodule path 'homework-repos/12-shiny': checked out '51500f00be9058bee0d3b5da60439d0519dd3460'
Submodule path 'project/proj-template': checked out 'e4386574f871804cd9f9bc3127590c5ca7ae404e'

Making a change…

Last year, I accidentally left this line in the murder-mystery repository (I wanted to give extra credit to my undergrads the previous semester, but not grad students)

For 5 bonus points, when you’re finished in one language, write equivalent code in the other language to solve the problem.

I make my changes, and then go to commit them.

Making a change…

In the submodule folder:

  1. git checkout main
  2. git commit -am "changes"
  3. git push

This updates the submodule repository

~/Downloads/stat850$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified:   homework-repos/07-murder (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

Making a change…

In the main folder:

  1. Update the config file to point to the updated commit
    git commit -am "Update submodule"
  2. git push

By default, updates to the submodule don’t change your local version. This is useful for some situations (e.g. dependencies) and annoying for others (e.g. content changes).

Version tags are very helpful.

Removing a submodule

git submodule deinit <path-to-submodule>
git rm <path-to-submodule>
git commit -m "Removed annoying submodule"
git push