The goal of LegoR is to make it easy to get Lego-centric data into R.
You can install the development version from GitHub with:
```r
# Set up and load packages
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 3.2.0 ✔ purrr 0.3.2
#> ✔ tibble 2.1.3 ✔ dplyr 0.8.3
#> ✔ tidyr 0.8.3.9000 ✔ stringr 1.4.0
#> ✔ readr 1.3.1 ✔ forcats 0.4.0
#> ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
library(LegoR)
```
The first set of functions provides a convenient way to scrape data from https://shop.lego.com/. These functions are based on the rvest package and depend on the structure of the site, so site updates may break them. All functions for lego.com start with the `lego_` prefix.
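Because these scrapers depend on the page structure, a site change can make a single call fail and halt a long pipeline. One way to guard against that, sketched here under the assumption that skipping a failed page is acceptable, is to wrap a scraper with `purrr::possibly()`. `flaky_scraper()` is a hypothetical stand-in for a function such as `lego_get_sets()`, so the example runs offline.

```r
library(purrr)

# Hypothetical stand-in for a scraper such as lego_get_sets(): it fails
# for one input, the way a scraper might after a site redesign.
flaky_scraper <- function(link) {
  if (link == "broken") stop("page structure changed")
  data.frame(link = link, n_sets = 1L, stringsAsFactors = FALSE)
}

# possibly() returns `otherwise` instead of raising an error
safe_scraper <- possibly(flaky_scraper, otherwise = NULL)

links <- c("ok-1", "broken", "ok-2")
results <- map(links, safe_scraper)

# Drop the failed pages (NULL results) and combine the rest
combined <- do.call(rbind, compact(results))
combined
```

Failures are silently dropped here, so in practice it may be worth counting the `NULL` results to see how many pages were skipped.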
The natural approach to gathering data on all currently available LEGO sets is to retrieve the sets theme by theme.
```r
(themes <- lego_get_themes())
#> # A tibble: 40 x 4
#> theme_name theme_link theme_description theme_age_range
#> <chr> <chr> <chr> <chr>
#> 1 Architectu… https://shop.lego… LEGO® Architecture prese… ""
#> 2 BOOST https://shop.lego… LEGO® BOOST lets childre… ""
#> 3 BrickHeadz https://shop.lego… Collect, build and displ… ""
#> 4 City https://shop.lego… LEGO® City is a realisti… ""
#> 5 Classic https://shop.lego… Develop children’s creat… ""
#> 6 Creator 3-… https://shop.lego… The LEGO® Creator series… ""
#> 7 Creator Ex… https://shop.lego… Are you ready for the ul… ""
#> 8 DC Super H… https://shop.lego… LEGO® DC Universe™ Super… ""
#> 9 Disney™ https://shop.lego… LEGO® Disney characters … ""
#> 10 DUPLO® https://shop.lego… For 50 years, we have be… ""
#> # … with 30 more rows
```
Each theme link leads to a page with one or more sets.
```r
(architecture_sets <- lego_get_sets(themes$theme_link[1]))
#> # A tibble: 11 x 5
#> set_flag set_id set_price set_title set_link
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 New 21046 130. Empire State Bu… https://shop.lego.com/en-US/…
#> 2 New 21045 80.0 Trafalgar Square https://shop.lego.com/en-US/…
#> 3 <NA> 21042 120. Statue of Liber… https://shop.lego.com/en-US/…
#> 4 <NA> 21030 100.0 United States C… https://shop.lego.com/en-US/…
#> 5 <NA> 21028 60.0 New York City https://shop.lego.com/en-US/…
#> 6 <NA> 21039 60.0 Shanghai https://shop.lego.com/en-US/…
#> 7 <NA> 21041 50.0 Great Wall of C… https://shop.lego.com/en-US/…
#> 8 <NA> 21044 50.0 Paris https://shop.lego.com/en-US/…
#> 9 <NA> 21043 50.0 San Francisco https://shop.lego.com/en-US/…
#> 10 <NA> 21047 40.0 Las Vegas https://shop.lego.com/en-US/…
#> 11 <NA> 21034 40.0 London https://shop.lego.com/en-US/…
```
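Indexing with `themes$theme_link[1]` assumes Architecture stays first in the listing. Selecting the row by `theme_name` does not depend on the order of the page. The sketch below uses a small hand-made tibble with the same column names as the `lego_get_themes()` output; the links are placeholders, and the exact name string (assumed here to be `"Architecture"`) should be checked against the live output.

```r
library(dplyr)

# Hand-made stand-in for lego_get_themes() output; links are placeholders
themes <- tibble(
  theme_name = c("Architecture", "BOOST", "BrickHeadz"),
  theme_link = c("link-architecture", "link-boost", "link-brickheadz")
)

# Look the link up by name rather than by position
architecture_link <- themes %>%
  filter(theme_name == "Architecture") %>%
  pull(theme_link)

architecture_link
# With live data, this link could then be passed to lego_get_sets()
```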
If all you need are prices and titles, you could stop here, but additional data is available on each set's own page.
```r
lego_get_set_data(architecture_sets$set_link[1])
#> # A tibble: 1 x 9
#> set_Item set_VIP.Points set_Ages set_Pieces set_minifigs set_availability
#> <chr> <chr> <chr> <chr> <lgl> <chr>
#> 1 21046 845 16+ 1767 NA Available now
#> # … with 3 more variables: set_review_count <dbl>, set_rating_value <dbl>,
#> # set_best_rating <dbl>
```
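Several of these columns come back as character (`<chr>`), including `set_VIP.Points`, `set_Ages`, and `set_Pieces`. A minimal sketch of converting them, rebuilding the single row shown above by hand and using `readr::parse_number()`, which drops non-numeric characters such as the `+` in `16+`. The new column name `set_min_age` is illustrative, not part of the package output.

```r
library(dplyr)
library(readr)

# Hand-copied from the lego_get_set_data() output above
set_row <- tibble(
  set_Item = "21046",
  set_VIP.Points = "845",
  set_Ages = "16+",
  set_Pieces = "1767"
)

set_row_num <- set_row %>%
  mutate(
    set_VIP.Points = parse_number(set_VIP.Points),
    set_min_age = parse_number(set_Ages),  # "16+" becomes 16
    set_Pieces = parse_number(set_Pieces)
  )

set_row_num
```

Age ranges written differently (for example with a lower and an upper bound) would need their own parsing rule; `parse_number()` keeps only the first number it finds.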
These functions return tibbles designed to chain together easily in a pipeline:
```r
set_data <- lego_get_themes() %>%
  filter(row_number() == 1) %>% # Don't get everything in the demo
  mutate(set_summary = purrr::map(theme_link, lego_get_sets)) %>% # one tibble of sets per theme
  unnest(set_summary) %>%
  mutate(set_data = purrr::map(set_link, lego_get_set_data)) %>% # one row of details per set
  unnest(set_data) %>%
  select(-set_Item) # set_Item duplicates set_id
set_data
#> # A tibble: 11 x 17
#> theme_name theme_link theme_descripti… theme_age_range set_flag set_id
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Architect… https://s… LEGO® Architect… "" New 21046
#> 2 Architect… https://s… LEGO® Architect… "" New 21045
#> 3 Architect… https://s… LEGO® Architect… "" <NA> 21042
#> 4 Architect… https://s… LEGO® Architect… "" <NA> 21030
#> 5 Architect… https://s… LEGO® Architect… "" <NA> 21028
#> 6 Architect… https://s… LEGO® Architect… "" <NA> 21039
#> 7 Architect… https://s… LEGO® Architect… "" <NA> 21041
#> 8 Architect… https://s… LEGO® Architect… "" <NA> 21044
#> 9 Architect… https://s… LEGO® Architect… "" <NA> 21043
#> 10 Architect… https://s… LEGO® Architect… "" <NA> 21047
#> 11 Architect… https://s… LEGO® Architect… "" <NA> 21034
#> # … with 11 more variables: set_price <dbl>, set_title <chr>,
#> # set_link <chr>, set_VIP.Points <chr>, set_Ages <chr>,
#> # set_Pieces <chr>, set_minifigs <lgl>, set_availability <chr>,
#> # set_review_count <dbl>, set_rating_value <dbl>, set_best_rating <dbl>
```
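The pipeline relies on list-columns: `purrr::map()` stores one tibble per row, and `unnest()` expands those tibbles into ordinary rows. The offline sketch below shows the same pattern with a hypothetical `fake_get_sets()` in place of `lego_get_sets()`, so the mechanics can be seen without scraping.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for lego_get_sets(): returns two sets per theme link
fake_get_sets <- function(link) {
  tibble(set_id = paste0(link, "-", 1:2), set_price = c(10, 20))
}

themes <- tibble(theme_name = c("A", "B"), theme_link = c("a", "b"))

sets <- themes %>%
  mutate(set_summary = map(theme_link, fake_get_sets)) %>%  # list-column
  unnest(set_summary)                                       # one row per set

sets
```

Each theme row is repeated once per set it contains, which is why the theme columns appear on every row of `set_data` above.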
https://brickset.com/ contains data on historical LEGO sets as well as current sets. Unlike lego.com, Brickset data can be accessed through an API (application programming interface). This requires registering for a Brickset account and requesting an API key. All functions for brickset.com data start with the `brickset_` prefix.
Once you have your credentials, you can save them to your `.Rprofile` using `brickset_save_credentials()`. This also saves the credentials as global variables.
You can then access Brickset’s data by authenticating. Depending on your internet configuration you may need to reauthenticate periodically, but most functions should refresh the authentication automatically.
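How LegoR performs that automatic refresh is not described here. As a generic illustration of the idea only, a failed call can be retried once after re-authenticating. `authenticate()`, `call_api()`, and `with_reauth()` are hypothetical placeholders, not LegoR functions, and the package's actual mechanism may differ.

```r
# Hypothetical placeholders, not LegoR functions: a session whose token
# starts out expired, a login step, and an API call that needs a valid token
session <- new.env()
session$valid <- FALSE

authenticate <- function() {
  session$valid <- TRUE
  invisible(TRUE)
}

call_api <- function() {
  if (!session$valid) stop("authentication expired")
  "data"
}

# Try the call; on error, re-authenticate once and try again
with_reauth <- function(f) {
  tryCatch(f(), error = function(e) {
    authenticate()
    f()
  })
}

with_reauth(call_api)
```

Retrying only once keeps a genuinely broken request (wrong key, network down) from looping forever.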
As with the LEGO store, sets on Brickset are organized by theme.
We can see what themes existed at the beginning…
Or the themes that have been around the longest…
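Both questions come down to a filter and a sort on the years each theme was active. A sketch of the `dplyr` steps, assuming the theme data arrive as a tibble with one row per theme plus its first and last year; the column names `year_from` and `year_to`, the theme names, and the years below are all hypothetical, not Brickset output.

```r
library(dplyr)

# Hypothetical theme table; names, years, and columns are illustrative only
bs_themes <- tibble(
  theme = c("Theme A", "Theme B", "Theme C", "Theme D"),
  year_from = c(1955, 1955, 1978, 1999),
  year_to = c(2019, 1970, 2019, 2005)
)

# Themes that existed at the beginning: those with the earliest start year
earliest <- bs_themes %>%
  filter(year_from == min(year_from))

# Themes that have been around the longest: sort by number of years active
longest <- bs_themes %>%
  mutate(years_active = year_to - year_from + 1) %>%
  arrange(desc(years_active))

earliest
longest
```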
Most of the functions described in the API documentation have been wrapped; the exceptions are functions that concern a user’s personal collection.