Week 04
RECAP & Quarto

API209: Summer Math Camp

Rony Rodrigo Maximiliano Rodriguez-Ramirez

Harvard University

September 6, 2024

RECAP

First week!

From the top!

  • I am aiming to cover the essentials.
    • Recap of essential functions (1 hour).
    • Recap of Quarto documents (1 hour).
    • Q&A (rest of the session).

Checklist

R installed?

     Current version 4.4.1

RStudio installed?

     I’m on RStudio 2024.04.2+764, which already comes with Quarto bundled.

Have these packages?

     tidyverse. For the PSet, you may use the sf package for maps.

Full Hands-on

  • Today, we are going step by step.
  • I will give you direct tips (and maybe some hints), and
  • we will organize our scripts and/or Quarto documents as we go.

Tips for the Recap Session

  • Follow along: Try running the code as we go through each example.
  • Ask questions: There’s no such thing as a bad question; this is a learning space!
  • Take notes: Writing down key points will help solidify your understanding.

Loading Packages and Data

  • What do we do first?
  • We always begin by loading the necessary packages.
  • For this set of exercises, we are going to use the starwars dataset.
  • It is already available once the tidyverse is loaded, since it ships with dplyr, one of the tidyverse packages.

Loading packages and data

So our first chunk (or lines of code) should look like this:

library(tidyverse)

We can call the starwars dataset by its name.

starwars
# A tibble: 87 × 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

You already know some functions to explore the data.
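A quick sketch of a few of them. It loads only dplyr (where starwars lives) so the snippet stands on its own; library(tidyverse) works just as well.

```r
library(dplyr)  # starwars ships with dplyr, which library(tidyverse) also loads

# Compact overview: one line per column, with its type and first values
glimpse(starwars)

# Dimensions: number of rows and columns
n_rows <- nrow(starwars)
n_cols <- ncol(starwars)

# Column names, handy when you forget how a variable is spelled
names(starwars)
```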

Which option would get you the total height?

sum(var)

sum(dataset$var, na.rm = TRUE)

starwars |> sum(height)

starwars |> sum(height, na.rm = TRUE)


Similarly, which option would get you the average height?

mean(var)

mean(dataset$var, na.rm = TRUE)

starwars |> mean(height)

starwars |> mean(height, na.rm = TRUE)


Summing Values

# Sum of height for all characters
total_height <- sum(starwars$height, na.rm = TRUE)
total_height
[1] 14143
  • We use the sum() function to calculate the total height of all characters in the dataset. The na.rm = TRUE option ensures missing values are ignored.
  • Notice that in this case I am assigning (<-) the result to an object.

In this specific exercise, we don’t really care about the result itself, since a total height across all characters has no real meaning.
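Why do the piped options fail? The native pipe |> passes the left-hand side as the first argument, so starwars |> sum(height) becomes sum(starwars, height), and sum() cannot look up height inside the data frame. A pipe-friendly sketch uses summarise(), which evaluates column names inside the data:

```r
library(dplyr)

# starwars |> sum(height) is the same as sum(starwars, height), which errors.
# summarise() looks up column names inside the data frame instead.
height_stats <- starwars |>
  summarise(
    total_height = sum(height, na.rm = TRUE),  # same as sum(starwars$height, na.rm = TRUE)
    avg_height   = mean(height, na.rm = TRUE)  # same as mean(starwars$height, na.rm = TRUE)
  )
height_stats
```

The result is a one-row tibble, so it can keep flowing through further pipes.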

Creating New Variables

What do we use if we want to create new variables?

mutate(). Let’s use mass and height from the dataset to create a bmi variable. BMI is mass in kilograms divided by the square of height in meters; since height is recorded in centimeters, we divide it by 100 first.

starwars <- starwars |> 
  mutate(bmi = mass / (height / 100)^2)

Using mutate(), we create a new column bmi, which calculates the Body Mass Index (BMI) for each character based on their mass and height.
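It is good practice to spot-check a new variable against a hand calculation. A minimal sketch using Luke Skywalker (172 cm, 77 kg in the printout above); the object names starwars_bmi and luke are just for illustration:

```r
library(dplyr)

# Recreate the bmi column: mass (kg) divided by height (m) squared
starwars_bmi <- starwars |>
  mutate(bmi = mass / (height / 100)^2)

# Spot-check one character by hand: 77 / 1.72^2, roughly 26.0
luke <- starwars_bmi |>
  filter(name == "Luke Skywalker") |>
  select(name, height, mass, bmi)
luke
```

Characters with a missing mass or height get NA for bmi, because arithmetic with NA returns NA.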

Subsetting our data

Which function do we use to subset the rows of our dataset (from the tidyverse)?

  • filter
  • Imagine that you would like to keep only the tall characters (i.e., height > 200). How do we do it? Assign the result to the object tall_characters.
# Filter characters with height greater than 200
tall_characters <- starwars |> 
  filter(height > 200)
tall_characters
# A tibble: 10 × 15
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 2 Chewbac…    228   112 brown      unknown    blue           200   male  mascu…
 3 Roos Ta…    224    82 none       grey       orange          NA   male  mascu…
 4 Rugor N…    206    NA none       green      orange          NA   male  mascu…
 5 Yarael …    264    NA none       white      yellow          NA   male  mascu…
 6 Lama Su     229    88 none       grey       black           NA   male  mascu…
 7 Taun We     213    NA none       grey       black           NA   fema… femin…
 8 Grievous    216   159 none       brown, wh… green, y…       NA   male  mascu…
 9 Tarfful     234   136 brown      brown      blue            NA   male  mascu…
10 Tion Me…    206    80 none       grey       black           NA   male  mascu…
# ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>, bmi <dbl>

The filter() function is used to select characters whose height is greater than 200 cm.
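Two details about filter() are worth knowing: rows where the condition evaluates to NA (for example, a missing height) are silently dropped, and several conditions separated by commas are combined with AND. A short sketch; the object names are illustrative:

```r
library(dplyr)

# How many rows filter(height > 200) can never keep, because height is NA
n_missing_height <- sum(is.na(starwars$height))

# Conditions separated by commas are combined with AND
tall_males <- starwars |>
  filter(height > 200, sex == "male")

# Use | for OR, and is.na() to keep missing values on purpose
tall_or_unknown <- starwars |>
  filter(height > 200 | is.na(height))
```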

Sorting

Use the same object, tall_characters, to sort the characters so that number 1 is the tallest character.

tall_characters |> 
  arrange(desc(height)) |> 
  select(name, height)
# A tibble: 10 × 2
   name         height
   <chr>         <int>
 1 Yarael Poof     264
 2 Tarfful         234
 3 Lama Su         229
 4 Chewbacca       228
 5 Roos Tarpals    224
 6 Grievous        216
 7 Taun We         213
 8 Rugor Nass      206
 9 Tion Medon      206
10 Darth Vader     202
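Notice that Rugor Nass and Tion Medon are tied at 206. A sketch of breaking ties explicitly by giving arrange() a second sorting variable (here, alphabetical by name):

```r
library(dplyr)

tall_characters <- starwars |>
  filter(height > 200)

# Sort by height (descending); ties are then sorted alphabetically by name
tall_sorted <- tall_characters |>
  arrange(desc(height), name) |>
  select(name, height)
tall_sorted
```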

Extra optional question:

What’s the difference between select and filter?
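One way to see the difference is to compare dimensions: select() keeps all rows and chooses columns, while filter() keeps all columns and chooses rows. A minimal sketch:

```r
library(dplyr)

# select() chooses columns: every row survives, only 2 columns remain
by_columns <- starwars |>
  select(name, height)

# filter() chooses rows: every column survives, only matching rows remain
by_rows <- starwars |>
  filter(height > 200)

dim(by_columns)
dim(by_rows)
```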

Grouping and Summarizing Data

Now, imagine we would like to know the average height by species in this universe. How do we do it?

# Think in steps.
avg_height <- starwars |> 
  func(___) |> 
  func(___ = ___(___, na.rm = TRUE))

avg_height

Grouping and Summarizing Data


# Group by species and summarize average height
avg_height <- starwars |> 
  group_by(species) |> 
  summarise(avg_height = mean(height, na.rm = TRUE))
avg_height
# A tibble: 38 × 2
   species   avg_height
   <chr>          <dbl>
 1 Aleena           79 
 2 Besalisk        198 
 3 Cerean          198 
 4 Chagrian        196 
 5 Clawdite        168 
 6 Droid           131.
 7 Dug             112 
 8 Ewok             88 
 9 Geonosian       183 
10 Gungan          209.
# ℹ 28 more rows
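Averages are easier to interpret when you know how many characters sit behind each one, and the table above also contains a group for characters whose species is NA. A sketch that drops that group and counts characters with n(); the name avg_height_n is illustrative:

```r
library(dplyr)

avg_height_n <- starwars |>
  filter(!is.na(species)) |>          # drop characters with unknown species
  group_by(species) |>
  summarise(
    avg_height = mean(height, na.rm = TRUE),
    n = n()                           # number of characters per species
  ) |>
  arrange(desc(n))
avg_height_n
```

A species average based on a single character is just that character’s height, which is worth keeping in mind when ranking species.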

Which species has the largest average height?

avg_height |> 
  arrange(desc(avg_height)) |> 
  head(1) |> 
  pull(species)
[1] "Quermian"

The Quermian
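An alternative sketch uses dplyr’s slice_max(), which picks the row(s) with the largest value in one step. Unlike head(1), it keeps all tied rows by default, so it would return more than one species if two were tied at the top:

```r
library(dplyr)

avg_height <- starwars |>
  group_by(species) |>
  summarise(avg_height = mean(height, na.rm = TRUE))

# slice_max() keeps the row(s) with the largest avg_height; ties are all kept by default
tallest_species <- avg_height |>
  slice_max(avg_height, n = 1) |>
  pull(species)
tallest_species
```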

Visualization with ggplot2

Let’s visualize the tallest characters. Use the object tall_characters to create a bar plot of character name (categorical, y axis) against height (x axis).

tall_characters |> 
  ggplot(
    aes(
      x = height,
      y = name
    )
  ) +
  geom_col(color = "black", fill = "grey") +
  labs(title = "Top 10 tallest characters in this dataset") +
  theme_minimal()

Visualization with ggplot2

How do we order the bars in the plot?

You can use Google. Hint: factor()

tall_characters |> 
  arrange(height) |> 
  mutate(name = factor(name, levels = name)) |> 
  ggplot(
    aes(
      x = height,
      y = name
    )
  ) +
  geom_col(color = "black", fill = "grey") +
  labs(title = "Top 10 tallest characters in this dataset") +
  theme_minimal()
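Another common option (a swapped-in technique, not the one on this slide) is forcats::fct_reorder(), which orders the factor levels of name by height directly, so it does not depend on sorting the rows first. On a discrete y axis the first level is drawn at the bottom, so with ascending levels the tallest bar ends up on top:

```r
library(dplyr)
library(forcats)
library(ggplot2)

tall_characters <- starwars |>
  filter(height > 200)

# fct_reorder() sorts the levels of name by height (ascending by default)
tall_ordered <- tall_characters |>
  mutate(name = fct_reorder(name, height))

p <- tall_ordered |>
  ggplot(aes(x = height, y = name)) +
  geom_col(color = "black", fill = "grey") +
  labs(title = "Top 10 tallest characters in this dataset") +
  theme_minimal()
p
```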


Quarto

Quarto Tip 1: Always Render Your Document

  • Render frequently: Make sure to render your Quarto document often to catch issues early.
  • Use the Render button in RStudio or press Ctrl + Shift + K (Cmd + Shift + K on a Mac).
  • Rendering ensures your code works and produces the correct output before you submit or share your document.

Quarto Tip 2: Loading Packages Correctly

  • Load your packages at the top of the document. This makes sure that all the functions you need are available when you run your code.

Example:

library(tidyverse)
  • If a package isn’t loaded, the functions from that package won’t work, leading to errors in your document.
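In a Quarto document, one common pattern is to make this the very first code chunk. The #| lines are Quarto chunk options: label names the chunk, and message: false hides the tidyverse startup messages in the rendered output. When the code runs as plain R, they are ordinary comments. A sketch, assuming you want those messages hidden:

```r
#| label: setup
#| message: false

# Load every package the document needs, once, at the top
library(tidyverse)
```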

Quarto Tip 3: Code and Answer Boxes

  • In your problem set, you’ll often see two boxes:
    • Your code here: This is where you’ll write and run your R code.
    • Your answer here: This is where you’ll explain your results or interpretations in plain text.

Example:

# Your code here
starwars |> 
  select(name, height)
# A tibble: 87 × 2
   name               height
   <chr>               <int>
 1 Luke Skywalker        172
 2 C-3PO                 167
 3 R2-D2                  96
 4 Darth Vader           202
 5 Leia Organa           150
 6 Owen Lars             178
 7 Beru Whitesun Lars    165
 8 R5-D4                  97
 9 Biggs Darklighter     183
10 Obi-Wan Kenobi        182
# ℹ 77 more rows

Quarto Tip 4: Using the Visual Editor

  • Quarto provides a visual editor to make writing markdown easier.
  • You can access it by clicking the Visual button at the top of your document.
  • The visual editor helps you format text and add headings, lists, and code chunks without having to remember the exact markdown syntax.

That’s it! Good luck!