Tutorial 1: Introduction to R and Basic Data Types

Author

Rony Rodriguez-Ramirez

Introduction to R

R is a programming language widely used for statistical computing and data analysis. This tutorial covers the basics of R: how to work with vectors and the basic data types. By the end, you will be able to perform simple data operations in R and recognize its fundamental data types.

1.1 Setting Up R and RStudio

To begin using R, you’ll need to install R and RStudio. RStudio is an integrated development environment (IDE) that makes it easier to write R code.

  • Installing R: Visit the CRAN website to download and install R.
  • Installing RStudio: Download and install RStudio from the RStudio website.

Once installed, open RStudio, and you’re ready to start coding in R!

1.2 Basic Data Types in R

This section introduces vectors, the basic data structure in R, and the data types that their elements can have.

1.2.1 Vectors

Vectors are the most basic data structure in R. A vector is a sequence of data elements of the same basic type.

# Numeric vector
scores <- c(85, 90, 76, 88, 92)

# Character vector
students <- c("Alice", "Bob", "Charlie", "David", "Eva")

# Logical vector
passed <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
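
Because every element of a vector shares one type, R converts mixed inputs to a single common type. The short sketch below shows how to inspect a vector and what that conversion looks like. The name `mixed` is made up for this illustration and is not part of the tutorial's data.

```r
# Re-create the numeric vector from above so this block runs on its own
scores <- c(85, 90, 76, 88, 92)

length(scores)  # 5 elements
class(scores)   # "numeric"

# Mixing types forces R to pick one common type for every element.
# `mixed` is a hypothetical name used only for this illustration.
mixed <- c(1, "two", TRUE)
class(mixed)    # "character"
mixed           # "1" "two" "TRUE": the number and the logical became text
```

If a numeric vector unexpectedly behaves like text, checking it with class() is a quick first step.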

1.2.2 Data Types

R includes several fundamental data types:

  • Numeric: Used for numbers. E.g., 1, 3.14, 42.
  • Character: Used for text strings. E.g., "apple", "R programming".
  • Logical: Used for TRUE or FALSE values. E.g., TRUE, FALSE.
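  • Factor: Used for categorical data with a fixed set of levels. E.g., levels like "low", "medium", "high". Strictly speaking, a factor is a class built on integer codes rather than a basic type of its own.

# Example of different data types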
age <- 25               # Numeric
name <- "Alice"         # Character
is_student <- TRUE      # Logical
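
The type of any value can be checked with class(). The sketch below does this for the three variables above and also builds a small factor, since the list mentions factors without showing one. The name `priority` and its values are assumptions chosen for illustration.

```r
age <- 25
name <- "Alice"
is_student <- TRUE

class(age)         # "numeric"
class(name)        # "character"
class(is_student)  # "logical"

# A factor stores categorical data with a fixed, ordered set of levels.
# `priority` is a hypothetical example variable.
priority <- factor(c("low", "high", "medium", "low"),
                   levels = c("low", "medium", "high"))

class(priority)       # "factor"
levels(priority)      # "low" "medium" "high"
as.integer(priority)  # 1 3 2 1: the integer codes behind each label
```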

1.3 Basic Operations with Vectors

You can perform arithmetic operations on numeric vectors and use indices to subset them.

# Arithmetic operations
total_score <- scores + 5  # Adding 5 to each score

# Subsetting vectors
top_student <- students[which.max(scores)]  # Finding the student with the highest score
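
A few more ways to subset and combine vectors are sketched below, using the same scores and students data. The `bonus` vector is a hypothetical addition used only to show element-by-element arithmetic between two vectors.

```r
scores <- c(85, 90, 76, 88, 92)
students <- c("Alice", "Bob", "Charlie", "David", "Eva")

# Subsetting by position (R counts from 1, not 0)
scores[1]        # 85
scores[c(2, 4)]  # 90 88
scores[-1]       # 90 76 88 92: a negative index drops that element

# Subsetting with a logical condition
students[scores >= 88]  # "Bob" "David" "Eva"

# which.max() returns the position of the largest value
which.max(scores)       # 5

# Arithmetic between two vectors of equal length works element by element.
# `bonus` is a hypothetical vector for this illustration.
bonus <- c(5, 0, 10, 0, 0)
scores + bonus          # 90 90 86 88 92
```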

Exercises and Solutions

Exercise 1: Create and Manipulate Vectors

  1. Create a numeric vector called ages that contains the ages of five students: 18, 21, 19, 22, 20.
  2. Subtract 2 from each element in the ages vector and store the result in adjusted_ages.
  3. Find the maximum age in the adjusted_ages vector.

Solution:

# Step 1: Create the ages vector
ages <- c(18, 21, 19, 22, 20)

# Step 2: Subtract 2 from each element
adjusted_ages <- ages - 2

# Step 3: Find the maximum adjusted age
max_age <- max(adjusted_ages)
max_age
[1] 20

Expected output:

[1] 20

Exercise 2: Working with Character Vectors

  1. Create a character vector called subjects that contains the names of three school subjects: "Math", "History", "Biology".
  2. Add a new subject "Physics" to the subjects vector.
  3. Extract the second subject from the subjects vector.

Solution:

# Step 1: Create the subjects vector
subjects <- c("Math", "History", "Biology")

# Step 2: Add a new subject
subjects <- c(subjects, "Physics")

# Step 3: Extract the second subject
second_subject <- subjects[2]
second_subject
[1] "History"

Expected output:

[1] "History"

Exercise 3: Logical Operations

  1. Create a logical vector called attendance with values TRUE, FALSE, TRUE, TRUE, FALSE.
  2. Count how many students attended (i.e., how many TRUE values there are).
  3. Find out if all students attended by using the all() function.

Solution:

# Step 1: Create the attendance vector
attendance <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Step 2: Count how many students attended
count_attendance <- sum(attendance)
count_attendance
[1] 3

# Step 3: Check if all students attended
all_attended <- all(attendance)
all_attended
[1] FALSE
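
sum() can count attendance because R treats TRUE as 1 and FALSE as 0 in arithmetic. A few related helpers build on the same idea. This is a brief sketch that goes slightly beyond what the exercise asks for.

```r
attendance <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

any(attendance)     # TRUE: at least one student attended
sum(!attendance)    # 2: number of absences (! flips TRUE and FALSE)
which(!attendance)  # 2 5: positions of the absent students
mean(attendance)    # 0.6: proportion of students who attended
```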

Exercise 4: Calculating Averages

  1. Using the scores vector from earlier, calculate the average score of the students.
  2. Determine how many students scored above 80 using the sum() function.

Solution:

# Step 1: Calculate the average score
average_score <- mean(scores)
average_score
[1] 86.2

# Step 2: Count how many students scored above 80
students_above_80 <- sum(scores > 80)
students_above_80
[1] 4
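
To see why sum(scores > 80) counts students, it may help to look at the intermediate logical vector. The sketch below prints it and then reuses it to look up the matching names. The name `above_80` is an assumption introduced here.

```r
scores <- c(85, 90, 76, 88, 92)
students <- c("Alice", "Bob", "Charlie", "David", "Eva")

# The comparison runs on every element and returns one TRUE/FALSE per score.
# `above_80` is a hypothetical name for this intermediate result.
above_80 <- scores > 80
above_80            # TRUE TRUE FALSE TRUE TRUE

sum(above_80)       # 4: each TRUE counts as 1
students[above_80]  # "Alice" "Bob" "David" "Eva"
```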

Exercise 5: Creating and Using Factors

  1. Create a factor variable grade_levels that holds one grade level per student, using the levels "Freshman", "Sophomore", "Junior", "Senior".
  2. Pair each student in the students vector with a grade level by combining both in a data frame.
  3. Display the frequency of each grade level using the table() function.

Solution:

# Step 1: Create the grade_levels factor
grade_levels <- factor(c("Freshman", "Sophomore", "Junior", "Senior", "Freshman"),
                       levels = c("Freshman", "Sophomore", "Junior", "Senior"))

# Step 2: Assign grade levels to students
students_grade_levels <- data.frame(students, grade_levels)

# Step 3: Display the frequency of each grade level
grade_frequency <- table(students_grade_levels$grade_levels)
grade_frequency
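
table() also works on the factor directly, without the data frame. The sketch below shows this and one property of factors worth knowing: a level keeps its place in the table even when no element uses it. The name `first_years` is hypothetical.

```r
grade_levels <- factor(c("Freshman", "Sophomore", "Junior", "Senior", "Freshman"),
                       levels = c("Freshman", "Sophomore", "Junior", "Senior"))

nlevels(grade_levels)  # 4

# Count each level directly from the factor
counts <- table(grade_levels)
counts["Freshman"]     # 2

# Subsetting a factor keeps all of its levels, so unused levels
# show up in the table with a count of 0.
# `first_years` is a hypothetical name for this illustration.
first_years <- grade_levels[grade_levels == "Freshman"]
table(first_years)     # Freshman 2, all other levels 0
```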