Tidyr Package Tutorial

Introduction to tidyr

The tidyr package is part of the tidyverse, a collection of R packages designed for data science. Tidyr helps you tidy your data, which means arranging it so that each variable is a column, each observation is a row, and each value is a cell. This tutorial covers the main functions of the tidyr package: gathering, spreading, separating, and uniting data.

Installation

Before using the tidyr package, you need to install it. You can do this using the install.packages() function in R:

install.packages("tidyr")

Once installed, load it into your R session with:

library(tidyr)
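
If you run the same script on several machines, the two steps above can be combined so that the package is installed only when it is missing. The following is a minimal sketch; the variable name tidyr_version is chosen here for illustration only.

```r
# Install tidyr only if it is not already available, then load it.
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr")
}
library(tidyr)

# Record the installed version; some functions were added or superseded
# in later releases, so it helps to know which one you have.
tidyr_version <- packageVersion("tidyr")
print(tidyr_version)
```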

Key Functions in tidyr

1. Gather

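The gather() function converts wide data into long format. This is useful when several columns hold the same kind of measurement, such as one column per year. Since tidyr 1.0.0, gather() is superseded by pivot_longer(); it still works, but new code should generally prefer pivot_longer().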

data_long <- gather(data_wide, key = "key", value = "value", -id)

This will gather all columns except for id into two columns: key and value.

2. Spread

The spread() function does the opposite of gather(): it converts long data into wide format. Since tidyr 1.0.0 it is superseded by pivot_wider(), though it still works.

data_wide <- spread(data_long, key = "key", value = "value")

This will spread the key column into multiple columns, filling them with the corresponding value.

3. Separate

The separate() function splits a single column into multiple columns at a delimiter. When the sep argument is a character string, it is interpreted as a regular expression.

data_separated <- separate(data, col = "full_name", into = c("first_name", "last_name"), sep = " ")

This will separate the full_name column into first_name and last_name using a space as the delimiter.

4. Unite

The unite() function combines multiple columns into a single column.

data_united <- unite(data, col = "full_name", first_name, last_name, sep = " ")

This will unite the first_name and last_name columns into a single full_name column. By default the input columns are removed from the result.

Examples

Example 1: Gathering Data

Let’s consider a simple dataset:

data_wide <- data.frame(id = 1:3, year_2019 = c(10, 20, 30), year_2020 = c(15, 25, 35))

Gathering this data would look like this:

data_long <- gather(data_wide, key = "year", value = "value", -id)

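The output is a long-format data frame with six rows and three columns: id, year, and value. The year column holds the original column names (year_2019 and year_2020) as strings, and value holds the corresponding numbers.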
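
To make the shape of the result concrete, the sketch below runs the same gather() call and then repeats the reshape with pivot_longer(), the newer tidyr function for the same task. The name data_long_pivot is introduced here for illustration and does not appear elsewhere in this tutorial.

```r
library(tidyr)

# Same wide dataset as above: one row per id, one column per year.
data_wide <- data.frame(
  id = 1:3,
  year_2019 = c(10, 20, 30),
  year_2020 = c(15, 25, 35)
)

# gather() stacks the year columns one after another, so all
# year_2019 rows come first, followed by all year_2020 rows.
data_long <- gather(data_wide, key = "year", value = "value", -id)
print(data_long)

# pivot_longer() performs the same reshape but returns a tibble and
# orders the rows by id rather than by original column.
data_long_pivot <- pivot_longer(
  data_wide,
  cols = -id,
  names_to = "year",
  values_to = "value"
)
print(data_long_pivot)
```

Both results contain the same six id/year/value combinations; only the row order and the class of the returned object differ.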

Example 2: Spreading Data

Using the long data we just created, spreading it back to wide format would be done as follows:

data_wide_reconstructed <- spread(data_long, key = "year", value = "value")
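
One way to confirm that spreading really undoes gathering is to compare the reconstructed data with the original. The sketch below does that and also shows the equivalent pivot_wider() call; round_trip_ok and data_wide_pivot are names made up for this example.

```r
library(tidyr)

# Rebuild the wide and long data from the earlier examples.
data_wide <- data.frame(
  id = 1:3,
  year_2019 = c(10, 20, 30),
  year_2020 = c(15, 25, 35)
)
data_long <- gather(data_wide, key = "year", value = "value", -id)

# Spread the long data back to one column per year.
data_wide_reconstructed <- spread(data_long, key = "year", value = "value")

# Compare column names and column contents with the original.
round_trip_ok <- identical(names(data_wide), names(data_wide_reconstructed)) &&
  isTRUE(all.equal(data_wide, data_wide_reconstructed, check.attributes = FALSE))
print(round_trip_ok)

# The same reshape with pivot_wider(), which returns a tibble.
data_wide_pivot <- pivot_wider(data_long, names_from = year, values_from = value)
print(data_wide_pivot)
```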

Example 3: Separating Data

For demonstration, let's create a data frame with full names:

data <- data.frame(full_name = c("John Doe", "Jane Smith"))

To separate full_name into first_name and last_name:

data_separated <- separate(data, col = "full_name", into = c("first_name", "last_name"), sep = " ")
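
Real name columns rarely split into exactly two pieces. The sketch below uses a hypothetical data frame, people, containing one three-part name and one single-word name, and relies on the extra and fill arguments of separate() to control what happens in those cases.

```r
library(tidyr)

# Hypothetical data: not every name has exactly two parts.
people <- data.frame(
  full_name = c("John Doe", "Jane Smith", "Mary Ann Lee", "Cher"),
  stringsAsFactors = FALSE
)

people_separated <- separate(
  people,
  col = full_name,
  into = c("first_name", "last_name"),
  sep = " ",
  extra = "merge",  # too many pieces: keep the remainder in last_name
  fill = "right"    # too few pieces: pad the right-hand side with NA
)
print(people_separated)
```

With these settings, "Mary Ann Lee" becomes "Mary" and "Ann Lee", and "Cher" gets a missing last_name. Without them, separate() would drop or pad the pieces and emit a warning.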

Example 4: Uniting Data

To unite first_name and last_name back into full_name:

data_united <- unite(data_separated, col = "full_name", first_name, last_name, sep = " ")
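
Two optional arguments of unite() are worth knowing: remove controls whether the input columns are kept, and na.rm controls how missing values are handled. The sketch below uses a hypothetical data frame, people, with one missing last name to show the difference.

```r
library(tidyr)

# Hypothetical data with a missing last name.
people <- data.frame(
  first_name = c("John", "Jane", "Cher"),
  last_name = c("Doe", "Smith", NA),
  stringsAsFactors = FALSE
)

# Default behaviour: input columns are dropped and NA is pasted
# into the result as the literal text "NA".
united_default <- unite(people, col = "full_name", first_name, last_name, sep = " ")
print(united_default)

# Keep the original columns and skip missing values when pasting.
united_clean <- unite(
  people,
  col = "full_name",
  first_name, last_name,
  sep = " ",
  remove = FALSE,
  na.rm = TRUE
)
print(united_clean)
```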

Conclusion

The tidyr package simplifies the process of tidying data for analysis in R, making datasets easier to work with in a structured manner. With functions like gather(), spread(), separate(), and unite(), you can reshape your data as needed.

Practice these functions with your own datasets to become more proficient in data manipulation using tidyr!