Data Frames in R Programming

Introduction to Data Frames

A data frame is a two-dimensional, tabular data structure in R. Each column holds values of a single type (numeric, character, factor, etc.), but different columns can hold different types. It is similar to a spreadsheet or a SQL table and is one of the most widely used data structures for data analysis in R.

Creating Data Frames

Data frames are created with the data.frame() function. Each column is supplied as a vector; the vectors can be of different types, but they should all have the same length.

Example:
df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 35), Height = c(5.5, 6.0, 5.8))

This code creates a data frame named df with three columns: Name, Age, and Height.

Output:
     Name Age Height
1   Alice  25    5.5
2     Bob  30    6.0
3 Charlie  35    5.8
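
To confirm that each column kept its own type, it can help to inspect the data frame right after creating it. The following is a minimal sketch, not part of the original example: it rebuilds the same df and passes stringsAsFactors = FALSE explicitly (an assumption made here so that Name is a character column on every R version; since R 4.0 this is already the default).

```r
# Rebuild the data frame from the example so this snippet runs on its own.
# stringsAsFactors = FALSE keeps Name as character on R versions before 4.0.
df <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 35),
  Height = c(5.5, 6.0, 5.8),
  stringsAsFactors = FALSE
)

dim(df)            # number of rows, then number of columns
names(df)          # column names
sapply(df, class)  # class of each column
str(df)            # compact overview of the whole structure
```

Here sapply(df, class) reports "character" for Name and "numeric" for Age and Height, which illustrates that types are fixed per column, not per cell.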
                

Accessing Data Frame Elements

You can access data frame elements using the $ operator or brackets, and you can preview the first or last rows with head() and tail().

Example:
df$Name

This command returns the Name column of the data frame as a vector.

Output:
[1] "Alice"   "Bob"     "Charlie"
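
The bracket forms and head()/tail() mentioned above can be sketched as follows. This is an illustrative snippet that rebuilds the same three-column df so it runs on its own; stringsAsFactors = FALSE is an assumption added here to keep Name as character on older R versions.

```r
# Same data as the earlier example, rebuilt so the snippet is self-contained.
df <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 35),
  Height = c(5.5, 6.0, 5.8),
  stringsAsFactors = FALSE
)

df[["Age"]]             # one column as a vector, equivalent to df$Age
df[2, ]                 # second row, returned as a one-row data frame
df[2, "Name"]           # a single cell: row 2 of the Name column
df[, c("Name", "Age")]  # two columns, still a data frame
head(df, 2)             # first two rows
tail(df, 1)             # last row
```

The general pattern is df[rows, columns]; leaving either side empty selects everything along that dimension.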
                

Adding and Removing Columns

You can add new columns to a data frame using the $ operator, and you can remove columns using the subset() function or by setting the column to NULL.

Example:
df$Weight <- c(130, 150, 145)

This line adds a new column named Weight to the existing data frame df.

Output:
     Name Age Height Weight
1   Alice  25    5.5    130
2     Bob  30    6.0    150
3 Charlie  35    5.8    145
                
Removing a Column:
df$Height <- NULL

This line removes the Height column from the data frame.
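
The subset() route mentioned at the start of this section can be sketched like this. The name people is a hypothetical stand-in, used so the snippet does not disturb df; it mirrors df as it looked before Height was removed.

```r
# `people` is an illustrative copy of the four-column data frame.
people <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 35),
  Height = c(5.5, 6.0, 5.8),
  Weight = c(130, 150, 145),
  stringsAsFactors = FALSE
)

# Drop one column. subset() returns a new data frame; `people` is unchanged.
no_height <- subset(people, select = -Height)
names(no_height)

# Drop several columns at once.
name_age <- subset(people, select = -c(Height, Weight))
names(name_age)
```

Unlike assigning NULL, subset() does not modify the original object, so the result has to be assigned to a name if you want to keep it.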

Subsetting Data Frames

Subsetting allows you to extract specific rows or columns from a data frame. You can use logical conditions or specify row/column indices.

Example:
subset(df, Age > 28)

This command returns the rows where Age is greater than 28. The Height column no longer appears because it was removed in the previous section.

Output:
     Name Age Weight
2     Bob  30    150
3 Charlie  35    145
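
Bracket indexing covers the same ground as subset() and also handles row and column indices. The snippet below is a minimal sketch that rebuilds df in its current state (Name, Age, Weight); the result names older, older_light, first_two, and picked are illustrative only.

```r
# df as it stands at this point in the tutorial, rebuilt to be self-contained.
df <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 35),
  Weight = c(130, 150, 145),
  stringsAsFactors = FALSE
)

# Logical condition inside brackets: same rows as subset(df, Age > 28).
older <- df[df$Age > 28, ]

# Combine conditions with & (and) or | (or).
older_light <- df[df$Age > 28 & df$Weight < 150, ]

# Row indices and column names together.
first_two <- df[1:2, c("Name", "Weight")]

# subset() can filter rows and pick columns in one call.
picked <- subset(df, Age > 28, select = c(Name, Age))
```

Note the trailing comma in df[df$Age > 28, ]: it tells R that the condition selects rows and that all columns should be kept.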
                

Sorting Data Frames

Sorting is done with the order() function, which returns the row positions in sorted order. Indexing the data frame with those positions sorts it by one or more columns.

Example:
df <- df[order(df$Age), ]

This command sorts the data frame df by the Age column in ascending order.
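
Because df is already in ascending Age order, the effect of sorting is easier to see on data with ties. The following sketch uses a hypothetical data frame named roster (not part of the original example) to show a descending sort and a sort on more than one column.

```r
# `roster` is illustrative data with repeated ages so the second key matters.
roster <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Age  = c(30, 25, 30, 25),
  stringsAsFactors = FALSE
)

# Descending by Age.
by_age_desc <- roster[order(roster$Age, decreasing = TRUE), ]

# Ascending by Age, then by Name to break ties.
by_age_name <- roster[order(roster$Age, roster$Name), ]
by_age_name
```

Each extra argument to order() acts as a tie-breaker for the ones before it, so by_age_name lists Bob and Dana (both 25) before Alice and Charlie (both 30).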

Conclusion

Data frames are a fundamental data structure in R that provide flexible and efficient ways to manage and analyze data. Knowing how to create, access, modify, subset, and sort them is essential for data analysis in R.