
Python for ML: NumPy & Pandas

1. NumPy Introduction

NumPy is the core library for numerical computing in Python. It provides an N-dimensional array type (ndarray), which also represents matrices, and a large collection of mathematical functions that operate on these arrays.

Key features of NumPy include:

  • Supports multi-dimensional arrays and matrices.
  • Offers element-wise (vectorized) mathematical operations that apply to whole arrays without explicit loops.
  • Facilitates linear algebra, Fourier transforms, and random number generation.
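The three areas named in the last bullet can be sketched briefly. This is a minimal illustration, not a complete tour: the matrix, signal, and seed values are made up for the example.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b for x.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print("Solution:", x)

# Fourier transform: frequency components of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print("Spectrum:", spectrum)

# Random number generation: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=3)
print("Samples:", samples)
```

Note that A is a two-dimensional array, so the same ndarray type serves for both vectors and matrices.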

2. Using NumPy

NumPy must be installed before it can be imported. Install it with pip:

pip install numpy

Here is a simple example of creating an array and computing its mean and sum:

import numpy as np

# Create an array
array = np.array([1, 2, 3, 4, 5])

# Perform operations
mean = np.mean(array)
sum_array = np.sum(array)

print("Mean:", mean)
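print("Sum:", sum_array)

Note: NumPy arrays hold elements of a single type and run their operations in compiled code, so they are typically faster and more memory-efficient than Python lists for numerical operations.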
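To make the comparison with lists concrete, here is a small sketch that squares the same values both ways. The variable names are chosen for illustration only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: a list comprehension visits each element one by one.
squared_list = [v * v for v in values]

# NumPy: one vectorized expression applies to every element at once.
array = np.array(values)
squared_array = array * array

print(squared_list)
print(squared_array)

# Arithmetic with a scalar is broadcast across the whole array.
scaled = array * 10 + 1
print(scaled)
```

Both approaches give the same numbers; the NumPy version avoids the Python-level loop, which is where the speed difference on large arrays usually comes from.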

3. Pandas Introduction

Pandas is a data manipulation and analysis library built on top of NumPy. Its two main data structures, Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table), make it easy to work with structured data.

Key features of Pandas include:

  • DataFrame for handling tabular data.
  • Built-in tools for filtering, grouping, merging, and cleaning data.
  • Readers and writers for many data formats (CSV, Excel, SQL, etc.).
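As a rough sketch of the features above, the following reads CSV data into a DataFrame, filters it, and writes the result back out. The CSV content is hypothetical and kept in memory so that no file is needed.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no file on disk.
csv_text = "Name,Age\nAlice,24\nBob,30\nCharlie,22\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Keep only the rows where Age is above 23.
older = df[df["Age"] > 23]
print(older)

# Write the filtered rows back out as CSV text (index=False omits the row labels).
print(older.to_csv(index=False))
```

With a real file, the first argument to read_csv would simply be its path.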

4. Using Pandas

To use Pandas, install it via pip:

pip install pandas

Here is an example of creating a DataFrame and computing the mean of a column:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 30, 22]}
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Calculate mean age
mean_age = df['Age'].mean()
print("Mean Age:", mean_age)

Tip: Use Pandas for data preprocessing, such as filling missing values and encoding categorical columns, before feeding data into machine learning models.
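One possible shape for such a preprocessing step is sketched below. The column names and values are invented for illustration, and mean imputation with one-hot encoding is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column.
raw = pd.DataFrame({
    "Age": [24.0, None, 22.0],
    "City": ["Paris", "Berlin", "Paris"],
    "Purchased": [1, 0, 1],
})

# Fill the missing age with the mean of the observed ages.
raw["Age"] = raw["Age"].fillna(raw["Age"].mean())

# One-hot encode the categorical column so every feature is numeric.
features = pd.get_dummies(raw[["Age", "City"]], columns=["City"], dtype=float)

# Convert to NumPy arrays, a format that many ML libraries accept.
X = features.to_numpy()
y = raw["Purchased"].to_numpy()

print(features)
print(X.shape, y.shape)
```

Here X holds the numeric features and y the target column, ready to pass to a model's training function.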

FAQ

What is the difference between NumPy and Pandas?

NumPy focuses on numerical data and provides fast N-dimensional arrays whose elements share one type. Pandas builds labeled data structures on top of NumPy for manipulating and analyzing tabular data, including tables whose columns have different types.

Can I use NumPy and Pandas together?

Yes, they are often used together in data science projects, with NumPy handling numerical operations and Pandas managing labeled, tabular data.
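A short sketch of the two libraries working together, reusing the Name and Age data from the earlier example. The LogAge and Group column names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 30, 22]})

# NumPy functions accept a Pandas column and return a labeled Series.
df['LogAge'] = np.log(df['Age'])

# np.where builds a new column from an element-wise condition.
df['Group'] = np.where(df['Age'] >= 24, 'older', 'younger')

# to_numpy() hands the column's values to NumPy as a plain array.
ages = df['Age'].to_numpy()
print(type(ages), ages)
print(df)
```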

How do I handle missing data in Pandas?

Use isna() to detect missing values, dropna() to remove the rows or columns that contain them, or fillna() to replace them with a specific value such as a constant or a column mean.
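The following sketch shows both options on a small, made-up DataFrame with one missing age. Filling with the column mean is just one possible choice of replacement value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, np.nan, 22],
})

# Count the missing values in each column.
print(df.isna().sum())

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill the gap, here with the column mean.
filled = df.fillna({'Age': df['Age'].mean()})

print(dropped)
print(filled)
```

Neither call modifies df itself; each returns a new DataFrame, so the original data stays available for comparison.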