Primary Keys in Cassandra

Introduction

In database management systems, a primary key is a unique identifier for a record in a table. In Cassandra, a distributed NoSQL database, the primary key does more than identify a row: it also decides which nodes store the row and how rows are sorted within a partition. This tutorial covers why primary keys matter in Cassandra, the parts that make them up, and how to define them when creating tables.

Importance of Primary Keys

Primary keys serve several important functions in Cassandra:

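  • Uniqueness: Each primary key value identifies exactly one row in a table. Cassandra does not reject a write that reuses an existing primary key; it overwrites the columns of the existing row (an upsert).
  • Data Retrieval: A query that supplies the partition key goes directly to the nodes that hold the data instead of scanning the whole cluster.
  • Data Distribution: The partition key portion of the primary key is hashed to decide which nodes in the cluster store each row.
  • Data Integrity: Because a primary key maps to exactly one row, a table never holds duplicate rows for the same key. Cassandra does not enforce foreign keys or other relational constraints, so any further integrity checks belong to the application.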

Types of Primary Keys

A Cassandra primary key is built from two kinds of key columns:

  • Partition Key: The first part of the primary key, made up of one or more columns. It determines how data is distributed across the nodes in the cluster. Rows with the same partition key belong to the same partition and are stored together on the same replica nodes.
  • Clustering Key: The optional remaining columns of the primary key. They determine the sort order of rows within a partition, which makes ordered reads and range queries inside a partition efficient.

Defining Primary Keys

When creating a table in Cassandra, you define the primary key using the PRIMARY KEY clause. The syntax is as follows:

CREATE TABLE table_name (
    partition_key datatype,
    clustering_key datatype,
    column1 datatype,
    column2 datatype,
    PRIMARY KEY (partition_key, clustering_key)
);

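In this template, the first column listed in the PRIMARY KEY clause (partition_key) is the partition key, and any column listed after it (clustering_key) is a clustering column that orders the rows within each partition. Every column named in the clause must also be declared as a column of the table.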
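As a concrete instance of the syntax above, here is a minimal sketch. The table sensor_readings and all of its columns are hypothetical, invented only for illustration, and the statement assumes a keyspace has already been selected. It also shows two optional variations: an extra pair of parentheses groups several columns into one partition key, and WITH CLUSTERING ORDER BY sets the sort direction of a clustering column.

```cql
-- Hypothetical table, not part of the original tutorial.
CREATE TABLE sensor_readings (
    sensor_id UUID,
    day DATE,
    reading_time TIMESTAMP,
    temperature DOUBLE,
    -- (sensor_id, day) together form the partition key;
    -- reading_time is the clustering column.
    PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
```

With this layout, each sensor gets one partition per day, and the newest readings come first within each partition. Splitting by day is one possible way to keep partitions from growing without bound; whether it fits depends on the queries the table has to serve.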

Example of Creating a Table with Primary Keys

Consider a scenario where we want to store user information. We might define a table as follows:

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    name TEXT,
    email TEXT
);

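In this example, user_id is the entire primary key and uniquely identifies each user. Because there are no clustering columns, user_id is also the partition key, so each partition holds exactly one row.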
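The following sketch shows what uniqueness means in practice for the users table above: writing a second row with the same user_id overwrites the first instead of failing. The UUID and the names are arbitrary illustrative values, and the statements assume the users table exists in the current keyspace.

```cql
-- Insert a user. The UUID is an arbitrary example value.
INSERT INTO users (user_id, name, email)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Alice', 'alice@example.com');

-- Same user_id: this neither fails nor adds a second row.
-- It overwrites name and email on the existing row.
INSERT INTO users (user_id, name, email)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Alice B.', 'alice.b@example.com');

-- IF NOT EXISTS makes the insert conditional: it is applied
-- only when no row with this primary key exists yet.
INSERT INTO users (user_id, name, email)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Mallory', 'mallory@example.com')
IF NOT EXISTS;

-- Read the row back by its primary key.
SELECT user_id, name, email FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```

After these statements the table still holds a single row for this user_id, carrying the values from the second insert. The conditional insert is not applied because the row already exists. Conditional writes are lightweight transactions and cost more than plain writes, so they are best kept for cases where overwriting would be a real problem.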

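Advanced Usage: Compound Primary Keys

Cassandra allows primary keys that combine a partition key with one or more clustering columns. Cassandra documentation usually calls these compound primary keys; the term composite is more often used for a partition key that spans several columns. For example: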

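CREATE TABLE orders (
    order_id UUID,
    user_id UUID,
    order_date TIMESTAMP,
    PRIMARY KEY (user_id, order_date, order_id)
);

Here, user_id is the partition key, so all orders for a user are stored together in one partition. order_date is the first clustering column and sorts that user's orders by date. order_id is included as a second clustering column so that two orders placed by the same user at the same timestamp remain distinct rows; with PRIMARY KEY (user_id, order_date) alone, the second order would overwrite the first.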
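To show how this key layout shapes queries, here is a short sketch against the orders table above. The UUID and the date range are arbitrary illustrative values.

```cql
-- All orders for one user: a lookup by partition key.
SELECT * FROM orders
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- That user's orders from January 2024, newest first:
-- a range over the order_date clustering column.
SELECT * FROM orders
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND order_date >= '2024-01-01'
  AND order_date < '2024-02-01'
ORDER BY order_date DESC;
```

Both queries name the partition key, so Cassandra can read from a single partition. A query that filters only on order_id, without user_id, is rejected unless ALLOW FILTERING is added, because Cassandra would have to scan every partition to answer it.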

Conclusion

In summary, primary keys are fundamental to structuring data in Cassandra. They identify each row, make lookups by key efficient, and determine how data is distributed across the cluster and sorted within each partition. Understanding how to define and use primary keys effectively is crucial for designing robust Cassandra databases.