Shards and Replicas in Elasticsearch

Introduction

Elasticsearch is a distributed search and analytics engine. It spreads data across a cluster by dividing each index into shards and keeping copies of those shards as replicas. These two concepts are fundamental to understanding how Elasticsearch scales and stays highly available. This tutorial covers what shards and replicas are, why they matter, and how they work within Elasticsearch.

What is a Shard?

A shard is a self-contained Apache Lucene index; Lucene is the open-source search library that Elasticsearch is built on. An Elasticsearch index is a collection of documents, and each index is divided into one or more shards. Shards split your data horizontally so that storage and operations such as indexing and searching can be distributed and parallelized across multiple nodes in your cluster.

Example

Imagine you have an index with 1 million documents. Instead of storing all 1 million documents in a single shard, you can split the index into 5 shards, each containing approximately 200,000 documents.
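The "approximately 200,000" figure follows from how documents are routed: Elasticsearch hashes a routing value (by default the document ID) and uses the result to pick a primary shard. The sketch below illustrates that idea only. The helper names shard_for and distribute are hypothetical, and zlib.crc32 is a stand-in hash chosen for the sketch, not the hash Elasticsearch actually uses.

```python
import zlib
from collections import Counter
from typing import List


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard number from a hash of the routing value (here, the document ID)."""
    # Assumption: crc32 stands in for Elasticsearch's real routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(num_docs: int, num_primary_shards: int) -> List[int]:
    """Count how many synthetic documents land on each shard."""
    counts = Counter(
        shard_for(f"doc-{i}", num_primary_shards) for i in range(num_docs)
    )
    return [counts[shard] for shard in range(num_primary_shards)]


# 1 million synthetic document IDs spread over 5 primary shards.
for shard, count in enumerate(distribute(1_000_000, 5)):
    print(f"shard {shard}: {count} documents")
```

Because the shard is derived from the hash and the shard count together, the split is roughly even rather than exact, and the same document ID always maps to the same shard.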

What is a Replica?

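A replica is a copy of a primary shard. Replicas serve two purposes: high availability and increased search throughput, because a search can be served by either the primary or any of its replicas. If a node fails, Elasticsearch promotes a replica on a surviving node to take over for each primary shard that was lost, so the cluster stays operational and no data is lost as long as at least one copy of every shard survives.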
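To make the failure case concrete, here is a toy model of what happens when a node is lost. It is a sketch under simplifying assumptions, not Elasticsearch's recovery logic: the layout is a plain dictionary, the function name fail_node is hypothetical, and the model only promotes an existing replica without allocating new ones.

```python
from typing import Dict, List, Tuple

Copy = Tuple[int, str]  # (shard number, "primary" or "replica")


def fail_node(layout: Dict[str, List[Copy]], failed: str) -> Dict[str, List[Copy]]:
    """Return the layout after `failed` is lost, promoting replicas of lost primaries."""
    lost_primaries = {shard for shard, role in layout[failed] if role == "primary"}
    survivors = {
        node: list(copies) for node, copies in layout.items() if node != failed
    }
    for shard in sorted(lost_primaries):
        for copies in survivors.values():
            if (shard, "replica") in copies:
                # Promote the surviving replica to primary.
                copies[copies.index((shard, "replica"))] = (shard, "primary")
                break
        else:
            # No replica anywhere: in this toy model the shard's data is gone.
            raise RuntimeError(f"shard {shard} has no surviving copy")
    return survivors


layout = {
    "node-1": [(0, "primary"), (1, "replica")],
    "node-2": [(1, "primary"), (0, "replica")],
}
print(fail_node(layout, "node-1"))
# → {'node-2': [(1, 'primary'), (0, 'primary')]}
```

With zero replicas the same call raises an error in this model, which is the situation replicas exist to prevent.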

Example

If you have an index with 5 primary shards and you configure 1 replica for each shard, you will have a total of 10 shards (5 primary + 5 replica shards).
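The arithmetic generalizes to any combination of primaries and replicas. A minimal sketch, using a hypothetical helper named total_shards:

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies in an index: each primary plus its replicas."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary and a non-negative replica count")
    return primaries * (1 + replicas_per_primary)


print(total_shards(5, 1))  # → 10 (5 primary + 5 replica shards)
print(total_shards(3, 2))  # → 9  (3 primary + 6 replica shards)
```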

Configuring Shards and Replicas

You set the number of primary shards and replicas when creating an index. The number of primary shards is fixed at index creation; changing it later means using the split or shrink APIs or reindexing into a new index. The number of replicas, by contrast, can be changed dynamically at any time.

Example

Creating an index with 3 primary shards and 1 replica:

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Updating the number of replicas for an existing index:

PUT /my_index/_settings
{
  "number_of_replicas": 2
}
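The two requests above can also be issued from a script. The following sketch builds them with Python's standard library only. It assumes a cluster reachable at http://localhost:9200 with no authentication or TLS, which is an assumption for illustration (secured clusters need credentials and HTTPS). The helper names build_put and send are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def build_put(path: str, body: dict) -> urllib.request.Request:
    """Build a PUT request with a JSON body for the given Elasticsearch path."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Create the index with 3 primary shards and 1 replica per primary.
create_index = build_put(
    "/my_index",
    {"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
)

# Raise the replica count of the existing index to 2.
update_replicas = build_put("/my_index/_settings", {"number_of_replicas": 2})
```

Nothing is sent until send(create_index) and send(update_replicas) are called, so the requests can be inspected first.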

Shard Allocation

Elasticsearch uses a shard allocation algorithm to distribute shards across the nodes in the cluster. The goal is to balance the load so that no single node is overwhelmed. To preserve high availability, Elasticsearch never places a replica on the same node as its primary, so losing one node cannot remove every copy of a shard. A replica that has no eligible node stays unassigned.

Example

If you have a cluster with 3 nodes and an index with 3 primary shards and 1 replica per primary, Elasticsearch might distribute the shards as follows (the exact placement can differ):

Node 1: Primary shard 1, Replica shard 2
Node 2: Primary shard 2, Replica shard 3
Node 3: Primary shard 3, Replica shard 1
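The layout above can be reproduced with a simple round-robin rule. This is a toy sketch, not Elasticsearch's actual balancer, which weighs many more factors. The function name allocate and its placement rule are assumptions made for illustration; the only property carried over from Elasticsearch is that copies of the same shard never share a node.

```python
from typing import Dict, List


def allocate(num_nodes: int, num_primaries: int, num_replicas: int) -> Dict[str, List[str]]:
    """Place primaries round-robin, then shift each replica to a different node."""
    if num_replicas >= num_nodes:
        # Elasticsearch would leave the extra replicas unassigned;
        # this sketch simply refuses such a configuration.
        raise ValueError("each copy of a shard needs its own node")
    layout: Dict[str, List[str]] = {f"Node {n + 1}": [] for n in range(num_nodes)}
    nodes = list(layout)
    for shard in range(num_primaries):
        layout[nodes[shard % num_nodes]].append(f"Primary shard {shard + 1}")
    for replica in range(1, num_replicas + 1):
        for shard in range(num_primaries):
            # Offsetting by the replica number keeps it off the primary's node.
            node = nodes[(shard - replica) % num_nodes]
            layout[node].append(f"Replica shard {shard + 1}")
    return layout


for node, shards in allocate(num_nodes=3, num_primaries=3, num_replicas=1).items():
    print(f"{node}: {', '.join(shards)}")
# Node 1: Primary shard 1, Replica shard 2
# Node 2: Primary shard 2, Replica shard 3
# Node 3: Primary shard 3, Replica shard 1
```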

Conclusion

Understanding shards and replicas is crucial for managing and scaling your Elasticsearch cluster. Shards allow you to distribute data and workloads, while replicas provide redundancy and improve search performance. By configuring and managing shards and replicas effectively, you can ensure that your Elasticsearch cluster remains highly available and performs well under load.
