Elasticsearch Analyzers

Introduction

In Elasticsearch, an analyzer converts text into the tokens (terms) that are stored in the inverted index and matched at search time. Analyzers are central to full-text search: the same analysis is applied to a field's text when documents are indexed and, by default, to the query text when that field is searched.

Core Components of Analyzers

An analyzer in Elasticsearch is composed of three main components:

  • Character Filters: These preprocess the text before it is tokenized. They can remove or replace certain characters.
  • Tokenizer: This splits the input text into tokens or terms.
  • Token Filters: These perform additional processing on the tokens generated by the tokenizer, such as lowercasing, removing stop words, or stemming.
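The three stages above can be sketched as a small pipeline. This is a toy model in plain Python, not Elasticsearch code: the function names (strip_tags, whitespace_tokenizer, lowercase, analyze) are made up for illustration, and the regex-based tag stripping is far cruder than a real character filter.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_tags(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(
    text: str,
    char_filters: List[CharFilter],
    tokenizer: Tokenizer,
    token_filters: List[TokenFilter],
) -> List[str]:
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>Hello <b>World</b></p>",
    [strip_tags],
    whitespace_tokenizer,
    [lowercase],
)
print(tokens)  # ['hello', 'world']
```

The point of the sketch is the ordering: character filters see the raw string, the tokenizer sees the filtered string, and token filters only ever see tokens.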

Built-in Analyzers

Elasticsearch offers several built-in analyzers that cover common use cases. Some of the most commonly used built-in analyzers include:

  • Standard Analyzer: The default analyzer. It splits text on word boundaries (following the Unicode Text Segmentation rules), drops most punctuation, and lowercases tokens.
  • Simple Analyzer: Splits text at every non-letter character and lowercases the tokens.
  • Whitespace Analyzer: Splits text on whitespace only; it does not lowercase tokens.
  • Stop Analyzer: Like the Simple Analyzer, but also removes stop words (English stop words by default).
  • Keyword Analyzer: Treats the entire input as a single token without any tokenization.
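To make the differences concrete, here is a rough Python approximation of four of these analyzers. It is only a sketch: the function names are hypothetical, STOP_WORDS is a short illustrative subset rather than Elasticsearch's actual English list, and the Standard Analyzer is left out because Unicode word segmentation does not reduce to a one-line regex.

```python
import re

# Assumption: a small illustrative subset, not Elasticsearch's full stop word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def simple_analyzer(text):
    """Approximates Simple: keep runs of letters only, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    """Approximates Whitespace: split on whitespace, keep case and punctuation."""
    return text.split()


def stop_analyzer(text):
    """Approximates Stop: Simple plus stop word removal."""
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


def keyword_analyzer(text):
    """Approximates Keyword: the whole input is a single token."""
    return [text]


text = "The 2 QUICK Brown-Foxes"
print(simple_analyzer(text))      # ['the', 'quick', 'brown', 'foxes']
print(whitespace_analyzer(text))  # ['The', '2', 'QUICK', 'Brown-Foxes']
print(stop_analyzer(text))        # ['quick', 'brown', 'foxes']
print(keyword_analyzer(text))     # ['The 2 QUICK Brown-Foxes']
```

Note how the digit "2" and the hyphen disappear under the letter-based analyzers but survive under the whitespace and keyword ones.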

Custom Analyzers

In addition to built-in analyzers, Elasticsearch allows you to create custom analyzers tailored to specific requirements. A custom analyzer is defined by specifying zero or more character filters, exactly one tokenizer, and zero or more token filters.

Example: Custom Analyzer

Below is an example of how to define a custom analyzer in Elasticsearch:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "porter_stem"]
        }
      }
    }
  }
}
                    

In this example, the custom analyzer named my_custom_analyzer is defined with:

  • An HTML Strip character filter to remove HTML tags.
  • A standard tokenizer to split text into tokens.
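  • Three token filters, applied in the order listed: lowercase, stop (removes English stop words by default), and porter_stem (reduces English words to their stems).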
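The order of the token filters matters. The sketch below is a toy Python model, not Elasticsearch code; it assumes a stop filter that compares tokens case-sensitively (the Elasticsearch stop filter behaves this way unless its ignore_case option is enabled) and uses a tiny made-up stop list.

```python
# Assumption: a tiny illustrative stop list, all lowercase.
STOP_WORDS = {"the", "a", "an", "and"}


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop(tokens):
    """Token filter: drop tokens that exactly match a stop word (case-sensitive)."""
    return [t for t in tokens if t not in STOP_WORDS]


def apply_filters(tokens, filters):
    """Apply token filters left to right, as listed in the analyzer definition."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = ["The", "quick", "fox"]

lower_then_stop = apply_filters(tokens, [lowercase, stop])
stop_then_lower = apply_filters(tokens, [stop, lowercase])

print(lower_then_stop)  # ['quick', 'fox']
print(stop_then_lower)  # ['the', 'quick', 'fox']
```

With stop listed first, the capitalized "The" does not match the lowercase stop list and slips through, which is why lowercase comes before stop in the analyzer above.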

Testing Analyzers

After defining analyzers, it's essential to test them to ensure they work as expected. Elasticsearch provides the _analyze API for this purpose.

Example: Testing an Analyzer

The following request tests the my_custom_analyzer defined earlier:

POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The quick brown fox jumps over the lazy dog."
}
                    

The response lists the tokens generated by the analyzer. Both occurrences of "the" are removed by the stop filter, and porter_stem reduces "jumps" to "jump" and "lazy" to "lazi". Removed tokens still count toward position, so position 0 is absent and the positions jump from 5 to 7:

{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 10,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "fox",
      "start_offset": 16,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "jump",
      "start_offset": 20,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "over",
      "start_offset": 26,
      "end_offset": 30,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "lazi",
      "start_offset": 35,
      "end_offset": 39,
      "type": "<ALPHANUM>",
      "position": 7
    },
    {
      "token": "dog",
      "start_offset": 40,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 8
    }
  ]
}
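When checking analyzer output in a script or test, it can help to reduce an _analyze response to the parts you care about. The following is a minimal Python sketch under stated assumptions: the helper names (tokens_of, missing_positions) are hypothetical, the sample dictionary is a hand-written, shortened response for the text "the lazy dog" rather than captured output, and it assumes the stop filter leaves a gap in the position numbering where it removes a token.

```python
def tokens_of(response):
    """Return just the token strings from an _analyze-style response dict."""
    return [t["token"] for t in response["tokens"]]


def missing_positions(response):
    """Return positions, up to the last token's position, that have no token."""
    seen = {t["position"] for t in response["tokens"]}
    if not seen:
        return []
    return [p for p in range(max(seen) + 1) if p not in seen]


# Hypothetical, hand-written response for the text "the lazy dog".
sample = {
    "tokens": [
        {"token": "lazi", "start_offset": 4, "end_offset": 8,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "dog", "start_offset": 9, "end_offset": 12,
         "type": "<ALPHANUM>", "position": 2},
    ]
}

print(tokens_of(sample))          # ['lazi', 'dog']
print(missing_positions(sample))  # [0]
```

Comparing tokens_of(...) against an expected list is a simple way to catch unintended changes after editing an analyzer definition.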
                    

Using Analyzers in Mappings

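Analyzers are typically assigned to text fields in an index's mappings, which determines how each field is analyzed. You can set the analyzer when you create the index or when you add a new field to an existing mapping; the analyzer of an existing field cannot be changed in place, so changing it requires reindexing.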
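Why the field's analyzer matters at both ends can be seen in a toy inverted index. This is a plain Python sketch, not how Elasticsearch is implemented: the analyze function is a stand-in (whitespace split, punctuation stripping, lowercasing), and build_index and search are hypothetical helpers. The same analysis is applied to document text and to query text, which is also the default behavior for a field's analyzer in Elasticsearch.

```python
from collections import defaultdict


def analyze(text):
    """Stand-in analyzer: split on whitespace, strip edge punctuation, lowercase."""
    stripped = (t.strip(".,!?") for t in text.split())
    return [t.lower() for t in stripped if t]


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "The quick brown fox.", 2: "A lazy dog!"}
index = build_index(docs)

print(search(index, "QUICK fox"))  # {1}
print(search(index, "lazy fox"))   # set()
```

The query "QUICK fox" matches only because the query text is lowercased the same way the document text was; with mismatched analysis the lookup for "QUICK" would find nothing.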

Example: Setting an Analyzer in Mappings

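Below is an example that adds a content field using my_custom_analyzer to the mapping of my_index. Because the index was already created with the analyzer definition, the request goes to the _mapping endpoint; repeating PUT /my_index would fail because the index already exists:

PUT /my_index/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}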
                    

Conclusion

Analyzers determine how text is broken into terms, and therefore which queries match which documents. By choosing an appropriate built-in analyzer, or composing a custom one from character filters, a tokenizer, and token filters, and by verifying the result with the _analyze API, you can tune the full-text search behavior of your Elasticsearch applications.