Rollups in Elasticsearch
Introduction to Rollups
Rollups are an Elasticsearch feature that summarizes historical data and stores the summaries in a compact form. This is particularly useful for time-series data, where older data rarely needs the same granularity as recent data. Rolling up data reduces storage costs and can speed up queries over long time ranges, because each query reads far fewer documents.
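To get a feel for the savings, the following back-of-the-envelope sketch compares document counts before and after a rollup. The numbers are illustrative assumptions (one source emitting a document every 10 seconds, 30 days of retention, daily buckets), not measurements, and the function names are hypothetical.

```python
from datetime import timedelta


def raw_doc_count(emit_every: timedelta, retention: timedelta, sources: int = 1) -> int:
    """Documents written when each source emits one document per `emit_every`."""
    return sources * (retention // emit_every)


def rollup_doc_count(bucket: timedelta, retention: timedelta, group_combinations: int = 1) -> int:
    """Upper bound on rollup documents: one per time bucket per group combination."""
    return group_combinations * (retention // bucket)


# Assumed workload: one document every 10 seconds, kept for 30 days.
raw = raw_doc_count(emit_every=timedelta(seconds=10), retention=timedelta(days=30))
# Assumed rollup: daily buckets, grouped by time only.
rolled = rollup_doc_count(bucket=timedelta(days=1), retention=timedelta(days=30))

print(f"raw documents:    {raw}")
print(f"rollup documents: {rolled}")
print(f"reduction factor: {raw // rolled}x")
```

Actual savings depend on how many fields are grouped and which metrics are stored, so treat the result as an estimate only.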
Creating a Rollup Job
To create a rollup job, you define a configuration that determines how the data is summarized: the index pattern to read from, the rollup index to write to, a cron schedule, the page size, the time field and interval used for grouping, and the metrics to collect for each field.
Example of creating a rollup job:
PUT _rollup/job/sales_rollup
{
  "index_pattern": "sales-*",
  "rollup_index": "sales_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "date",
      "fixed_interval": "1d"
    }
  },
  "metrics": [
    {
      "field": "price",
      "metrics": ["min", "max", "sum"]
    }
  ]
}
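The same request body can be assembled programmatically. The sketch below is a minimal, hypothetical helper (build_rollup_job is not part of any Elasticsearch client) that builds the body as a plain dictionary and rejects one easy mistake: an index pattern that also matches the rollup index. The wildcard check uses Python's fnmatch as an approximation of Elasticsearch index-pattern matching.

```python
import json
from fnmatch import fnmatchcase


def build_rollup_job(index_pattern, rollup_index, time_field, fixed_interval,
                     metrics, cron="*/30 * * * * ?", page_size=1000):
    """Return a rollup job body as a dict (hypothetical helper, not a client API).

    `metrics` maps a field name to the list of metrics to collect for it.
    """
    # The source pattern must not match the destination index; fnmatch is
    # only an approximation of Elasticsearch wildcard matching.
    if fnmatchcase(rollup_index, index_pattern):
        raise ValueError(
            f"index_pattern {index_pattern!r} also matches rollup_index {rollup_index!r}"
        )
    return {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": cron,
        "page_size": page_size,
        "groups": {
            "date_histogram": {"field": time_field, "fixed_interval": fixed_interval}
        },
        "metrics": [
            {"field": field, "metrics": list(names)} for field, names in metrics.items()
        ],
    }


job = build_rollup_job(
    index_pattern="sales-*",
    rollup_index="sales_rollup",
    time_field="date",
    fixed_interval="1d",
    metrics={"price": ["min", "max", "sum"]},
)
print(json.dumps(job, indent=2))
```

The printed JSON can be sent as the body of the PUT request with whatever HTTP client you already use.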
Understanding Rollup Indices
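Rollup indices store the summarized data. Treat them as read-only: the rollup job is the only writer. Rollup documents use an internal structure, so they are queried through the dedicated _rollup_search endpoint rather than the regular _search endpoint. The _rollup_search endpoint accepts a subset of the standard Elasticsearch query DSL, with some limitations: size must be 0 or omitted, aggregations can use only the fields and metrics configured in the job, and a date_histogram interval must be a multiple of the job's interval.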
Example of querying a rollup index:
GET sales_rollup/_rollup_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "fixed_interval": "1d"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
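Rollup searches accept only date_histogram intervals that are a whole multiple of the interval configured in the job, so a job rolled up at 1d can answer 7d queries but not 12h ones. The sketch below is a local pre-check for fixed intervals under that assumption. The helper names are hypothetical and the check is not an Elasticsearch API.

```python
import re

# Milliseconds per fixed-interval unit (ms, s, m, h, d).
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def fixed_interval_ms(interval: str) -> int:
    """Convert a fixed interval such as '90m' or '1d' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", interval)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"not a positive fixed interval: {interval!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def is_queryable(job_interval: str, query_interval: str) -> bool:
    """True if the query interval is a whole multiple of the job interval."""
    job_ms = fixed_interval_ms(job_interval)
    query_ms = fixed_interval_ms(query_interval)
    return query_ms >= job_ms and query_ms % job_ms == 0


for candidate in ("1d", "7d", "12h", "36h"):
    print(candidate, is_queryable("1d", candidate))
```

Calendar intervals such as 1M or 1w vary in length and are deliberately out of scope for this sketch.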
Managing Rollup Jobs
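Once a rollup job is created, you can start, stop, and delete it using the Elasticsearch API. A job must be stopped before it can be deleted, and deleting a job does not delete the data already written to the rollup index. You can also retrieve a job's configuration, state, and statistics to check whether it is started, stopped, or currently indexing.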
Example of managing rollup jobs:
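# Start a rollup job
POST _rollup/job/sales_rollup/_start

# Check the job's configuration, state, and statistics
GET _rollup/job/sales_rollup

# Stop a rollup job
POST _rollup/job/sales_rollup/_stop

# Delete a rollup job (the job must be stopped first)
DELETE _rollup/job/sales_rollup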
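Because a job has to be stopped before it can be deleted, the order of the calls matters when retiring a job from a script. The following sketch only plans the requests and does not send them. It assumes the stop endpoint's wait_for_completion and timeout query parameters, so that the delete is issued only after the job has actually stopped. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def retire_job_requests(job_id: str, stop_timeout: str = "30s") -> list:
    """Return the ordered (method, path) pairs needed to stop and delete a job."""
    base = f"/_rollup/job/{quote(job_id, safe='')}"
    # Block until the job has fully stopped (or the timeout elapses),
    # so the following DELETE does not hit a still-running job.
    stop_params = urlencode({"wait_for_completion": "true", "timeout": stop_timeout})
    return [
        ("POST", f"{base}/_stop?{stop_params}"),
        ("DELETE", base),
    ]


for method, path in retire_job_requests("sales_rollup"):
    print(method, path)
```

Each pair can be passed to any HTTP client. The rollup index itself is left in place and has to be removed separately if the data is no longer needed.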
Best Practices
When using rollups, consider the following best practices:
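- Choose a rollup interval that balances storage savings against query granularity. Queries cannot use an interval finer than the one the data was rolled up at.
- Monitor rollup jobs to ensure they are running as expected and keeping up with incoming data.
- Regularly review and delete rollup jobs that are no longer needed to free up resources. Deleting a job leaves its rolled-up data in place, so remove the rollup index separately if the data is no longer needed.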
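The monitoring advice above can be sketched as a small check over the response of GET _rollup/job/<job_id>. The sketch assumes the response contains a jobs array whose entries carry config.id, status.job_state, and stats counters such as index_failures and search_failures. The sample response is hand-written for illustration and is not real output, and unhealthy_jobs is a hypothetical helper.

```python
def unhealthy_jobs(response: dict) -> list:
    """Return human-readable findings for jobs that look stopped or failing."""
    findings = []
    for job in response.get("jobs", []):
        job_id = job["config"]["id"]
        state = job["status"]["job_state"]
        stats = job.get("stats", {})
        failures = stats.get("index_failures", 0) + stats.get("search_failures", 0)
        # Assumption: "started" and "indexing" are the healthy states.
        if state not in ("started", "indexing"):
            findings.append(f"{job_id}: state is {state}")
        if failures:
            findings.append(f"{job_id}: {failures} failed index/search operations")
    return findings


# Hand-written sample response (illustrative values only).
sample = {
    "jobs": [
        {
            "config": {"id": "sales_rollup"},
            "status": {"job_state": "started"},
            "stats": {"documents_processed": 1000, "rollups_indexed": 10,
                      "index_failures": 0, "search_failures": 2},
        },
        {
            "config": {"id": "old_rollup"},
            "status": {"job_state": "stopped"},
            "stats": {"index_failures": 0, "search_failures": 0},
        },
    ]
}

for finding in unhealthy_jobs(sample):
    print(finding)
```

A check like this could run on a schedule and feed an alerting system. Which states and thresholds count as unhealthy is a judgment call for your deployment.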
Conclusion
Rollups in Elasticsearch provide a powerful way to manage and query large volumes of time-series data efficiently. By summarizing historical data, you can save on storage costs and improve query performance, making it easier to gain insights from your data.