Advanced Concepts: Log Compaction in Kafka

Introduction to Kafka Log Compaction

Log compaction is a retention mechanism in Kafka that guarantees at least the latest value for each record key is kept within a topic partition. Records that have been superseded by a newer record with the same key are removed in the background, which saves storage space while keeping the latest state of every key available.

Why Use Log Compaction?

  • Retain the latest value for each key, so a consumer can rebuild the current state by reading the topic from the beginning.
  • Reduce storage requirements by removing obsolete records.

Setting Up Log Compaction in Kafka

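Log compaction is configured per topic. To enable it, set the topic's cleanup.policy to compact (the default is delete). Every record written to a compacted topic must have a key, because the key is what compaction uses to decide which records supersede others.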

Configuring Log Compaction for a Topic

To enable log compaction for an existing topic, set cleanup.policy with kafka-configs.sh. When connecting through --bootstrap-server, kafka-topics.sh --alter does not accept --config, so topic configuration changes go through kafka-configs.sh instead:


bin/kafka-configs.sh --alter --entity-type topics --entity-name my_topic --bootstrap-server localhost:9092 --add-config cleanup.policy=compact
    

How Log Compaction Works

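When log compaction is enabled, Kafka guarantees that at least the most recent record for each key is retained. Older records with the same key are removed by a background cleaning process, so duplicates can still be present until the cleaner has processed them. In particular, the active segment (the one currently being written) is never compacted. Compaction does not change the offsets or the relative order of the records that remain.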
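The retention rule can be illustrated with a small Python model. This is a sketch of the outcome of compaction, not Kafka's implementation: the Record type and the compact function are hypothetical names introduced here, and the input list is assumed to be in offset order.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    offset: int
    key: str
    value: str


def compact(log: List[Record]) -> List[Record]:
    """Keep only the highest-offset record for each key, in offset order."""
    latest_offset: Dict[str, int] = {}
    for record in log:
        # A later record with the same key supersedes the earlier one.
        latest_offset[record.key] = record.offset
    # Surviving records keep their original offsets; nothing is renumbered.
    return [r for r in log if latest_offset[r.key] == r.offset]


log = [
    Record(0, "user-1", "alice@old.example"),
    Record(1, "user-2", "bob@example.com"),
    Record(2, "user-1", "alice@new.example"),
]
compacted = compact(log)
print(compacted)
```

In this model the record at offset 0 is dropped because offset 2 carries a newer value for the same key, while offsets 1 and 2 survive unchanged.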

Log Compaction Process

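  1. Kafka selects a log for compaction when its dirty ratio (the share of the log that has not yet been cleaned) exceeds log.cleaner.min.cleanable.ratio, which can be overridden per topic with min.cleanable.dirty.ratio.
  2. A log cleaner thread scans the uncleaned segments and builds a map from each key to the offset of its latest record.
  3. The cleaner copies the segments, retaining the latest record for each key and dropping older records.
  4. The compacted segments replace the original segments in the log directory.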
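The steps above can be sketched as a simplified Python model. It is an illustration under stated assumptions rather than the broker's code: record counts stand in for bytes when computing the dirty ratio, the active segment is assumed to be excluded by the caller, and the names dirty_ratio, build_offset_map, and clean_log are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)
Segment = List[Record]


def dirty_ratio(clean: List[Segment], dirty: List[Segment]) -> float:
    """Share of records in the uncleaned section (records stand in for bytes)."""
    clean_count = sum(len(s) for s in clean)
    dirty_count = sum(len(s) for s in dirty)
    total = clean_count + dirty_count
    return dirty_count / total if total else 0.0


def build_offset_map(dirty: List[Segment]) -> Dict[str, int]:
    """Step 2: record the highest offset seen for each key in the dirty section."""
    offset_map: Dict[str, int] = {}
    for segment in dirty:
        for offset, key, _value in segment:
            offset_map[key] = max(offset, offset_map.get(key, -1))
    return offset_map


def clean_log(
    clean: List[Segment],
    dirty: List[Segment],
    min_cleanable_ratio: float = 0.5,
) -> List[Segment]:
    """Return the rewritten segments, or the input unchanged if below threshold."""
    segments = clean + dirty
    # Step 1: skip the log unless enough of it is dirty.
    if dirty_ratio(clean, dirty) <= min_cleanable_ratio:
        return segments
    offset_map = build_offset_map(dirty)
    # Steps 3 and 4: rewrite each segment, dropping superseded records.
    rewritten: List[Segment] = []
    for segment in segments:
        kept = [r for r in segment if r[0] >= offset_map.get(r[1], -1)]
        if kept:
            rewritten.append(kept)
    return rewritten


clean = [[(0, "a", "1"), (1, "b", "1")]]
dirty = [[(2, "a", "2"), (3, "a", "3")], [(4, "c", "1")]]
result = clean_log(clean, dirty)
print(result)
```

Note that the offset map is built only from the uncleaned section but is applied to every segment, which is how an old record in an already-cleaned segment (key "a" at offset 0 here) gets removed once a newer value arrives.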

Monitoring Log Compaction

Monitor the log cleaner regularly to confirm that compaction keeps up with the write rate. If the cleaner stalls, compacted topics keep growing because superseded records are no longer removed.

Monitoring Log Compaction with JMX

Kafka exposes log cleaner metrics via JMX. You can inspect them directly with JConsole, or export them to Prometheus (through a JMX exporter) and chart them in Grafana.


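# Example JMX metrics for log compaction
kafka.log:type=LogCleaner,name=max-clean-time-secs
kafka.log:type=LogCleaner,name=cleaner-recopy-percent
kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
kafka.log:type=LogCleanerManager,name=max-dirty-percent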
    
Example:

Using JConsole to browse the log cleaner metrics (start the broker with the JMX_PORT environment variable set so that JConsole can connect to it):

jconsole
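Once the metrics are being collected, a simple check can turn them into warnings. The following Python sketch assumes a snapshot has already been gathered (for example by a JMX exporter) into a dict keyed by metric name. The keys max-dirty-percent and time-since-last-run-ms are gauges that recent Kafka versions register under kafka.log:type=LogCleanerManager. The thresholds and the function name cleaner_warnings are hypothetical and should be tuned for your cluster.

```python
from typing import Dict, List

# Hypothetical alert thresholds; adjust them for your own workload.
MAX_DIRTY_PERCENT = 75.0
MAX_MS_SINCE_LAST_RUN = 5 * 60 * 1000  # five minutes


def cleaner_warnings(metrics: Dict[str, float]) -> List[str]:
    """Return human-readable warnings for a snapshot of log cleaner metrics."""
    warnings: List[str] = []
    if metrics.get("max-dirty-percent", 0.0) > MAX_DIRTY_PERCENT:
        warnings.append("dirty ratio is high; the cleaner may be falling behind")
    if metrics.get("time-since-last-run-ms", 0.0) > MAX_MS_SINCE_LAST_RUN:
        warnings.append("cleaner has not run recently; check the cleaner threads")
    return warnings


snapshot = {"max-dirty-percent": 82.0, "time-since-last-run-ms": 12_000.0}
for warning in cleaner_warnings(snapshot):
    print(warning)
```

A missing metric is treated as healthy in this sketch. A stricter check could instead warn when an expected key is absent.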

Managing Log Compaction with Kafka Manager

Kafka Manager (since renamed CMAK, Cluster Manager for Apache Kafka) is a web-based tool for managing and monitoring Kafka clusters. You can use it to view and update topic configuration, including log compaction settings.


# Start Kafka Manager
docker run -d -p 9000:9000 --name=kafka-manager -e ZK_HOSTS="zookeeper:2181" hlebalbau/kafka-manager
    
Example:

Access Kafka Manager at http://localhost:9000

Best Practices for Kafka Log Compaction

  • Enable log compaction for topics where maintaining the latest state is crucial.
  • Regularly monitor log compaction metrics to ensure correct operation.
  • Use Kafka Manager or similar tools to manage and document log compaction settings.
  • Test log compaction settings in a staging environment before applying them to production.
  • Document and maintain a history of log compaction configurations and changes.

Conclusion

In this tutorial, we covered how to enable log compaction for a topic, how the compaction process works, how to monitor it, and best practices for operating it. Applied together, these practices help keep storage use efficient while preserving the latest state of each key in a Kafka environment.