Host Monitoring with Prometheus

Introduction

Host monitoring is an essential part of managing IT infrastructure. It involves tracking the performance and health of servers and systems. Prometheus is an open-source monitoring solution widely used for this purpose. It collects metrics from configured targets at specified intervals, evaluates rule expressions, and can trigger alerts when certain conditions are met.

Setting Up Prometheus

To start monitoring hosts with Prometheus, you first need to install it. Follow these steps:

Installation Steps

Download the Prometheus release archive for your platform from the official download page (https://prometheus.io/download/).
Extract the downloaded tar file:

tar xvfz prometheus-*.tar.gz

Navigate to the extracted directory:

cd prometheus-*

Run Prometheus:

./prometheus --config.file=prometheus.yml
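Once the process is running, it can be useful to confirm that it is ready before pointing anything at it. The following is a minimal sketch, assuming Prometheus listens on its default address http://localhost:9090 and using its /-/ready management endpoint. The function name prometheus_ready and the opener parameter are illustrative choices, not part of Prometheus.

```python
import urllib.error
import urllib.request


def prometheus_ready(base_url="http://localhost:9090", timeout=2.0,
                     opener=urllib.request.urlopen):
    """Return True if GET <base_url>/-/ready answers with HTTP 200.

    `opener` is injectable so the check can be exercised without a
    running server; by default it is urllib.request.urlopen.
    """
    try:
        with opener(f"{base_url}/-/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer (HTTPError).
        return False
```

Calling prometheus_ready() with no arguments checks the local instance and returns False if nothing is listening yet.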

Configuring Prometheus

Prometheus uses a configuration file named prometheus.yml to define the scrape targets. Here is a simple configuration example:

Sample Configuration

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

This configuration tells Prometheus to scrape metrics from a Node Exporter running on localhost at port 9100.
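To monitor more than one host, the targets list simply grows. As a rough sketch, the helper below renders the same configuration for a list of hosts. The function name render_scrape_config and the host web-1.example.internal are hypothetical, and the 15s scrape_interval is an assumed value rather than something the sample above specifies.

```python
def render_scrape_config(job_name, targets, scrape_interval="15s"):
    """Return prometheus.yml text with one static scrape job.

    Assumes plain host:port targets that need no YAML escaping.
    """
    quoted = ", ".join(f"'{target}'" for target in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{quoted}]\n"
    )


# Hypothetical second host, for illustration only.
config_text = render_scrape_config(
    "node", ["localhost:9100", "web-1.example.internal:9100"]
)
```

Writing config_text to prometheus.yml and restarting Prometheus would make it scrape both hosts under the same job.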

Node Exporter

Node Exporter is the Prometheus exporter that exposes hardware and OS metrics on Unix-like systems. Install and run it on each host you want to monitor:

Installing Node Exporter

Download the Node Exporter release archive for your platform from the official download page (https://prometheus.io/download/).
Extract and run Node Exporter:

tar xvfz node_exporter-*.tar.gz

cd node_exporter-*

./node_exporter
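By default Node Exporter listens on port 9100 and serves its metrics as plain text at /metrics, one sample per line. To make that format concrete, here is a small parsing sketch. The SAMPLE text is hand-written for illustration, not captured from a real host, and the parser is deliberately simplified: it assumes no timestamps and no escaped quotes inside label values.

```python
import re

# name{label="value",...} value   (labels are optional)
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Illustrative input in the shape Node Exporter produces.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
"""


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue  # line shape not covered by this simplified parser
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
```

Prometheus does this parsing itself on every scrape; the sketch only shows what it sees when it fetches the endpoint.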

Visualizing Metrics

Once Prometheus is scraping metrics, you can visualize them using the built-in web UI. Access it by navigating to http://localhost:9090 in your web browser. Here, you can run queries to see metrics data.
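The same queries can also be run from a script through the Prometheus HTTP API at /api/v1/query. The sketch below builds the request URL and unpacks an instant-vector response. It assumes the default address http://localhost:9090, and EXAMPLE_BODY is a hand-written response in the API's documented shape, not output from a real server. The helper names are illustrative.

```python
import json
import urllib.parse


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for an instant query against the HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_values(response_body):
    """Return (labels, value) pairs from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    return [
        (item["metric"], float(item["value"][1]))  # value is [timestamp, "string"]
        for item in payload["data"]["result"]
    ]


# Illustrative response for the query `up` (1 means the target was scraped).
EXAMPLE_BODY = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"up","instance":"localhost:9100","job":"node"},'
    '"value":[1700000000.0,"1"]}]}}'
)

url = instant_query_url("up")
values = extract_values(EXAMPLE_BODY)
```

Fetching url with any HTTP client and passing the body to extract_values gives the same data the web UI shows in its table view.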

Sample Query

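To view CPU activity, enter the following query. It returns the per-second rate of CPU time, averaged over the last 5 minutes, as one series per core and per mode (idle, user, system, and so on):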

rate(node_cpu_seconds_total[5m])
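It may help to see the arithmetic behind this query. node_cpu_seconds_total is a counter of CPU seconds, so its rate is "seconds of CPU time per second of wall time", a fraction between 0 and 1 per core and mode. The sketch below is a simplification with made-up sample values: the real rate() function also handles counter resets and extrapolates to the window edges, which this version does not.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Simplified: no counter-reset handling, no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Made-up idle counter for one core, sampled 5 minutes apart.
idle_samples = [(0.0, 1000.0), (300.0, 1270.0)]

idle_rate = simple_rate(idle_samples)   # 270 idle seconds over 300 seconds
busy_fraction = 1.0 - idle_rate         # share of time the core was not idle
```

Here the core was idle for 270 of 300 seconds, so it was busy roughly 10% of the time. In PromQL the same idea for a whole host is usually written by selecting mode="idle" and subtracting the averaged rate from 1.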

Setting Up Alerts

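Prometheus can raise alerts based on the metrics it collects and hand them to Alertmanager, which delivers the notifications. Alert rules are defined in a separate rule file that prometheus.yml references under rule_files. Here’s a basic example: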

Sample Alert Rule

groups:
- name: example
  rules:
  - alert: HighCpuUsage
    # Busy fraction per instance: 1 minus the average idle rate across cores.
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage detected"

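This rule fires when the expression stays above 0.8 (80% CPU usage) on an instance continuously for 5 minutes. Until the 5 minutes have elapsed, the alert is in the pending state.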
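The effect of the for clause is easiest to see by stepping through a few evaluations. The following sketch simulates it under simplifying assumptions: one series, evenly spaced evaluations, and made-up CPU values. The function name alert_states is illustrative and this is not how Prometheus is implemented internally.

```python
def alert_states(values, threshold, for_seconds, interval_seconds):
    """Return 'inactive', 'pending' or 'firing' for each evaluation.

    values: the rule expression's value at each evaluation, oldest first.
    """
    states = []
    active_since = None
    for index, value in enumerate(values):
        now = index * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now  # condition just became true
            if now - active_since >= for_seconds:
                states.append("firing")
            else:
                states.append("pending")
        else:
            active_since = None  # any dip resets the timer
            states.append("inactive")
    return states


# Made-up CPU fractions, evaluated every 60 s with a 120 s hold period.
states = alert_states([0.5, 0.9, 0.9, 0.9, 0.5, 0.9],
                      threshold=0.8, for_seconds=120, interval_seconds=60)
```

With these inputs the alert is pending for two evaluations, fires on the third consecutive breach, and starts over after the single dip below the threshold. A short spike therefore never reaches the firing state.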

Conclusion

Host monitoring with Prometheus allows you to keep track of your system's performance and health. By setting up Prometheus and Node Exporter, configuring scrape targets, and defining alert rules, you can gain valuable insights into your infrastructure and respond to issues proactively.
