Prometheus Grafana Docker Compose Monitoring: Production Setup


Table of Contents

  1. Overview
  2. Prerequisites
  3. Quick Architecture
  4. Install / Setup
  5. Base Configuration
  6. Reload/Enable & Health Checks
  7. Security / Hardening
  8. Performance & Optimization
  9. Backup & Restore
  10. Troubleshooting (Top issues)
  11. Key Takeaways & Next Steps

Overview

This guide builds a production-ready monitoring stack with Prometheus and Grafana on Docker Compose. We’ll run Prometheus, Grafana, and Node Exporter with Docker Compose,
using persistent volumes, health checks, and declarative configs. You’ll get a ready-to-run docker-compose.yml, a basic
Prometheus scrape config, and starter alert rules. Official docs: Prometheus · Grafana · Docker Compose.

Prerequisites

  • A Linux host with Docker Engine + Docker Compose v2.
  • Open ports: 9090/tcp (Prometheus), 9100/tcp (Node Exporter), 3000/tcp (Grafana). Restrict externally if needed.
  • Server time in sync (chrony/systemd-timesyncd) to avoid skewed metrics.
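
A quick preflight can confirm these prerequisites before going further. The sketch below is not part of the original setup: the helper names have and port_in_use are made up for illustration, and the port check assumes ss (from iproute2) is installed.

```shell
#!/bin/sh
# Preflight sketch; helper names are illustrative, not from any tool.

# have CMD: succeeds if CMD is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# port_in_use PORT: succeeds if something already listens on TCP PORT.
# Assumes `ss` is available; reports "not in use" when it is missing.
port_in_use() {
  have ss || return 1
  ss -ltn 2>/dev/null | awk '{print $4}' | grep -Eq ":$1\$"
}

for cmd in docker curl; do
  if have "$cmd"; then echo "ok: $cmd found"; else echo "missing: $cmd"; fi
done

# Compose v2 is a docker subcommand, so probe it through docker itself.
if have docker && docker compose version >/dev/null 2>&1; then
  echo "ok: docker compose (v2) found"
else
  echo "missing: docker compose (v2)"
fi

for port in 9090 9100 3000; do
  if port_in_use "$port"; then echo "busy: port $port"; else echo "free: port $port"; fi
done

# Time sync status; timedatectl exists on systemd hosts only.
if have timedatectl; then
  timedatectl show -p NTPSynchronized 2>/dev/null || echo "time sync status unavailable"
fi
```

A "busy" port before the stack is started usually means another service already owns it.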

Quick Architecture

Node Exporter exposes host metrics on port 9100, Prometheus scrapes it at node-exporter:9100 and serves queries on 9090, and Grafana reads from Prometheus and serves dashboards on 3000.

Install Docker + Compose

Use your distro’s method. After install, ensure docker works and Compose v2 is available as docker compose.

Ubuntu/Debian

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version && docker compose version

RHEL/Rocky/CentOS Stream/Fedora

curl -fsSL https://get.docker.com | sh
sudo systemctl enable --now docker
docker --version && docker compose version

Arch/Manjaro

sudo pacman -Syu --noconfirm docker docker-compose
sudo systemctl enable --now docker
docker --version && docker compose version

openSUSE/SLE

sudo zypper refresh
sudo zypper install -y docker docker-compose
sudo systemctl enable --now docker
docker --version && docker compose version

Install / Setup

We’ll prepare a working directory with Compose files and Prometheus configs. Node Exporter runs on the same host exposing Linux metrics on 9100.
Grafana connects to Prometheus as a data source. All services restart automatically and store data in named volumes.

# Working directory
mkdir -p ~/monitoring/{prometheus,grafana}
cd ~/monitoring

# Prometheus configuration
cat > prometheus/prometheus.yml <<'YAML'
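global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load the alert rules mounted into the container
rule_files:
  - /etc/prometheus/alert.rules.yml

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']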
YAML

# Basic alert rule (optional)
cat > prometheus/alert.rules.yml <<'YAML'
groups:
- name: node-alerts
  rules:
  - alert: HostDown
    expr: up{job="node"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Node exporter down"
      description: "No scrape data for 1 minute."
YAML
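
The HostDown rule can be checked without waiting for a real outage by using promtool's rule unit tests. The following is a sketch under assumptions: the file name alert.rules.test.yml is arbitrary, and the instance label value simply mirrors the scrape target used in this guide.

```shell
# Write a promtool unit test for the HostDown rule.
# (The test file name is an assumption; it must sit next to alert.rules.yml.)
mkdir -p prometheus
cat > prometheus/alert.rules.test.yml <<'YAML'
rule_files:
  - alert.rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Simulate a node target that is down for the whole test window.
    input_series:
      - series: 'up{job="node", instance="node-exporter:9100"}'
        values: '0 0 0 0'
    alert_rule_test:
      # At 2m the rule's "for: 1m" has elapsed, so the alert should be firing.
      - eval_time: 2m
        alertname: HostDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node
              instance: node-exporter:9100
            exp_annotations:
              summary: "Node exporter down"
              description: "No scrape data for 1 minute."
YAML
```

One way to run it is with the promtool binary shipped in the Prometheus image, for example: docker run --rm -v "$PWD/prometheus:/work:ro" -w /work --entrypoint promtool prom/prometheus:latest test rules alert.rules.test.yml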

Base Configuration

Create the docker-compose.yml with three services: Prometheus, Node Exporter, and Grafana.
Prometheus loads the config and rules, exposes 9090, and starts after Node Exporter.
Grafana exposes 3000 with persistent storage. Health checks report each container as healthy or unhealthy in docker compose ps; Docker does not restart an unhealthy container on its own.

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
      # Required for the /-/reload endpoint used in this guide
      - "--web.enable-lifecycle"
      - "--web.external-url=http://prometheus.local"
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alert.rules.yml:/etc/prometheus/alert.rules.yml:ro
    ports:
      - "9090:9090"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/ready"]
      interval: 15s
      timeout: 5s
      retries: 5
    depends_on:
      - node-exporter

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    pid: "host"
    command: ["--path.rootfs=/host"]
    volumes:
      - /:/host:ro,rslave
    ports:
      - "9100:9100"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9100/metrics"]
      interval: 30s
      timeout: 5s
      retries: 5

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      # Placeholder credential; change it on first login
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 5

volumes:
  prometheus-data:
  grafana-data:

Start the stack in detached mode. The first run will pull images and create volumes.

docker compose up -d
docker compose ps
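
Once the stack is up, it is worth confirming that Prometheus is actually scraping its target. The sketch below counts healthy targets from the /api/v1/targets endpoint of the Prometheus HTTP API. The helper name count_up_targets is hypothetical, and it pattern-matches the response instead of parsing JSON, so treat it as a quick check only.

```shell
# count_up_targets: read a /api/v1/targets JSON response on stdin and
# print how many targets report "health":"up". (Illustrative helper.)
count_up_targets() {
  grep -Eo '"health"[[:space:]]*:[[:space:]]*"up"' | wc -l | tr -d ' '
}

# Query a running Prometheus only if curl exists and the endpoint answers.
if command -v curl >/dev/null 2>&1 &&
   resp=$(curl -fsS --max-time 5 http://localhost:9090/api/v1/targets 2>/dev/null); then
  echo "targets up: $(printf '%s' "$resp" | count_up_targets)"
else
  echo "Prometheus not reachable on localhost:9090"
fi
```

With the single node job from this guide, a count of 0 suggests Prometheus cannot reach Node Exporter.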

Reload/Enable & Health Checks

Use this sequence when you change configs:

  1. Edit files (e.g., prometheus/prometheus.yml).
  2. Validate the Prometheus config with promtool inside the container.
  3. Apply: for Compose file changes, run docker compose up -d (idempotent; it recreates only the services whose definition changed). For Prometheus scrape/rules changes, hot-reload via /-/reload.
  4. Restart only if reload fails or you changed container args/images.
  5. Check health: verify readiness endpoints and container health.

Validate & Apply changes

# Recreate containers if compose file changed; otherwise idempotent
docker compose up -d

# Validate Prometheus config (promtool is inside the image)
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
docker exec -it prometheus promtool check rules /etc/prometheus/alert.rules.yml

# Hot-reload Prometheus scrape/rule config (no restart)
curl -X POST http://localhost:9090/-/reload
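
The validate and reload commands above can be chained so that a reload only happens when both checks pass. This is a minimal sketch: the function name reload_if_valid is made up here, while the container name and file paths are the ones used in this guide. It drops -it so it also works from scripts without a terminal.

```shell
# reload_if_valid: validate config and rules, then hot-reload Prometheus.
# (Illustrative helper; stops at the first failed check.)
reload_if_valid() {
  docker exec prometheus promtool check config /etc/prometheus/prometheus.yml || {
    echo "config check failed; not reloading" >&2
    return 1
  }
  docker exec prometheus promtool check rules /etc/prometheus/alert.rules.yml || {
    echo "rules check failed; not reloading" >&2
    return 1
  }
  # -f makes curl exit non-zero on an HTTP error such as a rejected reload.
  curl -fsS -X POST http://localhost:9090/-/reload && echo "reloaded"
}
```

After editing a config file, call reload_if_valid and check its exit status before moving on.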

Health checks & Logs

# Container health and status
docker compose ps
docker inspect --format='{{ .State.Health.Status }}' prometheus

# Readiness endpoints (expect HTTP 200)
curl -I http://localhost:9090/-/ready
curl -I http://localhost:9090/-/healthy

# Tail logs
docker logs --tail=100 -f prometheus
docker logs --tail=100 -f grafana
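
In scripts it can be handy to block until a container reports healthy instead of checking by hand. The helper below is a sketch: wait_healthy is a hypothetical name, and the 2-second poll interval and 120-second default timeout are arbitrary choices.

```shell
# wait_healthy NAME [TIMEOUT_SECONDS]: poll the container's health status
# until it is "healthy" or the timeout expires. (Illustrative helper.)
wait_healthy() {
  name=$1
  timeout=${2:-120}
  waited=0
  status=""
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format='{{ .State.Health.Status }}' "$name" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

For example, wait_healthy prometheus && wait_healthy grafana returns success only once both containers pass their health checks.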

Security / Hardening

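Limit remote access to the monitoring stack. Use the firewall that matches your OS and open only the ports you really need. Ports published by Docker are handled by Docker’s own iptables rules and typically bypass UFW and firewalld rules, so a published port can stay reachable without an allow rule; to keep a port local, publish it as 127.0.0.1:PORT:PORT in docker-compose.yml.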

  • Ubuntu/Debian → UFW
  • RHEL/Rocky/CentOS Stream/Fedora/openSUSE/SLE → firewalld

Ubuntu/Debian (UFW)

sudo ufw allow OpenSSH
# Open only if remote access required
sudo ufw allow 3000/tcp   # Grafana
sudo ufw allow 9090/tcp   # Prometheus
sudo ufw allow 9100/tcp   # Node Exporter
sudo ufw reload
sudo ufw status

RHEL/Rocky/CentOS Stream/Fedora/openSUSE/SLE (firewalld)

sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-ports

TLS & Auth: put the stack behind a reverse proxy (Caddy/Traefik/Nginx) with HTTPS and authentication. Change the Grafana admin password on first login and prefer SSO for production.
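
When a reverse proxy on the same host fronts the stack, the published ports can be bound to the loopback interface so only the proxy reaches them. The sketch below rewrites quoted "PORT:PORT" entries in a Compose file to "127.0.0.1:PORT:PORT". The function name bind_ports_to_loopback is hypothetical, and it assumes port entries are written exactly in the quoted style used in this guide.

```shell
# bind_ports_to_loopback FILE: rewrite entries like   - "9090:9090"
# to   - "127.0.0.1:9090:9090"   and keep a backup as FILE.bak.
# (Illustrative helper; it only touches quoted HOST:CONTAINER port pairs.)
bind_ports_to_loopback() {
  sed -E -i.bak 's/^([[:space:]]*- )"([0-9]+:[0-9]+)"[[:space:]]*$/\1"127.0.0.1:\2"/' "$1"
}
```

A possible use is bind_ports_to_loopback docker-compose.yml followed by docker compose up -d, with the proxy pointed at 127.0.0.1:3000 for Grafana. Review the diff against the .bak file before applying it.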

Performance & Optimization

Improve retention and dashboard performance:

  1. TSDB Retention: adjust --storage.tsdb.retention.time (e.g., 30d/90d) based on disk.
  2. Remote Write (optional): if long-term storage is needed, enable remote write to a back end like Cortex/Thanos.
  3. Grafana provisioning: use provisioning files to pre-load data sources and dashboards for repeatable deploys.
  4. Resource limits: cap container CPU/memory in Compose for noisy neighbors.

# Limit Grafana resources (example)
services:
  grafana:
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1g
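
For the provisioning item in the list above, a data source file is usually the first thing to add. The sketch below writes one that points Grafana at Prometheus over the Compose network. The file name prometheus.yml under datasources/ is an arbitrary choice, and the directory reuses the grafana folder created earlier in this guide.

```shell
# Provision Prometheus as Grafana's default data source.
# (The target file name is an assumption; any .yml name in this folder works.)
mkdir -p grafana/provisioning/datasources
cat > grafana/provisioning/datasources/prometheus.yml <<'YAML'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
YAML
```

To load it, mount the folder into the Grafana service by adding ./grafana/provisioning:/etc/grafana/provisioning:ro to its volumes, then run docker compose up -d.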

Backup & Restore

Back up both volumes and configuration files. Volumes contain Prometheus TSDB and Grafana data; configs are your source of truth.

Backup

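cd ~/monitoring
d=$(date +%F)
# Stop briefly for a consistent snapshot (down keeps named volumes)
docker compose down
# Config files
sudo tar -C ~ -czf /root/monitoring-backup-$d.tgz monitoring
# Named volumes live in Docker's data directory, not in ~/monitoring.
# Compose prefixes them with the project name (here the directory name, monitoring).
# alpine is used only as a small helper image that ships tar.
for v in prometheus-data grafana-data; do
  docker run --rm -v monitoring_$v:/data:ro -v /root:/backup alpine tar -C /data -czf /backup/$v-$d.tgz .
done
sudo sha256sum /root/monitoring-backup-$d.tgz /root/prometheus-data-$d.tgz /root/grafana-data-$d.tgz
docker compose up -d

Restore

sudo tar -C ~ -xzf /root/monitoring-backup-YYYY-MM-DD.tgz
cd ~/monitoring
# Create the volumes without starting the services
docker compose up --no-start
for v in prometheus-data grafana-data; do
  docker run --rm -v monitoring_$v:/data -v /root:/backup:ro alpine tar -C /data -xzf /backup/$v-YYYY-MM-DD.tgz
done
docker compose up -d
docker compose ps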
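
The backup step prints a checksum, but storing it next to the archive lets a later restore confirm the file is intact. The two helpers below are a sketch with made-up names (record_checksum, verify_checksum); they assume sha256sum is available and that the caller can read the archive.

```shell
# record_checksum FILE: store FILE's SHA-256 in FILE.sha256 (same directory).
record_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# verify_checksum FILE: succeed only if FILE still matches FILE.sha256.
verify_checksum() {
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}
```

Run record_checksum right after creating an archive and verify_checksum before extracting it; archives under /root need a root shell for both.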

Troubleshooting (Top issues)

Prometheus shows “target down”: Node Exporter is not reachable or the port is blocked.

curl -sS http://localhost:9100/metrics | head
docker logs node-exporter | tail -n 50

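Grafana cannot connect to Prometheus: verify the Prometheus URL inside Grafana (http://prometheus:9090 on the Compose network), then test it from the Grafana container.

docker exec grafana wget -qO- http://prometheus:9090/-/ready

Locked out of Grafana: reset the admin password.

docker exec -it grafana grafana-cli admin reset-admin-password 'StrongP@ss!'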

High disk usage: reduce retention, or keep local retention short and ship long-term data elsewhere via remote write.

docker exec -it prometheus du -sh /prometheus
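
To decide on a retention value, the Prometheus documentation suggests a rough sizing formula: retention time in seconds, times ingested samples per second, times bytes per sample (about 1 to 2 bytes on average). The helper below is a sketch with a hypothetical name, and the 1000 samples per second in the example is only an illustrative figure; the real rate can be read from rate(prometheus_tsdb_head_samples_appended_total[1h]).

```shell
# estimate_tsdb_bytes DAYS SAMPLES_PER_SEC [BYTES_PER_SAMPLE]
# Rough sizing: retention_seconds * samples_per_second * bytes_per_sample.
estimate_tsdb_bytes() {
  days=$1
  rate=$2
  bps=${3:-2}   # default to the upper end of the 1-2 byte average
  echo $(( days * 86400 * rate * bps ))
}

# 15 days at 1000 samples/s and 2 bytes/sample
estimate_tsdb_bytes 15 1000 2   # prints 2592000000 (about 2.6 GB)
```

Comparing this estimate with the du output above shows whether the current retention fits the disk.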

Key Takeaways & Next Steps

  • Prometheus, Grafana, and Node Exporter on Docker Compose give you a fast, repeatable monitoring stack.
  • Secure with firewall + reverse proxy + strong auth.
  • Next: add Alertmanager, Grafana provisioning, and long‑term storage.
