This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Monitoring and Health Checks

Purpose and Scope

This document explains monitoring and health check strategies for the Docker MQTT Mosquitto with Cloudflare Tunnel system. It covers container status verification, automated health checks in the CI pipeline, logging approaches, and techniques for implementing custom monitoring solutions. For production deployment considerations, see Production Deployment Considerations. For troubleshooting specific issues, see Troubleshooting.

Container Status Monitoring

Docker Inspect Command Pattern

The system uses Docker’s native inspection capabilities to verify container health. The primary method queries container state using docker inspect with formatted output.

Container State Check Pattern:
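A minimal sketch of this check, wrapped in a shell function so it can be reused. The function name `container_status` is illustrative, and the container name is passed in rather than assumed:

```shell
# Print the state of one container, e.g. "running" or "exited".
# $1: container name or ID (supplied by the caller).
container_status() {
  docker inspect --format='{{.State.Status}}' "$1"
}
```

For example, `container_status mosquitto` prints a single word such as `running`, assuming the broker container is named `mosquitto`.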

This command returns the current container state, which can be one of: created, restarting, running, removing, paused, exited, or dead.

Container Status Fields

Docker provides multiple state fields that can be inspected for comprehensive health monitoring:

Field	Path	Description
Status	`.State.Status`	Current container state
Running	`.State.Running`	Boolean indicating if container is running
ExitCode	`.State.ExitCode`	Exit code if container has stopped
StartedAt	`.State.StartedAt`	Timestamp when container started
Error	`.State.Error`	Error message if container failed
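The fields above can be read in a single call by combining them in one Go template. A sketch, with `container_state_summary` as a hypothetical helper name:

```shell
# Print all state fields from the table on one line.
# $1: container name or ID.
container_state_summary() {
  docker inspect \
    --format='status={{.State.Status}} running={{.State.Running}} exit_code={{.State.ExitCode}} started_at={{.State.StartedAt}} error={{.State.Error}}' \
    "$1"
}
```

The output is a space-separated list of `key=value` pairs, which is easy to grep or to append to a log file.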

Sources: .github/workflows/ci.yml:30

CI Pipeline Health Verification

Automated Health Check Implementation

The GitHub Actions CI pipeline implements a bounded retry pattern to verify the mosquitto container reaches a healthy running state after startup. This pattern accounts for non-deterministic container initialization times.

Health Check Flow Diagram

flowchart TD
    Start["Start Docker Compose\ndocker-compose up -d mosquitto"]
    Init["Initialize retry counter\ni=1, max=10"]
    Inspect["Execute docker inspect\n--format='{{.State.Status}}'"]
    Check{"Status == 'running'?"}
    Success["Log: Mosquitto is running\nExit 0"]
    Wait["Log: Waiting for Mosquitto...\nSleep 10 seconds"]
    Increment["Increment counter\ni++"]
    CheckMax{"i > 10?"}
    Timeout["Log: Did not become healthy\nExit 1"]
    Start --> Init
    Init --> Inspect
    Inspect --> Check
    Check -->|Yes| Success
    Check -->|No| Wait
    Wait --> Increment
    Increment --> CheckMax
    CheckMax -->|No| Inspect
    CheckMax -->|Yes| Timeout

Sources: .github/workflows/ci.yml:27-39

Retry Loop Parameters

The health check uses the following parameters:

Parameter	Value	Rationale
Max Attempts	10	Provides 100 seconds total wait time
Sleep Interval	10 seconds	Balances responsiveness and system load
Total Timeout	100 seconds	Sufficient for cold starts and image pulls
Check Method	`docker inspect`	Native Docker API, no external dependencies
Success Criteria	Status contains `running`	Container process is active

The implementation uses a shell loop structure:
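A sketch of such a loop, following the parameters listed above (10 attempts, 10-second sleep). The function name and its arguments are illustrative; the actual step lives in .github/workflows/ci.yml and may differ in detail:

```shell
# Wait until a container reports the "running" state, with bounded retries.
# $1: container name, $2: max attempts (default 10), $3: sleep seconds (default 10).
wait_for_running() {
  name="$1"; max="${2:-10}"; interval="${3:-10}"
  i=1
  while [ "$i" -le "$max" ]; do
    # A missing container makes docker inspect fail; treat that as "not yet running".
    status="$(docker inspect --format='{{.State.Status}}' "$name" 2>/dev/null)"
    if [ "$status" = "running" ]; then
      echo "Mosquitto is running"
      return 0
    fi
    echo "Waiting for Mosquitto... ($i/$max)"
    sleep "$interval"
    i=$((i + 1))
  done
  echo "Mosquitto did not become healthy in time" >&2
  return 1
}
```

Calling `wait_for_running mosquitto` uses the defaults and returns a non-zero status after roughly 100 seconds if the container never starts, which is what makes the CI step fail instead of hanging.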

Sources: .github/workflows/ci.yml:28-39

Docker Native Healthcheck Configuration

Current Implementation

The system currently relies on Docker Compose’s restart: unless-stopped policy for automatic recovery but does not define explicit healthcheck directives in the compose configuration.

Current Restart Policy:

Service: mosquitto - restart policy defined at docker-compose.yml:9
Service: cloudflared - restart policy defined at docker-compose.yml:15
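
The policy Docker actually applied to a container can be confirmed from its host configuration. A small sketch, with `restart_policy` as an illustrative helper name:

```shell
# Print the restart policy name of a container, e.g. "unless-stopped".
# $1: container name or ID.
restart_policy() {
  docker inspect --format='{{.HostConfig.RestartPolicy.Name}}' "$1"
}
```

For both services in this system the expected output is `unless-stopped`.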

Adding Native Docker Healthchecks

Docker Compose supports native healthcheck definitions that provide more sophisticated monitoring than simple state checks. Below are example configurations for both services.

Mosquitto Healthcheck Example

This healthcheck:

Subscribes to the system topic $SYS/broker/uptime
Waits up to 3 seconds for a message
Receives 1 message (-C 1) to confirm broker is operational
Runs every 30 seconds
Allows 10 seconds for the test to complete
Requires 3 consecutive failures before marking unhealthy
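
The settings listed above could be expressed as a Compose override file. The sketch below writes one from the shell. The file name and function name are hypothetical, and it assumes the broker accepts unauthenticated local connections and that `mosquitto_sub` is available inside the container:

```shell
# Write a Compose override that adds a healthcheck to the mosquitto service.
# $1: output path (hypothetical default: docker-compose.healthcheck.yml).
write_mosquitto_healthcheck() {
  # The quoted heredoc keeps the text literal; "$$" is Compose's escape for "$".
  cat > "${1:-docker-compose.healthcheck.yml}" <<'EOF'
services:
  mosquitto:
    healthcheck:
      test: ["CMD-SHELL", "mosquitto_sub -t '$$SYS/broker/uptime' -C 1 -W 3 || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
EOF
}
```

The override can then be layered on top of the main file, for example with `docker-compose -f docker-compose.yml -f docker-compose.healthcheck.yml up -d`.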

Cloudflared Healthcheck Example

This healthcheck:

Executes cloudflared tunnel info to verify tunnel connectivity
Runs every 60 seconds (tunnel state changes slowly)
Allows 5 retries (tunnel reconnections may take time)
Provides 30 seconds startup grace period
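
Once a healthcheck is defined for a service, its result is reported under `.State.Health` and can be read back with `docker inspect`. A sketch, with `container_health` as an illustrative helper name:

```shell
# Print the healthcheck result: "starting", "healthy" or "unhealthy".
# Prints "none" when the container has no healthcheck defined.
# $1: container name or ID.
container_health() {
  docker inspect \
    --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1"
}
```

With the current configuration, which defines no healthcheck, this prints `none` for both services.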

Sources: docker-compose.yml:4-17

System Monitoring Architecture

Container Monitoring Topology

Sources: docker-compose.yml:1-18 .github/workflows/ci.yml:27-39

Cloudflared Tunnel Monitoring

Tunnel Status Verification

The cloudflared container maintains an outbound connection to Cloudflare’s network. Monitoring tunnel health requires checking both container state and tunnel connectivity.

Log-Based Monitoring

The cloudflared process outputs connection status to stdout:
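
A sketch for reading the most recent output. The helper name `recent_logs` is illustrative, and the container name is supplied by the caller:

```shell
# Print the last N log lines of a container (default 50).
# stderr is merged into stdout so both streams can be piped onward.
# $1: container name, $2: number of lines.
recent_logs() {
  docker logs --tail "${2:-50}" "$1" 2>&1
}
```

For example, `recent_logs cloudflared 100` shows the last 100 lines, assuming the tunnel container is named `cloudflared`.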

Tunnel Status Commands

Common Cloudflared Log Patterns

Log Pattern	Meaning	Action Required
`Registered tunnel connection`	Tunnel successfully connected	Normal operation
`unable to register connection`	Authentication or network failure	Verify token and connectivity
`Retrying in`	Temporary connection loss	Monitor for recovery
`SIGTERM received`	Graceful shutdown initiated	Expected during restarts
`certificate error`	TLS/SSL verification failed	Check system time and CA certificates
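
The table above can be turned into a simple filter that reads log lines on stdin and prints a short verdict for each recognised pattern. This is a sketch only: the function name and the verdict wording are illustrative, and matching is a plain case-sensitive substring test:

```shell
# Read log lines from stdin and print a verdict for each known pattern.
# Lines that match none of the patterns are ignored.
classify_cloudflared_log() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *"Registered tunnel connection"*)  echo "ok: tunnel connected" ;;
      *"unable to register connection"*) echo "error: verify token and connectivity" ;;
      *"Retrying in"*)                   echo "warn: temporary connection loss" ;;
      *"SIGTERM received"*)              echo "info: graceful shutdown" ;;
      *"certificate error"*)             echo "error: check system time and CA certificates" ;;
    esac
  done
}
```

It is meant to sit at the end of a pipeline, for example `docker logs cloudflared 2>&1 | classify_cloudflared_log`.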

Sources: docker-compose.yml:11-17

Mosquitto Broker Monitoring

MQTT System Topics

Key System Topics

Topic	Description	Type
`$SYS/broker/version`	Broker version string	Static
`$SYS/broker/uptime`	Seconds since broker start	Counter
`$SYS/broker/clients/connected`	Current connected client count	Gauge
`$SYS/broker/clients/total`	Total clients since start	Counter
`$SYS/broker/messages/received`	Total messages received	Counter
`$SYS/broker/messages/sent`	Total messages sent	Counter
`$SYS/broker/bytes/received`	Total bytes received	Counter
`$SYS/broker/bytes/sent`	Total bytes sent	Counter
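
Any of these topics can be read with the `mosquitto_sub` client. A sketch, assuming the broker is reachable on localhost:1883 without authentication; `read_sys_topic`, `MQTT_HOST`, and `MQTT_PORT` are illustrative names:

```shell
# Print one message from a topic, or fail if none arrives in time.
# -C 1 exits after one message; -W 3 gives up after 3 seconds.
# $1: topic, e.g. '$SYS/broker/uptime' (single-quoted so the shell keeps the "$").
read_sys_topic() {
  mosquitto_sub -h "${MQTT_HOST:-localhost}" -p "${MQTT_PORT:-1883}" -t "$1" -C 1 -W 3
}
```

For example, `read_sys_topic '$SYS/broker/clients/connected'` prints the current client count, and a non-zero exit status signals that the broker did not answer.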

External Monitoring Integration

Prometheus Exporter Pattern

Docker Container Exporter: Exposes container metrics
MQTT Exporter: Subscribes to $SYS topics and exposes them as Prometheus metrics
Cloudflare Tunnel Exporter: Monitors tunnel status via the Cloudflare API

Logging and Log Analysis

Log Patterns and Indicators

Mosquitto Startup Success Indicators

Opening ipv4 listen socket on port 1883.
Opening ipv4 listen socket on port 9001.
mosquitto version X.X.X running

Cloudflared Connection Success Indicators

Registered tunnel connection
Connection established

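Restart Policies and Recovery

Current Configuration

Both services use the restart: unless-stopped policy, which provides:

Automatic restart on failure
Respect for manual stops (docker-compose stop)
Persistence across Docker daemon restarts

Configuration locations:

mosquitto service: docker-compose.yml:9
cloudflared service: docker-compose.yml:15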

Monitoring Restart Behavior

Crash Loop Detection

Frequent restarts indicate underlying issues:
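
Docker tracks restarts in the `RestartCount` field. The sketch below reads it and compares two samples taken a fixed interval apart. The function names, the default threshold of 3, and the 60-second window are illustrative choices:

```shell
# Print how many times Docker has restarted a container.
# $1: container name or ID.
restart_count() {
  docker inspect --format='{{.RestartCount}}' "$1"
}

# Succeed (status 0) if the restart count grew by more than the threshold
# within the window. Returns 2 if the container could not be inspected.
# $1: container name, $2: threshold (default 3), $3: window seconds (default 60).
is_crash_looping() {
  name="$1"; threshold="${2:-3}"; window="${3:-60}"
  before="$(restart_count "$name")" || return 2
  sleep "$window"
  after="$(restart_count "$name")" || return 2
  [ $((after - before)) -gt "$threshold" ]
}
```

A call such as `is_crash_looping mosquitto && echo "crash loop suspected"` blocks for the length of the window, so it suits a periodic job better than an interactive check.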

If restart count increases rapidly (>3 restarts in 1 minute), investigate:

Check container logs for error messages
Verify configuration file syntax
Confirm environment variables are set
Check resource constraints (memory/CPU limits)

Sources: docker-compose.yml:9 docker-compose.yml:15

CI Pipeline Integration

The GitHub Actions CI workflow serves as a reference implementation for automated health verification. The workflow demonstrates:

Isolated Environment: Starts only the mosquitto service without dependencies
Bounded Wait: Implements a timeout to prevent hanging builds
State Verification: Uses Docker’s native state inspection
Clean Teardown: Ensures resources are released after testing
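
The four points above can be sketched as one wrapper that starts the service, runs a caller-supplied check, and always tears down afterwards. The names `ci_check` and `COMPOSE` are illustrative and are not taken from the workflow file:

```shell
# Compose command to use; override with COMPOSE="docker compose" if preferred.
COMPOSE="${COMPOSE:-docker-compose}"

# Start only the mosquitto service, run the given check command,
# then tear down regardless of the result and return the check's status.
# "$@": the health check command and its arguments.
ci_check() {
  $COMPOSE up -d mosquitto || return 1
  rc=0
  "$@" || rc=$?
  $COMPOSE down
  return "$rc"
}
```

Any command that exits non-zero on failure can serve as the check, such as a bounded `docker inspect` retry loop. Because the teardown runs before the status is returned, a failed check still leaves the runner clean.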

CI Health Check Sequence

Sources: .github/workflows/ci.yml:24-42

Best Practices Summary

Practice	Implementation	Benefit
Bounded retries	Max 10 attempts with 10s interval	Prevents infinite waits
Docker native checks	Use `docker inspect` for state	No external dependencies
Log monitoring	Regularly check container logs	Early problem detection
Restart tracking	Monitor `RestartCount` metric	Identify crash loops
System topics	Subscribe to `$SYS/#` for MQTT stats	Broker-native monitoring
Health scripts	Automate multi-component checks	Consistent verification
External integration	Export metrics to monitoring systems	Production observability

Sources: .github/workflows/ci.yml:27-39 docker-compose.yml:1-18
