This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Monitoring and Health Checks
Purpose and Scope
This document explains monitoring and health check strategies for the Docker MQTT Mosquitto with Cloudflare Tunnel system. It covers container status verification, automated health checks in the CI pipeline, logging approaches, and techniques for implementing custom monitoring solutions. For production deployment considerations, see Production Deployment Considerations. For troubleshooting specific issues, see Troubleshooting.
Container Status Monitoring
Docker Inspect Command Pattern
The system uses Docker’s native inspection capabilities to verify container health. The primary method queries container state using docker inspect with formatted output.
Container State Check Pattern:
This command returns the current container state, which can be one of: created, restarting, running, removing, paused, exited, or dead.
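A minimal sketch of that check, wrapped in a function so it can be reused by other scripts. The default container name `mosquitto` is an assumption; substitute the name that `docker ps` reports for your deployment.

```shell
# Print the state of a container, e.g. "running" or "exited".
# The default name "mosquitto" is an assumption, not read from the compose file.
container_status() {
  docker inspect --format '{{.State.Status}}' "${1:-mosquitto}"
}
```

Calling `container_status cloudflared` queries the tunnel container the same way.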
Container Status Fields
Docker provides multiple state fields that can be inspected for comprehensive health monitoring:
| Field | Path | Description |
|---|---|---|
| Status | .State.Status | Current container state |
| Running | .State.Running | Boolean indicating if container is running |
| ExitCode | .State.ExitCode | Exit code if container has stopped |
| StartedAt | .State.StartedAt | Timestamp when container started |
| Error | .State.Error | Error message if container failed |
Sources: .github/workflows/ci.yml:30
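The fields in the table above can be read in a single `docker inspect` call. The sketch below prints them as `key=value` lines; the function name and the default container name `mosquitto` are illustrative assumptions.

```shell
# Print the state fields listed in the table as key=value lines.
# The default container name "mosquitto" is an assumption.
container_state_report() {
  docker inspect --format \
'status={{.State.Status}}
running={{.State.Running}}
exit_code={{.State.ExitCode}}
started_at={{.State.StartedAt}}
error={{.State.Error}}' "${1:-mosquitto}"
}
```

The output is easy to filter, for example `container_state_report mosquitto | grep '^exit_code='`.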
CI Pipeline Health Verification
Automated Health Check Implementation
The GitHub Actions CI pipeline uses a bounded retry loop to verify that the mosquitto container reaches the running state after startup. The loop accounts for container initialization times that vary from run to run. Note that this checks the container state only; it does not confirm that the broker is accepting connections.
flowchart TD
Start["Start Docker Compose\ndocker-compose up -d mosquitto"]
Init["Initialize retry counter\ni=1, max=10"]
Inspect["Execute docker inspect\n--format='{{.State.Status}}'"]
Check{"Status == 'running'?"}
Success["Log: Mosquitto is running\nExit 0"]
Wait["Log: Waiting for Mosquitto...\nSleep 10 seconds"]
Increment["Increment counter\ni++"]
CheckMax{"i > 10?"}
Timeout["Log: Did not become healthy\nExit 1"]
Start --> Init
Init --> Inspect
Inspect --> Check
Check -->|Yes| Success
Check -->|No| Wait
Wait --> Increment
Increment --> CheckMax
CheckMax -->|No| Inspect
CheckMax -->|Yes| Timeout
Health Check Flow Diagram
Sources: .github/workflows/ci.yml:27-39
Retry Loop Parameters
The health check uses the following parameters:
| Parameter | Value | Rationale |
|---|---|---|
| Max Attempts | 10 | Caps the total wait at 100 seconds |
| Sleep Interval | 10 seconds | Balances responsiveness and system load |
| Total Timeout | 100 seconds | Sufficient for slow container starts; image pulls finish during docker-compose up, before the loop begins |
| Check Method | docker inspect | Native Docker API, no external dependencies |
| Success Criteria | Status contains running | Container process is active |
The implementation uses a shell loop structure:
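The workflow's own loop is not reproduced on this page. A minimal sketch of a loop with the parameters above, assuming a container named `mosquitto`, could look like this; the function name and log messages are illustrative.

```shell
# Poll a container's state until it is "running" or the attempts run out.
# Usage: wait_for_running <container> [max_attempts] [interval_seconds]
wait_for_running() {
  container=$1
  max_attempts=${2:-10}
  interval=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(docker inspect --format '{{.State.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "running" ]; then
      echo "$container is running"
      return 0
    fi
    echo "Waiting for $container... (attempt $attempt/$max_attempts)"
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "$container did not reach the running state" >&2
  return 1
}
```

In a CI step, `wait_for_running mosquitto 10 10 || exit 1` fails the build after at most 100 seconds of waiting.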
Sources: .github/workflows/ci.yml:28-39
Docker Native Healthcheck Configuration
Current Implementation
The system currently relies on Docker Compose’s restart: unless-stopped policy for automatic recovery but does not define explicit healthcheck directives in the compose configuration.
Current Restart Policy:
- Service: `mosquitto` - restart policy defined at docker-compose.yml:9
- Service: `cloudflared` - restart policy defined at docker-compose.yml:15
Adding Native Docker Healthchecks
Docker Compose supports native healthcheck definitions that provide more sophisticated monitoring than simple state checks. Below are example configurations for both services.
Mosquitto Healthcheck Example
This healthcheck:
- Subscribes to the system topic `$SYS/broker/uptime`
- Waits up to 3 seconds for a message
- Receives 1 message (`-C 1`) to confirm broker is operational
- Runs every 30 seconds
- Allows 10 seconds for the test to complete
- Requires 3 consecutive failures before marking unhealthy
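The probe described in the list can be tried by hand before it is wired into a compose `healthcheck`. The sketch below assumes the `mosquitto_sub` client is installed where the command runs and that the broker accepts anonymous connections on `localhost:1883`; add `-u` and `-P` if it requires credentials.

```shell
# Probe the broker: subscribe to $SYS/broker/uptime, stop after one
# message (-C 1), and give up after 3 seconds (-W 3).
# Host and port defaults are assumptions.
mosquitto_probe() {
  mosquitto_sub -h "${1:-localhost}" -p "${2:-1883}" \
    -t '$SYS/broker/uptime' -C 1 -W 3 >/dev/null
}
```

The function returns 0 when a message arrives in time and non-zero otherwise, which is the contract a healthcheck test command needs. The 30-second interval, 10-second timeout, and 3 retries are scheduling settings that belong in the compose `interval`, `timeout`, and `retries` keys rather than in the probe itself.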
Cloudflared Healthcheck Example
This healthcheck:
- Executes `cloudflared tunnel info` to verify tunnel connectivity
- Runs every 60 seconds (tunnel state changes slowly)
- Allows 5 retries (tunnel reconnections may take time)
- Provides 30 seconds startup grace period
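Once a healthcheck is defined for a service, Docker records its result in the container state. A small sketch for reading it follows; the function name is illustrative, and the template guards against containers that have no healthcheck, where the `Health` field is absent.

```shell
# Print Docker's health status for a container: "starting", "healthy",
# "unhealthy", or "none" when no healthcheck is defined.
health_status() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1"
}
```

With the current compose file, which defines no healthchecks, `health_status mosquitto` would be expected to print `none`.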
Sources: docker-compose.yml:4-17
System Monitoring Architecture
Container Monitoring Topology
Sources: docker-compose.yml:1-18 .github/workflows/ci.yml:27-39
Cloudflared Tunnel Monitoring
Tunnel Status Verification
The cloudflared container maintains an outbound connection to Cloudflare’s network. Monitoring tunnel health requires checking both container state and tunnel connectivity.
Log-Based Monitoring
The cloudflared process outputs connection status to stdout:
Tunnel Status Commands
Common Cloudflared Log Patterns
| Log Pattern | Meaning | Action Required |
|---|---|---|
| `Registered tunnel connection` | Tunnel successfully connected | Normal operation |
| `unable to register connection` | Authentication or network failure | Verify token and connectivity |
| `Retrying in` | Temporary connection loss | Monitor for recovery |
| `SIGTERM received` | Graceful shutdown initiated | Expected during restarts |
| `certificate error` | TLS/SSL verification failed | Check system time and CA certificates |
Sources: docker-compose.yml:11-17
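The log pattern table can be turned into a simple classifier. This is a sketch only: the patterns are copied from the table, the exact wording may differ between cloudflared versions, and the returned labels are made up for illustration.

```shell
# Map one cloudflared log line to the action from the log pattern table.
# The output labels are illustrative, not part of any tool.
classify_cloudflared_line() {
  case "$1" in
    *"Registered tunnel connection"*)  echo "ok" ;;
    *"unable to register connection"*) echo "verify-token-and-connectivity" ;;
    *"Retrying in"*)                   echo "monitor-for-recovery" ;;
    *"SIGTERM received"*)              echo "expected-during-restart" ;;
    *"certificate error"*)             echo "check-time-and-ca-certificates" ;;
    *)                                 echo "unclassified" ;;
  esac
}
```

Assuming the container is named `cloudflared`, the classifier can be fed from the container logs with `docker logs cloudflared 2>&1 | while IFS= read -r line; do classify_cloudflared_line "$line"; done`.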
Mosquitto Broker Monitoring
MQTT System Topics
The Mosquitto broker publishes operational metrics to reserved $SYS topics. These provide real-time broker statistics without external dependencies.
Key System Topics
| Topic | Description | Type |
|---|---|---|
| `$SYS/broker/version` | Broker version string | Static |
| `$SYS/broker/uptime` | Seconds since broker start | Counter |
| `$SYS/broker/clients/connected` | Current connected client count | Gauge |
| `$SYS/broker/clients/total` | Total clients since start | Counter |
| `$SYS/broker/messages/received` | Total messages received | Counter |
| `$SYS/broker/messages/sent` | Total messages sent | Counter |
| `$SYS/broker/bytes/received` | Total bytes received | Counter |
| `$SYS/broker/bytes/sent` | Total bytes sent | Counter |
Monitoring via MQTT Client
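A sketch of reading individual `$SYS` topics with the `mosquitto_sub` client. The `MQTT_HOST` and `MQTT_PORT` variables and their defaults are assumptions, as is an anonymous connection; the uptime helper assumes the payload has the form `12345 seconds`.

```shell
# Read one value from a topic and print it; gives up after 5 seconds.
# MQTT_HOST and MQTT_PORT are hypothetical variables with assumed defaults.
sys_topic_value() {
  mosquitto_sub -h "${MQTT_HOST:-localhost}" -p "${MQTT_PORT:-1883}" \
    -t "$1" -C 1 -W 5
}

# Print broker uptime as a bare number, assuming a payload like "12345 seconds".
broker_uptime_seconds() {
  sys_topic_value '$SYS/broker/uptime' | awk '{ print $1 }'
}
```

For example, `sys_topic_value '$SYS/broker/clients/connected'` prints the current client count. The single quotes matter: without them the shell would try to expand `$SYS` as a variable.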
Log-Based Monitoring
Sources: docker-compose.yml:4-9
Health Check Decision Tree
Sources: .github/workflows/ci.yml:27-42 docker-compose.yml:1-18
Custom Health Check Strategies
Script-Based Monitoring
Implement a shell script for comprehensive health verification:
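One possible shape for such a script is sketched below. It checks container state only, and the container names passed to it are assumptions; a broker-level probe could be added alongside it.

```shell
# Check that every named container is running. Prints one line per
# container and returns non-zero if any of them is not running.
check_stack() {
  failures=0
  for name in "$@"; do
    status=$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null)
    if [ "$status" = "running" ]; then
      echo "OK   $name"
    else
      echo "FAIL $name (status: ${status:-not found})"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Run from cron or a systemd timer, `check_stack mosquitto cloudflared || exit 1` gives a single exit code that an alerting tool can act on.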
External Monitoring Integration
Prometheus Exporter Pattern
For production deployments, integrate with monitoring systems using exporters:
- Docker Container Exporter: Exposes container metrics
- MQTT Exporter: Subscribes to `$SYS` topics and exposes them as Prometheus metrics
- Cloudflare Tunnel Exporter: Monitors tunnel status via Cloudflare API
Example Prometheus Configuration
Sources: .github/workflows/ci.yml:27-39
Logging and Log Analysis
Container Log Access
Both containers output logs to stdout/stderr, which Docker captures:
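A small sketch for pulling recent logs from either container. The function name and the 10-minute default window are illustrative; `--since` and `--timestamps` are standard `docker logs` options.

```shell
# Show recent log lines for a container with timestamps. stderr is merged
# into stdout so the output can be piped to grep.
# Usage: recent_logs <container> [window], e.g. recent_logs mosquitto 1h
recent_logs() {
  docker logs --timestamps --since "${2:-10m}" "$1" 2>&1
}
```

For continuous output, `docker logs --follow <container>` streams new lines as they arrive.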
Log Patterns and Indicators
Mosquitto Startup Success Indicators
Opening ipv4 listen socket on port 1883.
Opening ipv4 listen socket on port 9001.
mosquitto version X.X.X running
Cloudflared Connection Success Indicators
Registered tunnel connection
Connection established
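The indicators above can be checked mechanically. The sketch below reads log text on standard input; the function names are illustrative, and the patterns are taken from the indicator lines listed here, so they may need adjusting for other versions.

```shell
# Succeed only if the log text on stdin contains both the MQTT listener
# line for port 1883 and the Mosquitto version banner.
mosquitto_started() {
  logs=$(cat)
  printf '%s\n' "$logs" | grep -q 'Opening ipv4 listen socket on port 1883' &&
    printf '%s\n' "$logs" | grep -Eq 'mosquitto version .+ running'
}

# Succeed if the log text on stdin shows a registered tunnel connection.
tunnel_registered() {
  grep -q 'Registered tunnel connection'
}
```

Assuming containers named `mosquitto` and `cloudflared`, typical calls are `docker logs mosquitto 2>&1 | mosquitto_started` and `docker logs cloudflared 2>&1 | tunnel_registered`.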
Log Aggregation
For production deployments, consider implementing centralized logging:
Sources: docker-compose.yml:4-17
Restart Policies and Recovery
Current Configuration
Both services use the unless-stopped restart policy, which provides:
- Automatic restart on failure
- Respect for manual stops (`docker-compose stop`)
- Persistence across Docker daemon restarts
Configuration locations:
- `mosquitto` service: docker-compose.yml:9
- `cloudflared` service: docker-compose.yml:15
Monitoring Restart Behavior
Crash Loop Detection
Frequent restarts indicate underlying issues:
If restart count increases rapidly (>3 restarts in 1 minute), investigate:
- Check container logs for error messages
- Verify configuration file syntax
- Confirm environment variables are set
- Check resource constraints (memory/CPU limits)
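Docker tracks restarts in the top-level `RestartCount` field of `docker inspect`. A sketch of the threshold above (more than 3 restarts in 1 minute), using two samples taken one window apart; the function names and exit code convention are illustrative.

```shell
# Print how many times Docker has restarted a container.
restart_count() {
  docker inspect --format '{{.RestartCount}}' "$1"
}

# Sample the restart count twice, "window" seconds apart (default 60).
# Returns 0 if the count grew by more than "limit" (default 3), meaning
# the container looks like it is crash looping; 1 if not; 2 if the
# container could not be inspected.
# Usage: crash_looping <container> [window_seconds] [limit]
crash_looping() {
  before=$(restart_count "$1") || return 2
  sleep "${2:-60}"
  after=$(restart_count "$1") || return 2
  [ $((after - before)) -gt "${3:-3}" ]
}
```

A call such as `crash_looping mosquitto && docker logs --tail 50 mosquitto` prints the latest log lines only when the threshold is exceeded.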
Sources: docker-compose.yml:9 docker-compose.yml:15
CI Pipeline Integration
The GitHub Actions CI workflow serves as a reference implementation for automated health verification. The workflow demonstrates:
- Isolated Environment: Starts only the `mosquitto` service without dependencies
- Bounded Wait: Implements timeout to prevent hanging builds
- State Verification: Uses Docker’s native state inspection
- Clean Teardown: Ensures resources are released after testing
CI Health Check Sequence
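The four steps can be sketched as one function: start the service, poll with a bound, and tear down regardless of the result. This is not the workflow's actual script. The `COMPOSE` variable, the assumption that the container is named `mosquitto`, and the defaults are all illustrative.

```shell
# Start only mosquitto, wait for it with a bound, and always tear down.
# Usage: ci_health_check [container] [max_attempts] [interval_seconds]
# COMPOSE is a hypothetical override for the compose command.
ci_health_check() {
  compose=${COMPOSE:-docker-compose}
  container=${1:-mosquitto}
  result=1
  $compose up -d mosquitto || { $compose down; return 1; }
  i=1
  while [ "$i" -le "${2:-10}" ]; do
    state=$(docker inspect --format '{{.State.Status}}' "$container" 2>/dev/null)
    if [ "$state" = "running" ]; then
      result=0
      break
    fi
    sleep "${3:-10}"
    i=$((i + 1))
  done
  $compose down   # teardown runs whether or not the check passed
  return "$result"
}
```

Running the teardown before returning keeps a failed check from leaving containers behind on the build runner.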
Sources: .github/workflows/ci.yml:24-42
Best Practices Summary
| Practice | Implementation | Benefit |
|---|---|---|
| Bounded retries | Max 10 attempts with 10s interval | Prevents infinite waits |
| Docker native checks | Use docker inspect for state | No external dependencies |
| Log monitoring | Regularly check container logs | Early problem detection |
| Restart tracking | Monitor RestartCount metric | Identify crash loops |
| System topics | Subscribe to $SYS/# for MQTT stats | Broker-native monitoring |
| Health scripts | Automate multi-component checks | Consistent verification |
| External integration | Export metrics to monitoring systems | Production observability |
Sources: .github/workflows/ci.yml:27-39 docker-compose.yml:1-18