Cron Jobs for DevOps: Complete Infrastructure Automation Guide
Master infrastructure automation with cron jobs. This comprehensive guide covers monitoring, backups, log management, and enterprise-grade DevOps automation using cron scheduling.
What You'll Learn
- Infrastructure monitoring and health checks
- Automated backup and disaster recovery
- Log rotation and cleanup strategies
- Performance monitoring and alerting
- Container and cloud integration
- Security scanning and compliance
Infrastructure Monitoring with Cron
DevOps teams rely on cron jobs for continuous infrastructure monitoring. These automated checks ensure system health, detect issues early, and maintain service reliability.
System Health Monitoring
Monitor critical system metrics and send alerts when thresholds are exceeded:
# Check disk usage every 5 minutes
*/5 * * * * /opt/scripts/check_disk_usage.sh
# Monitor memory usage every 10 minutes
*/10 * * * * /opt/scripts/check_memory.sh
# Check CPU load every 15 minutes
*/15 * * * * /opt/scripts/check_cpu_load.sh
# Network connectivity check every minute
* * * * * /opt/scripts/check_network.sh
Sample Health Check Script
#!/bin/bash
# /opt/scripts/check_disk_usage.sh
THRESHOLD=85
ALERT_EMAIL="ops-team@company.com"
HOSTNAME=$(hostname)
# Check disk usage for each mounted filesystem
df -h | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{print $5 " " $1}' | while read output;
do
usage=$(echo $output | awk '{print $1}' | sed 's/%//g')
partition=$(echo $output | awk '{print $2}')
if [ $usage -ge $THRESHOLD ]; then
echo "ALERT: Disk usage on $HOSTNAME:$partition is $usage%" | mail -s "Disk Usage Alert - $HOSTNAME" $ALERT_EMAIL
# Log to syslog
logger -t disk_monitor "HIGH DISK USAGE: $partition at $usage%"
# Send to monitoring system (e.g., Prometheus pushgateway)
echo "disk_usage{host="$HOSTNAME",partition="$partition"} $usage" | curl -X POST --data-binary @- http://pushgateway:9091/metrics/job/disk_monitor
fi
done
Service Availability Monitoring
Ensure critical services are running and responsive:
# Check web services every 2 minutes
*/2 * * * * /opt/scripts/check_web_services.sh
# Database connectivity check every 5 minutes
*/5 * * * * /opt/scripts/check_database.sh
# API endpoint health checks
*/3 * * * * /opt/scripts/check_api_health.sh
# SSL certificate expiration check (daily)
0 8 * * * /opt/scripts/check_ssl_certs.sh
Web Service Health Check
#!/bin/bash
# /opt/scripts/check_web_services.sh
SERVICES=(
"https://api.company.com/health"
"https://app.company.com/status"
"https://admin.company.com/ping"
)
TIMEOUT=10
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
for service in "${SERVICES[@]}"; do
response=$(curl -s -o /dev/null -w "%{http_code}" --max-time $TIMEOUT "$service")
if [ "$response" != "200" ]; then
message="🚨 Service DOWN: $service (HTTP $response)"
# Send Slack alert
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
# Log incident
echo "$(date): $message" >> /var/log/service_monitor.log
# Create PagerDuty incident
curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d "{
\"routing_key\": \"YOUR_INTEGRATION_KEY\",
\"event_action\": \"trigger\",
\"payload\": {
\"summary\": \"Service Unavailable: $service\",
\"severity\": \"critical\",
\"source\": \"cron-monitor\"
}
}"
fi
done
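SSL Certificate Expiration Check
The schedule above also references /opt/scripts/check_ssl_certs.sh, which is not shown. The following is a minimal sketch of such a check, assuming a small list of domains, a 14-day warning threshold, and the same alert address used earlier; it reads each certificate's expiry date with openssl s_client.
#!/bin/bash
# /opt/scripts/check_ssl_certs.sh (sketch: domain list, threshold, and alert address are examples)
DOMAINS=(
  "api.company.com"
  "app.company.com"
)
WARN_DAYS=14
ALERT_EMAIL="ops-team@company.com"

for domain in "${DOMAINS[@]}"; do
  # Read the certificate's notAfter date from the live endpoint
  expiry=$(echo | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
  expiry_epoch=$(date -d "$expiry" +%s)
  days_left=$(( (expiry_epoch - $(date +%s)) / 86400 ))
  if [ "$days_left" -lt "$WARN_DAYS" ]; then
    echo "Certificate for $domain expires in $days_left days" | mail -s "SSL Expiry Warning - $domain" "$ALERT_EMAIL"
    logger -t ssl_monitor "Certificate for $domain expires in $days_left days"
  fi
done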
Automated Backup and Disaster Recovery
Reliable backups are critical for DevOps operations. Cron jobs automate backup processes, ensuring data protection without manual intervention.
Database Backup Strategies
# Full database backup (daily at 2 AM)
0 2 * * * /opt/scripts/backup_database.sh full
# Incremental backup every 6 hours
0 */6 * * * /opt/scripts/backup_database.sh incremental
# Transaction log backup every 15 minutes
*/15 * * * * /opt/scripts/backup_database.sh transaction_log
# Backup verification (daily at 3 AM)
0 3 * * * /opt/scripts/verify_backups.sh
Production Database Backup Script
#!/bin/bash
# /opt/scripts/backup_database.sh
BACKUP_TYPE=$1
DB_HOST="production-db.internal"
DB_USER="backup_user"
DB_NAME="production_db"
BACKUP_DIR="/backups/database"
S3_BUCKET="s3://company-backups/database"
RETENTION_DAYS=30
# Create timestamped backup directory
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="$BACKUP_DIR/$TIMESTAMP"
mkdir -p "$BACKUP_PATH"
case $BACKUP_TYPE in
"full")
echo "Starting full database backup..."
# Create full backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME -f "$BACKUP_PATH/full_backup_$TIMESTAMP.sql" --verbose --no-password
# Compress backup
gzip "$BACKUP_PATH/full_backup_$TIMESTAMP.sql"
# Upload to S3
aws s3 cp "$BACKUP_PATH/full_backup_$TIMESTAMP.sql.gz" "$S3_BUCKET/full/" --storage-class STANDARD_IA
# Check the upload result ($? reflects the aws command above, not the dump itself)
if [ $? -eq 0 ]; then
echo "✅ Full backup completed successfully"
logger -t db_backup "Full backup completed: $TIMESTAMP"
else
echo "❌ Full backup failed"
logger -t db_backup "Full backup FAILED: $TIMESTAMP"
exit 1
fi
;;
"incremental")
echo "Starting incremental backup..."
# Copy current WAL segments (a simplification; production setups usually archive WAL via PostgreSQL's archive_command)
rsync -av /var/lib/postgresql/data/pg_wal/ "$BACKUP_PATH/wal_archive/"
# Upload WAL files to S3
aws s3 sync "$BACKUP_PATH/wal_archive/" "$S3_BUCKET/wal/" --delete
;;
esac
BACKUP_STATUS=$?
# Cleanup old local backups (note the escaped \; required by find -exec)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} \;
# Send backup report with the backup's exit status, not the cleanup's
/opt/scripts/send_backup_report.sh "$BACKUP_TYPE" "$TIMESTAMP" "$BACKUP_STATUS"
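Backup Verification Script
The 3 AM verification job referenced earlier (/opt/scripts/verify_backups.sh) is not shown either. A minimal sketch, assuming gzip-compressed dumps under /backups/database as produced above, is to integrity-check the newest archive; a stronger test would restore it into a scratch database.
#!/bin/bash
# /opt/scripts/verify_backups.sh (sketch: checks archive integrity only)
BACKUP_DIR="/backups/database"
ALERT_EMAIL="ops-team@company.com"

# Locate the most recent full backup (GNU find)
latest=$(find "$BACKUP_DIR" -name "full_backup_*.sql.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)

if [ -z "$latest" ]; then
  echo "No full backup found under $BACKUP_DIR" | mail -s "Backup Verification FAILED - $(hostname)" "$ALERT_EMAIL"
  exit 1
fi

# Confirm the gzip archive is readable end to end
if gzip -t "$latest"; then
  logger -t backup_verify "Backup verified: $latest"
else
  echo "Corrupt backup detected: $latest" | mail -s "Backup Verification FAILED - $(hostname)" "$ALERT_EMAIL"
  logger -t backup_verify "Backup verification FAILED: $latest"
  exit 1
fi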
Application and Configuration Backups
# Configuration backup (daily at 1 AM)
0 1 * * * /opt/scripts/backup_configs.sh
# Application code backup (daily at 1:30 AM)
30 1 * * * /opt/scripts/backup_application.sh
# Docker volumes backup (daily at 2:30 AM)
30 2 * * * /opt/scripts/backup_docker_volumes.sh
# Kubernetes manifests backup (daily at 3:30 AM)
30 3 * * * /opt/scripts/backup_k8s_manifests.sh
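Configuration Backup Sketch
As one example of the jobs above, a minimal backup_configs.sh might tar a set of configuration directories and copy the archive off-host; the directory list, bucket, and retention below are assumptions.
#!/bin/bash
# /opt/scripts/backup_configs.sh (sketch: paths, bucket, and retention are examples)
CONFIG_DIRS="/etc/nginx /etc/ssl /opt/app/config"
BACKUP_DIR="/backups/configs"
S3_BUCKET="s3://company-backups/configs"
RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"
archive="$BACKUP_DIR/configs_$TIMESTAMP.tar.gz"

# Archive and compress the configuration directories
tar -czf "$archive" $CONFIG_DIRS

# Copy the archive off-host
aws s3 cp "$archive" "$S3_BUCKET/" || logger -t config_backup "S3 upload failed for $archive"

# Prune local archives past the retention window
find "$BACKUP_DIR" -name "configs_*.tar.gz" -mtime +"$RETENTION_DAYS" -delete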
Log Management and Rotation
Effective log management prevents disk space issues and ensures log availability for troubleshooting and compliance.
Custom Log Rotation
# Application log rotation (daily at midnight)
0 0 * * * /opt/scripts/rotate_app_logs.sh
# Archive old logs (weekly on Sunday)
0 3 * * 0 /opt/scripts/archive_logs.sh
# Clean up archived logs (monthly)
0 4 1 * * /opt/scripts/cleanup_old_logs.sh
# Send logs to centralized logging (every 5 minutes)
*/5 * * * * /opt/scripts/ship_logs.sh
Application Log Rotation Script
#!/bin/bash
# /opt/scripts/rotate_app_logs.sh
LOG_DIRS=(
"/var/log/nginx"
"/var/log/application"
"/var/log/api"
"/var/log/worker"
)
RETENTION_DAYS=30
ARCHIVE_DIR="/var/log/archive"
ELASTICSEARCH_URL="http://elasticsearch:9200"
for log_dir in "${LOG_DIRS[@]}"; do
if [ -d "$log_dir" ]; then
echo "Processing logs in $log_dir"
# Find and process log files
find "$log_dir" -name "*.log" -type f | while read logfile; do
# Get file size in bytes (BSD stat syntax first, then GNU stat as fallback)
file_size=$(stat -f%z "$logfile" 2>/dev/null || stat -c%s "$logfile")
# Rotate if file is larger than 100MB
if [ "$file_size" -gt 104857600 ]; then
timestamp=$(date +%Y%m%d_%H%M%S)
rotated_file="$logfile.$timestamp"
# Rotate the log file
mv "$logfile" "$rotated_file"
# Create new empty log file with proper permissions
touch "$logfile"
chmod 644 "$logfile"
chown www-data:www-data "$logfile"
# Compress rotated file
gzip "$rotated_file"
# Send to Elasticsearch before archiving
if [ -n "$ELASTICSEARCH_URL" ]; then
/opt/scripts/send_to_elasticsearch.sh "$rotated_file.gz"
fi
# Archive compressed file
mkdir -p "$ARCHIVE_DIR/$(basename $log_dir)"
mv "$rotated_file.gz" "$ARCHIVE_DIR/$(basename $log_dir)/"
echo "Rotated: $logfile -> $rotated_file.gz"
logger -t log_rotation "Rotated log file: $logfile"
fi
done
fi
done
# Cleanup old archived logs
find $ARCHIVE_DIR -name "*.gz" -mtime +$RETENTION_DAYS -delete
Centralized Log Shipping
#!/bin/bash
# /opt/scripts/ship_logs.sh
LOGSTASH_HOST="logstash.internal"
LOGSTASH_PORT=5044
LOG_SOURCES=(
"/var/log/nginx/access.log"
"/var/log/application/app.log"
"/var/log/api/api.log"
)
# Ship logs to Logstash
for log_source in "${LOG_SOURCES[@]}"; do
if [ -f "$log_source" ]; then
# Simple shipping example: re-reading the last 100 lines each run can duplicate or miss entries; a shipper with offset tracking (e.g. Filebeat) is more robust for production
tail -n 100 "$log_source" | while read line; do
# Format as JSON for Logstash
json_log=$(echo "$line" | jq -R -s '{
"timestamp": now,
"host": "'$(hostname)'",
"source": "'$log_source'",
"message": .
}')
# Send to Logstash
echo "$json_log" | nc $LOGSTASH_HOST
done
fi
done
Performance Monitoring and Alerting
Proactive performance monitoring helps identify bottlenecks and capacity issues before they impact users.
System Performance Metrics
# Collect system metrics every minute
* * * * * /opt/scripts/collect_metrics.sh
# Generate performance reports (hourly)
0 * * * * /opt/scripts/generate_performance_report.sh
# Database performance analysis (every 15 minutes)
*/15 * * * * /opt/scripts/analyze_db_performance.sh
# Application performance monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_app_performance.sh
Comprehensive Metrics Collection
#!/bin/bash
# /opt/scripts/collect_metrics.sh
METRICS_DIR="/var/metrics"
PROMETHEUS_GATEWAY="http://pushgateway:9091"
TIMESTAMP=$(date +%s)
mkdir -p $METRICS_DIR
# CPU Usage
cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
echo "cpu_usage $cpu_usage $TIMESTAMP" >> $METRICS_DIR/cpu.metrics
# Memory Usage
memory_usage=$(free | grep Mem | awk '{printf "%.2f", $3/$2 * 100.0}')
echo "memory_usage $memory_usage $TIMESTAMP" >> $METRICS_DIR/memory.metrics
# Disk I/O
disk_io=$(iostat -x 1 1 | tail -n +4 | awk '{print $10}' | head -1)
echo "disk_io_util $disk_io $TIMESTAMP" >> $METRICS_DIR/disk.metrics
# Network throughput
network_rx=$(cat /proc/net/dev | grep eth0 | awk '{print $2}')
network_tx=$(cat /proc/net/dev | grep eth0 | awk '{print $10}')
echo "network_rx_bytes $network_rx $TIMESTAMP" >> $METRICS_DIR/network.metrics
echo "network_tx_bytes $network_tx $TIMESTAMP" >> $METRICS_DIR/network.metrics
# Load average
load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
echo "load_average $load_avg $TIMESTAMP" >> $METRICS_DIR/load.metrics
# Push to the Prometheus Pushgateway (pushed metrics may not carry timestamps, so strip them)
cat $METRICS_DIR/*.metrics | awk '{print $1, $2}' | curl -X POST --data-binary @- "$PROMETHEUS_GATEWAY/metrics/job/system_metrics/instance/$(hostname)"
# Cleanup old metrics files
find $METRICS_DIR -name "*.metrics" -mmin +5 -delete
Container and Cloud Integration
Modern DevOps environments require cron integration with containers, orchestrators, and cloud services.
Docker Container Management
# Container health checks (every 2 minutes)
*/2 * * * * /opt/scripts/check_containers.sh
# Container cleanup (daily at 3 AM)
0 3 * * * /opt/scripts/cleanup_containers.sh
# Image updates (daily at 4 AM)
0 4 * * * /opt/scripts/update_images.sh
# Container resource monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_container_resources.sh
Container Health Monitoring
#!/bin/bash
# /opt/scripts/check_containers.sh
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
# Check all containers, including stopped ones, so exited containers are caught
docker ps -a --format '{{.Names}} {{.Status}}' | while read container_info; do
container_name=$(echo "$container_info" | awk '{print $1}')
status=$(echo "$container_info" | awk '{print $2}')
# Check if container is healthy
health_status=$(docker inspect --format='{{.State.Health.Status}}' "$container_name" 2>/dev/null)
if [ "$health_status" == "unhealthy" ] || [ "$status" == "Exited" ]; then
message="🐳 Container Issue: $container_name is $status ($health_status)"
# Send alert
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
# Attempt container restart
echo "Attempting to restart $container_name..."
docker restart "$container_name"
# Log incident
logger -t container_monitor "Container restart attempted: $container_name"
fi
done
# Check Docker daemon health
if ! docker info >/dev/null 2>&1; then
echo "❌ Docker daemon is not responding"
systemctl restart docker
logger -t container_monitor "Docker daemon restarted"
fi
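Container Cleanup Sketch
The daily cleanup job referenced earlier (/opt/scripts/cleanup_containers.sh) can stay very small; this sketch uses docker system prune with an age filter and should be tuned to your retention needs, since -a removes unused (not just dangling) images.
#!/bin/bash
# /opt/scripts/cleanup_containers.sh (sketch: 7-day age filter is an example)

# Remove stopped containers, unused images, networks, and build cache older than 7 days
docker system prune -af --filter "until=168h"

# Unused volumes are excluded from system prune by default; remove them explicitly
# (review first: volume prune semantics vary by Docker version and this deletes data)
docker volume prune -f

logger -t container_cleanup "Docker cleanup completed"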
Kubernetes Integration
# Kubernetes cluster monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_k8s_cluster.sh
# Pod resource cleanup (hourly)
0 * * * * /opt/scripts/cleanup_k8s_resources.sh
# Backup Kubernetes manifests (daily)
0 5 * * * /opt/scripts/backup_k8s_manifests.sh
# Check node health (every 10 minutes)
*/10 * * * * /opt/scripts/check_k8s_nodes.sh
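Node Health Check Sketch
The node check above can be a thin wrapper around kubectl; this sketch assumes kubectl and a kubeconfig with read access to node status are available on the host running cron, and reuses the Slack webhook pattern shown earlier.
#!/bin/bash
# /opt/scripts/check_k8s_nodes.sh (sketch: webhook URL is a placeholder)
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

# List nodes whose STATUS column is not exactly "Ready" (e.g. NotReady, Ready,SchedulingDisabled)
not_ready=$(kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1}')

if [ -n "$not_ready" ]; then
  message="⚠️ Kubernetes nodes not Ready: $(echo $not_ready)"
  curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
  logger -t k8s_monitor "Nodes not Ready: $not_ready"
fi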
Security Scanning and Compliance
Automated security scanning and compliance checks are essential for maintaining secure infrastructure.
Security Scanning Automation
# Daily vulnerability scan at 2 AM
0 2 * * * /opt/scripts/security_scan.sh
# SSL certificate monitoring (daily at 8 AM)
0 8 * * * /opt/scripts/check_ssl_certificates.sh
# Port scan detection (every 30 minutes)
*/30 * * * * /opt/scripts/check_port_scans.sh
# Security log analysis (hourly)
0 * * * * /opt/scripts/analyze_security_logs.sh
# Compliance audit (weekly on Monday at 6 AM)
0 6 * * 1 /opt/scripts/compliance_audit.sh
Vulnerability Scanning Script
#!/bin/bash
# /opt/scripts/security_scan.sh
SCAN_TARGETS=(
"production-web-01.internal"
"production-api-01.internal"
"production-db-01.internal"
)
REPORT_DIR="/var/reports/security"
ALERT_EMAIL="security-team@company.com"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p $REPORT_DIR
echo "Starting security scan at $(date)"
for target in "${SCAN_TARGETS[@]}"; do
echo "Scanning $target..."
# Nmap vulnerability scan (note ${target}: without braces, $target_ would be read as an undefined variable)
nmap -sV --script vuln "$target" > "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt"
# Check for critical vulnerabilities
critical_vulns=$(grep -c "CRITICAL" "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt")
high_vulns=$(grep -c "HIGH" "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt")
if [ "$critical_vulns" -gt 0 ] || [ "$high_vulns" -gt 5 ]; then
# Send immediate alert
echo "CRITICAL: $target has $critical_vulns critical and $high_vulns high vulnerabilities" | mail -s "Security Alert - $target" -A "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt" $ALERT_EMAIL
# Create security incident ticket
curl -X POST https://api.ticketing-system.com/incidents -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "{
\"title\": \"Security Vulnerabilities Found - $target\",
\"description\": \"Critical: $critical_vulns, High: $high_vulns\",
\"priority\": \"high\",
\"category\": \"security\"
}"
fi
done
# Generate security summary report
/opt/scripts/generate_security_report.sh $TIMESTAMP
echo "Security scan completed at $(date)"
Troubleshooting DevOps Cron Jobs
Common Issues and Solutions
Environment Variables
Cron jobs run with minimal environment. Always source your environment variables or use full paths to executables.
# Good: load environment first (cron's default shell is /bin/sh, so use "." rather than the bash-only "source")
0 2 * * * . /etc/environment && /opt/scripts/backup.sh
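Most cron implementations (Vixie cron/cronie) also accept variable assignments at the top of the crontab itself, which avoids per-job sourcing:
# At the top of the crontab: explicit shell, PATH, and alert address
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=ops-team@company.com

0 2 * * * /opt/scripts/backup.sh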
Permission Issues
Ensure scripts have proper permissions and run as the correct user.
# Set proper permissions
chmod +x /opt/scripts/backup.sh
chown ops:ops /opt/scripts/backup.sh
Debugging and Logging
#!/bin/bash
# Enhanced logging template for DevOps scripts
SCRIPT_NAME=$(basename "$0")
LOG_FILE="/var/log/devops/$SCRIPT_NAME.log"
PID_FILE="/var/run/$SCRIPT_NAME.pid"
# Function to log with timestamp
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') [$SCRIPT_NAME] $1" | tee -a "$LOG_FILE"
}
# Function to handle errors
error_exit() {
log "ERROR: $1"
cleanup
exit 1
}
# Function to cleanup on exit
cleanup() {
log "Cleaning up..."
rm -f "$PID_FILE"
}
# Set up signal handlers
trap cleanup EXIT
trap 'error_exit "Script interrupted"' INT TERM
# Check if already running
if [ -f "$PID_FILE" ]; then
if kill -0 $(cat "$PID_FILE") 2>/dev/null; then
error_exit "Script is already running (PID: $(cat $PID_FILE))"
else
rm -f "$PID_FILE"
fi
fi
# Write PID file
echo $$ > "$PID_FILE"
log "Script started"
# Your script logic here
# ...
log "Script completed successfully"
Conclusion
DevOps automation with cron jobs provides a robust foundation for infrastructure management, monitoring, and maintenance. By implementing these patterns and best practices, you can build reliable, scalable automation that enhances your operations and reduces manual intervention.
Remember to always test your cron jobs in staging environments, implement proper logging and monitoring, and have rollback procedures in place for critical automation tasks.
Ready to Implement DevOps Automation?
Start building your automated infrastructure with our interactive cron expression generator.