Cron Jobs for DevOps: Complete Infrastructure Automation Guide
Master infrastructure automation with cron jobs. This comprehensive guide covers monitoring, backups, log management, and enterprise-grade DevOps automation using cron scheduling.
What You'll Learn
- Infrastructure monitoring and health checks
- Automated backup and disaster recovery
- Log rotation and cleanup strategies
- Performance monitoring and alerting
- Container and cloud integration
- Security scanning and compliance
Infrastructure Monitoring with Cron
DevOps teams rely on cron jobs for continuous infrastructure monitoring. These automated checks ensure system health, detect issues early, and maintain service reliability.
System Health Monitoring
Monitor critical system metrics and send alerts when thresholds are exceeded:
# Check disk usage every 5 minutes
*/5 * * * * /opt/scripts/check_disk_usage.sh
# Monitor memory usage every 10 minutes
*/10 * * * * /opt/scripts/check_memory.sh
# Check CPU load every 15 minutes
*/15 * * * * /opt/scripts/check_cpu_load.sh
# Network connectivity check every minute
* * * * * /opt/scripts/check_network.sh
Sample Health Check Script
#!/bin/bash
# /opt/scripts/check_disk_usage.sh
THRESHOLD=85
ALERT_EMAIL="ops-team@company.com"
HOSTNAME=$(hostname)
# Check disk usage for each mounted filesystem
df -h | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{print $5 " " $1}' | while read output;
do
usage=$(echo $output | awk '{print $1}' | sed 's/%//g')
partition=$(echo $output | awk '{print $2}')
if [ $usage -ge $THRESHOLD ]; then
echo "ALERT: Disk usage on $HOSTNAME:$partition is $usage%" | mail -s "Disk Usage Alert - $HOSTNAME" $ALERT_EMAIL
# Log to syslog
logger -t disk_monitor "HIGH DISK USAGE: $partition at $usage%"
# Send to monitoring system (e.g., Prometheus pushgateway)
echo "disk_usage{host="$HOSTNAME",partition="$partition"} $usage" | curl -X POST --data-binary @- http://pushgateway:9091/metrics/job/disk_monitor
fi
done
Service Availability Monitoring
Ensure critical services are running and responsive:
# Check web services every 2 minutes
*/2 * * * * /opt/scripts/check_web_services.sh
# Database connectivity check every 5 minutes
*/5 * * * * /opt/scripts/check_database.sh
# API endpoint health checks
*/3 * * * * /opt/scripts/check_api_health.sh
# SSL certificate expiration check (daily)
0 8 * * * /opt/scripts/check_ssl_certs.sh
Web Service Health Check
#!/bin/bash
# /opt/scripts/check_web_services.sh
SERVICES=(
"https://api.company.com/health"
"https://app.company.com/status"
"https://admin.company.com/ping"
)
TIMEOUT=10
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
for service in "${SERVICES[@]}"; do
response=$(curl -s -o /dev/null -w "%{http_code}" --max-time $TIMEOUT "$service")
if [ "$response" != "200" ]; then
message="🚨 Service DOWN: $service (HTTP $response)"
# Send Slack alert
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
# Log incident
echo "$(date): $message" >> /var/log/service_monitor.log
# Create PagerDuty incident
curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d "{
\"routing_key\": \"YOUR_INTEGRATION_KEY\",
\"event_action\": \"trigger\",
\"payload\": {
\"summary\": \"Service Unavailable: $service\",
\"severity\": \"critical\",
\"source\": \"cron-monitor\"
}
}"
fi
done
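SSL Certificate Expiration Check
The schedule above also references /opt/scripts/check_ssl_certs.sh, which is not shown. The following is a minimal sketch of such a check, assuming a small list of domains, a 14-day warning threshold, and the same alert address used earlier; it reads each certificate's expiry date with openssl s_client.
#!/bin/bash
# /opt/scripts/check_ssl_certs.sh (sketch: domain list, threshold, and alert address are examples)
DOMAINS=(
  "api.company.com"
  "app.company.com"
)
WARN_DAYS=14
ALERT_EMAIL="ops-team@company.com"

for domain in "${DOMAINS[@]}"; do
  # Read the certificate's notAfter date from the live endpoint
  expiry=$(echo | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
  expiry_epoch=$(date -d "$expiry" +%s)
  days_left=$(( (expiry_epoch - $(date +%s)) / 86400 ))
  if [ "$days_left" -lt "$WARN_DAYS" ]; then
    echo "Certificate for $domain expires in $days_left days" | mail -s "SSL Expiry Warning - $domain" "$ALERT_EMAIL"
    logger -t ssl_monitor "Certificate for $domain expires in $days_left days"
  fi
done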
Automated Backup and Disaster Recovery
Reliable backups are critical for DevOps operations. Cron jobs automate backup processes, ensuring data protection without manual intervention.
Database Backup Strategies
# Full database backup (daily at 2 AM)
0 2 * * * /opt/scripts/backup_database.sh full
# Incremental backup every 6 hours
0 */6 * * * /opt/scripts/backup_database.sh incremental
# Transaction log backup every 15 minutes
*/15 * * * * /opt/scripts/backup_database.sh transaction_log
# Backup verification (daily at 3 AM)
0 3 * * * /opt/scripts/verify_backups.sh
Production Database Backup Script
#!/bin/bash
# /opt/scripts/backup_database.sh
BACKUP_TYPE=$1
DB_HOST="production-db.internal"
DB_USER="backup_user"
DB_NAME="production_db"
BACKUP_DIR="/backups/database"
S3_BUCKET="s3://company-backups/database"
RETENTION_DAYS=30
# Create timestamped backup directory
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="$BACKUP_DIR/$TIMESTAMP"
mkdir -p "$BACKUP_PATH"
case $BACKUP_TYPE in
"full")
echo "Starting full database backup..."
# Create full backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME -f "$BACKUP_PATH/full_backup_$TIMESTAMP.sql" --verbose --no-password
# Compress backup
gzip "$BACKUP_PATH/full_backup_$TIMESTAMP.sql"
# Upload to S3
aws s3 cp "$BACKUP_PATH/full_backup_$TIMESTAMP.sql.gz" "$S3_BUCKET/full/" --storage-class STANDARD_IA
# Check the upload result ($? reflects the aws command above, not the dump itself)
if [ $? -eq 0 ]; then
echo "✅ Full backup completed successfully"
logger -t db_backup "Full backup completed: $TIMESTAMP"
else
echo "❌ Full backup failed"
logger -t db_backup "Full backup FAILED: $TIMESTAMP"
exit 1
fi
;;
"incremental")
echo "Starting incremental backup..."
# Copy current WAL segments (a simplification; production setups usually archive WAL via PostgreSQL's archive_command)
rsync -av /var/lib/postgresql/data/pg_wal/ "$BACKUP_PATH/wal_archive/"
# Upload WAL files to S3
aws s3 sync "$BACKUP_PATH/wal_archive/" "$S3_BUCKET/wal/" --delete
;;
esac
BACKUP_STATUS=$?
# Cleanup old local backups (note the escaped \; required by find -exec)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} \;
# Send backup report with the backup's exit status, not the cleanup's
/opt/scripts/send_backup_report.sh "$BACKUP_TYPE" "$TIMESTAMP" "$BACKUP_STATUS"
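Backup Verification Script
The 3 AM verification job referenced earlier (/opt/scripts/verify_backups.sh) is not shown either. A minimal sketch, assuming gzip-compressed dumps under /backups/database as produced above, is to integrity-check the newest archive; a stronger test would restore it into a scratch database.
#!/bin/bash
# /opt/scripts/verify_backups.sh (sketch: checks archive integrity only)
BACKUP_DIR="/backups/database"
ALERT_EMAIL="ops-team@company.com"

# Locate the most recent full backup (GNU find)
latest=$(find "$BACKUP_DIR" -name "full_backup_*.sql.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)

if [ -z "$latest" ]; then
  echo "No full backup found under $BACKUP_DIR" | mail -s "Backup Verification FAILED - $(hostname)" "$ALERT_EMAIL"
  exit 1
fi

# Confirm the gzip archive is readable end to end
if gzip -t "$latest"; then
  logger -t backup_verify "Backup verified: $latest"
else
  echo "Corrupt backup detected: $latest" | mail -s "Backup Verification FAILED - $(hostname)" "$ALERT_EMAIL"
  logger -t backup_verify "Backup verification FAILED: $latest"
  exit 1
fi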
Application and Configuration Backups
# Configuration backup (daily at 1 AM)
0 1 * * * /opt/scripts/backup_configs.sh
# Application code backup (daily at 1:30 AM)
30 1 * * * /opt/scripts/backup_application.sh
# Docker volumes backup (daily at 2:30 AM)
30 2 * * * /opt/scripts/backup_docker_volumes.sh
# Kubernetes manifests backup (daily at 3:30 AM)
30 3 * * * /opt/scripts/backup_k8s_manifests.sh
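Configuration Backup Sketch
As one example of the jobs above, a minimal backup_configs.sh might tar a set of configuration directories and copy the archive off-host; the directory list, bucket, and retention below are assumptions.
#!/bin/bash
# /opt/scripts/backup_configs.sh (sketch: paths, bucket, and retention are examples)
CONFIG_DIRS="/etc/nginx /etc/ssl /opt/app/config"
BACKUP_DIR="/backups/configs"
S3_BUCKET="s3://company-backups/configs"
RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"
archive="$BACKUP_DIR/configs_$TIMESTAMP.tar.gz"

# Archive and compress the configuration directories
tar -czf "$archive" $CONFIG_DIRS

# Copy the archive off-host
aws s3 cp "$archive" "$S3_BUCKET/" || logger -t config_backup "S3 upload failed for $archive"

# Prune local archives past the retention window
find "$BACKUP_DIR" -name "configs_*.tar.gz" -mtime +"$RETENTION_DAYS" -delete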
Log Management and Rotation
Effective log management prevents disk space issues and ensures log availability for troubleshooting and compliance.
Custom Log Rotation
# Application log rotation (daily at midnight)
0 0 * * * /opt/scripts/rotate_app_logs.sh
# Archive old logs (weekly on Sunday)
0 3 * * 0 /opt/scripts/archive_logs.sh
# Clean up archived logs (monthly)
0 4 1 * * /opt/scripts/cleanup_old_logs.sh
# Send logs to centralized logging (every 5 minutes)
*/5 * * * * /opt/scripts/ship_logs.sh
Application Log Rotation Script
#!/bin/bash
# /opt/scripts/rotate_app_logs.sh
LOG_DIRS=(
"/var/log/nginx"
"/var/log/application"
"/var/log/api"
"/var/log/worker"
)
RETENTION_DAYS=30
ARCHIVE_DIR="/var/log/archive"
ELASTICSEARCH_URL="http://elasticsearch:9200"
for log_dir in "${LOG_DIRS[@]}"; do
if [ -d "$log_dir" ]; then
echo "Processing logs in $log_dir"
# Find and process log files
find "$log_dir" -name "*.log" -type f | while read logfile; do
# Get file size in bytes (BSD stat syntax first, then GNU stat as fallback)
file_size=$(stat -f%z "$logfile" 2>/dev/null || stat -c%s "$logfile")
# Rotate if file is larger than 100MB
if [ "$file_size" -gt 104857600 ]; then
timestamp=$(date +%Y%m%d_%H%M%S)
rotated_file="$logfile.$timestamp"
# Rotate the log file
mv "$logfile" "$rotated_file"
# Create new empty log file with proper permissions
touch "$logfile"
chmod 644 "$logfile"
chown www-data:www-data "$logfile"
# Compress rotated file
gzip "$rotated_file"
# Send to Elasticsearch before archiving
if [ -n "$ELASTICSEARCH_URL" ]; then
/opt/scripts/send_to_elasticsearch.sh "$rotated_file.gz"
fi
# Archive compressed file
mkdir -p "$ARCHIVE_DIR/$(basename $log_dir)"
mv "$rotated_file.gz" "$ARCHIVE_DIR/$(basename $log_dir)/"
echo "Rotated: $logfile -> $rotated_file.gz"
logger -t log_rotation "Rotated log file: $logfile"
fi
done
fi
done
# Cleanup old archived logs
find $ARCHIVE_DIR -name "*.gz" -mtime +$RETENTION_DAYS -delete
Centralized Log Shipping
#!/bin/bash
# /opt/scripts/ship_logs.sh
LOGSTASH_HOST="logstash.internal"
LOGSTASH_PORT=5044
LOG_SOURCES=(
"/var/log/nginx/access.log"
"/var/log/application/app.log"
"/var/log/api/api.log"
)
# Ship logs to Logstash
for log_source in "${LOG_SOURCES[@]}"; do
if [ -f "$log_source" ]; then
# Simple shipping example: re-reading the last 100 lines each run can duplicate or miss entries; a shipper with offset tracking (e.g. Filebeat) is more robust for production
tail -n 100 "$log_source" | while read line; do
# Format as JSON for Logstash
json_log=$(echo "$line" | jq -R -s '{
"timestamp": now,
"host": "'$(hostname)'",
"source": "'$log_source'",
"message": .
}')
# Send to Logstash
echo "$json_log" | nc $LOGSTASH_HOST
done
fi
done
Performance Monitoring and Alerting
Proactive performance monitoring helps identify bottlenecks and capacity issues before they impact users.
System Performance Metrics
# Collect system metrics every minute
* * * * * /opt/scripts/collect_metrics.sh
# Generate performance reports (hourly)
0 * * * * /opt/scripts/generate_performance_report.sh
# Database performance analysis (every 15 minutes)
*/15 * * * * /opt/scripts/analyze_db_performance.sh
# Application performance monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_app_performance.sh
Comprehensive Metrics Collection
#!/bin/bash
# /opt/scripts/collect_metrics.sh
METRICS_DIR="/var/metrics"
PROMETHEUS_GATEWAY="http://pushgateway:9091"
TIMESTAMP=$(date +%s)
mkdir -p $METRICS_DIR
# CPU Usage
cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
echo "cpu_usage $cpu_usage $TIMESTAMP" >> $METRICS_DIR/cpu.metrics
# Memory Usage
memory_usage=$(free | grep Mem | awk '{printf "%.2f", $3/$2 * 100.0}')
echo "memory_usage $memory_usage $TIMESTAMP" >> $METRICS_DIR/memory.metrics
# Disk I/O
disk_io=$(iostat -x 1 1 | tail -n +4 | awk '{print $10}' | head -1)
echo "disk_io_util $disk_io $TIMESTAMP" >> $METRICS_DIR/disk.metrics
# Network throughput
network_rx=$(cat /proc/net/dev | grep eth0 | awk '{print $2}')
network_tx=$(cat /proc/net/dev | grep eth0 | awk '{print $10}')
echo "network_rx_bytes $network_rx $TIMESTAMP" >> $METRICS_DIR/network.metrics
echo "network_tx_bytes $network_tx $TIMESTAMP" >> $METRICS_DIR/network.metrics
# Load average
load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
echo "load_average $load_avg $TIMESTAMP" >> $METRICS_DIR/load.metrics
# Push to the Prometheus Pushgateway (pushed metrics may not carry timestamps, so strip them)
cat $METRICS_DIR/*.metrics | awk '{print $1, $2}' | curl -X POST --data-binary @- "$PROMETHEUS_GATEWAY/metrics/job/system_metrics/instance/$(hostname)"
# Cleanup old metrics files
find $METRICS_DIR -name "*.metrics" -mmin +5 -delete
Container and Cloud Integration
Modern DevOps environments require cron integration with containers, orchestrators, and cloud services.
Docker Container Management
# Container health checks (every 2 minutes)
*/2 * * * * /opt/scripts/check_containers.sh
# Container cleanup (daily at 3 AM)
0 3 * * * /opt/scripts/cleanup_containers.sh
# Image updates (daily at 4 AM)
0 4 * * * /opt/scripts/update_images.sh
# Container resource monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_container_resources.sh
Container Health Monitoring
#!/bin/bash
# /opt/scripts/check_containers.sh
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
# Check all containers, including stopped ones, so exited containers are caught
docker ps -a --format '{{.Names}} {{.Status}}' | while read container_info; do
container_name=$(echo "$container_info" | awk '{print $1}')
status=$(echo "$container_info" | awk '{print $2}')
# Check if container is healthy
health_status=$(docker inspect --format='{{.State.Health.Status}}' "$container_name" 2>/dev/null)
if [ "$health_status" == "unhealthy" ] || [ "$status" == "Exited" ]; then
message="🐳 Container Issue: $container_name is $status ($health_status)"
# Send alert
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
# Attempt container restart
echo "Attempting to restart $container_name..."
docker restart "$container_name"
# Log incident
logger -t container_monitor "Container restart attempted: $container_name"
fi
done
# Check Docker daemon health
if ! docker info >/dev/null 2>&1; then
echo "❌ Docker daemon is not responding"
systemctl restart docker
logger -t container_monitor "Docker daemon restarted"
fi
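Container Cleanup Sketch
The daily cleanup job referenced earlier (/opt/scripts/cleanup_containers.sh) can stay very small; this sketch uses docker system prune with an age filter and should be tuned to your retention needs, since -a removes unused (not just dangling) images.
#!/bin/bash
# /opt/scripts/cleanup_containers.sh (sketch: 7-day age filter is an example)

# Remove stopped containers, unused images, networks, and build cache older than 7 days
docker system prune -af --filter "until=168h"

# Unused volumes are excluded from system prune by default; remove them explicitly
# (review first: volume prune semantics vary by Docker version and this deletes data)
docker volume prune -f

logger -t container_cleanup "Docker cleanup completed"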
Kubernetes Integration
# Kubernetes cluster monitoring (every 5 minutes)
*/5 * * * * /opt/scripts/monitor_k8s_cluster.sh
# Pod resource cleanup (hourly)
0 * * * * /opt/scripts/cleanup_k8s_resources.sh
# Backup Kubernetes manifests (daily)
0 5 * * * /opt/scripts/backup_k8s_manifests.sh
# Check node health (every 10 minutes)
*/10 * * * * /opt/scripts/check_k8s_nodes.sh
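Node Health Check Sketch
The node check above can be a thin wrapper around kubectl; this sketch assumes kubectl and a kubeconfig with read access to node status are available on the host running cron, and reuses the Slack webhook pattern shown earlier.
#!/bin/bash
# /opt/scripts/check_k8s_nodes.sh (sketch: webhook URL is a placeholder)
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

# List nodes whose STATUS column is not exactly "Ready" (e.g. NotReady, Ready,SchedulingDisabled)
not_ready=$(kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1}')

if [ -n "$not_ready" ]; then
  message="⚠️ Kubernetes nodes not Ready: $(echo $not_ready)"
  curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$message\"}" "$ALERT_WEBHOOK"
  logger -t k8s_monitor "Nodes not Ready: $not_ready"
fi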
Security Scanning and Compliance
Automated security scanning and compliance checks are essential for maintaining secure infrastructure.
Security Scanning Automation
# Daily vulnerability scan at 2 AM
0 2 * * * /opt/scripts/security_scan.sh
# SSL certificate monitoring (daily at 8 AM)
0 8 * * * /opt/scripts/check_ssl_certificates.sh
# Port scan detection (every 30 minutes)
*/30 * * * * /opt/scripts/check_port_scans.sh
# Security log analysis (hourly)
0 * * * * /opt/scripts/analyze_security_logs.sh
# Compliance audit (weekly on Monday at 6 AM)
0 6 * * 1 /opt/scripts/compliance_audit.sh
Vulnerability Scanning Script
#!/bin/bash
# /opt/scripts/security_scan.sh
SCAN_TARGETS=(
"production-web-01.internal"
"production-api-01.internal"
"production-db-01.internal"
)
REPORT_DIR="/var/reports/security"
ALERT_EMAIL="security-team@company.com"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p $REPORT_DIR
echo "Starting security scan at $(date)"
for target in "${SCAN_TARGETS[@]}"; do
echo "Scanning $target..."
# Nmap vulnerability scan (note ${target}: without braces, $target_ would be read as an undefined variable)
nmap -sV --script vuln "$target" > "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt"
# Check for critical vulnerabilities
critical_vulns=$(grep -c "CRITICAL" "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt")
high_vulns=$(grep -c "HIGH" "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt")
if [ "$critical_vulns" -gt 0 ] || [ "$high_vulns" -gt 5 ]; then
# Send immediate alert
echo "CRITICAL: $target has $critical_vulns critical and $high_vulns high vulnerabilities" | mail -s "Security Alert - $target" -A "$REPORT_DIR/vuln_scan_${target}_$TIMESTAMP.txt" $ALERT_EMAIL
# Create security incident ticket
curl -X POST https://api.ticketing-system.com/incidents -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "{
\"title\": \"Security Vulnerabilities Found - $target\",
\"description\": \"Critical: $critical_vulns, High: $high_vulns\",
\"priority\": \"high\",
\"category\": \"security\"
}"
fi
done
# Generate security summary report
/opt/scripts/generate_security_report.sh $TIMESTAMP
echo "Security scan completed at $(date)"
Troubleshooting DevOps Cron Jobs
Common Issues and Solutions
Environment Variables
Cron jobs run with minimal environment. Always source your environment variables or use full paths to executables.
# Good: load environment first (cron's default shell is /bin/sh, so use "." rather than the bash-only "source")
0 2 * * * . /etc/environment && /opt/scripts/backup.sh
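Most cron implementations (Vixie cron/cronie) also accept variable assignments at the top of the crontab itself, which avoids per-job sourcing:
# At the top of the crontab: explicit shell, PATH, and alert address
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=ops-team@company.com

0 2 * * * /opt/scripts/backup.sh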
Permission Issues
Ensure scripts have proper permissions and run as the correct user.
# Set proper permissions
chmod +x /opt/scripts/backup.sh
chown ops:ops /opt/scripts/backup.sh
Debugging and Logging
#!/bin/bash
# Enhanced logging template for DevOps scripts
SCRIPT_NAME=$(basename "$0")
LOG_FILE="/var/log/devops/$SCRIPT_NAME.log"
PID_FILE="/var/run/$SCRIPT_NAME.pid"
# Function to log with timestamp
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') [$SCRIPT_NAME] $1" | tee -a "$LOG_FILE"
}
# Function to handle errors
error_exit() {
log "ERROR: $1"
cleanup
exit 1
}
# Function to cleanup on exit
cleanup() {
log "Cleaning up..."
rm -f "$PID_FILE"
}
# Set up signal handlers
trap cleanup EXIT
trap 'error_exit "Script interrupted"' INT TERM
# Check if already running
if [ -f "$PID_FILE" ]; then
if kill -0 $(cat "$PID_FILE") 2>/dev/null; then
error_exit "Script is already running (PID: $(cat $PID_FILE))"
else
rm -f "$PID_FILE"
fi
fi
# Write PID file
echo $$ > "$PID_FILE"
log "Script started"
# Your script logic here
# ...
log "Script completed successfully"
Conclusion
DevOps automation with cron jobs provides a robust foundation for infrastructure management, monitoring, and maintenance. By implementing these patterns and best practices, you can build reliable, scalable automation that enhances your operations and reduces manual intervention.
Remember to always test your cron jobs in staging environments, implement proper logging and monitoring, and have rollback procedures in place for critical automation tasks.
Ready to Implement DevOps Automation?
Start building your automated infrastructure with our interactive cron expression generator.