System Design FAQ: Top Questions

47. How would you design a Cron Scheduler System like Airflow or Kubernetes CronJobs?

A Cron Scheduler runs jobs at recurring intervals, e.g., every 5 minutes or at midnight UTC. It is commonly used for ETL pipelines, batch jobs, reporting, and maintenance tasks.

📋 Functional Requirements

  • Register recurring jobs with cron expressions
  • Trigger jobs accurately and reliably
  • Track job history, retries, and status
  • Prevent duplicate execution in distributed settings

📦 Non-Functional Requirements

  • High availability
  • A defined execution guarantee: at-least-once with idempotent jobs, or effectively exactly-once through deduplication
  • Alerting and observability

🏗️ Core Components

  • Scheduler: Parses cron expressions and emits a trigger when a job is due
  • Executor: Runs the job in a Docker container, Kubernetes pod, or VM
  • Metadata Store: Holds job config, logs, and run state
  • Lock Manager: Ensures each scheduled run of a job executes only once
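
As a rough illustration of how these components could interact, here is a minimal single-process sketch. The names `InMemoryLockManager` and `tick`, and the `is_due` and `execute` callables, are hypothetical stand-ins rather than part of any real scheduler; a production system would back the lock and the history with shared storage.

```python
from datetime import datetime


class InMemoryLockManager:
    """Stand-in for the Lock Manager; a real one would use Redis or a database."""

    def __init__(self):
        self._held = set()

    def acquire(self, key: str) -> bool:
        if key in self._held:
            return False
        self._held.add(key)
        return True


def tick(now: datetime, jobs, is_due, locks, execute, history):
    """One scheduler pass: trigger each due job at most once per minute slot."""
    slot = now.replace(second=0, microsecond=0)
    triggered = []
    for job_id, job in jobs.items():
        if not is_due(job["cron_expr"], slot):
            continue
        # Lock on (job, slot) so a second scheduler replica skips this run.
        if not locks.acquire(f"{job_id}:{slot.isoformat()}"):
            continue
        status = "success"
        try:
            execute(job["command"])  # Executor: run the job
        except Exception:
            status = "failed"
        # Metadata Store: record the outcome of this run.
        history.append({"job_id": job_id, "slot": slot, "status": status})
        triggered.append(job_id)
    return triggered
```

Keying the lock on the job and the minute slot, rather than on the job alone, lets the next scheduled run proceed even if an old lock entry has not been cleaned up yet.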

⏰ Cron Expression Example


# Run every hour at minute 0
0 * * * * /scripts/export.sh
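
To make the five time fields concrete, the following sketch checks whether a crontab-style line is due at a given minute. It is a simplified illustration, not a full cron implementation: it handles only `*`, single values, ranges, lists, and `/step`, and it requires every field to match (classic cron treats day-of-month and day-of-week as alternatives when both are restricted). The helper names `parse_field` and `matches` are hypothetical.

```python
from datetime import datetime

# Allowed range for each field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field into the set of integers it matches."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # "5/15" means "from 5 up to the maximum, every 15"; a bare "5" is just 5.
            end = hi if step else start
        if not (lo <= start <= end <= hi):
            raise ValueError(f"field {field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step_n))
    return values


def matches(line: str, when: datetime) -> bool:
    """Return True if the schedule on a crontab-style line is due at `when`."""
    fields = line.split(None, 5)[:5]  # ignore a trailing command, if present
    if len(fields) != 5:
        raise ValueError("expected five time fields")
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    cron_dow = when.isoweekday() % 7  # cron uses 0 for Sunday
    return (
        when.minute in minute
        and when.hour in hour
        and when.day in dom
        and when.month in month
        and cron_dow in dow
    )
```

For the line above, `matches("0 * * * * /scripts/export.sh", t)` is true only when `t.minute` is 0.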
        

🗄️ PostgreSQL Schema Example


CREATE TABLE cron_jobs (
  id UUID PRIMARY KEY,
  name TEXT NOT NULL,
  cron_expr TEXT NOT NULL,
  command TEXT NOT NULL,
  last_run TIMESTAMPTZ,   -- NULL until the first run; stored with time zone
  status TEXT,
  retry_policy JSONB
);
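
The schema leaves the shape of `retry_policy` open. One possible interpretation, shown purely as an assumption, is exponential backoff with a cap. The keys `max_attempts`, `base_delay_seconds`, and `max_delay_seconds` and the function `next_retry_delay` are hypothetical and do not come from the schema itself.

```python
import json


def next_retry_delay(retry_policy_json: str, attempt: int):
    """Return seconds to wait before retry number `attempt` (1-based), or None to give up."""
    policy = json.loads(retry_policy_json)
    max_attempts = policy.get("max_attempts", 3)       # assumed key
    base = policy.get("base_delay_seconds", 30)        # assumed key
    cap = policy.get("max_delay_seconds", 3600)        # assumed key
    if attempt > max_attempts:
        return None
    # Double the delay on each attempt, but never exceed the cap.
    return min(cap, base * 2 ** (attempt - 1))
```

Capping the delay keeps a long-failing job from drifting hours past its schedule, while the attempt limit gives alerting a clear point at which to fire.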
        

🔒 Locking with Redis SET NX


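import redis

# Reuse one client (and its connection pool) across calls.
r = redis.Redis()

def acquire_lock(job_id, ttl_seconds=300):
    # SET with NX and EX is atomic: the key is created only if absent, and it
    # expires after ttl_seconds so a crashed worker cannot hold the lock forever.
    return bool(r.set(f"lock:{job_id}", "1", nx=True, ex=ttl_seconds))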
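
Because the lock above stores a constant value, a worker whose lock has already expired could delete a lock now held by someone else. A common refinement, sketched here under the assumption that `client` is a redis-py style client exposing `set` and `eval`, is to store a unique token and release only when the token still matches. The `release_lock` helper is hypothetical and not part of the original example.

```python
import uuid

# Lua script: delete the key only if it still holds our token (runs atomically in Redis).
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def acquire_lock(client, job_id, ttl_seconds=300):
    """Return a unique token if the lock was acquired, otherwise None."""
    token = uuid.uuid4().hex
    if client.set(f"lock:{job_id}", token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, job_id, token):
    """Release the lock only if we still own it; return True on success."""
    return client.eval(RELEASE_SCRIPT, 1, f"lock:{job_id}", token) == 1
```

The TTL should comfortably exceed the job's expected runtime; otherwise the lock can expire mid-run and a second worker may start the same job.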
        

⚙️ Airflow-style DAG Config (Python)


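from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# The `schedule` argument requires Airflow 2.4+; older versions use `schedule_interval`.
# catchup=False avoids backfilling every daily run since start_date on first deploy.
with DAG("daily_etl", schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False) as dag:
    t1 = BashOperator(task_id="extract", bash_command="python extract.py")
    t2 = BashOperator(task_id="transform", bash_command="python transform.py")
    t3 = BashOperator(task_id="load", bash_command="python load.py")
    # Run the tasks strictly in order: extract, then transform, then load.
    t1 >> t2 >> t3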
        

📈 Observability

  • Success/failure rate over time
  • Average execution duration
  • Missed or overlapping runs
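
These metrics can be derived from stored run history. The sketch below assumes each run is a dict with `start`, `end`, and `status` fields, sorted by start time, and that the job runs at a fixed interval; the function name `summarize_runs` and the record shape are illustrative assumptions.

```python
from datetime import timedelta


def summarize_runs(runs, expected_interval: timedelta):
    """Compute basic health metrics for one job from its run history."""
    total = len(runs)
    successes = sum(1 for r in runs if r["status"] == "success")
    durations = [(r["end"] - r["start"]).total_seconds() for r in runs]
    missed = 0
    overlapping = 0
    for prev, cur in zip(runs, runs[1:]):
        gap = cur["start"] - prev["start"]
        # A gap of about two intervals means one scheduled run never started.
        missed += max(0, round(gap / expected_interval) - 1)
        # A run that starts before the previous one finished overlaps it.
        if cur["start"] < prev["end"]:
            overlapping += 1
    return {
        "success_rate": successes / total if total else None,
        "avg_duration_seconds": sum(durations) / total if total else None,
        "missed_runs": missed,
        "overlapping_runs": overlapping,
    }
```

In practice these values would typically be exported as metrics and alerted on, for example when `missed_runs` is non-zero.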

🧰 Tools/Infra Used

  • Scheduler: Quartz (Java), croniter (Python, cron expression iteration), Kubernetes CronJob (native)
  • Queue and execution: Celery, RabbitMQ, Kubernetes Jobs
  • Logs and metrics: ELK stack (logs), Prometheus + Grafana (metrics)

📌 Final Insight

Cron scheduling must balance timing accuracy with job safety. A reliable lock mechanism prevents duplicate runs when several scheduler instances are active, and persisted metadata makes job state recoverable and auditable. Both matter most in distributed environments.