Airflow 3 in Practice, Part 0: Why, and When to Use It
From the problems workflow orchestration solves, to the core changes in Airflow 3, to the roadmap for this 12-part practical series.
This is Part 0 of the Airflow 3 in Practice series. Before dissecting the architecture or setting up a cluster, let's step back and answer a more basic question: why Airflow, and when does it actually make sense to use it? We'll take it slowly so that readers encountering workflow orchestration for the first time can follow all the way through. The internal structure is the subject of the next part, Part 1: Architecture Dissection.
The limits of getting by with cron and scripts
When you have two or three data pipelines, a single line of cron is enough: "Run the extract script at 3 a.m. daily, run the load script at 4 a.m." The trouble starts the moment your pipelines multiply and dependencies emerge between steps.
Once a bundle of cron jobs or shell scripts grows past a certain scale, the following three issues almost always trip you up.
- Dependencies can't be expressed. cron doesn't know "run the transform only after the extract finishes." So people paper over it with timing: "extract usually takes 30 minutes, so start transform 30 minutes later." On a day when extract takes 35 minutes, transform reads incomplete data.
- There's no retry or failure handling. What if a script dies midway? cron simply stays silent until the next schedule. You have to hand-code all the retry logic, backoff, and partial reruns inside the scripts themselves.
- There's no observability. To see how far last night's pipeline got, how long each step took, or why it failed, you have to SSH into the server and dig through logs. Answering "why was last Tuesday's load empty?" is practically impossible.
Put side by side, the three deficiencies come down to one difference.
cron only knows "when to run." An orchestrator knows "what depends on what, what to do on failure, and how far things have gotten right now."
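To make the second deficiency concrete, here is a minimal sketch of the retry-with-backoff wrapper you end up writing by hand around every script when cron is all you have. The names `run_with_retries` and `flaky_extract` are hypothetical, and the `sleep` parameter exists only so the example can record the waits instead of actually pausing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `step` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical flaky step: fails twice, then succeeds on the third call.
calls = {"n": 0}


def flaky_extract() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("source not ready")
    return "ok"


waits: list[float] = []  # record the backoff delays instead of sleeping
result = run_with_retries(flaky_extract, max_attempts=5, sleep=waits.append)
print(result, waits)
```

Even this small wrapper says nothing about alerting, partial reruns, or where the attempt history is stored, which is the part an orchestrator takes off your hands.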
So, the problems an orchestrator solves
A workflow orchestrator represents jobs as a Directed Acyclic Graph (DAG). Nodes are things to do (tasks), and edges are dependencies — "this must finish before that runs." Looking at this graph, the orchestrator automatically handles the following.
- Scheduling tasks in dependency order (running in parallel what can run in parallel)
- Retries, backoff, and alerts on failure
- Centralized recording and visualization of run history, logs, and elapsed time
- Rerunning past intervals (backfill), manual triggering, and pausing
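The first item, dependency-ordered scheduling, can be sketched with nothing but Python's standard library. This is an illustration of the idea using `graphlib.TopologicalSorter`, not how Airflow's scheduler is implemented; the task names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The two extract tasks land in the same wave because neither depends on the other, so they can run in parallel, while `transform` waits for both. That is exactly the relationship cron cannot express.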
Airflow is the most widely used open-source tool in this space, and its defining trait is that you define DAGs as Python code. Because it's code rather than YAML or a GUI, version control, testing, and code review all work out of the box.
When Airflow is a good fit / when it isn't
No tool is a cure-all. Here is an honest line between where Airflow shines and where it doesn't.
Good fit
- Batch-centric ETL/ELT, data warehouse loading, periodic report generation
- Pipelines with many interdependent steps (dozens to thousands of tasks)
- Work that ties together heterogeneous systems (DB → S3 → Spark → BI)
- Operations where schedule + backfill + rerun history matter
Poor fit
- Millisecond-level low-latency streaming (that's the domain of Kafka/Flink)
- Request-response real-time API processing
- Processing thousands of ultra-lightweight events per second (the scheduler overhead is a burden)
Airflow is a "batch orchestrator," not a "streaming engine." The moment you try to cram real-time processing into Airflow is usually a sign of the wrong tool choice.
A brief comparison with the alternatives
There are several good alternatives in the orchestrator ecosystem. It's better to understand them as "differences in disposition" than as an absolute ranking.
| Tool | Definition style | Strengths | Where it tends to fit |
|---|---|---|---|
| Airflow | Python DAG | Widest ecosystem and integrations (providers), mature operational tooling | General-purpose batch ETL, complex dependencies, large-scale scheduling |
| Prefect | Python function decorators | Dynamic workflows, lightweight developer experience | Python-centric teams, dynamic/event-driven flows |
| Dagster | Asset-centric (software-defined assets) | First-class support for data assets, types, and tests | Teams looking to model data asset lineage and quality |
| Argo Workflows | Kubernetes CRD (YAML) | Container-native, K8s-friendly | Container-level pipelines on top of K8s |
Interestingly, Airflow 3 introduces the Asset-based scheduling we'll see later, absorbing a good deal of the "data-asset-centric" thinking that Dagster emphasized.
Airflow 2.x → 3.x: what changed
This entire series is based on Airflow 3.0/3.x (GA in 2025). If you've used 2.x, learning what changed first will make the rest of the series much smoother. Distilled to the essentials, the changes are as follows.
| Area | 2.x | 3.x | Meaning |
|---|---|---|---|
| Web component | Webserver | API server (UI + REST API unified) | A single component provides the UI and a stable, versioned REST API |
| DAG parsing | Inside the scheduler | DAG processor separated | DAG parsing isolated into an independent process (stability and security) |
| Task execution | Worker connects directly to the metadata DB | Task SDK + Task Execution API | Workers communicate through the API server → remote and language-agnostic execution |
| Data-aware scheduling | Dataset | Asset (@asset, schedule=[asset]) | Cleaned-up terminology and features, asset-centric scheduling |
| Execution tracking | Weak notion of versioning | DAG versioning | Track in the UI which version of a DAG a run executed with |
| Backfill | Mostly CLI | Scheduler-managed backfill (UI/API) | Trigger from the UI/API and the scheduler carries it out |
| catchup default | True | False | Prevents the accident of a new DAG running a flood of past intervals |
| Time field | execution_date | logical_date (or data_interval_*) | execution_date removed; logical_date can be None on manual/asset triggers |
| Removed items | SubDAG, SLA, old experimental REST API | Replaced by TaskGroup/Asset, Deadline, stable REST API | Cleanup of confusing legacy |
| Executor | Local/Celery/Kubernetes | + EdgeExecutor, plus multiple executors at once (first available in 2.10) | HTTP-based remote/edge workers, multiple executors running side by side |
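To see why the catchup default matters, here is a minimal sketch that counts the runs a newly deployed daily DAG would create under each setting. It is a simplified model resting on a stated assumption (an interval becomes runnable once it has fully ended), not Airflow's actual scheduler logic, and `pending_daily_intervals` is a hypothetical helper.

```python
from datetime import date, timedelta


def pending_daily_intervals(start: date, today: date, catchup: bool) -> list[date]:
    """Return the start dates of the daily intervals a new DAG would run.

    Simplified model (an assumption, not Airflow's scheduler logic): an
    interval is runnable once it has fully ended, i.e. it starts before `today`.
    """
    completed = [start + timedelta(days=n) for n in range((today - start).days)]
    if catchup:
        return completed  # every missed interval since `start`
    return completed[-1:]  # only the most recent completed interval


# Hypothetical DAG deployed on 2025-04-01 with a start date of 2025-01-01.
with_catchup = pending_daily_intervals(date(2025, 1, 1), date(2025, 4, 1), catchup=True)
without_catchup = pending_daily_intervals(date(2025, 1, 1), date(2025, 4, 1), catchup=False)
print(len(with_catchup), len(without_catchup))
```

Under this model, a start date three months in the past means ninety queued runs with catchup enabled and a single run without it, which is the accident the new default is meant to prevent.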
Import paths changed too. In 3.x, airflow.sdk is the recommended entry point.

    from airflow.sdk import dag, task, Asset


    @dag(schedule="@daily", catchup=False)  # in 3.x, the catchup default is False
    def daily_etl():
        @task
        def extract():
            return {"rows": 1000}  # example value

        @task
        def load(payload: dict):
            print(f"Number of rows loaded (example): {payload['rows']}")

        load(extract())


    daily_etl()

Why this code looks the way it does and what happens internally is covered in depth in Part 4: The Right Way to Author DAGs. For now, it's enough to remember the one structural change: "workers no longer attach directly to the metadata DB but go through the Task Execution API."
What you'll learn in this series (roadmap)
The series follows the order "why → structure → setup → authoring → integration → operations."
You can jump directly to each part from the table of contents below.
| Part | Title | One-line gist |
|---|---|---|
| 0 | (this article) Series overview & when to use it | Defining the orchestration problem and an overview of the 3.x changes |
| 1 | Architecture Dissection | How the Scheduler, API server, DAG processor, Triggerer, and Worker mesh together |
| 2 | Setting Up as a Cluster | Going beyond a single node to deploy components separately |
| 3 | Configuration & Optimization | Tuning the three concurrency layers, Pools, and resource isolation |
| 4 | The Right Way to Author DAGs | Task SDK, decorators, TaskGroup, best practices |
| 5 | Advanced DAG Techniques | Scripts, parameters, error handling, PostgreSQL, reruns, date references |
| 6 | Scheduling & Asset | cron schedules and asset-driven scheduling |
| 7 | XCom & Passing Data | Passing data between tasks and its limits |
| 8 | Integrating External Systems & Synchronous Calls | Connections, Hooks, providers, and the deferrable pattern |
| 9 | REST API & Remote Schedule Changes | Remote triggering and control via the JWT-authenticated REST API |
| 10 | Monitoring & Operations | Metrics, alerts, logs, and the SLA replacement (Deadline) |
| 11 | Testing, CI/CD, Security | DAG testing, pipelines, and managing permissions and secrets |
| 12 | Production Checklist | Everything to check before going into production |
Wrapping up
The essence of orchestration is filling cron's three deficiencies: dependencies, retries, and observability. Airflow is a mature tool for expressing all of that as Python code, and with 3.x its structure has become cleaner thanks to the API server/DAG processor separation, the Task Execution API, and Asset scheduling. That said, for work of a different nature, such as real-time streaming, a different tool is the right choice.
The starting point for choosing a tool is the question, "Is my workload a batch dependency graph?" If so, Airflow is almost always a candidate.
In the next part, we'll take apart how the components we just skimmed in the table actually mesh together and run.
➡️ Next part: Part 1 — Airflow 3 Architecture Dissection