Airflow 3 Testing, CI/CD & Security in Practice
From DAG testing and CI/CD pipelines to RBAC, JWT, Secrets Backend, and DAG processor isolation — how to run Airflow 3 safely.
Writing a good DAG and running that DAG safely and reproducibly are two different problems. Once the era of hand-copying files onto a server is over, someone will inevitably ask questions like "Was this DAG validated before deployment?", "Does the worker hold the metadata DB password?", or "Who is allowed to trigger this pipeline?". This article answers those three questions — testing, CI/CD, and security — from an Airflow 3 perspective.
This is Part 11 of the Airflow 3 in Practice series. If the previous part, Part 10: Monitoring & Operations, covered "how to observe a running pipeline," this part covers "how to safely get it there." The next part, Part 12: Production Best-Practice Checklist, ties everything together on a single page.
Airflow 3 reorganized its security model alongside the changes to its component architecture. Two changes matter most for the security section: the DAG processor now runs as its own process, and workers no longer connect directly to the metadata DB (they go through the Task Execution API instead). For the architecture in detail, see Part 1: Anatomy of the Architecture.
1. DAG Testing — What to Validate, and in What Order
DAG testing is not about grand infrastructure; it's about stacking up three layers. As you go down, each layer gets slower and more expensive, but your confidence grows.
| Layer | What it validates | Speed | Tools |
|---|---|---|---|
| 1. Import check | Does the DAG file parse without errors? | Seconds | python, pytest |
| 2. Unit tests | Is the logic of the task functions correct? | A few seconds | pytest |
| 3. Integration run | Does one DAG run actually complete end to end? | Seconds to minutes | dag.test(), airflow dags test |
1.1 Import Error Checks — the Cheapest Net That Catches the Most
The most common reason a DAG breaks in production is not a logic bug but an import failure. A single typo, a missing dependency, or a wrong module path can make a DAG vanish entirely. In Airflow 3, the DAG processor handles parsing, so a parse failure means "that DAG is invisible to the scheduler."
This is the very first check you should run in CI. The approach is to gather all DAG files and load them into a DagBag, then fail if there is even one import error.

    # tests/test_dag_integrity.py
    import pytest
    from airflow.models import DagBag


    @pytest.fixture(scope="session")
    def dagbag():
        # include_examples=False: exclude the bundled example DAGs from the check
        return DagBag(dag_folder="dags/", include_examples=False)


    def test_no_import_errors(dagbag):
        assert not dagbag.import_errors, (
            f"DAG import errors:\n{dagbag.import_errors}"
        )


    def test_dag_count(dagbag):
        # Minimum guard against a regression where DAGs disappear entirely
        assert len(dagbag.dags) >= 1

If you add organizational rules as automated checks here, such as "every DAG must have an owner and tags" or "retries must be at least 1", then CI catches what a reviewer used to verify by eye every time.
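The organizational rules mentioned above can be written as one more check. The following is a minimal sketch, not an official Airflow helper: `policy_violations` is a hypothetical name, and it assumes each DAG object exposes `dag_id`, `tags`, and `tasks`, with each task carrying `task_id`, `owner`, and `retries`. To stay runnable without an Airflow installation, the demonstration uses `SimpleNamespace` stand-ins; in a real suite you would loop over `dagbag.dags.values()` instead.

```python
from types import SimpleNamespace

DEFAULT_OWNER = "airflow"  # Airflow's stock default owner; treated here as "not set"


def policy_violations(dag) -> list[str]:
    """Return human-readable rule violations for one DAG-like object."""
    problems = []
    if not dag.tags:
        problems.append(f"{dag.dag_id}: no tags set")
    for t in dag.tasks:
        if not t.owner or t.owner == DEFAULT_OWNER:
            problems.append(f"{dag.dag_id}.{t.task_id}: owner not set explicitly")
        if (t.retries or 0) < 1:
            problems.append(f"{dag.dag_id}.{t.task_id}: retries must be >= 1")
    return problems


# Stand-in objects shaped like a DAG and its tasks (illustrative only).
good = SimpleNamespace(
    dag_id="sales_etl",
    tags=["sales"],
    tasks=[SimpleNamespace(task_id="clean", owner="data-team", retries=2)],
)
bad = SimpleNamespace(
    dag_id="adhoc",
    tags=[],
    tasks=[SimpleNamespace(task_id="run", owner="airflow", retries=0)],
)

for d in (good, bad):
    for line in policy_violations(d):
        print(line)
```

In pytest this becomes a single `assert not policy_violations(dag)` inside a test that iterates over the loaded DAGs.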
1.2 Unit Tests — Call the Task "Function" Directly
A TaskFlow function written with Airflow 3's Task SDK (from airflow.sdk import dag, task) is ultimately just a plain Python function. If you keep your business logic separate from Airflow, you can test it by simply calling it, without starting up a scheduler.

    # dags/sales_etl.py
    from airflow.sdk import dag, task


    def transform_rows(rows: list[dict]) -> list[dict]:
        # Pure logic: it knows nothing about Airflow -> easy to test
        return [r for r in rows if r["amount"] > 0]


    @dag(schedule="@daily", catchup=False, tags=["sales"])
    def sales_etl():
        @task
        def clean(rows: list[dict]) -> list[dict]:
            return transform_rows(rows)

        clean([])


    sales_etl()

The test then calls the pure function directly:

    # tests/test_sales_etl.py
    from dags.sales_etl import transform_rows


    def test_transform_drops_non_positive():
        rows = [{"amount": 10}, {"amount": 0}, {"amount": -5}]
        assert transform_rows(rows) == [{"amount": 10}]

Takeaway: Move logic out of Airflow. Instead of cramming core computation inside a @task, separate it into a pure function: tests get faster and reuse gets easier.
1.3 Integration Run — dag.test() and airflow dags test
Once you've validated the parts with unit tests, the next step is to check whether one DAG run actually completes end to end. Airflow 3 offers two ways to execute a single DAG run in a single process, without a running scheduler or executor. Both still need an initialized metadata DB, but a local SQLite database is enough for that.

    # From within Python: attach a debugger and run a single run as-is
    if __name__ == "__main__":
        sales_etl().test()

    # From the CLI: run a single run for a specified logical_date
    airflow dags test sales_etl 2026-07-04

dag.test() is the key tool for local debugging. You can attach an IDE debugger and step through the inside of a task line by line, breaking the vicious cycle of "you only find out once you push it to the real server." Note that in Airflow 3, execution_date has been removed in favor of logical_date, and logical_date may be None for asset-triggered or manual runs, so for time-based logic it's safer to use data_interval_start/end (for details, see Part 6: Scheduling & Assets).
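To make the logical_date caveat concrete, here is a small defensive sketch. It works on a plain dict shaped like the task context, using the key names mentioned above. The helper name `resolve_window` and the one-day fallback window are assumptions of this example, not Airflow behavior; choose whatever fallback fits your pipeline.

```python
from datetime import datetime, timedelta, timezone


def resolve_window(context: dict, fallback_end: datetime) -> tuple[datetime, datetime]:
    """Pick the time window a task should process.

    Uses data_interval_start/end when both are present. It never reads
    logical_date, so a run where that key is None cannot break it. When the
    interval is missing, it falls back to the day ending at fallback_end
    (the one-day width is an assumption made for this sketch).
    """
    start = context.get("data_interval_start")
    end = context.get("data_interval_end")
    if start is not None and end is not None:
        return start, end
    return fallback_end - timedelta(days=1), fallback_end


day = datetime(2026, 7, 4, tzinfo=timezone.utc)

# Scheduled run: the interval is present and used as-is.
scheduled = {"data_interval_start": day, "data_interval_end": day + timedelta(days=1)}
print(resolve_window(scheduled, fallback_end=day))

# Run without a logical_date or interval: the explicit fallback applies.
manual = {"logical_date": None}
print(resolve_window(manual, fallback_end=day))
```

Code that computes `context["logical_date"] - timedelta(days=1)` directly would raise a TypeError on the second context, which is the failure mode this pattern avoids.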
2. CI/CD Pipeline — From Commit to Deployment
If you run tests by hand, you'll eventually stop running them. The goal is to bake every check into the pipeline so that only what passes gets deployed. The stages form a simple straight line.
The following is the typical flow a single git push goes through: lint -> DAG import check -> unit tests -> image build -> deploy. If an earlier stage fails, the later stages don't even start.
The key is to put fast checks first and slow checks later. Discovering a problem that a 1-second lint could have caught only after a 5-minute image build is that much wasted effort.
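The same fail-fast ordering can be reproduced locally, for example as a pre-push script. This is a minimal sketch: `run_stages` and `STAGES` are hypothetical names, and the commands assume the `dags/` and `tests/` layout used in this article plus `ruff` and `pytest` being installed.

```python
import subprocess
import sys

# Cheapest checks first. Each entry is (label, command); the commands are
# assumptions about your project layout.
STAGES = [
    ("lint", ["ruff", "check", "dags/"]),
    ("import check", ["pytest", "tests/test_dag_integrity.py"]),
    ("unit tests", ["pytest", "tests/"]),
]


def run_stages(stages, runner=subprocess.call) -> list[str]:
    """Run stages in order and stop at the first non-zero exit code.

    Returns the labels of the stages that were actually started.
    """
    started = []
    for label, cmd in stages:
        started.append(label)
        if runner(cmd) != 0:
            print(f"stage failed: {label}", file=sys.stderr)
            break
    return started
```

Calling `run_stages(STAGES)` runs the real commands. The `runner` parameter exists only so the ordering logic can be exercised without invoking them.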
Translated into GitHub Actions, it looks roughly like this (example).

    # .github/workflows/airflow-ci.yml
    name: airflow-ci
    on: [push, pull_request]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.12"
          - run: pip install -r requirements.txt apache-airflow
          - run: ruff check dags/                     # 1. lint
          - run: pytest tests/test_dag_integrity.py   # 2. import test
          - run: pytest tests/                        # 3. unit tests
      build:
        needs: test   # runs only if test passes
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: docker build -t registry.example.com/airflow:${{ github.sha }} .
          - run: docker push registry.example.com/airflow:${{ github.sha }}

2.1 Three Ways to Deliver DAGs to the Workers
Passing CI isn't the end. There's still the deployment-strategy step — actually getting that DAG file to where the scheduler and workers can see it. There are broadly three approaches, with clear trade-offs.
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Bake into the image | Include the DAGs in the container image at build time | Immutable and reproducible, bundled together with dependencies | A one-line fix requires a rebuild and redeploy; rollback = swapping the image |
| git-sync | A sidecar periodically pulls the git repo | Changes apply with just a code push, fast iteration | Versions can drift from the image/dependencies; hard to guarantee all components run the same code |
| DAG bundle | Declaratively define the DAG source (git, etc.) | Per-source version tracking, aligns with DAG versioning | A relatively new mechanism in 3.x, so operational know-how is still accumulating |
What's noteworthy in Airflow 3 is DAG bundles. It's a mechanism for defining "where a DAG comes from" (e.g., a specific git repository and revision), and combined with DAG versioning it lets you track which run executed which version of the DAG in the UI. Where git-sync used to "just pull the latest files," a bundle elevates source and version to first-class concepts.
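As a rough illustration of what "declaratively defining the DAG source" can look like: Airflow 3 reads a JSON list of bundle definitions from the `dag_bundle_config_list` option in the `[dag_processor]` section, and the git provider ships a `GitDagBundle` class. The sketch below only builds that JSON in Python. The bundle name, connection id, branch, and subdirectory are hypothetical, and the option and parameter names should be checked against the documentation for your Airflow and provider versions.

```python
import json

# Hypothetical values: bundle name, connection id, branch, and subdirectory.
bundles = [
    {
        "name": "team_dags",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
            "git_conn_id": "team_dags_git",
            "tracking_ref": "main",
            "subdir": "dags",
        },
    }
]

# Environment-variable form of [dag_processor] dag_bundle_config_list.
ENV_NAME = "AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST"
env_value = json.dumps(bundles)
print(f"{ENV_NAME}='{env_value}'")
```

Generating the value from code rather than hand-editing JSON keeps the definition reviewable in the same pull request as the DAGs it points to.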
Selection criteria: if reproducibility and isolation are top priority, bake into the image; if iteration speed is top priority, use git-sync/bundle. If your dependencies (Python packages) change often, you'll have to rebuild the image anyway, so "baking" comes naturally.
3. Security — Who Can Do What, and Where
Security is not about "turning a feature on or off"; it's about drawing trust boundaries. With Airflow 3's separation of components, these boundaries have become much clearer. Let's go through them in order.
3.1 RBAC — Roles and Permissions
Airflow provides role-based access control (RBAC). Instead of granting permissions directly to users, you bundle permissions into a Role, then assign roles to users. The default roles break down roughly as follows (example).
| Role | Rough permissions | For whom |
|---|---|---|
| Admin | Full control, including user/role management | Platform operators |
| Op | Trigger/pause DAGs, view configuration | Operations staff |
| User | View and trigger DAGs | Pipeline developers |
| Viewer | Read-only | Stakeholders, dashboards |
| Public | Almost none | Unauthenticated |
The core principle is least privilege. Giving everyone Admin defeats the purpose of enabling RBAC. Create custom roles per team and express boundaries through permissions, like "this team only sees its own DAGs." This is also the starting point for the multi-tenancy we'll cover later.
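The role model described above (permissions bundled into roles, roles assigned to users) can be sketched in a few lines. This is a conceptual model only, not the code of any Airflow auth manager. The class names, role names, and the `(action, resource)` strings are illustrative assumptions.

```python
from dataclasses import dataclass, field

# A permission is an (action, resource) pair. The names below are illustrative.
Permission = tuple[str, str]


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset[Permission]


@dataclass
class User:
    username: str
    roles: list[Role] = field(default_factory=list)


def is_allowed(user: User, action: str, resource: str) -> bool:
    """A user may act only if one of their roles grants that exact permission."""
    return any((action, resource) in role.permissions for role in user.roles)


# A team role scoped to the team's own DAG, and a read-only role across DAGs.
team_a = Role("team_a_dev", frozenset({
    ("can_read", "DAG:sales_etl"),
    ("can_edit", "DAG:sales_etl"),
}))
viewer = Role("viewer_all", frozenset({
    ("can_read", "DAG:sales_etl"),
    ("can_read", "DAG:billing_etl"),
}))

alice = User("alice", [team_a])
bob = User("bob", [viewer])
```

Here `alice` can edit `sales_etl` but cannot even read `billing_etl`, while `bob` can read both and edit neither. That is the "this team only sees its own DAGs" boundary expressed as data.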
3.2 API Server Authentication — JWT Tokens
In Airflow 3, the old webserver has been replaced by the API server, which serves both the UI and a stable, versioned REST API (the old /api/experimental has been removed). Authentication for this REST API is based on JWT tokens. The API server issues tokens or integrates with external authentication, and the client sends that token on every request.

    # 1) Issue a token (example; the actual path/payload depends on your deployment config)
    TOKEN=$(curl -s -X POST https://airflow.example.com/auth/token \
      -H "Content-Type: application/json" \
      -d '{"username":"ci-bot","password":"..."}' | jq -r .access_token)

    # 2) Trigger a DAG with the issued JWT
    curl -X POST https://airflow.example.com/api/v2/dags/sales_etl/dagRuns \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"logical_date": "2026-07-04T00:00:00Z"}'

This is exactly the flow you use when triggering a "remote smoke test after deployment" in CI/CD. For more detail on handling remote scheduling via the REST API, see Part 9: REST API & Remote Schedule Changes.
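If your CI job is written in Python rather than shell, the same two calls can be built with the standard library. This is a sketch under the same assumptions as the curl example: the host is a placeholder, and the token path and payload depend on your deployment. The function names are hypothetical. Requests are built separately from sending so they can be inspected without a live server.

```python
import json
import urllib.request

BASE_URL = "https://airflow.example.com"  # placeholder host from the curl example


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """POST credentials to the token endpoint (path as in the curl example)."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_trigger_request(dag_id: str, token: str, logical_date: str) -> urllib.request.Request:
    """POST a new DAG run, authenticating with the bearer token."""
    body = json.dumps({"logical_date": logical_date}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v2/dags/{dag_id}/dagRuns",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

A smoke test would call `send(build_token_request(...))`, read `access_token` from the result, and pass it to `build_trigger_request`. Keep the bot's password in your CI secret store rather than in the script.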
3.3 Secrets Backend — Keep Secrets Out of Code
Leaving passwords and tokens from Connections and Variables in the metadata DB or in environment variables as plaintext invites trouble. Airflow connects external secret stores like Vault, AWS Secrets Manager, or GCP Secret Manager through a Secrets Backend, so secrets are managed outside of code and the metadata DB. Configuration and patterns were covered in Part 8: External System Integration & Synchronous Calls, so here the only point is that, from a security standpoint, you should turn this on.
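One reason a Secrets Backend works without touching DAG code is the lookup order: Airflow consults the configured secrets backend first, then environment variables, then the metadata DB. The sketch below models that chain with plain dictionaries. The `first_match` helper, the connection ids, and all values are made up for illustration, and none of this is Airflow's actual implementation.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def first_match(conn_id: str, sources: list[tuple[str, Lookup]]) -> tuple[str, str]:
    """Return (source_name, value) from the first source that knows conn_id."""
    for name, lookup in sources:
        value = lookup(conn_id)
        if value is not None:
            return name, value
    raise KeyError(f"connection {conn_id!r} not found in any source")


# Stand-ins for the three places a connection can live (all values are fake).
external_store = {"warehouse": "postgresql://etl:from-vault@db.internal/warehouse"}
environment = {"AIRFLOW_CONN_LEGACY_FTP": "ftp://user:from-env@ftp.internal"}
metadata_db = {"warehouse": "postgresql://etl:stale@db.internal/warehouse"}

SOURCES = [
    ("secrets backend", external_store.get),
    ("environment", lambda cid: environment.get(f"AIRFLOW_CONN_{cid.upper()}")),
    ("metadata DB", metadata_db.get),
]

print(first_match("warehouse", SOURCES)[0])   # the external store wins over the DB copy
print(first_match("legacy_ftp", SOURCES)[0])  # falls through to the environment
```

Because the external store is consulted first, a stale copy left in the metadata DB is never used once the same connection id exists in the backend.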
3.4 DAG Processor Separation and the Task Execution API — the Heart of Isolation
This is the most important change in Airflow 3 security. Two things interlock to redraw the trust boundary.
- DAG processor separation: DAG parsing is pulled out of the scheduler and runs as an independent process. The part that executes user code (DAG files) is separated from the scheduling core, so you can isolate code of differing trust levels.
- Task Execution API: workers (tasks) no longer connect directly to the metadata DB. Instead, they exchange state only through the Task Execution API exposed by the API server.
The security benefit of this change is significant. In the past, every worker held metadata DB credentials, so compromising a single worker could expose the entire metadata DB. In Airflow 3, workers hold no metadata DB credentials at all. A worker sees only a narrow API surface and communicates only within an authenticated scope. This is also what makes it safe to place workers in remote or edge locations (for example, with the EdgeExecutor).
Below is the trust boundary, from a user request coming in to a task being executed. The dotted lines are the security boundaries, and the key point is that there is no direct arrow from the worker side toward the metadata DB.
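The idea of "a narrow API surface within an authenticated scope" can be illustrated with a toy model. This is emphatically not Airflow's implementation or token format (the real system uses JWTs issued by the API server). The signing scheme, function names, and task ids below are all assumptions chosen to keep the example short.

```python
import hashlib
import hmac

SERVER_KEY = b"demo-signing-key"  # held by the API server only; never shipped to workers

# Server-side state the worker can never reach directly.
task_states = {"sales_etl.clean": "queued", "billing_etl.load": "queued"}


def issue_token(task_id: str) -> str:
    """API server side: sign a token that is valid for exactly one task."""
    sig = hmac.new(SERVER_KEY, task_id.encode(), hashlib.sha256).hexdigest()
    return f"{task_id}:{sig}"


def set_state(token: str, task_id: str, state: str) -> bool:
    """API server side: accept a state update only for the task the token names."""
    claimed_task, _, sig = token.rpartition(":")
    expected = hmac.new(SERVER_KEY, claimed_task.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected) or claimed_task != task_id:
        return False
    task_states[task_id] = state
    return True


# Worker side: it holds only a token, no database credentials.
worker_token = issue_token("sales_etl.clean")
```

A worker holding `worker_token` can move `sales_etl.clean` to `running`, but the same token is rejected for `billing_etl.load`, and a forged signature is rejected outright. A compromised worker is therefore limited to the one task it was handed.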
3.5 Multi-Tenancy — "One Cluster, Many Teams"
When multiple teams share one Airflow, you build boundaries by combining the elements above. Airflow does not provide perfect OS-level tenant isolation within a single cluster, so you approach it with defense in depth.
- Restrict per-team DAG access with RBAC custom roles.
- Separate resources with Pools and priority_weight so one team can't monopolize the slots (see Part 3: Configuration & Optimization).
- Separate per-team secret access with the Secrets Backend's paths/policies.
- If you need strong isolation, consider per-task pod isolation with the KubernetesExecutor, and for even stronger isolation, separate clusters per team.
In one line: multi-tenancy is not a single switch but a matter of layering RBAC + resource pools + secret separation + execution isolation. When isolation requirements are very strong, "splitting things up" (separate clusters) is often the simplest answer.
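For the resource-pool layer, one lightweight convention is to generate each team's `default_args` from a single helper so every task lands in that team's pool. A minimal sketch: the helper name and the `<team>_pool` naming scheme are assumptions, and the pool itself must already exist in Airflow.

```python
def team_task_defaults(team: str, priority_weight: int = 1) -> dict:
    """Build default_args that pin a team's tasks to that team's pool.

    The "<team>_pool" naming convention is an assumption of this sketch.
    """
    return {
        "owner": team,
        "pool": f"{team}_pool",
        "priority_weight": priority_weight,
        "retries": 1,
    }


sales_defaults = team_task_defaults("sales", priority_weight=5)
print(sales_defaults)
```

Passing the result as `default_args` to the DAG means individual tasks inherit the pool without each author having to remember it. A policy test in CI can then verify that no task overrides it.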
4. Wrap-Up — In Three Lines
- Test in three layers: import checks (cheapest) -> unit tests (move logic out of Airflow) -> integration validation of a single run with dag.test().
- CI/CD starts with the fast checks: lint -> DAG import -> build -> deploy. For deployment, bake into the image for reproducibility, or git-sync/DAG bundle for speed.
- Security is about drawing boundaries: least privilege with RBAC, API authentication with JWT, secret isolation with the Secrets Backend. And make use of Airflow 3's greatest gift — the fact that workers don't know the metadata DB.
In the next part, Part 12: Production Best-Practice Checklist, we'll tie everything from the entire series into a single pre-deployment checklist. See you there.
Official docs: Apache Airflow Documentation