
[Spec Kit Part 5] Plan & Tasks — Technical Design and Task Breakdown

The spec pinned down the what and why. Now /speckit.plan designs the how (tech stack and architecture), /speckit.tasks turns it into a dependency-ordered task list, and /speckit.analyze cross-validates consistency across spec, plan, and tasks.

Data Dynamics · June 15, 2026 · 17 min read

In Part 4 we wrote the spec for dq-monitor (a real-time data quality monitoring service) and filled its gaps with /speckit.clarify. spec.md now states clearly what we are building and why, and deliberately names no tech stack. But a spec alone produces no code. A wide river separates the requirement "monitor freshness" from the implementation "a Kafka consumer polls lag every 5 seconds." This post covers the two bridges across that river, /speckit.plan (technical design) and /speckit.tasks (task breakdown), plus /speckit.analyze, the consistency gate that checks both bridges are sound before you build on them.

What you'll learn in this post

  • How /speckit.plan reads spec.md + constitution.md to design the tech stack and architecture
  • The supporting artifacts the plan phase emits — data-model.md, contracts/, research.md, quickstart.md
  • How /speckit.tasks breaks the plan into a dependency-ordered task list (tasks.md)
  • How /speckit.analyze acts as a cross-validation gate, catching contradictions, gaps, and untraceable work across spec, plan, and tasks
  • Common pitfalls: planning before clarifying, tasks too coarse to verify, skipping analyze

This is Part 5 of the Spec Kit series. It builds on the spec completed in Part 4; the next installment, Part 6 (Implement & Converge), turns this task list into actual code.


1. Where Plan sits in the workflow — from "what" to "how"

Recall the Spec Kit flow: Constitution → Specify → Clarify → Plan → Tasks → Analyze → Implement → Converge. We are now right in the middle, at the inflection point where we cross from "what/why" territory into "how" territory.

SDD draws this boundary deliberately.

| Spec territory | Plan territory |
|---|---|
| What and why (requirements, user stories) | How (tech stack, architecture) |
| Technology-neutral: mentions no DB, no language | Technology decisions: names PostgreSQL, Python, Kafka |
| Reviewable by non-developers | A design document reviewed by engineers |
| Stays fixed when the implementation changes | Changing it cascades into tasks and code |

Why split at all? If you hard-code "use PostgreSQL" into the spec, then later when you decide "actually we have so much time-series data that TimescaleDB fits better," you have to tear up the requirements document too. Separating the what from the how lets you change the how freely while the what stays fixed. This is the structural reason SDD can treat the spec as the source of truth.


2. /speckit.plan — drawing out the technical design

/speckit.plan reads two inputs: spec.md (what/why) and constitution.md (the principles that run through the project). It then writes a technical implementation plan that satisfies both into specs/001-dq-monitor/plan.md.

This is where the tech stack first appears. The boxes intentionally left blank during the spec phase now get filled in.

2.1 A realistic plan prompt

A planning-phase prompt works best not as "design whatever you like" but as guardrails that state constraints and preferences explicitly. The more decisions you leave open, the more plausible-but-baseless choices the agent will make to fill them.

/speckit.plan
 
dq-monitor is a backend service that monitors the freshness, consistency,
and anomalies of data pipelines. UI is out of scope this round; we build
only up to the REST API and alerting.
 
Tech constraints/preferences:
- Language: Python 3.12 (team standard, data-library ecosystem)
- Input: pipeline execution events arrive on a Kafka topic
- State/metadata: PostgreSQL (check definitions, run history, alert records)
- Alerting: support a single Slack Incoming Webhook first, extensible later
- Deployment: a single container image, external deps are only Kafka & PostgreSQL
- Per the constitution's "observability first" principle, expose
  structured logging and metrics
 
Architecture must follow the constitution's module-boundary principle, and
the anomaly-detection approach should start simple/explainable (leave the
rationale in a research artifact).

The key part is invoking the constitution explicitly. If the constitution from Part 3 holds principles like "observability first," "module boundaries," and "start with explainable simplicity," the plan must translate those principles into design decisions. The agent reads the constitution automatically, but naming those principles again in the prompt makes it more likely the plan applies them consistently.

2.2 What plan.md decides — the dq-monitor stack

The plan.md the agent produces carries decisions together with their rationale. A stack list without rationale becomes a myth no one can later change. For dq-monitor, the decisions might look like this:

| Area | Choice | Rationale (summary) |
|---|---|---|
| Language/runtime | Python 3.12 | Team standard; data validation/stats library ecosystem; constitution's "favor team familiarity" |
| Event input | Kafka (confluent-kafka) | Pipeline events already flow on Kafka; at-least-once consumption is enough |
| State store | PostgreSQL 16 | Checks/history/alerts fit a relational model naturally and need transactions |
| Check engine | In-process rule evaluator | No separate worker needed at early scale; isolated behind a module boundary so it can be extracted later |
| Anomaly detection | Rolling statistics (z-score / IQR) | Explainable and easy to debug; constitution's "start with explainable simplicity" |
| Alerting | Slack Incoming Webhook | Start with a single channel, abstracted behind a Notifier interface for extension |
| API | FastAPI + Uvicorn | Auto-generated OpenAPI makes it easy to check the code against contracts/ |
| Observability | Structured (JSON) logging + /metrics (Prometheus) | Constitution's "observability first" |

Record simplicity as a decision, not an excuse. Choosing "z-score instead of ML for anomaly detection" is not laziness — it is a decision. Leave the rationale (explainability, operational cost, lack of early data) in research.md, and six months later the question "why didn't you use ML?" is answered by a document, not by the code.
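The "structured (JSON) logging" row is easy to state and easy to leave vague. As a minimal sketch of what it could mean with only Python's standard `logging` module (the `JsonFormatter` class, the `make_logger` helper, the `ctx` extra key, and the `ts`/`level`/`logger`/`msg` field names are assumptions for illustration, not something plan.md specifies):

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured context passed as extra={"ctx": {...}} is merged into the object.
        ctx = getattr(record, "ctx", None)
        if isinstance(ctx, dict):
            payload.update(ctx)
        return json.dumps(payload)


def make_logger(name: str = "dq_monitor") -> logging.Logger:
    """Return a logger whose only handler writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("check evaluated", extra={"ctx": {"check_id": 42, "status": "FAIL"}})
```

Each call then emits a single JSON line that a log pipeline can filter on fields like `status` directly. The /metrics half of the row is not covered by this sketch.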

2.3 plan.md excerpt — architecture overview

The body of plan.md typically includes a component diagram and data flow like this.

## Architecture Overview
 
dq-monitor is a single service composed of 4 internal modules.
 
1. **Ingest** — Kafka consumer. Receives pipeline execution events,
   normalizes them, and hands them to the Check Engine.
2. **Check Engine** — looks up Check definitions matching the event and
   evaluates rules (freshness/consistency/anomaly) to produce CheckResults.
3. **Alert Router** — takes FAIL CheckResults and routes them per the alert
   policy to a Notifier (currently Slack). Applies a dedup window.
4. **API** — REST layer exposing check CRUD and result/alert queries.
 
All modules share PostgreSQL as the state store, and inter-module calls go
only through explicit interfaces (constitution: module boundaries).
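One way to honor "inter-module calls go only through explicit interfaces" in Python is structural typing with `typing.Protocol`. The sketch below is hypothetical (`CheckResult`, `Notifier`, `AlertRouter`, and `RecordingNotifier` are illustrative names, and dedup is left out): the router depends only on the `Notifier` interface, so a Slack implementation or a test double can be plugged in without the router knowing which one it has.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class CheckResult:
    check_id: int
    status: str  # "PASS" | "FAIL" | "ERROR"
    detail: str = ""


class Notifier(Protocol):
    """Boundary between the Alert Router and any delivery channel (Slack today)."""

    def send(self, text: str) -> None: ...


class AlertRouter:
    """Knows only the Notifier interface, never a concrete channel."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def route(self, result: CheckResult) -> bool:
        # Mirrors the plan excerpt: only FAIL results are routed (dedup omitted).
        if result.status != "FAIL":
            return False
        self._notifier.send(f"[dq-monitor] check {result.check_id} FAILED: {result.detail}")
        return True


class RecordingNotifier:
    """Test double that satisfies Notifier structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, text: str) -> None:
        self.sent.append(text)
```

Because `Notifier` is a `Protocol`, conformance is checked by a static type checker such as mypy rather than at runtime; a `SlackWebhookNotifier` would only need a matching `send` method.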

3. Supporting artifacts of the plan phase

/speckit.plan does not emit plan.md alone. It generates, under specs/001-dq-monitor/, the supporting artifacts that unfold the plan into a verifiable form. These artifacts are exactly the material that lets the next phase (tasks) extract work without guessing.

specs/001-dq-monitor/
├── spec.md            # (Part 4) what/why
├── clarifications.md  # (Part 4) clarification Q&A
├── plan.md            # technical design — how
├── data-model.md      # entities & schema
├── contracts/
│   └── api-spec.json  # REST contract
├── research.md        # technology investigation & trade-offs
└── quickstart.md      # setup & validation procedure

3.1 data-model.md — entities and schema

The nouns of the spec ("pipeline," "check," "alert") become tables here for the first time.

## Entities
 
| Entity | Description | Key fields |
|---|---|---|
| Pipeline | A monitored data pipeline | id, name, owner, sla_minutes |
| Check | A quality rule attached to a pipeline | id, pipeline_id, type, params(jsonb), enabled |
| CheckResult | The result of one check evaluation | id, check_id, status, observed(jsonb), evaluated_at |
| Alert | An alert derived from a FAIL result | id, check_result_id, channel, sent_at, dedup_key |
 
- Check.type ∈ { freshness, consistency, anomaly }
- CheckResult.status ∈ { PASS, FAIL, ERROR }
- Alert.dedup_key = hash(check_id, day-bucket) — suppress to 1 alert/day per check

Pinning part of the schema as DDL removes ambiguity in the task phase.

CREATE TABLE check_result (
    id           BIGSERIAL PRIMARY KEY,
    check_id     BIGINT NOT NULL REFERENCES check_def(id),
    status       TEXT   NOT NULL CHECK (status IN ('PASS','FAIL','ERROR')),
    observed     JSONB  NOT NULL,
    evaluated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_check_result_check_time ON check_result (check_id, evaluated_at DESC);
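The `Alert.dedup_key = hash(check_id, day-bucket)` rule in data-model.md is concrete enough to sketch. Assuming the day bucket is the UTC calendar date of the evaluation and SHA-256 is the hash (data-model.md specifies neither), and using an in-memory set where the real service would consult the alert table, it could look like this:

```python
import hashlib
from datetime import datetime, timezone
from typing import Set


def dedup_key(check_id: int, evaluated_at: datetime) -> str:
    """Same check + same UTC day -> same key, giving at most 1 alert/day per check."""
    if evaluated_at.tzinfo is None:
        raise ValueError("evaluated_at must be timezone-aware")
    day_bucket = evaluated_at.astimezone(timezone.utc).date().isoformat()  # e.g. "2026-06-15"
    return hashlib.sha256(f"{check_id}:{day_bucket}".encode()).hexdigest()


def should_alert(key: str, already_sent: Set[str]) -> bool:
    """Return False (suppress) if an alert with this key was already sent."""
    if key in already_sent:
        return False
    already_sent.add(key)
    return True
```

Three FAILs for check 1 at 09:00 and 23:59 on June 15 and 00:01 on June 16 (UTC) produce `[True, False, True]`: the second is suppressed, and the new day opens a new bucket. In the service itself, a unique index on alert.dedup_key would be one way to make the suppression survive restarts and concurrent consumers; that is an assumption, not something the DDL above declares.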

3.2 contracts/api-spec.json — the REST contract

We pin the shape of the API before the code. This contract later becomes the reference point for the test tasks (a contract violation = a failing test).

{
  "openapi": "3.1.0",
  "info": { "title": "dq-monitor API", "version": "0.1.0" },
  "paths": {
    "/checks": {
      "post": {
        "summary": "Create a check definition",
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": { "$ref": "#/components/schemas/CheckCreate" }
            }
          }
        },
        "responses": {
          "201": { "description": "Created" },
          "422": { "description": "Validation error" }
        }
      }
    },
    "/alerts": {
      "get": {
        "summary": "List alerts",
        "parameters": [
          { "name": "since", "in": "query", "schema": { "type": "string", "format": "date-time" } },
          { "name": "status", "in": "query", "schema": { "type": "string", "enum": ["FAIL", "ERROR"] } }
        ],
        "responses": { "200": { "description": "Array of alerts" } }
      }
    }
  },
  "components": {
    "schemas": {
      "CheckCreate": {
        "type": "object",
        "required": ["pipeline_id", "type", "params"],
        "properties": {
          "pipeline_id": { "type": "integer" },
          "type": { "type": "string", "enum": ["freshness", "consistency", "anomaly"] },
          "params": { "type": "object" }
        }
      }
    }
  }
}
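Because the contract is machine-readable, the 201/422 behavior of POST /checks can be checked against it mechanically. The sketch below hand-rolls only the rules `CheckCreate` states (required fields, `pipeline_id` integer, `type` enum, `params` object) with the standard library; `validate_check_create` and its `(status, errors)` return shape are hypothetical, and in the planned FastAPI stack a request model would normally do this job, since FastAPI answers request-validation failures with 422 by default.

```python
import json
from typing import List, Tuple

# The CheckCreate schema, copied from contracts/api-spec.json.
CHECK_CREATE = {
    "type": "object",
    "required": ["pipeline_id", "type", "params"],
    "properties": {
        "pipeline_id": {"type": "integer"},
        "type": {"type": "string", "enum": ["freshness", "consistency", "anomaly"]},
        "params": {"type": "object"},
    },
}

_PY_TYPES = {"integer": int, "string": str, "object": dict}


def validate_check_create(body: str) -> Tuple[int, List[str]]:
    """Return (201, []) if the body satisfies CheckCreate, else (422, errors)."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return 422, ["body is not valid JSON"]
    if not isinstance(data, dict):
        return 422, ["body must be a JSON object"]
    errors = [f"missing field: {f}" for f in CHECK_CREATE["required"] if f not in data]
    for name, rule in CHECK_CREATE["properties"].items():
        if name not in data:
            continue
        value = data[name]
        # bool is a subclass of int in Python, but JSON true/false is not an integer.
        if not isinstance(value, _PY_TYPES[rule["type"]]) or isinstance(value, bool):
            errors.append(f"{name}: expected {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: must be one of {rule['enum']}")
    return (422, errors) if errors else (201, [])
```

The quickstart payload `{"pipeline_id":1,"type":"freshness","params":{"sla_minutes":30}}` yields `(201, [])`, while `{"pipeline_id":"1","type":"latency","params":{}}` yields 422 with two errors, which is exactly the kind of case a contract test would pin down.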

3.3 research.md — technology investigation and trade-offs

This is where the rationale for the plan's "simple statistics-based anomaly detection" decision lives. The point is to record the alternatives and why they were rejected.

## Choosing the anomaly-detection approach
 
### Candidates
| Approach | Pros | Cons |
|---|---|---|
| Rolling z-score | Simple to implement/explain, works immediately | Sensitive to the distribution assumption (normality) |
| IQR (quartiles) | Robust to outliers, weak distribution assumption | Needs window-size tuning |
| ML (e.g., Isolation Forest) | Captures complex patterns | Burden of training data, ops cost, explainability |
 
### Decision
Default to **IQR-based** initially, with z-score offered as an option.
- Matches the constitution principle "start with explainable simplicity"
- Insufficient training data in the first operational phase
- Leave room for extension: anomaly checks (Check.type=anomaly) accept a `method` param
 
### Open questions
- Default window size (e.g., last 50) to be revisited with quickstart sample data.
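To make the research.md decision concrete, here is a minimal standard-library sketch of both candidates, evaluated against a trailing window of observations (for example, row counts from recent runs). The function names, the conventional 1.5 * IQR Tukey fence, the 3-sigma z-score threshold, and the 4-observation minimum are assumptions for illustration; research.md leaves the window size open and fixes none of these parameters.

```python
import statistics
from typing import List


def iqr_is_anomaly(window: List[float], value: float, k: float = 1.5) -> bool:
    """Tukey fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR] of the window."""
    q1, _, q3 = statistics.quantiles(window, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


def zscore_is_anomaly(window: List[float], value: float, threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(window)
    stdev = statistics.stdev(window)
    if stdev == 0:
        return value != mean  # flat history: any change at all is unusual
    return abs(value - mean) / stdev > threshold


def detect(window: List[float], value: float, method: str = "iqr") -> bool:
    """Dispatch on the anomaly check's `method` param, defaulting to IQR."""
    if len(window) < 4:
        raise ValueError("need at least 4 observations in the window")
    if method == "iqr":
        return iqr_is_anomaly(window, value)
    if method == "zscore":
        return zscore_is_anomaly(window, value)
    raise ValueError(f"unknown method: {method}")
```

Both methods are a few lines each and every flag can be explained by pointing at two numbers (the fences, or the mean and standard deviation), which is the "explainable simplicity" argument in code form.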

3.4 quickstart.md — setup and validation procedure

This records how to bring the service up once it is built per this plan, and what confirms that it works correctly. It also serves as a rehearsal of the acceptance criteria for when implementation is done.

## Quickstart
 
### 1. Start dependent services
docker compose up -d kafka postgres
 
### 2. Migrate & run the service
make migrate
make run        # FastAPI(:8000) + Kafka consumer together
 
### 3. Smoke validation
# create a check
curl -X POST localhost:8000/checks -H 'Content-Type: application/json' -d '{"pipeline_id":1,"type":"freshness","params":{"sla_minutes":30}}'
# inject a stale event → FAIL → confirm a Slack alert arrives
make seed-stale-event
curl 'localhost:8000/alerts?status=FAIL'   # the alert just raised should appear

4. /speckit.tasks — breaking the plan into tasks

Once the plan and supporting artifacts are in place, /speckit.tasks reads them and produces an actionable, ordered, dependency-aware task list as tasks.md.

A good task breakdown has two properties.

  1. Verifiable granularity. Each task must let you clearly judge "done or not." "Implement the check engine" is too big. "freshness rule evaluation function + unit test" is about right.
  2. Dependency-aware ordering. You can't build the API without the data model, nor route alerts without the check engine. Tasks must follow this causal order.
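To make "about right" tangible, here is roughly what the second example, a freshness rule evaluation function plus its unit tests, might look like. This is a sketch under assumptions: the `evaluate_freshness` signature, passing `last_event_at` and `now` explicitly, and mapping unusable params or a pipeline with no events yet to ERROR are illustrative choices; only `sla_minutes` and the PASS/FAIL/ERROR statuses come from this post's data model and quickstart.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def evaluate_freshness(params: dict, last_event_at: Optional[datetime], now: datetime) -> str:
    """PASS if the latest pipeline event is within sla_minutes, FAIL if stale,
    ERROR if the check cannot be evaluated (bad params or no events yet)."""
    sla = params.get("sla_minutes")
    if not isinstance(sla, int) or isinstance(sla, bool) or sla <= 0:
        return "ERROR"
    if last_event_at is None:
        return "ERROR"
    return "PASS" if now - last_event_at <= timedelta(minutes=sla) else "FAIL"


# Unit tests, pytest-style: plain functions with bare asserts.
NOW = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)


def test_recent_event_passes():
    assert evaluate_freshness({"sla_minutes": 30}, NOW - timedelta(minutes=10), NOW) == "PASS"


def test_stale_event_fails():
    assert evaluate_freshness({"sla_minutes": 30}, NOW - timedelta(minutes=45), NOW) == "FAIL"


def test_exactly_at_sla_is_still_fresh():
    assert evaluate_freshness({"sla_minutes": 30}, NOW - timedelta(minutes=30), NOW) == "PASS"


def test_bad_params_or_no_events_is_error():
    assert evaluate_freshness({}, NOW, NOW) == "ERROR"
    assert evaluate_freshness({"sla_minutes": 30}, None, NOW) == "ERROR"
```

Passing `now` in explicitly keeps the tests deterministic, and the four tests are what lets someone say "done" with evidence rather than a feeling.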

4.1 tasks.md excerpt — build order

The natural build order for dq-monitor is scaffolding → data model → check engine → alert routing → API → tests. [P] marks tasks that can run in parallel once their prerequisites are done.

# Tasks: dq-monitor (001)
 
## Phase 0 — Scaffolding
- [ ] T001  Project structure, deps, docker compose (Kafka/PostgreSQL) setup
- [ ] T002  Structured logging, config loader, /metrics endpoint skeleton (constitution: observability)
 
## Phase 1 — Data model  (depends: T001)
- [ ] T003  SQL migrations from data-model.md (pipeline, check_def, check_result, alert)
- [ ] T004  ORM/repository layer + migration-applied test
 
## Phase 2 — Check engine  (depends: T004)
- [ ] T005  [P] freshness rule evaluator + unit tests
- [ ] T006  [P] consistency rule evaluator + unit tests
- [ ] T007  [P] anomaly (IQR) rule evaluator + unit tests (per research.md)
- [ ] T008  CheckEngine dispatcher: event → matching check → persist CheckResult
 
## Phase 3 — Input pipeline  (depends: T008)
- [ ] T009  Kafka consumer: receive/normalize events, call CheckEngine (at-least-once)
 
## Phase 4 — Alert routing  (depends: T008)
- [ ] T010  Notifier interface + SlackWebhookNotifier implementation
- [ ] T011  AlertRouter: route FAIL results + dedup_key suppression
 
## Phase 5 — API  (depends: T004, contracts/api-spec.json)
- [ ] T012  [P] POST /checks (CheckCreate validation → 201/422)
- [ ] T013  [P] GET /alerts (since/status filters)
 
## Phase 6 — Integration tests  (depends: T009, T011, T013)
- [ ] T014  End-to-end: inject stale event → FAIL → Slack alert (quickstart scenario)
- [ ] T015  Contract test: validate response schemas against api-spec.json
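The dependency annotations in tasks.md form a directed acyclic graph, so both the build order and the [P] parallel groups can be derived mechanically. Below is a sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). `DEPENDS` is hand-transcribed from the phase headers, applying each phase's `depends` to every task in that phase; the intra-phase edges marked "assumed" are our reading, not something tasks.md states.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Task -> prerequisite tasks, transcribed from tasks.md.
DEPENDS: Dict[str, Set[str]] = {
    "T001": set(),
    "T002": {"T001"},                          # assumed
    "T003": {"T001"},
    "T004": {"T003"},                          # assumed
    "T005": {"T004"},
    "T006": {"T004"},
    "T007": {"T004"},
    "T008": {"T004", "T005", "T006", "T007"},  # T005-T007: assumed
    "T009": {"T008"},
    "T010": {"T008"},
    "T011": {"T008", "T010"},                  # T010: assumed
    "T012": {"T004"},
    "T013": {"T004"},
    "T014": {"T009", "T011", "T013"},
    "T015": {"T009", "T011", "T013"},
}


def parallel_waves(depends: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves: every task's prerequisites sit in earlier waves,
    so the tasks inside one wave could be worked on in parallel."""
    sorter = TopologicalSorter(depends)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(parallel_waves(DEPENDS)):
        print(f"wave {i}: {', '.join(wave)}")
```

With these edges the sketch yields eight waves, and the fourth holds T005 through T007 together with T012 and T013, which lines up with their [P] marks. `prepare()` doubles as a sanity check: a circular dependency raises `graphlib.CycleError` instead of silently producing an order.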

You can also lay this out as a GFM table; making traceability explicit makes the next phase (analyze) easier.

| ID | Task | Depends | Traces to (requirement/artifact) |
|---|---|---|---|
| T005 | freshness evaluator | T004 | FR-2 (freshness monitoring), data-model |
| T007 | anomaly (IQR) evaluator | T004 | FR-4 (anomaly monitoring), research.md |
| T011 | AlertRouter + dedup | T008 | FR-6 (alerting), data-model: Alert.dedup_key |
| T015 | contract test | T013 | contracts/api-spec.json |

Traceability is the point. Every task must trace back to some requirement or artifact. A task that traces nowhere is "work no one knows the reason for," and a requirement with no task is "a promise that won't be implemented." The analyze step in the next section catches exactly these two cases.


5. /speckit.analyze — cross-validating consistency and coverage

The task list is out, but you must not jump to implementation yet. /speckit.analyze is a quality gate that cross-validates the three documents — spec, plan, and tasks. Always run it after tasks, before implement — discovering gaps after you have started writing code costs several times more.

analyze catches three representative types of problems.

| Problem type | Meaning | Example |
|---|---|---|
| Coverage gap | No task corresponds to a requirement | The spec's "alert on ERROR too" requirement appears in no task |
| Untraceable task | A task that traces back to no requirement | An "email alert" task appears that is in neither the plan nor the spec |
| Contradiction | Plan conflicts with spec (or tasks) | Spec says "1 alert/day," plan says "alert on every FAIL" |
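/speckit.analyze is an agent command that reads the documents themselves, so the following is not how it works internally. As a toy model of the first two checks only, assuming requirement IDs and each task's traces have already been extracted into plain sets (the hypothetical `cross_check` function and `TraceReport` type are illustrative), the logic reduces to set arithmetic. Contradictions need semantic comparison and are out of reach for a check like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class TraceReport:
    coverage_gaps: List[str]      # requirements that no task traces to
    untraceable_tasks: List[str]  # tasks that trace to no known requirement/artifact


def cross_check(
    requirements: Set[str],
    traces: Dict[str, Set[str]],
    artifacts: Optional[Set[str]] = None,
) -> TraceReport:
    """requirements: IDs from spec.md; traces: task ID -> IDs the task claims to serve;
    artifacts: non-requirement anchors (e.g. contracts) a task may legitimately trace to."""
    known = requirements | (artifacts or set())
    covered = set().union(*traces.values())
    return TraceReport(
        coverage_gaps=sorted(requirements - covered),
        untraceable_tasks=sorted(t for t, refs in traces.items() if not (refs & known)),
    )
```

For example, requirements `{"FR-2", "FR-4", "FR-6", "FR-7"}` with traces `{"T005": {"FR-2"}, "T007": {"FR-4"}, "T011": {"FR-6"}, "T013": set()}` report FR-7 as a coverage gap and T013 as untraceable.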

5.1 Example analyze report

# Analyze Report — 001-dq-monitor
 
## ✅ Consistent (summary)
- FR-1/2/3/4 (pipeline/3 check types) → covered by T003–T008
- Constitution "observability first" → reflected in T002
 
## ⚠️ Issues found
| # | Severity | Type | Detail |
|---|---|---|---|
| 1 | HIGH | Coverage gap | No task corresponds to spec FR-7 "notify operators on CheckResult.status=ERROR too." AlertRouter (T011) routes only FAIL. |
| 2 | MED | Contradiction | Spec says "1 alert/day per check" (dedup=day-bucket), but plan.md's Alert Router section describes a "dedup window" of 5 minutes. The dedup definitions disagree. |
| 3 | LOW | Untraceable task | T013's GET /alerts has a `status` filter, but the spec states no alert-filter requirement (nice-to-have vs out-of-scope call needed). |
 
## Recommendations
- Issue 1: extend T011 to "route FAIL+ERROR" or add a new task.
- Issue 2: unify the dedup criterion across spec/plan/data-model (day-bucket recommended).
- Issue 3: add an alert-query requirement to the spec or drop the filter from the task.

Issue 2 shows the real value of analyze. The spec (1 alert/day) and data-model.md (dedup_key = day-bucket) agreed, but plan.md (5-minute window) had quietly drifted from both. analyze surfaces in one pass the kind of inconsistency a human easily misses while reading three documents back and forth. Had you gone into implementation without this step, you would have hit "wait, what was the criterion?" only after coding the entire dedup logic.
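To see why the mismatch is more than wording, here is a toy comparison of the two dedup readings applied to the same stream of FAIL results for one check. Both functions are hypothetical; the 5-minute figure comes from the report, and treating the window as "suppress for five minutes after each sent alert" is an assumption, since plan.md's exact semantics are not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def alerts_day_bucket(fail_times: List[datetime]) -> int:
    """spec / data-model reading: at most one alert per check per UTC day."""
    return len({t.astimezone(timezone.utc).date() for t in fail_times})


def alerts_window(fail_times: List[datetime], window: timedelta = timedelta(minutes=5)) -> int:
    """plan.md reading (assumed semantics): after an alert is sent, suppress
    further alerts for the same check until `window` has elapsed."""
    sent, last_sent = 0, None
    for t in sorted(fail_times):
        if last_sent is None or t - last_sent >= window:
            sent += 1
            last_sent = t
    return sent


# One check failing every 10 minutes from 09:00 to 10:50 UTC (12 FAIL results).
start = datetime(2026, 6, 15, 9, 0, tzinfo=timezone.utc)
fails = [start + timedelta(minutes=10 * i) for i in range(12)]
print(alerts_day_bucket(fails), alerts_window(fails))  # 1 12
```

Under the day-bucket rule the on-call operator gets one alert for the morning's failures; under the 5-minute window they get twelve.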

analyze does not fix. It reveals. How to resolve a found issue (which document to treat as truth) is the human's call. Usually you regress toward the spec and unify it as the source of truth.


6. The whole flow at a glance

Here is how the phases so far connect, from inputs to outputs, and where analyze acts as a gate.

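spec.md + constitution.md
  → /speckit.plan → plan.md, data-model.md, contracts/, research.md, quickstart.md
  → /speckit.tasks → tasks.md
  → /speckit.analyze (gate)
      ├─ issues found → fix spec / plan / tasks, then re-run analyze
      └─ passes → /speckit.implement (Part 6)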

The key is the direction of the arrows. Everything starts from spec.md + constitution.md and grows steadily more concrete, and analyze sends back upward the misalignments newly introduced during that concretization. Only after passing the gate do you descend to implementation.


7. Common pitfalls

Using a tool honestly means knowing its pitfalls too. Here are three we see repeatedly in the plan/tasks phase.

Pitfall 1 — Planning before clarifying

The most expensive mistake. If you run /speckit.plan while ambiguity remains in the spec, the agent fills the blanks however it likes as it designs. The freshness threshold ("30 minutes or 24 hours?") is undecided, yet the plan fixes the architecture on an arbitrary value, tasks pile on top of it, and code climbs over those. The higher a tower built on a wrong assumption rises, the harder it falls. This is exactly why Part 4's /speckit.clarify comes before plan.

Pitfall 2 — Tasks too coarse to verify

If tasks.md contains one-line monsters like "build the check engine" or "implement the API," those tasks are impossible to judge done. Tasks you can't judge done make progress lie ("we're 80% there" stays 80% forever), and analyze loses the unit at which to check traceability. The right task size is "one person finishes in half a day to a day, and can show it's done with a test."

Pitfall 3 — Skipping analyze

The temptation is strong: "tasks are out, let's just implement." But skip analyze and you discover problems like the dedup mismatch above in the middle of implementation. By then you already have code built on a wrong assumption, and you must fix both documents and code. analyze takes 30 seconds to a few minutes, but the rework it prevents takes hours. The gate is insurance, not a toll.

| Pitfall | Symptom | Remedy |
|---|---|---|
| Plan before clarify | The plan bakes in baseless, arbitrary values | Clarify first; state constraints in the plan prompt |
| Coarse tasks | Progress reports lie; completion can't be judged | Size tasks so a test can prove they are done |
| Skip analyze | Inter-document contradictions surface mid-implementation | Make analyze a mandatory gate right after tasks |

Wrapping up

In this post we turned the spec for dq-monitor into an actionable blueprint and task list. /speckit.plan drew out the tech stack and architecture, and data-model.md, contracts/, research.md, and quickstart.md unfolded that design into a verifiable form. /speckit.tasks broke it into dependency-ordered tasks, and /speckit.analyze revealed the misalignments among the three documents before implementation.

We now hold a numbered, traceable, consistency-validated task list, so there is no more agonizing over "where do I start coding?" In Part 6, Implement & Converge, /speckit.implement turns these tasks into code in order and ties them to GitHub issues, and /speckit.converge reconciles the artifacts against the codebase to recover any missed work. That is the moment the spec finally becomes a running service.

References