
Sending Metrics with the Prometheus Pushgateway — client library, raw HTTP, and per-METHOD behavior

How to get metrics from short-lived batch jobs into Prometheus. We cover the Pushgateway's grouping key, pushing with the Python and Java client libraries, pushing with raw HTTP (curl), how the Pushgateway internally handles metrics per PUT/POST/DELETE, the honor_labels setting, and staleness in operations.

Data Dynamics | June 12, 2026 | 8 min read

Prometheus is pull-based: it periodically scrapes targets to collect metrics. A batch or cron job that finishes and disappears within seconds rarely gives Prometheus a chance to scrape it, because a scrape is unlikely to line up with the brief moment the job is alive. The Pushgateway bridges this gap. Just before the job exits, it pushes its metrics to the Pushgateway, which holds the values, and Prometheus scrapes the Pushgateway like any other target.

What this post covers:

When to use the Pushgateway — and when not to
The grouping key, the unit of every push
Pushing with the client libraries (Python and Java)
Pushing with raw HTTP (curl)
How the Pushgateway handles metrics per PUT/POST/DELETE
The Pushgateway's own metrics, the honor_labels setting, and staleness in operations

1. When to use the Pushgateway (and when not to)

The Pushgateway is a metrics cache. It holds pushed values in memory and re-exposes them at /metrics — it does not sum or aggregate. A new value at the same spot simply overwrites the old one.

Use it for

Short-lived batch/cron/ETL jobs — push values like "last success time," "records processed," "duration" right before the job exits.

Don't use it for (anti-patterns)

Metrics of a normal long-running service — just open a /metrics endpoint and let Prometheus pull it (see the earlier post "Building a Prometheus Exporter").
Trying to "send counts from many instances and add them up" — the Pushgateway does not add; the last push overwrites the previous value.
High-frequency or per-request metrics — the Pushgateway becomes a bottleneck and a single point of failure.

In one line: use it only for "jobs that don't live long enough to be scraped."

2. The core concept — grouping key

Every push and delete in the Pushgateway happens per group. A group is identified by its grouping key, expressed in the URL path.

/metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}

job is required, followed by as many label/value pairs as you want.
The job plus the labels in the path form the grouping key and become labels attached to every pushed metric.

For example, pushing to /metrics/job/backup/instance/host1 attaches job="backup" and instance="host1" to all metrics in that group. Every subsequent PUT/POST/DELETE then operates on the group identified by that grouping key — the central premise of section 5.

If a label value contains /, it cannot go directly into the URL path. Use the base64 variant instead: append @base64 to the label name and give the value in URL-safe base64 (/<label>@base64/<encoded value>). The client libraries handle this automatically.
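To make the path rules concrete, here is a minimal sketch that builds the URL path by hand using only the standard library. The function name grouping_path is hypothetical (it is not part of any client library), and the sketch covers only the case described above, a value containing /.

```python
import base64
from urllib.parse import quote


def grouping_path(job, labels=None):
    """Build a Pushgateway URL path from a job name and optional grouping labels."""
    def segment(name, value):
        if "/" in value:
            # A "/" would split the path, so switch to the @base64 variant
            # with URL-safe base64 of the UTF-8 value.
            encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
            return f"{name}@base64/{encoded}"
        # Percent-encode anything else that is unsafe in a path segment.
        return f"{name}/{quote(value, safe='')}"

    parts = [segment("job", job)]
    for name, value in (labels or {}).items():
        parts.append(segment(name, value))
    return "/metrics/" + "/".join(parts)


plain = grouping_path("backup", {"instance": "host1"})
encoded = grouping_path("cleaner", {"path": "/var/tmp"})
print(plain)
print(encoded)
```

The first call produces the same path as the example above; the second shows a value with slashes moved into the @base64 form.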

3. Pushing with a client library

Python — prometheus-client

The canonical pattern is to create a dedicated CollectorRegistry for the job, register only that job's metrics in it, and push it when the job completes.

pip install prometheus-client

from prometheus_client import CollectorRegistry, Gauge, Counter
from prometheus_client import push_to_gateway, pushadd_to_gateway, delete_from_gateway
 
GATEWAY = "localhost:9091"
 
def run_backup():
    registry = CollectorRegistry()              # a registry just for this job
 
    last_success = Gauge(
        "backup_last_success_unixtime",
        "Last success time", registry=registry,
    )
    processed = Counter(
        "backup_records_processed_total",
        "Records processed", registry=registry,
    )
 
    # ... actual backup work ...
    processed.inc(12345)
    last_success.set_to_current_time()
 
    # replace the whole group (job=backup, instance=host1) with these values → PUT
    push_to_gateway(
        GATEWAY, job="backup",
        grouping_key={"instance": "host1"},
        registry=registry,
    )

The three functions each map to a different HTTP METHOD (detailed in section 5).

push_to_gateway(GATEWAY, job="backup", registry=registry)      # PUT  — replace the whole group
pushadd_to_gateway(GATEWAY, job="backup", registry=registry)   # POST — replace same-named only
delete_from_gateway(GATEWAY, job="backup")                     # DELETE — delete the group

If the Pushgateway requires authentication, pass a custom function through the handler argument. For HTTP basic auth, wrap the library's basic_auth_handler:

from prometheus_client.exposition import basic_auth_handler
 
def auth_handler(url, method, timeout, headers, data):
    return basic_auth_handler(url, method, timeout, headers, data, "user", "secret")
 
push_to_gateway(GATEWAY, job="backup", registry=registry, handler=auth_handler)

Java — Micrometer / client_java (bonus)

In Spring Boot, PrometheusPushGatewayManager handles periodic / on-shutdown pushing. It turns on with config alone.

management:
  prometheus:
    metrics:
      export:
        pushgateway:
          enabled: true
          base-url: http://localhost:9091
          job: my-batch-job
          push-rate: 1m
          shutdown-operation: post   # push once more on shutdown (good for batch); older Spring Boot versions name this value "push"

To send directly at a low level, use client_java's PushGateway. The method names reveal the METHOD mapping.

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;
 
public class BackupJob {
    // push/pushAdd/delete throw IOException, so main declares it
    public static void main(String[] args) throws Exception {
        CollectorRegistry registry = new CollectorRegistry();
        Gauge duration = Gauge.build("backup_duration_seconds", "Backup duration")
                .register(registry);
 
        Gauge.Timer timer = duration.startTimer();
        try {
            // ... batch work ...
        } finally {
            timer.setDuration();              // record the elapsed seconds in the gauge
            PushGateway pg = new PushGateway("localhost:9091");
            pg.pushAdd(registry, "backup");   // POST:   replace same-named only
            // pg.push(registry, "backup");   // PUT:    replace the whole group
            // pg.delete("backup");           // DELETE: delete the group
        }
    }
}

4. Pushing with raw HTTP (curl)

You don't need a client library. The body just needs to be exposition-format text, and you encode the grouping key in the URL. curl's --data-binary sends as POST by default.

# push one metric to the job=backup group (POST)
echo "backup_records_processed_total 12345" \
  | curl --data-binary @- http://localhost:9091/metrics/job/backup
 
# add labels to the grouping key (job=backup, instance=host1)
echo "backup_duration_seconds 42.3" \
  | curl --data-binary @- http://localhost:9091/metrics/job/backup/instance/host1

Separate multiple metrics with newlines, and the body must end with a newline. Include # TYPE declarations to preserve the type.

printf '# TYPE backup_records_processed_total counter\nbackup_records_processed_total 12345\nbackup_duration_seconds 42.3\n' \
  | curl --data-binary @- http://localhost:9091/metrics/job/backup

Specifying the METHOD changes the behavior.

# PUT — replace the entire group with this body
echo "backup_duration_seconds 42.3" \
  | curl -X PUT --data-binary @- http://localhost:9091/metrics/job/backup
 
# DELETE — delete the entire group (no body needed)
curl -X DELETE http://localhost:9091/metrics/job/backup
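The same requests can be built without curl. The sketch below uses only Python's standard library and mirrors the curl calls above; build_push_request is a hypothetical helper name, and the gateway address is assumed to be the same localhost:9091 used throughout. It constructs the request without sending it, so it runs without a Pushgateway.

```python
import urllib.request

GATEWAY = "http://localhost:9091"  # assumed address, as in the curl examples


def build_push_request(path, samples, method="POST"):
    """Return an unsent Request whose body is exposition-format text."""
    # One "name value" sample per line; the body has to end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in samples.items())
    return urllib.request.Request(
        GATEWAY + path,
        data=body.encode("utf-8"),
        method=method,
    )


req = build_push_request(
    "/metrics/job/backup", {"backup_duration_seconds": 42.3}, method="PUT"
)
print(req.get_method(), req.full_url)
print(req.data)

# To actually send it (needs a reachable Pushgateway):
# urllib.request.urlopen(req, timeout=5)
```

Keeping construction separate from sending makes the METHOD and the body easy to inspect before anything reaches the gateway.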

5. Per-METHOD internal behavior

Even with the same URL (= same grouping key), the Pushgateway updates the stored group differently depending on the METHOD. This is the most frequently confused part.

Suppose the job="backup" group currently holds metrics A and B, and this time you push only A (a new value).

METHOD	Behavior	Result in the example
POST	Replace only metrics whose name is in the body. Other names in the group are kept	A (new) + B kept
PUT	Replace the entire group with the body. Metrics absent from the body are deleted	A (new) only, B deleted
DELETE	Delete the entire group of the grouping key	group emptied (both A and B deleted)

In other words:

POST (= pushadd/pushAdd) — "update only these metrics." Safe when another job stage pushes its own metrics into the same group.
PUT (= push) — "this is the group's entire state." For wiping and replacing the group cleanly on each run.
DELETE (= delete) — for cleanup when the job is done and you no longer want to expose it.

Note: the unit of replacement for PUT/POST is the metric name (metric family). Same name means the whole family is replaced at once even if labels differ. Also, for any METHOD, a different grouping key is a different group and is unaffected.
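The rules in the table can be replayed with a toy in-memory model. This is only an illustration of the semantics described above, not the Pushgateway's actual implementation; GroupStore and its method names are made up for the example, and each metric family is reduced to a single value.

```python
class GroupStore:
    """Toy model: grouping key -> {metric name -> value}."""

    def __init__(self):
        self.groups = {}

    def post(self, key, metrics):
        # POST: replace only the names present in the body, keep the rest.
        self.groups.setdefault(key, {}).update(metrics)

    def put(self, key, metrics):
        # PUT: the body becomes the group's entire state.
        self.groups[key] = dict(metrics)

    def delete(self, key):
        # DELETE: drop the whole group; other grouping keys are untouched.
        self.groups.pop(key, None)


store = GroupStore()
backup = (("job", "backup"),)
other = (("job", "report"),)
store.put(other, {"A": 99})

store.put(backup, {"A": 1, "B": 2})   # the group starts with A and B
store.post(backup, {"A": 10})         # push only A with POST
after_post = dict(store.groups[backup])

store.put(backup, {"A": 20})          # push only A with PUT
after_put = dict(store.groups[backup])

store.delete(backup)
after_delete = store.groups.get(backup)

print(after_post, after_put, after_delete, store.groups[other])
```

After the POST the group still holds B, after the PUT only A remains, and the DELETE removes the group while the job="report" group is never affected.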

6. The Pushgateway's own metrics

The Pushgateway automatically appends push-status metrics per group.

push_time_seconds{job="backup",...} — the last successful push time for that group
push_failure_time_seconds{...} — the last failed push time

You can monitor whether a batch pushed on time (i.e., freshness) with these values.

# if no new push for over an hour, the batch is stuck
time() - push_time_seconds{job="backup"} > 3600
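The same freshness check can be sketched outside PromQL, for example in a script that reads the Pushgateway's /metrics text. The helper name stale_groups and the sample text are hypothetical, and the regular expression is a simplification that assumes label values contain no closing brace.

```python
import re

# Matches lines such as: push_time_seconds{instance="host1",job="backup"} 1.7e+09
_PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')


def stale_groups(metrics_text, now, max_age_seconds=3600):
    """Return (labels, age) for each group whose last successful push is too old."""
    stale = []
    for line in metrics_text.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if not match:
            continue  # comments and other metrics are ignored
        age = now - float(match.group("value"))
        if age > max_age_seconds:
            stale.append((match.group("labels"), age))
    return stale


sample = (
    "# TYPE push_time_seconds gauge\n"
    'push_time_seconds{instance="host1",job="backup"} 1000\n'
    'push_time_seconds{instance="host2",job="backup"} 4000\n'
)
result = stale_groups(sample, now=5000)
print(result)
```

With now=5000, only the host1 group (pushed 4000 seconds ago) exceeds the one-hour threshold.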

7. Prometheus configuration — honor_labels

scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true                 # preserve the pushed job/instance labels
    static_configs:
      - targets: ["localhost:9091"]

Why is it needed? On a scrape, Prometheus attaches the target's own job and instance labels, taken from the scrape config. When a scraped metric already carries one of those labels, the default (honor_labels: false) resolves the conflict in favor of the target: the pushed job="backup" is renamed to exported_job="backup" and job becomes "pushgateway", so queries and alerts written against job="backup" stop matching. honor_labels: true keeps the labels on the exposed metrics instead, preserving the job/instance you set when pushing.
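The label handling can be modeled in a few lines. The sketch below is a simplified illustration, not Prometheus's code, and apply_target_labels is a hypothetical name. It assumes the documented conflict rule that, without honor_labels, a clashing scraped label is kept under an exported_ prefix while the target's value takes the original name.

```python
def apply_target_labels(scraped, target, honor_labels):
    """Merge the labels on a scraped sample with the scrape target's labels."""
    result = dict(scraped)
    for name, value in target.items():
        if name in scraped and honor_labels:
            continue  # honor_labels: the label that came with the sample wins
        if name in scraped:
            # Conflict without honor_labels: keep the scraped value under exported_<name>.
            result["exported_" + name] = scraped[name]
        result[name] = value
    return result


pushed = {"job": "backup", "instance": "host1"}
target = {"job": "pushgateway", "instance": "localhost:9091"}

without_honor = apply_target_labels(pushed, target, honor_labels=False)
with_honor = apply_target_labels(pushed, target, honor_labels=True)
print(without_honor)
print(with_honor)
```

Only the second result keeps job="backup" under the job label, which is what dashboards and alerts for the batch job expect.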

8. Operations — staleness and cleanup

The Pushgateway's biggest pitfall is that it never forgets.

A pushed value stays in /metrics until you DELETE it or the Pushgateway restarts without a persistence file. The final value of a finished one-off job keeps being exposed, so a dashboard may show an old value as if it were current. Clean up with DELETE once the job's metrics are no longer needed, or watch freshness with push_time_seconds.
The Pushgateway does not store timestamps on metrics. Exposed samples take Prometheus's scrape time, so judge "when was it pushed" by push_time_seconds, not by the sample's timestamp.
All batch metrics funnel through one place, making it a single point of failure. Unless the persistence option --persistence.file is set, a restart also loses every pushed value.
Avoid grouping keys that grow without bound (for example, a unique run_id label per run). Each new key creates a group that stays until it is deleted, so cardinality keeps climbing.

Wrapping up

The Pushgateway is a temporary holding area for "short jobs that pull can't reach." Pushing itself is simple — whether via a client library (push/pushadd/delete) or raw HTTP (PUT/POST/DELETE), it all comes down to updating the group identified by the grouping key according to the METHOD rules. What truly matters is the judgment of when to use it: use an exporter for long-running services, use a different design when you need aggregation, use the Pushgateway only for batch jobs — and don't forget honor_labels and staleness cleanup.

As a next step, build a dashboard that combines push_time_seconds-based freshness alerts with batch success/duration as Grafana panels, so you can see "did last night's batch run correctly" at a glance.