[Kafka Ops 8] Tuning Producer Throughput — Batching, Compression, and Linger Trade-offs

Understand batch.size, linger.ms, compression.type, buffer.memory, max.in.flight, and acks precisely, and learn to balance the throughput-versus-latency trade-off in the Kafka producer.

Data Dynamics · June 7, 2026 · 11 min read

Producer tuning never reduces to a single word like "faster." Raise throughput and latency goes up; cut latency and throughput drops. They are the two ends of the same dial, so good tuning isn't about "catching both rabbits"—it's about deciding which end of the dial your workload belongs to, and tuning toward it. This post starts with how the producer gathers records before sending them, then unpacks—using exact config keys—how batch.size, linger.ms, compression.type, and acks interlock.

What you'll learn in this post

The mental model: records are buffered per-partition and shipped in batches

How batch.size and linger.ms trade throughput for latency

Back-pressure when the buffer fills, via buffer.memory and max.block.ms

How compression.type (lz4/zstd recommended) interacts with batch size

max.in.flight.requests.per.connection, enable.idempotence, and ordering

Throughput-optimized vs low-latency config sets and how to choose

1. The Mental Model — The Producer Gathers Before Sending

Calling producer.send() does not push the record onto the network immediately. The producer buffers records in memory per topic-partition, and once a condition is met, it groups records headed for the same partition into a single batch and sends it to the broker. Without this model, every setting will seem to behave backwards.

The core flow:

send() serializes the record, picks a target partition via the partitioner, appends it to that partition's batch in the RecordAccumulator, and returns immediately (non-blocking).
A background Sender thread (kafka-producer-network-thread) picks up ready batches, groups them per broker, and sends them.
A batch becomes "ready to send" when either of two conditions holds: the batch has filled to batch.size, or linger.ms has elapsed since the batch was created.

So the entire throughput-versus-latency tension compresses into one line: the larger you let batches grow, the higher the per-request efficiency (throughput)—but records wait while they accumulate (latency).

Unit	What it is	Influence
Record	One `send()`	Goes into a batch after serialization
Batch	Records bound for one partition	Closed by `batch.size`/`linger.ms`
Request	Multiple batches bound for one broker	Subject to `max.request.size` and in-flight limits
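The readiness rule can be modeled in a few lines. This is a toy sketch in Python, not the client's RecordAccumulator: the class name PartitionBatch and its methods are hypothetical, time is passed in explicitly, and a batch counts as full once its payload bytes reach batch.size (the real client also accounts for record overhead and for whether the next record still fits).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartitionBatch:
    """Toy model of one partition's open batch (hypothetical, not Kafka's class)."""
    batch_size: int      # bytes, mirrors batch.size
    linger_ms: int       # mirrors linger.ms
    created_ms: int      # time at which the batch was created
    records: list = field(default_factory=list)

    def size_bytes(self) -> int:
        return sum(len(r) for r in self.records)

    def append(self, payload: bytes) -> None:
        self.records.append(payload)

    def ready(self, now_ms: int) -> Optional[str]:
        """Return why the batch may be sent, or None if it should keep waiting."""
        if self.size_bytes() >= self.batch_size:
            return "full"
        if self.records and now_ms - self.created_ms >= self.linger_ms:
            return "linger expired"
        return None


batch = PartitionBatch(batch_size=16384, linger_ms=20, created_ms=0)
batch.append(b"x" * 1000)
print(batch.ready(now_ms=5))    # None: 1000 bytes buffered, batch is 5 ms old
print(batch.ready(now_ms=20))   # linger expired
```

The two checks are independent, which is why a batch on a busy partition can leave long before the linger timer matters.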

2. batch.size and linger.ms — The Throughput/Latency Dial

These two values do most of the work in producer throughput tuning.

batch.size

batch.size is the maximum number of bytes a single partition batch may hold (default 16384 = 16KB). Note it's bytes, not record count. Once a batch reaches this size, it becomes eligible to send immediately, without waiting for linger.ms.

Too small: batches fill quickly and many small requests go out → request overhead, worse compression → throughput loss.
Too large: on low-traffic partitions batches rarely fill, so you end up relying on linger.ms anyway, and you consume more buffer.memory.
Common throughput values: 32KB–256KB (32768–262144).

linger.ms

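linger.ms is how long the producer waits, accumulating more records for the same partition, before sending a batch that has not yet filled (default 0 in clients before Kafka 4.0, 5 from 4.0 onward). Even at 0, batching still happens under load: records that arrive while the Sender is busy with earlier requests pile up into the same batch. Setting 5–100ms explicitly lets you grow batches deliberately.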

# Throughput-oriented example
# 128 KB per partition batch
batch.size=131072
# accumulate for up to 20 ms, then send
linger.ms=20

This means "accumulate for up to 20ms, or until 128KB fills, then send in one shot." Whichever condition is met first triggers the send.

Direction	batch.size	linger.ms	Effect
Low latency	small (16KB)	0	records go out almost immediately, throughput suffers
Balanced	32–64KB	5–10	a little latency buys better throughput
High throughput	128–256KB	20–100	maximizes batch efficiency, latency rises

Key point: linger.ms is not a setting that "adds latency"—it's a setting that "buys throughput." If your traffic is heavy enough that batches fill to batch.size quickly, raising linger.ms adds almost no real latency, because the batch leaves the moment it's full anyway.
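A little arithmetic makes the key point concrete. The function below is a back-of-the-envelope model, not a measurement: it assumes a single partition, a constant byte rate, no compression, and a Sender that picks up the batch the moment it is ready. The function name and the example rates are made up for illustration.

```python
def batch_wait_ms(bytes_per_sec: float, batch_size: int, linger_ms: float) -> float:
    """Longest time the first record of a batch waits before the batch is ready."""
    if bytes_per_sec <= 0:
        return linger_ms                       # nothing else arrives, so linger decides
    fill_ms = batch_size / bytes_per_sec * 1000.0
    return min(fill_ms, linger_ms)             # whichever condition fires first


# batch.size=131072 and linger.ms=20, as in the throughput-oriented example
busy = batch_wait_ms(bytes_per_sec=10_000_000, batch_size=131072, linger_ms=20)
quiet = batch_wait_ms(bytes_per_sec=100_000, batch_size=131072, linger_ms=20)
print(f"busy: {busy:.2f} ms, quiet: {quiet:.2f} ms")   # busy: 13.11 ms, quiet: 20.00 ms
```

At 10 MB/s the batch fills in about 13 ms, so the 20 ms linger never comes into play. At 100 KB/s the batch would need over a second to fill, and linger.ms is what bounds the wait.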

3. buffer.memory and max.block.ms — When the Buffer Fills

Batches don't accumulate for free. All partition batches are allocated from a shared memory pool called buffer.memory (default 32MB). If the broker is slow or the network is congested and the Sender can't drain batches fast enough, this pool fills.

At that point send() is no longer non-blocking. It blocks until space frees up, for at most max.block.ms (default 60000 = 60s). If the wait exceeds that limit, the send fails with a TimeoutException.

# 64 MB: raise it for high throughput or frequent broker latency
buffer.memory=67108864
# max time send()/partitionsFor() may block
max.block.ms=60000

This is the producer's back-pressure point. The behavior:

Situation	Result
Buffer has room	`send()` returns immediately (non-blocking)
Buffer full	`send()` blocks up to `max.block.ms`
`max.block.ms` exceeded	`TimeoutException` — surfaced to the calling thread

Diagnosis: if application threads stall in send() or you see TimeoutException, it almost always signals "the producer is generating faster than it can send." Raising buffer.memory is a band-aid; the root cause is often insufficient broker throughput, acks=all latency, too few partitions, or the network.
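To see why a bigger buffer only buys time, it helps to estimate how long the pool lasts when the application outpaces the Sender. This is a rough model under stated assumptions (constant rates, a pool that starts empty, allocation granularity ignored). The function name and the rates are illustrative.

```python
import math


def seconds_until_buffer_full(buffer_memory: int, produce_bps: float, drain_bps: float) -> float:
    """Time until buffer.memory is exhausted; math.inf if the Sender keeps up."""
    backlog_bps = produce_bps - drain_bps
    if backlog_bps <= 0:
        return math.inf
    return buffer_memory / backlog_bps


# Application produces 40 MB/s while the Sender drains 30 MB/s.
default_pool = seconds_until_buffer_full(33554432, 40_000_000, 30_000_000)
doubled_pool = seconds_until_buffer_full(67108864, 40_000_000, 30_000_000)
print(f"{default_pool:.1f} s, {doubled_pool:.1f} s")   # 3.4 s, 6.7 s
```

Doubling buffer.memory doubles the grace period and nothing more. Once the pool is full, send() starts blocking, and the max.block.ms clock begins.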

4. compression.type — Batches Get Compressed

Compression is the highest-leverage knob in throughput tuning. compression.type makes the producer compress the payload per batch.

Value	Ratio	CPU cost	Notes
`none`	none	none	default
`gzip`	high	high	great ratio but CPU-heavy
`snappy`	medium	low	fast but modest ratio
`lz4`	medium–high	low	good throughput/CPU balance — recommended
`zstd`	high	medium	excellent ratio/speed balance — recommended

The key is that compression happens per batch, so the bigger the batch (the more records compressed together), the better the ratio. Batch tuning (section 2) and compression therefore amplify each other: compressing a small batch yields little, because there is little redundancy to exploit.

compression.type=lz4
batch.size=131072
linger.ms=20

For balanced throughput and ratio, lz4 or zstd is recommended. For repetitive data like logs, zstd saves significant network bandwidth and disk. Since compression consumes producer CPU, lz4 is the safer choice when CPU cores are tight. Brokers store the compressed batch as-is by default, so disk usage and replication traffic drop as a side effect. This relies on the broker- and topic-level compression.type staying at its default, producer; setting a different codec there makes the broker re-compress incoming batches.
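The per-batch effect is easy to reproduce with any general-purpose compressor. The sketch below uses zlib from the Python standard library purely as a stand-in, because lz4 and zstd are not in the standard library; absolute numbers will differ by codec, but the direction should not. The records are hypothetical log-like payloads.

```python
import json
import zlib


def compressed_sizes(records):
    """Return (total bytes when each record is compressed alone,
    bytes when all records are compressed together as one batch)."""
    alone = sum(len(zlib.compress(r)) for r in records)
    together = len(zlib.compress(b"".join(records)))
    return alone, together


# Hypothetical records with the repetitive structure typical of logs.
records = [
    json.dumps({"service": "checkout", "level": "INFO", "status": 200, "request_id": i}).encode()
    for i in range(500)
]
raw = sum(len(r) for r in records)
alone, together = compressed_sizes(records)
print(f"raw={raw} alone={alone} together={together}")
```

Compressing records one by one leaves each compressor with almost no repetition to work with, while the batched form can reuse the shared field names across all 500 records.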

5. max.in.flight and Ordering

max.in.flight.requests.per.connection is the number of unacknowledged requests that can be in flight per broker connection (default 5). The higher it is, the more you can keep pushing the next batch without waiting for the network round-trip—raising throughput. But combined with retries, ordering can break.

Consider this: in-flight is 2 and idempotence is off.

Batch A then batch B are sent back-to-back (both in flight).
Batch A fails on a transient error; batch B succeeds.
The producer retries batch A → A is written to the broker after B.
Result: within the same partition, A and B end up reordered.
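The scenario can be replayed with a toy model of one partition's log. Everything here is a simplification for illustration: the function name is hypothetical, all requests in one in-flight window are sent before any response is handled, and a failed batch is retried at the front of the next window.

```python
def write_log(batches, fail_first_attempt, max_in_flight):
    """Return the order in which batches land in the log, with retries and no idempotence.

    batches: batch names in send order.
    fail_first_attempt: names whose first attempt hits a transient error.
    """
    log, queue, failed_once = [], list(batches), set()
    while queue:
        window, queue = queue[:max_in_flight], queue[max_in_flight:]
        retries = []
        for name in window:
            if name in fail_first_attempt and name not in failed_once:
                failed_once.add(name)
                retries.append(name)      # transient error: schedule a retry
            else:
                log.append(name)          # broker appends the batch
        queue = retries + queue           # retries go out before new batches
    return log


print(write_log(["A", "B"], {"A"}, max_in_flight=2))   # ['B', 'A']
print(write_log(["A", "B"], {"A"}, max_in_flight=1))   # ['A', 'B']
```

With one request in flight, B is not sent until A has succeeded, so a retry cannot be overtaken.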

The canonical fix is the idempotent producer.

# default in Kafka 3.0+; sequence numbers prevent duplicates and reordering
enable.idempotence=true
# ordering is preserved with up to 5 in-flight requests when idempotent
max.in.flight.requests.per.connection=5
# required by idempotence
acks=all

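With enable.idempotence=true, the producer is assigned a producer ID and stamps every batch with sequence numbers, which the broker validates per partition. That guarantees in-partition ordering and duplicate-free writes within a producer session, even when retries occur. You can therefore keep in-flight at 5 and still preserve order. The broker does not reorder anything: it rejects a batch whose sequence number leaves a gap, and the producer resends the outstanding batches in their original order. Idempotence requires acks=all, retries>0, and max.in.flight.requests.per.connection<=5; explicitly combining enable.idempotence=true with a conflicting value fails with a ConfigException.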
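A minimal sketch of the broker-side check shows how sequence numbers handle both problems. This is a hypothetical model, not broker code: it tracks one producer and one partition, and uses one sequence number per batch, whereas real sequence numbers advance per record and are tracked together with the producer ID and epoch.

```python
class PartitionLog:
    """Toy sequence validation for a single producer writing to one partition."""

    def __init__(self):
        self.next_seq = 0
        self.entries = []

    def append(self, seq, name):
        if seq < self.next_seq:
            return "duplicate"        # already written: acknowledge without appending again
        if seq > self.next_seq:
            return "out_of_order"     # gap detected: reject so the producer resends in order
        self.entries.append(name)
        self.next_seq += 1
        return "ok"


log = PartitionLog()
results = [
    log.append(1, "B"),   # B arrives first because A's first attempt was lost
    log.append(0, "A"),   # retry of A
    log.append(1, "B"),   # resend of B
    log.append(0, "A"),   # a late duplicate of A
]
print(results)        # ['out_of_order', 'ok', 'ok', 'duplicate']
print(log.entries)    # ['A', 'B']
```

The log ends up in send order with no duplicate, even though the network delivered the batches out of order and one of them twice.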

Setting	Ordering	Throughput
idempotence OFF, in-flight=1	guaranteed (one-at-a-time, safe on retry)	low
idempotence OFF, in-flight>1 + retries	can break	high
idempotence ON, `in-flight<=5`	guaranteed	high

The ordering mechanism as a whole is covered in depth in Part 10 (Message Ordering and Partitioning) of this series. Here, just remember: "to raise in-flight for throughput, turn on idempotence."

6. How acks Interacts with Throughput

acks is the acknowledgment level the producer requires to consider a send "successful." It is tied directly to durability and also affects throughput and latency.

`acks`	Meaning	Durability	Throughput/latency
`0`	don't wait for any ack	very low (can lose data)	fastest
`1`	leader confirms write	medium (loss on leader failure)	fast
`all` (`-1`)	full ISR replication confirmed	high (recommended default)	added latency

acks=all waits until all in-sync replicas (ISR) have replicated the write, so per-request latency rises. This tempts the "let's drop to acks=1 for throughput" move, but that is a trade that sells durability. The key insight is that the extra latency of acks=all is paid per request. So if you grow batches (section 2) to cut the number of requests, you can recover throughput substantially while keeping acks=all. The right first move is not lowering acks, but tuning batching, compression, and in-flight.
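The per-request argument can be put into numbers with a simple pipeline bound. This is a deliberately crude model: each request is assumed to carry exactly one batch, the in-flight window is assumed to stay full, and request latency is assumed not to grow with batch size. The 10 ms latency is a made-up figure for illustration.

```python
def max_throughput_bytes_per_sec(batch_bytes, in_flight, request_latency_ms):
    """Upper bound on bytes/s for one broker connection at a fixed per-request latency."""
    requests_per_sec = in_flight * 1000.0 / request_latency_ms
    return requests_per_sec * batch_bytes


# Same acks=all latency (10 ms, hypothetical) and in-flight=5, two batch sizes.
small = max_throughput_bytes_per_sec(16384, in_flight=5, request_latency_ms=10)
large = max_throughput_bytes_per_sec(131072, in_flight=5, request_latency_ms=10)
print(f"{small / 1e6:.1f} MB/s vs {large / 1e6:.1f} MB/s")   # 8.2 MB/s vs 65.5 MB/s
```

The replication round trip is identical in both cases; the larger batch simply amortizes it over eight times as many bytes.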

The durability meaning of acks and its relationship to ISR and min.insync.replicas is explained in detail in Part 4 (Replication, acks, and Data Durability) of this series.

7. Tuning Sets by Workload

Let's bundle everything above into two representative goals. The absolute values are only starting points—always measure and adjust on your own workload.

Optimize for throughput (batch ETL, log ingestion, bulk loads)

# 256 KB: large batches
batch.size=262144
# accumulate before sending
linger.ms=50
# high ratio (when CPU allows)
compression.type=zstd
# 128 MB
buffer.memory=134217728
# keep durability (batching offsets the cost)
acks=all
enable.idempotence=true
max.in.flight.requests.per.connection=5

Optimize for low latency (real-time alerts, transactional events, user-facing paths)

# small batch (default)
batch.size=16384
# send immediately
linger.ms=0
# light compression
compression.type=lz4
# 32 MB (default)
buffer.memory=33554432
# keep durability
acks=all
enable.idempotence=true
max.in.flight.requests.per.connection=5

Dial	Throughput-first	Latency-first
`batch.size`	large (128–256KB)	small (16KB)
`linger.ms`	20–100	0
`compression.type`	zstd	lz4 or none
`buffer.memory`	large (64–128MB)	default (32MB)
`acks`	all	all
`enable.idempotence`	true	true

Note that both sets keep acks=all and enable.idempotence=true in common. Whether throughput or latency, durability and ordering are not negotiable—they're the starting line.
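If the two sets live in code, the shared and differing keys can be derived rather than remembered. The snippet below is one possible way to keep them side by side; the dictionary names and helper functions are hypothetical, and the values are copied from the two sets in this section.

```python
THROUGHPUT = {
    "batch.size": 262144, "linger.ms": 50, "compression.type": "zstd",
    "buffer.memory": 134217728, "acks": "all", "enable.idempotence": "true",
    "max.in.flight.requests.per.connection": 5,
}
LOW_LATENCY = {
    "batch.size": 16384, "linger.ms": 0, "compression.type": "lz4",
    "buffer.memory": 33554432, "acks": "all", "enable.idempotence": "true",
    "max.in.flight.requests.per.connection": 5,
}


def split_keys(a, b):
    """Return (keys with equal values in both configs, keys whose values differ)."""
    shared = sorted(k for k in a if a[k] == b.get(k))
    differing = sorted(k for k in a if a[k] != b.get(k))
    return shared, differing


def to_properties(config):
    """Render a config dict as key=value lines, one per setting."""
    return "\n".join(f"{k}={v}" for k, v in config.items())


shared, differing = split_keys(THROUGHPUT, LOW_LATENCY)
print("shared:", shared)
print("differing:", differing)
```

The shared list comes out as acks, enable.idempotence, and the in-flight limit, which matches the point that only the batching, compression, and buffer dials move between workloads.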

8. Don't Sell Off the Durability Defaults Carelessly

Under throughput pressure, the first instinct is to drop to acks=0/1 and turn off enable.idempotence. The immediate numbers improve, but in exchange you take on message loss, duplicates, and reordering—the kinds of problems hardest to debug in production.

The right priority order:

Squeeze with batching and compression first — raise batch.size, raise linger.ms, set compression.type=lz4/zstd. Most throughput problems resolve here.
If still short, raise in-flight and buffer — but with idempotence on.
Suspect partition count and broker resources — if one producer isn't enough, add topic partitions for parallelism.
Touch durability settings last of all, and only by explicit agreement — only when there's a decision like "this topic carries metrics where loss is acceptable."

Checklist: when throughput is low → ① Did you raise batch.size/linger.ms? ② Did you enable compression? ③ Is buffer.memory sufficient? ④ Is partition count capping parallelism? Only after all four should you suspect acks.
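The checklist can be turned into a small review helper. This is a sketch with loudly assumed thresholds: the defaults it falls back to (batch.size=16384, linger.ms=0, no compression, 32 MB buffer) are the values used in this post, the buffer test is a rough heuristic (room for at least one full batch per partition), and the partition test only flags the single-partition case. None of it replaces measuring.

```python
def throughput_checklist(config, partitions):
    """Return the checklist items, in order, that still look untuned."""
    batch_size = int(config.get("batch.size", 16384))
    findings = []
    if batch_size <= 16384 and int(config.get("linger.ms", 0)) == 0:
        findings.append("1: batch.size/linger.ms are still at low-latency values")
    if config.get("compression.type", "none") == "none":
        findings.append("2: compression is off")
    if int(config.get("buffer.memory", 33554432)) < batch_size * partitions:
        findings.append("3: buffer.memory cannot hold one batch per partition")
    if partitions <= 1:
        findings.append("4: a single partition caps parallelism")
    return findings


for item in throughput_checklist({}, partitions=1):
    print(item)
```

Only when a function like this returns an empty list does it make sense to open the discussion about acks.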

Wrapping up

The producer buffers records per partition and sends them in batches. Throughput and latency are two ends of one dial, and tuning is deciding which end to stand on.
batch.size (max batch bytes) and linger.ms (how long to wait to fill) are that dial. Raise both → throughput↑, latency↑.
When buffer.memory fills, send() blocks up to max.block.ms — the producer's back-pressure point.
compression.type compresses per batch, so it synergizes with large batches. The balanced choice is lz4 or zstd.
To gain throughput by raising in-flight, keep ordering with enable.idempotence=true. When idempotent, in-flight up to 5 is safe.
The latency of acks=all is per request, so growing batches recovers throughput while keeping durability. Don't trade the durability defaults for throughput.

References

Apache Kafka Documentation — Producer Configs: https://kafka.apache.org/documentation/#producerconfigs

Part 4 — Replication, acks, and Data Durability

Part 10 — Message Ordering and Partitioning

— The Data Dynamics Engineering Team