kafkadisaster-recoverymirrormaker2replicationoperations

[Kafka DR 2] Building MirrorMaker 2 in Practice — Starting with One-Way Replication

A hands-on guide to building one-way DR replication from scratch: MirrorMaker 2's architecture (three Connect-based connectors), replication policy and topic naming, a real mm2.properties walkthrough, running it, and verifying replication.

Data DynamicsJune 14, 202612 min read

The second Kafka cluster you stand up for disaster recovery (DR) is nothing but an empty shell until data starts flowing into it. In Part 1, we set the goals — why DR matters, plus targets like RPO and RTO. Now for the main event: how do you replicate messages from the primary cluster to the DR cluster in real time? The answer is MirrorMaker 2 (MM2), the tool Apache Kafka ships for exactly this purpose. In this part, we build the most fundamental and most important piece — one-way replication — by hand, from scratch.

What you'll learn in this post

How MM2 runs on top of Kafka Connect, and its three core connectors

The Replication Policy and topic-naming rules, and how they break replication loops

A line-by-line mm2.properties you can use in production

How to launch MM2 with connect-mirror-maker.sh and verify replication

Why data replication alone cannot fail consumers over (teaser for Part 3)

1. MirrorMaker 2 Runs on Top of Kafka Connect

Why MM2

The original MirrorMaker (so-called MM1) was little more than a consumer bolted to a producer. It replicated neither topic configs nor consumer offsets, and if topics were added at runtime you had to chase them manually. MirrorMaker 2, introduced by KIP-382, solved these limitations by rebuilding the whole thing on the Kafka Connect framework.

That it sits on Connect is the crucial fact. MM2 inherits all of Connect's infrastructure for free: distributed execution, automatic rebalancing, offset tracking, and REST-based management. Operating MM2 is, in practice, the same as operating three kinds of Connect connectors.

The three connectors

For each replication direction (source → target), MM2 launches the following three connectors.

Connector	Role	Output
MirrorSourceConnector	Replicates record data from source topics to the target	Mirror topics like `primary.orders`
MirrorCheckpointConnector	Converts and replicates consumer group offsets as checkpoints	`<source>.checkpoints.internal`
MirrorHeartbeatConnector	Emits periodic heartbeats to measure connectivity and lag	`heartbeats` topic

The key is that the three connectors have clearly separated responsibilities.

MirrorSourceConnector replicates the actual messages (payloads). It periodically refreshes the topic list and follows along as new partitions appear. It also syncs topic configs and ACLs, depending on the options.
MirrorCheckpointConnector replicates not messages but the offset positions of consumer groups. You cannot reuse the source cluster's offset numbers directly (mirror topics are numbered differently), so MM2 records a source-offset → target-offset mapping (offset-syncs) and builds "checkpoints" on top of it. These checkpoints play the decisive role in the failover covered in Part 3.
MirrorHeartbeatConnector stamps heartbeat records into the source cluster at a fixed interval. By observing how long it takes those records to reach the target, you can measure end-to-end replication lag.

Loading diagram…

2. Replication Policy and Topic Naming

DefaultReplicationPolicy — stamp provenance with a prefix

The first thing MM2 must decide when replicating a topic is what to name the topic on the target. The default, DefaultReplicationPolicy, prefixes the source cluster alias.

Topic on source (primary):   orders
Mirror topic on target (dr): primary.orders

This prefix rule solves two problems at once.

Loop prevention: in a bidirectional (active-active) setup, if primary.orders were replicated back to primary it would become dr.primary.orders — but MM2 does not re-replicate a topic that already carries its own alias as a prefix. The prefix is the safety mechanism that breaks the loop.
Clear provenance: seeing primary.orders on the DR cluster tells you, from the name alone, "this is a mirror that flowed in from primary."

IdentityReplicationPolicy — keep the name as-is

Depending on your organization, the prefix can be a nuisance (existing consumers subscribe to orders, but on DR they'd have to subscribe to primary.orders). In that case IdentityReplicationPolicy replicates orders → orders without renaming.

Policy	Target topic name	Pros	Risk
`DefaultReplicationPolicy`	`primary.orders`	Automatic loop prevention, clear provenance	Consumers must be aware of the prefix
`IdentityReplicationPolicy`	`orders`	No consumer changes, simpler failover	No loop prevention → one-way only, or explicit topic filters required

Caution: IdentityReplicationPolicy cannot break loops via a prefix, so using it as-is in a bidirectional setup can cause infinite replication. It's relatively safe for one-way DR, but you must set explicit topics / topics.exclude filters to keep mirror topics from being sent back. The one-way example in this post uses the safe default, DefaultReplicationPolicy.

Internal topics

MM2 stores the metadata it needs in internal topics on the clusters. The names are designed so you can guess each one's role at a glance.

Topic	Location	Role
`heartbeats`	source	Heartbeat origin (mirrored to the target as `<source>.heartbeats`)
`<source>.checkpoints.internal`	target	Consumer group offset checkpoints
`mm2-offset-syncs.<target>.internal`	source (or target)	Source-offset ↔ target-offset mapping

mm2-offset-syncs is the table that records "offset 100 on the source is offset what on the target." Checkpoints only become meaningful when this table exists, and it's what lets you seat a consumer at the correct position on failover.

3. mm2.properties, Line by Line

Now let's write the actual config file. MM2 describes cluster definitions, replication directions, and replication targets all in a single config file. Below is a real-world example for one-way primary → dr DR replication.

# ── Cluster alias definitions ────────────────────────
# The aliases you set here (primary, dr) are used directly in topic prefixes and flow definitions.
clusters = primary, dr
 
# ── Bootstrap servers per cluster ────────────────────
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092,primary-broker3:9092
dr.bootstrap.servers      = dr-broker1:9092,dr-broker2:9092,dr-broker3:9092
 
# (If needed) security settings are also specified per cluster with a prefix
# primary.security.protocol = SASL_SSL
# primary.sasl.mechanism    = SCRAM-SHA-512
# primary.sasl.jaas.config  = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2" password="***";
 
# ── Enable replication flow ──────────────────────────
# Replicate only in the primary -> dr direction (one-way). The reverse is left unspecified, so disabled.
primary->dr.enabled = true
dr->primary.enabled = false
 
# ── What to replicate ────────────────────────────────
# Replicate all user topics/groups. In production, a regex whitelist is recommended.
primary->dr.topics = .*
primary->dr.groups = .*
# Exclusions (internal/temporary topics, etc.). Add here if you need more than MM2's default exclude patterns.
primary->dr.topics.exclude = .*[\-\.]internal, .*\.replica, __.*
 
# ── Replication factor for mirror topics ─────────────
# Set to match the number of brokers in the DR cluster (e.g., 3 for a 3-node cluster).
replication.factor = 3
 
# ── Replication factor for MM2 internal topics ───────
checkpoints.topic.replication.factor       = 3
heartbeats.topic.replication.factor        = 3
offset-syncs.topic.replication.factor      = 3
offset.storage.replication.factor          = 3
status.storage.replication.factor          = 3
config.storage.replication.factor          = 3
 
# ── Topic config / ACL sync ──────────────────────────
# Keep target topics aligned with the source's configs (partition count, retention, etc.)
sync.topic.configs.enabled = true
# Sync source topic ACLs to the target (recommended for secured clusters)
sync.topic.acls.enabled    = true
 
# ── How often to track topic/group changes ───────────
# How quickly to catch up when new topics, partitions, or groups appear (seconds)
refresh.topics.interval.seconds = 30
refresh.groups.interval.seconds = 30
 
# ── Checkpoint / heartbeat emission ──────────────────
emit.checkpoints.enabled         = true
emit.checkpoints.interval.seconds = 30
emit.heartbeats.enabled          = true
emit.heartbeats.interval.seconds = 5
 
# ── Replication policy (default, made explicit) ──────
replication.policy.class = org.apache.kafka.connect.mirror.DefaultReplicationPolicy
 
# ── Connector task parallelism ───────────────────────
# Increase when you have many high-throughput topics, to spread partitions across more tasks.
tasks.max = 4

Options worth flagging

clusters / *.bootstrap.servers: the starting point for everything. The aliases (primary, dr) become topic prefixes, so use meaningful names.
primary->dr.enabled = true: replication is enabled/disabled per "flow." For one-way, you don't enable the reverse (dr->primary.enabled).
topics = .*: start narrow (a whitelist). .* is convenient, but it will also chase compacted and temporary topics, eating disk and network.
sync.topic.configs.enabled: turn this on so the target follows along when you add partitions or change retention on the source. It's the core of DR consistency.
emit.checkpoints.enabled: must be true for failover. This is where the basis for moving consumer offsets is created (we use it in earnest in Part 3).
refresh.topics.interval.seconds: the maximum delay before a new topic appears on DR. Too short means metadata load; too long means new topics are missed for longer. Around 30 seconds is a sensible default.

4. Running MM2

Dedicated mode: connect-mirror-maker.sh

The simplest way to run it is the dedicated script bundled with the Kafka distribution. Pass the config file you just wrote as an argument, and MM2 brings up the required Connect workers and all three connectors at once.

# From the Kafka install directory
$ bin/connect-mirror-maker.sh config/mm2.properties
 
# To run distributed across multiple nodes, run the same command with the same config on each node.
# (They form one group, and Connect automatically rebalances tasks.)
$ bin/connect-mirror-maker.sh config/mm2.properties

In this "dedicated mode," MM2 drives the Connect workers itself internally, so you don't need to stand up a separate Connect cluster beforehand. This is the fastest approach early in a DR build-out.

Running them as connectors on an existing Connect cluster

If you already operate a Kafka Connect cluster, you can also register MM2's three connectors like ordinary connectors over REST. This lets you reuse your existing Connect monitoring and deployment pipeline, which is helpful for operational standardization.

# Register a MirrorSourceConnector over the Connect REST API (example)
$ curl -X PUT http://connect:8083/connectors/primary-to-dr-source/config \
  -H "Content-Type: application/json" \
  -d '{
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "primary",
    "target.cluster.alias": "dr",
    "source.cluster.bootstrap.servers": "primary-broker1:9092",
    "target.cluster.bootstrap.servers": "dr-broker1:9092",
    "topics": ".*",
    "replication.factor": "3",
    "sync.topic.configs.enabled": "true",
    "tasks.max": "4"
  }'

How to choose: for PoCs or small setups, dedicated mode (connect-mirror-maker.sh) is simpler. For organizations that already have Connect operations and monitoring in place, running them as connectors on an existing Connect cluster is easier to manage long-term.

5. Verifying That Replication Works

Launching it isn't the end. Confirm in three steps that data is actually flowing.

5-1. Did the mirror topics appear on DR?

First, list topics on the DR cluster and check that mirror topics with the primary. prefix have appeared.

$ bin/kafka-topics.sh --bootstrap-server dr-broker1:9092 --list
 
heartbeats
mm2-offset-syncs.dr.internal
primary.checkpoints.internal
primary.heartbeats
primary.orders          # ← primary's orders, replicated
primary.payments        # ← primary's payments, replicated

If you see primary.orders, the MirrorSourceConnector is working correctly.

5-2. Watch the replication lag

Produce one message to the source and watch whether the end offset of the target mirror topic climbs to follow it.

# End offset of the source topic
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --bootstrap-server primary-broker1:9092 --topic orders --time -1
 
# End offset of the target mirror topic (should follow, with a delay)
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --bootstrap-server dr-broker1:9092 --topic primary.orders --time -1

The difference between the two values approximates the replication lag. If that gap keeps widening over time, suspect a shortage of tasks (tasks.max) or network bandwidth.

5-3. Confirm end-to-end connectivity with heartbeats

The most intuitive health check is to read the heartbeat topic directly.

$ bin/kafka-console-consumer.sh \
    --bootstrap-server dr-broker1:9092 \
    --topic primary.heartbeats \
    --from-beginning --max-messages 5

If records keep flowing in, the entire primary → dr path is alive. When heartbeats stop, something on the replication path has broken — which makes them a great first-line alerting signal.

6. But Does This Give You Failover?

Get this far and the DR cluster accumulates every message from primary, neatly. The data is safe. So when primary dies, can you just move consumers over to dr and be done?

No. And here lies DR's biggest trap.

The offsets of the mirror topic primary.orders are numbered differently from the original orders. Just because a consumer group read "up to offset 1,000,000" on the source, reading offset 1,000,000 from primary.orders on dr lands you in the wrong place.
That's why MM2 builds a "source-offset → target-offset" translation table using mm2-offset-syncs and checkpoints. Only by applying this translation on failover can a consumer resume without duplicates or gaps.
This offset translation, migrating consumer groups using RemoteClusterUtils / MirrorClient, and the failover procedure in active-active bidirectional setups are the subject of the next part.

In short, data replication is only half of DR. Without the other half — "how to move consumers over without a gap" — the moment a failure hits, your operations team won't know where to start reading again.

Wrapping up

MM2 runs on top of Kafka Connect and consists of three connectors per replication direction — MirrorSource (data), MirrorCheckpoint (offsets), MirrorHeartbeat (connectivity).
DefaultReplicationPolicy prefixes the source alias (orders → primary.orders) to make provenance clear and break replication loops. IdentityReplicationPolicy preserves the name but has no loop prevention, so use it with care.
One-way DR needs nothing more than a single mm2.properties. The key switches are primary->dr.enabled = true, the topics/groups filters, replication.factor, sync.topic.configs.enabled, and emit.checkpoints.enabled.
Start fastest with connect-mirror-maker.sh (dedicated mode); when you need operational standardization, run them as connectors on an existing Connect cluster.
Verify in three steps: mirror topic creation → replication lag → heartbeats.
Data replication alone does not give you consumer failover. Offset translation and consumer group migration are covered in [Kafka DR 3].

References

Apache Kafka. "Geo-Replication (Cross-Cluster Data Mirroring)" — https://kafka.apache.org/documentation/#georeplication

KIP-382: MirrorMaker 2.0 — https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0

Apache Kafka. "MirrorMaker 2" configuration reference — https://kafka.apache.org/documentation/#mirrormaker

— The Data Dynamics Engineering Team