[Kafka DR 2] Building MirrorMaker 2 in Practice — Starting with One-Way Replication
A hands-on guide to building one-way DR replication from scratch: MirrorMaker 2's architecture (three Connect-based connectors), replication policy and topic naming, a real mm2.properties walkthrough, running it, and verifying replication.
The second Kafka cluster you stand up for disaster recovery (DR) is nothing but an empty shell until data starts flowing into it. In Part 1, we set the goals — why DR matters, plus targets like RPO and RTO. Now for the main event: how do you replicate messages from the primary cluster to the DR cluster in real time? The answer is MirrorMaker 2 (MM2), the tool Apache Kafka ships for exactly this purpose. In this part, we build the most fundamental and most important piece — one-way replication — by hand, from scratch.
What you'll learn in this post
- How MM2 runs on top of Kafka Connect, and its three core connectors
- The Replication Policy and topic-naming rules, and how they break replication loops
- A line-by-line
mm2.propertiesyou can use in production- How to launch MM2 with
connect-mirror-maker.shand verify replication- Why data replication alone cannot fail consumers over (teaser for Part 3)
1. MirrorMaker 2 Runs on Top of Kafka Connect
Why MM2
The original MirrorMaker (so-called MM1) was little more than a consumer bolted to a producer. It replicated neither topic configs nor consumer offsets, and if topics were added at runtime you had to chase them manually. MirrorMaker 2, introduced by KIP-382, solved these limitations by rebuilding the whole thing on the Kafka Connect framework.
That it sits on Connect is the crucial fact. MM2 inherits all of Connect's infrastructure for free: distributed execution, automatic rebalancing, offset tracking, and REST-based management. Operating MM2 is, in practice, the same as operating three kinds of Connect connectors.
The three connectors
For each replication direction (source → target), MM2 launches the following three connectors.
| Connector | Role | Output |
|---|---|---|
| MirrorSourceConnector | Replicates record data from source topics to the target | Mirror topics like primary.orders |
| MirrorCheckpointConnector | Converts and replicates consumer group offsets as checkpoints | <source>.checkpoints.internal |
| MirrorHeartbeatConnector | Emits periodic heartbeats to measure connectivity and lag | heartbeats topic |
The key is that the three connectors have clearly separated responsibilities.
- MirrorSourceConnector replicates the actual messages (payloads). It periodically refreshes the topic list and follows along as new partitions appear. It also syncs topic configs and ACLs, depending on the options.
- MirrorCheckpointConnector replicates not messages but the offset positions of consumer groups. You cannot reuse the source cluster's offset numbers directly (mirror topics are numbered differently), so MM2 records a source-offset → target-offset mapping (offset-syncs) and builds "checkpoints" on top of it. These checkpoints play the decisive role in the failover covered in Part 3.
- MirrorHeartbeatConnector stamps heartbeat records into the source cluster at a fixed interval. By observing how long it takes those records to reach the target, you can measure end-to-end replication lag.
2. Replication Policy and Topic Naming
DefaultReplicationPolicy — stamp provenance with a prefix
The first thing MM2 must decide when replicating a topic is what to name the topic on the target. The default, DefaultReplicationPolicy, prefixes the source cluster alias.
Topic on source (primary): orders
Mirror topic on target (dr): primary.ordersThis prefix rule solves two problems at once.
- Loop prevention: in a bidirectional (active-active) setup, if
primary.orderswere replicated back to primary it would becomedr.primary.orders— but MM2 does not re-replicate a topic that already carries its own alias as a prefix. The prefix is the safety mechanism that breaks the loop. - Clear provenance: seeing
primary.orderson the DR cluster tells you, from the name alone, "this is a mirror that flowed in from primary."
IdentityReplicationPolicy — keep the name as-is
Depending on your organization, the prefix can be a nuisance (existing consumers subscribe to orders, but on DR they'd have to subscribe to primary.orders). In that case IdentityReplicationPolicy replicates orders → orders without renaming.
| Policy | Target topic name | Pros | Risk |
|---|---|---|---|
DefaultReplicationPolicy | primary.orders | Automatic loop prevention, clear provenance | Consumers must be aware of the prefix |
IdentityReplicationPolicy | orders | No consumer changes, simpler failover | No loop prevention → one-way only, or explicit topic filters required |
Caution:
IdentityReplicationPolicycannot break loops via a prefix, so using it as-is in a bidirectional setup can cause infinite replication. It's relatively safe for one-way DR, but you must set explicittopics/topics.excludefilters to keep mirror topics from being sent back. The one-way example in this post uses the safe default,DefaultReplicationPolicy.
Internal topics
MM2 stores the metadata it needs in internal topics on the clusters. The names are designed so you can guess each one's role at a glance.
| Topic | Location | Role |
|---|---|---|
heartbeats | source | Heartbeat origin (mirrored to the target as <source>.heartbeats) |
<source>.checkpoints.internal | target | Consumer group offset checkpoints |
mm2-offset-syncs.<target>.internal | source (or target) | Source-offset ↔ target-offset mapping |
mm2-offset-syncs is the table that records "offset 100 on the source is offset what on the target." Checkpoints only become meaningful when this table exists, and it's what lets you seat a consumer at the correct position on failover.
3. mm2.properties, Line by Line
Now let's write the actual config file. MM2 describes cluster definitions, replication directions, and replication targets all in a single config file. Below is a real-world example for one-way primary → dr DR replication.
# ── Cluster alias definitions ────────────────────────
# The aliases you set here (primary, dr) are used directly in topic prefixes and flow definitions.
clusters = primary, dr
# ── Bootstrap servers per cluster ────────────────────
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092,primary-broker3:9092
dr.bootstrap.servers = dr-broker1:9092,dr-broker2:9092,dr-broker3:9092
# (If needed) security settings are also specified per cluster with a prefix
# primary.security.protocol = SASL_SSL
# primary.sasl.mechanism = SCRAM-SHA-512
# primary.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2" password="***";
# ── Enable replication flow ──────────────────────────
# Replicate only in the primary -> dr direction (one-way). The reverse is left unspecified, so disabled.
primary->dr.enabled = true
dr->primary.enabled = false
# ── What to replicate ────────────────────────────────
# Replicate all user topics/groups. In production, a regex whitelist is recommended.
primary->dr.topics = .*
primary->dr.groups = .*
# Exclusions (internal/temporary topics, etc.). Add here if you need more than MM2's default exclude patterns.
primary->dr.topics.exclude = .*[\-\.]internal, .*\.replica, __.*
# ── Replication factor for mirror topics ─────────────
# Set to match the number of brokers in the DR cluster (e.g., 3 for a 3-node cluster).
replication.factor = 3
# ── Replication factor for MM2 internal topics ───────
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
offset.storage.replication.factor = 3
status.storage.replication.factor = 3
config.storage.replication.factor = 3
# ── Topic config / ACL sync ──────────────────────────
# Keep target topics aligned with the source's configs (partition count, retention, etc.)
sync.topic.configs.enabled = true
# Sync source topic ACLs to the target (recommended for secured clusters)
sync.topic.acls.enabled = true
# ── How often to track topic/group changes ───────────
# How quickly to catch up when new topics, partitions, or groups appear (seconds)
refresh.topics.interval.seconds = 30
refresh.groups.interval.seconds = 30
# ── Checkpoint / heartbeat emission ──────────────────
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 30
emit.heartbeats.enabled = true
emit.heartbeats.interval.seconds = 5
# ── Replication policy (default, made explicit) ──────
replication.policy.class = org.apache.kafka.connect.mirror.DefaultReplicationPolicy
# ── Connector task parallelism ───────────────────────
# Increase when you have many high-throughput topics, to spread partitions across more tasks.
tasks.max = 4Options worth flagging
clusters/*.bootstrap.servers: the starting point for everything. The aliases (primary,dr) become topic prefixes, so use meaningful names.primary->dr.enabled = true: replication is enabled/disabled per "flow." For one-way, you don't enable the reverse (dr->primary.enabled).topics = .*: start narrow (a whitelist)..*is convenient, but it will also chase compacted and temporary topics, eating disk and network.sync.topic.configs.enabled: turn this on so the target follows along when you add partitions or change retention on the source. It's the core of DR consistency.emit.checkpoints.enabled: must betruefor failover. This is where the basis for moving consumer offsets is created (we use it in earnest in Part 3).refresh.topics.interval.seconds: the maximum delay before a new topic appears on DR. Too short means metadata load; too long means new topics are missed for longer. Around 30 seconds is a sensible default.
4. Running MM2
Dedicated mode: connect-mirror-maker.sh
The simplest way to run it is the dedicated script bundled with the Kafka distribution. Pass the config file you just wrote as an argument, and MM2 brings up the required Connect workers and all three connectors at once.
# From the Kafka install directory
$ bin/connect-mirror-maker.sh config/mm2.properties
# To run distributed across multiple nodes, run the same command with the same config on each node.
# (They form one group, and Connect automatically rebalances tasks.)
$ bin/connect-mirror-maker.sh config/mm2.propertiesIn this "dedicated mode," MM2 drives the Connect workers itself internally, so you don't need to stand up a separate Connect cluster beforehand. This is the fastest approach early in a DR build-out.
Running them as connectors on an existing Connect cluster
If you already operate a Kafka Connect cluster, you can also register MM2's three connectors like ordinary connectors over REST. This lets you reuse your existing Connect monitoring and deployment pipeline, which is helpful for operational standardization.
# Register a MirrorSourceConnector over the Connect REST API (example)
$ curl -X PUT http://connect:8083/connectors/primary-to-dr-source/config \
-H "Content-Type: application/json" \
-d '{
"connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
"source.cluster.alias": "primary",
"target.cluster.alias": "dr",
"source.cluster.bootstrap.servers": "primary-broker1:9092",
"target.cluster.bootstrap.servers": "dr-broker1:9092",
"topics": ".*",
"replication.factor": "3",
"sync.topic.configs.enabled": "true",
"tasks.max": "4"
}'How to choose: for PoCs or small setups, dedicated mode (
connect-mirror-maker.sh) is simpler. For organizations that already have Connect operations and monitoring in place, running them as connectors on an existing Connect cluster is easier to manage long-term.
5. Verifying That Replication Works
Launching it isn't the end. Confirm in three steps that data is actually flowing.
5-1. Did the mirror topics appear on DR?
First, list topics on the DR cluster and check that mirror topics with the primary. prefix have appeared.
$ bin/kafka-topics.sh --bootstrap-server dr-broker1:9092 --list
heartbeats
mm2-offset-syncs.dr.internal
primary.checkpoints.internal
primary.heartbeats
primary.orders # ← primary's orders, replicated
primary.payments # ← primary's payments, replicatedIf you see primary.orders, the MirrorSourceConnector is working correctly.
5-2. Watch the replication lag
Produce one message to the source and watch whether the end offset of the target mirror topic climbs to follow it.
# End offset of the source topic
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
--bootstrap-server primary-broker1:9092 --topic orders --time -1
# End offset of the target mirror topic (should follow, with a delay)
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
--bootstrap-server dr-broker1:9092 --topic primary.orders --time -1The difference between the two values approximates the replication lag. If that gap keeps widening over time, suspect a shortage of tasks (tasks.max) or network bandwidth.
5-3. Confirm end-to-end connectivity with heartbeats
The most intuitive health check is to read the heartbeat topic directly.
$ bin/kafka-console-consumer.sh \
--bootstrap-server dr-broker1:9092 \
--topic primary.heartbeats \
--from-beginning --max-messages 5If records keep flowing in, the entire primary → dr path is alive. When heartbeats stop, something on the replication path has broken — which makes them a great first-line alerting signal.
6. But Does This Give You Failover?
Get this far and the DR cluster accumulates every message from primary, neatly. The data is safe. So when primary dies, can you just move consumers over to dr and be done?
No. And here lies DR's biggest trap.
- The offsets of the mirror topic
primary.ordersare numbered differently from the originalorders. Just because a consumer group read "up to offset 1,000,000" on the source, reading offset 1,000,000 fromprimary.orderson dr lands you in the wrong place. - That's why MM2 builds a "source-offset → target-offset" translation table using
mm2-offset-syncsand checkpoints. Only by applying this translation on failover can a consumer resume without duplicates or gaps. - This offset translation, migrating consumer groups using
RemoteClusterUtils/MirrorClient, and the failover procedure in active-active bidirectional setups are the subject of the next part.
In short, data replication is only half of DR. Without the other half — "how to move consumers over without a gap" — the moment a failure hits, your operations team won't know where to start reading again.
Wrapping up
- MM2 runs on top of Kafka Connect and consists of three connectors per replication direction — MirrorSource (data), MirrorCheckpoint (offsets), MirrorHeartbeat (connectivity).
DefaultReplicationPolicyprefixes the source alias (orders→primary.orders) to make provenance clear and break replication loops.IdentityReplicationPolicypreserves the name but has no loop prevention, so use it with care.- One-way DR needs nothing more than a single
mm2.properties. The key switches areprimary->dr.enabled = true, thetopics/groupsfilters,replication.factor,sync.topic.configs.enabled, andemit.checkpoints.enabled. - Start fastest with
connect-mirror-maker.sh(dedicated mode); when you need operational standardization, run them as connectors on an existing Connect cluster. - Verify in three steps: mirror topic creation → replication lag → heartbeats.
- Data replication alone does not give you consumer failover. Offset translation and consumer group migration are covered in [Kafka DR 3].
References
- Apache Kafka. "Geo-Replication (Cross-Cluster Data Mirroring)" — https://kafka.apache.org/documentation/#georeplication
- KIP-382: MirrorMaker 2.0 — https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
- Apache Kafka. "MirrorMaker 2" configuration reference — https://kafka.apache.org/documentation/#mirrormaker
— The Data Dynamics Engineering Team