Why Time Synchronization (NTP) Matters in Apache Kudu
Why a Kudu cluster won't even start without NTP, actual error messages, and how to configure ntpd and chrony — all in one place.
Have you ever installed Apache Kudu for the first time, started the Tablet Server, and seen "Clock unsynchronized" in the logs as the process dies? In a distributed database, "the clocks are out of sync" isn't just a warning — it means data consistency cannot be guaranteed, so Kudu refuses to start entirely. Based on the official Kudu Troubleshooting documentation, this post explains why time synchronization is mandatory, what errors you'll encounter, and how to fix them.
1. Why Kudu Is Sensitive About Clocks
Kudu internally uses a Hybrid Logical Clock (HLC). HLC is a timestamp that combines a physical clock (wall clock) with a logical counter — the core mechanism for determining event ordering across distributed nodes.
| Component | Role |
|---|---|
| Physical clock | The node's system time. Synchronized via NTP |
| Logical counter | Distinguishes event order within the same physical timestamp |
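To make the two components concrete, here is a minimal shell sketch of a hybrid timestamp. The 12-bit split between physical and logical parts and all function names are assumptions chosen for illustration, not taken from Kudu's source.

```shell
# Assumption for illustration only: the low 12 bits carry the logical counter
# and the bits above them carry physical time in microseconds.
LOGICAL_BITS=12
LOGICAL_MASK=$(( (1 << $LOGICAL_BITS) - 1 ))

# Pack physical time (microseconds, $1) and a logical counter ($2) into one integer.
hlc_encode() {
  echo $(( ($1 << $LOGICAL_BITS) | ($2 & $LOGICAL_MASK) ))
}

# Unpack the two components again.
hlc_physical() { echo $(( $1 >> $LOGICAL_BITS )); }
hlc_logical()  { echo $(( $1 & $LOGICAL_MASK )); }

# Next timestamp for a local event, given the last issued timestamp ($1) and
# the current wall clock in microseconds ($2): use the wall clock if it moved
# forward, otherwise bump the logical counter so ordering stays strict.
hlc_next() (
  candidate=$(hlc_encode "$2" 0)
  if [ "$candidate" -gt "$1" ]; then
    echo "$candidate"
  else
    echo $(( $1 + 1 ))
  fi
)
```

Calling `hlc_next` twice with the same wall-clock value still returns two distinct, increasing timestamps, which is the job of the logical counter in the table above.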
For HLC to work correctly, it needs to know the error bound of the physical clock. Kudu reads this error bound reported by the NTP daemon to the kernel to determine "how much the current node's time could differ from the actual time at most." If this value is too large — meaning the clock cannot be trusted — the following problems arise:
- Broken read consistency: Snapshot read timestamps can't be pinpointed, causing "data appears and disappears" phenomena
- Write conflict misjudgment: When two nodes modify the same row simultaneously, it's impossible to determine which came first
- Raft consensus delays: Timeout calculations for leader election and log replication are thrown off
To prevent these risks, Kudu shuts down the service entirely when the clock error exceeds the allowed threshold.
2. Actual Error Messages You'll Encounter
You may see the following errors in Kudu Master or Tablet Server logs.
2.1 Clock not synchronized at all
Clock unsynchronized. Status: Service unavailable: Error reading clock.
The NTP daemon is not installed, or hasn't finished synchronizing yet. Kudu checks the kernel's clock status via ntp_gettime(), and this error occurs when the kernel reports "not synchronized."
2.2 Synchronized but error is too high
Clock synchronized, but error: 11130000, is past the maximum allowable error: 10000000
NTP is running but the estimated error exceeds Kudu's allowed limit. The unit is microseconds (us). The example above means "current error 11.13 seconds, allowed limit 10 seconds."
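When scanning logs, it helps to pull the two numbers out of this message and read them in seconds. The helpers below are a small sketch; the function names are made up for this post, and the pattern assumes the message wording shown above.

```shell
# Extract "<error_usec> <limit_usec>" from a Kudu clock error line ($1).
# Prints nothing if the line does not match the expected wording.
parse_clock_error() {
  printf '%s\n' "$1" |
    sed -nE 's/.*but error: ([0-9]+), is past the maximum allowable error: ([0-9]+).*/\1 \2/p'
}

# Render microseconds ($1) as seconds with six decimals, using integer math only.
usec_to_seconds() {
  printf '%d.%06d\n' $(( $1 / 1000000 )) $(( $1 % 1000000 ))
}
```

For the sample line above, `parse_clock_error` yields `11130000 10000000`, and `usec_to_seconds 11130000` prints `11.130000`.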
2.3 HybridClock initialization failure at startup
Cannot initialize HybridClock. Clock synchronized but error was too high
When the situation from 2.2 occurs at process startup, this message appears and startup fails.
3. Key Configuration Flags
Kudu's main time synchronization flags are:
| Flag | Default | Description |
|---|---|---|
| --max_clock_sync_error_usec | 10000000 (10 seconds) | Maximum allowed clock error. Service stops if exceeded |
| --time_source | system | Time source. system uses the OS clock (NTP required), mock is for testing only |
Warning: It's tempting to increase --max_clock_sync_error_usec to avoid errors, but this is not a real fix. Increasing the tolerance weakens read/write consistency guarantees. Properly configuring NTP is the correct solution.
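To audit whether someone has already raised the threshold on a node, you can read the flag back from the server's flag file. This sketch assumes flags are stored one per line in gflags style (`--name=value`); the function name is hypothetical, and the flag file location depends on your installation.

```shell
# Print the value of --<name>=<value> from a gflags-style flag file,
# or the supplied default when the flag is absent. The last occurrence wins.
#   $1 = flag file path, $2 = flag name, $3 = default value
flag_value() (
  value=$(sed -nE "s/^[[:space:]]*--?$2=(.*)/\1/p" "$1" | tail -n 1)
  echo "${value:-$3}"
)
```

For example, `flag_value <your-flag-file> max_clock_sync_error_usec 10000000` prints the configured limit, or `10000000` if the flag is not set in that file.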
4. Choosing an NTP Daemon: ntpd vs chrony vs systemd-timesyncd
Since Kudu checks the kernel's clock discipline status, the NTP daemon must properly report time information to the kernel. Not all NTP implementations support this.
| NTP Implementation | Kudu Compatible | Notes |
|---|---|---|
| ntpd (ntp package) | Compatible | Traditional choice. Works on all OSes |
| chrony (chronyd) | Compatible | Modern alternative. Converges faster in VM/container environments. rtcsync option required |
| systemd-timesyncd | Not compatible | Doesn't use kernel discipline API, so Kudu reports "unsynchronized" |
Why systemd-timesyncd won't work
Debian/Ubuntu systems have systemd-timesyncd enabled by default. While this service does synchronize time as an SNTP client, it does not report status through kernel discipline APIs like ntp_adjtime() / ntp_gettime(). Since Kudu checks synchronization status through exactly these APIs, systemd-timesyncd alone will not make Kudu recognize "the clock is synchronized."
# Disable systemd-timesyncd and replace with chrony or ntpd
sudo systemctl stop systemd-timesyncd
sudo systemctl disable systemd-timesyncd
5. chrony Configuration (Recommended)
chrony synchronizes faster than ntpd and converges reliably even when the clock is significantly off in VM environments. chrony is recommended for Kudu environments.
5.1 Installation
# Debian / Ubuntu
sudo apt-get install chrony
# RHEL / CentOS / Rocky
sudo yum install chrony
5.2 Configuration file (/etc/chrony.conf)
# NTP servers: at least 4 recommended
# maxpoll is a per-server option. maxpoll 7 caps the polling interval at 2^7 = 128 seconds;
# shorter than the default (2^10 = 1024 seconds) improves sync precision
server time1.google.com iburst maxpoll 7
server time2.google.com iburst maxpoll 7
server time3.google.com iburst maxpoll 7
server time4.google.com iburst maxpoll 7
# For AWS environments, add the following (or use instead of the above)
# server 169.254.169.123 prefer iburst
# For GCE environments
# server metadata.google.internal prefer iburst
# Must be enabled for Kudu compatibility
rtcsync
# Step correction if error exceeds 1 second at startup
makestep 1.0 3
Key points:
- rtcsync: This option is required for chrony to activate the kernel's clock discipline, allowing Kudu to read "synchronized" status via ntp_gettime(). Omitting this single line will cause Kudu to not recognize the clock.
- iburst: Sends a short burst of requests (4 to 8 in chrony) immediately after daemon startup to shorten initial synchronization time.
- makestep 1.0 3: For the first 3 updates, if the error exceeds 1 second, it corrects immediately via step instead of slew. Especially useful right after VM snapshot restoration or reboot.
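Because a missing rtcsync line is easy to overlook, a quick check before starting Kudu can save a restart cycle. The helper below is a sketch with a hypothetical name; it only looks for an uncommented directive at the start of a line and does not validate the rest of the file.

```shell
# Succeed if the directive ($2) appears uncommented in the chrony config file ($1).
# Lines where the directive is preceded by a comment character do not match.
chrony_has_directive() {
  grep -Eq "^[[:space:]]*$2([[:space:]]|\$)" "$1"
}
```

A typical call is `chrony_has_directive /etc/chrony.conf rtcsync || echo "rtcsync is missing"`.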
5.3 Start the service
sudo systemctl enable chronyd
sudo systemctl start chronyd
5.4 Verify synchronization
chronyc tracking
Items to check in the output:
Reference ID : D8EF2300 (time1.google.com)
Stratum : 2
Ref time (UTC) : Sat Apr 12 03:22:15 2026
System time : 0.000023420 seconds fast of NTP time
Last offset : +0.000012345 seconds
RMS offset : 0.000025678 seconds
Root delay : 0.012345678 seconds
Root dispersion : 0.001234567 seconds
Leap status : Normal
- Leap status: Normal means the clock is reported as synchronized, so Kudu can start normally
- System time is the current offset from NTP time. Sub-millisecond is good
# Check NTP source servers
chronyc sources -v
# Per-source statistics
chronyc sourcestats
6. ntpd Configuration (Alternative)
If chrony isn't an option, ntpd works perfectly fine.
6.1 Installation
# Debian / Ubuntu
sudo apt-get install ntp
# RHEL / CentOS
sudo yum install ntp
6.2 Initial time correction
When the clock error is large at startup, ntpd can take minutes to tens of minutes to converge. First correct the time with ntpdate or ntpd -q, then start the daemon.
# First ensure ntpd is stopped (the service is named ntp on Debian/Ubuntu and ntpd on RHEL/CentOS)
sudo systemctl stop ntp
# Immediate time correction
sudo ntpdate -b time.google.com
# Start daemon
sudo systemctl enable ntp
sudo systemctl start ntp
6.3 Configuration file (/etc/ntp.conf)
# maxpoll is a per-server option; maxpoll 7 shortens the maximum polling interval to 2^7 = 128 seconds
server time1.google.com iburst maxpoll 7
server time2.google.com iburst maxpoll 7
server time3.google.com iburst maxpoll 7
server time4.google.com iburst maxpoll 7
6.4 Verify synchronization
# Check kernel clock status
ntptime
If the status field in the output contains NANO or OK, it's normal. If you see UNSYNC, synchronization hasn't completed yet.
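The same check can be scripted. The sketch below reads ntptime output from standard input, prints the kernel's maximum error in microseconds, and fails if UNSYNC appears anywhere. It assumes the usual "maximum error N us" line layout; the function name is hypothetical.

```shell
# Read `ntptime` output on stdin. Print the first reported maximum error
# (microseconds) and return non-zero if the output contains UNSYNC.
ntptime_max_error_usec() {
  awk '
    /UNSYNC/ { unsync = 1 }
    /maximum error/ && !seen { print $3; seen = 1 }
    END { exit (unsync ? 1 : 0) }
  '
}
```

Run it as `ntptime | ntptime_max_error_usec` and compare the printed value against the 10000000 us limit.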
# NTP peer status summary
ntpstat
# Detailed peer information
ntpq -p
6.5 -x flag caution
Some distributions include the -x flag in ntpd startup options. This flag raises ntpd's step threshold from 128 ms to 600 seconds, so any smaller offset is corrected by slew only, which means synchronization can take an extremely long time when the clock is significantly off. For Kudu nodes, removing -x is recommended.
# Check in /etc/sysconfig/ntpd or /etc/default/ntp
# Remove -x from OPTIONS="-x -u ntp:ntp"
OPTIONS="-u ntp:ntp"
7. Cloud-Specific NTP Servers
In cloud VMs, using the NTP server provided by the cloud platform minimizes network latency and error.
| Cloud | NTP Server | Configuration Example |
|---|---|---|
| AWS | 169.254.169.123 | server 169.254.169.123 prefer iburst |
| GCE | metadata.google.internal | server metadata.google.internal prefer iburst |
| Azure | time.windows.com | server time.windows.com prefer iburst |
| On-premises | Internal Stratum 1/2 server or public NTP pool | server time.google.com iburst |
The prefer keyword means this server will be used as the primary reference. Since cloud-internal NTP servers are physically close, adding prefer provides stability.
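If you provision nodes across several platforms, the table can be turned into a small lookup that emits the right server line for a config template. The function name and the platform keywords (aws, gce, azure) are assumptions for this sketch; the server values come from the table above.

```shell
# Print the NTP server line for a platform keyword ($1).
# Anything unrecognized falls back to the on-premises example.
ntp_server_line() {
  case "$1" in
    aws)   echo "server 169.254.169.123 prefer iburst" ;;
    gce)   echo "server metadata.google.internal prefer iburst" ;;
    azure) echo "server time.windows.com prefer iburst" ;;
    *)     echo "server time.google.com iburst" ;;
  esac
}
```

The output works as a server line in either chrony.conf or ntp.conf, since both accept the prefer and iburst options.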
8. Troubleshooting Checklist
When Kudu dies with a clock-related error, check in this order.
8.1 Is the NTP daemon running?
# For chrony
sudo systemctl status chronyd
# For ntpd
sudo systemctl status ntp
8.2 Is systemd-timesyncd running instead?
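timedatectl status
systemctl is-active systemd-timesyncd
Depending on the systemd version, timedatectl shows either "systemd-timesyncd.service active: yes" or only "NTP service: active" without naming the daemon. If the second command prints "active", systemd-timesyncd is the one running and a replacement is needed.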
8.3 Is the kernel reporting "synchronized"?
# chrony environment
chronyc tracking | grep "Leap status"
# "Normal" means OK
# ntpd environment
ntptime | grep status
# Contains "NANO" or "OK" = normal, "UNSYNC" = problem
8.4 Is the estimated error within 10 seconds?
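# chrony (the worst-case error is roughly System time + Root dispersion + Root delay / 2)
chronyc tracking | grep -E "System time|Root delay|Root dispersion"
# ntpd
ntptime | grep "maximum error"
If the error exceeds 10 seconds (10,000,000 us), Kudu will reject it. Check NTP server connectivity and network.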
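For chrony, the worst-case bound can be computed directly from the tracking output. The chronyc documentation gives it as |System time| + Root dispersion + Root delay / 2. The sketch below applies that formula and compares the result to Kudu's default limit. The function names are hypothetical, and the value the kernel reports to Kudu may differ slightly from this estimate.

```shell
# Read `chronyc tracking` output on stdin and print the worst-case clock
# error in whole microseconds (truncated):
#   System time + Root dispersion + Root delay / 2
chrony_error_bound_usec() {
  awk '
    /^System time/     { offset = $4 }
    /^Root delay/      { delay = $4 }
    /^Root dispersion/ { dispersion = $4 }
    END { printf "%d\n", (offset + dispersion + delay / 2) * 1000000 }
  '
}

# Succeed if the bound read from stdin is within the limit in microseconds
# ($1, defaulting to Kudu's 10000000).
chrony_within_limit() (
  limit_usec=${1:-10000000}
  bound=$(chrony_error_bound_usec)
  [ "$bound" -le "$limit_usec" ]
)
```

Use it as `chronyc tracking | chrony_within_limit || echo "clock error too high for Kudu"`.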
8.5 Can the NTP servers be reached?
# chrony
chronyc sources
# ntpd
ntpq -p
If all sources show ? or unreachable, check if UDP port 123 is open in the firewall.
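To turn the chrony check into a pass/fail signal, you can count the servers whose Reach register is non-zero. This sketch assumes the standard `chronyc sources` column layout, where Reach is the fifth column and server lines start with ^. The function name is hypothetical.

```shell
# Read `chronyc sources` output on stdin and print how many NTP servers
# (lines starting with ^) have a non-zero Reach register.
chrony_reachable_sources() {
  awk '/^\^/ && $5 != "0" { n++ } END { print n + 0 }'
}
```

If `chronyc sources | chrony_reachable_sources` prints 0, no server has answered recently, and the firewall is the first thing to check.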
9. Caution with chrony's Local Reference Mode
In air-gapped environments without access to external NTP servers, chrony may be operated in local reference mode. In this case, do not deploy Kudu on the chrony local reference node itself. With chrony 3.4 and earlier, there's a known issue where ntptime on that node reports "unsynchronized." Even if chronyc tracking shows "Normal," the kernel discipline status may report differently.
Solution:
- Separate the chrony local reference server as a dedicated node
- Configure Kudu Master/Tablet Servers to reference that server as a client
10. Pre-Startup Verification Script for Kudu
A simple verification script to run on all Kudu nodes before startup.
#!/usr/bin/env bash
set -euo pipefail
echo "=== NTP Synchronization Check ==="
# 1. Is systemd-timesyncd disabled?
if systemctl is-active --quiet systemd-timesyncd 2>/dev/null; then
    echo "[FAIL] systemd-timesyncd is active. Disable it and install chrony/ntpd."
    exit 1
fi
echo "[OK] systemd-timesyncd disabled"
# 2. Is chrony or ntpd running?
if systemctl is-active --quiet chronyd 2>/dev/null; then
    echo "[OK] chronyd running"
    NTP_TYPE="chrony"
elif systemctl is-active --quiet ntp 2>/dev/null || systemctl is-active --quiet ntpd 2>/dev/null; then
    echo "[OK] ntpd running"
    NTP_TYPE="ntpd"
else
    echo "[FAIL] Neither chrony nor ntpd is running."
    exit 1
fi
# 3. Kernel synchronization status
if command -v ntptime &>/dev/null; then
    if ntptime 2>&1 | grep -q "UNSYNC"; then
        echo "[FAIL] Kernel clock is not yet synchronized. Wait for NTP daemon to converge."
        exit 1
    fi
    echo "[OK] Kernel clock synchronized"
fi
# 4. Check estimated error (chrony)
if [ "$NTP_TYPE" = "chrony" ]; then
    OFFSET=$(chronyc tracking 2>/dev/null | grep "System time" | awk '{print $4}')
    echo "[INFO] Current system time offset: ${OFFSET} seconds"
fi
echo ""
echo "=== Check complete. Kudu is ready to start. ==="
11. Summary
| Question | Answer |
|---|---|
| Is NTP required for Kudu? | Required. Won't even start without it |
| Is systemd-timesyncd sufficient? | No. chrony or ntpd is needed |
| Recommended NTP daemon? | chrony (fast convergence, VM-friendly) |
| Must-have chrony option? | rtcsync |
| Default error threshold? | 10 seconds (--max_clock_sync_error_usec=10000000) |
| Can the error limit be raised? | Not recommended. Weakens consistency guarantees |
| Cloud NTP servers? | AWS 169.254.169.123, GCE metadata.google.internal |
Time synchronization is the most fundamental prerequisite for Kudu operations. If you set up NTP first when initially building the cluster, you'll almost never encounter clock-related failures in subsequent operations. Conversely, ignoring this leads to the most difficult-to-understand forms of data inconsistency.
This post was written based on the Apache Kudu official Troubleshooting documentation. If you need help with NTP configuration or Kudu operations in your cluster environment, feel free to reach out.
— Data Dynamics Engineering Team