kudupartitioninglimitation

Functional Limitations You Must Know When Designing Apache Kudu Tables

A comprehensive guide to Kudu's partition count limits, column count restrictions, Primary Key design order, and other critical constraints for production operations.

Data DynamicsApril 13, 202614 min read

Apache Kudu is a powerful storage engine that supports both fast analytical queries and real-time upserts simultaneously. However, there are functional limitations you must be aware of when designing tables. Ignoring these constraints can lead to wasted disk space, performance degradation, and even cluster instability. This post is based on the Kudu Known Issues and Limitations documentation and the Schema Design Guide, summarizing the constraints that are easy to overlook in production.

1. Partition (Tablet) Count Limits

In Kudu, tables are split into multiple tablets that are distributed across Tablet Servers. Partitioning is essential for parallel processing, but indiscriminately increasing the partition count causes serious side effects.

1.1 Fixed Overhead Per Partition

Each tablet consumes several MB of disk space even when it contains no data. This is due to metadata, WAL (Write-Ahead Log), and consensus state information maintained per tablet.

Item	Approximate Size
Tablet metadata	Hundreds of KB to several MB
WAL segment (minimum 1)	Several MB
Consensus metadata	Tens of KB
Bloom filter, Primary Key index	Proportional to data volume, but minimum size exists

For example, creating 16 hash partitions x 365 range partitions (daily for one year) = 5,840 tablets will consume tens of GB of disk space even with zero data. This is purely the cost of partition metadata and structural maintenance, not actual data.

1.2 Recommended Guidelines

The guidelines from the official Kudu documentation are as follows:

Criterion	Recommended Value
Tablets per Tablet Server	1,000 or fewer (ideally several hundred)
Tablets per table	Total Tablet Servers in cluster x a few dozen or fewer
Recommended data size per tablet	1 GB ~ 10 GB

Key principle: Rather than increasing tablet count to boost parallelism, maintaining appropriately sized tablets is better for both performance and stability.

1.3 Problems Caused by Too Many Partitions

WAL disk space explosion: This is a particularly severe issue. Each tablet maintains an independent WAL (Write-Ahead Log) segment, and even empty tablets must have a WAL file. As partition count grows rapidly, WAL disk usage increases exponentially regardless of actual data size. During initial hardware capacity planning, WAL disks are often sized smaller relative to data disks. When partition counts exceed expectations, the WAL disk fills up first, resulting in inability to create new tables or add range partitions to existing tables. WAL disk exhaustion can lead to Tablet Server crashes, so hardware capacity planning must calculate WAL disk size generously based on the total tablet count across the entire cluster.
Disk space waste: Even empty tablets consume several MB each, making partition count itself a major driver of disk consumption
Memory pressure: Each tablet maintains MemRowSet, DeltaMemStore, etc. in memory. As tablet count grows, Tablet Server memory becomes insufficient
Increased Master load: Kudu Master manages location information for all tablets. Excessive tablet counts slow down heartbeat processing and tablet location lookups
Compaction contention: Each tablet performs compaction independently, so more tablets means more I/O contention
Leader election storms: When a Tablet Server restarts and thousands of tablets simultaneously initiate leader elections, the entire cluster becomes unstable for several minutes
Cluster-wide performance degradation: When the partition count exceeds the cluster's processing capacity, read/write performance degrades across all tables. This is not limited to a single table but is a serious issue that affects every table in the cluster.

Warning: The partition count problem is not a single-table issue. The sum of partition counts across all tables in the cluster determines the Tablet Server's processing limits. Over-partitioning one table degrades performance for all other tables as well.

1.4 Daily vs Monthly Partitions: The Importance of Partition Granularity

When dealing with time-series data, creating range partitions on a daily basis is one of the most common mistakes. Let's compare the tablet count explosion caused by daily partitions:

Partition Strategy	Hash 8 x Range Partitions	Tablets per Year	Tablets over 3 Years
Daily partitions	8 x 365	2,920	8,760
Monthly partitions	8 x 12	96	288

The same table can have a 30x difference in tablet count between daily and monthly partitions. If you have multiple tables, this difference multiplies.

Scenario (5 tables, Hash 8, 3-year retention)	Daily Partitions	Monthly Partitions
Total cluster tablet count	43,800	1,440
Tablets per server (5 Tablet Servers)	8,760	288
WAL disk consumption (including empty tablets)	Hundreds of GB	A few GB

Given the recommended limit of 1,000 tablets per Tablet Server, the daily partition strategy results in 8,760 per server, exceeding the recommendation by 8.7x. This destabilizes the cluster and causes failures due to WAL disk exhaustion.

Recommendation: Unless you absolutely must query data at daily granularity, adopt monthly partitions as the default strategy. Even when daily queries are needed, if the Primary Key includes a date column, efficient scan pruning is still possible within monthly partitions. Choosing a coarser partition granularity is almost always safer than fine-grained partitioning.

1.5 Dynamic Range Partition Management

Range partitions can be dynamically added or removed using ADD RANGE PARTITION / DROP RANGE PARTITION DDL. When handling time-series data, instead of pre-creating hundreds of future partitions, add them as the time approaches and drop old partitions as an effective strategy.

-- Add a new monthly partition
ALTER TABLE events ADD RANGE PARTITION VALUE = '2026-05';
 
-- Drop an old partition (all data in the partition is also deleted)
ALTER TABLE events DROP RANGE PARTITION VALUE = '2024-01';

2. Column Count Limits

2.1 Hard Limit and Soft Limit

Kudu supports a maximum of 300 columns per table. However, this is the hard limit, and in practice, far fewer columns are recommended.

Category	Column Count
Hard limit	300
Soft limit (recommended)	A few dozen to 100 or fewer

2.2 Problems with Too Many Columns

Scan performance degradation: As a columnar storage engine, Kudu must read data blocks from every column for SELECT *. More columns means dramatically more I/O
Increased RowSet compaction cost: Compaction requires reading and rewriting all columns, so CPU and I/O costs grow with column count
Increased metadata size: Schema information is included in every tablet's metadata, so more columns means larger metadata
Client-side overhead: Query engines like Impala experience delays when fetching schemas

Tip: If you need more than 200 columns, consider splitting the table logically or dividing it by groups of columns that are frequently queried together.

2.3 Crash Risk When Forcing Column Count to Hard Limit

To bypass Kudu's soft limit, some users set the --max_num_columns flag to forcibly raise the column limit up to the hard limit (300). This is extremely dangerous and can cause the following severe problems:

Tablet Server Crash: When column count becomes extremely high, each tablet's RowSet metadata, DeltaMemStore, Bloom filter, etc. grow proportionally. When the Tablet Server's memory limit is exceeded, the process is killed by OOM (Out of Memory).
Compaction failure: Compaction of tablets with very many columns consumes extreme amounts of CPU and memory. If memory runs out during compaction, the tablet enters a compaction-impossible state, and subsequent write performance degrades severely.
Schema propagation failure: When propagating the schema of a table with hundreds of columns to all tablets, if metadata size exceeds a threshold, heartbeats between Master and Tablet Server fail, and tablets can become unavailable.
Unrecoverable state: In the worst case, if tablet metadata is corrupted due to excessive columns, the data in that tablet cannot be recovered.

Warning: The soft limit was set based on the range in which the Kudu development team can guarantee stable operation. Raising --max_num_columns to create columns approaching the hard limit can result in fatal crashes of the Kudu storage itself. If you need many columns, always choose to split tables instead.

3. Primary Key Design

Kudu's Primary Key doesn't just guarantee uniqueness — it determines the physical storage order of data. Therefore, the column composition and order of the Primary Key directly impact performance.

3.1 Primary Key Characteristics

Characteristic	Description
Immutable	Cannot change Primary Key after table creation (no adding/removing/reordering columns)
NOT NULL required	Primary Key columns must be NOT NULL
Determines storage order	Data is stored sorted by Primary Key byte order
Acts as index	Primary Key automatically functions as a clustered index
Type restrictions	`FLOAT`, `DOUBLE`, `BOOL` types cannot be used in Primary Key

3.2 Why Column Order Matters in Composite Primary Keys

Kudu performs scan range pruning based on the prefix of the Primary Key. This follows the same principle as composite indexes in RDBMS.

For example, in a table with Primary Key (region, timestamp, user_id):

Query Condition	Pruning Possible?
`WHERE region = 'kr'`	Yes — first column, efficient
`WHERE region = 'kr' AND timestamp >= '2026-04-01'`	Yes — conditions follow prefix order
`WHERE timestamp >= '2026-04-01'`	No — missing first column (region) condition, results in full scan
`WHERE user_id = 'abc'`	No — skips prefix, results in full scan

3.3 Primary Key Column Order Design Principles

Place the most frequently filtered column first: Analyze query patterns and put the column most commonly appearing in WHERE clauses first
Consider cardinality: Placing a very low-cardinality column (e.g., boolean) first yields minimal pruning benefit
Prevent write hotspots: Placing a monotonically increasing value (e.g., timestamp) as the first column concentrates all writes on the last tablet
Combine with Hash Partitioning: Applying hash partitioning to the first column is a common practice for hotspot prevention

-- Good example: hash partition for write distribution + range partition for time-series management
CREATE TABLE events (
    event_id STRING NOT NULL,
    event_time TIMESTAMP NOT NULL,
    region STRING NOT NULL,
    payload STRING,
    PRIMARY KEY (event_id, event_time, region)
)
PARTITION BY
    HASH (event_id) PARTITIONS 8,
    RANGE (event_time) (
        PARTITION VALUE = '2026-04'
    )
STORED AS KUDU;

-- Bad example: timestamp as first column with only range partitioning, no hash
CREATE TABLE events (
    event_time TIMESTAMP NOT NULL,
    event_id STRING NOT NULL,
    region STRING NOT NULL,
    payload STRING,
    PRIMARY KEY (event_time, event_id, region)
)
PARTITION BY
    RANGE (event_time) (
        PARTITION VALUE = '2026-04'
    )
STORED AS KUDU;
-- Problem: writes for latest data concentrate on a single tablet -> hotspot

3.4 Primary Key Column Count Limits

While there is no explicit hard limit on the number of columns composing the Primary Key, having too many columns causes the following issues:

Increased Primary Key index size: Larger Primary Key bytes per row means the index consumes more memory and disk
Comparison operation cost: Primary Key comparison costs increase for Insert/Update/Delete operations
Reduced schema flexibility: Since Primary Key columns cannot be changed, including more columns reduces room for schema changes

Recommendation: Compose Primary Keys with 3 to 5 columns or fewer, and always consider query patterns and partitioning strategy together.

4. Other Key Limitations

4.1 Data Type Restrictions

Restriction	Description
`DECIMAL` precision	Maximum precision 38, scale 38. Be cautious with financial data
`VARCHAR` / `STRING` max size	Maximum 64 KB per cell
`BINARY` max size	Maximum 64 KB per cell
Primary Key total cell size	Maximum 16 KB
Immutable columns	Primary Key column values cannot be changed after INSERT

4.2 Schema Change Restrictions

Restriction	Description
Primary Key change	Not possible. Cannot modify PK or add/remove columns via schema change
Column type change	Not possible. Cannot change the data type of existing columns
Column rename	Possible
Column add/drop	Possible (except Primary Key columns)
Partition schema change	Not possible. Cannot change hash partition bucket count or partition columns
Range partition add/drop	Possible. Range partitions can be dynamically added/dropped

4.3 Replication Restrictions

Restriction	Description
Replication Factor	Set at table creation, cannot be changed afterwards. Only odd numbers allowed (default 3)
Minimum Tablet Server count	Must have at least as many Tablet Servers as the Replication Factor

4.4 Secondary Index

Kudu does not support secondary indexes. Therefore, queries on non-Primary Key columns result in full scans. When used with Impala, you rely on Impala's execution plan, making it critical to carefully design the Primary Key and partitioning strategy.

4.5 Multi-row Transactions

Kudu guarantees single-row atomicity only. Multi-row transactions are not supported, so business logic requiring simultaneous updates across multiple rows must be compensated at the application level.

4.6 Table Count Limits

There is no explicit hard limit on the number of tables in a cluster, but as tables increase, each table's tablet count accumulates, eventually hitting the tablet count limit. The key formula is: Number of tables x tablets per table = total cluster tablet count.

5. Design Checklist

Check the following items before creating a table.

Partitioning

Is the hash partition bucket count close to a multiple of the Tablet Server count?
If using range partitions, are you avoiding pre-creating excessive future partitions?
Is the total tablet count per table (Hash buckets x Range partitions) appropriate?
Does the total tablet count per Tablet Server stay below 1,000?
Is the disk waste from empty tablets within acceptable limits?
Can the WAL disk capacity handle the total tablet count?
Have you considered using monthly partitions instead of daily partitions?
Have you calculated the total partition count across all tables in the cluster?

Column Design

Is the column count below 100? (Consider splitting tables if it exceeds this)
Have you avoided forcibly raising the --max_num_columns flag? (Crash risk)
Are there no unnecessary columns included?
Is there no possibility of data exceeding 64 KB in STRING/BINARY columns?

Primary Key

Does the Primary Key column order match major query patterns?
If a monotonically increasing value is the first Primary Key column, have you applied hash partitioning?
Have you avoided including unnecessarily many columns in the Primary Key?
Is the Primary Key total cell size 16 KB or less?
Have you avoided using FLOAT, DOUBLE, or BOOL types in Primary Key columns?

Operations

Is the Replication Factor set appropriately? (3 recommended for production)
If schema changes may be needed in the future, is the Primary Key designed minimally?
Given the lack of secondary indexes, is the PK designed to match query patterns?

6. Summary

Item	Limitation / Recommendation
Tablets per Tablet Server	1,000 or fewer recommended
Empty tablet disk usage	Several MB per tablet (including WAL)
WAL disk	Grows proportionally with partition count — must be considered in capacity planning
Recommended partition granularity	Prefer monthly partitions over daily
Forcing column count hard limit increase	Tablet Server crash risk — strongly not recommended
Maximum columns per table	300 (recommended 100 or fewer)
Primary Key change	Not possible
Primary Key column type restrictions	FLOAT, DOUBLE, BOOL cannot be used
Primary Key total cell size	16 KB or less
STRING/BINARY max cell size	64 KB
Secondary Index	Not supported
Multi-row transactions	Not supported (single-row atomicity only)
Dynamic Range Partition management	Supported (ADD / DROP)
Hash Partition change	Not possible
Replication Factor change	Not possible

Kudu is a storage engine where decisions made at design time affect operations for the entire lifecycle. In particular, since Primary Key and partitioning schemas cannot be changed once set, you must thoroughly consider query patterns, data growth rates, and cluster scale before creating tables. The "we can fix it later" approach simply does not work with Kudu.

This post was written based on the Apache Kudu Known Issues and Limitations and the Schema Design Guide. If you need assistance with Kudu table design or operations, feel free to reach out.

— Data Dynamics Engineering Team