Functional Limitations You Must Know When Designing Apache Kudu Tables
A comprehensive guide to Kudu's partition count limits, column count restrictions, Primary Key design order, and other critical constraints for production operations.
Apache Kudu is a powerful storage engine that supports both fast analytical queries and real-time upserts simultaneously. However, there are functional limitations you must be aware of when designing tables. Ignoring these constraints can lead to wasted disk space, performance degradation, and even cluster instability. This post is based on the Kudu Known Issues and Limitations documentation and the Schema Design Guide, summarizing the constraints that are easy to overlook in production.
1. Partition (Tablet) Count Limits
In Kudu, tables are split into multiple tablets that are distributed across Tablet Servers. Partitioning is essential for parallel processing, but indiscriminately increasing the partition count causes serious side effects.
1.1 Fixed Overhead Per Partition
Each tablet consumes several MB of disk space even when it contains no data. This is due to metadata, WAL (Write-Ahead Log), and consensus state information maintained per tablet.
| Item | Approximate Size |
|---|---|
| Tablet metadata | Hundreds of KB to several MB |
| WAL segment (minimum 1) | Several MB |
| Consensus metadata | Tens of KB |
| Bloom filter, Primary Key index | Proportional to data volume, but minimum size exists |
For example, creating 16 hash partitions x 365 range partitions (daily for one year) = 5,840 tablets will consume tens of GB of disk space even with zero data. This is purely the cost of partition metadata and structural maintenance, not actual data.
1.2 Recommended Guidelines
The guidelines from the official Kudu documentation are as follows:
| Criterion | Recommended Value |
|---|---|
| Tablets per Tablet Server | 1,000 or fewer (ideally several hundred) |
| Tablets per table | Total Tablet Servers in cluster x a few dozen or fewer |
| Recommended data size per tablet | 1 GB ~ 10 GB |
Key principle: Rather than increasing tablet count to boost parallelism, maintaining appropriately sized tablets is better for both performance and stability.
1.3 Problems Caused by Too Many Partitions
- WAL disk space explosion: This is a particularly severe issue. Each tablet maintains an independent WAL (Write-Ahead Log) segment, and even empty tablets must have a WAL file. As partition count grows rapidly, WAL disk usage increases exponentially regardless of actual data size. During initial hardware capacity planning, WAL disks are often sized smaller relative to data disks. When partition counts exceed expectations, the WAL disk fills up first, resulting in inability to create new tables or add range partitions to existing tables. WAL disk exhaustion can lead to Tablet Server crashes, so hardware capacity planning must calculate WAL disk size generously based on the total tablet count across the entire cluster.
- Disk space waste: Even empty tablets consume several MB each, making partition count itself a major driver of disk consumption
- Memory pressure: Each tablet maintains MemRowSet, DeltaMemStore, etc. in memory. As tablet count grows, Tablet Server memory becomes insufficient
- Increased Master load: Kudu Master manages location information for all tablets. Excessive tablet counts slow down heartbeat processing and tablet location lookups
- Compaction contention: Each tablet performs compaction independently, so more tablets means more I/O contention
- Leader election storms: When a Tablet Server restarts and thousands of tablets simultaneously initiate leader elections, the entire cluster becomes unstable for several minutes
- Cluster-wide performance degradation: When the partition count exceeds the cluster's processing capacity, read/write performance degrades across all tables. This is not limited to a single table but is a serious issue that affects every table in the cluster.
Warning: The partition count problem is not a single-table issue. The sum of partition counts across all tables in the cluster determines the Tablet Server's processing limits. Over-partitioning one table degrades performance for all other tables as well.
1.4 Daily vs Monthly Partitions: The Importance of Partition Granularity
When dealing with time-series data, creating range partitions on a daily basis is one of the most common mistakes. Let's compare the tablet count explosion caused by daily partitions:
| Partition Strategy | Hash 8 x Range Partitions | Tablets per Year | Tablets over 3 Years |
|---|---|---|---|
| Daily partitions | 8 x 365 | 2,920 | 8,760 |
| Monthly partitions | 8 x 12 | 96 | 288 |
The same table can have a 30x difference in tablet count between daily and monthly partitions. If you have multiple tables, this difference multiplies.
| Scenario (5 tables, Hash 8, 3-year retention) | Daily Partitions | Monthly Partitions |
|---|---|---|
| Total cluster tablet count | 43,800 | 1,440 |
| Tablets per server (5 Tablet Servers) | 8,760 | 288 |
| WAL disk consumption (including empty tablets) | Hundreds of GB | A few GB |
Given the recommended limit of 1,000 tablets per Tablet Server, the daily partition strategy results in 8,760 per server, exceeding the recommendation by 8.7x. This destabilizes the cluster and causes failures due to WAL disk exhaustion.
Recommendation: Unless you absolutely must query data at daily granularity, adopt monthly partitions as the default strategy. Even when daily queries are needed, if the Primary Key includes a date column, efficient scan pruning is still possible within monthly partitions. Choosing a coarser partition granularity is almost always safer than fine-grained partitioning.
1.5 Dynamic Range Partition Management
Range partitions can be dynamically added or removed using ADD RANGE PARTITION / DROP RANGE PARTITION DDL. When handling time-series data, instead of pre-creating hundreds of future partitions, add them as the time approaches and drop old partitions as an effective strategy.
-- Add a new monthly partition
ALTER TABLE events ADD RANGE PARTITION VALUE = '2026-05';
-- Drop an old partition (all data in the partition is also deleted)
ALTER TABLE events DROP RANGE PARTITION VALUE = '2024-01';2. Column Count Limits
2.1 Hard Limit and Soft Limit
Kudu supports a maximum of 300 columns per table. However, this is the hard limit, and in practice, far fewer columns are recommended.
| Category | Column Count |
|---|---|
| Hard limit | 300 |
| Soft limit (recommended) | A few dozen to 100 or fewer |
2.2 Problems with Too Many Columns
- Scan performance degradation: As a columnar storage engine, Kudu must read data blocks from every column for SELECT *. More columns means dramatically more I/O
- Increased RowSet compaction cost: Compaction requires reading and rewriting all columns, so CPU and I/O costs grow with column count
- Increased metadata size: Schema information is included in every tablet's metadata, so more columns means larger metadata
- Client-side overhead: Query engines like Impala experience delays when fetching schemas
Tip: If you need more than 200 columns, consider splitting the table logically or dividing it by groups of columns that are frequently queried together.
2.3 Crash Risk When Forcing Column Count to Hard Limit
To bypass Kudu's soft limit, some users set the --max_num_columns flag to forcibly raise the column limit up to the hard limit (300). This is extremely dangerous and can cause the following severe problems:
- Tablet Server Crash: When column count becomes extremely high, each tablet's RowSet metadata, DeltaMemStore, Bloom filter, etc. grow proportionally. When the Tablet Server's memory limit is exceeded, the process is killed by OOM (Out of Memory).
- Compaction failure: Compaction of tablets with very many columns consumes extreme amounts of CPU and memory. If memory runs out during compaction, the tablet enters a compaction-impossible state, and subsequent write performance degrades severely.
- Schema propagation failure: When propagating the schema of a table with hundreds of columns to all tablets, if metadata size exceeds a threshold, heartbeats between Master and Tablet Server fail, and tablets can become unavailable.
- Unrecoverable state: In the worst case, if tablet metadata is corrupted due to excessive columns, the data in that tablet cannot be recovered.
Warning: The soft limit was set based on the range in which the Kudu development team can guarantee stable operation. Raising
--max_num_columnsto create columns approaching the hard limit can result in fatal crashes of the Kudu storage itself. If you need many columns, always choose to split tables instead.
3. Primary Key Design
Kudu's Primary Key doesn't just guarantee uniqueness — it determines the physical storage order of data. Therefore, the column composition and order of the Primary Key directly impact performance.
3.1 Primary Key Characteristics
| Characteristic | Description |
|---|---|
| Immutable | Cannot change Primary Key after table creation (no adding/removing/reordering columns) |
| NOT NULL required | Primary Key columns must be NOT NULL |
| Determines storage order | Data is stored sorted by Primary Key byte order |
| Acts as index | Primary Key automatically functions as a clustered index |
| Type restrictions | FLOAT, DOUBLE, BOOL types cannot be used in Primary Key |
3.2 Why Column Order Matters in Composite Primary Keys
Kudu performs scan range pruning based on the prefix of the Primary Key. This follows the same principle as composite indexes in RDBMS.
For example, in a table with Primary Key (region, timestamp, user_id):
| Query Condition | Pruning Possible? |
|---|---|
WHERE region = 'kr' | Yes — first column, efficient |
WHERE region = 'kr' AND timestamp >= '2026-04-01' | Yes — conditions follow prefix order |
WHERE timestamp >= '2026-04-01' | No — missing first column (region) condition, results in full scan |
WHERE user_id = 'abc' | No — skips prefix, results in full scan |
3.3 Primary Key Column Order Design Principles
- Place the most frequently filtered column first: Analyze query patterns and put the column most commonly appearing in WHERE clauses first
- Consider cardinality: Placing a very low-cardinality column (e.g., boolean) first yields minimal pruning benefit
- Prevent write hotspots: Placing a monotonically increasing value (e.g., timestamp) as the first column concentrates all writes on the last tablet
- Combine with Hash Partitioning: Applying hash partitioning to the first column is a common practice for hotspot prevention
-- Good example: hash partition for write distribution + range partition for time-series management
CREATE TABLE events (
event_id STRING NOT NULL,
event_time TIMESTAMP NOT NULL,
region STRING NOT NULL,
payload STRING,
PRIMARY KEY (event_id, event_time, region)
)
PARTITION BY
HASH (event_id) PARTITIONS 8,
RANGE (event_time) (
PARTITION VALUE = '2026-04'
)
STORED AS KUDU;-- Bad example: timestamp as first column with only range partitioning, no hash
CREATE TABLE events (
event_time TIMESTAMP NOT NULL,
event_id STRING NOT NULL,
region STRING NOT NULL,
payload STRING,
PRIMARY KEY (event_time, event_id, region)
)
PARTITION BY
RANGE (event_time) (
PARTITION VALUE = '2026-04'
)
STORED AS KUDU;
-- Problem: writes for latest data concentrate on a single tablet -> hotspot3.4 Primary Key Column Count Limits
While there is no explicit hard limit on the number of columns composing the Primary Key, having too many columns causes the following issues:
- Increased Primary Key index size: Larger Primary Key bytes per row means the index consumes more memory and disk
- Comparison operation cost: Primary Key comparison costs increase for Insert/Update/Delete operations
- Reduced schema flexibility: Since Primary Key columns cannot be changed, including more columns reduces room for schema changes
Recommendation: Compose Primary Keys with 3 to 5 columns or fewer, and always consider query patterns and partitioning strategy together.
4. Other Key Limitations
4.1 Data Type Restrictions
| Restriction | Description |
|---|---|
DECIMAL precision | Maximum precision 38, scale 38. Be cautious with financial data |
VARCHAR / STRING max size | Maximum 64 KB per cell |
BINARY max size | Maximum 64 KB per cell |
| Primary Key total cell size | Maximum 16 KB |
| Immutable columns | Primary Key column values cannot be changed after INSERT |
4.2 Schema Change Restrictions
| Restriction | Description |
|---|---|
| Primary Key change | Not possible. Cannot modify PK or add/remove columns via schema change |
| Column type change | Not possible. Cannot change the data type of existing columns |
| Column rename | Possible |
| Column add/drop | Possible (except Primary Key columns) |
| Partition schema change | Not possible. Cannot change hash partition bucket count or partition columns |
| Range partition add/drop | Possible. Range partitions can be dynamically added/dropped |
4.3 Replication Restrictions
| Restriction | Description |
|---|---|
| Replication Factor | Set at table creation, cannot be changed afterwards. Only odd numbers allowed (default 3) |
| Minimum Tablet Server count | Must have at least as many Tablet Servers as the Replication Factor |
4.4 Secondary Index
Kudu does not support secondary indexes. Therefore, queries on non-Primary Key columns result in full scans. When used with Impala, you rely on Impala's execution plan, making it critical to carefully design the Primary Key and partitioning strategy.
4.5 Multi-row Transactions
Kudu guarantees single-row atomicity only. Multi-row transactions are not supported, so business logic requiring simultaneous updates across multiple rows must be compensated at the application level.
4.6 Table Count Limits
There is no explicit hard limit on the number of tables in a cluster, but as tables increase, each table's tablet count accumulates, eventually hitting the tablet count limit. The key formula is: Number of tables x tablets per table = total cluster tablet count.
5. Design Checklist
Check the following items before creating a table.
Partitioning
- Is the hash partition bucket count close to a multiple of the Tablet Server count?
- If using range partitions, are you avoiding pre-creating excessive future partitions?
- Is the total tablet count per table (Hash buckets x Range partitions) appropriate?
- Does the total tablet count per Tablet Server stay below 1,000?
- Is the disk waste from empty tablets within acceptable limits?
- Can the WAL disk capacity handle the total tablet count?
- Have you considered using monthly partitions instead of daily partitions?
- Have you calculated the total partition count across all tables in the cluster?
Column Design
- Is the column count below 100? (Consider splitting tables if it exceeds this)
- Have you avoided forcibly raising the
--max_num_columnsflag? (Crash risk) - Are there no unnecessary columns included?
- Is there no possibility of data exceeding 64 KB in STRING/BINARY columns?
Primary Key
- Does the Primary Key column order match major query patterns?
- If a monotonically increasing value is the first Primary Key column, have you applied hash partitioning?
- Have you avoided including unnecessarily many columns in the Primary Key?
- Is the Primary Key total cell size 16 KB or less?
- Have you avoided using FLOAT, DOUBLE, or BOOL types in Primary Key columns?
Operations
- Is the Replication Factor set appropriately? (3 recommended for production)
- If schema changes may be needed in the future, is the Primary Key designed minimally?
- Given the lack of secondary indexes, is the PK designed to match query patterns?
6. Summary
| Item | Limitation / Recommendation |
|---|---|
| Tablets per Tablet Server | 1,000 or fewer recommended |
| Empty tablet disk usage | Several MB per tablet (including WAL) |
| WAL disk | Grows proportionally with partition count — must be considered in capacity planning |
| Recommended partition granularity | Prefer monthly partitions over daily |
| Forcing column count hard limit increase | Tablet Server crash risk — strongly not recommended |
| Maximum columns per table | 300 (recommended 100 or fewer) |
| Primary Key change | Not possible |
| Primary Key column type restrictions | FLOAT, DOUBLE, BOOL cannot be used |
| Primary Key total cell size | 16 KB or less |
| STRING/BINARY max cell size | 64 KB |
| Secondary Index | Not supported |
| Multi-row transactions | Not supported (single-row atomicity only) |
| Dynamic Range Partition management | Supported (ADD / DROP) |
| Hash Partition change | Not possible |
| Replication Factor change | Not possible |
Kudu is a storage engine where decisions made at design time affect operations for the entire lifecycle. In particular, since Primary Key and partitioning schemas cannot be changed once set, you must thoroughly consider query patterns, data growth rates, and cluster scale before creating tables. The "we can fix it later" approach simply does not work with Kudu.
This post was written based on the Apache Kudu Known Issues and Limitations and the Schema Design Guide. If you need assistance with Kudu table design or operations, feel free to reach out.
— Data Dynamics Engineering Team