Blog
kudupartitioninglimitation

Functional Limitations You Must Know When Designing Apache Kudu Tables

A comprehensive guide to Kudu's partition count limits, column count restrictions, Primary Key design order, and other critical constraints for production operations.

Data DynamicsApril 13, 202614 min read

Apache Kudu is a powerful storage engine that supports both fast analytical queries and real-time upserts simultaneously. However, there are functional limitations you must be aware of when designing tables. Ignoring these constraints can lead to wasted disk space, performance degradation, and even cluster instability. This post is based on the Kudu Known Issues and Limitations documentation and the Schema Design Guide, summarizing the constraints that are easy to overlook in production.

1. Partition (Tablet) Count Limits

In Kudu, tables are split into multiple tablets that are distributed across Tablet Servers. Partitioning is essential for parallel processing, but indiscriminately increasing the partition count causes serious side effects.

1.1 Fixed Overhead Per Partition

Each tablet consumes several MB of disk space even when it contains no data. This is due to metadata, WAL (Write-Ahead Log), and consensus state information maintained per tablet.

ItemApproximate Size
Tablet metadataHundreds of KB to several MB
WAL segment (minimum 1)Several MB
Consensus metadataTens of KB
Bloom filter, Primary Key indexProportional to data volume, but minimum size exists

For example, creating 16 hash partitions x 365 range partitions (daily for one year) = 5,840 tablets will consume tens of GB of disk space even with zero data. This is purely the cost of partition metadata and structural maintenance, not actual data.

The guidelines from the official Kudu documentation are as follows:

CriterionRecommended Value
Tablets per Tablet Server1,000 or fewer (ideally several hundred)
Tablets per tableTotal Tablet Servers in cluster x a few dozen or fewer
Recommended data size per tablet1 GB ~ 10 GB

Key principle: Rather than increasing tablet count to boost parallelism, maintaining appropriately sized tablets is better for both performance and stability.

1.3 Problems Caused by Too Many Partitions

  • WAL disk space explosion: This is a particularly severe issue. Each tablet maintains an independent WAL (Write-Ahead Log) segment, and even empty tablets must have a WAL file. As partition count grows rapidly, WAL disk usage increases exponentially regardless of actual data size. During initial hardware capacity planning, WAL disks are often sized smaller relative to data disks. When partition counts exceed expectations, the WAL disk fills up first, resulting in inability to create new tables or add range partitions to existing tables. WAL disk exhaustion can lead to Tablet Server crashes, so hardware capacity planning must calculate WAL disk size generously based on the total tablet count across the entire cluster.
  • Disk space waste: Even empty tablets consume several MB each, making partition count itself a major driver of disk consumption
  • Memory pressure: Each tablet maintains MemRowSet, DeltaMemStore, etc. in memory. As tablet count grows, Tablet Server memory becomes insufficient
  • Increased Master load: Kudu Master manages location information for all tablets. Excessive tablet counts slow down heartbeat processing and tablet location lookups
  • Compaction contention: Each tablet performs compaction independently, so more tablets means more I/O contention
  • Leader election storms: When a Tablet Server restarts and thousands of tablets simultaneously initiate leader elections, the entire cluster becomes unstable for several minutes
  • Cluster-wide performance degradation: When the partition count exceeds the cluster's processing capacity, read/write performance degrades across all tables. This is not limited to a single table but is a serious issue that affects every table in the cluster.

Warning: The partition count problem is not a single-table issue. The sum of partition counts across all tables in the cluster determines the Tablet Server's processing limits. Over-partitioning one table degrades performance for all other tables as well.

1.4 Daily vs Monthly Partitions: The Importance of Partition Granularity

When dealing with time-series data, creating range partitions on a daily basis is one of the most common mistakes. Let's compare the tablet count explosion caused by daily partitions:

Partition StrategyHash 8 x Range PartitionsTablets per YearTablets over 3 Years
Daily partitions8 x 3652,9208,760
Monthly partitions8 x 1296288

The same table can have a 30x difference in tablet count between daily and monthly partitions. If you have multiple tables, this difference multiplies.

Scenario (5 tables, Hash 8, 3-year retention)Daily PartitionsMonthly Partitions
Total cluster tablet count43,8001,440
Tablets per server (5 Tablet Servers)8,760288
WAL disk consumption (including empty tablets)Hundreds of GBA few GB

Given the recommended limit of 1,000 tablets per Tablet Server, the daily partition strategy results in 8,760 per server, exceeding the recommendation by 8.7x. This destabilizes the cluster and causes failures due to WAL disk exhaustion.

Recommendation: Unless you absolutely must query data at daily granularity, adopt monthly partitions as the default strategy. Even when daily queries are needed, if the Primary Key includes a date column, efficient scan pruning is still possible within monthly partitions. Choosing a coarser partition granularity is almost always safer than fine-grained partitioning.

1.5 Dynamic Range Partition Management

Range partitions can be dynamically added or removed using ADD RANGE PARTITION / DROP RANGE PARTITION DDL. When handling time-series data, instead of pre-creating hundreds of future partitions, add them as the time approaches and drop old partitions as an effective strategy.

-- Add a new monthly partition
ALTER TABLE events ADD RANGE PARTITION VALUE = '2026-05';
 
-- Drop an old partition (all data in the partition is also deleted)
ALTER TABLE events DROP RANGE PARTITION VALUE = '2024-01';

2. Column Count Limits

2.1 Hard Limit and Soft Limit

Kudu supports a maximum of 300 columns per table. However, this is the hard limit, and in practice, far fewer columns are recommended.

CategoryColumn Count
Hard limit300
Soft limit (recommended)A few dozen to 100 or fewer

2.2 Problems with Too Many Columns

  • Scan performance degradation: As a columnar storage engine, Kudu must read data blocks from every column for SELECT *. More columns means dramatically more I/O
  • Increased RowSet compaction cost: Compaction requires reading and rewriting all columns, so CPU and I/O costs grow with column count
  • Increased metadata size: Schema information is included in every tablet's metadata, so more columns means larger metadata
  • Client-side overhead: Query engines like Impala experience delays when fetching schemas

Tip: If you need more than 200 columns, consider splitting the table logically or dividing it by groups of columns that are frequently queried together.

2.3 Crash Risk When Forcing Column Count to Hard Limit

To bypass Kudu's soft limit, some users set the --max_num_columns flag to forcibly raise the column limit up to the hard limit (300). This is extremely dangerous and can cause the following severe problems:

  • Tablet Server Crash: When column count becomes extremely high, each tablet's RowSet metadata, DeltaMemStore, Bloom filter, etc. grow proportionally. When the Tablet Server's memory limit is exceeded, the process is killed by OOM (Out of Memory).
  • Compaction failure: Compaction of tablets with very many columns consumes extreme amounts of CPU and memory. If memory runs out during compaction, the tablet enters a compaction-impossible state, and subsequent write performance degrades severely.
  • Schema propagation failure: When propagating the schema of a table with hundreds of columns to all tablets, if metadata size exceeds a threshold, heartbeats between Master and Tablet Server fail, and tablets can become unavailable.
  • Unrecoverable state: In the worst case, if tablet metadata is corrupted due to excessive columns, the data in that tablet cannot be recovered.

Warning: The soft limit was set based on the range in which the Kudu development team can guarantee stable operation. Raising --max_num_columns to create columns approaching the hard limit can result in fatal crashes of the Kudu storage itself. If you need many columns, always choose to split tables instead.

3. Primary Key Design

Kudu's Primary Key doesn't just guarantee uniqueness — it determines the physical storage order of data. Therefore, the column composition and order of the Primary Key directly impact performance.

3.1 Primary Key Characteristics

CharacteristicDescription
ImmutableCannot change Primary Key after table creation (no adding/removing/reordering columns)
NOT NULL requiredPrimary Key columns must be NOT NULL
Determines storage orderData is stored sorted by Primary Key byte order
Acts as indexPrimary Key automatically functions as a clustered index
Type restrictionsFLOAT, DOUBLE, BOOL types cannot be used in Primary Key

3.2 Why Column Order Matters in Composite Primary Keys

Kudu performs scan range pruning based on the prefix of the Primary Key. This follows the same principle as composite indexes in RDBMS.

For example, in a table with Primary Key (region, timestamp, user_id):

Query ConditionPruning Possible?
WHERE region = 'kr'Yes — first column, efficient
WHERE region = 'kr' AND timestamp >= '2026-04-01'Yes — conditions follow prefix order
WHERE timestamp >= '2026-04-01'No — missing first column (region) condition, results in full scan
WHERE user_id = 'abc'No — skips prefix, results in full scan

3.3 Primary Key Column Order Design Principles

  1. Place the most frequently filtered column first: Analyze query patterns and put the column most commonly appearing in WHERE clauses first
  2. Consider cardinality: Placing a very low-cardinality column (e.g., boolean) first yields minimal pruning benefit
  3. Prevent write hotspots: Placing a monotonically increasing value (e.g., timestamp) as the first column concentrates all writes on the last tablet
  4. Combine with Hash Partitioning: Applying hash partitioning to the first column is a common practice for hotspot prevention
-- Good example: hash partition for write distribution + range partition for time-series management
CREATE TABLE events (
    event_id STRING NOT NULL,
    event_time TIMESTAMP NOT NULL,
    region STRING NOT NULL,
    payload STRING,
    PRIMARY KEY (event_id, event_time, region)
)
PARTITION BY
    HASH (event_id) PARTITIONS 8,
    RANGE (event_time) (
        PARTITION VALUE = '2026-04'
    )
STORED AS KUDU;
-- Bad example: timestamp as first column with only range partitioning, no hash
CREATE TABLE events (
    event_time TIMESTAMP NOT NULL,
    event_id STRING NOT NULL,
    region STRING NOT NULL,
    payload STRING,
    PRIMARY KEY (event_time, event_id, region)
)
PARTITION BY
    RANGE (event_time) (
        PARTITION VALUE = '2026-04'
    )
STORED AS KUDU;
-- Problem: writes for latest data concentrate on a single tablet -> hotspot

3.4 Primary Key Column Count Limits

While there is no explicit hard limit on the number of columns composing the Primary Key, having too many columns causes the following issues:

  • Increased Primary Key index size: Larger Primary Key bytes per row means the index consumes more memory and disk
  • Comparison operation cost: Primary Key comparison costs increase for Insert/Update/Delete operations
  • Reduced schema flexibility: Since Primary Key columns cannot be changed, including more columns reduces room for schema changes

Recommendation: Compose Primary Keys with 3 to 5 columns or fewer, and always consider query patterns and partitioning strategy together.

4. Other Key Limitations

4.1 Data Type Restrictions

RestrictionDescription
DECIMAL precisionMaximum precision 38, scale 38. Be cautious with financial data
VARCHAR / STRING max sizeMaximum 64 KB per cell
BINARY max sizeMaximum 64 KB per cell
Primary Key total cell sizeMaximum 16 KB
Immutable columnsPrimary Key column values cannot be changed after INSERT

4.2 Schema Change Restrictions

RestrictionDescription
Primary Key changeNot possible. Cannot modify PK or add/remove columns via schema change
Column type changeNot possible. Cannot change the data type of existing columns
Column renamePossible
Column add/dropPossible (except Primary Key columns)
Partition schema changeNot possible. Cannot change hash partition bucket count or partition columns
Range partition add/dropPossible. Range partitions can be dynamically added/dropped

4.3 Replication Restrictions

RestrictionDescription
Replication FactorSet at table creation, cannot be changed afterwards. Only odd numbers allowed (default 3)
Minimum Tablet Server countMust have at least as many Tablet Servers as the Replication Factor

4.4 Secondary Index

Kudu does not support secondary indexes. Therefore, queries on non-Primary Key columns result in full scans. When used with Impala, you rely on Impala's execution plan, making it critical to carefully design the Primary Key and partitioning strategy.

4.5 Multi-row Transactions

Kudu guarantees single-row atomicity only. Multi-row transactions are not supported, so business logic requiring simultaneous updates across multiple rows must be compensated at the application level.

4.6 Table Count Limits

There is no explicit hard limit on the number of tables in a cluster, but as tables increase, each table's tablet count accumulates, eventually hitting the tablet count limit. The key formula is: Number of tables x tablets per table = total cluster tablet count.

5. Design Checklist

Check the following items before creating a table.

Partitioning

  • Is the hash partition bucket count close to a multiple of the Tablet Server count?
  • If using range partitions, are you avoiding pre-creating excessive future partitions?
  • Is the total tablet count per table (Hash buckets x Range partitions) appropriate?
  • Does the total tablet count per Tablet Server stay below 1,000?
  • Is the disk waste from empty tablets within acceptable limits?
  • Can the WAL disk capacity handle the total tablet count?
  • Have you considered using monthly partitions instead of daily partitions?
  • Have you calculated the total partition count across all tables in the cluster?

Column Design

  • Is the column count below 100? (Consider splitting tables if it exceeds this)
  • Have you avoided forcibly raising the --max_num_columns flag? (Crash risk)
  • Are there no unnecessary columns included?
  • Is there no possibility of data exceeding 64 KB in STRING/BINARY columns?

Primary Key

  • Does the Primary Key column order match major query patterns?
  • If a monotonically increasing value is the first Primary Key column, have you applied hash partitioning?
  • Have you avoided including unnecessarily many columns in the Primary Key?
  • Is the Primary Key total cell size 16 KB or less?
  • Have you avoided using FLOAT, DOUBLE, or BOOL types in Primary Key columns?

Operations

  • Is the Replication Factor set appropriately? (3 recommended for production)
  • If schema changes may be needed in the future, is the Primary Key designed minimally?
  • Given the lack of secondary indexes, is the PK designed to match query patterns?

6. Summary

ItemLimitation / Recommendation
Tablets per Tablet Server1,000 or fewer recommended
Empty tablet disk usageSeveral MB per tablet (including WAL)
WAL diskGrows proportionally with partition count — must be considered in capacity planning
Recommended partition granularityPrefer monthly partitions over daily
Forcing column count hard limit increaseTablet Server crash risk — strongly not recommended
Maximum columns per table300 (recommended 100 or fewer)
Primary Key changeNot possible
Primary Key column type restrictionsFLOAT, DOUBLE, BOOL cannot be used
Primary Key total cell size16 KB or less
STRING/BINARY max cell size64 KB
Secondary IndexNot supported
Multi-row transactionsNot supported (single-row atomicity only)
Dynamic Range Partition managementSupported (ADD / DROP)
Hash Partition changeNot possible
Replication Factor changeNot possible

Kudu is a storage engine where decisions made at design time affect operations for the entire lifecycle. In particular, since Primary Key and partitioning schemas cannot be changed once set, you must thoroughly consider query patterns, data growth rates, and cluster scale before creating tables. The "we can fix it later" approach simply does not work with Kudu.


This post was written based on the Apache Kudu Known Issues and Limitations and the Schema Design Guide. If you need assistance with Kudu table design or operations, feel free to reach out.

— Data Dynamics Engineering Team