kudupartitioningperformance

The Complete Guide to Apache Kudu Partitioning: Hash, Range, and Multilevel Strategies

Compare Kudu's Hash Partitioning, Range Partitioning, and Multilevel Partitioning strategies with practical examples. Covers Partition Pruning internals and design best practices.

Data DynamicsApril 13, 202612 min read

When creating tables in Apache Kudu, partitioning is not optional — it's mandatory. Kudu does not provide a default partitioning strategy, so you must explicitly define one when creating a table. Poor partitioning leads to write hotspots, read performance degradation, and unbounded tablet growth. This post is based on the Kudu Schema Design Guide — Partitioning section, summarizing each partitioning strategy's characteristics and practical applications.

1. Partitioning Fundamentals

Kudu tables are split into units called tablets, distributed across multiple Tablet Servers. A row always belongs to exactly one tablet, and which tablet a row is assigned to is determined by the partitioning strategy defined at table creation.

Partitioning strategy cannot be changed after table creation (with the exception of adding/removing Range Partitions). This makes careful upfront design critical.

Kudu offers three main partitioning approaches:

Partitioning Type	Description
Hash Partitioning	Distributes rows into buckets based on hash values of Primary Key columns
Range Partitioning	Distributes rows into contiguous key range segments based on Primary Key columns
Multilevel Partitioning	Combines Hash + Range or Hash + Hash

This post uses the metrics table example from the official Kudu documentation to illustrate each strategy.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
);

This table stores time-series metric data collected from multiple hosts. The combination of host, metric, and time forms the Primary Key, and performance varies significantly based on the partitioning strategy chosen.

2. Range Partitioning

2.1 How It Works

Range Partitioning distributes rows using a totally-ordered range partition key. Each partition is assigned a contiguous segment of the range keyspace. Range partition columns must be a subset of the Primary Key columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY RANGE (time) (
    PARTITION VALUE = 1,
    PARTITION VALUE = 2,
    PARTITION VALUE = 3
)
STORED AS KUDU;

2.2 Advantages

Optimized time-bounded scans: Queries with time range conditions automatically skip partitions outside the range (Partition Pruning)
Dynamic partition management: Partitions can be added or removed at runtime without affecting other partitions' availability
Efficient data deletion: Dropping a partition immediately deletes all data within it and reclaims disk space, without waiting for compaction after row-level DELETEs

2.3 Disadvantages

Write hotspots: For time-series data, current-time writes always concentrate on a single partition, overloading one Tablet Server
Limited parallelism: When writes concentrate on one tablet, resources on other Tablet Servers sit idle

Key takeaway: Range Partitioning alone is good for reads but bad for writes. With time-series data, the latest partition becomes a hotspot.

3. Hash Partitioning

3.1 How It Works

Hash Partitioning distributes rows into multiple buckets based on hash values of specified columns. The number of buckets is set at table creation and cannot be changed afterward. Any subset of Primary Key columns can be used as hash columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY HASH (host, metric) PARTITIONS 8
STORED AS KUDU;

3.2 Advantages

Write distribution: The hash function distributes rows evenly, preventing write hotspots
Even tablet sizes: Data is distributed uniformly across buckets
Optimized host/metric lookups: Queries with equality predicates on host and metric enable Partition Pruning

3.3 Disadvantages

Unbounded tablet growth: Unlike Range Partitions, hash buckets cannot be added or removed, so tablets grow indefinitely as data accumulates
Inefficient time-range queries: Since the time column isn't included in the hash, time-range conditions alone cannot trigger Partition Pruning
Costly data deletion: Removing old data requires row-level DELETEs, and actual disk space isn't reclaimed until compaction completes

Key takeaway: Hash Partitioning alone is good for writes but suffers from unbounded tablet growth. Over time, tablets can grow to unmanageable sizes.

4. Multilevel Partitioning (Hash + Range Combination)

4.1 How It Works

Kudu allows combining multiple levels of partitioning on a single table. Zero or more Hash Partition levels can be combined with an optional Range Partition level. The total tablet count is the product of partition counts at each level.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY
    HASH (host, metric) PARTITIONS 8,
    RANGE (time) (
        PARTITION VALUE = 1,
        PARTITION VALUE = 2,
        PARTITION VALUE = 3
    )
STORED AS KUDU;

In this example, the total tablet count is 8 (Hash buckets) x 3 (Range partitions) = 24 tablets.

4.2 Advantages of Hash + Range

This combination merges the strengths of both Hash and Range while compensating for their individual weaknesses.

Aspect	Hash Only	Range Only	Hash + Range
Write distribution	Even distribution	Hotspot	Even distribution
Time-range reads	No pruning	Pruning works	Pruning works
Specific host/metric reads	Pruning works	No pruning	Pruning works
Tablet growth management	Unbounded growth	Dynamic add/drop	Dynamic add/drop
Old data deletion	Row-level DELETE	Partition DROP	Partition DROP

Writes: Hash distributes across buckets by host and metric, so writes for the same time period are parallelized across multiple tablets
Reads: Time-range conditions trigger Range Partition Pruning, while host/metric equality conditions trigger Hash Partition Pruning — both work simultaneously
Management: Dropping old Range Partitions immediately deletes that time range's data and reclaims disk space

4.3 Hash + Hash Combination

Multiple Hash levels can also be combined. However, different Hash levels must not hash the same columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY
    HASH (host) PARTITIONS 4,
    HASH (metric) PARTITIONS 4
STORED AS KUDU;

This results in 4 x 4 = 16 tablets.

Hash + Hash characteristics:

Writes are spread evenly across all tablets
However, without Range Partitioning, tablets grow indefinitely and old data cannot be dropped at partition granularity
May be slightly more prone to hotspotting compared to hashing all columns in a single level

Recommendation: For time-series data, use Hash + Range combination. Hash + Hash is only suitable when time-based data management is not required.

5. Flexible Partitioning (Per-Range Hash Schema)

Starting with Kudu 1.17, you can apply different Hash schemas per Range Partition. This allows more fine-grained control over hotspots within specific Range segments.

For example, you could assign more Hash buckets to recent data ranges for better write distribution while using fewer buckets for older data to save on tablet count.

Important caveat: The number of Hash Partition levels must be the same across all ranges in a table. Bucket counts can differ, but the number of Hash levels must match.

6. Partition Pruning

Partition Pruning is a core performance optimization mechanism in Kudu. During scans, if predicates can determine that a partition is entirely filtered out, Kudu skips scanning that partition's tablet entirely.

6.1 Pruning Conditions

Partitioning Type	Conditions for Pruning
Hash Partition	Requires equality predicates on every hashed column
Range Partition	Requires equality or range predicates on the range partition columns
Multilevel	Pruning works independently at each level

6.2 Pruning Examples (Hash + Range)

For the metrics table with HASH (host, metric) PARTITIONS 8, RANGE (time):

Query Condition	Hash Pruning	Range Pruning	Effect
`WHERE host = 'a' AND metric = 'cpu' AND time >= 100 AND time < 200`	Yes (8→1 bucket)	Yes (matching range only)	Optimal — scans only a few tablets
`WHERE host = 'a' AND metric = 'cpu'`	Yes (8→1 bucket)	No	Hash Pruning only
`WHERE time >= 100 AND time < 200`	No	Yes (matching range only)	Range Pruning only
`WHERE host = 'a'`	No (missing metric)	No	No pruning — full scan

Important: Hash Partition Pruning works only when equality predicates exist for all hashed columns. If even one is missing, pruning is impossible.

7. Partitioning Strategy Comparison Summary

Extending the metrics table comparison from the official documentation:

Strategy	Write Perf	Read Perf	Tablet Growth	Data Deletion	Pruning
`RANGE (time)`	Poor (hotspot)	Good (time pruning)	Good (dynamic mgmt)	Good (DROP)	Range only
`HASH (host, metric)`	Good (distributed)	Good (host/metric pruning)	Poor (unbounded)	Poor (row DELETE)	Hash only
`HASH (host, metric) + RANGE (time)`	Good (distributed)	Good (both pruning)	Good (dynamic mgmt)	Good (DROP)	Both
`HASH (host) + HASH (metric)`	Good (distributed)	Limited	Poor (unbounded)	Poor (row DELETE)	Hash only

8. Practical Design Guidelines

8.1 Determining Tablet Count

The tablet count at table creation is determined by:

Total tablets = Hash buckets × Range partitions

Tables with heavy read/write workloads should have at least as many tablets as there are Tablet Servers to distribute load across all servers.

Cluster Size	Minimum Recommended Tablets	Example Hash Buckets
3 Tablet Servers	3 or more	4 or 8
5 Tablet Servers	5 or more	8
10 Tablet Servers	10 or more	8 or 16

8.2 Hash Partition Design Considerations

Column selection: Choose high-cardinality columns to ensure even distribution across buckets
Bucket count: Typically set as a multiple of the Tablet Server count. Too many creates tablet count explosion; too few limits parallelism
Immutable: Hash bucket count cannot be changed after table creation — decide carefully

8.3 Range Partition Design Considerations

Partition granularity: Prefer monthly partitions over daily. Daily partitions increase tablet count by 30x
Future partitions: Don't pre-create in bulk; add dynamically as needed
Data retention policy: Dropping old partitions immediately reclaims disk space — far more efficient than row-level DELETE
Empty partitions: Range Partitions with no data still maintain tablets and consume WAL disk

8.4 Understanding Disk Space Reclamation

When data is deleted row by row in Kudu, the disk space occupied by deleted rows is not immediately reclaimed until compaction completes. When you need to delete large volumes of historical data, dropping Range Partitions is the only way to immediately reclaim disk space.

-- Inefficient: row-level delete -> disk space reclaimed only after compaction
DELETE FROM metrics WHERE time < 1000;
 
-- Efficient: partition-level drop -> immediate disk space reclamation
ALTER TABLE metrics DROP RANGE PARTITION VALUE = 1;

9. Partitioning Design Checklist

Item	Check
Have you explicitly defined a partitioning strategy? (Kudu provides no default)
For time-series data, are you using Hash + Range combination?
Did you choose high-cardinality columns for Hash partitioning?
Is the Hash bucket count close to a multiple of the Tablet Server count?
Have you considered monthly instead of daily Range Partitions?
Is the total tablet count (Hash buckets × Range partitions) below 1,000 per Tablet Server?
Is your old data deletion strategy based on Range Partition DROP?
Have you avoided pre-creating excessive future Range Partitions?
Have you verified that Partition Pruning works for your main query patterns?
Do your queries include equality predicates for Hash Pruning?

10. Summary

Item	Details
Default partitioning	Not provided — must be explicitly designed
Recommended combination (time-series)	Hash + Range
Hash Partition changes	Not possible (both bucket count and columns)
Range Partition changes	Only add/drop supported
Hash Pruning requirement	Equality predicates on all hash columns required
Range Pruning requirement	Equality or range predicates on Range columns
Tablet count formula	Hash buckets × Range partitions
Immediate disk space reclamation	Only via Range Partition DROP (row DELETE requires compaction)
Flexible Partitioning	Kudu 1.17+: different Hash bucket counts per Range

Partitioning is the most important decision in Kudu table design. Once set, it cannot be changed (except for Range add/drop), and it directly impacts write/read performance, data management, and disk usage. Before creating a table, thoroughly analyze query patterns, data growth rates, and retention policies, and consider Hash + Range combination as your default strategy.

This post was written based on the Apache Kudu Schema Design Guide — Partitioning. If you need assistance with Kudu partitioning design or operations, feel free to reach out.

— Data Dynamics Engineering Team