The Complete Guide to Apache Kudu Partitioning: Hash, Range, and Multilevel Strategies
Compare Kudu's Hash Partitioning, Range Partitioning, and Multilevel Partitioning strategies with practical examples. Covers Partition Pruning internals and design best practices.
When creating tables in Apache Kudu, partitioning is not optional — it's mandatory. Kudu does not provide a default partitioning strategy, so you must explicitly define one when creating a table. Poor partitioning leads to write hotspots, read performance degradation, and unbounded tablet growth. This post is based on the Kudu Schema Design Guide — Partitioning section, summarizing each partitioning strategy's characteristics and practical applications.
1. Partitioning Fundamentals
Kudu tables are split into units called tablets, distributed across multiple Tablet Servers. A row always belongs to exactly one tablet, and which tablet a row is assigned to is determined by the partitioning strategy defined at table creation.
Partitioning strategy cannot be changed after table creation (with the exception of adding/removing Range Partitions). This makes careful upfront design critical.
Kudu offers three main partitioning approaches:
| Partitioning Type | Description |
|---|---|
| Hash Partitioning | Distributes rows into buckets based on hash values of Primary Key columns |
| Range Partitioning | Distributes rows into contiguous key range segments based on Primary Key columns |
| Multilevel Partitioning | Combines Hash + Range or Hash + Hash |
This post uses the metrics table example from the official Kudu documentation to illustrate each strategy.
CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time INT64 NOT NULL,
value DOUBLE NOT NULL,
PRIMARY KEY (host, metric, time)
);This table stores time-series metric data collected from multiple hosts. The combination of host, metric, and time forms the Primary Key, and performance varies significantly based on the partitioning strategy chosen.
2. Range Partitioning
2.1 How It Works
Range Partitioning distributes rows using a totally-ordered range partition key. Each partition is assigned a contiguous segment of the range keyspace. Range partition columns must be a subset of the Primary Key columns.
CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time INT64 NOT NULL,
value DOUBLE NOT NULL,
PRIMARY KEY (host, metric, time)
)
PARTITION BY RANGE (time) (
PARTITION VALUE = 1,
PARTITION VALUE = 2,
PARTITION VALUE = 3
)
STORED AS KUDU;2.2 Advantages
- Optimized time-bounded scans: Queries with time range conditions automatically skip partitions outside the range (Partition Pruning)
- Dynamic partition management: Partitions can be added or removed at runtime without affecting other partitions' availability
- Efficient data deletion: Dropping a partition immediately deletes all data within it and reclaims disk space, without waiting for compaction after row-level DELETEs
2.3 Disadvantages
- Write hotspots: For time-series data, current-time writes always concentrate on a single partition, overloading one Tablet Server
- Limited parallelism: When writes concentrate on one tablet, resources on other Tablet Servers sit idle
Key takeaway: Range Partitioning alone is good for reads but bad for writes. With time-series data, the latest partition becomes a hotspot.
3. Hash Partitioning
3.1 How It Works
Hash Partitioning distributes rows into multiple buckets based on hash values of specified columns. The number of buckets is set at table creation and cannot be changed afterward. Any subset of Primary Key columns can be used as hash columns.
CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time INT64 NOT NULL,
value DOUBLE NOT NULL,
PRIMARY KEY (host, metric, time)
)
PARTITION BY HASH (host, metric) PARTITIONS 8
STORED AS KUDU;3.2 Advantages
- Write distribution: The hash function distributes rows evenly, preventing write hotspots
- Even tablet sizes: Data is distributed uniformly across buckets
- Optimized host/metric lookups: Queries with equality predicates on
hostandmetricenable Partition Pruning
3.3 Disadvantages
- Unbounded tablet growth: Unlike Range Partitions, hash buckets cannot be added or removed, so tablets grow indefinitely as data accumulates
- Inefficient time-range queries: Since the time column isn't included in the hash, time-range conditions alone cannot trigger Partition Pruning
- Costly data deletion: Removing old data requires row-level DELETEs, and actual disk space isn't reclaimed until compaction completes
Key takeaway: Hash Partitioning alone is good for writes but suffers from unbounded tablet growth. Over time, tablets can grow to unmanageable sizes.
4. Multilevel Partitioning (Hash + Range Combination)
4.1 How It Works
Kudu allows combining multiple levels of partitioning on a single table. Zero or more Hash Partition levels can be combined with an optional Range Partition level. The total tablet count is the product of partition counts at each level.
CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time INT64 NOT NULL,
value DOUBLE NOT NULL,
PRIMARY KEY (host, metric, time)
)
PARTITION BY
HASH (host, metric) PARTITIONS 8,
RANGE (time) (
PARTITION VALUE = 1,
PARTITION VALUE = 2,
PARTITION VALUE = 3
)
STORED AS KUDU;In this example, the total tablet count is 8 (Hash buckets) x 3 (Range partitions) = 24 tablets.
4.2 Advantages of Hash + Range
This combination merges the strengths of both Hash and Range while compensating for their individual weaknesses.
| Aspect | Hash Only | Range Only | Hash + Range |
|---|---|---|---|
| Write distribution | Even distribution | Hotspot | Even distribution |
| Time-range reads | No pruning | Pruning works | Pruning works |
| Specific host/metric reads | Pruning works | No pruning | Pruning works |
| Tablet growth management | Unbounded growth | Dynamic add/drop | Dynamic add/drop |
| Old data deletion | Row-level DELETE | Partition DROP | Partition DROP |
- Writes: Hash distributes across buckets by
hostandmetric, so writes for the same time period are parallelized across multiple tablets - Reads: Time-range conditions trigger Range Partition Pruning, while host/metric equality conditions trigger Hash Partition Pruning — both work simultaneously
- Management: Dropping old Range Partitions immediately deletes that time range's data and reclaims disk space
4.3 Hash + Hash Combination
Multiple Hash levels can also be combined. However, different Hash levels must not hash the same columns.
CREATE TABLE metrics (
host STRING NOT NULL,
metric STRING NOT NULL,
time INT64 NOT NULL,
value DOUBLE NOT NULL,
PRIMARY KEY (host, metric, time)
)
PARTITION BY
HASH (host) PARTITIONS 4,
HASH (metric) PARTITIONS 4
STORED AS KUDU;This results in 4 x 4 = 16 tablets.
Hash + Hash characteristics:
- Writes are spread evenly across all tablets
- However, without Range Partitioning, tablets grow indefinitely and old data cannot be dropped at partition granularity
- May be slightly more prone to hotspotting compared to hashing all columns in a single level
Recommendation: For time-series data, use Hash + Range combination. Hash + Hash is only suitable when time-based data management is not required.
5. Flexible Partitioning (Per-Range Hash Schema)
Starting with Kudu 1.17, you can apply different Hash schemas per Range Partition. This allows more fine-grained control over hotspots within specific Range segments.
For example, you could assign more Hash buckets to recent data ranges for better write distribution while using fewer buckets for older data to save on tablet count.
Important caveat: The number of Hash Partition levels must be the same across all ranges in a table. Bucket counts can differ, but the number of Hash levels must match.
6. Partition Pruning
Partition Pruning is a core performance optimization mechanism in Kudu. During scans, if predicates can determine that a partition is entirely filtered out, Kudu skips scanning that partition's tablet entirely.
6.1 Pruning Conditions
| Partitioning Type | Conditions for Pruning |
|---|---|
| Hash Partition | Requires equality predicates on every hashed column |
| Range Partition | Requires equality or range predicates on the range partition columns |
| Multilevel | Pruning works independently at each level |
6.2 Pruning Examples (Hash + Range)
For the metrics table with HASH (host, metric) PARTITIONS 8, RANGE (time):
| Query Condition | Hash Pruning | Range Pruning | Effect |
|---|---|---|---|
WHERE host = 'a' AND metric = 'cpu' AND time >= 100 AND time < 200 | Yes (8→1 bucket) | Yes (matching range only) | Optimal — scans only a few tablets |
WHERE host = 'a' AND metric = 'cpu' | Yes (8→1 bucket) | No | Hash Pruning only |
WHERE time >= 100 AND time < 200 | No | Yes (matching range only) | Range Pruning only |
WHERE host = 'a' | No (missing metric) | No | No pruning — full scan |
Important: Hash Partition Pruning works only when equality predicates exist for all hashed columns. If even one is missing, pruning is impossible.
7. Partitioning Strategy Comparison Summary
Extending the metrics table comparison from the official documentation:
| Strategy | Write Perf | Read Perf | Tablet Growth | Data Deletion | Pruning |
|---|---|---|---|---|---|
RANGE (time) | Poor (hotspot) | Good (time pruning) | Good (dynamic mgmt) | Good (DROP) | Range only |
HASH (host, metric) | Good (distributed) | Good (host/metric pruning) | Poor (unbounded) | Poor (row DELETE) | Hash only |
HASH (host, metric) + RANGE (time) | Good (distributed) | Good (both pruning) | Good (dynamic mgmt) | Good (DROP) | Both |
HASH (host) + HASH (metric) | Good (distributed) | Limited | Poor (unbounded) | Poor (row DELETE) | Hash only |
8. Practical Design Guidelines
8.1 Determining Tablet Count
The tablet count at table creation is determined by:
Total tablets = Hash buckets × Range partitions
Tables with heavy read/write workloads should have at least as many tablets as there are Tablet Servers to distribute load across all servers.
| Cluster Size | Minimum Recommended Tablets | Example Hash Buckets |
|---|---|---|
| 3 Tablet Servers | 3 or more | 4 or 8 |
| 5 Tablet Servers | 5 or more | 8 |
| 10 Tablet Servers | 10 or more | 8 or 16 |
8.2 Hash Partition Design Considerations
- Column selection: Choose high-cardinality columns to ensure even distribution across buckets
- Bucket count: Typically set as a multiple of the Tablet Server count. Too many creates tablet count explosion; too few limits parallelism
- Immutable: Hash bucket count cannot be changed after table creation — decide carefully
8.3 Range Partition Design Considerations
- Partition granularity: Prefer monthly partitions over daily. Daily partitions increase tablet count by 30x
- Future partitions: Don't pre-create in bulk; add dynamically as needed
- Data retention policy: Dropping old partitions immediately reclaims disk space — far more efficient than row-level DELETE
- Empty partitions: Range Partitions with no data still maintain tablets and consume WAL disk
8.4 Understanding Disk Space Reclamation
When data is deleted row by row in Kudu, the disk space occupied by deleted rows is not immediately reclaimed until compaction completes. When you need to delete large volumes of historical data, dropping Range Partitions is the only way to immediately reclaim disk space.
-- Inefficient: row-level delete -> disk space reclaimed only after compaction
DELETE FROM metrics WHERE time < 1000;
-- Efficient: partition-level drop -> immediate disk space reclamation
ALTER TABLE metrics DROP RANGE PARTITION VALUE = 1;9. Partitioning Design Checklist
| Item | Check |
|---|---|
| Have you explicitly defined a partitioning strategy? (Kudu provides no default) | |
| For time-series data, are you using Hash + Range combination? | |
| Did you choose high-cardinality columns for Hash partitioning? | |
| Is the Hash bucket count close to a multiple of the Tablet Server count? | |
| Have you considered monthly instead of daily Range Partitions? | |
| Is the total tablet count (Hash buckets × Range partitions) below 1,000 per Tablet Server? | |
| Is your old data deletion strategy based on Range Partition DROP? | |
| Have you avoided pre-creating excessive future Range Partitions? | |
| Have you verified that Partition Pruning works for your main query patterns? | |
| Do your queries include equality predicates for Hash Pruning? |
10. Summary
| Item | Details |
|---|---|
| Default partitioning | Not provided — must be explicitly designed |
| Recommended combination (time-series) | Hash + Range |
| Hash Partition changes | Not possible (both bucket count and columns) |
| Range Partition changes | Only add/drop supported |
| Hash Pruning requirement | Equality predicates on all hash columns required |
| Range Pruning requirement | Equality or range predicates on Range columns |
| Tablet count formula | Hash buckets × Range partitions |
| Immediate disk space reclamation | Only via Range Partition DROP (row DELETE requires compaction) |
| Flexible Partitioning | Kudu 1.17+: different Hash bucket counts per Range |
Partitioning is the most important decision in Kudu table design. Once set, it cannot be changed (except for Range add/drop), and it directly impacts write/read performance, data management, and disk usage. Before creating a table, thoroughly analyze query patterns, data growth rates, and retention policies, and consider Hash + Range combination as your default strategy.
This post was written based on the Apache Kudu Schema Design Guide — Partitioning. If you need assistance with Kudu partitioning design or operations, feel free to reach out.
— Data Dynamics Engineering Team