The Complete Guide to Apache Kudu Partitioning: Hash, Range, and Multilevel Strategies

Compare Kudu's Hash Partitioning, Range Partitioning, and Multilevel Partitioning strategies with practical examples. Covers Partition Pruning internals and design best practices.

Data Dynamics · April 13, 2026 · 12 min read

When you create a table in Apache Kudu, partitioning is mandatory. Kudu does not provide a default partitioning strategy, so you must define one explicitly at table creation. Poor partitioning leads to write hotspots, degraded read performance, and unbounded tablet growth. This post follows the Partitioning section of the Kudu Schema Design Guide and summarizes each partitioning strategy's characteristics and practical applications.

1. Partitioning Fundamentals

Kudu tables are split into units called tablets, distributed across multiple Tablet Servers. A row always belongs to exactly one tablet, and which tablet a row is assigned to is determined by the partitioning strategy defined at table creation.

The partitioning strategy cannot be changed after table creation; the only exception is adding or removing Range Partitions. This makes careful upfront design critical.

Kudu offers three main partitioning approaches:

| Partitioning Type | Description |
| --- | --- |
| Hash Partitioning | Distributes rows into buckets based on hash values of Primary Key columns |
| Range Partitioning | Distributes rows into contiguous key range segments based on Primary Key columns |
| Multilevel Partitioning | Combines Hash + Range or Hash + Hash |

This post uses the metrics table example from the official Kudu documentation to illustrate each strategy.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
);

This table stores time-series metric data collected from multiple hosts. The combination of host, metric, and time forms the Primary Key, and performance varies significantly based on the partitioning strategy chosen.

2. Range Partitioning

2.1 How It Works

Range Partitioning distributes rows using a totally-ordered range partition key. Each partition is assigned a contiguous segment of the range keyspace. Range partition columns must be a subset of the Primary Key columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY RANGE (time) (
    PARTITION VALUE = 1,
    PARTITION VALUE = 2,
    PARTITION VALUE = 3
)
STORED AS KUDU;
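
To make the row-to-partition mapping concrete, here is a minimal Python sketch. It is a toy model, not Kudu's implementation. It assumes that each `PARTITION VALUE = n` on the integer `time` column behaves like the half-open interval [n, n + 1); the names `RANGES` and `range_partition_for` are hypothetical.

```python
from bisect import bisect_right

# Toy model: each range partition is a half-open interval [lower, upper).
# Assumption: PARTITION VALUE = n on an integer column is treated as [n, n + 1).
RANGES = [(1, 2), (2, 3), (3, 4)]


def range_partition_for(time_value, ranges=RANGES):
    """Return the index of the partition covering time_value, or None."""
    lowers = [lower for lower, _ in ranges]
    # Find the last partition whose lower bound is <= time_value.
    idx = bisect_right(lowers, time_value) - 1
    if idx >= 0 and time_value < ranges[idx][1]:
        return idx
    return None  # no range partition covers this key


for t in (0, 1, 2, 3, 4):
    print(t, range_partition_for(t))
```

A `None` result stands for a key that no range partition covers. Kudu rejects such a write, so the partition for an upcoming time range has to exist before its data arrives.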

2.2 Advantages

  • Optimized time-bounded scans: Queries with time range conditions automatically skip partitions outside the range (Partition Pruning)
  • Dynamic partition management: Partitions can be added or removed at runtime without affecting other partitions' availability
  • Efficient data deletion: Dropping a partition immediately deletes all data within it and reclaims disk space, without waiting for compaction after row-level DELETEs

2.3 Disadvantages

  • Write hotspots: For time-series data, current-time writes always concentrate on a single partition, overloading one Tablet Server
  • Limited parallelism: When writes concentrate on one tablet, resources on other Tablet Servers sit idle

Key takeaway: Range Partitioning alone is good for reads but bad for writes. With time-series data, the latest partition becomes a hotspot.

3. Hash Partitioning

3.1 How It Works

Hash Partitioning distributes rows into multiple buckets based on hash values of specified columns. The number of buckets is set at table creation and cannot be changed afterward. Any subset of Primary Key columns can be used as hash columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY HASH (host, metric) PARTITIONS 8
STORED AS KUDU;
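
The bucket assignment can be illustrated with a short Python sketch. The hash used here (CRC32 over the encoded column values) is only a stand-in and is not the hash function Kudu uses; `hash_bucket` and the generated host names are hypothetical. The point is that the bucket depends only on `host` and `metric`, never on `time`.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 8  # matches PARTITIONS 8 in the DDL above


def hash_bucket(host, metric, num_buckets=NUM_BUCKETS):
    """Map the hashed columns to a bucket; `time` is deliberately not used."""
    # Stand-in hash: CRC32 over the encoded column values (not Kudu's real hash).
    key = f"{host}\x00{metric}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


# 50 hosts x 4 metrics x 10 timestamps = 2000 rows
rows = [
    (f"host{h}", metric, t)
    for h in range(50)
    for metric in ("cpu", "mem", "disk", "net")
    for t in range(10)
]
rows_per_bucket = Counter(hash_bucket(host, metric) for host, metric, _ in rows)

for bucket in sorted(rows_per_bucket):
    print(bucket, rows_per_bucket[bucket])
```

Because every row for a given host and metric lands in the same bucket regardless of its timestamp, the bucket count fixed at creation is also the upper bound on how far that table's data can ever be spread.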

3.2 Advantages

  • Write distribution: The hash function distributes rows evenly, preventing write hotspots
  • Even tablet sizes: Data is distributed uniformly across buckets
  • Optimized host/metric lookups: Queries with equality predicates on host and metric enable Partition Pruning

3.3 Disadvantages

  • Unbounded tablet growth: Unlike Range Partitions, hash buckets cannot be added or removed, so tablets grow indefinitely as data accumulates
  • Inefficient time-range queries: Since the time column isn't included in the hash, time-range conditions alone cannot trigger Partition Pruning
  • Costly data deletion: Removing old data requires row-level DELETEs, and actual disk space isn't reclaimed until compaction completes

Key takeaway: Hash Partitioning alone is good for writes but suffers from unbounded tablet growth. Over time, tablets can grow to unmanageable sizes.

4. Multilevel Partitioning (Hash + Range Combination)

4.1 How It Works

Kudu allows combining multiple levels of partitioning on a single table. Zero or more Hash Partition levels can be combined with an optional Range Partition level. The total tablet count is the product of partition counts at each level.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY
    HASH (host, metric) PARTITIONS 8,
    RANGE (time) (
        PARTITION VALUE = 1,
        PARTITION VALUE = 2,
        PARTITION VALUE = 3
    )
STORED AS KUDU;

In this example, the total tablet count is 8 (Hash buckets) x 3 (Range partitions) = 24 tablets.
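
The arithmetic can be checked with a few lines of Python. This is a sketch that simply enumerates every (hash bucket, range partition) pair; the helper name `tablet_count` is hypothetical.

```python
from itertools import product
from math import prod

HASH_BUCKETS = 8              # HASH (host, metric) PARTITIONS 8
RANGE_PARTITIONS = [1, 2, 3]  # PARTITION VALUE = 1, 2, 3


def tablet_count(*level_sizes):
    """Total tablets is the product of the partition counts at each level."""
    return prod(level_sizes)


# One tablet per (hash bucket, range partition) combination.
tablets = list(product(range(HASH_BUCKETS), RANGE_PARTITIONS))

print(tablet_count(HASH_BUCKETS, len(RANGE_PARTITIONS)))  # 24
print(len(tablets))                                       # 24
```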

4.2 Advantages of Hash + Range

This combination merges the strengths of both Hash and Range while compensating for their individual weaknesses.

| Aspect | Hash Only | Range Only | Hash + Range |
| --- | --- | --- | --- |
| Write distribution | Even distribution | Hotspot | Even distribution |
| Time-range reads | No pruning | Pruning works | Pruning works |
| Specific host/metric reads | Pruning works | No pruning | Pruning works |
| Tablet growth management | Unbounded growth | Dynamic add/drop | Dynamic add/drop |
| Old data deletion | Row-level DELETE | Partition DROP | Partition DROP |

  • Writes: Hash distributes across buckets by host and metric, so writes for the same time period are parallelized across multiple tablets
  • Reads: Time-range conditions trigger Range Partition Pruning, while host/metric equality conditions trigger Hash Partition Pruning — both work simultaneously
  • Management: Dropping old Range Partitions immediately deletes that time range's data and reclaims disk space
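
The write-side difference can be sketched in Python by counting how many tablets a burst of "current" writes touches under each scheme. This is a toy simulation under stated assumptions: the range width of 100, the generated host names, and the CRC32 stand-in hash are all hypothetical and do not reflect Kudu internals.

```python
import zlib

NUM_BUCKETS = 8
RANGE_WIDTH = 100  # hypothetical width of each time range


def range_index(time_value):
    """Index of the time range a timestamp falls into."""
    return time_value // RANGE_WIDTH


def hash_bucket(host, metric):
    # Stand-in hash, not Kudu's actual hash function.
    return zlib.crc32(f"{host}\x00{metric}".encode("utf-8")) % NUM_BUCKETS


# A burst of current writes: many hosts and metrics, timestamps in one window.
writes = [
    (f"host{h}", metric, 250 + (h % 10))
    for h in range(50)
    for metric in ("cpu", "mem", "disk", "net")
]

# RANGE (time) only: the tablet is determined by the time range alone.
range_only = {range_index(t) for _, _, t in writes}
# HASH (host, metric) + RANGE (time): the tablet is a (bucket, range) pair.
hash_and_range = {(hash_bucket(h, m), range_index(t)) for h, m, t in writes}

print("tablets written, RANGE only:  ", len(range_only))
print("tablets written, HASH + RANGE:", len(hash_and_range))
```

With Range alone, every write in the window goes to a single tablet. With Hash + Range, the same writes stay within one time range but can be spread over as many as eight tablets.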

4.3 Hash + Hash Combination

Multiple Hash levels can also be combined. However, different Hash levels must not hash the same columns.

CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
)
PARTITION BY
    HASH (host) PARTITIONS 4,
    HASH (metric) PARTITIONS 4
STORED AS KUDU;

This results in 4 x 4 = 16 tablets.
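
Both rules for Hash + Hash (no column shared between levels, and a tablet count equal to the product of the bucket counts) can be expressed as a small Python check. This is an illustrative sketch; `validate_hash_levels` and `tablet_count` are hypothetical helpers, not part of any Kudu client.

```python
from math import prod


def validate_hash_levels(levels):
    """levels: list of (columns, bucket_count); columns is a tuple of names."""
    seen = set()
    for columns, _ in levels:
        overlap = seen.intersection(columns)
        if overlap:
            raise ValueError(
                f"columns hashed in more than one level: {sorted(overlap)}"
            )
        seen.update(columns)


def tablet_count(levels):
    """Product of bucket counts, after checking the levels are disjoint."""
    validate_hash_levels(levels)
    return prod(buckets for _, buckets in levels)


print(tablet_count([(("host",), 4), (("metric",), 4)]))  # 16
```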

Hash + Hash characteristics:

  • Writes are spread evenly across all tablets
  • However, without Range Partitioning, tablets grow indefinitely and old data cannot be dropped at partition granularity
  • May be slightly more prone to hotspotting compared to hashing all columns in a single level

Recommendation: For time-series data, use Hash + Range combination. Hash + Hash is only suitable when time-based data management is not required.

5. Flexible Partitioning (Per-Range Hash Schema)

Starting with Kudu 1.17, you can apply different Hash schemas per Range Partition. This allows more fine-grained control over hotspots within specific Range segments.

For example, you could assign more Hash buckets to recent data ranges for better write distribution while using fewer buckets for older data to save on tablet count.

Important caveat: The number of Hash Partition levels must be the same across all ranges in a table. Bucket counts can differ, but the number of Hash levels must match.
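
A small Python sketch can show how the tablet count and the caveat interact. It assumes a hypothetical layout in which older data uses 2 buckets and recent data uses 8; the dictionary keys and helper names are made up for illustration. With per-range schemas the total is a sum over ranges rather than a single product.

```python
from math import prod

# Hypothetical layout: each range lists the bucket count of every hash level.
RANGE_HASH_SCHEMAS = {
    "older data": [2],   # one hash level, 2 buckets
    "recent data": [8],  # one hash level, 8 buckets
}


def check_hash_levels(schemas):
    """Every range must use the same number of hash levels."""
    level_counts = {len(buckets) for buckets in schemas.values()}
    if len(level_counts) > 1:
        raise ValueError("all ranges must have the same number of hash levels")


def total_tablets(schemas):
    """Sum, over all ranges, of the product of that range's bucket counts."""
    check_hash_levels(schemas)
    return sum(prod(buckets) for buckets in schemas.values())


print(total_tablets(RANGE_HASH_SCHEMAS))  # 10
```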

6. Partition Pruning

Partition Pruning is a core performance optimization mechanism in Kudu. During scans, if predicates can determine that a partition is entirely filtered out, Kudu skips scanning that partition's tablet entirely.

6.1 Pruning Conditions

| Partitioning Type | Conditions for Pruning |
| --- | --- |
| Hash Partition | Requires equality predicates on every hashed column |
| Range Partition | Requires equality or range predicates on the range partition columns |
| Multilevel | Pruning works independently at each level |

6.2 Pruning Examples (Hash + Range)

For the metrics table with HASH (host, metric) PARTITIONS 8, RANGE (time):

| Query Condition | Hash Pruning | Range Pruning | Effect |
| --- | --- | --- | --- |
| WHERE host = 'a' AND metric = 'cpu' AND time >= 100 AND time < 200 | Yes (8→1 bucket) | Yes (matching range only) | Optimal: scans only a few tablets |
| WHERE host = 'a' AND metric = 'cpu' | Yes (8→1 bucket) | No | Hash Pruning only |
| WHERE time >= 100 AND time < 200 | No | Yes (matching range only) | Range Pruning only |
| WHERE host = 'a' | No (missing metric) | No | No pruning: full scan |

Important: Hash Partition Pruning works only when equality predicates exist for all hashed columns. If even one is missing, pruning is impossible.
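
The pruning rules can be modelled in Python to count how many tablets each query shape has to scan. This is a simplified sketch, not Kudu's pruning code. It assumes three hypothetical time ranges of width 100 so that `time >= 100 AND time < 200` matches exactly one of them; `tablets_to_scan` is a made-up helper.

```python
HASH_COLUMNS = ("host", "metric")
NUM_BUCKETS = 8
# Hypothetical range partitions on time, as half-open intervals [lower, upper).
RANGES = [(0, 100), (100, 200), (200, 300)]


def tablets_to_scan(equals=None, time_lower=None, time_upper=None):
    """Count the tablets a scan must touch.

    equals: dict of column -> value equality predicates.
    time_lower / time_upper: optional bounds meaning time >= lower, time < upper.
    """
    equals = equals or {}
    # Hash pruning: only when every hashed column has an equality predicate.
    buckets = 1 if all(col in equals for col in HASH_COLUMNS) else NUM_BUCKETS
    # Range pruning: keep only the ranges that overlap the requested interval.
    lo = float("-inf") if time_lower is None else time_lower
    hi = float("inf") if time_upper is None else time_upper
    ranges = sum(1 for lower, upper in RANGES if lower < hi and lo < upper)
    return buckets * ranges


print(tablets_to_scan({"host": "a", "metric": "cpu"}, 100, 200))  # 1
print(tablets_to_scan({"host": "a", "metric": "cpu"}))            # 3
print(tablets_to_scan(time_lower=100, time_upper=200))            # 8
print(tablets_to_scan({"host": "a"}))                             # 24
```

Under these assumptions the four query shapes scan 1, 3, 8, and 24 of the 24 tablets respectively.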

7. Partitioning Strategy Comparison Summary

Extending the metrics table comparison from the official documentation:

| Strategy | Write Perf | Read Perf | Tablet Growth | Data Deletion | Pruning |
| --- | --- | --- | --- | --- | --- |
| RANGE (time) | Poor (hotspot) | Good (time pruning) | Good (dynamic mgmt) | Good (DROP) | Range only |
| HASH (host, metric) | Good (distributed) | Good (host/metric pruning) | Poor (unbounded) | Poor (row DELETE) | Hash only |
| HASH (host, metric) + RANGE (time) | Good (distributed) | Good (both pruning) | Good (dynamic mgmt) | Good (DROP) | Both |
| HASH (host) + HASH (metric) | Good (distributed) | Limited | Poor (unbounded) | Poor (row DELETE) | Hash only |

8. Practical Design Guidelines

8.1 Determining Tablet Count

The tablet count at table creation is determined by:

Total tablets = Hash buckets × Range partitions

Tables with heavy read/write workloads should have at least as many tablets as there are Tablet Servers to distribute load across all servers.

| Cluster Size | Minimum Recommended Tablets | Example Hash Buckets |
| --- | --- | --- |
| 3 Tablet Servers | 3 or more | 4 or 8 |
| 5 Tablet Servers | 5 or more | 8 |
| 10 Tablet Servers | 10 or more | 8 or 16 |
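
As a rough planning aid, the formula can be wrapped in a small Python helper. This is a sketch under stated assumptions: `tablet_plan` is a hypothetical function, and the replication factor of 3 is an assumed value that you should replace with your table's actual setting. It estimates how many tablet replicas each server would host if they were spread evenly.

```python
from math import ceil


def tablet_plan(hash_buckets, range_partitions, tablet_servers, replication_factor=3):
    """Summarise a partitioning choice for a cluster (illustrative only)."""
    total_tablets = hash_buckets * range_partitions
    # Each tablet is stored as `replication_factor` replicas spread over servers.
    replicas_per_server = ceil(total_tablets * replication_factor / tablet_servers)
    return {
        "total_tablets": total_tablets,
        "replicas_per_server": replicas_per_server,
        "covers_all_servers": total_tablets >= tablet_servers,
    }


print(tablet_plan(hash_buckets=8, range_partitions=3, tablet_servers=5))
# {'total_tablets': 24, 'replicas_per_server': 15, 'covers_all_servers': True}
```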

8.2 Hash Partition Design Considerations

  1. Column selection: Choose high-cardinality columns to ensure even distribution across buckets
  2. Bucket count: Typically set to a multiple of the Tablet Server count. Too many buckets inflate the total tablet count; too few limit parallelism
  3. Immutable: Hash bucket count cannot be changed after table creation — decide carefully

8.3 Range Partition Design Considerations

  1. Partition granularity: Prefer monthly partitions over daily. Daily partitions produce roughly 30 times as many tablets for the same time span
  2. Future partitions: Don't pre-create in bulk; add dynamically as needed
  3. Data retention policy: Dropping old partitions immediately reclaims disk space — far more efficient than row-level DELETE
  4. Empty partitions: Range Partitions with no data still maintain tablets and consume WAL disk
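
The granularity trade-off is easy to quantify. The Python sketch below assumes one year of retained data and a hypothetical 8 hash buckets, then compares monthly and daily Range Partitions.

```python
HASH_BUCKETS = 8  # hypothetical bucket count


def tablets_for_retention(range_partitions, hash_buckets=HASH_BUCKETS):
    """Total tablets = range partitions kept x hash buckets."""
    return range_partitions * hash_buckets


monthly = tablets_for_retention(12)   # one year of monthly partitions
daily = tablets_for_retention(365)    # one year of daily partitions

print(monthly, daily, round(daily / monthly, 1))  # 96 2920 30.4
```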

8.4 Understanding Disk Space Reclamation

When rows are deleted one at a time in Kudu, the disk space they occupy is not reclaimed until compaction completes. When you need to delete large volumes of historical data, dropping Range Partitions is the only way to reclaim disk space immediately.

-- Inefficient: row-level delete -> disk space reclaimed only after compaction
DELETE FROM metrics WHERE time < 1000;
 
-- Efficient: partition-level drop -> immediate disk space reclamation
ALTER TABLE metrics DROP RANGE PARTITION VALUE = 1;
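
A retention job could generate the partition drops instead of issuing row-level deletes. The Python sketch below only builds statement text in the `DROP RANGE PARTITION VALUE = n` form shown above. The helper `drop_statements` and its `cutoff` parameter are hypothetical, it assumes single-value partitions, and it does not connect to any database.

```python
def drop_statements(table, partition_values, cutoff):
    """Build DROP RANGE PARTITION statements for single-value partitions < cutoff."""
    return [
        f"ALTER TABLE {table} DROP RANGE PARTITION VALUE = {value};"
        for value in sorted(partition_values)
        if value < cutoff
    ]


for statement in drop_statements("metrics", [1, 2, 3], cutoff=3):
    print(statement)
```

With partitions 1, 2, and 3 and a cutoff of 3, the sketch emits drops for partitions 1 and 2 and leaves partition 3 in place.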

9. Partitioning Design Checklist

| Item | Check |
| --- | --- |
| Have you explicitly defined a partitioning strategy? (Kudu provides no default) | [ ] |
| For time-series data, are you using Hash + Range combination? | [ ] |
| Did you choose high-cardinality columns for Hash partitioning? | [ ] |
| Is the Hash bucket count close to a multiple of the Tablet Server count? | [ ] |
| Have you considered monthly instead of daily Range Partitions? | [ ] |
| Is the total tablet count (Hash buckets × Range partitions) below 1,000 per Tablet Server? | [ ] |
| Is your old data deletion strategy based on Range Partition DROP? | [ ] |
| Have you avoided pre-creating excessive future Range Partitions? | [ ] |
| Have you verified that Partition Pruning works for your main query patterns? | [ ] |
| Do your queries include equality predicates for Hash Pruning? | [ ] |

10. Summary

| Item | Details |
| --- | --- |
| Default partitioning | Not provided; must be explicitly designed |
| Recommended combination (time-series) | Hash + Range |
| Hash Partition changes | Not possible (both bucket count and columns) |
| Range Partition changes | Only add/drop supported |
| Hash Pruning requirement | Equality predicates on all hash columns required |
| Range Pruning requirement | Equality or range predicates on Range columns |
| Tablet count formula | Hash buckets × Range partitions |
| Immediate disk space reclamation | Only via Range Partition DROP (row DELETE requires compaction) |
| Flexible Partitioning | Kudu 1.17+: different Hash bucket counts per Range |

Partitioning is the most important decision in Kudu table design. Once set, it cannot be changed (except for Range add/drop), and it directly impacts write/read performance, data management, and disk usage. Before creating a table, thoroughly analyze query patterns, data growth rates, and retention policies, and consider Hash + Range combination as your default strategy.


This post was written based on the Apache Kudu Schema Design Guide — Partitioning. If you need assistance with Kudu partitioning design or operations, feel free to reach out.

— Data Dynamics Engineering Team