Blog
trinocachingperformanceobject-storagedata-platform

Trino Caching Strategies — Filesystem Cache, Metadata Cache, and Their Limits

In Trino deployments on object storage, repeated I/O and metastore round trips are the main sources of latency. This post covers the filesystem (data) cache, the Hive metadata cache, and why Trino has no result cache — along with the alternatives.

Data DynamicsJune 5, 20267 min read

Trino does not own its data — it reads from object storage (S3/GCS/Azure) or external systems on every query. This architecture is flexible, but as the cost of repeatedly pulling the same data over the network and metastore round-trip latency accumulate, interactive query responsiveness suffers. Caching is the key lever for reducing both costs.

This post organizes Trino caching along two axes — the data (filesystem) cache and the metadata cache — and also answers the frequently asked question, "why doesn't Trino have a result cache?"

1. Where the Time Goes

[One query]
  Metadata lookup   ──> metastore/catalog round trips (RTT)

  File list & stats ──> read manifest / HMS

  Data scan         ──> read Parquet/ORC from object storage (network bandwidth)
CostCauseCaching remedy
Metadata RTTMetastore lookups on every queryMetadata cache
Repeated data I/OHot data read remotely every timeFilesystem (data) cache
Re-running identical queriesResults recomputed every time(No result cache → alternatives needed)

2. Filesystem Cache — Caching Data on Worker-Local Disks

Trino's filesystem cache stores data file blocks read from object storage on the workers' local disks (SSDs). When the same data is read again, it comes from local disk instead of the remote store, cutting latency and network cost.

# etc/config.properties (or per catalog)
fs.cache.enabled=true
fs.cache.directories=/data1/trino-cache,/data2/trino-cache
fs.cache.max-sizes=200GB,200GB
ItemDescription
What gets cachedData file blocks from object storage
Storage locationWorker-local SSD (spreading across multiple paths recommended)
EffectEliminates remote I/O for repeatedly scanned hot data
ConsistencyIceberg's immutable data file model makes cache invalidation trivial

Why It Pairs So Well with Iceberg

Iceberg data files are immutable. Once written, a file never changes; modifications are expressed as new files plus a new snapshot. So you never have to worry about "has the cached file diverged from the original?" — same file path, same contents. This immutability makes the filesystem cache both safe and effective.

Node Affinity (Soft Affinity)

For the cache to pay off, the same file should be read by the same worker whenever possible. Otherwise the same data gets cached redundantly on every worker, killing efficiency. Trino uses soft affinity scheduling, which assigns splits to consistent workers based on the file, to maximize cache hit rates.

3. Metadata Cache — Cutting Metastore Round Trips

3.1 Hive Connector Metadata Cache

The Hive connector queries the Hive Metastore (HMS) for schemas, partitions, and statistics on every query. With many partitions or a distant HMS, this RTT adds up. The cache mitigates it.

# etc/catalog/hive.properties
hive.metastore-cache-ttl=10m
hive.metastore-cache-maximum-size=10000
hive.metastore-refresh-interval=1m
ParameterMeaning
hive.metastore-cache-ttlHow long a cache entry stays valid
hive.metastore-refresh-intervalBackground refresh interval
hive.metastore-cache-maximum-sizeMaximum number of cache entries

Trade-off: during the TTL window, Trino may see stale information even after an external tool changes the metadata. To flush it on demand:

CALL system.flush_metadata_cache();

A common misconception among folks coming from Cloudera/Impala: this cache is a "human-managed cache" similar to Impala's catalogd cache. The Iceberg connector, by contrast, uses a snapshot model, so this kind of TTL cache is fundamentally unnecessary — a commit is visibility. That's why moving to Iceberg eliminates much of the metadata-cache headache altogether.

3.2 Iceberg's Metadata Model

Iceberg reads partition and file statistics from manifest files (in storage) rather than via HMS RPCs. With no HMS round trips, the metadata bottleneck is structurally smaller. For Iceberg tables, the filesystem cache (which also caches manifest files) is therefore often all you need.

4. Result Cache — Why Trino Doesn't Have One

"If I run the same query again, why can't it just return the result from a cache?" is a frequent question. Trino does not provide a query result cache out of the box. The reasons:

  • Trino is a federation engine. Source data in external systems can change at any time, so it is generally impossible to guarantee that a cached result is still valid.
  • Rather than taking on result consistency (stale read) risk at the engine level, the design delegates consistency guarantees to the data layer.

Instead, the following alternatives exist.

AlternativeApproachBest for
Materialized ViewPrecompute frequently used aggregations into a table, refresh periodicallyRecurring heavy aggregations
Pre-aggregated table (CTAS)Build summary tables via daily batchDashboard backends
BI tool cacheSuperset/Tableau's own result cachesDashboard responsiveness
Filesystem cacheReduce input data I/O rather than caching resultsRepeated scans

Materialized View

CREATE MATERIALIZED VIEW iceberg.analytics.daily_active_users AS
SELECT date(event_time) AS d, count(DISTINCT user_id) AS dau
FROM iceberg.analytics.events
GROUP BY date(event_time);
 
-- Periodic refresh (from a scheduler)
REFRESH MATERIALIZED VIEW iceberg.analytics.daily_active_users;

The Iceberg connector's materialized views store the result as a real table and track staleness when the source is updated. For patterns like dashboards that "repeat the same heavy aggregation," they are a practical substitute for a result cache.

5. When Caching Doesn't Help — or Hurts

SituationWhy
Every query scans different data (little overlap)Filesystem cache hit rate near zero, only wasting disk
Worker-local disks are slow/smallCache I/O ends up worse than remote reads
External tools change metadata frequentlyMetadata TTL cache causes stale reads
One-off large batch jobsNo reuse, so caching is pointless
Faking a result cache on frequently changing tablesConsistency risk

Caching shines when there is "hot data/metadata that is accessed repeatedly." If your workload sweeps fresh data every time, a cache only adds cost.

6. Configuration Guide — Recommendations by Workload

WorkloadFilesystem cacheMetadata cacheResult alternative
Interactive BI (repeated hot data)On (large local SSD)Unnecessary for Iceberg / short TTL for HiveMaterialized View, BI cache
Dashboard backendOnPre-aggregated tables
Large ETL (one-off scans)OffNot needed
Hive + many partitionsOnTTL + refresh + flush procedure

7. Summary

Cache typeCost it reducesKey settingsWatch out for
Filesystem (data)Repeated remote I/Ofs.cache.* + soft affinityNeeds local SSD; only worth it for hot data
Metadata (Hive)Metastore RTThive.metastore-cache-ttlStale reads, flush_metadata_cache()
Result cacheRecomputation(none)Replace with Materialized Views / pre-aggregation

Two principles govern Trino caching. First, cut data I/O with the filesystem cache and metadata RTT with the metadata cache — but only enable them for workloads with repeated access. Second, accept that there is no result cache, and replace recurring heavy aggregations with materialized views or pre-aggregated tables. Iceberg's immutable-file and snapshot model makes this caching strategy simpler and safer.


This article is based on Trino in the 440s. If you need help improving interactive query responsiveness or designing caching and pre-aggregation, feel free to reach out.

— Data Dynamics Engineering Team