Blog
trinoperformancetuningresource-groupsmemorydata-platform

Trino Memory Management and Resource Groups

A practical walkthrough of Trino's memory model (per-node, cluster, heap headroom), spill-to-disk, and how to isolate workloads with Resource Groups so a single query can never take down the whole cluster — with real-world configuration examples.

Data DynamicsJune 5, 20268 min read

The two most common failures in Trino operations are "one giant query eats all the memory and paralyzes the entire cluster" and "ad-hoc queries from one team hog the concurrency slots, pushing scheduled batch jobs behind." Both problems can be prevented structurally once you understand the memory model and Resource Groups.

This post explains how Trino distributes memory, what spill-to-disk saves you from, and how to isolate workloads with Resource Groups — with real configuration examples.

1. The Trino Memory Model — Three Boundaries

Trino memory is governed by three boundaries.

[ Worker node JVM heap ]
   ├── heap-headroom-per-node   (reserved area Trino never touches: GC, buffers)
   └── Query memory pool
         └── query.max-memory-per-node   (cap on what one query can use on this node)
 
[ Cluster-wide ]
   └── query.max-memory   (cap on what one query can use summed across all workers)
SettingScopeMeaning
query.max-memory-per-nodeOne workerMaximum memory a single query can use on that node
query.max-memoryClusterMaximum memory a single query can use summed across all workers
memory.heap-headroom-per-nodeOne workerHeap that Trino leaves unallocated to queries (for GC and system use)

The relationship (approximately):

query.max-memory-per-node  <  (JVM heap - heap-headroom-per-node)
query.max-memory           ≈  query.max-memory-per-node × number of workers  (or less)

For example, with a 24GB worker heap and 5GB of headroom, the query pool is about 19GB. Setting query.max-memory-per-node to 12GB means one query can use up to 12GB on a node, leaving the rest for other queries to share.

# etc/config.properties (workers)
query.max-memory-per-node=12GB
memory.heap-headroom-per-node=5GB
 
# Coordinator
query.max-memory=240GB

When a query exceeds these caps, only that query fails with EXCEEDED_LOCAL_MEMORY_LIMIT or EXCEEDED_GLOBAL_MEMORY_LIMIT. The caps are what prevent a single query from destabilizing the entire cluster.

2. Spill-to-Disk — Slow Down Instead of OOM

Instead of unconditionally killing queries that hit the memory cap, you can let them spill intermediate data to disk and run to completion. Spill works for joins, aggregations, sorts, and window operations.

# etc/config.properties
spill-enabled=true
spiller-spill-path=/data1/trino-spill,/data2/trino-spill   # spreading across multiple disks is recommended
max-spill-per-node=200GB
query-max-spill-per-node=100GB

The trade-off is clear-cut.

spill offspill on
On memory overrunQuery failsSpills to disk and finishes
PerformanceFast (in-memory)Slower (disk I/O)
Best forLow-latency interactiveLarge batch / ETL

Spill is a safety net that lets queries "finish, even if slowly." Processing everything in memory is the ideal, but it beats losing the cluster to the occasional oversized query. Spread the spill paths across multiple fast local SSDs.

When even spill is not enough, the root fix is to improve the query itself — select only the columns you need instead of SELECT *, shrink the build side of large joins, pre-aggregate, and leverage partition pruning.

3. The Problem: Memory Caps Alone Are Not Enough

query.max-memory caps "one query." But real-world problems are usually about contention between many queries.

  • If 30 analysts fire heavy queries at the same time, cluster memory is exhausted even though each query stays within its cap.
  • Low-priority ad-hoc queries occupy the concurrency slots, pushing SLA-bound scheduled batches back in the queue.
  • A storm of dashboard refreshes from one BI tool degrades responsiveness for everyone.

This is what Resource Groups solve. (If you come from Cloudera, this is the counterpart of Impala's Admission Control / resource pools.)

4. Resource Groups — Workload Isolation

Resource Groups classify incoming queries into groups and give each group concurrency, memory, and queue limits.

# etc/resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=/etc/trino/resource-groups.json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "schedulingPolicy": "weighted",
      "subGroups": [
        {
          "name": "batch",
          "softMemoryLimit": "60%",
          "hardConcurrencyLimit": 20,
          "maxQueued": 500,
          "schedulingWeight": 3
        },
        {
          "name": "adhoc",
          "softMemoryLimit": "30%",
          "hardConcurrencyLimit": 40,
          "maxQueued": 200,
          "schedulingWeight": 1
        },
        {
          "name": "dashboard",
          "softMemoryLimit": "20%",
          "hardConcurrencyLimit": 50,
          "maxQueued": 100,
          "schedulingWeight": 2
        }
      ]
    }
  ],
  "selectors": [
    { "user": "etl-svc",           "group": "global.batch" },
    { "source": "superset",        "group": "global.dashboard" },
    { "group": "global.adhoc" }
  ]
}

Key Properties

PropertyMeaning
hardConcurrencyLimitMaximum number of queries running concurrently
maxQueuedMaximum length of the wait queue; queries beyond it are rejected
softMemoryLimitMemory this group may use (%, or an absolute value). When exceeded, new queries wait
schedulingPolicyPolicy for picking the next query from the queue
schedulingWeightResource-sharing weight between groups under the weighted policy

Selectors — Matching Queries to Groups

selectors assign a query to the first matching rule, evaluated top to bottom. You can classify by user, source (the source name sent by the client), clientTags, queryType, and more.

etl-svc user          → global.batch    (concurrency 20, memory 60%, high weight)
superset dashboards   → global.dashboard (high concurrency, small queries)
everything else       → global.adhoc    (ad-hoc analysis, low weight)

With this in place, no matter how many ad-hoc analyst queries (adhoc) pile up, they cannot encroach on the concurrency or memory of batch (batch). This is the key mechanism for protecting SLA-bound workloads.

5. Choosing a Scheduling Policy

PolicyBehaviorBest for
fairTreats queries within a group fairly (FIFO-based)A sensible default
weightedDistributes resources to subgroups by schedulingWeight ratioDifferentiated priority between groups
weighted_fairCompromise between weights and fairnessBalancing many groups
query_priorityPrioritizes by each query's query_priority session valueDynamic priority control

The example above uses weighted, allocating resources at a batch:dashboard:adhoc ratio of 3:2:1.

6. Query-Level Controls — Extra Guardrails

Beyond Resource Groups, global and session settings help stop runaway queries.

# etc/config.properties — block abnormally heavy/slow queries
query.max-execution-time=2h
query.max-scan-physical-bytes=5TB
query.max-stage-count=150
-- Per-session adjustment (for a specific query only)
SET SESSION query_max_memory_per_node = '20GB';
SET SESSION query_max_run_time = '30m';

query.max-scan-physical-bytes is a handy guardrail that kills a query once it scans past a given byte threshold — useful when someone accidentally launches a giant full scan.

7. Diagnostics — What to Look at When Tuning

7.1 System Tables

-- Currently running/queued queries and their memory usage
SELECT query_id, state, resource_group_id,
       query, total_memory_reservation
FROM system.runtime.queries
WHERE state IN ('RUNNING', 'QUEUED')
ORDER BY total_memory_reservation DESC;
 
-- Per-resource-group status
SELECT * FROM system.runtime.resource_group_pools;

7.2 EXPLAIN ANALYZE

After actually running a query, inspect time, memory, and row counts per stage.

EXPLAIN ANALYZE
SELECT user_id, count(*)
FROM iceberg.analytics.events
WHERE event_time >= TIMESTAMP '2026-06-01 00:00:00 UTC'
GROUP BY user_id;

This is where bad join orders or a single stage monopolizing memory show up. If join quality is poor, refresh the statistics.

ANALYZE iceberg.analytics.events;

7.3 Web UI

The Query Detail page in the coordinator Web UI lets you visually inspect per-stage memory, execution time, and whether spill occurred. Queries whose "Peak Memory" approaches the cap are your top tuning candidates.

8. Quick Symptom-to-Fix Table

SymptomLikely causeFix
EXCEEDED_LOCAL_MEMORY_LIMITQuery exceeded the cap on one nodeEnable spill, raise query.max-memory-per-node, optimize the query
EXCEEDED_GLOBAL_MEMORY_LIMITCluster-wide cap exceededReview query.max-memory, add workers, split the query
Worker OOMKill / GC stormsHeap ≈ container limit, insufficient headroomRaise headroom, ensure heap < limit
Batches starved by ad-hoc queriesNo workload isolationSeparate batch/adhoc with Resource Groups
Dashboard storms degrade everythingUnbounded dashboard concurrencyConcurrency and queue limits on the dashboard group
Accidental giant full scanNo guardrailsSet query.max-scan-physical-bytes
Query runs but is slowBad join order, missing statisticsDiagnose with ANALYZE and EXPLAIN ANALYZE

9. Tuning Checklist

  • JVM heap < container/node memory (20-30% headroom)
  • query.max-memory-per-node < (heap − headroom)
  • Spill enabled + multiple paths on fast local disks
  • batch/adhoc/dashboard isolated with Resource Groups
  • Selectors map groups by user and source
  • Global guardrails such as query.max-scan-physical-bytes
  • Regular ANALYZE on key tables
  • Review top peak-memory queries via the Web UI / system tables

10. Summary

LayerToolWhat it prevents
Single-query memoryquery.max-memory(-per-node)One query monopolizing the cluster
Completion guaranteespill-to-diskFailures caused by OOM
Workload contentionResource GroupsGroups encroaching on each other, SLA violations
Runaway incidentsGlobal query limitsAccidental giant scans, unbounded execution
Execution qualityANALYZE + EXPLAIN ANALYZEBad join orders, slow queries

Stabilizing Trino comes down to two things. First, understand the memory model so heap, headroom, and query caps are set consistently, with spill as the safety net. Second, isolate workloads with Resource Groups so no single query or team can paralyze the whole cluster. With both in place, your Trino cluster will "slow down but never stop" even under heavy load.


This article is based on the Trino 440 series. If you need help with memory and concurrency tuning or workload isolation design for your Trino cluster, feel free to reach out.

— Data Dynamics Engineering Team