Practical Trino SQL Patterns — Window Functions, UNNEST, Lambdas, and UDFs
A roundup of SQL patterns we use daily for analytics in Trino: ranking and sessionization with window functions, flattening arrays/maps with UNNEST, transforming collections with lambdas and higher-order functions, federated joins, and SQL/Python UDFs.
Trino follows ANSI SQL faithfully, yet it also ships powerful extensions for semi-structured data — arrays, maps, JSON — and higher-order functions. Once you know these tools, work you used to post-process in application code can be done in a single SQL statement.
This post collects the SQL patterns we reach for most often in real-world analytics on Trino — in the order window functions → UNNEST → lambdas/higher-order functions → federation → UDFs — with copy-paste-ready examples.
1. Window Functions — Computing Across Rows
Window functions compute over a "set of related rows" for each row, without collapsing rows the way GROUP BY does.
Per-Group Ranking / Top-N
-- Top 3 products by revenue per category
SELECT category, product, revenue
FROM (
SELECT category, product, revenue,
row_number() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM sales
)
WHERE rn <= 3;| Function | Behavior |
|---|---|
row_number() | No ties — always 1, 2, 3, … |
rank() | Allows ties, skips the next rank (1, 1, 3) |
dense_rank() | Allows ties, no gaps (1, 1, 2) |
Running Totals / Moving Averages
SELECT
d,
revenue,
sum(revenue) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS running_total,
avg(revenue) OVER (ORDER BY d ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7
FROM daily_sales;A ROWS BETWEEN ... PRECEDING AND CURRENT ROW frame defines a sliding window. This is essential for time-series metrics like 7-day moving averages and running totals.
LAG / LEAD — Referencing the Previous/Next Row
-- Time gap since the previous event
SELECT
user_id, event_time,
event_time - lag(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS gap
FROM events;The Sessionization Pattern
With LAG plus a conditional running sum, you can implement session boundaries like "start a new session after 30 minutes of inactivity" in pure SQL.
WITH marked AS (
SELECT user_id, event_time,
CASE WHEN event_time - lag(event_time)
OVER (PARTITION BY user_id ORDER BY event_time)
> INTERVAL '30' MINUTE
THEN 1 ELSE 0 END AS is_new_session
FROM events
)
SELECT user_id, event_time,
sum(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM marked;2. UNNEST — Flattening Arrays/Maps into Rows
Trino treats composite types like ARRAY, MAP, and ROW as first-class citizens. UNNEST expands a collection into rows.
-- Turn rows where one order holds multiple products (an array) into per-product rows
SELECT order_id, item
FROM orders
CROSS JOIN UNNEST(items) AS t(item);
-- Expand with an index (ordinal position) via WITH ORDINALITY
SELECT order_id, idx, item
FROM orders
CROSS JOIN UNNEST(items) WITH ORDINALITY AS t(item, idx);
-- Expand a MAP (key, value)
SELECT id, k, v
FROM tbl
CROSS JOIN UNNEST(attributes) AS t(k, v);Going the other way — collecting rows into an array — use array_agg:
SELECT order_id, array_agg(item ORDER BY item) AS items
FROM order_items
GROUP BY order_id;3. Working with JSON
Log and event payloads frequently arrive as JSON strings.
SELECT
json_extract_scalar(payload, '$.user.id') AS user_id,
json_extract_scalar(payload, '$.event.type') AS event_type,
cast(json_extract(payload, '$.items') AS ARRAY(VARCHAR)) AS items
FROM raw_events;| Function | Returns |
|---|---|
json_extract_scalar | A scalar value (as a string) |
json_extract | A JSON fragment (for further parsing) |
json_parse / cast(... AS JSON) | String → JSON type |
4. Lambdas and Higher-Order Functions — Transforming Collections in SQL
One of Trino's real strengths is its higher-order functions that apply a function to an array. You can transform, filter, and aggregate arrays in place — no UNNEST required.
SELECT
transform(prices, x -> x * 1.1) AS with_tax, -- map
filter(scores, x -> x >= 60) AS passed, -- filter
reduce(nums, 0, (s, x) -> s + x, s -> s) AS total, -- fold/sum
array_sort(tags) AS sorted_tags,
any_match(flags, x -> x = true) AS has_true,
cardinality(filter(scores, x -> x >= 60)) AS pass_count
FROM t;| Higher-order function | Purpose |
|---|---|
transform(arr, x -> ...) | Element transformation (map) |
filter(arr, x -> ...) | Conditional filtering |
reduce(arr, init, (s,x)->..., s->...) | Accumulation (fold) |
any_match / all_match / none_match | Condition checks |
zip_with(a, b, (x,y)->...) | Combine two arrays |
Example: "count of scores ≥ 60 in an array" — which used to require UNNEST + GROUP BY — becomes a one-liner with cardinality(filter(...)).
5. Federation — Joins Across Catalogs
This is the feature that defines Trino's identity: joining data from different systems in a single query.
-- Events on object storage (Iceberg) × customer master in an RDBMS (PostgreSQL)
SELECT e.event_type, c.tier, count(*) AS cnt
FROM iceberg.analytics.events e
JOIN postgresql.crm.customers c ON e.user_id = c.id
WHERE e.event_time >= TIMESTAMP '2026-06-01 00:00:00 UTC'
GROUP BY e.event_type, c.tier;
-- Load data from one system into another (federated CTAS)
CREATE TABLE iceberg.analytics.customer_snapshot AS
SELECT * FROM postgresql.crm.customers;Performance tip: RDBMS connectors push WHERE clauses and aggregations down to the source database via predicate/aggregate pushdown. Small dimension tables are a natural fit for broadcast joins, while joins between large tables need table statistics so the CBO can pick the right distribution strategy.
6. A Toolbox of Everyday Functions
-- Safe casting (NULL on failure)
try_cast(value AS INTEGER)
-- NULL-safe comparison / substitution
coalesce(a, b, 0)
nullif(a, 0) -- NULL when a=0 (guards against division by zero)
-- Dates (the unit comes first!)
date_add('day', 7, current_date)
date_diff('hour', t1, t2)
date_trunc('month', event_time)
format_datetime(event_time, 'yyyy-MM-dd')
-- Strings
split(path, '/') -- to an array
regexp_extract(ua, '(\d+)\.(\d+)', 1)
url_extract_host(url)
-- Approximate aggregation (fast on large data)
approx_distinct(user_id) -- approximate distinct count
approx_percentile(latency, 0.95) -- p95On large datasets,
approx_distinct/approx_percentileare far faster and use much less memory than an exactcount(distinct). Use them liberally wherever a small error margin is acceptable, such as dashboard metrics.
7. UDFs — When Built-In Functions Aren't Enough
You can encapsulate recurring logic in a function.
SQL UDFs (Inline / Session or Catalog Registration)
-- Define on the fly inside a query (WITH FUNCTION)
WITH FUNCTION to_won(usd DOUBLE)
RETURNS DOUBLE
RETURN usd * 1330.0
SELECT product, to_won(price_usd) AS price_krw
FROM catalog.schema.products;Even complex logic can be expressed as a SQL UDF, and registering it permanently in a catalog lets the whole team reuse it.
CREATE FUNCTION catalog.schema.mask_email(email VARCHAR)
RETURNS VARCHAR
RETURN regexp_replace(email, '(^.).*(@.*$)', '$1***$2');Python UDFs
Logic that's awkward to express with built-in or SQL functions can be written as a Python UDF (executed in a sandbox). This is handy for tasks where procedural code feels natural, such as string normalization or custom parsing.
CREATE FUNCTION catalog.schema.py_slug(s VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
WITH (handler = 'slug')
AS $$
def slug(s):
return "-".join(s.lower().split()) if s else None
$$;Caution: UDFs — Python UDFs especially — are powerful, but they're a black box to the optimizer. Wrapping a column in a UDF inside a WHERE clause can break pushdown and pruning. For filters over large data, prefer built-in functions and keep UDFs in the projection (SELECT) stage.
8. Anti-Patterns — SQL That Wrecks Performance
| Anti-pattern | Problem | Alternative |
|---|---|---|
WHERE CAST(ts AS DATE) = ... | Breaks pushdown and pruning | Range comparison ts >= ... AND ts < ... |
WHERE year(ts) = 2026 | Function wrapping → full scan | Range comparison |
SELECT * | Scans/transfers columns you don't need | Only the columns you need |
Overusing count(distinct big_col) | Memory and runtime blow up | approx_distinct |
| Joining the big table first | Intermediate results explode | Statistics + CBO, small side first |
| Wrapping UDFs in WHERE | Optimizer black box | Move to the SELECT stage |
9. Summary
| Tool | Use case | Key functions/syntax |
|---|---|---|
| Window functions | Ranking, running totals, moving averages, sessionization | row_number, sum() OVER, lag/lead |
| UNNEST | Arrays/maps → rows | CROSS JOIN UNNEST ... WITH ORDINALITY |
| Higher-order functions | Array transform/filter/aggregate | transform, filter, reduce |
| JSON | Parsing semi-structured data | json_extract_scalar |
| Federation | Joining heterogeneous sources | catalog.schema.table joins |
| Approximate aggregation | Fast metrics on large data | approx_distinct, approx_percentile |
| UDFs | Encapsulating logic | WITH FUNCTION, CREATE FUNCTION |
The essence of Trino SQL is pulling post-processing that used to live in application code back into SQL. Just getting comfortable with window functions and higher-order functions shrinks your pipeline code dramatically, and federation plus approximate aggregation turn analyses that are tedious on other engines into one-liners. Just remember that wrapping functions or UDFs around columns in WHERE clauses breaks pruning — "filter with built-ins, transform freely."
This article is based on the Trino 440 series. If you need help designing complex analytical queries or tuning SQL performance, feel free to reach out.
— The Data Dynamics Engineering Team