Delta Lake Complete Guide — From Delta Table Concepts to Iceberg Comparison
A practical guide covering Delta Lake core concepts and architecture, Delta Table creation and usage, and a detailed comparison with Apache Iceberg for choosing the right Lakehouse table format.
This guide covers the concepts and architecture of Delta Lake, how to create and use Delta Tables, and a detailed comparison with Apache Iceberg. It serves as a practical reference for choosing the right table format in a Lakehouse architecture.
1. What is Delta Lake?
Open-Source Storage Layer
Delta Lake is an open-source storage layer that adds reliability on top of existing Data Lakes. Designed to work with Apache Spark, it addresses the inherent limitations of Parquet-based Data Lakes.
Problems with traditional Data Lakes:
- Data corruption from partial writes
- No consistency guarantees for concurrent reads/writes
- Difficulty managing schema changes
- Inability to restore data to a previous state
Delta Lake solves these problems with ACID transactions based on a Transaction Log.
Role in the Lakehouse Architecture
The Lakehouse architecture combines the flexibility of Data Lakes with the reliability of Data Warehouses. Delta Lake serves as the core storage layer in this architecture.
┌─────────────────────────────────────────┐
│ BI / ML / Analytics │
├─────────────────────────────────────────┤
│ Query Engine (Spark) │
├─────────────────────────────────────────┤
│ Delta Lake (Storage Layer) │ ← ACID, Schema, Time Travel
├─────────────────────────────────────────┤
│ Object Storage (S3, ADLS, GCS) │
└─────────────────────────────────────────┘
Key Features
| Feature | Description |
|---|---|
| ACID Transactions | Guarantees atomicity, consistency, isolation, and durability for all read/write operations |
| Schema Enforcement | Validates schema on writes to prevent bad data ingestion |
| Schema Evolution | Safely modify table schemas over time |
| Time Travel | Query or rollback to past data states |
| Unified Batch & Streaming | Process batch and streaming workloads on the same table |
| DML Support | Supports UPDATE, DELETE, and MERGE operations |
2. Delta Lake Architecture
Parquet Files + Transaction Log
A Delta Table physically consists of two components:
my_table/
├── _delta_log/ # Transaction Log directory
│ ├── 00000000000000000000.json # Version 0
│ ├── 00000000000000000001.json # Version 1
│ ├── 00000000000000000002.json # Version 2
│ ├── ...
│ └── 00000000000000000010.checkpoint.parquet # Checkpoint
├── part-00000-...snappy.parquet # Data files
├── part-00001-...snappy.parquet
└── part-00002-...snappy.parquet
- Data files: Stored in standard Apache Parquet format
- Transaction Log (
_delta_log/): Records all changes to the table in order
How the Transaction Log Works
The Transaction Log is an ordered array of JSON files. Each file represents a single commit and contains actions such as:
{
"add": {
"path": "part-00000-abc123.snappy.parquet",
"size": 1024000,
"partitionValues": {"date": "2026-04-14"},
"modificationTime": 1713052800000,
"dataChange": true
}
}Key action types:
| Action | Description |
|---|---|
add | Add a new data file |
remove | Logically delete an existing data file |
metaData | Change table metadata such as schema or partition columns |
protocol | Specify read/write protocol versions |
txn | Application-level transaction identifier |
Optimistic Concurrency Control: When multiple operations modify the same table concurrently, Delta Lake uses optimistic concurrency control. If a conflict is detected, it automatically retries.
Checkpoint Mechanism
Every 10 commits, Delta Lake creates a checkpoint file. Checkpoints consolidate all actions up to that point into a single Parquet file, enabling fast table state reconstruction.
# Reading version 100 without checkpoints?
→ Must sequentially read 100 JSON files
# With checkpoints?
→ Read 1 checkpoint at version 100 + only subsequent JSON files
3. What is a Delta Table?
Definition and Structure
A Delta Table is a table stored in the Delta Lake format. Unlike regular Parquet tables, it contains a _delta_log directory that enables ACID transactions and version management.
# Reading Parquet vs Delta Table
# Parquet
df = spark.read.parquet("/data/my_table")
# Delta Table
df = spark.read.format("delta").load("/data/my_table")Managed Table vs External Table
| Aspect | Managed Table | External Table |
|---|---|---|
| Data Location | Default path managed by Metastore | User-specified path |
| DROP Behavior | Deletes both metadata and data files | Deletes only metadata, keeps data files |
| Use Case | ETL intermediate results, temp tables | Data shared across multiple systems |
-- Create Managed Table
CREATE TABLE managed_events (
id BIGINT,
event_type STRING,
event_time TIMESTAMP
) USING DELTA;
-- Create External Table
CREATE TABLE external_events (
id BIGINT,
event_type STRING,
event_time TIMESTAMP
) USING DELTA
LOCATION 's3://my-bucket/events/';Creating and Operating Delta Tables
SQL operations:
-- Create table
CREATE TABLE users (
user_id BIGINT,
name STRING,
email STRING,
created_at TIMESTAMP
) USING DELTA
PARTITIONED BY (created_at);
-- Insert data
INSERT INTO users VALUES
(1, 'Alice', 'alice@example.com', '2026-04-14T10:00:00'),
(2, 'Bob', 'bob@example.com', '2026-04-14T11:00:00');
-- Update data
UPDATE users SET email = 'alice_new@example.com' WHERE user_id = 1;
-- Delete data
DELETE FROM users WHERE user_id = 2;PySpark operations:
from delta.tables import DeltaTable
from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType
# Create Delta Table from DataFrame
schema = StructType([
StructField("user_id", LongType(), False),
StructField("name", StringType(), True),
StructField("email", StringType(), True),
StructField("created_at", TimestampType(), True)
])
data = [(1, "Alice", "alice@example.com", "2026-04-14T10:00:00")]
df = spark.createDataFrame(data, schema)
# Save in Delta format
df.write.format("delta").mode("overwrite").save("/data/users")
# Read as Delta Table object
delta_table = DeltaTable.forPath(spark, "/data/users")
delta_table.toDF().show()4. Key Delta Table Features
MERGE (Upsert)
MERGE is one of the most powerful features of Delta Tables, handling INSERT + UPDATE + DELETE in a single transaction.
MERGE INTO target_table AS t
USING source_table AS s
ON t.id = s.id
WHEN MATCHED AND s.is_deleted = true THEN DELETE
WHEN MATCHED THEN UPDATE SET
t.name = s.name,
t.email = s.email,
t.updated_at = current_timestamp()
WHEN NOT MATCHED THEN INSERT (id, name, email, updated_at)
VALUES (s.id, s.name, s.email, current_timestamp());# PySpark MERGE
from delta.tables import DeltaTable
delta_table = DeltaTable.forPath(spark, "/data/users")
(delta_table.alias("t")
.merge(source_df.alias("s"), "t.id = s.id")
.whenMatchedUpdate(set={
"name": "s.name",
"email": "s.email"
})
.whenNotMatchedInsert(values={
"id": "s.id",
"name": "s.name",
"email": "s.email"
})
.execute()
)Schema Evolution
Delta Lake allows you to safely modify schemas while preserving existing data.
-- Add columns
ALTER TABLE users ADD COLUMNS (phone STRING, address STRING);
-- Rename columns (requires Column Mapping)
ALTER TABLE users SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name');
ALTER TABLE users RENAME COLUMN phone TO phone_number;# Automatic schema evolution with mergeSchema
new_data.write \
.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.save("/data/users")Note: When evolving schemas, new column values for existing data are filled with
null. Type changes (e.g., STRING to INT) are not supported and require separate migration.
Time Travel and Version Management
All changes to a Delta Table are versioned, allowing you to query past states or rollback.
-- Query data at a specific version
SELECT * FROM users VERSION AS OF 3;
-- Query data at a specific timestamp
SELECT * FROM users TIMESTAMP AS OF '2026-04-14T10:00:00';
-- View change history
DESCRIBE HISTORY users;
-- Rollback to a previous version
RESTORE TABLE users TO VERSION AS OF 3;# PySpark Time Travel
df_v3 = spark.read.format("delta").option("versionAsOf", 3).load("/data/users")
df_ts = spark.read.format("delta").option("timestampAsOf", "2026-04-14T10:00:00").load("/data/users")VACUUM and OPTIMIZE
Over time, deleted files and small files accumulate. Use VACUUM and OPTIMIZE to clean up.
-- VACUUM: Remove old files no longer referenced (default 7-day retention)
VACUUM users;
-- Specify retention period (24 hours)
VACUUM users RETAIN 24 HOURS;
-- OPTIMIZE: Compact small files to improve read performance
OPTIMIZE users;
-- Z-ORDER: Optimize data layout by specific columns
OPTIMIZE users ZORDER BY (user_id);Warning: Files deleted by
VACUUMcannot be recovered. Do not use a retention period shorter than your Time Travel retention.
5. Delta Table vs Iceberg Table Comparison
Overview
| Item | Delta Lake | Apache Iceberg |
|---|---|---|
| Developed by | Databricks (Linux Foundation) | Netflix → Apache Foundation |
| First Released | 2019 | 2018 |
| License | Apache 2.0 | Apache 2.0 |
| Base File Format | Parquet | Parquet, ORC, Avro |
| Metadata Storage | JSON + Parquet (Transaction Log) | Manifest files (Avro + JSON) |
| Primary Engine | Apache Spark | Engine-agnostic |
| ACID Transactions | Yes | Yes |
| Time Travel | Yes | Yes |
| Schema Evolution | Yes | Yes |
| Partition Evolution | No (requires rewrite) | Yes (Hidden Partitioning) |
Metadata Management Differences
Delta Lake — Transaction Log approach:
_delta_log/
├── 00000000000000000000.json ← Each commit is a JSON file
├── 00000000000000000001.json
├── ...
└── 00000000000000000010.checkpoint.parquet ← Checkpoint every 10 commits
- Records
add,removeactions sequentially in JSON files - Checkpoints for fast state recovery
- Simple and intuitive structure
Apache Iceberg — Metadata Tree approach:
metadata/
├── v1.metadata.json ← Metadata file (snapshot list)
├── snap-123456.avro ← Snapshot → Manifest List
├── manifest-abc.avro ← Manifest → Data file list
└── manifest-def.avro
- 3-tier structure: Metadata File → Manifest List → Manifest File
- Each manifest contains partition statistics and column-level metrics
- Superior file pruning performance for large-scale tables
Schema Evolution Differences
| Operation | Delta Lake | Apache Iceberg |
|---|---|---|
| Add columns | ALTER TABLE ADD COLUMNS | ALTER TABLE ADD COLUMNS |
| Drop columns | Requires Column Mapping mode | Natively supported |
| Rename columns | Requires Column Mapping mode | Natively supported (ID-based) |
| Reorder columns | Requires Column Mapping mode | Natively supported |
| Type promotion (int→long) | Not supported | Supported |
| Nested schema evolution | Partial support | Full support |
Iceberg tracks columns by unique IDs rather than names, enabling free renaming and reordering. Delta Lake requires Column Mapping mode for similar capabilities.
Partitioning Strategy Differences
Delta Lake — Explicit Partitioning:
-- Partition columns must be explicitly specified
CREATE TABLE events (...) USING DELTA PARTITIONED BY (event_date);
-- Changing partitions requires full table rewriteApache Iceberg — Hidden Partitioning:
-- Supports partition transform functions
CREATE TABLE events (...) USING ICEBERG
PARTITIONED BY (days(event_timestamp));
-- Partition strategy changes are seamless (existing data preserved)
ALTER TABLE events ADD PARTITION FIELD months(event_timestamp);Iceberg's Hidden Partitioning offers these advantages:
- No need to specify partition columns in queries
- Partition strategy changes don't require rewriting existing data
- Provides
year,month,day,hourtransform functions for time-based partitioning
Ecosystem and Engine Compatibility
| Engine | Delta Lake | Apache Iceberg |
|---|---|---|
| Apache Spark | Native support | Supported |
| Trino/Presto | Connector support | Native support |
| Apache Flink | Limited support | Native support |
| Apache Hive | Limited support | Supported |
| Dremio | Supported | Native support |
| Snowflake | Supported | Native support |
| AWS Athena | Supported | Native support |
| Databricks | Native support | Supported |
- Delta Lake provides the deepest integration in Spark and Databricks environments.
- Iceberg supports a broader ecosystem due to its engine-agnostic design.
Performance Comparison
| Scenario | Delta Lake | Apache Iceberg |
|---|---|---|
| Small tables (under 1TB) | Fast read/write | Similar |
| Large tables (over 10TB) | Depends on checkpoints | Excellent with manifest pruning |
| Frequent Upserts | Well-optimized MERGE | Supported but may be slower |
| Partition Pruning | Based on partition columns | More precise with column-level metrics |
| Streaming Writes | Optimal with Spark Structured Streaming | Strong with Flink streaming |
| Concurrent Writes | Optimistic Concurrency | Optimistic Concurrency |
When to Choose Which Format?
Choose Delta Lake when:
- Running Databricks or Spark-centric environments
- Frequent MERGE/Upsert operations are core workloads
- Building real-time pipelines with Spark Structured Streaming
- Leveraging Databricks managed features (Auto-Optimize, Predictive Optimization, etc.)
Choose Apache Iceberg when:
- Using multiple query engines (Trino, Flink, Spark, etc.)
- Working with large tables where partition strategies change frequently
- Dealing with frequent and complex nested schema evolution
- Preferring vendor-neutral open standards
6. Conclusion
Delta Lake and Apache Iceberg are both Lakehouse table formats that bring ACID transactions and table abstractions to Data Lakes. Both projects are actively evolving, and they are increasingly adopting each other's strengths.
Key selection criteria:
- Engine environment: Delta Lake for Spark/Databricks-centric, Iceberg for multi-engine
- Partitioning flexibility: Iceberg for frequent partition changes
- Upsert frequency: Delta Lake for MERGE-heavy workloads
- Vendor independence: Iceberg for avoiding vendor lock-in
Regardless of the format you choose, the core goal is ensuring data reliability and manageability.
References
- Delta Lake Documentation
- Apache Iceberg Documentation
- Delta Lake GitHub
- Apache Iceberg GitHub
- Databricks — Delta Lake vs Iceberg
— The Data Dynamics Engineering Team