PySpark Configuration Tuning, the Complete Picture — Which Configs to Change, and When
Out of the hundreds of Spark settings, we cover only the ones that actually matter: executor sizing (cores, memory, instances), AQE, shuffle partitions, serialization, dynamic allocation — and an evidence-based tuning order to replace the "just raise the memory" anti-pattern.
Spark has hundreds of configuration options. That's why so many teams fall into guess-based tuning: "it's slow, so bump executor.memory; if that doesn't work, add more instances." This wastes resources while the actual problem stays put. In reality, only a dozen or so settings make a real difference, and there is a clear order for what to change and when.
This post picks out only the Spark settings that truly matter and explains what to adjust based on what evidence. (Detailed tuning for individual problems — skew, OOM, small files — is covered in their own dedicated posts; here we focus on the big picture and priorities.)
1. The Golden Rule of Tuning — Measure First, Then Attack the Biggest Bottleneck
❌ Guessing: it's slow → raise memory → still slow → add instances → only cost goes up
✅ Evidence: measure in Spark UI → identify the bottleneck (skew/shuffle/OOM) → adjust only the relevant settingBefore touching any config, identify the bottleneck in the Spark UI first (see our separate post "Debugging Slow PySpark Jobs"). Raising memory when the problem is skew, or adding instances when the problem is shuffle, accomplishes nothing.
2. First Things First — Turn On AQE
Adaptive Query Execution in Spark 3.x automatically adjusts partition coalescing, skew joins, and join strategies at runtime. It delivers the biggest payoff for the least effort of any setting.
spark.conf.set("spark.sql.adaptive.enabled", "true") # master switch
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true") # auto-merge small partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true") # auto-split skewed joins
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128MB")Recent Spark versions enable AQE by default, but verify it explicitly. AQE alone automatically mitigates a large share of small-file, skew, and partition-count problems.
3. Executor Sizing — The Most Important Decision
Resource allocation comes down to balancing three values: executor cores, memory, and instance count.
spark.conf.set("spark.executor.cores", "4") # concurrent tasks per executor
spark.conf.set("spark.executor.memory", "8g") # JVM heap
spark.conf.set("spark.executor.memoryOverhead", "2g")# off-heap (shuffle, native, Python)
spark.conf.set("spark.executor.instances", "20") # number of executorsCore Count — 4 to 5 Is the Sweet Spot
Too few cores (1-2) → inefficient relative to per-executor overhead
Too many cores (>5) → degraded HDFS/storage I/O throughput, GC contention
→ 4-5 cores per executor is the standard recommendationMemory — Separate Heap from Overhead
PySpark's Python workers consume off-heap memory, so give memoryOverhead generous headroom to avoid "Container killed" errors (see our separate post "Conquering PySpark Executor OOM").
| Setting | Meaning | Guideline |
|---|---|---|
executor.cores | Concurrent tasks | 4-5 |
executor.memory | JVM heap | Enough per task, mind GC |
executor.memoryOverhead | Off-heap | ~25% of heap, more for PySpark |
executor.instances | Executor count | Parallelism = cores × instances |
A common mistake: "a few big executors" vs. "many small executors." Executors that are too big suffer long GC pauses; too small, and the overhead dominates. Many medium-sized executors (4-5 cores, 8-16GB) is usually the safe choice.
4. Shuffle Partitions — The Trap of the 200 Default
The default of 200 for spark.sql.shuffle.partitions fits almost no workload. With large data, each partition becomes too big and you hit OOM; with small data, you get 200 mostly-empty partitions.
# With AQE enabled, this is adjusted automatically at runtime, so the default matters less
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Tuning manually without AQE: target ~128MB per partition
# num = total shuffle data / 128MB
spark.conf.set("spark.sql.shuffle.partitions", "800")With AQE's
coalescePartitionsenabled, it's safe to set this value high (e.g., 2000) — partitions get merged down to a sensible count at runtime. In the AQE era, the burden of hand-tuning this value is largely gone.
5. Serialization and Compression
# Kryo serialization (faster and more compact than the Java default)
spark.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
# Shuffle compression (on by default) — saves network and disk
spark.conf.set("spark.shuffle.compress", "true")
spark.conf.set("spark.shuffle.spill.compress", "true")Kryo is almost always a win. Compression is enabled by default and can usually be left alone.
6. Dynamic Allocation — Resource Efficiency
Instead of a fixed executor count, scale with load (especially on shared clusters and K8s).
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.dynamicAllocation.shuffleTracking.enabled", "true") # required on K8s
spark.conf.set("spark.dynamicAllocation.minExecutors", "2")
spark.conf.set("spark.dynamicAllocation.maxExecutors", "50")
spark.conf.set("spark.dynamicAllocation.executorIdleTimeout", "60s")(For shuffle tracking and spot-instance issues on K8s, see our separate post "PySpark on Kubernetes.")
7. Broadcast and Joins
# Auto-broadcast threshold (default 10MB) — optimizes small dimension joins
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "50MB")
# Set to -1 to disable broadcasting entirely (when deliberately turning it off)Setting this too high risks broadcasting a large table and triggering OOM (see our separate post "Broadcast Variables and Large Lookups"). 50-100MB is usually safe.
8. Configuration Priority — What to Touch First
1. Enable AQE (skew/coalesce/skewJoin) → biggest impact, least effort
2. Executor sizing (4-5 cores, memory + overhead)
3. shuffle.partitions (with AQE, a high value is fine)
4. memoryOverhead (fixes PySpark "Container killed")
5. Serialization (Kryo), dynamic allocation
6. autoBroadcastJoinThreshold
───
Still slow? → it's a code/data problem (skew, UDFs, small files) — configs can't fix itThe key point: config tuning has limits. Skew, bad joins, expensive UDFs, and small files cannot be fixed with settings — you have to change the code or the data layout. If you've exhausted the configs and it's still slow, the problem is the code, not the configuration.
9. Per-Workload Profiles
| Workload | Settings to Emphasize |
|---|---|
| Large ETL batch | Big executors, shuffle.partitions↑, FTE |
| Interactive/short queries | Dynamic allocation, low latency, broadcast |
| Streaming | Triggers and backpressure, checkpoints, right-sized executors |
| ML training | Memory↑, caching, fewer shuffles |
| High-concurrency shared | Dynamic allocation, fair scheduler |
10. Summary
| Area | Key Settings | Effect |
|---|---|---|
| AQE | adaptive.* | Automatic skew/partition/join handling |
| Sizing | 4-5 cores, memory, overhead | Stability and throughput |
| Shuffle | shuffle.partitions (less critical with AQE) | Partition size |
| Serialization | Kryo | Speed |
| Dynamic allocation | dynamicAllocation.* | Resource efficiency |
| Joins | autoBroadcastJoinThreshold | Shuffle avoidance |
The essence of Spark config tuning is to "find the bottleneck through measurement, adjust the highest-impact settings first, and accept the limits of configuration." Just enabling AQE and sizing executors sensibly stabilizes most jobs. If things are still slow — before touching more configs, suspect code and data problems like skew, UDFs, and small files. Good configuration makes good code fast, but it can't fix bad code.
This article is based on Spark 3.5. If you need comprehensive performance tuning for your Spark clusters and jobs, feel free to reach out anytime.
— The Data Dynamics Engineering Team