Spark: Add session-level split size override by gerashegalov · Pull Request #16154 · apache/iceberg

gerashegalov · 2026-04-29T08:51:42Z

What changes were made in this PR?

Add a new Spark session configuration key spark.sql.iceberg.split-size that allows overriding
the read.split.target-size table property at the session level without requiring DDL changes
to table metadata or source code changes to read call sites.

This is particularly useful when GPU and CPU workloads read the same Iceberg table
concurrently: GPU sessions benefit from significantly larger splits (e.g. 2GB), while CPU
sessions perform better with the default 128MB. Hardware accelerators such as the
RAPIDS Accelerator for Apache Spark are designed as drop-in replacements that require
no application code changes, so a session-level knob is essential.
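
To put rough numbers on the split-size difference, here is a minimal sketch (not part of this PR) that estimates how many splits a scan would plan at each target size. The 1 TiB table size and the estimateSplits helper are hypothetical, and the plain ceiling division ignores file boundaries and the per-file open cost that real scan planning accounts for.

```java
// Illustrative only: a back-of-the-envelope split count, not Iceberg's planner.
final class SplitCountEstimate {
  static final long MB = 1024L * 1024L;
  static final long GB = 1024L * MB;

  // Ceiling division: number of splits of splitSize bytes needed to cover totalBytes.
  static long estimateSplits(long totalBytes, long splitSize) {
    if (totalBytes < 0 || splitSize <= 0) {
      throw new IllegalArgumentException("totalBytes must be >= 0 and splitSize > 0");
    }
    return (totalBytes + splitSize - 1) / splitSize;
  }

  public static void main(String[] args) {
    long tableBytes = 1024L * GB; // hypothetical 1 TiB of data files

    // CPU-oriented default target size: 8192 splits
    System.out.println("128MB splits: " + estimateSplits(tableBytes, 128 * MB));
    // GPU-oriented target size: 512 splits
    System.out.println("2GB splits:   " + estimateSplits(tableBytes, 2 * GB));
  }
}
```

Under these assumptions the GPU session schedules 16x fewer, larger tasks over the same data, which is the difference the session-level key is meant to expose without touching table metadata.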

Changes

All Spark shims (v3.4, v3.5, v4.0):

- SparkSQLProperties: add SPLIT_SIZE = "spark.sql.iceberg.split-size" constant
- SparkReadConf: add .sessionConf(SparkSQLProperties.SPLIT_SIZE) to both splitSize() and
  splitSizeOption() parser chains; update Javadoc to document 5-level precedence
- SparkConfParser: store Table.name() as tableName and, in ConfParser.parse(), try a
  table-qualified session key (<key>.<tableName>) before the global session key

v3.5 only:

- TestSparkWriteConf: add 4 tests for table-scoped session conf resolution

Resolution precedence

1. Read option (split-size)
2. Table-scoped session conf (spark.sql.iceberg.split-size.<catalog>.<db>.<table>)
3. Global session conf (spark.sql.iceberg.split-size)
4. Table property (read.split.target-size)
5. Default (128MB)
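
The lookup order above can be sketched in plain Java as follows. This is an illustrative model only, not the SparkConfParser code from this PR: the SplitSizeResolution class, its resolve method, and the use of plain maps for read options, session conf, and table properties are assumptions made for the example. Only the key names and the 128MB default come from the description above.

```java
import java.util.Map;

// Hypothetical model of the 5-level precedence; not the actual Iceberg implementation.
final class SplitSizeResolution {
  static final String READ_OPTION = "split-size";
  static final String SESSION_KEY = "spark.sql.iceberg.split-size";
  static final String TABLE_PROPERTY = "read.split.target-size";
  static final long DEFAULT_SPLIT_SIZE = 128L * 1024 * 1024; // 128MB

  // Returns the first value found, checking sources from highest to lowest precedence.
  static long resolve(
      String tableName,
      Map<String, String> readOptions,
      Map<String, String> sessionConf,
      Map<String, String> tableProperties) {
    String[] candidates = {
      readOptions.get(READ_OPTION),                   // 1. read option
      sessionConf.get(SESSION_KEY + "." + tableName), // 2. table-scoped session conf
      sessionConf.get(SESSION_KEY),                   // 3. global session conf
      tableProperties.get(TABLE_PROPERTY)             // 4. table property
    };
    for (String value : candidates) {
      if (value != null) {
        return Long.parseLong(value);
      }
    }
    return DEFAULT_SPLIT_SIZE;                        // 5. default
  }

  public static void main(String[] args) {
    // Hypothetical table name and values: a GPU session overrides one table only.
    Map<String, String> session = Map.of(
        SESSION_KEY, "536870912",
        SESSION_KEY + ".cat.db.events", "2147483648");
    Map<String, String> props = Map.of(TABLE_PROPERTY, "268435456");

    System.out.println(resolve("cat.db.events", Map.of(), session, props));
    System.out.println(resolve("cat.db.other", Map.of(), session, props));
    System.out.println(resolve("cat.db.other", Map.of(), Map.of(), Map.of()));
  }
}
```

In this model a table without its own scoped key falls back to the global session value, and a session that sets neither key sees the table property or the default, so existing CPU jobs are unaffected.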

How was this patch tested?

4 new unit tests in TestSparkWriteConf (v3.5):

- table-scoped session key takes precedence over global
- global session key works when no table-scoped key is set
- read option takes precedence over table-scoped session key
- table-scoped session key takes precedence over table property

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>

… configurations
- Removed the session configuration for split size from SparkReadConf and SparkSQLProperties.
- Updated SparkReadConf documentation to clarify the precedence of table-scoped session configurations over global settings.
- Added tests to verify that table-scoped session configurations take precedence over global configurations and that options take precedence over table-scoped configurations.
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>

gerashegalov added 6 commits April 10, 2026 11:53

Iceberg reads split size as a conf

c852e48

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>

Merge branch 'split-size-conf' into split-size-conf-main

9fa325a

Enhance SparkSQLProperties and SparkReadConf: Add SPLIT_SIZE configuration for scan planning

e188468

Update SparkReadConf to include sessionConf for SPLIT_SIZE option parsing

3cb8a8c

Update SparkReadConf to include sessionConf for SPLIT_SIZE option parsing

9713856

github-actions Bot added the spark label Apr 29, 2026

formatting

dab99e5

gerashegalov wants to merge 7 commits into apache:main from gerashegalov:split-size-conf-main
