Spark: Add session-level split size override #16154

Open
gerashegalov wants to merge 7 commits into apache:main from gerashegalov:split-size-conf-main
Conversation

@gerashegalov

Closes #16153

What changes were made in this PR?

Add a new Spark session configuration key spark.sql.iceberg.split-size that allows overriding
the read.split.target-size table property at the session level without requiring DDL changes
to table metadata or source code changes to read call sites.

This is particularly useful when GPU and CPU workloads read the same Iceberg table
concurrently: GPU sessions benefit from significantly larger splits (e.g. 2GB) while CPU
sessions perform better with the default 128MB. Hardware accelerators like
RAPIDS Accelerator for Apache Spark are designed as
drop-in replacements requiring no application code changes, so a session-level knob is essential.

Changes

All Spark shims (v3.4, v3.5, v4.0):

  • SparkSQLProperties: add SPLIT_SIZE = "spark.sql.iceberg.split-size" constant
  • SparkReadConf: add .sessionConf(SparkSQLProperties.SPLIT_SIZE) to both splitSize() and
    splitSizeOption() parser chains; update Javadoc to document 5-level precedence
  • SparkConfParser: store Table.name() as tableName and in ConfParser.parse() try a
    table-qualified session key (<key>.<tableName>) before the global session key
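The table-qualified lookup in SparkConfParser can be pictured as a two-step probe: try `<key>.<tableName>` first, then fall back to the global key. The following is a minimal self-contained sketch of that behavior, simulating the Spark session conf with a plain Map; the class and method names are illustrative, not the actual Iceberg code.

```java
import java.util.Map;

// Hypothetical sketch (not the real SparkConfParser) of the table-qualified
// session key resolution described above.
public class SessionKeyLookup {
  // Returns the table-qualified value (<key>.<tableName>) if set,
  // otherwise the value of the global key, otherwise null.
  public static String sessionValue(Map<String, String> sessionConf, String key, String tableName) {
    String qualified = sessionConf.get(key + "." + tableName);
    return qualified != null ? qualified : sessionConf.get(key);
  }
}
```

With this shape, a per-table override never requires clearing the global key: the qualified entry simply shadows it for that one table.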

v3.5 only:

  • TestSparkWriteConf: add 4 tests for table-scoped session conf resolution

Resolution precedence

  1. Read option (split-size)
  2. Table-scoped session conf (spark.sql.iceberg.split-size.<catalog>.<db>.<table>)
  3. Global session conf (spark.sql.iceberg.split-size)
  4. Table property (read.split.target-size)
  5. Default (128MB)
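The five-level chain above can be sketched as a single fall-through resolution. This is an illustrative standalone sketch, assuming plain Maps for the read options, session conf, and table properties; the real logic lives in SparkReadConf and SparkConfParser and reads Spark's session state.

```java
import java.util.Map;

// Minimal sketch of the five-level split-size precedence; names and
// signatures are hypothetical, not the actual Iceberg implementation.
public class SplitSizeResolver {
  static final long DEFAULT_SPLIT_SIZE = 128L * 1024 * 1024; // 5. default (128MB)

  public static long resolve(
      Map<String, String> readOptions, // options passed at the read call site
      Map<String, String> sessionConf, // Spark session configuration
      Map<String, String> tableProps,  // Iceberg table properties
      String tableName) {              // <catalog>.<db>.<table>
    String value = readOptions.get("split-size");                           // 1. read option
    if (value == null) {
      value = sessionConf.get("spark.sql.iceberg.split-size." + tableName); // 2. table-scoped session conf
    }
    if (value == null) {
      value = sessionConf.get("spark.sql.iceberg.split-size");              // 3. global session conf
    }
    if (value == null) {
      value = tableProps.get("read.split.target-size");                     // 4. table property
    }
    return value != null ? Long.parseLong(value) : DEFAULT_SPLIT_SIZE;
  }
}
```

For example, a GPU session could set spark.sql.iceberg.split-size to a large value while a concurrent CPU session, which sets nothing, falls through to the table property or the 128MB default.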

How was this patch tested?

4 new unit tests in TestSparkWriteConf (v3.5):

  • table-scoped session key takes precedence over global
  • global session key works when no table-scoped key is set
  • read option takes precedence over table-scoped session key
  • table-scoped session key takes precedence over table property

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
… configurations

- Removed the session configuration for split size from SparkReadConf and SparkSQLProperties.
- Updated SparkReadConf documentation to clarify the precedence of table-scoped session configurations over global settings.
- Added tests to verify that table-scoped session configurations take precedence over global configurations and that options take precedence over table-scoped configurations.

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@github-actions bot added the spark label Apr 29, 2026

Successfully merging this pull request may close these issues.

Spark: Allow session-level split size override without DDL or source code changes