build: add spark-4.1 profile and enable Spark 4.1.1 SQL tests [WIP] #4093

Draft
andygrove wants to merge 14 commits into apache:main from andygrove:spark-4.1.1

Conversation


andygrove (Member) commented Apr 26, 2026

Which issue does this PR close?

Rationale for this change

We currently run the Spark SQL test suites against 3.4.3, 3.5.8, and 4.0.1 in CI. This PR adds Spark 4.1.1 to that matrix so we can catch incompatibilities introduced by that release as early as possible.

What changes are included in this PR?

A new spark-4.1 Maven profile in the root pom.xml and spark/pom.xml, targeting Spark 4.1.1, Scala 2.13.17, Parquet 1.16.0, slf4j 2.0.17, Jetty 11.0.26, and JDK 17 (versions match Spark 4.1.1's own pom.xml). The iceberg-spark-runtime artifact for Spark 4.1 is not yet published on Maven Central; the profile temporarily depends on iceberg-spark-runtime-4.0_2.13:1.10.0 until it is.

New shim source trees at spark/src/main/spark-4.1/ and common/src/main/spark-4.1/ (initially copied from spark-4.0). The profile uses shims.majorVerSrc=spark-4.1. Four Spark API changes between 4.0 and 4.1 required code-level adjustments:

  • Sum.evalMode was moved into a wrapper class: it is now Sum.evalContext.evalMode. Added a CometEvalModeUtil.sumEvalMode(s: Sum) helper to all per-version shims and updated aggregates.scala to call it (see the shim sketch after this list).
  • SQLConf.BINARY_OUTPUT_STYLE switched from stringConf (with manual withName parsing) to enumConf. Updated binaryOutputStyle in the new spark-4.1 shim to drop the withName mapping.
  • MapStatus.apply gained a checksumVal: Long = 0 parameter. Java callers cannot use Scala default values, so added a small non-versioned MapStatusHelper.scala wrapper (sketched below) and updated the three call sites in CometBypassMergeSortShuffleWriter and CometUnsafeShuffleWriter.
  • IndexShuffleBlockResolver's third constructor parameter was widened from java.util.Map to java.util.concurrent.ConcurrentMap. Reflective newInstance(conf, null, Collections.emptyMap()) now fails with "argument type mismatch"; pass a ConcurrentHashMap, which is assignable to both the old and new types (see the reflection sketch below).
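A minimal sketch of the per-version helper from the first bullet, assuming the Spark 4.1 shape described above (the object and method names follow this PR's description; package declarations are omitted):

```scala
import org.apache.spark.sql.catalyst.expressions.EvalMode
import org.apache.spark.sql.catalyst.expressions.aggregate.Sum

// One copy of this object lives in each per-version shim tree; the active
// Maven profile (shims.majorVerSrc) selects which copy gets compiled.
object CometEvalModeUtil {
  // spark-4.1 body: evalMode moved behind the evalContext wrapper.
  // The spark-3.x/4.0 shims return s.evalMode directly.
  def sumEvalMode(s: Sum): EvalMode.Value = s.evalContext.evalMode
}
```

aggregates.scala then calls CometEvalModeUtil.sumEvalMode(sum) and compiles unchanged under every profile.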
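A sketch of the non-versioned MapStatus wrapper from the third bullet (parameter names are assumptions; the point is that Scala, unlike Java, supplies the new default argument):

```scala
import org.apache.spark.scheduler.MapStatus
import org.apache.spark.storage.BlockManagerId

// Compiles against both signatures: on Spark 4.1, Scala fills in the
// default for the added checksumVal parameter; on 4.0 and earlier the
// parameter simply does not exist. Java shuffle writers call this one
// stable signature instead of MapStatus.apply directly.
object MapStatusHelper {
  def apply(
      loc: BlockManagerId,
      uncompressedSizes: Array[Long],
      mapTaskId: Long): MapStatus =
    MapStatus(loc, uncompressedSizes, mapTaskId)
}
```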
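The reflection fix in the last bullet hinges on runtime assignability; a hypothetical sketch (the constructor lookup is simplified to the first declared constructor, not Comet's actual lookup):

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.SparkConf

// Constructor.newInstance checks each argument for assignability to the
// declared parameter type. Collections.emptyMap() is a plain java.util.Map,
// so it stops matching once Spark 4.1 declares the parameter as
// ConcurrentMap; a ConcurrentHashMap instance satisfies both declarations.
val cls = Class.forName("org.apache.spark.shuffle.IndexShuffleBlockResolver")
val resolver = cls.getDeclaredConstructors()(0)
  .newInstance(new SparkConf(), null, new ConcurrentHashMap[AnyRef, AnyRef]())
```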

New dev/diffs/4.1.1.diff generated by applying dev/diffs/4.0.1.diff to the v4.1.1 Spark tag with git apply --reject and resolving rejects manually. Most rejects were import-only differences caused by surrounding-context changes in 4.1.1 (for example, ShuffleExchangeExec vs ShuffleExchangeLike and the additional comet imports). The pom.xml portion of the diff sets spark.version.short=4.1 so the patched Spark build pulls comet-spark-spark4.1_2.13.

Workflow update: {spark-short: '4.1', spark-full: '4.1.1', java: 17, scan-impl: 'auto'} entry added to .github/workflows/spark_sql_test.yml.

Docs update: expanded docs/source/contributor-guide/spark-sql-tests.md with two new sections covering (1) adding a profile and shim tree for a brand-new Spark minor, including the API-shape patterns above, and (2) running the SQL tests locally against a patched Spark clone (NOLINT_ON_COMPILE=true to skip Spark's scalastyle, and the -Dmaven.repo.local=/tmp/spark-m2-repo workaround for partially-cached ~/.m2 entries that confuse Coursier).

How are these changes tested?

Locally: make release PROFILES=-Pspark-4.1 builds cleanly. Against the patched Spark 4.1.1 source with ENABLE_COMET=true ENABLE_COMET_ONHEAP=true:

  • catalyst/test: 8472 tests passed.
  • sql/testOnly org.apache.spark.sql.MathFunctionsSuite: 61 tests passed.

The full SQL/Hive matrix will run as part of Spark SQL Tests on this PR; the run itself is the broader test plan.

Commits

Add a spark-4.1 Maven profile (root pom.xml and spark/pom.xml) targeting
Spark 4.1.1, generate dev/diffs/4.1.1.diff from the 4.0.1 diff, and add a
Spark 4.1.1 entry to the spark_sql_test workflow matrix.

The spark-4.1 profile reuses the spark-4.0 shim sources for now. Iceberg
and Jetty deps mirror the spark-4.0 profile.
…ment new-version workflow

Spark 4.1 widened IndexShuffleBlockResolver's third constructor parameter
from java.util.Map to java.util.concurrent.ConcurrentMap, so reflectively
invoking it with Collections.emptyMap() now fails with "argument type
mismatch". Pass a ConcurrentHashMap instead, which is assignable to both.

Also expand docs/source/contributor-guide/spark-sql-tests.md with the
profile/shim setup and local sbt workflow lessons (NOLINT_ON_COMPILE,
fresh -Dmaven.repo.local, common API-shape changes between Spark
minors).
Mirror the spark-4.0 test shim tree so JVM tests can compile under -Pspark-4.1.

Wire isSpark41Plus into CometPlanStabilitySuite and update regenerate-golden-files.sh to accept --spark-version 4.1. Generated golden files for tpcds v1.4 and v2.7 under approved-plans-{v1_4,v2_7}-spark4_1.

Mirrors the spark-4.0 CometTypeShim helper that apache#4084 added to the scan rule. VariantMetadata.isVariantStruct exists in Spark 4.1.1 (in PushVariantIntoScan.scala), so the implementation is identical to spark-4.0.

Comet's Maven phase resolves the transitive dependency graph and pulls POMs for artifacts whose JARs it never needs. When sbt then resolves Spark's deps, sbt-coursier sees the POM in mavenLocal, declares the artifact found locally, and fails on the missing JAR without falling back to Maven Central. Delete pom-only entries (where packaging is jar/bundle) between Comet's install and sbt's update so sbt re-fetches the full artifact remotely. This was hit on Spark 4.1 jobs because the cached scala-xml_2.13:2.1.0 POM blocks Spark's tags subproject from resolving the JAR.

Adds a Spark 4.1 + JDK 17 entry to lint-java and linux-test in pr_build_linux.yml and a Spark 4.1 + Scala 2.13 entry to pr_build_macos.yml so PR builds exercise the new spark-4.1 profile alongside 3.4/3.5/4.0.

semanticdb-scalac_2.13.17:4.13.6 is not published; the most recent semanticdb build for the 2.13 branch is for 2.13.16. Scala 2.13 is binary-stable across patch versions, so Comet compiled with 2.13.16 still links against Spark 4.1.1's 2.13.17 runtime.

Reverts the 2.13.16 pin: Spark 4.1.1 is compiled against 2.13.17 and emits calls into MurmurHash3 stdlib methods that don't exist in 2.13.16, so any TreeNode hashCode at runtime throws NoSuchMethodError. semanticdb-scalac_2.13.17 isn't published yet, so drop the spark-4.1 entry from the lint-java matrix (which runs -Psemanticdb scalafix); the linux-test matrix entry stays. Verified locally: ./mvnw test -Pspark-4.1 -Dsuites=org.apache.comet.CometFuzzMathSuite passes 30/30.

The scala-2.13 profile sets scala.version=2.13.16, overriding the spark-4.1 profile's 2.13.17 pin. Activating both produced a 2.13.16 runtime that hits NoSuchMethodError on Spark 4.1.1's TreeNode.hashCode (which calls MurmurHash3 stdlib methods added in 2.13.17). The spark-4.1 profile already sets scala.binary.version=2.13, so -Pscala-2.13 is redundant.
1. CometNativeWriteExec: Spark 4.1 made the newTaskTempFile(taskContext, dir, ext: String) overload throw mustOverrideOneMethodError by default; the FileNameSpec overload is now the supported one. Switch the call to FileNameSpec("", ""), which exists from 3.4 onward, so it works across all profiles (see the sketch after this list).

2. CometExpressionSuite remainder function test: Spark 4.1 introduced REMAINDER_BY_ZERO for the % operator instead of reusing DIVIDE_BY_ZERO. Branch the expected error message on isSpark41Plus.
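A sketch of the cross-version call shape for item 1, assuming the standard FileCommitProtocol API (tempFileFor is a hypothetical wrapper, not code from this PR):

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.{FileCommitProtocol, FileNameSpec}

// The FileNameSpec overload exists on Spark 3.4+ and is the only overload
// Spark 4.1 still supports, so one call site works under every profile.
def tempFileFor(committer: FileCommitProtocol, ctx: TaskAttemptContext): String =
  committer.newTaskTempFile(ctx, None, FileNameSpec("", ""))
```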
andygrove changed the title from "build: add spark-4.1 profile and enable Spark 4.1.1 SQL tests" to "build: add spark-4.1 profile and enable Spark 4.1.1 SQL tests [WIP]" on Apr 26, 2026