feat(spark): add Spark 4.2 support #18621
Draft
yihua wants to merge 3 commits into apache:master from
7c698e3 to b236759
Add Spark 4.2 support to Apache Hudi, introducing a new hudi-spark4.2.x adapter module that handles API changes between Spark 4.1 and 4.2.

Dependency version updates (aligned with Spark 4.2.0-preview4):
- Scala: 2.13.17 -> 2.13.18
- Hadoop: 3.4.2 -> 3.4.3
- Parquet: 1.16.0 -> 1.17.0
- Jackson: 2.20.0 -> 2.21.2
- ORC: 2.2.1 -> 2.3.0
- Kafka: 3.9.1 -> 3.9.2
- Log4j: 2.20.0 -> 2.25.4
- lz4-java: org.lz4:1.8.0 -> at.yawk.lz4:1.10.4
- Avro: 1.12.1 (unchanged)
- SLF4J: 2.0.17 (unchanged)

API changes handled:
- InsertIntoStatement: 7 args -> 9 args (added replaceCriteriaOpt, withSchemaEvolution)
- UnresolvedFunction: ignoreNulls changed from Boolean to Option[Boolean]
- CharType/VarcharType: added collation parameter (fixed in shared code using type-based pattern matching)

Other:
- Added isSpark4_2 and gteqSpark4_2 version helpers
- Added Spark4_2Adapter to SparkAdapterSupport
- Added Spark 4.2 version dispatch in HoodieAnalysis for rules loaded via reflection
- Fixed lz4-java classpath conflict: Spark 4.2 relocated lz4-java from org.lz4 to at.yawk.lz4; made the groupId/version configurable via properties to avoid duplicate classes on the classpath
- CI, Docker bundle validation, and release script support for Spark 4.2
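The CharType/VarcharType fix above hinges on matching the type rather than the constructor arity. A minimal sketch of the idea, using stand-in record types rather than Spark's actual Catalyst classes:

```java
// Stand-ins for Spark's Catalyst types, for illustration only: in Spark 4.2
// CharType/VarcharType gain a collation parameter.
record CharType(int length, String collation) {}
record VarcharType(int length, String collation) {}

class TypeMapper {
    // Checking the type with instanceof, instead of destructuring the full
    // constructor, keeps shared code independent of how many fields the
    // class carries, so one code path can serve both Spark 4.1 and 4.2.
    static String toDdl(Object dataType) {
        if (dataType instanceof CharType c) {
            return "CHAR(" + c.length() + ")";
        }
        if (dataType instanceof VarcharType v) {
            return "VARCHAR(" + v.length() + ")";
        }
        return "STRING"; // fallback for all other types in this sketch
    }
}
```

A constructor pattern like `CharType(len)` would need a per-version variant once the parameter list changes; the type test sidesteps that entirely.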
Test changes:
- TestMergeIntoTable2: error message changed from "Eagerly executed command failed" to "Executed command failed" in Spark 4.2
- TestMergeIntoTable: non-existent target table now throws SparkException wrapping AnalysisException instead of a bare AnalysisException in Spark 4.2
efb40e8 to 999189b
In Spark 4.2, TABLE_OR_VIEW_NOT_FOUND is wrapped in a SparkException whose cause message contains template variables instead of the expanded error text. Check both the exception message and cause message for the TABLE_OR_VIEW_NOT_FOUND error class.
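The check described above can be sketched as follows; the helper name is ours for illustration, not the PR's exact code:

```java
// Hedged sketch: in Spark 4.2 the TABLE_OR_VIEW_NOT_FOUND error can arrive
// wrapped in a SparkException whose cause carries the error class (with
// template variables in the cause message), so both the exception message
// and the cause message are inspected for the error class name.
class ErrorClassCheck {
    static boolean mentionsTableNotFound(Throwable t) {
        return containsErrorClass(t.getMessage())
            || (t.getCause() != null
                && containsErrorClass(t.getCause().getMessage()));
    }

    private static boolean containsErrorClass(String msg) {
        return msg != null && msg.contains("TABLE_OR_VIEW_NOT_FOUND");
    }
}
```

This way the same assertion passes whether the error surfaces directly (pre-4.2) or wrapped (4.2).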
Describe the issue this Pull Request addresses
This PR adds support for Hudi on Spark 4.2, using the latest preview release (4.2.0-preview4).
Summary and Changelog
Add Spark 4.2 support to Apache Hudi, introducing a new hudi-spark4.2.x adapter module that handles API changes between Spark 4.1 and 4.2.

Dependency version updates (aligned with Spark 4.2.0-preview4):
- Scala: 2.13.17 -> 2.13.18
- Hadoop: 3.4.2 -> 3.4.3
- Parquet: 1.16.0 -> 1.17.0
- Jackson: 2.20.0 -> 2.21.2
- ORC: 2.2.1 -> 2.3.0
- Kafka: 3.9.1 -> 3.9.2
- Log4j: 2.20.0 -> 2.25.4
- lz4-java: org.lz4:1.8.0 -> at.yawk.lz4:1.10.4
- Avro: 1.12.1 (unchanged)
- SLF4J: 2.0.17 (unchanged)

API changes handled:
- InsertIntoStatement: 7 args -> 9 args (added replaceCriteriaOpt, withSchemaEvolution)
- UnresolvedFunction: ignoreNulls parameter changed from Boolean to Option[Boolean]
- CharType/VarcharType: added collation parameter (fixed in shared code using type-based pattern matching)

Other:
- HoodieSparkUtils.isSpark4_2 and gteqSpark4_2 version helpers
- Spark4_2Adapter added to SparkAdapterSupport
- Spark 4.2 version dispatch in HoodieAnalysis for rules loaded via reflection

Impact
Adds Spark 4.2 as a supported engine version for Hudi.
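The HoodieAnalysis version dispatch mentioned in the changelog (rules loaded via reflection) can be sketched roughly as below. The class names are hypothetical, not Hudi's actual rule classes:

```java
// Illustrative sketch of per-version rule dispatch: the analysis layer picks
// a rule class name for the running Spark version and would then load it
// reflectively (e.g. via Class.forName), so only the matching hudi-spark4.x
// adapter module needs to be on the classpath.
class RuleDispatch {
    static String resolveRuleClass(int major, int minor) {
        if (major == 4 && minor >= 2) {
            return "org.apache.hudi.analysis.Spark42Rules"; // hypothetical name
        }
        if (major == 4) {
            return "org.apache.hudi.analysis.Spark41Rules"; // hypothetical name
        }
        return "org.apache.hudi.analysis.DefaultRules";     // hypothetical name
    }
}
```

Loading by name keeps the shared module free of compile-time references to version-specific classes, which is what makes a single Hudi build tree support several Spark versions.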
Risk Level
Low
Documentation Update
Updated README.md and hudi-spark-datasource/README.md with the Spark 4.2 build profile.

Contributor's checklist