HDDS-14949 — Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack #10030
yandrey321 wants to merge 18 commits into apache:master
adoroszlai
left a comment
Thanks @yandrey321 for working on this.
- The patch is way too large. In addition to migration to Ratis-shaded Netty/etc., it includes other build changes (introduction of new modules, Mac-specific workarounds, etc.), which we may do separately.
- Build is failing in your fork: https://github.com/yandrey321/ozone/actions/runs/23906863482
I don't think a .class file should be committed.
my bad, removed it from version control
I fixed the CI build failures. The new modules are required to use the shaded versions of the Netty and gRPC libraries. I also fixed a couple of test issues.
… grpc-client modules
The pre-verify-refresh-classes-from-jar antrun execution (bound to pre-integration-test) did an unconditional rm -rf target/classes/ followed by unzip from the module's local JAR. On CI with mvn clean verify (not install), the JAR is present in target/ but on pom-packaging modules or in any edge-case where the JAR is absent the directory was left empty, causing downstream reactor modules to fail with "class file for ContainerProtos$ContainerCommandRequestProto not found" when they tried to compile against the now-empty target/classes/.
Two-part fix:
1. Root pom.xml: replace the unconditional rm+unzip pair with a single /bin/sh one-liner that only runs if the JAR file actually exists, so target/classes/ is left intact when no JAR was produced.
2. hdds-datanode-grpc-client and hdds-scm-grpc-client: override pre-verify-refresh-classes-from-jar with phase=none, because both modules set mdep.analyze.skip=true (no dependency analysis runs), so wiping target/classes/ at pre-integration-test serves no purpose and only risks leaving the directory empty for downstream compilation.
Made-with: Cursor
… of .m2/
The pre-compile-delete-classes and pre-test-compile-refresh-classes antrun
executions were restoring hdds-datanode-grpc-client/target/classes/ by
unzipping from the .m2/ repository JAR. On a CI clean-build (mvn clean
verify) no Ozone artifacts are pre-installed to .m2/, so those unzip
operations silently failed, leaving no safety net if anything clears the
directory after compilation.
Switch the source paths from ${settings.localRepository}/org/apache/ozone/...
to ${maven.multiModuleProjectDirectory}/hadoop-hdds/.../target/X.jar.
These local JARs are created during each foundation module's own package
phase (which runs before any downstream module's process-resources), so
they are always present on CI. The unzip still silently no-ops when the
JAR is not yet built (e.g. the foundation module itself hasn't been packaged
yet in the current reactor pass). This makes the class-file restoration
reliable on both CI and local machines without requiring a prior mvn install.
Made-with: Cursor
…repo root
ContainerProtos.class was extracted into the project root (org/apache/...) by an unzip operation running in the wrong directory during a debugging session and was accidentally included in commit c075c90. Generated bytecode has no place in version control.
Also add *.class to .gitignore so compiled class files at the project root can never be staged again.
Made-with: Cursor
Three dependencies in hdds-server-scm were incorrectly changed to test scope during a prior dependency:analyze-only cleanup:
- com.fasterxml.jackson.core:jackson-databind
- org.apache.commons:commons-compress
- org.apache.ozone:hdds-client
Although none of them are directly imported in SCM main sources, they are loaded indirectly at runtime (jackson-databind is used via reflection in StorageContainerManager; commons-compress and hdds-client are loaded transitively). Declaring them at test scope caused the generated hdds-server-scm.classpath to omit them, resulting in a NoClassDefFoundError for jackson-databind when the SCM process started, crashing all Docker-based acceptance tests.
Restore the three dependencies to compile (default) scope so build-classpath includes them in the runtime classpath descriptor. Add matching ignoredUnusedDeclaredDependency entries to the root POM so dependency:analyze-only no longer flags them as unused.
Made-with: Cursor
+1 for splitting this into smaller patches if possible. Additionally, we don't want to update the proto.lock files on the master branch if we have not done a release. That will lock in all currently unreleased proto changes even if they are not relevant for this change.
… the scope of the PR
…erver.ratis.TestContainerStateMachineFollower
Pull request overview
Migrates Ozone’s gRPC transport from the unshaded io.grpc / io.netty / com.google.protobuf stack to the Ratis-shaded org.apache.ratis.thirdparty.* stack, and introduces new build/module structure to generate and consume shaded gRPC/protobuf sources without classpath duplication.
Changes:
- Added new hdds-datanode-grpc-client and hdds-scm-grpc-client modules to own proto sources and produce Ratis-shaded gRPC stubs.
- Updated Ozone/HDDS production and test code to use org.apache.ratis.thirdparty.io.grpc.* (and related shaded Netty/Protobuf types) where appropriate.
- Updated Maven build behavior (dependency analysis ignores, Develocity cache hints, and new antrun executions) to manage generated sources and mitigate IDE/LSP interference.
Reviewed changes
Copilot reviewed 46 out of 55 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pom.xml | Adds new grpc-client artifacts to dependencyManagement and introduces global build/antrun behaviors (clean/refresh/analyze workflow) and Develocity cache hints. |
| .gitignore | Ignores *.class files. |
| hadoop-ozone/s3gateway/pom.xml | Adds dependency-analyze ignore list and clarifies why some deps must remain compile-scoped. |
| hadoop-ozone/recon/pom.xml | Refactors dependency-analyze config into an execution and expands ignore lists. |
| hadoop-ozone/ozonefs-hadoop3/pom.xml | Moves dependency-analyze ignores into an analyze execution with appended children. |
| hadoop-ozone/ozonefs-hadoop2/pom.xml | Same dependency-analyze execution refactor as ozonefs-hadoop3. |
| hadoop-ozone/ozonefs-common/pom.xml | Adds dependency-analyze ignore for httpclient. |
| hadoop-ozone/cli-shell/pom.xml | Adds dependency-analyze ignore for ratis-common. |
| hadoop-ozone/common/pom.xml | Removes direct gRPC/Netty deps in favor of shaded stack usage. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java | Replaces generated stub usage with shaded gRPC primitives + custom MethodDescriptor/marshaller and header attachment logic. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/GrpcOMFailoverProxyProvider.java | Switches gRPC status imports to shaded equivalents. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/GrpcClientConstants.java | Switches Context/Metadata imports to shaded gRPC equivalents. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/ClientAddressServerInterceptor.java | Switches imports to shaded gRPC equivalents. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/ClientAddressClientInterceptor.java | Switches imports to shaded gRPC equivalents. |
| hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/grpc/TestClientAddressServerInterceptor.java | Updates test imports to shaded gRPC equivalents. |
| hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/grpc/TestClientAddressClientInterceptor.java | Updates test imports to shaded gRPC equivalents. |
| hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/TestS3GrpcOmTransport.java | Reworks tests away from in-process gRPC to shaded Netty server + custom MethodDescriptor/marshaller. |
| hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/TestGrpcOmTransportConcurrentFailover.java | Same testing approach update as TestS3GrpcOmTransport, for concurrent failover. |
| hadoop-ozone/ozone-manager/pom.xml | Removes direct gRPC/Netty dependencies (now relying on shaded stack). |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/GrpcOzoneManagerServer.java | Switches server imports to shaded gRPC/Netty. |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerServiceGrpc.java | Implements shaded BindableService directly with a custom MethodDescriptor/marshaller to keep OM protos unshaded. |
| hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/TestOMClientRequestWithUserInfo.java | Updates gRPC Context import to shaded equivalent. |
| hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOMMetadataReader.java | Updates gRPC Context import to shaded equivalent. |
| hadoop-ozone/interface-client/pom.xml | Adds Develocity cache hints to avoid stale generated-source compilation results. |
| hadoop-ozone/csi/pom.xml | Removes unshaded gRPC/Netty/protobuf deps and adds ratis-thirdparty-misc; adds analyzer ignores and source rewrite step for generated protos. |
| hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/CsiServer.java | Switches gRPC/Netty imports to shaded equivalents. |
| hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/ControllerService.java | Switches StreamObserver import to shaded equivalent. |
| hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/IdentityService.java | Switches BoolValue/StreamObserver imports to shaded equivalents. |
| hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/NodeService.java | Switches StreamObserver import to shaded equivalent. |
| hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestSecureOzoneRpcClient.java | Reduces retries for faster failure and broadens expected exception type. |
| hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestDelegationToken.java | Reduces retries for faster failure and broadens expected exception type. |
| hadoop-ozone/integration-test-recon/pom.xml | Refactors dependency-analyze ignores into an execution with appended children. |
| hadoop-ozone/dist/src/main/license/jar-report.txt | Updates distribution jar inventory to reflect new modules and removed libs. |
| hadoop-ozone/dist/src/main/license/bin/LICENSE.txt | Updates third-party license inventory reflecting dependency changes. |
| hadoop-hdds/pom.xml | Adds the new grpc-client modules to the HDDS reactor build. |
| hadoop-hdds/interface-client/pom.xml | Adds shaded-runtime dependency and adjusts protobuf generation to produce shaded ContainerProtos for DatanodeClientProtocol.proto. |
| hadoop-hdds/interface-server/pom.xml | Removes SCM-related ratis-proto generation/rewrite now moved to scm-grpc-client; adds Develocity hints and temp dir config. |
| hadoop-hdds/interface-server/src/main/resources/proto.lock | Removes SCM-related proto definitions from this module’s lock file after migration. |
| hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java | Switches protobuf exception imports to shaded protobuf where ByteString is shaded. |
| hadoop-hdds/common/pom.xml | Disables incremental compilation and adds dependency-analyze ignores for generated/processor-only deps. |
| hadoop-hdds/client/pom.xml | Adds dependency on hdds-datanode-grpc-client and adjusts dependency-analyze ignores. |
| hadoop-hdds/framework/pom.xml | Removes unshaded grpc-api dependency and adds test dependency on hdds-datanode-grpc-client; adds dependency-analyze ignore for jetty-http. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerRequestInterceptor.java | Switches gRPC/protobuf imports to shaded equivalents. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerResponseInterceptor.java | Switches gRPC/protobuf imports to shaded equivalents. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerTransportFilter.java | Switches gRPC imports to shaded equivalents. |
| hadoop-hdds/container-service/pom.xml | Adds dependency on hdds-datanode-grpc-client and adds dependency-analyze ignores for runtime/service-loaded deps. |
| hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java | Reorders unhealthy-marking to occur before completing the future exceptionally. |
| hadoop-hdds/server-scm/pom.xml | Ensures certain deps remain compile-scoped for runtime classpath stability; adds dependency on hdds-scm-grpc-client. |
| hadoop-hdds/datanode-grpc-client/pom.xml | New module to generate shaded gRPC stubs for DatanodeClientProtocol.proto with in-place source rewriting. |
| hadoop-hdds/datanode-grpc-client/src/main/resources/proto.lock | Adds proto.lock for the new module. |
| hadoop-hdds/scm-grpc-client/pom.xml | New module to generate shaded gRPC stubs for SCM internal RPC protos with in-place source rewriting. |
| hadoop-hdds/scm-grpc-client/src/main/proto/InterSCMProtocol.proto | Migrated proto file now owned by scm-grpc-client. |
| hadoop-hdds/scm-grpc-client/src/main/proto/SCMRatisProtocol.proto | Migrated proto file now owned by scm-grpc-client. |
| hadoop-hdds/scm-grpc-client/src/main/proto/SCMUpdateProtocol.proto | Migrated proto file now owned by scm-grpc-client. |
| hadoop-hdds/scm-grpc-client/src/main/resources/proto.lock | Adds proto.lock for scm-grpc-client with migrated proto definitions. |
| <!-- | ||
| The Cursor/VSCode Java Language Server recreates class files in target/classes/ | ||
| and target/test-classes/ (with com.apple.provenance extended attributes) faster | ||
| than the pre-clean rm -rf can remove them. failOnError=false lets the build | ||
| continue even if the clean plugin cannot remove those IDE-managed files; the | ||
| subsequent compile phase writes fresh class files over them so the build result | ||
| is still correct. | ||
| --> | ||
| <failOnError>false</failOnError> |
Setting maven-clean-plugin failOnError to false globally can mask legitimate clean failures (e.g. permission issues) and allow stale build outputs to leak into subsequent phases. Consider scoping this workaround to the affected environment (e.g. a macOS-only profile/property) so other platforms still fail fast on real clean errors.
| @Override | ||
| public InputStream stream(T value) { | ||
| if (!(value instanceof com.google.protobuf.MessageLite)) { | ||
| throw new IllegalArgumentException("Expected protobuf request/response"); | ||
| } | ||
| return new ByteArrayInputStream(((com.google.protobuf.MessageLite) value).toByteArray()); | ||
| } |
Proto2Marshaller.stream() serializes via MessageLite.toByteArray(), which always copies the message into a new byte[]. This contradicts the PR goal of preserving zero-copy marshalling and can add avoidable allocations on every RPC. Prefer streaming from the message's ByteString (eg. toByteString().newInput()) or another zero-copy InputStream source.
| org.apache.ratis.thirdparty.io.grpc.Channel intercepted = | ||
| ClientInterceptors.intercept(channel, new FixedHeadersInterceptor(headers)); | ||
| return ClientCalls.blockingUnaryCall(intercepted, SUBMIT_REQUEST_METHOD, | ||
| CallOptions.DEFAULT, request); |
submitRequest() wraps the ManagedChannel with ClientInterceptors.intercept(...) on every call to attach fixed headers. This creates additional wrapper allocations per RPC. Since the hostname/IP values are effectively constant for the process, consider attaching a headers interceptor once at channel construction time (or caching the intercepted Channel) to reduce per-call overhead.
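The "wrap once" idea in the comment above can be sketched without any gRPC dependency. Everything here is a hypothetical stand-in: `Channel`, `withFixedHeaders`, and `Transport` are not gRPC or Ozone types, and the counter exists only to make the number of wrapper allocations visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: hypothetical stand-ins for a channel and a header-attaching interceptor. */
final class CachedInterceptSketch {

  /** Minimal channel abstraction: sends a request with headers and returns a response. */
  interface Channel {
    String call(String request, Map<String, String> headers);
  }

  /** Counts wrapper creations so the per-call vs. one-time cost can be observed. */
  static final AtomicInteger WRAPPERS_CREATED = new AtomicInteger();

  /** Wraps a channel so that the fixed headers are merged into every call. */
  static Channel withFixedHeaders(Channel delegate, Map<String, String> fixed) {
    WRAPPERS_CREATED.incrementAndGet();
    return (request, headers) -> {
      Map<String, String> merged = new LinkedHashMap<>(headers);
      merged.putAll(fixed);
      return delegate.call(request, merged);
    };
  }

  /** Transport that builds the intercepted channel once, at construction time. */
  static final class Transport {
    private final Channel intercepted;

    Transport(Channel raw, Map<String, String> fixedHeaders) {
      this.intercepted = withFixedHeaders(raw, fixedHeaders);
    }

    String submitRequest(String request) {
      // Reuses the cached wrapper; nothing is wrapped per call.
      return intercepted.call(request, new LinkedHashMap<>());
    }
  }

  public static void main(String[] args) {
    Channel raw = (request, headers) -> request + " from " + headers.get("CLIENT_HOSTNAME");
    Map<String, String> fixed = new LinkedHashMap<>();
    fixed.put("CLIENT_HOSTNAME", "host-1");
    Transport transport = new Transport(raw, fixed);
    System.out.println(transport.submitRequest("first"));
    System.out.println(transport.submitRequest("second"));
    System.out.println("wrappers created: " + WRAPPERS_CREATED.get());
  }
}
```

Whether this saves anything measurable in the real transport depends on how cheap the actual interceptor wrapper is; the sketch only shows the structure.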
| private static <T extends MessageLite> MethodDescriptor.Marshaller<T> proto2Marshaller( | ||
| Proto2Parser<T> parser) { | ||
| return new MethodDescriptor.Marshaller<T>() { | ||
| @Override | ||
| public InputStream stream(T value) { | ||
| return new ByteArrayInputStream(value.toByteArray()); | ||
| } |
proto2Marshaller().stream() uses value.toByteArray(), which forces a full copy of every request/response. If the intent is to keep zero-copy behavior, prefer streaming from ByteString (eg. value.toByteString().newInput()) to avoid extra allocations and match the PR description.
| omClient = new OzoneManagerProtocolClientSideTranslatorPB( | ||
| OmTransportFactory.create(conf, testUser, null), | ||
| OmTransportFactory.create(fastConf, testUser, null), | ||
| RandomStringUtils.secure().nextAscii(5)); | ||
| ex = assertThrows(OMException.class, | ||
| assertThrows(IOException.class, | ||
| () -> omClient.cancelDelegationToken(token)); |
This assertion was broadened to any IOException, which weakens the check that token-cancel fails with the expected security-related error. To keep the test meaningful while still avoiding long retry timeouts, consider asserting on the IOException cause/message (or using assertThrowsExactly/expected subclasses like OMException/AccessControlException) rather than any IOException.
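One way to keep the broadened `IOException` assertion meaningful is to inspect the cause chain of the thrown exception. The helper below is a minimal sketch, assuming the security-related failure is still present somewhere in that chain. `AccessDeniedSketchException` is a hypothetical stand-in for whichever type the test expects (for example OMException or AccessControlException), and `findCause` is not an existing Ozone or JUnit utility.

```java
import java.io.IOException;

/** Sketch only: checks a cause chain for an expected exception type. */
final class CauseChainSketch {

  /** Hypothetical stand-in for a security-related exception type. */
  static final class AccessDeniedSketchException extends IOException {
    AccessDeniedSketchException(String message) {
      super(message);
    }
  }

  /**
   * Returns the first throwable in the cause chain (including t itself) that is
   * an instance of type, or null if none is found. The depth cap guards against
   * cyclic cause chains.
   */
  static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      if (type.isInstance(cur)) {
        return type.cast(cur);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    IOException wrapped = new IOException("retries exhausted",
        new AccessDeniedSketchException("token cancel denied"));
    IOException unrelated = new IOException("connection reset");
    System.out.println(findCause(wrapped, AccessDeniedSketchException.class) != null);
    System.out.println(findCause(unrelated, AccessDeniedSketchException.class) != null);
  }
}
```

In a JUnit 5 test, the exception returned by `assertThrows(IOException.class, ...)` could be passed to a helper like this and the result asserted non-null, so an unrelated network failure no longer satisfies the test.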
| <exec executable="/bin/rm" failonerror="false"> | ||
| <arg value="-rf" /> | ||
| <arg value="${project.build.directory}" /> | ||
| </exec> |
The build now executes /bin/rm during pre-clean. This is non-portable (eg. Windows) and also bypasses Maven's normal clean semantics. Please gate this execution behind an OS-activated profile (macOS only) and/or an explicit opt-in property so CI and other developer environments are not affected.
| <exec executable="/bin/sh" failonerror="false"> | ||
| <arg value="-c" /> | ||
| <arg value="JAR='${project.build.directory}/${project.build.finalName}.jar'; test -f "$JAR" && /bin/rm -rf '${project.build.outputDirectory}' && unzip -qo "$JAR" -d '${project.build.outputDirectory}'" /> | ||
| </exec> |
The pre-verify-refresh-classes-from-jar execution relies on /bin/sh and unzip being available and uses a shell one-liner to conditionally delete/restore target/classes. This is fragile and non-portable; consider using an Ant condition plus the built-in delete/unzip tasks (or a platform-scoped profile) so the behavior is deterministic across environments.
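For reference, the guard that the shell one-liner implements ("only wipe and restore the output directory if the JAR exists") can be written portably with the JDK alone. This is a sketch of the logic, not a proposed replacement for the antrun execution; `RefreshClassesSketch` and `refreshFromJar` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Sketch only: conditional "delete and restore from JAR" without shelling out. */
final class RefreshClassesSketch {

  /**
   * Replaces outputDir with the contents of jar, but only if the jar exists.
   * Returns true when a refresh happened, false when outputDir was left untouched.
   */
  static boolean refreshFromJar(Path jar, Path outputDir) throws IOException {
    if (!Files.isRegularFile(jar)) {
      return false; // no JAR was produced: keep the existing classes
    }
    Path out = outputDir.toAbsolutePath().normalize();
    deleteRecursively(out);
    Files.createDirectories(out);
    try (ZipFile zip = new ZipFile(jar.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        Path target = out.resolve(entry.getName()).normalize();
        if (!target.startsWith(out)) {
          throw new IOException("Entry escapes output directory: " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(target);
        } else {
          Files.createDirectories(target.getParent());
          try (InputStream in = zip.getInputStream(entry)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
          }
        }
      }
    }
    return true;
  }

  /** Deletes a directory tree, children before parents; a missing directory is a no-op. */
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: java RefreshClassesSketch <path-to-jar> <output-directory>
    System.out.println(refreshFromJar(Paths.get(args[0]), Paths.get(args[1])));
  }
}
```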
| Metadata.Key.of("CLIENT_HOSTNAME", Metadata.ASCII_STRING_MARSHALLER); | ||
| private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY = | ||
| Metadata.Key.of("CLIENT_IP_ADDRESS", Metadata.ASCII_STRING_MARSHALLER); |
GrpcOmTransport duplicates the metadata key definitions for CLIENT_HOSTNAME/CLIENT_IP_ADDRESS instead of reusing GrpcClientConstants. This risks subtle mismatches if the header names/marshallers change elsewhere. Prefer referencing GrpcClientConstants.CLIENT_*_METADATA_KEY directly.
| Metadata.Key.of("CLIENT_HOSTNAME", Metadata.ASCII_STRING_MARSHALLER); | |
| private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY = | |
| Metadata.Key.of("CLIENT_IP_ADDRESS", Metadata.ASCII_STRING_MARSHALLER); | |
| GrpcClientConstants.CLIENT_HOSTNAME_METADATA_KEY; | |
| private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY = | |
| GrpcClientConstants.CLIENT_IP_ADDRESS_METADATA_KEY; |
| assertThrows(IOException.class, () -> | ||
| proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> { | ||
| try (OzoneClient ozoneClient = OzoneClientFactory.getRpcClient(testConf)) { | ||
| ozoneClient.getObjectStore().listVolumes("/"); | ||
| } | ||
| return null; | ||
| })); |
This test now accepts any IOException, which can also include unrelated failures (eg. transient network/cluster issues) and reduces the signal that the rejection is specifically auth-related. Consider asserting on the exception type/cause chain (eg. AccessControlException/OMException) or matching an expected message to keep the test deterministic.
jojochuang
left a comment
Thanks! left a few quick comments. I'm in the middle of it
| // Mark the container unhealthy BEFORE completing the future so that | ||
| // any subsequent applyTransaction sees the unhealthy state immediately | ||
| // when it calls checkContainerHealthy (race-free). | ||
| stateMachineHealthy.compareAndSet(true, false); | ||
| unhealthyContainers.add(requestProto.getContainerID()); | ||
| // Since the applyTransaction now is completed exceptionally, | ||
| // before any further snapshot is taken , the exception will be | ||
| // caught in stateMachineUpdater in Ratis and ratis server will | ||
| // shutdown. | ||
| applyTransactionFuture.completeExceptionally(sce); | ||
| stateMachineHealthy.compareAndSet(true, false); | ||
| unhealthyContainers.add(requestProto.getContainerID()); |
functional change should go to a separate PR.
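For context on why the ordering in that diff is a behavior change: callbacks registered on a `CompletableFuture` run in the completing thread as soon as `completeExceptionally` is called, so any state published after that call may not be visible to them. The sketch below is a simplified model with hypothetical names (`MarkBeforeCompleteSketch`, `failTransaction`); it is not the ContainerStateMachine code.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: publish the unhealthy state before failing the future. */
final class MarkBeforeCompleteSketch {
  final AtomicBoolean healthy = new AtomicBoolean(true);
  final Set<Long> unhealthyContainers = ConcurrentHashMap.newKeySet();

  /** Marks the state unhealthy first, then completes the future exceptionally. */
  void failTransaction(long containerId, CompletableFuture<String> future, Exception cause) {
    healthy.compareAndSet(true, false);
    unhealthyContainers.add(containerId);
    // Dependent callbacks run from here on and already observe the unhealthy state.
    future.completeExceptionally(cause);
  }

  public static void main(String[] args) {
    MarkBeforeCompleteSketch sm = new MarkBeforeCompleteSketch();
    CompletableFuture<String> future = new CompletableFuture<>();
    AtomicBoolean observedHealthy = new AtomicBoolean(true);
    // Registered before completion, so it runs synchronously in the completing thread.
    future.whenComplete((result, error) -> observedHealthy.set(sm.healthy.get()));
    sm.failTransaction(7L, future, new IllegalStateException("apply failed"));
    System.out.println("callback saw healthy=" + observedHealthy.get());
  }
}
```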
| <dependency> | ||
| <groupId>com.google.guava</groupId> | ||
| <artifactId>guava</artifactId> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.google.protobuf</groupId> | ||
| <artifactId>protobuf-java</artifactId> | ||
| <version>${protobuf.version}</version> | ||
| </dependency> |
i'm not sure we want to replace guava too.
Protobuf? not sure either.
not necessarily needed.
| ThreadFactory factory = new ThreadFactoryBuilder() | ||
| .setDaemon(true) | ||
| .setNameFormat(CLIENT_NAME + "-ELG-%d") | ||
| .build(); | ||
|
|
||
| final Class<? extends Channel> channelType; | ||
| if (Epoll.isAvailable()) { | ||
| eventLoopGroup = new EpollEventLoopGroup(0, factory); | ||
| channelType = EpollSocketChannel.class; | ||
| } else { | ||
| eventLoopGroup = new NioEventLoopGroup(0, factory); | ||
| channelType = NioSocketChannel.class; | ||
| } | ||
| LOG.info("{} channel type {}", CLIENT_NAME, channelType.getSimpleName()); | ||
|
|
this change should be split into a separate PR.
and I believe there are other places where we want to use epoll whenever possible.
| UserGroupInformation realUser = UserGroupInformation.createRemoteUser("realUser"); | ||
| UserGroupInformation proxyUser = UserGroupInformation.createProxyUser("user", realUser); | ||
|
|
||
| assertThrows(AccessControlException.class, () -> { |
| <plugin> | ||
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-dependency-plugin</artifactId> | ||
| <configuration> |
in effect, this change:
- scopes the configuration to a specific execution (analyze), and
- makes the ignore-list additive across parent/child POM inheritance, reducing the chance that this module accidentally wipes out ignore rules defined in a parent.
| } | ||
| } | ||
|
|
||
| private static final class Proto2Marshaller<T> implements MethodDescriptor.Marshaller<T> { |
there's another one in TestS3GrpcOmTransport.java which is exactly the same.
| } | ||
| } | ||
|
|
||
| private static final class Proto2Marshaller<T> implements MethodDescriptor.Marshaller<T> { |
and I wonder if the other Proto2Marshaller in GrpcOmTransport can be consolidated too.
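A consolidated marshaller could take the serialize and parse steps as parameters, so production and test code share one class. The sketch below uses only the JDK. Its `Marshaller` interface is a hypothetical stand-in that mirrors the two methods of gRPC's `MethodDescriptor.Marshaller` (`stream` and `parse`), and `BytesMarshaller` is an illustrative name, not a class from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Sketch only: one reusable marshaller instead of per-class copies. */
final class SharedMarshallerSketch {

  /** Hypothetical stand-in mirroring the shape of MethodDescriptor.Marshaller. */
  interface Marshaller<T> {
    InputStream stream(T value);

    T parse(InputStream stream);
  }

  /** Marshaller parameterized by how a T is turned into bytes and back. */
  static final class BytesMarshaller<T> implements Marshaller<T> {
    private final Function<T, byte[]> serializer;
    private final Function<byte[], T> parser;

    BytesMarshaller(Function<T, byte[]> serializer, Function<byte[], T> parser) {
      this.serializer = serializer;
      this.parser = parser;
    }

    @Override
    public InputStream stream(T value) {
      return new ByteArrayInputStream(serializer.apply(value));
    }

    @Override
    public T parse(InputStream stream) {
      try {
        return parser.apply(readAll(stream));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Reads a stream fully; written as a loop so it also compiles on Java 8. */
  static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    Marshaller<String> marshaller = new BytesMarshaller<String>(
        s -> s.getBytes(StandardCharsets.UTF_8),
        b -> new String(b, StandardCharsets.UTF_8));
    System.out.println(marshaller.parse(marshaller.stream("submitRequest")));
  }
}
```

With protobuf messages, the two functions would presumably be the message's serializer and its parser; that wiring is left out here because it depends on which protobuf classes (shaded or unshaded) the caller uses.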
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-compiler-plugin</artifactId> | ||
| <configuration> | ||
| <useIncrementalCompilation>false</useIncrementalCompilation> |
"Disabling incremental compilation forces a cleaner, dependency-correct rebuild behavior for that module’s compilation step, so that when an API change happens, Maven/javac are much less likely to leave behind stale .class files that were compiled against old symbols.
In short: it trades some compile speed for build determinism and correctness, which matters a lot in PRs like this one (migrating shaded gRPC/Netty/Ratis dependencies) where API surfaces and classpaths can shift in ways that incremental compilation is particularly bad at handling."
What changes were proposed in this pull request?
Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack
Summary
Migrates Ozone's gRPC transport layer from the standalone io.grpc/io.netty/com.google.protobuf libraries to the Ratis-shaded equivalents (org.apache.ratis.thirdparty.*), eliminating duplicate copies of these libraries on the classpath and resolving version conflicts with Ratis at runtime.
Zero-copy marshalling is preserved: all generated stubs use the shaded MessageLite-based ProtoUtils.marshaller() from ratis-thirdparty.
New modules
| Directory | Artifact | Contents |
|---|---|---|
| hadoop-hdds/datanode-grpc-client | hdds-datanode-grpc-client | DatanodeClientProtocol.proto; generates shaded gRPC stubs for Datanode/Container RPC |
| hadoop-hdds/scm-grpc-client | hdds-scm-grpc-client | InterSCMProtocol.proto, SCMRatisProtocol.proto, SCMUpdateProtocol.proto; generates shaded gRPC stubs for inter-SCM and SCM-Ratis RPC |
Both modules use protobuf-maven-plugin to generate Java from the proto files and then maven-antrun-plugin to rewrite the generated sources in-place before compilation:
- com.google.protobuf → org.apache.ratis.thirdparty.com.google.protobuf
- com.google.common → org.apache.ratis.thirdparty.com.google.common
- io.grpc → org.apache.ratis.thirdparty.io.grpc
Generating directly into the shaded source root (target/generated-sources/proto-java-ratis/) ensures both Maven and the IDE compile from the same tree, preventing stale-class interference.
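The package mapping applied to the generated sources can be illustrated in Java. This is only a sketch of the three prefix substitutions listed in this description; the PR performs the rewrite with maven-antrun-plugin, and the class name, method name, and regular expression below are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Sketch only: rewrites the three unshaded package roots to the Ratis-shaded prefix. */
final class ShadePrefixSketch {
  static final String PREFIX = "org.apache.ratis.thirdparty.";

  // The negative lookbehind skips names that are already part of a longer
  // qualified name (such as an already-shaded one), so the rewrite is idempotent.
  private static final Pattern UNSHADED = Pattern.compile(
      "(?<![\\w.])(com\\.google\\.protobuf|com\\.google\\.common|io\\.grpc)\\b");

  /** Returns source with every unshaded package reference prefixed. */
  static String shade(String source) {
    return UNSHADED.matcher(source).replaceAll(PREFIX + "$1");
  }

  public static void main(String[] args) {
    String generated = "import io.grpc.stub.StreamObserver;\n"
        + "import com.google.protobuf.ByteString;\n";
    System.out.print(shade(generated));
  }
}
```

Making the rewrite idempotent matters if the same source tree can be processed more than once within a build.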
Proto file migrations
| Proto file | From | To |
|---|---|---|
| DatanodeClientProtocol.proto | hdds-interface-client | hdds-datanode-grpc-client |
| InterSCMProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |
| SCMRatisProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |
| SCMUpdateProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |
proto.lock files updated in all four affected modules.
Source changes: io.grpc → shaded imports
- hadoop-ozone/common (GrpcOmTransport, GrpcOMFailoverProxyProvider, ClientAddressClientInterceptor, ClientAddressServerInterceptor, GrpcClientConstants): io.grpc.* replaced with org.apache.ratis.thirdparty.io.grpc.*
- hadoop-ozone/ozone-manager (GrpcOzoneManagerServer, OzoneManagerServiceGrpc)
- hadoop-hdds/framework (GrpcMetricsServerRequestInterceptor, GrpcMetricsServerResponseInterceptor, GrpcMetricsServerTransportFilter)
- hadoop-ozone/csi (CsiServer, ControllerService, IdentityService, NodeService)
- hadoop-hdds/common (HddsUtils): ByteString usage aligned with shaded protobuf
Build changes
Root pom.xml
hdds-datanode-grpc-client and hdds-scm-grpc-client added to <dependencyManagement> and dependency-analysis ignore lists.
maven-antrun-plugin executions added globally (inherited by every module):
| Execution | Phase | Purpose |
|---|---|---|
| pre-clean-force-delete-target | pre-clean | Uses /bin/rm -rf to delete target/ before maven-clean-plugin runs, working around macOS com.apple.provenance extended attributes that prevent java.nio.file.Files.delete() from removing IDE-written class files |
| pre-compile-delete-classes | process-resources | Deletes .class files from the current module, then restores hdds-common, hdds-datanode-grpc-client, and hdds-scm-grpc-client from their installed .m2/ JARs to prevent Java Language Server corruption of dependency class files during reactor builds |
| pre-test-compile-refresh-classes | process-test-resources | Refreshes classes before test-compile, closing the LSP interference window between compile and test-compile within a single module lifecycle |
| pre-verify-refresh-classes-from-jar | pre-integration-test | Restores target/classes/ from the module's own freshly-built JAR before dependency:analyze-only runs at verify |
pre-verify-delete-test-classes execution phase corrected from prepare-package to pre-integration-test, fixing a bug where test-jars were being packaged empty (compiled test classes were deleted before maven-jar-plugin:test-jar could include them).
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14949
How was this patch tested?
Build, unit and integration tests