Skip to content

HDDS-14949 — Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack#10030

Draft
yandrey321 wants to merge 18 commits intoapache:masterfrom
yandrey321:HDDS-14949
Draft

HDDS-14949 — Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack#10030
yandrey321 wants to merge 18 commits intoapache:masterfrom
yandrey321:HDDS-14949

Conversation

@yandrey321
Copy link
Copy Markdown

@yandrey321 yandrey321 commented Apr 2, 2026

What changes were proposed in this pull request?

Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack

Summary

Migrates Ozone's gRPC transport layer from the standalone io.grpc / io.netty / com.google.protobuf
libraries to the Ratis-shaded equivalents (org.apache.ratis.thirdparty.*), eliminating duplicate copies
of these libraries on the classpath and resolving version conflicts with Ratis at runtime.
Zero-copy marshalling is preserved: all generated stubs use the shaded MessageLite-based
ProtoUtils.marshaller() from ratis-thirdparty.

New modules

Module Artifact Purpose
hadoop-hdds/datanode-grpc-client hdds-datanode-grpc-client Owns DatanodeClientProtocol.proto; generates shaded gRPC stubs for Datanode/Container RPC
hadoop-hdds/scm-grpc-client hdds-scm-grpc-client Owns InterSCMProtocol.proto, SCMRatisProtocol.proto, SCMUpdateProtocol.proto; generates shaded gRPC stubs for inter-SCM and SCM-Ratis RPC

Both modules use protobuf-maven-plugin to generate Java from the proto files and then
maven-antrun-plugin to rewrite the generated sources in-place before compilation:
com.google.protobuf → org.apache.ratis.thirdparty.com.google.protobuf com.google.common → org.apache.ratis.thirdparty.com.google.common io.grpc → org.apache.ratis.thirdparty.io.grpc

Generating directly into the shaded source root (target/generated-sources/proto-java-ratis/)
ensures both Maven and the IDE compile from the same tree, preventing stale-class interference.


Proto file migrations

File Moved from Moved to
DatanodeClientProtocol.proto hdds-interface-client hdds-datanode-grpc-client
InterSCMProtocol.proto hdds-interface-server hdds-scm-grpc-client
SCMRatisProtocol.proto hdds-interface-server hdds-scm-grpc-client
SCMUpdateProtocol.proto hdds-interface-server hdds-scm-grpc-client
proto.lock files updated in all four affected modules.

Source changes — io.grpc → shaded imports

File Change
hadoop-ozone/commonGrpcOmTransport, GrpcOMFailoverProxyProvider, ClientAddressClientInterceptor, ClientAddressServerInterceptor, GrpcClientConstants All io.grpc.* replaced with org.apache.ratis.thirdparty.io.grpc.*
hadoop-ozone/ozone-managerGrpcOzoneManagerServer, OzoneManagerServiceGrpc Same migration
hadoop-hdds/frameworkGrpcMetricsServerRequestInterceptor, GrpcMetricsServerResponseInterceptor, GrpcMetricsServerTransportFilter Same migration
hadoop-ozone/csiCsiServer, ControllerService, IdentityService, NodeService Same migration
hadoop-hdds/commonHddsUtils ByteString usage aligned with shaded protobuf
All corresponding test files updated to match.

Build changes

Root pom.xml

  • hdds-datanode-grpc-client and hdds-scm-grpc-client added to <dependencyManagement> and
    dependency-analysis ignore lists.
  • Four new maven-antrun-plugin executions added globally (inherited by every module):
    Execution Phase Purpose
    pre-clean-force-delete-target pre-clean Uses /bin/rm -rf to delete target/ before maven-clean-plugin runs, working around macOS com.apple.provenance extended attributes that prevent java.nio.file.Files.delete() from removing IDE-written class files
    pre-compile-delete-classes process-resources Deletes stale .class files from the current module, then restores hdds-common, hdds-datanode-grpc-client, and hdds-scm-grpc-client from their installed .m2/ JARs to prevent Java Language Server corruption of dependency class files during reactor builds
    pre-test-compile-refresh-classes process-test-resources Same foundation-module refresh before test-compile, closing the LSP interference window between compile and test-compile within a single module lifecycle
    pre-verify-refresh-classes-from-jar pre-integration-test Repopulates target/classes/ from the module's own freshly-built JAR before dependency:analyze-only runs at verify
  • pre-verify-delete-test-classes execution phase corrected from prepare-package to
    pre-integration-test, fixing a bug where test-jars were being packaged empty (compiled
    test classes were deleted before maven-jar-plugin:test-jar could include them).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14949

How was this patch tested?

Build, unit and integration tests

@adoroszlai adoroszlai marked this pull request as draft April 2, 2026 15:17
Copy link
Copy Markdown
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yandrey321 for working on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think .class file should be committed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, removed it from version control

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed CI build failures. New modules are required for using shaded version of netty and grpc libs. I also fixed couple test issues.

yandrey321 and others added 12 commits April 2, 2026 12:18
… grpc-client modules

The pre-verify-refresh-classes-from-jar antrun execution (bound to
pre-integration-test) did an unconditional rm -rf target/classes/ followed
by unzip from the module's local JAR.  On CI with mvn clean verify (not
install), the JAR is present in target/ but on pom-packaging modules or in
any edge-case where the JAR is absent the directory was left empty, causing
downstream reactor modules to fail with
  "class file for ContainerProtos$ContainerCommandRequestProto not found"
when they tried to compile against the now-empty target/classes/.

Two-part fix:
1. Root pom.xml: replace the unconditional rm+unzip pair with a single
   /bin/sh one-liner that only runs if the JAR file actually exists, so
   target/classes/ is left intact when no JAR was produced.
2. hdds-datanode-grpc-client and hdds-scm-grpc-client: override
   pre-verify-refresh-classes-from-jar with phase=none, because both
   modules set mdep.analyze.skip=true (no dependency analysis runs), so
   wiping target/classes/ at pre-integration-test serves no purpose and
   only risks leaving the directory empty for downstream compilation.

Made-with: Cursor
… of .m2/

The pre-compile-delete-classes and pre-test-compile-refresh-classes antrun
executions were restoring hdds-datanode-grpc-client/target/classes/ by
unzipping from the .m2/ repository JAR.  On a CI clean-build (mvn clean
verify) no Ozone artifacts are pre-installed to .m2/, so those unzip
operations silently failed, leaving no safety net if anything clears the
directory after compilation.

Switch the source paths from ${settings.localRepository}/org/apache/ozone/...
to ${maven.multiModuleProjectDirectory}/hadoop-hdds/.../target/X.jar.
These local JARs are created during each foundation module's own package
phase (which runs before any downstream module's process-resources), so
they are always present on CI.  The unzip still silently no-ops when the
JAR is not yet built (e.g. the foundation module itself hasn't been packaged
yet in the current reactor pass).  This makes the class-file restoration
reliable on both CI and local machines without requiring a prior mvn install.

Made-with: Cursor
…repo root

ContainerProtos.class was extracted into the project root (org/apache/...)
by an unzip operation running in the wrong directory during a debugging
session and was accidentally included in commit c075c90.  Generated
bytecode has no place in version control.

Also add *.class to .gitignore so compiled class files at the project root
can never be staged again.

Made-with: Cursor
Three dependencies in hdds-server-scm were incorrectly changed to
test scope during a prior dependency:analyze-only cleanup:
- com.fasterxml.jackson.core:jackson-databind
- org.apache.commons:commons-compress
- org.apache.ozone:hdds-client

Although none of them are directly imported in SCM main sources,
they are loaded indirectly at runtime (jackson-databind is used via
reflection in StorageContainerManager; commons-compress and hdds-client
are loaded transitively).  Declaring them at test scope caused the
generated hdds-server-scm.classpath to omit them, resulting in a
NoClassDefFoundError for jackson-databind when the SCM process started,
crashing all Docker-based acceptance tests.

Restore the three dependencies to compile (default) scope so
build-classpath includes them in the runtime classpath descriptor.
Add matching ignoredUnusedDeclaredDependency entries to the root POM
so dependency:analyze-only no longer flags them as unused.

Made-with: Cursor
@errose28
Copy link
Copy Markdown
Contributor

errose28 commented Apr 6, 2026

+1 for splitting this into smaller patches if possible.

Additionally, we don't want to update the proto.lock files on the master branch if we have not done a release. That will lock in all currently unreleased proto changes even if they are not relevant for this change.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates Ozone’s gRPC transport from the unshaded io.grpc / io.netty / com.google.protobuf stack to the Ratis-shaded org.apache.ratis.thirdparty.* stack, and introduces new build/module structure to generate and consume shaded gRPC/protobuf sources without classpath duplication.

Changes:

  • Added new hdds-datanode-grpc-client and hdds-scm-grpc-client modules to own proto sources and produce Ratis-shaded gRPC stubs.
  • Updated Ozone/HDDS production and test code to use org.apache.ratis.thirdparty.io.grpc.* (and related shaded Netty/Protobuf types) where appropriate.
  • Updated Maven build behavior (dependency analysis ignores, Develocity cache hints, and new antrun executions) to manage generated sources and mitigate IDE/LSP interference.

Reviewed changes

Copilot reviewed 46 out of 55 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pom.xml Adds new grpc-client artifacts to dependencyManagement and introduces global build/antrun behaviors (clean/refresh/analyze workflow) and Develocity cache hints.
.gitignore Ignores *.class files.
hadoop-ozone/s3gateway/pom.xml Adds dependency-analyze ignore list and clarifies why some deps must remain compile-scoped.
hadoop-ozone/recon/pom.xml Refactors dependency-analyze config into an execution and expands ignore lists.
hadoop-ozone/ozonefs-hadoop3/pom.xml Moves dependency-analyze ignores into an analyze execution with appended children.
hadoop-ozone/ozonefs-hadoop2/pom.xml Same dependency-analyze execution refactor as ozonefs-hadoop3.
hadoop-ozone/ozonefs-common/pom.xml Adds dependency-analyze ignore for httpclient.
hadoop-ozone/cli-shell/pom.xml Adds dependency-analyze ignore for ratis-common.
hadoop-ozone/common/pom.xml Removes direct gRPC/Netty deps in favor of shaded stack usage.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java Replaces generated stub usage with shaded gRPC primitives + custom MethodDescriptor/marshaller and header attachment logic.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/GrpcOMFailoverProxyProvider.java Switches gRPC status imports to shaded equivalents.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/GrpcClientConstants.java Switches Context/Metadata imports to shaded gRPC equivalents.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/ClientAddressServerInterceptor.java Switches imports to shaded gRPC equivalents.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/grpc/ClientAddressClientInterceptor.java Switches imports to shaded gRPC equivalents.
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/grpc/TestClientAddressServerInterceptor.java Updates test imports to shaded gRPC equivalents.
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/grpc/TestClientAddressClientInterceptor.java Updates test imports to shaded gRPC equivalents.
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/TestS3GrpcOmTransport.java Reworks tests away from in-process gRPC to shaded Netty server + custom MethodDescriptor/marshaller.
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/TestGrpcOmTransportConcurrentFailover.java Same testing approach update as TestS3GrpcOmTransport, for concurrent failover.
hadoop-ozone/ozone-manager/pom.xml Removes direct gRPC/Netty dependencies (now relying on shaded stack).
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/GrpcOzoneManagerServer.java Switches server imports to shaded gRPC/Netty.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerServiceGrpc.java Implements shaded BindableService directly with a custom MethodDescriptor/marshaller to keep OM protos unshaded.
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/TestOMClientRequestWithUserInfo.java Updates gRPC Context import to shaded equivalent.
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOMMetadataReader.java Updates gRPC Context import to shaded equivalent.
hadoop-ozone/interface-client/pom.xml Adds Develocity cache hints to avoid stale generated-source compilation results.
hadoop-ozone/csi/pom.xml Removes unshaded gRPC/Netty/protobuf deps and adds ratis-thirdparty-misc; adds analyzer ignores and source rewrite step for generated protos.
hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/CsiServer.java Switches gRPC/Netty imports to shaded equivalents.
hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/ControllerService.java Switches StreamObserver import to shaded equivalent.
hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/IdentityService.java Switches BoolValue/StreamObserver imports to shaded equivalents.
hadoop-ozone/csi/src/main/java/org/apache/hadoop/ozone/csi/NodeService.java Switches StreamObserver import to shaded equivalent.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestSecureOzoneRpcClient.java Reduces retries for faster failure and broadens expected exception type.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestDelegationToken.java Reduces retries for faster failure and broadens expected exception type.
hadoop-ozone/integration-test-recon/pom.xml Refactors dependency-analyze ignores into an execution with appended children.
hadoop-ozone/dist/src/main/license/jar-report.txt Updates distribution jar inventory to reflect new modules and removed libs.
hadoop-ozone/dist/src/main/license/bin/LICENSE.txt Updates third-party license inventory reflecting dependency changes.
hadoop-hdds/pom.xml Adds the new grpc-client modules to the HDDS reactor build.
hadoop-hdds/interface-client/pom.xml Adds shaded-runtime dependency and adjusts protobuf generation to produce shaded ContainerProtos for DatanodeClientProtocol.proto.
hadoop-hdds/interface-server/pom.xml Removes SCM-related ratis-proto generation/rewrite now moved to scm-grpc-client; adds Develocity hints and temp dir config.
hadoop-hdds/interface-server/src/main/resources/proto.lock Removes SCM-related proto definitions from this module’s lock file after migration.
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java Switches protobuf exception imports to shaded protobuf where ByteString is shaded.
hadoop-hdds/common/pom.xml Disables incremental compilation and adds dependency-analyze ignores for generated/processor-only deps.
hadoop-hdds/client/pom.xml Adds dependency on hdds-datanode-grpc-client and adjusts dependency-analyze ignores.
hadoop-hdds/framework/pom.xml Removes unshaded grpc-api dependency and adds test dependency on hdds-datanode-grpc-client; adds dependency-analyze ignore for jetty-http.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerRequestInterceptor.java Switches gRPC/protobuf imports to shaded equivalents.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerResponseInterceptor.java Switches gRPC/protobuf imports to shaded equivalents.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/grpc/metrics/GrpcMetricsServerTransportFilter.java Switches gRPC imports to shaded equivalents.
hadoop-hdds/container-service/pom.xml Adds dependency on hdds-datanode-grpc-client and adds dependency-analyze ignores for runtime/service-loaded deps.
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java Reorders unhealthy-marking to occur before completing the future exceptionally.
hadoop-hdds/server-scm/pom.xml Ensures certain deps remain compile-scoped for runtime classpath stability; adds dependency on hdds-scm-grpc-client.
hadoop-hdds/datanode-grpc-client/pom.xml New module to generate shaded gRPC stubs for DatanodeClientProtocol.proto with in-place source rewriting.
hadoop-hdds/datanode-grpc-client/src/main/resources/proto.lock Adds proto.lock for the new module.
hadoop-hdds/scm-grpc-client/pom.xml New module to generate shaded gRPC stubs for SCM internal RPC protos with in-place source rewriting.
hadoop-hdds/scm-grpc-client/src/main/proto/InterSCMProtocol.proto Migrated proto file now owned by scm-grpc-client.
hadoop-hdds/scm-grpc-client/src/main/proto/SCMRatisProtocol.proto Migrated proto file now owned by scm-grpc-client.
hadoop-hdds/scm-grpc-client/src/main/proto/SCMUpdateProtocol.proto Migrated proto file now owned by scm-grpc-client.
hadoop-hdds/scm-grpc-client/src/main/resources/proto.lock Adds proto.lock for scm-grpc-client with migrated proto definitions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pom.xml
Comment on lines +2368 to +2376
<!--
The Cursor/VSCode Java Language Server recreates class files in target/classes/
and target/test-classes/ (with com.apple.provenance extended attributes) faster
than the pre-clean rm -rf can remove them. failOnError=false lets the build
continue even if the clean plugin cannot remove those IDE-managed files; the
subsequent compile phase writes fresh class files over them so the build result
is still correct.
-->
<failOnError>false</failOnError>
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting maven-clean-plugin false globally can mask legitimate clean failures (eg. permission issues) and allow stale build outputs to leak into subsequent phases. Consider scoping this workaround to the affected environment (eg. a macOS-only profile/property) so other platforms still fail fast on real clean errors.

Copilot uses AI. Check for mistakes.
Comment on lines +437 to +443
@Override
public InputStream stream(T value) {
if (!(value instanceof com.google.protobuf.MessageLite)) {
throw new IllegalArgumentException("Expected protobuf request/response");
}
return new ByteArrayInputStream(((com.google.protobuf.MessageLite) value).toByteArray());
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proto2Marshaller.stream() serializes via MessageLite.toByteArray(), which always copies the message into a new byte[]. This contradicts the PR goal of preserving zero-copy marshalling and can add avoidable allocations on every RPC. Prefer streaming from the message's ByteString (eg. toByteString().newInput()) or another zero-copy InputStream source.

Copilot uses AI. Check for mistakes.
Comment on lines +401 to +404
org.apache.ratis.thirdparty.io.grpc.Channel intercepted =
ClientInterceptors.intercept(channel, new FixedHeadersInterceptor(headers));
return ClientCalls.blockingUnaryCall(intercepted, SUBMIT_REQUEST_METHOD,
CallOptions.DEFAULT, request);
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

submitRequest() wraps the ManagedChannel with ClientInterceptors.intercept(...) on every call to attach fixed headers. This creates additional wrapper allocations per RPC. Since the hostname/IP values are effectively constant for the process, consider attaching a headers interceptor once at channel construction time (or caching the intercepted Channel) to reduce per-call overhead.

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +128
private static <T extends MessageLite> MethodDescriptor.Marshaller<T> proto2Marshaller(
Proto2Parser<T> parser) {
return new MethodDescriptor.Marshaller<T>() {
@Override
public InputStream stream(T value) {
return new ByteArrayInputStream(value.toByteArray());
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proto2Marshaller().stream() uses value.toByteArray(), which forces a full copy of every request/response. If the intent is to keep zero-copy behavior, prefer streaming from ByteString (eg. value.toByteString().newInput()) to avoid extra allocations and match the PR description.

Copilot uses AI. Check for mistakes.
Comment on lines 401 to 405
omClient = new OzoneManagerProtocolClientSideTranslatorPB(
OmTransportFactory.create(conf, testUser, null),
OmTransportFactory.create(fastConf, testUser, null),
RandomStringUtils.secure().nextAscii(5));
ex = assertThrows(OMException.class,
assertThrows(IOException.class,
() -> omClient.cancelDelegationToken(token));
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion was broadened to any IOException, which weakens the check that token-cancel fails with the expected security-related error. To keep the test meaningful while still avoiding long retry timeouts, consider asserting on the IOException cause/message (or using assertThrowsExactly/expected subclasses like OMException/AccessControlException) rather than any IOException.

Copilot uses AI. Check for mistakes.
Comment thread pom.xml
Comment on lines +2423 to +2426
<exec executable="/bin/rm" failonerror="false">
<arg value="-rf" />
<arg value="${project.build.directory}" />
</exec>
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build now executes /bin/rm during pre-clean. This is non-portable (eg. Windows) and also bypasses Maven's normal clean semantics. Please gate this execution behind an OS-activated profile (macOS only) and/or an explicit opt-in property so CI and other developer environments are not affected.

Copilot uses AI. Check for mistakes.
Comment thread pom.xml
Comment on lines +2495 to +2498
<exec executable="/bin/sh" failonerror="false">
<arg value="-c" />
<arg value="JAR='${project.build.directory}/${project.build.finalName}.jar'; test -f &quot;$JAR&quot; &amp;&amp; /bin/rm -rf '${project.build.outputDirectory}' &amp;&amp; unzip -qo &quot;$JAR&quot; -d '${project.build.outputDirectory}'" />
</exec>
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-verify-refresh-classes-from-jar execution relies on /bin/sh and unzip being available and uses a shell one-liner to conditionally delete/restore target/classes. This is fragile and non-portable; consider using Ant condition + built-in / tasks (or a platform-scoped profile) so the behavior is deterministic across environments.

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +102
Metadata.Key.of("CLIENT_HOSTNAME", Metadata.ASCII_STRING_MARSHALLER);
private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY =
Metadata.Key.of("CLIENT_IP_ADDRESS", Metadata.ASCII_STRING_MARSHALLER);
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GrpcOmTransport duplicates the metadata key definitions for CLIENT_HOSTNAME/CLIENT_IP_ADDRESS instead of reusing GrpcClientConstants. This risks subtle mismatches if the header names/marshallers change elsewhere. Prefer referencing GrpcClientConstants.CLIENT_*_METADATA_KEY directly.

Suggested change
Metadata.Key.of("CLIENT_HOSTNAME", Metadata.ASCII_STRING_MARSHALLER);
private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY =
Metadata.Key.of("CLIENT_IP_ADDRESS", Metadata.ASCII_STRING_MARSHALLER);
GrpcClientConstants.CLIENT_HOSTNAME_METADATA_KEY;
private static final Metadata.Key<String> CLIENT_IP_ADDRESS_METADATA_KEY =
GrpcClientConstants.CLIENT_IP_ADDRESS_METADATA_KEY;

Copilot uses AI. Check for mistakes.
Comment on lines +453 to +459
assertThrows(IOException.class, () ->
proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
try (OzoneClient ozoneClient = OzoneClientFactory.getRpcClient(testConf)) {
ozoneClient.getObjectStore().listVolumes("/");
}
return null;
}));
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test now accepts any IOException, which can also include unrelated failures (eg. transient network/cluster issues) and reduces the signal that the rejection is specifically auth-related. Consider asserting on the exception type/cause chain (eg. AccessControlException/OMException) or matching an expected message to keep the test deterministic.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! left a few quick comments. I'm in the middle of it

Comment on lines +1168 to -1174
// Mark the container unhealthy BEFORE completing the future so that
// any subsequent applyTransaction sees the unhealthy state immediately
// when it calls checkContainerHealthy (race-free).
stateMachineHealthy.compareAndSet(true, false);
unhealthyContainers.add(requestProto.getContainerID());
// Since the applyTransaction now is completed exceptionally,
// before any further snapshot is taken , the exception will be
// caught in stateMachineUpdater in Ratis and ratis server will
// shutdown.
applyTransactionFuture.completeExceptionally(sce);
stateMachineHealthy.compareAndSet(true, false);
unhealthyContainers.add(requestProto.getContainerID());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functional change should go to a separate PR.

Comment thread hadoop-ozone/csi/pom.xml
Comment on lines -36 to -44
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>${protobuf.version}</version>
</dependency>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure we want to replace guava too.
Protobuf? not sure either.

Comment on lines 449 to 459
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily needed.

Comment on lines +162 to +176
ThreadFactory factory = new ThreadFactoryBuilder()
.setDaemon(true)
.setNameFormat(CLIENT_NAME + "-ELG-%d")
.build();

final Class<? extends Channel> channelType;
if (Epoll.isAvailable()) {
eventLoopGroup = new EpollEventLoopGroup(0, factory);
channelType = EpollSocketChannel.class;
} else {
eventLoopGroup = new NioEventLoopGroup(0, factory);
channelType = NioSocketChannel.class;
}
LOG.info("{} channel type {}", CLIENT_NAME, channelType.getSimpleName());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should split into a separate PR.

and I believe there are other places we want to use epoll whenever possible.

UserGroupInformation realUser = UserGroupInformation.createRemoteUser("realUser");
UserGroupInformation proxyUser = UserGroupInformation.createProxyUser("user", realUser);

assertThrows(AccessControlException.class, () -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to shading.

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in effect, this change:

scopes the configuration to a specific execution (analyze), and
makes the ignore-list additive across parent/child POM inheritance, reducing the chance that this module accidentally wipes out ignore rules defined in a parent.

}
}

private static final class Proto2Marshaller<T> implements MethodDescriptor.Marshaller<T> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's another one in TestS3GrpcOmTransport.java‎ which is exactly the same.

}
}

private static final class Proto2Marshaller<T> implements MethodDescriptor.Marshaller<T> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I wonder if the other Proto2Marshaller in GrpcOmTransport can conslidate too.

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<useIncrementalCompilation>false</useIncrementalCompilation>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Disabling incremental compilation forces a cleaner, dependency-correct rebuild behavior for that module’s compilation step, so that when an API change happens, Maven/javac are much less likely to leave behind stale .class files that were compiled against old symbols.

In short: it trades some compile speed for build determinism and correctness, which matters a lot in PRs like this one (migrating shaded gRPC/Netty/Ratis dependencies) where API surfaces and classpaths can shift in ways that incremental compilation is particularly bad at handling."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants