feat(admin): /admin/healthz/leader for load-balancer leader probes #689
Add a leader-only health endpoint at /admin/healthz/leader so an
upstream load balancer (Caddy, HAProxy, ...) can route admin
traffic only to the current Raft leader, skipping the
follower->leader AdminForward proxy hop on writes. Mirrors the
/healthz/leader endpoints already shipped on the S3
(adapter/s3.go:serveS3LeaderHealthz) and DynamoDB
(adapter/dynamodb.go:serveDynamoLeaderHealthz) adapters so a
multi-protocol load balancer probing any of the three sees
identical semantics.
Wiring:
- internal/admin: new LeaderProbe interface + LeaderProbeFunc
adapter; ServerDeps.LeaderProbe; new
NewRouterWithLeaderProbe constructor (NewRouter delegates with
nil probe, preserving callers).
- Router.classify: new routeHealthzLeader matched BEFORE the SPA
catch-all and BEFORE the /admin/healthz equality check so the
longer path never resolves to index.html.
- Router.serveLeaderHealth: 200 + "ok\n" when probe returns true;
503 + "not leader\n" otherwise. text/plain, Cache-Control:
no-store, GET/HEAD only (mirrors S3/Dynamo).
- main_admin.go: newAdminLeaderProbe iterates the local raft
group runtimes, short-circuits on Status().State !=
StateLeader, and only nodes claiming leadership pay a 2 s
ReadIndex VerifyLeader round-trip. "Leader of any local
runtime" is treated as yes so a multi-group deployment that
distributes leadership still surfaces a useful answer; in the
common single-group / co-located case this collapses to "is
this node THE leader". Returns nil for an empty runtimes slice
so the router answers /admin/healthz/leader with a JSON 404
("feature off"), matching the Tables / Buckets / Queues
pattern.
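For orientation, a minimal sketch of the probe surface and response contract described above. The `IsVerifiedLeader` method name is taken from the review excerpts further down the thread; the handler is illustrative rather than the actual internal/admin router code, and the real signatures may differ.

```go
// Sketch only: assumes the IsVerifiedLeader method name seen in the review
// excerpts below; not the shipped internal/admin implementation.
package admin

import "net/http"

// LeaderProbe reports whether this node is currently the verified Raft leader.
type LeaderProbe interface {
	IsVerifiedLeader() bool
}

// LeaderProbeFunc adapts a plain func to LeaderProbe (same pattern as ClusterInfoFunc).
type LeaderProbeFunc func() bool

func (f LeaderProbeFunc) IsVerifiedLeader() bool { return f() }

// serveLeaderHealth illustrates the 200/503 contract: text/plain,
// Cache-Control: no-store, body omitted on HEAD. The router dispatches here
// only when a probe is wired; a nil probe is answered earlier with a JSON 404.
func serveLeaderHealth(w http.ResponseWriter, r *http.Request, probe LeaderProbe) {
	status, body := http.StatusOK, "ok\n"
	if !probe.IsVerifiedLeader() {
		status, body = http.StatusServiceUnavailable, "not leader\n"
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.Header().Set("Cache-Control", "no-store")
	w.WriteHeader(status)
	if r.Method == http.MethodGet {
		_, _ = w.Write([]byte(body))
	}
}
```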
Behaviour matrix:
| Path | Probe state | Response |
|----------------------------|--------------------|----------------------------|
| /admin/healthz | (n/a) | 200 ok |
| /admin/healthz/leader | wired, leader | 200 ok |
| /admin/healthz/leader | wired, follower | 503 not leader |
| /admin/healthz/leader | not wired | 404 not_found (JSON) |
| /admin/healthz/leader POST | (any) | 405 method_not_allowed |
Test coverage:
- TestRouter_HealthzLeader_ReturnsOKWhenLeader (200 + body shape)
- TestRouter_HealthzLeader_Returns503WhenNotLeader (503 + body)
- TestRouter_HealthzLeader_HeadOmitsBody (HEAD on both branches)
- TestRouter_HealthzLeader_RejectsPost (method allowlist)
- TestRouter_HealthzLeader_NilProbeReturns404 ("feature off")
- TestRouter_HealthzLeader_NotSwallowedBySPA (classify ordering)
Self-review (CLAUDE.md 5 lenses):
1. Data loss -- None. Read-only handler; no Raft proposals.
2. Concurrency -- VerifyLeader is bounded at 2 s, called only on
nodes that already claim leadership via Status(). No new
shared state.
3. Performance -- Cheap path: followers short-circuit via
Status() (no Raft round-trip). Leader path: one ReadIndex per
probe per group; load balancers typically probe every few
seconds, so the overhead is bounded and well below the
existing background HLC lease + ReadIndex traffic.
4. Data consistency -- VerifyLeader is the same ReadIndex round-
trip the lease-read path uses; a stale-leader follower mid-
silent-leadership-change cannot return 200.
5. Test coverage -- six router-level tests covering both happy
paths, the wired-but-not-leader case, the not-wired case, the
method allowlist, and the classify ordering that prevents the
SPA fallback from swallowing the path.
@claude review

Claude finished @bootjp's task in 1m 57s — Code Review:
| Adapter | Success body | Failure body |
|---|---|---|
| S3 | `"ok"` | `"not leader"` |
| DynamoDB | `"ok\n"` | `"not leader\n"` |
| Admin | `"ok\n"` | `"not leader\n"` |
Admin matches DynamoDB but not S3 (S3 omits the trailing newline). This won't affect load balancer behavior since LBs probe status codes, not bodies — but the PR description's "identical responses" claim is technically only true vs. DynamoDB. A one-liner update to the PR description would be sufficient.
Positive observations
- Interface design is correct. `LeaderProbe`/`LeaderProbeFunc` mirrors `ClusterInfoFunc`. The nil-probe → 404 contract is clean and distinguishes "feature off" from "503 not leader" as intended.
- Classify ordering is correct. `pathHealthzLeader` is evaluated before the SPA fallback (`routeSPA` catches all `/admin/*` paths). The test `TestRouter_HealthzLeader_NotSwallowedBySPA` locks this down well.
- Short-circuit is correct. `Status().State != raftengine.StateLeader` returns early without the `VerifyLeader` ReadIndex round-trip for followers, matching the S3/DynamoDB pattern exactly.
- Timeout is appropriate. The 2 s `adminLeaderProbeTimeout` with `context.Background()` is the right choice here — there's no incoming request context to inherit and 2 s matches the S3/DynamoDB probe reference.
- Multi-group "any leader" semantics are documented inline. The logic is sound for the common single-group case and the multi-group edge is explained in the comment.
- Test coverage is thorough. All six cases in the behavior matrix are covered, including the SPA regression guard. The `TestRouter_HealthzLeader_HeadOmitsBody` test correctly checks status code on the follower HEAD case (503 with empty body), which is the tricky part.
- Wiring in `main_admin.go` is clean. Follows the existing bridge pattern; `startAdminServer` signature extension with nil default in tests is minimal.
Summary: Two small fixes worth making before merge (the Allow header for RFC compliance, and removing the dead nil guard); one documentation-only nit about S3 body parity. The core logic is sound and ready to ship.
Code Review
This pull request introduces a new leader health check endpoint, /admin/healthz/leader, to facilitate load balancer routing to the Raft leader. It adds a LeaderProbe interface and integrates it into the admin router and server components. Review feedback recommends correcting a misleading comment about route classification, removing a redundant nil check in the health handler, and utilizing the server's lifecycle context within the leader probe to ensure proper task cancellation during shutdown.
// /admin/healthz/leader must be checked before /admin/healthz so
// the longer path does not fall through the equality check and
// resolve to the SPA fallback.
The comment regarding path ordering is misleading. Since both /admin/healthz/leader and /admin/healthz are matched using exact equality (==), the order between them does not matter for correctness. The critical requirement is that both are checked before the SPA fallback (lines 162-164), which uses a prefix match and would otherwise swallow these routes.
Suggested change:
-// /admin/healthz/leader must be checked before /admin/healthz so
-// the longer path does not fall through the equality check and
-// resolve to the SPA fallback.
+// Both healthz paths must be checked before the SPA catch-all
+// so they are not swallowed by the index.html fallback.
	return
}
status, body := http.StatusOK, "ok\n"
if rt.leader == nil || !rt.leader.IsVerifiedLeader() {
The check rt.leader == nil is redundant here. The dispatch method (lines 203-205) already ensures that rt.leader is non-nil before calling this handler. If it were nil, the router would have already returned a 404 JSON response via rt.notFound. Removing this check simplifies the logic and avoids suggesting that a nil probe would result in a 503 response.
Suggested change:
-if rt.leader == nil || !rt.leader.IsVerifiedLeader() {
+if !rt.leader.IsVerifiedLeader() {
	return errors.Wrap(err, "build admin leader forwarder")
}
_, err = startAdminServer(ctx, lc, eg, cfg, staticCreds, clusterSrc, tablesSrc, bucketsSrc, queuesSrc, forwarder, keyvizSampler, keyvizFanoutCfg, buildVersion())
leaderProbe := newAdminLeaderProbe(runtimes)
Pass the server's lifecycle context to newAdminLeaderProbe to enable proper cancellation of background Raft verification tasks during shutdown.
Suggested change:
-leaderProbe := newAdminLeaderProbe(runtimes)
+leaderProbe := newAdminLeaderProbe(ctx, runtimes)
References
- When designing interfaces, use context.Context for managing deadlines and cancellation to ensure proper resource management and graceful shutdown.
func newAdminLeaderProbe(runtimes []*raftGroupRuntime) admin.LeaderProbe {
	if len(runtimes) == 0 {
		return nil
	}
	return admin.LeaderProbeFunc(func() bool {
		for _, rt := range runtimes {
			if rt == nil || rt.engine == nil {
				continue
			}
			if rt.engine.Status().State != raftengine.StateLeader {
				continue
			}
			ctx, cancel := context.WithTimeout(context.Background(), adminLeaderProbeTimeout)
			err := rt.engine.VerifyLeader(ctx)
			cancel()
			if err == nil {
				return true
			}
		}
		return false
	})
}
The newAdminLeaderProbe function should accept and use the server's lifecycle context. Using context.Background() for VerifyLeader means that these operations will not be cancelled if the server is shutting down. Since VerifyLeader involves network I/O and Raft processing, respecting the server's context is a best practice for graceful shutdown and resource management.
func newAdminLeaderProbe(ctx context.Context, runtimes []*raftGroupRuntime) admin.LeaderProbe {
	if len(runtimes) == 0 {
		return nil
	}
	return admin.LeaderProbeFunc(func() bool {
		for _, rt := range runtimes {
			if rt == nil || rt.engine == nil {
				continue
			}
			if rt.engine.Status().State != raftengine.StateLeader {
				continue
			}
			tCtx, cancel := context.WithTimeout(ctx, adminLeaderProbeTimeout)
			err := rt.engine.VerifyLeader(tCtx)
			cancel()
			if err == nil {
				return true
			}
		}
		return false
	})
}

References
- When designing interfaces, use context.Context for managing deadlines and cancellation to ensure proper resource management and graceful shutdown.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f50223275
	for _, rt := range runtimes {
		if rt == nil || rt.engine == nil {
			continue
		}
		if rt.engine.Status().State != raftengine.StateLeader {
Base admin leader probe on default raft group
newAdminLeaderProbe returns healthy as soon as any local runtime verifies leadership, but admin write paths still key off the coordinator’s default group (kv/sharded_coordinator.go IsLeader/VerifyLeader) and SQS admin deletes have no follower-forwarding fallback (adapter/sqs_admin.go), so in multi-group deployments a node that leads a non-default group can return 200 here while still rejecting admin writes with 503 (or forcing extra forward hops). That breaks the endpoint’s stated load-balancer contract of selecting the current admin-write leader.
Address findings from PR #689 round-1 review:

- [Codex P1] Scope newAdminLeaderProbe to the **default Raft group** via coordinate.IsLeader / coordinate.VerifyLeader instead of the prior "any local runtime" loop. Admin write paths (kv/sharded_coordinator.go, adapter/sqs_admin.go) all key off the default group, so a node leading only a non-default group could return 200 here while still rejecting admin writes -- breaking the endpoint's stated load-balancer contract. Routing through the coordinator keeps the healthz answer aligned with what the AdminForward proxy and SQS admin write path treat as authoritative. newAdminLeaderProbe(runtimes []*raftGroupRuntime) becomes newAdminLeaderProbe(coordinate kv.Coordinator). The change drops the per-runtime engine.VerifyLeader fan-out and the adminLeaderProbeTimeout constant -- coordinate.VerifyLeader is the same ReadIndex round-trip lease reads use, with the engine's own bounded deadline, so an outer 2 s timeout is no longer needed.
- [Claude / Gemini] Add the RFC 7231 §6.5.5 Allow: GET, HEAD header to every router-served 405 response (healthz, healthz/leader, asset, SPA). Pulled the body shape into a writeMethodNotAllowed helper so the four call sites stay uniform; load balancers and synthetic-monitor tools now see the same Allow header the S3 and DynamoDB /healthz/leader handlers already advertise. New TestRouter_405_AllowHeader sweeps every 405-emitting path so a future handler that bypasses the helper would regress this test.
- [Claude / Gemini] Drop the redundant rt.leader == nil check inside serveLeaderHealth. dispatch() already short-circuits the nil case to rt.notFound before this handler is ever invoked; the in-body check was dead code that misled readers about the precondition. Documented the precondition explicitly on the function.
- [Gemini] Rewrite the misleading "ordering before /admin/healthz" comment in classify(). Both healthz paths are exact-match equality checks; their relative order does not matter for correctness. The load-bearing rule is that BOTH must run before the SPA prefix branch -- now spelled out that way.

Self-review (CLAUDE.md 5 lenses):
1. Data loss -- None. Read-only handlers; no Raft proposals touched.
2. Concurrency -- The probe now serialises through the coordinator's default-group view, which is the same path lease reads use. Behaviour during leadership change is unchanged: VerifyLeader's ReadIndex returns an error, the probe returns false, the LB takes the node out of rotation.
3. Performance -- Cheaper than the prior per-runtime loop in the common single-group case (one IsLeader / one VerifyLeader vs N per-runtime Status checks plus M VerifyLeader calls).
4. Data consistency -- coordinate.VerifyLeader is the same ReadIndex round-trip the lease-read path uses; the load balancer can now key off the same liveness signal admin writes do.
5. Test coverage -- new TestRouter_405_AllowHeader. Existing six /admin/healthz/leader tests still cover happy / 503 / HEAD / POST / nil-probe / SPA-no-swallow. The redundant-nil-guard removal is covered by TestRouter_HealthzLeader_NilProbeReturns404 (still passes via dispatch()'s nil check).

Body-text parity nit: the PR description's "identical responses to S3 and DynamoDB" is technically true vs DynamoDB but not S3 (S3 omits the trailing newline; this admin path matches DynamoDB). Not worth a behaviour change -- LBs key off status codes, not body strings -- but worth noting in case a future PR aligns the three.
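For illustration, a sketch of the coordinator-scoped probe this commit describes. It assumes `kv.Coordinator` exposes `IsLeader() bool` and `VerifyLeader() error`, as the call sites quoted elsewhere in the thread suggest; it is a fragment of the main_admin.go shape, not the shipped code.

```go
// Sketch of the post-fix probe wiring; the IsLeader/VerifyLeader signatures
// on kv.Coordinator are assumed from the call sites quoted in this thread.
func newAdminLeaderProbe(coordinate kv.Coordinator) admin.LeaderProbe {
	if coordinate == nil {
		return nil // probe not wired: router answers /admin/healthz/leader with a JSON 404
	}
	return admin.LeaderProbeFunc(func() bool {
		// Followers bail out without a ReadIndex round-trip.
		if !coordinate.IsLeader() {
			return false
		}
		// Leaders confirm via the same ReadIndex round-trip lease reads use;
		// the engine applies its own bounded deadline internally.
		return coordinate.VerifyLeader() == nil
	})
}
```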
@claude review

Claude finished @bootjp's task in 3m 21s — Code Review:
| Finding | Status |
|---|---|
| RFC 7231 §6.5.5 Allow header missing on 405 | ✅ Fixed — `writeMethodNotAllowed` helper + `TestRouter_405_AllowHeader` sweep |
| Redundant `rt.leader == nil` guard in `serveLeaderHealth` | ✅ Removed — precondition now documented on the function |
| Codex P1 — "any group" leader could return 200 while admin writes still 503 | ✅ Fixed — `newAdminLeaderProbe` now takes `kv.Coordinator` (default group) |
| Misleading "ordering before /admin/healthz" comment in classify() | ✅ Rewritten accurately |
New finding — coordinate.VerifyLeader() uses context.Background() internally (informational)
File: kv/raft_engine.go:43-45, main_admin.go:177
The fix commit's commit message claims "coordinate.VerifyLeader is the same ReadIndex round-trip lease reads use, with the engine's own bounded deadline, so a stalled cluster surfaces 503 here on its own without the probe needing an outer timeout." This is worth verifying, because it's the stated justification for not passing the server lifecycle context (Gemini's suggestion).
Tracing the call chain:
main_admin.go:177 coordinate.VerifyLeader()
kv/coordinator.go:616 → verifyLeaderEngine(c.engine)
kv/raft_engine.go:43 → verifyLeaderEngineCtx(context.Background(), engine)
raft/engine.go:742 → engine.VerifyLeader(context.Background())
raft/engine.go:1021 → submitRead(context.Background(), false)
submitRead has two selects: the first (<-e.doneCh) returns early if the engine is already stopped before the read can be enqueued — good. But the second select (waiting for <-req.done) has no <-e.doneCh arm:
// kv/raft_engine.go — submitRead (second select)
select {
case <-ctx.Done():      // context.Background() — never fires
case res := <-req.done: // waits indefinitely; no e.doneCh escape
}

So if the engine stops after the read request is enqueued (req written to readCh), the probe goroutine can block until the raft engine's own shutdown logic delivers a result to req.done. Whether it does so promptly depends on the engine's teardown path — and that's not a new risk introduced here, since verifyLeaderEngine is called from other places too.
Practical impact: During graceful shutdown, a health probe that is mid-VerifyLeader when the server context is cancelled won't be interrupted. In the worst case it blocks until the raft engine stops naturally, with adminShutdownTimeout = 5 s as the outer backstop. Not a correctness issue and not introduced by this PR — it's pre-existing behavior in verifyLeaderEngine. The fix commit's reasoning is directionally correct (the engine stops → the read unblocks); it's just not instant.
Gemini's suggestion (pass the server lifecycle context to the probe) would actually eliminate this — a context.WithTimeout(ctx, 2s) wrapper, or exposing a VerifyLeaderCtx(ctx) variant on Coordinator, would let the probe respect the shutdown signal. But that's a broader change to the Coordinator interface and not required for this PR to be correct.
Verdict: Informational only. No action needed here unless you want to address shutdown latency precisely — and that belongs in a follow-up that also fixes the other verifyLeaderEngine callers consistently.
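If that follow-up were ever pursued, it could look roughly like the sketch below. Note that `VerifyLeaderCtx` is purely hypothetical (no such method exists on the Coordinator today); it is just one way to let the probe observe the server lifecycle context, per Gemini's suggestion.

```go
// Hypothetical follow-up only: VerifyLeaderCtx is NOT an existing method;
// this shows how the probe could respect the server lifecycle context.
func newAdminLeaderProbe(lifecycle context.Context, coordinate kv.Coordinator) admin.LeaderProbe {
	return admin.LeaderProbeFunc(func() bool {
		if !coordinate.IsLeader() {
			return false
		}
		// Bound each probe and bail out early when the server is shutting down.
		ctx, cancel := context.WithTimeout(lifecycle, 2*time.Second)
		defer cancel()
		return coordinate.VerifyLeaderCtx(ctx) == nil
	})
}
```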
Nit — TestRouter_HealthzRejectsPost doesn't assert Allow header
File: internal/admin/router_test.go:59–66
The existing TestRouter_HealthzRejectsPost test was not updated to assert Allow: GET, HEAD — it only checks the status code. The new TestRouter_405_AllowHeader sweep test covers this path, so there's no regression risk. But TestRouter_HealthzLeader_RejectsPost (line 125–132) does assert the header inline, creating a small inconsistency in the test style. Minor, and arguably the sweep test is the right place to assert this invariant. Not a blocker.
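If the inconsistency ever bothers anyone, a tiny shared assertion could close it. The helper below is illustrative only (it is not in the PR) and relies solely on the standard library; `h` stands in for whatever handler the existing tests already build.

```go
package admin_test

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// assertAllowGetHead is an illustrative helper (not part of the PR) that any
// 405 test, including TestRouter_HealthzRejectsPost, could reuse to assert
// the RFC 7231 Allow header alongside the status code.
func assertAllowGetHead(t *testing.T, h http.Handler, path string) {
	t.Helper()
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	if rec.Code != http.StatusMethodNotAllowed {
		t.Fatalf("%s: want 405, got %d", path, rec.Code)
	}
	if got := rec.Header().Get("Allow"); got != "GET, HEAD" {
		t.Fatalf("%s: want Allow %q, got %q", path, "GET, HEAD", got)
	}
}
```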
Positive observations
- `writeMethodNotAllowed` helper is a clean DRY abstraction. The `TestRouter_405_AllowHeader` table sweep is the right way to lock this down — a future handler that bypasses the helper will regress the test rather than silently shipping a non-compliant 405.
- Default-group scoping (`coordinate.IsLeader` / `coordinate.VerifyLeader`) is the correct fix for the Codex P1 finding. The comment at `main_admin.go:155–160` clearly explains the "why" so a future reader understands the tradeoff.
- `verifyLeaderEngine` short-circuit: followers still exit without a ReadIndex round-trip (`IsLeader()` returns false → return early). Only nodes claiming leadership pay the `VerifyLeader` cost.
- `TestRouter_405_AllowHeader` is well structured as a parallel table test; the failure message `"path %s missing RFC 7231 Allow header on 405"` gives immediate signal on regressions.
- All six behavior matrix cases are covered and the SPA regression guard is still in place.
Summary: Two commits, clean implementation. All round-1 findings resolved; one informational observation about shutdown latency in the existing verifyLeaderEngine path (pre-existing, not introduced here); one test nit. Ready to merge.
Summary
- Add a leader-only health endpoint at `/admin/healthz/leader` so an upstream load balancer (Caddy, HAProxy, ...) can route admin traffic only to the current Raft leader, skipping the follower→leader AdminForward proxy hop on writes.
- Mirrors the `/healthz/leader` endpoints already shipped on the S3 (adapter/s3.go:serveS3LeaderHealthz) and DynamoDB (adapter/dynamodb.go:serveDynamoLeaderHealthz) adapters — same semantics, same body shape, same method allowlist — so a multi-protocol load balancer probing any of the three sees identical responses.
- New `LeaderProbe` interface in `internal/admin`; `NewRouterWithLeaderProbe` constructor (`NewRouter` delegates with `nil` probe, preserving existing callers); `ServerDeps.LeaderProbe`; `newAdminLeaderProbe` in `main_admin.go` consults the local Raft group runtimes.

Behaviour matrix

| Path | Probe state | Response |
|---|---|---|
| `/admin/healthz` | (n/a) | 200 ok |
| `/admin/healthz/leader` | wired, leader | 200 ok |
| `/admin/healthz/leader` | wired, follower | 503 not leader |
| `/admin/healthz/leader` | not wired | 404 not_found (JSON) |
| `/admin/healthz/leader` POST | (any) | 405 method_not_allowed |

Self-review (CLAUDE.md 5 lenses)

1. Data loss — none. Read-only handler; no Raft proposals.
2. Concurrency — `VerifyLeader` is bounded at 2 s, called only on nodes that already claim leadership via `Status()`. No new shared state.
3. Performance — cheap path: followers short-circuit via `Status()` (no Raft round-trip). Leader path: one ReadIndex per probe per group; load balancers typically probe every few seconds, so the overhead is bounded and well below the existing background HLC lease + ReadIndex traffic.
4. Data consistency — `VerifyLeader` is the same ReadIndex round-trip the lease-read path uses; a stale-leader follower mid-silent-leadership-change cannot return 200.
5. Test coverage — six router-level tests covering both happy paths, the wired-but-not-leader case, the not-wired case, the method allowlist, and the classify ordering that prevents the SPA fallback from swallowing the path.

Test plan

- `go test -race ./internal/admin/...` — passes (1.3 s)
- `go test -race -run TestStartAdminServer .` — passes
- `golangci-lint run ./...` — clean

@claude review