feat(admin): /admin/healthz/leader for load-balancer leader probes #689
Add a leader-only health endpoint at /admin/healthz/leader so an
upstream load balancer (Caddy, HAProxy, ...) can route admin
traffic only to the current Raft leader, skipping the
follower->leader AdminForward proxy hop on writes. Mirrors the
/healthz/leader endpoints already shipped on the S3
(adapter/s3.go:serveS3LeaderHealthz) and DynamoDB
(adapter/dynamodb.go:serveDynamoLeaderHealthz) adapters so a
multi-protocol load balancer probing any of the three sees
identical semantics.
Wiring:
- internal/admin: new LeaderProbe interface + LeaderProbeFunc
adapter; ServerDeps.LeaderProbe; new
NewRouterWithLeaderProbe constructor (NewRouter delegates with
nil probe, preserving callers).
- Router.classify: new routeHealthzLeader matched BEFORE the SPA
catch-all and BEFORE the /admin/healthz equality check so the
longer path never resolves to index.html.
- Router.serveLeaderHealth: 200 + "ok\n" when probe returns true;
503 + "not leader\n" otherwise. text/plain, Cache-Control:
no-store, GET/HEAD only (mirrors S3/Dynamo).
- main_admin.go: newAdminLeaderProbe iterates the local raft
group runtimes, short-circuits on Status().State !=
StateLeader, and only nodes claiming leadership pay a 2 s
ReadIndex VerifyLeader round-trip. "Leader of any local
runtime" is treated as yes so a multi-group deployment that
distributes leadership still surfaces a useful answer; in the
common single-group / co-located case this collapses to "is
this node THE leader". Returns nil for an empty runtimes slice
so the router answers /admin/healthz/leader with a JSON 404
("feature off"), matching the Tables / Buckets / Queues
pattern.
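For orientation, a minimal sketch of the probe surface and response contract described above. The `IsVerifiedLeader` method name is taken from the review excerpts further down the thread; the handler is illustrative rather than the actual internal/admin router code, and the real signatures may differ.

```go
// Sketch only: assumes the IsVerifiedLeader method name seen in the review
// excerpts below; not the shipped internal/admin implementation.
package admin

import "net/http"

// LeaderProbe reports whether this node is currently the verified Raft leader.
type LeaderProbe interface {
	IsVerifiedLeader() bool
}

// LeaderProbeFunc adapts a plain func to LeaderProbe (same pattern as ClusterInfoFunc).
type LeaderProbeFunc func() bool

func (f LeaderProbeFunc) IsVerifiedLeader() bool { return f() }

// serveLeaderHealth illustrates the 200/503 contract: text/plain,
// Cache-Control: no-store, body omitted on HEAD. The router dispatches here
// only when a probe is wired; a nil probe is answered earlier with a JSON 404.
func serveLeaderHealth(w http.ResponseWriter, r *http.Request, probe LeaderProbe) {
	status, body := http.StatusOK, "ok\n"
	if !probe.IsVerifiedLeader() {
		status, body = http.StatusServiceUnavailable, "not leader\n"
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.Header().Set("Cache-Control", "no-store")
	w.WriteHeader(status)
	if r.Method == http.MethodGet {
		_, _ = w.Write([]byte(body))
	}
}
```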
Behaviour matrix:
| Path | Probe state | Response |
|----------------------------|--------------------|----------------------------|
| /admin/healthz | (n/a) | 200 ok |
| /admin/healthz/leader | wired, leader | 200 ok |
| /admin/healthz/leader | wired, follower | 503 not leader |
| /admin/healthz/leader | not wired | 404 not_found (JSON) |
| /admin/healthz/leader POST | (any) | 405 method_not_allowed |
Test coverage:
- TestRouter_HealthzLeader_ReturnsOKWhenLeader (200 + body shape)
- TestRouter_HealthzLeader_Returns503WhenNotLeader (503 + body)
- TestRouter_HealthzLeader_HeadOmitsBody (HEAD on both branches)
- TestRouter_HealthzLeader_RejectsPost (method allowlist)
- TestRouter_HealthzLeader_NilProbeReturns404 ("feature off")
- TestRouter_HealthzLeader_NotSwallowedBySPA (classify ordering)
Self-review (CLAUDE.md 5 lenses):
1. Data loss -- None. Read-only handler; no Raft proposals.
2. Concurrency -- VerifyLeader is bounded at 2 s, called only on
nodes that already claim leadership via Status(). No new
shared state.
3. Performance -- Cheap path: followers short-circuit via
Status() (no Raft round-trip). Leader path: one ReadIndex per
probe per group; load balancers typically probe every few
seconds, so the overhead is bounded and well below the
existing background HLC lease + ReadIndex traffic.
4. Data consistency -- VerifyLeader is the same ReadIndex round-
trip the lease-read path uses; a stale-leader follower mid-
silent-leadership-change cannot return 200.
5. Test coverage -- six router-level tests covering both happy
paths, the wired-but-not-leader case, the not-wired case, the
method allowlist, and the classify ordering that prevents the
SPA fallback from swallowing the path.
@claude review

Claude finished @bootjp's task in 1m 57s — Code Review:
| Adapter | Success body | Failure body |
|---|---|---|
| S3 | `"ok"` | `"not leader"` |
| DynamoDB | `"ok\n"` | `"not leader\n"` |
| Admin | `"ok\n"` | `"not leader\n"` |
Admin matches DynamoDB but not S3 (S3 omits the trailing newline). This won't affect load balancer behavior since LBs probe status codes, not bodies — but the PR description's "identical responses" claim is technically only true vs. DynamoDB. A one-liner update to the PR description would be sufficient.
Positive observations
- Interface design is correct. `LeaderProbe`/`LeaderProbeFunc` mirrors `ClusterInfoFunc`. The nil-probe → 404 contract is clean and distinguishes "feature off" from "503 not leader" as intended.
- Classify ordering is correct. `pathHealthzLeader` is evaluated before the SPA fallback (`routeSPA` catches all `/admin/*` paths). The test `TestRouter_HealthzLeader_NotSwallowedBySPA` locks this down well.
- Short-circuit is correct. `Status().State != raftengine.StateLeader` returns early without the `VerifyLeader` ReadIndex round-trip for followers, matching the S3/DynamoDB pattern exactly.
- Timeout is appropriate. The 2 s `adminLeaderProbeTimeout` with `context.Background()` is the right choice here — there's no incoming request context to inherit and 2 s matches the S3/DynamoDB probe reference.
- Multi-group "any leader" semantics are documented inline. The logic is sound for the common single-group case and the multi-group edge is explained in the comment.
- Test coverage is thorough. All six cases in the behavior matrix are covered, including the SPA regression guard. The `TestRouter_HealthzLeader_HeadOmitsBody` test correctly checks status code on the follower HEAD case (503 with empty body), which is the tricky part.
- Wiring in `main_admin.go` is clean. Follows the existing bridge pattern; `startAdminServer` signature extension with nil default in tests is minimal.
Summary: Two small fixes worth making before merge (the Allow header for RFC compliance, and removing the dead nil guard); one documentation-only nit about S3 body parity. The core logic is sound and ready to ship.
Code Review
This pull request introduces a new leader health check endpoint, /admin/healthz/leader, to facilitate load balancer routing to the Raft leader. It adds a LeaderProbe interface and integrates it into the admin router and server components. Review feedback recommends correcting a misleading comment about route classification, removing a redundant nil check in the health handler, and utilizing the server's lifecycle context within the leader probe to ensure proper task cancellation during shutdown.
// /admin/healthz/leader must be checked before /admin/healthz so
// the longer path does not fall through the equality check and
// resolve to the SPA fallback.
The comment regarding path ordering is misleading. Since both /admin/healthz/leader and /admin/healthz are matched using exact equality (==), the order between them does not matter for correctness. The critical requirement is that both are checked before the SPA fallback (lines 162-164), which uses a prefix match and would otherwise swallow these routes.
Suggested change:
-// /admin/healthz/leader must be checked before /admin/healthz so
-// the longer path does not fall through the equality check and
-// resolve to the SPA fallback.
+// Both healthz paths must be checked before the SPA catch-all
+// so they are not swallowed by the index.html fallback.
	return
}
status, body := http.StatusOK, "ok\n"
if rt.leader == nil || !rt.leader.IsVerifiedLeader() {
The check rt.leader == nil is redundant here. The dispatch method (lines 203-205) already ensures that rt.leader is non-nil before calling this handler. If it were nil, the router would have already returned a 404 JSON response via rt.notFound. Removing this check simplifies the logic and avoids suggesting that a nil probe would result in a 503 response.
Suggested change:
-if rt.leader == nil || !rt.leader.IsVerifiedLeader() {
+if !rt.leader.IsVerifiedLeader() {
	return errors.Wrap(err, "build admin leader forwarder")
}
_, err = startAdminServer(ctx, lc, eg, cfg, staticCreds, clusterSrc, tablesSrc, bucketsSrc, queuesSrc, forwarder, keyvizSampler, keyvizFanoutCfg, buildVersion())
leaderProbe := newAdminLeaderProbe(runtimes)
Pass the server's lifecycle context to newAdminLeaderProbe to enable proper cancellation of background Raft verification tasks during shutdown.
Suggested change:
-leaderProbe := newAdminLeaderProbe(runtimes)
+leaderProbe := newAdminLeaderProbe(ctx, runtimes)
References
- When designing interfaces, use context.Context for managing deadlines and cancellation to ensure proper resource management and graceful shutdown.
func newAdminLeaderProbe(runtimes []*raftGroupRuntime) admin.LeaderProbe {
	if len(runtimes) == 0 {
		return nil
	}
	return admin.LeaderProbeFunc(func() bool {
		for _, rt := range runtimes {
			if rt == nil || rt.engine == nil {
				continue
			}
			if rt.engine.Status().State != raftengine.StateLeader {
				continue
			}
			ctx, cancel := context.WithTimeout(context.Background(), adminLeaderProbeTimeout)
			err := rt.engine.VerifyLeader(ctx)
			cancel()
			if err == nil {
				return true
			}
		}
		return false
	})
}
The newAdminLeaderProbe function should accept and use the server's lifecycle context. Using context.Background() for VerifyLeader means that these operations will not be cancelled if the server is shutting down. Since VerifyLeader involves network I/O and Raft processing, respecting the server's context is a best practice for graceful shutdown and resource management.
func newAdminLeaderProbe(ctx context.Context, runtimes []*raftGroupRuntime) admin.LeaderProbe {
	if len(runtimes) == 0 {
		return nil
	}
	return admin.LeaderProbeFunc(func() bool {
		for _, rt := range runtimes {
			if rt == nil || rt.engine == nil {
				continue
			}
			if rt.engine.Status().State != raftengine.StateLeader {
				continue
			}
			tCtx, cancel := context.WithTimeout(ctx, adminLeaderProbeTimeout)
			err := rt.engine.VerifyLeader(tCtx)
			cancel()
			if err == nil {
				return true
			}
		}
		return false
	})
}

References
- When designing interfaces, use context.Context for managing deadlines and cancellation to ensure proper resource management and graceful shutdown.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f50223275
	for _, rt := range runtimes {
		if rt == nil || rt.engine == nil {
			continue
		}
		if rt.engine.Status().State != raftengine.StateLeader {
Base admin leader probe on default raft group
newAdminLeaderProbe returns healthy as soon as any local runtime verifies leadership, but admin write paths still key off the coordinator’s default group (kv/sharded_coordinator.go IsLeader/VerifyLeader) and SQS admin deletes have no follower-forwarding fallback (adapter/sqs_admin.go), so in multi-group deployments a node that leads a non-default group can return 200 here while still rejecting admin writes with 503 (or forcing extra forward hops). That breaks the endpoint’s stated load-balancer contract of selecting the current admin-write leader.
Address findings from PR #689 round-1 review:

- [Codex P1] Scope newAdminLeaderProbe to the **default Raft group** via coordinate.IsLeader / coordinate.VerifyLeader instead of the prior "any local runtime" loop. Admin write paths (kv/sharded_coordinator.go, adapter/sqs_admin.go) all key off the default group, so a node leading only a non-default group could return 200 here while still rejecting admin writes -- breaking the endpoint's stated load-balancer contract. Routing through the coordinator keeps the healthz answer aligned with what the AdminForward proxy and SQS admin write path treat as authoritative. newAdminLeaderProbe(runtimes []*raftGroupRuntime) becomes newAdminLeaderProbe(coordinate kv.Coordinator). The change drops the per-runtime engine.VerifyLeader fan-out and the adminLeaderProbeTimeout constant -- coordinate.VerifyLeader is the same ReadIndex round-trip lease reads use, with the engine's own bounded deadline, so an outer 2 s timeout is no longer needed.
- [Claude / Gemini] Add the RFC 7231 §6.5.5 Allow: GET, HEAD header to every router-served 405 response (healthz, healthz/leader, asset, SPA). Pulled the body shape into a writeMethodNotAllowed helper so the four call sites stay uniform; load balancers and synthetic-monitor tools now see the same Allow header the S3 and DynamoDB /healthz/leader handlers already advertise. New TestRouter_405_AllowHeader sweeps every 405-emitting path so a future handler that bypasses the helper would regress this test.
- [Claude / Gemini] Drop the redundant rt.leader == nil check inside serveLeaderHealth. dispatch() already short-circuits the nil case to rt.notFound before this handler is ever invoked; the in-body check was dead code that misled readers about the precondition. Documented the precondition explicitly on the function.
- [Gemini] Rewrite the misleading "ordering before /admin/healthz" comment in classify(). Both healthz paths are exact-match equality checks; their relative order does not matter for correctness. The load-bearing rule is that BOTH must run before the SPA prefix branch -- now spelled out that way.

Self-review (CLAUDE.md 5 lenses):
1. Data loss -- None. Read-only handlers; no Raft proposals touched.
2. Concurrency -- The probe now serialises through the coordinator's default-group view, which is the same path lease reads use. Behaviour during leadership change is unchanged: VerifyLeader's ReadIndex returns an error, the probe returns false, the LB takes the node out of rotation.
3. Performance -- Cheaper than the prior per-runtime loop in the common single-group case (one IsLeader / one VerifyLeader vs N per-runtime Status checks plus M VerifyLeader calls).
4. Data consistency -- coordinate.VerifyLeader is the same ReadIndex round-trip the lease-read path uses; the load balancer can now key off the same liveness signal admin writes do.
5. Test coverage -- new TestRouter_405_AllowHeader. Existing six /admin/healthz/leader tests still cover happy / 503 / HEAD / POST / nil-probe / SPA-no-swallow. The redundant-nil-guard removal is covered by TestRouter_HealthzLeader_NilProbeReturns404 (still passes via dispatch()'s nil check).

Body-text parity nit: the PR description's "identical responses to S3 and DynamoDB" is technically true vs DynamoDB but not S3 (S3 omits the trailing newline; this admin path matches DynamoDB). Not worth a behaviour change -- LBs key off status codes, not body strings -- but worth noting in case a future PR aligns the three.
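For illustration, a sketch of the coordinator-scoped probe this commit describes. It assumes `kv.Coordinator` exposes `IsLeader() bool` and `VerifyLeader() error`, as the call sites quoted elsewhere in the thread suggest; it is a fragment of the main_admin.go shape, not the shipped code.

```go
// Sketch of the post-fix probe wiring; the IsLeader/VerifyLeader signatures
// on kv.Coordinator are assumed from the call sites quoted in this thread.
func newAdminLeaderProbe(coordinate kv.Coordinator) admin.LeaderProbe {
	if coordinate == nil {
		return nil // probe not wired: router answers /admin/healthz/leader with a JSON 404
	}
	return admin.LeaderProbeFunc(func() bool {
		// Followers bail out without a ReadIndex round-trip.
		if !coordinate.IsLeader() {
			return false
		}
		// Leaders confirm via the same ReadIndex round-trip lease reads use;
		// the engine applies its own bounded deadline internally.
		return coordinate.VerifyLeader() == nil
	})
}
```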
@claude review

Claude finished @bootjp's task in 3m 21s — Code Review:
| Finding | Status |
|---|---|
| RFC 7231 §6.5.5 Allow header missing on 405 | ✅ Fixed — `writeMethodNotAllowed` helper + `TestRouter_405_AllowHeader` sweep |
| Redundant `rt.leader == nil` guard in `serveLeaderHealth` | ✅ Removed — precondition now documented on the function |
| Codex P1 — "any group" leader could return 200 while admin writes still 503 | ✅ Fixed — `newAdminLeaderProbe` now takes `kv.Coordinator` (default group) |
| Misleading "ordering before /admin/healthz" comment in classify() | ✅ Rewritten accurately |
New finding — coordinate.VerifyLeader() uses context.Background() internally (informational)
File: kv/raft_engine.go:43-45, main_admin.go:177
The fix commit's commit message claims "coordinate.VerifyLeader is the same ReadIndex round-trip lease reads use, with the engine's own bounded deadline, so a stalled cluster surfaces 503 here on its own without the probe needing an outer timeout." This is worth verifying, because it's the stated justification for not passing the server lifecycle context (Gemini's suggestion).
Tracing the call chain:
main_admin.go:177 coordinate.VerifyLeader()
kv/coordinator.go:616 → verifyLeaderEngine(c.engine)
kv/raft_engine.go:43 → verifyLeaderEngineCtx(context.Background(), engine)
raft/engine.go:742 → engine.VerifyLeader(context.Background())
raft/engine.go:1021 → submitRead(context.Background(), false)
submitRead has two selects: the first (<-e.doneCh) returns early if the engine is already stopped before the read can be enqueued — good. But the second select (waiting for <-req.done) has no <-e.doneCh arm:
// kv/raft_engine.go — submitRead (second select)
select {
case <-ctx.Done():      // context.Background() — never fires
case res := <-req.done: // waits indefinitely; no e.doneCh escape
}

So if the engine stops after the read request is enqueued (req written to readCh), the probe goroutine can block until the raft engine's own shutdown logic delivers a result to req.done. Whether it does so promptly depends on the engine's teardown path — and that's not a new risk introduced here, since verifyLeaderEngine is called from other places too.
Practical impact: During graceful shutdown, a health probe that is mid-VerifyLeader when the server context is cancelled won't be interrupted. In the worst case it blocks until the raft engine stops naturally, with adminShutdownTimeout = 5 s as the outer backstop. Not a correctness issue and not introduced by this PR — it's pre-existing behavior in verifyLeaderEngine. The fix commit's reasoning is directionally correct (the engine stops → the read unblocks); it's just not instant.
Gemini's suggestion (pass the server lifecycle context to the probe) would actually eliminate this — a context.WithTimeout(ctx, 2s) wrapper, or exposing a VerifyLeaderCtx(ctx) variant on Coordinator, would let the probe respect the shutdown signal. But that's a broader change to the Coordinator interface and not required for this PR to be correct.
Verdict: Informational only. No action needed here unless you want to address shutdown latency precisely — and that belongs in a follow-up that also fixes the other verifyLeaderEngine callers consistently.
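If that follow-up were ever pursued, it could look roughly like the sketch below. Note that `VerifyLeaderCtx` is purely hypothetical (no such method exists on the Coordinator today); it is just one way to let the probe observe the server lifecycle context, per Gemini's suggestion.

```go
// Hypothetical follow-up only: VerifyLeaderCtx is NOT an existing method;
// this shows how the probe could respect the server lifecycle context.
func newAdminLeaderProbe(lifecycle context.Context, coordinate kv.Coordinator) admin.LeaderProbe {
	return admin.LeaderProbeFunc(func() bool {
		if !coordinate.IsLeader() {
			return false
		}
		// Bound each probe and bail out early when the server is shutting down.
		ctx, cancel := context.WithTimeout(lifecycle, 2*time.Second)
		defer cancel()
		return coordinate.VerifyLeaderCtx(ctx) == nil
	})
}
```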
Nit — TestRouter_HealthzRejectsPost doesn't assert Allow header
File: internal/admin/router_test.go:59–66
The existing TestRouter_HealthzRejectsPost test was not updated to assert Allow: GET, HEAD — it only checks the status code. The new TestRouter_405_AllowHeader sweep test covers this path, so there's no regression risk. But TestRouter_HealthzLeader_RejectsPost (line 125–132) does assert the header inline, creating a small inconsistency in the test style. Minor, and arguably the sweep test is the right place to assert this invariant. Not a blocker.
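If the inconsistency ever bothers anyone, a tiny shared assertion could close it. The helper below is illustrative only (it is not in the PR) and relies solely on the standard library; `h` stands in for whatever handler the existing tests already build.

```go
package admin_test

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// assertAllowGetHead is an illustrative helper (not part of the PR) that any
// 405 test, including TestRouter_HealthzRejectsPost, could reuse to assert
// the RFC 7231 Allow header alongside the status code.
func assertAllowGetHead(t *testing.T, h http.Handler, path string) {
	t.Helper()
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	if rec.Code != http.StatusMethodNotAllowed {
		t.Fatalf("%s: want 405, got %d", path, rec.Code)
	}
	if got := rec.Header().Get("Allow"); got != "GET, HEAD" {
		t.Fatalf("%s: want Allow %q, got %q", path, "GET, HEAD", got)
	}
}
```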
Positive observations
- `writeMethodNotAllowed` helper is a clean DRY abstraction. The `TestRouter_405_AllowHeader` table sweep is the right way to lock this down — a future handler that bypasses the helper will regress the test rather than silently shipping a non-compliant 405.
- Default-group scoping (`coordinate.IsLeader` / `coordinate.VerifyLeader`) is the correct fix for the Codex P1 finding. The comment at `main_admin.go:155–160` clearly explains the "why" so a future reader understands the tradeoff.
- `verifyLeaderEngine` short-circuit: followers still exit without a ReadIndex round-trip (`IsLeader()` returns false → return early). Only nodes claiming leadership pay the `VerifyLeader` cost.
- `TestRouter_405_AllowHeader` is well structured as a parallel table test; the failure message `"path %s missing RFC 7231 Allow header on 405"` gives immediate signal on regressions.
- All six behavior matrix cases are covered and the SPA regression guard is still in place.
Summary: Two commits, clean implementation. All round-1 findings resolved; one informational observation about shutdown latency in the existing verifyLeaderEngine path (pre-existing, not introduced here); one test nit. Ready to merge.
Summary
- Add a leader-only health endpoint at `/admin/healthz/leader` so an upstream load balancer (Caddy, HAProxy, ...) can route admin traffic only to the current Raft leader, skipping the follower→leader AdminForward proxy hop on writes.
- Mirrors the `/healthz/leader` endpoints already shipped on the S3 (adapter/s3.go:serveS3LeaderHealthz) and DynamoDB (adapter/dynamodb.go:serveDynamoLeaderHealthz) adapters — same semantics, same body shape, same method allowlist — so a multi-protocol load balancer probing any of the three sees identical responses.
- New `LeaderProbe` interface in `internal/admin`; `NewRouterWithLeaderProbe` constructor (`NewRouter` delegates with `nil` probe, preserving existing callers); `ServerDeps.LeaderProbe`; `newAdminLeaderProbe` in `main_admin.go` consults the local Raft group runtimes.

Behaviour matrix

| Path | Probe state | Response |
|---|---|---|
| `/admin/healthz` | (n/a) | 200 ok |
| `/admin/healthz/leader` | wired, leader | 200 ok |
| `/admin/healthz/leader` | wired, follower | 503 not leader |
| `/admin/healthz/leader` | not wired | 404 not_found (JSON) |
| `/admin/healthz/leader` POST | (any) | 405 method_not_allowed |

Self-review (CLAUDE.md 5 lenses)

1. Data loss — none. Read-only handler; no Raft proposals.
2. Concurrency — `VerifyLeader` is bounded at 2 s, called only on nodes that already claim leadership via `Status()`. No new shared state.
3. Performance — cheap path: followers short-circuit via `Status()` (no Raft round-trip). Leader path: one ReadIndex per probe per group; load balancers typically probe every few seconds, so the overhead is bounded and well below the existing background HLC lease + ReadIndex traffic.
4. Data consistency — `VerifyLeader` is the same ReadIndex round-trip the lease-read path uses; a stale-leader follower mid-silent-leadership-change cannot return 200.
5. Test coverage — six router-level tests covering both happy paths, the wired-but-not-leader case, the not-wired case, the method allowlist, and the classify ordering that prevents the SPA fallback from swallowing the path.

Test plan

- `go test -race ./internal/admin/...` — passes (1.3 s)
- `go test -race -run TestStartAdminServer .` — passes
- `golangci-lint run ./...` — clean

@claude review