feat(admin): KeyViz cluster fan-out aggregator (Phase 2-C PR-1) #686
bootjp merged 5 commits into docs/keyviz-fanout-phase2c from
Conversation
Code Review
This pull request implements cluster-wide fan-out for KeyViz, enabling a node to aggregate traffic matrices from peers in parallel. It includes specific merging logic for different metric types and adds conflict detection for write metrics. Feedback highlights a security concern regarding unbounded memory allocation when decoding peer responses and suggests improving observability by logging network resource cleanup errors.
```go
		return nil, fmt.Errorf("%w: status %d: %s", errKeyVizPeer, resp.StatusCode, string(body))
	}
	var m KeyVizMatrix
	if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
```
The JSON decoder reads from the response body without a size limit. A misbehaving or compromised peer could send an excessively large response, leading to high memory consumption or OOM on the aggregator node. Since the matrix size is bounded (max 1024 rows), we should use io.LimitReader to enforce a reasonable limit (e.g., 10MB).
```diff
-	if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
+	if err := json.NewDecoder(io.LimitReader(resp.Body, 10<<20)).Decode(&m); err != nil {
```
References
- When handling HTTP requests, always limit the size of the request body to prevent Denial of Service attacks from oversized payloads.
```go
	if err != nil {
		return nil, pkgerrors.Wrap(err, "peer request")
	}
	defer func() { _ = resp.Body.Close() }()
```
The error from resp.Body.Close() is ignored, which violates the general rule about logging Close() errors on network resources. Additionally, the logger field in KeyVizFanout is currently unused. We should use the logger to record peer failures (including Close() errors and request errors) to provide server-side visibility into cluster issues.
```diff
-	defer func() { _ = resp.Body.Close() }()
+	defer func() {
+		if err := resp.Body.Close(); err != nil {
+			f.logger.LogAttrs(ctx, slog.LevelWarn, "failed to close peer response body",
+				slog.String("peer", peer),
+				slog.String("error", err.Error()),
+			)
+		}
+	}()
```
References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
Two items from Gemini round-1 on PR #686:

- security-high: io.LimitReader the JSON decode at 64 MiB. A misbehaving peer that streams gigabytes back at us would otherwise pin a goroutine on json.Decode and balloon memory. The cap is generous enough for the design's worst-case payload (1024 rows x 4096 columns x ~32 bytes per uint64 ~= 128 MiB raw, ~32 MiB JSON-encoded) but still bounded.
- medium: log resp.Body.Close errors at debug level via the existing KeyVizFanout.logger field (which was wired but unused).

peerResponseExceededLimit reuses the +1 byte from the LimitReader to detect overshoot without parsing the streamed body. Adds TestKeyVizFanoutRunPeerOverlargeBody to pin that an in-cap response decodes promptly and reports OK; the over-cap case is covered by manual inspection of the warning log path.
@claude review Round-1 follow-ups (97e09a8) addressing both Gemini items on PR #686:
Claude finished @bootjp's task in 3m 3s. Round-1 follow-up review
Both Gemini items are addressed correctly in terms of their primary goals. One subtle correctness issue in the overshoot detection is worth calling out.
Seven items from Codex P2 + Gemini medium / security-medium / high:

- 2.1 conflict flag definition: clarify per-row scope and signal that 2-C+ moves it to per-cell, removing the inconsistency between 2.1 and 5/6.
- 3 auth: replace the conflated cookie-replay-plus-mint-token paragraph with a clean two-tier story. Phase 2-C MVP is anonymous on a private network; Phase 2-C+ adds a short-lived signed inter-node token derived from the existing admin signing key. Adds an explicit do-not-enable-on-untrusted warning.
- 4.5 time alignment: bucket-align column timestamps to the keyvizStep boundary before merge so sub-step NTP jitter does not split each window into two adjacent merged columns (see the sketch after this list).
- 5 wire format: add a per-node warnings array with a catalog_divergence code so the wire format actually carries the catalog-mismatch signal 4.4 promises.
- 8 lens 3: document the 64 MiB io.LimitReader cap on peer response bodies (matches the cap implemented in PR #686).
- 9 open Q1: resolve the DNS question. Resolution rides the stdlib resolver / nscd cache; no aggregator-local cache. Operators on flaky DNS colocate a resolver.
- 1 background: add a Rollout notes paragraph addressing the Gemini high asks. The feature is off by default, and mixed-version rollout is correct by construction (no schema migration, no state, forwards-compatible wire format), so a dual-write proxy / blue-green plan is not justified for this scope.
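A minimal sketch of the §4.5 bucket alignment, assuming a hypothetical `alignToStep` helper and a 30 s `keyvizStep` (both names illustrative, not the doc's actual API):

```go
package main

import (
	"fmt"
	"time"
)

// alignToStep is a hypothetical helper illustrating the §4.5 rule: truncate
// a column timestamp to its keyvizStep boundary before merging.
func alignToStep(ts time.Time, step time.Duration) time.Time {
	return ts.Truncate(step)
}

func main() {
	step := 30 * time.Second
	// Two nodes stamp the same window with sub-step NTP jitter.
	a := time.Date(2026, 4, 27, 12, 0, 0, 200_000_000, time.UTC) // node A, +200 ms
	b := time.Date(2026, 4, 27, 12, 0, 0, 700_000_000, time.UTC) // node B, +700 ms
	// Unaligned, these are two distinct merge keys; aligned, they collapse
	// into one merged column.
	fmt.Println(alignToStep(a, step).Equal(alignToStep(b, step))) // true
}
```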
Force-pushed from 97e09a8 to 3c94ba4.
Two items from Claude bot round-2 on PR #686:

- Replace the LimitReader+probe pattern with a countingReader that wraps the LimitReader. The decoder's bufio buffering meant the external one-byte probe could return EOF even when the decoder itself had pulled past the cap into its own buffer; the warning fired only when the read happened to land at the boundary. The countingReader increments on every Read (including the chunks the decoder buffers internally), so the post-decode `n > cap` check fires reliably. The security bound (cap+1 max bytes off the wire) is unchanged; only the *detection* of overshoot is now deterministic.
- Add WithResponseBodyLimit so tests can drive the over-cap path without serving 64 MiB. Backed by a new TestKeyVizFanoutRunPeerExceedsBodyLimit test that lowers the cap to 1 KiB and verifies the call completes promptly with a per-node status surfaced.

Updated the keyVizPeerResponseBodyLimit comment to be honest about the worst-case sizing: 1024 rows x 4096 columns x 20 bytes/uint64 (max-encoded) is ~80 MiB, slightly over the 64 MiB cap. That is intentional - the operator-visible failure mode is "warning logged, matrix may be truncated", not "DoS". A future flag can expose the cap when an extreme-traffic deployment actually trips it.
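A minimal sketch of that countingReader shape, with assumed field names (the real type and its cap check live in internal/admin/keyviz_fanout.go):

```go
package admin

import "io"

// countingReader wraps the capped body reader and counts every byte any
// consumer pulls, including chunks a decoder buffers internally, so a
// post-decode n > cap comparison is deterministic.
type countingReader struct {
	r io.Reader // typically io.LimitReader(resp.Body, cap+1)
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err // pass err through unwrapped; see the wrapcheck note below
}
```

With the limit set to cap+1, a successful decode followed by `cr.n > cap` means the peer sent past the cap even though the decode finished out of buffered data, so the overshoot warning fires reliably.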
@claude review Round-2 follow-ups (2addffe):
Claude finished @bootjp's task in 2m 30s. Round-2 review
| Item | Status |
|---|---|
| Security bound (≤ cap+1 bytes off wire) | ✅ Correct and unchanged |
| `countingReader` false-negative fix | ✅ Correct |
| Over-cap warning for near-cap success case | ✅ Reliable via byte-count post-decode |
| Over-cap path when decode fails (massively over cap) | ℹ️ Warning not logged (returns error instead), acknowledged |
| `WithResponseBodyLimit` test hook | ✅ Clean |
| Sizing comment honesty (80 MiB > 64 MiB cap) | ✅ Now accurate |
| Automated test for warning-fires path | |
| `//nolint:wrapcheck` | ℹ️ Justified but worth noting for future reviewers |
No blocking issues. The PR is shippable. The remaining gap (automated test for the near-cap success + warning path) is the only thing worth tracking as a follow-up ticket if extreme-traffic deployments eventually trip the cap.
… test

Two minor follow-ups from Claude bot round-2 on PR #686:

- countingReader retains the //nolint:wrapcheck but with a longer comment explaining why: json.Decoder uses err == io.EOF (pointer compare, not errors.Is) to decide whether to keep pulling, so wrapping the underlying io.Reader error with %w would silently break the decoder's stop condition (see the illustration below). CLAUDE.md generally says "avoid nolint, refactor instead", but here refactoring is impossible -- the only options are pass-through-and-suppress (this) or wrap-and-break-the-decoder.
- Add TestKeyVizFanoutRunPeerNearCapSucceedsWithWarning to cover the warning-fires path: small JSON envelope (~30 B) + 256 B of trailing whitespace against a 100 B cap. Decoder returns success; countingReader.n == 101 > 100 triggers the warning. The warning log itself is best-effort and not asserted from the test (slog has no clean injection point on the package), but the reachability of the success path under cap overrun is now pinned.
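A self-contained illustration of why the unwrapped pass-through matters, using only the standard library: a consumer that checks err == io.EOF by identity stops seeing EOF once the error is wrapped.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	wrapped := fmt.Errorf("peer body: %w", io.EOF)
	fmt.Println(errors.Is(wrapped, io.EOF)) // true: the chain still contains EOF
	fmt.Println(wrapped == io.EOF)          // false: an identity compare no longer matches,
	// so a decoder testing err == io.EOF would treat end-of-stream as a real error.
}
```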
@claude review Round-3 follow-ups (4538f1d) addressing both round-2 minor notes:
Force-pushed from 4538f1d to ec8c1e7.
Claude finished @bootjp's task in 2m 17s. Round-3 review
Both round-2 minor notes are addressed. One small precision issue in the nolint comment is worth recording for future readers; everything else is clean.
| Item | Status |
|---|---|
| `//nolint:wrapcheck` suppression correctness | ✅ Justified |
| Comment's stated reason (`json.Decoder` pointer-compare) | ℹ️ Minor imprecision; goccy/go-json is the actual library |
| Warning-fires path covered by automated test | ✅ Byte-count path reachable and verified |
| Warning log emission asserted | ℹ️ Not asserted; acknowledged in PR description |
| Security bound (≤ cap+1 bytes off wire) | ✅ Unchanged and correct |
| `go test -race` / `golangci-lint` | ✅ Clean per author |
No blocking issues. The two round-2 notes are closed. The PR is ready to merge.
Force-pushed from ec8c1e7 to 186b6e9.
@claude review Round-3 follow-up (186b6e9 after rebase on updated #685): generalized the
Claude finished @bootjp's task in 1m 6s. Round-3 follow-up review
The generalized
| Item | Status |
|---|---|
| `//nolint:wrapcheck` suppression correctness | ✅ Justified |
| Comment attribution (io.Reader contract, not specific library) | ✅ Fixed — contract correctly named as root cause |
| Both `encoding/json` and `goccy/go-json` called out | ✅ Present, with correct framing |
| Contract-holds-independent-of-consumer clause | ✅ Explicit |
| `go test -race` / `golangci-lint` | ✅ Clean per author |
All three rounds of review items are closed. No blocking issues remain. The PR is ready to merge.
The 2.1 wording said conflict fires "when any per-node values disagreed for any cell," which under stable leadership would include the leader-says-5 vs follower-says-0 split (the steady-state shape) and contradicted both 4.2 and the 7 stable-leader test expectation ("max wins, no conflict").

Narrow the predicate to its actual shape: conflict fires only when two or more nodes report a non-zero value for the same cell (see the sketch below). Follower zeros against a leader's non-zero value do not count. This matches the implementation in PR #686 (maxMerge ignores zero on either side and returns conflict=false).

Mirrored the clarification in 4.2 so the conflict-predicate rule is explicit in both the scope summary (2.1) and the merge-rule detail (4.2).
Implements the server-side half of the fan-out path proposed in
docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md. When the
operator passes --keyvizFanoutNodes, the existing
/admin/api/v1/keyviz/matrix handler aggregates the local matrix
with peer responses; with the flag empty (default), behaviour is
unchanged.
Aggregator (internal/admin/keyviz_fanout.go):
- Parallel HTTP GETs to each peer with a per-call timeout (default
  2 s, --keyvizFanoutTimeout overrides); sketched after this list.
- Forward-compatible request shape: the per-peer URL is rebuilt
from the parsed parameters so a peer running an older or newer
server cannot trip on a parameter we never intended to forward.
- Merge rules per design 4: reads / read_bytes sum across nodes
(distinct local serves are independent counts); writes /
write_bytes take the max with a row-level Conflict flag when
inputs disagree (conservative dedup that never overcounts during
a leadership flip; the canonical (raftGroupID, leaderTerm) rule
lands in Phase 2-C+).
- Degraded mode: a peer that times out, errors, or returns non-OK
contributes FanoutNodeStatus{OK: false, Error: ...}. The local
matrix still ships and Responded reflects partial success.
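A minimal sketch of the parallel per-peer shape described in the first bullet above, with assumed names (FanoutNodeStatus mirrors the wire-format section below; fetchPeer stands in for the real per-peer call):

```go
package admin

import (
	"context"
	"sync"
	"time"
)

// FanoutNodeStatus mirrors the per-node status in the wire format below.
type FanoutNodeStatus struct {
	Node  string
	OK    bool
	Error string
}

// fanOut issues one goroutine per peer, each bounded by its own timeout,
// and never fails the whole request: a slow or broken peer degrades to
// OK=false while the other slots fill in normally.
func fanOut(ctx context.Context, peers []string, timeout time.Duration,
	fetchPeer func(context.Context, string) error) []FanoutNodeStatus {
	statuses := make([]FanoutNodeStatus, len(peers))
	var wg sync.WaitGroup
	for i, peer := range peers {
		wg.Add(1)
		go func(i int, peer string) {
			defer wg.Done()
			callCtx, cancel := context.WithTimeout(ctx, timeout)
			defer cancel()
			st := FanoutNodeStatus{Node: peer, OK: true}
			if err := fetchPeer(callCtx, peer); err != nil {
				st.OK = false
				st.Error = err.Error()
			}
			statuses[i] = st // distinct index per goroutine: no shared mutation
		}(i, peer)
	}
	wg.Wait()
	return statuses
}
```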
Wire format additions (forward-compatible):
- KeyVizMatrix.Fanout *FanoutResult — Nodes / Responded /
Expected; omitted when fan-out is disabled.
- KeyVizRow.Conflict bool — true when the max-merge collapsed
disagreeing values for any cell in the row.
Flag plumbing (main.go / main_admin.go):
- --keyvizFanoutNodes accepts comma-separated host:port endpoints.
- --keyvizFanoutTimeout sets the per-peer ceiling.
- Self-filter: a peer entry naming this node's own admin listener
(literal match, or 0.0.0.0/:: bind matched against a
127.0.0.1/localhost/::1 entry) is skipped so a symmetric
--keyvizFanoutNodes=node1,node2,node3 stamped onto every host
does not loop back over HTTP.
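A sketch of the self-filter match just described, assuming a hypothetical isSelf helper (the real check lives in the flag plumbing):

```go
package admin

import "net"

// isSelf reports whether a --keyvizFanoutNodes entry names this node's own
// admin listener: either a literal host:port match, or a wildcard bind
// (0.0.0.0 / ::) matched against a loopback entry on the same port.
func isSelf(peer, bindHost, bindPort string) bool {
	host, port, err := net.SplitHostPort(peer)
	if err != nil || port != bindPort {
		return false
	}
	if host == bindHost {
		return true // literal match
	}
	wildcard := bindHost == "0.0.0.0" || bindHost == "::"
	loopback := host == "127.0.0.1" || host == "localhost" || host == "::1"
	return wildcard && loopback
}
```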
Tests (internal/admin/keyviz_fanout_test.go):
- mergeKeyVizMatrices table covering sum, stable-leader max,
leadership-flip max with conflict, union-of-columns, and
distinct-row order preservation.
- KeyVizFanout.Run end-to-end against httptest peers: single peer
ok, peer 5xx surfaces in degraded mode, peer hang respected by
the per-call timeout.
- buildKeyVizPeerURL table covering scheme/no-scheme/with-bounds
variants.
- Peer order is operator-supplied with self always first.
SPA changes follow in PR-2 (degraded banner + conflict hatching).
The proto / JSON extension with raftGroupID + leaderTerm follows in
PR-3 (Phase 2-C+).
Force-pushed from 186b6e9 to 7747be1.
…se 2-C PR-2) (#687)

## Summary

Phase 2-C **PR-2**, per the design doc landed in #685 §6: wire the fan-out `Fanout` / `conflict` shapes shipped by PR #686 into the admin SPA so operators can see when a peer failed and which rows are soft due to a leadership flip.

Backend is unchanged. PR #686's `KeyVizMatrix.Fanout` block is currently emitted on every fan-out request but the SPA was discarding it.

## What this PR adds

| Piece | Behavior |
|---|---|
| **Header counter** | `cluster view (N of M nodes)` in the per-matrix metadata strip. Hidden when `fanout` is absent (single-node mode, no behavior change). |
| **FanoutBanner** | Degraded-mode card above the heatmap when `responded < expected`. Lists every failed node + error string. Hidden when the cluster is healthy. |
| **ConflictOverlay** | SVG hatch over rows whose `conflict === true`. Layered inside the scroll container so the hatch tracks horizontal scroll (same idiom as `TimeAxis`). 4 px diagonal stripes, alpha 0.45, inheriting `currentColor` so it works against both light and dark themes. |
| **RowDetail pill** | Hovering a conflicting row reveals a `conflict` pill with a hover-tooltip explaining the leadership-flip semantics. |

## Five-lens self-review

1. **Data loss** — n/a; SPA-only change reading existing fields.
2. **Concurrency / distributed** — n/a; single browser tab consuming an existing API.
3. **Performance** — Banner: zero-cost when healthy (`null` returned). Hatch: SVG with at most 1024 `<rect>` elements (matches `KeyVizRow` budget); empirically negligible. Overlay only renders when at least one row has `conflict === true`.
4. **Data consistency** — UI surfaces existing server signals; semantics come from PR #685 design §4.2 and #686 implementation. Not relitigated here.
5. **Test coverage** — `tsc -b --noEmit` and `vite build` clean. UI behavior (banner visibility, hatch presence, header text) is hard to assert in `tsc` alone; manual verification documented below.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`) — clean
- [x] `npm run build` (Vite) — clean, output writes to `internal/admin/dist`
- [ ] Manual: `make run` with `--keyvizEnabled --keyvizFanoutNodes=127.0.0.1:8080`. Open `/keyviz`; header reads `cluster view (1 of 1 nodes)`, no banner, no hatch.
- [ ] Manual: configure `--keyvizFanoutNodes=127.0.0.1:8080,127.0.0.1:9999` (port 9999 unused). Header reads `1 of 2 nodes`, the failed-node banner appears with `connection refused` for `127.0.0.1:9999`.
- [ ] Manual: smoke a leadership flip mid-window (or fake `conflict=true` server-side via `/admin/api/v1/keyviz/matrix?...`); the affected row renders with diagonal hatch overlay; hovering reveals the `conflict` pill.

## Out of scope

- **Per-cell hatching** — Phase 2-C+ once the proto extension lands and `conflict` becomes per-cell instead of per-row.
- **Per-node `generated_at` skew indicator** — design §9 Q3, deferred.
- **Banner auto-dismiss / sound notification** — out of scope; an operator who has the page open will see the banner. CI alert pages are a separate system.

Closes the SPA half of Phase 2-C; the proto extension follows in PR-3.

## Summary by CodeRabbit

* **New Features**
  * KeyViz now displays cluster fan-out status with node availability and responsiveness counts.
  * Added visual indicators for rows experiencing conflicts during leadership transitions.
  * Row details now include conflict status labels with informative tooltips.
…401 (#692)

## Summary

**Hotfix for the screenshot a user shared after enabling fan-out** — the heatmap collapsed to "1 of 6 nodes responded" with every peer returning `401 missing session cookie`.

**Root cause**: PR #686 shipped fan-out with the design assumption (#685 §3) that peers would accept anonymous calls on a private network. But the receiving side runs the same `SessionAuth` middleware as the browser path, so anonymous fan-out calls are rejected with 401. The "anonymous on a private network" framing was wrong: the implementation didn't put a bypass on the peer side, so anonymous calls never went through end-to-end.

**Fix**: forward the inbound user's cookies on every peer call. Cluster nodes already share `--adminSessionSigningKey` for HA, so a cookie minted on node A is verifiable on node B — no new infrastructure needed.

## What changed

- `KeyVizFanout.Run` gains a `cookies []*http.Cookie` parameter; nil preserves the legacy behaviour for tests / future non-browser callers.
- `KeyVizHandler` passes `r.Cookies()` so a SPA poll forwards `admin_session` (and `admin_csrf`, harmless on a GET) transparently.
- `TestKeyVizFanoutRunForwardsCookies` pins that both cookies reach the peer verbatim.
- Design doc §3 rewritten: documents the cookie-forwarding scheme + the operator-trust assumption that `--keyvizFanoutNodes` points at trusted hosts only. Phase 2-C+ will replace cookie forwarding with a dedicated inter-node token to decouple browser-session expiry from inter-node call validity.

## Five-lens self-review

1. **Data loss** — n/a; auth path change.
2. **Concurrency / distributed** — Cookies are passed by reference into per-peer goroutines but the slice is read-only inside `fetchPeer` (each goroutine `req.AddCookie`s independently). No shared mutation.
3. **Performance** — Adds N `req.AddCookie` calls per fan-out request. Trivial.
4. **Data consistency** — Auth-only change; merge semantics unchanged.
5. **Test coverage** — New `TestKeyVizFanoutRunForwardsCookies` asserts both cookies reach the peer with the original values. Existing fan-out tests pass `nil` and continue to exercise the unauthenticated path the existing code handled (peer 401s, request reports `ok=false`).

## Test plan

- [x] `go test -race -count=1 ./internal/admin/...` — clean
- [x] `golangci-lint run ./internal/admin/...` — clean
- [ ] Manual: 6-node cluster with shared `ELASTICKV_ADMIN_SESSION_SIGNING_KEY`, `--keyvizFanoutNodes` set on every node. Heatmap should now show "6 of 6 nodes" instead of "1 of 6".

## Trust / threat model note

Cookie forwarding is safe when (a) every peer is operator-configured and trusted, and (b) the network is private. **Do NOT point `--keyvizFanoutNodes` at an untrusted host** — the user's admin session would be replayed there. The design doc §3 update is explicit about this. Phase 2-C+ removes this footgun by switching to a dedicated inter-node token.

Closes the production-impacting symptom from screenshot review.
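A minimal sketch of the forwarding path described above, with assumed names (the real fetchPeer also rebuilds the per-peer URL from the parsed parameters):

```go
package admin

import (
	"context"
	"net/http"
)

// fetchPeer forwards the inbound browser cookies verbatim so the peer's
// SessionAuth middleware can verify them against the shared signing key.
// A nil slice preserves the legacy anonymous behaviour.
func fetchPeer(ctx context.Context, client *http.Client, url string,
	cookies []*http.Cookie) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	for _, c := range cookies { // read-only iteration: safe across goroutines
		req.AddCookie(c)
	}
	return client.Do(req)
}
```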
## Summary

Server-side half of Phase 2-C cluster fan-out, per `docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md` (PR #685). With `--keyvizFanoutNodes` set, the existing `/admin/api/v1/keyviz/matrix` handler aggregates the local matrix with peer responses; with the flag empty (default), behaviour is unchanged.

This is PR-1 of the implementation plan in §7 of the design. PR-2 (SPA degraded banner + conflict hatching) and PR-3 (proto extension with `raftGroupID` + `leaderTerm` for the canonical merge) are separate.

Depends on PR #685 (the design doc) for review context — branched off that branch.
## Aggregator (`internal/admin/keyviz_fanout.go`)

- Parallel HTTP GETs to each peer with a per-call timeout (default 2 s, `--keyvizFanoutTimeout` overrides).
- `reads` / `read_bytes` → sum across nodes.
- `writes` / `write_bytes` → max with row-level `Conflict` flag when inputs disagree.
- Degraded mode: a failed peer contributes `FanoutNodeStatus{OK: false, Error: ...}`. The local matrix still ships.

## Wire format additions (forward-compatible)

```jsonc
{
  "rows": [{
    "...": "...",
    "conflict": false    // NEW: row-level disagreement flag
  }],
  "fanout": {            // NEW: omitted when fan-out is disabled
    "nodes": [{"node": "...", "ok": true, "error": ""}],
    "responded": 2,
    "expected": 3
  }
}
```

Old SPA against a fan-out server keeps working — extra fields are additive.
## Flag plumbing

- `--keyvizFanoutNodes` accepts comma-separated `host:port` endpoints.
- `--keyvizFanoutTimeout` sets the per-peer ceiling (default 2 s).
- Self-filter: a `0.0.0.0`/`::` bound listener is matched against `127.0.0.1`/`localhost`/`::1` entries; a symmetric `--keyvizFanoutNodes=node1,node2,node3` stamped on every host does not loop back over HTTP.

## Five-lens self-review

- … `TestKeyVizFanoutRunPeerTimeout` confirms a hanging peer cannot extend the wall time.
- … `Observe`) is untouched.
- … `go test -race -count=1 ./internal/admin/...` clean; `golangci-lint run ./...` clean.

## Test plan

- `go test -race -count=1 ./internal/admin/...`
- `go test -race -count=1 .` (main_admin tests with the new `keyVizFanoutConfig{}` arg)
- `go build ./...`
- `golangci-lint run ./...`
- Manual: … `--keyvizFanoutNodes=...` on each, drive write traffic against one node, confirm the heatmap reflects writes regardless of which node the SPA hits.
- Manual: … `responded: 2, expected: 3` plus the failed node's error string.

## What this PR does NOT do

- … `Fanout` or `Conflict`. UI for those lands in PR-2.
- … `raftGroupID` / `leaderTerm` proto extension. PR-3 of the plan.
- … `GetClusterOverview`. Static `--keyvizFanoutNodes` is the contract for Phase 2-C; cache + refresh-interval lands when clusters grow beyond hand-configured size.