feat(admin): KeyViz cluster fan-out aggregator (Phase 2-C PR-1) #686

Merged
bootjp merged 5 commits into docs/keyviz-fanout-phase2c from feat/keyviz-fanout-server on Apr 27, 2026
Conversation

@bootjp (Owner) commented Apr 27, 2026

Summary

Server-side half of Phase 2-C cluster fan-out, per docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md (PR #685). With --keyvizFanoutNodes set, the existing /admin/api/v1/keyviz/matrix handler aggregates the local matrix with peer responses; with the flag empty (default), behaviour is unchanged.

This is PR-1 of the implementation plan in §7 of the design. PR-2 (SPA degraded banner + conflict hatching) and PR-3 (proto extension with raftGroupID + leaderTerm for the canonical merge) are separate.

Depends on PR #685 (the design doc) for review context — branched off that branch.

Aggregator (internal/admin/keyviz_fanout.go)

  • Parallel HTTP GETs to each peer with a per-call timeout (default 2 s, --keyvizFanoutTimeout overrides).
  • Forward-compatible request shape: the per-peer URL is rebuilt from the parsed parameters, so a peer running an older or newer server cannot trip on an unrecognised parameter.
  • Merge rules per design §4 (see the sketch after this list):
    • reads / read_bytes → sum across nodes
    • writes / write_bytes → max with row-level Conflict flag when inputs disagree
  • Degraded mode: a peer that times out, 5xx's, or fails to decode contributes FanoutNodeStatus{OK: false, Error: ...}. The local matrix still ships.
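
As a rough sketch of those merge rules (illustrative only; the type and field names below are not the actual KeyVizMatrix structs in internal/admin):

type cell struct {
    Reads, ReadBytes   uint64
    Writes, WriteBytes uint64
}

// mergeCell applies the §4 rules to one cell from two nodes: reads and
// read_bytes sum, writes and write_bytes take the max, and the conflict
// flag is raised only when both sides report a non-zero write value that
// disagrees (a follower's zero against the leader's count is not a conflict).
func mergeCell(a, b cell) (merged cell, conflict bool) {
    merged = cell{
        Reads:      a.Reads + b.Reads,
        ReadBytes:  a.ReadBytes + b.ReadBytes,
        Writes:     maxU64(a.Writes, b.Writes),
        WriteBytes: maxU64(a.WriteBytes, b.WriteBytes),
    }
    conflict = a.Writes > 0 && b.Writes > 0 && a.Writes != b.Writes
    return merged, conflict
}

func maxU64(a, b uint64) uint64 {
    if a > b {
        return a
    }
    return b
}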

Wire format additions (forward-compatible)

{
  "rows": [{
    "...": "...",
    "conflict": false        // NEW: row-level disagreement flag
  }],
  "fanout": {                // NEW: omitted when fan-out is disabled
    "nodes": [{"node":"...", "ok":true, "error":""}],
    "responded": 2,
    "expected": 3
  }
}

Old SPA against fan-out server keeps working — extra fields are additive.
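
On the Go side, the additive fields map roughly onto shapes like the following (a hedged sketch; only KeyVizMatrix.Fanout, KeyVizRow.Conflict, FanoutNodeStatus{OK, Error}, and the JSON keys above are confirmed by this PR, the rest of the field layout is assumed):

type FanoutNodeStatus struct {
    Node  string `json:"node"`
    OK    bool   `json:"ok"`
    Error string `json:"error"`
}

type FanoutResult struct {
    Nodes     []FanoutNodeStatus `json:"nodes"`
    Responded int                `json:"responded"`
    Expected  int                `json:"expected"`
}

type KeyVizRow struct {
    // ...existing row fields...
    Conflict bool `json:"conflict"` // row-level disagreement flag
}

type KeyVizMatrix struct {
    Rows []KeyVizRow `json:"rows"`
    // nil (and omitted from the JSON) when --keyvizFanoutNodes is unset,
    // so a pre-fan-out SPA never sees the block.
    Fanout *FanoutResult `json:"fanout,omitempty"`
}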

Flag plumbing

  • --keyvizFanoutNodes accepts comma-separated host:port endpoints.
  • --keyvizFanoutTimeout sets the per-peer ceiling (default 2 s).
  • Self-filter handles the bind-vs-advertise asymmetry: a listener bound to 0.0.0.0 or :: is matched against 127.0.0.1/localhost/::1 entries, so a symmetric --keyvizFanoutNodes=node1,node2,node3 stamped onto every host does not loop back over HTTP (see the sketch after this list).
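
A minimal sketch of that self-filter check (helper and parameter names here are hypothetical; the real logic lives in the main.go / main_admin.go flag plumbing):

import "net"

// isSelf reports whether a --keyvizFanoutNodes entry names this node's own
// admin listener: either a literal host:port match, or a loopback entry
// matched against a wildcard (0.0.0.0 / ::) bind.
func isSelf(peer, bindHost, bindPort string) bool {
    host, port, err := net.SplitHostPort(peer)
    if err != nil || port != bindPort {
        return false
    }
    if host == bindHost {
        return true // literal match
    }
    wildcardBind := bindHost == "" || bindHost == "0.0.0.0" || bindHost == "::"
    loopbackPeer := host == "127.0.0.1" || host == "localhost" || host == "::1"
    return wildcardBind && loopbackPeer
}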

Five-lens self-review

  1. Data loss — Read-only against the existing sampler; conservative max-merge can under-count during a leadership flip but never over-counts. Conflict flag tells operators when the cell is soft.
  2. Concurrency / distributed — Per-peer calls run in parallel with a context deadline; the aggregator itself waits for all responses synchronously then merges. Test TestKeyVizFanoutRunPeerTimeout confirms a hanging peer cannot extend the wall time.
  3. Performance — Aggregator cost is O(N × M × C); at the 1024-row × 60-column × 3-node ceiling that's ~180k cells per request — well under the SPA's existing render budget. Hot path (Observe) is untouched.
  4. Data consistency — Merge rules are conservative under leadership transitions; reads are exact both in steady state and during transitions. The §9.1 canonical rule is preserved as a Phase 2-C+ contract.
  5. Test coverage — Table-driven over the four canonical scenarios in the design + URL-builder variants + per-node order. go test -race -count=1 ./internal/admin/... clean; golangci-lint run ./... clean.

Test plan

  • go test -race -count=1 ./internal/admin/...
  • go test -race -count=1 . (main_admin tests with the new keyVizFanoutConfig{} arg)
  • go build ./...
  • golangci-lint run ./...
  • Manual: 3-node cluster with --keyvizFanoutNodes=... on each, drive write traffic against one node, confirm the heatmap reflects writes regardless of which node the SPA hits
  • Manual: kill one node, confirm the response carries responded: 2, expected: 3 plus the failed node's error string

What this PR does NOT do

  • SPA changes. Wire-format additions are forwards-compatible; the SPA renders correctly without knowing about Fanout or Conflict. UI for those lands in PR-2.
  • raftGroupID / leaderTerm proto extension. PR-3 of the plan.
  • Membership discovery via GetClusterOverview. Static --keyvizFanoutNodes is the contract for Phase 2-C; cache + refresh-interval lands when clusters grow beyond hand-configured size.
  • Inter-node auth. The fan-out path is currently anonymous; intra-cluster admin traffic is assumed to ride a private network. Auth follows in a follow-up before this hits production.

coderabbitai Bot commented Apr 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request implements cluster-wide fan-out for KeyViz, enabling a node to aggregate traffic matrices from peers in parallel. It includes specific merging logic for different metric types and adds conflict detection for write metrics. Feedback highlights a security concern regarding unbounded memory allocation when decoding peer responses and suggests improving observability by logging network resource cleanup errors.

Comment thread internal/admin/keyviz_fanout.go Outdated
return nil, fmt.Errorf("%w: status %d: %s", errKeyVizPeer, resp.StatusCode, string(body))
}
var m KeyVizMatrix
if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {

security-high

The JSON decoder reads from the response body without a size limit. A misbehaving or compromised peer could send an excessively large response, leading to high memory consumption or OOM on the aggregator node. Since the matrix size is bounded (max 1024 rows), we should use io.LimitReader to enforce a reasonable limit (e.g., 10MB).

Suggested change
if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
if err := json.NewDecoder(io.LimitReader(resp.Body, 10<<20)).Decode(&m); err != nil {
References
  1. When handling HTTP requests, always limit the size of the request body to prevent Denial of Service attacks from oversized payloads.

Comment thread internal/admin/keyviz_fanout.go Outdated
if err != nil {
return nil, pkgerrors.Wrap(err, "peer request")
}
defer func() { _ = resp.Body.Close() }()

medium

The error from resp.Body.Close() is ignored, which violates the general rule about logging Close() errors on network resources. Additionally, the logger field in KeyVizFanout is currently unused. We should use the logger to record peer failures (including Close() errors and request errors) to provide server-side visibility into cluster issues.

Suggested change
defer func() { _ = resp.Body.Close() }()
defer func() {
if err := resp.Body.Close(); err != nil {
f.logger.LogAttrs(ctx, slog.LevelWarn, "failed to close peer response body",
slog.String("peer", peer),
slog.String("error", err.Error()),
)
}
}()
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

bootjp added a commit that referenced this pull request Apr 27, 2026
Two items from Gemini round-1 on PR #686:

- security-high: io.LimitReader the JSON decode at 64 MiB. A
  misbehaving peer that streams gigabytes back at us would otherwise
  pin a goroutine on json.Decode and balloon memory. The cap is
  generous enough for the design's worst-case payload (1024 rows x
  4096 columns x ~32 bytes per uint64 ~= 128 MiB raw, ~32 MiB
  JSON-encoded) but still bounded.
- medium: log resp.Body.Close errors at debug level via the existing
  KeyVizFanout.logger field (which was wired but unused).
  peerResponseExceededLimit reuses the +1 byte from the LimitReader
  to detect overshoot without parsing the streamed body.

Adds TestKeyVizFanoutRunPeerOverlargeBody to pin that an in-cap
response decodes promptly and reports OK; the over-cap case is
covered by manual inspection of the warning log path.
@bootjp (Owner, Author) commented Apr 27, 2026

@claude review

Round-1 follow-ups (97e09a8) addressing both Gemini items on PR #686:

  • security-high (line 274) — JSON decoder now reads through io.LimitReader at 64 MiB. Sizing per the design's worst case (1024 rows × 4096 columns × ~32 B per uint64 ≈ 128 MiB raw / ~32 MiB JSON-encoded). Over-cap responses log a warning via the existing KeyVizFanout.logger rather than silently truncating. New TestKeyVizFanoutRunPeerOverlargeBody pins the in-cap path.
  • medium (line 266) — resp.Body.Close() errors now log at debug level on the same logger; the previously-unused logger field is wired into both the close path and the size-cap warning.

go test -race -count=1 ./internal/admin/... clean. golangci-lint run ./internal/admin/... clean.

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 3m 3s.


Round-1 follow-up review

  • Gather context
  • Read internal/admin/keyviz_fanout.go
  • Read internal/admin/keyviz_fanout_test.go
  • Review LimitReader + overshoot detection
  • Review Close() logging
  • Review test coverage gap

Both Gemini items are addressed correctly in terms of their primary goals. One subtle correctness issue in the overshoot detection is worth calling out.

io.LimitReader cap — security bound is solid

internal/admin/keyviz_fanout.go lines ~295–310

The LimitReader at keyVizPeerResponseBodyLimit + 1 (64 MiB + 1 byte) correctly caps how many bytes a malicious peer can push through the JSON decoder. The security property holds unconditionally.

peerResponseExceededLimit — subtle false-negative due to json.Decoder buffering

internal/admin/keyviz_fanout.go (peerResponseExceededLimit + caller)

json.NewDecoder wraps the reader in a bufio.Reader and reads ahead in chunks. After Decode returns successfully, the decoder's internal buffer may have already consumed bytes past the JSON boundary (and past the LimitReader's cap). When peerResponseExceededLimit(limited) then reads directly from limited, it reads from resp.Body via the LimitReader — but the LimitReader may already be at its cap (N or N+1 already consumed into the decoder's internal buffer), so Read returns (0, io.EOF) and the function returns false.

Concretely: if the body is 100 MiB of valid JSON, the decoder reads 64 MiB + 1 bytes from limited in internal chunks, the LimitReader is exhausted, the decoder stops (either with an error or with a valid partial object). If it errored, fetchPeer returns the error — fine. If it somehow decoded a valid object (e.g. the first 64 MiB contained a complete JSON value), then peerResponseExceededLimit reads from the now-exhausted limited, gets EOF, and returns false. The warning doesn't fire.

This is not a security regression — the bytes read are still bounded to 64 MiB + 1. The peerResponseExceededLimit warning is best-effort, and the commit message is transparent about this ("over-cap case is covered by manual inspection"). Still worth noting so future readers don't assume the warning is reliable.

A fully reliable approach would be a custom io.Reader that counts bytes consumed and checks after decode:

type countingReader struct {
    r io.Reader
    n int64
}
func (c *countingReader) Read(p []byte) (int, error) {
    n, err := c.r.Read(p)
    c.n += int64(n)
    return n, err
}

cr := &countingReader{r: io.LimitReader(resp.Body, keyVizPeerResponseBodyLimit+1)}
var m KeyVizMatrix
if err := json.NewDecoder(cr).Decode(&m); err != nil { ... }
if cr.n > keyVizPeerResponseBodyLimit { /* warn */ }

This accounts for bytes consumed into the decoder's internal buffer and fires reliably. Whether it's worth the extra type for a best-effort warning is a judgment call — the current approach is acceptable given the explicit acknowledgment.

resp.Body.Close() logging — correct

internal/admin/keyviz_fanout.go lines ~278–285

slog.LevelDebug is the right level for a network close error on a best-effort admin path. Using ctx from the enclosing function is safe — slog.Logger.LogAttrs only extracts context values (e.g. trace IDs) from the context; it doesn't fail or skip on a cancelled context.

newKeyVizPeerStub refactor — good cleanup

Removing the status int parameter and hardcoding http.StatusOK is the right call since non-200 tests build their handler inline. The signature is cleaner and all callers updated correctly.

Test gap: over-cap warning path is untested

internal/admin/keyviz_fanout_test.go TestKeyVizFanoutRunPeerOverlargeBody

The test name implies it covers "overlarge body" but the body it creates is only ~500 KB (64 rows × 4096 uint64 values of 0, roughly one digit plus separator per value in JSON). This is well under the 64 MiB cap. The test only pins the in-cap path, not the warning path. The commit message acknowledges this explicitly.

Per CLAUDE.md: "New/changed branches must have unit tests." The peerResponseExceededLimit function and the warning log in fetchPeer are new code with no automated coverage. A future PR-3 (or a follow-up to this one) should add a test that serves a body > 64 MiB (or temporarily lowers keyVizPeerResponseBodyLimit to a test value) and asserts:

  • The decode completes in bounded time
  • The returned matrix is either an error or a truncated-but-valid result
  • The log output contains the warning

Sizing comment inconsistency (non-blocking)

The constant comment says 1024 rows × 4096 columns × ~32 bytes per uint64 ≈ 128 MiB raw, ~32 MiB JSON-encoded. The 32-byte-per-uint64 figure is for binary encoding; JSON encoding of uint64(0) is 1 byte ("0"), and uint64(18446744073709551615) is 20 bytes. At realistic heatmap densities (mostly-zero traffic matrix), the JSON is much closer to 1–3 bytes/value, so 64 MiB covers the realistic worst case generously. But if a heatmap captures peak-traffic windows with large counts, the theoretical worst case (20 bytes/value × ~4.2M values ≈ 80 MiB) exceeds the cap. Worth noting for a future flag that exposes the limit.


Summary: Security bound is real and correct. The peerResponseExceededLimit detection has a documented false-negative edge case due to decoder buffering (not a safety issue, just a warning reliability issue). Close-error logging is clean. The main outstanding item is an automated test for the actual over-cap path. No blocking issues — this is shippable as-is with the understanding that the over-cap test and the sizing note are tracked for follow-up.

bootjp added a commit that referenced this pull request Apr 27, 2026
Seven items from Codex P2 + Gemini medium / security-medium / high:

- 2.1 conflict flag definition: clarify per-row scope and signal
  that 2-C+ moves it to per-cell, removing the inconsistency
  between 2.1 and 5/6.
- 3 auth: replace the conflated cookie-replay-plus-mint-token
  paragraph with a clean two-tier story. Phase 2-C MVP is
  anonymous on a private network; Phase 2-C+ adds a short-lived
  signed inter-node token derived from the existing admin
  signing key. Adds an explicit do-not-enable-on-untrusted
  warning.
- 4.5 time alignment: bucket-align column timestamps to the
  keyvizStep boundary before merge so sub-step NTP jitter does
  not split each window into two adjacent merged columns.
- 5 wire format: add per-node warnings array with a
  catalog_divergence code so the wire form actually carries
  the catalog-mismatch signal 4.4 promises.
- 8 lens 3: document the 64 MiB io.LimitReader cap on peer
  response bodies (matches the cap implemented in PR #686).
- 9 open Q1: resolve the DNS question. Resolution rides the
  stdlib resolver / nscd cache; no aggregator-local cache.
  Operators on flaky DNS colocate a resolver.
- 1 background: add a Rollout notes paragraph addressing the
  Gemini high asks. The feature is off by default, mixed-version
  rollout is correct by construction (no schema migration, no
  state, forwards-compatible wire format), so a dual-write proxy
  / blue-green plan is not justified for this scope.
bootjp force-pushed the feat/keyviz-fanout-server branch from 97e09a8 to 3c94ba4 on April 27, 2026 09:05
bootjp added a commit that referenced this pull request Apr 27, 2026
Two items from Claude bot round-2 on PR #686:

- Replace the LimitReader+probe pattern with a countingReader that
  wraps the LimitReader. The decoder's bufio buffering meant the
  external one-byte probe could return EOF even when the decoder
  itself had pulled past the cap into its own buffer; the warning
  fired only when the read happened to land at the boundary. The
  countingReader increments on every Read (including the chunks
  the decoder buffers internally), so the post-decode `n > cap`
  check fires reliably. The security bound (cap+1 max bytes off
  the wire) is unchanged; only the *detection* of overshoot is
  now deterministic.
- Add WithResponseBodyLimit so tests can drive the over-cap path
  without serving 64 MiB. Backed by a new TestKeyVizFanoutRunPeerExceedsBodyLimit
  test that lowers the cap to 1 KiB and verifies the call completes
  promptly with a per-node status surfaced.

Updated the keyVizPeerResponseBodyLimit comment to be honest about
the worst-case sizing: 1024 rows x 4096 columns x 20 bytes/uint64
(max-encoded) is ~80 MiB, slightly over the 64 MiB cap. That is
intentional - the operator-visible failure mode is "warning logged,
matrix may be truncated", not "DoS". A future flag can expose the
cap when an extreme-traffic deployment actually trips it.
@bootjp (Owner, Author) commented Apr 27, 2026

@claude review

Round-2 follow-ups (2addffe):

  • peerResponseExceededLimit false-negative — replaced the LimitReader-plus-external-probe pattern with the countingReader you suggested. Increments on every Read (including the chunks json.NewDecoder buffers internally), so the post-decode n > cap check fires deterministically regardless of where the decoder happened to stop. Security bound unchanged (cap+1 max bytes off the wire); only the overshoot detection is now reliable.
  • Over-cap test gap — added TestKeyVizFanoutRunPeerExceedsBodyLimit driven by a new WithResponseBodyLimit option. Test lowers the cap to 1 KiB, serves a body well over that, and asserts the call completes promptly with per-node status surfaced.
  • Sizing-comment inconsistency — rewrote the keyVizPeerResponseBodyLimit comment to be honest about the worst case: 1024 × 4096 × 20 B (max-encoded uint64) ≈ 80 MiB, which exceeds the 64 MiB cap. The operator-visible failure mode is "warning logged, matrix may be truncated", not DoS. A future flag exposes the cap when an extreme-traffic deployment actually trips it.

go test -race -count=1 ./internal/admin/... clean. golangci-lint run ./internal/admin/... clean.

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 2m 30s.


Round-2 review

  • Gather context
  • Read internal/admin/keyviz_fanout.go (countingReader + limit changes)
  • Read internal/admin/keyviz_fanout_test.go (new ExceedsBodyLimit test)
  • Review countingReader implementation
  • Review WithResponseBodyLimit option
  • Review sizing comment
  • Final assessment

Round-2 follow-up review

All three items from Round-1 are addressed correctly. Below are observations on each, plus one subtle point worth being explicit about.


countingReader — false-negative is resolved correctly

internal/admin/keyviz_fanout.go:290–310

The implementation matches the pattern I suggested:

cr := &countingReader{r: io.LimitReader(resp.Body, f.responseBodyLimit()+1)}
var m KeyVizMatrix
if err := json.NewDecoder(cr).Decode(&m); err != nil {
    return nil, pkgerrors.Wrap(err, "decode peer response")
}
if cr.n > f.responseBodyLimit() {
    f.logger.LogAttrs(...)
}

The invariant holds: cr.n tracks every byte json.NewDecoder pulls from the LimitReader (including its internal bufio read-ahead), so the post-decode comparison fires reliably for the "just-barely-over-cap body that happens to be valid JSON at the boundary."

One subtlety to be aware of (not a bug — just clarifying the semantics): for a body that is massively over the cap (e.g. 80 MiB), the LimitReader returns io.EOF mid-JSON, the decoder returns an error, and we return nil, pkgerrors.Wrap(err, "decode peer response") before reaching the cr.n > cap check. The warning never logs in that path. The security property (≤ cap+1 bytes off the wire) holds unconditionally — the cr.n check is only load-bearing for the near-cap success case. The PR sizing comment is now honest about this, which is good.


WithResponseBodyLimit option — clean

internal/admin/keyviz_fanout.go:136–148

The <= 0 reset-to-default sentinel is the right choice. Values are only honoured in tests via the functional option; production leaves f.bodyLimit == 0 and falls through to the package-level constant.
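
For reference, the option plausibly looks something like this (a sketch; only WithResponseBodyLimit, f.bodyLimit, responseBodyLimit(), and the keyVizPeerResponseBodyLimit constant are confirmed names, the option's exact signature is assumed):

// WithResponseBodyLimit overrides the per-peer body cap; values <= 0 fall
// back to the package default. Only tests pass a non-zero value.
func WithResponseBodyLimit(n int64) func(*KeyVizFanout) {
    return func(f *KeyVizFanout) { f.bodyLimit = n }
}

func (f *KeyVizFanout) responseBodyLimit() int64 {
    if f.bodyLimit <= 0 {
        return keyVizPeerResponseBodyLimit // 64 MiB package-level default
    }
    return f.bodyLimit
}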


Sizing comment — now correct

internal/admin/keyviz_fanout.go:40–56

The rewrite is accurate:

  • Worst-case: 1024 × 4096 × 20 B ≈ 80 MiB — exceeds the 64 MiB cap
  • Operator-visible failure mode: warning logged + decode error returned, not DoS
  • Future flag noted as the escape hatch

This is the right trade-off to document explicitly.


//nolint:wrapcheck in countingReader

internal/admin/keyviz_fanout.go:338

return n, err //nolint:wrapcheck // pure pass-through; underlying reader owns the error.

CLAUDE.md says "Avoid //nolint — refactor instead." The wrapcheck complaint here is a false positive: wrapping io.EOF or other io.Reader sentinel errors would break callers that errors.Is(err, io.EOF). The comment is the correct local exception. If the linter accepts it (reporter says lint is clean), this is fine — just note it for CLAUDE.md readers who encounter it in future reviews.


Test TestKeyVizFanoutRunPeerExceedsBodyLimit — covers the critical property

internal/admin/keyviz_fanout_test.go:380–427

With a 1 KiB cap and ~14 KB body, the decoder always fails mid-stream → peer returns ok=false. The "Either is fine" comment in the test is technically accurate but slightly misleading — in practice the outcome is always ok=false for this construction. The load-bearing assertion (time.Since(start) < 5s) is correct and is what this test actually pins.

The warning-fires path (decode succeeds, cr.n > cap) is not exercised by any automated test — that would require a body that is exactly cap+1 bytes of well-formed JSON. This was acknowledged in the PR description. It's acceptable as a documented gap; the security bound is what matters.


Summary

| Item | Status |
|---|---|
| Security bound (≤ cap+1 bytes off wire) | ✅ Correct and unchanged |
| countingReader false-negative fix | ✅ Correct |
| Over-cap warning for near-cap success case | ✅ Reliable via byte-count post-decode |
| Over-cap path when decode fails (massively over cap) | ℹ️ Warning not logged (returns error instead), acknowledged |
| WithResponseBodyLimit test hook | ✅ Clean |
| Sizing comment honesty (80 MiB > 64 MiB cap) | ✅ Now accurate |
| Automated test for warning-fires path | ⚠️ Still a gap, acknowledged in PR |
| //nolint:wrapcheck | ℹ️ Justified but worth noting for future reviewers |

No blocking issues. The PR is shippable. The remaining gap (automated test for the near-cap success + warning path) is the only thing worth tracking as a follow-up ticket if extreme-traffic deployments eventually trip the cap.

bootjp added a commit that referenced this pull request Apr 27, 2026
… test

Two minor follow-ups from Claude bot round-2 on PR #686:

- countingReader retains the //nolint:wrapcheck but with a longer
  comment explaining why: json.Decoder uses err == io.EOF (pointer
  compare, not errors.Is) to decide whether to keep pulling, so
  wrapping the underlying io.Reader error with %w would silently
  break the decoder's stop condition. CLAUDE.md generally says
  "avoid nolint, refactor instead", but here refactoring is
  impossible -- the only options are pass-through-and-suppress
  (this) or wrap-and-break-the-decoder.
- Add TestKeyVizFanoutRunPeerNearCapSucceedsWithWarning to cover
  the warning-fires path: small JSON envelope (~30 B) + 256 B of
  trailing whitespace against a 100 B cap. Decoder returns success;
  countingReader.n == 101 > 100 triggers the warning. The
  warning log itself is best-effort and not asserted from the
  test (slog has no clean injection point on the package), but the
  reachability of the success path under cap overrun is now pinned.
@bootjp (Owner, Author) commented Apr 27, 2026

@claude review

Round-3 follow-ups (4538f1d) addressing both round-2 minor notes:

  • //nolint:wrapcheck rationale — kept the suppression but extended the comment to explain why refactoring is impossible here: json.Decoder uses err == io.EOF (pointer compare, not errors.Is) to decide whether to keep pulling, so wrapping the io.Reader error with %w would silently break the decoder's stop condition. The CLAUDE.md "avoid nolint, refactor instead" rule has a genuine exception here.
  • Warning-fires test gap — added TestKeyVizFanoutRunPeerNearCapSucceedsWithWarning: small JSON envelope (~30 B) + 256 B of trailing whitespace against a 100 B cap. json.Decoder reads in bufio chunks so the LimitReader hands it cap+1 = 101 bytes; decoder sees the complete object and returns nil, leaving cr.n == 101 > 100. The warning condition is now reachable from a unit test. The log line itself is not asserted (no clean slog injection point on the package), but the byte-count reliability is pinned.

go test -race -count=1 ./internal/admin/... clean. golangci-lint run ./internal/admin/... clean.
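
For anyone reading along, a standalone sketch of the same mechanics (toy values and encoding/json rather than the real test and the goccy decoder; the construction is illustrative only):

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

// countingReader counts every byte the decoder pulls, including read-ahead
// past the end of the JSON value.
type countingReader struct {
    r io.Reader
    n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
    n, err := c.r.Read(p)
    c.n += int64(n)
    return n, err
}

func main() {
    const limit = 100                                // stand-in for the lowered test cap
    body := `{"ok":true}` + strings.Repeat(" ", 256) // small valid JSON, padding past the cap
    cr := &countingReader{r: io.LimitReader(strings.NewReader(body), limit+1)}
    var v struct {
        OK bool `json:"ok"`
    }
    err := json.NewDecoder(cr).Decode(&v)
    fmt.Println(err == nil, cr.n > limit) // true true: decode succeeds, overshoot is still counted
}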

bootjp force-pushed the feat/keyviz-fanout-server branch from 4538f1d to ec8c1e7 on April 27, 2026 09:20
claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 2m 17s.


Round-3 review

  • Gather context
  • Read internal/admin/keyviz_fanout.go (nolint comment + countingReader)
  • Read internal/admin/keyviz_fanout_test.go (new NearCap test)
  • Assess nolint rationale accuracy
  • Assess NearCap test correctness and coverage
  • Final assessment

Both round-2 minor notes are addressed. One small precision issue in the nolint comment is worth recording for future readers; everything else is clean.


//nolint:wrapcheck — suppression is correct, comment has one imprecision

internal/admin/keyviz_fanout.go:352–364

The suppression is justified and the multi-line explanation is useful. One precision note: the comment states "json.Decoder uses err == io.EOF (pointer compare, not errors.Is)". The actual library in scope is github.com/goccy/go-json, not encoding/json. Modern libraries like goccy/go-json may use errors.Is rather than pointer comparison for EOF detection.

This does not change the conclusion: the io.Reader interface contract requires implementations to return io.EOF unwrapped regardless of which consumer sees it. Code outside the JSON decoder — any other caller that borrows or wraps countingReader — may use pointer comparison. The canonical pass-through rule ("do not wrap io.Reader sentinel errors") stands independent of which JSON library happens to be the immediate consumer.

If the comment is ever revised, "the io.Reader contract requires pass-through of sentinels like io.EOF — wrapping breaks any caller relying on pointer equality" is more accurate than attributing the constraint to json.Decoder specifically. Not blocking — the suppression is correct; only the stated reason is slightly imprecise.
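
A toy illustration of why the unwrapped sentinel matters (not project code):

package main

import (
    "errors"
    "fmt"
    "io"
)

func main() {
    wrapped := fmt.Errorf("read peer body: %w", io.EOF)
    fmt.Println(wrapped == io.EOF)          // false: pointer compare no longer matches
    fmt.Println(errors.Is(wrapped, io.EOF)) // true: errors.Is unwraps the chain
}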


TestKeyVizFanoutRunPeerNearCapSucceedsWithWarning — pins the right property

internal/admin/keyviz_fanout_test.go:397–426

The test construction is sound:

  • Tiny JSON (~30 B) + 256 B trailing whitespace against a 100 B cap
  • LimitReader set to cap+1 = 101, so the decoder's internal buffer reads exactly 101 bytes from countingReader
  • JSON object is complete at ~30 B → Decode returns nil
  • cr.n == 101 > 100 → warning branch is reachable in fetchPeer

The comment notes "bufio chunks" and attributes buffering behaviour to json.Decoder. goccy/go-json may not use bufio.Reader internally — it is a code-generation-based decoder with its own buffer management. In practice this does not matter: any reasonable read-ahead will consume past the JSON boundary into the whitespace padding, making cr.n > cap true. The test has been verified clean, confirming the property holds.

The test name slightly overpromises — it asserts Nodes[1].OK == true (decode succeeded) but does not assert the warning log was emitted. This was explicitly acknowledged in the PR description ("the log line itself is not asserted — no clean slog injection point on the package"). That is an acceptable documented gap; the byte-count reliability is what was under question in round-2, and that is now pinned. ✅


Summary

| Item | Status |
|---|---|
| //nolint:wrapcheck suppression correctness | ✅ Justified |
| Comment's stated reason (json.Decoder pointer-compare) | ℹ️ Minor imprecision; goccy/go-json is the actual library |
| Warning-fires path covered by automated test | ✅ Byte-count path reachable and verified |
| Warning log emission asserted | ℹ️ Not asserted; acknowledged in PR description |
| Security bound (≤ cap+1 bytes off wire) | ✅ Unchanged and correct |
| go test -race / golangci-lint | ✅ Clean per author |

No blocking issues. The two round-2 notes are closed. The PR is ready to merge.

bootjp force-pushed the feat/keyviz-fanout-server branch from ec8c1e7 to 186b6e9 on April 27, 2026 09:24
@bootjp (Owner, Author) commented Apr 27, 2026

@claude review

Round-3 follow-up (186b6e9 after rebase on updated #685): generalized the //nolint:wrapcheck comment per your imprecision note. Now states the constraint as the io.Reader contract (sentinels must be passed unwrapped so err == io.EOF pointer compares keep working), with encoding/json and goccy/go-json mentioned as illustrative consumers — the underlying rule holds regardless of which decoder is in scope.

go test -race -count=1 ./internal/admin/... clean. golangci-lint run ./internal/admin/... clean.

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 1m 6s.


Round-3 follow-up review

  • Gather context
  • Read internal/admin/keyviz_fanout.go (updated nolint comment)
  • Assess comment accuracy and completeness
  • Final assessment

The generalized //nolint:wrapcheck comment in keyviz_fanout.go:355–369 correctly addresses the round-3 precision note.

//nolint:wrapcheck comment — now accurate

The updated comment makes three improvements over the prior version:

  1. Root cause is the io.Reader contract, not a specific library: "implementations are required to pass io.EOF through unwrapped so any caller that does err == io.EOF (pointer compare) keeps working." This is the correct, library-independent framing.

  2. Explicit about the mechanism: "Wrapping with %w produces a different error value that pointer compare will not match, even though errors.Is would." The distinction between pointer equality and errors.Is is what makes this non-obvious; spelling it out is the right call.

  3. Correct attribution of consumers: Both encoding/json (historical, pointer compare) and goccy/go-json (the actual library in use, may use errors.Is) are named, with the clarification that "the io.Reader contract holds independent of which consumer is in scope." This is precisely what was missing from the prior version.

The type-level doc comment for countingReader (line 341–346) still references json.NewDecoder using internal buffering — but that's describing the motivation for the type's existence (why we can't probe the LimitReader after decode), not the contract, and the description is accurate regardless of which decoder implementation is used. No change needed there.

Summary

| Item | Status |
|---|---|
| //nolint:wrapcheck suppression correctness | ✅ Justified |
| Comment attribution (io.Reader contract, not specific library) | ✅ Fixed — contract correctly named as root cause |
| Both encoding/json and goccy/go-json called out | ✅ Present, with correct framing |
| Contract-holds-independent-of-consumer clause | ✅ Explicit |
| go test -race / golangci-lint | ✅ Clean per author |

All three rounds of review items are closed. No blocking issues remain. The PR is ready to merge.

bootjp added a commit that referenced this pull request Apr 27, 2026
The 2.1 wording said conflict fires "when any per-node values
disagreed for any cell," which under stable leadership would
include the leader-says-5 vs follower-says-0 split (the
steady-state shape) and contradicted both 4.2 and the 7 stable-
leader test expectation ("max wins, no conflict").

Narrow the predicate to its actual shape: conflict fires only
when two or more nodes report a non-zero value for the same cell.
Follower zeros against a leader's non-zero value do not count.

This matches the implementation in PR #686 (maxMerge ignores zero
on either side and returns conflict=false).

Mirrored the clarification in 4.2 so the conflict-predicate rule
is explicit in both the scope summary (2.1) and the merge-rule
detail (4.2).
bootjp added 2 commits April 27, 2026 18:27
Implements the server-side half of the fan-out path proposed in
docs/design/2026_04_27_proposed_keyviz_cluster_fanout.md. When the
operator passes --keyvizFanoutNodes, the existing
/admin/api/v1/keyviz/matrix handler aggregates the local matrix
with peer responses; with the flag empty (default), behaviour is
unchanged.

Aggregator (internal/admin/keyviz_fanout.go):

- Parallel HTTP GETs to each peer with a per-call timeout (default
  2 s, --keyvizFanoutTimeout overrides).
- Forward-compatible request shape: the per-peer URL is rebuilt
  from the parsed parameters so a peer running an older or newer
  server cannot trip on a parameter we never intended to forward.
- Merge rules per design 4: reads / read_bytes sum across nodes
  (distinct local serves are independent counts); writes /
  write_bytes take the max with a row-level Conflict flag when
  inputs disagree (conservative dedup that never overcounts during
  a leadership flip; the canonical (raftGroupID, leaderTerm) rule
  lands in Phase 2-C+).
- Degraded mode: a peer that times out, errors, or returns non-OK
  contributes FanoutNodeStatus{OK: false, Error: ...}. The local
  matrix still ships and Responded reflects partial success.

Wire format additions (forward-compatible):

- KeyVizMatrix.Fanout *FanoutResult — Nodes / Responded /
  Expected; omitted when fan-out is disabled.
- KeyVizRow.Conflict bool — true when the max-merge collapsed
  disagreeing values for any cell in the row.

Flag plumbing (main.go / main_admin.go):

- --keyvizFanoutNodes accepts comma-separated host:port endpoints.
- --keyvizFanoutTimeout sets the per-peer ceiling.
- Self-filter: a peer entry naming this node's own admin listener
  (literal match, or 0.0.0.0/:: bind matched against a
  127.0.0.1/localhost/::1 entry) is skipped so a symmetric
  --keyvizFanoutNodes=node1,node2,node3 stamped onto every host
  does not loop back over HTTP.

Tests (internal/admin/keyviz_fanout_test.go):

- mergeKeyVizMatrices table covering sum, stable-leader max,
  leadership-flip max with conflict, union-of-columns, and
  distinct-row order preservation.
- KeyVizFanout.Run end-to-end against httptest peers: single peer
  ok, peer 5xx surfaces in degraded mode, peer hang respected by
  the per-call timeout.
- buildKeyVizPeerURL table covering scheme/no-scheme/with-bounds
  variants.
- Peer order is operator-supplied with self always first.

SPA changes follow in PR-2 (degraded banner + conflict hatching).
The proto / JSON extension with raftGroupID + leaderTerm follows in
PR-3 (Phase 2-C+).
bootjp force-pushed the feat/keyviz-fanout-server branch from 186b6e9 to 7747be1 on April 27, 2026 09:27
bootjp merged commit 9ec6ca2 into docs/keyviz-fanout-phase2c on Apr 27, 2026
5 checks passed
bootjp deleted the feat/keyviz-fanout-server branch on April 27, 2026 09:44
bootjp added a commit that referenced this pull request Apr 27, 2026
…se 2-C PR-2) (#687)

## Summary

Phase 2-C **PR-2**, per the design doc landed in #685 §6: wire the
fan-out `Fanout` / `conflict` shapes shipped by PR #686 into the admin
SPA so operators can see when a peer failed and which rows are soft due
to a leadership flip.

Backend is unchanged. PR #686's `KeyVizMatrix.Fanout` block is currently
emitted on every fan-out request but the SPA was discarding it.

## What this PR adds

| Piece | Behavior |
|---|---|
| **Header counter** | `cluster view (N of M nodes)` in the per-matrix metadata strip. Hidden when `fanout` is absent (single-node mode, no behavior change). |
| **FanoutBanner** | Degraded-mode card above the heatmap when `responded < expected`. Lists every failed node + error string. Hidden when the cluster is healthy. |
| **ConflictOverlay** | SVG hatch over rows whose `conflict === true`. Layered inside the scroll container so the hatch tracks horizontal scroll (same idiom as `TimeAxis`). 4 px diagonal stripes, alpha 0.45, inheriting `currentColor` so it works against both light and dark themes. |
| **RowDetail pill** | Hovering a conflicting row reveals a `conflict` pill with a hover-tooltip explaining the leadership-flip semantics. |

## Five-lens self-review

1. **Data loss** — n/a; SPA-only change reading existing fields.
2. **Concurrency / distributed** — n/a; single browser tab consuming an
existing API.
3. **Performance** — Banner: zero-cost when healthy (`null` returned).
Hatch: SVG with at most 1024 `<rect>` elements (matches `KeyVizRow`
budget); empirically negligible. Overlay only renders when at least one
row has `conflict === true`.
4. **Data consistency** — UI surfaces existing server signals; semantics
come from PR #685 design §4.2 and #686 implementation. Not relitigated
here.
5. **Test coverage** — `tsc -b --noEmit` and `vite build` clean. UI
behavior (banner visibility, hatch presence, header text) is hard to
assert in `tsc` alone; manual verification documented below.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`) — clean
- [x] `npm run build` (Vite) — clean, output writes to
`internal/admin/dist`
- [ ] Manual: `make run` with `--keyvizEnabled
--keyvizFanoutNodes=127.0.0.1:8080`. Open `/keyviz`; header reads
`cluster view (1 of 1 nodes)`, no banner, no hatch.
- [ ] Manual: configure
`--keyvizFanoutNodes=127.0.0.1:8080,127.0.0.1:9999` (port 9999 unused).
Header reads `1 of 2 nodes`, the failed-node banner appears with
`connection refused` for `127.0.0.1:9999`.
- [ ] Manual: smoke a leadership flip mid-window (or fake
`conflict=true` server-side via `/admin/api/v1/keyviz/matrix?...`); the
affected row renders with diagonal hatch overlay; hovering reveals the
`conflict` pill.

## Out of scope

- **Per-cell hatching** — Phase 2-C+ once the proto extension lands and
`conflict` becomes per-cell instead of per-row.
- **Per-node `generated_at` skew indicator** — design §9 Q3, deferred.
- **Banner auto-dismiss / sound notification** — out of scope; an
operator who has the page open will see the banner. CI alert pages are a
separate system.

Closes the SPA half of Phase 2-C; the proto extension follows in PR-3.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* KeyViz now displays cluster fan-out status with node availability and
responsiveness counts.
* Added visual indicators for rows experiencing conflicts during
leadership transitions.
* Row details now include conflict status labels with informative
tooltips.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
bootjp added a commit that referenced this pull request Apr 27, 2026
…401 (#692)

## Summary

**Hotfix for the screenshot a user shared after enabling fan-out** — the
heatmap collapsed to "1 of 6 nodes responded" with every peer returning
`401 missing session cookie`.

**Root cause**: PR #686 shipped fan-out with the design assumption (#685
§3) that peers would accept anonymous calls on a private network. But
the receiving side runs the same `SessionAuth` middleware as the browser
path, so anonymous fan-out calls are rejected with 401. The "anonymous
on a private network" framing was wrong: the implementation didn't put a
bypass on the peer side, so anonymous calls never went through
end-to-end.

**Fix**: forward the inbound user's cookies on every peer call. Cluster
nodes already share `--adminSessionSigningKey` for HA, so a cookie
minted on node A is verifiable on node B — no new infrastructure needed.

## What changed

- `KeyVizFanout.Run` gains a `cookies []*http.Cookie` parameter; nil
preserves the legacy behaviour for tests / future non-browser callers.
- `KeyVizHandler` passes `r.Cookies()` so a SPA poll forwards
`admin_session` (and `admin_csrf`, harmless on a GET) transparently.
- `TestKeyVizFanoutRunForwardsCookies` pins that both cookies reach the
peer verbatim.
- Design doc §3 rewritten: documents the cookie-forwarding scheme + the
operator-trust assumption that `--keyvizFanoutNodes` points at trusted
hosts only. Phase 2-C+ will replace cookie forwarding with a dedicated
inter-node token to decouple browser-session expiry from inter-node call
validity.
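
A minimal sketch of the forwarding step (illustrative; only the `cookies []*http.Cookie` parameter and `req.AddCookie` are confirmed above, the surrounding request construction is assumed):

// Inside the per-peer fetch: replay the inbound browser cookies onto the
// outgoing peer call. A nil slice preserves the old anonymous behaviour.
req, err := http.NewRequestWithContext(ctx, http.MethodGet, peerURL, nil)
if err != nil {
    return nil, err
}
for _, c := range cookies {
    req.AddCookie(c) // admin_session (and admin_csrf, harmless on a GET)
}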

## Five-lens self-review

1. **Data loss** — n/a; auth path change.
2. **Concurrency / distributed** — Cookies are passed by reference into
per-peer goroutines but the slice is read-only inside `fetchPeer` (each
goroutine `req.AddCookie`s independently). No shared mutation.
3. **Performance** — Adds N `req.AddCookie` calls per fan-out request.
Trivial.
4. **Data consistency** — Auth-only change; merge semantics unchanged.
5. **Test coverage** — New `TestKeyVizFanoutRunForwardsCookies` asserts
both cookies reach the peer with the original values. Existing fan-out
tests pass `nil` and continue to exercise the unauthenticated path the
existing code handled (peer 401s, request reports `ok=false`).

## Test plan

- [x] `go test -race -count=1 ./internal/admin/...` — clean
- [x] `golangci-lint run ./internal/admin/...` — clean
- [ ] Manual: 6-node cluster with shared
`ELASTICKV_ADMIN_SESSION_SIGNING_KEY`, `--keyvizFanoutNodes` set on
every node. Heatmap should now show "6 of 6 nodes" instead of "1 of 6".

## Trust / threat model note

Cookie forwarding is safe when (a) every peer is operator-configured and
trusted, and (b) the network is private. **Do NOT point
`--keyvizFanoutNodes` at an untrusted host** — the user's admin session
would be replayed there. The design doc §3 update is explicit about
this. Phase 2-C+ removes this footgun by switching to a dedicated
inter-node token.

Closes the production-impacting symptom from screenshot review.