
feat(raft): learner / non-voter membership (M1 engine + M2 operator surface)#676

Merged
bootjp merged 15 commits into main from feat/raft_learner
Apr 27, 2026
Conversation

Owner

@bootjp bootjp commented Apr 26, 2026

Add Raft learner / non-voter membership to elastickv. Operators can now attach a fresh node as a non-voting replica, let it catch up on the log, and then promote it to voter — without ever shrinking the cluster's effective fault tolerance during catch-up.

The codebase already had partial scaffolding (Server.Suffrage, RaftAdminMember.suffrage, applyConfigPeerChangeToMap keeping learner metadata) waiting for a consumer; this PR turns it on end-to-end.

Scope

Design doc (docs/design/2026_04_26_proposed_raft_learner.md)

Five rounds of review. Approved on Round 5: "ready to proceed to Milestone 1 implementation."

Milestone 1 — engine-level learner support

  • Admin.AddLearner and Admin.PromoteLearner on internal/raftengine.
  • Persisted peers file bumped to v2 with per-peer suffrage byte; v1 reader preserved for upgrade compat.
  • New apply-loop-owned voterCount + isLearnerNode cache (initialized at Open from snapshot ConfState; rebuilt from scratch — never patched — on every conf change).
  • All four len(e.peers) overloads switched to voterCount (recordQuorumAck, refreshStatus single-node fast path, removePeer post-removal, plus the no-op audit on nextPeersAfterConfigChange). Closes the 1-voter+1-learner lease-read stall the design flagged.
  • applyConfigChange / applyConfigChangeV2 extended with confState raftpb.ConfState. Four-step apply-loop sequence: refresh voter cache → per-peer cleanup → rebuild e.config.Servers → bookkeeping.
  • validateConfState accepts learner-only entries; voter check stays element-wise, learner check is set-based.
  • upsertPeer no longer owns e.config.Servers; the apply loop rebuilds it via configurationFromConfState.
  • PromoteLearner preconditions checked synchronously on the admin loop before propose: target must be a learner; leader's Progress.Match >= minAppliedIndex.

Milestone 2 — operator surface

  • gRPC RaftAdmin.AddLearner and RaftAdmin.PromoteLearner RPCs.
  • cmd/raftadmin add_learner <id> <address> [previous_index] and cmd/raftadmin promote_learner <id> [previous_index] [min_applied_index].
  • --raftJoinAsLearner flag — operator alarm (not consensus veto): when a node booted with the flag finds itself in ConfState.Voters, an ERROR-level structured log fires once and JoinRoleViolationCount increments. The node keeps running.
  • docs/raft_learner_operations.md runbook: prerequisites, attach/promote/verify, remove_server for learners, common errors.

Out of scope (deferred to M3 / separate proposal)

  • Follower-served reads (separate proposal needed for staleness bounds and adapter routing).
  • Joint consensus / atomic multi-member reconfig.
  • Jepsen workload exercising attach during partition.
  • Status.PerPeer exposing Progress.Match for min_applied_index selection.
  • Auto-promotion (engine never decides on its own to promote).

Test evidence

  • internal/raftengine/etcd/learner_test.go — 7 new tests:
    • TestAddLearnerReplicatesWithoutCountingAsVoter
    • TestPromoteLearnerSwapsRoleToVoter
    • TestPromoteLearnerRejectsNonLearner
    • TestPromoteLearnerRejectsNotCaughtUp
    • TestRemovePeerLearnerKeepsSingleNodeFastPath (the headline 1-voter+1-learner lease-read regression)
    • TestJoinAsLearnerAlarmFiresWhenAddedAsVoter
    • TestLinearizableReadOnLearnerForwardsToLeader
  • internal/raftengine/etcd/peer_metadata_test.go — 4 new tests covering v2 round-trip with mixed suffrage, v1 forward-compat, v2 always-write, unknown-version rejection.
  • internal/raftadmin/server_test.go: TestServerMapsEngineAdminMethods extended for the two new RPCs.
  • cmd/raftadmin/main_test.go: add_learner / promote_learner happy-path subcommand tests + extended usage assertions.
  • All other suites (internal/..., kv/..., store/..., monitoring/..., distribution/..., proxy/..., cmd/...) pass.
  • golangci-lint: 0 issues.

Self-review (per CLAUDE.md)

  1. Data loss — Promotion never decrements voter count; quorum size after promote is floor((N+1)/2)+1, ≥ pre-promote. WAL purge tracker includes learners by construction. v2 peers file forward-compat is read-once-write-once.
  2. Concurrency — Add/promote run on the single-threaded admin loop. voterCount / isLearnerNode cache is single-writer (apply loop) under existing e.mu. recordQuorumAck learner-ack reject reads isLearnerNode under the same lock as e.peers. Promote-during-leader-change handled via min_applied_index precondition.
  3. Performance — Adding a learner adds outbound replication traffic but no extra Raft round-trips on the write path; no extra per-call cost on the lease-read fast path. v2 peers file overhead: 1 byte per peer (bounded at 1024 peers).
  4. Data consistency — Learner does not serve LinearizableRead from local FSM (returns ErrNotLeader). Voter / learner suffrage is sourced from ConfState, not e.peers. Snapshot-restore path also refreshes voterCount / isLearnerNode and runs the alarm hook.
  5. Test coverage — New / changed branches each have a unit test as listed above; conformance test confirms suffrage round-trips across restart; lease-read regression test pins the 1-voter+1-learner fast path.

Caveats

  • proto/Makefile pins libprotoc 29.3; this branch was regenerated with libprotoc 34.0 (host) + protoc-gen-go v1.36.11 + protoc-gen-go-grpc 1.6.1 (matching). The diff is the standard "two new messages + msgTypes index shift" — re-run make gen with the pinned toolchain if byte-identical generation matters.
  • Branch is 64 commits behind origin/main as of writing. Rebase before merge.

Turn on the dormant learner / non-voter path in the etcd raft engine so
operators can attach catch-up replicas without shrinking voter fault
tolerance during catch-up. Phase 3 of the etcd-migration doc deliberately
deferred this; the codebase already has partial scaffolding (Server.Suffrage,
RaftAdminMember.suffrage, applyConfigPeerChangeToMap keeping learner
metadata) waiting for a consumer.

Scope: AddLearner/PromoteLearner on raftengine.Admin, end-to-end suffrage
reporting, peer-metadata file v2 with a per-peer suffrage byte,
quorumAckTracker denominator audit so a slow learner cannot stall lease
reads, and an explicit min_applied_index precondition on promote. Out of
scope: follower-served reads, joint consensus, KV/adapter changes.

coderabbitai Bot commented Apr 26, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 09a1737e-5828-4b37-b4e5-08f5c396b8ed

📥 Commits

Reviewing files that changed from the base of the PR and between 10f38b6 and 32e20b1.

⛔ Files ignored due to path filters (2)
  • proto/service.pb.go is excluded by !**/*.pb.go
  • proto/service_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (16)
  • cmd/raftadmin/main.go
  • cmd/raftadmin/main_test.go
  • docs/design/2026_04_26_proposed_raft_learner.md
  • docs/raft_learner_operations.md
  • internal/raftadmin/server.go
  • internal/raftadmin/server_test.go
  • internal/raftengine/engine.go
  • internal/raftengine/etcd/engine.go
  • internal/raftengine/etcd/engine_test.go
  • internal/raftengine/etcd/factory.go
  • internal/raftengine/etcd/learner_test.go
  • internal/raftengine/etcd/peer_metadata_test.go
  • internal/raftengine/factory.go
  • main.go
  • multiraft_runtime.go
  • proto/service.proto
📝 Walkthrough

This pull request implements learner (non-voting) membership functionality in the Elastickv Raft system. Changes span a design document, new admin APIs (AddLearner, PromoteLearner), core engine logic with quorum/lease-read adjustments, v2 peers file format with per-peer suffrage bytes, persistence validation updates, and comprehensive end-to-end tests.

Changes

  • Design & Specification (docs/design/2026_04_26_proposed_raft_learner.md):
    Detailed design proposal for learner membership: API definitions, v2 peers file format, quorum/lease-read correctness changes, promotion preconditions, bootstrap/join behavior, and test expectations.
  • Admin API Interface (internal/raftengine/engine.go, internal/raftadmin/server_test.go):
    Added AddLearner and PromoteLearner methods to the Admin interface; extended the test fake with call-tracking and deterministic return values for both operations.
  • Core Engine Implementation (internal/raftengine/etcd/engine.go, internal/raftengine/etcd/engine_test.go):
    Implemented learner submission via ConfChangeAddLearnerNode and promotion via ConfChangeAddNode with minAppliedIndex precondition checks; refactored admin dispatch to handle learner operations; replaced the quorum denominator with a cached voterCount; rebuilt configuration publication to include learner suffrage; updated test expectations for persisted suffrage.
  • Peer Metadata & Persistence (internal/raftengine/etcd/peer_metadata.go, internal/raftengine/etcd/peer_metadata_test.go, internal/raftengine/etcd/peers.go, internal/raftengine/etcd/persistence.go):
    Added the v2 peers file format with per-peer suffrage bytes; v1 defaults to voter; introduced a Suffrage field on the Peer struct; added "voter" and "learner" string constants; implemented binary read/write helpers for uint8.
  • Validation & Consistency (internal/raftengine/etcd/wal_store.go):
    Refactored validateConfState to derive expected membership from peers (voters ordered, learners as a set) and validate voters element-wise and learners as set membership instead of a combined ConfState comparison.
  • Learner Behavior Tests (internal/raftengine/etcd/learner_test.go):
    Five comprehensive end-to-end tests validating: learner replication without quorum impact, suffrage reporting, promotion lifecycle, precondition enforcement (minAppliedIndex, learner-only requirement), and single-node fast-path preservation after learner removal.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Admin API
    participant Raft Core
    participant Log/FSM
    participant Persistence

    Client->>Admin API: AddLearner(nodeID, address)
    Admin API->>Raft Core: Submit ConfChangeAddLearnerNode
    Raft Core->>Log/FSM: Append & commit config change
    Raft Core->>Persistence: Persist v2 peers with suffrage
    Raft Core->>Admin API: Return applied index
    Admin API->>Client: Return index
    Note over Raft Core: Learner now replicates<br/>but not counted in quorum
sequenceDiagram
    participant Client
    participant Admin API
    participant Leader
    participant Raft Core
    participant Learner Node
    participant Log/FSM

    Client->>Admin API: PromoteLearner(nodeID, minAppliedIndex=50)
    Admin API->>Leader: Check Progress[nodeID].Match >= 50
    alt Progress.Match < minAppliedIndex
        Admin API->>Client: FailedPrecondition error
    else Learner caught up
        Admin API->>Raft Core: Submit ConfChangeAddNode
        Raft Core->>Log/FSM: Append & commit config change
        Learner Node->>Log/FSM: Apply config, transition to voter
        Raft Core->>Client: Return applied index
    end
    Note over Learner Node: Now counts toward quorum

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 29.73%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Linked Issues check: ✅ Passed (skipped because no linked issues were found for this pull request).
  • Out of Scope Changes check: ✅ Passed (skipped because no linked issues were found for this pull request).
  • Description check: ✅ Passed (skipped; CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title accurately summarizes the main change: implementing learner/non-voter membership in the raft engine with both operator surface and core engine APIs (M1 and M2 milestones).



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request proposes a design for Raft Learner support to enable safe membership changes and replica catch-up without impacting quorum. The design covers engine API additions, etcd backend updates, and a new versioned format for the persisted peers file. The review feedback identifies a potential stall in lease-read logic for single-voter clusters, requests clarification on the authoritative source of peer metadata during recovery, and suggests detailing the enforcement of the join-as-learner flag to ensure zero-downtime migrations.

Comment on lines +259 to +262
The `ConfState` carried inside the local raft snapshot remains the
*authoritative* source of suffrage. The peers file is a cache plus an
operator-readable summary; on conflict, the snapshot's `ConfState` wins,
and `persistConfigState` overwrites the peers file accordingly.

Severity: high

The design states that the snapshot's ConfState is the authoritative source of truth. However, the existing validateOpenPeers logic (line 2541 in internal/raftengine/etcd/engine.go) allows the persisted peers file to be 'ahead' of the snapshot when configuration changes are committed in the WAL but not yet snapshotted. In this scenario, the peers file is more up-to-date. The design should clarify that the authoritative source is the latest committed configuration (whether in a snapshot or the WAL), and that the peers file is the persistent record of that state used during recovery to resolve addresses before the WAL is replayed. Because this ambiguity poses a significant operational risk during node recovery, the design must also detail potential mitigation strategies to ensure consistency.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.

Comment on lines +348 to +352
- The engine must compute the lease-read denominator from
`len(ConfState.Voters)` (excluding self if leader), not `len(e.peers)`.
`recordQuorumAck` already gates on `isLeader`; we additionally drop
responses from peers whose `nodeID` is in `ConfState.Learners` so a
learner heartbeat ack cannot count toward voter majority.

Severity: high

The proposed logic for the lease-read denominator must also be applied to the singleNodeLeaderAckMonoNs update in refreshStatus. Currently, refreshStatus (line 2176 in internal/raftengine/etcd/engine.go) checks clusterSize <= 1 using len(e.peers). If a learner is present in a single-voter cluster, len(e.peers) will be greater than 1, causing the leader to stop updating the single-node fast path and switch to the ackTracker. Since learner acks are dropped (as proposed in §4.6), the leader will never reach a quorum, stalling lease reads. This check must be updated to use the voter count from ConfState to ensure single-voter clusters with learners still benefit from the lease fast path.

Comment on lines +181 to +186
2. The leader's `Progress[nodeID].Match >= minAppliedIndex`. The leader
is the authority on follower progress; an operator who reads
`Status.AppliedIndex` from the learner itself sees a slightly stale
view, so the engine re-checks against the leader's tracker before
proposing. `minAppliedIndex == 0` skips the precondition (matches the
existing "0 means don't care" convention used by `prevIndex`).

Severity: medium

The PromoteLearner precondition check against Progress[nodeID].Match will require calling RawNode.Status() on the leader. Note that Status() is relatively expensive in etcd/raft as it clones the entire progress map. While this is acceptable for an infrequent administrative operation, the implementation should be mindful of this cost and ensure it is only performed within the admin request handler, not on every iteration of the raft loop.

Comment on lines +317 to +321
- New CLI flag (per group): `--raftJoinAsLearner` (default `false`). When
set, the local node refuses to self-bootstrap and refuses to be added
with `AddVoter` from outside; it expects an operator to call
`AddLearner` against the existing leader before it appears in
`ConfState`.

Severity: medium

Regarding the --raftJoinAsLearner flag, the design mentions that the node 'refuses to be added with AddVoter from outside'. It would be beneficial to specify the enforcement mechanism. For instance, the node should verify its own suffrage when applying a ConfChange and fail or log a critical error if it is added as a voter when it was explicitly configured to join as a learner. This prevents accidental promotion before the node has caught up and potentially stalling the quorum. As this feature is intended for production-grade system migrations, the plan must also detail how this mechanism supports zero-downtime cutovers to avoid service interruption.

References
  1. When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.

Owner Author

bootjp commented Apr 26, 2026

@claude review


claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 27s


Review of docs/design/2026_04_26_proposed_raft_learner.md

  • Gather context and understand request
  • Read design document
  • Read existing raft engine code for context
  • Analyze design for correctness, completeness, and alignment with codebase
  • Post review feedback

The overall shape of this proposal is solid: the problem statement is accurate, the three-milestone breakdown is appropriately scoped, and the non-goals boundary is well-drawn. The dormant scaffolding (applyConfigPeerChangeToMap, Server.Suffrage, RaftAdminMember.suffrage) confirms this is the right moment to close the gap. That said, there are several correctness issues and under-specified implementation details that need to be addressed before Milestone 1 implementation starts.


High Priority — Correctness

1. refreshStatus single-node fast-path breaks with a single voter + one learner (extends gemini §352)

This is the most critical issue in the proposal. refreshStatus at engine.go:2176 does:

clusterSize := len(e.peers)
// ...
if status.State == raftengine.StateLeader && clusterSize <= 1 {
    e.singleNodeLeaderAckMonoNs.Store(monoclock.Now().Nanos())
} else {
    e.singleNodeLeaderAckMonoNs.Store(0)
}

Once AddLearner adds the learner to e.peers, len(e.peers) becomes 2 even though there is still only 1 voter. The single-node fast path is disabled, singleNodeLeaderAckMonoNs is zeroed, and LastQuorumAck() falls through to e.ackTracker.load(). The ackTracker has no entries (because §4.6 proposes dropping learner acks), so lease reads stall indefinitely.

§4.6 of the proposal only mentions followerQuorumForClusterSize and recordQuorumAck; refreshStatus is not listed as a call site to update. This must be added explicitly to Milestone 1: clusterSize <= 1 must use the voter count from ConfState, not len(e.peers).

2. recordQuorumAck denominator also uses len(e.peers)

engine.go:1490:

clusterSize := len(e.peers)
// ...
e.ackTracker.recordAck(msg.From, followerQuorumForClusterSize(clusterSize))

For a 3-voter + 1-learner cluster, followerQuorumForClusterSize(4) = 2, but the correct voter quorum is followerQuorumForClusterSize(3) = 1. The design mentions fixing this but doesn't specify how ConfState.Voters count is accessed here — recordQuorumAck runs in the event-loop goroutine (single writer), so it can read the cached ConfState without locking, but the design needs to state explicitly that the cached ConfState is maintained in the event loop's owned state.

3. confStateForPeers is not updated for learner entries in the v2 peers file

peers.go:216-222:

func confStateForPeers(peers []Peer) raftpb.ConfState {
    voters := make([]uint64, 0, len(peers))
    for _, peer := range peers {
        voters = append(voters, peer.NodeID)
    }
    return raftpb.ConfState{Voters: voters}
}

The v2 peers file stores a suffrage byte per peer. When a restarting learner node loads its v2 peers file, the peers slice contains the learner with suffrage=1. But confStateForPeers still emits every peer as a voter. This means a restarting learner would initialize its ConfState with itself (and other learners) in Voters, not Learners. The WAL replay will eventually correct this — but only after the relevant ConfChangeAddLearnerNode entries are replayed. Before that, validateConfState and early open-path logic see an inconsistent state. The design is silent on this; it needs either:

  • A Peer.Suffrage field added to the Peer struct and propagated through confStateForPeers, or
  • An explicit note that confStateForPeers is only called during cold bootstrap (where no learners can exist by design), and the WAL replay path is always authoritative for non-bootstrap restarts.

4. Dual e.config.Servers update paths will diverge on suffrage

engine.go:3093-3097: upsertPeer updates e.config.Servers via upsertConfigServer with a hard-coded Suffrage: "voter". The design says upsertPeer "stops hard-coding 'voter'" and configurationFromConfState becomes the single source of truth.

But the design doesn't resolve the tension: if upsertPeer is called when a learner is added (via applyAddedPeer), it will update e.config.Servers with suffrage: "voter" until the fix lands. More importantly, after the fix, what should upsertPeer pass as Suffrage? It doesn't have access to the ConfState at call time. The cleanest fix is for applyConfigPeerChange to call configurationFromConfState (passing the current snapshot ConfState) to rebuild e.config.Servers after every peer change — making upsertPeer responsible only for e.peers (the address cache), not e.config.Servers. This architectural decision should be explicit in the design.


Medium Priority — Incomplete Specification

5. --raftJoinAsLearner enforcement mechanism underspecified (gemini §321)

The design says the node "refuses to be added with AddVoter from outside" but doesn't detail how. When the node applies a ConfChangeAddNode from the log (after the leader called AddVoter instead of AddLearner), does it check a local flag and reject the Apply? That would diverge the log and be catastrophic. More likely, the enforcement should happen at the RPC layer on the node itself — but how? The node receiving AddVoter from the admin CLI isn't the node being added; it's the leader. The joining node has no API surface to block ConfChangeAddNode. The design should either clarify that --raftJoinAsLearner is purely advisory/operational documentation ("run add_learner, not add_voter") or describe a safe enforcement mechanism (e.g., the joining node checks its suffrage in ConfState after the conf change lands and shuts down with a clear error if it finds itself in Voters instead of Learners).

6. Authoritative conf state source during recovery (gemini §262, extended)

The design says "on conflict, the snapshot's ConfState wins, and persistConfigState overwrites the peers file accordingly." But the validateOpenPeers function at engine.go:2541 short-circuits validation when persisted.Index > snapshot.Metadata.Index — meaning the persisted file is more current. This is correct for address resolution, but the suffrage of those peers is still only definitively known after WAL replay (the snapshot's ConfState plus replayed ConfChange entries). The design should explicitly call out this two-phase recovery: (a) load addresses from the peers file, (b) after WAL replay, overwrite e.peers suffrage cache from the post-replay ConfState. Currently the ConfState in the snapshot is used as the authoritative suffrage source, but WAL entries between the snapshot and the persisted peers index can change suffrage (learner added, learner promoted) without a new snapshot.

7. e.peers inclusion of learners affects all len(e.peers) call sites

The proposal adds learners to e.peers (by making applyConfigPeerChange handle ConfChangeAddLearnerNode). There are multiple places beyond §4.6 where len(e.peers) is used as a proxy for "voter count" or "cluster size". A grep reveals at least: refreshStatus (line 2163), recordQuorumAck (line 1490), removePeer (line 3151 — followerQuorumForClusterSize(postRemovalClusterSize)). The design should enumerate all call sites and confirm each is addressed in Milestone 1, not discovered during implementation.


Low Priority — Clarifications and Suggestions

8. PromoteLearner Progress.Match check: The design correctly notes that Status() clones the progress map and should only be called within the admin handler. Recommend adding a note that the check must happen before proposing the ConfChangeAddNode, not lazily.

9. min_applied_index = 0 footgun: The "0 skips the check" convention is an easy footgun for operators copy-pasting examples. Since this is a new RPC with no existing callers, requiring an explicit opt-in for the skip (e.g., a separate boolean field skip_min_applied_index_check) would be safer. The current 0-means-unchecked convention is also inconsistent with prevIndex where 0 means "don't check the config index" — in practice both are footguns but the proposal already acknowledges this; at least call it out in the open questions.

10. Test plan gap: snapshot-during-learner-catch-up: §5.5 includes a WAL purge test for learner mid-catch-up. Also include a test for the case where a snapshot is triggered while the learner is still receiving the previous snapshot (i.e., concurrent snapshot sends to a slow learner). This exercises snapshot_spool.go's dedup/cancel logic.

11. Design doc reference: §3.2 references engine.go:1817-1830 and engine.go:3080-3106 by line number. These will drift immediately once implementation starts. Either use function names + a brief excerpt, or note in the document that line numbers refer to the pre-implementation state.


Alignment with Gemini Reviewer

The three high-priority gemini issues:

  • §352 (single-node stall): Confirmed and extended above — the refreshStatus path is the specific gap, not just recordQuorumAck.
  • §262 (peers file authority): Confirmed — the two-phase recovery path needs explicit documentation.
  • §321 (join-as-learner enforcement): Confirmed — needs a concrete enforcement model or explicit advisory-only scoping.

The §186 (Status() cost) note is minor and correctly assessed as acceptable for admin ops.


Summary

The design is well-motivated and the non-goal boundaries are correct. The milestone breakdown is reasonable. The main blocker before Milestone 1 implementation is resolving the four correctness issues above (especially the single-node fast-path stall and the confStateForPeers/v2-file gap) and clarifying the --raftJoinAsLearner enforcement model. Recommend updating the design doc with those clarifications before cutting code.

Iterate on the learner design doc to close the eleven points raised by
the gemini-code-assist and claude review passes on PR #676.

Correctness fixes:
- §4.6 enumerates all four len(e.peers) overloads (recordQuorumAck,
  refreshStatus single-node fast path, removePeer post-removal, plus a
  no-op audit of nextPeersAfterConfigChange). Introduces an
  event-loop-owned voterCount + isLearnerNode cache so the lease-read
  hot path stays O(1). The 1-voter+1-learner stall the reviewer flagged
  is the specific scenario the cache is sized to fix.
- §4.2 splits e.config.Servers ownership: upsertPeer keeps only the
  address cache; the apply loop rebuilds e.config.Servers from
  configurationFromConfState after every conf change.
- §4.3 documents two-phase recovery (open-phase reads addresses from
  v2 peers file; WAL-replay phase rebuilds the suffrage view from
  ConfState) and pins confStateForPeers as voters-only / bootstrap-only.
  Peer.Suffrage is a v2-file artifact, not a runtime authority.

Specification:
- §4.5 gives a concrete enforcement model for --raftJoinAsLearner:
  post-apply local detection with an alarm metric, not pre-propose
  veto. Calls out that the flag is an operator alarm, not a
  consensus-level enforcement.
- §4.1 PromoteLearner explicitly preconditions before propose; reads
  Progress from rawNode.Status() on the same goroutine that owns it.
- §3.3 enumerates len(e.peers) call sites at design time.

Tests + open questions:
- §5.5 adds a snapshot-during-learner-catch-up test for snapshot_spool.
- §5.5 splits the lease-read regression into two named cases
  (1v+1L fast path retained; 3v+1L denominator correct).
- §8 lists the min_applied_index=0 footgun as an explicit open
  question with the skip_min_applied_check alternative.
- §3.2 swaps line numbers for function-name + excerpt references so
  the doc stops drifting against the code immediately.
Owner Author

bootjp commented Apr 26, 2026

@claude review

Addressed the eleven points from the previous review pass — full breakdown in the commit message of 08f59c7. Key items:

  • §4.6 enumerates all four len(e.peers) overloads and introduces an apply-loop-owned voterCount + isLearnerNode cache so the 1-voter+1-learner lease-read stall is fixed at the call site, not just at recordQuorumAck. refreshStatus and removePeer use the same cache.
  • §4.2 splits e.config.Servers ownership: upsertPeer keeps only the address cache; the apply loop rebuilds e.config.Servers from configurationFromConfState after every conf change.
  • §4.3 documents the two-phase recovery and pins confStateForPeers as voters-only / bootstrap-only. Peer.Suffrage is a v2-file artifact, not a runtime authority — closes the divergence concern.
  • §4.5 gives a concrete enforcement model for --raftJoinAsLearner (post-apply local alarm, not pre-propose veto) — closes the underspecification.
  • §3.3 enumerates the len(e.peers) overloads at design time so future implementers don't have to discover them mid-implementation. Milestone 1 explicitly fixes all four in one PR.
  • §5.5 adds a snapshot-during-learner-catch-up test; lease-read regression test split into the two named scenarios.
  • §8 records the min_applied_index = 0 footgun as an open question with the safer skip_min_applied_check alternative.
  • §3.2 swaps line numbers for function-name + excerpt references.

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 4m 37s


Review of docs/design/2026_04_26_proposed_raft_learner.md (Round 2)

  • Gather context and understand request
  • Read updated design document
  • Verify all 11 previous review points are addressed
  • Check for any new issues introduced by the revisions
  • Post final review

All 11 points from the previous review pass are addressed. Details below, then two new issues surfaced by reading the code against the updated design.


Previous 11 points — verification

# Point Status
1 refreshStatus single-node fast-path stall with 1-voter+1-learner ✅ §4.6 item 2 introduces e.voterCount cache and explicitly names this "the single critical fix for the 1-voter+1-learner stall the reviewer flagged"
2 recordQuorumAck denominator uses len(e.peers) ✅ §4.6 item 1: replace with e.voterCount, early-reject acks when e.isLearnerNode[msg.From]
3 confStateForPeers not updated for learner entries in v2 peers file ✅ §4.3: confStateForPeers stays voters-only / bootstrap-only. Peer.Suffrage is a v2-file artifact; configurationFromConfState is the hot-path suffrage authority. The two writers are distinct and documented.
4 Dual e.config.Servers update paths diverge on suffrage ✅ §4.2 edit 4: upsertPeer keeps only address cache; apply loop rebuilds e.config.Servers via configurationFromConfState after every conf change. removePeer likewise stops splicing e.config.Servers.
5 --raftJoinAsLearner enforcement model underspecified ✅ §4.5: post-apply local alarm (not pre-propose veto). Rationale is sound — pre-propose veto would either require a network vetoing protocol or catastrophic mid-stream Apply rejection. Shutdown is explicitly rejected since the node already counts toward quorum once the conf change applies.
6 Authoritative conf-state source during recovery ✅ §4.3 "Authoritative source of suffrage during recovery": explicit two-phase recovery, snapshot+WAL wins, peers file catches up. The gap between snapshot index and persisted-peers index is explicitly addressed.
7 All len(e.peers) call sites enumerated ✅ §3.3 table enumerates all four; §4.6 addresses each; Milestone 1 fixes all four in one PR.
8 PromoteLearner Progress.Match check must be pre-propose, not lazy ✅ §4.1: "Both preconditions below run on the leader, on the single-threaded admin loop (handleAdmin), before calling rawNode.ProposeConfChange." Rationale included.
9 min_applied_index = 0 footgun ✅ §4.1 and §8 item 2a: documented as a known footgun, tracked as an open question with skip_min_applied_check alternative.
10 Test plan: snapshot-during-learner-catch-up ✅ §5.5: "Snapshot-during-learner-catch-up test: trigger a new FSM snapshot while a learner is still receiving the previous snapshot. Verifies snapshot_spool.go's dedup/cancel logic..."
11 Line numbers → function names in §3.2 ✅ §3.2: "Line numbers below reflect main at the propose date and will drift; the function names and the small excerpts are the stable references."

New issues

1. validateConfState breaks on restart after a learner is added — High priority

The design says (§4.2 edit 3):

"validateConfState (wal_store.go) accepts learner-only entries: len(conf.Learners) > 0 is no longer rejected."

That fix removes the len(conf.Learners) > 0 rejection at wal_store.go:419. But validateConfState also performs a voter-count comparison using confStateForPeers(peers) (wal_store.go:410-416):

expected := confStateForPeers(peers)              // emits ALL peers as voters
if len(conf.Voters) != len(expected.Voters) {     // compare counts
    return errors.Wrapf(errClusterMismatch, ...)
}

confStateForPeers adds every Peer to Voters regardless of suffrage. After a learner is added to e.peers and persisted to the v2 peers file, the peers slice passed to validateConfState from validateOpenPeers (engine.go:2539-2544) will contain 3 voter peers + 1 learner peer. confStateForPeers(peers) emits 4 voters. But the snapshot's ConfState.Voters has 3 voters (and the learner is in ConfState.Learners). The count mismatch triggers errClusterMismatch and the node fails to restart.

The scenario: leader has snapshot at index M (M >= persisted-peers index N), so validateOpenPeers calls validateConfState instead of short-circuiting at the persisted.Index > snapshot.Metadata.Index guard.

The fix is not just removing the Learners rejection — it also requires that the voter-count comparison in validateConfState only counts voter peers:

voterCount := 0
for _, p := range peers {
    if p.Suffrage != "learner" {
        voterCount++
    }
}
// or: use expected.Voters built only from voter-Peer entries

The design must specify this explicitly, since confStateForPeers is declared "unchanged" in §4.3.

2. e.voterCount not initialized at Open time — Medium priority

The design introduces e.voterCount and e.isLearnerNode as apply-loop-owned fields, written "after rawNode.ApplyConfChange returns the new ConfState". But refreshStatus() is called at engine.go:467 during Open, before the apply loop processes any entry. At that point, e.voterCount is 0 (zero value). The clusterSize <= 1 check in refreshStatus becomes 0 <= 1 = true, which falsely activates the single-node fast path for a 3-voter cluster on startup.

The parallel: e.config is initialized at engine.go:448 from the bootstrap ConfState:

config: configurationFromConfState(peerMap, prepared.disk.LocalSnap.Metadata.ConfState),

e.voterCount and e.isLearnerNode need the same treatment — derive them from prepared.disk.LocalSnap.Metadata.ConfState at Open time, adjacent to the config initialization. The design should call this out explicitly, since the apply-loop-only description gives the false impression that no Open-time initialization is needed.
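The suggested Open-time wiring can be sketched in a few lines. This is illustrative only: confState is a local stand-in for raftpb.ConfState, and learnerSetFromConfState is a hypothetical helper name whose real shape is up to the implementation.

```go
package main

import "fmt"

// confState is a local stand-in for raftpb.ConfState (Voters/Learners only).
type confState struct {
	Voters   []uint64
	Learners []uint64
}

// learnerSetFromConfState rebuilds the learner lookup from scratch on
// each call, never patched in place, so promoting a learner cannot
// leave a stale true entry behind.
func learnerSetFromConfState(cs confState) map[uint64]bool {
	m := make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		m[id] = true
	}
	return m
}

func main() {
	// At Open, derive both fields from the snapshot ConfState, adjacent
	// to the e.config init, so the first refreshStatus call never sees
	// the zero value voterCount == 0.
	snap := confState{Voters: []uint64{1, 2, 3}, Learners: []uint64{4}}
	voterCount := len(snap.Voters)
	isLearnerNode := learnerSetFromConfState(snap)
	fmt.Println(voterCount, isLearnerNode[4]) // prints: 3 true
}
```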


Minor notes

  • §8 numbering: Items 2 and 2a are both labeled 2 in the markdown source. Consider 2b for the existing item 2 and 2a for the footgun, or just renumber them 2/3 and shift down. Not blocking.

  • e.isLearnerNode rebuild vs. patch: The design says both fields are "derived directly from conf.Voters / conf.Learners", implying a full rebuild on each conf change (correct). A note that the map is rebuilt from scratch — not incrementally patched — would prevent an implementer from writing e.isLearnerNode[nodeID] = true (which would leave stale entries after promotion).


Summary

The design is solid and the eleven previous points are all addressed with care. The two new issues above both require small additions to the design doc before Milestone 1 implementation starts: (1) clarify the validateConfState voter-count comparison and how it handles learner entries in the peers slice, and (2) add e.voterCount / e.isLearnerNode initialization to the Open-time setup alongside the existing e.config init.

claude[bot] Round 2 confirmed all 11 prior points, surfaced two new
issues plus minor cleanup:

- §4.2 edit 3: validateConfState voter-count comparison breaks on
  restart with a learner persisted in the v2 peers file, because
  confStateForPeers emits every Peer as a voter. Document that
  validateConfState builds its own expected voter set filtered by
  Suffrage != "learner" rather than re-using the cold-bootstrap helper.
  Keeps confStateForPeers minimal as the prior round documented.

- §4.6: e.voterCount and e.isLearnerNode must be initialized at Open
  time from prepared.disk.LocalSnap.Metadata.ConfState, adjacent to the
  existing e.config initialization. Without this, refreshStatus runs
  once during Open before the apply loop and reads zero values, falsely
  activating the single-node fast path on a 3-voter cluster startup.
  Also call out that isLearnerNode is rebuilt from scratch on each conf
  change, never patched, to prevent stale-true entries after promotion.

- §8 numbering: 2 / 2a → 2 / 3.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

@claude review

Addressed both new issues from Round 2 (commit dcf1337):

  • High — validateConfState voter-count mismatch on restart with learners: §4.2 edit 3 now documents that validateConfState builds its own expected voter set filtered by Suffrage != "learner". confStateForPeers stays minimal as Round 1 documented. (Resolution (a) from your two options.)

  • Medium — e.voterCount / e.isLearnerNode not initialized at Open: §4.6 now has an explicit "Initialization at Open time" subsection that wires both fields adjacent to the existing e.config init from prepared.disk.LocalSnap.Metadata.ConfState. Closes the false single-node-fast-path activation on 3-voter startup.

  • Minor: §4.6 calls out that isLearnerNode is rebuilt from scratch on each conf change, never patched — addresses the stale-true-after-promotion footgun. §8 numbering fixed (2/2a → 2/3).

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 5m 8s


Review of docs/design/2026_04_26_proposed_raft_learner.md (Round 3)

  • Gather context and understand request
  • Read updated design document (commit dcf1337)
  • Verify Round 2 issues are addressed
  • Check for new issues introduced by revisions
  • Post final review

Both Round 2 issues are addressed. Details below, then two new issues surfaced by reading the code against the updated design.


Round 2 issues — verification

# Point Status
1 (High) validateConfState voter-count mismatch on restart with learners ✅ §4.2 edit 3: Learners rejection removed; Resolution (a) filter + compare approach documented.
2 (Medium) e.voterCount / e.isLearnerNode not initialized at Open ✅ §4.6 "Initialization at Open time" subsection added with explicit code wiring both fields from prepared.disk.LocalSnap.Metadata.ConfState adjacent to the existing e.config init.
Minor isLearnerNode stale-true-after-promotion footgun ✅ §4.6 explicitly states the map is rebuilt from scratch on each conf change, not patched.
Minor §8 numbering (2/2a) ⚠️ Partially fixed — items that were 2 and 2a are now 2 and 3, but the item that was originally 3 (--raftBootstrapMembers) is still labeled 3, creating a duplicate. Items that should be 4 and 5 are still labeled 3 and 4. The tail of §8 needs a renumber.

New issues

1. validateConfState Resolution (a) has an implicit ordering requirement for learners — Minor

Resolution (a) in §4.2 edit 3 says validateConfState "compares both Voters and Learners element-wise against the snapshot's ConfState." The element-wise comparison (the existing pattern in wal_store.go:414-418) is order-sensitive: it checks conf.Voters[i] != expected.Voters[i]. Extending this to Learners requires that the learner entries in the peers list passed to validateConfState are in the same order as ConfState.Learners.

The voter-ordering invariant already holds implicitly — persistConfigState writes voters in an order that matches ConfState.Voters, and normalizePeers preserves it. But the design doesn't establish that the same ordering guarantee will hold for learners (there's no prior convention, since learners have never been persisted before).

The design should either:

  • Explicitly state that persistConfigState will write learner entries in the same order as ConfState.Learners (pin this as a contract on the v2 writer), OR
  • Use a set-based comparison instead of element-wise for the learner entries, which is more robust and avoids the ordering dependency entirely. The voter check could keep element-wise (to match existing behavior); the learner check could use a set: len(conf.Learners) != filteredLearnerCount || !allLearnersPresent(conf.Learners, learnerSet).


2. applyConfigChange / applyConfigChangeV2 don't receive ConfState — Minor

§4.6 says e.voterCount and e.isLearnerNode are updated "in applyConfigChange / applyConfigChangeV2, after rawNode.ApplyConfChange(...) returns the new ConfState." §4.2 edit 4 similarly says e.config.Servers is "recomputed from the post-change ConfState by a single call to configurationFromConfState inside applyConfigChange / applyConfigChangeV2."

But the current signatures (engine.go:1803):

func (e *Engine) applyConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, index uint64)
func (e *Engine) applyConfigChangeV2(cc raftpb.ConfChangeV2, index uint64)

don't receive confState. The confState is available at the call sites (engine.go:1740 and engine.go:1752), not inside those functions.

The implementation will need to choose: (a) extend the signatures of applyConfigChange / applyConfigChangeV2 to accept confState, or (b) update e.voterCount, e.isLearnerNode, and e.config.Servers inline at the call site (between rawNode.ApplyConfChange and applyConfigChange). The design should specify which approach is intended so the implementation doesn't improvise. Note: nextPeersAfterConfigChange already has the ConfState parameter pattern (engine.go:3108), so option (a) is consistent with existing code shape.



Summary

The design is very close to implementation-ready. The two Round 2 correctness issues are properly resolved, and §4.6 is now the clearest section of the document. The two new issues above are both minor specification gaps that an implementer would need to resolve before writing the first line of code. Addressing either in the design doc (ordering contract vs. set-based comparison; signature extension vs. inline update) will close the loop.

claude[bot] Round 3 confirmed both Round-2 issues; flagged two minor
specification gaps and a residual numbering issue:

- §4.2 edit 3 Resolution (a): switch the learner check to a set-based
  comparison instead of element-wise. There is no prior convention
  pinning learner ordering in persisted state (learners have never
  been persisted before), and a set comparison is strictly more robust
  than retroactively introducing an ordering contract on the v2
  writer. The voter check stays element-wise, matching the existing
  well-established invariant.

- §4.2 edit 4: extend applyConfigChange / applyConfigChangeV2
  signatures to take confState raftpb.ConfState, mirroring the
  existing nextPeersAfterConfigChange shape. This is the natural
  threading point for the suffrage-aware updates of e.config.Servers,
  e.voterCount, and e.isLearnerNode. Document that suffrage updates
  run after the per-peer cleanup helpers so the apply-loop ordering
  is deterministic.

- §8: full renumber. Items were 1, 2, 3, 3, 4 — now 1, 2, 3, 4, 5.
@bootjp
Owner Author

bootjp commented Apr 26, 2026

@claude review

Addressed all three Round-3 items (commit 34c84dc):

  • Minor — validateConfState Resolution (a) learner ordering: switched the learner check to set-based comparison. No prior convention pins learner ordering in persisted state (learners have never been persisted before), and a set comparison is strictly more robust than introducing a new ordering contract on the v2 writer. Voter check stays element-wise to match the existing well-established invariant.

  • Minor — applyConfigChange / applyConfigChangeV2 signatures: chose option (a). §4.2 edit 4 now specifies the extended signature (..., confState raftpb.ConfState) mirroring nextPeersAfterConfigChange. Includes an explicit ordering note: suffrage-aware updates run after the per-peer cleanup helpers so the apply-loop ordering is deterministic.

  • §8 renumber: items were 1, 2, 3, 3, 4 → now 1, 2, 3, 4, 5.

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 15s


Review of docs/design/2026_04_26_proposed_raft_learner.md (Round 4)

  • Gather context and understand request
  • Read updated design document (commit 34c84dc)
  • Verify Round-3 issues are addressed
  • Check for new issues introduced by revisions
  • Post final review

All three Round-3 items are addressed. Details below, then two new issues surfaced by reading the updated §4.2 against §4.6 and §5.2.


Round-3 issues — verification

# Point Status
1 (Minor) validateConfState Resolution (a) learner ordering — set-based comparison ✅ §4.2 edit 3: voter check stays element-wise; learner check is set-based with allLearnersPresent. Set comparison avoids a new ordering contract on the v2 writer.
2 (Minor) applyConfigChange / applyConfigChangeV2 signatures ✅ §4.2 edit 4: chose option (a). Extended signatures accept confState raftpb.ConfState, mirroring nextPeersAfterConfigChange. Ordering note present: suffrage-aware updates run after per-peer cleanup helpers.
3 (Minor) §8 numbering (1, 2, 3, 3, 4 → 1, 2, 3, 4, 5) ✅ §8 now has five distinct numbered items.

New issues

1. Set-based learner check formula has a logic error — High priority

§4.2 edit 3 specifies the learner check as:

len(conf.Learners) != filteredLearnerCount && allLearnersPresent(conf.Learners, learnerSet)

If this expression is the error condition (return errClusterMismatch when true), the && is wrong. Consider:

| conf.Learners | peers-file learners | Expected | Formula result |
| --- | --- | --- | --- |
| [A, B] | [A, B] | valid | 2 != 2 && true = false |
| [A, B] | [A, C] (count same, members differ) | error | 2 != 2 && false = false ✗ mismatch undetected |
| [A, B] | [A] | error | 2 != 1 && false = false ✗ mismatch undetected |
| [A] | [A, B] | error | 1 != 2 && true = true |

Cases 2 and 3 — a same-count member divergence and a conf-learner not present in peers — are silently passed through by the &&. An implementer copying this formula would produce a broken validation that accepts a mismatched cluster on restart.

The correct error condition is:

len(conf.Learners) != filteredLearnerCount || !allLearnersPresent(conf.Learners, learnerSet)

Equivalently, the valid condition:

len(conf.Learners) == filteredLearnerCount && allLearnersPresent(conf.Learners, learnerSet)

The voter check (len(conf.Voters) != len(expected.Voters)) uses != as the error condition — the learner check should follow the same pattern, just with a set-membership predicate replacing element-wise order.
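The corrected check can be demonstrated with a runnable sketch. The types and helper names here are stand-ins for the design's pseudo-code (peer for the persisted entry; filteredLearnerCount appears as len(learnerSet)), not elastickv's actual API:

```go
package main

import "fmt"

// peer is a local stand-in for a persisted v2-peers-file entry.
type peer struct {
	ID       uint64
	Suffrage string // "voter" or "learner"
}

// allLearnersPresent reports whether every learner in the ConfState
// appears in the set built from the peers file.
func allLearnersPresent(confLearners []uint64, learnerSet map[uint64]bool) bool {
	for _, id := range confLearners {
		if !learnerSet[id] {
			return false
		}
	}
	return true
}

// learnerMismatch is the corrected error condition: it fires on EITHER
// a count mismatch OR a set-membership mismatch (the || form, not &&).
func learnerMismatch(confLearners []uint64, peers []peer) bool {
	learnerSet := make(map[uint64]bool)
	for _, p := range peers {
		if p.Suffrage == "learner" {
			learnerSet[p.ID] = true
		}
	}
	return len(confLearners) != len(learnerSet) ||
		!allLearnersPresent(confLearners, learnerSet)
}

func main() {
	peers := []peer{{1, "voter"}, {2, "voter"}, {4, "learner"}}
	fmt.Println(learnerMismatch([]uint64{4}, peers))    // matched: false
	fmt.Println(learnerMismatch([]uint64{5}, peers))    // same count, member differs: true
	fmt.Println(learnerMismatch([]uint64{4, 5}, peers)) // count mismatch: true
}
```

The second and third calls are exactly the two cases the && form silently accepted.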

2. e.voterCount update ordering contradicts itself between §4.2 and §4.6 / §5.2 — Medium priority

§4.2 edit 4 says:

"Inside the body, suffrage-aware updates run after the existing per-peer cleanup helpers (upsertPeer / removePeer) so address cache mutations and voter-count rebuilds happen in a deterministic order on the single-threaded apply loop"

But §4.6 item 3 says:

"the apply loop updates e.voterCount from the post-change ConfState before removePeer is called for any address-cache cleanup, so removePeer simply reads e.voterCount."

And §5.2 confirms the same:

"the apply loop updates e.voterCount and e.isLearnerNode before invoking the per-peer cleanup helpers (upsertPeer, removePeer)"

These are directly contradictory. An implementer following §4.2 (update AFTER) and reading removePeer would find that e.voterCount still reflects the pre-change state when removePeer runs — which means removePeer cannot "simply read e.voterCount" as §4.6 promises. The post-removal voter count would be stale by one.

The correct ordering is voterCount update before removePeer / upsertPeer, as §4.6 and §5.2 state. The "after" in §4.2 edit 4 is wrong and should be changed to "before". The rationale for "before": removePeer reads e.voterCount to compute the post-removal ack-tracker threshold — this requires seeing the already-committed conf change, not the pre-change count.


Summary

The design has absorbed four rounds of review and is close to implementation-ready. The two new issues above need to be resolved in the doc before code is written — the set-based formula bug in particular would propagate directly into a broken validateConfState if copied verbatim by an implementer.

claude[bot] Round 4 confirmed all three Round-3 items; flagged two
genuine bugs in the prose:

- §4.2 edit 3: the set-based learner check formula used && where it
  should have been ||. The error condition needs to fire on EITHER
  count mismatch OR set-membership mismatch — the && form silently
  accepts same-count member divergence and conf-learner-missing-from-peers
  cases. Replaced with the correct || form, plus an explicit valid-
  condition restatement and a note on which mismatch cases the form
  catches.

- §4.2 edit 4: the doc said suffrage-aware updates run after per-peer
  cleanup, but §4.6 and §5.2 say they run before. The contradiction
  resolves to two different orderings:
  - voterCount / isLearnerNode update BEFORE upsertPeer / removePeer
    (so removePeer reads the post-change count for its ack-tracker
    threshold).
  - e.config.Servers rebuild AFTER upsertPeer / removePeer (so it
    reads the post-cleanup address cache).
  Spelled out as a four-step apply-loop sequence to make the
  ordering unambiguous for an implementer.
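The four-step sequence above can be sketched as a minimal Go program. All names here (engine, refreshVoterCache, refreshConfigServers) follow the wording in this thread but are illustrative stand-ins, not elastickv's real types:

```go
package main

import "fmt"

// confState is a local stand-in for raftpb.ConfState.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

type engine struct {
	voterCount    int
	isLearnerNode map[uint64]bool
	peers         map[uint64]string // nodeID -> address cache
	servers       []uint64          // stand-in for e.config.Servers
}

// Step 1: rebuild the suffrage cache from the post-change ConfState,
// BEFORE per-peer cleanup, so removePeer reads the post-change count.
func (e *engine) refreshVoterCache(cs confState) {
	e.voterCount = len(cs.Voters)
	m := make(map[uint64]bool, len(cs.Learners)) // rebuilt, never patched
	for _, id := range cs.Learners {
		m[id] = true
	}
	e.isLearnerNode = m
}

// Step 3: rebuild config servers AFTER per-peer cleanup, so it reads
// the post-cleanup address cache.
func (e *engine) refreshConfigServers(cs confState) {
	e.servers = e.servers[:0]
	for _, id := range append(append([]uint64{}, cs.Voters...), cs.Learners...) {
		if _, ok := e.peers[id]; ok {
			e.servers = append(e.servers, id)
		}
	}
}

func (e *engine) applyConfChange(cs confState, removed uint64) {
	e.refreshVoterCache(cs)    // step 1: suffrage cache first
	delete(e.peers, removed)   // step 2: per-peer cleanup (removePeer stand-in)
	e.refreshConfigServers(cs) // step 3: rebuild from post-cleanup cache
	// step 4: existing bookkeeping (configIndex etc.) would go here
}

func main() {
	e := &engine{peers: map[uint64]string{1: "a", 2: "b", 3: "c"}}
	e.applyConfChange(confState{Voters: []uint64{1, 2}}, 3)
	fmt.Println(e.voterCount, len(e.servers)) // prints: 2 2
}
```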
@bootjp
Owner Author

bootjp commented Apr 26, 2026

@claude review

Both Round-4 issues were genuine bugs in my prose. Fixed in commit a9b918f:

  • High — set-based formula && → ||: §4.2 edit 3 now states the error condition correctly: len mismatch || !allLearnersPresent. Includes the equivalent valid-condition form and an explicit note on the two mismatch cases the && form would have silently accepted (same-count member divergence; conf-learner missing from peers).

  • Medium — ordering contradiction: §4.2 edit 4 now spells out the apply-loop sequence as four explicit steps. The two suffrage-aware updates have different orderings:

    1. voterCount / isLearnerNode update before upsertPeer/removePeer (so removePeer reads the post-change count for its ack-tracker threshold — matches §4.6 and §5.2).
    2. e.config.Servers rebuild after upsertPeer/removePeer (so it reads the post-cleanup address cache).

    Step-numbered sequence makes this unambiguous for an implementer.

@claude

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 3s


Review of docs/design/2026_04_26_proposed_raft_learner.md (Round 5)

  • Gather context and understand request
  • Read updated design document (commit a9b918f)
  • Verify Round-4 issues are addressed
  • Check for new issues introduced by revisions
  • Post final review

Both Round-4 issues are correctly resolved. Details below.


Round-4 issues — verification

# Point Status
1 (High) Set-based formula && → || ✅ §4.2 edit 3 now states the error condition correctly as len(conf.Learners) != filteredLearnerCount || !allLearnersPresent(conf.Learners, learnerSet), includes the equivalent valid-condition form, and notes the two mismatch cases the && form silently accepted.
2 (Medium) Ordering contradiction between §4.2, §4.6, §5.2 ✅ §4.2 edit 4 now has an explicit 4-step sequence that resolves the contradiction: (1) recompute e.voterCount / e.isLearnerNode before upsertPeer / removePeer, (2) run helpers, (3) rebuild e.config.Servers after helpers, (4) existing bookkeeping. §4.6 item 3 ("the apply loop updates e.voterCount ... before removePeer") and §5.2 ("updates e.voterCount ... before invoking the per-peer cleanup helpers") are now consistent with §4.2.

New issues

No new correctness issues introduced by the revisions. Two very minor documentation gaps that an implementer would need to infer, noted here for completeness but not blocking:

1. learnerSet construction is implied, not specified — Very minor

§4.2 edit 3 uses learnerSet and filteredLearnerCount in the pseudo-code without defining them. The reader must infer: filteredLearnerCount = count of peers where Suffrage == "learner" and learnerSet = {nodeID | peer in peers, peer.Suffrage == "learner"}. This is strongly implied by context and a competent implementer would get it right, but a one-line comment in the validation helper would close the ambiguity at implementation time.

2. learnerSetFromConfState helper referenced in §4.6 but not defined — Very minor

isLearnerNode: learnerSetFromConfState(prepared.disk.LocalSnap.Metadata.ConfState),

The helper's return type (map[uint64]bool), key semantics (learner node IDs → true), and implementation (iterate conf.Learners, set each to true) are obvious from context. No action needed before implementation.

3. Test plan gap: validateConfState mixed-peers mismatch — Informational

§5.5 does not include an explicit unit test for the regression validateConfState edit 3 guards against — specifically, that a node with a mixed voter+learner peers file and a snapshot whose ConfState correctly separates them does not trigger errClusterMismatch on restart. The conformance test ("including after restart") exercises this path end-to-end, so the regression is covered indirectly, but a focused unit test for validateConfState itself (isolated from the multi-node setup) would make it easier to catch regressions in the validation helper in isolation. Not blocking — the end-to-end test is sufficient.


Summary

The design has successfully absorbed five rounds of review. Both high-priority correctness issues from the previous four rounds are resolved, and the document is now internally consistent across §4.2, §4.6, and §5.2. No new correctness issues were introduced by the Round-5 revisions. The design is ready to proceed to Milestone 1 implementation.

bootjp added 5 commits April 27, 2026 02:56
Per learner design doc §4.3. Adds Peer.Suffrage as a v2-peers-file
artifact and bumps the persisted etcd-raft-peers.bin format to v2
with a per-peer suffrage byte (0=voter, 1=learner). The v1 reader is
preserved for upgrade compatibility — v1 files are read as all-voter,
which is exactly the current world.

Suffrage on the Peer struct is intentionally NOT consulted by
confStateForPeers (cold bootstrap stays voters-only) or
configurationFromConfState (hot path reads ConfState directly). It
exists solely to round-trip through the v2 file across restarts so
the post-WAL-replay ConfState view and the operator-visible peers
file stay consistent. ConfState remains the authoritative source of
suffrage truth at runtime.

Tests cover v2 round-trip with mixed voter/learner peers, v1
forward-compat reading as all-voter, and rejection of unknown file
versions.
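The v2 format idea from this commit — version header plus a per-peer suffrage byte, with v1 input read as all-voter — can be illustrated with a toy codec. This is NOT the real etcd-raft-peers.bin layout (addresses and everything else are omitted); it only demonstrates the version-gated decode shape described above:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// peer is a toy stand-in: real entries carry more than an ID.
type peer struct {
	ID      uint64
	Learner bool
}

func encodeV2(peers []peer) []byte {
	var buf bytes.Buffer
	buf.WriteByte(2) // format version
	for _, p := range peers {
		binary.Write(&buf, binary.BigEndian, p.ID)
		if p.Learner {
			buf.WriteByte(1) // suffrage byte: 1 = learner
		} else {
			buf.WriteByte(0) // 0 = voter
		}
	}
	return buf.Bytes()
}

func decode(b []byte) ([]peer, error) {
	if len(b) == 0 {
		return nil, fmt.Errorf("empty peers file")
	}
	version, body := b[0], b[1:]
	var peers []peer
	switch version {
	case 1: // v1 compat: no suffrage byte, read as all-voter
		for len(body) >= 8 {
			peers = append(peers, peer{ID: binary.BigEndian.Uint64(body[:8])})
			body = body[8:]
		}
	case 2: // v2: ID followed by per-peer suffrage byte
		for len(body) >= 9 {
			peers = append(peers, peer{
				ID:      binary.BigEndian.Uint64(body[:8]),
				Learner: body[8] == 1,
			})
			body = body[9:]
		}
	default: // unknown versions are rejected, as the tests above require
		return nil, fmt.Errorf("unknown peers file version %d", version)
	}
	return peers, nil
}

func main() {
	in := []peer{{ID: 1}, {ID: 4, Learner: true}}
	out, err := decode(encodeV2(in))
	fmt.Println(err, out[1].Learner) // prints: <nil> true
}
```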
Per learner design doc sections 4.2 and 4.6. Wires the dormant learner
scaffolding without exposing any operator-facing API yet -- the
engine now correctly applies ConfChangeAddLearnerNode in the live
path, reports suffrage end-to-end, and keeps learners out of the
voter-quorum denominator.

Engine:
- New apply-loop-owned voterCount + isLearnerNode cache, initialized
  at Open from the snapshot ConfState (Voters / Learners) and rebuilt
  from scratch on every conf change (never patched, so promotion of
  a learner cannot leak a stale isLearnerNode entry).
- applyConfigChange / applyConfigChangeV2 take the post-change
  ConfState (mirroring nextPeersAfterConfigChange) and run a
  four-step sequence: refreshVoterCache, per-peer cleanup,
  refreshConfigServers, configIndex bookkeeping. Step ordering
  matters: removePeer reads voterCount for the post-removal
  ack-tracker threshold, so the cache must be refreshed first;
  configurationFromConfState reads the post-cleanup e.peers, so the
  config rebuild must run after.
- applyConfigPeerChange routes ConfChangeAddLearnerNode through
  applyAddedPeer (suffrage is an attribute of ConfState, not e.peers).
- configurationFromConfState walks both Voters and Learners, emitting
  the right Suffrage string for each. upsertPeer / removePeer no
  longer touch e.config.Servers -- the apply loop owns it now via
  refreshConfigServers. Removed the dead upsertConfigServer /
  removeConfigServer / sortConfigServers / serverNodeID helpers.

len(e.peers) overloads (sections 4.6 and 3.3 audit):
- recordQuorumAck uses voterCount and early-rejects acks from
  learners. The 1-voter+1-learner stall the design flagged is the
  specific scenario this fixes.
- refreshStatus single-node fast path uses voterCount.
- removePeer uses postRemovalVoterCount instead of len(e.peers) for
  the ack-tracker threshold.
- nextPeersAfterConfigChange already only walks addresses; audited
  no-op.

validateConfState (section 4.2 edit 3):
- Stops rejecting non-empty Learners. Joint-consensus markers
  (VotersOutgoing, LearnersNext, AutoLeave) stay rejected.
- Builds expected voter slice filtered by Suffrage; voter check stays
  element-wise.
- Learner check is set-based, rejecting both same-count member
  divergence and missing-from-peers.
Per learner design doc section 4.1. The Admin interface and the etcd
backend now expose the operator-facing API for learner membership
mutations. No proto / CLI / flag changes yet -- those land in M2.

Engine API:
- Admin.AddLearner attaches a non-voting replica via V1
  ConfChangeAddLearnerNode. Mirrors AddVoter on the same submit path.
- Admin.PromoteLearner promotes via V1 ConfChangeAddNode. Two
  preconditions run on the leader (single-threaded admin loop) BEFORE
  the propose so a failed precondition does not enter the log:
    1. target must currently be in ConfState.Learners (not Voters);
    2. when minAppliedIndex is non-zero, leader-side
       Progress.Match >= minAppliedIndex.
  rawNode.Status() is read directly inside the admin handler -- same
  goroutine that owns rawNode -- so no clone or lock is required.
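The two preconditions can be sketched as a pure function. The sentinel error names match the ones this commit lists; the status type and the function itself are illustrative stand-ins for the leader-side check, not the actual elastickv code:

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors named in the commit; everything else is a sketch.
var (
	errPromoteLearnerNotLearner  = errors.New("target is not a learner")
	errPromoteLearnerNoProgress  = errors.New("no leader-side progress for target")
	errPromoteLearnerNotCaughtUp = errors.New("target not caught up")
)

// status is a stand-in for the relevant slice of rawNode.Status().
type status struct {
	Learners map[uint64]bool
	Match    map[uint64]uint64 // leader-side Progress.Match per node
}

// checkPromotePreconditions runs on the leader's single-threaded admin
// loop BEFORE ProposeConfChange, so a failed check never enters the log.
func checkPromotePreconditions(st status, target, minAppliedIndex uint64) error {
	if !st.Learners[target] { // 1. must currently be a learner
		return errPromoteLearnerNotLearner
	}
	if minAppliedIndex == 0 { // zero skips the catch-up gate (the §8 footgun)
		return nil
	}
	match, ok := st.Match[target]
	if !ok {
		return errPromoteLearnerNoProgress
	}
	if match < minAppliedIndex { // 2. leader-side Progress.Match gate
		return errPromoteLearnerNotCaughtUp
	}
	return nil
}

func main() {
	st := status{Learners: map[uint64]bool{4: true}, Match: map[uint64]uint64{4: 90}}
	fmt.Println(checkPromotePreconditions(st, 4, 100)) // prints: target not caught up
	fmt.Println(checkPromotePreconditions(st, 4, 50))  // prints: <nil>
}
```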

etcd backend:
- New adminAction values and a proposeMembershipChange helper that
  factors out the encode + storePendingConfig + ProposeConfChange
  dance shared by AddVoter / AddLearner / PromoteLearner.
- New sentinel errors: errPromoteLearnerNotLearner,
  errPromoteLearnerNoProgress, errPromoteLearnerNotCaughtUp. All
  three surface as FailedPrecondition through the existing
  adminError mapping in internal/raftadmin.
- submitAdminEx threads the richer adminRequest (now carrying
  minAppliedIndex) through the same channel; submitAdmin keeps its
  old shape for the unchanged callers.

Tests (internal/raftengine/etcd/learner_test.go):
- AddLearner happy path: suffrage reported correctly on both peers,
  writes commit on a 1-voter cluster (its own quorum) and replicate
  to the learner.
- Promote happy path: AddLearner -> warm a write -> PromoteLearner ->
  Configuration() flips to two voters.
- PromoteLearner rejects non-learner targets pre-propose.
- PromoteLearner rejects not-caught-up targets via minAppliedIndex.
- Lease-read regression: 1-voter + 1-learner stays on the
  single-node fast path (LastQuorumAck stays warm).
The Admin interface gained AddLearner / PromoteLearner in the
previous commit; without these stubs the existing fakeEngine no
longer satisfies the interface, which silently bypasses the admin
type-assertion in NewServer and surfaces as Unimplemented errors in
TestServerMapsEngineAdminMethods.

The fake records calls in matching slices so future tests for the
Server learner RPCs (M2) can assert on them.


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/raftengine/etcd/engine.go (1)

2654-2657: ⚠️ Potential issue | 🟡 Minor

Single-node fast-path campaign is skipped when learners are present.

shouldCampaignSingleNode returns true only when len(cfg.Servers) == 1. After this PR, cfg.Servers includes learners, so a node restarting as the sole voter in a 1-voter + N-learner cluster will not take the fast-path Campaign() and instead waits a full electionTimeout (~1s) for the standard election path. Functionally correct, but a minor startup-latency regression and a behavior change versus today.

Suggested fix: count voters via e.voterCount (already wired up post-Open) or filter cfg.Servers by Suffrage == SuffrageVoter:

♻️ Proposed fix
 func (e *Engine) shouldCampaignSingleNode() bool {
 	cfg := e.currentConfiguration()
-	return len(cfg.Servers) == 1 && cfg.Servers[0].ID == e.localID
+	voterCount := 0
+	var soleVoter raftengine.Server
+	for _, s := range cfg.Servers {
+		if s.Suffrage == SuffrageVoter {
+			voterCount++
+			soleVoter = s
+		}
+	}
+	return voterCount == 1 && soleVoter.ID == e.localID
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/engine.go` around lines 2654 - 2657,
shouldCampaignSingleNode currently counts all entries in cfg.Servers (including
learners) so the single-node fast-path is skipped when learners exist; update
shouldCampaignSingleNode to consider only voters by either using e.voterCount
(available after Open) or by filtering cfg.Servers for entries with
Server.Suffrage == SuffrageVoter and then returning true when the voter count ==
1 and that sole voter’s ID matches e.localID; locate and modify the
shouldCampaignSingleNode method to perform this voter-only check (reference:
shouldCampaignSingleNode, e.voterCount, cfg.Servers, SuffrageVoter).
🧹 Nitpick comments (6)
internal/raftengine/etcd/peer_metadata_test.go (2)

87-103: Test name doesn't match the assertion.

TestPersistedPeersV2RewritesAfterV1Read never reads a v1 file — it just calls savePersistedPeers once and verifies the writer emitted v2. Consider renaming to something like TestPersistedPeersWriterAlwaysEmitsV2 so future readers don't go hunting for the v1→v2 migration step that isn't there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/peer_metadata_test.go` around lines 87 - 103, Rename
the test TestPersistedPeersV2RewritesAfterV1Read to a name that matches its
behavior (e.g., TestPersistedPeersWriterAlwaysEmitsV2) and update any in-test
comment text to reflect that it verifies the writer emits v2; locate the test
function by its current name TestPersistedPeersV2RewritesAfterV1Read and adjust
references if applicable, leaving the calls to savePersistedPeers,
readPeersFileHeader, peersFileName and the assertion against peersFileVersionV2
unchanged.

13-121: Missing coverage for the unknown-suffrage-byte path.

readPersistedPeer rejects any v2 suffrage byte other than 0 or 1 (peer_metadata.go:181-183), but no test exercises that error branch. Consider adding a fixture that writes a v2 entry with suffrage = 0xFF and asserts LoadPersistedPeers returns an error — it pins the validation that prevents silent voter coercion if a future binary writes a new suffrage variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/peer_metadata_test.go` around lines 13 - 121, Add a
test (e.g., TestPersistedPeersV2UnknownSuffrage) that uses replaceFile to
hand-craft a v2 peers file: write peersFileMagic, writeU32(peersFileVersionV2),
writeU64(index), writeU32(count), then for a peer writeU64(NodeID),
writeString(ID), writeString(Address) and then write a single suffrage byte
0xFF; call LoadPersistedPeers and assert it returns an error. This will exercise
the readPersistedPeer path that rejects unknown suffrage bytes and ensure the
validation remains enforced.
internal/raftadmin/server_test.go (1)

141-209: Remove unused learner admin fake methods and recording fields.

The proto service (proto/service.proto) does not define AddLearner or PromoteLearner RPCs, and the Server does not expose these methods. The fake engine's AddLearner, PromoteLearner method implementations and the addLearnerCalls, promoteLearnerCalls recording fields (internal/raftadmin/server_test.go lines 27–28, 106–118) are dead code and should be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftadmin/server_test.go` around lines 141 - 209, The test contains
dead code: remove the unused fake engine methods AddLearner and PromoteLearner
and their recording slices addLearnerCalls and promoteLearnerCalls from the
fakeEngine test fixture and any related test assertions; update the fakeEngine
type (references to AddLearner, PromoteLearner, addLearnerCalls,
promoteLearnerCalls) to only implement and record the RPCs actually used by
Server (e.g., AddVoter, RemoveServer, TransferLeadership) so the test compiles
and only exercises defined proto RPCs and Server methods.
internal/raftengine/etcd/engine.go (1)

1419-1442: Synchronous PromoteLearner precondition is sound; small ergonomic note on minAppliedIndex == 0.

Running both preconditions on the admin/run loop before ProposeConfChange is the right pattern — a failed precondition cannot leave a half-applied entry in the log. The use of e.rawNode.Status() is safe since this handler runs on the rawNode-owning goroutine; the Status() clone cost is acceptable for this rarely-invoked admin operation.

Per §8 open question 3 in the design doc, minAppliedIndex == 0 silently disables the primary safety check, while prevIndex == 0 is a benign no-op skip. Operators copy-pasting from existing scripts could inadvertently lose this guard. Consider tracking a follow-up to add an explicit skip_min_applied_check boolean (or refusing 0 and requiring an explicit sentinel) before declaring this surface stable in Milestone 2.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/engine.go` around lines 1419 - 1442, The code
silently treats minAppliedIndex==0 as “skip check”, which is error-prone; update
the PromoteLearner request handling to require an explicit sentinel instead of
implicit 0: add a boolean flag (e.g., SkipMinAppliedCheck) to the admin request
struct and replace the current minAppliedIndex==0 branch with a check that only
skips the applied-index verification when SkipMinAppliedCheck is true; otherwise
return a validation error when minAppliedIndex==0 (or require callers to supply
a non-zero minAppliedIndex). Change the handler that currently reads
req.minAppliedIndex and the callers that construct the request so they set the
new SkipMinAppliedCheck flag or provide a concrete minAppliedIndex, and keep the
existing use of e.rawNode.Status(), progress.Match, e.isLearnerNode and
e.proposeMembershipChange unchanged.
docs/design/2026_04_26_proposed_raft_learner.md (1)

1-941: Thorough design doc; the four-pass review history shows in the precision of §4.2 and §4.6.

The design satisfies the design-doc-first learning ("For any change beyond a single-file edit ... write a *_proposed_*.md design doc first"), and the iterative refinements (set-based learner check, ordering-point split between voter-cache refresh and config-server rebuild, refreshVoterCache rebuild-from-scratch contract) are all reflected in the implementation.

Two small polish items:

  • Line 60 and Line 19 reference specific engine.go line numbers (engine.go:2656, engine.go:3096, engine.go:174). The doc acknowledges line-number drift on Line 128, but for these forward references in §2 / §3.2 you may want to switch to function-name + brief excerpt the same way later sections do, to stay consistent with the round-1 fix the author mentioned in the PR notes.
  • §8 open question 3 (Lines 909–917) flagging the min_applied_index == 0 footgun is the right thing to track explicitly; consider promoting it to a Milestone-2 gate so the operator surface doesn't ship the ergonomic hazard by default.

Based on learnings: "For any change beyond a single-file edit ... write a *_proposed_*.md design doc first and land it before implementation."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/2026_04_26_proposed_raft_learner.md` around lines 1 - 941,
Replace brittle line-number references with stable function-name + brief excerpt
references and promote the min_applied_index==0 footgun to a Milestone-2 gate:
update the two forward refs in §2/§3.2 that currently say "engine.go:2656",
"engine.go:3096", "engine.go:174" to instead cite the corresponding function
names (e.g., the methods or handlers those offsets refer to) plus a short
excerpt of the behavior, and in §8 (open question 3) move the discussion about
min_applied_index == 0 from an open question into the Milestone-2 checklist (or
add an explicit Milestone-2 acceptance criterion) so the operator-facing RPC
surface cannot ship without an approved ergonomics decision about requiring an
explicit skip flag or similar.
internal/raftengine/etcd/learner_test.go (1)

17-209: Solid end-to-end coverage; the negative cases pin down the precondition contract.

The five tests directly map to the design doc's M1 exit criteria and the §4.6 lease-read regression scenarios. Using sentinel matching with errors.Is (Lines 145, 170) for the precondition errors keeps them resilient against future error-message tweaks.

Per coding guidelines (prefer table-driven test cases), TestPromoteLearnerRejectsNonLearner and TestPromoteLearnerRejectsNotCaughtUp share nearly identical scaffolding and could collapse into a single table-driven test with wantErr sentinels. Optional — readability of standalone tests is also fine.

As per coding guidelines: "Co-locate test files in the same package and use table-driven test patterns."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/learner_test.go` around lines 17 - 209, Tests
TestPromoteLearnerRejectsNonLearner and TestPromoteLearnerRejectsNotCaughtUp
duplicate setup and assertion logic; collapse them into a single table-driven
test (e.g., TestPromoteLearnerPreconditionCases) that iterates over cases with
fields {name, setupAction, wantErr} to exercise the two failure scenarios. In
each case reuse the existing scaffolding (newTransportTestNodes,
startTransportTestServers, openTransportTestNode, waitForLeaderNode) and perform
either AddVoter or AddLearner + unreachable-index promotion as before, then call
leader.engine.PromoteLearner(...) and assert errors.Is(err, wantErr) using the
sentinel errors errPromoteLearnerNotLearner and errPromoteLearnerNotCaughtUp;
keep each case independent via t.Run and the same ctx/cancel pattern. Ensure
test names reflect the case and remove the original two standalone tests.


📥 Commits

Reviewing files that changed from the base of the PR and between b1b7a83 and 10f38b6.

📒 Files selected for processing (11)
  • docs/design/2026_04_26_proposed_raft_learner.md
  • internal/raftadmin/server_test.go
  • internal/raftengine/engine.go
  • internal/raftengine/etcd/engine.go
  • internal/raftengine/etcd/engine_test.go
  • internal/raftengine/etcd/learner_test.go
  • internal/raftengine/etcd/peer_metadata.go
  • internal/raftengine/etcd/peer_metadata_test.go
  • internal/raftengine/etcd/peers.go
  • internal/raftengine/etcd/persistence.go
  • internal/raftengine/etcd/wal_store.go

bootjp added 2 commits April 27, 2026 16:27
Per learner design doc Milestone 2. Operator-facing surface for the
M1 engine API: gRPC RaftAdmin service, internal/raftadmin server
wiring, and the cmd/raftadmin CLI subcommands.

Proto:
- service.proto: AddLearner and PromoteLearner RPCs.
- RaftAdminAddLearnerRequest mirrors AddVoterRequest.
- RaftAdminPromoteLearnerRequest carries id, previous_index, and
  min_applied_index. min_applied_index = 0 skips the catch-up
  precondition (kept for symmetry with previous_index but discouraged
  in production -- the runbook recommends a recent leader commit).

internal/raftadmin/server.go:
- AddLearner / PromoteLearner methods route through the existing
  Admin interface. Reuses the adminError code mapping so engine-level
  FailedPrecondition (errPromoteLearnerNotLearner /
  errPromoteLearnerNotCaughtUp / errPromoteLearnerNoProgress) surface
  unchanged.
- fakeEngine and TestServerMapsEngineAdminMethods extended.

cmd/raftadmin:
- add_learner <id> <address> [previous_index]
- promote_learner <id> [previous_index] [min_applied_index]
- Updated usage strings and the master fallback list.
- Tests assert that both subcommands print "index: N" on success.
Per learner design doc Milestone 2 (operator surface) and section 4.5
(post-apply alarm enforcement model).

Flag plumbing:
- main.go: new --raftJoinAsLearner bool. Default false. The flag is
  an OPERATOR ALARM, not a consensus-level enforcement: it does not
  prevent the leader from issuing AddVoter for the joiner. If a
  post-apply ConfState lists this node in Voters, an ERROR-level
  structured log fires and JoinRoleViolationCount increments.
- raftengine.FactoryConfig.JoinAsLearner threads the flag through
  the engine factory.
- etcd backend Engine struct stores joinAsLearner and a one-shot
  joinAlarmFired latch (atomic.Bool) so the alarm fires at most once
  per process per engine.

Apply-loop alarm hook:
- alarmIfJoinedAsVoter() runs at the end of applyConfigChange,
  applyConfigChangeV2, AND applyReadySnapshot. The snapshot path is
  important: a brand-new joiner often catches the leader's snapshot
  before any conf-change entries land, so the alarm needs the
  snapshot ConfState as much as the entry ConfState.
- applyReadySnapshot also now refreshes voterCount / isLearnerNode
  from snapshot.Metadata.ConfState so post-snapshot reads of those
  fields (recordQuorumAck, refreshStatus) see the new voter set
  immediately, not just after the next conf-change entry.
- JoinRoleViolationCount() exposes the process-wide alarm counter
  for monitoring.

Behaviour test (§5.5 last bullet):
- TestLinearizableReadOnLearnerForwardsToLeader: a learner returns
  raftengine.ErrNotLeader from LinearizableRead today. Locking the
  surface in with a typed errors.Is check guards against any future
  regression that lets the learner accidentally answer
  LinearizableRead from local FSM (follower-served reads are an
  explicit non-goal of M2).
- TestJoinAsLearnerAlarmFiresWhenAddedAsVoter: boots a joiner with
  joinAsLearner=true, has the leader call AddVoter (the misuse the
  alarm catches), confirms JoinRoleViolationCount increments and the
  joiner stays running.

Runbook:
- docs/raft_learner_operations.md: end-to-end operator workflow,
  prerequisites, the five-step attach/promote/verify sequence,
  remove-learner via remove_server, common errors, observability,
  explicit non-goals (no cold-bootstrap as learner, no auto-promote,
  no follower-served reads in M2).
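The snapshot-path cache refresh described above (rebuild from the ConfState, never patch incrementally) can be sketched as follows. The confState struct here is a minimal stand-in for raftpb.ConfState with just the fields the rebuild needs.

```go
package main

import "fmt"

// confState mirrors only the raftpb.ConfState fields the rebuild reads.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

// rebuildVoterCache sketches the refresh: both cached fields are derived
// from scratch from the ConfState, so a snapshot restore and a conf-change
// entry converge on the same values.
func rebuildVoterCache(cs confState, localID uint64) (voterCount int, isLearner bool) {
	voterCount = len(cs.Voters)
	for _, id := range cs.Learners {
		if id == localID {
			isLearner = true
		}
	}
	return voterCount, isLearner
}

func main() {
	// A joiner restoring a 1-voter + 1-learner snapshot, where it is node 2.
	cs := confState{Voters: []uint64{1}, Learners: []uint64{2}}
	n, learner := rebuildVoterCache(cs, 2)
	fmt.Println(n, learner) // 1 true
}
```

With voterCount refreshed at snapshot time, recordQuorumAck and refreshStatus see the voter set immediately rather than after the next conf-change entry.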
@bootjp bootjp changed the title docs: propose raft learner support feat(raft): learner / non-voter membership (M1 engine + M2 operator surface) Apr 27, 2026
Real bug + nits surfaced by CodeRabbit on the M1+M2 branch.

Real bug (outside-diff):
- shouldCampaignSingleNode counted ALL servers in cfg.Servers
  (voters + learners). After learner support landed, a 1-voter +
  N-learner cluster on restart would skip the fast-path Campaign()
  and wait a full electionTimeout for the standard election. Now
  counts voters via Server.Suffrage. Same class as the four
  len(e.peers) overloads from §3.3 -- one I missed.

min_applied_index footgun (closes design §8 open question 3):
- proto: RaftAdminPromoteLearnerRequest gains skip_min_applied_check
  bool. The engine REJECTS min_applied_index == 0 with
  errPromoteLearnerMinAppliedZero unless skip_min_applied_check is
  also true, so a script that omits the catch-up bound gets a clean
  FailedPrecondition instead of silently disabling the safety check.
- raftengine.Admin.PromoteLearner gains the skipMinAppliedCheck
  parameter; etcd backend admin loop honors it.
- cmd/raftadmin promote_learner gains an optional
  skip_min_applied_check positional arg (true|false|1|0|yes|no).
- New unit test TestPromoteLearnerRejectsZeroMinAppliedWithoutSkip.
- Design doc §8: open question 3 marked Resolved. Runbook updated.

Test polish:
- TestPersistedPeersV2RewritesAfterV1Read renamed to
  TestPersistedPeersWriterAlwaysEmitsV2 (the body never reads a v1
  file -- it just verifies writer-side invariant).
- New TestPersistedPeersV2UnknownSuffrageRejected pins the 0xFF
  rejection path that future suffrage variants must not coerce
  through.

Doc polish:
- §2 / §3.2 / §7 line-number references replaced with stable
  function-name + brief excerpt references, matching the convention
  the doc already commits to in §3.2's preamble.
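The accepted spellings for the optional boolean argument (true|false|1|0|yes|no) could be parsed like this. This is a sketch of what the cmd/raftadmin helper might look like; the real implementation may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptionalBool accepts the documented spellings for the
// skip_min_applied_check positional argument and rejects anything else
// with a descriptive error.
func parseOptionalBool(s string) (bool, error) {
	switch strings.ToLower(s) {
	case "true", "1", "yes":
		return true, nil
	case "false", "0", "no":
		return false, nil
	default:
		return false, fmt.Errorf("invalid boolean %q (want true|false|1|0|yes|no)", s)
	}
}

func main() {
	v, err := parseOptionalBool("yes")
	fmt.Println(v, err == nil) // true true
}
```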
@bootjp
Owner Author

bootjp commented Apr 27, 2026

@claude review

Addressed CodeRabbit Round 1 (commit 2f3bff2):

Real bug:

  • shouldCampaignSingleNode counted all servers including learners. A 1-voter+N-learner cluster on restart would skip the fast-path Campaign() and wait for electionTimeout. Now filters by Server.Suffrage == SuffrageVoter. Same class as the four len(e.peers) overloads from §3.3 — one I missed in M1.

min_applied_index = 0 footgun (closes design §8 open question 3):

  • Proto: RaftAdminPromoteLearnerRequest gains skip_min_applied_check bool.
  • Engine REJECTS min_applied_index == 0 with errPromoteLearnerMinAppliedZero unless skip flag is set.
  • Admin.PromoteLearner API gains skipMinAppliedCheck parameter.
  • raftadmin promote_learner gains optional skip_min_applied_check positional arg.
  • New TestPromoteLearnerRejectsZeroMinAppliedWithoutSkip.
  • Design doc §8 marks open question 3 Resolved; runbook updated.

Test polish:

  • TestPersistedPeersV2RewritesAfterV1Read → TestPersistedPeersWriterAlwaysEmitsV2 (body never read v1).
  • New TestPersistedPeersV2UnknownSuffrageRejected pins the 0xFF-rejection path.

Doc polish:

  • §2 / §3.2 / §7 line-number refs replaced with stable function-name + excerpt refs.

Skipped:

  • The "dead code in raftadmin server_test.go" finding was stale — review predated commit 3ed3c89 which adds the AddLearner / PromoteLearner proto RPCs and corresponding Server methods.
  • Making the two PromoteLearner negative tests table-driven: the reviewer marked it optional and noted standalone tests are fine.

@claude

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 13m 1s


Review of feat/raft_learner — Round 2 (post CodeRabbit Round 1)

  • Gather context and understand request
  • Read engine changes (internal/raftengine/etcd/engine.go)
  • Read learner tests (internal/raftengine/etcd/learner_test.go)
  • Read peer metadata changes (internal/raftengine/etcd/peer_metadata.go, peer_metadata_test.go)
  • Read WAL store changes (internal/raftengine/etcd/wal_store.go)
  • Read admin server + proto changes
  • Read CLI changes (cmd/raftadmin/main.go)
  • Cross-check fixes claimed in the trigger comment
  • Post review feedback

The five fixes claimed in the trigger commit are all correctly implemented. One new correctness bug surfaced while tracing the nextPeersAfterConfigChange → persistConfigState path — it is high priority.


Claimed fixes — verification

| Claim | Status |
| --- | --- |
| shouldCampaignSingleNode counts voters only | ✅ Confirmed at engine.go:2752. Filters cfg.Servers by Suffrage != SuffrageVoter, uses voter-only count for the == 1 check. |
| minAppliedIndex == 0 rejected unless skipMinAppliedCheck | ✅ Engine-level guard at engine.go:1097. Proto field skip_min_applied_check at service.proto:236. CLI parseOptionalBool at main.go:326. New test TestPromoteLearnerRejectsZeroMinAppliedWithoutSkip at learner_test.go:156. |
| TestPersistedPeersWriterAlwaysEmitsV2 renamed | ✅ Test at peer_metadata_test.go:92 correctly named. Comment updated to match ("writer is unconditionally v2"). |
| TestPersistedPeersV2UnknownSuffrageRejected added | ✅ Test at peer_metadata_test.go:113. Hand-crafts a v2 file with suffrage byte 0xFF, asserts LoadPersistedPeers returns an error. Write order (nodeID → suffrage byte → ID → address) correctly matches the production format. |
| Design doc line-number refs replaced | ✅ §2 / §3.2 / §7 now use function names + excerpts. |

New issue — High priority

nextPeersAfterConfigChange / nextPeersAfterConfigChangeV2 ignore the confState parameter — learner suffrage is never written to the v2 peers file

The apply loop (lines 1940-1958) calls:

confState := e.rawNode.ApplyConfChange(cc)
nextPeers := e.nextPeersAfterConfigChange(cc.Type, cc.NodeID, cc.Context, *confState)
if err := e.persistConfigState(entry.Index, *confState, nextPeers); err != nil {

nextPeersAfterConfigChange (line 3431) receives raftpb.ConfState as _ and ignores it:

func (e *Engine) nextPeersAfterConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, _ raftpb.ConfState) []Peer {
    ...
    applyConfigPeerChangeToMap(next, changeType, nodeID, context)
    return sortedPeerList(next)
}

applyConfigPeerChangeToMap decodes the peer from the ConfChange context bytes. encodeConfChangeContext (line 3696) only encodes NodeID, ID, and Address — no Suffrage. The decoded peer always has Peer.Suffrage == "".

All peers in e.peers also have Suffrage: "" because upsertPeer stores whatever was decoded from the ConfChange context (which lacks Suffrage). The comment at applyConfigPeerChange:2107 confirms this is intentional: "suffrage is an attribute of ConfState, not of e.peers."

The result: when persistConfigState writes the v2 peers file, every peer — including learners — gets Peer.Suffrage == "". persistedSuffrageByte("") returns persistedSuffrageVoter (0x00), so the learner is written as a voter.

Failure path on restart:

  1. Learner added at index N → config snapshot at N (configSnapshotUpToDate returns false for fresh clusters; bootstrap snapshot at index 1 < N) → snapshot ConfState.Learners = [learnerNodeID]
  2. Peers file at index N — learner has Suffrage = "" → written as voter
  3. Restart → normalizeOpenConfig loads peers file → cfg.Peers = peers.Peers → learner peer has Suffrage: "" → treated as voter
  4. validateOpenPeers: persisted.Index = N == snapshot.Metadata.Index = N → N > N is false → validateConfState runs
  5. splitPeersBySuffrage(peers) → learner (Suffrage="") lands in expectedVoters (not learnerSet)
  6. validateConfStateVoters(conf, expectedVoters)conf.Voters has N entries, expectedVoters has N+1 → errClusterMismatch → startup failure
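The silent coercion in step 2 can be sketched as follows. persistedSuffrageByte is named in this review; its body here is an assumption based on the byte values described (0x00 voter, 0x01 learner), shown only to make the failure mode concrete.

```go
package main

import "fmt"

// persistedSuffrageByte sketches the pre-fix write path: the empty
// string — what every peer carried, because the ConfChange context does
// not encode suffrage — falls through to the voter byte.
func persistedSuffrageByte(suffrage string) byte {
	if suffrage == "learner" {
		return 0x01
	}
	return 0x00 // "" and "voter" both land here
}

func main() {
	fmt.Println(persistedSuffrageByte(""))        // 0 — a learner with empty suffrage is written as voter
	fmt.Println(persistedSuffrageByte("learner")) // 1
}
```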

No existing test catches this — none of the learner tests restart the engine after AddLearner. The conformance test mentioned in §5 of the design doc exercises "after restart" but that scenario is not yet in the test file.

Fix: Use the confState parameter (currently _) in nextPeersAfterConfigChange and nextPeersAfterConfigChangeV2 to annotate each peer's suffrage before returning:

func (e *Engine) nextPeersAfterConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, confState raftpb.ConfState) []Peer {
    e.mu.RLock()
    next := clonePeerMap(e.peers)
    e.mu.RUnlock()
    applyConfigPeerChangeToMap(next, changeType, nodeID, context)
    annotatePeerSuffrageFromConfState(next, confState) // set Suffrage from ConfState
    return sortedPeerList(next)
}

Where:

func annotatePeerSuffrageFromConfState(peers map[uint64]Peer, conf raftpb.ConfState) {
    learners := make(map[uint64]bool, len(conf.Learners))
    for _, id := range conf.Learners {
        learners[id] = true
    }
    for id, peer := range peers {
        if learners[id] {
            peer.Suffrage = SuffrageLearner
        } else {
            peer.Suffrage = SuffrageVoter
        }
        peers[id] = peer
    }
}

Apply the same fix to nextPeersAfterConfigChangeV2. Per the design doc §4.3: "Its sole role is to round-trip the v2 peers file across restarts so the operator view of the peers file is consistent with the eventual ConfState view after WAL replay." That contract is currently broken.

Required test: Add a restart case to TestAddLearnerReplicatesWithoutCountingAsVoter (or a new TestLearnerPeersFilePersistsSuffrageAcrossRestart): add a learner, close all engines, reopen them, and assert they start without error. This is the missing conformance test referenced in §5.5.


Other observations (non-blocking)

promote_learner CLI usage string could warn about required min_applied_index

raftadmin promote_learner <id> (no other args) sends MinApplied=0, Skip=false, which the engine rejects with errPromoteLearnerMinAppliedZero. The usage string (main.go:425) shows [min_applied_index] as optional. The error message from the engine is descriptive, but the usage string could note "required unless skip_min_applied_check=true." Not blocking; the runbook covers this.

adminError maps all non-context errors to codes.FailedPrecondition (server.go:161)

errPromoteLearnerMinAppliedZero, errPromoteLearnerNotCaughtUp, and errPromoteLearnerNotLearner are all mapped to FailedPrecondition. This is semantically correct (the precondition was not met) and consistent with the existing AddVoter error mapping. ✓

TestJoinAsLearnerAlarmFiresWhenAddedAsVoter uses process-global counter correctly (learner_test.go:224)

before := JoinRoleViolationCount() captures the counter before the operation, so concurrent test execution against the global counter is handled safely. Since learner tests don't call t.Parallel(), intra-package sequencing prevents races. ✓


Summary

The five CodeRabbit Round 1 fixes are all correct and well-tested. The outstanding issue is a startup failure after a learner add: the v2 peers file writes the learner with Suffrage="" (treated as voter), causing validateConfState to fail on restart when the config snapshot and peers file land at the same index. The fix is small (use the already-received confState in nextPeersAfterConfigChange), but a restart test is also needed to lock it down.

CodeRabbit Round 2 caught a real correctness bug: AddLearner wrote
the v2 peers file with Suffrage="" for every peer (the ConfChange
context bytes do not encode suffrage and e.peers stores only
nodeID/ID/address). persistedSuffrageByte("") writes voter, so on
restart validateConfState saw a learner-as-voter peers list and
rejected startup with errClusterMismatch. This broke the design
doc §4.3 contract that the v2 peers file round-trips suffrage
across restarts.

The fix annotates Peer.Suffrage from the post-change ConfState at
the three persist sites:

1. nextPeersAfterConfigChange / nextPeersAfterConfigChangeV2:
   stop ignoring the confState parameter; new
   annotatePeerSuffrageFromConfState helper stamps Suffrage on the
   per-peer clone before persistConfigState writes the v2 file.
2. prepareOpenState: at Open time, annotate the operator-provided
   peer slice from the persisted snapshot's ConfState before
   savePersistedPeers writes the bootstrap-time peers file. Without
   this, a fresh joiner whose peers file is written by Open carries
   the bootstrap suffrage forever.
3. applyReadySnapshot: when a learner restores a snapshot from the
   leader (the common joiner path), the apply loop's
   nextPeersAfterConfigChange hook is bypassed -- the conf change
   is already incorporated into the snapshot's ConfState. New
   savePeersFileForSnapshot helper writes the v2 peers file with
   suffrage drawn from the snapshot's ConfState. Idempotent via
   savePersistedPeers's existing index check.

Tests:
- New TestLearnerPeersFilePersistsSuffrageAcrossRestart: add a
  learner, propagate a write so both peers are caught up, verify
  both nodes' v2 peers files carry Suffrage="learner" for the
  learner peer, then close + reopen both engines and assert no
  startup failure.
- TestNextPeersAfterConfigChange{,V2}* expectations updated to
  assert the annotated Suffrage rather than the empty-string
  pre-fix output.
@bootjp
Owner Author

bootjp commented Apr 27, 2026

@claude review

Round 2 caught a real high-priority bug. Fixed in commit 536a90a:

  • Root cause: nextPeersAfterConfigChange ignored the confState parameter, so AddLearner wrote the v2 peers file with Suffrage="" for every peer (persistedSuffrageByte("") → voter byte). On restart, validateConfState saw a learner-as-voter peers list and rejected startup with errClusterMismatch.

  • Fix at three persist sites (the missing path was wider than just nextPeersAfterConfigChange):

    1. nextPeersAfterConfigChange / nextPeersAfterConfigChangeV2 now use the confState parameter (no more _); new annotatePeerSuffrageFromConfState helper stamps Suffrage on the per-peer clone.
    2. prepareOpenState annotates the operator-provided peer slice from disk.LocalSnap.Metadata.ConfState before savePersistedPeers writes the bootstrap-time file. Without this, a joiner that wrote its peers file at Open would never get the suffrage update.
    3. applyReadySnapshot writes the v2 peers file with suffrage from the snapshot's ConfState — the joiner-via-snapshot path bypasses the apply-loop nextPeersAfterConfigChange hook entirely (the conf change is already incorporated in the snapshot's ConfState; no separate entry to apply).
  • New regression test TestLearnerPeersFilePersistsSuffrageAcrossRestart: add a learner, propagate a write, verify both nodes' v2 peers files carry Suffrage="learner" for the learner peer, close + reopen both engines, assert no startup failure. This is the conformance test §5.5 promised but didn't have.

  • Pre-existing tests updated (TestNextPeersAfterConfigChange{,V2}*) — they asserted Suffrage="" which was the pre-fix behavior; now assert the annotated suffrage.

  • Non-blocking observations: skipped per the reviewer's framing (CLI usage hint, adminError mapping, JoinRoleViolationCount safety — all confirmed correct or acknowledged in the review).

@claude

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 7m 52s


Review of feat/raft_learner — Round 3 (post-commit 536a90a)

  • Gather context and understand request
  • Read engine changes (annotatePeerSuffrageFromConfState, nextPeersAfterConfigChange, prepareOpenState, applyReadySnapshot)
  • Read learner tests (TestLearnerPeersFilePersistsSuffrageAcrossRestart)
  • Read peer metadata and WAL store changes
  • Read admin server, proto, and CLI changes
  • Cross-check all three persist-site fixes claimed in the trigger comment
  • Post review feedback

The Round 2 high-priority bug is correctly fixed at all three persist sites. Details below.


Claimed fixes — verification

| Claim | Status | Notes |
| --- | --- | --- |
| nextPeersAfterConfigChange / V2 use confState (no more `_`) | ✓ | engine.go:3470, engine.go:3481 — both call annotatePeerSuffrageFromConfState(next, confState) after applyConfigPeerChangeToMap |
| prepareOpenState annotates before bootstrap write | ✓ | engine.go:583 — annotatePeerSuffrageInSlice(peers, disk.LocalSnap.Metadata.ConfState) precedes savePersistedPeers |
| applyReadySnapshot writes peers file with annotated suffrage | ✓ | engine.go:1877 — savePeersFileForSnapshot calls annotatePeerSuffrageInSlice before savePersistedPeers |
| TestLearnerPeersFilePersistsSuffrageAcrossRestart closes + reopens | ✓ | learner_test.go:213-229 — genuine Close() + Open() (not mocked); would fail with errClusterMismatch without the fix |
| Pre-existing TestNextPeersAfterConfigChange* tests updated | ✓ | engine_test.go:1076-1103 — now assert SuffrageVoter/SuffrageLearner instead of `""` |

Four-step apply loop sequence — correct and consistent

applyConfigChange at engine.go:2046 exactly implements the §4.2 edit 4 sequence:

  1. refreshVoterCache(confState) — rebuilds voterCount + isLearnerNode before peer helpers run, so removePeer reads the post-change voter count for its ackTracker threshold
  2. applyConfigPeerChange(...) — mutates e.peers (address cache only, no Suffrage)
  3. refreshConfigServers(confState) — rebuilds e.config.Servers from e.peers + ConfState after helpers, so it sees post-cleanup addresses and post-ConfState suffrage
  4. setConfigIndex / resolveConfigChange / alarmIfJoinedAsVoter — bookkeeping

This matches applyConfigChangeV2 at engine.go:2055. The four-step comment at engine.go:2037-2045 is accurate.


annotatePeerSuffrageFromConfState logic — correct

for id, peer := range peers {
    switch {
    case learners[id]:
        peer.Suffrage = SuffrageLearner
    case voters[id]:
        peer.Suffrage = SuffrageVoter
    default:
        continue  // transient removal state — keep existing
    }
    peers[id] = peer
}

The default: continue correctly handles transient removal state (a peer not yet in either Voters or Learners, e.g., between ConfChangeRemoveNode propose and apply). Such peers will be dropped from e.peers when applyConfigPeerChange runs. The map is mutated in place, which is safe since nextPeersAfterConfigChange works on a clone of e.peers.

annotatePeerSuffrageInSlice uses the same logic for the slice form (bootstrap / snapshot paths) without an explicit default case — peers not in ConfState keep Suffrage="", which persistedSuffrageByte("") maps to voter (safe default for peers not yet cluster members).


validateConfState — set-based learner check is correct

validateConfStateLearners at wal_store.go:445:

if len(conf.Learners) != len(expected) {
    return errors.Wrapf(...)
}
for _, nodeID := range conf.Learners {
    if _, ok := expected[nodeID]; !ok {
        return errors.Wrapf(...)
    }
}

Uses ||-semantics (count mismatch OR member not-in-set → error), which is the correct fix from Round 4. The voter check stays element-wise (relies on the established NodeID-sort ordering invariant). The learner check is set-based (no prior ordering contract). Both cases from Round 4's table are handled: same-count-member-divergence (count passes, membership fails) and conf-learner-missing-from-peers (membership fails).


Voter cache initialization and snapshot-path refresh — correct

At Open (engine.go:508-509):

voterCount:    len(prepared.disk.LocalSnap.Metadata.ConfState.Voters),
isLearnerNode: learnerSetFromConfState(prepared.disk.LocalSnap.Metadata.ConfState),

Both fields initialized from snapshot ConfState, adjacent to e.config, before refreshStatus() is called at engine.go:528. The false-single-node-fast-path activation for a 3-voter cluster on startup (Round 2 issue 2) is closed.

At snapshot restore (engine.go:1868):

e.refreshVoterCache(snapshot.Metadata.ConfState)
e.setConfigurationFromConfState(snapshot.Metadata.ConfState, ...)

Voter cache is refreshed before config rebuild, matching the apply-loop ordering.

refreshVoterCache (engine.go:2107-2111) rebuilds both fields from scratch — never patches — so learner promotion cannot leave a stale isLearnerNode[id]=true entry.


Lock-free isLearnerNode read in recordQuorumAck — safe

recordQuorumAck at engine.go:1686 reads e.isLearnerNode[msg.From] without taking e.mu. The comment explains this is safe: both the writer (refreshVoterCache, called from applyConfigChange in the apply loop) and the reader (recordQuorumAck, called from handleStep) run on the same event-loop goroutine. The single-writer invariant is maintained by the event loop's sequential dispatch.

refreshStatus reads voterCount under e.mu.Lock() because Status() is a public API called from external goroutines.


shouldCampaignSingleNode — correct

engine.go:2786-2805 counts only servers with Suffrage == SuffrageVoter, explicitly skipping learners. A 1-voter + N-learner cluster correctly returns true. Test coverage: TestRemovePeerLearnerKeepsSingleNodeFastPath exercises the fast-path while a learner is attached.


Restart test is not a mock

TestLearnerPeersFilePersistsSuffrageAcrossRestart (learner_test.go:162):

  • Calls leader.engine.Close() and nodes[1].engine.Close() (not stubs)
  • Calls Open(ctx, OpenConfig{...}) with no Peers field — normalizeOpenConfig loads from the persisted peers file, so this exercises the full restart path through loadPersistedPeersState → validateOpenPeers → validateConfState
  • Asserts require.NoError(t, err) — would surface errClusterMismatch if suffrage wasn't round-tripped

The assertion at learner_test.go:201-203 also directly checks the persisted file has SuffrageLearner before the restart, confirming the annotation happened at persist time rather than being deferred.


Minor observations (non-blocking)

1. applyReadySnapshot ordering: voter cache → config → peers file

The comment at engine.go:1870-1876 accurately explains the WHY for savePeersFileForSnapshot being called after setConfigurationFromConfState rather than the reverse. Worth noting: savePeersFileForSnapshot reads e.peers (address cache), not e.config.Servers (which has the suffrage). The suffrage for the file comes from snapshot.Metadata.ConfState passed to annotatePeerSuffrageInSlice. This is correct and the two data sources (address from e.peers, suffrage from ConfState) are independent.

2. parseOptionalBool in CLI accepts "" as false

main.go:333 has case "false", "0", "no", "":. This means an empty argument (e.g., raftadmin <addr> promote_learner n1 0 0 "") is treated as skip_min_applied_check=false. Not a correctness issue; the usage string at main.go:425 is descriptive enough.

3. fakePromoteLearnerCall assertion in server_test.go:229 uses only {id, prevIndex, minAppliedIndex} but not skipMinAppliedCheck. The call at line 202 passes SkipMinAppliedCheck: false (default), so skipMinAppliedCheck=false is implicitly asserted. Could add it for completeness but not wrong as-is.


Summary

All three persist-site fixes are correctly implemented and match the design doc's §4.3 contract. The new restart test actually closes and reopens the engines, pinning the regression. No new correctness issues introduced by commit 536a90a6. The implementation is ready to merge pending CI green.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 536a90a6f1


fsmSnapDir: filepath.Join(prepared.cfg.DataDir, fsmSnapDirName),
tickInterval: prepared.cfg.TickInterval,
electionTick: prepared.cfg.ElectionTick,
joinAsLearner: prepared.cfg.JoinAsLearner,

P2 Badge Trigger join-as-learner alarm during engine startup

When a node restarts with JoinAsLearner=true and its persisted snapshot already has the local node in ConfState.Voters, startup does not emit the alarm or increment JoinRoleViolationCount unless a later config change/snapshot happens. In this code path, Open seeds config state from LocalSnap.Metadata.ConfState but never calls alarmIfJoinedAsVoter, so the operator can miss the only expected warning in steady state. This is reproducible after a mis-join followed by restart with no further membership changes.


Codex caught a real gap: alarmIfJoinedAsVoter only ran from
applyConfigChange / applyConfigChangeV2 / applyReadySnapshot. None
of those fire on a clean Open path -- the post-mis-join ConfState
loaded into prepared.disk.LocalSnap is never replayed. Steady-state
restart of a node that previously mis-joined as voter would have
emitted no alarm and never bumped JoinRoleViolationCount, so the
operator could miss the only signal in the steady state.

Fix:
- Open now calls engine.alarmIfJoinedAsVoter(prepared.disk.LocalSnap.Metadata.ConfState)
  once after refreshStatus, before run() spins up. The
  joinAlarmFired latch (atomic.Bool) ensures the alarm fires at
  most once per process even if a later snapshot or conf change
  reapplies the same role assignment.

Test:
- TestJoinAsLearnerAlarmFiresAtStartupForPersistedVoterRole opens
  the joiner with joinAsLearner=false, leader AddVoters it,
  propagates a write to land the conf change on the joiner's disk,
  closes the joiner, then reopens with joinAsLearner=true and
  asserts JoinRoleViolationCount incremented and the engine stays
  running.

@bootjp
Owner Author

bootjp commented Apr 27, 2026

Addressed the Codex finding (commit 32e20b1):

alarmIfJoinedAsVoter previously only ran from applyConfigChange / applyConfigChangeV2 / applyReadySnapshot. A node that previously mis-joined as voter and is then restarted with --raftJoinAsLearner=true hits none of those paths — the post-mis-join ConfState is loaded from disk into prepared.disk.LocalSnap and never replayed in steady state.

Fix: Open now calls engine.alarmIfJoinedAsVoter(prepared.disk.LocalSnap.Metadata.ConfState) once after refreshStatus, before run() spins up. The existing joinAlarmFired latch (atomic.Bool) ensures the alarm fires at most once per process even if a later snapshot or conf change reapplies the same role assignment.

Regression test: TestJoinAsLearnerAlarmFiresAtStartupForPersistedVoterRole opens the joiner with joinAsLearner=false, has the leader AddVoter it, propagates a write so the conf change lands on the joiner's disk, closes it, then reopens with joinAsLearner=true and asserts JoinRoleViolationCount incremented and the engine stays running.

@bootjp
Owner Author

bootjp commented Apr 27, 2026

@claude review

@claude

claude Bot commented Apr 27, 2026

Claude finished @bootjp's task in 4m 21s


Review of feat/raft_learner — Round 4 (post-commit 32e20b1)

  • Gather context and read changed files
  • Review latest commit (32e20b1) — alarmIfJoinedAsVoter at Open time
  • Full implementation review against design doc
  • Check for new issues introduced
  • Post review feedback

The Codex finding is correctly addressed. Full verification below.


Latest commit (32e20b1) — verification

The gap: alarmIfJoinedAsVoter was only called from applyConfigChange / applyConfigChangeV2 / applyReadySnapshot. A node that mis-joined as voter, accumulated the post-join ConfState on disk, then restarted with --raftJoinAsLearner=true would hit none of those code paths during the clean Open; the persisted snapshot is loaded from disk directly, never replayed.

Fix implementation (engine.go:539):

engine.alarmIfJoinedAsVoter(prepared.disk.LocalSnap.Metadata.ConfState)

Called after refreshStatus() and before go engine.run(). The joinAlarmFired.CompareAndSwap(false, true) latch prevents duplicate alarms within a single engine instance's lifetime.

Latch semantics are correct for this scenario. joinAlarmFired is an atomic.Bool field on the Engine struct (not a package-level variable). A second call to Open creates a fresh Engine with joinAlarmFired=false, so the alarm fires again for the new instance. This is the intended behavior: each engine instance can alarm once.

Regression test (TestJoinAsLearnerAlarmFiresAtStartupForPersistedVoterRole, learner_test.go:378): The sequence — open with joinAsLearner=false, AddVoter, commit to land conf change on disk, Close, re-Open with joinAsLearner=true — properly exercises the exact gap. The test uses require.Greater(t, JoinRoleViolationCount(), before) which handles any pre-existing alarm count from prior test runs correctly. require.NotEqual(t, raftengine.StateShutdown, engine.State()) confirms §4.5's "no shutdown" contract. ✓


Cumulative implementation review

All issues from Rounds 1–3 remain correctly resolved:

| Issue | Status |
| --- | --- |
| refreshStatus single-node fast path with 1-voter+N-learner (voterCount <= 1, engine.go:2506) | ✓ |
| recordQuorumAck denominator uses voterCount not len(e.peers) (engine.go:1700–1704) | ✓ |
| shouldCampaignSingleNode counts voters only (engine.go:2808–2816) | ✓ |
| v2 peers file round-trips suffrage at all three persist sites (apply loop, bootstrap, snapshot restore) | ✓ |
| validateConfState: voter-count comparison uses splitPeersBySuffrage; learner check is set-based (count mismatch OR member missing) | ✓ |
| voterCount/isLearnerNode initialized at Open from LocalSnap.Metadata.ConfState (engine.go:508–509) | ✓ |
| Four-step apply loop sequence: voter-cache → peer helpers → config servers → bookkeeping (applyConfigChange, engine.go:2048–2064) | ✓ |
| applyConfigChange/applyConfigChangeV2 accept confState raftpb.ConfState param | ✓ |
| upsertPeer no longer writes e.config.Servers; refreshConfigServers is the sole writer | ✓ |
| PromoteLearner preconditions checked pre-propose, on the admin loop before ProposeConfChange | ✓ |
| minAppliedIndex==0 rejected unless skipMinAppliedCheck=true (engine.go:1115–1116) | ✓ |
| --raftJoinAsLearner enforcement is post-apply local alarm, not pre-propose veto | ✓ |
| Restart test TestLearnerPeersFilePersistsSuffrageAcrossRestart does genuine Close+Open | ✓ |
| alarmIfJoinedAsVoter called at Open time — Codex finding (32e20b1) | ✓ |

New issue — Minor documentation inaccuracy

joinAlarmFired field comment says "per process lifetime" — should be "per Engine instance"

engine.go:192–193:

// joinAlarmFired latches the join-as-learner alarm so the
// structured ERROR log fires once per process lifetime. Without

joinAlarmFired is an atomic.Bool field on the Engine struct, initialized to false on every Open. The new regression test proves that a second Open (with joinAsLearner=true) fires the alarm, because the new Engine instance has joinAlarmFired=false. The comment calling this "per process lifetime" is therefore inaccurate — the correct description is "at most once per Engine instance" or "at most once per Open lifecycle."

This is a documentation nit (not a behavior bug — the code works exactly as intended), but the wrong phrasing could confuse a future reader trying to understand why the test expects the alarm on a second Open.


Observations (non-blocking)

promote_learner CLI with no min_applied_index: raftadmin <addr> promote_learner <id> sends MinApplied=0, Skip=false, which the engine rejects with errPromoteLearnerMinAppliedZero. The usage string marks it optional ([min_applied_index]), but in practice it's required unless skip_min_applied_check=true. The runbook documents the correct invocation; the engine error is descriptive. This is acceptable.

Concurrency model for voterCount/isLearnerNode: refreshVoterCache writes both under e.mu (for refreshStatus which runs from external goroutines). recordQuorumAck reads both without a lock — safe because it runs on the same event-loop goroutine as refreshVoterCache, so the accesses are sequentially ordered. The comment at engine.go:1693–1696 documents this correctly. ✓

normalizePersistedPeer preserves Suffrage: The function only fills in missing NodeID/ID and returns the struct otherwise unchanged. The Suffrage field round-trips correctly through the normalizePersistedPeers path in writeCurrentPersistedPeers. ✓


Summary

The Codex-found gap is correctly closed. All fourteen tracked implementation issues across four rounds of review are resolved. The implementation is ready to merge pending CI green. The one item above (comment text) is a cleanup that can be addressed in a follow-up commit or as part of merge.

@bootjp bootjp merged commit 51907e6 into main Apr 27, 2026
9 checks passed
@bootjp bootjp deleted the feat/raft_learner branch April 27, 2026 11:44