feat(raft): learner / non-voter membership (M1 engine + M2 operator surface)#676
Conversation
Turn on the dormant learner / non-voter path in the etcd raft engine so operators can attach catch-up replicas without shrinking voter fault tolerance during catch-up. Phase 3 of the etcd-migration doc deliberately deferred this; the codebase already has partial scaffolding (Server.Suffrage, RaftAdminMember.suffrage, applyConfigPeerChangeToMap keeping learner metadata) waiting for a consumer. Scope: AddLearner/PromoteLearner on raftengine.Admin, end-to-end suffrage reporting, peer-metadata file v2 with a per-peer suffrage byte, quorumAckTracker denominator audit so a slow learner cannot stall lease reads, and an explicit min_applied_index precondition on promote. Out of scope: follower-served reads, joint consensus, KV/adapter changes.
📝 Walkthrough

This pull request implements learner (non-voting) membership functionality in the Elastickv Raft system. Changes span a design document, new admin APIs (`AddLearner` / `PromoteLearner`), end-to-end suffrage reporting, and a v2 peers-file format.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Admin API
    participant Raft Core
    participant Log/FSM
    participant Persistence
    Client->>Admin API: AddLearner(nodeID, address)
    Admin API->>Raft Core: Submit ConfChangeAddLearnerNode
    Raft Core->>Log/FSM: Append & commit config change
    Raft Core->>Persistence: Persist v2 peers with suffrage
    Raft Core->>Admin API: Return applied index
    Admin API->>Client: Return index
    Note over Raft Core: Learner now replicates<br/>but not counted in quorum
```

```mermaid
sequenceDiagram
    participant Client
    participant Admin API
    participant Leader
    participant Raft Core
    participant Learner Node
    participant Log/FSM
    Client->>Admin API: PromoteLearner(nodeID, minAppliedIndex=50)
    Admin API->>Leader: Check Progress[nodeID].Match >= 50
    alt Progress.Match < minAppliedIndex
        Admin API->>Client: FailedPrecondition error
    else Learner caught up
        Admin API->>Raft Core: Submit ConfChangeAddNode
        Raft Core->>Log/FSM: Append & commit config change
        Learner Node->>Log/FSM: Apply config, transition to voter
        Raft Core->>Client: Return applied index
    end
    Note over Learner Node: Now counts toward quorum
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request proposes a design for Raft Learner support to enable safe membership changes and replica catch-up without impacting quorum. The design covers engine API additions, etcd backend updates, and a new versioned format for the persisted peers file. The review feedback identifies a potential stall in lease-read logic for single-voter clusters, requests clarification on the authoritative source of peer metadata during recovery, and suggests detailing the enforcement of the join-as-learner flag to ensure zero-downtime migrations.
> The `ConfState` carried inside the local raft snapshot remains the *authoritative* source of suffrage. The peers file is a cache plus an operator-readable summary; on conflict, the snapshot's `ConfState` wins, and `persistConfigState` overwrites the peers file accordingly.
The design states that the snapshot's ConfState is the authoritative source of truth. However, the existing validateOpenPeers logic (line 2541 in internal/raftengine/etcd/engine.go) allows the persisted peers file to be 'ahead' of the snapshot when configuration changes are committed in the WAL but not yet snapshotted. In this scenario, the peers file is more up-to-date. The design should clarify that the authoritative source is the latest committed configuration (whether in a snapshot or the WAL), and that the peers file is the persistent record of that state used during recovery to resolve addresses before the WAL is replayed. Because this ambiguity poses a significant operational risk during node recovery, the design must also detail potential mitigation strategies to ensure consistency.
References
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.
> - The engine must compute the lease-read denominator from `len(ConfState.Voters)` (excluding self if leader), not `len(e.peers)`. `recordQuorumAck` already gates on `isLeader`; we additionally drop responses from peers whose `nodeID` is in `ConfState.Learners` so a learner heartbeat ack cannot count toward voter majority.
The proposed logic for the lease-read denominator must also be applied to the singleNodeLeaderAckMonoNs update in refreshStatus. Currently, refreshStatus (line 2176 in internal/raftengine/etcd/engine.go) checks clusterSize <= 1 using len(e.peers). If a learner is present in a single-voter cluster, len(e.peers) will be greater than 1, causing the leader to stop updating the single-node fast path and switch to the ackTracker. Since learner acks are dropped (as proposed in §4.6), the leader will never reach a quorum, stalling lease reads. This check must be updated to use the voter count from ConfState to ensure single-voter clusters with learners still benefit from the lease fast path.
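The fix the reviewer is asking for can be sketched in isolation. Everything below is a hypothetical model (the struct and function names are not from the codebase, and `confState` is a trimmed-down stand-in for `raftpb.ConfState`); it shows only that deriving both the denominator and the fast-path check from the voter list keeps a 1-voter+1-learner cluster on the fast path:

```go
package main

import "fmt"

// Minimal stand-in for raftpb.ConfState: only the fields the
// denominator logic needs.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

// leaseReadDenominator returns the number of acks (including the
// leader's own) needed for a lease-read majority. It is derived from
// ConfState.Voters, never from the peer map, so an attached learner
// can never inflate the denominator.
func leaseReadDenominator(cs confState) int {
	return len(cs.Voters)/2 + 1
}

// singleNodeFastPath reports whether the leader may keep the
// single-node lease fast path: true only when it is the sole voter,
// regardless of how many learners are attached.
func singleNodeFastPath(cs confState) bool {
	return len(cs.Voters) <= 1
}

func main() {
	oneVoterOneLearner := confState{Voters: []uint64{1}, Learners: []uint64{2}}
	fmt.Println(singleNodeFastPath(oneVoterOneLearner)) // true: the learner does not disable the fast path

	threeVotersOneLearner := confState{Voters: []uint64{1, 2, 3}, Learners: []uint64{4}}
	fmt.Println(leaseReadDenominator(threeVotersOneLearner)) // 2: majority of 3 voters
}
```

A `len(peers) <= 1` check over a map that includes the learner would return `false` in the first case, which is exactly the stall described above.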
> 2. The leader's `Progress[nodeID].Match >= minAppliedIndex`. The leader is the authority on follower progress; an operator who reads `Status.AppliedIndex` from the learner itself sees a slightly stale view, so the engine re-checks against the leader's tracker before proposing. `minAppliedIndex == 0` skips the precondition (matches the existing "0 means don't care" convention used by `prevIndex`).
The PromoteLearner precondition check against Progress[nodeID].Match will require calling RawNode.Status() on the leader. Note that Status() is relatively expensive in etcd/raft as it clones the entire progress map. While this is acceptable for an infrequent administrative operation, the implementation should be mindful of this cost and ensure it is only performed within the admin request handler, not on every iteration of the raft loop.
> - New CLI flag (per group): `--raftJoinAsLearner` (default `false`). When set, the local node refuses to self-bootstrap and refuses to be added with `AddVoter` from outside; it expects an operator to call `AddLearner` against the existing leader before it appears in `ConfState`.
Regarding the --raftJoinAsLearner flag, the design mentions that the node 'refuses to be added with AddVoter from outside'. It would be beneficial to specify the enforcement mechanism. For instance, the node should verify its own suffrage when applying a ConfChange and fail or log a critical error if it is added as a voter when it was explicitly configured to join as a learner. This prevents accidental promotion before the node has caught up and potentially stalling the quorum. As this feature is intended for production-grade system migrations, the plan must also detail how this mechanism supports zero-downtime cutovers to avoid service interruption.
References
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
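One possible shape for the post-apply verification suggested in the comment above. This is a hypothetical sketch, not the design's actual mechanism; the function name and the alarm-as-boolean return are assumptions:

```go
package main

import "fmt"

// onConfChangeApplied is called after every applied conf change. A
// node started with --raftJoinAsLearner verifies it did not silently
// become a voter. The return value is an alarm, not a veto: by the
// time the change applies, the node already counts toward quorum, so
// the only safe reaction is to surface a metric/log for the operator.
func onConfChangeApplied(joinAsLearner bool, selfID uint64, voters []uint64) (alarm bool) {
	if !joinAsLearner {
		return false
	}
	for _, v := range voters {
		if v == selfID {
			return true // added as voter despite --raftJoinAsLearner
		}
	}
	return false
}

func main() {
	fmt.Println(onConfChangeApplied(true, 4, []uint64{1, 2, 3, 4})) // true: alarm fires
	fmt.Println(onConfChangeApplied(true, 4, []uint64{1, 2, 3}))    // false: still a learner
}
```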
@claude review
Claude finished @bootjp's task in 3m 27s.
Iterate on the learner design doc to close the eleven points raised by the gemini-code-assist and claude review passes on PR #676.

Correctness fixes:

- §4.6 enumerates all four `len(e.peers)` overloads (`recordQuorumAck`, `refreshStatus` single-node fast path, `removePeer` post-removal, plus a no-op audit of `nextPeersAfterConfigChange`). Introduces an event-loop-owned `voterCount` + `isLearnerNode` cache so the lease-read hot path stays O(1). The 1-voter+1-learner stall the reviewer flagged is the specific scenario the cache is sized to fix.
- §4.2 splits `e.config.Servers` ownership: `upsertPeer` keeps only the address cache; the apply loop rebuilds `e.config.Servers` from `configurationFromConfState` after every conf change.
- §4.3 documents two-phase recovery (open-phase reads addresses from v2 peers file; WAL-replay phase rebuilds the suffrage view from ConfState) and pins `confStateForPeers` as voters-only / bootstrap-only. `Peer.Suffrage` is a v2-file artifact, not a runtime authority.

Specification:

- §4.5 gives a concrete enforcement model for `--raftJoinAsLearner`: post-apply local detection with an alarm metric, not pre-propose veto. Calls out that the flag is an operator alarm, not a consensus-level enforcement.
- §4.1 `PromoteLearner` explicitly preconditions before propose; reads Progress from `rawNode.Status()` on the same goroutine that owns it.
- §3.3 enumerates `len(e.peers)` call sites at design time.

Tests + open questions:

- §5.5 adds a snapshot-during-learner-catch-up test for snapshot_spool.
- §5.5 splits the lease-read regression into two named cases (1v+1L fast path retained; 3v+1L denominator correct).
- §8 lists the min_applied_index=0 footgun as an explicit open question with the skip_min_applied_check alternative.
- §3.2 swaps line numbers for function-name + excerpt references so the doc stops drifting against the code immediately.
@claude review

Addressed the eleven points from the previous review pass — full breakdown in the commit message of 08f59c7.
Claude finished @bootjp's task in 4m 37s.
| # | Point | Status |
|---|---|---|
| 1 | `refreshStatus` single-node fast-path stall with 1-voter+1-learner | ✅ §4.6 item 2 introduces the `e.voterCount` cache and explicitly names this "the single critical fix for the 1-voter+1-learner stall the reviewer flagged" |
| 2 | `recordQuorumAck` denominator uses `len(e.peers)` | ✅ §4.6 item 1: replace with `e.voterCount`, early-reject acks when `e.isLearnerNode[msg.From]` |
| 3 | `confStateForPeers` not updated for learner entries in v2 peers file | ✅ §4.3: `confStateForPeers` stays voters-only / bootstrap-only. `Peer.Suffrage` is a v2-file artifact; `configurationFromConfState` is the hot-path suffrage authority. The two writers are distinct and documented. |
| 4 | Dual `e.config.Servers` update paths diverge on suffrage | ✅ §4.2 edit 4: `upsertPeer` keeps only the address cache; the apply loop rebuilds `e.config.Servers` via `configurationFromConfState` after every conf change. `removePeer` likewise stops splicing `e.config.Servers`. |
| 5 | `--raftJoinAsLearner` enforcement model underspecified | ✅ §4.5: post-apply local alarm (not pre-propose veto). Rationale is sound: a pre-propose veto would either require a network vetoing protocol or catastrophic mid-stream Apply rejection. Shutdown is explicitly rejected since the node already counts toward quorum once the conf change applies. |
| 6 | Authoritative conf-state source during recovery | ✅ §4.3 "Authoritative source of suffrage during recovery": explicit two-phase recovery, snapshot+WAL wins, peers file catches up. The gap between snapshot index and persisted-peers index is explicitly addressed. |
| 7 | All `len(e.peers)` call sites enumerated | ✅ §3.3 table enumerates all four; §4.6 addresses each; Milestone 1 fixes all four in one PR. |
| 8 | `PromoteLearner` `Progress.Match` check must be pre-propose, not lazy | ✅ §4.1: "Both preconditions below run on the leader, on the single-threaded admin loop (`handleAdmin`), before calling `rawNode.ProposeConfChange`." Rationale included. |
| 9 | `min_applied_index = 0` footgun | ✅ §4.1 and §8 item 2a: documented as a known footgun, tracked as an open question with the `skip_min_applied_check` alternative. |
| 10 | Test plan: snapshot-during-learner-catch-up | ✅ §5.5: "Snapshot-during-learner-catch-up test: trigger a new FSM snapshot while a learner is still receiving the previous snapshot. Verifies snapshot_spool.go's dedup/cancel logic..." |
| 11 | Line numbers → function names in §3.2 | ✅ §3.2: "Line numbers below reflect main at the propose date and will drift; the function names and the small excerpts are the stable references." |
New issues
1. validateConfState breaks on restart after a learner is added — High priority
The design says (§4.2 edit 3):

> `validateConfState` (wal_store.go) accepts learner-only entries: `len(conf.Learners) > 0` is no longer rejected.
That fix removes the `len(conf.Learners) > 0` rejection at wal_store.go:419. But `validateConfState` also performs a voter-count comparison using `confStateForPeers(peers)` (wal_store.go:410-416):

```go
expected := confStateForPeers(peers) // emits ALL peers as voters
if len(conf.Voters) != len(expected.Voters) { // compare counts
    return errors.Wrapf(errClusterMismatch, ...)
}
```

`confStateForPeers` adds every `Peer` to `Voters` regardless of suffrage. After a learner is added to `e.peers` and persisted to the v2 peers file, the peers slice passed to `validateConfState` from `validateOpenPeers` (engine.go:2539-2544) will contain 3 voter peers + 1 learner peer. `confStateForPeers(peers)` emits 4 voters. But the snapshot's `ConfState.Voters` has 3 voters (and the learner is in `ConfState.Learners`). The count mismatch triggers `errClusterMismatch` and the node fails to restart.
The scenario: leader has snapshot at index M (M >= persisted-peers index N), so validateOpenPeers calls validateConfState instead of short-circuiting at the persisted.Index > snapshot.Metadata.Index guard.
The fix is not just removing the Learners rejection — it also requires that the voter-count comparison in `validateConfState` only counts voter peers:

```go
voterCount := 0
for _, p := range peers {
    if p.Suffrage != "learner" {
        voterCount++
    }
}
// or: use expected.Voters built only from voter-Peer entries
```

The design must specify this explicitly, since `confStateForPeers` is declared "unchanged" in §4.3.
2. e.voterCount not initialized at Open time — Medium priority
The design introduces e.voterCount and e.isLearnerNode as apply-loop-owned fields, written "after rawNode.ApplyConfChange returns the new ConfState". But refreshStatus() is called at engine.go:467 during Open, before the apply loop processes any entry. At that point, e.voterCount is 0 (zero value). The clusterSize <= 1 check in refreshStatus becomes 0 <= 1 = true, which falsely activates the single-node fast path for a 3-voter cluster on startup.
The parallel: `e.config` is initialized at engine.go:448 from the bootstrap ConfState:

```go
config: configurationFromConfState(peerMap, prepared.disk.LocalSnap.Metadata.ConfState),
```

`e.voterCount` and `e.isLearnerNode` need the same treatment — derive them from `prepared.disk.LocalSnap.Metadata.ConfState` at Open time, adjacent to the config initialization. The design should call this out explicitly, since the apply-loop-only description gives the false impression that no Open-time initialization is needed.
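The Open-time wiring can be sketched as a single rebuild helper shared by Open and the apply loop. The name `voterCacheFromConfState` is hypothetical; the point is that both caches are always rebuilt from a ConfState, never left at zero values and never patched incrementally:

```go
package main

import "fmt"

// Minimal stand-in for raftpb.ConfState.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

// voterCacheFromConfState rebuilds both caches from scratch, so a
// promoted learner cannot leave a stale isLearnerNode entry behind.
// The same function serves Open-time initialization (from the
// snapshot ConfState) and every subsequent conf change.
func voterCacheFromConfState(cs confState) (voterCount int, isLearnerNode map[uint64]bool) {
	isLearnerNode = make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		isLearnerNode[id] = true
	}
	return len(cs.Voters), isLearnerNode
}

func main() {
	// Open-time: a 3-voter snapshot must not activate the single-node
	// fast path just because the caches were left at their zero values.
	vc, learners := voterCacheFromConfState(confState{Voters: []uint64{1, 2, 3}, Learners: []uint64{4}})
	fmt.Println(vc, learners[4]) // 3 true
}
```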
Minor notes

- §8 numbering: Items `2` and `2a` are both labeled `2` in the markdown source. Consider `2b` for the existing item 2 and `2a` for the footgun, or just renumber them `2`/`3` and shift down. Not blocking.
- `e.isLearnerNode` rebuild vs. patch: The design says both fields are "derived directly from `conf.Voters` / `conf.Learners`", implying a full rebuild on each conf change (correct). A note that the map is rebuilt from scratch — not incrementally patched — would prevent an implementer from writing `e.isLearnerNode[nodeID] = true` (which would leave stale entries after promotion).
Summary
The design is solid and the eleven previous points are all addressed with care. The two new issues above both require small additions to the design doc before Milestone 1 implementation starts: (1) clarify the validateConfState voter-count comparison and how it handles learner entries in the peers slice, and (2) add e.voterCount / e.isLearnerNode initialization to the Open-time setup alongside the existing e.config init.
claude[bot] Round 2 confirmed all 11 prior points, surfaced two new issues plus minor cleanup:

- §4.2 edit 3: `validateConfState` voter-count comparison breaks on restart with a learner persisted in the v2 peers file, because `confStateForPeers` emits every Peer as a voter. Document that `validateConfState` builds its own expected voter set filtered by `Suffrage != "learner"` rather than re-using the cold-bootstrap helper. Keeps `confStateForPeers` minimal as the prior round documented.
- §4.6: `e.voterCount` and `e.isLearnerNode` must be initialized at Open time from `prepared.disk.LocalSnap.Metadata.ConfState`, adjacent to the existing `e.config` initialization. Without this, `refreshStatus` runs once during Open before the apply loop and reads zero values, falsely activating the single-node fast path on a 3-voter cluster startup. Also call out that `isLearnerNode` is rebuilt from scratch on each conf change, never patched, to prevent stale-true entries after promotion.
- §8 numbering: 2 / 2a → 2 / 3.
@claude review

Addressed both new issues from Round 2 (commit dcf1337).
Claude finished @bootjp's task in 5m 8s.
| # | Point | Status |
|---|---|---|
| 1 (High) | `validateConfState` voter-count mismatch on restart with learners | ✅ §4.2 edit 3: Learners rejection removed; Resolution (a) filter + compare approach documented. |
| 2 (Medium) | `e.voterCount` / `e.isLearnerNode` not initialized at Open | ✅ §4.6 "Initialization at Open time" subsection added with explicit code wiring both fields from `prepared.disk.LocalSnap.Metadata.ConfState` adjacent to the existing `e.config` init. |
| Minor | `isLearnerNode` stale-true-after-promotion footgun | ✅ §4.6 explicitly states the map is rebuilt from scratch on each conf change, not patched. |
| Minor | §8 numbering (2/2a) | `2` and `2a` are now `2` and `3`, but the item that was originally `3` (`--raftBootstrapMembers`) is still labeled `3`, creating a duplicate. Items that should be `4` and `5` are still labeled `3` and `4`. The tail of §8 needs a renumber. |
New issues
1. validateConfState Resolution (a) has an implicit ordering requirement for learners — Minor
Resolution (a) in §4.2 edit 3 says validateConfState "compares both Voters and Learners element-wise against the snapshot's ConfState." The element-wise comparison (the existing pattern in wal_store.go:414-418) is order-sensitive: it checks conf.Voters[i] != expected.Voters[i]. Extending this to Learners requires that the learner entries in the peers list passed to validateConfState are in the same order as ConfState.Learners.
The voter-ordering invariant already holds implicitly — persistConfigState writes voters in an order that matches ConfState.Voters, and normalizePeers preserves it. But the design doesn't establish that the same ordering guarantee will hold for learners (there's no prior convention, since learners have never been persisted before).
The design should either:

- Explicitly state that `persistConfigState` will write learner entries in the same order as `ConfState.Learners` (pin this as a contract on the v2 writer), OR
- Use a set-based comparison instead of element-wise for the learner entries, which is more robust and avoids the ordering dependency entirely. The voter check could keep element-wise (to match existing behavior); the learner check could use a set: `len(conf.Learners) != filteredLearnerCount || !allLearnersPresent(conf.Learners, learnerSet)`.
2. applyConfigChange / applyConfigChangeV2 don't receive ConfState — Minor
§4.6 says e.voterCount and e.isLearnerNode are updated "in applyConfigChange / applyConfigChangeV2, after rawNode.ApplyConfChange(...) returns the new ConfState." §4.2 edit 4 similarly says e.config.Servers is "recomputed from the post-change ConfState by a single call to configurationFromConfState inside applyConfigChange / applyConfigChangeV2."
But the current signatures (engine.go:1803):

```go
func (e *Engine) applyConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, index uint64)
func (e *Engine) applyConfigChangeV2(cc raftpb.ConfChangeV2, index uint64)
```

don't receive `confState`. The confState is available at the call sites (engine.go:1740 and engine.go:1752), not inside those functions.
The implementation will need to choose: (a) extend the signatures of applyConfigChange / applyConfigChangeV2 to accept confState, or (b) update e.voterCount, e.isLearnerNode, and e.config.Servers inline at the call site (between rawNode.ApplyConfChange and applyConfigChange). The design should specify which approach is intended so the implementation doesn't improvise. Note: nextPeersAfterConfigChange already has the ConfState parameter pattern (engine.go:3108), so option (a) is consistent with existing code shape.
Summary
The design is very close to implementation-ready. The two Round 2 correctness issues are properly resolved, and §4.6 is now the clearest section of the document. The two new issues above are both minor specification gaps that an implementer would need to resolve before writing the first line of code. Addressing either in the design doc (ordering contract vs. set-based comparison; signature extension vs. inline update) will close the loop.
claude[bot] Round 3 confirmed both Round-2 issues; flagged two minor specification gaps and a residual numbering issue:

- §4.2 edit 3 Resolution (a): switch the learner check to a set-based comparison instead of element-wise. There is no prior convention pinning learner ordering in persisted state (learners have never been persisted before), and a set comparison is strictly more robust than retroactively introducing an ordering contract on the v2 writer. The voter check stays element-wise, matching the existing well-established invariant.
- §4.2 edit 4: extend `applyConfigChange` / `applyConfigChangeV2` signatures to take `confState raftpb.ConfState`, mirroring the existing `nextPeersAfterConfigChange` shape. This is the natural threading point for the suffrage-aware updates of `e.config.Servers`, `e.voterCount`, and `e.isLearnerNode`. Document that suffrage updates run after the per-peer cleanup helpers so the apply-loop ordering is deterministic.
- §8: full renumber. Items were 1, 2, 3, 3, 4 — now 1, 2, 3, 4, 5.
@claude review

Addressed all three Round-3 items (commit 34c84dc).
Claude finished @bootjp's task in 3m 15s.
| # | Point | Status |
|---|---|---|
| 1 (Minor) | `validateConfState` Resolution (a) learner ordering — set-based comparison | ✅ §4.2 edit 3: voter check stays element-wise; learner check is set-based with `allLearnersPresent`. Set comparison avoids a new ordering contract on the v2 writer. |
| 2 (Minor) | `applyConfigChange` / `applyConfigChangeV2` signatures | ✅ §4.2 edit 4: chose option (a). Extended signatures accept `confState raftpb.ConfState`, mirroring `nextPeersAfterConfigChange`. Ordering note present: suffrage-aware updates run after per-peer cleanup helpers. |
| 3 (Minor) | §8 numbering (1, 2, 3, 3, 4 → 1, 2, 3, 4, 5) | ✅ §8 now has five distinct numbered items. |
New issues
1. Set-based learner check formula has a logic error — High priority
§4.2 edit 3 specifies the learner check as:
len(conf.Learners) != filteredLearnerCount && allLearnersPresent(conf.Learners, learnerSet)
If this expression is the error condition (return errClusterMismatch when true), the && is wrong. Consider:
| `conf.Learners` | peers-file learners | Expected | Formula result |
|---|---|---|---|
| `[A, B]` | `[A, B]` | valid | `2 != 2 && true` = `false` ✓ |
| `[A, B]` | `[A, C]` (count same, members differ) | error | `2 != 2 && false` = `false` ✗ — mismatch undetected |
| `[A, B]` | `[A]` | error | `2 != 1 && false` = `false` ✗ — mismatch undetected |
| `[A]` | `[A, B]` | error | `1 != 2 && true` = `true` ✓ |
Cases 2 and 3 — a same-count member divergence and a conf-learner not present in peers — are silently passed through by the &&. An implementer copying this formula would produce a broken validation that accepts a mismatched cluster on restart.
The correct error condition is:

```go
len(conf.Learners) != filteredLearnerCount || !allLearnersPresent(conf.Learners, learnerSet)
```

Equivalently, the valid condition:

```go
len(conf.Learners) == filteredLearnerCount && allLearnersPresent(conf.Learners, learnerSet)
```

The voter check (`len(conf.Voters) != len(expected.Voters)`) uses `!=` as the error condition — the learner check should follow the same pattern, just with a set-membership predicate replacing element-wise order.
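The truth table above can be checked mechanically. A sketch of the corrected check; `learnerMismatch` and `set` are hypothetical helper names, and the peers-file learner set is passed in directly rather than filtered from a peers slice:

```go
package main

import "fmt"

// learnerMismatch is the corrected error condition: it fires on
// EITHER a count mismatch OR a set-membership mismatch. learnerSet
// models the peers-file entries whose suffrage is "learner".
func learnerMismatch(confLearners []uint64, learnerSet map[uint64]bool) bool {
	if len(confLearners) != len(learnerSet) {
		return true // count mismatch
	}
	for _, id := range confLearners {
		if !learnerSet[id] {
			return true // same count, members differ
		}
	}
	return false
}

// set builds a learner set from node IDs.
func set(ids ...uint64) map[uint64]bool {
	m := make(map[uint64]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	return m
}

func main() {
	fmt.Println(learnerMismatch([]uint64{1, 2}, set(1, 2))) // false: valid
	fmt.Println(learnerMismatch([]uint64{1, 2}, set(1, 3))) // true: same count, members differ
	fmt.Println(learnerMismatch([]uint64{1, 2}, set(1)))    // true: learner missing from peers
	fmt.Println(learnerMismatch([]uint64{1}, set(1, 2)))    // true: extra learner in peers file
}
```

All four rows of the truth table come out as expected with the OR form, including the two cases the `&&` form silently accepted.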
2. e.voterCount update ordering contradicts itself between §4.2 and §4.6 / §5.2 — Medium priority

§4.2 edit 4 says:

> Inside the body, suffrage-aware updates run after the existing per-peer cleanup helpers (`upsertPeer` / `removePeer`) so address cache mutations and voter-count rebuilds happen in a deterministic order on the single-threaded apply loop

But §4.6 item 3 says:

> the apply loop updates `e.voterCount` from the post-change `ConfState` before `removePeer` is called for any address-cache cleanup, so `removePeer` simply reads `e.voterCount`.

And §5.2 confirms the same:

> the apply loop updates `e.voterCount` and `e.isLearnerNode` before invoking the per-peer cleanup helpers (`upsertPeer`, `removePeer`)

These are directly contradictory. An implementer following §4.2 (update AFTER) and reading `removePeer` would find that `e.voterCount` still reflects the pre-change state when `removePeer` runs — which means `removePeer` cannot "simply read `e.voterCount`" as §4.6 promises. The post-removal voter count would be stale by one.

The correct ordering is voterCount update before `removePeer` / `upsertPeer`, as §4.6 and §5.2 state. The "after" in §4.2 edit 4 is wrong and should be changed to "before". The rationale for "before": `removePeer` reads `e.voterCount` to compute the post-removal ack-tracker threshold — this requires seeing the already-committed conf change, not the pre-change count.
Summary
The design has absorbed four rounds of review and is close to implementation-ready. The two new issues above need to be resolved in the doc before code is written — the set-based formula bug in particular would propagate directly into a broken validateConfState if copied verbatim by an implementer.
claude[bot] Round 4 confirmed all three Round-3 items; flagged two
genuine bugs in the prose:
- §4.2 edit 3: the set-based learner check formula used && where it
should have been ||. The error condition needs to fire on EITHER
count mismatch OR set-membership mismatch — the && form silently
accepts same-count member divergence and conf-learner-missing-from-peers
cases. Replaced with the correct || form, plus an explicit valid-
condition restatement and a note on which mismatch cases the form
catches.
- §4.2 edit 4: the doc said suffrage-aware updates run after per-peer
cleanup, but §4.6 and §5.2 say they run before. The contradiction
resolves to two different orderings:
- voterCount / isLearnerNode update BEFORE upsertPeer / removePeer
(so removePeer reads the post-change count for its ack-tracker
threshold).
- e.config.Servers rebuild AFTER upsertPeer / removePeer (so it
reads the post-cleanup address cache).
Spelled out as a four-step apply-loop sequence to make the
ordering unambiguous for an implementer.
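The four-step sequence reads naturally as code. A hypothetical sketch with heavily simplified types (the real engine carries addresses, a full server struct, and more bookkeeping); it exists only to pin the ordering — voter cache first, peer cleanup second, server-list rebuild third:

```go
package main

import "fmt"

// Minimal stand-ins for raftpb.ConfState and the engine fields.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

type engine struct {
	voterCount    int
	isLearnerNode map[uint64]bool
	peers         map[uint64]string // nodeID -> address cache
	servers       []uint64          // stand-in for e.config.Servers
}

// applyConfChange sketches the four-step apply-loop ordering.
func (e *engine) applyConfChange(cs confState, removed []uint64) {
	// Step 1: voter cache first. removePeer-style cleanup needs the
	// post-change count for its ack-tracker threshold.
	e.voterCount = len(cs.Voters)
	e.isLearnerNode = make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		e.isLearnerNode[id] = true
	}
	// Step 2: per-peer cleanup touches only the address cache.
	for _, id := range removed {
		delete(e.peers, id)
	}
	// Step 3: rebuild the server view AFTER cleanup, so it reads the
	// post-cleanup address cache.
	e.servers = e.servers[:0]
	for _, id := range append(append([]uint64{}, cs.Voters...), cs.Learners...) {
		if _, ok := e.peers[id]; ok {
			e.servers = append(e.servers, id)
		}
	}
	// Step 4: (elided) configIndex bookkeeping.
}

func main() {
	e := &engine{peers: map[uint64]string{1: "a", 2: "b", 3: "c"}}
	e.applyConfChange(confState{Voters: []uint64{1, 2}}, []uint64{3})
	fmt.Println(e.voterCount, len(e.peers), e.servers) // 2 2 [1 2]
}
```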
@claude review

Both Round-4 issues were genuine bugs in my prose. Fixed in commit a9b918f.
Claude finished @bootjp's task in 3m 3s.
| # | Point | Status |
|---|---|---|
| 1 (High) | Set-based formula `&&` → OR | ✅ §4.2 edit 3: replaced with the correct OR error condition (fires on either count mismatch or set-membership mismatch), plus an explicit valid-condition restatement and a note on which mismatch cases the form catches. |
| 2 (Medium) | Ordering contradiction between §4.2, §4.6, §5.2 | ✅ §4.2 edit 4 now has an explicit 4-step sequence that resolves the contradiction: (1) recompute `e.voterCount` / `e.isLearnerNode` before `upsertPeer` / `removePeer`, (2) run helpers, (3) rebuild `e.config.Servers` after helpers, (4) existing bookkeeping. §4.6 item 3 ("the apply loop updates `e.voterCount` ... before `removePeer`") and §5.2 ("updates `e.voterCount` ... before invoking the per-peer cleanup helpers") are now consistent with §4.2. |
New issues
No new correctness issues introduced by the revisions. Two very minor documentation gaps that an implementer would need to infer, noted here for completeness but not blocking:
1. learnerSet construction is implied, not specified — Very minor
§4.2 edit 3 uses learnerSet and filteredLearnerCount in the pseudo-code without defining them. The reader must infer: filteredLearnerCount = count of peers where Suffrage == "learner" and learnerSet = {nodeID | peer in peers, peer.Suffrage == "learner"}. This is strongly implied by context and a competent implementer would get it right, but a one-line comment in the validation helper would close the ambiguity at implementation time.
2. learnerSetFromConfState helper referenced in §4.6 but not defined — Very minor

```go
isLearnerNode: learnerSetFromConfState(prepared.disk.LocalSnap.Metadata.ConfState),
```

The helper's return type (`map[uint64]bool`), key semantics (learner node IDs → true), and implementation (iterate `conf.Learners`, set each to true) are obvious from context. No action needed before implementation.
3. Test plan gap: validateConfState mixed-peers mismatch — Informational
§5.5 does not include an explicit unit test for the regression validateConfState edit 3 guards against — specifically, that a node with a mixed voter+learner peers file and a snapshot whose ConfState correctly separates them does not trigger errClusterMismatch on restart. The conformance test ("including after restart") exercises this path end-to-end, so the regression is covered indirectly, but a focused unit test for validateConfState itself (isolated from the multi-node setup) would make it easier to catch regressions in the validation helper in isolation. Not blocking — the end-to-end test is sufficient.
Summary
The design has successfully absorbed five rounds of review. Both high-priority correctness issues from the previous four rounds are resolved, and the document is now internally consistent across §4.2, §4.6, and §5.2. No new correctness issues were introduced by the Round-5 revisions. The design is ready to proceed to Milestone 1 implementation.
Per learner design doc §4.3.

Adds `Peer.Suffrage` as a v2-peers-file artifact and bumps the persisted etcd-raft-peers.bin format to v2 with a per-peer suffrage byte (0=voter, 1=learner). The v1 reader is preserved for upgrade compatibility — v1 files are read as all-voter, which is exactly the current world.

Suffrage on the Peer struct is intentionally NOT consulted by `confStateForPeers` (cold bootstrap stays voters-only) or `configurationFromConfState` (hot path reads ConfState directly). It exists solely to round-trip through the v2 file across restarts so the post-WAL-replay ConfState view and the operator-visible peers file stay consistent. `ConfState` remains the authoritative source of suffrage truth at runtime.

Tests cover v2 round-trip with mixed voter/learner peers, v1 forward-compat reading as all-voter, and rejection of unknown file versions.
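The round-trip described above can be illustrated with a stripped-down encoder/decoder. The actual v2 file layout (version header, addresses, index) is richer; this sketch keeps only a nodeID plus the per-peer suffrage byte, with all names hypothetical:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// peer models only the two fields this sketch needs: a node ID and
// the per-peer suffrage byte (0 = voter, 1 = learner).
type peer struct {
	NodeID   uint64
	Suffrage byte
}

// encodePeers writes each peer as an 8-byte big-endian nodeID
// followed by one suffrage byte.
func encodePeers(peers []peer) []byte {
	var buf bytes.Buffer
	for _, p := range peers {
		_ = binary.Write(&buf, binary.BigEndian, p.NodeID) // bytes.Buffer writes cannot fail
		buf.WriteByte(p.Suffrage)
	}
	return buf.Bytes()
}

// decodePeers reads the same fixed-width layout back.
func decodePeers(b []byte) ([]peer, error) {
	r := bytes.NewReader(b)
	var peers []peer
	for r.Len() > 0 {
		var p peer
		if err := binary.Read(r, binary.BigEndian, &p.NodeID); err != nil {
			return nil, err
		}
		s, err := r.ReadByte()
		if err != nil {
			return nil, err
		}
		p.Suffrage = s
		peers = append(peers, p)
	}
	return peers, nil
}

func main() {
	in := []peer{{1, 0}, {2, 0}, {4, 1}} // two voters, one learner
	out, err := decodePeers(encodePeers(in))
	fmt.Println(err, out) // <nil> [{1 0} {2 0} {4 1}]
}
```

A v1 reader in this model would simply have no suffrage byte to read and default every decoded peer to suffrage 0, matching the all-voter forward-compat behavior described in the commit.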
Per learner design doc sections 4.2 and 4.6. Wires the dormant learner scaffolding without exposing any operator-facing API yet -- the engine now correctly applies ConfChangeAddLearnerNode in the live path, reports suffrage end-to-end, and keeps learners out of the voter-quorum denominator.
Engine:
- New apply-loop-owned voterCount + isLearnerNode cache, initialized at Open from the snapshot ConfState (Voters / Learners) and rebuilt from scratch on every conf change (never patched, so promotion of a learner cannot leak a stale isLearnerNode entry).
- applyConfigChange / applyConfigChangeV2 take the post-change ConfState (mirroring nextPeersAfterConfigChange) and run a four-step sequence: refreshVoterCache, per-peer cleanup, refreshConfigServers, configIndex bookkeeping. Step ordering matters: removePeer reads voterCount for the post-removal ack-tracker threshold, so the cache must be refreshed first; configurationFromConfState reads the post-cleanup e.peers, so the config rebuild must run after.
- applyConfigPeerChange routes ConfChangeAddLearnerNode through applyAddedPeer (suffrage is an attribute of ConfState, not e.peers).
- configurationFromConfState walks both Voters and Learners, emitting the right Suffrage string for each. upsertPeer / removePeer no longer touch e.config.Servers -- the apply loop owns it now via refreshConfigServers. Removed the dead upsertConfigServer / removeConfigServer / sortConfigServers / serverNodeID helpers.
len(e.peers) overloads (sections 4.6 and 3.3 audit):
- recordQuorumAck uses voterCount and early-rejects acks from learners. The 1-voter+1-learner stall the design flagged is the specific scenario this fixes.
- refreshStatus single-node fast path uses voterCount.
- removePeer uses postRemovalVoterCount instead of len(e.peers) for the ack-tracker threshold.
- nextPeersAfterConfigChange already only walks addresses; audited no-op.
validateConfState (section 4.2 edit 3):
- Stops rejecting non-empty Learners. Joint-consensus markers (VotersOutgoing, LearnersNext, AutoLeave) stay rejected.
- Builds expected voter slice filtered by Suffrage; voter check stays element-wise.
- Learner check is set-based, rejecting both same-count member divergence and missing-from-peers.
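The voter-count cache and learner-aware ack rejection described above can be sketched like this. The types and names are illustrative stand-ins for the real engine code, not the actual implementation.

```go
package main

import "fmt"

// confState is a stand-in for the relevant slice of raftpb.ConfState.
type confState struct {
	Voters   []uint64
	Learners []uint64
}

type engine struct {
	voterCount    int
	isLearnerNode map[uint64]bool
}

// refreshVoterCache rebuilds both fields from scratch on every conf
// change -- never patched in place, so promoting a learner can never
// leave a stale isLearnerNode entry behind.
func (e *engine) refreshVoterCache(cs confState) {
	e.voterCount = len(cs.Voters)
	e.isLearnerNode = make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		e.isLearnerNode[id] = true
	}
}

// recordQuorumAck reports whether an ack counts toward the voter
// quorum: learner acks are rejected so a slow learner cannot inflate
// or stall the lease-read denominator.
func (e *engine) recordQuorumAck(nodeID uint64) bool {
	return !e.isLearnerNode[nodeID]
}

func main() {
	e := &engine{}
	e.refreshVoterCache(confState{Voters: []uint64{1}, Learners: []uint64{2}})
	fmt.Println(e.voterCount, e.recordQuorumAck(2))
}
```

Note how a subsequent promotion (node 2 moving into Voters) is handled purely by the rebuild: the old learner entry disappears because the map is reconstructed, not edited.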
Per learner design doc section 4.1. The Admin interface and the etcd
backend now expose the operator-facing API for learner membership
mutations. No proto / CLI / flag changes yet -- those land in M2.
Engine API:
- Admin.AddLearner attaches a non-voting replica via V1
ConfChangeAddLearnerNode. Mirrors AddVoter on the same submit path.
- Admin.PromoteLearner promotes via V1 ConfChangeAddNode. Two
preconditions run on the leader (single-threaded admin loop) BEFORE
the propose so a failed precondition does not enter the log:
1. target must currently be in ConfState.Learners (not Voters);
2. when minAppliedIndex is non-zero, leader-side
Progress.Match >= minAppliedIndex.
rawNode.Status() is read directly inside the admin handler -- same
goroutine that owns rawNode -- so no clone or lock is required.
etcd backend:
- New adminAction values and a proposeMembershipChange helper that
factors out the encode + storePendingConfig + ProposeConfChange
dance shared by AddVoter / AddLearner / PromoteLearner.
- New sentinel errors: errPromoteLearnerNotLearner,
errPromoteLearnerNoProgress, errPromoteLearnerNotCaughtUp. All
three surface as FailedPrecondition through the existing
adminError mapping in internal/raftadmin.
- submitAdminEx threads the richer adminRequest (now carrying
minAppliedIndex) through the same channel; submitAdmin keeps its
old shape for the unchanged callers.
Tests (internal/raftengine/etcd/learner_test.go):
- AddLearner happy path: suffrage reported correctly on both peers,
writes commit on a 1-voter cluster (its own quorum) and replicate
to the learner.
- Promote happy path: AddLearner -> warm a write -> PromoteLearner ->
Configuration() flips to two voters.
- PromoteLearner rejects non-learner targets pre-propose.
- PromoteLearner rejects not-caught-up targets via minAppliedIndex.
- Lease-read regression: 1-voter + 1-learner stays on the
single-node fast path (LastQuorumAck stays warm).
The Admin interface gained AddLearner / PromoteLearner in the previous commit; without these stubs the existing fakeEngine no longer satisfies the interface, which would silently bypass the admin type-assertion in NewServer and surface as Unimplemented errors in TestServerMapsEngineAdminMethods. The fake records calls in matching slices so future tests for the Server learner RPCs (M2) can assert on them.
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/raftengine/etcd/engine.go (1)
2654-2657: ⚠️ Potential issue | 🟡 Minor

Single-node fast-path campaign is skipped when learners are present.

`shouldCampaignSingleNode` returns true only when `len(cfg.Servers) == 1`. After this PR, `cfg.Servers` includes learners, so a node restarting as the sole voter in a 1-voter + N-learner cluster will not take the fast-path `Campaign()` and instead waits a full electionTimeout (~1s) for the standard election path. Functionally correct, but a minor startup-latency regression and a behavior change versus today.

Suggested fix: count voters via `e.voterCount` (already wired up post-Open) or filter `cfg.Servers` by `Suffrage == SuffrageVoter`:

♻️ Proposed fix

```diff
 func (e *Engine) shouldCampaignSingleNode() bool {
 	cfg := e.currentConfiguration()
-	return len(cfg.Servers) == 1 && cfg.Servers[0].ID == e.localID
+	voterCount := 0
+	var soleVoter raftengine.Server
+	for _, s := range cfg.Servers {
+		if s.Suffrage == SuffrageVoter {
+			voterCount++
+			soleVoter = s
+		}
+	}
+	return voterCount == 1 && soleVoter.ID == e.localID
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/engine.go` around lines 2654 - 2657, shouldCampaignSingleNode currently counts all entries in cfg.Servers (including learners) so the single-node fast-path is skipped when learners exist; update shouldCampaignSingleNode to consider only voters by either using e.voterCount (available after Open) or by filtering cfg.Servers for entries with Server.Suffrage == SuffrageVoter and then returning true when the voter count == 1 and that sole voter’s ID matches e.localID; locate and modify the shouldCampaignSingleNode method to perform this voter-only check (reference: shouldCampaignSingleNode, e.voterCount, cfg.Servers, SuffrageVoter).
🧹 Nitpick comments (6)
internal/raftengine/etcd/peer_metadata_test.go (2)
87-103: Test name doesn't match the assertion.
`TestPersistedPeersV2RewritesAfterV1Read` never reads a v1 file -- it just calls `savePersistedPeers` once and verifies the writer emitted v2. Consider renaming to something like `TestPersistedPeersWriterAlwaysEmitsV2` so future readers don't go hunting for the v1→v2 migration step that isn't there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/peer_metadata_test.go` around lines 87 - 103, Rename the test TestPersistedPeersV2RewritesAfterV1Read to a name that matches its behavior (e.g., TestPersistedPeersWriterAlwaysEmitsV2) and update any in-test comment text to reflect that it verifies the writer emits v2; locate the test function by its current name TestPersistedPeersV2RewritesAfterV1Read and adjust references if applicable, leaving the calls to savePersistedPeers, readPeersFileHeader, peersFileName and the assertion against peersFileVersionV2 unchanged.
13-121: Missing coverage for the unknown-suffrage-byte path.
`readPersistedPeer` rejects any v2 suffrage byte other than `0` or `1` (peer_metadata.go:181-183), but no test exercises that error branch. Consider adding a fixture that writes a v2 entry with `suffrage = 0xFF` and asserts `LoadPersistedPeers` returns an error -- it pins the validation that prevents silent voter coercion if a future binary writes a new suffrage variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/peer_metadata_test.go` around lines 13 - 121, Add a test (e.g., TestPersistedPeersV2UnknownSuffrage) that uses replaceFile to hand-craft a v2 peers file: write peersFileMagic, writeU32(peersFileVersionV2), writeU64(index), writeU32(count), then for a peer writeU64(NodeID), writeString(ID), writeString(Address) and then write a single suffrage byte 0xFF; call LoadPersistedPeers and assert it returns an error. This will exercise the readPersistedPeer path that rejects unknown suffrage bytes and ensure the validation remains enforced.

internal/raftadmin/server_test.go (1)
141-209: Remove unused learner admin fake methods and recording fields.

The proto service (proto/service.proto) does not define `AddLearner` or `PromoteLearner` RPCs, and the Server does not expose these methods. The fake engine's `AddLearner`, `PromoteLearner` method implementations and the `addLearnerCalls`, `promoteLearnerCalls` recording fields (internal/raftadmin/server_test.go lines 27–28, 106–118) are dead code and should be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftadmin/server_test.go` around lines 141 - 209, The test contains dead code: remove the unused fake engine methods AddLearner and PromoteLearner and their recording slices addLearnerCalls and promoteLearnerCalls from the fakeEngine test fixture and any related test assertions; update the fakeEngine type (references to AddLearner, PromoteLearner, addLearnerCalls, promoteLearnerCalls) to only implement and record the RPCs actually used by Server (e.g., AddVoter, RemoveServer, TransferLeadership) so the test compiles and only exercises defined proto RPCs and Server methods.

internal/raftengine/etcd/engine.go (1)

1419-1442: Synchronous `PromoteLearner` precondition is sound; small ergonomic note on `minAppliedIndex == 0`.

Running both preconditions on the admin/run loop before `ProposeConfChange` is the right pattern -- a failed precondition cannot leave a half-applied entry in the log. The use of `e.rawNode.Status()` is safe since this handler runs on the rawNode-owning goroutine; the `Status()` clone cost is acceptable for this rarely-invoked admin operation.

Per §8 open question 3 in the design doc, `minAppliedIndex == 0` silently disables the primary safety check, while `prevIndex == 0` is a benign no-op skip. Operators copy-pasting from existing scripts could inadvertently lose this guard. Consider tracking a follow-up to add an explicit `skip_min_applied_check` boolean (or refusing `0` and requiring an explicit sentinel) before declaring this surface stable in Milestone 2.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/engine.go` around lines 1419 - 1442, The code silently treats minAppliedIndex==0 as “skip check”, which is error-prone; update the PromoteLearner request handling to require an explicit sentinel instead of implicit 0: add a boolean flag (e.g., SkipMinAppliedCheck) to the admin request struct and replace the current minAppliedIndex==0 branch with a check that only skips the applied-index verification when SkipMinAppliedCheck is true; otherwise return a validation error when minAppliedIndex==0 (or require callers to supply a non-zero minAppliedIndex). Change the handler that currently reads req.minAppliedIndex and the callers that construct the request so they set the new SkipMinAppliedCheck flag or provide a concrete minAppliedIndex, and keep the existing use of e.rawNode.Status(), progress.Match, e.isLearnerNode and e.proposeMembershipChange unchanged.

docs/design/2026_04_26_proposed_raft_learner.md (1)

1-941: Thorough design doc; the four-pass review history shows in the precision of §4.2 and §4.6.

The design satisfies the design-doc-first learning ("For any change beyond a single-file edit ... write a `*_proposed_*.md` design doc first and land it before implementation."), and the iterative refinements (set-based learner check, ordering-point split between voter-cache refresh and config-server rebuild, `refreshVoterCache` rebuild-from-scratch contract) are all reflected in the implementation.

Two small polish items:

- Line 60 and Line 19 reference specific engine.go line numbers (`engine.go:2656`, `engine.go:3096`, `engine.go:174`). The doc acknowledges line-number drift on Line 128, but for these forward references in §2 / §3.2 you may want to switch to function-name + brief excerpt the same way later sections do, to stay consistent with the round-1 fix the author mentioned in the PR notes.
- §8 open question 3 (Lines 909–917) flagging the `min_applied_index == 0` footgun is the right thing to track explicitly; consider promoting it to a Milestone-2 gate so the operator surface doesn't ship the ergonomic hazard by default.

Based on learnings: "For any change beyond a single-file edit ... write a `*_proposed_*.md` design doc first and land it before implementation."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/2026_04_26_proposed_raft_learner.md` around lines 1 - 941, Replace brittle line-number references with stable function-name + brief excerpt references and promote the min_applied_index==0 footgun to a Milestone-2 gate: update the two forward refs in §2/§3.2 that currently say "engine.go:2656", "engine.go:3096", "engine.go:174" to instead cite the corresponding function names (e.g., the methods or handlers those offsets refer to) plus a short excerpt of the behavior, and in §8 (open question 3) move the discussion about min_applied_index == 0 from an open question into the Milestone-2 checklist (or add an explicit Milestone-2 acceptance criterion) so the operator-facing RPC surface cannot ship without an approved ergonomics decision about requiring an explicit skip flag or similar.

internal/raftengine/etcd/learner_test.go (1)

17-209: Solid end-to-end coverage; the negative cases pin down the precondition contract.

The five tests directly map to the design doc's M1 exit criteria and the §4.6 lease-read regression scenarios. Using sentinel matching with `errors.Is` (Lines 145, 170) for the precondition errors keeps them resilient against future error-message tweaks.

Per coding guidelines (prefer table-driven test cases), `TestPromoteLearnerRejectsNonLearner` and `TestPromoteLearnerRejectsNotCaughtUp` share nearly identical scaffolding and could collapse into a single table-driven test with `wantErr` sentinels. Optional -- readability of standalone tests is also fine.

As per coding guidelines: "Co-locate test files in the same package and use table-driven test patterns."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/learner_test.go` around lines 17 - 209, Tests TestPromoteLearnerRejectsNonLearner and TestPromoteLearnerRejectsNotCaughtUp duplicate setup and assertion logic; collapse them into a single table-driven test (e.g., TestPromoteLearnerPreconditionCases) that iterates over cases with fields {name, setupAction, wantErr} to exercise the two failure scenarios. In each case reuse the existing scaffolding (newTransportTestNodes, startTransportTestServers, openTransportTestNode, waitForLeaderNode) and perform either AddVoter or AddLearner + unreachable-index promotion as before, then call leader.engine.PromoteLearner(...) and assert errors.Is(err, wantErr) using the sentinel errors errPromoteLearnerNotLearner and errPromoteLearnerNotCaughtUp; keep each case independent via t.Run and the same ctx/cancel pattern. Ensure test names reflect the case and remove the original two standalone tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/raftengine/etcd/engine.go`:
- Around line 2654-2657: shouldCampaignSingleNode currently counts all entries
in cfg.Servers (including learners) so the single-node fast-path is skipped when
learners exist; update shouldCampaignSingleNode to consider only voters by
either using e.voterCount (available after Open) or by filtering cfg.Servers for
entries with Server.Suffrage == SuffrageVoter and then returning true when the
voter count == 1 and that sole voter’s ID matches e.localID; locate and modify
the shouldCampaignSingleNode method to perform this voter-only check (reference:
shouldCampaignSingleNode, e.voterCount, cfg.Servers, SuffrageVoter).
---
Nitpick comments:
In `@docs/design/2026_04_26_proposed_raft_learner.md`:
- Around line 1-941: Replace brittle line-number references with stable
function-name + brief excerpt references and promote the min_applied_index==0
footgun to a Milestone-2 gate: update the two forward refs in §2/§3.2 that
currently say "engine.go:2656", "engine.go:3096", "engine.go:174" to instead
cite the corresponding function names (e.g., the methods or handlers those
offsets refer to) plus a short excerpt of the behavior, and in §8 (open question
3) move the discussion about min_applied_index == 0 from an open question into
the Milestone-2 checklist (or add an explicit Milestone-2 acceptance criterion)
so the operator-facing RPC surface cannot ship without an approved ergonomics
decision about requiring an explicit skip flag or similar.
In `@internal/raftadmin/server_test.go`:
- Around line 141-209: The test contains dead code: remove the unused fake
engine methods AddLearner and PromoteLearner and their recording slices
addLearnerCalls and promoteLearnerCalls from the fakeEngine test fixture and any
related test assertions; update the fakeEngine type (references to AddLearner,
PromoteLearner, addLearnerCalls, promoteLearnerCalls) to only implement and
record the RPCs actually used by Server (e.g., AddVoter, RemoveServer,
TransferLeadership) so the test compiles and only exercises defined proto RPCs
and Server methods.
In `@internal/raftengine/etcd/engine.go`:
- Around line 1419-1442: The code silently treats minAppliedIndex==0 as “skip
check”, which is error-prone; update the PromoteLearner request handling to
require an explicit sentinel instead of implicit 0: add a boolean flag (e.g.,
SkipMinAppliedCheck) to the admin request struct and replace the current
minAppliedIndex==0 branch with a check that only skips the applied-index
verification when SkipMinAppliedCheck is true; otherwise return a validation
error when minAppliedIndex==0 (or require callers to supply a non-zero
minAppliedIndex). Change the handler that currently reads req.minAppliedIndex
and the callers that construct the request so they set the new
SkipMinAppliedCheck flag or provide a concrete minAppliedIndex, and keep the
existing use of e.rawNode.Status(), progress.Match, e.isLearnerNode and
e.proposeMembershipChange unchanged.
In `@internal/raftengine/etcd/learner_test.go`:
- Around line 17-209: Tests TestPromoteLearnerRejectsNonLearner and
TestPromoteLearnerRejectsNotCaughtUp duplicate setup and assertion logic;
collapse them into a single table-driven test (e.g.,
TestPromoteLearnerPreconditionCases) that iterates over cases with fields {name,
setupAction, wantErr} to exercise the two failure scenarios. In each case reuse
the existing scaffolding (newTransportTestNodes, startTransportTestServers,
openTransportTestNode, waitForLeaderNode) and perform either AddVoter or
AddLearner + unreachable-index promotion as before, then call
leader.engine.PromoteLearner(...) and assert errors.Is(err, wantErr) using the
sentinel errors errPromoteLearnerNotLearner and errPromoteLearnerNotCaughtUp;
keep each case independent via t.Run and the same ctx/cancel pattern. Ensure
test names reflect the case and remove the original two standalone tests.
In `@internal/raftengine/etcd/peer_metadata_test.go`:
- Around line 87-103: Rename the test TestPersistedPeersV2RewritesAfterV1Read to
a name that matches its behavior (e.g., TestPersistedPeersWriterAlwaysEmitsV2)
and update any in-test comment text to reflect that it verifies the writer emits
v2; locate the test function by its current name
TestPersistedPeersV2RewritesAfterV1Read and adjust references if applicable,
leaving the calls to savePersistedPeers, readPeersFileHeader, peersFileName and
the assertion against peersFileVersionV2 unchanged.
- Around line 13-121: Add a test (e.g., TestPersistedPeersV2UnknownSuffrage)
that uses replaceFile to hand-craft a v2 peers file: write peersFileMagic,
writeU32(peersFileVersionV2), writeU64(index), writeU32(count), then for a peer
writeU64(NodeID), writeString(ID), writeString(Address) and then write a single
suffrage byte 0xFF; call LoadPersistedPeers and assert it returns an error. This
will exercise the readPersistedPeer path that rejects unknown suffrage bytes and
ensure the validation remains enforced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dd057d86-1600-4246-a75a-8b738033dbe8
📒 Files selected for processing (11)
- docs/design/2026_04_26_proposed_raft_learner.md
- internal/raftadmin/server_test.go
- internal/raftengine/engine.go
- internal/raftengine/etcd/engine.go
- internal/raftengine/etcd/engine_test.go
- internal/raftengine/etcd/learner_test.go
- internal/raftengine/etcd/peer_metadata.go
- internal/raftengine/etcd/peer_metadata_test.go
- internal/raftengine/etcd/peers.go
- internal/raftengine/etcd/persistence.go
- internal/raftengine/etcd/wal_store.go
Per learner design doc Milestone 2. Operator-facing surface for the M1 engine API: gRPC RaftAdmin service, internal/raftadmin server wiring, and the cmd/raftadmin CLI subcommands.
Proto:
- service.proto: AddLearner and PromoteLearner RPCs.
- RaftAdminAddLearnerRequest mirrors AddVoterRequest.
- RaftAdminPromoteLearnerRequest carries id, previous_index, and min_applied_index. min_applied_index = 0 skips the catch-up precondition (kept for symmetry with previous_index but discouraged in production -- the runbook recommends a recent leader commit).
internal/raftadmin/server.go:
- AddLearner / PromoteLearner methods route through the existing Admin interface. Reuses the adminError code mapping so engine-level FailedPrecondition (errPromoteLearnerNotLearner / errPromoteLearnerNotCaughtUp / errPromoteLearnerNoProgress) surface unchanged.
- fakeEngine and TestServerMapsEngineAdminMethods extended.
cmd/raftadmin:
- add_learner <id> <address> [previous_index]
- promote_learner <id> [previous_index] [min_applied_index]
- Updated usage strings and the master fallback list.
- Tests for both subcommands print "index: N" on success.
Per learner design doc Milestone 2 (operator surface) and section 4.5 (post-apply alarm enforcement model).
Flag plumbing:
- main.go: new --raftJoinAsLearner bool. Default false. The flag is an OPERATOR ALARM, not a consensus-level enforcement: it does not prevent the leader from issuing AddVoter for the joiner. If a post-apply ConfState lists this node in Voters, an ERROR-level structured log fires and JoinRoleViolationCount increments.
- raftengine.FactoryConfig.JoinAsLearner threads the flag through the engine factory.
- etcd backend Engine struct stores joinAsLearner and a one-shot joinAlarmFired latch (atomic.Bool) so the alarm fires at most once per process per engine.
Apply-loop alarm hook:
- alarmIfJoinedAsVoter() runs at the end of applyConfigChange, applyConfigChangeV2, AND applyReadySnapshot. The snapshot path is important: a brand-new joiner often catches the leader's snapshot before any conf-change entries land, so the alarm needs the snapshot ConfState as much as the entry ConfState.
- applyReadySnapshot also now refreshes voterCount / isLearnerNode from snapshot.Metadata.ConfState so post-snapshot reads of those fields (recordQuorumAck, refreshStatus) see the new voter set immediately, not just after the next conf-change entry.
- JoinRoleViolationCount() exposes the process-wide alarm counter for monitoring.
Behaviour test (§5.5 last bullet):
- TestLinearizableReadOnLearnerForwardsToLeader: a learner returns raftengine.ErrNotLeader from LinearizableRead today. Locking the surface in with a typed errors.Is check guards against any future regression that lets the learner accidentally answer LinearizableRead from local FSM (follower-served reads are an explicit non-goal of M2).
- TestJoinAsLearnerAlarmFiresWhenAddedAsVoter: boots a joiner with joinAsLearner=true, has the leader call AddVoter (the misuse the alarm catches), confirms JoinRoleViolationCount increments and the joiner stays running.
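The one-shot alarm latch described above can be sketched as follows. All names are illustrative stand-ins for the real engine code; only the latch-and-count pattern is the point.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Process-wide violation counter, mirroring JoinRoleViolationCount.
var joinRoleViolations atomic.Int64

type engine struct {
	localID        uint64
	joinAsLearner  bool
	joinAlarmFired atomic.Bool // one-shot latch: alarm fires at most once
}

// alarmIfJoinedAsVoter runs after every applied ConfState (conf-change
// entries AND snapshots): if a join-as-learner node shows up in Voters,
// log and count it -- but never block the change (operator alarm, not
// consensus-level enforcement).
func (e *engine) alarmIfJoinedAsVoter(voters []uint64) {
	if !e.joinAsLearner {
		return
	}
	for _, id := range voters {
		if id == e.localID && e.joinAlarmFired.CompareAndSwap(false, true) {
			joinRoleViolations.Add(1)
			fmt.Printf("ERROR: node %d joined as learner but ConfState lists it as voter\n", id)
		}
	}
}

func main() {
	e := &engine{localID: 7, joinAsLearner: true}
	e.alarmIfJoinedAsVoter([]uint64{1, 7})
	e.alarmIfJoinedAsVoter([]uint64{1, 7}) // latched: no second increment
	fmt.Println(joinRoleViolations.Load())
}
```

CompareAndSwap on the atomic.Bool is what makes the latch race-free even if two apply paths (entry and snapshot) observe the violation close together.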
Runbook: - docs/raft_learner_operations.md: end-to-end operator workflow, prerequisites, the five-step attach/promote/verify sequence, remove-learner via remove_server, common errors, observability, explicit non-goals (no cold-bootstrap as learner, no auto-promote, no follower-served reads in M2).
Real bug + nits surfaced by CodeRabbit on the M1+M2 branch.
Real bug (outside-diff):
- shouldCampaignSingleNode counted ALL servers in cfg.Servers (voters + learners). After learner support landed, a 1-voter + N-learner cluster on restart would skip the fast-path Campaign() and wait a full electionTimeout for the standard election. Now counts voters via Server.Suffrage. Same class as the four len(e.peers) overloads from §3.3 -- one I missed.
min_applied_index footgun (closes design §8 open question 3):
- proto: RaftAdminPromoteLearnerRequest gains skip_min_applied_check bool. The engine REJECTS min_applied_index == 0 with errPromoteLearnerMinAppliedZero unless skip_min_applied_check is also true, so a script that omits the catch-up bound gets a clean FailedPrecondition instead of silently disabling the safety check.
- raftengine.Admin.PromoteLearner gains the skipMinAppliedCheck parameter; etcd backend admin loop honors it.
- cmd/raftadmin promote_learner gains an optional skip_min_applied_check positional arg (true|false|1|0|yes|no).
- New unit test TestPromoteLearnerRejectsZeroMinAppliedWithoutSkip.
- Design doc §8: open question 3 marked Resolved. Runbook updated.
Test polish:
- TestPersistedPeersV2RewritesAfterV1Read renamed to TestPersistedPeersWriterAlwaysEmitsV2 (the body never reads a v1 file -- it just verifies writer-side invariant).
- New TestPersistedPeersV2UnknownSuffrageRejected pins the 0xFF rejection path that future suffrage variants must not coerce through.
Doc polish:
- §2 / §3.2 / §7 line-number references replaced with stable function-name + brief excerpt references, matching the convention the doc already commits to in §3.2's preamble.
@claude review Addressed CodeRabbit Round 1 (commit 2f3bff2): Real bug:
Test polish:
Doc polish:
Skipped:
Claude finished @bootjp's task in 13m 1s —— View job Review of
| Claim | Status |
|---|---|
| shouldCampaignSingleNode counts voters only | ✅ Confirmed at engine.go:2752. Filters cfg.Servers by Suffrage != SuffrageVoter, uses voter-only count for the == 1 check. |
| minAppliedIndex == 0 rejected unless skipMinAppliedCheck | ✅ Engine-level guard at engine.go:1097. Proto field skip_min_applied_check at service.proto:236. CLI parseOptionalBool at main.go:326. New test TestPromoteLearnerRejectsZeroMinAppliedWithoutSkip at learner_test.go:156. |
| TestPersistedPeersWriterAlwaysEmitsV2 renamed | ✅ Test at peer_metadata_test.go:92 correctly named. Comment updated to match -- "writer is unconditionally v2". |
| TestPersistedPeersV2UnknownSuffrageRejected added | ✅ Test at peer_metadata_test.go:113. Hand-crafts a v2 file with suffrage byte 0xFF, asserts LoadPersistedPeers returns an error. Write order (nodeID → suffrage byte → ID → address) correctly matches the production format. |
| Design doc line-number refs replaced | ✅ §2 / §3.2 / §7 now use function names + excerpts. |
New issue — High priority
nextPeersAfterConfigChange / nextPeersAfterConfigChangeV2 ignore the confState parameter — learner suffrage is never written to the v2 peers file
The apply loop (lines 1940-1958) calls:

```go
confState := e.rawNode.ApplyConfChange(cc)
nextPeers := e.nextPeersAfterConfigChange(cc.Type, cc.NodeID, cc.Context, *confState)
if err := e.persistConfigState(entry.Index, *confState, nextPeers); err != nil {
```

`nextPeersAfterConfigChange` (line 3431) receives `raftpb.ConfState` as `_` and ignores it:
```go
func (e *Engine) nextPeersAfterConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, _ raftpb.ConfState) []Peer {
	...
	applyConfigPeerChangeToMap(next, changeType, nodeID, context)
	return sortedPeerList(next)
}
```

`applyConfigPeerChangeToMap` decodes the peer from the ConfChange context bytes. `encodeConfChangeContext` (line 3696) only encodes NodeID, ID, and Address -- no Suffrage. The decoded peer always has `Peer.Suffrage == ""`.
All peers in e.peers also have Suffrage: "" because upsertPeer stores whatever was decoded from the ConfChange context (which lacks Suffrage). The comment at applyConfigPeerChange:2107 confirms this is intentional: "suffrage is an attribute of ConfState, not of e.peers."
The result: when persistConfigState writes the v2 peers file, every peer — including learners — gets Peer.Suffrage == "". persistedSuffrageByte("") returns persistedSuffrageVoter (0x00), so the learner is written as a voter.
Failure path on restart:
- Learner added at index N → config snapshot at N (`configSnapshotUpToDate` returns false for fresh clusters; bootstrap snapshot at index 1 < N) → `snapshotConfState.Learners = [learnerNodeID]`
- Peers file at index N -- learner has `Suffrage = ""` → written as voter
- Restart → `normalizeOpenConfig` loads peers file → `cfg.Peers = peers.Peers` → learner peer has `Suffrage: ""` → treated as voter
- `validateOpenPeers`: `persisted.Index = N == snapshot.Metadata.Index = N` → N > N is false → `validateConfState` runs
- `splitPeersBySuffrage(peers)` → learner (Suffrage="") lands in `expectedVoters` (not `learnerSet`)
- `validateConfStateVoters(conf, expectedVoters)` → `conf.Voters` has N entries, `expectedVoters` has N+1 → `errClusterMismatch` → startup failure
No existing test catches this — none of the learner tests restart the engine after AddLearner. The conformance test mentioned in §5 of the design doc exercises "after restart" but that scenario is not yet in the test file.
Fix: Use the confState parameter (currently _) in nextPeersAfterConfigChange and nextPeersAfterConfigChangeV2 to annotate each peer's suffrage before returning:
```go
func (e *Engine) nextPeersAfterConfigChange(changeType raftpb.ConfChangeType, nodeID uint64, context []byte, confState raftpb.ConfState) []Peer {
	e.mu.RLock()
	next := clonePeerMap(e.peers)
	e.mu.RUnlock()
	applyConfigPeerChangeToMap(next, changeType, nodeID, context)
	annotatePeerSuffrageFromConfState(next, confState) // set Suffrage from ConfState
	return sortedPeerList(next)
}
```

Where:
```go
func annotatePeerSuffrageFromConfState(peers map[uint64]Peer, conf raftpb.ConfState) {
	learners := make(map[uint64]bool, len(conf.Learners))
	for _, id := range conf.Learners {
		learners[id] = true
	}
	for id, peer := range peers {
		if learners[id] {
			peer.Suffrage = SuffrageLearner
		} else {
			peer.Suffrage = SuffrageVoter
		}
		peers[id] = peer
	}
}
```

Apply the same fix to `nextPeersAfterConfigChangeV2`. Per the design doc §4.3: "Its sole role is to round-trip the v2 peers file across restarts so the operator view of the peers file is consistent with the eventual ConfState view after WAL replay." That contract is currently broken.
Required test: Add a restart case to TestAddLearnerReplicatesWithoutCountingAsVoter (or a new TestLearnerPeersFilePersistsSuffrageAcrossRestart): add a learner, close all engines, reopen them, and assert they start without error. This is the missing conformance test referenced in §5.5.
Other observations (non-blocking)
promote_learner CLI usage string could warn about required min_applied_index
raftadmin promote_learner <id> (no other args) sends MinApplied=0, Skip=false, which the engine rejects with errPromoteLearnerMinAppliedZero. The usage string (main.go:425) shows [min_applied_index] as optional. The error message from the engine is descriptive, but the usage string could note "required unless skip_min_applied_check=true." Not blocking; the runbook covers this.
adminError maps all non-context errors to codes.FailedPrecondition (server.go:161)
errPromoteLearnerMinAppliedZero, errPromoteLearnerNotCaughtUp, and errPromoteLearnerNotLearner are all mapped to FailedPrecondition. This is semantically correct (the precondition was not met) and consistent with the existing AddVoter error mapping. ✓
TestJoinAsLearnerAlarmFiresWhenAddedAsVoter uses process-global counter correctly (learner_test.go:224)
before := JoinRoleViolationCount() captures the counter before the operation, so concurrent test execution against the global counter is handled safely. Since learner tests don't call t.Parallel(), intra-package sequencing prevents races. ✓
Summary
The five CodeRabbit Round 1 fixes are all correct and well-tested. The outstanding issue is a startup failure after a learner add: the v2 peers file writes the learner with Suffrage="" (treated as voter), causing validateConfState to fail on restart when the config snapshot and peers file land at the same index. The fix is small (use the already-received confState in nextPeersAfterConfigChange), but a restart test is also needed to lock it down.
CodeRabbit Round 2 caught a real correctness bug: AddLearner wrote
the v2 peers file with Suffrage="" for every peer (the ConfChange
context bytes do not encode suffrage and e.peers stores only
nodeID/ID/address). persistedSuffrageByte("") writes voter, so on
restart validateConfState saw a learner-as-voter peers list and
rejected startup with errClusterMismatch. This broke the design
doc §4.3 contract that the v2 peers file round-trips suffrage
across restarts.
The fix annotates Peer.Suffrage from the post-change ConfState at
the three persist sites:
1. nextPeersAfterConfigChange / nextPeersAfterConfigChangeV2:
stop ignoring the confState parameter; new
annotatePeerSuffrageFromConfState helper stamps Suffrage on the
per-peer clone before persistConfigState writes the v2 file.
2. prepareOpenState: at Open time, annotate the operator-provided
peer slice from the persisted snapshot's ConfState before
savePersistedPeers writes the bootstrap-time peers file. Without
this, a fresh joiner whose peers file is written by Open carries
the bootstrap suffrage forever.
3. applyReadySnapshot: when a learner restores a snapshot from the
leader (the common joiner path), the apply loop's
nextPeersAfterConfigChange hook is bypassed -- the conf change
is already incorporated into the snapshot's ConfState. New
savePeersFileForSnapshot helper writes the v2 peers file with
suffrage drawn from the snapshot's ConfState. Idempotent via
savePersistedPeers's existing index check.
Tests:
- New TestLearnerPeersFilePersistsSuffrageAcrossRestart: add a
learner, propagate a write so both peers are caught up, verify
both nodes' v2 peers files carry Suffrage="learner" for the
learner peer, then close + reopen both engines and assert no
startup failure.
- TestNextPeersAfterConfigChange{,V2}* expectations updated to
assert the annotated Suffrage rather than the empty-string
pre-fix output.
|
@claude review Round 2 caught a real high-priority bug. Fixed in commit 536a90a:
|
|
Claude finished @bootjp's task in 7m 52s.
|
| Claim | Status | Notes |
|---|---|---|
| `nextPeersAfterConfigChange` / `V2` use `confState` (no more `_`) | ✅ | engine.go:3470, engine.go:3481 — both call `annotatePeerSuffrageFromConfState(next, confState)` after `applyConfigPeerChangeToMap` |
| `prepareOpenState` annotates before bootstrap write | ✅ | engine.go:583 — `annotatePeerSuffrageInSlice(peers, disk.LocalSnap.Metadata.ConfState)` precedes `savePersistedPeers` |
| `applyReadySnapshot` writes peers file with annotated suffrage | ✅ | engine.go:1877 — `savePeersFileForSnapshot` calls `annotatePeerSuffrageInSlice` before `savePersistedPeers` |
| `TestLearnerPeersFilePersistsSuffrageAcrossRestart` closes + reopens | ✅ | learner_test.go:213-229 — genuine `Close()` + `Open()` (not mocked); would fail with `errClusterMismatch` without the fix |
| Pre-existing `TestNextPeersAfterConfigChange*` tests updated | ✅ | engine_test.go:1076-1103 — now assert `SuffrageVoter`/`SuffrageLearner` instead of `""` |
Four-step apply loop sequence — correct and consistent
applyConfigChange at engine.go:2046 exactly implements the §4.2 edit 4 sequence:
1. `refreshVoterCache(confState)` — rebuilds `voterCount` + `isLearnerNode` before peer helpers run, so `removePeer` reads the post-change voter count for its ackTracker threshold
2. `applyConfigPeerChange(...)` — mutates `e.peers` (address cache only, no Suffrage)
3. `refreshConfigServers(confState)` — rebuilds `e.config.Servers` from `e.peers` + ConfState after helpers, so it sees post-cleanup addresses and post-ConfState suffrage
4. `setConfigIndex` / `resolveConfigChange` / `alarmIfJoinedAsVoter` — bookkeeping
This matches applyConfigChangeV2 at engine.go:2055. The four-step comment at engine.go:2037-2045 is accurate.
annotatePeerSuffrageFromConfState logic — correct
```go
for id, peer := range peers {
	switch {
	case learners[id]:
		peer.Suffrage = SuffrageLearner
	case voters[id]:
		peer.Suffrage = SuffrageVoter
	default:
		continue // transient removal state — keep existing
	}
	peers[id] = peer
}
```

The `default: continue` correctly handles transient removal state (a peer not yet in either Voters or Learners, e.g., between ConfChangeRemoveNode propose and apply). Such peers will be dropped from `e.peers` when `applyConfigPeerChange` runs. The map is mutated in place, which is safe since `nextPeersAfterConfigChange` works on a clone of `e.peers`.
annotatePeerSuffrageInSlice uses the same logic for the slice form (bootstrap / snapshot paths) without an explicit default case — peers not in ConfState keep Suffrage="", which persistedSuffrageByte("") maps to voter (safe default for peers not yet cluster members).
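The slice form described above can be sketched with reduced stand-in types (the real `Peer` and `raftpb.ConfState` carry more fields; this only illustrates the stamping logic and the keep-empty default):

```go
package main

import "fmt"

type Suffrage string

const (
	SuffrageVoter   Suffrage = "voter"
	SuffrageLearner Suffrage = "learner"
)

// Peer and ConfState are reduced stand-ins for the engine's types.
type Peer struct {
	ID       uint64
	Suffrage Suffrage
}

type ConfState struct {
	Voters   []uint64
	Learners []uint64
}

// annotatePeerSuffrageInSlice mirrors the reviewed logic: stamp each
// peer from the ConfState sets; peers in neither set keep Suffrage "",
// which persists as voter (the safe default described above).
func annotatePeerSuffrageInSlice(peers []Peer, cs ConfState) {
	voters := make(map[uint64]bool, len(cs.Voters))
	for _, id := range cs.Voters {
		voters[id] = true
	}
	learners := make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		learners[id] = true
	}
	for i := range peers {
		switch {
		case learners[peers[i].ID]:
			peers[i].Suffrage = SuffrageLearner
		case voters[peers[i].ID]:
			peers[i].Suffrage = SuffrageVoter
		}
	}
}

func main() {
	peers := []Peer{{ID: 1}, {ID: 2}, {ID: 3}}
	annotatePeerSuffrageInSlice(peers, ConfState{Voters: []uint64{1}, Learners: []uint64{2}})
	fmt.Println(peers[0].Suffrage, peers[1].Suffrage, peers[2].Suffrage) // peer 3 stays ""
}
```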
validateConfState — set-based learner check is correct
validateConfStateLearners at wal_store.go:445:
```go
if len(conf.Learners) != len(expected) {
	return errors.Wrapf(...)
}
for _, nodeID := range conf.Learners {
	if _, ok := expected[nodeID]; !ok {
		return errors.Wrapf(...)
	}
}
```

Uses `||`-semantics (count mismatch OR member not-in-set → error), which is the correct fix from Round 4. The voter check stays element-wise (relies on the established NodeID-sort ordering invariant). The learner check is set-based (no prior ordering contract). Both cases from Round 4's table are handled: same-count-member-divergence (count passes, membership fails) and conf-learner-missing-from-peers (membership fails).
Voter cache initialization and snapshot-path refresh — correct
At Open (engine.go:508-509):
```go
voterCount:    len(prepared.disk.LocalSnap.Metadata.ConfState.Voters),
isLearnerNode: learnerSetFromConfState(prepared.disk.LocalSnap.Metadata.ConfState),
```

Both fields initialized from snapshot ConfState, adjacent to `e.config`, before `refreshStatus()` is called at engine.go:528. The false single-node fast-path activation for a 3-voter cluster on startup (Round 2 issue 2) is closed.
At snapshot restore (engine.go:1868):
```go
e.refreshVoterCache(snapshot.Metadata.ConfState)
e.setConfigurationFromConfState(snapshot.Metadata.ConfState, ...)
```

Voter cache is refreshed before config rebuild, matching the apply-loop ordering.
refreshVoterCache (engine.go:2107-2111) rebuilds both fields from scratch — never patches — so learner promotion cannot leave a stale isLearnerNode[id]=true entry.
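The rebuild-from-scratch property can be sketched as follows; the field and function names are from the review, the `ConfState` stand-in and `engine` shape are assumptions:

```go
package main

import "fmt"

// ConfState is a reduced stand-in for raftpb.ConfState.
type ConfState struct {
	Voters   []uint64
	Learners []uint64
}

// engine holds only the two cached fields the review discusses.
type engine struct {
	voterCount    int
	isLearnerNode map[uint64]bool
}

// refreshVoterCache sketches the rebuild-from-scratch behavior: both
// fields are replaced wholesale, never patched, so a promoted learner
// cannot leave a stale isLearnerNode entry behind.
func (e *engine) refreshVoterCache(cs ConfState) {
	e.voterCount = len(cs.Voters)
	m := make(map[uint64]bool, len(cs.Learners))
	for _, id := range cs.Learners {
		m[id] = true
	}
	e.isLearnerNode = m // replace, never patch
}

func main() {
	e := &engine{}
	e.refreshVoterCache(ConfState{Voters: []uint64{1}, Learners: []uint64{2}})
	fmt.Println(e.voterCount, e.isLearnerNode[2]) // 1 true
	// promote node 2: the rebuild drops the stale learner entry
	e.refreshVoterCache(ConfState{Voters: []uint64{1, 2}})
	fmt.Println(e.voterCount, e.isLearnerNode[2]) // 2 false
}
```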
Lock-free isLearnerNode read in recordQuorumAck — safe
recordQuorumAck at engine.go:1686 reads e.isLearnerNode[msg.From] without taking e.mu. The comment explains this is safe: both the writer (refreshVoterCache, called from applyConfigChange in the apply loop) and the reader (recordQuorumAck, called from handleStep) run on the same event-loop goroutine. The single-writer invariant is maintained by the event loop's sequential dispatch.
refreshStatus reads voterCount under e.mu.Lock() because Status() is a public API called from external goroutines.
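The denominator audit that these two fields feed can be sketched as a standalone function; the real `recordQuorumAck` is a method with more state, so the signature below is an assumption that only illustrates the voter-only counting:

```go
package main

import "fmt"

// quorumReached sketches the denominator audit the review verifies:
// acks from learners are ignored and the majority is computed over
// voterCount, so a slow learner cannot stall lease reads.
func quorumReached(acks map[uint64]bool, isLearnerNode map[uint64]bool, voterCount int) bool {
	n := 0
	for id, ok := range acks {
		if ok && !isLearnerNode[id] {
			n++
		}
	}
	return n >= voterCount/2+1
}

func main() {
	acks := map[uint64]bool{1: true, 2: true, 3: false}
	learners := map[uint64]bool{3: true}
	// 2 voters acked out of voterCount=2, and the stalled learner
	// (id 3) counts neither toward nor against quorum.
	fmt.Println(quorumReached(acks, learners, 2)) // true
}
```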
shouldCampaignSingleNode — correct
engine.go:2786-2805 counts only servers with Suffrage == SuffrageVoter, explicitly skipping learners. A 1-voter + N-learner cluster correctly returns true. Test coverage: TestRemovePeerLearnerKeepsSingleNodeFastPath exercises the fast-path while a learner is attached.
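A minimal sketch of that voter-only count, under the assumption that the fast path requires exactly one voter (the real method reads `e.config.Servers` and has a different signature):

```go
package main

import "fmt"

// Server mirrors a config entry's suffrage field; the string values
// are assumptions for this sketch.
type Server struct {
	Suffrage string
}

// shouldCampaignSingleNode sketches the voter-only count the review
// verifies: learners are skipped, so a 1-voter + N-learner cluster
// still qualifies for the single-node fast path.
func shouldCampaignSingleNode(servers []Server) bool {
	voters := 0
	for _, s := range servers {
		if s.Suffrage == "voter" {
			voters++
		}
	}
	return voters == 1
}

func main() {
	cluster := []Server{{Suffrage: "voter"}, {Suffrage: "learner"}, {Suffrage: "learner"}}
	fmt.Println(shouldCampaignSingleNode(cluster)) // true
}
```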
Restart test is not a mock
TestLearnerPeersFilePersistsSuffrageAcrossRestart (learner_test.go:162):
- Calls `leader.engine.Close()` and `nodes[1].engine.Close()` (not stubs)
- Calls `Open(ctx, OpenConfig{...})` with no `Peers` field — `normalizeOpenConfig` loads from the persisted peers file, so this exercises the full restart path through `loadPersistedPeersState` → `validateOpenPeers` → `validateConfState`
- Asserts `require.NoError(t, err)` — would surface `errClusterMismatch` if suffrage wasn't round-tripped
The assertion at learner_test.go:201-203 also directly checks the persisted file has SuffrageLearner before the restart, confirming the annotation happened at persist time rather than being deferred.
Minor observations (non-blocking)
1. applyReadySnapshot ordering: voter cache → config → peers file
The comment at engine.go:1870-1876 accurately explains the WHY for savePeersFileForSnapshot being called after setConfigurationFromConfState rather than the reverse. Worth noting: savePeersFileForSnapshot reads e.peers (address cache), not e.config.Servers (which has the suffrage). The suffrage for the file comes from snapshot.Metadata.ConfState passed to annotatePeerSuffrageInSlice. This is correct and the two data sources (address from e.peers, suffrage from ConfState) are independent.
2. parseOptionalBool in CLI accepts "" as false
main.go:333 has case "false", "0", "no", "":. This means an empty argument (e.g., raftadmin <addr> promote_learner n1 0 0 "") is treated as skip_min_applied_check=false. Not a correctness issue; the usage string at main.go:425 is descriptive enough.
3. fakePromoteLearnerCall assertion in server_test.go:229 uses only {id, prevIndex, minAppliedIndex} but not skipMinAppliedCheck. The call at line 202 passes SkipMinAppliedCheck: false (default), so skipMinAppliedCheck=false is implicitly asserted. Could add it for completeness but not wrong as-is.
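Observation 2 above can be sketched as follows. The false-branch case set (`"false", "0", "no", ""`) is quoted from the review; the true-branch spellings and error text are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptionalBool sketches the permissive CLI parsing described
// above: an empty argument is accepted and treated as false, so a
// trailing "" on the command line means skip_min_applied_check=false.
func parseOptionalBool(s string) (bool, error) {
	switch strings.ToLower(s) {
	case "true", "1", "yes":
		return true, nil
	case "false", "0", "no", "":
		return false, nil
	}
	return false, fmt.Errorf("invalid bool %q", s)
}

func main() {
	v, err := parseOptionalBool("") // empty arg reads as false
	fmt.Println(v, err)             // false <nil>
}
```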
Summary
All three persist-site fixes are correctly implemented and match the design doc's §4.3 contract. The new restart test actually closes and reopens the engines, pinning the regression. No new correctness issues introduced by commit 536a90a6. The implementation is ready to merge pending CI green.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 536a90a6f1
```go
fsmSnapDir:    filepath.Join(prepared.cfg.DataDir, fsmSnapDirName),
tickInterval:  prepared.cfg.TickInterval,
electionTick:  prepared.cfg.ElectionTick,
joinAsLearner: prepared.cfg.JoinAsLearner,
```
Trigger join-as-learner alarm during engine startup
When a node restarts with JoinAsLearner=true and its persisted snapshot already has the local node in ConfState.Voters, startup does not emit the alarm or increment JoinRoleViolationCount unless a later config change/snapshot happens. In this code path, Open seeds config state from LocalSnap.Metadata.ConfState but never calls alarmIfJoinedAsVoter, so the operator can miss the only expected warning in steady state. This is reproducible after a mis-join followed by restart with no further membership changes.
Codex caught a real gap: `alarmIfJoinedAsVoter` only ran from `applyConfigChange` / `applyConfigChangeV2` / `applyReadySnapshot`. None of those fire on a clean Open path -- the post-mis-join ConfState loaded into `prepared.disk.LocalSnap` is never replayed. Steady-state restart of a node that previously mis-joined as voter would have emitted no alarm and never bumped `JoinRoleViolationCount`, so the operator could miss the only signal in the steady state.

Fix:
- Open now calls `engine.alarmIfJoinedAsVoter(prepared.disk.LocalSnap.Metadata.ConfState)` once after `refreshStatus`, before `run()` spins up. The `joinAlarmFired` latch (`atomic.Bool`) ensures the alarm fires at most once per process even if a later snapshot or conf change reapplies the same role assignment.

Test:
- `TestJoinAsLearnerAlarmFiresAtStartupForPersistedVoterRole` opens the joiner with `joinAsLearner=false`, leader `AddVoters` it, propagates a write to land the conf change on the joiner's disk, closes the joiner, then reopens with `joinAsLearner=true` and asserts `JoinRoleViolationCount` incremented and the engine stays running.
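The latch mechanics can be sketched with a reduced `Engine`; the field and method names are from the fix, the surrounding types and the counter's representation are assumptions:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ConfState is a reduced stand-in for raftpb.ConfState.
type ConfState struct {
	Voters []uint64
}

// Engine sketch: only the fields relevant to the alarm latch.
type Engine struct {
	localID        uint64
	joinAsLearner  bool
	joinAlarmFired atomic.Bool
	violations     atomic.Int64
}

// alarmIfJoinedAsVoter sketches the latch: CompareAndSwap makes the
// alarm fire at most once per Engine instance even if Open, the apply
// loop, and a snapshot restore all observe the same voter role.
func (e *Engine) alarmIfJoinedAsVoter(cs ConfState) {
	if !e.joinAsLearner {
		return
	}
	for _, id := range cs.Voters {
		if id == e.localID {
			if e.joinAlarmFired.CompareAndSwap(false, true) {
				e.violations.Add(1)
				fmt.Println("ERROR: node joined as voter despite --raftJoinAsLearner")
			}
			return
		}
	}
}

func main() {
	e := &Engine{localID: 7, joinAsLearner: true}
	cs := ConfState{Voters: []uint64{1, 7}}
	e.alarmIfJoinedAsVoter(cs) // fires once
	e.alarmIfJoinedAsVoter(cs) // latched: no second alarm
	fmt.Println("violations:", e.violations.Load()) // violations: 1
}
```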
|
|
Addressed the Codex finding (commit 32e20b1):
|
@claude review |
|
Claude finished @bootjp's task in 4m 21s.
|
| Issue | Status |
|---|---|
| `refreshStatus` single-node fast path with 1-voter+N-learner (`voterCount <= 1`, engine.go:2506) | ✅ |
| `recordQuorumAck` denominator uses `voterCount` not `len(e.peers)` (engine.go:1700–1704) | ✅ |
| `shouldCampaignSingleNode` counts voters only (engine.go:2808–2816) | ✅ |
| v2 peers file round-trips suffrage at all three persist sites (apply loop, bootstrap, snapshot restore) | ✅ |
| `validateConfState`: voter-count comparison uses `splitPeersBySuffrage`; learner check is set-based with OR-semantics | ✅ |
| `voterCount`/`isLearnerNode` initialized at Open from `LocalSnap.Metadata.ConfState` (engine.go:508–509) | ✅ |
| Four-step apply loop sequence: voter-cache → peer helpers → config servers → bookkeeping (`applyConfigChange`, engine.go:2048–2064) | ✅ |
| `applyConfigChange`/`applyConfigChangeV2` accept `confState raftpb.ConfState` param | ✅ |
| `upsertPeer` no longer writes `e.config.Servers`; `refreshConfigServers` is the sole writer | ✅ |
| `PromoteLearner` preconditions checked pre-propose, on the admin loop before `ProposeConfChange` | ✅ |
| `minAppliedIndex==0` rejected unless `skipMinAppliedCheck=true` (engine.go:1115–1116) | ✅ |
| `--raftJoinAsLearner` enforcement is post-apply local alarm, not pre-propose veto | ✅ |
| Restart test `TestLearnerPeersFilePersistsSuffrageAcrossRestart` does genuine Close+Open | ✅ |
| `alarmIfJoinedAsVoter` called at Open time — Codex finding (32e20b1) | ✅ |
New issue — Minor documentation inaccuracy
joinAlarmFired field comment says "per process lifetime" — should be "per Engine instance"
engine.go:192–193:
```go
// joinAlarmFired latches the join-as-learner alarm so the
// structured ERROR log fires once per process lifetime. Without
```

`joinAlarmFired` is an `atomic.Bool` field on the Engine struct, initialized to false on every Open. The new regression test proves that a second Open (with `joinAsLearner=true`) fires the alarm, because the new Engine instance has `joinAlarmFired=false`. The comment calling this "per process lifetime" is therefore inaccurate — the correct description is "at most once per Engine instance" or "at most once per Open lifecycle."
This is a documentation nit (not a behavior bug — the code works exactly as intended), but the wrong phrasing could confuse a future reader trying to understand why the test expects the alarm on a second Open.
Observations (non-blocking)
promote_learner CLI with no min_applied_index: raftadmin <addr> promote_learner <id> sends MinApplied=0, Skip=false, which the engine rejects with errPromoteLearnerMinAppliedZero. The usage string marks it optional ([min_applied_index]), but in practice it's required unless skip_min_applied_check=true. The runbook documents the correct invocation; the engine error is descriptive. This is acceptable.
Concurrency model for voterCount/isLearnerNode: refreshVoterCache writes both under e.mu (for refreshStatus which runs from external goroutines). recordQuorumAck reads both without a lock — safe because it runs on the same event-loop goroutine as refreshVoterCache, so the accesses are sequentially ordered. The comment at engine.go:1693–1696 documents this correctly. ✓
normalizePersistedPeer preserves Suffrage: The function only fills in missing NodeID/ID and returns the struct otherwise unchanged. The Suffrage field round-trips correctly through the normalizePersistedPeers path in writeCurrentPersistedPeers. ✓
Summary
The Codex-found gap is correctly closed. All fourteen tracked implementation issues across four rounds of review are resolved. The implementation is ready to merge pending CI green. The one item above (comment text) is a cleanup that can be addressed in a follow-up commit or as part of merge.

Add Raft learner / non-voter membership to elastickv. Operators can now attach a fresh node as a non-voting replica, let it catch up on the log, and then promote it to voter — without ever shrinking the cluster's effective fault tolerance during catch-up.
The codebase already had partial scaffolding (`Server.Suffrage`, `RaftAdminMember.suffrage`, `applyConfigPeerChangeToMap` keeping learner metadata) waiting for a consumer; this PR turns it on end-to-end.

Scope
Design doc (`docs/design/2026_04_26_proposed_raft_learner.md`)

Five rounds of review. Approved on Round 5: "ready to proceed to Milestone 1 implementation."
Milestone 1 — engine-level learner support

- `Admin.AddLearner` and `Admin.PromoteLearner` on `internal/raftengine`.
- `voterCount` + `isLearnerNode` cache (initialized at Open from snapshot ConfState; rebuilt from scratch — never patched — on every conf change).
- `len(e.peers)` overloads switched to `voterCount` (`recordQuorumAck`, `refreshStatus` single-node fast path, `removePeer` post-removal, plus the no-op audit on `nextPeersAfterConfigChange`). Closes the 1-voter+1-learner lease-read stall the design flagged.
- `applyConfigChange` / `applyConfigChangeV2` extended with `confState raftpb.ConfState`. Four-step apply-loop sequence: refresh voter cache → per-peer cleanup → rebuild `e.config.Servers` → bookkeeping.
- `validateConfState` accepts learner-only entries; voter check stays element-wise, learner check is set-based.
- `upsertPeer` no longer owns `e.config.Servers`; the apply loop rebuilds it via `configurationFromConfState`.
- `PromoteLearner` preconditions checked synchronously on the admin loop before propose: target must be a learner; leader's `Progress.Match >= minAppliedIndex`.

Milestone 2 — operator surface

- `RaftAdmin.AddLearner` and `RaftAdmin.PromoteLearner` RPCs.
- `cmd/raftadmin add_learner <id> <address> [previous_index]` and `cmd/raftadmin promote_learner <id> [previous_index] [min_applied_index]`.
- `--raftJoinAsLearner` flag — operator alarm (not consensus veto): when a node booted with the flag finds itself in `ConfState.Voters`, an ERROR-level structured log fires once and `JoinRoleViolationCount` increments. The node keeps running.
- `docs/raft_learner_operations.md` runbook: prerequisites, attach/promote/verify, `remove_server` for learners, common errors.

Out of scope (deferred to M3 / separate proposal)

- `Status.PerPeer` exposing `Progress.Match` for `min_applied_index` selection.

Test evidence

- `internal/raftengine/etcd/learner_test.go` — new tests: `TestAddLearnerReplicatesWithoutCountingAsVoter`, `TestPromoteLearnerSwapsRoleToVoter`, `TestPromoteLearnerRejectsNonLearner`, `TestPromoteLearnerRejectsNotCaughtUp`, `TestRemovePeerLearnerKeepsSingleNodeFastPath` (the headline 1-voter+1-learner lease-read regression), `TestJoinAsLearnerAlarmFiresWhenAddedAsVoter`, `TestLinearizableReadOnLearnerForwardsToLeader`.
- `internal/raftengine/etcd/peer_metadata_test.go` — 4 new tests covering v2 round-trip with mixed suffrage, v1 forward-compat, v2 always-write, unknown-version rejection.
- `internal/raftadmin/server_test.go` — `TestServerMapsEngineAdminMethods` extended for the two new RPCs.
- `cmd/raftadmin/main_test.go` — `add_learner` / `promote_learner` happy-path subcommand tests + extended usage assertions.
- Package tests (`internal/...`, `kv/...`, `store/...`, `monitoring/...`, `distribution/...`, `proxy/...`, `cmd/...`) pass. golangci-lint: 0 issues.

Self-review (per CLAUDE.md)

- `floor((N+1)/2)+1`, ≥ pre-promote. WAL purge tracker includes learners by construction. v2 peers file forward-compat is read-once-write-once.
- `voterCount`/`isLearnerNode` cache is single-writer (apply loop) under existing `e.mu`. `recordQuorumAck` learner-ack reject reads `isLearnerNode` under the same lock as `e.peers`. Promote-during-leader-change handled via `min_applied_index` precondition.
- `LinearizableRead` from local FSM (returns `ErrNotLeader`). Voter / learner suffrage is sourced from `ConfState`, not `e.peers`. Snapshot-restore path also refreshes `voterCount`/`isLearnerNode` and runs the alarm hook.

Caveats

- `proto/Makefile` pins `libprotoc 29.3`; this branch was regenerated with `libprotoc 34.0` (host) + `protoc-gen-go v1.36.11` + `protoc-gen-go-grpc 1.6.1` (matching). The diff is the standard "two new messages + msgTypes index shift" — re-run `make gen` with the pinned toolchain if byte-identical generation matters.
- Based on `origin/main` as of writing. Rebase before merge.