feat(sqs): HT-FIFO schema + validators + dormancy gate (Phase 3.D PR 2)#681
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces HT-FIFO (High Throughput FIFO) support for SQS, adding new attributes such as PartitionCount, FifoThroughputLimit, and DeduplicationScope. It includes comprehensive validation logic for attribute immutability and cross-attribute consistency, alongside a temporary dormancy gate to manage the rollout phases. Feedback focuses on optimizing performance by implementing a manual FNV-1a loop to avoid heap allocations in the routing logic and short-circuiting metadata snapshots when HT-FIFO attributes are absent. Additionally, it is recommended to reuse existing error constants to reduce redundancy.
| // sqsErrInvalidParameterValue is the AWS code for incoherent | ||
| // parameter combinations (vs malformed individual values, which | ||
| // use sqsErrInvalidAttributeValue). HT-FIFO uses this for the | ||
| // {PartitionCount > 1, DeduplicationScope = "queue"} cross- | ||
| // attribute rejection. | ||
| sqsErrInvalidParameterValue = "InvalidParameterValue" |
There was a problem hiding this comment.
| h := fnv.New64a() | ||
| _, _ = h.Write([]byte(messageGroupID)) | ||
| // PartitionCount is a power of two (validator-enforced); mod is | ||
| // equivalent to mask-AND. The mask is meta.PartitionCount - 1. | ||
| // Computing the mask in uint64 first then narrowing to uint32 is | ||
| // safe because htfifoMaxPartitions == 32 fits in uint32 trivially. | ||
| mask := uint64(meta.PartitionCount - 1) | ||
| return uint32(h.Sum64() & mask) //nolint:gosec // masked by (PartitionCount - 1) ≤ htfifoMaxPartitions − 1, fits in uint32. |
There was a problem hiding this comment.
partitionFor is a performance-critical function for the HT-FIFO data plane. The current implementation using fnv.New64a() and []byte conversion causes heap allocations and interface overhead on every call. Since MessageGroupId is capped at 128 characters, a manual FNV-1a loop is significantly more efficient and avoids allocations entirely.
h := uint64(14695981039346656037)
for i := 0; i < len(messageGroupID); i++ {
h ^= uint64(messageGroupID[i])
h *= 1099511628211
}
// PartitionCount is a power of two (validator-enforced); mod is
// equivalent to mask-AND. The mask is meta.PartitionCount - 1.
// Computing the mask in uint64 first then narrowing to uint32 is
// safe because htfifoMaxPartitions == 32 fits in uint32 trivially.
mask := uint64(meta.PartitionCount - 1)
return uint32(h & mask) //nolint:gosec // masked by (PartitionCount - 1) ≤ htfifoMaxPartitions − 1, fits in uint32.| // per §3.2: if any immutable attribute carries a differing value, | ||
| // the entire request is rejected before any attribute is | ||
| // persisted (including mutable attributes in the same call). | ||
| preApply := snapshotImmutableHTFIFO(meta) |
There was a problem hiding this comment.
snapshotImmutableHTFIFO performs a heap allocation on every SetQueueAttributes call. Since most requests only modify mutable attributes (like VisibilityTimeout), we can use the htfifoAttributesPresent helper to skip the snapshot when no HT-FIFO attributes are provided in the request. This avoids redundant work before expensive operations, similar to short-circuiting snapshot checks.
| preApply := snapshotImmutableHTFIFO(meta) | |
| var preApply *sqsQueueMeta | |
| if htfifoAttributesPresent(attrs) { | |
| preApply = snapshotImmutableHTFIFO(meta) | |
| } |
References
- When checking if a configuration snapshot is up-to-date, short-circuit and return true if the current snapshot's index is strictly greater than the target index. Only compare configuration states if the indices are equal. Perform this check before any expensive FSM snapshot serialization to avoid redundant work.
| } | ||
| } | ||
| if meta.PartitionCount > 1 && meta.DeduplicationScope == htfifoDedupeScopeQueue { | ||
| return newSQSAPIError(http.StatusBadRequest, sqsErrInvalidParameterValue, |
| require.Error(t, err) | ||
| var apiErr *sqsAPIError | ||
| require.True(t, errors.As(err, &apiErr), "expected sqsAPIError, got %T", err) | ||
| require.Equal(t, sqsErrInvalidParameterValue, apiErr.errorType, |
|
@claude review |
|
Claude finished @bootjp's task in 3m 37s —— View job Code Review — Phase 3.D PR 2: HT-FIFO schema + validators + dormancy gate
Overall this is a well-structured, well-documented rollout gate. The design-doc-first workflow is followed, the dormancy-gate pattern is sound (gate-and-lift in one atomic commit), and the self-review covers the five CLAUDE.md lenses. The test coverage is thorough for the validator surface. A few issues worth addressing before merge: Bug:
|
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | Bug (latent, live in PR 5) | sqs_partitioning.go:121 |
PartitionCount > 1 not FIFO-only — Standard queue would be silently accepted after gate lift |
| 2 | Medium | sqs_catalog.go:68 |
Duplicate constant sqsErrInvalidParameterValue = sqsErrValidation |
| 3 | Medium | sqs_partitioning.go:83 |
partitionFor heap-allocates on every call via fnv.New64a() |
| 4 | Medium | sqs_catalog.go:~1370 |
snapshotImmutableHTFIFO allocates unconditionally on every SetQueueAttributes |
| 5 | Low | sqs_partitioning_integration_test.go:118 |
Test name/comment claims cross-attr coverage but dormancy gate fires first |
| 6 | Nit | sqs_partitioning_integration_test.go:276 |
Custom contains/indexOf instead of strings.Contains |
Issue #1 is the most important to fix before the gate lifts in PR 5. Issues #2–4 match the Gemini reviewer's findings and are worth fixing before merge to keep the codebase clean. Issues #5–6 are polish.
Three medium-priority Gemini findings + one cyclop fix from the rebase onto the updated PR #679 throttling branch. (1) sqsErrInvalidParameterValue removed (sqs_catalog.go, sqs_partitioning.go, sqs_partitioning_test.go): the constant duplicated sqsErrValidation which already carries "InvalidParameterValue". Reuses the existing constant; comment on the validator clarifies the choice. (2) snapshotImmutableHTFIFO heap allocation gated on htfifoAttributesPresent (sqs_catalog.go): the snapshot is only needed when the request actually carries an HT-FIFO attribute. A SetQueueAttributes that touches only mutable attributes (e.g. VisibilityTimeout — the common path) now skips the allocation entirely. The validator short-circuit on preApply == nil keeps correctness identical: a request with no HT-FIFO attribute can never change an HT-FIFO field. (3) partitionFor inlined FNV-1a (sqs_partitioning.go): the hash/fnv.New64a + h.Write([]byte(messageGroupID)) path heap- allocates the byte slice and pays the hash.Hash interface dispatch on every SendMessage. Replaced with a manual FNV-1a loop over the string — measurably faster on the routing hot path. MessageGroupId is capped at 128 chars by validation so the loop is bounded. (4) cyclop (trySetQueueAttributesOnce): the conditional preApply branch + immutability check + Throttle validator pushed cyclop to 12. Extracted applyAndValidateSetAttributes so the function stays under the ceiling; the helper is also useful for any future admin-surface callers that want the same apply+validate flow without the OCC dispatch. All tests pass under -race; golangci-lint clean.
16d3c23 to
fbd68a6
Compare
|
3件の Gemini medium + cyclop fix を入れて、PR #679 の最新 throttling branch にも rebase しました( Gemini medium #1 — sqsErrInvalidParameterValue 削除 Gemini medium #2 — snapshotImmutableHTFIFO heap alloc を skip Gemini medium #3 — partitionFor 手書き FNV-1a Cyclop fix Rebase note: PR #679 が新しい commit Verified: @claude review |
|
Claude finished @bootjp's task in 4m 13s —— View job Code Review — Phase 3.D PR 2: post-feedback round
All four Gemini medium / cyclop fixes are confirmed landed and correct. Three issues remain. Bug (latent, unresolved from round 1):
|
| Fix | Status |
|---|---|
Remove duplicate sqsErrInvalidParameterValue constant |
✓ done |
Conditional snapshotImmutableHTFIFO (gate on htfifoAttributesPresent) |
✓ done |
Inline FNV-1a in partitionFor (no heap alloc) |
✓ done |
Extract applyAndValidateSetAttributes to fix cyclop |
✓ done |
Summary
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | Bug (latent, live in PR 5) | sqs_partitioning.go:144 |
PartitionCount > 1 not FIFO-only — Standard queue accepted after gate lift |
| 2 | Low | sqs_partitioning_integration_test.go:110 |
Doc says "PartitionCount=1" but test sends 2; inline comment claims dormancy fires first but cross-attr rule fires first |
| 3 | Nit | sqs_partitioning_integration_test.go:273 |
Custom contains/indexOf instead of strings.Contains |
Issue #1 is the only blocker before gate lift in PR 5. Issues #2–3 are polish.
Three medium-priority Gemini findings + one cyclop fix from the rebase onto the updated PR #679 throttling branch. (1) sqsErrInvalidParameterValue removed (sqs_catalog.go, sqs_partitioning.go, sqs_partitioning_test.go): the constant duplicated sqsErrValidation which already carries "InvalidParameterValue". Reuses the existing constant; comment on the validator clarifies the choice. (2) snapshotImmutableHTFIFO heap allocation gated on htfifoAttributesPresent (sqs_catalog.go): the snapshot is only needed when the request actually carries an HT-FIFO attribute. A SetQueueAttributes that touches only mutable attributes (e.g. VisibilityTimeout — the common path) now skips the allocation entirely. The validator short-circuit on preApply == nil keeps correctness identical: a request with no HT-FIFO attribute can never change an HT-FIFO field. (3) partitionFor inlined FNV-1a (sqs_partitioning.go): the hash/fnv.New64a + h.Write([]byte(messageGroupID)) path heap- allocates the byte slice and pays the hash.Hash interface dispatch on every SendMessage. Replaced with a manual FNV-1a loop over the string — measurably faster on the routing hot path. MessageGroupId is capped at 128 chars by validation so the loop is bounded. (4) cyclop (trySetQueueAttributesOnce): the conditional preApply branch + immutability check + Throttle validator pushed cyclop to 12. Extracted applyAndValidateSetAttributes so the function stays under the ceiling; the helper is also useful for any future admin-surface callers that want the same apply+validate flow without the OCC dispatch. All tests pass under -race; golangci-lint clean.
fbd68a6 to
288b06e
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fbd68a6e94
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !meta.IsFIFO { | ||
| if meta.FifoThroughputLimit != "" { | ||
| return newSQSAPIError(http.StatusBadRequest, sqsErrInvalidAttributeValue, | ||
| "FifoThroughputLimit is only valid on FIFO queues") | ||
| } | ||
| if meta.DeduplicationScope != "" { | ||
| return newSQSAPIError(http.StatusBadRequest, sqsErrInvalidAttributeValue, | ||
| "DeduplicationScope is only valid on FIFO queues") | ||
| } |
There was a problem hiding this comment.
Reject PartitionCount on non-FIFO queues
validatePartitionConfig enforces FIFO-only semantics for FifoThroughputLimit and DeduplicationScope but not PartitionCount, so a Standard queue can be created with PartitionCount=1 and persist HT-FIFO-only metadata. This creates an invalid configuration surface (and for >1, currently returns the dormancy-gate message instead of a FIFO-only validation error), which is likely to confuse operators and makes control-plane behavior inconsistent with the HT-FIFO model.
Useful? React with 👍 / 👎.
| case "", htfifoThroughputPerMessageGroupID, htfifoThroughputPerQueue: | ||
| m.FifoThroughputLimit = v |
There was a problem hiding this comment.
Disallow empty HT-FIFO attribute values
The applier currently treats an explicit empty string as valid for FifoThroughputLimit (and similarly for DeduplicationScope), so a request that sends the key with "" succeeds and silently acts like unset rather than failing fast. This can hide client/configuration mistakes and produces surprising control-plane outcomes (especially when users think they set a concrete throughput/dedup mode).
Useful? React with 👍 / 👎.
…eanup Three items from the round-2 Claude review on PR #681: (1) Bug (latent, live in PR 5): validatePartitionConfig only guarded FifoThroughputLimit / DeduplicationScope as FIFO-only, not PartitionCount > 1. A Standard queue with PartitionCount=2 would slip past the validator after PR 5 lifts the dormancy gate. Added a guard inside the !meta.IsFIFO block: PartitionCount > 1 → InvalidAttributeValue("PartitionCount > 1 is only valid on FIFO queues"). PartitionCount 0 / 1 stay accepted on Standard (both mean "single-partition layout"). Test TestValidatePartitionConfig_StandardQueueRejectsHTFIFOAttrs extended to cover PartitionCount=2/4/8/16/32 → reject and PartitionCount=0/1 → accept. (2) Low: TestSQSServer_HTFIFO_RejectsQueueScopedDedupOnPartitioned had two contradictory comments. The function-level doc said "Test sets PartitionCount=1 to bypass dormancy" but the test actually sends PartitionCount=2; the inline comment claimed "dormancy fires first" but validatePartitionConfig (inside parseAttributesIntoMeta) runs before validatePartitionDormancyGate, so the cross-attr rule fires first. Rewrote the doc to describe the actual control-plane order (validatePartitionConfig → cross-attr rejection → wire 400) and removed the misleading "dormancy fires first" assertion. (3) Nit: replaced the custom 11-line contains/indexOf helpers with strings.Contains. Identical behaviour, idiomatic. All findings noted in the round-2 Claude review on PR #681.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Round 2 review の 3 件対応( Bug (latent, live in PR 5) — Low — Nit — custom Verified: @claude review |
|
Claude finished @bootjp's task in 2m 22s —— View job Code Review — Phase 3.D PR 2: round 3 (post-feedback)
All three round-3 items are correctly implemented. The PR is clean. Round-3 fixes confirmedBug (latent) — Low — test comment contradictions fixed ✓ Nit — One open Codex concern (not addressed in round 3 — low priority)The Codex P2 flag from round 1 about empty-string HT-FIFO attributes is still open. This is a design choice (matches how some AWS SDKs "clear" attributes), not a blocker. Worth a note in the PR description for PR 5 so the gate-lift commit can decide whether to add a non-empty guard then. SummaryAll round-1, round-2, and round-3 review items are resolved. The latent bug (Standard queue + |
310c1ca to
9869a48
Compare
Three medium-priority Gemini findings + one cyclop fix from the rebase onto the updated PR #679 throttling branch. (1) sqsErrInvalidParameterValue removed (sqs_catalog.go, sqs_partitioning.go, sqs_partitioning_test.go): the constant duplicated sqsErrValidation which already carries "InvalidParameterValue". Reuses the existing constant; comment on the validator clarifies the choice. (2) snapshotImmutableHTFIFO heap allocation gated on htfifoAttributesPresent (sqs_catalog.go): the snapshot is only needed when the request actually carries an HT-FIFO attribute. A SetQueueAttributes that touches only mutable attributes (e.g. VisibilityTimeout — the common path) now skips the allocation entirely. The validator short-circuit on preApply == nil keeps correctness identical: a request with no HT-FIFO attribute can never change an HT-FIFO field. (3) partitionFor inlined FNV-1a (sqs_partitioning.go): the hash/fnv.New64a + h.Write([]byte(messageGroupID)) path heap- allocates the byte slice and pays the hash.Hash interface dispatch on every SendMessage. Replaced with a manual FNV-1a loop over the string — measurably faster on the routing hot path. MessageGroupId is capped at 128 chars by validation so the loop is bounded. (4) cyclop (trySetQueueAttributesOnce): the conditional preApply branch + immutability check + Throttle validator pushed cyclop to 12. Extracted applyAndValidateSetAttributes so the function stays under the ceiling; the helper is also useful for any future admin-surface callers that want the same apply+validate flow without the OCC dispatch. All tests pass under -race; golangci-lint clean.
…eanup Three items from the round-2 Claude review on PR #681: (1) Bug (latent, live in PR 5): validatePartitionConfig only guarded FifoThroughputLimit / DeduplicationScope as FIFO-only, not PartitionCount > 1. A Standard queue with PartitionCount=2 would slip past the validator after PR 5 lifts the dormancy gate. Added a guard inside the !meta.IsFIFO block: PartitionCount > 1 → InvalidAttributeValue("PartitionCount > 1 is only valid on FIFO queues"). PartitionCount 0 / 1 stay accepted on Standard (both mean "single-partition layout"). Test TestValidatePartitionConfig_StandardQueueRejectsHTFIFOAttrs extended to cover PartitionCount=2/4/8/16/32 → reject and PartitionCount=0/1 → accept. (2) Low: TestSQSServer_HTFIFO_RejectsQueueScopedDedupOnPartitioned had two contradictory comments. The function-level doc said "Test sets PartitionCount=1 to bypass dormancy" but the test actually sends PartitionCount=2; the inline comment claimed "dormancy fires first" but validatePartitionConfig (inside parseAttributesIntoMeta) runs before validatePartitionDormancyGate, so the cross-attr rule fires first. Rewrote the doc to describe the actual control-plane order (validatePartitionConfig → cross-attr rejection → wire 400) and removed the misleading "dormancy fires first" assertion. (3) Nit: replaced the custom 11-line contains/indexOf helpers with strings.Contains. Identical behaviour, idiomatic. All findings noted in the round-2 Claude review on PR #681.
Implements PR 2 of the §11 multi-PR rollout from
docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md. The schema
fields land on sqsQueueMeta with the design's wire types; the
validator enforces the §3.2 cross-attribute rules; a temporary
CreateQueue gate rejects PartitionCount > 1 until PR 5 lifts the
gate atomically with the data-plane fanout.
Schema (sqs_catalog.go):
- sqsQueueMeta gains three optional fields:
* PartitionCount uint32 — number of FIFO partitions; 0/1 means
legacy single-partition; > 1 enables HT-FIFO.
* FifoThroughputLimit string — "perMessageGroupId" (default for
HT-FIFO) or "perQueue" (collapses every group to partition 0).
* DeduplicationScope string — "messageGroup" (default for HT-FIFO)
or "queue" (legacy single-window).
- attributesEqual extended via baseAttributesEqual + htfifoAttributesEqual
split (kept under cyclop ceiling).
- queueMetaToAttributes surfaces the configured HT-FIFO fields via
addHTFIFOAttributes so GetQueueAttributes("All") round-trips.
Routing (sqs_partitioning.go, new):
- partitionFor(meta, messageGroupId) uint32 implements §3.3:
FNV-1a over MessageGroupId & (PartitionCount - 1). Edge cases
documented in the godoc — PartitionCount 0/1 → 0, FifoThroughputLimit
perQueue short-circuits to 0, empty MessageGroupId → 0.
- The bitwise-mask optimisation requires PartitionCount be a power
of two; the validator enforces it. A future bug that leaks a
non-power-of-two value is caught by TestPartitionFor_PowerOfTwoMaskingMatchesMod.
Validation (sqs_partitioning.go):
- validatePartitionConfig enforces the §3.2 cross-attribute rules:
* PartitionCount must be a power of two in [1, htfifoMaxPartitions=32].
* FifoThroughputLimit / DeduplicationScope are FIFO-only.
* {PartitionCount > 1, DeduplicationScope = "queue"} rejects with
InvalidParameterValue (incoherent params, vs InvalidAttributeValue
for malformed individual values) per §3.2 cross-attribute gate.
- Runs in parseAttributesIntoMeta (after resolveFifoQueueFlag, so
the IsFIFO check sees the post-resolution flag) and in
trySetQueueAttributesOnce (after applyAttributes; IsFIFO comes
from the loaded meta).
- validatePartitionImmutability is the §3.2 rule that PartitionCount,
FifoThroughputLimit, and DeduplicationScope are immutable from
CreateQueue. SetQueueAttributes is all-or-nothing: a request that
touches a mutable attribute alongside an attempted immutable
change rejects the whole request before persisting either.
- snapshotImmutableHTFIFO captures the pre-apply values so the
immutability check has a clean before/after pair.
Dormancy gate (sqs_partitioning.go + sqs_catalog.go):
- validatePartitionDormancyGate is the §11 PR 2 temporary gate that
rejects CreateQueue with PartitionCount > 1 with
InvalidAttributeValue("PartitionCount > 1 requires HT-FIFO data
plane — not yet enabled"). The schema field exists in the meta
type but no partitioned data can land. Removed in PR 5 in the same
commit that wires the data-plane fanout — gate-and-lift is atomic
so a half-deployed cluster can never accept a partitioned queue
without the data plane to serve it.
Tests:
- adapter/sqs_partitioning_test.go: 14 unit tests covering
partitionFor (legacy zero/one, perQueue short-circuit, empty
MessageGroupId, determinism, distribution within ±5% across 8
partitions on 100k samples, power-of-two masking < N), isPowerOfTwo,
validatePartitionConfig (power-of-two, max cap, FIFO-only,
cross-attr InvalidParameterValue, single-partition + queue-dedup OK),
validatePartitionDormancyGate (rejects > 1, allows 0/1),
validatePartitionImmutability (per-attribute change rejects, same-
value no-op succeeds), htfifoAttributesPresent.
- adapter/sqs_partitioning_integration_test.go: 7 end-to-end tests
against a real createNode cluster covering the wire surface:
dormancy gate rejects PartitionCount > 1 (and the gate's reason
surfaces to the operator), allows PartitionCount=1, validator
rejects non-power-of-two, FIFO-only rejection on Standard queue,
cross-attr {partitions, queue dedup} rejects, immutability rejects
SetQueueAttributes change with same-value no-op succeeds, all-or-
nothing — combined mutable+immutable rejects entirely without
persisting the mutable change, GetQueueAttributes round-trip.
- All tests pass under -race; golangci-lint clean.
Out of scope for this PR (§11 PR 3-8):
- PR 3: keyspace threading — partitionIndex through every sqsMsg*Key
constructor, defaulting to 0 so existing queues stay byte-identical.
Gate from PR 2 still in place.
- PR 4: routing layer + --sqsFifoPartitionMap flag + mixed-version
gate (§8.5 capability advertisement + §8 leadership-refusal hook
in kv/lease_state.go). Gate still in place.
- PR 5: send/receive partition fanout, receipt-handle v2, removes
the dormancy gate atomically with the data-plane fanout.
- PR 6: PurgeQueue/DeleteQueue partition iteration + tombstone +
reaper.
- PR 7: Jepsen HT-FIFO workload + metrics.
- PR 8: doc lifecycle bump.
Three medium-priority Gemini findings + one cyclop fix from the rebase onto the updated PR #679 throttling branch. (1) sqsErrInvalidParameterValue removed (sqs_catalog.go, sqs_partitioning.go, sqs_partitioning_test.go): the constant duplicated sqsErrValidation which already carries "InvalidParameterValue". Reuses the existing constant; comment on the validator clarifies the choice. (2) snapshotImmutableHTFIFO heap allocation gated on htfifoAttributesPresent (sqs_catalog.go): the snapshot is only needed when the request actually carries an HT-FIFO attribute. A SetQueueAttributes that touches only mutable attributes (e.g. VisibilityTimeout — the common path) now skips the allocation entirely. The validator short-circuit on preApply == nil keeps correctness identical: a request with no HT-FIFO attribute can never change an HT-FIFO field. (3) partitionFor inlined FNV-1a (sqs_partitioning.go): the hash/fnv.New64a + h.Write([]byte(messageGroupID)) path heap- allocates the byte slice and pays the hash.Hash interface dispatch on every SendMessage. Replaced with a manual FNV-1a loop over the string — measurably faster on the routing hot path. MessageGroupId is capped at 128 chars by validation so the loop is bounded. (4) cyclop (trySetQueueAttributesOnce): the conditional preApply branch + immutability check + Throttle validator pushed cyclop to 12. Extracted applyAndValidateSetAttributes so the function stays under the ceiling; the helper is also useful for any future admin-surface callers that want the same apply+validate flow without the OCC dispatch. All tests pass under -race; golangci-lint clean.
…eanup Three items from the round-2 Claude review on PR #681: (1) Bug (latent, live in PR 5): validatePartitionConfig only guarded FifoThroughputLimit / DeduplicationScope as FIFO-only, not PartitionCount > 1. A Standard queue with PartitionCount=2 would slip past the validator after PR 5 lifts the dormancy gate. Added a guard inside the !meta.IsFIFO block: PartitionCount > 1 → InvalidAttributeValue("PartitionCount > 1 is only valid on FIFO queues"). PartitionCount 0 / 1 stay accepted on Standard (both mean "single-partition layout"). Test TestValidatePartitionConfig_StandardQueueRejectsHTFIFOAttrs extended to cover PartitionCount=2/4/8/16/32 → reject and PartitionCount=0/1 → accept. (2) Low: TestSQSServer_HTFIFO_RejectsQueueScopedDedupOnPartitioned had two contradictory comments. The function-level doc said "Test sets PartitionCount=1 to bypass dormancy" but the test actually sends PartitionCount=2; the inline comment claimed "dormancy fires first" but validatePartitionConfig (inside parseAttributesIntoMeta) runs before validatePartitionDormancyGate, so the cross-attr rule fires first. Rewrote the doc to describe the actual control-plane order (validatePartitionConfig → cross-attr rejection → wire 400) and removed the misleading "dormancy fires first" assertion. (3) Nit: replaced the custom 11-line contains/indexOf helpers with strings.Contains. Identical behaviour, idiomatic. All findings noted in the round-2 Claude review on PR #681.
9869a48 to
0b41e0f
Compare
Summary
Phase 3.D PR 2 (schema + validators + dormancy gate) per the §11 multi-PR rollout in
docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md(currently on PR #664).Stacks on PR #679 (Phase 3.C): branched off
feat/sqs-throttling-phase3cso the 3.D schema work shares the same base as the throttling implementation. Will be rebased once PR #679 merges to main.sqsQueueMetagainsPartitionCount uint32,FifoThroughputLimit string,DeduplicationScope string.partitionFor(meta, messageGroupId) uint32implements §3.3 — FNV-1a overMessageGroupId, masked byPartitionCount-1(validator-enforced power-of-two so the bitwise mask is equivalent to mod). Edge cases:PartitionCount0/1 → 0,FifoThroughputLimit=perQueueshort-circuits to 0, emptyMessageGroupId→ 0.validatePartitionConfigenforces the §3.2 cross-attribute rules: power-of-two[1, htfifoMaxPartitions=32],FifoThroughputLimit/DeduplicationScopeare FIFO-only,{PartitionCount > 1, DeduplicationScope = "queue"}rejects withInvalidParameterValueat control-plane time so the operator sees the error before the firstSendMessage.validatePartitionImmutabilityenforces the §3.2 immutability rule (the three HT-FIFO fields are immutable fromCreateQueueonward).SetQueueAttributesis all-or-nothing — a request that touches a mutable attribute alongside an attempted immutable change rejects the whole request before persisting either.validatePartitionDormancyGateis the §11 PR 2 temporary gate:CreateQueuewithPartitionCount > 1rejects withInvalidAttributeValue("PartitionCount > 1 requires HT-FIFO data plane — not yet enabled"). The schema field exists in the meta type but no partitioned data can land. Removed in PR 5 in the same commit that wires the data-plane fanout — gate-and-lift is atomic so a half-deployed cluster can never accept a partitioned queue without the data plane to serve it.GetQueueAttributes("All")round-trips the configured HT-FIFO fields viaaddHTFIFOAttributes.Test plan
adapter/sqs_partitioning_test.go— 14 unit tests coveringpartitionFor(legacy zero/one, perQueue short-circuit, empty MessageGroupId, determinism, distribution within ±5% across 8 partitions on 100k samples, power-of-two masking always < N),isPowerOfTwo,validatePartitionConfig(power-of-two, max cap, FIFO-only, cross-attrInvalidParameterValue, single-partition + queue-dedup OK),validatePartitionDormancyGate(rejects > 1, allows 0/1, gate reason surfaces to the operator),validatePartitionImmutability(per-attribute change rejects, same-value no-op succeeds),htfifoAttributesPresent.adapter/sqs_partitioning_integration_test.go— 7 end-to-end tests: dormancy gate rejectsPartitionCount > 1(and the gate's "not yet enabled" message surfaces), allowsPartitionCount=1, validator rejects non-power-of-two, FIFO-only rejection on Standard queue, cross-attr{PartitionCount=2, DeduplicationScope=queue}rejects, immutabilitySetQueueAttributeschange rejects with same-value no-op succeeding, all-or-nothing combined mutable+immutable rejects without persisting the mutable change,GetQueueAttributesround-trip omits unset fields.-race.golangci-lint run ./adapter/...→ 0 issues.Self-review (5 lenses)
sqsQueueMetaand are persisted via the existing OCCPutonsqsQueueMetaKey. The dormancy gate runs atCreateQueueadmission time; once PR 5 lifts the gate, no schema migration is needed (existing queues havePartitionCount=0which is equivalent to 1). The keyspace stays untouched in this PR (PR 3 threadspartitionIndexthroughsqsMsg*Key).nextTxnReadTSsnapshot thatapplyAttributesuses, so a concurrentSetQueueAttributesfrom another writer is caught by the existing OCCStartTS + ReadKeysfence (already in place forsetQueueAttributesWithRetry). Dormancy gate runs atCreateQueueadmission so a concurrent gate-lift mid-request is impossible.partitionForis one FNV-1a hash + one bitwise mask; constant time.validatePartitionConfigis field-comparison only, runs once perCreateQueue/SetQueueAttributes.addHTFIFOAttributesadds at most three map writes and only when fields are non-empty. Hot pathSendMessage/ReceiveMessageis unaffected by this PR (no routing changes — those land in PR 5).{partitions, queue dedup}rejects at control-plane time so an operator can never end up with a created-but-unserviceable queue (the runtime path that would have surfaced the same error at first send is unreachable). Immutability rule prevents mid-life routing-key reconfiguration that would silently violate within-group FIFO ordering.partitionFor's edge cases plus a 100k-sample distribution test that catches non-uniform routing immediately. Lint is clean.Out of scope (subsequent §11 rollout PRs)
partitionIndexthrough everysqsMsg*Keyconstructor, defaulting to 0 so existing queues stay byte-identical. Dormancy gate from this PR remains in place.--sqsFifoPartitionMapflag + mixed-version gate (§8.5 capability advertisement + §8 leadership-refusal hook inkv/lease_state.go). Dormancy gate still in place.PurgeQueue/DeleteQueuepartition iteration + tombstone schema update + reaper.Branch base
Branched off
feat/sqs-throttling-phase3c(PR #679) which itself branches offdocs/sqs-phase3-proposals(PR #664). Will rebase againstmainafter both ancestors merge.