
fix(cold): replace read-arm TaskTracker backpressure with Semaphore #56

Open
rswanson wants to merge 1 commit into main from swanny/fix-cold-reader-deadlock

Conversation

@rswanson
Member

Summary

Fixes a deadlock in signet_cold::ColdStorageTask that wedges the entire cold-storage task the moment a 65th concurrent read arrives while 64 are already in flight. Once wedged, no reads or writes are dispatched; upstream RPCs stall, the write mpsc fills (~256 chain advances ≈ 51 min at 12 s/block), try_send in dispatch_append_blocks returns Backpressure, and the node crashes via ? at signet-node/src/node.rs:285.

Observed in production on signet-sidecar running in the dev mainnet namespace: every ~50 min the pod crashed with cold storage backpressure: channel full; all cold-path RPCs (eth_getBlockByNumber, eth_getLogs, eth_gasPrice, eth_feeHistory) hung at the 60 s client timeout while hot-only RPCs (eth_getBalance, eth_blockNumber, eth_chainId) stayed sub-millisecond. Aurora sat idle at its min-pool 5 connections with zero ReadIOPS.

The bug

crates/cold/src/task/runner.rs, read arm (before this change):

maybe_read = self.read_receiver.recv() => {
    ...
    while self.task_tracker.len() >= MAX_CONCURRENT_READERS {   // 64
        tokio::select! {
            _ = self.cancel_token.cancelled() => { break; }
            _ = self.task_tracker.wait() => {}                    // <-- never resolves
        }
    }
    self.task_tracker.spawn(async move { inner.handle_read(req).await; });
}

tokio_util::TaskTracker::wait() documents:

"Waits until this TaskTracker is both closed and empty."

The read arm never calls close() — only the write arm does, as part of its drain-before-write (close(); wait(); reopen();). So the instant the 65th concurrent read arrives, the inner wait() future has no path to completion. tokio::select! inside a select! arm does not cause the outer select! to re-poll — the outer future is still "inside" the read arm's body. The whole run loop is pinned there forever.

Observable fallout, all matching telemetry:

  • New reads pile in the 256-slot read_receiver mpsc; send().await on the RPC side blocks; clients see exactly-60 s hangs when they time out.
  • The 64 already-spawned read tasks complete and release their connections. The pool returns to its idle minimum (5). Aurora reports zero ReadIOPS and a 100 % buffer cache hit ratio because nothing new queries it.
  • Writes queue in write_receiver (256 slots). Chain advances every ~12 s ⇒ 256 × 12 s ≈ 51 min, after which try_send(AppendBlocks) returns TrySendError::Full, surfaced as ColdStorageError::Backpressure, which bubbles through self.storage.append_blocks(...)? in signet-node and kills the process.
  • handle_new_head / commit spans look fast in tracing because they measure the node-side try_send, which is fire-and-forget. They don't indicate the cold task is actually making progress.

The fix

Replace the TaskTracker-based backpressure with a Semaphore sized to MAX_CONCURRENT_READERS:

maybe_write = self.write_receiver.recv() => {
    ...
    // Drain all in-flight reads by acquiring every permit.
    let _drain = self.read_semaphore.clone()
        .acquire_many_owned(MAX_CONCURRENT_READERS as u32)
        .await
        .expect("read semaphore outlives the run loop");
    self.handle_write(req).await;
    // _drain drops here, restoring permits.
}

maybe_read = self.read_receiver.recv() => {
    ...
    let permit = tokio::select! {
        _ = self.cancel_token.cancelled() => { break; }
        permit = self.read_semaphore.clone().acquire_owned() => {
            permit.expect("read semaphore outlives the run loop")
        }
    };
    let inner = Arc::clone(&self.inner);
    self.task_tracker.spawn(async move {
        let _permit = permit;   // released on completion / panic
        inner.handle_read(req).await;
    });
}

Key properties:

  • Reader backpressure is now correct. acquire_owned() wakes immediately when any sibling reader releases a permit on completion. No closed-tracker invariant required.
  • Drain-before-write is preserved. acquire_many_owned(64) only resolves once every reader has released its permit, which is exactly the old close() + wait() semantics — minus the footgun.
  • TaskTracker is retained solely for graceful shutdown, where close() + wait() semantics are what we want (the shutdown path explicitly closes the tracker after the main loop exits).
  • Panic-safe. Dropping the spawned JoinHandle's future (on handler panic) drops _permit, so the semaphore count is never leaked.

Tests

Added crates/cold/tests/concurrency.rs with two regression tests:

  • reads_above_concurrency_cap_do_not_deadlock — issues 256 concurrent reads (4× the in-flight cap) and asserts all 256 complete within a 15 s guard.
  • write_after_saturating_reads_makes_progress — interleaves a write into the same 256-reader flood and asserts it completes within the same guard.

Both pass in milliseconds with this change. With the fix reverted, both reliably hit the 15 s deadlock guard:

thread 'reads_above_concurrency_cap_do_not_deadlock' panicked:
    cold storage task deadlocked under concurrent reads: Elapsed(())
thread 'write_after_saturating_reads_makes_progress' panicked:
    write was starved by saturated readers: Elapsed(())
test result: FAILED. 0 passed; 2 failed; ... finished in 15.01s

Test plan

  • cargo test -p signet-cold (existing conformance + new concurrency tests)
  • cargo +nightly fmt -- --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
  • Manually verified the new tests fail on origin/main's runner.rs and pass with this change.
  • Reviewer spot-check: cut a patch release of signet-cold + signet-storage, bump init4tech/node-components, rebuild signet-sidecar:latest, redeploy to mainnet dev, confirm the sidecar stops crashing every 51 min and eth_getBlockByNumber stops timing out.

🤖 Generated with Claude Code

The cold storage task's run loop wedges once `MAX_CONCURRENT_READERS`
(64) read handlers are in flight. The read arm's backpressure waits on
`TaskTracker::wait()`, but that future only resolves when the tracker is
both *closed* and *empty*. The read arm never closes the tracker (only
the write arm does, during drain-before-write), so the inner `wait()`
never completes, the outer `select!` never re-polls, and no further
reads *or* writes are dispatched.

Replace the TaskTracker-based backpressure with a `Semaphore`:

- Each reader acquires one permit before being spawned; the permit is
  released when the spawned task completes or panics.
- Writes acquire all `MAX_CONCURRENT_READERS` permits via
  `acquire_many_owned`, which unblocks only after every in-flight
  reader has released its permit — preserving the existing
  drain-before-write invariant with no close/reopen dance.

`TaskTracker` is kept solely for graceful shutdown, which is where its
close+wait semantics are actually wanted.

Add regression tests that spawn 256 concurrent reads (4× the in-flight
cap) and verify both reads and an interleaved write complete within a
15s guard. Before this change, both tests hit the guard; after, they
pass in milliseconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rswanson
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code


@prestwich
Member

gonna have to sit down with the laptop to review this; feel free to patch node to this branch in the meantime

@rswanson rswanson closed this Apr 16, 2026
@rswanson rswanson reopened this Apr 16, 2026
@rswanson
Member Author

this does seem to fix things when patched locally and built into an image

@prestwich
Member

[Claude Code]

@Fraser999 — could I get a second pair of eyes on this design spec? It extends the semaphore fix in this PR with a broader refactor. Landing this design would replace the current diff.


Cold storage read/write permit refactor

Context

PR #56 on swanny/fix-cold-reader-deadlock replaces the old
TaskTracker-based reader backpressure in signet_cold::ColdStorageTask
with a Semaphore, fixing a production deadlock that wedged the entire
cold-storage task the moment the 65th concurrent reader arrived.

The semaphore fix is correct and resolves the observed crash. During
review we identified a handful of smaller issues in the same area that
warrant a second pass:

  1. The write arm's acquire_many_owned(64).await is not wrapped in a
    cancel-select. A single stuck reader holds its permit indefinitely,
    preventing the writer from accumulating all 64 permits, and the
    outer select! can't re-poll the cancel arm while it's inside the
    write arm. Shutdown during a drain hangs.

  2. Backpressure is expressed through two independently-bounded
    mechanisms — the 256-slot read mpsc and the 64-permit semaphore —
    that have to stay in sync with each other. They don't today; the
    channel buffer is 4× larger than the permit pool, and the overlap
    serves no correctness purpose.

  3. At the steady-state 12s write cadence, a single stuck reader
    extends every subsequent drain and, over time, fills the write
    mpsc and regenerates the exact backpressure-induced crash the PR
    was written to fix. The semaphore change alone does not close this
    path.

  4. There is no handler-side operation deadline. Non-stream reads that
    don't complete within a reasonable bound have no mechanism to
    release their permit back to the pool.

This spec describes a refactor that resolves all four in one change.

Goals

  • Unify read backpressure into a single mechanism whose invariants are
    obvious.
  • Make shutdown responsive regardless of reader state.
  • Bound the worst-case drain duration so it cannot exceed the write
    cadence.
  • Preserve the existing drain-before-write correctness guarantee.
  • Preserve the existing graceful-shutdown semantics for both reads and
    long-lived stream producers.

Non-goals

  • Changing how streaming reads (StreamLogs) work. Streams continue
    to run independently with backend-level snapshot isolation and are
    not drained before writes.
  • Adding per-operation deadlines. A single global deadline is
    sufficient; per-op tuning can come later with evidence.
  • Interrupting in-progress backend calls on deadline expiry. The
    deadline drops our handle on the future; the backend may continue
    working until it completes or its connection is torn down. That is
    a backend-layer concern.
  • Changing the ColdStorage trait surface or the external
    ColdStorageHandle API beyond what's needed to attach permits to
    read requests.

Design

Architecture

Replace the single-run-loop model with two concurrently-running tasks:

  • Read dispatcher — always polling the read channel. Its only
    work is recv → spawn read handler. Never blocks on anything other
    than the channel itself.
  • Writer — serializes writes. For each write, acquires all 64
    permits (cancellably), executes the write, releases permits.

Both tasks share one read_semaphore: Arc<Semaphore> sized to
MAX_CONCURRENT_READERS = 64. This semaphore is the single
backpressure mechanism. It has no siblings.

Permit-attached-to-message

ColdStorageHandle's read methods acquire a permit before sending
into the channel:

pub async fn get_header(&self, spec: HeaderSpecifier)
    -> ColdResult<Option<SealedHeader>>
{
    let permit = self.read_sem.clone().acquire_owned().await
        .map_err(|_| ColdStorageError::Cancelled)?;
    let (tx, rx) = oneshot::channel();
    self.read_sender
        .send(PermittedReadRequest {
            permit,
            req: ColdReadRequest::GetHeader { spec, resp: tx },
        })
        .await
        .map_err(|_| ColdStorageError::Cancelled)?;
    rx.await.map_err(|_| ColdStorageError::Cancelled)?
}

The permit travels in the message, is moved into the spawned handler
by the dispatcher, and is dropped when the handler completes or panics.
Its lifetime spans caller hold time → channel transit → handler execution. Dropping the handler future (panic, timeout, abort) drops
the permit, so the pool is never leaked.

Channel sizing: READ_CHANNEL_SIZE = MAX_CONCURRENT_READERS

With permits acquired pre-send, the total count of outstanding permits
(= held-by-caller + in-channel + in-handler) is bounded by 64. So the
read channel can never hold more than 64 items. We size it to exactly
64.

The channel is now a passive rendezvous. It cannot back-pressure
(that job belongs to the semaphore), and a caller holding a permit is
guaranteed space. Use try_send from the handle:

self.read_sender.try_send(PermittedReadRequest { permit, req })
    .map_err(|e| match e {
        TrySendError::Full(_) => unreachable!(
            "semaphore permit implies channel capacity"
        ),
        TrySendError::Closed(_) => ColdStorageError::Cancelled,
    })?;

Full is impossible by construction — it would indicate a permit
miscount. Closed maps to shutdown.

Write channel size is unchanged (WRITE_CHANNEL_SIZE = 256). Writes
are bounded separately by their own send-side mpsc capacity.

Writer task with cancellable drain

async fn run_writer(mut self) {
    while let Some(req) = self.write_receiver.recv().await {
        let _drain = tokio::select! {
            _ = self.cancel_token.cancelled() => break,
            d = self.read_semaphore.clone().acquire_many_owned(
                MAX_CONCURRENT_READERS as u32,
            ) => d.expect("read semaphore outlives the writer task"),
        };
        self.handle_write(req).await;
        // _drain drops here, restoring all permits.
    }
}

The cancel-select closes the "stuck reader + shutdown = hang" window
identified in PR #56 review.

Dispatcher task

async fn run_dispatcher(self) {
    while let Some(PermittedReadRequest { permit, req })
        = self.read_receiver.recv().await
    {
        let inner = Arc::clone(&self.inner);
        let deadline = self.read_deadline;
        self.task_tracker.spawn(async move {
            let _permit = permit;
            inner.handle_read_with_deadline(req, deadline).await;
        });
    }
}

handle_read_with_deadline refactors today's handle_read so the
response sender is extracted from the request up front, the body is
wrapped in tokio::time::timeout, and on timeout the sender receives
Err(ColdStorageError::Timeout) explicitly while a WARN is emitted
with the operation variant. Sketch:

impl<B: ColdStorageRead> ColdStorageTaskInner<B> {
    async fn handle_read_with_deadline(
        self: &Arc<Self>,
        req: ColdReadRequest,
        deadline: Duration,
    ) {
        let op = req.variant_name();
        let (resp, body) = req.split_response();
        match tokio::time::timeout(deadline, self.handle_read_body(body)).await {
            Ok(result) => { let _ = resp.send(result); }
            Err(_) => {
                warn!(operation = op, "cold read deadline exceeded");
                let _ = resp.send(Err(ColdStorageError::Timeout));
            }
        }
    }
}

The contract: on deadline expiry, the caller receives
ColdStorageError::Timeout and the permit is released. The exact
split_response / variant_name / handle_read_body surface is a
refactor of the current handle_read match arms; each read variant
carries its own response type so the split is mechanical.

Deadline value

DEFAULT_READ_DEADLINE = Duration::from_secs(5).

Rationale:

  • Every non-stream operation should complete in well under a second
    under normal conditions (point lookups on indexed columns, bounded
    batch lookups, filter scans bounded by max_logs).
  • The deadline is a guardrail against backend-layer pathology (stuck
    connection, runaway query, deadlocked transaction), not a ceiling
    for legitimate work.
  • Comfortably below the 12s write cadence, so a worst-case drain of
    5s leaves 7s of headroom per cycle before write mpsc pressure
    accumulates.
  • Well below the 60s RPC client timeout, so the caller sees a
    structured Timeout error rather than RecvError at the client
    layer.
  • A legitimate GetLogs that trips the 5s deadline is a signal that
    the caller should have used StreamLogs. The WARN log with the
    variant name makes that diagnosable.

Expose the deadline as a field on ColdStorageTask, configurable via
a builder method analogous to max_stream_deadline:

impl<B: ColdStorage> ColdStorageTask<B> {
    pub fn with_read_deadline(mut self, deadline: Duration) -> Self {
        self.read_deadline = deadline;
        self
    }
}

Drain semantics preserved

The writer's acquire_many_owned(64) still serves as the drain
barrier. It resolves only once every permit is free, which means:

  • No callers are mid-send (they're blocked at acquire_owned()).
  • No messages are waiting in the channel (each held a permit; the
    dispatcher has moved them all into handlers).
  • No handlers are running (each held its permit; all have completed).

The dispatcher keeps running during drain, so queued permits do not
get stranded. Tokio's FIFO semaphore fairness ensures that once the
writer is queued for 64 permits, later callers queue behind it and do
not re-pressurize the pool.

Shutdown

Each of the two tasks checks the cancel token in its outer select, so
either the cancel token firing or the relevant channel closing (all
ColdStorageHandle clones dropped) terminates it.

Dispatcher:

loop {
    tokio::select! {
        biased;
        _ = self.cancel_token.cancelled() => break,
        maybe = self.read_receiver.recv() => {
            let Some(PermittedReadRequest { permit, req }) = maybe else {
                break;
            };
            // spawn handler
        }
    }
}

Writer: same shape — cancel token arm plus write_receiver.recv()
plus the cancel-gated acquire_many_owned inside the write handling
block.

Full shutdown sequence (driven by the top-level task that owns both):

  1. Cancel token fires. Both tasks break out of their outer loops.
    • Dispatcher stops accepting new reads. In-flight queued messages
      that were already recv'd are spawned as handlers; any that
      remain in the channel stay there until all ColdStorageHandles
      are dropped (they get dropped on the handle side at shutdown).
    • Writer exits from either its outer cancel arm or its inner
      cancel-select around acquire_many.
  2. Wait for the handler task_tracker:
    task_tracker.close(); task_tracker.wait().await.
    Bounded by the 5s read deadline — no handler can run longer.
  3. Wait for the stream_tracker:
    stream_tracker.close(); stream_tracker.wait().await.
    Bounded by the per-stream deadline.

Order: handlers must drain before streams, because StreamLogs
handlers spawn stream producers. Draining streams first could miss a
producer spawned by a late-completing handler.

Implementation: ColdStorageTask::run becomes a thin orchestrator
that spawns run_dispatcher and run_writer on a JoinSet (or two
plain tokio::spawn handles), awaits both, then runs the two
tracker-drain steps in order.

API changes

Public surface changes are intentionally minimal:

  • ColdStorageTask::new gains a read_deadline parameter (with a
    convenience constructor that uses the default).
  • ColdStorageTask::with_read_deadline(Duration) -> Self builder.
  • ColdStorageError::Timeout variant added.

Internal:

  • PermittedReadRequest { permit: OwnedSemaphorePermit, req: ColdReadRequest }
    wraps messages on the read channel.
  • ColdStorageHandle holds Arc<Semaphore> for permit acquisition
    and exposes unchanged per-operation methods.
  • ColdStorageTask is split into run_dispatcher and run_writer;
    run becomes a thin wrapper that spawns both and manages shutdown.

No changes to ColdStorageRead, ColdStorageWrite, or ColdStorage
traits.

Testing

Add a new integration-test file crates/cold/tests/concurrency.rs
(replacing the one in PR #56) plus a small GatedBackend test helper.

GatedBackend<B>

Test-local wrapper over any ColdStorage backend. Each read method
acquires a permit on a test-controlled Arc<Semaphore> before
delegating to the inner backend:

struct GatedBackend<B> {
    inner: B,
    read_gate: Arc<Semaphore>,
}

impl<B: ColdStorageRead> ColdStorageRead for GatedBackend<B> {
    async fn get_header(&self, spec: HeaderSpecifier)
        -> ColdResult<Option<SealedHeader>>
    {
        let _p = self.read_gate.acquire().await.unwrap();
        self.inner.get_header(spec).await
    }
    // ... all other read methods follow the same pattern.
}

Tests control when readers release by adjusting the gate's permit
count.

Regression tests

  1. reads_above_concurrency_cap_do_not_deadlock (carried over from
    PR #56). 256 concurrent reads against a non-gated backend, 15s
    guard. Verifies the original bug stays fixed.

  2. write_after_saturating_reads_makes_progress (carried over).
    Same flood with a write interleaved.

  3. fairness_write_serves_before_later_readers (new). Gate the
    backend at 0 permits. Spawn 64 readers (they hang in the gate).
    Send a write. Spawn another 64 readers. Release all gate permits at
    once. Assert: the write's oneshot resolves before any of the
    later 64 readers' oneshots.

  4. cancel_during_reader_backpressure_shuts_down (new). Gate at
    0. Saturate with 64 readers plus a 65th queued on
    acquire_owned. Fire the cancel token. Assert the task shuts down
    within a deadlock guard (few seconds, well under 15s).

  5. cancel_during_write_drain_shuts_down (new). Gate at 0.
    Saturate 64 readers. Send a write (blocks in acquire_many). Fire
    the cancel token. Assert shutdown within the guard. This test
    would fail without the cancel-select fix.

  6. operation_deadline_releases_permit (new). Configure the task
    with a 200ms deadline. Gate at 0. Send one read. After 200ms,
    assert: the caller receives ColdStorageError::Timeout, and a
    subsequent read (issued after the deadline) acquires a permit and
    proceeds.

All tests use MemColdBackend wrapped in GatedBackend as the
fixture. Per-test timeouts are enforced via tokio::time::timeout so
a regression trips fast.

Rollout

Single PR rebased on main, replacing the current PR #56. Commit
structure:

  1. Introduce PermittedReadRequest and plumb permit acquisition
    through ColdStorageHandle.
  2. Split ColdStorageTask::run into dispatcher + writer tasks.
  3. Add cancellable drain in the writer.
  4. Add read_deadline field, with_read_deadline builder, and
    deadline-wrapping logic in the dispatcher. Add
    ColdStorageError::Timeout.
  5. Replace PR #56's test file with the expanded test suite and add the
    GatedBackend helper.

Reviewer validation path unchanged from PR #56: cut a patch release,
bump init4tech/node-components, rebuild signet-sidecar:latest,
redeploy to dev mainnet, confirm no backpressure-induced crashes.

Open questions

None currently outstanding — scope, deadline value, unification
question, shutdown ordering, and single-PR-vs-split all resolved in
brainstorming.

@Fraser999
Copy link
Copy Markdown
Contributor

[Claude Code]

Spec is directionally right — the permit-attached-to-message, split dispatcher/writer, cancellable drain, and GatedBackend tests all land cleanly and close concrete gaps in the current PR. Two pieces of the design don't do what they claim to, though, and I think tweaking them changes the shape of the fix enough to mention before landing.

Core issue: tokio-level timeout doesn't cancel work, it only drops our handle on it

tokio::time::timeout(d, fut).await races a sleeper against fut and, on fire, drops fut. "Dropped" means Rust runs Drop impls; it does not mean work the future kicked off on another thread, connection, or DB server stops. The dispatcher-side timeout in the spec therefore releases only the one thing it owns — the semaphore permit — and nothing else. Three cases, all currently relevant:

MDBX point lookups as the backend is written today

Every ColdStorageRead impl in crates/cold-mdbx/src/backend.rs is an async fn wrapping a synchronous body (self.env.tx() → cursor ops → return). There are no .await points inside the body. The dispatcher polls once, the sync body runs to completion on that poll, Ready(...) is returned, and only then is control handed back to timeout. The sleeper never wakes; the deadline never fires.

So the 5s deadline as specified is silently inert for MDBX — the backend we care most about for this bug, and the one the deadline is advertised to guard against in the spec text:

The deadline is a guardrail against backend-layer pathology (stuck connection, runaway query, deadlocked transaction) ...

It isn't, for MDBX as written. The precondition — backend yields at await points — is unstated.

MDBX after moving reads to spawn_blocking

We need this change independently for runtime liveness (a synchronous MDBX call inside async fn currently blocks a tokio worker). Once it's in place, timeout can fire cleanly at the JoinHandle — but dropping a spawn_blocking JoinHandle does not cancel the blocking task, which tokio documents. The worker thread runs to completion, still holding the MDBX read tx (blocking writer page reclamation). Permit back, real resource not.

sqlx PG

Timeout-drop here works at the client: the future drops, the Transaction guard drops, sqlx evicts/resets the connection. But PG keeps executing the query server-side until it notices the cancel or finishes. On Aurora that can be seconds. Our pool sees connection churn; Aurora sees the query count go up. The permit returns before the real resource does.

Consequence: the permit count lies about backend load

If permits release on timeout-drop while backend work is still running, the semaphore stops reflecting in-flight backend work. Under repeated timeouts the cold task will admit 64 "fresh" reads while 64 "zombie" calls are still alive behind the scenes. Backend gets overcommitted even though our semaphore says 64. The semaphore is supposed to be the one backpressure mechanism; this breaks that invariant.

Proposed tweaks

Everything else in the spec (permit-attached-to-message, split dispatcher/writer, cancellable drain, GatedBackend + the five new tests) stays. Changes:

1. Permits release on backend completion, not on caller-side drop

The permit rides the handler's future, not the caller's. When a caller times out, the caller's oneshot::Receiver drops — the handler's send becomes a no-op, work continues, permit releases when the backend call returns. The semaphore then honestly reflects in-flight backend work at all times.

2. Remove the dispatcher-side timeout entirely

Delete DEFAULT_READ_DEADLINE, handle_read_with_deadline, and ColdStorageError::Timeout. The deadline as specified can't actually cancel MDBX work and has the side effect of breaking the permit accounting invariant from point 1. Don't expose a guarantee we can't keep.

3. Push timeouts to layers where they can actually cancel work

  • PG: issue SET LOCAL statement_timeout = <ms> at the top of each transaction. PG cancels the query server-side on expiry and the connection returns clean. Real cancellation, enforced by the backend.
  • MDBX: every read moves to spawn_blocking (required anyway for runtime liveness). Pass deadline: Instant into the blocking body; the body checks it between cursor advances / between per-block iterations — exactly the pattern produce_log_stream_blocking already uses. Syscall-level hangs remain unkillable, but those are a kernel / filesystem concern and no level of tokio magic fixes them.

4. Opt-in caller-side deadline on the handle

For callers that want fast-fail semantics (RPC clients with their own budget), expose get_header_with_deadline(spec, d) etc. that wraps send + recv in tokio::time::timeout. Caller gets Err(Timeout); handler keeps running; permit held until real completion. Honest — caller abandons interest, backend paces itself. Most callers probably don't need this; it's there for the ones that do.

5. The deadline's stated purpose — bounding drain duration — is really a node-layer concern

The failure chain in the PR description is: slow reader → long drain → write mpsc fills → dispatch_append_blocks returns Backpressure, and the ? at signet-node/src/node.rs:285 crashes the node. The deadline tries to bound the first step so the last step never happens. But Backpressure is designed as a transient signal; treating it as fatal is a node-side bug. Fixing that one line — queue / drop / log rather than ? — removes the pressure on the storage layer to pretend it can bound backend latency. Storage can then be honest about its limits and surface them via metrics (permit saturation, drain duration, backend p99) rather than hiding them in the dispatcher.

Net design

  • Permit = real backend slot. Released only on backend completion.
  • Timeouts land where cancellation is actually possible: caller side (abandon interest) and backend side (statement_timeout, in-body deadline checks).
  • ColdStorageError::Timeout gone. ColdStorageError::Backpressure remains and is treated as non-fatal upstream.
  • Metrics expose backend health honestly. Slow backend = slow cold task = visible in dashboards, not papered over.

Scope grows by: node-side backpressure handling fix + MDBX spawn_blocking migration. First is one line. Second is mechanical but touches every impl ColdStorageRead for MdbxColdBackend method. Both are required for this PR's fix to hold under production load, so I'd rather do them here than leave them as follow-ups — happy to sequence the MDBX migration as its own PR immediately after this one if you'd prefer to keep diff size down.
