Skip to content

bitreq: Evict dead entries from async connection pool#564

Open
tnull wants to merge 3 commits intorust-bitcoin:masterfrom
tnull:2026-04-bitreq-async-pool-fix
Open

bitreq: Evict dead entries from async connection pool#564
tnull wants to merge 3 commits intorust-bitcoin:masterfrom
tnull:2026-04-bitreq-async-pool-fix

Conversation

@tnull
Copy link
Copy Markdown
Collaborator

@tnull tnull commented Apr 22, 2026

Fixes #562.

Validate pool entries on acquire and clean up after every send:

  • Remove entries whose keep-alive deadline has passed or whose inner
    state has been poisoned, rather than blindly cloning the Arc.
  • Evict on send failure, Connection: close, or malformed
    Keep-Alive.
  • Refresh LRU position on a cache hit.

Pre-send insertion of a fresh Arc on a miss is kept so concurrent
callers arriving during the first round-trip can share the socket for
pipelining; reusable_until and the post-send eviction together
ensure they never observe a known-dead Arc.

tnull added 2 commits April 22, 2026 12:26
Expose a `pub(crate)` view of whether an `AsyncConnection` is still
healthy and, if so, until when. `None` means the inner
`next_request_id` has been poisoned (every failure path in `send`
already sets this); `Some(t)` is the current
`socket_new_requests_timeout`, which `send` already refreshes from
the server's `Keep-Alive: timeout=N` header. No behavioural change;
groundwork for the pool to check entry validity without duplicating
tracking.

Co-Authored-By: HAL 9000
Validate pool entries on acquire and clean up after every send:

- Remove entries whose keep-alive deadline has passed or whose inner
  state has been poisoned, rather than blindly cloning the `Arc`.
- Evict on send failure, `Connection: close`, or malformed
  `Keep-Alive`.
- Refresh LRU position on a cache hit.

Pre-send insertion of a fresh `Arc` on a miss is kept so concurrent
callers arriving during the first round-trip can share the socket for
pipelining; `reusable_until` and the post-send eviction together
ensure they never observe a known-dead `Arc`.

Co-Authored-By: HAL 9000
Drives `a, b, a, c, a` through a `Client` with capacity 2, backed by
three `tokio::net::TcpListener` servers so each URL resolves to a
distinct `ConnectionKey`. On the fix, step 3 promotes `a` to
most-recent, step 4's c-insert evicts `b`, and step 5 reuses the warm
`a` entry — three TCP accepts. Pre-fix, `a` stays at the front of the
LRU queue and gets evicted at step 4 instead — four accepts.

Covers one regression only: the LRU refresh on hit. The other
architectural improvement on this branch — explicit pool-layer
eviction on failure / `Connection: close` — produces the same
externally-observable behaviour as the pre-fix code because
`AsyncConnection::send`'s interior `retry_new_connection!` machinery
compensates by swapping poisoned inner state on the next use. That
change is defended on architectural grounds (explicit pool-layer
logic instead of reliance on the inner retry) rather than via a
black-box regression test.

Co-Authored-By: HAL 9000
Comment thread bitreq/src/client.rs
/// Removes any pool entry for `key`. No-op if the slot is already empty.
fn evict(&self, key: &ConnectionKey) {
let mut state = self.r#async.lock().unwrap();
state.connections.remove(key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should evict by identity, not just by ConnectionKey.

There is a race where an older poisoned Arc can finish later and remove a newer healthy entry for the same key:

  1. tasks A and B are both using the old pooled connection for K
  2. A fails and marks the old connection dead
  3. task C opens and inserts a fresh connection for K
  4. B reaches this cleanup path later and remove(key) drops C's fresh entry

That doesn't break in-flight requests, but it does create avoidable reconnect churn under load. I think evict needs to compare the currently pooled Arc against the one that just finished and only remove on Arc::ptr_eq.

Permalinks:

Copy link
Copy Markdown
Member

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm understanding the fix here - if we hit a dead connection when we go to request we should fix the Connection, not build a new one in the Client?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bitreq: Client connection pool retains dead connections indefinitely on I/O error

3 participants