feat(host-reth): replace ExEx backfill with direct DB reads #136

Open
prestwich wants to merge 9 commits into main from prestwich/db-backfill

Conversation

@prestwich
Member

@prestwich prestwich commented Apr 10, 2026

Summary

  • Replace ExEx-driven backfill (which re-executes blocks) with direct DB reads from the reth provider, fixing slow startup and memory issues
  • New DbBackfill<P> reads blocks+receipts in batches up to finalized, then hands off to the ExEx stream for live blocks
  • HostChain enum unifies backfill and live chain segments behind Extractable
  • set_head documented and enforced as once-only across both notifier implementations

Details

The ExEx backfill mechanism re-executes historical blocks on startup, which is slow and has caused memory issues. The reth DB already contains executed results, making re-execution unnecessary.

New types (signet-host-reth):

  • DbBlock / DbChainSegment — owned block+receipts from DB, implements Extractable
  • DbBackfill<P> — batch reader using spawn_blocking for MDBX reads
  • HostChain — enum wrapping DbChainSegment (backfill) and RethChain (live)
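The enum-over-two-sources pattern described above can be sketched as follows. This is a minimal illustration, not the crate's code: the `Extractable` trait here is a hypothetical one-method stand-in (the real trait's methods are not shown in this PR description), and the inner types are reduced to plain block numbers.

```rust
/// Hypothetical stand-in for the real `Extractable` trait; only the
/// delegation shape matters for this sketch.
trait Extractable {
    /// Block numbers contained in this segment, in ascending order.
    fn block_numbers(&self) -> Vec<u64>;
}

/// Stand-in for `DbChainSegment`: owned blocks read from the DB.
struct DbChainSegment(Vec<u64>);

/// Stand-in for `RethChain`: a live segment delivered by the ExEx stream.
struct RethChain {
    first: u64,
    len: u64,
}

impl Extractable for DbChainSegment {
    fn block_numbers(&self) -> Vec<u64> {
        self.0.clone()
    }
}

impl Extractable for RethChain {
    fn block_numbers(&self) -> Vec<u64> {
        (self.first..self.first + self.len).collect()
    }
}

/// One chain type for both phases; each variant delegates to its inner impl.
enum HostChain {
    Backfill(DbChainSegment),
    Live(RethChain),
}

impl Extractable for HostChain {
    fn block_numbers(&self) -> Vec<u64> {
        match self {
            Self::Backfill(segment) => segment.block_numbers(),
            Self::Live(chain) => chain.block_numbers(),
        }
    }
}

/// Consumers stay generic over `Extractable` and never match on the variant.
fn tip<C: Extractable>(chain: &C) -> Option<u64> {
    chain.block_numbers().last().copied()
}
```

With this shape, downstream code such as the hypothetical `tip` helper treats backfilled and live segments identically.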

Notifier changes:

  • RethHostNotifier::next_notification is now two-phase: drain DB backfill, then switch to ExEx
  • set_head creates a DbBackfill instead of calling ExEx set_with_head
  • ExEx stream is initiated after backfill completes, pointed at the last backfilled block
  • reth-stages-types dependency removed
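A rough, synchronous sketch of the two-phase flow listed above, under simplifying assumptions: the real `next_notification` is async and reads MDBX via `spawn_blocking`, whereas here blocks are plain numbers, the live stream is a queue, and the field names (`exex_head`, `live`) are hypothetical.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Notification {
    Backfill(Vec<u64>),
    Live(u64),
}

/// Simplified batch reader: walks from `cursor` up to `finalized` (inclusive).
struct DbBackfill {
    cursor: u64,
    finalized: u64,
    batch_size: u64,
}

impl DbBackfill {
    /// Returns the next batch, or `None` once the cursor has passed `finalized`.
    fn next_batch(&mut self) -> Option<Vec<u64>> {
        if self.cursor > self.finalized {
            return None;
        }
        // Treat a zero batch size as one so the cursor always advances.
        let size = self.batch_size.max(1);
        let end = self.cursor.saturating_add(size - 1).min(self.finalized);
        let batch: Vec<u64> = (self.cursor..=end).collect();
        self.cursor = end + 1;
        Some(batch)
    }
}

struct Notifier {
    /// `Some` while phase 1 (DB backfill) is still running.
    backfill: Option<DbBackfill>,
    /// Stand-in for the live ExEx stream.
    live: VecDeque<u64>,
    /// Block the live stream was pointed at when backfill finished.
    exex_head: Option<u64>,
}

impl Notifier {
    fn next_notification(&mut self) -> Option<Notification> {
        if let Some(backfill) = self.backfill.as_mut() {
            if let Some(batch) = backfill.next_batch() {
                return Some(Notification::Backfill(batch));
            }
            // Phase 1 is drained: drop the reader and point the live
            // stream at the last backfilled block.
            let done = self.backfill.take().expect("backfill was Some");
            self.exex_head = Some(done.cursor.saturating_sub(1));
        }
        // Phase 2: live notifications only.
        self.live.pop_front().map(Notification::Live)
    }
}
```

The `Option<DbBackfill>` doubles as the phase flag, so no separate state enum is needed in this sketch.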

Cross-crate:

  • HostNotifier::set_head trait doc clarified as once-only
  • RpcHostNotifier::set_head guards against repeated calls
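One way a once-only `set_head` guard might look is sketched below. The PR description does not say how the guard reacts to a repeated call, so returning an error is an assumption here, and `SetHeadError` is a hypothetical type.

```rust
/// Hypothetical error type; the real notifiers may panic, log, or no-op instead.
#[derive(Debug, PartialEq)]
enum SetHeadError {
    AlreadySet { existing: u64 },
}

#[derive(Default)]
struct Notifier {
    head: Option<u64>,
}

impl Notifier {
    /// Accepts the first call and rejects every later one, leaving the
    /// originally configured head untouched.
    fn set_head(&mut self, block: u64) -> Result<(), SetHeadError> {
        if let Some(existing) = self.head {
            return Err(SetHeadError::AlreadySet { existing });
        }
        self.head = Some(block);
        Ok(())
    }
}
```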

Review follow-ups

  • Fraser: set_backfill_thresholds(None) now resets to DEFAULT_BATCH_SIZE via a new DbBackfill::reset_batch_size, matching the trait contract and RpcHostNotifier.
  • Evalir: Removed the genesis fallback after backfill completion. The ExEx startup race it was defending against was fixed upstream in reth (#19665 / #22168, merged Feb 2026), and at our call site DbBackfill has just successfully read last_backfilled from the same provider — so a missing header there now indicates DB-level failure and returns RethHostError::MissingHeader.
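The `None` handling from Fraser's follow-up can be sketched roughly as below. The method names match those mentioned in the PR, but the signatures, the `u64` batch-size type, and the value of `DEFAULT_BATCH_SIZE` are assumptions for illustration.

```rust
/// Placeholder value; the crate's actual default is not stated in this PR.
const DEFAULT_BATCH_SIZE: u64 = 1_000;

struct DbBackfill {
    batch_size: u64,
}

impl DbBackfill {
    fn set_batch_size(&mut self, batch_size: u64) {
        self.batch_size = batch_size;
    }

    fn reset_batch_size(&mut self) {
        self.batch_size = DEFAULT_BATCH_SIZE;
    }
}

struct Notifier {
    backfill: Option<DbBackfill>,
}

impl Notifier {
    /// `Some(n)` configures the batch size; `None` restores the default,
    /// per the trait contract quoted in the review.
    fn set_backfill_thresholds(&mut self, max_blocks: Option<u64>) {
        // Nothing to configure once backfill has finished (or never started).
        let Some(backfill) = self.backfill.as_mut() else {
            return;
        };
        match max_blocks {
            Some(n) => backfill.set_batch_size(n),
            None => backfill.reset_batch_size(),
        }
    }
}
```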

Closes ENG-1784

Test plan

  • cargo clippy passes (both --all-features and --no-default-features)
  • RUSTDOCFLAGS="-D warnings" cargo doc passes
  • All existing tests pass (signet-host-reth, signet-node-types, signet-host-rpc)
  • signet-node compiles cleanly with new HostChain type
  • Integration test on a reth node with historical data (manual)

🤖 Generated with Claude Code

prestwich and others added 8 commits April 10, 2026 12:17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces three types for reading host-chain blocks from the reth DB:
- `DbBlock`: owned block+receipts pair from the provider
- `DbChainSegment`: newtype over `Vec<DbBlock>` implementing `Extractable`
  using the same `RecoveredBlockShim` transmute pattern as `RethChain`
- `DbBackfill<P>`: batch reader that walks from a cursor to the finalized
  block, recording metrics per batch via `crate::metrics`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `HostChain` enum with `Backfill(DbChainSegment)` and `Live(RethChain)`
variants, both delegating to the inner `Extractable` impl. Promotes
`DbChainSegment` to `pub` and re-exports both new types from the crate root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace reth's built-in ExEx backfill with DbBackfill for startup
catch-up. The notifier now runs a two-phase loop: phase 1 drains DB
batches via DbBackfill, then phase 2 switches to live ExEx
notifications. set_head initializes backfill instead of resolving a
header directly, and set_backfill_thresholds configures DbBackfill
batch size. Chain type changes from RethChain to HostChain enum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@prestwich prestwich requested a review from a team as a code owner April 10, 2026 16:34
@prestwich prestwich requested review from Evalir and Fraser999 April 10, 2026 16:36
Comment thread crates/host-reth/src/notifier.rs Outdated
Comment on lines +249 to +254
if let Some(backfill) = &mut self.backfill
    && let Some(max_blocks) = max_blocks
{
    debug!(max_blocks, "configured DB backfill batch size");
    backfill.set_batch_size(max_blocks);
}
Contributor

The trait method's doc comment says "None means use the backend's default.", and the RpcHostNotifier does this, so we should probably reset the batch size to the default here too in the None case.

Comment on lines +43 to +52
const {
    assert!(
        size_of::<RecoveredBlockShim>() == size_of::<RethRecovered>(),
        "RecoveredBlockShim layout diverged from RethRecovered"
    );
    assert!(
        align_of::<RecoveredBlockShim>() == align_of::<RethRecovered>(),
        "RecoveredBlockShim alignment diverged from RethRecovered"
    );
}
Member

this is very nice
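For readers unfamiliar with the pattern in the snippet above, here is a self-contained sketch of a compile-time layout check. `Upstream` and `Shim` are hypothetical types standing in for `RethRecovered` and `RecoveredBlockShim`. Note that matching size and alignment is a necessary guard for a transmute, not a proof that field layouts agree.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a type owned by another crate.
#[repr(C)]
struct Upstream {
    number: u64,
    hash: [u8; 32],
}

/// Local mirror that is meant to stay layout-compatible with `Upstream`.
#[repr(C)]
struct Shim {
    number: u64,
    hash: [u8; 32],
}

// Evaluated at compile time: if either type changes size or alignment,
// the build fails instead of the mismatch surfacing at runtime.
const _: () = {
    assert!(
        size_of::<Shim>() == size_of::<Upstream>(),
        "Shim size diverged from Upstream"
    );
    assert!(
        align_of::<Shim>() == align_of::<Upstream>(),
        "Shim alignment diverged from Upstream"
    );
};
```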

Comment thread crates/host-reth/src/notifier.rs Outdated
let backfill = self.backfill.take().expect("backfill was Some");
let last_backfilled = backfill.cursor().saturating_sub(1);

let head = self
Member

So, this behavior of fetching the last seen block and, if that fails, falling back to genesis stems from a probably still unresolved bug where Reth starts the ExEx before it's connected to its own DB provider (that's roughly what I remember). This of course complicates the logic quite a bit by adding the genesis fallback path.

Considering that most of the time we might be falling into phase 1 when restarting the node to catch up with a few blocks, maybe we can simplify this?

- `set_backfill_thresholds(None)` now resets the batch size to
  `DEFAULT_BATCH_SIZE` via a new `DbBackfill::reset_batch_size`,
  matching the trait contract ("`None` means use the backend's
  default") and the `RpcHostNotifier` implementation.
- Remove the genesis fallback after backfill completion. The
  documented ExEx startup race it was defending against
  (reth #19665 / #22168) was fixed upstream, and in any case
  `DbBackfill` just read `last_backfilled` from the same provider,
  so a missing header at this point indicates DB-level failure.
  Now returns `RethHostError::MissingHeader` instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
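The post-backfill head lookup described in the commit message above can be sketched as follows. This is a simplified model: a `BTreeMap` stands in for the reth provider, `HostError` is a hypothetical stand-in for `RethHostError`, and the `u64` payload on `MissingHeader` is an assumption.

```rust
use std::collections::BTreeMap;

/// Stand-in for `RethHostError`; the payload is assumed for illustration.
#[derive(Debug, PartialEq)]
enum HostError {
    MissingHeader(u64),
}

/// Resolves the block the live stream should start from once backfill is
/// done. `cursor` is the next block backfill would have read, so the last
/// backfilled block is `cursor - 1`.
fn resolve_exex_head(
    headers: &BTreeMap<u64, [u8; 32]>,
    cursor: u64,
) -> Result<(u64, [u8; 32]), HostError> {
    let last_backfilled = cursor.saturating_sub(1);
    headers
        .get(&last_backfilled)
        .copied()
        .map(|hash| (last_backfilled, hash))
        // No genesis fallback: the same store just served this block, so a
        // miss is surfaced as an error rather than papered over.
        .ok_or(HostError::MissingHeader(last_backfilled))
}
```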
3 participants