Skip to content

Various refactors and test hardening#4603

Draft
joshuacolvin0 wants to merge 5 commits intomasterfrom
mel-flaky-fix
Draft

Various refactors and test hardening#4603
joshuacolvin0 wants to merge 5 commits intomasterfrom
mel-flaky-fix

Conversation

@joshuacolvin0
Copy link
Copy Markdown
Member

@joshuacolvin0 joshuacolvin0 commented Apr 6, 2026

Enable nodes to transition from the legacy inbox reader/tracker to the Message Extraction Layer (MEL). This PR builds on the MEL replacement work and adds:

  • Node transition: Detect legacy DB keys at startup and migrate to MEL state via CreateInitialMELStateFromLegacyDB
  • Error handling hardening: Fix nil pointer dereference risks, convert runtime panics to errors, extract prunePrefix helper with iterator leak fix
  • Message pruner: Prune all delayed message prefixes (legacy "d", RLP "e", MEL "y", parent chain "p") with legacy boundary capping
  • Crash safety: Delayed message rollback on accumulator mismatch in inbox reader, deleteTrailingEntries extraction for testability

Commit breakdown

  1. Core MEL implementation + node integration (61 files) — Architecture, review for correctness
  2. Fix nil pointer dereference risks (3 files) — Panic prevention, rubber-stamp
  3. Convert SetDelayedSequencer panics to errors (2 files) — Panic prevention, rubber-stamp
  4. Refactor and harden error handling (6 files) — Code quality, review for logic
  5. Safety tests (6 files) — Test coverage, skim

Key files for review

  • arbnode/mel/runner/mel.go — FSM lifecycle, stall detection, fatal escalation
  • arbnode/mel/runner/database.go — Atomic SaveProcessedBlock, legacy boundary dispatch
  • arbnode/mel/state.go — Two-stack FIFO delayed message queue, accumulator hash chains
  • arbnode/node.go — MEL-or-legacy dispatch, DB migration, computeMigrationStartBlock
  • arbnode/message_pruner.go — Multi-prefix pruning with prunePrefix helper
  • arbnode/transaction_streamer.go — deleteTrailingEntries extraction

Schema changes

Additive only (backwards compatible):

  • New keys: _lastPrunedLegacyDelayedMessageKey, _lastPrunedMelDelayedMessageKey, _lastPrunedParentChainBlockNumberKey, _initialMelStateBlockNum
  • No existing keys modified or removed
  • DbSchemaVersion unchanged at 2

Test plan

  • Unit tests pass: go test ./arbnode/mel/runner/ ./arbnode/ -count=1
  • Linters pass: golangci-lint run + go run ./linters ./...
  • System tests: TestMELMigrationFromLegacyNode, TestMessageExtractionLayer_*
  • Verify nil header guards prevent panics (commits 2 + 5)
  • Verify deleteTrailingEntries doesn't corrupt unrelated prefixes (commit 5)
  • Verify pruner handles all delayed message prefix formats (commit 4)

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

joshuacolvin0 and others added 5 commits April 5, 2026 19:28
…ests

Core MEL implementation and node integration for replacing the inbox
reader/tracker with the Message Extraction Layer:
- MEL runner with FSM (Start/ProcessingNextBlock/SavingMessages/Reorging)
- Atomic SaveProcessedBlock for crash-safe database writes
- Legacy DB migration via CreateInitialMELStateFromLegacyDB
- Node wiring with MEL-or-legacy dispatch
- Message pruner: prune legacy, RLP, MEL, and parent chain block prefixes
- Inbox reader: delayed message rollback on accumulator mismatch
- Transaction streamer: adapt for MEL BatchDataProvider interface
- Staker/validator: adapt for MEL BatchDataReader interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevent nil pointer dereference panics (which can corrupt the database)
when the parent chain RPC returns a nil header or a header with nil
Number field:
- logs_and_headers_fetcher.go: guard HeaderByNumber(ctx, nil) result
- process_next_block.go: extend latestBlk nil check to cover Number
- node.go: guard finalizedHeader.Number in computeMigrationStartBlock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace runtime panics with returned errors to comply with the project
rule that the node must never panic at runtime (panics can corrupt the
database). The caller in NewDelayedSequencer now checks the error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MEL runner:
- Extract escalateIfPersistent helper for fatal error escalation
- Extract ReadMode constants (ReadModeLatest/Safe/Finalized)
- Collapse three identical error branches in updateLastBlockToRead
  into a single failReason path
- Remove redundant state validation in initialize (covered by
  Database.State -> Validate at load time)
- Add nil Message guard in GetDelayedMessageBytes

Message pruner:
- Extract prunePrefix helper to deduplicate fetch/delete/persist pattern
- Fix iterator leak in deleteFromLastPrunedUptoEndKey (defer Release)
- Propagate errors from fetchLastPrunedKey/insertLastPrunedKey instead
  of silently logging and returning zero

Inbox reader:
- Roll back delayed messages on accumulator mismatch to prevent
  orphaned entries in the DB

Transaction streamer:
- Extract deleteTrailingEntries for independent testability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eanup

- Nil header guards in logsAndHeadersFetcher (nil header + nil Number)
- escalateIfPersistent: nil chan, zero tolerance, threshold boundary,
  context cancellation
- SetMessageConsumer/SetBlockValidator: double-set and after-start guards
- GetDelayedMessage/GetBatchMetadata: boundary checks at and above count
- SetDelayedSequencer: double-set error after panic-to-error conversion
- validateAndInitializeDBForMEL: fresh-node path
- deleteTrailingEntries: orphan removal, sparse entries, zero count,
  unrelated prefix safety

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joshuacolvin0 joshuacolvin0 marked this pull request as draft April 6, 2026 02:39
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 6.61939% with 790 lines in your changes missing coverage. Please review.
✅ Project coverage is 26.03%. Comparing base (20ee29c) to head (e15593a).

❗ There is a different number of reports uploaded between BASE (20ee29c) and HEAD (e15593a). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (20ee29c) HEAD (e15593a)
4 2
Additional details and impacted files
@@                    Coverage Diff                     @@
##           node-transition-to-mel    #4603      +/-   ##
==========================================================
- Coverage                   34.00%   26.03%   -7.97%     
==========================================================
  Files                         495      495              
  Lines                       59162    59478     +316     
==========================================================
- Hits                        20120    15488    -4632     
- Misses                      35509    41003    +5494     
+ Partials                     3533     2987     -546     

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

❌ 218 Tests Failed:

Tests completed Failed Passed Skipped
4941 218 4723 0
View the top 3 failed tests by shortest run time
TestProgramKeccak
Stack Traces | 0.000s run time
=== RUN   TestProgramKeccak
INFO [04-06|02:51:11.358] New local node record                    seq=1,775,443,871,358 id=656e0ada0120f168   ip=127.0.0.1 udp=0 tcp=0
--- FAIL: TestProgramKeccak (0.00s)
TestPruningDBSizeReduction
Stack Traces | 0.000s run time
=== RUN   TestPruningDBSizeReduction
--- FAIL: TestPruningDBSizeReduction (0.00s)
TestProgramCallCost/delegate_call_contract/burnGas
Stack Traces | 0.000s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
        	/home/runner/work/nitro/nitro/util/testhelpers/testhelpers.go:28 +0x55
        github.com/offchainlabs/nitro/system_tests.Require(0xc0d8cf0000, {0x45655c0, 0x6219810}, {0x0, 0x0, 0x0})
        	/home/runner/work/nitro/nitro/system_tests/common_test.go:2292 +0x5d
        github.com/offchainlabs/nitro/system_tests.SendWaitTestTransactions(0xc0d8cf0000, {0x4586e68, 0xc00d9f8e60}, 0xc0f5bdc580, {0xc0e0dd7ca8, 0x2, 0xc000446540?})
        	/home/runner/work/nitro/nitro/system_tests/common_test.go:1324 +0x93
        github.com/offchainlabs/nitro/system_tests.(*TestClient).SendWaitTestTransactions(0xc149870f80, 0xc0d8cf0000, {0xc0e0dd7ca8, 0x2, 0x2})
        	/home/runner/work/nitro/nitro/system_tests/common_test.go:200 +0x5d
        github.com/offchainlabs/nitro/system_tests.measureGasUsage(0xc0d8cf0000, 0xc033350900, {0xa0, 0x20, 0x62, 0xaa, 0x4c, 0xb0, 0x87, 0x23, ...}, ...)
        	/home/runner/work/nitro/nitro/system_tests/program_gas_test.go:314 +0x189
        github.com/offchainlabs/nitro/system_tests.compareGasUsage(0xc0d8cf0000, 0xc033350900, {0xa0, 0x20, 0x62, 0xaa, 0x4c, 0xb0, 0x87, 0x23, ...}, ...)
        	/home/runner/work/nitro/nitro/system_tests/program_gas_test.go:358 +0x11d
        github.com/offchainlabs/nitro/system_tests.TestProgramCallCost.func1(0xc0d8cf0000)
        	/home/runner/work/nitro/nitro/system_tests/program_gas_test.go:204 +0x23d
        testing.tRunner(0xc0d8cf0000, 0xc0e574bb30)
        	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1934 +0xea
        created by testing.(*T).Run in goroutine 524275
        	/opt/hostedtoolcache/go/1.25.8/x64/src/testing/testing.go:1997 +0x465
        
    program_gas_test.go:314: �[31;1m [] io: read/write on closed pipe �[0;0m
--- FAIL: TestProgramCallCost/delegate_call_contract/burnGas (0.00s)

📣 Thoughts on this report? Let Codecov know! | Powered by Codecov

@joshuacolvin0 joshuacolvin0 self-assigned this Apr 7, 2026
Base automatically changed from node-transition-to-mel to master April 14, 2026 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant