
p2p: connection gater permanently blocks legitimate peers, preventing header sync initialization #3267

@auricom

Description

Summary

A fullnode on Eden testnet (edennet-2) is unable to initialize header sync because its local BasicConnectionGater permanently blocks outbound connections to legitimate bootstrap/seed peers. The node falls back to DA sync and is unable to sync properly.

Observed Behavior

On node startup, every dial attempt to known peers is rejected by the local gater:

1:42PM INF headers not yet available from peers, waiting to initialize header sync
  error="failed to fetch height 129686434 from peers: header/p2p: failed to open a new stream:
         failed to dial: failed to dial 12D3KooWEpRjbVZuQVZNxaifZDZ3dGLV6Dj7fT5Eox95PQhgWsJ7:
         gater disallows connection to peer" component=HeaderSyncService retry_in=1000

1:42PM WRN P2P header sync initialization timed out, deferring to DA sync component=HeaderSyncService timeout=30000
1:42PM WRN P2P header sync initialization timed out, deferring to DA sync component=DataSyncService timeout=30000

1:42PM INF execution layer is behind, syncing blocks
  blocks_to_sync=129674274 component=execution_replayer
  exec_layer_height=12160 target_height=129686434

All blocked peers are legitimate network participants (bootstrap nodes / sequencer peers). The blocking is outbound: the local gater refuses to dial these peers; the remote nodes are not rejecting the connections.

Root Cause Analysis

1. Gater state is persisted permanently

In pkg/p2p/client.go:85, the gater is created with a persistent datastore:

gater, err := conngater.NewBasicConnectionGater(ds)

conngater.BasicConnectionGater persists its blocklist (blocked peers, addresses, subnets) to the datastore using its own internal keys. This state survives node restarts. Once a peer ID is added to the blocklist — for any reason — it remains blocked indefinitely.

There is currently no mechanism to:

  • Clear the gater's persisted state without wiping the entire datastore
  • Set a TTL / expiry on blocks
  • Inspect which peers are currently blocked and why

2. No diagnostics when a peer is blocked

The only log entries are libp2p-level dial errors. ev-node never logs:

  • Which peers are in the blocklist at startup
  • Why a peer was originally added to the blocklist
  • How many entries the persisted blocklist contains

This makes diagnosing the issue extremely difficult.

3. Possible triggers for a stale blocklist

Most likely causes for peers ending up in the persistent blocklist:

  • A previous node session had the peer IDs listed in p2p.blocked_peers, the config was later changed, but the datastore entries were never cleaned up.
  • A prior version of the code or an external tool wrote block entries to the datastore that are now being loaded on startup.

Impact

  • Header sync never initializes via P2P; node is forced to DA-only sync.
  • With 129 M+ blocks to catch up, this is catastrophically slow for a fullnode operator.
  • No obvious recovery path besides wiping the datastore (losing sync state).

Expected Behavior

The node should be able to dial legitimate bootstrap/seed peers and initialize header sync over P2P; stale gater state should not permanently block them across restarts.

Suggested Fixes

  1. Use an in-memory gater (pass nil datastore to NewBasicConnectionGater) to avoid accumulating stale blocks across restarts. Persistence of user-configured blocks (p2p.blocked_peers) can still be handled at config-load time.
  2. Log the gater blocklist at startup so operators can diagnose why peers are being rejected.
  3. Add a TTL or expiry for dynamic blocks so stale entries are pruned automatically.
  4. Provide a CLI/RPC command to inspect and clear gater state without wiping the full datastore.

Environment

  • Chain: edennet-2 (Eden testnet)
  • ev-node version: 1.1.0
  • Node type: fullnode (sync-only)

    Labels

    C:p2p (p2p networking related), T:bug (Something isn't working)
