
feat(net): reduce sync memory via lazy parsing and throttling#6717

Open
xxo1shine wants to merge 1 commit into tronprotocol:develop from xxo1shine:feat/sync-block-memory-optimization-1

Conversation


@xxo1shine xxo1shine commented Apr 28, 2026

What does this PR do?

Cap memory usage during block sync by deferring block deserialization and bounding in-flight block requests.

  1. New UnparsedBlock carrier — lightweight (BlockId, byte[]) pair (framework/.../net/service/sync/UnparsedBlock.java). Holds the parsed block id (cheap) plus the raw protobuf bytes; the block body is not deserialized until the worker thread is ready to process it. Implements equals/hashCode on blockId so it works as a map key.
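A minimal sketch of what such a carrier could look like. The field types and method names below are simplified stand-ins, not the actual tron-protocol source (the real class uses the BlockId type and raw protobuf bytes):

```java
import java.util.Objects;

// Sketch of the UnparsedBlock carrier: keeps the cheap-to-parse block id
// plus the raw protobuf bytes, deferring full deserialization.
final class UnparsedBlock {
  private final String blockId;   // stands in for the real BlockId type
  private final byte[] rawBytes;  // raw protobuf payload, parsed later

  UnparsedBlock(String blockId, byte[] rawBytes) {
    this.blockId = Objects.requireNonNull(blockId);
    this.rawBytes = rawBytes;
  }

  public String getBlockId() { return blockId; }
  public byte[] getRawBytes() { return rawBytes; }

  // equals/hashCode on blockId only, so the carrier works as a map key
  // even though the byte payloads themselves are never compared.
  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof UnparsedBlock)) return false;
    return blockId.equals(((UnparsedBlock) o).blockId);
  }

  @Override public int hashCode() { return blockId.hashCode(); }
}
```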

  2. Defer deserialization in sync queues — in SyncService:

    • blockWaitToProcess and blockJustReceived change from Map<BlockMessage, PeerConnection> to Map<UnparsedBlock, PeerConnection>
    • processBlock(peer, blockMessage) builds the UnparsedBlock from blockMessage.getData() instead of inserting the fully-parsed message
    • handleSyncBlock parses the bytes back into a BlockMessage only when the block is about to be applied
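The deferred-parse flow can be sketched roughly as follows; the types and method names are simplified stand-ins for the real SyncService/PeerConnection API, and the "parse" step stands in for the protobuf deserialization into a BlockMessage:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the deferred-parse queue: blocks enter as raw bytes and are
// parsed only when they are about to be applied.
final class DeferredParseQueue {
  // keyed by block id; in the real code the value is a PeerConnection
  private final Map<String, byte[]> blockJustReceived = new ConcurrentHashMap<>();

  // On receipt: store the raw payload, no deserialization yet.
  public void onBlockReceived(String blockId, byte[] rawBytes) {
    blockJustReceived.put(blockId, rawBytes);
  }

  // At apply time: parse the bytes back into a full message only now.
  // Returns -1 if the block is unknown or already consumed.
  public int applyBlock(String blockId) {
    byte[] raw = blockJustReceived.remove(blockId);
    if (raw == null) return -1;
    return parse(raw); // the one and only deserialization point
  }

  // Stand-in for BlockMessage parsing; here it just reports payload size.
  private int parse(byte[] raw) { return raw.length; }
}
```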
  3. Throttle in-flight block requests via node.maxPendingBlockNum — new config controlling the total budget of requested + justReceived + waitToProcess blocks across all peers:

    • Default 500, clamped to [50, 2000] in NodeConfig.postProcess()
    • Wired through NodeConfig.maxPendingBlockNum → Args.applyNodeConfig → CommonParameter.maxPendingBlockNum
    • Default mirrored in reference.conf; example documented in config.conf
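The clamp described above can be sketched as follows (the constant names are assumptions; only the default and the [50, 2000] range come from the PR):

```java
// Sketch of the range clamp applied during config post-processing.
final class PendingBlockConfig {
  static final int DEFAULT_MAX_PENDING_BLOCK_NUM = 500;
  static final int MIN_PENDING = 50;
  static final int MAX_PENDING = 2000;

  // Clamp an operator-supplied value into the supported range.
  public static int clamp(int configured) {
    return Math.max(MIN_PENDING, Math.min(MAX_PENDING, configured));
  }
}
```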
  4. startFetchSyncBlock becomes budget-aware — computes remainNum = maxPendingBlockNum - requested - justReceived - waitToProcess; once the budget is
    exhausted it stops requesting new heights, with one exception: blocks at or below maxRequestedBlockNum (the previously-requested ceiling) are still allowed
    through, so a slow peer can be retried without stalling the whole sync.
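The budget arithmetic plus the retry exemption can be sketched as below; the names are illustrative, not the actual SyncService fields:

```java
// Sketch of the budget-aware fetch decision in startFetchSyncBlock.
final class FetchBudget {
  // remainNum = maxPendingBlockNum - requested - justReceived - waitToProcess
  public static int remaining(int maxPendingBlockNum, int requested,
                              int justReceived, int waitToProcess) {
    return maxPendingBlockNum - requested - justReceived - waitToProcess;
  }

  // A height may be fetched if budget remains, or if it is a retry at or
  // below the previously requested ceiling (the deadlock exemption).
  public static boolean mayFetch(long height, long maxRequestedBlockNum,
                                 int remainNum) {
    return remainNum > 0 || height <= maxRequestedBlockNum;
  }
}
```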

  5. Tests — SyncServiceTest updated to construct UnparsedBlock carriers in place of raw BlockMessage puts, plus boundary-condition coverage for the throttle path.

Why are these changes required?

  1. Sync-time heap was unbounded. Every received block was parsed eagerly into a BlockMessage and held in two maps until processing finished. Each BlockMessage keeps the parsed Block proto with every transaction expanded into Java objects, which inflates the on-heap footprint several times over the wire size. During catch-up sync from many peers, this routinely caused multi-GB heap growth and GC pressure, with documented OOM incidents on smaller-RAM nodes.

  2. UnparsedBlock collapses the worst case. Raw bytes are 1–2 MB per block, while the parsed in-memory representation is significantly larger. Keeping the bytes raw until processing defers (and amortizes) the parse cost to the worker that actually needs it, and lets the in-flight buffer grow in proportion to raw bytes rather than to the parsed object graph.

  3. Concurrent peer fetching had no global cap. A peer aggressively shipping inventory could push the blockJustReceived map far past safe levels, because the receiver only tracked per-peer MAX_BLOCK_FETCH_PER_PEER. Bounding requested + justReceived + waitToProcess makes the worst-case in-flight memory predictable: maxPendingBlockNum × avg block size.
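As a rough sanity check of that bound (illustrative numbers only, not measurements from this PR):

```java
// Worst-case in-flight raw-byte memory under the global budget:
// maxPendingBlockNum × average block size. With the default budget of 500
// and 2 MB blocks, that is about 1 GB of raw bytes.
final class WorstCaseMemory {
  public static long worstCaseBytes(int maxPendingBlockNum, long avgBlockBytes) {
    return (long) maxPendingBlockNum * avgBlockBytes;
  }
}
```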

  4. The maxRequestedBlockNum exemption avoids deadlock. A pure hard cap could stall sync if every peer holding a needed block goes idle while the budget is full: the only way to make progress is to retry the height, which means re-requesting a block already counted against the budget. Allowing retries within the existing ceiling keeps sync making forward progress under transient peer flakiness.

  5. Operators need the knob. Default 500 is conservative for nodes with ≤ 8 GB heap; large/dedicated nodes benefit from raising it for higher sync throughput, and constrained environments can lower it. Surfacing it as node.maxPendingBlockNum (rather than a constant) lets operators tune without a release.
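A hypothetical config.conf fragment showing the knob; the key name comes from the PR, while the comment wording is assumed:

```
node {
  # Total budget for requested + justReceived + waitToProcess sync blocks.
  # Default 500; values outside [50, 2000] are clamped at startup.
  maxPendingBlockNum = 500
}
```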

This PR has been tested by:

  • Unit Tests
  • Manual Testing

Follow up

Extra details

Fixes #6685

@github-actions github-actions Bot requested a review from 317787106 April 28, 2026 09:35
@xxo1shine xxo1shine changed the title feat(net): reduce sync memory via lazy block parsing and request throttling feat(net): reduce sync memory via lazy parsing and throttling Apr 28, 2026
@halibobo1205 halibobo1205 added this to the GreatVoyage-v4.8.2 milestone Apr 28, 2026
@halibobo1205 halibobo1205 added the topic:net p2p net work, synchronization label Apr 28, 2026