Skip to content

fix: rework navigation map into structural/live/planning layers with confidence-based structural updates#1783

Draft
christiefhyang wants to merge 10 commits intodevfrom
christie/fix-navigation-cleanup
Draft

fix: rework navigation map into structural/live/planning layers with confidence-based structural updates#1783
christiefhyang wants to merge 10 commits intodevfrom
christie/fix-navigation-cleanup

Conversation

@christiefhyang
Copy link
Copy Markdown
Member

@christiefhyang christiefhyang commented Apr 13, 2026

Summary

This PR upgrades the navigation map pipeline from a single static binary map to a layered map model for better handling world changes.

The new model includes structural_map, live_map, and planning_map.

Global planner and local planner both consume planning_map so behavior stays consistent between global path generation and local obstacle handling.

Map fusion and clearing logic are moved into occupancy operations as pure functions, keeping planner modules focused on planning logic only.

Background and Motivation

  • Current map behavior is mostly static and does not robustly represent environment structure changes over time, such as moved desks or furniture updates.
  • The system needs a mechanism to distinguish short term perception changes from long term structural updates.
  • This PR introduces a confidence based structural update flow and a time bounded live layer, so the planner can react quickly to recent changes while preserving long term map stability.

Design Goals

  • Keep responsibilities clean
  • Navigation map manages layered map state
  • Operations module owns fusion and clearing logic
  • Planner modules only consume planning map and do not perform map fusion
  • Keep changes minimal and targeted
  • Only touch required files and avoid broad refactor
  • Support incremental rollout
  • Add config switches and thresholds to tune behavior safely

Implementation Details

  1. navigation_map.py
    Replaced single binary map storage with layered state
    structural_map stores long lived structure
    live_map stores most recent observation snapshot
    planning_map is composed from structural plus live layers
    Added per cell signed confirmation counts
    positive counts increase occupied confidence
    negative counts increase free confidence
    unknown observations do not change confidence
    Update flow
    on first update initialize structural map and counters
    on next updates optionally update structural map using confidence thresholds
    always refresh live map
    always refresh planning map through fusion
    Retention logic
    live layer retention is evaluated using receive time based age to avoid timestamp base mismatch between sources
    Compatibility
    binary_costmap is kept as an alias to planning_map to avoid broad call site changes

  2. operations.py
    Added pure functions for map update and fusion
    update_confirmation_counts updates signed confidence counters
    update_structural_map applies write and clear thresholds to structural layer
    fuse_planning_map combines structural and live layers for planner consumption
    These functions isolate data transformation from planner logic and make behavior easier to reason about and test.

  3. global_planner.py
    Safe goal check now reads planning_map instead of old binary map source
    No map fusion logic introduced here.

  4. local_planner.py
    Path forward obstacle check now reads planning_map
    This ensures local stop and replan triggers use the same map semantics as global planner decisions.

  5. global_config.py
    Added new configuration parameters
    structural_write_threshold
    structural_clear_threshold
    live_map_retention_sec
    enable_structural_update

Behavioral Changes

Planner map source is now unified as planning_map
Recent obstacles can affect planning immediately through live layer
Persistent observations can gradually update structural layer through confidence thresholds
If live layer expires by retention time, planning map falls back to structural layer only
Structural updates can be disabled by config for staged rollout

Why This Is Safe

  • Scope is limited to five files only
  • Existing planner entry points and module boundaries remain intact
  • Compatibility alias reduces migration risk for existing binary_costmap usage
  • Fusion logic is centralized in pure functions with explicit inputs and outputs
  • Config gates allow conservative deployment and easy rollback of structural updates

Validation

  • Syntax validation completed for all changed files using Python 3.11 py_compile
  • Full pytest run is currently blocked in this environment due to existing interpreter dependency mismatch unrelated to this PR
  • Code path audit confirms
  • global planner consumes planning_map
  • local planner clearance check consumes planning_map
  • map fusion and clearing logic stays outside planner modules

Risks and Mitigations

  1. Structural update thresholds may be too aggressive or too conservative for different environments
    Mitigation; Use structural_write_threshold and structural_clear_threshold for tuning, and enable_structural_update for controlled rollout

  2. Live layer may expire too quickly or too slowly depending on sensor cadence
    Mitigation; Tune live_map_retention_sec per deployment profile

  3. Time base mismatch between map message timestamp and process clock
    Mitigation; Live retention now uses receive time based aging in navigation_map

Rollout Plan

Phase 1
Enable layered planning map with structural updates enabled and conservative thresholds
Phase 2
Tune thresholds based on observed false occupied and false free transitions
Phase 3
Lock deployment defaults once behavior stabilizes in representative scenes

Rollback Plan

Set enable_structural_update to false to freeze structural layer updates
Increase or decrease live_map_retention_sec to adjust live influence quickly
If needed, planners still work with planning_map fallback behavior through existing interfaces without additional refactor

Files Changed

dimos/navigation/replanning_a_star/navigation_map.py
dimos/navigation/replanning_a_star/global_planner.py
dimos/navigation/replanning_a_star/local_planner.py
dimos/mapping/occupancy/operations.py
dimos/core/global_config.py

Contributor License Agreement

  • [ x ] I have read and approved the CLA.

@christiefhyang christiefhyang changed the title Christie/fix navigation cleanup fix: navigation cleanup Apr 13, 2026
@christiefhyang christiefhyang marked this pull request as ready for review April 13, 2026 09:36
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 13, 2026

Greptile Summary

This PR replaces the single static binary map in the navigation pipeline with a three-layer model (structural_map, live_map, planning_map), adds confidence-count-based structural update logic, and refactors map fusion and clearing into pure helper functions in operations.py. Planner modules are simplified to only consume planning_map, and a binary_costmap alias preserves backwards compatibility.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/defensive-coding suggestions that do not affect correctness under the documented default configuration.

All five findings are P2. The equality-threshold edge case in update_structural_map requires explicit misconfiguration to trigger (defaults are 3 and −3, well apart). The timestamp inconsistency in the planning_map property does not affect planning decisions. The exposed-reference concern is theoretical given current call sites. No crashes, data loss, or security issues were found.

dimos/mapping/occupancy/operations.py (validation logic) and dimos/core/global_config.py (missing config-time invariant check)

Important Files Changed

Filename Overview
dimos/navigation/replanning_a_star/navigation_map.py Core layered-map class; well-structured with RLock for thread safety. Minor: live_map and structural_map properties return internal references without copying, and planning_map property uses wall-clock time as the output timestamp rather than the message timestamp used in update().
dimos/mapping/occupancy/operations.py Three new pure functions for confidence counting, structural map updates, and planning map fusion. The update_structural_map validation allows clear_threshold == write_threshold, collapsing the hysteresis dead zone. Docstring says "mutate" but the function is pure. No tests added for the three new functions.
dimos/navigation/replanning_a_star/global_planner.py Safe-goal check updated to read planning_map instead of old binary map source; no map fusion logic introduced here. Clean change.
dimos/navigation/replanning_a_star/local_planner.py Path-clearance costmap updated to planning_map; no other behavioural changes. Clean change.
dimos/core/global_config.py Four new config params added with sensible defaults. No validation that structural_clear_threshold < structural_write_threshold is enforced at config-load time; a misconfiguration would surface only at runtime inside the update loop.

Sequence Diagram

sequenceDiagram
    participant Sensor as Sensor / ROS
    participant GP as GlobalPlanner
    participant NM as NavigationMap
    participant Ops as operations.py
    participant LP as LocalPlanner

    Sensor->>GP: handle_global_costmap(OccupancyGrid)
    GP->>NM: update(occupancy_grid)
    alt first update
        NM->>NM: _initialize_maps()<br/>structural_map = copy<br/>counts = threshold-seeded zeros
    else subsequent update
        NM->>Ops: update_confirmation_counts(counts, grid)
        Ops-->>NM: updated counts (int16, clipped ±100)
        NM->>Ops: update_structural_map(structural_map, counts, write_thr, clear_thr, ts)
        Ops-->>NM: new structural_map
    end
    NM->>NM: live_map = occupancy_grid.copy()<br/>live_map_received_at = time.time()
    NM->>Ops: fuse_planning_map(structural_map, live_map, ts)
    Ops->>Ops: overlay_occupied(structural_map, live_map)
    Ops-->>NM: planning_map

    GP->>NM: planning_map (property)
    NM->>NM: _get_live_map(now)<br/>age check vs live_map_retention_sec
    NM->>Ops: fuse_planning_map(structural_map, live_map_or_None, now)
    Ops-->>NM: planning_map
    NM-->>GP: planning_map

    GP->>GP: _find_safe_goal(goal)<br/>reads planning_map

    LP->>NM: planning_map (property, each control tick)
    NM-->>LP: planning_map
    LP->>LP: path_clearance.update_costmap(planning_map)<br/>obstacle check → replan signal
Loading

Reviews (1): Last reviewed commit: "Update global_config.py" | Re-trigger Greptile

Comment on lines +131 to +134
if clear_threshold > write_threshold:
raise ValueError(
f"clear_threshold ({clear_threshold}) must be <= write_threshold ({write_threshold})"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Validation allows zero-width dead zone

When clear_threshold == write_threshold, the two assignment lines run back-to-back on overlapping indices:

result_grid[counts >= write_threshold] = CostValues.OCCUPIED  # e.g. count == 0 → OCCUPIED
result_grid[counts <= clear_threshold] = CostValues.FREE      # count == 0 overwritten → FREE

Any cell whose count sits exactly at the shared threshold will be set to OCCUPIED and then immediately overwritten to FREE, collapsing the hysteresis dead zone to nothing. The guard should use strict inequality to prevent this:

Suggested change
if clear_threshold > write_threshold:
raise ValueError(
f"clear_threshold ({clear_threshold}) must be <= write_threshold ({write_threshold})"
)
if clear_threshold >= write_threshold:
raise ValueError(
f"clear_threshold ({clear_threshold}) must be < write_threshold ({write_threshold})"
)

clear_threshold: int,
ts: float,
) -> OccupancyGrid:
"""Apply confidence thresholds to mutate a structural map."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Docstring says "mutate" but function is pure

The docstring reads "Apply confidence thresholds to mutate a structural map.", but the function never modifies structural_map in place — it always allocates and returns a new OccupancyGrid. This is the correct design (pure function), but the wording will mislead readers who expect in-place mutation.

Suggested change
"""Apply confidence thresholds to mutate a structural map."""
"""Apply confidence thresholds to produce an updated structural map."""

Comment on lines +142 to +150
def _get_live_map(self, now: float) -> OccupancyGrid | None:
if self._live_map is None:
return None

age = now - self._live_map_received_at
if age > self._global_config.live_map_retention_sec:
return None

return self._live_map
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 live_map and structural_map properties expose internal references

_get_live_map returns self._live_map directly, so the live_map property hands callers a reference to the internal object without copying it. Any caller that mutates the returned OccupancyGrid.grid (e.g., in-place NumPy writes) would silently corrupt the stored live map state.

structural_map (line 56–59) has the same pattern. For consistency with confirmation_counts (which correctly returns .copy()), consider returning self._live_map.copy() and self._structural_map.copy() from those properties.

Comment on lines +71 to +77
def planning_map(self) -> OccupancyGrid:
with self._lock:
now = time.time()
self._refresh_planning_map(now, now)
if self._planning_map is None:
raise ValueError("No current planning map available")
return self._planning_map
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 planning_map property uses wall-clock time as the output timestamp

In update() the planning map is fused with output_ts = occupancy_grid.ts (the sensor message stamp). Here the property calls _refresh_planning_map(now, now), so the fused map gets ts = time.time() — a different clock source. Consumers that rely on the planning-map timestamp for message-age or synchronisation checks will see an inconsistent timestamp depending on whether the map was last refreshed by update() or by a property access.

Comment on lines +51 to +53
structural_write_threshold: int = 3
structural_clear_threshold: int = -3
live_map_retention_sec: float = 1.5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No config-time validation that structural_clear_threshold < structural_write_threshold

update_structural_map raises a ValueError at runtime if this invariant is violated, but GlobalConfig does not validate it at construction. A misconfiguration (e.g. STRUCTURAL_CLEAR_THRESHOLD=5 with STRUCTURAL_WRITE_THRESHOLD=3) would surface only deep in the navigation update loop. Consider adding a model_validator:

from pydantic import model_validator

@model_validator(mode="after")
def _validate_structural_thresholds(self) -> "GlobalConfig":
    if self.structural_clear_threshold >= self.structural_write_threshold:
        raise ValueError(
            "structural_clear_threshold must be < structural_write_threshold"
        )
    return self

@christiefhyang christiefhyang changed the title fix: navigation cleanup fix: Rework navigation map into structural/live/planning layers with confidence-based structural updates Apr 13, 2026
@christiefhyang christiefhyang changed the title fix: Rework navigation map into structural/live/planning layers with confidence-based structural updates fix: rework navigation map into structural/live/planning layers with confidence-based structural updates Apr 13, 2026
rerun_enabled: bool = True
rerun_server_addr: str | None = None
viewer_backend: ViewerBackend = "rerun-web"
n_dask_workers: int = 2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these changed?


if self._thread is not None and self._thread is not current_thread():
self._thread.join(DEFAULT_THREAD_JOIN_TIMEOUT)
self._thread.join(2)
Copy link
Copy Markdown
Member

@jeff-hykin jeff-hykin Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert

self._replan_limiter.reset()
self._plan_path()

def set_safe_goal_clearance(self, clearance: float) -> None:
Copy link
Copy Markdown
Member

@jeff-hykin jeff-hykin Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kinda seems important. Is it still active but under a different name?

@jeff-hykin
Copy link
Copy Markdown
Member

jeff-hykin commented Apr 13, 2026

I'm always happy to have help on nav. I would've hoped to have a bit more of a design discussion before work was started. Its true dynamic updates are a problem, but theyre mostly a problem for the 3d case. The 2d case is fine. Planning could be more advanced, but I'm concerned that a multi-layer approach is going to make things complicated. Ivan and myself are trying to make it where there are fewer maps (one source of truth) rather than more maps.

Could you explain how the system can tell a difference between something structural vs nonstructural?

@christiefhyang
Copy link
Copy Markdown
Member Author

rather

Reverting this PR back to draft. The core idea is to provide structural map updates and additions are mostly made in operations.py. I will take a look again on the revert changes you and Paul mentioned. Will circle back.

@christiefhyang christiefhyang marked this pull request as draft April 13, 2026 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants