fix: rework navigation map into structural/live/planning layers with confidence-based structural updates#1783
fix: rework navigation map into structural/live/planning layers with confidence-based structural updates#1783christiefhyang wants to merge 10 commits intodevfrom
Conversation
Greptile SummaryThis PR replaces the single static binary map in the navigation pipeline with a three-layer model ( Confidence Score: 5/5Safe to merge; all findings are P2 style/defensive-coding suggestions that do not affect correctness under the documented default configuration. All five findings are P2. The equality-threshold edge case in dimos/mapping/occupancy/operations.py (validation logic) and dimos/core/global_config.py (missing config-time invariant check) Important Files Changed
Sequence DiagramsequenceDiagram
participant Sensor as Sensor / ROS
participant GP as GlobalPlanner
participant NM as NavigationMap
participant Ops as operations.py
participant LP as LocalPlanner
Sensor->>GP: handle_global_costmap(OccupancyGrid)
GP->>NM: update(occupancy_grid)
alt first update
NM->>NM: _initialize_maps()<br/>structural_map = copy<br/>counts = threshold-seeded zeros
else subsequent update
NM->>Ops: update_confirmation_counts(counts, grid)
Ops-->>NM: updated counts (int16, clipped ±100)
NM->>Ops: update_structural_map(structural_map, counts, write_thr, clear_thr, ts)
Ops-->>NM: new structural_map
end
NM->>NM: live_map = occupancy_grid.copy()<br/>live_map_received_at = time.time()
NM->>Ops: fuse_planning_map(structural_map, live_map, ts)
Ops->>Ops: overlay_occupied(structural_map, live_map)
Ops-->>NM: planning_map
GP->>NM: planning_map (property)
NM->>NM: _get_live_map(now)<br/>age check vs live_map_retention_sec
NM->>Ops: fuse_planning_map(structural_map, live_map_or_None, now)
Ops-->>NM: planning_map
NM-->>GP: planning_map
GP->>GP: _find_safe_goal(goal)<br/>reads planning_map
LP->>NM: planning_map (property, each control tick)
NM-->>LP: planning_map
LP->>LP: path_clearance.update_costmap(planning_map)<br/>obstacle check → replan signal
Reviews (1): Last reviewed commit: "Update global_config.py" | Re-trigger Greptile |
| if clear_threshold > write_threshold: | ||
| raise ValueError( | ||
| f"clear_threshold ({clear_threshold}) must be <= write_threshold ({write_threshold})" | ||
| ) |
There was a problem hiding this comment.
Validation allows zero-width dead zone
When clear_threshold == write_threshold, the two assignment lines run back-to-back on overlapping indices:
result_grid[counts >= write_threshold] = CostValues.OCCUPIED # e.g. count == 0 → OCCUPIED
result_grid[counts <= clear_threshold] = CostValues.FREE # count == 0 overwritten → FREEAny cell whose count sits exactly at the shared threshold will be set to OCCUPIED and then immediately overwritten to FREE, collapsing the hysteresis dead zone to nothing. The guard should use strict inequality to prevent this:
| if clear_threshold > write_threshold: | |
| raise ValueError( | |
| f"clear_threshold ({clear_threshold}) must be <= write_threshold ({write_threshold})" | |
| ) | |
| if clear_threshold >= write_threshold: | |
| raise ValueError( | |
| f"clear_threshold ({clear_threshold}) must be < write_threshold ({write_threshold})" | |
| ) |
| clear_threshold: int, | ||
| ts: float, | ||
| ) -> OccupancyGrid: | ||
| """Apply confidence thresholds to mutate a structural map.""" |
There was a problem hiding this comment.
Docstring says "mutate" but function is pure
The docstring reads "Apply confidence thresholds to mutate a structural map.", but the function never modifies structural_map in place — it always allocates and returns a new OccupancyGrid. This is the correct design (pure function), but the wording will mislead readers who expect in-place mutation.
| """Apply confidence thresholds to mutate a structural map.""" | |
| """Apply confidence thresholds to produce an updated structural map.""" |
| def _get_live_map(self, now: float) -> OccupancyGrid | None: | ||
| if self._live_map is None: | ||
| return None | ||
|
|
||
| age = now - self._live_map_received_at | ||
| if age > self._global_config.live_map_retention_sec: | ||
| return None | ||
|
|
||
| return self._live_map |
There was a problem hiding this comment.
live_map and structural_map properties expose internal references
_get_live_map returns self._live_map directly, so the live_map property hands callers a reference to the internal object without copying it. Any caller that mutates the returned OccupancyGrid.grid (e.g., in-place NumPy writes) would silently corrupt the stored live map state.
structural_map (line 56–59) has the same pattern. For consistency with confirmation_counts (which correctly returns .copy()), consider returning self._live_map.copy() and self._structural_map.copy() from those properties.
| def planning_map(self) -> OccupancyGrid: | ||
| with self._lock: | ||
| now = time.time() | ||
| self._refresh_planning_map(now, now) | ||
| if self._planning_map is None: | ||
| raise ValueError("No current planning map available") | ||
| return self._planning_map |
There was a problem hiding this comment.
planning_map property uses wall-clock time as the output timestamp
In update() the planning map is fused with output_ts = occupancy_grid.ts (the sensor message stamp). Here the property calls _refresh_planning_map(now, now), so the fused map gets ts = time.time() — a different clock source. Consumers that rely on the planning-map timestamp for message-age or synchronisation checks will see an inconsistent timestamp depending on whether the map was last refreshed by update() or by a property access.
| structural_write_threshold: int = 3 | ||
| structural_clear_threshold: int = -3 | ||
| live_map_retention_sec: float = 1.5 |
There was a problem hiding this comment.
No config-time validation that
structural_clear_threshold < structural_write_threshold
update_structural_map raises a ValueError at runtime if this invariant is violated, but GlobalConfig does not validate it at construction. A misconfiguration (e.g. STRUCTURAL_CLEAR_THRESHOLD=5 with STRUCTURAL_WRITE_THRESHOLD=3) would surface only deep in the navigation update loop. Consider adding a model_validator:
from pydantic import model_validator
@model_validator(mode="after")
def _validate_structural_thresholds(self) -> "GlobalConfig":
if self.structural_clear_threshold >= self.structural_write_threshold:
raise ValueError(
"structural_clear_threshold must be < structural_write_threshold"
)
return self| rerun_enabled: bool = True | ||
| rerun_server_addr: str | None = None | ||
| viewer_backend: ViewerBackend = "rerun-web" | ||
| n_dask_workers: int = 2 |
|
|
||
| if self._thread is not None and self._thread is not current_thread(): | ||
| self._thread.join(DEFAULT_THREAD_JOIN_TIMEOUT) | ||
| self._thread.join(2) |
| self._replan_limiter.reset() | ||
| self._plan_path() | ||
|
|
||
| def set_safe_goal_clearance(self, clearance: float) -> None: |
There was a problem hiding this comment.
This kinda seems important. Is it still active but under a different name?
|
I'm always happy to have help on nav. I would've hoped to have a bit more of a design discussion before work was started. Its true dynamic updates are a problem, but theyre mostly a problem for the 3d case. The 2d case is fine. Planning could be more advanced, but I'm concerned that a multi-layer approach is going to make things complicated. Ivan and myself are trying to make it where there are fewer maps (one source of truth) rather than more maps. Could you explain how the system can tell a difference between something structural vs nonstructural? |
Reverting this PR back to draft. The core idea is to provide structural map updates and additions are mostly made in operations.py. I will take a look again on the revert changes you and Paul mentioned. Will circle back. |
Summary
This PR upgrades the navigation map pipeline from a single static binary map to a layered map model for better handling world changes.
The new model includes structural_map, live_map, and planning_map.
Global planner and local planner both consume planning_map so behavior stays consistent between global path generation and local obstacle handling.
Map fusion and clearing logic are moved into occupancy operations as pure functions, keeping planner modules focused on planning logic only.
Background and Motivation
Design Goals
Implementation Details
navigation_map.py
Replaced single binary map storage with layered state
structural_map stores long lived structure
live_map stores most recent observation snapshot
planning_map is composed from structural plus live layers
Added per cell signed confirmation counts
positive counts increase occupied confidence
negative counts increase free confidence
unknown observations do not change confidence
Update flow
on first update initialize structural map and counters
on next updates optionally update structural map using confidence thresholds
always refresh live map
always refresh planning map through fusion
Retention logic
live layer retention is evaluated using receive time based age to avoid timestamp base mismatch between sources
Compatibility
binary_costmap is kept as an alias to planning_map to avoid broad call site changes
operations.py
Added pure functions for map update and fusion
update_confirmation_counts updates signed confidence counters
update_structural_map applies write and clear thresholds to structural layer
fuse_planning_map combines structural and live layers for planner consumption
These functions isolate data transformation from planner logic and make behavior easier to reason about and test.
global_planner.py
Safe goal check now reads planning_map instead of old binary map source
No map fusion logic introduced here.
local_planner.py
Path forward obstacle check now reads planning_map
This ensures local stop and replan triggers use the same map semantics as global planner decisions.
global_config.py
Added new configuration parameters
structural_write_threshold
structural_clear_threshold
live_map_retention_sec
enable_structural_update
Behavioral Changes
Planner map source is now unified as planning_map
Recent obstacles can affect planning immediately through live layer
Persistent observations can gradually update structural layer through confidence thresholds
If live layer expires by retention time, planning map falls back to structural layer only
Structural updates can be disabled by config for staged rollout
Why This Is Safe
Validation
Risks and Mitigations
Structural update thresholds may be too aggressive or too conservative for different environments
Mitigation; Use structural_write_threshold and structural_clear_threshold for tuning, and enable_structural_update for controlled rollout
Live layer may expire too quickly or too slowly depending on sensor cadence
Mitigation; Tune live_map_retention_sec per deployment profile
Time base mismatch between map message timestamp and process clock
Mitigation; Live retention now uses receive time based aging in navigation_map
Rollout Plan
Phase 1
Enable layered planning map with structural updates enabled and conservative thresholds
Phase 2
Tune thresholds based on observed false occupied and false free transitions
Phase 3
Lock deployment defaults once behavior stabilizes in representative scenes
Rollback Plan
Set enable_structural_update to false to freeze structural layer updates
Increase or decrease live_map_retention_sec to adjust live influence quickly
If needed, planners still work with planning_map fallback behavior through existing interfaces without additional refactor
Files Changed
dimos/navigation/replanning_a_star/navigation_map.py
dimos/navigation/replanning_a_star/global_planner.py
dimos/navigation/replanning_a_star/local_planner.py
dimos/mapping/occupancy/operations.py
dimos/core/global_config.py
Contributor License Agreement