feat(api): add porcelain api with connect #1779

Open
paul-nechifor wants to merge 1 commit into dev from paul/feat/porcelain-api-with-connect

Conversation

@paul-nechifor
Contributor

@paul-nechifor paul-nechifor commented Apr 12, 2026

Problem

Closes DIM-730

Solution

See the docs for how it works.

Breaking Changes

How to Test

Included in the docs.

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR introduces a Dimos porcelain API class and a Dimos.connect() classmethod that lets callers attach to an already-running DimOS instance via RPyC, plus a SkillsProxy for ergonomic skill discovery and dispatch. It also adds a coordinator-side RPyC discovery service (RpycServer/_CoordinatorService), wires rpyc_port into the run registry, and starts the service automatically in the CLI run command.

  • P1: Dimos.connect(host="robot.local") (with no port) silently connects to localhost via the registry instead of raising an error; the host argument is swallowed by the if host is not None and port is not None guard falling through to the else branch.
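The skill-dispatch ergonomics described in the summary (app.skills.ping()) can be illustrated with a minimal, self-contained sketch; the SkillsProxy name comes from the PR, but the implementation below is illustrative only, not the PR's actual code:

```python
class SkillsProxy:
    """Illustrative stand-in: forwards attribute access to named skills.

    Mirrors the ergonomics the summary describes (app.skills.ping()),
    not the PR's real implementation.
    """

    def __init__(self, skills: dict):
        self._skills = skills  # skill name -> callable

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so _skills
        # access in __init__ is unaffected.
        try:
            return self._skills[name]
        except KeyError:
            raise AttributeError(name) from None


# Usage: dispatch a skill by attribute name.
proxy = SkillsProxy({"ping": lambda: "pong"})
print(proxy.ping())  # prints: pong
```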

Confidence Score: 4/5

Safe to merge after fixing the silent-host-ignore bug in Dimos.connect(); remaining findings are P2 style.

One P1 defect: Dimos.connect(host="x") silently ignores the host and connects to localhost instead, which could cause confusing misdirected connections in production. The rest are P2 style/quality concerns (private API access in LocalModuleSource, lock held during network I/O).

dimos/porcelain/dimos.py (P1 logic bug in connect()), dimos/porcelain/local_module_source.py (P2 style)

Important Files Changed

Filename Overview
dimos/porcelain/dimos.py Main porcelain API class; contains a P1 logic bug where host is silently ignored when port is omitted in Dimos.connect().
dimos/porcelain/local_module_source.py Local module source wrapper; bypasses the newly-added public coordinator methods and holds a lock during network I/O (P2 style issues).
dimos/porcelain/remote_module_source.py Remote module source backed by coordinator RPyC endpoint; correctly uses public API and handles connection caching.
dimos/porcelain/skills_proxy.py Skills discovery and dispatch proxy; cache is per-instance (a new SkillsProxy is created on each app.skills access), which is a reasonable design tradeoff.
dimos/core/coordination/rpyc_server.py New coordinator-side RPyC discovery service; correctly binds a port-0 socket in ThreadedServer.init before returning the port.
dimos/test_porcelain.py Comprehensive test suite covering local and remote Dimos scenarios; well structured with slow-marked integration tests.
dimos/porcelain/module_source.py Protocol definition for module sources; clean and minimal.
dimos/core/coordination/module_coordinator.py Added public list_module_names(), get_module_endpoint(), and start_rpyc_service() methods; correctly locks before accessing deployed modules.
dimos/core/run_registry.py Added rpyc_port field to RunEntry and get_most_recent_rpyc_port() helper; looks correct.

Sequence Diagram

sequenceDiagram
    participant User
    participant Dimos
    participant LocalModuleSource
    participant RemoteModuleSource
    participant ModuleCoordinator
    participant RpycServer
    participant Worker as PythonWorker (subprocess)

    Note over User,Worker: Local path (app.run)
    User->>Dimos: run(target)
    Dimos->>ModuleCoordinator: build(blueprint)
    ModuleCoordinator->>Worker: deploy / start modules
    ModuleCoordinator->>RpycServer: start() - coordinator port saved
    Dimos->>LocalModuleSource: created with coordinator ref

    User->>Dimos: app.skills.ping()
    Dimos->>LocalModuleSource: get_rpyc_module("StressTestModule")
    LocalModuleSource->>Worker: actor.start_rpyc() - worker port
    LocalModuleSource->>Worker: rpyc.connect(localhost, port)
    Worker-->>LocalModuleSource: module proxy
    LocalModuleSource-->>Dimos: module proxy
    Dimos-->>User: skill result

    Note over User,Worker: Remote path (Dimos.connect)
    User->>Dimos: connect(host, port)
    Dimos->>RemoteModuleSource: connect to coordinator RPyC port
    RemoteModuleSource->>RpycServer: list_modules() / get_module_endpoint(name)
    RpycServer->>ModuleCoordinator: delegate
    RemoteModuleSource->>Worker: rpyc.connect(host, worker_port)
    Worker-->>RemoteModuleSource: module proxy
    RemoteModuleSource-->>User: skill result

Reviews (1): Last reviewed commit: "feat(api): add porcelain api with connec..."

Comment thread dimos/porcelain/dimos.py
Comment on lines +119 to +123
if host is not None and port is not None:
    source: ModuleSource = RemoteModuleSource(host, port)
else:
    rpyc_port = get_most_recent_rpyc_port(run_id=run_id)
    source = RemoteModuleSource("localhost", rpyc_port)
Contributor


P1 host silently ignored when port is omitted

When only host is provided, the if host is not None and port is not None guard is False, so the code falls through to the registry lookup and connects to "localhost" — the caller-supplied host is silently dropped. Dimos.connect(host="robot.local") will connect to localhost, not robot.local, with no error raised.

Suggested change

Before:

if host is not None and port is not None:
    source: ModuleSource = RemoteModuleSource(host, port)
else:
    rpyc_port = get_most_recent_rpyc_port(run_id=run_id)
    source = RemoteModuleSource("localhost", rpyc_port)

After:

if (host is None) != (port is None):
    raise ValueError(
        "host and port must both be provided together, or both omitted"
    )
if host is not None and port is not None:
    source: ModuleSource = RemoteModuleSource(host, port)
else:
    rpyc_port = get_most_recent_rpyc_port(run_id=run_id)
    source = RemoteModuleSource("localhost", rpyc_port)
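In isolation, the XOR-style guard in the suggestion behaves like this. A minimal sketch: resolve_endpoint and registry_port are stand-ins for illustration (the real code returns a RemoteModuleSource built from get_most_recent_rpyc_port(), not a tuple):

```python
def resolve_endpoint(host=None, port=None, registry_port=18861):
    """Sketch of the suggested guard: host and port must come together.

    registry_port stands in for the registry lookup; the default value
    is arbitrary, chosen only for this example.
    """
    # Exactly one of host/port given -> ambiguous, refuse loudly.
    if (host is None) != (port is None):
        raise ValueError(
            "host and port must both be provided together, or both omitted"
        )
    if host is not None and port is not None:
        return (host, port)
    # Both omitted: fall back to the local registry's recorded port.
    return ("localhost", registry_port)
```

With this guard, resolve_endpoint(host="robot.local") raises instead of silently connecting to localhost.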

Comment thread dimos/porcelain/local_module_source.py Outdated
Comment on lines +46 to +65
def get_rpyc_module(self, name: str) -> Any:
    with self._lock:
        cached = self._cache.get(name)
        if cached is not None and not cached[0].closed:
            return cached[1]

        with self._coordinator._modules_lock:
            for cls, proxy in self._coordinator._deployed_modules.items():
                if cls.__name__ == name:
                    actor = proxy.actor_instance  # type: ignore[attr-defined]
                    port = actor.start_rpyc()
                    module_id = actor._module_id
                    break
            else:
                raise KeyError(name)

        conn = rpyc.connect("localhost", port, config={"sync_request_timeout": 30})
        module = conn.root.get_module(module_id)
        self._cache[name] = (conn, module)
        return module
Contributor


P2 Lock held across network I/O; private API access

Two concerns here: (1) self._lock is held while rpyc.connect() runs, serializing all concurrent get_rpyc_module calls even for different module names. (2) ModuleCoordinator.get_module_endpoint() was added in this PR to expose exactly this lookup; calling it would remove the dependency on _modules_lock, _deployed_modules, and proxy.actor_instance.

Consider restructuring so the coordinator call and the RPyC connect happen before re-acquiring self._lock to write to the cache.
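The restructuring suggested above can be sketched generically: the slow lookup and connect run outside the lock, which is re-acquired only to read and write the cache. Here `lookup` stands in for the coordinator endpoint call and `connect` for rpyc.connect(); both are assumptions for illustration, not the PR's API:

```python
import threading
from typing import Any, Callable


def make_cached_getter(lookup: Callable[[str], Any],
                       connect: Callable[[Any], Any]) -> Callable[[str], Any]:
    """Sketch: lock held only around cache access, never around I/O."""
    lock = threading.Lock()
    cache: dict[str, Any] = {}

    def get(name: str) -> Any:
        with lock:
            if name in cache:
                return cache[name]
        endpoint = lookup(name)     # potentially slow, no lock held
        module = connect(endpoint)  # potentially slow, no lock held
        with lock:
            # Another thread may have raced us; keep the first entry.
            return cache.setdefault(name, module)

    return get
```

The trade-off is that two threads may connect concurrently for the same name and one connection is discarded, which is usually preferable to serializing all callers behind one network round-trip.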

@paul-nechifor paul-nechifor mentioned this pull request Apr 12, 2026
1 task
@paul-nechifor paul-nechifor force-pushed the paul/feat/porcelain-api-with-connect branch from d3da17c to 889de47 Compare April 12, 2026 05:51
@paul-nechifor paul-nechifor force-pushed the paul/feat/porcelain-api-with-connect branch from 889de47 to cb78739 Compare April 12, 2026 06:15
if state.rpyc_server is not None:
    return WorkerResponse(result=state.rpyc_server.port)
WorkerRpycService._instances = state.instances
state.rpyc_server = ThreadedServer(
Contributor


you listen on 0.0.0.0, which I don't actually mind (we assume the physical layer is our boundary), but we keep getting these security PRs.

What also worries me is having a host setting per feature, since exposing dimos to the network becomes super complicated with 50 host settings.

I'd propose we define a global setting for the "dimos listen host", which is 127.0.0.1 by default, and then your listen address is that global setting unless explicitly overridden by a config setting for your feature.
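One way to express that proposal, as a minimal sketch; the DIMOS_LISTEN_HOST variable name and the function are assumptions for illustration, not an existing dimos setting:

```python
import os
from typing import Optional

# Global default: loopback only, per the proposal.
DEFAULT_LISTEN_HOST = "127.0.0.1"


def resolve_listen_host(feature_host: Optional[str] = None) -> str:
    """Sketch of the proposed resolution order: an explicit per-feature
    setting wins; otherwise fall back to the single global "dimos listen
    host" (here read from a hypothetical DIMOS_LISTEN_HOST env var),
    defaulting to 127.0.0.1.
    """
    if feature_host is not None:
        return feature_host
    return os.environ.get("DIMOS_LISTEN_HOST", DEFAULT_LISTEN_HOST)
```

A feature would then bind ThreadedServer to resolve_listen_host(cfg.host), so exposing dimos to the network is one global switch rather than 50 per-feature ones.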

self._module_transports: dict[type[ModuleBase], dict[str, PubSubTransport[Any]]] = {}
self._started = False
self._modules_lock = threading.RLock()
self._rpyc = RpycServer(self)
Contributor

@leshy leshy Apr 17, 2026


this creates competing communication channels through which to call skills etc. We already have @rpc and that rpc client shim; I think that is an ok boundary for what you can access on remote modules, and it will keep working as we transition to zenoh, switch to remote, etc.

Now we have a totally separate new protocol API for calling skills/modules. Maybe RpycServer should only live on the coordinator, which holds those RPC module shims?

Contributor

@leshy leshy left a comment


Big comment on the actual protocol used to talk to workers, but I'm approving as we iterate; the protocol is replaceable later too.

Another point:

run() and restart() are only available in local mode. On a connected instance they raise NotImplementedError.

IMO we especially want those in remote mode:

  • you start dimos, then claude iterates on the code on individual module, it wants to restart that module
  • openclaw controls dimos, wants to deploy a new module

I'm not totally sure why those features aren't "basically for free" and instead require extra work.
