[pull] main from danny-avila:main by pull[bot] · Pull Request #68 · innFactory/agents

pull · 2026-04-26T10:43:12Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* feat: add tool output reference registry with FIFO eviction Introduces a run-scoped registry that stores successful tool outputs under stable `tool<idx>turn<turn>` keys and resolves `{{…}}` placeholders in string args. Keeps the registry disabled by default; opt-in via `RunConfig.toolOutputReferences.enabled`. Size caps mirror `calculateMaxToolResultChars` so substituted content cannot exceed what the model has already seen. * feat: wire tool output references through ToolNode and graphs Thread `RunConfig.toolOutputReferences` from `Run` into both `StandardGraph` and `MultiAgentGraph`, and from there into every compiled `ToolNode`. On each batch, ToolNode increments a per-run turn counter, resolves `{{toolturn<n>}}` placeholders in args before invoking, registers the un-annotated output under `toolturn<n>`, and annotates the LLM-visible `ToolMessage.content` via a `_ref` object key (for JSON objects) or a `[ref: …]` prefix line (everything else). Unresolved placeholders are surfaced to the model via a trailing `[unresolved refs: …]` line on the same tool message. The event-driven dispatch path gets the same treatment so subagent-as-tool calls benefit automatically. * docs: document tool output references in bash tool description Adds a dedicated section to the bash tool's prompt-facing description explaining the `tool<idx>turn<turn>` key format, where it surfaces (prefix line vs `_ref` field), how to embed `{{…}}` placeholders in the `command` string, and what happens when a reference is unknown. Any tool can reference previous outputs, but bash is the primary beneficiary since piping into `jq`/`grep`/`awk` is the main use case. * fix: gate bash reference docs on feature enablement The tool-output reference guidance lived inside the bash tool's static description constant, so the LLM was told to emit `{{tool0turn0}}` placeholders on every run regardless of whether `RunConfig.toolOutputReferences.enabled` was set. Those placeholders passed through unresolved and leaked into the shell command, producing confusing errors. Extract the guidance into a sibling `BashToolOutputReferencesGuide` constant plus a `buildBashExecutionToolDescription({ enableToolOutputReferences })` composer. Hosts enable the guidance only when they also enable the registry; the default `BashExecutionToolDescription` no longer mentions references. * refactor: tighten ToolOutputReferenceRegistry internals Addressing review findings on the registry module: - Drop the redundant `key` field from registry entries; store plain strings in `Map<string, string>` since the Map key is already the ref. - Move the stateful `/g` matcher off the module-level export. The exported `TOOL_OUTPUT_REF_PATTERN` is now non-global (no `lastIndex` footgun for future callers); the global variant lives as a private static on the class. - Short-circuit `resolve()` via a `hasAnyPlaceholder` pre-scan so placeholder-free args skip the deep tree walk and object allocations entirely. - Document the `maxTotalSize` soft-cap semantics on `ToolOutputReferencesConfig` (most-recent entry is never self-evicted). - Fix "appended" → "prepended" in the prefix helper's JSDoc. Tests updated to match the non-global regex shape. * fix: harden ToolNode reference lifecycle for edge cases Addresses the codex review plus the deeper audit: - Re-resolve `{{…}}` placeholders after a `PreToolUse` hook rewrites args via `updatedInput`. Previously a hook that introduced new placeholders would bypass substitution and leak raw text to the tool. The unresolved-refs record is refreshed on hook replacement so the LLM warning reflects the actually-sent args. - Reset the registry and per-run turn counter when `config.configurable.run_id` changes between `run()` invocations. A host that reuses the same Run across `processStream` calls no longer serves a stale `toolturn<n>` from a previous run. - Handle `ToolMessage` returns from tools: the LangChain `tool()` helper auto-wraps string returns in a `ToolMessage`, which was taking the early-return branch and skipping annotation/registration entirely. Now the `tool`-typed branch also applies the reference key (string content only) or logs a warning and passes through (array content). - Rename the public accessor to `_unsafeGetToolOutputRegistry` and mark it `@internal`: exposing a mutable registry on a published library invites external corruption. The accessor remains for test observation. Adds integration tests for the event-driven dispatch path (previously untested), including placeholder substitution in `ON_TOOL_EXECUTE` requests, unresolved-ref reporting, PostToolUse hook output overriding registered content, and PreToolUse hook re-resolution. Plus a runId-change reset test for the direct path. * fix: harden registry caps and _ref injection Addresses two codex P2 findings: - **maxTotalSize is now a hard cap.** The per-output cap is clamped to `min(maxOutputSize, maxTotalSize)` at construction so a single stored entry can never exceed the aggregate bound, and the eviction loop no longer keeps the most-recent entry around when it alone violates the cap. Fixes a case where `maxOutputSize: 1000, maxTotalSize: 100` silently retained 1000 chars. Updated the config JSDoc to drop the soft-cap language. - **`_ref: null` no longer clobbers the injected key.** The guard that treated an existing `_ref: null` as non-conflicting was correct, but the spread order `{ _ref: key, ...obj }` then let the null re-win. Reversed to `{ ...obj, _ref: key }` so the injected value always lands last. Adds tests for both: per-output clamping to aggregate bound, and `_ref: null` → overwritten with the generated key. * fix: surface unresolved refs on error paths and refine annotation UX Addresses codex round 3 (P2) plus three follow-up NITs: - **Unresolved refs on errors** (codex P2): `[unresolved refs: …]` is now appended to error ToolMessages too, across all three error paths — thrown exceptions in `runTool`, tool-returned `ToolMessage({status:'error'})`, and host-returned error results in `dispatchToolEvents`. Without this, an LLM using a bad reference key saw only the generic error and lost the self-correction signal this feature exists to provide. - **Preserve `_ref` as the first key** (NIT N-1): the round-2 `{ ...obj, _ref: key }` ordering fixed the null-clobber issue but regressed the key position. Destructure the conflicting key out first, then place our injected ref at the front, so the LLM reads the reference identifier before the payload. - **Warn-once per tool per run for non-string content** (NIT N-3): a tracked `Set` on the ToolNode prevents the per-invocation log spam when a tool consistently returns multi-part content. Cleared alongside the registry on runId change. - **JSDoc trail to the builder** (NIT N-2): `BashExecutionToolDefinition` now documents that reference-aware hosts should compose their own description via `buildBashExecutionToolDescription`. Adds tests for all three error paths carrying unresolved-ref hints and for the first-key placement invariant. * fix: clear registry state on anonymous runs to prevent leakage Addresses codex round 4 P2: When `config.configurable.run_id` is absent, `incomingRunId === lastRunId` (both `undefined`), so the previous reset branch never fired. A ToolNode reused across independent sessions that omit `run_id` could therefore serve stale `{{toolturn<n>}}` substitutions from a prior session — cross-session data leakage. Treat a missing `run_id` as "a new, distinct run" on every batch: clear the registry, warn-set, and turn counter on entry. Production callers going through `Run` always set `run_id`, so this only affects direct ToolNode usage (tests, library embedders). Tests updated to pass an explicit `run_id` through the helper so cross-batch persistence (`{{tool0turn0}}` substitution into a later call) continues to work under the new semantics. Added a dedicated test showing that anonymous callers get fresh state each batch (the placeholder resolves to an unresolved-ref warning in the second call because the first call's output is not retained). * fix: thread turn per invocation and preserve raw outputs in registry Two related changes: **Codex round 5 (P2): per-invocation turn.** `applyOutputReference` previously read `this.currentTurn` after the tool `await`, which is mutable shared state. Two concurrent `invoke()` calls on the same ToolNode could step on the field between sync-prefix assignment and post-await registration, producing cross-batch key collisions. `run()` now captures `turn` in a local and threads it through `runTool` and `dispatchToolEvents`; `applyOutputReference` takes a precomputed `refKey` instead of building one from shared state. **User-requested: preserve full tool outputs in the registry.** The registry no longer mirrors the LLM-visible truncation. It stores the raw, untruncated output so a `{{toolturn<n>}}` substitution pipes the complete payload into downstream bash/jq even when the LLM only saw a head+tail-truncated preview in `ToolMessage.content`. Separate size caps govern the registry: - `HARD_MAX_TOTAL_TOOL_OUTPUT_SIZE` is now 5 MB (was 800 KB). - `ToolOutputReferencesConfig.maxOutputSize` defaults to the 5 MB total cap, not `maxToolResultChars`. - `applyOutputReference(llmContent, registryContent, refKey, unresolved)` takes both views explicitly: the truncated, annotated form for the model and the full, pristine form for the registry. The event-driven dispatch path does the same split — the pre-hook raw and post-hook raw are tracked separately from the truncated LLM view, so the registered value is the hook-authoritative raw payload. Adds tests for: - Raw preservation: a tool returns 8 KB of output with `maxToolResultChars: 200`; the LLM sees the truncated form, but a later `{{tool0turn0}}` substitution delivers the full 8 KB verbatim. - Concurrent invocations within a run: two overlapping batches of the same run resolve gates in reverse order; each registers under its own turn index (tool0turn0 and tool0turn1) with correct content. * fix: share tool output registry across all ToolNodes in a run Addresses codex round 6 (P1): multi-agent graphs construct one ToolNode per agent, each with its own registry instance. A reference key produced in agent A (for example `tool0turn0`) therefore wasn't available when agent B tried to substitute `{{tool0turn0}}`, breaking cross-agent piping. Centralize run-scoped state onto the registry itself and share one instance across every ToolNode a `Graph` compiles: - `ToolOutputReferenceRegistry` now owns `turnCounter`, `lastRunId`, and `warnedNonStringTools`. New methods `nextTurn(runId)` and `claimWarnOnce(toolName)` handle the run-crossing reset and the warn-once memo respectively, atomically from JS's perspective so concurrent ToolNode invocations within one run see distinct turns. - `ToolNodeOptions` gains `toolOutputRegistry?: ToolOutputReferenceRegistry` — a host-supplied instance that takes precedence over the config object. Passing the config still works for direct ToolNode usage. - `Graph` lazily constructs one registry per run via `getOrCreateToolOutputRegistry()` and threads the same instance into every `CustomToolNode` it builds (both the event-driven and direct branches of `initializeTools`). - `ToolNode.run()` shrinks to `const turn = this.toolOutputRegistry?.nextTurn(runId) ?? 0;` — the registry handles reset bookkeeping so ToolNode no longer duplicates any run-scoped fields. Adds a test that registers output in one ToolNode and resolves `{{tool0turn0}}` in a *different* ToolNode sharing the same registry — cross-node substitution now works, and the second node's own output lands under `tool0turn1` (proving the turn counter is shared). * fix: preserve JSON validity and lower per-output default for ref annotation Addresses two codex round-7 P2 findings: **JSON validity with unresolved refs.** `applyOutputReference` appended the unresolved-refs trailer as a trailing text line even when the tool output was a JSON object, producing invalid JSON that consumers parsing `ToolMessage.content` couldn't handle. `annotateToolOutputWithReference` now takes the `unresolved[]` list and routes both the ref key and the unresolved refs through the same JSON injection path. New `TOOL_OUTPUT_UNRESOLVED_KEY` constant (`_unresolved_refs`) surfaces unresolved keys as an array field on the object instead of a trailing line. Non-object content keeps the old `[ref: …]\n…\n[unresolved refs: …]` shape. **Per-output cap default.** The registry's default per-output cap was `HARD_MAX_TOTAL_TOOL_OUTPUT_SIZE` (5 MB), letting a single `{{…}}` substitution inject multi-megabyte strings into bash `command` args and blowing typical shell `ARG_MAX` limits. Default now `HARD_MAX_TOOL_RESULT_CHARS` (~400 KB) — matches the standard tool-result budget and the LLM-visible truncation size. Hosts wanting fuller fidelity can still raise it up to the 5 MB total budget via config. Added tests covering: - unresolved refs carried as a JSON field for parseable objects; - trailer-line form for non-object content; - unresolved-only annotation (no ref key, for error paths); - JSON preservation when only unresolved refs are present; - no-op when nothing to annotate. * fix: emit resolved args and surface unresolved on non-string content Codex round 8 flagged two remaining gaps: **P2 — Resolved args leak to audit logs as `{{…}}` templates.** When `runTool` substituted a placeholder, the post-substitution args only lived in the local `args` variable. `handleRunToolCompletions` then emitted `ON_RUN_STEP_COMPLETED.tool_call.args` from the original `call.args`, so hosts observing the event saw the unresolved template rather than what the tool actually received. ToolNode now tracks `resolvedArgsByCallId` (cleared at the start of each batch, populated after `registry.resolve`). The completion handler prefers that map and falls back to `call.args` when no substitution happened. Event-driven dispatch already emitted the resolved args through `entry.args`, so only the direct path needed the fix. **P3 — Non-string ToolMessage content drops unresolved-ref warnings.** The non-string branch (multi-part `ToolMessage.content` blocks) only warn-logged the skipped registration; unresolved placeholder keys were silently dropped, denying the LLM the self-correction signal that the string and error paths already provide. Now prepend a `{ type: 'text', text: '[unresolved refs: …]' }` block to the array content so the model sees the warning before the data blocks. Adds tests for: - `ON_RUN_STEP_COMPLETED.tool_call.args` carrying the substituted command on a batch that resolved `{{tool0turn0}}`. - A multi-part-content tool that gets the `[unresolved refs: …]` text block prepended to its content array, with original blocks preserved after. * fix: preserve existing _ref/_unresolved_refs payload fields on annotation Codex round 9 (P2): `tryInjectRefIntoJsonObject` unconditionally stripped both `_ref` and `_unresolved_refs` before rebuilding the object, which silently deleted legitimate tool payload data on two paths: - plain-annotation (`key` set, no unresolved): an existing `_unresolved_refs` payload field was dropped because we always stripped it even though we weren't injecting one. - unresolved-only / error (`key` undefined, unresolved > 0): an existing `_ref` field was dropped for the same reason. Now only strip the key we're actually injecting. Pre-existing values for the *other* framework-owned field pass through untouched, turning the helper back into a pure annotation that mutates nothing outside its intended slot. Added a deep-equal collision guard for `_unresolved_refs` mirroring the existing `_ref` guard: when an existing array value differs from the one we'd inject, fall back to the prefix form rather than silently overwriting. Deep-equal arrays are accepted idempotently. New tests cover: `_unresolved_refs` preserved on the ref-only path; `_ref` preserved on the unresolved-only path; collision fallback when existing `_unresolved_refs` is a non-matching array; and deep-equal idempotence. * fix: parallelize mixed direct+event dispatch to isolate same-turn refs Codex round 10 (P2): in the mixed event-driven + direct path, `run()` awaited direct calls first and only then dispatched event calls. Because `runTool` registered its output immediately, event calls ran `registry.resolve` against a registry that already held the direct branch's just-registered outputs — meaning event args could resolve same-turn `{{toolturn<n>}}` references (even forward refs to same-batch direct calls). Pure-direct or pure-event batches didn't allow this, so the semantic was mode-dependent. Run both branches in parallel. Both sync prefixes — which include all `registry.resolve(args)` passes — now complete against the *pre-batch* registry state before either branch's async bodies get a chance to register. JS single-threaded execution guarantees no microtask (= no registration) runs between the two sync prefixes, so same-turn refs remain unresolved regardless of how calls are split across direct/event in a mixed batch. Cross-batch refs still resolve as before. Added a test that exercises a mixed batch with a direct call at index 0 and an event call at index 1 whose args reference `{{tool0turn0}}`. First batch: the placeholder does NOT resolve (host receives the literal template). Second batch: same ref now points across a batch boundary and resolves to the direct call's registered output from batch 1. * fix: restore fail-fast and partition registry state per runId Codex round 11 flagged two P1 regressions: **Fail-fast under mixed direct+event mode.** My round-10 `Promise.all([directPromise, eventPromise])` started the host- dispatch branch *immediately*, so a direct tool throwing (via `handleToolErrors=false` or `GraphInterrupt`) could no longer abort event dispatch before side effects ran. Fix: snapshot event calls' args synchronously against the pre-batch registry state, then serialize — await directs first, dispatch events only if directs succeeded. Same-turn isolation stays because the event args were captured before any direct registration. Added a test that confirms a thrown direct tool prevents any host `ON_TOOL_EXECUTE` dispatch. **Run-scoped registry state leaked across concurrent runs.** The registry's `nextTurn(runId)` cleared global state whenever `runId` changed, so two overlapping runs reusing the same ToolNode could clobber each other: run B's first batch wiped run A's in-flight state, and run A's later registrations landed in run B's bucket. Fix: partition the registry internally by `runId`. Each bucket owns its own entries/totalSize/turnCounter/warnMemo; all public methods (`set`/`get`/`resolve`/`nextTurn`/`claimWarnOnce`) now take `runId` and route to the correct bucket. Anonymous (`run_id` absent) batches still reset per-call. FIFO-evicts oldest runId bucket when `maxActiveRuns` (default 32) is exceeded; new `releaseRun(runId)` API for deterministic cleanup. ToolNode threads runId through every registry call site. Added a test that runs two runs back-to-back on a shared registry and verifies: run A's output survives run B starting; each run resolves `{{tool0turn0}}` against its own partition. * fix: clear registry on run cleanup and give anonymous batches unique scopes Codex round 12 flagged two more issues: **P2 — Graph.clearHeavyState didn't clear registry contents.** The Graph nulled its own `_toolOutputRegistry` field, but each compiled ToolNode had already captured the instance at construction time. So stored `toolturn<n>` entries (up to 5 MB of raw output) stayed reachable across subsequent `processStream()` calls on the same Run. Now `clearHeavyState()` calls `registry.clear()` before dropping the reference so ToolNodes holding the same instance see a wiped registry on the next batch. **P3 — Concurrent anonymous invocations clobbered each other.** `nextTurn(undefined)` deleted the shared anonymous bucket before creating a new one; two overlapping invocations without `run_id` could wipe each other's in-flight state. ToolNode now mints a unique `batchScopeId` (`\0anon-<N>`) per anonymous `run()` and threads it through every registry call in place of the raw `run_id`. Each anonymous batch gets its own partition, so concurrent anonymous calls are fully isolated. Tests added: - `clear()` drops every run bucket; `releaseRun(runId)` drops only the named one. - Two concurrent `node.invoke(...)` calls without `run_id`, gates resolved in reverse order — each observes its own registered output under `[ref: tool0turn0]` with no cross-contamination. * fix: clamp caller-supplied maxTotalSize to the documented hard cap Codex round 14 (P2): `HARD_MAX_TOTAL_TOOL_OUTPUT_SIZE` is documented as an absolute upper bound, but the constructor's user-provided branch was bypassing it — `calculateMaxTotalToolOutputSize` already clamps the computed default, but a host passing `maxTotalSize: 50_000_000` would have a 50 MB registry instead of 5 MB. Apply the same `Math.min(..., HARD_MAX_TOTAL_TOOL_OUTPUT_SIZE)` to caller-supplied values so the cap actually holds. Per-output is already clamped to the (now hard-bounded) total, so it inherits the fix. Added a test that requests `maxTotalSize: 50_000_000` and asserts both `totalLimit` and `perOutputLimit` end up ≤ 5 MB. * fix: clamp registry caps, snapshot for hook re-resolve, batch-local args sink Addresses two new codex P2 findings plus follow-up audit items: **Codex P2 — Clamp configured maxTotalSize to hard limit.** A caller-supplied `maxTotalSize` bypassed the documented 5 MB absolute cap (`HARD_MAX_TOTAL_TOOL_OUTPUT_SIZE`). The constructor now clamps via `Math.min(options.maxTotalSize, HARD_MAX_…)` so the registry can never retain more than the documented bound, regardless of host config. **Codex P2 — Preserve pre-batch ref snapshot after PreToolUse rewrites.** In the mixed direct+event path the PreToolUse hook's `updatedInput` re-resolved against the live registry, which by that point already had same-turn direct outputs registered. A hook-introduced `{{tool<idx>turn<current>}}` could therefore leak a same-batch direct result, breaking the same-turn isolation guarantee. Added `ToolOutputReferenceRegistry.snapshot(runId)` returning a frozen `ToolOutputResolveView` that resolves against a defensive- copied entries map. The mixed path takes a snapshot before launching directs and threads it into `dispatchToolEvents`; the hook re-resolve uses the snapshot when set, falling back to live-registry resolution in the pure-event path (where no in-batch registrations have happened yet). **Audit follow-ups:** - Renamed inner `turn` (per-tool usage count) to `usageCount` in `runTool` to stop shadowing the outer batch-turn parameter. - Switched ToolNode's `./toolOutputReferences` import to the `@/` alias used everywhere else in the file. - Removed instance-level `resolvedArgsByCallId` in favor of a batch- local `Map` threaded through `runTool` and `handleRunToolCompletions`. Concurrent `run()` calls on the same ToolNode no longer race on shared sink state — each batch gets its own map. Tests added: - `snapshot()` resolves against captured state and ignores later mutations / additions; multiple snapshots are independent. - `maxActiveRuns` LRU eviction drops the oldest bucket when capped. - `calculateMaxTotalToolOutputSize` boundary cases (undefined, 0, negative, exact-cap, just-past-cap). - `buildBashExecutionToolDescription` returns base by default and appends the references guide when `enableToolOutputReferences:true`. - New ToolNode integration test: mixed direct+event batch with a PreToolUse hook injecting `{{tool0turn0}}` — host receives the literal template (snapshot prevents same-turn resolution). * refactor: collapse runTool/dispatch params into batch-context options object Audit follow-ups from the post-round-14 comprehensive review: **#1 — Parameter explosion.** `dispatchToolEvents` had grown to 8 positional parameters (5 optional) and `runTool` to 6 (4 optional); `executeViaEvent` was passing trailing `undefined` placeholders to skip slots. Replaced with two typed options bags (`RunToolBatchContext`, `DispatchBatchContext`) so each method takes just `(call/calls, config, batchContext)`. Eliminates the `undefined`-placeholder anti-pattern and stops future context fields from re-bloating the signatures. **#2 — Import-type grouping.** `import type { ToolOutputResolveView, … }` was placed after the local value imports, separated from the other `import type` block. Moved into the type-import section per AGENTS.md's "import type imports should be in their own section" rule. **#3 — Test file location.** `src/utils/truncation.test.ts` was co-located with the source while sibling tests use the `__tests__/` subdirectory convention. Moved to `src/utils/__tests__/truncation.test.ts` and switched the import to the `@/utils/truncation` alias. **#4 — Missing snapshot edge case.** Added `snapshot()` test for a `runId` with no bucket — confirms placeholders are reported as unresolved and content passes through unchanged when nothing has been registered. Also exported `PreResolvedArgsMap` and `ResolvedArgsByCallId` aliases from the registry module so the new context types reuse a single source of truth instead of duplicating shape definitions. * chore: bump to 3.1.71-dev.0 and tighten handleRunToolCompletions typing - Use the exported `ResolvedArgsByCallId` alias for the `handleRunToolCompletions` `resolvedArgsByCallId` parameter so it matches the batch-context types instead of redeclaring the inline shape; the parameter's docstring moved to a `@param` block on the method JSDoc. - Bump `@librechat/agents` to `3.1.71-dev.0` for the tool-output-references release (rebased onto 3.1.70 release).

* refactor: lazy-annotate tool output refs at LLM request time Demote `_ref` annotation from a durable mutation of `ToolMessage.content` to a transient projection applied right before each provider call. `ToolNode` now stamps `additional_kwargs._refKey` / `_unresolvedRefs` and leaves `content` clean (truncated only). At request time, `attemptInvoke` pulls the run-scoped registry off the graph and applies `annotateMessagesForLLM` to project a copy of the relevant ToolMessages with the same `[ref: …]` / `_ref` shape the model was already trained on. Annotations never persist back into graph state, so downstream consumers (LibreChat tool-output panel, audit logs, persisted history) see raw output, and stale `_refKey` values from prior runs silently no-op via a registry-presence gate. The `RunConfig.toolOutputReferences.enabled` flag and the `{{toolturn<n>}}` substitution semantics are unchanged. * chore: bump version to 3.1.71-dev.1 * perf: drop strip + lazy-allocate output array in annotateMessagesForLLM Two complementary cleanups: 1. Drop `stripRefMetadata`. LangChain's provider serializers don't transmit `additional_kwargs` to provider HTTP APIs, and the projected message array is discarded after the request — so the metadata never reaches the wire and never round-trips back into graph state. The defensive clone was wasted work: an O(n) walk plus a fresh object per annotated ToolMessage on every invoke. 2. Lazy-allocate `out`. The previous `messages.map(...)` + trailing reference-equality scan allocated an m-length array on every call even when no ToolMessage needed annotation (the common case mid-conversation, when refs go stale). Switching to a `for` loop that copies via `messages.slice()` only on first projection makes the no-op path truly allocation-free. * fix: stamp _refScope so anonymous-run refs annotate correctly ToolNode partitions the registry by scope id, falling back to a synthetic per-batch key (`\0anon-<n>`) when `config.configurable.run_id` is absent. The lazy annotation transform looked up via the config-derived runId, which is undefined for anonymous callers — so the registry-presence gate missed and `[ref: …]` markers were silently dropped, even though the underlying output had been registered. Stamp the registry scope alongside `_refKey` in `ToolMessage.additional_kwargs` and read it back in `annotateMessagesForLLM`. For named runs `_refScope` equals the run_id; for anonymous batches it carries the synthetic scope so the lookup hits the right bucket. Falls back to the runId param when `_refScope` is absent (legacy / pre-this-fix messages). Caught by codex review on PR #117. * review: address comprehensive audit findings on lazy ref annotation Resolves all valid findings from the codex review (#117): #1 (MAJOR) - Defensive `additional_kwargs` strip on the projected ToolMessage. The earlier reasoning (LangChain serializers don't transmit `additional_kwargs`) is correct today but fragile against custom adapters and future LangChain changes. Strip our three framework-owned keys (`_refKey`, `_refScope`, `_unresolvedRefs`) on the projection so the metadata never reaches the wire under any serializer behavior. Cost: one shallow object spread per annotated message. Early-out when none of the keys are present preserves the no-strip path. #2 (MINOR) - New test for JSON-object content carrying both a live `_refKey` and `_unresolvedRefs`, exercising `annotateToolOutputWithReference`'s collision-detection logic through the projection entry point. #3 (MINOR) - New cross-run hydration smoke test in `invoke.test.ts`: runs ToolNode under run-1, persists a real tool message, then hydrates it into run-2's state alongside a fresh run-2 tool message. Verifies run-2's annotation transform annotates only run-2's live ref and leaves run-1's hydrated message clean — the headline scenario the lazy-annotation refactor was designed for. #4 (MINOR) - `Graph.getOrCreateToolOutputRegistry` JSDoc now carries `@internal`, mirroring `_unsafeGetToolOutputRegistry` on ToolNode. #7 (NIT) - Extracted `cloneToolMessageWithContent` helper to deduplicate the 7-field ToolMessage construction across the string-content and array-content branches. #6/#8/#9 (NIT) - Removed unused `streamMessages` field from `buildCapturingModel`, replaced `any` with a `StubModel` interface in test helpers, fixed eslint indentation warnings. #5 from the review (anonymous-run silently skipped annotation) was already resolved in the prior `_refScope` fix. * fix: defensively coerce ref metadata in annotateMessagesForLLM Treat persisted/hydrated `ToolMessage.additional_kwargs` as untrusted input. The previous transform trusted `_unresolvedRefs` to be a `string[]` and `_refKey`/`_refScope` to be strings, then immediately called `.length` / `.join(...)` and used the values as registry keys. A single hydrated message carrying a malformed shape (e.g. `_unresolvedRefs: 'tool0turn0'` from an upstream serialization quirk) would crash `attemptInvoke` before the provider call, turning one bad message into a hard run failure. Add three small reader helpers — `readRefKey`, `readRefScope`, `readUnresolvedRefs` — that coerce non-string scalars to undefined and non-array `_unresolvedRefs` to `[]`. Inside a real array, filter out non-string entries rather than silently failing later in the join. Caught by codex review on PR #117 (P2: Guard lazy annotation against malformed _unresolvedRefs). * test: add _refScope coverage at ToolNode unit level + clarify test title Two follow-up review findings: #1 (MINOR) - ToolNode unit tests had `getRefKey` and `getUnresolvedRefs` helpers but no `getRefScope`, so a regression in `recordOutputReference` that dropped `_refScope` stamping would only surface via the integration test in `invoke.test.ts`. Add `getRefScope` and spot-check it at two key sites: the basic stamping path (named run, scope == run_id) and the concurrent anonymous invocations path (each batch gets a distinct synthetic `\0anon-<n>` scope). #2 (NIT) - The test "drops additional_kwargs entirely when only framework keys were present" actually asserts `toEqual({})` because LangChain's ToolMessage constructor normalizes the `undefined` we pass to `{}`. Renamed to "leaves additional_kwargs empty when stripping removed every framework key" with a JSDoc clarifying the chain. * fix: strip framework ref metadata even on stale-ref-only messages Codex caught that the previous lazy-projection branch returned the original `ToolMessage` reference-equal when the only metadata was a stale `_refKey` (and no `_unresolvedRefs`), so `_refKey` / `_refScope` survived in `additional_kwargs` on the payload reaching `model.invoke`/`model.stream`. That contradicts the defensive goal of `stripFrameworkRefMetadata`: framework keys must never reach the wire under any custom or future serializer that transmits `additional_kwargs`. Project unconditionally whenever any framework key is present, regardless of whether annotation applies. The annotation step itself remains conditional on a live registry hit / `_unresolvedRefs`. Net effect: stale-ref messages keep their content unchanged but get a clean projected `additional_kwargs`. Also harden the feature-disabled fast path: hosts that haven't enabled `RunConfig.toolOutputReferences` get an explicit short-circuit at O(1) without touching a single ToolMessage. Added a regression test that spies on `BaseMessage._getType` to assert not a single message is iterated when the registry is undefined. Existing tests for stale/malformed ref metadata updated to assert the new strip-on-projection contract instead of reference-equality. * docs: nudge model toward heredoc in BashToolOutputReferencesGuide Real-world testing against LibreChat + ClickHouse + bash hit a predictable pattern: model emits `echo '{{ref}}' | wc -c`, the substituted ClickHouse output contains literal single quotes / parens / arbitrary bytes (e.g. `randomString` results), and the shell breaks before `wc` ever runs. Model self-corrected on retry with `wc -c << 'EOF' … EOF`, but burned a tool-call round trip to discover it. Substitution itself is doing the right thing — the SDK can't know the intended quoting context (jq, grep -F, here-string, python -c, …) so auto-escaping would break legitimate inline cases more often than it'd help. Same failure mode hits any inline literal data; refs just make it easier to trip on. Cheap fix: surface the heredoc-with-quoted-delimiter pattern in the guide upfront, so the model reaches for it without having to learn the lesson the hard way. Pure ASCII line-oriented output keeps the shorter `echo '…' | …` form. * chore: release 3.1.71 * fix: guard `in` probes against non-object additional_kwargs Codex P1: `annotateMessagesForLLM` cast `m.additional_kwargs` to `Record<string, unknown>` and probed with `'_refKey' in meta`. The `in` operator throws `TypeError` on primitives, so a hydrated or client-supplied ToolMessage carrying `additional_kwargs: 'string'` (or any non-object) would crash `attemptInvoke` before the provider call instead of being safely skipped. Add a runtime `typeof rawMeta !== 'object'` guard before the `in` probes. Malformed messages now pass through unchanged and the rest of the array continues to project normally — same defensive posture already applied to `_refKey` / `_unresolvedRefs` field shapes. Regression test forces a primitive past LangChain's typed setter via cast and asserts (a) no throw, (b) the malformed message passes through, and (c) a subsequent live-ref message still annotates. * chore: sync package-lock.json to 3.1.71 * review: address remaining review nits on lazy ref annotation Comprehensive audit follow-up: #1 (MINOR) - `InvokeContext` previously resolved indirectly through `Parameters<ChatModelStreamHandler['handle']>[3]`, hiding the `getOrCreateToolOutputRegistry()` accessor that `attemptInvoke` calls. Reshape the type as `NonNullable<...> & { getOrCreateToolOutputRegistry?(): ... }` so the contract is visible at this site without chasing the upstream handler signature. NonNullable strips `undefined` so the intersection doesn't collapse on the undefined branch; callers express optionality via `context?: InvokeContext`. #3 (MINOR) - New unit test for artifact preservation through `annotateMessagesForLLM`. Hosts attach `artifact` via the `content_and_artifact` response format (code-execution sessions, MCP tools); the projection must round-trip it untouched so audit logs and code-session tracking keep working. `makeToolMessage` helper extended to accept `artifact`. #4 (NIT) - Extracted test-helper return types `CapturingModel` and `StreamingCapturingModel` to named aliases. Closing-brace indentation is no longer ambiguous against property lines, and the helpers' return shape is reusable. #5 (NIT) - Documented the `as unknown as ToolMessage['content']` double-cast inline. LangChain's content union doesn't accept a freshly built mixed array literal even though the structural shape is valid at runtime; the comment now records why the cast is unavoidable. Skipped #2 (`tryFallbackProviders` test fragility): the reviewer themselves marked it "Accept as pragmatic." Adding a model-factory parameter to production code purely for testability is an anti-pattern, and deleting the test would lose meaningful coverage of the fallback branch. Skipped #6 (version bump bundling): the release version bump was explicitly requested in the PR conversation; the bundling is intentional for this project's workflow. * docs: align InvokeContext JSDoc with the actual call site The JSDoc example used `context?.getOrCreateToolOutputRegistry?.()` (double optional chain), but the actual call site is single-chain: `context?.getOrCreateToolOutputRegistry()`. The method is required on the `StandardGraph` branch of the intersection, so once `context` is non-null the call is unconditional. Note this inline so the docs don't drift from the code. Cosmetic-only follow-up to review of #117.

danny-avila added 2 commits April 25, 2026 18:41

pull Bot locked and limited conversation to collaborators Apr 26, 2026

pull Bot added the ⤵️ pull label Apr 26, 2026

pull Bot merged commit 644ec2d into innFactory:main Apr 26, 2026
1 check passed

pull Bot had a problem deploying to publish April 26, 2026 10:43 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] main from danny-avila:main#68

[pull] main from danny-avila:main#68
pull[bot] merged 2 commits intoinnFactory:mainfrom
danny-avila:main

pull Bot commented Apr 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pull Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pull Bot commented Apr 26, 2026 •

edited

Loading