Skip to content

[pull] main from danny-avila:main#66

Merged
pull[bot] merged 26 commits intoinnFactory:mainfrom
danny-avila:main
Apr 23, 2026
Merged

[pull] main from danny-avila:main#66
pull[bot] merged 26 commits intoinnFactory:mainfrom
danny-avila:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented Apr 23, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* feat: add inert hook registry and executor scaffolding (phase 1/1)

Introduces `src/hooks/` with types, registry, and executor for a 12-event
hook lifecycle. Purely additive — not exported from `src/index.ts` and not
yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the
next PR so this change is shippable independently with zero behavioral
impact on hosts that don't opt in.

- Discriminated-union input/output per event; generic `HookCallback<E>`
  and `HookMatcher<E>` for type-safe registration.
- `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid
  O(n^2) churn under parallel registration in multi-tenant hosts.
- `executeHooks` fans out in parallel, races hooks against a combined
  parent + timeout `AbortSignal` so non-listening hooks still time out,
  folds decisions with `deny > ask > allow` precedence, accumulates
  `additionalContext`, and self-removes `once: true` matchers after any
  successful hook. Errors are non-fatal: swallowed into the aggregated
  result and routed through an optional winston logger (falling back to
  `console.warn`), with internal matcher errors suppressed.

47 tests under `src/hooks/__tests__/` cover registry scoping, regex
matching, precedence folding, once-self-removal (success/failure/mixed),
timeout enforcement (including non-listening hooks), error non-fatality,
and parent signal combination.

* fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns

Resolves the comprehensive review on #87:

- Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the
  self-referential naming (`m.matcher` reads as "the matcher's matcher").
- Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult`
  and `fold()`. The type was previously a promise the executor didn't
  fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped.
- Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy
  claimed non-determinism "in Promise.all resolution order" — but
  `Promise.all` preserves input order, so the fold is deterministic in
  registration order (outer loop over matchers, inner loop over each
  matcher's hooks). Updated the `executeHooks` function docstring to match.
- Add regex compilation cache and length bound to `matchers.ts`. Patterns
  compile at most once per unique string and are reused across calls;
  patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap
  ReDoS mitigation. Documented the trust model (patterns are trusted input;
  hosts must validate user-supplied patterns upstream).
- Document the wildcard-only semantics on query-less events: a non-empty
  pattern on an event that doesn't supply a matchQuery (RunStart, Stop,
  etc.) never matches.
- Document the `once: true` concurrent-dispatch limitation: two parallel
  `executeHooks` calls each snapshot the matcher list, so `once` means
  "once per call", not "once globally". Matches Claude Code's semantics.
  Added a test that pins this behavior.
- Merge `matchers.filter(...)` + task-build loop into a single pass per
  AGENTS.md "consolidate sequential O(n) operations."
- Scope `eslint-disable no-console` to the single `console.warn` call in
  `reportErrors` instead of disabling it file-wide.
- Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and
  assert the error surfaces an abort-shaped label.
- Add tests: multi-hook `preventContinuation` (first-writer-wins on
  stopReason, flag without reason), `updatedOutput` flow + registration
  order, registration-order determinism for `updatedInput`, pattern
  length bound, compilation cache hit/miss/clear, concurrent `once`
  dispatch. Total: 47 -> 59 tests.

Findings audited and resolved:
- #1 ReDoS: cache + length bound + trust model doc (full fix blocked on
  host-side validation)
- #2 Wrong updatedInput ordering JSDoc: fixed
- #3 Dead `updatedOutput` type: wired through
- #4 Concurrent `once` race: documented + test
- #5 `HookMatcher.matcher` naming: renamed to `pattern`
- #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`)
- #7 Eslint-disable scope: line-scoped
- #8 File-path comments: rejected; matches existing project convention
- #9 Two-pass filter: single pass
- #10 Regex recompilation: cached
- CL-4 Undocumented wildcard semantics: documented
- RT-3 Timeout test dangling timer: fixed + error content verified
- RT-4 No multi-hook preventContinuation test: added

* perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal

Rework the phase-1 hooks scaffolding for the multi-tenant scale we
actually operate at. These are behavior changes (semantics of `once`
tightened, new pattern-rejection path) but none of them are wired into
the runtime yet — this PR is still inert — so the blast radius is the
hooks directory only.

## Atomic `once: true`

Removal now happens synchronously inside `executeHooks`, between
`registry.getMatchers` and the first `await`. Node's event loop
serialises sync work, so two concurrent `executeHooks` calls can never
both observe the same `once` matcher — whichever call runs its sync
prefix first consumes it, and the loser sees an empty bucket.

Semantic change: `once` is now at-most-one-dispatch, not
at-most-one-successful-execution. If every hook in a `once` matcher
throws, the matcher is still gone. The old retry-on-failure behavior
was Claude Code's model — fine for a single-user CLI, but in a
multi-tenant server it opens a window where concurrent bursts can
re-dispatch a "once" hook non-idempotently. "At most once, ever" is
the correct semantic for a shared registry. Hosts that need retry
should register a normal matcher and self-unregister via the callback
returned from `registry.register`.

Dropped `collectOnceMatchersForRemoval` + its Map/array allocations —
the post-hoc collection step is no longer needed.

Tests: replaced "keeps the matcher when every hook throws" with
"removes the matcher even when every hook in it throws". Added
"fires exactly once across 3 concurrent calls" and "fires exactly once
across 8-way concurrent dispatch when hooks are slow" (the slow test
deliberately uses a 10ms hook to force overlap).

## Bounded LRU regex cache

`matchers.ts` was unbounded. Under multi-tenant use where different
tenants register different patterns, the cache would leak. Capped at
`MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position
(delete-then-set), misses at capacity evict the oldest key. Failed
compiles are cached the same way so a tenant spamming bad patterns
doesn't burn CPU on every call.

Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test
observability. Tests verify eviction order (cold patterns recompiled
after overflow) and that hot patterns refreshed mid-stream survive
eviction pressure.

## ReDoS heuristic: nested-quantifier detector

`hasNestedQuantifier` walks a pattern linearly, tracks paren depth,
and rejects any pattern where a quantified group contains another
quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1
catastrophic-backtracking source. Character classes (`[*+?]`) and
escaped quantifiers (`\+`) are correctly ignored. Rejection happens
before `new RegExp`, so adversarial input never reaches the compiler.

This is a floor, not a ceiling — it doesn't catch ambiguous-alternation
ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must
still validate upstream. Documented in the `HookMatcher.pattern` and
`matchesQuery` JSDoc.

Test verifies that `(a+)+$` on a 35-char adversarial input resolves in
under 50ms (would be seconds without the heuristic).

## Shared `AbortSignal` per matcher

Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent,
timeout])` per hook. For a matcher with N hooks and the same timeout,
that's N timers + N composite signals.

Now: one signal per matcher, shared across all its hooks. Since hooks
in a matcher already share `matcher.timeout`, they may as well share
the timer. Behavior is identical (all hooks fire and race the same
deadline) but allocation drops from N to 1 per matcher.

## Tests

72 passing (was 59, +13):
- Atomic once: 2 concurrent tests + 1 failure-removal test updated
- LRU: cache size cap, eviction order, hit refresh
- Nested quantifier: classic shapes, deep nesting, escaped chars,
  character classes, false positives on legitimate patterns
- ReDoS mitigation: rejection path, <50ms response on adversarial input

* fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups

Resolves the follow-up review on #87. Four findings, all valid.

## #1 MAJOR — hasNestedQuantifier false-positives on group syntax

The previous scanner treated `?` as a quantifier at any depth > 0, but
`?` immediately after `(` is group syntax, not a quantifier:
`(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`.
Patterns like `(?:pre_)?tool_name` were silently rejected and the
registering hook never fired — with no error surfaced to the caller.

Fix: explicit group-syntax prefix skip inside the `(` handler. When the
scanner enters a group it peeks for `?` and advances past the modifier
characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`)
before processing the group body. The `?` is therefore never considered
a quantifier at the start of a group.

While in there I also fixed a second correctness bug the reviewer
didn't catch: the old depth-indexed array didn't propagate a child
group's "contains quantifier" signal up to its parent, so `(?:(a+))+`
(outer quantifier over a wrapped quantified group, equivalent to
`(a+)+`) escaped detection. The scanner now uses an explicit stack of
`QuantifierFrame` records, and when a child group closes it propagates
`hasBacktrackRisk` to the parent frame — whether the child itself was
quantified or not. This catches `(?:(a+))+` correctly while still
allowing non-risky wrappers like `(?:(ab))+`.

Added 9 tests covering non-capturing groups, lookaheads, lookbehinds,
named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep
wrapping. Verified existing ReDoS-detection tests still pass.

## #2 MINOR — executeHooks.test.ts did not clear the pattern cache

Matcher cache is module-level. `matchers.test.ts` clears it in
`beforeEach`; `executeHooks.test.ts` did not, so patterns compiled
during one test persisted across subsequent tests in the same file.
Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no
test failures today, but restores test independence.

## #3 MINOR — Test utilities leaked into production barrel

`clearMatcherCache` and `getMatcherCacheSize` are test-only helpers
(their JSDoc says so) but were exported from `src/hooks/index.ts`.
When the integration PR eventually re-exports `src/hooks` from
`src/index.ts` they would become public API.

Removed both from the barrel. Tests already import them directly from
`../matchers`, so no test changes needed. `hasNestedQuantifier`,
`MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can
legitimately use them for upstream validation.

## #4 NIT — LRU refresh overhead at low cache pressure

Every `compile()` cache hit was doing `delete + set` to refresh LRU
position, even when the cache was nowhere near capacity. With
`MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns,
eviction pressure is near-zero and the refresh is pure overhead.

Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the
cache is above 192 entries. Below that the code fast-paths straight
back from `compile()`. Above it the LRU semantics kick in so hot
patterns still survive evictions — the existing "refreshes LRU
position on hit" test still passes because by the time it runs the
check, the cache is at capacity.

Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).
* feat: wire run-level hooks into processStream (phase 1/2)

Fires four hook lifecycle events from `Run.processStream`, making the
inert hooks module from PR #87 live. Hosts pass a pre-constructed
`HookRegistry` via `RunConfig.hooks`; when absent, all hook dispatch
is skipped (zero overhead for non-adopters).

## Integration points

- `RunConfig.hooks?: HookRegistry` — new optional field, mirrors the
  existing `customHandlers` pattern
- `Graph.hookRegistry` — field alongside `handlerRegistry`, cleared in
  `clearHeavyState()`
- `Run` constructor wires the registry onto both `this.hookRegistry`
  and `this.Graph.hookRegistry`

## Hook fire points in `processStream`

1. **RunStart** — after config setup, before stream creation. Receives
   `inputs.messages`, `runId`, `threadId`, `agentId`.
2. **UserPromptSubmit** — immediately after RunStart. Extracts prompt
   text from the last human message. If the hook returns
   `{decision: 'deny'}` or `{decision: 'ask'}`, the run aborts early
   and returns `undefined`. The `ask` case is an abort in v1 — the SDK
   returns the decision and the host decides how to implement
   interactive approval (see Phase 2 plan in the skills design report).
3. **Stop** — at the end of the try block, after the for-await loop.
   Receives the accumulated messages from `Graph.getRunMessages()`.
   Only fires on successful completion (not on error).
4. **StopFailure** — in a new catch block between try and finally.
   Receives the error message and last assistant message. Hook errors
   are swallowed (`.catch(() => {})`) so the original error propagates.

## Session teardown

`hookRegistry.clearSession(runId)` is called in the finally block,
guaranteeing cleanup even on error. Fulfills §3.9 rule #5 from the
design report.

## Public API

`src/hooks/` is now re-exported from `src/index.ts`. Hosts can import
`HookRegistry`, `executeHooks`, and all hook types directly from
`@librechat/agents`.

## Helper functions

Three module-level functions added to `run.ts`:
- `findLastHumanMessage` — backward scan for `getType() === 'human'`
- `findLastAssistantMessage` — backward scan for `getType() === 'ai'`
- `extractPromptText` — handles string and complex content arrays

## Tests

10 new integration tests using `Run.create` + `overrideTestModel` +
`processStream` (same pattern as `custom-event-await.test.ts`):
- RunStart fires with correct fields
- UserPromptSubmit extracts prompt, deny aborts, ask aborts
- Stop fires on success, does not fire on error
- StopFailure fires on error, preserves original exception
- Session cleared in finally (both success and error paths)
- No-hooks baseline works identically

Total: 81 existing + 10 integration = 91 passing.

* fix(hooks): address PR 2 review — DRY helpers, test coverage, guards

Resolves all findings from the comprehensive review on #88.

- #1 MAJOR: Add 3 tests for `extractPromptText` — multi-part content
  (text + image blocks yields 'hello\nworld'), image-only content
  (yields ''), and empty content array (yields '').
- #2 MINOR: Merge `findLastHumanMessage`/`findLastAssistantMessage`
  into parameterized `findLastMessageOfType(messages, type)`.
- #3 MINOR: Rewrite `extractPromptText` array path from
  `.filter().map().join()` triple-pass to single `for...of` loop
  accumulating into `parts[]` then `.join('\n')`.
- #4 MINOR: Add `config.callbacks = undefined` on deny/ask early
  return path so Langfuse handler reference is cleaned up even
  though no stream ran.
- #5 MINOR: Add comment on `UserPromptSubmit` fire point noting
  `attachments` is not yet wired (Phase 2).
- #6 MINOR: Strengthen `StopFailure` test assertion from
  `toBeTruthy()` to `typeof === 'string'` + `length > 0`.
- #7 MINOR: Add test for empty-content human message — verifies
  `UserPromptSubmit` fires with `prompt: ''`.
- #8 NIT: Add inline comment on `stopHookActive: false` explaining
  it will be `true` in Phase 2 when stop is triggered by a hook.
- #9 NIT: Use `hasHookFor('Stop', sessionId)` and
  `hasHookFor('StopFailure', sessionId)` guards before
  `getRunMessages()` to avoid the array copy when no hooks are
  registered for the event.
- CL-2: Wrap Stop hook dispatch in its own `.catch()` so an
  infrastructure error in `executeHooks` (theoretical — it catches
  internally) cannot masquerade as a stream failure in the catch
  block.

Tests: 91 -> 94 (+3 for extractPromptText coverage).
* feat: wire tool hooks into event-driven ToolNode (phase 1/3)

Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied
hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for
all event-driven tool execution in LibreChat.

## Hook lifecycle in dispatchToolEvents

1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE
   dispatch. Denied calls (deny or ask) produce error ToolMessages and
   fire PermissionDenied; surviving calls proceed with optional
   updatedInput applied to tool args.

2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the
   host. If all calls are denied, the batch dispatch is skipped entirely.

3. **PostToolUse** fires per successful result. If a hook returns
   updatedOutput, the ToolMessage content is replaced before the message
   is appended to the graph.

4. **PostToolUseFailure** fires per error result. Observational only —
   cannot modify the error message.

Post hooks are awaited (not fire-and-forget) so updatedOutput takes
effect before the ToolMessage is consumed. Hook errors are swallowed
via .catch() to avoid disrupting the tool result flow.
PermissionDenied is fire-and-forget since it's purely observational.

## Plumbing

- `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field
- `ToolNode` constructor stores `this.hookRegistry`
- `Graph.createToolNode()` passes `this.hookRegistry` at both
  construction sites (event-driven and traditional)
- `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called
  (moved from the post-create assignment in Run constructor into
  `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives
  a non-undefined registry at construction time

## hasHookFor guards

All four hook fire points use `hasHookFor(event, runId)` before calling
`executeHooks`, so when no hooks are registered for the event the
overhead is a single Map lookup + array length check — no Promise
allocations, no function calls.

## Tests

9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts`
using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler):

- PreToolUse fires with correct fields
- PreToolUse deny blocks execution (tool handler never called)
- PreToolUse ask blocks execution (v1)
- PreToolUse updatedInput rewrites args before dispatch to host
- PreToolUse hook errors are non-fatal (tool still executes)
- PermissionDenied fires after deny with reason
- PostToolUse fires with tool output
- PostToolUseFailure fires on error result
- No-hooks baseline works identically

Total: 94 existing + 9 new = 103 passing.

* fix(hooks): address PR 3 review — step completion, ordering, catch, tests

Resolves all 10 findings from the review on #90.

## Structural rewrite of dispatchToolEvents

The post-processing loop was rewritten to fix #1, #2, #6, #8, #10:

- #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a
  shared `dispatchStepCompleted` helper (extracted from the duplicate
  logic in the executed-results loop). Without this, the frontend
  would show stuck spinners for denied tool calls.

- #2 MAJOR: Output messages are now collected in a Map keyed by
  tool_call_id, then returned in the original `toolCalls` input order
  via `toolCalls.map(call => messageByCallId.get(call.id))`. Before,
  denied messages were prepended to executed messages, breaking the
  implicit ordering contract.

- #6 MINOR: `toolUsageCount` is now incremented only for approved
  calls (moved from the initial `preToolCalls.map` into the
  `approvedEntries.map` that builds ToolCallRequests). Denied calls no
  longer inflate the turn counter.

- #8 MINOR: Eliminated the double `result.status === 'error'` check.
  ToolMessage is now constructed inside each status branch directly
  after hook dispatch.

- #10 NIT: `requests.find()` replaced with a pre-built
  `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md
  guidance on Map/Set over Array.find.

## Error protection

- #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks`
  call in `.catch(() => emptyResult)` so an infrastructure error in
  the hooks subsystem cannot crash the entire tool batch. Matches the
  stated "hook errors are non-fatal" contract and is consistent with
  PostToolUse/PostToolUseFailure which already had `.catch()`.

## New tests

- #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` —
  registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`,
  verifies the ToolMessage content in the graph's run messages is
  `'REDACTED'` (not the original output).

- #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a
  `Promise` resolved by the hook callback. No more flaky sleep.

## Cleanup

- #9 NIT: Updated stale "once the tool-hook PR lands" comment in
  `src/hooks/index.ts`.

Tests: 103 -> 104 (+1 for updatedOutput).

* fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback

- #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount
  for the `index` field. Added `turn?: number` parameter; approved-call
  site passes `request?.turn` (the pre-increment value), denied-call
  site omits it (fallback is correct since count was never bumped).
- #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()`
  before content extraction so a missing tool message produces a clear
  failure rather than "Expected undefined to be 'REDACTED'".
- #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so
  first-use tools see `turn: 0` instead of `turn: undefined`.

* fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry

Addresses follow-up review findings on #90.

- #1 MAJOR: Add multi-call batch tests — partial deny (echo denied,
  math approved, order preserved) and all-denied (ON_TOOL_EXECUTE
  handler never called). Uses counter-based IDs (#7 fix) to avoid
  Date.now() collisions across calls in the same test.
- #2 MINOR: Document PreToolUseHookInput.turn semantics — within a
  single batch, all calls to the same tool share the same turn value.
  Per-call discrimination within a batch is not supported in v1.
- #3 MINOR: Remove hookRegistry from the traditional (non-event-driven)
  ToolNode construction site in Graph.ts. Hooks only fire in
  dispatchToolEvents; passing the registry to a ToolNode that never
  uses it created a false impression of hook support.
- #4 MINOR: Add PostToolUse error non-fatality test — throwing hook
  does not crash the batch, original content is preserved.
- #5+#8 MINOR: Freeze the hook-error fallback sentinel via
  Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents
  future mutation bugs if someone pushes to the shared arrays.
- #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks
  only fire for event-driven calls — directToolNames bypass dispatch.

Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and
PostToolUse error resilience).

* nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)
Phases 1+2 of the Skills system, rebased on hooks PRs (#88, #90):

Types:
- InjectedMessage type in tools.ts (generic, any tool can use it)
- SkillCatalogEntry type in skill.ts
- ToolExecuteResult.injectedMessages field
- Constants.SKILL_TOOL = 'skill'

SkillTool (src/tools/SkillTool.ts):
- JSON Schema (single source of truth), LCTool definition, createSkillTool()
- Runtime-agnostic description (files "may" become accessible)
- Requires event-driven execution mode

ToolNode integration (src/tools/ToolNode.ts):
- convertInjectedMessages() converts to HumanMessage (both roles, avoids
  Anthropic/Google SystemMessage rejection). Original role in additional_kwargs.
- dispatchToolEvents returns { toolMessages, injected } tuple
- Injected messages placed AFTER ToolMessages (provider ordering)
- try/catch isolation around injection conversion
- Works with hooks-based PreToolUse/PostToolUse lifecycle

Catalog formatter (src/tools/skillCatalog.ts):
- formatSkillCatalog() with truncation ladder and budget enforcement
- fitNamesOnly drops trailing entries when names-only exceeds budget

Tests: 30 total (18 SkillTool + 12 skillCatalog)
* feat: add local bash execution and bash programmatic tool calling tools

Implement BashExecutor for local bash command execution via child_process
and BashProgrammaticToolCalling for multi-tool orchestration using bash
scripts with a local HTTP server for tool IPC via curl.

* fix: use remote Code API for bash tools instead of local child_process

BashExecutor now sends `lang: 'bash'` to the same /exec endpoint as
CodeExecutor. BashProgrammaticToolCalling sends bash code to
/exec/programmatic with the same round-trip loop as PTC. Both tools
share session/file handling with their counterparts. ToolNode session
injection and storage updated for all four code execution tool names.
Updated the constant name from EXECUTE_BASH to BASH_TOOL in the Constants enum and adjusted references in BashExecutor and ToolNode to reflect this change. This improves clarity and aligns with naming conventions.
When the skill handler uploads files to the code env and returns
artifact: { session_id, files }, ToolNode now stores that session
context (via storeCodeSessionFromResults). This makes skill-primed
files available to subsequent bash/code tool calls in the same run.

Also adds SKILL_TOOL to the codeSessionContext request building in
dispatchToolEvents, so the handler receives the current session
context and can check if files are already uploaded.
…95)

* feat: add initialSessions to RunConfig for seeding Graph session state

Allows hosts to seed the Graph's ToolSessionMap at run creation time.
Used by skill file re-priming: after uploading skill files to the code
env at run start, the host passes the session info so ToolNode can
inject session_id + files into subsequent bash/code tool calls.

* refactor: add CODE_EXECUTION_TOOLS set, DRY ToolNode checks

Add CODE_EXECUTION_TOOLS ReadonlySet to common/enum.ts containing
all four code execution tool constants (EXECUTE_CODE, BASH_TOOL,
PROGRAMMATIC_TOOL_CALLING, BASH_PROGRAMMATIC_TOOL_CALLING).

Replace 5 repeated 4-constant inline checks in ToolNode with
CODE_EXECUTION_TOOLS.has(). SKILL_TOOL remains a separate check
where needed since it's not a code execution tool itself.
When a skill is invoked, the body is injected as a HumanMessage into
LangGraph state but NOT persisted to conversation history. On follow-up
runs the skill body is lost.

formatAgentMessages now accepts an optional invokedSkillBodies Map
(skillName → body) and reconstructs HumanMessages at the correct
position in the message sequence — after the skill's ToolMessage,
matching where ToolNode originally injected them.

Detection happens inside the existing tool_call processing loop (both
with and without tools filtering), so there are zero extra message
iteration passes.
Critical fix:
- Capture endMessageIndex BEFORE skill body HumanMessage injection so
  injected messages are excluded from the assistant message's token
  distribution range. Prevents indexTokenCountMap corruption.

Review fixes:
- Remove dead else-if(!discoveredTools) branch inside if(discoveredTools)
  block and revert redundant optional chaining
- Extract extractSkillName() helper to eliminate DRY violation (identical
  args parsing IIFE was duplicated in two locations)
- Use Set<string> instead of string[] for pendingSkillNames (dedup)
- Lazy allocation via ??= (only allocated when skills are present)
* fix: resolve review findings for hooks & skills PR

- Fix stale JSDoc referencing non-existent future PR (run.ts)
- Use requestMap (Map.get) instead of Array.find in storeCodeSessionFromResults
- Extract shared updateCodeSession helper to eliminate duplicated session storage logic
- Add sync critical section markers around once-matcher removal in executeHooks
- Optimize fitNamesOnly from O(n²) to O(n) with running character sum
- Widen ReDoS timing assertion from 50ms to 200ms for CI stability
- Add test verifying ON_RUN_STEP_COMPLETED dispatch for denied tool calls

* test: assert event shape in deny step completion test
)

New READ_FILE constant and ReadFileToolDefinition for a general-purpose
file reader. Schema: { file_path: string }. Skill files use the path
format {skillName}/{path} (e.g. "pdf-analyzer/src/utils.py").

The tool is event-driven only (host handles via ON_TOOL_EXECUTE).
Works without code execution for skill files (DB/storage reads).
With code env, also supports code execution output files.
* feat: wire PreCompact/PostCompact hooks around summarization node

Fires PreCompact and PostCompact hooks inside `createSummarizeNode`,
the last two SDK-side hook events. These are auditing hooks — hosts
can archive the full transcript before compaction, observe the summary
produced, or log compaction events for telemetry.

## Fire points

- **PreCompact** fires after ON_SUMMARIZE_START, before the LLM
  summarization call. Receives `messagesBeforeCount` and the trigger
  type from the agent's summarization config.
- **PostCompact** fires after ON_SUMMARIZE_COMPLETE, after the summary
  is generated and enriched. Receives the summary text and
  `messagesAfterCount: 0` (the node removes all messages; the summary
  is injected into the system prompt, not as a message).

Both hooks are observational and non-blocking — errors are swallowed
via `.catch()`. Results are not consumed.

## Type fix

`PreCompactHookInput.trigger` was defined as `'threshold' | 'manual'
| 'error'` (from the initial PR #87) but those don't match the real
summarization trigger types. Updated to:
`'token_ratio' | 'remaining_tokens' | 'messages_to_refine' | 'default'`
with `(string & {})` for forward compatibility. `'default'` covers the
case where no trigger is configured and compaction fires on any pruning.

## Plumbing

`hookRegistry` added to `CreateSummarizeNodeParams.graph` interface.
`Graph.ts` passes `this.hookRegistry` at the `createSummarizeNode`
call site, same pattern as ToolNode.

## Tests

4 new integration tests using `FakeListChatModel` for the summarization
provider (no API keys required):
- PreCompact fires with correct messagesBeforeCount and trigger
- PostCompact fires with summary text
- Hook errors don't crash compaction
- No-hooks baseline works identically

Total: 108 existing + 4 new = 112 passing.

* fix(hooks): add threadId to compaction hooks, strengthen tests, fix imports

Resolves review findings on #103.

- #1 MAJOR: Add threadId to both PreCompact and PostCompact hook
  inputs. PreCompact extracts from the node's config parameter,
  PostCompact from runnableConfig. Restores consistency with all
  other hook sites in run.ts.
- #2 MINOR: Assert trigger field in PreCompact test — verifies the
  'default' fallback when no trigger is configured.
- #3 MINOR: Strengthen PostCompact summary assertion from
  length > 0 to toContain('Summary') against the known
  FakeListChatModel response.
- #4 MINOR: Add PostCompact error resilience test — throwing hook
  in dispatchCompletionEvents does not crash compaction.
- #5 MINOR: Fix import order in node.ts per AGENTS.md (type imports
  longest→shortest, value imports longest→shortest).
- #6 NIT: Update hooks/index.ts barrel comment to list
  createSummarizeNode as third consumer.
- #8 NIT: Assert agentId in both PreCompact and PostCompact tests.

Tests: 112 -> 113 (+1 PostCompact error resilience).

* test: assert threadId propagation and exact summary match in compaction hooks

* fix: use runnableConfig for PreCompact threadId extraction (matches PostCompact)
#104)

* feat: add subagent-as-tool primitive for hierarchical agent delegation

Adds a SubagentTool that lets a parent agent delegate tasks to child
agents with isolated context windows. Verbose tool outputs stay in
the child's context; only a filtered text summary returns to the parent.

Key components:
- SubagentConfig type on AgentInputs for declaring child agent types
- SubagentExecutor: creates child StandardGraph, invokes with isolated
  state, filters result (strips tool_use/thinking blocks), fires
  SubagentStart/SubagentStop hooks
- SubagentTool: LCTool definition with dynamic enum from configs
- Graph integration: tool created in createAgentNode() via graphTools,
  automatically becomes a direct tool (bypasses event-driven dispatch)
- Self-spawn support: reuse parent's AgentInputs for context isolation
  without separate agent config

Wires the existing SubagentStart (with deny support) and SubagentStop
hook types that were defined but had zero fire points.

* fix: resolve review findings for subagent primitive

Address all 6 issues from PR review:

1. Add subagentHooks.test.ts — end-to-end hook integration test
   through real Run pipeline (SubagentStart payload, SubagentStop
   messages, deny blocks execution). 3 new tests.

2. Remove `as unknown as` cast — follow MultiAgentGraph pattern
   (init graphTools as empty array, push directly).

3. Deduplicate schema — extract buildSubagentToolParams() in
   SubagentTool.ts, use it in Graph.ts. createSubagentToolDefinition()
   now delegates to it. Single source of truth for schema/description.

4. Propagate threadId — tool callback extracts thread_id from
   LangGraph config.configurable, passes to executor.execute().
   Hooks now receive correct threadId.

5. Pass tokenCounter to child graph — added to SubagentExecutorOptions,
   threaded from agentContext.tokenCounter through to child
   StandardGraph constructor.

6. Pass signal in invoke() config — workflow.invoke() now receives
   signal for more responsive abort between graph steps.

* fix: resolve second-pass audit findings for subagent primitive

Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT:

MAJOR fixes:

1. Depth tracking now works across graph boundaries. Previously the
   `depth`/`maxDepth` check only worked within a single executor; when
   `allowNested: true`, the child graph spawned its own executor at
   depth=0, making `maxSubagentDepth` non-functional. Fix: decrement
   `maxSubagentDepth` in `buildChildInputs`, so the child's own
   executor inherits a smaller budget and naturally blocks further
   nesting when it reaches 0. Removed the now-vestigial `depth` field.

2. Detect duplicate `SubagentConfig.type` values. `Map` construction
   silently dropped duplicate keys. Now `resolveSubagentConfigs`
   throws with a clear error.

3. Add test coverage for `allowNested: true`: verifies depth is
   decremented across boundaries, child inherits subagentConfigs,
   and budget is clamped to >= 0.

MINOR fixes:

4. Runtime guard for empty `description` in the tool callback — the
   LLM can send undefined/empty values; now we default to a safe
   fallback string.

5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor
   as the fourth hook consumer (SubagentStart/SubagentStop).

6. Extracted property description strings as constants in
   SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams`
   now share a single source of truth.

7. Changed SubagentStop from fire-and-forget to awaited, matching the
   PostCompact pattern. Removes fragile `setTimeout` synchronization
   in tests. Errors still swallowed (observational).

8. Truncate error messages from child graph to 200 chars to prevent
   leaking internal details (stack traces, API errors) to the parent
   LLM.

9. Document that `ask` decision is treated identically to `deny` in
   the non-interactive subagent context.

NIT fixes:

11. Import order in SubagentExecutor.ts — `@/types` (multi-line,
    longest) now comes first in the local types sub-group.

12. Removed unused `contentParts` destructuring in verification script.

13. Fixed unsafe `msg.content as string` cast in verification script;
    now handles array content correctly.

Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted
TS convention for cross-file-accessible-but-semantically-internal
fields.

Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions.

* fix: address third-pass audit findings for subagent primitive

Resolve 3 NITs from the second review pass plus 2 new codex comments:

NIT #1 — Tighten truncation test: previously asserted only
`result.content.length < 500`. Now asserts exact envelope of 219
chars ("Subagent error: " prefix + 200 body + "..." suffix) so that
regressions in ERROR_MESSAGE_MAX_CHARS are caught. Added companion
test verifying short messages are not truncated.

NIT #2 — Skip tool creation when maxSubagentDepth: 0. When a host
set `maxSubagentDepth: 0` alongside `subagentConfigs`, the tool was
still bound to the LLM, and every invocation returned the depth-
exceeded error. The LLM wasted a turn. Now the `createAgentNode`
guard includes `effectiveSubagentDepth > 0`, so the tool is not
created at the leaf level. Added integration test.

NIT #3 — Document buildChildInputs audience. Added @remarks block
explaining this is an advanced utility exported primarily for
testing and internal use; hosts configuring subagents should not
call it directly.

Codex P1 (false positive, clarified): maxSubagentDepth propagation
across graph boundaries was already correct via the countdown in
buildChildInputs, but it wasn't obvious from the call site. Added
a multi-line JSDoc comment above the executor construction in
createAgentNode explaining how depth consumes across boundaries
through the decremented AgentInputs.maxSubagentDepth.

Codex P2 (real bug, fixed): toolSchemaTokens did not include the
subagent tool's schema. `calculateInstructionTokens()` was called
in `fromConfig()` before graphTools was populated, so the later-
injected subagent tool's schema was missing from the budget. Fixes:
  1. Include `graphTools` in the iteration in
     `calculateInstructionTokens` (also helps handoff tools, a
     pre-existing issue with the same root cause).
  2. Re-trigger the calculation at the end of `createAgentNode`
     after the subagent tool is pushed, so the token count reflects
     the final graphTools state before `createCallModel` awaits
     `tokenCalculationPromise`.

Added test asserting `toolSchemaTokens` is larger for agents with
subagents vs without, confirming the recalculation runs.

Test count: 1077 passed (up from 1074, +3 new tests), 0 regressions.

* fix: break circular dep between SubagentExecutor and StandardGraph

Rollup warned on build:
  Graph.ts → subagent/index.ts → SubagentExecutor.ts → Graph.ts

SubagentExecutor needed `StandardGraph` at runtime to construct the
child graph; Graph.ts needs SubagentExecutor to attach the tool. With
preserveModules, these land in separate chunks producing circular
chunk imports with undefined execution order.

Fix: dependency injection. SubagentExecutor now takes a required
`createChildGraph: ChildGraphFactory` option and only imports
`StandardGraph` as `import type` (erased at build). Graph.ts passes
`(input) => new StandardGraph(input)` when constructing the executor.
Runtime module graph is now acyclic; `import type` does not appear in
emitted JS.

Test updates: replaced `jest.spyOn(GraphModule, 'StandardGraph')`
(which relied on the runtime import we just removed) with direct mock
factories passed through the new option. Cleaner — tests no longer
need to reach into module internals. Same coverage.

Build: `npx rollup -c` now emits zero warnings.
Test count: 1077 passed, 0 regressions.

* chore: add multi-agent-subagent script to package.json

Introduced a new script entry for multi-agent-subagent, enabling execution of the corresponding TypeScript file with necessary configurations. This addition enhances the multi-agent functionality by allowing for more granular control over subagent operations.

* fix: isolate child subagent from parent's callback chain

The manual verification script (`npm run multi-agent-subagent`)
crashed with:

  Error: No agent context found for agent ID researcher
    at StandardGraph.getAgentContext
    at ChatModelStreamHandler.handle

Root cause: `Run.processStream` drives the parent via
`streamEvents`, which captures events from every nested runnable in
the call tree — including the child graph's LLM calls. When the
child's "researcher" agent streamed `on_chat_model_stream` events,
the parent's `ChatModelStreamHandler` received them and tried to
resolve the child's agent ID in the parent's `agentContexts` map,
which only contains the supervisor. Error thrown.

The Phase 1 plan was for child events to stay fully isolated. Unit
tests missed this because the child graph was always mocked (no
real `streamEvents` plumbing); only running the full pipeline
end-to-end surfaced it.

Fix: pass `callbacks: []` in the child `workflow.invoke()` config.
This overrides the inherited callbacks for that invocation, breaking
the event propagation chain. Also set `runName: subagent:<type>` and
a child-scoped `configurable.thread_id` so the child has a distinct
trace root rather than polluting the parent's trace.

Manual verification now works: supervisor delegates to researcher,
receives filtered text, synthesizes the final answer.

Test suite: 1079 passed, 0 subagent-suite regressions. The two newly
failing suites (ProgrammaticToolCalling / ToolSearch Live API tests)
are pre-existing — they require `LIBRECHAT_CODE_BASEURL` to be
configured and fail identically on the prior commit. They were
skipped before because OPENAI_API_KEY was absent.

* feat: align filterSubagentResult fallback with Claude Code

When a subagent's last AIMessage is pure tool_use (e.g. the model
hit maxTurns mid-tool-call), the previous implementation returned
the "Task completed" sentinel, discarding any textual progress from
earlier turns.

Claude Code's agentToolUtils.finalizeAgentTool walks backwards from
the last message, and on messages without text content continues
scanning earlier AIMessages, surfacing the most recent textual
thought instead of the sentinel.

This change matches that behavior. The three `return` sites inside
the loop that previously exited with the sentinel now `continue`
when the AIMessage yields no text; the loop exits with "Task
completed" only when no AIMessage in the history contains any text.

Concrete example:
  [0] Human: "What's the capital of France?"
  [1] AI:    "Let me search." + tool_use(search, ...)
  [2] Tool:  "Paris."
  [3] AI:    tool_use(search, ...)   <-- maxTurns, no text

  Before: "Task completed"
  After:  "Let me search."

Added two tests — the turn-limit scenario above, and the analogous
empty-string-content edge case. All nine existing filterSubagentResult
tests still pass unchanged.

* fix: update model name for non-Anthropic API to gpt-5.4

Changed the model name used in the multi-agent subagent script from 'gpt-4.1' to 'gpt-5.4' for improved performance and capabilities. This update ensures compatibility with the latest API offerings.
* feat: forward subagent child events and enable event-driven tools

Adds an opt-in event forwarding path for `SubagentExecutor` so host apps
that rely on event-driven tool dispatch (`toolDefinitions` +
`ON_TOOL_EXECUTE` handler) can use subagents — including self-spawn.

Before: a subagent in event-driven mode got `toolDefinitions` stripped in
`buildChildInputs` and ran with `callbacks: []`, so the LLM had no tool
schemas and ON_TOOL_EXECUTE never reached the host. The subagent replied
in prose without invoking any tools.

Now:
- `SubagentExecutor` accepts `parentHandlerRegistry` (a direct registry
  or a lazy getter — `Run` wires the registry onto the graph AFTER
  `createWorkflow`, so `createAgentNode` must capture lazily).
- When provided, `buildChildInputs` keeps `toolDefinitions`, and the
  child's invoke attaches a `BaseCallbackHandler` forwarder that:
    - routes ON_TOOL_EXECUTE to the parent's handler (event-driven tools
      work), and
    - wraps other custom events (run_step, run_step_delta,
      run_step_completed, message_delta, reasoning_delta) in a new
      `ON_SUBAGENT_UPDATE` envelope so hosts can render child progress
      in a separate UI surface.
- New `SubagentUpdateEvent` type and `ON_SUBAGENT_UPDATE` GraphEvents
  entry; start/stop/error envelopes fire around the child invoke.
- Legacy isolation is preserved when no registry is passed.

Test plan:
- `npx jest SubagentExecutor SubagentTool` — 36 pass (4 new covering
  forwarding, toolDefinition retention, legacy strip, lazy getter).
- Integration test against OpenAI (`subagent-event-driven-debug.ts`)
  confirms the child uses `calculator` via the parent's ON_TOOL_EXECUTE
  path and the host sees start/run_step/run_step_completed/stop
  ON_SUBAGENT_UPDATE envelopes.

* fix: gate child toolDefinitions on ON_TOOL_EXECUTE handler + propagate tool_call_id

Two follow-ups from review on #107:

1. Codex P1: `SubagentExecutor` was treating any non-null parent registry
   as "forwarding enabled" and always kept `toolDefinitions` on the
   child. Because `Run.create` always constructs a `HandlerRegistry`,
   this meant configurations with tools declared but no `ON_TOOL_EXECUTE`
   handler would hang forever — the child's `ToolNode` would dispatch a
   batch request and the forwarder would find no handler to resolve or
   reject it, leaving the promise pending.

   Now `keepToolDefinitions` is gated on
   `parentRegistry.getHandler(ON_TOOL_EXECUTE) != null`. A registry
   without the handler falls back to the legacy "strip toolDefinitions"
   path, which is a recoverable no-tools state instead of a hang.

2. `SubagentUpdateEvent` now carries `parentToolCallId`. `Graph.ts`
   pulls it from `config.toolCall.id` (LangChain threads the originating
   `ToolCall` onto `RunnableConfig`) when the subagent tool is invoked.
   This lets hosts correlate subagent updates to the parent tool call
   deterministically instead of inferring by event ordering — a concern
   raised on the LibreChat PR using this event stream.

Also bumps to `3.1.67-dev.2`.

Tests: `npx jest SubagentExecutor` — 38 pass (2 new covering the handler
gate and `parentToolCallId` propagation). Integration script verified
against real OpenAI: every ON_SUBAGENT_UPDATE envelope now carries the
parent's `tool_call_id`.

* chore: address audit findings — drop unused param, tighten cast, add tests

Addresses the comprehensive audit of #107:

### MAJOR #1 — ON_TOOL_EXECUTE forwarding coverage
Two new tests in the `event forwarding` suite drive the forwarder
callback directly with a synthesized `ToolExecuteBatchRequest`:
- `routes child ON_TOOL_EXECUTE dispatches through the parent registry`
  asserts the parent's `ON_TOOL_EXECUTE` handler is invoked, the batch's
  `resolve` fires with canned results, and the child receives them.
- `does NOT forward ON_TOOL_EXECUTE when the parent registry has no
  handler (safe fallback)` documents the companion case: without a
  handler the `resolve` never fires, which is why the executor gates
  `keepToolDefinitions` on handler presence in the first place.

### MINOR #2 — unused `childGraph` parameter
Removed from `createForwarderCallback`'s arg shape, destructuring, and
call site. It was accepted as `_childGraph` and never read.

### MINOR #3 — `data as never`
`EventHandler.handle` didn't include `ToolExecuteBatchRequest` in its
accepted data union, which is why the forwarder used `as never` to
shut the compiler up. Extended the union and switched the cast to the
concrete `ToolExecuteBatchRequest` type — the forwarder is now fully
type-checked on the hot path.

### MINOR #4 — error-phase envelope test
`emits an 'error' phase envelope when the child graph throws` uses
the throwing factory helper already in the file, registers a real
`ON_SUBAGENT_UPDATE` handler, and asserts start/error ordering plus
the `data.message` payload and `parentToolCallId` passthrough.

### MINOR #6 — dynamic imports in tests
Replaced `await import('@/events')` / `await import('@/common')` with
static top-level imports. Consistent with the rest of the file.

### MINOR #7 — `summarizeEvent` unit tests
Exported the pure function from `@/tools/subagent` and added 10 focused
tests covering every branch: tool_calls with one/many names, empty
tool_calls, message_creation, ON_TOOL_EXECUTE with/without names,
completed steps with/without a tool name, message_delta, and the
unknown-event fallback. Every user-visible label is now regression
protected.

### MINOR #8 — `config.toolCall` comment
Expanded the JSDoc to note the specific LangChain source
(`ToolRunnableConfig` in `@langchain/core/tools`, stable since 0.3.x)
and that the defensive read degrades to `undefined` on upstream
changes.

### NIT #9 — unused import
Removed `createContentAggregator` from `subagent-tools-debug.ts`.

### NIT #10 — `awaitHandlers = true` doc
Added a JSDoc block explaining the trade-off: required for
`ON_TOOL_EXECUTE` to actually block on the host's resolve, but it
also serializes observational events through any slow host handlers.
Refinement path noted for later if host-side latency becomes a problem.

Version bumped to `3.1.67-dev.3`.

Tests: 51 pass (40 SubagentExecutor + 10 new summarizeEvent + 1 new
error-phase + existing coverage retained).

* fix: clear parent run summary and discoveredTools from child inputs

Paired with Codex P1 on the LibreChat PR (danny-avila/LibreChat#12725)
that uses this event stream. `buildChildInputs` shallow-spreads the
parent's `AgentInputs`, which carries two run-scoped fields that leak
into the isolated child by default:

- `initialSummary` — cross-run conversation summary. On a conversation
  that has already been summarized, every child would inherit it and
  reason against unrelated prior chat.
- `discoveredTools` — tool names the parent's LLM searched for in
  earlier turns via `tool_search`. The child gets primed to use
  whatever the parent happened to find.

Both are cleared on the returned child inputs. Configuration fields
(summarization policy, provider options, context pruning config) are
preserved — those are policy, not context.

Affects both paths:
- Self-spawn: `resolveSubagentConfigs` clones parent `_sourceInputs`,
  then `buildChildInputs` strips.
- Explicit children: host-built `AgentInputs` goes through the same
  `buildChildInputs` before the child graph runs.

Version bumped to `3.1.67-dev.4`.

Regression test: `strips parent-run-scoped initialSummary and
discoveredTools from child inputs` — seeds parent inputs with
`initialSummary: { text, tokenCount }` and `discoveredTools: [...]`,
asserts both are `undefined` on the returned child. 52 pass.

* test: cover summarizeEvent step.type fallback branches

Addresses reviewer F2.1 (NIT) on #107. `summarizeEvent` resolves
run-step detail type via `step.stepDetails?.type ?? step.type ?? 'step'`.
The existing tests all passed `{ stepDetails: { type } }`, leaving the
second and third clauses of the chain uncovered.

Two new tests:
- top-level `step.type` (no `stepDetails` wrapper) for both
  `tool_calls` and `message_creation` — exercises the second clause
- empty `{}` payload — exercises the `'step'` default and the generic
  `Step: <detailType>` branch

Carried MINOR #5 (`SubagentUpdateEvent.data: unknown`) left as a
documented deferral; the JSDoc already notes "shape depends on phase"
and a discriminated union is a larger refactor.

54/54 pass.

* chore: add npm run scripts for subagent debug scripts

- `npm run subagent` — supervisor + researcher/coder subagent demo
  (no tool use; from #104)
- `npm run subagent:events` — event-driven flow matching LibreChat's
  architecture (toolDefinitions + ON_TOOL_EXECUTE handler + self-spawn
  with calculator). This is the script that demonstrates the #107 fix.
- `npm run subagent:tools` — traditional LangChain tool instances with
  self-spawn; pre-fix parity case.

All three accept `OPENAI_API_KEY` (and `--provider anthropic` for the
first).
* 🗑️ refactor: Remove CODE_API_KEY completely from the sandbox tool layer

Phase 8 (LibreChat danny-avila/LibreChat#12767) deprecates
`LIBRECHAT_CODE_API_KEY` across every LibreChat-side reference. The
sandbox service now authenticates consumers via other channels (network
boundary, service token, etc.), so the library layer can drop the
concept entirely instead of treating it as an optional forwarding
parameter.

**What's removed:**

- `EnvVar.CODE_API_KEY` enum value (the `CODE_BASEURL` entry stays).
- `apiKey` field and `[EnvVar.CODE_API_KEY]` indexer on
  `CodeExecutionToolParams`, `ProgrammaticToolCallingParams`, and
  `ToolSearchParams`. `BashExecutionToolParams` /
  `BashProgrammaticToolCallingParams` inherit the cleaner shapes.
- The precedence chain (`params[EnvVar.CODE_API_KEY] ?? params.apiKey ??
  getEnvironmentVariable(EnvVar.CODE_API_KEY) ?? ''`) gone from all
  five tool factories — no key resolution happens at all.
- Every `X-API-Key` outbound header in `BashExecutor.ts`,
  `CodeExecutor.ts`, `ProgrammaticToolCalling.ts`
  (fetchSessionFiles + makeRequest + factory), and `ToolSearch.ts`.
- `apiKey` parameter on `fetchSessionFiles(baseUrl, sessionId, proxy)`
  and `makeRequest(endpoint, body, proxy)` — both simplified to the
  remaining args.
- Throws in five places: "No API key provided for bash execution
  tool.", "...for code execution tool.", "...for programmatic tool
  calling.", "...for bash programmatic tool calling.", "...for tool
  search in code_interpreter mode."
- Dangling `getEnvironmentVariable` imports in
  BashExecutor/BashProgrammaticToolCalling/ProgrammaticToolCalling/
  ToolSearch (still used by CodeExecutor for `CODE_BASEURL`).

**Scripts:**

- `programmatic_exec.ts`, `tool_search.ts`: remove
  `LIBRECHAT_CODE_API_KEY` env reads + pre-flight `process.exit(1)`
  guards + `{ apiKey }` passed to the factories.
- `test_code_api.ts`: drop `API_KEY` constant + `X-API-Key` header.
- Integration tests (`ProgrammaticToolCalling.integration.test.ts`,
  `ToolSearch.integration.test.ts`): the skip gate flips from
  "key present" to an explicit `RUN_CODE_INTEGRATION_TESTS=1` opt-in.
  Live-sandbox tests don't run by default in CI regardless of auth.

**Backward compat:**

Consumers that previously passed `apiKey` in params now get a TS error
on that field — the correct remediation is to drop the arg. No runtime
behavior remains for stragglers because the header is gone.

**Verification:**

- `grep -rn "CODE_API_KEY\|LIBRECHAT_CODE_API_KEY" src/` returns 0 hits.
- `grep -rn "X-API-Key" src/tools/` returns only the SEARX search tool
  header (unrelated feature with its own optional apiKey).
- `npx tsc --noEmit`: clean
- `npx jest src/tools`: 379 passing, 23 skipped (integration gated on
  the new opt-in var). No regressions.
- `npx eslint` clean on every touched file.

* 📦 chore: bump version to 3.1.68-dev.1

* 🧹 chore: Remove stale LIBRECHAT_CODE_API_KEY from CI + publish workflows

Follow-up to the hard removal (bf64da3). Both workflow files still
passed `LIBRECHAT_CODE_API_KEY: ${{ secrets.LIBRECHAT_CODE_API_KEY }}`
to the "Run PTC unit tests" and "Run Tool Search unit tests" steps —
4 references total. No code in the repo reads the env var anymore, so
this was dead config that would mislead future maintainers. The
LIBRECHAT_CODE_BASEURL var stays (still read by getCodeBaseURL).

`grep -rn "CODE_API_KEY\|LIBRECHAT_CODE_API_KEY" .` (excluding
node_modules) now returns zero hits across the whole repo.
@pull pull Bot locked and limited conversation to collaborators Apr 23, 2026
@pull pull Bot added the ⤵️ pull label Apr 23, 2026
@pull pull Bot merged commit 7df6ad1 into innFactory:main Apr 23, 2026
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant