fix(google): use realtime input for Gemini 3.1 generateReply #1199

Open
rahulmanuwas wants to merge 1 commit into livekit:main from rahulmanuwas:prototype/google-gemini31-generate-reply

Conversation

@rahulmanuwas
Contributor

rahulmanuwas commented Apr 3, 2026

Summary

  • fixes generateReply() for gemini-3.1-flash-live-preview by avoiding mid-session sendClientContent(...)
  • uses sendRealtimeInput({ text: instructions ?? "." }) for Gemini 3.1 continuation instead
  • preserves function_call_output delivery by still sending restricted-model tool results as tool_response
  • makes the current limitation explicit: restricted-model non-tool chat history is local-only on the current JS SDK path
  • adds regression coverage and a patch changeset for @livekit/agents-plugin-google

Problem

generateReply() was using a Gemini 2.5-style continuation flow based on mid-session sendClientContent(...). That path works on 2.5, but gemini-3.1-flash-live-preview rejects it with the error "Request contains an invalid argument."

What This PR Changes

For Gemini 3.1, generateReply() now triggers generation with sendRealtimeInput(...) instead of synthesizing a fake clientContent turn.
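
The model-dependent dispatch described above could be sketched roughly as follows. This is a minimal illustration, not the plugin's code: only the method names sendClientContent and sendRealtimeInput and the `instructions ?? '.'` fallback come from this PR. The LiveSessionLike interface, the prefix-based isRestrictedModel check, the shape of the synthesized clientContent turn, and the RecordingSession fake are all assumptions made for the example.

```typescript
// Hypothetical payload shape for the 2.5-style path (assumption, not the SDK type).
type ClientContentParams = {
  turns: { role: string; parts: { text: string }[] }[];
  turnComplete: boolean;
};

// Hypothetical stand-in for the live session. Only the two method names
// come from the PR description.
interface LiveSessionLike {
  sendClientContent(params: ClientContentParams): void;
  sendRealtimeInput(params: { text: string }): void;
}

// Assumption: restricted models are recognized by a name prefix.
// The real check in the plugin may differ.
function isRestrictedModel(model: string): boolean {
  return model.startsWith('gemini-3.1');
}

type ReplyPath = 'realtimeInput' | 'clientContent';

// Choose how to trigger generation based on the model family.
function triggerReply(
  session: LiveSessionLike,
  model: string,
  instructions?: string,
): ReplyPath {
  const text = instructions ?? '.';
  if (isRestrictedModel(model)) {
    // Gemini 3.1: no mid-session clientContent, use realtime text input.
    session.sendRealtimeInput({ text });
    return 'realtimeInput';
  }
  // Gemini 2.5-style continuation: a synthesized client turn (assumed shape).
  session.sendClientContent({
    turns: [{ role: 'user', parts: [{ text }] }],
    turnComplete: true,
  });
  return 'clientContent';
}

// Recording fake used to observe which path was taken.
class RecordingSession implements LiveSessionLike {
  calls: { method: string; text: string }[] = [];

  sendClientContent(params: ClientContentParams): void {
    const text = params.turns[0]?.parts[0]?.text ?? '';
    this.calls.push({ method: 'sendClientContent', text });
  }

  sendRealtimeInput(params: { text: string }): void {
    this.calls.push({ method: 'sendRealtimeInput', text: params.text });
  }
}
```

With the fake, `triggerReply(new RecordingSession(), 'gemini-3.1-flash-live-preview')` records a single sendRealtimeInput call carrying the "." placeholder, while a 2.5 model name records a sendClientContent call instead.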

updateChatCtx() is split intentionally for restricted models:

  • function_call_output items still go out as tool_response
  • non-tool history updates stay local-only and emit a warning instead of pretending they were applied server-side
  • stale queued content events are dropped as a safety net
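
A rough sketch of the split described in the list above, under stated assumptions: the ChatItem, ToolResponse, SplitResult, and QueuedEvent types and both function names are hypothetical and only illustrate the routing rule (tool outputs go out as tool responses, everything else stays local with a warning, queued content events are discarded). The actual item and event shapes in the plugin are not shown in this PR description.

```typescript
// Hypothetical chat item shapes (assumption for illustration).
type ChatItem =
  | { type: 'message'; role: 'user' | 'assistant'; text: string }
  | { type: 'function_call_output'; callId: string; name: string; output: string };

// Hypothetical tool response payload (assumed shape).
interface ToolResponse {
  id: string;
  name: string;
  response: { output: string };
}

interface SplitResult {
  toolResponses: ToolResponse[]; // would be sent to the server
  localOnly: ChatItem[]; // kept in local chat context only
  warnings: string[]; // surfaced instead of silently dropping history
}

// Route chat-context updates for a restricted model.
function splitForRestrictedModel(items: ChatItem[]): SplitResult {
  const result: SplitResult = { toolResponses: [], localOnly: [], warnings: [] };
  for (const item of items) {
    if (item.type === 'function_call_output') {
      result.toolResponses.push({
        id: item.callId,
        name: item.name,
        response: { output: item.output },
      });
    } else {
      result.localOnly.push(item);
    }
  }
  if (result.localOnly.length > 0) {
    result.warnings.push(
      `${result.localOnly.length} non-tool chat item(s) kept local-only; not sent to the server`,
    );
  }
  return result;
}

// Hypothetical outbound queue entry (assumption for illustration).
type QueuedEvent = {
  kind: 'content' | 'tool_response' | 'realtime_input';
  payload: unknown;
};

// Safety net: discard any queued content events, keep everything else in order.
function dropStaleContentEvents(queue: QueuedEvent[]): QueuedEvent[] {
  return queue.filter((event) => event.kind !== 'content');
}
```

Returning the warnings rather than logging them directly keeps the sketch free of any logger dependency; a real implementation would presumably emit them through the plugin's own logging.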

Validation

  • pnpm exec vitest run plugins/google/src/beta/realtime/realtime_api.test.ts
  • pnpm --filter @livekit/agents-plugin-google... build
  • manual runtime validation on April 4, 2026 against this branch

Manual generateReply() results:

  • gemini-3.1-flash-live-preview with instructions: success
  • gemini-3.1-flash-live-preview without instructions: success
  • gemini-2.5-flash-native-audio-preview-12-2025 control path: success

Explicit Non-Goals

This PR does not claim full server-side chat-history support for Gemini 3.1 on the current JS SDK path. Initial or updated non-tool history still needs a supported provider/SDK mechanism.

Refs #1197.

@changeset-bot

changeset-bot bot commented Apr 3, 2026

🦋 Changeset detected

Latest commit: 81139ae

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 22 packages
Name Type
@livekit/agents-plugin-google Patch
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch
@livekit/agents-plugins-test Patch

@CLAassistant

CLAassistant commented Apr 3, 2026

CLA assistant check
All committers have signed the CLA.

@rahulmanuwas
Contributor Author

rahulmanuwas commented Apr 3, 2026

Historical note from April 3, 2026. This comment reflects an earlier draft state before the later fixes in this PR branch and is kept only as debugging context. The current branch-state summary is the newer comment on this PR.

At the time of this note:

  • raw Gemini 3.1 Live worked for direct sendRealtimeInput({ text: ... })
  • but the sendClientContent(...) path used by the earlier prototype did not
  • with historyConfig.initialHistoryInClientContent = true, the session still closed with the error "Request contains an invalid argument."

That earlier failure is why the branch later moved away from the 2.5-style mid-session clientContent continuation strategy for Gemini 3.1.

rahulmanuwas changed the title from "feat(google): prototype Gemini 3.1 realtime generateReply support" to "spike(google): probe Gemini 3.1 realtime continuation path" on Apr 3, 2026
rahulmanuwas force-pushed the prototype/google-gemini31-generate-reply branch from 421ac56 to b575539 on April 4, 2026 02:59
Contributor Author

rahulmanuwas commented Apr 4, 2026

Historical note from early April 4, 2026. This comment reflects an intermediate branch state before the later fixes in this PR branch and is kept only as debugging context. The current branch-state summary is the newer comment on this PR.

At the time of this note, a docs-aligned Gemini 3.1 history-seeding attempt still failed, which narrowed the problem away from just turnComplete handling. That intermediate result led to the current narrower fix: stop using mid-session sendClientContent(...) for Gemini 3.1 generateReply() and use sendRealtimeInput(...) instead.

Contributor Author

rahulmanuwas commented Apr 4, 2026

Maintainer note: this comment is the current branch-state summary and supersedes the earlier April 3/4 exploratory notes above.

Head: 81139aef on April 4, 2026. The branch was squashed into a single commit for review hygiene; no code changed in that history cleanup.

Current scope of this PR:

  • fixes the narrow Gemini 3.1 generateReply() failure by avoiding mid-session sendClientContent(...)
  • uses sendRealtimeInput({ text: instructions ?? '.' }) for Gemini 3.1 continuation
  • preserves restricted-model tool-result delivery by still sending function_call_output items as tool_response
  • states the remaining limitation explicitly: non-tool chat history for restricted models is still local-only on the current JS SDK path
  • includes a patch changeset for @livekit/agents-plugin-google

Validation for the code on this branch:

  • pnpm exec vitest run plugins/google/src/beta/realtime/realtime_api.test.ts
  • pnpm --filter @livekit/agents-plugin-google... build

Manual runtime status for generateReply() on this branch:

  • gemini-3.1-flash-live-preview with instructions: success
  • gemini-3.1-flash-live-preview without instructions: success
  • gemini-2.5-flash-native-audio-preview-12-2025 control path: success

So the branch is now intentionally narrower than the earlier spike comments: it fixes generateReply() for Gemini 3.1, keeps tool outputs flowing, does not overclaim chat-history support, and now presents as a single review commit.

rahulmanuwas changed the title from "spike(google): probe Gemini 3.1 realtime continuation path" to "fix(google): use realtime input for Gemini 3.1 generateReply" on Apr 4, 2026
rahulmanuwas marked this pull request as ready for review on April 4, 2026 04:28
devin-ai-integration[bot]

This comment was marked as resolved.

rahulmanuwas force-pushed the prototype/google-gemini31-generate-reply branch from 030ffbb to 81139ae on April 4, 2026 14:11
Contributor Author

After livekit/agents#5413 merged on April 11, 2026, I checked agents-js main again. plugins/google/src/beta/realtime/realtime_api.ts still has the older Gemini 3.1 guards in updateInstructions, updateChatCtx, and generateReply, so the JS side still appears to have the gap described in #1197.

From what I can tell, #1229 is a narrower draft mirror of the Python change. That makes sense as a parity step, but it does not appear to address the generateReply() failure from #1197.

#1199 was intended to cover that remaining JS gap: switch Gemini 3.1 continuation to sendRealtimeInput(...), preserve restricted-model tool_response delivery, keep the current chat-history limitation explicit, and include regression coverage plus a changeset.

If you'd prefer this split differently for review, I'm happy to do that. I can break #1199 into a narrower parity PR and a follow-up PR for the generateReply() fix and tests. Otherwise, I think #1199 still addresses behavior that is missing on main today.
