@chenghao-mou
Member

@chenghao-mou chenghao-mou commented Jan 26, 2026

This adds commit_user_turn support for realtime models, which allows users to use turn_detection="manual" with a realtime model.

Summary by CodeRabbit

  • New Features
    • Added explicit "commit user turn" for real-time agent sessions to finalize user turns.
    • OpenAI provider: full commit triggers response creation.
    • Google/AWS/Ultravox providers: method present but logs warnings or acts as a placeholder per provider support.
    • Voice agent: when a realtime session is active, commits use the realtime path instead of audio-only processing.

@chenghao-mou chenghao-mou requested a review from a team January 26, 2026 15:44
@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Walkthrough

Adds an abstract commit_user_turn() to RealtimeSession and implements it across realtime provider plugins; agent_activity now delegates to the realtime session when present, bypassing the audio-recognition commit path. Provider implementations either perform turn-finalization (OpenAI) or log unsupported warnings.
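The delegation described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: every class name ending in `Sketch`, plus `CommittingSession` and `UnsupportedSession`, is a hypothetical stand-in, and only the `commit_user_turn()` method name comes from the change itself.

```python
from abc import ABC, abstractmethod
from typing import Optional
import logging

logger = logging.getLogger("realtime-sketch")


class RealtimeSessionSketch(ABC):
    """Hypothetical stand-in for RealtimeSession; only the new method is modeled."""

    @abstractmethod
    def commit_user_turn(self) -> None:
        """Finalize the current user turn."""


class CommittingSession(RealtimeSessionSketch):
    """Models a provider that can finalize a turn (the OpenAI case)."""

    def __init__(self) -> None:
        self.commits = 0

    def commit_user_turn(self) -> None:
        self.commits += 1


class UnsupportedSession(RealtimeSessionSketch):
    """Models a provider that only logs a warning (AWS, Google, Ultravox)."""

    def commit_user_turn(self) -> None:
        logger.warning("commit_user_turn is not supported by this realtime API")


class AudioRecognitionSketch:
    """Hypothetical stand-in for the audio-recognition commit path."""

    def __init__(self) -> None:
        self.commits = 0

    def commit_user_turn(self) -> None:
        self.commits += 1


class AgentActivitySketch:
    """Delegates to the realtime session when one exists, else falls back."""

    def __init__(
        self,
        audio_recognition: AudioRecognitionSketch,
        rt_session: Optional[RealtimeSessionSketch] = None,
    ) -> None:
        self._audio_recognition = audio_recognition
        self._rt_session = rt_session

    def commit_user_turn(self) -> None:
        if self._rt_session is not None:
            # Realtime path: the audio-recognition commit is bypassed.
            self._rt_session.commit_user_turn()
            return
        self._audio_recognition.commit_user_turn()
```

Because the method is abstract on the base class, every provider plugin has to define it, which is presumably why the unsupported providers gain a warning-only implementation rather than being left untouched.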

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Abstract Interface & Agent Layer | livekit-agents/livekit/agents/llm/realtime.py, livekit-agents/livekit/agents/voice/agent_activity.py | Adds abstract commit_user_turn() to RealtimeSession. agent_activity.commit_user_turn() now calls _rt_session.commit_user_turn() if present, otherwise falls back to AudioRecognition flow. |
| AWS Realtime Plugin | livekit-plugins/livekit-plugins-aws/.../realtime/realtime_model.py | Adds commit_user_turn() that logs a warning stating Nova Sonic Realtime API does not support user-turn commit. |
| Google Realtime Plugin | livekit-plugins/livekit-plugins-google/.../realtime/realtime_api.py | Adds commit_user_turn() and changes commit_audio() / clear_audio() to log warnings for Gemini Realtime API unsupported actions. |
| OpenAI Realtime Plugin (stable & beta) | livekit-plugins/livekit-plugins-openai/.../realtime/realtime_model.py, .../realtime_model_beta.py | Implements commit_user_turn() to warn on auto-response/turn-detection combos, call commit_audio(), and emit a ResponseCreateEvent (empty params) to finalize the user turn. |
| Ultravox Realtime Plugin | livekit-plugins/livekit-plugins-ultravox/.../realtime/realtime_model.py | Adds commit_user_turn() that logs unsupported warning; replaces push_video() no-op with a warning log. |

Sequence Diagram(s)

mermaid
sequenceDiagram
participant AgentActivity as AgentActivity
participant RTSession as RealtimeSession
participant AudioRec as AudioRecognition
Note over AgentActivity,RTSession,AudioRec: User turn commit decision flow
AgentActivity->>RTSession: commit_user_turn()
alt RT session exists
RTSession-->>AgentActivity: commit_user_turn handled
else No RT session
AgentActivity->>AudioRec: commit_user_turn(audio_detached, timeout)
AudioRec-->>AgentActivity: audio commit result
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I thumped my paw and tapped the clock,
A turn is closed, no more to talk,
Plugins nod, some warn, some send,
A tiny hop to mark the end,
— rabbit jubilation 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately describes the main change: adding commit_user_turn support across realtime models, which is the primary focus of all file modifications in this changeset. |


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 267bea1 and dd8f80a.

📒 Files selected for processing (2)
  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py
  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model_beta.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model_beta.py
🧠 Learnings (1)
📚 Learning: 2026-01-19T23:21:47.799Z
Learnt from: vishal-seshagiri-infinitusai
Repo: livekit/agents PR: 4559
File: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/responses/llm.py:122-123
Timestamp: 2026-01-19T23:21:47.799Z
Learning: Note from PR `#4559`: response_format was added as a passthrough to the OpenAI Responses API in livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/responses/llm.py, but this is scoped only to the Google provider and not for OpenAI. Reviewers should ensure that this passthrough behavior is gated by the provider (Google) and that OpenAI paths do not inadvertently reuse the same passthrough. Consider adding explicit provider checks, and update tests to verify that only the Google provider uses this passthrough while the OpenAI provider ignores it.

Applied to files:

  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model_beta.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model_beta.py (4)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py (3)
  • commit_user_turn (1295-1310)
  • commit_audio (1286-1289)
  • send_event (691-693)
livekit-plugins/livekit-plugins-ultravox/livekit/plugins/ultravox/realtime/realtime_model.py (2)
  • commit_user_turn (1141-1142)
  • commit_audio (1135-1136)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/realtime/realtime_api.py (2)
  • commit_user_turn (1233-1234)
  • commit_audio (1227-1228)
livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realtime/realtime_model.py (2)
  • commit_user_turn (2008-2009)
  • commit_audio (2002-2003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.9)
  • GitHub Check: type-check (3.13)
🔇 Additional comments (1)
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model_beta.py (1)

1123-1138: LGTM! Implementation is correct and consistent with the non-beta version.

The commit_user_turn method correctly:

  1. Warns when VAD auto-response is enabled (which could conflict with manual turn commits)
  2. Commits any buffered audio via commit_audio()
  3. Sends a ResponseCreateEvent to trigger the model response

The use of Response() is appropriate for the beta API (vs RealtimeResponseCreateParams() in the non-beta version).

One minor note: line 1129 exceeds the 100-character limit per coding guidelines (~110 chars), but this matches the pattern in the non-beta implementation and breaking the string would reduce readability. If the linter flags it, consider using a shorter message or a line continuation.
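The three steps listed in this review can be sketched as below. This is an assumption-laden illustration rather than the plugin code: `OpenAISessionSketch` is a hypothetical class that records events as plain dicts instead of sending typed events over a websocket, and the exact warning condition is a guess based on the `create_response` field of the turn-detection config.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("openai-realtime-sketch")


class OpenAISessionSketch:
    """Hypothetical session that records outgoing events instead of sending them."""

    def __init__(self, turn_detection: Optional[Dict[str, Any]] = None) -> None:
        self._turn_detection = turn_detection
        self.sent: List[Dict[str, Any]] = []

    def send_event(self, event: Dict[str, Any]) -> None:
        self.sent.append(event)

    def commit_audio(self) -> None:
        # Commit whatever audio is currently in the input buffer.
        self.send_event({"type": "input_audio_buffer.commit"})

    def commit_user_turn(self) -> None:
        td = self._turn_detection
        # Assumed condition: server-side turn detection with automatic
        # responses could race with a manual commit, so warn about it.
        if td is not None and td.get("create_response", True):
            logger.warning("turn detection may auto-respond; manual commit can conflict")
        # Step 2: commit buffered audio.
        self.commit_audio()
        # Step 3: ask the model to respond, with empty response params.
        self.send_event({"type": "response.create", "response": {}})
```

With turn detection disabled, a call produces exactly two events in order: the buffer commit followed by the response request.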

Contributor

@bml1g12 bml1g12 left a comment

This looks good to me, as it means the docs at https://docs.livekit.io/agents/logic/turns/#manual would also apply to realtime models.

It might make sense to have clear_user_turn() also call self.clear_audio() for realtime models? I say this because I think https://docs.livekit.io/agents/logic/turns/#manual would then fully apply:

# When user starts speaking
@ctx.room.local_participant.register_rpc_method("start_turn")
async def start_turn(data: rtc.RpcInvocationData):
    session.interrupt()  # Stop any current agent speech
    session.clear_user_turn()  # Clear any previous input
    session.input.set_audio_enabled(True)  # Start listening

For cascaded models, clear_user_turn() clears any previous model input, but for realtime models I think we also need to clear the audio.

response=RealtimeResponseCreateParams(),
)
)
self.clear_audio()
Contributor

Why is clear_audio needed here?

Member Author

I know, right? It seems redundant, but it is required for OpenAI according to their doc:
[Screenshot of the OpenAI documentation, 2026-01-27]

Contributor

@longcw longcw Jan 27, 2026

Send input_audio_buffer.clear before beginning a new user input.

I think it means you need to clear the buffer before the next time you start a new user speech; it doesn't mean a clear is required after response.create for this turn.

Maybe it's similar to the session.clear_user_turn in the example @bml1g12 mentioned above:

# When user starts speaking
@ctx.room.local_participant.register_rpc_method("start_turn")
async def start_turn(data: rtc.RpcInvocationData):
    session.interrupt()  # Stop any current agent speech
    session.clear_user_turn()  # Clear any previous input
    session.input.set_audio_enabled(True)  # Start listening

Member Author

@chenghao-mou chenghao-mou Jan 27, 2026

Yeah, I can put that call in the clear_user_turn part.

Turns out we don't need this if we call session.clear_user_turn.

Contributor

@bml1g12 bml1g12 Jan 27, 2026

Ah yes, clear_user_turn already calls clear_audio() under the hood, it seems. So you probably want to clear audio when you start the new turn, not when you end it, which means the extra clear_audio() is probably not needed in this PR.
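The conclusion of this thread (clear at the start of a turn, commit at the end) can be illustrated with a small self-contained sketch. Everything here is a hypothetical stand-in: `RealtimeBufferSketch` and `SessionSketch` only model the call order discussed above, and the sketch keeps frames in the buffer after a commit purely for simplicity.

```python
from typing import List


class RealtimeBufferSketch:
    """Hypothetical realtime session that tracks buffered audio frames."""

    def __init__(self) -> None:
        self.buffer: List[bytes] = []
        self.log: List[str] = []

    def push_audio(self, frame: bytes) -> None:
        self.buffer.append(frame)

    def clear_audio(self) -> None:
        self.buffer.clear()
        self.log.append("clear")

    def commit_user_turn(self) -> None:
        # Record how many frames belonged to the committed turn.
        self.log.append(f"commit:{len(self.buffer)}")


class SessionSketch:
    """Hypothetical agent session wrapping the realtime session."""

    def __init__(self, rt: RealtimeBufferSketch) -> None:
        self._rt = rt

    def clear_user_turn(self) -> None:
        # As noted in the thread, clearing the turn also clears the audio.
        self._rt.clear_audio()

    def commit_user_turn(self) -> None:
        # No clear here: the next start_turn is responsible for that.
        self._rt.commit_user_turn()


def start_turn(session: SessionSketch) -> None:
    session.clear_user_turn()


def end_turn(session: SessionSketch) -> None:
    session.commit_user_turn()
```

In this model, audio left over from before `start_turn` never reaches the committed turn, so a second clear inside `commit_user_turn` would add nothing.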
