@Ludobaka Ludobaka commented Jan 23, 2026

This PR fixes an issue where end-of-turn (EOT) detection fails when using ElevenLabs' Scribe V2 STT model with the MultilingualModel turn detector.

Problem

When running an agent with the scribe_v2 model and turn detection enabled via MultilingualModel, EOT detection does not work. The turn detector rejects the language code returned by the STT service:

12:08:15.241 DEBUG  livekit.agents     received user transcript
                                       {"user_transcript": "Hi, I'm just testing things.", "language": "eng", ...}
12:08:15.245 INFO   livekit.agents     Turn detector does not support language eng {"room": "console"}

Root Cause

The EOT model expects the ISO 639-1 language code en, but ElevenLabs Scribe V2 returns the ISO 639-3 code eng (see ElevenLabs API reference).

Solution

Normalize the language code in livekit-agents/livekit/agents/voice/audio_recognition.py when setting self._last_language.
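
The mismatch and the fix can be sketched in a few lines. This is a minimal illustration, not the code from the PR: `SUPPORTED_LANGUAGES`, `supports_language`, and `normalize` are hypothetical stand-ins for the turn detector's language check and the normalization step.

```python
from typing import Optional

# Hypothetical stand-in for the ISO 639-1 codes the turn detector accepts.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de"}


def normalize(language: Optional[str]) -> Optional[str]:
    """Map the ISO 639-3 code "eng" to ISO 639-1 "en"; pass anything else through."""
    return "en" if language == "eng" else language


def supports_language(language: Optional[str]) -> bool:
    """Hypothetical check mirroring the "does not support language" log line."""
    return language in SUPPORTED_LANGUAGES


raw = "eng"  # the code reported in the transcript event above
print(supports_language(raw))             # False: the raw code is rejected
print(supports_language(normalize(raw)))  # True: the normalized code is accepted
```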

Why this approach?

Handling normalization at the audio_recognition.py level (rather than in the ElevenLabs plugin) ensures consistent behavior across all STT providers that may return non-standard language codes.

Of course, any recommendations are welcome, and I'll be glad to improve this proposal.

Summary by CodeRabbit

  • New Features
    • Added a public language-code mapping and helper for ISO 639-3 → ISO 639-1 conversions.
  • Improvements
    • Applied language-code normalization across speech-to-text paths so detected and streamed language labels are more consistent.
  • Chores
    • No breaking changes; only new constants/helpers were added.

CLAassistant commented Jan 23, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

coderabbitai bot commented Jan 23, 2026

Walkthrough

Adds a new language mapping module exposing ISO_639_3_TO_1 and iso639_3_to_1; updates STT code to normalize language codes in both batch (_recognize_impl) and streaming (_process_stream_event) paths so normalized codes are used in transcription events and SpeechData.language fields.

Changes

Cohort / File(s) Summary
Language mapping module
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py
New file adding ISO_639_3_TO_1 dictionary and `iso639_3_to_1(code: str
STT language normalization
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
Use iso639_3_to_1 to compute normalized_language and pass it into transcription construction in _recognize_impl; compute and use normalized_language in streaming path so SpeechData.language is set to the normalized code. No external function signatures changed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nibbled three letters down to one,
A hop, a tweak, the mapping's done,
I whisper codes both streaming and batch,
So transcripts match without a scratch,
Tiny paws, tidy language run. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title describes fixing an issue with ElevenLabs Scribe v2 and EOT prediction, which aligns with the actual changes that normalize language codes and enable EOT detection to work properly with the MultilingualModel turn detector.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around line 413-414: normalized_language is computed but never used: after
computing normalized_language ("en" if language == "eng" else language) the code
assigns self._last_language = language, so normalization isn't applied for
PREFLIGHT_TRANSCRIPT events; change the assignment to use normalized_language
instead (assign self._last_language = normalized_language) in the same block
where normalized_language is computed so downstream logic uses the normalized
code (refer to normalized_language and self._last_language in the audio
recognition handling code).
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)

353-354: Consider a more robust language code normalization approach.

The current fix only handles "eng" → "en", but other STT providers (or ElevenLabs for other languages) may return additional ISO 639-3 codes (e.g., "spa", "fra", "deu"). A utility function or a library such as langcodes or pycountry would provide comprehensive normalization.

♻️ Suggested helper function
# Could be added to a utils module
from __future__ import annotations  # keeps `str | None` annotations valid on Python 3.9
def normalize_language_code(language: str | None) -> str | None:
    """Normalize ISO 639-3 codes to ISO 639-1 where applicable."""
    if language is None:
        return None
    # Common ISO 639-3 to ISO 639-1 mappings
    ISO_639_3_TO_1 = {
        "eng": "en",
        "spa": "es",
        "fra": "fr",
        "deu": "de",
        "ita": "it",
        "por": "pt",
        "rus": "ru",
        "zho": "zh",
        "jpn": "ja",
        "kor": "ko",
        # Add more as needed
    }
    return ISO_639_3_TO_1.get(language, language)

Then usage becomes:

-                normalized_language = "en" if language == "eng" else language
-                self._last_language = normalized_language
+                self._last_language = normalize_language_code(language)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and 3d38116.

📒 Files selected for processing (1)
  • livekit-agents/livekit/agents/voice/audio_recognition.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-agents/livekit/agents/voice/audio_recognition.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)

language and len(transcript) > MIN_LANGUAGE_DETECTION_LENGTH
):
self._last_language = language
normalized_language = "en" if language == "eng" else language
Contributor

Perhaps move this to the 11labs plugin?

Author

@Ludobaka Ludobaka Jan 23, 2026

That was my initial thought as well, but I went with this approach for resilience: any other STT service that returns eng instead of en would be handled automatically. It also opens the way to use a normalizing function later in case other languages have the issue.

That said, I'm happy to move the normalization to the ElevenLabs plugin if you think that's the more appropriate place for it.

Author

@Ludobaka Ludobaka Jan 26, 2026

@longcw Any conclusion?

As soon as I get confirmation of one approach or the other, I'll apply any needed changes to the PR.

Member

Each vendor/plugin should take care of its conversion/translation to standard values. The duplicated code is acceptable at this level at this point.

We can definitely optimize this later when this pattern becomes actually common in plugins.

"tur": "tr",
"rus": "ru",
"hin": "hi",
}
Member

One small thing: we should include all the languages and let the EOT model decide whether each one is supported. This information might be used in other places or components where language matters (TTS/LLM). We can also add a comment linking to https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages

You can definitely create a separate languages.py to list all the values.

Author

Since ISO 639-3 covers thousands of languages, could we use a Python library for this instead of handling every case manually? I found pycountry, which might do the job.

Another way would be to only consider Elevenlabs supported languages and manually implement the mapping.

Feel free to suggest any other library you prefer or ask for a manual dictionary implementation (either full ISO 639-3 or only the ones supported by 11Labs).

Member

@chenghao-mou chenghao-mou Jan 26, 2026

Sorry about the confusion: I meant all the languages that 11labs supports.
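
For illustration, a languages.py along these lines might look as follows. This is a minimal sketch, not the merged file: the table holds only a handful of entries rather than the full ElevenLabs list, and returning None for unmapped codes is an assumption inferred from the `iso639_3_to_1(language_code) or language_code` call sites discussed in this thread.

```python
from typing import Dict, Optional

# Illustrative subset only. A real table would list every language in
# https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages
ISO_639_3_TO_1: Dict[str, str] = {
    "eng": "en",
    "spa": "es",
    "fra": "fr",
    "deu": "de",
    "ita": "it",
    "por": "pt",
    "tur": "tr",
    "rus": "ru",
    "hin": "hi",
}


def iso639_3_to_1(code: Optional[str]) -> Optional[str]:
    """Convert an ISO 639-3 code to ISO 639-1.

    Args:
        code: A three-letter ISO 639-3 code, in any letter case, or None.

    Returns:
        The two-letter ISO 639-1 code, or None if the input is empty or unmapped.
    """
    if not code:
        return None
    return ISO_639_3_TO_1.get(code.lower())
```

Because unmapped input yields None, a caller can keep the original value with `iso639_3_to_1(code) or code`, so a code that is already two letters passes through unchanged.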

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py`:
- Around line 63-83: The mapping constant ISO_639_3_TO_1 should be annotated as
Dict[str, str] and the function iso639_3_to_1 must accept Optional[str] and
return Optional[str]; update its signature to e.g. code: Optional[str] ->
Optional[str], guard against None before calling .lower() (return None
immediately if code is None or empty), and replace the docstring with a short
Google-style docstring describing args and returns. Also split or shorten the
long comment line so it stays under 100 characters and keep references to
ISO_639_3_TO_1 and iso639_3_to_1 when making the changes.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7bd4ac9 and 8ce8c2e.

📒 Files selected for processing (1)
  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (4)
livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/stt.py (1)
  • _transcription_to_speech_event (85-89)
livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py (1)
  • _transcription_to_speech_event (166-174)
livekit-agents/livekit/agents/voice/agent.py (1)
  • stt (508-518)
livekit-agents/livekit/agents/stt/stt.py (1)
  • SpeechData (53-61)
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)

244-247: LGTM: normalized language is passed through consistently.

This keeps the SpeechEvent language aligned with the normalized code.


508-512: LGTM: streaming path now emits normalized language codes.

The fallback to "en" for missing language is sensible for downstream consumers.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)

247-255: Potential None passed to function expecting str.

normalized_language can be None when language_code is None (from response_json.get("language_code")). However, _transcription_to_speech_event at line 259 expects language_code: str, not str | None.

Consider providing a fallback:

Suggested fix
-        normalized_language = iso639_3_to_1(language_code) or language_code
+        normalized_language = iso639_3_to_1(language_code) or language_code or ""

Alternatively, update the _transcription_to_speech_event signature to accept str | None if None is a valid value.

🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)

63-80: Expand language mapping to cover all ElevenLabs-supported languages.

The current mapping includes only 14 of the ~100 languages supported by ElevenLabs Scribe. While the fallback logic (or language_code) prevents errors, unmapped languages will pass ISO 639-3 codes downstream, which defeats the purpose of normalization for the turn detector expecting ISO 639-1 codes.

Consider expanding the mapping to include all supported languages. Extracting this to a separate languages.py module would improve maintainability.

Reference: https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce8c2e and 1f77143.

📒 Files selected for processing (1)
  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)

83-85: LGTM — implementation is correct with proper None handling.

The function correctly handles None input and performs case-insensitive lookup. The type hints are accurate.

Optional: Per coding guidelines, Google-style docstrings with Args: and Returns: sections are preferred, though for a simple helper like this the current docstring is acceptable.


511-514: LGTM — proper fallback handling with default "en".

The fallback chain iso639_3_to_1(self._language) or self._language or "en" correctly handles all None cases, ensuring a valid language string is always provided.

Note: This differs from _recognize_impl (lines 247-249) which doesn't have a final fallback. Consider aligning the fallback behavior for consistency.
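
One way to align the two paths would be a small shared helper. The sketch below is hypothetical: `resolve_language` is not part of the plugin, the mapping is a two-entry stand-in for the real table, and the "en" default simply mirrors the streaming fallback described above.

```python
from typing import Dict, Optional

# Two-entry stand-in for the plugin's ISO 639-3 -> ISO 639-1 table.
ISO_639_3_TO_1: Dict[str, str] = {"eng": "en", "fra": "fr"}


def iso639_3_to_1(code: Optional[str]) -> Optional[str]:
    """Return the ISO 639-1 code, or None when the input is empty or unmapped."""
    return ISO_639_3_TO_1.get(code.lower()) if code else None


def resolve_language(
    detected: Optional[str],
    configured: Optional[str] = None,
    default: str = "en",
) -> str:
    """Pick the first available code, then normalize it, so a str is always returned.

    Hypothetical helper: `detected` stands for the code in the STT response and
    `configured` for the language set on the STT instance.
    """
    candidate = detected or configured or default
    return iso639_3_to_1(candidate) or candidate


print(resolve_language("eng"))        # en
print(resolve_language(None, "fra"))  # fr
print(resolve_language(None, None))   # en
```

Choosing the candidate before normalizing means the configured language is normalized too, which a chain that only normalizes the detected code would not do.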

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py`:
- Around line 224-226: The batch path can propagate a None language_code into
SpeechData.language; update the logic that computes normalized_language
(currently using iso639_3_to_1(language_code) or language_code) to fall back to
self._language or "en" when language_code is missing, e.g. derive a
safe_language = language_code or self._language or "en" then call
iso639_3_to_1(safe_language) or safe_language before passing to
_transcription_to_speech_event so SpeechData.language always receives a str.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1f77143 and 92ea218.

📒 Files selected for processing (2)
  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py
  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py (1)
  • iso639_3_to_1 (112-114)
livekit-agents/livekit/agents/stt/stt.py (1)
  • SpeechData (53-61)
🔇 Additional comments (4)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py (2)

9-109: Mapping table looks good and centralized.

Nice to have all 11Labs language normalization in one place.


112-114: Helper is concise and null-safe.

This utility is straightforward and safe for optional inputs.

livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)

42-42: Good extraction of normalization utility.

Keeping the mapping in a dedicated module improves reuse and clarity.


488-492: Streaming path normalization + default is solid.

The fallback to "en" prevents None from leaking into SpeechData.language.

Comment on lines +224 to +226
+        normalized_language = iso639_3_to_1(language_code) or language_code
         return self._transcription_to_speech_event(
-            language_code=language_code,
+            language_code=normalized_language,
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "stt.py" | grep elevenlabs

Repository: livekit/agents

Length of output: 136


🏁 Script executed:

head -1 livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py | wc -l
wc -l livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 140


🏁 Script executed:

sed -n '210,240p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 1055


🏁 Script executed:

sed -n '180,240p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 2548


🏁 Script executed:

grep -n "class SpeechData" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
rg -A 10 "class SpeechData"

Repository: livekit/agents

Length of output: 772


🏁 Script executed:

grep -n "def.*batch\|async def" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py | head -20

Repository: livekit/agents

Length of output: 441


🏁 Script executed:

sed -n '170,250p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 3237


🏁 Script executed:

sed -n '326,450p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 5154


🏁 Script executed:

sed -n '470,550p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 3746


🏁 Script executed:

grep -n "_process_stream_event" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py

Repository: livekit/agents

Length of output: 177


Guard against missing language_code in batch path.

If ElevenLabs omits language_code, None will propagate into SpeechData.language (annotated as str) and violate the type contract, potentially breaking downstream consumers. The streaming path has protection with a fallback chain (self._language or "en"), but the batch path lacks this. Consider aligning the batch path with the streaming pattern.

-        normalized_language = iso639_3_to_1(language_code) or language_code
+        normalized_language = (
+            iso639_3_to_1(language_code)
+            or language_code
+            or self._opts.language_code
+            or "en"
+        )
