fix: 11Labs Scribe v2 model not working with EOT prediction model #4601
Conversation
📝 Walkthrough
Adds a new language mapping module exposing ISO_639_3_TO_1 and iso639_3_to_1, and updates the STT code to normalize language codes in both the batch (_recognize_impl) and streaming (_process_stream_event) paths, so normalized codes are used in transcription events and SpeechData.language fields.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around line 413-414: normalized_language is computed but never used: after
computing normalized_language ("en" if language == "eng" else language) the code
assigns self._last_language = language, so normalization isn't applied for
PREFLIGHT_TRANSCRIPT events; change the assignment to use normalized_language
instead (assign self._last_language = normalized_language) in the same block
where normalized_language is computed so downstream logic uses the normalized
code (refer to normalized_language and self._last_language in the audio
recognition handling code).
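For clarity, a minimal sketch of what the corrected block would look like (structure assumed from the diff hunk quoted further down in this review):

```python
# Sketch only: cache the normalized code instead of the raw one returned by the STT.
if language and len(transcript) > MIN_LANGUAGE_DETECTION_LENGTH:
    normalized_language = "en" if language == "eng" else language
    self._last_language = normalized_language  # was: self._last_language = language
```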
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)
353-354: Consider a more robust language code normalization approach.

The current fix only handles "eng" → "en", but other STT providers (or ElevenLabs for other languages) may return additional ISO 639-3 codes (e.g., "spa", "fra", "deu"). A utility function or a library like langcodes or pycountry would provide comprehensive normalization.

♻️ Suggested helper function:

```python
# Could be added to a utils module
def normalize_language_code(language: str | None) -> str | None:
    """Normalize ISO 639-3 codes to ISO 639-1 where applicable."""
    if language is None:
        return None
    # Common ISO 639-3 to ISO 639-1 mappings
    ISO_639_3_TO_1 = {
        "eng": "en",
        "spa": "es",
        "fra": "fr",
        "deu": "de",
        "ita": "it",
        "por": "pt",
        "rus": "ru",
        "zho": "zh",
        "jpn": "ja",
        "kor": "ko",
        # Add more as needed
    }
    return ISO_639_3_TO_1.get(language, language)
```

Then usage becomes:

```diff
- normalized_language = "en" if language == "eng" else language
- self._last_language = normalized_language
+ self._last_language = normalize_language_code(language)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-agents/livekit/agents/voice/audio_recognition.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/audio_recognition.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
```python
    language and len(transcript) > MIN_LANGUAGE_DETECTION_LENGTH
):
    self._last_language = language
    normalized_language = "en" if language == "eng" else language
```
Perhaps move this to the 11labs plugin?
That was my initial thought as well, but I went with this approach for resilience: any other STT service that returns eng instead of en would be handled automatically. It also leaves room to introduce a normalizing function later if other languages turn out to have the same issue.
That said, I'm happy to move the normalization to the ElevenLabs plugin if you think that's the more appropriate place for it.
@longcw Any conclusion?
As soon as I get confirmation on one approach or the other, I'll apply any needed changes to the PR.
Each vendor/plugin should take care of its own conversion/translation to standard values. The duplicated code is acceptable at this level for now.
We can definitely optimize this later when this pattern becomes actually common in plugins.
| "tur": "tr", | ||
| "rus": "ru", | ||
| "hin": "hi", | ||
| } |
One small thing: we should include all the languages, and let the EOT decide whether each one is supported. This information might be used in other places/components where language is important (TTS/LLM). We can also include a comment with the link: https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages
You can definitely create a separate languages.py to list all the values.
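To illustrate the idea, a minimal sketch of such a languages.py (names mirror those referenced later in this review; the entries shown are only a small subset of the languages Scribe supports, and the real module should list them all):

```python
# languages.py (sketch only, not the actual module)
# Full list: https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages
from __future__ import annotations

# ISO 639-3 -> ISO 639-1 for the languages ElevenLabs Scribe can return (subset shown).
ISO_639_3_TO_1: dict[str, str] = {
    "eng": "en",
    "spa": "es",
    "fra": "fr",
    "deu": "de",
    "ita": "it",
    "por": "pt",
    "tur": "tr",
    "rus": "ru",
    "hin": "hi",
    # ... remaining Scribe-supported languages ...
}


def iso639_3_to_1(code: str | None) -> str | None:
    """Return the ISO 639-1 equivalent of an ISO 639-3 code, or None if unknown."""
    if not code:
        return None
    return ISO_639_3_TO_1.get(code.lower())
```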
Since ISO 639-3 covers thousands of languages, could we consider using a Python library for this instead of handling every case manually? I found pycountry, which might do the job.
Another option would be to only consider the ElevenLabs-supported languages and implement the mapping manually.
Feel free to suggest any other library you prefer or ask for a manual dictionary implementation (either full ISO 639-3 or only the ones supported by 11Labs).
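For reference, a hedged sketch of what the pycountry route could look like (this assumes pycountry is added as a dependency; it is not part of this PR):

```python
# Sketch only: ISO 639-3 -> ISO 639-1 via pycountry instead of a hand-written mapping.
from __future__ import annotations

import pycountry  # assumed dependency: pip install pycountry


def to_iso639_1(code: str | None) -> str | None:
    """Best-effort conversion of an ISO 639-3 code (e.g. "eng") to ISO 639-1 (e.g. "en")."""
    if not code:
        return None
    lang = pycountry.languages.get(alpha_3=code.lower())
    # Some languages have no two-letter code; fall back to the original value.
    return getattr(lang, "alpha_2", None) or code
```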
Sorry about the confusion; I meant all the languages 11labs can support.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py`:
- Around line 63-83: The mapping constant ISO_639_3_TO_1 should be annotated as
Dict[str, str] and the function iso639_3_to_1 must accept Optional[str] and
return Optional[str]; update its signature to e.g. code: Optional[str] ->
Optional[str], guard against None before calling .lower() (return None
immediately if code is None or empty), and replace the docstring with a short
Google-style docstring describing args and returns. Also split or shorten the
long comment line so it stays under 100 characters and keep references to
ISO_639_3_TO_1 and iso639_3_to_1 when making the changes.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (4)
livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/stt.py (1)
  _transcription_to_speech_event (85-89)
livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py (1)
  _transcription_to_speech_event (166-174)
livekit-agents/livekit/agents/voice/agent.py (1)
  stt (508-518)
livekit-agents/livekit/agents/stt/stt.py (1)
  SpeechData (53-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: type-check (3.9)
- GitHub Check: type-check (3.13)
- GitHub Check: unit-tests
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
244-247: LGTM: normalized language is passed through consistently. This keeps the SpeechEvent language aligned with the normalized code.
508-512: LGTM: streaming path now emits normalized language codes. The fallback to "en" for missing language is sensible for downstream consumers.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
247-255: Potential None passed to a function expecting str.

normalized_language can be None when language_code is None (from response_json.get("language_code")). However, _transcription_to_speech_event at line 259 expects language_code: str, not str | None. Consider providing a fallback.

Suggested fix:

```diff
- normalized_language = iso639_3_to_1(language_code) or language_code
+ normalized_language = iso639_3_to_1(language_code) or language_code or ""
```

Alternatively, update the _transcription_to_speech_event signature to accept str | None if None is a valid value.
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
63-80: Expand the language mapping to cover all ElevenLabs-supported languages.

The current mapping includes only 14 of the ~100 languages supported by ElevenLabs Scribe. While the fallback logic (or language_code) prevents errors, unmapped languages will pass ISO 639-3 codes downstream, which defeats the purpose of normalization for the turn detector expecting ISO 639-1 codes.

Consider expanding the mapping to include all supported languages. Extracting this to a separate languages.py module would improve maintainability.

Reference: https://elevenlabs.io/docs/overview/capabilities/speech-to-text#supported-languages
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
83-85: LGTM — implementation is correct with proper None handling.

The function correctly handles None input and performs a case-insensitive lookup. The type hints are accurate.

Optional: Per coding guidelines, Google-style docstrings with Args: and Returns: sections are preferred, though for a simple helper like this the current docstring is acceptable.
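Purely as an illustration of that convention (the body here is a guess, not the actual helper), a Google-style version might look like:

```python
from typing import Optional

# Trimmed stand-in for the mapping defined in languages.py.
ISO_639_3_TO_1 = {"eng": "en", "spa": "es"}


def iso639_3_to_1(code: Optional[str]) -> Optional[str]:
    """Convert an ISO 639-3 language code to ISO 639-1.

    Args:
        code: Language code as returned by ElevenLabs (e.g. "eng"), or None.

    Returns:
        The ISO 639-1 equivalent (e.g. "en"), or None if the code is missing
        or has no known mapping.
    """
    if not code:
        return None
    return ISO_639_3_TO_1.get(code.lower())
```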
511-514: LGTM — proper fallback handling with default "en".

The fallback chain iso639_3_to_1(self._language) or self._language or "en" correctly handles all None cases, ensuring a valid language string is always provided.

Note: This differs from _recognize_impl (lines 247-249), which doesn't have a final fallback. Consider aligning the fallback behavior for consistency.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py`:
- Around line 224-226: The batch path can propagate a None language_code into
SpeechData.language; update the logic that computes normalized_language
(currently using iso639_3_to_1(language_code) or language_code) to fall back to
self._language or "en" when language_code is missing, e.g. derive a
safe_language = language_code or self._language or "en" then call
iso639_3_to_1(safe_language) or safe_language before passing to
_transcription_to_speech_event so SpeechData.language always receives a str.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py (1)
  iso639_3_to_1 (112-114)
livekit-agents/livekit/agents/stt/stt.py (1)
  SpeechData (53-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
🔇 Additional comments (4)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/languages.py (2)
9-109: Mapping table looks good and centralized. Nice to have all 11Labs language normalization in one place.
112-114: Helper is concise and null-safe. This utility is straightforward and safe for optional inputs.
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
42-42: Good extraction of the normalization utility. Keeping the mapping in a dedicated module improves reuse and clarity.
488-492: Streaming path normalization + default is solid. The fallback to "en" prevents None from leaking into SpeechData.language.
```diff
  normalized_language = iso639_3_to_1(language_code) or language_code
  return self._transcription_to_speech_event(
-     language_code=language_code,
+     language_code=normalized_language,
```
🧩 Analysis chain
🏁 Scripts executed (repository: livekit/agents):

```shell
find . -type f -name "stt.py" | grep elevenlabs
head -1 livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py | wc -l
wc -l livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
sed -n '210,240p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
sed -n '180,240p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
grep -n "class SpeechData" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
rg -A 10 "class SpeechData"
grep -n "def.*batch\|async def" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py | head -20
sed -n '170,250p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
sed -n '326,450p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
sed -n '470,550p' livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
grep -n "_process_stream_event" livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
```
Guard against missing language_code in batch path.
If ElevenLabs omits language_code, None will propagate into SpeechData.language (annotated as str) and violate the type contract, potentially breaking downstream consumers. The streaming path has protection with a fallback chain (self._language or "en"), but the batch path lacks this. Consider aligning the batch path with the streaming pattern.
```diff
- normalized_language = iso639_3_to_1(language_code) or language_code
+ normalized_language = (
+     iso639_3_to_1(language_code)
+     or language_code
+     or self._opts.language_code
+     or "en"
+ )
```

🤖 Prompt for AI Agents
In `@livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py`
around lines 224-226: The batch path can propagate a None language_code into
SpeechData.language; update the logic that computes normalized_language
(currently using iso639_3_to_1(language_code) or language_code) to fall back to
self._language or "en" when language_code is missing, e.g. derive a
safe_language = language_code or self._language or "en" then call
iso639_3_to_1(safe_language) or safe_language before passing to
_transcription_to_speech_event so SpeechData.language always receives a str.
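A minimal sketch of that suggested guard (attribute names such as self._language come from the prompt above and are assumptions about the plugin's internals, not verified against the source):

```python
# Sketch of the suggested batch-path guard in _recognize_impl (names assumed).
safe_language = language_code or self._language or "en"
normalized_language = iso639_3_to_1(safe_language) or safe_language
# normalized_language is then passed as language_code to _transcription_to_speech_event,
# so SpeechData.language always receives a str.
```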
This PR fixes an issue where end-of-turn (EOT) detection fails when using ElevenLabs' Scribe V2 STT model with the MultilingualModel turn detector.

Problem

When running an agent with the scribe_v2 model and turn detection enabled via MultilingualModel, EOT detection does not work: the turn detector rejects the language code returned by the STT service.

Root Cause

The EOT model expects the ISO 639-1 language code en, but ElevenLabs Scribe V2 returns the ISO 639-3 code eng (see the ElevenLabs API reference).

Solution

Normalize the language code in livekit-agents/livekit/agents/voice/audio_recognition.py when setting self._last_language.

Why this approach?

Handling normalization at the audio_recognition.py level (rather than in the ElevenLabs plugin) ensures consistent behavior across all STT providers that may return non-standard language codes.

Of course, any recommendations are welcome and I'll be glad to improve this proposal.