Fix: Add reconnection support for ElevenLabs STT WebSocket disconnect… #4614
base: main
…ions
- Add retryable=True to APIStatusError when connection closes unexpectedly
- Implement exponential backoff for reconnection attempts (max 5 attempts)
- Reset _speaking state on reconnection to prevent state corruption
- Add logging for reconnection attempts
- Reset reconnection counter after successful reconnection

Fixes livekit#4609
📝 Walkthrough
Adds reconnection logic to ElevenLabs STT WebSocket handling: introduces reconnect attempts with exponential backoff, a maximum attempts limit, and resets relevant state on successful reconnects; raises APIStatusError(retryable=True) when the socket closes unexpectedly.
Sequence Diagram
sequenceDiagram
participant Agent as Agent/SpeechStream
participant WS as WebSocket
participant EL as ElevenLabs Service
Agent->>WS: establish connection
activate WS
WS->>EL: open stream
activate EL
EL-->>WS: stream messages
WS-->>Agent: deliver messages
WS-->>Agent: connection closes unexpectedly
deactivate WS
deactivate EL
Agent->>Agent: raise APIStatusError(retryable=true)
Agent->>Agent: increment reconnect_attempt
Agent->>Agent: compare to max_reconnect_attempts
alt attempts remaining
Agent->>Agent: exponential backoff delay
Agent->>WS: reconnect attempt
activate WS
WS->>EL: reopen stream
activate EL
EL-->>WS: stream resumes
WS-->>Agent: messages resume
Agent->>Agent: reset reconnect_attempt = 0
Agent->>Agent: clear reconnect state
else max exceeded
Agent->>Agent: stop stream (hard stop)
end
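The flow in the diagram can be sketched as a small, self-contained asyncio loop. This is only an illustration under stated assumptions, not the plugin's code: `RetryableError`, `run_with_reconnect`, and `make_flaky_stream` are hypothetical names, the stream is modeled as an async generator, and the delays are shortened so the sketch runs quickly.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable 'socket closed unexpectedly' error."""


async def run_with_reconnect(stream, *, max_attempts=5, base_delay=0.01, max_delay=0.05):
    """Consume `stream()` until it ends cleanly, reconnecting with capped backoff.

    `stream` is an async generator function that yields messages and raises
    RetryableError on an unexpected disconnect. Returns the delays that were slept.
    """
    attempt = 0
    delays = []
    while True:
        try:
            async for _message in stream():
                attempt = 0  # messages resumed: reset the reconnect counter
            return delays  # clean end of stream
        except RetryableError:
            attempt += 1
            if attempt > max_attempts:
                raise  # hard stop once the attempt limit is exceeded
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            delays.append(delay)
            await asyncio.sleep(delay)


def make_flaky_stream(failures):
    """Build a stream that fails `failures` times before delivering one message."""
    state = {"calls": 0}

    async def stream():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RetryableError("socket closed unexpectedly")
        yield "message"

    return stream


delays = asyncio.run(run_with_reconnect(make_flaky_stream(2)))
```

Resetting the counter only once a message actually arrives is one possible reading of "messages resume" in the diagram; resetting immediately after the connect call succeeds would be another.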
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Removed unnecessary whitespace in exponential backoff calculation.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
400-447: Reconnection loop still exits on first disconnect.
APIStatusError from recv_task (and failed reconnect connects) currently bubbles out of _run, so the backoff/attempt logic never executes. Catch retryable connection errors, continue the loop until the attempt limit, and reset _speaking for any reconnect (including _reconnect_event) so START_OF_SPEECH isn't suppressed.
🔧 Proposed fix
     while True:
         try:
             if reconnect_attempt > 0:
                 logger.info(
                     "Reconnecting to ElevenLabs STT (attempt %d/%d)",
                     reconnect_attempt + 1,
                     max_reconnect_attempts,
                 )
                 # Reset speaking state on reconnection
                 self._speaking = False
                 # Add exponential backoff
                 await asyncio.sleep(min(2 ** reconnect_attempt, 10))
             reconnect_attempt += 1
             if reconnect_attempt > max_reconnect_attempts:
                 logger.error("Max reconnection attempts reached for ElevenLabs STT")
                 break
    -        ws = await self._connect_ws()
    +        try:
    +            ws = await self._connect_ws()
    +        except APIConnectionError as e:
    +            if reconnect_attempt >= max_reconnect_attempts:
    +                raise
    +            logger.warning(
    +                "ElevenLabs STT connect failed; retrying",
    +                exc_info=e,
    +            )
    +            continue
             tasks = [
                 asyncio.create_task(send_task(ws)),
                 asyncio.create_task(recv_task(ws)),
                 asyncio.create_task(keepalive_task(ws)),
             ]
             tasks_group = asyncio.gather(*tasks)
             wait_reconnect_task = asyncio.create_task(self._reconnect_event.wait())
             try:
                 done, _ = await asyncio.wait(
                     (tasks_group, wait_reconnect_task),
                     return_when=asyncio.FIRST_COMPLETED,
                 )
                 for task in done:
                     if task != wait_reconnect_task:
                         task.result()
                 if wait_reconnect_task not in done:
                     break
                 self._reconnect_event.clear()
                 # Reset reconnection counter on successful reconnection
                 reconnect_attempt = 0
    +            self._speaking = False
    +        except APIStatusError as e:
    +            if reconnect_attempt >= max_reconnect_attempts:
    +                raise
    +            logger.warning(
    +                "ElevenLabs STT disconnected; retrying",
    +                exc_info=e,
    +            )
    +            continue
             finally:
                 await utils.aio.gracefully_cancel(*tasks, wait_reconnect_task)
                 tasks_group.cancel()
                 tasks_group.exception()  # Retrieve exception to prevent it from being logged
         finally:
             if ws is not None:
                 await ws.close()
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
livekit-agents/livekit/agents/_exceptions.py (1)
APIStatusError (45-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: type-check (3.9)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
🔇 Additional comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
385-388: Retryable close error handling looks good.
This correctly marks unexpected WebSocket closures as retryable for upstream recovery logic.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
432-454: Missing exception handler for reconnection on unexpected disconnection.
The reconnection logic with exponential backoff is added, but there's no except block to catch the APIStatusError raised by recv_task when the WebSocket closes unexpectedly. The exception will propagate through both finally blocks and exit the while True loop, causing the stream to end without any reconnection attempts.
To enable automatic reconnection on unexpected disconnection, you need to catch APIStatusError (or at least retryable errors) and continue the loop:
🐛 Suggested fix to catch exceptions and retry
     try:
         done, _ = await asyncio.wait(
             (tasks_group, wait_reconnect_task),
             return_when=asyncio.FIRST_COMPLETED,
         )
         for task in done:
             if task != wait_reconnect_task:
                 task.result()
         if wait_reconnect_task not in done:
             break
         self._reconnect_event.clear()
         # Reset reconnection counter on successful reconnection
         reconnect_attempt = 0
    +except APIStatusError as e:
    +    if not e.retryable:
    +        raise
    +    logger.warning("ElevenLabs STT connection error (retryable): %s", e)
    +    # Continue the loop to trigger reconnection
    +    continue
     finally:
         await utils.aio.gracefully_cancel(*tasks, wait_reconnect_task)
         tasks_group.cancel()
         tasks_group.exception()  # Retrieve exception to prevent it from being logged
Note: With this fix, you should also reset reconnect_attempt = 0 after a successful connection is established (e.g., after ws = await self._connect_ws() succeeds and tasks start running without immediate failure), not just after an intentional reconnect via _reconnect_event.
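The control-flow claim in this comment can be checked with a small synchronous sketch. It is a simplified model, not the plugin's code: `RetryableError` and `run_loop` are hypothetical names, and the log strings only label which cleanup step ran. It shows that an uncaught error runs both finally blocks and then leaves the loop, while an except clause that continues still runs both finally blocks before the next pass.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a retryable disconnect error."""


def run_loop(log, *, catch_retryable, fail_times=2):
    """Mimic the nested try/finally loop and append each step to `log`."""
    failures_left = fail_times
    while True:
        try:
            try:
                if failures_left > 0:
                    failures_left -= 1
                    raise RetryableError("socket closed unexpectedly")
                log.append("stream ended")
                break
            except RetryableError:
                if not catch_retryable:
                    raise  # same effect as having no except block at all
                log.append("retry")
                continue  # both finally blocks still run before the next pass
            finally:
                log.append("cancel tasks")
        finally:
            log.append("close ws")


handled = []
run_loop(handled, catch_retryable=True)

unhandled = []
try:
    run_loop(unhandled, catch_retryable=False)
except RetryableError:
    unhandled.append("loop exited with error")
```

In the handled run the loop retries twice and then ends normally; in the unhandled run the first failure performs the cleanup steps once and escapes the loop.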
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
407-421: Off-by-one confusion in reconnection logging.
The log message is misleading. When reconnect_attempt=1 (first time through the reconnect path), it logs "attempt 2/5". This is because the log uses reconnect_attempt + 1, but reconnect_attempt is already 1 at this point (incremented from the initial connection attempt).
Consider logging reconnect_attempt directly to show the actual reconnection attempt number:
♻️ Suggested fix
     if reconnect_attempt > 0:
         logger.info(
             "Reconnecting to ElevenLabs STT (attempt %d/%d)",
    -        reconnect_attempt + 1,
    +        reconnect_attempt,
             max_reconnect_attempts,
         )
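To make the numbering concrete, here is a rough trace of what the loop head would log and how long it would sleep. It assumes every connection fails and the loop actually continues (which, per the comment above, requires the error to be caught), and that the counter is incremented on every pass. `trace_reconnects` and `log_offset` are hypothetical names used only for this sketch.

```python
def trace_reconnects(max_reconnect_attempts=5, log_offset=1):
    """Return (log label, backoff seconds) for each pass through the reconnect path."""
    reconnect_attempt = 0
    trace = []
    while True:
        if reconnect_attempt > 0:
            label = f"attempt {reconnect_attempt + log_offset}/{max_reconnect_attempts}"
            delay = min(2 ** reconnect_attempt, 10)  # capped exponential backoff
            trace.append((label, delay))
        reconnect_attempt += 1
        if reconnect_attempt > max_reconnect_attempts:
            break
        # A connect would happen here; the sketch assumes it fails and loops again.
    return trace


current = trace_reconnects(log_offset=1)    # logging reconnect_attempt + 1
suggested = trace_reconnects(log_offset=0)  # logging reconnect_attempt directly
```

Under these assumptions the first reconnect is labeled "attempt 2/5" with the `+ 1` and "attempt 1/5" without it, and the delays follow 2, 4, 8, 10, 10 seconds in both cases.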
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
385-388: LGTM - Explicit retryable flag improves clarity.
The explicit retryable=True makes the intent clear, even though APIStatusError defaults to retryable=True when status_code=-1. This correctly addresses the root cause from issue #4609.
413-414: Good practice: resetting _speaking state on reconnection.
Resetting _speaking = False ensures that START_OF_SPEECH will be properly emitted after reconnection, preventing state corruption where the stream might think it's already in a speech segment from the previous connection.
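A minimal model of this state issue, assuming START_OF_SPEECH is only emitted on a transition from not speaking to speaking. `SpeechState`, `on_transcript`, `on_reconnect`, and `simulate` are hypothetical names for illustration and do not mirror the plugin's actual event handling.

```python
class SpeechState:
    """Toy model: emit START_OF_SPEECH only when not already speaking."""

    def __init__(self):
        self._speaking = False
        self.events = []

    def on_transcript(self, text):
        if not self._speaking:
            self._speaking = True
            self.events.append("START_OF_SPEECH")
        self.events.append(f"TRANSCRIPT:{text}")

    def on_reconnect(self, *, reset):
        # The reset is what the change under review adds on reconnection.
        if reset:
            self._speaking = False


def simulate(reset):
    """Speech starts, the socket drops mid-segment, then speech arrives again."""
    state = SpeechState()
    state.on_transcript("hello")
    state.on_reconnect(reset=reset)
    state.on_transcript("again")
    return state.events


without_reset = simulate(reset=False)
with_reset = simulate(reset=True)
```

Without the reset, the second segment produces a transcript but no new START_OF_SPEECH, because the stale flag from the previous connection is still set.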
…ions
Add retryable=True to APIStatusError when connection closes unexpectedly
Fixes "ElevenLabs STT stream does not reconnect after mid-stream WebSocket disconnection" (#4609)