feat(observability): add OpenTelemetry tracing and metrics support #871
minimAluminiumalism wants to merge 2 commits into MoonshotAI:main from
7e695ce to 9f08e24
```python
def shutdown() -> None:
    """Shutdown the OpenTelemetry SDK and flush any pending data."""
    global _initialized, _tracer_provider, _meter_provider

    if not _initialized:
        return

    try:
        if _tracer_provider is not None:
            _tracer_provider.shutdown()
        if _meter_provider is not None:
            _meter_provider.shutdown()
        logger.debug("Observability SDK shutdown complete")
    except Exception as e:
        logger.error("Error during observability SDK shutdown: {error}", error=e)

    _initialized = False
```
🟡 Metrics module state not reset on SDK shutdown, causing silent data loss on re-initialization
When shutdown() is called, the SDK sets _initialized = False but doesn't reset the metrics module's _metrics_initialized flag or the metric instrument variables (_session_counter, _turn_counter, etc.).
Issue Flow
1. initialize() with observability enabled → creates the tracer/meter.
2. record_session_start() → _ensure_metrics_initialized() creates the metric instruments and sets _metrics_initialized = True.
3. shutdown() → sets the SDK's _initialized = False, but _metrics_initialized stays True.
4. initialize() again → creates a NEW tracer/meter.
5. record_session_start() → _ensure_metrics_initialized() sees _metrics_initialized = True and returns get_meter() is not None (True with the new meter).
6. But _session_counter etc. still point to the OLD (shut-down) meter's instruments!
Relevant code in metrics.py:53-54:
```python
if _metrics_initialized:
    return get_meter() is not None
```
This early return prevents re-creation of metric instruments when they were created from a now-shutdown meter.
Impact
Metrics recorded after SDK re-initialization silently go to the old, shut-down meter's instruments instead of the new meter, causing data loss. The code won't crash, since the defensive None checks (e.g., if _session_counter is not None) still pass, but the metrics are silently discarded.
Recommendation: add a reset_metrics() function in metrics.py that sets _metrics_initialized = False and resets all metric instruments to None, and call it from shutdown(). In sdk.py, also reset _tracer, _meter, and _config to None.
Related Issue
N/A
Description
Add OpenTelemetry observability support.
- Gemini CLI Telemetry (reference implementation)
- Claude Code Monitoring
Checklist
- Run make gen-changelog to update the changelog.
- Run make gen-docs to update the user documentation.