[FEATURE] Add tokens per second metrics for performance monitoring #1634

@labeveryday

Description

Problem Statement

When comparing model performance across providers (Bedrock, Ollama, OpenAI, etc.), there's no built-in way to measure generation throughput. Tools like Ollama display tokens/second after runs, which is valuable for performance tuning and provider comparison. Currently, Strands tracks outputTokens and latencyMs separately, but doesn't compute the rate.

Proposed Solution

Add tokens per second metrics to the existing metrics system:

  1. Add output_tokens_per_second to EventLoopCycleMetric (per-turn)
  2. Add average_output_tokens_per_second computed property on EventLoopMetrics (across all turns)
  3. Export via OpenTelemetry histogram (strands.event_loop.output_tokens_per_second)

Calculation: output_tokens_per_second = outputTokens / (latencyMs / 1000)
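A minimal sketch of what this could look like, using simplified stand-ins for EventLoopCycleMetric and EventLoopMetrics (the real classes in src/strands/telemetry/metrics.py have more fields and different internals, so this is illustrative only):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventLoopCycleMetric:
    """Per-turn metrics (simplified stand-in for illustration)."""
    output_tokens: int = 0
    latency_ms: int = 0

    @property
    def output_tokens_per_second(self) -> float:
        # Guard against zero latency to avoid division errors.
        if self.latency_ms <= 0:
            return 0.0
        return self.output_tokens / (self.latency_ms / 1000)


@dataclass
class EventLoopMetrics:
    """Aggregated metrics across all turns (simplified stand-in)."""
    cycles: List[EventLoopCycleMetric] = field(default_factory=list)

    @property
    def average_output_tokens_per_second(self) -> float:
        # Average over total tokens and total time rather than averaging
        # per-cycle rates, so long cycles are weighted proportionally.
        total_tokens = sum(c.output_tokens for c in self.cycles)
        total_seconds = sum(c.latency_ms for c in self.cycles) / 1000
        return total_tokens / total_seconds if total_seconds > 0 else 0.0
```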

Use Case

  • Compare generation speed across model providers
  • Identify performance regressions when switching models
  • Monitor throughput in production deployments
  • Benchmark different model configurations

Alternative Solutions

Users can manually compute this from existing usage and metrics data in AgentResult, but having it built-in provides consistency and enables OpenTelemetry-based monitoring dashboards.
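For reference, a rough sketch of the manual approach today (the accumulated_usage / accumulated_metrics field names reflect my reading of the current metrics objects and may not match exactly):

```python
from strands import Agent

agent = Agent()  # default model provider
result = agent("Write a haiku about throughput.")

# Manual computation from the metrics accumulated on AgentResult
# (assumes accumulated_usage / accumulated_metrics expose these keys).
output_tokens = result.metrics.accumulated_usage["outputTokens"]
latency_ms = result.metrics.accumulated_metrics["latencyMs"]

if latency_ms > 0:
    print(f"output tokens/s: {output_tokens / (latency_ms / 1000):.1f}")
```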

Additional Context

Related issues:

The building blocks already exist in the codebase:

  • Usage type in src/strands/types/event_loop.py tracks outputTokens
  • Metrics type tracks latencyMs
  • EventLoopMetrics in src/strands/telemetry/metrics.py already accumulates both
  • OpenTelemetry integration is in place for exporting new metrics (see the sketch below)
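To make the OpenTelemetry part concrete, here is a minimal sketch using the standard opentelemetry-api; the meter name, helper function, and model attribute are illustrative, since the real meter setup already lives in the SDK's telemetry layer:

```python
from opentelemetry import metrics

# Hypothetical wiring; the metric name comes from the proposal above.
meter = metrics.get_meter("strands.event_loop")
tps_histogram = meter.create_histogram(
    name="strands.event_loop.output_tokens_per_second",
    unit="token/s",
    description="Output token generation rate per event loop cycle",
)


def record_cycle_throughput(output_tokens: int, latency_ms: int, model_id: str) -> None:
    """Record one cycle's throughput, tagged by model for per-provider comparison."""
    if latency_ms <= 0:
        return
    tps_histogram.record(
        output_tokens / (latency_ms / 1000),
        attributes={"gen_ai.request.model": model_id},
    )
```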
