_LocalPipeline Uses Class-Level Mutable State for Execution Registry #5572

@Mattral

Bug: _LocalPipeline Uses Class-Level Mutable State for Execution Registry

PySDK Version

  • PySDK V3 (3.x)
  • PySDK V2 (2.x)

Describe the bug

The _LocalPipeline class defines _executions as a class-level mutable dictionary:

class _LocalPipeline(object):
    _executions = {}

This results in execution state being shared across all _LocalPipeline instances within the same Python process.

Because this registry is global to the class:

  • Execution state is not isolated per pipeline instance
  • Execution state is not isolated per LocalPipelineSession
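  • Access to the shared dictionary is not synchronized across threads
  • Execution objects accumulate for the life of the process, with no lifecycle management
  • Executions registered by one pipeline are visible to every other pipeline, which can contaminate multi-pipeline scenarios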

This breaks the session scoping and encapsulation that users reasonably expect from an AWS SDK, where state created through one session or object should not leak into another.
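
For readers less familiar with the mechanism, here is a minimal sketch of why the dictionary ends up shared. It uses a hypothetical stand-in class (SharedRegistry is not part of the SDK) and only illustrates standard Python attribute lookup: item assignment through an instance mutates the class-level dict, while rebinding the name on an instance creates a separate instance attribute.

```python
class SharedRegistry:
    """Hypothetical stand-in for a class with a class-level dict."""

    _executions = {}  # created once, when the class body is executed


a = SharedRegistry()
b = SharedRegistry()

# Item assignment finds _executions on the class and mutates that single dict.
a._executions["exec1"] = "execution1"

shared_seen_by_b = dict(b._executions)     # b sees the entry added through a
a_has_own_attr = "_executions" in vars(a)  # nothing was stored on the instance

# Rebinding the name creates an instance attribute that shadows the class one.
b._executions = {}
b_has_own_attr = "_executions" in vars(b)
class_level_dict = SharedRegistry._executions  # untouched by the rebinding
```

This is why the fix has to assign self._executions inside __init__ rather than just mutate it: only assignment gives each instance its own dictionary.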


Why this matters

Local mode is commonly used for:

  • Development workflows
  • CI pipelines
  • Unit/integration testing
  • Multi-pipeline experimentation in notebooks

Shared mutable state introduces:

  • Non-deterministic behavior
  • Hard-to-debug test interference
  • Memory growth in long-running processes
  • Race conditions in concurrent execution environments

Even though this affects local mode only, isolation guarantees are still important for SDK correctness and developer trust.
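
As an illustration of the test interference described above, the sketch below models only the registry with a hypothetical RegistryStandIn class (not the SDK's _LocalPipeline) and shows one possible workaround for test suites until the class is fixed: clearing the class-level dict before each test. Applying the same idea to the real class would presumably mean clearing _LocalPipeline._executions, which is an assumption based on the snippet in this report.

```python
import unittest


class RegistryStandIn:
    """Hypothetical stand-in; only the shared registry is modeled."""

    _executions = {}

    def start(self, execution_id):
        self._executions[execution_id] = "Executing"
        return execution_id


class PipelineTests(unittest.TestCase):
    def setUp(self):
        # Workaround: reset the shared class-level registry before every test.
        RegistryStandIn._executions.clear()

    def test_first(self):
        RegistryStandIn().start("exec-a")
        self.assertEqual(len(RegistryStandIn._executions), 1)

    def test_second(self):
        # Without the clear() in setUp, "exec-a" would still be present here
        # whenever test_first ran earlier in the same process.
        RegistryStandIn().start("exec-b")
        self.assertEqual(list(RegistryStandIn._executions), ["exec-b"])


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PipelineTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The workaround only hides the problem in sequential tests; it does nothing for tests that run concurrently in one process.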


To reproduce

The following minimal example demonstrates that _executions is shared across instances:

from sagemaker.mlops.local.pipeline_entities import _LocalPipeline

class DummyPipeline:
    def __init__(self, name):
        self.name = name
    def definition(self):
        return "{}"

pipeline1 = _LocalPipeline(DummyPipeline("pipeline1"))
pipeline2 = _LocalPipeline(DummyPipeline("pipeline2"))

pipeline1._executions["exec1"] = "execution1"
pipeline2._executions["exec2"] = "execution2"

print(pipeline1._executions)
print(pipeline2._executions)
print(pipeline1._executions is pipeline2._executions)

Output:

{'exec1': 'execution1', 'exec2': 'execution2'}
{'exec1': 'execution1', 'exec2': 'execution2'}
True

This confirms both instances reference the same dictionary object.


Expected behavior

Each _LocalPipeline instance should maintain its own execution registry.

Example correction:

class _LocalPipeline(object):

    def __init__(self, *args, **kwargs):  # real signature elided
        ...  # existing initialization unchanged
        self._executions = {}  # per-instance registry

This ensures:

  • Proper execution isolation
  • Predictable behavior across sessions
  • Improved test determinism
  • Elimination of unintended cross-instance state leakage
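
The isolation can be checked with a self-contained sketch. LocalPipelineSketch and its start method are hypothetical names used for illustration only; the real constructor and execution logic are not reproduced here.

```python
class LocalPipelineSketch:
    """Hypothetical stand-in with an instance-level execution registry."""

    def __init__(self, name):
        self.name = name
        self._executions = {}  # a new dict for every instance

    def start(self, execution_id):
        # Record the execution in this instance's registry only.
        self._executions[execution_id] = f"{self.name}:{execution_id}"
        return self._executions[execution_id]


p1 = LocalPipelineSketch("pipeline1")
p2 = LocalPipelineSketch("pipeline2")
p1.start("exec1")
p2.start("exec2")

print(p1._executions)
print(p2._executions)
print(p1._executions is p2._executions)
```

One thing to verify before making the change: any code that reads the registry through the class, or through a different instance than the one that started the execution, would stop seeing those executions. Whether such callers exist in the SDK is not established in this report.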

System information

  • SageMaker Python SDK version: main branch (sagemaker-mlops local module)
  • Python version: 3.9+
  • CPU or GPU: CPU
  • Custom Docker image: No

Proposed resolution

Move _executions from class scope to instance scope inside _LocalPipeline.__init__.

Optional enhancements for robustness:

  • Consider execution cleanup strategy (TTL or explicit removal)
  • Consider thread-safety if concurrent local execution is intended to be supported
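
A rough sketch of what the optional enhancements might look like, assuming a small per-instance registry object. ExecutionRegistry and its methods are hypothetical and not an existing SDK API; the sketch shows explicit removal and a lock around every access, and leaves TTL-based cleanup out.

```python
import threading


class ExecutionRegistry:
    """Hypothetical per-instance registry with a lock and explicit removal."""

    def __init__(self):
        self._lock = threading.Lock()
        self._executions = {}

    def add(self, execution_id, execution):
        with self._lock:
            if execution_id in self._executions:
                raise KeyError(f"duplicate execution id: {execution_id}")
            self._executions[execution_id] = execution

    def get(self, execution_id):
        with self._lock:
            return self._executions[execution_id]

    def remove(self, execution_id):
        # Explicit cleanup; returns None if the id is unknown.
        with self._lock:
            return self._executions.pop(execution_id, None)

    def __len__(self):
        with self._lock:
            return len(self._executions)


def register_many(registry, prefix, count):
    # Each worker thread registers its own range of execution ids.
    for i in range(count):
        registry.add(f"{prefix}-{i}", object())


registry = ExecutionRegistry()
threads = [
    threading.Thread(target=register_many, args=(registry, f"t{n}", 100))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the lock around the check-then-insert in add is the point of the design: the membership test and the assignment happen as one step, so two threads cannot both register the same id.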
