Description
Bug: _LocalPipeline Uses Class-Level Mutable State for Execution Registry
PySDK Version
- PySDK V3 (3.x)
- PySDK V2 (2.x)
Describe the bug
The _LocalPipeline class defines _executions as a class-level mutable dictionary:
```python
class _LocalPipeline(object):
    _executions = {}
```
This results in execution state being shared across all _LocalPipeline instances within the same Python process.
Because this registry is global to the class:
- Execution state is not isolated per pipeline instance
- Execution state is not isolated per LocalPipelineSession
- The structure is not thread-safe
- Execution objects accumulate without lifecycle management
- This can cause cross-pipeline contamination in multi-pipeline scenarios
This violates expected session scoping and encapsulation guarantees typically followed in AWS SDK design patterns.
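The sharing follows from how Python resolves attributes, and it can be seen without the SDK installed. The sketch below uses a hypothetical stand-in class, `SharedRegistryPipeline`, with an assumed `start` method; neither name comes from the SDK. It only illustrates that item assignment through `self` mutates the dictionary stored on the class and never creates a per-instance attribute.

```python
class SharedRegistryPipeline:
    """Hypothetical stand-in for _LocalPipeline; not the SDK class."""

    # Evaluated once, when the class body runs; every instance sees this dict.
    _executions = {}

    def __init__(self, name):
        self.name = name

    def start(self, execution_id):
        # Item assignment mutates the dict found via the class, so no
        # per-instance attribute is ever created here.
        self._executions[execution_id] = f"{self.name}:{execution_id}"


first = SharedRegistryPipeline("pipeline1")
second = SharedRegistryPipeline("pipeline2")
first.start("exec1")
second.start("exec2")

# The second pipeline also sees the execution started by the first.
print(sorted(second._executions))
# The attribute is not stored on the instance itself.
print("_executions" in vars(first))
# Both lookups resolve to the single class-level dict.
print(first._executions is SharedRegistryPipeline._executions)
```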
Why this matters
Local mode is commonly used for:
- Development workflows
- CI pipelines
- Unit/integration testing
- Multi-pipeline experimentation in notebooks
Shared mutable state introduces:
- Non-deterministic behavior
- Hard-to-debug test interference
- Memory growth in long-running processes
- Race conditions in concurrent execution environments
Even though this affects local mode only, isolation guarantees are still important for SDK correctness and developer trust.
To reproduce
The following minimal example demonstrates that _executions is shared across instances:
```python
from sagemaker.mlops.local.pipeline_entities import _LocalPipeline


class DummyPipeline:
    """Minimal stand-in exposing only a name and a definition."""

    def __init__(self, name):
        self.name = name

    def definition(self):
        return "{}"


pipeline1 = _LocalPipeline(DummyPipeline("pipeline1"))
pipeline2 = _LocalPipeline(DummyPipeline("pipeline2"))

# Write to the registry through each instance.
pipeline1._executions["exec1"] = "execution1"
pipeline2._executions["exec2"] = "execution2"

print(pipeline1._executions)
print(pipeline2._executions)
print(pipeline1._executions is pipeline2._executions)
```
Output:
```
{'exec1': 'execution1', 'exec2': 'execution2'}
{'exec1': 'execution1', 'exec2': 'execution2'}
True
```
This confirms both instances reference the same dictionary object.
Expected behavior
Each _LocalPipeline instance should maintain its own execution registry.
Example correction:
```python
class _LocalPipeline(object):
    def __init__(self, ...):
        ...
        self._executions = {}  # created per instance, not shared via the class
```
This ensures:
- Proper execution isolation
- Predictable behavior across sessions
- Improved test determinism
- Elimination of unintended cross-instance state leakage
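A regression check for the fix could look roughly like the following. It is a sketch against a hypothetical stand-in class, `IsolatedPipeline`, rather than the real `_LocalPipeline`, whose constructor arguments are not reproduced here; the `start` method and the check function name are also assumptions made for illustration.

```python
class IsolatedPipeline:
    """Hypothetical stand-in showing an instance-scoped registry."""

    def __init__(self, name):
        self.name = name
        # A fresh dict per instance, created at construction time.
        self._executions = {}

    def start(self, execution_id):
        self._executions[execution_id] = f"{self.name}:{execution_id}"


def check_registries_are_isolated():
    first = IsolatedPipeline("pipeline1")
    second = IsolatedPipeline("pipeline2")
    first.start("exec1")
    second.start("exec2")

    # Each instance holds only the executions it started itself.
    assert first._executions == {"exec1": "pipeline1:exec1"}
    assert second._executions == {"exec2": "pipeline2:exec2"}
    assert first._executions is not second._executions


check_registries_are_isolated()
```

An equivalent test written against the real class would presumably construct two `_LocalPipeline` objects the same way the reproduction does and assert that their `_executions` dictionaries are distinct objects.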
System information
- SageMaker Python SDK version: main branch (sagemaker-mlops local module)
- Python version: 3.9+
- CPU or GPU: CPU
- Custom Docker image: No
Proposed resolution
Move _executions from class scope to instance scope inside _LocalPipeline.__init__.
Optional enhancements for robustness:
- Consider execution cleanup strategy (TTL or explicit removal)
- Consider thread-safety if concurrent local execution is intended to be supported
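If both optional enhancements were pursued, one possible shape is a small per-instance registry object guarded by a lock, with explicit removal as the cleanup strategy. The sketch below is illustrative only: `ExecutionRegistry` and its `add`, `get`, and `remove` methods are hypothetical names, not part of the SDK, and a TTL policy is left out.

```python
import threading


class ExecutionRegistry:
    """Hypothetical per-pipeline registry; one would be created in __init__."""

    def __init__(self):
        self._lock = threading.Lock()
        self._executions = {}

    def add(self, execution_id, execution):
        with self._lock:
            self._executions[execution_id] = execution

    def get(self, execution_id):
        with self._lock:
            return self._executions.get(execution_id)

    def remove(self, execution_id):
        # Explicit cleanup; returns the removed execution or None if absent.
        with self._lock:
            return self._executions.pop(execution_id, None)

    def __len__(self):
        with self._lock:
            return len(self._executions)


registry = ExecutionRegistry()


def worker(worker_id):
    # Each thread registers its own set of execution ids.
    for i in range(100):
        registry.add(f"exec-{worker_id}-{i}", object())


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(registry))  # 800 distinct ids were added
registry.remove("exec-0-0")
print(len(registry))  # 799 after explicit removal
```

A single lock keeps the sketch simple. Whether it is needed at all depends on whether concurrent local execution is meant to be supported.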