Support git_config parameter in SourceCode or ModelTrainer #5571

@moose-in-australia

Describe the feature you'd like
Add git_config parameter support to the SourceCode class (or directly to ModelTrainer) to enable fetching source code directly from Git repositories, similar to the functionality available in JumpStart models and v2 Estimator classes.

The git_config parameter should accept a dictionary with the following keys:

  • repo (required): Git repository URL (https://, http://, git@, or ssh://)
  • branch (optional): Branch name (defaults to 'master')
  • commit (optional): Specific commit hash
  • 2FA_enabled (optional): Boolean indicating whether two-factor authentication is enabled for the GitHub account
  • username, password, token (optional): Authentication credentials
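
To make the proposed shape concrete, here is a minimal sketch of how such a dictionary could be validated and given its default branch. The helper name `normalize_git_config`, the repository URL, and the commit hash are all hypothetical and used for illustration only; this is not existing SDK code.

```python
from typing import Any, Dict

# Keys and URL prefixes taken from the list above.
ALLOWED_KEYS = {"repo", "branch", "commit", "2FA_enabled", "username", "password", "token"}
ALLOWED_PREFIXES = ("https://", "http://", "git@", "ssh://")


def normalize_git_config(git_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate a git_config dict and fill in defaults."""
    unknown = set(git_config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unsupported git_config keys: {sorted(unknown)}")
    repo = git_config.get("repo")
    if not repo:
        raise ValueError("git_config must include a 'repo' URL")
    if not repo.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"Unsupported repository URL: {repo}")
    if "2FA_enabled" in git_config and not isinstance(git_config["2FA_enabled"], bool):
        raise ValueError("'2FA_enabled' must be a boolean")
    normalized = dict(git_config)  # do not mutate the caller's dict
    normalized.setdefault("branch", "master")
    return normalized


example = normalize_git_config({
    "repo": "https://github.com/example-org/example-training-code.git",  # placeholder URL
    "commit": "0123abc",  # placeholder commit hash
})
```

Only `repo` is mandatory here; every other key stays optional, matching the list above.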

How would this feature be used? Please describe.
This feature would allow users to reference training code stored in Git repositories without manually cloning them first. This is particularly useful for:

  1. CI/CD pipelines - Automatically pull the latest training code from a repository
  2. Team collaboration - Share training scripts via version control without S3 uploads
  3. Reproducibility - Pin to specific commits for exact code versioning

Describe alternatives you've considered
Current workarounds include:

  1. Manual cloning - Clone the repository locally before creating ModelTrainer, then use local source_dir
  2. S3 upload - Upload code to S3 and reference it via S3 URI in source_dir
  3. Use legacy estimators - Switch to older Estimator classes that support git_config
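
For reference, the manual-cloning workaround looks roughly like the sketch below. It only shells out to the `git` CLI; the function names are my own and nothing here is SDK API. The returned directory is what would then be passed as the local source_dir.

```python
import subprocess
import tempfile
from typing import List, Optional


def build_clone_commands(repo: str, dest: str, branch: Optional[str] = None,
                         commit: Optional[str] = None) -> List[List[str]]:
    """Return the git commands needed to clone `repo` into `dest`."""
    clone = ["git", "clone"]
    if branch:
        clone += ["--branch", branch]
    clone += [repo, dest]
    commands = [clone]
    if commit:
        # Pin the working tree to an exact commit for reproducibility.
        commands.append(["git", "-C", dest, "checkout", commit])
    return commands


def clone_to_temp_dir(repo: str, branch: Optional[str] = None,
                      commit: Optional[str] = None) -> str:
    """Clone into a fresh temporary directory and return its path."""
    dest = tempfile.mkdtemp(prefix="training-src-")
    for command in build_clone_commands(repo, dest, branch, commit):
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return dest
```

Every training script or pipeline has to repeat this boilerplate, which is what a built-in git_config would remove.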

Additional context
The SDK already has the infrastructure for this feature:

  • sagemaker.core.git_utils.git_clone_repo() handles Git cloning with authentication
  • sagemaker.core.git_utils._sanitize_git_url() provides security validation
  • JumpStart models (JumpStartModelInitKwargs) already support git_config
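
One possible shape for the integration is sketched below, purely as an illustration. `SourceCodeSketch` is a hypothetical stand-in for SourceCode, and `clone_fn` is an injected callable standing in for whatever clone utility the SDK would use; its signature is an assumption, not the signature of git_clone_repo(). The sketch also assumes that, when both are given, source_dir is interpreted relative to the repository root.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SourceCodeSketch:
    """Hypothetical stand-in for SourceCode with an added git_config field."""
    entry_script: str
    source_dir: Optional[str] = None
    git_config: Optional[Dict[str, object]] = None


def resolve_source_dir(
    source: SourceCodeSketch,
    clone_fn: Callable[[Dict[str, object]], str],
) -> SourceCodeSketch:
    """Return a config whose source_dir is local, cloning only when git_config is set."""
    if source.git_config is None:
        return source  # nothing to fetch; behave exactly as today
    checkout_root = clone_fn(source.git_config)  # path of the local checkout
    # Assumption: source_dir is relative to the repository root when both are given.
    local_dir = (
        os.path.join(checkout_root, source.source_dir)
        if source.source_dir
        else checkout_root
    )
    return SourceCodeSketch(entry_script=source.entry_script, source_dir=local_dir)
```

Resolving git_config into a plain local source_dir up front would let the rest of the training flow stay unchanged.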
