feat: add Dockerfile for GPU training with Megatron backend by saurabhbikram · Pull Request #557 · OpenPipe/ART

saurabhbikram · 2026-02-12T08:32:19Z

Summary

Adds a production Dockerfile for running Megatron-based RL training on GPU instances. Based on pytorch/pytorch:2.9.0-cuda12.8-cudnn9-devel with:

- megatron-core + megatron-bridge for distributed training
- flash-attn compiled from source (must match container torch ABI)
- Transformer Engine rebuilt from GitHub source (fixes ABI mismatch with PyPI prebuilt wheels compiled against NVIDIA's custom torch)
- grouped_gemm for MoE LoRA support
- Workaround for TE triton kernel incompatibility with Triton 3.5+ (core.get_int_dtype() not hashable by JIT)
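
Since most of the items above depend on package versions lining up inside the image, a small check run in the built container can help confirm what actually got installed. The following is a minimal sketch, not part of this PR: the helper names `installed_versions` and `triton_needs_workaround` are hypothetical, and the distribution names in `PACKAGES` are assumptions that may differ from what `pip list` reports in the image.

```python
import re
from importlib import metadata

# Assumed distribution names for the packages named in the PR description;
# adjust them to match what `pip list` shows inside the image.
PACKAGES = [
    "torch",
    "triton",
    "flash-attn",
    "transformer-engine",
    "megatron-core",
    "megatron-bridge",
    "grouped-gemm",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def triton_needs_workaround(version):
    """Return True for Triton 3.5 or newer, the range the PR says is affected."""
    match = re.match(r"(\d+)\.(\d+)", version or "")
    if match is None:
        # Unknown or missing version: nothing to conclude.
        return False
    return (int(match.group(1)), int(match.group(2))) >= (3, 5)


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    print("Triton workaround relevant:", triton_needs_workaround(found["triton"]))
```

This only reports versions; it does not detect an ABI mismatch. That would still surface as an import or symbol error when loading flash-attn or Transformer Engine against the container's torch.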

Also improves .dockerignore to exclude .git/, .github/, .claude/, .ruff_cache/, *.pyc, and example data/venv/wandb directories from the build context.

Files changed

- Dockerfile (new): multi-stage GPU training image
- .dockerignore: additional exclusions for cleaner build context

Adds a production Dockerfile based on pytorch/pytorch:2.9.0-cuda12.8 with all dependencies for Megatron-based RL training:

- megatron-core + megatron-bridge for distributed training
- flash-attn compiled from source to match container torch ABI
- Transformer Engine rebuilt from source (fixes PyPI wheel ABI mismatch)
- grouped_gemm for MoE LoRA support
- Workaround for TE triton kernel incompatibility with Triton 3.5+

Also improves .dockerignore to exclude .git/, .github/, .claude/, .ruff_cache/, *.pyc, and example data/venv/wandb directories.

https://claude.ai/code/session_017Y9KNNQX2RyVWnqpj3A4hh

saurabhbikram marked this pull request as ready for review February 12, 2026 08:47

saurabhbikram wants to merge 1 commit into OpenPipe:main from nansen-ai:add-gpu-training-dockerfile
