Occasional very slow XLA compilation during Stage 1 (jit__model) on CentOS 7 + A100 #347

@lg1243610235-cmyk

Hi Martin,

First of all, thank you very much for your excellent work. Thanks to your tool, a Linux beginner like me was able to successfully run the latest protein binder design workflow. I really appreciate the effort you’ve put into making this accessible.

System information

OS: CentOS 7 (Linux server)

RAM: 1 TB

GPU: NVIDIA A100 80GB

NVIDIA driver: 525.89

Job execution: running in background using nohup ... &

Issue description

While running the program, I noticed that the job occasionally logs a slow-compile warning during Stage 1: Test Logits. The warning appears in the nohup.out log file.

This does not happen for every trajectory. Some trajectories run normally, while others trigger this warning. When it happens, the compile step may take ~30 minutes to 1 hour before continuing.

Below is a typical log snippet:

Stage 1: Test Logits
2025-12-12 22:52:35.072047: E external/xla/xla/service/slow_operation_alarm.cc:73]
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2025-12-12 22:53:05.373007: E external/xla/xla/service/slow_operation_alarm.cc:140] The operation took 2m30.301582371s
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.

1 models [2] recycles 1 hard 0 soft 0.02 temp 1 loss 15.62 helix 1.79 pae 0.91 i_pae 0.95 con 5.11 i_con 4.80 plddt 0.22 ptm 0.66 i_ptm 0.08 rg 18.89
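
To quantify how often this happens and how long each slow compile takes, the durations can be pulled out of the log. Below is a minimal sketch that assumes the warnings always use the "The operation took ..." format shown in the snippet above. The function name `slow_compile_seconds` and the regular expression are illustrative and not part of the tool.

```python
import re

# Matches durations such as "2m30.301582371s" or "1h2m3.5s" in lines like
# "... slow_operation_alarm.cc:140] The operation took 2m30.301582371s".
# Assumption: hours and minutes are optional and seconds are always present.
DURATION_RE = re.compile(
    r"The operation took (?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?P<s>\d+(?:\.\d+)?)s"
)


def slow_compile_seconds(log_text):
    """Return every reported slow-operation duration in log_text, in seconds."""
    durations = []
    for match in DURATION_RE.finditer(log_text):
        hours = int(match.group("h") or 0)
        minutes = int(match.group("m") or 0)
        seconds = float(match.group("s"))
        durations.append(hours * 3600 + minutes * 60 + seconds)
    return durations


# Hypothetical usage against the log file written by nohup:
# with open("nohup.out") as handle:
#     print(slow_compile_seconds(handle.read()))
```

Counting the returned values against the number of trajectories would show what fraction of runs hit the slow path.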

Questions

1. Is this behavior expected for certain trajectories or input sizes?

2. What could be the underlying reason for such slow XLA compilation?

3. Are there any recommended solutions or workarounds?

If you need any additional logs or environment details, I would be happy to provide them.
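
For example, the XLA dump that the warning message asks for could be collected by setting `XLA_FLAGS` before JAX starts. This is a rough sketch under a few assumptions: the helper name `with_xla_dump` is hypothetical, `/tmp/foo` is simply the path suggested in the warning, and the variable has to be set before JAX initializes for the flag to take effect.

```python
import os


def with_xla_dump(env, dump_dir):
    """Return a copy of env with --xla_dump_to=<dump_dir> appended to XLA_FLAGS.

    Any flags already present in XLA_FLAGS are preserved.
    """
    new_env = dict(env)
    existing = new_env.get("XLA_FLAGS", "").strip()
    flag = f"--xla_dump_to={dump_dir}"
    new_env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return new_env


# Hypothetical usage at the very top of the launch script, before importing jax:
# os.environ.update(with_xla_dump(os.environ, "/tmp/foo"))
```

The same effect can be had by exporting `XLA_FLAGS` in the shell before the `nohup ... &` command. The helper only matters if other XLA flags are already set and should be kept.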

Thanks again for your great work!

Best regards
