Occasional very slow XLA compilation during Stage 1 (jit__model) on CentOS 7 + A100

Hi Martin,

First of all, thank you very much for your excellent work. Thanks to your tool, a Linux beginner like me was able to successfully run the latest protein binder design workflow. I really appreciate the effort you’ve put into making this accessible.

System information

OS: CentOS 7 (Linux server)

RAM: 1 TB

GPU: NVIDIA A100 80GB

NVIDIA driver: 525.89

Job execution: running in background using nohup ... &

Issue description

While running the program, I noticed that the job occasionally logs an XLA slow-compile warning during Stage 1: Test Logits. The warning is written to the nohup.out log file.

This does not happen for every trajectory. Some trajectories run normally, while others trigger this warning. When it happens, the compile step may take ~30 minutes to 1 hour before continuing.

Below is a typical log snippet:

Stage 1: Test Logits 
2025-12-12 22:52:35.072047: E external/xla/xla/service/slow_operation_alarm.cc:73]
********************************
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
2025-12-12 22:53:05.373007: E external/xla/xla/service/slow_operation_alarm.cc:140] The operation took 2m30.301582371s

********************************
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
1 models [2] recycles 1 hard 0 soft 0.02 temp 1 loss 15.62 helix 1.79 pae 0.91 i_pae 0.95 con 5.11 i_con 4.80 plddt 0.22 ptm 0.66 i_ptm 0.08 rg 18.89
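
In case it helps to quantify how often this happens, here is a minimal sketch of how I could tally the alarm durations from nohup.out. It assumes every alarm ends with a "The operation took ..." line in the format shown above, with durations written like 2m30.301582371s; the function names are my own and not part of the tool.

```python
import re

# Assumed format, based on the alarm line in the log snippet above:
#   "... slow_operation_alarm.cc:140] The operation took 2m30.301582371s"
ALARM_RE = re.compile(r"slow_operation_alarm\.cc:\d+\] The operation took (\S+)")
DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?$")


def parse_duration(text):
    """Convert a duration such as '2m30.3s' or '1h2m3s' to seconds."""
    match = DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def slow_compile_durations(lines):
    """Return the duration in seconds of every slow-compile alarm in lines."""
    durations = []
    for line in lines:
        found = ALARM_RE.search(line)
        if found:
            durations.append(parse_duration(found.group(1)))
    return durations


if __name__ == "__main__":
    sample = [
        "Stage 1: Test Logits",
        "2025-12-12 22:53:05.373007: E external/xla/xla/service/"
        "slow_operation_alarm.cc:140] The operation took 2m30.301582371s",
    ]
    for seconds in slow_compile_durations(sample):
        print(f"slow compile: {seconds:.1f} s")
```

To run this on the real log, I would pass open("nohup.out") in place of the sample list.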

Questions

1. Is this behavior expected for certain trajectories or input sizes?

2. What could be the underlying reason for such slow XLA compilation?

3. Are there any recommended solutions or workarounds?

If you need any additional logs or environment details, I would be happy to provide them.
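
For example, I could rerun with the XLA dump that the alarm message asks for. Below is a minimal sketch of how I would set the flag without overwriting any XLA_FLAGS already present in the environment. The helper name with_xla_dump is hypothetical, and /tmp/foo is simply the directory from the log message.

```python
import os


def with_xla_dump(env, dump_dir):
    """Return a copy of env with --xla_dump_to=<dump_dir> appended to XLA_FLAGS."""
    new_env = dict(env)
    flag = f"--xla_dump_to={dump_dir}"
    existing = new_env.get("XLA_FLAGS", "").strip()
    # Keep any flags that are already set and add the dump flag at the end.
    new_env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return new_env


if __name__ == "__main__":
    env = with_xla_dump(os.environ, "/tmp/foo")
    # env can then be passed to subprocess.run(..., env=env) when launching the job.
    print(env["XLA_FLAGS"])
```

The variable has to be set before the job starts, since XLA reads its flags when it initializes.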

Thanks again for your great work!

Best regards
