Description
Hi Martin,
First of all, thank you very much for your excellent work. Thanks to your tool, a Linux beginner like me was able to successfully run the latest protein binder design workflow. I really appreciate the effort you’ve put into making this accessible.
System information
OS: CentOS 7 (Linux server)
RAM: 1 TB
GPU: NVIDIA A100 80GB
NVIDIA driver: 525.89
Job execution: running in background using nohup ... &
Issue description
While running the program, I noticed that the job occasionally reports a very slow XLA compilation during Stage 1: Test Logits. The warning appears in the nohup.out log file.
This does not happen for every trajectory. Some trajectories run normally, while others trigger this warning. When it happens, the compile step may take ~30 minutes to 1 hour before continuing.
Below is a typical log snippet:
Stage 1: Test Logits
2025-12-12 22:52:35.072047: E external/xla/xla/service/slow_operation_alarm.cc:73]
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2025-12-12 22:53:05.373007: E external/xla/xla/service/slow_operation_alarm.cc:140] The operation took 2m30.301582371s
[Compiling module jit__model] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
1 models [2] recycles 1 hard 0 soft 0.02 temp 1 loss 15.62 helix 1.79 pae 0.91 i_pae 0.95 con 5.11 i_con 4.80 plddt 0.22 ptm 0.66 i_ptm 0.08 rg 18.89
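To quantify how often this happens and how long each stall lasts, the reported durations can be pulled out of nohup.out. The following is a minimal sketch, assuming the alarm lines keep the "The operation took ..." format shown above; the function name slow_compile_seconds is hypothetical and not part of the tool.

```python
import re

# Matches the duration in lines such as
# "... slow_operation_alarm.cc:140] The operation took 2m30.301582371s".
# Assumption: durations are printed as optional hours and minutes, then seconds.
DURATION_RE = re.compile(r"The operation took (?:(\d+)h)?(?:(\d+)m)?([\d.]+)s")


def slow_compile_seconds(log_text):
    """Return the duration, in seconds, of every slow-operation report in log_text."""
    durations = []
    for hours, minutes, seconds in DURATION_RE.findall(log_text):
        total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)
        durations.append(total)
    return durations


# Example using the alarm line from the log snippet above.
sample = (
    "2025-12-12 22:53:05.373007: E external/xla/xla/service/"
    "slow_operation_alarm.cc:140] The operation took 2m30.301582371s"
)
print(slow_compile_seconds(sample))
```

Running it on the whole file, for example with slow_compile_seconds(open("nohup.out").read()), would give one entry per reported slow compile.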
Questions
1. Is this behavior expected for certain trajectories or input sizes?
2. What could be the underlying reason for such slow XLA compilation?
3. Are there any recommended solutions or workarounds?
If you need any additional logs or environment details, I would be happy to provide them.
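The XLA dump that the log message asks for could be collected by setting XLA_FLAGS before the job starts. This is only a sketch of one way to do that from Python: the helper env_with_xla_dump is hypothetical, /tmp/foo is simply the directory named in the log message, and the launch command is left as a commented placeholder.

```python
import os


def env_with_xla_dump(dump_dir, base_env=None):
    """Return a copy of the environment with --xla_dump_to appended to XLA_FLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("XLA_FLAGS", "")
    # Keep any flags that are already set and append the dump flag.
    env["XLA_FLAGS"] = f"{existing} --xla_dump_to={dump_dir}".strip()
    return env


env = env_with_xla_dump("/tmp/foo", base_env={})
print(env["XLA_FLAGS"])

# The resulting environment would then be passed to the actual job, e.g.
# subprocess.run([...], env=env_with_xla_dump("/tmp/foo"))
# where [...] stands for the real launch command (not shown here).
```

The flag has to be in the environment before JAX initializes, which is why it is set on the process environment rather than inside the running job.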
Thanks again for your great work!
Best regards