Fix Boltz-2 pLDDT/PAE indexing for residue-level scores #24
This PR fixes two issues when running `ipsae.py` on Boltz-2 structures:

- `plddt_AAAAA_model_0.npz` and `pae_AAAAA_model_0.npz` were indexed with `token_array.astype(bool)`, which assumes a 1:1 correspondence between the Boltz-1 pLDDT/PAE vectors and the `token_mask` built from the mmCIF `_atom_site` table. For some Boltz-2 outputs this does not hold and leads to `IndexError: index XXX is out of bounds for axis 0 with size Y`, raised from lines such as `mean_plddt = cb_plddt[list(pDockQ_unique_residues[chain1][chain2])].mean()`.
- `cb_plddt` could end up with a length different from the number of scored residues (`numres`), while downstream code assumes residue-level arrays of length `numres` (e.g. for pDockQ, ipSAE by residue).

What this change does
For `boltz-1`/`boltz-2` inputs we now:

- Load `plddt` from `plddt_*.npz`, scale it to 0–100, and then:
  - if `len(plddt) >= max(CA_atom_num)+1`, treat it as per-atom and build residue-level `plddt`/`cb_plddt` using `CA_atom_num`/`CB_atom_num` (same strategy as the AF3 code path);
  - if `len(plddt) == numres`, treat it as per-residue and use it directly;
  - otherwise, adjust it to length `numres` with a warning so that downstream calculations never hit an out-of-bounds error.
- Load `pae` from `pae_*.npz` and ensure `pae_matrix` is `(numres, numres)`:
  - if it is larger, crop it to `[:numres, :numres]`;
  - if it is already `numres x numres`, use it as-is.

This makes sure that all residue-level arrays (`plddt`, `cb_plddt`, `pae_matrix`) are consistent with the rest of the script, and fixes the Boltz-2 crashes I was seeing in practice.
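For reviewers who want the branching at a glance, the normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: the helper names `residue_plddt` and `square_pae` are hypothetical, the fallback is assumed to zero-pad or truncate (the actual fallback may differ), and raising on a too-small PAE matrix is likewise an assumption. The per-atom check here also covers the CB indices, which is slightly stricter than the `max(CA_atom_num)+1` test.

```python
import warnings

import numpy as np


def residue_plddt(plddt_raw, ca_atom_num, cb_atom_num, numres):
    """Return (plddt, cb_plddt), both of length numres, on a 0-100 scale."""
    plddt = 100.0 * np.asarray(plddt_raw, dtype=float)  # assumes 0-1 input
    ca = np.asarray(ca_atom_num, dtype=int)
    cb = np.asarray(cb_atom_num, dtype=int)

    # Per-atom case: pick the CA / CB entry of every residue.
    # Assumption: CB indices must fit as well, so the lookup cannot overflow.
    if ca.size and len(plddt) >= max(ca.max(), cb.max()) + 1:
        return plddt[ca], plddt[cb]

    # Per-residue case: one value per residue, used for both arrays.
    if len(plddt) == numres:
        return plddt, plddt.copy()

    # Fallback (assumed behaviour): truncate or zero-pad to numres and warn.
    warnings.warn(
        f"pLDDT length {len(plddt)} does not match numres={numres}; adjusting"
    )
    fixed = np.zeros(numres)
    n = min(numres, len(plddt))
    fixed[:n] = plddt[:n]
    return fixed, fixed.copy()


def square_pae(pae, numres):
    """Return a (numres, numres) PAE matrix, cropping a larger one."""
    pae = np.asarray(pae, dtype=float)
    if pae.ndim != 2 or pae.shape[0] < numres or pae.shape[1] < numres:
        # Not covered by the description above; raising is an assumption.
        raise ValueError(f"PAE shape {pae.shape} is smaller than numres={numres}")
    return pae[:numres, :numres]
```

Whichever branch is taken, both returned pLDDT arrays have length `numres`, which is the invariant the downstream pDockQ and ipSAE code relies on.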
Manual testing
Ran `ipsae.py` on Boltz-2 outputs (`structure.cif`, `plddt_*.npz`, `pae_*.npz`, `confidence_*.json`) where the previous version raised:

- `IndexError: index 604 is out of bounds for axis 0 with size 604`
- `IndexError: index 600 is out of bounds for axis 0 with size 600`

With this patch, `ipsae.py` completes successfully and produces scores for all chain pairs (including pDockQ, pDockQ2, LIS and the various ipSAE variants).
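For context, the message above is what NumPy raises when a residue index list reaches one past the end of a residue-level array. The snippet below is a standalone illustration with made-up values (the array contents and the index list are hypothetical, not taken from the test structures):

```python
import numpy as np

# Hypothetical residue-level array that is one entry shorter than the
# residue numbering used to index it.
cb_plddt = np.full(604, 90.0)
interface_residues = [10, 603, 604]  # 604 is one past the last valid index

try:
    mean_plddt = cb_plddt[interface_residues].mean()
except IndexError as exc:
    message = str(exc)
    print(message)
```

Once every residue-level array is guaranteed to have length `numres`, this kind of lookup stays in bounds.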
There are no changes to AF2/AF3 paths.