
mcfrith commented Oct 13, 2025

This adds a -d option to esl-mixdchlet fit, which restricts the fit to mixture Dirichlets that are DNA-strand-symmetric (for length-4 count vectors holding counts of A, C, G, T).

In other words, the mixture Dirichlet is unchanged if we swap the identities of A ↔ T and G ↔ C.
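
To make the symmetry condition concrete, here is a minimal Python sketch (an illustration only, not the C code in this PR). It assumes each alpha vector is stored in A, C, G, T order, so the swap A ↔ T, G ↔ C is just a reversal of the length-4 vector. The names `complement` and `is_strand_symmetric` are hypothetical, and the mixture is treated as an unordered set of (weight, alpha) components.

```python
from math import isclose

def complement(alpha):
    """Swap A<->T and C<->G for a length-4 vector in A, C, G, T order (assumed order)."""
    return list(reversed(alpha))

def is_strand_symmetric(weights, alphas, tol=1e-9):
    """Return True if complementing every component leaves the mixture unchanged.

    Components may be permuted by the swap, so each complemented component
    is matched against a not-yet-used original component.
    """
    original = list(zip(weights, alphas))
    unused = list(original)
    for w, alpha in original:
        swapped = complement(alpha)
        for i, (w2, alpha2) in enumerate(unused):
            same_weight = isclose(w, w2, abs_tol=tol)
            same_alpha = all(isclose(x, y, abs_tol=tol) for x, y in zip(swapped, alpha2))
            if same_weight and same_alpha:
                del unused[i]  # each original component can be matched only once
                break
        else:
            return False  # no partner found for this complemented component
    return True
```

For example, two equally weighted components with alphas [1, 2, 3, 4] and [4, 3, 2, 1] pass this check, while the same alphas with weights 0.3 and 0.7 do not.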

This is appropriate for DNA where the choice of "forward" strand is arbitrary (for example, transcriptional enhancers).

So, this may be useful for nhmmer.

I've probably got something wrong with (e.g.) the coding conventions...

mcfrith commented Oct 23, 2025

Sorry for the churn. This version is a bit more general. If the number of mixture components is even, they will be paired so that each pair is DNA-strand-symmetric. If the number of components is odd, the final "odd" component will be unpaired and self-symmetric.
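
The pairing described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PR's implementation: it assumes A, C, G, T order, assumes the two members of a pair split the pair's weight equally, and uses a hypothetical helper name `build_symmetric_mixture` with a hypothetical parameterization of the odd component by its (A, C) values.

```python
def build_symmetric_mixture(pair_weights, pair_alphas, odd_weight=0.0, odd_half=None):
    """Build a strand-symmetric mixture from free parameters (illustrative only).

    pair_weights[i]: total weight of pair i, split equally between its members.
    pair_alphas[i]:  alpha (A, C, G, T) of the first member; the second member
                     is its complement, i.e. the reversed vector.
    odd_half:        optional (alpha_A, alpha_C) for a final unpaired component,
                     mirrored so that alpha_T == alpha_A and alpha_G == alpha_C.
    """
    weights, alphas = [], []
    for w, alpha in zip(pair_weights, pair_alphas):
        weights += [w / 2, w / 2]
        alphas += [list(alpha), list(reversed(alpha))]
    if odd_half is not None:
        a, c = odd_half
        weights.append(odd_weight)
        alphas.append([a, c, c, a])  # self-symmetric under A<->T, C<->G
    total = sum(weights)
    return [w / total for w in weights], alphas  # weights normalized to sum to 1

# Three components: one symmetric pair plus one self-symmetric "odd" component.
weights, alphas = build_symmetric_mixture(
    [0.6], [[1.0, 2.0, 3.0, 4.0]], odd_weight=0.4, odd_half=(0.5, 1.5)
)
```

Only the first member of each pair and half of the odd component are free parameters here, which is one way a fit could be restricted to symmetric mixtures.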
