-
Notifications
You must be signed in to change notification settings - Fork 769
Parallelizes MDAnalysis.analysis.InterRDF and MDAnalysis.analysis.InterRDF_s
#4884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2025-01-20 20:35:32 UTC |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #4884 +/- ##
========================================
Coverage 93.88% 93.89%
========================================
Files 180 180
Lines 22422 22447 +25
Branches 3186 3188 +2
========================================
+ Hits 21052 21077 +25
Misses 902 902
Partials 468 468 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@tanishy7777 thanks for the prompt PR! I have some comments though:
Can we just sum it up separately though? I mean, make it a part of |
Got it! I have made |
|
when trying to make I tried to find out why this is happening by looking at the result itself which is being aggregated along 'count' so the so I tried finding the dimensions of the arrays manually because it wasnt converting to a numpy array. so its not able to convert to a numpy array, because of inconsistent dimensions. I am not sure how to resolve this here |
|
based on the above comment #4884 (comment) I think we can mark because its not possible to convert array of inhomogenous dimensions to a numpy array and since Because after the self.results.count[i] / normwont be possible as division is not supported between |
|
@tanishy7777 the class should be marked as non-parallelizable only if the algorithm to run it is not actually parallelizable, which I'm not yet convinced is the case for all the mentioned classes. But I think you're on the right path here, you just need to implement a custom aggregation function, instead of those implemented among Can you describe what kind of arrays you're trying to aggregate and can not find an appropriate function for? I didn't quite get it from your screenshots. |
it should be the same as if you'd run it without parallelization. And I assume you want to stack/sum/whatever along the dimension that corresponds to the timestep -- you can probably guess which one it is if you run it on some example with known number of frames. Example trajectories you can find in MDAnalysisTests. |
Got it. Will work on that! |
|
used a custom aggregator for
Thanks @marinegor for your help! |
analysis.rdf.InterRDF and analysis.rdf.InterRDF_s
analysis.rdf.InterRDF and analysis.rdf.InterRDF_sMDAnalysis.InterRDF and MDAnalysis.InterRDF_s
MDAnalysis.InterRDF and MDAnalysis.InterRDF_s MDAnalysis.analysis.InterRDF and MDAnalysis.analysis.InterRDF_s
|
@orbeckst @RMeli @marinegor I think this is ready to be merged, can you please review it? |
…ysis into parallize_rdf
|
@marinegor Sorry for the delayed action on this PR again, college has been quite hectic. I have added the changes that you had requested. For the formatting issues caused by black, I merged develop into my branch so I think it should be resolved. Please let me know if any further changes are needed. Thanks for reviewing my changes and for your guidance! |
|
@tanishy7777 thanks for all your contributions. As you know, reviewer time is pretty precious. I asked @marinegor to look at PR #4896 already so please be understanding if this one may take longer. |
Thanks for you reply. Completely understandable, no worries at all. |
|
@tanishy7777 just as a heads-up: Egor told us he has very little availability, so we'll have to find other reviewers to continue here. As the first step where you can help is to just resolve the conflict to get the PR back into mergeable state. |
|
@p-j-smith would you be in a position to review and look after this parallelization PR? Egor has already done a lot of deep reviewing here but as he can't do it at the moment and we'd like to move this improvement along, it would be great to have your 👀 on it. I am tentatively assigning you but if you can't do it, please un-assign yourself and let me know with a comment + ping and then I'll be looking for someone else. Thanks!! |
p-j-smith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work @tanishy7777 - just a couple of small changes to simplify the tests. And please merge in the latest changes from develop
p-j-smith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tanishy7777 for all your work on this!
All comments have been addressed by @tanishy7777
|
Thank you @tanishy7777 for the work and 🎉 ! Thank you @marinegor for the initial reviews and thank you @p-j-smith for getting it merged! |
|
While looking at the RDF PR #5073 I realized that some of the CHANGELOG and versionchanged entries on this PR were outdated. I fixed them as part of finalizing the other PR in https://github.com/gitzhangch/mdanalysis/commit/477b0e2efc8271f70b6b2fd318b8033e46bd5ee5 |
Sorry for missing that, and thanks for fixing! |
|
No problem :-) |
* Fixes #5067 * Refactored bin index calculation in analysis.rdf.InterRDF_s._single_frame to avoid multiple redundant np.histogram calls, improving performance; at least 20x for small systems, potentially >100x for realistic systems. * fixed change entries for PR #4884 on RDF parallelization - corrected version changed dates for PR #4884 in analysis.rdf to 2.10.0 - moved entry in CHANGELOG to 2.10.0 * add benchmark for InterRDF_s * Update CHANGELOG * update AUTHORS --------- Co-authored-by: Oliver Beckstein <orbeckst@gmail.com> Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>










Fixes #4675
Changes made in this Pull Request:
rdf.InterRDFandrdf.InterRDF_sTLDR: of the comments below: Initially I thought
rdfisnt parallizable but turns out both classes inrdfcan be parallelized.PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4884.org.readthedocs.build/en/4884/