-
Notifications
You must be signed in to change notification settings - Fork 63
Feature: shmem init error checking #1997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…it_error_checking
…it_error_checking
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1997 +/- ##
=======================================
Coverage 77.05% 77.05%
=======================================
Files 112 112
Lines 18959 18962 +3
=======================================
+ Hits 14608 14611 +3
Misses 4351 4351 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…it_error_checking
…' into feature-shmem_init_error_checking
…g' into feature-shmem_init_error_checking
…it_error_checking
| #ifndef T8_ENABLE_MPI | ||
| // If we do not use MPI, there is nothing to do. | ||
| // We only have a single process. | ||
| return 1; | ||
| #endif | ||
| #ifndef SC_ENABLE_MPICOMMSHARED | ||
| SC_ABORT ("Trying to use shared memory but SC_ENABLE_MPICOMMSHARED is not set. This should not happen if you use MPI " | ||
| "v.3.0 or higher. Maybe related to https://github.com/DLR-AMR/t8code/pull/1996."); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only the case when MPI is not linked or also when the number of ranks is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the number of ranks is 1, but linked against MPI, it is save to use shared memory.
SC_ENABLE_MPICOMMSHARED will be defined and that case and this error will not occur.
| * that concentrates all trees at one process. */ | ||
| t8_shmem_init (sc_MPI_COMM_WORLD); | ||
| const int intranode_size = t8_shmem_init (sc_MPI_COMM_WORLD); | ||
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; | |
| ASSERT_NE (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel more comfortable keeping GT, since >0 is a more restrictive condition then !=0.
If the intranode_size should be <0 then something went wrong and the memory was not initialized.
| * that concentrates all trees at one process. */ | ||
| t8_shmem_init (comm); | ||
| const int intranode_size = t8_shmem_init (comm); | ||
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; | |
| ASSERT_NE (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
| /* setup shared memory usage */ | ||
| t8_shmem_init (comm); | ||
| const int intrasize_from_init = t8_shmem_init (comm); | ||
| ASSERT_GT (intrasize_from_init, 0) << "Error in t8_shmem_init. No intranode communicator set."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_GT (intrasize_from_init, 0) << "Error in t8_shmem_init. No intranode communicator set."; | |
| ASSERT_NE (intrasize_from_init, 0) << "Error in t8_shmem_init. No intranode communicator set."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
| /* setup shared memory usage */ | ||
| t8_shmem_init (comm); | ||
| const int intranode_size = t8_shmem_init (comm); | ||
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; | |
| ASSERT_NE (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
| /* setup shared memory usage */ | ||
| t8_shmem_init (comm); | ||
| const int intranode_size = t8_shmem_init (comm); | ||
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ASSERT_GT (intranode_size, 0) << "Could not initialize shared memory."; | |
| ASSERT_NE (intranode_size, 0) << "Could not initialize shared memory."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
| if (forest->global_first_desc == NULL) { | ||
| /* Set the shmem array type of comm */ | ||
| t8_shmem_init (comm); | ||
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); | |
| SC_CHECK_ABORT (t8_shmem_init (comm) != 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
| if (forest->tree_offsets == NULL) { | ||
| /* Set the shmem array type of comm */ | ||
| t8_shmem_init (comm); | ||
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); | |
| SC_CHECK_ABORT (t8_shmem_init (comm) != 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
| T8_ASSERT (forest->element_offsets == NULL); | ||
| /* Set the shmem array type to comm */ | ||
| t8_shmem_init (comm); | ||
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not partition forest."); | |
| SC_CHECK_ABORT (t8_shmem_init (comm) != 0, "Error in shared memory setup. Could not partition forest."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
|
|
||
| /* Try to set the comm type */ | ||
| t8_shmem_init (comm); | ||
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not load cmesh."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup. Could not load cmesh."); | |
| SC_CHECK_ABORT (t8_shmem_init (comm) != 0, "Error in shared memory setup. Could not load cmesh."); |
| tree_offset = cmesh->first_tree_shared ? -cmesh->first_tree - 1 : cmesh->first_tree; | ||
| if (cmesh->tree_offsets == NULL) { | ||
| t8_shmem_init (comm); | ||
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SC_CHECK_ABORT (t8_shmem_init (comm) > 0, "Error in shared memory setup."); | |
| SC_CHECK_ABORT (t8_shmem_init (comm) != 0, "Error in shared memory setup."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the != 0 fits better the true/false version of C, e.g. true being not zero.
|
Thanks for the review. I argue for keeping the >0 condition everywhere since it will catch more error cases. There was one comment by @sandro-elsweijer which i believe i have answered positively. |
Describe your changes here:
Related to #1985
The shared memory was often not initialized properly but we did not detect it.
I added an error code to
t8_shmem_initfollowing the convention ofsc_shmem_init.This actually caused many of our tests to fail giving more reason to this feature.
The errors are fixed with #1996 which should be merged first.
When the error code is met, we currently abort the program.
The reasoning is that shared memory should always be available to us, since it is part of MPI >=3.0
(This creates the new issue of requiring MPI 3.0 in our build system).
All these boxes must be checked by the AUTHOR before requesting review:
Documentation:,Bugfix:,Feature:,Improvement:orOther:.All these boxes must be checked by the REVIEWERS before merging the pull request:
As a reviewer please read through all the code lines and make sure that the code is fully understood, bug free, well-documented and well-structured.
General
Tests
If the Pull request introduces code that is not covered by the github action (for example coupling with a new library):
Scripts and Wiki
script/find_all_source_files.scpto check the indentation of these files.License
doc/(or already has one).