Update nancheck to ensure exit using openmp #239

Open: wants to merge 6 commits into main

KAClough (Member) commented Jul 10, 2023

Fixes #238.

It also (just because I already had this in my code) outputs physical coordinates instead of integer ones when m_dx is passed by the user.

@mirenradia would you support moving the NanCheck to specificPostTimeStep() instead of specificAdvance()? I think calling it in specificAdvance() is a historic thing: I don't see that you gain much by exiting during an RK4 substep rather than at the end of the step, and it means the check runs a lot more times. If so, I would update this in the Examples here.

KAClough added the bug label Jul 10, 2023
KAClough self-assigned this Jul 10, 2023
KAClough requested a review from mirenradia July 10, 2023 11:08
mirenradia (Member) commented:

Since this conflicts with changes in #236, let's wait until that is merged before rebasing the changes here onto that.

This commit addresses Issue 238 primarily in the NanCheck.cpp file,
where the switch to the master thread (#pragma omp master) is placed
AFTER the start of the "if" statement involving the "stop" variable,
rather than before.

I suspect that, since the "stop" variable was altered under the "atomic
write" OpenMP directive, if any thread other than the master thread caught
the NaN, the updated value of "stop" would not be visible to the master
thread, and thus the if statement leading to the error trap would not be
tripped when evaluated on the master thread. This change effectively
moves the evaluation of that logical statement to each OpenMP thread
individually.

This commit also builds an error trip into the ScalarField example:
in InitialScalarData I set the value of the extrinsic curvature K
somewhere in the box to NaN. This version also has a commented-out
NanCheck right after the initial data, which can be used to debug the
current issue (see end).

I also added the specificPostTimeStep function to the ScalarField
example, and moved the NanCheck that was called in specificAdvance to
this function.

PS. Another way to solve this problem could be to transfer the "stop"
variable to the master thread in some way before the logical statement.
This may also address an error-output issue present in the current
commit: if the error is caught on a non-master thread, it is printed by
that same thread. We may need to look into this in the future.

Successfully merging this pull request may close these issues:
Simulation sometimes fails to abort when NanCheck finds a NaN with >1 OpenMP thread.