odls/base: Fix abnormal cleanup when app is wrapped #3337
Closed
`hello_c` crashes and `wrapper` detects it, then `wrapper` will exit with a non-zero exit status. The orted will notice that and
start a kill process for all local processes.
`SIGKILL` to the `wrapper` process, and that process will terminate and leave
`hello_c` running. `hello_c` will continue to run (in this test case it will wait in
`MPI_Finalize`) and the job will seem to hang.
going to wait on it. This prevents orted_cmd from seeing the process
as alive and waiting for it to complete (note that the pid is set
to `0`, so we wouldn't be able to mark it correctly later even if we did get a notice).
Instead of sending the `SIGKILL` signal to just the `PID` of `wrapper`, send it to
`-PID` so that the kernel will send the signal to the whole process group under
`wrapper` as well. This will cause the `hello_c` program to terminate as well.
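
The difference between signalling `PID` and `-PID` can be reproduced outside of ORTE. The following is a minimal Python sketch, not the odls code itself: `spawn_wrapped_app`, `app_exited`, and `kill_and_check` are hypothetical helper names, the forked children only stand in for `wrapper` and `hello_c`, and it assumes the wrapper is the leader of its own process group (done here with `os.setpgid(0, 0)`).

```python
import os
import select
import signal


def spawn_wrapped_app():
    """Fork a stand-in 'wrapper' that leads its own process group and forks
    a stand-in 'app'. Returns (wrapper_pid, alive_r); alive_r reaches EOF
    only once the app is gone."""
    alive_r, alive_w = os.pipe()
    wrapper_pid = os.fork()
    if wrapper_pid == 0:
        try:
            os.close(alive_r)
            os.setpgid(0, 0)  # assumption: wrapper is its own group leader
            if os.fork() == 0:
                # App: stands in for hello_c blocked in MPI_Finalize.
                os.write(alive_w, b"x")
                while True:
                    signal.pause()
            # Wrapper: drop its copy so only the app keeps the pipe open.
            os.close(alive_w)
            while True:
                signal.pause()
        finally:
            os._exit(1)
    os.close(alive_w)
    os.read(alive_r, 1)  # block until the app is up and in the group
    return wrapper_pid, alive_r


def app_exited(alive_r, timeout=1.0):
    """True if the app closed its pipe end (that is, died) within timeout."""
    ready, _, _ = select.select([alive_r], [], [], timeout)
    return bool(ready) and os.read(alive_r, 1) == b""


def kill_and_check(target_group):
    """SIGKILL either the wrapper PID or -PID (its whole process group) and
    report whether the app went away too."""
    wrapper_pid, alive_r = spawn_wrapped_app()
    target = -wrapper_pid if target_group else wrapper_pid
    os.kill(target, signal.SIGKILL)
    os.waitpid(wrapper_pid, 0)  # reap the wrapper
    exited = app_exited(alive_r)
    if not exited:
        # Clean up the orphaned app so the demo leaves nothing behind.
        os.killpg(wrapper_pid, signal.SIGKILL)
    os.close(alive_r)
    return exited


if __name__ == "__main__":
    print("app exited after kill(PID): ", kill_and_check(target_group=False))
    print("app exited after kill(-PID):", kill_and_check(target_group=True))
```

The sketch detects the app's exit through a pipe rather than by probing its PID, since the app is a grandchild of the caller and cannot be reaped with `waitpid` from there.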