
Conversation

weslleyspereira
Collaborator

Try simple solution to solve #69.

@drhpc

drhpc commented Jul 27, 2022

Is this a proper solution? Isn't the issue that the parallelization triggers NWIN=0 at all?

@weslleyspereira
Collaborator Author

Is this a proper solution? Isn't the issue that the parallelization triggers NWIN=0 at all?

Thanks for asking. The answer has 2 parts:

A. This is a proper solution for the test cases. Some other reasons for this fix:

  1. The update part does not make sense if NWIN=0.
  2. I verified that the tests result in FLOPS = 0 when NWIN=0. According to the algorithm logic, the update should be skipped when FLOPS = 0. I could have proposed a fix like the following:
      IF( FLOPS.NE.0 ) THEN
         IF( ( FLOPS*100 ) / ( 2*NWIN*NWIN ) .GE. MMULT ) THEN
            [...]
         ELSE
            [...]
         END IF
      END IF

But this solution, in my opinion, would still be a problem if for some reason NWIN=0 while FLOPS is nonzero. My solution is based on reason 1.
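
To make this concrete, here is a minimal, self-contained sketch (not the ScaLAPACK source; the MMULT threshold of 50 is just a placeholder) of why the branch test itself is unsafe when NWIN=0: it divides by 2*NWIN*NWIN, which is zero in that case.

      PROGRAM NWIN0
*     Minimal sketch, not ScaLAPACK code: the branch test
*     ( FLOPS*100 ) / ( 2*NWIN*NWIN ) divides by zero when NWIN = 0,
*     so the whole update block is guarded by NWIN > 0 instead.
      INTEGER            FLOPS, NWIN, MMULT
      PARAMETER          ( MMULT = 50 )
      FLOPS = 0
      NWIN  = 0
      IF( NWIN.GT.0 ) THEN
         IF( ( FLOPS*100 ) / ( 2*NWIN*NWIN ) .GE. MMULT ) THEN
            WRITE( *, * ) 'Update with matrix-matrix multiplications'
         ELSE
            WRITE( *, * ) 'Update with direct transformations'
         END IF
      ELSE
         WRITE( *, * ) 'NWIN = 0: nothing to update, branch skipped'
      END IF
      END

Compiled as fixed-form Fortran (for example with gfortran), the program prints the last message; removing the outer guard would make the branch test divide by zero.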

B. This is not a proper solution for the possible communication issue we have. Here is a short report of what I observed.

  • My machine: Ubuntu 20.04, 8 cores (16 threads), using OpenMPI and GNU compilers. I can give more information if needed.
  1. NWIN is always different from zero and the tests pass when running xdhseqr with 1 MPI process or with more than 8 MPI processes.

  2. I obtain NWIN=0 when using 2 to 8 MPI processes. It does not matter whether I use the bind-to, oversubscribe, or map-by flags; at least I couldn't find anything else by playing with those flags (an example invocation is given below).

  3. When NWIN=0, there are 2 scenarios:
    i. NMWIN2 = 2. In this case, LILO = LIHI+1 and I = LILO.
    ii. NMWIN2 = 1. In this case, LILO < LIHI and I holds garbage, sometimes even a negative value.

We are still investigating this problem.
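
For reference, a typical reproduction run on my setup looks like the following; the flag combination is only one example of the OpenMPI bind-to/oversubscribe/map-by options mentioned above, applied to the xdhseqr test:

   mpirun -np 8 --oversubscribe --bind-to core --map-by core ./xdhseqr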

@liyangzhong7

I would like to add some more information about this bug. Before I saw this post, I used multiple compiler+MPI combinations to compile and test scalapack-2.2.0. I tried gcc-8.2.0+openmpi-4.0.0, gcc-10.1.0+openmpi-4.0.5, gcc-11.2.0+openmpi-4.1.2, and intel_oneapi-2021. The gcc-8.2.0 and intel-2021 builds passed all tests, while builds compiled with gcc-10 or gcc-11 fail Test #69: xshseqr and Test #70: xdhseqr. These two tests fail only for mpirun with np = 2-8, and they fail no matter whether gcc-10, gcc-11, or gcc-8 is used at run time. If you compile with gcc-8 and test with gcc-10, it works fine.

The solution given by [weslleyspereira] is effective, but remember that you need to edit both "pstrord.f" and "pdtrord.f" in order to pass those two tests, respectively. It was somewhat surprising to me that the failing tests result from a program bug rather than from compiler or MPI settings, and I spent a lot of time checking and comparing the settings and object files on my system. So I am writing this post hoping that people in the same situation can find this solution earlier and save themselves some time :)

@weslleyspereira
Collaborator Author

In case this PR is still reasonable, I just wanted to add that Meiyue Shao, one of the main contributors to this code, suggested we look at https://dl.acm.org/doi/10.1145/2699471 to check whether the current implementation is correct. I don't have time to work on that now, but please feel welcome to pick up from where I stopped here.

@swathy3

swathy3 commented Oct 13, 2025

As observed by [liyangzhong7], the proposed fix for both "pstrord.f" and "pdtrord.f" when NWIN=0 is required for the tests to run successfully on Ubuntu with gcc-13/14 and -np = 2-8. Would you consider including this workaround in the master branch?

@langou langou merged commit b935167 into master Oct 13, 2025
@langou
Contributor

langou commented Oct 13, 2025

Thanks for the reminder. Merged.

@langou langou deleted the try-solving-69 branch October 13, 2025 06:00
