Skip to content

Defect: Invalid RMA address range accessing coarray element with allocatable component on remote image #774

@everythingfunctional

Description

@everythingfunctional
  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.10.1
  • Fortran Compiler: gfortran 12.2.0
  • C compiler used for building lib: gcc 12.2.0
  • Installation method: ./install.sh
  • All flags & options passed to the installer: -M /usr
  • Output of uname -a: Linux stray 6.0.11-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 02 Dec 2022 17:25:31 +0000 x86_64 GNU/Linux
  • MPI library being used: openmpi 4.1.4-4
  • Machine architecture and number of physical cores: Intel x86, 4 core
  • Version of CMake: 3.25.1 installed via package manager

To help us debug your issue please explain:

What you were trying to do (and why)

Use derived type with allocatable component in coarray, Example

module payload_m implicit none private public :: payload_t, empty_payload type :: payload_t !! A raw buffer to facilitate data transfer between images !! !! Facilitates view of the data as either a string or raw bytes. !! Typical usage will be either to !! * produce a string representation of the data, and then parse that string to recover the original data !! * use the `transfer` function to copy the raw bytes of the data private integer, allocatable, public :: payload_(:) contains private procedure, public :: raw_payload procedure, public :: string_payload end type interface payload_t pure module function from_raw(payload) result(new_payload) implicit none integer, intent(in) :: payload(:) type(payload_t) :: new_payload end function pure module function from_string(payload) result(new_payload) implicit none character(len=*), intent(in) :: payload type(payload_t) :: new_payload end function module procedure empty_payload end interface interface pure module function empty_payload() implicit none type(payload_t) :: empty_payload end function pure module function raw_payload(self) implicit none class(payload_t), intent(in) :: self integer, allocatable :: raw_payload(:) end function pure module function string_payload(self) implicit none class(payload_t), intent(in) :: self character(len=:), allocatable :: string_payload end function end interface end module submodule(payload_m) payload_s implicit none contains module procedure from_raw new_payload%payload_ = payload end procedure module procedure from_string new_payload = payload_t([len(payload), transfer(payload,[integer::])]) end procedure module procedure empty_payload empty_payload%payload_ = [integer::] end procedure module procedure raw_payload if (allocated(self%payload_)) then raw_payload = self%payload_ else raw_payload = [integer::] end if end procedure module procedure string_payload if (allocated(self%payload_)) then if (size(self%payload_) > 0) then allocate(character(len=self%payload_(1)) :: string_payload) if (len(string_payload) > 0) & string_payload = transfer(self%payload_(2:),string_payload) else allocate(character(len=0) :: string_payload) end if else allocate(character(len=0) :: string_payload) end if end procedure end submodule program example use payload_m, only: payload_t character(len=*), parameter :: MESSAGE = "Hello, World!" type(payload_t) :: mailbox[*] if (this_image() == 1) then mailbox = payload_t(MESSAGE) end if sync all if (this_image() /= 1) then mailbox = mailbox[1] end if print *, mailbox%string_payload(), " from image: ", this_image() end program

What happened (include command output, screenshots, logs, etc.)

$ caf main.f90 $ cafrun -n 2 ./a.out Hello, World! from image: 1 Program received signal SIGSEGV: Segmentation fault - invalid memory reference. Backtrace for this error: #0 0x7fb8c68519ff in ??? #1 0x55dfaf068c12 in ??? #2 0x55dfaf06a3b7 in ??? #3 0x55dfaf06a4ad in ??? #4 0x7fb8c683c28f in ??? #5 0x7fb8c683c349 in ??? #6 0x55dfaf068564 in _start	at ../sysdeps/x86_64/start.S:115 #7 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 1 with PID 0 on node stray exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- Error: Command: `/usr/bin/mpiexec -n 2 ./a.out` failed to run. 

with a more complex example (specifically the FEATS quadratic_solver example which can be found here in the examples folder) the error message is

[stray:40367] *** An error occurred in MPI_Get [stray:40367] *** reported by process [2800353281,1] [stray:40367] *** on win rdma window 5 [stray:40367] *** MPI_ERR_RMA_RANGE: invalid RMA address range [stray:40367] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort, [stray:40367] *** and potentially your MPI job) Error: Command: `/usr/bin/mpiexec -n 2 build/caf_3E83561A255DFE60/example/quadratic_solver` failed to run. 

What you expected to happen

The above example should print "Hello, World! from image: 1" for each image.

Step-by-step reproduction instructions to reproduce the error/bug

Simply compile the provided example as shown above. This was originally reported as follow up in #739, but was worthy of its own issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions