Skip to content

Conversation

@wenduwan
Copy link
Contributor

The current implementation has an out-of-bound bug that process_info->my_local_rank could exceed package_ranks array length. This patch eliminates package_ranks array and fixes the bug.

The current implementation has an out-of-bound bug that process_info->my_local_rank could exceed package_ranks array length. This patch eliminates package_ranks array and fixes the bug. Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Comment on lines +754 to +758

if ((uint16_t) pname.vpid == process_info->my_local_rank) {
return ranks_on_package;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early return doesn't make sense to me. Can you explain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I see it now, it reflects previous return value of package_ranks[process_info->my_local_rank];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this makes sense to me, with the assumption that multiple peers cannot share the same vpid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. vpid is parsed from local_peers string in OPAL_MODEX_RECV_VALUE(rc, PMIX_LOCAL_PEERS, &pname, &local_peers, PMIX_STRING);

Based on my experiment, local_peers string should be the same for all local ranks, e.g. 0,5,3,6,2,1,4, each value mapping to a local rank id.

@wenduwan
Copy link
Contributor Author

@lrbison Can we get this merged?

@lrbison lrbison requested a review from hppritcha May 19, 2023 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

3 participants