Skip to content

Conversation

@AZero13
Copy link
Contributor

@AZero13 AZero13 commented Dec 27, 2023

In the case where we discard the value of expected after the compare and exchange, we can make the failure memory order for them std::memory_order_relaxed, as the value no longer matters in these cases. Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.

@AZero13 AZero13 requested review from a team, al45tair and mikeash as code owners December 27, 2023 16:06
@AZero13 AZero13 requested review from kavon and ktoso as code owners December 27, 2023 16:14
@AZero13 AZero13 force-pushed the relaxed-2 branch 3 times, most recently from f080786 to eeed690 Compare December 27, 2023 19:44
@AnthonyLatsis
Copy link
Collaborator

@swift-ci please smoke test macOS

@AZero13
Copy link
Contributor Author

AZero13 commented Dec 28, 2023

@AnthonyLatsis It passed. We should try the other tests and benchmarks

In the case where we discard the value of expected after the compare and exchange, we can make the failure memory order for them std::memory_order_relaxed, as the value no longer matters in these cases. Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.
@AnthonyLatsis
Copy link
Collaborator

I am not familiar with these code paths and have not taken time to review your changes. I would rather wait for a code owner to decide whether they look reasonable enough to continue testing them. I will trigger benchmarks, but please note that we might very well be missing an appropriate benchmark test for this.

@AnthonyLatsis
Copy link
Collaborator

@swift-ci please benchmark

@Catfish-Man Catfish-Man requested a review from rokhinip January 4, 2024 21:14
Copy link
Contributor

@rokhinip rokhinip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR is actually changing/improving things - we need more justification for each call site that is being changed for why we do not need all the of the acquire/acq_rel barriers stated.


auto waitingTask = waitQueue.load(std::memory_order_acquire);
if (!waitQueue.compare_exchange_strong(waitingTask, nullptr)) {
if (!waitQueue.compare_exchange_strong(waitingTask, nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a store release here? What other memory write are we trying to make visible before we publish this null ptr? I don't see any other memory write here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is seq_cst by default so are you saying we can make this relaxed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a store release here? What other memory write are we trying to make visible before we publish this null ptr? I don't see any other memory write here?

The loads of other threads need to see the write to nullptr.

// As another offer gets to run, it will have either a different waiting task, or no waiting task at all.
auto waitingTask = waitQueue.load(std::memory_order_acquire);
if (!waitQueue.compare_exchange_strong(waitingTask, nullptr)) {
if (!waitQueue.compare_exchange_strong(waitingTask, nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.

assumedStatus, newStatus.completingPendingReadyWaiting(this).status,
/*success*/ std::memory_order_release,
/*failure*/ std::memory_order_acquire)) {
/*failure*/ std::memory_order_relaxed)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct? I think the acquire here is needed to make sure that we see the write by the child task whose results we are ready to consume.

}
// Put the waiting task at the beginning of the wait queue.
if (waitQueue.compare_exchange_strong(
if (waitQueue.compare_exchange_weak(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

expectedType, hashableBaseType ? hashableBaseType
: reinterpret_cast<const Metadata *>(1),
std::memory_order_acq_rel);
std::memory_order_relaxed, std::memory_order_relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this part of the runtime logic but I would be dubious of the fact that this acq_rel or the other acquire barriers in this file are spurious.

Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.

That is not the right way to think about this - you should be reasoning about whether there are other memory updates apart from the atomic, that need to be made visible to the thread in this operation.

@rokhinip
Copy link
Contributor

rokhinip commented Jan 5, 2024

FWIW, I also don't believe any of the CI tests or benchmarks will catch what is being changed here cause (a) they all likely run on x86_64 which has TSO. And most of these memory barriers issues only show up after deep stress tests on ARM platforms. The only way to be sure is to intellectually convince ourselves that the changes here (a) are correct (b) actually not regressing anything.

@glessard
Copy link
Contributor

glessard commented Jan 6, 2024

@AtariDreams perhaps the way forward is to justify each of these changes one by one, in separate PRs. Twenty at once is a lot.

Copy link
Contributor

@al45tair al45tair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you genuinely think that any of these are justifiable changes, please break this up into separate, smaller, PRs. Thinking each of these through is non-trivial effort, and I doubt anyone is especially keen to go through that for all of the places you're changing here in one go.

To re-iterate what others have said: testing alone is insufficient to tell whether your changes are right — even on a non-TSO platform it might require hours of stress testing to trigger a problem. The only way to be sure is to carefully reason through the program's behaviour, and since you are proposing weakening the memory ordering constraints there's a real chance you're adding subtle bugs here.

@AZero13 AZero13 closed this Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

5 participants