Weaken memory fences for objects we do not care about #70637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

AZero13 wants to merge 1 commit into swiftlang:main from AZero13:relaxed-2

Contributor

AZero13 commented Dec 27, 2023

In the case where we discard the value of expected after the compare and exchange, we can make the failure memory order for them std::memory_order_relaxed, as the value no longer matters in these cases. Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.

AZero13 requested review from a team, al45tair and mikeash as code owners

December 27, 2023 16:06

AZero13 force-pushed the relaxed-2 branch from f0494b9 to c90953f Compare

December 27, 2023 16:14

AZero13 requested review from kavon and ktoso as code owners

December 27, 2023 16:14

AZero13 force-pushed the relaxed-2 branch 3 times, most recently from f080786 to eeed690 Compare

December 27, 2023 19:44

Collaborator

AnthonyLatsis commented Dec 28, 2023

@swift-ci please smoke test macOS

Contributor Author

AZero13 commented Dec 28, 2023

@AnthonyLatsis It passed. We should try the other tests and benchmarks

Weaken memory fences for objects we do not care about

87aaab0

In the case where we discard the value of expected after the compare and exchange, we can make the failure memory order for them std::memory_order_relaxed, as the value no longer matters in these cases. Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.

AZero13 force-pushed the relaxed-2 branch from eeed690 to 87aaab0 Compare

December 28, 2023 14:53

Collaborator

AnthonyLatsis commented Dec 28, 2023

I am not familiar with these code paths and have not taken time to review your changes. I would rather wait for a code owner to decide whether they look reasonable enough to continue testing them. I will trigger benchmarks, but please note that we might very well be missing an appropriate benchmark test for this.

Collaborator

AnthonyLatsis commented Dec 28, 2023

@swift-ci please benchmark

Catfish-Man requested a review from rokhinip

January 4, 2024 21:14

rokhinip reviewed

View reviewed changes

Contributor

rokhinip left a comment

I don't think this PR is actually changing/improving things - we need more justification for each call site that is being changed for why we do not need all the of the acquire/acq_rel barriers stated.

stdlib/public/Concurrency/TaskGroup.cpp

      
     auto waitingTask = waitQueue.load(std::memory_order_acquire);  
     if (!waitQueue.compare_exchange_strong(waitingTask, nullptr)) {  
     if (!waitQueue.compare_exchange_strong(waitingTask, nullptr,

Contributor

rokhinip Jan 5, 2024

Why do we need a store release here? What other memory write are we trying to make visible before we publish this null ptr? I don't see any other memory write here?

Contributor Author

AZero13 Aug 18, 2024

It is seq_cst by default so are you saying we can make this relaxed?

Contributor Author

AZero13 Aug 18, 2024

Why do we need a store release here? What other memory write are we trying to make visible before we publish this null ptr? I don't see any other memory write here?

The loads of other threads need to see the write to nullptr.

stdlib/public/Concurrency/TaskGroup.cpp

       // As another offer gets to run, it will have either a different waiting task, or no waiting task at all.  
     auto waitingTask = waitQueue.load(std::memory_order_acquire);  
     if (!waitQueue.compare_exchange_strong(waitingTask, nullptr)) {  
     if (!waitQueue.compare_exchange_strong(waitingTask, nullptr,  
 

Contributor

rokhinip Jan 5, 2024

Same question as above.

stdlib/public/Concurrency/TaskGroup.cpp

       assumedStatus, newStatus.completingPendingReadyWaiting(this).status,  
     /*success*/ std::memory_order_release,  
     /*failure*/ std::memory_order_acquire)) {  
     /*failure*/ std::memory_order_relaxed)) {  
 

Contributor

rokhinip Jan 5, 2024

I don't think this is correct? I think the acquire here is needed to make sure that we see the write by the child task whose results we are ready to consume.

stdlib/public/Concurrency/TaskGroup.cpp Show resolved Hide resolved

stdlib/public/Concurrency/TaskGroup.cpp

       }  
     // Put the waiting task at the beginning of the wait queue.  
     if (waitQueue.compare_exchange_strong(  
     if (waitQueue.compare_exchange_weak(  
 

Contributor

rokhinip Jan 5, 2024

Same here

stdlib/public/runtime/ErrorObject.mm

       expectedType, hashableBaseType ? hashableBaseType  
     : reinterpret_cast<const Metadata *>(1),  
     std::memory_order_acq_rel);  
     std::memory_order_relaxed, std::memory_order_relaxed);  
 

Contributor

rokhinip Jan 5, 2024

I don't know this part of the runtime logic but I would be dubious of the fact that this acq_rel or the other acquire barriers in this file are spurious.

Furthermore, memory_order_acq_rel is not necessary for most of these operations anyway, as the value being assigned to the atomic variable does not depend on the current exact value of said atomic variable.

That is not the right way to think about this - you should be reasoning about whether there are other memory updates apart from the atomic, that need to be made visible to the thread in this operation.

Contributor

rokhinip commented Jan 5, 2024

FWIW, I also don't believe any of the CI tests or benchmarks will catch what is being changed here cause (a) they all likely run on x86_64 which has TSO. And most of these memory barriers issues only show up after deep stress tests on ARM platforms. The only way to be sure is to intellectually convince ourselves that the changes here (a) are correct (b) actually not regressing anything.

Contributor

glessard commented Jan 6, 2024 •

edited

Loading

@AtariDreams perhaps the way forward is to justify each of these changes one by one, in separate PRs. Twenty at once is a lot.

al45tair requested changes

View reviewed changes

Contributor

al45tair left a comment

If you genuinely think that any of these are justifiable changes, please break this up into separate, smaller, PRs. Thinking each of these through is non-trivial effort, and I doubt anyone is especially keen to go through that for all of the places you're changing here in one go.

To re-iterate what others have said: testing alone is insufficient to tell whether your changes are right — even on a non-TSO platform it might require hours of stress testing to trigger a problem. The only way to be sure is to carefully reason through the program's behaviour, and since you are proposing weakening the memory ordering constraints there's a real chance you're adding subtle bugs here.

AZero13 closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment