Skip to content

Conversation

@andrejtonev
Copy link
Contributor

@andrejtonev andrejtonev commented Mar 21, 2025

Prepare takes main_lock, which is held until the end of the transaction.
Commit takes repl_state lock (while holding on to main_lock from Prepare).
PeriodicSnapshot needs to fail for replicas, so must read repl_state. Bug: this lock is held for the duration of CreateSnapshot, which takes main_lock. This can lead to a deadlock.
Fix is to read repl_state and release the lock, then call CreateSnapshot.
Bug present from v3.1

@andrejtonev andrejtonev added bug bug CI -build=coverage -test=core Run coverage build and core tests on push CI -build=jepsen -test=core Run jepsen build and core tests on push CI -build=release -test=e2e Run release build and e2e tests on push labels Mar 21, 2025
@andrejtonev andrejtonev added this to the mg-v3.1.1 milestone Mar 21, 2025
@andrejtonev andrejtonev self-assigned this Mar 21, 2025
@andrejtonev
Copy link
Contributor Author

andrejtonev commented Mar 21, 2025

Git commit description, explain the changes you made here


Leave above in PR description, copy the below into a comment


Tracking

  • [Link to Epic/Issue]

Standard development

CI Testing Labels

  • Select the appropriate CI test labels (CI -build=build-name -test=test-suite)

Documentation checklist

  • Add the documentation label
  • Add the bug / feature label
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • Bugfix: Deadlock caused by write queries and periodic snapshots. Deadlock removed. 2819
  • [ Documentation PR link memgraph/documentation#XXXX ]
    • Is back linked to this development PR
  • [ Tag someone from docs team ]
@andrejtonev andrejtonev changed the title Deadlock from repl_state and main_lock (Prepare or Commit vs PeriodicSnapshot) Bufdix: Deadlock caused by wrong order of repl_state and main_lock (Prepare or Commit vs PeriodicSnapshot) Mar 24, 2025
@andrejtonev andrejtonev requested a review from as51340 March 24, 2025 10:27
@andrejtonev andrejtonev changed the title Bufdix: Deadlock caused by wrong order of repl_state and main_lock (Prepare or Commit vs PeriodicSnapshot) Bugfix: Deadlock caused by wrong order of repl_state and main_lock (Prepare or Commit vs PeriodicSnapshot) Mar 24, 2025
@andrejtonev andrejtonev changed the title Bugfix: Deadlock caused by wrong order of repl_state and main_lock (Prepare or Commit vs PeriodicSnapshot) Bugfix: Deadlock caused by wrong order of repl_state and main_lock Mar 24, 2025
@andrejtonev andrejtonev linked an issue Mar 24, 2025 that may be closed by this pull request
@andrejtonev andrejtonev added the Docs - changelog only Docs - changelog only label Mar 24, 2025
@andrejtonev andrejtonev added this pull request to the merge queue Mar 24, 2025
Merged via the queue into master with commit 35b6abb Mar 24, 2025
17 checks passed
@andrejtonev andrejtonev deleted the repl_state_deadlock_fix branch March 24, 2025 13:47
mattkjames7 pushed a commit that referenced this pull request Mar 25, 2025
…2819) Prepare takes main_lock, which is held until the end of the transaction. Commit takes repl_state lock (while holding on to main_lock from Prepare). PeriodicSnapshot needs to fail for replicas, so must read repl_state. Bug: this lock is held for the duration of CreateSnapshot, which takes main_lock. This can lead to a deadlock. Fix is to read repl_state and release the lock, then call CreateSnapshot. Bug present from v3.1
@gitbuda gitbuda mentioned this pull request Mar 27, 2025
18 tasks
as51340 pushed a commit that referenced this pull request Oct 24, 2025
…2819) Prepare takes main_lock, which is held until the end of the transaction. Commit takes repl_state lock (while holding on to main_lock from Prepare). PeriodicSnapshot needs to fail for replicas, so must read repl_state. Bug: this lock is held for the duration of CreateSnapshot, which takes main_lock. This can lead to a deadlock. Fix is to read repl_state and release the lock, then call CreateSnapshot. Bug present from v3.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug bug CI -build=coverage -test=core Run coverage build and core tests on push CI -build=jepsen -test=core Run jepsen build and core tests on push CI -build=release -test=e2e Run release build and e2e tests on push Docs - changelog only Docs - changelog only

2 participants