How to get DRBD nodes out of Connection State StandAlone (and WFConnection)?

Question

My Debian 8.9 DRBD 8.4.3 setup has somehow got into a state where the two nodes can no longer connect over the network. They should replicate a single resource r1, but immediately after running drbdadm down r1; drbdadm up r1 on both nodes, /proc/drbd on each node describes the situation as follows:

on 1st node (Connection State is either WFConnection or StandAlone):

 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:912 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20

on 2nd node:

 1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:48

The two nodes can ping each other over the IP addresses cited in /etc/drbd.d/r1.res, and netstat shows that both are listening on the cited port.

How can I (further diagnose and) get out of this situation so that the two nodes can become Connected and replicate over DRBD again?

BTW, on a higher level of abstraction this problem currently manifests itself by systemctl start drbd never exiting, apparently because it gets stuck in drbdadm wait-connect all (as suggested by /lib/systemd/system/drbd.service).

Tr33beard · Accepted Answer · 2020-05-21 08:16:05Z

The situation was apparently caused by a case of split-brain.

I had not noticed this because I had only inspected recent journal entries for drbd.service (sudo journalctl -u drbd). The problem had in fact been reported slightly earlier, and in the kernel messages rather than under the drbd unit (sudo journalctl | grep Split-Brain).
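The two checks above can be wrapped in small helpers. This is only a minimal sketch: the function names drbd_cs and has_split_brain are made up for illustration, and the parsing assumes the /proc/drbd layout shown in the question (a line of the form "N: cs:State ...").

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD.

# Print "<minor> <connection state>" for every resource line read from stdin.
# Assumes the /proc/drbd layout from the question: " 1: cs:WFConnection ro:..."
drbd_cs() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p'
}

# Exit status 0 if the log text on stdin mentions a split-brain
# (case-insensitive, so "Split-Brain" and "split-brain" both match).
has_split_brain() {
    grep -qi 'split-brain'
}
```

For example, drbd_cs < /proc/drbd lists the connection state per minor, and sudo journalctl | has_split_brain && echo "split-brain was logged" searches the whole journal instead of only the drbd.service entries.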

With that, manually resolving the split-brain (as described here or here) also fixed the connection problem. The steps were as follows.

On split-brain victim (assuming the DRBD resource is r1):

drbdadm disconnect r1
drbdadm secondary r1
drbdadm connect --discard-my-data r1

On split-brain survivor:

drbdadm primary r1
drbdadm connect r1
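To avoid running the wrong sequence on the wrong node, the two command lists can be generated by a small dry-run helper that only prints them. This is a sketch, not a tested recovery tool: the function name split_brain_plan and its victim/survivor arguments are hypothetical, and it emits exactly the drbdadm commands listed above without executing anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands, never runs drbdadm itself.
# Usage: split_brain_plan victim|survivor RESOURCE
split_brain_plan() {
    case "$1" in
        victim)
            # The victim's local changes are discarded on reconnect.
            printf 'drbdadm disconnect %s\n' "$2"
            printf 'drbdadm secondary %s\n' "$2"
            printf 'drbdadm connect --discard-my-data %s\n' "$2"
            ;;
        survivor)
            # The survivor keeps its data and becomes the sync source.
            printf 'drbdadm primary %s\n' "$2"
            printf 'drbdadm connect %s\n' "$2"
            ;;
        *)
            echo "usage: split_brain_plan victim|survivor RESOURCE" >&2
            return 1
            ;;
    esac
}
```

Running split_brain_plan victim r1 prints the three victim commands for review. Decide which node is the victim before running anything, because --discard-my-data throws away that node's changes.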

It's best to include your steps in your answer versus linking to a site that might move later. I imagine you just needed drbdadm disconnect r1 on both nodes, then drbdadm connect r1 --discard-my-data on the victim, and drbdadm connect r1 on the survivor.
– Matt Kereczman, Commented Aug 25, 2017 at 14:44
Thank you for adding the hint about grepping the journal for split-brain. I was in the same situation ;)
– Sir Jane, Commented Feb 22, 2024 at 16:04

sysadmin1138 · Accepted Answer · 2022-12-22 17:19:08Z

I use the following pattern.

On the sick node (the one that is not the current DC; check with pcs status):

drbdadm dump all
drbdadm disconnect resource
drbdadm secondary resource
drbdadm connect resource

On the healthy node (the one that is the current DC; check with pcs status):

drbdadm dump all
drbdadm disconnect resource
drbdadm primary resource
drbdadm connect resource
