* Fwd: Bug Report: meet an unexpected WFBitMapS status after restarting the primary
From: Dongsheng Yang @ 2020-02-06 2:01 UTC
To: lars.ellenberg, philipp.reisner, joel.colledge, linux-block, drbd-dev
Cc: duan.zhang
Adding the linux-block mailing list.
---------- Forwarded message ---------
From: Dongsheng Yang <dongsheng081251@gmail.com>
Date: Thu, Feb 6, 2020, 9:44 AM
Subject: Fwd: Bug Report: meet an unexpected WFBitMapS status after restarting the primary
To: <lars.ellenberg@linbit.com>, <philipp.reisner@linbit.com>,
<linux-block@vger.kernel.org>, <joel.colledge@linbit.com>,
<drbd-dev@lists.linbit.com>
Cc: <duan.zhang@easystack.cn>
Hi Philipp and Lars,
Any suggestions?
Thanx
---------- Forwarded message ---------
From: Dongsheng Yang <dongsheng081251@gmail.com>
Date: Wed, Feb 5, 2020, 7:06 PM
Subject: Bug Report: meet an unexpected WFBitMapS status after restarting the primary
To: <joel.colledge@linbit.com>
Cc: <drbd-dev@lists.linbit.com>, <duan.zhang@easystack.cn>
Hi guys,
Version: drbd-9.0.21-1
Layout: one drbd.res spanning 3 nodes -- node-1 (Secondary), node-2 (Primary),
node-3 (Secondary)
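For reference, a 3-node layout like the one above would typically look something like the following drbd.res. This is an illustrative sketch only; the hostnames match the report, but the device paths, backing disks, addresses, and node-ids are placeholders, not taken from the actual setup:

```
resource drbd6 {
    device    /dev/drbd6;
    disk      /dev/vg0/lv_drbd6;   # placeholder backing device
    meta-disk internal;

    on node-1 {
        node-id 0;
        address 10.0.0.1:7789;     # placeholder address
    }
    on node-2 {
        node-id 1;
        address 10.0.0.2:7789;
    }
    on node-3 {
        node-id 2;
        address 10.0.0.3:7789;
    }

    connection-mesh {
        hosts node-1 node-2 node-3;
    }
}
```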
Description:
a. Reboot node-2 while the cluster is working.
b. Re-up drbd.res on node-2 after it restarts.
c. The expected resync from node-3 to node-2 happens. When the resync finishes,
however, node-1 enters an unexpected WFBitMapS replication state and cannot
recover to normal anymore.
Status output:
node-1: drbdadm status
drbd6 role:Secondary
  disk:UpToDate
  hotspare connection:Connecting
  node-2 role:Primary
    replication:WFBitMapS peer-disk:Consistent
  node-3 role:Secondary
    peer-disk:UpToDate

node-2: drbdadm status
drbd6 role:Primary
  disk:UpToDate
  hotspare connection:Connecting
  node-1 role:Secondary
    peer-disk:UpToDate
  node-3 role:Secondary
    peer-disk:UpToDate
I assume the processing sequence below, based on my source code version:

1. node-2 restarts with CRASHED_PRIMARY set.
2. node-2 starts a resync with node-3 as sync target; node-3 starts the same
   resync with node-2 as sync source.
3. ... (resync runs) ...
4. The resync between node-2 and node-3 ends; node-2 runs
   w_after_state_change.
5. (a) Loop iteration 1 of the for loop, against node-1: node-2 sends its
   UUIDs to node-1 with UUID_FLAG_GOT_STABLE and CRASHED_PRIMARY still set.
6. node-1, in receive_uuids10, receives node-2's UUIDs carrying
   CRASHED_PRIMARY.
7. (b) Loop iteration 2 of the for loop, against node-3: node-2 clears
   CRASHED_PRIMARY.
8. node-1 sends its UUIDs to node-2 with UUID_FLAG_RESYNC; node-2 receives
   them in receive_uuids10.
9. node-1's sync_handshake resolves to SYNC_SOURCE_IF_BOTH_FAILED, while
   node-2's sync_handshake resolves to NO_SYNC.
10. node-1 changes its replication state toward node-2 to WFBitMapS.
The key problem is the ordering of steps (a) and (b): node-2 sends the stale
CRASHED_PRIMARY flag to node-1 even though, after syncing with node-3, it is
actually no longer a crashed primary.
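The ordering problem described above can be illustrated with a small toy model. This is not DRBD code: `peer_handshake`, the flag constant, and the return strings are hypothetical simplifications of the receive_uuids10/sync_handshake behavior sketched in the sequence, used only to show how the send/clear order changes the peer's decision:

```python
# Toy model of the reported race: node-2 sends its UUIDs to node-1
# before clearing CRASHED_PRIMARY (steps (a) and (b) above).

CRASHED_PRIMARY = 0x1  # hypothetical flag bit, for illustration only

def peer_handshake(received_flags: int) -> str:
    """node-1's simplified decision on receiving node-2's UUIDs."""
    if received_flags & CRASHED_PRIMARY:
        # Seeing a crashed-primary peer pushes node-1 toward a
        # sync-source decision, which leads to WFBitMapS.
        return "SYNC_SOURCE_IF_BOTH_FAILED"
    return "NO_SYNC"

# Order observed in the report: (a) send happens while the flag is set.
flags_at_send = CRASHED_PRIMARY
decision_buggy = peer_handshake(flags_at_send)

# If (b) had run first, the flag would already be cleared at send time.
flags_after_clear = flags_at_send & ~CRASHED_PRIMARY
decision_ok = peer_handshake(flags_after_clear)

print(decision_buggy)  # SYNC_SOURCE_IF_BOTH_FAILED -> node-1 enters WFBitMapS
print(decision_ok)     # NO_SYNC -> node-1 stays in a normal state
```

In this toy model, reversing the order of (a) and (b) is exactly what flips node-1's outcome from WFBitMapS to no sync at all.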
So I have the following questions:
a. Is this really a bug, or is it an expected result?
b. Is there already a fix for this in the newest version?
c. Is there a workaround for this kind of unexpected status? I have run into
quite a few other problems like this. :(