linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Philipp Reisner <philipp.reisner@linbit.com>
To: Jens Axboe <axboe@fb.com>, linux-kernel@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Subject: [PATCH 24/30] drbd: disallow promotion during resync handshake, avoid deadlock and hard reset
Date: Mon, 13 Jun 2016 16:09:12 +0200	[thread overview]
Message-ID: <1465826958-19398-25-git-send-email-philipp.reisner@linbit.com> (raw)
In-Reply-To: <1465826958-19398-1-git-send-email-philipp.reisner@linbit.com>

From: Lars Ellenberg <lars.ellenberg@linbit.com>

We already serialize connection state changes,
and other, non-connection state changes (role changes)
while we are establishing a connection.

But if we have an established connection,
then trigger a resync handshake (by primary --force or similar),
until now we just had to be "lucky".

Consider this sequence (e.g. deployment scenario):
create-md; up;
  -> Connected Secondary/Secondary Inconsistent/Inconsistent
then do a racy primary --force on both peers.

 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
 block drbd0: peer( Secondary -> Primary ) pdsk( Inconsistent -> UpToDate )
  *** HERE things go wrong. ***
 block drbd0: role( Secondary -> Primary )
 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000005:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer C90D2FC716D232AB:0000000000000004:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: Becoming sync target due to disk states.
 block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
 block drbd0: Remote failed to finish a request within 6007ms > ko-count (2) * timeout (30 * 0.1s)
 drbd s0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )

The problem here is that the local promotion happens before the sync handshake
triggered by the remote promotion was completed.  Some assumptions elsewhere
become wrong, and when the expected resync handshake is then received and
processed, we get stuck in a deadlock, which can only be recovered by reboot :-(

Fix: if we know the peer has good data,
and our own disk is present, but NOT good,
and there is no resync going on yet,
we expect a sync handshake to happen "soon".
So reject a racy promotion with SS_IN_TRANSIENT_STATE.

Result:
 ... as above ...
 block drbd0: peer( Secondary -> Primary ) pdsk( Inconsistent -> UpToDate )
  *** local promotion being postponed until ... ***
 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer 77868BDA836E12A5:0000000000000004:0000000000000000:0000000000000000 bits:25590 flags:0
  ...
 block drbd0: conn( WFBitMapT -> WFSyncUUID )
 block drbd0: updated sync uuid 85D06D0E8887AD44:0000000000000000:0000000000000000:0000000000000000
 block drbd0: conn( WFSyncUUID -> SyncTarget )
  *** ... after the resync handshake ***
 block drbd0: role( Secondary -> Primary )

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
---
 drivers/block/drbd/drbd_state.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 24422e8..7562c5c 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -906,6 +906,15 @@ is_valid_soft_transition(union drbd_state os, union drbd_state ns, struct drbd_c
 	      (ns.conn >= C_CONNECTED && os.conn == C_WF_REPORT_PARAMS)))
 		rv = SS_IN_TRANSIENT_STATE;
 
+	/* Do not promote during resync handshake triggered by "force primary".
+	 * This is a hack. It should really be rejected by the peer during the
+	 * cluster wide state change request. */
+	if (os.role != R_PRIMARY && ns.role == R_PRIMARY
+		&& ns.pdsk == D_UP_TO_DATE
+		&& ns.disk != D_UP_TO_DATE && ns.disk != D_DISKLESS
+		&& (ns.conn <= C_WF_SYNC_UUID || ns.conn != os.conn))
+			rv = SS_IN_TRANSIENT_STATE;
+
 	if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) && os.conn < C_CONNECTED)
 		rv = SS_NEED_CONNECTION;
 
-- 
2.7.4

  parent reply	other threads:[~2016-06-13 14:25 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-13 14:08 [PATCH 00/30] DRBD updates Philipp Reisner
2016-06-13 14:08 ` [PATCH 01/30] drbd: bitmap bulk IO: do not always suspend IO Philipp Reisner
2016-06-13 14:08 ` [PATCH 02/30] drbd: change bitmap write-out when leaving resync states Philipp Reisner
2016-06-13 14:08 ` [PATCH 03/30] drbd: Kill code duplication Philipp Reisner
2016-06-13 14:08 ` [PATCH 04/30] drbd: Implement handling of thinly provisioned storage on resync target nodes Philipp Reisner
2016-06-13 14:08 ` [PATCH 05/30] drbd: Introduce new disk config option rs-discard-granularity Philipp Reisner
2016-06-13 14:08 ` [PATCH 06/30] drbd: Create the protocol feature THIN_RESYNC Philipp Reisner
2016-06-13 14:08 ` [PATCH 07/30] drbd: adjust assert in w_bitmap_io to account for BM_LOCKED_CHANGE_ALLOWED Philipp Reisner
2016-06-13 14:08 ` [PATCH 08/30] drbd: fix regression: protocol A sometimes synchronous, C sometimes double-latency Philipp Reisner
2016-06-13 14:08 ` [PATCH 09/30] drbd: fix for truncated minor number in callback command line Philipp Reisner
2016-06-13 14:08 ` [PATCH 10/30] drbd: allow parallel flushes for multi-volume resources Philipp Reisner
2016-06-13 14:08 ` [PATCH 11/30] drbd: when receiving P_TRIM, zero-out partial unaligned chunks Philipp Reisner
2016-06-13 14:09 ` [PATCH 12/30] drbd: possibly disable discard support, if backend has discard_zeroes_data=0 Philipp Reisner
2016-06-13 14:09 ` [PATCH 13/30] drbd: zero-out partial unaligned discards on local backend Philipp Reisner
2016-06-13 14:09 ` [PATCH 14/30] drbd: allow larger max_discard_sectors Philipp Reisner
2016-06-13 14:09 ` [PATCH 15/30] drbd: finish resync on sync source only by notification from sync target Philipp Reisner
2016-06-13 14:09 ` [PATCH 16/30] drbd: introduce unfence-peer handler Philipp Reisner
2016-06-13 14:09 ` [PATCH 17/30] drbd: don't forget error completion when "unsuspending" IO Philipp Reisner
2016-06-13 14:09 ` [PATCH 18/30] drbd: if there is no good data accessible, writes should be IO errors Philipp Reisner
2016-06-13 14:09 ` [PATCH 19/30] drbd: only restart frozen disk io when D_UP_TO_DATE Philipp Reisner
2016-06-13 14:09 ` [PATCH 20/30] drbd: discard_zeroes_if_aligned allows "thin" resync for discard_zeroes_data=0 Philipp Reisner
2016-06-13 14:09 ` [PATCH 21/30] drbd: report sizes if rejecting too small peer disk Philipp Reisner
2016-06-13 14:09 ` [PATCH 22/30] drbd: introduce WRITE_SAME support Philipp Reisner
2016-06-13 14:09 ` [PATCH 23/30] drbd: sync_handshake: handle identical uuids with current (frozen) Primary Philipp Reisner
2016-06-13 14:09 ` Philipp Reisner [this message]
2016-06-13 14:09 ` [PATCH 25/30] drbd: bump current uuid when resuming IO with diskless peer Philipp Reisner
2016-06-13 14:09 ` [PATCH 26/30] drbd: code cleanups without semantic changes Philipp Reisner
2016-06-13 14:09 ` [PATCH 27/30] drbd: get rid of empty statement in is_valid_state Philipp Reisner
2016-06-13 14:09 ` [PATCH 28/30] drbd: finally report ms, not jiffies, in log message Philipp Reisner
2016-06-13 14:09 ` [PATCH 29/30] drbd: al_write_transaction: skip re-scanning of bitmap page pointer array Philipp Reisner
2016-06-13 14:09 ` [PATCH 30/30] drbd: correctly handle failed crypto_alloc_hash Philipp Reisner
2016-06-13 15:11 ` [PATCH 00/30] DRBD updates Jens Axboe
2016-06-13 22:26   ` Philipp Reisner
2016-06-13 22:26   ` [PATCH 01/30] drbd: bitmap bulk IO: do not always suspend IO Philipp Reisner
2016-06-13 22:26   ` [PATCH 02/30] drbd: change bitmap write-out when leaving resync states Philipp Reisner
2016-06-13 22:26   ` [PATCH 03/30] drbd: Kill code duplication Philipp Reisner
2016-06-13 22:26   ` [PATCH 04/30] drbd: Implement handling of thinly provisioned storage on resync target nodes Philipp Reisner
2016-06-13 22:26   ` [PATCH 05/30] drbd: Introduce new disk config option rs-discard-granularity Philipp Reisner
2016-06-13 22:26   ` [PATCH 06/30] drbd: Create the protocol feature THIN_RESYNC Philipp Reisner
2016-06-13 22:26   ` [PATCH 07/30] drbd: adjust assert in w_bitmap_io to account for BM_LOCKED_CHANGE_ALLOWED Philipp Reisner
2016-06-13 22:26   ` [PATCH 08/30] drbd: fix regression: protocol A sometimes synchronous, C sometimes double-latency Philipp Reisner
2016-06-13 22:26   ` [PATCH 09/30] drbd: fix for truncated minor number in callback command line Philipp Reisner
2016-06-13 22:26   ` [PATCH 10/30] drbd: allow parallel flushes for multi-volume resources Philipp Reisner
2016-06-13 22:26   ` [PATCH 11/30] drbd: when receiving P_TRIM, zero-out partial unaligned chunks Philipp Reisner
2016-06-13 22:26   ` [PATCH 12/30] drbd: possibly disable discard support, if backend has discard_zeroes_data=0 Philipp Reisner
2016-06-13 22:26   ` [PATCH 13/30] drbd: zero-out partial unaligned discards on local backend Philipp Reisner
2016-06-13 22:26   ` [PATCH 14/30] drbd: allow larger max_discard_sectors Philipp Reisner
2016-06-13 22:26   ` [PATCH 15/30] drbd: finish resync on sync source only by notification from sync target Philipp Reisner
2016-06-13 22:26   ` [PATCH 16/30] drbd: introduce unfence-peer handler Philipp Reisner
2016-06-13 22:26   ` [PATCH 17/30] drbd: don't forget error completion when "unsuspending" IO Philipp Reisner
2016-06-13 22:26   ` [PATCH 18/30] drbd: if there is no good data accessible, writes should be IO errors Philipp Reisner
2016-06-13 22:26   ` [PATCH 19/30] drbd: only restart frozen disk io when D_UP_TO_DATE Philipp Reisner
2016-06-13 22:26   ` [PATCH 20/30] drbd: discard_zeroes_if_aligned allows "thin" resync for discard_zeroes_data=0 Philipp Reisner
2016-06-13 22:26   ` [PATCH 21/30] drbd: report sizes if rejecting too small peer disk Philipp Reisner
2016-06-13 22:26   ` [PATCH 22/30] drbd: introduce WRITE_SAME support Philipp Reisner
2016-06-13 22:26   ` [PATCH 23/30] drbd: sync_handshake: handle identical uuids with current (frozen) Primary Philipp Reisner
2016-06-13 22:26   ` [PATCH 24/30] drbd: disallow promotion during resync handshake, avoid deadlock and hard reset Philipp Reisner
2016-06-13 22:26   ` [PATCH 25/30] drbd: bump current uuid when resuming IO with diskless peer Philipp Reisner
2016-06-13 22:26   ` [PATCH 26/30] drbd: code cleanups without semantic changes Philipp Reisner
2016-06-13 22:26   ` [PATCH 27/30] drbd: get rid of empty statement in is_valid_state Philipp Reisner
2016-06-13 22:26   ` [PATCH 28/30] drbd: finally report ms, not jiffies, in log message Philipp Reisner
2016-06-13 22:26   ` [PATCH 29/30] drbd: al_write_transaction: skip re-scanning of bitmap page pointer array Philipp Reisner
2016-06-13 22:26   ` [PATCH 30/30] drbd: correctly handle failed crypto_alloc_hash Philipp Reisner
  -- strict thread matches above, loose matches on Subject: below --
2016-04-25 12:07 [PATCH 00/30] DBRD updates Philipp Reisner
2016-04-25 12:07 ` [PATCH 24/30] drbd: disallow promotion during resync handshake, avoid deadlock and hard reset Philipp Reisner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1465826958-19398-25-git-send-email-philipp.reisner@linbit.com \
    --to=philipp.reisner@linbit.com \
    --cc=axboe@fb.com \
    --cc=drbd-dev@lists.linbit.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).