linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Subject: [PATCH 08/17] drbd: reject attach of unsuitable uuids even if connected
Date: Thu, 20 Dec 2018 17:23:35 +0100	[thread overview]
Message-ID: <20181220162344.8430-9-lars.ellenberg@linbit.com> (raw)
In-Reply-To: <20181220162344.8430-1-lars.ellenberg@linbit.com>

Multiple failure scenario:
a) all good
   Connected Primary/Secondary UpToDate/UpToDate
b) lose disk on Primary,
   Connected Primary/Secondary Diskless/UpToDate
c) continue to write to the device,
   changes only make it to the Secondary storage.
d) lose disk on Secondary,
   Connected Primary/Secondary Diskless/Diskless
e) now try to re-attach on Primary

This would have succeeded before, even though that is clearly the
wrong data set to attach to (missing the modifications from c).
Because we only compared our "effective" and the "to-be-attached"
data generation uuid tags if (device->state.conn < C_CONNECTED).

Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE)
compare the uuids, and reject the attach.

This patch also tries to improve the reverse scenario:
first lose Secondary, then Primary disk,
then try to attach the disk on Secondary.

Before this patch, the attach on the Secondary succeeds, but since commit
drbd: disconnect, if the wrong UUIDs are attached on a connected peer
the Primary will notice unsuitable data, and drop the connection hard.

Though unfortunately at a point in time during the handshake where
we cannot easily abort the attach on the peer without more
refactoring of the handshake.

We now reject any attach to "unsuitable" uuids,
as long as we can see a Primary role,
unless we already have access to "good" data.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
---
 drivers/block/drbd/drbd_nl.c       |  6 +++---
 drivers/block/drbd/drbd_receiver.c | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index e4774f720de5..4b934e543e2d 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1960,9 +1960,9 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
 		}
 	}
 
-	if (device->state.conn < C_CONNECTED &&
-	    device->state.role == R_PRIMARY && device->ed_uuid &&
-	    (device->ed_uuid & ~((u64)1)) != (nbc->md.uuid[UI_CURRENT] & ~((u64)1))) {
+	if (device->state.pdsk != D_UP_TO_DATE && device->ed_uuid &&
+	    (device->state.role == R_PRIMARY || device->state.peer == R_PRIMARY) &&
+            (device->ed_uuid & ~((u64)1)) != (nbc->md.uuid[UI_CURRENT] & ~((u64)1))) {
 		drbd_err(device, "Can only attach to data with current UUID=%016llX\n",
 		    (unsigned long long)device->ed_uuid);
 		retcode = ERR_DATA_NOT_CURRENT;
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 85e3d846a23a..76d74b2122d6 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -4397,6 +4397,25 @@ static int receive_state(struct drbd_connection *connection, struct packet_info
 	if (peer_state.conn == C_AHEAD)
 		ns.conn = C_BEHIND;
 
+	/* TODO:
+	 * if (primary and diskless and peer uuid != effective uuid)
+	 *     abort attach on peer;
+	 *
+	 * If this node does not have good data, was already connected, but
+	 * the peer did a late attach only now, trying to "negotiate" with me,
+	 * AND I am currently Primary, possibly frozen, with some specific
+	 * "effective" uuid, this should never be reached, really, because
+	 * we first send the uuids, then the current state.
+	 *
+	 * In this scenario, we already dropped the connection hard
+	 * when we received the unsuitable uuids (receive_uuids().
+	 *
+	 * Should we want to change this, that is: not drop the connection in
+	 * receive_uuids() already, then we would need to add a branch here
+	 * that aborts the attach of "unsuitable uuids" on the peer in case
+	 * this node is currently Diskless Primary.
+	 */
+
 	if (device->p_uuid && peer_state.disk >= D_NEGOTIATING &&
 	    get_ldev_if_state(device, D_NEGOTIATING)) {
 		int cr; /* consider resync */
-- 
2.17.1


  parent reply	other threads:[~2018-12-20 16:35 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-20 16:23 [PATCH 00/17] DRBD updates for 4.21 Lars Ellenberg
2018-12-20 16:23 ` [PATCH 01/17] drbd: narrow rcu_read_lock in drbd_sync_handshake Lars Ellenberg
2018-12-20 16:23 ` [PATCH 02/17] drbd: must not use connection after kref_put(&connection->kref) Lars Ellenberg
2018-12-20 16:23 ` [PATCH 03/17] drbd: centralize printk reporting of new size into drbd_set_my_capacity() Lars Ellenberg
2018-12-20 16:23 ` [PATCH 04/17] drbd: ignore "all zero" peer volume sizes in handshake Lars Ellenberg
2018-12-20 16:23 ` [PATCH 05/17] drbd: disconnect, if the wrong UUIDs are attached on a connected peer Lars Ellenberg
2018-12-20 16:23 ` [PATCH 06/17] drbd: fix confusing error message during attach Lars Ellenberg
2018-12-20 16:23 ` [PATCH 07/17] drbd: attach on connected diskless peer must not shrink a consistent device Lars Ellenberg
2018-12-20 16:23 ` Lars Ellenberg [this message]
2018-12-20 16:23 ` [PATCH 09/17] drbd: fix comment typos Lars Ellenberg
2018-12-20 16:23 ` [PATCH 10/17] drbd: do not block when adjusting "disk-options" while IO is frozen Lars Ellenberg
2018-12-20 16:23 ` [PATCH 11/17] drbd: avoid spurious self-outdating with concurrent disconnect / down Lars Ellenberg
2018-12-20 16:23 ` [PATCH 12/17] drbd: fix print_st_err()'s prototype to match the definition Lars Ellenberg
2018-12-20 16:23 ` [PATCH 13/17] drbd: don't retry connection if peers do not agree on "authentication" settings Lars Ellenberg
2018-12-20 16:23 ` [PATCH 14/17] drbd: skip spurious timeout (ping-timeo) when failing promote Lars Ellenberg
2018-12-20 16:23 ` [PATCH 15/17] drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") Lars Ellenberg
2018-12-20 16:23 ` [PATCH 16/17] drbd: Avoid Clang warning about pointless switch statment Lars Ellenberg
2018-12-20 16:23 ` [PATCH 17/17] drbd: Change drbd_request_detach_interruptible's return type to int Lars Ellenberg
2018-12-20 16:27 ` [PATCH 00/17] DRBD updates for 4.21 Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181220162344.8430-9-lars.ellenberg@linbit.com \
    --to=lars.ellenberg@linbit.com \
    --cc=axboe@kernel.dk \
    --cc=drbd-dev@lists.linbit.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).