linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drbd: fix discard_zeroes_if_aligned regression
       [not found] <15124635.GA4107@soda.linbit>
@ 2018-01-15 23:00 ` Eric Wheeler
  2018-01-16  7:26   ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2018-01-15 23:00 UTC (permalink / raw)
  To: drbd-dev
  Cc: Eric Wheeler, Eric Wheeler, stable, Philipp Reisner,
	Lars Ellenberg, linux-kernel

From: Eric Wheeler <git@linux.ewheeler.net>

According to drbd.conf documentation, "To not break established and
expected behaviour, and suddenly cause fstrim on thin-provisioned LVs to
run out-of-space instead of freeing up space, the default value is yes."

This behavior regressed in the REQ_OP_WRITE_ZEROES refactor near
	45c21793 drbd: implement REQ_OP_WRITE_ZEROES
    0dbed96  drbd: make intelligent use of blkdev_issue_zeroout
which caused dm-thin backed DRBD volumes to zero blocks and run out of
space instead of passing discard to the backing device as defined by the
discard_zeroes_if_aligned option.

A helper function could reduce code duplication.

Signed-off-by: Eric Wheeler <drbd@linux.ewheeler.net>
Cc: <stable@vger.kernel.org> # 4.14
---
 drivers/block/drbd/drbd_receiver.c | 22 ++++++++++++++++++++--
 drivers/block/drbd/drbd_req.c      | 23 +++++++++++++++++++++--
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 62a902f..58f0e43 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1507,9 +1507,27 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
 static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
 {
 	struct block_device *bdev = device->ldev->backing_bdev;
+	struct disk_conf *dc;
+	bool discard_zeroes_if_aligned, zeroout;
 
-	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
-			GFP_NOIO, 0))
+	rcu_read_lock();
+	dc = rcu_dereference(device->ldev->disk_conf);
+	discard_zeroes_if_aligned = dc->discard_zeroes_if_aligned;
+	rcu_read_unlock();
+
+	/* Use zeroout unless discard_zeroes_if_aligned is set.
+	 * If blkdev_issue_discard fails, then retry with blkdev_issue_zeroout.
+	 * See also drbd_process_discard_req() in drbd_req.c.
+	 */
+	zeroout = true;
+	if (discard_zeroes_if_aligned &&
+	    blkdev_issue_discard(bdev, peer_req->i.sector,
+		peer_req->i.size >> 9, GFP_NOIO, 0) == 0)
+		zeroout = false;
+
+	if (zeroout &&
+	    blkdev_issue_zeroout(bdev, peer_req->i.sector,
+		peer_req->i.size >> 9, GFP_NOIO, 0) != 0)
 		peer_req->flags |= EE_WAS_ERROR;
 
 	drbd_endio_write_sec_final(peer_req);
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index de8566e..070b5e7 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1158,9 +1158,28 @@ static int drbd_process_write_request(struct drbd_request *req)
 static void drbd_process_discard_req(struct drbd_request *req)
 {
 	struct block_device *bdev = req->device->ldev->backing_bdev;
+	struct drbd_device *device = req->device;
+	struct disk_conf *dc;
+	bool discard_zeroes_if_aligned, zeroout;
 
-	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
-			GFP_NOIO, 0))
+	rcu_read_lock();
+	dc = rcu_dereference(device->ldev->disk_conf);
+	discard_zeroes_if_aligned = dc->discard_zeroes_if_aligned;
+	rcu_read_unlock();
+
+	/* Use zeroout unless discard_zeroes_if_aligned is set.
+	 * If blkdev_issue_discard fails, then retry with blkdev_issue_zeroout.
+	 * See also drbd_issue_peer_discard() in drbd_receiver.c.
+	 */
+	zeroout = true;
+	if (discard_zeroes_if_aligned &&
+	    blkdev_issue_discard(bdev, req->i.sector, req->i.size >> 9,
+			GFP_NOIO, 0) == 0)
+		zeroout = false;
+
+	if (zeroout &&
+	    blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
+			GFP_NOIO, 0) != 0)
 		req->private_bio->bi_status = BLK_STS_IOERR;
 	bio_endio(req->private_bio);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2018-01-15 23:00 ` [PATCH] drbd: fix discard_zeroes_if_aligned regression Eric Wheeler
@ 2018-01-16  7:26   ` Christoph Hellwig
  2018-01-16  9:49     ` Lars Ellenberg
  2018-01-17  0:43     ` Eric Wheeler
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2018-01-16  7:26 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: drbd-dev, Eric Wheeler, Eric Wheeler, stable, Philipp Reisner,
	Lars Ellenberg, linux-kernel

NAK.  Calling a discard and expecting zeroing is simply buggy.

And double NAK for patches like this without a linux-block Cc.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2018-01-16  7:26   ` Christoph Hellwig
@ 2018-01-16  9:49     ` Lars Ellenberg
  2019-05-10 17:36       ` Eric Wheeler
  2018-01-17  0:43     ` Eric Wheeler
  1 sibling, 1 reply; 7+ messages in thread
From: Lars Ellenberg @ 2018-01-16  9:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Wheeler, drbd-dev, Eric Wheeler, Eric Wheeler, stable,
	Philipp Reisner, linux-kernel

On Mon, Jan 15, 2018 at 11:26:15PM -0800, Christoph Hellwig wrote:
> NAK.  Calling a discard and expecting zeroing is simply buggy.

What he said.

The bug/misunderstanding was that we now use zeroout even for discards,
with the assumption that it would try to do discards where it can.

Which is even true.

But our expectations of when zeroout "should" use unmap,
and where it actually can do that safely
based on the information it has,
don't really match:
our expectations where wrong, we assumed too much.
(in the general case).

Which means in DRBD we have to stop use zeroout for discards,
and again pass down normal discard for discards.

In the specific case where the backend to DRBD is lvm-thin,
AND it does de-alloc blocks on discard,
AND it does NOT have skip_block_zeroing set or it's backend
does zero on discard and it does discard passdown, and we tell
DRBD about all of that (using the discard_zeroes_if_aligned flag)
then we can do this "zero head and tail, discard full blocks",
and expect them to be zero.

If skip_block_zeroing is set however, and there is no discard
passdown in thin, or the backend of thin does not zero on discard,
DRBD can still pass down discards to thin, and expect them do be
de-allocated, but can not expect discarded ranges to remain
"zero", any later partial write to an unallocated area could pull
in different "undefined" garbage on different replicas for the
not-written-to part of a new allocated block.

The end result is that you have areas of the block device
that return different data depending on which replica you
read from.

But that is the case even eithout discard in that setup,
don't do that then or live with it.

"undefined data" is undefined, you have that directly on thin
in that setup already, with DRBD on top you now have several
versions of "undefined".

Anything on top of such a setup must not do "read-modify-write"
of "undefined" data and expect a defined result, adding DRBD
on top does not change that.

DRBD can deal with that just fine, but our "online verify" will
report loads of boring "mismatches" for those areas.

TL;DR: we'll stop doing "discard-is-zeroout"
(our assumptions were wrong).
We still won't do exactly "discard-is-discard", but re-enable our
"discard-is-discard plus zeroout on head and tail", because in
some relevant setups, this gives us the best result, and avoids
the false positives in our online-verify.

    Lars

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2018-01-16  7:26   ` Christoph Hellwig
  2018-01-16  9:49     ` Lars Ellenberg
@ 2018-01-17  0:43     ` Eric Wheeler
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2018-01-17  0:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: drbd-dev, stable, Philipp Reisner, Lars Ellenberg, linux-kernel

On Mon, 15 Jan 2018, Christoph Hellwig wrote:

> NAK.  Calling a discard and expecting zeroing is simply buggy.

But of course, that would be silly.

We don't expect discard to zero---but we do expect discard to discard!

> And double NAK for patches like this without a linux-block Cc.

My appologies, I thought this was internal to DRBD.  

What is the general rule here?

Should linux-block always be Cc'ed with a patch?

--
Eric Wheeler

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2018-01-16  9:49     ` Lars Ellenberg
@ 2019-05-10 17:36       ` Eric Wheeler
  2019-05-28 13:18         ` Lars Ellenberg
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2019-05-10 17:36 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Christoph Hellwig, drbd-dev, stable, Philipp Reisner, linux-kernel

On Tue, 16 Jan 2018, Lars Ellenberg wrote:

> On Mon, Jan 15, 2018 at 11:26:15PM -0800, Christoph Hellwig wrote:
> > NAK.  Calling a discard and expecting zeroing is simply buggy.
> 
> What he said.
> 
> The bug/misunderstanding was that we now use zeroout even for discards,
> with the assumption that it would try to do discards where it can.
> 
> Which is even true.
> 
> But our expectations of when zeroout "should" use unmap,
> and where it actually can do that safely
> based on the information it has,
> don't really match:
> our expectations where wrong, we assumed too much.
> (in the general case).
> 
> Which means in DRBD we have to stop use zeroout for discards,
> and again pass down normal discard for discards.
> 
> In the specific case where the backend to DRBD is lvm-thin,
> AND it does de-alloc blocks on discard,
> AND it does NOT have skip_block_zeroing set or it's backend
> does zero on discard and it does discard passdown, and we tell
> DRBD about all of that (using the discard_zeroes_if_aligned flag)
> then we can do this "zero head and tail, discard full blocks",
> and expect them to be zero.
> 
> If skip_block_zeroing is set however, and there is no discard
> passdown in thin, or the backend of thin does not zero on discard,
> DRBD can still pass down discards to thin, and expect them do be
> de-allocated, but can not expect discarded ranges to remain
> "zero", any later partial write to an unallocated area could pull
> in different "undefined" garbage on different replicas for the
> not-written-to part of a new allocated block.
> 
> The end result is that you have areas of the block device
> that return different data depending on which replica you
> read from.
> 
> But that is the case even eithout discard in that setup,
> don't do that then or live with it.
> 
> "undefined data" is undefined, you have that directly on thin
> in that setup already, with DRBD on top you now have several
> versions of "undefined".
> 
> Anything on top of such a setup must not do "read-modify-write"
> of "undefined" data and expect a defined result, adding DRBD
> on top does not change that.
> 
> DRBD can deal with that just fine, but our "online verify" will
> report loads of boring "mismatches" for those areas.
> 
> TL;DR: we'll stop doing "discard-is-zeroout"
> (our assumptions were wrong).
> We still won't do exactly "discard-is-discard", but re-enable our
> "discard-is-discard plus zeroout on head and tail", because in
> some relevant setups, this gives us the best result, and avoids
> the false positives in our online-verify.
> 
>     Lars
> 

Hi Lars,

We just tried 4.19.x and this bugs still exists. We applied the patch 
which was originally submitted to this thread and it still applies cleanly 
and seems to work for our use case. You mentioned that you had some older 
code which zeroed out unaligned discard requests (or perhaps it was for a 
different purpose) that you may be able to use to patch this. Could you 
dig those up and see if we can get this solved?

It would be nice to be able to use drbd with thin backing volumes from the 
vanilla kernel. If this has already been fixed in something newer than 
4.19, then please point me to the commit.

Thank you for your help!

 
--
Eric Wheeler



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2019-05-10 17:36       ` Eric Wheeler
@ 2019-05-28 13:18         ` Lars Ellenberg
  2019-06-02  0:28           ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Lars Ellenberg @ 2019-05-28 13:18 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: drbd-dev, linux-kernel

On Fri, May 10, 2019 at 05:36:32PM +0000, Eric Wheeler wrote:
> Hi Lars,
> 
> We just tried 4.19.x and this bugs still exists. We applied the patch 
> which was originally submitted to this thread and it still applies cleanly 
> and seems to work for our use case. You mentioned that you had some older 
> code which zeroed out unaligned discard requests (or perhaps it was for a 
> different purpose) that you may be able to use to patch this. Could you 
> dig those up and see if we can get this solved?
> 
> It would be nice to be able to use drbd with thin backing volumes from the 
> vanilla kernel. If this has already been fixed in something newer than 
> 4.19, then please point me to the commit.

I think it was merged upstream in 5.0
f31e583aa2c2 drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drbd: fix discard_zeroes_if_aligned regression
  2019-05-28 13:18         ` Lars Ellenberg
@ 2019-06-02  0:28           ` Eric Wheeler
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2019-06-02  0:28 UTC (permalink / raw)
  To: Lars Ellenberg; +Cc: drbd-dev, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2608 bytes --]

On Tue, 28 May 2019, Lars Ellenberg wrote:

> On Fri, May 10, 2019 at 05:36:32PM +0000, Eric Wheeler wrote:
> > Hi Lars,
> > 
> > We just tried 4.19.x and this bugs still exists. We applied the patch 
> > which was originally submitted to this thread and it still applies cleanly 
> > and seems to work for our use case. You mentioned that you had some older 
> > code which zeroed out unaligned discard requests (or perhaps it was for a 
> > different purpose) that you may be able to use to patch this. Could you 
> > dig those up and see if we can get this solved?
> > 
> > It would be nice to be able to use drbd with thin backing volumes from the 
> > vanilla kernel. If this has already been fixed in something newer than 
> > 4.19, then please point me to the commit.
> 
> I think it was merged upstream in 5.0
> f31e583aa2c2 drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")

Thanks Lars, I appreciate your patch.  

Your unaligned zerout code in drbd_issue_discard_or_zero_out() looks 
great.  I particulary like how you adjusted max_discard_sectors to the 
granularity, as well as alignment handling.  Well thought out.

Your commit notes that "for backward compatibility, P_TRIM means zero-out, 
unless the DRBD_FF_WZEROES feature flag is agreed upon during handshake."

We test our environment by deploying the newer kernel on one of the DRBD 
servers and checking for regressions---but this will cause a zero-out on 
the new server because the old server doesn't yet support DRBD_FF_WZEROES.

For our purpose, can you think of any reason that it would be unsafe to 
hack the following into drbd_do_features() so the newer version will not 
zero-out while we test and get both nodes up to the newer version?

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index c7ad88d..76191e6 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -5382,6 +5382,8 @@ static int drbd_do_features(struct drbd_connection *connection)
 	connection->agreed_pro_version = min_t(int, PRO_VERSION_MAX, p->protocol_max);
 	connection->agreed_features = PRO_FEATURES & be32_to_cpu(p->feature_flags);
 
+	connection->agreed_features |= DRBD_FF_WZEROES;
+
 	drbd_info(connection, "Handshake successful: "
 	     "Agreed network protocol version %d\n", connection->agreed_pro_version);


--
Eric Wheeler



> 
> -- 
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
> : R&D, Integration, Ops, Consulting, Support
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT
> 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-06-02  0:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <15124635.GA4107@soda.linbit>
2018-01-15 23:00 ` [PATCH] drbd: fix discard_zeroes_if_aligned regression Eric Wheeler
2018-01-16  7:26   ` Christoph Hellwig
2018-01-16  9:49     ` Lars Ellenberg
2019-05-10 17:36       ` Eric Wheeler
2019-05-28 13:18         ` Lars Ellenberg
2019-06-02  0:28           ` Eric Wheeler
2018-01-17  0:43     ` Eric Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).