All of lore.kernel.org
 help / color / mirror / Atom feed
* zeroout fixes
@ 2016-06-24  7:00 Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

The first one is a repost of my BLKDEV_DISCARD_ZERO with the spelling
fixes suggest from Martin, the second one just drops the -EOPNOTSUPP
special casing in blkdev_issue_write_same.

Note that the patches are against the block for-next tree, and
unfortunately we have diverged significantly from 4.7 in this area:
4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
4.8 has the various req_op changes, including a new calling
convention for __blkdev_issue_zeroout.  I'm not sure how to best
sort this mess out, but I really think we need the bio_put in the
for-4.8 tree as well, and we should probably bring the changed
__blkdev_issue_discard calling convention into 4.7 as well.
Suggestion for how to best handle this?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
@ 2016-06-24  7:00 ` Christoph Hellwig
  2016-06-29  5:03   ` Martin K. Petersen
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
  2016-07-08  9:36 ` zeroout fixes Christoph Hellwig
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes.  Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.

This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c        | 17 +++++++++++------
 include/linux/blkdev.h |  4 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 78626c2..45b35b1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		return -ENXIO;
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
+		if (flags & BLKDEV_DISCARD_ZERO)
+			return -EOPNOTSUPP;
 		if (!blk_queue_secure_erase(q))
 			return -EOPNOTSUPP;
 		op = REQ_OP_SECURE_ERASE;
 	} else {
 		if (!blk_queue_discard(q))
 			return -EOPNOTSUPP;
+		if ((flags & BLKDEV_DISCARD_ZERO) &&
+		    !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
 		op = REQ_OP_DISCARD;
 	}
 
@@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(bio);
-		if (ret == -EOPNOTSUPP)
+		if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
 			ret = 0;
 	}
 	blk_finish_plug(&plug);
@@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
-
-	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
-		return 0;
+	if (discard) {
+		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+				BLKDEV_DISCARD_ZERO))
+			return 0;
+	}
 
 	if (bdev_write_same(bdev) &&
 	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9d1e0a4..b65ca66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 	return bqt->tag_index[tag];
 }
 
-#define BLKDEV_DISCARD_SECURE  0x01    /* secure discard */
+
+#define BLKDEV_DISCARD_SECURE	(1 << 0)	/* issue a secure erase */
+#define BLKDEV_DISCARD_ZERO	(1 << 1)	/* must reliably zero data */
 
 extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-24  7:00 ` Christoph Hellwig
  2016-06-29  5:04   ` Martin K. Petersen
  2016-07-08  9:36 ` zeroout fixes Christoph Hellwig
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

WRITE SAME is a data integrity operation and we can't simply ignore
errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 45b35b1..e371f83 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-29  5:03   ` Martin K. Petersen
  0 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2016-06-29  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> Currently blkdev_issue_zeroout cascades down from discards
Christoph> (if the driver guarantees that discards zero data), to WRITE
Christoph> SAME and then to a loop writing zeroes.  Unfortunately we
Christoph> ignore run-time EOPNOTSUPP errors in the block layer
Christoph> blkdev_issue_discard helper to work around DM volumes that
Christoph> may have mixed discard support underneath.

Christoph> This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
Christoph> blkdev_issue_discard that indicates we are called for zeroing
Christoph> operation.  This allows both to ignore the EOPNOTSUPP hack
Christoph> and actually consolidating the discard_zeroes_data check into
Christoph> the function.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-06-29  5:04   ` Martin K. Petersen
  2016-07-03 19:52     ` Sitsofe Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2016-06-29  5:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> WRITE SAME is a data integrity operation and we can't simply
Christoph> ignore errors.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-29  5:04   ` Martin K. Petersen
@ 2016-07-03 19:52     ` Sitsofe Wheeler
  0 siblings, 0 replies; 7+ messages in thread
From: Sitsofe Wheeler @ 2016-07-03 19:52 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, axboe, shli, snitzer, linux-block

On Wed, Jun 29, 2016 at 01:04:02AM -0400, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
> 
> Christoph> WRITE SAME is a data integrity operation and we can't simply
> Christoph> ignore errors.
> 
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

Along with the first patch this resolves the problems I was seeing when
using the script below when applied atop of axboe/linux-block.git branch
for-4.8/core (commit e118e4be8a96c6b89307ee3d360d954249981b8f).

The only snag is that when userland manually disables
max_write_same_blocks on the base device (by echoing 0 into it) stacked
devices will not set their their own queue/write_same_max_bytes to 0
when the BLKZEROOUT fails to use write same (but they do correctly zero
the device). If the base device somehow internally toggles this to 0
itself during a write same then dm devices seem to set
queue/write_same_max_bytes to 0.

Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>

#!/bin/bash
# Check BLKZEROOUT behaviour

modprobe scsi_debug write_same_length=131072
dev=$(lsblk -n -p -S | awk '/scsi_debug/ { print $1 }')
dev_kname=$(lsblk -n -S ${dev} | awk '{ print $1 }')
sys_dev_mwsb_path=$(realpath
/sys/block/"${dev_kname}"/device/scsi_disk/*/max_write_same_blocks)
pvcreate ${dev}
vgcreate vg ${dev}
lvcreate -L4M -n lv vg
lv_dev=/dev/mapper/vg-lv
sys_lv_wsmb_path="/sys/block/$(lsblk -n -o KNAME
"${lv_dev}")/queue/write_same_max_bytes"
orig_max_ws=$(cat "${sys_dev_mwsb_path}")
echo "Stacked device's write_same_max_bytes prior blkzeroout: $(cat
${sys_lv_wsmb_path})"
echo "Forcing base device's max_write_same_blocks to 0"
echo 0 > "${sys_dev_mwsb_path}"
size=$(blockdev --getsize64 "${lv_dev}")
tr '\0' '\377' < /dev/zero | dd oflag=direct bs=64K of="${lv_dev}"
count=$((${size} / (64 * 1024))) 
status=none
blkdiscard -v --zero "${lv_dev}"
dd if="${lv_dev}" iflag=direct bs=64K status=none | cmp -b --bytes
"${size}" /dev/zero -
verification=$?
echo "Stacked device's write_same_max_bytes after blkzeroout: $(cat
${sys_lv_wsmb_path})"
lvchange -a n "${lv_dev}"

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zeroout fixes
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-07-08  9:36 ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-07-08  9:36 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

On Fri, Jun 24, 2016 at 09:00:49AM +0200, Christoph Hellwig wrote:
> Note that the patches are against the block for-next tree, and
> unfortunately we have diverged significantly from 4.7 in this area:
> 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
> 4.8 has the various req_op changes, including a new calling
> convention for __blkdev_issue_zeroout.  I'm not sure how to best
> sort this mess out, but I really think we need the bio_put in the
> for-4.8 tree as well, and we should probably bring the changed
> __blkdev_issue_discard calling convention into 4.7 as well.
> Suggestion for how to best handle this?

Jens, any idea how to make forward progress on this issue?  Just
apply to 4.8 and then backport to stable?

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-08  9:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24  7:00 zeroout fixes Christoph Hellwig
2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-29  5:03   ` Martin K. Petersen
2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
2016-06-29  5:04   ` Martin K. Petersen
2016-07-03 19:52     ` Sitsofe Wheeler
2016-07-08  9:36 ` zeroout fixes Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.