All of lore.kernel.org
 help / color / mirror / Atom feed
* zeroout fixes
@ 2016-06-24  7:00 Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

The first one is a repost of my BLKDEV_DISCARD_ZERO with the spelling
fixes suggest from Martin, the second one just drops the -EOPNOTSUPP
special casing in blkdev_issue_write_same.

Note that the patches are against the block for-next tree, and
unfortunately we have diverged significantly from 4.7 in this area:
4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
4.8 has the various req_op changes, including a new calling
convention for __blkdev_issue_zeroout.  I'm not sure how to best
sort this mess out, but I really think we need the bio_put in the
for-4.8 tree as well, and we should probably bring the changed
__blkdev_issue_discard calling convention into 4.7 as well.
Suggestion for how to best handle this?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
@ 2016-06-24  7:00 ` Christoph Hellwig
  2016-06-29  5:03   ` Martin K. Petersen
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
  2016-07-08  9:36 ` zeroout fixes Christoph Hellwig
  2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes.  Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.

This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c        | 17 +++++++++++------
 include/linux/blkdev.h |  4 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 78626c2..45b35b1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		return -ENXIO;
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
+		if (flags & BLKDEV_DISCARD_ZERO)
+			return -EOPNOTSUPP;
 		if (!blk_queue_secure_erase(q))
 			return -EOPNOTSUPP;
 		op = REQ_OP_SECURE_ERASE;
 	} else {
 		if (!blk_queue_discard(q))
 			return -EOPNOTSUPP;
+		if ((flags & BLKDEV_DISCARD_ZERO) &&
+		    !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
 		op = REQ_OP_DISCARD;
 	}
 
@@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(bio);
-		if (ret == -EOPNOTSUPP)
+		if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
 			ret = 0;
 	}
 	blk_finish_plug(&plug);
@@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
-
-	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
-		return 0;
+	if (discard) {
+		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+				BLKDEV_DISCARD_ZERO))
+			return 0;
+	}
 
 	if (bdev_write_same(bdev) &&
 	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9d1e0a4..b65ca66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 	return bqt->tag_index[tag];
 }
 
-#define BLKDEV_DISCARD_SECURE  0x01    /* secure discard */
+
+#define BLKDEV_DISCARD_SECURE	(1 << 0)	/* issue a secure erase */
+#define BLKDEV_DISCARD_ZERO	(1 << 1)	/* must reliably zero data */
 
 extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-24  7:00 ` Christoph Hellwig
  2016-06-29  5:04   ` Martin K. Petersen
  2016-07-08  9:36 ` zeroout fixes Christoph Hellwig
  2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:00 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

WRITE SAME is a data integrity operation and we can't simply ignore
errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 45b35b1..e371f83 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-29  5:03   ` Martin K. Petersen
  0 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2016-06-29  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> Currently blkdev_issue_zeroout cascades down from discards
Christoph> (if the driver guarantees that discards zero data), to WRITE
Christoph> SAME and then to a loop writing zeroes.  Unfortunately we
Christoph> ignore run-time EOPNOTSUPP errors in the block layer
Christoph> blkdev_issue_discard helper to work around DM volumes that
Christoph> may have mixed discard support underneath.

Christoph> This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
Christoph> blkdev_issue_discard that indicates we are called for zeroing
Christoph> operation.  This allows both to ignore the EOPNOTSUPP hack
Christoph> and actually consolidating the discard_zeroes_data check into
Christoph> the function.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-06-29  5:04   ` Martin K. Petersen
  2016-07-03 19:52     ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2016-06-29  5:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> WRITE SAME is a data integrity operation and we can't simply
Christoph> ignore errors.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-06-29  5:04   ` Martin K. Petersen
@ 2016-07-03 19:52     ` Sitsofe Wheeler
  0 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2016-07-03 19:52 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, axboe, shli, snitzer, linux-block

On Wed, Jun 29, 2016 at 01:04:02AM -0400, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
> 
> Christoph> WRITE SAME is a data integrity operation and we can't simply
> Christoph> ignore errors.
> 
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

Along with the first patch this resolves the problems I was seeing when
using the script below when applied atop of axboe/linux-block.git branch
for-4.8/core (commit e118e4be8a96c6b89307ee3d360d954249981b8f).

The only snag is that when userland manually disables
max_write_same_blocks on the base device (by echoing 0 into it) stacked
devices will not set their their own queue/write_same_max_bytes to 0
when the BLKZEROOUT fails to use write same (but they do correctly zero
the device). If the base device somehow internally toggles this to 0
itself during a write same then dm devices seem to set
queue/write_same_max_bytes to 0.

Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>

#!/bin/bash
# Check BLKZEROOUT behaviour

modprobe scsi_debug write_same_length=131072
dev=$(lsblk -n -p -S | awk '/scsi_debug/ { print $1 }')
dev_kname=$(lsblk -n -S ${dev} | awk '{ print $1 }')
sys_dev_mwsb_path=$(realpath
/sys/block/"${dev_kname}"/device/scsi_disk/*/max_write_same_blocks)
pvcreate ${dev}
vgcreate vg ${dev}
lvcreate -L4M -n lv vg
lv_dev=/dev/mapper/vg-lv
sys_lv_wsmb_path="/sys/block/$(lsblk -n -o KNAME
"${lv_dev}")/queue/write_same_max_bytes"
orig_max_ws=$(cat "${sys_dev_mwsb_path}")
echo "Stacked device's write_same_max_bytes prior blkzeroout: $(cat
${sys_lv_wsmb_path})"
echo "Forcing base device's max_write_same_blocks to 0"
echo 0 > "${sys_dev_mwsb_path}"
size=$(blockdev --getsize64 "${lv_dev}")
tr '\0' '\377' < /dev/zero | dd oflag=direct bs=64K of="${lv_dev}"
count=$((${size} / (64 * 1024))) 
status=none
blkdiscard -v --zero "${lv_dev}"
dd if="${lv_dev}" iflag=direct bs=64K status=none | cmp -b --bytes
"${size}" /dev/zero -
verification=$?
echo "Stacked device's write_same_max_bytes after blkzeroout: $(cat
${sys_lv_wsmb_path})"
lvchange -a n "${lv_dev}"

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zeroout fixes
  2016-06-24  7:00 zeroout fixes Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
  2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-07-08  9:36 ` Christoph Hellwig
  2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-07-08  9:36 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

On Fri, Jun 24, 2016 at 09:00:49AM +0200, Christoph Hellwig wrote:
> Note that the patches are against the block for-next tree, and
> unfortunately we have diverged significantly from 4.7 in this area:
> 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
> 4.8 has the various req_op changes, including a new calling
> convention for __blkdev_issue_zeroout.  I'm not sure how to best
> sort this mess out, but I really think we need the bio_put in the
> for-4.8 tree as well, and we should probably bring the changed
> __blkdev_issue_discard calling convention into 4.7 as well.
> Suggestion for how to best handle this?

Jens, any idea how to make forward progress on this issue?  Just
apply to 4.8 and then backport to stable?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  2016-07-19  9:23 resend: " Christoph Hellwig
@ 2016-07-19  9:23 ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-07-19  9:23 UTC (permalink / raw)
  To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block

WRITE SAME is a data integrity operation and we can't simply ignore
errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 45b35b1..e371f83 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-07-19  9:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24  7:00 zeroout fixes Christoph Hellwig
2016-06-24  7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-29  5:03   ` Martin K. Petersen
2016-06-24  7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
2016-06-29  5:04   ` Martin K. Petersen
2016-07-03 19:52     ` Sitsofe Wheeler
2016-07-08  9:36 ` zeroout fixes Christoph Hellwig
2016-07-19  9:23 resend: " Christoph Hellwig
2016-07-19  9:23 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.