* zeroout fixes @ 2016-06-24 7:00 Christoph Hellwig 2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block The first one is a repost of my BLKDEV_DISCARD_ZERO with the spelling fixes suggest from Martin, the second one just drops the -EOPNOTSUPP special casing in blkdev_issue_write_same. Note that the patches are against the block for-next tree, and unfortunately we have diverged significantly from 4.7 in this area: 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and 4.8 has the various req_op changes, including a new calling convention for __blkdev_issue_zeroout. I'm not sure how to best sort this mess out, but I really think we need the bio_put in the for-4.8 tree as well, and we should probably bring the changed __blkdev_issue_discard calling convention into 4.7 as well. Suggestion for how to best handle this? ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout 2016-06-24 7:00 zeroout fixes Christoph Hellwig @ 2016-06-24 7:00 ` Christoph Hellwig 2016-06-29 5:03 ` Martin K. Petersen 2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig 2016-07-08 9:36 ` zeroout fixes Christoph Hellwig 2 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block Currently blkdev_issue_zeroout cascades down from discards (if the driver guarantees that discards zero data), to WRITE SAME and then to a loop writing zeroes. Unfortunately we ignore run-time EOPNOTSUPP errors in the block layer blkdev_issue_discard helper to work around DM volumes that may have mixed discard support underneath. This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to blkdev_issue_discard that indicates we are called for zeroing operation. This allows both to ignore the EOPNOTSUPP hack and actually consolidating the discard_zeroes_data check into the function. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-lib.c | 17 +++++++++++------ include/linux/blkdev.h | 4 +++- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 78626c2..45b35b1 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, return -ENXIO; if (flags & BLKDEV_DISCARD_SECURE) { + if (flags & BLKDEV_DISCARD_ZERO) + return -EOPNOTSUPP; if (!blk_queue_secure_erase(q)) return -EOPNOTSUPP; op = REQ_OP_SECURE_ERASE; } else { if (!blk_queue_discard(q)) return -EOPNOTSUPP; + if ((flags & BLKDEV_DISCARD_ZERO) && + !q->limits.discard_zeroes_data) + return -EOPNOTSUPP; op = REQ_OP_DISCARD; } @@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, &bio); if (!ret && bio) { ret = submit_bio_wait(bio); - if (ret == -EOPNOTSUPP) + if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO)) ret = 0; } blk_finish_plug(&plug); @@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, bool discard) { - struct request_queue *q = bdev_get_queue(bdev); - - if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data && - blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0) - return 0; + if (discard) { + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, + BLKDEV_DISCARD_ZERO)) + return 0; + } if (bdev_write_same(bdev) && blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9d1e0a4..b65ca66 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt, return bqt->tag_index[tag]; } -#define BLKDEV_DISCARD_SECURE 0x01 /* secure discard */ + +#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */ +#define BLKDEV_DISCARD_ZERO (1 << 1) /* must reliably zero data */ extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *); extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout 2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig @ 2016-06-29 5:03 ` Martin K. Petersen 0 siblings, 0 replies; 8+ messages in thread From: Martin K. Petersen @ 2016-06-29 5:03 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes: Christoph> Currently blkdev_issue_zeroout cascades down from discards Christoph> (if the driver guarantees that discards zero data), to WRITE Christoph> SAME and then to a loop writing zeroes. Unfortunately we Christoph> ignore run-time EOPNOTSUPP errors in the block layer Christoph> blkdev_issue_discard helper to work around DM volumes that Christoph> may have mixed discard support underneath. Christoph> This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to Christoph> blkdev_issue_discard that indicates we are called for zeroing Christoph> operation. This allows both to ignore the EOPNOTSUPP hack Christoph> and actually consolidating the discard_zeroes_data check into Christoph> the function. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same 2016-06-24 7:00 zeroout fixes Christoph Hellwig 2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig @ 2016-06-24 7:00 ` Christoph Hellwig 2016-06-29 5:04 ` Martin K. Petersen 2016-07-08 9:36 ` zeroout fixes Christoph Hellwig 2 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block WRITE SAME is a data integrity operation and we can't simply ignore errors. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 45b35b1..e371f83 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (bio) ret = submit_bio_wait(bio); - return ret != -EOPNOTSUPP ? ret : 0; + return ret; } EXPORT_SYMBOL(blkdev_issue_write_same); -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same 2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig @ 2016-06-29 5:04 ` Martin K. Petersen 2016-07-03 19:52 ` Sitsofe Wheeler 0 siblings, 1 reply; 8+ messages in thread From: Martin K. Petersen @ 2016-06-29 5:04 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes: Christoph> WRITE SAME is a data integrity operation and we can't simply Christoph> ignore errors. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same 2016-06-29 5:04 ` Martin K. Petersen @ 2016-07-03 19:52 ` Sitsofe Wheeler 0 siblings, 0 replies; 8+ messages in thread From: Sitsofe Wheeler @ 2016-07-03 19:52 UTC (permalink / raw) To: Martin K. Petersen; +Cc: Christoph Hellwig, axboe, shli, snitzer, linux-block On Wed, Jun 29, 2016 at 01:04:02AM -0400, Martin K. Petersen wrote: > >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes: > > Christoph> WRITE SAME is a data integrity operation and we can't simply > Christoph> ignore errors. > > Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> > > -- > Martin K. Petersen Oracle Linux Engineering Along with the first patch this resolves the problems I was seeing when using the script below when applied atop of axboe/linux-block.git branch for-4.8/core (commit e118e4be8a96c6b89307ee3d360d954249981b8f). The only snag is that when userland manually disables max_write_same_blocks on the base device (by echoing 0 into it) stacked devices will not set their their own queue/write_same_max_bytes to 0 when the BLKZEROOUT fails to use write same (but they do correctly zero the device). If the base device somehow internally toggles this to 0 itself during a write same then dm devices seem to set queue/write_same_max_bytes to 0. Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> #!/bin/bash # Check BLKZEROOUT behaviour modprobe scsi_debug write_same_length=131072 dev=$(lsblk -n -p -S | awk '/scsi_debug/ { print $1 }') dev_kname=$(lsblk -n -S ${dev} | awk '{ print $1 }') sys_dev_mwsb_path=$(realpath /sys/block/"${dev_kname}"/device/scsi_disk/*/max_write_same_blocks) pvcreate ${dev} vgcreate vg ${dev} lvcreate -L4M -n lv vg lv_dev=/dev/mapper/vg-lv sys_lv_wsmb_path="/sys/block/$(lsblk -n -o KNAME "${lv_dev}")/queue/write_same_max_bytes" orig_max_ws=$(cat "${sys_dev_mwsb_path}") echo "Stacked device's write_same_max_bytes prior blkzeroout: $(cat ${sys_lv_wsmb_path})" echo "Forcing base device's max_write_same_blocks to 0" echo 0 > "${sys_dev_mwsb_path}" size=$(blockdev --getsize64 "${lv_dev}") tr '\0' '\377' < /dev/zero | dd oflag=direct bs=64K of="${lv_dev}" count=$((${size} / (64 * 1024))) status=none blkdiscard -v --zero "${lv_dev}" dd if="${lv_dev}" iflag=direct bs=64K status=none | cmp -b --bytes "${size}" /dev/zero - verification=$? echo "Stacked device's write_same_max_bytes after blkzeroout: $(cat ${sys_lv_wsmb_path})" lvchange -a n "${lv_dev}" -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zeroout fixes 2016-06-24 7:00 zeroout fixes Christoph Hellwig 2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig 2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig @ 2016-07-08 9:36 ` Christoph Hellwig 2 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2016-07-08 9:36 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block On Fri, Jun 24, 2016 at 09:00:49AM +0200, Christoph Hellwig wrote: > Note that the patches are against the block for-next tree, and > unfortunately we have diverged significantly from 4.7 in this area: > 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and > 4.8 has the various req_op changes, including a new calling > convention for __blkdev_issue_zeroout. I'm not sure how to best > sort this mess out, but I really think we need the bio_put in the > for-4.8 tree as well, and we should probably bring the changed > __blkdev_issue_discard calling convention into 4.7 as well. > Suggestion for how to best handle this? Jens, any idea how to make forward progress on this issue? Just apply to 4.8 and then backport to stable? ^ permalink raw reply [flat|nested] 8+ messages in thread
* resend: zeroout fixes @ 2016-07-19 9:23 Christoph Hellwig 2016-07-19 9:23 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2016-07-19 9:23 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block The first one is a repost of my BLKDEV_DISCARD_ZERO with the spelling fixes suggest from Martin, the second one just drops the -EOPNOTSUPP special casing in blkdev_issue_write_same. Note that the patches are against the block for-next tree, and unfortunately we have diverged significantly from 4.7 in this area: 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and 4.8 has the various req_op changes, including a new calling convention for __blkdev_issue_zeroout. I'm not sure how to best sort this mess out, but I really think we need the bio_put in the for-4.8 tree as well, and we should probably bring the changed __blkdev_issue_discard calling convention into 4.7 as well. Suggestion for how to best handle this? Compared to the last posting on June 24th this just adds the Reviewed-by: tags from Martin. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same 2016-07-19 9:23 resend: " Christoph Hellwig @ 2016-07-19 9:23 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2016-07-19 9:23 UTC (permalink / raw) To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block WRITE SAME is a data integrity operation and we can't simply ignore errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> --- block/blk-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 45b35b1..e371f83 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (bio) ret = submit_bio_wait(bio); - return ret != -EOPNOTSUPP ? ret : 0; + return ret; } EXPORT_SYMBOL(blkdev_issue_write_same); -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-07-19 9:23 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-06-24 7:00 zeroout fixes Christoph Hellwig 2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig 2016-06-29 5:03 ` Martin K. Petersen 2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig 2016-06-29 5:04 ` Martin K. Petersen 2016-07-03 19:52 ` Sitsofe Wheeler 2016-07-08 9:36 ` zeroout fixes Christoph Hellwig 2016-07-19 9:23 resend: " Christoph Hellwig 2016-07-19 9:23 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.