* zeroout fixes
@ 2016-06-24 7:00 Christoph Hellwig
2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
The first one is a repost of my BLKDEV_DISCARD_ZERO with the spelling
fixes suggest from Martin, the second one just drops the -EOPNOTSUPP
special casing in blkdev_issue_write_same.
Note that the patches are against the block for-next tree, and
unfortunately we have diverged significantly from 4.7 in this area:
4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
4.8 has the various req_op changes, including a new calling
convention for __blkdev_issue_zeroout. I'm not sure how to best
sort this mess out, but I really think we need the bio_put in the
for-4.8 tree as well, and we should probably bring the changed
__blkdev_issue_discard calling convention into 4.7 as well.
Suggestion for how to best handle this?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-24 7:00 zeroout fixes Christoph Hellwig
@ 2016-06-24 7:00 ` Christoph Hellwig
2016-06-29 5:03 ` Martin K. Petersen
2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
2016-07-08 9:36 ` zeroout fixes Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes. Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.
This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 17 +++++++++++------
include/linux/blkdev.h | 4 +++-
2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 78626c2..45b35b1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
return -ENXIO;
if (flags & BLKDEV_DISCARD_SECURE) {
+ if (flags & BLKDEV_DISCARD_ZERO)
+ return -EOPNOTSUPP;
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
op = REQ_OP_SECURE_ERASE;
} else {
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
+ if ((flags & BLKDEV_DISCARD_ZERO) &&
+ !q->limits.discard_zeroes_data)
+ return -EOPNOTSUPP;
op = REQ_OP_DISCARD;
}
@@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
&bio);
if (!ret && bio) {
ret = submit_bio_wait(bio);
- if (ret == -EOPNOTSUPP)
+ if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
ret = 0;
}
blk_finish_plug(&plug);
@@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard)
{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
- blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
- return 0;
+ if (discard) {
+ if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+ BLKDEV_DISCARD_ZERO))
+ return 0;
+ }
if (bdev_write_same(bdev) &&
blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9d1e0a4..b65ca66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
return bqt->tag_index[tag];
}
-#define BLKDEV_DISCARD_SECURE 0x01 /* secure discard */
+
+#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */
+#define BLKDEV_DISCARD_ZERO (1 << 1) /* must reliably zero data */
extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
2016-06-24 7:00 zeroout fixes Christoph Hellwig
2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-24 7:00 ` Christoph Hellwig
2016-06-29 5:04 ` Martin K. Petersen
2016-07-08 9:36 ` zeroout fixes Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2016-06-24 7:00 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
WRITE SAME is a data integrity operation and we can't simply ignore
errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 45b35b1..e371f83 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if (bio)
ret = submit_bio_wait(bio);
- return ret != -EOPNOTSUPP ? ret : 0;
+ return ret;
}
EXPORT_SYMBOL(blkdev_issue_write_same);
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-29 5:03 ` Martin K. Petersen
0 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2016-06-29 5:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block
>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
Christoph> Currently blkdev_issue_zeroout cascades down from discards
Christoph> (if the driver guarantees that discards zero data), to WRITE
Christoph> SAME and then to a loop writing zeroes. Unfortunately we
Christoph> ignore run-time EOPNOTSUPP errors in the block layer
Christoph> blkdev_issue_discard helper to work around DM volumes that
Christoph> may have mixed discard support underneath.
Christoph> This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
Christoph> blkdev_issue_discard that indicates we are called for zeroing
Christoph> operation. This allows both to ignore the EOPNOTSUPP hack
Christoph> and actually consolidating the discard_zeroes_data check into
Christoph> the function.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-06-29 5:04 ` Martin K. Petersen
2016-07-03 19:52 ` Sitsofe Wheeler
0 siblings, 1 reply; 8+ messages in thread
From: Martin K. Petersen @ 2016-06-29 5:04 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block
>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
Christoph> WRITE SAME is a data integrity operation and we can't simply
Christoph> ignore errors.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
2016-06-29 5:04 ` Martin K. Petersen
@ 2016-07-03 19:52 ` Sitsofe Wheeler
0 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2016-07-03 19:52 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: Christoph Hellwig, axboe, shli, snitzer, linux-block
On Wed, Jun 29, 2016 at 01:04:02AM -0400, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
>
> Christoph> WRITE SAME is a data integrity operation and we can't simply
> Christoph> ignore errors.
>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> --
> Martin K. Petersen Oracle Linux Engineering
Along with the first patch this resolves the problems I was seeing when
using the script below when applied atop of axboe/linux-block.git branch
for-4.8/core (commit e118e4be8a96c6b89307ee3d360d954249981b8f).
The only snag is that when userland manually disables
max_write_same_blocks on the base device (by echoing 0 into it) stacked
devices will not set their their own queue/write_same_max_bytes to 0
when the BLKZEROOUT fails to use write same (but they do correctly zero
the device). If the base device somehow internally toggles this to 0
itself during a write same then dm devices seem to set
queue/write_same_max_bytes to 0.
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
#!/bin/bash
# Check BLKZEROOUT behaviour
modprobe scsi_debug write_same_length=131072
dev=$(lsblk -n -p -S | awk '/scsi_debug/ { print $1 }')
dev_kname=$(lsblk -n -S ${dev} | awk '{ print $1 }')
sys_dev_mwsb_path=$(realpath
/sys/block/"${dev_kname}"/device/scsi_disk/*/max_write_same_blocks)
pvcreate ${dev}
vgcreate vg ${dev}
lvcreate -L4M -n lv vg
lv_dev=/dev/mapper/vg-lv
sys_lv_wsmb_path="/sys/block/$(lsblk -n -o KNAME
"${lv_dev}")/queue/write_same_max_bytes"
orig_max_ws=$(cat "${sys_dev_mwsb_path}")
echo "Stacked device's write_same_max_bytes prior blkzeroout: $(cat
${sys_lv_wsmb_path})"
echo "Forcing base device's max_write_same_blocks to 0"
echo 0 > "${sys_dev_mwsb_path}"
size=$(blockdev --getsize64 "${lv_dev}")
tr '\0' '\377' < /dev/zero | dd oflag=direct bs=64K of="${lv_dev}"
count=$((${size} / (64 * 1024)))
status=none
blkdiscard -v --zero "${lv_dev}"
dd if="${lv_dev}" iflag=direct bs=64K status=none | cmp -b --bytes
"${size}" /dev/zero -
verification=$?
echo "Stacked device's write_same_max_bytes after blkzeroout: $(cat
${sys_lv_wsmb_path})"
lvchange -a n "${lv_dev}"
--
Sitsofe | http://sucs.org/~sits/
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zeroout fixes
2016-06-24 7:00 zeroout fixes Christoph Hellwig
2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
@ 2016-07-08 9:36 ` Christoph Hellwig
2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-07-08 9:36 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
On Fri, Jun 24, 2016 at 09:00:49AM +0200, Christoph Hellwig wrote:
> Note that the patches are against the block for-next tree, and
> unfortunately we have diverged significantly from 4.7 in this area:
> 4.7 has the missing bio_put for the sumbmit_bio_wait callers, and
> 4.8 has the various req_op changes, including a new calling
> convention for __blkdev_issue_zeroout. I'm not sure how to best
> sort this mess out, but I really think we need the bio_put in the
> for-4.8 tree as well, and we should probably bring the changed
> __blkdev_issue_discard calling convention into 4.7 as well.
> Suggestion for how to best handle this?
Jens, any idea how to make forward progress on this issue? Just
apply to 4.8 and then backport to stable?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
2016-07-19 9:23 resend: " Christoph Hellwig
@ 2016-07-19 9:23 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-07-19 9:23 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
WRITE SAME is a data integrity operation and we can't simply ignore
errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 45b35b1..e371f83 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -178,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if (bio)
ret = submit_bio_wait(bio);
- return ret != -EOPNOTSUPP ? ret : 0;
+ return ret;
}
EXPORT_SYMBOL(blkdev_issue_write_same);
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-07-19 9:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24 7:00 zeroout fixes Christoph Hellwig
2016-06-24 7:00 ` [PATCH 1/2] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-29 5:03 ` Martin K. Petersen
2016-06-24 7:00 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
2016-06-29 5:04 ` Martin K. Petersen
2016-07-03 19:52 ` Sitsofe Wheeler
2016-07-08 9:36 ` zeroout fixes Christoph Hellwig
2016-07-19 9:23 resend: " Christoph Hellwig
2016-07-19 9:23 ` [PATCH 2/2] block: don't ignore -EOPNOTSUPP blkdev_issue_write_same Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.