linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] block layer change necessary for bcache zoned device support
@ 2020-05-15 16:31 Coly Li
  2020-05-15 16:31 ` [RFC PATCH 1/4] block: change REQ_OP_ZONE_RESET from 6 to 13 Coly Li
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Coly Li @ 2020-05-15 16:31 UTC (permalink / raw)
  To: linux-block, damien.lemoal, hare, hch, axboe; +Cc: linux-bcache, Coly Li

Hi folks,

Recently I am working on supprt bcache to be created on zoned devices
e.g. host managed SMR hard drives, then frequent random READ I/Os on
these SMR drives can be accelerated.

To make the bcache code work correctly, there are some small but maybe
important changes of block layer code are necessary. Here I post these
patches for your review and comments.

Thank you all in advance.

Coly Li
---

Coly Li (4):
  block: change REQ_OP_ZONE_RESET from 6 to 13
  block: block: change REQ_OP_ZONE_RESET from 8 to 15
  block: remove queue_is_mq restriction from blk_revalidate_disk_zones()
  block: set bi_size to REQ_OP_ZONE_RESET bio

 block/blk-zoned.c         | 6 ++++--
 include/linux/blk_types.h | 8 ++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 1/4] block: change REQ_OP_ZONE_RESET from 6 to 13
  2020-05-15 16:31 [RFC PATCH 0/4] block layer change necessary for bcache zoned device support Coly Li
@ 2020-05-15 16:31 ` Coly Li
  2020-05-15 16:31 ` [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15 Coly Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Coly Li @ 2020-05-15 16:31 UTC (permalink / raw)
  To: linux-block, damien.lemoal, hare, hch, axboe
  Cc: linux-bcache, Coly Li, Hannes Reinecke, Jens Axboe,
	Johannes Thumshirn, Shaun Tancheff

For a zoned device, e.g. host managed SMR hard drive, REQ_OP_ZONE_RESET
is to reset the LBA of a zone's write pointer back to the start LBA of
this zone. After the write point is reset, all previously stored data
in this zone is invalid and unaccessible anymore. Therefore, this op
code changes on disk data, belongs to a WRITE request op code.

Current REQ_OP_ZONE_RESET is defined as number 6, but the convention of
the op code is, READ requests are even numbers, and WRITE requests are
odd numbers. See how op_is_write defined,

397 static inline bool op_is_write(unsigned int op)
398 {
399         return (op & 1);
400 }

When create bcache device on top of the zoned SMR drive, bcache driver
has to handle the REQ_OP_ZONE_RESET bio by invalidate all cached data
belongs to the LBA range of the restting zone. A wrong op code value
will make the this zone management bio goes into bcache' read requests
code path and causes undefined result.

This patch changes REQ_OP_ZONE_RESET value from 6 to 13, the new odd
number will make bcache driver handle this zone management bio properly
in write requests code path.

Fixes: 87374179c535 ("block: add a proper block layer data direction encoding")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Shaun Tancheff <shaun.tancheff@seagate.com>
---
 include/linux/blk_types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 31eb92876be7..8f7bc15a6de3 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -282,8 +282,6 @@ enum req_opf {
 	REQ_OP_DISCARD		= 3,
 	/* securely erase sectors */
 	REQ_OP_SECURE_ERASE	= 5,
-	/* reset a zone write pointer */
-	REQ_OP_ZONE_RESET	= 6,
 	/* write the same sector many times */
 	REQ_OP_WRITE_SAME	= 7,
 	/* reset all the zone present on the device */
@@ -296,6 +294,8 @@ enum req_opf {
 	REQ_OP_ZONE_CLOSE	= 11,
 	/* Transition a zone to full */
 	REQ_OP_ZONE_FINISH	= 12,
+	/* reset a zone write pointer */
+	REQ_OP_ZONE_RESET	= 13,
 
 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15
  2020-05-15 16:31 [RFC PATCH 0/4] block layer change necessary for bcache zoned device support Coly Li
  2020-05-15 16:31 ` [RFC PATCH 1/4] block: change REQ_OP_ZONE_RESET from 6 to 13 Coly Li
@ 2020-05-15 16:31 ` Coly Li
  2020-05-15 17:22   ` Keith Busch
  2020-05-15 16:31 ` [RFC PATCH 3/4] block: remove queue_is_mq restriction from blk_revalidate_disk_zones() Coly Li
  2020-05-15 16:31 ` [RFC PATCH 4/4] block: set bi_size to REQ_OP_ZONE_RESET bio Coly Li
  3 siblings, 1 reply; 6+ messages in thread
From: Coly Li @ 2020-05-15 16:31 UTC (permalink / raw)
  To: linux-block, damien.lemoal, hare, hch, axboe
  Cc: linux-bcache, Coly Li, Chaitanya Kulkarni, Johannes Thumshirn

For a zoned device, e.g. host managed SMR hard drive, the request op
REQ_OP_ZONE_RESET_ALL is to reset LBA of all zones' writer pointers to
the start LBA of the zone they belong to. After all the write pointers
are reset, all previously stored data is invalid and unaccessible.
Therefore this op code changes on-disk data, belongs to WRITE request
op code.

Similar to the problem of REQ_OP_ZONE_RESET, REQ_OP_ZONE_RESET is an
even number 8, it means bios with REQ_OP_ZONE_RESET_ALL in bio->bi_opf
will be treated as READ request by op_is_write().

Same problem will happen when bcache device is created on top of zoned
device like host managed SMR hard drive, bcache driver should invalid
all cached data for the REQ_OP_ZONE_RESET_ALL request, but this zone
management bio is mistakenly treated as READ request and go into bcache
read code path, which will cause undefined behavior.

This patch changes REQ_OP_ZONE_RESET_ALL value from 8 to 15, this new
odd number will make bcache driver handle this zone management bio in
write request code path properly.

Fixes: e84e8f066395 ("block: add req op to reset all zones and flag")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 include/linux/blk_types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8f7bc15a6de3..618032fa1b29 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -284,8 +284,6 @@ enum req_opf {
 	REQ_OP_SECURE_ERASE	= 5,
 	/* write the same sector many times */
 	REQ_OP_WRITE_SAME	= 7,
-	/* reset all the zone present on the device */
-	REQ_OP_ZONE_RESET_ALL	= 8,
 	/* write the zero filled sector many times */
 	REQ_OP_WRITE_ZEROES	= 9,
 	/* Open a zone */
@@ -296,6 +294,8 @@ enum req_opf {
 	REQ_OP_ZONE_FINISH	= 12,
 	/* reset a zone write pointer */
 	REQ_OP_ZONE_RESET	= 13,
+	/* reset all the zone present on the device */
+	REQ_OP_ZONE_RESET_ALL	= 15,
 
 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 3/4] block: remove queue_is_mq restriction from blk_revalidate_disk_zones()
  2020-05-15 16:31 [RFC PATCH 0/4] block layer change necessary for bcache zoned device support Coly Li
  2020-05-15 16:31 ` [RFC PATCH 1/4] block: change REQ_OP_ZONE_RESET from 6 to 13 Coly Li
  2020-05-15 16:31 ` [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15 Coly Li
@ 2020-05-15 16:31 ` Coly Li
  2020-05-15 16:31 ` [RFC PATCH 4/4] block: set bi_size to REQ_OP_ZONE_RESET bio Coly Li
  3 siblings, 0 replies; 6+ messages in thread
From: Coly Li @ 2020-05-15 16:31 UTC (permalink / raw)
  To: linux-block, damien.lemoal, hare, hch, axboe; +Cc: linux-bcache, Coly Li

The bcache driver is bio based and NOT request based multiqueued driver,
if a zoned SMR hard drive is used as backing device of a bcache device,
calling blk_revalidate_disk_zones() for the bcache device will fail due
to the following check in blk_revalidate_disk_zones(),
478       if (WARN_ON_ONCE(!queue_is_mq(q)))
479             return -EIO;

Now bcache is able to export the zoned information from the underlying
zoned SMR drives and format zonefs on top of a bcache device, the
resitriction that a zoned device should be multiqueued is unnecessary
for now.

Although in commit ae58954d8734c ("block: don't handle bio based drivers
in blk_revalidate_disk_zones") it is said that bio based drivers should
not call blk_revalidate_disk_zones() and just manually update their own
q->nr_zones, but this is inaccurate. The bio based drivers also need to
set their zone size and initialize bitmaps for cnv and seq zones, it is
necessary to call blk_revalidate_disk_zones() for bio based drivers.

This patch removes the above queue_is_mq() restriction to permit
bcache driver calls blk_revalidate_disk_zones() for bcache device zoned
information initialization.

Fixes: ae58954d8734c ("block: don't handle bio based drivers in blk_revalidate_disk_zones")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-zoned.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index f87956e0dcaf..1e0708c68267 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -475,8 +475,6 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 
 	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
 		return -EIO;
-	if (WARN_ON_ONCE(!queue_is_mq(q)))
-		return -EIO;
 
 	/*
 	 * Ensure that all memory allocations in this context are done as if
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 4/4] block: set bi_size to REQ_OP_ZONE_RESET bio
  2020-05-15 16:31 [RFC PATCH 0/4] block layer change necessary for bcache zoned device support Coly Li
                   ` (2 preceding siblings ...)
  2020-05-15 16:31 ` [RFC PATCH 3/4] block: remove queue_is_mq restriction from blk_revalidate_disk_zones() Coly Li
@ 2020-05-15 16:31 ` Coly Li
  3 siblings, 0 replies; 6+ messages in thread
From: Coly Li @ 2020-05-15 16:31 UTC (permalink / raw)
  To: linux-block, damien.lemoal, hare, hch, axboe
  Cc: linux-bcache, Coly Li, Ajay Joshi, Chaitanya Kulkarni,
	Hannes Reinecke, Johannes Thumshirn

Now for zoned device, zone management ioctl commands are converted into
zone management bios and handled by blkdev_zone_mgmt(). There are 4 zone
management bios are handled, their op code is,
- REQ_OP_ZONE_RESET
  Reset the zone's writer pointer and empty all previously stored data.
- REQ_OP_ZONE_OPEN
  Open the zones in the specified sector range, no influence on data.
- REQ_OP_ZONE_CLOSE
  Close the zones in the specified sector range, no influence on data.
- REQ_OP_ZONE_FINISH
  Mark the zone as full, no influence on data.
All the zone management bios has 0 byte size, a.k.a their bi_size is 0.

Exept for REQ_OP_ZONE_RESET request, zero length bio works fine for
other zone management bio, before the zoned device e.g. host managed SMR
hard drive can be created as a bcache device.

When a bcache device (virtual block device to forward bios like md raid
drivers) can be created on top of the zoned device, and a fast SSD is
attached as a cache device, bcache driver may cache the frequent random
READ requests on fast SSD to accelerate hot data READ performance.

When bcache driver receives a zone management bio for REQ_OP_ZONE_RESET
op, while forwarding the request to underlying zoned device e.g. host
managed SMR hard drive, it should also invalidate all cached data from
SSD for the resetting zone. Otherwise bcache will continue provide the
outdated cached data to READ request and cause potential data storage
inconsistency and corruption.

In order to invalidate outdated data from SSD for the reset zone, bcache
needs to know not only the start LBA but also the range length of the
resetting zone. Otherwise, bcache won't be able to accurately invalidate
the outdated cached data.

Is it possible to simply set the bi_size inside bcache driver? The
answer is NO. Although every REQ_OP_ZONE_RESET bio has exact length as
zone size or q->limits.chunk_sectors, it is possible that some other
layer stacking block driver (in the future) exists between bcache driver
and blkdev_zone_mgmt() where the zone management bio is made.

The best location to set bi_size is where the zone management bio is
composed in blkdev_zone_mgmt(), then no matter how this bio is split
before bcache driver receives it, bcache driver can always correctly
invalidate the resetting range.

This patch sets the bi_size of REQ_OP_ZONE_RESET bio for each resetting
zone. Here REQ_OP_ZONE_RESET is special whose bi_size should be set as
capacity of whole drive size, then bcache can invalidate all cached data
from SSD for the zoned backing device.

With this change, now bcache code can handle REQ_OP_ZONE_RESET bio in
the way very similar to REQ_OP_DISCARD bio with very little change.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Ajay Joshi <ajay.joshi@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 block/blk-zoned.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1e0708c68267..01d91314399b 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -227,11 +227,15 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 		if (op == REQ_OP_ZONE_RESET &&
 		    blkdev_allow_reset_all_zones(bdev, sector, nr_sectors)) {
 			bio->bi_opf = REQ_OP_ZONE_RESET_ALL;
+			bio->bi_iter.bi_sector = sector;
+			bio->bi_iter.bi_size = nr_sectors;
 			break;
 		}
 
 		bio->bi_opf = op | REQ_SYNC;
 		bio->bi_iter.bi_sector = sector;
+		if (op == REQ_OP_ZONE_RESET)
+			bio->bi_iter.bi_size = zone_sectors;
 		sector += zone_sectors;
 
 		/* This may take a while, so be nice to others */
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15
  2020-05-15 16:31 ` [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15 Coly Li
@ 2020-05-15 17:22   ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2020-05-15 17:22 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-block, damien.lemoal, hare, hch, axboe, linux-bcache,
	Chaitanya Kulkarni, Johannes Thumshirn

The subject title for this patch should mention "REQ_OP_ZONE_RESET_ALL"
instead of "REQ_OP_ZONE_RESET".

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-15 17:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15 16:31 [RFC PATCH 0/4] block layer change necessary for bcache zoned device support Coly Li
2020-05-15 16:31 ` [RFC PATCH 1/4] block: change REQ_OP_ZONE_RESET from 6 to 13 Coly Li
2020-05-15 16:31 ` [RFC PATCH 2/4] block: block: change REQ_OP_ZONE_RESET from 8 to 15 Coly Li
2020-05-15 17:22   ` Keith Busch
2020-05-15 16:31 ` [RFC PATCH 3/4] block: remove queue_is_mq restriction from blk_revalidate_disk_zones() Coly Li
2020-05-15 16:31 ` [RFC PATCH 4/4] block: set bi_size to REQ_OP_ZONE_RESET bio Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).