All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/11] Zoned block device support improvements
@ 2018-10-12  2:30 Damien Le Moal
  2018-10-12  2:30 ` [PATCH v3 01/11] scsi: sd_zbc: Rearrange code Damien Le Moal
                   ` (10 more replies)
  0 siblings, 11 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

This series improves zoned block device support (reduce overhead) and
introduces many simplifications to the code (overall, there are more deletions
than insertions).

In more details:
* Patches 1 to 3 are SCSI side (sd driver) cleanups and improvements reducing
  the overhead of report zones command execution during disk scan and
  revalidation.
* Patches 4 to 9 improve the useability and user API of zoned block devices.
* Patch 10 is the main part of this series. This patch replaces the
  REQ_OP_ZONE_REPORT BIO/request operation for executing report zones commands
  with a block device file operation, removing the need for the command reply
  payload in-place rewriting in the BIO buffer. This leads to major
  simplification of the code in many places.
* Patch 11 further simplifies the code of low level drivers by providing a
  generic implementation of zoned block device request queue zone bitmaps
  initialization and revalidation.

Please consider the addition of these patches in 4.20.
Comments are as always welcome.

Changes from v2:
* Reworked patch 9 to preserve the declaration of struct request_queue nr_zones
  field being conditional on CONFIG_BLK_DEV_ZONED

Changes from v1:
* Addressed Christoph's and Bart's comments
* Fixed several compilation errors with zoned block device support disabled
* Rebased on latest rc including the most recent dm patches

Christoph Hellwig (1):
  block: add a report_zones method

Damien Le Moal (10):
  scsi: sd_zbc: Rearrange code
  scsi: sd_zbc: Reduce boot device scan and revalidate time
  scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
  block: Introduce blkdev_nr_zones() helper
  block: Limit allocation of zone descriptors for report zones
  block: Introduce BLKGETZONESZ ioctl
  block: Introduce BLKGETNRZONES ioctl
  block: Improve zone reset execution
  block: Expose queue nr_zones in sysfs
  block: Introduce blk_revalidate_disk_zones()

 block/blk-core.c               |   1 -
 block/blk-lib.c                |  13 +-
 block/blk-mq-debugfs.c         |   1 -
 block/blk-sysfs.c              |  13 +
 block/blk-zoned.c              | 359 +++++++++++++++---------
 block/blk.h                    |   8 +
 block/ioctl.c                  |   4 +
 drivers/block/null_blk.h       |  11 +-
 drivers/block/null_blk_main.c  |  30 +-
 drivers/block/null_blk_zoned.c |  57 +---
 drivers/md/dm-flakey.c         |  30 +-
 drivers/md/dm-linear.c         |  35 ++-
 drivers/md/dm-table.c          |  10 +
 drivers/md/dm-zoned-target.c   |   3 +-
 drivers/md/dm.c                | 169 ++++++-----
 drivers/scsi/sd.c              |  15 +-
 drivers/scsi/sd.h              |  15 +-
 drivers/scsi/sd_zbc.c          | 497 +++++++++------------------------
 include/linux/blk_types.h      |   2 -
 include/linux/blkdev.h         |  22 +-
 include/linux/device-mapper.h  |  12 +-
 include/trace/events/f2fs.h    |   1 -
 include/uapi/linux/blkzoned.h  |   3 +
 23 files changed, 591 insertions(+), 720 deletions(-)

-- 
2.17.1


















Christoph Hellwig (1):
  block: add a report_zones method

Damien Le Moal (10):
  scsi: sd_zbc: Rearrange code
  scsi: sd_zbc: Reduce boot device scan and revalidate time
  scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
  block: Introduce blkdev_nr_zones() helper
  block: Limit allocation of zone descriptors for report zones
  block: Introduce BLKGETZONESZ ioctl
  block: Introduce BLKGETNRZONES ioctl
  block: Improve zone reset execution
  block: Expose queue nr_zones in sysfs
  block: Introduce blk_revalidate_disk_zones()

 block/blk-core.c               |   1 -
 block/blk-lib.c                |  13 +-
 block/blk-mq-debugfs.c         |   1 -
 block/blk-sysfs.c              |  13 +
 block/blk-zoned.c              | 359 +++++++++++++++---------
 block/blk.h                    |   8 +
 block/ioctl.c                  |   4 +
 drivers/block/null_blk.h       |  11 +-
 drivers/block/null_blk_main.c  |  30 +-
 drivers/block/null_blk_zoned.c |  57 +---
 drivers/md/dm-flakey.c         |  30 +-
 drivers/md/dm-linear.c         |  35 ++-
 drivers/md/dm-table.c          |  10 +
 drivers/md/dm-zoned-target.c   |   3 +-
 drivers/md/dm.c                | 169 ++++++-----
 drivers/scsi/sd.c              |  15 +-
 drivers/scsi/sd.h              |  15 +-
 drivers/scsi/sd_zbc.c          | 497 +++++++++------------------------
 include/linux/blk_types.h      |   2 -
 include/linux/blkdev.h         |  30 +-
 include/linux/device-mapper.h  |  12 +-
 include/trace/events/f2fs.h    |   1 -
 include/uapi/linux/blkzoned.h  |   3 +
 23 files changed, 600 insertions(+), 719 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 01/11] scsi: sd_zbc: Rearrange code
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:29     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time Damien Le Moal
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Move the urswrz check out of sd_zbc_read_zones() and into
sd_zbc_read_zoned_characteristics() where that value is obtained (read
from the disk zoned characteristics VPD page). Since this function now
does more than simply reading the VPD page, rename it to
sd_zbc_check_zoned_characteristics().
Also fix the error message displayed when reading that VPD page fails.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 412c1787dcd9..0b7d8787f785 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -305,19 +305,19 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 }
 
 /**
- * sd_zbc_read_zoned_characteristics - Read zoned block device characteristics
+ * sd_zbc_check_zoned_characteristics - Check zoned block device characteristics
  * @sdkp: Target disk
  * @buf: Buffer where to store the VPD page data
  *
- * Read VPD page B6.
+ * Read VPD page B6, get information and check that reads are unconstrained.
  */
-static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
-					     unsigned char *buf)
+static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
+					      unsigned char *buf)
 {
 
 	if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64)) {
 		sd_printk(KERN_NOTICE, sdkp,
-			  "Unconstrained-read check failed\n");
+			  "Read zoned characteristics VPD page failed\n");
 		return -ENODEV;
 	}
 
@@ -335,6 +335,18 @@ static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
 	}
 
+	/*
+	 * Check for unconstrained reads: host-managed devices with
+	 * constrained reads (drives failing read after write pointer)
+	 * are not supported.
+	 */
+	if (!sdkp->urswrz) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+			  "constrained reads devices are not supported\n");
+		return -ENODEV;
+	}
+
 	return 0;
 }
 
@@ -675,24 +687,11 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 		 */
 		return 0;
 
-	/* Get zoned block device characteristics */
-	ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
+	/* Check zoned block device characteristics (unconstrained reads) */
+	ret = sd_zbc_check_zoned_characteristics(sdkp, buf);
 	if (ret)
 		goto err;
 
-	/*
-	 * Check for unconstrained reads: host-managed devices with
-	 * constrained reads (drives failing read after write pointer)
-	 * are not supported.
-	 */
-	if (!sdkp->urswrz) {
-		if (sdkp->first_scan)
-			sd_printk(KERN_NOTICE, sdkp,
-			  "constrained reads devices are not supported\n");
-		ret = -ENODEV;
-		goto err;
-	}
-
 	/* Check capacity */
 	ret = sd_zbc_check_capacity(sdkp, buf);
 	if (ret)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
  2018-10-12  2:30 ` [PATCH v3 01/11] scsi: sd_zbc: Rearrange code Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:33     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks Damien Le Moal
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Handling checks of ZBC device capacity using the max_lba field of the
REPORT ZONES command reply for disks with rc_basis == 0 can be done
using the same report zones command reply used to check the "same"
field.

Avoid executing a report zones command solely to check the disk capacity
by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and
renaming that function to sd_zbc_check_zones(). This removes a costly
execution of a full report zones command and so reduces device scan
duration at boot time as well as the duration of disk revalidate calls.

Furthermore, setting the partial report bit in the REPORT ZONES command
cdb can significantly reduce this command execution time as the device
does not have to count and report the total number of zones that could
be reported assuming a large enough reply buffer. A non-partial zone
report is necessary only for the first execution of report zones used to
check the same field value (to ensure that this value applies to all
zones of the disk). All other calls to sd_zbc_report_zones() can use a
partial report to reduce execution time.

Using a 14 TB ZBC disk, these simple changes reduce device scan time at
boot from about 3.5s down to about 900ms. Disk revalidate times are also
reduced from about 450ms down to 230ms.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 0b7d8787f785..ca73c46931c0 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -67,11 +67,17 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
  * @buf: Buffer to use for the reply
  * @buflen: the buffer size
  * @lba: Start LBA of the report
+ * @partial: Do partial report
  *
  * For internal use during device validation.
+ * Using partial=true can significantly speed up execution of a report zones
+ * command because the disk does not have to count all possible report matching
+ * zones and will only report the count of zones fitting in the command reply
+ * buffer.
  */
 static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
-			       unsigned int buflen, sector_t lba)
+			       unsigned int buflen, sector_t lba,
+			       bool partial)
 {
 	struct scsi_device *sdp = sdkp->device;
 	const int timeout = sdp->request_queue->rq_timeout;
@@ -85,6 +91,8 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 	cmd[1] = ZI_REPORT_ZONES;
 	put_unaligned_be64(lba, &cmd[2]);
 	put_unaligned_be32(buflen, &cmd[10]);
+	if (partial)
+		cmd[14] = ZBC_REPORT_ZONE_PARTIAL;
 	memset(buf, 0, buflen);
 
 	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
@@ -350,60 +358,25 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
 	return 0;
 }
 
-/**
- * sd_zbc_check_capacity - Check reported capacity.
- * @sdkp: Target disk
- * @buf: Buffer to use for commands
- *
- * ZBC drive may report only the capacity of the first conventional zones at
- * LBA 0. This is indicated by the RC_BASIS field of the read capacity reply.
- * Check this here. If the disk reported only its conventional zones capacity,
- * get the total capacity by doing a report zones.
- */
-static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf)
-{
-	sector_t lba;
-	int ret;
-
-	if (sdkp->rc_basis != 0)
-		return 0;
-
-	/* Do a report zone to get the maximum LBA to check capacity */
-	ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
-	if (ret)
-		return ret;
-
-	/* The max_lba field is the capacity of this device */
-	lba = get_unaligned_be64(&buf[8]);
-	if (lba + 1 == sdkp->capacity)
-		return 0;
-
-	if (sdkp->first_scan)
-		sd_printk(KERN_WARNING, sdkp,
-			  "Changing capacity from %llu to max LBA+1 %llu\n",
-			  (unsigned long long)sdkp->capacity,
-			  (unsigned long long)lba + 1);
-	sdkp->capacity = lba + 1;
-
-	return 0;
-}
-
 #define SD_ZBC_BUF_SIZE 131072U
 
 /**
- * sd_zbc_check_zone_size - Check the device zone sizes
+ * sd_zbc_check_zones - Check the device capacity and zone sizes
  * @sdkp: Target disk
  *
- * Check that all zones of the device are equal. The last zone can however
- * be smaller. The zone size must also be a power of two number of LBAs.
+ * Check that the device capacity as reported by READ CAPACITY matches the
+ * max_lba value (plus one)of the report zones command reply. Also check that
+ * all zones of the device have an equal size, only allowing the last zone of
+ * the disk to have a smaller size (runt zone). The zone size must also be a
+ * power of two.
  *
  * Returns the zone size in number of blocks upon success or an error code
  * upon failure.
  */
-static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
 {
 	u64 zone_blocks = 0;
-	sector_t block = 0;
+	sector_t max_lba, block = 0;
 	unsigned char *buf;
 	unsigned char *rec;
 	unsigned int buf_len;
@@ -416,11 +389,28 @@ static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	if (!buf)
 		return -ENOMEM;
 
-	/* Do a report zone to get the same field */
-	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
+	/* Do a report zone to get max_lba and the same field */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
 	if (ret)
 		goto out_free;
 
+	if (sdkp->rc_basis == 0) {
+		/* The max_lba field is the capacity of this device */
+		max_lba = get_unaligned_be64(&buf[8]);
+		if (sdkp->capacity != max_lba + 1) {
+			if (sdkp->first_scan)
+				sd_printk(KERN_WARNING, sdkp,
+					"Changing capacity from %llu to max LBA+1 %llu\n",
+					(unsigned long long)sdkp->capacity,
+					(unsigned long long)max_lba + 1);
+			sdkp->capacity = max_lba + 1;
+		}
+	}
+
+	/*
+	 * Check same field: for any value other than 0, we know that all zones
+	 * have the same size.
+	 */
 	same = buf[4] & 0x0f;
 	if (same > 0) {
 		rec = &buf[64];
@@ -458,7 +448,7 @@ static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 
 		if (block < sdkp->capacity) {
 			ret = sd_zbc_report_zones(sdkp, buf,
-						  SD_ZBC_BUF_SIZE, block);
+						  SD_ZBC_BUF_SIZE, block, true);
 			if (ret)
 				goto out_free;
 		}
@@ -574,7 +564,8 @@ sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift,
 		goto out;
 
 	while (lba < sdkp->capacity) {
-		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, lba);
+		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
+					  lba, true);
 		if (ret)
 			goto out;
 		lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
@@ -692,16 +683,11 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	if (ret)
 		goto err;
 
-	/* Check capacity */
-	ret = sd_zbc_check_capacity(sdkp, buf);
-	if (ret)
-		goto err;
-
 	/*
 	 * Check zone size: only devices with a constant zone size (except
 	 * an eventual last runt zone) that is a power of 2 are supported.
 	 */
-	zone_blocks = sd_zbc_check_zone_size(sdkp);
+	zone_blocks = sd_zbc_check_zones(sdkp);
 	ret = -EFBIG;
 	if (zone_blocks != (u32)zone_blocks)
 		goto err;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
  2018-10-12  2:30 ` [PATCH v3 01/11] scsi: sd_zbc: Rearrange code Damien Le Moal
  2018-10-12  2:30 ` [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:35     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper Damien Le Moal
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

The 32 bits overflow check for the zone size value is already done
within sd_zbc_check_zones() with the test:

} else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {

so there is no need to check again for an out of range value in
sd_zbc_read_zones(). Simplify the code and fix sd_zbc_check_zones()
error return to -EFBIG instead of -ENODEV if the zone size is too large.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ca73c46931c0..44b64b4a922a 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -373,7 +373,7 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
  * Returns the zone size in number of blocks upon success or an error code
  * upon failure.
  */
-static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
+static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
 {
 	u64 zone_blocks = 0;
 	sector_t max_lba, block = 0;
@@ -472,7 +472,7 @@ static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
 		if (sdkp->first_scan)
 			sd_printk(KERN_NOTICE, sdkp,
 				  "Zone size too large\n");
-		ret = -ENODEV;
+		ret = -EFBIG;
 	} else {
 		ret = zone_blocks;
 	}
@@ -668,8 +668,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
 
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
-	int64_t zone_blocks;
-	int ret;
+	int ret, zone_blocks;
 
 	if (!sd_is_zoned(sdkp))
 		/*
@@ -688,12 +687,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	 * an eventual last runt zone) that is a power of 2 are supported.
 	 */
 	zone_blocks = sd_zbc_check_zones(sdkp);
-	ret = -EFBIG;
-	if (zone_blocks != (u32)zone_blocks)
-		goto err;
-	ret = zone_blocks;
-	if (ret < 0)
+	if (zone_blocks < 0) {
+		ret = zone_blocks;
 		goto err;
+	}
 
 	/* The drive satisfies the kernel restrictions: set it up */
 	ret = sd_zbc_setup(sdkp, zone_blocks);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (2 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:36     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones Damien Le Moal
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Introduce the blkdev_nr_zones() helper function to get the total
number of zones of a zoned block device. This number is always 0 for a
regular block device (q->limits.zoned == BLK_ZONED_NONE case).

Replace hard-coded number of zones calculation in dmz_get_zoned_device()
with a call to this helper.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-zoned.c            | 27 +++++++++++++++++++++++++++
 drivers/md/dm-zoned-target.c |  3 +--
 include/linux/blkdev.h       |  5 +++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index c461cf63f1f4..32e377f755d8 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -63,6 +63,33 @@ void __blk_req_zone_write_unlock(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(__blk_req_zone_write_unlock);
 
+static inline unsigned int __blkdev_nr_zones(struct request_queue *q,
+					     sector_t nr_sectors)
+{
+	unsigned long zone_sectors = blk_queue_zone_sectors(q);
+
+	return (nr_sectors + zone_sectors - 1) >> ilog2(zone_sectors);
+}
+
+/**
+ * blkdev_nr_zones - Get number of zones
+ * @bdev:	Target block device
+ *
+ * Description:
+ *    Return the total number of zones of a zoned block device.
+ *    For a regular block device, the number of zones is always 0.
+ */
+unsigned int blkdev_nr_zones(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (!blk_queue_is_zoned(q))
+		return 0;
+
+	return __blkdev_nr_zones(q, bdev->bd_part->nr_sects);
+}
+EXPORT_SYMBOL_GPL(blkdev_nr_zones);
+
 /*
  * Check that a zone report belongs to the partition.
  * If yes, fix its start sector and write pointer, copy it in the
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index a44183ff4be0..12d96a263623 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -702,8 +702,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path)
 	dev->zone_nr_blocks = dmz_sect2blk(dev->zone_nr_sectors);
 	dev->zone_nr_blocks_shift = ilog2(dev->zone_nr_blocks);
 
-	dev->nr_zones = (dev->capacity + dev->zone_nr_sectors - 1)
-		>> dev->zone_nr_sectors_shift;
+	dev->nr_zones = blkdev_nr_zones(dev->bdev);
 
 	dmz->dev = dev;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6980014357d4..c24969b1741b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -401,6 +401,7 @@ struct blk_zone_report_hdr {
 	u8		padding[60];
 };
 
+extern unsigned int blkdev_nr_zones(struct block_device *bdev);
 extern int blkdev_report_zones(struct block_device *bdev,
 			       sector_t sector, struct blk_zone *zones,
 			       unsigned int *nr_zones, gfp_t gfp_mask);
@@ -414,6 +415,10 @@ extern int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
 
 #else /* CONFIG_BLK_DEV_ZONED */
 
+static inline unsigned int blkdev_nr_zones(struct block_device *bdev)
+{
+	return 0;
+}
 static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
 					    fmode_t mode, unsigned int cmd,
 					    unsigned long arg)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (3 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:36     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl Damien Le Moal
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

There is no point in allocating more zone descriptors than the number of
zones a block device has for doing a zone report. Avoid doing that in
blkdev_report_zones_ioctl() by limiting the number of zone decriptors
allocated internally to process the user request.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-zoned.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 32e377f755d8..bb4ed69f917f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -355,8 +355,7 @@ int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 	if (!rep.nr_zones)
 		return -EINVAL;
 
-	if (rep.nr_zones > INT_MAX / sizeof(struct blk_zone))
-		return -ERANGE;
+	rep.nr_zones = min(blkdev_nr_zones(bdev), rep.nr_zones);
 
 	zones = kvmalloc_array(rep.nr_zones, sizeof(struct blk_zone),
 			       GFP_KERNEL | __GFP_ZERO);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (4 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:37     ` Hannes Reinecke
  2018-10-12  8:23     ` Christoph Hellwig
  2018-10-12  2:30 ` [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl Damien Le Moal
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Get a zoned block device zone size in number of 512 B sectors.
The zone size is always 0 for regular block devices.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/ioctl.c                 | 2 ++
 include/uapi/linux/blkzoned.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 3884d810efd2..f6d2c6f1f050 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -532,6 +532,8 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
 	case BLKRESETZONE:
 		return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
+	case BLKGETZONESZ:
+		return put_uint(arg, bdev_zone_sectors(bdev));
 	case HDIO_GETGEO:
 		return blkdev_getgeo(bdev, argp);
 	case BLKRAGET:
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
index ff5a5db8906a..281ac605f752 100644
--- a/include/uapi/linux/blkzoned.h
+++ b/include/uapi/linux/blkzoned.h
@@ -137,8 +137,10 @@ struct blk_zone_range {
  *                 sector specified in the report request structure.
  * @BLKRESETZONE: Reset the write pointer of the zones in the specified
  *                sector range. The sector range must be zone aligned.
+ * @BLKGETZONESZ: Get the device zone size in number of 512 B sectors.
  */
 #define BLKREPORTZONE	_IOWR(0x12, 130, struct blk_zone_report)
 #define BLKRESETZONE	_IOW(0x12, 131, struct blk_zone_range)
+#define BLKGETZONESZ	_IOW(0x12, 132, __u32)
 
 #endif /* _UAPI_BLKZONED_H */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (5 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:37     ` Hannes Reinecke
  2018-10-12  8:23     ` Christoph Hellwig
  2018-10-12  2:30 ` [PATCH v3 08/11] block: Improve zone reset execution Damien Le Moal
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Get a zoned block device total number of zones. The device can be a
partition of the whole device. The number of zones is always 0 for
regular block devices.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/ioctl.c                 | 2 ++
 include/uapi/linux/blkzoned.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index f6d2c6f1f050..4825c78a6baa 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -534,6 +534,8 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
 	case BLKGETZONESZ:
 		return put_uint(arg, bdev_zone_sectors(bdev));
+	case BLKGETNRZONES:
+		return put_uint(arg, blkdev_nr_zones(bdev));
 	case HDIO_GETGEO:
 		return blkdev_getgeo(bdev, argp);
 	case BLKRAGET:
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
index 281ac605f752..8f08ff9bdea0 100644
--- a/include/uapi/linux/blkzoned.h
+++ b/include/uapi/linux/blkzoned.h
@@ -142,5 +142,6 @@ struct blk_zone_range {
 #define BLKREPORTZONE	_IOWR(0x12, 130, struct blk_zone_report)
 #define BLKRESETZONE	_IOW(0x12, 131, struct blk_zone_range)
 #define BLKGETZONESZ	_IOW(0x12, 132, __u32)
+#define BLKGETNRZONES	_IOW(0x12, 133, __u32)
 
 #endif /* _UAPI_BLKZONED_H */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 08/11] block: Improve zone reset execution
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (6 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:39     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 09/11] block: Expose queue nr_zones in sysfs Damien Le Moal
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
necessary to reset a range of zones. Similarly to what is done for
discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
executed asynchronously and a synchronous call done only for the last
BIO of the chain.

Modify blkdev_reset_zones() to operate similarly to
blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
avoid code duplication of that function in blk_zoned.c, rename
next_bio() into blk_next_bio() and declare it as a block internal
function in blk.h.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c   | 13 ++++++-------
 block/blk-zoned.c | 29 ++++++++++++++++-------------
 block/blk.h       |  2 ++
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index d1b9dd03da25..d7bedee5f9ba 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -10,8 +10,7 @@
 
 #include "blk.h"
 
-static struct bio *next_bio(struct bio *bio, unsigned int nr_pages,
-		gfp_t gfp)
+struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp)
 {
 	struct bio *new = bio_alloc(gfp, nr_pages);
 
@@ -87,7 +86,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			req_sects = end_sect - sector;
 		}
 
-		bio = next_bio(bio, 0, gfp_mask);
+		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio_set_op_attrs(bio, op, 0);
@@ -189,7 +188,7 @@ static int __blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	max_write_same_sectors = UINT_MAX >> 9;
 
 	while (nr_sects) {
-		bio = next_bio(bio, 1, gfp_mask);
+		bio = blk_next_bio(bio, 1, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio->bi_vcnt = 1;
@@ -265,7 +264,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 		return -EOPNOTSUPP;
 
 	while (nr_sects) {
-		bio = next_bio(bio, 0, gfp_mask);
+		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio->bi_opf = REQ_OP_WRITE_ZEROES;
@@ -316,8 +315,8 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
 		return -EPERM;
 
 	while (nr_sects != 0) {
-		bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
-			       gfp_mask);
+		bio = blk_next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
+				   gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index bb4ed69f917f..5d967fd39fbd 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -13,6 +13,8 @@
 #include <linux/rbtree.h>
 #include <linux/blkdev.h>
 
+#include "blk.h"
+
 static inline sector_t blk_zone_start(struct request_queue *q,
 				      sector_t sector)
 {
@@ -277,16 +279,17 @@ int blkdev_reset_zones(struct block_device *bdev,
 	struct request_queue *q = bdev_get_queue(bdev);
 	sector_t zone_sectors;
 	sector_t end_sector = sector + nr_sectors;
-	struct bio *bio;
+	struct bio *bio = NULL;
+	struct blk_plug plug;
 	int ret;
 
-	if (!q)
-		return -ENXIO;
-
 	if (!blk_queue_is_zoned(q))
 		return -EOPNOTSUPP;
 
-	if (end_sector > bdev->bd_part->nr_sects)
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	if (!nr_sectors || end_sector > bdev->bd_part->nr_sects)
 		/* Out of range */
 		return -EINVAL;
 
@@ -299,19 +302,14 @@ int blkdev_reset_zones(struct block_device *bdev,
 	    end_sector != bdev->bd_part->nr_sects)
 		return -EINVAL;
 
+	blk_start_plug(&plug);
 	while (sector < end_sector) {
 
-		bio = bio_alloc(gfp_mask, 0);
+		bio = blk_next_bio(bio, 0, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio_set_dev(bio, bdev);
 		bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
 
-		ret = submit_bio_wait(bio);
-		bio_put(bio);
-
-		if (ret)
-			return ret;
-
 		sector += zone_sectors;
 
 		/* This may take a while, so be nice to others */
@@ -319,7 +317,12 @@ int blkdev_reset_zones(struct block_device *bdev,
 
 	}
 
-	return 0;
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+
+	blk_finish_plug(&plug);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
 
diff --git a/block/blk.h b/block/blk.h
index 9db4e389582c..3d67844676fb 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -421,4 +421,6 @@ extern int blk_iolatency_init(struct request_queue *q);
 static inline int blk_iolatency_init(struct request_queue *q) { return 0; }
 #endif
 
+struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
+
 #endif /* BLK_INTERNAL_H */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (7 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 08/11] block: Improve zone reset execution Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:41     ` Hannes Reinecke
  2018-10-12  8:29     ` Christoph Hellwig
  2018-10-12  2:30 ` [PATCH v3 10/11] block: add a report_zones method Damien Le Moal
  2018-10-12  2:30 ` [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones() Damien Le Moal
  10 siblings, 2 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Expose through sysfs the nr_zones field of struct request_queue.
Exposing this value helps in debugging disk issues as well as
facilitating scripts based use of the disk (e.g. blktests).

For zoned block devices, the nr_zones field indicates the total number
of zones of the device calculated using the known disk capacity and
zone size. This number of zones is always 0 for regular block devices.

Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
introduce the blk_queue_nr_zones() function to return the correct value
for any device, regardless if CONFIG_BLK_DEV_ZONED is set.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-sysfs.c      | 11 +++++++++++
 include/linux/blkdev.h | 10 ++++++++++
 2 files changed, 21 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 3772671cf2bc..92be8092ca4f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
 	}
 }
 
+static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_nr_zones(q), page);
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
 	return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
 	.show = queue_zoned_show,
 };
 
+static struct queue_sysfs_entry queue_nr_zones_entry = {
+	.attr = {.name = "nr_zones", .mode = 0444 },
+	.show = queue_nr_zones_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
 	.attr = {.name = "nomerges", .mode = 0644 },
 	.show = queue_nomerges_show,
@@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
 	&queue_write_zeroes_max_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
+	&queue_nr_zones_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c24969b1741b..879753b10263 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
+static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
+{
+	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
+}
+
 static inline unsigned int blk_queue_zone_no(struct request_queue *q,
 					     sector_t sector)
 {
@@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
 		return false;
 	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
 }
+#else /* CONFIG_BLK_DEV_ZONED */
+static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
+{
+	return 0;
+}
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 static inline bool rq_is_sync(struct request *rq)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 10/11] block: add a report_zones method
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (8 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 09/11] block: Expose queue nr_zones in sysfs Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:42     ` Hannes Reinecke
  2018-10-12  2:30 ` [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones() Damien Le Moal
  10 siblings, 1 reply; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

From: Christoph Hellwig <hch@lst.de>

Dispatching a report zones command through the request queue is a major
pain due to the command reply payload rewriting necessary. Given that
blkdev_report_zones() is executing everything synchronously, implement
report zones as a block device file operation instead, allowing major
simplification of the code in many places.

sd, null-blk, dm-linear and dm-flakey being the only block device
drivers supporting exposing zoned block devices, these drivers are
modified to provide the device side implementation of the
report_zones() block device file operation.

For device mappers, a new report_zones() target type operation is
defined so that the upper block layer calls blkdev_report_zones() can
be propagated down to the underlying devices of the dm targets.
Implementation for this new operation is added to the dm-linear and
dm-flakey targets.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Damien]
* Changed method block_device argument to gendisk
* Various bug fixes and improvements
* Added support for null_blk, dm-linear and dm-flakey.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-core.c               |   1 -
 block/blk-mq-debugfs.c         |   1 -
 block/blk-zoned.c              | 164 ++++++++++----------------------
 drivers/block/null_blk.h       |  11 ++-
 drivers/block/null_blk_main.c  |  23 +----
 drivers/block/null_blk_zoned.c |  57 +++--------
 drivers/md/dm-flakey.c         |  30 ++++--
 drivers/md/dm-linear.c         |  35 ++++---
 drivers/md/dm.c                | 169 ++++++++++++++++-----------------
 drivers/scsi/sd.c              |  13 +--
 drivers/scsi/sd.h              |  11 +--
 drivers/scsi/sd_zbc.c          | 153 +++++++++--------------------
 include/linux/blk_types.h      |   2 -
 include/linux/blkdev.h         |   8 +-
 include/linux/device-mapper.h  |  12 ++-
 include/trace/events/f2fs.h    |   1 -
 16 files changed, 266 insertions(+), 425 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cff0a60ee200..18e7050eb5a4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2307,7 +2307,6 @@ generic_make_request_checks(struct bio *bio)
 		if (!q->limits.max_write_same_sectors)
 			goto not_supported;
 		break;
-	case REQ_OP_ZONE_REPORT:
 	case REQ_OP_ZONE_RESET:
 		if (!blk_queue_is_zoned(q))
 			goto not_supported;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cb1e6cf7ac48..a14ed04c1ff7 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -275,7 +275,6 @@ static const char *const op_name[] = {
 	REQ_OP_NAME(WRITE),
 	REQ_OP_NAME(FLUSH),
 	REQ_OP_NAME(DISCARD),
-	REQ_OP_NAME(ZONE_REPORT),
 	REQ_OP_NAME(SECURE_ERASE),
 	REQ_OP_NAME(ZONE_RESET),
 	REQ_OP_NAME(WRITE_SAME),
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 5d967fd39fbd..90cf503091d5 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -93,13 +93,10 @@ unsigned int blkdev_nr_zones(struct block_device *bdev)
 EXPORT_SYMBOL_GPL(blkdev_nr_zones);
 
 /*
- * Check that a zone report belongs to the partition.
- * If yes, fix its start sector and write pointer, copy it in the
- * zone information array and return true. Return false otherwise.
+ * Check that a zone report belongs to this partition, and if yes, fix its start
+ * sector and write pointer and return true. Return false otherwise.
  */
-static bool blkdev_report_zone(struct block_device *bdev,
-			       struct blk_zone *rep,
-			       struct blk_zone *zone)
+static bool blkdev_report_zone(struct block_device *bdev, struct blk_zone *rep)
 {
 	sector_t offset = get_start_sect(bdev);
 
@@ -114,11 +111,36 @@ static bool blkdev_report_zone(struct block_device *bdev,
 		rep->wp = rep->start + rep->len;
 	else
 		rep->wp -= offset;
-	memcpy(zone, rep, sizeof(struct blk_zone));
-
 	return true;
 }
 
+static int blk_report_zones(struct gendisk *disk, sector_t sector,
+			    struct blk_zone *zones, unsigned int *nr_zones,
+			    gfp_t gfp_mask)
+{
+	struct request_queue *q = disk->queue;
+	unsigned int z = 0, n, nrz = *nr_zones;
+	sector_t capacity = get_capacity(disk);
+	int ret;
+
+	while (z < nrz && sector < capacity) {
+		n = nrz - z;
+		ret = disk->fops->report_zones(disk, sector, &zones[z], &n,
+					       gfp_mask);
+		if (ret)
+			return ret;
+		if (!n)
+			break;
+		sector += blk_queue_zone_sectors(q) * n;
+		z += n;
+	}
+
+	WARN_ON(z > *nr_zones);
+	*nr_zones = z;
+
+	return 0;
+}
+
 /**
  * blkdev_report_zones - Get zones information
  * @bdev:	Target block device
@@ -133,130 +155,46 @@ static bool blkdev_report_zone(struct block_device *bdev,
  *    requested by @nr_zones. The number of zones actually reported is
  *    returned in @nr_zones.
  */
-int blkdev_report_zones(struct block_device *bdev,
-			sector_t sector,
-			struct blk_zone *zones,
-			unsigned int *nr_zones,
+int blkdev_report_zones(struct block_device *bdev, sector_t sector,
+			struct blk_zone *zones, unsigned int *nr_zones,
 			gfp_t gfp_mask)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
-	struct blk_zone_report_hdr *hdr;
-	unsigned int nrz = *nr_zones;
-	struct page *page;
-	unsigned int nr_rep;
-	size_t rep_bytes;
-	unsigned int nr_pages;
-	struct bio *bio;
-	struct bio_vec *bv;
-	unsigned int i, n, nz;
-	unsigned int ofst;
-	void *addr;
+	unsigned int i, nrz;
 	int ret;
 
-	if (!q)
-		return -ENXIO;
-
 	if (!blk_queue_is_zoned(q))
 		return -EOPNOTSUPP;
 
-	if (!nrz)
-		return 0;
-
-	if (sector > bdev->bd_part->nr_sects) {
-		*nr_zones = 0;
-		return 0;
-	}
-
 	/*
-	 * The zone report has a header. So make room for it in the
-	 * payload. Also make sure that the report fits in a single BIO
-	 * that will not be split down the stack.
+	 * A block device that advertized itself as zoned must have a
+	 * report_zones method. If it does not have one defined, the device
+	 * driver has a bug. So warn about that.
 	 */
-	rep_bytes = sizeof(struct blk_zone_report_hdr) +
-		sizeof(struct blk_zone) * nrz;
-	rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
-	if (rep_bytes > (queue_max_sectors(q) << 9))
-		rep_bytes = queue_max_sectors(q) << 9;
-
-	nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
-			 rep_bytes >> PAGE_SHIFT);
-	nr_pages = min_t(unsigned int, nr_pages,
-			 queue_max_segments(q));
-
-	bio = bio_alloc(gfp_mask, nr_pages);
-	if (!bio)
-		return -ENOMEM;
+	if (WARN_ON_ONCE(!bdev->bd_disk->fops->report_zones))
+		return -EOPNOTSUPP;
 
-	bio_set_dev(bio, bdev);
-	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
-	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
-
-	for (i = 0; i < nr_pages; i++) {
-		page = alloc_page(gfp_mask);
-		if (!page) {
-			ret = -ENOMEM;
-			goto out;
-		}
-		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
-			__free_page(page);
-			break;
-		}
+	if (!*nr_zones || sector >= bdev->bd_part->nr_sects) {
+		*nr_zones = 0;
+		return 0;
 	}
 
-	if (i == 0)
-		ret = -ENOMEM;
-	else
-		ret = submit_bio_wait(bio);
+	nrz = min(*nr_zones,
+		  __blkdev_nr_zones(q, bdev->bd_part->nr_sects - sector));
+	ret = blk_report_zones(bdev->bd_disk, get_start_sect(bdev) + sector,
+			       zones, &nrz, gfp_mask);
 	if (ret)
-		goto out;
-
-	/*
-	 * Process the report result: skip the header and go through the
-	 * reported zones to fixup and fixup the zone information for
-	 * partitions. At the same time, return the zone information into
-	 * the zone array.
-	 */
-	n = 0;
-	nz = 0;
-	nr_rep = 0;
-	bio_for_each_segment_all(bv, bio, i) {
+		return ret;
 
-		if (!bv->bv_page)
+	for (i = 0; i < nrz; i++) {
+		if (!blkdev_report_zone(bdev, zones))
 			break;
-
-		addr = kmap_atomic(bv->bv_page);
-
-		/* Get header in the first page */
-		ofst = 0;
-		if (!nr_rep) {
-			hdr = addr;
-			nr_rep = hdr->nr_zones;
-			ofst = sizeof(struct blk_zone_report_hdr);
-		}
-
-		/* Fixup and report zones */
-		while (ofst < bv->bv_len &&
-		       n < nr_rep && nz < nrz) {
-			if (blkdev_report_zone(bdev, addr + ofst, &zones[nz]))
-				nz++;
-			ofst += sizeof(struct blk_zone);
-			n++;
-		}
-
-		kunmap_atomic(addr);
-
-		if (n >= nr_rep || nz >= nrz)
-			break;
-
+		zones++;
 	}
 
-	*nr_zones = nz;
-out:
-	bio_for_each_segment_all(bv, bio, i)
-		__free_page(bv->bv_page);
-	bio_put(bio);
+	*nr_zones = i;
 
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 34e0030f0592..7685df43f1ef 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -87,7 +87,9 @@ struct nullb {
 #ifdef CONFIG_BLK_DEV_ZONED
 int null_zone_init(struct nullb_device *dev);
 void null_zone_exit(struct nullb_device *dev);
-blk_status_t null_zone_report(struct nullb *nullb, struct bio *bio);
+int null_zone_report(struct gendisk *disk, sector_t sector,
+		     struct blk_zone *zones, unsigned int *nr_zones,
+		     gfp_t gfp_mask);
 void null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 			unsigned int nr_sectors);
 void null_zone_reset(struct nullb_cmd *cmd, sector_t sector);
@@ -97,10 +99,11 @@ static inline int null_zone_init(struct nullb_device *dev)
 	return -EINVAL;
 }
 static inline void null_zone_exit(struct nullb_device *dev) {}
-static inline blk_status_t null_zone_report(struct nullb *nullb,
-					    struct bio *bio)
+static inline int null_zone_report(struct gendisk *disk, sector_t sector,
+				   struct blk_zone *zones,
+				   unsigned int *nr_zones, gfp_t gfp_mask)
 {
-	return BLK_STS_NOTSUPP;
+	return -EOPNOTSUPP;
 }
 static inline void null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 				   unsigned int nr_sectors)
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 093b614d6524..f5759ca768d1 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1157,34 +1157,12 @@ static void null_restart_queue_async(struct nullb *nullb)
 	}
 }
 
-static bool cmd_report_zone(struct nullb *nullb, struct nullb_cmd *cmd)
-{
-	struct nullb_device *dev = cmd->nq->dev;
-
-	if (dev->queue_mode == NULL_Q_BIO) {
-		if (bio_op(cmd->bio) == REQ_OP_ZONE_REPORT) {
-			cmd->error = null_zone_report(nullb, cmd->bio);
-			return true;
-		}
-	} else {
-		if (req_op(cmd->rq) == REQ_OP_ZONE_REPORT) {
-			cmd->error = null_zone_report(nullb, cmd->rq->bio);
-			return true;
-		}
-	}
-
-	return false;
-}
-
 static blk_status_t null_handle_cmd(struct nullb_cmd *cmd)
 {
 	struct nullb_device *dev = cmd->nq->dev;
 	struct nullb *nullb = dev->nullb;
 	int err = 0;
 
-	if (cmd_report_zone(nullb, cmd))
-		goto out;
-
 	if (test_bit(NULLB_DEV_FL_THROTTLED, &dev->flags)) {
 		struct request *rq = cmd->rq;
 
@@ -1528,6 +1506,7 @@ static const struct block_device_operations null_fops = {
 	.owner =	THIS_MODULE,
 	.open =		null_open,
 	.release =	null_release,
+	.report_zones =	null_zone_report,
 };
 
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 7c6b86d98700..c0b0e4a3fa8f 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -48,54 +48,27 @@ void null_zone_exit(struct nullb_device *dev)
 	kvfree(dev->zones);
 }
 
-static void null_zone_fill_bio(struct nullb_device *dev, struct bio *bio,
-			       unsigned int zno, unsigned int nr_zones)
+int null_zone_report(struct gendisk *disk, sector_t sector,
+		     struct blk_zone *zones, unsigned int *nr_zones,
+		     gfp_t gfp_mask)
 {
-	struct blk_zone_report_hdr *hdr = NULL;
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	void *addr;
-	unsigned int zones_to_cpy;
-
-	bio_for_each_segment(bvec, bio, iter) {
-		addr = kmap_atomic(bvec.bv_page);
-
-		zones_to_cpy = bvec.bv_len / sizeof(struct blk_zone);
-
-		if (!hdr) {
-			hdr = (struct blk_zone_report_hdr *)addr;
-			hdr->nr_zones = nr_zones;
-			zones_to_cpy--;
-			addr += sizeof(struct blk_zone_report_hdr);
-		}
-
-		zones_to_cpy = min_t(unsigned int, zones_to_cpy, nr_zones);
-
-		memcpy(addr, &dev->zones[zno],
-				zones_to_cpy * sizeof(struct blk_zone));
-
-		kunmap_atomic(addr);
+	struct nullb *nullb = disk->private_data;
+	struct nullb_device *dev = nullb->dev;
+	unsigned int zno, nrz = 0;
 
-		nr_zones -= zones_to_cpy;
-		zno += zones_to_cpy;
+	if (!dev->zoned)
+		/* Not a zoned null device */
+		return -EOPNOTSUPP;
 
-		if (!nr_zones)
-			break;
+	zno = null_zone_no(dev, sector);
+	if (zno < dev->nr_zones) {
+		nrz = min_t(unsigned int, *nr_zones, dev->nr_zones - zno);
+		memcpy(zones, &dev->zones[zno], nrz * sizeof(struct blk_zone));
 	}
-}
 
-blk_status_t null_zone_report(struct nullb *nullb, struct bio *bio)
-{
-	struct nullb_device *dev = nullb->dev;
-	unsigned int zno = null_zone_no(dev, bio->bi_iter.bi_sector);
-	unsigned int nr_zones = dev->nr_zones - zno;
-	unsigned int max_zones;
+	*nr_zones = nrz;
 
-	max_zones = (bio->bi_iter.bi_size / sizeof(struct blk_zone)) - 1;
-	nr_zones = min_t(unsigned int, nr_zones, max_zones);
-	null_zone_fill_bio(nullb->dev, bio, zno, nr_zones);
-
-	return BLK_STS_OK;
+	return 0;
 }
 
 void null_zone_write(struct nullb_cmd *cmd, sector_t sector,
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 32aabe27b37c..3cb97fa4c11d 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -315,10 +315,6 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 	if (bio_op(bio) == REQ_OP_ZONE_RESET)
 		goto map_bio;
 
-	/* We need to remap reported zones, so remember the BIO iter */
-	if (bio_op(bio) == REQ_OP_ZONE_REPORT)
-		goto map_bio;
-
 	/* Are we alive ? */
 	elapsed = (jiffies - fc->start_time) / HZ;
 	if (elapsed % (fc->up_interval + fc->down_interval) >= fc->up_interval) {
@@ -380,11 +376,6 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio,
 	if (bio_op(bio) == REQ_OP_ZONE_RESET)
 		return DM_ENDIO_DONE;
 
-	if (bio_op(bio) == REQ_OP_ZONE_REPORT) {
-		dm_remap_zone_report(ti, bio, fc->start);
-		return DM_ENDIO_DONE;
-	}
-
 	if (!*error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
 		if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
 		    all_corrupt_bio_flags_match(bio, fc)) {
@@ -457,6 +448,26 @@ static int flakey_prepare_ioctl(struct dm_target *ti, struct block_device **bdev
 	return 0;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static int flakey_report_zones(struct dm_target *ti, sector_t sector,
+			       struct blk_zone *zones, unsigned int *nr_zones,
+			       gfp_t gfp_mask)
+{
+	struct flakey_c *fc = ti->private;
+	int ret;
+
+	/* Do report and remap it */
+	ret = blkdev_report_zones(fc->dev->bdev, flakey_map_sector(ti, sector),
+				  zones, nr_zones, gfp_mask);
+	if (ret != 0)
+		return ret;
+
+	if (*nr_zones)
+		dm_remap_zone_report(ti, fc->start, zones, nr_zones);
+	return 0;
+}
+#endif
+
 static int flakey_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data)
 {
 	struct flakey_c *fc = ti->private;
@@ -469,6 +480,7 @@ static struct target_type flakey_target = {
 	.version = {1, 5, 0},
 #ifdef CONFIG_BLK_DEV_ZONED
 	.features = DM_TARGET_ZONED_HM,
+	.report_zones = flakey_report_zones,
 #endif
 	.module = THIS_MODULE,
 	.ctr    = flakey_ctr,
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 2f7c44a006c4..8d7ddee6ac4d 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -102,19 +102,6 @@ static int linear_map(struct dm_target *ti, struct bio *bio)
 	return DM_MAPIO_REMAPPED;
 }
 
-#ifdef CONFIG_BLK_DEV_ZONED
-static int linear_end_io(struct dm_target *ti, struct bio *bio,
-			 blk_status_t *error)
-{
-	struct linear_c *lc = ti->private;
-
-	if (!*error && bio_op(bio) == REQ_OP_ZONE_REPORT)
-		dm_remap_zone_report(ti, bio, lc->start);
-
-	return DM_ENDIO_DONE;
-}
-#endif
-
 static void linear_status(struct dm_target *ti, status_type_t type,
 			  unsigned status_flags, char *result, unsigned maxlen)
 {
@@ -148,6 +135,26 @@ static int linear_prepare_ioctl(struct dm_target *ti, struct block_device **bdev
 	return 0;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static int linear_report_zones(struct dm_target *ti, sector_t sector,
+			       struct blk_zone *zones, unsigned int *nr_zones,
+			       gfp_t gfp_mask)
+{
+	struct linear_c *lc = (struct linear_c *) ti->private;
+	int ret;
+
+	/* Do report and remap it */
+	ret = blkdev_report_zones(lc->dev->bdev, linear_map_sector(ti, sector),
+				  zones, nr_zones, gfp_mask);
+	if (ret != 0)
+		return ret;
+
+	if (*nr_zones)
+		dm_remap_zone_report(ti, lc->start, zones, nr_zones);
+	return 0;
+}
+#endif
+
 static int linear_iterate_devices(struct dm_target *ti,
 				  iterate_devices_callout_fn fn, void *data)
 {
@@ -211,8 +218,8 @@ static struct target_type linear_target = {
 	.name   = "linear",
 	.version = {1, 4, 0},
 #ifdef CONFIG_BLK_DEV_ZONED
-	.end_io = linear_end_io,
 	.features = DM_TARGET_PASSES_INTEGRITY | DM_TARGET_ZONED_HM,
+	.report_zones = linear_report_zones,
 #else
 	.features = DM_TARGET_PASSES_INTEGRITY,
 #endif
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 45abb54037fc..6be21dc210a1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -458,6 +458,57 @@ static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 	return dm_get_geometry(md, geo);
 }
 
+static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
+			       struct blk_zone *zones, unsigned int *nr_zones,
+			       gfp_t gfp_mask)
+{
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct mapped_device *md = disk->private_data;
+	struct dm_target *tgt;
+	struct dm_table *map;
+	int srcu_idx, ret;
+
+	if (dm_suspended_md(md))
+		return -EAGAIN;
+
+	map = dm_get_live_table(md, &srcu_idx);
+	if (!map)
+		return -EIO;
+
+	tgt = dm_table_find_target(map, sector);
+	if (!dm_target_is_valid(tgt)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	/*
+	 * If we are executing this, we already know that the block device
+	 * is a zoned device and so each target should have support for that
+	 * type of drive. A missing report_zones method means that the target
+	 * driver has a problem.
+	 */
+	if (WARN_ON(!tgt->type->report_zones)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	/*
+	 * blkdev_report_zones() will loop and call this again to cover all the
+	 * zones of the target, eventually moving on to the next target.
+	 * So there is no need to loop here trying to fill the entire array
+	 * of zones.
+	 */
+	ret = tgt->type->report_zones(tgt, sector, zones,
+				      nr_zones, gfp_mask);
+
+out:
+	dm_put_live_table(md, srcu_idx);
+	return ret;
+#else
+	return -ENOTSUPP;
+#endif
+}
+
 static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
 			    struct block_device **bdev)
 	__acquires(md->io_barrier)
@@ -1155,93 +1206,49 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
 /*
- * The zone descriptors obtained with a zone report indicate zone positions
- * within the target backing device, regardless of that device is a partition
- * and regardless of the target mapping start sector on the device or partition.
- * The zone descriptors start sector and write pointer position must be adjusted
- * to match their relative position within the dm device.
- * A target may call dm_remap_zone_report() after completion of a
- * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained from the
- * backing device.
+ * The zone descriptors obtained with a zone report indicate
+ * zone positions within the underlying device of the target. The zone
+ * descriptors must be remapped to match their position within the dm device.
+ * The caller target should obtain the zones information using
+ * blkdev_report_zones() to ensure that remapping for partition offset is
+ * already handled.
  */
-void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
+void dm_remap_zone_report(struct dm_target *ti, sector_t start,
+			  struct blk_zone *zones, unsigned int *nr_zones)
 {
 #ifdef CONFIG_BLK_DEV_ZONED
-	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
-	struct bio *report_bio = tio->io->orig_bio;
-	struct blk_zone_report_hdr *hdr = NULL;
 	struct blk_zone *zone;
-	unsigned int nr_rep = 0;
-	unsigned int ofst;
-	sector_t part_offset;
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	void *addr;
-
-	if (bio->bi_status)
-		return;
-
-	/*
-	 * bio sector was incremented by the request size on completion. Taking
-	 * into account the original request sector, the target start offset on
-	 * the backing device and the target mapping offset (ti->begin), the
-	 * start sector of the backing device. The partition offset is always 0
-	 * if the target uses a whole device.
-	 */
-	part_offset = bio->bi_iter.bi_sector + ti->begin - (start + bio_end_sector(report_bio));
+	unsigned int nrz = *nr_zones;
+	int i;
 
 	/*
-	 * Remap the start sector of the reported zones. For sequential zones,
-	 * also remap the write pointer position.
+	 * Remap the start sector and write pointer position of the zones in
+	 * the array. Since we may have obtained from the target underlying
+	 * device more zones that the target size, also adjust the number
+	 * of zones.
 	 */
-	bio_for_each_segment(bvec, report_bio, iter) {
-		addr = kmap_atomic(bvec.bv_page);
-
-		/* Remember the report header in the first page */
-		if (!hdr) {
-			hdr = addr;
-			ofst = sizeof(struct blk_zone_report_hdr);
-		} else
-			ofst = 0;
-
-		/* Set zones start sector */
-		while (hdr->nr_zones && ofst < bvec.bv_len) {
-			zone = addr + ofst;
-			zone->start -= part_offset;
-			if (zone->start >= start + ti->len) {
-				hdr->nr_zones = 0;
-				break;
-			}
-			zone->start = zone->start + ti->begin - start;
-			if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
-				if (zone->cond == BLK_ZONE_COND_FULL)
-					zone->wp = zone->start + zone->len;
-				else if (zone->cond == BLK_ZONE_COND_EMPTY)
-					zone->wp = zone->start;
-				else
-					zone->wp = zone->wp + ti->begin - start - part_offset;
-			}
-			ofst += sizeof(struct blk_zone);
-			hdr->nr_zones--;
-			nr_rep++;
+	for (i = 0; i < nrz; i++) {
+		zone = zones + i;
+		if (zone->start >= start + ti->len) {
+			memset(zone, 0, sizeof(struct blk_zone) * (nrz - i));
+			break;
 		}
 
-		if (addr != hdr)
-			kunmap_atomic(addr);
+		zone->start = zone->start + ti->begin - start;
+		if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+			continue;
 
-		if (!hdr->nr_zones)
-			break;
-	}
-
-	if (hdr) {
-		hdr->nr_zones = nr_rep;
-		kunmap_atomic(hdr);
+		if (zone->cond == BLK_ZONE_COND_FULL)
+			zone->wp = zone->start + zone->len;
+		else if (zone->cond == BLK_ZONE_COND_EMPTY)
+			zone->wp = zone->start;
+		else
+			zone->wp = zone->wp + ti->begin - start;
 	}
 
-	bio_advance(report_bio, report_bio->bi_iter.bi_size);
-
+	*nr_zones = i;
 #else /* !CONFIG_BLK_DEV_ZONED */
-	bio->bi_status = BLK_STS_NOTSUPP;
+	*nr_zones = 0;
 #endif
 }
 EXPORT_SYMBOL_GPL(dm_remap_zone_report);
@@ -1327,8 +1334,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
 			return r;
 	}
 
-	if (bio_op(bio) != REQ_OP_ZONE_REPORT)
-		bio_advance(clone, to_bytes(sector - clone->bi_iter.bi_sector));
+	bio_advance(clone, to_bytes(sector - clone->bi_iter.bi_sector));
 	clone->bi_iter.bi_size = to_bytes(len);
 
 	if (unlikely(bio_integrity(bio) != NULL))
@@ -1541,7 +1547,6 @@ static bool __process_abnormal_io(struct clone_info *ci, struct dm_target *ti,
  */
 static int __split_and_process_non_flush(struct clone_info *ci)
 {
-	struct bio *bio = ci->bio;
 	struct dm_target *ti;
 	unsigned len;
 	int r;
@@ -1553,11 +1558,7 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 	if (unlikely(__process_abnormal_io(ci, ti, &r)))
 		return r;
 
-	if (bio_op(bio) == REQ_OP_ZONE_REPORT)
-		len = ci->sector_count;
-	else
-		len = min_t(sector_t, max_io_len(ci->sector, ti),
-			    ci->sector_count);
+	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
 
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
@@ -1616,9 +1617,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 				 * We take a clone of the original to store in
 				 * ci.io->orig_bio to be used by end_io_acct() and
 				 * for dec_pending to use for completion handling.
-				 * As this path is not used for REQ_OP_ZONE_REPORT,
-				 * the usage of io->orig_bio in dm_remap_zone_report()
-				 * won't be affected by this reassignment.
 				 */
 				struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
 							  GFP_NOIO, &md->queue->bio_split);
@@ -3167,6 +3165,7 @@ static const struct block_device_operations dm_blk_dops = {
 	.release = dm_blk_close,
 	.ioctl = dm_blk_ioctl,
 	.getgeo = dm_blk_getgeo,
+	.report_zones = dm_blk_report_zones,
 	.pr_ops = &dm_pr_ops,
 	.owner = THIS_MODULE
 };
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4a57ffecc7e6..718aaf15d812 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1271,8 +1271,6 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
-	case REQ_OP_ZONE_REPORT:
-		return sd_zbc_setup_report_cmnd(cmd);
 	case REQ_OP_ZONE_RESET:
 		return sd_zbc_setup_reset_cmnd(cmd);
 	default:
@@ -1801,6 +1799,7 @@ static const struct block_device_operations sd_fops = {
 	.check_events		= sd_check_events,
 	.revalidate_disk	= sd_revalidate_disk,
 	.unlock_native_capacity	= sd_unlock_native_capacity,
+	.report_zones		= sd_zbc_report_zones,
 	.pr_ops			= &sd_pr_ops,
 };
 
@@ -1952,16 +1951,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
 		break;
-	case REQ_OP_ZONE_REPORT:
-		if (!result) {
-			good_bytes = scsi_bufflen(SCpnt)
-				- scsi_get_resid(SCpnt);
-			scsi_set_resid(SCpnt, 0);
-		} else {
-			good_bytes = 0;
-			scsi_set_resid(SCpnt, blk_rq_bytes(req));
-		}
-		break;
 	default:
 		/*
 		 * In case of bogus fw or device, we could end up having
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index a7d4f50b67d4..f72f20fd0d8b 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -273,10 +273,12 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
 extern void sd_zbc_remove(struct scsi_disk *sdkp);
 extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
-extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
 extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 			    struct scsi_sense_hdr *sshdr);
+extern int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
+			       struct blk_zone *zones, unsigned int *nr_zones,
+			       gfp_t gfp_mask);
 
 #else /* CONFIG_BLK_DEV_ZONED */
 
@@ -290,11 +292,6 @@ static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
 
 static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
 
-static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
-{
-	return BLKPREP_INVALID;
-}
-
 static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 {
 	return BLKPREP_INVALID;
@@ -304,6 +301,8 @@ static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
 				   unsigned int good_bytes,
 				   struct scsi_sense_hdr *sshdr) {}
 
+#define sd_zbc_report_zones NULL
+
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 44b64b4a922a..1e8274f473b0 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -62,7 +62,7 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
 }
 
 /**
- * sd_zbc_report_zones - Issue a REPORT ZONES scsi command.
+ * sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
  * @sdkp: The target disk
  * @buf: Buffer to use for the reply
  * @buflen: the buffer size
@@ -75,9 +75,9 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
  * zones and will only report the count of zones fitting in the command reply
  * buffer.
  */
-static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
-			       unsigned int buflen, sector_t lba,
-			       bool partial)
+static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
+				  unsigned int buflen, sector_t lba,
+				  bool partial)
 {
 	struct scsi_device *sdp = sdkp->device;
 	const int timeout = sdp->request_queue->rq_timeout;
@@ -118,108 +118,56 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 }
 
 /**
- * sd_zbc_setup_report_cmnd - Prepare a REPORT ZONES scsi command
- * @cmd: The command to setup
+ * sd_zbc_report_zones - Disk report zones operation.
+ * @disk: The target disk
+ * @sector: Start 512B sector of the report
+ * @zones: Array of zone descriptors
+ * @nr_zones: Number of descriptors in the array
+ * @gfp_mask: Memory allocation mask
  *
- * Call in sd_init_command() for a REQ_OP_ZONE_REPORT request.
+ * Execute a report zones command on the target disk.
  */
-int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
+			struct blk_zone *zones, unsigned int *nr_zones,
+			gfp_t gfp_mask)
 {
-	struct request *rq = cmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-	sector_t lba, sector = blk_rq_pos(rq);
-	unsigned int nr_bytes = blk_rq_bytes(rq);
-	int ret;
-
-	WARN_ON(nr_bytes == 0);
+	struct scsi_disk *sdkp = scsi_disk(disk);
+	unsigned int i, buflen, nrz = *nr_zones;
+	unsigned char *buf;
+	size_t offset = 0;
+	int ret = 0;
 
 	if (!sd_is_zoned(sdkp))
 		/* Not a zoned device */
-		return BLKPREP_KILL;
-
-	ret = scsi_init_io(cmd);
-	if (ret != BLKPREP_OK)
-		return ret;
-
-	cmd->cmd_len = 16;
-	memset(cmd->cmnd, 0, cmd->cmd_len);
-	cmd->cmnd[0] = ZBC_IN;
-	cmd->cmnd[1] = ZI_REPORT_ZONES;
-	lba = sectors_to_logical(sdkp->device, sector);
-	put_unaligned_be64(lba, &cmd->cmnd[2]);
-	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
-	/* Do partial report for speeding things up */
-	cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
-
-	cmd->sc_data_direction = DMA_FROM_DEVICE;
-	cmd->sdb.length = nr_bytes;
-	cmd->transfersize = sdkp->device->sector_size;
-	cmd->allowed = 0;
+		return -EOPNOTSUPP;
 
-	return BLKPREP_OK;
-}
-
-/**
- * sd_zbc_report_zones_complete - Process a REPORT ZONES scsi command reply.
- * @scmd: The completed report zones command
- * @good_bytes: reply size in bytes
- *
- * Convert all reported zone descriptors to struct blk_zone. The conversion
- * is done in-place, directly in the request specified sg buffer.
- */
-static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
-					 unsigned int good_bytes)
-{
-	struct request *rq = scmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-	struct sg_mapping_iter miter;
-	struct blk_zone_report_hdr hdr;
-	struct blk_zone zone;
-	unsigned int offset, bytes = 0;
-	unsigned long flags;
-	u8 *buf;
-
-	if (good_bytes < 64)
-		return;
-
-	memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
-
-	sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
-		       SG_MITER_TO_SG | SG_MITER_ATOMIC);
+	/*
+	 * Get a reply buffer for the number of requested zones plus a header.
+	 * For ATA, buffers must be aligned to 512B.
+	 */
+	buflen = roundup((nrz + 1) * 64, 512);
+	buf = kmalloc(buflen, gfp_mask);
+	if (!buf)
+		return -ENOMEM;
 
-	local_irq_save(flags);
-	while (sg_miter_next(&miter) && bytes < good_bytes) {
+	ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
+			sectors_to_logical(sdkp->device, sector), true);
+	if (ret)
+		goto out_free_buf;
 
-		buf = miter.addr;
-		offset = 0;
+	nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
+	for (i = 0; i < nrz; i++) {
+		offset += 64;
+		sd_zbc_parse_report(sdkp, buf + offset, zones);
+		zones++;
+	}
 
-		if (bytes == 0) {
-			/* Set the report header */
-			hdr.nr_zones = min_t(unsigned int,
-					 (good_bytes - 64) / 64,
-					 get_unaligned_be32(&buf[0]) / 64);
-			memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
-			offset += 64;
-			bytes += 64;
-		}
+	*nr_zones = nrz;
 
-		/* Parse zone descriptors */
-		while (offset < miter.length && hdr.nr_zones) {
-			WARN_ON(offset > miter.length);
-			buf = miter.addr + offset;
-			sd_zbc_parse_report(sdkp, buf, &zone);
-			memcpy(buf, &zone, sizeof(struct blk_zone));
-			offset += 64;
-			bytes += 64;
-			hdr.nr_zones--;
-		}
-
-		if (!hdr.nr_zones)
-			break;
+out_free_buf:
+	kfree(buf);
 
-	}
-	sg_miter_stop(&miter);
-	local_irq_restore(flags);
+	return ret;
 }
 
 /**
@@ -302,13 +250,6 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 	case REQ_OP_WRITE_ZEROES:
 	case REQ_OP_WRITE_SAME:
 		break;
-
-	case REQ_OP_ZONE_REPORT:
-
-		if (!result)
-			sd_zbc_report_zones_complete(cmd, good_bytes);
-		break;
-
 	}
 }
 
@@ -390,7 +331,7 @@ static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
 		return -ENOMEM;
 
 	/* Do a report zone to get max_lba and the same field */
-	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
+	ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
 	if (ret)
 		goto out_free;
 
@@ -447,8 +388,8 @@ static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
 		}
 
 		if (block < sdkp->capacity) {
-			ret = sd_zbc_report_zones(sdkp, buf,
-						  SD_ZBC_BUF_SIZE, block, true);
+			ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
+						     block, true);
 			if (ret)
 				goto out_free;
 		}
@@ -564,8 +505,8 @@ sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift,
 		goto out;
 
 	while (lba < sdkp->capacity) {
-		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
-					  lba, true);
+		ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, lba,
+					     true);
 		if (ret)
 			goto out;
 		lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index f6dfb30737d8..1dcf652ba0aa 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -284,8 +284,6 @@ enum req_opf {
 	REQ_OP_FLUSH		= 2,
 	/* discard sectors */
 	REQ_OP_DISCARD		= 3,
-	/* get zone information */
-	REQ_OP_ZONE_REPORT	= 4,
 	/* securely erase sectors */
 	REQ_OP_SECURE_ERASE	= 5,
 	/* seset a zone write pointer */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 879753b10263..b5c75edfa8c7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -396,11 +396,6 @@ struct queue_limits {
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
-struct blk_zone_report_hdr {
-	unsigned int	nr_zones;
-	u8		padding[60];
-};
-
 extern unsigned int blkdev_nr_zones(struct block_device *bdev);
 extern int blkdev_report_zones(struct block_device *bdev,
 			       sector_t sector, struct blk_zone *zones,
@@ -2002,6 +1997,9 @@ struct block_device_operations {
 	int (*getgeo)(struct block_device *, struct hd_geometry *);
 	/* this callback is with swap_lock and sometimes page table lock held */
 	void (*swap_slot_free_notify) (struct block_device *, unsigned long);
+	int (*report_zones)(struct gendisk *, sector_t sector,
+			    struct blk_zone *zones, unsigned int *nr_zones,
+			    gfp_t gfp_mask);
 	struct module *owner;
 	const struct pr_ops *pr_ops;
 };
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 6fb0808e87c8..a23b396a8edc 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -92,6 +92,11 @@ typedef int (*dm_message_fn) (struct dm_target *ti, unsigned argc, char **argv,
 
 typedef int (*dm_prepare_ioctl_fn) (struct dm_target *ti, struct block_device **bdev);
 
+typedef int (*dm_report_zones_fn) (struct dm_target *ti, sector_t sector,
+				   struct blk_zone *zones,
+				   unsigned int *nr_zones,
+				   gfp_t gfp_mask);
+
 /*
  * These iteration functions are typically used to check (and combine)
  * properties of underlying devices.
@@ -180,6 +185,9 @@ struct target_type {
 	dm_status_fn status;
 	dm_message_fn message;
 	dm_prepare_ioctl_fn prepare_ioctl;
+#ifdef CONFIG_BLK_DEV_ZONED
+	dm_report_zones_fn report_zones;
+#endif
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
@@ -420,8 +428,8 @@ struct gendisk *dm_disk(struct mapped_device *md);
 int dm_suspended(struct dm_target *ti);
 int dm_noflush_suspending(struct dm_target *ti);
 void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
-void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
-			  sector_t start);
+void dm_remap_zone_report(struct dm_target *ti, sector_t start,
+			  struct blk_zone *zones, unsigned int *nr_zones);
 union map_info *dm_get_rq_mapinfo(struct request *rq);
 
 struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 795698925d20..3ec73f17ee2a 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -82,7 +82,6 @@ TRACE_DEFINE_ENUM(CP_TRIMMED);
 		{ REQ_OP_WRITE,			"WRITE" },		\
 		{ REQ_OP_FLUSH,			"FLUSH" },		\
 		{ REQ_OP_DISCARD,		"DISCARD" },		\
-		{ REQ_OP_ZONE_REPORT,		"ZONE_REPORT" },	\
 		{ REQ_OP_SECURE_ERASE,		"SECURE_ERASE" },	\
 		{ REQ_OP_ZONE_RESET,		"ZONE_RESET" },		\
 		{ REQ_OP_WRITE_SAME,		"WRITE_SAME" },		\
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones()
  2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
                   ` (9 preceding siblings ...)
  2018-10-12  2:30 ` [PATCH v3 10/11] block: add a report_zones method Damien Le Moal
@ 2018-10-12  2:30 ` Damien Le Moal
  2018-10-12  7:44     ` Hannes Reinecke
  2018-10-12  8:30     ` Christoph Hellwig
  10 siblings, 2 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  2:30 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

Drivers exposing zoned block devices have to initialize and maintain
correctness (i.e. revalidate) of the device zone bitmaps attached to
the device request queue (seq_zones_bitmap and seq_zones_wlock).

To simplify coding this, introduce a generic helper function
blk_revalidate_disk_zones() suitable for most (and likely all) cases.
This new function always update the seq_zones_bitmap and seq_zones_wlock
bitmaps as well as the queue nr_zones field when called for a disk
using a request based queue. For a disk using a BIO based queue, only
the number of zones is updated since these queues do not have
schedulers and so do not need the zone bitmaps.

With this change, the zone bitmap initialization code in sd_zbc.c can be
replaced with a call to this function in sd_zbc_read_zones(), which is
called from the disk revalidate block operation method.

A call to blk_revalidate_disk_zones() is also added to the null_blk
driver for devices created with the zoned mode enabled.

Finally, to ensure that zoned devices created with dm-linear or
dm-flakey expose the correct number of zones through sysfs, a call to
blk_revalidate_disk_zones() is added to dm_table_set_restrictions().

The zone bitmaps allocated and initialized with
blk_revalidate_disk_zones() are freed automatically from
__blk_release_queue() using the block internal function
blk_queue_free_zone_bitmaps().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-sysfs.c             |   2 +
 block/blk-zoned.c             | 136 +++++++++++++++++++++
 block/blk.h                   |   6 +
 drivers/block/null_blk_main.c |   7 ++
 drivers/md/dm-table.c         |  10 ++
 drivers/scsi/sd.c             |   2 -
 drivers/scsi/sd.h             |   4 -
 drivers/scsi/sd_zbc.c         | 218 ++++------------------------------
 include/linux/blkdev.h        |   7 ++
 9 files changed, 194 insertions(+), 198 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 92be8092ca4f..0641533597f1 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -852,6 +852,8 @@ static void __blk_release_queue(struct work_struct *work)
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
+	blk_queue_free_zone_bitmaps(q);
+
 	if (!q->mq_ops) {
 		if (q->exit_rq_fn)
 			q->exit_rq_fn(q, q->fq->flush_rq);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 90cf503091d5..13ba2011a306 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/rbtree.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq.h>
 
 #include "blk.h"
 
@@ -359,3 +360,138 @@ int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
 	return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
 				  GFP_KERNEL);
 }
+
+static inline unsigned long *blk_alloc_zone_bitmap(int node,
+						   unsigned int nr_zones)
+{
+	return kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(unsigned long),
+			    GFP_NOIO, node);
+}
+
+/*
+ * Allocate an array of struct blk_zone to get nr_zones zone information.
+ * The allocated array may be smaller than nr_zones.
+ */
+static struct blk_zone *blk_alloc_zones(int node, unsigned int *nr_zones)
+{
+	size_t size = *nr_zones * sizeof(struct blk_zone);
+	struct page *page;
+	int order;
+
+	for (order = get_order(size); order > 0; order--) {
+		page = alloc_pages_node(node, GFP_NOIO | __GFP_ZERO, order);
+		if (page) {
+			*nr_zones = min_t(unsigned int, *nr_zones,
+				(PAGE_SIZE << order) / sizeof(struct blk_zone));
+			return page_address(page);
+		}
+	}
+
+	return NULL;
+}
+
+void blk_queue_free_zone_bitmaps(struct request_queue *q)
+{
+	kfree(q->seq_zones_bitmap);
+	q->seq_zones_bitmap = NULL;
+	kfree(q->seq_zones_wlock);
+	q->seq_zones_wlock = NULL;
+}
+
+/**
+ * blk_revalidate_disk_zones - (re)allocate and initialize zone bitmaps
+ * @disk:	Target disk
+ *
+ * Helper function for low-level device drivers to (re) allocate and initialize
+ * a disk request queue zone bitmaps. This functions should normally be called
+ * within the disk ->revalidate method. For BIO based queues, no zone bitmap
+ * is allocated.
+ */
+int blk_revalidate_disk_zones(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+	unsigned int nr_zones = __blkdev_nr_zones(q, get_capacity(disk));
+	unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL;
+	unsigned int i, rep_nr_zones = 0, z = 0, nrz;
+	struct blk_zone *zones = NULL;
+	sector_t sector = 0;
+	int ret = 0;
+
+	/*
+	 * BIO based queues do not use a scheduler so only q->nr_zones
+	 * needs to be updated so that the sysfs exposed value is correct.
+	 */
+	if (!queue_is_rq_based(q)) {
+		q->nr_zones = nr_zones;
+		return 0;
+	}
+
+	if (!blk_queue_is_zoned(q) || !nr_zones) {
+		nr_zones = 0;
+		goto update;
+	}
+
+	/* Allocate bitmaps */
+	ret = -ENOMEM;
+	seq_zones_wlock = blk_alloc_zone_bitmap(q->node, nr_zones);
+	if (!seq_zones_wlock)
+		goto out;
+	seq_zones_bitmap = blk_alloc_zone_bitmap(q->node, nr_zones);
+	if (!seq_zones_bitmap)
+		goto out;
+
+	/* Get zone information and initialize seq_zones_bitmap */
+	rep_nr_zones = nr_zones;
+	zones = blk_alloc_zones(q->node, &rep_nr_zones);
+	if (!zones)
+		goto out;
+
+	while (z < nr_zones) {
+		nrz = min(nr_zones - z, rep_nr_zones);
+		ret = blk_report_zones(disk, sector, zones, &nrz, GFP_NOIO);
+		if (ret)
+			goto out;
+		if (!nrz)
+			break;
+		for (i = 0; i < nrz; i++) {
+			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
+				set_bit(z, seq_zones_bitmap);
+			z++;
+		}
+		sector += nrz * blk_queue_zone_sectors(q);
+	}
+
+	if (WARN_ON(z != nr_zones)) {
+		ret = -EIO;
+		goto out;
+	}
+
+update:
+	/*
+	 * Install the new bitmaps, making sure the queue is stopped and
+	 * all I/Os are completed (i.e. a scheduler is not referencing the
+	 * bitmaps).
+	 */
+	blk_mq_freeze_queue(q);
+	q->nr_zones = nr_zones;
+	swap(q->seq_zones_wlock, seq_zones_wlock);
+	swap(q->seq_zones_bitmap, seq_zones_bitmap);
+	blk_mq_unfreeze_queue(q);
+
+out:
+	free_pages((unsigned long)zones,
+		   get_order(rep_nr_zones * sizeof(struct blk_zone)));
+	kfree(seq_zones_wlock);
+	kfree(seq_zones_bitmap);
+
+	if (ret) {
+		pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
+		blk_mq_freeze_queue(q);
+		blk_queue_free_zone_bitmaps(q);
+		blk_mq_unfreeze_queue(q);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);
+
diff --git a/block/blk.h b/block/blk.h
index 3d67844676fb..7e30ecea0680 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -423,4 +423,10 @@ static inline int blk_iolatency_init(struct request_queue *q) { return 0; }
 
 struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
 
+#ifdef CONFIG_BLK_DEV_ZONED
+void blk_queue_free_zone_bitmaps(struct request_queue *q);
+#else
+static inline void blk_queue_free_zone_bitmaps(struct request_queue *q) {}
+#endif
+
 #endif /* BLK_INTERNAL_H */
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index f5759ca768d1..07780942b79f 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1613,6 +1613,13 @@ static int null_gendisk_register(struct nullb *nullb)
 	disk->queue		= nullb->q;
 	strncpy(disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
 
+	if (nullb->dev->zoned) {
+		int ret = blk_revalidate_disk_zones(disk);
+
+		if (ret != 0)
+			return ret;
+	}
+
 	add_disk(disk);
 	return 0;
 }
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3d0e2c198f06..fb4bea20657b 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1937,6 +1937,16 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	 */
 	if (blk_queue_add_random(q) && dm_table_all_devices_attribute(t, device_is_not_random))
 		blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
+
+	/*
+	 * For a zoned target, the number of zones should be updated for the
+	 * correct value to be exposed in sysfs queue/nr_zones. For a BIO based
+	 * target, this is all that is needed. For a request based target, the
+	 * queue zone bitmaps must also be updated.
+	 * Use blk_revalidate_disk_zones() to handle this.
+	 */
+	if (blk_queue_is_zoned(q))
+		blk_revalidate_disk_zones(t->md->disk);
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 718aaf15d812..370b43683470 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3413,8 +3413,6 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
-	sd_zbc_remove(sdkp);
-
 	free_opal_dev(sdkp->opal_dev);
 
 	blk_register_region(devt, SD_MINORS, NULL,
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index f72f20fd0d8b..1d63f3a23ffb 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -76,7 +76,6 @@ struct scsi_disk {
 #ifdef CONFIG_BLK_DEV_ZONED
 	u32		nr_zones;
 	u32		zone_blocks;
-	u32		zone_shift;
 	u32		zones_optimal_open;
 	u32		zones_optimal_nonseq;
 	u32		zones_max_open;
@@ -271,7 +270,6 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 #ifdef CONFIG_BLK_DEV_ZONED
 
 extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
-extern void sd_zbc_remove(struct scsi_disk *sdkp);
 extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
 extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
 extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
@@ -288,8 +286,6 @@ static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
 	return 0;
 }
 
-static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
-
 static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
 
 static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 1e8274f473b0..fa02e00fb740 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -424,191 +424,10 @@ static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
 	return ret;
 }
 
-/**
- * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
- * @nr_zones: Number of zones to allocate space for.
- * @numa_node: NUMA node to allocate the memory from.
- */
-static inline unsigned long *
-sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node)
-{
-	return kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(unsigned long),
-			    GFP_KERNEL, numa_node);
-}
-
-/**
- * sd_zbc_get_seq_zones - Parse report zones reply to identify sequential zones
- * @sdkp: disk used
- * @buf: report reply buffer
- * @buflen: length of @buf
- * @zone_shift: logarithm base 2 of the number of blocks in a zone
- * @seq_zones_bitmap: bitmap of sequential zones to set
- *
- * Parse reported zone descriptors in @buf to identify sequential zones and
- * set the reported zone bit in @seq_zones_bitmap accordingly.
- * Since read-only and offline zones cannot be written, do not
- * mark them as sequential in the bitmap.
- * Return the LBA after the last zone reported.
- */
-static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
-				     unsigned int buflen, u32 zone_shift,
-				     unsigned long *seq_zones_bitmap)
-{
-	sector_t lba, next_lba = sdkp->capacity;
-	unsigned int buf_len, list_length;
-	unsigned char *rec;
-	u8 type, cond;
-
-	list_length = get_unaligned_be32(&buf[0]) + 64;
-	buf_len = min(list_length, buflen);
-	rec = buf + 64;
-
-	while (rec < buf + buf_len) {
-		type = rec[0] & 0x0f;
-		cond = (rec[1] >> 4) & 0xf;
-		lba = get_unaligned_be64(&rec[16]);
-		if (type != ZBC_ZONE_TYPE_CONV &&
-		    cond != ZBC_ZONE_COND_READONLY &&
-		    cond != ZBC_ZONE_COND_OFFLINE)
-			set_bit(lba >> zone_shift, seq_zones_bitmap);
-		next_lba = lba + get_unaligned_be64(&rec[8]);
-		rec += 64;
-	}
-
-	return next_lba;
-}
-
-/**
- * sd_zbc_setup_seq_zones_bitmap - Initialize a seq zone bitmap.
- * @sdkp: target disk
- * @zone_shift: logarithm base 2 of the number of blocks in a zone
- * @nr_zones: number of zones to set up a seq zone bitmap for
- *
- * Allocate a zone bitmap and initialize it by identifying sequential zones.
- */
-static unsigned long *
-sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift,
-			      u32 nr_zones)
-{
-	struct request_queue *q = sdkp->disk->queue;
-	unsigned long *seq_zones_bitmap;
-	sector_t lba = 0;
-	unsigned char *buf;
-	int ret = -ENOMEM;
-
-	seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(nr_zones, q->node);
-	if (!seq_zones_bitmap)
-		return ERR_PTR(-ENOMEM);
-
-	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
-	if (!buf)
-		goto out;
-
-	while (lba < sdkp->capacity) {
-		ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, lba,
-					     true);
-		if (ret)
-			goto out;
-		lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
-					   zone_shift, seq_zones_bitmap);
-	}
-
-	if (lba != sdkp->capacity) {
-		/* Something went wrong */
-		ret = -EIO;
-	}
-
-out:
-	kfree(buf);
-	if (ret) {
-		kfree(seq_zones_bitmap);
-		return ERR_PTR(ret);
-	}
-	return seq_zones_bitmap;
-}
-
-static void sd_zbc_cleanup(struct scsi_disk *sdkp)
-{
-	struct request_queue *q = sdkp->disk->queue;
-
-	kfree(q->seq_zones_bitmap);
-	q->seq_zones_bitmap = NULL;
-
-	kfree(q->seq_zones_wlock);
-	q->seq_zones_wlock = NULL;
-
-	q->nr_zones = 0;
-}
-
-static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
-{
-	struct request_queue *q = sdkp->disk->queue;
-	u32 zone_shift = ilog2(zone_blocks);
-	u32 nr_zones;
-	int ret;
-
-	/* chunk_sectors indicates the zone size */
-	blk_queue_chunk_sectors(q,
-			logical_to_sectors(sdkp->device, zone_blocks));
-	nr_zones = round_up(sdkp->capacity, zone_blocks) >> zone_shift;
-
-	/*
-	 * Initialize the device request queue information if the number
-	 * of zones changed.
-	 */
-	if (nr_zones != sdkp->nr_zones || nr_zones != q->nr_zones) {
-		unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL;
-		size_t zone_bitmap_size;
-
-		if (nr_zones) {
-			seq_zones_wlock = sd_zbc_alloc_zone_bitmap(nr_zones,
-								   q->node);
-			if (!seq_zones_wlock) {
-				ret = -ENOMEM;
-				goto err;
-			}
-
-			seq_zones_bitmap = sd_zbc_setup_seq_zones_bitmap(sdkp,
-							zone_shift, nr_zones);
-			if (IS_ERR(seq_zones_bitmap)) {
-				ret = PTR_ERR(seq_zones_bitmap);
-				kfree(seq_zones_wlock);
-				goto err;
-			}
-		}
-		zone_bitmap_size = BITS_TO_LONGS(nr_zones) *
-			sizeof(unsigned long);
-		blk_mq_freeze_queue(q);
-		if (q->nr_zones != nr_zones) {
-			/* READ16/WRITE16 is mandatory for ZBC disks */
-			sdkp->device->use_16_for_rw = 1;
-			sdkp->device->use_10_for_rw = 0;
-
-			sdkp->zone_blocks = zone_blocks;
-			sdkp->zone_shift = zone_shift;
-			sdkp->nr_zones = nr_zones;
-			q->nr_zones = nr_zones;
-			swap(q->seq_zones_wlock, seq_zones_wlock);
-			swap(q->seq_zones_bitmap, seq_zones_bitmap);
-		} else if (memcmp(q->seq_zones_bitmap, seq_zones_bitmap,
-				  zone_bitmap_size) != 0) {
-			memcpy(q->seq_zones_bitmap, seq_zones_bitmap,
-			       zone_bitmap_size);
-		}
-		blk_mq_unfreeze_queue(q);
-		kfree(seq_zones_wlock);
-		kfree(seq_zones_bitmap);
-	}
-
-	return 0;
-
-err:
-	sd_zbc_cleanup(sdkp);
-	return ret;
-}
-
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
+	struct gendisk *disk = sdkp->disk;
+	unsigned int nr_zones;
 	int ret, zone_blocks;
 
 	if (!sd_is_zoned(sdkp))
@@ -634,24 +453,39 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	}
 
 	/* The drive satisfies the kernel restrictions: set it up */
-	ret = sd_zbc_setup(sdkp, zone_blocks);
-	if (ret)
-		goto err;
+	blk_queue_chunk_sectors(sdkp->disk->queue,
+			logical_to_sectors(sdkp->device, zone_blocks));
+	nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
+
+	/* READ16/WRITE16 is mandatory for ZBC disks */
+	sdkp->device->use_16_for_rw = 1;
+	sdkp->device->use_10_for_rw = 0;
+
+	/*
+	 * If something changed, revalidate the disk zone bitmaps once we have
+	 * the capacity, that is on the second revalidate execution during disk
+	 * scan and always during normal revalidate.
+	 */
+	if (sdkp->first_scan)
+		return 0;
+	if (sdkp->zone_blocks != zone_blocks ||
+	    sdkp->nr_zones != nr_zones ||
+	    disk->queue->nr_zones != nr_zones) {
+		ret = blk_revalidate_disk_zones(disk);
+		if (ret != 0)
+			goto err;
+		sdkp->zone_blocks = zone_blocks;
+		sdkp->nr_zones = nr_zones;
+	}
 
 	return 0;
 
 err:
 	sdkp->capacity = 0;
-	sd_zbc_cleanup(sdkp);
 
 	return ret;
 }
 
-void sd_zbc_remove(struct scsi_disk *sdkp)
-{
-	sd_zbc_cleanup(sdkp);
-}
-
 void sd_zbc_print_zones(struct scsi_disk *sdkp)
 {
 	if (!sd_is_zoned(sdkp) || !sdkp->capacity)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b5c75edfa8c7..36f79e202932 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -402,6 +402,7 @@ extern int blkdev_report_zones(struct block_device *bdev,
 			       unsigned int *nr_zones, gfp_t gfp_mask);
 extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
 			      sector_t nr_sectors, gfp_t gfp_mask);
+extern int blk_revalidate_disk_zones(struct gendisk *disk);
 
 extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 				     unsigned int cmd, unsigned long arg);
@@ -414,6 +415,12 @@ static inline unsigned int blkdev_nr_zones(struct block_device *bdev)
 {
 	return 0;
 }
+
+static inline int blk_revalidate_disk_zones(struct gendisk *disk)
+{
+	return 0;
+}
+
 static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
 					    fmode_t mode, unsigned int cmd,
 					    unsigned long arg)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/11] scsi: sd_zbc: Rearrange code
  2018-10-12  2:30 ` [PATCH v3 01/11] scsi: sd_zbc: Rearrange code Damien Le Moal
@ 2018-10-12  7:29     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:29 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Move the urswrz check out of sd_zbc_read_zones() and into
> sd_zbc_read_zoned_characteristics() where that value is obtained (read
> from the disk zoned characteristics VPD page). Since this function now
> does more than simply reading the VPD page, rename it to
> sd_zbc_check_zoned_characteristics().
> Also fix the error message displayed when reading that VPD page fails.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 39 +++++++++++++++++++--------------------
>   1 file changed, 19 insertions(+), 20 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/11] scsi: sd_zbc: Rearrange code
@ 2018-10-12  7:29     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:29 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Move the urswrz check out of sd_zbc_read_zones() and into
> sd_zbc_read_zoned_characteristics() where that value is obtained (read
> from the disk zoned characteristics VPD page). Since this function now
> does more than simply reading the VPD page, rename it to
> sd_zbc_check_zoned_characteristics().
> Also fix the error message displayed when reading that VPD page fails.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 39 +++++++++++++++++++--------------------
>   1 file changed, 19 insertions(+), 20 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
  2018-10-12  2:30 ` [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time Damien Le Moal
@ 2018-10-12  7:33     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:33 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Handling checks of ZBC device capacity using the max_lba field of the
> REPORT ZONES command reply for disks with rc_basis == 0 can be done
> using the same report zones command reply used to check the "same"
> field.
> 
> Avoid executing a report zones command solely to check the disk capacity
> by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and
> renaming that function to sd_zbc_check_zones(). This removes a costly
> execution of a full report zones command and so reduces device scan
> duration at boot time as well as the duration of disk revalidate calls.
> 
> Furthermore, setting the partial report bit in the REPORT ZONES command
> cdb can significantly reduce this command execution time as the device
> does not have to count and report the total number of zones that could
> be reported assuming a large enough reply buffer. A non-partial zone
> report is necessary only for the first execution of report zones used to
> check the same field value (to ensure that this value applies to all
> zones of the disk). All other calls to sd_zbc_report_zones() can use a
> partial report to reduce execution time.
> 
> Using a 14 TB ZBC disk, these simple changes reduce device scan time at
> boot from about 3.5s down to about 900ms. Disk revalidate times are also
> reduced from about 450ms down to 230ms.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------
>   1 file changed, 40 insertions(+), 54 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

... and it'll be your fault if there are drives which do not support the 
'partial' bit :)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
@ 2018-10-12  7:33     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:33 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Handling checks of ZBC device capacity using the max_lba field of the
> REPORT ZONES command reply for disks with rc_basis == 0 can be done
> using the same report zones command reply used to check the "same"
> field.
> 
> Avoid executing a report zones command solely to check the disk capacity
> by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and
> renaming that function to sd_zbc_check_zones(). This removes a costly
> execution of a full report zones command and so reduces device scan
> duration at boot time as well as the duration of disk revalidate calls.
> 
> Furthermore, setting the partial report bit in the REPORT ZONES command
> cdb can significantly reduce this command execution time as the device
> does not have to count and report the total number of zones that could
> be reported assuming a large enough reply buffer. A non-partial zone
> report is necessary only for the first execution of report zones used to
> check the same field value (to ensure that this value applies to all
> zones of the disk). All other calls to sd_zbc_report_zones() can use a
> partial report to reduce execution time.
> 
> Using a 14 TB ZBC disk, these simple changes reduce device scan time at
> boot from about 3.5s down to about 900ms. Disk revalidate times are also
> reduced from about 450ms down to 230ms.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------
>   1 file changed, 40 insertions(+), 54 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

... and it'll be your fault if there are drives which do not support the 
'partial' bit :)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
  2018-10-12  2:30 ` [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks Damien Le Moal
@ 2018-10-12  7:35     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:35 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> The 32 bits overflow check for the zone size value is already done
> within sd_zbc_check_zones() with the test:
> 
> } else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
> 
> so there is no need to check again for an out of range value in
> sd_zbc_read_zones(). Simplify the code and fix sd_zbc_check_zones()
> error return to -EFBIG instead of -ENODEV if the zone size is too large.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index ca73c46931c0..44b64b4a922a 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -373,7 +373,7 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
>    * Returns the zone size in number of blocks upon success or an error code
>    * upon failure.
>    */
> -static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
> +static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
>   {
>   	u64 zone_blocks = 0;
>   	sector_t max_lba, block = 0;
Does this have to be an 's32' ?
Seeing that it's overloaded with an error code, and never sent to any 
hardware, can't we just make it a simple 'int' ?

> @@ -472,7 +472,7 @@ static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
>   		if (sdkp->first_scan)
>   			sd_printk(KERN_NOTICE, sdkp,
>   				  "Zone size too large\n");
> -		ret = -ENODEV;
> +		ret = -EFBIG;
>   	} else {
>   		ret = zone_blocks;
>   	}
> @@ -668,8 +668,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
>   
>   int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
>   {
> -	int64_t zone_blocks;
> -	int ret;
> +	int ret, zone_blocks;
>   
>   	if (!sd_is_zoned(sdkp))
>   		/*
> @@ -688,12 +687,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
>   	 * an eventual last runt zone) that is a power of 2 are supported.
>   	 */
>   	zone_blocks = sd_zbc_check_zones(sdkp);
> -	ret = -EFBIG;
> -	if (zone_blocks != (u32)zone_blocks)
> -		goto err;
> -	ret = zone_blocks;
> -	if (ret < 0)
> +	if (zone_blocks < 0) {
> +		ret = zone_blocks;
>   		goto err;
> +	}
>   
>   	/* The drive satisfies the kernel restrictions: set it up */
>   	ret = sd_zbc_setup(sdkp, zone_blocks);
> 
Remaining bits are okay.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
@ 2018-10-12  7:35     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:35 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> The 32 bits overflow check for the zone size value is already done
> within sd_zbc_check_zones() with the test:
> 
> } else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
> 
> so there is no need to check again for an out of range value in
> sd_zbc_read_zones(). Simplify the code and fix sd_zbc_check_zones()
> error return to -EFBIG instead of -ENODEV if the zone size is too large.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd_zbc.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index ca73c46931c0..44b64b4a922a 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -373,7 +373,7 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
>    * Returns the zone size in number of blocks upon success or an error code
>    * upon failure.
>    */
> -static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
> +static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
>   {
>   	u64 zone_blocks = 0;
>   	sector_t max_lba, block = 0;
Does this have to be an 's32' ?
Seeing that it's overloaded with an error code, and never sent to any 
hardware, can't we just make it a simple 'int' ?

> @@ -472,7 +472,7 @@ static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
>   		if (sdkp->first_scan)
>   			sd_printk(KERN_NOTICE, sdkp,
>   				  "Zone size too large\n");
> -		ret = -ENODEV;
> +		ret = -EFBIG;
>   	} else {
>   		ret = zone_blocks;
>   	}
> @@ -668,8 +668,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
>   
>   int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
>   {
> -	int64_t zone_blocks;
> -	int ret;
> +	int ret, zone_blocks;
>   
>   	if (!sd_is_zoned(sdkp))
>   		/*
> @@ -688,12 +687,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
>   	 * an eventual last runt zone) that is a power of 2 are supported.
>   	 */
>   	zone_blocks = sd_zbc_check_zones(sdkp);
> -	ret = -EFBIG;
> -	if (zone_blocks != (u32)zone_blocks)
> -		goto err;
> -	ret = zone_blocks;
> -	if (ret < 0)
> +	if (zone_blocks < 0) {
> +		ret = zone_blocks;
>   		goto err;
> +	}
>   
>   	/* The drive satisfies the kernel restrictions: set it up */
>   	ret = sd_zbc_setup(sdkp, zone_blocks);
> 
Remaining bits are okay.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper
  2018-10-12  2:30 ` [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper Damien Le Moal
@ 2018-10-12  7:36     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:36 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Introduce the blkdev_nr_zones() helper function to get the total
> number of zones of a zoned block device. This number is always 0 for a
> regular block device (q->limits.zoned == BLK_ZONED_NONE case).
> 
> Replace hard-coded number of zones calculation in dmz_get_zoned_device()
> with a call to this helper.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-zoned.c            | 27 +++++++++++++++++++++++++++
>   drivers/md/dm-zoned-target.c |  3 +--
>   include/linux/blkdev.h       |  5 +++++
>   3 files changed, 33 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper
@ 2018-10-12  7:36     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:36 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Introduce the blkdev_nr_zones() helper function to get the total
> number of zones of a zoned block device. This number is always 0 for a
> regular block device (q->limits.zoned == BLK_ZONED_NONE case).
> 
> Replace hard-coded number of zones calculation in dmz_get_zoned_device()
> with a call to this helper.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-zoned.c            | 27 +++++++++++++++++++++++++++
>   drivers/md/dm-zoned-target.c |  3 +--
>   include/linux/blkdev.h       |  5 +++++
>   3 files changed, 33 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones
  2018-10-12  2:30 ` [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones Damien Le Moal
@ 2018-10-12  7:36     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:36 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> There is no point in allocating more zone descriptors than the number of
> zones a block device has for doing a zone report. Avoid doing that in
> blkdev_report_zones_ioctl() by limiting the number of zone decriptors
> allocated internally to process the user request.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-zoned.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones
@ 2018-10-12  7:36     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:36 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> There is no point in allocating more zone descriptors than the number of
> zones a block device has for doing a zone report. Avoid doing that in
> blkdev_report_zones_ioctl() by limiting the number of zone decriptors
> allocated internally to process the user request.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-zoned.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl
  2018-10-12  2:30 ` [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl Damien Le Moal
@ 2018-10-12  7:37     ` Hannes Reinecke
  2018-10-12  8:23     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Get a zoned block device zone size in number of 512 B sectors.
> The zone size is always 0 for regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/ioctl.c                 | 2 ++
>   include/uapi/linux/blkzoned.h | 2 ++
>   2 files changed, 4 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl
@ 2018-10-12  7:37     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Get a zoned block device zone size in number of 512 B sectors.
> The zone size is always 0 for regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/ioctl.c                 | 2 ++
>   include/uapi/linux/blkzoned.h | 2 ++
>   2 files changed, 4 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl
  2018-10-12  2:30 ` [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl Damien Le Moal
@ 2018-10-12  7:37     ` Hannes Reinecke
  2018-10-12  8:23     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Get a zoned block device total number of zones. The device can be a
> partition of the whole device. The number of zones is always 0 for
> regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/ioctl.c                 | 2 ++
>   include/uapi/linux/blkzoned.h | 1 +
>   2 files changed, 3 insertions(+)
> 
And that's us, trying to get rid of ioctls ...

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl
@ 2018-10-12  7:37     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Get a zoned block device total number of zones. The device can be a
> partition of the whole device. The number of zones is always 0 for
> regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/ioctl.c                 | 2 ++
>   include/uapi/linux/blkzoned.h | 1 +
>   2 files changed, 3 insertions(+)
> 
And that's us, trying to get rid of ioctls ...

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/11] block: Improve zone reset execution
  2018-10-12  2:30 ` [PATCH v3 08/11] block: Improve zone reset execution Damien Le Moal
@ 2018-10-12  7:39     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:39 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
> necessary to reset a range of zones. Similarly to what is done for
> discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
> executed asynchronously and a synchronous call done only for the last
> BIO of the chain.
> 
> Modify blkdev_reset_zones() to operate similarly to
> blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
> avoid code duplication of that function in blk_zoned.c, rename
> next_bio() into blk_next_bio() and declare it as a block internal
> function in blk.h.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-lib.c   | 13 ++++++-------
>   block/blk-zoned.c | 29 ++++++++++++++++-------------
>   block/blk.h       |  2 ++
>   3 files changed, 24 insertions(+), 20 deletions(-)
> 
Neat.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/11] block: Improve zone reset execution
@ 2018-10-12  7:39     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:39 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
> necessary to reset a range of zones. Similarly to what is done for
> discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
> executed asynchronously and a synchronous call done only for the last
> BIO of the chain.
> 
> Modify blkdev_reset_zones() to operate similarly to
> blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
> avoid code duplication of that function in blk_zoned.c, rename
> next_bio() into blk_next_bio() and declare it as a block internal
> function in blk.h.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-lib.c   | 13 ++++++-------
>   block/blk-zoned.c | 29 ++++++++++++++++-------------
>   block/blk.h       |  2 ++
>   3 files changed, 24 insertions(+), 20 deletions(-)
> 
Neat.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  2:30 ` [PATCH v3 09/11] block: Expose queue nr_zones in sysfs Damien Le Moal
@ 2018-10-12  7:41     ` Hannes Reinecke
  2018-10-12  8:29     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Expose through sysfs the nr_zones field of struct request_queue.
> Exposing this value helps in debugging disk issues as well as
> facilitating scripts based use of the disk (e.g. blktests).
> 
> For zoned block devices, the nr_zones field indicates the total number
> of zones of the device calculated using the known disk capacity and
> zone size. This number of zones is always 0 for regular block devices.
> 
> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
> introduce the blk_queue_nr_zones() function to return the correct value
> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-sysfs.c      | 11 +++++++++++
>   include/linux/blkdev.h | 10 ++++++++++
>   2 files changed, 21 insertions(+)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 3772671cf2bc..92be8092ca4f 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>   	}
>   }
>   
> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(blk_queue_nr_zones(q), page);
> +}
> +
>   static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>   {
>   	return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>   	.show = queue_zoned_show,
>   };
>   
> +static struct queue_sysfs_entry queue_nr_zones_entry = {
> +	.attr = {.name = "nr_zones", .mode = 0444 },
> +	.show = queue_nr_zones_show,
> +};
> +
>   static struct queue_sysfs_entry queue_nomerges_entry = {
>   	.attr = {.name = "nomerges", .mode = 0644 },
>   	.show = queue_nomerges_show,
> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>   	&queue_write_zeroes_max_entry.attr,
>   	&queue_nonrot_entry.attr,
>   	&queue_zoned_entry.attr,
> +	&queue_nr_zones_entry.attr,
>   	&queue_nomerges_entry.attr,
>   	&queue_rq_affinity_entry.attr,
>   	&queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c24969b1741b..879753b10263 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
>   }
>   
>   #ifdef CONFIG_BLK_DEV_ZONED
> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
> +{
> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
> +}
> +
>   static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>   					     sector_t sector)
>   {
> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>   		return false;
>   	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
>   }
> +#else /* CONFIG_BLK_DEV_ZONED */
> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
> +{
> +	return 0;
> +}
>   #endif /* CONFIG_BLK_DEV_ZONED */
>   
>   static inline bool rq_is_sync(struct request *rq)
> 
Actually, we should be checking whether we can't blank out this 
attribute via the is_visible mechanism; after all, not every block 
device is zoned, and those which are not have no need of the attribute...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
@ 2018-10-12  7:41     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Expose through sysfs the nr_zones field of struct request_queue.
> Exposing this value helps in debugging disk issues as well as
> facilitating scripts based use of the disk (e.g. blktests).
> 
> For zoned block devices, the nr_zones field indicates the total number
> of zones of the device calculated using the known disk capacity and
> zone size. This number of zones is always 0 for regular block devices.
> 
> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
> introduce the blk_queue_nr_zones() function to return the correct value
> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-sysfs.c      | 11 +++++++++++
>   include/linux/blkdev.h | 10 ++++++++++
>   2 files changed, 21 insertions(+)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 3772671cf2bc..92be8092ca4f 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>   	}
>   }
>   
> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(blk_queue_nr_zones(q), page);
> +}
> +
>   static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>   {
>   	return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>   	.show = queue_zoned_show,
>   };
>   
> +static struct queue_sysfs_entry queue_nr_zones_entry = {
> +	.attr = {.name = "nr_zones", .mode = 0444 },
> +	.show = queue_nr_zones_show,
> +};
> +
>   static struct queue_sysfs_entry queue_nomerges_entry = {
>   	.attr = {.name = "nomerges", .mode = 0644 },
>   	.show = queue_nomerges_show,
> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>   	&queue_write_zeroes_max_entry.attr,
>   	&queue_nonrot_entry.attr,
>   	&queue_zoned_entry.attr,
> +	&queue_nr_zones_entry.attr,
>   	&queue_nomerges_entry.attr,
>   	&queue_rq_affinity_entry.attr,
>   	&queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c24969b1741b..879753b10263 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
>   }
>   
>   #ifdef CONFIG_BLK_DEV_ZONED
> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
> +{
> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
> +}
> +
>   static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>   					     sector_t sector)
>   {
> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>   		return false;
>   	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
>   }
> +#else /* CONFIG_BLK_DEV_ZONED */
> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
> +{
> +	return 0;
> +}
>   #endif /* CONFIG_BLK_DEV_ZONED */
>   
>   static inline bool rq_is_sync(struct request *rq)
> 
Actually, we should be checking whether we can't blank out this 
attribute via the is_visible mechanism; after all, not every block 
device is zoned, and those which are not have no need of the attribute...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/11] block: add a report_zones method
  2018-10-12  2:30 ` [PATCH v3 10/11] block: add a report_zones method Damien Le Moal
@ 2018-10-12  7:42     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:42 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Dispatching a report zones command through the request queue is a major
> pain due to the command reply payload rewriting necessary. Given that
> blkdev_report_zones() is executing everything synchronously, implement
> report zones as a block device file operation instead, allowing major
> simplification of the code in many places.
> 
> sd, null-blk, dm-linear and dm-flakey being the only block device
> drivers supporting exposing zoned block devices, these drivers are
> modified to provide the device side implementation of the
> report_zones() block device file operation.
> 
> For device mappers, a new report_zones() target type operation is
> defined so that the upper block layer calls blkdev_report_zones() can
> be propagated down to the underlying devices of the dm targets.
> Implementation for this new operation is added to the dm-linear and
> dm-flakey targets.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [Damien]
> * Changed method block_device argument to gendisk
> * Various bug fixes and improvements
> * Added support for null_blk, dm-linear and dm-flakey.
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-core.c               |   1 -
>   block/blk-mq-debugfs.c         |   1 -
>   block/blk-zoned.c              | 164 ++++++++++----------------------
>   drivers/block/null_blk.h       |  11 ++-
>   drivers/block/null_blk_main.c  |  23 +----
>   drivers/block/null_blk_zoned.c |  57 +++--------
>   drivers/md/dm-flakey.c         |  30 ++++--
>   drivers/md/dm-linear.c         |  35 ++++---
>   drivers/md/dm.c                | 169 ++++++++++++++++-----------------
>   drivers/scsi/sd.c              |  13 +--
>   drivers/scsi/sd.h              |  11 +--
>   drivers/scsi/sd_zbc.c          | 153 +++++++++--------------------
>   include/linux/blk_types.h      |   2 -
>   include/linux/blkdev.h         |   8 +-
>   include/linux/device-mapper.h  |  12 ++-
>   include/trace/events/f2fs.h    |   1 -
>   16 files changed, 266 insertions(+), 425 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/11] block: add a report_zones method
@ 2018-10-12  7:42     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:42 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Dispatching a report zones command through the request queue is a major
> pain due to the command reply payload rewriting necessary. Given that
> blkdev_report_zones() is executing everything synchronously, implement
> report zones as a block device file operation instead, allowing major
> simplification of the code in many places.
> 
> sd, null-blk, dm-linear and dm-flakey being the only block device
> drivers supporting exposing zoned block devices, these drivers are
> modified to provide the device side implementation of the
> report_zones() block device file operation.
> 
> For device mappers, a new report_zones() target type operation is
> defined so that the upper block layer calls blkdev_report_zones() can
> be propagated down to the underlying devices of the dm targets.
> Implementation for this new operation is added to the dm-linear and
> dm-flakey targets.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [Damien]
> * Changed method block_device argument to gendisk
> * Various bug fixes and improvements
> * Added support for null_blk, dm-linear and dm-flakey.
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-core.c               |   1 -
>   block/blk-mq-debugfs.c         |   1 -
>   block/blk-zoned.c              | 164 ++++++++++----------------------
>   drivers/block/null_blk.h       |  11 ++-
>   drivers/block/null_blk_main.c  |  23 +----
>   drivers/block/null_blk_zoned.c |  57 +++--------
>   drivers/md/dm-flakey.c         |  30 ++++--
>   drivers/md/dm-linear.c         |  35 ++++---
>   drivers/md/dm.c                | 169 ++++++++++++++++-----------------
>   drivers/scsi/sd.c              |  13 +--
>   drivers/scsi/sd.h              |  11 +--
>   drivers/scsi/sd_zbc.c          | 153 +++++++++--------------------
>   include/linux/blk_types.h      |   2 -
>   include/linux/blkdev.h         |   8 +-
>   include/linux/device-mapper.h  |  12 ++-
>   include/trace/events/f2fs.h    |   1 -
>   16 files changed, 266 insertions(+), 425 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones()
  2018-10-12  2:30 ` [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones() Damien Le Moal
@ 2018-10-12  7:44     ` Hannes Reinecke
  2018-10-12  8:30     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:44 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Drivers exposing zoned block devices have to initialize and maintain
> correctness (i.e. revalidate) of the device zone bitmaps attached to
> the device request queue (seq_zones_bitmap and seq_zones_wlock).
> 
> To simplify coding this, introduce a generic helper function
> blk_revalidate_disk_zones() suitable for most (and likely all) cases.
> This new function always update the seq_zones_bitmap and seq_zones_wlock
> bitmaps as well as the queue nr_zones field when called for a disk
> using a request based queue. For a disk using a BIO based queue, only
> the number of zones is updated since these queues do not have
> schedulers and so do not need the zone bitmaps.
> 
> With this change, the zone bitmap initialization code in sd_zbc.c can be
> replaced with a call to this function in sd_zbc_read_zones(), which is
> called from the disk revalidate block operation method.
> 
> A call to blk_revalidate_disk_zones() is also added to the null_blk
> driver for devices created with the zoned mode enabled.
> 
> Finally, to ensure that zoned devices created with dm-linear or
> dm-flakey expose the correct number of zones through sysfs, a call to
> blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
> 
> The zone bitmaps allocated and initialized with
> blk_revalidate_disk_zones() are freed automatically from
> __blk_release_queue() using the block internal function
> blk_queue_free_zone_bitmaps().
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-sysfs.c             |   2 +
>   block/blk-zoned.c             | 136 +++++++++++++++++++++
>   block/blk.h                   |   6 +
>   drivers/block/null_blk_main.c |   7 ++
>   drivers/md/dm-table.c         |  10 ++
>   drivers/scsi/sd.c             |   2 -
>   drivers/scsi/sd.h             |   4 -
>   drivers/scsi/sd_zbc.c         | 218 ++++------------------------------
>   include/linux/blkdev.h        |   7 ++
>   9 files changed, 194 insertions(+), 198 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones()
@ 2018-10-12  7:44     ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  7:44 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 4:30 AM, Damien Le Moal wrote:
> Drivers exposing zoned block devices have to initialize and maintain
> correctness (i.e. revalidate) of the device zone bitmaps attached to
> the device request queue (seq_zones_bitmap and seq_zones_wlock).
> 
> To simplify coding this, introduce a generic helper function
> blk_revalidate_disk_zones() suitable for most (and likely all) cases.
> This new function always update the seq_zones_bitmap and seq_zones_wlock
> bitmaps as well as the queue nr_zones field when called for a disk
> using a request based queue. For a disk using a BIO based queue, only
> the number of zones is updated since these queues do not have
> schedulers and so do not need the zone bitmaps.
> 
> With this change, the zone bitmap initialization code in sd_zbc.c can be
> replaced with a call to this function in sd_zbc_read_zones(), which is
> called from the disk revalidate block operation method.
> 
> A call to blk_revalidate_disk_zones() is also added to the null_blk
> driver for devices created with the zoned mode enabled.
> 
> Finally, to ensure that zoned devices created with dm-linear or
> dm-flakey expose the correct number of zones through sysfs, a call to
> blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
> 
> The zone bitmaps allocated and initialized with
> blk_revalidate_disk_zones() are freed automatically from
> __blk_release_queue() using the block internal function
> blk_queue_free_zone_bitmaps().
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/blk-sysfs.c             |   2 +
>   block/blk-zoned.c             | 136 +++++++++++++++++++++
>   block/blk.h                   |   6 +
>   drivers/block/null_blk_main.c |   7 ++
>   drivers/md/dm-table.c         |  10 ++
>   drivers/scsi/sd.c             |   2 -
>   drivers/scsi/sd.h             |   4 -
>   drivers/scsi/sd_zbc.c         | 218 ++++------------------------------
>   include/linux/blkdev.h        |   7 ++
>   9 files changed, 194 insertions(+), 198 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
  2018-10-12  7:33     ` Hannes Reinecke
@ 2018-10-12  7:48       ` Damien Le Moal
  -1 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:48 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:33, Hannes Reinecke wrote:=0A=
> On 10/12/18 4:30 AM, Damien Le Moal wrote:=0A=
>> Handling checks of ZBC device capacity using the max_lba field of the=0A=
>> REPORT ZONES command reply for disks with rc_basis =3D=3D 0 can be done=
=0A=
>> using the same report zones command reply used to check the "same"=0A=
>> field.=0A=
>>=0A=
>> Avoid executing a report zones command solely to check the disk capacity=
=0A=
>> by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and=0A=
>> renaming that function to sd_zbc_check_zones(). This removes a costly=0A=
>> execution of a full report zones command and so reduces device scan=0A=
>> duration at boot time as well as the duration of disk revalidate calls.=
=0A=
>>=0A=
>> Furthermore, setting the partial report bit in the REPORT ZONES command=
=0A=
>> cdb can significantly reduce this command execution time as the device=
=0A=
>> does not have to count and report the total number of zones that could=
=0A=
>> be reported assuming a large enough reply buffer. A non-partial zone=0A=
>> report is necessary only for the first execution of report zones used to=
=0A=
>> check the same field value (to ensure that this value applies to all=0A=
>> zones of the disk). All other calls to sd_zbc_report_zones() can use a=
=0A=
>> partial report to reduce execution time.=0A=
>>=0A=
>> Using a 14 TB ZBC disk, these simple changes reduce device scan time at=
=0A=
>> boot from about 3.5s down to about 900ms. Disk revalidate times are also=
=0A=
>> reduced from about 450ms down to 230ms.=0A=
>>=0A=
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>> Reviewed-by: Christoph Hellwig <hch@lst.de>=0A=
>> ---=0A=
>>   drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------=
=0A=
>>   1 file changed, 40 insertions(+), 54 deletions(-)=0A=
>>=0A=
> Reviewed-by: Hannes Reinecke <hare@suse.com>=0A=
> =0A=
> ... and it'll be your fault if there are drives which do not support the =
=0A=
> 'partial' bit :)=0A=
=0A=
The partial bit definition is in the ZBC/ZAC specs and it is not optional. =
If a=0A=
drive does not support it, it would be out of specs :)=0A=
=0A=
> =0A=
> Cheers,=0A=
> =0A=
> Hannes=0A=
> =0A=
=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
@ 2018-10-12  7:48       ` Damien Le Moal
  0 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:48 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:33, Hannes Reinecke wrote:
> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>> Handling checks of ZBC device capacity using the max_lba field of the
>> REPORT ZONES command reply for disks with rc_basis == 0 can be done
>> using the same report zones command reply used to check the "same"
>> field.
>>
>> Avoid executing a report zones command solely to check the disk capacity
>> by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and
>> renaming that function to sd_zbc_check_zones(). This removes a costly
>> execution of a full report zones command and so reduces device scan
>> duration at boot time as well as the duration of disk revalidate calls.
>>
>> Furthermore, setting the partial report bit in the REPORT ZONES command
>> cdb can significantly reduce this command execution time as the device
>> does not have to count and report the total number of zones that could
>> be reported assuming a large enough reply buffer. A non-partial zone
>> report is necessary only for the first execution of report zones used to
>> check the same field value (to ensure that this value applies to all
>> zones of the disk). All other calls to sd_zbc_report_zones() can use a
>> partial report to reduce execution time.
>>
>> Using a 14 TB ZBC disk, these simple changes reduce device scan time at
>> boot from about 3.5s down to about 900ms. Disk revalidate times are also
>> reduced from about 450ms down to 230ms.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------
>>   1 file changed, 40 insertions(+), 54 deletions(-)
>>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> 
> ... and it'll be your fault if there are drives which do not support the 
> 'partial' bit :)

The partial bit definition is in the ZBC/ZAC specs and it is not optional. If a
drive does not support it, it would be out of specs :)

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
  2018-10-12  7:35     ` Hannes Reinecke
@ 2018-10-12  7:50       ` Damien Le Moal
  -1 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:50 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:35, Hannes Reinecke wrote:=0A=
> On 10/12/18 4:30 AM, Damien Le Moal wrote:=0A=
>> The 32 bits overflow check for the zone size value is already done=0A=
>> within sd_zbc_check_zones() with the test:=0A=
>>=0A=
>> } else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {=
=0A=
>>=0A=
>> so there is no need to check again for an out of range value in=0A=
>> sd_zbc_read_zones(). Simplify the code and fix sd_zbc_check_zones()=0A=
>> error return to -EFBIG instead of -ENODEV if the zone size is too large.=
=0A=
>>=0A=
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>> Reviewed-by: Christoph Hellwig <hch@lst.de>=0A=
>> ---=0A=
>>   drivers/scsi/sd_zbc.c | 15 ++++++---------=0A=
>>   1 file changed, 6 insertions(+), 9 deletions(-)=0A=
>>=0A=
>> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c=0A=
>> index ca73c46931c0..44b64b4a922a 100644=0A=
>> --- a/drivers/scsi/sd_zbc.c=0A=
>> +++ b/drivers/scsi/sd_zbc.c=0A=
>> @@ -373,7 +373,7 @@ static int sd_zbc_check_zoned_characteristics(struct=
 scsi_disk *sdkp,=0A=
>>    * Returns the zone size in number of blocks upon success or an error =
code=0A=
>>    * upon failure.=0A=
>>    */=0A=
>> -static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)=0A=
>> +static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)=0A=
>>   {=0A=
>>   	u64 zone_blocks =3D 0;=0A=
>>   	sector_t max_lba, block =3D 0;=0A=
> Does this have to be an 's32' ?=0A=
> Seeing that it's overloaded with an error code, and never sent to any =0A=
> hardware, can't we just make it a simple 'int' ?=0A=
=0A=
OK. Will do.=0A=
=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks
@ 2018-10-12  7:50       ` Damien Le Moal
  0 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:50 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:35, Hannes Reinecke wrote:
> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>> The 32 bits overflow check for the zone size value is already done
>> within sd_zbc_check_zones() with the test:
>>
>> } else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
>>
>> so there is no need to check again for an out of range value in
>> sd_zbc_read_zones(). Simplify the code and fix sd_zbc_check_zones()
>> error return to -EFBIG instead of -ENODEV if the zone size is too large.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/scsi/sd_zbc.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
>> index ca73c46931c0..44b64b4a922a 100644
>> --- a/drivers/scsi/sd_zbc.c
>> +++ b/drivers/scsi/sd_zbc.c
>> @@ -373,7 +373,7 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
>>    * Returns the zone size in number of blocks upon success or an error code
>>    * upon failure.
>>    */
>> -static s64 sd_zbc_check_zones(struct scsi_disk *sdkp)
>> +static s32 sd_zbc_check_zones(struct scsi_disk *sdkp)
>>   {
>>   	u64 zone_blocks = 0;
>>   	sector_t max_lba, block = 0;
> Does this have to be an 's32' ?
> Seeing that it's overloaded with an error code, and never sent to any 
> hardware, can't we just make it a simple 'int' ?

OK. Will do.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  7:41     ` Hannes Reinecke
@ 2018-10-12  7:55       ` Damien Le Moal
  -1 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:55 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:41, Hannes Reinecke wrote:=0A=
> On 10/12/18 4:30 AM, Damien Le Moal wrote:=0A=
>> Expose through sysfs the nr_zones field of struct request_queue.=0A=
>> Exposing this value helps in debugging disk issues as well as=0A=
>> facilitating scripts based use of the disk (e.g. blktests).=0A=
>>=0A=
>> For zoned block devices, the nr_zones field indicates the total number=
=0A=
>> of zones of the device calculated using the known disk capacity and=0A=
>> zone size. This number of zones is always 0 for regular block devices.=
=0A=
>>=0A=
>> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,=0A=
>> introduce the blk_queue_nr_zones() function to return the correct value=
=0A=
>> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.=0A=
>>=0A=
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>> ---=0A=
>>   block/blk-sysfs.c      | 11 +++++++++++=0A=
>>   include/linux/blkdev.h | 10 ++++++++++=0A=
>>   2 files changed, 21 insertions(+)=0A=
>>=0A=
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c=0A=
>> index 3772671cf2bc..92be8092ca4f 100644=0A=
>> --- a/block/blk-sysfs.c=0A=
>> +++ b/block/blk-sysfs.c=0A=
>> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queu=
e *q, char *page)=0A=
>>   	}=0A=
>>   }=0A=
>>   =0A=
>> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)=
=0A=
>> +{=0A=
>> +	return queue_var_show(blk_queue_nr_zones(q), page);=0A=
>> +}=0A=
>> +=0A=
>>   static ssize_t queue_nomerges_show(struct request_queue *q, char *page=
)=0A=
>>   {=0A=
>>   	return queue_var_show((blk_queue_nomerges(q) << 1) |=0A=
>> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry =
=3D {=0A=
>>   	.show =3D queue_zoned_show,=0A=
>>   };=0A=
>>   =0A=
>> +static struct queue_sysfs_entry queue_nr_zones_entry =3D {=0A=
>> +	.attr =3D {.name =3D "nr_zones", .mode =3D 0444 },=0A=
>> +	.show =3D queue_nr_zones_show,=0A=
>> +};=0A=
>> +=0A=
>>   static struct queue_sysfs_entry queue_nomerges_entry =3D {=0A=
>>   	.attr =3D {.name =3D "nomerges", .mode =3D 0644 },=0A=
>>   	.show =3D queue_nomerges_show,=0A=
>> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] =3D {=0A=
>>   	&queue_write_zeroes_max_entry.attr,=0A=
>>   	&queue_nonrot_entry.attr,=0A=
>>   	&queue_zoned_entry.attr,=0A=
>> +	&queue_nr_zones_entry.attr,=0A=
>>   	&queue_nomerges_entry.attr,=0A=
>>   	&queue_rq_affinity_entry.attr,=0A=
>>   	&queue_iostats_entry.attr,=0A=
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h=0A=
>> index c24969b1741b..879753b10263 100644=0A=
>> --- a/include/linux/blkdev.h=0A=
>> +++ b/include/linux/blkdev.h=0A=
>> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(s=
truct request_queue *q)=0A=
>>   }=0A=
>>   =0A=
>>   #ifdef CONFIG_BLK_DEV_ZONED=0A=
>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)=
=0A=
>> +{=0A=
>> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;=0A=
>> +}=0A=
>> +=0A=
>>   static inline unsigned int blk_queue_zone_no(struct request_queue *q,=
=0A=
>>   					     sector_t sector)=0A=
>>   {=0A=
>> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct req=
uest_queue *q,=0A=
>>   		return false;=0A=
>>   	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);=
=0A=
>>   }=0A=
>> +#else /* CONFIG_BLK_DEV_ZONED */=0A=
>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)=
=0A=
>> +{=0A=
>> +	return 0;=0A=
>> +}=0A=
>>   #endif /* CONFIG_BLK_DEV_ZONED */=0A=
>>   =0A=
>>   static inline bool rq_is_sync(struct request *rq)=0A=
>>=0A=
> Actually, we should be checking whether we can't blank out this =0A=
> attribute via the is_visible mechanism; after all, not every block =0A=
> device is zoned, and those which are not have no need of the attribute...=
=0A=
=0A=
Yes we could. But it is somewhat the same as drives that do not support tri=
m:=0A=
they get max_discard_sectors to 0. The attribute says "not supported becaus=
e the=0A=
value is 0". I followed the same principle for nr_zones: value of 0 says "n=
o=0A=
zones", hopefully in sync with "zoned" saying "none". And if these 2 are no=
t in=0A=
sync as such, then we have a defective drive or a bug. So actually always h=
aving=0A=
nr_zones is useful I think.=0A=
=0A=
But I can make it invisible for regular disks if you insist :)=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
@ 2018-10-12  7:55       ` Damien Le Moal
  0 siblings, 0 replies; 54+ messages in thread
From: Damien Le Moal @ 2018-10-12  7:55 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 2018/10/12 16:41, Hannes Reinecke wrote:
> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>> Expose through sysfs the nr_zones field of struct request_queue.
>> Exposing this value helps in debugging disk issues as well as
>> facilitating scripts based use of the disk (e.g. blktests).
>>
>> For zoned block devices, the nr_zones field indicates the total number
>> of zones of the device calculated using the known disk capacity and
>> zone size. This number of zones is always 0 for regular block devices.
>>
>> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
>> introduce the blk_queue_nr_zones() function to return the correct value
>> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   block/blk-sysfs.c      | 11 +++++++++++
>>   include/linux/blkdev.h | 10 ++++++++++
>>   2 files changed, 21 insertions(+)
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 3772671cf2bc..92be8092ca4f 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>>   	}
>>   }
>>   
>> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
>> +{
>> +	return queue_var_show(blk_queue_nr_zones(q), page);
>> +}
>> +
>>   static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>>   {
>>   	return queue_var_show((blk_queue_nomerges(q) << 1) |
>> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>>   	.show = queue_zoned_show,
>>   };
>>   
>> +static struct queue_sysfs_entry queue_nr_zones_entry = {
>> +	.attr = {.name = "nr_zones", .mode = 0444 },
>> +	.show = queue_nr_zones_show,
>> +};
>> +
>>   static struct queue_sysfs_entry queue_nomerges_entry = {
>>   	.attr = {.name = "nomerges", .mode = 0644 },
>>   	.show = queue_nomerges_show,
>> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>>   	&queue_write_zeroes_max_entry.attr,
>>   	&queue_nonrot_entry.attr,
>>   	&queue_zoned_entry.attr,
>> +	&queue_nr_zones_entry.attr,
>>   	&queue_nomerges_entry.attr,
>>   	&queue_rq_affinity_entry.attr,
>>   	&queue_iostats_entry.attr,
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index c24969b1741b..879753b10263 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
>>   }
>>   
>>   #ifdef CONFIG_BLK_DEV_ZONED
>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>> +{
>> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
>> +}
>> +
>>   static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>>   					     sector_t sector)
>>   {
>> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>>   		return false;
>>   	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
>>   }
>> +#else /* CONFIG_BLK_DEV_ZONED */
>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>> +{
>> +	return 0;
>> +}
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>   
>>   static inline bool rq_is_sync(struct request *rq)
>>
> Actually, we should be checking whether we can't blank out this 
> attribute via the is_visible mechanism; after all, not every block 
> device is zoned, and those which are not have no need of the attribute...

Yes we could. But it is somewhat the same as drives that do not support trim:
they get max_discard_sectors to 0. The attribute says "not supported because the
value is 0". I followed the same principle for nr_zones: value of 0 says "no
zones", hopefully in sync with "zoned" saying "none". And if these 2 are not in
sync as such, then we have a defective drive or a bug. So actually always having
nr_zones is useful I think.

But I can make it invisible for regular disks if you insist :)

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl
  2018-10-12  2:30 ` [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl Damien Le Moal
@ 2018-10-12  8:23     ` Christoph Hellwig
  2018-10-12  8:23     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:07AM +0900, Damien Le Moal wrote:
> Get a zoned block device zone size in number of 512 B sectors.
> The zone size is always 0 for regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl
@ 2018-10-12  8:23     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, Martin K . Petersen, Mike Snitzer, linux-block,
	dm-devel, linux-scsi, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:07AM +0900, Damien Le Moal wrote:
> Get a zoned block device zone size in number of 512 B sectors.
> The zone size is always 0 for regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl
  2018-10-12  2:30 ` [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl Damien Le Moal
@ 2018-10-12  8:23     ` Christoph Hellwig
  2018-10-12  8:23     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:08AM +0900, Damien Le Moal wrote:
> Get a zoned block device total number of zones. The device can be a
> partition of the whole device. The number of zones is always 0 for
> regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl
@ 2018-10-12  8:23     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, Martin K . Petersen, Mike Snitzer, linux-block,
	dm-devel, linux-scsi, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:08AM +0900, Damien Le Moal wrote:
> Get a zoned block device total number of zones. The device can be a
> partition of the whole device. The number of zones is always 0 for
> regular block devices.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  7:41     ` Hannes Reinecke
@ 2018-10-12  8:28       ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, Christoph Hellwig,
	Matias Bjorling

On Fri, Oct 12, 2018 at 09:41:28AM +0200, Hannes Reinecke wrote:
>> +{
>> +	return 0;
>> +}
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>     static inline bool rq_is_sync(struct request *rq)
>>
> Actually, we should be checking whether we can't blank out this attribute 
> via the is_visible mechanism; after all, not every block device is zoned, 
> and those which are not have no need of the attribute...

The answer would be 0 in this case.  I'm not sure blanking out is worth
it as it will create a lot more code for very little gain..

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
@ 2018-10-12  8:28       ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, Damien Le Moal, linux-scsi, Martin K . Petersen,
	Mike Snitzer, linux-block, dm-devel, Christoph Hellwig,
	Matias Bjorling

On Fri, Oct 12, 2018 at 09:41:28AM +0200, Hannes Reinecke wrote:
>> +{
>> +	return 0;
>> +}
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>     static inline bool rq_is_sync(struct request *rq)
>>
> Actually, we should be checking whether we can't blank out this attribute 
> via the is_visible mechanism; after all, not every block device is zoned, 
> and those which are not have no need of the attribute...

The answer would be 0 in this case.  I'm not sure blanking out is worth
it as it will create a lot more code for very little gain..

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  2:30 ` [PATCH v3 09/11] block: Expose queue nr_zones in sysfs Damien Le Moal
@ 2018-10-12  8:29     ` Christoph Hellwig
  2018-10-12  8:29     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:29 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:10AM +0900, Damien Le Moal wrote:
> Expose through sysfs the nr_zones field of struct request_queue.
> Exposing this value helps in debugging disk issues as well as
> facilitating scripts based use of the disk (e.g. blktests).
> 
> For zoned block devices, the nr_zones field indicates the total number
> of zones of the device calculated using the known disk capacity and
> zone size. This number of zones is always 0 for regular block devices.
> 
> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
> introduce the blk_queue_nr_zones() function to return the correct value
> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
@ 2018-10-12  8:29     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:29 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, Martin K . Petersen, Mike Snitzer, linux-block,
	dm-devel, linux-scsi, Christoph Hellwig, Matias Bjorling

On Fri, Oct 12, 2018 at 11:30:10AM +0900, Damien Le Moal wrote:
> Expose through sysfs the nr_zones field of struct request_queue.
> Exposing this value helps in debugging disk issues as well as
> facilitating scripts based use of the disk (e.g. blktests).
> 
> For zoned block devices, the nr_zones field indicates the total number
> of zones of the device calculated using the known disk capacity and
> zone size. This number of zones is always 0 for regular block devices.
> 
> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
> introduce the blk_queue_nr_zones() function to return the correct value
> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones()
  2018-10-12  2:30 ` [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones() Damien Le Moal
@ 2018-10-12  8:30     ` Christoph Hellwig
  2018-10-12  8:30     ` Christoph Hellwig
  1 sibling, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:30 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, Christoph Hellwig, Matias Bjorling

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones()
@ 2018-10-12  8:30     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-10-12  8:30 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, Martin K . Petersen, Mike Snitzer, linux-block,
	dm-devel, linux-scsi, Christoph Hellwig, Matias Bjorling

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
  2018-10-12  7:55       ` Damien Le Moal
@ 2018-10-12  8:41         ` Hannes Reinecke
  -1 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  8:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 9:55 AM, Damien Le Moal wrote:
> On 2018/10/12 16:41, Hannes Reinecke wrote:
>> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>>> Expose through sysfs the nr_zones field of struct request_queue.
>>> Exposing this value helps in debugging disk issues as well as
>>> facilitating scripts based use of the disk (e.g. blktests).
>>>
>>> For zoned block devices, the nr_zones field indicates the total number
>>> of zones of the device calculated using the known disk capacity and
>>> zone size. This number of zones is always 0 for regular block devices.
>>>
>>> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
>>> introduce the blk_queue_nr_zones() function to return the correct value
>>> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
>>>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> ---
>>>    block/blk-sysfs.c      | 11 +++++++++++
>>>    include/linux/blkdev.h | 10 ++++++++++
>>>    2 files changed, 21 insertions(+)
>>>
>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>> index 3772671cf2bc..92be8092ca4f 100644
>>> --- a/block/blk-sysfs.c
>>> +++ b/block/blk-sysfs.c
>>> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>>>    	}
>>>    }
>>>    
>>> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
>>> +{
>>> +	return queue_var_show(blk_queue_nr_zones(q), page);
>>> +}
>>> +
>>>    static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>>>    {
>>>    	return queue_var_show((blk_queue_nomerges(q) << 1) |
>>> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>>>    	.show = queue_zoned_show,
>>>    };
>>>    
>>> +static struct queue_sysfs_entry queue_nr_zones_entry = {
>>> +	.attr = {.name = "nr_zones", .mode = 0444 },
>>> +	.show = queue_nr_zones_show,
>>> +};
>>> +
>>>    static struct queue_sysfs_entry queue_nomerges_entry = {
>>>    	.attr = {.name = "nomerges", .mode = 0644 },
>>>    	.show = queue_nomerges_show,
>>> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>>>    	&queue_write_zeroes_max_entry.attr,
>>>    	&queue_nonrot_entry.attr,
>>>    	&queue_zoned_entry.attr,
>>> +	&queue_nr_zones_entry.attr,
>>>    	&queue_nomerges_entry.attr,
>>>    	&queue_rq_affinity_entry.attr,
>>>    	&queue_iostats_entry.attr,
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index c24969b1741b..879753b10263 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
>>>    }
>>>    
>>>    #ifdef CONFIG_BLK_DEV_ZONED
>>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>>> +{
>>> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
>>> +}
>>> +
>>>    static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>>>    					     sector_t sector)
>>>    {
>>> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>>>    		return false;
>>>    	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
>>>    }
>>> +#else /* CONFIG_BLK_DEV_ZONED */
>>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>>> +{
>>> +	return 0;
>>> +}
>>>    #endif /* CONFIG_BLK_DEV_ZONED */
>>>    
>>>    static inline bool rq_is_sync(struct request *rq)
>>>
>> Actually, we should be checking whether we can't blank out this
>> attribute via the is_visible mechanism; after all, not every block
>> device is zoned, and those which are not have no need of the attribute...
> 
> Yes we could. But it is somewhat the same as drives that do not support trim:
> they get max_discard_sectors to 0. The attribute says "not supported because the
> value is 0". I followed the same principle for nr_zones: value of 0 says "no
> zones", hopefully in sync with "zoned" saying "none". And if these 2 are not in
> sync as such, then we have a defective drive or a bug. So actually always having
> nr_zones is useful I think.
> 
> But I can make it invisible for regular disks if you insist :)
> 
I don't mind; but then I might be zbc-biased already.
Leave up to the maintainers to decide.
Hence:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/11] block: Expose queue nr_zones in sysfs
@ 2018-10-12  8:41         ` Hannes Reinecke
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-10-12  8:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer
  Cc: Christoph Hellwig, Matias Bjorling

On 10/12/18 9:55 AM, Damien Le Moal wrote:
> On 2018/10/12 16:41, Hannes Reinecke wrote:
>> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>>> Expose through sysfs the nr_zones field of struct request_queue.
>>> Exposing this value helps in debugging disk issues as well as
>>> facilitating scripts based use of the disk (e.g. blktests).
>>>
>>> For zoned block devices, the nr_zones field indicates the total number
>>> of zones of the device calculated using the known disk capacity and
>>> zone size. This number of zones is always 0 for regular block devices.
>>>
>>> Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
>>> introduce the blk_queue_nr_zones() function to return the correct value
>>> for any device, regardless if CONFIG_BLK_DEV_ZONED is set.
>>>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> ---
>>>    block/blk-sysfs.c      | 11 +++++++++++
>>>    include/linux/blkdev.h | 10 ++++++++++
>>>    2 files changed, 21 insertions(+)
>>>
>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>> index 3772671cf2bc..92be8092ca4f 100644
>>> --- a/block/blk-sysfs.c
>>> +++ b/block/blk-sysfs.c
>>> @@ -300,6 +300,11 @@ static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>>>    	}
>>>    }
>>>    
>>> +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page)
>>> +{
>>> +	return queue_var_show(blk_queue_nr_zones(q), page);
>>> +}
>>> +
>>>    static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>>>    {
>>>    	return queue_var_show((blk_queue_nomerges(q) << 1) |
>>> @@ -637,6 +642,11 @@ static struct queue_sysfs_entry queue_zoned_entry = {
>>>    	.show = queue_zoned_show,
>>>    };
>>>    
>>> +static struct queue_sysfs_entry queue_nr_zones_entry = {
>>> +	.attr = {.name = "nr_zones", .mode = 0444 },
>>> +	.show = queue_nr_zones_show,
>>> +};
>>> +
>>>    static struct queue_sysfs_entry queue_nomerges_entry = {
>>>    	.attr = {.name = "nomerges", .mode = 0644 },
>>>    	.show = queue_nomerges_show,
>>> @@ -727,6 +737,7 @@ static struct attribute *default_attrs[] = {
>>>    	&queue_write_zeroes_max_entry.attr,
>>>    	&queue_nonrot_entry.attr,
>>>    	&queue_zoned_entry.attr,
>>> +	&queue_nr_zones_entry.attr,
>>>    	&queue_nomerges_entry.attr,
>>>    	&queue_rq_affinity_entry.attr,
>>>    	&queue_iostats_entry.attr,
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index c24969b1741b..879753b10263 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -804,6 +804,11 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
>>>    }
>>>    
>>>    #ifdef CONFIG_BLK_DEV_ZONED
>>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>>> +{
>>> +	return blk_queue_is_zoned(q) ? q->nr_zones : 0;
>>> +}
>>> +
>>>    static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>>>    					     sector_t sector)
>>>    {
>>> @@ -819,6 +824,11 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>>>    		return false;
>>>    	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
>>>    }
>>> +#else /* CONFIG_BLK_DEV_ZONED */
>>> +static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>>> +{
>>> +	return 0;
>>> +}
>>>    #endif /* CONFIG_BLK_DEV_ZONED */
>>>    
>>>    static inline bool rq_is_sync(struct request *rq)
>>>
>> Actually, we should be checking whether we can't blank out this
>> attribute via the is_visible mechanism; after all, not every block
>> device is zoned, and those which are not have no need of the attribute...
> 
> Yes we could. But it is somewhat the same as drives that do not support trim:
> they get max_discard_sectors to 0. The attribute says "not supported because the
> value is 0". I followed the same principle for nr_zones: value of 0 says "no
> zones", hopefully in sync with "zoned" saying "none". And if these 2 are not in
> sync as such, then we have a defective drive or a bug. So actually always having
> nr_zones is useful I think.
> 
> But I can make it invisible for regular disks if you insist :)
> 
I don't mind; but then I might be zbc-biased already.
Leave up to the maintainers to decide.
Hence:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
  2018-10-12  7:33     ` Hannes Reinecke
@ 2018-10-12 10:28       ` Matias Bjorling
  -1 siblings, 0 replies; 54+ messages in thread
From: Matias Bjorling @ 2018-10-12 10:28 UTC (permalink / raw)
  To: hare, Damien Le Moal, linux-block, axboe, linux-scsi,
	martin.petersen, dm-devel, snitzer
  Cc: hch

DQoNCk9uIDEwLzEyLzIwMTggMDk6MzMgQU0sIEhhbm5lcyBSZWluZWNrZSB3cm90ZToNCj4gT24g
MTAvMTIvMTggNDozMCBBTSwgRGFtaWVuIExlIE1vYWwgd3JvdGU6DQo+PiBIYW5kbGluZyBjaGVj
a3Mgb2YgWkJDIGRldmljZSBjYXBhY2l0eSB1c2luZyB0aGUgbWF4X2xiYSBmaWVsZCBvZiB0aGUN
Cj4+IFJFUE9SVCBaT05FUyBjb21tYW5kIHJlcGx5IGZvciBkaXNrcyB3aXRoIHJjX2Jhc2lzID09
IDAgY2FuIGJlIGRvbmUNCj4+IHVzaW5nIHRoZSBzYW1lIHJlcG9ydCB6b25lcyBjb21tYW5kIHJl
cGx5IHVzZWQgdG8gY2hlY2sgdGhlICJzYW1lIg0KPj4gZmllbGQuDQo+Pg0KPj4gQXZvaWQgZXhl
Y3V0aW5nIGEgcmVwb3J0IHpvbmVzIGNvbW1hbmQgc29sZWx5IHRvIGNoZWNrIHRoZSBkaXNrIGNh
cGFjaXR5DQo+PiBieSBtZXJnaW5nIHNkX3piY19jaGVja19jYXBhY2l0eSgpIGludG8gc2RfemJj
X2NoZWNrX3pvbmVfc2l6ZSgpIGFuZA0KPj4gcmVuYW1pbmcgdGhhdCBmdW5jdGlvbiB0byBzZF96
YmNfY2hlY2tfem9uZXMoKS4gVGhpcyByZW1vdmVzIGEgY29zdGx5DQo+PiBleGVjdXRpb24gb2Yg
YSBmdWxsIHJlcG9ydCB6b25lcyBjb21tYW5kIGFuZCBzbyByZWR1Y2VzIGRldmljZSBzY2FuDQo+
PiBkdXJhdGlvbiBhdCBib290IHRpbWUgYXMgd2VsbCBhcyB0aGUgZHVyYXRpb24gb2YgZGlzayBy
ZXZhbGlkYXRlIGNhbGxzLg0KPj4NCj4+IEZ1cnRoZXJtb3JlLCBzZXR0aW5nIHRoZSBwYXJ0aWFs
IHJlcG9ydCBiaXQgaW4gdGhlIFJFUE9SVCBaT05FUyBjb21tYW5kDQo+PiBjZGIgY2FuIHNpZ25p
ZmljYW50bHkgcmVkdWNlIHRoaXMgY29tbWFuZCBleGVjdXRpb24gdGltZSBhcyB0aGUgZGV2aWNl
DQo+PiBkb2VzIG5vdCBoYXZlIHRvIGNvdW50IGFuZCByZXBvcnQgdGhlIHRvdGFsIG51bWJlciBv
ZiB6b25lcyB0aGF0IGNvdWxkDQo+PiBiZSByZXBvcnRlZCBhc3N1bWluZyBhIGxhcmdlIGVub3Vn
aCByZXBseSBidWZmZXIuIEEgbm9uLXBhcnRpYWwgem9uZQ0KPj4gcmVwb3J0IGlzIG5lY2Vzc2Fy
eSBvbmx5IGZvciB0aGUgZmlyc3QgZXhlY3V0aW9uIG9mIHJlcG9ydCB6b25lcyB1c2VkIHRvDQo+
PiBjaGVjayB0aGUgc2FtZSBmaWVsZCB2YWx1ZSAodG8gZW5zdXJlIHRoYXQgdGhpcyB2YWx1ZSBh
cHBsaWVzIHRvIGFsbA0KPj4gem9uZXMgb2YgdGhlIGRpc2spLiBBbGwgb3RoZXIgY2FsbHMgdG8g
c2RfemJjX3JlcG9ydF96b25lcygpIGNhbiB1c2UgYQ0KPj4gcGFydGlhbCByZXBvcnQgdG8gcmVk
dWNlIGV4ZWN1dGlvbiB0aW1lLg0KPj4NCj4+IFVzaW5nIGEgMTQgVEIgWkJDIGRpc2ssIHRoZXNl
IHNpbXBsZSBjaGFuZ2VzIHJlZHVjZSBkZXZpY2Ugc2NhbiB0aW1lIGF0DQo+PiBib290IGZyb20g
YWJvdXQgMy41cyBkb3duIHRvIGFib3V0IDkwMG1zLiBEaXNrIHJldmFsaWRhdGUgdGltZXMgYXJl
IGFsc28NCj4+IHJlZHVjZWQgZnJvbSBhYm91dCA0NTBtcyBkb3duIHRvIDIzMG1zLg0KPj4NCj4+
IFNpZ25lZC1vZmYtYnk6IERhbWllbiBMZSBNb2FsIDxkYW1pZW4ubGVtb2FsQHdkYy5jb20+DQo+
PiBSZXZpZXdlZC1ieTogQ2hyaXN0b3BoIEhlbGx3aWcgPGhjaEBsc3QuZGU+DQo+PiAtLS0NCj4+
IMKgIGRyaXZlcnMvc2NzaS9zZF96YmMuYyB8IDk0ICsrKysrKysrKysrKysrKysrKy0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0NCj4+IMKgIDEgZmlsZSBjaGFuZ2VkLCA0MCBpbnNlcnRpb25zKCsp
LCA1NCBkZWxldGlvbnMoLSkNCj4+DQo+IFJldmlld2VkLWJ5OiBIYW5uZXMgUmVpbmVja2UgPGhh
cmVAc3VzZS5jb20+DQo+DQo+IC4uLiBhbmQgaXQnbGwgYmUgeW91ciBmYXVsdCBpZiB0aGVyZSBh
cmUgZHJpdmVzIHdoaWNoIGRvIG5vdCBzdXBwb3J0IA0KPiB0aGUgJ3BhcnRpYWwnIGJpdCA6KQ0K
Pg0KDQpJIGFscmVhZHkgbG9vayBmb3J3YXJkIHRoZSBxdWlyayBsaXN0IDspDQoNCj4gQ2hlZXJz
LA0KPg0KPiBIYW5uZXMNCg0K

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time
@ 2018-10-12 10:28       ` Matias Bjorling
  0 siblings, 0 replies; 54+ messages in thread
From: Matias Bjorling @ 2018-10-12 10:28 UTC (permalink / raw)
  To: hare, Damien Le Moal, linux-block, axboe, linux-scsi,
	martin.petersen, dm-devel, snitzer
  Cc: hch



On 10/12/2018 09:33 AM, Hannes Reinecke wrote:
> On 10/12/18 4:30 AM, Damien Le Moal wrote:
>> Handling checks of ZBC device capacity using the max_lba field of the
>> REPORT ZONES command reply for disks with rc_basis == 0 can be done
>> using the same report zones command reply used to check the "same"
>> field.
>>
>> Avoid executing a report zones command solely to check the disk capacity
>> by merging sd_zbc_check_capacity() into sd_zbc_check_zone_size() and
>> renaming that function to sd_zbc_check_zones(). This removes a costly
>> execution of a full report zones command and so reduces device scan
>> duration at boot time as well as the duration of disk revalidate calls.
>>
>> Furthermore, setting the partial report bit in the REPORT ZONES command
>> cdb can significantly reduce this command execution time as the device
>> does not have to count and report the total number of zones that could
>> be reported assuming a large enough reply buffer. A non-partial zone
>> report is necessary only for the first execution of report zones used to
>> check the same field value (to ensure that this value applies to all
>> zones of the disk). All other calls to sd_zbc_report_zones() can use a
>> partial report to reduce execution time.
>>
>> Using a 14 TB ZBC disk, these simple changes reduce device scan time at
>> boot from about 3.5s down to about 900ms. Disk revalidate times are also
>> reduced from about 450ms down to 230ms.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/scsi/sd_zbc.c | 94 ++++++++++++++++++-------------------------
>>   1 file changed, 40 insertions(+), 54 deletions(-)
>>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
>
> ... and it'll be your fault if there are drives which do not support 
> the 'partial' bit :)
>

I already look forward the quirk list ;)

> Cheers,
>
> Hannes


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2018-10-12 15:16 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-12  2:30 [PATCH v3 00/11] Zoned block device support improvements Damien Le Moal
2018-10-12  2:30 ` [PATCH v3 01/11] scsi: sd_zbc: Rearrange code Damien Le Moal
2018-10-12  7:29   ` Hannes Reinecke
2018-10-12  7:29     ` Hannes Reinecke
2018-10-12  2:30 ` [PATCH v3 02/11] scsi: sd_zbc: Reduce boot device scan and revalidate time Damien Le Moal
2018-10-12  7:33   ` Hannes Reinecke
2018-10-12  7:33     ` Hannes Reinecke
2018-10-12  7:48     ` Damien Le Moal
2018-10-12  7:48       ` Damien Le Moal
2018-10-12 10:28     ` Matias Bjorling
2018-10-12 10:28       ` Matias Bjorling
2018-10-12  2:30 ` [PATCH v3 03/11] scsi: sd_zbc: Fix sd_zbc_check_zones() error checks Damien Le Moal
2018-10-12  7:35   ` Hannes Reinecke
2018-10-12  7:35     ` Hannes Reinecke
2018-10-12  7:50     ` Damien Le Moal
2018-10-12  7:50       ` Damien Le Moal
2018-10-12  2:30 ` [PATCH v3 04/11] block: Introduce blkdev_nr_zones() helper Damien Le Moal
2018-10-12  7:36   ` Hannes Reinecke
2018-10-12  7:36     ` Hannes Reinecke
2018-10-12  2:30 ` [PATCH v3 05/11] block: Limit allocation of zone descriptors for report zones Damien Le Moal
2018-10-12  7:36   ` Hannes Reinecke
2018-10-12  7:36     ` Hannes Reinecke
2018-10-12  2:30 ` [PATCH v3 06/11] block: Introduce BLKGETZONESZ ioctl Damien Le Moal
2018-10-12  7:37   ` Hannes Reinecke
2018-10-12  7:37     ` Hannes Reinecke
2018-10-12  8:23   ` Christoph Hellwig
2018-10-12  8:23     ` Christoph Hellwig
2018-10-12  2:30 ` [PATCH v3 07/11] block: Introduce BLKGETNRZONES ioctl Damien Le Moal
2018-10-12  7:37   ` Hannes Reinecke
2018-10-12  7:37     ` Hannes Reinecke
2018-10-12  8:23   ` Christoph Hellwig
2018-10-12  8:23     ` Christoph Hellwig
2018-10-12  2:30 ` [PATCH v3 08/11] block: Improve zone reset execution Damien Le Moal
2018-10-12  7:39   ` Hannes Reinecke
2018-10-12  7:39     ` Hannes Reinecke
2018-10-12  2:30 ` [PATCH v3 09/11] block: Expose queue nr_zones in sysfs Damien Le Moal
2018-10-12  7:41   ` Hannes Reinecke
2018-10-12  7:41     ` Hannes Reinecke
2018-10-12  7:55     ` Damien Le Moal
2018-10-12  7:55       ` Damien Le Moal
2018-10-12  8:41       ` Hannes Reinecke
2018-10-12  8:41         ` Hannes Reinecke
2018-10-12  8:28     ` Christoph Hellwig
2018-10-12  8:28       ` Christoph Hellwig
2018-10-12  8:29   ` Christoph Hellwig
2018-10-12  8:29     ` Christoph Hellwig
2018-10-12  2:30 ` [PATCH v3 10/11] block: add a report_zones method Damien Le Moal
2018-10-12  7:42   ` Hannes Reinecke
2018-10-12  7:42     ` Hannes Reinecke
2018-10-12  2:30 ` [PATCH v3 11/11] block: Introduce blk_revalidate_disk_zones() Damien Le Moal
2018-10-12  7:44   ` Hannes Reinecke
2018-10-12  7:44     ` Hannes Reinecke
2018-10-12  8:30   ` Christoph Hellwig
2018-10-12  8:30     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.