linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] Zoned block device enhancements and zone report rework
@ 2019-11-08  1:56 Damien Le Moal
  2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

This series of patches introduces changes to zoned block device handling
code with the intent to simplify the code while optimizing run-time
operation, particularly in the area of zone reporting.

The first patch lifts the device zone check code out of the sd driver
and reimplements these zone checks generically as part of
blk_revalidate_disk_zones(). This avoids zoned block device drivers to
have to implement these checks. The second patch simplifies this
function code for the !zoned case.

The third patch is a small cleanup of zone report processing in
preparation for the fourth patch which removes support for partitions
on zoned block devices. As mentioned in that patch commit message, none
of the known partitioning tools support zoned devices and there are no
known use case in the field of SMR disks being used with partitions.
Dropping partition supports allows to significantly simplify the code
for zone report as zone sector values remapping becomes unnecessary.

Patch 5 to 6 are small cleanups and fixes of the null_blk driver zoned
mode.

The prep patch 7 optimizes zone report buffer allocation for the SCSI
sd driver. Finally, patch 8 introduces a new interface for report zones
handling using a callback function executed per zone reported by the
device. This allows avoiding the need to allocate large arrays of
blk_zone structures for the execution of zone reports. This can
significantly reduce memory usage and pressure on the memory management
system while significantly simplify the code all over.

Overall, this series not only reduces significantly the code size, it
also improves run-time memory usage for zone report execution.

This series applies cleanly on the for-next block tree on top of the
zone management operation series. It may however create a conflict with
Christoph's reqork of disk size revalidation. Please consider this
series for inclusion in the 5.5 kernel.

As always, comments are welcome.


Christoph Hellwig (4):
  block: cleanup the !zoned case in blk_revalidate_disk_zones
  null_blk: clean up the block device operations
  null_blk: clean up report zones
  block: rework zone reporting

Damien Le Moal (5):
  block: Enhance blk_revalidate_disk_zones()
  block: Simplify report zones execution
  block: Remove partition support for zoned block devices
  null_blk: Add zone_nr_conv to features
  scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()

 block/blk-core.c               |   6 +-
 block/blk-zoned.c              | 356 ++++++++++++++-------------------
 block/partition-generic.c      |  74 +------
 drivers/block/null_blk.h       |  11 +-
 drivers/block/null_blk_main.c  |  21 +-
 drivers/block/null_blk_zoned.c |  33 ++-
 drivers/md/dm-flakey.c         |  18 +-
 drivers/md/dm-linear.c         |  20 +-
 drivers/md/dm-zoned-metadata.c | 131 +++++-------
 drivers/md/dm.c                | 130 +++++-------
 drivers/scsi/sd.h              |   4 +-
 drivers/scsi/sd_zbc.c          | 235 ++++++++--------------
 fs/f2fs/super.c                |  51 ++---
 include/linux/blkdev.h         |  15 +-
 include/linux/device-mapper.h  |  24 ++-
 15 files changed, 427 insertions(+), 702 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/9] block: Enhance blk_revalidate_disk_zones()
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  6:28   ` [dm-devel] " Christoph Hellwig
  2019-11-08  7:10   ` Hannes Reinecke
  2019-11-08  1:56 ` [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones Damien Le Moal
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

For ZBC and ZAC zoned devices, the scsi driver revalidation processing
implemented by sd_revalidate_disk() includes a call to
sd_zbc_read_zones() which executes a full disk zone report used to
check that all zones of the disk are the same size. This processing is
followed by a call to blk_revalidate_disk_zones(), used to initialize
the device request queue zone bitmaps (zone type and zone write lock
bitmaps). To do so, blk_revalidate_disk_zones() also executes a full
device zone report to obtain zone types. As a result, the entire
zoned block device revalidation process includes two full device zone
report.

By moving the zone size checks into blk_revalidate_disk_zones(), this
process can be optimized to a single full device zone report, leading to
shorter device scan and revalidation times. This patch implements this
optimization, reducing the original full device zone report implemented
in sd_zbc_check_zones() to a single, small, report zones command
execution to obtain the size of the first zone of the device. Checks
whether all zones of the device are the same size as the first zone
size are moved to the generic blk_check_zone() function called from
blk_revalidate_disk_zones().

This optimization also has the following benefits:
1) fewer memory allocations in the scsi layer during disk revalidation
   as the potentailly large buffer for zone report execution is not
   needed.
2) Implement zone checks in a generic manner, reducing the burden on
   device driver which only need to obtain the zone size and check that
   this size is a power of 2 number of LBAs. Any new type of zoned
   block device will benefit from this.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-zoned.c     |  62 +++++++++++++++++++++++-
 drivers/scsi/sd_zbc.c | 107 ++++++++----------------------------------
 2 files changed, 80 insertions(+), 89 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 481eaf7d04d4..dae787f67019 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -448,6 +448,58 @@ void blk_queue_free_zone_bitmaps(struct request_queue *q)
 	q->seq_zones_wlock = NULL;
 }
 
+/*
+ * Helper function to check the validity of zones of a zoned block device.
+ */
+static bool blk_zone_valid(struct gendisk *disk, struct blk_zone *zone,
+			   sector_t *sector)
+{
+	struct request_queue *q = disk->queue;
+	sector_t zone_sectors = blk_queue_zone_sectors(q);
+	sector_t capacity = get_capacity(disk);
+
+	/*
+	 * All zones must have the same size, with the exception on an eventual
+	 * smaller last zone.
+	 */
+	if (zone->start + zone_sectors < capacity &&
+	    zone->len != zone_sectors) {
+		pr_warn("%s: Invalid zoned device with non constant zone size\n",
+			disk->disk_name);
+		return false;
+	}
+
+	if (zone->start + zone->len >= capacity &&
+	    zone->len > zone_sectors) {
+		pr_warn("%s: Invalid zoned device with larger last zone size\n",
+			disk->disk_name);
+		return false;
+	}
+
+	/* Check for holes in the zone report */
+	if (zone->start != *sector) {
+		pr_warn("%s: Zone gap at sectors %llu..%llu\n",
+			disk->disk_name, *sector, zone->start);
+		return false;
+	}
+
+	/* Check zone type */
+	switch (zone->type) {
+	case BLK_ZONE_TYPE_CONVENTIONAL:
+	case BLK_ZONE_TYPE_SEQWRITE_REQ:
+	case BLK_ZONE_TYPE_SEQWRITE_PREF:
+		break;
+	default:
+		pr_warn("%s: Invalid zone type 0x%x at sectors %llu\n",
+			disk->disk_name, (int)zone->type, zone->start);
+		return false;
+	}
+
+	*sector += zone->len;
+
+	return true;
+}
+
 /**
  * blk_revalidate_disk_zones - (re)allocate and initialize zone bitmaps
  * @disk:	Target disk
@@ -497,7 +549,10 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 	if (!seq_zones_bitmap)
 		goto out;
 
-	/* Get zone information and initialize seq_zones_bitmap */
+	/*
+	 * Get zone information to check the zones and initialize
+	 * seq_zones_bitmap.
+	 */
 	rep_nr_zones = nr_zones;
 	zones = blk_alloc_zones(&rep_nr_zones);
 	if (!zones)
@@ -511,11 +566,14 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 		if (!nrz)
 			break;
 		for (i = 0; i < nrz; i++) {
+			if (!blk_zone_valid(disk, &zones[i], &sector)) {
+				ret = -ENODEV;
+				goto out;
+			}
 			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
 				set_bit(z, seq_zones_bitmap);
 			z++;
 		}
-		sector += nrz * blk_queue_zone_sectors(q);
 	}
 
 	if (WARN_ON(z != nr_zones)) {
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 39f10ec0dfcf..7c4690f26698 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -339,32 +339,19 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
  * Returns the zone size in number of blocks upon success or an error code
  * upon failure.
  */
-static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
+static int sd_zbc_check_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			      u32 *zblocks)
 {
-	size_t bufsize, buflen;
-	unsigned int noio_flag;
+	size_t buflen;
 	u64 zone_blocks = 0;
-	sector_t max_lba, block = 0;
-	unsigned char *buf;
+	sector_t max_lba;
 	unsigned char *rec;
 	int ret;
-	u8 same;
-
-	/* Do all memory allocations as if GFP_NOIO was specified */
-	noio_flag = memalloc_noio_save();
 
-	/* Get a buffer */
-	buf = sd_zbc_alloc_report_buffer(sdkp, SD_ZBC_REPORT_MAX_ZONES,
-					 &bufsize);
-	if (!buf) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	/* Do a report zone to get max_lba and the same field */
-	ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false);
+	/* Do a report zone to get max_lba and the size of the first zone */
+	ret = sd_zbc_do_report_zones(sdkp, buf, SD_BUF_SIZE, 0, false);
 	if (ret)
-		goto out_free;
+		return ret;
 
 	if (sdkp->rc_basis == 0) {
 		/* The max_lba field is the capacity of this device */
@@ -379,82 +366,28 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
 		}
 	}
 
-	/*
-	 * Check same field: for any value other than 0, we know that all zones
-	 * have the same size.
-	 */
-	same = buf[4] & 0x0f;
-	if (same > 0) {
-		rec = &buf[64];
-		zone_blocks = get_unaligned_be64(&rec[8]);
-		goto out;
-	}
-
-	/*
-	 * Check the size of all zones: all zones must be of
-	 * equal size, except the last zone which can be smaller
-	 * than other zones.
-	 */
-	do {
-
-		/* Parse REPORT ZONES header */
-		buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64,
-			       bufsize);
-		rec = buf + 64;
-
-		/* Parse zone descriptors */
-		while (rec < buf + buflen) {
-			u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
-
-			if (zone_blocks == 0) {
-				zone_blocks = this_zone_blocks;
-			} else if (this_zone_blocks != zone_blocks &&
-				   (block + this_zone_blocks < sdkp->capacity
-				    || this_zone_blocks > zone_blocks)) {
-				zone_blocks = 0;
-				goto out;
-			}
-			block += this_zone_blocks;
-			rec += 64;
-		}
-
-		if (block < sdkp->capacity) {
-			ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block,
-						     true);
-			if (ret)
-				goto out_free;
-		}
-
-	} while (block < sdkp->capacity);
-
-out:
-	if (!zone_blocks) {
-		if (sdkp->first_scan)
-			sd_printk(KERN_NOTICE, sdkp,
-				  "Devices with non constant zone "
-				  "size are not supported\n");
-		ret = -ENODEV;
-	} else if (!is_power_of_2(zone_blocks)) {
+	/* Parse REPORT ZONES header */
+	buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64, SD_BUF_SIZE);
+	rec = buf + 64;
+	zone_blocks = get_unaligned_be64(&rec[8]);
+	if (!zone_blocks || !is_power_of_2(zone_blocks)) {
 		if (sdkp->first_scan)
 			sd_printk(KERN_NOTICE, sdkp,
 				  "Devices with non power of 2 zone "
 				  "size are not supported\n");
-		ret = -ENODEV;
-	} else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
+		return -ENODEV;
+	}
+
+	if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
 		if (sdkp->first_scan)
 			sd_printk(KERN_NOTICE, sdkp,
 				  "Zone size too large\n");
-		ret = -EFBIG;
-	} else {
-		*zblocks = zone_blocks;
-		ret = 0;
+		return -EFBIG;
 	}
 
-out_free:
-	memalloc_noio_restore(noio_flag);
-	kvfree(buf);
+	*zblocks = zone_blocks;
 
-	return ret;
+	return 0;
 }
 
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
@@ -480,7 +413,7 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	 * Check zone size: only devices with a constant zone size (except
 	 * an eventual last runt zone) that is a power of 2 are supported.
 	 */
-	ret = sd_zbc_check_zones(sdkp, &zone_blocks);
+	ret = sd_zbc_check_zones(sdkp, buf, &zone_blocks);
 	if (ret != 0)
 		goto err;
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
  2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  7:11   ` Hannes Reinecke
  2019-11-08 18:50   ` Chaitanya Kulkarni
  2019-11-08  1:56 ` [PATCH 3/9] block: Simplify report zones execution Damien Le Moal
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

From: Christoph Hellwig <hch@lst.de>

blk_revalidate_disk_zones is never called for non-zoned devices.  Just
return early and warn instead of trying to handle this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-zoned.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index dae787f67019..523a28d7a15c 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -520,6 +520,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 	sector_t sector = 0;
 	int ret = 0;
 
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return -EIO;
+
 	/*
 	 * BIO based queues do not use a scheduler so only q->nr_zones
 	 * needs to be updated so that the sysfs exposed value is correct.
@@ -535,10 +538,8 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 	 */
 	noio_flag = memalloc_noio_save();
 
-	if (!blk_queue_is_zoned(q) || !nr_zones) {
-		nr_zones = 0;
+	if (!nr_zones)
 		goto update;
-	}
 
 	/* Allocate bitmaps */
 	ret = -ENOMEM;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/9] block: Simplify report zones execution
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
  2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
  2019-11-08  1:56 ` [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  7:12   ` Hannes Reinecke
  2019-11-08  1:56 ` [PATCH 4/9] block: Remove partition support for zoned block devices Damien Le Moal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

All kernel users of blkdev_report_zones() as well as applications use
through ioctl(BLKZONEREPORT) expect to potentially get less zone
descriptors than requested. As such, the use of the internal report
zones command execution loop implemented by blk_report_zones() is
not necessary and can even be harmful to performance by causing the
execution of inefficient small zones report command to service the
reminder of a requested zone array.

This patch removes blk_report_zones(), simplifying the code. Also
remove a now incorrect comment in dm_blk_report_zones().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier Gonzalez <javier@javigon.com>
---
 block/blk-zoned.c | 34 +++++-----------------------------
 drivers/md/dm.c   |  6 ------
 2 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 523a28d7a15c..ea4e086ba00e 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -119,31 +119,6 @@ static bool blkdev_report_zone(struct block_device *bdev, struct blk_zone *rep)
 	return true;
 }
 
-static int blk_report_zones(struct gendisk *disk, sector_t sector,
-			    struct blk_zone *zones, unsigned int *nr_zones)
-{
-	struct request_queue *q = disk->queue;
-	unsigned int z = 0, n, nrz = *nr_zones;
-	sector_t capacity = get_capacity(disk);
-	int ret;
-
-	while (z < nrz && sector < capacity) {
-		n = nrz - z;
-		ret = disk->fops->report_zones(disk, sector, &zones[z], &n);
-		if (ret)
-			return ret;
-		if (!n)
-			break;
-		sector += blk_queue_zone_sectors(q) * n;
-		z += n;
-	}
-
-	WARN_ON(z > *nr_zones);
-	*nr_zones = z;
-
-	return 0;
-}
-
 /**
  * blkdev_report_zones - Get zones information
  * @bdev:	Target block device
@@ -164,6 +139,7 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
 			struct blk_zone *zones, unsigned int *nr_zones)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
+	struct gendisk *disk = bdev->bd_disk;
 	unsigned int i, nrz;
 	int ret;
 
@@ -175,7 +151,7 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
 	 * report_zones method. If it does not have one defined, the device
 	 * driver has a bug. So warn about that.
 	 */
-	if (WARN_ON_ONCE(!bdev->bd_disk->fops->report_zones))
+	if (WARN_ON_ONCE(!disk->fops->report_zones))
 		return -EOPNOTSUPP;
 
 	if (!*nr_zones || sector >= bdev->bd_part->nr_sects) {
@@ -185,8 +161,8 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
 
 	nrz = min(*nr_zones,
 		  __blkdev_nr_zones(q, bdev->bd_part->nr_sects - sector));
-	ret = blk_report_zones(bdev->bd_disk, get_start_sect(bdev) + sector,
-			       zones, &nrz);
+	ret = disk->fops->report_zones(disk, get_start_sect(bdev) + sector,
+				       zones, &nrz);
 	if (ret)
 		return ret;
 
@@ -561,7 +537,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 
 	while (z < nr_zones) {
 		nrz = min(nr_zones - z, rep_nr_zones);
-		ret = blk_report_zones(disk, sector, zones, &nrz);
+		ret = disk->fops->report_zones(disk, sector, zones, &nrz);
 		if (ret)
 			goto out;
 		if (!nrz)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index bc143c1b2333..89189c29438f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -473,12 +473,6 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
 		goto out;
 	}
 
-	/*
-	 * blkdev_report_zones() will loop and call this again to cover all the
-	 * zones of the target, eventually moving on to the next target.
-	 * So there is no need to loop here trying to fill the entire array
-	 * of zones.
-	 */
 	ret = tgt->type->report_zones(tgt, sector, zones, nr_zones);
 
 out:
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/9] block: Remove partition support for zoned block devices
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (2 preceding siblings ...)
  2019-11-08  1:56 ` [PATCH 3/9] block: Simplify report zones execution Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  6:30   ` [dm-devel] " Christoph Hellwig
  2019-11-08  7:17   ` Hannes Reinecke
  2019-11-08  1:56 ` [PATCH 5/9] null_blk: clean up the block device operations Damien Le Moal
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

No known partitioning tool supports zoned block devices, especially the
host managed flavor with strong sequential write constraints.
Furthermore, there are also no known user nor use cases for partitioned
zoned block devices.

This patch removes partition device creation for zoned block devices,
which allows simplifying the processing of zone commands for zoned
block devices. A warning is added if a partition table is found on the
device.

For report zones operations no zone sector information remapping is
necessary anymore, simplifying the code. Of note is that remapping of
zone reports for DM targets is still necessary as done by
dm_remap_zone_report().

Similarly, remaping of a zone reset bio is not necessary anymore.
Testing for the applicability of the zone reset all request also becomes
simpler and only needs to check that the number of sectors of the
requested zone range is equal to the disk capacity.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-core.c          |  6 +---
 block/blk-zoned.c         | 62 ++++++--------------------------
 block/partition-generic.c | 74 +++++----------------------------------
 drivers/md/dm.c           |  3 --
 4 files changed, 21 insertions(+), 124 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3306a3c5bed6..df6b70476187 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -851,11 +851,7 @@ static inline int blk_partition_remap(struct bio *bio)
 	if (unlikely(bio_check_ro(bio, p)))
 		goto out;
 
-	/*
-	 * Zone management bios do not have a sector count but they do have
-	 * a start sector filled out and need to be remapped.
-	 */
-	if (bio_sectors(bio) || op_is_zone_mgmt(bio_op(bio))) {
+	if (bio_sectors(bio)) {
 		if (bio_check_eod(bio, part_nr_sects_read(p)))
 			goto out;
 		bio->bi_iter.bi_sector += p->start_sect;
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index ea4e086ba00e..ae665e490858 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -93,32 +93,10 @@ unsigned int blkdev_nr_zones(struct block_device *bdev)
 	if (!blk_queue_is_zoned(q))
 		return 0;
 
-	return __blkdev_nr_zones(q, bdev->bd_part->nr_sects);
+	return __blkdev_nr_zones(q, get_capacity(bdev->bd_disk));
 }
 EXPORT_SYMBOL_GPL(blkdev_nr_zones);
 
-/*
- * Check that a zone report belongs to this partition, and if yes, fix its start
- * sector and write pointer and return true. Return false otherwise.
- */
-static bool blkdev_report_zone(struct block_device *bdev, struct blk_zone *rep)
-{
-	sector_t offset = get_start_sect(bdev);
-
-	if (rep->start < offset)
-		return false;
-
-	rep->start -= offset;
-	if (rep->start + rep->len > bdev->bd_part->nr_sects)
-		return false;
-
-	if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
-		rep->wp = rep->start + rep->len;
-	else
-		rep->wp -= offset;
-	return true;
-}
-
 /**
  * blkdev_report_zones - Get zones information
  * @bdev:	Target block device
@@ -140,8 +118,7 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct gendisk *disk = bdev->bd_disk;
-	unsigned int i, nrz;
-	int ret;
+	sector_t capacity = get_capacity(disk);
 
 	if (!blk_queue_is_zoned(q))
 		return -EOPNOTSUPP;
@@ -154,27 +131,14 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
 	if (WARN_ON_ONCE(!disk->fops->report_zones))
 		return -EOPNOTSUPP;
 
-	if (!*nr_zones || sector >= bdev->bd_part->nr_sects) {
+	if (!*nr_zones || sector >= capacity) {
 		*nr_zones = 0;
 		return 0;
 	}
 
-	nrz = min(*nr_zones,
-		  __blkdev_nr_zones(q, bdev->bd_part->nr_sects - sector));
-	ret = disk->fops->report_zones(disk, get_start_sect(bdev) + sector,
-				       zones, &nrz);
-	if (ret)
-		return ret;
+	*nr_zones = min(*nr_zones, __blkdev_nr_zones(q, capacity - sector));
 
-	for (i = 0; i < nrz; i++) {
-		if (!blkdev_report_zone(bdev, zones))
-			break;
-		zones++;
-	}
-
-	*nr_zones = i;
-
-	return 0;
+	return disk->fops->report_zones(disk, sector, zones, nr_zones);
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
@@ -185,15 +149,11 @@ static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
 	if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
 		return false;
 
-	if (sector || nr_sectors != part_nr_sects_read(bdev->bd_part))
-		return false;
 	/*
-	 * REQ_OP_ZONE_RESET_ALL can be executed only if the block device is
-	 * the entire disk, that is, if the blocks device start offset is 0 and
-	 * its capacity is the same as the entire disk.
+	 * REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
+	 * of the applicable zone range is the entire disk.
 	 */
-	return get_start_sect(bdev) == 0 &&
-	       part_nr_sects_read(bdev->bd_part) == get_capacity(bdev->bd_disk);
+	return !sector && nr_sectors == get_capacity(bdev->bd_disk);
 }
 
 /**
@@ -218,6 +178,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 	sector_t zone_sectors = blk_queue_zone_sectors(q);
+	sector_t capacity = get_capacity(bdev->bd_disk);
 	sector_t end_sector = sector + nr_sectors;
 	struct bio *bio = NULL;
 	int ret;
@@ -231,7 +192,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 	if (!op_is_zone_mgmt(op))
 		return -EOPNOTSUPP;
 
-	if (!nr_sectors || end_sector > bdev->bd_part->nr_sects)
+	if (!nr_sectors || end_sector > capacity)
 		/* Out of range */
 		return -EINVAL;
 
@@ -239,8 +200,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 	if (sector & (zone_sectors - 1))
 		return -EINVAL;
 
-	if ((nr_sectors & (zone_sectors - 1)) &&
-	    end_sector != bdev->bd_part->nr_sects)
+	if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
 		return -EINVAL;
 
 	while (sector < end_sector) {
diff --git a/block/partition-generic.c b/block/partition-generic.c
index aee643ce13d1..31bff3fb28af 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -459,56 +459,6 @@ static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
 	return 0;
 }
 
-static bool part_zone_aligned(struct gendisk *disk,
-			      struct block_device *bdev,
-			      sector_t from, sector_t size)
-{
-	unsigned int zone_sectors = bdev_zone_sectors(bdev);
-
-	/*
-	 * If this function is called, then the disk is a zoned block device
-	 * (host-aware or host-managed). This can be detected even if the
-	 * zoned block device support is disabled (CONFIG_BLK_DEV_ZONED not
-	 * set). In this case, however, only host-aware devices will be seen
-	 * as a block device is not created for host-managed devices. Without
-	 * zoned block device support, host-aware drives can still be used as
-	 * regular block devices (no zone operation) and their zone size will
-	 * be reported as 0. Allow this case.
-	 */
-	if (!zone_sectors)
-		return true;
-
-	/*
-	 * Check partition start and size alignement. If the drive has a
-	 * smaller last runt zone, ignore it and allow the partition to
-	 * use it. Check the zone size too: it should be a power of 2 number
-	 * of sectors.
-	 */
-	if (WARN_ON_ONCE(!is_power_of_2(zone_sectors))) {
-		u32 rem;
-
-		div_u64_rem(from, zone_sectors, &rem);
-		if (rem)
-			return false;
-		if ((from + size) < get_capacity(disk)) {
-			div_u64_rem(size, zone_sectors, &rem);
-			if (rem)
-				return false;
-		}
-
-	} else {
-
-		if (from & (zone_sectors - 1))
-			return false;
-		if ((from + size) < get_capacity(disk) &&
-		    (size & (zone_sectors - 1)))
-			return false;
-
-	}
-
-	return true;
-}
-
 int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 {
 	struct parsed_partitions *state = NULL;
@@ -544,6 +494,14 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 		}
 		return -EIO;
 	}
+
+	/* Partitions are not supported on zoned block devices */
+	if (bdev_is_zoned(bdev)) {
+		pr_warn("%s: ignoring partition table on zoned block device\n",
+			disk->disk_name);
+		goto out;
+	}
+
 	/*
 	 * If any partition code tried to read beyond EOD, try
 	 * unlocking native capacity even if partition table is
@@ -607,21 +565,6 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 			}
 		}
 
-		/*
-		 * On a zoned block device, partitions should be aligned on the
-		 * device zone size (i.e. zone boundary crossing not allowed).
-		 * Otherwise, resetting the write pointer of the last zone of
-		 * one partition may impact the following partition.
-		 */
-		if (bdev_is_zoned(bdev) &&
-		    !part_zone_aligned(disk, bdev, from, size)) {
-			printk(KERN_WARNING
-			       "%s: p%d start %llu+%llu is not zone aligned\n",
-			       disk->disk_name, p, (unsigned long long) from,
-			       (unsigned long long) size);
-			continue;
-		}
-
 		part = add_partition(disk, p, from, size,
 				     state->parts[p].flags,
 				     &state->parts[p].info);
@@ -635,6 +578,7 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 			md_autodetect_dev(part_to_dev(part)->devt);
 #endif
 	}
+out:
 	free_partitions(state);
 	return 0;
 }
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 89189c29438f..76f4cfdd6b41 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1211,9 +1211,6 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
  * The zone descriptors obtained with a zone report indicate
  * zone positions within the underlying device of the target. The zone
  * descriptors must be remapped to match their position within the dm device.
- * The caller target should obtain the zones information using
- * blkdev_report_zones() to ensure that remapping for partition offset is
- * already handled.
  */
 void dm_remap_zone_report(struct dm_target *ti, sector_t start,
 			  struct blk_zone *zones, unsigned int *nr_zones)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/9] null_blk: clean up the block device operations
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (3 preceding siblings ...)
  2019-11-08  1:56 ` [PATCH 4/9] block: Remove partition support for zoned block devices Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  7:17   ` Hannes Reinecke
  2019-11-08 18:47   ` Chaitanya Kulkarni
  2019-11-08  1:56 ` [PATCH 6/9] null_blk: clean up report zones Damien Le Moal
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

From: Christoph Hellwig <hch@lst.de>

Remove the pointless stub open and release methods, give the operations
vector a slightly less confusing name, and use normal alignment for the
assignment operators.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_main.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 40a64bdd8ea7..89d385bab45b 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1442,20 +1442,9 @@ static void null_config_discard(struct nullb *nullb)
 	blk_queue_flag_set(QUEUE_FLAG_DISCARD, nullb->q);
 }
 
-static int null_open(struct block_device *bdev, fmode_t mode)
-{
-	return 0;
-}
-
-static void null_release(struct gendisk *disk, fmode_t mode)
-{
-}
-
-static const struct block_device_operations null_fops = {
-	.owner =	THIS_MODULE,
-	.open =		null_open,
-	.release =	null_release,
-	.report_zones =	null_zone_report,
+static const struct block_device_operations null_ops = {
+	.owner		= THIS_MODULE,
+	.report_zones	= null_zone_report,
 };
 
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
@@ -1556,7 +1545,7 @@ static int null_gendisk_register(struct nullb *nullb)
 	disk->flags |= GENHD_FL_EXT_DEVT | GENHD_FL_SUPPRESS_PARTITION_INFO;
 	disk->major		= null_major;
 	disk->first_minor	= nullb->index;
-	disk->fops		= &null_fops;
+	disk->fops		= &null_ops;
 	disk->private_data	= nullb;
 	disk->queue		= nullb->q;
 	strncpy(disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/9] null_blk: clean up report zones
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (4 preceding siblings ...)
  2019-11-08  1:56 ` [PATCH 5/9] null_blk: clean up the block device operations Damien Le Moal
@ 2019-11-08  1:56 ` Damien Le Moal
  2019-11-08  7:18   ` Hannes Reinecke
  2019-11-08 18:47   ` Chaitanya Kulkarni
  2019-11-08  1:57 ` [PATCH 7/9] null_blk: Add zone_nr_conv to features Damien Le Moal
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:56 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

From: Christoph Hellwig <hch@lst.de>

Make the instance name match the method name and define the name to NULL
instead of providing an inline stub, which is rather pointless for a
method call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h       | 11 +++--------
 drivers/block/null_blk_main.c  |  2 +-
 drivers/block/null_blk_zoned.c |  4 ++--
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 93c2a3d403da..9bf56fbab091 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -91,8 +91,8 @@ struct nullb {
 #ifdef CONFIG_BLK_DEV_ZONED
 int null_zone_init(struct nullb_device *dev);
 void null_zone_exit(struct nullb_device *dev);
-int null_zone_report(struct gendisk *disk, sector_t sector,
-		     struct blk_zone *zones, unsigned int *nr_zones);
+int null_report_zones(struct gendisk *disk, sector_t sector,
+		      struct blk_zone *zones, unsigned int *nr_zones);
 blk_status_t null_handle_zoned(struct nullb_cmd *cmd,
 				enum req_opf op, sector_t sector,
 				sector_t nr_sectors);
@@ -105,12 +105,6 @@ static inline int null_zone_init(struct nullb_device *dev)
 	return -EINVAL;
 }
 static inline void null_zone_exit(struct nullb_device *dev) {}
-static inline int null_zone_report(struct gendisk *disk, sector_t sector,
-				   struct blk_zone *zones,
-				   unsigned int *nr_zones)
-{
-	return -EOPNOTSUPP;
-}
 static inline blk_status_t null_handle_zoned(struct nullb_cmd *cmd,
 					     enum req_opf op, sector_t sector,
 					     sector_t nr_sectors)
@@ -123,5 +117,6 @@ static inline size_t null_zone_valid_read_len(struct nullb *nullb,
 {
 	return len;
 }
+#define null_report_zones	NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 #endif /* __NULL_BLK_H */
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 89d385bab45b..2687eb36441c 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1444,7 +1444,7 @@ static void null_config_discard(struct nullb *nullb)
 
 static const struct block_device_operations null_ops = {
 	.owner		= THIS_MODULE,
-	.report_zones	= null_zone_report,
+	.report_zones	= null_report_zones,
 };
 
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 02f41a3bc4cb..00696f16664b 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -66,8 +66,8 @@ void null_zone_exit(struct nullb_device *dev)
 	kvfree(dev->zones);
 }
 
-int null_zone_report(struct gendisk *disk, sector_t sector,
-		     struct blk_zone *zones, unsigned int *nr_zones)
+int null_report_zones(struct gendisk *disk, sector_t sector,
+		      struct blk_zone *zones, unsigned int *nr_zones)
 {
 	struct nullb *nullb = disk->private_data;
 	struct nullb_device *dev = nullb->dev;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 7/9] null_blk: Add zone_nr_conv to features
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (5 preceding siblings ...)
  2019-11-08  1:56 ` [PATCH 6/9] null_blk: clean up report zones Damien Le Moal
@ 2019-11-08  1:57 ` Damien Le Moal
  2019-11-08  7:18   ` Hannes Reinecke
  2019-11-08 18:48   ` Chaitanya Kulkarni
  2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
  2019-11-08  1:57 ` [PATCH 9/9] block: rework zone reporting Damien Le Moal
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:57 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

For a null_blk device with zoned mode enabled, the number of
conventional zones can be configured through configfs with the
zone_nr_conv parameter. Add this missing parameter in the features
string.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 2687eb36441c..27fb34d7da31 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -467,7 +467,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
 
 static ssize_t memb_group_features_show(struct config_item *item, char *page)
 {
-	return snprintf(page, PAGE_SIZE, "memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size\n");
+	return snprintf(page, PAGE_SIZE, "memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_nr_conv\n");
 }
 
 CONFIGFS_ATTR_RO(memb_group_, features);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (6 preceding siblings ...)
  2019-11-08  1:57 ` [PATCH 7/9] null_blk: Add zone_nr_conv to features Damien Le Moal
@ 2019-11-08  1:57 ` Damien Le Moal
  2019-11-08  6:31   ` [dm-devel] " Christoph Hellwig
                     ` (2 more replies)
  2019-11-08  1:57 ` [PATCH 9/9] block: rework zone reporting Damien Le Moal
  8 siblings, 3 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:57 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

There is no need to arbitrarily limit the size of a report zone to the
number of zones defined by SD_ZBC_REPORT_MAX_ZONES. Rather, simply
calculate the report buffer size needed for the requested number of
zones without exceeding the device total number of zones. This buffer
size limitation to the hardware maximum transfer size and page mapping
capabilities is kept unchanged. Starting with this initial buffer size,
the allocation is optimized by iterating over decreasing buffer size
until the allocation succeeds. This ensures forward progress for zone
reports and avoids failures of zones revalidation under memory pressure.

While at it, also replace the hard coded 512 B sector size with the
SECTOR_SIZE macro.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 7c4690f26698..f191af15de1b 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -104,11 +104,6 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 	return 0;
 }
 
-/*
- * Maximum number of zones to get with one report zones command.
- */
-#define SD_ZBC_REPORT_MAX_ZONES		8192U
-
 /**
  * Allocate a buffer for report zones reply.
  * @sdkp: The target disk
@@ -138,17 +133,22 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
 	 * sure that the allocated buffer can always be mapped by limiting the
 	 * number of pages allocated to the HBA max segments limit.
 	 */
-	nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES);
-	bufsize = roundup((nr_zones + 1) * 64, 512);
+	nr_zones = min(nr_zones, sdkp->nr_zones);
+	bufsize = roundup((nr_zones + 1) * 64, SECTOR_SIZE);
 	bufsize = min_t(size_t, bufsize,
 			queue_max_hw_sectors(q) << SECTOR_SHIFT);
 	bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
 
-	buf = vzalloc(bufsize);
-	if (buf)
-		*buflen = bufsize;
+	while (bufsize >= SECTOR_SIZE) {
+		buf = vzalloc(bufsize);
+		if (buf) {
+			*buflen = bufsize;
+			return buf;
+		}
+		bufsize >>= 1;
+	}
 
-	return buf;
+	return NULL;
 }
 
 /**
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 9/9] block: rework zone reporting
  2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
                   ` (7 preceding siblings ...)
  2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
@ 2019-11-08  1:57 ` Damien Le Moal
  2019-11-08  7:23   ` Hannes Reinecke
  2019-11-08 15:16   ` Mike Snitzer
  8 siblings, 2 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  1:57 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

From: Christoph Hellwig <hch@lst.de>

Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-zoned.c              | 253 +++++++++++++--------------------
 drivers/block/null_blk.h       |   2 +-
 drivers/block/null_blk_zoned.c |  31 ++--
 drivers/md/dm-flakey.c         |  18 +--
 drivers/md/dm-linear.c         |  20 +--
 drivers/md/dm-zoned-metadata.c | 131 +++++++----------
 drivers/md/dm.c                | 121 +++++++---------
 drivers/scsi/sd.h              |   4 +-
 drivers/scsi/sd_zbc.c          | 106 +++++++-------
 fs/f2fs/super.c                |  51 +++----
 include/linux/blkdev.h         |  15 +-
 include/linux/device-mapper.h  |  24 +++-
 12 files changed, 329 insertions(+), 447 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index ae665e490858..6fad6f3f6980 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -101,44 +101,35 @@ EXPORT_SYMBOL_GPL(blkdev_nr_zones);
  * blkdev_report_zones - Get zones information
  * @bdev:	Target block device
  * @sector:	Sector from which to report zones
- * @zones:	Array of zone structures where to return the zones information
- * @nr_zones:	Number of zone structures in the zone array
+ * @nr_zones:	Maximum number of zones to report
+ * @cb:		Callback function called for each reported zone
+ * @data:	Private data for the callback
  *
  * Description:
- *    Get zone information starting from the zone containing @sector.
- *    The number of zone information reported may be less than the number
- *    requested by @nr_zones. The number of zones actually reported is
- *    returned in @nr_zones.
- *    The caller must use memalloc_noXX_save/restore() calls to control
- *    memory allocations done within this function (zone array and command
- *    buffer allocation by the device driver).
+ *    Get zone information starting from the zone containing @sector for at most
+ *    @nr_zones, and call @cb for each zone reported by the device.
+ *    To report all zones in a device starting from @sector, the BLK_ALL_ZONES
+ *    constant can be passed to @nr_zones.
+ *    Returns the number of zones reported by the device, or a negative errno
+ *    value in case of failure.
+ *
+ *    Note: The caller must use memalloc_noXX_save/restore() calls to control
+ *    memory allocations done within this function.
  */
 int blkdev_report_zones(struct block_device *bdev, sector_t sector,
-			struct blk_zone *zones, unsigned int *nr_zones)
+			unsigned int nr_zones, report_zones_cb cb, void *data)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
 	struct gendisk *disk = bdev->bd_disk;
 	sector_t capacity = get_capacity(disk);
 
-	if (!blk_queue_is_zoned(q))
-		return -EOPNOTSUPP;
-
-	/*
-	 * A block device that advertized itself as zoned must have a
-	 * report_zones method. If it does not have one defined, the device
-	 * driver has a bug. So warn about that.
-	 */
-	if (WARN_ON_ONCE(!disk->fops->report_zones))
+	if (!blk_queue_is_zoned(bdev_get_queue(bdev)) ||
+	    WARN_ON_ONCE(!disk->fops->report_zones))
 		return -EOPNOTSUPP;
 
-	if (!*nr_zones || sector >= capacity) {
-		*nr_zones = 0;
+	if (!nr_zones || sector >= capacity)
 		return 0;
-	}
 
-	*nr_zones = min(*nr_zones, __blkdev_nr_zones(q, capacity - sector));
-
-	return disk->fops->report_zones(disk, sector, zones, nr_zones);
+	return disk->fops->report_zones(disk, sector, nr_zones, cb, data);
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
@@ -232,6 +223,20 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 }
 EXPORT_SYMBOL_GPL(blkdev_zone_mgmt);
 
+struct zone_report_args {
+	struct blk_zone __user *zones;
+};
+
+static int blkdev_copy_zone_to_user(struct blk_zone *zone, unsigned int idx,
+				    void *data)
+{
+	struct zone_report_args *args = data;
+
+	if (copy_to_user(&args->zones[idx], zone, sizeof(struct blk_zone)))
+		return -EFAULT;
+	return 0;
+}
+
 /*
  * BLKREPORTZONE ioctl processing.
  * Called from blkdev_ioctl.
@@ -240,9 +245,9 @@ int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 			      unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
+	struct zone_report_args args;
 	struct request_queue *q;
 	struct blk_zone_report rep;
-	struct blk_zone *zones;
 	int ret;
 
 	if (!argp)
@@ -264,32 +269,16 @@ int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 	if (!rep.nr_zones)
 		return -EINVAL;
 
-	rep.nr_zones = min(blkdev_nr_zones(bdev), rep.nr_zones);
-
-	zones = kvmalloc_array(rep.nr_zones, sizeof(struct blk_zone),
-			       GFP_KERNEL | __GFP_ZERO);
-	if (!zones)
-		return -ENOMEM;
-
-	ret = blkdev_report_zones(bdev, rep.sector, zones, &rep.nr_zones);
-	if (ret)
-		goto out;
-
-	if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	if (rep.nr_zones) {
-		if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
-				 sizeof(struct blk_zone) * rep.nr_zones))
-			ret = -EFAULT;
-	}
+	args.zones = argp + sizeof(struct blk_zone_report);
+	ret = blkdev_report_zones(bdev, rep.sector, rep.nr_zones,
+				  blkdev_copy_zone_to_user, &args);
+	if (ret < 0)
+		return ret;
 
- out:
-	kvfree(zones);
-
-	return ret;
+	rep.nr_zones = ret;
+	if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report)))
+		return -EFAULT;
+	return 0;
 }
 
 /*
@@ -351,31 +340,6 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node,
 			    GFP_NOIO, node);
 }
 
-/*
- * Allocate an array of struct blk_zone to get nr_zones zone information.
- * The allocated array may be smaller than nr_zones.
- */
-static struct blk_zone *blk_alloc_zones(unsigned int *nr_zones)
-{
-	struct blk_zone *zones;
-	size_t nrz = min(*nr_zones, BLK_ZONED_REPORT_MAX_ZONES);
-
-	/*
-	 * GFP_KERNEL here is meaningless as the caller task context has
-	 * the PF_MEMALLOC_NOIO flag set in blk_revalidate_disk_zones()
-	 * with memalloc_noio_save().
-	 */
-	zones = kvcalloc(nrz, sizeof(struct blk_zone), GFP_KERNEL);
-	if (!zones) {
-		*nr_zones = 0;
-		return NULL;
-	}
-
-	*nr_zones = nrz;
-
-	return zones;
-}
-
 void blk_queue_free_zone_bitmaps(struct request_queue *q)
 {
 	kfree(q->seq_zones_bitmap);
@@ -384,12 +348,21 @@ void blk_queue_free_zone_bitmaps(struct request_queue *q)
 	q->seq_zones_wlock = NULL;
 }
 
+struct blk_revalidate_zone_args {
+	struct gendisk	*disk;
+	unsigned long	*seq_zones_bitmap;
+	unsigned long	*seq_zones_wlock;
+	sector_t	sector;
+};
+
 /*
  * Helper function to check the validity of zones of a zoned block device.
  */
-static bool blk_zone_valid(struct gendisk *disk, struct blk_zone *zone,
-			   sector_t *sector)
+static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
+				  void *data)
 {
+	struct blk_revalidate_zone_args *args = data;
+	struct gendisk *disk = args->disk;
 	struct request_queue *q = disk->queue;
 	sector_t zone_sectors = blk_queue_zone_sectors(q);
 	sector_t capacity = get_capacity(disk);
@@ -409,14 +382,14 @@ static bool blk_zone_valid(struct gendisk *disk, struct blk_zone *zone,
 	    zone->len > zone_sectors) {
 		pr_warn("%s: Invalid zoned device with larger last zone size\n",
 			disk->disk_name);
-		return false;
+		return -ENODEV;
 	}
 
 	/* Check for holes in the zone report */
-	if (zone->start != *sector) {
+	if (zone->start != args->sector) {
 		pr_warn("%s: Zone gap at sectors %llu..%llu\n",
-			disk->disk_name, *sector, zone->start);
-		return false;
+			disk->disk_name, args->sector, zone->start);
+		return -ENODEV;
 	}
 
 	/* Check zone type */
@@ -428,12 +401,38 @@ static bool blk_zone_valid(struct gendisk *disk, struct blk_zone *zone,
 	default:
 		pr_warn("%s: Invalid zone type 0x%x at sectors %llu\n",
 			disk->disk_name, (int)zone->type, zone->start);
-		return false;
+		return -ENODEV;
 	}
 
-	*sector += zone->len;
+	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
+		set_bit(idx, args->seq_zones_bitmap);
 
-	return true;
+	args->sector += zone->len;
+	return 0;
+}
+
+static int blk_update_zone_info(struct gendisk *disk, unsigned int nr_zones,
+				struct blk_revalidate_zone_args *args)
+{
+	/*
+	 * Ensure that all memory allocations in this context are done as
+	 * if GFP_NOIO was specified.
+	 */
+	unsigned int noio_flag = memalloc_noio_save();
+	struct request_queue *q = disk->queue;
+	int ret;
+
+	args->seq_zones_wlock = blk_alloc_zone_bitmap(q->node, nr_zones);
+	if (!args->seq_zones_wlock)
+		return -ENOMEM;
+	args->seq_zones_bitmap = blk_alloc_zone_bitmap(q->node, nr_zones);
+	if (!args->seq_zones_bitmap)
+		return -ENOMEM;
+
+	ret = disk->fops->report_zones(disk, 0, nr_zones,
+				       blk_revalidate_zone_cb, args);
+	memalloc_noio_restore(noio_flag);
+	return ret;
 }
 
 /**
@@ -449,11 +448,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
 	unsigned int nr_zones = __blkdev_nr_zones(q, get_capacity(disk));
-	unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL;
-	unsigned int i, rep_nr_zones = 0, z = 0, nrz;
-	struct blk_zone *zones = NULL;
-	unsigned int noio_flag;
-	sector_t sector = 0;
+	struct blk_revalidate_zone_args args = { .disk = disk };
 	int ret = 0;
 
 	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
@@ -468,82 +463,28 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 		return 0;
 	}
 
-	/*
-	 * Ensure that all memory allocations in this context are done as
-	 * if GFP_NOIO was specified.
-	 */
-	noio_flag = memalloc_noio_save();
-
-	if (!nr_zones)
-		goto update;
-
-	/* Allocate bitmaps */
-	ret = -ENOMEM;
-	seq_zones_wlock = blk_alloc_zone_bitmap(q->node, nr_zones);
-	if (!seq_zones_wlock)
-		goto out;
-	seq_zones_bitmap = blk_alloc_zone_bitmap(q->node, nr_zones);
-	if (!seq_zones_bitmap)
-		goto out;
-
-	/*
-	 * Get zone information to check the zones and initialize
-	 * seq_zones_bitmap.
-	 */
-	rep_nr_zones = nr_zones;
-	zones = blk_alloc_zones(&rep_nr_zones);
-	if (!zones)
-		goto out;
-
-	while (z < nr_zones) {
-		nrz = min(nr_zones - z, rep_nr_zones);
-		ret = disk->fops->report_zones(disk, sector, zones, &nrz);
-		if (ret)
-			goto out;
-		if (!nrz)
-			break;
-		for (i = 0; i < nrz; i++) {
-			if (!blk_zone_valid(disk, &zones[i], &sector)) {
-				ret = -ENODEV;
-				goto out;
-			}
-			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
-				set_bit(z, seq_zones_bitmap);
-			z++;
-		}
-	}
-
-	if (WARN_ON(z != nr_zones)) {
-		ret = -EIO;
-		goto out;
-	}
+	if (nr_zones)
+		ret = blk_update_zone_info(disk, nr_zones, &args);
 
-update:
 	/*
 	 * Install the new bitmaps, making sure the queue is stopped and
 	 * all I/Os are completed (i.e. a scheduler is not referencing the
 	 * bitmaps).
 	 */
 	blk_mq_freeze_queue(q);
-	q->nr_zones = nr_zones;
-	swap(q->seq_zones_wlock, seq_zones_wlock);
-	swap(q->seq_zones_bitmap, seq_zones_bitmap);
-	blk_mq_unfreeze_queue(q);
-
-out:
-	memalloc_noio_restore(noio_flag);
-
-	kvfree(zones);
-	kfree(seq_zones_wlock);
-	kfree(seq_zones_bitmap);
-
-	if (ret) {
+	if (ret >= 0) {
+		q->nr_zones = nr_zones;
+		swap(q->seq_zones_wlock, args.seq_zones_wlock);
+		swap(q->seq_zones_bitmap, args.seq_zones_bitmap);
+		ret = 0;
+	} else {
 		pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
-		blk_mq_freeze_queue(q);
 		blk_queue_free_zone_bitmaps(q);
-		blk_mq_unfreeze_queue(q);
 	}
+	blk_mq_unfreeze_queue(q);
 
+	kfree(args.seq_zones_wlock);
+	kfree(args.seq_zones_bitmap);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);
diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 9bf56fbab091..bc837862b767 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -92,7 +92,7 @@ struct nullb {
 int null_zone_init(struct nullb_device *dev);
 void null_zone_exit(struct nullb_device *dev);
 int null_report_zones(struct gendisk *disk, sector_t sector,
-		      struct blk_zone *zones, unsigned int *nr_zones);
+		      unsigned int nr_zones, report_zones_cb cb, void *data);
 blk_status_t null_handle_zoned(struct nullb_cmd *cmd,
 				enum req_opf op, sector_t sector,
 				sector_t nr_sectors);
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 00696f16664b..d4d88b581822 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -67,21 +67,34 @@ void null_zone_exit(struct nullb_device *dev)
 }
 
 int null_report_zones(struct gendisk *disk, sector_t sector,
-		      struct blk_zone *zones, unsigned int *nr_zones)
+		unsigned int nr_zones, report_zones_cb cb, void *data)
 {
 	struct nullb *nullb = disk->private_data;
 	struct nullb_device *dev = nullb->dev;
-	unsigned int zno, nrz = 0;
+	unsigned int first_zone, i;
+	struct blk_zone zone;
+	int error;
 
-	zno = null_zone_no(dev, sector);
-	if (zno < dev->nr_zones) {
-		nrz = min_t(unsigned int, *nr_zones, dev->nr_zones - zno);
-		memcpy(zones, &dev->zones[zno], nrz * sizeof(struct blk_zone));
-	}
+	first_zone = null_zone_no(dev, sector);
+	if (first_zone >= dev->nr_zones)
+		return 0;
 
-	*nr_zones = nrz;
+	nr_zones = min(nr_zones, dev->nr_zones - first_zone);
+	for (i = 0; i < nr_zones; i++) {
+		/*
+		 * Stacked DM target drivers will remap the zone information by
+		 * modifying the zone information passed to the report callback.
+		 * So use a local copy to avoid corruption of the device zone
+		 * array.
+		 */
+		memcpy(&zone, &dev->zones[first_zone + i],
+		       sizeof(struct blk_zone));
+		error = cb(&zone, i, data);
+		if (error)
+			return error;
+	}
 
-	return 0;
+	return nr_zones;
 }
 
 size_t null_zone_valid_read_len(struct nullb *nullb,
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 76587e9af0ef..a2cc9e45cbba 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -459,21 +459,15 @@ static int flakey_prepare_ioctl(struct dm_target *ti, struct block_device **bdev
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
-static int flakey_report_zones(struct dm_target *ti, sector_t sector,
-			       struct blk_zone *zones, unsigned int *nr_zones)
+static int flakey_report_zones(struct dm_target *ti,
+		struct dm_report_zones_args *args, unsigned int nr_zones)
 {
 	struct flakey_c *fc = ti->private;
-	int ret;
+	sector_t sector = flakey_map_sector(ti, args->next_sector);
 
-	/* Do report and remap it */
-	ret = blkdev_report_zones(fc->dev->bdev, flakey_map_sector(ti, sector),
-				  zones, nr_zones);
-	if (ret != 0)
-		return ret;
-
-	if (*nr_zones)
-		dm_remap_zone_report(ti, fc->start, zones, nr_zones);
-	return 0;
+	args->start = fc->start;
+	return blkdev_report_zones(fc->dev->bdev, sector, nr_zones,
+				   dm_report_zones_cb, args);
 }
 #endif
 
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 97acafd48c85..8d07fdf63a47 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -136,21 +136,15 @@ static int linear_prepare_ioctl(struct dm_target *ti, struct block_device **bdev
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
-static int linear_report_zones(struct dm_target *ti, sector_t sector,
-			       struct blk_zone *zones, unsigned int *nr_zones)
+static int linear_report_zones(struct dm_target *ti,
+		struct dm_report_zones_args *args, unsigned int nr_zones)
 {
-	struct linear_c *lc = (struct linear_c *) ti->private;
-	int ret;
-
-	/* Do report and remap it */
-	ret = blkdev_report_zones(lc->dev->bdev, linear_map_sector(ti, sector),
-				  zones, nr_zones);
-	if (ret != 0)
-		return ret;
+	struct linear_c *lc = ti->private;
+	sector_t sector = linear_map_sector(ti, args->next_sector);
 
-	if (*nr_zones)
-		dm_remap_zone_report(ti, lc->start, zones, nr_zones);
-	return 0;
+	args->start = lc->start;
+	return blkdev_report_zones(lc->dev->bdev, sector, nr_zones,
+				   dm_report_zones_cb, args);
 }
 #endif
 
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index feb4718ce6a6..069e4675da6b 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1080,9 +1080,10 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 /*
  * Initialize a zone descriptor.
  */
-static int dmz_init_zone(struct dmz_metadata *zmd, struct dm_zone *zone,
-			 struct blk_zone *blkz)
+static int dmz_init_zone(struct blk_zone *blkz, unsigned int idx, void *data)
 {
+	struct dmz_metadata *zmd = data;
+	struct dm_zone *zone = &zmd->zones[idx];
 	struct dmz_dev *dev = zmd->dev;
 
 	/* Ignore the eventual last runt (smaller) zone */
@@ -1096,26 +1097,29 @@ static int dmz_init_zone(struct dmz_metadata *zmd, struct dm_zone *zone,
 	atomic_set(&zone->refcount, 0);
 	zone->chunk = DMZ_MAP_UNMAPPED;
 
-	if (blkz->type == BLK_ZONE_TYPE_CONVENTIONAL) {
+	switch (blkz->type) {
+	case BLK_ZONE_TYPE_CONVENTIONAL:
 		set_bit(DMZ_RND, &zone->flags);
 		zmd->nr_rnd_zones++;
-	} else if (blkz->type == BLK_ZONE_TYPE_SEQWRITE_REQ ||
-		   blkz->type == BLK_ZONE_TYPE_SEQWRITE_PREF) {
+		break;
+	case BLK_ZONE_TYPE_SEQWRITE_REQ:
+	case BLK_ZONE_TYPE_SEQWRITE_PREF:
 		set_bit(DMZ_SEQ, &zone->flags);
-	} else
+		break;
+	default:
 		return -ENXIO;
-
-	if (blkz->cond == BLK_ZONE_COND_OFFLINE)
-		set_bit(DMZ_OFFLINE, &zone->flags);
-	else if (blkz->cond == BLK_ZONE_COND_READONLY)
-		set_bit(DMZ_READ_ONLY, &zone->flags);
+	}
 
 	if (dmz_is_rnd(zone))
 		zone->wp_block = 0;
 	else
 		zone->wp_block = dmz_sect2blk(blkz->wp - blkz->start);
 
-	if (!dmz_is_offline(zone) && !dmz_is_readonly(zone)) {
+	if (blkz->cond == BLK_ZONE_COND_OFFLINE)
+		set_bit(DMZ_OFFLINE, &zone->flags);
+	else if (blkz->cond == BLK_ZONE_COND_READONLY)
+		set_bit(DMZ_READ_ONLY, &zone->flags);
+	else {
 		zmd->nr_useable_zones++;
 		if (dmz_is_rnd(zone)) {
 			zmd->nr_rnd_zones++;
@@ -1138,12 +1142,6 @@ static void dmz_drop_zones(struct dmz_metadata *zmd)
 	zmd->zones = NULL;
 }
 
-/*
- * The size of a zone report in number of zones.
- * This results in 4096*64B=256KB report zones commands.
- */
-#define DMZ_REPORT_NR_ZONES	4096
-
 /*
  * Allocate and initialize zone descriptors using the zone
  * information from disk.
@@ -1151,11 +1149,7 @@ static void dmz_drop_zones(struct dmz_metadata *zmd)
 static int dmz_init_zones(struct dmz_metadata *zmd)
 {
 	struct dmz_dev *dev = zmd->dev;
-	struct dm_zone *zone;
-	struct blk_zone *blkz;
-	unsigned int nr_blkz;
-	sector_t sector = 0;
-	int i, ret = 0;
+	int ret;
 
 	/* Init */
 	zmd->zone_bitmap_size = dev->zone_nr_blocks >> 3;
@@ -1169,54 +1163,38 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 	dmz_dev_info(dev, "Using %zu B for zone information",
 		     sizeof(struct dm_zone) * dev->nr_zones);
 
-	/* Get zone information */
-	nr_blkz = DMZ_REPORT_NR_ZONES;
-	blkz = kcalloc(nr_blkz, sizeof(struct blk_zone), GFP_KERNEL);
-	if (!blkz) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
 	/*
-	 * Get zone information and initialize zone descriptors.
-	 * At the same time, determine where the super block
-	 * should be: first block of the first randomly writable
-	 * zone.
+	 * Get zone information and initialize zone descriptors.  At the same
+	 * time, determine where the super block should be: first block of the
+	 * first randomly writable zone.
 	 */
-	zone = zmd->zones;
-	while (sector < dev->capacity) {
-		/* Get zone information */
-		nr_blkz = DMZ_REPORT_NR_ZONES;
-		ret = blkdev_report_zones(dev->bdev, sector, blkz, &nr_blkz);
-		if (ret) {
-			dmz_dev_err(dev, "Report zones failed %d", ret);
-			goto out;
-		}
+	ret = blkdev_report_zones(dev->bdev, 0, BLK_ALL_ZONES, dmz_init_zone,
+				  zmd);
+	if (ret < 0) {
+		dmz_drop_zones(zmd);
+		return ret;
+	}
 
-		if (!nr_blkz)
-			break;
+	return 0;
+}
 
-		/* Process report */
-		for (i = 0; i < nr_blkz; i++) {
-			ret = dmz_init_zone(zmd, zone, &blkz[i]);
-			if (ret)
-				goto out;
-			sector += dev->zone_nr_sectors;
-			zone++;
-		}
-	}
+static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
+			      void *data)
+{
+	struct dm_zone *zone = data;
 
-	/* The entire zone configuration of the disk should now be known */
-	if (sector < dev->capacity) {
-		dmz_dev_err(dev, "Failed to get correct zone information");
-		ret = -ENXIO;
-	}
-out:
-	kfree(blkz);
-	if (ret)
-		dmz_drop_zones(zmd);
+	clear_bit(DMZ_OFFLINE, &zone->flags);
+	clear_bit(DMZ_READ_ONLY, &zone->flags);
+	if (blkz->cond == BLK_ZONE_COND_OFFLINE)
+		set_bit(DMZ_OFFLINE, &zone->flags);
+	else if (blkz->cond == BLK_ZONE_COND_READONLY)
+		set_bit(DMZ_READ_ONLY, &zone->flags);
 
-	return ret;
+	if (dmz_is_seq(zone))
+		zone->wp_block = dmz_sect2blk(blkz->wp - blkz->start);
+	else
+		zone->wp_block = 0;
+	return 0;
 }
 
 /*
@@ -1224,9 +1202,7 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
  */
 static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 {
-	unsigned int nr_blkz = 1;
 	unsigned int noio_flag;
-	struct blk_zone blkz;
 	int ret;
 
 	/*
@@ -1236,29 +1212,18 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	 * GFP_NOIO was specified.
 	 */
 	noio_flag = memalloc_noio_save();
-	ret = blkdev_report_zones(zmd->dev->bdev, dmz_start_sect(zmd, zone),
-				  &blkz, &nr_blkz);
+	ret = blkdev_report_zones(zmd->dev->bdev, dmz_start_sect(zmd, zone), 1,
+				  dmz_update_zone_cb, zone);
 	memalloc_noio_restore(noio_flag);
-	if (!nr_blkz)
+
+	if (ret == 0)
 		ret = -EIO;
-	if (ret) {
+	if (ret < 0) {
 		dmz_dev_err(zmd->dev, "Get zone %u report failed",
 			    dmz_id(zmd, zone));
 		return ret;
 	}
 
-	clear_bit(DMZ_OFFLINE, &zone->flags);
-	clear_bit(DMZ_READ_ONLY, &zone->flags);
-	if (blkz.cond == BLK_ZONE_COND_OFFLINE)
-		set_bit(DMZ_OFFLINE, &zone->flags);
-	else if (blkz.cond == BLK_ZONE_COND_READONLY)
-		set_bit(DMZ_READ_ONLY, &zone->flags);
-
-	if (dmz_is_seq(zone))
-		zone->wp_block = dmz_sect2blk(blkz.wp - blkz.start);
-	else
-		zone->wp_block = 0;
-
 	return 0;
 }
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 76f4cfdd6b41..e8f9661a10a1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -440,14 +440,48 @@ static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 	return dm_get_geometry(md, geo);
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+{
+	struct dm_report_zones_args *args = data;
+	sector_t sector_diff = args->tgt->begin - args->start;
+
+	/*
+	 * Ignore zones beyond the target range.
+	 */
+	if (zone->start >= args->start + args->tgt->len)
+		return 0;
+
+	/*
+	 * Remap the start sector and write pointer position of the zone
+	 * to match its position in the target range.
+	 */
+	zone->start += sector_diff;
+	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
+		if (zone->cond == BLK_ZONE_COND_FULL)
+			zone->wp = zone->start + zone->len;
+		else if (zone->cond == BLK_ZONE_COND_EMPTY)
+			zone->wp = zone->start;
+		else
+			zone->wp += sector_diff;
+	}
+
+	args->next_sector = zone->start + zone->len;
+	return args->orig_cb(zone, args->zone_idx++, args->orig_data);
+}
+EXPORT_SYMBOL_GPL(dm_report_zones_cb);
+
 static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
-			       struct blk_zone *zones, unsigned int *nr_zones)
+		unsigned int nr_zones, report_zones_cb cb, void *data)
 {
-#ifdef CONFIG_BLK_DEV_ZONED
 	struct mapped_device *md = disk->private_data;
-	struct dm_target *tgt;
 	struct dm_table *map;
 	int srcu_idx, ret;
+	struct dm_report_zones_args args = {
+		.next_sector = sector,
+		.orig_data = data,
+		.orig_cb = cb,
+	};
 
 	if (dm_suspended_md(md))
 		return -EAGAIN;
@@ -456,32 +490,30 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
 	if (!map)
 		return -EIO;
 
-	tgt = dm_table_find_target(map, sector);
-	if (!tgt) {
-		ret = -EIO;
-		goto out;
-	}
+	do {
+		struct dm_target *tgt;
 
-	/*
-	 * If we are executing this, we already know that the block device
-	 * is a zoned device and so each target should have support for that
-	 * type of drive. A missing report_zones method means that the target
-	 * driver has a problem.
-	 */
-	if (WARN_ON(!tgt->type->report_zones)) {
-		ret = -EIO;
-		goto out;
-	}
+		tgt = dm_table_find_target(map, args.next_sector);
+		if (WARN_ON_ONCE(!tgt->type->report_zones)) {
+			ret = -EIO;
+			goto out;
+		}
 
-	ret = tgt->type->report_zones(tgt, sector, zones, nr_zones);
+		args.tgt = tgt;
+		ret = tgt->type->report_zones(tgt, &args, nr_zones);
+		if (ret < 0)
+			goto out;
+	} while (args.zone_idx < nr_zones &&
+		 args.next_sector < get_capacity(disk));
 
+	ret = args.zone_idx;
 out:
 	dm_put_live_table(md, srcu_idx);
 	return ret;
-#else
-	return -ENOTSUPP;
-#endif
 }
+#else
+#define dm_blk_report_zones		NULL
+#endif /* CONFIG_BLK_DEV_ZONED */
 
 static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
 			    struct block_device **bdev)
@@ -1207,51 +1239,6 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
-/*
- * The zone descriptors obtained with a zone report indicate
- * zone positions within the underlying device of the target. The zone
- * descriptors must be remapped to match their position within the dm device.
- */
-void dm_remap_zone_report(struct dm_target *ti, sector_t start,
-			  struct blk_zone *zones, unsigned int *nr_zones)
-{
-#ifdef CONFIG_BLK_DEV_ZONED
-	struct blk_zone *zone;
-	unsigned int nrz = *nr_zones;
-	int i;
-
-	/*
-	 * Remap the start sector and write pointer position of the zones in
-	 * the array. Since we may have obtained from the target underlying
-	 * device more zones that the target size, also adjust the number
-	 * of zones.
-	 */
-	for (i = 0; i < nrz; i++) {
-		zone = zones + i;
-		if (zone->start >= start + ti->len) {
-			memset(zone, 0, sizeof(struct blk_zone) * (nrz - i));
-			break;
-		}
-
-		zone->start = zone->start + ti->begin - start;
-		if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
-			continue;
-
-		if (zone->cond == BLK_ZONE_COND_FULL)
-			zone->wp = zone->start + zone->len;
-		else if (zone->cond == BLK_ZONE_COND_EMPTY)
-			zone->wp = zone->start;
-		else
-			zone->wp = zone->wp + ti->begin - start;
-	}
-
-	*nr_zones = i;
-#else /* !CONFIG_BLK_DEV_ZONED */
-	*nr_zones = 0;
-#endif
-}
-EXPORT_SYMBOL_GPL(dm_remap_zone_report);
-
 static blk_qc_t __map_bio(struct dm_target_io *tio)
 {
 	int r;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index bf2102a749bc..42fd3f00e4a5 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -213,8 +213,8 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
 					 unsigned char op, bool all);
 extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 			    struct scsi_sense_hdr *sshdr);
-extern int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
-			       struct blk_zone *zones, unsigned int *nr_zones);
+int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
+		unsigned int nr_zones, report_zones_cb cb, void *data);
 
 #else /* CONFIG_BLK_DEV_ZONED */
 
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index f191af15de1b..62b908aa1d21 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -19,34 +19,27 @@
 
 #include "sd.h"
 
-/**
- * sd_zbc_parse_report - Convert a zone descriptor to a struct blk_zone,
- * @sdkp: The disk the report originated from
- * @buf: Address of the report zone descriptor
- * @zone: the destination zone structure
- *
- * All LBA sized values are converted to 512B sectors unit.
- */
-static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
-				struct blk_zone *zone)
+static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
+			       unsigned int idx, report_zones_cb cb, void *data)
 {
 	struct scsi_device *sdp = sdkp->device;
+	struct blk_zone zone = { 0 };
 
-	memset(zone, 0, sizeof(struct blk_zone));
-
-	zone->type = buf[0] & 0x0f;
-	zone->cond = (buf[1] >> 4) & 0xf;
+	zone.type = buf[0] & 0x0f;
+	zone.cond = (buf[1] >> 4) & 0xf;
 	if (buf[1] & 0x01)
-		zone->reset = 1;
+		zone.reset = 1;
 	if (buf[1] & 0x02)
-		zone->non_seq = 1;
-
-	zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
-	zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
-	zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
-	if (zone->type != ZBC_ZONE_TYPE_CONV &&
-	    zone->cond == ZBC_ZONE_COND_FULL)
-		zone->wp = zone->start + zone->len;
+		zone.non_seq = 1;
+
+	zone.len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
+	zone.start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
+	zone.wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
+	if (zone.type != ZBC_ZONE_TYPE_CONV &&
+	    zone.cond == ZBC_ZONE_COND_FULL)
+		zone.wp = zone.start + zone.len;
+
+	return cb(&zone, idx, data);
 }
 
 /**
@@ -152,60 +145,61 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
 }
 
 /**
- * sd_zbc_report_zones - Disk report zones operation.
- * @disk: The target disk
- * @sector: Start 512B sector of the report
- * @zones: Array of zone descriptors
- * @nr_zones: Number of descriptors in the array
- *
- * Execute a report zones command on the target disk.
+ * sd_zbc_zone_sectors - Get the device zone size in number of 512B sectors.
+ * @sdkp: The target disk
  */
+static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
+{
+	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
+}
+
 int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
-			struct blk_zone *zones, unsigned int *nr_zones)
+			unsigned int nr_zones, report_zones_cb cb, void *data)
 {
 	struct scsi_disk *sdkp = scsi_disk(disk);
-	unsigned int i, nrz = *nr_zones;
+	unsigned int nr, i;
 	unsigned char *buf;
-	size_t buflen = 0, offset = 0;
-	int ret = 0;
+	size_t offset, buflen = 0;
+	int zone_idx = 0;
+	int ret;
 
 	if (!sd_is_zoned(sdkp))
 		/* Not a zoned device */
 		return -EOPNOTSUPP;
 
-	buf = sd_zbc_alloc_report_buffer(sdkp, nrz, &buflen);
+	buf = sd_zbc_alloc_report_buffer(sdkp, nr_zones, &buflen);
 	if (!buf)
 		return -ENOMEM;
 
-	ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
-			sectors_to_logical(sdkp->device, sector), true);
-	if (ret)
-		goto out;
+	while (zone_idx < nr_zones && sector < get_capacity(disk)) {
+		ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
+				sectors_to_logical(sdkp->device, sector), true);
+		if (ret)
+			goto out;
+
+		offset = 0;
+		nr = min(nr_zones, get_unaligned_be32(&buf[0]) / 64);
+		if (!nr)
+			break;
+
+		for (i = 0; i < nr && zone_idx < nr_zones; i++) {
+			offset += 64;
+			ret = sd_zbc_parse_report(sdkp, buf + offset, zone_idx,
+						  cb, data);
+			if (ret)
+				goto out;
+			zone_idx++;
+		}
 
-	nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
-	for (i = 0; i < nrz; i++) {
-		offset += 64;
-		sd_zbc_parse_report(sdkp, buf + offset, zones);
-		zones++;
+		sector += sd_zbc_zone_sectors(sdkp) * i;
 	}
 
-	*nr_zones = nrz;
-
+	ret = zone_idx;
 out:
 	kvfree(buf);
-
 	return ret;
 }
 
-/**
- * sd_zbc_zone_sectors - Get the device zone size in number of 512B sectors.
- * @sdkp: The target disk
- */
-static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
-{
-	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
-}
-
 /**
  * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations
  *			can be RESET WRITE POINTER, OPEN, CLOSE or FINISH.
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1443cee15863..95761740cf1f 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2857,15 +2857,21 @@ static int init_percpu_info(struct f2fs_sb_info *sbi)
 }
 
 #ifdef CONFIG_BLK_DEV_ZONED
+static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx,
+			       void *data)
+{
+	struct f2fs_dev_info *dev = data;
+
+	if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL)
+		set_bit(idx, dev->blkz_seq);
+	return 0;
+}
+
 static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 {
 	struct block_device *bdev = FDEV(devi).bdev;
 	sector_t nr_sectors = bdev->bd_part->nr_sects;
-	sector_t sector = 0;
-	struct blk_zone *zones;
-	unsigned int i, nr_zones;
-	unsigned int n = 0;
-	int err = -EIO;
+	int ret;
 
 	if (!f2fs_sb_has_blkzoned(sbi))
 		return 0;
@@ -2890,38 +2896,13 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	if (!FDEV(devi).blkz_seq)
 		return -ENOMEM;
 
-#define F2FS_REPORT_NR_ZONES   4096
-
-	zones = f2fs_kzalloc(sbi,
-			     array_size(F2FS_REPORT_NR_ZONES,
-					sizeof(struct blk_zone)),
-			     GFP_KERNEL);
-	if (!zones)
-		return -ENOMEM;
-
 	/* Get block zones type */
-	while (zones && sector < nr_sectors) {
-
-		nr_zones = F2FS_REPORT_NR_ZONES;
-		err = blkdev_report_zones(bdev, sector, zones, &nr_zones);
-		if (err)
-			break;
-		if (!nr_zones) {
-			err = -EIO;
-			break;
-		}
-
-		for (i = 0; i < nr_zones; i++) {
-			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL)
-				set_bit(n, FDEV(devi).blkz_seq);
-			sector += zones[i].len;
-			n++;
-		}
-	}
-
-	kvfree(zones);
+	ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb,
+				  &FDEV(devi));
+	if (ret < 0)
+		return ret;
 
-	return err;
+	return 0;
 }
 #endif
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dbef541c2530..17e384aa5a94 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -349,17 +349,16 @@ struct queue_limits {
 	enum blk_zoned_model	zoned;
 };
 
+typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx,
+			       void *data);
+
 #ifdef CONFIG_BLK_DEV_ZONED
 
-/*
- * Maximum number of zones to report with a single report zones command.
- */
-#define BLK_ZONED_REPORT_MAX_ZONES	8192U
+#define BLK_ALL_ZONES  ((unsigned int)-1)
+int blkdev_report_zones(struct block_device *bdev, sector_t sector,
+			unsigned int nr_zones, report_zones_cb cb, void *data);
 
 extern unsigned int blkdev_nr_zones(struct block_device *bdev);
-extern int blkdev_report_zones(struct block_device *bdev,
-			       sector_t sector, struct blk_zone *zones,
-			       unsigned int *nr_zones);
 extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 			    sector_t sectors, sector_t nr_sectors,
 			    gfp_t gfp_mask);
@@ -1710,7 +1709,7 @@ struct block_device_operations {
 	/* this callback is with swap_lock and sometimes page table lock held */
 	void (*swap_slot_free_notify) (struct block_device *, unsigned long);
 	int (*report_zones)(struct gendisk *, sector_t sector,
-			    struct blk_zone *zones, unsigned int *nr_zones);
+			unsigned int nr_zones, report_zones_cb cb, void *data);
 	struct module *owner;
 	const struct pr_ops *pr_ops;
 };
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 399ad8632356..a164cc81b710 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -17,6 +17,7 @@
 struct dm_dev;
 struct dm_target;
 struct dm_table;
+struct dm_report_zones_args;
 struct mapped_device;
 struct bio_vec;
 
@@ -93,9 +94,9 @@ typedef int (*dm_message_fn) (struct dm_target *ti, unsigned argc, char **argv,
 
 typedef int (*dm_prepare_ioctl_fn) (struct dm_target *ti, struct block_device **bdev);
 
-typedef int (*dm_report_zones_fn) (struct dm_target *ti, sector_t sector,
-				   struct blk_zone *zones,
-				   unsigned int *nr_zones);
+typedef int (*dm_report_zones_fn) (struct dm_target *ti,
+				   struct dm_report_zones_args *args,
+				   unsigned int nr_zones);
 
 /*
  * These iteration functions are typically used to check (and combine)
@@ -422,10 +423,23 @@ struct gendisk *dm_disk(struct mapped_device *md);
 int dm_suspended(struct dm_target *ti);
 int dm_noflush_suspending(struct dm_target *ti);
 void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
-void dm_remap_zone_report(struct dm_target *ti, sector_t start,
-			  struct blk_zone *zones, unsigned int *nr_zones);
 union map_info *dm_get_rq_mapinfo(struct request *rq);
 
+#ifdef CONFIG_BLK_DEV_ZONED
+struct dm_report_zones_args {
+	struct dm_target *tgt;
+	sector_t next_sector;
+
+	void *orig_data;
+	report_zones_cb orig_cb;
+	unsigned int zone_idx;
+
+	/* must be filled by ->report_zones before calling dm_report_zones_cb */
+	sector_t start;
+};
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data);
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 /*
  * Device mapper functions to parse and create devices specified by the
  * parameter "dm-mod.create="
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 1/9] block: Enhance blk_revalidate_disk_zones()
  2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
@ 2019-11-08  6:28   ` Christoph Hellwig
  2019-11-08  7:10   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2019-11-08  6:28 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 4/9] block: Remove partition support for zoned block devices
  2019-11-08  1:56 ` [PATCH 4/9] block: Remove partition support for zoned block devices Damien Le Moal
@ 2019-11-08  6:30   ` Christoph Hellwig
  2019-11-08  7:17   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2019-11-08  6:30 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Note that this has a conflict with my series that cleans up the area
around rescan_partitions, but that series will only get simpler after
this patch is merged.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
@ 2019-11-08  6:31   ` Christoph Hellwig
  2019-11-08  7:20   ` Hannes Reinecke
  2019-11-08 19:06   ` [dm-devel] " Bart Van Assche
  2 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2019-11-08  6:31 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, Mike Snitzer, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/9] block: Enhance blk_revalidate_disk_zones()
  2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
  2019-11-08  6:28   ` [dm-devel] " Christoph Hellwig
@ 2019-11-08  7:10   ` Hannes Reinecke
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:10 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> For ZBC and ZAC zoned devices, the scsi driver revalidation processing
> implemented by sd_revalidate_disk() includes a call to
> sd_zbc_read_zones() which executes a full disk zone report used to
> check that all zones of the disk are the same size. This processing is
> followed by a call to blk_revalidate_disk_zones(), used to initialize
> the device request queue zone bitmaps (zone type and zone write lock
> bitmaps). To do so, blk_revalidate_disk_zones() also executes a full
> device zone report to obtain zone types. As a result, the entire
> zoned block device revalidation process includes two full device zone
> report.
> 
> By moving the zone size checks into blk_revalidate_disk_zones(), this
> process can be optimized to a single full device zone report, leading to
> shorter device scan and revalidation times. This patch implements this
> optimization, reducing the original full device zone report implemented
> in sd_zbc_check_zones() to a single, small, report zones command
> execution to obtain the size of the first zone of the device. Checks
> whether all zones of the device are the same size as the first zone
> size are moved to the generic blk_check_zone() function called from
> blk_revalidate_disk_zones().
> 
> This optimization also has the following benefits:
> 1) fewer memory allocations in the scsi layer during disk revalidation
>    as the potentailly large buffer for zone report execution is not
>    needed.
> 2) Implement zone checks in a generic manner, reducing the burden on
>    device driver which only need to obtain the zone size and check that
>    this size is a power of 2 number of LBAs. Any new type of zoned
>    block device will benefit from this.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/blk-zoned.c     |  62 +++++++++++++++++++++++-
>  drivers/scsi/sd_zbc.c | 107 ++++++++----------------------------------
>  2 files changed, 80 insertions(+), 89 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones
  2019-11-08  1:56 ` [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones Damien Le Moal
@ 2019-11-08  7:11   ` Hannes Reinecke
  2019-11-08 18:50   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:11 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> blk_revalidate_disk_zones is never called for non-zoned devices.  Just
> return early and warn instead of trying to handle this case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/blk-zoned.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index dae787f67019..523a28d7a15c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -520,6 +520,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
>  	sector_t sector = 0;
>  	int ret = 0;
>  
> +	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
> +		return -EIO;
> +
>  	/*
>  	 * BIO based queues do not use a scheduler so only q->nr_zones
>  	 * needs to be updated so that the sysfs exposed value is correct.
> @@ -535,10 +538,8 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
>  	 */
>  	noio_flag = memalloc_noio_save();
>  
> -	if (!blk_queue_is_zoned(q) || !nr_zones) {
> -		nr_zones = 0;
> +	if (!nr_zones)
>  		goto update;
> -	}
>  
>  	/* Allocate bitmaps */
>  	ret = -ENOMEM;
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/9] block: Simplify report zones execution
  2019-11-08  1:56 ` [PATCH 3/9] block: Simplify report zones execution Damien Le Moal
@ 2019-11-08  7:12   ` Hannes Reinecke
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:12 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> All kernel users of blkdev_report_zones() as well as applications use
> through ioctl(BLKZONEREPORT) expect to potentially get less zone
> descriptors than requested. As such, the use of the internal report
> zones command execution loop implemented by blk_report_zones() is
> not necessary and can even be harmful to performance by causing the
> execution of inefficient small zones report command to service the
> reminder of a requested zone array.
> 
> This patch removes blk_report_zones(), simplifying the code. Also
> remove a now incorrect comment in dm_blk_report_zones().
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Javier Gonzalez <javier@javigon.com>
> ---
>  block/blk-zoned.c | 34 +++++-----------------------------
>  drivers/md/dm.c   |  6 ------
>  2 files changed, 5 insertions(+), 35 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/9] block: Remove partition support for zoned block devices
  2019-11-08  1:56 ` [PATCH 4/9] block: Remove partition support for zoned block devices Damien Le Moal
  2019-11-08  6:30   ` [dm-devel] " Christoph Hellwig
@ 2019-11-08  7:17   ` Hannes Reinecke
  2019-11-08  7:28     ` Damien Le Moal
  1 sibling, 1 reply; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:17 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> No known partitioning tool supports zoned block devices, especially the
> host managed flavor with strong sequential write constraints.
> Furthermore, there are also no known user nor use cases for partitioned
> zoned block devices.
> 
> This patch removes partition device creation for zoned block devices,
> which allows simplifying the processing of zone commands for zoned
> block devices. A warning is added if a partition table is found on the
> device.
> 
> For report zones operations no zone sector information remapping is
> necessary anymore, simplifying the code. Of note is that remapping of
> zone reports for DM targets is still necessary as done by
> dm_remap_zone_report().
> 
> Similarly, remaping of a zone reset bio is not necessary anymore.
> Testing for the applicability of the zone reset all request also becomes
> simpler and only needs to check that the number of sectors of the
> requested zone range is equal to the disk capacity.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/blk-core.c          |  6 +---
>  block/blk-zoned.c         | 62 ++++++--------------------------
>  block/partition-generic.c | 74 +++++----------------------------------
>  drivers/md/dm.c           |  3 --
>  4 files changed, 21 insertions(+), 124 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3306a3c5bed6..df6b70476187 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -851,11 +851,7 @@ static inline int blk_partition_remap(struct bio *bio)
>  	if (unlikely(bio_check_ro(bio, p)))
>  		goto out;
>  
> -	/*
> -	 * Zone management bios do not have a sector count but they do have
> -	 * a start sector filled out and need to be remapped.
> -	 */
> -	if (bio_sectors(bio) || op_is_zone_mgmt(bio_op(bio))) {
> +	if (bio_sectors(bio)) {
>  		if (bio_check_eod(bio, part_nr_sects_read(p)))
>  			goto out;
>  		bio->bi_iter.bi_sector += p->start_sect;
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index ea4e086ba00e..ae665e490858 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -93,32 +93,10 @@ unsigned int blkdev_nr_zones(struct block_device *bdev)
>  	if (!blk_queue_is_zoned(q))
>  		return 0;
>  
> -	return __blkdev_nr_zones(q, bdev->bd_part->nr_sects);
> +	return __blkdev_nr_zones(q, get_capacity(bdev->bd_disk));
>  }
>  EXPORT_SYMBOL_GPL(blkdev_nr_zones);
>  
> -/*
> - * Check that a zone report belongs to this partition, and if yes, fix its start
> - * sector and write pointer and return true. Return false otherwise.
> - */
> -static bool blkdev_report_zone(struct block_device *bdev, struct blk_zone *rep)
> -{
> -	sector_t offset = get_start_sect(bdev);
> -
> -	if (rep->start < offset)
> -		return false;
> -
> -	rep->start -= offset;
> -	if (rep->start + rep->len > bdev->bd_part->nr_sects)
> -		return false;
> -
> -	if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
> -		rep->wp = rep->start + rep->len;
> -	else
> -		rep->wp -= offset;
> -	return true;
> -}
> -
>  /**
>   * blkdev_report_zones - Get zones information
>   * @bdev:	Target block device
> @@ -140,8 +118,7 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
>  	struct gendisk *disk = bdev->bd_disk;
> -	unsigned int i, nrz;
> -	int ret;
> +	sector_t capacity = get_capacity(disk);
>  
>  	if (!blk_queue_is_zoned(q))
>  		return -EOPNOTSUPP;
> @@ -154,27 +131,14 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>  	if (WARN_ON_ONCE(!disk->fops->report_zones))
>  		return -EOPNOTSUPP;
>  
> -	if (!*nr_zones || sector >= bdev->bd_part->nr_sects) {
> +	if (!*nr_zones || sector >= capacity) {
>  		*nr_zones = 0;
>  		return 0;
>  	}
>  
> -	nrz = min(*nr_zones,
> -		  __blkdev_nr_zones(q, bdev->bd_part->nr_sects - sector));
> -	ret = disk->fops->report_zones(disk, get_start_sect(bdev) + sector,
> -				       zones, &nrz);
> -	if (ret)
> -		return ret;
> +	*nr_zones = min(*nr_zones, __blkdev_nr_zones(q, capacity - sector));
>  
> -	for (i = 0; i < nrz; i++) {
> -		if (!blkdev_report_zone(bdev, zones))
> -			break;
> -		zones++;
> -	}
> -
> -	*nr_zones = i;
> -
> -	return 0;
> +	return disk->fops->report_zones(disk, sector, zones, nr_zones);
>  }
>  EXPORT_SYMBOL_GPL(blkdev_report_zones);
>  
> @@ -185,15 +149,11 @@ static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
>  	if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
>  		return false;
>  
> -	if (sector || nr_sectors != part_nr_sects_read(bdev->bd_part))
> -		return false;
>  	/*
> -	 * REQ_OP_ZONE_RESET_ALL can be executed only if the block device is
> -	 * the entire disk, that is, if the blocks device start offset is 0 and
> -	 * its capacity is the same as the entire disk.
> +	 * REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
> +	 * of the applicable zone range is the entire disk.
>  	 */
> -	return get_start_sect(bdev) == 0 &&
> -	       part_nr_sects_read(bdev->bd_part) == get_capacity(bdev->bd_disk);
> +	return !sector && nr_sectors == get_capacity(bdev->bd_disk);
>  }
>  
>  /**
> @@ -218,6 +178,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
>  	sector_t zone_sectors = blk_queue_zone_sectors(q);
> +	sector_t capacity = get_capacity(bdev->bd_disk);
>  	sector_t end_sector = sector + nr_sectors;
>  	struct bio *bio = NULL;
>  	int ret;
> @@ -231,7 +192,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>  	if (!op_is_zone_mgmt(op))
>  		return -EOPNOTSUPP;
>  
> -	if (!nr_sectors || end_sector > bdev->bd_part->nr_sects)
> +	if (!nr_sectors || end_sector > capacity)
>  		/* Out of range */
>  		return -EINVAL;
>  
> @@ -239,8 +200,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>  	if (sector & (zone_sectors - 1))
>  		return -EINVAL;
>  
> -	if ((nr_sectors & (zone_sectors - 1)) &&
> -	    end_sector != bdev->bd_part->nr_sects)
> +	if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
>  		return -EINVAL;
>  
>  	while (sector < end_sector) {
> diff --git a/block/partition-generic.c b/block/partition-generic.c
> index aee643ce13d1..31bff3fb28af 100644
> --- a/block/partition-generic.c
> +++ b/block/partition-generic.c
> @@ -459,56 +459,6 @@ static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
>  	return 0;
>  }
>  
> -static bool part_zone_aligned(struct gendisk *disk,
> -			      struct block_device *bdev,
> -			      sector_t from, sector_t size)
> -{
> -	unsigned int zone_sectors = bdev_zone_sectors(bdev);
> -
> -	/*
> -	 * If this function is called, then the disk is a zoned block device
> -	 * (host-aware or host-managed). This can be detected even if the
> -	 * zoned block device support is disabled (CONFIG_BLK_DEV_ZONED not
> -	 * set). In this case, however, only host-aware devices will be seen
> -	 * as a block device is not created for host-managed devices. Without
> -	 * zoned block device support, host-aware drives can still be used as
> -	 * regular block devices (no zone operation) and their zone size will
> -	 * be reported as 0. Allow this case.
> -	 */
> -	if (!zone_sectors)
> -		return true;
> -
> -	/*
> -	 * Check partition start and size alignement. If the drive has a
> -	 * smaller last runt zone, ignore it and allow the partition to
> -	 * use it. Check the zone size too: it should be a power of 2 number
> -	 * of sectors.
> -	 */
> -	if (WARN_ON_ONCE(!is_power_of_2(zone_sectors))) {
> -		u32 rem;
> -
> -		div_u64_rem(from, zone_sectors, &rem);
> -		if (rem)
> -			return false;
> -		if ((from + size) < get_capacity(disk)) {
> -			div_u64_rem(size, zone_sectors, &rem);
> -			if (rem)
> -				return false;
> -		}
> -
> -	} else {
> -
> -		if (from & (zone_sectors - 1))
> -			return false;
> -		if ((from + size) < get_capacity(disk) &&
> -		    (size & (zone_sectors - 1)))
> -			return false;
> -
> -	}
> -
> -	return true;
> -}
> -
>  int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
>  {
>  	struct parsed_partitions *state = NULL;
> @@ -544,6 +494,14 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
>  		}
>  		return -EIO;
>  	}
> +
> +	/* Partitions are not supported on zoned block devices */
> +	if (bdev_is_zoned(bdev)) {
> +		pr_warn("%s: ignoring partition table on zoned block device\n",
> +			disk->disk_name);
> +		goto out;
> +	}
> +
>  	/*
>  	 * If any partition code tried to read beyond EOD, try
>  	 * unlocking native capacity even if partition table is

While I do applaud removing special cases for zoned devices, we do have
the GENHD_FL_NO_PART_SCAN for precisely this use case.
Any particular reason why this isn't being used, nor even set?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] null_blk: clean up the block device operations
  2019-11-08  1:56 ` [PATCH 5/9] null_blk: clean up the block device operations Damien Le Moal
@ 2019-11-08  7:17   ` Hannes Reinecke
  2019-11-08 18:47   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:17 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Remove the pointless stub open and release methods, give the operations
> vector a slightly less confusing name, and use normal alignment for the
> assignment operators.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/block/null_blk_main.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/9] null_blk: clean up report zones
  2019-11-08  1:56 ` [PATCH 6/9] null_blk: clean up report zones Damien Le Moal
@ 2019-11-08  7:18   ` Hannes Reinecke
  2019-11-08 18:47   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:18 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:56 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Make the instance name match the method name and define the name to NULL
> instead of providing an inline stub, which is rather pointless for a
> method call.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/block/null_blk.h       | 11 +++--------
>  drivers/block/null_blk_main.c  |  2 +-
>  drivers/block/null_blk_zoned.c |  4 ++--
>  3 files changed, 6 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] null_blk: Add zone_nr_conv to features
  2019-11-08  1:57 ` [PATCH 7/9] null_blk: Add zone_nr_conv to features Damien Le Moal
@ 2019-11-08  7:18   ` Hannes Reinecke
  2019-11-08 18:48   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:18 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:57 AM, Damien Le Moal wrote:
> For a null_blk device with zoned mode enabled, the number of
> conventional zones can be configured through configfs with the
> zone_nr_conv parameter. Add this missing parameter in the features
> string.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/block/null_blk_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
> index 2687eb36441c..27fb34d7da31 100644
> --- a/drivers/block/null_blk_main.c
> +++ b/drivers/block/null_blk_main.c
> @@ -467,7 +467,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
>  
>  static ssize_t memb_group_features_show(struct config_item *item, char *page)
>  {
> -	return snprintf(page, PAGE_SIZE, "memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size\n");
> +	return snprintf(page, PAGE_SIZE, "memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_nr_conv\n");
>  }
>  
>  CONFIGFS_ATTR_RO(memb_group_, features);
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
  2019-11-08  6:31   ` [dm-devel] " Christoph Hellwig
@ 2019-11-08  7:20   ` Hannes Reinecke
  2019-11-08 19:06   ` [dm-devel] " Bart Van Assche
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:20 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:57 AM, Damien Le Moal wrote:
> There is no need to arbitrarily limit the size of a report zone to the
> number of zones defined by SD_ZBC_REPORT_MAX_ZONES. Rather, simply
> calculate the report buffer size needed for the requested number of
> zones without exceeding the device total number of zones. This buffer
> size limitation to the hardware maximum transfer size and page mapping
> capabilities is kept unchanged. Starting with this initial buffer size,
> the allocation is optimized by iterating over decreasing buffer size
> until the allocation succeeds. This ensures forward progress for zone
> reports and avoids failures of zones revalidation under memory pressure.
> 
> While at it, also replace the hard coded 512 B sector size with the
> SECTOR_SIZE macro.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/scsi/sd_zbc.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9/9] block: rework zone reporting
  2019-11-08  1:57 ` [PATCH 9/9] block: rework zone reporting Damien Le Moal
@ 2019-11-08  7:23   ` Hannes Reinecke
  2019-11-08 15:16   ` Mike Snitzer
  1 sibling, 0 replies; 31+ messages in thread
From: Hannes Reinecke @ 2019-11-08  7:23 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/8/19 2:57 AM, Damien Le Moal wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Avoid the need to allocate a potentially large array of struct blk_zone
> in the block layer by switching the ->report_zones method interface to
> a callback model. Now the caller simply supplies a callback that is
> executed on each reported zone, and private data for it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/blk-zoned.c              | 253 +++++++++++++--------------------
>  drivers/block/null_blk.h       |   2 +-
>  drivers/block/null_blk_zoned.c |  31 ++--
>  drivers/md/dm-flakey.c         |  18 +--
>  drivers/md/dm-linear.c         |  20 +--
>  drivers/md/dm-zoned-metadata.c | 131 +++++++----------
>  drivers/md/dm.c                | 121 +++++++---------
>  drivers/scsi/sd.h              |   4 +-
>  drivers/scsi/sd_zbc.c          | 106 +++++++-------
>  fs/f2fs/super.c                |  51 +++----
>  include/linux/blkdev.h         |  15 +-
>  include/linux/device-mapper.h  |  24 +++-
>  12 files changed, 329 insertions(+), 447 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/9] block: Remove partition support for zoned block devices
  2019-11-08  7:17   ` Hannes Reinecke
@ 2019-11-08  7:28     ` Damien Le Moal
  0 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2019-11-08  7:28 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 2019/11/08 16:17, Hannes Reinecke wrote:
> On 11/8/19 2:56 AM, Damien Le Moal wrote:
>> No known partitioning tool supports zoned block devices, especially the
>> host managed flavor with strong sequential write constraints.
>> Furthermore, there are also no known user nor use cases for partitioned
>> zoned block devices.
>>
>> This patch removes partition device creation for zoned block devices,
>> which allows simplifying the processing of zone commands for zoned
>> block devices. A warning is added if a partition table is found on the
>> device.
>>
>> For report zones operations no zone sector information remapping is
>> necessary anymore, simplifying the code. Of note is that remapping of
>> zone reports for DM targets is still necessary as done by
>> dm_remap_zone_report().
>>
>> Similarly, remaping of a zone reset bio is not necessary anymore.
>> Testing for the applicability of the zone reset all request also becomes
>> simpler and only needs to check that the number of sectors of the
>> requested zone range is equal to the disk capacity.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  block/blk-core.c          |  6 +---
>>  block/blk-zoned.c         | 62 ++++++--------------------------
>>  block/partition-generic.c | 74 +++++----------------------------------
>>  drivers/md/dm.c           |  3 --
>>  4 files changed, 21 insertions(+), 124 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 3306a3c5bed6..df6b70476187 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -851,11 +851,7 @@ static inline int blk_partition_remap(struct bio *bio)
>>  	if (unlikely(bio_check_ro(bio, p)))
>>  		goto out;
>>  
>> -	/*
>> -	 * Zone management bios do not have a sector count but they do have
>> -	 * a start sector filled out and need to be remapped.
>> -	 */
>> -	if (bio_sectors(bio) || op_is_zone_mgmt(bio_op(bio))) {
>> +	if (bio_sectors(bio)) {
>>  		if (bio_check_eod(bio, part_nr_sects_read(p)))
>>  			goto out;
>>  		bio->bi_iter.bi_sector += p->start_sect;
>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>> index ea4e086ba00e..ae665e490858 100644
>> --- a/block/blk-zoned.c
>> +++ b/block/blk-zoned.c
>> @@ -93,32 +93,10 @@ unsigned int blkdev_nr_zones(struct block_device *bdev)
>>  	if (!blk_queue_is_zoned(q))
>>  		return 0;
>>  
>> -	return __blkdev_nr_zones(q, bdev->bd_part->nr_sects);
>> +	return __blkdev_nr_zones(q, get_capacity(bdev->bd_disk));
>>  }
>>  EXPORT_SYMBOL_GPL(blkdev_nr_zones);
>>  
>> -/*
>> - * Check that a zone report belongs to this partition, and if yes, fix its start
>> - * sector and write pointer and return true. Return false otherwise.
>> - */
>> -static bool blkdev_report_zone(struct block_device *bdev, struct blk_zone *rep)
>> -{
>> -	sector_t offset = get_start_sect(bdev);
>> -
>> -	if (rep->start < offset)
>> -		return false;
>> -
>> -	rep->start -= offset;
>> -	if (rep->start + rep->len > bdev->bd_part->nr_sects)
>> -		return false;
>> -
>> -	if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
>> -		rep->wp = rep->start + rep->len;
>> -	else
>> -		rep->wp -= offset;
>> -	return true;
>> -}
>> -
>>  /**
>>   * blkdev_report_zones - Get zones information
>>   * @bdev:	Target block device
>> @@ -140,8 +118,7 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>>  {
>>  	struct request_queue *q = bdev_get_queue(bdev);
>>  	struct gendisk *disk = bdev->bd_disk;
>> -	unsigned int i, nrz;
>> -	int ret;
>> +	sector_t capacity = get_capacity(disk);
>>  
>>  	if (!blk_queue_is_zoned(q))
>>  		return -EOPNOTSUPP;
>> @@ -154,27 +131,14 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>>  	if (WARN_ON_ONCE(!disk->fops->report_zones))
>>  		return -EOPNOTSUPP;
>>  
>> -	if (!*nr_zones || sector >= bdev->bd_part->nr_sects) {
>> +	if (!*nr_zones || sector >= capacity) {
>>  		*nr_zones = 0;
>>  		return 0;
>>  	}
>>  
>> -	nrz = min(*nr_zones,
>> -		  __blkdev_nr_zones(q, bdev->bd_part->nr_sects - sector));
>> -	ret = disk->fops->report_zones(disk, get_start_sect(bdev) + sector,
>> -				       zones, &nrz);
>> -	if (ret)
>> -		return ret;
>> +	*nr_zones = min(*nr_zones, __blkdev_nr_zones(q, capacity - sector));
>>  
>> -	for (i = 0; i < nrz; i++) {
>> -		if (!blkdev_report_zone(bdev, zones))
>> -			break;
>> -		zones++;
>> -	}
>> -
>> -	*nr_zones = i;
>> -
>> -	return 0;
>> +	return disk->fops->report_zones(disk, sector, zones, nr_zones);
>>  }
>>  EXPORT_SYMBOL_GPL(blkdev_report_zones);
>>  
>> @@ -185,15 +149,11 @@ static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
>>  	if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
>>  		return false;
>>  
>> -	if (sector || nr_sectors != part_nr_sects_read(bdev->bd_part))
>> -		return false;
>>  	/*
>> -	 * REQ_OP_ZONE_RESET_ALL can be executed only if the block device is
>> -	 * the entire disk, that is, if the blocks device start offset is 0 and
>> -	 * its capacity is the same as the entire disk.
>> +	 * REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
>> +	 * of the applicable zone range is the entire disk.
>>  	 */
>> -	return get_start_sect(bdev) == 0 &&
>> -	       part_nr_sects_read(bdev->bd_part) == get_capacity(bdev->bd_disk);
>> +	return !sector && nr_sectors == get_capacity(bdev->bd_disk);
>>  }
>>  
>>  /**
>> @@ -218,6 +178,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>>  {
>>  	struct request_queue *q = bdev_get_queue(bdev);
>>  	sector_t zone_sectors = blk_queue_zone_sectors(q);
>> +	sector_t capacity = get_capacity(bdev->bd_disk);
>>  	sector_t end_sector = sector + nr_sectors;
>>  	struct bio *bio = NULL;
>>  	int ret;
>> @@ -231,7 +192,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>>  	if (!op_is_zone_mgmt(op))
>>  		return -EOPNOTSUPP;
>>  
>> -	if (!nr_sectors || end_sector > bdev->bd_part->nr_sects)
>> +	if (!nr_sectors || end_sector > capacity)
>>  		/* Out of range */
>>  		return -EINVAL;
>>  
>> @@ -239,8 +200,7 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
>>  	if (sector & (zone_sectors - 1))
>>  		return -EINVAL;
>>  
>> -	if ((nr_sectors & (zone_sectors - 1)) &&
>> -	    end_sector != bdev->bd_part->nr_sects)
>> +	if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
>>  		return -EINVAL;
>>  
>>  	while (sector < end_sector) {
>> diff --git a/block/partition-generic.c b/block/partition-generic.c
>> index aee643ce13d1..31bff3fb28af 100644
>> --- a/block/partition-generic.c
>> +++ b/block/partition-generic.c
>> @@ -459,56 +459,6 @@ static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
>>  	return 0;
>>  }
>>  
>> -static bool part_zone_aligned(struct gendisk *disk,
>> -			      struct block_device *bdev,
>> -			      sector_t from, sector_t size)
>> -{
>> -	unsigned int zone_sectors = bdev_zone_sectors(bdev);
>> -
>> -	/*
>> -	 * If this function is called, then the disk is a zoned block device
>> -	 * (host-aware or host-managed). This can be detected even if the
>> -	 * zoned block device support is disabled (CONFIG_BLK_DEV_ZONED not
>> -	 * set). In this case, however, only host-aware devices will be seen
>> -	 * as a block device is not created for host-managed devices. Without
>> -	 * zoned block device support, host-aware drives can still be used as
>> -	 * regular block devices (no zone operation) and their zone size will
>> -	 * be reported as 0. Allow this case.
>> -	 */
>> -	if (!zone_sectors)
>> -		return true;
>> -
>> -	/*
>> -	 * Check partition start and size alignement. If the drive has a
>> -	 * smaller last runt zone, ignore it and allow the partition to
>> -	 * use it. Check the zone size too: it should be a power of 2 number
>> -	 * of sectors.
>> -	 */
>> -	if (WARN_ON_ONCE(!is_power_of_2(zone_sectors))) {
>> -		u32 rem;
>> -
>> -		div_u64_rem(from, zone_sectors, &rem);
>> -		if (rem)
>> -			return false;
>> -		if ((from + size) < get_capacity(disk)) {
>> -			div_u64_rem(size, zone_sectors, &rem);
>> -			if (rem)
>> -				return false;
>> -		}
>> -
>> -	} else {
>> -
>> -		if (from & (zone_sectors - 1))
>> -			return false;
>> -		if ((from + size) < get_capacity(disk) &&
>> -		    (size & (zone_sectors - 1)))
>> -			return false;
>> -
>> -	}
>> -
>> -	return true;
>> -}
>> -
>>  int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
>>  {
>>  	struct parsed_partitions *state = NULL;
>> @@ -544,6 +494,14 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
>>  		}
>>  		return -EIO;
>>  	}
>> +
>> +	/* Partitions are not supported on zoned block devices */
>> +	if (bdev_is_zoned(bdev)) {
>> +		pr_warn("%s: ignoring partition table on zoned block device\n",
>> +			disk->disk_name);
>> +		goto out;
>> +	}
>> +
>>  	/*
>>  	 * If any partition code tried to read beyond EOD, try
>>  	 * unlocking native capacity even if partition table is
> 
> While I do applaud removing special cases for zoned devices, we do have
> the GENHD_FL_NO_PART_SCAN for precisely this use case.
> Any particular reason why this isn't being used, nor even set?

If we set the flag, indeed partitions will be ignored, which is exactly
what we want. But that also means that there will be no warning message
whatsoever in the very unlikely case of a user using an SMR disk with
partitions. I fully agree with you that we should set this flag, but I
decided against it initially to have the "partition table ignored"
warning printed for now.

If, as I expect, there are no screams coming from the field/users, we
can then safely add the flag and remove some more code (the test and
pr_warn call in rescan_partitions()).

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9/9] block: rework zone reporting
  2019-11-08  1:57 ` [PATCH 9/9] block: rework zone reporting Damien Le Moal
  2019-11-08  7:23   ` Hannes Reinecke
@ 2019-11-08 15:16   ` Mike Snitzer
  1 sibling, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2019-11-08 15:16 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	dm-devel, linux-f2fs-devel, Jaegeuk Kim, Chao Yu

On Thu, Nov 07 2019 at  8:57pm -0500,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> From: Christoph Hellwig <hch@lst.de>
> 
> Avoid the need to allocate a potentially large array of struct blk_zone
> in the block layer by switching the ->report_zones method interface to
> a callback model. Now the caller simply supplies a callback that is
> executed on each reported zone, and private data for it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Reviewed-by: Mike Snitzer <snitzer@redhat.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] null_blk: clean up the block device operations
  2019-11-08  1:56 ` [PATCH 5/9] null_blk: clean up the block device operations Damien Le Moal
  2019-11-08  7:17   ` Hannes Reinecke
@ 2019-11-08 18:47   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Chaitanya Kulkarni @ 2019-11-08 18:47 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
On 11/07/2019 05:57 PM, Damien Le Moal wrote:
> From: Christoph Hellwig<hch@lst.de>
>
> Remove the pointless stub open and release methods, give the operations
> vector a slightly less confusing name, and use normal alignment for the
> assignment operators.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Signed-off-by: Damien Le Moal<damien.lemoal@wdc.com>
> ---


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/9] null_blk: clean up report zones
  2019-11-08  1:56 ` [PATCH 6/9] null_blk: clean up report zones Damien Le Moal
  2019-11-08  7:18   ` Hannes Reinecke
@ 2019-11-08 18:47   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Chaitanya Kulkarni @ 2019-11-08 18:47 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
On 11/07/2019 05:57 PM, Damien Le Moal wrote:
> From: Christoph Hellwig<hch@lst.de>
>
> Make the instance name match the method name and define the name to NULL
> instead of providing an inline stub, which is rather pointless for a
> method call.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Signed-off-by: Damien Le Moal<damien.lemoal@wdc.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/9] null_blk: Add zone_nr_conv to features
  2019-11-08  1:57 ` [PATCH 7/9] null_blk: Add zone_nr_conv to features Damien Le Moal
  2019-11-08  7:18   ` Hannes Reinecke
@ 2019-11-08 18:48   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Chaitanya Kulkarni @ 2019-11-08 18:48 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

On 11/07/2019 05:57 PM, Damien Le Moal wrote:
> For a null_blk device with zoned mode enabled, the number of
> conventional zones can be configured through configfs with the
> zone_nr_conv parameter. Add this missing parameter in the features
> string.
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones
  2019-11-08  1:56 ` [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones Damien Le Moal
  2019-11-08  7:11   ` Hannes Reinecke
@ 2019-11-08 18:50   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 31+ messages in thread
From: Chaitanya Kulkarni @ 2019-11-08 18:50 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

On 11/07/2019 05:57 PM, Damien Le Moal wrote:
> From: Christoph Hellwig<hch@lst.de>
>
> blk_revalidate_disk_zones is never called for non-zoned devices.  Just
> return early and warn instead of trying to handle this case.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Signed-off-by: Damien Le Moal<damien.lemoal@wdc.com>
> ---


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
  2019-11-08  6:31   ` [dm-devel] " Christoph Hellwig
  2019-11-08  7:20   ` Hannes Reinecke
@ 2019-11-08 19:06   ` Bart Van Assche
  2019-11-09  2:54     ` Damien Le Moal
  2 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2019-11-08 19:06 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 11/7/19 5:57 PM, Damien Le Moal wrote:
> -	buf = vzalloc(bufsize);
> -	if (buf)
> -		*buflen = bufsize;
> +	while (bufsize >= SECTOR_SIZE) {
> +		buf = vzalloc(bufsize);
> +		if (buf) {
> +			*buflen = bufsize;
> +			return buf;
> +		}
> +		bufsize >>= 1;
> +	}

Hi Damien,

Has it been considered to pass the __GFP_NORETRY flag to this vzalloc() 
call?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-08 19:06   ` [dm-devel] " Bart Van Assche
@ 2019-11-09  2:54     ` Damien Le Moal
  2019-11-09  3:02       ` Bart Van Assche
  0 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2019-11-09  2:54 UTC (permalink / raw)
  To: Bart Van Assche, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 2019/11/09 4:06, Bart Van Assche wrote:
> On 11/7/19 5:57 PM, Damien Le Moal wrote:
>> -	buf = vzalloc(bufsize);
>> -	if (buf)
>> -		*buflen = bufsize;
>> +	while (bufsize >= SECTOR_SIZE) {
>> +		buf = vzalloc(bufsize);
>> +		if (buf) {
>> +			*buflen = bufsize;
>> +			return buf;
>> +		}
>> +		bufsize >>= 1;
>> +	}
> 
> Hi Damien,
> 
> Has it been considered to pass the __GFP_NORETRY flag to this vzalloc() 
> call?

Do you mean using

__vmalloc(bufsize,
	  GFP_KERNEL | __GFP_ZERO | __GFP_NORETRY, PAGE_KERNEL);

instead of vzalloc() ? (since we cannot pass GFP flags to vzalloc()...)

Note that this is called with GFP_NOIO set for the caller context in the
case of revalidate zones, and default to GFP_KERNEL for
blkdev_report_zones() unless the caller also tweaks the context memalloc
flags.

> 
> Thanks,
> 
> Bart.
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dm-devel] [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  2019-11-09  2:54     ` Damien Le Moal
@ 2019-11-09  3:02       ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2019-11-09  3:02 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, dm-devel, Mike Snitzer, linux-f2fs-devel,
	Jaegeuk Kim, Chao Yu

On 2019-11-08 18:54, Damien Le Moal wrote:
> On 2019/11/09 4:06, Bart Van Assche wrote:
>> On 11/7/19 5:57 PM, Damien Le Moal wrote:
>>> -	buf = vzalloc(bufsize);
>>> -	if (buf)
>>> -		*buflen = bufsize;
>>> +	while (bufsize >= SECTOR_SIZE) {
>>> +		buf = vzalloc(bufsize);
>>> +		if (buf) {
>>> +			*buflen = bufsize;
>>> +			return buf;
>>> +		}
>>> +		bufsize >>= 1;
>>> +	}
>>
>> Hi Damien,
>>
>> Has it been considered to pass the __GFP_NORETRY flag to this vzalloc() 
>> call?
> 
> Do you mean using
> 
> __vmalloc(bufsize,
> 	  GFP_KERNEL | __GFP_ZERO | __GFP_NORETRY, PAGE_KERNEL);
> 
> instead of vzalloc() ? (since we cannot pass GFP flags to vzalloc()...)
> 
> Note that this is called with GFP_NOIO set for the caller context in the
> case of revalidate zones, and default to GFP_KERNEL for
> blkdev_report_zones() unless the caller also tweaks the context memalloc
> flags.

Hi Damien,

Yes, that's what I meant. The following comment from mm/util.c explains
why __GFP_RETRY should be used if it is OK for an allocation to fail:

/*
 * We want to attempt a large physically contiguous block first because
 * it is less likely to fragment multiple larger blocks and therefore
 * contribute to a long term fragmentation less than vmalloc fallback.
 * However make sure that larger requests are not too disruptive - no
 * OOM killer and no allocation failure warnings as we have a fallback.
 */

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-11-09  3:02 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-08  1:56 [PATCH 0/9] Zoned block device enhancements and zone report rework Damien Le Moal
2019-11-08  1:56 ` [PATCH 1/9] block: Enhance blk_revalidate_disk_zones() Damien Le Moal
2019-11-08  6:28   ` [dm-devel] " Christoph Hellwig
2019-11-08  7:10   ` Hannes Reinecke
2019-11-08  1:56 ` [PATCH 2/9] block: cleanup the !zoned case in blk_revalidate_disk_zones Damien Le Moal
2019-11-08  7:11   ` Hannes Reinecke
2019-11-08 18:50   ` Chaitanya Kulkarni
2019-11-08  1:56 ` [PATCH 3/9] block: Simplify report zones execution Damien Le Moal
2019-11-08  7:12   ` Hannes Reinecke
2019-11-08  1:56 ` [PATCH 4/9] block: Remove partition support for zoned block devices Damien Le Moal
2019-11-08  6:30   ` [dm-devel] " Christoph Hellwig
2019-11-08  7:17   ` Hannes Reinecke
2019-11-08  7:28     ` Damien Le Moal
2019-11-08  1:56 ` [PATCH 5/9] null_blk: clean up the block device operations Damien Le Moal
2019-11-08  7:17   ` Hannes Reinecke
2019-11-08 18:47   ` Chaitanya Kulkarni
2019-11-08  1:56 ` [PATCH 6/9] null_blk: clean up report zones Damien Le Moal
2019-11-08  7:18   ` Hannes Reinecke
2019-11-08 18:47   ` Chaitanya Kulkarni
2019-11-08  1:57 ` [PATCH 7/9] null_blk: Add zone_nr_conv to features Damien Le Moal
2019-11-08  7:18   ` Hannes Reinecke
2019-11-08 18:48   ` Chaitanya Kulkarni
2019-11-08  1:57 ` [PATCH 8/9] scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() Damien Le Moal
2019-11-08  6:31   ` [dm-devel] " Christoph Hellwig
2019-11-08  7:20   ` Hannes Reinecke
2019-11-08 19:06   ` [dm-devel] " Bart Van Assche
2019-11-09  2:54     ` Damien Le Moal
2019-11-09  3:02       ` Bart Van Assche
2019-11-08  1:57 ` [PATCH 9/9] block: rework zone reporting Damien Le Moal
2019-11-08  7:23   ` Hannes Reinecke
2019-11-08 15:16   ` Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).