linux-scsi.vger.kernel.org archive mirror
* [PATCH v2 0/2] block: add zone write granularity limit
@ 2021-01-19 13:17 Damien Le Moal
  2021-01-19 13:17 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-block, Jens Axboe
  Cc: Chaitanya Kulkarni, linux-scsi, Martin K . Petersen, linux-nvme,
	Christoph Hellwig, Keith Busch

(resending as I forgot to add nvme and scsi lists and their
maintainers. My apologies for the noise)

The first patch in this series introduces the zone write granularity
queue limit to indicate the alignment constraint for write operations
into sequential zones of zoned block devices.

The second patch adds the missing documentation for
zone_append_max_bytes to the sysfs block documentation.

Changes from v1:
* Fixed typo in patch 2

Damien Le Moal (2):
  block: introduce zone_write_granularity limit
  block: document zone_append_max_bytes attribute

 Documentation/block/queue-sysfs.rst | 13 +++++++++++++
 block/blk-settings.c                | 28 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  7 +++++++
 drivers/nvme/host/zns.c             |  1 +
 drivers/scsi/sd_zbc.c               | 10 ++++++++++
 include/linux/blkdev.h              |  3 +++
 6 files changed, 62 insertions(+)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] block: introduce zone_write_granularity limit
  2021-01-19 13:17 [PATCH v2 0/2] block: add zone write granularity limit Damien Le Moal
@ 2021-01-19 13:17 ` Damien Le Moal
  2021-01-20 10:10   ` Christoph Hellwig
  2021-01-19 13:17 ` [PATCH v2 2/2] block: document zone_append_max_bytes attribute Damien Le Moal
  2021-01-20 10:07 ` [PATCH v2 0/2] block: add zone write granularity limit Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-block, Jens Axboe
  Cc: Chaitanya Kulkarni, linux-scsi, Martin K . Petersen, linux-nvme,
	Christoph Hellwig, Keith Busch

Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
all writes into sequential write required zones be aligned to the device
physical block size. However, NVMe ZNS does not have this constraint and
allows write operations into sequential zones to be logical block size
aligned. This inconsistency does not help with portability of software
across device types.
To solve this, introduce the zone_write_granularity queue limit to
indicate the alignment constraint, in bytes, of write operations into
zones of a zoned block device. This new limit is exported as a
read-only sysfs queue attribute, and the helper
blk_queue_zone_write_granularity() is introduced for drivers to set this
limit. The scsi disk driver is modified to use this helper to set the
host-managed SMR disk zone write granularity to the disk physical block
size. The nvme driver zns support uses this helper to set the new limit
to the logical block size of the zoned namespace.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst |  7 +++++++
 block/blk-settings.c                | 28 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  7 +++++++
 drivers/nvme/host/zns.c             |  1 +
 drivers/scsi/sd_zbc.c               | 10 ++++++++++
 include/linux/blkdev.h              |  3 +++
 6 files changed, 56 insertions(+)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 2638d3446b79..c8bf8bc3c03a 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".
 
+zone_write_granularity (RO)
+---------------------------
+This indicates the alignment constraint, in bytes, for write operations in
+sequential zones of zoned block devices (devices with a zoned attribute
+that reports "host-managed" or "host-aware"). This value is always 0 for
+regular block devices.
+
 Jens Axboe <jens.axboe@oracle.com>, February 2009
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 43990b1d148b..6be6ed9485e3 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -60,6 +60,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->io_opt = 0;
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
+	lim->zone_write_granularity = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -366,6 +367,31 @@ void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
 }
 EXPORT_SYMBOL(blk_queue_physical_block_size);
 
+/**
+ * blk_queue_zone_write_granularity - set zone write granularity for the queue
+ * @q:  the request queue for the zoned device
+ * @size:  the zone write granularity size, in bytes
+ *
+ * Description:
+ *   This should be set to the lowest possible size allowing writes in
+ *   sequential zones of a zoned block device.
+ */
+void blk_queue_zone_write_granularity(struct request_queue *q,
+				      unsigned int size)
+{
+	if (WARN_ON(!blk_queue_is_zoned(q)))
+		return;
+
+	q->limits.zone_write_granularity = size;
+
+	if (q->limits.zone_write_granularity < q->limits.logical_block_size)
+		q->limits.zone_write_granularity = q->limits.logical_block_size;
+
+	if (q->limits.zone_write_granularity < q->limits.io_min)
+		q->limits.zone_write_granularity = q->limits.io_min;
+}
+EXPORT_SYMBOL_GPL(blk_queue_zone_write_granularity);
+
 /**
  * blk_queue_alignment_offset - set physical block alignment offset
  * @q:	the request queue for the device
@@ -631,6 +657,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
+	t->zone_write_granularity = max(t->zone_write_granularity,
+					b->zone_write_granularity);
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..7ea3dd4d876b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -219,6 +219,11 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page)
 		(unsigned long long)q->limits.max_write_zeroes_sectors << 9);
 }
 
+static ssize_t queue_zone_write_granularity_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.zone_write_granularity, page);
+}
+
 static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
 {
 	unsigned long long max_sectors = q->limits.max_zone_append_sectors;
@@ -585,6 +590,7 @@ QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
+QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
 QUEUE_RO_ENTRY(queue_zoned, "zoned");
 QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
@@ -639,6 +645,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
+	&queue_zone_write_granularity_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
 	&queue_nr_zones_entry.attr,
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 1dfe9a3500e3..f25311ccd996 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -113,6 +113,7 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
 	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
 	blk_queue_max_open_zones(q, le32_to_cpu(id->mor) + 1);
 	blk_queue_max_active_zones(q, le32_to_cpu(id->mar) + 1);
+	blk_queue_zone_write_granularity(q, q->limits.logical_block_size);
 free_data:
 	kfree(id);
 	return status;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index cf07b7f93579..41d602f7e62e 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -789,6 +789,16 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	blk_queue_max_active_zones(q, 0);
 	nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
 
+	/*
+	 * Per ZBC and ZAC specifications, writes in sequential write required
+	 * zones of host-managed devices must be aligned to the device physical
+	 * block size.
+	 */
+	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
+		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);
+	else
+		blk_queue_zone_write_granularity(q, sdkp->device->sector_size);
+
 	/* READ16/WRITE16 is mandatory for ZBC disks */
 	sdkp->device->use_16_for_rw = 1;
 	sdkp->device->use_10_for_rw = 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..011b3d2cd273 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -337,6 +337,7 @@ struct queue_limits {
 	unsigned int		max_zone_append_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
+	unsigned int		zone_write_granularity;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -1161,6 +1162,8 @@ extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
+void blk_queue_zone_write_granularity(struct request_queue *q,
+				      unsigned int size);
 extern void blk_queue_alignment_offset(struct request_queue *q,
 				       unsigned int alignment);
 void blk_queue_update_readahead(struct request_queue *q);
-- 
2.29.2



* [PATCH v2 2/2] block: document zone_append_max_bytes attribute
  2021-01-19 13:17 [PATCH v2 0/2] block: add zone write granularity limit Damien Le Moal
  2021-01-19 13:17 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal
@ 2021-01-19 13:17 ` Damien Le Moal
  2021-01-20 10:10   ` Christoph Hellwig
  2021-01-20 10:07 ` [PATCH v2 0/2] block: add zone write granularity limit Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-block, Jens Axboe
  Cc: Chaitanya Kulkarni, linux-scsi, Martin K . Petersen, linux-nvme,
	Christoph Hellwig, Keith Busch

The description of the zone_append_max_bytes sysfs queue attribute is
missing from Documentation/block/queue-sysfs.rst. Add it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index c8bf8bc3c03a..4dc7f0d499a8 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -261,6 +261,12 @@ For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
 bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
 is not supported.
 
+zone_append_max_bytes (RO)
+--------------------------
+This is the maximum number of bytes that can be written to a sequential
+zone of a zoned block device using a zone append write operation
+(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices.
+
 zoned (RO)
 ----------
 This indicates if the device is a zoned block device and the zone model of the
-- 
2.29.2



* Re: [PATCH v2 0/2] block: add zone write granularity limit
  2021-01-19 13:17 [PATCH v2 0/2] block: add zone write granularity limit Damien Le Moal
  2021-01-19 13:17 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal
  2021-01-19 13:17 ` [PATCH v2 2/2] block: document zone_append_max_bytes attribute Damien Le Moal
@ 2021-01-20 10:07 ` Christoph Hellwig
  2021-01-20 10:42   ` Damien Le Moal
  2021-01-20 10:46   ` Damien Le Moal
  2 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2021-01-20 10:07 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Christoph Hellwig, Keith Busch

Shouldn't zonefs (in addition to the pending btrfs and nvmet patches)
start using this new value instead of the physical block size?


* Re: [PATCH v2 1/2] block: introduce zone_write_granularity limit
  2021-01-19 13:17 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal
@ 2021-01-20 10:10   ` Christoph Hellwig
  2021-01-20 11:00     ` Damien Le Moal
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2021-01-20 10:10 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Christoph Hellwig, Keith Busch

On Tue, Jan 19, 2021 at 10:17:22PM +0900, Damien Le Moal wrote:
> Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
> all writes into sequential write required zones be aligned to the device
> physical block size. However, NVMe ZNS does not have this constraint and
> allows write operations into sequential zones to be logical block size
> aligned. This inconsistency does not help with portability of software
> across device types.
> To solve this, introduce the zone_write_granularity queue limit to
> indicate the alignment constraint, in bytes, of write operations into
> zones of a zoned block device. This new limit is exported as a
> read-only sysfs queue attribute, and the helper
> blk_queue_zone_write_granularity() is introduced for drivers to set this
> limit. The scsi disk driver is modified to use this helper to set the
> host-managed SMR disk zone write granularity to the disk physical block
> size. The nvme driver zns support uses this helper to set the new limit
> to the logical block size of the zoned namespace.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  Documentation/block/queue-sysfs.rst |  7 +++++++
>  block/blk-settings.c                | 28 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                   |  7 +++++++
>  drivers/nvme/host/zns.c             |  1 +
>  drivers/scsi/sd_zbc.c               | 10 ++++++++++
>  include/linux/blkdev.h              |  3 +++
>  6 files changed, 56 insertions(+)
> 
> diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
> index 2638d3446b79..c8bf8bc3c03a 100644
> --- a/Documentation/block/queue-sysfs.rst
> +++ b/Documentation/block/queue-sysfs.rst
> @@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
>  do not support zone commands, they will be treated as regular block devices
>  and zoned will report "none".
>  
> +zone_write_granularity (RO)
> +---------------------------
> +This indicates the alignment constraint, in bytes, for write operations in
> +sequential zones of zoned block devices (devices with a zoned attribute
> +that reports "host-managed" or "host-aware"). This value is always 0 for
> +regular block devices.
> +
>  Jens Axboe <jens.axboe@oracle.com>, February 2009
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 43990b1d148b..6be6ed9485e3 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -60,6 +60,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->io_opt = 0;
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
> +	lim->zone_write_granularity = 0;

I think this should default to 512 just like the logic and physical
block size.

>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> @@ -366,6 +367,31 @@ void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
>  }
>  EXPORT_SYMBOL(blk_queue_physical_block_size);
>  
> +/**
> + * blk_queue_zone_write_granularity - set zone write granularity for the queue
> + * @q:  the request queue for the zoned device
> + * @size:  the zone write granularity size, in bytes
> + *
> + * Description:
> + *   This should be set to the lowest possible size allowing writes in
> + *   sequential zones of a zoned block device.
> + */
> +void blk_queue_zone_write_granularity(struct request_queue *q,
> +				      unsigned int size)
> +{
> +	if (WARN_ON(!blk_queue_is_zoned(q)))
> +		return;
> +
> +	q->limits.zone_write_granularity = size;
> +
> +	if (q->limits.zone_write_granularity < q->limits.logical_block_size)
> +		q->limits.zone_write_granularity = q->limits.logical_block_size;

I think this should be a WARN_ON_ONCE.

> +	if (q->limits.zone_write_granularity < q->limits.io_min)
> +		q->limits.zone_write_granularity = q->limits.io_min;

I don't think this makes sense at all.

> +static ssize_t queue_zone_write_granularity_show(struct request_queue *q, char *page)

Overly long line.

> +	/*
> +	 * Per ZBC and ZAC specifications, writes in sequential write required
> +	 * zones of host-managed devices must be aligned to the device physical
> +	 * block size.
> +	 */
> +	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
> +		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);
> +	else
> +		blk_queue_zone_write_granularity(q, sdkp->device->sector_size);

Do we really want to special case HA drives here?  I thought we generally
either treat them as drive managed (if they have partitions) or else like
host managed ones.


* Re: [PATCH v2 2/2] block: document zone_append_max_bytes attribute
  2021-01-19 13:17 ` [PATCH v2 2/2] block: document zone_append_max_bytes attribute Damien Le Moal
@ 2021-01-20 10:10   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2021-01-20 10:10 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Christoph Hellwig, Keith Busch

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 0/2] block: add zone write granularity limit
  2021-01-20 10:07 ` [PATCH v2 0/2] block: add zone write granularity limit Christoph Hellwig
@ 2021-01-20 10:42   ` Damien Le Moal
  2021-01-20 10:46   ` Damien Le Moal
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-01-20 10:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Keith Busch

On 2021/01/20 19:07, Christoph Hellwig wrote:
> Shouldn't zonefs (in addition to the pending btrfs and nvmet patches)
> start using this new value instead of the physical block size?
> 

Yes for zonefs. I will add one patch to this series for that.
For nvmet patches, I will let Chaitanya handle that.
For the ongoing btrfs work, I think we can cover that with btrfs-progs (a check
in mkfs) for now and, if the series is accepted, add a fix in 5.12 to check the
FS block size on zoned devices. Patching now would cause the btrfs tree to not
build.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2 0/2] block: add zone write granularity limit
  2021-01-20 10:07 ` [PATCH v2 0/2] block: add zone write granularity limit Christoph Hellwig
  2021-01-20 10:42   ` Damien Le Moal
@ 2021-01-20 10:46   ` Damien Le Moal
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-01-20 10:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Keith Busch

On 2021/01/20 19:07, Christoph Hellwig wrote:
> Shouldn't zonefs (in addition to the pending btrfs and nvmet patches)
> start using this new value instead of the physical block size?
> 

And I think null_blk needs that limit set too. Forgot to add it.

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2 1/2] block: introduce zone_write_granularity limit
  2021-01-20 10:10   ` Christoph Hellwig
@ 2021-01-20 11:00     ` Damien Le Moal
  2021-01-20 12:36       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2021-01-20 11:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Jens Axboe, Chaitanya Kulkarni, linux-scsi,
	Martin K . Petersen, linux-nvme, Keith Busch

On 2021/01/20 19:10, Christoph Hellwig wrote:
> On Tue, Jan 19, 2021 at 10:17:22PM +0900, Damien Le Moal wrote:
>> Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
>> all writes into sequential write required zones be aligned to the device
>> physical block size. However, NVMe ZNS does not have this constraint and
>> allows write operations into sequential zones to be logical block size
>> aligned. This inconsistency does not help with portability of software
>> across device types.
>> To solve this, introduce the zone_write_granularity queue limit to
>> indicate the alignment constraint, in bytes, of write operations into
>> zones of a zoned block device. This new limit is exported as a
>> read-only sysfs queue attribute, and the helper
>> blk_queue_zone_write_granularity() is introduced for drivers to set this
>> limit. The scsi disk driver is modified to use this helper to set the
>> host-managed SMR disk zone write granularity to the disk physical block
>> size. The nvme driver zns support uses this helper to set the new limit
>> to the logical block size of the zoned namespace.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  Documentation/block/queue-sysfs.rst |  7 +++++++
>>  block/blk-settings.c                | 28 ++++++++++++++++++++++++++++
>>  block/blk-sysfs.c                   |  7 +++++++
>>  drivers/nvme/host/zns.c             |  1 +
>>  drivers/scsi/sd_zbc.c               | 10 ++++++++++
>>  include/linux/blkdev.h              |  3 +++
>>  6 files changed, 56 insertions(+)
>>
>> diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
>> index 2638d3446b79..c8bf8bc3c03a 100644
>> --- a/Documentation/block/queue-sysfs.rst
>> +++ b/Documentation/block/queue-sysfs.rst
>> @@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
>>  do not support zone commands, they will be treated as regular block devices
>>  and zoned will report "none".
>>  
>> +zone_write_granularity (RO)
>> +---------------------------
>> +This indicates the alignment constraint, in bytes, for write operations in
>> +sequential zones of zoned block devices (devices with a zoned attribute
>> +that reports "host-managed" or "host-aware"). This value is always 0 for
>> +regular block devices.
>> +
>>  Jens Axboe <jens.axboe@oracle.com>, February 2009
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 43990b1d148b..6be6ed9485e3 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -60,6 +60,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>>  	lim->io_opt = 0;
>>  	lim->misaligned = 0;
>>  	lim->zoned = BLK_ZONED_NONE;
>> +	lim->zone_write_granularity = 0;
> 
> I think this should default to 512 just like the logic and physical
> block size.

Hmm. I wanted to keep this limit at 0 for regular devices. If we default to 512,
regular devices will see that value. They can ignore it of course, but having it
as 0 makes it clear that it should be ignored.

> 
>>  }
>>  EXPORT_SYMBOL(blk_set_default_limits);
>>  
>> @@ -366,6 +367,31 @@ void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
>>  }
>>  EXPORT_SYMBOL(blk_queue_physical_block_size);
>>  
>> +/**
>> + * blk_queue_zone_write_granularity - set zone write granularity for the queue
>> + * @q:  the request queue for the zoned device
>> + * @size:  the zone write granularity size, in bytes
>> + *
>> + * Description:
>> + *   This should be set to the lowest possible size allowing writes in
>> + *   sequential zones of a zoned block device.
>> + */
>> +void blk_queue_zone_write_granularity(struct request_queue *q,
>> +				      unsigned int size)
>> +{
>> +	if (WARN_ON(!blk_queue_is_zoned(q)))
>> +		return;
>> +
>> +	q->limits.zone_write_granularity = size;
>> +
>> +	if (q->limits.zone_write_granularity < q->limits.logical_block_size)
>> +		q->limits.zone_write_granularity = q->limits.logical_block_size;
> 
> I think this should be a WARN_ON_ONCE.

OK.

> 
>> +	if (q->limits.zone_write_granularity < q->limits.io_min)
>> +		q->limits.zone_write_granularity = q->limits.io_min;
> 
> I don't think this makes sense at all.

Argh! Yes, that is a hint only! I misread the comments for blk_limits_io_min().
Will remove this.

> 
>> +static ssize_t queue_zone_write_granularity_show(struct request_queue *q, char *page)
> 
> Overly long line.
> 
>> +	/*
>> +	 * Per ZBC and ZAC specifications, writes in sequential write required
>> +	 * zones of host-managed devices must be aligned to the device physical
>> +	 * block size.
>> +	 */
>> +	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
>> +		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);
>> +	else
>> +		blk_queue_zone_write_granularity(q, sdkp->device->sector_size);
> 
> Do we really want to special case HA drives here?  I though we generally
> either treat them as drive managed (if they have partitions) or else like
> host managed ones.

If the HA drive is treated as drive-managed (i.e. it has a partition), then the
model here will be seen as BLK_ZONED_NONE and we should ignore it, or better,
set the zone_write_granularity to 0. If the drive is actually used as a zoned
drive, then we will see BLK_ZONED_HA here and we can use the logical block size
since HA drives do not have that restriction on physical block alignment.
So my code above is wrong. The else should be

else if (blk_queue_zoned_model(q) == BLK_ZONED_HA)

And I think we also need to add "q->limits.zone_write_granularity = 0" in
blk_queue_set_zoned() if model == BLK_ZONED_NONE at the end of that function.

Will send a v3 with these fixes.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2 1/2] block: introduce zone_write_granularity limit
  2021-01-20 11:00     ` Damien Le Moal
@ 2021-01-20 12:36       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2021-01-20 12:36 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, linux-block, Jens Axboe, Chaitanya Kulkarni,
	linux-scsi, Martin K . Petersen, linux-nvme, Keith Busch

On Wed, Jan 20, 2021 at 11:00:29AM +0000, Damien Le Moal wrote:
> Hmm. I wanted to keep this limit at 0 for regular devices. If we default to 512,
> regular devices will see that value. They can ignore it of course, but having it
> as 0 makes it clear that it should be ignored.

Hmm.  Maybe we should keep the 0 default and only set it in sd.c.  Then the
query helper can default to the logical block size instead.  That'll just
do the right thing without changes to most drivers.

