* [PATCH v4 0/7] ZBC / Zoned block device support
@ 2016-09-28  8:45 ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

This series introduces support for zoned block devices. It integrates
earlier submissions by Hannes Reinecke and Shaun Tancheff. Compared to the
previous series version, the code was significantly simplified by limiting
support to zoned devices satisfying the following conditions:
1) All zones of the device are the same size, with the possible exception
   of a smaller last (runt) zone.
2) For host-managed disks, reads must be unrestricted (read commands do not
   fail due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these two conditions are ignored.
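
As an illustration of condition 1, here is a minimal user-space sketch of
the zone arithmetic that a uniform zone size makes possible. It is not code
from this series: the struct and function names are hypothetical, and all
sizes are assumed to be in 512B sectors.

```c
#include <stdint.h>

/* Hypothetical description of a zoned disk; all sizes in 512B sectors. */
struct zone_geometry {
	uint64_t capacity;     /* total device capacity */
	uint64_t zone_sectors; /* size of every zone except possibly the last */
};

/* Number of zones, counting a smaller last (runt) zone if there is one. */
static inline uint64_t zone_count(const struct zone_geometry *g)
{
	return (g->capacity + g->zone_sectors - 1) / g->zone_sectors;
}

/* Index of the zone that contains the given sector. */
static inline uint64_t zone_of_sector(const struct zone_geometry *g,
				      uint64_t sector)
{
	return sector / g->zone_sectors;
}

/* Length of zone idx; only the last zone can be shorter than the others. */
static inline uint64_t zone_length(const struct zone_geometry *g, uint64_t idx)
{
	uint64_t start = idx * g->zone_sectors;
	uint64_t left = g->capacity - start;

	return left < g->zone_sectors ? left : g->zone_sectors;
}
```

With a single zone size, a sector maps to its zone with one division, so no
per-zone table is needed for that lookup.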

These two conditions allowed dropping the zone information cache implemented
in the previous version. This simplifies the code and also reduces memory
consumption at run time. Support for zoned devices now requires only one bit
per zone (less than 8KB in total). This bit field is used to write-lock
zones and prevent the concurrent execution of multiple write commands in
the same zone. This avoids write ordering problems at dispatch time, for
both the simple queue and scsi-mq settings.
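
The one-bit-per-zone write lock can be pictured with the following
user-space sketch. It is an analogue built on C11 atomics, not the kernel
code of this series, and every name in it is hypothetical. A zone is locked
by atomically setting its bit; the caller owns the lock only if the bit was
previously clear.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD 64u

/* Hypothetical per-device lock bitmap: one bit per zone. */
struct zone_locks {
	size_t nr_words;
	_Atomic uint64_t *words;
};

/* Bytes of payload needed for nr_zones bits, rounded up to whole words. */
static inline size_t zone_locks_bytes(size_t nr_zones)
{
	return (nr_zones + BITS_PER_WORD - 1) / BITS_PER_WORD * sizeof(uint64_t);
}

/* Allocate the bitmap with all zones unlocked; returns 0 or -1. */
static inline int zone_locks_init(struct zone_locks *zl, size_t nr_zones)
{
	zl->nr_words = zone_locks_bytes(nr_zones) / sizeof(uint64_t);
	zl->words = calloc(zl->nr_words, sizeof(*zl->words));
	if (!zl->words)
		return -1;
	for (size_t i = 0; i < zl->nr_words; i++)
		atomic_init(&zl->words[i], 0);
	return 0;
}

/* Try to take the write lock of a zone; true if the caller now holds it. */
static inline bool zone_try_write_lock(struct zone_locks *zl, size_t zone)
{
	uint64_t mask = UINT64_C(1) << (zone % BITS_PER_WORD);
	uint64_t old = atomic_fetch_or(&zl->words[zone / BITS_PER_WORD], mask);

	return !(old & mask);
}

/* Release the write lock of a zone. */
static inline void zone_write_unlock(struct zone_locks *zl, size_t zone)
{
	uint64_t mask = UINT64_C(1) << (zone % BITS_PER_WORD);

	atomic_fetch_and(&zl->words[zone / BITS_PER_WORD], ~mask);
}
```

In this sketch, zone_locks_bytes(65536) is 8192, so a bitmap of 8KB covers
65536 zones.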

The new operations introduced to support zone manipulation were reduced to
only the two main ZBC/ZAC defined commands: REPORT ZONES (REQ_OP_ZONE_REPORT)
and RESET WRITE POINTER (REQ_OP_ZONE_RESET). This brings the total number of
operations defined to 8, which fits in the 3 bits (REQ_OP_BITS) reserved for
the operation code in bio->bi_opf and req->cmd_flags.
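
To make the 3-bit budget concrete, here is a small stand-alone sketch of
packing an operation code into a 32-bit flags word. The placement of the
code in the top bits and all names below are assumptions made for the
example, not definitions taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_BITS  3u                     /* plays the role of REQ_OP_BITS */
#define OP_SHIFT (32u - OP_BITS)        /* assumed: op code in the top bits */
#define OP_MAX   ((1u << OP_BITS) - 1u) /* largest encodable op code: 7 */

/* True if the op code can be encoded in OP_BITS bits. */
static inline bool op_fits(unsigned int op)
{
	return op <= OP_MAX;
}

/* Replace the op code stored in flags, leaving the other bits untouched. */
static inline uint32_t set_op(uint32_t flags, unsigned int op)
{
	flags &= ~(OP_MAX << OP_SHIFT);
	return flags | ((uint32_t)op << OP_SHIFT);
}

/* Extract the op code stored in flags. */
static inline unsigned int get_op(uint32_t flags)
{
	return flags >> OP_SHIFT;
}
```

Three bits encode the values 0 to 7, so eight operations use the field
completely; a ninth operation would require widening it.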

Most of the ZBC specific code is kept out of sd.c and implemented in the
new file sd_zbc.c. Similarly, at the block layer, most of the zoned block
device code is implemented in the new blk-zoned.c.

For host-managed zoned block devices, the sequential write constraint of
write pointer zones is exposed to the user. Users of the disk (applications,
file systems or device mappers) must sequentially write to zones. This means
that for raw block device accesses from applications, buffered writes are
unreliable and direct I/Os must be used (or buffered writes with O_SYNC).
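
For applications, the constraint above could translate into something like
the following sketch: open the device for direct I/O and always write at
the current write position, advancing it only by what was actually written.
The helper names are hypothetical. With O_DIRECT, the buffer address, the
offset and the length normally have to be aligned to the logical block size
of the device, which this sketch leaves to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a block device for writes that bypass the page cache. */
static inline int open_for_zone_writes(const char *path)
{
	return open(path, O_WRONLY | O_DIRECT);
}

/*
 * Write len bytes from buf at byte offset wp, in increasing offset order.
 * Returns the new write position, or -1 on error (errno is set by pwrite,
 * except when pwrite reports that nothing could be written).
 */
static inline off_t write_sequential(int fd, off_t wp, const void *buf,
				     size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, wp);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		wp += n;
		len -= (size_t)n;
	}
	return wp;
}
```

A caller would start from the zone's write pointer, pass the returned
position to the next call, and never write behind it.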

Access to zone manipulation operations is also provided to applications
through a set of new ioctls. This allows applications operating on raw
block devices (e.g. mkfs.xxx) to discover a device's zone layout and
manipulate zone state.

Changes from v3:
* Fixed several typos and tabs/spaces
* Added description of zoned and chunk_sectors queue attributes in
  Documentation/ABI/testing/sysfs-block
* Fixed the sd_read_capacity call in sd.c to avoid missing information on
  the first pass of a disk scan
* Fixed scsi_disk zone related field to use logical block size unit instead
  of 512B sector unit.

Changes from v2:
* Use kcalloc to allocate zone information array for ioctl
* Export the functions blkdev_report_zones and blkdev_reset_zones as GPL
  symbols
* Shuffled uapi definitions from patch 7 into patch 5

Damien Le Moal (1):
  block: Add 'zoned' queue limit

Hannes Reinecke (4):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  block: Implement support for zoned block devices
  sd: Implement support for ZBC devices

Shaun Tancheff (2):
  block: Define zoned block device operations
  blk-zoned: implement ioctls

 Documentation/ABI/testing/sysfs-block |  29 ++
 block/Kconfig                         |   8 +
 block/Makefile                        |   1 +
 block/blk-core.c                      |   4 +
 block/blk-settings.c                  |   5 +
 block/blk-sysfs.c                     |  29 ++
 block/blk-zoned.c                     | 350 +++++++++++++++++++
 block/ioctl.c                         |   4 +
 drivers/scsi/Makefile                 |   1 +
 drivers/scsi/sd.c                     | 143 ++++++--
 drivers/scsi/sd.h                     |  70 ++++
 drivers/scsi/sd_zbc.c                 | 624 ++++++++++++++++++++++++++++++++++
 include/linux/blk_types.h             |   2 +
 include/linux/blkdev.h                |  99 ++++++
 include/scsi/scsi_proto.h             |  17 +
 include/uapi/linux/Kbuild             |   1 +
 include/uapi/linux/blkzoned.h         | 143 ++++++++
 include/uapi/linux/fs.h               |   4 +
 18 files changed, 1501 insertions(+), 33 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c
 create mode 100644 include/uapi/linux/blkzoned.h

-- 
2.7.4

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v4 1/7] block: Add 'zoned' queue limit
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

Add the zoned queue limit to indicate the zoning model of a block device.
Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
1 (BLK_ZONED_HA) for host-aware zoned block devices and 2 (BLK_ZONED_HM)
for host-managed zoned block devices. The drive-managed model defined by
the standards is not represented here since these block devices do not
provide any command for accessing zone information. Drive-managed devices
will be reported as BLK_ZONED_NONE.

The helper functions blk_queue_zoned_model and bdev_zoned_model return
the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
return a boolean for callers to test if a block device is zoned.

The zoned attribute is also exported as a string to applications via
sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
BLK_ZONED_HM as "host-managed".
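
From user space, the attribute could be consumed along these lines. This is
a sketch only: the enum and function names are hypothetical, and the code
relies on nothing beyond the three strings listed above and the
/sys/block/<disk>/queue/zoned path documented in this patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space mirror of the zoned models. */
enum zoned_model {
	ZONED_NONE,
	ZONED_HOST_AWARE,
	ZONED_HOST_MANAGED,
	ZONED_UNKNOWN, /* attribute missing or unrecognized string */
};

/* Map the sysfs string (with or without trailing newline) to a model. */
static inline enum zoned_model parse_zoned_model(const char *s)
{
	size_t len = strcspn(s, "\n");

	if (len == 4 && !strncmp(s, "none", len))
		return ZONED_NONE;
	if (len == 10 && !strncmp(s, "host-aware", len))
		return ZONED_HOST_AWARE;
	if (len == 12 && !strncmp(s, "host-managed", len))
		return ZONED_HOST_MANAGED;
	return ZONED_UNKNOWN;
}

/* Read the zoned attribute of a disk given by name, e.g. "sda". */
static inline enum zoned_model read_zoned_model(const char *disk)
{
	char path[256], buf[32];
	enum zoned_model model = ZONED_UNKNOWN;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", disk);
	f = fopen(path, "r");
	if (!f)
		return ZONED_UNKNOWN;
	if (fgets(buf, sizeof(buf), f))
		model = parse_zoned_model(buf);
	fclose(f);
	return model;
}
```

On a kernel without this patch the file does not exist, which the sketch
reports as ZONED_UNKNOWN rather than as "none".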

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 Documentation/ABI/testing/sysfs-block | 16 ++++++++++++
 block/blk-settings.c                  |  1 +
 block/blk-sysfs.c                     | 18 ++++++++++++++
 include/linux/blkdev.h                | 47 +++++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 71d184d..75a5055 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -235,3 +235,19 @@ Description:
 		write_same_max_bytes is 0, write same is not supported
 		by the device.
 
+What:		/sys/block/<disk>/queue/zoned
+Date:		September 2016
+Contact:	Damien Le Moal <damien.lemoal@hgst.com>
+Description:
+		zoned indicates if the device is a zoned block device
+		and the zone model of the device if it is indeed zoned.
+		The possible values indicated by zoned are "none" for
+		regular block devices and "host-aware" or "host-managed"
+		for zoned block devices. The characteristics of
+		host-aware and host-managed zoned block devices are
+		described in the ZBC (Zoned Block Commands) and ZAC
+		(Zoned Device ATA Command Set) standards. These standards
+		also define the "drive-managed" zone model. However,
+		since drive-managed zoned block devices do not support
+		zone commands, they will be treated as regular block
+		devices and zoned will report "none".
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..b1d5b7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->io_opt = 0;
 	lim->misaligned = 0;
 	lim->cluster = 1;
+	lim->zoned = BLK_ZONED_NONE;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..ff9cd9c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+	switch (blk_queue_zoned_model(q)) {
+	case BLK_ZONED_HA:
+		return sprintf(page, "host-aware\n");
+	case BLK_ZONED_HM:
+		return sprintf(page, "host-managed\n");
+	default:
+		return sprintf(page, "none\n");
+	}
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
 	return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
 	.store = queue_store_nonrot,
 };
 
+static struct queue_sysfs_entry queue_zoned_entry = {
+	.attr = {.name = "zoned", .mode = S_IRUGO },
+	.show = queue_zoned_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
 	.attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_nomerges_show,
@@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_nonrot_entry.attr,
+	&queue_zoned_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..f19e16b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -261,6 +261,15 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS	(256)
 #define BLK_SCSI_CMD_PER_LONG	(BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+/*
+ * Zoned block device models (zoned limit).
+ */
+enum blk_zoned_model {
+	BLK_ZONED_NONE,	/* Regular block device */
+	BLK_ZONED_HA,	/* Host-aware zoned block device */
+	BLK_ZONED_HM,	/* Host-managed zoned block device */
+};
+
 struct queue_limits {
 	unsigned long		bounce_pfn;
 	unsigned long		seg_boundary_mask;
@@ -290,6 +299,7 @@ struct queue_limits {
 	unsigned char		cluster;
 	unsigned char		discard_zeroes_data;
 	unsigned char		raid_partial_stripes_expensive;
+	enum blk_zoned_model	zoned;
 };
 
 struct request_queue {
@@ -627,6 +637,23 @@ static inline unsigned int blk_queue_cluster(struct request_queue *q)
 	return q->limits.cluster;
 }
 
+static inline enum blk_zoned_model
+blk_queue_zoned_model(struct request_queue *q)
+{
+	return q->limits.zoned;
+}
+
+static inline bool blk_queue_is_zoned(struct request_queue *q)
+{
+	switch (blk_queue_zoned_model(q)) {
+	case BLK_ZONED_HA:
+	case BLK_ZONED_HM:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
 	return 0;
 }
 
+static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zoned_model(q);
+
+	return BLK_ZONED_NONE;
+}
+
+static inline bool bdev_is_zoned(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_is_zoned(q);
+
+	return false;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: Updated Documentation/ABI/testing/sysfs-block]

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 Documentation/ABI/testing/sysfs-block | 13 +++++++++++++
 block/blk-sysfs.c                     | 11 +++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 75a5055..ee2d5cd 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -251,3 +251,16 @@ Description:
 		since drive-managed zoned block devices do not support
 		zone commands, they will be treated as regular block
 		devices and zoned will report "none".
+
+What:		/sys/block/<disk>/queue/chunk_sectors
+Date:		September 2016
+Contact:	Hannes Reinecke <hare@suse.com>
+Description:
+		chunk_sectors has different meaning depending on the type
+		of the disk. For a RAID device (dm-raid), chunk_sectors
+		indicates the size in 512B sectors of the RAID volume
+		stripe segment. For a zoned block device, either
+		host-aware or host-managed, chunk_sectors indicates the
+		size in 512B sectors of the zones of the device, with
+		the possible exception of the last zone of the device
+		which may be smaller.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff9cd9c..488c2e2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
 	return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.chunk_sectors, page);
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(queue_io_min(q), page);
@@ -455,6 +460,11 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
 	.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+	.attr = {.name = "chunk_sectors", .mode = S_IRUGO },
+	.show = queue_chunk_sectors_show,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
 	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
 	.show = queue_io_min_show,
@@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
+	&queue_chunk_sectors_entry.attr,
 	&queue_io_min_entry.attr,
 	&queue_io_opt_entry.attr,
 	&queue_discard_granularity_entry.attr,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits()
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal, Hannes Reinecke

From: Hannes Reinecke <hare@suse.de>

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-settings.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b1d5b7f..55369a6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
+	if (b->chunk_sectors)
+		t->chunk_sectors = min_not_zero(t->chunk_sectors,
+						b->chunk_sectors);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v4 4/7] block: Define zoned block device operations
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal, Shaun Tancheff

From: Shaun Tancheff <shaun.tancheff@seagate.com>

Define REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET for handling zones of
host-managed and host-aware zoned block devices. With these two new
operations, the total number of operations defined reaches 8, which
still fits in the 3 bits defined by REQ_OP_BITS.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-core.c          | 5 +++++
 include/linux/blk_types.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..e4eda5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1941,6 +1941,11 @@ generic_make_request_checks(struct bio *bio)
 	case REQ_OP_WRITE_SAME:
 		if (!bdev_write_same(bio->bi_bdev))
 			goto not_supported;
+		break;
+	case REQ_OP_ZONE_REPORT:
+	case REQ_OP_ZONE_RESET:
+		if (!bdev_is_zoned(bio->bi_bdev))
+			goto not_supported;
 		break;
 	default:
 		break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..dd50dce 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -243,6 +243,8 @@ enum req_op {
 	REQ_OP_SECURE_ERASE,	/* request to securely erase sectors */
 	REQ_OP_WRITE_SAME,	/* write same block many times */
 	REQ_OP_FLUSH,		/* request for cache flush */
+	REQ_OP_ZONE_REPORT,	/* Get zone information */
+	REQ_OP_ZONE_RESET,	/* Reset a zone write pointer */
 };
 
 #define REQ_OP_BITS 3
-- 
2.7.4
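
The "8 operations in 3 bits" claim from the commit message can be checked at compile time. The sketch below is a userspace copy under assumptions: the `SK_` prefix, `SK_REQ_OP_NR` and `req_op_fits()` are hypothetical names, and the first three operations (READ, WRITE, DISCARD) are assumed from the kernel of that period since they are not visible in the hunk above.

```c
#include <assert.h>

/*
 * Userspace copy of the operation list after this patch. The first
 * three entries are an assumption; only the last five appear in the
 * diff context.
 */
enum req_op_sketch {
	SK_REQ_OP_READ,
	SK_REQ_OP_WRITE,
	SK_REQ_OP_DISCARD,
	SK_REQ_OP_SECURE_ERASE,
	SK_REQ_OP_WRITE_SAME,
	SK_REQ_OP_FLUSH,
	SK_REQ_OP_ZONE_REPORT,
	SK_REQ_OP_ZONE_RESET,
	SK_REQ_OP_NR		/* count of operations, not an operation */
};

#define SK_REQ_OP_BITS 3

/* Every operation code must be representable in SK_REQ_OP_BITS bits. */
_Static_assert(SK_REQ_OP_NR <= (1 << SK_REQ_OP_BITS),
	       "operation codes no longer fit in the reserved bits");

/* Return 1 if @op fits in the reserved bit field, 0 otherwise. */
static int req_op_fits(unsigned int op)
{
	return op < (1u << SK_REQ_OP_BITS);
}
```

The field is now full: adding a ninth operation would trip the static assertion until the bit count is raised.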



* [PATCH v4 5/7] block: Implement support for zoned block devices
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided to obtain, as their names suggest,
the zone size (in 512B sectors) of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed the zone cache
         * Implement report zones operation based on earlier proposal
           by Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
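The zone alignment arithmetic used by blk_zone_start() and by the range checks in blkdev_reset_zones() can be exercised in userspace. This is a minimal sketch with hypothetical names (`zone_start()`, `reset_range_ok()`). It assumes the zone size is a power of two, which the mask arithmetic needs but which this patch does not state itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start sector of the zone containing @sector. The mask trick only
 * works for a power-of-two @zone_sectors (assumption of this sketch).
 */
static uint64_t zone_start(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors && !(zone_sectors & (zone_sectors - 1)));

	return sector & ~(zone_sectors - 1);
}

/*
 * Return 1 if [sector, sector + nr_sectors) passes the same range
 * checks as blkdev_reset_zones(), 0 where the kernel code would
 * return -EINVAL. @dev_sectors is the device or partition size.
 */
static int reset_range_ok(uint64_t sector, uint64_t nr_sectors,
			  uint64_t zone_sectors, uint64_t dev_sectors)
{
	uint64_t end_sector = sector + nr_sectors;

	/* Out of range */
	if (end_sector > dev_sectors)
		return 0;

	/* The range must start on a zone boundary */
	if (sector & (zone_sectors - 1))
		return 0;

	/* A partial length is only accepted for a smaller last zone */
	if ((nr_sectors & (zone_sectors - 1)) && end_sector != dev_sectors)
		return 0;

	return 1;
}
```

For example, with 524288-sector zones and a 1049576-sector device, resetting the 1000-sector runt zone at sector 1048576 is accepted, while a 1000-sector range at sector 0 is rejected.
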
 block/Kconfig                 |   8 ++
 block/Makefile                |   1 +
 block/blk-zoned.c             | 257 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h        |  31 +++++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/blkzoned.h | 103 +++++++++++++++++
 6 files changed, 401 insertions(+)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h
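
The report buffer sizing in blkdev_report_zones() is easier to follow with numbers. The sketch below reproduces the arithmetic in userspace under stated assumptions: 4 KiB pages, a 64-byte report header and 64-byte zone descriptors (the sizes of the structures in this patch), and the queue limits and BIO page limit passed in as plain parameters. `report_pages()` and `report_capacity()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define SK_HDR_BYTES	64u	/* sizeof(struct blk_zone_report_hdr) */
#define SK_ZONE_BYTES	64u	/* sizeof(struct blk_zone) */

/*
 * Number of pages allocated for a report of @nrz zones: header plus
 * descriptors, rounded up to a page, then capped by the queue's
 * maximum request size, the BIO page limit and the segment limit.
 */
static unsigned int report_pages(unsigned int nrz, unsigned int max_sectors,
				 unsigned int max_segments,
				 unsigned int bio_max_pages)
{
	size_t rep_bytes = SK_HDR_BYTES + (size_t)SK_ZONE_BYTES * nrz;
	size_t max_bytes = (size_t)max_sectors << 9;
	size_t nr_pages;

	rep_bytes = (rep_bytes + SK_PAGE_SIZE - 1) &
		    ~(size_t)(SK_PAGE_SIZE - 1);
	if (rep_bytes > max_bytes)
		rep_bytes = max_bytes;

	nr_pages = rep_bytes / SK_PAGE_SIZE;
	if (nr_pages > bio_max_pages)
		nr_pages = bio_max_pages;
	if (nr_pages > max_segments)
		nr_pages = max_segments;

	return (unsigned int)nr_pages;
}

/* Zone descriptors that fit in @nr_pages once the header is placed. */
static unsigned int report_capacity(unsigned int nr_pages)
{
	assert(nr_pages > 0);

	return (nr_pages * SK_PAGE_SIZE - SK_HDR_BYTES) / SK_ZONE_BYTES;
}
```

Because the header occupies the first 64 bytes, a single page holds 63 descriptors rather than 64, so a request for 64 zones needs two pages. This is also why a call may return fewer zones than requested when the queue limits cap the buffer.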

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
 	T10/SCSI Data Integrity Field or the T13/ATA External Path
 	Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+	bool "Zoned block device support"
+	---help---
+	Block layer zoned block device support. This option enables
+	support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+	Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
 	bool "Block layer bio throttling support"
 	depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..1603573
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,257 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/blkdev.h>
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+				      sector_t sector)
+{
+	sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+	return sector & ~zone_mask;
+}
+
+/*
+ * Check that a zone report belongs to the partition.
+ * If yes, fix its start sector and write pointer, copy it in the
+ * zone information array and return true. Return false otherwise.
+ */
+static bool blkdev_report_zone(struct block_device *bdev,
+			       struct blk_zone *rep,
+			       struct blk_zone *zone)
+{
+	sector_t offset = get_start_sect(bdev);
+
+	if (rep->start < offset)
+		return false;
+
+	rep->start -= offset;
+	if (rep->start + rep->len > bdev->bd_part->nr_sects)
+		return false;
+
+	if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		rep->wp = rep->start + rep->len;
+	else
+		rep->wp -= offset;
+	memcpy(zone, rep, sizeof(struct blk_zone));
+
+	return true;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:	Target block device
+ * @sector:	Sector from which to report zones
+ * @zones:	Array of zone structures where to return the zones information
+ * @nr_zones:	Number of zone structures in the zone array
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Get zone information starting from the zone containing @sector.
+ *    The number of zones reported may be less than the number
+ *    requested by @nr_zones. The number of zones actually reported is
+ *    returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+			sector_t sector,
+			struct blk_zone *zones,
+			unsigned int *nr_zones,
+			gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct blk_zone_report_hdr *hdr;
+	unsigned int nrz = *nr_zones;
+	struct page *page;
+	unsigned int nr_rep;
+	size_t rep_bytes;
+	unsigned int nr_pages;
+	struct bio *bio;
+	struct bio_vec *bv;
+	unsigned int i, n, nz;
+	unsigned int ofst;
+	void *addr;
+	int ret = 0;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (!nrz)
+		return 0;
+
+	if (sector > bdev->bd_part->nr_sects) {
+		*nr_zones = 0;
+		return 0;
+	}
+
+	/*
+	 * The zone report has a header. So make room for it in the
+	 * payload. Also make sure that the report fits in a single BIO
+	 * that will not be split down the stack.
+	 */
+	rep_bytes = sizeof(struct blk_zone_report_hdr) +
+		sizeof(struct blk_zone) * nrz;
+	rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
+	if (rep_bytes > (queue_max_sectors(q) << 9))
+		rep_bytes = queue_max_sectors(q) << 9;
+
+	nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
+			 rep_bytes >> PAGE_SHIFT);
+	nr_pages = min_t(unsigned int, nr_pages,
+			 queue_max_segments(q));
+
+	bio = bio_alloc(gfp_mask, nr_pages);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_bdev = bdev;
+	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_page(gfp_mask);
+		if (!page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
+			__free_page(page);
+			break;
+		}
+	}
+
+	if (i == 0)
+		ret = -ENOMEM;
+	else
+		ret = submit_bio_wait(bio);
+	if (ret)
+		goto out;
+
+	/*
+	 * Process the report result: skip the header and go through the
+	 * reported zones to fix up the zone information for partitions.
+	 * At the same time, copy the zone information into the zone
+	 * array.
+	 */
+	n = 0;
+	nz = 0;
+	nr_rep = 0;
+	bio_for_each_segment_all(bv, bio, i) {
+
+		if (!bv->bv_page)
+			break;
+
+		addr = kmap_atomic(bv->bv_page);
+
+		/* Get header in the first page */
+		ofst = 0;
+		if (!nr_rep) {
+			hdr = (struct blk_zone_report_hdr *) addr;
+			nr_rep = hdr->nr_zones;
+			ofst = sizeof(struct blk_zone_report_hdr);
+		}
+
+		/* Fixup and report zones */
+		while (ofst < bv->bv_len &&
+		       n < nr_rep && nz < nrz) {
+			if (blkdev_report_zone(bdev, addr + ofst, &zones[nz]))
+				nz++;
+			ofst += sizeof(struct blk_zone);
+			n++;
+		}
+
+		kunmap_atomic(addr);
+
+		if (n >= nr_rep || nz >= nrz)
+			break;
+
+	}
+
+out:
+	bio_for_each_segment_all(bv, bio, i)
+		__free_page(bv->bv_page);
+	bio_put(bio);
+
+	if (ret == 0)
+		*nr_zones = nz;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_report_zones);
+
+/**
+ * blkdev_reset_zones - Reset zones write pointer
+ * @bdev:	Target block device
+ * @sector:	Start sector of the first zone to reset
+ * @nr_sectors:	Number of sectors, at least the length of one zone
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Reset the write pointer of the zones contained in the range
+ *    @sector..@sector+@nr_sectors. Specifying the entire disk sector range
+ *    is valid, but the specified range should not contain conventional zones.
+ */
+int blkdev_reset_zones(struct block_device *bdev,
+		       sector_t sector, sector_t nr_sectors,
+		       gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	sector_t zone_sectors;
+	sector_t end_sector = sector + nr_sectors;
+	struct bio *bio;
+	int ret;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (end_sector > bdev->bd_part->nr_sects)
+		/* Out of range */
+		return -EINVAL;
+
+	/* Check alignment (handle eventual smaller last zone) */
+	zone_sectors = blk_queue_zone_size(q);
+	if (sector & (zone_sectors - 1))
+		return -EINVAL;
+
+	if ((nr_sectors & (zone_sectors - 1)) &&
+	    end_sector != bdev->bd_part->nr_sects)
+		return -EINVAL;
+
+	while (sector < end_sector) {
+
+		bio = bio_alloc(gfp_mask, 0);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_bdev = bdev;
+		bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+
+		if (ret)
+			return ret;
+
+		sector += zone_sectors;
+
+		/* This may take a while, so be nice to others */
+		cond_resched();
+
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkdev_reset_zones);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f19e16b..252043f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -302,6 +303,21 @@ struct queue_limits {
 	enum blk_zoned_model	zoned;
 };
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+struct blk_zone_report_hdr {
+	unsigned int	nr_zones;
+	u8		padding[60];
+};
+
+extern int blkdev_report_zones(struct block_device *bdev,
+			       sector_t sector, struct blk_zone *zones,
+			       unsigned int *nr_zones, gfp_t gfp_mask);
+extern int blkdev_reset_zones(struct block_device *bdev, sector_t sector,
+			      sector_t nr_sectors, gfp_t gfp_mask);
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 struct request_queue {
 	/*
 	 * Together with queue_head for cacheline sharing
@@ -654,6 +670,11 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
 	}
 }
 
+static inline unsigned int blk_queue_zone_size(struct request_queue *q)
+{
+	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1401,6 +1422,16 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
 	return false;
 }
 
+static inline unsigned int bdev_zone_size(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zone_size(q);
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index dd60439..92466a6 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned.h
 header-y += bpf_common.h
 header-y += bpf_perf_event.h
 header-y += bpf.h
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
new file mode 100644
index 0000000..a381721
--- /dev/null
+++ b/include/uapi/linux/blkzoned.h
@@ -0,0 +1,103 @@
+/*
+ * Zoned block devices handling.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Copyright (C) 2016 Western Digital
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef _UAPI_BLKZONED_H
+#define _UAPI_BLKZONED_H
+
+#include <linux/types.h>
+
+/**
+ * enum blk_zone_type - Types of zones allowed in a zoned device.
+ *
+ * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
+ *                              randomly. Zone reset has no effect on the zone.
+ * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
+ * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
+ *
+ * Any other value not defined is reserved and must be considered as invalid.
+ */
+enum blk_zone_type {
+	BLK_ZONE_TYPE_CONVENTIONAL	= 0x1,
+	BLK_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	BLK_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+};
+
+/**
+ * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
+ *
+ * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
+ * @BLK_ZONE_COND_EMPTY: The zone is empty.
+ * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
+ * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
+ *                          OPEN ZONE command.
+ * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
+ * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a
+ *                      FINISH ZONE command.
+ * @BLK_ZONE_COND_READONLY: The zone is read-only.
+ * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
+ *
+ * The Zone Condition state machine in the ZBC/ZAC standards maps to the above
+ * definitions as:
+ *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
+ *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
+ *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
+ *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
+ *   - ZC5: Full          | BLK_ZONE_COND_FULL
+ *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
+ *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
+ *
+ * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
+ * be considered invalid.
+ */
+enum blk_zone_cond {
+	BLK_ZONE_COND_NOT_WP	= 0x0,
+	BLK_ZONE_COND_EMPTY	= 0x1,
+	BLK_ZONE_COND_IMP_OPEN	= 0x2,
+	BLK_ZONE_COND_EXP_OPEN	= 0x3,
+	BLK_ZONE_COND_CLOSED	= 0x4,
+	BLK_ZONE_COND_READONLY	= 0xD,
+	BLK_ZONE_COND_FULL	= 0xE,
+	BLK_ZONE_COND_OFFLINE	= 0xF,
+};
+
+/**
+ * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
+ *
+ * @start: Zone start in 512 B sector units
+ * @len: Zone length in 512 B sector units
+ * @wp: Zone write pointer location in 512 B sector units
+ * @type: see enum blk_zone_type for possible values
+ * @cond: see enum blk_zone_cond for possible values
+ * @non_seq: Flag indicating that the zone is using non-sequential resources
+ *           (for host-aware zoned block devices only).
+ * @reset: Flag indicating that a zone reset is recommended.
+ * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
+ *
+ * start, len and wp use the regular 512 B sector unit, regardless of the
+ * device logical block size. The overall structure size is 64 B to match the
+ * ZBC/ZAC defined zone descriptor and allow support for future additional
+ * zone information.
+ */
+struct blk_zone {
+	__u64	start;		/* Zone start sector */
+	__u64	len;		/* Zone length in number of sectors */
+	__u64	wp;		/* Zone write pointer position */
+	__u8	type;		/* Zone type */
+	__u8	cond;		/* Zone condition */
+	__u8	non_seq;	/* Non-sequential write resources active */
+	__u8	reset;		/* Reset write pointer recommended */
+	__u8	reserved[36];
+};
+
+#endif /* _UAPI_BLKZONED_H */
-- 
2.7.4
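
The 64-byte descriptor layout and the set of valid zone conditions can be checked outside the kernel. The following is a minimal userspace sketch: `struct blk_zone_sketch` copies the field layout of struct blk_zone using stdint types, while `zone_cond_is_valid()` and `zone_remaining_sectors()` are hypothetical helpers that are not part of the patch. The second helper assumes the write pointer lies inside the zone.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the struct blk_zone field layout. */
struct blk_zone_sketch {
	uint64_t start;		/* Zone start sector */
	uint64_t len;		/* Zone length in number of sectors */
	uint64_t wp;		/* Zone write pointer position */
	uint8_t  type;		/* Zone type */
	uint8_t  cond;		/* Zone condition */
	uint8_t  non_seq;	/* Non-sequential write resources active */
	uint8_t  reset;		/* Reset write pointer recommended */
	uint8_t  reserved[36];
};

/* 3 * 8 + 4 * 1 + 36 = 64 bytes, the ZBC/ZAC zone descriptor size. */
_Static_assert(sizeof(struct blk_zone_sketch) == 64,
	       "zone descriptor must be 64 bytes");

/*
 * Return 1 for condition codes defined by enum blk_zone_cond
 * (0x0 to 0x4 and 0xD to 0xF), 0 for the reserved range 0x5 to 0xC.
 */
static int zone_cond_is_valid(uint8_t cond)
{
	return cond <= 0x4 || (cond >= 0xD && cond <= 0xF);
}

/*
 * Sectors left between the write pointer and the end of the zone.
 * Assumes start <= wp <= start + len, as this sketch does not handle
 * conditions where the reported write pointer may be meaningless.
 */
static uint64_t zone_remaining_sectors(const struct blk_zone_sketch *z)
{
	assert(z->wp >= z->start && z->wp <= z->start + z->len);

	return z->start + z->len - z->wp;
}
```

Since blkdev_report_zone() sets the write pointer of conventional zones to the zone end, the second helper returns 0 for them.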


* [PATCH v4 5/7] block: Implement support for zoned block devices
@ 2016-09-28  8:45   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided to obtain, as their names suggest,
the zone size (in 512B sectors) of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed the zone cache
         * Implement report zones operation based on earlier proposal
           by Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/Kconfig                 |   8 ++
 block/Makefile                |   1 +
 block/blk-zoned.c             | 257 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h        |  31 +++++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/blkzoned.h | 103 +++++++++++++++++
 6 files changed, 401 insertions(+)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
 	T10/SCSI Data Integrity Field or the T13/ATA External Path
 	Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+	bool "Zoned block device support"
+	---help---
+	Block layer zoned block device support. This option enables
+	support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+	Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
 	bool "Block layer bio throttling support"
 	depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..1603573
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,257 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/blkdev.h>
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+				      sector_t sector)
+{
+	sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+	return sector & ~zone_mask;
+}
+
+/*
+ * Check that a zone report belongs to the partition.
+ * If yes, fix its start sector and write pointer, copy it in the
+ * zone information array and return true. Return false otherwise.
+ */
+static bool blkdev_report_zone(struct block_device *bdev,
+			       struct blk_zone *rep,
+			       struct blk_zone *zone)
+{
+	sector_t offset = get_start_sect(bdev);
+
+	if (rep->start < offset)
+		return false;
+
+	rep->start -= offset;
+	if (rep->start + rep->len > bdev->bd_part->nr_sects)
+		return false;
+
+	if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		rep->wp = rep->start + rep->len;
+	else
+		rep->wp -= offset;
+	memcpy(zone, rep, sizeof(struct blk_zone));
+
+	return true;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:	Target block device
+ * @sector:	Sector from which to report zones
+ * @zones:	Array of zone structures where to return the zones information
+ * @nr_zones:	Number of zone structures in the zone array
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Get zone information starting from the zone containing @sector.
+ *    The number of zones reported may be less than the number
+ *    requested by @nr_zones. The number of zones actually reported is
+ *    returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+			sector_t sector,
+			struct blk_zone *zones,
+			unsigned int *nr_zones,
+			gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct blk_zone_report_hdr *hdr;
+	unsigned int nrz = *nr_zones;
+	struct page *page;
+	unsigned int nr_rep;
+	size_t rep_bytes;
+	unsigned int nr_pages;
+	struct bio *bio;
+	struct bio_vec *bv;
+	unsigned int i, n, nz;
+	unsigned int ofst;
+	void *addr;
+	int ret = 0;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (!nrz)
+		return 0;
+
+	if (sector > bdev->bd_part->nr_sects) {
+		*nr_zones = 0;
+		return 0;
+	}
+
+	/*
+	 * The zone report has a header. So make room for it in the
+	 * payload. Also make sure that the report fits in a single BIO
+	 * that will not be split down the stack.
+	 */
+	rep_bytes = sizeof(struct blk_zone_report_hdr) +
+		sizeof(struct blk_zone) * nrz;
+	rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
+	if (rep_bytes > (queue_max_sectors(q) << 9))
+		rep_bytes = queue_max_sectors(q) << 9;
+
+	nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
+			 rep_bytes >> PAGE_SHIFT);
+	nr_pages = min_t(unsigned int, nr_pages,
+			 queue_max_segments(q));
+
+	bio = bio_alloc(gfp_mask, nr_pages);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_bdev = bdev;
+	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_page(gfp_mask);
+		if (!page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
+			__free_page(page);
+			break;
+		}
+	}
+
+	if (i == 0)
+		ret = -ENOMEM;
+	else
+		ret = submit_bio_wait(bio);
+	if (ret)
+		goto out;
+
+	/*
+	 * Process the report result: skip the header and go through the
+	 * reported zones to fix up the zone information for partitions.
+	 * At the same time, copy the zone information into the zone
+	 * array.
+	 */
+	n = 0;
+	nz = 0;
+	nr_rep = 0;
+	bio_for_each_segment_all(bv, bio, i) {
+
+		if (!bv->bv_page)
+			break;
+
+		addr = kmap_atomic(bv->bv_page);
+
+		/* Get header in the first page */
+		ofst = 0;
+		if (!nr_rep) {
+			hdr = (struct blk_zone_report_hdr *) addr;
+			nr_rep = hdr->nr_zones;
+			ofst = sizeof(struct blk_zone_report_hdr);
+		}
+
+		/* Fixup and report zones */
+		while (ofst < bv->bv_len &&
+		       n < nr_rep && nz < nrz) {
+			if (blkdev_report_zone(bdev, addr + ofst, &zones[nz]))
+				nz++;
+			ofst += sizeof(struct blk_zone);
+			n++;
+		}
+
+		kunmap_atomic(addr);
+
+		if (n >= nr_rep || nz >= nrz)
+			break;
+
+	}
+
+out:
+	bio_for_each_segment_all(bv, bio, i)
+		__free_page(bv->bv_page);
+	bio_put(bio);
+
+	if (ret == 0)
+		*nr_zones = nz;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_report_zones);
+
+/**
+ * blkdev_reset_zones - Reset zones write pointer
+ * @bdev:	Target block device
+ * @sector:	Start sector of the first zone to reset
+ * @nr_sectors:	Number of sectors, at least the length of one zone
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Reset the write pointer of the zones contained in the range
+ *    @sector..@sector+@nr_sectors. Specifying the entire disk sector range
+ *    is valid, but the specified range should not contain conventional zones.
+ */
+int blkdev_reset_zones(struct block_device *bdev,
+		       sector_t sector, sector_t nr_sectors,
+		       gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	sector_t zone_sectors;
+	sector_t end_sector = sector + nr_sectors;
+	struct bio *bio;
+	int ret;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (end_sector > bdev->bd_part->nr_sects)
+		/* Out of range */
+		return -EINVAL;
+
+	/* Check alignment (handle eventual smaller last zone) */
+	zone_sectors = blk_queue_zone_size(q);
+	if (sector & (zone_sectors - 1))
+		return -EINVAL;
+
+	if ((nr_sectors & (zone_sectors - 1)) &&
+	    end_sector != bdev->bd_part->nr_sects)
+		return -EINVAL;
+
+	while (sector < end_sector) {
+
+		bio = bio_alloc(gfp_mask, 0);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_bdev = bdev;
+		bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+
+		if (ret)
+			return ret;
+
+		sector += zone_sectors;
+
+		/* This may take a while, so be nice to others */
+		cond_resched();
+
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkdev_reset_zones);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f19e16b..252043f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -302,6 +303,21 @@ struct queue_limits {
 	enum blk_zoned_model	zoned;
 };
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+struct blk_zone_report_hdr {
+	unsigned int	nr_zones;
+	u8		padding[60];
+};
+
+extern int blkdev_report_zones(struct block_device *bdev,
+			       sector_t sector, struct blk_zone *zones,
+			       unsigned int *nr_zones, gfp_t gfp_mask);
+extern int blkdev_reset_zones(struct block_device *bdev, sector_t sector,
+			      sector_t nr_sectors, gfp_t gfp_mask);
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 struct request_queue {
 	/*
 	 * Together with queue_head for cacheline sharing
@@ -654,6 +670,11 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
 	}
 }
 
+static inline unsigned int blk_queue_zone_size(struct request_queue *q)
+{
+	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1401,6 +1422,16 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
 	return false;
 }
 
+static inline unsigned int bdev_zone_size(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zone_size(q);
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index dd60439..92466a6 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned.h
 header-y += bpf_common.h
 header-y += bpf_perf_event.h
 header-y += bpf.h
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
new file mode 100644
index 0000000..a381721
--- /dev/null
+++ b/include/uapi/linux/blkzoned.h
@@ -0,0 +1,103 @@
+/*
+ * Zoned block devices handling.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Copyright (C) 2016 Western Digital
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef _UAPI_BLKZONED_H
+#define _UAPI_BLKZONED_H
+
+#include <linux/types.h>
+
+/**
+ * enum blk_zone_type - Types of zones allowed in a zoned device.
+ *
+ * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
+ *                              randomly. Zone reset has no effect on the zone.
+ * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
+ * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
+ *
+ * Any other value not defined is reserved and must be considered as invalid.
+ */
+enum blk_zone_type {
+	BLK_ZONE_TYPE_CONVENTIONAL	= 0x1,
+	BLK_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	BLK_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+};
+
+/**
+ * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
+ *
+ * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
+ * @BLK_ZONE_COND_EMPTY: The zone is empty.
+ * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
+ * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
+ *                          OPEN ZONE command.
+ * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
+ * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a
+ *                      FINISH ZONE command.
+ * @BLK_ZONE_COND_READONLY: The zone is read-only.
+ * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
+ *
+ * The Zone Condition state machine in the ZBC/ZAC standards maps the above
+ * definitions as:
+ *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
+ *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
+ *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
+ *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
+ *   - ZC5: Full          | BLK_ZONE_COND_FULL
+ *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
+ *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
+ *
+ * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
+ * be considered invalid.
+ */
+enum blk_zone_cond {
+	BLK_ZONE_COND_NOT_WP	= 0x0,
+	BLK_ZONE_COND_EMPTY	= 0x1,
+	BLK_ZONE_COND_IMP_OPEN	= 0x2,
+	BLK_ZONE_COND_EXP_OPEN	= 0x3,
+	BLK_ZONE_COND_CLOSED	= 0x4,
+	BLK_ZONE_COND_READONLY	= 0xD,
+	BLK_ZONE_COND_FULL	= 0xE,
+	BLK_ZONE_COND_OFFLINE	= 0xF,
+};
+
+/**
+ * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
+ *
+ * @start: Zone start in 512 B sector units
+ * @len: Zone length in 512 B sector units
+ * @wp: Zone write pointer location in 512 B sector units
+ * @type: see enum blk_zone_type for possible values
+ * @cond: see enum blk_zone_cond for possible values
+ * @non_seq: Flag indicating that the zone is using non-sequential resources
+ *           (for host-aware zoned block devices only).
+ * @reset: Flag indicating that a zone reset is recommended.
+ * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
+ *
+ * start, len and wp use the regular 512 B sector unit, regardless of the
+ * device logical block size. The overall structure size is 64 B to match the
+ * ZBC/ZAC defined zone descriptor and allow support for future additional
+ * zone information.
+ */
+struct blk_zone {
+	__u64	start;		/* Zone start sector */
+	__u64	len;		/* Zone length in number of sectors */
+	__u64	wp;		/* Zone write pointer position */
+	__u8	type;		/* Zone type */
+	__u8	cond;		/* Zone condition */
+	__u8	non_seq;	/* Non-sequential write resources active */
+	__u8	reset;		/* Reset write pointer recommended */
+	__u8	reserved[36];
+};
+
+#endif /* _UAPI_BLKZONED_H */
-- 
2.7.4


* [PATCH v4 6/7] sd: Implement support for ZBC devices
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement ZBC support functions to set up zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the possible exception of a
   smaller last (runt) zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are set up with
a capacity of 0 to prevent their use.
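
For illustration only (not part of this patch), condition 1 can be
sketched in plain C as below. The helper name and the array-of-lengths
input are assumptions made for the example; the patch itself walks
REPORT ZONES descriptors in sd_zbc_check_zone_size and additionally
rejects zone sizes that are not a power of 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: returns true if all
 * zones have the same length, allowing only the last zone to be smaller.
 * zone_len[] holds the zone lengths in logical blocks, in LBA order.
 */
static bool zones_have_constant_size(const uint64_t *zone_len,
				     size_t nr_zones)
{
	uint64_t first;
	size_t i;

	if (nr_zones == 0)
		return false;

	first = zone_len[0];
	if (first == 0)
		return false;

	for (i = 1; i < nr_zones; i++) {
		if (zone_len[i] == first)
			continue;
		/* Only the last zone may differ, and only by being smaller. */
		if (i != nr_zones - 1 || zone_len[i] == 0 ||
		    zone_len[i] > first)
			return false;
	}

	return true;
}
```

With this sketch, a layout such as { 524288, 524288, 1000 } is accepted,
while a short zone in the middle or a larger last zone is rejected.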

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).
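
As a rough sketch of that capacity adjustment (illustration only, not
code from this patch): rc_basis is taken from bits 5:4 of byte 12 of
the READ CAPACITY (16) data, and when it is 0 the capacity becomes the
maximum LBA from bytes 8 to 15 of the REPORT ZONES header plus one.
The helper names below are hypothetical; the byte offsets are the ones
the patch parses in read_capacity_16 and sd_zbc_check_capacity.

```c
#include <stdint.h>

/* Read a big-endian 64-bit value from a byte buffer. */
static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* RC BASIS field: bits 5:4 of byte 12 of the READ CAPACITY (16) data. */
static unsigned int rc16_rc_basis(const uint8_t *rc16_data)
{
	return (rc16_data[12] >> 4) & 0x3;
}

/*
 * Hypothetical helper: capacity (in logical blocks) to use for a zoned
 * disk. With rc_basis == 0, READ CAPACITY only covers the conventional
 * zones at the start of the LBA range, so the maximum LBA from the
 * REPORT ZONES header (bytes 8..15) is used instead.
 */
static uint64_t zbc_effective_capacity(uint64_t capacity,
				       unsigned int rc_basis,
				       const uint8_t *report_hdr)
{
	if (rc_basis != 0)
		return capacity;

	return get_be64(&report_hdr[8]) + 1;
}
```

When rc_basis is non-zero the value from READ CAPACITY is kept as is,
which matches the early return in sd_zbc_check_capacity.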

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity so that it reflects any capacity
change made by sd_zbc_read_zones. This new function also calls
sd_zbc_print_zones to display the number of zones and zone size
of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed zone cache support
         * Removed mapping of discard to reset write pointer command
         * Modified sd_zbc_read_zones to include checks that the
           device satisfies the kernel constraints
         * Implemented REPORT ZONES setup and post-processing based
           on code from Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 drivers/scsi/Makefile     |   1 +
 drivers/scsi/sd.c         | 143 ++++++++---
 drivers/scsi/sd.h         |  70 ++++++
 drivers/scsi/sd_zbc.c     | 624 ++++++++++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 822 insertions(+), 33 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index fc0d9b8..350513c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -180,6 +180,7 @@ hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 51e5629..4d63260 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
@@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 	static const char temp[] = "temporary ";
 	int len;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		/* no cache control on RBC devices; theoretically they
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
@@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (sd_is_zoned(sdkp)) {
+		sd_config_discard(sdkp, SD_LBP_DISABLE);
+		return count;
+	}
+
 	if (sdp->type != TYPE_DISK)
 		return -EINVAL;
 
@@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &max);
@@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		if (ret != BLKPREP_OK)
+			return ret;
+	}
+
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+
 	/*
 	 * If we have a 1K hardware sectorsize, prevent access to single
 	 * 512 byte sectors.  In theory we could handle this - in fact
@@ -1149,6 +1169,10 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
+	case REQ_OP_ZONE_REPORT:
+		return sd_zbc_setup_report_cmnd(cmd);
+	case REQ_OP_ZONE_RESET:
+		return sd_zbc_setup_reset_cmnd(cmd);
 	default:
 		BUG();
 	}
@@ -1780,7 +1804,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	unsigned char op = SCpnt->cmnd[0];
 	unsigned char unmap = SCpnt->cmnd[1] & 8;
 
-	if (req_op(req) == REQ_OP_DISCARD || req_op(req) == REQ_OP_WRITE_SAME) {
+	switch (req_op(req)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_ZONE_RESET:
 		if (!result) {
 			good_bytes = blk_rq_bytes(req);
 			scsi_set_resid(SCpnt, 0);
@@ -1788,6 +1815,17 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			good_bytes = 0;
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
+		break;
+	case REQ_OP_ZONE_REPORT:
+		if (!result) {
+			good_bytes = scsi_bufflen(SCpnt)
+				- scsi_get_resid(SCpnt);
+			scsi_set_resid(SCpnt, 0);
+		} else {
+			good_bytes = 0;
+			scsi_set_resid(SCpnt, blk_rq_bytes(req));
+		}
+		break;
 	}
 
 	if (result) {
@@ -1848,7 +1886,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	default:
 		break;
 	}
+
  out:
+	if (sd_is_zoned(sdkp))
+		sd_zbc_complete(SCpnt, good_bytes, &sshdr);
+
 	SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
 					   "sd_done: completed %d of %d bytes\n",
 					   good_bytes, scsi_bufflen(SCpnt)));
@@ -1983,7 +2025,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
-
 /*
  * Determine whether disk supports Data Integrity Field.
  */
@@ -2133,6 +2174,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	/* Logical blocks per physical block exponent */
 	sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
 
+	/* RC basis */
+	sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
+
 	/* Lowest aligned logical block */
 	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
 	blk_queue_alignment_offset(sdp->request_queue, alignment);
@@ -2242,7 +2286,6 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 {
 	int sector_size;
 	struct scsi_device *sdp = sdkp->device;
-	sector_t old_capacity = sdkp->capacity;
 
 	if (sd_try_rc16_first(sdp)) {
 		sector_size = read_capacity_16(sdkp, sdp, buffer);
@@ -2323,35 +2366,44 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 		sector_size = 512;
 	}
 	blk_queue_logical_block_size(sdp->request_queue, sector_size);
+	blk_queue_physical_block_size(sdp->request_queue,
+				      sdkp->physical_block_size);
+	sdkp->device->sector_size = sector_size;
 
-	{
-		char cap_str_2[10], cap_str_10[10];
+	if (sdkp->capacity > 0xffffffff)
+		sdp->use_16_for_rw = 1;
 
-		string_get_size(sdkp->capacity, sector_size,
-				STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
-		string_get_size(sdkp->capacity, sector_size,
-				STRING_UNITS_10, cap_str_10,
-				sizeof(cap_str_10));
+}
 
-		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
-			sd_printk(KERN_NOTICE, sdkp,
-				  "%llu %d-byte logical blocks: (%s/%s)\n",
-				  (unsigned long long)sdkp->capacity,
-				  sector_size, cap_str_10, cap_str_2);
+/*
+ * Print disk capacity
+ */
+static void
+sd_print_capacity(struct scsi_disk *sdkp,
+		  sector_t old_capacity)
+{
+	int sector_size = sdkp->device->sector_size;
+	char cap_str_2[10], cap_str_10[10];
 
-			if (sdkp->physical_block_size != sector_size)
-				sd_printk(KERN_NOTICE, sdkp,
-					  "%u-byte physical blocks\n",
-					  sdkp->physical_block_size);
-		}
-	}
+	string_get_size(sdkp->capacity, sector_size,
+			STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
+	string_get_size(sdkp->capacity, sector_size,
+			STRING_UNITS_10, cap_str_10,
+			sizeof(cap_str_10));
 
-	if (sdkp->capacity > 0xffffffff)
-		sdp->use_16_for_rw = 1;
+	if (sdkp->first_scan || old_capacity != sdkp->capacity) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%llu %d-byte logical blocks: (%s/%s)\n",
+			  (unsigned long long)sdkp->capacity,
+			  sector_size, cap_str_10, cap_str_2);
 
-	blk_queue_physical_block_size(sdp->request_queue,
-				      sdkp->physical_block_size);
-	sdkp->device->sector_size = sector_size;
+		if (sdkp->physical_block_size != sector_size)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "%u-byte physical blocks\n",
+				  sdkp->physical_block_size);
+
+		sd_zbc_print_zones(sdkp);
+	}
 }
 
 /* called with buffer of length 512 */
@@ -2613,7 +2665,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return;
 
 	if (sdkp->protection_type == 0)
@@ -2720,6 +2772,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
  */
 static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
 	unsigned char *buffer;
 	u16 rot;
 	const int vpd_len = 64;
@@ -2734,10 +2787,21 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	rot = get_unaligned_be16(&buffer[4]);
 
 	if (rot == 1) {
-		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
-		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
+	if (sdkp->zoned == 1)
+		q->limits.zoned = BLK_ZONED_HA;
+	else if (sdkp->device->type == TYPE_ZBC)
+		q->limits.zoned = BLK_ZONED_HM;
+	else
+		q->limits.zoned = BLK_ZONED_NONE;
+	if (blk_queue_is_zoned(q) && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
+		      q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
+
  out:
 	kfree(buffer);
 }
@@ -2809,6 +2873,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	struct scsi_disk *sdkp = scsi_disk(disk);
 	struct scsi_device *sdp = sdkp->device;
 	struct request_queue *q = sdkp->disk->queue;
+	sector_t old_capacity = sdkp->capacity;
 	unsigned char *buffer;
 	unsigned int dev_max, rw_max;
 
@@ -2842,8 +2907,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
+			sd_zbc_read_zones(sdkp, buffer);
 		}
 
+		sd_print_capacity(sdkp, old_capacity);
+
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
@@ -3041,9 +3109,16 @@ static int sd_probe(struct device *dev)
 
 	scsi_autopm_get_device(sdp);
 	error = -ENODEV;
-	if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
+	if (sdp->type != TYPE_DISK &&
+	    sdp->type != TYPE_ZBC &&
+	    sdp->type != TYPE_MOD &&
+	    sdp->type != TYPE_RBC)
 		goto out;
 
+#ifndef CONFIG_BLK_DEV_ZONED
+	if (sdp->type == TYPE_ZBC)
+		goto out;
+#endif
 	SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
 					"sd_probe\n"));
 
@@ -3147,6 +3222,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	sd_zbc_remove(sdkp);
+
 	blk_register_region(devt, SD_MINORS, NULL,
 			    sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index c8d9863..6bd4226 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -64,6 +64,15 @@ struct scsi_disk {
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
+#ifdef CONFIG_BLK_DEV_ZONED
+	unsigned int	nr_zones;
+	unsigned int	zone_blocks;
+	unsigned int	zone_shift;
+	unsigned long	*zones_wlock;
+	unsigned int	zones_optimal_open;
+	unsigned int	zones_optimal_nonseq;
+	unsigned int	zones_max_open;
+#endif
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
@@ -94,6 +103,9 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	rc_basis: 2;
+	unsigned	zoned: 2;
+	unsigned	urswrz : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
@@ -156,6 +168,11 @@ static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t b
 	return blocks * sdev->sector_size;
 }
 
+static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
+{
+	return sector >> (ilog2(sdev->sector_size) - 9);
+}
+
 /*
  * Look up the DIX operation based on whether the command is read or
  * write and whether dix and dif are enabled.
@@ -239,4 +256,57 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+static inline int sd_is_zoned(struct scsi_disk *sdkp)
+{
+	return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
+}
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
+extern void sd_zbc_remove(struct scsi_disk *sdkp);
+extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
+extern int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+				   sector_t sector, unsigned int nr_sectors);
+extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
+extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
+extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
+			    struct scsi_sense_hdr *sshdr);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
+				    unsigned char *buf)
+{
+	return 0;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+
+static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
+
+static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
+					  struct request *rq, sector_t sector,
+					  unsigned int num_sectors)
+{
+	/* Let the drive fail requests */
+	return BLKPREP_OK;
+}
+
+static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
+				   unsigned int good_bytes,
+				   struct scsi_sense_hdr *sshdr) {}
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
new file mode 100644
index 0000000..a4da0ed
--- /dev/null
+++ b/drivers/scsi/sd_zbc.c
@@ -0,0 +1,624 @@
+/*
+ * SCSI Zoned Block commands
+ *
+ * Copyright (C) 2014-2015 SUSE Linux GmbH
+ * Written by: Hannes Reinecke <hare@suse.de>
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Modified by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
+
+#include "sd.h"
+#include "scsi_priv.h"
+
+enum zbc_zone_type {
+	ZBC_ZONE_TYPE_CONV = 0x1,
+	ZBC_ZONE_TYPE_SEQWRITE_REQ,
+	ZBC_ZONE_TYPE_SEQWRITE_PREF,
+	ZBC_ZONE_TYPE_RESERVED,
+};
+
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP,
+	ZBC_ZONE_COND_EMPTY,
+	ZBC_ZONE_COND_IMP_OPEN,
+	ZBC_ZONE_COND_EXP_OPEN,
+	ZBC_ZONE_COND_CLOSED,
+	ZBC_ZONE_COND_READONLY = 0xd,
+	ZBC_ZONE_COND_FULL,
+	ZBC_ZONE_COND_OFFLINE,
+};
+
+/**
+ * Convert a zone descriptor to a zone struct.
+ */
+static void sd_zbc_parse_report(struct scsi_disk *sdkp,
+				u8 *buf,
+				struct blk_zone *zone)
+{
+	struct scsi_device *sdp = sdkp->device;
+
+	memset(zone, 0, sizeof(struct blk_zone));
+
+	zone->type = buf[0] & 0x0f;
+	zone->cond = (buf[1] >> 4) & 0xf;
+	if (buf[1] & 0x01)
+		zone->reset = 1;
+	if (buf[1] & 0x02)
+		zone->non_seq = 1;
+
+	zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
+	zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
+	zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
+	if (zone->type != ZBC_ZONE_TYPE_CONV &&
+	    zone->cond == ZBC_ZONE_COND_FULL)
+		zone->wp = zone->start + zone->len;
+}
+
+/**
+ * Issue a REPORT ZONES scsi command.
+ */
+static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			       unsigned int buflen, sector_t lba)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout;
+	struct scsi_sense_hdr sshdr;
+	unsigned char cmd[16];
+	unsigned int rep_len;
+	int result;
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(lba, &cmd[2]);
+	put_unaligned_be32(buflen, &cmd[10]);
+	memset(buf, 0, buflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				  buf, buflen, &sshdr,
+				  timeout, SD_MAX_RETRIES, NULL);
+	if (result) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES lba %llu failed with %d/%d\n",
+			  (unsigned long long)lba,
+			  host_byte(result), driver_byte(result));
+		return -EIO;
+	}
+
+	rep_len = get_unaligned_be32(&buf[0]);
+	if (rep_len < 64) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES report invalid length %u\n",
+			  rep_len);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t lba, sector = blk_rq_pos(rq);
+	unsigned int nr_bytes = blk_rq_bytes(rq);
+	int ret;
+
+	WARN_ON(nr_bytes == 0);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	ret = scsi_init_io(cmd);
+	if (ret != BLKPREP_OK)
+		return ret;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_IN;
+	cmd->cmnd[1] = ZI_REPORT_ZONES;
+	lba = sectors_to_logical(sdkp->device, sector);
+	put_unaligned_be64(lba, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+	/* Do partial report for speeding things up */
+	cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
+
+	cmd->sc_data_direction = DMA_FROM_DEVICE;
+	cmd->sdb.length = nr_bytes;
+	cmd->transfersize = sdkp->device->sector_size;
+	cmd->allowed = 0;
+
+	/*
+	 * Report may return fewer bytes than requested. Make sure
+	 * to report completion on the entire initial request.
+	 */
+	rq->__data_len = nr_bytes;
+
+	return BLKPREP_OK;
+}
+
+static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
+					 unsigned int good_bytes)
+{
+	struct request *rq = scmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct sg_mapping_iter miter;
+	struct blk_zone_report_hdr hdr;
+	struct blk_zone zone;
+	unsigned int offset, bytes = 0;
+	unsigned long flags;
+	u8 *buf;
+
+	if (good_bytes < 64)
+		return;
+
+	memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
+
+	sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
+		       SG_MITER_TO_SG | SG_MITER_ATOMIC);
+
+	local_irq_save(flags);
+	while (sg_miter_next(&miter) && bytes < good_bytes) {
+
+		buf = miter.addr;
+		offset = 0;
+
+		if (bytes == 0) {
+			/* Set the report header */
+			hdr.nr_zones = min_t(unsigned int,
+					 (good_bytes - 64) / 64,
+					 get_unaligned_be32(&buf[0]) / 64);
+			memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
+			offset += 64;
+			bytes += 64;
+		}
+
+		/* Parse zone descriptors */
+		while (offset < miter.length && hdr.nr_zones) {
+			WARN_ON(offset > miter.length);
+			buf = miter.addr + offset;
+			sd_zbc_parse_report(sdkp, buf, &zone);
+			memcpy(buf, &zone, sizeof(struct blk_zone));
+			offset += 64;
+			bytes += 64;
+			hdr.nr_zones--;
+		}
+
+		if (!hdr.nr_zones)
+			break;
+
+	}
+	sg_miter_stop(&miter);
+	local_irq_restore(flags);
+}
+
+static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
+{
+	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
+}
+
+static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
+					  sector_t sector)
+{
+	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
+}
+
+int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+	sector_t block = sectors_to_logical(sdkp->device, sector);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	if (sdkp->device->changed)
+		return BLKPREP_KILL;
+
+	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
+		/* Unaligned request */
+		return BLKPREP_KILL;
+
+	/* Do not allow concurrent reset and writes */
+	if (test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
+			     sdkp->zones_wlock))
+		return BLKPREP_DEFER;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_OUT;
+	cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+	put_unaligned_be64(block, &cmd->cmnd[2]);
+
+	rq->timeout = SD_TIMEOUT;
+	cmd->sc_data_direction = DMA_NONE;
+	cmd->transfersize = 0;
+	cmd->allowed = 0;
+
+	return BLKPREP_OK;
+}
+
+int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+			    sector_t sector, unsigned int nr_sectors)
+{
+	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
+	sector_t zone_ofst = sector & (zone_sectors - 1);
+
+	/*
+	 * Note: alignment of the read/write on logical blocks is handled
+	 * by sd_setup_read_write_cmnd() after this function returns.
+	 */
+
+	/* Do not allow zone boundaries crossing */
+	if (zone_ofst + nr_sectors > zone_sectors)
+		return BLKPREP_KILL;
+
+	/*
+	 * Do not issue more than one write at a time per
+	 * zone. This solves write ordering problems due to
+	 * the unlocking of the request queue in the dispatch
+	 * path in the non scsi-mq case. For scsi-mq, this
+	 * also avoids potential write reordering when multiple
+	 * threads running on different CPUs write to the same
+	 * zone (with a synchronized sequential pattern).
+	 */
+	if (req_op(rq) == REQ_OP_WRITE ||
+	    req_op(rq) == REQ_OP_WRITE_SAME) {
+		if (test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
+				     sdkp->zones_wlock))
+			return BLKPREP_DEFER;
+	}
+
+	return BLKPREP_OK;
+}
+
+void sd_zbc_complete(struct scsi_cmnd *cmd,
+		     unsigned int good_bytes,
+		     struct scsi_sense_hdr *sshdr)
+{
+	int result = cmd->result;
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_SAME:
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x21)
+			/*
+			 * It is unlikely that retrying write requests that
+			 * failed with any kind of alignment error will result
+			 * in success. So don't.
+			 */
+			cmd->allowed = 0;
+
+		/* Fallthru */
+
+	case REQ_OP_ZONE_RESET:
+
+		/* Unlock the zone */
+		clear_bit_unlock(sd_zbc_zone_no(sdkp, blk_rq_pos(rq)),
+				 sdkp->zones_wlock);
+		smp_mb__after_atomic();
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x24)
+			/*
+			 * INVALID FIELD IN CDB error: Reset of a conventional
+			 * zone was attempted. Nothing to worry about,
+			 * so be quiet about the error.
+			 */
+			rq->cmd_flags |= REQ_QUIET;
+
+		break;
+
+	case REQ_OP_ZONE_REPORT:
+
+		if (!result)
+			sd_zbc_report_zones_complete(cmd, good_bytes);
+		break;
+
+	}
+}
+
+/**
+ * Read zoned block device characteristics (VPD page B6).
+ */
+static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
+					     unsigned char *buf)
+{
+
+	if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64)) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "Unconstrained-read check failed\n");
+		return -ENODEV;
+	}
+
+	if (sdkp->device->type != TYPE_ZBC) {
+		/* Host-aware */
+		sdkp->urswrz = 1;
+		sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
+		sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
+		sdkp->zones_max_open = 0;
+	} else {
+		/* Host-managed */
+		sdkp->urswrz = buf[4] & 1;
+		sdkp->zones_optimal_open = 0;
+		sdkp->zones_optimal_nonseq = 0;
+		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
+	}
+
+	return 0;
+}
+
+/**
+ * Check reported capacity.
+ */
+static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
+				 unsigned char *buf)
+{
+	sector_t lba;
+	int ret;
+
+	if (sdkp->rc_basis != 0)
+		return 0;
+
+	/* Do a report zone to get the maximum LBA to check capacity */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
+	if (ret)
+		return ret;
+
+	/* The max_lba field is the capacity of this device */
+	lba = get_unaligned_be64(&buf[8]);
+	if (lba + 1 == sdkp->capacity)
+		return 0;
+
+	if (sdkp->first_scan)
+		sd_printk(KERN_WARNING, sdkp,
+			  "Changing capacity from %llu to max LBA+1 %llu\n",
+			  (unsigned long long)sdkp->capacity,
+			  (unsigned long long)lba + 1);
+	sdkp->capacity = lba + 1;
+
+	return 0;
+}
+
+#define SD_ZBC_BUF_SIZE 131072
+
+static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+{
+	u64 zone_blocks = 0;
+	sector_t block = 0;
+	unsigned char *buf;
+	unsigned char *rec;
+	unsigned int buf_len;
+	unsigned int list_length;
+	int ret;
+	u8 same;
+
+	sdkp->zone_blocks = 0;
+
+	/* Get a buffer */
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Do a report zone to get the same field */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
+	if (ret)
+		goto out;
+
+	same = buf[4] & 0x0f;
+	if (same > 0) {
+		rec = &buf[64];
+		zone_blocks = get_unaligned_be64(&rec[8]);
+		goto out;
+	}
+
+	/*
+	 * Check the size of all zones: all zones must be of
+	 * equal size, except the last zone which can be smaller
+	 * than other zones.
+	 */
+	do {
+
+		/* Parse REPORT ZONES header */
+		list_length = get_unaligned_be32(&buf[0]) + 64;
+		rec = buf + 64;
+		if (list_length < SD_ZBC_BUF_SIZE)
+			buf_len = list_length;
+		else
+			buf_len = SD_ZBC_BUF_SIZE;
+
+		/* Parse zone descriptors */
+		while (rec < buf + buf_len) {
+			zone_blocks = get_unaligned_be64(&rec[8]);
+			if (sdkp->zone_blocks == 0) {
+				sdkp->zone_blocks = zone_blocks;
+			} else if (zone_blocks != sdkp->zone_blocks &&
+				   (block + zone_blocks < sdkp->capacity
+				    || zone_blocks > sdkp->zone_blocks)) {
+				zone_blocks = 0;
+				goto out;
+			}
+			block += zone_blocks;
+			rec += 64;
+		}
+
+		if (block < sdkp->capacity) {
+			ret = sd_zbc_report_zones(sdkp, buf,
+						  SD_ZBC_BUF_SIZE, block);
+			if (ret)
+				break;
+		}
+	} while (block < sdkp->capacity);
+	zone_blocks = sdkp->zone_blocks;
+
+out:
+	kfree(buf);
+	if (ret)
+		return ret;
+
+	if (!zone_blocks) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non constant zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if (!is_power_of_2(zone_blocks)) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non power of 2 zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Zone size too large\n");
+		return -ENODEV;
+	}
+
+	sdkp->zone_blocks = zone_blocks;
+
+	return 0;
+}
+
+static int sd_zbc_setup(struct scsi_disk *sdkp)
+{
+
+	/* chunk_sectors indicates the zone size */
+	blk_queue_chunk_sectors(sdkp->disk->queue,
+			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
+	sdkp->zone_shift = ilog2(sdkp->zone_blocks);
+	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
+	if (sdkp->capacity & (sdkp->zone_blocks - 1))
+		sdkp->nr_zones++;
+
+	if (!sdkp->zones_wlock) {
+		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
+					    sizeof(unsigned long), GFP_KERNEL);
+		if (!sdkp->zones_wlock)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int sd_zbc_read_zones(struct scsi_disk *sdkp,
+		      unsigned char *buf)
+{
+	sector_t capacity;
+	int ret = 0;
+
+	if (!sd_is_zoned(sdkp))
+		/*
+		 * Device managed or normal SCSI disk,
+		 * no special handling required
+		 */
+		return 0;
+
+
+	/* Get zoned block device characteristics */
+	ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
+	if (ret)
+		goto err;
+
+	/*
+	 * Check for unconstrained reads: host-managed devices with
+	 * constrained reads (drives failing read after write pointer)
+	 * are not supported.
+	 */
+	if (!sdkp->urswrz) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+			  "constrained reads devices are not supported\n");
+		ret = -ENODEV;
+		goto err;
+	}
+
+	/* Check capacity */
+	ret = sd_zbc_check_capacity(sdkp, buf);
+	if (ret)
+		goto err;
+	capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+
+	/*
+	 * Check zone size: only devices with a constant zone size (except
+	 * a possible last runt zone) that is a power of 2 are supported.
+	 */
+	ret = sd_zbc_check_zone_size(sdkp);
+	if (ret)
+		goto err;
+
+	/* The drive satisfies the kernel restrictions: set it up */
+	ret = sd_zbc_setup(sdkp);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	sdkp->capacity = 0;
+
+	return ret;
+}
+
+void sd_zbc_remove(struct scsi_disk *sdkp)
+{
+	kfree(sdkp->zones_wlock);
+	sdkp->zones_wlock = NULL;
+}
+
+void sd_zbc_print_zones(struct scsi_disk *sdkp)
+{
+	if (!sd_is_zoned(sdkp) || !sdkp->capacity)
+		return;
+
+	if (sdkp->capacity & (sdkp->zone_blocks - 1))
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u zones of %u logical blocks + 1 runt zone\n",
+			  sdkp->nr_zones - 1,
+			  sdkp->zone_blocks);
+	else
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u zones of %u logical blocks\n",
+			  sdkp->nr_zones,
+			  sdkp->zone_blocks);
+}
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index d1defd1..6ba66e0 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -299,4 +299,21 @@ struct scsi_lun {
 #define SCSI_ACCESS_STATE_MASK        0x0f
 #define SCSI_ACCESS_STATE_PREFERRED   0x80
 
+/* Reporting options for REPORT ZONES */
+enum zbc_zone_reporting_options {
+	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED,
+	ZBC_ZONE_REPORTING_OPTION_FULL,
+	ZBC_ZONE_REPORTING_OPTION_READONLY,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
+	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+};
+
+#define ZBC_REPORT_ZONE_PARTIAL 0x80
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 6/7] sd: Implement support for ZBC devices
@ 2016-09-28  8:45   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement ZBC support functions to set up zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the exception of a possible
   smaller last runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are set up with
a capacity of 0 to prevent their use.
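
As a rough illustration of condition 1, here is a standalone userspace
sketch. It is not the driver code: zone_sizes_ok() and its array-based
interface are hypothetical and exist only for this example. It accepts a
list of zone lengths when every zone matches the first one, except that
the last zone may be smaller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return true if all zones
 * have the same length, allowing a smaller last (runt) zone.
 * zone_len[] holds the zone lengths in logical blocks, in LBA order.
 */
static bool zone_sizes_ok(const uint64_t *zone_len, size_t nr_zones)
{
	size_t i;

	if (nr_zones == 0)
		return false;
	assert(zone_len != NULL);

	for (i = 1; i < nr_zones; i++) {
		if (zone_len[i] == zone_len[0])
			continue;
		/* Only the last zone may differ, and only by being smaller */
		if (i != nr_zones - 1 || zone_len[i] > zone_len[0])
			return false;
	}
	return true;
}
```

With this sketch, { 256, 256, 100 } is accepted, while { 256, 100, 256 }
and { 256, 256, 512 } are rejected.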

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity to include this possible capacity
change by sd_zbc_read_zones. This new function also includes a call
to sd_zbc_print_zones to display the number of zones and zone size
of the device.
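
The zone count reported by sd_zbc_print_zones follows from the capacity
and the power-of-2 zone size. A minimal userspace sketch of that
arithmetic, assuming a hypothetical helper zbc_nr_zones() that is not
part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the zones of a device of `capacity` logical
 * blocks with power-of-2 zones of `zone_blocks` blocks each. A trailing
 * partial (runt) zone counts as one extra zone.
 */
static uint32_t zbc_nr_zones(uint64_t capacity, uint32_t zone_blocks)
{
	unsigned int shift = 0;
	uint32_t nr;

	/* The shift below is only valid for a power-of-2 zone size */
	assert(zone_blocks != 0 && (zone_blocks & (zone_blocks - 1)) == 0);

	/* Compute log2(zone_blocks) */
	while ((1u << shift) < zone_blocks)
		shift++;

	nr = (uint32_t)(capacity >> shift);
	if (capacity & (zone_blocks - 1))
		nr++;	/* last, smaller runt zone */
	return nr;
}
```

For example, 1025 blocks with 256-block zones give 4 full zones plus
1 runt zone, so the helper returns 5.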

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed zone cache support
         * Removed mapping of discard to reset write pointer command
         * Modified sd_zbc_read_zones to include checks that the
           device satisfies the kernel constraints
         * Implemented REPORT ZONES setup and post-processing based
           on code from Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 drivers/scsi/Makefile     |   1 +
 drivers/scsi/sd.c         | 143 ++++++++---
 drivers/scsi/sd.h         |  70 ++++++
 drivers/scsi/sd_zbc.c     | 624 ++++++++++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 822 insertions(+), 33 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index fc0d9b8..350513c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -180,6 +180,7 @@ hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 51e5629..4d63260 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
@@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 	static const char temp[] = "temporary ";
 	int len;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		/* no cache control on RBC devices; theoretically they
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
@@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (sd_is_zoned(sdkp)) {
+		sd_config_discard(sdkp, SD_LBP_DISABLE);
+		return count;
+	}
+
 	if (sdp->type != TYPE_DISK)
 		return -EINVAL;
 
@@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &max);
@@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		if (ret != BLKPREP_OK)
+			return ret;
+	}
+
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+
 	/*
 	 * If we have a 1K hardware sectorsize, prevent access to single
 	 * 512 byte sectors.  In theory we could handle this - in fact
@@ -1149,6 +1169,10 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
+	case REQ_OP_ZONE_REPORT:
+		return sd_zbc_setup_report_cmnd(cmd);
+	case REQ_OP_ZONE_RESET:
+		return sd_zbc_setup_reset_cmnd(cmd);
 	default:
 		BUG();
 	}
@@ -1780,7 +1804,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	unsigned char op = SCpnt->cmnd[0];
 	unsigned char unmap = SCpnt->cmnd[1] & 8;
 
-	if (req_op(req) == REQ_OP_DISCARD || req_op(req) == REQ_OP_WRITE_SAME) {
+	switch (req_op(req)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_ZONE_RESET:
 		if (!result) {
 			good_bytes = blk_rq_bytes(req);
 			scsi_set_resid(SCpnt, 0);
@@ -1788,6 +1815,17 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			good_bytes = 0;
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
+		break;
+	case REQ_OP_ZONE_REPORT:
+		if (!result) {
+			good_bytes = scsi_bufflen(SCpnt)
+				- scsi_get_resid(SCpnt);
+			scsi_set_resid(SCpnt, 0);
+		} else {
+			good_bytes = 0;
+			scsi_set_resid(SCpnt, blk_rq_bytes(req));
+		}
+		break;
 	}
 
 	if (result) {
@@ -1848,7 +1886,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	default:
 		break;
 	}
+
  out:
+	if (sd_is_zoned(sdkp))
+		sd_zbc_complete(SCpnt, good_bytes, &sshdr);
+
 	SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
 					   "sd_done: completed %d of %d bytes\n",
 					   good_bytes, scsi_bufflen(SCpnt)));
@@ -1983,7 +2025,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
-
 /*
  * Determine whether disk supports Data Integrity Field.
  */
@@ -2133,6 +2174,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	/* Logical blocks per physical block exponent */
 	sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
 
+	/* RC basis */
+	sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
+
 	/* Lowest aligned logical block */
 	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
 	blk_queue_alignment_offset(sdp->request_queue, alignment);
@@ -2242,7 +2286,6 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 {
 	int sector_size;
 	struct scsi_device *sdp = sdkp->device;
-	sector_t old_capacity = sdkp->capacity;
 
 	if (sd_try_rc16_first(sdp)) {
 		sector_size = read_capacity_16(sdkp, sdp, buffer);
@@ -2323,35 +2366,44 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 		sector_size = 512;
 	}
 	blk_queue_logical_block_size(sdp->request_queue, sector_size);
+	blk_queue_physical_block_size(sdp->request_queue,
+				      sdkp->physical_block_size);
+	sdkp->device->sector_size = sector_size;
 
-	{
-		char cap_str_2[10], cap_str_10[10];
+	if (sdkp->capacity > 0xffffffff)
+		sdp->use_16_for_rw = 1;
 
-		string_get_size(sdkp->capacity, sector_size,
-				STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
-		string_get_size(sdkp->capacity, sector_size,
-				STRING_UNITS_10, cap_str_10,
-				sizeof(cap_str_10));
+}
 
-		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
-			sd_printk(KERN_NOTICE, sdkp,
-				  "%llu %d-byte logical blocks: (%s/%s)\n",
-				  (unsigned long long)sdkp->capacity,
-				  sector_size, cap_str_10, cap_str_2);
+/*
+ * Print disk capacity
+ */
+static void
+sd_print_capacity(struct scsi_disk *sdkp,
+		  sector_t old_capacity)
+{
+	int sector_size = sdkp->device->sector_size;
+	char cap_str_2[10], cap_str_10[10];
 
-			if (sdkp->physical_block_size != sector_size)
-				sd_printk(KERN_NOTICE, sdkp,
-					  "%u-byte physical blocks\n",
-					  sdkp->physical_block_size);
-		}
-	}
+	string_get_size(sdkp->capacity, sector_size,
+			STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
+	string_get_size(sdkp->capacity, sector_size,
+			STRING_UNITS_10, cap_str_10,
+			sizeof(cap_str_10));
 
-	if (sdkp->capacity > 0xffffffff)
-		sdp->use_16_for_rw = 1;
+	if (sdkp->first_scan || old_capacity != sdkp->capacity) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%llu %d-byte logical blocks: (%s/%s)\n",
+			  (unsigned long long)sdkp->capacity,
+			  sector_size, cap_str_10, cap_str_2);
 
-	blk_queue_physical_block_size(sdp->request_queue,
-				      sdkp->physical_block_size);
-	sdkp->device->sector_size = sector_size;
+		if (sdkp->physical_block_size != sector_size)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "%u-byte physical blocks\n",
+				  sdkp->physical_block_size);
+
+		sd_zbc_print_zones(sdkp);
+	}
 }
 
 /* called with buffer of length 512 */
@@ -2613,7 +2665,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return;
 
 	if (sdkp->protection_type == 0)
@@ -2720,6 +2772,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
  */
 static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
 	unsigned char *buffer;
 	u16 rot;
 	const int vpd_len = 64;
@@ -2734,10 +2787,21 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	rot = get_unaligned_be16(&buffer[4]);
 
 	if (rot == 1) {
-		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
-		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
+	if (sdkp->zoned == 1)
+		q->limits.zoned = BLK_ZONED_HA;
+	else if (sdkp->device->type == TYPE_ZBC)
+		q->limits.zoned = BLK_ZONED_HM;
+	else
+		q->limits.zoned = BLK_ZONED_NONE;
+	if (blk_queue_is_zoned(q) && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
+		      q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
+
  out:
 	kfree(buffer);
 }
@@ -2809,6 +2873,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	struct scsi_disk *sdkp = scsi_disk(disk);
 	struct scsi_device *sdp = sdkp->device;
 	struct request_queue *q = sdkp->disk->queue;
+	sector_t old_capacity = sdkp->capacity;
 	unsigned char *buffer;
 	unsigned int dev_max, rw_max;
 
@@ -2842,8 +2907,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
+			sd_zbc_read_zones(sdkp, buffer);
 		}
 
+		sd_print_capacity(sdkp, old_capacity);
+
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
@@ -3041,9 +3109,16 @@ static int sd_probe(struct device *dev)
 
 	scsi_autopm_get_device(sdp);
 	error = -ENODEV;
-	if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
+	if (sdp->type != TYPE_DISK &&
+	    sdp->type != TYPE_ZBC &&
+	    sdp->type != TYPE_MOD &&
+	    sdp->type != TYPE_RBC)
 		goto out;
 
+#ifndef CONFIG_BLK_DEV_ZONED
+	if (sdp->type == TYPE_ZBC)
+		goto out;
+#endif
 	SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
 					"sd_probe\n"));
 
@@ -3147,6 +3222,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	sd_zbc_remove(sdkp);
+
 	blk_register_region(devt, SD_MINORS, NULL,
 			    sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index c8d9863..6bd4226 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -64,6 +64,15 @@ struct scsi_disk {
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
+#ifdef CONFIG_BLK_DEV_ZONED
+	unsigned int	nr_zones;
+	unsigned int	zone_blocks;
+	unsigned int	zone_shift;
+	unsigned long	*zones_wlock;
+	unsigned int	zones_optimal_open;
+	unsigned int	zones_optimal_nonseq;
+	unsigned int	zones_max_open;
+#endif
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
@@ -94,6 +103,9 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	rc_basis: 2;
+	unsigned	zoned: 2;
+	unsigned	urswrz : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
@@ -156,6 +168,11 @@ static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t b
 	return blocks * sdev->sector_size;
 }
 
+static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
+{
+	return sector >> (ilog2(sdev->sector_size) - 9);
+}
+
 /*
  * Look up the DIX operation based on whether the command is read or
  * write and whether dix and dif are enabled.
@@ -239,4 +256,57 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+static inline int sd_is_zoned(struct scsi_disk *sdkp)
+{
+	return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
+}
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
+extern void sd_zbc_remove(struct scsi_disk *sdkp);
+extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
+extern int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+				   sector_t sector, unsigned int nr_sectors);
+extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
+extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
+extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
+			    struct scsi_sense_hdr *sshdr);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
+				    unsigned char *buf)
+{
+	return 0;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+
+static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
+
+static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
+					  struct request *rq, sector_t sector,
+					  unsigned int num_sectors)
+{
+	/* Let the drive fail requests */
+	return BLKPREP_OK;
+}
+
+static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
+				   unsigned int good_bytes,
+				   struct scsi_sense_hdr *sshdr) {}
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
new file mode 100644
index 0000000..a4da0ed
--- /dev/null
+++ b/drivers/scsi/sd_zbc.c
@@ -0,0 +1,624 @@
+/*
+ * SCSI Zoned Block commands
+ *
+ * Copyright (C) 2014-2015 SUSE Linux GmbH
+ * Written by: Hannes Reinecke <hare@suse.de>
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Modified by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
+
+#include "sd.h"
+#include "scsi_priv.h"
+
+enum zbc_zone_type {
+	ZBC_ZONE_TYPE_CONV = 0x1,
+	ZBC_ZONE_TYPE_SEQWRITE_REQ,
+	ZBC_ZONE_TYPE_SEQWRITE_PREF,
+	ZBC_ZONE_TYPE_RESERVED,
+};
+
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP,
+	ZBC_ZONE_COND_EMPTY,
+	ZBC_ZONE_COND_IMP_OPEN,
+	ZBC_ZONE_COND_EXP_OPEN,
+	ZBC_ZONE_COND_CLOSED,
+	ZBC_ZONE_COND_READONLY = 0xd,
+	ZBC_ZONE_COND_FULL,
+	ZBC_ZONE_COND_OFFLINE,
+};
+
+/**
+ * Convert a zone descriptor to a zone struct.
+ */
+static void sd_zbc_parse_report(struct scsi_disk *sdkp,
+				u8 *buf,
+				struct blk_zone *zone)
+{
+	struct scsi_device *sdp = sdkp->device;
+
+	memset(zone, 0, sizeof(struct blk_zone));
+
+	zone->type = buf[0] & 0x0f;
+	zone->cond = (buf[1] >> 4) & 0xf;
+	if (buf[1] & 0x01)
+		zone->reset = 1;
+	if (buf[1] & 0x02)
+		zone->non_seq = 1;
+
+	zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
+	zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
+	zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
+	if (zone->type != ZBC_ZONE_TYPE_CONV &&
+	    zone->cond == ZBC_ZONE_COND_FULL)
+		zone->wp = zone->start + zone->len;
+}
+
+/**
+ * Issue a REPORT ZONES scsi command.
+ */
+static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			       unsigned int buflen, sector_t lba)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout;
+	struct scsi_sense_hdr sshdr;
+	unsigned char cmd[16];
+	unsigned int rep_len;
+	int result;
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(lba, &cmd[2]);
+	put_unaligned_be32(buflen, &cmd[10]);
+	memset(buf, 0, buflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				  buf, buflen, &sshdr,
+				  timeout, SD_MAX_RETRIES, NULL);
+	if (result) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES lba %llu failed with %d/%d\n",
+			  (unsigned long long)lba,
+			  host_byte(result), driver_byte(result));
+		return -EIO;
+	}
+
+	rep_len = get_unaligned_be32(&buf[0]);
+	if (rep_len < 64) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES report invalid length %u\n",
+			  rep_len);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t lba, sector = blk_rq_pos(rq);
+	unsigned int nr_bytes = blk_rq_bytes(rq);
+	int ret;
+
+	WARN_ON(nr_bytes == 0);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	ret = scsi_init_io(cmd);
+	if (ret != BLKPREP_OK)
+		return ret;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_IN;
+	cmd->cmnd[1] = ZI_REPORT_ZONES;
+	lba = sectors_to_logical(sdkp->device, sector);
+	put_unaligned_be64(lba, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+	/* Do partial report for speeding things up */
+	cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
+
+	cmd->sc_data_direction = DMA_FROM_DEVICE;
+	cmd->sdb.length = nr_bytes;
+	cmd->transfersize = sdkp->device->sector_size;
+	cmd->allowed = 0;
+
+	/*
+	 * Report may return fewer bytes than requested. Make sure
+	 * to report completion on the entire initial request.
+	 */
+	rq->__data_len = nr_bytes;
+
+	return BLKPREP_OK;
+}
+
+static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
+					 unsigned int good_bytes)
+{
+	struct request *rq = scmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct sg_mapping_iter miter;
+	struct blk_zone_report_hdr hdr;
+	struct blk_zone zone;
+	unsigned int offset, bytes = 0;
+	unsigned long flags;
+	u8 *buf;
+
+	if (good_bytes < 64)
+		return;
+
+	memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
+
+	sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
+		       SG_MITER_TO_SG | SG_MITER_ATOMIC);
+
+	local_irq_save(flags);
+	while (sg_miter_next(&miter) && bytes < good_bytes) {
+
+		buf = miter.addr;
+		offset = 0;
+
+		if (bytes == 0) {
+			/* Set the report header */
+			hdr.nr_zones = min_t(unsigned int,
+					 (good_bytes - 64) / 64,
+					 get_unaligned_be32(&buf[0]) / 64);
+			memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
+			offset += 64;
+			bytes += 64;
+		}
+
+		/* Parse zone descriptors */
+		while (offset < miter.length && hdr.nr_zones) {
+			WARN_ON(offset > miter.length);
+			buf = miter.addr + offset;
+			sd_zbc_parse_report(sdkp, buf, &zone);
+			memcpy(buf, &zone, sizeof(struct blk_zone));
+			offset += 64;
+			bytes += 64;
+			hdr.nr_zones--;
+		}
+
+		if (!hdr.nr_zones)
+			break;
+
+	}
+	sg_miter_stop(&miter);
+	local_irq_restore(flags);
+}
+
+static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
+{
+	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
+}
+
+static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
+					  sector_t sector)
+{
+	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
+}
+
+int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+	sector_t block = sectors_to_logical(sdkp->device, sector);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	if (sdkp->device->changed)
+		return BLKPREP_KILL;
+
+	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
+		/* Unaligned request */
+		return BLKPREP_KILL;
+
+	/* Do not allow concurrent reset and writes */
+	if (test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
+			     sdkp->zones_wlock))
+		return BLKPREP_DEFER;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_OUT;
+	cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+	put_unaligned_be64(block, &cmd->cmnd[2]);
+
+	rq->timeout = SD_TIMEOUT;
+	cmd->sc_data_direction = DMA_NONE;
+	cmd->transfersize = 0;
+	cmd->allowed = 0;
+
+	return BLKPREP_OK;
+}
+
+int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+			    sector_t sector, unsigned int nr_sectors)
+{
+	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
+	sector_t zone_ofst = sector & (zone_sectors - 1);
+
+	/*
+	 * Note: alignment of the read/write on logical blocks
+	 * is done after this function returns, in sd_setup_read_write_cmnd.
+	 */
+
+	/* Do not allow zone boundaries crossing */
+	if (zone_ofst + nr_sectors > zone_sectors)
+		return BLKPREP_KILL;
+
+	/*
+	 * Do not issue more than one write at a time per
+	 * zone. This solves write ordering problems due to
+	 * the unlocking of the request queue in the dispatch
+	 * path in the non scsi-mq case. For scsi-mq, this
+	 * also avoids potential write reordering when multiple
+	 * threads running on different CPUs write to the same
+	 * zone (with a synchronized sequential pattern).
+	 */
+	if (req_op(rq) == REQ_OP_WRITE ||
+	    req_op(rq) == REQ_OP_WRITE_SAME) {
+		if (test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
+				     sdkp->zones_wlock))
+			return BLKPREP_DEFER;
+	}
+
+	return BLKPREP_OK;
+}
+
+void sd_zbc_complete(struct scsi_cmnd *cmd,
+		     unsigned int good_bytes,
+		     struct scsi_sense_hdr *sshdr)
+{
+	int result = cmd->result;
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_SAME:
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x21)
+			/*
+			 * It is unlikely that retrying write requests that
+			 * failed with any kind of alignment error will result
+			 * in success. So don't.
+			 */
+			cmd->allowed = 0;
+
+		/* Fallthru */
+
+	case REQ_OP_ZONE_RESET:
+
+		/* Unlock the zone */
+		clear_bit_unlock(sd_zbc_zone_no(sdkp, blk_rq_pos(rq)),
+				 sdkp->zones_wlock);
+		smp_mb__after_atomic();
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x24)
+			/*
+			 * INVALID FIELD IN CDB error: Reset of a conventional
+			 * zone was attempted. Nothing to worry about,
+			 * so be quiet about the error.
+			 */
+			rq->cmd_flags |= REQ_QUIET;
+
+		break;
+
+	case REQ_OP_ZONE_REPORT:
+
+		if (!result)
+			sd_zbc_report_zones_complete(cmd, good_bytes);
+		break;
+
+	}
+}
+
+/**
+ * Read zoned block device characteristics (VPD page B6).
+ */
+static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
+					     unsigned char *buf)
+{
+
+	if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64)) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "Unconstrained-read check failed\n");
+		return -ENODEV;
+	}
+
+	if (sdkp->device->type != TYPE_ZBC) {
+		/* Host-aware */
+		sdkp->urswrz = 1;
+		sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
+		sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
+		sdkp->zones_max_open = 0;
+	} else {
+		/* Host-managed */
+		sdkp->urswrz = buf[4] & 1;
+		sdkp->zones_optimal_open = 0;
+		sdkp->zones_optimal_nonseq = 0;
+		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
+	}
+
+	return 0;
+}
+
+/**
+ * Check reported capacity.
+ */
+static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
+				 unsigned char *buf)
+{
+	sector_t lba;
+	int ret;
+
+	if (sdkp->rc_basis != 0)
+		return 0;
+
+	/* Do a report zone to get the maximum LBA to check capacity */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
+	if (ret)
+		return ret;
+
+	/* The max_lba field is the capacity of this device */
+	lba = get_unaligned_be64(&buf[8]);
+	if (lba + 1 == sdkp->capacity)
+		return 0;
+
+	if (sdkp->first_scan)
+		sd_printk(KERN_WARNING, sdkp,
+			  "Changing capacity from %llu to max LBA+1 %llu\n",
+			  (unsigned long long)sdkp->capacity,
+			  (unsigned long long)lba + 1);
+	sdkp->capacity = lba + 1;
+
+	return 0;
+}
+
+#define SD_ZBC_BUF_SIZE 131072
+
+static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+{
+	u64 zone_blocks = 0;
+	sector_t block = 0;
+	unsigned char *buf;
+	unsigned char *rec;
+	unsigned int buf_len;
+	unsigned int list_length;
+	int ret;
+	u8 same;
+
+	sdkp->zone_blocks = 0;
+
+	/* Get a buffer */
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Do a report zone to get the same field */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
+	if (ret)
+		goto out;
+
+	same = buf[4] & 0x0f;
+	if (same > 0) {
+		rec = &buf[64];
+		zone_blocks = get_unaligned_be64(&rec[8]);
+		goto out;
+	}
+
+	/*
+	 * Check the size of all zones: all zones must be of
+	 * equal size, except the last zone which can be smaller
+	 * than other zones.
+	 */
+	do {
+
+		/* Parse REPORT ZONES header */
+		list_length = get_unaligned_be32(&buf[0]) + 64;
+		rec = buf + 64;
+		if (list_length < SD_ZBC_BUF_SIZE)
+			buf_len = list_length;
+		else
+			buf_len = SD_ZBC_BUF_SIZE;
+
+		/* Parse zone descriptors */
+		while (rec < buf + buf_len) {
+			zone_blocks = get_unaligned_be64(&rec[8]);
+			if (sdkp->zone_blocks == 0) {
+				sdkp->zone_blocks = zone_blocks;
+			} else if (zone_blocks != sdkp->zone_blocks &&
+				   (block + zone_blocks < sdkp->capacity
+				    || zone_blocks > sdkp->zone_blocks)) {
+				zone_blocks = 0;
+				goto out;
+			}
+			block += zone_blocks;
+			rec += 64;
+		}
+
+		if (block < sdkp->capacity) {
+			ret = sd_zbc_report_zones(sdkp, buf,
+						  SD_ZBC_BUF_SIZE, block);
+			if (ret)
+				return ret;
+		}
+
+	} while (block < sdkp->capacity);
+
+	zone_blocks = sdkp->zone_blocks;
+
+out:
+	kfree(buf);
+
+	if (!zone_blocks) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non constant zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if (!is_power_of_2(zone_blocks)) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non power of 2 zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Zone size too large\n");
+		return -ENODEV;
+	}
+
+	sdkp->zone_blocks = zone_blocks;
+
+	return 0;
+}
+
+static int sd_zbc_setup(struct scsi_disk *sdkp)
+{
+
+	/* chunk_sectors indicates the zone size */
+	blk_queue_chunk_sectors(sdkp->disk->queue,
+			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
+	sdkp->zone_shift = ilog2(sdkp->zone_blocks);
+	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
+	if (sdkp->capacity & (sdkp->zone_blocks - 1))
+		sdkp->nr_zones++;
+
+	if (!sdkp->zones_wlock) {
+		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
+					    sizeof(unsigned long), GFP_KERNEL);
+		if (!sdkp->zones_wlock)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int sd_zbc_read_zones(struct scsi_disk *sdkp,
+		      unsigned char *buf)
+{
+	sector_t capacity;
+	int ret = 0;
+
+	if (!sd_is_zoned(sdkp))
+		/*
+		 * Device managed or normal SCSI disk,
+		 * no special handling required
+		 */
+		return 0;
+
+
+	/* Get zoned block device characteristics */
+	ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
+	if (ret)
+		goto err;
+
+	/*
+	 * Check for unconstrained reads: host-managed devices with
+	 * constrained reads (drives failing read after write pointer)
+	 * are not supported.
+	 */
+	if (!sdkp->urswrz) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+			  "constrained reads devices are not supported\n");
+		ret = -ENODEV;
+		goto err;
+	}
+
+	/* Check capacity */
+	ret = sd_zbc_check_capacity(sdkp, buf);
+	if (ret)
+		goto err;
+	capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+
+	/*
+	 * Check zone size: only devices with a constant zone size (except
+	 * a possible last runt zone) that is a power of 2 are supported.
+	 */
+	ret = sd_zbc_check_zone_size(sdkp);
+	if (ret)
+		goto err;
+
+	/* The drive satisfies the kernel restrictions: set it up */
+	ret = sd_zbc_setup(sdkp);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	sdkp->capacity = 0;
+
+	return ret;
+}
+
+void sd_zbc_remove(struct scsi_disk *sdkp)
+{
+	kfree(sdkp->zones_wlock);
+	sdkp->zones_wlock = NULL;
+}
+
+void sd_zbc_print_zones(struct scsi_disk *sdkp)
+{
+	if (!sd_is_zoned(sdkp) || !sdkp->capacity)
+		return;
+
+	if (sdkp->capacity & (sdkp->zone_blocks - 1))
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u zones of %u logical blocks + 1 runt zone\n",
+			  sdkp->nr_zones - 1,
+			  sdkp->zone_blocks);
+	else
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u zones of %u logical blocks\n",
+			  sdkp->nr_zones,
+			  sdkp->zone_blocks);
+}
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index d1defd1..6ba66e0 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -299,4 +299,21 @@ struct scsi_lun {
 #define SCSI_ACCESS_STATE_MASK        0x0f
 #define SCSI_ACCESS_STATE_PREFERRED   0x80
 
+/* Reporting options for REPORT ZONES */
+enum zbc_zone_reporting_options {
+	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED,
+	ZBC_ZONE_REPORTING_OPTION_FULL,
+	ZBC_ZONE_REPORTING_OPTION_READONLY,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
+	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+};
+
+#define ZBC_REPORT_ZONE_PARTIAL 0x80
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 7/7] blk-zoned: implement ioctls
  2016-09-28  8:45 ` Damien Le Moal
@ 2016-09-28  8:45   ` Damien Le Moal
  -1 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2016-09-28  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Shaun Tancheff, Damien Le Moal, Shaun Tancheff

From: Shaun Tancheff <shaun@tancheff.com>

Add the new BLKREPORTZONE and BLKRESETZONE ioctls for, respectively,
obtaining the zone configuration of a zoned block device and resetting
the write pointer of its sequential zones.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
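For anyone wanting to exercise the new ioctls from userspace, here is a
minimal sketch of how the request buffers could be laid out. It is not
part of the patch. The names zone_desc, zone_report_hdr, zone_range,
zone_report_bytes, zone_report_alloc and zone_range_make are hypothetical
local stand-ins that mirror struct blk_zone_report and struct
blk_zone_range from this patch, and the 64-byte size used for a zone
descriptor is an assumption to be checked against the real header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct blk_zone; 64 bytes is an assumption. */
struct zone_desc {
	uint8_t raw[64];
};

/* Hypothetical mirror of struct blk_zone_report from this patch. */
struct zone_report_hdr {
	uint64_t sector;		/* starting sector of the report */
	uint32_t nr_zones;		/* IN: maximum, OUT: actual */
	uint8_t reserved[4];
	struct zone_desc zones[];	/* descriptors follow the header */
};

/* Hypothetical mirror of struct blk_zone_range from this patch. */
struct zone_range {
	uint64_t sector;
	uint64_t nr_sectors;
};

/* Bytes needed for a report holding up to nr_zones descriptors. */
size_t zone_report_bytes(uint32_t nr_zones)
{
	return sizeof(struct zone_report_hdr) +
	       (size_t)nr_zones * sizeof(struct zone_desc);
}

/*
 * Allocate a zeroed report request. Returns NULL for nr_zones == 0,
 * since the handler in this patch rejects that case with -EINVAL.
 */
struct zone_report_hdr *zone_report_alloc(uint64_t sector, uint32_t nr_zones)
{
	struct zone_report_hdr *rep;

	if (!nr_zones)
		return NULL;

	rep = calloc(1, zone_report_bytes(nr_zones));
	if (!rep)
		return NULL;

	rep->sector = sector;
	rep->nr_zones = nr_zones;
	return rep;
}

/* Build a zone-aligned range covering count zones from zone index first. */
struct zone_range zone_range_make(uint64_t zone_sectors, uint64_t first,
				  uint64_t count)
{
	struct zone_range r;

	assert(zone_sectors != 0);
	r.sector = first * zone_sectors;
	r.nr_sectors = count * zone_sectors;
	return r;
}
```

On a kernel carrying this series, the buffer from zone_report_alloc()
would be passed as the BLKREPORTZONE argument, and on return nr_zones
holds the number of descriptors actually filled in. Both ioctls require
CAP_SYS_ADMIN, and BLKRESETZONE additionally requires the device to be
opened for writing.
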
 block/blk-zoned.c             | 93 +++++++++++++++++++++++++++++++++++++++++++
 block/ioctl.c                 |  4 ++
 include/linux/blkdev.h        | 21 ++++++++++
 include/uapi/linux/blkzoned.h | 40 +++++++++++++++++++
 include/uapi/linux/fs.h       |  4 ++
 5 files changed, 162 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1603573..667f95d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -255,3 +255,96 @@ int blkdev_reset_zones(struct block_device *bdev,
 	return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+			      unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct request_queue *q;
+	struct blk_zone_report rep;
+	struct blk_zone *zones;
+	int ret;
+
+	if (!argp)
+		return -EINVAL;
+
+	q = bdev_get_queue(bdev);
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -ENOTTY;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+		return -EFAULT;
+
+	if (!rep.nr_zones)
+		return -EINVAL;
+
+	zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zones)
+		return -ENOMEM;
+
+	ret = blkdev_report_zones(bdev, rep.sector,
+				  zones, &rep.nr_zones,
+				  GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (rep.nr_zones) {
+		if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+				 sizeof(struct blk_zone) * rep.nr_zones))
+			ret = -EFAULT;
+	}
+
+ out:
+	kfree(zones);
+
+	return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+			     unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct request_queue *q;
+	struct blk_zone_range zrange;
+
+	if (!argp)
+		return -EINVAL;
+
+	q = bdev_get_queue(bdev);
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -ENOTTY;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+		return -EFAULT;
+
+	return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+				  GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 				BLKDEV_DISCARD_SECURE);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
+	case BLKREPORTZONE:
+		return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+	case BLKRESETZONE:
+		return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
 	case HDIO_GETGEO:
 		return blkdev_getgeo(bdev, argp);
 	case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 252043f..90097dd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,27 @@ extern int blkdev_report_zones(struct block_device *bdev,
 extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
 			      sector_t nr_sectors, gfp_t gfp_mask);
 
+extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+				     unsigned int cmd, unsigned long arg);
+extern int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+				    unsigned int cmd, unsigned long arg);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
+					    fmode_t mode, unsigned int cmd,
+					    unsigned long arg)
+{
+	return -ENOTTY;
+}
+
+static inline int blkdev_reset_zones_ioctl(struct block_device *bdev,
+					   fmode_t mode, unsigned int cmd,
+					   unsigned long arg)
+{
+	return -ENOTTY;
+}
+
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 struct request_queue {
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
index a381721..40d1d7b 100644
--- a/include/uapi/linux/blkzoned.h
+++ b/include/uapi/linux/blkzoned.h
@@ -16,6 +16,7 @@
 #define _UAPI_BLKZONED_H
 
 #include <linux/types.h>
+#include <linux/ioctl.h>
 
 /**
  * enum blk_zone_type - Types of zones allowed in a zoned device.
@@ -100,4 +101,43 @@ struct blk_zone {
 	__u8	reserved[36];
 };
 
+/**
+ * struct blk_zone_report - BLKREPORTZONE ioctl request/reply
+ *
+ * @sector: starting sector of report
+ * @nr_zones: IN maximum / OUT actual
+ * @reserved: padding to 16 byte alignment
+ * @zones: Space to hold @nr_zones @zones entries on reply.
+ *
+ * The array of at most @nr_zones must follow this structure in memory.
+ */
+struct blk_zone_report {
+	__u64		sector;
+	__u32		nr_zones;
+	__u8		reserved[4];
+	struct blk_zone zones[0];
+} __packed;
+
+/**
+ * struct blk_zone_range - BLKRESETZONE ioctl request
+ * @sector: starting sector of the first zone to issue reset write pointer
+ * @nr_sectors: Total number of sectors of 1 or more zones to reset
+ */
+struct blk_zone_range {
+	__u64		sector;
+	__u64		nr_sectors;
+};
+
+/**
+ * Zoned block device ioctls:
+ *
+ * @BLKREPORTZONE: Get zone information. Takes a zone report as argument.
+ *                 The zone report will start from the zone containing the
+ *                 sector specified in the report request structure.
+ * @BLKRESETZONE: Reset the write pointer of the zones in the specified
+ *                sector range. The sector range must be zone aligned.
+ */
+#define BLKREPORTZONE	_IOWR(0x12, 130, struct blk_zone_report)
+#define BLKRESETZONE	_IOW(0x12, 131, struct blk_zone_range)
+
 #endif /* _UAPI_BLKZONED_H */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3b00f7c..e0fc7f0 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -222,6 +222,10 @@ struct fsxattr {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+/*
+ * A jump here: 130-131 are reserved for zoned block devices
+ * (see uapi/linux/blkzoned.h)
+ */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 1/7] block: Add 'zoned' queue limit
@ 2016-09-29  1:32     ` Shaun Tancheff
  0 siblings, 0 replies; 34+ messages in thread
From: Shaun Tancheff @ 2016-09-29  1:32 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> Add the zoned queue limit to indicate the zoning model of a block device.
> Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
> 1 (BLK_ZONED_HA) for host-aware zone block devices and 2 (BLK_ZONED_HM)
> for host-managed zone block devices. The standards defined drive managed
> model is not defined here since these block devices do not provide any
> command for accessing zone information. Drive managed model devices will
> be reported as BLK_ZONED_NONE.
>
> The helper functions blk_queue_zoned_model and bdev_zoned_model return
> the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
> return a boolean for callers to test if a block device is zoned.
>
> The zoned attribute is also exported as a string to applications via
> sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
> BLK_ZONED_HM as "host-managed".
>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  Documentation/ABI/testing/sysfs-block | 16 ++++++++++++
>  block/blk-settings.c                  |  1 +
>  block/blk-sysfs.c                     | 18 ++++++++++++++
>  include/linux/blkdev.h                | 47 +++++++++++++++++++++++++++++++++++
>  4 files changed, 82 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 71d184d..75a5055 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -235,3 +235,19 @@ Description:
>                 write_same_max_bytes is 0, write same is not supported
>                 by the device.
>
> +What:          /sys/block/<disk>/queue/zoned
> +Date:          September 2016
> +Contact:       Damien Le Moal <damien.lemoal@hgst.com>
> +Description:
> +               zoned indicates if the device is a zoned block device
> +               and the zone model of the device if it is indeed zoned.
> +               The possible values indicated by zoned are "none" for
> +               regular block devices and "host-aware" or "host-managed"
> +               for zoned block devices. The characteristics of
> +               host-aware and host-managed zoned block devices are
> +               described in the ZBC (Zoned Block Commands) and ZAC
> +               (Zoned Device ATA Command Set) standards. These standards
> +               also define the "drive-managed" zone model. However,
> +               since drive-managed zoned block devices do not support
> +               zone commands, they will be treated as regular block
> +               devices and zoned will report "none".
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f679ae1..b1d5b7f 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>         lim->io_opt = 0;
>         lim->misaligned = 0;
>         lim->cluster = 1;
> +       lim->zoned = BLK_ZONED_NONE;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9cc8d7c..ff9cd9c 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
>  QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
>  #undef QUEUE_SYSFS_BIT_FNS
>
> +static ssize_t queue_zoned_show(struct request_queue *q, char *page)
> +{
> +       switch (blk_queue_zoned_model(q)) {
> +       case BLK_ZONED_HA:
> +               return sprintf(page, "host-aware\n");
> +       case BLK_ZONED_HM:
> +               return sprintf(page, "host-managed\n");
> +       default:
> +               return sprintf(page, "none\n");
> +       }
> +}
> +
>  static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>  {
>         return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
>         .store = queue_store_nonrot,
>  };
>
> +static struct queue_sysfs_entry queue_zoned_entry = {
> +       .attr = {.name = "zoned", .mode = S_IRUGO },
> +       .show = queue_zoned_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nomerges_entry = {
>         .attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
>         .show = queue_nomerges_show,
> @@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
>         &queue_discard_zeroes_data_entry.attr,
>         &queue_write_same_max_entry.attr,
>         &queue_nonrot_entry.attr,
> +       &queue_zoned_entry.attr,
>         &queue_nomerges_entry.attr,
>         &queue_rq_affinity_entry.attr,
>         &queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..f19e16b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -261,6 +261,15 @@ struct blk_queue_tag {
>  #define BLK_SCSI_MAX_CMDS      (256)
>  #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
>
> +/*
> + * Zoned block device models (zoned limit).
> + */
> +enum blk_zoned_model {
> +       BLK_ZONED_NONE, /* Regular block device */
> +       BLK_ZONED_HA,   /* Host-aware zoned block device */
> +       BLK_ZONED_HM,   /* Host-managed zoned block device */
> +};
> +
>  struct queue_limits {
>         unsigned long           bounce_pfn;
>         unsigned long           seg_boundary_mask;
> @@ -290,6 +299,7 @@ struct queue_limits {
>         unsigned char           cluster;
>         unsigned char           discard_zeroes_data;
>         unsigned char           raid_partial_stripes_expensive;
> +       enum blk_zoned_model    zoned;
>  };
>
>  struct request_queue {
> @@ -627,6 +637,23 @@ static inline unsigned int blk_queue_cluster(struct request_queue *q)
>         return q->limits.cluster;
>  }
>
> +static inline enum blk_zoned_model
> +blk_queue_zoned_model(struct request_queue *q)
> +{
> +       return q->limits.zoned;
> +}
> +
> +static inline bool blk_queue_is_zoned(struct request_queue *q)
> +{
> +       switch (blk_queue_zoned_model(q)) {
> +       case BLK_ZONED_HA:
> +       case BLK_ZONED_HM:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}
> +
>  /*
>   * We regard a request as sync, if either a read or a sync write
>   */
> @@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
>         return 0;
>  }
>
> +static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +
> +       if (q)
> +               return blk_queue_zoned_model(q);
> +
> +       return BLK_ZONED_NONE;
> +}
> +
> +static inline bool bdev_is_zoned(struct block_device *bdev)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +
> +       if (q)
> +               return blk_queue_is_zoned(q);
> +
> +       return false;
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
>         return q ? q->dma_alignment : 511;
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
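
For readers following the thread, here is a minimal userspace sketch of how an application might interpret the strings exported through queue/zoned. The enum only mirrors the values described in the patch; the name parse_zoned_model and the ZONED_* constants are hypothetical and are not part of the kernel or of this series.

```c
#include <string.h>

/* Userspace copy of the three models described in the patch (illustration only). */
enum zoned_model { ZONED_NONE, ZONED_HA, ZONED_HM };

/*
 * Hypothetical helper: map the text read from
 * /sys/block/<disk>/queue/zoned to a model. The trailing newline that
 * sysfs emits is tolerated.
 */
static enum zoned_model parse_zoned_model(const char *s)
{
	size_t n = strcspn(s, "\n");

	if (n == strlen("host-aware") && !strncmp(s, "host-aware", n))
		return ZONED_HA;
	if (n == strlen("host-managed") && !strncmp(s, "host-managed", n))
		return ZONED_HM;
	/* "none", and anything unrecognized, is treated as a regular device. */
	return ZONED_NONE;
}
```

Treating unknown strings as "none" follows the default branch of queue_zoned_show in the patch; a stricter caller could report an error instead.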

> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-09-28  8:45   ` Damien Le Moal
  (?)
@ 2016-09-29  1:32   ` Shaun Tancheff
  -1 siblings, 0 replies; 34+ messages in thread
From: Shaun Tancheff @ 2016-09-29  1:32 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> The queue limits already have a 'chunk_sectors' setting, so
> we should be presenting it via sysfs.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> [Damien: Updated Documentation/ABI/testing/sysfs-block]
>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  Documentation/ABI/testing/sysfs-block | 13 +++++++++++++
>  block/blk-sysfs.c                     | 11 +++++++++++
>  2 files changed, 24 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 75a5055..ee2d5cd 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -251,3 +251,16 @@ Description:
>                 since drive-managed zoned block devices do not support
>                 zone commands, they will be treated as regular block
>                 devices and zoned will report "none".
> +
> +What:          /sys/block/<disk>/queue/chunk_sectors
> +Date:          September 2016
> +Contact:       Hannes Reinecke <hare@suse.com>
> +Description:
> +               chunk_sectors has a different meaning depending on the type
> +               of the disk. For a RAID device (dm-raid), chunk_sectors
> +               indicates the size in 512B sectors of the RAID volume
> +               stripe segment. For a zoned block device, either
> +               host-aware or host-managed, chunk_sectors indicates the
> +               size in 512B sectors of the zones of the device, with
> +               the possible exception of the last zone of the device
> +               which may be smaller.
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ff9cd9c..488c2e2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
>         return queue_var_show(queue_physical_block_size(q), page);
>  }
>
> +static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
> +{
> +       return queue_var_show(q->limits.chunk_sectors, page);
> +}
> +
>  static ssize_t queue_io_min_show(struct request_queue *q, char *page)
>  {
>         return queue_var_show(queue_io_min(q), page);
> @@ -455,6 +460,11 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
>         .show = queue_physical_block_size_show,
>  };
>
> +static struct queue_sysfs_entry queue_chunk_sectors_entry = {
> +       .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
> +       .show = queue_chunk_sectors_show,
> +};
> +
>  static struct queue_sysfs_entry queue_io_min_entry = {
>         .attr = {.name = "minimum_io_size", .mode = S_IRUGO },
>         .show = queue_io_min_show,
> @@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
>         &queue_hw_sector_size_entry.attr,
>         &queue_logical_block_size_entry.attr,
>         &queue_physical_block_size_entry.attr,
> +       &queue_chunk_sectors_entry.attr,
>         &queue_io_min_entry.attr,
>         &queue_io_opt_entry.attr,
>         &queue_discard_granularity_entry.attr,
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
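
As an illustration of what the chunk_sectors value means for a zoned device, the sketch below derives the zone size in bytes and the zone count from values expressed in 512B sectors, counting a smaller last zone as one zone. The helper names zone_bytes and zone_count are hypothetical; the disk capacity is assumed to come from elsewhere (for example /sys/block/<disk>/size).

```c
#include <stdint.h>

/* Zone size in bytes for a chunk_sectors value given in 512B sectors. */
static uint64_t zone_bytes(uint64_t chunk_sectors)
{
	return chunk_sectors << 9;
}

/*
 * Number of zones on a device of capacity_sectors (512B units).
 * A smaller last (runt) zone counts as one zone. Returns 0 when
 * chunk_sectors is 0, i.e. when no zone size is reported.
 */
static uint64_t zone_count(uint64_t capacity_sectors, uint64_t chunk_sectors)
{
	if (!chunk_sectors)
		return 0;
	return (capacity_sectors + chunk_sectors - 1) / chunk_sectors;
}
```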

>



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits()
  2016-09-28  8:45   ` Damien Le Moal
  (?)
@ 2016-09-29  1:33   ` Shaun Tancheff
  -1 siblings, 0 replies; 34+ messages in thread
From: Shaun Tancheff @ 2016-09-29  1:33 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff,
	Hannes Reinecke

On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  block/blk-settings.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b1d5b7f..55369a6 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>                         t->discard_granularity;
>         }
>
> +       if (b->chunk_sectors)
> +               t->chunk_sectors = min_not_zero(t->chunk_sectors,
> +                                               b->chunk_sectors);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL(blk_stack_limits);
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
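
The stacking rule in the hunk above can be sketched in plain C as follows. This is a userspace approximation: min_not_zero_u stands in for the kernel's min_not_zero() macro, and stack_chunk_sectors is a hypothetical name for the four added lines, not a kernel function.

```c
/*
 * Stand-in for the kernel's min_not_zero(): the smaller of the two
 * values, ignoring a value of 0 unless both are 0.
 */
static unsigned int min_not_zero_u(unsigned int x, unsigned int y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}

/*
 * Mirrors the lines added to blk_stack_limits(): the top (stacked)
 * device inherits the bottom device's chunk_sectors when it has none,
 * and otherwise keeps the smaller of the two.
 */
static unsigned int stack_chunk_sectors(unsigned int top, unsigned int bottom)
{
	if (bottom)
		top = min_not_zero_u(top, bottom);
	return top;
}
```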

>

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 5/7] block: Implement support for zoned block devices
  2016-09-28  8:45   ` Damien Le Moal
  (?)
@ 2016-09-29  1:34   ` Shaun Tancheff
  -1 siblings, 0 replies; 34+ messages in thread
From: Shaun Tancheff @ 2016-09-29  1:34 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Implement zoned block device zone information reporting and reset.
> Zone information is reported as struct blk_zone. This implementation
> does not differentiate between host-aware and host-managed device
> models and is valid for both. Two functions are provided:
> blkdev_report_zones for discovering the zone configuration of a
> zoned block device, and blkdev_reset_zones for resetting the write
> pointer of sequential zones. The helper functions blk_queue_zone_size
> and bdev_zone_size are also provided for, as the names suggest,
> obtaining the zone size (in 512B sectors) of the zones of the device.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> [Damien: * Removed the zone cache
>          * Implement report zones operation based on earlier proposal
>            by Shaun Tancheff <shaun.tancheff@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  block/Kconfig                 |   8 ++
>  block/Makefile                |   1 +
>  block/blk-zoned.c             | 257 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h        |  31 +++++
>  include/uapi/linux/Kbuild     |   1 +
>  include/uapi/linux/blkzoned.h | 103 +++++++++++++++++
>  6 files changed, 401 insertions(+)
>  create mode 100644 block/blk-zoned.c
>  create mode 100644 include/uapi/linux/blkzoned.h
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 1d4d624..6b0ad08 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
>         T10/SCSI Data Integrity Field or the T13/ATA External Path
>         Protection.  If in doubt, say N.
>
> +config BLK_DEV_ZONED
> +       bool "Zoned block device support"
> +       ---help---
> +       Block layer zoned block device support. This option enables
> +       support for ZAC/ZBC host-managed and host-aware zoned block devices.
> +
> +       Say yes here if you have a ZAC or ZBC storage device.
> +
>  config BLK_DEV_THROTTLING
>         bool "Block layer bio throttling support"
>         depends on BLK_CGROUP=y
> diff --git a/block/Makefile b/block/Makefile
> index 36acdd7..9371bc7 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ)     += cfq-iosched.o
>  obj-$(CONFIG_BLOCK_COMPAT)     += compat_ioctl.o
>  obj-$(CONFIG_BLK_CMDLINE_PARSER)       += cmdline-parser.o
>  obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
> +obj-$(CONFIG_BLK_DEV_ZONED)    += blk-zoned.o
>  obj-$(CONFIG_BLK_MQ_PCI)       += blk-mq-pci.o
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> new file mode 100644
> index 0000000..1603573
> --- /dev/null
> +++ b/block/blk-zoned.c
> @@ -0,0 +1,257 @@
> +/*
> + * Zoned block device handling
> + *
> + * Copyright (c) 2015, Hannes Reinecke
> + * Copyright (c) 2015, SUSE Linux GmbH
> + *
> + * Copyright (c) 2016, Damien Le Moal
> + * Copyright (c) 2016, Western Digital
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/rbtree.h>
> +#include <linux/blkdev.h>
> +
> +static inline sector_t blk_zone_start(struct request_queue *q,
> +                                     sector_t sector)
> +{
> +       sector_t zone_mask = blk_queue_zone_size(q) - 1;
> +
> +       return sector & ~zone_mask;
> +}
> +
> +/*
> + * Check that a zone report belongs to the partition.
> + * If yes, fix its start sector and write pointer, copy it in the
> + * zone information array and return true. Return false otherwise.
> + */
> +static bool blkdev_report_zone(struct block_device *bdev,
> +                              struct blk_zone *rep,
> +                              struct blk_zone *zone)
> +{
> +       sector_t offset = get_start_sect(bdev);
> +
> +       if (rep->start < offset)
> +               return false;
> +
> +       rep->start -= offset;
> +       if (rep->start + rep->len > bdev->bd_part->nr_sects)
> +               return false;
> +
> +       if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +               rep->wp = rep->start + rep->len;
> +       else
> +               rep->wp -= offset;
> +       memcpy(zone, rep, sizeof(struct blk_zone));
> +
> +       return true;
> +}
> +
> +/**
> + * blkdev_report_zones - Get zones information
> + * @bdev:      Target block device
> + * @sector:    Sector from which to report zones
> + * @zones:     Array of zone structures where to return the zones information
> + * @nr_zones:  Number of zone structures in the zone array
> + * @gfp_mask:  Memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *    Get zone information starting from the zone containing @sector.
> + *    The number of zone information reported may be less than the number
> + *    requested by @nr_zones. The number of zones actually reported is
> + *    returned in @nr_zones.
> + */
> +int blkdev_report_zones(struct block_device *bdev,
> +                       sector_t sector,
> +                       struct blk_zone *zones,
> +                       unsigned int *nr_zones,
> +                       gfp_t gfp_mask)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +       struct blk_zone_report_hdr *hdr;
> +       unsigned int nrz = *nr_zones;
> +       struct page *page;
> +       unsigned int nr_rep;
> +       size_t rep_bytes;
> +       unsigned int nr_pages;
> +       struct bio *bio;
> +       struct bio_vec *bv;
> +       unsigned int i, n, nz;
> +       unsigned int ofst;
> +       void *addr;
> +       int ret = 0;
> +
> +       if (!q)
> +               return -ENXIO;
> +
> +       if (!blk_queue_is_zoned(q))
> +               return -EOPNOTSUPP;
> +
> +       if (!nrz)
> +               return 0;
> +
> +       if (sector > bdev->bd_part->nr_sects) {
> +               *nr_zones = 0;
> +               return 0;
> +       }
> +
> +       /*
> +        * The zone report has a header. So make room for it in the
> +        * payload. Also make sure that the report fits in a single BIO
> +        * that will not be split down the stack.
> +        */
> +       rep_bytes = sizeof(struct blk_zone_report_hdr) +
> +               sizeof(struct blk_zone) * nrz;
> +       rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
> +       if (rep_bytes > (queue_max_sectors(q) << 9))
> +               rep_bytes = queue_max_sectors(q) << 9;
> +
> +       nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
> +                        rep_bytes >> PAGE_SHIFT);
> +       nr_pages = min_t(unsigned int, nr_pages,
> +                        queue_max_segments(q));
> +
> +       bio = bio_alloc(gfp_mask, nr_pages);
> +       if (!bio)
> +               return -ENOMEM;
> +
> +       bio->bi_bdev = bdev;
> +       bio->bi_iter.bi_sector = blk_zone_start(q, sector);
> +       bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
> +
> +       for (i = 0; i < nr_pages; i++) {
> +               page = alloc_page(gfp_mask);
> +               if (!page) {
> +                       ret = -ENOMEM;
> +                       goto out;
> +               }
> +               if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
> +                       __free_page(page);
> +                       break;
> +               }
> +       }
> +
> +       if (i == 0)
> +               ret = -ENOMEM;
> +       else
> +               ret = submit_bio_wait(bio);
> +       if (ret)
> +               goto out;
> +
> +       /*
> +        * Process the report result: skip the header and go through the
> +        * reported zones to fix up the zone information for
> +        * partitions. At the same time, return the zone information into
> +        * the zone array.
> +        */
> +       n = 0;
> +       nz = 0;
> +       nr_rep = 0;
> +       bio_for_each_segment_all(bv, bio, i) {
> +
> +               if (!bv->bv_page)
> +                       break;
> +
> +               addr = kmap_atomic(bv->bv_page);
> +
> +               /* Get header in the first page */
> +               ofst = 0;
> +               if (!nr_rep) {
> +                       hdr = (struct blk_zone_report_hdr *) addr;
> +                       nr_rep = hdr->nr_zones;
> +                       ofst = sizeof(struct blk_zone_report_hdr);
> +               }
> +
> +               /* Fixup and report zones */
> +               while (ofst < bv->bv_len &&
> +                      n < nr_rep && nz < nrz) {
> +                       if (blkdev_report_zone(bdev, addr + ofst, &zones[nz]))
> +                               nz++;
> +                       ofst += sizeof(struct blk_zone);
> +                       n++;
> +               }
> +
> +               kunmap_atomic(addr);
> +
> +               if (n >= nr_rep || nz >= nrz)
> +                       break;
> +
> +       }
> +
> +out:
> +       bio_for_each_segment_all(bv, bio, i)
> +               __free_page(bv->bv_page);
> +       bio_put(bio);
> +
> +       if (ret == 0)
> +               *nr_zones = nz;
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_report_zones);
> +
> +/**
> + * blkdev_reset_zones - Reset zones write pointer
> + * @bdev:      Target block device
> + * @sector:    Start sector of the first zone to reset
> + * @nr_sectors:        Number of sectors, at least the length of one zone
> + * @gfp_mask:  Memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *    Reset the write pointer of the zones contained in the range
> + *    @sector..@sector+@nr_sectors. Specifying the entire disk sector range
> + *    is valid, but the specified range should not contain conventional zones.
> + */
> +int blkdev_reset_zones(struct block_device *bdev,
> +                      sector_t sector, sector_t nr_sectors,
> +                      gfp_t gfp_mask)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +       sector_t zone_sectors;
> +       sector_t end_sector = sector + nr_sectors;
> +       struct bio *bio;
> +       int ret;
> +
> +       if (!q)
> +               return -ENXIO;
> +
> +       if (!blk_queue_is_zoned(q))
> +               return -EOPNOTSUPP;
> +
> +       if (end_sector > bdev->bd_part->nr_sects)
> +               /* Out of range */
> +               return -EINVAL;
> +
> +       /* Check alignment (handle eventual smaller last zone) */
> +       zone_sectors = blk_queue_zone_size(q);
> +       if (sector & (zone_sectors - 1))
> +               return -EINVAL;
> +
> +       if ((nr_sectors & (zone_sectors - 1)) &&
> +           end_sector != bdev->bd_part->nr_sects)
> +               return -EINVAL;
> +
> +       while (sector < end_sector) {
> +
> +               bio = bio_alloc(gfp_mask, 0);
> +               bio->bi_iter.bi_sector = sector;
> +               bio->bi_bdev = bdev;
> +               bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
> +
> +               ret = submit_bio_wait(bio);
> +               bio_put(bio);
> +
> +               if (ret)
> +                       return ret;
> +
> +               sector += zone_sectors;
> +
> +               /* This may take a while, so be nice to others */
> +               cond_resched();
> +
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_reset_zones);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f19e16b..252043f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -24,6 +24,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/percpu-refcount.h>
>  #include <linux/scatterlist.h>
> +#include <linux/blkzoned.h>
>
>  struct module;
>  struct scsi_ioctl_command;
> @@ -302,6 +303,21 @@ struct queue_limits {
>         enum blk_zoned_model    zoned;
>  };
>
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +struct blk_zone_report_hdr {
> +       unsigned int    nr_zones;
> +       u8              padding[60];
> +};
> +
> +extern int blkdev_report_zones(struct block_device *bdev,
> +                              sector_t sector, struct blk_zone *zones,
> +                              unsigned int *nr_zones, gfp_t gfp_mask);
> +extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
> +                             sector_t nr_sectors, gfp_t gfp_mask);
> +
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  struct request_queue {
>         /*
>          * Together with queue_head for cacheline sharing
> @@ -654,6 +670,11 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
>         }
>  }
>
> +static inline unsigned int blk_queue_zone_size(struct request_queue *q)
> +{
> +       return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
> +}
> +
>  /*
>   * We regard a request as sync, if either a read or a sync write
>   */
> @@ -1401,6 +1422,16 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
>         return false;
>  }
>
> +static inline unsigned int bdev_zone_size(struct block_device *bdev)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +
> +       if (q)
> +               return blk_queue_zone_size(q);
> +
> +       return 0;
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
>         return q ? q->dma_alignment : 511;
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index dd60439..92466a6 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -70,6 +70,7 @@ header-y += bfs_fs.h
>  header-y += binfmts.h
>  header-y += blkpg.h
>  header-y += blktrace_api.h
> +header-y += blkzoned.h
>  header-y += bpf_common.h
>  header-y += bpf_perf_event.h
>  header-y += bpf.h
> diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
> new file mode 100644
> index 0000000..a381721
> --- /dev/null
> +++ b/include/uapi/linux/blkzoned.h
> @@ -0,0 +1,103 @@
> +/*
> + * Zoned block devices handling.
> + *
> + * Copyright (C) 2015 Seagate Technology PLC
> + *
> + * Written by: Shaun Tancheff <shaun.tancheff@seagate.com>
> + *
> + * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
> + * Copyright (C) 2016 Western Digital
> + *
> + * This file is licensed under  the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +#ifndef _UAPI_BLKZONED_H
> +#define _UAPI_BLKZONED_H
> +
> +#include <linux/types.h>
> +
> +/**
> + * enum blk_zone_type - Types of zones allowed in a zoned device.
> + *
> + * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
> + *                              randomly. Zone reset has no effect on the zone.
> + * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
> + * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
> + *
> + * Any other value not defined is reserved and must be considered as invalid.
> + */
> +enum blk_zone_type {
> +       BLK_ZONE_TYPE_CONVENTIONAL      = 0x1,
> +       BLK_ZONE_TYPE_SEQWRITE_REQ      = 0x2,
> +       BLK_ZONE_TYPE_SEQWRITE_PREF     = 0x3,
> +};
> +
> +/**
> + * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
> + *
> + * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
> + * @BLK_ZONE_COND_EMPTY: The zone is empty.
> + * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
> + * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
> + *                          OPEN ZONE command.
> + * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
> + * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a
> + *                      FINISH ZONE command.
> + * @BLK_ZONE_COND_READONLY: The zone is read-only.
> + * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
> + *
> + * The Zone Condition state machine in the ZBC/ZAC standards maps the above
> + * definitions as:
> + *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
> + *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
> + *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
> + *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
> + *   - ZC5: Full          | BLK_ZONE_COND_FULL
> + *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
> + *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
> + *
> + * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
> + * be considered invalid.
> + */
> +enum blk_zone_cond {
> +       BLK_ZONE_COND_NOT_WP    = 0x0,
> +       BLK_ZONE_COND_EMPTY     = 0x1,
> +       BLK_ZONE_COND_IMP_OPEN  = 0x2,
> +       BLK_ZONE_COND_EXP_OPEN  = 0x3,
> +       BLK_ZONE_COND_CLOSED    = 0x4,
> +       BLK_ZONE_COND_READONLY  = 0xD,
> +       BLK_ZONE_COND_FULL      = 0xE,
> +       BLK_ZONE_COND_OFFLINE   = 0xF,
> +};
> +
> +/**
> + * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
> + *
> + * @start: Zone start in 512 B sector units
> + * @len: Zone length in 512 B sector units
> + * @wp: Zone write pointer location in 512 B sector units
> + * @type: see enum blk_zone_type for possible values
> + * @cond: see enum blk_zone_cond for possible values
> + * @non_seq: Flag indicating that the zone is using non-sequential resources
> + *           (for host-aware zoned block devices only).
> + * @reset: Flag indicating that a zone reset is recommended.
> + * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
> + *
> + * start, len and wp use the regular 512 B sector unit, regardless of the
> + * device logical block size. The overall structure size is 64 B to match the
> + * ZBC/ZAC defined zone descriptor and allow support for future additional
> + * zone information.
> + */
> +struct blk_zone {
> +       __u64   start;          /* Zone start sector */
> +       __u64   len;            /* Zone length in number of sectors */
> +       __u64   wp;             /* Zone write pointer position */
> +       __u8    type;           /* Zone type */
> +       __u8    cond;           /* Zone condition */
> +       __u8    non_seq;        /* Non-sequential write resources active */
> +       __u8    reset;          /* Reset write pointer recommended */
> +       __u8    reserved[36];
> +};
> +
> +#endif /* _UAPI_BLKZONED_H */
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
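
The mask arithmetic used by blk_zone_start and by the alignment checks in blkdev_reset_zones can be sketched in userspace as below. Note that masking with zone_sectors - 1 only gives correct results when the zone size is a power of two; the sketch makes that same assumption. The function names are hypothetical, and capacity stands in for bdev->bd_part->nr_sects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start sector of the zone containing sector (zone_sectors: power of two). */
static uint64_t zone_start(uint64_t zone_sectors, uint64_t sector)
{
	return sector & ~(zone_sectors - 1);
}

/*
 * Mirrors the range checks in blkdev_reset_zones(): the range must lie
 * within the device, start on a zone boundary, and cover whole zones,
 * except that it may end at the device capacity to allow for a smaller
 * last zone.
 */
static bool reset_range_valid(uint64_t zone_sectors, uint64_t capacity,
			      uint64_t sector, uint64_t nr_sectors)
{
	uint64_t end = sector + nr_sectors;

	if (end > capacity)
		return false;	/* out of range */
	if (sector & (zone_sectors - 1))
		return false;	/* start not zone-aligned */
	if ((nr_sectors & (zone_sectors - 1)) && end != capacity)
		return false;	/* partial zone that is not the last one */
	return true;
}
```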




-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/7] sd: Implement support for ZBC devices
  2016-09-28  8:45   ` Damien Le Moal
  (?)
@ 2016-09-29  1:35   ` Shaun Tancheff
  -1 siblings, 0 replies; 34+ messages in thread
From: Shaun Tancheff @ 2016-09-29  1:35 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Implement ZBC support functions to setup zoned disks, both
> host-managed and host-aware models. Only zoned disks that satisfy
> the following conditions are supported:
> 1) All zones are the same size, with the exception of an eventual
>    last smaller runt zone.
> 2) For host-managed disks, reads are unrestricted (reads are not
>    failed due to zone or write pointer alignment constraints).
> Zoned disks that do not satisfy these 2 conditions are setup with
> a capacity of 0 to prevent their use.
>
> The function sd_zbc_read_zones, called from sd_revalidate_disk,
> checks that the device satisfies the above two constraints. This
> function may also change the disk capacity previously set by
> sd_read_capacity for devices reporting only the capacity of
> conventional zones at the beginning of the LBA range (i.e. devices
> reporting rc_basis set to 0).
>
> The capacity message output was moved out of sd_read_capacity into
> a new function sd_print_capacity to include this eventual capacity
> change by sd_zbc_read_zones. This new function also includes a call
> to sd_zbc_print_zones to display the number of zones and zone size
> of the device.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> [Damien: * Removed zone cache support
>          * Removed mapping of discard to reset write pointer command
>          * Modified sd_zbc_read_zones to include checks that the
>            device satisfies the kernel constraints
>          * Implemented REPORT ZONES setup and post-processing based
>            on code from Shaun Tancheff <shaun.tancheff@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  drivers/scsi/Makefile     |   1 +
>  drivers/scsi/sd.c         | 143 ++++++++---
>  drivers/scsi/sd.h         |  70 ++++++
>  drivers/scsi/sd_zbc.c     | 624 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/scsi/scsi_proto.h |  17 ++
>  5 files changed, 822 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index fc0d9b8..350513c 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -180,6 +180,7 @@ hv_storvsc-y                        := storvsc_drv.o
>
>  sd_mod-objs    := sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
>
>  sr_mod-objs    := sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 51e5629..4d63260 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS      16
> @@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
>         static const char temp[] = "temporary ";
>         int len;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 /* no cache control on RBC devices; theoretically they
>                  * can do it, but there's probably so many exceptions
>                  * it's not worth the risk */
> @@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> +       if (sd_is_zoned(sdkp)) {
> +               sd_config_discard(sdkp, SD_LBP_DISABLE);
> +               return count;
> +       }
> +
>         if (sdp->type != TYPE_DISK)
>                 return -EINVAL;
>
> @@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         err = kstrtoul(buf, 10, &max);
> @@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>
>         BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>
> +       if (sd_is_zoned(sdkp)) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
> +               if (ret != BLKPREP_OK)
> +                       return ret;
> +       }
> +
>         sector >>= ilog2(sdp->sector_size) - 9;
>         nr_sectors >>= ilog2(sdp->sector_size) - 9;
>
> @@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>         SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
>                                         (unsigned long long)block));
>
> +       if (sd_is_zoned(sdkp)) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
> +               if (ret != BLKPREP_OK)
> +                       goto out;
> +       }
> +
>         /*
>          * If we have a 1K hardware sectorsize, prevent access to single
>          * 512 byte sectors.  In theory we could handle this - in fact
> @@ -1149,6 +1169,10 @@ static int sd_init_command(struct scsi_cmnd *cmd)
>         case REQ_OP_READ:
>         case REQ_OP_WRITE:
>                 return sd_setup_read_write_cmnd(cmd);
> +       case REQ_OP_ZONE_REPORT:
> +               return sd_zbc_setup_report_cmnd(cmd);
> +       case REQ_OP_ZONE_RESET:
> +               return sd_zbc_setup_reset_cmnd(cmd);
>         default:
>                 BUG();
>         }
> @@ -1780,7 +1804,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>         unsigned char op = SCpnt->cmnd[0];
>         unsigned char unmap = SCpnt->cmnd[1] & 8;
>
> -       if (req_op(req) == REQ_OP_DISCARD || req_op(req) == REQ_OP_WRITE_SAME) {
> +       switch (req_op(req)) {
> +       case REQ_OP_DISCARD:
> +       case REQ_OP_WRITE_SAME:
> +       case REQ_OP_ZONE_RESET:
>                 if (!result) {
>                         good_bytes = blk_rq_bytes(req);
>                         scsi_set_resid(SCpnt, 0);
> @@ -1788,6 +1815,17 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>                         good_bytes = 0;
>                         scsi_set_resid(SCpnt, blk_rq_bytes(req));
>                 }
> +               break;
> +       case REQ_OP_ZONE_REPORT:
> +               if (!result) {
> +                       good_bytes = scsi_bufflen(SCpnt)
> +                               - scsi_get_resid(SCpnt);
> +                       scsi_set_resid(SCpnt, 0);
> +               } else {
> +                       good_bytes = 0;
> +                       scsi_set_resid(SCpnt, blk_rq_bytes(req));
> +               }
> +               break;
>         }
>
>         if (result) {
> @@ -1848,7 +1886,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>         default:
>                 break;
>         }
> +
>   out:
> +       if (sd_is_zoned(sdkp))
> +               sd_zbc_complete(SCpnt, good_bytes, &sshdr);
> +
>         SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
>                                            "sd_done: completed %d of %d bytes\n",
>                                            good_bytes, scsi_bufflen(SCpnt)));
> @@ -1983,7 +2025,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
>         }
>  }
>
> -
>  /*
>   * Determine whether disk supports Data Integrity Field.
>   */
> @@ -2133,6 +2174,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>         /* Logical blocks per physical block exponent */
>         sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
>
> +       /* RC basis */
> +       sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
> +
>         /* Lowest aligned logical block */
>         alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
>         blk_queue_alignment_offset(sdp->request_queue, alignment);
> @@ -2242,7 +2286,6 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
>  {
>         int sector_size;
>         struct scsi_device *sdp = sdkp->device;
> -       sector_t old_capacity = sdkp->capacity;
>
>         if (sd_try_rc16_first(sdp)) {
>                 sector_size = read_capacity_16(sdkp, sdp, buffer);
> @@ -2323,35 +2366,44 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
>                 sector_size = 512;
>         }
>         blk_queue_logical_block_size(sdp->request_queue, sector_size);
> +       blk_queue_physical_block_size(sdp->request_queue,
> +                                     sdkp->physical_block_size);
> +       sdkp->device->sector_size = sector_size;
>
> -       {
> -               char cap_str_2[10], cap_str_10[10];
> +       if (sdkp->capacity > 0xffffffff)
> +               sdp->use_16_for_rw = 1;
>
> -               string_get_size(sdkp->capacity, sector_size,
> -                               STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
> -               string_get_size(sdkp->capacity, sector_size,
> -                               STRING_UNITS_10, cap_str_10,
> -                               sizeof(cap_str_10));
> +}
>
> -               if (sdkp->first_scan || old_capacity != sdkp->capacity) {
> -                       sd_printk(KERN_NOTICE, sdkp,
> -                                 "%llu %d-byte logical blocks: (%s/%s)\n",
> -                                 (unsigned long long)sdkp->capacity,
> -                                 sector_size, cap_str_10, cap_str_2);
> +/*
> + * Print disk capacity
> + */
> +static void
> +sd_print_capacity(struct scsi_disk *sdkp,
> +                 sector_t old_capacity)
> +{
> +       int sector_size = sdkp->device->sector_size;
> +       char cap_str_2[10], cap_str_10[10];
>
> -                       if (sdkp->physical_block_size != sector_size)
> -                               sd_printk(KERN_NOTICE, sdkp,
> -                                         "%u-byte physical blocks\n",
> -                                         sdkp->physical_block_size);
> -               }
> -       }
> +       string_get_size(sdkp->capacity, sector_size,
> +                       STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
> +       string_get_size(sdkp->capacity, sector_size,
> +                       STRING_UNITS_10, cap_str_10,
> +                       sizeof(cap_str_10));
>
> -       if (sdkp->capacity > 0xffffffff)
> -               sdp->use_16_for_rw = 1;
> +       if (sdkp->first_scan || old_capacity != sdkp->capacity) {
> +               sd_printk(KERN_NOTICE, sdkp,
> +                         "%llu %d-byte logical blocks: (%s/%s)\n",
> +                         (unsigned long long)sdkp->capacity,
> +                         sector_size, cap_str_10, cap_str_2);
>
> -       blk_queue_physical_block_size(sdp->request_queue,
> -                                     sdkp->physical_block_size);
> -       sdkp->device->sector_size = sector_size;
> +               if (sdkp->physical_block_size != sector_size)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "%u-byte physical blocks\n",
> +                                 sdkp->physical_block_size);
> +
> +               sd_zbc_print_zones(sdkp);
> +       }
>  }
>
>  /* called with buffer of length 512 */
> @@ -2613,7 +2665,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
>         struct scsi_mode_data data;
>         struct scsi_sense_hdr sshdr;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return;
>
>         if (sdkp->protection_type == 0)
> @@ -2720,6 +2772,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>   */
>  static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>  {
> +       struct request_queue *q = sdkp->disk->queue;
>         unsigned char *buffer;
>         u16 rot;
>         const int vpd_len = 64;
> @@ -2734,10 +2787,21 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>         rot = get_unaligned_be16(&buffer[4]);
>
>         if (rot == 1) {
> -               queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
> -               queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
> +               queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
> +               queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
>         }
>
> +       sdkp->zoned = (buffer[8] >> 4) & 3;
> +       if (sdkp->zoned == 1)
> +               q->limits.zoned = BLK_ZONED_HA;
> +       else if (sdkp->device->type == TYPE_ZBC)
> +               q->limits.zoned = BLK_ZONED_HM;
> +       else
> +               q->limits.zoned = BLK_ZONED_NONE;
> +       if (blk_queue_is_zoned(q) && sdkp->first_scan)
> +               sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
> +                     q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
> +
>   out:
>         kfree(buffer);
>  }
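
The zoned-model decision in this hunk can be illustrated in user space. The following is only a sketch under assumptions: the enum and function names are hypothetical (the patch itself uses BLK_ZONED_NONE/HA/HM), and 0x14 is assumed to be the value this series gives TYPE_ZBC in scsi_proto.h, which is not quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the patch uses BLK_ZONED_NONE, BLK_ZONED_HA, BLK_ZONED_HM. */
enum zoned_model { ZONED_NONE, ZONED_HOST_AWARE, ZONED_HOST_MANAGED };

/* Assumed value of TYPE_ZBC (peripheral device type of host-managed disks). */
#define SKETCH_TYPE_ZBC 0x14

/*
 * Classify a disk from the Block Device Characteristics VPD page (B1h)
 * and the SCSI device type, mirroring the order of tests in the hunk:
 * the ZONED field (byte 8, bits 5:4) equal to 1 means host-aware,
 * otherwise a TYPE_ZBC device is host-managed, otherwise not zoned.
 */
static enum zoned_model classify_zoned(const uint8_t *vpd_b1, uint8_t dev_type)
{
	uint8_t zoned;

	assert(vpd_b1 != NULL);

	zoned = (vpd_b1[8] >> 4) & 3;
	if (zoned == 1)
		return ZONED_HOST_AWARE;
	if (dev_type == SKETCH_TYPE_ZBC)
		return ZONED_HOST_MANAGED;
	return ZONED_NONE;
}
```

With this ordering, a device that reports ZONED = 1 is treated as host-aware regardless of its device type.
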
> @@ -2809,6 +2873,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>         struct scsi_disk *sdkp = scsi_disk(disk);
>         struct scsi_device *sdp = sdkp->device;
>         struct request_queue *q = sdkp->disk->queue;
> +       sector_t old_capacity = sdkp->capacity;
>         unsigned char *buffer;
>         unsigned int dev_max, rw_max;
>
> @@ -2842,8 +2907,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
>                         sd_read_block_provisioning(sdkp);
>                         sd_read_block_limits(sdkp);
>                         sd_read_block_characteristics(sdkp);
> +                       sd_zbc_read_zones(sdkp, buffer);
>                 }
>
> +               sd_print_capacity(sdkp, old_capacity);
> +
>                 sd_read_write_protect_flag(sdkp, buffer);
>                 sd_read_cache_type(sdkp, buffer);
>                 sd_read_app_tag_own(sdkp, buffer);
> @@ -3041,9 +3109,16 @@ static int sd_probe(struct device *dev)
>
>         scsi_autopm_get_device(sdp);
>         error = -ENODEV;
> -       if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
> +       if (sdp->type != TYPE_DISK &&
> +           sdp->type != TYPE_ZBC &&
> +           sdp->type != TYPE_MOD &&
> +           sdp->type != TYPE_RBC)
>                 goto out;
>
> +#ifndef CONFIG_BLK_DEV_ZONED
> +       if (sdp->type == TYPE_ZBC)
> +               goto out;
> +#endif
>         SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
>                                         "sd_probe\n"));
>
> @@ -3147,6 +3222,8 @@ static int sd_remove(struct device *dev)
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);
>
> +       sd_zbc_remove(sdkp);
> +
>         blk_register_region(devt, SD_MINORS, NULL,
>                             sd_default_probe, NULL, NULL);
>
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index c8d9863..6bd4226 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -64,6 +64,15 @@ struct scsi_disk {
>         struct scsi_device *device;
>         struct device   dev;
>         struct gendisk  *disk;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +       unsigned int    nr_zones;
> +       unsigned int    zone_blocks;
> +       unsigned int    zone_shift;
> +       unsigned long   *zones_wlock;
> +       unsigned int    zones_optimal_open;
> +       unsigned int    zones_optimal_nonseq;
> +       unsigned int    zones_max_open;
> +#endif
>         atomic_t        openers;
>         sector_t        capacity;       /* size in logical blocks */
>         u32             max_xfer_blocks;
> @@ -94,6 +103,9 @@ struct scsi_disk {
>         unsigned        lbpvpd : 1;
>         unsigned        ws10 : 1;
>         unsigned        ws16 : 1;
> +       unsigned        rc_basis: 2;
> +       unsigned        zoned: 2;
> +       unsigned        urswrz : 1;
>  };
>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>
> @@ -156,6 +168,11 @@ static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t b
>         return blocks * sdev->sector_size;
>  }
>
> +static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
> +{
> +       return sector >> (ilog2(sdev->sector_size) - 9);
> +}
> +
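
For reference, the unit conversion done by sectors_to_logical can be checked with a small user-space sketch. The helper names below are hypothetical stand-ins (log2_pow2 replaces the kernel's ilog2), and the sketch assumes a power-of-two logical block size of at least 512 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ilog2() on a power-of-two value. */
static unsigned int log2_pow2(uint32_t v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Convert 512-byte block layer sectors to device logical blocks. */
static uint64_t sectors_to_logical_sketch(uint32_t sector_size, uint64_t sector)
{
	/* The shift is only meaningful for power-of-two sizes >= 512 bytes. */
	assert(sector_size >= 512 && (sector_size & (sector_size - 1)) == 0);

	return sector >> (log2_pow2(sector_size) - 9);
}
```

For a 4096-byte logical block size the shift is 3, so sectors 8 through 15 all map to logical block 1; a sector that is not aligned to the logical block size is rounded down.
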
>  /*
>   * Look up the DIX operation based on whether the command is read or
>   * write and whether dix and dif are enabled.
> @@ -239,4 +256,57 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
>
>  #endif /* CONFIG_BLK_DEV_INTEGRITY */
>
> +static inline int sd_is_zoned(struct scsi_disk *sdkp)
> +{
> +       return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
> +}
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
> +extern void sd_zbc_remove(struct scsi_disk *sdkp);
> +extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
> +extern int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
> +                                  sector_t sector, unsigned int nr_sectors);
> +extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
> +extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
> +extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
> +                           struct scsi_sense_hdr *sshdr);
> +
> +#else /* CONFIG_BLK_DEV_ZONED */
> +
> +static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
> +                                   unsigned char *buf)
> +{
> +       return 0;
> +}
> +
> +static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
> +
> +static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
> +
> +static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
> +                                         struct request *rq, sector_t sector,
> +                                         unsigned int num_sectors)
> +{
> +       /* Let the drive fail requests */
> +       return BLKPREP_OK;
> +}
> +
> +static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
> +{
> +       return BLKPREP_KILL;
> +}
> +
> +static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
> +{
> +       return BLKPREP_KILL;
> +}
> +
> +static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
> +                                  unsigned int good_bytes,
> +                                  struct scsi_sense_hdr *sshdr) {}
> +
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  #endif /* _SCSI_DISK_H */
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> new file mode 100644
> index 0000000..a4da0ed
> --- /dev/null
> +++ b/drivers/scsi/sd_zbc.c
> @@ -0,0 +1,624 @@
> +/*
> + * SCSI Zoned Block commands
> + *
> + * Copyright (C) 2014-2015 SUSE Linux GmbH
> + * Written by: Hannes Reinecke <hare@suse.de>
> + * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
> + * Modified by: Shaun Tancheff <shaun.tancheff@seagate.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; see the file COPYING.  If not, write to
> + * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
> + * USA.
> + *
> + */
> +
> +#include <linux/blkdev.h>
> +
> +#include <asm/unaligned.h>
> +
> +#include <scsi/scsi.h>
> +#include <scsi/scsi_cmnd.h>
> +#include <scsi/scsi_dbg.h>
> +#include <scsi/scsi_device.h>
> +#include <scsi/scsi_driver.h>
> +#include <scsi/scsi_host.h>
> +#include <scsi/scsi_eh.h>
> +
> +#include "sd.h"
> +#include "scsi_priv.h"
> +
> +enum zbc_zone_type {
> +       ZBC_ZONE_TYPE_CONV = 0x1,
> +       ZBC_ZONE_TYPE_SEQWRITE_REQ,
> +       ZBC_ZONE_TYPE_SEQWRITE_PREF,
> +       ZBC_ZONE_TYPE_RESERVED,
> +};
> +
> +enum zbc_zone_cond {
> +       ZBC_ZONE_COND_NO_WP,
> +       ZBC_ZONE_COND_EMPTY,
> +       ZBC_ZONE_COND_IMP_OPEN,
> +       ZBC_ZONE_COND_EXP_OPEN,
> +       ZBC_ZONE_COND_CLOSED,
> +       ZBC_ZONE_COND_READONLY = 0xd,
> +       ZBC_ZONE_COND_FULL,
> +       ZBC_ZONE_COND_OFFLINE,
> +};
> +
> +/**
> + * Convert a zone descriptor to a zone struct.
> + */
> +static void sd_zbc_parse_report(struct scsi_disk *sdkp,
> +                               u8 *buf,
> +                               struct blk_zone *zone)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +
> +       memset(zone, 0, sizeof(struct blk_zone));
> +
> +       zone->type = buf[0] & 0x0f;
> +       zone->cond = (buf[1] >> 4) & 0xf;
> +       if (buf[1] & 0x01)
> +               zone->reset = 1;
> +       if (buf[1] & 0x02)
> +               zone->non_seq = 1;
> +
> +       zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
> +       zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
> +       zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
> +       if (zone->type != ZBC_ZONE_TYPE_CONV &&
> +           zone->cond == ZBC_ZONE_COND_FULL)
> +               zone->wp = zone->start + zone->len;
> +}
> +
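
The descriptor layout that sd_zbc_parse_report relies on can be exercised outside the kernel. This is a minimal sketch, not the patch's code: struct zone_sketch and the helper names are hypothetical, values are left in logical blocks rather than converted to 512-byte sectors, and the constants 0x1 (conventional zone) and 0xe (full condition) are taken from the enums quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space counterpart of struct blk_zone (logical block units). */
struct zone_sketch {
	uint8_t type;
	uint8_t cond;
	uint8_t reset;
	uint8_t non_seq;
	uint64_t len;
	uint64_t start;
	uint64_t wp;
};

/* Read a big-endian 64-bit value from a possibly unaligned buffer. */
static uint64_t be64_at(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one 64-byte REPORT ZONES zone descriptor. */
static void parse_zone_descriptor(const uint8_t *buf, struct zone_sketch *z)
{
	assert(buf != NULL && z != NULL);

	memset(z, 0, sizeof(*z));
	z->type = buf[0] & 0x0f;
	z->cond = (buf[1] >> 4) & 0x0f;
	z->reset = buf[1] & 0x01;
	z->non_seq = (buf[1] >> 1) & 0x01;
	z->len = be64_at(&buf[8]);
	z->start = be64_at(&buf[16]);
	z->wp = be64_at(&buf[24]);

	/* As in the patch: a full non-conventional zone has its wp at the zone end. */
	if (z->type != 0x1 && z->cond == 0xe)
		z->wp = z->start + z->len;
}
```
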
> +/**
> + * Issue a REPORT ZONES scsi command.
> + */
> +static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
> +                              unsigned int buflen, sector_t lba)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +       const int timeout = sdp->request_queue->rq_timeout;
> +       struct scsi_sense_hdr sshdr;
> +       unsigned char cmd[16];
> +       unsigned int rep_len;
> +       int result;
> +
> +       memset(cmd, 0, 16);
> +       cmd[0] = ZBC_IN;
> +       cmd[1] = ZI_REPORT_ZONES;
> +       put_unaligned_be64(lba, &cmd[2]);
> +       put_unaligned_be32(buflen, &cmd[10]);
> +       memset(buf, 0, buflen);
> +
> +       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
> +                                 buf, buflen, &sshdr,
> +                                 timeout, SD_MAX_RETRIES, NULL);
> +       if (result) {
> +               sd_printk(KERN_ERR, sdkp,
> +                         "REPORT ZONES lba %llu failed with %d/%d\n",
> +                         (unsigned long long)lba,
> +                         host_byte(result), driver_byte(result));
> +               return -EIO;
> +       }
> +
> +       rep_len = get_unaligned_be32(&buf[0]);
> +       if (rep_len < 64) {
> +               sd_printk(KERN_ERR, sdkp,
> +                         "REPORT ZONES report invalid length %u\n",
> +                         rep_len);
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
> +{
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       sector_t lba, sector = blk_rq_pos(rq);
> +       unsigned int nr_bytes = blk_rq_bytes(rq);
> +       int ret;
> +
> +       WARN_ON(nr_bytes == 0);
> +
> +       if (!sd_is_zoned(sdkp))
> +               /* Not a zoned device */
> +               return BLKPREP_KILL;
> +
> +       ret = scsi_init_io(cmd);
> +       if (ret != BLKPREP_OK)
> +               return ret;
> +
> +       cmd->cmd_len = 16;
> +       memset(cmd->cmnd, 0, cmd->cmd_len);
> +       cmd->cmnd[0] = ZBC_IN;
> +       cmd->cmnd[1] = ZI_REPORT_ZONES;
> +       lba = sectors_to_logical(sdkp->device, sector);
> +       put_unaligned_be64(lba, &cmd->cmnd[2]);
> +       put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
> +       /* Do partial report for speeding things up */
> +       cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
> +
> +       cmd->sc_data_direction = DMA_FROM_DEVICE;
> +       cmd->sdb.length = nr_bytes;
> +       cmd->transfersize = sdkp->device->sector_size;
> +       cmd->allowed = 0;
> +
> +       /*
> +        * Report may return fewer bytes than requested. Make sure
> +        * to report completion on the entire initial request.
> +        */
> +       rq->__data_len = nr_bytes;
> +
> +       return BLKPREP_OK;
> +}
> +
> +static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
> +                                        unsigned int good_bytes)
> +{
> +       struct request *rq = scmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       struct sg_mapping_iter miter;
> +       struct blk_zone_report_hdr hdr;
> +       struct blk_zone zone;
> +       unsigned int offset, bytes = 0;
> +       unsigned long flags;
> +       u8 *buf;
> +
> +       if (good_bytes < 64)
> +               return;
> +
> +       memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
> +
> +       sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
> +                      SG_MITER_TO_SG | SG_MITER_ATOMIC);
> +
> +       local_irq_save(flags);
> +       while (sg_miter_next(&miter) && bytes < good_bytes) {
> +
> +               buf = miter.addr;
> +               offset = 0;
> +
> +               if (bytes == 0) {
> +                       /* Set the report header */
> +                       hdr.nr_zones = min_t(unsigned int,
> +                                        (good_bytes - 64) / 64,
> +                                        get_unaligned_be32(&buf[0]) / 64);
> +                       memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
> +                       offset += 64;
> +                       bytes += 64;
> +               }
> +
> +               /* Parse zone descriptors */
> +               while (offset < miter.length && hdr.nr_zones) {
> +                       WARN_ON(offset > miter.length);
> +                       buf = miter.addr + offset;
> +                       sd_zbc_parse_report(sdkp, buf, &zone);
> +                       memcpy(buf, &zone, sizeof(struct blk_zone));
> +                       offset += 64;
> +                       bytes += 64;
> +                       hdr.nr_zones--;
> +               }
> +
> +               if (!hdr.nr_zones)
> +                       break;
> +
> +       }
> +       sg_miter_stop(&miter);
> +       local_irq_restore(flags);
> +}
> +
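
The zone count written into the report header is the smaller of what fits in the transferred bytes and what the device lists. A minimal sketch of that calculation follows, with hypothetical names and the 64-byte header and descriptor sizes used throughout the quoted function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ZBC_HDR_LEN	64	/* REPORT ZONES header size */
#define SKETCH_ZBC_DESC_LEN	64	/* size of one zone descriptor */

/* Read a big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Number of zone descriptors usable from a REPORT ZONES reply:
 * the minimum of the descriptors that fit in good_bytes and the
 * descriptors announced by the zone list length (header bytes 0..3).
 */
static uint32_t report_nr_zones(const uint8_t *buf, uint32_t good_bytes)
{
	uint32_t fit, listed;

	assert(buf != NULL);

	if (good_bytes < SKETCH_ZBC_HDR_LEN)
		return 0;

	fit = (good_bytes - SKETCH_ZBC_HDR_LEN) / SKETCH_ZBC_DESC_LEN;
	listed = be32_at(buf) / SKETCH_ZBC_DESC_LEN;

	return fit < listed ? fit : listed;
}
```

A 4096-byte reply therefore carries at most 63 descriptors, even when the device lists more zones.
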
> +static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
> +{
> +       return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
> +}
> +
> +static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
> +                                         sector_t sector)
> +{
> +       return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
> +}
> +
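
The two helpers above reduce to shift and mask arithmetic. The sketch below uses hypothetical names and assumes a power-of-two zone size, which the zone_shift and the (zone_sectors - 1) mask used later in this file appear to rely on.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zone number of a 512-byte sector.
 * sector_shift: ilog2(logical block size) - 9
 * zone_shift:   ilog2(zone size in logical blocks)
 */
static unsigned int zone_no_sketch(uint64_t sector, unsigned int sector_shift,
				   unsigned int zone_shift)
{
	return (unsigned int)((sector >> sector_shift) >> zone_shift);
}

/* Non-zero if the sector is the first sector of a zone. */
static int zone_aligned_sketch(uint64_t sector, uint64_t zone_sectors)
{
	/* The mask test only works for power-of-two zone sizes. */
	assert(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);

	return (sector & (zone_sectors - 1)) == 0;
}
```

As an example, 256 MiB zones on a disk with 4096-byte logical blocks give 524288 sectors per zone, sector_shift = 3 and zone_shift = 16, so sector 524288 is the start of zone 1.
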
> +int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
> +{
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       sector_t sector = blk_rq_pos(rq);
> +       sector_t block = sectors_to_logical(sdkp->device, sector);
> +
> +       if (!sd_is_zoned(sdkp))
> +               /* Not a zoned device */
> +               return BLKPREP_KILL;
> +
> +       if (sdkp->device->changed)
> +               return BLKPREP_KILL;
> +
> +       if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
> +               /* Unaligned request */
> +               return BLKPREP_KILL;
> +
> +       /* Do not allow concurrent reset and writes */
> +       if (!test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
> +                             sdkp->zones_wlock))
> +               return BLKPREP_DEFER;
> +
> +       cmd->cmd_len = 16;
> +       memset(cmd->cmnd, 0, cmd->cmd_len);
> +       cmd->cmnd[0] = ZBC_OUT;
> +       cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
> +       put_unaligned_be64(block, &cmd->cmnd[2]);
> +
> +       rq->timeout = SD_TIMEOUT;
> +       cmd->sc_data_direction = DMA_NONE;
> +       cmd->transfersize = 0;
> +       cmd->allowed = 0;
> +
> +       return BLKPREP_OK;
> +}
> +
> +int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
> +                           sector_t sector, unsigned int nr_sectors)
> +{
> +       sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
> +       sector_t zone_ofst = sector & (zone_sectors - 1);
> +
> +       /*
> +        * Note: alignment of the read/write on logical blocks
> +        * is done after this function returns in sd_setup_read_write.
> +        */
> +
> +       /* Do not allow crossing zone boundaries */
> +       if (zone_ofst + nr_sectors > zone_sectors)
> +               return BLKPREP_KILL;
> +
> +       /*
> +        * Do not issue more than one write at a time per
> +        * zone. This solves write ordering problems due to
> +        * the unlocking of the request queue in the dispatch
> +        * path in the non scsi-mq case. For scsi-mq, this
> +        * also avoids potential write reordering when multiple
> +        * threads running on different CPUs write to the same
> +        * zone (with a synchronized sequential pattern).
> +        */
> +       if (req_op(rq) == REQ_OP_WRITE ||
> +           req_op(rq) == REQ_OP_WRITE_SAME) {
> +               if (!test_and_set_bit(sd_zbc_zone_no(sdkp, sector),
> +                                     sdkp->zones_wlock))
> +                       return BLKPREP_DEFER;
> +       }
> +
> +       return BLKPREP_OK;
> +}
> +
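
The boundary check and the one-bit-per-zone write lock can be modelled in user space with C11 atomics. This is a sketch under assumptions, not the driver code: all names are hypothetical, atomic_fetch_or stands in for the kernel's test_and_set_bit, and the zone size is assumed to be a power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* True if [sector, sector + nr_sectors) extends past the end of its zone. */
static bool crosses_zone_boundary(uint64_t sector, uint32_t nr_sectors,
				  uint64_t zone_sectors)
{
	uint64_t zone_ofst;

	assert(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);

	zone_ofst = sector & (zone_sectors - 1);
	return zone_ofst + nr_sectors > zone_sectors;
}

/*
 * Try to take the write lock of zone zno. Like test_and_set_bit(), the
 * atomic OR returns the previous value of the bit: the lock is acquired
 * only if the bit was clear before the call.
 */
static bool zone_trylock(atomic_ulong *wlock, unsigned int zno)
{
	unsigned long mask = 1UL << (zno % SKETCH_BITS_PER_WORD);
	unsigned long old = atomic_fetch_or(&wlock[zno / SKETCH_BITS_PER_WORD], mask);

	return (old & mask) == 0;
}

/* Release the write lock of zone zno. */
static void zone_unlock(atomic_ulong *wlock, unsigned int zno)
{
	unsigned long mask = 1UL << (zno % SKETCH_BITS_PER_WORD);

	atomic_fetch_and(&wlock[zno / SKETCH_BITS_PER_WORD], ~mask);
}
```

In this sketch a second writer is turned away (the equivalent of BLKPREP_DEFER) when the previous bit value was already set. The quoted code defers on !test_and_set_bit(), which is the opposite sense, so that test may be worth double-checking.
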
> +void sd_zbc_complete(struct scsi_cmnd *cmd,
> +                    unsigned int good_bytes,
> +                    struct scsi_sense_hdr *sshdr)
> +{
> +       int result = cmd->result;
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +
> +       switch (req_op(rq)) {
> +       case REQ_OP_WRITE:
> +       case REQ_OP_WRITE_SAME:
> +
> +               if (result &&
> +                   sshdr->sense_key == ILLEGAL_REQUEST &&
> +                   sshdr->asc == 0x21)
> +                       /*
> +                        * It is unlikely that retrying a write request that
> +                        * failed with any kind of alignment error will result
> +                        * in success. So don't.
> +                        */
> +                       cmd->allowed = 0;
> +
> +               /* Fallthru */
> +
> +       case REQ_OP_ZONE_RESET:
> +
> +               /* Unlock the zone */
> +               clear_bit_unlock(sd_zbc_zone_no(sdkp, blk_rq_pos(rq)),
> +                                sdkp->zones_wlock);
> +               smp_mb__after_atomic();
> +
> +               if (result &&
> +                   sshdr->sense_key == ILLEGAL_REQUEST &&
> +                   sshdr->asc == 0x24)
> +                       /*
> +                        * INVALID FIELD IN CDB error: Reset of a conventional
> +                        * zone was attempted. Nothing to worry about,
> +                        * so be quiet about the error.
> +                        */
> +                       rq->cmd_flags |= REQ_QUIET;
> +
> +               break;
> +
> +       case REQ_OP_ZONE_REPORT:
> +
> +               if (!result)
> +                       sd_zbc_report_zones_complete(cmd, good_bytes);
> +               break;
> +
> +       }
> +}
> +
> +/**
> + * Read zoned block device characteristics (VPD page B6).
> + */
> +static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
> +                                            unsigned char *buf)
> +{
> +
> +       if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64)) {
> +               sd_printk(KERN_NOTICE, sdkp,
> +                         "Unconstrained-read check failed\n");
> +               return -ENODEV;
> +       }
> +
> +       if (sdkp->device->type != TYPE_ZBC) {
> +               /* Host-aware */
> +               sdkp->urswrz = 1;
> +               sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
> +               sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
> +               sdkp->zones_max_open = 0;
> +       } else {
> +               /* Host-managed */
> +               sdkp->urswrz = buf[4] & 1;
> +               sdkp->zones_optimal_open = 0;
> +               sdkp->zones_optimal_nonseq = 0;
> +               sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * Check reported capacity.
> + */
> +static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
> +                                unsigned char *buf)
> +{
> +       sector_t lba;
> +       int ret;
> +
> +       if (sdkp->rc_basis != 0)
> +               return 0;
> +
> +       /* Do a report zone to get the maximum LBA to check capacity */
> +       ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
> +       if (ret)
> +               return ret;
> +
> +       /* The max_lba field is the capacity of this device */
> +       lba = get_unaligned_be64(&buf[8]);
> +       if (lba + 1 == sdkp->capacity)
> +               return 0;
> +
> +       if (sdkp->first_scan)
> +               sd_printk(KERN_WARNING, sdkp,
> +                         "Changing capacity from %llu to max LBA+1 %llu\n",
> +                         (unsigned long long)sdkp->capacity,
> +                         (unsigned long long)lba + 1);
> +       sdkp->capacity = lba + 1;
> +
> +       return 0;
> +}
> +
> +#define SD_ZBC_BUF_SIZE 131072
> +
> +static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
> +{
> +       u64 zone_blocks = 0;
> +       sector_t block = 0;
> +       unsigned char *buf;
> +       unsigned char *rec;
> +       unsigned int buf_len;
> +       unsigned int list_length;
> +       int ret;
> +       u8 same;
> +
> +       sdkp->zone_blocks = 0;
> +
> +       /* Get a buffer */
> +       buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       /* Do a report zone to get the same field */
> +       ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
> +       if (ret)
> +               goto out;
> +
> +       same = buf[4] & 0x0f;
> +       if (same > 0) {
> +               rec = &buf[64];
> +               zone_blocks = get_unaligned_be64(&rec[8]);
> +               goto out;
> +       }
> +
> +       /*
> +        * Check the size of all zones: all zones must be of
> +        * equal size, except the last zone which can be smaller
> +        * than other zones.
> +        */
> +       do {
> +
> +               /* Parse REPORT ZONES header */
> +               list_length = get_unaligned_be32(&buf[0]) + 64;
> +               rec = buf + 64;
> +               if (list_length < SD_ZBC_BUF_SIZE)
> +                       buf_len = list_length;
> +               else
> +                       buf_len = SD_ZBC_BUF_SIZE;
> +
> +               /* Parse zone descriptors */
> +               while (rec < buf + buf_len) {
> +                       zone_blocks = get_unaligned_be64(&rec[8]);
> +                       if (sdkp->zone_blocks == 0) {
> +                               sdkp->zone_blocks = zone_blocks;
> +                       } else if (zone_blocks != sdkp->zone_blocks &&
> +                                  (block + zone_blocks < sdkp->capacity
> +                                   || zone_blocks > sdkp->zone_blocks)) {
> +                               zone_blocks = 0;
> +                               goto out;
> +                       }
> +                       block += zone_blocks;
> +                       rec += 64;
> +               }
> +
> +               if (block < sdkp->capacity) {
> +                       ret = sd_zbc_report_zones(sdkp, buf,
> +                                                 SD_ZBC_BUF_SIZE, block);
> +                       if (ret) {
> +                               kfree(buf);
> +                               return ret;
> +                       }
> +               }
> +
> +       } while (block < sdkp->capacity);
> +
> +       zone_blocks = sdkp->zone_blocks;
> +
> +out:
> +       kfree(buf);
> +
> +       if (!zone_blocks) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Devices with non constant zone "
> +                                 "size are not supported\n");
> +               return -ENODEV;
> +       }
> +
> +       if (!is_power_of_2(zone_blocks)) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Devices with non power of 2 zone "
> +                                 "size are not supported\n");
> +               return -ENODEV;
> +       }
> +
> +       if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Zone size too large\n");
> +               return -ENODEV;
> +       }
> +
> +       sdkp->zone_blocks = zone_blocks;
> +
> +       return 0;
> +}
> +
> +static int sd_zbc_setup(struct scsi_disk *sdkp)
> +{
> +
> +       /* chunk_sectors indicates the zone size */
> +       blk_queue_chunk_sectors(sdkp->disk->queue,
> +                       logical_to_sectors(sdkp->device, sdkp->zone_blocks));
> +       sdkp->zone_shift = ilog2(sdkp->zone_blocks);
> +       sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
> +       if (sdkp->capacity & (sdkp->zone_blocks - 1))
> +               sdkp->nr_zones++;
> +
> +       if (!sdkp->zones_wlock) {
> +               sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
> +                                           sizeof(unsigned long), GFP_KERNEL);
> +               if (!sdkp->zones_wlock)
> +                       return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
> +
> +int sd_zbc_read_zones(struct scsi_disk *sdkp,
> +                     unsigned char *buf)
> +{
> +       sector_t capacity;
> +       int ret = 0;
> +
> +       if (!sd_is_zoned(sdkp))
> +               /*
> +                * Device managed or normal SCSI disk,
> +                * no special handling required
> +                */
> +               return 0;
> +
> +
> +       /* Get zoned block device characteristics */
> +       ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
> +       if (ret)
> +               goto err;
> +
> +       /*
> +        * Check for unconstrained reads: host-managed devices with
> +        * constrained reads (drives failing read after write pointer)
> +        * are not supported.
> +        */
> +       if (!sdkp->urswrz) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                         "constrained reads devices are not supported\n");
> +               ret = -ENODEV;
> +               goto err;
> +       }
> +
> +       /* Check capacity */
> +       ret = sd_zbc_check_capacity(sdkp, buf);
> +       if (ret)
> +               goto err;
> +       capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +
> +       /*
> +        * Check zone size: only devices with a constant zone size (except
> +        * an eventual last runt zone) that is a power of 2 are supported.
> +        */
> +       ret = sd_zbc_check_zone_size(sdkp);
> +       if (ret)
> +               goto err;
> +
> +       /* The drive satisfies the kernel restrictions: set it up */
> +       ret = sd_zbc_setup(sdkp);
> +       if (ret)
> +               goto err;
> +
> +       return 0;
> +
> +err:
> +       sdkp->capacity = 0;
> +
> +       return ret;
> +}
> +
> +void sd_zbc_remove(struct scsi_disk *sdkp)
> +{
> +       kfree(sdkp->zones_wlock);
> +       sdkp->zones_wlock = NULL;
> +}
> +
> +void sd_zbc_print_zones(struct scsi_disk *sdkp)
> +{
> +       if (!sd_is_zoned(sdkp) || !sdkp->capacity)
> +               return;
> +
> +       if (sdkp->capacity & (sdkp->zone_blocks - 1))
> +               sd_printk(KERN_NOTICE, sdkp,
> +                         "%u zones of %u logical blocks + 1 runt zone\n",
> +                         sdkp->nr_zones - 1,
> +                         sdkp->zone_blocks);
> +       else
> +               sd_printk(KERN_NOTICE, sdkp,
> +                         "%u zones of %u logical blocks\n",
> +                         sdkp->nr_zones,
> +                         sdkp->zone_blocks);
> +}
> diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
> index d1defd1..6ba66e0 100644
> --- a/include/scsi/scsi_proto.h
> +++ b/include/scsi/scsi_proto.h
> @@ -299,4 +299,21 @@ struct scsi_lun {
>  #define SCSI_ACCESS_STATE_MASK        0x0f
>  #define SCSI_ACCESS_STATE_PREFERRED   0x80
>
> +/* Reporting options for REPORT ZONES */
> +enum zbc_zone_reporting_options {
> +       ZBC_ZONE_REPORTING_OPTION_ALL = 0,
> +       ZBC_ZONE_REPORTING_OPTION_EMPTY,
> +       ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
> +       ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
> +       ZBC_ZONE_REPORTING_OPTION_CLOSED,
> +       ZBC_ZONE_REPORTING_OPTION_FULL,
> +       ZBC_ZONE_REPORTING_OPTION_READONLY,
> +       ZBC_ZONE_REPORTING_OPTION_OFFLINE,
> +       ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
> +       ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
> +       ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
> +};
> +
> +#define ZBC_REPORT_ZONE_PARTIAL 0x80
> +
>  #endif /* _SCSI_PROTO_H_ */
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
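
For readers following the zone size check and the zone count computation in the patch above, here is a minimal user-space sketch of the same two rules: all zones must have the same size except a possibly smaller last zone, and the zone count for a power-of-2 zone size includes a trailing runt zone when the capacity is not a multiple of the zone size. The names common_zone_size(), is_pow2() and zone_count() are hypothetical and are not part of the patch. The sketch works on an array of zone lengths rather than on REPORT ZONES buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the common zone size (in logical blocks)
 * if all zones have the same size, allowing only the last zone to be
 * smaller. Return 0 if the layout does not satisfy that rule.
 */
static uint64_t common_zone_size(const uint64_t *zone_len, size_t nr)
{
	uint64_t size;
	size_t i;

	if (nr == 0)
		return 0;

	size = zone_len[0];
	for (i = 1; i < nr; i++) {
		if (zone_len[i] == size)
			continue;
		/* Only the last zone may differ, and only by being smaller. */
		if (i != nr - 1 || zone_len[i] > size)
			return 0;
	}

	return size;
}

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

/*
 * Number of zones for a power-of-2 zone size, counting a trailing
 * runt zone when capacity is not a multiple of the zone size.
 */
static uint64_t zone_count(uint64_t capacity, uint64_t zone_blocks)
{
	uint64_t n;

	assert(is_pow2(zone_blocks));

	n = capacity / zone_blocks;
	if (capacity & (zone_blocks - 1))
		n++;

	return n;
}
```

With zone lengths {256, 256, 256, 100}, common_zone_size() returns 256, and zone_count(868, 256) returns 4: three full zones plus one runt zone.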


-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 1/7] block: Add 'zoned' queue limit
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  1:59     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  1:59 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Damien> Add the zoned queue limit to indicate the zoning model of a
Damien> block device.  Defined values are 0 (BLK_ZONED_NONE) for regular
Damien> block devices, 1 (BLK_ZONED_HA) for host-aware zoned block
Damien> devices and 2 (BLK_ZONED_HM) for host-managed zoned block
Damien> devices. The standards-defined drive-managed model is not
Damien> defined here since these block devices do not provide any
Damien> command for accessing zone information. Drive-managed model
Damien> devices will be reported as BLK_ZONED_NONE.

Damien> The helper functions blk_queue_zoned_model and bdev_zoned_model
Damien> return the zoned limit and the functions blk_queue_is_zoned and
Damien> bdev_is_zoned return a boolean for callers to test if a block
Damien> device is zoned.

Damien> The zoned attribute is also exported as a string to applications
Damien> via sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as
Damien> "host-aware" and BLK_ZONED_HM as "host-managed".

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
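
As an illustration of the value-to-string mapping quoted above, here is a minimal user-space sketch. The enum values and the three strings come from the patch description. The helper name zoned_model_name() is hypothetical, and the sketch is not the sysfs code from the series.

```c
/* Values as given in the patch description. */
enum blk_zoned_model {
	BLK_ZONED_NONE = 0,	/* regular or drive-managed device */
	BLK_ZONED_HA = 1,	/* host-aware zoned block device */
	BLK_ZONED_HM = 2,	/* host-managed zoned block device */
};

/* Hypothetical helper: the string exported for each zoning model. */
static const char *zoned_model_name(enum blk_zoned_model model)
{
	switch (model) {
	case BLK_ZONED_HA:
		return "host-aware";
	case BLK_ZONED_HM:
		return "host-managed";
	case BLK_ZONED_NONE:
	default:
		return "none";
	}
}
```

Falling back to "none" for any unexpected value matches the statement that drive-managed devices are reported as BLK_ZONED_NONE.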

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  2:02     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  2:02 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Damien> From: Hannes Reinecke <hare@suse.de>

Damien> The queue limits already have a 'chunk_sectors' setting, so we
Damien> should be presenting it via sysfs.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits()
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  2:02     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  2:02 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff,
	Hannes Reinecke

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 4/7] block: Define zoned block device operations
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  2:03     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  2:03 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff,
	Shaun Tancheff

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 5/7] block: Implement support for zoned block devices
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  2:05     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  2:05 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Damien> Implement zoned block device zone information reporting and
Damien> reset.  Zone information is reported as struct blk_zone. This
Damien> implementation does not differentiate between host-aware and
Damien> host-managed device models and is valid for both. Two functions
Damien> are provided: blkdev_report_zones for discovering the zone
Damien> configuration of a zoned block device, and blkdev_reset_zones
Damien> for resetting the write pointer of sequential zones. The helper
Damien> functions blk_queue_zone_size and bdev_zone_size are also
Damien> provided for, as the names suggest, obtaining the zone size (in
Damien> 512B sectors) of the zones of the device.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
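
Since the zone size is expressed in 512B sectors at the block layer while the device reports it in logical blocks, here is a minimal sketch of that unit conversion. The helper name blocks_to_sectors() is hypothetical. It assumes a logical block size that is a power of two and at least 512 bytes. It is a user-space stand-in, not the helper used in the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a length in logical blocks to a length
 * in 512-byte sectors. Assumes block_size is a power of two >= 512.
 */
static uint64_t blocks_to_sectors(uint64_t blocks, unsigned int block_size)
{
	assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);

	return blocks * (block_size / 512);
}
```

For example, a zone of 65536 logical blocks of 4096 bytes (256 MiB) is 524288 sectors, the same sector count as a zone of 524288 logical blocks of 512 bytes.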

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 6/7] sd: Implement support for ZBC devices
  2016-09-28  8:45   ` Damien Le Moal
@ 2016-09-30  2:37     ` Martin K. Petersen
  -1 siblings, 0 replies; 34+ messages in thread
From: Martin K. Petersen @ 2016-09-30  2:37 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Shaun Tancheff

>>>>> "Damien" == Damien Le Moal <damien.lemoal@hgst.com> writes:

Damien,

Almost there! And A-OK on the read capacity changes.

However:

@@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */

That comment really says: "I am doing confusing stuff that doesn't
follow the normal calling convention in the driver". Plus it's another
case of using block layer sectors where they shouldn't be.

Please just pass the scsi_cmnd to sd_zbc_setup_read_write() like it's done
for sd_zbc_setup_reset_cmnd() and the regular sd_setup_* calls. And then
no commentary is necessary...

+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		if (ret != BLKPREP_OK)
+			return ret;
+	}
+

@@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-09-30  2:37 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-28  8:45 [PATCH v4 0/7] ZBC / Zoned block device support Damien Le Moal
2016-09-28  8:45 ` Damien Le Moal
2016-09-28  8:45 ` [PATCH v4 1/7] block: Add 'zoned' queue limit Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-29  1:32   ` Shaun Tancheff
2016-09-29  1:32     ` Shaun Tancheff
2016-09-30  1:59   ` Martin K. Petersen
2016-09-30  1:59     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-29  1:32   ` Shaun Tancheff
2016-09-30  2:02   ` Martin K. Petersen
2016-09-30  2:02     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits() Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-29  1:33   ` Shaun Tancheff
2016-09-30  2:02   ` Martin K. Petersen
2016-09-30  2:02     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 4/7] block: Define zoned block device operations Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-30  2:03   ` Martin K. Petersen
2016-09-30  2:03     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 5/7] block: Implement support for zoned block devices Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-29  1:34   ` Shaun Tancheff
2016-09-30  2:05   ` Martin K. Petersen
2016-09-30  2:05     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 6/7] sd: Implement support for ZBC devices Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
2016-09-29  1:35   ` Shaun Tancheff
2016-09-30  2:37   ` Martin K. Petersen
2016-09-30  2:37     ` Martin K. Petersen
2016-09-28  8:45 ` [PATCH v4 7/7] blk-zoned: implement ioctls Damien Le Moal
2016-09-28  8:45   ` Damien Le Moal
