* [PATCH v2 0/7] ZBC / Zoned block device support
@ 2016-09-26 11:14 ` Damien Le Moal
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Damien Le Moal

Re-sending due to error in mailing list addresses. My apologies if
you receive multiple copies of the emails.

This series introduces support for zoned block devices. It integrates
earlier submissions by Hannes Reinecke and Shaun Tancheff. Compared to the
previous version of the series, the code was significantly simplified by
limiting support to zoned devices satisfying the following conditions:
1) All zones of the device are the same size, with the exception of a
   possibly smaller last (runt) zone.
2) For host-managed disks, reads must be unrestricted (read commands do not
   fail due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are ignored.

These 2 conditions allowed dropping the zone information cache implemented
in the previous version. This simplifies the code and also reduces memory
consumption at run time. Support for zoned devices now requires only one bit
per zone (less than 8KB in total). This bit field is used to write-lock
zones and prevent the concurrent execution of multiple write commands in
the same zone. This avoids write ordering problems at dispatch time, for
both the simple queue and scsi-mq settings.
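
The sd_zbc.c implementation itself is not quoted in this cover letter. As a
rough illustration of the idea only (the names below are made up and not the
ones used in the patches), the per-zone write lock amounts to something like:

#include <linux/bitops.h>

/*
 * Illustrative sketch only: one bit per zone, set while a write command
 * targets the zone. The dispatch path would take the bit before issuing a
 * write and release it when the command completes.
 */
static inline bool zone_write_trylock(unsigned long *wlock_bitmap,
				      unsigned int zone_no)
{
	/* Returns false if another write already owns the zone. */
	return !test_and_set_bit(zone_no, wlock_bitmap);
}

static inline void zone_write_unlock(unsigned long *wlock_bitmap,
				     unsigned int zone_no)
{
	clear_bit_unlock(zone_no, wlock_bitmap);
}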

The new operations introduced to support zone manipulation were reduced to
only the two main ZBC/ZAC defined commands: REPORT ZONES (REQ_OP_ZONE_REPORT)
and RESET WRITE POINTER (REQ_OP_ZONE_RESET). This brings the total number of
operations defined to 8, which fits in the 3 bits (REQ_OP_BITS) reserved for
the operation code in bio->bi_opf and req->cmd_flags.

Most of the ZBC specific code is kept out of sd.c and implemented in the
new file sd_zbc.c. Similarly, at the block layer, most of the zoned block
device code is implemented in the new blk-zoned.c.

For host-managed zoned block devices, the sequential write constraint of
write pointer zones is exposed to the user. Users of the disk (applications,
file systems or device mappers) must write sequentially within zones. This
means that for raw block device accesses from applications, buffered writes
are unreliable and direct I/O must be used (or buffered writes with O_SYNC).
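
As an illustration of the above (not part of this series), a userspace sketch
of filling one zone with sequential, aligned, direct writes could look like
the following; the buffer size and alignment are made-up example values:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: write zone_bytes starting at zone_start using O_DIRECT. */
static int fill_zone(const char *dev, off_t zone_start, size_t zone_bytes)
{
	const size_t bs = 1024 * 1024;	/* 1 MiB aligned buffer */
	size_t done = 0;
	void *buf;
	int fd;

	fd = open(dev, O_WRONLY | O_DIRECT);
	if (fd < 0)
		return -1;
	if (posix_memalign(&buf, 4096, bs)) {
		close(fd);
		return -1;
	}
	/* buf would be filled with user data here */
	while (done < zone_bytes) {
		ssize_t ret = pwrite(fd, buf, bs, zone_start + done);
		if (ret <= 0)
			break;
		done += ret;	/* offsets only move forward: sequential writes */
	}
	free(buf);
	close(fd);
	return done == zone_bytes ? 0 : -1;
}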

Access to zone manipulation operations is also provided to applications
through a set of new ioctls. This allows applications operating on raw
block devices (e.g. mkfs.xxx) to discover a device's zone layout and
manipulate zone state.
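
For example, discovering the zone layout from userspace could look like the
sketch below. The BLKREPORTZONE ioctl name and the struct blk_zone_report
layout are assumptions based on the new include/uapi/linux/blkzoned.h added
by this series, which is not quoted in this cover letter:

/* Illustrative sketch: print the first 16 zones of a zoned block device. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>	/* assumed: added by this series */

int main(int argc, char **argv)
{
	unsigned int i, nr = 16;
	struct blk_zone_report *rep;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		return 1;
	rep = calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
	if (!rep) {
		close(fd);
		return 1;
	}
	rep->sector = 0;	/* report from the first zone onward */
	rep->nr_zones = nr;
	if (!ioctl(fd, BLKREPORTZONE, rep))
		for (i = 0; i < rep->nr_zones; i++)
			printf("zone %u: start %llu len %llu wp %llu type %u\n", i,
			       (unsigned long long)rep->zones[i].start,
			       (unsigned long long)rep->zones[i].len,
			       (unsigned long long)rep->zones[i].wp,
			       rep->zones[i].type);
	free(rep);
	close(fd);
	return 0;
}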

Damien Le Moal (1):
  block: Add 'zoned' queue limit

Hannes Reinecke (4):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  block: Implement support for zoned block devices
  sd: Implement support for ZBC devices

Shaun Tancheff (2):
  block: Define zoned block device operations
  blk-zoned: implement ioctls

 block/Kconfig                 |   8 +
 block/Makefile                |   1 +
 block/blk-core.c              |   4 +
 block/blk-settings.c          |   5 +
 block/blk-sysfs.c             |  29 +++
 block/blk-zoned.c             | 335 ++++++++++++++++++++++++
 block/ioctl.c                 |   4 +
 drivers/scsi/Makefile         |   1 +
 drivers/scsi/sd.c             |  97 ++++++-
 drivers/scsi/sd.h             |  67 +++++
 drivers/scsi/sd_zbc.c         | 586 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk_types.h     |   2 +
 include/linux/blkdev.h        |  99 +++++++
 include/scsi/scsi_proto.h     |  17 ++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/blkzoned.h | 143 +++++++++++
 include/uapi/linux/fs.h       |   4 +
 17 files changed, 1389 insertions(+), 14 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c
 create mode 100644 include/uapi/linux/blkzoned.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread


* [PATCH v2 1/7] block: Add 'zoned' queue limit
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Damien Le Moal

Add the zoned queue limit to indicate the zoning model of a block device.
Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
1 (BLK_ZONED_HA) for host-aware zoned block devices and 2 (BLK_ZONED_HM)
for host-managed zoned block devices. The drive-managed model defined by
the standards is not represented here since such block devices do not
provide any command for accessing zone information. Drive-managed devices
will be reported as BLK_ZONED_NONE.

The helper functions blk_queue_zoned_model and bdev_zoned_model return
the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
return a boolean for callers to test if a block device is zoned.

The zoned attribute is also exported as a string to applications via
sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
BLK_ZONED_HM as "host-managed".

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-settings.c   |  1 +
 block/blk-sysfs.c      | 18 ++++++++++++++++++
 include/linux/blkdev.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..b1d5b7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->io_opt = 0;
 	lim->misaligned = 0;
 	lim->cluster = 1;
+	lim->zoned = BLK_ZONED_NONE;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..ff9cd9c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+	switch (blk_queue_zoned_model(q)) {
+	case BLK_ZONED_HA:
+		return sprintf(page, "host-aware\n");
+	case BLK_ZONED_HM:
+		return sprintf(page, "host-managed\n");
+	default:
+		return sprintf(page, "none\n");
+	}
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
 	return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
 	.store = queue_store_nonrot,
 };
 
+static struct queue_sysfs_entry queue_zoned_entry = {
+	.attr = {.name = "zoned", .mode = S_IRUGO },
+	.show = queue_zoned_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
 	.attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_nomerges_show,
@@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_nonrot_entry.attr,
+	&queue_zoned_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..f19e16b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -261,6 +261,15 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS	(256)
 #define BLK_SCSI_CMD_PER_LONG	(BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+/*
+ * Zoned block device models (zoned limit).
+ */
+enum blk_zoned_model {
+	BLK_ZONED_NONE,	/* Regular block device */
+	BLK_ZONED_HA,	/* Host-aware zoned block device */
+	BLK_ZONED_HM,	/* Host-managed zoned block device */
+};
+
 struct queue_limits {
 	unsigned long		bounce_pfn;
 	unsigned long		seg_boundary_mask;
@@ -290,6 +299,7 @@ struct queue_limits {
 	unsigned char		cluster;
 	unsigned char		discard_zeroes_data;
 	unsigned char		raid_partial_stripes_expensive;
+	enum blk_zoned_model	zoned;
 };
 
 struct request_queue {
@@ -627,6 +637,23 @@ static inline unsigned int blk_queue_cluster(struct request_queue *q)
 	return q->limits.cluster;
 }
 
+static inline enum blk_zoned_model
+blk_queue_zoned_model(struct request_queue *q)
+{
+	return q->limits.zoned;
+}
+
+static inline bool blk_queue_is_zoned(struct request_queue *q)
+{
+	switch (blk_queue_zoned_model(q)) {
+	case BLK_ZONED_HA:
+	case BLK_ZONED_HM:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
 	return 0;
 }
 
+static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zoned_model(q);
+
+	return BLK_ZONED_NONE;
+}
+
+static inline bool bdev_is_zoned(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_is_zoned(q);
+
+	return false;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH v2 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-sysfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff9cd9c..488c2e2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
 	return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.chunk_sectors, page);
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(queue_io_min(q), page);
@@ -455,6 +460,11 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
 	.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+	.attr = {.name = "chunk_sectors", .mode = S_IRUGO },
+	.show = queue_chunk_sectors_show,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
 	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
 	.show = queue_io_min_show,
@@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
+	&queue_chunk_sectors_entry.attr,
 	&queue_io_min_entry.attr,
 	&queue_io_opt_entry.attr,
 	&queue_discard_granularity_entry.attr,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH v2 3/7] block: update chunk_sectors in blk_stack_limits()
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Hannes Reinecke, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-settings.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b1d5b7f..55369a6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
+	if (b->chunk_sectors)
+		t->chunk_sectors = min_not_zero(t->chunk_sectors,
+						b->chunk_sectors);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH v2 4/7] block: Define zoned block device operations
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Shaun Tancheff, Damien Le Moal

From: Shaun Tancheff <shaun.tancheff@seagate.com>

Define REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET for handling zones of
host-managed and host-aware zoned block devices. With these two
new operations, the total number of operations defined reaches 8 and
still fits in the 3 bits defined by REQ_OP_BITS.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-core.c          | 4 ++++
 include/linux/blk_types.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..e4eda5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1941,6 +1941,10 @@ generic_make_request_checks(struct bio *bio)
 	case REQ_OP_WRITE_SAME:
 		if (!bdev_write_same(bio->bi_bdev))
 			goto not_supported;
+	case REQ_OP_ZONE_REPORT:
+	case REQ_OP_ZONE_RESET:
+		if (!bdev_is_zoned(bio->bi_bdev))
+			goto not_supported;
 		break;
 	default:
 		break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..dd50dce 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -243,6 +243,8 @@ enum req_op {
 	REQ_OP_SECURE_ERASE,	/* request to securely erase sectors */
 	REQ_OP_WRITE_SAME,	/* write same block many times */
 	REQ_OP_FLUSH,		/* request for cache flush */
+	REQ_OP_ZONE_REPORT,	/* Get zone information */
+	REQ_OP_ZONE_RESET,	/* Reset a zone write pointer */
 };
 
 #define REQ_OP_BITS 3
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH v2 5/7] block: Implement support for zoned block devices
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided for obtaining, as their names
suggest, the zone size (in 512B sectors) of the zones of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed the zone cache
         * Implement report zones operation based on earlier proposal
           by Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/Kconfig          |   8 ++
 block/Makefile         |   1 +
 block/blk-zoned.c      | 240 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  71 +++++++++++++++
 4 files changed, 320 insertions(+)
 create mode 100644 block/blk-zoned.c

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
 	T10/SCSI Data Integrity Field or the T13/ATA External Path
 	Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+	bool "Zoned block device support"
+	---help---
+	Block layer zoned block device support. This option enables
+	support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+	Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
 	bool "Block layer bio throttling support"
 	depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..473cb0a
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,240 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/blkdev.h>
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+				      sector_t sector)
+{
+	sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+	return sector & ~zone_mask;
+}
+
+static inline void blkdev_report_to_zone(struct block_device *bdev,
+					 void *rep,
+					 struct blk_zone *zone)
+{
+	sector_t offset = get_start_sect(bdev);
+
+	memcpy(zone, rep, sizeof(struct blk_zone));
+	zone->start -= offset;
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		zone->wp = zone->start + zone->len;
+	else
+		zone->wp -= offset;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:	Target block device
+ * @sector:	Sector from which to report zones
+ * @zones:      Array of zone structures where to return the zones information
+ * @nr_zones:   Number of zone structures in the zone array
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Get zone information starting from the zone containing @sector.
+ *    The number of zone information reported may be less than the number
+ *    requested by @nr_zones. The number of zones actually reported is
+ *    returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+			sector_t sector,
+			struct blk_zone *zones,
+			unsigned int *nr_zones,
+			gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct blk_zone_report_hdr *hdr;
+	unsigned int nrz = *nr_zones;
+	struct page *page;
+	unsigned int nr_rep;
+	size_t rep_bytes;
+	unsigned int nr_pages;
+	struct bio *bio;
+	struct bio_vec *bv;
+	unsigned int i, nz;
+	unsigned int ofst;
+	void *addr;
+	int ret = 0;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (!nrz)
+		return 0;
+
+	if (sector > bdev->bd_part->nr_sects) {
+		*nr_zones = 0;
+		return 0;
+	}
+
+	/*
+	 * The zone report has a header. So make room for it in the
+	 * payload. Also make sure that the report fits in a single BIO
+	 * that will not be split down the stack.
+	 */
+	rep_bytes = sizeof(struct blk_zone_report_hdr) +
+		sizeof(struct blk_zone) * nrz;
+	rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
+	if (rep_bytes > (queue_max_sectors(q) << 9))
+		rep_bytes = queue_max_sectors(q) << 9;
+
+	nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
+			 rep_bytes >> PAGE_SHIFT);
+	nr_pages = min_t(unsigned int, nr_pages,
+			 queue_max_segments(q));
+
+	bio = bio_alloc(gfp_mask, nr_pages);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_bdev = bdev;
+	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_page(gfp_mask);
+		if (!page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
+			__free_page(page);
+			break;
+		}
+	}
+
+	if (i == 0)
+		ret = -ENOMEM;
+	else
+		ret = submit_bio_wait(bio);
+	if (ret)
+		goto out;
+
+	/*
+	 * Process the report result: skip the header and go through the
+	 * reported zones to fix up the zone information for
+	 * partitions. At the same time, return the zone information into
+	 * the zone array.
+	 */
+	nz = 0;
+	nr_rep = 0;
+	bio_for_each_segment_all(bv, bio, i) {
+
+		if (!bv->bv_page)
+			break;
+
+		addr = kmap_atomic(bv->bv_page);
+
+		/* Get header in the first page */
+		ofst = 0;
+		if (!nr_rep) {
+			hdr = (struct blk_zone_report_hdr *) addr;
+			nr_rep = hdr->nr_zones;
+			ofst = sizeof(struct blk_zone_report_hdr);
+		}
+
+		/* Fixup and report zones */
+		while (ofst < bv->bv_len &&
+		       nz < min_t(unsigned int, nr_rep, nrz)) {
+			blkdev_report_to_zone(bdev, addr + ofst, &zones[nz]);
+			ofst += sizeof(struct blk_zone);
+			nz++;
+		}
+
+		kunmap_atomic(addr);
+
+		if (!nr_rep)
+			break;
+
+	}
+
+out:
+	bio_for_each_segment_all(bv, bio, i)
+		__free_page(bv->bv_page);
+	bio_put(bio);
+
+	if (ret == 0)
+		*nr_zones = nz;
+
+	return ret;
+}
+
+/**
+ * blkdev_reset_zones - Reset zones write pointer
+ * @bdev:       Target block device
+ * @sector:     Start sector of the first zone to reset
+ * @nr_sectors: Number of sectors, at least the length of one zone
+ * @gfp_mask:   Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Reset the write pointer of the zones contained in the range
+ *    @sector..@sector+@nr_sectors. Specifying the entire disk sector range
+ *    is valid, but the specified range should not contain conventional zones.
+ */
+int blkdev_reset_zones(struct block_device *bdev,
+		       sector_t sector, sector_t nr_sectors,
+		       gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	sector_t zone_sectors;
+	sector_t end_sector = sector + nr_sectors;
+	struct bio *bio;
+	int ret;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (end_sector > bdev->bd_part->nr_sects)
+		/* Out of range */
+		return -EINVAL;
+
+	/* Check alignment (handle a possibly smaller last zone) */
+	zone_sectors = blk_queue_zone_size(q);
+	if (sector & (zone_sectors - 1))
+		return -EINVAL;
+
+	if ((nr_sectors & (zone_sectors - 1)) &&
+	    end_sector != bdev->bd_part->nr_sects)
+		return -EINVAL;
+
+	while (sector < end_sector) {
+
+		bio = bio_alloc(gfp_mask, 0);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_bdev = bdev;
+		bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+
+		if (ret)
+			return ret;
+
+		sector += zone_sectors;
+
+		/* This may take a while, so be nice to others */
+		cond_resched();
+
+	}
+
+	return 0;
+}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f19e16b..6034f38 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -302,6 +302,62 @@ struct queue_limits {
 	enum blk_zoned_model	zoned;
 };
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+/*
+ * Zone type.
+ */
+enum blk_zone_type {
+	BLK_ZONE_TYPE_UNKNOWN,
+	BLK_ZONE_TYPE_CONVENTIONAL,
+	BLK_ZONE_TYPE_SEQWRITE_REQ,
+	BLK_ZONE_TYPE_SEQWRITE_PREF,
+};
+
+/*
+ * Zone condition.
+ */
+enum blk_zone_cond {
+	BLK_ZONE_COND_NO_WP,
+	BLK_ZONE_COND_EMPTY,
+	BLK_ZONE_COND_IMP_OPEN,
+	BLK_ZONE_COND_EXP_OPEN,
+	BLK_ZONE_COND_CLOSED,
+	BLK_ZONE_COND_READONLY = 0xd,
+	BLK_ZONE_COND_FULL,
+	BLK_ZONE_COND_OFFLINE,
+};
+
+/*
+ * Zone descriptor for BLKREPORTZONE.
+ * start, len and wp use the regular 512 B sector unit,
+ * regardless of the device logical block size. The overall
+ * structure size is 64 B to match the ZBC/ZAC defined zone descriptor
+ * and allow support for future additional zone information.
+ */
+struct blk_zone {
+	u64	start;		/* Zone start sector */
+	u64	len;		/* Zone length in number of sectors */
+	u64	wp;		/* Zone write pointer position */
+	u8	type;		/* Zone type */
+	u8	cond;		/* Zone condition */
+	u8	non_seq;	/* Non-sequential write resources active */
+	u8	reset;		/* Reset write pointer recommended */
+	u8	reserved[36];
+};
+
+struct blk_zone_report_hdr {
+	unsigned int	nr_zones;
+	u8		padding[60];
+};
+
+extern int blkdev_report_zones(struct block_device *,
+				sector_t, struct blk_zone *,
+				unsigned int *, gfp_t);
+extern int blkdev_reset_zones(struct block_device *, sector_t,
+				sector_t, gfp_t);
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 struct request_queue {
 	/*
 	 * Together with queue_head for cacheline sharing
@@ -654,6 +710,11 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
 	}
 }
 
+static inline unsigned int blk_queue_zone_size(struct request_queue *q)
+{
+	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1401,6 +1462,16 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
 	return false;
 }
 
+static inline unsigned int bdev_zone_size(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zone_size(q);
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH v2 6/7] sd: Implement support for ZBC devices
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Hannes Reinecke, Damien Le Moal

From: Hannes Reinecke <hare@suse.de>

Implement ZBC support functions to set up zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the exception of a possibly
   smaller last (runt) zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions will be ignored.

Reading the device capacity triggers the zoned block device
checks. As this needs the zone model of the disk, the call to
sd_read_capacity is moved after the call to
sd_read_block_characteristics so that host-aware devices are
properly detected and initialized. The call to sd_zbc_read_zones
in sd_read_capacity may change the device capacity obtained with
the sd_read_capacity_16 function for devices reporting only the
capacity of conventional zones at the beginning of the LBA range
(i.e. devices with rc_basis set to 0).

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed zone cache support
         * Removed mapping of discard to reset write pointer command
         * Modified sd_zbc_read_zones to include checks that the
           device satisfies the kernel constraints
         * Implemented REPORT ZONES setup and post-processing based
           on code from Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 drivers/scsi/Makefile     |   1 +
 drivers/scsi/sd.c         |  97 ++++++--
 drivers/scsi/sd.h         |  67 ++++++
 drivers/scsi/sd_zbc.c     | 586 ++++++++++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 754 insertions(+), 14 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index fc0d9b8..350513c 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -180,6 +180,7 @@ hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 51e5629..4b3523b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
@@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 	static const char temp[] = "temporary ";
 	int len;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		/* no cache control on RBC devices; theoretically they
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
@@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (sd_is_zoned(sdkp)) {
+		sd_config_discard(sdkp, SD_LBP_DISABLE);
+		return count;
+	}
+
 	if (sdp->type != TYPE_DISK)
 		return -EINVAL;
 
@@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &max);
@@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		if (ret != BLKPREP_OK)
+			return ret;
+	}
+
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sd_is_zoned(sdkp)) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+
 	/*
 	 * If we have a 1K hardware sectorsize, prevent access to single
 	 * 512 byte sectors.  In theory we could handle this - in fact
@@ -1149,6 +1169,10 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
+	case REQ_OP_ZONE_REPORT:
+		return sd_zbc_setup_report_cmnd(cmd);
+	case REQ_OP_ZONE_RESET:
+		return sd_zbc_setup_reset_cmnd(cmd);
 	default:
 		BUG();
 	}
@@ -1780,7 +1804,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	unsigned char op = SCpnt->cmnd[0];
 	unsigned char unmap = SCpnt->cmnd[1] & 8;
 
-	if (req_op(req) == REQ_OP_DISCARD || req_op(req) == REQ_OP_WRITE_SAME) {
+	switch (req_op(req)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_ZONE_RESET:
 		if (!result) {
 			good_bytes = blk_rq_bytes(req);
 			scsi_set_resid(SCpnt, 0);
@@ -1788,6 +1815,17 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			good_bytes = 0;
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
+		break;
+	case REQ_OP_ZONE_REPORT:
+		if (!result) {
+			good_bytes = scsi_bufflen(SCpnt)
+				- scsi_get_resid(SCpnt);
+			scsi_set_resid(SCpnt, 0);
+		} else {
+			good_bytes = 0;
+			scsi_set_resid(SCpnt, blk_rq_bytes(req));
+		}
+		break;
 	}
 
 	if (result) {
@@ -1848,7 +1886,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	default:
 		break;
 	}
+
  out:
+	if (sd_is_zoned(sdkp))
+		sd_zbc_complete(SCpnt, good_bytes, &sshdr);
+
 	SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
 					   "sd_done: completed %d of %d bytes\n",
 					   good_bytes, scsi_bufflen(SCpnt)));
@@ -1983,7 +2025,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
-
 /*
  * Determine whether disk supports Data Integrity Field.
  */
@@ -2133,6 +2174,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	/* Logical blocks per physical block exponent */
 	sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
 
+	/* RC basis */
+	sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
+
 	/* Lowest aligned logical block */
 	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
 	blk_queue_alignment_offset(sdp->request_queue, alignment);
@@ -2323,6 +2367,13 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 		sector_size = 512;
 	}
 	blk_queue_logical_block_size(sdp->request_queue, sector_size);
+	blk_queue_physical_block_size(sdp->request_queue,
+				      sdkp->physical_block_size);
+	sdkp->device->sector_size = sector_size;
+
+	if (sd_zbc_read_zones(sdkp, buffer) < 0)
+		/* The drive zone layout could not be checked */
+		sdkp->capacity = 0;
 
 	{
 		char cap_str_2[10], cap_str_10[10];
@@ -2349,9 +2400,6 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
 	if (sdkp->capacity > 0xffffffff)
 		sdp->use_16_for_rw = 1;
 
-	blk_queue_physical_block_size(sdp->request_queue,
-				      sdkp->physical_block_size);
-	sdkp->device->sector_size = sector_size;
 }
 
 /* called with buffer of length 512 */
@@ -2613,7 +2661,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return;
 
 	if (sdkp->protection_type == 0)
@@ -2720,6 +2768,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
  */
 static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
 	unsigned char *buffer;
 	u16 rot;
 	const int vpd_len = 64;
@@ -2734,10 +2783,21 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	rot = get_unaligned_be16(&buffer[4]);
 
 	if (rot == 1) {
-		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
-		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
+	if (sdkp->zoned == 1)
+		q->limits.zoned = BLK_ZONED_HA;
+	else if (sdkp->device->type == TYPE_ZBC)
+		q->limits.zoned = BLK_ZONED_HM;
+	else
+		q->limits.zoned = BLK_ZONED_NONE;
+	if (blk_queue_is_zoned(q) && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
+		      q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
+
  out:
 	kfree(buffer);
 }
@@ -2836,14 +2896,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 * react badly if we do.
 	 */
 	if (sdkp->media_present) {
-		sd_read_capacity(sdkp, buffer);
-
 		if (scsi_device_supports_vpd(sdp)) {
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
 		}
 
+		sd_read_capacity(sdkp, buffer);
+
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
@@ -3041,9 +3101,16 @@ static int sd_probe(struct device *dev)
 
 	scsi_autopm_get_device(sdp);
 	error = -ENODEV;
-	if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
+	if (sdp->type != TYPE_DISK &&
+	    sdp->type != TYPE_ZBC &&
+	    sdp->type != TYPE_MOD &&
+	    sdp->type != TYPE_RBC)
 		goto out;
 
+#ifndef CONFIG_BLK_DEV_ZONED
+	if (sdp->type == TYPE_ZBC)
+		goto out;
+#endif
 	SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
 					"sd_probe\n"));
 
@@ -3147,6 +3214,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	sd_zbc_remove(sdkp);
+
 	blk_register_region(devt, SD_MINORS, NULL,
 			    sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index c8d9863..0ba47c1 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -64,6 +64,15 @@ struct scsi_disk {
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
+#ifdef CONFIG_BLK_DEV_ZONED
+	unsigned int	nr_zones;
+	sector_t	zone_sectors;
+	unsigned int	zone_shift;
+	unsigned long	*zones_wlock;
+	unsigned int	zones_optimal_open;
+	unsigned int	zones_optimal_nonseq;
+	unsigned int	zones_max_open;
+#endif
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
@@ -94,6 +103,9 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	rc_basis: 2;
+	unsigned	zoned: 2;
+	unsigned	urswrz : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
@@ -156,6 +168,11 @@ static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t b
 	return blocks * sdev->sector_size;
 }
 
+static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
+{
+	return sector >> (ilog2(sdev->sector_size) - 9);
+}
+
 /*
  * Look up the DIX operation based on whether the command is read or
  * write and whether dix and dif are enabled.
@@ -239,4 +256,54 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+static inline int sd_is_zoned(struct scsi_disk *sdkp)
+{
+	return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
+}
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+extern int sd_zbc_read_zones(struct scsi_disk *, unsigned char *);
+extern void sd_zbc_remove(struct scsi_disk *);
+extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
+				   sector_t, unsigned int);
+extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *);
+extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *);
+extern void sd_zbc_complete(struct scsi_cmnd *, unsigned int,
+			    struct scsi_sense_hdr *);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
+				    unsigned char *buf)
+{
+	return 0;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+
+static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
+					  struct request *rq, sector_t sector,
+					  unsigned int num_sectors)
+{
+	/* Let the drive fail requests */
+	return BLKPREP_OK;
+}
+
+static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_KILL;
+}
+
+static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
+				   unsigned int good_bytes,
+				   struct scsi_sense_hdr *sshdr) {}
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
new file mode 100644
index 0000000..2680f51
--- /dev/null
+++ b/drivers/scsi/sd_zbc.c
@@ -0,0 +1,586 @@
+/*
+ * SCSI Zoned Block commands
+ *
+ * Copyright (C) 2014-2015 SUSE Linux GmbH
+ * Written by: Hannes Reinecke <hare@suse.de>
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Modified by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
+
+#include "sd.h"
+#include "scsi_priv.h"
+
+enum zbc_zone_type {
+	ZBC_ZONE_TYPE_CONV = 0x1,
+	ZBC_ZONE_TYPE_SEQWRITE_REQ,
+	ZBC_ZONE_TYPE_SEQWRITE_PREF,
+	ZBC_ZONE_TYPE_RESERVED,
+};
+
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP,
+	ZBC_ZONE_COND_EMPTY,
+	ZBC_ZONE_COND_IMP_OPEN,
+	ZBC_ZONE_COND_EXP_OPEN,
+	ZBC_ZONE_COND_CLOSED,
+	ZBC_ZONE_COND_READONLY = 0xd,
+	ZBC_ZONE_COND_FULL,
+	ZBC_ZONE_COND_OFFLINE,
+};
+
+/**
+ * Convert a zone descriptor to a zone struct.
+ */
+static void sd_zbc_parse_report(struct scsi_disk *sdkp,
+				u8 *buf,
+				struct blk_zone *zone)
+{
+	struct scsi_device *sdp = sdkp->device;
+
+	memset(zone, 0, sizeof(struct blk_zone));
+
+	zone->type = buf[0] & 0x0f;
+	zone->cond = (buf[1] >> 4) & 0xf;
+	if (buf[1] & 0x01)
+		zone->reset = 1;
+	if (buf[1] & 0x02)
+		zone->non_seq = 1;
+
+	zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
+	zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
+	zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
+	if (zone->type != ZBC_ZONE_TYPE_CONV &&
+	    zone->cond == ZBC_ZONE_COND_FULL)
+		zone->wp = zone->start + zone->len;
+}
+
+/**
+ * Issue a REPORT ZONES scsi command.
+ */
+static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			       unsigned int buflen, sector_t start_sector)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout;
+	struct scsi_sense_hdr sshdr;
+	sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
+	unsigned char cmd[16];
+	unsigned int rep_len;
+	int result;
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(start_lba, &cmd[2]);
+	put_unaligned_be32(buflen, &cmd[10]);
+	memset(buf, 0, buflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				  buf, buflen, &sshdr,
+				  timeout, SD_MAX_RETRIES, NULL);
+	if (result) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES lba %llu failed with %d/%d\n",
+			  (unsigned long long)start_lba,
+			  host_byte(result), driver_byte(result));
+		return -EIO;
+	}
+
+	rep_len = get_unaligned_be32(&buf[0]);
+	if (rep_len < 64) {
+		sd_printk(KERN_ERR, sdkp,
+			  "REPORT ZONES report invalid length %u\n",
+			  rep_len);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t lba, sector = blk_rq_pos(rq);
+	unsigned int nr_bytes = blk_rq_bytes(rq);
+	int ret;
+
+	WARN_ON(nr_bytes == 0);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	ret = scsi_init_io(cmd);
+	if (ret != BLKPREP_OK)
+		return ret;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_IN;
+	cmd->cmnd[1] = ZI_REPORT_ZONES;
+	lba = sectors_to_logical(sdkp->device, sector);
+	put_unaligned_be64(lba, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+	/* Do a partial report to speed things up */
+	cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
+
+	cmd->sc_data_direction = DMA_FROM_DEVICE;
+	cmd->sdb.length = nr_bytes;
+	cmd->transfersize = sdkp->device->sector_size;
+	cmd->allowed = 0;
+
+	/*
+	 * A report may return fewer bytes than requested. Make sure
+	 * to report completion on the entire initial request.
+	 */
+	rq->__data_len = nr_bytes;
+
+	return BLKPREP_OK;
+}
+
+static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
+					 unsigned int good_bytes)
+{
+	struct request *rq = scmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct sg_mapping_iter miter;
+	struct blk_zone_report_hdr hdr;
+	struct blk_zone zone;
+	unsigned int offset, bytes = 0;
+	unsigned long flags;
+	u8 *buf;
+
+	if (good_bytes < 64)
+		return;
+
+	memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
+
+	sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
+		       SG_MITER_TO_SG | SG_MITER_ATOMIC);
+
+	local_irq_save(flags);
+	while (sg_miter_next(&miter) && bytes < good_bytes) {
+
+		buf = miter.addr;
+		offset = 0;
+
+		if (bytes == 0) {
+			/* Set the report header */
+			hdr.nr_zones = min_t(unsigned int,
+					 (good_bytes - 64) / 64,
+					 get_unaligned_be32(&buf[0]) / 64);
+			memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
+			offset += 64;
+			bytes += 64;
+		}
+
+		/* Parse zone descriptors */
+		while (offset < miter.length && hdr.nr_zones) {
+			WARN_ON(offset > miter.length);
+			buf = miter.addr + offset;
+			sd_zbc_parse_report(sdkp, buf, &zone);
+			memcpy(buf, &zone, sizeof(struct blk_zone));
+			offset += 64;
+			bytes += 64;
+			hdr.nr_zones--;
+		}
+
+		if (!hdr.nr_zones)
+			break;
+
+	}
+	sg_miter_stop(&miter);
+	local_irq_restore(flags);
+}
+
+int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+	unsigned int zone_no = sector >> sdkp->zone_shift;
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLKPREP_KILL;
+
+	if (sector & (sdkp->zone_sectors - 1))
+		/* Unaligned request */
+		return BLKPREP_KILL;
+
+	/* Do not allow concurrent reset and writes */
+	if (test_and_set_bit(zone_no, sdkp->zones_wlock))
+		return BLKPREP_DEFER;
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_OUT;
+	cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+	put_unaligned_be64(sectors_to_logical(sdkp->device, sector),
+			   &cmd->cmnd[2]);
+
+	rq->timeout = SD_TIMEOUT;
+	cmd->sc_data_direction = DMA_NONE;
+	cmd->transfersize = 0;
+	cmd->allowed = 0;
+
+	return BLKPREP_OK;
+}
+
+int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+			    sector_t sector, unsigned int sectors)
+{
+	sector_t zone_ofst = sector & (sdkp->zone_sectors - 1);
+
+	/* Do not allow zone boundaries crossing */
+	if (zone_ofst + sectors > sdkp->zone_sectors)
+		return BLKPREP_KILL;
+
+	/*
+	 * Do not issue more than one write at a time per
+	 * zone. This solves write ordering problems due to
+	 * the unlocking of the request queue in the dispatch
+	 * path in the non scsi-mq case. For scsi-mq, this
+	 * also avoids potential write reordering when multiple
+	 * threads running on different CPUs write to the same
+	 * zone (with a synchronized sequential pattern).
+	 */
+	if (req_op(rq) == REQ_OP_WRITE ||
+	    req_op(rq) == REQ_OP_WRITE_SAME) {
+		unsigned int zone_no = sector >> sdkp->zone_shift;
+		if (test_and_set_bit(zone_no, sdkp->zones_wlock))
+			return BLKPREP_DEFER;
+	}
+
+	return BLKPREP_OK;
+}
+
+void sd_zbc_complete(struct scsi_cmnd *cmd,
+		     unsigned int good_bytes,
+		     struct scsi_sense_hdr *sshdr)
+{
+	int result = cmd->result;
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	unsigned int zone_no;
+
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_SAME:
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x21)
+			/*
+			 * It is unlikely that retrying write requests that
+			 * failed with any kind of alignment error will
+			 * succeed. So don't.
+			 */
+			cmd->allowed = 0;
+
+		/* Fallthru */
+
+	case REQ_OP_ZONE_RESET:
+
+		/* Unlock the zone */
+		zone_no = blk_rq_pos(rq) >> sdkp->zone_shift;
+		clear_bit_unlock(zone_no, sdkp->zones_wlock);
+		smp_mb__after_atomic();
+
+		if (result &&
+		    sshdr->sense_key == ILLEGAL_REQUEST &&
+		    sshdr->asc == 0x24)
+			/*
+			 * INVALID FIELD IN CDB error: Reset of a conventional
+			 * zone was attempted. Nothing to worry about,
+			 * so be quiet about the error.
+			 */
+			rq->cmd_flags |= REQ_QUIET;
+
+		break;
+
+	case REQ_OP_ZONE_REPORT:
+
+		if (!result)
+			sd_zbc_report_zones_complete(cmd, good_bytes);
+		break;
+
+	}
+}
+
+/**
+ * Read zoned block device characteristics (VPD page B6).
+ */
+static int sd_zbc_read_zoned_charateristics(struct scsi_disk *sdkp,
+					    unsigned char *buf)
+{
+
+	if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64))
+		return -ENODEV;
+
+	if (sdkp->device->type != TYPE_ZBC) {
+		/* Host-aware */
+		sdkp->urswrz = 1;
+		sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
+		sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
+		sdkp->zones_max_open = 0;
+	} else {
+		/* Host-managed */
+		sdkp->urswrz = buf[4] & 1;
+		sdkp->zones_optimal_open = 0;
+		sdkp->zones_optimal_nonseq = 0;
+		sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
+	}
+
+	return 0;
+}
+
+/**
+ * Check reported capacity.
+ */
+static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
+				 unsigned char *buf)
+{
+	sector_t lba;
+	int ret;
+
+	if (sdkp->rc_basis != 0)
+		return 0;
+
+	/* Do a report zone to get the maximum LBA to check capacity */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
+	if (ret)
+		return ret;
+
+	/* The max_lba field is the last LBA of the device (capacity - 1) */
+	lba = get_unaligned_be64(&buf[8]);
+	if (lba + 1 > sdkp->capacity) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_WARNING, sdkp,
+				  "Changing capacity from %zu "
+				  "to max LBA+1 %llu\n",
+				  sdkp->capacity,
+				  (unsigned long long)lba + 1);
+		sdkp->capacity = lba + 1;
+	}
+
+	return 0;
+}
+
+#define SD_ZBC_BUF_SIZE 131072
+
+static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+{
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+	struct blk_zone zone;
+	sector_t sector = 0;
+	unsigned char *buf;
+	unsigned char *rec;
+	unsigned int buf_len;
+	unsigned int list_length;
+	int ret;
+	u8 same;
+
+	/* Get a buffer */
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Do a report zone to get the same field */
+	ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
+	if (ret)
+		goto out;
+
+	same = buf[4] & 0x0f;
+	if (same > 0) {
+		rec = &buf[64];
+		sdkp->zone_sectors = logical_to_sectors(sdkp->device,
+						get_unaligned_be64(&rec[8]));
+		goto out;
+	}
+
+	/* Check the size of all zones */
+	sdkp->zone_sectors = (sector_t)-1;
+	do {
+
+		/* Parse REPORT ZONES header */
+		list_length = get_unaligned_be32(&buf[0]) + 64;
+		rec = buf + 64;
+		if (list_length < SD_ZBC_BUF_SIZE)
+			buf_len = list_length;
+		else
+			buf_len = SD_ZBC_BUF_SIZE;
+
+		/* Parse zone descriptors */
+		while (rec < buf + buf_len) {
+			sd_zbc_parse_report(sdkp, rec, &zone);
+			if (sdkp->zone_sectors == (sector_t)-1) {
+				sdkp->zone_sectors = zone.len;
+			} else if (sector + zone.len < capacity &&
+				   zone.len != sdkp->zone_sectors) {
+				sdkp->zone_sectors = 0;
+				goto out;
+			}
+			sector += zone.len;
+			rec += 64;
+		}
+
+		if (sector < capacity) {
+			ret = sd_zbc_report_zones(sdkp, buf,
+					SD_ZBC_BUF_SIZE, sector);
+			if (ret)
+				return ret;
+		}
+
+	} while (sector < capacity);
+
+out:
+	kfree(buf);
+
+	if (!sdkp->zone_sectors) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non-constant zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if (!is_power_of_2(sdkp->zone_sectors)) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Devices with non-power-of-2 zone "
+				  "size are not supported\n");
+		return -ENODEV;
+	}
+
+	if ((sdkp->zone_sectors << 9) > UINT_MAX) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Zone size too large\n");
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int sd_zbc_setup(struct scsi_disk *sdkp)
+{
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+
+	/* chunk_sectors indicates the zone size */
+	blk_queue_chunk_sectors(sdkp->disk->queue, sdkp->zone_sectors);
+	sdkp->zone_shift = ilog2(sdkp->zone_sectors);
+	sdkp->nr_zones = capacity >> sdkp->zone_shift;
+	if (capacity & (sdkp->zone_sectors - 1))
+		sdkp->nr_zones++;
+
+	if (!sdkp->zones_wlock) {
+		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
+					    sizeof(unsigned long), GFP_KERNEL);
+		if (!sdkp->zones_wlock)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int sd_zbc_read_zones(struct scsi_disk *sdkp,
+		      unsigned char *buf)
+{
+	sector_t capacity;
+	int ret = 0;
+
+	if (!sd_is_zoned(sdkp))
+		/*
+		 * Device managed or normal SCSI disk,
+		 * no special handling required
+		 */
+		return 0;
+
+
+	/* Get zoned block device characteristics */
+	ret = sd_zbc_read_zoned_charateristics(sdkp, buf);
+	if (ret)
+		return ret;
+
+	/*
+	 * Check for unconstrained reads: host-managed devices with
+	 * constrained reads (drives that fail reads past the write pointer)
+	 * are not supported.
+	 */
+	if (!sdkp->urswrz) {
+		if (sdkp->first_scan)
+			sd_printk(KERN_NOTICE, sdkp,
+			  "devices with constrained reads are not supported\n");
+		return -ENODEV;
+	}
+
+	/* Check capacity */
+	ret = sd_zbc_check_capacity(sdkp, buf);
+	if (ret)
+		return ret;
+	capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+
+	/*
+	 * Check zone size: only devices with a constant zone size (except
+	 * an eventual last runt zone) that is a power of 2 are supported.
+	 */
+	ret = sd_zbc_check_zone_size(sdkp);
+	if (ret)
+		return ret;
+
+	/* The drive satisfies the kernel restrictions: set it up */
+	ret = sd_zbc_setup(sdkp);
+	if (ret)
+		return ret;
+
+	if (sdkp->first_scan) {
+		if (sdkp->nr_zones * sdkp->zone_sectors == capacity)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "%u zones of %llu sectors\n",
+				  sdkp->nr_zones,
+				  (unsigned long long)sdkp->zone_sectors);
+		else
+			sd_printk(KERN_NOTICE, sdkp,
+				  "%u zones of %llu sectors "
+				  "+ 1 runt zone\n",
+				  sdkp->nr_zones - 1,
+				  (unsigned long long)sdkp->zone_sectors);
+	}
+
+	return 0;
+}
+
+void sd_zbc_remove(struct scsi_disk *sdkp)
+{
+	kfree(sdkp->zones_wlock);
+	sdkp->zones_wlock = NULL;
+}
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index d1defd1..6ba66e0 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -299,4 +299,21 @@ struct scsi_lun {
 #define SCSI_ACCESS_STATE_MASK        0x0f
 #define SCSI_ACCESS_STATE_PREFERRED   0x80
 
+/* Reporting options for REPORT ZONES */
+enum zbc_zone_reporting_options {
+	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED,
+	ZBC_ZONE_REPORTING_OPTION_FULL,
+	ZBC_ZONE_REPORTING_OPTION_READONLY,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
+	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+};
+
+#define ZBC_REPORT_ZONE_PARTIAL 0x80
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 7/7] blk-zoned: implement ioctls
  2016-09-26 11:14 ` Damien Le Moal
@ 2016-09-26 11:14   ` Damien Le Moal
  -1 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 11:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Shaun Tancheff, Damien Le Moal

From: Shaun Tancheff <shaun.tancheff@seagate.com>

Adds the new BLKREPORTZONE and BLKRESETZONE ioctls for respectively
obtaining the zone configuration of a zoned block device and resetting
the write pointer of sequential zones of a zoned block device.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.
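
For illustration only, a minimal user-space sketch of how the two new
ioctls might be driven (assuming the blk_zone_report/blk_zone_range
layouts and ioctl numbers from the blkzoned.h header added below;
error handling trimmed, and note that both ioctls need CAP_SYS_ADMIN
and BLKRESETZONE a writable open):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
	struct blk_zone_report *rep;
	struct blk_zone_range range;
	unsigned int i, nr = 16;
	int fd, ret;

	if (argc != 2)
		return 1;

	/* BLKRESETZONE requires the device to be open for writing */
	fd = open(argv[1], O_RDWR);
	if (fd < 0)
		return 1;

	/* The zone array must immediately follow the report header */
	rep = calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
	if (!rep)
		return 1;
	rep->sector = 0;
	rep->nr_zones = nr;

	ret = ioctl(fd, BLKREPORTZONE, rep);
	if (ret)
		goto out;

	for (i = 0; i < rep->nr_zones; i++) {
		struct blk_zone *z = &rep->zones[i];

		printf("Zone %u: start %llu, len %llu, wp %llu, cond 0x%x\n",
		       i, (unsigned long long)z->start,
		       (unsigned long long)z->len,
		       (unsigned long long)z->wp, z->cond);

		/* Reset the write pointer of the first sequential zone */
		if (z->type != BLK_ZONE_TYPE_CONVENTIONAL) {
			range.sector = z->start;
			range.nr_sectors = z->len;
			ret = ioctl(fd, BLKRESETZONE, &range);
			break;
		}
	}
out:
	free(rep);
	close(fd);
	return ret ? 1 : 0;
}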

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
 block/blk-zoned.c             |  95 ++++++++++++++++++++++++++++
 block/ioctl.c                 |   4 ++
 include/linux/blkdev.h        |  65 +++++++------------
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/blkzoned.h | 143 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h       |   4 ++
 6 files changed, 270 insertions(+), 42 deletions(-)
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 473cb0a..8c70bd6 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/rbtree.h>
 #include <linux/blkdev.h>
+#include <linux/blkzoned.h>
 
 static inline sector_t blk_zone_start(struct request_queue *q,
 				      sector_t sector)
@@ -238,3 +239,97 @@ int blkdev_reset_zones(struct block_device *bdev,
 
 	return 0;
 }
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+			      unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct request_queue *q;
+	struct blk_zone_report rep;
+	struct blk_zone *zones;
+	int ret;
+
+	if (!argp)
+		return -EINVAL;
+
+	q = bdev_get_queue(bdev);
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -ENOTTY;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+		return -EFAULT;
+
+	if (!rep.nr_zones)
+		return -EINVAL;
+
+	zones = kzalloc(sizeof(struct blk_zone) * rep.nr_zones,
+			GFP_KERNEL);
+	if (!zones)
+		return -ENOMEM;
+
+	ret = blkdev_report_zones(bdev, rep.sector,
+				  zones, &rep.nr_zones,
+				  GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (rep.nr_zones) {
+		if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+				 sizeof(struct blk_zone) * rep.nr_zones))
+			ret = -EFAULT;
+	}
+
+ out:
+	kfree(zones);
+
+	return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+			     unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct request_queue *q;
+	struct blk_zone_range zrange;
+
+	if (!argp)
+		return -EINVAL;
+
+	q = bdev_get_queue(bdev);
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -ENOTTY;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+		return -EFAULT;
+
+	return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+				  GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 				BLKDEV_DISCARD_SECURE);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
+	case BLKREPORTZONE:
+		return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+	case BLKRESETZONE:
+		return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
 	case HDIO_GETGEO:
 		return blkdev_getgeo(bdev, argp);
 	case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6034f38..0a75285 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -304,48 +305,6 @@ struct queue_limits {
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
-/*
- * Zone type.
- */
-enum blk_zone_type {
-	BLK_ZONE_TYPE_UNKNOWN,
-	BLK_ZONE_TYPE_CONVENTIONAL,
-	BLK_ZONE_TYPE_SEQWRITE_REQ,
-	BLK_ZONE_TYPE_SEQWRITE_PREF,
-};
-
-/*
- * Zone condition.
- */
-enum blk_zone_cond {
-	BLK_ZONE_COND_NO_WP,
-	BLK_ZONE_COND_EMPTY,
-	BLK_ZONE_COND_IMP_OPEN,
-	BLK_ZONE_COND_EXP_OPEN,
-	BLK_ZONE_COND_CLOSED,
-	BLK_ZONE_COND_READONLY = 0xd,
-	BLK_ZONE_COND_FULL,
-	BLK_ZONE_COND_OFFLINE,
-};
-
-/*
- * Zone descriptor for BLKREPORTZONE.
- * start, len and wp use the regulare 512 B sector unit,
- * regardless of the device logical block size. The overall
- * structure size is 64 B to match the ZBC/ZAC defined zone descriptor
- * and allow support for future additional zone information.
- */
-struct blk_zone {
-	u64	start;		/* Zone start sector */
-	u64	len;		/* Zone length in number of sectors */
-	u64	wp;		/* Zone write pointer position */
-	u8	type;		/* Zone type */
-	u8	cond;		/* Zone condition */
-	u8	non_seq;	/* Non-sequential write resources active */
-	u8	reset;		/* Reset write pointer recommended */
-	u8	reserved[36];
-};
-
 struct blk_zone_report_hdr {
 	unsigned int	nr_zones;
 	u8		padding[60];
@@ -356,6 +315,28 @@ extern int blkdev_report_zones(struct block_device *,
 				unsigned int *, gfp_t);
 extern int blkdev_reset_zones(struct block_device *, sector_t,
 				sector_t, gfp_t);
+
+extern int blkdev_report_zones_ioctl(struct block_device *, fmode_t,
+				     unsigned int, unsigned long);
+extern int blkdev_reset_zones_ioctl(struct block_device *, fmode_t,
+				    unsigned int, unsigned long);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int blkdev_report_zones_ioctl(struct block_device *bdev,
+					    fmode_t mode, unsigned int cmd,
+					    unsigned long arg)
+{
+	return -ENOTTY;
+}
+
+static inline int blkdev_reset_zones_ioctl(struct block_device *bdev,
+					   fmode_t mode, unsigned int cmd,
+					   unsigned long arg)
+{
+	return -ENOTTY;
+}
+
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 struct request_queue {
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index dd60439..92466a6 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned.h
 header-y += bpf_common.h
 header-y += bpf_perf_event.h
 header-y += bpf.h
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
new file mode 100644
index 0000000..40d1d7b
--- /dev/null
+++ b/include/uapi/linux/blkzoned.h
@@ -0,0 +1,143 @@
+/*
+ * Zoned block devices handling.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Copyright (C) 2016 Western Digital
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef _UAPI_BLKZONED_H
+#define _UAPI_BLKZONED_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/**
+ * enum blk_zone_type - Types of zones allowed in a zoned device.
+ *
+ * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
+ *                              randomly. Zone reset has no effect on the zone.
+ * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
+ * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
+ *
+ * Any other value not defined is reserved and must be considered as invalid.
+ */
+enum blk_zone_type {
+	BLK_ZONE_TYPE_CONVENTIONAL	= 0x1,
+	BLK_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	BLK_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+};
+
+/**
+ * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
+ *
+ * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
+ * @BLK_ZONE_COND_EMPTY: The zone is empty.
+ * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
+ * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
+ *                          OPEN ZONE command.
+ * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
+ * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a zone
+ *                      FINISH ZONE command.
+ * @BLK_ZONE_COND_READONLY: The zone is read-only.
+ * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
+ *
+ * The Zone Condition state machine in the ZBC/ZAC standards maps the above
+ * definitions as:
+ *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
+ *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
+ *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
+ *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
+ *   - ZC5: Full          | BLK_ZONE_COND_FULL
+ *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
+ *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
+ *
+ * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
+ * be considered invalid.
+ */
+enum blk_zone_cond {
+	BLK_ZONE_COND_NOT_WP	= 0x0,
+	BLK_ZONE_COND_EMPTY	= 0x1,
+	BLK_ZONE_COND_IMP_OPEN	= 0x2,
+	BLK_ZONE_COND_EXP_OPEN	= 0x3,
+	BLK_ZONE_COND_CLOSED	= 0x4,
+	BLK_ZONE_COND_READONLY	= 0xD,
+	BLK_ZONE_COND_FULL	= 0xE,
+	BLK_ZONE_COND_OFFLINE	= 0xF,
+};
+
+/**
+ * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
+ *
+ * @start: Zone start in 512 B sector units
+ * @len: Zone length in 512 B sector units
+ * @wp: Zone write pointer location in 512 B sector units
+ * @type: see enum blk_zone_type for possible values
+ * @cond: see enum blk_zone_cond for possible values
+ * @non_seq: Flag indicating that the zone is using non-sequential resources
+ *           (for host-aware zoned block devices only).
+ * @reset: Flag indicating that a zone reset is recommended.
+ * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
+ *
+ * start, len and wp use the regular 512 B sector unit, regardless of the
+ * device logical block size. The overall structure size is 64 B to match the
+ * ZBC/ZAC defined zone descriptor and allow support for future additional
+ * zone information.
+ */
+struct blk_zone {
+	__u64	start;		/* Zone start sector */
+	__u64	len;		/* Zone length in number of sectors */
+	__u64	wp;		/* Zone write pointer position */
+	__u8	type;		/* Zone type */
+	__u8	cond;		/* Zone condition */
+	__u8	non_seq;	/* Non-sequential write resources active */
+	__u8	reset;		/* Reset write pointer recommended */
+	__u8	reserved[36];
+};
+
+/**
+ * struct blk_zone_report - BLKREPORTZONE ioctl request/reply
+ *
+ * @sector: starting sector of report
+ * @nr_zones: IN maximum / OUT actual number of zones reported
+ * @reserved: padding to 16 byte alignment
+ * @zones: Space to hold @nr_zones struct blk_zone entries on reply.
+ *
+ * An array of at most @nr_zones entries must follow this structure in memory.
+ */
+struct blk_zone_report {
+	__u64		sector;
+	__u32		nr_zones;
+	__u8		reserved[4];
+	struct blk_zone zones[0];
+} __packed;
+
+/**
+ * struct blk_zone_range - BLKRESETZONE ioctl request
+ * @sector: Starting sector of the first zone to reset
+ * @nr_sectors: Total number of sectors of 1 or more zones to reset
+ */
+struct blk_zone_range {
+	__u64		sector;
+	__u64		nr_sectors;
+};
+
+/**
+ * Zoned block device ioctl's:
+ *
+ * @BLKREPORTZONE: Get zone information. Takes a zone report as argument.
+ *                 The zone report will start from the zone containing the
+ *                 sector specified in the report request structure.
+ * @BLKRESETZONE: Reset the write pointer of the zones in the specified
+ *                sector range. The sector range must be zone aligned.
+ */
+#define BLKREPORTZONE	_IOWR(0x12, 130, struct blk_zone_report)
+#define BLKRESETZONE	_IOW(0x12, 131, struct blk_zone_range)
+
+#endif /* _UAPI_BLKZONED_H */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3b00f7c..e0fc7f0 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -222,6 +222,10 @@ struct fsxattr {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+/*
+ * A jump here: 130-131 are reserved for zoned block devices
+ * (see uapi/linux/blkzoned.h)
+ */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] blk-zoned: implement ioctls
  2016-09-26 11:14   ` Damien Le Moal
  (?)
@ 2016-09-26 16:37   ` Christoph Hellwig
  2016-09-26 23:12     ` Shaun Tancheff
  2016-09-26 23:30       ` Damien Le Moal
  -1 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2016-09-26 16:37 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Shaun Tancheff

> +	zones = kzalloc(sizeof(struct blk_zone) * rep.nr_zones,
> +			GFP_KERNEL);
> +	if (!zones)
> +		return -ENOMEM;

This should use kcalloc to get us underflow checking for the user
controlled allocation size.
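
For illustration, the suggested form would be along these lines (a
sketch of the reviewer's point only, not a replacement hunk):

	zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
	if (!zones)
		return -ENOMEM;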

> +	if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (rep.nr_zones) {
> +		if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
> +				 sizeof(struct blk_zone) * rep.nr_zones))
> +			ret = -EFAULT;
> +	}

We could actually do this with a single big copy_to_user.  Not that
it really matters, though..

> -/*
> - * Zone type.
> - */
> -enum blk_zone_type {
> -	BLK_ZONE_TYPE_UNKNOWN,
> -	BLK_ZONE_TYPE_CONVENTIONAL,
> -	BLK_ZONE_TYPE_SEQWRITE_REQ,
> -	BLK_ZONE_TYPE_SEQWRITE_PREF,
> -};

Please don't move this code around after it was added just two
patches earlier.  I'd say just split adding the new blkzoned.h
uapi header into a patch of it's own and add that before the
core block code.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] blk-zoned: implement ioctls
  2016-09-26 16:37   ` Christoph Hellwig
@ 2016-09-26 23:12     ` Shaun Tancheff
  2016-09-27 18:24       ` Christoph Hellwig
  2016-09-26 23:30       ` Damien Le Moal
  1 sibling, 1 reply; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-26 23:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Damien Le Moal, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, Martin K . Petersen

On Mon, Sep 26, 2016 at 11:37 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> +     zones = kzalloc(sizeof(struct blk_zone) * rep.nr_zones,
>> +                     GFP_KERNEL);
>> +     if (!zones)
>> +             return -ENOMEM;
>
> This should use kcalloc to get us underflow checking for the user
> controlled allocation size.

Ah. yes. Will fix that.

>> +     if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
>> +             ret = -EFAULT;
>> +             goto out;
>> +     }
>> +
>> +     if (rep.nr_zones) {
>> +             if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
>> +                              sizeof(struct blk_zone) * rep.nr_zones))
>> +                     ret = -EFAULT;
>> +     }
>
> We could actually do this with a single big copy_to_user.  Not that
> it really matters, though..

Except our source locations are disjoint (stack and kcalloc'd).

>> -/*
>> - * Zone type.
>> - */
>> -enum blk_zone_type {
>> -     BLK_ZONE_TYPE_UNKNOWN,
>> -     BLK_ZONE_TYPE_CONVENTIONAL,
>> -     BLK_ZONE_TYPE_SEQWRITE_REQ,
>> -     BLK_ZONE_TYPE_SEQWRITE_PREF,
>> -};
>
> Please don't move this code around after it was added just two
> patches earlier.  I'd say just split adding the new blkzoned.h
> uapi header into a patch of it's own and add that before the
> core block code.

Sounds good. Will reshuffle the patchset tonight.

Thanks!
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] blk-zoned: implement ioctls
@ 2016-09-26 23:30       ` Damien Le Moal
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2016-09-26 23:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Shaun Tancheff


Christoph,

On 9/27/16 01:37, Christoph Hellwig wrote:
>> -/*
>> - * Zone type.
>> - */
>> -enum blk_zone_type {
>> -	BLK_ZONE_TYPE_UNKNOWN,
>> -	BLK_ZONE_TYPE_CONVENTIONAL,
>> -	BLK_ZONE_TYPE_SEQWRITE_REQ,
>> -	BLK_ZONE_TYPE_SEQWRITE_PREF,
>> -};
> 
> Please don't move this code around after it was added just two
> patches earlier.  I'd say just split adding the new blkzoned.h
> uapi header into a patch of it's own and add that before the
> core block code.

Or we could just simply merge patches 5 and 7... Even more simple.
Would that be OK ? Shaun, any objection ?

Best regards.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa, 
Kanagawa, 252-0888 Japan
www.hgst.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] blk-zoned: implement ioctls
@ 2016-09-26 23:58         ` Shaun Tancheff
  0 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-26 23:58 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, Martin K . Petersen

No objection here.

On Mon, Sep 26, 2016 at 6:30 PM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
>
> Christoph,
>
> On 9/27/16 01:37, Christoph Hellwig wrote:
>>> -/*
>>> - * Zone type.
>>> - */
>>> -enum blk_zone_type {
>>> -    BLK_ZONE_TYPE_UNKNOWN,
>>> -    BLK_ZONE_TYPE_CONVENTIONAL,
>>> -    BLK_ZONE_TYPE_SEQWRITE_REQ,
>>> -    BLK_ZONE_TYPE_SEQWRITE_PREF,
>>> -};
>>
>> Please don't move this code around after it was added just two
>> patches earlier.  I'd say just split adding the new blkzoned.h
>> uapi header into a patch of it's own and add that before the
>> core block code.
>
> Or we could just simply merge patches 5 and 7... Even more simple.
> Would that be OK ? Shaun, any objection ?
>
> Best regards.
>
> --
> Damien Le Moal, Ph.D.
> Sr. Manager, System Software Group, HGST Research,
> HGST, a Western Digital brand
> Damien.LeMoal@hgst.com
> (+81) 0466-98-3593 (ext. 513593)
> 1 kirihara-cho, Fujisawa,
> Kanagawa, 252-0888 Japan
> www.hgst.com
> Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
>
> This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.
>



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/7] blk-zoned: implement ioctls
  2016-09-26 23:12     ` Shaun Tancheff
@ 2016-09-27 18:24       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2016-09-27 18:24 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Christoph Hellwig, Damien Le Moal, Jens Axboe, linux-block,
	linux-scsi, Christoph Hellwig, Martin K . Petersen

On Mon, Sep 26, 2016 at 06:12:24PM -0500, Shaun Tancheff wrote:
> Except our source locations are disjoint (stack and kcalloc'd).

Indeed.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] block: Implement support for zoned block devices
  2016-09-26 11:14   ` Damien Le Moal
  (?)
@ 2016-09-27 18:51   ` Shaun Tancheff
  -1 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-27 18:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hannes Reinecke, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, shaun.tancheff, shaun

From: Hannes Reinecke <hare@suse.de>

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided for, as their names suggest,
obtaining the zone size (in 512 B sectors) of the zones of the device.
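
As a rough illustration of the intended in-kernel usage, a hypothetical
caller (sketch only, not part of this patch) could walk all zones of a
zoned block device and reset every sequential zone reported as full,
using only the two prototypes exported here:

/*
 * Hypothetical example: assumes <linux/blkdev.h> and <linux/slab.h>
 * are included by the caller's file.
 */
static int reset_full_zones(struct block_device *bdev)
{
	unsigned int i, nr, nr_zones = 128;
	struct blk_zone *zones;
	sector_t sector = 0;
	int ret = 0;

	zones = kcalloc(nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
	if (!zones)
		return -ENOMEM;

	do {
		nr = nr_zones;
		ret = blkdev_report_zones(bdev, sector, zones, &nr,
					  GFP_KERNEL);
		if (ret || !nr)
			break;

		for (i = 0; i < nr; i++) {
			if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL &&
			    zones[i].cond == BLK_ZONE_COND_FULL) {
				ret = blkdev_reset_zones(bdev, zones[i].start,
							 zones[i].len,
							 GFP_KERNEL);
				if (ret)
					break;
			}
			sector = zones[i].start + zones[i].len;
		}
	} while (!ret);

	kfree(zones);
	return ret;
}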

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed the zone cache
         * Implement report zones operation based on earlier proposal
           by Shaun Tancheff <shaun.tancheff@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---

Changes from v2:
 - Added EXPORT_SYMBOL_GPL() per Damien
 - Added uapi blkzoned.h earlier and put shared enums/struct directly
   into blkzoned.h

 block/Kconfig                 |   8 ++
 block/Makefile                |   1 +
 block/blk-zoned.c             | 242 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h        |  30 ++++++
 include/uapi/linux/Kbuild     |   1 +
 include/uapi/linux/blkzoned.h | 103 ++++++++++++++++++
 6 files changed, 385 insertions(+)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
 	T10/SCSI Data Integrity Field or the T13/ATA External Path
 	Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+	bool "Zoned block device support"
+	---help---
+	Block layer zoned block device support. This option enables
+	support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+	Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
 	bool "Block layer bio throttling support"
 	depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..bc4159d
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,242 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/blkdev.h>
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+				      sector_t sector)
+{
+	sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+	return sector & ~zone_mask;
+}
+
+static inline void blkdev_report_to_zone(struct block_device *bdev,
+					 void *rep,
+					 struct blk_zone *zone)
+{
+	sector_t offset = get_start_sect(bdev);
+
+	memcpy(zone, rep, sizeof(struct blk_zone));
+	zone->start -= offset;
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		zone->wp = zone->start + zone->len;
+	else
+		zone->wp -= offset;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:	Target block device
+ * @sector:	Sector from which to report zones
+ * @zones:      Array of zone structures where to return the zones information
+ * @nr_zones:   Number of zone structures in the zone array
+ * @gfp_mask:	Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Get zone information starting from the zone containing @sector.
+ *    The number of zone information reported may be less than the number
+ *    requested by @nr_zones. The number of zones actually reported is
+ *    returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+			sector_t sector,
+			struct blk_zone *zones,
+			unsigned int *nr_zones,
+			gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct blk_zone_report_hdr *hdr;
+	unsigned int nrz = *nr_zones;
+	struct page *page;
+	unsigned int nr_rep;
+	size_t rep_bytes;
+	unsigned int nr_pages;
+	struct bio *bio;
+	struct bio_vec *bv;
+	unsigned int i, nz;
+	unsigned int ofst;
+	void *addr;
+	int ret = 0;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (!nrz)
+		return 0;
+
+	if (sector > bdev->bd_part->nr_sects) {
+		*nr_zones = 0;
+		return 0;
+	}
+
+	/*
+	 * The zone report has a header. So make room for it in the
+	 * payload. Also make sure that the report fits in a single BIO
+	 * that will not be split down the stack.
+	 */
+	rep_bytes = sizeof(struct blk_zone_report_hdr) +
+		sizeof(struct blk_zone) * nrz;
+	rep_bytes = (rep_bytes + PAGE_SIZE - 1) & PAGE_MASK;
+	if (rep_bytes > (queue_max_sectors(q) << 9))
+		rep_bytes = queue_max_sectors(q) << 9;
+
+	nr_pages = min_t(unsigned int, BIO_MAX_PAGES,
+			 rep_bytes >> PAGE_SHIFT);
+	nr_pages = min_t(unsigned int, nr_pages,
+			 queue_max_segments(q));
+
+	bio = bio_alloc(gfp_mask, nr_pages);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_bdev = bdev;
+	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, 0);
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_page(gfp_mask);
+		if (!page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
+			__free_page(page);
+			break;
+		}
+	}
+
+	if (i == 0)
+		ret = -ENOMEM;
+	else
+		ret = submit_bio_wait(bio);
+	if (ret)
+		goto out;
+
+	/*
+	 * Process the report result: skip the header and go through the
+	 * reported zones to fix up the zone information for partitions.
+	 * At the same time, return the zone information into the zone
+	 * array.
+	 */
+	nz = 0;
+	nr_rep = 0;
+	bio_for_each_segment_all(bv, bio, i) {
+
+		if (!bv->bv_page)
+			break;
+
+		addr = kmap_atomic(bv->bv_page);
+
+		/* Get header in the first page */
+		ofst = 0;
+		if (!nr_rep) {
+			hdr = (struct blk_zone_report_hdr *) addr;
+			nr_rep = hdr->nr_zones;
+			ofst = sizeof(struct blk_zone_report_hdr);
+		}
+
+		/* Fixup and report zones */
+		while (ofst < bv->bv_len &&
+		       nz < min_t(unsigned int, nr_rep, nrz)) {
+			blkdev_report_to_zone(bdev, addr + ofst, &zones[nz]);
+			ofst += sizeof(struct blk_zone);
+			nz++;
+		}
+
+		kunmap_atomic(addr);
+
+		if (!nr_rep)
+			break;
+
+	}
+
+out:
+	bio_for_each_segment_all(bv, bio, i)
+		__free_page(bv->bv_page);
+	bio_put(bio);
+
+	if (ret == 0)
+		*nr_zones = nz;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_report_zones);
+
+/**
+ * blkdev_reset_zones - Reset zones write pointer
+ * @bdev:       Target block device
+ * @sector:     Start sector of the first zone to reset
+ * @nr_sectors: Number of sectors, at least the length of one zone
+ * @gfp_mask:   Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Reset the write pointer of the zones contained in the range
+ *    @sector..@sector+@nr_sectors. Specifying the entire disk sector range
+ *    is valid, but the specified range should not contain conventional zones.
+ */
+int blkdev_reset_zones(struct block_device *bdev,
+		       sector_t sector, sector_t nr_sectors,
+		       gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+	sector_t zone_sectors;
+	sector_t end_sector = sector + nr_sectors;
+	struct bio *bio;
+	int ret;
+
+	if (!q)
+		return -ENXIO;
+
+	if (!blk_queue_is_zoned(q))
+		return -EOPNOTSUPP;
+
+	if (end_sector > bdev->bd_part->nr_sects)
+		/* Out of range */
+		return -EINVAL;
+
+	/* Check alignment (handle an eventual smaller last zone) */
+	zone_sectors = blk_queue_zone_size(q);
+	if (sector & (zone_sectors - 1))
+		return -EINVAL;
+
+	if ((nr_sectors & (zone_sectors - 1)) &&
+	    end_sector != bdev->bd_part->nr_sects)
+		return -EINVAL;
+
+	while (sector < end_sector) {
+
+		bio = bio_alloc(gfp_mask, 0);
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_bdev = bdev;
+		bio_set_op_attrs(bio, REQ_OP_ZONE_RESET, 0);
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+
+		if (ret)
+			return ret;
+
+		sector += zone_sectors;
+
+		/* This may take a while, so be nice to others */
+		cond_resched();
+
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkdev_reset_zones);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f19e16b..6316972 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -302,6 +303,20 @@ struct queue_limits {
 	enum blk_zoned_model	zoned;
 };
 
+#ifdef CONFIG_BLK_DEV_ZONED
+
+struct blk_zone_report_hdr {
+	unsigned int	nr_zones;
+	u8		padding[60];
+};
+
+extern int blkdev_report_zones(struct block_device *,
+				sector_t, struct blk_zone *,
+				unsigned int *, gfp_t);
+extern int blkdev_reset_zones(struct block_device *, sector_t,
+				sector_t, gfp_t);
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 struct request_queue {
 	/*
 	 * Together with queue_head for cacheline sharing
@@ -654,6 +669,11 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
 	}
 }
 
+static inline unsigned int blk_queue_zone_size(struct request_queue *q)
+{
+	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */
@@ -1401,6 +1421,16 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
 	return false;
 }
 
+static inline unsigned int bdev_zone_size(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_zone_size(q);
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index dd60439..92466a6 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned.h
 header-y += bpf_common.h
 header-y += bpf_perf_event.h
 header-y += bpf.h
diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
new file mode 100644
index 0000000..a381721
--- /dev/null
+++ b/include/uapi/linux/blkzoned.h
@@ -0,0 +1,103 @@
+/*
+ * Zoned block devices handling.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by: Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
+ * Copyright (C) 2016 Western Digital
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+#ifndef _UAPI_BLKZONED_H
+#define _UAPI_BLKZONED_H
+
+#include <linux/types.h>
+
+/**
+ * enum blk_zone_type - Types of zones allowed in a zoned device.
+ *
+ * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
+ *                              randomly. Zone reset has no effect on the zone.
+ * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
+ * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
+ *
+ * Any other value not defined is reserved and must be considered as invalid.
+ */
+enum blk_zone_type {
+	BLK_ZONE_TYPE_CONVENTIONAL	= 0x1,
+	BLK_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	BLK_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+};
+
+/**
+ * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
+ *
+ * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
+ * @BLK_ZONE_COND_EMPTY: The zone is empty.
+ * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
+ * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
+ *                          OPEN ZONE command.
+ * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
+ * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a zone
+ *                      FINISH ZONE command.
+ * @BLK_ZONE_COND_READONLY: The zone is read-only.
+ * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
+ *
+ * The Zone Condition state machine in the ZBC/ZAC standards maps the above
+ * definitions as:
+ *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
+ *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
+ *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
+ *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
+ *   - ZC5: Full          | BLK_ZONE_COND_FULL
+ *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
+ *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
+ *
+ * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
+ * be considered invalid.
+ */
+enum blk_zone_cond {
+	BLK_ZONE_COND_NOT_WP	= 0x0,
+	BLK_ZONE_COND_EMPTY	= 0x1,
+	BLK_ZONE_COND_IMP_OPEN	= 0x2,
+	BLK_ZONE_COND_EXP_OPEN	= 0x3,
+	BLK_ZONE_COND_CLOSED	= 0x4,
+	BLK_ZONE_COND_READONLY	= 0xD,
+	BLK_ZONE_COND_FULL	= 0xE,
+	BLK_ZONE_COND_OFFLINE	= 0xF,
+};
+
+/**
+ * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
+ *
+ * @start: Zone start in 512 B sector units
+ * @len: Zone length in 512 B sector units
+ * @wp: Zone write pointer location in 512 B sector units
+ * @type: see enum blk_zone_type for possible values
+ * @cond: see enum blk_zone_cond for possible values
+ * @non_seq: Flag indicating that the zone is using non-sequential resources
+ *           (for host-aware zoned block devices only).
+ * @reset: Flag indicating that a zone reset is recommended.
+ * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
+ *
+ * start, len and wp use the regular 512 B sector unit, regardless of the
+ * device logical block size. The overall structure size is 64 B to match the
+ * ZBC/ZAC defined zone descriptor and allow support for future additional
+ * zone information.
+ */
+struct blk_zone {
+	__u64	start;		/* Zone start sector */
+	__u64	len;		/* Zone length in number of sectors */
+	__u64	wp;		/* Zone write pointer position */
+	__u8	type;		/* Zone type */
+	__u8	cond;		/* Zone condition */
+	__u8	non_seq;	/* Non-sequential write resources active */
+	__u8	reset;		/* Reset write pointer recommended */
+	__u8	reserved[36];
+};
+
+#endif /* _UAPI_BLKZONED_H */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/7] sd: Implement support for ZBC devices
  2016-09-26 11:14   ` Damien Le Moal
  (?)
@ 2016-09-27 21:08   ` Shaun Tancheff
  -1 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-27 21:08 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke

On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Implement ZBC support functions to setup zoned disks, both
> host-managed and host-aware models. Only zoned disks that satisfy
> the following conditions are supported:
> 1) All zones are the same size, with the exception of an eventual
>    last smaller runt zone.
> 2) For host-managed disks, reads are unrestricted (reads are not
>    failed due to zone or write pointer alignment constraints).
> Zoned disks that do not satisfy these 2 conditions will be ignored.
>
> The capacity read of the device triggers the zoned block device
> checks. As this needs the zone model of the disk, the call to
> sd_read_capacity is moved after the call to
> sd_read_block_characteristics so that host-aware devices are
> properly detected and initialized. The call to sd_zbc_read_zones
> in sd_read_capacity may change the device capacity obtained with
> the sd_read_capacity_16 function for devices reporting only the
> capacity of conventional zones at the beginning of the LBA range
> (i.e. devices with rc_basis set to 0).
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> [Damien: * Removed zone cache support
>          * Removed mapping of discard to reset write pointer command
>          * Modified sd_zbc_read_zones to include checks that the
>            device satisfies the kernel constraints
>          * Implemented REPORT ZONES setup and post-processing based
>            on code from Shaun Tancheff <shaun.tancheff@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  drivers/scsi/Makefile     |   1 +
>  drivers/scsi/sd.c         |  97 ++++++--
>  drivers/scsi/sd.h         |  67 ++++++
>  drivers/scsi/sd_zbc.c     | 586 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/scsi/scsi_proto.h |  17 ++
>  5 files changed, 754 insertions(+), 14 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index fc0d9b8..350513c 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -180,6 +180,7 @@ hv_storvsc-y                        := storvsc_drv.o
>
>  sd_mod-objs    := sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
>
>  sr_mod-objs    := sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 51e5629..4b3523b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS      16
> @@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
>         static const char temp[] = "temporary ";
>         int len;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 /* no cache control on RBC devices; theoretically they
>                  * can do it, but there's probably so many exceptions
>                  * it's not worth the risk */
> @@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> +       if (sd_is_zoned(sdkp)) {
> +               sd_config_discard(sdkp, SD_LBP_DISABLE);
> +               return count;
> +       }
> +
>         if (sdp->type != TYPE_DISK)
>                 return -EINVAL;
>
> @@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         err = kstrtoul(buf, 10, &max);
> @@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>
>         BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>
> +       if (sd_is_zoned(sdkp)) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
> +               if (ret != BLKPREP_OK)
> +                       return ret;
> +       }
> +
>         sector >>= ilog2(sdp->sector_size) - 9;
>         nr_sectors >>= ilog2(sdp->sector_size) - 9;
>
> @@ -963,6 +976,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>         SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
>                                         (unsigned long long)block));
>
> +       if (sd_is_zoned(sdkp)) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
> +               if (ret != BLKPREP_OK)
> +                       goto out;
> +       }
> +
>         /*
>          * If we have a 1K hardware sectorsize, prevent access to single
>          * 512 byte sectors.  In theory we could handle this - in fact
> @@ -1149,6 +1169,10 @@ static int sd_init_command(struct scsi_cmnd *cmd)
>         case REQ_OP_READ:
>         case REQ_OP_WRITE:
>                 return sd_setup_read_write_cmnd(cmd);
> +       case REQ_OP_ZONE_REPORT:
> +               return sd_zbc_setup_report_cmnd(cmd);
> +       case REQ_OP_ZONE_RESET:
> +               return sd_zbc_setup_reset_cmnd(cmd);
>         default:
>                 BUG();
>         }
> @@ -1780,7 +1804,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>         unsigned char op = SCpnt->cmnd[0];
>         unsigned char unmap = SCpnt->cmnd[1] & 8;
>
> -       if (req_op(req) == REQ_OP_DISCARD || req_op(req) == REQ_OP_WRITE_SAME) {
> +       switch (req_op(req)) {
> +       case REQ_OP_DISCARD:
> +       case REQ_OP_WRITE_SAME:
> +       case REQ_OP_ZONE_RESET:
>                 if (!result) {
>                         good_bytes = blk_rq_bytes(req);
>                         scsi_set_resid(SCpnt, 0);
> @@ -1788,6 +1815,17 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>                         good_bytes = 0;
>                         scsi_set_resid(SCpnt, blk_rq_bytes(req));
>                 }
> +               break;
> +       case REQ_OP_ZONE_REPORT:
> +               if (!result) {
> +                       good_bytes = scsi_bufflen(SCpnt)
> +                               - scsi_get_resid(SCpnt);
> +                       scsi_set_resid(SCpnt, 0);
> +               } else {
> +                       good_bytes = 0;
> +                       scsi_set_resid(SCpnt, blk_rq_bytes(req));
> +               }
> +               break;
>         }
>
>         if (result) {
> @@ -1848,7 +1886,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>         default:
>                 break;
>         }
> +
>   out:
> +       if (sd_is_zoned(sdkp))
> +               sd_zbc_complete(SCpnt, good_bytes, &sshdr);
> +
>         SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
>                                            "sd_done: completed %d of %d bytes\n",
>                                            good_bytes, scsi_bufflen(SCpnt)));
> @@ -1983,7 +2025,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
>         }
>  }
>
> -
>  /*
>   * Determine whether disk supports Data Integrity Field.
>   */
> @@ -2133,6 +2174,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>         /* Logical blocks per physical block exponent */
>         sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
>
> +       /* RC basis */
> +       sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
> +
>         /* Lowest aligned logical block */
>         alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
>         blk_queue_alignment_offset(sdp->request_queue, alignment);
> @@ -2323,6 +2367,13 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
>                 sector_size = 512;
>         }
>         blk_queue_logical_block_size(sdp->request_queue, sector_size);
> +       blk_queue_physical_block_size(sdp->request_queue,
> +                                     sdkp->physical_block_size);
> +       sdkp->device->sector_size = sector_size;
> +
> +       if (sd_zbc_read_zones(sdkp, buffer) < 0)
> +               /* The drive zone layout could not be checked */
> +               sdkp->capacity = 0;
>
>         {
>                 char cap_str_2[10], cap_str_10[10];
> @@ -2349,9 +2400,6 @@ sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
>         if (sdkp->capacity > 0xffffffff)
>                 sdp->use_16_for_rw = 1;
>
> -       blk_queue_physical_block_size(sdp->request_queue,
> -                                     sdkp->physical_block_size);
> -       sdkp->device->sector_size = sector_size;
>  }
>
>  /* called with buffer of length 512 */
> @@ -2613,7 +2661,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
>         struct scsi_mode_data data;
>         struct scsi_sense_hdr sshdr;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return;
>
>         if (sdkp->protection_type == 0)
> @@ -2720,6 +2768,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>   */
>  static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>  {
> +       struct request_queue *q = sdkp->disk->queue;
>         unsigned char *buffer;
>         u16 rot;
>         const int vpd_len = 64;
> @@ -2734,10 +2783,21 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>         rot = get_unaligned_be16(&buffer[4]);
>
>         if (rot == 1) {
> -               queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
> -               queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
> +               queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
> +               queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
>         }
>
> +       sdkp->zoned = (buffer[8] >> 4) & 3;
> +       if (sdkp->zoned == 1)
> +               q->limits.zoned = BLK_ZONED_HA;
> +       else if (sdkp->device->type == TYPE_ZBC)
> +               q->limits.zoned = BLK_ZONED_HM;
> +       else
> +               q->limits.zoned = BLK_ZONED_NONE;
> +       if (blk_queue_is_zoned(q) && sdkp->first_scan)
> +               sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
> +                     q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
> +
>   out:
>         kfree(buffer);
>  }
> @@ -2836,14 +2896,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
>          * react badly if we do.
>          */
>         if (sdkp->media_present) {
> -               sd_read_capacity(sdkp, buffer);
> -
>                 if (scsi_device_supports_vpd(sdp)) {
>                         sd_read_block_provisioning(sdkp);
>                         sd_read_block_limits(sdkp);
>                         sd_read_block_characteristics(sdkp);
>                 }
>
> +               sd_read_capacity(sdkp, buffer);
> +
>                 sd_read_write_protect_flag(sdkp, buffer);
>                 sd_read_cache_type(sdkp, buffer);
>                 sd_read_app_tag_own(sdkp, buffer);
> @@ -3041,9 +3101,16 @@ static int sd_probe(struct device *dev)
>
>         scsi_autopm_get_device(sdp);
>         error = -ENODEV;
> -       if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
> +       if (sdp->type != TYPE_DISK &&
> +           sdp->type != TYPE_ZBC &&
> +           sdp->type != TYPE_MOD &&
> +           sdp->type != TYPE_RBC)
>                 goto out;
>
> +#ifndef CONFIG_BLK_DEV_ZONED
> +       if (sdp->type == TYPE_ZBC)
> +               goto out;
> +#endif
>         SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
>                                         "sd_probe\n"));
>
> @@ -3147,6 +3214,8 @@ static int sd_remove(struct device *dev)
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);
>
> +       sd_zbc_remove(sdkp);
> +
>         blk_register_region(devt, SD_MINORS, NULL,
>                             sd_default_probe, NULL, NULL);
>
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index c8d9863..0ba47c1 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -64,6 +64,15 @@ struct scsi_disk {
>         struct scsi_device *device;
>         struct device   dev;
>         struct gendisk  *disk;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +       unsigned int    nr_zones;
> +       sector_t        zone_sectors;
> +       unsigned int    zone_shift;
> +       unsigned long   *zones_wlock;
> +       unsigned int    zones_optimal_open;
> +       unsigned int    zones_optimal_nonseq;
> +       unsigned int    zones_max_open;
> +#endif
>         atomic_t        openers;
>         sector_t        capacity;       /* size in logical blocks */
>         u32             max_xfer_blocks;
> @@ -94,6 +103,9 @@ struct scsi_disk {
>         unsigned        lbpvpd : 1;
>         unsigned        ws10 : 1;
>         unsigned        ws16 : 1;
> +       unsigned        rc_basis: 2;
> +       unsigned        zoned: 2;
> +       unsigned        urswrz : 1;
>  };
>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>
> @@ -156,6 +168,11 @@ static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t b
>         return blocks * sdev->sector_size;
>  }
>
> +static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
> +{
> +       return sector >> (ilog2(sdev->sector_size) - 9);
> +}
> +
>  /*
>   * Look up the DIX operation based on whether the command is read or
>   * write and whether dix and dif are enabled.
> @@ -239,4 +256,54 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
>
>  #endif /* CONFIG_BLK_DEV_INTEGRITY */
>
> +static inline int sd_is_zoned(struct scsi_disk *sdkp)
> +{
> +       return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
> +}
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +extern int sd_zbc_read_zones(struct scsi_disk *, unsigned char *);
> +extern void sd_zbc_remove(struct scsi_disk *);
> +extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
> +                                  sector_t, unsigned int);
> +extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *);
> +extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *);
> +extern void sd_zbc_complete(struct scsi_cmnd *, unsigned int,
> +                           struct scsi_sense_hdr *);
> +
> +#else /* CONFIG_BLK_DEV_ZONED */
> +
> +static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
> +                                   unsigned char *buf)
> +{
> +       return 0;
> +}
> +
> +static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
> +
> +static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
> +                                         struct request *rq, sector_t sector,
> +                                         unsigned int num_sectors)
> +{
> +       /* Let the drive fail requests */
> +       return BLKPREP_OK;
> +}
> +
> +static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
> +{
> +       return BLKPREP_KILL;
> +}
> +
> +static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
> +{
> +       return BLKPREP_KILL;
> +}
> +
> +static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
> +                                  unsigned int good_bytes,
> +                                  struct scsi_sense_hdr *sshdr) {}
> +
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  #endif /* _SCSI_DISK_H */
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> new file mode 100644
> index 0000000..2680f51
> --- /dev/null
> +++ b/drivers/scsi/sd_zbc.c
> @@ -0,0 +1,586 @@
> +/*
> + * SCSI Zoned Block commands
> + *
> + * Copyright (C) 2014-2015 SUSE Linux GmbH
> + * Written by: Hannes Reinecke <hare@suse.de>
> + * Modified by: Damien Le Moal <damien.lemoal@hgst.com>
> + * Modified by: Shaun Tancheff <shaun.tancheff@seagate.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; see the file COPYING.  If not, write to
> + * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
> + * USA.
> + *
> + */
> +
> +#include <linux/blkdev.h>
> +
> +#include <asm/unaligned.h>
> +
> +#include <scsi/scsi.h>
> +#include <scsi/scsi_cmnd.h>
> +#include <scsi/scsi_dbg.h>
> +#include <scsi/scsi_device.h>
> +#include <scsi/scsi_driver.h>
> +#include <scsi/scsi_host.h>
> +#include <scsi/scsi_eh.h>
> +
> +#include "sd.h"
> +#include "scsi_priv.h"
> +
> +enum zbc_zone_type {
> +       ZBC_ZONE_TYPE_CONV = 0x1,
> +       ZBC_ZONE_TYPE_SEQWRITE_REQ,
> +       ZBC_ZONE_TYPE_SEQWRITE_PREF,
> +       ZBC_ZONE_TYPE_RESERVED,
> +};
> +
> +enum zbc_zone_cond {
> +       ZBC_ZONE_COND_NO_WP,
> +       ZBC_ZONE_COND_EMPTY,
> +       ZBC_ZONE_COND_IMP_OPEN,
> +       ZBC_ZONE_COND_EXP_OPEN,
> +       ZBC_ZONE_COND_CLOSED,
> +       ZBC_ZONE_COND_READONLY = 0xd,
> +       ZBC_ZONE_COND_FULL,
> +       ZBC_ZONE_COND_OFFLINE,
> +};
> +
> +/**
> + * Convert a zone descriptor to a zone struct.
> + */
> +static void sd_zbc_parse_report(struct scsi_disk *sdkp,
> +                               u8 *buf,
> +                               struct blk_zone *zone)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +
> +       memset(zone, 0, sizeof(struct blk_zone));
> +
> +       zone->type = buf[0] & 0x0f;
> +       zone->cond = (buf[1] >> 4) & 0xf;
> +       if (buf[1] & 0x01)
> +               zone->reset = 1;
> +       if (buf[1] & 0x02)
> +               zone->non_seq = 1;
> +
> +       zone->len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
> +       zone->start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
> +       zone->wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
> +       if (zone->type != ZBC_ZONE_TYPE_CONV &&
> +           zone->cond == ZBC_ZONE_COND_FULL)
> +               zone->wp = zone->start + zone->len;
> +}
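
A quick reference for reviewers: the offsets parsed above correspond to
the 64-byte zone descriptor of the REPORT ZONES data. The struct below
is only an illustration of that layout (the name is mine; the patch
deliberately parses raw bytes instead of defining such a struct):

#include <linux/types.h>

struct zbc_zone_descriptor {            /* illustrative, not in the patch */
        u8      type;                   /* bits 3:0 - zone type           */
        u8      flags;                  /* bits 7:4 cond, bit 1 non_seq,
                                           bit 0 reset                    */
        u8      reserved1[6];
        __be64  length;                 /* zone length in logical blocks  */
        __be64  start_lba;              /* zone start LBA                 */
        __be64  write_pointer;          /* write pointer LBA              */
        u8      reserved2[32];
};
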
> +
> +/**
> + * Issue a REPORT ZONES scsi command.
> + */
> +static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
> +                              unsigned int buflen, sector_t start_sector)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +       const int timeout = sdp->request_queue->rq_timeout;
> +       struct scsi_sense_hdr sshdr;
> +       sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
> +       unsigned char cmd[16];
> +       unsigned int rep_len;
> +       int result;
> +
> +       memset(cmd, 0, 16);
> +       cmd[0] = ZBC_IN;
> +       cmd[1] = ZI_REPORT_ZONES;
> +       put_unaligned_be64(start_lba, &cmd[2]);
> +       put_unaligned_be32(buflen, &cmd[10]);
> +       memset(buf, 0, buflen);
> +
> +       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
> +                                 buf, buflen, &sshdr,
> +                                 timeout, SD_MAX_RETRIES, NULL);
> +       if (result) {
> +               sd_printk(KERN_ERR, sdkp,
> +                         "REPORT ZONES lba %llu failed with %d/%d\n",
> +                         (unsigned long long)start_lba,
> +                         host_byte(result), driver_byte(result));
> +               return -EIO;
> +       }
> +
> +       rep_len = get_unaligned_be32(&buf[0]);
> +       if (rep_len < 64) {
> +               sd_printk(KERN_ERR, sdkp,
> +                         "REPORT ZONES report invalid length %u\n",
> +                         rep_len);
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
> +{
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       sector_t lba, sector = blk_rq_pos(rq);
> +       unsigned int nr_bytes = blk_rq_bytes(rq);
> +       int ret;
> +
> +       WARN_ON(nr_bytes == 0);
> +
> +       if (!sd_is_zoned(sdkp))
> +               /* Not a zoned device */
> +               return BLKPREP_KILL;
> +
> +       ret = scsi_init_io(cmd);
> +       if (ret != BLKPREP_OK)
> +               return ret;
> +
> +       cmd->cmd_len = 16;
> +       memset(cmd->cmnd, 0, cmd->cmd_len);
> +       cmd->cmnd[0] = ZBC_IN;
> +       cmd->cmnd[1] = ZI_REPORT_ZONES;
> +       lba = sectors_to_logical(sdkp->device, sector);
> +       put_unaligned_be64(lba, &cmd->cmnd[2]);
> +       put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
> +       /* Do partial report for speeding things up */
> +       cmd->cmnd[14] = ZBC_REPORT_ZONE_PARTIAL;
> +
> +       cmd->sc_data_direction = DMA_FROM_DEVICE;
> +       cmd->sdb.length = nr_bytes;
> +       cmd->transfersize = sdkp->device->sector_size;
> +       cmd->allowed = 0;
> +
> +       /*
> +        * Report may return less bytes than requested. Make sure
> +        * to report completion on the entire initial request.
> +        */
> +       rq->__data_len = nr_bytes;
> +
> +       return BLKPREP_OK;
> +}
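
For readers not fluent in ZBC, the CDB built above looks roughly like
this on the wire. The array is purely illustrative and the opcode and
service action values are quoted from my reading of the ZBC/ZAC drafts
and the new scsi_proto.h defines, not from this hunk:

#include <linux/types.h>

static const u8 zbc_report_zones_cdb_example[16] = {
        [0]  = 0x95,    /* ZBC_IN                                  */
        [1]  = 0x00,    /* ZI_REPORT_ZONES service action          */
        /* bytes 2..9:   zone start LBA, big endian                */
        /* bytes 10..13: allocation length, big endian             */
        [14] = 0x80,    /* ZBC_REPORT_ZONE_PARTIAL                 */
};
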
> +
> +static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
> +                                        unsigned int good_bytes)
> +{
> +       struct request *rq = scmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       struct sg_mapping_iter miter;
> +       struct blk_zone_report_hdr hdr;
> +       struct blk_zone zone;
> +       unsigned int offset, bytes = 0;
> +       unsigned long flags;
> +       u8 *buf;
> +
> +       if (good_bytes < 64)
> +               return;
> +
> +       memset(&hdr, 0, sizeof(struct blk_zone_report_hdr));
> +
> +       sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
> +                      SG_MITER_TO_SG | SG_MITER_ATOMIC);
> +
> +       local_irq_save(flags);
> +       while (sg_miter_next(&miter) && bytes < good_bytes) {
> +
> +               buf = miter.addr;
> +               offset = 0;
> +
> +               if (bytes == 0) {
> +                       /* Set the report header */
> +                       hdr.nr_zones = min_t(unsigned int,
> +                                        (good_bytes - 64) / 64,
> +                                        get_unaligned_be32(&buf[0]) / 64);
> +                       memcpy(buf, &hdr, sizeof(struct blk_zone_report_hdr));
> +                       offset += 64;
> +                       bytes += 64;
> +               }
> +
> +               /* Parse zone descriptors */
> +               while (offset < miter.length && hdr.nr_zones) {
> +                       WARN_ON(offset > miter.length);
> +                       buf = miter.addr + offset;
> +                       sd_zbc_parse_report(sdkp, buf, &zone);
> +                       memcpy(buf, &zone, sizeof(struct blk_zone));
> +                       offset += 64;
> +                       bytes += 64;
> +                       hdr.nr_zones--;
> +               }
> +
> +               if (!hdr.nr_zones)
> +                       break;
> +
> +       }
> +       sg_miter_stop(&miter);
> +       local_irq_restore(flags);
> +}
> +
> +int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
> +{
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       sector_t sector = blk_rq_pos(rq);
> +       unsigned int zone_no = sector >> sdkp->zone_shift;
> +
> +       if (!sd_is_zoned(sdkp))
> +               /* Not a zoned device */
> +               return BLKPREP_KILL;
> +
> +       if (sector & (sdkp->zone_sectors - 1))
> +               /* Unaligned request */
> +               return BLKPREP_KILL;
> +
> +       /* Do not allow concurrent reset and writes */
> +       if (test_and_set_bit(zone_no, sdkp->zones_wlock))
> +               return BLKPREP_DEFER;
> +
> +       cmd->cmd_len = 16;
> +       memset(cmd->cmnd, 0, cmd->cmd_len);
> +       cmd->cmnd[0] = ZBC_OUT;
> +       cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
> +       put_unaligned_be64(sectors_to_logical(sdkp->device, sector),
> +                          &cmd->cmnd[2]);
> +
> +       rq->timeout = SD_TIMEOUT;
> +       cmd->sc_data_direction = DMA_NONE;
> +       cmd->transfersize = 0;
> +       cmd->allowed = 0;
> +
> +       return BLKPREP_OK;
> +}
> +
> +int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
> +                           sector_t sector, unsigned int sectors)
> +{
> +       sector_t zone_ofst = sector & (sdkp->zone_sectors - 1);
> +
> +       /* Do not allow zone boundaries crossing */
> +       if (zone_ofst + sectors > sdkp->zone_sectors)
> +               return BLKPREP_KILL;
> +
> +       /*
> +        * Do not issue more than one write at a time per
> +        * zone. This solves write ordering problems due to
> +        * the unlocking of the request queue in the dispatch
> +        * path in the non scsi-mq case. For scsi-mq, this
> +        * also avoids potential write reordering when multiple
> +        * threads running on different CPUs write to the same
> +        * zone (with a synchronized sequential pattern).
> +        */
> +       if (req_op(rq) == REQ_OP_WRITE ||
> +           req_op(rq) == REQ_OP_WRITE_SAME) {
> +               unsigned int zone_no = sector >> sdkp->zone_shift;
> +               if (test_and_set_bit(zone_no, sdkp->zones_wlock))
> +                       return BLKPREP_DEFER;
> +       }
> +
> +       return BLKPREP_OK;
> +}
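
To make the locking scheme above easier to follow, here is a stripped
down sketch of the protocol (the function names are mine; the patch
open codes this in the setup and completion paths):

#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/types.h>

/* Try to take the per-zone write lock at dispatch time. */
static bool zbc_write_lock_zone(unsigned long *wlock, unsigned int zno)
{
        /* An old bit value of 1 means another write to the zone is in flight. */
        return !test_and_set_bit(zno, wlock);
}

/* Release the lock at command completion (see sd_zbc_complete()). */
static void zbc_write_unlock_zone(unsigned long *wlock, unsigned int zno)
{
        clear_bit_unlock(zno, wlock);
        smp_mb__after_atomic();
}
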
> +
> +void sd_zbc_complete(struct scsi_cmnd *cmd,
> +                    unsigned int good_bytes,
> +                    struct scsi_sense_hdr *sshdr)
> +{
> +       int result = cmd->result;
> +       struct request *rq = cmd->request;
> +       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +       unsigned int zone_no;
> +
> +       switch (req_op(rq)) {
> +       case REQ_OP_WRITE:
> +       case REQ_OP_WRITE_SAME:
> +
> +               if (result &&
> +                   sshdr->sense_key == ILLEGAL_REQUEST &&
> +                   sshdr->asc == 0x21)
> +                       /*
> +                        * It is unlikely that retrying write requests failed
> +                        * with any kind of alignment error will result in
> +                        * success. So don't.
> +                        */
> +                       cmd->allowed = 0;
> +
> +               /* Fallthru */
> +
> +       case REQ_OP_ZONE_RESET:
> +
> +               /* Unlock the zone */
> +               zone_no = blk_rq_pos(rq) >> sdkp->zone_shift;
> +               clear_bit_unlock(zone_no, sdkp->zones_wlock);
> +               smp_mb__after_atomic();
> +
> +               if (result &&
> +                   sshdr->sense_key == ILLEGAL_REQUEST &&
> +                   sshdr->asc == 0x24)
> +                       /*
> +                        * INVALID FIELD IN CDB error: Reset of a conventional
> +                        * zone was attempted. Nothing to worry about,
> +                        * so be quiet about the error.
> +                        */
> +                       rq->cmd_flags |= REQ_QUIET;
> +
> +               break;
> +
> +       case REQ_OP_ZONE_REPORT:
> +
> +               if (!result)
> +                       sd_zbc_report_zones_complete(cmd, good_bytes);
> +               break;
> +
> +       }
> +}
> +
> +/**
> + * Read zoned block device characteristics (VPD page B6).
> + */
> +static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
> +                                             unsigned char *buf)
> +{
> +       if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64))
> +               return -ENODEV;
> +
> +       if (sdkp->device->type != TYPE_ZBC) {
> +               /* Host-aware */
> +               sdkp->urswrz = 1;
> +               sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
> +               sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
> +               sdkp->zones_max_open = 0;
> +       } else {
> +               /* Host-managed */
> +               sdkp->urswrz = buf[4] & 1;
> +               sdkp->zones_optimal_open = 0;
> +               sdkp->zones_optimal_nonseq = 0;
> +               sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * Check reported capacity.
> + */
> +static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
> +                                unsigned char *buf)
> +{
> +       sector_t lba;
> +       int ret;
> +
> +       if (sdkp->rc_basis != 0)
> +               return 0;
> +
> +       /* Do a report zone to get the maximum LBA to check capacity */
> +       ret = sd_zbc_report_zones(sdkp, buf, SD_BUF_SIZE, 0);
> +       if (ret)
> +               return ret;
> +
> +       /* The max_lba field is the capacity of this device */
> +       lba = get_unaligned_be64(&buf[8]);
> +       if (lba + 1 > sdkp->capacity) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_WARNING, sdkp,
> +                                 "Changing capacity from %llu "
> +                                 "to max LBA+1 %llu\n",
> +                                 (unsigned long long)sdkp->capacity,
> +                                 (unsigned long long)lba + 1);
> +               sdkp->capacity = lba + 1;
> +       }
> +
> +       return 0;
> +}
> +
> +#define SD_ZBC_BUF_SIZE 131072
> +
> +static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
> +{
> +       sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +       struct blk_zone zone;
> +       sector_t sector = 0;
> +       unsigned char *buf;
> +       unsigned char *rec;
> +       unsigned int buf_len;
> +       unsigned int list_length;
> +       int ret;
> +       u8 same;
> +
> +       /* Get a buffer */
> +       buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       /* Do a report zone to get the same field */
> +       ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
> +       if (ret)
> +               goto out;
> +
> +       same = buf[4] & 0x0f;
> +       if (same > 0) {
> +               rec = &buf[64];
> +               sdkp->zone_sectors = logical_to_sectors(sdkp->device,
> +                                               get_unaligned_be64(&rec[8]));
> +               goto out;
> +       }
> +
> +       /* Check the size of all zones */
> +       sdkp->zone_sectors = (sector_t)-1;
> +       do {
> +
> +               /* Parse REPORT ZONES header */
> +               list_length = get_unaligned_be32(&buf[0]) + 64;
> +               rec = buf + 64;
> +               if (list_length < SD_ZBC_BUF_SIZE)
> +                       buf_len = list_length;
> +               else
> +                       buf_len = SD_ZBC_BUF_SIZE;
> +
> +               /* Parse zone descriptors */
> +               while (rec < buf + buf_len) {
> +                       sd_zbc_parse_report(sdkp, rec, &zone);
> +                       if (sdkp->zone_sectors == (sector_t)-1) {
> +                               sdkp->zone_sectors = zone.len;
> +                       } else if (sector + zone.len < capacity &&
> +                                  zone.len != sdkp->zone_sectors) {
> +                               sdkp->zone_sectors = 0;
> +                               goto out;
> +                       }
> +                       sector += zone.len;
> +                       rec += 64;
> +               }
> +
> +               if (sector < capacity) {
> +                       ret = sd_zbc_report_zones(sdkp, buf,
> +                                       SD_ZBC_BUF_SIZE, sector);
> +                       if (ret)
> +                               goto out;
> +               }
> +
> +       } while (sector < capacity);
> +
> +out:
> +       kfree(buf);
> +
> +       if (ret)
> +               return ret;
> +
> +       if (!sdkp->zone_sectors) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Devices with non constant zone "
> +                                 "size are not supported\n");
> +               return -ENODEV;
> +       }
> +
> +       if (!is_power_of_2(sdkp->zone_sectors)) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Devices with non power of 2 zone "
> +                                 "size are not supported\n");
> +               return -ENODEV;
> +       }
> +
> +       if ((sdkp->zone_sectors << 9) > UINT_MAX) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "Zone size too large\n");
> +               return -ENODEV;
> +       }
> +
> +       return 0;
> +}
> +
> +static int sd_zbc_setup(struct scsi_disk *sdkp)
> +{
> +       sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +
> +       /* chunk_sectors indicates the zone size */
> +       blk_queue_chunk_sectors(sdkp->disk->queue, sdkp->zone_sectors);
> +       sdkp->zone_shift = ilog2(sdkp->zone_sectors);
> +       sdkp->nr_zones = capacity >> sdkp->zone_shift;
> +       if (capacity & (sdkp->zone_sectors - 1))
> +               sdkp->nr_zones++;
> +
> +       if (!sdkp->zones_wlock) {
> +               sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
> +                                           sizeof(unsigned long),
> +                                           GFP_KERNEL);
> +               if (!sdkp->zones_wlock)
> +                       return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
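
Worked example of the arithmetic above, assuming the common 256 MiB
zone size on a 512 B logical block drive (the numbers are mine, not
from the patch): zone_sectors = 524288, zone_shift = ilog2(524288) = 19,
so a request starting at sector 1048576 falls in zone 1048576 >> 19 = 2,
and an 8 TB drive needs 32768 zones, i.e. a 4 KB zones_wlock bitmap.

#include <linux/log2.h>
#include <linux/types.h>

/* Illustrative only: the sector to zone number mapping used above. */
static unsigned int zbc_zone_no(sector_t sector, sector_t zone_sectors)
{
        return sector >> ilog2(zone_sectors);
}
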
> +
> +int sd_zbc_read_zones(struct scsi_disk *sdkp,
> +                     unsigned char *buf)
> +{
> +       sector_t capacity;
> +       int ret = 0;
> +
> +       if (!sd_is_zoned(sdkp))
> +               /*
> +                * Device managed or normal SCSI disk,
> +                * no special handling required
> +                */
> +               return 0;
> +
> +
> +       /* Get zoned block device characteristics */
> +       ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
> +       if (ret)
> +               return ret;
> +
> +       /*
> +        * Check for unconstrained reads: host-managed devices with
> +        * constrained reads (drives failing read after write pointer)
> +        * are not supported.
> +        */
> +       if (!sdkp->urswrz) {
> +               if (sdkp->first_scan)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                         "constrained reads devices are not supported\n");
> +               return -ENODEV;
> +       }
> +
> +       /* Check capacity */
> +       ret = sd_zbc_check_capacity(sdkp, buf);
> +       if (ret)
> +               return ret;
> +       capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +
> +       /*
> +        * Check zone size: only devices with a constant zone size (except
> +        * possibly a last, smaller runt zone) that is a power of 2 are
> +        * supported.
> +        */
> +       ret = sd_zbc_check_zone_size(sdkp);
> +       if (ret)
> +               return ret;
> +
> +       /* The drive satisfies the kernel restrictions: set it up */
> +       ret = sd_zbc_setup(sdkp);
> +       if (ret)
> +               return ret;
> +
> +       if (sdkp->first_scan) {
> +               if (sdkp->nr_zones * sdkp->zone_sectors == capacity)
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "%u zones of %llu sectors\n",
> +                                 sdkp->nr_zones,
> +                                 (unsigned long long)sdkp->zone_sectors);
> +               else
> +                       sd_printk(KERN_NOTICE, sdkp,
> +                                 "%u zones of %llu sectors "
> +                                 "+ 1 runt zone\n",
> +                                 sdkp->nr_zones - 1,
> +                                 (unsigned long long)sdkp->zone_sectors);
> +       }
> +
> +       return 0;
> +}
> +
> +void sd_zbc_remove(struct scsi_disk *sdkp)
> +{
> +       kfree(sdkp->zones_wlock);
> +       sdkp->zones_wlock = NULL;
> +}
> diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
> index d1defd1..6ba66e0 100644
> --- a/include/scsi/scsi_proto.h
> +++ b/include/scsi/scsi_proto.h
> @@ -299,4 +299,21 @@ struct scsi_lun {
>  #define SCSI_ACCESS_STATE_MASK        0x0f
>  #define SCSI_ACCESS_STATE_PREFERRED   0x80
>
> +/* Reporting options for REPORT ZONES */
> +enum zbc_zone_reporting_options {
> +       ZBC_ZONE_REPORTING_OPTION_ALL = 0,
> +       ZBC_ZONE_REPORTING_OPTION_EMPTY,
> +       ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
> +       ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
> +       ZBC_ZONE_REPORTING_OPTION_CLOSED,
> +       ZBC_ZONE_REPORTING_OPTION_FULL,
> +       ZBC_ZONE_REPORTING_OPTION_READONLY,
> +       ZBC_ZONE_REPORTING_OPTION_OFFLINE,
> +       ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
> +       ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
> +       ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
> +};
> +
> +#define ZBC_REPORT_ZONE_PARTIAL 0x80
> +
>  #endif /* _SCSI_PROTO_H_ */
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] block: Add 'zoned' queue limit
  2016-09-26 11:14   ` Damien Le Moal
@ 2016-09-27 21:09   ` Shaun Tancheff
  -1 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-27 21:09 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen

On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> Add the zoned queue limit to indicate the zoning model of a block device.
> Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
> 1 (BLK_ZONED_HA) for host-aware zoned block devices and 2 (BLK_ZONED_HM)
> for host-managed zoned block devices. The drive-managed model defined by
> the standards is not represented here since such devices do not provide
> any command for accessing zone information. Drive-managed devices will
> be reported as BLK_ZONED_NONE.
>
> The helper functions blk_queue_zoned_model and bdev_zoned_model return
> the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
> return a boolean for callers to test if a block device is zoned.
>
> The zoned attribute is also exported as a string to applications via
> sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
> BLK_ZONED_HM as "host-managed".
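
As a quick illustration of the new sysfs attribute, a user space tool
could do something like the sketch below. The device name "sda" and the
tool itself are hypothetical; only the attribute path follows from this
patch adding "zoned" to the queue sysfs attributes:

#include <stdio.h>

int main(void)
{
        char model[32] = "";
        FILE *f = fopen("/sys/block/sda/queue/zoned", "r");

        if (f) {
                if (fgets(model, sizeof(model), f))
                        printf("sda zoned model: %s", model);
                fclose(f);
        }
        return 0;
}
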
>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  block/blk-settings.c   |  1 +
>  block/blk-sysfs.c      | 18 ++++++++++++++++++
>  include/linux/blkdev.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f679ae1..b1d5b7f 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>         lim->io_opt = 0;
>         lim->misaligned = 0;
>         lim->cluster = 1;
> +       lim->zoned = BLK_ZONED_NONE;
>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9cc8d7c..ff9cd9c 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
>  QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
>  #undef QUEUE_SYSFS_BIT_FNS
>
> +static ssize_t queue_zoned_show(struct request_queue *q, char *page)
> +{
> +       switch (blk_queue_zoned_model(q)) {
> +       case BLK_ZONED_HA:
> +               return sprintf(page, "host-aware\n");
> +       case BLK_ZONED_HM:
> +               return sprintf(page, "host-managed\n");
> +       default:
> +               return sprintf(page, "none\n");
> +       }
> +}
> +
>  static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>  {
>         return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
>         .store = queue_store_nonrot,
>  };
>
> +static struct queue_sysfs_entry queue_zoned_entry = {
> +       .attr = {.name = "zoned", .mode = S_IRUGO },
> +       .show = queue_zoned_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nomerges_entry = {
>         .attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
>         .show = queue_nomerges_show,
> @@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
>         &queue_discard_zeroes_data_entry.attr,
>         &queue_write_same_max_entry.attr,
>         &queue_nonrot_entry.attr,
> +       &queue_zoned_entry.attr,
>         &queue_nomerges_entry.attr,
>         &queue_rq_affinity_entry.attr,
>         &queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..f19e16b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -261,6 +261,15 @@ struct blk_queue_tag {
>  #define BLK_SCSI_MAX_CMDS      (256)
>  #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
>
> +/*
> + * Zoned block device models (zoned limit).
> + */
> +enum blk_zoned_model {
> +       BLK_ZONED_NONE, /* Regular block device */
> +       BLK_ZONED_HA,   /* Host-aware zoned block device */
> +       BLK_ZONED_HM,   /* Host-managed zoned block device */
> +};
> +
>  struct queue_limits {
>         unsigned long           bounce_pfn;
>         unsigned long           seg_boundary_mask;
> @@ -290,6 +299,7 @@ struct queue_limits {
>         unsigned char           cluster;
>         unsigned char           discard_zeroes_data;
>         unsigned char           raid_partial_stripes_expensive;
> +       enum blk_zoned_model    zoned;
>  };
>
>  struct request_queue {
> @@ -627,6 +637,23 @@ static inline unsigned int blk_queue_cluster(struct request_queue *q)
>         return q->limits.cluster;
>  }
>
> +static inline enum blk_zoned_model
> +blk_queue_zoned_model(struct request_queue *q)
> +{
> +       return q->limits.zoned;
> +}
> +
> +static inline bool blk_queue_is_zoned(struct request_queue *q)
> +{
> +       switch (blk_queue_zoned_model(q)) {
> +       case BLK_ZONED_HA:
> +       case BLK_ZONED_HM:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}
> +
>  /*
>   * We regard a request as sync, if either a read or a sync write
>   */
> @@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct block_device *bdev)
>         return 0;
>  }
>
> +static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +
> +       if (q)
> +               return blk_queue_zoned_model(q);
> +
> +       return BLK_ZONED_NONE;
> +}
> +
> +static inline bool bdev_is_zoned(struct block_device *bdev)
> +{
> +       struct request_queue *q = bdev_get_queue(bdev);
> +
> +       if (q)
> +               return blk_queue_is_zoned(q);
> +
> +       return false;
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
>         return q ? q->dma_alignment : 511;
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-09-26 11:14   ` Damien Le Moal
@ 2016-09-27 21:10   ` Shaun Tancheff
  -1 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-27 21:10 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke

On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> The queue limits already have a 'chunk_sectors' setting, so
> we should be presenting it via sysfs.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  block/blk-sysfs.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ff9cd9c..488c2e2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
>         return queue_var_show(queue_physical_block_size(q), page);
>  }
>
> +static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
> +{
> +       return queue_var_show(q->limits.chunk_sectors, page);
> +}
> +
>  static ssize_t queue_io_min_show(struct request_queue *q, char *page)
>  {
>         return queue_var_show(queue_io_min(q), page);
> @@ -455,6 +460,11 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
>         .show = queue_physical_block_size_show,
>  };
>
> +static struct queue_sysfs_entry queue_chunk_sectors_entry = {
> +       .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
> +       .show = queue_chunk_sectors_show,
> +};
> +
>  static struct queue_sysfs_entry queue_io_min_entry = {
>         .attr = {.name = "minimum_io_size", .mode = S_IRUGO },
>         .show = queue_io_min_show,
> @@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
>         &queue_hw_sector_size_entry.attr,
>         &queue_logical_block_size_entry.attr,
>         &queue_physical_block_size_entry.attr,
> +       &queue_chunk_sectors_entry.attr,
>         &queue_io_min_entry.attr,
>         &queue_io_opt_entry.attr,
>         &queue_discard_granularity_entry.attr,
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>


-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/7] block: update chunk_sectors in blk_stack_limits()
  2016-09-26 11:14   ` Damien Le Moal
@ 2016-09-27 21:12   ` Shaun Tancheff
  -1 siblings, 0 replies; 28+ messages in thread
From: Shaun Tancheff @ 2016-09-27 21:12 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Hannes Reinecke, Hannes Reinecke

On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
> ---
>  block/blk-settings.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b1d5b7f..55369a6 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>                         t->discard_granularity;
>         }
>
> +       if (b->chunk_sectors)
> +               t->chunk_sectors = min_not_zero(t->chunk_sectors,
> +                                               b->chunk_sectors);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL(blk_stack_limits);
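
For reviewers, the effect of the new stacking rule is simply "smallest
non-zero chunk wins". A tiny illustration (the wrapper function is
mine; min_not_zero() is the existing helper from linux/kernel.h):

#include <linux/kernel.h>

/* Illustrative only: how a stacked chunk_sectors limit is combined. */
static unsigned int stacked_chunk_sectors(unsigned int top,
                                          unsigned int bottom)
{
        return bottom ? min_not_zero(top, bottom) : top;
}

/*
 * stacked_chunk_sectors(0, 524288)      == 524288
 * stacked_chunk_sectors(524288, 262144) == 262144
 * stacked_chunk_sectors(524288, 0)      == 524288
 */
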
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>


-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-09-27 21:12 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-26 11:14 [PATCH v2 0/7] ZBC / Zoned block device support Damien Le Moal
2016-09-26 11:14 ` Damien Le Moal
2016-09-26 11:14 ` [PATCH v2 1/7] block: Add 'zoned' queue limit Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-27 21:09   ` Shaun Tancheff
2016-09-26 11:14 ` [PATCH v2 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-27 21:10   ` Shaun Tancheff
2016-09-26 11:14 ` [PATCH v2 3/7] block: update chunk_sectors in blk_stack_limits() Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-27 21:12   ` Shaun Tancheff
2016-09-26 11:14 ` [PATCH v2 4/7] block: Define zoned block device operations Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-26 11:14 ` [PATCH v2 5/7] block: Implement support for zoned block devices Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-27 18:51   ` [PATCH v3 " Shaun Tancheff
2016-09-26 11:14 ` [PATCH v2 6/7] sd: Implement support for ZBC devices Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-27 21:08   ` Shaun Tancheff
2016-09-26 11:14 ` [PATCH v2 7/7] blk-zoned: implement ioctls Damien Le Moal
2016-09-26 11:14   ` Damien Le Moal
2016-09-26 16:37   ` Christoph Hellwig
2016-09-26 23:12     ` Shaun Tancheff
2016-09-27 18:24       ` Christoph Hellwig
2016-09-26 23:30     ` Damien Le Moal
2016-09-26 23:30       ` Damien Le Moal
2016-09-26 23:58       ` Shaun Tancheff
2016-09-26 23:58         ` Shaun Tancheff
