* [PATCH 0/5] Add support for ZBC host-managed devices
@ 2016-07-19 13:25 Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal,
	Hannes Reinecke

Hi James,

This patchset adds support for ZBC host-managed devices to sd.
Support for it is selected with the new 'SCSI_ZBC' config option.

The series is based on Tejun's 'libata/for-4.8' repository.
It relies on the patchset "Support for zoned block devices" posted
earlier.

As usual, comments and reviews are welcome.

Damien Le Moal (2):
  sd: Limit messages for ZBC disks capacity change
  sd_zbc: Fix handling of ZBC read after write pointer

Hannes Reinecke (3):
  sd: configure ZBC devices
  sd: Implement new RESET_WP provisioning mode
  sd: Implement support for ZBC devices

 drivers/scsi/Kconfig      |   8 +
 drivers/scsi/Makefile     |   1 +
 drivers/scsi/sd.c         | 201 +++++++++++++++--
 drivers/scsi/sd.h         |  67 ++++++
 drivers/scsi/sd_zbc.c     | 559 ++++++++++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_proto.h |  17 ++
 6 files changed, 832 insertions(+), 21 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

-- 
1.8.5.6



* [PATCH 1/5] sd: configure ZBC devices
  2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
@ 2016-07-19 13:25 ` Hannes Reinecke
  2016-07-20  0:46   ` Damien Le Moal
                     ` (2 more replies)
  2016-07-19 13:25 ` [PATCH 2/5] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal,
	Hannes Reinecke, Hannes Reinecke

For ZBC devices, I/O must not cross zone boundaries, so set
the 'chunk_sectors' block queue limit to the zone size.
This is only valid for REPORT ZONES SAME type 2 or 3;
for other types the zone sizes might differ between
individual zones. So issue a warning if the type is
found to be different.
Also, the capacity might differ from the announced
capacity, so adjust it as needed.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/sd.c         | 120 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/scsi/sd.h         |  12 +++++
 include/scsi/scsi_proto.h |  17 +++++++
 3 files changed, 144 insertions(+), 5 deletions(-)
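
For anyone who wants to check the header parsing done by sd_read_zones()
outside the kernel, here is a minimal user-space sketch. It assumes the
offsets used in this patch: report length at byte 0, SAME field in the low
nibble of byte 4, maximum LBA at byte 8, and the first 64-byte zone
descriptor at byte 64 with the zone length at descriptor byte 8. The names
be32_at(), be64_at(), struct zone_report_info and parse_zone_report() are
hypothetical and not part of the patch. The sketch mirrors the check in the
code, which accepts SAME values 1 to 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for get_unaligned_be32()/get_unaligned_be64(). */
static uint32_t be32_at(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64_at(const unsigned char *p)
{
	return ((uint64_t)be32_at(p) << 32) | be32_at(p + 4);
}

/* Hypothetical result type for this example only. */
struct zone_report_info {
	uint8_t same;		/* SAME field from the report header */
	uint64_t max_lba;	/* maximum LBA from the report header */
	uint64_t zone_len;	/* first zone length in logical blocks */
};

/*
 * Returns 0 on success, 1 if the SAME type is one the patch warns
 * about (zone_len is then left at 0), and -1 if the buffer is too
 * short or the reported length is invalid.
 */
static int parse_zone_report(const unsigned char *buf, size_t len,
			     struct zone_report_info *info)
{
	assert(buf && info);

	if (len < 64 || be32_at(&buf[0]) < 64)
		return -1;

	info->same = buf[4] & 0xf;
	info->max_lba = be64_at(&buf[8]);
	info->zone_len = 0;

	if (info->same == 0 || info->same > 3)
		return 1;

	/* The first zone descriptor follows the 64-byte header. */
	if (len < 128)
		return -1;
	info->zone_len = be64_at(&buf[64 + 8]);
	return 0;
}
```

In the patch itself the zone length is additionally converted from logical
blocks to 512-byte sectors before it is passed to blk_queue_chunk_sectors().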

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 428c03e..249ea81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1972,6 +1972,57 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
+/**
+ * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
+ * @sdkp: SCSI disk to which the command should be send
+ * @buffer: response buffer
+ * @bufflen: length of @buffer
+ * @start_sector: logical sector for the zone information should be reported
+ * @option: option for report zones command
+ * @partial: flag to set 'partial' bit for report zones command
+ */
+static int
+sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
+		    int bufflen, sector_t start_sector,
+		    enum zbc_zone_reporting_options option, bool partial)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout
+		* SD_FLUSH_TIMEOUT_MULTIPLIER;
+	struct scsi_sense_hdr sshdr;
+	sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
+	unsigned char cmd[16];
+	int result;
+
+	if (!scsi_device_online(sdp)) {
+		sd_printk(KERN_INFO, sdkp, "device not online\n");
+		return -ENODEV;
+	}
+
+	sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
+		  start_lba, bufflen);
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(start_lba, &cmd[2]);
+	put_unaligned_be32(bufflen, &cmd[10]);
+	cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
+	memset(buffer, 0, bufflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				  buffer, bufflen, &sshdr,
+				  timeout, SD_MAX_RETRIES, NULL);
+
+	if (result) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "REPORT ZONES lba %zu failed with %d/%d\n",
+			  start_lba, host_byte(result), driver_byte(result));
+
+		return -EIO;
+	}
+	return 0;
+}
 
 /*
  * Determine whether disk supports Data Integrity Field.
@@ -2014,6 +2065,59 @@ static int sd_read_protection_type(struct scsi_disk *sdkp, unsigned char *buffer
 	return ret;
 }
 
+static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	int retval;
+	unsigned char *desc;
+	u32 rep_len;
+	u8 same;
+	u64 zone_len, lba;
+
+	if (sdkp->zoned != 1)
+		/* Device managed, no special handling required */
+		return;
+
+	retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
+				     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
+	if (retval < 0)
+		return;
+
+	rep_len = get_unaligned_be32(&buffer[0]);
+	if (rep_len < 64) {
+		sd_printk(KERN_WARNING, sdkp,
+			  "REPORT ZONES report invalid length %u\n",
+			  rep_len);
+		return;
+	}
+
+	if (sdkp->rc_basis == 0) {
+		/* The max_lba field is the capacity of a zoned device */
+		lba = get_unaligned_be64(&buffer[8]);
+		if (lba + 1 > sdkp->capacity) {
+			sd_printk(KERN_WARNING, sdkp,
+				  "Max LBA %zu (capacity %zu)\n",
+				  (sector_t) lba + 1, sdkp->capacity);
+			sdkp->capacity = lba + 1;
+		}
+	}
+
+	/*
+	 * Adjust 'chunk_sectors' to the zone length if the device
+	 * supports equal zone sizes.
+	 */
+	same = buffer[4] & 0xf;
+	if (same == 0 || same > 3) {
+		sd_printk(KERN_WARNING, sdkp,
+			  "REPORT ZONES SAME type %d not supported\n", same);
+		return;
+	}
+	/* Read the zone length from the first zone descriptor */
+	desc = &buffer[64];
+	zone_len = logical_to_sectors(sdkp->device,
+				      get_unaligned_be64(&desc[8]));
+	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
+}
+
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 			struct scsi_sense_hdr *sshdr, int sense_valid,
 			int the_result)
@@ -2122,6 +2226,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	/* Logical blocks per physical block exponent */
 	sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
 
+	/* RC basis */
+	sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
+
 	/* Lowest aligned logical block */
 	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
 	blk_queue_alignment_offset(sdp->request_queue, alignment);
@@ -2312,6 +2419,11 @@ got_data:
 		sector_size = 512;
 	}
 	blk_queue_logical_block_size(sdp->request_queue, sector_size);
+	blk_queue_physical_block_size(sdp->request_queue,
+				      sdkp->physical_block_size);
+	sdkp->device->sector_size = sector_size;
+
+	sd_read_zones(sdkp, buffer);
 
 	{
 		char cap_str_2[10], cap_str_10[10];
@@ -2338,9 +2450,6 @@ got_data:
 	if (sdkp->capacity > 0xffffffff)
 		sdp->use_16_for_rw = 1;
 
-	blk_queue_physical_block_size(sdp->request_queue,
-				      sdkp->physical_block_size);
-	sdkp->device->sector_size = sector_size;
 }
 
 /* called with buffer of length 512 */
@@ -2727,6 +2836,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
  out:
 	kfree(buffer);
 }
@@ -2825,14 +2935,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 * react badly if we do.
 	 */
 	if (sdkp->media_present) {
-		sd_read_capacity(sdkp, buffer);
-
 		if (scsi_device_supports_vpd(sdp)) {
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
 		}
 
+		sd_read_capacity(sdkp, buffer);
+
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 654630b..74ec357 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -94,6 +94,8 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	rc_basis: 2;
+	unsigned	zoned: 2;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
@@ -151,6 +153,16 @@ static inline sector_t logical_to_sectors(struct scsi_device *sdev, sector_t blo
 	return blocks << (ilog2(sdev->sector_size) - 9);
 }
 
+static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
+{
+	return sector >> (ilog2(sdev->sector_size) - 9);
+}
+
+static inline unsigned int logical_to_bytes(struct scsi_device *sdev, sector_t blocks)
+{
+	return blocks * sdev->sector_size;
+}
+
 /*
  * A DIF-capable target device can be formatted with different
  * protection schemes.  Currently 0 through 3 are defined:
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index d1defd1..6ba66e0 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -299,4 +299,21 @@ struct scsi_lun {
 #define SCSI_ACCESS_STATE_MASK        0x0f
 #define SCSI_ACCESS_STATE_PREFERRED   0x80
 
+/* Reporting options for REPORT ZONES */
+enum zbc_zone_reporting_options {
+	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED,
+	ZBC_ZONE_REPORTING_OPTION_FULL,
+	ZBC_ZONE_REPORTING_OPTION_READONLY,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
+	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+};
+
+#define ZBC_REPORT_ZONE_PARTIAL 0x80
+
 #endif /* _SCSI_PROTO_H_ */
-- 
1.8.5.6



* [PATCH 2/5] sd: Implement new RESET_WP provisioning mode
  2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
@ 2016-07-19 13:25 ` Hannes Reinecke
  2016-07-20  0:49   ` Damien Le Moal
  2016-07-19 13:25 ` [PATCH 3/5] sd: Implement support for ZBC devices Hannes Reinecke
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal,
	Hannes Reinecke

We can map the RESET WRITE POINTER command onto a 'discard'
request.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-----------
 drivers/scsi/sd.h |  1 +
 2 files changed, 53 insertions(+), 13 deletions(-)
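
As an illustration of the mapping, the following user-space sketch builds
the 16-byte CDB the way the SD_ZBC_RESET_WP case does: opcode, service
action, and the zone start LBA as a big-endian 64-bit value at byte 2. The
opcode and service action values (0x94 and 0x04) are assumptions taken from
the ZBC command assignments; the patch uses the existing ZBC_OUT and
ZO_RESET_WRITE_POINTER definitions instead. All helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, named differently to avoid implying kernel headers. */
#define EX_ZBC_OUT			0x94
#define EX_ZO_RESET_WRITE_POINTER	0x04

/* User-space stand-in for put_unaligned_be64(). */
static void put_be64(uint64_t v, unsigned char *p)
{
	for (int i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Integer log2 for a power-of-two block size. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Convert 512-byte block layer sectors to device logical blocks. */
static uint64_t sectors_to_lba(uint64_t sector, uint32_t block_size)
{
	assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);
	return sector >> (ilog2_u32(block_size) - 9);
}

/* Fill in a RESET WRITE POINTER CDB; the command carries no payload. */
static void build_reset_wp_cdb(unsigned char cdb[16], uint64_t sector,
			       uint32_t block_size)
{
	memset(cdb, 0, 16);
	cdb[0] = EX_ZBC_OUT;
	cdb[1] = EX_ZO_RESET_WRITE_POINTER;
	put_be64(sectors_to_lba(sector, block_size), &cdb[2]);
}
```

Because the command transfers no data, the patch skips the page allocation
and the payload setup for this provisioning mode.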

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 249ea81..52dda83 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -369,6 +369,7 @@ static const char *lbp_mode[] = {
 	[SD_LBP_WS16]		= "writesame_16",
 	[SD_LBP_WS10]		= "writesame_10",
 	[SD_LBP_ZERO]		= "writesame_zero",
+	[SD_ZBC_RESET_WP]	= "reset_wp",
 	[SD_LBP_DISABLE]	= "disabled",
 };
 
@@ -391,6 +392,13 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (sdkp->zoned == 1) {
+		if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
+			sd_config_discard(sdkp, SD_ZBC_RESET_WP);
+			return count;
+		}
+		return -EINVAL;
+	}
 	if (sdp->type != TYPE_DISK)
 		return -EINVAL;
 
@@ -683,6 +691,11 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 		q->limits.discard_zeroes_data = sdkp->lbprz;
 		break;
 
+	case SD_ZBC_RESET_WP:
+		max_blocks = sdkp->unmap_granularity;
+		q->limits.discard_zeroes_data = 1;
+		break;
+
 	case SD_LBP_ZERO:
 		max_blocks = min_not_zero(sdkp->max_ws_blocks,
 					  (u32)SD_MAX_WS10_BLOCKS);
@@ -711,16 +724,20 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 	unsigned int nr_sectors = blk_rq_sectors(rq);
 	unsigned int nr_bytes = blk_rq_bytes(rq);
 	unsigned int len;
-	int ret;
+	int ret = 0;
 	char *buf;
-	struct page *page;
+	struct page *page = NULL;
 
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
-	page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
-	if (!page)
-		return BLKPREP_DEFER;
+	if (sdkp->provisioning_mode != SD_ZBC_RESET_WP) {
+		page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+		if (!page)
+			return BLKPREP_DEFER;
+	}
+
+	rq->completion_data = page;
 
 	switch (sdkp->provisioning_mode) {
 	case SD_LBP_UNMAP:
@@ -760,12 +777,21 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 		len = sdkp->device->sector_size;
 		break;
 
+	case SD_ZBC_RESET_WP:
+		cmd->cmd_len = 16;
+		cmd->cmnd[0] = ZBC_OUT;
+		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+		put_unaligned_be64(sector, &cmd->cmnd[2]);
+		/* Reset Write Pointer doesn't have a payload */
+		len = 0;
+		cmd->sc_data_direction = DMA_NONE;
+		break;
+
 	default:
 		ret = BLKPREP_INVALID;
 		goto out;
 	}
 
-	rq->completion_data = page;
 	rq->timeout = SD_TIMEOUT;
 
 	cmd->transfersize = len;
@@ -779,13 +805,17 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 	 * discarded on disk. This allows us to report completion on the full
 	 * amount of blocks described by the request.
 	 */
-	blk_add_request_payload(rq, page, 0, len);
-	ret = scsi_init_io(cmd);
+	if (len) {
+		blk_add_request_payload(rq, page, 0, len);
+		ret = scsi_init_io(cmd);
+	}
 	rq->__data_len = nr_bytes;
 
 out:
-	if (ret != BLKPREP_OK)
+	if (page && ret != BLKPREP_OK) {
+		rq->completion_data = NULL;
 		__free_page(page);
+	}
 	return ret;
 }
 
@@ -1151,7 +1181,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
 
-	if (rq->cmd_flags & REQ_DISCARD)
+	if (rq->cmd_flags & REQ_DISCARD &&
+	    rq->completion_data)
 		__free_page(rq->completion_data);
 
 	if (SCpnt->cmnd != rq->cmd) {
@@ -1768,6 +1799,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	int sense_deferred = 0;
 	unsigned char op = SCpnt->cmnd[0];
 	unsigned char unmap = SCpnt->cmnd[1] & 8;
+	unsigned char sa = SCpnt->cmnd[1] & 0xf;
 
 	if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME) {
 		if (!result) {
@@ -1819,6 +1851,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			case UNMAP:
 				sd_config_discard(sdkp, SD_LBP_DISABLE);
 				break;
+			case ZBC_OUT:
+				if (sa == ZO_RESET_WRITE_POINTER)
+					sd_config_discard(sdkp, SD_LBP_DISABLE);
+				break;
 			case WRITE_SAME_16:
 			case WRITE_SAME:
 				if (unmap)
@@ -2113,9 +2149,12 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 	}
 	/* Read the zone length from the first zone descriptor */
 	desc = &buffer[64];
-	zone_len = logical_to_sectors(sdkp->device,
-				      get_unaligned_be64(&desc[8]));
-	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
+	zone_len = get_unaligned_be64(&desc[8]);
+	sdkp->unmap_alignment = zone_len;
+	sdkp->unmap_granularity = zone_len;
+	blk_queue_chunk_sectors(sdkp->disk->queue,
+				logical_to_sectors(sdkp->device, zone_len));
+	sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 74ec357..4439693 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -56,6 +56,7 @@ enum {
 	SD_LBP_WS16,		/* Use WRITE SAME(16) with UNMAP bit */
 	SD_LBP_WS10,		/* Use WRITE SAME(10) with UNMAP bit */
 	SD_LBP_ZERO,		/* Use WRITE SAME(10) with zero payload */
+	SD_ZBC_RESET_WP,	/* Use RESET WRITE POINTER */
 	SD_LBP_DISABLE,		/* Discard disabled due to failed cmd */
 };
 
-- 
1.8.5.6



* [PATCH 3/5] sd: Implement support for ZBC devices
  2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 2/5] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
@ 2016-07-19 13:25 ` Hannes Reinecke
  2016-07-20  0:54   ` Damien Le Moal
  2016-08-12  6:00   ` Shaun Tancheff
  2016-07-19 13:25 ` [PATCH 4/5] sd: Limit messages for ZBC disks capacity change Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 5/5] sd_zbc: Fix handling of ZBC read after write pointer Hannes Reinecke
  4 siblings, 2 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal,
	Hannes Reinecke

Implement ZBC support functions to read in the zone information
and set up the zone tree.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/Kconfig  |   8 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c     | 129 ++++++------
 drivers/scsi/sd.h     |  54 +++++
 drivers/scsi/sd_zbc.c | 538 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 670 insertions(+), 60 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c
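
To make the descriptor handling easier to follow, here is a simplified
user-space sketch of what zbc_desc_to_zone() extracts from one 64-byte zone
descriptor, including the write pointer fixup for EMPTY and FULL zones. The
offsets and condition codes follow this patch. Treating zone type 1
(conventional) as having no write pointer is an assumption standing in for
blk_zone_is_smr(). Unlike the patch, the sketch keeps all values in logical
blocks and does not allocate or lock anything. struct ex_zone and
parse_zone_desc() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ZONE_TYPE_CONVENTIONAL	0x1	/* assumed: no write pointer */
#define EX_ZONE_COND_EMPTY		0x1
#define EX_ZONE_COND_FULL		0xe

/* User-space stand-in for get_unaligned_be64(). */
static uint64_t be64_at(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical, simplified counterpart of struct blk_zone. */
struct ex_zone {
	uint8_t type;
	uint8_t cond;
	uint64_t start;	/* first logical block of the zone */
	uint64_t len;	/* zone length in logical blocks */
	uint64_t wp;	/* write pointer, UINT64_MAX if none */
};

static void parse_zone_desc(const unsigned char *rec, struct ex_zone *z)
{
	assert(rec && z);

	z->type = rec[0] & 0xf;
	z->cond = (rec[1] >> 4) & 0xf;
	z->len = be64_at(&rec[8]);
	z->start = be64_at(&rec[16]);
	z->wp = (z->type == EX_ZONE_TYPE_CONVENTIONAL) ?
		UINT64_MAX : be64_at(&rec[24]);

	/* Do not trust the reported write pointer for EMPTY or FULL zones. */
	if (z->cond == EX_ZONE_COND_EMPTY)
		z->wp = z->start;
	if (z->cond == EX_ZONE_COND_FULL)
		z->wp = z->start + z->len;
}
```

zbc_parse_zones() applies this step to each descriptor in the report buffer
and inserts the result into the zone tree, updating the write pointer and
state of an existing entry if the zone is already known.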

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 98e5d51..4b9a882 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -202,6 +202,14 @@ config SCSI_ENCLOSURE
 	  it has an enclosure device.  Selecting this option will just allow
 	  certain enclosure conditions to be reported and is not required.
 
+config SCSI_ZBC
+	bool "SCSI ZBC (zoned block commands) Support"
+	depends on SCSI && BLK_DEV_ZONED
+	help
+	  Enable support for ZBC (zoned block commands) devices.
+
+	  If unsure say N.
+
 config SCSI_CONSTANTS
 	bool "Verbose SCSI error reporting (kernel size += 36K)"
 	depends on SCSI
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 862ab4e..49bde97 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -178,6 +178,7 @@ hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_SCSI_ZBC) += sd_zbc.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 52dda83..f7b6132 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
@@ -162,7 +163,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 	static const char temp[] = "temporary ";
 	int len;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		/* no cache control on RBC devices; theoretically they
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
@@ -261,7 +262,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,7 +393,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdkp->zoned == 1) {
+	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
 		if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
 			sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 			return count;
@@ -466,7 +467,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &max);
@@ -778,6 +779,11 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 		break;
 
 	case SD_ZBC_RESET_WP:
+		/* sd_zbc_setup_discard uses block layer sector units */
+		ret = sd_zbc_setup_discard(sdkp, rq, blk_rq_pos(rq),
+					   blk_rq_sectors(rq));
+		if (ret != BLKPREP_OK)
+			goto out;
 		cmd->cmd_len = 16;
 		cmd->cmnd[0] = ZBC_OUT;
 		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
@@ -873,6 +879,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		if (ret != BLKPREP_OK)
+			return ret;
+	}
+
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -992,6 +1005,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
+		/* sd_zbc_setup_read_write uses block layer sector units */
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+
 	/*
 	 * If we have a 1K hardware sectorsize, prevent access to single
 	 * 512 byte sectors.  In theory we could handle this - in fact
@@ -1806,6 +1826,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			good_bytes = blk_rq_bytes(req);
 			scsi_set_resid(SCpnt, 0);
 		} else {
+#ifdef CONFIG_SCSI_ZBC
+			if (op == ZBC_OUT)
+				/* RESET WRITE POINTER failed */
+				sd_zbc_update_zones(sdkp,
+						    blk_rq_pos(req),
+						    512, true);
+#endif
 			good_bytes = 0;
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
@@ -1869,6 +1896,26 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 				}
 			}
 		}
+		if (sshdr.asc == 0x21) {
+			/*
+			 * ZBC: read beyond the write pointer position.
+			 * Clear out error and return the buffer as-is.
+			 */
+			if (sshdr.ascq == 0x06) {
+				good_bytes = blk_rq_bytes(req);
+				scsi_set_resid(SCpnt, 0);
+			}
+#ifdef CONFIG_SCSI_ZBC
+			/*
+			 * ZBC: Unaligned write command.
+			 * Write did not start a write pointer position.
+			 */
+			if (sshdr.ascq == 0x04)
+				sd_zbc_update_zones(sdkp,
+						    blk_rq_pos(req),
+						    512, true);
+#endif
+		}
 		break;
 	default:
 		break;
@@ -2008,58 +2055,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
-/**
- * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
- * @sdkp: SCSI disk to which the command should be send
- * @buffer: response buffer
- * @bufflen: length of @buffer
- * @start_sector: logical sector for the zone information should be reported
- * @option: option for report zones command
- * @partial: flag to set 'partial' bit for report zones command
- */
-static int
-sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
-		    int bufflen, sector_t start_sector,
-		    enum zbc_zone_reporting_options option, bool partial)
-{
-	struct scsi_device *sdp = sdkp->device;
-	const int timeout = sdp->request_queue->rq_timeout
-		* SD_FLUSH_TIMEOUT_MULTIPLIER;
-	struct scsi_sense_hdr sshdr;
-	sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
-	unsigned char cmd[16];
-	int result;
-
-	if (!scsi_device_online(sdp)) {
-		sd_printk(KERN_INFO, sdkp, "device not online\n");
-		return -ENODEV;
-	}
-
-	sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
-		  start_lba, bufflen);
-
-	memset(cmd, 0, 16);
-	cmd[0] = ZBC_IN;
-	cmd[1] = ZI_REPORT_ZONES;
-	put_unaligned_be64(start_lba, &cmd[2]);
-	put_unaligned_be32(bufflen, &cmd[10]);
-	cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
-	memset(buffer, 0, bufflen);
-
-	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
-				  buffer, bufflen, &sshdr,
-				  timeout, SD_MAX_RETRIES, NULL);
-
-	if (result) {
-		sd_printk(KERN_NOTICE, sdkp,
-			  "REPORT ZONES lba %zu failed with %d/%d\n",
-			  start_lba, host_byte(result), driver_byte(result));
-
-		return -EIO;
-	}
-	return 0;
-}
-
 /*
  * Determine whether disk supports Data Integrity Field.
  */
@@ -2109,8 +2104,11 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 	u8 same;
 	u64 zone_len, lba;
 
-	if (sdkp->zoned != 1)
-		/* Device managed, no special handling required */
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
+		/*
+		 * Device managed or normal SCSI disk,
+		 * no special handling required
+		 */
 		return;
 
 	retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
@@ -2155,6 +2153,8 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 	blk_queue_chunk_sectors(sdkp->disk->queue,
 				logical_to_sectors(sdkp->device, zone_len));
 	sd_config_discard(sdkp, SD_ZBC_RESET_WP);
+
+	sd_zbc_setup(sdkp, buffer, SD_BUF_SIZE);
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
@@ -2750,7 +2750,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return;
 
 	if (sdkp->protection_type == 0)
@@ -3180,9 +3180,16 @@ static int sd_probe(struct device *dev)
 
 	scsi_autopm_get_device(sdp);
 	error = -ENODEV;
-	if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
+	if (sdp->type != TYPE_DISK &&
+	    sdp->type != TYPE_ZBC &&
+	    sdp->type != TYPE_MOD &&
+	    sdp->type != TYPE_RBC)
 		goto out;
 
+#ifndef CONFIG_SCSI_ZBC
+	if (sdp->type == TYPE_ZBC)
+		goto out;
+#endif
 	SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
 					"sd_probe\n"));
 
@@ -3286,6 +3293,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	sd_zbc_remove(sdkp);
+
 	blk_register_region(devt, SD_MINORS, NULL,
 			    sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4439693..5827b62 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -65,6 +65,12 @@ struct scsi_disk {
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
+#ifdef CONFIG_SCSI_ZBC
+	struct workqueue_struct *zone_work_q;
+	unsigned long	zone_flags;
+#define SD_ZBC_ZONE_RESET 1
+#define SD_ZBC_ZONE_INIT  2
+#endif
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
@@ -277,4 +283,52 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+#ifdef CONFIG_SCSI_ZBC
+
+extern int sd_zbc_report_zones(struct scsi_disk *, unsigned char *, int,
+			       sector_t, enum zbc_zone_reporting_options, bool);
+extern int sd_zbc_setup(struct scsi_disk *, char *, int);
+extern void sd_zbc_remove(struct scsi_disk *);
+extern void sd_zbc_reset_zones(struct scsi_disk *);
+extern int sd_zbc_setup_discard(struct scsi_disk *, struct request *,
+				sector_t, unsigned int);
+extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
+				   sector_t, unsigned int);
+extern void sd_zbc_update_zones(struct scsi_disk *, sector_t, int, bool);
+extern void sd_zbc_refresh_zone_work(struct work_struct *);
+
+#else /* CONFIG_SCSI_ZBC */
+
+static inline int sd_zbc_report_zones(struct scsi_disk *sdkp,
+				      unsigned char *buf, int buf_len,
+				      sector_t start_sector,
+				      enum zbc_zone_reporting_options option,
+				      bool partial)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int sd_zbc_setup(struct scsi_disk *sdkp,
+			       unsigned char *buf, int buf_len)
+{
+	return 0;
+}
+
+static inline int sd_zbc_setup_discard(struct scsi_disk *sdkp,
+				       struct request *rq, sector_t sector,
+				       unsigned int num_sectors)
+{
+	return BLKPREP_OK;
+}
+
+static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
+					  struct request *rq, sector_t sector,
+					  unsigned int num_sectors)
+{
+	return BLKPREP_OK;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+#endif /* CONFIG_SCSI_ZBC */
+
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
new file mode 100644
index 0000000..75cef62
--- /dev/null
+++ b/drivers/scsi/sd_zbc.c
@@ -0,0 +1,538 @@
+/*
+ * sd_zbc.c - SCSI Zoned Block commands
+ *
+ * Copyright (C) 2014-2015 SUSE Linux GmbH
+ * Written by: Hannes Reinecke <hare@suse.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/rbtree.h>
+
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
+
+#include "sd.h"
+#include "scsi_priv.h"
+
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP,
+	ZBC_ZONE_COND_EMPTY,
+	ZBC_ZONE_COND_IMPLICIT_OPEN,
+	ZBC_ZONE_COND_EXPLICIT_OPEN,
+	ZBC_ZONE_COND_CLOSED,
+	ZBC_ZONE_COND_READONLY = 0xd,
+	ZBC_ZONE_COND_FULL,
+	ZBC_ZONE_COND_OFFLINE,
+};
+
+#define SD_ZBC_BUF_SIZE 524288
+
+#define sd_zbc_debug(sdkp, fmt, args...)				\
+	pr_debug("%s %s [%s]: " fmt,					\
+		 dev_driver_string(&(sdkp)->device->sdev_gendev),	\
+		 dev_name(&(sdkp)->device->sdev_gendev),		\
+		 (sdkp)->disk->disk_name, ## args)
+
+#define sd_zbc_debug_ratelimit(sdkp, fmt, args...)		\
+	do {							\
+		if (printk_ratelimit())				\
+			sd_zbc_debug(sdkp, fmt, ## args);	\
+	} while( 0 )
+
+struct zbc_update_work {
+	struct work_struct zone_work;
+	struct scsi_disk *sdkp;
+	sector_t	zone_sector;
+	int		zone_buflen;
+	char		zone_buf[0];
+};
+
+struct blk_zone *zbc_desc_to_zone(struct scsi_disk *sdkp, unsigned char *rec)
+{
+	struct blk_zone *zone;
+	enum zbc_zone_cond zone_cond;
+	sector_t wp = (sector_t)-1;
+
+	zone = kzalloc(sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zone)
+		return NULL;
+
+	spin_lock_init(&zone->lock);
+	zone->type = rec[0] & 0xf;
+	zone_cond = (rec[1] >> 4) & 0xf;
+	zone->len = logical_to_sectors(sdkp->device,
+				       get_unaligned_be64(&rec[8]));
+	zone->start = logical_to_sectors(sdkp->device,
+					 get_unaligned_be64(&rec[16]));
+
+	if (blk_zone_is_smr(zone)) {
+		wp = logical_to_sectors(sdkp->device,
+					get_unaligned_be64(&rec[24]));
+		if (zone_cond == ZBC_ZONE_COND_READONLY) {
+			zone->state = BLK_ZONE_READONLY;
+		} else if (zone_cond == ZBC_ZONE_COND_OFFLINE) {
+			zone->state = BLK_ZONE_OFFLINE;
+		} else {
+			zone->state = BLK_ZONE_OPEN;
+		}
+	} else
+		zone->state = BLK_ZONE_NO_WP;
+
+	zone->wp = wp;
+	/*
+	 * Fixup block zone state
+	 */
+	if (zone_cond == ZBC_ZONE_COND_EMPTY &&
+	    zone->wp != zone->start) {
+		sd_zbc_debug(sdkp,
+			     "zone %zu state EMPTY wp %zu: adjust wp\n",
+			     zone->start, zone->wp);
+		zone->wp = zone->start;
+	}
+	if (zone_cond == ZBC_ZONE_COND_FULL &&
+	    zone->wp != zone->start + zone->len) {
+		sd_zbc_debug(sdkp,
+			     "zone %zu state FULL wp %zu: adjust wp\n",
+			     zone->start, zone->wp);
+		zone->wp = zone->start + zone->len;
+	}
+
+	return zone;
+}
+
+sector_t zbc_parse_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			 unsigned int buf_len)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned char *rec = buf;
+	int rec_no = 0;
+	unsigned int list_length;
+	sector_t next_sector = -1;
+	u8 same;
+
+	/* Parse REPORT ZONES header */
+	list_length = get_unaligned_be32(&buf[0]);
+	same = buf[4] & 0xf;
+	rec = buf + 64;
+	list_length += 64;
+
+	if (list_length < buf_len)
+		buf_len = list_length;
+
+	while (rec < buf + buf_len) {
+		struct blk_zone *this, *old;
+		unsigned long flags;
+
+		this = zbc_desc_to_zone(sdkp, rec);
+		if (!this)
+			break;
+
+		next_sector = this->start + this->len;
+		old = blk_insert_zone(q, this);
+		if (old) {
+			spin_lock_irqsave(&old->lock, flags);
+			if (blk_zone_is_smr(old)) {
+				old->wp = this->wp;
+				old->state = this->state;
+			}
+			spin_unlock_irqrestore(&old->lock, flags);
+			kfree(this);
+		}
+		rec += 64;
+		rec_no++;
+	}
+
+	sd_zbc_debug(sdkp,
+		     "Inserted %d zones, next sector %zu len %d\n",
+		     rec_no, next_sector, list_length);
+
+	return next_sector;
+}
+
+void sd_zbc_refresh_zone_work(struct work_struct *work)
+{
+	struct zbc_update_work *zbc_work =
+		container_of(work, struct zbc_update_work, zone_work);
+	struct scsi_disk *sdkp = zbc_work->sdkp;
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned int zone_buflen;
+	int ret;
+	sector_t last_sector;
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+
+	zone_buflen = zbc_work->zone_buflen;
+	ret = sd_zbc_report_zones(sdkp, zbc_work->zone_buf, zone_buflen,
+				  zbc_work->zone_sector,
+				  ZBC_ZONE_REPORTING_OPTION_ALL, true);
+	if (ret)
+		goto done_free;
+
+	last_sector = zbc_parse_zones(sdkp, zbc_work->zone_buf, zone_buflen);
+	if (last_sector != -1 && last_sector < capacity) {
+		if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+			sd_zbc_debug(sdkp,
+				     "zones in reset, cancelling refresh\n");
+			ret = -EAGAIN;
+			goto done_free;
+		}
+
+		zbc_work->zone_sector = last_sector;
+		queue_work(sdkp->zone_work_q, &zbc_work->zone_work);
+		/* Kick request queue to be on the safe side */
+		goto done_start_queue;
+	}
+done_free:
+	kfree(zbc_work);
+	if (test_and_clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags) && ret) {
+		sd_zbc_debug(sdkp,
+			     "Cancelling zone initialisation\n");
+	}
+done_start_queue:
+	if (q->mq_ops)
+		blk_mq_start_hw_queues(q);
+	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
+/**
+ * sd_zbc_update_zones - Update zone information for @sector
+ * @sdkp: SCSI disk for which the zone information needs to be updated
+ * @sector: sector to be updated
+ * @bufsize: buffer size to be allocated
+ * @update: true if existing zones should be updated
+ */
+void sd_zbc_update_zones(struct scsi_disk *sdkp, sector_t sector, int bufsize,
+			 bool update)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	struct zbc_update_work *zbc_work;
+	struct blk_zone *zone;
+	struct rb_node *node;
+	int zone_num = 0, zone_busy = 0, num_rec;
+	sector_t next_sector = sector;
+
+	if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+		sd_zbc_debug(sdkp,
+			     "zones in reset, not starting update\n");
+		return;
+	}
+
+retry:
+	zbc_work = kzalloc(sizeof(struct zbc_update_work) + bufsize,
+			   update ? GFP_NOWAIT : GFP_KERNEL);
+	if (!zbc_work) {
+		if (bufsize > 512) {
+			sd_zbc_debug(sdkp,
+				     "retry with buffer size %d\n", bufsize);
+			bufsize = bufsize >> 1;
+			goto retry;
+		}
+		sd_zbc_debug(sdkp,
+			     "failed to allocate %d bytes\n", bufsize);
+		if (!update)
+			clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+		return;
+	}
+	zbc_work->zone_sector = sector;
+	zbc_work->zone_buflen = bufsize;
+	zbc_work->sdkp = sdkp;
+	INIT_WORK(&zbc_work->zone_work, sd_zbc_refresh_zone_work);
+	num_rec = (bufsize / 64) - 1;
+
+	/*
+	 * Mark zones under update as BUSY
+	 */
+	if (update) {
+		for (node = rb_first(&q->zones); node; node = rb_next(node)) {
+			unsigned long flags;
+
+			zone = rb_entry(node, struct blk_zone, node);
+			if (num_rec == 0)
+				break;
+			if (zone->start != next_sector)
+				continue;
+			next_sector += zone->len;
+			num_rec--;
+
+			spin_lock_irqsave(&zone->lock, flags);
+			if (blk_zone_is_smr(zone)) {
+				if (zone->state == BLK_ZONE_BUSY) {
+					zone_busy++;
+				} else {
+					zone->state = BLK_ZONE_BUSY;
+					zone->wp = zone->start;
+				}
+				zone_num++;
+			}
+			spin_unlock_irqrestore(&zone->lock, flags);
+		}
+		if (zone_num && (zone_num == zone_busy)) {
+			sd_zbc_debug(sdkp,
+				     "zone update for %zu in progress\n",
+				     sector);
+			kfree(zbc_work);
+			return;
+		}
+	}
+
+	if (!queue_work(sdkp->zone_work_q, &zbc_work->zone_work)) {
+		sd_zbc_debug(sdkp,
+			     "zone update already queued?\n");
+		kfree(zbc_work);
+	}
+}
+
+/**
+ * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
+ * @sdkp: SCSI disk to which the command should be sent
+ * @buffer: response buffer
+ * @bufflen: length of @buffer
+ * @start_sector: logical sector for which the zone information should be reported
+ * @option: reporting option to be used
+ * @partial: flag to set the 'partial' bit for report zones command
+ */
+int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
+			int bufflen, sector_t start_sector,
+			enum zbc_zone_reporting_options option, bool partial)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout
+			* SD_FLUSH_TIMEOUT_MULTIPLIER;
+	struct scsi_sense_hdr sshdr;
+	sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
+	unsigned char cmd[16];
+	int result;
+
+	if (!scsi_device_online(sdp))
+		return -ENODEV;
+
+	sd_zbc_debug(sdkp, "REPORT ZONES lba %zu len %d\n",
+		     start_lba, bufflen);
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(start_lba, &cmd[2]);
+	put_unaligned_be32(bufflen, &cmd[10]);
+	cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
+	memset(buffer, 0, bufflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				buffer, bufflen, &sshdr,
+				timeout, SD_MAX_RETRIES, NULL);
+
+	if (result) {
+		sd_zbc_debug(sdkp,
+			     "REPORT ZONES lba %zu failed with %d/%d\n",
+			     start_lba, host_byte(result), driver_byte(result));
+		return -EIO;
+	}
+
+	return 0;
+}
+
+int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
+			 sector_t sector, unsigned int num_sectors)
+{
+	struct blk_zone *zone;
+	int ret = BLKPREP_OK;
+	unsigned long flags;
+
+	zone = blk_lookup_zone(rq->q, sector);
+	if (!zone)
+		return BLKPREP_KILL;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	if (zone->state == BLK_ZONE_UNKNOWN ||
+	    zone->state == BLK_ZONE_BUSY) {
+		sd_zbc_debug_ratelimit(sdkp,
+				       "Discarding zone %zu state %x, deferring\n",
+				       zone->start, zone->state);
+		ret = BLKPREP_DEFER;
+		goto out;
+	}
+	if (zone->state == BLK_ZONE_OFFLINE) {
+		/* let the drive fail the command */
+		sd_zbc_debug_ratelimit(sdkp,
+				       "Discarding offline zone %zu\n",
+				       zone->start);
+		goto out;
+	}
+
+	if (!blk_zone_is_smr(zone)) {
+		sd_zbc_debug_ratelimit(sdkp,
+				       "Discarding %s zone %zu\n",
+				       blk_zone_is_cmr(zone) ? "CMR" : "unknown",
+				       zone->start);
+		ret = BLKPREP_DONE;
+		goto out;
+	}
+	if (blk_zone_is_empty(zone)) {
+		sd_zbc_debug_ratelimit(sdkp,
+				       "Discarding empty zone %zu\n",
+				       zone->start);
+		ret = BLKPREP_DONE;
+		goto out;
+	}
+
+	if (zone->start != sector ||
+	    zone->len < num_sectors) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Misaligned RESET WP, start %zu/%zu "
+			  "len %zu/%u\n",
+			  zone->start, sector, zone->len, num_sectors);
+		ret = BLKPREP_KILL;
+		goto out;
+	}
+
+	/*
+	 * Opportunistic setting, will be fixed up with
+	 * zone update if RESET WRITE POINTER fails.
+	 */
+	zone->wp = zone->start;
+
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
+
+int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
+			    sector_t sector, unsigned int num_sectors)
+{
+	struct blk_zone *zone;
+	int ret = BLKPREP_OK;
+	unsigned long flags;
+
+	zone = blk_lookup_zone(sdkp->disk->queue, sector);
+	if (!zone) {
+		/* Might happen during zone initialization */
+		sd_zbc_debug_ratelimit(sdkp,
+				       "zone for sector %zu not found, skipping\n",
+				       sector);
+		return BLKPREP_OK;
+	}
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	if (zone->state == BLK_ZONE_UNKNOWN ||
+	    zone->state == BLK_ZONE_BUSY) {
+		sd_zbc_debug_ratelimit(sdkp,
+				       "zone %zu state %x, deferring\n",
+				       zone->start, zone->state);
+		ret = BLKPREP_DEFER;
+		goto out;
+	}
+	if (zone->state == BLK_ZONE_OFFLINE) {
+		/* let the drive fail the command */
+		sd_zbc_debug_ratelimit(sdkp,
+				       "zone %zu offline\n",
+				       zone->start);
+		goto out;
+	}
+
+	if (rq->cmd_flags & (REQ_WRITE | REQ_WRITE_SAME)) {
+		if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+			goto out;
+		if (zone->state == BLK_ZONE_READONLY)
+			goto out;
+		if (blk_zone_is_full(zone)) {
+			sd_zbc_debug(sdkp,
+				     "Write to full zone %zu/%zu\n",
+				     sector, zone->wp);
+			ret = BLKPREP_KILL;
+			goto out;
+		}
+		if (zone->wp != sector) {
+			sd_zbc_debug(sdkp,
+				     "Misaligned write %zu/%zu\n",
+				     sector, zone->wp);
+			ret = BLKPREP_KILL;
+			goto out;
+		}
+		zone->wp += num_sectors;
+	} else if (blk_zone_is_smr(zone) && (zone->wp <= sector)) {
+		sd_zbc_debug(sdkp,
+			     "Read beyond wp %zu/%zu\n",
+			     sector, zone->wp);
+		ret = BLKPREP_DONE;
+	}
+
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
+
+int sd_zbc_setup(struct scsi_disk *sdkp, char *buf, int buf_len)
+{
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+	sector_t last_sector;
+
+	if (test_and_set_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags)) {
+		sdev_printk(KERN_WARNING, sdkp->device,
+			    "zone initialisation already running\n");
+		return 0;
+	}
+
+	if (!sdkp->zone_work_q) {
+		char wq_name[32];
+
+		sprintf(wq_name, "zbc_wq_%s", sdkp->disk->disk_name);
+		sdkp->zone_work_q = create_singlethread_workqueue(wq_name);
+		if (!sdkp->zone_work_q) {
+			sdev_printk(KERN_WARNING, sdkp->device,
+				    "create zoned disk workqueue failed\n");
+			return -ENOMEM;
+		}
+	} else if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+		drain_workqueue(sdkp->zone_work_q);
+		clear_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags);
+	}
+
+	last_sector = zbc_parse_zones(sdkp, buf, buf_len);
+	if (last_sector != -1 && last_sector < capacity) {
+		sd_zbc_update_zones(sdkp, last_sector, SD_ZBC_BUF_SIZE, false);
+	} else
+		clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+
+	return 0;
+}
+
+void sd_zbc_remove(struct scsi_disk *sdkp)
+{
+	if (sdkp->zone_work_q) {
+		if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags))
+			drain_workqueue(sdkp->zone_work_q);
+		clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+		destroy_workqueue(sdkp->zone_work_q);
+	}
+}
-- 
1.8.5.6
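
A minimal userspace sketch of the zone descriptor layout that
zbc_desc_to_zone() above reads, assuming the byte offsets used in the
patch (type in rec[0], condition in rec[1], length/start/write pointer
as big-endian 64-bit values at offsets 8, 16 and 24). The names
zone_info, load_be64 and parse_zone_desc are hypothetical, and values
stay in logical blocks instead of being converted to 512-byte sectors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields used by zbc_desc_to_zone(). */
struct zone_info {
	uint8_t  type;	/* low nibble of rec[0] */
	uint8_t  cond;	/* high nibble of rec[1] */
	uint64_t len;	/* zone length, in logical blocks */
	uint64_t start;	/* zone start LBA */
	uint64_t wp;	/* write pointer LBA */
};

/* Read a big-endian 64-bit value from a possibly unaligned buffer. */
static inline uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one 64-byte zone descriptor into *z. */
void parse_zone_desc(const unsigned char *rec, struct zone_info *z)
{
	z->type  = rec[0] & 0xf;
	z->cond  = (rec[1] >> 4) & 0xf;
	z->len   = load_be64(&rec[8]);
	z->start = load_be64(&rec[16]);
	z->wp    = load_be64(&rec[24]);
}
```

In the patch the descriptors start 64 bytes into the REPORT ZONES
buffer and are 64 bytes each, so a caller would step through the
buffer in 64-byte strides after skipping the header.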


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] sd: Limit messages for ZBC disks capacity change
  2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
                   ` (2 preceding siblings ...)
  2016-07-19 13:25 ` [PATCH 3/5] sd: Implement support for ZBC devices Hannes Reinecke
@ 2016-07-19 13:25 ` Hannes Reinecke
  2016-07-19 13:25 ` [PATCH 5/5] sd_zbc: Fix handling of ZBC read after write pointer Hannes Reinecke
  4 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal

From: Damien Le Moal <damien.lemoal@hgst.com>

For ZBC disks with RC_BASIS=0, print the message indicating
the capacity adjustment only on the first scan of the disk.

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
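A minimal userspace sketch of the capacity adjustment this patch
touches, assuming the REPORT ZONES header layout used in sd_read_zones()
(maximum LBA as a big-endian 64-bit value at byte offset 8). The names
adjust_capacity and load_be64 are hypothetical, and fprintf() stands in
for sd_printk().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Read a big-endian 64-bit value from a possibly unaligned buffer. */
static inline uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return the capacity (in logical blocks) after applying the
 * REPORT ZONES maximum LBA. The adjustment only applies when
 * rc_basis is 0, and the message is printed on the first scan only.
 */
uint64_t adjust_capacity(const unsigned char *hdr, uint64_t capacity,
			 unsigned int rc_basis, bool first_scan)
{
	uint64_t max_lba;

	if (rc_basis != 0)
		return capacity;

	max_lba = load_be64(&hdr[8]);
	if (max_lba + 1 > capacity) {
		if (first_scan)
			fprintf(stderr,
				"Changing capacity from %llu to Max LBA+1 %llu\n",
				(unsigned long long)capacity,
				(unsigned long long)(max_lba + 1));
		capacity = max_lba + 1;
	}
	return capacity;
}
```

The capacity is still updated on every rescan; only the log message is
suppressed after the first one.
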
 drivers/scsi/sd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f7b6132..3a9d96e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2128,9 +2128,10 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 		/* The max_lba field is the capacity of a zoned device */
 		lba = get_unaligned_be64(&buffer[8]);
 		if (lba + 1 > sdkp->capacity) {
-			sd_printk(KERN_WARNING, sdkp,
-				  "Max LBA %zu (capacity %zu)\n",
-				  (sector_t) lba + 1, sdkp->capacity);
+			if (sdkp->first_scan)
+				sd_printk(KERN_WARNING, sdkp,
+					  "Changing capacity from %zu to Max LBA+1 %zu\n",
+					  sdkp->capacity, (sector_t) lba + 1);
 			sdkp->capacity = lba + 1;
 		}
 	}
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] sd_zbc: Fix handling of ZBC read after write pointer
  2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
                   ` (3 preceding siblings ...)
  2016-07-19 13:25 ` [PATCH 4/5] sd: Limit messages for ZBC disks capacity change Hannes Reinecke
@ 2016-07-19 13:25 ` Hannes Reinecke
  4 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-19 13:25 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal

From: Damien Le Moal <damien.lemoal@hgst.com>

For read requests beyond the write pointer of a sequential
write required zone, zero out the request buffer and
complete the command directly. For read requests
straddling the write pointer position, limit the
request size to the number of valid sectors. The
remaining sectors will be processed as a second request
and their buffer zeroed out.

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
---
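A minimal userspace sketch of the read handling described above,
assuming plain integers in place of struct blk_zone and struct request.
The enum read_action and the function clamp_read_to_wp are hypothetical
names, not part of the patch.

```c
#include <stdint.h>

enum read_action {
	READ_PASS,	/* request lies entirely below the write pointer */
	READ_CLAMPED,	/* request shortened to end at the write pointer */
	READ_ZERO_FILL	/* request starts at or beyond the write pointer */
};

/*
 * Decide how to handle a read of *num_sectors sectors starting at
 * 'sector' in a sequential write required zone whose write pointer
 * is 'wp'. On READ_CLAMPED, *num_sectors is reduced to the number of
 * valid sectors. The patch tests 'wp <= sector + sectors', which for
 * a read ending exactly at the write pointer leaves the length
 * unchanged; this sketch reports that case as READ_PASS.
 */
enum read_action clamp_read_to_wp(uint64_t sector, uint32_t *num_sectors,
				  uint64_t wp)
{
	if (wp <= sector)
		return READ_ZERO_FILL;

	if (wp < sector + *num_sectors) {
		*num_sectors = (uint32_t)(wp - sector);
		return READ_CLAMPED;
	}

	return READ_PASS;
}
```

For example, an 8-sector read at sector 100 with the write pointer at
104 is clamped to 4 sectors; the remaining 4 sectors then arrive as a
new request that starts at the write pointer and is zero-filled.
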
 drivers/scsi/sd.c     |  4 ++--
 drivers/scsi/sd.h     |  4 ++--
 drivers/scsi/sd_zbc.c | 33 +++++++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3a9d96e..4b704b0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -881,7 +881,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
 		/* sd_zbc_setup_read_write uses block layer sector units */
-		ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
+		ret = sd_zbc_setup_read_write(sdkp, rq, sector, &nr_sectors);
 		if (ret != BLKPREP_OK)
 			return ret;
 	}
@@ -1007,7 +1007,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 
 	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
 		/* sd_zbc_setup_read_write uses block layer sector units */
-		ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
+		ret = sd_zbc_setup_read_write(sdkp, rq, block, &this_count);
 		if (ret != BLKPREP_OK)
 			goto out;
 	}
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5827b62..47f922f 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -293,7 +293,7 @@ extern void sd_zbc_reset_zones(struct scsi_disk *);
 extern int sd_zbc_setup_discard(struct scsi_disk *, struct request *,
 				sector_t, unsigned int);
 extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
-				   sector_t, unsigned int);
+				   sector_t, unsigned int *);
 extern void sd_zbc_update_zones(struct scsi_disk *, sector_t, int, bool);
 extern void sd_zbc_refresh_zone_work(struct work_struct *);
 
@@ -323,7 +323,7 @@ static inline int sd_zbc_setup_discard(struct scsi_disk *sdkp,
 
 static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
 					  struct request *rq, sector_t sector,
-					  unsigned int num_sectors)
+					  unsigned int *num_sectors)
 {
 	return BLKPREP_OK;
 }
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 75cef62..0485c61 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -426,9 +426,10 @@ out:
 }
 
 int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
-			    sector_t sector, unsigned int num_sectors)
+			    sector_t sector, unsigned int *num_sectors)
 {
 	struct blk_zone *zone;
+	unsigned int sectors = *num_sectors;
 	int ret = BLKPREP_OK;
 	unsigned long flags;
 
@@ -478,12 +479,32 @@ int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
 			ret = BLKPREP_KILL;
 			goto out;
 		}
-		zone->wp += num_sectors;
-	} else if (blk_zone_is_smr(zone) && (zone->wp <= sector)) {
+		zone->wp += sectors;
+	} else if (zone->type == BLK_ZONE_TYPE_SEQWRITE_REQ &&
+		   zone->wp <= sector + sectors) {
+		if (zone->wp <= sector) {
+			/* Read beyond WP: clear request buffer */
+			struct req_iterator iter;
+			struct bio_vec bvec;
+			void *buf;
+			sd_zbc_debug(sdkp,
+				     "Read beyond wp %zu+%u/%zu\n",
+				     sector, sectors, zone->wp);
+			rq_for_each_segment(bvec, rq, iter) {
+				buf = bvec_kmap_irq(&bvec, &flags);
+				memset(buf, 0, bvec.bv_len);
+				flush_dcache_page(bvec.bv_page);
+				bvec_kunmap_irq(buf, &flags);
+			}
+			ret = BLKPREP_DONE;
+			goto out;
+		}
+		/* Read straddle WP position: limit request size */
+		*num_sectors = zone->wp - sector;
 		sd_zbc_debug(sdkp,
-			     "Read beyond wp %zu/%zu\n",
-			     sector, zone->wp);
-		ret = BLKPREP_DONE;
+			     "Read straddle wp %zu+%u/%zu => %zu+%u\n",
+			     sector, sectors, zone->wp,
+			     sector, *num_sectors);
 	}
 
 out:
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
@ 2016-07-20  0:46   ` Damien Le Moal
  2016-07-22 21:56   ` Ewan D. Milne
  2016-08-01 14:24   ` Shaun Tancheff
  2 siblings, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2016-07-20  0:46 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke


On 7/19/16 22:25, Hannes Reinecke wrote:
> For ZBC devices I/O must not cross zone boundaries, so setup
> the 'chunk_sectors' block queue setting to the zone size.
> This is only valid for REPORT ZONES SAME type 2 or 3;
> for other types the zone sizes might be different
> for individual zones. So issue a warning if the type is
> found to be different.
> Also the capacity might be different from the announced
> capacity, so adjust it as needed.
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/sd.c         | 120 ++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/scsi/sd.h         |  12 +++++
>  include/scsi/scsi_proto.h |  17 +++++++
>  3 files changed, 144 insertions(+), 5 deletions(-)

Reviewed-by: Damien Le Moal <damien.lemoal@hgst.com>
Tested-by: Damien Le Moal <damien.lemoal@hgst.com>

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] sd: Implement new RESET_WP provisioning mode
  2016-07-19 13:25 ` [PATCH 2/5] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
@ 2016-07-20  0:49   ` Damien Le Moal
  2016-07-20 14:52     ` Hannes Reinecke
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Le Moal @ 2016-07-20  0:49 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig

Hi Hannes,

On 7/19/16 22:25, Hannes Reinecke wrote:
> We can map the RESET WRITE POINTER command onto a 'discard'
> request.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/sd.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-----------
>  drivers/scsi/sd.h |  1 +
>  2 files changed, 53 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 249ea81..52dda83 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -369,6 +369,7 @@ static const char *lbp_mode[] = {
>  	[SD_LBP_WS16]		= "writesame_16",
>  	[SD_LBP_WS10]		= "writesame_10",
>  	[SD_LBP_ZERO]		= "writesame_zero",
> +	[SD_ZBC_RESET_WP]	= "reset_wp",
>  	[SD_LBP_DISABLE]	= "disabled",
>  };
>
> @@ -391,6 +392,13 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EACCES;
>
> +	if (sdkp->zoned == 1) {
> +		if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
> +			sd_config_discard(sdkp, SD_ZBC_RESET_WP);
> +			return count;
> +		}
> +		return -EINVAL;
> +	}
>  	if (sdp->type != TYPE_DISK)
>  		return -EINVAL;
>
> @@ -683,6 +691,11 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
>  		q->limits.discard_zeroes_data = sdkp->lbprz;
>  		break;
>
> +	case SD_ZBC_RESET_WP:
> +		max_blocks = sdkp->unmap_granularity;
> +		q->limits.discard_zeroes_data = 1;
> +		break;
> +
>  	case SD_LBP_ZERO:
>  		max_blocks = min_not_zero(sdkp->max_ws_blocks,
>  					  (u32)SD_MAX_WS10_BLOCKS);

I am still wondering if setting discard_zeroes_data to 1 is the right 
choice here since nothing will happen for conventional zones (no 
zeroing, no reset, nothing). discard_zeroes_data=0 may be a safer 
choice, even though I have not hit any issue with it set to 1.

Reviewed-by: Damien Le Moal <damien.lemoal@hgst.com>
Tested-by: Damien Le Moal <damien.lemoal@hgst.com>

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] sd: Implement support for ZBC devices
  2016-07-19 13:25 ` [PATCH 3/5] sd: Implement support for ZBC devices Hannes Reinecke
@ 2016-07-20  0:54   ` Damien Le Moal
  2016-08-12  6:00   ` Shaun Tancheff
  1 sibling, 0 replies; 19+ messages in thread
From: Damien Le Moal @ 2016-07-20  0:54 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig


On 7/19/16 22:25, Hannes Reinecke wrote:
> Implement ZBC support functions to read in the zone information
> and setup the zone tree.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/Kconfig  |   8 +
>  drivers/scsi/Makefile |   1 +
>  drivers/scsi/sd.c     | 129 ++++++------
>  drivers/scsi/sd.h     |  54 +++++
>  drivers/scsi/sd_zbc.c | 538 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 670 insertions(+), 60 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
[...]
> +/**
> + * sd_zbc_update_zones - Update zone information for @sector
> + * @sdkp: SCSI disk for which the zone information needs to be updated
> + * @sector: sector to be updated
> + * @bufsize: buffersize to be allocated
> + * @update: true if existing zones should be updated
> + */
> +void sd_zbc_update_zones(struct scsi_disk *sdkp, sector_t sector, int bufsize,
> +			 bool update)
> +{
> +	struct request_queue *q = sdkp->disk->queue;
> +	struct zbc_update_work *zbc_work;
> +	struct blk_zone *zone;
> +	struct rb_node *node;
> +	int zone_num = 0, zone_busy = 0, num_rec;
> +	sector_t next_sector = sector;
> +
> +	if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +		sd_zbc_debug(sdkp,
> +			     "zones in reset, not starting update\n");
> +		return;
> +	}
> +
> +retry:
> +	zbc_work = kzalloc(sizeof(struct zbc_update_work) + bufsize,
> +			   update ? GFP_NOWAIT : GFP_KERNEL);

Christoph suggested that GFP_ATOMIC is more appropriate here instead of 
GFP_NOWAIT.

Reviewed-by: Damien Le Moal <damien.lemoal@hgst.com>
Tested-by: Damien Le Moal <damien.lemoal@hgst.com>

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] sd: Implement new RESET_WP provisioning mode
  2016-07-20  0:49   ` Damien Le Moal
@ 2016-07-20 14:52     ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-20 14:52 UTC (permalink / raw)
  To: Damien Le Moal, Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig

On 07/20/2016 02:49 AM, Damien Le Moal wrote:
> Hi Hannes,
> 
> On 7/19/16 22:25, Hannes Reinecke wrote:
>> We can map the RESET WRITE POINTER command onto a 'discard'
>> request.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/sd.c | 65
>> ++++++++++++++++++++++++++++++++++++++++++++-----------
>>  drivers/scsi/sd.h |  1 +
>>  2 files changed, 53 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 249ea81..52dda83 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -369,6 +369,7 @@ static const char *lbp_mode[] = {
>>      [SD_LBP_WS16]        = "writesame_16",
>>      [SD_LBP_WS10]        = "writesame_10",
>>      [SD_LBP_ZERO]        = "writesame_zero",
>> +    [SD_ZBC_RESET_WP]    = "reset_wp",
>>      [SD_LBP_DISABLE]    = "disabled",
>>  };
>>
>> @@ -391,6 +392,13 @@ provisioning_mode_store(struct device *dev,
>> struct device_attribute *attr,
>>      if (!capable(CAP_SYS_ADMIN))
>>          return -EACCES;
>>
>> +    if (sdkp->zoned == 1) {
>> +        if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
>> +            sd_config_discard(sdkp, SD_ZBC_RESET_WP);
>> +            return count;
>> +        }
>> +        return -EINVAL;
>> +    }
>>      if (sdp->type != TYPE_DISK)
>>          return -EINVAL;
>>
>> @@ -683,6 +691,11 @@ static void sd_config_discard(struct scsi_disk
>> *sdkp, unsigned int mode)
>>          q->limits.discard_zeroes_data = sdkp->lbprz;
>>          break;
>>
>> +    case SD_ZBC_RESET_WP:
>> +        max_blocks = sdkp->unmap_granularity;
>> +        q->limits.discard_zeroes_data = 1;
>> +        break;
>> +
>>      case SD_LBP_ZERO:
>>          max_blocks = min_not_zero(sdkp->max_ws_blocks,
>>                        (u32)SD_MAX_WS10_BLOCKS);
> 
> I am still wondering if setting discard_zeroes_data to 1 is the right
> choice here since nothing will happen for conventional zones (no
> zeroing, no reset, nothing). discard_zeroes_data=0 may be a safer
> choice, even though I have not hit any issue with it set to 1.
> 
This setting needs to be reviewed once hch's zero-out patches are in.
The thing is, we need to properly differentiate between 'discard' and
'zero-out'. Unfortunately, at the moment the libata stack only implements
a translation for 'write_same', which is then mapped onto DSM TRIM.
So 'write_discard_zeroes' is incorrect even for current libata usage.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
  2016-07-20  0:46   ` Damien Le Moal
@ 2016-07-22 21:56   ` Ewan D. Milne
  2016-07-23 20:31     ` Hannes Reinecke
  2016-08-01 14:24   ` Shaun Tancheff
  2 siblings, 1 reply; 19+ messages in thread
From: Ewan D. Milne @ 2016-07-22 21:56 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke

On Tue, 2016-07-19 at 15:25 +0200, Hannes Reinecke wrote:
> For ZBC devices I/O must not cross zone boundaries, so setup
> the 'chunk_sectors' block queue setting to the zone size.
> This is only valid for REPORT ZONES SAME type 2 or 3;
> for other types the zone sizes might be different
> for individual zones. So issue a warning if the type is
> found to be different.
> Also the capacity might be different from the announced
> capacity, so adjust it as needed.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>

...

> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
> +{
> +	int retval;
> +	unsigned char *desc;
> +	u32 rep_len;
> +	u8 same;
> +	u64 zone_len, lba;
> +
> +	if (sdkp->zoned != 1)
> +		/* Device managed, no special handling required */
> +		return;
> +
> +	retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
> +				     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
> +	if (retval < 0)
> +		return;
> +
> +	rep_len = get_unaligned_be32(&buffer[0]);
> +	if (rep_len < 64) {
> +		sd_printk(KERN_WARNING, sdkp,
> +			  "REPORT ZONES report invalid length %u\n",
> +			  rep_len);
> +		return;
> +	}
> +
> +	if (sdkp->rc_basis == 0) {
> +		/* The max_lba field is the capacity of a zoned device */
> +		lba = get_unaligned_be64(&buffer[8]);
> +		if (lba + 1 > sdkp->capacity) {
> +			sd_printk(KERN_WARNING, sdkp,
> +				  "Max LBA %zu (capacity %zu)\n",
> +				  (sector_t) lba + 1, sdkp->capacity);
> +			sdkp->capacity = lba + 1;
> +		}
> +	}
> +
> +	/*
> +	 * Adjust 'chunk_sectors' to the zone length if the device
> +	 * supports equal zone sizes.
> +	 */
> +	same = buffer[4] & 0xf;
> +	if (same == 0 || same > 3) {
> +		sd_printk(KERN_WARNING, sdkp,
> +			  "REPORT ZONES SAME type %d not supported\n", same);
> +		return;
> +	}
> +	/* Read the zone length from the first zone descriptor */
> +	desc = &buffer[64];
> +	zone_len = logical_to_sectors(sdkp->device,
> +				      get_unaligned_be64(&desc[8]));
> +	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
> +}
> +

So, blk_queue_chunk_sectors() has:

void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
{
        BUG_ON(!is_power_of_2(chunk_sectors));
        q->limits.chunk_sectors = chunk_sectors;
}

and it seems like if some device reports a non-power-of-2 zone_len then we
will BUG_ON().  Probably would be better if we reported an error instead?

-Ewan
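
A minimal userspace sketch of the kind of check suggested above,
assuming the caller would skip blk_queue_chunk_sectors() and report an
error when the check fails. The function name zone_len_is_valid_chunk
is hypothetical; the power-of-two test mirrors what is_power_of_2()
checks, and the range test reflects the unsigned int parameter shown
in the prototype above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if zone_len (in 512-byte sectors) could be passed to
 * blk_queue_chunk_sectors() without tripping its BUG_ON() and
 * without being truncated by the unsigned int parameter.
 */
bool zone_len_is_valid_chunk(uint64_t zone_len)
{
	if (zone_len == 0 || zone_len > UINT_MAX)
		return false;

	/* A power of two has exactly one bit set. */
	return (zone_len & (zone_len - 1)) == 0;
}
```

A 256 MiB zone (524288 sectors) passes the check, while a zone of
500000 sectors would be rejected with a warning instead of a crash.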



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-22 21:56   ` Ewan D. Milne
@ 2016-07-23 20:31     ` Hannes Reinecke
  2016-07-23 22:04       ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-23 20:31 UTC (permalink / raw)
  To: emilne, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal

On 07/22/2016 11:56 PM, Ewan D. Milne wrote:
> On Tue, 2016-07-19 at 15:25 +0200, Hannes Reinecke wrote:
>> For ZBC devices I/O must not cross zone boundaries, so setup
>> the 'chunk_sectors' block queue setting to the zone size.
>> This is only valid for REPORT ZONES SAME type 2 or 3;
>> for other types the zone sizes might be different
>> for individual zones. So issue a warning if the type is
>> found to be different.
>> Also the capacity might be different from the announced
>> capacity, so adjust it as needed.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
> 
> ...
> 
>> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
>> +{
>> +	int retval;
>> +	unsigned char *desc;
>> +	u32 rep_len;
>> +	u8 same;
>> +	u64 zone_len, lba;
>> +
>> +	if (sdkp->zoned != 1)
>> +		/* Device managed, no special handling required */
>> +		return;
>> +
>> +	retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
>> +				     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
>> +	if (retval < 0)
>> +		return;
>> +
>> +	rep_len = get_unaligned_be32(&buffer[0]);
>> +	if (rep_len < 64) {
>> +		sd_printk(KERN_WARNING, sdkp,
>> +			  "REPORT ZONES report invalid length %u\n",
>> +			  rep_len);
>> +		return;
>> +	}
>> +
>> +	if (sdkp->rc_basis == 0) {
>> +		/* The max_lba field is the capacity of a zoned device */
>> +		lba = get_unaligned_be64(&buffer[8]);
>> +		if (lba + 1 > sdkp->capacity) {
>> +			sd_printk(KERN_WARNING, sdkp,
>> +				  "Max LBA %zu (capacity %zu)\n",
>> +				  (sector_t) lba + 1, sdkp->capacity);
>> +			sdkp->capacity = lba + 1;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Adjust 'chunk_sectors' to the zone length if the device
>> +	 * supports equal zone sizes.
>> +	 */
>> +	same = buffer[4] & 0xf;
>> +	if (same == 0 || same > 3) {
>> +		sd_printk(KERN_WARNING, sdkp,
>> +			  "REPORT ZONES SAME type %d not supported\n", same);
>> +		return;
>> +	}
>> +	/* Read the zone length from the first zone descriptor */
>> +	desc = &buffer[64];
>> +	zone_len = logical_to_sectors(sdkp->device,
>> +				      get_unaligned_be64(&desc[8]));
>> +	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
>> +}
>> +
> 
> So, blk_queue_chunk_sectors() has:
> 
> void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
> {
>         BUG_ON(!is_power_of_2(chunk_sectors));
>         q->limits.chunk_sectors = chunk_sectors;
> }
> 
> and it seems like if some device reports a non-power-of-2 zone_len then we
> will BUG_ON().  Probably would be better if we reported an error instead?
> 
The ZBC spec mandates that the zone size must be a power of 2.
So I don't have problems with triggering a BUG_ON for non-compliant
drives.

Cheers,

Hannes



* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-23 20:31     ` Hannes Reinecke
@ 2016-07-23 22:04       ` Bart Van Assche
  2016-07-24  7:07         ` Hannes Reinecke
  2016-07-25  6:00         ` Hannes Reinecke
  0 siblings, 2 replies; 19+ messages in thread
From: Bart Van Assche @ 2016-07-23 22:04 UTC (permalink / raw)
  To: Hannes Reinecke, emilne, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal

On 07/23/16 13:31, Hannes Reinecke wrote:
> On 07/22/2016 11:56 PM, Ewan D. Milne wrote:
>> On Tue, 2016-07-19 at 15:25 +0200, Hannes Reinecke wrote:
>>> For ZBC devices I/O must not cross zone boundaries, so setup
>>> the 'chunk_sectors' block queue setting to the zone size.
>>> This is only valid for REPORT ZONES SAME type 2 or 3;
>>> for other types the zone sizes might be different
>>> for individual zones. So issue a warning if the type is
>>> found to be different.
>>> Also the capacity might be different from the announced
>>> capacity, so adjust it as needed.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>
>> ...
>>
>>> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
>>> +{
>>> +	int retval;
>>> +	unsigned char *desc;
>>> +	u32 rep_len;
>>> +	u8 same;
>>> +	u64 zone_len, lba;
>>> +
>>> +	if (sdkp->zoned != 1)
>>> +		/* Device managed, no special handling required */
>>> +		return;
>>> +
>>> +	retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
>>> +				     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
>>> +	if (retval < 0)
>>> +		return;
>>> +
>>> +	rep_len = get_unaligned_be32(&buffer[0]);
>>> +	if (rep_len < 64) {
>>> +		sd_printk(KERN_WARNING, sdkp,
>>> +			  "REPORT ZONES report invalid length %u\n",
>>> +			  rep_len);
>>> +		return;
>>> +	}
>>> +
>>> +	if (sdkp->rc_basis == 0) {
>>> +		/* The max_lba field is the capacity of a zoned device */
>>> +		lba = get_unaligned_be64(&buffer[8]);
>>> +		if (lba + 1 > sdkp->capacity) {
>>> +			sd_printk(KERN_WARNING, sdkp,
>>> +				  "Max LBA %zu (capacity %zu)\n",
>>> +				  (sector_t) lba + 1, sdkp->capacity);
>>> +			sdkp->capacity = lba + 1;
>>> +		}
>>> +	}
>>> +
>>> +	/*
>>> +	 * Adjust 'chunk_sectors' to the zone length if the device
>>> +	 * supports equal zone sizes.
>>> +	 */
>>> +	same = buffer[4] & 0xf;
>>> +	if (same == 0 || same > 3) {
>>> +		sd_printk(KERN_WARNING, sdkp,
>>> +			  "REPORT ZONES SAME type %d not supported\n", same);
>>> +		return;
>>> +	}
>>> +	/* Read the zone length from the first zone descriptor */
>>> +	desc = &buffer[64];
>>> +	zone_len = logical_to_sectors(sdkp->device,
>>> +				      get_unaligned_be64(&desc[8]));
>>> +	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
>>> +}
>>> +
>>
>> So, blk_queue_chunk_sectors() has:
>>
>> void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
>> {
>>         BUG_ON(!is_power_of_2(chunk_sectors));
>>         q->limits.chunk_sectors = chunk_sectors;
>> }
>>
>> and it seems like if some device reports a non-power-of-2 zone_len then we
>> will BUG_ON().  Probably would be better if we reported an error instead?
>>
> The ZBC spec mandates that the zone size must be a power of 2.
> So I don't have problems with triggering a BUG_ON for non-compliant
> drives.

Triggering BUG_ON() if zone_len is not a power of two is completely 
unacceptable. No matter what zone information a ZBC drive exports that 
shouldn't result in a kernel oops.

Bart.


* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-23 22:04       ` Bart Van Assche
@ 2016-07-24  7:07         ` Hannes Reinecke
  2016-07-25  6:00         ` Hannes Reinecke
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-24  7:07 UTC (permalink / raw)
  To: Bart Van Assche, Hannes Reinecke, emilne
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal

On 07/24/2016 12:04 AM, Bart Van Assche wrote:
> On 07/23/16 13:31, Hannes Reinecke wrote:
>> On 07/22/2016 11:56 PM, Ewan D. Milne wrote:
>>> On Tue, 2016-07-19 at 15:25 +0200, Hannes Reinecke wrote:
>>>> For ZBC devices I/O must not cross zone boundaries, so setup
>>>> the 'chunk_sectors' block queue setting to the zone size.
>>>> This is only valid for REPORT ZONES SAME type 2 or 3;
>>>> for other types the zone sizes might be different
>>>> for individual zones. So issue a warning if the type is
>>>> found to be different.
>>>> Also the capacity might be different from the announced
>>>> capacity, so adjust it as needed.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>>
>>> ...
>>>
>>>> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char
>>>> *buffer)
>>>> +{
>>>> +    int retval;
>>>> +    unsigned char *desc;
>>>> +    u32 rep_len;
>>>> +    u8 same;
>>>> +    u64 zone_len, lba;
>>>> +
>>>> +    if (sdkp->zoned != 1)
>>>> +        /* Device managed, no special handling required */
>>>> +        return;
>>>> +
>>>> +    retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
>>>> +                     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
>>>> +    if (retval < 0)
>>>> +        return;
>>>> +
>>>> +    rep_len = get_unaligned_be32(&buffer[0]);
>>>> +    if (rep_len < 64) {
>>>> +        sd_printk(KERN_WARNING, sdkp,
>>>> +              "REPORT ZONES report invalid length %u\n",
>>>> +              rep_len);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (sdkp->rc_basis == 0) {
>>>> +        /* The max_lba field is the capacity of a zoned device */
>>>> +        lba = get_unaligned_be64(&buffer[8]);
>>>> +        if (lba + 1 > sdkp->capacity) {
>>>> +            sd_printk(KERN_WARNING, sdkp,
>>>> +                  "Max LBA %zu (capacity %zu)\n",
>>>> +                  (sector_t) lba + 1, sdkp->capacity);
>>>> +            sdkp->capacity = lba + 1;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Adjust 'chunk_sectors' to the zone length if the device
>>>> +     * supports equal zone sizes.
>>>> +     */
>>>> +    same = buffer[4] & 0xf;
>>>> +    if (same == 0 || same > 3) {
>>>> +        sd_printk(KERN_WARNING, sdkp,
>>>> +              "REPORT ZONES SAME type %d not supported\n", same);
>>>> +        return;
>>>> +    }
>>>> +    /* Read the zone length from the first zone descriptor */
>>>> +    desc = &buffer[64];
>>>> +    zone_len = logical_to_sectors(sdkp->device,
>>>> +                      get_unaligned_be64(&desc[8]));
>>>> +    blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
>>>> +}
>>>> +
>>>
>>> So, blk_queue_chunk_sectors() has:
>>>
>>> void blk_queue_chunk_sectors(struct request_queue *q, unsigned int
>>> chunk_sectors)
>>> {
>>>         BUG_ON(!is_power_of_2(chunk_sectors));
>>>         q->limits.chunk_sectors = chunk_sectors;
>>> }
>>>
>>> and it seems like if some device reports a non-power-of-2 zone_len
>>> then we
>>> will BUG_ON().  Probably would be better if we reported an error
>>> instead?
>>>
>> The ZBC spec mandates that the zone size must be a power of 2.
>> So I don't have problems with triggering a BUG_ON for non-compliant
>> drives.
> 
> Triggering BUG_ON() if zone_len is not a power of two is completely
> unacceptable. No matter what zone information a ZBC drive exports that
> shouldn't result in a kernel oops.
> 
Okay, I'll be changing it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-23 22:04       ` Bart Van Assche
  2016-07-24  7:07         ` Hannes Reinecke
@ 2016-07-25  6:00         ` Hannes Reinecke
  2016-07-25 13:24           ` Ewan D. Milne
  1 sibling, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2016-07-25  6:00 UTC (permalink / raw)
  To: Bart Van Assche, emilne, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal

On 07/24/2016 12:04 AM, Bart Van Assche wrote:
> On 07/23/16 13:31, Hannes Reinecke wrote:
>> On 07/22/2016 11:56 PM, Ewan D. Milne wrote:
>>> On Tue, 2016-07-19 at 15:25 +0200, Hannes Reinecke wrote:
>>>> For ZBC devices I/O must not cross zone boundaries, so setup
>>>> the 'chunk_sectors' block queue setting to the zone size.
>>>> This is only valid for REPORT ZONES SAME type 2 or 3;
>>>> for other types the zone sizes might be different
>>>> for individual zones. So issue a warning if the type is
>>>> found to be different.
>>>> Also the capacity might be different from the announced
>>>> capacity, so adjust it as needed.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>>
>>> ...
>>>
>>>> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char
>>>> *buffer)
>>>> +{
>>>> +    int retval;
>>>> +    unsigned char *desc;
>>>> +    u32 rep_len;
>>>> +    u8 same;
>>>> +    u64 zone_len, lba;
>>>> +
>>>> +    if (sdkp->zoned != 1)
>>>> +        /* Device managed, no special handling required */
>>>> +        return;
>>>> +
>>>> +    retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
>>>> +                     0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
>>>> +    if (retval < 0)
>>>> +        return;
>>>> +
>>>> +    rep_len = get_unaligned_be32(&buffer[0]);
>>>> +    if (rep_len < 64) {
>>>> +        sd_printk(KERN_WARNING, sdkp,
>>>> +              "REPORT ZONES report invalid length %u\n",
>>>> +              rep_len);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (sdkp->rc_basis == 0) {
>>>> +        /* The max_lba field is the capacity of a zoned device */
>>>> +        lba = get_unaligned_be64(&buffer[8]);
>>>> +        if (lba + 1 > sdkp->capacity) {
>>>> +            sd_printk(KERN_WARNING, sdkp,
>>>> +                  "Max LBA %zu (capacity %zu)\n",
>>>> +                  (sector_t) lba + 1, sdkp->capacity);
>>>> +            sdkp->capacity = lba + 1;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Adjust 'chunk_sectors' to the zone length if the device
>>>> +     * supports equal zone sizes.
>>>> +     */
>>>> +    same = buffer[4] & 0xf;
>>>> +    if (same == 0 || same > 3) {
>>>> +        sd_printk(KERN_WARNING, sdkp,
>>>> +              "REPORT ZONES SAME type %d not supported\n", same);
>>>> +        return;
>>>> +    }
>>>> +    /* Read the zone length from the first zone descriptor */
>>>> +    desc = &buffer[64];
>>>> +    zone_len = logical_to_sectors(sdkp->device,
>>>> +                      get_unaligned_be64(&desc[8]));
>>>> +    blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
>>>> +}
>>>> +
>>>
>>> So, blk_queue_chunk_sectors() has:
>>>
>>> void blk_queue_chunk_sectors(struct request_queue *q, unsigned int
>>> chunk_sectors)
>>> {
>>>         BUG_ON(!is_power_of_2(chunk_sectors));
>>>         q->limits.chunk_sectors = chunk_sectors;
>>> }
>>>
>>> and it seems like if some device reports a non-power-of-2 zone_len
>>> then we
>>> will BUG_ON().  Probably would be better if we reported an error
>>> instead?
>>>
>> The ZBC spec mandates that the zone size must be a power of 2.
>> So I don't have problems with triggering a BUG_ON for non-compliant
>> drives.
> 
> Triggering BUG_ON() if zone_len is not a power of two is completely
> unacceptable. No matter what zone information a ZBC drive exports that
> shouldn't result in a kernel oops.
> 
Ok, will be fixing this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-25  6:00         ` Hannes Reinecke
@ 2016-07-25 13:24           ` Ewan D. Milne
  0 siblings, 0 replies; 19+ messages in thread
From: Ewan D. Milne @ 2016-07-25 13:24 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Bart Van Assche, Hannes Reinecke, Martin K. Petersen,
	James Bottomley, linux-scsi, Christoph Hellwig, Damien Le Moal

On Mon, 2016-07-25 at 08:00 +0200, Hannes Reinecke wrote:
> On 07/24/2016 12:04 AM, Bart Van Assche wrote:
> > On 07/23/16 13:31, Hannes Reinecke wrote:
> >> On 07/22/2016 11:56 PM, Ewan D. Milne wrote:
> >>>
> >>> So, blk_queue_chunk_sectors() has:
> >>>
> >>> void blk_queue_chunk_sectors(struct request_queue *q, unsigned int
> >>> chunk_sectors)
> >>> {
> >>>         BUG_ON(!is_power_of_2(chunk_sectors));
> >>>         q->limits.chunk_sectors = chunk_sectors;
> >>> }
> >>>
> >>> and it seems like if some device reports a non-power-of-2 zone_len
> >>> then we
> >>> will BUG_ON().  Probably would be better if we reported an error
> >>> instead?
> >>>
> >> The ZBC spec mandates that the zone size must be a power of 2.
> >> So I don't have problems with triggering a BUG_ON for non-compliant
> >> drives.
> > 
> > Triggering BUG_ON() if zone_len is not a power of two is completely
> > unacceptable. No matter what zone information a ZBC drive exports that
> > shouldn't result in a kernel oops.
> > 
> Ok, will be fixing this.
> 
> Cheers,
> 
> Hannes

Yes, unfortunately we have too much history with non-compliant devices.
And, I much prefer to avoid crashing the kernel if it is not necessary.

Thanks.

-Ewan




* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
  2016-07-20  0:46   ` Damien Le Moal
  2016-07-22 21:56   ` Ewan D. Milne
@ 2016-08-01 14:24   ` Shaun Tancheff
  2016-08-01 14:29     ` Hannes Reinecke
  2 siblings, 1 reply; 19+ messages in thread
From: Shaun Tancheff @ 2016-08-01 14:24 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman

On Tue, Jul 19, 2016 at 8:25 AM, Hannes Reinecke <hare@suse.de> wrote:
> For ZBC devices I/O must not cross zone boundaries, so setup
> the 'chunk_sectors' block queue setting to the zone size.
> This is only valid for REPORT ZONES SAME type 2 or 3;
> for other types the zone sizes might be different
> for individual zones. So issue a warning if the type is
> found to be different.
> Also the capacity might be different from the announced
> capacity, so adjust it as needed.
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/sd.c         | 120 ++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/scsi/sd.h         |  12 +++++
>  include/scsi/scsi_proto.h |  17 +++++++
>  3 files changed, 144 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 428c03e..249ea81 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1972,6 +1972,57 @@ sd_spinup_disk(struct scsi_disk *sdkp)
>         }
>  }
>
> +/**
> + * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
> + * @sdkp: SCSI disk to which the command should be sent
> + * @buffer: response buffer
> + * @bufflen: length of @buffer
> + * @start_sector: logical sector from which zone information should be reported
> + * @option: option for report zones command
> + * @partial: flag to set 'partial' bit for report zones command
> + */
> +static int
> +sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
> +                   int bufflen, sector_t start_sector,
> +                   enum zbc_zone_reporting_options option, bool partial)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +       const int timeout = sdp->request_queue->rq_timeout
> +               * SD_FLUSH_TIMEOUT_MULTIPLIER;
> +       struct scsi_sense_hdr sshdr;
> +       sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
> +       unsigned char cmd[16];
> +       int result;
> +
> +       if (!scsi_device_online(sdp)) {
> +               sd_printk(KERN_INFO, sdkp, "device not online\n");
> +               return -ENODEV;
> +       }
> +
> +       sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
> +                 start_lba, bufflen);
> +
> +       memset(cmd, 0, 16);
> +       cmd[0] = ZBC_IN;
> +       cmd[1] = ZI_REPORT_ZONES;
> +       put_unaligned_be64(start_lba, &cmd[2]);
> +       put_unaligned_be32(bufflen, &cmd[10]);
> +       cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
> +       memset(buffer, 0, bufflen);
> +
> +       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
> +                                 buffer, bufflen, &sshdr,
> +                                 timeout, SD_MAX_RETRIES, NULL);
> +
> +       if (result) {
> +               sd_printk(KERN_NOTICE, sdkp,
> +                         "REPORT ZONES lba %zu failed with %d/%d\n",
> +                         start_lba, host_byte(result), driver_byte(result));
> +
> +               return -EIO;
> +       }
> +       return 0;
> +}
>
>  /*
>   * Determine whether disk supports Data Integrity Field.
> @@ -2014,6 +2065,59 @@ static int sd_read_protection_type(struct scsi_disk *sdkp, unsigned char *buffer
>         return ret;
>  }
>
> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
> +{
> +       int retval;
> +       unsigned char *desc;
> +       u32 rep_len;
> +       u8 same;
> +       u64 zone_len, lba;
> +
> +       if (sdkp->zoned != 1)
> +               /* Device managed, no special handling required */
> +               return;
> +
> +       retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
> +                                    0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
> +       if (retval < 0)
> +               return;
> +
> +       rep_len = get_unaligned_be32(&buffer[0]);
> +       if (rep_len < 64) {
> +               sd_printk(KERN_WARNING, sdkp,
> +                         "REPORT ZONES report invalid length %u\n",
> +                         rep_len);
> +               return;
> +       }
> +
> +       if (sdkp->rc_basis == 0) {
> +               /* The max_lba field is the capacity of a zoned device */
> +               lba = get_unaligned_be64(&buffer[8]);
> +               if (lba + 1 > sdkp->capacity) {
> +                       sd_printk(KERN_WARNING, sdkp,
> +                                 "Max LBA %zu (capacity %zu)\n",
> +                                 (sector_t) lba + 1, sdkp->capacity);
> +                       sdkp->capacity = lba + 1;
> +               }
> +       }
> +
> +       /*
> +        * Adjust 'chunk_sectors' to the zone length if the device
> +        * supports equal zone sizes.
> +        */
> +       same = buffer[4] & 0xf;
> +       if (same == 0 || same > 3) {
> +               sd_printk(KERN_WARNING, sdkp,
> +                         "REPORT ZONES SAME type %d not supported\n", same);
> +               return;
> +       }

It's a bit unfortunate that you abort here. The current Seagate Host Aware
drives must report a SAME code of 0 here due to the final 'runt' zone and are
therefore not supported by your RB-Tree in the following patches.

> +       /* Read the zone length from the first zone descriptor */
> +       desc = &buffer[64];
> +       zone_len = logical_to_sectors(sdkp->device,
> +                                     get_unaligned_be64(&desc[8]));
> +       blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
> +}
> +
>  static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
>                         struct scsi_sense_hdr *sshdr, int sense_valid,
>                         int the_result)

-- 
Shaun Tancheff


* Re: [PATCH 1/5] sd: configure ZBC devices
  2016-08-01 14:24   ` Shaun Tancheff
@ 2016-08-01 14:29     ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2016-08-01 14:29 UTC (permalink / raw)
  To: Shaun Tancheff, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal, Josh Bingaman

On 08/01/2016 04:24 PM, Shaun Tancheff wrote:
> On Tue, Jul 19, 2016 at 8:25 AM, Hannes Reinecke <hare@suse.de> wrote:
>> For ZBC devices I/O must not cross zone boundaries, so setup
>> the 'chunk_sectors' block queue setting to the zone size.
>> This is only valid for REPORT ZONES SAME type 2 or 3;
>> for other types the zone sizes might be different
>> for individual zones. So issue a warning if the type is
>> found to be different.
>> Also the capacity might be different from the announced
>> capacity, so adjust it as needed.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>  drivers/scsi/sd.c         | 120 ++++++++++++++++++++++++++++++++++++++++++++--
>>  drivers/scsi/sd.h         |  12 +++++
>>  include/scsi/scsi_proto.h |  17 +++++++
>>  3 files changed, 144 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 428c03e..249ea81 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1972,6 +1972,57 @@ sd_spinup_disk(struct scsi_disk *sdkp)
>>         }
>>  }
>>
>> +/**
>> + * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
>> + * @sdkp: SCSI disk to which the command should be sent
>> + * @buffer: response buffer
>> + * @bufflen: length of @buffer
>> + * @start_sector: logical sector from which zone information should be reported
>> + * @option: option for report zones command
>> + * @partial: flag to set 'partial' bit for report zones command
>> + */
>> +static int
>> +sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
>> +                   int bufflen, sector_t start_sector,
>> +                   enum zbc_zone_reporting_options option, bool partial)
>> +{
>> +       struct scsi_device *sdp = sdkp->device;
>> +       const int timeout = sdp->request_queue->rq_timeout
>> +               * SD_FLUSH_TIMEOUT_MULTIPLIER;
>> +       struct scsi_sense_hdr sshdr;
>> +       sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
>> +       unsigned char cmd[16];
>> +       int result;
>> +
>> +       if (!scsi_device_online(sdp)) {
>> +               sd_printk(KERN_INFO, sdkp, "device not online\n");
>> +               return -ENODEV;
>> +       }
>> +
>> +       sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
>> +                 start_lba, bufflen);
>> +
>> +       memset(cmd, 0, 16);
>> +       cmd[0] = ZBC_IN;
>> +       cmd[1] = ZI_REPORT_ZONES;
>> +       put_unaligned_be64(start_lba, &cmd[2]);
>> +       put_unaligned_be32(bufflen, &cmd[10]);
>> +       cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
>> +       memset(buffer, 0, bufflen);
>> +
>> +       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
>> +                                 buffer, bufflen, &sshdr,
>> +                                 timeout, SD_MAX_RETRIES, NULL);
>> +
>> +       if (result) {
>> +               sd_printk(KERN_NOTICE, sdkp,
>> +                         "REPORT ZONES lba %zu failed with %d/%d\n",
>> +                         start_lba, host_byte(result), driver_byte(result));
>> +
>> +               return -EIO;
>> +       }
>> +       return 0;
>> +}
>>
>>  /*
>>   * Determine whether disk supports Data Integrity Field.
>> @@ -2014,6 +2065,59 @@ static int sd_read_protection_type(struct scsi_disk *sdkp, unsigned char *buffer
>>         return ret;
>>  }
>>
>> +static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
>> +{
>> +       int retval;
>> +       unsigned char *desc;
>> +       u32 rep_len;
>> +       u8 same;
>> +       u64 zone_len, lba;
>> +
>> +       if (sdkp->zoned != 1)
>> +               /* Device managed, no special handling required */
>> +               return;
>> +
>> +       retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
>> +                                    0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
>> +       if (retval < 0)
>> +               return;
>> +
>> +       rep_len = get_unaligned_be32(&buffer[0]);
>> +       if (rep_len < 64) {
>> +               sd_printk(KERN_WARNING, sdkp,
>> +                         "REPORT ZONES report invalid length %u\n",
>> +                         rep_len);
>> +               return;
>> +       }
>> +
>> +       if (sdkp->rc_basis == 0) {
>> +               /* The max_lba field is the capacity of a zoned device */
>> +               lba = get_unaligned_be64(&buffer[8]);
>> +               if (lba + 1 > sdkp->capacity) {
>> +                       sd_printk(KERN_WARNING, sdkp,
>> +                                 "Max LBA %zu (capacity %zu)\n",
>> +                                 (sector_t) lba + 1, sdkp->capacity);
>> +                       sdkp->capacity = lba + 1;
>> +               }
>> +       }
>> +
>> +       /*
>> +        * Adjust 'chunk_sectors' to the zone length if the device
>> +        * supports equal zone sizes.
>> +        */
>> +       same = buffer[4] & 0xf;
>> +       if (same == 0 || same > 3) {
>> +               sd_printk(KERN_WARNING, sdkp,
>> +                         "REPORT ZONES SAME type %d not supported\n", same);
>> +               return;
>> +       }
>
> It's a bit unfortunate that you abort here. The current Seagate Host
> Aware drives
> must report a same code of 0 here due to the final 'runt' zone and are therefore
> not supported by your RB-Tree in the following patches.
>
Hmm. Yes, I am aware that Seagate is using '0' here.
However, I'm about to redo my patchset anyway as the sysfs attributes 
were deemed too complex.
So what I'm going to do is to have a single sysfs attribute 'zone_len' 
(or 'zone_size' ?) presenting the size of the zones (minus the last one).
And then we can set up that attribute once we've read in all zones; that 
way we'll be insulated against any issues with 'same == 0'.
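
A minimal user-space sketch of that idea, under stated assumptions: the
function name zbc_common_zone_len() is hypothetical, the buffer is assumed to
hold the complete zone list, the header is taken to be 64 bytes with the zone
list length at offset 0 (as in the patch), and each descriptor is assumed to
be 64 bytes with the zone length at offset 8. It returns the length shared by
all zones except the last, which may be a shorter 'runt', independent of what
the SAME field says.

```c
#include <stddef.h>
#include <stdint.h>

#define ZBC_HDR_LEN	64	/* REPORT ZONES header; patch reads &buffer[64] */
#define ZBC_DESC_LEN	64	/* assumed size of one zone descriptor */

/* Read big-endian values from an unaligned byte buffer. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: return the zone length (in logical blocks) common
 * to every zone except the last one, or 0 if no descriptor is present or
 * the zones differ in length. The last zone may be shorter, not longer.
 */
static uint64_t zbc_common_zone_len(const unsigned char *buf, size_t buflen)
{
	size_t nr, max_nr, i;
	uint64_t first, len;

	if (buflen < ZBC_HDR_LEN + ZBC_DESC_LEN)
		return 0;

	/* Number of descriptors reported, clamped to what the buffer holds */
	nr = be32(&buf[0]) / ZBC_DESC_LEN;
	max_nr = (buflen - ZBC_HDR_LEN) / ZBC_DESC_LEN;
	if (nr > max_nr)
		nr = max_nr;
	if (nr == 0)
		return 0;

	first = be64(&buf[ZBC_HDR_LEN + 8]);

	/* All zones but the last must match the first zone's length */
	for (i = 1; i + 1 < nr; i++) {
		len = be64(&buf[ZBC_HDR_LEN + i * ZBC_DESC_LEN + 8]);
		if (len != first)
			return 0;
	}

	/* The final zone may be a runt, but never larger than the rest */
	if (nr > 1) {
		len = be64(&buf[ZBC_HDR_LEN + (nr - 1) * ZBC_DESC_LEN + 8]);
		if (len > first)
			return 0;
	}
	return first;
}
```

The returned value is in logical blocks, so it would still need the
logical_to_sectors() conversion before being exported or used as
'chunk_sectors'.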

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH 3/5] sd: Implement support for ZBC devices
  2016-07-19 13:25 ` [PATCH 3/5] sd: Implement support for ZBC devices Hannes Reinecke
  2016-07-20  0:54   ` Damien Le Moal
@ 2016-08-12  6:00   ` Shaun Tancheff
  1 sibling, 0 replies; 19+ messages in thread
From: Shaun Tancheff @ 2016-08-12  6:00 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Damien Le Moal

On Tue, Jul 19, 2016 at 8:25 AM, Hannes Reinecke <hare@suse.de> wrote:
> Implement ZBC support functions to read in the zone information
> and setup the zone tree.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/Kconfig  |   8 +
>  drivers/scsi/Makefile |   1 +
>  drivers/scsi/sd.c     | 129 ++++++------
>  drivers/scsi/sd.h     |  54 +++++
>  drivers/scsi/sd_zbc.c | 538 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 670 insertions(+), 60 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 98e5d51..4b9a882 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -202,6 +202,14 @@ config SCSI_ENCLOSURE
>           it has an enclosure device.  Selecting this option will just allow
>           certain enclosure conditions to be reported and is not required.
>
> +config SCSI_ZBC
> +       bool "SCSI ZBC (zoned block commands) Support"
> +       depends on SCSI && BLK_DEV_ZONED
> +       help
> +         Enable support for ZBC (zoned block commands) devices.
> +
> +         If unsure say N.
> +
>  config SCSI_CONSTANTS
>         bool "Verbose SCSI error reporting (kernel size += 36K)"
>         depends on SCSI
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index 862ab4e..49bde97 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -178,6 +178,7 @@ hv_storvsc-y                        := storvsc_drv.o
>
>  sd_mod-objs    := sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_SCSI_ZBC) += sd_zbc.o
>
>  sr_mod-objs    := sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 52dda83..f7b6132 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS      16
> @@ -162,7 +163,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
>         static const char temp[] = "temporary ";
>         int len;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 /* no cache control on RBC devices; theoretically they
>                  * can do it, but there's probably so many exceptions
>                  * it's not worth the risk */
> @@ -261,7 +262,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -392,7 +393,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdkp->zoned == 1) {
> +       if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
>                 if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
>                         sd_config_discard(sdkp, SD_ZBC_RESET_WP);
>                         return count;
> @@ -466,7 +467,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EACCES;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return -EINVAL;
>
>         err = kstrtoul(buf, 10, &max);
> @@ -778,6 +779,11 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>                 break;
>
>         case SD_ZBC_RESET_WP:
> +               /* sd_zbc_setup_discard uses block layer sector units */
> +               ret = sd_zbc_setup_discard(sdkp, rq, blk_rq_pos(rq),
> +                                          blk_rq_sectors(rq));
> +               if (ret != BLKPREP_OK)
> +                       goto out;
>                 cmd->cmd_len = 16;
>                 cmd->cmnd[0] = ZBC_OUT;
>                 cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
> @@ -873,6 +879,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>
>         BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>
> +       if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, sector, nr_sectors);
> +               if (ret != BLKPREP_OK)
> +                       return ret;
> +       }
> +
>         sector >>= ilog2(sdp->sector_size) - 9;
>         nr_sectors >>= ilog2(sdp->sector_size) - 9;
>
> @@ -992,6 +1005,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>         SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
>                                         (unsigned long long)block));
>
> +       if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
> +               /* sd_zbc_setup_read_write uses block layer sector units */
> +               ret = sd_zbc_setup_read_write(sdkp, rq, block, this_count);
> +               if (ret != BLKPREP_OK)
> +                       goto out;
> +       }
> +
>         /*
>          * If we have a 1K hardware sectorsize, prevent access to single
>          * 512 byte sectors.  In theory we could handle this - in fact
> @@ -1806,6 +1826,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>                         good_bytes = blk_rq_bytes(req);
>                         scsi_set_resid(SCpnt, 0);
>                 } else {
> +#ifdef CONFIG_SCSI_ZBC
> +                       if (op == ZBC_OUT)
> +                               /* RESET WRITE POINTER failed */
> +                               sd_zbc_update_zones(sdkp,
> +                                                   blk_rq_pos(req),
> +                                                   512, true);
> +#endif
>                         good_bytes = 0;
>                         scsi_set_resid(SCpnt, blk_rq_bytes(req));
>                 }
> @@ -1869,6 +1896,26 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>                                 }
>                         }
>                 }
> +               if (sshdr.asc == 0x21) {
> +                       /*
> +                        * ZBC: read beyond the write pointer position.
> +                        * Clear out error and return the buffer as-is.
> +                        */
> +                       if (sshdr.ascq == 0x06) {
> +                               good_bytes = blk_rq_bytes(req);
> +                               scsi_set_resid(SCpnt, 0);
> +                       }
> +#ifdef CONFIG_SCSI_ZBC
> +                       /*
> +                        * ZBC: Unaligned write command.
> +                        * Write did not start at the write pointer position.
> +                        */
> +                       if (sshdr.ascq == 0x04)
> +                               sd_zbc_update_zones(sdkp,
> +                                                   blk_rq_pos(req),
> +                                                   512, true);
> +#endif
> +               }
>                 break;
>         default:
>                 break;
> @@ -2008,58 +2055,6 @@ sd_spinup_disk(struct scsi_disk *sdkp)
>         }
>  }
>
> -/**
> - * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
> - * @sdkp: SCSI disk to which the command should be send
> - * @buffer: response buffer
> - * @bufflen: length of @buffer
> - * @start_sector: logical sector for the zone information should be reported
> - * @option: option for report zones command
> - * @partial: flag to set 'partial' bit for report zones command
> - */
> -static int
> -sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
> -                   int bufflen, sector_t start_sector,
> -                   enum zbc_zone_reporting_options option, bool partial)
> -{
> -       struct scsi_device *sdp = sdkp->device;
> -       const int timeout = sdp->request_queue->rq_timeout
> -               * SD_FLUSH_TIMEOUT_MULTIPLIER;
> -       struct scsi_sense_hdr sshdr;
> -       sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
> -       unsigned char cmd[16];
> -       int result;
> -
> -       if (!scsi_device_online(sdp)) {
> -               sd_printk(KERN_INFO, sdkp, "device not online\n");
> -               return -ENODEV;
> -       }
> -
> -       sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
> -                 start_lba, bufflen);
> -
> -       memset(cmd, 0, 16);
> -       cmd[0] = ZBC_IN;
> -       cmd[1] = ZI_REPORT_ZONES;
> -       put_unaligned_be64(start_lba, &cmd[2]);
> -       put_unaligned_be32(bufflen, &cmd[10]);
> -       cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
> -       memset(buffer, 0, bufflen);
> -
> -       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
> -                                 buffer, bufflen, &sshdr,
> -                                 timeout, SD_MAX_RETRIES, NULL);
> -
> -       if (result) {
> -               sd_printk(KERN_NOTICE, sdkp,
> -                         "REPORT ZONES lba %zu failed with %d/%d\n",
> -                         start_lba, host_byte(result), driver_byte(result));
> -
> -               return -EIO;
> -       }
> -       return 0;
> -}
> -
>  /*
>   * Determine whether disk supports Data Integrity Field.
>   */
> @@ -2109,8 +2104,11 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
>         u8 same;
>         u64 zone_len, lba;
>
> -       if (sdkp->zoned != 1)
> -               /* Device managed, no special handling required */
> +       if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
> +               /*
> +                * Device managed or normal SCSI disk,
> +                * no special handling required
> +                */
>                 return;
>
>         retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
> @@ -2155,6 +2153,8 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
>         blk_queue_chunk_sectors(sdkp->disk->queue,
>                                 logical_to_sectors(sdkp->device, zone_len));
>         sd_config_discard(sdkp, SD_ZBC_RESET_WP);
> +
> +       sd_zbc_setup(sdkp, buffer, SD_BUF_SIZE);
>  }
>
>  static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
> @@ -2750,7 +2750,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
>         struct scsi_mode_data data;
>         struct scsi_sense_hdr sshdr;
>
> -       if (sdp->type != TYPE_DISK)
> +       if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
>                 return;
>
>         if (sdkp->protection_type == 0)
> @@ -3180,9 +3180,16 @@ static int sd_probe(struct device *dev)
>
>         scsi_autopm_get_device(sdp);
>         error = -ENODEV;
> -       if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
> +       if (sdp->type != TYPE_DISK &&
> +           sdp->type != TYPE_ZBC &&
> +           sdp->type != TYPE_MOD &&
> +           sdp->type != TYPE_RBC)
>                 goto out;
>
> +#ifndef CONFIG_SCSI_ZBC
> +       if (sdp->type == TYPE_ZBC)
> +               goto out;
> +#endif
>         SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
>                                         "sd_probe\n"));
>
> @@ -3286,6 +3293,8 @@ static int sd_remove(struct device *dev)
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);
>
> +       sd_zbc_remove(sdkp);
> +
>         blk_register_region(devt, SD_MINORS, NULL,
>                             sd_default_probe, NULL, NULL);
>
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 4439693..5827b62 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -65,6 +65,12 @@ struct scsi_disk {
>         struct scsi_device *device;
>         struct device   dev;
>         struct gendisk  *disk;
> +#ifdef CONFIG_SCSI_ZBC
> +       struct workqueue_struct *zone_work_q;
> +       unsigned long   zone_flags;
> +#define SD_ZBC_ZONE_RESET 1
> +#define SD_ZBC_ZONE_INIT  2
> +#endif
>         atomic_t        openers;
>         sector_t        capacity;       /* size in logical blocks */
>         u32             max_xfer_blocks;
> @@ -277,4 +283,52 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
>
>  #endif /* CONFIG_BLK_DEV_INTEGRITY */
>
> +#ifdef CONFIG_SCSI_ZBC
> +
> +extern int sd_zbc_report_zones(struct scsi_disk *, unsigned char *, int,
> +                              sector_t, enum zbc_zone_reporting_options, bool);
> +extern int sd_zbc_setup(struct scsi_disk *, char *, int);
> +extern void sd_zbc_remove(struct scsi_disk *);
> +extern void sd_zbc_reset_zones(struct scsi_disk *);
> +extern int sd_zbc_setup_discard(struct scsi_disk *, struct request *,
> +                               sector_t, unsigned int);
> +extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
> +                                  sector_t, unsigned int);
> +extern void sd_zbc_update_zones(struct scsi_disk *, sector_t, int, bool);
> +extern void sd_zbc_refresh_zone_work(struct work_struct *);
> +
> +#else /* CONFIG_SCSI_ZBC */
> +
> +static inline int sd_zbc_report_zones(struct scsi_disk *sdkp,
> +                                     unsigned char *buf, int buf_len,
> +                                     sector_t start_sector,
> +                                     enum zbc_zone_reporting_options option,
> +                                     bool partial)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static inline int sd_zbc_setup(struct scsi_disk *sdkp,
> +                              unsigned char *buf, int buf_len)
> +{
> +       return 0;
> +}
> +
> +static inline int sd_zbc_setup_discard(struct scsi_disk *sdkp,
> +                                      struct request *rq, sector_t sector,
> +                                      unsigned int num_sectors)
> +{
> +       return BLKPREP_OK;
> +}
> +
> +static inline int sd_zbc_setup_read_write(struct scsi_disk *sdkp,
> +                                         struct request *rq, sector_t sector,
> +                                         unsigned int num_sectors)
> +{
> +       return BLKPREP_OK;
> +}
> +
> +static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
> +#endif /* CONFIG_SCSI_ZBC */
> +
>  #endif /* _SCSI_DISK_H */
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> new file mode 100644
> index 0000000..75cef62
> --- /dev/null
> +++ b/drivers/scsi/sd_zbc.c
> @@ -0,0 +1,538 @@
> +/*
> + * sd_zbc.c - SCSI Zoned Block commands
> + *
> + * Copyright (C) 2014-2015 SUSE Linux GmbH
> + * Written by: Hannes Reinecke <hare@suse.de>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; see the file COPYING.  If not, write to
> + * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
> + * USA.
> + *
> + */
> +
> +#include <linux/blkdev.h>
> +#include <linux/rbtree.h>
> +
> +#include <asm/unaligned.h>
> +
> +#include <scsi/scsi.h>
> +#include <scsi/scsi_cmnd.h>
> +#include <scsi/scsi_dbg.h>
> +#include <scsi/scsi_device.h>
> +#include <scsi/scsi_driver.h>
> +#include <scsi/scsi_host.h>
> +#include <scsi/scsi_eh.h>
> +
> +#include "sd.h"
> +#include "scsi_priv.h"
> +
> +enum zbc_zone_cond {
> +       ZBC_ZONE_COND_NO_WP,
> +       ZBC_ZONE_COND_EMPTY,
> +       ZBC_ZONE_COND_IMPLICIT_OPEN,
> +       ZBC_ZONE_COND_EXPLICIT_OPEN,
> +       ZBC_ZONE_COND_CLOSED,
> +       ZBC_ZONE_COND_READONLY = 0xd,
> +       ZBC_ZONE_COND_FULL,
> +       ZBC_ZONE_COND_OFFLINE,
> +};
> +
> +#define SD_ZBC_BUF_SIZE 524288
> +
> +#define sd_zbc_debug(sdkp, fmt, args...)                               \
> +       pr_debug("%s %s [%s]: " fmt,                                    \
> +                dev_driver_string(&(sdkp)->device->sdev_gendev),       \
> +                dev_name(&(sdkp)->device->sdev_gendev),                \
> +                (sdkp)->disk->disk_name, ## args)
> +
> +#define sd_zbc_debug_ratelimit(sdkp, fmt, args...)             \
> +       do {                                                    \
> +               if (printk_ratelimit())                         \
> +                       sd_zbc_debug(sdkp, fmt, ## args);       \
> +       } while( 0 )
> +
> +struct zbc_update_work {
> +       struct work_struct zone_work;
> +       struct scsi_disk *sdkp;
> +       sector_t        zone_sector;
> +       int             zone_buflen;
> +       char            zone_buf[0];
> +};
> +
> +struct blk_zone *zbc_desc_to_zone(struct scsi_disk *sdkp, unsigned char *rec)
> +{
> +       struct blk_zone *zone;
> +       enum zbc_zone_cond zone_cond;
> +       sector_t wp = (sector_t)-1;
> +
> +       zone = kzalloc(sizeof(struct blk_zone), GFP_KERNEL);
> +       if (!zone)
> +               return NULL;
> +
> +       spin_lock_init(&zone->lock);
> +       zone->type = rec[0] & 0xf;
> +       zone_cond = (rec[1] >> 4) & 0xf;
> +       zone->len = logical_to_sectors(sdkp->device,
> +                                      get_unaligned_be64(&rec[8]));
> +       zone->start = logical_to_sectors(sdkp->device,
> +                                        get_unaligned_be64(&rec[16]));
> +
> +       if (blk_zone_is_smr(zone)) {
> +               wp = logical_to_sectors(sdkp->device,
> +                                       get_unaligned_be64(&rec[24]));
> +               if (zone_cond == ZBC_ZONE_COND_READONLY) {
> +                       zone->state = BLK_ZONE_READONLY;
> +               } else if (zone_cond == ZBC_ZONE_COND_OFFLINE) {
> +                       zone->state = BLK_ZONE_OFFLINE;
> +               } else {
> +                       zone->state = BLK_ZONE_OPEN;
> +               }
> +       } else
> +               zone->state = BLK_ZONE_NO_WP;
> +
> +       zone->wp = wp;
> +       /*
> +        * Fixup block zone state
> +        */
> +       if (zone_cond == ZBC_ZONE_COND_EMPTY &&
> +           zone->wp != zone->start) {
> +               sd_zbc_debug(sdkp,
> +                            "zone %zu state EMPTY wp %zu: adjust wp\n",
> +                            zone->start, zone->wp);
> +               zone->wp = zone->start;
> +       }
> +       if (zone_cond == ZBC_ZONE_COND_FULL &&
> +           zone->wp != zone->start + zone->len) {
> +               sd_zbc_debug(sdkp,
> +                            "zone %zu state FULL wp %zu: adjust wp\n",
> +                            zone->start, zone->wp);
> +               zone->wp = zone->start + zone->len;
> +       }
> +
> +       return zone;
> +}
> +
> +sector_t zbc_parse_zones(struct scsi_disk *sdkp, unsigned char *buf,
> +                        unsigned int buf_len)
> +{
> +       struct request_queue *q = sdkp->disk->queue;
> +       unsigned char *rec = buf;
> +       int rec_no = 0;
> +       unsigned int list_length;
> +       sector_t next_sector = -1;
> +       u8 same;
> +
> +       /* Parse REPORT ZONES header */
> +       list_length = get_unaligned_be32(&buf[0]);
> +       same = buf[4] & 0xf;
> +       rec = buf + 64;
> +       list_length += 64;
> +
> +       if (list_length < buf_len)
> +               buf_len = list_length;
> +
> +       while (rec < buf + buf_len) {
> +               struct blk_zone *this, *old;
> +               unsigned long flags;
> +
> +               this = zbc_desc_to_zone(sdkp, rec);
> +               if (!this)
> +                       break;
> +
> +               next_sector = this->start + this->len;
> +               old = blk_insert_zone(q, this);
> +               if (old) {
> +                       spin_lock_irqsave(&old->lock, flags);
> +                       if (blk_zone_is_smr(old)) {
> +                               old->wp = this->wp;
> +                               old->state = this->state;
> +                       }
> +                       spin_unlock_irqrestore(&old->lock, flags);
> +                       kfree(this);
> +               }
> +               rec += 64;
> +               rec_no++;
> +       }
> +
> +       sd_zbc_debug(sdkp,
> +                    "Inserted %d zones, next sector %zu len %d\n",
> +                    rec_no, next_sector, list_length);
> +
> +       return next_sector;
> +}
> +
> +void sd_zbc_refresh_zone_work(struct work_struct *work)
> +{
> +       struct zbc_update_work *zbc_work =
> +               container_of(work, struct zbc_update_work, zone_work);
> +       struct scsi_disk *sdkp = zbc_work->sdkp;
> +       struct request_queue *q = sdkp->disk->queue;
> +       unsigned int zone_buflen;
> +       int ret;
> +       sector_t last_sector;
> +       sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +
> +       zone_buflen = zbc_work->zone_buflen;
> +       ret = sd_zbc_report_zones(sdkp, zbc_work->zone_buf, zone_buflen,
> +                                 zbc_work->zone_sector,
> +                                 ZBC_ZONE_REPORTING_OPTION_ALL, true);
> +       if (ret)
> +               goto done_free;
> +
> +       last_sector = zbc_parse_zones(sdkp, zbc_work->zone_buf, zone_buflen);
> +       if (last_sector != -1 && last_sector < capacity) {
> +               if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +                       sd_zbc_debug(sdkp,
> +                                    "zones in reset, cancelling refresh\n");
> +                       ret = -EAGAIN;
> +                       goto done_free;
> +               }
> +
> +               zbc_work->zone_sector = last_sector;
> +               queue_work(sdkp->zone_work_q, &zbc_work->zone_work);
> +               /* Kick request queue to be on the safe side */
> +               goto done_start_queue;
> +       }
> +done_free:
> +       kfree(zbc_work);
> +       if (test_and_clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags) && ret) {
> +               sd_zbc_debug(sdkp,
> +                            "Cancelling zone initialisation\n");
> +       }
> +done_start_queue:
> +       if (q->mq_ops)
> +               blk_mq_start_hw_queues(q);
> +       else {
> +               unsigned long flags;
> +
> +               spin_lock_irqsave(q->queue_lock, flags);
> +               blk_start_queue(q);
> +               spin_unlock_irqrestore(q->queue_lock, flags);
> +       }
> +}
> +
> +/**
> + * sd_zbc_update_zones - Update zone information for @sector
> + * @sdkp: SCSI disk for which the zone information needs to be updated
> + * @sector: sector to be updated
> + * @bufsize: buffersize to be allocated
> + * @update: true if existing zones should be updated
> + */
> +void sd_zbc_update_zones(struct scsi_disk *sdkp, sector_t sector, int bufsize,
> +                        bool update)
> +{
> +       struct request_queue *q = sdkp->disk->queue;
> +       struct zbc_update_work *zbc_work;
> +       struct blk_zone *zone;
> +       struct rb_node *node;
> +       int zone_num = 0, zone_busy = 0, num_rec;
> +       sector_t next_sector = sector;
> +
> +       if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +               sd_zbc_debug(sdkp,
> +                            "zones in reset, not starting update\n");
> +               return;
> +       }
> +
> +retry:
> +       zbc_work = kzalloc(sizeof(struct zbc_update_work) + bufsize,
> +                          update ? GFP_NOWAIT : GFP_KERNEL);
> +       if (!zbc_work) {
> +               if (bufsize > 512) {
> +                       sd_zbc_debug(sdkp,
> +                                    "retry with buffer size %d\n", bufsize);
> +                       bufsize = bufsize >> 1;
> +                       goto retry;
> +               }
> +               sd_zbc_debug(sdkp,
> +                            "failed to allocate %d bytes\n", bufsize);
> +               if (!update)
> +                       clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
> +               return;
> +       }
> +       zbc_work->zone_sector = sector;
> +       zbc_work->zone_buflen = bufsize;
> +       zbc_work->sdkp = sdkp;
> +       INIT_WORK(&zbc_work->zone_work, sd_zbc_refresh_zone_work);
> +       num_rec = (bufsize / 64) - 1;
> +
> +       /*
> +        * Mark zones under update as BUSY
> +        */
> +       if (update) {
> +               for (node = rb_first(&q->zones); node; node = rb_next(node)) {
> +                       unsigned long flags;
> +
> +                       zone = rb_entry(node, struct blk_zone, node);
> +                       if (num_rec == 0)
> +                               break;
> +                       if (zone->start != next_sector)
> +                               continue;
> +                       next_sector += zone->len;
> +                       num_rec--;
> +
> +                       spin_lock_irqsave(&zone->lock, flags);
> +                       if (blk_zone_is_smr(zone)) {
> +                               if (zone->state == BLK_ZONE_BUSY) {
> +                                       zone_busy++;
> +                               } else {
> +                                       zone->state = BLK_ZONE_BUSY;
> +                                       zone->wp = zone->start;
> +                               }
> +                               zone_num++;
> +                       }
> +                       spin_unlock_irqrestore(&zone->lock, flags);
> +               }
> +               if (zone_num && (zone_num == zone_busy)) {
> +                       sd_zbc_debug(sdkp,
> +                                    "zone update for %zu in progress\n",
> +                                    sector);
> +                       kfree(zbc_work);
> +                       return;
> +               }
> +       }
> +
> +       if (!queue_work(sdkp->zone_work_q, &zbc_work->zone_work)) {
> +               sd_zbc_debug(sdkp,
> +                            "zone update already queued?\n");
> +               kfree(zbc_work);
> +       }
> +}
> +
> +/**
> + * sd_zbc_report_zones - Issue a REPORT ZONES scsi command
> + * @sdkp: SCSI disk to which the command should be sent
> + * @buffer: response buffer
> + * @bufflen: length of @buffer
> + * @start_sector: logical sector for which the zone information should be reported
> + * @option: reporting option to be used
> + * @partial: flag to set the 'partial' bit for report zones command
> + */
> +int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buffer,
> +                       int bufflen, sector_t start_sector,
> +                       enum zbc_zone_reporting_options option, bool partial)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +       const int timeout = sdp->request_queue->rq_timeout
> +                       * SD_FLUSH_TIMEOUT_MULTIPLIER;
> +       struct scsi_sense_hdr sshdr;
> +       sector_t start_lba = sectors_to_logical(sdkp->device, start_sector);
> +       unsigned char cmd[16];
> +       int result;
> +
> +       if (!scsi_device_online(sdp))
> +               return -ENODEV;
> +
> +       sd_zbc_debug(sdkp, "REPORT ZONES lba %zu len %d\n",
> +                    start_lba, bufflen);
> +
> +       memset(cmd, 0, 16);
> +       cmd[0] = ZBC_IN;
> +       cmd[1] = ZI_REPORT_ZONES;
> +       put_unaligned_be64(start_lba, &cmd[2]);
> +       put_unaligned_be32(bufflen, &cmd[10]);
> +       cmd[14] = (partial ? ZBC_REPORT_ZONE_PARTIAL : 0) | option;
> +       memset(buffer, 0, bufflen);
> +
> +       result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
> +                               buffer, bufflen, &sshdr,
> +                               timeout, SD_MAX_RETRIES, NULL);
> +
> +       if (result) {
> +               sd_zbc_debug(sdkp,
> +                            "REPORT ZONES lba %zu failed with %d/%d\n",
> +                            start_lba, host_byte(result), driver_byte(result));
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
> +                        sector_t sector, unsigned int num_sectors)
> +{
> +       struct blk_zone *zone;
> +       int ret = BLKPREP_OK;
> +       unsigned long flags;
> +
> +       zone = blk_lookup_zone(rq->q, sector);
> +       if (!zone)
> +               return BLKPREP_KILL;
> +
> +       spin_lock_irqsave(&zone->lock, flags);
> +
> +       if (zone->state == BLK_ZONE_UNKNOWN ||
> +           zone->state == BLK_ZONE_BUSY) {
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "Discarding zone %zu state %x, deferring\n",
> +                                      zone->start, zone->state);
> +               ret = BLKPREP_DEFER;
> +               goto out;
> +       }
> +       if (zone->state == BLK_ZONE_OFFLINE) {
> +               /* let the drive fail the command */
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "Discarding offline zone %zu\n",
> +                                      zone->start);
> +               goto out;
> +       }
> +
> +       if (!blk_zone_is_smr(zone)) {
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "Discarding %s zone %zu\n",
> +                                      blk_zone_is_cmr(zone) ? "CMR" : "unknown",
> +                                      zone->start);
> +               ret = BLKPREP_DONE;
> +               goto out;
> +       }
> +       if (blk_zone_is_empty(zone)) {
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "Discarding empty zone %zu\n",
> +                                      zone->start);
> +               ret = BLKPREP_DONE;
> +               goto out;
> +       }
> +
> +       if (zone->start != sector ||
> +           zone->len < num_sectors) {
> +               sd_printk(KERN_ERR, sdkp,
> +                         "Misaligned RESET WP, start %zu/%zu "
> +                         "len %zu/%u\n",
> +                         zone->start, sector, zone->len, num_sectors);
> +               ret = BLKPREP_KILL;
> +               goto out;
> +       }

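It appears that RESET WP is allowed to succeed here even when
num_sectors is less than the number of blocks in use, as
indicated by zone->wp. The check above only rejects a request
that does not start at zone->start or that is longer than
zone->len, so a short discard still resets the whole zone.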
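
To make the concern concrete, here is a minimal userspace sketch, not
kernel code. struct zone_model and both helper names are hypothetical
stand-ins for the blk_zone fields used in the hunk above. The first
helper mirrors the check as posted; the second shows one possible
stricter variant that also requires the range to reach zone->wp.
Whether that is the right policy is for the patch author to decide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the blk_zone fields used by the check. */
struct zone_model {
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone length in sectors */
	uint64_t wp;	/* current write pointer, in sectors */
};

/* Mirrors the posted check: only a wrong start or an over-long range fails. */
static bool reset_wp_allowed_as_posted(const struct zone_model *z,
				       uint64_t sector, uint64_t num_sectors)
{
	return z->start == sector && num_sectors <= z->len;
}

/*
 * Assumed stricter variant: in addition to the posted conditions,
 * the discarded range must cover everything written so far.
 */
static bool reset_wp_allowed_strict(const struct zone_model *z,
				    uint64_t sector, uint64_t num_sectors)
{
	if (z->start != sector || num_sectors > z->len)
		return false;
	return sector + num_sectors >= z->wp;
}
```

With a zone of 1000 sectors and the write pointer at 600, an 8-sector
discard at the zone start passes the posted check but fails the
stricter one, which is the case described above.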

> +       /*
> +        * Opportunistic setting, will be fixed up with
> +        * zone update if RESET WRITE POINTER fails.
> +        */
> +       zone->wp = zone->start;
> +
> +out:
> +       spin_unlock_irqrestore(&zone->lock, flags);
> +
> +       return ret;
> +}
> +
> +int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq,
> +                           sector_t sector, unsigned int num_sectors)
> +{
> +       struct blk_zone *zone;
> +       int ret = BLKPREP_OK;
> +       unsigned long flags;
> +
> +       zone = blk_lookup_zone(sdkp->disk->queue, sector);
> +       if (!zone) {
> +               /* Might happen during zone initialization */
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "zone for sector %zu not found, skipping\n",
> +                                      sector);
> +               return BLKPREP_OK;
> +       }
> +
> +       spin_lock_irqsave(&zone->lock, flags);
> +
> +       if (zone->state == BLK_ZONE_UNKNOWN ||
> +           zone->state == BLK_ZONE_BUSY) {
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "zone %zu state %x, deferring\n",
> +                                      zone->start, zone->state);
> +               ret = BLKPREP_DEFER;
> +               goto out;
> +       }
> +       if (zone->state == BLK_ZONE_OFFLINE) {
> +               /* let the drive fail the command */
> +               sd_zbc_debug_ratelimit(sdkp,
> +                                      "zone %zu offline\n",
> +                                      zone->start);
> +               goto out;
> +       }
> +
> +       if (rq->cmd_flags & (REQ_WRITE | REQ_WRITE_SAME)) {
> +               if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +                       goto out;
> +               if (zone->state == BLK_ZONE_READONLY)
> +                       goto out;
> +               if (blk_zone_is_full(zone)) {
> +                       sd_zbc_debug(sdkp,
> +                                    "Write to full zone %zu/%zu\n",
> +                                    sector, zone->wp);
> +                       ret = BLKPREP_KILL;
> +                       goto out;
> +               }
> +               if (zone->wp != sector) {
> +                       sd_zbc_debug(sdkp,
> +                                    "Misaligned write %zu/%zu\n",
> +                                    sector, zone->wp);
> +                       ret = BLKPREP_KILL;
> +                       goto out;
> +               }
> +               zone->wp += num_sectors;
> +       } else if (blk_zone_is_smr(zone) && (zone->wp <= sector)) {
> +               sd_zbc_debug(sdkp,
> +                            "Read beyond wp %zu/%zu\n",
> +                            sector, zone->wp);
> +               ret = BLKPREP_DONE;
> +       }
> +
> +out:
> +       spin_unlock_irqrestore(&zone->lock, flags);
> +
> +       return ret;
> +}
> +
> +int sd_zbc_setup(struct scsi_disk *sdkp, char *buf, int buf_len)
> +{
> +       sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +       sector_t last_sector;
> +
> +       if (test_and_set_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags)) {
> +               sdev_printk(KERN_WARNING, sdkp->device,
> +                           "zone initialisation already running\n");
> +               return 0;
> +       }
> +
> +       if (!sdkp->zone_work_q) {
> +               char wq_name[32];
> +
> +               sprintf(wq_name, "zbc_wq_%s", sdkp->disk->disk_name);
> +               sdkp->zone_work_q = create_singlethread_workqueue(wq_name);
> +               if (!sdkp->zone_work_q) {
> +                       sdev_printk(KERN_WARNING, sdkp->device,
> +                                   "create zoned disk workqueue failed\n");
> +                       return -ENOMEM;
> +               }
> +       } else if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +               drain_workqueue(sdkp->zone_work_q);
> +               clear_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags);
> +       }
> +
> +       last_sector = zbc_parse_zones(sdkp, buf, buf_len);
> +       if (last_sector != -1 && last_sector < capacity) {
> +               sd_zbc_update_zones(sdkp, last_sector, SD_ZBC_BUF_SIZE, false);
> +       } else
> +               clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
> +
> +       return 0;
> +}
> +
> +void sd_zbc_remove(struct scsi_disk *sdkp)
> +{
> +       if (sdkp->zone_work_q) {
> +               if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags))
> +                       drain_workqueue(sdkp->zone_work_q);
> +               clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
> +               destroy_workqueue(sdkp->zone_work_q);
> +       }
> +}
> --
> 1.8.5.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


end of thread, other threads:[~2016-08-12  6:01 UTC | newest]

Thread overview: 19+ messages
2016-07-19 13:25 [PATCH 0/5] Add support for ZBC host-managed devices Hannes Reinecke
2016-07-19 13:25 ` [PATCH 1/5] sd: configure ZBC devices Hannes Reinecke
2016-07-20  0:46   ` Damien Le Moal
2016-07-22 21:56   ` Ewan D. Milne
2016-07-23 20:31     ` Hannes Reinecke
2016-07-23 22:04       ` Bart Van Assche
2016-07-24  7:07         ` Hannes Reinecke
2016-07-25  6:00         ` Hannes Reinecke
2016-07-25 13:24           ` Ewan D. Milne
2016-08-01 14:24   ` Shaun Tancheff
2016-08-01 14:29     ` Hannes Reinecke
2016-07-19 13:25 ` [PATCH 2/5] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
2016-07-20  0:49   ` Damien Le Moal
2016-07-20 14:52     ` Hannes Reinecke
2016-07-19 13:25 ` [PATCH 3/5] sd: Implement support for ZBC devices Hannes Reinecke
2016-07-20  0:54   ` Damien Le Moal
2016-08-12  6:00   ` Shaun Tancheff
2016-07-19 13:25 ` [PATCH 4/5] sd: Limit messages for ZBC disks capacity change Hannes Reinecke
2016-07-19 13:25 ` [PATCH 5/5] sd_zbc: Fix handling of ZBC read after write pointer Hannes Reinecke
