All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] block/scsi: Implement SMR drive support
@ 2016-04-04 10:00 Hannes Reinecke
  2016-04-04 10:00 ` [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Hannes Reinecke
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

Hi all,

here's a patchset implementing SMR (shingled magnetic recording)
device support for the block and SCSI layer.

There are two main parts to it:
- mapping the 'RESET WRITE POINTER' command to the 'discard' functionality.
  The 'RESET WRITE POINTER' operation is pretty close to the existing
  'discard' functionality with the 'discard_zeroes_blocks' bit set.
  So I've added a new 'reset_wp' provisioning mode for this.
- Adding a 'zone' pointer to the request queue. This pointer holds an
  RB-tree with the zone information, which can be used by other layers
  to access the write pointer.

This is the third part of a larger patchset for ZAC/ZBC support;
it requires the scsi trace fixes queued for in mkp/4.7/scsi-queue and
the patchsets 'libata: SATL update' and 'libata: ZAC support' I've
posted earlier.
The full patchset can be found at:

git.kernel.org/hare/scsi-devel/h/zbc.v3

As usual, comments and reviews are welcome.

Hannes Reinecke (9):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  sd: configure ZBC devices
  sd: Implement new RESET_WP provisioning mode
  block: Implement support for zoned block devices
  block: Add 'zoned' sysfs queue attribute
  block: Introduce BLKPREP_DONE
  block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value
  sd: Implement support for ZBC devices

 block/Kconfig           |   9 ++
 block/Makefile          |   1 +
 block/blk-core.c        |  11 +-
 block/blk-mq.c          |   1 +
 block/blk-settings.c    |   3 +
 block/blk-sysfs.c       |  74 +++++++++
 block/blk-zoned.c       |  70 +++++++++
 drivers/scsi/Kconfig    |   8 +
 drivers/scsi/Makefile   |   1 +
 drivers/scsi/scsi_lib.c |   4 +
 drivers/scsi/sd.c       | 267 ++++++++++++++++++++++++++++---
 drivers/scsi/sd.h       |  44 ++++++
 drivers/scsi/sd_zbc.c   | 411 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h  |   1 +
 include/linux/blkdev.h  |  48 ++++++
 15 files changed, 933 insertions(+), 20 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c

-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-14 19:09   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 2/9] block: update chunk_sectors in blk_stack_limits() Hannes Reinecke
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-sysfs.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index dd93763..ff97091 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,26 @@ static ssize_t queue_physical_block_size_show(struct request_queue *q, char *pag
 	return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.chunk_sectors, page);
+}
+
+static ssize_t
+queue_chunk_sectors_store(struct request_queue *q, const char *page, size_t count)
+{
+	unsigned long chunk_sectors;
+
+	ssize_t ret = queue_var_store(&chunk_sectors, page, count);
+	if (ret < 0)
+		return ret;
+	spin_lock_irq(q->queue_lock);
+	blk_queue_chunk_sectors(q, chunk_sectors);
+	spin_unlock_irq(q->queue_lock);
+
+	return ret;
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(queue_io_min(q), page);
@@ -406,6 +426,12 @@ static struct queue_sysfs_entry queue_physical_block_size_entry = {
 	.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+	.attr = {.name = "chunk_sectors", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_chunk_sectors_show,
+	.store = queue_chunk_sectors_store,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
 	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
 	.show = queue_io_min_show,
@@ -490,6 +516,7 @@ static struct attribute *default_attrs[] = {
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
+	&queue_chunk_sectors_entry.attr,
 	&queue_io_min_entry.attr,
 	&queue_io_opt_entry.attr,
 	&queue_discard_granularity_entry.attr,
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/9] block: update chunk_sectors in blk_stack_limits()
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
  2016-04-04 10:00 ` [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15  3:41   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 3/9] sd: configure ZBC devices Hannes Reinecke
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-settings.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index c7bb666..29fa900 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -630,6 +630,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
+	if (b->chunk_sectors)
+		t->chunk_sectors = max(t->chunk_sectors, b->chunk_sectors);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/9] sd: configure ZBC devices
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
  2016-04-04 10:00 ` [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Hannes Reinecke
  2016-04-04 10:00 ` [PATCH 2/9] block: update chunk_sectors in blk_stack_limits() Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15 15:47   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 4/9] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke, Hannes Reinecke

For ZBC devices I/O must not cross zone boundaries, so setup
the 'chunk_sectors' block queue setting to the zone size.
This is only valid for REPORT ZONES SAME type 2 or 3;
for other types the zone sizes might be different
for individual zones. So issue a warning if the type is
found to be different.
Also the capacity might be different from the announced
capacity, so adjust it as needed.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/sd.h |   2 +
 2 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8401697..74637fd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1972,6 +1972,45 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
+static int
+sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
+		    unsigned char *buffer, int bufflen )
+{
+	struct scsi_device *sdp = sdkp->device;
+	const int timeout = sdp->request_queue->rq_timeout
+		* SD_FLUSH_TIMEOUT_MULTIPLIER;
+	struct scsi_sense_hdr sshdr;
+	unsigned char cmd[16];
+	int result;
+
+	if (!scsi_device_online(sdp)) {
+		sd_printk(KERN_INFO, sdkp, "device not online\n");
+		return -ENODEV;
+	}
+
+	sd_printk(KERN_INFO, sdkp, "REPORT ZONES lba %zu len %d\n",
+		  start_lba, bufflen);
+
+	memset(cmd, 0, 16);
+	cmd[0] = ZBC_IN;
+	cmd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(start_lba, &cmd[2]);
+	put_unaligned_be32(bufflen, &cmd[10]);
+	memset(buffer, 0, bufflen);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+				  buffer, bufflen, &sshdr,
+				  timeout, SD_MAX_RETRIES, NULL);
+
+	if (result) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "REPORT ZONES lba %zu failed with %d/%d\n",
+			  start_lba, host_byte(result), driver_byte(result));
+
+		return -EIO;
+	}
+	return 0;
+}
 
 /*
  * Determine whether disk supports Data Integrity Field.
@@ -2014,6 +2053,58 @@ static int sd_read_protection_type(struct scsi_disk *sdkp, unsigned char *buffer
 	return ret;
 }
 
+static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	int retval;
+	unsigned char *desc;
+	u32 rep_len;
+	u8 same;
+	u64 zone_len, lba;
+
+	if (sdkp->zoned != 1)
+		/* Device managed, no special handling required */
+		return;
+
+	retval = sd_zbc_report_zones(sdkp, 0, buffer, SD_BUF_SIZE);
+	if (retval < 0)
+		return;
+
+	rep_len = get_unaligned_be32(&buffer[0]);
+	if (rep_len < 64) {
+		sd_printk(KERN_WARNING, sdkp,
+			  "REPORT ZONES report invalid length %u\n",
+			  rep_len);
+		return;
+	}
+
+	if (sdkp->rc_basis == 0) {
+		/* The max_lba field is the capacity of a zoned device */
+		lba = get_unaligned_be64(&buffer[8]);
+		if (lba + 1 > sdkp->capacity) {
+			sd_printk(KERN_WARNING, sdkp,
+				  "Max LBA %zu (capacity %zu)\n",
+				  (sector_t) lba + 1, sdkp->capacity);
+			sdkp->capacity = lba + 1;
+		}
+	}
+
+	/*
+	 * Adjust 'chunk_sectors' to the zone length if the device
+	 * supports equal zone sizes.
+	 */
+	same = buffer[4] & 0xf;
+	if (same == 0 || same > 3) {
+		sd_printk(KERN_WARNING, sdkp,
+			  "REPORT ZONES SAME type %d not supported\n", same);
+		return;
+	}
+	/* Read the zone length from the first zone descriptor */
+	desc = &buffer[64];
+	zone_len = logical_to_sectors(sdkp->device,
+				      get_unaligned_be64(&desc[8]));
+	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
+}
+
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 			struct scsi_sense_hdr *sshdr, int sense_valid,
 			int the_result)
@@ -2122,6 +2213,9 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	/* Logical blocks per physical block exponent */
 	sdkp->physical_block_size = (1 << (buffer[13] & 0xf)) * sector_size;
 
+	/* RC basis */
+	sdkp->rc_basis = (buffer[12] >> 4) & 0x3;
+
 	/* Lowest aligned logical block */
 	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
 	blk_queue_alignment_offset(sdp->request_queue, alignment);
@@ -2312,6 +2406,11 @@ got_data:
 		sector_size = 512;
 	}
 	blk_queue_logical_block_size(sdp->request_queue, sector_size);
+	blk_queue_physical_block_size(sdp->request_queue,
+				      sdkp->physical_block_size);
+	sdkp->device->sector_size = sector_size;
+
+	sd_read_zones(sdkp, buffer);
 
 	{
 		char cap_str_2[10], cap_str_10[10];
@@ -2338,9 +2437,6 @@ got_data:
 	if (sdkp->capacity > 0xffffffff)
 		sdp->use_16_for_rw = 1;
 
-	blk_queue_physical_block_size(sdp->request_queue,
-				      sdkp->physical_block_size);
-	sdkp->device->sector_size = sector_size;
 }
 
 /* called with buffer of length 512 */
@@ -2727,6 +2823,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
  out:
 	kfree(buffer);
 }
@@ -2842,14 +2939,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 * react badly if we do.
 	 */
 	if (sdkp->media_present) {
-		sd_read_capacity(sdkp, buffer);
-
 		if (sd_try_extended_inquiry(sdp)) {
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
 		}
 
+		sd_read_capacity(sdkp, buffer);
+
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 654630b..ab1696d 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -94,6 +94,8 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	rc_basis: 2;
+	unsigned	zoned: 2;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/9] sd: Implement new RESET_WP provisioning mode
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (2 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 3/9] sd: configure ZBC devices Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-04 10:00 ` [PATCH 5/9] block: Implement support for zoned block devices Hannes Reinecke
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

We can map the RESET WRITE POINTER command onto a 'discard'
request.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 52 +++++++++++++++++++++++++++++++++++++++++++---------
 drivers/scsi/sd.h |  1 +
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 74637fd..9220c66 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -369,6 +369,7 @@ static const char *lbp_mode[] = {
 	[SD_LBP_WS16]		= "writesame_16",
 	[SD_LBP_WS10]		= "writesame_10",
 	[SD_LBP_ZERO]		= "writesame_zero",
+	[SD_ZBC_RESET_WP]	= "reset_wp",
 	[SD_LBP_DISABLE]	= "disabled",
 };
 
@@ -391,6 +392,13 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (sdkp->zoned == 1) {
+		if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
+			sd_config_discard(sdkp, SD_ZBC_RESET_WP);
+			return count;
+		}
+		return -EINVAL;
+	}
 	if (sdp->type != TYPE_DISK)
 		return -EINVAL;
 
@@ -683,6 +691,12 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 		q->limits.discard_zeroes_data = sdkp->lbprz;
 		break;
 
+	case SD_ZBC_RESET_WP:
+		max_blocks = min_not_zero(sdkp->max_unmap_blocks,
+					  (u32)SD_MAX_WS16_BLOCKS);
+		q->limits.discard_zeroes_data = 1;
+		break;
+
 	case SD_LBP_ZERO:
 		max_blocks = min_not_zero(sdkp->max_ws_blocks,
 					  (u32)SD_MAX_WS10_BLOCKS);
@@ -711,16 +725,18 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 	unsigned int nr_sectors = blk_rq_sectors(rq);
 	unsigned int nr_bytes = blk_rq_bytes(rq);
 	unsigned int len;
-	int ret;
+	int ret = 0;
 	char *buf;
-	struct page *page;
+	struct page *page = NULL;
 
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
-	page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
-	if (!page)
-		return BLKPREP_DEFER;
+	if (sdkp->provisioning_mode != SD_ZBC_RESET_WP) {
+		page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+		if (!page)
+			return BLKPREP_DEFER;
+	}
 
 	switch (sdkp->provisioning_mode) {
 	case SD_LBP_UNMAP:
@@ -760,6 +776,16 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 		len = sdkp->device->sector_size;
 		break;
 
+	case SD_ZBC_RESET_WP:
+		cmd->cmd_len = 16;
+		cmd->cmnd[0] = ZBC_OUT;
+		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+		put_unaligned_be64(sector, &cmd->cmnd[2]);
+		/* Reset Write Pointer doesn't have a payload */
+		len = 0;
+		cmd->sc_data_direction = DMA_NONE;
+		break;
+
 	default:
 		ret = BLKPREP_INVALID;
 		goto out;
@@ -779,12 +805,14 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 	 * discarded on disk. This allows us to report completion on the full
 	 * amount of blocks described by the request.
 	 */
-	blk_add_request_payload(rq, page, len);
-	ret = scsi_init_io(cmd);
+	if (len) {
+		blk_add_request_payload(rq, page, len);
+		ret = scsi_init_io(cmd);
+	}
 	rq->__data_len = nr_bytes;
 
 out:
-	if (ret != BLKPREP_OK)
+	if (page && ret != BLKPREP_OK)
 		__free_page(page);
 	return ret;
 }
@@ -1151,7 +1179,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
 
-	if (rq->cmd_flags & REQ_DISCARD)
+	if (rq->cmd_flags & REQ_DISCARD &&
+	    rq->completion_data)
 		__free_page(rq->completion_data);
 
 	if (SCpnt->cmnd != rq->cmd) {
@@ -1768,6 +1797,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	int sense_deferred = 0;
 	unsigned char op = SCpnt->cmnd[0];
 	unsigned char unmap = SCpnt->cmnd[1] & 8;
+	unsigned char sa = SCpnt->cmnd[1] & 0xf;
 
 	if (req->cmd_flags & REQ_DISCARD || req->cmd_flags & REQ_WRITE_SAME) {
 		if (!result) {
@@ -1819,6 +1849,10 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			case UNMAP:
 				sd_config_discard(sdkp, SD_LBP_DISABLE);
 				break;
+			case ZBC_OUT:
+				if (sa == ZO_RESET_WRITE_POINTER)
+					sd_config_discard(sdkp, SD_LBP_DISABLE);
+				break;
 			case WRITE_SAME_16:
 			case WRITE_SAME:
 				if (unmap)
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index ab1696d..5debd49 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -56,6 +56,7 @@ enum {
 	SD_LBP_WS16,		/* Use WRITE SAME(16) with UNMAP bit */
 	SD_LBP_WS10,		/* Use WRITE SAME(10) with UNMAP bit */
 	SD_LBP_ZERO,		/* Use WRITE SAME(10) with zero payload */
+	SD_ZBC_RESET_WP,	/* Use RESET WRITE POINTER */
 	SD_LBP_DISABLE,		/* Discard disabled due to failed cmd */
 };
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/9] block: Implement support for zoned block devices
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (3 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 4/9] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15 17:37   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 6/9] block: Add 'zoned' sysfs queue attribute Hannes Reinecke
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

Implement a RB-Tree holding the zone information and
add support functions for maintaining the RB-Tree.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/Kconfig          |  9 +++++++
 block/Makefile         |  1 +
 block/blk-core.c       |  5 ++++
 block/blk-zoned.c      | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h | 47 +++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+)
 create mode 100644 block/blk-zoned.c

diff --git a/block/Kconfig b/block/Kconfig
index 0363cd7..8162312 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -113,6 +113,15 @@ config BLK_DEV_THROTTLING
 
 	See Documentation/cgroups/blkio-controller.txt for more information.
 
+config BLK_DEV_ZONED
+	bool "Zoned block device support"
+	default n
+	---help---
+	Block layer zoned block device support. This option enables
+	support for zoned block (ZAC/ZBC) devices.
+
+	Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_CMDLINE_PARSER
 	bool "Block device command line partition parser"
 	default n
diff --git a/block/Makefile b/block/Makefile
index 9eda232..bc9e62c 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_BLK_DEV_BSG)	+= bsg.o
 obj-$(CONFIG_BLK_DEV_BSGLIB)	+= bsg-lib.o
 obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
 obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
+obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
 obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 827f8ba..d3d3db9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -588,6 +588,8 @@ void blk_cleanup_queue(struct request_queue *q)
 		blk_mq_free_queue(q);
 	percpu_ref_exit(&q->q_usage_counter);
 
+	blk_drop_zones(q);
+
 	spin_lock_irq(lock);
 	if (q->queue_lock != &q->__queue_lock)
 		q->queue_lock = &q->__queue_lock;
@@ -724,6 +726,9 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 #ifdef CONFIG_BLK_CGROUP
 	INIT_LIST_HEAD(&q->blkg_list);
 #endif
+#ifdef CONFIG_BLK_DEV_ZONED
+	q->zones = RB_ROOT;
+#endif
 	INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
 
 	kobject_init(&q->kobj, &blk_queue_ktype);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 0000000..975e863
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,70 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/rbtree.h>
+
+struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t lba)
+{
+	struct rb_root *root = &q->zones;
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct blk_zone *zone = container_of(node, struct blk_zone,
+						     node);
+
+		if (lba < zone->start)
+			node = node->rb_left;
+		else if (lba >= zone->start + zone->len)
+			node = node->rb_right;
+		else
+			return zone;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(blk_lookup_zone);
+
+struct blk_zone *blk_insert_zone(struct request_queue *q, struct blk_zone *data)
+{
+	struct rb_root *root = &q->zones;
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct blk_zone *this = container_of(*new, struct blk_zone,
+						     node);
+		parent = *new;
+		if (data->start + data->len <= this->start)
+			new = &((*new)->rb_left);
+		else if (data->start >= this->start + this->len)
+			new = &((*new)->rb_right);
+		else {
+			/* Return existing zone */
+			return this;
+		}
+	}
+	/* Add new node and rebalance tree. */
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(blk_insert_zone);
+
+void blk_drop_zones(struct request_queue *q)
+{
+	struct rb_root *root = &q->zones;
+	struct blk_zone *zone, *next;
+
+	rbtree_postorder_for_each_entry_safe(zone, next, root, node) {
+		kfree(zone);
+	}
+	q->zones = RB_ROOT;
+}
+EXPORT_SYMBOL_GPL(blk_drop_zones);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7e5d7e0..f58bcdc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -249,6 +249,50 @@ struct blk_queue_tag {
 #define BLK_SCSI_MAX_CMDS	(256)
 #define BLK_SCSI_CMD_PER_LONG	(BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
+#ifdef CONFIG_BLK_DEV_ZONED
+enum blk_zone_type {
+	BLK_ZONE_TYPE_UNKNOWN,
+	BLK_ZONE_TYPE_CONVENTIONAL,
+	BLK_ZONE_TYPE_SEQWRITE_REQ,
+	BLK_ZONE_TYPE_SEQWRITE_PREF,
+	BLK_ZONE_TYPE_RESERVED,
+};
+
+enum blk_zone_state {
+	BLK_ZONE_UNKNOWN,
+	BLK_ZONE_NO_WP,
+	BLK_ZONE_OPEN,
+	BLK_ZONE_READONLY,
+	BLK_ZONE_OFFLINE,
+	BLK_ZONE_BUSY,
+};
+
+struct blk_zone {
+	struct rb_node node;
+	spinlock_t lock;
+	sector_t start;
+	size_t len;
+	sector_t wp;
+	enum blk_zone_type type;
+	enum blk_zone_state state;
+	void *private_data;
+};
+
+#define blk_zone_is_smr(z) ((z)->type == BLK_ZONE_TYPE_SEQWRITE_REQ ||	\
+			    (z)->type == BLK_ZONE_TYPE_SEQWRITE_PREF)
+
+#define blk_zone_is_cmr(z) ((z)->type == BLK_ZONE_TYPE_CONVENTIONAL)
+#define blk_zone_is_full(z) ((z)->wp == (z)->start + (z)->len)
+#define blk_zone_is_empty(z) ((z)->wp == (z)->start)
+
+extern struct blk_zone *blk_lookup_zone(struct request_queue *, sector_t);
+extern struct blk_zone *blk_insert_zone(struct request_queue *,
+					struct blk_zone *);
+extern void blk_drop_zones(struct request_queue *);
+#else
+static inline void blk_drop_zones(struct request_queue *q) { };
+#endif
+
 struct queue_limits {
 	unsigned long		bounce_pfn;
 	unsigned long		seg_boundary_mask;
@@ -421,6 +465,9 @@ struct request_queue {
 
 	struct queue_limits	limits;
 
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct rb_root		zones;
+#endif
 	/*
 	 * sg stuff
 	 */
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (4 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 5/9] block: Implement support for zoned block devices Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-07  1:56   ` Damien Le Moal
  2016-04-15 17:45   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 7/9] block: Introduce BLKPREP_DONE Hannes Reinecke
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

Add a sysfs queue attribute 'zoned' to display the zone layout
for zoned devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff97091..748bb27 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -244,6 +244,43 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
 	return queue_var_show(max_hw_sectors_kb, (page));
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+	struct rb_node *node;
+	struct blk_zone *zone;
+	ssize_t offset = 0, end = 0;
+	size_t size = 0, num = 0;
+	enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
+
+	for (node = rb_first(&q->zones); node; node = rb_next(node)) {
+		zone = rb_entry(node, struct blk_zone, node);
+		if (zone->type != type ||
+		    zone->len != size ||
+		    end != zone->start) {
+			if (size != 0)
+				offset += sprintf(page + offset, "%zu\n", num);
+			/* We can only store one page ... */
+			if (offset + 42 > PAGE_SIZE) {
+				offset += sprintf(page + offset, "...\n");
+				return offset;
+			}
+			size = zone->len;
+			type = zone->type;
+			offset += sprintf(page + offset, "%zu %zu %d ",
+					  zone->start, size, type);
+			num = 0;
+			end = zone->start + size;
+		} else
+			end += zone->len;
+		num++;
+	}
+	if (num > 0)
+		offset += sprintf(page + offset, "%zu\n", num);
+	return offset > 0 ? offset : -EINVAL;
+}
+#endif
+
 #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)				\
 static ssize_t								\
 queue_show_##name(struct request_queue *q, char *page)			\
@@ -468,6 +505,13 @@ static struct queue_sysfs_entry queue_write_same_max_entry = {
 	.show = queue_write_same_max_show,
 };
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static struct queue_sysfs_entry queue_zoned_entry = {
+	.attr = {.name = "zoned", .mode = S_IRUGO },
+	.show = queue_zoned_show,
+};
+#endif
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_show_nonrot,
@@ -525,6 +569,9 @@ static struct attribute *default_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_nonrot_entry.attr,
+#ifdef CONFIG_BLK_DEV_ZONED
+	&queue_zoned_entry.attr,
+#endif
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/9] block: Introduce BLKPREP_DONE
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (5 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 6/9] block: Add 'zoned' sysfs queue attribute Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15 17:49   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value Hannes Reinecke
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

Add a new blkprep return code BLKPREP_DONE to signal completion
without I/O error.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c        | 6 +++++-
 drivers/scsi/scsi_lib.c | 1 +
 include/linux/blkdev.h  | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d3d3db9..33ea429 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2460,9 +2460,13 @@ struct request *blk_peek_request(struct request_queue *q)
 
 			rq = NULL;
 			break;
-		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
+		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID ||
+			   ret == BLKPREP_DONE) {
 			int err = (ret == BLKPREP_INVALID) ? -EREMOTEIO : -EIO;
 
+			if (ret == BLKPREP_DONE)
+				err = 0;
+
 			rq->cmd_flags |= REQ_QUIET;
 			/*
 			 * Mark this request as started so we don't trigger
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..1cc75fb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1346,6 +1346,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 	case BLKPREP_KILL:
 	case BLKPREP_INVALID:
 		req->errors = DID_NO_CONNECT << 16;
+	case BLKPREP_DONE:
 		/* release the command and kill it */
 		if (req->special) {
 			struct scsi_cmnd *cmd = req->special;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f58bcdc..0dead8d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -735,6 +735,7 @@ enum {
 	BLKPREP_KILL,		/* fatal error, kill, return -EIO */
 	BLKPREP_DEFER,		/* leave on queue */
 	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
+	BLKPREP_DONE,		/* complete w/o error */
 };
 
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (6 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 7/9] block: Introduce BLKPREP_DONE Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15 17:56   ` Bart Van Assche
  2016-04-04 10:00 ` [PATCH 9/9] sd: Implement support for ZBC devices Hannes Reinecke
  2016-04-08 18:35 ` [PATCH 0/9] block/scsi: Implement SMR drive support Shaun Tancheff
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke, Hannes Reinecke

Add a return value BLK_MQ_RQ_QUEUE_DONE to terminate a request
without error.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-mq.c          | 1 +
 drivers/scsi/scsi_lib.c | 3 +++
 include/linux/blk-mq.h  | 1 +
 3 files changed, 5 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 050f7a1..8b863b7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -793,6 +793,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 			pr_err("blk-mq: bad return on queue: %d\n", ret);
 		case BLK_MQ_RQ_QUEUE_ERROR:
 			rq->errors = -EIO;
+		case BLK_MQ_RQ_QUEUE_DONE:
 			blk_mq_end_request(rq, rq->errors);
 			break;
 		}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1cc75fb..7cb66b0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1891,6 +1891,8 @@ static inline int prep_to_mq(int ret)
 		return 0;
 	case BLKPREP_DEFER:
 		return BLK_MQ_RQ_QUEUE_BUSY;
+	case BLKPREP_DONE:
+		return BLK_MQ_RQ_QUEUE_DONE;
 	default:
 		return BLK_MQ_RQ_QUEUE_ERROR;
 	}
@@ -2034,6 +2036,7 @@ out:
 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
 		break;
 	case BLK_MQ_RQ_QUEUE_ERROR:
+	case BLK_MQ_RQ_QUEUE_DONE:
 		/*
 		 * Make sure to release all allocated ressources when
 		 * we hit an error, as we will never see this command
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 15a73d4..ab107a8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -151,6 +151,7 @@ enum {
 	BLK_MQ_RQ_QUEUE_OK	= 0,	/* queued fine */
 	BLK_MQ_RQ_QUEUE_BUSY	= 1,	/* requeue IO for later */
 	BLK_MQ_RQ_QUEUE_ERROR	= 2,	/* end IO with error */
+	BLK_MQ_RQ_QUEUE_DONE	= 3,	/* end IO w/o error */
 
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 9/9] sd: Implement support for ZBC devices
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (7 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value Hannes Reinecke
@ 2016-04-04 10:00 ` Hannes Reinecke
  2016-04-15 18:31   ` Bart Van Assche
  2016-04-08 18:35 ` [PATCH 0/9] block/scsi: Implement SMR drive support Shaun Tancheff
  9 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-04 10:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

Implement ZBC support functions to read in the zone information
and setup the zone tree.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/Kconfig  |   8 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c     | 120 +++++++++++++--
 drivers/scsi/sd.h     |  41 +++++
 drivers/scsi/sd_zbc.c | 411 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 570 insertions(+), 11 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 0950567..4c6cdc2 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -201,6 +201,14 @@ config SCSI_ENCLOSURE
 	  it has an enclosure device.  Selecting this option will just allow
 	  certain enclosure conditions to be reported and is not required.
 
+config SCSI_ZBC
+	bool "SCSI ZBC (zoned block commands) Support"
+	depends on SCSI && BLK_DEV_ZONED
+	help
+	  Enable support for ZBC (zoned block commands) devices.
+
+	  If unsure say N.
+
 config SCSI_CONSTANTS
 	bool "Verbose SCSI error reporting (kernel size += 36K)"
 	depends on SCSI
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 862ab4e..49bde97 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -178,6 +178,7 @@ hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_SCSI_ZBC) += sd_zbc.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9220c66..ad7efbc 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
@@ -162,7 +163,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
 	static const char temp[] = "temporary ";
 	int len;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		/* no cache control on RBC devices; theoretically they
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
@@ -261,7 +262,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -392,7 +393,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdkp->zoned == 1) {
+	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
 		if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
 			sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 			return count;
@@ -466,7 +467,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &max);
@@ -728,6 +729,10 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 	int ret = 0;
 	char *buf;
 	struct page *page = NULL;
+#ifdef CONFIG_SCSI_ZBC
+	struct blk_zone *zone;
+	unsigned long flags;
+#endif
 
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
@@ -777,6 +782,52 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 		break;
 
 	case SD_ZBC_RESET_WP:
+#ifdef CONFIG_SCSI_ZBC
+		zone = blk_lookup_zone(rq->q, sector);
+		if (!zone) {
+			ret = BLKPREP_KILL;
+			goto out;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+		if (zone->state == BLK_ZONE_BUSY) {
+			sd_printk(KERN_INFO, sdkp,
+				  "Discarding busy zone %zu/%zu\n",
+				  zone->start, zone->len);
+			spin_unlock_irqrestore(&zone->lock, flags);
+			ret = BLKPREP_DEFER;
+			goto out;
+		}
+		if (!blk_zone_is_smr(zone)) {
+			sd_printk(KERN_INFO, sdkp,
+				  "Discarding %s zone %zu/%zu\n",
+				  blk_zone_is_cmr(zone) ? "CMR" : "unknown",
+				  zone->start, zone->len);
+			spin_unlock_irqrestore(&zone->lock, flags);
+			ret = BLKPREP_DONE;
+			goto out;
+		}
+		if (blk_zone_is_empty(zone)) {
+			spin_unlock_irqrestore(&zone->lock, flags);
+			ret = BLKPREP_DONE;
+			goto out;
+		}
+		if (zone->start != sector ||
+		    zone->len < nr_sectors) {
+			sd_printk(KERN_INFO, sdkp,
+				  "Misaligned RESET WP, start %zu/%zu "
+				  "len %zu/%u\n",
+				  zone->start, sector, zone->len, nr_sectors);
+			spin_unlock_irqrestore(&zone->lock, flags);
+			ret = BLKPREP_KILL;
+			goto out;
+		}
+		/*
+		 * Opportunistic setting, needs to be fixed up
+		 * if RESET WRITE POINTER fails.
+		 */
+		zone->wp = zone->start;
+		spin_unlock_irqrestore(&zone->lock, flags);
+#endif
 		cmd->cmd_len = 16;
 		cmd->cmnd[0] = ZBC_OUT;
 		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
@@ -990,6 +1041,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
 					(unsigned long long)block));
 
+	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
+		/* sd_zbc_lookup_zone lba is in block layer sector units */
+		ret = sd_zbc_lookup_zone(sdkp, rq, block, this_count);
+		if (ret != BLKPREP_OK)
+			goto out;
+	}
+
 	/*
 	 * If we have a 1K hardware sectorsize, prevent access to single
 	 * 512 byte sectors.  In theory we could handle this - in fact
@@ -1804,6 +1862,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			good_bytes = blk_rq_bytes(req);
 			scsi_set_resid(SCpnt, 0);
 		} else {
+#ifdef CONFIG_SCSI_ZBC
+			if (op == ZBC_OUT)
+				/* RESET WRITE POINTER failed */
+				sd_zbc_update_zones(sdkp,
+						    blk_rq_pos(req),
+						    512, true);
+#endif
 			good_bytes = 0;
 			scsi_set_resid(SCpnt, blk_rq_bytes(req));
 		}
@@ -1867,6 +1932,26 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 				}
 			}
 		}
+		if (sshdr.asc == 0x21) {
+			/*
+			 * ZBC: read beyond the write pointer position.
+			 * Clear out error and return the buffer as-is.
+			 */
+			if (sshdr.ascq == 0x06) {
+				good_bytes = blk_rq_bytes(req);
+				scsi_set_resid(SCpnt, 0);
+			}
+#ifdef CONFIG_SCSI_ZBC
+			/*
+			 * ZBC: Unaligned write command.
+			 * Write did not start a write pointer position.
+			 */
+			if (sshdr.ascq == 0x04)
+				sd_zbc_update_zones(sdkp,
+						    blk_rq_pos(req),
+						    512, true);
+#endif
+		}
 		break;
 	default:
 		break;
@@ -2006,9 +2091,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	}
 }
 
-static int
-sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
-		    unsigned char *buffer, int bufflen )
+int sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
+			unsigned char *buffer, int bufflen )
 {
 	struct scsi_device *sdp = sdkp->device;
 	const int timeout = sdp->request_queue->rq_timeout
@@ -2095,8 +2179,11 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 	u8 same;
 	u64 zone_len, lba;
 
-	if (sdkp->zoned != 1)
-		/* Device managed, no special handling required */
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
+		/*
+		 * Device managed or normal SCSI disk,
+		 * no special handling required
+		 */
 		return;
 
 	retval = sd_zbc_report_zones(sdkp, 0, buffer, SD_BUF_SIZE);
@@ -2137,6 +2224,8 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 	zone_len = logical_to_sectors(sdkp->device,
 				      get_unaligned_be64(&desc[8]));
 	blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
+
+	sd_zbc_setup(sdkp, buffer, SD_BUF_SIZE);
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
@@ -2732,7 +2821,7 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
 
-	if (sdp->type != TYPE_DISK)
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
 		return;
 
 	if (sdkp->protection_type == 0)
@@ -3179,9 +3268,16 @@ static int sd_probe(struct device *dev)
 
 	scsi_autopm_get_device(sdp);
 	error = -ENODEV;
-	if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
+	if (sdp->type != TYPE_DISK &&
+	    sdp->type != TYPE_ZBC &&
+	    sdp->type != TYPE_MOD &&
+	    sdp->type != TYPE_RBC)
 		goto out;
 
+#ifndef CONFIG_SCSI_ZBC
+	if (sdp->type == TYPE_ZBC)
+		goto out;
+#endif
 	SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
 					"sd_probe\n"));
 
@@ -3285,6 +3381,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	sd_zbc_remove(sdkp);
+
 	blk_register_region(devt, SD_MINORS, NULL,
 			    sd_default_probe, NULL, NULL);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5debd49..35c75fa 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -65,6 +65,12 @@ struct scsi_disk {
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
+#ifdef CONFIG_SCSI_ZBC
+	struct workqueue_struct *zone_work_q;
+	unsigned long	zone_flags;
+#define SD_ZBC_ZONE_RESET 1
+#define SD_ZBC_ZONE_INIT  2
+#endif
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
@@ -154,6 +160,11 @@ static inline sector_t logical_to_sectors(struct scsi_device *sdev, sector_t blo
 	return blocks << (ilog2(sdev->sector_size) - 9);
 }
 
+static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sector)
+{
+	return sector >> (ilog2(sdev->sector_size) - 9);
+}
+
 /*
  * A DIF-capable target device can be formatted with different
  * protection schemes.  Currently 0 through 3 are defined:
@@ -267,4 +278,34 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+#ifdef CONFIG_SCSI_ZBC
+
+extern int sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
+			       unsigned char *buffer, int bufflen );
+extern int sd_zbc_setup(struct scsi_disk *, char *, int);
+extern void sd_zbc_remove(struct scsi_disk *);
+extern void sd_zbc_reset_zones(struct scsi_disk *);
+extern int sd_zbc_lookup_zone(struct scsi_disk *, struct request *,
+			      sector_t, unsigned int);
+extern void sd_zbc_update_zones(struct scsi_disk *, sector_t, int, bool);
+extern void sd_zbc_refresh_zone_work(struct work_struct *);
+
+#else /* CONFIG_SCSI_ZBC */
+
+static inline int sd_zbc_setup(struct scsi_disk *sdkp,
+			       unsigned char *buf, int buf_len)
+{
+	return 0;
+}
+
+static inline int sd_zbc_lookup_zone(struct scsi_disk *sdkp,
+				     struct request *rq, sector_t sector,
+				     unsigned int num_sectors)
+{
+	return BLKPREP_OK;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+#endif /* CONFIG_SCSI_ZBC */
+
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
new file mode 100644
index 0000000..9d8221c
--- /dev/null
+++ b/drivers/scsi/sd_zbc.c
@@ -0,0 +1,411 @@
+/*
+ * sd_zbc.c - SCSI Zoned Block commands
+ *
+ * Copyright (C) 2014-2015 SUSE Linux GmbH
+ * Written by: Hannes Reinecke <hare@suse.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/rbtree.h>
+
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
+
+#include "sd.h"
+#include "scsi_priv.h"
+
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP,
+	ZBC_ZONE_COND_EMPTY,
+	ZBC_ZONE_COND_IMPLICIT_OPEN,
+	ZBC_ZONE_COND_EXPLICIT_OPEN,
+	ZBC_ZONE_COND_CLOSED,
+	ZBC_ZONE_COND_READONLY = 0xd,
+	ZBC_ZONE_COND_FULL,
+	ZBC_ZONE_COND_OFFLINE,
+};
+
+#define SD_ZBC_BUF_SIZE 524288
+
+#undef SD_ZBC_DEBUG
+
+struct zbc_update_work {
+	struct work_struct zone_work;
+	struct scsi_disk *sdkp;
+	spinlock_t	zone_lock;
+	sector_t	zone_sector;
+	int		zone_buflen;
+	char		zone_buf[0];
+};
+
+struct blk_zone *zbc_desc_to_zone(struct scsi_disk *sdkp, unsigned char *rec)
+{
+	struct blk_zone *zone;
+	enum zbc_zone_cond zone_cond;
+	sector_t wp = (sector_t)-1;
+
+	zone = kzalloc(sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zone)
+		return NULL;
+
+	spin_lock_init(&zone->lock);
+	zone->type = rec[0] & 0xf;
+	zone_cond = (rec[1] >> 4) & 0xf;
+	zone->len = logical_to_sectors(sdkp->device,
+				       get_unaligned_be64(&rec[8]));
+	zone->start = logical_to_sectors(sdkp->device,
+					 get_unaligned_be64(&rec[16]));
+
+	if (blk_zone_is_smr(zone)) {
+		wp = logical_to_sectors(sdkp->device,
+					get_unaligned_be64(&rec[24]));
+		if (zone_cond == ZBC_ZONE_COND_READONLY) {
+			zone->state = BLK_ZONE_READONLY;
+		} else if (zone_cond == ZBC_ZONE_COND_OFFLINE) {
+			zone->state = BLK_ZONE_OFFLINE;
+		} else {
+			zone->state = BLK_ZONE_OPEN;
+		}
+	} else
+		zone->state = BLK_ZONE_NO_WP;
+
+	zone->wp = wp;
+	/*
+	 * Fixup block zone state
+	 */
+	if (zone_cond == ZBC_ZONE_COND_EMPTY &&
+	    zone->wp != zone->start) {
+#ifdef SD_ZBC_DEBUG
+		sd_printk(KERN_INFO, sdkp,
+			  "zone %zu state EMPTY wp %zu: adjust wp\n",
+			  zone->start, zone->wp);
+#endif
+		zone->wp = zone->start;
+	}
+	if (zone_cond == ZBC_ZONE_COND_FULL &&
+	    zone->wp != zone->start + zone->len) {
+#ifdef SD_ZBC_DEBUG
+		sd_printk(KERN_INFO, sdkp,
+			  "zone %zu state FULL wp %zu: adjust wp\n",
+			  zone->start, zone->wp);
+#endif
+		zone->wp = zone->start + zone->len;
+	}
+
+	return zone;
+}
+
+sector_t zbc_parse_zones(struct scsi_disk *sdkp, unsigned char *buf,
+			 unsigned int buf_len)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned char *rec = buf;
+	int rec_no = 0;
+	unsigned int list_length;
+	sector_t next_sector = -1;
+	u8 same;
+
+	/* Parse REPORT ZONES header */
+	list_length = get_unaligned_be32(&buf[0]);
+	same = buf[4] & 0xf;
+	rec = buf + 64;
+	list_length += 64;
+
+	if (list_length < buf_len)
+		buf_len = list_length;
+
+	while (rec < buf + buf_len) {
+		struct blk_zone *this, *old;
+		unsigned long flags;
+
+		this = zbc_desc_to_zone(sdkp, rec);
+		if (!this)
+			break;
+
+		next_sector = this->start + this->len;
+		old = blk_insert_zone(q, this);
+		if (old) {
+			spin_lock_irqsave(&old->lock, flags);
+			if (blk_zone_is_smr(old)) {
+				old->wp = this->wp;
+				old->state = this->state;
+			}
+			spin_unlock_irqrestore(&old->lock, flags);
+			kfree(this);
+		}
+		rec += 64;
+		rec_no++;
+	}
+
+#ifdef SD_ZBC_DEBUG
+	sd_printk(KERN_INFO, sdkp,
+		  "Inserted %d zones, next sector %zu len %d\n",
+		  rec_no, next_sector, list_length);
+#endif
+	return next_sector;
+}
+
+void sd_zbc_refresh_zone_work(struct work_struct *work)
+{
+	struct zbc_update_work *zbc_work =
+		container_of(work, struct zbc_update_work, zone_work);
+	struct scsi_disk *sdkp = zbc_work->sdkp;
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned int zone_buflen;
+	int ret;
+	sector_t last_sector;
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+	sector_t zone_lba = sectors_to_logical(sdkp->device,
+					       zbc_work->zone_sector);
+
+	zone_buflen = zbc_work->zone_buflen;
+	ret = sd_zbc_report_zones(sdkp, zone_lba, zbc_work->zone_buf,
+				  zone_buflen);
+	if (ret)
+		goto done_free;
+
+	last_sector = zbc_parse_zones(sdkp, zbc_work->zone_buf, zone_buflen);
+	if (last_sector != -1 && last_sector < capacity) {
+		if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+#ifdef SD_ZBC_DEBUG
+			sd_printk(KERN_INFO, sdkp,
+				  "zones in reset, cancelling refresh\n");
+#endif
+			ret = -EAGAIN;
+			goto done_free;
+		}
+
+		zbc_work->zone_sector = last_sector;
+		queue_work(sdkp->zone_work_q, &zbc_work->zone_work);
+		/* Kick request queue to be on the safe side */
+		goto done_start_queue;
+	}
+done_free:
+	kfree(zbc_work);
+	if (test_and_clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags) && ret) {
+		sd_printk(KERN_INFO, sdkp,
+			  "Cancelling zone initialisation\n");
+	}
+done_start_queue:
+	if (q->mq_ops)
+		blk_mq_start_hw_queues(q);
+	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
+void sd_zbc_update_zones(struct scsi_disk *sdkp, sector_t sector, int bufsize,
+			 bool update)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	struct zbc_update_work *zbc_work;
+	struct blk_zone *zone;
+	struct rb_node *node;
+	int zone_num = 0, zone_busy = 0, num_rec;
+	sector_t next_sector = sector;
+
+	if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+		sd_printk(KERN_INFO, sdkp,
+			  "zones in reset, not starting update\n");
+		return;
+	}
+
+retry:
+	zbc_work = kzalloc(sizeof(struct zbc_update_work) + bufsize,
+			   GFP_KERNEL);
+	if (!zbc_work) {
+		if (bufsize > 512) {
+			sd_printk(KERN_INFO, sdkp,
+				  "retry with buffer size %d\n", bufsize);
+			bufsize = bufsize >> 1;
+			goto retry;
+		}
+		sd_printk(KERN_INFO, sdkp,
+			  "failed to allocate %d bytes\n", bufsize);
+		if (!update)
+			clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+		return;
+	}
+	zbc_work->zone_sector = sector;
+	zbc_work->zone_buflen = bufsize;
+	zbc_work->sdkp = sdkp;
+	INIT_WORK(&zbc_work->zone_work, sd_zbc_refresh_zone_work);
+	num_rec = (bufsize / 64) - 1;
+
+	/*
+	 * Mark zones under update as BUSY
+	 */
+	if (update) {
+		for (node = rb_first(&q->zones); node; node = rb_next(node)) {
+			unsigned long flags;
+
+			zone = rb_entry(node, struct blk_zone, node);
+			if (num_rec == 0)
+				break;
+			if (zone->start != next_sector)
+				continue;
+			next_sector += zone->len;
+			num_rec--;
+
+			spin_lock_irqsave(&zone->lock, flags);
+			if (blk_zone_is_smr(zone)) {
+				if (zone->state == BLK_ZONE_BUSY) {
+					zone_busy++;
+				} else {
+					zone->state = BLK_ZONE_BUSY;
+					zone->wp = zone->start;
+				}
+				zone_num++;
+			}
+			spin_unlock_irqrestore(&zone->lock, flags);
+		}
+		if (zone_num && (zone_num == zone_busy)) {
+			sd_printk(KERN_INFO, sdkp,
+			  "zone update for %zu in progress\n", sector);
+			kfree(zbc_work);
+			return;
+		}
+	}
+
+	if (!queue_work(sdkp->zone_work_q, &zbc_work->zone_work)) {
+		sd_printk(KERN_INFO, sdkp,
+			  "zone update already queued?\n");
+		kfree(zbc_work);
+	}
+}
+
+int sd_zbc_lookup_zone(struct scsi_disk *sdkp, struct request *rq,
+		       sector_t sector, unsigned int num_sectors)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	struct blk_zone *zone = NULL;
+	int ret = BLKPREP_OK;
+	unsigned long flags;
+
+	zone = blk_lookup_zone(q, sector);
+	/* Might happen during zone initialization */
+	if (!zone) {
+#ifdef SD_ZBC_DEBUG
+		if (printk_ratelimit())
+			sd_printk(KERN_INFO, sdkp,
+				  "zone for sector %zu not found, skipping\n",
+				  sector);
+#endif
+		return BLKPREP_OK;
+	}
+	spin_lock_irqsave(&zone->lock, flags);
+	if (zone->state == BLK_ZONE_UNKNOWN ||
+	    zone->state == BLK_ZONE_BUSY) {
+		if (printk_ratelimit())
+			sd_printk(KERN_INFO, sdkp,
+				  "zone %zu state %x, deferring\n",
+				  zone->start, zone->state);
+		ret = BLKPREP_DEFER;
+	} else {
+		if (rq_data_dir(rq) == WRITE) {
+			if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+				goto out;
+			if (blk_zone_is_full(zone)) {
+#ifdef SD_ZBC_DEBUG
+				sd_printk(KERN_ERR, sdkp,
+					  "Write to full zone %zu/%zu\n",
+					  sector, zone->wp);
+#endif
+				ret = BLKPREP_KILL;
+				goto out;
+			}
+			if (zone->wp != sector) {
+#ifdef SD_ZBC_DEBUG
+				sd_printk(KERN_ERR, sdkp,
+					  "Misaligned write %zu/%zu\n",
+					  sector, zone->wp);
+#endif
+				ret = BLKPREP_KILL;
+				goto out;
+			}
+			zone->wp += num_sectors;
+		} else if (blk_zone_is_smr(zone) && (zone->wp <= sector)) {
+#ifdef SD_ZBC_DEBUG
+			sd_printk(KERN_INFO, sdkp,
+				    "Read beyond wp %zu/%zu\n",
+				    sector, zone->wp);
+#endif
+			ret = BLKPREP_DONE;
+		}
+	}
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
+
+int sd_zbc_setup(struct scsi_disk *sdkp, char *buf, int buf_len)
+{
+	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+	sector_t last_sector;
+
+	if (test_and_set_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags)) {
+		sdev_printk(KERN_WARNING, sdkp->device,
+			    "zone initialisation already running\n");
+		return 0;
+	}
+
+	if (!sdkp->zone_work_q) {
+		char wq_name[32];
+
+		sprintf(wq_name, "zbc_wq_%s", sdkp->disk->disk_name);
+		sdkp->zone_work_q = create_singlethread_workqueue(wq_name);
+		if (!sdkp->zone_work_q) {
+			sdev_printk(KERN_WARNING, sdkp->device,
+				    "create zoned disk workqueue failed\n");
+			return -ENOMEM;
+		}
+	} else if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
+		drain_workqueue(sdkp->zone_work_q);
+		clear_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags);
+	}
+
+	last_sector = zbc_parse_zones(sdkp, buf, buf_len);
+	if (last_sector != -1 && last_sector < capacity) {
+		sd_zbc_update_zones(sdkp, last_sector, SD_ZBC_BUF_SIZE, false);
+	} else
+		clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+
+	return 0;
+}
+
+void sd_zbc_remove(struct scsi_disk *sdkp)
+{
+	if (sdkp->zone_work_q) {
+		if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags))
+			drain_workqueue(sdkp->zone_work_q);
+		clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
+		destroy_workqueue(sdkp->zone_work_q);
+	}
+}
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-04 10:00 ` [PATCH 6/9] block: Add 'zoned' sysfs queue attribute Hannes Reinecke
@ 2016-04-07  1:56   ` Damien Le Moal
  2016-04-07  5:57     ` Hannes Reinecke
  2016-04-15 17:45   ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2016-04-07  1:56 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, linux-scsi, Sathya Prakash


Hannes,

>Add a sysfs queue attribute 'zoned' to display the zone layout
>for zoned devices.
>
>Signed-off-by: Hannes Reinecke <hare@suse.de>
>---
> block/blk-sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
>diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>index ff97091..748bb27 100644
>--- a/block/blk-sysfs.c
>+++ b/block/blk-sysfs.c
>@@ -244,6 +244,43 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
> 	return queue_var_show(max_hw_sectors_kb, (page));
> }
> 
>+#ifdef CONFIG_BLK_DEV_ZONED
>+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>+{
>+	struct rb_node *node;
>+	struct blk_zone *zone;
>+	ssize_t offset = 0, end = 0;
>+	size_t size = 0, num = 0;
>+	enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
>+
>+	for (node = rb_first(&q->zones); node; node = rb_next(node)) {
>+		zone = rb_entry(node, struct blk_zone, node);
>+		if (zone->type != type ||
>+		    zone->len != size ||
>+		    end != zone->start) {
>+			if (size != 0)
>+				offset += sprintf(page + offset, "%zu\n", num);
>+			/* We can only store one page ... */
>+			if (offset + 42 > PAGE_SIZE) {
>+				offset += sprintf(page + offset, "...\n");
>+				return offset;
>+			}
>+			size = zone->len;
>+			type = zone->type;
>+			offset += sprintf(page + offset, "%zu %zu %d ",
>+					  zone->start, size, type);
>+			num = 0;
>+			end = zone->start + size;
>+		} else
>+			end += zone->len;
>+		num++;
>+	}
>+	if (num > 0)
>+		offset += sprintf(page + offset, "%zu\n", num);
>+	return offset > 0 ? offset : -EINVAL;
>+}
>+#endif

With this, an application reading the “zoned” file for a non-SMR disk
will get a -EINVAL error. Not really super nice. Could we just have the
zoned files contain a single “0” for non-SMR disks ? Or at least have the
file empty and read return 0 Bytes. That would allow applications to easily
and cleanly detect if they are dealing with a SMR disk (or not) instead of
assuming that “-EINVAL” means “not SMR”, which seems very ugly to me.

Best.


--
Damien Le Moal, Ph.D.
Sr Manager, System Software Group, WW Research,
HGST, a Western Digital company
Damien.LeMoal@hgst.com
Tel: (+81) 0466-98-3593 (Ext. 51-3593)
1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan
www.hgst.com <http://www.hgst.com>
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-07  1:56   ` Damien Le Moal
@ 2016-04-07  5:57     ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-07  5:57 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, linux-scsi, Sathya Prakash

On 04/07/2016 03:56 AM, Damien Le Moal wrote:
> 
> Hannes,
> 
>> Add a sysfs queue attribute 'zoned' to display the zone layout
>> for zoned devices.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>> block/blk-sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 47 insertions(+)
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index ff97091..748bb27 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -244,6 +244,43 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
>> 	return queue_var_show(max_hw_sectors_kb, (page));
>> }
>>
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>> +{
>> +	struct rb_node *node;
>> +	struct blk_zone *zone;
>> +	ssize_t offset = 0, end = 0;
>> +	size_t size = 0, num = 0;
>> +	enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
>> +
>> +	for (node = rb_first(&q->zones); node; node = rb_next(node)) {
>> +		zone = rb_entry(node, struct blk_zone, node);
>> +		if (zone->type != type ||
>> +		    zone->len != size ||
>> +		    end != zone->start) {
>> +			if (size != 0)
>> +				offset += sprintf(page + offset, "%zu\n", num);
>> +			/* We can only store one page ... */
>> +			if (offset + 42 > PAGE_SIZE) {
>> +				offset += sprintf(page + offset, "...\n");
>> +				return offset;
>> +			}
>> +			size = zone->len;
>> +			type = zone->type;
>> +			offset += sprintf(page + offset, "%zu %zu %d ",
>> +					  zone->start, size, type);
>> +			num = 0;
>> +			end = zone->start + size;
>> +		} else
>> +			end += zone->len;
>> +		num++;
>> +	}
>> +	if (num > 0)
>> +		offset += sprintf(page + offset, "%zu\n", num);
>> +	return offset > 0 ? offset : -EINVAL;
>> +}
>> +#endif
> 
> With this, an application reading the “zoned” file for a non-SMR disk
> will get a -EINVAL error. Not really super nice. Could we just have the
> zoned files contain a single “0” for non-SMR disks ? Or at least have the
> file empty and read return 0 Bytes. That would allow applications to easily
> and cleanly detect if they are dealing with a SMR disk (or not) instead of
> assuming that “-EINVAL” means “not SMR”, which seems very ugly to me.
> 
Sure.

Actually I was looking in blanking out this attribute entirely like
I did with the SCSI sysfs attributes, but as the 'queue' sysfs
directory is a blank kobj it's not easily done.
But yes, '0' seems like a reasonable value here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/9] block/scsi: Implement SMR drive support
  2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
                   ` (8 preceding siblings ...)
  2016-04-04 10:00 ` [PATCH 9/9] sd: Implement support for ZBC devices Hannes Reinecke
@ 2016-04-08 18:35 ` Shaun Tancheff
  2016-04-09  8:01   ` Hannes Reinecke
  9 siblings, 1 reply; 30+ messages in thread
From: Shaun Tancheff @ 2016-04-08 18:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Damien Le Moal, linux-scsi, Sathya Prakash

On Mon, Apr 4, 2016 at 5:00 PM, Hannes Reinecke <hare@suse.de> wrote:
> Hi all,
>
> here's a patchset implementing SMR (shingled magnetic recording)
> device support for the block and SCSI layer.
>
> There are two main parts to it:
> - mapping the 'RESET WRITE POINTER' command to the 'discard' functionality.
>   The 'RESET WRITE POINTER' operation is pretty close to the existing
>   'discard' functionality with the 'discard_zeroes_blocks' bit set.
>   So I've added a new 'reset_wp' provisioning mode for this.

Completely agree with the REQ_OP_DISCARD -> Reset WP translation
seems like a good idea. I have tried something similar and ended up
essentially adding a 'reset wp' flag instead.
Now I am optimistic to see if I can use you patch to get the
discard -> reset wp working in my device mapper.

> - Adding a 'zone' pointer to the request queue. This pointer holds an
>   RB-tree with the zone information, which can be used by other layers
>   to access the write pointer.

Here is where I have some concerns. Having a common in-kernel
shadow of the drive's zone state seems problematic to me.

Also if I am understanding the direction here it is to hold the zone
information in an rbtree. Since that comes to just under 30,000
entries I think it would be better to shift to an array of
write pointer offsets.

At the moment my translation layer keeps track of activity and state
of all the zones on the drive so that is how I have been handling
the zone data up to this point.

> This is the third part of a larger patchset for ZAC/ZBC support;
> it requires the scsi trace fixes queued for in mkp/4.7/scsi-queue and
> the patchsets 'libata: SATL update' and 'libata: ZAC support' I've
> posted earlier.
> The full patchset can be found at:
>
> git.kernel.org/hare/scsi-devel/h/zbc.v3
>
> As usual, comments and reviews are welcome.
>
> Hannes Reinecke (9):
>   blk-sysfs: Add 'chunk_sectors' to sysfs attributes
>   block: update chunk_sectors in blk_stack_limits()
>   sd: configure ZBC devices
>   sd: Implement new RESET_WP provisioning mode
>   block: Implement support for zoned block devices
>   block: Add 'zoned' sysfs queue attribute
>   block: Introduce BLKPREP_DONE
>   block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value
>   sd: Implement support for ZBC devices
>
>  block/Kconfig           |   9 ++
>  block/Makefile          |   1 +
>  block/blk-core.c        |  11 +-
>  block/blk-mq.c          |   1 +
>  block/blk-settings.c    |   3 +
>  block/blk-sysfs.c       |  74 +++++++++
>  block/blk-zoned.c       |  70 +++++++++
>  drivers/scsi/Kconfig    |   8 +
>  drivers/scsi/Makefile   |   1 +
>  drivers/scsi/scsi_lib.c |   4 +
>  drivers/scsi/sd.c       | 267 ++++++++++++++++++++++++++++---
>  drivers/scsi/sd.h       |  44 ++++++
>  drivers/scsi/sd_zbc.c   | 411 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blk-mq.h  |   1 +
>  include/linux/blkdev.h  |  48 ++++++
>  15 files changed, 933 insertions(+), 20 deletions(-)
>  create mode 100644 block/blk-zoned.c
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> --
> 1.8.5.6
>



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/9] block/scsi: Implement SMR drive support
  2016-04-08 18:35 ` [PATCH 0/9] block/scsi: Implement SMR drive support Shaun Tancheff
@ 2016-04-09  8:01   ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-09  8:01 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Damien Le Moal, linux-scsi, Sathya Prakash

On 04/08/2016 08:35 PM, Shaun Tancheff wrote:
> On Mon, Apr 4, 2016 at 5:00 PM, Hannes Reinecke <hare@suse.de> wrote:
>> Hi all,
>>
>> here's a patchset implementing SMR (shingled magnetic recording)
>> device support for the block and SCSI layer.
>>
>> There are two main parts to it:
>> - mapping the 'RESET WRITE POINTER' command to the 'discard' functionality.
>>    The 'RESET WRITE POINTER' operation is pretty close to the existing
>>    'discard' functionality with the 'discard_zeroes_blocks' bit set.
>>    So I've added a new 'reset_wp' provisioning mode for this.
>
> Completely agree with the REQ_OP_DISCARD -> Reset WP translation
> seems like a good idea. I have tried something similar and ended up
> essentially adding a 'reset wp' flag instead.
> Now I am optimistic to see if I can use you patch to get the
> discard -> reset wp working in my device mapper.
>
It works quite well here with my setup, although I've tripped across two 
caveats:

- We currently don't handle conventional zones.
   It would make sense to fallback to normal block zeroing here.
- Issuing 'RESET WP' is dead slow (at least on the prototypes I've had)
   Short-circuiting it for empty zones is a _major_ performance win here;
   the time for issuing discards for an entire drive is reduced by
   several orders of magnitude. So you absolutely need an in-kernel
   zone tree for this.

>> - Adding a 'zone' pointer to the request queue. This pointer holds an
>>    RB-tree with the zone information, which can be used by other layers
>>    to access the write pointer.
>
> Here is where I have some concerns. Having a common in-kernel
> shadow of the drive's zone state seems problematic to me.
>
Well, this is the general SMR programming model, is it not?
And as already pointed out above you really want this tree to be present 
to avoid unnecessary RESET WP calls.
You also need it to format READ calls correctly for host-managed drives; 
from my understanding of the programming model any READ call crossing 
the write pointer will be aborted.
Which you could easily circumvent by splitting the READ call in two 
parts, one up to the read pointer and another beyond it. For which again 
you need the zone tree.

> Also if I am understanding the direction here it is to hold the zone
> information in an rbtree. Since that comes to just under 30,000
> entries I think it would be better to shift to an array of
> write pointer offsets.
>
The thing is that using an rbtree might actually be faster than an 
array; the rbtree entries easily fit into the processor cache, whereas 
the array doesn't. So you might end up having a slower access when using 
arrays despite being easier to code.

> At the moment my translation layer keeps track of activity and state
> of all the zones on the drive so that is how I have been handling
> the zone data up to this point.
>
As outlined above: Any driver/filesystem need access to the zone states 
as it might need to align its internal structures to the zones.
But you also need to keep track of the zones in the SCSI layer so as to 
format the RESET WP correctly. Which means you basically need a common tree.

As you might've seen I've also programmed my own zoned device-mapper 
device, caching individual zones. We should discuss if those two 
approached can't be merged, to end up with a common device-mapper target.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-04-04 10:00 ` [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Hannes Reinecke
@ 2016-04-14 19:09   ` Bart Van Assche
  2016-04-15  6:01     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-14 19:09 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> The queue limits already have a 'chunk_sectors' setting, so
> we should be presenting it via sysfs.

This patch does more than exporting chunk_sectors via sysfs. It also 
makes that parameter configurable. Please mention this in the patch 
description.

My understanding of the block drivers that call 
blk_queue_chunk_sectors() is that increasing the chunk_sectors parameter 
will break these drivers. I think the single queue_limits.chunk_sectors 
parameter needs to be converted into two parameters:
- The value set by the block driver by calling
   blk_queue_chunk_sectors().
- The value configured from user space through sysfs.

This will allow to ensure that the chunk_sectors parameter can be 
increased from user space and also that it cannot be decreased.

> +static ssize_t
> +queue_chunk_sectors_store(struct request_queue *q, const char *page, size_t count)
> +{
> +	unsigned long chunk_sectors;
> +
> +	ssize_t ret = queue_var_store(&chunk_sectors, page, count);
> +	if (ret < 0)
> +		return ret;
> +	spin_lock_irq(q->queue_lock);
> +	blk_queue_chunk_sectors(q, chunk_sectors);
> +	spin_unlock_irq(q->queue_lock);
> +
> +	return ret;
> +}

In blk_queue_chunk_sectors() I found the following:

	BUG_ON(!is_power_of_2(chunk_sectors));

Please make sure that this BUG_ON() cannot be triggered from user space.

Additionally, an update for Documentation/block/queue-sysfs.txt is 
missing from this patch.

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/9] block: update chunk_sectors in blk_stack_limits()
  2016-04-04 10:00 ` [PATCH 2/9] block: update chunk_sectors in blk_stack_limits() Hannes Reinecke
@ 2016-04-15  3:41   ` Bart Van Assche
  2016-04-15  6:05     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15  3:41 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/04/16 03:00, Hannes Reinecke wrote:
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index c7bb666..29fa900 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -630,6 +630,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>   			t->discard_granularity;
>   	}
>
> +	if (b->chunk_sectors)
> +		t->chunk_sectors = max(t->chunk_sectors, b->chunk_sectors);
> +
>   	return ret;
>   }
>   EXPORT_SYMBOL(blk_stack_limits);

Hello Hannes,

My understanding is that a non-zero chunk_sectors value defines a 
maximum I/O size. Shouldn't max() be changed into min_not_zero()? From 
include/linux/blkdev.h:

static inline unsigned int blk_max_size_offset(struct request_queue *q,
					       sector_t offset)
{
	if (!q->limits.chunk_sectors)
		return q->limits.max_sectors;

	return q->limits.chunk_sectors -
			(offset & (q->limits.chunk_sectors - 1));
}

Bart.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  2016-04-14 19:09   ` Bart Van Assche
@ 2016-04-15  6:01     ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-15  6:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/14/2016 09:09 PM, Bart Van Assche wrote:
> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>> The queue limits already have a 'chunk_sectors' setting, so
>> we should be presenting it via sysfs.
> 
> This patch does more than exporting chunk_sectors via sysfs. It also
> makes that parameter configurable. Please mention this in the patch
> description.
> 
> My understanding of the block drivers that call
> blk_queue_chunk_sectors() is that increasing the chunk_sectors
> parameter will break these drivers. I think the single
> queue_limits.chunk_sectors parameter needs to be converted into two
> parameters:
> - The value set by the block driver by calling
>   blk_queue_chunk_sectors().
> - The value configured from user space through sysfs.
> 
> This will allow to ensure that the chunk_sectors parameter can be
> increased from user space and also that it cannot be decreased.
> 
>> +static ssize_t
>> +queue_chunk_sectors_store(struct request_queue *q, const char
>> *page, size_t count)
>> +{
>> +    unsigned long chunk_sectors;
>> +
>> +    ssize_t ret = queue_var_store(&chunk_sectors, page, count);
>> +    if (ret < 0)
>> +        return ret;
>> +    spin_lock_irq(q->queue_lock);
>> +    blk_queue_chunk_sectors(q, chunk_sectors);
>> +    spin_unlock_irq(q->queue_lock);
>> +
>> +    return ret;
>> +}
> 
> In blk_queue_chunk_sectors() I found the following:
> 
>     BUG_ON(!is_power_of_2(chunk_sectors));
> 
> Please make sure that this BUG_ON() cannot be triggered from user
> space.
> 
> Additionally, an update for Documentation/block/queue-sysfs.txt is
> missing from this patch.
> 

Ah, right.
Actually, making the sysfs attribute writeable is a leftover from
the original (debugging) patchset, so I'll be removing it from the
next iteration of this patchset.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/9] block: update chunk_sectors in blk_stack_limits()
  2016-04-15  3:41   ` Bart Van Assche
@ 2016-04-15  6:05     ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-15  6:05 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/15/2016 05:41 AM, Bart Van Assche wrote:
> On 04/04/16 03:00, Hannes Reinecke wrote:
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index c7bb666..29fa900 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -630,6 +630,9 @@ int blk_stack_limits(struct queue_limits *t,
>> struct queue_limits *b,
>>               t->discard_granularity;
>>       }
>>
>> +    if (b->chunk_sectors)
>> +        t->chunk_sectors = max(t->chunk_sectors, b->chunk_sectors);
>> +
>>       return ret;
>>   }
>>   EXPORT_SYMBOL(blk_stack_limits);
> 
> Hello Hannes,
> 
> My understanding is that a non-zero chunk_sectors value defines a
> maximum I/O size. Shouldn't max() be changed into min_not_zero()?
> From include/linux/blkdev.h:
> 
> static inline unsigned int blk_max_size_offset(struct request_queue *q,
>                            sector_t offset)
> {
>     if (!q->limits.chunk_sectors)
>         return q->limits.max_sectors;
> 
>     return q->limits.chunk_sectors -
>             (offset & (q->limits.chunk_sectors - 1));
> }
> 
Hmm.

_Actually_ it should be the least common denominator, me thinks...
Have to think it through a bit more.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] sd: configure ZBC devices
  2016-04-04 10:00 ` [PATCH 3/9] sd: configure ZBC devices Hannes Reinecke
@ 2016-04-15 15:47   ` Bart Van Assche
  2016-04-15 18:01     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 15:47 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> +static int
> +sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
> +		    unsigned char *buffer, int bufflen )
> +{
> [ ... ]
> +	put_unaligned_be64(start_lba, &cmd[2]);
> +	put_unaligned_be32(bufflen, &cmd[10]);

The argument "sector_t start_lba" is confusing me. Isn't a number either 
a sector number or an LBA? Shouldn't that number be shifted right by 
ilog2(sdp->sector_size) - 9 before storing it in the CDB?

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/9] block: Implement support for zoned block devices
  2016-04-04 10:00 ` [PATCH 5/9] block: Implement support for zoned block devices Hannes Reinecke
@ 2016-04-15 17:37   ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 17:37 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> +struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t lba)

A similar comment applies to this function: does this function expect a 
sector_t or an LBA as its second argument?

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7e5d7e0..f58bcdc 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -249,6 +249,50 @@ struct blk_queue_tag {
>   #define BLK_SCSI_MAX_CMDS	(256)
>   #define BLK_SCSI_CMD_PER_LONG	(BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
>
> +#ifdef CONFIG_BLK_DEV_ZONED
> +enum blk_zone_type {
> +	BLK_ZONE_TYPE_UNKNOWN,
> +	BLK_ZONE_TYPE_CONVENTIONAL,
> +	BLK_ZONE_TYPE_SEQWRITE_REQ,
> +	BLK_ZONE_TYPE_SEQWRITE_PREF,
> +	BLK_ZONE_TYPE_RESERVED,
> +};
> +
> +enum blk_zone_state {
> +	BLK_ZONE_UNKNOWN,
> +	BLK_ZONE_NO_WP,
> +	BLK_ZONE_OPEN,
> +	BLK_ZONE_READONLY,
> +	BLK_ZONE_OFFLINE,
> +	BLK_ZONE_BUSY,
> +};
> +
> +struct blk_zone {
> +	struct rb_node node;
> +	spinlock_t lock;
> +	sector_t start;
> +	size_t len;
> +	sector_t wp;
> +	enum blk_zone_type type;
> +	enum blk_zone_state state;
> +	void *private_data;
> +};
> +
> +#define blk_zone_is_smr(z) ((z)->type == BLK_ZONE_TYPE_SEQWRITE_REQ ||	\
> +			    (z)->type == BLK_ZONE_TYPE_SEQWRITE_PREF)
> +
> +#define blk_zone_is_cmr(z) ((z)->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +#define blk_zone_is_full(z) ((z)->wp == (z)->start + (z)->len)
> +#define blk_zone_is_empty(z) ((z)->wp == (z)->start)
> +
> +extern struct blk_zone *blk_lookup_zone(struct request_queue *, sector_t);
> +extern struct blk_zone *blk_insert_zone(struct request_queue *,
> +					struct blk_zone *);
> +extern void blk_drop_zones(struct request_queue *);
> +#else
> +static inline void blk_drop_zones(struct request_queue *q) { };
> +#endif

Have you considered to create a new header file for these definitions 
instead of adding these to <linux/blkdev.h>?

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-04 10:00 ` [PATCH 6/9] block: Add 'zoned' sysfs queue attribute Hannes Reinecke
  2016-04-07  1:56   ` Damien Le Moal
@ 2016-04-15 17:45   ` Bart Van Assche
  2016-04-15 18:03     ` Hannes Reinecke
  1 sibling, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 17:45 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> Add a sysfs queue attribute 'zoned' to display the zone layout
> for zoned devices.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   block/blk-sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 47 insertions(+)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ff97091..748bb27 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -244,6 +244,43 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
>   	return queue_var_show(max_hw_sectors_kb, (page));
>   }
>
> +#ifdef CONFIG_BLK_DEV_ZONED
> +static ssize_t queue_zoned_show(struct request_queue *q, char *page)
> +{
> +	struct rb_node *node;
> +	struct blk_zone *zone;
> +	ssize_t offset = 0, end = 0;
> +	size_t size = 0, num = 0;
> +	enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
> +
> +	for (node = rb_first(&q->zones); node; node = rb_next(node)) {
> +		zone = rb_entry(node, struct blk_zone, node);
> +		if (zone->type != type ||
> +		    zone->len != size ||
> +		    end != zone->start) {
> +			if (size != 0)
> +				offset += sprintf(page + offset, "%zu\n", num);
> +			/* We can only store one page ... */
> +			if (offset + 42 > PAGE_SIZE) {
> +				offset += sprintf(page + offset, "...\n");
> +				return offset;
> +			}
> +			size = zone->len;
> +			type = zone->type;
> +			offset += sprintf(page + offset, "%zu %zu %d ",
> +					  zone->start, size, type);
> +			num = 0;
> +			end = zone->start + size;
> +		} else
> +			end += zone->len;
> +		num++;
> +	}
> +	if (num > 0)
> +		offset += sprintf(page + offset, "%zu\n", num);
> +	return offset > 0 ? offset : -EINVAL;
> +}
> +#endif
> +
>   #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)				\
>   static ssize_t								\
>   queue_show_##name(struct request_queue *q, char *page)			\
> @@ -468,6 +505,13 @@ static struct queue_sysfs_entry queue_write_same_max_entry = {
>   	.show = queue_write_same_max_show,
>   };
>
> +#ifdef CONFIG_BLK_DEV_ZONED
> +static struct queue_sysfs_entry queue_zoned_entry = {
> +	.attr = {.name = "zoned", .mode = S_IRUGO },
> +	.show = queue_zoned_show,
> +};
> +#endif
> +

Hello Hannes,

Have you considered to move the above definitions into a new file? That 
would allow to avoid two #ifdefs and to move the code that decides 
whether or not the above code gets built into block/Makefile.

Additionally, have you considered to create one sysfs directory per zone 
instead of one sysfs attribute with all zone information? From 
Documentation/filesystems/sysfs.txt: "Attributes should be ASCII text 
files, preferably with only one value per file."

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/9] block: Introduce BLKPREP_DONE
  2016-04-04 10:00 ` [PATCH 7/9] block: Introduce BLKPREP_DONE Hannes Reinecke
@ 2016-04-15 17:49   ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 17:49 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> @@ -2460,9 +2460,13 @@ struct request *blk_peek_request(struct request_queue *q)
>
>   			rq = NULL;
>   			break;
> -		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
> +		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID ||
> +			   ret == BLKPREP_DONE) {
>   			int err = (ret == BLKPREP_INVALID) ? -EREMOTEIO : -EIO;
>
> +			if (ret == BLKPREP_DONE)
> +				err = 0;
> +
>   			rq->cmd_flags |= REQ_QUIET;
>   			/*
>   			 * Mark this request as started so we don't trigger

Hello Hannes,

How about using a switch/case statement to translate BLKPREP_* into an 
error code?

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value
  2016-04-04 10:00 ` [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value Hannes Reinecke
@ 2016-04-15 17:56   ` Bart Van Assche
  2016-04-15 18:05     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 17:56 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> Add a return value BLK_MQ_RQ_QUEUE_DONE to terminate a request
> without error.

Hello Hannes,

Are you sure it's OK not to update blk_mq_direct_issue_request()?

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] sd: configure ZBC devices
  2016-04-15 15:47   ` Bart Van Assche
@ 2016-04-15 18:01     ` Hannes Reinecke
  2016-04-16 11:24       ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-15 18:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/15/2016 05:47 PM, Bart Van Assche wrote:
> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>> +static int
>> +sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
>> +            unsigned char *buffer, int bufflen )
>> +{
>> [ ... ]
>> +    put_unaligned_be64(start_lba, &cmd[2]);
>> +    put_unaligned_be32(bufflen, &cmd[10]);
>
> The argument "sector_t start_lba" is confusing me. Isn't a number either
> a sector number or an LBA? Shouldn't that number be shifted right by
> ilog2(sdp->sector_size) - 9 before storing it in the CDB?
>
Well. I've considered the 'lba' the native disk lba, and sector the 
(logical) sector number from the block layer.
I was under the impression that this was the correct nomenclature here; 
if not I'm happy to change it.

But yes, I should be adding a documentation to that function.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-15 17:45   ` Bart Van Assche
@ 2016-04-15 18:03     ` Hannes Reinecke
  2016-04-15 18:42       ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-15 18:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/15/2016 07:45 PM, Bart Van Assche wrote:
> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>> Add a sysfs queue attribute 'zoned' to display the zone layout
>> for zoned devices.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   block/blk-sysfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 47 insertions(+)
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index ff97091..748bb27 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -244,6 +244,43 @@ static ssize_t queue_max_hw_sectors_show(struct
>> request_queue *q, char *page)
>>       return queue_var_show(max_hw_sectors_kb, (page));
>>   }
>>
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>> +{
>> +    struct rb_node *node;
>> +    struct blk_zone *zone;
>> +    ssize_t offset = 0, end = 0;
>> +    size_t size = 0, num = 0;
>> +    enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
>> +
>> +    for (node = rb_first(&q->zones); node; node = rb_next(node)) {
>> +        zone = rb_entry(node, struct blk_zone, node);
>> +        if (zone->type != type ||
>> +            zone->len != size ||
>> +            end != zone->start) {
>> +            if (size != 0)
>> +                offset += sprintf(page + offset, "%zu\n", num);
>> +            /* We can only store one page ... */
>> +            if (offset + 42 > PAGE_SIZE) {
>> +                offset += sprintf(page + offset, "...\n");
>> +                return offset;
>> +            }
>> +            size = zone->len;
>> +            type = zone->type;
>> +            offset += sprintf(page + offset, "%zu %zu %d ",
>> +                      zone->start, size, type);
>> +            num = 0;
>> +            end = zone->start + size;
>> +        } else
>> +            end += zone->len;
>> +        num++;
>> +    }
>> +    if (num > 0)
>> +        offset += sprintf(page + offset, "%zu\n", num);
>> +    return offset > 0 ? offset : -EINVAL;
>> +}
>> +#endif
>> +
>>   #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)                \
>>   static ssize_t                                \
>>   queue_show_##name(struct request_queue *q, char *page)            \
>> @@ -468,6 +505,13 @@ static struct queue_sysfs_entry
>> queue_write_same_max_entry = {
>>       .show = queue_write_same_max_show,
>>   };
>>
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +static struct queue_sysfs_entry queue_zoned_entry = {
>> +    .attr = {.name = "zoned", .mode = S_IRUGO },
>> +    .show = queue_zoned_show,
>> +};
>> +#endif
>> +
>
> Hello Hannes,
>
> Have you considered to move the above definitions into a new file? That
> would allow to avoid two #ifdefs and to move the code that decides
> whether or not the above code gets built into block/Makefile.
>
Well, it's just these few lines, so I thought it easier to put it into 
the existing file.
But sure I can add a separate files for that.

> Additionally, have you considered to create one sysfs directory per zone
> instead of one sysfs attribute with all zone information? From
> Documentation/filesystems/sysfs.txt: "Attributes should be ASCII text
> files, preferably with only one value per file."
>
Yes, I have considered it.
But doing so would require me to add about 20k sysfs attributes.
For no apparent gain.
So I've settled on this condensed approach here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value
  2016-04-15 17:56   ` Bart Van Assche
@ 2016-04-15 18:05     ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-15 18:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/15/2016 07:56 PM, Bart Van Assche wrote:
> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>> Add a return value BLK_MQ_RQ_QUEUE_DONE to terminate a request
>> without error.
>
> Hello Hannes,
>
> Are you sure it's OK not to update blk_mq_direct_issue_request()?
>
Ah. Missed that one.
Will be updating the patch.

Thanks for the review.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] sd: Implement support for ZBC devices
  2016-04-04 10:00 ` [PATCH 9/9] sd: Implement support for ZBC devices Hannes Reinecke
@ 2016-04-15 18:31   ` Bart Van Assche
  2016-04-16 11:34     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 18:31 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
> @@ -728,6 +729,10 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>   	int ret = 0;
>   	char *buf;
>   	struct page *page = NULL;
> +#ifdef CONFIG_SCSI_ZBC
> +	struct blk_zone *zone;
> +	unsigned long flags;
> +#endif

There is a strong preference in the Linux kernel for avoiding #ifdefs 
and to move code that depends on the value of a CONFIG_* variable into 
a file for which the compilation depends on that CONFIG_* variable. 
Please consider to move the ZBC code from sd_setup_discard_cmnd() into a 
new function in sd_zbc.c.

> +#ifdef CONFIG_SCSI_ZBC
> +		zone = blk_lookup_zone(rq->q, sector);
> +		if (!zone) {
> +			ret = BLKPREP_KILL;
> +			goto out;
> +		}
> +		spin_lock_irqsave(&zone->lock, flags);
> +		if (zone->state == BLK_ZONE_BUSY) {
> +			sd_printk(KERN_INFO, sdkp,
> +				  "Discarding busy zone %zu/%zu\n",
> +				  zone->start, zone->len);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			ret = BLKPREP_DEFER;
> +			goto out;
> +		}
> +		if (!blk_zone_is_smr(zone)) {
> +			sd_printk(KERN_INFO, sdkp,
> +				  "Discarding %s zone %zu/%zu\n",
> +				  blk_zone_is_cmr(zone) ? "CMR" : "unknown",
> +				  zone->start, zone->len);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			ret = BLKPREP_DONE;
> +			goto out;
> +		}
> +		if (blk_zone_is_empty(zone)) {
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			ret = BLKPREP_DONE;
> +			goto out;
> +		}
> +		if (zone->start != sector ||
> +		    zone->len < nr_sectors) {
> +			sd_printk(KERN_INFO, sdkp,
> +				  "Misaligned RESET WP, start %zu/%zu "
> +				  "len %zu/%u\n",
> +				  zone->start, sector, zone->len, nr_sectors);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			ret = BLKPREP_KILL;
> +			goto out;
> +		}
> +		/*
> +		 * Opportunistic setting, needs to be fixed up
> +		 * if RESET WRITE POINTER fails.
> +		 */
> +		zone->wp = zone->start;
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +#endif
 >   		cmd->cmd_len = 16;
 >   		cmd->cmnd[0] = ZBC_OUT;
 >   		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;

Which mechanism prevents that zone->state is modified after it has been 
checked and before the RESET WRITE POINTER command has finished?

> @@ -990,6 +1041,13 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>   	SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
>   					(unsigned long long)block));
>
> +	if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
> +		/* sd_zbc_lookup_zone lba is in block layer sector units */
> +		ret = sd_zbc_lookup_zone(sdkp, rq, block, this_count);
> +		if (ret != BLKPREP_OK)
> +			goto out;
> +	}
> +

Which mechanism guarantees that the above code won't run concurrently 
with zbc_parse_zones()?

> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 5debd49..35c75fa 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -65,6 +65,12 @@ struct scsi_disk {
>   	struct scsi_device *device;
>   	struct device	dev;
>   	struct gendisk	*disk;
> +#ifdef CONFIG_SCSI_ZBC
> +	struct workqueue_struct *zone_work_q;
> +	unsigned long	zone_flags;
> +#define SD_ZBC_ZONE_RESET 1
> +#define SD_ZBC_ZONE_INIT  2
> +#endif

The above two constants are only used in source file sd_zbc.c. Have you 
considered to move the definition of these constants into sd_zbc.c?

> +#undef SD_ZBC_DEBUG

Please use the dynamic_debug facility instead of #ifdef SD_ZBC_DEBUG + 
sd_printk().

> +void sd_zbc_refresh_zone_work(struct work_struct *work)
> +{
> +	struct zbc_update_work *zbc_work =
> +		container_of(work, struct zbc_update_work, zone_work);
> +	struct scsi_disk *sdkp = zbc_work->sdkp;
> +	struct request_queue *q = sdkp->disk->queue;
> +	unsigned int zone_buflen;
> +	int ret;
> +	sector_t last_sector;
> +	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +	sector_t zone_lba = sectors_to_logical(sdkp->device,
> +					       zbc_work->zone_sector);
> +
> +	zone_buflen = zbc_work->zone_buflen;
> +	ret = sd_zbc_report_zones(sdkp, zone_lba, zbc_work->zone_buf,
> +				  zone_buflen);
> +	if (ret)
> +		goto done_free;
> +
> +	last_sector = zbc_parse_zones(sdkp, zbc_work->zone_buf, zone_buflen);
> +	if (last_sector != -1 && last_sector < capacity) {
> +		if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +#ifdef SD_ZBC_DEBUG
> +			sd_printk(KERN_INFO, sdkp,
> +				  "zones in reset, cancelling refresh\n");
> +#endif
> +			ret = -EAGAIN;
> +			goto done_free;
> +		}
> +
> +		zbc_work->zone_sector = last_sector;
> +		queue_work(sdkp->zone_work_q, &zbc_work->zone_work);
> +		/* Kick request queue to be on the safe side */
> +		goto done_start_queue;
> +	}
> +done_free:
> +	kfree(zbc_work);
> +	if (test_and_clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags) && ret) {
> +		sd_printk(KERN_INFO, sdkp,
> +			  "Cancelling zone initialisation\n");
> +	}
> +done_start_queue:
> +	if (q->mq_ops)
> +		blk_mq_start_hw_queues(q);
> +	else {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		blk_start_queue(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
> +}

Which mechanism prevents concurrent execution of 
sd_zbc_refresh_zone_work() and READ and WRITE commands?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/9] block: Add 'zoned' sysfs queue attribute
  2016-04-15 18:03     ` Hannes Reinecke
@ 2016-04-15 18:42       ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2016-04-15 18:42 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/15/2016 11:03 AM, Hannes Reinecke wrote:
> On 04/15/2016 07:45 PM, Bart Van Assche wrote:
>> Additionally, have you considered to create one sysfs directory per zone
>> instead of one sysfs attribute with all zone information? From
>> Documentation/filesystems/sysfs.txt: "Attributes should be ASCII text
>> files, preferably with only one value per file."
>
> Yes, I have considered it.
> But doing so would require me to add about 20k sysfs attributes.
> For no apparent gain.
> So I've settled on this condensed approach here.

Hello Hannes,

My understanding is that the one-value-per-file rule was introduced for 
sysfs to make it easy for software, e.g. shell scripts, to process that 
information. If multiple values occur in the same sysfs file sometimes 
complicated code is needed in shell scripts to parse these values. If 
one value is present in each sysfs file it becomes much easier for 
software to process that sysfs information.

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/9] sd: configure ZBC devices
  2016-04-15 18:01     ` Hannes Reinecke
@ 2016-04-16 11:24       ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-16 11:24 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash,
	Hannes Reinecke

On 04/15/2016 08:01 PM, Hannes Reinecke wrote:
> On 04/15/2016 05:47 PM, Bart Van Assche wrote:
>> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>>> +static int
>>> +sd_zbc_report_zones(struct scsi_disk *sdkp, sector_t start_lba,
>>> +            unsigned char *buffer, int bufflen )
>>> +{
>>> [ ... ]
>>> +    put_unaligned_be64(start_lba, &cmd[2]);
>>> +    put_unaligned_be32(bufflen, &cmd[10]);
>>
>> The argument "sector_t start_lba" is confusing me. Isn't a number either
>> a sector number or an LBA? Shouldn't that number be shifted right by
>> ilog2(sdp->sector_size) - 9 before storing it in the CDB?
>>
> Well. I've considered the 'lba' the native disk lba, and sector the
> (logical) sector number from the block layer.
> I was under the impression that this was the correct nomenclature here;
> if not I'm happy to change it.
>
> But yes, I should be adding a documentation to that function.
>
I've changed sd_zbc_report_zones() to use the sector number,
not the LBA. And added some documentation clarifying that.

Thanks for the review.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] sd: Implement support for ZBC devices
  2016-04-15 18:31   ` Bart Van Assche
@ 2016-04-16 11:34     ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2016-04-16 11:34 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Martin K. Petersen, Christoph Hellwig,
	Shaun Tancheff, Damien Le Moal, linux-scsi, Sathya Prakash

On 04/15/2016 08:31 PM, Bart Van Assche wrote:
> On 04/04/2016 03:00 AM, Hannes Reinecke wrote:
>> @@ -728,6 +729,10 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd
>> *cmd)
>>       int ret = 0;
>>       char *buf;
>>       struct page *page = NULL;
>> +#ifdef CONFIG_SCSI_ZBC
>> +    struct blk_zone *zone;
>> +    unsigned long flags;
>> +#endif
>
> There is a strong preference in the Linux kernel for avoiding #ifdefs
> and to move code that depends on the value of a CONFIG_* variable into a
> file for which the compilation depends on that CONFIG_* variable. Please
> consider to move the ZBC code from sd_setup_discard_cmnd() into a new
> function in sd_zbc.c.
>
Well; the integrity code also added some #ifdefs, so I thought it would 
be acceptable, too.

I can reconsider it, of course, if preferred.

>> +#ifdef CONFIG_SCSI_ZBC
>> +        zone = blk_lookup_zone(rq->q, sector);
>> +        if (!zone) {
>> +            ret = BLKPREP_KILL;
>> +            goto out;
>> +        }
>> +        spin_lock_irqsave(&zone->lock, flags);
>> +        if (zone->state == BLK_ZONE_BUSY) {
>> +            sd_printk(KERN_INFO, sdkp,
>> +                  "Discarding busy zone %zu/%zu\n",
>> +                  zone->start, zone->len);
>> +            spin_unlock_irqrestore(&zone->lock, flags);
>> +            ret = BLKPREP_DEFER;
>> +            goto out;
>> +        }
>> +        if (!blk_zone_is_smr(zone)) {
>> +            sd_printk(KERN_INFO, sdkp,
>> +                  "Discarding %s zone %zu/%zu\n",
>> +                  blk_zone_is_cmr(zone) ? "CMR" : "unknown",
>> +                  zone->start, zone->len);
>> +            spin_unlock_irqrestore(&zone->lock, flags);
>> +            ret = BLKPREP_DONE;
>> +            goto out;
>> +        }
>> +        if (blk_zone_is_empty(zone)) {
>> +            spin_unlock_irqrestore(&zone->lock, flags);
>> +            ret = BLKPREP_DONE;
>> +            goto out;
>> +        }
>> +        if (zone->start != sector ||
>> +            zone->len < nr_sectors) {
>> +            sd_printk(KERN_INFO, sdkp,
>> +                  "Misaligned RESET WP, start %zu/%zu "
>> +                  "len %zu/%u\n",
>> +                  zone->start, sector, zone->len, nr_sectors);
>> +            spin_unlock_irqrestore(&zone->lock, flags);
>> +            ret = BLKPREP_KILL;
>> +            goto out;
>> +        }
>> +        /*
>> +         * Opportunistic setting, needs to be fixed up
>> +         * if RESET WRITE POINTER fails.
>> +         */
>> +        zone->wp = zone->start;
>> +        spin_unlock_irqrestore(&zone->lock, flags);
>> +#endif
>  >           cmd->cmd_len = 16;
>  >           cmd->cmnd[0] = ZBC_OUT;
>  >           cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
>
> Which mechanism prevents that zone->state is modified after it has been
> checked and before the RESET WRITE POINTER command has finished?
>
See below.

>> @@ -990,6 +1041,13 @@ static int sd_setup_read_write_cmnd(struct
>> scsi_cmnd *SCpnt)
>>       SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "block=%llu\n",
>>                       (unsigned long long)block));
>>
>> +    if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
>> +        /* sd_zbc_lookup_zone lba is in block layer sector units */
>> +        ret = sd_zbc_lookup_zone(sdkp, rq, block, this_count);
>> +        if (ret != BLKPREP_OK)
>> +            goto out;
>> +    }
>> +
>
> Which mechanism guarantees that the above code won't run concurrently
> with zbc_parse_zones()?
>
See below. There is no overall lock (the zone layout is considered 
immutable once set), but each zone has its own spinlock.
If the zone state is set to BUSY (see below) sd_zbc_lookup_zone will 
return BLKPREP_DEFER, and the request won't be scheduled.

>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 5debd49..35c75fa 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -65,6 +65,12 @@ struct scsi_disk {
>>       struct scsi_device *device;
>>       struct device    dev;
>>       struct gendisk    *disk;
>> +#ifdef CONFIG_SCSI_ZBC
>> +    struct workqueue_struct *zone_work_q;
>> +    unsigned long    zone_flags;
>> +#define SD_ZBC_ZONE_RESET 1
>> +#define SD_ZBC_ZONE_INIT  2
>> +#endif
>
> The above two constants are only used in source file sd_zbc.c. Have you
> considered to move the definition of these constants into sd_zbc.c?
>
>> +#undef SD_ZBC_DEBUG
>
> Please use the dynamic_debug facility instead of #ifdef SD_ZBC_DEBUG +
> sd_printk().
>
Okay, will be doing so.

>> +void sd_zbc_refresh_zone_work(struct work_struct *work)
>> +{
>> +    struct zbc_update_work *zbc_work =
>> +        container_of(work, struct zbc_update_work, zone_work);
>> +    struct scsi_disk *sdkp = zbc_work->sdkp;
>> +    struct request_queue *q = sdkp->disk->queue;
>> +    unsigned int zone_buflen;
>> +    int ret;
>> +    sector_t last_sector;
>> +    sector_t capacity = logical_to_sectors(sdkp->device,
>> sdkp->capacity);
>> +    sector_t zone_lba = sectors_to_logical(sdkp->device,
>> +                           zbc_work->zone_sector);
>> +
>> +    zone_buflen = zbc_work->zone_buflen;
>> +    ret = sd_zbc_report_zones(sdkp, zone_lba, zbc_work->zone_buf,
>> +                  zone_buflen);
>> +    if (ret)
>> +        goto done_free;
>> +
>> +    last_sector = zbc_parse_zones(sdkp, zbc_work->zone_buf,
>> zone_buflen);
>> +    if (last_sector != -1 && last_sector < capacity) {
>> +        if (test_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
>> +#ifdef SD_ZBC_DEBUG
>> +            sd_printk(KERN_INFO, sdkp,
>> +                  "zones in reset, cancelling refresh\n");
>> +#endif
>> +            ret = -EAGAIN;
>> +            goto done_free;
>> +        }
>> +
>> +        zbc_work->zone_sector = last_sector;
>> +        queue_work(sdkp->zone_work_q, &zbc_work->zone_work);
>> +        /* Kick request queue to be on the safe side */
>> +        goto done_start_queue;
>> +    }
>> +done_free:
>> +    kfree(zbc_work);
>> +    if (test_and_clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags) &&
>> ret) {
>> +        sd_printk(KERN_INFO, sdkp,
>> +              "Cancelling zone initialisation\n");
>> +    }
>> +done_start_queue:
>> +    if (q->mq_ops)
>> +        blk_mq_start_hw_queues(q);
>> +    else {
>> +        unsigned long flags;
>> +
>> +        spin_lock_irqsave(q->queue_lock, flags);
>> +        blk_start_queue(q);
>> +        spin_unlock_irqrestore(q->queue_lock, flags);
>> +    }
>> +}
>
> Which mechanism prevents concurrent execution of
> sd_zbc_refresh_zone_work() and READ and WRITE commands?
>
When sd_zbc_refresh_zone_work is started it'll set all zones to be 
updated to 'BUSY', and the prep_rq() function will defer any I/O
until REPORT_ZONES has returned and updated the state to something 
other, like BLK_ZONE_OPEN.

Thanks for the review.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-04-16 11:34 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04 10:00 [PATCH 0/9] block/scsi: Implement SMR drive support Hannes Reinecke
2016-04-04 10:00 ` [PATCH 1/9] blk-sysfs: Add 'chunk_sectors' to sysfs attributes Hannes Reinecke
2016-04-14 19:09   ` Bart Van Assche
2016-04-15  6:01     ` Hannes Reinecke
2016-04-04 10:00 ` [PATCH 2/9] block: update chunk_sectors in blk_stack_limits() Hannes Reinecke
2016-04-15  3:41   ` Bart Van Assche
2016-04-15  6:05     ` Hannes Reinecke
2016-04-04 10:00 ` [PATCH 3/9] sd: configure ZBC devices Hannes Reinecke
2016-04-15 15:47   ` Bart Van Assche
2016-04-15 18:01     ` Hannes Reinecke
2016-04-16 11:24       ` Hannes Reinecke
2016-04-04 10:00 ` [PATCH 4/9] sd: Implement new RESET_WP provisioning mode Hannes Reinecke
2016-04-04 10:00 ` [PATCH 5/9] block: Implement support for zoned block devices Hannes Reinecke
2016-04-15 17:37   ` Bart Van Assche
2016-04-04 10:00 ` [PATCH 6/9] block: Add 'zoned' sysfs queue attribute Hannes Reinecke
2016-04-07  1:56   ` Damien Le Moal
2016-04-07  5:57     ` Hannes Reinecke
2016-04-15 17:45   ` Bart Van Assche
2016-04-15 18:03     ` Hannes Reinecke
2016-04-15 18:42       ` Bart Van Assche
2016-04-04 10:00 ` [PATCH 7/9] block: Introduce BLKPREP_DONE Hannes Reinecke
2016-04-15 17:49   ` Bart Van Assche
2016-04-04 10:00 ` [PATCH 8/9] block: Add 'BLK_MQ_RQ_QUEUE_DONE' return value Hannes Reinecke
2016-04-15 17:56   ` Bart Van Assche
2016-04-15 18:05     ` Hannes Reinecke
2016-04-04 10:00 ` [PATCH 9/9] sd: Implement support for ZBC devices Hannes Reinecke
2016-04-15 18:31   ` Bart Van Assche
2016-04-16 11:34     ` Hannes Reinecke
2016-04-08 18:35 ` [PATCH 0/9] block/scsi: Implement SMR drive support Shaun Tancheff
2016-04-09  8:01   ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.