* [PATCH V6 00/14] scsi-mq support for ZBC disks
@ 2017-10-02  7:15 Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 01/14] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

This series implements support for ZBC disks used through the scsi-mq I/O path.

The current scsi-level support of ZBC disks guarantees write request
ordering using a per-zone write lock which prevents multiple write commands
from being issued simultaneously to the same zone, thereby avoiding the
reordering of sequential writes to sequential zones. This method is however
ineffective when scsi-mq is used with zoned block devices, because of the
different execution model of blk-mq, which passes a request to the scsi
layer for dispatching only after the request has been removed from the I/O
scheduler queue. That is, by the time the scsi layer tries to lock the
target zone of the request, the request may already be out of order and
zone write locking fails to prevent that.

Various approaches have been tried to solve this problem. All of them had
the serious disadvantage of cluttering blk-mq code with zoned block device
specific conditions and processing. As such extensive changes can only turn
into a maintenance nightmare, a radically different solution is proposed
here.

This series proposes implementing scsi-mq support for zoned block devices
at the I/O scheduler level, with simple modifications of the mq-deadline
scheduler: the addition of a per-zone write locking mechanism similar to
that implemented in sd_zbc.c for the legacy scsi path. The zone write
locking mechanism is used for the exact same purpose, that is, to limit
writes per zone to at most one request at a time to avoid reordering. The
locking context however changes from that of the legacy scsi path and is
moved to the dispatch_request method of the scheduler. Within this context,
under a spin lock guaranteeing atomicity against other dispatch contexts,
the target zones of write requests can be locked before the requests are
removed from the scheduler. In effect, this results in the same behavior
as the legacy scsi path: sequential write ordering is preserved.
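
In code, the dispatch context described above boils down to the following
(a simplified sketch only; see patches 12 and 13 for the actual
implementation):

  static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
  {
          struct deadline_data *dd = hctx->queue->elevator->elevator_data;
          struct request *rq;

          spin_lock(&dd->lock);
          /*
           * Request selection skips write requests targeting locked zones
           * and write locks the target zone of the chosen write request
           * before the request is removed from the scheduler.
           */
          rq = __dd_dispatch_request(hctx);
          spin_unlock(&dd->lock);

          return rq;
  }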

The changes to mq-deadline do not affect regular disks: the same scheduling
behavior is maintained for these. The modifications are also optimized to
not lock conventional zones. To do so, additional data is introduced in the
request queue structure so that the low level scsi code can pass upward
information such as the total number of zones and the zone types of the
device. The availability of this new data avoids difficulties in accessing
this information from the I/O scheduler initialization context (the
init_queue() method).

Of note is that the last patch of this series removes setting mq-deadline as the
default scheduler for block devices with a single hardware queue. The reason for
this is that setting the default scheduler is done very early in the device
initialization sequence, when the disk characteristics are not yet known. This
results in mq-deadline not correctly setting the default zone write locking
behavior based on the device zoning model. Setting a default I/O scheduler
can be done easily with udev rules later in the system initialization process,
leading to correct default settings for zoned block devices.
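
As an example, a udev rule along the following lines (illustrative only,
not part of this series) selects mq-deadline once the device zoning model
is known, using the queue/zoned sysfs attribute:

  ACTION=="add|change", KERNEL=="sd[a-z]", SUBSYSTEM=="block", \
      ATTR{queue/zoned}=="host-managed", ATTR{queue/scheduler}="mq-deadline"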

Comments are as always very much appreciated.

Changes from v5:
* Refactor patches to introduce the zone_lock spinlock only when needed
* Addressed Bart's comments (in particular explanations of the zone_lock
  spinlock use)

Changes from v4:
* Various fixes and improvements (From Christoph's comments)
* Dropped zones_wlock scheduler tunable attribute

Changes from v3:
* Integrated support directly into mq-deadline instead of creating a new I/O
  scheduler.
* Disable setting of default mq scheduler for single queue devices

Changes from v2:
* Introduced blk_zoned structure
* Moved I/O scheduler from drivers/scsi to block 

Changes from v1:
* Addressed Bart's comments for the blk-mq patches (declarations files)
* Split (former) patch 4 into multiple patches to facilitate review
* Fixed scsi disk lookup from io scheduler by introducing
  scsi_disk_from_queue()

Damien Le Moal (14):
  scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
  scsi: sd_zbc: Fix comments and indentation
  scsi: sd_zbc: Rearrange code
  scsi: sd_zbc: Use well defined macros
  scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()
  block: Add zoned block device information to request queue
  scsi: sd_zbc: Initialize device request queue zoned data
  scsi: sd_zbc: Limit zone write locking to sequential zones
  scsi: sd_zbc: Disable zone write locking with scsi-mq
  block: mq-deadline: Add zoned block device data
  block: mq-deadline: Introduce dispatch helpers
  block: mq-deadline: Introduce zone locking support
  block: mq-deadline: Limit write request dispatch for zoned block
    devices
  block: do not set mq default scheduler

 block/elevator.c          |  17 +--
 block/mq-deadline.c       | 271 +++++++++++++++++++++++++++++++++++--
 drivers/scsi/scsi_lib.c   |   5 +-
 drivers/scsi/sd_zbc.c     | 334 +++++++++++++++++++++++++++++++++++-----------
 include/linux/blkdev.h    |  53 ++++++++
 include/scsi/scsi_proto.h |  45 +++++--
 6 files changed, 612 insertions(+), 113 deletions(-)

-- 
2.13.6


* [PATCH V6 01/14] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 02/14] scsi: sd_zbc: Fix comments and indentation Damien Le Moal
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Move the standard macro definitions for the zone types and zone
conditions to scsi_proto.h together with the definitions related to the
REPORT ZONES command. While at it, explicitly define all values in the
enums for clarity.

Also remove unnecessary includes in sd_zbc.c.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c     | 24 ------------------------
 include/scsi/scsi_proto.h | 45 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 8aa54779aac1..692c8cbc7ed8 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -28,32 +28,8 @@
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
-#include <scsi/scsi_dbg.h>
-#include <scsi/scsi_device.h>
-#include <scsi/scsi_driver.h>
-#include <scsi/scsi_host.h>
-#include <scsi/scsi_eh.h>
 
 #include "sd.h"
-#include "scsi_priv.h"
-
-enum zbc_zone_type {
-	ZBC_ZONE_TYPE_CONV = 0x1,
-	ZBC_ZONE_TYPE_SEQWRITE_REQ,
-	ZBC_ZONE_TYPE_SEQWRITE_PREF,
-	ZBC_ZONE_TYPE_RESERVED,
-};
-
-enum zbc_zone_cond {
-	ZBC_ZONE_COND_NO_WP,
-	ZBC_ZONE_COND_EMPTY,
-	ZBC_ZONE_COND_IMP_OPEN,
-	ZBC_ZONE_COND_EXP_OPEN,
-	ZBC_ZONE_COND_CLOSED,
-	ZBC_ZONE_COND_READONLY = 0xd,
-	ZBC_ZONE_COND_FULL,
-	ZBC_ZONE_COND_OFFLINE,
-};
 
 /**
  * Convert a zone descriptor to a zone struct.
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 8c285d9a06d8..39130a9c05bf 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -301,19 +301,42 @@ struct scsi_lun {
 
 /* Reporting options for REPORT ZONES */
 enum zbc_zone_reporting_options {
-	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
-	ZBC_ZONE_REPORTING_OPTION_EMPTY,
-	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
-	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
-	ZBC_ZONE_REPORTING_OPTION_CLOSED,
-	ZBC_ZONE_REPORTING_OPTION_FULL,
-	ZBC_ZONE_REPORTING_OPTION_READONLY,
-	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
-	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
-	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
-	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+	ZBC_ZONE_REPORTING_OPTION_ALL		= 0x00,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY		= 0x01,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN	= 0x02,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN	= 0x03,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED	= 0x04,
+	ZBC_ZONE_REPORTING_OPTION_FULL		= 0x05,
+	ZBC_ZONE_REPORTING_OPTION_READONLY	= 0x06,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE	= 0x07,
+	/* 0x08 to 0x0f are reserved */
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP	= 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE	= 0x11,
+	/* 0x12 to 0x3e are reserved */
+	ZBC_ZONE_REPORTING_OPTION_NON_WP	= 0x3f,
 };
 
 #define ZBC_REPORT_ZONE_PARTIAL 0x80
 
+/* Zone types of REPORT ZONES zone descriptors */
+enum zbc_zone_type {
+	ZBC_ZONE_TYPE_CONV		= 0x1,
+	ZBC_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	ZBC_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+	/* 0x4 to 0xf are reserved */
+};
+
+/* Zone conditions of REPORT ZONES zone descriptors */
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP		= 0x0,
+	ZBC_ZONE_COND_EMPTY		= 0x1,
+	ZBC_ZONE_COND_IMP_OPEN		= 0x2,
+	ZBC_ZONE_COND_EXP_OPEN		= 0x3,
+	ZBC_ZONE_COND_CLOSED		= 0x4,
+	/* 0x5 to 0xc are reserved */
+	ZBC_ZONE_COND_READONLY		= 0xd,
+	ZBC_ZONE_COND_FULL		= 0xe,
+	ZBC_ZONE_COND_OFFLINE		= 0xf,
+};
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.13.6


* [PATCH V6 02/14] scsi: sd_zbc: Fix comments and indentation
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 01/14] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 16:50     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 03/14] scsi: sd_zbc: Rearrange code Damien Le Moal
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Fix comment style (use kernel-doc style) and content to clarify some
functions. Also fix the indentation of some function signatures and
remove a useless blank line in sd_zbc_read_zones().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   5 ++-
 drivers/scsi/sd_zbc.c   | 117 +++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 104 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..c72b97a74906 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1752,7 +1752,10 @@ static void scsi_done(struct scsi_cmnd *cmd)
  *
  * Returns:     Nothing
  *
- * Lock status: IO request lock assumed to be held when called.
+ * Lock status: request queue lock assumed to be held when called.
+ *
+ * Note: See sd_zbc.c sd_zbc_write_lock_zone() for write order
+ * protection for ZBC disks.
  */
 static void scsi_request_fn(struct request_queue *q)
 	__releases(q->queue_lock)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 692c8cbc7ed8..023f705ae235 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -32,10 +32,14 @@
 #include "sd.h"
 
 /**
- * Convert a zone descriptor to a zone struct.
+ * sd_zbc_parse_report - Convert a zone descriptor to a struct blk_zone.
+ * @sdkp: The disk the report originated from
+ * @buf: Address of the report zone descriptor
+ * @zone: the destination zone structure
+ *
+ * All LBA-sized values are converted to 512B sector units.
  */
-static void sd_zbc_parse_report(struct scsi_disk *sdkp,
-				u8 *buf,
+static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
 				struct blk_zone *zone)
 {
 	struct scsi_device *sdp = sdkp->device;
@@ -58,7 +62,13 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp,
 }
 
 /**
- * Issue a REPORT ZONES scsi command.
+ * sd_zbc_report_zones - Issue a REPORT ZONES scsi command.
+ * @sdkp: The target disk
+ * @buf: Buffer to use for the reply
+ * @buflen: the buffer size
+ * @lba: Start LBA of the report
+ *
+ * For internal use during device validation.
  */
 static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 			       unsigned int buflen, sector_t lba)
@@ -99,6 +109,12 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 	return 0;
 }
 
+/**
+ * sd_zbc_setup_report_cmnd - Prepare a REPORT ZONES scsi command
+ * @cmd: The command to setup
+ *
+ * Called from sd_init_command() for a REQ_OP_ZONE_REPORT request.
+ */
 int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -141,6 +157,14 @@ int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/**
+ * sd_zbc_report_zones_complete - Process a REPORT ZONES scsi command reply.
+ * @scmd: The completed report zones command
+ * @good_bytes: reply size in bytes
+ *
+ * Convert all reported zone descriptors to struct blk_zone. The conversion
+ * is done in-place, directly in the request specified sg buffer.
+ */
 static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 					 unsigned int good_bytes)
 {
@@ -196,17 +220,32 @@ static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 	local_irq_restore(flags);
 }
 
+/**
+ * sd_zbc_zone_sectors - Get the device zone size in number of 512B sectors.
+ * @sdkp: The target disk
+ */
 static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
 {
 	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
 }
 
+/**
+ * sd_zbc_zone_no - Get the number of the zone containing a sector.
+ * @sdkp: The target disk
+ * @sector: 512B sector address contained in the zone
+ */
 static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
 					  sector_t sector)
 {
 	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
 }
 
+/**
+ * sd_zbc_setup_reset_cmnd - Prepare a RESET WRITE POINTER scsi command.
+ * @cmd: the command to setup
+ *
+ * Called from sd_init_command() for a REQ_OP_ZONE_RESET request.
+ */
 int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -239,6 +278,23 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/**
+ * sd_zbc_write_lock_zone - Write lock a sequential zone.
+ * @cmd: write command
+ *
+ * Called from sd_init_command() for write requests (standard write, write same or
+ * write zeroes operations). If the request target zone is not already locked,
+ * the zone is locked and BLKPREP_OK returned, allowing the request to proceed
+ * through dispatch in scsi_request_fn(). Otherwise, BLKPREP_DEFER is returned,
+ * forcing the request to wait for the zone to be unlocked, that is, for the
+ * previously issued write request targeting the same zone to complete.
+ *
+ * This is called from blk_peek_request() context with the queue lock held and
+ * before the request is removed from the scheduler. As a result, multiple
+ * contexts executing concurrently scsi_request_fn() cannot result in write
+ * sequence reordering as only a single write request per zone is allowed to
+ * proceed.
+ */
 int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -261,10 +317,7 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	 * Do not issue more than one write at a time per
 	 * zone. This solves write ordering problems due to
 	 * the unlocking of the request queue in the dispatch
-	 * path in the non scsi-mq case. For scsi-mq, this
-	 * also avoids potential write reordering when multiple
-	 * threads running on different CPUs write to the same
-	 * zone (with a synchronized sequential pattern).
+	 * path in the non scsi-mq case.
 	 */
 	if (sdkp->zones_wlock &&
 	    test_and_set_bit(zno, sdkp->zones_wlock))
@@ -276,6 +329,13 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/**
+ * sd_zbc_write_unlock_zone - Write unlock a sequential zone.
+ * @cmd: write command
+ *
+ * Called from sd_uninit_cmd(). Unlocking the request target zone will allow
+ * dispatching the next write request for the zone.
+ */
 void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -290,8 +350,16 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 	}
 }
 
-void sd_zbc_complete(struct scsi_cmnd *cmd,
-		     unsigned int good_bytes,
+/**
+ * sd_zbc_complete - ZBC command post processing.
+ * @cmd: Completed command
+ * @good_bytes: Command reply bytes
+ * @sshdr: command sense header
+ *
+ * Called from sd_done(). Process report zones reply and handle reset zone
+ * and write commands errors.
+ */
+void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 		     struct scsi_sense_hdr *sshdr)
 {
 	int result = cmd->result;
@@ -336,7 +404,11 @@ void sd_zbc_complete(struct scsi_cmnd *cmd,
 }
 
 /**
- * Read zoned block device characteristics (VPD page B6).
+ * sd_zbc_read_zoned_characteristics - Read zoned block device characteristics
+ * @sdkp: Target disk
+ * @buf: Buffer where to store the VPD page data
+ *
+ * Read VPD page B6.
  */
 static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 					     unsigned char *buf)
@@ -366,10 +438,16 @@ static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 }
 
 /**
- * Check reported capacity.
+ * sd_zbc_check_capacity - Check reported capacity.
+ * @sdkp: Target disk
+ * @buf: Buffer to use for commands
+ *
+ * A ZBC drive may report only the capacity of the first conventional zones at
+ * LBA 0. This is indicated by the RC_BASIS field of the read capacity reply.
+ * Check this here. If the disk reported only its conventional zones capacity,
+ * get the total capacity by doing a report zones.
  */
-static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
-				 unsigned char *buf)
+static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf)
 {
 	sector_t lba;
 	int ret;
@@ -399,6 +477,13 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
 
 #define SD_ZBC_BUF_SIZE 131072
 
+/**
+ * sd_zbc_check_zone_size - Check the device zone sizes
+ * @sdkp: Target disk
+ *
+ * Check that all zones of the device are of equal size. The last zone can
+ * however be smaller. The zone size must also be a power of two number of
+ * LBAs.
+ */
 static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 {
 	u64 zone_blocks;
@@ -525,8 +610,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	return 0;
 }
 
-int sd_zbc_read_zones(struct scsi_disk *sdkp,
-		      unsigned char *buf)
+int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
 	int ret;
 
@@ -537,7 +621,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp,
 		 */
 		return 0;
 
-
 	/* Get zoned block device characteristics */
 	ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
 	if (ret)
-- 
2.13.6


* [PATCH V6 03/14] scsi: sd_zbc: Rearrange code
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 01/14] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 02/14] scsi: sd_zbc: Fix comments and indentation Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 04/14] scsi: sd_zbc: Use well defined macros Damien Le Moal
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
assignments and move the calculation of sdkp->zone_shift together
with the assignment of the verified zone_blocks value in
sd_zbc_check_zone_size().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
---
 drivers/scsi/sd_zbc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 023f705ae235..7dbaf920679e 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -584,6 +584,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	}
 
 	sdkp->zone_blocks = zone_blocks;
+	sdkp->zone_shift = ilog2(zone_blocks);
 
 	return 0;
 }
@@ -591,10 +592,13 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
 
+	/* READ16/WRITE16 is mandatory for ZBC disks */
+	sdkp->device->use_16_for_rw = 1;
+	sdkp->device->use_10_for_rw = 0;
+
 	/* chunk_sectors indicates the zone size */
 	blk_queue_chunk_sectors(sdkp->disk->queue,
 			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
-	sdkp->zone_shift = ilog2(sdkp->zone_blocks);
 	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
 	if (sdkp->capacity & (sdkp->zone_blocks - 1))
 		sdkp->nr_zones++;
@@ -657,10 +661,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	if (ret)
 		goto err;
 
-	/* READ16/WRITE16 is mandatory for ZBC disks */
-	sdkp->device->use_16_for_rw = 1;
-	sdkp->device->use_10_for_rw = 0;
-
 	return 0;
 
 err:
-- 
2.13.6


* [PATCH V6 04/14] scsi: sd_zbc: Use well defined macros
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (2 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 03/14] scsi: sd_zbc: Rearrange code Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 05/14] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics() Damien Le Moal
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Instead of open coding, use the min() macro to calculate the report
zones reply buffer length in sd_zbc_check_zone_size() and the round_up()
macro to calculate the number of zones in sd_zbc_setup().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 7dbaf920679e..bbad851c1789 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -475,7 +475,7 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf)
 	return 0;
 }
 
-#define SD_ZBC_BUF_SIZE 131072
+#define SD_ZBC_BUF_SIZE 131072U
 
 /**
  * sd_zbc_check_zone_size - Check the device zone sizes
@@ -526,10 +526,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 		/* Parse REPORT ZONES header */
 		list_length = get_unaligned_be32(&buf[0]) + 64;
 		rec = buf + 64;
-		if (list_length < SD_ZBC_BUF_SIZE)
-			buf_len = list_length;
-		else
-			buf_len = SD_ZBC_BUF_SIZE;
+		buf_len = min(list_length, SD_ZBC_BUF_SIZE);
 
 		/* Parse zone descriptors */
 		while (rec < buf + buf_len) {
@@ -599,9 +596,8 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	/* chunk_sectors indicates the zone size */
 	blk_queue_chunk_sectors(sdkp->disk->queue,
 			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
-	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
-	if (sdkp->capacity & (sdkp->zone_blocks - 1))
-		sdkp->nr_zones++;
+	sdkp->nr_zones =
+		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
 	if (!sdkp->zones_wlock) {
 		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
-- 
2.13.6


* [PATCH V6 05/14] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (3 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 04/14] scsi: sd_zbc: Use well defined macros Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 06/14] block: Add zoned block device information to request queue Damien Le Moal
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche, stable

The three values starting at byte 8 of the Zoned Block Device
Characteristics VPD page B6h are 32-bit values, not 64-bit. So use
get_unaligned_be32() to retrieve the values instead of get_unaligned_be64().
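
For reference, the fields accessed here are, per the ZBC specification,
all 32-bit big endian: the optimal number of open sequential write
preferred zones (bytes 8 to 11), the optimal number of non-sequentially
written sequential write preferred zones (bytes 12 to 15), and the
maximum number of open sequential write required zones (bytes 16 to 19).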

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: <stable@vger.kernel.org>

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index bbad851c1789..27793b9f54c0 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -423,15 +423,15 @@ static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 	if (sdkp->device->type != TYPE_ZBC) {
 		/* Host-aware */
 		sdkp->urswrz = 1;
-		sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
-		sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
+		sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
+		sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
 		sdkp->zones_max_open = 0;
 	} else {
 		/* Host-managed */
 		sdkp->urswrz = buf[4] & 1;
 		sdkp->zones_optimal_open = 0;
 		sdkp->zones_optimal_nonseq = 0;
-		sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
+		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
 	}
 
 	return 0;
-- 
2.13.6


* [PATCH V6 06/14] block: Add zoned block device information to request queue
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (4 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 05/14] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics() Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 07/14] scsi: sd_zbc: Initialize device request queue zoned data Damien Le Moal
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Components relying only on the request_queue structure for accessing
block devices (e.g. I/O schedulers) have a limited knowledge of the
device characteristics. In particular, the device capacity cannot be
easily discovered, which for a zoned block device also results in the
inability to easily know the number of zones of the device (the zone
size is indicated by the chunk_sectors field of the queue limits).

Introduce the nr_zones field in the request_queue structure to simplify
access to this information. Also, add the bitmap seq_zone_bitmap which
indicates which zones of the device are sequential zones (write
preferred or write required). These two fields are initialized by the
low level block device driver (sd.c for ZBC/ZAC disks). They are not
initialized by stacking drivers (device mappers) handling zoned block
devices (e.g. dm-linear).
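
As an illustration, with these fields an I/O scheduler can decide if a
write request needs serialization without any knowledge of the underlying
device driver (hypothetical helper, built on the accessors added below):

  static bool rq_needs_zone_write_lock(struct request *rq)
  {
          /* Only writes to sequential zones need write ordering. */
          return rq_data_dir(rq) == WRITE && blk_rq_zone_is_seq(rq);
  }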

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
---
 include/linux/blkdev.h | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02fa42d24b52..a44807c502f0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -544,6 +544,18 @@ struct request_queue {
 	struct queue_limits	limits;
 
 	/*
+	 * Zoned block device information for mq I/O schedulers.
+	 * nr_zones is the total number of zones of the device. This is always
+	 * 0 for regular block devices. seq_zone_bitmap is a bitmap of nr_zones
+	 * bits which indicates if a zone is conventional (bit clear) or
+	 * sequential (bit set). Both nr_zones and seq_zone_bitmap are set
+	 * by the low level device driver. Stacking drivers (device mappers)
+	 * may or may not initialize these fields.
+	 */
+	unsigned int	nr_zones;
+	unsigned long	*seq_zone_bitmap;
+
+	/*
 	 * sg stuff
 	 */
 	unsigned int		sg_timeout;
@@ -786,6 +798,27 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
 	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
 }
 
+static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
+{
+	return q->nr_zones;
+}
+
+static inline unsigned int blk_queue_zone_no(struct request_queue *q,
+					     sector_t sector)
+{
+	if (!blk_queue_is_zoned(q))
+		return 0;
+	return sector >> ilog2(q->limits.chunk_sectors);
+}
+
+static inline bool blk_queue_zone_is_seq(struct request_queue *q,
+					 sector_t sector)
+{
+	if (!blk_queue_is_zoned(q) || !q->seq_zone_bitmap)
+		return false;
+	return test_bit(blk_queue_zone_no(q, sector), q->seq_zone_bitmap);
+}
+
 static inline bool rq_is_sync(struct request *rq)
 {
 	return op_is_sync(rq->cmd_flags);
@@ -1032,6 +1065,16 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
 	return blk_rq_cur_bytes(rq) >> 9;
 }
 
+static inline unsigned int blk_rq_zone_no(struct request *rq)
+{
+	return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
+}
+
+static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
+{
+	return blk_queue_zone_is_seq(rq->q, blk_rq_pos(rq));
+}
+
 /*
  * Some commands like WRITE SAME have a payload or data transfer size which
  * is different from the size of the request.  Any driver that supports such
@@ -1583,6 +1626,16 @@ static inline unsigned int bdev_zone_sectors(struct block_device *bdev)
 	return 0;
 }
 
+static inline unsigned int bdev_nr_zones(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_nr_zones(q);
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
-- 
2.13.6


* [PATCH V6 07/14] scsi: sd_zbc: Initialize device request queue zoned data
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (5 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 06/14] block: Add zoned block device information to request queue Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 16:57     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 08/14] scsi: sd_zbc: Limit zone write locking to sequential zones Damien Le Moal
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Initialize the seq_zone_bitmap and nr_zones fields of the disk request
queue on disk revalidation. As the seq_zone_bitmap allocation is
identical to the allocation of the zone write lock bitmap, introduce
the helper sd_zbc_alloc_zone_bitmap(). Since the disk capacity and
number of zones may not be stable on the first scan, allocate and
initialize the bitmaps on the second revalidation pass.
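
For reference, the REPORT ZONES zone descriptor fields parsed by
sd_zbc_get_seq_zones() below are (64 bytes per descriptor, multi-byte
values big endian):

  byte 0, bits 3:0    zone type
  byte 1, bits 7:4    zone condition
  bytes 8..15         zone length (in LBAs)
  bytes 16..23        zone start LBA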

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 133 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 27793b9f54c0..4ccb20074cb3 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -586,8 +586,121 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	return 0;
 }
 
+/**
+ * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
+ * @sdkp: The disk of the bitmap
+ */
+static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+
+	return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
+			    * sizeof(unsigned long),
+			    GFP_KERNEL, q->node);
+}
+
+/**
+ * sd_zbc_get_seq_zones - Parse report zones reply to identify sequential zones
+ * @sdkp: disk used
+ * @buf: report reply buffer
+ * @seq_zone_bitmap: bitmap of sequential zones to set
+ *
+ * Parse reported zone descriptors in @buf to identify sequential zones and
+ * set the reported zone bit in @seq_zone_bitmap accordingly.
+ * Since read-only and offline zones cannot be written, do not
+ * mark them as sequential in the bitmap.
+ * Return the LBA after the last zone reported.
+ */
+static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
+				     unsigned int buflen,
+				     unsigned long *seq_zone_bitmap)
+{
+	sector_t lba, next_lba = sdkp->capacity;
+	unsigned int buf_len, list_length;
+	unsigned char *rec;
+	u8 type, cond;
+
+	list_length = get_unaligned_be32(&buf[0]) + 64;
+	buf_len = min(list_length, buflen);
+	rec = buf + 64;
+
+	while (rec < buf + buf_len) {
+		type = rec[0] & 0x0f;
+		cond = (rec[1] >> 4) & 0xf;
+		lba = get_unaligned_be64(&rec[16]);
+		if (type != ZBC_ZONE_TYPE_CONV &&
+		    cond != ZBC_ZONE_COND_READONLY &&
+		    cond != ZBC_ZONE_COND_OFFLINE)
+			set_bit(lba >> sdkp->zone_shift, seq_zone_bitmap);
+		next_lba = lba + get_unaligned_be64(&rec[8]);
+		rec += 64;
+	}
+
+	return next_lba;
+}
+
+/**
+ * sd_zbc_setup_seq_zone_bitmap - Initialize the disk seq zone bitmap.
+ * @sdkp: target disk
+ *
+ * Allocate a zone bitmap and initialize it by identifying sequential zones.
+ */
+static int sd_zbc_setup_seq_zone_bitmap(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned long *seq_zone_bitmap;
+	sector_t lba = 0;
+	unsigned char *buf;
+	int ret = -ENOMEM;
+
+	seq_zone_bitmap = sd_zbc_alloc_zone_bitmap(sdkp);
+	if (!seq_zone_bitmap)
+		return -ENOMEM;
+
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	while (lba < sdkp->capacity) {
+		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, lba);
+		if (ret)
+			goto out;
+		lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
+					   seq_zone_bitmap);
+	}
+
+	if (lba != sdkp->capacity) {
+		/* Something went wrong */
+		ret = -EIO;
+	}
+
+out:
+	kfree(buf);
+	if (ret) {
+		kfree(seq_zone_bitmap);
+		return ret;
+	}
+
+	q->seq_zone_bitmap = seq_zone_bitmap;
+
+	return 0;
+}
+
+static void sd_zbc_cleanup(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+
+	kfree(q->seq_zone_bitmap);
+	q->seq_zone_bitmap = NULL;
+
+	kfree(sdkp->zones_wlock);
+	sdkp->zones_wlock = NULL;
+}
+
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
+	int ret;
 
 	/* READ16/WRITE16 is mandatory for ZBC disks */
 	sdkp->device->use_16_for_rw = 1;
@@ -599,14 +712,29 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	sdkp->nr_zones =
 		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
+	/*
+	 * Wait for the disk capacity to stabilize before
+	 * initializing zone related information.
+	 */
+	if (sdkp->first_scan)
+		return 0;
+
 	if (!sdkp->zones_wlock) {
-		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
-					    sizeof(unsigned long),
-					    GFP_KERNEL);
+		sdkp->zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
 		if (!sdkp->zones_wlock)
 			return -ENOMEM;
 	}
 
+	if (!q->seq_zone_bitmap) {
+		ret = sd_zbc_setup_seq_zone_bitmap(sdkp);
+		if (ret) {
+			sd_zbc_cleanup(sdkp);
+			return ret;
+		}
+	}
+
+	q->nr_zones = sdkp->nr_zones;
+
 	return 0;
 }
 
@@ -661,14 +789,14 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 
 err:
 	sdkp->capacity = 0;
+	sd_zbc_cleanup(sdkp);
 
 	return ret;
 }
 
 void sd_zbc_remove(struct scsi_disk *sdkp)
 {
-	kfree(sdkp->zones_wlock);
-	sdkp->zones_wlock = NULL;
+	sd_zbc_cleanup(sdkp);
 }
 
 void sd_zbc_print_zones(struct scsi_disk *sdkp)
-- 
2.13.6


* [PATCH V6 08/14] scsi: sd_zbc: Limit zone write locking to sequential zones
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (6 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 07/14] scsi: sd_zbc: Initialize device request queue zoned data Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 17:00     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 09/14] scsi: sd_zbc: Disable zone write locking with scsi-mq Damien Le Moal
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Conventional zones of zoned block devices have no write constraints.
Write locking of conventional zones is thus not necessary and can even
hurt performance by unnecessarily operating the disk under low queue
depth. To avoid this, use the disk request queue seq_zone_bitmap to
allow any write to be issued to conventional zones, locking only
sequential zones.

While at it, remove the helper sd_zbc_zone_no() and use
blk_rq_zone_no() instead.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 4ccb20074cb3..4cb271619fc6 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -230,17 +230,6 @@ static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
 }
 
 /**
- * sd_zbc_zone_no - Get the number of the zone conataining a sector.
- * @sdkp: The target disk
- * @sector: 512B sector address contained in the zone
- */
-static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
-					  sector_t sector)
-{
-	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
-}
-
-/**
  * sd_zbc_setup_reset_cmnd - Prepare a RESET WRITE POINTER scsi command.
  * @cmd: the command to setup
  *
@@ -301,7 +290,6 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
 	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
-	unsigned int zno = sd_zbc_zone_no(sdkp, sector);
 
 	/*
 	 * Note: Checks of the alignment of the write command on
@@ -309,18 +297,21 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	 */
 
 	/* Do not allow zone boundaries crossing on host-managed drives */
-	if (blk_queue_zoned_model(sdkp->disk->queue) == BLK_ZONED_HM &&
+	if (blk_queue_zoned_model(rq->q) == BLK_ZONED_HM &&
 	    (sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
 		return BLKPREP_KILL;
 
 	/*
-	 * Do not issue more than one write at a time per
-	 * zone. This solves write ordering problems due to
-	 * the unlocking of the request queue in the dispatch
-	 * path in the non scsi-mq case.
+	 * There are no write constraints on conventional zones, so any write
+	 * command can be sent. But do not issue more than one write command
+	 * at a time per sequential zone. This avoids write ordering problems
+	 * due to the unlocking of the request queue in the dispatch path of
+	 * the legacy scsi path, as well as at the HBA level (e.g. AHCI).
 	 */
+	if (!blk_rq_zone_is_seq(rq))
+		return BLKPREP_OK;
 	if (sdkp->zones_wlock &&
-	    test_and_set_bit(zno, sdkp->zones_wlock))
+	    test_and_set_bit(blk_rq_zone_no(rq), sdkp->zones_wlock))
 		return BLKPREP_DEFER;
 
 	WARN_ON_ONCE(cmd->flags & SCMD_ZONE_WRITE_LOCK);
@@ -341,8 +332,9 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 	struct request *rq = cmd->request;
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 
-	if (sdkp->zones_wlock && cmd->flags & SCMD_ZONE_WRITE_LOCK) {
-		unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
+	if (cmd->flags & SCMD_ZONE_WRITE_LOCK) {
+		unsigned int zno = blk_rq_zone_no(rq);
+
 		WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
 		cmd->flags &= ~SCMD_ZONE_WRITE_LOCK;
 		clear_bit_unlock(zno, sdkp->zones_wlock);
-- 
2.13.6


* [PATCH V6 09/14] scsi: sd_zbc: Disable zone write locking with scsi-mq
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (7 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 08/14] scsi: sd_zbc: Limit zone write locking to sequential zones Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 10/14] block: mq-deadline: Add zoned block device data Damien Le Moal
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

In the case of a ZBC disk used with scsi-mq, zone write locking does
not prevent write reordering in sequential zones. Unlike the legacy
case, zone locking is done after the command request is removed from
the scheduler dispatch queue. That is, at the time of zone locking,
the write command may already be out of order, making locking
ineffective. Write order guarantees can only be provided by an
adapted I/O scheduler.

Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
used with scsi-mq. As the disk zones_wlock bitmap is not necessary,
do not allocate it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd_zbc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 4cb271619fc6..f56ed1bc3eaa 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -287,6 +287,7 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
+	struct request_queue *q = rq->q;
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
 	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
@@ -297,10 +298,14 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	 */
 
 	/* Do not allow zone boundaries crossing on host-managed drives */
-	if (blk_queue_zoned_model(rq->q) == BLK_ZONED_HM &&
+	if (blk_queue_zoned_model(q) == BLK_ZONED_HM &&
 	    (sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
 		return BLKPREP_KILL;
 
+	/* No write locking with scsi-mq */
+	if (q->mq_ops)
+		return BLKPREP_OK;
+
 	/*
 	 * There is no write constraints on conventional zones. So any write
 	 * command can be sent. But do not issue more than one write command
@@ -711,7 +716,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	if (sdkp->first_scan)
 		return 0;
 
-	if (!sdkp->zones_wlock) {
+	if (!q->mq_ops && !sdkp->zones_wlock) {
 		sdkp->zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
 		if (!sdkp->zones_wlock)
 			return -ENOMEM;
-- 
2.13.6


* [PATCH V6 10/14] block: mq-deadline: Add zoned block device data
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (8 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 09/14] scsi: sd_zbc: Disable zone write locking with scsi-mq Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 23:06     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 11/14] block: mq-deadline: Introduce dispatch helpers Damien Le Moal
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Introduce a zone bitmap field in mq-deadline private data to support
zoned block devices. The zone bitmap is used to implement per zone
write locking. Modify mq-deadline init_queue() and exit_queue()
elevator methods to handle initialization and cleanup of the zone write
lock bitmap.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index a1cad4331edd..f0c01ef934b4 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -60,6 +60,8 @@ struct deadline_data {
 
 	spinlock_t lock;
 	struct list_head dispatch;
+
+	unsigned long *zones_wlock;
 };
 
 static inline struct rb_root *
@@ -300,6 +302,34 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	return rq;
 }
 
+static int deadline_init_zones_wlock(struct request_queue *q,
+				     struct deadline_data *dd)
+{
+	/*
+	 * For regular drives or non-conforming zoned block devices,
+	 * do not use zone write locking.
+	 */
+	if (!blk_queue_nr_zones(q))
+		return 0;
+
+	/*
+	 * Treat host aware drives as regular disks.
+	 */
+	if (blk_queue_zoned_model(q) != BLK_ZONED_HM)
+		return 0;
+
+	dd->zones_wlock = kzalloc_node(BITS_TO_LONGS(blk_queue_nr_zones(q))
+				       * sizeof(unsigned long),
+				       GFP_KERNEL, q->node);
+	if (!dd->zones_wlock)
+		return -ENOMEM;
+
+	pr_info("mq-deadline: %s: zones write locking enabled\n",
+		dev_name(q->backing_dev_info->dev));
+
+	return 0;
+}
+
 static void dd_exit_queue(struct elevator_queue *e)
 {
 	struct deadline_data *dd = e->elevator_data;
@@ -307,6 +337,7 @@ static void dd_exit_queue(struct elevator_queue *e)
 	BUG_ON(!list_empty(&dd->fifo_list[READ]));
 	BUG_ON(!list_empty(&dd->fifo_list[WRITE]));
 
+	kfree(dd->zones_wlock);
 	kfree(dd);
 }
 
@@ -317,16 +348,15 @@ static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
 {
 	struct deadline_data *dd;
 	struct elevator_queue *eq;
+	int ret = -ENOMEM;
 
 	eq = elevator_alloc(q, e);
 	if (!eq)
 		return -ENOMEM;
 
 	dd = kzalloc_node(sizeof(*dd), GFP_KERNEL, q->node);
-	if (!dd) {
-		kobject_put(&eq->kobj);
-		return -ENOMEM;
-	}
+	if (!dd)
+		goto out_put_elv;
 	eq->elevator_data = dd;
 
 	INIT_LIST_HEAD(&dd->fifo_list[READ]);
@@ -341,8 +371,18 @@ static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
 	spin_lock_init(&dd->lock);
 	INIT_LIST_HEAD(&dd->dispatch);
 
+	ret = deadline_init_zones_wlock(q, dd);
+	if (ret)
+		goto out_free_dd;
+
 	q->elevator = eq;
 	return 0;
+
+out_free_dd:
+	kfree(dd);
+out_put_elv:
+	kobject_put(&eq->kobj);
+	return ret;
 }
 
 static int dd_request_merge(struct request_queue *q, struct request **rq,
-- 
2.13.6


* [PATCH V6 11/14] block: mq-deadline: Introduce dispatch helpers
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (9 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 10/14] block: mq-deadline: Add zoned block device data Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 12/14] block: mq-deadline: Introduce zone locking support Damien Le Moal
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Avoid directly referencing the next_rq and fifo_list arrays by using
the helper functions deadline_next_request() and deadline_fifo_request(),
to facilitate changes in the dispatch request selection in
__dd_dispatch_request().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
---
 block/mq-deadline.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f0c01ef934b4..6b7b84ee8f82 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -194,13 +194,42 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 }
 
 /*
+ * For the specified data direction, return the next request to
+ * dispatch using arrival ordered lists.
+ */
+static struct request *
+deadline_fifo_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	if (list_empty(&dd->fifo_list[data_dir]))
+		return NULL;
+
+	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+}
+
+/*
+ * For the specified data direction, return the next request to
+ * dispatch using sector position sorted lists.
+ */
+static struct request *
+deadline_next_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	return dd->next_rq[data_dir];
+}
+
+/*
  * deadline_dispatch_requests selects the best request according to
  * read/write expire, fifo_batch, etc
  */
 static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
-	struct request *rq;
+	struct request *rq, *next_rq;
 	bool reads, writes;
 	int data_dir;
 
@@ -216,10 +245,9 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	/*
 	 * batches are currently reads XOR writes
 	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
+	rq = deadline_next_request(dd, WRITE);
+	if (!rq)
+		rq = deadline_next_request(dd, READ);
 
 	if (rq && dd->batching < dd->fifo_batch)
 		/* we have a next request are still entitled to batch */
@@ -262,19 +290,20 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	/*
 	 * we are not running a batch, find best request for selected data_dir
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	next_rq = deadline_next_request(dd, data_dir);
+	if (deadline_check_fifo(dd, data_dir) || !next_rq) {
 		/*
 		 * A deadline has expired, the last request was in the other
 		 * direction, or we have run out of higher-sectored requests.
 		 * Start again from the request with the earliest expiry time.
 		 */
-		rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+		rq = deadline_fifo_request(dd, data_dir);
 	} else {
 		/*
 		 * The last req was the same dir and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = next_rq;
 	}
 
 	dd->batching = 0;
-- 
2.13.6


* [PATCH V6 12/14] block: mq-deadline: Introduce zone locking support
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (10 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 11/14] block: mq-deadline: Introduce dispatch helpers Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 23:12     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices Damien Le Moal
  2017-10-02  7:15 ` [PATCH V6 14/14] block: do not set mq default scheduler Damien Le Moal
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

For a write request to a zoned block device, lock the request target
zone upon request dispatch. The zone is unlocked either when the request
completes or when the request is requeued (inserted).

To indicate that a request has locked its target zone, use the first
pointer of the request elevator private data to store the value
RQ_ZONE_WLOCKED. Testing for this value allows a quick decision in
dd_insert_request() and dd_completed_request() regarding the need to
unlock the target zone of a request.
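
The resulting zone write lock life cycle of a write request is thus
(schematic, using the helpers introduced by this patch):

  dispatch:    deadline_wlock_zone(dd, rq);    /* sets elv.priv[0] */
  requeue:     deadline_wunlock_zone(dd, rq);  /* dd_insert_request() */
  completion:  deadline_wunlock_zone(dd, rq);  /* dd_completed_request() */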

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6b7b84ee8f82..93a1aede5dd0 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -177,6 +177,91 @@ deadline_move_request(struct deadline_data *dd, struct request *rq)
 }
 
 /*
+ * Return true if a request is a write request that needs zone
+ * write locking.
+ */
+static inline bool deadline_request_needs_zone_wlock(struct deadline_data *dd,
+						     struct request *rq)
+{
+
+	if (!dd->zones_wlock)
+		return false;
+
+	if (blk_rq_is_passthrough(rq))
+		return false;
+
+	/*
+	 * REQ_OP_SCSI_* and REQ_OP_DRV_* are already handled with
+	 * the previous check. Add them again here so that all request
+	 * operations defined by enum req_opf are handled (so that a compiler
+	 * warning shows up if/when request operation definitions change).
+	 */
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_WRITE:
+		return blk_rq_zone_is_seq(rq);
+	default:
+		return false;
+	}
+}
+
+/*
+ * Abuse the elv.priv[0] pointer to indicate if a request has write
+ * locked its target zone. Only write requests to a zoned block device
+ * can own a zone write lock.
+ */
+enum rq_zone_lock {
+	RQ_ZONE_NO_WLOCK = 0UL,
+	RQ_ZONE_WLOCKED  = 1UL,
+};
+
+static inline void deadline_set_request_zone_wlock(struct request *rq)
+{
+	rq->elv.priv[0] = (void *)RQ_ZONE_WLOCKED;
+}
+
+static inline void deadline_clear_request_zone_wlock(struct request *rq)
+{
+	rq->elv.priv[0] = (void *)RQ_ZONE_NO_WLOCK;
+}
+
+static inline bool deadline_request_has_zone_wlock(struct request *rq)
+{
+	return rq->elv.priv[0] == (void *)RQ_ZONE_WLOCKED;
+}
+
+/*
+ * Write lock the target zone of a write request.
+ */
+static void deadline_wlock_zone(struct deadline_data *dd,
+				struct request *rq)
+{
+	WARN_ON_ONCE(deadline_request_has_zone_wlock(rq));
+	WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq), dd->zones_wlock));
+	deadline_set_request_zone_wlock(rq);
+}
+
+/*
+ * Write unlock the target zone of a write request.
+ */
+static void deadline_wunlock_zone(struct deadline_data *dd,
+				  struct request *rq)
+{
+	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
+	deadline_clear_request_zone_wlock(rq);
+}
+
+/*
+ * Test the write lock state of the target zone of a write request.
+ */
+static inline bool deadline_zone_is_wlocked(struct deadline_data *dd,
+					    struct request *rq)
+{
+	return test_bit(blk_rq_zone_no(rq), dd->zones_wlock);
+}
+
+/*
  * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
  * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
  */
@@ -315,6 +400,11 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	dd->batching++;
 	deadline_move_request(dd, rq);
 done:
+	/*
+	 * If the request needs its target zone locked, do it.
+	 */
+	if (deadline_request_needs_zone_wlock(dd, rq))
+		deadline_wlock_zone(dd, rq);
 	rq->rq_flags |= RQF_STARTED;
 	return rq;
 }
@@ -464,6 +554,13 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int data_dir = rq_data_dir(rq);
 
+	/*
+	 * This may be a requeue of a write request that has locked its
+	 * target zone. If this is the case, release the zone lock.
+	 */
+	if (deadline_request_has_zone_wlock(rq))
+		deadline_wunlock_zone(dd, rq);
+
 	if (blk_mq_sched_try_insert_merge(q, rq))
 		return;
 
@@ -508,6 +605,19 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
+/*
+ * For zoned block devices, write unlock the target zone of
+ * completed write requests.
+ */
+static void dd_completed_request(struct request *rq)
+{
+	if (deadline_request_has_zone_wlock(rq)) {
+		struct deadline_data *dd = rq->q->elevator->elevator_data;
+
+		deadline_wunlock_zone(dd, rq);
+	}
+}
+
 static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
@@ -709,6 +819,7 @@ static struct elevator_type mq_deadline = {
 	.ops.mq = {
 		.insert_requests	= dd_insert_requests,
 		.dispatch_request	= dd_dispatch_request,
+		.completed_request	= dd_completed_request,
 		.next_request		= elv_rb_latter_request,
 		.former_request		= elv_rb_former_request,
 		.bio_merge		= dd_bio_merge,
-- 
2.13.6

* [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (11 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 12/14] block: mq-deadline: Introduce zone locking support Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  2017-10-02 23:44     ` Bart Van Assche
  2017-10-02  7:15 ` [PATCH V6 14/14] block: do not set mq default scheduler Damien Le Moal
  13 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

When dispatching write requests to a zoned block device, only allow
requests targeting an unlocked zone. Requests targeting a locked zone
are left in the scheduler queue to preserve the initial write order.
If no write request can be dispatched, allow reads to be dispatched
even if the write batch is not done.

To ensure that the search for an appropriate write request is atomic
in deadline_fifo_request() and deadline_next_request() with reagrd to
write requests' zone lock state, introduce the spinlock zone_lock.
Holding this lock while doing the search in these functions, as well as
when unlocking the target zone of a completed write request in
dd_completed_request(), ensures that the search does not pick up a write
request in the middle of a zone's queued write sequence.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 93a1aede5dd0..cbca24e1f96e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -61,6 +61,7 @@ struct deadline_data {
 	spinlock_t lock;
 	struct list_head dispatch;
 
+	spinlock_t zone_lock;
 	unsigned long *zones_wlock;
 };
 
@@ -244,12 +245,24 @@ static void deadline_wlock_zone(struct deadline_data *dd,
 
 /*
  * Write unlock the target zone of a write request.
+ * Clearing the target zone write lock bit is done with the scheduler zone_lock
+ * spinlock held so that deadline_next_request() and deadline_fifo_request()
+ * cannot see the lock state of a zone change due to a request completion during
+ * their eventual search for an appropriate write request. Otherwise, for a zone
+ * with multiple write requests queued, a non-sequential write request
+ * can be chosen.
  */
 static void deadline_wunlock_zone(struct deadline_data *dd,
 				  struct request *rq)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+
 	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
 	deadline_clear_request_zone_wlock(rq);
+
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
 }
 
 /*
@@ -279,19 +292,47 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 }
 
 /*
+ * Test if a request can be dispatched.
+ */
+static inline bool deadline_can_dispatch_request(struct deadline_data *dd,
+						 struct request *rq)
+{
+	if (!deadline_request_needs_zone_wlock(dd, rq))
+		return true;
+	return !deadline_zone_is_wlocked(dd, rq);
+}
+
+/*
  * For the specified data direction, return the next request to
  * dispatch using arrival ordered lists.
  */
 static struct request *
 deadline_fifo_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
 	if (list_empty(&dd->fifo_list[data_dir]))
 		return NULL;
 
-	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+	if (!dd->zones_wlock || data_dir == READ)
+		return rq_entry_fifo(dd->fifo_list[data_dir].next);
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+
+	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
+		if (deadline_can_dispatch_request(dd, rq))
+			goto out;
+	}
+	rq = NULL;
+
+out:
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -301,10 +342,25 @@ deadline_fifo_request(struct deadline_data *dd, int data_dir)
 static struct request *
 deadline_next_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
-	return dd->next_rq[data_dir];
+	rq = dd->next_rq[data_dir];
+	if (!dd->zones_wlock || data_dir == READ)
+		return rq;
+
+	spin_lock_irqsave(&dd->zone_lock, flags);
+	while (rq) {
+		if (deadline_can_dispatch_request(dd, rq))
+			break;
+		rq = deadline_latter_request(rq);
+	}
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -346,7 +402,8 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	if (reads) {
 		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
 
-		if (writes && (dd->starved++ >= dd->writes_starved))
+		if (deadline_fifo_request(dd, WRITE) &&
+		    (dd->starved++ >= dd->writes_starved))
 			goto dispatch_writes;
 
 		data_dir = READ;
@@ -391,6 +448,13 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		rq = next_rq;
 	}
 
+	/*
+	 * If we only have writes queued and none of them can be dispatched,
+	 * rq will be NULL.
+	 */
+	if (!rq)
+		return NULL;
+
 	dd->batching = 0;
 
 dispatch_request:
@@ -490,6 +554,7 @@ static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
 	spin_lock_init(&dd->lock);
 	INIT_LIST_HEAD(&dd->dispatch);
 
+	spin_lock_init(&dd->zone_lock);
 	ret = deadline_init_zones_wlock(q, dd);
 	if (ret)
 		goto out_free_dd;
-- 
2.13.6

* [PATCH V6 14/14] block: do not set mq default scheduler
  2017-10-02  7:15 [PATCH V6 00/14] scsi-mq support for ZBC disks Damien Le Moal
                   ` (12 preceding siblings ...)
  2017-10-02  7:15 ` [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices Damien Le Moal
@ 2017-10-02  7:15 ` Damien Le Moal
  13 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-02  7:15 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

For blk-mq disks with a single hardware queue, setting the default
disk scheduler to mq-deadline early during queue initialization
prevents properly configuring zone write locking for host-managed
zoned block devices, as the disk type is not yet known at that point.

Fix this by simply not setting the default scheduler to mq-deadline for
single hardware queue disks. A udev rule can be used to easily do the
same later in the system initialization sequence, when the device
characteristics are known. An example rule is sketched below.
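
A sketch of such a rule (illustrative only; the match patterns and the
use of the queue/zoned sysfs attribute are assumptions that may need
adjusting for a given system):

  # Select mq-deadline for host-managed zoned disks once the disk
  # zoning model is known.
  ACTION=="add|change", KERNEL=="sd[a-z]*", \
    ATTR{queue/zoned}=="host-managed", ATTR{queue/scheduler}="mq-deadline"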

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/elevator.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 153926a90901..8b65a757f726 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -222,19 +222,14 @@ int elevator_init(struct request_queue *q, char *name)
 
 	if (!e) {
 		/*
-		 * For blk-mq devices, we default to using mq-deadline,
-		 * if available, for single queue devices. If deadline
-		 * isn't available OR we have multiple queues, default
-		 * to "none".
+		 * For blk-mq devices, default to "none". udev can later set
+		 * an appropriate default scheduler based on the disk
+		 * characteristics which we do not yet have here.
 		 */
-		if (q->mq_ops) {
-			if (q->nr_hw_queues == 1)
-				e = elevator_get("mq-deadline", false);
-			if (!e)
-				return 0;
-		} else
-			e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
+		if (q->mq_ops)
+			return 0;
 
+		e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
 		if (!e) {
 			printk(KERN_ERR
 				"Default I/O scheduler not found. " \
-- 
2.13.6

* Re: [PATCH V6 02/14] scsi: sd_zbc: Fix comments and indentation
@ 2017-10-02 16:50     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 16:50 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> Fix comment style (use kernel-doc style) and content to clarify some
> functions. Also fix some function signature indentation and remove a
> useless blank line in sd_zbc_read_zones().

Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>

* Re: [PATCH V6 07/14] scsi: sd_zbc: Initialize device request queue zoned data
@ 2017-10-02 16:57     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 16:57 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> Initialize the seq_zone_bitmap and nr_zones fields of the disk request
> queue on disk revalidate. As the seq_zone_bitmap allocation is
> identical to the allocation of the zone write lock bitmap, introduce
> the helper sd_zbc_alloc_zone_bitmap(). Using this helper, wait for the
> disk capacity and number of zones to stabilize on the second
> revalidation pass to allocate and initialize the bitmaps.

Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>

* Re: [PATCH V6 08/14] scsi: sd_zbc: Limit zone write locking to sequential zones
@ 2017-10-02 17:00     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 17:00 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> +	 * There is no write constraints on conventional zones. So any write
                 ^^
                 are?

Anyway:

Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>

* Re: [PATCH V6 10/14] block: mq-deadline: Add zoned block device data
@ 2017-10-02 23:06     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 23:06 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> +	pr_info("mq-deadline: %s: zones write locking enabled\n",
> +		dev_name(q->backing_dev_info->dev));

Should this message perhaps be changed into "... zone write locking ..."
or  "... per-zone write locking ..."? Anyway:

Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>

* Re: [PATCH V6 12/14] block: mq-deadline: Introduce zone locking support
@ 2017-10-02 23:12     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 23:12 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> For a write request to a zoned block device, lock the request target
> zone upon request displatch. The zone is unlocked either when the
                    ^^^^^^^^^
                    dispatch?
> request completes or when the request is requeued (inserted).

Anyway:

Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>

* Re: [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices
@ 2017-10-02 23:44     ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-02 23:44 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> When dispatching write requests to a zoned block device, only allow
> requests targeting an unlocked zone. Requests targeting a locked zone
> are left in the scheduler queue to preserve the initial write order.
> If no write request can be dispatched, allow reads to be dispatched
> even if the write batch is not done.
>
> To ensure that the search for an appropriate write request is atomic
> in deadline_fifo_request() and deadline_next_request() with reagrd to
                                                              ^^^^^^
                                                              regard?
> write requests' zone lock state, introduce the spinlock zone_lock.
> Holding this lock while doing the search in these functions, as well as
> when unlocking the target zone of a completed write request in
> dd_completed_request(), ensures that the search does not pick up a write
> request in the middle of a zone's queued write sequence.

Since there is already a spinlock in the mq-deadline scheduler that serializes
most operations, do we really need a new spinlock?

>  /*
>   * Write unlock the target zone of a write request.
> + * Clearing the target zone write lock bit is done with the scheduler zone_lock
> + * spinlock held so that deadline_next_request() and deadline_fifo_request()
> + * cannot see the lock state of a zone change due to a request completion during
> + * their eventual search for an appropriate write request. Otherwise, for a zone
> + * with multiple write requests queued, a non-sequential write request
> + * can be chosen.
>   */
>  static void deadline_wunlock_zone(struct deadline_data *dd,
>  				  struct request *rq)
>  {
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dd->zone_lock, flags);
> +
>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>  	deadline_clear_request_zone_wlock(rq);
> +
> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>  }

Is a request removed from the sort and fifo lists before being dispatched? If so,
does that mean that obtaining zone_lock in the above function is not necessary?

>  static struct request *
>  deadline_fifo_request(struct deadline_data *dd, int data_dir)
>  {
> +	struct request *rq;
> +	unsigned long flags;
> +
>  	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
>  		return NULL;
>  
>  	if (list_empty(&dd->fifo_list[data_dir]))
>  		return NULL;
>  
> -	return rq_entry_fifo(dd->fifo_list[data_dir].next);
> +	if (!dd->zones_wlock || data_dir == READ)
> +		return rq_entry_fifo(dd->fifo_list[data_dir].next);
> +
> +	spin_lock_irqsave(&dd->zone_lock, flags);
> +
> +	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
> +		if (deadline_can_dispatch_request(dd, rq))
> +			goto out;
> +	}
> +	rq = NULL;
> +
> +out:
> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
> +
> +	return rq;
>  }

Same question here: is it really necessary to obtain zone_lock?

Thanks,

Bart.

* Re: [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices
  2017-10-02 23:44     ` Bart Van Assche
@ 2017-10-03  0:19     ` Damien Le Moal
  2017-10-03 20:56         ` Bart Van Assche
  -1 siblings, 1 reply; 31+ messages in thread
From: Damien Le Moal @ 2017-10-03  0:19 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi, linux-block, martin.petersen, axboe; +Cc: hch

Bart,

On 10/3/17 08:44, Bart Van Assche wrote:
> On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
>> When dispatching write requests to a zoned block device, only allow
>> requests targeting an unlocked zone. Requests targeting a locked zone
>> are left in the scheduler queue to preserve the initial write order.
>> If no write request can be dispatched, allow reads to be dispatched
>> even if the write batch is not done.
>>
>> To ensure that the search for an appropriate write request is atomic
>> in deadline_fifo_request() and deadline_next_request() with reagrd to
>                                                               ^^^^^^
>                                                               regard?

Will fix.

>> write requests' zone lock state, introduce the spinlock zone_lock.
>> Holding this lock while doing the search in these functions, as well as
>> when unlocking the target zone of a completed write request in
>> dd_completed_request(), ensures that the search does not pick up a write
>> request in the middle of a zone's queued write sequence.
> 
> Since there is already a spinlock in the mq-deadline scheduler that serializes
> most operations, do we really need a new spinlock?

The dd->lock spinlock is used without IRQ disabling, so it cannot
protect against the request completion method, which may be called
from IRQ context. That spinlock being a "big lock", I did not want to
start using it with IRQs disabled.

>>  /*
>>   * Write unlock the target zone of a write request.
>> + * Clearing the target zone write lock bit is done with the scheduler zone_lock
>> + * spinlock held so that deadline_next_request() and deadline_fifo_request()
>> + * cannot see the lock state of a zone change due to a request completion during
>> + * their eventual search for an appropriate write request. Otherwise, for a zone
>> + * with multiple write requests queued, a non sequential write request
>> + * can be chosen.
>>   */
>>  static void deadline_wunlock_zone(struct deadline_data *dd,
>>  				  struct request *rq)
>>  {
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>> +
>>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>>  	deadline_clear_request_zone_wlock(rq);
>> +
>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>>  }
> 
> Is a request removed from the sort and fifo lists before being dispatched? If so,
> does that mean that obtaining zone_lock in the above function is not necessary?

Yes, a request is removed from the sort tree and fifo list before
dispatching. But the dd->zone_lock spinlock is not there to protect
that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
added to prevent the completed_request() method from changing a zone
lock state while deadline_fifo_request() or deadline_next_request() are
running. Ex:

Imagine this case: write request A for a zone Z is being executed (it
was dispatched) so Z is locked. Dispatch runs and inspects the next
requests in sort order. Let's say we have the sequential writes B, C, D,
E queued for the same zone Z. First B is inspected and cannot be
dispatched (zone locked). Inspection moves on to C, but before that,
A completes and Z is unlocked. Then C will be OK to go as the zone is
now unlocked. But it is the wrong choice as it will result in an
out-of-order write. B must be the next request dispatched after A.

dd->zone_lock prevents this from happening. Without this spinlock, the
bad example case above happens very easily.
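
In timeline form (a restatement of the scenario above):

  dispatch context (under dd->lock)     completion path
  inspect B -> zone Z locked, skip B
                                        A completes, Z write unlocked
  inspect C -> Z now unlocked, pick C   (wrong: B must go first)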

>>  static struct request *
>>  deadline_fifo_request(struct deadline_data *dd, int data_dir)
>>  {
>> +	struct request *rq;
>> +	unsigned long flags;
>> +
>>  	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
>>  		return NULL;
>>  
>>  	if (list_empty(&dd->fifo_list[data_dir]))
>>  		return NULL;
>>  
>> -	return rq_entry_fifo(dd->fifo_list[data_dir].next);
>> +	if (!dd->zones_wlock || data_dir == READ)
>> +		return rq_entry_fifo(dd->fifo_list[data_dir].next);
>> +
>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>> +
>> +	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
>> +		if (deadline_can_dispatch_request(dd, rq))
>> +			goto out;
>> +	}
>> +	rq = NULL;
>> +
>> +out:
>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>> +
>> +	return rq;
>>  }
> 
> Same question here: is it really necessary to obtain zone_lock?

See above. Same comment.

Best regards.

-- 
Damien Le Moal,
Western Digital

* Re: [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices
@ 2017-10-03 20:56         ` Bart Van Assche
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Van Assche @ 2017-10-03 20:56 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Tue, 2017-10-03 at 09:19 +0900, Damien Le Moal wrote:
> On 10/3/17 08:44, Bart Van Assche wrote:
> > On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
> > >  static void deadline_wunlock_zone(struct deadline_data *dd,
> > >  				  struct request *rq)
> > >  {
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&dd->zone_lock, flags);
> > > +
> > >  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
> > >  	deadline_clear_request_zone_wlock(rq);
> > > +
> > > +	spin_unlock_irqrestore(&dd->zone_lock, flags);
> > >  }
> > 
> > Is a request removed from the sort and fifo lists before being dispatched? If so,
> > does that mean that obtaining zone_lock in the above function is not necessary?
> 
> Yes, a request is removed from the sort tree and fifo list before
> dispatching. But the dd->zone_lock spinlock is not there to protect
> that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
> added to prevent the completed_request() method from changing a zone
> lock state while deadline_fifo_request() or deadline_next_request() are
> running. Ex:
> 
> Imagine this case: write request A for a zone Z is being executed (it
> was dispatched) so Z is locked. Dispatch runs and inspects the next
> requests in sort order. Let's say we have the sequential writes B, C, D,
> E queued for the same zone Z. First B is inspected and cannot be
> dispatched (zone locked). Inspection moves on to C, but before that,
> A completes and Z is unlocked. Then C will be OK to go as the zone is
> now unlocked. But it is the wrong choice as it will result in an
> out-of-order write. B must be the next request dispatched after A.
> 
> dd->zone_lock prevents this from happening. Without this spinlock, the
> bad example case above happens very easily.

Hello Damien,

Thanks for the detailed and really clear reply. I hope you do not mind that I
have a few further questions about this patch?
- Does the zone_lock spinlock have to be acquired by both deadline_wunlock_zone()
  callers or only by the call from the request completion path?
- Why do both the mq-deadline and the sd driver each have their own instance of
  the zones_wlock bitmap? Has it been considered to convert both bitmaps into a
  single bitmap that is shared by both kernel components and that exists e.g. at
  the request queue level?

Thanks,

Bart.

* Re: [PATCH V6 13/14] block: mq-deadline: Limit write request dispatch for zoned block devices
  2017-10-03 20:56         ` Bart Van Assche
@ 2017-10-03 23:03         ` Damien Le Moal
  -1 siblings, 0 replies; 31+ messages in thread
From: Damien Le Moal @ 2017-10-03 23:03 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi, linux-block, martin.petersen, axboe; +Cc: hch

Bart,

On 10/4/17 05:56, Bart Van Assche wrote:
> On Tue, 2017-10-03 at 09:19 +0900, Damien Le Moal wrote:
>> On 10/3/17 08:44, Bart Van Assche wrote:
>>> On Mon, 2017-10-02 at 16:15 +0900, Damien Le Moal wrote:
>>>>  static void deadline_wunlock_zone(struct deadline_data *dd,
>>>>  				  struct request *rq)
>>>>  {
>>>> +	unsigned long flags;
>>>> +
>>>> +	spin_lock_irqsave(&dd->zone_lock, flags);
>>>> +
>>>>  	WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq), dd->zones_wlock));
>>>>  	deadline_clear_request_zone_wlock(rq);
>>>> +
>>>> +	spin_unlock_irqrestore(&dd->zone_lock, flags);
>>>>  }
>>>
>>> Is a request removed from the sort and fifo lists before being dispatched? If so,
>>> does that mean that obtaining zone_lock in the above function is not necessary?
>>
>> Yes, a request is removed from the sort tree and fifo list before
>> dispatching. But the dd->zone_lock spinlock is not there to protect
>> that; dd->lock protects the sort tree and fifo list. dd->zone_lock was
>> added to prevent the completed_request() method from changing a zone
>> lock state while deadline_fifo_request() or deadline_next_request() are
>> running. Ex:
>>
>> Imagine this case: write request A for a zone Z is being executed (it
>> was dispatched) so Z is locked. Dispatch runs and inspects the next
>> requests in sort order. Let's say we have the sequential writes B, C, D,
>> E queued for the same zone Z. First B is inspected and cannot be
>> dispatched (zone locked). Inspection moves on to C, but before that,
>> A completes and Z is unlocked. Then C will be OK to go as the zone is
>> now unlocked. But it is the wrong choice as it will result in an
>> out-of-order write. B must be the next request dispatched after A.
>>
>> dd->zone_lock prevents this from happening. Without this spinlock, the
>> bad example case above happens very easily.
> 
> Hello Damien,
> 
> Thanks for the detailed and really clear reply. I hope you do not mind that I
> have a few further questions about this patch?
> - Does the zone_lock spinlock have to be acquired by both deadline_wunlock_zone()
>   callers or only by the call from the request completion path?

Not really. Only the call from the completion path strictly needs it,
since the insert path unlock is done under dd->lock, which prevents
concurrent execution of the sort or fifo request search. So the
zone_lock lock/unlock could be moved out of deadline_wunlock_zone()
and done directly in dd_completed_request(), as sketched below.
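
A minimal sketch of that rework (illustrative only, reusing the helpers
from this series; deadline_wunlock_zone() would be reduced to the bit
clear and the request flag clear, with no locking of its own):

static void dd_completed_request(struct request *rq)
{
	if (deadline_request_has_zone_wlock(rq)) {
		struct deadline_data *dd = rq->q->elevator->elevator_data;
		unsigned long flags;

		/*
		 * Serialize against the zone lock state checks done by
		 * the dispatch-time request search.
		 */
		spin_lock_irqsave(&dd->zone_lock, flags);
		deadline_wunlock_zone(dd, rq);
		spin_unlock_irqrestore(&dd->zone_lock, flags);
	}
}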

> - Why do both the mq-deadline and the sd driver each have their own instance of
>   the zones_wlock bitmap? Has it been considered to convert both bitmaps into a
>   single bitmap that is shared by both kernel components and that exists e.g. at
>   the request queue level?

The sd driver level zone locking handles only the legacy path, hence the
zone lock bitmap attached to the scsi disk struct. For scsi-mq/blk-mq,
mq-deadline does the zone locking. The two bitmaps never exist at the
same time.

Indeed, we could move the zone lock bitmap into the request queue so that
it is common between the legacy and mq cases. Christoph has a series doing
that, and going further by doing the zone locking directly within the
block layer dispatch code for both the legacy and mq paths. That would
make the scsi level locking and the mq-deadline locking unnecessary. I am
working on that new series right now.

Best regards.

-- 
Damien Le Moal,
Western Digital
