* [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks
@ 2017-09-01 11:36 Damien Le Moal
  2017-09-01 11:36 ` [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions Damien Le Moal
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

This series implements support for ZBC disks used through the blk-mq/scsi-mq
I/O path.

The current scsi-level support of ZBC disks guarantees write request ordering
using a per-zone write lock which prevents simultaneously issuing multiple
write commands to a zone, thereby avoiding reordering of sequential writes to
sequential zones. This method is however ineffective when scsi-mq is used with
zoned block devices. This is due to the different execution model of blk-mq,
which passes a request to the scsi layer for dispatching after the request has
been removed from the I/O scheduler queue. That is, when the scsi layer tries
to lock the target zone of the request, the request may already be out of
order and zone write locking fails to prevent that.

Various approaches have been tried to solve this problem. All of them had the
serious disadvantage of cluttering blk-mq code with zoned block device specific
conditions and processing. As such extensive changes can only transform into
maintenance nightmares, a radically different solution is proposed here.

This series implements support in the form of a new "zoned" I/O scheduler
based on mq-deadline. Zoned is mostly identical to the mq-deadline scheduler.
It only differs from mq-deadline by the addition of a per-zone write locking
mechanism similar to the one implemented in sd_zbc.c. The zoned scheduler's
zone write locking mechanism is used for the exact same purpose as the one in
the scsi disk driver: limit writes per zone to one to avoid reordering. The
locking context however changes and is moved to the dispatch_request method
of the scheduler, that is, target zones of write requests are locked before
the requests are issued. In effect, this results in the same behavior as the
legacy scsi path: sequential write ordering is preserved.

The zoned scheduler is added as part of the scsi code under drivers/scsi. This
allows access to information on the disk zones maintained within the device
scsi_disk structure, as this information (e.g. zone types) cannot be retrieved
from the context of an I/O scheduler initialization method (init_queue()
method).
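The per-zone write locking idea at the core of both sd_zbc.c and the proposed
zoned scheduler can be sketched in userspace C as follows (the names and the
single 64-bit bitmap word are illustrative stand-ins, not the kernel's
implementation, which uses an atomic test_and_set_bit() on a dynamically
sized bitmap):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of per-zone write locking: one bit per zone; a write request
 * may be dispatched only if its target zone's bit was clear and is now
 * set. A plain 64-bit word stands in for the kernel bitmap.
 */
static uint64_t zones_wlock; /* supports up to 64 zones in this sketch */

/* Zone sizes are powers of two, so the zone number is a simple shift. */
static unsigned int zone_no(uint64_t sector, unsigned int zone_shift)
{
	return (unsigned int)(sector >> zone_shift);
}

/*
 * Try to take the write lock of the zone containing 'sector'. Returns
 * false when a write is already in flight for that zone, in which case
 * the request must be deferred to preserve sequential write ordering.
 */
static bool zone_write_trylock(uint64_t sector, unsigned int zone_shift)
{
	uint64_t bit = 1ULL << zone_no(sector, zone_shift);

	if (zones_wlock & bit)
		return false;
	zones_wlock |= bit;
	return true;
}

/* Release the zone write lock on write command completion. */
static void zone_write_unlock(uint64_t sector, unsigned int zone_shift)
{
	zones_wlock &= ~(1ULL << zone_no(sector, zone_shift));
}
```

With this scheme a second write targeting an already locked zone is deferred
until the first write completes and unlocks the zone, which is exactly the
ordering guarantee the series moves into the scheduler's dispatch_request
method.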

The patches of this series are as follows:
- The first 2 patches fix exported symbols declaration to allow compiling an
  I/O scheduler outside of the block/ directory. No functional changes from
  these patches.
- Patches 3 and 4 reorganize and clean up the scsi ZBC support to facilitate
  the introduction of the scheduler. No functional changes from these patches.
- Patch 5 fixes a bug in sd_zbc_read_zoned_characteristics().
- Patch 6 is an optimization for the legacy scsi path which avoids zone
  locking if a write request targets a conventional zone.
- Patch 7 disables zone write locking for disks accessed through scsi-mq.
- Patch 8 introduces the zoned scheduler.

Comments are as always very much appreciated.

Thank you.


Damien Le Moal (8):
  block: Fix declaration of blk-mq debugfs functions
  block: Fix declaration of blk-mq scheduler functions
  scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
  scsi: sd_zbc: Reorganize and cleanup
  scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()
  scsi: sd_zbc: Limit zone write locking to sequential zones
  scsi: sd_zbc: Disable zone write locking with scsi-mq
  scsi: Introduce ZBC disk I/O scheduler

 Documentation/block/zoned-iosched.txt |  48 ++
 block/Kconfig.iosched                 |  12 +
 block/blk-mq-debugfs.h                |  14 -
 block/blk-mq-sched.h                  |   7 -
 drivers/scsi/Makefile                 |   1 +
 drivers/scsi/sd.c                     |   1 +
 drivers/scsi/sd.h                     |  57 +--
 drivers/scsi/sd_zbc.c                 | 236 +++++----
 drivers/scsi/sd_zbc.h                 |  79 +++
 drivers/scsi/sd_zbc_iosched.c         | 934 ++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h                |  30 ++
 include/scsi/scsi_proto.h             |  46 +-
 12 files changed, 1293 insertions(+), 172 deletions(-)
 create mode 100644 Documentation/block/zoned-iosched.txt
 create mode 100644 drivers/scsi/sd_zbc.h
 create mode 100644 drivers/scsi/sd_zbc_iosched.c

-- 
2.13.5

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:00     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions Damien Le Moal
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

__blk_mq_debugfs_rq_show() and blk_mq_debugfs_rq_show() are exported
symbols but declared in the block internal file block/blk-mq-debugfs.h.
Move the declaration of these functions to the public linux/blk-mq.h
file to make these functions available to other modules.

While at it, also move the definition of the blk_mq_debugfs_attr
structure to allow scheduler modules outside of the block layer to define
debugfs attributes.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq-debugfs.h | 14 --------------
 include/linux/blk-mq.h | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index a182e6f97565..85759aef53e1 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -3,20 +3,6 @@
 
 #ifdef CONFIG_BLK_DEBUG_FS
 
-#include <linux/seq_file.h>
-
-struct blk_mq_debugfs_attr {
-	const char *name;
-	umode_t mode;
-	int (*show)(void *, struct seq_file *);
-	ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
-	/* Set either .show or .seq_ops. */
-	const struct seq_operations *seq_ops;
-};
-
-int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
-int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
-
 int blk_mq_debugfs_register(struct request_queue *q);
 void blk_mq_debugfs_unregister(struct request_queue *q);
 int blk_mq_debugfs_register_hctx(struct request_queue *q,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 14542308d25b..a369174a9679 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,21 @@
 #include <linux/sbitmap.h>
 #include <linux/srcu.h>
 
+#ifdef CONFIG_BLK_DEBUG_FS
+
+#include <linux/seq_file.h>
+
+struct blk_mq_debugfs_attr {
+	const char *name;
+	umode_t mode;
+	int (*show)(void *, struct seq_file *);
+	ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
+	/* Set either .show or .seq_ops. */
+	const struct seq_operations *seq_ops;
+};
+
+#endif
+
 struct blk_mq_tags;
 struct blk_flush_queue;
 
@@ -289,4 +304,9 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
 	for ((i) = 0; (i) < (hctx)->nr_ctx &&				\
 	     ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
 
+#ifdef CONFIG_BLK_DEBUG_FS
+int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
+int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
+#endif
+
 #endif
-- 
2.13.5


* [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
  2017-09-01 11:36 ` [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:04     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

The functions blk_mq_sched_free_hctx_data(), blk_mq_sched_try_merge(),
blk_mq_sched_try_insert_merge() and blk_mq_sched_request_inserted() are
all exported symbols but declared only internally in
block/blk-mq-sched.h. Move their declaration to include/linux/blk-mq.h
to make them available to block external scheduler modules.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq-sched.h   |  7 -------
 include/linux/blk-mq.h | 10 ++++++++++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..3b725f8fbebe 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -4,16 +4,9 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
-void blk_mq_sched_free_hctx_data(struct request_queue *q,
-				 void (*exit)(struct blk_mq_hw_ctx *));
-
 void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio);
 
-void blk_mq_sched_request_inserted(struct request *rq);
-bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-				struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
-bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a369174a9679..128742552549 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -284,6 +284,16 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
 /*
+ * Scheduler helper functions.
+ */
+void blk_mq_sched_free_hctx_data(struct request_queue *q,
+				 void (*exit)(struct blk_mq_hw_ctx *));
+bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
+			    struct request **merged_request);
+bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
+void blk_mq_sched_request_inserted(struct request *rq);
+
+/*
  * Driver command data is immediately after the request. So subtract request
  * size to get back to the original request, add request size to get the PDU.
  */
-- 
2.13.5


* [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
  2017-09-01 11:36 ` [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions Damien Le Moal
  2017-09-01 11:36 ` [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:08     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup Damien Le Moal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Move standard macro definitions for the zone types and zone conditions
to scsi_proto.h together with the definitions related to the
REPORT ZONES command.

While at it, explicitly define all values in the enums for clarity.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c     | 18 ------------------
 include/scsi/scsi_proto.h | 46 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 8aa54779aac1..d445a57f99bd 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -37,24 +37,6 @@
 #include "sd.h"
 #include "scsi_priv.h"
 
-enum zbc_zone_type {
-	ZBC_ZONE_TYPE_CONV = 0x1,
-	ZBC_ZONE_TYPE_SEQWRITE_REQ,
-	ZBC_ZONE_TYPE_SEQWRITE_PREF,
-	ZBC_ZONE_TYPE_RESERVED,
-};
-
-enum zbc_zone_cond {
-	ZBC_ZONE_COND_NO_WP,
-	ZBC_ZONE_COND_EMPTY,
-	ZBC_ZONE_COND_IMP_OPEN,
-	ZBC_ZONE_COND_EXP_OPEN,
-	ZBC_ZONE_COND_CLOSED,
-	ZBC_ZONE_COND_READONLY = 0xd,
-	ZBC_ZONE_COND_FULL,
-	ZBC_ZONE_COND_OFFLINE,
-};
-
 /**
  * Convert a zone descriptor to a zone struct.
  */
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 8c285d9a06d8..20d7b3fbc351 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -301,19 +301,43 @@ struct scsi_lun {
 
 /* Reporting options for REPORT ZONES */
 enum zbc_zone_reporting_options {
-	ZBC_ZONE_REPORTING_OPTION_ALL = 0,
-	ZBC_ZONE_REPORTING_OPTION_EMPTY,
-	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
-	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
-	ZBC_ZONE_REPORTING_OPTION_CLOSED,
-	ZBC_ZONE_REPORTING_OPTION_FULL,
-	ZBC_ZONE_REPORTING_OPTION_READONLY,
-	ZBC_ZONE_REPORTING_OPTION_OFFLINE,
-	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
-	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
-	ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+	ZBC_ZONE_REPORTING_OPTION_ALL		= 0x00,
+	ZBC_ZONE_REPORTING_OPTION_EMPTY		= 0x01,
+	ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN	= 0x02,
+	ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN	= 0x03,
+	ZBC_ZONE_REPORTING_OPTION_CLOSED	= 0x04,
+	ZBC_ZONE_REPORTING_OPTION_FULL		= 0x05,
+	ZBC_ZONE_REPORTING_OPTION_READONLY	= 0x06,
+	ZBC_ZONE_REPORTING_OPTION_OFFLINE	= 0x07,
+	/* 0x08 to 0x0f are reserved */
+	ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP	= 0x10,
+	ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE	= 0x11,
+	/* 0x12 to 0x3e are reserved */
+	ZBC_ZONE_REPORTING_OPTION_NON_WP	= 0x3f,
 };
 
 #define ZBC_REPORT_ZONE_PARTIAL 0x80
 
+/* Zone types of REPORT ZONES zone descriptors */
+enum zbc_zone_type {
+	ZBC_ZONE_TYPE_CONV		= 0x1,
+	ZBC_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
+	ZBC_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
+	/* 0x4 to 0xf are reserved */
+	ZBC_ZONE_TYPE_RESERVED		= 0x4,
+};
+
+/* Zone conditions of REPORT ZONES zone descriptors */
+enum zbc_zone_cond {
+	ZBC_ZONE_COND_NO_WP		= 0x0,
+	ZBC_ZONE_COND_EMPTY		= 0x1,
+	ZBC_ZONE_COND_IMP_OPEN		= 0x2,
+	ZBC_ZONE_COND_EXP_OPEN		= 0x3,
+	ZBC_ZONE_COND_CLOSED		= 0x4,
+	/* 0x5 to 0xc are reserved */
+	ZBC_ZONE_COND_READONLY		= 0xd,
+	ZBC_ZONE_COND_FULL		= 0xe,
+	ZBC_ZONE_COND_OFFLINE		= 0xf,
+};
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.13.5


* [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
                   ` (2 preceding siblings ...)
  2017-09-01 11:36 ` [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:14     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics() Damien Le Moal
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Introduce sd_zbc.h for ZBC-related declarations (avoids cluttering sd.h).

Fix comments style in sd_zbc.c (do not use documentation format) and
add/fix comments to enhance explanations. Also remove a useless blank
line in sd_zbc_read_zones().

Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
assignment and simplify nr_zones calculation.

Finally, use the min() macro in sd_zbc_check_zone_size() to get the report
zones reply size instead of the open coded version.
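The nr_zones simplification replaces an open-coded divide-and-increment with
a single round_up(). A quick userspace check of the equivalence for
power-of-two zone sizes (helper names here are local stand-ins, not kernel
symbols):

```c
#include <assert.h>
#include <stdint.h>

/*
 * round_up() equivalent for power-of-two alignment, as used by the
 * cleanup: nr_zones = round_up(capacity, zone_blocks) >> zone_shift.
 */
static uint64_t nr_zones_round_up(uint64_t capacity, uint64_t zone_blocks,
				  unsigned int zone_shift)
{
	return ((capacity + zone_blocks - 1) & ~(zone_blocks - 1)) >> zone_shift;
}

/* The open-coded version the cleanup removes: divide, then add one if
 * the capacity is not a multiple of the zone size (runt last zone). */
static uint64_t nr_zones_open_coded(uint64_t capacity, uint64_t zone_blocks,
				    unsigned int zone_shift)
{
	uint64_t nr_zones = capacity >> zone_shift;

	if (capacity & (zone_blocks - 1))
		nr_zones++;
	return nr_zones;
}
```

Both forms count a smaller last zone as a full zone, so the scsi disk
revalidation code sizes its zone bitmaps identically before and after the
cleanup.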

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd.c     |  1 +
 drivers/scsi/sd.h     | 53 --------------------------------
 drivers/scsi/sd_zbc.c | 85 +++++++++++++++++++++++++--------------------------
 drivers/scsi/sd_zbc.h | 75 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 97 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.h

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 962cef13d406..6711d49fde79 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -68,6 +68,7 @@
 #include <scsi/scsicam.h>
 
 #include "sd.h"
+#include "sd_zbc.h"
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 99c4dde9b6bf..5940390d30f3 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -272,57 +272,4 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-static inline int sd_is_zoned(struct scsi_disk *sdkp)
-{
-	return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
-}
-
-#ifdef CONFIG_BLK_DEV_ZONED
-
-extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
-extern void sd_zbc_remove(struct scsi_disk *sdkp);
-extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
-extern int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd);
-extern void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd);
-extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
-extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
-extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
-			    struct scsi_sense_hdr *sshdr);
-
-#else /* CONFIG_BLK_DEV_ZONED */
-
-static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
-				    unsigned char *buf)
-{
-	return 0;
-}
-
-static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
-
-static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
-
-static inline int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
-{
-	/* Let the drive fail requests */
-	return BLKPREP_OK;
-}
-
-static inline void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd) {}
-
-static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
-{
-	return BLKPREP_INVALID;
-}
-
-static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
-{
-	return BLKPREP_INVALID;
-}
-
-static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
-				   unsigned int good_bytes,
-				   struct scsi_sense_hdr *sshdr) {}
-
-#endif /* CONFIG_BLK_DEV_ZONED */
-
 #endif /* _SCSI_DISK_H */
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index d445a57f99bd..810377007665 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -26,22 +26,12 @@
 
 #include <asm/unaligned.h>
 
-#include <scsi/scsi.h>
-#include <scsi/scsi_cmnd.h>
-#include <scsi/scsi_dbg.h>
-#include <scsi/scsi_device.h>
-#include <scsi/scsi_driver.h>
-#include <scsi/scsi_host.h>
-#include <scsi/scsi_eh.h>
-
-#include "sd.h"
-#include "scsi_priv.h"
-
-/**
+#include "sd_zbc.h"
+
+/*
  * Convert a zone descriptor to a zone struct.
  */
-static void sd_zbc_parse_report(struct scsi_disk *sdkp,
-				u8 *buf,
+static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
 				struct blk_zone *zone)
 {
 	struct scsi_device *sdp = sdkp->device;
@@ -63,8 +53,9 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp,
 		zone->wp = zone->start + zone->len;
 }
 
-/**
+/*
  * Issue a REPORT ZONES scsi command.
+ * For internal use.
  */
 static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 			       unsigned int buflen, sector_t lba)
@@ -105,6 +96,9 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
 	return 0;
 }
 
+/*
+ * Prepare a REPORT ZONES scsi command for a REQ_OP_ZONE_REPORT request.
+ */
 int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -147,6 +141,10 @@ int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/*
+ * Process a REPORT ZONES scsi command reply, converting all reported
+ * zone descriptors to blk_zone structures.
+ */
 static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 					 unsigned int good_bytes)
 {
@@ -202,17 +200,9 @@ static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 	local_irq_restore(flags);
 }
 
-static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
-{
-	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
-}
-
-static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
-					  sector_t sector)
-{
-	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
-}
-
+/*
+ * Prepare a RESET WRITE POINTER scsi command for a REQ_OP_ZONE_RESET request.
+ */
 int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -245,6 +235,11 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/*
+ * Write lock a sequential zone.
+ * Called from sd_init_cmd() for write requests
+ * (standard write, write same or write zeroes operations).
+ */
 int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -282,6 +277,10 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+/*
+ * Write unlock a sequential zone.
+ * Called from sd_uninit_cmd().
+ */
 void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -341,7 +340,7 @@ void sd_zbc_complete(struct scsi_cmnd *cmd,
 	}
 }
 
-/**
+/*
  * Read zoned block device characteristics (VPD page B6).
  */
 static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
@@ -371,7 +370,7 @@ static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 	return 0;
 }
 
-/**
+/*
  * Check reported capacity.
  */
 static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
@@ -403,8 +402,12 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp,
 	return 0;
 }
 
-#define SD_ZBC_BUF_SIZE 131072
+#define SD_ZBC_BUF_SIZE 131072U
 
+/*
+ * Check that all zones of the device have the same size, with eventually
+ * the exception of the last zone which is allowed to have a smaller size.
+ */
 static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 {
 	u64 zone_blocks;
@@ -447,10 +450,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 		/* Parse REPORT ZONES header */
 		list_length = get_unaligned_be32(&buf[0]) + 64;
 		rec = buf + 64;
-		if (list_length < SD_ZBC_BUF_SIZE)
-			buf_len = list_length;
-		else
-			buf_len = SD_ZBC_BUF_SIZE;
+		buf_len = min(list_length, SD_ZBC_BUF_SIZE);
 
 		/* Parse zone descriptors */
 		while (rec < buf + buf_len) {
@@ -505,6 +505,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	}
 
 	sdkp->zone_blocks = zone_blocks;
+	sdkp->zone_shift = ilog2(zone_blocks);
 
 	return 0;
 }
@@ -512,13 +513,15 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
 
+	/* READ16/WRITE16 is mandatory for ZBC disks */
+	sdkp->device->use_16_for_rw = 1;
+	sdkp->device->use_10_for_rw = 0;
+
 	/* chunk_sectors indicates the zone size */
 	blk_queue_chunk_sectors(sdkp->disk->queue,
 			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
-	sdkp->zone_shift = ilog2(sdkp->zone_blocks);
-	sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
-	if (sdkp->capacity & (sdkp->zone_blocks - 1))
-		sdkp->nr_zones++;
+	sdkp->nr_zones =
+		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
 	if (!sdkp->zones_wlock) {
 		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
@@ -531,8 +534,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	return 0;
 }
 
-int sd_zbc_read_zones(struct scsi_disk *sdkp,
-		      unsigned char *buf)
+int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
 	int ret;
 
@@ -543,7 +545,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp,
 		 */
 		return 0;
 
-
 	/* Get zoned block device characteristics */
 	ret = sd_zbc_read_zoned_characteristics(sdkp, buf);
 	if (ret)
@@ -580,10 +581,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp,
 	if (ret)
 		goto err;
 
-	/* READ16/WRITE16 is mandatory for ZBC disks */
-	sdkp->device->use_16_for_rw = 1;
-	sdkp->device->use_10_for_rw = 0;
-
 	return 0;
 
 err:
diff --git a/drivers/scsi/sd_zbc.h b/drivers/scsi/sd_zbc.h
new file mode 100644
index 000000000000..c120672d56d6
--- /dev/null
+++ b/drivers/scsi/sd_zbc.h
@@ -0,0 +1,75 @@
+#ifndef _SCSI_DISK_ZBC_H
+#define _SCSI_DISK_ZBC_H
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include "sd.h"
+
+static inline int sd_is_zoned(struct scsi_disk *sdkp)
+{
+	return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
+}
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
+{
+	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
+}
+
+static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
+					  sector_t sector)
+{
+	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
+}
+
+extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
+extern void sd_zbc_remove(struct scsi_disk *sdkp);
+extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
+
+extern int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd);
+extern void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd);
+
+extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
+extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
+extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
+			    struct scsi_sense_hdr *sshdr);
+
+#else /* CONFIG_BLK_DEV_ZONED */
+
+static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
+				    unsigned char *buf)
+{
+	return 0;
+}
+
+static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
+
+static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
+
+static inline int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
+{
+	/* Let the drive fail requests */
+	return BLKPREP_OK;
+}
+
+static inline void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd) {}
+
+static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_INVALID;
+}
+
+static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
+{
+	return BLKPREP_INVALID;
+}
+
+static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
+				   unsigned int good_bytes,
+				   struct scsi_sense_hdr *sshdr) {}
+
+#endif /* CONFIG_BLK_DEV_ZONED */
+
+#endif /* _SCSI_DISK_ZBC_H */
-- 
2.13.5


* [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
                   ` (3 preceding siblings ...)
  2017-09-01 11:36 ` [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:15     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones Damien Le Moal
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche, stable

The three values starting at byte 8 of the Zoned Block Device
Characteristics VPD page B6h are 32-bit values, not 64-bit. So use
get_unaligned_be32() to retrieve the values, not get_unaligned_be64().
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: <stable@vger.kernel.org>

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 810377007665..0e6b5f5d14e7 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -356,15 +356,15 @@ static int sd_zbc_read_zoned_characteristics(struct scsi_disk *sdkp,
 	if (sdkp->device->type != TYPE_ZBC) {
 		/* Host-aware */
 		sdkp->urswrz = 1;
-		sdkp->zones_optimal_open = get_unaligned_be64(&buf[8]);
-		sdkp->zones_optimal_nonseq = get_unaligned_be64(&buf[12]);
+		sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
+		sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
 		sdkp->zones_max_open = 0;
 	} else {
 		/* Host-managed */
 		sdkp->urswrz = buf[4] & 1;
 		sdkp->zones_optimal_open = 0;
 		sdkp->zones_optimal_nonseq = 0;
-		sdkp->zones_max_open = get_unaligned_be64(&buf[16]);
+		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
 	}
 
 	return 0;
-- 
2.13.5


* [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
                   ` (4 preceding siblings ...)
  2017-09-01 11:36 ` [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics() Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:25     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq Damien Le Moal
  2017-09-01 11:36 ` [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler Damien Le Moal
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Zoned block devices have no write constraints for conventional zones
so write locking of conventional zones is not necessary and can hurt
performance. To avoid this, introduce the seq_zones bitmap to indicate
if a zone is a sequential one. Use this information to allow any write
to be issued to conventional zones, locking only sequential zones.

As the allocation of the seq_zones bitmap is identical to that of the zone
write lock bitmap, introduce the helper sd_zbc_alloc_zone_bitmap(). Using
this helper, wait for the disk capacity and number of zones to stabilize on
the second revalidation pass before allocating and initializing the bitmaps.
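The resulting write preparation decision can be sketched as follows
(userspace approximation with single 64-bit words standing in for the
kernel's dynamically allocated bitmaps and its atomic test_bit()/
test_and_set_bit() operations; names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

enum prep_ret { PREP_OK, PREP_DEFER };

/*
 * Sketch of the optimized write lock path: writes to conventional zones
 * (seq_zones bit clear) are issued immediately without any locking;
 * writes to sequential zones take the per-zone write lock, or are
 * deferred if another write to the same zone is already in flight.
 */
static enum prep_ret write_lock_zone(uint64_t seq_zones,
				     uint64_t *zones_wlock,
				     unsigned int zno)
{
	uint64_t bit = 1ULL << zno;

	if (!(seq_zones & bit))
		return PREP_OK;		/* conventional zone: no lock needed */
	if (*zones_wlock & bit)
		return PREP_DEFER;	/* write already in flight */
	*zones_wlock |= bit;
	return PREP_OK;
}
```

Skipping the lock for conventional zones removes both the bitmap contention
and the artificial one-write-at-a-time limit for zones that have no write
pointer constraint, which is the performance gain this patch targets.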

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd.h     |   1 +
 drivers/scsi/sd_zbc.c | 113 ++++++++++++++++++++++++++++++++++++++++++--------
 drivers/scsi/sd_zbc.h |   9 ++++
 3 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5940390d30f3..761d3fbf72ef 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -77,6 +77,7 @@ struct scsi_disk {
 	unsigned int	zone_blocks;
 	unsigned int	zone_shift;
 	unsigned long	*zones_wlock;
+	unsigned long	*seq_zones;
 	unsigned int	zones_optimal_open;
 	unsigned int	zones_optimal_nonseq;
 	unsigned int	zones_max_open;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 0e6b5f5d14e7..3a9feadcc133 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -246,7 +246,7 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
 	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
-	unsigned int zno = sd_zbc_zone_no(sdkp, sector);
+	unsigned int zno;
 
 	/*
 	 * Note: Checks of the alignment of the write command on
@@ -254,21 +254,20 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	 */
 
 	/* Do not allow zone boundaries crossing on host-managed drives */
-	if (blk_queue_zoned_model(sdkp->disk->queue) == BLK_ZONED_HM &&
+	if (blk_queue_zoned_model(rq->q) == BLK_ZONED_HM &&
 	    (sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
 		return BLKPREP_KILL;
 
 	/*
-	 * Do not issue more than one write at a time per
-	 * zone. This solves write ordering problems due to
-	 * the unlocking of the request queue in the dispatch
-	 * path in the non scsi-mq case. For scsi-mq, this
-	 * also avoids potential write reordering when multiple
-	 * threads running on different CPUs write to the same
-	 * zone (with a synchronized sequential pattern).
+	 * There is no write constraint on conventional zones, but do not issue
+	 * more than one write at a time per sequential zone. This avoids write
+	 * ordering problems due to the unlocking of the request queue in the
+	 * dispatch path of the non scsi-mq (legacy) case.
 	 */
-	if (sdkp->zones_wlock &&
-	    test_and_set_bit(zno, sdkp->zones_wlock))
+	zno = sd_zbc_zone_no(sdkp, sector);
+	if (!test_bit(zno, sdkp->seq_zones))
+		return BLKPREP_OK;
+	if (test_and_set_bit(zno, sdkp->zones_wlock))
 		return BLKPREP_DEFER;
 
 	WARN_ON_ONCE(cmd->flags & SCMD_ZONE_WRITE_LOCK);
@@ -286,8 +285,9 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 	struct request *rq = cmd->request;
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 
-	if (sdkp->zones_wlock && cmd->flags & SCMD_ZONE_WRITE_LOCK) {
+	if (cmd->flags & SCMD_ZONE_WRITE_LOCK) {
 		unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
+
 		WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
 		cmd->flags &= ~SCMD_ZONE_WRITE_LOCK;
 		clear_bit_unlock(zno, sdkp->zones_wlock);
@@ -510,8 +510,67 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	return 0;
 }
 
+/*
+ * Initialize the sequential zone bitmap to allow identifying sequential zones.
+ */
+static int sd_zbc_setup_seq_zones(struct scsi_disk *sdkp)
+{
+	unsigned long *seq_zones;
+	sector_t block = 0;
+	unsigned char *buf;
+	unsigned char *rec;
+	unsigned int buf_len;
+	unsigned int list_length;
+	unsigned int n = 0;
+	int ret = -ENOMEM;
+
+	/* Allocate zone type bitmap */
+	seq_zones = sd_zbc_alloc_zone_bitmap(sdkp);
+	if (!seq_zones)
+		return -ENOMEM;
+
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	while (block < sdkp->capacity) {
+
+		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, block);
+		if (ret)
+			goto out;
+
+		/* Parse reported zone descriptors */
+		list_length = get_unaligned_be32(&buf[0]) + 64;
+		rec = buf + 64;
+		buf_len = min(list_length, SD_ZBC_BUF_SIZE);
+		while (rec < buf + buf_len) {
+			if ((rec[0] & 0x0f) != ZBC_ZONE_TYPE_CONV)
+				set_bit(n, seq_zones);
+			block += get_unaligned_be64(&rec[8]);
+			rec += 64;
+			n++;
+		}
+
+	}
+
+	if (n != sdkp->nr_zones)
+		ret = -EIO;
+
+out:
+	kfree(buf);
+	if (ret) {
+		kfree(seq_zones);
+		return ret;
+	}
+
+	sdkp->seq_zones = seq_zones;
+
+	return 0;
+}
+
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
+	int ret;
 
 	/* READ16/WRITE16 is mandatory for ZBC disks */
 	sdkp->device->use_16_for_rw = 1;
@@ -523,17 +582,37 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	sdkp->nr_zones =
 		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
+	/*
+	 * Wait for the disk capacity to stabilize before
+	 * initializing zone related information.
+	 */
+	if (sdkp->first_scan)
+		return 0;
+
 	if (!sdkp->zones_wlock) {
-		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
-					    sizeof(unsigned long),
-					    GFP_KERNEL);
+		sdkp->zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
 		if (!sdkp->zones_wlock)
 			return -ENOMEM;
 	}
 
+	if (!sdkp->seq_zones) {
+		ret = sd_zbc_setup_seq_zones(sdkp);
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
+static void sd_zbc_cleanup(struct scsi_disk *sdkp)
+{
+	kfree(sdkp->seq_zones);
+	sdkp->seq_zones = NULL;
+
+	kfree(sdkp->zones_wlock);
+	sdkp->zones_wlock = NULL;
+}
+
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
 	int ret;
@@ -585,14 +664,14 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 
 err:
 	sdkp->capacity = 0;
+	sd_zbc_cleanup(sdkp);
 
 	return ret;
 }
 
 void sd_zbc_remove(struct scsi_disk *sdkp)
 {
-	kfree(sdkp->zones_wlock);
-	sdkp->zones_wlock = NULL;
+	sd_zbc_cleanup(sdkp);
 }
 
 void sd_zbc_print_zones(struct scsi_disk *sdkp)
diff --git a/drivers/scsi/sd_zbc.h b/drivers/scsi/sd_zbc.h
index c120672d56d6..bf2126e96ded 100644
--- a/drivers/scsi/sd_zbc.h
+++ b/drivers/scsi/sd_zbc.h
@@ -24,6 +24,15 @@ static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
 	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
 }
 
+static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+
+	return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
+			    * sizeof(unsigned long),
+			    GFP_KERNEL, q->node);
+}
+
 extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
 extern void sd_zbc_remove(struct scsi_disk *sdkp);
 extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
                   ` (5 preceding siblings ...)
  2017-09-01 11:36 ` [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 16:26     ` Bart Van Assche
  2017-09-01 11:36 ` [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler Damien Le Moal
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

In the case of a ZBC disk used with scsi-mq, zone write locking does
not prevent write reordering in sequential zones. Unlike the legacy
case, zone locking can only be done after the command request is
removed from the scheduler dispatch queue. That is, at the time of
zone locking, the write command may already be out of order.

Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
used with scsi-mq. Write order guarantees can be provided by an
adapted I/O scheduler.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 3a9feadcc133..0f0a74fbd9c5 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -258,6 +258,10 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	    (sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
 		return BLKPREP_KILL;
 
+	/* No write locking with scsi-mq */
+	if (rq->q->mq_ops)
+		return BLKPREP_OK;
+
 	/*
 	 * There is no write constraint on conventional zones, but do not issue
 	 * more than one write at a time per sequential zone. This avoids write
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler
  2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
                   ` (6 preceding siblings ...)
  2017-09-01 11:36 ` [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq Damien Le Moal
@ 2017-09-01 11:36 ` Damien Le Moal
  2017-09-01 17:19     ` Bart Van Assche
  7 siblings, 1 reply; 25+ messages in thread
From: Damien Le Moal @ 2017-09-01 11:36 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

The zoned I/O scheduler is mostly identical to mq-deadline and retains
the same configuration attributes. The main difference is that the
zoned scheduler will ensure that at any time at most one write
request (command) per sequential zone is in flight (has been issued to
the disk) in order to protect against sequential write reordering
potentially resulting from the concurrent execution of request dispatch
by multiple contexts.

This is achieved similarly to the legacy SCSI I/O path by write locking
zones when a write request is issued. Subsequent writes to the same
zone have to wait for the completion of the previously issued write
before being in turn dispatched to the disk. This ensures that
sequential writes are processed in the correct order without needing
any modification to the execution model of blk-mq. In addition, this
also protects against write reordering at the HBA level (e.g. AHCI).

The zoned scheduler code is added under the drivers/scsi directory so
that information managed using the scsi_disk structure can be directly
referenced. This avoids cluttering the block layer with device-type
specific code.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/zoned-iosched.txt |  48 ++
 block/Kconfig.iosched                 |  12 +
 drivers/scsi/Makefile                 |   1 +
 drivers/scsi/sd.h                     |   3 +-
 drivers/scsi/sd_zbc.c                 |  16 +-
 drivers/scsi/sd_zbc.h                 |  11 +-
 drivers/scsi/sd_zbc_iosched.c         | 934 ++++++++++++++++++++++++++++++++++
 7 files changed, 1009 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/block/zoned-iosched.txt
 create mode 100644 drivers/scsi/sd_zbc_iosched.c

diff --git a/Documentation/block/zoned-iosched.txt b/Documentation/block/zoned-iosched.txt
new file mode 100644
index 000000000000..b269125bdc61
--- /dev/null
+++ b/Documentation/block/zoned-iosched.txt
@@ -0,0 +1,48 @@
+Zoned I/O scheduler
+===================
+
+The Zoned I/O scheduler solves the write ordering problems of zoned block
+devices inherent to the absence of a global request queue lock in the blk-mq
+infrastructure. Multiple contexts may simultaneously try to dispatch write
+requests to the same sequential zone of a zoned block device, potentially
+breaking the sequential write order imposed by the device.
+
+The Zoned I/O scheduler is based on the mq-deadline scheduler. It shares the
+same tunables and behaves in a comparable manner. The main difference introduced
+with the zoned scheduler is the handling of write batches. Whereas mq-deadline will
+keep dispatching write requests to the device as long as the batching size
+allows and reads are not starved, the zoned scheduler introduces additional
+constraints:
+1) At most one write request can be issued to a sequential zone of the
+device. This ensures that no reordering of sequential writes to a sequential
+zone can happen once the write request leaves the scheduler internal queue (rb
+tree).
+2) The number of sequential zones that can be written simultaneously is limited
+to the maximum number of open zones advertised by the device. This additional
+condition avoids performance degradation due to excessive open zone resource
+use at the device level.
+
+These conditions do not apply to write requests targeting conventional zones.
+For these, the zoned scheduler behaves exactly like the mq-deadline scheduler.
+
+The zoned I/O scheduler cannot be used with regular block devices. It can only
+be used with host-managed or host-aware zoned block devices.
+Using the zoned I/O scheduler is mandatory with host-managed disks unless the
+disk user itself tightly controls write sequencing to sequential zones. The
+zoned scheduler treats host-aware disks exactly the same way as host-managed
+devices. That is, even though host-aware disks can be written randomly, the
+zoned scheduler still imposes the one-write-per-zone limit so that sequential
+write sequences are preserved.
+
+For host-managed disks, automating the use of the zoned scheduler can be done
+simply with a udev rule. An example of such a rule is shown below.
+
+# Set zoned scheduler for host-managed zoned block devices
+ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/zoned}=="host-managed", \
+	ATTR{queue/scheduler}="zoned"
+
+Zoned I/O scheduler tunables
+============================
+
+Tunables of the Zoned I/O scheduler are identical to those of the deadline
+I/O scheduler. See Documentation/block/deadline-iosched.txt for details.
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index fd2cefa47d35..b87c67dbf1f6 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -70,6 +70,18 @@ config MQ_IOSCHED_DEADLINE
 	---help---
 	  MQ version of the deadline IO scheduler.
 
+config MQ_IOSCHED_ZONED
+	tristate "Zoned I/O scheduler"
+	depends on BLK_DEV_ZONED
+	depends on SCSI_MOD
+	depends on BLK_DEV_SD
+	default y
+	---help---
+	  MQ deadline IO scheduler with support for zoned block devices.
+
+	  This should be set as the default I/O scheduler for host-managed
+	  zoned block devices. It is optional for host-aware block devices.
+
 config MQ_IOSCHED_KYBER
 	tristate "Kyber I/O scheduler"
 	default y
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 93dbe58c47c8..5e35e1a7a1a4 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -176,6 +176,7 @@ hv_storvsc-y			:= storvsc_drv.o
 sd_mod-objs	:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
 sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
+obj-$(CONFIG_MQ_IOSCHED_ZONED) += sd_zbc_iosched.o
 
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 761d3fbf72ef..fe9c93dacdd9 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -75,7 +75,8 @@ struct scsi_disk {
 #ifdef CONFIG_BLK_DEV_ZONED
 	unsigned int	nr_zones;
 	unsigned int	zone_blocks;
-	unsigned int	zone_shift;
+	unsigned int	zone_sectors;
+	unsigned int	zone_sectors_shift;
 	unsigned long	*zones_wlock;
 	unsigned long	*seq_zones;
 	unsigned int	zones_optimal_open;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 0f0a74fbd9c5..7910a449cf53 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -217,7 +217,7 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 	if (sdkp->device->changed)
 		return BLKPREP_KILL;
 
-	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
+	if (sector & (sdkp->zone_sectors - 1))
 		/* Unaligned request */
 		return BLKPREP_KILL;
 
@@ -245,7 +245,7 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	struct request *rq = cmd->request;
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
-	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
+	sector_t zone_sectors = sdkp->zone_sectors;
 	unsigned int zno;
 
 	/*
@@ -268,7 +268,7 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 	 * ordering problems due to the unlocking of the request queue in the
 	 * dispatch path of the non scsi-mq (legacy) case.
 	 */
-	zno = sd_zbc_zone_no(sdkp, sector);
+	zno = sd_zbc_rq_zone_no(sdkp, rq);
 	if (!test_bit(zno, sdkp->seq_zones))
 		return BLKPREP_OK;
 	if (test_and_set_bit(zno, sdkp->zones_wlock))
@@ -290,7 +290,7 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 
 	if (cmd->flags & SCMD_ZONE_WRITE_LOCK) {
-		unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
+		unsigned int zno = sd_zbc_rq_zone_no(sdkp, rq);
 
 		WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
 		cmd->flags &= ~SCMD_ZONE_WRITE_LOCK;
@@ -509,7 +509,8 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	}
 
 	sdkp->zone_blocks = zone_blocks;
-	sdkp->zone_shift = ilog2(zone_blocks);
+	sdkp->zone_sectors = logical_to_sectors(sdkp->device, zone_blocks);
+	sdkp->zone_sectors_shift = ilog2(sdkp->zone_sectors);
 
 	return 0;
 }
@@ -574,6 +575,7 @@ static int sd_zbc_setup_seq_zones(struct scsi_disk *sdkp)
 
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
+	sector_t zone_blocks = sdkp->zone_blocks;
 	int ret;
 
 	/* READ16/WRITE16 is mandatory for ZBC disks */
@@ -582,9 +584,9 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 
 	/* chunk_sectors indicates the zone size */
 	blk_queue_chunk_sectors(sdkp->disk->queue,
-			logical_to_sectors(sdkp->device, sdkp->zone_blocks));
+				logical_to_sectors(sdkp->device, zone_blocks));
 	sdkp->nr_zones =
-		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
+		round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
 
 	/*
 	 * Wait for the disk capacity to stabilize before
diff --git a/drivers/scsi/sd_zbc.h b/drivers/scsi/sd_zbc.h
index bf2126e96ded..bb2818757e85 100644
--- a/drivers/scsi/sd_zbc.h
+++ b/drivers/scsi/sd_zbc.h
@@ -13,15 +13,10 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
-static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
+static inline unsigned int sd_zbc_rq_zone_no(struct scsi_disk *sdkp,
+					     struct request *rq)
 {
-	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
-}
-
-static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
-					  sector_t sector)
-{
-	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
+	return blk_rq_pos(rq) >> sdkp->zone_sectors_shift;
 }
 
 static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
diff --git a/drivers/scsi/sd_zbc_iosched.c b/drivers/scsi/sd_zbc_iosched.c
new file mode 100644
index 000000000000..af6328b3600f
--- /dev/null
+++ b/drivers/scsi/sd_zbc_iosched.c
@@ -0,0 +1,934 @@
+/*
+ *  Zoned MQ Deadline i/o scheduler - adaptation of the MQ deadline scheduler,
+ *  for zoned block devices used with the blk-mq scheduling framework
+ *
+ *  Copyright (C) 2016 Jens Axboe <axboe@kernel.dk>
+ *  Copyright (C) 2017 Damien Le Moal <damien.lemoal@wdc.com>
+ */
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/elevator.h>
+#include <linux/bio.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/compiler.h>
+#include <linux/rbtree.h>
+#include <linux/sbitmap.h>
+#include <linux/seq_file.h>
+
+#include "sd_zbc.h"
+
+/*
+ * See Documentation/block/deadline-iosched.txt
+ */
+
+/* max time before a read is submitted. */
+static const int read_expire = HZ / 2;
+
+/* ditto for writes, these limits are SOFT! */
+static const int write_expire = 5 * HZ;
+
+/* max times reads can starve a write */
+static const int writes_starved = 2;
+
+/*
+ * Number of sequential requests treated as one by the above parameters.
+ * For throughput.
+ */
+static const int fifo_batch = 16;
+
+/*
+ * Run time data.
+ */
+struct zoned_data {
+	/*
+	 * requests are present on both the sort_list and the fifo_list
+	 */
+	struct rb_root sort_list[2];
+	struct list_head fifo_list[2];
+
+	/*
+	 * next in sort order. read, write or both are NULL
+	 */
+	struct request *next_rq[2];
+	unsigned int batching;		/* number of sequential requests made */
+	unsigned int starved;		/* times reads have starved writes */
+
+	/*
+	 * settings that change how the i/o scheduler behaves
+	 */
+	int fifo_expire[2];
+	int fifo_batch;
+	int writes_starved;
+	int front_merges;
+
+	spinlock_t lock;
+	struct list_head dispatch;
+
+	struct scsi_disk *sdkp;
+
+	spinlock_t zones_lock;
+	unsigned long *zones_wlock;
+	unsigned long *seq_zones;
+};
+
+static inline struct rb_root *
+zoned_rb_root(struct zoned_data *zd, struct request *rq)
+{
+	return &zd->sort_list[rq_data_dir(rq)];
+}
+
+/*
+ * get the request after `rq' in sector-sorted order
+ */
+static inline struct request *
+zoned_latter_request(struct request *rq)
+{
+	struct rb_node *node = rb_next(&rq->rb_node);
+
+	if (node)
+		return rb_entry_rq(node);
+
+	return NULL;
+}
+
+static void
+zoned_add_rq_rb(struct zoned_data *zd, struct request *rq)
+{
+	struct rb_root *root = zoned_rb_root(zd, rq);
+
+	elv_rb_add(root, rq);
+}
+
+static inline void
+zoned_del_rq_rb(struct zoned_data *zd, struct request *rq)
+{
+	const int data_dir = rq_data_dir(rq);
+
+	if (zd->next_rq[data_dir] == rq)
+		zd->next_rq[data_dir] = zoned_latter_request(rq);
+
+	elv_rb_del(zoned_rb_root(zd, rq), rq);
+}
+
+/*
+ * remove rq from rbtree and fifo.
+ */
+static void zoned_remove_request(struct request_queue *q, struct request *rq)
+{
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	list_del_init(&rq->queuelist);
+
+	/*
+	 * We might not be on the rbtree, if we are doing an insert merge
+	 */
+	if (!RB_EMPTY_NODE(&rq->rb_node))
+		zoned_del_rq_rb(zd, rq);
+
+	elv_rqhash_del(q, rq);
+	if (q->last_merge == rq)
+		q->last_merge = NULL;
+}
+
+static void zd_request_merged(struct request_queue *q, struct request *req,
+			      enum elv_merge type)
+{
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	/*
+	 * if the merge was a front merge, we need to reposition request
+	 */
+	if (type == ELEVATOR_FRONT_MERGE) {
+		elv_rb_del(zoned_rb_root(zd, req), req);
+		zoned_add_rq_rb(zd, req);
+	}
+}
+
+static void zd_merged_requests(struct request_queue *q, struct request *req,
+			       struct request *next)
+{
+	/*
+	 * if next expires before rq, assign its expire time to rq
+	 * and move into next position (next will be deleted) in fifo
+	 */
+	if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
+		if (time_before((unsigned long)next->fifo_time,
+				(unsigned long)req->fifo_time)) {
+			list_move(&req->queuelist, &next->queuelist);
+			req->fifo_time = next->fifo_time;
+		}
+	}
+
+	/*
+	 * kill knowledge of next, this one is a goner
+	 */
+	zoned_remove_request(q, next);
+}
+
+/*
+ * Return true if a request is a write requests that needs zone
+ * write locking.
+ */
+static inline bool zoned_request_needs_wlock(struct zoned_data *zd,
+					     struct request *rq)
+{
+	unsigned int zno = sd_zbc_rq_zone_no(zd->sdkp, rq);
+
+	if (blk_rq_is_passthrough(rq))
+		return false;
+
+	if (!test_bit(zno, zd->seq_zones))
+		return false;
+
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_WRITE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+/*
+ * Abuse the elv.priv[0] pointer to indicate if a request
+ * has locked its target zone.
+ */
+#define RQ_LOCKED_ZONE		((void *)1UL)
+static inline void zoned_set_request_lock(struct request *rq)
+{
+	rq->elv.priv[0] = RQ_LOCKED_ZONE;
+}
+
+#define RQ_ZONE_NO_LOCK		((void *)0UL)
+static inline void zoned_clear_request_lock(struct request *rq)
+{
+	rq->elv.priv[0] = RQ_ZONE_NO_LOCK;
+}
+
+static inline bool zoned_request_has_lock(struct request *rq)
+{
+	return rq->elv.priv[0] == RQ_LOCKED_ZONE;
+}
+
+/*
+ * Write lock the target zone of a write request.
+ */
+static void zoned_wlock_request_zone(struct zoned_data *zd, struct request *rq)
+{
+	unsigned int zno = sd_zbc_rq_zone_no(zd->sdkp, rq);
+
+	WARN_ON_ONCE(zoned_request_has_lock(rq));
+	WARN_ON_ONCE(test_and_set_bit(zno, zd->zones_wlock));
+	zoned_set_request_lock(rq);
+}
+
+/*
+ * Write unlock the target zone of a write request.
+ */
+static void zoned_wunlock_request_zone(struct zoned_data *zd,
+				       struct request *rq)
+{
+	unsigned int zno = sd_zbc_rq_zone_no(zd->sdkp, rq);
+	unsigned long flags;
+
+	/*
+	 * Dispatch may be running on a different CPU.
+	 * So do not unlock the zone until it is done or
+	 * a write request in the middle of a sequence may end up
+	 * being dispatched.
+	 */
+	spin_lock_irqsave(&zd->zones_lock, flags);
+
+	WARN_ON_ONCE(!test_and_clear_bit(zno, zd->zones_wlock));
+	zoned_clear_request_lock(rq);
+
+	spin_unlock_irqrestore(&zd->zones_lock, flags);
+}
+
+/*
+ * Test the write lock state of the target zone of a write request.
+ */
+static inline bool zoned_request_zone_is_wlocked(struct zoned_data *zd,
+						 struct request *rq)
+{
+	unsigned int zno = sd_zbc_rq_zone_no(zd->sdkp, rq);
+
+	return test_bit(zno, zd->zones_wlock);
+}
+
+/*
+ * move an entry to dispatch queue
+ */
+static void zoned_move_request(struct zoned_data *zd, struct request *rq)
+{
+	const int data_dir = rq_data_dir(rq);
+
+	zd->next_rq[READ] = NULL;
+	zd->next_rq[WRITE] = NULL;
+	zd->next_rq[data_dir] = zoned_latter_request(rq);
+
+	/*
+	 * take it off the sort and fifo list
+	 */
+	zoned_remove_request(rq->q, rq);
+}
+
+/*
+ * zoned_check_fifo returns 0 if there are no expired requests on the fifo,
+ * 1 otherwise. Requires !list_empty(&zd->fifo_list[data_dir])
+ */
+static inline int zoned_check_fifo(struct zoned_data *zd, int ddir)
+{
+	struct request *rq = rq_entry_fifo(zd->fifo_list[ddir].next);
+
+	/*
+	 * rq is expired!
+	 */
+	if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
+		return 1;
+
+	return 0;
+}
+
+/*
+ * Test if a request can be dispatched.
+ */
+static inline bool zoned_can_dispatch_request(struct zoned_data *zd,
+					      struct request *rq)
+{
+	return !zoned_request_needs_wlock(zd, rq) ||
+		!zoned_request_zone_is_wlocked(zd, rq);
+}
+
+/*
+ * For the specified data direction, find the next request that can be
+ * dispatched. Search in increasing sector position.
+ */
+static struct request *
+zoned_next_request(struct zoned_data *zd, int data_dir)
+{
+	struct request *rq = zd->next_rq[data_dir];
+	unsigned long flags;
+
+	if (data_dir == READ)
+		return rq;
+
+	spin_lock_irqsave(&zd->zones_lock, flags);
+	while (rq) {
+		if (zoned_can_dispatch_request(zd, rq))
+			break;
+		rq = zoned_latter_request(rq);
+	}
+	spin_unlock_irqrestore(&zd->zones_lock, flags);
+
+	return rq;
+}
+
+/*
+ * For the specified data direction, find the next request that can be
+ * dispatched. Search in arrival order from the oldest request.
+ */
+static struct request *
+zoned_fifo_request(struct zoned_data *zd, int data_dir)
+{
+	struct request *rq;
+	unsigned long flags;
+
+	if (list_empty(&zd->fifo_list[data_dir]))
+		return NULL;
+
+	if (data_dir == READ)
+		return rq_entry_fifo(zd->fifo_list[READ].next);
+
+	spin_lock_irqsave(&zd->zones_lock, flags);
+
+	list_for_each_entry(rq, &zd->fifo_list[WRITE], queuelist) {
+		if (zoned_can_dispatch_request(zd, rq))
+			goto out;
+	}
+	rq = NULL;
+
+out:
+	spin_unlock_irqrestore(&zd->zones_lock, flags);
+
+	return rq;
+}
+
+/*
+ * __zd_dispatch_request selects the best request according to
+ * read/write batch expiration, fifo_batch, target zone lock state, etc
+ */
+static struct request *__zd_dispatch_request(struct blk_mq_hw_ctx *hctx)
+{
+	struct zoned_data *zd = hctx->queue->elevator->elevator_data;
+	struct request *rq = NULL, *next_rq;
+	bool reads, writes;
+	int data_dir;
+
+	if (!list_empty(&zd->dispatch)) {
+		rq = list_first_entry(&zd->dispatch, struct request, queuelist);
+		list_del_init(&rq->queuelist);
+		goto done;
+	}
+
+	reads = !list_empty(&zd->fifo_list[READ]);
+	writes = !list_empty(&zd->fifo_list[WRITE]);
+
+	/*
+	 * batches are currently reads XOR writes
+	 */
+	rq = zoned_next_request(zd, WRITE);
+	if (!rq)
+		rq = zoned_next_request(zd, READ);
+	if (rq && zd->batching < zd->fifo_batch)
+		/* we have a next request and are still entitled to batch */
+		goto dispatch_request;
+
+	/*
+	 * at this point we are not running a batch. select the appropriate
+	 * data direction (read / write)
+	 */
+
+	if (reads) {
+		if (writes && (zd->starved++ >= zd->writes_starved))
+			goto dispatch_writes;
+
+		data_dir = READ;
+
+		goto dispatch_find_request;
+	}
+
+	/*
+	 * there are either no reads or writes have been starved
+	 */
+
+	if (writes) {
+dispatch_writes:
+		zd->starved = 0;
+
+		/* Really select writes if at least one can be dispatched */
+		if (zoned_fifo_request(zd, WRITE))
+			data_dir = WRITE;
+		else
+			data_dir = READ;
+
+		goto dispatch_find_request;
+	}
+
+	return NULL;
+
+dispatch_find_request:
+	/*
+	 * we are not running a batch, find best request for selected data_dir
+	 */
+	next_rq = zoned_next_request(zd, data_dir);
+	if (zoned_check_fifo(zd, data_dir) || !next_rq) {
+		/*
+		 * A deadline has expired, the last request was in the other
+		 * direction, or we have run out of higher-sectored requests.
+		 * Start again from the request with the earliest expiry time.
+		 */
+		rq = zoned_fifo_request(zd, data_dir);
+	} else {
+		/*
+		 * The last req was the same dir and we have a next request in
+		 * sort order. No expired requests so continue on from here.
+		 */
+		rq = next_rq;
+	}
+
+	if (!rq)
+		return NULL;
+
+	zd->batching = 0;
+
+dispatch_request:
+	/*
+	 * rq is the selected appropriate request.
+	 */
+	zd->batching++;
+	zoned_move_request(zd, rq);
+
+done:
+	/*
+	 * If the request needs its target zone locked, do it.
+	 */
+	if (zoned_request_needs_wlock(zd, rq))
+		zoned_wlock_request_zone(zd, rq);
+	rq->rq_flags |= RQF_STARTED;
+	return rq;
+}
+
+static struct request *zd_dispatch_request(struct blk_mq_hw_ctx *hctx)
+{
+	struct zoned_data *zd = hctx->queue->elevator->elevator_data;
+	struct request *rq;
+
+	spin_lock(&zd->lock);
+	rq = __zd_dispatch_request(hctx);
+	spin_unlock(&zd->lock);
+
+	return rq;
+}
+
+static int zd_request_merge(struct request_queue *q, struct request **rq,
+			    struct bio *bio)
+{
+	struct zoned_data *zd = q->elevator->elevator_data;
+	sector_t sector = bio_end_sector(bio);
+	struct request *__rq;
+
+	if (!zd->front_merges)
+		return ELEVATOR_NO_MERGE;
+
+	__rq = elv_rb_find(&zd->sort_list[bio_data_dir(bio)], sector);
+	if (__rq) {
+		if (WARN_ON(sector != blk_rq_pos(__rq)))
+			return ELEVATOR_NO_MERGE;
+
+		if (elv_bio_merge_ok(__rq, bio)) {
+			*rq = __rq;
+			return ELEVATOR_FRONT_MERGE;
+		}
+	}
+
+	return ELEVATOR_NO_MERGE;
+}
+
+static bool zd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
+{
+	struct request_queue *q = hctx->queue;
+	struct zoned_data *zd = q->elevator->elevator_data;
+	struct request *free = NULL;
+	bool ret;
+
+	spin_lock(&zd->lock);
+	ret = blk_mq_sched_try_merge(q, bio, &free);
+	spin_unlock(&zd->lock);
+
+	if (free)
+		blk_mq_free_request(free);
+
+	return ret;
+}
+
+/*
+ * add rq to rbtree and fifo
+ */
+static void __zd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
+				bool at_head)
+{
+	struct request_queue *q = hctx->queue;
+	struct zoned_data *zd = q->elevator->elevator_data;
+	const int data_dir = rq_data_dir(rq);
+
+	if (blk_mq_sched_try_insert_merge(q, rq))
+		return;
+
+	blk_mq_sched_request_inserted(rq);
+
+	if (at_head || blk_rq_is_passthrough(rq)) {
+		if (at_head)
+			list_add(&rq->queuelist, &zd->dispatch);
+		else
+			list_add_tail(&rq->queuelist, &zd->dispatch);
+	} else {
+		zoned_add_rq_rb(zd, rq);
+
+		if (rq_mergeable(rq)) {
+			elv_rqhash_add(q, rq);
+			if (!q->last_merge)
+				q->last_merge = rq;
+		}
+
+		/*
+		 * set expire time and add to fifo list
+		 */
+		rq->fifo_time = jiffies + zd->fifo_expire[data_dir];
+		list_add_tail(&rq->queuelist, &zd->fifo_list[data_dir]);
+	}
+}
+
+static void zd_insert_requests(struct blk_mq_hw_ctx *hctx,
+			       struct list_head *list, bool at_head)
+{
+	struct request_queue *q = hctx->queue;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	spin_lock(&zd->lock);
+	while (!list_empty(list)) {
+		struct request *rq;
+
+		rq = list_first_entry(list, struct request, queuelist);
+		list_del_init(&rq->queuelist);
+
+		/*
+		 * This may be a requeue of a request that has locked its
+		 * target zone. If this is the case, release the zone lock.
+		 */
+		if (zoned_request_has_lock(rq))
+			zoned_wunlock_request_zone(zd, rq);
+
+		__zd_insert_request(hctx, rq, at_head);
+	}
+	spin_unlock(&zd->lock);
+}
+
+/*
+ * Write unlock the target zone of a completed write request.
+ */
+static void zd_completed_request(struct request *rq)
+{
+
+	if (zoned_request_has_lock(rq)) {
+		struct zoned_data *zd = rq->q->elevator->elevator_data;
+
+		zoned_wunlock_request_zone(zd, rq);
+	}
+}
+
+static bool zd_has_work(struct blk_mq_hw_ctx *hctx)
+{
+	struct zoned_data *zd = hctx->queue->elevator->elevator_data;
+
+	return !list_empty_careful(&zd->dispatch) ||
+		!list_empty_careful(&zd->fifo_list[0]) ||
+		!list_empty_careful(&zd->fifo_list[1]);
+}
+
+static struct scsi_disk *__zoned_scsi_disk(struct request_queue *q)
+{
+	struct scsi_device *sdp;
+	struct scsi_disk *sdkp;
+
+	if (!blk_queue_is_zoned(q)) {
+		pr_err("zoned: Not a zoned block device\n");
+		return NULL;
+	}
+
+	sdp = scsi_device_from_queue(q);
+	if (!sdp) {
+		pr_err("zoned: Not a SCSI device\n");
+		return NULL;
+	}
+
+	sdkp = dev_get_drvdata(&sdp->sdev_gendev);
+	if (WARN_ON(sdkp->disk->queue != q))
+		return NULL;
+
+	return sdkp;
+}
+
+/*
+ * Initialize elevator private data (zoned_data).
+ */
+static int zd_init_queue(struct request_queue *q, struct elevator_type *e)
+{
+	struct scsi_disk *sdkp;
+	struct zoned_data *zd;
+	struct elevator_queue *eq;
+	int ret;
+
+	sdkp = __zoned_scsi_disk(q);
+	if (!sdkp)
+		return -ENODEV;
+
+	eq = elevator_alloc(q, e);
+	if (!eq)
+		return -ENOMEM;
+
+	zd = kzalloc_node(sizeof(*zd), GFP_KERNEL, q->node);
+	if (!zd) {
+		kobject_put(&eq->kobj);
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&zd->fifo_list[READ]);
+	INIT_LIST_HEAD(&zd->fifo_list[WRITE]);
+	zd->sort_list[READ] = RB_ROOT;
+	zd->sort_list[WRITE] = RB_ROOT;
+	zd->fifo_expire[READ] = read_expire;
+	zd->fifo_expire[WRITE] = write_expire;
+	zd->writes_starved = writes_starved;
+	zd->front_merges = 1;
+	zd->fifo_batch = fifo_batch;
+	spin_lock_init(&zd->lock);
+	INIT_LIST_HEAD(&zd->dispatch);
+
+	zd->sdkp = sdkp;
+	spin_lock_init(&zd->zones_lock);
+
+	zd->zones_wlock = sdkp->zones_wlock;
+	zd->seq_zones = sdkp->seq_zones;
+	if (!zd->zones_wlock || !zd->seq_zones) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	eq->elevator_data = zd;
+	q->elevator = eq;
+
+	return 0;
+
+err:
+	kfree(zd);
+	kobject_put(&eq->kobj);
+
+	return ret;
+}
+
+static void zd_exit_queue(struct elevator_queue *e)
+{
+	struct zoned_data *zd = e->elevator_data;
+
+	WARN_ON(!list_empty(&zd->fifo_list[READ]));
+	WARN_ON(!list_empty(&zd->fifo_list[WRITE]));
+
+	kfree(zd);
+}
+
+/*
+ * sysfs parts below
+ */
+static ssize_t
+zoned_var_show(int var, char *page)
+{
+	return sprintf(page, "%d\n", var);
+}
+
+static ssize_t
+zoned_var_store(int *var, const char *page, size_t count)
+{
+	char *p = (char *) page;
+	int ret;
+
+	ret = kstrtoint(p, 10, var);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+#define SHOW_FUNCTION(__FUNC, __VAR, __CONV)				\
+static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
+{									\
+	struct zoned_data *zd = e->elevator_data;			\
+	int __data = __VAR;						\
+	if (__CONV)							\
+		__data = jiffies_to_msecs(__data);			\
+	return zoned_var_show(__data, (page));			\
+}
+SHOW_FUNCTION(zoned_read_expire_show, zd->fifo_expire[READ], 1);
+SHOW_FUNCTION(zoned_write_expire_show, zd->fifo_expire[WRITE], 1);
+SHOW_FUNCTION(zoned_writes_starved_show, zd->writes_starved, 0);
+SHOW_FUNCTION(zoned_front_merges_show, zd->front_merges, 0);
+SHOW_FUNCTION(zoned_fifo_batch_show, zd->fifo_batch, 0);
+#undef SHOW_FUNCTION
+
+#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV)			\
+static ssize_t __FUNC(struct elevator_queue *e,				\
+		      const char *page, size_t count)			\
+{									\
+	struct zoned_data *zd = e->elevator_data;			\
+	int __data;							\
+	int ret = zoned_var_store(&__data, (page), count);		\
+	if (ret < 0)							\
+		return ret;						\
+	if (__data < (MIN))						\
+		__data = (MIN);						\
+	else if (__data > (MAX))					\
+		__data = (MAX);						\
+	if (__CONV)							\
+		*(__PTR) = msecs_to_jiffies(__data);			\
+	else								\
+		*(__PTR) = __data;					\
+	return ret;							\
+}
+STORE_FUNCTION(zoned_read_expire_store, &zd->fifo_expire[READ],
+	       0, INT_MAX, 1);
+STORE_FUNCTION(zoned_write_expire_store, &zd->fifo_expire[WRITE],
+	       0, INT_MAX, 1);
+STORE_FUNCTION(zoned_writes_starved_store, &zd->writes_starved,
+	       INT_MIN, INT_MAX, 0);
+STORE_FUNCTION(zoned_front_merges_store, &zd->front_merges,
+	       0, 1, 0);
+STORE_FUNCTION(zoned_fifo_batch_store, &zd->fifo_batch,
+	       0, INT_MAX, 0);
+#undef STORE_FUNCTION
+
+#define DD_ATTR(name) \
+	__ATTR(name, S_IRUGO|S_IWUSR, zoned_##name##_show, \
+				      zoned_##name##_store)
+
+static struct elv_fs_entry zoned_attrs[] = {
+	DD_ATTR(read_expire),
+	DD_ATTR(write_expire),
+	DD_ATTR(writes_starved),
+	DD_ATTR(front_merges),
+	DD_ATTR(fifo_batch),
+	__ATTR_NULL
+};
+
+#ifdef CONFIG_BLK_DEBUG_FS
+#define ZONED_DEBUGFS_DDIR_ATTRS(ddir, name)				\
+static void *zoned_##name##_fifo_start(struct seq_file *m,		\
+					  loff_t *pos)			\
+	__acquires(&zd->lock)						\
+{									\
+	struct request_queue *q = m->private;				\
+	struct zoned_data *zd = q->elevator->elevator_data;		\
+									\
+	spin_lock(&zd->lock);						\
+	return seq_list_start(&zd->fifo_list[ddir], *pos);		\
+}									\
+									\
+static void *zoned_##name##_fifo_next(struct seq_file *m, void *v,	\
+					 loff_t *pos)			\
+{									\
+	struct request_queue *q = m->private;				\
+	struct zoned_data *zd = q->elevator->elevator_data;		\
+									\
+	return seq_list_next(v, &zd->fifo_list[ddir], pos);		\
+}									\
+									\
+static void zoned_##name##_fifo_stop(struct seq_file *m, void *v)	\
+	__releases(&zd->lock)						\
+{									\
+	struct request_queue *q = m->private;				\
+	struct zoned_data *zd = q->elevator->elevator_data;		\
+									\
+	spin_unlock(&zd->lock);						\
+}									\
+									\
+static const struct seq_operations zoned_##name##_fifo_seq_ops = {	\
+	.start	= zoned_##name##_fifo_start,				\
+	.next	= zoned_##name##_fifo_next,				\
+	.stop	= zoned_##name##_fifo_stop,				\
+	.show	= blk_mq_debugfs_rq_show,				\
+};									\
+									\
+static int zoned_##name##_next_rq_show(void *data,			\
+					  struct seq_file *m)		\
+{									\
+	struct request_queue *q = data;					\
+	struct zoned_data *zd = q->elevator->elevator_data;		\
+	struct request *rq = zd->next_rq[ddir];				\
+									\
+	if (rq)								\
+		__blk_mq_debugfs_rq_show(m, rq);			\
+	return 0;							\
+}
+ZONED_DEBUGFS_DDIR_ATTRS(READ, read)
+ZONED_DEBUGFS_DDIR_ATTRS(WRITE, write)
+#undef ZONED_DEBUGFS_DDIR_ATTRS
+
+static int zoned_batching_show(void *data, struct seq_file *m)
+{
+	struct request_queue *q = data;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	seq_printf(m, "%u\n", zd->batching);
+	return 0;
+}
+
+static int zoned_starved_show(void *data, struct seq_file *m)
+{
+	struct request_queue *q = data;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	seq_printf(m, "%u\n", zd->starved);
+	return 0;
+}
+
+static void *zoned_dispatch_start(struct seq_file *m, loff_t *pos)
+	__acquires(&zd->lock)
+{
+	struct request_queue *q = m->private;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	spin_lock(&zd->lock);
+	return seq_list_start(&zd->dispatch, *pos);
+}
+
+static void *zoned_dispatch_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct request_queue *q = m->private;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	return seq_list_next(v, &zd->dispatch, pos);
+}
+
+static void zoned_dispatch_stop(struct seq_file *m, void *v)
+	__releases(&zd->lock)
+{
+	struct request_queue *q = m->private;
+	struct zoned_data *zd = q->elevator->elevator_data;
+
+	spin_unlock(&zd->lock);
+}
+
+static const struct seq_operations zoned_dispatch_seq_ops = {
+	.start	= zoned_dispatch_start,
+	.next	= zoned_dispatch_next,
+	.stop	= zoned_dispatch_stop,
+	.show	= blk_mq_debugfs_rq_show,
+};
+
+#define ZONED_QUEUE_DDIR_ATTRS(name)					     \
+	{#name "_fifo_list", 0400, .seq_ops = &zoned_##name##_fifo_seq_ops}, \
+	{#name "_next_rq", 0400, zoned_##name##_next_rq_show}
+static const struct blk_mq_debugfs_attr zoned_queue_debugfs_attrs[] = {
+	ZONED_QUEUE_DDIR_ATTRS(read),
+	ZONED_QUEUE_DDIR_ATTRS(write),
+	{"batching", 0400, zoned_batching_show},
+	{"starved", 0400, zoned_starved_show},
+	{"dispatch", 0400, .seq_ops = &zoned_dispatch_seq_ops},
+	{},
+};
+#undef ZONED_QUEUE_DDIR_ATTRS
+#endif
+
+static struct elevator_type zoned_elv = {
+	.ops.mq = {
+		.insert_requests	= zd_insert_requests,
+		.dispatch_request	= zd_dispatch_request,
+		.completed_request	= zd_completed_request,
+		.next_request		= elv_rb_latter_request,
+		.former_request		= elv_rb_former_request,
+		.bio_merge		= zd_bio_merge,
+		.request_merge		= zd_request_merge,
+		.requests_merged	= zd_merged_requests,
+		.request_merged		= zd_request_merged,
+		.has_work		= zd_has_work,
+		.init_sched		= zd_init_queue,
+		.exit_sched		= zd_exit_queue,
+	},
+
+	.uses_mq	= true,
+#ifdef CONFIG_BLK_DEBUG_FS
+	.queue_debugfs_attrs = zoned_queue_debugfs_attrs,
+#endif
+	.elevator_attrs = zoned_attrs,
+	.elevator_name = "zoned",
+	.elevator_owner = THIS_MODULE,
+};
+
+static int __init zoned_init(void)
+{
+	return elv_register(&zoned_elv);
+}
+
+static void __exit zoned_exit(void)
+{
+	elv_unregister(&zoned_elv);
+}
+
+module_init(zoned_init);
+module_exit(zoned_exit);
+
+MODULE_AUTHOR("Damien Le Moal");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Zoned MQ deadline IO scheduler");
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions
@ 2017-09-01 16:00     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:00 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 14542308d25b..a369174a9679 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -5,6 +5,21 @@
>  #include <linux/sbitmap.h>
>  #include <linux/srcu.h>
>  
> +#ifdef CONFIG_BLK_DEBUG_FS
> +
> +#include <linux/seq_file.h>
> +
> +struct blk_mq_debugfs_attr {
> +	const char *name;
> +	umode_t mode;
> +	int (*show)(void *, struct seq_file *);
> +	ssize_t (*write)(void *, const char __user *, size_t, loff_t *);
> +	/* Set either .show or .seq_ops. */
> +	const struct seq_operations *seq_ops;
> +};
> +
> +#endif
> +
>  struct blk_mq_tags;
>  struct blk_flush_queue;
>  
> @@ -289,4 +304,9 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
>  	for ((i) = 0; (i) < (hctx)->nr_ctx &&				\
>  	     ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
>  
> +#ifdef CONFIG_BLK_DEBUG_FS
> +int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
> +int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
> +#endif
> +
>  #endif

Hello Damien,

Should these definitions perhaps be moved to a new header file, e.g.
include/linux/blk-mq-debugfs.h, such that the introduction of more #ifdefs
and #include <linux/seq_file.h> in include/linux/blk-mq.h can be avoided?
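For illustration, such a split might look like the sketch below (a standalone approximation with hypothetical plain-C stand-ins for the kernel types; the real header would use umode_t, the kernel's struct seq_file and struct seq_operations, and a CONFIG_BLK_DEBUG_FS guard):

```c
/*
 * Hypothetical include/linux/blk-mq-debugfs.h -- illustrative only.
 * Plain C stand-ins replace kernel types so the sketch is self-contained.
 */
#include <stddef.h>

struct seq_file;			/* opaque, as in the kernel */

struct blk_mq_debugfs_attr {
	const char *name;
	unsigned int mode;		/* umode_t in the kernel */
	int (*show)(void *, struct seq_file *);
	long (*write)(void *, const char *, size_t, long *);	/* ssize_t */
	/* Set either .show or .seq_ops. */
	const void *seq_ops;		/* const struct seq_operations * */
};
```

I/O schedulers would then include the new header explicitly instead of getting these declarations via blk-mq.h.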

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions
@ 2017-09-01 16:04     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:04 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -284,6 +284,16 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
>  void blk_mq_quiesce_queue_nowait(struct request_queue *q);
>  
>  /*
> + * Scheduler helper functions.
> + */
> +void blk_mq_sched_free_hctx_data(struct request_queue *q,
> +				 void (*exit)(struct blk_mq_hw_ctx *));
> +bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
> +			    struct request **merged_request);
> +bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
> +void blk_mq_sched_request_inserted(struct request *rq);
> +
> +/*
>   * Driver command data is immediately after the request. So subtract request
>   * size to get back to the original request, add request size to get the PDU.
>   */

Hello Damien,

A similar comment as for patch 1/8 for this patch: should these declarations
perhaps be moved to a new header file, e.g. include/linux/blk-mq-sched.h
because these functions are only used by I/O schedulers and not in the
implementation of block drivers? That would help keep block driver
authors from wondering whether or not they have to use these functions
while they are reading include/linux/blk-mq.h.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
@ 2017-09-01 16:08     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:08 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> +/* Zone types of REPORT ZONES zone descriptors */
> +enum zbc_zone_type {
> +	ZBC_ZONE_TYPE_CONV		= 0x1,
> +	ZBC_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
> +	ZBC_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
> +	/* 0x4 to 0xf are reserved */
> +	ZBC_ZONE_TYPE_RESERVED		= 0x4,
> +};

Hello Damien,

Since no code is using ZBC_ZONE_TYPE_RESERVED, is the comment about reserved
values sufficient and can the definition of ZBC_ZONE_TYPE_RESERVED be left out?

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup
@ 2017-09-01 16:14     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:14 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> Introduce sd_zbc.h for zbc related declarations (avoid cluttering sd.h).
> 
> Fix comments style in sd_zbc.c (do not use documentation format) and
> add/fix comments to enhance explanations. Also remove a useless blank
> line in sd_zbc_read_zones().
> 
> Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
> assignment and simplify nr_zones calculation.
> 
> Finally, use the min() macro in sd_zbc_check_zone_size() to get the
> report zones reply size instead of the open coded version.

Hello Damien,

Should this patch perhaps be split into multiple patches to make review
easier?

> +static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}

Please use the same coding style as in other Linux kernel header files,
namely '{' and '}' both at the start of a line. See e.g. include/linux/sysfs.h.
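That is, the stub would be spelled as in the sketch below (struct scsi_disk is forward-declared here only so the fragment stands alone):

```c
/* The sd_zbc_remove() stub with '{' and '}' each on their own line,
 * matching the style of e.g. include/linux/sysfs.h.
 */
struct scsi_disk;

static inline void sd_zbc_remove(struct scsi_disk *sdkp)
{
}
```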

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()
@ 2017-09-01 16:15     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:15 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe
  Cc: hch, stable

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> The three values starting at byte 8 of the Zoned Block Device
> Characteristics VPD page B6h are 32-bit values, not 64-bit. So use
> get_unaligned_be32() to retrieve the values, not get_unaligned_be64().
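(For anyone following along, here is a portable sketch of the semantics involved; be32_sketch is a made-up illustrative name modelling the kernel's get_unaligned_be32(), which assembles a 32-bit value from four big-endian bytes at any alignment, whereas the be64 variant would consume eight bytes and swallow the neighbouring field:)

```c
#include <stdint.h>

/* Standalone model of the kernel's get_unaligned_be32(): read four
 * big-endian bytes from an arbitrarily aligned pointer.
 */
static uint32_t be32_sketch(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```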

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones
@ 2017-09-01 16:25     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:25 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> Zoned block devices have no write constraints for conventional zones
> so write locking of conventional zones is not necessary and can hurt
> performance. To avoid this, introduce the seq_zones bitmap to indicate
> if a zone is a sequential one. Use this information to allow any write
> to be issued to conventional zones, locking only sequential zones.
> 
> As the zone bitmap allocation for seq_zones is identical to the zones
> write lock bitmap, introduce the helper sd_zbc_alloc_zone_bitmap().
> Using this helper, wait for the disk capacity and number of zones to
> stabilize on the second revalidation pass to allocate and initialize
> the bitmaps.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq
@ 2017-09-01 16:26     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 16:26 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> In the case of a ZBC disk used with scsi-mq, zone write locking does
> not prevent write reordering in sequential zones. Unlike the legacy
> case, zone locking can only be done after the command request is
> removed from the scheduler dispatch queue. That is, at the time of
> zone locking, the write command may already be out of order.
> 
> Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
> used with scsi-mq. Write order guarantees can be provided by an
> adapted I/O scheduler.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler
@ 2017-09-01 17:19     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-09-01 17:19 UTC (permalink / raw)
  To: linux-scsi, linux-block, Damien Le Moal, martin.petersen, axboe; +Cc: hch

On Fri, 2017-09-01 at 20:36 +0900, Damien Le Moal wrote:
> +static struct scsi_disk *__zoned_scsi_disk(struct request_queue *q)
> +{
> +       struct scsi_device *sdp;
> +       struct scsi_disk *sdkp;
> +
> +       if (!blk_queue_is_zoned(q)) {
> +               pr_err("zoned: Not a zoned block device\n");
> +               return NULL;
> +       }
> +
> +       sdp = scsi_device_from_queue(q);
> +       if (!sdp) {
> +               pr_err("zoned: Not a SCSI device\n");
> +               return NULL;
> +       }
> +
> +       sdkp = dev_get_drvdata(&sdp->sdev_gendev);
> +       if (WARN_ON(sdkp->disk->queue != q))
> +               return NULL;
> +
> +       return sdkp;
> +}

Hello Damien,

Can reading sdkp->disk->queue cause a kernel crash if sdp does not point at
a SCSI device that is associated with a SCSI disk? How about using something
like the code below to convert a request queue pointer into a SCSI disk
pointer?

static int lookup_disk(struct device *dev, void *data)
{
	struct scsi_disk **sdkp = data;

	if (!*sdkp && dev->class == &sd_disk_class)
		*sdkp = to_scsi_disk(dev);

	return 0;
}

static struct scsi_disk *q_to_sdkp(struct request_queue *q)
{
	struct scsi_device *sdp = scsi_device_from_queue(q);
	struct scsi_disk *sdkp = NULL;

	if (sdp)
		device_for_each_child(&sdp->sdev_gendev, &sdkp, lookup_disk);
	return sdkp;
}

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-09-01 17:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-01 11:36 [PATCH 0/8] blk-mq/scsi-mq support for ZBC disks Damien Le Moal
2017-09-01 11:36 ` [PATCH 1/8] block: Fix declaration of blk-mq debugfs functions Damien Le Moal
2017-09-01 16:00   ` Bart Van Assche
2017-09-01 16:00     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 2/8] block: Fix declaration of blk-mq scheduler functions Damien Le Moal
2017-09-01 16:04   ` Bart Van Assche
2017-09-01 16:04     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 3/8] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h Damien Le Moal
2017-09-01 16:08   ` Bart Van Assche
2017-09-01 16:08     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 4/8] scsi: sd_zbc: Reorganize and cleanup Damien Le Moal
2017-09-01 16:14   ` Bart Van Assche
2017-09-01 16:14     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 5/8] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics() Damien Le Moal
2017-09-01 16:15   ` Bart Van Assche
2017-09-01 16:15     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 6/8] scsi: sd_zbc: Limit zone write locking to sequential zones Damien Le Moal
2017-09-01 16:25   ` Bart Van Assche
2017-09-01 16:25     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 7/8] scsi: sd_zbc: Disable zone write locking with scsi-mq Damien Le Moal
2017-09-01 16:26   ` Bart Van Assche
2017-09-01 16:26     ` Bart Van Assche
2017-09-01 11:36 ` [PATCH 8/8] scsi: Introduce ZBC disk I/O scheduler Damien Le Moal
2017-09-01 17:19   ` Bart Van Assche
2017-09-01 17:19     ` Bart Van Assche
