* [PATCH V8 0/7] blk-mq support for ZBC disks
@ 2017-11-09  6:57 Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 1/7] block: introduce zoned block devices zone write locking Damien Le Moal
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

This series, formerly titled "scsi-mq support for ZBC disks", implements
support for ZBC disks on systems using the scsi-mq I/O path.

The current SCSI-level support for ZBC disks guarantees write request ordering
using a per-zone write lock which prevents multiple write commands from being
issued simultaneously to the same zone, thereby avoiding reordering of
sequential writes to sequential zones. This method is however ineffective when
scsi-mq is used with zoned block devices, due to the different execution model
of blk-mq, which passes a request to the SCSI layer for dispatching only after
the request has been removed from the I/O scheduler queue. That is, by the time
the SCSI layer tries to lock the target zone of the request, the request may
already be out of order, and zone write locking cannot prevent that.

Various approaches have been tried to solve this problem directly from the core
code of blk-mq. All of them had the serious disadvantage of cluttering blk-mq
code with zoned block device specific conditions and processing, making
maintenance and testing difficult.

This series adds blk-mq support for zoned block devices at the I/O scheduler
level with simple modifications of the mq-deadline scheduler. Implementation
is done with reusable helpers defined in the zoned block device support file
(blk-zoned.c). These helpers provide per zone write locking control functions
similar to what was implemented directly in the SCSI layer in sd_zbc.c.
The zone write locking mechanism is used by mq-deadline for the exact same
purpose, that is, to limit writes per zone to at most one request to avoid
reordering.
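
As an illustration only (this is a rough sketch, not the actual patch code,
which follows in patches 2 and 3, and pick_dispatchable() is a hypothetical
function), an I/O scheduler dispatch path is expected to use these helpers
roughly as follows:

	/* Sketch: pick the next request, skipping writes to locked zones. */
	static struct request *pick_dispatchable(struct list_head *fifo)
	{
		struct request *rq;

		list_for_each_entry(rq, fifo, queuelist) {
			/* Reads and writes to conventional zones always pass. */
			if (blk_req_can_dispatch_to_zone(rq)) {
				/* Write lock the target zone of a sequential write. */
				blk_req_zone_write_lock(rq);
				return rq;
			}
		}
		return NULL;
	}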

The changes to mq-deadline do not affect its operation with regular disks. The
same scheduling behavior is maintained for these devices. Compared to the SCSI
layer zone locking implementation, this series avoids locking conventional
zones, which results in a use of these zones comparable to that of a regular
disk.

This series also implements the same changes for the legacy deadline I/O
scheduler. With both schedulers handling zone locking, the zone locking code at
the SCSI layer in sd.c and sd_zbc.c can be removed, which results in a
significant simplification of the sd driver command handling.

Patches 1 to 5 introduce the zone locking helpers in the block layer and modify
the deadline and mq-deadline schedulers. They apply equally on top of the block
tree branch for-4.15/block and on top of the scsi tree branch 4.15/scsi-queue.
Patches 6 and 7 initialize the device request queue zone information and remove
the SCSI layer zone locking. They apply to the scsi tree branch
4.15/scsi-queue. To cleanly apply these last two patches to the block tree
branch for-4.15/block, the following patches from the scsi tree must first be
applied:

aa8a845662 "scsi: sd_zbc: Move ZBC declarations to scsi_proto.h"
e98f42bcad "scsi: sd_zbc: Fix comments and indentation"
5eed92d173 "scsi: sd_zbc: Rearrange code"
e8c77ec483 "scsi: sd_zbc: Use well defined macros"
4a109032e3 "scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()"

Of note is that this series imposes the use of the deadline and mq-deadline
schedulers with zoned block devices. A system can trivially enforce this using
a udev rule such as:

ACTION=="add|change", KERNEL=="sd[a-z]", ATTRS{queue/zoned}=="host-managed", \
ATTR{queue/scheduler}="deadline"

This rule applies equally to the legacy SCSI path and the scsi-mq path, thanks
to "mq-deadline" being aliased to "deadline".
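
For reference, and assuming a hypothetical host-managed disk sdb, the effect
of the rule can be checked (and the scheduler set manually) through sysfs.
The exact list of schedulers shown depends on the kernel configuration:

	# cat /sys/block/sdb/queue/zoned
	host-managed
	# echo deadline > /sys/block/sdb/queue/scheduler
	# cat /sys/block/sdb/queue/scheduler
	noop [deadline] cfq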

Comments are as always very much appreciated.

Changes from v7:
* Merged patch 8 into patch 6
* Fixed various typos and commit messages

Changes from v6:
* Implement zone write locking helpers in the block layer
* Also modify legacy path deadline scheduler to remove all zone write locking
  code from the scsi layer

Changes from v5:
* Refactor patches to introduce the zone_lock spinlock only when needed
* Addressed Bart's comments (in particular explanations of the zone_lock
  spinlock use)

Changes from v4:
* Various fixes and improvements (From Christoph's comments)
* Dropped zones_wlock scheduler tunable attribute

Changes from v3:
* Integrated support directly into mq-deadline instead of creating a new I/O
  scheduler.
* Disable setting of default mq scheduler for single queue devices

Changes from v2:
* Introduced blk_zoned structure
* Moved I/O scheduler from drivers/scsi to block

Changes from v1:
* Addressed Bart's comments for the blk-mq patches (declarations files)
* Split (former) patch 4 into multiple patches to facilitate review
* Fixed scsi disk lookup from io scheduler by introducing
  scsi_disk_from_queue()

Christoph Hellwig (1):
  block: introduce zoned block devices zone write locking

Damien Le Moal (6):
  mq-deadline: Introduce dispatch helpers
  mq-deadline: Introduce zone locking support
  deadline-iosched: Introduce dispatch helpers
  deadline-iosched: Introduce zone locking support
  sd_zbc: Initialize device request queue zoned data
  sd: Remove zone write locking

 block/blk-core.c         |   1 +
 block/blk-zoned.c        |  42 +++++++++
 block/deadline-iosched.c | 114 ++++++++++++++++++++---
 block/mq-deadline.c      | 130 ++++++++++++++++++++++++--
 drivers/scsi/sd.c        |  41 +--------
 drivers/scsi/sd.h        |  11 ---
 drivers/scsi/sd_zbc.c    | 235 +++++++++++++++++++++++++++++------------------
 include/linux/blkdev.h   | 111 ++++++++++++++++++++++
 include/scsi/scsi_cmnd.h |   3 +-
 9 files changed, 528 insertions(+), 160 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V8 1/7] block: introduce zoned block devices zone write locking
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 2/7] mq-deadline: Introduce dispatch helpers Damien Le Moal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

From: Christoph Hellwig <hch@lst.de>

Components relying only on the request_queue structure for accessing
block devices (e.g. I/O schedulers) have a limited knowledge of the
device characteristics. In particular, the device capacity cannot be
easily discovered, which for a zoned block device also results in the
inability to easily know the number of zones of the device (the zone
size is indicated by the chunk_sectors field of the queue limits).
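
For example (the zone size value is illustrative), a disk with 256 MiB zones
reports chunk_sectors = 524288, and the zone number of a request starting at
sector 1572864 is simply 1572864 >> ilog2(524288) = 3.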

Introduce the nr_zones field to the request_queue structure to simplify
access to this information. Also, add the bitmap seq_zones_bitmap, which
indicates which zones of the device are sequential zones (write
preferred or write required), and the bitmap seq_zones_wlock, which
indicates if a zone is write locked, that is, if a write request
targeting a zone was dispatched to the device. These fields are
initialized by the low level block device driver (sd.c for ZBC/ZAC
disks). They are not initialized by stacking drivers (device mappers)
handling zoned block devices (e.g. dm-linear).

Using this, I/O schedulers can introduce zone write locking to control
request dispatching to a zoned block device and avoid write request
reordering by limiting to at most a single write request per zone
outside of the scheduler at any time.

Based on previous patches from Damien Le Moal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Damien]
* Fixed comments and indentation in blkdev.h
* Changed helper functions
* Fixed this commit message
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-core.c       |   1 +
 block/blk-zoned.c      |  42 +++++++++++++++++++
 include/linux/blkdev.h | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index b8d1aa2d1008..e887c0b45d0b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1555,6 +1555,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 
 	lockdep_assert_held(q->queue_lock);
 
+	blk_req_zone_write_unlock(req);
 	blk_pm_put_request(req);
 
 	elv_completed_request(q, req);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index ff57fb51b338..acb7252c7e81 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -22,6 +22,48 @@ static inline sector_t blk_zone_start(struct request_queue *q,
 }
 
 /*
+ * Return true if a request is a write request that needs zone write locking.
+ */
+bool blk_req_needs_zone_write_lock(struct request *rq)
+{
+	if (!rq->q->seq_zones_wlock)
+		return false;
+
+	if (blk_rq_is_passthrough(rq))
+		return false;
+
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_WRITE:
+		return blk_rq_zone_is_seq(rq);
+	default:
+		return false;
+	}
+}
+EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
+
+void __blk_req_zone_write_lock(struct request *rq)
+{
+	if (WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq),
+					  rq->q->seq_zones_wlock)))
+		return;
+
+	WARN_ON_ONCE(rq->rq_flags & RQF_ZONE_WRITE_LOCKED);
+	rq->rq_flags |= RQF_ZONE_WRITE_LOCKED;
+}
+EXPORT_SYMBOL_GPL(__blk_req_zone_write_lock);
+
+void __blk_req_zone_write_unlock(struct request *rq)
+{
+	rq->rq_flags &= ~RQF_ZONE_WRITE_LOCKED;
+	if (rq->q->seq_zones_wlock)
+		WARN_ON_ONCE(!test_and_clear_bit(blk_rq_zone_no(rq),
+						 rq->q->seq_zones_wlock));
+}
+EXPORT_SYMBOL_GPL(__blk_req_zone_write_unlock);
+
+/*
  * Check that a zone report belongs to the partition.
  * If yes, fix its start sector and write pointer, copy it in the
  * zone information array and return true. Return false otherwise.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 225617dd0a3f..1c715dae31e4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -120,6 +120,8 @@ typedef __u32 __bitwise req_flags_t;
 /* Look at ->special_vec for the actual data payload instead of the
    bio chain. */
 #define RQF_SPECIAL_PAYLOAD	((__force req_flags_t)(1 << 18))
+/* The per-zone write lock is held for this request */
+#define RQF_ZONE_WRITE_LOCKED	((__force req_flags_t)(1 << 19))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
@@ -546,6 +548,22 @@ struct request_queue {
 	struct queue_limits	limits;
 
 	/*
+	 * Zoned block device information for request dispatch control.
+	 * nr_zones is the total number of zones of the device. This is always
+	 * 0 for regular block devices. seq_zones_bitmap is a bitmap of nr_zones
+	 * bits which indicates if a zone is conventional (bit clear) or
+	 * sequential (bit set). seq_zones_wlock is a bitmap of nr_zones
+	 * bits which indicates if a zone is write locked, that is, if a write
+	 * request targeting the zone was dispatched. All three fields are
+	 * initialized by the low level device driver (e.g. scsi/sd.c).
+	 * Stacking drivers (device mappers) may or may not initialize
+	 * these fields.
+	 */
+	unsigned int		nr_zones;
+	unsigned long		*seq_zones_bitmap;
+	unsigned long		*seq_zones_wlock;
+
+	/*
 	 * sg stuff
 	 */
 	unsigned int		sg_timeout;
@@ -783,6 +801,27 @@ static inline unsigned int blk_queue_zone_sectors(struct request_queue *q)
 	return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0;
 }
 
+static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
+{
+	return q->nr_zones;
+}
+
+static inline unsigned int blk_queue_zone_no(struct request_queue *q,
+					     sector_t sector)
+{
+	if (!blk_queue_is_zoned(q))
+		return 0;
+	return sector >> ilog2(q->limits.chunk_sectors);
+}
+
+static inline bool blk_queue_zone_is_seq(struct request_queue *q,
+					 sector_t sector)
+{
+	if (!blk_queue_is_zoned(q) || !q->seq_zones_bitmap)
+		return false;
+	return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
+}
+
 static inline bool rq_is_sync(struct request *rq)
 {
 	return op_is_sync(rq->cmd_flags);
@@ -1019,6 +1058,16 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
 	return blk_rq_cur_bytes(rq) >> 9;
 }
 
+static inline unsigned int blk_rq_zone_no(struct request *rq)
+{
+	return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
+}
+
+static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
+{
+	return blk_queue_zone_is_seq(rq->q, blk_rq_pos(rq));
+}
+
 /*
  * Some commands like WRITE SAME have a payload or data transfer size which
  * is different from the size of the request.  Any driver that supports such
@@ -1568,7 +1617,15 @@ static inline unsigned int bdev_zone_sectors(struct block_device *bdev)
 
 	if (q)
 		return blk_queue_zone_sectors(q);
+	return 0;
+}
 
+static inline unsigned int bdev_nr_zones(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return blk_queue_nr_zones(q);
 	return 0;
 }
 
@@ -1944,6 +2001,60 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
+
+#ifdef CONFIG_BLK_DEV_ZONED
+bool blk_req_needs_zone_write_lock(struct request *rq);
+void __blk_req_zone_write_lock(struct request *rq);
+void __blk_req_zone_write_unlock(struct request *rq);
+
+static inline void blk_req_zone_write_lock(struct request *rq)
+{
+	if (blk_req_needs_zone_write_lock(rq))
+		__blk_req_zone_write_lock(rq);
+}
+
+static inline void blk_req_zone_write_unlock(struct request *rq)
+{
+	if (rq->rq_flags & RQF_ZONE_WRITE_LOCKED)
+		__blk_req_zone_write_unlock(rq);
+}
+
+static inline bool blk_req_zone_is_write_locked(struct request *rq)
+{
+	return rq->q->seq_zones_wlock &&
+		test_bit(blk_rq_zone_no(rq), rq->q->seq_zones_wlock);
+}
+
+static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
+{
+	if (!blk_req_needs_zone_write_lock(rq))
+		return true;
+	return !blk_req_zone_is_write_locked(rq);
+}
+#else
+static inline bool blk_req_needs_zone_write_lock(struct request *rq)
+{
+	return false;
+}
+
+static inline void blk_req_zone_write_lock(struct request *rq)
+{
+}
+
+static inline void blk_req_zone_write_unlock(struct request *rq)
+{
+}
+static inline bool blk_req_zone_is_write_locked(struct request *rq)
+{
+	return false;
+}
+
+static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
+{
+	return true;
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 #else /* CONFIG_BLOCK */
 
 struct block_device;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 2/7] mq-deadline: Introduce dispatch helpers
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 1/7] block: introduce zoned block devices zone write locking Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 3/7] mq-deadline: Introduce zone locking support Damien Le Moal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Avoid directly referencing the next_rq and fifo_list arrays by using the
helper functions deadline_next_request() and deadline_fifo_request() to
facilitate changes in the dispatch request selection in
__dd_dispatch_request() for zoned block devices.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/mq-deadline.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 0179e484ec98..8bd6db9e69c7 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -192,13 +192,42 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 }
 
 /*
+ * For the specified data direction, return the next request to
+ * dispatch using arrival ordered lists.
+ */
+static struct request *
+deadline_fifo_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	if (list_empty(&dd->fifo_list[data_dir]))
+		return NULL;
+
+	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+}
+
+/*
+ * For the specified data direction, return the next request to
+ * dispatch using sector position sorted lists.
+ */
+static struct request *
+deadline_next_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	return dd->next_rq[data_dir];
+}
+
+/*
  * deadline_dispatch_requests selects the best request according to
  * read/write expire, fifo_batch, etc
  */
 static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
-	struct request *rq;
+	struct request *rq, *next_rq;
 	bool reads, writes;
 	int data_dir;
 
@@ -214,10 +243,9 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	/*
 	 * batches are currently reads XOR writes
 	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
+	rq = deadline_next_request(dd, WRITE);
+	if (!rq)
+		rq = deadline_next_request(dd, READ);
 
 	if (rq && dd->batching < dd->fifo_batch)
 		/* we have a next request are still entitled to batch */
@@ -260,19 +288,20 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	/*
 	 * we are not running a batch, find best request for selected data_dir
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	next_rq = deadline_next_request(dd, data_dir);
+	if (deadline_check_fifo(dd, data_dir) || !next_rq) {
 		/*
 		 * A deadline has expired, the last request was in the other
 		 * direction, or we have run out of higher-sectored requests.
 		 * Start again from the request with the earliest expiry time.
 		 */
-		rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+		rq = deadline_fifo_request(dd, data_dir);
 	} else {
 		/*
 		 * The last req was the same dir and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = next_rq;
 	}
 
 	dd->batching = 0;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 3/7] mq-deadline: Introduce zone locking support
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 1/7] block: introduce zoned block devices zone write locking Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 2/7] mq-deadline: Introduce dispatch helpers Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 4/7] deadline-iosched: Introduce dispatch helpers Damien Le Moal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Introduce zone write locking to avoid write request reordering with
zoned block devices. This is achieved using a finer selection of the
next request to dispatch:
1) Any non-write request is always allowed to proceed.
2) Any write to a conventional zone is always allowed to proceed.
3) For a write to a sequential zone, the zone lock is first checked.
   a) If the zone is not locked, the write is allowed to proceed after
      its target zone is locked.
   b) If the zone is locked, the write request is skipped and the next
      request in the dispatch queue tested (back to step 1).

For a write request that has locked its target zone, the zone is
unlocked either when the request completes, with a call to the method
dd_completed_request(), or when the request is requeued using
dd_insert_request().

Requests targeting a locked zone are always left in the scheduler queue
to preserve the LBA ordering of write requests. If no write request
can be dispatched, allow reads to be dispatched even if the write batch
is not done.

If the device used is not a zoned block device, or if zoned block device
support is disabled, this patch does not modify mq-deadline behavior.
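
In rough pseudocode (dispatch() stands for the scheduler's existing dispatch
path; the literal changes are in the diff below), the per-request test of
steps 1 to 3 above is:

	if (blk_req_can_dispatch_to_zone(rq)) {
		/*
		 * Steps 1, 2 and 3a: non-write requests, writes to
		 * conventional zones and writes to unlocked sequential
		 * zones all proceed; the latter lock their target zone.
		 */
		blk_req_zone_write_lock(rq);
		dispatch(rq);
	} else {
		/* Step 3b: the target zone is locked, try the next request. */
	}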

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/mq-deadline.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 8bd6db9e69c7..d56972e8ebda 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -59,6 +59,7 @@ struct deadline_data {
 	int front_merges;
 
 	spinlock_t lock;
+	spinlock_t zone_lock;
 	struct list_head dispatch;
 };
 
@@ -198,13 +199,33 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 static struct request *
 deadline_fifo_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
 	if (list_empty(&dd->fifo_list[data_dir]))
 		return NULL;
 
-	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+	rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+	if (data_dir == READ || !blk_queue_is_zoned(rq->q))
+		return rq;
+
+	/*
+	 * Look for a write request that can be dispatched, that is one with
+	 * an unlocked target zone.
+	 */
+	spin_lock_irqsave(&dd->zone_lock, flags);
+	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
+		if (blk_req_can_dispatch_to_zone(rq))
+			goto out;
+	}
+	rq = NULL;
+out:
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -214,10 +235,32 @@ deadline_fifo_request(struct deadline_data *dd, int data_dir)
 static struct request *
 deadline_next_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+	unsigned long flags;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
-	return dd->next_rq[data_dir];
+	rq = dd->next_rq[data_dir];
+	if (!rq)
+		return NULL;
+
+	if (data_dir == READ || !blk_queue_is_zoned(rq->q))
+		return rq;
+
+	/*
+	 * Look for a write request that can be dispatched, that is one with
+	 * an unlocked target zone.
+	 */
+	spin_lock_irqsave(&dd->zone_lock, flags);
+	while (rq) {
+		if (blk_req_can_dispatch_to_zone(rq))
+			break;
+		rq = deadline_latter_request(rq);
+	}
+	spin_unlock_irqrestore(&dd->zone_lock, flags);
+
+	return rq;
 }
 
 /*
@@ -259,7 +302,8 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	if (reads) {
 		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
 
-		if (writes && (dd->starved++ >= dd->writes_starved))
+		if (deadline_fifo_request(dd, WRITE) &&
+		    (dd->starved++ >= dd->writes_starved))
 			goto dispatch_writes;
 
 		data_dir = READ;
@@ -304,6 +348,13 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		rq = next_rq;
 	}
 
+	/*
+	 * For a zoned block device, if we only have writes queued and none of
+	 * them can be dispatched, rq will be NULL.
+	 */
+	if (!rq)
+		return NULL;
+
 	dd->batching = 0;
 
 dispatch_request:
@@ -313,6 +364,10 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	dd->batching++;
 	deadline_move_request(dd, rq);
 done:
+	/*
+	 * If the request needs its target zone locked, do it.
+	 */
+	blk_req_zone_write_lock(rq);
 	rq->rq_flags |= RQF_STARTED;
 	return rq;
 }
@@ -368,6 +423,7 @@ static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
 	dd->front_merges = 1;
 	dd->fifo_batch = fifo_batch;
 	spin_lock_init(&dd->lock);
+	spin_lock_init(&dd->zone_lock);
 	INIT_LIST_HEAD(&dd->dispatch);
 
 	q->elevator = eq;
@@ -424,6 +480,12 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int data_dir = rq_data_dir(rq);
 
+	/*
+	 * This may be a requeue of a write request that has locked its
+	 * target zone. If it is the case, this releases the zone lock.
+	 */
+	blk_req_zone_write_unlock(rq);
+
 	if (blk_mq_sched_try_insert_merge(q, rq))
 		return;
 
@@ -468,6 +530,26 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
+/*
+ * For zoned block devices, write unlock the target zone of
+ * completed write requests. Do this while holding the zone lock
+ * spinlock so that the zone is never unlocked while deadline_fifo_request()
+ * or deadline_next_request() is executing.
+ */
+static void dd_completed_request(struct request *rq)
+{
+	struct request_queue *q = rq->q;
+
+	if (blk_queue_is_zoned(q)) {
+		struct deadline_data *dd = q->elevator->elevator_data;
+		unsigned long flags;
+
+		spin_lock_irqsave(&dd->zone_lock, flags);
+		blk_req_zone_write_unlock(rq);
+		spin_unlock_irqrestore(&dd->zone_lock, flags);
+	}
+}
+
 static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
@@ -669,6 +751,7 @@ static struct elevator_type mq_deadline = {
 	.ops.mq = {
 		.insert_requests	= dd_insert_requests,
 		.dispatch_request	= dd_dispatch_request,
+		.completed_request	= dd_completed_request,
 		.next_request		= elv_rb_latter_request,
 		.former_request		= elv_rb_former_request,
 		.bio_merge		= dd_bio_merge,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 4/7] deadline-iosched: Introduce dispatch helpers
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
                   ` (2 preceding siblings ...)
  2017-11-09  6:57 ` [PATCH V8 3/7] mq-deadline: Introduce zone locking support Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 5/7] deadline-iosched: Introduce zone locking support Damien Le Moal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Avoid directly referencing the next_rq and fifo_list arrays by using the
helper functions deadline_next_request() and deadline_fifo_request() to
facilitate changes in the dispatch request selection in
deadline_dispatch_requests() for zoned block devices.

While at it, also remove the unnecessary forward declaration of the
function deadline_move_request().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/deadline-iosched.c | 47 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index b83f77460d28..81e3f0897457 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -50,8 +50,6 @@ struct deadline_data {
 	int front_merges;
 };
 
-static void deadline_move_request(struct deadline_data *, struct request *);
-
 static inline struct rb_root *
 deadline_rb_root(struct deadline_data *dd, struct request *rq)
 {
@@ -231,6 +229,35 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 }
 
 /*
+ * For the specified data direction, return the next request to dispatch using
+ * arrival ordered lists.
+ */
+static struct request *
+deadline_fifo_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	if (list_empty(&dd->fifo_list[data_dir]))
+		return NULL;
+
+	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+}
+
+/*
+ * For the specified data direction, return the next request to dispatch using
+ * sector position sorted lists.
+ */
+static struct request *
+deadline_next_request(struct deadline_data *dd, int data_dir)
+{
+	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
+		return NULL;
+
+	return dd->next_rq[data_dir];
+}
+
+/*
  * deadline_dispatch_requests selects the best request according to
  * read/write expire, fifo_batch, etc
  */
@@ -239,16 +266,15 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int reads = !list_empty(&dd->fifo_list[READ]);
 	const int writes = !list_empty(&dd->fifo_list[WRITE]);
-	struct request *rq;
+	struct request *rq, *next_rq;
 	int data_dir;
 
 	/*
 	 * batches are currently reads XOR writes
 	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
+	rq = deadline_next_request(dd, WRITE);
+	if (!rq)
+		rq = deadline_next_request(dd, READ);
 
 	if (rq && dd->batching < dd->fifo_batch)
 		/* we have a next request are still entitled to batch */
@@ -291,19 +317,20 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	/*
 	 * we are not running a batch, find best request for selected data_dir
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	next_rq = deadline_next_request(dd, data_dir);
+	if (deadline_check_fifo(dd, data_dir) || !next_rq) {
 		/*
 		 * A deadline has expired, the last request was in the other
 		 * direction, or we have run out of higher-sectored requests.
 		 * Start again from the request with the earliest expiry time.
 		 */
-		rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+		rq = deadline_fifo_request(dd, data_dir);
 	} else {
 		/*
 		 * The last req was the same dir and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = next_rq;
 	}
 
 	dd->batching = 0;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 5/7] deadline-iosched: Introduce zone locking support
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
                   ` (3 preceding siblings ...)
  2017-11-09  6:57 ` [PATCH V8 4/7] deadline-iosched: Introduce dispatch helpers Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-09  6:57 ` [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data Damien Le Moal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Introduce zone write locking to avoid write request reordering with
zoned block devices. This is achieved using a finer selection of the
next request to dispatch:
1) Any non-write request is always allowed to proceed.
2) Any write to a conventional zone is always allowed to proceed.
3) For a write to a sequential zone, the zone lock is first checked.
   a) If the zone is not locked, the write is allowed to proceed after
      its target zone is locked.
   b) If the zone is locked, the write request is skipped and the next
      request in the dispatch queue tested (back to step 1).

For a write request that has locked its target zone, the zone is
unlocked either when the request completes and the method
deadline_completed_request() is called, or when the request is requeued
using the method deadline_add_request().

Requests targeting a locked zone are always left in the scheduler queue
to preserve the initial write order. If no write request can be
dispatched, allow reads to be dispatched even if the write batch is not
done.

If the device used is not a zoned block device, or if zoned block device
support is disabled, this patch does not modify deadline behavior.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/deadline-iosched.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 81e3f0897457..9de9f156e203 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -98,6 +98,12 @@ deadline_add_request(struct request_queue *q, struct request *rq)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int data_dir = rq_data_dir(rq);
 
+	/*
+	 * This may be a requeue of a write request that has locked its
+	 * target zone. If it is the case, this releases the zone lock.
+	 */
+	blk_req_zone_write_unlock(rq);
+
 	deadline_add_rq_rb(dd, rq);
 
 	/*
@@ -188,6 +194,12 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
 {
 	struct request_queue *q = rq->q;
 
+	/*
+	 * For a zoned block device, write requests must write lock their
+	 * target zone.
+	 */
+	blk_req_zone_write_lock(rq);
+
 	deadline_remove_request(q, rq);
 	elv_dispatch_add_tail(q, rq);
 }
@@ -235,13 +247,28 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 static struct request *
 deadline_fifo_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
 	if (list_empty(&dd->fifo_list[data_dir]))
 		return NULL;
 
-	return rq_entry_fifo(dd->fifo_list[data_dir].next);
+	rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+	if (data_dir == READ || !blk_queue_is_zoned(rq->q))
+		return rq;
+
+	/*
+	 * Look for a write request that can be dispatched, that is one with
+	 * an unlocked target zone.
+	 */
+	list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
+		if (blk_req_can_dispatch_to_zone(rq))
+			return rq;
+	}
+
+	return NULL;
 }
 
 /*
@@ -251,10 +278,29 @@ deadline_fifo_request(struct deadline_data *dd, int data_dir)
 static struct request *
 deadline_next_request(struct deadline_data *dd, int data_dir)
 {
+	struct request *rq;
+
 	if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
 		return NULL;
 
-	return dd->next_rq[data_dir];
+	rq = dd->next_rq[data_dir];
+	if (!rq)
+		return NULL;
+
+	if (data_dir == READ || !blk_queue_is_zoned(rq->q))
+		return rq;
+
+	/*
+	 * Look for a write request that can be dispatched, that is one with
+	 * an unlocked target zone.
+	 */
+	while (rq) {
+		if (blk_req_can_dispatch_to_zone(rq))
+			return rq;
+		rq = deadline_latter_request(rq);
+	}
+
+	return NULL;
 }
 
 /*
@@ -288,7 +334,8 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	if (reads) {
 		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
 
-		if (writes && (dd->starved++ >= dd->writes_starved))
+		if (deadline_fifo_request(dd, WRITE) &&
+		    (dd->starved++ >= dd->writes_starved))
 			goto dispatch_writes;
 
 		data_dir = READ;
@@ -333,6 +380,13 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 		rq = next_rq;
 	}
 
+	/*
+	 * For a zoned block device, if we only have writes queued and none of
+	 * them can be dispatched, rq will be NULL.
+	 */
+	if (!rq)
+		return 0;
+
 	dd->batching = 0;
 
 dispatch_request:
@@ -345,6 +399,16 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	return 1;
 }
 
+/*
+ * For zoned block devices, write unlock the target zone of completed
+ * write requests.
+ */
+static void
+deadline_completed_request(struct request_queue *q, struct request *rq)
+{
+	blk_req_zone_write_unlock(rq);
+}
+
 static void deadline_exit_queue(struct elevator_queue *e)
 {
 	struct deadline_data *dd = e->elevator_data;
@@ -466,6 +530,7 @@ static struct elevator_type iosched_deadline = {
 		.elevator_merged_fn =		deadline_merged_request,
 		.elevator_merge_req_fn =	deadline_merged_requests,
 		.elevator_dispatch_fn =		deadline_dispatch_requests,
+		.elevator_completed_req_fn =	deadline_completed_request,
 		.elevator_add_req_fn =		deadline_add_request,
 		.elevator_former_req_fn =	elv_rb_former_request,
 		.elevator_latter_req_fn =	elv_rb_latter_request,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
                   ` (4 preceding siblings ...)
  2017-11-09  6:57 ` [PATCH V8 5/7] deadline-iosched: Introduce zone locking support Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-21  2:36   ` Martin K. Petersen
  2017-11-09  6:57 ` [PATCH V8 7/7] sd: Remove zone write locking Damien Le Moal
  2017-11-24 23:48   ` Damien Le Moal
  7 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

Initialize the seq_zones_bitmap, seq_zones_wlock and nr_zones fields of
the disk request queue on disk revalidate. As the seq_zones_bitmap
and seq_zones_wlock allocations are identical, introduce the helper
sd_zbc_alloc_zone_bitmap(). Using this helper, reallocate the bitmaps
whenever the disk capacity (number of zones) changes.
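
To give a sense of the memory cost (numbers are illustrative): an 8 TB
host-managed disk with 256 MiB zones has 32768 zones, so each of the two
bitmaps occupies BITS_TO_LONGS(32768) * sizeof(unsigned long) = 4096 bytes
on a 64-bit system.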

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd_zbc.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 144 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 27793b9f54c0..c715b8363ce0 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -586,8 +586,123 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 	return 0;
 }
 
+/**
+ * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
+ * @sdkp: The disk of the bitmap
+ */
+static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+
+	return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
+			    * sizeof(unsigned long),
+			    GFP_KERNEL, q->node);
+}
+
+/**
+ * sd_zbc_get_seq_zones - Parse report zones reply to identify sequential zones
+ * @sdkp: disk used
+ * @buf: report reply buffer
+ * @seq_zones_bitmap: bitmap of sequential zones to set
+ *
+ * Parse reported zone descriptors in @buf to identify sequential zones and
+ * set the reported zone bit in @seq_zones_bitmap accordingly.
+ * Since read-only and offline zones cannot be written, do not
+ * mark them as sequential in the bitmap.
+ * Return the LBA after the last zone reported.
+ */
+static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
+				     unsigned int buflen,
+				     unsigned long *seq_zones_bitmap)
+{
+	sector_t lba, next_lba = sdkp->capacity;
+	unsigned int buf_len, list_length;
+	unsigned char *rec;
+	u8 type, cond;
+
+	list_length = get_unaligned_be32(&buf[0]) + 64;
+	buf_len = min(list_length, buflen);
+	rec = buf + 64;
+
+	while (rec < buf + buf_len) {
+		type = rec[0] & 0x0f;
+		cond = (rec[1] >> 4) & 0xf;
+		lba = get_unaligned_be64(&rec[16]);
+		if (type != ZBC_ZONE_TYPE_CONV &&
+		    cond != ZBC_ZONE_COND_READONLY &&
+		    cond != ZBC_ZONE_COND_OFFLINE)
+			set_bit(lba >> sdkp->zone_shift, seq_zones_bitmap);
+		next_lba = lba + get_unaligned_be64(&rec[8]);
+		rec += 64;
+	}
+
+	return next_lba;
+}
+
+/**
+ * sd_zbc_setup_seq_zones_bitmap - Initialize the disk seq zone bitmap.
+ * @sdkp: target disk
+ *
+ * Allocate a zone bitmap and initialize it by identifying sequential zones.
+ */
+static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned long *seq_zones_bitmap;
+	sector_t lba = 0;
+	unsigned char *buf;
+	int ret = -ENOMEM;
+
+	seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(sdkp);
+	if (!seq_zones_bitmap)
+		return -ENOMEM;
+
+	buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	while (lba < sdkp->capacity) {
+		ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, lba);
+		if (ret)
+			goto out;
+		lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
+					   seq_zones_bitmap);
+	}
+
+	if (lba != sdkp->capacity) {
+		/* Something went wrong */
+		ret = -EIO;
+	}
+
+out:
+	kfree(buf);
+	if (ret) {
+		kfree(seq_zones_bitmap);
+		return ret;
+	}
+
+	q->seq_zones_bitmap = seq_zones_bitmap;
+
+	return 0;
+}
+
+static void sd_zbc_cleanup(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+
+	kfree(q->seq_zones_bitmap);
+	q->seq_zones_bitmap = NULL;
+
+	kfree(q->seq_zones_wlock);
+	q->seq_zones_wlock = NULL;
+
+	q->nr_zones = 0;
+}
+
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
+	struct request_queue *q = sdkp->disk->queue;
+	int ret;
 
 	/* READ16/WRITE16 is mandatory for ZBC disks */
 	sdkp->device->use_16_for_rw = 1;
@@ -599,15 +714,36 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 	sdkp->nr_zones =
 		round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
-	if (!sdkp->zones_wlock) {
-		sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
-					    sizeof(unsigned long),
-					    GFP_KERNEL);
-		if (!sdkp->zones_wlock)
-			return -ENOMEM;
+	/*
+	 * Initialize the device request queue information if the number
+	 * of zones changed.
+	 */
+	if (sdkp->nr_zones != q->nr_zones) {
+
+		sd_zbc_cleanup(sdkp);
+
+		q->nr_zones = sdkp->nr_zones;
+		if (sdkp->nr_zones) {
+			q->seq_zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
+			if (!q->seq_zones_wlock) {
+				ret = -ENOMEM;
+				goto err;
+			}
+
+			ret = sd_zbc_setup_seq_zones_bitmap(sdkp);
+			if (ret) {
+				sd_zbc_cleanup(sdkp);
+				goto err;
+			}
+		}
+
 	}
 
 	return 0;
+
+err:
+	sd_zbc_cleanup(sdkp);
+	return ret;
 }
 
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
@@ -661,14 +797,14 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 
 err:
 	sdkp->capacity = 0;
+	sd_zbc_cleanup(sdkp);
 
 	return ret;
 }
 
 void sd_zbc_remove(struct scsi_disk *sdkp)
 {
-	kfree(sdkp->zones_wlock);
-	sdkp->zones_wlock = NULL;
+	sd_zbc_cleanup(sdkp);
 }
 
 void sd_zbc_print_zones(struct scsi_disk *sdkp)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V8 7/7] sd: Remove zone write locking
  2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
                   ` (5 preceding siblings ...)
  2017-11-09  6:57 ` [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data Damien Le Moal
@ 2017-11-09  6:57 ` Damien Le Moal
  2017-11-24 23:48   ` Damien Le Moal
  7 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-09  6:57 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe
  Cc: Christoph Hellwig, Bart Van Assche

The block layer now handles zone write locking.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/sd.c        | 41 +++---------------------
 drivers/scsi/sd.h        | 11 -------
 drivers/scsi/sd_zbc.c    | 83 ------------------------------------------------
 include/scsi/scsi_cmnd.h |  3 +-
 4 files changed, 6 insertions(+), 132 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fb9f8b5f4673..2d5b0f84ff14 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -835,16 +835,13 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
 	u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
-	int ret;
 
 	if (!(rq->cmd_flags & REQ_NOUNMAP)) {
 		switch (sdkp->zeroing_mode) {
 		case SD_ZERO_WS16_UNMAP:
-			ret = sd_setup_write_same16_cmnd(cmd, true);
-			goto out;
+			return sd_setup_write_same16_cmnd(cmd, true);
 		case SD_ZERO_WS10_UNMAP:
-			ret = sd_setup_write_same10_cmnd(cmd, true);
-			goto out;
+			return sd_setup_write_same10_cmnd(cmd, true);
 		}
 	}
 
@@ -852,15 +849,9 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
 		return BLKPREP_INVALID;
 
 	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
-		ret = sd_setup_write_same16_cmnd(cmd, false);
-	else
-		ret = sd_setup_write_same10_cmnd(cmd, false);
-
-out:
-	if (sd_is_zoned(sdkp) && ret == BLKPREP_OK)
-		return sd_zbc_write_lock_zone(cmd);
+		return sd_setup_write_same16_cmnd(cmd, false);
 
-	return ret;
+	return sd_setup_write_same10_cmnd(cmd, false);
 }
 
 static void sd_config_write_same(struct scsi_disk *sdkp)
@@ -928,12 +919,6 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
-	if (sd_is_zoned(sdkp)) {
-		ret = sd_zbc_write_lock_zone(cmd);
-		if (ret != BLKPREP_OK)
-			return ret;
-	}
-
 	sector >>= ilog2(sdp->sector_size) - 9;
 	nr_sectors >>= ilog2(sdp->sector_size) - 9;
 
@@ -968,9 +953,6 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 	ret = scsi_init_io(cmd);
 	rq->__data_len = nr_bytes;
 
-	if (sd_is_zoned(sdkp) && ret != BLKPREP_OK)
-		sd_zbc_write_unlock_zone(cmd);
-
 	return ret;
 }
 
@@ -1000,19 +982,12 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	sector_t threshold;
 	unsigned int this_count = blk_rq_sectors(rq);
 	unsigned int dif, dix;
-	bool zoned_write = sd_is_zoned(sdkp) && rq_data_dir(rq) == WRITE;
 	int ret;
 	unsigned char protect;
 
-	if (zoned_write) {
-		ret = sd_zbc_write_lock_zone(SCpnt);
-		if (ret != BLKPREP_OK)
-			return ret;
-	}
-
 	ret = scsi_init_io(SCpnt);
 	if (ret != BLKPREP_OK)
-		goto out;
+		return ret;
 	WARN_ON_ONCE(SCpnt != rq->special);
 
 	/* from here on until we're complete, any goto out
@@ -1231,9 +1206,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	 */
 	ret = BLKPREP_OK;
  out:
-	if (zoned_write && ret != BLKPREP_OK)
-		sd_zbc_write_unlock_zone(SCpnt);
-
 	return ret;
 }
 
@@ -1277,9 +1249,6 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
 
-	if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
-		sd_zbc_write_unlock_zone(SCpnt);
-
 	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
 		__free_page(rq->special_vec.bv_page);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 99c4dde9b6bf..112627c1cc85 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -76,7 +76,6 @@ struct scsi_disk {
 	unsigned int	nr_zones;
 	unsigned int	zone_blocks;
 	unsigned int	zone_shift;
-	unsigned long	*zones_wlock;
 	unsigned int	zones_optimal_open;
 	unsigned int	zones_optimal_nonseq;
 	unsigned int	zones_max_open;
@@ -282,8 +281,6 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
 extern void sd_zbc_remove(struct scsi_disk *sdkp);
 extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
-extern int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd);
-extern void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
 extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
@@ -301,14 +298,6 @@ static inline void sd_zbc_remove(struct scsi_disk *sdkp) {}
 
 static inline void sd_zbc_print_zones(struct scsi_disk *sdkp) {}
 
-static inline int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
-{
-	/* Let the drive fail requests */
-	return BLKPREP_OK;
-}
-
-static inline void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd) {}
-
 static inline int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 {
 	return BLKPREP_INVALID;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index c715b8363ce0..6c348a211ebb 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -230,17 +230,6 @@ static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
 }
 
 /**
- * sd_zbc_zone_no - Get the number of the zone conataining a sector.
- * @sdkp: The target disk
- * @sector: 512B sector address contained in the zone
- */
-static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
-					  sector_t sector)
-{
-	return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
-}
-
-/**
  * sd_zbc_setup_reset_cmnd - Prepare a RESET WRITE POINTER scsi command.
  * @cmd: the command to setup
  *
@@ -279,78 +268,6 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 }
 
 /**
- * sd_zbc_write_lock_zone - Write lock a sequential zone.
- * @cmd: write command
- *
- * Called from sd_init_cmd() for write requests (standard write, write same or
- * write zeroes operations). If the request target zone is not already locked,
- * the zone is locked and BLKPREP_OK returned, allowing the request to proceed
- * through dispatch in scsi_request_fn(). Otherwise, BLKPREP_DEFER is returned,
- * forcing the request to wait for the zone to be unlocked, that is, for the
- * previously issued write request targeting the same zone to complete.
- *
- * This is called from blk_peek_request() context with the queue lock held and
- * before the request is removed from the scheduler. As a result, multiple
- * contexts executing concurrently scsi_request_fn() cannot result in write
- * sequence reordering as only a single write request per zone is allowed to
- * proceed.
- */
-int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
-{
-	struct request *rq = cmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-	sector_t sector = blk_rq_pos(rq);
-	sector_t zone_sectors = sd_zbc_zone_sectors(sdkp);
-	unsigned int zno = sd_zbc_zone_no(sdkp, sector);
-
-	/*
-	 * Note: Checks of the alignment of the write command on
-	 * logical blocks is done in sd.c
-	 */
-
-	/* Do not allow zone boundaries crossing on host-managed drives */
-	if (blk_queue_zoned_model(sdkp->disk->queue) == BLK_ZONED_HM &&
-	    (sector & (zone_sectors - 1)) + blk_rq_sectors(rq) > zone_sectors)
-		return BLKPREP_KILL;
-
-	/*
-	 * Do not issue more than one write at a time per
-	 * zone. This solves write ordering problems due to
-	 * the unlocking of the request queue in the dispatch
-	 * path in the non scsi-mq case.
-	 */
-	if (sdkp->zones_wlock &&
-	    test_and_set_bit(zno, sdkp->zones_wlock))
-		return BLKPREP_DEFER;
-
-	WARN_ON_ONCE(cmd->flags & SCMD_ZONE_WRITE_LOCK);
-	cmd->flags |= SCMD_ZONE_WRITE_LOCK;
-
-	return BLKPREP_OK;
-}
-
-/**
- * sd_zbc_write_unlock_zone - Write unlock a sequential zone.
- * @cmd: write command
- *
- * Called from sd_uninit_cmd(). Unlocking the request target zone will allow
- * dispatching the next write request for the zone.
- */
-void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
-{
-	struct request *rq = cmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-
-	if (sdkp->zones_wlock && cmd->flags & SCMD_ZONE_WRITE_LOCK) {
-		unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
-		WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
-		cmd->flags &= ~SCMD_ZONE_WRITE_LOCK;
-		clear_bit_unlock(zno, sdkp->zones_wlock);
-		smp_mb__after_atomic();
-	}
-}
-
-/**
  * sd_zbc_complete - ZBC command post processing.
  * @cmd: Completed command
  * @good_bytes: Command reply bytes
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 3d3f8b342e05..04477648aa1a 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -57,8 +57,7 @@ struct scsi_pointer {
 /* for scmd->flags */
 #define SCMD_TAGGED		(1 << 0)
 #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
-#define SCMD_ZONE_WRITE_LOCK	(1 << 2)
-#define SCMD_INITIALIZED	(1 << 3)
+#define SCMD_INITIALIZED	(1 << 2)
 /* flags preserved across unprep / reprep */
 #define SCMD_PRESERVED_FLAGS	(SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED)
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data
  2017-11-09  6:57 ` [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data Damien Le Moal
@ 2017-11-21  2:36   ` Martin K. Petersen
  0 siblings, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2017-11-21  2:36 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, linux-block, Jens Axboe,
	Christoph Hellwig, Bart Van Assche


Damien,

> Initialize the seq_zones_bitmap, seq_zones_wlock and nr_zones fields
> of the disk request queue on disk revalidate. As the seq_zones_bitmap
> and seq_zones_wlock allocations are identical, introduce the helper
> sd_zbc_alloc_zone_bitmap(). Using this helper, reallocate the bitmaps
> whenever the disk capacity (number of zones) changes.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 0/7] blk-mq support for ZBC disks
@ 2017-11-24 23:48   ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-11-24 23:48 UTC (permalink / raw)
  To: linux-scsi, linux-block, martin.petersen, axboe; +Cc: hch, Bart Van Assche

[Full quote deleted]

Hi Jens,

Any comment regarding this series?
I understand that this would be for the 4.16 merge window, so no hurry,
but I would like to know if I need to go back to the drawing board for
ZBC blk-mq/scsi-mq support or if this is an acceptable solution.

Best regards.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 0/7] blk-mq support for ZBC disks
  2017-11-24 23:48   ` Damien Le Moal
  (?)
@ 2017-11-24 23:54   ` Jens Axboe
  2017-12-12  1:18       ` Damien Le Moal
  -1 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2017-11-24 23:54 UTC (permalink / raw)
  To: Damien Le Moal, linux-scsi, linux-block, martin.petersen
  Cc: hch, Bart Van Assche

On 11/24/2017 04:48 PM, Damien Le Moal wrote:
> [Full quote deleted]
> 
> Hi Jens,
> 
> Any comment regarding this series?
> I understand that this would be for the 4.16 merge window, so no hurry,
> but I would like to know if I need to go back to the drawing board for
> ZBC blk-mq/scsi-mq support or if this is an acceptable solution.

I'll give it a thorough look-over on Monday.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V8 0/7] blk-mq support for ZBC disks
@ 2017-12-12  1:18       ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2017-12-12  1:18 UTC (permalink / raw)
  To: linux-scsi, linux-block, martin.petersen, axboe; +Cc: hch, Bart Van Assche

Jens,

On Fri, 2017-11-24 at 16:54 -0700, Jens Axboe wrote:
> On 11/24/2017 04:48 PM, Damien Le Moal wrote:
> > [Full quote deleted]
> > 
> > Hi Jens,
> > 
> > Any comment regarding this series?
> > I understand that this would be for the 4.16 merge window, so no hurry,
> > but I would like to know if I need to go back to the drawing board for
> > ZBC blk-mq/scsi-mq support or if this is an acceptable solution.
> 
> I'll give it a thorough look-over on Monday.

Would you have any comment on the series?
I understand you are busy so please feel free to let me know if there is
anything I can do to facilitate your review.

Thank you.

-- 
Damien Le Moal
Western Digital

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-12-12  1:18 UTC | newest]

Thread overview: 14+ messages
-- links below jump to the message on this page --
2017-11-09  6:57 [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 1/7] block: introduce zoned block devices zone write locking Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 2/7] mq-deadline: Introduce dispatch helpers Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 3/7] mq-deadline: Introduce zone locking support Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 4/7] deadline-iosched: Introduce dispatch helpers Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 5/7] deadline-iosched: Introduce zone locking support Damien Le Moal
2017-11-09  6:57 ` [PATCH V8 6/7] sd_zbc: Initialize device request queue zoned data Damien Le Moal
2017-11-21  2:36   ` Martin K. Petersen
2017-11-09  6:57 ` [PATCH V8 7/7] sd: Remove zone write locking Damien Le Moal
2017-11-24 23:48 ` [PATCH V8 0/7] blk-mq support for ZBC disks Damien Le Moal
2017-11-24 23:54   ` Jens Axboe
2017-12-12  1:18     ` Damien Le Moal
