* [PATCH v2 00/12] Submit zoned writes in order
@ 2023-04-07 23:58 Bart Van Assche
  2023-04-07 23:58 ` [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler Bart Van Assche
                   ` (11 more replies)
  0 siblings, 12 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche

Hi Jens,

Tests with a zoned UFS prototype have shown that there are plenty of
opportunities for reordering in the block layer for zoned writes (REQ_OP_WRITE).
The UFS driver is more likely to trigger reordering than other SCSI drivers
because it reports BLK_STS_DEV_RESOURCE more often, e.g. during clock scaling.
This patch series makes sure that zoned writes are submitted in order without
affecting other workloads significantly.
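
For context, a write to a sequential zone is only accepted by the device if
it starts at the zone's write pointer, so any reordering between two writes
to the same zone results in an unaligned write error. The standalone sketch
below is an illustrative userspace model of that invariant, not kernel code;
the struct and function names in it are made up.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative model of a single sequential zone. */
struct zone_model {
	unsigned long long wp;	/* write pointer, in sectors */
};

/* The device only accepts a zoned write that starts at the write pointer. */
static bool submit_zoned_write(struct zone_model *z, unsigned long long pos,
			       unsigned int nr_sectors)
{
	if (pos != z->wp)
		return false;	/* reordered write: the device rejects it */
	z->wp += nr_sectors;
	return true;
}

int main(void)
{
	struct zone_model z = { .wp = 0 };

	/* In-order submission succeeds. */
	printf("%d\n", submit_zoned_write(&z, 0, 8));	/* prints 1 */
	/* A write that overtakes the one for sectors 8..15 fails. */
	printf("%d\n", submit_zoned_write(&z, 16, 8));	/* prints 0 */
	return 0;
}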

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v1:
- Fixed two issues detected by the kernel test robot.

Bart Van Assche (12):
  block: Send zoned writes to the I/O scheduler
  block: Send flush requests to the I/O scheduler
  block: Send requeued requests to the I/O scheduler
  block: Requeue requests if a CPU is unplugged
  block: One requeue list per hctx
  block: Preserve the order of requeued requests
  block: Make it easier to debug zoned write reordering
  block: mq-deadline: Simplify deadline_skip_seq_writes()
  block: mq-deadline: Disable head insertion for zoned writes
  block: mq-deadline: Introduce a local variable
  block: mq-deadline: Fix a race condition related to zoned writes
  block: mq-deadline: Handle requeued requests correctly

 block/blk-flush.c      |   3 +-
 block/blk-mq-debugfs.c |  66 +++++++++++------------
 block/blk-mq.c         | 115 +++++++++++++++++++++++------------------
 block/blk.h            |  19 +++++++
 block/mq-deadline.c    |  65 ++++++++++++++++++-----
 include/linux/blk-mq.h |  40 +++++++-------
 include/linux/blkdev.h |   4 --
 7 files changed, 194 insertions(+), 118 deletions(-)



* [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  7:42   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 02/12] block: Send flush requests " Bart Van Assche
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Send zoned writes inserted by the device mapper to the I/O scheduler.
This prevents zoned writes from being reordered if a device mapper driver
has been stacked on top of a driver for a zoned block device.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 16 +++++++++++++---
 block/blk.h    | 19 +++++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index db93b1a71157..fefc9a728e0e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3008,9 +3008,19 @@ blk_status_t blk_insert_cloned_request(struct request *rq)
 	blk_account_io_start(rq);
 
 	/*
-	 * Since we have a scheduler attached on the top device,
-	 * bypass a potential scheduler on the bottom device for
-	 * insert.
+	 * Send zoned writes to the I/O scheduler if an I/O scheduler has been
+	 * attached.
+	 */
+	if (q->elevator && blk_rq_is_seq_zoned_write(rq)) {
+		blk_mq_sched_insert_request(rq, /*at_head=*/false,
+					    /*run_queue=*/true,
+					    /*async=*/false);
+		return BLK_STS_OK;
+	}
+
+	/*
+	 * If no I/O scheduler has been attached or if the request is not a
+	 * zoned write bypass the I/O scheduler attached to the bottom device.
 	 */
 	blk_mq_run_dispatch_ops(q,
 			ret = blk_mq_request_issue_directly(rq, true));
diff --git a/block/blk.h b/block/blk.h
index d65d96994a94..4b6f8d7a6b84 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -118,6 +118,25 @@ static inline bool bvec_gap_to_prev(const struct queue_limits *lim,
 	return __bvec_gap_to_prev(lim, bprv, offset);
 }
 
+/**
+ * blk_rq_is_seq_zoned_write() - Whether @rq is a write request for a sequential zone.
+ * @rq: Request to examine.
+ *
+ * In this context, a sequential zone means either a sequential write required
+ * zone or a sequential write preferred zone.
+ */
+static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
+{
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_ZEROES:
+		return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
+	case REQ_OP_ZONE_APPEND:
+	default:
+		return false;
+	}
+}
+
 static inline bool rq_mergeable(struct request *rq)
 {
 	if (blk_rq_is_passthrough(rq))

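As an aside, REQ_OP_ZONE_APPEND is deliberately not treated as a sequential
zoned write above: for a zone append the device writes the data at the zone's
write pointer and reports the actual location back on completion, so zone
append commands do not depend on submission order. A standalone sketch of
that behaviour (illustrative userspace code, not the kernel implementation):

#include <stdio.h>

/* Illustrative model: the device, not the submitter, picks the location. */
struct zone_model {
	unsigned long long wp;	/* write pointer, in sectors */
};

static unsigned long long zone_append(struct zone_model *z,
				      unsigned int nr_sectors)
{
	unsigned long long written_at = z->wp;

	z->wp += nr_sectors;
	return written_at;	/* reported back on completion */
}

int main(void)
{
	struct zone_model z = { .wp = 0 };

	/* Both appends succeed no matter in which order they arrive; only
	 * the reported locations differ. */
	printf("first append written at sector %llu\n", zone_append(&z, 8));
	printf("second append written at sector %llu\n", zone_append(&z, 8));
	return 0;
}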

* [PATCH v2 02/12] block: Send flush requests to the I/O scheduler
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
  2023-04-07 23:58 ` [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  7:46   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Prevent zoned writes with the FUA flag set from being reordered against
each other or against other zoned writes. Separate the I/O scheduler
members from the flush members in struct request since, with this patch
applied, a request may pass through both an I/O scheduler and the flush
machinery.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-flush.c      |  3 ++-
 block/blk-mq.c         | 11 ++++-------
 block/mq-deadline.c    |  2 +-
 include/linux/blk-mq.h | 27 +++++++++++----------------
 4 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 53202eff545e..e0cf153388d8 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -432,7 +432,8 @@ void blk_insert_flush(struct request *rq)
 	 */
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-		blk_mq_request_bypass_insert(rq, false, true);
+		blk_mq_sched_insert_request(rq, /*at_head=*/false,
+					    /*run_queue=*/true, /*async=*/true);
 		return;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fefc9a728e0e..250556546bbf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -390,8 +390,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		INIT_HLIST_NODE(&rq->hash);
 		RB_CLEAR_NODE(&rq->rb_node);
 
-		if (!op_is_flush(data->cmd_flags) &&
-		    e->type->ops.prepare_request) {
+		if (e->type->ops.prepare_request) {
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
 		}
@@ -452,13 +451,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		data->rq_flags |= RQF_ELV;
 
 		/*
-		 * Flush/passthrough requests are special and go directly to the
-		 * dispatch list. Don't include reserved tags in the
-		 * limiting, as it isn't useful.
+		 * Do not limit the depth for passthrough requests nor for
+		 * requests with a reserved tag.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
+		if (e->type->ops.limit_depth &&
 		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
 		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
 	}
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f10c2a0d18d4..d885ccf49170 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -789,7 +789,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	per_prio = &dd->per_prio[prio];
-	if (!rq->elv.priv[0]) {
+	if (!rq->elv.priv[0] && !(rq->rq_flags & RQF_FLUSH_SEQ)) {
 		per_prio->stats.inserted++;
 		rq->elv.priv[0] = (void *)(uintptr_t)1;
 	}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..5e6c79ad83d2 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,25 +169,20 @@ struct request {
 		void *completion_data;
 	};
 
-
 	/*
 	 * Three pointers are available for the IO schedulers, if they need
-	 * more they have to dynamically allocate it.  Flush requests are
-	 * never put on the IO scheduler. So let the flush fields share
-	 * space with the elevator data.
+	 * more they have to dynamically allocate it.
 	 */
-	union {
-		struct {
-			struct io_cq		*icq;
-			void			*priv[2];
-		} elv;
-
-		struct {
-			unsigned int		seq;
-			struct list_head	list;
-			rq_end_io_fn		*saved_end_io;
-		} flush;
-	};
+	struct {
+		struct io_cq		*icq;
+		void			*priv[2];
+	} elv;
+
+	struct {
+		unsigned int		seq;
+		struct list_head	list;
+		rq_end_io_fn		*saved_end_io;
+	} flush;
 
 	union {
 		struct __call_single_data csd;

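The motivation for splitting the union can be shown with a standalone sketch
(the types below are simplified stand-ins, not the kernel definitions): once
a request may pass through both an I/O scheduler and the flush machinery,
keeping the elevator data and the flush data in overlapping storage would let
one code path clobber the other.

#include <stdio.h>

/* Simplified stand-ins for the elevator and flush members of struct request. */
struct elv_data { void *priv[2]; };
struct flush_data { unsigned int seq; };

/* Old layout: elevator data and flush data share storage. */
struct rq_shared {
	union {
		struct elv_data elv;
		struct flush_data flush;
	};
};

/* New layout: both sets of members may be live during one request lifetime. */
struct rq_split {
	struct elv_data elv;
	struct flush_data flush;
};

int main(void)
{
	struct rq_shared a = { .elv = { .priv = { (void *)0x1 } } };
	struct rq_split b = { .elv = { .priv = { (void *)0x1 } } };

	a.flush.seq = 42;	/* overwrites part of a.elv.priv[0] */
	b.flush.seq = 42;	/* b.elv.priv[0] is untouched */

	printf("shared: %p, split: %p\n", a.elv.priv[0], b.elv.priv[0]);
	return 0;
}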

* [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
  2023-04-07 23:58 ` [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler Bart Van Assche
  2023-04-07 23:58 ` [PATCH v2 02/12] block: Send flush requests " Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  7:53   ` Damien Le Moal
                     ` (2 more replies)
  2023-04-07 23:58 ` [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged Bart Van Assche
                   ` (8 subsequent siblings)
  11 siblings, 3 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Let the I/O scheduler control which requests are dispatched.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 21 +++++++++------------
 include/linux/blk-mq.h |  5 +++--
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 250556546bbf..57315395434b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1426,15 +1426,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		/*
-		 * If RQF_DONTPREP, rq has contained some driver specific
-		 * data, so insert it to hctx dispatch list to avoid any
-		 * merge.
-		 */
-		if (rq->rq_flags & RQF_DONTPREP)
-			blk_mq_request_bypass_insert(rq, false, false);
-		else
-			blk_mq_sched_insert_request(rq, true, false, false);
+		blk_mq_sched_insert_request(rq, /*at_head=*/true, false, false);
 	}
 
 	while (!list_empty(&rq_list)) {
@@ -2065,9 +2057,14 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		if (nr_budgets)
 			blk_mq_release_budgets(q, list);
 
-		spin_lock(&hctx->lock);
-		list_splice_tail_init(list, &hctx->dispatch);
-		spin_unlock(&hctx->lock);
+		if (!q->elevator) {
+			spin_lock(&hctx->lock);
+			list_splice_tail_init(list, &hctx->dispatch);
+			spin_unlock(&hctx->lock);
+		} else {
+			q->elevator->type->ops.insert_requests(hctx, list,
+							/*at_head=*/true);
+		}
 
 		/*
 		 * Order adding requests to hctx->dispatch and checking
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5e6c79ad83d2..3a3bee9085e3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -64,8 +64,9 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_RESV			((__force req_flags_t)(1 << 23))
 
 /* flags that prevent us from merging requests: */
-#define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+#define RQF_NOMERGE_FLAGS                                               \
+	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_DONTPREP | \
+	 RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,


* [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  7:54   ` Damien Le Moal
  2023-04-11 12:40   ` Christoph Hellwig
  2023-04-07 23:58 ` [PATCH v2 05/12] block: One requeue list per hctx Bart Van Assche
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Requeue requests instead of sending them to the dispatch list if a CPU
is unplugged, to prevent reordering of zoned writes.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 57315395434b..77fdaed4e074 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3495,9 +3495,17 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	if (list_empty(&tmp))
 		return 0;
 
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+	if (hctx->queue->elevator) {
+		struct request *rq, *next;
+
+		list_for_each_entry_safe(rq, next, &tmp, queuelist)
+			blk_mq_requeue_request(rq, false);
+		blk_mq_kick_requeue_list(hctx->queue);
+	} else {
+		spin_lock(&hctx->lock);
+		list_splice_tail_init(&tmp, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
+	}
 
 	blk_mq_run_hw_queue(hctx, true);
 	return 0;


* [PATCH v2 05/12] block: One requeue list per hctx
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  7:58   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 06/12] block: Preserve the order of requeued requests Bart Van Assche
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-debugfs.c | 66 +++++++++++++++++++++---------------------
 block/blk-mq.c         | 58 +++++++++++++++++++++++--------------
 include/linux/blk-mq.h |  4 +++
 include/linux/blkdev.h |  4 ---
 4 files changed, 73 insertions(+), 59 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 212a7f301e73..5eb930754347 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -20,37 +20,6 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
 	return 0;
 }
 
-static void *queue_requeue_list_start(struct seq_file *m, loff_t *pos)
-	__acquires(&q->requeue_lock)
-{
-	struct request_queue *q = m->private;
-
-	spin_lock_irq(&q->requeue_lock);
-	return seq_list_start(&q->requeue_list, *pos);
-}
-
-static void *queue_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
-{
-	struct request_queue *q = m->private;
-
-	return seq_list_next(v, &q->requeue_list, pos);
-}
-
-static void queue_requeue_list_stop(struct seq_file *m, void *v)
-	__releases(&q->requeue_lock)
-{
-	struct request_queue *q = m->private;
-
-	spin_unlock_irq(&q->requeue_lock);
-}
-
-static const struct seq_operations queue_requeue_list_seq_ops = {
-	.start	= queue_requeue_list_start,
-	.next	= queue_requeue_list_next,
-	.stop	= queue_requeue_list_stop,
-	.show	= blk_mq_debugfs_rq_show,
-};
-
 static int blk_flags_show(struct seq_file *m, const unsigned long flags,
 			  const char *const *flag_name, int flag_name_count)
 {
@@ -156,11 +125,10 @@ static ssize_t queue_state_write(void *data, const char __user *buf,
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{ "poll_stat", 0400, queue_poll_stat_show },
-	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
 	{ "pm_only", 0600, queue_pm_only_show, NULL },
 	{ "state", 0600, queue_state_show, queue_state_write },
 	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
-	{ },
+	{},
 };
 
 #define HCTX_STATE_NAME(name) [BLK_MQ_S_##name] = #name
@@ -513,6 +481,37 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
 	return 0;
 }
 
+static void *hctx_requeue_list_start(struct seq_file *m, loff_t *pos)
+	__acquires(&hctx->requeue_lock)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	spin_lock_irq(&hctx->requeue_lock);
+	return seq_list_start(&hctx->requeue_list, *pos);
+}
+
+static void *hctx_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	return seq_list_next(v, &hctx->requeue_list, pos);
+}
+
+static void hctx_requeue_list_stop(struct seq_file *m, void *v)
+	__releases(&hctx->requeue_lock)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	spin_unlock_irq(&hctx->requeue_lock);
+}
+
+static const struct seq_operations hctx_requeue_list_seq_ops = {
+	.start = hctx_requeue_list_start,
+	.next = hctx_requeue_list_next,
+	.stop = hctx_requeue_list_stop,
+	.show = blk_mq_debugfs_rq_show,
+};
+
 #define CTX_RQ_SEQ_OPS(name, type)					\
 static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
 	__acquires(&ctx->lock)						\
@@ -628,6 +627,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"run", 0600, hctx_run_show, hctx_run_write},
 	{"active", 0400, hctx_active_show},
 	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
+	{"requeue_list", 0400, .seq_ops = &hctx_requeue_list_seq_ops},
 	{"type", 0400, hctx_type_show},
 	{},
 };
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 77fdaed4e074..deb3d08a6b26 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1411,14 +1411,17 @@ EXPORT_SYMBOL(blk_mq_requeue_request);
 
 static void blk_mq_requeue_work(struct work_struct *work)
 {
-	struct request_queue *q =
-		container_of(work, struct request_queue, requeue_work.work);
+	struct blk_mq_hw_ctx *hctx =
+		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
 	LIST_HEAD(rq_list);
 	struct request *rq, *next;
 
-	spin_lock_irq(&q->requeue_lock);
-	list_splice_init(&q->requeue_list, &rq_list);
-	spin_unlock_irq(&q->requeue_lock);
+	if (list_empty_careful(&hctx->requeue_list))
+		return;
+
+	spin_lock_irq(&hctx->requeue_lock);
+	list_splice_init(&hctx->requeue_list, &rq_list);
+	spin_unlock_irq(&hctx->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
 		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
@@ -1435,13 +1438,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_sched_insert_request(rq, false, false, false);
 	}
 
-	blk_mq_run_hw_queues(q, false);
+	blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list)
 {
-	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	unsigned long flags;
 
 	/*
@@ -1449,31 +1452,42 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 	 * request head insertion from the workqueue.
 	 */
 	BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
+	WARN_ON_ONCE(!rq->mq_hctx);
 
-	spin_lock_irqsave(&q->requeue_lock, flags);
+	spin_lock_irqsave(&hctx->requeue_lock, flags);
 	if (at_head) {
 		rq->rq_flags |= RQF_SOFTBARRIER;
-		list_add(&rq->queuelist, &q->requeue_list);
+		list_add(&rq->queuelist, &hctx->requeue_list);
 	} else {
-		list_add_tail(&rq->queuelist, &q->requeue_list);
+		list_add_tail(&rq->queuelist, &hctx->requeue_list);
 	}
-	spin_unlock_irqrestore(&q->requeue_lock, flags);
+	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
 
 	if (kick_requeue_list)
-		blk_mq_kick_requeue_list(q);
+		blk_mq_kick_requeue_list(rq->q);
 }
 
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
-	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
+					    &hctx->requeue_work, 0);
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
 void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 				    unsigned long msecs)
 {
-	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
-				    msecs_to_jiffies(msecs));
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
+					    &hctx->requeue_work,
+					    msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
@@ -3594,6 +3608,10 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
+	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(&hctx->requeue_list);
+	spin_lock_init(&hctx->requeue_lock);
+
 	hctx->queue_num = hctx_idx;
 
 	if (!(hctx->flags & BLK_MQ_F_STACKING))
@@ -4209,10 +4227,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 	blk_mq_update_poll_flag(q);
 
-	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
-	INIT_LIST_HEAD(&q->requeue_list);
-	spin_lock_init(&q->requeue_lock);
-
 	q->nr_requests = set->queue_depth;
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
@@ -4757,10 +4771,10 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	cancel_delayed_work_sync(&q->requeue_work);
-
-	queue_for_each_hw_ctx(q, hctx, i)
+	queue_for_each_hw_ctx(q, hctx, i) {
+		cancel_delayed_work_sync(&hctx->requeue_work);
 		cancel_delayed_work_sync(&hctx->run_work);
+	}
 }
 
 static int __init blk_mq_init(void)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3a3bee9085e3..0157f1569980 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -311,6 +311,10 @@ struct blk_mq_hw_ctx {
 		unsigned long		state;
 	} ____cacheline_aligned_in_smp;
 
+	struct list_head	requeue_list;
+	spinlock_t		requeue_lock;
+	struct delayed_work	requeue_work;
+
 	/**
 	 * @run_work: Used for scheduling a hardware queue run at a later time.
 	 */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e3242e67a8e3..f5fa53cd13bd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -491,10 +491,6 @@ struct request_queue {
 	 */
 	struct blk_flush_queue	*fq;
 
-	struct list_head	requeue_list;
-	spinlock_t		requeue_lock;
-	struct delayed_work	requeue_work;
-
 	struct mutex		sysfs_lock;
 	struct mutex		sysfs_dir_lock;
 


* [PATCH v2 06/12] block: Preserve the order of requeued requests
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 05/12] block: One requeue list per hctx Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:01   ` Damien Le Moal
  2023-04-11 12:43   ` Christoph Hellwig
  2023-04-07 23:58 ` [PATCH v2 07/12] block: Make it easier to debug zoned write reordering Bart Van Assche
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

If a queue is run before all requeued requests have been sent to the I/O
scheduler, the I/O scheduler may dispatch the wrong request. Fix this by
making __blk_mq_run_hw_queue() process the requeue_list instead of
blk_mq_requeue_work().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 35 ++++++++++-------------------------
 include/linux/blk-mq.h |  1 -
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index deb3d08a6b26..562868dff43f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -64,6 +64,7 @@ static inline blk_qc_t blk_rq_to_qc(struct request *rq)
 static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
 	return !list_empty_careful(&hctx->dispatch) ||
+		!list_empty_careful(&hctx->requeue_list) ||
 		sbitmap_any_bit_set(&hctx->ctx_map) ||
 			blk_mq_sched_has_work(hctx);
 }
@@ -1409,10 +1410,8 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
-static void blk_mq_requeue_work(struct work_struct *work)
+static void blk_mq_process_requeue_list(struct blk_mq_hw_ctx *hctx)
 {
-	struct blk_mq_hw_ctx *hctx =
-		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
 	LIST_HEAD(rq_list);
 	struct request *rq, *next;
 
@@ -1437,8 +1436,6 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		list_del_init(&rq->queuelist);
 		blk_mq_sched_insert_request(rq, false, false, false);
 	}
-
-	blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
@@ -1464,30 +1461,19 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
 
 	if (kick_requeue_list)
-		blk_mq_kick_requeue_list(rq->q);
+		blk_mq_run_hw_queue(hctx, /*async=*/true);
 }
 
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned long i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
-					    &hctx->requeue_work, 0);
+	blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
 void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 				    unsigned long msecs)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned long i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
-					    &hctx->requeue_work,
-					    msecs_to_jiffies(msecs));
+	blk_mq_delay_run_hw_queues(q, msecs);
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
@@ -2146,6 +2132,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	WARN_ON_ONCE(in_interrupt());
 
+	blk_mq_process_requeue_list(hctx);
+
 	blk_mq_run_dispatch_ops(hctx->queue,
 			blk_mq_sched_dispatch_requests(hctx));
 }
@@ -2317,7 +2305,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 		 * scheduler.
 		 */
 		if (!sq_hctx || sq_hctx == hctx ||
-		    !list_empty_careful(&hctx->dispatch))
+		    blk_mq_hctx_has_pending(hctx))
 			blk_mq_run_hw_queue(hctx, async);
 	}
 }
@@ -2353,7 +2341,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
 		 * scheduler.
 		 */
 		if (!sq_hctx || sq_hctx == hctx ||
-		    !list_empty_careful(&hctx->dispatch))
+		    blk_mq_hctx_has_pending(hctx))
 			blk_mq_delay_run_hw_queue(hctx, msecs);
 	}
 }
@@ -3608,7 +3596,6 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
-	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
 	INIT_LIST_HEAD(&hctx->requeue_list);
 	spin_lock_init(&hctx->requeue_lock);
 
@@ -4771,10 +4758,8 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		cancel_delayed_work_sync(&hctx->requeue_work);
+	queue_for_each_hw_ctx(q, hctx, i)
 		cancel_delayed_work_sync(&hctx->run_work);
-	}
 }
 
 static int __init blk_mq_init(void)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0157f1569980..e62feb17af96 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -313,7 +313,6 @@ struct blk_mq_hw_ctx {
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-	struct delayed_work	requeue_work;
 
 	/**
 	 * @run_work: Used for scheduling a hardware queue run at a later time.


* [PATCH v2 07/12] block: Make it easier to debug zoned write reordering
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 06/12] block: Preserve the order of requeued requests Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:06   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Issue a kernel warning if reordering could happen.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 562868dff43f..d89a0e6cf37d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2478,6 +2478,8 @@ void blk_mq_request_bypass_insert(struct request *rq, bool at_head,
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
+	WARN_ON_ONCE(rq->q->elevator && blk_rq_is_seq_zoned_write(rq));
+
 	spin_lock(&hctx->lock);
 	if (at_head)
 		list_add(&rq->queuelist, &hctx->dispatch);
@@ -2570,6 +2572,8 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	bool run_queue = true;
 	int budget_token;
 
+	WARN_ON_ONCE(q->elevator && blk_rq_is_seq_zoned_write(rq));
+
 	/*
 	 * RCU or SRCU read lock is needed before checking quiesced flag.
 	 *


* [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes()
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (6 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 07/12] block: Make it easier to debug zoned write reordering Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:07   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes Bart Van Assche
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Make deadline_skip_seq_writes() shorter without changing its
functionality.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index d885ccf49170..50a9d3b0a291 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -312,12 +312,9 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 						struct request *rq)
 {
 	sector_t pos = blk_rq_pos(rq);
-	sector_t skipped_sectors = 0;
 
-	while (rq) {
-		if (blk_rq_pos(rq) != pos + skipped_sectors)
-			break;
-		skipped_sectors += blk_rq_sectors(rq);
+	while (rq && blk_rq_pos(rq) == pos) {
+		pos += blk_rq_sectors(rq);
 		rq = deadline_latter_request(rq);
 	}
 

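For what it's worth, the equivalence of the two loop formulations can be
checked with a small standalone model; the array-based request list below is
illustrative only and not the mq-deadline data structures.

#include <assert.h>
#include <stdio.h>

/* Illustrative request: starting sector and size in sectors. */
struct rq { unsigned long long pos; unsigned int sectors; };

/* Original formulation: track the number of sectors skipped so far. */
static int skip_old(const struct rq *rqs, int n, int i)
{
	unsigned long long pos = rqs[i].pos, skipped = 0;

	while (i < n) {
		if (rqs[i].pos != pos + skipped)
			break;
		skipped += rqs[i].sectors;
		i++;
	}
	return i;
}

/* Simplified formulation: advance the expected position directly. */
static int skip_new(const struct rq *rqs, int n, int i)
{
	unsigned long long pos = rqs[i].pos;

	while (i < n && rqs[i].pos == pos) {
		pos += rqs[i].sectors;
		i++;
	}
	return i;
}

int main(void)
{
	/* Two contiguous writes followed by one that leaves a gap. */
	const struct rq rqs[] = { { 0, 8 }, { 8, 8 }, { 32, 8 } };

	assert(skip_old(rqs, 3, 0) == skip_new(rqs, 3, 0));
	printf("both formulations skip to index %d\n", skip_new(rqs, 3, 0));
	return 0;
}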

* [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (7 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:10   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 10/12] block: mq-deadline: Introduce a local variable Bart Van Assche
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Make sure that zoned writes are submitted in LBA order.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 50a9d3b0a291..891ee0da73ac 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -798,7 +798,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	trace_block_rq_insert(rq);
 
-	if (at_head) {
+	if (at_head && !blk_rq_is_seq_zoned_write(rq)) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 		rq->fifo_time = jiffies;
 	} else {

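To illustrate why head insertion breaks LBA ordering: if two writes for the
same zone are both inserted at the head of a dispatch list, the later (higher
LBA) write ends up being dispatched first. A toy model with an array standing
in for the dispatch list (illustrative only, not kernel code):

#include <stdio.h>

#define MAX 8

/* Toy dispatch list: requests are dispatched from index 0 upward. */
static unsigned long long list[MAX];
static int len;

static void insert_head(unsigned long long lba)
{
	for (int i = len; i > 0; i--)
		list[i] = list[i - 1];
	list[0] = lba;
	len++;
}

static void insert_tail(unsigned long long lba)
{
	list[len++] = lba;
}

int main(void)
{
	/* Head insertion puts the write for LBA 8 in front of the one for
	 * LBA 0. */
	insert_head(0);
	insert_head(8);
	printf("head insertion dispatches %llu before %llu\n",
	       list[0], list[1]);

	/* Tail insertion preserves the submission order. */
	len = 0;
	insert_tail(0);
	insert_tail(8);
	printf("tail insertion dispatches %llu before %llu\n",
	       list[0], list[1]);
	return 0;
}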

* [PATCH v2 10/12] block: mq-deadline: Introduce a local variable
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (8 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:11   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes Bart Van Assche
  2023-04-07 23:58 ` [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Prepare for adding more code that uses the request queue pointer.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 891ee0da73ac..8c2bc9fdcf8c 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -368,6 +368,7 @@ static struct request *
 deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		      enum dd_data_dir data_dir)
 {
+	struct request_queue *q;
 	struct request *rq;
 	unsigned long flags;
 
@@ -375,7 +376,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	q = rq->q;
+	if (data_dir == DD_READ || !blk_queue_is_zoned(q))
 		return rq;
 
 	/*
@@ -389,7 +391,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	while (rq) {
 		if (blk_req_can_dispatch_to_zone(rq))
 			break;
-		if (blk_queue_nonrot(rq->q))
+		if (blk_queue_nonrot(q))
 			rq = deadline_latter_request(rq);
 		else
 			rq = deadline_skip_seq_writes(dd, rq);


* [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (9 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 10/12] block: mq-deadline: Introduce a local variable Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:16   ` Damien Le Moal
  2023-04-07 23:58 ` [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

Let deadline_next_request() only consider the first zoned write per
zone. This patch fixes a race condition between deadline_next_request()
and completion of zoned writes.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c    | 24 +++++++++++++++++++++---
 include/linux/blk-mq.h |  5 +++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 8c2bc9fdcf8c..d49e20d3011d 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -389,12 +389,30 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	 */
 	spin_lock_irqsave(&dd->zone_lock, flags);
 	while (rq) {
+		unsigned int zno = blk_rq_zone_no(rq);
+
 		if (blk_req_can_dispatch_to_zone(rq))
 			break;
-		if (blk_queue_nonrot(q))
-			rq = deadline_latter_request(rq);
-		else
+
+		WARN_ON_ONCE(!blk_queue_is_zoned(q));
+
+		if (!blk_queue_nonrot(q)) {
 			rq = deadline_skip_seq_writes(dd, rq);
+			if (!rq)
+				break;
+			rq = deadline_earlier_request(rq);
+			if (WARN_ON_ONCE(!rq))
+				break;
+		}
+
+		/*
+		 * Skip all other write requests for the zone with zone number
+		 * 'zno'. This prevents that this function selects a zoned write
+		 * that is not the first write for a given zone.
+		 */
+		while ((rq = deadline_latter_request(rq)) &&
+		       blk_rq_zone_no(rq) == zno)
+			;
 	}
 	spin_unlock_irqrestore(&dd->zone_lock, flags);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e62feb17af96..515dfd04d736 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1193,6 +1193,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
 	return !blk_req_zone_is_write_locked(rq);
 }
 #else /* CONFIG_BLK_DEV_ZONED */
+static inline unsigned int blk_rq_zone_no(struct request *rq)
+{
+	return 0;
+}
+
 static inline bool blk_req_needs_zone_write_lock(struct request *rq)
 {
 	return false;

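The intent can be modelled with a standalone sketch: when scanning a
position-sorted list of pending zoned writes for one that can be dispatched,
any write that is not the first pending write of its zone is skipped. The
types below are illustrative, assume that writes for the same zone are
adjacent in the list, and are not the mq-deadline data structures.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative pending write: zone number and whether its zone is locked. */
struct pending { unsigned int zone; bool zone_locked; };

/*
 * Return the index of the first dispatchable write while only considering
 * the first pending write of each zone; later writes for the same zone must
 * not be dispatched before it anyway.
 */
static int pick_next(const struct pending *p, int n)
{
	for (int i = 0; i < n; i++) {
		/* Skip writes that are not the first one for their zone. */
		if (i > 0 && p[i].zone == p[i - 1].zone)
			continue;
		if (!p[i].zone_locked)
			return i;
	}
	return -1;
}

int main(void)
{
	/*
	 * Zone 0 has a write in flight, so neither of its pending writes may
	 * be picked; the search moves on to the write for zone 1.
	 */
	const struct pending p[] = {
		{ .zone = 0, .zone_locked = true },
		{ .zone = 0, .zone_locked = true },
		{ .zone = 1, .zone_locked = false },
	};

	printf("next dispatchable write: index %d\n", pick_next(p, 3));
	return 0;
}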

* [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly
  2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
                   ` (10 preceding siblings ...)
  2023-04-07 23:58 ` [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes Bart Van Assche
@ 2023-04-07 23:58 ` Bart Van Assche
  2023-04-10  8:32   ` Damien Le Moal
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-07 23:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal, Ming Lei, Mike Snitzer

If a zoned write is requeued with an LBA that is lower than the LBAs of
zoned writes that have already been inserted, make sure that it is
submitted first.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index d49e20d3011d..c536b499a60f 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -162,8 +162,19 @@ static void
 deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
 {
 	struct rb_root *root = deadline_rb_root(per_prio, rq);
+	struct request **next_rq = &per_prio->next_rq[rq_data_dir(rq)];
 
 	elv_rb_add(root, rq);
+	if (*next_rq == NULL || !blk_queue_is_zoned(rq->q))
+		return;
+	/*
+	 * If a request got requeued or requests have been submitted out of
+	 * order, make sure that per zone the request with the lowest LBA is
+	 * submitted first.
+	 */
+	if (blk_rq_pos(rq) < blk_rq_pos(*next_rq) &&
+	    blk_rq_zone_no(rq) == blk_rq_zone_no(*next_rq))
+		*next_rq = rq;
 }
 
 static inline void
@@ -822,6 +833,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		list_add(&rq->queuelist, &per_prio->dispatch);
 		rq->fifo_time = jiffies;
 	} else {
+		struct list_head *insert_before;
+
 		deadline_add_rq_rb(per_prio, rq);
 
 		if (rq_mergeable(rq)) {
@@ -834,7 +847,18 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		 * set expire time and add to fifo list
 		 */
 		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
-		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
+		insert_before = &per_prio->fifo_list[data_dir];
+		if (blk_rq_is_seq_zoned_write(rq)) {
+			const unsigned int zno = blk_rq_zone_no(rq);
+			struct request *rq2 = rq;
+
+			while ((rq2 = deadline_earlier_request(rq2)) &&
+			       blk_rq_zone_no(rq2) == zno &&
+			       blk_rq_pos(rq2) > blk_rq_pos(rq)) {
+				insert_before = &rq2->queuelist;
+			}
+		}
+		list_add_tail(&rq->queuelist, insert_before);
 	}
 }
 

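A simplified standalone model of the intended insertion rule (the array-based
FIFO below is illustrative, not the actual mq-deadline lists): a requeued
zoned write is inserted in front of any already queued write for the same
zone that has a higher LBA.

#include <stdio.h>

#define MAX 8

/* Illustrative FIFO entry: zone number and starting LBA. */
struct w { unsigned int zone; unsigned long long lba; };

static struct w fifo[MAX];
static int len;

/*
 * Insert a (possibly requeued) zoned write so that, within its zone, the
 * write with the lowest LBA comes first in the FIFO.
 */
static void insert_zoned_write(struct w rq)
{
	int pos = len;

	/* Walk backwards over queued same-zone writes with a higher LBA. */
	while (pos > 0 && fifo[pos - 1].zone == rq.zone &&
	       fifo[pos - 1].lba > rq.lba)
		pos--;

	for (int i = len; i > pos; i--)
		fifo[i] = fifo[i - 1];
	fifo[pos] = rq;
	len++;
}

int main(void)
{
	/* LBA 8 and 16 were inserted first; LBA 0 is requeued afterwards. */
	insert_zoned_write((struct w){ .zone = 0, .lba = 8 });
	insert_zoned_write((struct w){ .zone = 0, .lba = 16 });
	insert_zoned_write((struct w){ .zone = 0, .lba = 0 });

	for (int i = 0; i < len; i++)
		printf("fifo[%d]: zone %u, lba %llu\n", i, fifo[i].zone,
		       fifo[i].lba);
	return 0;
}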

* Re: [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler
  2023-04-07 23:58 ` [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler Bart Van Assche
@ 2023-04-10  7:42   ` Damien Le Moal
  2023-04-10 16:35     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  7:42 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Send zoned writes inserted by the device mapper to the I/O scheduler.
> This prevents zoned writes from being reordered if a device mapper driver
> has been stacked on top of a driver for a zoned block device.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 16 +++++++++++++---
>  block/blk.h    | 19 +++++++++++++++++++
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index db93b1a71157..fefc9a728e0e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3008,9 +3008,19 @@ blk_status_t blk_insert_cloned_request(struct request *rq)
>  	blk_account_io_start(rq);
>  
>  	/*
> -	 * Since we have a scheduler attached on the top device,
> -	 * bypass a potential scheduler on the bottom device for
> -	 * insert.
> +	 * Send zoned writes to the I/O scheduler if an I/O scheduler has been
> +	 * attached.
> +	 */
> +	if (q->elevator && blk_rq_is_seq_zoned_write(rq)) {
> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
> +					    /*run_queue=*/true,
> +					    /*async=*/false);
> +		return BLK_STS_OK;
> +	}

Looks OK to me, but as I understand it, this function is used only for
request-based device mapper drivers, none of which currently support zoned
block devices if I am not mistaken. So is this change really necessary ?

> +
> +	/*
> +	 * If no I/O scheduler has been attached or if the request is not a
> +	 * zoned write bypass the I/O scheduler attached to the bottom device.
>  	 */
>  	blk_mq_run_dispatch_ops(q,
>  			ret = blk_mq_request_issue_directly(rq, true));
> diff --git a/block/blk.h b/block/blk.h
> index d65d96994a94..4b6f8d7a6b84 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -118,6 +118,25 @@ static inline bool bvec_gap_to_prev(const struct queue_limits *lim,
>  	return __bvec_gap_to_prev(lim, bprv, offset);
>  }
>  
> +/**
> + * blk_rq_is_seq_zoned_write() - Whether @rq is a write request for a sequential zone.
> + * @rq: Request to examine.
> + *
> + * In this context, a sequential zone means either a sequential write required
> + * zone or a sequential write preferred zone.
> + */
> +static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
> +{
> +	switch (req_op(rq)) {
> +	case REQ_OP_WRITE:
> +	case REQ_OP_WRITE_ZEROES:
> +		return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
> +	case REQ_OP_ZONE_APPEND:
> +	default:
> +		return false;
> +	}
> +}
> +
>  static inline bool rq_mergeable(struct request *rq)
>  {
>  	if (blk_rq_is_passthrough(rq))



* Re: [PATCH v2 02/12] block: Send flush requests to the I/O scheduler
  2023-04-07 23:58 ` [PATCH v2 02/12] block: Send flush requests " Bart Van Assche
@ 2023-04-10  7:46   ` Damien Le Moal
  2023-04-11  0:15     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  7:46 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Prevent zoned writes with the FUA flag set from being reordered against
> each other or against other zoned writes. Separate the I/O scheduler
> members from the flush members in struct request since, with this patch
> applied, a request may pass through both an I/O scheduler and the flush
> machinery.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-flush.c      |  3 ++-
>  block/blk-mq.c         | 11 ++++-------
>  block/mq-deadline.c    |  2 +-
>  include/linux/blk-mq.h | 27 +++++++++++----------------
>  4 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 53202eff545e..e0cf153388d8 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -432,7 +432,8 @@ void blk_insert_flush(struct request *rq)
>  	 */
>  	if ((policy & REQ_FSEQ_DATA) &&
>  	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
> -		blk_mq_request_bypass_insert(rq, false, true);
> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
> +					    /*run_queue=*/true, /*async=*/true);
>  		return;
>  	}
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index fefc9a728e0e..250556546bbf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -390,8 +390,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  		INIT_HLIST_NODE(&rq->hash);
>  		RB_CLEAR_NODE(&rq->rb_node);
>  
> -		if (!op_is_flush(data->cmd_flags) &&
> -		    e->type->ops.prepare_request) {
> +		if (e->type->ops.prepare_request) {
>  			e->type->ops.prepare_request(rq);
>  			rq->rq_flags |= RQF_ELVPRIV;
>  		}
> @@ -452,13 +451,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>  		data->rq_flags |= RQF_ELV;
>  
>  		/*
> -		 * Flush/passthrough requests are special and go directly to the
> -		 * dispatch list. Don't include reserved tags in the
> -		 * limiting, as it isn't useful.
> +		 * Do not limit the depth for passthrough requests nor for
> +		 * requests with a reserved tag.
>  		 */
> -		if (!op_is_flush(data->cmd_flags) &&
> +		if (e->type->ops.limit_depth &&
>  		    !blk_op_is_passthrough(data->cmd_flags) &&
> -		    e->type->ops.limit_depth &&
>  		    !(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.limit_depth(data->cmd_flags, data);
>  	}
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f10c2a0d18d4..d885ccf49170 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -789,7 +789,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	per_prio = &dd->per_prio[prio];
> -	if (!rq->elv.priv[0]) {
> +	if (!rq->elv.priv[0] && !(rq->rq_flags & RQF_FLUSH_SEQ)) {
>  		per_prio->stats.inserted++;
>  		rq->elv.priv[0] = (void *)(uintptr_t)1;
>  	}

Given that this change has nothing specific to zoned devices, you could rewrite
the commit message to mention that. And are bfq and kyber OK with this change as
well ?

Also, to be consistent with this change, shouldn't blk_mq_sched_bypass_insert()
be updated as well ? That function is called from blk_mq_sched_insert_request().

> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 06caacd77ed6..5e6c79ad83d2 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -169,25 +169,20 @@ struct request {
>  		void *completion_data;
>  	};
>  
> -
>  	/*
>  	 * Three pointers are available for the IO schedulers, if they need
> -	 * more they have to dynamically allocate it.  Flush requests are
> -	 * never put on the IO scheduler. So let the flush fields share
> -	 * space with the elevator data.
> +	 * more they have to dynamically allocate it.
>  	 */
> -	union {
> -		struct {
> -			struct io_cq		*icq;
> -			void			*priv[2];
> -		} elv;
> -
> -		struct {
> -			unsigned int		seq;
> -			struct list_head	list;
> -			rq_end_io_fn		*saved_end_io;
> -		} flush;
> -	};
> +	struct {
> +		struct io_cq		*icq;
> +		void			*priv[2];
> +	} elv;
> +
> +	struct {
> +		unsigned int		seq;
> +		struct list_head	list;
> +		rq_end_io_fn		*saved_end_io;
> +	} flush;
>  
>  	union {
>  		struct __call_single_data csd;



* Re: [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
@ 2023-04-10  7:53   ` Damien Le Moal
  2023-04-10 16:59     ` Bart Van Assche
  2023-04-11 12:38   ` Christoph Hellwig
  2023-04-11 13:14   ` Christoph Hellwig
  2 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  7:53 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Let the I/O scheduler control which requests are dispatched.

Well, that is the main function of the IO scheduler already. So could you
expand the commit message here to explain in more detail which case this is
changing ?

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c         | 21 +++++++++------------
>  include/linux/blk-mq.h |  5 +++--
>  2 files changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 250556546bbf..57315395434b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1426,15 +1426,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  
>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>  		list_del_init(&rq->queuelist);
> -		/*
> -		 * If RQF_DONTPREP, rq has contained some driver specific
> -		 * data, so insert it to hctx dispatch list to avoid any
> -		 * merge.
> -		 */
> -		if (rq->rq_flags & RQF_DONTPREP)
> -			blk_mq_request_bypass_insert(rq, false, false);
> -		else
> -			blk_mq_sched_insert_request(rq, true, false, false);
> +		blk_mq_sched_insert_request(rq, /*at_head=*/true, false, false);
>  	}
>  
>  	while (!list_empty(&rq_list)) {
> @@ -2065,9 +2057,14 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>  		if (nr_budgets)
>  			blk_mq_release_budgets(q, list);
>  
> -		spin_lock(&hctx->lock);
> -		list_splice_tail_init(list, &hctx->dispatch);
> -		spin_unlock(&hctx->lock);
> +		if (!q->elevator) {
> +			spin_lock(&hctx->lock);
> +			list_splice_tail_init(list, &hctx->dispatch);
> +			spin_unlock(&hctx->lock);
> +		} else {
> +			q->elevator->type->ops.insert_requests(hctx, list,
> +							/*at_head=*/true);

Dispatch at head = true ? Why ? This seems wrong. It may be valid for the
requeue case (even then, I am not convinced), but looks very wrong for the
regular dispatch case.

> +		}
>  
>  		/*
>  		 * Order adding requests to hctx->dispatch and checking
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 5e6c79ad83d2..3a3bee9085e3 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -64,8 +64,9 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_RESV			((__force req_flags_t)(1 << 23))
>  
>  /* flags that prevent us from merging requests: */
> -#define RQF_NOMERGE_FLAGS \
> -	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
> +#define RQF_NOMERGE_FLAGS                                               \
> +	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_DONTPREP | \
> +	 RQF_SPECIAL_PAYLOAD)
>  
>  enum mq_rq_state {
>  	MQ_RQ_IDLE		= 0,



* Re: [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged
  2023-04-07 23:58 ` [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged Bart Van Assche
@ 2023-04-10  7:54   ` Damien Le Moal
  2023-04-11 12:40   ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  7:54 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Requeue requests instead of sending them to the dispatch list if a CPU
> is unplugged, to prevent reordering of zoned writes.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  block/blk-mq.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 57315395434b..77fdaed4e074 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3495,9 +3495,17 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	if (list_empty(&tmp))
>  		return 0;
>  
> -	spin_lock(&hctx->lock);
> -	list_splice_tail_init(&tmp, &hctx->dispatch);
> -	spin_unlock(&hctx->lock);
> +	if (hctx->queue->elevator) {
> +		struct request *rq, *next;
> +
> +		list_for_each_entry_safe(rq, next, &tmp, queuelist)
> +			blk_mq_requeue_request(rq, false);
> +		blk_mq_kick_requeue_list(hctx->queue);
> +	} else {
> +		spin_lock(&hctx->lock);
> +		list_splice_tail_init(&tmp, &hctx->dispatch);
> +		spin_unlock(&hctx->lock);
> +	}
>  
>  	blk_mq_run_hw_queue(hctx, true);
>  	return 0;



* Re: [PATCH v2 05/12] block: One requeue list per hctx
  2023-04-07 23:58 ` [PATCH v2 05/12] block: One requeue list per hctx Bart Van Assche
@ 2023-04-10  7:58   ` Damien Le Moal
  2023-04-10 17:04     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  7:58 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().

With such a short comment, it is hard to see exactly what this patch is trying to
do. The first part seems to be adding debugfs stuff, which I think is fine, but
should be its own patch. The second part moves the requeue work from per queue to
per hctx as I understand it. Why ? Can you explain that here ?

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-debugfs.c | 66 +++++++++++++++++++++---------------------
>  block/blk-mq.c         | 58 +++++++++++++++++++++++--------------
>  include/linux/blk-mq.h |  4 +++
>  include/linux/blkdev.h |  4 ---
>  4 files changed, 73 insertions(+), 59 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 212a7f301e73..5eb930754347 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -20,37 +20,6 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
>  	return 0;
>  }
>  
> -static void *queue_requeue_list_start(struct seq_file *m, loff_t *pos)
> -	__acquires(&q->requeue_lock)
> -{
> -	struct request_queue *q = m->private;
> -
> -	spin_lock_irq(&q->requeue_lock);
> -	return seq_list_start(&q->requeue_list, *pos);
> -}
> -
> -static void *queue_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
> -{
> -	struct request_queue *q = m->private;
> -
> -	return seq_list_next(v, &q->requeue_list, pos);
> -}
> -
> -static void queue_requeue_list_stop(struct seq_file *m, void *v)
> -	__releases(&q->requeue_lock)
> -{
> -	struct request_queue *q = m->private;
> -
> -	spin_unlock_irq(&q->requeue_lock);
> -}
> -
> -static const struct seq_operations queue_requeue_list_seq_ops = {
> -	.start	= queue_requeue_list_start,
> -	.next	= queue_requeue_list_next,
> -	.stop	= queue_requeue_list_stop,
> -	.show	= blk_mq_debugfs_rq_show,
> -};
> -
>  static int blk_flags_show(struct seq_file *m, const unsigned long flags,
>  			  const char *const *flag_name, int flag_name_count)
>  {
> @@ -156,11 +125,10 @@ static ssize_t queue_state_write(void *data, const char __user *buf,
>  
>  static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
>  	{ "poll_stat", 0400, queue_poll_stat_show },
> -	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
>  	{ "pm_only", 0600, queue_pm_only_show, NULL },
>  	{ "state", 0600, queue_state_show, queue_state_write },
>  	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
> -	{ },
> +	{},
>  };
>  
>  #define HCTX_STATE_NAME(name) [BLK_MQ_S_##name] = #name
> @@ -513,6 +481,37 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
>  	return 0;
>  }
>  
> +static void *hctx_requeue_list_start(struct seq_file *m, loff_t *pos)
> +	__acquires(&hctx->requeue_lock)
> +{
> +	struct blk_mq_hw_ctx *hctx = m->private;
> +
> +	spin_lock_irq(&hctx->requeue_lock);
> +	return seq_list_start(&hctx->requeue_list, *pos);
> +}
> +
> +static void *hctx_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> +	struct blk_mq_hw_ctx *hctx = m->private;
> +
> +	return seq_list_next(v, &hctx->requeue_list, pos);
> +}
> +
> +static void hctx_requeue_list_stop(struct seq_file *m, void *v)
> +	__releases(&hctx->requeue_lock)
> +{
> +	struct blk_mq_hw_ctx *hctx = m->private;
> +
> +	spin_unlock_irq(&hctx->requeue_lock);
> +}
> +
> +static const struct seq_operations hctx_requeue_list_seq_ops = {
> +	.start = hctx_requeue_list_start,
> +	.next = hctx_requeue_list_next,
> +	.stop = hctx_requeue_list_stop,
> +	.show = blk_mq_debugfs_rq_show,
> +};
> +
>  #define CTX_RQ_SEQ_OPS(name, type)					\
>  static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
>  	__acquires(&ctx->lock)						\
> @@ -628,6 +627,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
>  	{"run", 0600, hctx_run_show, hctx_run_write},
>  	{"active", 0400, hctx_active_show},
>  	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
> +	{"requeue_list", 0400, .seq_ops = &hctx_requeue_list_seq_ops},
>  	{"type", 0400, hctx_type_show},
>  	{},
>  };
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 77fdaed4e074..deb3d08a6b26 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1411,14 +1411,17 @@ EXPORT_SYMBOL(blk_mq_requeue_request);
>  
>  static void blk_mq_requeue_work(struct work_struct *work)
>  {
> -	struct request_queue *q =
> -		container_of(work, struct request_queue, requeue_work.work);
> +	struct blk_mq_hw_ctx *hctx =
> +		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
>  	LIST_HEAD(rq_list);
>  	struct request *rq, *next;
>  
> -	spin_lock_irq(&q->requeue_lock);
> -	list_splice_init(&q->requeue_list, &rq_list);
> -	spin_unlock_irq(&q->requeue_lock);
> +	if (list_empty_careful(&hctx->requeue_list))
> +		return;
> +
> +	spin_lock_irq(&hctx->requeue_lock);
> +	list_splice_init(&hctx->requeue_list, &rq_list);
> +	spin_unlock_irq(&hctx->requeue_lock);
>  
>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
>  		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
> @@ -1435,13 +1438,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		blk_mq_sched_insert_request(rq, false, false, false);
>  	}
>  
> -	blk_mq_run_hw_queues(q, false);
> +	blk_mq_run_hw_queue(hctx, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  				bool kick_requeue_list)
>  {
> -	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  	unsigned long flags;
>  
>  	/*
> @@ -1449,31 +1452,42 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  	 * request head insertion from the workqueue.
>  	 */
>  	BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
> +	WARN_ON_ONCE(!rq->mq_hctx);
>  
> -	spin_lock_irqsave(&q->requeue_lock, flags);
> +	spin_lock_irqsave(&hctx->requeue_lock, flags);
>  	if (at_head) {
>  		rq->rq_flags |= RQF_SOFTBARRIER;
> -		list_add(&rq->queuelist, &q->requeue_list);
> +		list_add(&rq->queuelist, &hctx->requeue_list);
>  	} else {
> -		list_add_tail(&rq->queuelist, &q->requeue_list);
> +		list_add_tail(&rq->queuelist, &hctx->requeue_list);
>  	}
> -	spin_unlock_irqrestore(&q->requeue_lock, flags);
> +	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
>  
>  	if (kick_requeue_list)
> -		blk_mq_kick_requeue_list(q);
> +		blk_mq_kick_requeue_list(rq->q);
>  }
>  
>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
> -	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned long i;
> +
> +	queue_for_each_hw_ctx(q, hctx, i)
> +		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> +					    &hctx->requeue_work, 0);
>  }
>  EXPORT_SYMBOL(blk_mq_kick_requeue_list);
>  
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  				    unsigned long msecs)
>  {
> -	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
> -				    msecs_to_jiffies(msecs));
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned long i;
> +
> +	queue_for_each_hw_ctx(q, hctx, i)
> +		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> +					    &hctx->requeue_work,
> +					    msecs_to_jiffies(msecs));
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> @@ -3594,6 +3608,10 @@ static int blk_mq_init_hctx(struct request_queue *q,
>  		struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
>  {
> +	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
> +	INIT_LIST_HEAD(&hctx->requeue_list);
> +	spin_lock_init(&hctx->requeue_lock);
> +
>  	hctx->queue_num = hctx_idx;
>  
>  	if (!(hctx->flags & BLK_MQ_F_STACKING))
> @@ -4209,10 +4227,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
>  	blk_mq_update_poll_flag(q);
>  
> -	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
> -	INIT_LIST_HEAD(&q->requeue_list);
> -	spin_lock_init(&q->requeue_lock);
> -
>  	q->nr_requests = set->queue_depth;
>  
>  	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> @@ -4757,10 +4771,10 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
>  	struct blk_mq_hw_ctx *hctx;
>  	unsigned long i;
>  
> -	cancel_delayed_work_sync(&q->requeue_work);
> -
> -	queue_for_each_hw_ctx(q, hctx, i)
> +	queue_for_each_hw_ctx(q, hctx, i) {
> +		cancel_delayed_work_sync(&hctx->requeue_work);
>  		cancel_delayed_work_sync(&hctx->run_work);
> +	}
>  }
>  
>  static int __init blk_mq_init(void)
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 3a3bee9085e3..0157f1569980 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -311,6 +311,10 @@ struct blk_mq_hw_ctx {
>  		unsigned long		state;
>  	} ____cacheline_aligned_in_smp;
>  
> +	struct list_head	requeue_list;
> +	spinlock_t		requeue_lock;
> +	struct delayed_work	requeue_work;
> +
>  	/**
>  	 * @run_work: Used for scheduling a hardware queue run at a later time.
>  	 */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index e3242e67a8e3..f5fa53cd13bd 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -491,10 +491,6 @@ struct request_queue {
>  	 */
>  	struct blk_flush_queue	*fq;
>  
> -	struct list_head	requeue_list;
> -	spinlock_t		requeue_lock;
> -	struct delayed_work	requeue_work;
> -
>  	struct mutex		sysfs_lock;
>  	struct mutex		sysfs_dir_lock;
>  


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 06/12] block: Preserve the order of requeued requests
  2023-04-07 23:58 ` [PATCH v2 06/12] block: Preserve the order of requeued requests Bart Van Assche
@ 2023-04-10  8:01   ` Damien Le Moal
  2023-04-11 12:43   ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> If a queue is run before all requeued requests have been sent to the I/O
> scheduler, the I/O scheduler may dispatch the wrong request. Fix this by
> making __blk_mq_run_hw_queue() process the requeue_list instead of
> blk_mq_requeue_work().

I think that the part of patch 5 that moves the requeue work to per hctx should
be together with this patch. That would make the review easier.
I am guessing that the move to per hctx is to try to reduce lock contention ?
That is not clearly explained. Given that requeue events should be infrequent
exceptions, is that really necessary ?

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c         | 35 ++++++++++-------------------------
>  include/linux/blk-mq.h |  1 -
>  2 files changed, 10 insertions(+), 26 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index deb3d08a6b26..562868dff43f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -64,6 +64,7 @@ static inline blk_qc_t blk_rq_to_qc(struct request *rq)
>  static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>  {
>  	return !list_empty_careful(&hctx->dispatch) ||
> +		!list_empty_careful(&hctx->requeue_list) ||
>  		sbitmap_any_bit_set(&hctx->ctx_map) ||
>  			blk_mq_sched_has_work(hctx);
>  }
> @@ -1409,10 +1410,8 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
>  }
>  EXPORT_SYMBOL(blk_mq_requeue_request);
>  
> -static void blk_mq_requeue_work(struct work_struct *work)
> +static void blk_mq_process_requeue_list(struct blk_mq_hw_ctx *hctx)
>  {
> -	struct blk_mq_hw_ctx *hctx =
> -		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
>  	LIST_HEAD(rq_list);
>  	struct request *rq, *next;
>  
> @@ -1437,8 +1436,6 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		list_del_init(&rq->queuelist);
>  		blk_mq_sched_insert_request(rq, false, false, false);
>  	}
> -
> -	blk_mq_run_hw_queue(hctx, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
> @@ -1464,30 +1461,19 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
>  
>  	if (kick_requeue_list)
> -		blk_mq_kick_requeue_list(rq->q);
> +		blk_mq_run_hw_queue(hctx, /*async=*/true);
>  }
>  
>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
> -	struct blk_mq_hw_ctx *hctx;
> -	unsigned long i;
> -
> -	queue_for_each_hw_ctx(q, hctx, i)
> -		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> -					    &hctx->requeue_work, 0);
> +	blk_mq_run_hw_queues(q, true);
>  }
>  EXPORT_SYMBOL(blk_mq_kick_requeue_list);
>  
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  				    unsigned long msecs)
>  {
> -	struct blk_mq_hw_ctx *hctx;
> -	unsigned long i;
> -
> -	queue_for_each_hw_ctx(q, hctx, i)
> -		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> -					    &hctx->requeue_work,
> -					    msecs_to_jiffies(msecs));
> +	blk_mq_delay_run_hw_queues(q, msecs);
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
>  
> @@ -2146,6 +2132,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	WARN_ON_ONCE(in_interrupt());
>  
> +	blk_mq_process_requeue_list(hctx);
> +
>  	blk_mq_run_dispatch_ops(hctx->queue,
>  			blk_mq_sched_dispatch_requests(hctx));
>  }
> @@ -2317,7 +2305,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  		 * scheduler.
>  		 */
>  		if (!sq_hctx || sq_hctx == hctx ||
> -		    !list_empty_careful(&hctx->dispatch))
> +		    blk_mq_hctx_has_pending(hctx))
>  			blk_mq_run_hw_queue(hctx, async);
>  	}
>  }
> @@ -2353,7 +2341,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
>  		 * scheduler.
>  		 */
>  		if (!sq_hctx || sq_hctx == hctx ||
> -		    !list_empty_careful(&hctx->dispatch))
> +		    blk_mq_hctx_has_pending(hctx))
>  			blk_mq_delay_run_hw_queue(hctx, msecs);
>  	}
>  }
> @@ -3608,7 +3596,6 @@ static int blk_mq_init_hctx(struct request_queue *q,
>  		struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
>  {
> -	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
>  	INIT_LIST_HEAD(&hctx->requeue_list);
>  	spin_lock_init(&hctx->requeue_lock);
>  
> @@ -4771,10 +4758,8 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
>  	struct blk_mq_hw_ctx *hctx;
>  	unsigned long i;
>  
> -	queue_for_each_hw_ctx(q, hctx, i) {
> -		cancel_delayed_work_sync(&hctx->requeue_work);
> +	queue_for_each_hw_ctx(q, hctx, i)
>  		cancel_delayed_work_sync(&hctx->run_work);
> -	}
>  }
>  
>  static int __init blk_mq_init(void)
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 0157f1569980..e62feb17af96 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -313,7 +313,6 @@ struct blk_mq_hw_ctx {
>  
>  	struct list_head	requeue_list;
>  	spinlock_t		requeue_lock;
> -	struct delayed_work	requeue_work;
>  
>  	/**
>  	 * @run_work: Used for scheduling a hardware queue run at a later time.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 07/12] block: Make it easier to debug zoned write reordering
  2023-04-07 23:58 ` [PATCH v2 07/12] block: Make it easier to debug zoned write reordering Bart Van Assche
@ 2023-04-10  8:06   ` Damien Le Moal
  0 siblings, 0 replies; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:06 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Issue a kernel warning if reordering could happen.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 562868dff43f..d89a0e6cf37d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2478,6 +2478,8 @@ void blk_mq_request_bypass_insert(struct request *rq, bool at_head,
>  {
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
> +	WARN_ON_ONCE(rq->q->elevator && blk_rq_is_seq_zoned_write(rq));
> +
>  	spin_lock(&hctx->lock);
>  	if (at_head)
>  		list_add(&rq->queuelist, &hctx->dispatch);
> @@ -2570,6 +2572,8 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	bool run_queue = true;
>  	int budget_token;
>  
> +	WARN_ON_ONCE(q->elevator && blk_rq_is_seq_zoned_write(rq));
> +
>  	/*
>  	 * RCU or SRCU read lock is needed before checking quiesced flag.
>  	 *

Looks OK, but I think it would be preferable to optimize
blk_rq_is_seq_zoned_write() so that it always compiles to false for kernels where
CONFIG_BLK_DEV_ZONED is not set. E.g.:

static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
{
	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
		switch (req_op(rq)) {
		case REQ_OP_WRITE:
		case REQ_OP_WRITE_ZEROES:
			return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
		case REQ_OP_ZONE_APPEND:
		default:
			return false;
		}
	}

	return false;
}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes()
  2023-04-07 23:58 ` [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
@ 2023-04-10  8:07   ` Damien Le Moal
  0 siblings, 0 replies; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:07 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Make deadline_skip_seq_writes() shorter without changing its
> functionality.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index d885ccf49170..50a9d3b0a291 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -312,12 +312,9 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>  						struct request *rq)
>  {
>  	sector_t pos = blk_rq_pos(rq);
> -	sector_t skipped_sectors = 0;
>  
> -	while (rq) {
> -		if (blk_rq_pos(rq) != pos + skipped_sectors)
> -			break;
> -		skipped_sectors += blk_rq_sectors(rq);
> +	while (rq && blk_rq_pos(rq) == pos) {
> +		pos += blk_rq_sectors(rq);
>  		rq = deadline_latter_request(rq);
>  	}

Not really related to this series at all. But looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes
  2023-04-07 23:58 ` [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes Bart Van Assche
@ 2023-04-10  8:10   ` Damien Le Moal
  2023-04-10 17:09     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:10 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Make sure that zoned writes are submitted in LBA order.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 50a9d3b0a291..891ee0da73ac 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -798,7 +798,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	trace_block_rq_insert(rq);
>  
> -	if (at_head) {
> +	if (at_head && !blk_rq_is_seq_zoned_write(rq)) {
>  		list_add(&rq->queuelist, &per_prio->dispatch);
>  		rq->fifo_time = jiffies;
>  	} else {

Looks OK, but I would prefer that we address the call sites that use at_head = true,
as that is almost always completely wrong for sequential zone writes.
That would reduce the number of places we check for blk_rq_is_seq_zoned_write().

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 10/12] block: mq-deadline: Introduce a local variable
  2023-04-07 23:58 ` [PATCH v2 10/12] block: mq-deadline: Introduce a local variable Bart Van Assche
@ 2023-04-10  8:11   ` Damien Le Moal
  0 siblings, 0 replies; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:11 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Prepare for adding more code that uses the request queue pointer.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes
  2023-04-07 23:58 ` [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes Bart Van Assche
@ 2023-04-10  8:16   ` Damien Le Moal
  2023-04-10 17:23     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:16 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> Let deadline_next_request() only consider the first zoned write per
> zone. This patch fixes a race condition between deadline_next_request()
> and completion of zoned writes.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c    | 24 +++++++++++++++++++++---
>  include/linux/blk-mq.h |  5 +++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 8c2bc9fdcf8c..d49e20d3011d 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -389,12 +389,30 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	 */
>  	spin_lock_irqsave(&dd->zone_lock, flags);
>  	while (rq) {
> +		unsigned int zno = blk_rq_zone_no(rq);
> +
>  		if (blk_req_can_dispatch_to_zone(rq))
>  			break;
> -		if (blk_queue_nonrot(q))
> -			rq = deadline_latter_request(rq);
> -		else
> +
> +		WARN_ON_ONCE(!blk_queue_is_zoned(q));

I do not think this WARN is useful as blk_req_can_dispatch_to_zone() will always
return true for regular block devices.

> +
> +		if (!blk_queue_nonrot(q)) {
>  			rq = deadline_skip_seq_writes(dd, rq);
> +			if (!rq)
> +				break;
> +			rq = deadline_earlier_request(rq);
> +			if (WARN_ON_ONCE(!rq))
> +				break;

I do not understand why this is needed.

> +		}
> +
> +		/*
> +		 * Skip all other write requests for the zone with zone number
> +		 * 'zno'. This prevents that this function selects a zoned write
> +		 * that is not the first write for a given zone.
> +		 */
> +		while ((rq = deadline_latter_request(rq)) &&
> +		       blk_rq_zone_no(rq) == zno)
> +			;
>  	}
>  	spin_unlock_irqrestore(&dd->zone_lock, flags);
>  
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index e62feb17af96..515dfd04d736 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -1193,6 +1193,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
>  	return !blk_req_zone_is_write_locked(rq);
>  }
>  #else /* CONFIG_BLK_DEV_ZONED */
> +static inline unsigned int blk_rq_zone_no(struct request *rq)
> +{
> +	return 0;
> +}
> +
>  static inline bool blk_req_needs_zone_write_lock(struct request *rq)
>  {
>  	return false;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly
  2023-04-07 23:58 ` [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
@ 2023-04-10  8:32   ` Damien Le Moal
  2023-04-10 17:31     ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Damien Le Moal @ 2023-04-10  8:32 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/8/23 08:58, Bart Van Assche wrote:
> If a zoned write is requeued with an LBA that is lower than already
> inserted zoned writes, make sure that it is submitted first.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index d49e20d3011d..c536b499a60f 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -162,8 +162,19 @@ static void
>  deadline_add_rq_rb(struct dd_per_prio *per_prio, struct request *rq)
>  {
>  	struct rb_root *root = deadline_rb_root(per_prio, rq);
> +	struct request **next_rq = &per_prio->next_rq[rq_data_dir(rq)];
>  
>  	elv_rb_add(root, rq);
> +	if (*next_rq == NULL || !blk_queue_is_zoned(rq->q))
> +		return;

A blank line here would be nice.

> +	/*
> +	 * If a request got requeued or requests have been submitted out of
> +	 * order, make sure that per zone the request with the lowest LBA is

Requests being submitted out of order would be a bug in the issuer code. So I
would not even mention this here and focus on explaining that requeues may cause
inserts to be seen out of order.

> +	 * submitted first.
> +	 */
> +	if (blk_rq_pos(rq) < blk_rq_pos(*next_rq) &&
> +	    blk_rq_zone_no(rq) == blk_rq_zone_no(*next_rq))
> +		*next_rq = rq;
>  }
>  
>  static inline void
> @@ -822,6 +833,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		list_add(&rq->queuelist, &per_prio->dispatch);
>  		rq->fifo_time = jiffies;
>  	} else {
> +		struct list_head *insert_before;
> +
>  		deadline_add_rq_rb(per_prio, rq);
>  
>  		if (rq_mergeable(rq)) {
> @@ -834,7 +847,18 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		 * set expire time and add to fifo list
>  		 */
>  		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
> +		insert_before = &per_prio->fifo_list[data_dir];
> +		if (blk_rq_is_seq_zoned_write(rq)) {
> +			const unsigned int zno = blk_rq_zone_no(rq);
> +			struct request *rq2 = rq;
> +
> +			while ((rq2 = deadline_earlier_request(rq2)) &&
> +			       blk_rq_zone_no(rq2) == zno &&
> +			       blk_rq_pos(rq2) > blk_rq_pos(rq)) {
> +				insert_before = &rq2->queuelist;
> +			}
> +		}
> +		list_add_tail(&rq->queuelist, insert_before);

This seems incorrect: the fifo list is ordered by arrival time, so always
queuing at the tail is the right thing to do. What I think you want to do here
is: when we choose the next request to be the oldest one (to implement aging),
get the first request for the target zone of that oldest request. But
why do that on insert ? This should be in the dispatch code, coded in
deadline_fifo_request(), no ?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler
  2023-04-10  7:42   ` Damien Le Moal
@ 2023-04-10 16:35     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 16:35 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 00:42, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> Send zoned writes inserted by the device mapper to the I/O scheduler.
>> This prevents that zoned writes get reordered if a device mapper driver
>> has been stacked on top of a driver for a zoned block device.
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Mike Snitzer <snitzer@kernel.org>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/blk-mq.c | 16 +++++++++++++---
>>   block/blk.h    | 19 +++++++++++++++++++
>>   2 files changed, 32 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index db93b1a71157..fefc9a728e0e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3008,9 +3008,19 @@ blk_status_t blk_insert_cloned_request(struct request *rq)
>>   	blk_account_io_start(rq);
>>   
>>   	/*
>> -	 * Since we have a scheduler attached on the top device,
>> -	 * bypass a potential scheduler on the bottom device for
>> -	 * insert.
>> +	 * Send zoned writes to the I/O scheduler if an I/O scheduler has been
>> +	 * attached.
>> +	 */
>> +	if (q->elevator && blk_rq_is_seq_zoned_write(rq)) {
>> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
>> +					    /*run_queue=*/true,
>> +					    /*async=*/false);
>> +		return BLK_STS_OK;
>> +	}
> 
> Looks OK to me, but as I understand it, this function is used only for
> request-based device mapper drivers, none of which currently support zoned
> block devices if I am not mistaken. So is this change really necessary ?

Hi Damien,

Probably not. I will double check whether this patch can be dropped.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-10  7:53   ` Damien Le Moal
@ 2023-04-10 16:59     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 16:59 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Damien Le Moal,
	Ming Lei, Mike Snitzer

On 4/10/23 00:53, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> @@ -2065,9 +2057,14 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>   		if (nr_budgets)
>>   			blk_mq_release_budgets(q, list);
>>   
>> -		spin_lock(&hctx->lock);
>> -		list_splice_tail_init(list, &hctx->dispatch);
>> -		spin_unlock(&hctx->lock);
>> +		if (!q->elevator) {
>> +			spin_lock(&hctx->lock);
>> +			list_splice_tail_init(list, &hctx->dispatch);
>> +			spin_unlock(&hctx->lock);
>> +		} else {
>> +			q->elevator->type->ops.insert_requests(hctx, list,
>> +							/*at_head=*/true);
> 
> Dispatch at head = true ? Why ? This seems wrong. It may be valid for the
> requeue case (even then, I am not convinced), but looks very wrong for the
> regular dispatch case.

Hi Damien,

blk_mq_sched_dispatch_requests() dispatches requests from hctx->dispatch 
before it checks whether any requests can be dispatched from the I/O 
scheduler. As one can see in the quoted change above, 
blk_mq_dispatch_rq_list() moves any requests that could not be
dispatched onto the hctx->dispatch list. Since
blk_mq_sched_dispatch_requests() processes the dispatch list first, this 
comes down to insertion at the head of the list. Hence the at_head=true 
argument in the call to insert_requests().
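
To make that ordering concrete, here is a heavily simplified sketch (not the
actual kernel code; dispatch_from_elevator() is a placeholder for the
ops.dispatch_request() loop):

static void sketch_sched_dispatch(struct blk_mq_hw_ctx *hctx)
{
	LIST_HEAD(rq_list);

	/* Step 1: requests parked on hctx->dispatch are issued first. */
	if (!list_empty_careful(&hctx->dispatch)) {
		spin_lock(&hctx->lock);
		list_splice_init(&hctx->dispatch, &rq_list);
		spin_unlock(&hctx->lock);
		if (!blk_mq_dispatch_rq_list(hctx, &rq_list, 0))
			return; /* leftovers were parked on hctx->dispatch again */
	}

	/* Step 2: only now is the I/O scheduler asked for more requests. */
	dispatch_from_elevator(hctx);
}

Since the old code gave leftover requests priority by parking them on
hctx->dispatch (step 1), inserting them at the head of the I/O scheduler
queue (step 2) preserves roughly the same priority.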

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 05/12] block: One requeue list per hctx
  2023-04-10  7:58   ` Damien Le Moal
@ 2023-04-10 17:04     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 17:04 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 00:58, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().
> 
> With such a short comment, it is hard to see exactly what this patch is trying to
> do. The first part seems to be adding debugfs stuff, which I think is fine, but
> should be its own patch. The second part moves the requeue work from per queue to
> per hctx as I understand it. Why ? Can you explain that here ?

Hi Damien,

This patch splits the request queue requeue list into one requeue list 
per hardware queue. I split the requeue list because I do not want the
next patch in this series to affect performance negatively if no
I/O scheduler has been attached. I will double check whether this patch
is really necessary.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes
  2023-04-10  8:10   ` Damien Le Moal
@ 2023-04-10 17:09     ` Bart Van Assche
  2023-04-11  6:44       ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 17:09 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 01:10, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> Make sure that zoned writes are submitted in LBA order.
>>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Mike Snitzer <snitzer@kernel.org>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/mq-deadline.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index 50a9d3b0a291..891ee0da73ac 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -798,7 +798,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>   
>>   	trace_block_rq_insert(rq);
>>   
>> -	if (at_head) {
>> +	if (at_head && !blk_rq_is_seq_zoned_write(rq)) {
>>   		list_add(&rq->queuelist, &per_prio->dispatch);
>>   		rq->fifo_time = jiffies;
>>   	} else {
> 
> Looks OK, but I would prefer that we address the call sites that use at_head = true,
> as that is almost always completely wrong for sequential zone writes.
> That would reduce the number of places we check for blk_rq_is_seq_zoned_write().

Hi Damien,

The code for deciding whether or not to use head insertion is spread all 
over the block layer. I prefer a single additional check to disable head 
insertion instead of modifying all the code that decides whether or not 
to use head-insertion. Additionally, the call to 
blk_rq_is_seq_zoned_write() would remain if the decision whether or not 
to use head insertion is moved into the callers of 
elevator_type.insert_request.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes
  2023-04-10  8:16   ` Damien Le Moal
@ 2023-04-10 17:23     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 17:23 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 01:16, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> +		WARN_ON_ONCE(!blk_queue_is_zoned(q));
> 
> I do not think this WARN is useful as blk_req_can_dispatch_to_zone() will always
> return true for regular block devices.

Hi Damien,

I will leave it out.

Are you OK with adding a blk_rq_zone_no() definition if 
CONFIG_BLK_DEV_ZONED is not defined (as has been done in this patch), or
do you prefer that I surround the code blocks added by this patch that
call blk_rq_zone_no() with #ifdef CONFIG_BLK_DEV_ZONED / #endif?

>> +
>> +		if (!blk_queue_nonrot(q)) {
>>   			rq = deadline_skip_seq_writes(dd, rq);
>> +			if (!rq)
>> +				break;
>> +			rq = deadline_earlier_request(rq);
>> +			if (WARN_ON_ONCE(!rq))
>> +				break;
> 
> I do not understand why this is needed.

Are you perhaps referring to the deadline_earlier_request() call? The 
while loop below advances 'rq' at least to the next request. The 
deadline_earlier_request() call compensates for this by going back to 
the previous request.
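
To make that concrete with hypothetical requests (two contiguous writes in
zone 7 followed by a write in zone 8):

/*
 * Sorted order: A (zone 7, LBA 0, 8 sectors) -> B (zone 7, LBA 8) -> C (zone 8).
 *
 * Assume A is the request that cannot be dispatched, so zno = 7.
 * deadline_skip_seq_writes(dd, A) returns C, the first request after the
 * sequential run in zone 7. The loop below always starts by calling
 * deadline_latter_request(), i.e. it advances past 'rq' before checking
 * anything. Stepping back from C to B with deadline_earlier_request()
 * makes that first advance land on C again, so C is not skipped without
 * having its zone checked.
 */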

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly
  2023-04-10  8:32   ` Damien Le Moal
@ 2023-04-10 17:31     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-10 17:31 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 01:32, Damien Le Moal wrote:
> On 4/8/23 08:58, Bart Van Assche wrote:
>> @@ -834,7 +847,18 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>   		 * set expire time and add to fifo list
>>   		 */
>>   		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>> -		list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]);
>> +		insert_before = &per_prio->fifo_list[data_dir];
>> +		if (blk_rq_is_seq_zoned_write(rq)) {
>> +			const unsigned int zno = blk_rq_zone_no(rq);
>> +			struct request *rq2 = rq;
>> +
>> +			while ((rq2 = deadline_earlier_request(rq2)) &&
>> +			       blk_rq_zone_no(rq2) == zno &&
>> +			       blk_rq_pos(rq2) > blk_rq_pos(rq)) {
>> +				insert_before = &rq2->queuelist;
>> +			}
>> +		}
>> +		list_add_tail(&rq->queuelist, insert_before);
> 
> This seems incorrect: the fifo list is ordered by arrival time, so always
> queuing at the tail is the right thing to do. What I think you want to do here
> is: when we choose the next request to be the oldest one (to implement aging),
> get the first request for the target zone of that oldest request. But
> why do that on insert ? This should be in the dispatch code, coded in
> deadline_fifo_request(), no ?

Hi Damien,

If the dispatch code had to look up the zoned write with the
lowest LBA, it would have to iterate over the entire FIFO list. The
above loop will on average perform much less work. If no requeuing 
happens, the expression 'blk_rq_pos(rq2) > blk_rq_pos(rq)' will evaluate 
to false the first time it is evaluated and the loop body will never be 
executed.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 02/12] block: Send flush requests to the I/O scheduler
  2023-04-10  7:46   ` Damien Le Moal
@ 2023-04-11  0:15     ` Bart Van Assche
  2023-04-11  6:38       ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2023-04-11  0:15 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Christoph Hellwig, Ming Lei, Mike Snitzer

On 4/10/23 00:46, Damien Le Moal wrote:
> Given that this change has nothing specific to zoned devices, you could rewrite
> the commit message to mention that. And are bfq and kyber OK with this change as
> well ?
> 
> Also, to be consistent with this change, shouldn't blk_mq_sched_bypass_insert()
> be updated as well ? That function is called from blk_mq_sched_insert_request().

Hi Damien,

I'm considering to replace patch 02/12 from this series by the patch below:


Subject: [PATCH] block: Send flush requests to the I/O scheduler

Send flush requests to the I/O scheduler such that I/O scheduler policies
are applied to writes with the FUA flag set. Separate the I/O scheduler
members from the flush members in struct request since with this patch
applied a request may pass through both an I/O scheduler and the flush
machinery.

This change affects the statistics of I/O schedulers that track I/O
statistics (BFQ and mq-deadline).

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
  block/blk-flush.c      |  8 ++++++--
  block/blk-mq-sched.c   | 19 +++++++++++++++----
  block/blk-mq-sched.h   |  1 +
  block/blk-mq.c         | 22 +++++-----------------
  include/linux/blk-mq.h | 27 +++++++++++----------------
  5 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 53202eff545e..cf0afb75fafd 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -336,8 +336,11 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
  		 * account of this driver tag
  		 */
  		flush_rq->rq_flags |= RQF_MQ_INFLIGHT;
-	} else
+	} else {
  		flush_rq->internal_tag = first_rq->internal_tag;
+		flush_rq->rq_flags |= RQF_ELV;
+		blk_mq_sched_prepare(flush_rq);
+	}

  	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
  	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
@@ -432,7 +435,8 @@ void blk_insert_flush(struct request *rq)
  	 */
  	if ((policy & REQ_FSEQ_DATA) &&
  	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-		blk_mq_request_bypass_insert(rq, false, true);
+		blk_mq_sched_insert_request(rq, /*at_head=*/false,
+					    /*run_queue=*/true, /*async=*/true);
  		return;
  	}

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 62c8c1ba1321..9938c35aa7ed 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -18,6 +18,20 @@
  #include "blk-mq-tag.h"
  #include "blk-wbt.h"

+/* Prepare a request for insertion into an I/O scheduler. */
+void blk_mq_sched_prepare(struct request *rq)
+{
+	struct elevator_queue *e = rq->q->elevator;
+
+	INIT_HLIST_NODE(&rq->hash);
+	RB_CLEAR_NODE(&rq->rb_node);
+
+	if (e->type->ops.prepare_request) {
+		e->type->ops.prepare_request(rq);
+		rq->rq_flags |= RQF_ELVPRIV;
+	}
+}
+
  /*
   * Mark a hardware queue as needing a restart.
   */
@@ -397,10 +411,7 @@ static bool blk_mq_sched_bypass_insert(struct request *rq)
  	 * passthrough request is added to scheduler queue, there isn't any
  	 * chance to dispatch it given we prioritize requests in hctx->dispatch.
  	 */
-	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
-		return true;
-
-	return false;
+	return req_op(rq) == REQ_OP_FLUSH || blk_rq_is_passthrough(rq);
  }

  void blk_mq_sched_insert_request(struct request *rq, bool at_head,
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 025013972453..6337c5a66af6 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -8,6 +8,7 @@

  #define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)

+void blk_mq_sched_prepare(struct request *rq);
  bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
  		unsigned int nr_segs, struct request **merged_request);
  bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index db93b1a71157..2731466b1952 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -384,18 +384,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
  	WRITE_ONCE(rq->deadline, 0);
  	req_ref_set(rq, 1);

-	if (rq->rq_flags & RQF_ELV) {
-		struct elevator_queue *e = data->q->elevator;
-
-		INIT_HLIST_NODE(&rq->hash);
-		RB_CLEAR_NODE(&rq->rb_node);
-
-		if (!op_is_flush(data->cmd_flags) &&
-		    e->type->ops.prepare_request) {
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
+	if (rq->rq_flags & RQF_ELV)
+		blk_mq_sched_prepare(rq);

  	return rq;
  }
@@ -452,13 +442,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
  		data->rq_flags |= RQF_ELV;

  		/*
-		 * Flush/passthrough requests are special and go directly to the
-		 * dispatch list. Don't include reserved tags in the
-		 * limiting, as it isn't useful.
+		 * Do not limit the depth for passthrough requests nor for
+		 * requests with a reserved tag.
  		 */
-		if (!op_is_flush(data->cmd_flags) &&
+		if (e->type->ops.limit_depth &&
  		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
  		    !(data->flags & BLK_MQ_REQ_RESERVED))
  			e->type->ops.limit_depth(data->cmd_flags, data);
  	}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..5e6c79ad83d2 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,25 +169,20 @@ struct request {
  		void *completion_data;
  	};

-
  	/*
  	 * Three pointers are available for the IO schedulers, if they need
-	 * more they have to dynamically allocate it.  Flush requests are
-	 * never put on the IO scheduler. So let the flush fields share
-	 * space with the elevator data.
+	 * more they have to dynamically allocate it.
  	 */
-	union {
-		struct {
-			struct io_cq		*icq;
-			void			*priv[2];
-		} elv;
-
-		struct {
-			unsigned int		seq;
-			struct list_head	list;
-			rq_end_io_fn		*saved_end_io;
-		} flush;
-	};
+	struct {
+		struct io_cq		*icq;
+		void			*priv[2];
+	} elv;
+
+	struct {
+		unsigned int		seq;
+		struct list_head	list;
+		rq_end_io_fn		*saved_end_io;
+	} flush;

  	union {
  		struct __call_single_data csd;


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 02/12] block: Send flush requests to the I/O scheduler
  2023-04-11  0:15     ` Bart Van Assche
@ 2023-04-11  6:38       ` Christoph Hellwig
  2023-04-11 17:13         ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, Jens Axboe, linux-block, Jaegeuk Kim,
	Christoph Hellwig, Ming Lei, Mike Snitzer

On Mon, Apr 10, 2023 at 05:15:44PM -0700, Bart Van Assche wrote:
> Subject: [PATCH] block: Send flush requests to the I/O scheduler
>
> Send flush requests to the I/O scheduler such that I/O scheduler policies
> are applied to writes with the FUA flag set. Separate the I/O scheduler
> members from the flush members in struct request since with this patch
> applied a request may pass through both an I/O scheduler and the flush
> machinery.
>
> This change affects the statistics of I/O schedulers that track I/O
> statistics (BFQ and mq-deadline).

This looks reasonable to me, as these special cases are nasty.
But we'll need very careful testing, including performance testing
to ensure this doesn't regress.

> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
> +					    /*run_queue=*/true, /*async=*/true);

And please drop these silly comments.  If you want to do something about
this rather suboptimal interface, convert the three booleans to a flags
argument with properly named flags.

> -	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> -		return true;
> -
> -	return false;
> +	return req_op(rq) == REQ_OP_FLUSH || blk_rq_is_passthrough(rq);

This just seems like an arbitrary reformatting.  While I also prefer
your new version, I don't think it belongs in this patch.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes
  2023-04-10 17:09     ` Bart Van Assche
@ 2023-04-11  6:44       ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11  6:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, Jens Axboe, linux-block, Jaegeuk Kim,
	Christoph Hellwig, Ming Lei, Mike Snitzer

On Mon, Apr 10, 2023 at 10:09:40AM -0700, Bart Van Assche wrote:
> The code for deciding whether or not to use head insertion is spread all 
> over the block layer. I prefer a single additional check to disable head 
> insertion instead of modifying all the code that decides whether or not to 
> use head-insertion. Additionally, the call to blk_rq_is_seq_zoned_write() 
> would remain if the decision whether or not to use head insertion is moved 
> into the callers of elevator_type.insert_request.

I think it's time to do a proper audit and sort this out.

at_head insertions make absolute sense for certain passthrough commands,
so the path in from blk_execute_rq/blk_execute_rq_nowait is fine, and it
should always be passed on, even for zoned devices.

For everything else, head insertions look very questionable and I
think we need to go through them and figure out if any of them should
be kept.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
  2023-04-10  7:53   ` Damien Le Moal
@ 2023-04-11 12:38   ` Christoph Hellwig
  2023-04-11 17:17     ` Bart Van Assche
  2023-04-11 13:14   ` Christoph Hellwig
  2 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11 12:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei, Mike Snitzer

On Fri, Apr 07, 2023 at 04:58:13PM -0700, Bart Van Assche wrote:
> -		/*
> -		 * If RQF_DONTPREP, rq has contained some driver specific
> -		 * data, so insert it to hctx dispatch list to avoid any
> -		 * merge.
> -		 */
> -		if (rq->rq_flags & RQF_DONTPREP)
> -			blk_mq_request_bypass_insert(rq, false, false);
> -		else
> -			blk_mq_sched_insert_request(rq, true, false, false);
> +		blk_mq_sched_insert_request(rq, /*at_head=*/true, false, false);

This effectively undoes commit aef1897cd36d, and instead you add
RQF_DONTPREP to RQF_NOMERGE_FLAGS.  This makes sense to me, but should
probably be more clearly documented in the commit log.

> @@ -2065,9 +2057,14 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>  		if (nr_budgets)
>  			blk_mq_release_budgets(q, list);
>  
> -		spin_lock(&hctx->lock);
> -		list_splice_tail_init(list, &hctx->dispatch);
> -		spin_unlock(&hctx->lock);
> +		if (!q->elevator) {
> +			spin_lock(&hctx->lock);
> +			list_splice_tail_init(list, &hctx->dispatch);
> +			spin_unlock(&hctx->lock);
> +		} else {
> +			q->elevator->type->ops.insert_requests(hctx, list,
> +							/*at_head=*/true);
> +		}

But I have no idea how this is related in any way.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged
  2023-04-07 23:58 ` [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged Bart Van Assche
  2023-04-10  7:54   ` Damien Le Moal
@ 2023-04-11 12:40   ` Christoph Hellwig
  2023-04-11 17:18     ` Bart Van Assche
  1 sibling, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11 12:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei, Mike Snitzer

On Fri, Apr 07, 2023 at 04:58:14PM -0700, Bart Van Assche wrote:
> +	if (hctx->queue->elevator) {
> +		struct request *rq, *next;
> +
> +		list_for_each_entry_safe(rq, next, &tmp, queuelist)
> +			blk_mq_requeue_request(rq, false);
> +		blk_mq_kick_requeue_list(hctx->queue);
> +	} else {
> +		spin_lock(&hctx->lock);
> +		list_splice_tail_init(&tmp, &hctx->dispatch);
> +		spin_unlock(&hctx->lock);
> +	}

Given that this isn't exactly a fast path, is there any reason to
not always go through the requeue_list?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 06/12] block: Preserve the order of requeued requests
  2023-04-07 23:58 ` [PATCH v2 06/12] block: Preserve the order of requeued requests Bart Van Assche
  2023-04-10  8:01   ` Damien Le Moal
@ 2023-04-11 12:43   ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11 12:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei, Mike Snitzer

I think this should be merged with the previous patch.

>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
> +	blk_mq_run_hw_queues(q, true);
>  }
>  EXPORT_SYMBOL(blk_mq_kick_requeue_list);

Please just remove blk_mq_kick_requeue_list entirely.

>
>  
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  				    unsigned long msecs)
>  {
> +	blk_mq_delay_run_hw_queues(q, msecs);
>  }
>  EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);

Same for blk_mq_delay_kick_requeue_list.

>  		if (!sq_hctx || sq_hctx == hctx ||
> -		    !list_empty_careful(&hctx->dispatch))
> +		    blk_mq_hctx_has_pending(hctx))
>  			blk_mq_run_hw_queue(hctx, async);
>  	}
>  }
> @@ -2353,7 +2341,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
>  		 * scheduler.
>  		 */
>  		if (!sq_hctx || sq_hctx == hctx ||
> -		    !list_empty_careful(&hctx->dispatch))
> +		    blk_mq_hctx_has_pending(hctx))

This check would probably benefit from being factored into a helper.
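
For instance (the helper name below is just a suggestion):

static inline bool blk_mq_hctx_should_run(struct blk_mq_hw_ctx *sq_hctx,
					  struct blk_mq_hw_ctx *hctx)
{
	return !sq_hctx || sq_hctx == hctx || blk_mq_hctx_has_pending(hctx);
}

so that blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() call the same
predicate.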

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
  2023-04-10  7:53   ` Damien Le Moal
  2023-04-11 12:38   ` Christoph Hellwig
@ 2023-04-11 13:14   ` Christoph Hellwig
  2 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2023-04-11 13:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Christoph Hellwig,
	Damien Le Moal, Ming Lei, Mike Snitzer

On Fri, Apr 07, 2023 at 04:58:13PM -0700, Bart Van Assche wrote:
> +		blk_mq_sched_insert_request(rq, /*at_head=*/true, false, false);

The whole usage of at_head in the request_list-related code looks
suspicious to me.

All callers of blk_mq_add_to_requeue_list except for blk_kick_flush
pass at_head=true.  So the request_list is basically LIFO except for
that one caller.

blk_mq_requeue_work then does a HEAD insert for them, unless they
are marked RQF_DONTPREP because the driver already did some setup.
So except for the RQF_DONTPREP case we basically revert the at_head insert.

This all feels wrong to me.  I think we need to get to a point where
the request_list itself is always appended at the tail, processed
head to tail, but inserted into the scheduler or the hctx rq_list
before other pending requests, probably using similar code to
blk_mq_flush_plug_list / blk_mq_dispatch_plug_list.
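
A rough sketch of that model (the helper name below is hypothetical, not a
proposal for the final implementation):

static void sketch_process_requeue_list(struct blk_mq_hw_ctx *hctx)
{
	LIST_HEAD(batch);

	/* Requests were appended at the tail, so the batch is in FIFO order. */
	spin_lock_irq(&hctx->requeue_lock);
	list_splice_init(&hctx->requeue_list, &batch);
	spin_unlock_irq(&hctx->requeue_lock);

	/*
	 * Insert the whole batch ahead of requests that were queued later,
	 * either into the elevator or into the hctx dispatch list.
	 */
	insert_batch_before_pending_requests(hctx, &batch);
}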

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 02/12] block: Send flush requests to the I/O scheduler
  2023-04-11  6:38       ` Christoph Hellwig
@ 2023-04-11 17:13         ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-11 17:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Damien Le Moal, Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei,
	Mike Snitzer

On 4/10/23 23:38, Christoph Hellwig wrote:
> On Mon, Apr 10, 2023 at 05:15:44PM -0700, Bart Van Assche wrote:
>> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
>> +					    /*run_queue=*/true, /*async=*/true);
> 
> And please drop these silly comments.  If you want to do something about
> this rather suboptimal interface, convert the three booleans to a flags
> argument with properly named flags.

I will remove these comments, but it is not clear to me what is silly about
these comments. There are even tools that can check the correctness of these
comments. See also
https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argument-comment.html

>> -	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
>> -		return true;
>> -
>> -	return false;
>> +	return req_op(rq) == REQ_OP_FLUSH || blk_rq_is_passthrough(rq);
> 
> This just seem like an arbitrary reformatting.  While I also prefer
> your new version, I don't think it belongs into this patch.

I will move this change into a patch of its own.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 03/12] block: Send requeued requests to the I/O scheduler
  2023-04-11 12:38   ` Christoph Hellwig
@ 2023-04-11 17:17     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-11 17:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Damien Le Moal, Ming Lei,
	Mike Snitzer

On 4/11/23 05:38, Christoph Hellwig wrote:
> On Fri, Apr 07, 2023 at 04:58:13PM -0700, Bart Van Assche wrote:
>> @@ -2065,9 +2057,14 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>   		if (nr_budgets)
>>   			blk_mq_release_budgets(q, list);
>>   
>> -		spin_lock(&hctx->lock);
>> -		list_splice_tail_init(list, &hctx->dispatch);
>> -		spin_unlock(&hctx->lock);
>> +		if (!q->elevator) {
>> +			spin_lock(&hctx->lock);
>> +			list_splice_tail_init(list, &hctx->dispatch);
>> +			spin_unlock(&hctx->lock);
>> +		} else {
>> +			q->elevator->type->ops.insert_requests(hctx, list,
>> +							/*at_head=*/true);
>> +		}
> 
> But I have no idea how this is related in any way.

Hi Christoph,

The I/O scheduler can only control the order in which requests are 
dispatched if:
- blk_mq_run_hw_queue() moves requests from the requeue list to the I/O
   scheduler before it asks the I/O scheduler to dispatch a request.
- No requests end up on any queue other than the I/O scheduler queue or
   the requeue list.

The scope of this patch is to send requeued requests back to the I/O 
scheduler. Hence the above change.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged
  2023-04-11 12:40   ` Christoph Hellwig
@ 2023-04-11 17:18     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2023-04-11 17:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Damien Le Moal, Ming Lei,
	Mike Snitzer

On 4/11/23 05:40, Christoph Hellwig wrote:
> On Fri, Apr 07, 2023 at 04:58:14PM -0700, Bart Van Assche wrote:
>> +	if (hctx->queue->elevator) {
>> +		struct request *rq, *next;
>> +
>> +		list_for_each_entry_safe(rq, next, &tmp, queuelist)
>> +			blk_mq_requeue_request(rq, false);
>> +		blk_mq_kick_requeue_list(hctx->queue);
>> +	} else {
>> +		spin_lock(&hctx->lock);
>> +		list_splice_tail_init(&tmp, &hctx->dispatch);
>> +		spin_unlock(&hctx->lock);
>> +	}
> 
> Given that this isn't exactly a fast path, is there any reason to
> not always go through the requeue_list?

Hi Christoph,

I will simplify this patch by letting blk_mq_hctx_notify_dead() always 
send requests to the requeue list.
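
Something along these lines, i.e. the first branch of the hunk quoted
above with the conditional dropped (a sketch of the intended change,
not the final patch):

	struct request *rq, *next;

	list_for_each_entry_safe(rq, next, &tmp, queuelist)
		blk_mq_requeue_request(rq, false);
	blk_mq_kick_requeue_list(hctx->queue);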

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2023-04-11 17:18 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-07 23:58 [PATCH v2 00/12] Submit zoned writes in order Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 01/12] block: Send zoned writes to the I/O scheduler Bart Van Assche
2023-04-10  7:42   ` Damien Le Moal
2023-04-10 16:35     ` Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 02/12] block: Send flush requests " Bart Van Assche
2023-04-10  7:46   ` Damien Le Moal
2023-04-11  0:15     ` Bart Van Assche
2023-04-11  6:38       ` Christoph Hellwig
2023-04-11 17:13         ` Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 03/12] block: Send requeued " Bart Van Assche
2023-04-10  7:53   ` Damien Le Moal
2023-04-10 16:59     ` Bart Van Assche
2023-04-11 12:38   ` Christoph Hellwig
2023-04-11 17:17     ` Bart Van Assche
2023-04-11 13:14   ` Christoph Hellwig
2023-04-07 23:58 ` [PATCH v2 04/12] block: Requeue requests if a CPU is unplugged Bart Van Assche
2023-04-10  7:54   ` Damien Le Moal
2023-04-11 12:40   ` Christoph Hellwig
2023-04-11 17:18     ` Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 05/12] block: One requeue list per hctx Bart Van Assche
2023-04-10  7:58   ` Damien Le Moal
2023-04-10 17:04     ` Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 06/12] block: Preserve the order of requeued requests Bart Van Assche
2023-04-10  8:01   ` Damien Le Moal
2023-04-11 12:43   ` Christoph Hellwig
2023-04-07 23:58 ` [PATCH v2 07/12] block: Make it easier to debug zoned write reordering Bart Van Assche
2023-04-10  8:06   ` Damien Le Moal
2023-04-07 23:58 ` [PATCH v2 08/12] block: mq-deadline: Simplify deadline_skip_seq_writes() Bart Van Assche
2023-04-10  8:07   ` Damien Le Moal
2023-04-07 23:58 ` [PATCH v2 09/12] block: mq-deadline: Disable head insertion for zoned writes Bart Van Assche
2023-04-10  8:10   ` Damien Le Moal
2023-04-10 17:09     ` Bart Van Assche
2023-04-11  6:44       ` Christoph Hellwig
2023-04-07 23:58 ` [PATCH v2 10/12] block: mq-deadline: Introduce a local variable Bart Van Assche
2023-04-10  8:11   ` Damien Le Moal
2023-04-07 23:58 ` [PATCH v2 11/12] block: mq-deadline: Fix a race condition related to zoned writes Bart Van Assche
2023-04-10  8:16   ` Damien Le Moal
2023-04-10 17:23     ` Bart Van Assche
2023-04-07 23:58 ` [PATCH v2 12/12] block: mq-deadline: Handle requeued requests correctly Bart Van Assche
2023-04-10  8:32   ` Damien Le Moal
2023-04-10 17:31     ` Bart Van Assche
