linux-block.vger.kernel.org archive mirror
* RFC: less special casing for flush requests
@ 2023-04-16 20:09 Christoph Hellwig
  2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Hi all,

inspired by Bart's quest for less reordering during requeues, this series
aims to limit the scope of requeues.  This is mostly done by simply
sending more requests down the normal submission path than we currently
do, with the help of Bart's patch to allow requests that are in the flush
state machine to still be seen by the I/O schedulers.

Diffstat:
 block/blk-flush.c      |  115 +++++++++++++++++++++++++++----------------------
 block/blk-mq-debugfs.c |    1 
 block/blk-mq-sched.c   |   14 +++++
 block/blk-mq-sched.h   |    1 
 block/blk-mq.c         |   68 ++++++++++------------------
 block/blk-mq.h         |    5 --
 block/blk.h            |    2 
 include/linux/blk-mq.h |   31 +++++--------
 include/linux/blkdev.h |    1 
 9 files changed, 119 insertions(+), 119 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:22   ` Damien Le Moal
  2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Factor out a helper from blk_insert_flush that initializes the flush
state machine related fields in struct request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 04698ed9bcd4a9..422a6d5446d1c5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 	return RQ_END_IO_NONE;
 }
 
+static void blk_rq_init_flush(struct request *rq)
+{
+	memset(&rq->flush, 0, sizeof(rq->flush));
+	INIT_LIST_HEAD(&rq->flush.list);
+	rq->rq_flags |= RQF_FLUSH_SEQ;
+	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+	rq->end_io = mq_flush_data_end_io;
+}
+
 /**
  * blk_insert_flush - insert a new PREFLUSH/FUA request
  * @rq: request to insert
@@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
 	 * @rq should go through flush machinery.  Mark it part of flush
 	 * sequence and submit for further processing.
 	 */
-	memset(&rq->flush, 0, sizeof(rq->flush));
-	INIT_LIST_HEAD(&rq->flush.list);
-	rq->rq_flags |= RQF_FLUSH_SEQ;
-	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-
-	rq->end_io = mq_flush_data_end_io;
-
+	blk_rq_init_flush(rq);
 	spin_lock_irq(&fq->mq_flush_lock);
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 	spin_unlock_irq(&fq->mq_flush_lock);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/7] blk-mq: reflow blk_insert_flush
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
  2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:29   ` Damien Le Moal
  2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Use a switch statement to decide on the disposition of a flush request
instead of multiple if statements, one of which does checks that are
more complex than required.  Also warn on a malformed request early
on instead of doing a BUG_ON later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 53 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 422a6d5446d1c5..9fcfee7394f27d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
+	/* FLUSH/FUA request must never be merged */
+	WARN_ON_ONCE(rq->bio != rq->biotail);
+
 	/*
 	 * @policy now records what operations need to be done.  Adjust
 	 * REQ_PREFLUSH and FUA for the driver.
@@ -417,39 +420,35 @@ void blk_insert_flush(struct request *rq)
 	 */
 	rq->cmd_flags |= REQ_SYNC;
 
-	/*
-	 * An empty flush handed down from a stacking driver may
-	 * translate into nothing if the underlying device does not
-	 * advertise a write-back cache.  In this case, simply
-	 * complete the request.
-	 */
-	if (!policy) {
+	switch (policy) {
+	case 0:
+		/*
+		 * An empty flush handed down from a stacking driver may
+		 * translate into nothing if the underlying device does not
+		 * advertise a write-back cache.  In this case, simply
+		 * complete the request.
+		 */
 		blk_mq_end_request(rq, 0);
 		return;
-	}
-
-	BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
-
-	/*
-	 * If there's data but flush is not necessary, the request can be
-	 * processed directly without going through flush machinery.  Queue
-	 * for normal execution.
-	 */
-	if ((policy & REQ_FSEQ_DATA) &&
-	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
+	case REQ_FSEQ_DATA:
+		/*
+		 * If there's data, but no flush is necessary, the request can
+		 * be processed directly without going through flush machinery.
+		 * Queue for normal execution.
+		 */
 		blk_mq_request_bypass_insert(rq, 0);
 		blk_mq_run_hw_queue(hctx, false);
 		return;
+	default:
+		/*
+		 * Mark the request as part of a flush sequence and submit it
+		 * for further processing to the flush state machine.
+		 */
+		blk_rq_init_flush(rq);
+		spin_lock_irq(&fq->mq_flush_lock);
+		blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
+		spin_unlock_irq(&fq->mq_flush_lock);
 	}
-
-	/*
-	 * @rq should go through flush machinery.  Mark it part of flush
-	 * sequence and submit for further processing.
-	 */
-	blk_rq_init_flush(rq);
-	spin_lock_irq(&fq->mq_flush_lock);
-	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
-	spin_unlock_irq(&fq->mq_flush_lock);
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
  2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
  2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:31   ` Damien Le Moal
  2023-04-17 19:48   ` Bart Van Assche
  2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

If blk_insert_flush decides that a command does not need to use the
flush state machine, return false and let blk_mq_submit_bio handle
it the normal way (including using an I/O scheduler) instead of doing
a bypass insert.
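
For reference, the resulting call site in blk_mq_submit_bio (taken from
the hunk further down) only returns early when the flush state machine
actually consumed the request:

	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
		return;
	/* otherwise fall through to plugging / scheduler insertion */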

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 22 ++++++++--------------
 block/blk-mq.c    |  8 ++++----
 block/blk-mq.h    |  4 ----
 block/blk.h       |  2 +-
 4 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9fcfee7394f27d..5bc462524473b2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -385,22 +385,17 @@ static void blk_rq_init_flush(struct request *rq)
 	rq->end_io = mq_flush_data_end_io;
 }
 
-/**
- * blk_insert_flush - insert a new PREFLUSH/FUA request
- * @rq: request to insert
- *
- * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
- * or __blk_mq_run_hw_queue() to dispatch request.
- * @rq is being submitted.  Analyze what needs to be done and put it on the
- * right queue.
+/*
+ * Insert a PREFLUSH/FUA request into the flush state machine.
+ * Returns true if the request has been consumed by the flush state machine,
+ * or false if the caller should continue to process it.
  */
-void blk_insert_flush(struct request *rq)
+bool blk_insert_flush(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 	unsigned long fflags = q->queue_flags;	/* may change, cache */
 	unsigned int policy = blk_flush_policy(fflags, rq);
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	/* FLUSH/FUA request must never be merged */
 	WARN_ON_ONCE(rq->bio != rq->biotail);
@@ -429,16 +424,14 @@ void blk_insert_flush(struct request *rq)
 		 * complete the request.
 		 */
 		blk_mq_end_request(rq, 0);
-		return;
+		return true;
 	case REQ_FSEQ_DATA:
 		/*
 		 * If there's data, but no flush is necessary, the request can
 		 * be processed directly without going through flush machinery.
 		 * Queue for normal execution.
 		 */
-		blk_mq_request_bypass_insert(rq, 0);
-		blk_mq_run_hw_queue(hctx, false);
-		return;
+		return false;
 	default:
 		/*
 		 * Mark the request as part of a flush sequence and submit it
@@ -448,6 +441,7 @@ void blk_insert_flush(struct request *rq)
 		spin_lock_irq(&fq->mq_flush_lock);
 		blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 		spin_unlock_irq(&fq->mq_flush_lock);
+		return true;
 	}
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2d297efe229db..452bc17cce4dcf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -45,6 +45,8 @@
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
+static void blk_mq_request_bypass_insert(struct request *rq,
+		blk_insert_t flags);
 static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list);
 
@@ -2429,7 +2431,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
+static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
@@ -2972,10 +2974,8 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}
 
-	if (op_is_flush(bio->bi_opf)) {
-		blk_insert_flush(rq);
+	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
 		return;
-	}
 
 	if (plug) {
 		blk_add_rq_to_plug(plug, rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index f882677ff106a5..62c505e6ef1e40 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -64,10 +64,6 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
 			     struct blk_mq_tags *tags,
 			     unsigned int hctx_idx);
-/*
- * Internal helpers for request insertion into sw queues
- */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags);
 
 /*
  * CPU -> queue mappings
diff --git a/block/blk.h b/block/blk.h
index 2da83110347173..512a1c0db6ba15 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -277,7 +277,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
-void blk_insert_flush(struct request *rq);
+bool blk_insert_flush(struct request *rq);
 
 int elevator_switch(struct request_queue *q, struct elevator_type *new_e);
 void elevator_disable(struct request_queue *q);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:34   ` Damien Le Moal
  2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

From: Bart Van Assche <bvanassche@acm.org>

Send write requests issued by the flush state machine through the normal
I/O submission path including the I/O scheduler (if present) so that I/O
scheduler policies are applied to writes with the FUA flag set.

Separate the I/O scheduler members from the flush members in struct
request since now a request may pass through both an I/O scheduler
and the flush machinery.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c      |  5 ++++-
 block/blk-mq-sched.c   | 14 ++++++++++++++
 block/blk-mq-sched.h   |  1 +
 block/blk-mq.c         | 24 ++++++------------------
 include/linux/blk-mq.h | 27 +++++++++++----------------
 5 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 5bc462524473b2..f62e74d9d56bc8 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -330,8 +330,11 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		 * account of this driver tag
 		 */
 		flush_rq->rq_flags |= RQF_MQ_INFLIGHT;
-	} else
+	} else {
 		flush_rq->internal_tag = first_rq->internal_tag;
+		flush_rq->rq_flags |= RQF_ELV;
+		blk_mq_sched_prepare(flush_rq);
+	}
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
 	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 67c95f31b15bb1..70087554eeb8f4 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -16,6 +16,20 @@
 #include "blk-mq-sched.h"
 #include "blk-wbt.h"
 
+/* Prepare a request for insertion into an I/O scheduler. */
+void blk_mq_sched_prepare(struct request *rq)
+{
+	struct elevator_queue *e = rq->q->elevator;
+
+	INIT_HLIST_NODE(&rq->hash);
+	RB_CLEAR_NODE(&rq->rb_node);
+
+	if (e->type->ops.prepare_request) {
+		e->type->ops.prepare_request(rq);
+		rq->rq_flags |= RQF_ELVPRIV;
+	}
+}
+
 /*
  * Mark a hardware queue as needing a restart.
  */
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 7c3cbad17f3052..3049e988ca5916 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -7,6 +7,7 @@
 
 #define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
 
+void blk_mq_sched_prepare(struct request *rq);
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
 bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 452bc17cce4dcf..50a03e1592819c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -388,18 +388,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	WRITE_ONCE(rq->deadline, 0);
 	req_ref_set(rq, 1);
 
-	if (rq->rq_flags & RQF_ELV) {
-		struct elevator_queue *e = data->q->elevator;
-
-		INIT_HLIST_NODE(&rq->hash);
-		RB_CLEAR_NODE(&rq->rb_node);
-
-		if (!op_is_flush(data->cmd_flags) &&
-		    e->type->ops.prepare_request) {
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
+	if (rq->rq_flags & RQF_ELV)
+		blk_mq_sched_prepare(rq);
 
 	return rq;
 }
@@ -456,13 +446,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		data->rq_flags |= RQF_ELV;
 
 		/*
-		 * Flush/passthrough requests are special and go directly to the
-		 * dispatch list. Don't include reserved tags in the
-		 * limiting, as it isn't useful.
+		 * Do not limit the depth for passthrough requests nor for
+		 * requests with a reserved tag.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
+		if (e->type->ops.limit_depth &&
 		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
 		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
 	}
@@ -2496,7 +2484,7 @@ static void blk_mq_insert_request(struct request *rq, blk_insert_t flags)
 		 * dispatch it given we prioritize requests in hctx->dispatch.
 		 */
 		blk_mq_request_bypass_insert(rq, flags);
-	} else if (rq->rq_flags & RQF_FLUSH_SEQ) {
+	} else if (req_op(rq) == REQ_OP_FLUSH) {
 		/*
 		 * Firstly normal IO request is inserted to scheduler queue or
 		 * sw queue, meantime we add flush request to dispatch queue(
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1dacb2c81fdda1..a204a5f3cc9524 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,25 +169,20 @@ struct request {
 		void *completion_data;
 	};
 
-
 	/*
 	 * Three pointers are available for the IO schedulers, if they need
-	 * more they have to dynamically allocate it.  Flush requests are
-	 * never put on the IO scheduler. So let the flush fields share
-	 * space with the elevator data.
+	 * more they have to dynamically allocate it.
 	 */
-	union {
-		struct {
-			struct io_cq		*icq;
-			void			*priv[2];
-		} elv;
-
-		struct {
-			unsigned int		seq;
-			struct list_head	list;
-			rq_end_io_fn		*saved_end_io;
-		} flush;
-	};
+	struct {
+		struct io_cq		*icq;
+		void			*priv[2];
+	} elv;
+
+	struct {
+		unsigned int		seq;
+		struct list_head	list;
+		rq_end_io_fn		*saved_end_io;
+	} flush;
 
 	union {
 		struct __call_single_data csd;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:36   ` Damien Le Moal
  2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Requests with the FUA bit on hardware without FUA support need a post
flush before returning the caller, but they can still be sent using
the normal I/O path after initializing the flush-related fields and
end I/O handler.
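
To illustrate which requests this covers, here is a rough sketch of the
classification, paraphrasing blk_flush_policy() in block/blk-flush.c (the
helper name and the bool parameters below are made up for illustration):

static unsigned int fua_write_policy_sketch(struct request *rq,
		bool has_writeback_cache, bool has_fua)
{
	unsigned int policy = 0;

	if (blk_rq_sectors(rq))
		policy |= REQ_FSEQ_DATA;	/* the write carries data */
	if (has_writeback_cache && !has_fua && (rq->cmd_flags & REQ_FUA))
		policy |= REQ_FSEQ_POSTFLUSH;	/* emulate FUA with a post flush */

	/* REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH is the case handled here */
	return policy;
}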

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index f62e74d9d56bc8..9eda6d46438dba 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -435,6 +435,17 @@ bool blk_insert_flush(struct request *rq)
 		 * Queue for normal execution.
 		 */
 		return false;
+	case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
+		/*
+		 * Initialize the flush fields and completion handler to trigger
+		 * the post flush, and then just pass the command on.
+		 */
+		blk_rq_init_flush(rq);
+		rq->flush.seq |= REQ_FSEQ_PREFLUSH;
+		spin_lock_irq(&fq->mq_flush_lock);
+		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		spin_unlock_irq(&fq->mq_flush_lock);
+		return false;
 	default:
 		/*
 		 * Mark the request as part of a flush sequence and submit it
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-17  6:38   ` Damien Le Moal
  2023-04-17 19:51   ` Bart Van Assche
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
  2023-05-16  0:08 ` RFC: less special casing for flush requests Bart Van Assche
  7 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

blk_flush_complete_seq currently queues requests that write data after
a pre-flush from the flush state machine at the head of the queue.
This doesn't really make sense, as the original request bypassed all
queue lists by directly diverting to blk_insert_flush from
blk_mq_submit_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9eda6d46438dba..69e9806f575455 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,7 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+		blk_mq_add_to_requeue_list(rq, 0);
 		blk_mq_kick_requeue_list(q);
 		break;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
  2023-04-16 21:01   ` Bart Van Assche
                     ` (3 more replies)
  2023-05-16  0:08 ` RFC: less special casing for flush requests Bart Van Assche
  7 siblings, 4 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Currently both requeues of commands that were already sent to the
driver and flush commands submitted from the flush state machine
share the same requeue_list in struct request_queue, despite requeues
doing head insertions and flushes not.  Switch to using two
separate lists instead.
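
To make the behavioral difference concrete, this is how blk_mq_requeue_work
treats the two lists after this patch (condensed from the hunk below):

	/* q->requeue_list: real requeues, inserted at the head (or bypass
	 * inserted if the driver already prepared the request): */
	if (rq->rq_flags & RQF_DONTPREP)
		blk_mq_request_bypass_insert(rq, 0);
	else
		blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);

	/* q->flush_list: flush machinery requests, plain tail insertion: */
	blk_mq_insert_request(rq, 0);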

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c      |  9 +++++++--
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         | 36 +++++++++++++++---------------------
 block/blk-mq.h         |  1 -
 include/linux/blk-mq.h |  4 +---
 include/linux/blkdev.h |  1 +
 6 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 69e9806f575455..231d3780e74ad1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		blk_mq_add_to_requeue_list(rq, 0);
+		spin_lock(&q->requeue_lock);
+		list_add_tail(&rq->queuelist, &q->flush_list);
+		spin_unlock(&q->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
 
@@ -349,7 +351,10 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	smp_wmb();
 	req_ref_set(flush_rq, 1);
 
-	blk_mq_add_to_requeue_list(flush_rq, 0);
+	spin_lock(&q->requeue_lock);
+	list_add_tail(&flush_rq->queuelist, &q->flush_list);
+	spin_unlock(&q->requeue_lock);
+
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d23a8554ec4aeb..0d2a0f0524b53e 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -241,7 +241,6 @@ static const char *const cmd_flag_name[] = {
 #define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
 static const char *const rqf_name[] = {
 	RQF_NAME(STARTED),
-	RQF_NAME(SOFTBARRIER),
 	RQF_NAME(FLUSH_SEQ),
 	RQF_NAME(MIXED_MERGE),
 	RQF_NAME(MQ_INFLIGHT),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 50a03e1592819c..eaed84a346c9c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1403,13 +1403,16 @@ static void __blk_mq_requeue_request(struct request *rq)
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
+	unsigned long flags;
 
 	__blk_mq_requeue_request(rq);
 
 	/* this request will be re-inserted to io scheduler queue */
 	blk_mq_sched_requeue_request(rq);
 
-	blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+	spin_lock_irqsave(&q->requeue_lock, flags);
+	list_add_tail(&rq->queuelist, &q->requeue_list);
+	spin_unlock_irqrestore(&q->requeue_lock, flags);
 
 	if (kick_requeue_list)
 		blk_mq_kick_requeue_list(q);
@@ -1421,13 +1424,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	struct request_queue *q =
 		container_of(work, struct request_queue, requeue_work.work);
 	LIST_HEAD(rq_list);
-	struct request *rq, *next;
+	LIST_HEAD(flush_list);
+	struct request *rq;
 
 	spin_lock_irq(&q->requeue_lock);
 	list_splice_init(&q->requeue_list, &rq_list);
+	list_splice_init(&q->flush_list, &flush_list);
 	spin_unlock_irq(&q->requeue_lock);
 
-	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+	while (!list_empty(&rq_list)) {
+		rq = list_entry(rq_list.next, struct request, queuelist);
 		/*
 		 * If RQF_DONTPREP ist set, the request has been started by the
 		 * driver already and might have driver-specific data allocated
@@ -1435,18 +1441,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		 * block layer merges for the request.
 		 */
 		if (rq->rq_flags & RQF_DONTPREP) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
 			list_del_init(&rq->queuelist);
 			blk_mq_request_bypass_insert(rq, 0);
-		} else if (rq->rq_flags & RQF_SOFTBARRIER) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
+		} else {
 			list_del_init(&rq->queuelist);
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
 		}
 	}
 
-	while (!list_empty(&rq_list)) {
-		rq = list_entry(rq_list.next, struct request, queuelist);
+	while (!list_empty(&flush_list)) {
+		rq = list_entry(flush_list.next, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		blk_mq_insert_request(rq, 0);
 	}
@@ -1454,24 +1458,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags)
+void blk_flush_queue_insert(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
-	/*
-	 * We abuse this flag that is otherwise used by the I/O scheduler to
-	 * request head insertion from the workqueue.
-	 */
-	BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
-
 	spin_lock_irqsave(&q->requeue_lock, flags);
-	if (insert_flags & BLK_MQ_INSERT_AT_HEAD) {
-		rq->rq_flags |= RQF_SOFTBARRIER;
-		list_add(&rq->queuelist, &q->requeue_list);
-	} else {
-		list_add_tail(&rq->queuelist, &q->requeue_list);
-	}
+	list_add_tail(&rq->queuelist, &q->flush_list);
 	spin_unlock_irqrestore(&q->requeue_lock, flags);
 }
 
@@ -4222,6 +4215,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_update_poll_flag(q);
 
 	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(&q->flush_list);
 	INIT_LIST_HEAD(&q->requeue_list);
 	spin_lock_init(&q->requeue_lock);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 62c505e6ef1e40..1955e0c08d154c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,7 +47,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
 			     unsigned int);
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a204a5f3cc9524..23fa9fdf59f3e1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -28,8 +28,6 @@ typedef __u32 __bitwise req_flags_t;
 
 /* drive already may have started this one */
 #define RQF_STARTED		((__force req_flags_t)(1 << 1))
-/* may not be passed by ioscheduler */
-#define RQF_SOFTBARRIER		((__force req_flags_t)(1 << 3))
 /* request for flush sequence */
 #define RQF_FLUSH_SEQ		((__force req_flags_t)(1 << 4))
 /* merge of different types, fail separately */
@@ -65,7 +63,7 @@ typedef __u32 __bitwise req_flags_t;
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6ede578dfbc642..649274b4043b69 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -490,6 +490,7 @@ struct request_queue {
 	 * for flush operations
 	 */
 	struct blk_flush_queue	*fq;
+	struct list_head	flush_list;
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-04-16 21:01   ` Bart Van Assche
  2023-04-17  4:26     ` Christoph Hellwig
  2023-04-17  6:46   ` Damien Le Moal
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2023-04-16 21:01 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 4/16/23 13:09, Christoph Hellwig wrote:
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 69e9806f575455..231d3780e74ad1 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>   
>   	case REQ_FSEQ_DATA:
>   		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> -		blk_mq_add_to_requeue_list(rq, 0);
> +		spin_lock(&q->requeue_lock);
> +		list_add_tail(&rq->queuelist, &q->flush_list);
> +		spin_unlock(&q->requeue_lock);
>   		blk_mq_kick_requeue_list(q);
>   		break;

At least the SCSI core can call blk_flush_complete_seq() from interrupt 
context so I don't think the above code is correct. The call chain is as 
follows:

LLD interrupt handler
   scsi_done()
     scsi_done_internal()
       blk_mq_complete_request()
         scsi_complete()
           scsi_finish_command()
             scsi_io_completion()
               scsi_end_request()
                 __blk_mq_end_request()
                   flush_end_io()
                     blk_flush_complete_seq()

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 21:01   ` Bart Van Assche
@ 2023-04-17  4:26     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-17  4:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Sun, Apr 16, 2023 at 02:01:30PM -0700, Bart Van Assche wrote:
> On 4/16/23 13:09, Christoph Hellwig wrote:
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index 69e9806f575455..231d3780e74ad1 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>>     	case REQ_FSEQ_DATA:
>>   		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>> -		blk_mq_add_to_requeue_list(rq, 0);
>> +		spin_lock(&q->requeue_lock);
>> +		list_add_tail(&rq->queuelist, &q->flush_list);
>> +		spin_unlock(&q->requeue_lock);
>>   		blk_mq_kick_requeue_list(q);
>>   		break;
>
> At least the SCSI core can call blk_flush_complete_seq() from interrupt 
> context so I don't think the above code is correct. The call chain is as 
> follows:

All callers of blk_flush_complete_seq already disable interrupts when
taking mq_flush_lock.  No need to disable interrupts again for a nested
lock then.
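
For example, the blk_insert_flush path (as of this series) nests the two
locks like this, with interrupts already disabled by the outer lock:

	spin_lock_irq(&fq->mq_flush_lock);
	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
		/* ... which for REQ_FSEQ_DATA does: */
		spin_lock(&q->requeue_lock);	/* nested, no irqsave needed */
		list_add_tail(&rq->queuelist, &q->flush_list);
		spin_unlock(&q->requeue_lock);
	spin_unlock_irq(&fq->mq_flush_lock);

The interrupt-context callers Bart lists take mq_flush_lock with
spin_lock_irqsave() before calling blk_flush_complete_seq(), so the same
reasoning applies there.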


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
  2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-04-17  6:22   ` Damien Le Moal
  2023-04-17  6:24     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> Factor out a helper from blk_insert_flush that initializes the flush
> state machine related fields in struct request.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-flush.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 04698ed9bcd4a9..422a6d5446d1c5 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
>  	return RQ_END_IO_NONE;
>  }
>  
> +static void blk_rq_init_flush(struct request *rq)
> +{
> +	memset(&rq->flush, 0, sizeof(rq->flush));
> +	INIT_LIST_HEAD(&rq->flush.list);
> +	rq->rq_flags |= RQF_FLUSH_SEQ;
> +	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> +	rq->end_io = mq_flush_data_end_io;
> +}

struct flush is:

struct {
	unsigned int		seq;
	struct list_head	list;
	rq_end_io_fn		*saved_end_io;
} flush;

So given that list and saved_end_io are initialized here, we could remove the
memset() by initializing seq "manually":

+static void blk_rq_init_flush(struct request *rq)
+{
+	rq->flush.seq = 0;
+	INIT_LIST_HEAD(&rq->flush.list);
+	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+	rq->rq_flags |= RQF_FLUSH_SEQ;
+	rq->end_io = mq_flush_data_end_io;
+}

No ?

Otherwise, look good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> +
>  /**
>   * blk_insert_flush - insert a new PREFLUSH/FUA request
>   * @rq: request to insert
> @@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
>  	 * @rq should go through flush machinery.  Mark it part of flush
>  	 * sequence and submit for further processing.
>  	 */
> -	memset(&rq->flush, 0, sizeof(rq->flush));
> -	INIT_LIST_HEAD(&rq->flush.list);
> -	rq->rq_flags |= RQF_FLUSH_SEQ;
> -	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> -
> -	rq->end_io = mq_flush_data_end_io;
> -
> +	blk_rq_init_flush(rq);
>  	spin_lock_irq(&fq->mq_flush_lock);
>  	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
>  	spin_unlock_irq(&fq->mq_flush_lock);


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
  2023-04-17  6:22   ` Damien Le Moal
@ 2023-04-17  6:24     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-17  6:24 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block

On Mon, Apr 17, 2023 at 03:22:21PM +0900, Damien Le Moal wrote:
> > +static void blk_rq_init_flush(struct request *rq)
> > +{
> > +	memset(&rq->flush, 0, sizeof(rq->flush));
> > +	INIT_LIST_HEAD(&rq->flush.list);
> > +	rq->rq_flags |= RQF_FLUSH_SEQ;
> > +	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> > +	rq->end_io = mq_flush_data_end_io;
> > +}
> 
> struct flush is:
> 
> struct {
> 	unsigned int		seq;
> 	struct list_head	list;
> 	rq_end_io_fn		*saved_end_io;
> } flush;
> 
> So given that list and saved_end_io are initialized here, we could remove the
> memset() by initializing seq "manually":

Yes.  And seq is always initialized in the callers, and I think I can also
remove saved_end_io entirely.  So maybe we can drop this patch in the
end and just do some other changes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/7] blk-mq: reflow blk_insert_flush
  2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-04-17  6:29   ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> Use a switch statement to decide on the disposition of a flush request
> instead of multiple if statements, one of which does checks that are
> more complex than required.  Also warn on a malformed request early
> on instead of doing a BUG_ON later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

One nit below, but looks OK overall.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  block/blk-flush.c | 53 +++++++++++++++++++++++------------------------
>  1 file changed, 26 insertions(+), 27 deletions(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 422a6d5446d1c5..9fcfee7394f27d 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
>  	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
> +	/* FLUSH/FUA request must never be merged */
> +	WARN_ON_ONCE(rq->bio != rq->biotail);

This could be:

	if (WARN_ON_ONCE(rq->bio != rq->biotail)) {
		blk_mq_end_request(rq, BLK_STS_IOERR);
 		return;
	}

To avoid an oops on the malformed request later on?



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
  2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
@ 2023-04-17  6:31   ` Damien Le Moal
  2023-04-17 19:48   ` Bart Van Assche
  1 sibling, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> If blk_insert_flush decides that a command does not need to use the
> flush state machine, return false and let blk_mq_submit_bio handle
> it the normal way (including using an I/O scheduler) instead of doing
> a bypass insert.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Nice !

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine
  2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
@ 2023-04-17  6:34   ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:34 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> From: Bart Van Assche <bvanassche@acm.org>
> 
> Send write requests issued by the flush state machine through the normal
> I/O submission path including the I/O scheduler (if present) so that I/O
> scheduler policies are applied to writes with the FUA flag set.
> 
> Separate the I/O scheduler members from the flush members in struct
> request since now a request may pass through both an I/O scheduler
> and the flush machinery.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> [hch: rebased]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
  2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-04-17  6:36   ` Damien Le Moal
  2023-04-17  6:39     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:36 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> Requests with the FUA bit on hardware without FUA support need a post
> flush before returning the caller, but they can still be sent using

s/returning/returning to

> the normal I/O path after initializing the flush-related fields and
> end I/O handler.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-flush.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index f62e74d9d56bc8..9eda6d46438dba 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -435,6 +435,17 @@ bool blk_insert_flush(struct request *rq)
>  		 * Queue for normal execution.
>  		 */
>  		return false;
> +	case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
> +		/*
> +		 * Initialize the flush fields and completion handler to trigger
> +		 * the post flush, and then just pass the command on.
> +		 */
> +		blk_rq_init_flush(rq);
> +		rq->flush.seq |= REQ_FSEQ_PREFLUSH;

Shouldn't this be REQ_FSEQ_POSTFLUSH ?

> +		spin_lock_irq(&fq->mq_flush_lock);
> +		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> +		spin_unlock_irq(&fq->mq_flush_lock);
> +		return false;
>  	default:
>  		/*
>  		 * Mark the request as part of a flush sequence and submit it


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
  2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
@ 2023-04-17  6:38   ` Damien Le Moal
  2023-04-17 19:51   ` Bart Van Assche
  1 sibling, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:38 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> blk_flush_complete_seq currently queues requests that write data after
> a pre-flush from the flush state machine at the head of the queue.
> This doesn't really make sense, as the original request bypassed all
> queue lists by directly diverting to blk_insert_flush from
> blk_mq_submit_bio.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
  2023-04-17  6:36   ` Damien Le Moal
@ 2023-04-17  6:39     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-04-17  6:39 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block

On Mon, Apr 17, 2023 at 03:36:54PM +0900, Damien Le Moal wrote:
> > +	case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
> > +		/*
> > +		 * Initialize the flush fields and completion handler to trigger
> > +		 * the post flush, and then just pass the command on.
> > +		 */
> > +		blk_rq_init_flush(rq);
> > +		rq->flush.seq |= REQ_FSEQ_PREFLUSH;
> 
> Shouldn't this be REQ_FSEQ_POSTFLUSH ?

Yes.  My fault for optimizing away the complicated assignment at the
last minute.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
  2023-04-16 21:01   ` Bart Van Assche
@ 2023-04-17  6:46   ` Damien Le Moal
  2023-04-19 12:37   ` kernel test robot
  2023-04-19 17:34   ` kernel test robot
  3 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2023-04-17  6:46 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block

On 4/17/23 05:09, Christoph Hellwig wrote:
> Currently both requeues of commands that were already sent to the
> driver and flush commands submitted from the flush state machine
> share the same requeue_list in struct request_queue, despite requeues
> doing head insertions and flushes not.  Switch to using two
> separate lists instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  block/blk-flush.c      |  9 +++++++--
>  block/blk-mq-debugfs.c |  1 -
>  block/blk-mq.c         | 36 +++++++++++++++---------------------
>  block/blk-mq.h         |  1 -
>  include/linux/blk-mq.h |  4 +---
>  include/linux/blkdev.h |  1 +
>  6 files changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 69e9806f575455..231d3780e74ad1 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>  
>  	case REQ_FSEQ_DATA:
>  		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> -		blk_mq_add_to_requeue_list(rq, 0);
> +		spin_lock(&q->requeue_lock);
> +		list_add_tail(&rq->queuelist, &q->flush_list);
> +		spin_unlock(&q->requeue_lock);
>  		blk_mq_kick_requeue_list(q);

With this change, this function name is a little misleading... But I do not have
a better name to propose :)



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
  2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
  2023-04-17  6:31   ` Damien Le Moal
@ 2023-04-17 19:48   ` Bart Van Assche
  1 sibling, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2023-04-17 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 4/16/23 13:09, Christoph Hellwig wrote:
> If blk_insert_flush decides that a command does not need to use the
> flush state machine, return false and let blk_mq_submit_bio handle
> it the normal way (including using an I/O scheduler) instead of doing
> a bypass insert.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
  2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
  2023-04-17  6:38   ` Damien Le Moal
@ 2023-04-17 19:51   ` Bart Van Assche
  1 sibling, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2023-04-17 19:51 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 4/16/23 13:09, Christoph Hellwig wrote:
> blk_flush_complete_seq currently queues requests that write data after
> a pre-flush from the flush state machine at the head of the queue.
> This doesn't really make sense, as the original request bypassed all
> queue lists by directly diverting to blk_insert_flush from
> blk_mq_submit_bio.
insertations -> insertions

Otherwise:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
  2023-04-16 21:01   ` Bart Van Assche
  2023-04-17  6:46   ` Damien Le Moal
@ 2023-04-19 12:37   ` kernel test robot
  2023-04-19 17:34   ` kernel test robot
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2023-04-19 12:37 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: oe-kbuild-all, Bart Van Assche, Damien Le Moal, linux-block

Hi Christoph,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20230418]
[cannot apply to linus/master v6.3-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230416200930.29542-8-hch%40lst.de
patch subject: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
config: i386-randconfig-a003 (https://download.01.org/0day-ci/archive/20230419/202304192001.KzxpDQmK-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/c1197e1a4ff5b38b73d3ec923987ca857f5e2695
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
        git checkout c1197e1a4ff5b38b73d3ec923987ca857f5e2695
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304192001.KzxpDQmK-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> block/blk-mq.c:1461:6: warning: no previous prototype for 'blk_flush_queue_insert' [-Wmissing-prototypes]
    1461 | void blk_flush_queue_insert(struct request *rq)
         |      ^~~~~~~~~~~~~~~~~~~~~~


vim +/blk_flush_queue_insert +1461 block/blk-mq.c

  1460	
> 1461	void blk_flush_queue_insert(struct request *rq)
  1462	{
  1463		struct request_queue *q = rq->q;
  1464		unsigned long flags;
  1465	
  1466		spin_lock_irqsave(&q->requeue_lock, flags);
  1467		list_add_tail(&rq->queuelist, &q->flush_list);
  1468		spin_unlock_irqrestore(&q->requeue_lock, flags);
  1469	}
  1470	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
                     ` (2 preceding siblings ...)
  2023-04-19 12:37   ` kernel test robot
@ 2023-04-19 17:34   ` kernel test robot
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2023-04-19 17:34 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: llvm, oe-kbuild-all, Bart Van Assche, Damien Le Moal, linux-block

Hi Christoph,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20230418]
[cannot apply to linus/master v6.3-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230416200930.29542-8-hch%40lst.de
patch subject: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
config: x86_64-randconfig-a013-20230417 (https://download.01.org/0day-ci/archive/20230420/202304200122.GEFsqxFh-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/c1197e1a4ff5b38b73d3ec923987ca857f5e2695
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
        git checkout c1197e1a4ff5b38b73d3ec923987ca857f5e2695
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304200122.GEFsqxFh-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> block/blk-mq.c:1461:6: warning: no previous prototype for function 'blk_flush_queue_insert' [-Wmissing-prototypes]
   void blk_flush_queue_insert(struct request *rq)
        ^
   block/blk-mq.c:1461:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void blk_flush_queue_insert(struct request *rq)
   ^
   static 
   1 warning generated.


vim +/blk_flush_queue_insert +1461 block/blk-mq.c

  1460	
> 1461	void blk_flush_queue_insert(struct request *rq)
  1462	{
  1463		struct request_queue *q = rq->q;
  1464		unsigned long flags;
  1465	
  1466		spin_lock_irqsave(&q->requeue_lock, flags);
  1467		list_add_tail(&rq->queuelist, &q->flush_list);
  1468		spin_unlock_irqrestore(&q->requeue_lock, flags);
  1469	}
  1470	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RFC: less special casing for flush requests
  2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-05-16  0:08 ` Bart Van Assche
  2023-05-16  4:03   ` Christoph Hellwig
  7 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2023-05-16  0:08 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 4/16/23 13:09, Christoph Hellwig wrote:
> inspired by Bart's quest for less reordering during requeues, this series
> aims to limit the scope of requeues.  This is mostly done by simply
> sending more requests down the normal submission path than we currently
> do, with the help of Bart's patch to allow requests that are in the flush
> state machine to still be seen by the I/O schedulers.

Hi Christoph,

Do you have the time to continue the work on this patch series? Please 
let me know in case you would prefer that I continue the work on this 
patch series. I just found and fixed a bug in my patch "block: Send FUA 
requests to the I/O scheduler" (not included in this series).

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RFC: less special casing for flush requests
  2023-05-16  0:08 ` RFC: less special casing for flush requests Bart Van Assche
@ 2023-05-16  4:03   ` Christoph Hellwig
  2023-05-18  5:57     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-05-16  4:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Mon, May 15, 2023 at 05:08:27PM -0700, Bart Van Assche wrote:
> Do you have the time to continue the work on this patch series? Please let 
> me know in case you would prefer that I continue the work on this patch 
> series. I just found and fixed a bug in my patch "block: Send FUA requests 
> to the I/O scheduler" (not included in this series).

I did another pass last weekend and was planning to finish it off today.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RFC: less special casing for flush requests
  2023-05-16  4:03   ` Christoph Hellwig
@ 2023-05-18  5:57     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2023-05-18  5:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Tue, May 16, 2023 at 06:03:51AM +0200, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 05:08:27PM -0700, Bart Van Assche wrote:
> > Do you have the time to continue the work on this patch series? Please let 
> > me know in case you would prefer that I continue the work on this patch 
> > series. I just found and fixed a bug in my patch "block: Send FUA requests 
> > to the I/O scheduler" (not included in this series).
> 
> I did another pass last weekend and was planning to finish it off today.

Turns out this all interacts with the scheduler tags vs passthrough
discussion we had, but I now have a version of the series rebased on top
of that:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-flush

I will send it out once we've settled the tags discussion.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-20  4:56     ` Christoph Hellwig
@ 2023-05-21 14:06       ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2023-05-21 14:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Damien Le Moal, linux-block

On 5/19/23 21:56, Christoph Hellwig wrote:
> On Fri, May 19, 2023 at 12:55:45PM -0700, Bart Van Assche wrote:
>>>    	LIST_HEAD(rq_list);
>>> -	struct request *rq, *next;
>>> +	LIST_HEAD(flush_list);
>>> +	struct request *rq;
>>>      	spin_lock_irq(&q->requeue_lock);
>>>    	list_splice_init(&q->requeue_list, &rq_list);
>>> +	list_splice_init(&q->flush_list, &flush_list);
>>>    	spin_unlock_irq(&q->requeue_lock);
>>
>> "rq_list" stands for "request_list". That name is now confusing since this patch
>> add a second request list (flush_list).
> 
> It is.  But I think you were planning on doing a bigger rework in this
> area anyway?

Yes, that's right. I plan to remove both rq_list and flush_list from 
this function.
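
Roughly along these lines -- just an illustrative sketch of the direction, not
the actual patch; it keeps the insert flags from this series but pops requests
one at a time instead of splicing into local lists:

	/*
	 * Sketch only: drain q->requeue_list directly, dropping the lock
	 * around each insert.  q->flush_list would be drained the same way,
	 * but inserted without BLK_MQ_INSERT_AT_HEAD.
	 */
	spin_lock_irq(&q->requeue_lock);
	while (!list_empty(&q->requeue_list)) {
		rq = list_first_entry(&q->requeue_list, struct request,
				      queuelist);
		list_del_init(&rq->queuelist);
		spin_unlock_irq(&q->requeue_lock);

		if (rq->rq_flags & RQF_DONTPREP)
			blk_mq_request_bypass_insert(rq, 0);
		else
			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);

		spin_lock_irq(&q->requeue_lock);
	}
	spin_unlock_irq(&q->requeue_lock);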

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19 19:55   ` Bart Van Assche
@ 2023-05-20  4:56     ` Christoph Hellwig
  2023-05-21 14:06       ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-05-20  4:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Fri, May 19, 2023 at 12:55:45PM -0700, Bart Van Assche wrote:
>>   	LIST_HEAD(rq_list);
>> -	struct request *rq, *next;
>> +	LIST_HEAD(flush_list);
>> +	struct request *rq;
>>     	spin_lock_irq(&q->requeue_lock);
>>   	list_splice_init(&q->requeue_list, &rq_list);
>> +	list_splice_init(&q->flush_list, &flush_list);
>>   	spin_unlock_irq(&q->requeue_lock);
>
> "rq_list" stands for "request_list". That name is now confusing since this patch
> add a second request list (flush_list).

It is.  But I think you were planning on doing a bigger rework in this
area anyway?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19  4:40 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-05-19 19:55   ` Bart Van Assche
  2023-05-20  4:56     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2023-05-19 19:55 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
> Currently both requeues of commands that were already sent to the
> driver and flush commands submitted from the flush state machine
> share the same requeue_list struct request_queue, despite requeues
> doing head insertations and flushes not.  Switch to using two
> separate lists instead.

insertations -> insertions. See also https://www.google.com/search?q=insertation.

> @@ -1434,13 +1437,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
>   	struct request_queue *q =
>   		container_of(work, struct request_queue, requeue_work.work);
>   	LIST_HEAD(rq_list);
> -	struct request *rq, *next;
> +	LIST_HEAD(flush_list);
> +	struct request *rq;
>   
>   	spin_lock_irq(&q->requeue_lock);
>   	list_splice_init(&q->requeue_list, &rq_list);
> +	list_splice_init(&q->flush_list, &flush_list);
>   	spin_unlock_irq(&q->requeue_lock);

"rq_list" stands for "request_list". That name is now confusing since this patch
add a second request list (flush_list).

Otherwise this patch looks good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 19:55   ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Currently both requeues of commands that were already sent to the
driver and flush commands submitted from the flush state machine
share the same requeue_list struct request_queue, despite requeues
doing head insertations and flushes not.  Switch to using two
separate lists instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c      |  9 +++++++--
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         | 42 +++++++++++++-----------------------------
 block/blk-mq.h         |  1 -
 include/linux/blk-mq.h |  4 +---
 include/linux/blkdev.h |  1 +
 6 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index f407a59503173d..dba392cf22bec6 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		blk_mq_add_to_requeue_list(rq, 0);
+		spin_lock(&q->requeue_lock);
+		list_add_tail(&rq->queuelist, &q->flush_list);
+		spin_unlock(&q->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
 
@@ -346,7 +348,10 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	smp_wmb();
 	req_ref_set(flush_rq, 1);
 
-	blk_mq_add_to_requeue_list(flush_rq, 0);
+	spin_lock(&q->requeue_lock);
+	list_add_tail(&flush_rq->queuelist, &q->flush_list);
+	spin_unlock(&q->requeue_lock);
+
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 22e39b9a77ecf2..68165a50951b68 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -244,7 +244,6 @@ static const char *const cmd_flag_name[] = {
 #define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
 static const char *const rqf_name[] = {
 	RQF_NAME(STARTED),
-	RQF_NAME(SOFTBARRIER),
 	RQF_NAME(FLUSH_SEQ),
 	RQF_NAME(MIXED_MERGE),
 	RQF_NAME(MQ_INFLIGHT),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aac67bc3d3680c..551e7760f45e20 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1416,13 +1416,16 @@ static void __blk_mq_requeue_request(struct request *rq)
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
+	unsigned long flags;
 
 	__blk_mq_requeue_request(rq);
 
 	/* this request will be re-inserted to io scheduler queue */
 	blk_mq_sched_requeue_request(rq);
 
-	blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+	spin_lock_irqsave(&q->requeue_lock, flags);
+	list_add_tail(&rq->queuelist, &q->requeue_list);
+	spin_unlock_irqrestore(&q->requeue_lock, flags);
 
 	if (kick_requeue_list)
 		blk_mq_kick_requeue_list(q);
@@ -1434,13 +1437,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	struct request_queue *q =
 		container_of(work, struct request_queue, requeue_work.work);
 	LIST_HEAD(rq_list);
-	struct request *rq, *next;
+	LIST_HEAD(flush_list);
+	struct request *rq;
 
 	spin_lock_irq(&q->requeue_lock);
 	list_splice_init(&q->requeue_list, &rq_list);
+	list_splice_init(&q->flush_list, &flush_list);
 	spin_unlock_irq(&q->requeue_lock);
 
-	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+	while (!list_empty(&rq_list)) {
+		rq = list_entry(rq_list.next, struct request, queuelist);
 		/*
 		 * If RQF_DONTPREP ist set, the request has been started by the
 		 * driver already and might have driver-specific data allocated
@@ -1448,18 +1454,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		 * block layer merges for the request.
 		 */
 		if (rq->rq_flags & RQF_DONTPREP) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
 			list_del_init(&rq->queuelist);
 			blk_mq_request_bypass_insert(rq, 0);
-		} else if (rq->rq_flags & RQF_SOFTBARRIER) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
+		} else {
 			list_del_init(&rq->queuelist);
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
 		}
 	}
 
-	while (!list_empty(&rq_list)) {
-		rq = list_entry(rq_list.next, struct request, queuelist);
+	while (!list_empty(&flush_list)) {
+		rq = list_entry(flush_list.next, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		blk_mq_insert_request(rq, 0);
 	}
@@ -1467,27 +1471,6 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags)
-{
-	struct request_queue *q = rq->q;
-	unsigned long flags;
-
-	/*
-	 * We abuse this flag that is otherwise used by the I/O scheduler to
-	 * request head insertion from the workqueue.
-	 */
-	BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
-
-	spin_lock_irqsave(&q->requeue_lock, flags);
-	if (insert_flags & BLK_MQ_INSERT_AT_HEAD) {
-		rq->rq_flags |= RQF_SOFTBARRIER;
-		list_add(&rq->queuelist, &q->requeue_list);
-	} else {
-		list_add_tail(&rq->queuelist, &q->requeue_list);
-	}
-	spin_unlock_irqrestore(&q->requeue_lock, flags);
-}
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
@@ -4239,6 +4222,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_update_poll_flag(q);
 
 	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(&q->flush_list);
 	INIT_LIST_HEAD(&q->requeue_list);
 	spin_lock_init(&q->requeue_lock);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ec7d2fb0b3c8ef..8c642e9f32f102 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,7 +47,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
 			     unsigned int);
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 935201c8974371..d778cb6b211233 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -28,8 +28,6 @@ typedef __u32 __bitwise req_flags_t;
 
 /* drive already may have started this one */
 #define RQF_STARTED		((__force req_flags_t)(1 << 1))
-/* may not be passed by ioscheduler */
-#define RQF_SOFTBARRIER		((__force req_flags_t)(1 << 3))
 /* request for flush sequence */
 #define RQF_FLUSH_SEQ		((__force req_flags_t)(1 << 4))
 /* merge of different types, fail separately */
@@ -65,7 +63,7 @@ typedef __u32 __bitwise req_flags_t;
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3952c52d6cd1b0..fe99948688dfda 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -487,6 +487,7 @@ struct request_queue {
 	 * for flush operations
 	 */
 	struct blk_flush_queue	*fq;
+	struct list_head	flush_list;
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-05-21 14:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
2023-04-17  6:22   ` Damien Le Moal
2023-04-17  6:24     ` Christoph Hellwig
2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
2023-04-17  6:29   ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
2023-04-17  6:31   ` Damien Le Moal
2023-04-17 19:48   ` Bart Van Assche
2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
2023-04-17  6:34   ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
2023-04-17  6:36   ` Damien Le Moal
2023-04-17  6:39     ` Christoph Hellwig
2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
2023-04-17  6:38   ` Damien Le Moal
2023-04-17 19:51   ` Bart Van Assche
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
2023-04-16 21:01   ` Bart Van Assche
2023-04-17  4:26     ` Christoph Hellwig
2023-04-17  6:46   ` Damien Le Moal
2023-04-19 12:37   ` kernel test robot
2023-04-19 17:34   ` kernel test robot
2023-05-16  0:08 ` RFC: less special casing for flush requests Bart Van Assche
2023-05-16  4:03   ` Christoph Hellwig
2023-05-18  5:57     ` Christoph Hellwig
2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
2023-05-19  4:40 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
2023-05-19 19:55   ` Bart Van Assche
2023-05-20  4:56     ` Christoph Hellwig
2023-05-21 14:06       ` Bart Van Assche
