* less special casing for flush requests v2
@ 2023-05-19  4:40 Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Hi all,

inspired by Bart's quest for less reordering during requeues, this series
aims to limit the scope of requeues.  This is mostly done by simply
sending more requests than we currently do down the normal submission
path, with the help of Bart's patch to allow requests that are in the
flush state machine to still be seen by the I/O schedulers.

Changes since v1:
 - rebased
 - fix a bug in the flush state machine insertion order

Diffstat:
 block/blk-flush.c      |  110 +++++++++++++++++++++++++++----------------------
 block/blk-mq-debugfs.c |    1 
 block/blk-mq.c         |   54 ++++++++----------------
 block/blk-mq.h         |    5 --
 block/blk.h            |    2 
 include/linux/blk-mq.h |   31 +++++--------
 include/linux/blkdev.h |    1 
 7 files changed, 94 insertions(+), 110 deletions(-)


* [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 18:54   ` Bart Van Assche
  2023-05-19  4:40 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Factor out a helper from blk_insert_flush that initializes the flush
machinery related fields in struct request, and don't bother with the
full memset as there are only a few fields to initialize, and all but
one already have explicit initializers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 04698ed9bcd4a9..ed37d272f787eb 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 	return RQ_END_IO_NONE;
 }
 
+static void blk_rq_init_flush(struct request *rq)
+{
+	rq->flush.seq = 0;
+	INIT_LIST_HEAD(&rq->flush.list);
+	rq->rq_flags |= RQF_FLUSH_SEQ;
+	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+	rq->end_io = mq_flush_data_end_io;
+}
+
 /**
  * blk_insert_flush - insert a new PREFLUSH/FUA request
  * @rq: request to insert
@@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
 	 * @rq should go through flush machinery.  Mark it part of flush
 	 * sequence and submit for further processing.
 	 */
-	memset(&rq->flush, 0, sizeof(rq->flush));
-	INIT_LIST_HEAD(&rq->flush.list);
-	rq->rq_flags |= RQF_FLUSH_SEQ;
-	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-
-	rq->end_io = mq_flush_data_end_io;
-
+	blk_rq_init_flush(rq);
 	spin_lock_irq(&fq->mq_flush_lock);
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 	spin_unlock_irq(&fq->mq_flush_lock);
-- 
2.39.2



* [PATCH 2/7] blk-mq: reflow blk_insert_flush
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 19:38   ` Bart Van Assche
  2023-05-19  4:40 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Use a switch statement to decide on the disposition of a flush request
instead of multiple if statements, one of which performs checks that are
more complex than required.  Also warn on a malformed request early on
instead of doing a BUG_ON later.
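
For reference, the @policy value being switched on is derived from the
queue's writeback cache capabilities and the request's flags.  A
paraphrased sketch of that logic, based on blk_flush_policy() and the
REQ_FSEQ_* bits in block/blk-flush.c (the exact code in any given tree
may differ slightly):

----- snip -----
enum {
	REQ_FSEQ_PREFLUSH	= (1 << 0), /* pre-flushing in progress */
	REQ_FSEQ_DATA		= (1 << 1), /* data write in progress */
	REQ_FSEQ_POSTFLUSH	= (1 << 2), /* post-flushing in progress */
	REQ_FSEQ_DONE		= (1 << 3),

	REQ_FSEQ_ACTIONS	= REQ_FSEQ_PREFLUSH | REQ_FSEQ_DATA |
				  REQ_FSEQ_POSTFLUSH,
};

static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
{
	unsigned int policy = 0;

	/* a request with a payload needs the data step */
	if (blk_rq_sectors(rq))
		policy |= REQ_FSEQ_DATA;

	if (fflags & (1UL << QUEUE_FLAG_WC)) {
		/* volatile write cache present: honor REQ_PREFLUSH */
		if (rq->cmd_flags & REQ_PREFLUSH)
			policy |= REQ_FSEQ_PREFLUSH;
		/* no native FUA support: emulate it with a post-flush */
		if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
		    (rq->cmd_flags & REQ_FUA))
			policy |= REQ_FSEQ_POSTFLUSH;
	}
	return policy;
}
----- snip -----

So a policy of 0 is a flush aimed at a device without a volatile write
cache, and REQ_FSEQ_DATA alone is a plain write that needs neither a
pre- nor a post-flush.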

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c | 53 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index ed37d272f787eb..d8144f1f6fb12f 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
+	/* FLUSH/FUA request must never be merged */
+	WARN_ON_ONCE(rq->bio != rq->biotail);
+
 	/*
 	 * @policy now records what operations need to be done.  Adjust
 	 * REQ_PREFLUSH and FUA for the driver.
@@ -417,39 +420,35 @@ void blk_insert_flush(struct request *rq)
 	 */
 	rq->cmd_flags |= REQ_SYNC;
 
-	/*
-	 * An empty flush handed down from a stacking driver may
-	 * translate into nothing if the underlying device does not
-	 * advertise a write-back cache.  In this case, simply
-	 * complete the request.
-	 */
-	if (!policy) {
+	switch (policy) {
+	case 0:
+		/*
+		 * An empty flush handed down from a stacking driver may
+		 * translate into nothing if the underlying device does not
+		 * advertise a write-back cache.  In this case, simply
+		 * complete the request.
+		 */
 		blk_mq_end_request(rq, 0);
 		return;
-	}
-
-	BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
-
-	/*
-	 * If there's data but flush is not necessary, the request can be
-	 * processed directly without going through flush machinery.  Queue
-	 * for normal execution.
-	 */
-	if ((policy & REQ_FSEQ_DATA) &&
-	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
+	case REQ_FSEQ_DATA:
+		/*
+		 * If there's data, but no flush is necessary, the request can
+		 * be processed directly without going through flush machinery.
+		 * Queue for normal execution.
+		 */
 		blk_mq_request_bypass_insert(rq, 0);
 		blk_mq_run_hw_queue(hctx, false);
 		return;
+	default:
+		/*
+		 * Mark the request as part of a flush sequence and submit it
+		 * for further processing to the flush state machine.
+		 */
+		blk_rq_init_flush(rq);
+		spin_lock_irq(&fq->mq_flush_lock);
+		blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
+		spin_unlock_irq(&fq->mq_flush_lock);
 	}
-
-	/*
-	 * @rq should go through flush machinery.  Mark it part of flush
-	 * sequence and submit for further processing.
-	 */
-	blk_rq_init_flush(rq);
-	spin_lock_irq(&fq->mq_flush_lock);
-	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
-	spin_unlock_irq(&fq->mq_flush_lock);
 }
 
 /**
-- 
2.39.2



* [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

If blk_insert_flush decides that a command does not need to use the
flush state machine, return false and let blk_mq_submit_bio handle
it the normal way (including using an I/O scheduler) instead of doing
a bypass insert.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c | 22 ++++++++--------------
 block/blk-mq.c    |  8 ++++----
 block/blk-mq.h    |  4 ----
 block/blk.h       |  2 +-
 4 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index d8144f1f6fb12f..6fb9cf2d38184b 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -385,22 +385,17 @@ static void blk_rq_init_flush(struct request *rq)
 	rq->end_io = mq_flush_data_end_io;
 }
 
-/**
- * blk_insert_flush - insert a new PREFLUSH/FUA request
- * @rq: request to insert
- *
- * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
- * or __blk_mq_run_hw_queue() to dispatch request.
- * @rq is being submitted.  Analyze what needs to be done and put it on the
- * right queue.
+/*
+ * Insert a PREFLUSH/FUA request into the flush state machine.
+ * Returns true if the request has been consumed by the flush state machine,
+ * or false if the caller should continue to process it.
  */
-void blk_insert_flush(struct request *rq)
+bool blk_insert_flush(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 	unsigned long fflags = q->queue_flags;	/* may change, cache */
 	unsigned int policy = blk_flush_policy(fflags, rq);
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	/* FLUSH/FUA request must never be merged */
 	WARN_ON_ONCE(rq->bio != rq->biotail);
@@ -429,16 +424,14 @@ void blk_insert_flush(struct request *rq)
 		 * complete the request.
 		 */
 		blk_mq_end_request(rq, 0);
-		return;
+		return true;
 	case REQ_FSEQ_DATA:
 		/*
 		 * If there's data, but no flush is necessary, the request can
 		 * be processed directly without going through flush machinery.
 		 * Queue for normal execution.
 		 */
-		blk_mq_request_bypass_insert(rq, 0);
-		blk_mq_run_hw_queue(hctx, false);
-		return;
+		return false;
 	default:
 		/*
 		 * Mark the request as part of a flush sequence and submit it
@@ -448,6 +441,7 @@ void blk_insert_flush(struct request *rq)
 		spin_lock_irq(&fq->mq_flush_lock);
 		blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 		spin_unlock_irq(&fq->mq_flush_lock);
+		return true;
 	}
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e021740154feae..c0b394096b6b6b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -45,6 +45,8 @@
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
+static void blk_mq_request_bypass_insert(struct request *rq,
+		blk_insert_t flags);
 static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list);
 
@@ -2430,7 +2432,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
+static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
@@ -2977,10 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}
 
-	if (op_is_flush(bio->bi_opf)) {
-		blk_insert_flush(rq);
+	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
 		return;
-	}
 
 	if (plug) {
 		blk_add_rq_to_plug(plug, rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d15981db34b958..ec7d2fb0b3c8ef 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -64,10 +64,6 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
 			     struct blk_mq_tags *tags,
 			     unsigned int hctx_idx);
-/*
- * Internal helpers for request insertion into sw queues
- */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags);
 
 /*
  * CPU -> queue mappings
diff --git a/block/blk.h b/block/blk.h
index 45547bcf111938..9f171b8f1e3402 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -269,7 +269,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
-void blk_insert_flush(struct request *rq);
+bool blk_insert_flush(struct request *rq);
 
 int elevator_switch(struct request_queue *q, struct elevator_type *new_e);
 void elevator_disable(struct request_queue *q);
-- 
2.39.2



* [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-05-19  4:40 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 19:44   ` Bart Van Assche
  2023-05-24  5:53   ` Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

From: Bart Van Assche <bvanassche@acm.org>

Send write requests issued by the flush state machine through the normal
I/O submission path including the I/O scheduler (if present) so that I/O
scheduler policies are applied to writes with the FUA flag set.

Separate the I/O scheduler members from the flush members in struct
request since now a request may pass through both an I/O scheduler
and the flush machinery.

Note that the actual flush requests, which have no bio attached to the
request, still bypass the I/O schedulers.
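
The distinction that makes this work is between requests that merely
carry flush semantics and the dedicated flush requests themselves.  A
paraphrased sketch of the check being replaced (op_is_flush() in
include/linux/blk_types.h; the exact definition may differ slightly):

----- snip -----
/* true for anything with flush semantics, including FUA data writes */
static inline bool op_is_flush(blk_opf_t op)
{
	return op & (REQ_FUA | REQ_PREFLUSH);
}
----- snip -----

Testing req_op(rq) == REQ_OP_FLUSH instead matches only the dedicated,
bio-less flush requests, so those still bypass the scheduler while
PREFLUSH/FUA data writes no longer do.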

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-mq.c         |  4 ++--
 include/linux/blk-mq.h | 27 +++++++++++----------------
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c0b394096b6b6b..aac67bc3d3680c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -458,7 +458,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
+		if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
 		    !blk_op_is_passthrough(data->cmd_flags)) {
 			struct elevator_mq_ops *ops = &q->elevator->type->ops;
 
@@ -2497,7 +2497,7 @@ static void blk_mq_insert_request(struct request *rq, blk_insert_t flags)
 		 * dispatch it given we prioritize requests in hctx->dispatch.
 		 */
 		blk_mq_request_bypass_insert(rq, flags);
-	} else if (rq->rq_flags & RQF_FLUSH_SEQ) {
+	} else if (req_op(rq) == REQ_OP_FLUSH) {
 		/*
 		 * Firstly normal IO request is inserted to scheduler queue or
 		 * sw queue, meantime we add flush request to dispatch queue(
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 49d14b1acfa5df..935201c8974371 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,25 +169,20 @@ struct request {
 		void *completion_data;
 	};
 
-
 	/*
 	 * Three pointers are available for the IO schedulers, if they need
-	 * more they have to dynamically allocate it.  Flush requests are
-	 * never put on the IO scheduler. So let the flush fields share
-	 * space with the elevator data.
+	 * more they have to dynamically allocate it.
 	 */
-	union {
-		struct {
-			struct io_cq		*icq;
-			void			*priv[2];
-		} elv;
-
-		struct {
-			unsigned int		seq;
-			struct list_head	list;
-			rq_end_io_fn		*saved_end_io;
-		} flush;
-	};
+	struct {
+		struct io_cq		*icq;
+		void			*priv[2];
+	} elv;
+
+	struct {
+		unsigned int		seq;
+		struct list_head	list;
+		rq_end_io_fn		*saved_end_io;
+	} flush;
 
 	union {
 		struct __call_single_data csd;
-- 
2.39.2



* [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-05-19  4:40 ` [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 19:42   ` Bart Van Assche
  2023-05-19  4:40 ` [PATCH 6/7] blk-mq: do not do head insertions post-pre-flush commands Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Requests with the FUA bit set, issued to hardware without FUA support,
need a post flush before returning to the caller, but they can still be
sent using the normal I/O path after initializing the flush-related
fields and the end I/O handler.
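
Concretely, blk_rq_init_flush() points rq->end_io at
mq_flush_data_end_io(), so once the normally submitted write completes,
the flush state machine takes over and issues the post flush.  A heavily
abridged sketch of that completion handler (the real one in
block/blk-flush.c also releases the driver tag and kicks the hardware
queue):

----- snip -----
static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
					       blk_status_t error)
{
	struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
	unsigned long flags;

	spin_lock_irqsave(&fq->mq_flush_lock, flags);
	/* mark the data step done; queues the post-flush if one is owed */
	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
	return RQ_END_IO_NONE;
}
----- snip -----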

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 6fb9cf2d38184b..7121f9ad0762f8 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -432,6 +432,17 @@ bool blk_insert_flush(struct request *rq)
 		 * Queue for normal execution.
 		 */
 		return false;
+	case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
+		/*
+		 * Initialize the flush fields and completion handler to trigger
+		 * the post flush, and then just pass the command on.
+		 */
+		blk_rq_init_flush(rq);
+		rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
+		spin_lock_irq(&fq->mq_flush_lock);
+		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		spin_unlock_irq(&fq->mq_flush_lock);
+		return false;
 	default:
 		/*
 		 * Mark the request as part of a flush sequence and submit it
-- 
2.39.2



* [PATCH 6/7] blk-mq: do not do head insertions post-pre-flush commands
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-05-19  4:40 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19  4:40 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
  2023-05-20  1:53 ` less special casing for flush requests v2 Jens Axboe
  7 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

blk_flush_complete_seq currently queues the data writes that follow a
pre-flush in the flush state machine at the head of the requeue list.
This doesn't really make sense, as the original request bypassed all
queue lists by diverting directly to blk_insert_flush from
blk_mq_submit_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 7121f9ad0762f8..f407a59503173d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,7 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+		blk_mq_add_to_requeue_list(rq, 0);
 		blk_mq_kick_requeue_list(q);
 		break;
 
-- 
2.39.2



* [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-05-19  4:40 ` [PATCH 6/7] blk-mq: do not do head insertions post-pre-flush commands Christoph Hellwig
@ 2023-05-19  4:40 ` Christoph Hellwig
  2023-05-19 19:55   ` Bart Van Assche
  2023-05-20  1:53 ` less special casing for flush requests v2 Jens Axboe
  7 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

Currently both requeues of commands that were already sent to the
driver and flush commands submitted from the flush state machine
share the same requeue_list in struct request_queue, despite requeues
doing head insertations and flushes not.  Switch to using two
separate lists instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-flush.c      |  9 +++++++--
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         | 42 +++++++++++++-----------------------------
 block/blk-mq.h         |  1 -
 include/linux/blk-mq.h |  4 +---
 include/linux/blkdev.h |  1 +
 6 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index f407a59503173d..dba392cf22bec6 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		blk_mq_add_to_requeue_list(rq, 0);
+		spin_lock(&q->requeue_lock);
+		list_add_tail(&rq->queuelist, &q->flush_list);
+		spin_unlock(&q->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
 
@@ -346,7 +348,10 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	smp_wmb();
 	req_ref_set(flush_rq, 1);
 
-	blk_mq_add_to_requeue_list(flush_rq, 0);
+	spin_lock(&q->requeue_lock);
+	list_add_tail(&flush_rq->queuelist, &q->flush_list);
+	spin_unlock(&q->requeue_lock);
+
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 22e39b9a77ecf2..68165a50951b68 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -244,7 +244,6 @@ static const char *const cmd_flag_name[] = {
 #define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
 static const char *const rqf_name[] = {
 	RQF_NAME(STARTED),
-	RQF_NAME(SOFTBARRIER),
 	RQF_NAME(FLUSH_SEQ),
 	RQF_NAME(MIXED_MERGE),
 	RQF_NAME(MQ_INFLIGHT),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aac67bc3d3680c..551e7760f45e20 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1416,13 +1416,16 @@ static void __blk_mq_requeue_request(struct request *rq)
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
+	unsigned long flags;
 
 	__blk_mq_requeue_request(rq);
 
 	/* this request will be re-inserted to io scheduler queue */
 	blk_mq_sched_requeue_request(rq);
 
-	blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+	spin_lock_irqsave(&q->requeue_lock, flags);
+	list_add_tail(&rq->queuelist, &q->requeue_list);
+	spin_unlock_irqrestore(&q->requeue_lock, flags);
 
 	if (kick_requeue_list)
 		blk_mq_kick_requeue_list(q);
@@ -1434,13 +1437,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	struct request_queue *q =
 		container_of(work, struct request_queue, requeue_work.work);
 	LIST_HEAD(rq_list);
-	struct request *rq, *next;
+	LIST_HEAD(flush_list);
+	struct request *rq;
 
 	spin_lock_irq(&q->requeue_lock);
 	list_splice_init(&q->requeue_list, &rq_list);
+	list_splice_init(&q->flush_list, &flush_list);
 	spin_unlock_irq(&q->requeue_lock);
 
-	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+	while (!list_empty(&rq_list)) {
+		rq = list_entry(rq_list.next, struct request, queuelist);
 		/*
 		 * If RQF_DONTPREP ist set, the request has been started by the
 		 * driver already and might have driver-specific data allocated
@@ -1448,18 +1454,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		 * block layer merges for the request.
 		 */
 		if (rq->rq_flags & RQF_DONTPREP) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
 			list_del_init(&rq->queuelist);
 			blk_mq_request_bypass_insert(rq, 0);
-		} else if (rq->rq_flags & RQF_SOFTBARRIER) {
-			rq->rq_flags &= ~RQF_SOFTBARRIER;
+		} else {
 			list_del_init(&rq->queuelist);
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
 		}
 	}
 
-	while (!list_empty(&rq_list)) {
-		rq = list_entry(rq_list.next, struct request, queuelist);
+	while (!list_empty(&flush_list)) {
+		rq = list_entry(flush_list.next, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		blk_mq_insert_request(rq, 0);
 	}
@@ -1467,27 +1471,6 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags)
-{
-	struct request_queue *q = rq->q;
-	unsigned long flags;
-
-	/*
-	 * We abuse this flag that is otherwise used by the I/O scheduler to
-	 * request head insertion from the workqueue.
-	 */
-	BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
-
-	spin_lock_irqsave(&q->requeue_lock, flags);
-	if (insert_flags & BLK_MQ_INSERT_AT_HEAD) {
-		rq->rq_flags |= RQF_SOFTBARRIER;
-		list_add(&rq->queuelist, &q->requeue_list);
-	} else {
-		list_add_tail(&rq->queuelist, &q->requeue_list);
-	}
-	spin_unlock_irqrestore(&q->requeue_lock, flags);
-}
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
@@ -4239,6 +4222,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_update_poll_flag(q);
 
 	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(&q->flush_list);
 	INIT_LIST_HEAD(&q->requeue_list);
 	spin_lock_init(&q->requeue_lock);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ec7d2fb0b3c8ef..8c642e9f32f102 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,7 +47,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
 			     unsigned int);
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_ctx *start);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 935201c8974371..d778cb6b211233 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -28,8 +28,6 @@ typedef __u32 __bitwise req_flags_t;
 
 /* drive already may have started this one */
 #define RQF_STARTED		((__force req_flags_t)(1 << 1))
-/* may not be passed by ioscheduler */
-#define RQF_SOFTBARRIER		((__force req_flags_t)(1 << 3))
 /* request for flush sequence */
 #define RQF_FLUSH_SEQ		((__force req_flags_t)(1 << 4))
 /* merge of different types, fail separately */
@@ -65,7 +63,7 @@ typedef __u32 __bitwise req_flags_t;
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3952c52d6cd1b0..fe99948688dfda 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -487,6 +487,7 @@ struct request_queue {
 	 * for flush operations
 	 */
 	struct blk_flush_queue	*fq;
+	struct list_head	flush_list;
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-- 
2.39.2



* Re: [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
  2023-05-19  4:40 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-05-19 18:54   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2023-05-19 18:54 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
> Factor out a helper from blk_insert_flush that initializes the flush
> machinery related fields in struct request, and don't bother with the
> full memset as there are only a few fields to initialize, and all but
> one already have explicit initializers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   block/blk-flush.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 04698ed9bcd4a9..ed37d272f787eb 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
>   	return RQ_END_IO_NONE;
>   }
>   
> +static void blk_rq_init_flush(struct request *rq)
> +{
> +	rq->flush.seq = 0;
> +	INIT_LIST_HEAD(&rq->flush.list);
> +	rq->rq_flags |= RQF_FLUSH_SEQ;
> +	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> +	rq->end_io = mq_flush_data_end_io;
> +}
> +
>   /**
>    * blk_insert_flush - insert a new PREFLUSH/FUA request
>    * @rq: request to insert
> @@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
>   	 * @rq should go through flush machinery.  Mark it part of flush
>   	 * sequence and submit for further processing.
>   	 */
> -	memset(&rq->flush, 0, sizeof(rq->flush));
> -	INIT_LIST_HEAD(&rq->flush.list);
> -	rq->rq_flags |= RQF_FLUSH_SEQ;
> -	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> -
> -	rq->end_io = mq_flush_data_end_io;
> -
> +	blk_rq_init_flush(rq);
>   	spin_lock_irq(&fq->mq_flush_lock);
>   	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
>   	spin_unlock_irq(&fq->mq_flush_lock);

Although I'm not sure it's useful to keep the "/* Usually NULL */" comment:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>


* Re: [PATCH 2/7] blk-mq: reflow blk_insert_flush
  2023-05-19  4:40 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-05-19 19:38   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2023-05-19 19:38 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index ed37d272f787eb..d8144f1f6fb12f 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
>   	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
>   	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>   
> +	/* FLUSH/FUA request must never be merged */
> +	WARN_ON_ONCE(rq->bio != rq->biotail);

request -> requests?

> -	/*
> -	 * If there's data but flush is not necessary, the request can be
> -	 * processed directly without going through flush machinery.  Queue
> -	 * for normal execution.
> -	 */
> -	if ((policy & REQ_FSEQ_DATA) &&
> -	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
> +	case REQ_FSEQ_DATA:
> +		/*
> +		 * If there's data, but no flush is necessary, the request can
> +		 * be processed directly without going through flush machinery.
> +		 * Queue for normal execution.
> +		 */

there's -> there is. "there's" is a contraction of "there has". See also
https://dictionary.cambridge.org/dictionary/english/there-s.

Otherwise this patch looks good to me. Hence:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>


* Re: [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
  2023-05-19  4:40 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-05-19 19:42   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2023-05-19 19:42 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
> Requests with the FUA bit on hardware without FUA support need a post
> flush before returning to the caller, but they can still be sent using
> the normal I/O path after initializing the flush-related fields and
> end I/O handler.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-19  4:40 ` [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine Christoph Hellwig
@ 2023-05-19 19:44   ` Bart Van Assche
  2023-05-24  5:53   ` Christoph Hellwig
  1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2023-05-19 19:44 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
>   	/*
>   	 * Three pointers are available for the IO schedulers, if they need
> -	 * more they have to dynamically allocate it.  Flush requests are
> -	 * never put on the IO scheduler. So let the flush fields share
> -	 * space with the elevator data.
> +	 * more they have to dynamically allocate it.
>   	 */

(commenting on my own patch) How about adding the words "private data" in the
above comment to improve clarity? That will change the above comment into:

	/*
	 * Three pointers are available for IO schedulers. If they need
	 * more private data they have to allocate it dynamically.
	 */



* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19  4:40 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-05-19 19:55   ` Bart Van Assche
  2023-05-20  4:56     ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2023-05-19 19:55 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/18/23 21:40, Christoph Hellwig wrote:
> Currently both requeues of commands that were already sent to the
> driver and flush commands submitted from the flush state machine
> share the same requeue_list in struct request_queue, despite requeues
> doing head insertations and flushes not.  Switch to using two
> separate lists instead.

insertations -> insertions. See also https://www.google.com/search?q=insertation.

> @@ -1434,13 +1437,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
>   	struct request_queue *q =
>   		container_of(work, struct request_queue, requeue_work.work);
>   	LIST_HEAD(rq_list);
> -	struct request *rq, *next;
> +	LIST_HEAD(flush_list);
> +	struct request *rq;
>   
>   	spin_lock_irq(&q->requeue_lock);
>   	list_splice_init(&q->requeue_list, &rq_list);
> +	list_splice_init(&q->flush_list, &flush_list);
>   	spin_unlock_irq(&q->requeue_lock);

"rq_list" stands for "request_list". That name is now confusing since this patch
add a second request list (flush_list).

Otherwise this patch looks good to me.

Thanks,

Bart.


* Re: less special casing for flush requests v2
  2023-05-19  4:40 less special casing for flush requests v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-05-19  4:40 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-05-20  1:53 ` Jens Axboe
  7 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2023-05-20  1:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, Damien Le Moal, linux-block


On Fri, 19 May 2023 06:40:43 +0200, Christoph Hellwig wrote:
> inspired by Bart's quest for less reordering during requeues, this series
> aims to limit the scope of requeues.  This is mostly done by simply
> sending more requests than we currently do down the normal submission
> path, with the help of Bart's patch to allow requests that are in the
> flush state machine to still be seen by the I/O schedulers.
> 
> Changes since v1:
>  - rebased
>  - fix a bug in the flush state machine insertion order
> 
> [...]

Applied, thanks!

[1/7] blk-mq: factor out a blk_rq_init_flush helper
      commit: 0b573692f19501dfe2aeaf37b272ec07f60c70b9
[2/7] blk-mq: reflow blk_insert_flush
      commit: c1075e548ce6e6b5c7b71f2b05d344164ebc52bb
[3/7] blk-mq: defer to the normal submission path for non-flush flush commands
      commit: 360f264834e34d08530c2fb9b67e3ffa65318761
[4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
      commit: be4c427809b0a746aff54dbb8ef663f0184291d0
[5/7] blk-mq: defer to the normal submission path for post-flush requests
      commit: 615939a2ae734e3e68c816d6749d1f5f79c62ab7
[6/7] blk-mq: do not do head insertions post-pre-flush commands
      commit: 1e82fadfc6b96ca79f69d0bcf938d31032bb43d2
[7/7] blk-mq: don't use the requeue list to queue flush commands
      commit: 9a67aa52a42b31ad44220cc218df3b75a5cd5d05

Best regards,
-- 
Jens Axboe





* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-19 19:55   ` Bart Van Assche
@ 2023-05-20  4:56     ` Christoph Hellwig
  2023-05-21 14:06       ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-20  4:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Fri, May 19, 2023 at 12:55:45PM -0700, Bart Van Assche wrote:
>>   	LIST_HEAD(rq_list);
>> -	struct request *rq, *next;
>> +	LIST_HEAD(flush_list);
>> +	struct request *rq;
>>     	spin_lock_irq(&q->requeue_lock);
>>   	list_splice_init(&q->requeue_list, &rq_list);
>> +	list_splice_init(&q->flush_list, &flush_list);
>>   	spin_unlock_irq(&q->requeue_lock);
>
> "rq_list" stands for "request_list". That name is now confusing since this patch
> add a second request list (flush_list).

It is.  But I think you were planning on doing a bigger rework in this
area anyway?


* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
  2023-05-20  4:56     ` Christoph Hellwig
@ 2023-05-21 14:06       ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2023-05-21 14:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Damien Le Moal, linux-block

On 5/19/23 21:56, Christoph Hellwig wrote:
> On Fri, May 19, 2023 at 12:55:45PM -0700, Bart Van Assche wrote:
>>>    	LIST_HEAD(rq_list);
>>> -	struct request *rq, *next;
>>> +	LIST_HEAD(flush_list);
>>> +	struct request *rq;
>>>      	spin_lock_irq(&q->requeue_lock);
>>>    	list_splice_init(&q->requeue_list, &rq_list);
>>> +	list_splice_init(&q->flush_list, &flush_list);
>>>    	spin_unlock_irq(&q->requeue_lock);
>>
>> "rq_list" stands for "request_list". That name is now confusing since this patch
>> add a second request list (flush_list).
> 
> It is.  But I think you were planning on doing a bigger rework in this
> area anyway?

Yes, that's right. I plan to remove both rq_list and flush_list from 
this function.

Bart.


* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-19  4:40 ` [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine Christoph Hellwig
  2023-05-19 19:44   ` Bart Van Assche
@ 2023-05-24  5:53   ` Christoph Hellwig
  2023-05-24 18:07     ` Bart Van Assche
  2023-05-28 20:50     ` Bart Van Assche
  1 sibling, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-24  5:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block

It turns out this causes some kind of hang I haven't been able to
debug yet in blktests' hotplug test.  Can you drop this and the
subsequent patches for now?



* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-24  5:53   ` Christoph Hellwig
@ 2023-05-24 18:07     ` Bart Van Assche
  2023-05-25  8:14       ` Christoph Hellwig
  2023-05-28 20:50     ` Bart Van Assche
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2023-05-24 18:07 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/23/23 22:53, Christoph Hellwig wrote:
> It turns out this causes some kind of hang I haven't been able to
> debug yet in blktests' hotplug test.  Can you drop this and the
> subsequent patches for now?

Hi Christoph,

I haven't seen this hang in my tests. If you can tell me how you run
blktests I can help with root-causing this issue. This is how I run
blktests:

modprobe brd
echo "TEST_DEVS=(/dev/ram0)" > config
export use_siw=1
./check -q

Thanks,

Bart.




* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-24 18:07     ` Bart Van Assche
@ 2023-05-25  8:14       ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-25  8:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Wed, May 24, 2023 at 11:07:41AM -0700, Bart Van Assche wrote:
> On 5/23/23 22:53, Christoph Hellwig wrote:
>> It turns out this causes some kind of hang I haven't been able to
>> debug yet in blktests' hotplug test.  Can you drop this and the
>> subsequent patches for now?
>
> Hi Christoph,
>
> I haven't seen this hang in my tests. If you can tell me how to run 
> blktests I can help with root-causing this issue. This is how I run 
> blktests:

This is a simple ./check run with this config, and most importantly a
modular scsi_debug (which is not my usual config, otherwise I would
have noticed it earlier):

----- snip -----
TEST_DEVS=(/dev/vdb)
nvme_trtype=tcp
----- snip -----

It hangs in block/001 when probing scsi_debug:

[  242.790601] INFO: task modprobe:3702 blocked for more than 120 seconds.
[  242.791572]       Not tainted 6.4.0-rc2+ #1179
[  242.792201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  242.793387] task:modprobe        state:D stack:0     pid:3702  ppid:3686
flags:0x00004002
[  242.794724] Call Trace:
[  242.795121]  <TASK>
[  242.795465]  __schedule+0x307/0x840
[  242.796053]  ? call_usermodehelper_exec+0xee/0x180
[  242.796812]  schedule+0x57/0xa0
[  242.797316]  async_synchronize_full+0xa0/0x130
[  242.798029]  ? destroy_sched_domains_rcu+0x20/0x20
[  242.798803]  do_init_module+0x19f/0x200
[  242.799657]  __do_sys_finit_module+0x9e/0xf0
[  242.800324]  do_syscall_64+0x34/0x80
[  242.800879]  entry_SYSCALL_64_after_hwframe+0x63/0xcd



* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-24  5:53   ` Christoph Hellwig
  2023-05-24 18:07     ` Bart Van Assche
@ 2023-05-28 20:50     ` Bart Van Assche
  2023-05-30 14:55       ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2023-05-28 20:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block

On 5/23/23 22:53, Christoph Hellwig wrote:
> It turns out this causes some kind of hang I haven't been able to
> debug yet in blktests' hotplug test.  Can you drop this and the
> subsequent patches for now?

Hi Christoph,

Are you perhaps referring to test block/027? It passes in my test-VM
with Jens' for-next branch and the following blktests config file:

TEST_DEVS=(/dev/vdb)

It seems to me that the entire patch series is still present on Jens'
for-next branch (commit 05ab4411bd69b809aa62c774866ce0b9be072418)?

Thanks,

Bart.




* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-28 20:50     ` Bart Van Assche
@ 2023-05-30 14:55       ` Christoph Hellwig
  2023-05-31 13:34         ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-30 14:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Sun, May 28, 2023 at 01:50:50PM -0700, Bart Van Assche wrote:
> On 5/23/23 22:53, Christoph Hellwig wrote:
>> It turns out this causes some kind of hang I haven't been able to
>> debug yet in blktests' hotplug test.  Can you drop this and the
>> subsequent patches for now?
>
> Hi Christoph,
>
> Are you perhaps referring to test block/027? It passes in my test-VM
> with Jens' for-next branch and the following blktests config file:

No, I mean block/001


* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-30 14:55       ` Christoph Hellwig
@ 2023-05-31 13:34         ` Bart Van Assche
  2023-05-31 15:00           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2023-05-31 13:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Damien Le Moal, linux-block

On 5/30/23 07:55, Christoph Hellwig wrote:
> No, I mean block/001

Test block/001 also passes on my test setup. That setup is as follows:
* Kernel commit 0cac5b67c168 ("Merge branch 'for-6.5/block' into
   for-next").
* Multiple debugging options enabled in the kernel config (DEBUG_LIST,
   PROVE_LOCKING, KASAN, UBSAN, ...).

I ran this test twice: once with /dev/vdb as test device and once with 
/dev/ram0 as test device.

Bart.


* Re: [PATCH 4/7] blk-mq: use the I/O scheduler for writes from the flush state machine
  2023-05-31 13:34         ` Bart Van Assche
@ 2023-05-31 15:00           ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-05-31 15:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block

On Wed, May 31, 2023 at 06:34:49AM -0700, Bart Van Assche wrote:
> On 5/30/23 07:55, Christoph Hellwig wrote:
>> No, I mean block/001
>
> Test block/001 also passes on my test setup. That setup is as follows:
> * Kernel commit 0cac5b67c168 ("Merge branch 'for-6.5/block' into
>   for-next").
> * Multiple debugging options enabled in the kernel config (DEBUG_LIST,
>   PROVE_LOCKING, KASAN, UBSAN, ...).
>
> I ran this test twice: once with /dev/vdb as test device and once with 
> /dev/ram0 as test device.

Hi Bart,

I did a quick re-run on the latest for-6.5/block tree and can't reproduce
the hang.  Let me see if it is still there with my old branch or if
something else made it go away.


