* [PATCH v3 0/4] blk-mq: optimize the size of struct request
@ 2023-07-07  9:37 chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 1/4] blk-mq: delete unused completion_data in " chengming.zhou
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: chengming.zhou @ 2023-07-07  9:37 UTC (permalink / raw)
  To: axboe, ming.lei, hch, tj; +Cc: linux-block, linux-kernel, zhouchengming

From: Chengming Zhou <zhouchengming@bytedance.com>

v3:
 - Collect Reviewed-by tags from Ming and Christoph. Thanks!
 - Remove the list and csd variables which are only used once.
 - Fix a bug reported by blktests nvme/012 by re-initializing
   rq->queuelist, which may be corrupted by rq->rq_next reuse.
 - [v2] https://lore.kernel.org/all/20230629110359.1111832-1-chengming.zhou@linux.dev/

v2:
 - Change to use call_single_data_t, which uses __aligned() to avoid
   using 2 cache lines for 1 csd. Thanks Ming Lei.
 - [v1] https://lore.kernel.org/all/20230627120854.971475-1-chengming.zhou@linux.dev/

Hello,

Since commit be4c427809b0 ("blk-mq: use the I/O scheduler for
writes from the flush state machine"), rq->flush can no longer share
space with rq->elv, because flush_data requests can now go into the
I/O scheduler.

That increased the size of struct request by 24 bytes; this patchset
decreases it by 40 bytes.

Patch 1 is a trivial cleanup.

Patch 2 uses a percpu csd for remote completion instead of a per-rq
csd, decreasing the size by 24 bytes.

Patches 3-4 reuse rq->queuelist for the flush state machine pending
list and replace the inflight flush_data list with an unsigned long
counter, decreasing the size by another 16 bytes.
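
For reference, the fields this series removes from struct request are
roughly the following (a simplified excerpt from include/linux/blk-mq.h;
field order and sizes assume a typical 64-bit build):

	struct request {
		...
		struct {
			unsigned int		seq;
			struct list_head	list;	/* removed by patch 4 */
			rq_end_io_fn		*saved_end_io;
		} flush;

		union {
			struct __call_single_data csd;	/* removed by patch 2 */
			u64 fifo_time;
		};
		...
	};

Dropping csd shrinks that union from the 32-byte csd down to the 8-byte
fifo_time (24 bytes saved), and dropping flush.list saves the other 16
bytes.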

Thanks for comments!

Chengming Zhou (4):
  blk-mq: delete unused completion_data in struct request
  blk-mq: use percpu csd to remote complete instead of per-rq csd
  blk-flush: count inflight flush_data requests
  blk-flush: reuse rq queuelist in flush state machine

 block/blk-flush.c      | 24 ++++++++++++++----------
 block/blk-mq.c         | 12 ++++++------
 block/blk.h            |  5 ++---
 include/linux/blk-mq.h | 10 ++--------
 4 files changed, 24 insertions(+), 27 deletions(-)

-- 
2.41.0


* [PATCH v3 1/4] blk-mq: delete unused completion_data in struct request
  2023-07-07  9:37 [PATCH v3 0/4] blk-mq: optimize the size of struct request chengming.zhou
@ 2023-07-07  9:37 ` chengming.zhou
  2023-07-07 10:05   ` Ming Lei
  2023-07-07  9:37 ` [PATCH v3 2/4] blk-mq: use percpu csd to remote complete instead of per-rq csd chengming.zhou
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: chengming.zhou @ 2023-07-07  9:37 UTC (permalink / raw)
  To: axboe, ming.lei, hch, tj; +Cc: linux-block, linux-kernel, zhouchengming

From: Chengming Zhou <zhouchengming@bytedance.com>

A global search shows that "completion_data" in struct request
is not used anywhere, so remove it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blk-mq.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..0a1c404e6c7a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -158,13 +158,11 @@ struct request {
 
 	/*
 	 * The rb_node is only used inside the io scheduler, requests
-	 * are pruned when moved to the dispatch queue. So let the
-	 * completion_data share space with the rb_node.
+	 * are pruned when moved to the dispatch queue.
 	 */
 	union {
 		struct rb_node rb_node;	/* sort/lookup */
 		struct bio_vec special_vec;
-		void *completion_data;
 	};
 
 	/*
-- 
2.41.0


* [PATCH v3 2/4] blk-mq: use percpu csd to remote complete instead of per-rq csd
  2023-07-07  9:37 [PATCH v3 0/4] blk-mq: optimize the size of struct request chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 1/4] blk-mq: delete unused completion_data in " chengming.zhou
@ 2023-07-07  9:37 ` chengming.zhou
  2023-07-10  6:42   ` Christoph Hellwig
  2023-07-07  9:37 ` [PATCH v3 3/4] blk-flush: count inflight flush_data requests chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 4/4] blk-flush: reuse rq queuelist in flush state machine chengming.zhou
  3 siblings, 1 reply; 8+ messages in thread
From: chengming.zhou @ 2023-07-07  9:37 UTC (permalink / raw)
  To: axboe, ming.lei, hch, tj; +Cc: linux-block, linux-kernel, zhouchengming

From: Chengming Zhou <zhouchengming@bytedance.com>

If a request needs to be completed remotely, we insert it into a percpu
llist and call smp_call_function_single_async() if the llist was
previously empty.

We don't need a per-rq csd; a percpu csd is enough, and it decreases
the size of struct request by 24 bytes.

This way is cleaner and still correct: the block softirq is guaranteed
to be scheduled to consume the list whenever a new request is added to
this percpu list, whether smp_call_function_single_async() returns
-EBUSY or 0.
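
For reference, the consumer side is unchanged: the csd callback only
raises BLOCK_SOFTIRQ on the target CPU, and the softirq handler drains
the whole percpu llist in one pass, roughly like this (simplified from
block/blk-mq.c):

	static void __blk_mq_complete_request_remote(void *data)
	{
		__raise_softirq_irqoff(BLOCK_SOFTIRQ);
	}

	static void blk_complete_reqs(struct llist_head *list)
	{
		/* grab everything queued so far, restore FIFO order */
		struct llist_node *entry = llist_reverse_order(llist_del_all(list));
		struct request *rq, *next;

		llist_for_each_entry_safe(rq, next, entry, ipi_list)
			rq->q->mq_ops->complete(rq);
	}

So the csd carries no per-request state at all, and one csd per CPU is
sufficient: llist_add() returns true only for the request that makes
the list non-empty, so at most one IPI is requested per drain cycle.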

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
v3:
 - Remove the list and csd variables as they are only used once, as
   suggested by Christoph Hellwig.

v2:
 - Change to use call_single_data_t, which avoids using 2 cache lines
   for 1 csd, as suggested by Ming Lei.
 - Improve the commit log, the explanation is copied from Ming Lei.
---
 block/blk-mq.c         | 12 ++++++------
 include/linux/blk-mq.h |  5 +----
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index decb6ab2d508..7d013588077a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -43,6 +43,7 @@
 #include "blk-ioprio.h"
 
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
+static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd);
 
 static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
 static void blk_mq_request_bypass_insert(struct request *rq,
@@ -1154,15 +1155,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
 
 static void blk_mq_complete_send_ipi(struct request *rq)
 {
-	struct llist_head *list;
 	unsigned int cpu;
 
 	cpu = rq->mq_ctx->cpu;
-	list = &per_cpu(blk_cpu_done, cpu);
-	if (llist_add(&rq->ipi_list, list)) {
-		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
-		smp_call_function_single_async(cpu, &rq->csd);
-	}
+	if (llist_add(&rq->ipi_list, &per_cpu(blk_cpu_done, cpu)))
+		smp_call_function_single_async(cpu, &per_cpu(blk_cpu_csd, cpu));
 }
 
 static void blk_mq_raise_softirq(struct request *rq)
@@ -4796,6 +4793,9 @@ static int __init blk_mq_init(void)
 
 	for_each_possible_cpu(i)
 		init_llist_head(&per_cpu(blk_cpu_done, i));
+	for_each_possible_cpu(i)
+		INIT_CSD(&per_cpu(blk_cpu_csd, i),
+			 __blk_mq_complete_request_remote, NULL);
 	open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
 
 	cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0a1c404e6c7a..34d400171b3e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -180,10 +180,7 @@ struct request {
 		rq_end_io_fn		*saved_end_io;
 	} flush;
 
-	union {
-		struct __call_single_data csd;
-		u64 fifo_time;
-	};
+	u64 fifo_time;
 
 	/*
 	 * completion callback.
-- 
2.41.0


* [PATCH v3 3/4] blk-flush: count inflight flush_data requests
  2023-07-07  9:37 [PATCH v3 0/4] blk-mq: optimize the size of struct request chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 1/4] blk-mq: delete unused completion_data in " chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 2/4] blk-mq: use percpu csd to remote complete instead of per-rq csd chengming.zhou
@ 2023-07-07  9:37 ` chengming.zhou
  2023-07-07  9:37 ` [PATCH v3 4/4] blk-flush: reuse rq queuelist in flush state machine chengming.zhou
  3 siblings, 0 replies; 8+ messages in thread
From: chengming.zhou @ 2023-07-07  9:37 UTC (permalink / raw)
  To: axboe, ming.lei, hch, tj; +Cc: linux-block, linux-kernel, zhouchengming

From: Chengming Zhou <zhouchengming@bytedance.com>

The flush state machine uses a doubly linked list to track all inflight
flush_data requests, so that it can avoid issuing separate post-flushes
for flush_data requests that shared a PREFLUSH.

Because of that list we can't reuse rq->queuelist, which is why we need
rq->flush.list.

In preparation for the next patch, which reuses rq->queuelist for the
flush state machine, change the doubly linked list into an unsigned
long counter of inflight flush_data requests.

This is fine because the state machine only needs to know whether any
flush_data request is still inflight, and a counter answers that.
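
In sketch form, the counter protocol after this patch is the following,
with all updates done under fq->mq_flush_lock (this just summarizes the
hunks below):

	/* a flush_data request becomes inflight */
	fq->flush_data_in_flight++;

	/* mq_flush_data_end_io(): the data part has completed */
	spin_lock_irqsave(&fq->mq_flush_lock, flags);
	fq->flush_data_in_flight--;
	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);

	/* blk_kick_flush(): hold back the flush while data is inflight */
	if (fq->flush_data_in_flight &&
	    time_before(jiffies,
			fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
		return;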

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-flush.c | 9 +++++----
 block/blk.h       | 5 ++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index dba392cf22be..bb7adfc2a5da 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -187,7 +187,8 @@ static void blk_flush_complete_seq(struct request *rq,
 		break;
 
 	case REQ_FSEQ_DATA:
-		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		list_del_init(&rq->flush.list);
+		fq->flush_data_in_flight++;
 		spin_lock(&q->requeue_lock);
 		list_add_tail(&rq->queuelist, &q->flush_list);
 		spin_unlock(&q->requeue_lock);
@@ -299,7 +300,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		return;
 
 	/* C2 and C3 */
-	if (!list_empty(&fq->flush_data_in_flight) &&
+	if (fq->flush_data_in_flight &&
 	    time_before(jiffies,
 			fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
 		return;
@@ -374,6 +375,7 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 	 * the comment in flush_end_io().
 	 */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
+	fq->flush_data_in_flight--;
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
@@ -445,7 +447,7 @@ bool blk_insert_flush(struct request *rq)
 		blk_rq_init_flush(rq);
 		rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
 		spin_lock_irq(&fq->mq_flush_lock);
-		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+		fq->flush_data_in_flight++;
 		spin_unlock_irq(&fq->mq_flush_lock);
 		return false;
 	default:
@@ -496,7 +498,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 
 	INIT_LIST_HEAD(&fq->flush_queue[0]);
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
-	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
 	return fq;
 
diff --git a/block/blk.h b/block/blk.h
index 608c5dcc516b..686712e13835 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -15,15 +15,14 @@ struct elevator_type;
 extern struct dentry *blk_debugfs_root;
 
 struct blk_flush_queue {
+	spinlock_t		mq_flush_lock;
 	unsigned int		flush_pending_idx:1;
 	unsigned int		flush_running_idx:1;
 	blk_status_t 		rq_status;
 	unsigned long		flush_pending_since;
 	struct list_head	flush_queue[2];
-	struct list_head	flush_data_in_flight;
+	unsigned long		flush_data_in_flight;
 	struct request		*flush_rq;
-
-	spinlock_t		mq_flush_lock;
 };
 
 bool is_flush_rq(struct request *req);
-- 
2.41.0


* [PATCH v3 4/4] blk-flush: reuse rq queuelist in flush state machine
  2023-07-07  9:37 [PATCH v3 0/4] blk-mq: optimize the size of struct request chengming.zhou
                   ` (2 preceding siblings ...)
  2023-07-07  9:37 ` [PATCH v3 3/4] blk-flush: count inflight flush_data requests chengming.zhou
@ 2023-07-07  9:37 ` chengming.zhou
  2023-07-07 10:06   ` Ming Lei
  3 siblings, 1 reply; 8+ messages in thread
From: chengming.zhou @ 2023-07-07  9:37 UTC (permalink / raw)
  To: axboe, ming.lei, hch, tj; +Cc: linux-block, linux-kernel, zhouchengming

From: Chengming Zhou <zhouchengming@bytedance.com>

Since we no longer maintain the inflight flush_data requests list,
we can reuse rq->queuelist for the flush pending list.

Note that in mq_flush_data_end_io() we need to re-initialize
rq->queuelist before reusing it in the state machine at completion
time: rq->rq_next shares the same storage, so the driver may have
corrupted rq->queuelist.

This decreases the size of struct request by 16 bytes.
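
The corruption is possible because rq->queuelist shares its storage
with the driver-visible rq->rq_next in struct request, approximately:

	union {
		struct list_head queuelist;	/* block layer lists, including
						 * the flush pending list */
		struct request *rq_next;	/* driver-side rq_list chaining */
	};

Once a driver has chained requests through rq_next (e.g. for batched
completion), queuelist's next/prev pointers are stale, so the
list_move_tail() in blk_flush_complete_seq() would dereference garbage;
INIT_LIST_HEAD() makes it a valid empty node again before the state
machine links it.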

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v3:
 - Fix a bug reported by blktests nvme/012: re-initialize rq->queuelist
   before reusing it in the state machine at completion time, since
   rq->rq_next reuse may have corrupted it. Thanks Ming Lei.
---
 block/blk-flush.c      | 17 ++++++++++-------
 include/linux/blk-mq.h |  1 -
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index bb7adfc2a5da..4826d2d61a23 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -183,14 +183,13 @@ static void blk_flush_complete_seq(struct request *rq,
 		/* queue for flush */
 		if (list_empty(pending))
 			fq->flush_pending_since = jiffies;
-		list_move_tail(&rq->flush.list, pending);
+		list_move_tail(&rq->queuelist, pending);
 		break;
 
 	case REQ_FSEQ_DATA:
-		list_del_init(&rq->flush.list);
 		fq->flush_data_in_flight++;
 		spin_lock(&q->requeue_lock);
-		list_add_tail(&rq->queuelist, &q->flush_list);
+		list_move_tail(&rq->queuelist, &q->flush_list);
 		spin_unlock(&q->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
@@ -202,7 +201,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		 * flush data request completion path.  Restore @rq for
 		 * normal completion and end it.
 		 */
-		list_del_init(&rq->flush.list);
+		list_del_init(&rq->queuelist);
 		blk_flush_restore_request(rq);
 		blk_mq_end_request(rq, error);
 		break;
@@ -258,7 +257,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 	fq->flush_running_idx ^= 1;
 
 	/* and push the waiting requests to the next stage */
-	list_for_each_entry_safe(rq, n, running, flush.list) {
+	list_for_each_entry_safe(rq, n, running, queuelist) {
 		unsigned int seq = blk_flush_cur_seq(rq);
 
 		BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
@@ -292,7 +291,7 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 {
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
-		list_first_entry(pending, struct request, flush.list);
+		list_first_entry(pending, struct request, queuelist);
 	struct request *flush_rq = fq->flush_rq;
 
 	/* C1 described at the top of this file */
@@ -376,6 +375,11 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 	 */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
 	fq->flush_data_in_flight--;
+	/*
+	 * May have been corrupted by rq->rq_next reuse, we need to
+	 * re-initialize rq->queuelist before reusing it here.
+	 */
+	INIT_LIST_HEAD(&rq->queuelist);
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
@@ -386,7 +390,6 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
 static void blk_rq_init_flush(struct request *rq)
 {
 	rq->flush.seq = 0;
-	INIT_LIST_HEAD(&rq->flush.list);
 	rq->rq_flags |= RQF_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
 	rq->end_io = mq_flush_data_end_io;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 34d400171b3e..ab790eba5fcf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -176,7 +176,6 @@ struct request {
 
 	struct {
 		unsigned int		seq;
-		struct list_head	list;
 		rq_end_io_fn		*saved_end_io;
 	} flush;
 
-- 
2.41.0


* Re: [PATCH v3 1/4] blk-mq: delete unused completion_data in struct request
  2023-07-07  9:37 ` [PATCH v3 1/4] blk-mq: delete unused completion_data in " chengming.zhou
@ 2023-07-07 10:05   ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2023-07-07 10:05 UTC (permalink / raw)
  To: chengming.zhou; +Cc: axboe, hch, tj, linux-block, linux-kernel, zhouchengming

On Fri, Jul 07, 2023 at 05:37:19PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> A global search shows that "completion_data" in struct request
> is not used anywhere, so remove it.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


* Re: [PATCH v3 4/4] blk-flush: reuse rq queuelist in flush state machine
  2023-07-07  9:37 ` [PATCH v3 4/4] blk-flush: reuse rq queuelist in flush state machine chengming.zhou
@ 2023-07-07 10:06   ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2023-07-07 10:06 UTC (permalink / raw)
  To: chengming.zhou; +Cc: axboe, hch, tj, linux-block, linux-kernel, zhouchengming

On Fri, Jul 07, 2023 at 05:37:22PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Since we no longer maintain the inflight flush_data requests list,
> we can reuse rq->queuelist for the flush pending list.
> 
> Note that in mq_flush_data_end_io() we need to re-initialize
> rq->queuelist before reusing it in the state machine at completion
> time: rq->rq_next shares the same storage, so the driver may have
> corrupted rq->queuelist.
> 
> This decreases the size of struct request by 16 bytes.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> v3:
>  - Fix a bug reported by blktests nvme/012: re-initialize rq->queuelist
>    before reusing it in the state machine at completion time, since
>    rq->rq_next reuse may have corrupted it. Thanks Ming Lei.

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


* Re: [PATCH v3 2/4] blk-mq: use percpu csd to remote complete instead of per-rq csd
  2023-07-07  9:37 ` [PATCH v3 2/4] blk-mq: use percpu csd to remote complete instead of per-rq csd chengming.zhou
@ 2023-07-10  6:42   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2023-07-10  6:42 UTC (permalink / raw)
  To: chengming.zhou
  Cc: axboe, ming.lei, hch, tj, linux-block, linux-kernel, zhouchengming

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
