* [PATCHSET 0/14] Various block layer optimizations
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block

Hi,

Various cleanups and optimizations that I've worked up while pushing
out IOPS-per-core from ~5.5M to ~8.2M.

-- 
Jens Axboe




* [PATCH 01/14] block: inline fast path of driver tag allocation
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

If we don't use an IO scheduler and don't have shared tags, then we don't
need to call into this external function at all. This saves ~2% for such
a setup.
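
As a toy illustration of the pattern (editor's sketch with made-up names,
not the kernel code; the actual change is in the diff below), the common
case is decided in an inline helper and only the slow path pays for a call:

  #include <stdbool.h>

  struct req { int tag; };

  /* out-of-line slow path, lives in a .c file */
  static bool slow_alloc_tag(struct req *rq)
  {
          rq->tag = 0;            /* pretend a tag was allocated */
          return true;
  }

  /* inline fast path, lives in a header */
  static inline bool get_tag(struct req *rq)
  {
          if (rq->tag >= 0)       /* already assigned: no call made */
                  return true;
          return slow_alloc_tag(rq);
  }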

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c |  8 +++-----
 block/blk-mq.h | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1bbe5de66c40..90bc93fe373e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1145,7 +1145,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
 	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
 }
 
-static bool __blk_mq_get_driver_tag(struct request *rq)
+static bool __blk_mq_alloc_driver_tag(struct request *rq)
 {
 	struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
 	unsigned int tag_offset = rq->mq_hctx->tags->nr_reserved_tags;
@@ -1169,11 +1169,9 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
 	return true;
 }
 
-bool blk_mq_get_driver_tag(struct request *rq)
+bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
-
-	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_get_driver_tag(rq))
+	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_alloc_driver_tag(rq))
 		return false;
 
 	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8be447995106..ceed0a001c76 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -264,7 +264,20 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
 }
 
-bool blk_mq_get_driver_tag(struct request *rq);
+bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq);
+
+static inline bool blk_mq_get_driver_tag(struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+
+	if (rq->tag != BLK_MQ_NO_TAG &&
+	    !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		hctx->tags->rqs[rq->tag] = rq;
+		return true;
+	}
+
+	return __blk_mq_get_driver_tag(hctx, rq);
+}
 
 static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 {
-- 
2.33.1



* [PATCH 02/14] block: don't bother iter advancing a fully done bio
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

If we're completing nbytes and nbytes is the size of the bio, don't bother
calling into the iterator increment helpers. Just clear the bio size and
we're done.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bio.c         | 15 ++-------------
 include/linux/bio.h | 24 ++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a3c9ff23a036..2427e6fca942 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1278,18 +1278,7 @@ int submit_bio_wait(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio_wait);
 
-/**
- * bio_advance - increment/complete a bio by some number of bytes
- * @bio:	bio to advance
- * @bytes:	number of bytes to complete
- *
- * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
- * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
- * be updated on the last bvec as well.
- *
- * @bio will then represent the remaining, uncompleted portion of the io.
- */
-void bio_advance(struct bio *bio, unsigned bytes)
+void __bio_advance(struct bio *bio, unsigned bytes)
 {
 	if (bio_integrity(bio))
 		bio_integrity_advance(bio, bytes);
@@ -1297,7 +1286,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
 	bio_crypt_advance(bio, bytes);
 	bio_advance_iter(bio, &bio->bi_iter, bytes);
 }
-EXPORT_SYMBOL(bio_advance);
+EXPORT_SYMBOL(__bio_advance);
 
 void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
 			struct bio *src, struct bvec_iter *src_iter)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 62d684b7dd4c..9538f20ffaa5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -119,6 +119,28 @@ static inline void bio_advance_iter_single(const struct bio *bio,
 		bvec_iter_advance_single(bio->bi_io_vec, iter, bytes);
 }
 
+void __bio_advance(struct bio *, unsigned bytes);
+
+/**
+ * bio_advance - increment/complete a bio by some number of bytes
+ * @bio:	bio to advance
+ * @bytes:	number of bytes to complete
+ *
+ * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
+ * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
+ * be updated on the last bvec as well.
+ *
+ * @bio will then represent the remaining, uncompleted portion of the io.
+ */
+static inline void bio_advance(struct bio *bio, unsigned int nbytes)
+{
+	if (nbytes == bio->bi_iter.bi_size) {
+		bio->bi_iter.bi_size = 0;
+		return;
+	}
+	__bio_advance(bio, nbytes);
+}
+
 #define __bio_for_each_segment(bvl, bio, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
@@ -381,8 +403,6 @@ static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
 struct request_queue;
 
 extern int submit_bio_wait(struct bio *bio);
-extern void bio_advance(struct bio *, unsigned);
-
 extern void bio_init(struct bio *bio, struct bio_vec *table,
 		     unsigned short max_vecs);
 extern void bio_uninit(struct bio *);
-- 
2.33.1



* [PATCH 03/14] block: remove useless caller argument to print_req_error()
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

We have exactly one caller of this; just get rid of adding the useless
function name to the output.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d5b0258dd218..2596327f07d6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -217,8 +217,7 @@ int blk_status_to_errno(blk_status_t status)
 }
 EXPORT_SYMBOL_GPL(blk_status_to_errno);
 
-static void print_req_error(struct request *req, blk_status_t status,
-		const char *caller)
+static void print_req_error(struct request *req, blk_status_t status)
 {
 	int idx = (__force int)status;
 
@@ -226,9 +225,9 @@ static void print_req_error(struct request *req, blk_status_t status,
 		return;
 
 	printk_ratelimited(KERN_ERR
-		"%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
+		"%s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
 		"phys_seg %u prio class %u\n",
-		caller, blk_errors[idx].name,
+		blk_errors[idx].name,
 		req->rq_disk ? req->rq_disk->disk_name : "?",
 		blk_rq_pos(req), req_op(req), blk_op_str(req_op(req)),
 		req->cmd_flags & ~REQ_OP_MASK,
@@ -1464,7 +1463,7 @@ bool blk_update_request(struct request *req, blk_status_t error,
 
 	if (unlikely(error && !blk_rq_is_passthrough(req) &&
 		     !(req->rq_flags & RQF_QUIET)))
-		print_req_error(req, error, __func__);
+		print_req_error(req, error);
 
 	blk_account_io_completion(req, nr_bytes);
 
-- 
2.33.1



* [PATCH 04/14] block: move update request helpers into blk-mq.c
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

For some reason we still have them in blk-core, with the rest of the
request completion handling being in blk-mq. That causes an out-of-line
call for each completion.

Move them into blk-mq.c instead, where they belong.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c | 146 +----------------------------------------------
 block/blk-mq.c   | 144 ++++++++++++++++++++++++++++++++++++++++++++++
 block/blk.h      |   1 +
 3 files changed, 146 insertions(+), 145 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2596327f07d6..bdc03b80a8d0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -217,7 +217,7 @@ int blk_status_to_errno(blk_status_t status)
 }
 EXPORT_SYMBOL_GPL(blk_status_to_errno);
 
-static void print_req_error(struct request *req, blk_status_t status)
+void blk_print_req_error(struct request *req, blk_status_t status)
 {
 	int idx = (__force int)status;
 
@@ -235,33 +235,6 @@ static void print_req_error(struct request *req, blk_status_t status)
 		IOPRIO_PRIO_CLASS(req->ioprio));
 }
 
-static void req_bio_endio(struct request *rq, struct bio *bio,
-			  unsigned int nbytes, blk_status_t error)
-{
-	if (error)
-		bio->bi_status = error;
-
-	if (unlikely(rq->rq_flags & RQF_QUIET))
-		bio_set_flag(bio, BIO_QUIET);
-
-	bio_advance(bio, nbytes);
-
-	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
-		/*
-		 * Partial zone append completions cannot be supported as the
-		 * BIO fragments may end up not being written sequentially.
-		 */
-		if (bio->bi_iter.bi_size)
-			bio->bi_status = BLK_STS_IOERR;
-		else
-			bio->bi_iter.bi_sector = rq->__sector;
-	}
-
-	/* don't actually finish bio if it's part of flush sequence */
-	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
-		bio_endio(bio);
-}
-
 void blk_dump_rq_flags(struct request *rq, char *msg)
 {
 	printk(KERN_INFO "%s: dev %s: flags=%llx\n", msg,
@@ -1304,17 +1277,6 @@ static void update_io_ticks(struct block_device *part, unsigned long now,
 	}
 }
 
-static void blk_account_io_completion(struct request *req, unsigned int bytes)
-{
-	if (req->part && blk_do_io_stat(req)) {
-		const int sgrp = op_stat_group(req_op(req));
-
-		part_stat_lock();
-		part_stat_add(req->part, sectors[sgrp], bytes >> 9);
-		part_stat_unlock();
-	}
-}
-
 void __blk_account_io_done(struct request *req, u64 now)
 {
 	const int sgrp = op_stat_group(req_op(req));
@@ -1423,112 +1385,6 @@ void blk_steal_bios(struct bio_list *list, struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_steal_bios);
 
-/**
- * blk_update_request - Complete multiple bytes without completing the request
- * @req:      the request being processed
- * @error:    block status code
- * @nr_bytes: number of bytes to complete for @req
- *
- * Description:
- *     Ends I/O on a number of bytes attached to @req, but doesn't complete
- *     the request structure even if @req doesn't have leftover.
- *     If @req has leftover, sets it up for the next range of segments.
- *
- *     Passing the result of blk_rq_bytes() as @nr_bytes guarantees
- *     %false return from this function.
- *
- * Note:
- *	The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in this function
- *      except in the consistency check at the end of this function.
- *
- * Return:
- *     %false - this request doesn't have any more data
- *     %true  - this request has more data
- **/
-bool blk_update_request(struct request *req, blk_status_t error,
-		unsigned int nr_bytes)
-{
-	int total_bytes;
-
-	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
-
-	if (!req->bio)
-		return false;
-
-#ifdef CONFIG_BLK_DEV_INTEGRITY
-	if (blk_integrity_rq(req) && req_op(req) == REQ_OP_READ &&
-	    error == BLK_STS_OK)
-		req->q->integrity.profile->complete_fn(req, nr_bytes);
-#endif
-
-	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-		     !(req->rq_flags & RQF_QUIET)))
-		print_req_error(req, error);
-
-	blk_account_io_completion(req, nr_bytes);
-
-	total_bytes = 0;
-	while (req->bio) {
-		struct bio *bio = req->bio;
-		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
-
-		if (bio_bytes == bio->bi_iter.bi_size)
-			req->bio = bio->bi_next;
-
-		/* Completion has already been traced */
-		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
-		req_bio_endio(req, bio, bio_bytes, error);
-
-		total_bytes += bio_bytes;
-		nr_bytes -= bio_bytes;
-
-		if (!nr_bytes)
-			break;
-	}
-
-	/*
-	 * completely done
-	 */
-	if (!req->bio) {
-		/*
-		 * Reset counters so that the request stacking driver
-		 * can find how many bytes remain in the request
-		 * later.
-		 */
-		req->__data_len = 0;
-		return false;
-	}
-
-	req->__data_len -= total_bytes;
-
-	/* update sector only for requests with clear definition of sector */
-	if (!blk_rq_is_passthrough(req))
-		req->__sector += total_bytes >> 9;
-
-	/* mixed attributes always follow the first bio */
-	if (req->rq_flags & RQF_MIXED_MERGE) {
-		req->cmd_flags &= ~REQ_FAILFAST_MASK;
-		req->cmd_flags |= req->bio->bi_opf & REQ_FAILFAST_MASK;
-	}
-
-	if (!(req->rq_flags & RQF_SPECIAL_PAYLOAD)) {
-		/*
-		 * If total number of sectors is less than the first segment
-		 * size, something has gone terribly wrong.
-		 */
-		if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
-			blk_dump_rq_flags(req, "request botched");
-			req->__data_len = blk_rq_cur_bytes(req);
-		}
-
-		/* recalculate the number of segments */
-		req->nr_phys_segments = blk_recalc_rq_segments(req);
-	}
-
-	return true;
-}
-EXPORT_SYMBOL_GPL(blk_update_request);
-
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 /**
  * rq_flush_dcache_pages - Helper function to flush all pages in a request
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90bc93fe373e..ffccc5f0f66a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -613,6 +613,150 @@ void blk_mq_free_plug_rqs(struct blk_plug *plug)
 	}
 }
 
+static void req_bio_endio(struct request *rq, struct bio *bio,
+			  unsigned int nbytes, blk_status_t error)
+{
+	if (error)
+		bio->bi_status = error;
+
+	if (unlikely(rq->rq_flags & RQF_QUIET))
+		bio_set_flag(bio, BIO_QUIET);
+
+	bio_advance(bio, nbytes);
+
+	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
+		/*
+		 * Partial zone append completions cannot be supported as the
+		 * BIO fragments may end up not being written sequentially.
+		 */
+		if (bio->bi_iter.bi_size)
+			bio->bi_status = BLK_STS_IOERR;
+		else
+			bio->bi_iter.bi_sector = rq->__sector;
+	}
+
+	/* don't actually finish bio if it's part of flush sequence */
+	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
+		bio_endio(bio);
+}
+
+static void blk_account_io_completion(struct request *req, unsigned int bytes)
+{
+	if (req->part && blk_do_io_stat(req)) {
+		const int sgrp = op_stat_group(req_op(req));
+
+		part_stat_lock();
+		part_stat_add(req->part, sectors[sgrp], bytes >> 9);
+		part_stat_unlock();
+	}
+}
+
+/**
+ * blk_update_request - Complete multiple bytes without completing the request
+ * @req:      the request being processed
+ * @error:    block status code
+ * @nr_bytes: number of bytes to complete for @req
+ *
+ * Description:
+ *     Ends I/O on a number of bytes attached to @req, but doesn't complete
+ *     the request structure even if @req doesn't have leftover.
+ *     If @req has leftover, sets it up for the next range of segments.
+ *
+ *     Passing the result of blk_rq_bytes() as @nr_bytes guarantees
+ *     %false return from this function.
+ *
+ * Note:
+ *	The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in this function
+ *      except in the consistency check at the end of this function.
+ *
+ * Return:
+ *     %false - this request doesn't have any more data
+ *     %true  - this request has more data
+ **/
+bool blk_update_request(struct request *req, blk_status_t error,
+		unsigned int nr_bytes)
+{
+	int total_bytes;
+
+	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
+
+	if (!req->bio)
+		return false;
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	if (blk_integrity_rq(req) && req_op(req) == REQ_OP_READ &&
+	    error == BLK_STS_OK)
+		req->q->integrity.profile->complete_fn(req, nr_bytes);
+#endif
+
+	if (unlikely(error && !blk_rq_is_passthrough(req) &&
+		     !(req->rq_flags & RQF_QUIET)))
+		blk_print_req_error(req, error);
+
+	blk_account_io_completion(req, nr_bytes);
+
+	total_bytes = 0;
+	while (req->bio) {
+		struct bio *bio = req->bio;
+		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
+
+		if (bio_bytes == bio->bi_iter.bi_size)
+			req->bio = bio->bi_next;
+
+		/* Completion has already been traced */
+		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
+		req_bio_endio(req, bio, bio_bytes, error);
+
+		total_bytes += bio_bytes;
+		nr_bytes -= bio_bytes;
+
+		if (!nr_bytes)
+			break;
+	}
+
+	/*
+	 * completely done
+	 */
+	if (!req->bio) {
+		/*
+		 * Reset counters so that the request stacking driver
+		 * can find how many bytes remain in the request
+		 * later.
+		 */
+		req->__data_len = 0;
+		return false;
+	}
+
+	req->__data_len -= total_bytes;
+
+	/* update sector only for requests with clear definition of sector */
+	if (!blk_rq_is_passthrough(req))
+		req->__sector += total_bytes >> 9;
+
+	/* mixed attributes always follow the first bio */
+	if (req->rq_flags & RQF_MIXED_MERGE) {
+		req->cmd_flags &= ~REQ_FAILFAST_MASK;
+		req->cmd_flags |= req->bio->bi_opf & REQ_FAILFAST_MASK;
+	}
+
+	if (!(req->rq_flags & RQF_SPECIAL_PAYLOAD)) {
+		/*
+		 * If total number of sectors is less than the first segment
+		 * size, something has gone terribly wrong.
+		 */
+		if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
+			blk_dump_rq_flags(req, "request botched");
+			req->__data_len = blk_rq_cur_bytes(req);
+		}
+
+		/* recalculate the number of segments */
+		req->nr_phys_segments = blk_recalc_rq_segments(req);
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(blk_update_request);
+
 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 {
 	if (blk_mq_need_time_stamp(rq)) {
diff --git a/block/blk.h b/block/blk.h
index f6e61cebd6ae..fdfaa6896fc4 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -213,6 +213,7 @@ static inline void blk_integrity_del(struct gendisk *disk)
 
 unsigned long blk_rq_timeout(unsigned long timeout);
 void blk_add_timer(struct request *req);
+void blk_print_req_error(struct request *req, blk_status_t status);
 
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **same_queue_rq);
-- 
2.33.1



* [PATCH 05/14] block: don't call blk_status_to_errno() for success status
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe, Christoph Hellwig

We only need to call blk_status_to_errno() to resolve the blk_status_t ->
errno mapping if the status is different from BLK_STS_OK. Check for that
before doing so.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ffccc5f0f66a..fa5b12200404 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -676,9 +676,11 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes)
 bool blk_update_request(struct request *req, blk_status_t error,
 		unsigned int nr_bytes)
 {
-	int total_bytes;
+	int total_bytes, blk_errno = 0;
 
-	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
+	if (unlikely(error != BLK_STS_OK))
+		blk_errno = blk_status_to_errno(error);
+	trace_block_rq_complete(req, blk_errno, nr_bytes);
 
 	if (!req->bio)
 		return false;
-- 
2.33.1



* [PATCH 06/14] block: store elevator state in request
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

Add an rq private RQF_ELV flag, which tells the block layer that this
request was initialized on a queue that has an IO scheduler attached.
This allows for faster checking in the fast path, rather than having to
dereference rq->q later on.

Elevator switching does a full quiesce of the queue before detaching an
IO scheduler, so it's safe to cache this in the request itself.
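
A minimal sketch of what the cached flag buys (editor's illustration with
simplified stand-in types, not the kernel structs): the old check costs
two dependent pointer loads, while the new one tests a flags word that is
typically already hot in cache.

  #include <stddef.h>

  struct elevator_queue { int dummy; };
  struct request_queue  { struct elevator_queue *elevator; };
  struct request        { struct request_queue *q; unsigned int rq_flags; };

  #define RQF_ELV (1U << 22)

  /* before: load rq->q, then load q->elevator */
  static inline int has_elevator_old(struct request *rq)
  {
          return rq->q->elevator != NULL;
  }

  /* after: a single test against rq->rq_flags */
  static inline int has_elevator_new(struct request *rq)
  {
          return rq->rq_flags & RQF_ELV;
  }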

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-sched.h   | 27 ++++++++++++++++-----------
 block/blk-mq.c         | 20 +++++++++++---------
 include/linux/blk-mq.h |  2 ++
 3 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index fe252278ed9a..98836106b25f 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -56,29 +56,34 @@ static inline bool
 blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 			 struct bio *bio)
 {
-	struct elevator_queue *e = q->elevator;
-
-	if (e && e->type->ops.allow_merge)
-		return e->type->ops.allow_merge(q, rq, bio);
+	if (rq->rq_flags & RQF_ELV) {
+		struct elevator_queue *e = q->elevator;
 
+		if (e->type->ops.allow_merge)
+			return e->type->ops.allow_merge(q, rq, bio);
+	}
 	return true;
 }
 
 static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 {
-	struct elevator_queue *e = rq->q->elevator;
+	if (rq->rq_flags & RQF_ELV) {
+		struct elevator_queue *e = rq->q->elevator;
 
-	if (e && e->type->ops.completed_request)
-		e->type->ops.completed_request(rq, now);
+		if (e->type->ops.completed_request)
+			e->type->ops.completed_request(rq, now);
+	}
 }
 
 static inline void blk_mq_sched_requeue_request(struct request *rq)
 {
-	struct request_queue *q = rq->q;
-	struct elevator_queue *e = q->elevator;
+	if (rq->rq_flags & RQF_ELV) {
+		struct request_queue *q = rq->q;
+		struct elevator_queue *e = q->elevator;
 
-	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
-		e->type->ops.requeue_request(rq);
+		if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
+			e->type->ops.requeue_request(rq);
+	}
 }
 
 static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fa5b12200404..5d22c228f6df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -299,7 +299,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
  */
 static inline bool blk_mq_need_time_stamp(struct request *rq)
 {
-	return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS)) || rq->q->elevator;
+	return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_ELV));
 }
 
 static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
@@ -309,9 +309,11 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	struct request *rq = tags->static_rqs[tag];
 
 	if (data->q->elevator) {
+		rq->rq_flags = RQF_ELV;
 		rq->tag = BLK_MQ_NO_TAG;
 		rq->internal_tag = tag;
 	} else {
+		rq->rq_flags = 0;
 		rq->tag = tag;
 		rq->internal_tag = BLK_MQ_NO_TAG;
 	}
@@ -320,7 +322,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->q = data->q;
 	rq->mq_ctx = data->ctx;
 	rq->mq_hctx = data->hctx;
-	rq->rq_flags = 0;
 	rq->cmd_flags = data->cmd_flags;
 	if (data->flags & BLK_MQ_REQ_PM)
 		rq->rq_flags |= RQF_PM;
@@ -356,11 +357,11 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
 
-	if (!op_is_flush(data->cmd_flags)) {
+	if (!op_is_flush(data->cmd_flags) && (rq->rq_flags & RQF_ELV)) {
 		struct elevator_queue *e = data->q->elevator;
 
 		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
+		if (e->type->ops.prepare_request) {
 			if (e->type->icq_cache)
 				blk_mq_sched_assign_ioc(rq);
 
@@ -575,12 +576,13 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct elevator_queue *e = q->elevator;
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
-	if (rq->rq_flags & RQF_ELVPRIV) {
-		if (e && e->type->ops.finish_request)
+	if (rq->rq_flags & (RQF_ELVPRIV | RQF_ELV)) {
+		struct elevator_queue *e = q->elevator;
+
+		if (e->type->ops.finish_request)
 			e->type->ops.finish_request(rq);
 		if (rq->elv.icq) {
 			put_io_context(rq->elv.icq->ioc);
@@ -2239,7 +2241,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (q->elevator && !bypass_insert)
+	if ((rq->rq_flags & RQF_ELV) && !bypass_insert)
 		goto insert;
 
 	budget_token = blk_mq_get_dispatch_budget(q);
@@ -2475,7 +2477,7 @@ void blk_mq_submit_bio(struct bio *bio)
 		}
 
 		blk_add_rq_to_plug(plug, rq);
-	} else if (q->elevator) {
+	} else if (rq->rq_flags & RQF_ELV) {
 		/* Insert the request at the IO scheduler queue */
 		blk_mq_sched_insert_request(rq, false, true, true);
 	} else if (plug && !blk_queue_nomerges(q)) {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a9c1d0882550..3a399aa372b5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -55,6 +55,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 20))
 /* ->timeout has been called, don't expire again */
 #define RQF_TIMED_OUT		((__force req_flags_t)(1 << 21))
+/* queue has elevator attached */
+#define RQF_ELV			((__force req_flags_t)(1 << 22))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-- 
2.33.1



* [PATCH 07/14] block: change plugging to use a singly linked list
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

Use a singly linked list for the blk_plug. This saves 8 bytes in the
blk_plug struct, and makes for faster list manipulations than doubly
linked lists. As we don't rely on the doubly linked semantics for
anything, singly linked is just fine.

This yields a bump in default (merging enabled) performance from 7.0M
to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
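
The conversion leans on a handful of rq_list helpers. As a sketch of
their assumed semantics (the real definitions live in
include/linux/blkdev.h; this is inferred from their use in the diff
below), the plug list is a singly linked LIFO chained through
rq->rq_next:

  struct request { struct request *rq_next; /* ... */ };

  #define rq_list_add(listptr, rq) do {           \
          (rq)->rq_next = *(listptr);             \
          *(listptr) = (rq);                      \
  } while (0)

  #define rq_list_pop(listptr) ({                 \
          struct request *__rq = *(listptr);      \
          if (__rq)                               \
                  *(listptr) = __rq->rq_next;     \
          __rq;                                   \
  })

  #define rq_list_peek(listptr)   (*(listptr))
  #define rq_list_empty(list)     ((list) == NULL)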

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |   5 +-
 block/blk-merge.c      |   8 +--
 block/blk-mq.c         | 156 +++++++++++++++++++++++++++--------------
 block/blk.h            |   2 +-
 include/linux/blkdev.h |   6 +-
 5 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bdc03b80a8d0..fced71948162 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1542,11 +1542,12 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
 	if (tsk->plug)
 		return;
 
-	INIT_LIST_HEAD(&plug->mq_list);
+	plug->mq_list = NULL;
 	plug->cached_rq = NULL;
 	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
 	plug->rq_count = 0;
 	plug->multiple_queues = false;
+	plug->has_elevator = false;
 	plug->nowait = false;
 	INIT_LIST_HEAD(&plug->cb_list);
 
@@ -1632,7 +1633,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
 	flush_plug_callbacks(plug, from_schedule);
 
-	if (!list_empty(&plug->mq_list))
+	if (!rq_list_empty(plug->mq_list))
 		blk_mq_flush_plug_list(plug, from_schedule);
 	if (unlikely(!from_schedule && plug->cached_rq))
 		blk_mq_free_plug_rqs(plug);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ec727234ac48..b3f3e689a5ac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -1085,23 +1085,23 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
  * Caller must ensure !blk_queue_nomerges(q) beforehand.
  */
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-		unsigned int nr_segs, struct request **same_queue_rq)
+		unsigned int nr_segs, bool *same_queue_rq)
 {
 	struct blk_plug *plug;
 	struct request *rq;
 
 	plug = blk_mq_plug(q, bio);
-	if (!plug || list_empty(&plug->mq_list))
+	if (!plug || rq_list_empty(plug->mq_list))
 		return false;
 
 	/* check the previously added entry for a quick merge attempt */
-	rq = list_last_entry(&plug->mq_list, struct request, queuelist);
+	rq = rq_list_peek(&plug->mq_list);
 	if (rq->q == q && same_queue_rq) {
 		/*
 		 * Only blk-mq multiple hardware queues case checks the rq in
 		 * the same queue, there should be only one such rq in a queue
 		 */
-		*same_queue_rq = rq;
+		*same_queue_rq = true;
 	}
 	if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) == BIO_MERGE_OK)
 		return true;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5d22c228f6df..cd1249284c1f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -19,7 +19,6 @@
 #include <linux/smp.h>
 #include <linux/interrupt.h>
 #include <linux/llist.h>
-#include <linux/list_sort.h>
 #include <linux/cpu.h>
 #include <linux/cache.h>
 #include <linux/sched/sysctl.h>
@@ -2118,54 +2117,100 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 	spin_unlock(&ctx->lock);
 }
 
-static int plug_rq_cmp(void *priv, const struct list_head *a,
-		       const struct list_head *b)
+static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 {
-	struct request *rqa = container_of(a, struct request, queuelist);
-	struct request *rqb = container_of(b, struct request, queuelist);
+	struct blk_mq_hw_ctx *hctx = NULL;
+	int queued = 0;
+	int errors = 0;
+
+	while (!rq_list_empty(plug->mq_list)) {
+		struct request *rq;
+		blk_status_t ret;
+
+		rq = rq_list_pop(&plug->mq_list);
 
-	if (rqa->mq_ctx != rqb->mq_ctx)
-		return rqa->mq_ctx > rqb->mq_ctx;
-	if (rqa->mq_hctx != rqb->mq_hctx)
-		return rqa->mq_hctx > rqb->mq_hctx;
+		if (!hctx)
+			hctx = rq->mq_hctx;
+		else if (hctx != rq->mq_hctx && hctx->queue->mq_ops->commit_rqs) {
+			trace_block_unplug(hctx->queue, queued, !from_schedule);
+			hctx->queue->mq_ops->commit_rqs(hctx);
+			queued = 0;
+			hctx = rq->mq_hctx;
+		}
+
+		ret = blk_mq_request_issue_directly(rq,
+						rq_list_empty(plug->mq_list));
+		if (ret != BLK_STS_OK) {
+			if (ret == BLK_STS_RESOURCE ||
+					ret == BLK_STS_DEV_RESOURCE) {
+				blk_mq_request_bypass_insert(rq, false,
+						rq_list_empty(plug->mq_list));
+				break;
+			}
+			blk_mq_end_request(rq, ret);
+			errors++;
+		} else
+			queued++;
+	}
 
-	return blk_rq_pos(rqa) > blk_rq_pos(rqb);
+	/*
+	 * If we didn't flush the entire list, we could have told
+	 * the driver there was more coming, but that turned out to
+	 * be a lie.
+	 */
+	if ((!rq_list_empty(plug->mq_list) || errors) &&
+	     hctx->queue->mq_ops->commit_rqs && queued)
+		hctx->queue->mq_ops->commit_rqs(hctx);
 }
 
 void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
+	struct blk_mq_hw_ctx *this_hctx;
+	struct blk_mq_ctx *this_ctx;
+	unsigned int depth;
 	LIST_HEAD(list);
 
-	if (list_empty(&plug->mq_list))
+	if (rq_list_empty(plug->mq_list))
 		return;
-	list_splice_init(&plug->mq_list, &list);
-
-	if (plug->rq_count > 2 && plug->multiple_queues)
-		list_sort(NULL, &list, plug_rq_cmp);
-
 	plug->rq_count = 0;
 
+	if (!plug->multiple_queues && !plug->has_elevator) {
+		blk_mq_plug_issue_direct(plug, from_schedule);
+		if (rq_list_empty(plug->mq_list))
+			return;
+	}
+
+	this_hctx = NULL;
+	this_ctx = NULL;
+	depth = 0;
 	do {
-		struct list_head rq_list;
-		struct request *rq, *head_rq = list_entry_rq(list.next);
-		struct list_head *pos = &head_rq->queuelist; /* skip first */
-		struct blk_mq_hw_ctx *this_hctx = head_rq->mq_hctx;
-		struct blk_mq_ctx *this_ctx = head_rq->mq_ctx;
-		unsigned int depth = 1;
-
-		list_for_each_continue(pos, &list) {
-			rq = list_entry_rq(pos);
-			BUG_ON(!rq->q);
-			if (rq->mq_hctx != this_hctx || rq->mq_ctx != this_ctx)
-				break;
-			depth++;
+		struct request *rq;
+
+		rq = rq_list_pop(&plug->mq_list);
+
+		if (!this_hctx) {
+			this_hctx = rq->mq_hctx;
+			this_ctx = rq->mq_ctx;
+		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx) {
+			trace_block_unplug(this_hctx->queue, depth,
+						!from_schedule);
+			blk_mq_sched_insert_requests(this_hctx, this_ctx,
+						&list, from_schedule);
+			depth = 0;
+			this_hctx = rq->mq_hctx;
+			this_ctx = rq->mq_ctx;
+
 		}
 
-		list_cut_before(&rq_list, &list, pos);
-		trace_block_unplug(head_rq->q, depth, !from_schedule);
-		blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list,
+		list_add(&rq->queuelist, &list);
+		depth++;
+	} while (!rq_list_empty(plug->mq_list));
+
+	if (!list_empty(&list)) {
+		trace_block_unplug(this_hctx->queue, depth, !from_schedule);
+		blk_mq_sched_insert_requests(this_hctx, this_ctx, &list,
 						from_schedule);
-	} while(!list_empty(&list));
+	}
 }
 
 static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
@@ -2345,16 +2390,17 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 
 static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
 {
-	list_add_tail(&rq->queuelist, &plug->mq_list);
-	plug->rq_count++;
-	if (!plug->multiple_queues && !list_is_singular(&plug->mq_list)) {
-		struct request *tmp;
+	if (!plug->multiple_queues) {
+		struct request *nxt = rq_list_peek(&plug->mq_list);
 
-		tmp = list_first_entry(&plug->mq_list, struct request,
-						queuelist);
-		if (tmp->q != rq->q)
+		if (nxt && nxt->q != rq->q)
 			plug->multiple_queues = true;
 	}
+	if (!plug->has_elevator && (rq->rq_flags & RQF_ELV))
+		plug->has_elevator = true;
+	rq->rq_next = NULL;
+	rq_list_add(&plug->mq_list, rq);
+	plug->rq_count++;
 }
 
 /*
@@ -2389,7 +2435,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	const int is_flush_fua = op_is_flush(bio->bi_opf);
 	struct request *rq;
 	struct blk_plug *plug;
-	struct request *same_queue_rq = NULL;
+	bool same_queue_rq = false;
 	unsigned int nr_segs = 1;
 	blk_status_t ret;
 
@@ -2463,15 +2509,17 @@ void blk_mq_submit_bio(struct bio *bio)
 		 * IO may benefit a lot from plug merging.
 		 */
 		unsigned int request_count = plug->rq_count;
-		struct request *last = NULL;
+		struct request *tmp = NULL;
 
-		if (!request_count)
+		if (!request_count) {
 			trace_block_plug(q);
-		else
-			last = list_entry_rq(plug->mq_list.prev);
+		} else if (!blk_queue_nomerges(q)) {
+			tmp = rq_list_peek(&plug->mq_list);
+			if (blk_rq_bytes(tmp) < BLK_PLUG_FLUSH_SIZE)
+				tmp = NULL;
+		}
 
-		if (request_count >= blk_plug_max_rq_count(plug) || (last &&
-		    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
+		if (request_count >= blk_plug_max_rq_count(plug) || tmp) {
 			blk_flush_plug_list(plug, false);
 			trace_block_plug(q);
 		}
@@ -2481,6 +2529,8 @@ void blk_mq_submit_bio(struct bio *bio)
 		/* Insert the request at the IO scheduler queue */
 		blk_mq_sched_insert_request(rq, false, true, true);
 	} else if (plug && !blk_queue_nomerges(q)) {
+		struct request *first_rq = NULL;
+
 		/*
 		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
@@ -2488,19 +2538,17 @@ void blk_mq_submit_bio(struct bio *bio)
 		 * The plug list might get flushed before this. If that happens,
 		 * the plug list is empty, and same_queue_rq is invalid.
 		 */
-		if (list_empty(&plug->mq_list))
-			same_queue_rq = NULL;
-		if (same_queue_rq) {
-			list_del_init(&same_queue_rq->queuelist);
+		if (!rq_list_empty(plug->mq_list) && same_queue_rq) {
+			first_rq = rq_list_pop(&plug->mq_list);
 			plug->rq_count--;
 		}
 		blk_add_rq_to_plug(plug, rq);
 		trace_block_plug(q);
 
-		if (same_queue_rq) {
+		if (first_rq) {
 			trace_block_unplug(q, 1, true);
-			blk_mq_try_issue_directly(same_queue_rq->mq_hctx,
-						  same_queue_rq);
+			blk_mq_try_issue_directly(first_rq->mq_hctx,
+						  first_rq);
 		}
 	} else if ((q->nr_hw_queues > 1 && is_sync) ||
 		   !rq->mq_hctx->dispatch_busy) {
diff --git a/block/blk.h b/block/blk.h
index fdfaa6896fc4..c15d1ab224b8 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -216,7 +216,7 @@ void blk_add_timer(struct request *req);
 void blk_print_req_error(struct request *req, blk_status_t status);
 
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-		unsigned int nr_segs, struct request **same_queue_rq);
+		unsigned int nr_segs, bool *same_queue_rq);
 bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 			struct bio *bio, unsigned int nr_segs);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 43dda725dcae..c6a6402cb1a1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -727,7 +727,7 @@ extern void blk_set_queue_dying(struct request_queue *);
  * schedule() where blk_schedule_flush_plug() is called.
  */
 struct blk_plug {
-	struct list_head mq_list; /* blk-mq requests */
+	struct request *mq_list; /* blk-mq requests */
 
 	/* if ios_left is > 1, we can batch tag/rq allocations */
 	struct request *cached_rq;
@@ -736,6 +736,7 @@ struct blk_plug {
 	unsigned short rq_count;
 
 	bool multiple_queues;
+	bool has_elevator;
 	bool nowait;
 
 	struct list_head cb_list; /* md requires an unplug callback */
@@ -776,8 +777,7 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
 	struct blk_plug *plug = tsk->plug;
 
 	return plug &&
-		 (!list_empty(&plug->mq_list) ||
-		 !list_empty(&plug->cb_list));
+		 (plug->mq_list || !list_empty(&plug->cb_list));
 }
 
 int blkdev_issue_flush(struct block_device *bdev);
-- 
2.33.1



* [PATCH 08/14] block: improve layout of struct request
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

It's been a while since this layout was analyzed; move some members
around to better match the flow of the use case: initial state up top,
and queued state after that. This improves my peak case by about 1.5%,
from 7750K to 7900K IOPS.
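
A toy illustration of the idea (editor's sketch, not the kernel struct):
group members by when they are touched, and the grouping can be audited
at compile time, or with a tool such as pahole on the built object.

  #include <stddef.h>

  struct toy_rq {
          /* init/dispatch state: touched for every request */
          int tag;
          unsigned int data_len;
          unsigned long long sector;
          /* scheduler/completion state: touched later, if at all */
          void *elv_priv[3];
  };

  /* keep the hot members within the first 64-byte cacheline */
  _Static_assert(offsetof(struct toy_rq, elv_priv) < 64,
                 "hot request state must fit in one cacheline");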

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/blk-mq.h | 90 +++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 44 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3a399aa372b5..95c3bd3a008e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -85,6 +85,8 @@ struct request {
 	int tag;
 	int internal_tag;
 
+	unsigned int timeout;
+
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
 	sector_t __sector;		/* sector cursor */
@@ -97,49 +99,6 @@ struct request {
 		struct request *rq_next;
 	};
 
-	/*
-	 * The hash is used inside the scheduler, and killed once the
-	 * request reaches the dispatch list. The ipi_list is only used
-	 * to queue the request for softirq completion, which is long
-	 * after the request has been unhashed (and even removed from
-	 * the dispatch list).
-	 */
-	union {
-		struct hlist_node hash;	/* merge hash */
-		struct llist_node ipi_list;
-	};
-
-	/*
-	 * The rb_node is only used inside the io scheduler, requests
-	 * are pruned when moved to the dispatch queue. So let the
-	 * completion_data share space with the rb_node.
-	 */
-	union {
-		struct rb_node rb_node;	/* sort/lookup */
-		struct bio_vec special_vec;
-		void *completion_data;
-		int error_count; /* for legacy drivers, don't use */
-	};
-
-	/*
-	 * Three pointers are available for the IO schedulers, if they need
-	 * more they have to dynamically allocate it.  Flush requests are
-	 * never put on the IO scheduler. So let the flush fields share
-	 * space with the elevator data.
-	 */
-	union {
-		struct {
-			struct io_cq		*icq;
-			void			*priv[2];
-		} elv;
-
-		struct {
-			unsigned int		seq;
-			struct list_head	list;
-			rq_end_io_fn		*saved_end_io;
-		} flush;
-	};
-
 	struct gendisk *rq_disk;
 	struct block_device *part;
 #ifdef CONFIG_BLK_RQ_ALLOC_TIME
@@ -182,9 +141,52 @@ struct request {
 	enum mq_rq_state state;
 	refcount_t ref;
 
-	unsigned int timeout;
 	unsigned long deadline;
 
+	/*
+	 * The hash is used inside the scheduler, and killed once the
+	 * request reaches the dispatch list. The ipi_list is only used
+	 * to queue the request for softirq completion, which is long
+	 * after the request has been unhashed (and even removed from
+	 * the dispatch list).
+	 */
+	union {
+		struct hlist_node hash;	/* merge hash */
+		struct llist_node ipi_list;
+	};
+
+	/*
+	 * The rb_node is only used inside the io scheduler, requests
+	 * are pruned when moved to the dispatch queue. So let the
+	 * completion_data share space with the rb_node.
+	 */
+	union {
+		struct rb_node rb_node;	/* sort/lookup */
+		struct bio_vec special_vec;
+		void *completion_data;
+		int error_count; /* for legacy drivers, don't use */
+	};
+
+
+	/*
+	 * Three pointers are available for the IO schedulers, if they need
+	 * more they have to dynamically allocate it.  Flush requests are
+	 * never put on the IO scheduler. So let the flush fields share
+	 * space with the elevator data.
+	 */
+	union {
+		struct {
+			struct io_cq		*icq;
+			void			*priv[2];
+		} elv;
+
+		struct {
+			unsigned int		seq;
+			struct list_head	list;
+			rq_end_io_fn		*saved_end_io;
+		} flush;
+	};
+
 	union {
 		struct __call_single_data csd;
 		u64 fifo_time;
-- 
2.33.1



* [PATCH 09/14] block: only mark bio as tracked if it really is tracked
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even
though we may not have an rq_qos handler at all. Only mark the bio as
tracked if it really is potentially tracked.

This saves considerable time for the case where the bio isn't tracked:

     2.64%     -1.65%  [kernel.vmlinux]  [k] bio_endio

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-rq-qos.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index f000f83e0621..3cfbc8668cba 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -189,9 +189,10 @@ static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
 	 * BIO_TRACKED lets controllers know that a bio went through the
 	 * normal rq_qos path.
 	 */
-	bio_set_flag(bio, BIO_TRACKED);
-	if (q->rq_qos)
+	if (q->rq_qos) {
+		bio_set_flag(bio, BIO_TRACKED);
 		__rq_qos_throttle(q->rq_qos, bio);
+	}
 }
 
 static inline void rq_qos_track(struct request_queue *q, struct request *rq,
-- 
2.33.1



* [PATCH 10/14] block: move blk_mq_tag_to_rq() inline
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

This is in the fast path of driver issue or completion, and it's a single
array index operation. Move it inline to avoid a function call for it.

This does mean making struct blk_mq_tags public outside of the block
layer, but there's not really much in there.
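
For context, this is the kind of driver-side lookup that now inlines down
to a bounds check plus an array index (hypothetical driver sketch; the
mydrv_* naming is invented):

  /* map a hardware completion tag back to its request and finish it */
  static void mydrv_complete_one(struct blk_mq_hw_ctx *hctx, u32 hw_tag)
  {
          struct request *rq = blk_mq_tag_to_rq(hctx->tags, hw_tag);

          if (rq)
                  blk_mq_end_request(rq, BLK_STS_OK);
  }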

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-tag.h     | 23 -----------------------
 block/blk-mq.c         | 11 -----------
 include/linux/blk-mq.h | 35 ++++++++++++++++++++++++++++++++++-
 3 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 71c2f7d8e9b7..e617c7220626 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -4,29 +4,6 @@
 
 struct blk_mq_alloc_data;
 
-/*
- * Tag address space map.
- */
-struct blk_mq_tags {
-	unsigned int nr_tags;
-	unsigned int nr_reserved_tags;
-
-	atomic_t active_queues;
-
-	struct sbitmap_queue bitmap_tags;
-	struct sbitmap_queue breserved_tags;
-
-	struct request **rqs;
-	struct request **static_rqs;
-	struct list_head page_list;
-
-	/*
-	 * used to clear request reference in rqs[] before freeing one
-	 * request pool
-	 */
-	spinlock_t lock;
-};
-
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 					unsigned int reserved_tags,
 					int node, int alloc_policy);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cd1249284c1f..064fdeeb1be5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1069,17 +1069,6 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
-struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
-{
-	if (tag < tags->nr_tags) {
-		prefetch(tags->rqs[tag]);
-		return tags->rqs[tag];
-	}
-
-	return NULL;
-}
-EXPORT_SYMBOL(blk_mq_tag_to_rq);
-
 static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       void *priv, bool reserved)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c3bd3a008e..1dccea9505e5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -685,7 +685,40 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		unsigned int op, blk_mq_req_flags_t flags,
 		unsigned int hctx_idx);
-struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
+
+/*
+ * Tag address space map.
+ */
+struct blk_mq_tags {
+	unsigned int nr_tags;
+	unsigned int nr_reserved_tags;
+
+	atomic_t active_queues;
+
+	struct sbitmap_queue bitmap_tags;
+	struct sbitmap_queue breserved_tags;
+
+	struct request **rqs;
+	struct request **static_rqs;
+	struct list_head page_list;
+
+	/*
+	 * used to clear request reference in rqs[] before freeing one
+	 * request pool
+	 */
+	spinlock_t lock;
+};
+
+static inline struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
+					       unsigned int tag)
+{
+	if (tag < tags->nr_tags) {
+		prefetch(tags->rqs[tag]);
+		return tags->rqs[tag];
+	}
+
+	return NULL;
+}
 
 enum {
 	BLK_MQ_UNIQUE_TAG_BITS = 16,
-- 
2.33.1



* [PATCH 11/14] block: optimize blk_mq_rq_ctx_init()
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

blk_mq_rq_ctx_init() takes ~4.7% of CPU time on a peak performance run;
this patch brings that down to ~3.7%.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: rebase and move queuelist init]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 64 ++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 064fdeeb1be5..0d05ac4a3b50 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -304,63 +304,62 @@ static inline bool blk_mq_need_time_stamp(struct request *rq)
 static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		unsigned int tag, u64 alloc_time_ns)
 {
+	struct blk_mq_ctx *ctx = data->ctx;
+	struct blk_mq_hw_ctx *hctx = data->hctx;
+	struct request_queue *q = data->q;
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
 	struct request *rq = tags->static_rqs[tag];
+	unsigned int cmd_flags = data->cmd_flags;
+	u64 start_time_ns = 0;
 
-	if (data->q->elevator) {
-		rq->rq_flags = RQF_ELV;
-		rq->tag = BLK_MQ_NO_TAG;
-		rq->internal_tag = tag;
-	} else {
+	rq->q = q;
+	rq->mq_ctx = ctx;
+	rq->mq_hctx = hctx;
+	rq->cmd_flags = cmd_flags;
+	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
+	data->hctx->queued++;
+	if (!q->elevator) {
 		rq->rq_flags = 0;
 		rq->tag = tag;
 		rq->internal_tag = BLK_MQ_NO_TAG;
+	} else {
+		rq->rq_flags = RQF_ELV;
+		rq->tag = BLK_MQ_NO_TAG;
+		rq->internal_tag = tag;
 	}
-
-	/* csd/requeue_work/fifo_time is initialized before use */
-	rq->q = data->q;
-	rq->mq_ctx = data->ctx;
-	rq->mq_hctx = data->hctx;
-	rq->cmd_flags = data->cmd_flags;
-	if (data->flags & BLK_MQ_REQ_PM)
-		rq->rq_flags |= RQF_PM;
-	if (blk_queue_io_stat(data->q))
-		rq->rq_flags |= RQF_IO_STAT;
+	rq->timeout = 0;
 	INIT_LIST_HEAD(&rq->queuelist);
-	INIT_HLIST_NODE(&rq->hash);
-	RB_CLEAR_NODE(&rq->rb_node);
 	rq->rq_disk = NULL;
 	rq->part = NULL;
 #ifdef CONFIG_BLK_RQ_ALLOC_TIME
 	rq->alloc_time_ns = alloc_time_ns;
 #endif
-	if (blk_mq_need_time_stamp(rq))
-		rq->start_time_ns = ktime_get_ns();
-	else
-		rq->start_time_ns = 0;
 	rq->io_start_time_ns = 0;
 	rq->stats_sectors = 0;
 	rq->nr_phys_segments = 0;
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 	rq->nr_integrity_segments = 0;
 #endif
-	blk_crypto_rq_set_defaults(rq);
-	/* tag was already set */
-	WRITE_ONCE(rq->deadline, 0);
-
-	rq->timeout = 0;
-
 	rq->end_io = NULL;
 	rq->end_io_data = NULL;
-
-	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
+	if (data->flags & BLK_MQ_REQ_PM)
+		rq->rq_flags |= RQF_PM;
+	if (blk_queue_io_stat(data->q))
+		rq->rq_flags |= RQF_IO_STAT;
+	if (blk_mq_need_time_stamp(rq))
+		start_time_ns = ktime_get_ns();
+	rq->start_time_ns = start_time_ns;
+	blk_crypto_rq_set_defaults(rq);
 	refcount_set(&rq->ref, 1);
+	WRITE_ONCE(rq->deadline, 0);
 
-	if (!op_is_flush(data->cmd_flags) && (rq->rq_flags & RQF_ELV)) {
-		struct elevator_queue *e = data->q->elevator;
+	if (rq->rq_flags & RQF_ELV) {
+		struct elevator_queue *e = q->elevator;
 
 		rq->elv.icq = NULL;
-		if (e->type->ops.prepare_request) {
+		RB_CLEAR_NODE(&rq->rb_node);
+		INIT_HLIST_NODE(&rq->hash);
+		if (!op_is_flush(cmd_flags) && e->type->ops.prepare_request) {
 			if (e->type->icq_cache)
 				blk_mq_sched_assign_ioc(rq);
 
@@ -369,7 +368,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		}
 	}
 
-	data->hctx->queued++;
 	return rq;
 }
 
-- 
2.33.1



* [PATCH 12/14] block: align blkdev_dio inlined bio to a cacheline
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

We get all sorts of unreliable and funky results since struct bio is
designed to be cacheline aligned, which it is not when inlined in
blkdev_dio like this.
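
A standalone illustration of the effect (editor's toy example, assuming a
64-byte cacheline; on SMP builds the kernel annotation expands to a
similar aligned attribute):

  #include <assert.h>
  #include <stddef.h>

  struct toy_dio {
          void *iocb;
          unsigned long size;
          unsigned int flags;
          /* force the embedded bio to start on its own cacheline */
          struct { char data[128]; } bio __attribute__((aligned(64)));
  };

  int main(void)
  {
          assert(offsetof(struct toy_dio, bio) % 64 == 0);
          return 0;
  }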

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/fops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/fops.c b/block/fops.c
index 1d4f862950bb..7eebd590342b 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -137,7 +137,7 @@ struct blkdev_dio {
 	size_t			size;
 	atomic_t		ref;
 	unsigned int		flags;
-	struct bio		bio;
+	struct bio		bio ____cacheline_aligned_in_smp;
 };
 
 static struct bio_set blkdev_dio_pool;
-- 
2.33.1



* [PATCH 13/14] block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes
From: Jens Axboe @ 2021-10-17  1:37 UTC
  To: linux-block; +Cc: Jens Axboe

These were added as part of early days debugging for blk-mq, and they
are not really useful anymore. Rather than spend cycles updating them,
just get rid of them.

As a bonus, this shrinks the per-cpu software queue size from 256b
to 192b. That's a whole cacheline less.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-debugfs.c | 54 ------------------------------------------
 block/blk-mq-sched.c   |  5 +---
 block/blk-mq.c         |  3 ---
 block/blk-mq.h         |  7 ------
 4 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 409a8256d9ff..928a16af9175 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -663,57 +663,6 @@ CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
 CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
 CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
 
-static int ctx_dispatched_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	seq_printf(m, "%lu %lu\n", ctx->rq_dispatched[1], ctx->rq_dispatched[0]);
-	return 0;
-}
-
-static ssize_t ctx_dispatched_write(void *data, const char __user *buf,
-				    size_t count, loff_t *ppos)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	ctx->rq_dispatched[0] = ctx->rq_dispatched[1] = 0;
-	return count;
-}
-
-static int ctx_merged_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	seq_printf(m, "%lu\n", ctx->rq_merged);
-	return 0;
-}
-
-static ssize_t ctx_merged_write(void *data, const char __user *buf,
-				size_t count, loff_t *ppos)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	ctx->rq_merged = 0;
-	return count;
-}
-
-static int ctx_completed_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	seq_printf(m, "%lu %lu\n", ctx->rq_completed[1], ctx->rq_completed[0]);
-	return 0;
-}
-
-static ssize_t ctx_completed_write(void *data, const char __user *buf,
-				   size_t count, loff_t *ppos)
-{
-	struct blk_mq_ctx *ctx = data;
-
-	ctx->rq_completed[0] = ctx->rq_completed[1] = 0;
-	return count;
-}
-
 static int blk_mq_debugfs_show(struct seq_file *m, void *v)
 {
 	const struct blk_mq_debugfs_attr *attr = m->private;
@@ -803,9 +752,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
 	{"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
 	{"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
 	{"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
-	{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
-	{"merged", 0600, ctx_merged_show, ctx_merged_write},
-	{"completed", 0600, ctx_completed_show, ctx_completed_write},
 	{},
 };
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index efc1374b8831..e85b7556b096 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -387,13 +387,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 	 * potentially merge with. Currently includes a hand-wavy stop
 	 * count of 8, to not spend too much time checking for merges.
 	 */
-	if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
-		ctx->rq_merged++;
+	if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs))
 		ret = true;
-	}
 
 	spin_unlock(&ctx->lock);
-
 	return ret;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0d05ac4a3b50..4d91b74ce67a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -316,7 +316,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->mq_ctx = ctx;
 	rq->mq_hctx = hctx;
 	rq->cmd_flags = cmd_flags;
-	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	data->hctx->queued++;
 	if (!q->elevator) {
 		rq->rq_flags = 0;
@@ -573,7 +572,6 @@ static void __blk_mq_free_request(struct request *rq)
 void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if (rq->rq_flags & (RQF_ELVPRIV | RQF_ELV)) {
@@ -587,7 +585,6 @@ void blk_mq_free_request(struct request *rq)
 		}
 	}
 
-	ctx->rq_completed[rq_is_sync(rq)]++;
 	if (rq->rq_flags & RQF_MQ_INFLIGHT)
 		__blk_mq_dec_active_requests(hctx);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ceed0a001c76..c12554c58abd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -25,13 +25,6 @@ struct blk_mq_ctx {
 	unsigned short		index_hw[HCTX_MAX_TYPES];
 	struct blk_mq_hw_ctx 	*hctxs[HCTX_MAX_TYPES];
 
-	/* incremented at dispatch time */
-	unsigned long		rq_dispatched[2];
-	unsigned long		rq_merged;
-
-	/* incremented at completion time */
-	unsigned long		____cacheline_aligned_in_smp rq_completed[2];
-
 	struct request_queue	*queue;
 	struct blk_mq_ctxs      *ctxs;
 	struct kobject		kobj;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 14/14] block: remove some blk_mq_hw_ctx debugfs entries
  2021-10-17  1:37 [PATCHSET 0/14] Various block layer optimizations Jens Axboe
                   ` (12 preceding siblings ...)
  2021-10-17  1:37 ` [PATCH 13/14] block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes Jens Axboe
@ 2021-10-17  1:37 ` Jens Axboe
  2021-10-18  9:22   ` Christoph Hellwig
  13 siblings, 1 reply; 40+ messages in thread
From: Jens Axboe @ 2021-10-17  1:37 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Just like the blk_mq_ctx counterparts, we've got a bunch of counters
in here that are only for debugfs and are of questionable value. They
are:

- dispatched, a log2 histogram of how many requests were dispatched in one go

- poll_{considered,invoked,success}, which track poll success rates. We're
  confident in the iopoll implementation at this point, so don't bother
  tracking these.

As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes,
dropping a whole cacheline.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-debugfs.c | 67 ------------------------------------------
 block/blk-mq.c         | 16 ----------
 include/linux/blk-mq.h | 10 -------
 3 files changed, 93 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 928a16af9175..68ca5d21cda7 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -529,70 +529,6 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	return res;
 }
 
-static int hctx_io_poll_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	seq_printf(m, "considered=%lu\n", hctx->poll_considered);
-	seq_printf(m, "invoked=%lu\n", hctx->poll_invoked);
-	seq_printf(m, "success=%lu\n", hctx->poll_success);
-	return 0;
-}
-
-static ssize_t hctx_io_poll_write(void *data, const char __user *buf,
-				  size_t count, loff_t *ppos)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
-	return count;
-}
-
-static int hctx_dispatched_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-	int i;
-
-	seq_printf(m, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
-
-	for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
-		unsigned int d = 1U << (i - 1);
-
-		seq_printf(m, "%8u\t%lu\n", d, hctx->dispatched[i]);
-	}
-
-	seq_printf(m, "%8u+\t%lu\n", 1U << (i - 1), hctx->dispatched[i]);
-	return 0;
-}
-
-static ssize_t hctx_dispatched_write(void *data, const char __user *buf,
-				     size_t count, loff_t *ppos)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-	int i;
-
-	for (i = 0; i < BLK_MQ_MAX_DISPATCH_ORDER; i++)
-		hctx->dispatched[i] = 0;
-	return count;
-}
-
-static int hctx_queued_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	seq_printf(m, "%lu\n", hctx->queued);
-	return 0;
-}
-
-static ssize_t hctx_queued_write(void *data, const char __user *buf,
-				 size_t count, loff_t *ppos)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	hctx->queued = 0;
-	return count;
-}
-
 static int hctx_run_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
@@ -738,9 +674,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"tags_bitmap", 0400, hctx_tags_bitmap_show},
 	{"sched_tags", 0400, hctx_sched_tags_show},
 	{"sched_tags_bitmap", 0400, hctx_sched_tags_bitmap_show},
-	{"io_poll", 0600, hctx_io_poll_show, hctx_io_poll_write},
-	{"dispatched", 0600, hctx_dispatched_show, hctx_dispatched_write},
-	{"queued", 0600, hctx_queued_show, hctx_queued_write},
 	{"run", 0600, hctx_run_show, hctx_run_write},
 	{"active", 0400, hctx_active_show},
 	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4d91b74ce67a..2197cfbf081f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -316,7 +316,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->mq_ctx = ctx;
 	rq->mq_hctx = hctx;
 	rq->cmd_flags = cmd_flags;
-	data->hctx->queued++;
 	if (!q->elevator) {
 		rq->rq_flags = 0;
 		rq->tag = tag;
@@ -1268,14 +1267,6 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
 	return data.rq;
 }
 
-static inline unsigned int queued_to_index(unsigned int queued)
-{
-	if (!queued)
-		return 0;
-
-	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
-}
-
 static bool __blk_mq_alloc_driver_tag(struct request *rq)
 {
 	struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
@@ -1597,8 +1588,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (!list_empty(&zone_list))
 		list_splice_tail_init(&zone_list, list);
 
-	hctx->dispatched[queued_to_index(queued)]++;
-
 	/* If we didn't flush the entire list, we could have told the driver
 	 * there was more coming, but that turned out to be a lie.
 	 */
@@ -4212,14 +4201,9 @@ static int blk_mq_poll_classic(struct request_queue *q, blk_qc_t cookie,
 	long state = get_current_state();
 	int ret;
 
-	hctx->poll_considered++;
-
 	do {
-		hctx->poll_invoked++;
-
 		ret = q->mq_ops->poll(hctx);
 		if (ret > 0) {
-			hctx->poll_success++;
 			__set_current_state(TASK_RUNNING);
 			return ret;
 		}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1dccea9505e5..9fc868abcc81 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -341,9 +341,6 @@ struct blk_mq_hw_ctx {
 	unsigned long		queued;
 	/** @run: Number of dispatched requests. */
 	unsigned long		run;
-#define BLK_MQ_MAX_DISPATCH_ORDER	7
-	/** @dispatched: Number of dispatch requests by queue. */
-	unsigned long		dispatched[BLK_MQ_MAX_DISPATCH_ORDER];
 
 	/** @numa_node: NUMA node the storage adapter has been connected to. */
 	unsigned int		numa_node;
@@ -363,13 +360,6 @@ struct blk_mq_hw_ctx {
 	/** @kobj: Kernel object for sysfs. */
 	struct kobject		kobj;
 
-	/** @poll_considered: Count times blk_mq_poll() was called. */
-	unsigned long		poll_considered;
-	/** @poll_invoked: Count how many requests blk_mq_poll() polled. */
-	unsigned long		poll_invoked;
-	/** @poll_success: Count how many polled requests were completed. */
-	unsigned long		poll_success;
-
 #ifdef CONFIG_BLK_DEBUG_FS
 	/**
 	 * @debugfs_dir: debugfs directory for this hardware queue. Named
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-17  1:37 ` [PATCH 07/14] block: change plugging to use a singly linked list Jens Axboe
@ 2021-10-17  4:45     ` kernel test robot
  2021-10-18  9:19   ` Christoph Hellwig
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: kernel test robot @ 2021-10-17  4:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7043 bytes --]

Hi Jens,

I love your patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[cannot apply to linus/master v5.15-rc5 next-20211015]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jens-Axboe/Various-block-layer-optimizations/20211017-093822
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: hexagon-randconfig-r041-20211017 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 746dd6a700931988dd9021d3d04718f1929885a5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6ed4540d0461ab59f1dc96a0b448eb12c3446182
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jens-Axboe/Various-block-layer-optimizations/20211017-093822
        git checkout 6ed4540d0461ab59f1dc96a0b448eb12c3446182
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> block/blk-core.c:1636:7: error: implicit declaration of function 'rq_list_empty' [-Werror,-Wimplicit-function-declaration]
           if (!rq_list_empty(plug->mq_list))
                ^
   1 error generated.
--
>> block/blk-merge.c:1094:15: error: implicit declaration of function 'rq_list_empty' [-Werror,-Wimplicit-function-declaration]
           if (!plug || rq_list_empty(plug->mq_list))
                        ^
>> block/blk-merge.c:1098:7: error: implicit declaration of function 'rq_list_peek' [-Werror,-Wimplicit-function-declaration]
           rq = rq_list_peek(&plug->mq_list);
                ^
   block/blk-merge.c:1098:7: note: did you mean 'bio_list_peek'?
   include/linux/bio.h:553:27: note: 'bio_list_peek' declared here
   static inline struct bio *bio_list_peek(struct bio_list *bl)
                             ^
>> block/blk-merge.c:1098:5: warning: incompatible integer to pointer conversion assigning to 'struct request *' from 'int' [-Wint-conversion]
           rq = rq_list_peek(&plug->mq_list);
              ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 warning and 2 errors generated.
--
>> block/blk-mq.c:2134:10: error: implicit declaration of function 'rq_list_empty' [-Werror,-Wimplicit-function-declaration]
           while (!rq_list_empty(plug->mq_list)) {
                   ^
>> block/blk-mq.c:2138:8: error: implicit declaration of function 'rq_list_pop' [-Werror,-Wimplicit-function-declaration]
                   rq = rq_list_pop(&plug->mq_list);
                        ^
   block/blk-mq.c:2138:8: note: did you mean 'bio_list_pop'?
   include/linux/bio.h:558:27: note: 'bio_list_pop' declared here
   static inline struct bio *bio_list_pop(struct bio_list *bl)
                             ^
>> block/blk-mq.c:2138:6: warning: incompatible integer to pointer conversion assigning to 'struct request *' from 'int' [-Wint-conversion]
                   rq = rq_list_pop(&plug->mq_list);
                      ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
   block/blk-mq.c:2181:6: error: implicit declaration of function 'rq_list_empty' [-Werror,-Wimplicit-function-declaration]
           if (rq_list_empty(plug->mq_list))
               ^
   block/blk-mq.c:2197:8: error: implicit declaration of function 'rq_list_pop' [-Werror,-Wimplicit-function-declaration]
                   rq = rq_list_pop(&plug->mq_list);
                        ^
   block/blk-mq.c:2197:6: warning: incompatible integer to pointer conversion assigning to 'struct request *' from 'int' [-Wint-conversion]
                   rq = rq_list_pop(&plug->mq_list);
                      ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> block/blk-mq.c:2402:25: error: implicit declaration of function 'rq_list_peek' [-Werror,-Wimplicit-function-declaration]
                   struct request *nxt = rq_list_peek(&plug->mq_list);
                                         ^
   block/blk-mq.c:2402:25: note: did you mean 'bio_list_peek'?
   include/linux/bio.h:553:27: note: 'bio_list_peek' declared here
   static inline struct bio *bio_list_peek(struct bio_list *bl)
                             ^
>> block/blk-mq.c:2402:19: warning: incompatible integer to pointer conversion initializing 'struct request *' with an expression of type 'int' [-Wint-conversion]
                   struct request *nxt = rq_list_peek(&plug->mq_list);
                                   ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> block/blk-mq.c:2410:2: error: implicit declaration of function 'rq_list_add' [-Werror,-Wimplicit-function-declaration]
           rq_list_add(&plug->mq_list, rq);
           ^
   block/blk-mq.c:2410:2: note: did you mean '__list_add'?
   include/linux/list.h:63:20: note: '__list_add' declared here
   static inline void __list_add(struct list_head *new,
                      ^
   block/blk-mq.c:2526:10: error: implicit declaration of function 'rq_list_peek' [-Werror,-Wimplicit-function-declaration]
                           tmp = rq_list_peek(&plug->mq_list);
                                 ^
   block/blk-mq.c:2526:8: warning: incompatible integer to pointer conversion assigning to 'struct request *' from 'int' [-Wint-conversion]
                           tmp = rq_list_peek(&plug->mq_list);
                               ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   block/blk-mq.c:2550:8: error: implicit declaration of function 'rq_list_empty' [-Werror,-Wimplicit-function-declaration]
                   if (!rq_list_empty(plug->mq_list) && same_queue_rq) {
                        ^
   block/blk-mq.c:2551:15: error: implicit declaration of function 'rq_list_pop' [-Werror,-Wimplicit-function-declaration]
                           first_rq = rq_list_pop(&plug->mq_list);
                                      ^
   block/blk-mq.c:2551:13: warning: incompatible integer to pointer conversion assigning to 'struct request *' from 'int' [-Wint-conversion]
                           first_rq = rq_list_pop(&plug->mq_list);
                                    ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
   5 warnings and 9 errors generated.


vim +/rq_list_empty +1636 block/blk-core.c

  1631	
  1632	void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
  1633	{
  1634		flush_plug_callbacks(plug, from_schedule);
  1635	
> 1636		if (!rq_list_empty(plug->mq_list))
  1637			blk_mq_flush_plug_list(plug, from_schedule);
  1638		if (unlikely(!from_schedule && plug->cached_rq))
  1639			blk_mq_free_plug_rqs(plug);
  1640	}
  1641	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24159 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/14] block: inline fast path of driver tag allocation
  2021-10-17  1:37 ` [PATCH 01/14] block: inline fast path of driver tag allocation Jens Axboe
@ 2021-10-18  8:42   ` Christoph Hellwig
  2021-10-18 14:38     ` Jens Axboe
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:35PM -0600, Jens Axboe wrote:
> If we don't use an IO scheduler or have shared tags, then we don't need
> to call into this external function at all. This saves ~2% for such
> a setup.

This looks correct, although the call chain gets a little confusing
now.  How much difference is this over basically inlining the
whole old blk_mq_get_driver_tag?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/14] block: don't bother iter advancing a fully done bio
  2021-10-17  1:37 ` [PATCH 02/14] block: don't bother iter advancing a fully done bio Jens Axboe
@ 2021-10-18  8:42   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/14] block: remove useless caller argument to print_req_error()
  2021-10-17  1:37 ` [PATCH 03/14] block: remove useless caller argument to print_req_error() Jens Axboe
@ 2021-10-18  8:42   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:37PM -0600, Jens Axboe wrote:
> We have exactly one caller of this, just get rid of adding the useless
> function name to the output.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/14] block: move update request helpers into blk-mq.c
  2021-10-17  1:37 ` [PATCH 04/14] block: move update request helpers into blk-mq.c Jens Axboe
@ 2021-10-18  8:43   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/14] block: don't call blk_status_to_errno() for success status
  2021-10-17  1:37 ` [PATCH 05/14] block: don't call blk_status_to_errno() for success status Jens Axboe
@ 2021-10-18  8:53   ` Christoph Hellwig
  2021-10-18 10:49   ` Pavel Begunkov
  1 sibling, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig

On Sat, Oct 16, 2021 at 07:37:39PM -0600, Jens Axboe wrote:
> We only need to call it to resolve the blk_status_t -> errno mapping
> if the status is different than BLK_STS_OK. Check that it is before
> doing so.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---

Actually what I had in mind was something like this:

---
From 371ccad84c28127ff0ce8a09106505986532b5ec Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 18 Oct 2021 10:45:18 +0200
Subject: block: don't call blk_status_to_errno in blk_update_request

We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c             | 2 +-
 include/trace/events/block.h | 6 +++---
 kernel/trace/blktrace.c      | 7 ++++---
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d5b0258dd218b..007bc2b339c55 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1451,7 +1451,7 @@ bool blk_update_request(struct request *req, blk_status_t error,
 {
 	int total_bytes;
 
-	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
+	trace_block_rq_complete(req, error, nr_bytes);
 
 	if (!req->bio)
 		return false;
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index cc5ab96a7471f..a95daa4d4caa2 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -114,7 +114,7 @@ TRACE_EVENT(block_rq_requeue,
  */
 TRACE_EVENT(block_rq_complete,
 
-	TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
+	TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
 
 	TP_ARGS(rq, error, nr_bytes),
 
@@ -122,7 +122,7 @@ TRACE_EVENT(block_rq_complete,
 		__field(  dev_t,	dev			)
 		__field(  sector_t,	sector			)
 		__field(  unsigned int,	nr_sector		)
-		__field(  int,		error			)
+		__field(  int	,	error			)
 		__array(  char,		rwbs,	RWBS_LEN	)
 		__dynamic_array( char,	cmd,	1		)
 	),
@@ -131,7 +131,7 @@ TRACE_EVENT(block_rq_complete,
 		__entry->dev	   = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
 		__entry->sector    = blk_rq_pos(rq);
 		__entry->nr_sector = nr_bytes >> 9;
-		__entry->error     = error;
+		__entry->error     = blk_status_to_errno(error);
 
 		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
 		__get_str(cmd)[0] = '\0';
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index fa91f398f28b7..1183c88634aa6 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -816,7 +816,7 @@ blk_trace_request_get_cgid(struct request *rq)
  *     Records an action against a request. Will log the bio offset + size.
  *
  **/
-static void blk_add_trace_rq(struct request *rq, int error,
+static void blk_add_trace_rq(struct request *rq, blk_status_t error,
 			     unsigned int nr_bytes, u32 what, u64 cgid)
 {
 	struct blk_trace *bt;
@@ -834,7 +834,8 @@ static void blk_add_trace_rq(struct request *rq, int error,
 		what |= BLK_TC_ACT(BLK_TC_FS);
 
 	__blk_add_trace(bt, blk_rq_trace_sector(rq), nr_bytes, req_op(rq),
-			rq->cmd_flags, what, error, 0, NULL, cgid);
+			rq->cmd_flags, what, blk_status_to_errno(error), 0,
+			NULL, cgid);
 	rcu_read_unlock();
 }
 
@@ -863,7 +864,7 @@ static void blk_add_trace_rq_requeue(void *ignore, struct request *rq)
 }
 
 static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
-			int error, unsigned int nr_bytes)
+			blk_status_t error, unsigned int nr_bytes)
 {
 	blk_add_trace_rq(rq, error, nr_bytes, BLK_TA_COMPLETE,
 			 blk_trace_request_get_cgid(rq));
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/14] block: store elevator state in request
  2021-10-17  1:37 ` [PATCH 06/14] block: store elevator state in request Jens Axboe
@ 2021-10-18  8:58   ` Christoph Hellwig
  2021-10-18 14:34     ` Jens Axboe
  2021-10-19 22:21   ` Guillaume Tucker
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  8:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:40PM -0600, Jens Axboe wrote:
>  
>  static inline void blk_mq_sched_requeue_request(struct request *rq)
>  {
> +	if (rq->rq_flags & RQF_ELV) {
> +		struct request_queue *q = rq->q;
> +		struct elevator_queue *e = q->elevator;
>  
> +		if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
> +			e->type->ops.requeue_request(rq);
> +	}

I think we can just kill off RQF_ELVPRIV now.  It is equivalent to
RQF_ELV for !op_is_flush requests.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-17  1:37 ` [PATCH 07/14] block: change plugging to use a singly linked list Jens Axboe
  2021-10-17  4:45     ` kernel test robot
@ 2021-10-18  9:19   ` Christoph Hellwig
  2021-10-18 16:10     ` Jens Axboe
  2021-10-18 12:56   ` Pavel Begunkov
  2021-10-24 14:09   ` kernel test robot
  3 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:41PM -0600, Jens Axboe wrote:
> Use a singly linked list for the blk_plug. This saves 8 bytes in the
> blk_plug struct, and makes for faster list manipulations than doubly
> linked lists. As we don't use the doubly linked lists for anything,
> singly linked is just fine.

This patch also does a few other things, like changing the same_queue_rq
type and adding a has_elevator flag to the plug.  Can you please split
these out and document the changes better?

>static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
>  {
> +	struct blk_mq_hw_ctx *hctx = NULL;
> +	int queued = 0;
> +	int errors = 0;
> +
> +	while (!rq_list_empty(plug->mq_list)) {
> +		struct request *rq;
> +		blk_status_t ret;
> +
> +		rq = rq_list_pop(&plug->mq_list);
>  
> +		if (!hctx)
> +			hctx = rq->mq_hctx;
> +		else if (hctx != rq->mq_hctx && hctx->queue->mq_ops->commit_rqs) {
> +			trace_block_unplug(hctx->queue, queued, !from_schedule);
> +			hctx->queue->mq_ops->commit_rqs(hctx);
> +			queued = 0;
> +			hctx = rq->mq_hctx;
> +		}
> +
> +		ret = blk_mq_request_issue_directly(rq,
> +						rq_list_empty(plug->mq_list));
> +		if (ret != BLK_STS_OK) {
> +			if (ret == BLK_STS_RESOURCE ||
> +					ret == BLK_STS_DEV_RESOURCE) {
> +				blk_mq_request_bypass_insert(rq, false,
> +						rq_list_empty(plug->mq_list));
> +				break;
> +			}
> +			blk_mq_end_request(rq, ret);
> +			errors++;
> +		} else
> +			queued++;
> +	}

This all looks a bit messy to me.  I'd suggest reordering this a bit
and adding a new helper:

static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
		bool from_schedule)
{
	if (hctx->queue->mq_ops->commit_rqs) {
		trace_block_unplug(hctx->queue, *queued, !from_schedule);
		hctx->queue->mq_ops->commit_rqs(hctx);
	}
	*queued = 0;
}

static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
{
	struct blk_mq_hw_ctx *hctx = NULL;
	int queued = 0;
	int errors = 0;

	while ((rq = rq_list_pop(&plug->mq_list))) {
		bool last = rq_list_empty(plug->mq_list);
		blk_status_t ret;

		if (hctx != rq->mq_hctx) {
			if (hctx)
				blk_mq_commit_rqs(hctx, &queued, from_schedule);
			hctx = rq->mq_hctx;
		}

		ret = blk_mq_request_issue_directly(rq, last);
		switch (ret) {
		case BLK_STS_OK:
			queued++;
			break;
		case BLK_STS_RESOURCE:
		case BLK_STS_DEV_RESOURCE:
			blk_mq_request_bypass_insert(rq, false, last);
			blk_mq_commit_rqs(hctx, &queued, from_schedule);
			return;
		default:
			blk_mq_end_request(rq, ret);
			errors++;
			break;
		}
	}

	/*
	 * If we didn't flush the entire list, we could have told the driver
	 * there was more coming, but that turned out to be a lie.
	 */
	if (errors)
		blk_mq_commit_rqs(hctx, &queued, from_schedule);
}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 08/14] block: improve layout of struct request
  2021-10-17  1:37 ` [PATCH 08/14] block: improve layout of struct request Jens Axboe
@ 2021-10-18  9:19   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 09/14] block: only mark bio as tracked if it really is tracked
  2021-10-17  1:37 ` [PATCH 09/14] block: only mark bio as tracked if it really is tracked Jens Axboe
@ 2021-10-18  9:19   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:43PM -0600, Jens Axboe wrote:
> We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even
> though we may not even have an rq_qos handler. Only mark it as TRACKED if
> it really is potentially tracked.
> 
> This saves considerable time for the case where the bio isn't tracked:
> 
>      2.64%     -1.65%  [kernel.vmlinux]  [k] bio_endio
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 11/14] block: optimize blk_mq_rq_ctx_init()
  2021-10-17  1:37 ` [PATCH 11/14] block: optimize blk_mq_rq_ctx_init() Jens Axboe
@ 2021-10-18  9:20   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:45PM -0600, Jens Axboe wrote:
> This takes ~4.7% on a peak performance run, with this patch it brings it
> down to ~3.7%.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> [axboe: rebase and move queuelist init]
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 12/14] block: align blkdev_dio inlined bio to a cacheline
  2021-10-17  1:37 ` [PATCH 12/14] block: align blkdev_dio inlined bio to a cacheline Jens Axboe
@ 2021-10-18  9:21   ` Christoph Hellwig
  2021-10-18 14:36     ` Jens Axboe
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Oct 16, 2021 at 07:37:46PM -0600, Jens Axboe wrote:
> We get all sorts of unreliable and funky results since the bio is
> designed to align on a cacheline, which it does not when inlined like
> this.

Well, I guess you'll need to audit all the front_pad users for
something like this then?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/14] block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes
  2021-10-17  1:37 ` [PATCH 13/14] block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes Jens Axboe
@ 2021-10-18  9:21   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 14/14] block: remove some blk_mq_hw_ctx debugfs entries
  2021-10-17  1:37 ` [PATCH 14/14] block: remove some blk_mq_hw_ctx debugfs entries Jens Axboe
@ 2021-10-18  9:22   ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-18  9:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/14] block: don't call blk_status_to_errno() for success status
  2021-10-17  1:37 ` [PATCH 05/14] block: don't call blk_status_to_errno() for success status Jens Axboe
  2021-10-18  8:53   ` Christoph Hellwig
@ 2021-10-18 10:49   ` Pavel Begunkov
  1 sibling, 0 replies; 40+ messages in thread
From: Pavel Begunkov @ 2021-10-18 10:49 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig

On 10/17/21 01:37, Jens Axboe wrote:
> We only need to call it to resolve the blk_status_t -> errno mapping
> if the status is different than BLK_STS_OK. Check that it is before
> doing so.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   block/blk-mq.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ffccc5f0f66a..fa5b12200404 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -676,9 +676,11 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes)
>   bool blk_update_request(struct request *req, blk_status_t error,
>   		unsigned int nr_bytes)
>   {
> -	int total_bytes;
> +	int total_bytes, blk_errno = 0;
>   
> -	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
> +	if (unlikely(error != BLK_STS_OK))
> +		blk_errno = blk_status_to_errno(error);
> +	trace_block_rq_complete(req, blk_errno, nr_bytes);

Last time I checked the relevant asm, the evaluation of the arguments
was done under the trace branch, so there's no extra work if tracing is
not active.
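
Roughly, the call site expands to something like this (a simplified
sketch of include/linux/tracepoint.h, names abbreviated):

	/* sketch of what trace_block_rq_complete(req, error, nr_bytes)
	 * compiles down to */
	if (static_key_false(&__tracepoint_block_rq_complete.key)) {
		/* the arguments, including any blk_status_to_errno()
		 * call at the call site, are only evaluated in here */
		__DO_TRACE(...);
	}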


>   
>   	if (!req->bio)
>   		return false;
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-17  1:37 ` [PATCH 07/14] block: change plugging to use a singly linked list Jens Axboe
  2021-10-17  4:45     ` kernel test robot
  2021-10-18  9:19   ` Christoph Hellwig
@ 2021-10-18 12:56   ` Pavel Begunkov
  2021-10-18 13:34     ` Jens Axboe
  2021-10-24 14:09   ` kernel test robot
  3 siblings, 1 reply; 40+ messages in thread
From: Pavel Begunkov @ 2021-10-18 12:56 UTC (permalink / raw)
  To: Jens Axboe, linux-block

On 10/17/21 01:37, Jens Axboe wrote:
> Use a singly linked list for the blk_plug. This saves 8 bytes in the
> blk_plug struct, and makes for faster list manipulations than doubly
> linked lists. As we don't use the doubly linked lists for anything,
> singly linked is just fine.
> 
> This yields a bump in default (merging enabled) performance from 7.0
> to 7.1M IOPS, and ~7.5M IOPS with merging disabled.

block/blk-merge.c: In function ‘blk_attempt_plug_merge’:
block/blk-merge.c:1094:22: error: implicit declaration of function ‘rq_list_empty’; did you mean ‘bio_list_empty’? [-Werror=implicit-function-declaration]
  1094 |         if (!plug || rq_list_empty(plug->mq_list))


Also in blk-mq.c and others, and I don't see it defined anywhere.
This is with a just-fetched for-5.16/block. Did I miss it somewhere?


> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   block/blk-core.c       |   5 +-
>   block/blk-merge.c      |   8 +--
>   block/blk-mq.c         | 156 +++++++++++++++++++++++++++--------------
>   block/blk.h            |   2 +-
>   include/linux/blkdev.h |   6 +-
>   5 files changed, 113 insertions(+), 64 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index bdc03b80a8d0..fced71948162 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1542,11 +1542,12 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>   	if (tsk->plug)
>   		return;
>   
> -	INIT_LIST_HEAD(&plug->mq_list);
> +	plug->mq_list = NULL;
>   	plug->cached_rq = NULL;
>   	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
>   	plug->rq_count = 0;
>   	plug->multiple_queues = false;
> +	plug->has_elevator = false;
>   	plug->nowait = false;
>   	INIT_LIST_HEAD(&plug->cb_list);
>   
> @@ -1632,7 +1633,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   {
>   	flush_plug_callbacks(plug, from_schedule);
>   
> -	if (!list_empty(&plug->mq_list))
> +	if (!rq_list_empty(plug->mq_list))
>   		blk_mq_flush_plug_list(plug, from_schedule);
>   	if (unlikely(!from_schedule && plug->cached_rq))
>   		blk_mq_free_plug_rqs(plug);
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index ec727234ac48..b3f3e689a5ac 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -1085,23 +1085,23 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
>    * Caller must ensure !blk_queue_nomerges(q) beforehand.
>    */
>   bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
> -		unsigned int nr_segs, struct request **same_queue_rq)
> +		unsigned int nr_segs, bool *same_queue_rq)
>   {
>   	struct blk_plug *plug;
>   	struct request *rq;
>   
>   	plug = blk_mq_plug(q, bio);
> -	if (!plug || list_empty(&plug->mq_list))
> +	if (!plug || rq_list_empty(plug->mq_list))
>   		return false;
>   
>   	/* check the previously added entry for a quick merge attempt */
> -	rq = list_last_entry(&plug->mq_list, struct request, queuelist);
> +	rq = rq_list_peek(&plug->mq_list);
>   	if (rq->q == q && same_queue_rq) {
>   		/*
>   		 * Only blk-mq multiple hardware queues case checks the rq in
>   		 * the same queue, there should be only one such rq in a queue
>   		 */
> -		*same_queue_rq = rq;
> +		*same_queue_rq = true;
>   	}
>   	if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) == BIO_MERGE_OK)
>   		return true;
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5d22c228f6df..cd1249284c1f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -19,7 +19,6 @@
>   #include <linux/smp.h>
>   #include <linux/interrupt.h>
>   #include <linux/llist.h>
> -#include <linux/list_sort.h>
>   #include <linux/cpu.h>
>   #include <linux/cache.h>
>   #include <linux/sched/sysctl.h>
> @@ -2118,54 +2117,100 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
>   	spin_unlock(&ctx->lock);
>   }
>   
> -static int plug_rq_cmp(void *priv, const struct list_head *a,
> -		       const struct list_head *b)
> +static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
>   {
> -	struct request *rqa = container_of(a, struct request, queuelist);
> -	struct request *rqb = container_of(b, struct request, queuelist);
> +	struct blk_mq_hw_ctx *hctx = NULL;
> +	int queued = 0;
> +	int errors = 0;
> +
> +	while (!rq_list_empty(plug->mq_list)) {
> +		struct request *rq;
> +		blk_status_t ret;
> +
> +		rq = rq_list_pop(&plug->mq_list);
>   
> -	if (rqa->mq_ctx != rqb->mq_ctx)
> -		return rqa->mq_ctx > rqb->mq_ctx;
> -	if (rqa->mq_hctx != rqb->mq_hctx)
> -		return rqa->mq_hctx > rqb->mq_hctx;
> +		if (!hctx)
> +			hctx = rq->mq_hctx;
> +		else if (hctx != rq->mq_hctx && hctx->queue->mq_ops->commit_rqs) {
> +			trace_block_unplug(hctx->queue, queued, !from_schedule);
> +			hctx->queue->mq_ops->commit_rqs(hctx);
> +			queued = 0;
> +			hctx = rq->mq_hctx;
> +		}
> +
> +		ret = blk_mq_request_issue_directly(rq,
> +						rq_list_empty(plug->mq_list));
> +		if (ret != BLK_STS_OK) {
> +			if (ret == BLK_STS_RESOURCE ||
> +					ret == BLK_STS_DEV_RESOURCE) {
> +				blk_mq_request_bypass_insert(rq, false,
> +						rq_list_empty(plug->mq_list));
> +				break;
> +			}
> +			blk_mq_end_request(rq, ret);
> +			errors++;
> +		} else
> +			queued++;
> +	}
>   
> -	return blk_rq_pos(rqa) > blk_rq_pos(rqb);
> +	/*
> +	 * If we didn't flush the entire list, we could have told
> +	 * the driver there was more coming, but that turned out to
> +	 * be a lie.
> +	 */
> +	if ((!rq_list_empty(plug->mq_list) || errors) &&
> +	     hctx->queue->mq_ops->commit_rqs && queued)
> +		hctx->queue->mq_ops->commit_rqs(hctx);
>   }
>   
>   void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   {
> +	struct blk_mq_hw_ctx *this_hctx;
> +	struct blk_mq_ctx *this_ctx;
> +	unsigned int depth;
>   	LIST_HEAD(list);
>   
> -	if (list_empty(&plug->mq_list))
> +	if (rq_list_empty(plug->mq_list))
>   		return;
> -	list_splice_init(&plug->mq_list, &list);
> -
> -	if (plug->rq_count > 2 && plug->multiple_queues)
> -		list_sort(NULL, &list, plug_rq_cmp);
> -
>   	plug->rq_count = 0;
>   
> +	if (!plug->multiple_queues && !plug->has_elevator) {
> +		blk_mq_plug_issue_direct(plug, from_schedule);
> +		if (rq_list_empty(plug->mq_list))
> +			return;
> +	}
> +
> +	this_hctx = NULL;
> +	this_ctx = NULL;
> +	depth = 0;
>   	do {
> -		struct list_head rq_list;
> -		struct request *rq, *head_rq = list_entry_rq(list.next);
> -		struct list_head *pos = &head_rq->queuelist; /* skip first */
> -		struct blk_mq_hw_ctx *this_hctx = head_rq->mq_hctx;
> -		struct blk_mq_ctx *this_ctx = head_rq->mq_ctx;
> -		unsigned int depth = 1;
> -
> -		list_for_each_continue(pos, &list) {
> -			rq = list_entry_rq(pos);
> -			BUG_ON(!rq->q);
> -			if (rq->mq_hctx != this_hctx || rq->mq_ctx != this_ctx)
> -				break;
> -			depth++;
> +		struct request *rq;
> +
> +		rq = rq_list_pop(&plug->mq_list);
> +
> +		if (!this_hctx) {
> +			this_hctx = rq->mq_hctx;
> +			this_ctx = rq->mq_ctx;
> +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx) {
> +			trace_block_unplug(this_hctx->queue, depth,
> +						!from_schedule);
> +			blk_mq_sched_insert_requests(this_hctx, this_ctx,
> +						&list, from_schedule);
> +			depth = 0;
> +			this_hctx = rq->mq_hctx;
> +			this_ctx = rq->mq_ctx;
> +
>   		}
>   
> -		list_cut_before(&rq_list, &list, pos);
> -		trace_block_unplug(head_rq->q, depth, !from_schedule);
> -		blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list,
> +		list_add(&rq->queuelist, &list);
> +		depth++;
> +	} while (!rq_list_empty(plug->mq_list));
> +
> +	if (!list_empty(&list)) {
> +		trace_block_unplug(this_hctx->queue, depth, !from_schedule);
> +		blk_mq_sched_insert_requests(this_hctx, this_ctx, &list,
>   						from_schedule);
> -	} while(!list_empty(&list));
> +	}
>   }
>   
>   static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
> @@ -2345,16 +2390,17 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>   
>   static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>   {
> -	list_add_tail(&rq->queuelist, &plug->mq_list);
> -	plug->rq_count++;
> -	if (!plug->multiple_queues && !list_is_singular(&plug->mq_list)) {
> -		struct request *tmp;
> +	if (!plug->multiple_queues) {
> +		struct request *nxt = rq_list_peek(&plug->mq_list);
>   
> -		tmp = list_first_entry(&plug->mq_list, struct request,
> -						queuelist);
> -		if (tmp->q != rq->q)
> +		if (nxt && nxt->q != rq->q)
>   			plug->multiple_queues = true;
>   	}
> +	if (!plug->has_elevator && (rq->rq_flags & RQF_ELV))
> +		plug->has_elevator = true;
> +	rq->rq_next = NULL;
> +	rq_list_add(&plug->mq_list, rq);
> +	plug->rq_count++;
>   }
>   
>   /*
> @@ -2389,7 +2435,7 @@ void blk_mq_submit_bio(struct bio *bio)
>   	const int is_flush_fua = op_is_flush(bio->bi_opf);
>   	struct request *rq;
>   	struct blk_plug *plug;
> -	struct request *same_queue_rq = NULL;
> +	bool same_queue_rq = false;
>   	unsigned int nr_segs = 1;
>   	blk_status_t ret;
>   
> @@ -2463,15 +2509,17 @@ void blk_mq_submit_bio(struct bio *bio)
>   		 * IO may benefit a lot from plug merging.
>   		 */
>   		unsigned int request_count = plug->rq_count;
> -		struct request *last = NULL;
> +		struct request *tmp = NULL;
>   
> -		if (!request_count)
> +		if (!request_count) {
>   			trace_block_plug(q);
> -		else
> -			last = list_entry_rq(plug->mq_list.prev);
> +		} else if (!blk_queue_nomerges(q)) {
> +			tmp = rq_list_peek(&plug->mq_list);
> +			if (blk_rq_bytes(tmp) < BLK_PLUG_FLUSH_SIZE)
> +				tmp = NULL;
> +		}
>   
> -		if (request_count >= blk_plug_max_rq_count(plug) || (last &&
> -		    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
> +		if (request_count >= blk_plug_max_rq_count(plug) || tmp) {
>   			blk_flush_plug_list(plug, false);
>   			trace_block_plug(q);
>   		}
> @@ -2481,6 +2529,8 @@ void blk_mq_submit_bio(struct bio *bio)
>   		/* Insert the request at the IO scheduler queue */
>   		blk_mq_sched_insert_request(rq, false, true, true);
>   	} else if (plug && !blk_queue_nomerges(q)) {
> +		struct request *first_rq = NULL;
> +
>   		/*
>   		 * We do limited plugging. If the bio can be merged, do that.
>   		 * Otherwise the existing request in the plug list will be
> @@ -2488,19 +2538,17 @@ void blk_mq_submit_bio(struct bio *bio)
>   		 * The plug list might get flushed before this. If that happens,
>   		 * the plug list is empty, and same_queue_rq is invalid.
>   		 */
> -		if (list_empty(&plug->mq_list))
> -			same_queue_rq = NULL;
> -		if (same_queue_rq) {
> -			list_del_init(&same_queue_rq->queuelist);
> +		if (!rq_list_empty(plug->mq_list) && same_queue_rq) {
> +			first_rq = rq_list_pop(&plug->mq_list);
>   			plug->rq_count--;
>   		}
>   		blk_add_rq_to_plug(plug, rq);
>   		trace_block_plug(q);
>   
> -		if (same_queue_rq) {
> +		if (first_rq) {
>   			trace_block_unplug(q, 1, true);
> -			blk_mq_try_issue_directly(same_queue_rq->mq_hctx,
> -						  same_queue_rq);
> +			blk_mq_try_issue_directly(first_rq->mq_hctx,
> +						  first_rq);
>   		}
>   	} else if ((q->nr_hw_queues > 1 && is_sync) ||
>   		   !rq->mq_hctx->dispatch_busy) {
> diff --git a/block/blk.h b/block/blk.h
> index fdfaa6896fc4..c15d1ab224b8 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -216,7 +216,7 @@ void blk_add_timer(struct request *req);
>   void blk_print_req_error(struct request *req, blk_status_t status);
>   
>   bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
> -		unsigned int nr_segs, struct request **same_queue_rq);
> +		unsigned int nr_segs, bool *same_queue_rq);
>   bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
>   			struct bio *bio, unsigned int nr_segs);
>   
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 43dda725dcae..c6a6402cb1a1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -727,7 +727,7 @@ extern void blk_set_queue_dying(struct request_queue *);
>    * schedule() where blk_schedule_flush_plug() is called.
>    */
>   struct blk_plug {
> -	struct list_head mq_list; /* blk-mq requests */
> +	struct request *mq_list; /* blk-mq requests */
>   
>   	/* if ios_left is > 1, we can batch tag/rq allocations */
>   	struct request *cached_rq;
> @@ -736,6 +736,7 @@ struct blk_plug {
>   	unsigned short rq_count;
>   
>   	bool multiple_queues;
> +	bool has_elevator;
>   	bool nowait;
>   
>   	struct list_head cb_list; /* md requires an unplug callback */
> @@ -776,8 +777,7 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
>   	struct blk_plug *plug = tsk->plug;
>   
>   	return plug &&
> -		 (!list_empty(&plug->mq_list) ||
> -		 !list_empty(&plug->cb_list));
> +		 (plug->mq_list || !list_empty(&plug->cb_list));
>   }
>   
>   int blkdev_issue_flush(struct block_device *bdev);
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-18 12:56   ` Pavel Begunkov
@ 2021-10-18 13:34     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-18 13:34 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block

On 10/18/21 6:56 AM, Pavel Begunkov wrote:
> On 10/17/21 01:37, Jens Axboe wrote:
>> Use a singly linked list for the blk_plug. This saves 8 bytes in the
>> blk_plug struct, and makes for faster list manipulations than doubly
>> linked lists. As we don't use the doubly linked lists for anything,
>> singly linked is just fine.
>>
>> This yields a bump in default (merging enabled) performance from 7.0
>> to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
> 
> block/blk-merge.c: In function ‘blk_attempt_plug_merge’:
> block/blk-merge.c:1094:22: error: implicit declaration of function ‘rq_list_empty’; did you mean ‘bio_list_empty’? [-Werror=implicit-function-declaration]
>   1094 |         if (!plug || rq_list_empty(plug->mq_list))
> 
> 
> Also in blk-mq.c and others, and I don't see it defined anywhere.
> This is with a just-fetched for-5.16/block. Did I miss it somewhere?

I forgot to include this prep patch:

https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=b3ca14988ce2eb125a8a80c3ee146f3ad55cfcf4
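
For anyone following along before that commit is applied, here is a
minimal sketch of what the missing singly linked list helpers look
like. This is an approximation, not the literal prep patch: it assumes
struct request grows an rq_next pointer (unioned with queuelist), and
uses macros so the helpers work on any struct request * list head:

#define rq_list_add(listptr, rq)	do {		\
	typeof(listptr) __listptr = (listptr);		\
	(rq)->rq_next = *__listptr;			\
	*__listptr = (rq);				\
} while (0)

#define rq_list_pop(listptr)				\
({							\
	struct request *__req = NULL;			\
	typeof(listptr) __listptr = (listptr);		\
	if (*__listptr) {				\
		__req = *__listptr;			\
		*__listptr = __req->rq_next;		\
	}						\
	__req;						\
})

#define rq_list_peek(listptr)				\
({							\
	struct request *__req = NULL;			\
	if ((listptr) && *(listptr))			\
		__req = *(listptr);			\
	__req;						\
})

#define rq_list_empty(list)	((list) == (struct request *) NULL)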

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/14] block: store elevator state in request
  2021-10-18  8:58   ` Christoph Hellwig
@ 2021-10-18 14:34     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-18 14:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 10/18/21 2:58 AM, Christoph Hellwig wrote:
> On Sat, Oct 16, 2021 at 07:37:40PM -0600, Jens Axboe wrote:
>>  
>>  static inline void blk_mq_sched_requeue_request(struct request *rq)
>>  {
>> +	if (rq->rq_flags & RQF_ELV) {
>> +		struct request_queue *q = rq->q;
>> +		struct elevator_queue *e = q->elevator;
>>  
>> +		if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
>> +			e->type->ops.requeue_request(rq);
>> +	}
> 
> I think we can just kill off RQF_ELVPRIV now.  It is equivalent to
> RQF_ELV for !op_is_flush requests.

I'd prefer not to for this round, as RQF_ELV can be set without RQF_ELVPRIV
depending on the scheduler.
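
To make the distinction concrete, here is roughly what the allocation
side does (an abridged sketch based on the hunks in this series, not
the literal code): RQF_ELV means the request was allocated on a queue
with a scheduler attached, while RQF_ELVPRIV is only set once the
scheduler's ->prepare_request() has actually claimed the request:

	/* at allocation time: the queue has an elevator */
	if (data->q->elevator)
		rq->rq_flags = RQF_ELV;

	/* later, and only for non-flush requests: the scheduler opts in */
	if (!op_is_flush(data->cmd_flags) && (rq->rq_flags & RQF_ELV)) {
		struct elevator_queue *e = data->q->elevator;

		rq->elv.icq = NULL;
		if (e->type->ops.prepare_request) {
			e->type->ops.prepare_request(rq);
			rq->rq_flags |= RQF_ELVPRIV;
		}
	}

A scheduler without ->prepare_request() leaves RQF_ELV set but
RQF_ELVPRIV clear, which is why the two can't be collapsed yet.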

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 12/14] block: align blkdev_dio inlined bio to a cacheline
  2021-10-18  9:21   ` Christoph Hellwig
@ 2021-10-18 14:36     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-18 14:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 10/18/21 3:21 AM, Christoph Hellwig wrote:
> On Sat, Oct 16, 2021 at 07:37:46PM -0600, Jens Axboe wrote:
>> We get all sorts of unreliable and funky results since the bio is
>> designed to be aligned on a cacheline, which it is not when inlined
>> like this.
> 
> Well, I guess you'll need to audit all the front_pad users for
> something like this then?

It's not a correctness issue; it's just less efficient. And it's not really
a new thing.
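
For reference, the change boils down to forcing the embedded bio onto
its own cacheline; roughly (field layout abridged and assumed, not the
exact patch):

	struct blkdev_dio {
		union {
			struct kiocb		*iocb;
			struct task_struct	*waiter;
		};
		size_t			size;
		atomic_t		ref;
		unsigned int		flags;
		/* give the bio the alignment a bio_set allocation would */
		struct bio		bio ____cacheline_aligned_in_smp;
	};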

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/14] block: inline fast path of driver tag allocation
  2021-10-18  8:42   ` Christoph Hellwig
@ 2021-10-18 14:38     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-18 14:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 10/18/21 2:42 AM, Christoph Hellwig wrote:
> On Sat, Oct 16, 2021 at 07:37:35PM -0600, Jens Axboe wrote:
>> If we don't use an IO scheduler or have shared tags, then we don't need
>> to call into this external function at all. This saves ~2% for such
>> a setup.
> 
> This looks correct, although the call chain gets a little confusing
> now.  How much difference does this make over basically inlining the
> whole of the old blk_mq_get_driver_tag?

It just condenses the fast path into the checks upfront. I can run the
numbers, but it'll be extra work.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-18  9:19   ` Christoph Hellwig
@ 2021-10-18 16:10     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-18 16:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 10/18/21 3:19 AM, Christoph Hellwig wrote:
> On Sat, Oct 16, 2021 at 07:37:41PM -0600, Jens Axboe wrote:
>> Use a singly linked list for the blk_plug. This saves 8 bytes in the
>> blk_plug struct, and makes for faster list manipulations than doubly
>> linked lists. As we don't use the doubly linked lists for anything,
>> singly linked is just fine.
> 
> This patch also does a few other things, like changing the same_queue_rq
> type and adding a has_elevator flag to the plug.  Can you please split
> this out and document the changes better?

I'll split it; it should probably be 3 patches.

> 
>> static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
>>  {
>> +	struct blk_mq_hw_ctx *hctx = NULL;
>> +	int queued = 0;
>> +	int errors = 0;
>> +
>> +	while (!rq_list_empty(plug->mq_list)) {
>> +		struct request *rq;
>> +		blk_status_t ret;
>> +
>> +		rq = rq_list_pop(&plug->mq_list);
>>  
>> +		if (!hctx)
>> +			hctx = rq->mq_hctx;
>> +		else if (hctx != rq->mq_hctx && hctx->queue->mq_ops->commit_rqs) {
>> +			trace_block_unplug(hctx->queue, queued, !from_schedule);
>> +			hctx->queue->mq_ops->commit_rqs(hctx);
>> +			queued = 0;
>> +			hctx = rq->mq_hctx;
>> +		}
>> +
>> +		ret = blk_mq_request_issue_directly(rq,
>> +						rq_list_empty(plug->mq_list));
>> +		if (ret != BLK_STS_OK) {
>> +			if (ret == BLK_STS_RESOURCE ||
>> +					ret == BLK_STS_DEV_RESOURCE) {
>> +				blk_mq_request_bypass_insert(rq, false,
>> +						rq_list_empty(plug->mq_list));
>> +				break;
>> +			}
>> +			blk_mq_end_request(rq, ret);
>> +			errors++;
>> +		} else
>> +			queued++;
>> +	}
> 
> This all looks a bit messy to me.  I'd suggest reordering this a bit
> including a new helper:
> 
> static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
> 		bool from_schedule)
> {
> 	if (hctx->queue->mq_ops->commit_rqs) {
> 		trace_block_unplug(hctx->queue, *queued, !from_schedule);
> 		hctx->queue->mq_ops->commit_rqs(hctx);
> 	}
> 	*queued = 0;
> }
> 
> static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
> {
> 	struct blk_mq_hw_ctx *hctx = NULL;
> 	struct request *rq;
> 	int queued = 0;
> 	int errors = 0;
> 
> 	while ((rq = rq_list_pop(&plug->mq_list))) {
> 		bool last = rq_list_empty(plug->mq_list);
> 		blk_status_t ret;
> 
> 		if (hctx != rq->mq_hctx) {
> 			if (hctx)
> 				blk_mq_commit_rqs(hctx, &queued, from_schedule);
> 			hctx = rq->mq_hctx;
> 		}
> 
> 		ret = blk_mq_request_issue_directly(rq, last);
> 		switch (ret) {
> 		case BLK_STS_OK:
> 			queued++;
> 			break;
> 		case BLK_STS_RESOURCE:
> 		case BLK_STS_DEV_RESOURCE:
> 			blk_mq_request_bypass_insert(rq, false, last);
> 			blk_mq_commit_rqs(hctx, &queued, from_schedule);
> 			return;
> 		default:
> 			blk_mq_end_request(rq, ret);
> 			errors++;
> 			break;
> 		}
> 	}
> 
> 	/*
> 	 * If we didn't flush the entire list, we could have told the driver
> 	 * there was more coming, but that turned out to be a lie.
> 	 */
> 	if (errors)
> 		blk_mq_commit_rqs(hctx, &queued, from_schedule);
> }

Good suggestion!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/14] block: store elevator state in request
  2021-10-17  1:37 ` [PATCH 06/14] block: store elevator state in request Jens Axboe
  2021-10-18  8:58   ` Christoph Hellwig
@ 2021-10-19 22:21   ` Guillaume Tucker
  2021-10-19 22:26     ` Jens Axboe
  1 sibling, 1 reply; 40+ messages in thread
From: Guillaume Tucker @ 2021-10-19 22:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Mark Brown, linux-kernel, kernelci

Hi Jens,

On 17/10/2021 02:37, Jens Axboe wrote:
> Add an rq private RQF_ELV flag, which tells the block layer that this
> request was initialized on a queue that has an IO scheduler attached.
> This allows for faster checking in the fast path, rather than having to
> dereference rq->q later on.
> 
> Elevator switching does full quiesce of the queue before detaching an
> IO scheduler, so it's safe to cache this in the request itself.

A kernelci.org automated bisection found that this patch
introduced a regression in next-20211019 with a NULL pointer
dereference, which seems to affect only QEMU, but across all
architectures.

More details about the regression can be found here:

  https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20211019/plan/baseline/
  https://linux.kernelci.org/test/case/id/616ea20eb7104071c43358ea/

See also all the test jobs involved in the automated bisection:

  https://lava.collabora.co.uk/scheduler/device_type/qemu?dt_search=bisection-287

If you do send a fix, please include this trailer:

  Reported-by: "kernelci.org bot" <bot@kernelci.org>


Please let us know if this seems like a valid bisection result
and if you need any help to debug the issue or try a fix.

Best wishes,
Guillaume


GitHub: https://github.com/kernelci/kernelci-project/issues/68


> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq-sched.h   | 27 ++++++++++++++++-----------
>  block/blk-mq.c         | 20 +++++++++++---------
>  include/linux/blk-mq.h |  2 ++
>  3 files changed, 29 insertions(+), 20 deletions(-)
> 
> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index fe252278ed9a..98836106b25f 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -56,29 +56,34 @@ static inline bool
>  blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>  			 struct bio *bio)
>  {
> -	struct elevator_queue *e = q->elevator;
> -
> -	if (e && e->type->ops.allow_merge)
> -		return e->type->ops.allow_merge(q, rq, bio);
> +	if (rq->rq_flags & RQF_ELV) {
> +		struct elevator_queue *e = q->elevator;
>  
> +		if (e->type->ops.allow_merge)
> +			return e->type->ops.allow_merge(q, rq, bio);
> +	}
>  	return true;
>  }
>  
>  static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
>  {
> -	struct elevator_queue *e = rq->q->elevator;
> +	if (rq->rq_flags & RQF_ELV) {
> +		struct elevator_queue *e = rq->q->elevator;
>  
> -	if (e && e->type->ops.completed_request)
> -		e->type->ops.completed_request(rq, now);
> +		if (e->type->ops.completed_request)
> +			e->type->ops.completed_request(rq, now);
> +	}
>  }
>  
>  static inline void blk_mq_sched_requeue_request(struct request *rq)
>  {
> -	struct request_queue *q = rq->q;
> -	struct elevator_queue *e = q->elevator;
> +	if (rq->rq_flags & RQF_ELV) {
> +		struct request_queue *q = rq->q;
> +		struct elevator_queue *e = q->elevator;
>  
> -	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
> -		e->type->ops.requeue_request(rq);
> +		if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
> +			e->type->ops.requeue_request(rq);
> +	}
>  }
>  
>  static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index fa5b12200404..5d22c228f6df 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -299,7 +299,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
>   */
>  static inline bool blk_mq_need_time_stamp(struct request *rq)
>  {
> -	return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS)) || rq->q->elevator;
> +	return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_ELV));
>  }
>  
>  static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
> @@ -309,9 +309,11 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  	struct request *rq = tags->static_rqs[tag];
>  
>  	if (data->q->elevator) {
> +		rq->rq_flags = RQF_ELV;
>  		rq->tag = BLK_MQ_NO_TAG;
>  		rq->internal_tag = tag;
>  	} else {
> +		rq->rq_flags = 0;
>  		rq->tag = tag;
>  		rq->internal_tag = BLK_MQ_NO_TAG;
>  	}
> @@ -320,7 +322,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  	rq->q = data->q;
>  	rq->mq_ctx = data->ctx;
>  	rq->mq_hctx = data->hctx;
> -	rq->rq_flags = 0;
>  	rq->cmd_flags = data->cmd_flags;
>  	if (data->flags & BLK_MQ_REQ_PM)
>  		rq->rq_flags |= RQF_PM;
> @@ -356,11 +357,11 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
>  	refcount_set(&rq->ref, 1);
>  
> -	if (!op_is_flush(data->cmd_flags)) {
> +	if (!op_is_flush(data->cmd_flags) && (rq->rq_flags & RQF_ELV)) {
>  		struct elevator_queue *e = data->q->elevator;
>  
>  		rq->elv.icq = NULL;
> -		if (e && e->type->ops.prepare_request) {
> +		if (e->type->ops.prepare_request) {
>  			if (e->type->icq_cache)
>  				blk_mq_sched_assign_ioc(rq);
>  
> @@ -575,12 +576,13 @@ static void __blk_mq_free_request(struct request *rq)
>  void blk_mq_free_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> -	struct elevator_queue *e = q->elevator;
>  	struct blk_mq_ctx *ctx = rq->mq_ctx;
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
> -	if (rq->rq_flags & RQF_ELVPRIV) {
> -		if (e && e->type->ops.finish_request)
> +	if (rq->rq_flags & (RQF_ELVPRIV | RQF_ELV)) {
> +		struct elevator_queue *e = q->elevator;
> +
> +		if (e->type->ops.finish_request)
>  			e->type->ops.finish_request(rq);
>  		if (rq->elv.icq) {
>  			put_io_context(rq->elv.icq->ioc);
> @@ -2239,7 +2241,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  		goto insert;
>  	}
>  
> -	if (q->elevator && !bypass_insert)
> +	if ((rq->rq_flags & RQF_ELV) && !bypass_insert)
>  		goto insert;
>  
>  	budget_token = blk_mq_get_dispatch_budget(q);
> @@ -2475,7 +2477,7 @@ void blk_mq_submit_bio(struct bio *bio)
>  		}
>  
>  		blk_add_rq_to_plug(plug, rq);
> -	} else if (q->elevator) {
> +	} else if (rq->rq_flags & RQF_ELV) {
>  		/* Insert the request at the IO scheduler queue */
>  		blk_mq_sched_insert_request(rq, false, true, true);
>  	} else if (plug && !blk_queue_nomerges(q)) {
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index a9c1d0882550..3a399aa372b5 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -55,6 +55,8 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_MQ_POLL_SLEPT	((__force req_flags_t)(1 << 20))
>  /* ->timeout has been called, don't expire again */
>  #define RQF_TIMED_OUT		((__force req_flags_t)(1 << 21))
> +/* queue has elevator attached */
> +#define RQF_ELV			((__force req_flags_t)(1 << 22))
>  
>  /* flags that prevent us from merging requests: */
>  #define RQF_NOMERGE_FLAGS \
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/14] block: store elevator state in request
  2021-10-19 22:21   ` Guillaume Tucker
@ 2021-10-19 22:26     ` Jens Axboe
  0 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2021-10-19 22:26 UTC (permalink / raw)
  To: Guillaume Tucker; +Cc: linux-block, Mark Brown, linux-kernel, kernelci

On 10/19/21 4:21 PM, Guillaume Tucker wrote:
> Hi Jens,
> 
> On 17/10/2021 02:37, Jens Axboe wrote:
>> Add an rq private RQF_ELV flag, which tells the block layer that this
>> request was initialized on a queue that has an IO scheduler attached.
>> This allows for faster checking in the fast path, rather than having to
>> dereference rq->q later on.
>>
>> Elevator switching does full quiesce of the queue before detaching an
>> IO scheduler, so it's safe to cache this in the request itself.
> 
> A kernelci.org automated bisection found that this patch
> introduced a regression in next-20211019 with a NULL pointer
> dereference, which seems to affect only QEMU, but across all
> architectures.
> 
> More details about the regression can be found here:
> 
>   https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20211019/plan/baseline/
>   https://linux.kernelci.org/test/case/id/616ea20eb7104071c43358ea/
> 
> See also all the test jobs involved in the automated bisection:
> 
>   https://lava.collabora.co.uk/scheduler/device_type/qemu?dt_search=bisection-287
> 
> If you do send a fix, please include this trailer:
> 
>   Reported-by: "kernelci.org bot" <bot@kernelci.org>
> 
> 
> Please let us know if this seems like a valid bisection result
> and if you need any help to debug the issue or try a fix.

This got fixed yesterday, current tree is fine.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/14] block: change plugging to use a singly linked list
  2021-10-17  1:37 ` [PATCH 07/14] block: change plugging to use a singly linked list Jens Axboe
                     ` (2 preceding siblings ...)
  2021-10-18 12:56   ` Pavel Begunkov
@ 2021-10-24 14:09   ` kernel test robot
  3 siblings, 0 replies; 40+ messages in thread
From: kernel test robot @ 2021-10-24 14:09 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5433 bytes --]

Hi Jens,

I love your patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[cannot apply to linus/master v5.15-rc6 next-20211022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jens-Axboe/Various-block-layer-optimizations/20211017-093822
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/6ed4540d0461ab59f1dc96a0b448eb12c3446182
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jens-Axboe/Various-block-layer-optimizations/20211017-093822
        git checkout 6ed4540d0461ab59f1dc96a0b448eb12c3446182
        # save the attached .config to linux build tree
        make W=1 ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   block/blk-core.c: In function 'blk_flush_plug_list':
>> block/blk-core.c:1636:7: error: implicit declaration of function 'rq_list_empty'; did you mean 'bio_list_empty'? [-Werror=implicit-function-declaration]
    1636 |  if (!rq_list_empty(plug->mq_list))
         |       ^~~~~~~~~~~~~
         |       bio_list_empty
   cc1: some warnings being treated as errors
--
   block/blk-merge.c: In function 'blk_attempt_plug_merge':
>> block/blk-merge.c:1094:15: error: implicit declaration of function 'rq_list_empty'; did you mean 'bio_list_empty'? [-Werror=implicit-function-declaration]
    1094 |  if (!plug || rq_list_empty(plug->mq_list))
         |               ^~~~~~~~~~~~~
         |               bio_list_empty
>> block/blk-merge.c:1098:7: error: implicit declaration of function 'rq_list_peek'; did you mean 'bio_list_peek'? [-Werror=implicit-function-declaration]
    1098 |  rq = rq_list_peek(&plug->mq_list);
         |       ^~~~~~~~~~~~
         |       bio_list_peek
>> block/blk-merge.c:1098:5: warning: assignment to 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    1098 |  rq = rq_list_peek(&plug->mq_list);
         |     ^
   cc1: some warnings being treated as errors
--
   block/blk-mq.c: In function 'blk_mq_plug_issue_direct':
>> block/blk-mq.c:2134:10: error: implicit declaration of function 'rq_list_empty'; did you mean 'bio_list_empty'? [-Werror=implicit-function-declaration]
    2134 |  while (!rq_list_empty(plug->mq_list)) {
         |          ^~~~~~~~~~~~~
         |          bio_list_empty
>> block/blk-mq.c:2138:8: error: implicit declaration of function 'rq_list_pop'; did you mean 'bio_list_pop'? [-Werror=implicit-function-declaration]
    2138 |   rq = rq_list_pop(&plug->mq_list);
         |        ^~~~~~~~~~~
         |        bio_list_pop
>> block/blk-mq.c:2138:6: warning: assignment to 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2138 |   rq = rq_list_pop(&plug->mq_list);
         |      ^
   block/blk-mq.c: In function 'blk_mq_flush_plug_list':
   block/blk-mq.c:2197:6: warning: assignment to 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2197 |   rq = rq_list_pop(&plug->mq_list);
         |      ^
   block/blk-mq.c: In function 'blk_add_rq_to_plug':
>> block/blk-mq.c:2402:25: error: implicit declaration of function 'rq_list_peek'; did you mean 'bio_list_peek'? [-Werror=implicit-function-declaration]
    2402 |   struct request *nxt = rq_list_peek(&plug->mq_list);
         |                         ^~~~~~~~~~~~
         |                         bio_list_peek
>> block/blk-mq.c:2402:25: warning: initialization of 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
>> block/blk-mq.c:2410:2: error: implicit declaration of function 'rq_list_add'; did you mean 'rq_qos_add'? [-Werror=implicit-function-declaration]
    2410 |  rq_list_add(&plug->mq_list, rq);
         |  ^~~~~~~~~~~
         |  rq_qos_add
   block/blk-mq.c: In function 'blk_mq_submit_bio':
   block/blk-mq.c:2526:8: warning: assignment to 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2526 |    tmp = rq_list_peek(&plug->mq_list);
         |        ^
   block/blk-mq.c:2551:13: warning: assignment to 'struct request *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2551 |    first_rq = rq_list_pop(&plug->mq_list);
         |             ^
   cc1: some warnings being treated as errors


vim +1636 block/blk-core.c

  1631	
  1632	void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
  1633	{
  1634		flush_plug_callbacks(plug, from_schedule);
  1635	
> 1636		if (!rq_list_empty(plug->mq_list))
  1637			blk_mq_flush_plug_list(plug, from_schedule);
  1638		if (unlikely(!from_schedule && plug->cached_rq))
  1639			blk_mq_free_plug_rqs(plug);
  1640	}
  1641	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 9710 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-10-24 14:09 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-17  1:37 [PATCHSET 0/14] Various block layer optimizations Jens Axboe
2021-10-17  1:37 ` [PATCH 01/14] block: inline fast path of driver tag allocation Jens Axboe
2021-10-18  8:42   ` Christoph Hellwig
2021-10-18 14:38     ` Jens Axboe
2021-10-17  1:37 ` [PATCH 02/14] block: don't bother iter advancing a fully done bio Jens Axboe
2021-10-18  8:42   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 03/14] block: remove useless caller argument to print_req_error() Jens Axboe
2021-10-18  8:42   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 04/14] block: move update request helpers into blk-mq.c Jens Axboe
2021-10-18  8:43   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 05/14] block: don't call blk_status_to_errno() for success status Jens Axboe
2021-10-18  8:53   ` Christoph Hellwig
2021-10-18 10:49   ` Pavel Begunkov
2021-10-17  1:37 ` [PATCH 06/14] block: store elevator state in request Jens Axboe
2021-10-18  8:58   ` Christoph Hellwig
2021-10-18 14:34     ` Jens Axboe
2021-10-19 22:21   ` Guillaume Tucker
2021-10-19 22:26     ` Jens Axboe
2021-10-17  1:37 ` [PATCH 07/14] block: change plugging to use a singly linked list Jens Axboe
2021-10-17  4:45   ` kernel test robot
2021-10-17  4:45     ` kernel test robot
2021-10-18  9:19   ` Christoph Hellwig
2021-10-18 16:10     ` Jens Axboe
2021-10-18 12:56   ` Pavel Begunkov
2021-10-18 13:34     ` Jens Axboe
2021-10-24 14:09   ` kernel test robot
2021-10-17  1:37 ` [PATCH 08/14] block: improve layout of struct request Jens Axboe
2021-10-18  9:19   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 09/14] block: only mark bio as tracked if it really is tracked Jens Axboe
2021-10-18  9:19   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 10/14] block: move blk_mq_tag_to_rq() inline Jens Axboe
2021-10-17  1:37 ` [PATCH 11/14] block: optimize blk_mq_rq_ctx_init() Jens Axboe
2021-10-18  9:20   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 12/14] block: align blkdev_dio inlined bio to a cacheline Jens Axboe
2021-10-18  9:21   ` Christoph Hellwig
2021-10-18 14:36     ` Jens Axboe
2021-10-17  1:37 ` [PATCH 13/14] block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes Jens Axboe
2021-10-18  9:21   ` Christoph Hellwig
2021-10-17  1:37 ` [PATCH 14/14] block: remove some blk_mq_hw_ctx debugfs entries Jens Axboe
2021-10-18  9:22   ` Christoph Hellwig
