* [PATCH 00/16] block optimisation round
@ 2021-10-19 21:24 Pavel Begunkov
  2021-10-19 21:24 ` [PATCH 01/16] block: turn macro helpers into inline functions Pavel Begunkov
                   ` (19 more replies)
  0 siblings, 20 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Jens tried out a similar series with some additions that haven't been
sent yet: 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.

12/16 is bulky, but it nicely drives the numbers. Moreover, with
it we can get rid of some no-longer-used optimisations in
__blkdev_direct_IO(), because it then always serves multiple bios.
E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
and it can _probably_ be converted to chained bios.
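
For reference, the conditional referencing in question is this pattern
in blkdev_bio_end_io() (visible in 10/16):

	if (!(dio->flags & DIO_MULTI_BIO) || atomic_dec_and_test(&dio->ref))

If __blkdev_direct_IO() only ever sees the multi-bio case, the flag
test could presumably go away and the atomic ref be taken and dropped
unconditionally.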

Pavel Begunkov (16):
  block: turn macro helpers into inline functions
  block: convert leftovers to bdev_get_queue
  block: optimise req_bio_endio()
  block: don't bloat enter_queue with percpu_ref
  block: inline a part of bio_release_pages()
  block: clean up blk_mq_submit_bio() merging
  blocK: move plug flush functions to blk-mq.c
  block: optimise blk_flush_plug_list
  block: optimise boundary blkdev_read_iter's checks
  block: optimise blkdev_bio_end_io()
  block: add optimised version bio_set_dev()
  block: add single bio async direct IO helper
  block: add async version of bio_set_polled
  block: skip advance when async and not needed
  block: optimise blk_may_split for normal rw
  block: optimise submit_bio_checks for normal rw

 block/bio.c            |  20 +++----
 block/blk-core.c       | 105 ++++++++++++++--------------------
 block/blk-merge.c      |   2 +-
 block/blk-mq-sched.c   |   2 +-
 block/blk-mq-sched.h   |  12 +---
 block/blk-mq.c         |  64 ++++++++++++++-------
 block/blk-mq.h         |   1 -
 block/blk.h            |  20 ++++---
 block/fops.c           | 125 ++++++++++++++++++++++++++++++++++-------
 include/linux/bio.h    |  60 ++++++++++++++------
 include/linux/blk-mq.h |   2 -
 11 files changed, 259 insertions(+), 154 deletions(-)

-- 
2.33.1



* [PATCH 01/16] block: turn macro helpers into inline functions
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:09   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 02/16] block: convert leftovers to bdev_get_queue Pavel Begunkov
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Replace the bio_set_dev() macro with an identical inline helper and
move it further down to fix a declaration-order dependency on
bio_associate_blkg(). Do the same for bio_copy_dev().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/bio.h | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 9538f20ffaa5..b12453d7b8a8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -430,22 +430,6 @@ void zero_fill_bio(struct bio *bio);
 
 extern const char *bio_devname(struct bio *bio, char *buffer);
 
-#define bio_set_dev(bio, bdev) 				\
-do {							\
-	bio_clear_flag(bio, BIO_REMAPPED);		\
-	if ((bio)->bi_bdev != (bdev))			\
-		bio_clear_flag(bio, BIO_THROTTLED);	\
-	(bio)->bi_bdev = (bdev);			\
-	bio_associate_blkg(bio);			\
-} while (0)
-
-#define bio_copy_dev(dst, src)			\
-do {						\
-	bio_clear_flag(dst, BIO_REMAPPED);		\
-	(dst)->bi_bdev = (src)->bi_bdev;	\
-	bio_clone_blkg_association(dst, src);	\
-} while (0)
-
 #define bio_dev(bio) \
 	disk_devt((bio)->bi_bdev->bd_disk)
 
@@ -463,6 +447,22 @@ static inline void bio_clone_blkg_association(struct bio *dst,
 					      struct bio *src) { }
 #endif	/* CONFIG_BLK_CGROUP */
 
+static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
+{
+	bio_clear_flag(bio, BIO_REMAPPED);
+	if (bio->bi_bdev != bdev)
+		bio_clear_flag(bio, BIO_THROTTLED);
+	bio->bi_bdev = bdev;
+	bio_associate_blkg(bio);
+}
+
+static inline void bio_copy_dev(struct bio *dst, struct bio *src)
+{
+	bio_clear_flag(dst, BIO_REMAPPED);
+	dst->bi_bdev = src->bi_bdev;
+	bio_clone_blkg_association(dst, src);
+}
+
 /*
  * BIO list management for use by remapping drivers (e.g. DM or MD) and loop.
  *
-- 
2.33.1



* [PATCH 02/16] block: convert leftovers to bdev_get_queue
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
  2021-10-19 21:24 ` [PATCH 01/16] block: turn macro helpers into inline functions Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-19 22:34   ` Chaitanya Kulkarni
  2021-10-19 21:24 ` [PATCH 03/16] block: optimise req_bio_endio() Pavel Begunkov
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Convert the remaining bdev->bd_disk->queue uses to bdev_get_queue(),
which is faster. Apparently, a few such spots in block were missed
during rebases.
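
A sketch of why it is faster, assuming bdev_get_queue() now returns the
queue pointer cached in struct block_device (per the recent "block:
cache request queue in bdev" change):

	static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
	{
		/* one load instead of the two dependent loads of bdev->bd_disk->queue */
		return bdev->bd_queue;
	}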

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-core.c  | 2 +-
 block/blk-merge.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e6ad5b51d0c3..c1ba34777c6d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1080,7 +1080,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
 	int ret;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3e6fa449caff..df69f4bb7717 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -383,7 +383,7 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
  */
 void blk_queue_split(struct bio **bio)
 {
-	struct request_queue *q = (*bio)->bi_bdev->bd_disk->queue;
+	struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
 	unsigned int nr_segs;
 
 	if (blk_may_split(q, *bio))
-- 
2.33.1



* [PATCH 03/16] block: optimise req_bio_endio()
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
  2021-10-19 21:24 ` [PATCH 01/16] block: turn macro helpers into inline functions Pavel Begunkov
  2021-10-19 21:24 ` [PATCH 02/16] block: convert leftovers to bdev_get_queue Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:11   ` Christoph Hellwig
  2021-10-22  9:58   ` Shinichiro Kawasaki
  2021-10-19 21:24 ` [PATCH 04/16] block: don't bloat enter_queue with percpu_ref Pavel Begunkov
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

First, get rid of an extra branch by chaining the error checks. Also
move bio_advance() down so that it sits closer to the final check; with
that the compiler loads rq->rq_flags only once and doesn't reload
bio->bi_iter.bi_size if bio_advance() didn't actually advance the iter.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-mq.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3481a8712234..bab1fccda6ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -617,25 +617,23 @@ void blk_mq_free_plug_rqs(struct blk_plug *plug)
 static void req_bio_endio(struct request *rq, struct bio *bio,
 			  unsigned int nbytes, blk_status_t error)
 {
-	if (error)
+	if (unlikely(error)) {
 		bio->bi_status = error;
-
-	if (unlikely(rq->rq_flags & RQF_QUIET))
-		bio_set_flag(bio, BIO_QUIET);
-
-	bio_advance(bio, nbytes);
-
-	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
+	} else if (req_op(rq) == REQ_OP_ZONE_APPEND) {
 		/*
 		 * Partial zone append completions cannot be supported as the
 		 * BIO fragments may end up not being written sequentially.
 		 */
-		if (bio->bi_iter.bi_size)
+		if (bio->bi_iter.bi_size == nbytes)
 			bio->bi_status = BLK_STS_IOERR;
 		else
 			bio->bi_iter.bi_sector = rq->__sector;
 	}
 
+	bio_advance(bio, nbytes);
+
+	if (unlikely(rq->rq_flags & RQF_QUIET))
+		bio_set_flag(bio, BIO_QUIET);
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
 		bio_endio(bio);
-- 
2.33.1



* [PATCH 04/16] block: don't bloat enter_queue with percpu_ref
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 03/16] block: optimise req_bio_endio() Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-19 22:32   ` Chaitanya Kulkarni
  2021-10-20  6:12   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 05/16] block: inline a part of bio_release_pages() Pavel Begunkov
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

percpu_ref_put() is inlined for performance and bloats the binary. We
don't care about the failure case of blk_try_enter_queue(), so we can
replace the inlined put with a call to blk_queue_exit().
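
For reference, blk_queue_exit() is an out-of-line wrapper around the
same put (as defined in block/blk-core.c), so the failure path shrinks
to a plain call:

	void blk_queue_exit(struct request_queue *q)
	{
		percpu_ref_put(&q->q_usage_counter);
	}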

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c1ba34777c6d..88752e51d2b6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -404,7 +404,7 @@ static bool blk_try_enter_queue(struct request_queue *q, bool pm)
 	return true;
 
 fail_put:
-	percpu_ref_put(&q->q_usage_counter);
+	blk_queue_exit(q);
 fail:
 	rcu_read_unlock();
 	return false;
-- 
2.33.1



* [PATCH 05/16] block: inline a part of bio_release_pages()
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 04/16] block: don't bloat enter_queue with percpu_ref Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:13   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 06/16] block: clean up blk_mq_submit_bio() merging Pavel Begunkov
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Inline the BIO_NO_PAGE_REF check of bio_release_pages() to avoid a
function call.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c         | 7 ++-----
 include/linux/bio.h | 8 +++++++-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 4f397ba47db5..46a87c72d2b4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1033,21 +1033,18 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);
 
-void bio_release_pages(struct bio *bio, bool mark_dirty)
+void __bio_release_pages(struct bio *bio, bool mark_dirty)
 {
 	struct bvec_iter_all iter_all;
 	struct bio_vec *bvec;
 
-	if (bio_flagged(bio, BIO_NO_PAGE_REF))
-		return;
-
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		if (mark_dirty && !PageCompound(bvec->bv_page))
 			set_page_dirty_lock(bvec->bv_page);
 		put_page(bvec->bv_page);
 	}
 }
-EXPORT_SYMBOL_GPL(bio_release_pages);
+EXPORT_SYMBOL_GPL(__bio_release_pages);
 
 static void __bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b12453d7b8a8..c88700d1bdc3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -417,7 +417,7 @@ int bio_add_zone_append_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
-void bio_release_pages(struct bio *bio, bool mark_dirty);
+void __bio_release_pages(struct bio *bio, bool mark_dirty);
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
 
@@ -428,6 +428,12 @@ extern void bio_free_pages(struct bio *bio);
 void guard_bio_eod(struct bio *bio);
 void zero_fill_bio(struct bio *bio);
 
+static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
+{
+	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+		__bio_release_pages(bio, mark_dirty);
+}
+
 extern const char *bio_devname(struct bio *bio, char *buffer);
 
 #define bio_dev(bio) \
-- 
2.33.1



* [PATCH 06/16] block: clean up blk_mq_submit_bio() merging
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 05/16] block: inline a part of bio_release_pages() Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:16   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 07/16] blocK: move plug flush functions to blk-mq.c Pavel Begunkov
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a
common if, so blk_queue_nomerges() isn't checked twice. Also honor
bio_mergeable() for blk_mq_sched_bio_merge().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-mq-sched.c |  2 +-
 block/blk-mq-sched.h | 12 +-----------
 block/blk-mq.c       | 13 +++++++------
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e85b7556b096..5b259fdea794 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -361,7 +361,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
-bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
+bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
 	struct elevator_queue *e = q->elevator;
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 98836106b25f..25d1034952b6 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -12,7 +12,7 @@ void blk_mq_sched_assign_ioc(struct request *rq);
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
-bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
+bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs);
 bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
 				   struct list_head *free);
@@ -42,16 +42,6 @@ static inline bool bio_mergeable(struct bio *bio)
 	return !(bio->bi_opf & REQ_NOMERGE_FLAGS);
 }
 
-static inline bool
-blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
-		unsigned int nr_segs)
-{
-	if (blk_queue_nomerges(q) || !bio_mergeable(bio))
-		return false;
-
-	return __blk_mq_sched_bio_merge(q, bio, nr_segs);
-}
-
 static inline bool
 blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 			 struct bio *bio)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bab1fccda6ca..218bfaa98591 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2482,12 +2482,13 @@ void blk_mq_submit_bio(struct bio *bio)
 	if (!bio_integrity_prep(bio))
 		goto queue_exit;
 
-	if (!is_flush_fua && !blk_queue_nomerges(q) &&
-	    blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
-		goto queue_exit;
-
-	if (blk_mq_sched_bio_merge(q, bio, nr_segs))
-		goto queue_exit;
+	if (!blk_queue_nomerges(q) && bio_mergeable(bio)) {
+		if (!is_flush_fua &&
+		    blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
+			goto queue_exit;
+		if (blk_mq_sched_bio_merge(q, bio, nr_segs))
+			goto queue_exit;
+	}
 
 	rq_qos_throttle(q, bio);
 
-- 
2.33.1



* [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (5 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 06/16] block: clean up blk_mq_submit_bio() merging Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-19 22:34   ` Chaitanya Kulkarni
  2021-10-20  6:17   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 08/16] block: optimise blk_flush_plug_list Pavel Begunkov
                   ` (12 subsequent siblings)
  19 siblings, 2 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Flushing is tightly coupled with blk-mq, and almost all
blk_flush_plug_list() callees are in blk-mq.c. So move the whole thing
there, so that the compiler can inline and apply more optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-core.c       | 27 ---------------------------
 block/blk-mq.c         | 33 +++++++++++++++++++++++++++++----
 block/blk-mq.h         |  1 -
 include/linux/blk-mq.h |  2 --
 4 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 88752e51d2b6..52019b8a1487 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1595,23 +1595,6 @@ void blk_start_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_start_plug);
 
-static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule)
-{
-	LIST_HEAD(callbacks);
-
-	while (!list_empty(&plug->cb_list)) {
-		list_splice_init(&plug->cb_list, &callbacks);
-
-		while (!list_empty(&callbacks)) {
-			struct blk_plug_cb *cb = list_first_entry(&callbacks,
-							  struct blk_plug_cb,
-							  list);
-			list_del(&cb->list);
-			cb->callback(cb, from_schedule);
-		}
-	}
-}
-
 struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data,
 				      int size)
 {
@@ -1637,16 +1620,6 @@ struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data,
 }
 EXPORT_SYMBOL(blk_check_plugged);
 
-void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
-{
-	flush_plug_callbacks(plug, from_schedule);
-
-	if (!rq_list_empty(plug->mq_list))
-		blk_mq_flush_plug_list(plug, from_schedule);
-	if (unlikely(!from_schedule && plug->cached_rq))
-		blk_mq_free_plug_rqs(plug);
-}
-
 /**
  * blk_finish_plug - mark the end of a batch of submitted I/O
  * @plug:	The &struct blk_plug passed to blk_start_plug()
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 218bfaa98591..6bdbaa838030 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -604,7 +604,7 @@ void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_free_request);
 
-void blk_mq_free_plug_rqs(struct blk_plug *plug)
+static void blk_mq_free_plug_rqs(struct blk_plug *plug)
 {
 	struct request *rq;
 
@@ -2199,15 +2199,13 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 		blk_mq_commit_rqs(hctx, &queued, from_schedule);
 }
 
-void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
 	struct blk_mq_hw_ctx *this_hctx;
 	struct blk_mq_ctx *this_ctx;
 	unsigned int depth;
 	LIST_HEAD(list);
 
-	if (rq_list_empty(plug->mq_list))
-		return;
 	plug->rq_count = 0;
 
 	if (!plug->multiple_queues && !plug->has_elevator) {
@@ -2249,6 +2247,33 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	}
 }
 
+static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule)
+{
+	LIST_HEAD(callbacks);
+
+	while (!list_empty(&plug->cb_list)) {
+		list_splice_init(&plug->cb_list, &callbacks);
+
+		while (!list_empty(&callbacks)) {
+			struct blk_plug_cb *cb = list_first_entry(&callbacks,
+							  struct blk_plug_cb,
+							  list);
+			list_del(&cb->list);
+			cb->callback(cb, from_schedule);
+		}
+	}
+}
+
+void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
+{
+	flush_plug_callbacks(plug, from_schedule);
+
+	if (!rq_list_empty(plug->mq_list))
+		blk_mq_flush_plug_list(plug, from_schedule);
+	if (unlikely(!from_schedule && plug->cached_rq))
+		blk_mq_free_plug_rqs(plug);
+}
+
 static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		unsigned int nr_segs)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ebf67f4d4f2e..bab40688e59b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -121,7 +121,6 @@ extern int __blk_mq_register_dev(struct device *dev, struct request_queue *q);
 extern int blk_mq_sysfs_register(struct request_queue *q);
 extern void blk_mq_sysfs_unregister(struct request_queue *q);
 extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
-void blk_mq_free_plug_rqs(struct blk_plug *plug);
 
 void blk_mq_release(struct request_queue *q);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 6cf35de151a9..e13780236550 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -656,8 +656,6 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
 
-void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule);
-
 void blk_mq_free_request(struct request *rq);
 
 bool blk_mq_queue_inflight(struct request_queue *q);
-- 
2.33.1



* [PATCH 08/16] block: optimise blk_flush_plug_list
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (6 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 07/16] blocK: move plug flush functions to blk-mq.c Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:29   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks Pavel Begunkov
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

First, don't init a callback list if there are no plug callbacks. Also,
replace the internals of the function with do-while loops, which is
valid since the caller now checks that plug->cb_list is not empty.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-mq.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6bdbaa838030..6627ea76f7c6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2251,22 +2251,24 @@ static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule)
 {
 	LIST_HEAD(callbacks);
 
-	while (!list_empty(&plug->cb_list)) {
+	do {
 		list_splice_init(&plug->cb_list, &callbacks);
 
-		while (!list_empty(&callbacks)) {
+		do {
 			struct blk_plug_cb *cb = list_first_entry(&callbacks,
 							  struct blk_plug_cb,
 							  list);
+
 			list_del(&cb->list);
 			cb->callback(cb, from_schedule);
-		}
-	}
+		} while (!list_empty(&callbacks));
+	} while (!list_empty(&plug->cb_list));
 }
 
 void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
-	flush_plug_callbacks(plug, from_schedule);
+	if (!list_empty(&plug->cb_list))
+		flush_plug_callbacks(plug, from_schedule);
 
 	if (!rq_list_empty(plug->mq_list))
 		blk_mq_flush_plug_list(plug, from_schedule);
-- 
2.33.1



* [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (7 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 08/16] block: optimise blk_flush_plug_list Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:29   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 10/16] block: optimise blkdev_bio_end_io() Pavel Begunkov
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Combine the pos and len checks and mark them unlikely. Also, don't
reexpand the iter if it wasn't truncated.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/fops.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 21d25ee0e4bf..8f733c919ed1 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -503,17 +503,20 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	size_t shorted = 0;
 	ssize_t ret;
 
-	if (pos >= size)
-		return 0;
-
-	size -= pos;
-	if (iov_iter_count(to) > size) {
-		shorted = iov_iter_count(to) - size;
-		iov_iter_truncate(to, size);
+	if (unlikely(pos + iov_iter_count(to) > size)) {
+		if (pos >= size)
+			return 0;
+		size -= pos;
+		if (iov_iter_count(to) > size) {
+			shorted = iov_iter_count(to) - size;
+			iov_iter_truncate(to, size);
+		}
 	}
 
 	ret = generic_file_read_iter(iocb, to);
-	iov_iter_reexpand(to, iov_iter_count(to) + shorted);
+
+	if (unlikely(shorted))
+		iov_iter_reexpand(to, iov_iter_count(to) + shorted);
 	return ret;
 }
 
-- 
2.33.1



* [PATCH 10/16] block: optimise blkdev_bio_end_io()
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (8 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:30   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 11/16] block: add optimised version bio_set_dev() Pavel Begunkov
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Save dio->flags in a local variable so it isn't reloaded a bunch of
times. Also use the iocb pointer cached in a variable for the same
reason.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/fops.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 8f733c919ed1..8e3790faafb8 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -145,13 +145,13 @@ static struct bio_set blkdev_dio_pool;
 static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
-	bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
+	unsigned int flags = dio->flags;
 
 	if (bio->bi_status && !dio->bio.bi_status)
 		dio->bio.bi_status = bio->bi_status;
 
-	if (!(dio->flags & DIO_MULTI_BIO) || atomic_dec_and_test(&dio->ref)) {
-		if (!(dio->flags & DIO_IS_SYNC)) {
+	if (!(flags & DIO_MULTI_BIO) || atomic_dec_and_test(&dio->ref)) {
+		if (!(flags & DIO_IS_SYNC)) {
 			struct kiocb *iocb = dio->iocb;
 			ssize_t ret;
 
@@ -164,8 +164,8 @@ static void blkdev_bio_end_io(struct bio *bio)
 				ret = blk_status_to_errno(dio->bio.bi_status);
 			}
 
-			dio->iocb->ki_complete(iocb, ret, 0);
-			if (dio->flags & DIO_MULTI_BIO)
+			iocb->ki_complete(iocb, ret, 0);
+			if (flags & DIO_MULTI_BIO)
 				bio_put(&dio->bio);
 		} else {
 			struct task_struct *waiter = dio->waiter;
@@ -175,7 +175,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 		}
 	}
 
-	if (should_dirty) {
+	if (flags & DIO_SHOULD_DIRTY) {
 		bio_check_pages_dirty(bio);
 	} else {
 		bio_release_pages(bio, false);
-- 
2.33.1



* [PATCH 11/16] block: add optimised version bio_set_dev()
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (9 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 10/16] block: optimise blkdev_bio_end_io() Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-19 22:36   ` Chaitanya Kulkarni
  2021-10-20  6:20   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 12/16] block: add single bio async direct IO helper Pavel Begunkov
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

If a bio was just allocated, its flags should be zero and there is no
need to clear them. Add __bio_set_dev(), which is faster and doesn't
care about clearing flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/fops.c        |  4 ++--
 include/linux/bio.h | 10 ++++++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 8e3790faafb8..7cf98db0595a 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -75,7 +75,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	}
 
 	bio_init(&bio, vecs, nr_pages);
-	bio_set_dev(&bio, bdev);
+	__bio_set_dev(&bio, bdev);
 	bio.bi_iter.bi_sector = pos >> 9;
 	bio.bi_write_hint = iocb->ki_hint;
 	bio.bi_private = current;
@@ -224,7 +224,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		blk_start_plug(&plug);
 
 	for (;;) {
-		bio_set_dev(bio, bdev);
+		__bio_set_dev(bio, bdev);
 		bio->bi_iter.bi_sector = pos >> 9;
 		bio->bi_write_hint = iocb->ki_hint;
 		bio->bi_private = dio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c88700d1bdc3..0ab4fa2c89c3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -453,13 +453,19 @@ static inline void bio_clone_blkg_association(struct bio *dst,
 					      struct bio *src) { }
 #endif	/* CONFIG_BLK_CGROUP */
 
+/* can be used only with freshly allocated bios */
+static inline void __bio_set_dev(struct bio *bio, struct block_device *bdev)
+{
+	bio->bi_bdev = bdev;
+	bio_associate_blkg(bio);
+}
+
 static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
 {
 	bio_clear_flag(bio, BIO_REMAPPED);
 	if (bio->bi_bdev != bdev)
 		bio_clear_flag(bio, BIO_THROTTLED);
-	bio->bi_bdev = bdev;
-	bio_associate_blkg(bio);
+	__bio_set_dev(bio, bdev);
 }
 
 static inline void bio_copy_dev(struct bio *dst, struct bio *src)
-- 
2.33.1



* [PATCH 12/16] block: add single bio async direct IO helper
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (10 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 11/16] block: add optimised version bio_set_dev() Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:36   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 13/16] block: add async version of bio_set_polled Pavel Begunkov
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

As with __blkdev_direct_IO_simple(), we can implement direct IO more
efficiently if there is only one bio. Add __blkdev_direct_IO_async() and
blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS
with nullblk to 4.7+ MIOPS.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/fops.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 7cf98db0595a..0f1332374756 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -305,6 +305,88 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	return ret;
 }
 
+static void blkdev_bio_end_io_async(struct bio *bio)
+{
+	struct blkdev_dio *dio = container_of(bio, struct blkdev_dio, bio);
+	struct kiocb *iocb = dio->iocb;
+	ssize_t ret;
+
+	if (likely(!bio->bi_status)) {
+		ret = dio->size;
+		iocb->ki_pos += ret;
+	} else {
+		ret = blk_status_to_errno(bio->bi_status);
+	}
+
+	iocb->ki_complete(iocb, ret, 0);
+
+	if (dio->flags & DIO_SHOULD_DIRTY) {
+		bio_check_pages_dirty(bio);
+	} else {
+		bio_release_pages(bio, false);
+		bio_put(bio);
+	}
+}
+
+static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
+					struct iov_iter *iter,
+					unsigned int nr_pages)
+{
+	struct block_device *bdev = iocb->ki_filp->private_data;
+	struct blkdev_dio *dio;
+	struct bio *bio;
+	loff_t pos = iocb->ki_pos;
+	int ret = 0;
+
+	if ((pos | iov_iter_alignment(iter)) &
+	    (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
+	dio = container_of(bio, struct blkdev_dio, bio);
+	__bio_set_dev(bio, bdev);
+	bio->bi_iter.bi_sector = pos >> 9;
+	bio->bi_write_hint = iocb->ki_hint;
+	bio->bi_end_io = blkdev_bio_end_io_async;
+	bio->bi_ioprio = iocb->ki_ioprio;
+	dio->flags = 0;
+	dio->iocb = iocb;
+
+	ret = bio_iov_iter_get_pages(bio, iter);
+	if (unlikely(ret)) {
+		bio->bi_status = BLK_STS_IOERR;
+		bio_endio(bio);
+		return BLK_STS_IOERR;
+	}
+	dio->size = bio->bi_iter.bi_size;
+
+	if (iov_iter_rw(iter) == READ) {
+		bio->bi_opf = REQ_OP_READ;
+		if (iter_is_iovec(iter)) {
+			dio->flags |= DIO_SHOULD_DIRTY;
+			bio_set_pages_dirty(bio);
+		}
+	} else {
+		bio->bi_opf = dio_bio_write_op(iocb);
+		task_io_account_write(bio->bi_iter.bi_size);
+	}
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		bio->bi_opf |= REQ_NOWAIT;
+	/*
+	 * Don't plug for HIPRI/polled IO, as those should go straight
+	 * to issue
+	 */
+	if (iocb->ki_flags & IOCB_HIPRI) {
+		bio_set_polled(bio, iocb);
+		submit_bio(bio);
+		WRITE_ONCE(iocb->private, bio);
+	} else {
+		submit_bio(bio);
+	}
+	return -EIOCBQUEUED;
+}
+
 static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	unsigned int nr_pages;
@@ -313,9 +395,11 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		return 0;
 
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
-	if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_VECS)
-		return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
-
+	if (likely(nr_pages <= BIO_MAX_VECS)) {
+		if (is_sync_kiocb(iocb))
+			return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
+		return __blkdev_direct_IO_async(iocb, iter, nr_pages);
+	}
 	return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages));
 }
 
-- 
2.33.1



* [PATCH 13/16] block: add async version of bio_set_polled
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (11 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 12/16] block: add single bio async direct IO helper Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:37   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 14/16] block: skip advance when async and not needed Pavel Begunkov
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

If we know that an iocb is async, we can optimise bio_set_polled() a
bit; add a new helper, bio_set_polled_async().
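
For comparison, the existing helper needs an is_sync_kiocb() branch
(roughly, per its current definition in bio.h):

	static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
	{
		bio->bi_opf |= REQ_POLLED;
		if (!is_sync_kiocb(kiocb))
			bio->bi_opf |= REQ_NOWAIT;
	}

For an iocb known to be async the branch is dead, so both flags can be
set unconditionally.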

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/fops.c        | 6 +++---
 include/linux/bio.h | 5 +++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 0f1332374756..ee27ffbdd018 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -371,17 +371,17 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		task_io_account_write(bio->bi_iter.bi_size);
 	}
 
-	if (iocb->ki_flags & IOCB_NOWAIT)
-		bio->bi_opf |= REQ_NOWAIT;
 	/*
 	 * Don't plug for HIPRI/polled IO, as those should go straight
 	 * to issue
 	 */
 	if (iocb->ki_flags & IOCB_HIPRI) {
-		bio_set_polled(bio, iocb);
+		bio_set_polled_async(bio, iocb);
 		submit_bio(bio);
 		WRITE_ONCE(iocb->private, bio);
 	} else {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			bio->bi_opf |= REQ_NOWAIT;
 		submit_bio(bio);
 	}
 	return -EIOCBQUEUED;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0ab4fa2c89c3..4043e0774b89 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -743,6 +743,11 @@ static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
 		bio->bi_opf |= REQ_NOWAIT;
 }
 
+static inline void bio_set_polled_async(struct bio *bio, struct kiocb *kiocb)
+{
+	bio->bi_opf |= REQ_POLLED | REQ_NOWAIT;
+}
+
 struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
 
 #endif /* __LINUX_BIO_H */
-- 
2.33.1



* [PATCH 14/16] block: skip advance when async and not needed
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (12 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 13/16] block: add async version of bio_set_polled Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:41   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 15/16] block: optimise blk_may_split for normal rw Pavel Begunkov
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Nobody cares about the iov iterator's state if we return -EIOCBQUEUED,
so now that we have __blkdev_direct_IO_async(), which gets pages only
once, we can skip the expensive iov_iter_advance(). It accounts for
around 1-2% of all CPU time spent.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c         | 13 ++++++++-----
 block/fops.c        |  2 +-
 include/linux/bio.h |  9 ++++++++-
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 46a87c72d2b4..0ed836e98734 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1058,10 +1058,12 @@ static void __bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio_set_flag(bio, BIO_CLONED);
 }
 
-static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
+static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter,
+			    bool hint_skip_advance)
 {
 	__bio_iov_bvec_set(bio, iter);
-	iov_iter_advance(iter, iter->count);
+	if (!hint_skip_advance)
+		iov_iter_advance(iter, iter->count);
 	return 0;
 }
 
@@ -1212,14 +1214,15 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * It's intended for direct IO, so doesn't do PSI tracking, the caller is
  * responsible for setting BIO_WORKINGSET if necessary.
  */
-int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+int bio_iov_iter_get_pages_hint(struct bio *bio, struct iov_iter *iter,
+				bool hint_skip_advance)
 {
 	int ret = 0;
 
 	if (iov_iter_is_bvec(iter)) {
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
 			return bio_iov_bvec_set_append(bio, iter);
-		return bio_iov_bvec_set(bio, iter);
+		return bio_iov_bvec_set(bio, iter, hint_skip_advance);
 	}
 
 	do {
@@ -1233,7 +1236,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	bio_clear_flag(bio, BIO_WORKINGSET);
 	return bio->bi_vcnt ? 0 : ret;
 }
-EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
+EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages_hint);
 
 static void submit_bio_wait_endio(struct bio *bio)
 {
diff --git a/block/fops.c b/block/fops.c
index ee27ffbdd018..d4c770c5085b 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -352,7 +352,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 
-	ret = bio_iov_iter_get_pages(bio, iter);
+	ret = bio_iov_iter_get_pages_hint(bio, iter, true);
 	if (unlikely(ret)) {
 		bio->bi_status = BLK_STS_IOERR;
 		bio_endio(bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4043e0774b89..51413fe33720 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -416,7 +416,8 @@ int bio_add_zone_append_page(struct bio *bio, struct page *page,
 			     unsigned int len, unsigned int offset);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
-int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+int bio_iov_iter_get_pages_hint(struct bio *bio, struct iov_iter *iter,
+				bool hint_skip_advance);
 void __bio_release_pages(struct bio *bio, bool mark_dirty);
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
@@ -428,6 +429,12 @@ extern void bio_free_pages(struct bio *bio);
 void guard_bio_eod(struct bio *bio);
 void zero_fill_bio(struct bio *bio);
 
+static inline int bio_iov_iter_get_pages(struct bio *bio,
+					 struct iov_iter *iter)
+{
+	return bio_iov_iter_get_pages_hint(bio, iter, false);
+}
+
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
 	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
-- 
2.33.1



* [PATCH 15/16] block: optimise blk_may_split for normal rw
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (13 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 14/16] block: skip advance when async and not needed Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:25   ` Christoph Hellwig
  2021-10-19 21:24 ` [PATCH 16/16] block: optimise submit_bio_checks " Pavel Begunkov
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Read/write/flush are the most common operations, so optimise the switch
in blk_may_split() for these cases. All three added conditions are
compiled into a single comparison, as the corresponding REQ_OP_* values
take 0-2.
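
Concretely, assuming the current enum values REQ_OP_READ == 0,
REQ_OP_WRITE == 1 and REQ_OP_FLUSH == 2, the added test

	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH)

folds into a single unsigned comparison, effectively:

	if (op > REQ_OP_FLUSH)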

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk.h | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 6a039e6c7d07..0bf00e96e1f0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -269,14 +269,18 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
 
 static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 {
-	switch (bio_op(bio)) {
-	case REQ_OP_DISCARD:
-	case REQ_OP_SECURE_ERASE:
-	case REQ_OP_WRITE_ZEROES:
-	case REQ_OP_WRITE_SAME:
-		return true; /* non-trivial splitting decisions */
-	default:
-		break;
+	unsigned int op = bio_op(bio);
+
+	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
+		switch (op) {
+		case REQ_OP_DISCARD:
+		case REQ_OP_SECURE_ERASE:
+		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_WRITE_SAME:
+			return true; /* non-trivial splitting decisions */
+		default:
+			break;
+		}
 	}
 
 	/*
-- 
2.33.1



* [PATCH 16/16] block: optimise submit_bio_checks for normal rw
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (14 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 15/16] block: optimise blk_may_split for normal rw Pavel Begunkov
@ 2021-10-19 21:24 ` Pavel Begunkov
  2021-10-20  6:26   ` Christoph Hellwig
  2021-10-19 23:31 ` [PATCH 00/16] block optimisation round Jens Axboe
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-19 21:24 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Pavel Begunkov

Optimise the switch in submit_bio_checks() for reads, writes and
flushes. REQ_OP_READ/WRITE/FLUSH take numbers from 0 to 2, so the added
checks are compiled into a single condition:

if (op <= REQ_OP_FLUSH) {} else { switch() ... };

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/blk-core.c | 74 +++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 35 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 52019b8a1487..7ba8f53a8340 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -776,6 +776,7 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	struct request_queue *q = bdev_get_queue(bdev);
 	blk_status_t status = BLK_STS_IOERR;
 	struct blk_plug *plug;
+	unsigned op;
 
 	might_sleep();
 
@@ -817,41 +818,44 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		bio_clear_polled(bio);
 
-	switch (bio_op(bio)) {
-	case REQ_OP_DISCARD:
-		if (!blk_queue_discard(q))
-			goto not_supported;
-		break;
-	case REQ_OP_SECURE_ERASE:
-		if (!blk_queue_secure_erase(q))
-			goto not_supported;
-		break;
-	case REQ_OP_WRITE_SAME:
-		if (!q->limits.max_write_same_sectors)
-			goto not_supported;
-		break;
-	case REQ_OP_ZONE_APPEND:
-		status = blk_check_zone_append(q, bio);
-		if (status != BLK_STS_OK)
-			goto end_io;
-		break;
-	case REQ_OP_ZONE_RESET:
-	case REQ_OP_ZONE_OPEN:
-	case REQ_OP_ZONE_CLOSE:
-	case REQ_OP_ZONE_FINISH:
-		if (!blk_queue_is_zoned(q))
-			goto not_supported;
-		break;
-	case REQ_OP_ZONE_RESET_ALL:
-		if (!blk_queue_is_zoned(q) || !blk_queue_zone_resetall(q))
-			goto not_supported;
-		break;
-	case REQ_OP_WRITE_ZEROES:
-		if (!q->limits.max_write_zeroes_sectors)
-			goto not_supported;
-		break;
-	default:
-		break;
+	op = bio_op(bio);
+	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
+		switch (op) {
+		case REQ_OP_DISCARD:
+			if (!blk_queue_discard(q))
+				goto not_supported;
+			break;
+		case REQ_OP_SECURE_ERASE:
+			if (!blk_queue_secure_erase(q))
+				goto not_supported;
+			break;
+		case REQ_OP_WRITE_SAME:
+			if (!q->limits.max_write_same_sectors)
+				goto not_supported;
+			break;
+		case REQ_OP_ZONE_APPEND:
+			status = blk_check_zone_append(q, bio);
+			if (status != BLK_STS_OK)
+				goto end_io;
+			break;
+		case REQ_OP_ZONE_RESET:
+		case REQ_OP_ZONE_OPEN:
+		case REQ_OP_ZONE_CLOSE:
+		case REQ_OP_ZONE_FINISH:
+			if (!blk_queue_is_zoned(q))
+				goto not_supported;
+			break;
+		case REQ_OP_ZONE_RESET_ALL:
+			if (!blk_queue_is_zoned(q) || !blk_queue_zone_resetall(q))
+				goto not_supported;
+			break;
+		case REQ_OP_WRITE_ZEROES:
+			if (!q->limits.max_write_zeroes_sectors)
+				goto not_supported;
+			break;
+		default:
+			break;
+		}
 	}
 
 	/*
-- 
2.33.1



* Re: [PATCH 04/16] block: don't bloat enter_queue with percpu_ref
  2021-10-19 21:24 ` [PATCH 04/16] block: don't bloat enter_queue with percpu_ref Pavel Begunkov
@ 2021-10-19 22:32   ` Chaitanya Kulkarni
  2021-10-20  6:12   ` Christoph Hellwig
  1 sibling, 0 replies; 57+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-19 22:32 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block; +Cc: Jens Axboe

On 10/19/21 2:24 PM, Pavel Begunkov wrote:
> percpu_ref_put() is inlined for performance and bloats the binary. We
> don't care about the failure case of blk_try_enter_queue(), so we can
> replace the inlined put with a call to blk_queue_exit().
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>



* Re: [PATCH 02/16] block: convert leftovers to bdev_get_queue
  2021-10-19 21:24 ` [PATCH 02/16] block: convert leftovers to bdev_get_queue Pavel Begunkov
@ 2021-10-19 22:34   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 57+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-19 22:34 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block; +Cc: Jens Axboe

On 10/19/21 2:24 PM, Pavel Begunkov wrote:
> Convert the remaining bdev->bd_disk->queue uses to bdev_get_queue(),
> which is faster. Apparently, a few such spots in block were missed
> during rebases.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>




* Re: [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-19 21:24 ` [PATCH 07/16] blocK: move plug flush functions to blk-mq.c Pavel Begunkov
@ 2021-10-19 22:34   ` Chaitanya Kulkarni
  2021-10-20  6:17   ` Christoph Hellwig
  1 sibling, 0 replies; 57+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-19 22:34 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block; +Cc: Jens Axboe

On 10/19/21 2:24 PM, Pavel Begunkov wrote:
> Flushing is tightly coupled with blk-mq, and almost all
> blk_flush_plug_list() callees are in blk-mq.c. So move the whole thing
> there, so that the compiler can inline and apply more optimisations.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>





* Re: [PATCH 11/16] block: add optimised version bio_set_dev()
  2021-10-19 21:24 ` [PATCH 11/16] block: add optimised version bio_set_dev() Pavel Begunkov
@ 2021-10-19 22:36   ` Chaitanya Kulkarni
  2021-10-20  6:20   ` Christoph Hellwig
  1 sibling, 0 replies; 57+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-19 22:36 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block; +Cc: Jens Axboe

On 10/19/21 2:24 PM, Pavel Begunkov wrote:
> If a bio was just allocated, its flags should be zero and there is no
> need to clear them. Add __bio_set_dev(), which is faster and doesn't
> care about clearing flags.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>





* Re: [PATCH 00/16] block optimisation round
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (15 preceding siblings ...)
  2021-10-19 21:24 ` [PATCH 16/16] block: optimise submit_bio_checks " Pavel Begunkov
@ 2021-10-19 23:31 ` Jens Axboe
  2021-10-20  0:21 ` Jens Axboe
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2021-10-19 23:31 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block

On 10/19/21 3:24 PM, Pavel Begunkov wrote:
> Jens tried out a similar series with some additions that haven't been
> sent yet: 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.
> 
> 12/16 is bulky, but it nicely drives the numbers. Moreover, with
> it we can get rid of some no-longer-used optimisations in
> __blkdev_direct_IO(), because it then always serves multiple bios.
> E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
> and it can _probably_ be converted to chained bios.

These look good to me, only the trivial comment that patch 7
has a blocK instead of block.

I'll await further comments/reviews on these.

-- 
Jens Axboe



* Re: [PATCH 00/16] block optimisation round
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (16 preceding siblings ...)
  2021-10-19 23:31 ` [PATCH 00/16] block optimisation round Jens Axboe
@ 2021-10-20  0:21 ` Jens Axboe
  2021-10-20  0:22   ` Jens Axboe
  2021-10-20 14:12 ` (subset) " Jens Axboe
  2021-10-20 14:54 ` Pavel Begunkov
  19 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2021-10-20  0:21 UTC (permalink / raw)
  To: linux-block, Pavel Begunkov; +Cc: Jens Axboe

On Tue, 19 Oct 2021 22:24:09 +0100, Pavel Begunkov wrote:
> Jens tried out a similar series with some additions that haven't been
> sent yet: 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.
> 
> 12/16 is bulky, but it nicely drives the numbers. Moreover, with
> it we can get rid of some no-longer-used optimisations in
> __blkdev_direct_IO(), because it then always serves multiple bios.
> E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
> and it can _probably_ be converted to chained bios.
> 
> [...]

Applied, thanks!

[01/16] block: turn macro helpers into inline functions
        (no commit info)
[02/16] block: convert leftovers to bdev_get_queue
        (no commit info)
[03/16] block: optimise req_bio_endio()
        (no commit info)
[04/16] block: don't bloat enter_queue with percpu_ref
        (no commit info)
[05/16] block: inline a part of bio_release_pages()
        (no commit info)
[06/16] block: clean up blk_mq_submit_bio() merging
        (no commit info)
[07/16] blocK: move plug flush functions to blk-mq.c
        (no commit info)
[08/16] block: optimise blk_flush_plug_list
        (no commit info)
[09/16] block: optimise boundary blkdev_read_iter's checks
        (no commit info)
[10/16] block: optimise blkdev_bio_end_io()
        (no commit info)
[11/16] block: add optimised version bio_set_dev()
        (no commit info)
[12/16] block: add single bio async direct IO helper
        (no commit info)
[13/16] block: add async version of bio_set_polled
        (no commit info)
[14/16] block: skip advance when async and not needed
        (no commit info)
[15/16] block: optimise blk_may_split for normal rw
        (no commit info)
[16/16] block: optimise submit_bio_checks for normal rw
        (no commit info)

Best regards,
-- 
Jens Axboe




* Re: [PATCH 00/16] block optimisation round
  2021-10-20  0:21 ` Jens Axboe
@ 2021-10-20  0:22   ` Jens Axboe
  0 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2021-10-20  0:22 UTC (permalink / raw)
  To: linux-block, Pavel Begunkov

On 10/19/21 6:21 PM, Jens Axboe wrote:
> On Tue, 19 Oct 2021 22:24:09 +0100, Pavel Begunkov wrote:
>> Jens tried out a similar series with some additions that haven't been
>> sent yet: 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.
>>
>> 12/16 is bulky, but it nicely drives the numbers. Moreover, with
>> it we can get rid of some no-longer-used optimisations in
>> __blkdev_direct_IO(), because it then always serves multiple bios.
>> E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
>> and it can _probably_ be converted to chained bios.
>>
>> [...]
> 
> Applied, thanks!
> 
> [01/16] block: turn macro helpers into inline functions
>         (no commit info)
> [02/16] block: convert leftovers to bdev_get_queue
>         (no commit info)
> [03/16] block: optimise req_bio_endio()
>         (no commit info)
> [04/16] block: don't bloat enter_queue with percpu_ref
>         (no commit info)
> [05/16] block: inline a part of bio_release_pages()
>         (no commit info)
> [06/16] block: clean up blk_mq_submit_bio() merging
>         (no commit info)
> [07/16] blocK: move plug flush functions to blk-mq.c
>         (no commit info)
> [08/16] block: optimise blk_flush_plug_list
>         (no commit info)
> [09/16] block: optimise boundary blkdev_read_iter's checks
>         (no commit info)
> [10/16] block: optimise blkdev_bio_end_io()
>         (no commit info)
> [11/16] block: add optimised version bio_set_dev()
>         (no commit info)
> [12/16] block: add single bio async direct IO helper
>         (no commit info)
> [13/16] block: add async version of bio_set_polled
>         (no commit info)
> [14/16] block: skip advance when async and not needed
>         (no commit info)
> [15/16] block: optimise blk_may_split for normal rw
>         (no commit info)
> [16/16] block: optimise submit_bio_checks for normal rw
>         (no commit info)

Sorry, b4 got too eager on that one, they are not applied just yet.

-- 
Jens Axboe



* Re: [PATCH 01/16] block: turn macro helpers into inline functions
  2021-10-19 21:24 ` [PATCH 01/16] block: turn macro helpers into inline functions Pavel Begunkov
@ 2021-10-20  6:09   ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:09 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:10PM +0100, Pavel Begunkov wrote:
> Replace the bio_set_dev() macro with an identical inline helper and
> move it further down to fix a declaration-order dependency on
> bio_associate_blkg(). Do the same for bio_copy_dev().

Looks fine.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 03/16] block: optimise req_bio_endio()
  2021-10-19 21:24 ` [PATCH 03/16] block: optimise req_bio_endio() Pavel Begunkov
@ 2021-10-20  6:11   ` Christoph Hellwig
  2021-10-22  9:58   ` Shinichiro Kawasaki
  1 sibling, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:11 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:12PM +0100, Pavel Begunkov wrote:
> First, get rid of an extra branch and chain the error checks. Also
> reshuffle it with bio_advance(), so it goes closer to the final check;
> with that, the compiler loads rq->rq_flags only once, and also doesn't
> reload bio->bi_iter.bi_size if bio_advance() didn't actually advance
> the iter.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

We should also probably look into killing the whole strange RQF_QUIET
to BIO_QUIET some day.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/16] block: don't bloat enter_queue with percpu_ref
  2021-10-19 21:24 ` [PATCH 04/16] block: don't bloat enter_queue with percpu_ref Pavel Begunkov
  2021-10-19 22:32   ` Chaitanya Kulkarni
@ 2021-10-20  6:12   ` Christoph Hellwig
  2021-10-20 12:08     ` Pavel Begunkov
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:12 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:13PM +0100, Pavel Begunkov wrote:
> percpu_ref_put() is inlined for performance and bloats the binary; we
> don't care about the fail case of blk_try_enter_queue(), so we can
> replace it with a call to blk_queue_exit().

Does this make a difference for you?

That being said using the proper helpers always seems like a good idea,
so:

Reviewed-by: Christoph Hellwig <hch@lst.de>
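
For reference, the change amounts to this minimal sketch; blk_queue_exit()
lives out of line in blk-core.c and wraps the same put:

	/* before: the inlined percpu_ref fast path expanded in the
	 * failure branch of the enter path */
	percpu_ref_put(&q->q_usage_counter);

	/* after: one plain out-of-line call */
	blk_queue_exit(q);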

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/16] block: inline a part of bio_release_pages()
  2021-10-19 21:24 ` [PATCH 05/16] block: inline a part of bio_release_pages() Pavel Begunkov
@ 2021-10-20  6:13   ` Christoph Hellwig
  2021-10-20 12:19     ` Pavel Begunkov
  2021-10-20 14:15     ` Jens Axboe
  0 siblings, 2 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:13 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:14PM +0100, Pavel Begunkov wrote:
> Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function
> call.

Fine with me, but it seems like we're really getting into benchmarketing
at some point..
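
For reference, the resulting shape is essentially this minimal sketch,
with the flag test moved into the header and the slow path kept out of
line:

	/* include/linux/bio.h, per the patch */
	static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
	{
		if (!bio_flagged(bio, BIO_NO_PAGE_REF))
			__bio_release_pages(bio, mark_dirty);
	}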

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/16] block: clean up blk_mq_submit_bio() merging
  2021-10-19 21:24 ` [PATCH 06/16] block: clean up blk_mq_submit_bio() merging Pavel Begunkov
@ 2021-10-20  6:16   ` Christoph Hellwig
  2021-10-20 12:20     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:16 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:15PM +0100, Pavel Begunkov wrote:
> Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a
> common if, so we don't check it twice. Also honor bio_mergeable() for
> blk_mq_sched_bio_merge().
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  block/blk-mq-sched.c |  2 +-
>  block/blk-mq-sched.h | 12 +-----------
>  block/blk-mq.c       | 13 +++++++------
>  3 files changed, 9 insertions(+), 18 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index e85b7556b096..5b259fdea794 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -361,7 +361,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  	}
>  }
>  
> -bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
> +bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
>  		unsigned int nr_segs)
>  {
>  	struct elevator_queue *e = q->elevator;
> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index 98836106b25f..25d1034952b6 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -12,7 +12,7 @@ void blk_mq_sched_assign_ioc(struct request *rq);
>  
>  bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
>  		unsigned int nr_segs, struct request **merged_request);
> -bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
> +bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
>  		unsigned int nr_segs);
>  bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
>  				   struct list_head *free);
> @@ -42,16 +42,6 @@ static inline bool bio_mergeable(struct bio *bio)
>  	return !(bio->bi_opf & REQ_NOMERGE_FLAGS);
>  }
>  
> -static inline bool
> -blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
> -		unsigned int nr_segs)
> -{
> -	if (blk_queue_nomerges(q) || !bio_mergeable(bio))
> -		return false;
> -
> -	return __blk_mq_sched_bio_merge(q, bio, nr_segs);
> -}
> -
>  static inline bool
>  blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>  			 struct bio *bio)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bab1fccda6ca..218bfaa98591 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2482,12 +2482,13 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (!bio_integrity_prep(bio))
>  		goto queue_exit;
>  
> -	if (!is_flush_fua && !blk_queue_nomerges(q) &&
> -	    blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
> -		goto queue_exit;
> -
> -	if (blk_mq_sched_bio_merge(q, bio, nr_segs))
> -		goto queue_exit;
> +	if (!blk_queue_nomerges(q) && bio_mergeable(bio)) {

bio_mergeable just checks REQ_NOMERGE_FLAGS, which includes
REQ_PREFLUSH and REQ_FUA...

> +		if (!is_flush_fua &&

... so this is not needed.
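
For context, REQ_NOMERGE_FLAGS at this point expands to roughly:

	/* include/linux/blk_types.h */
	#define REQ_NOMERGE_FLAGS \
		(REQ_NOMERGE | REQ_PREFLUSH | REQ_FUA)

so a flush/FUA bio already fails the bio_mergeable() check above.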

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-19 21:24 ` [PATCH 07/16] blocK: move plug flush functions to blk-mq.c Pavel Begunkov
  2021-10-19 22:34   ` Chaitanya Kulkarni
@ 2021-10-20  6:17   ` Christoph Hellwig
  2021-10-20 12:23     ` Pavel Begunkov
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:17 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

Spelling error in the subject.

On Tue, Oct 19, 2021 at 10:24:16PM +0100, Pavel Begunkov wrote:
> Flushing is tightly coupled with blk-mq and almost all
> blk_flush_plug_list() callees are in blk-mq.c. So move the whole thing
> there so the compiler is able to apply more optimisations and inline more.

No, it isn't.  The whole callback handling is all about bio based
drivers.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/16] block: add optimised version bio_set_dev()
  2021-10-19 21:24 ` [PATCH 11/16] block: add optimised version bio_set_dev() Pavel Begunkov
  2021-10-19 22:36   ` Chaitanya Kulkarni
@ 2021-10-20  6:20   ` Christoph Hellwig
  2021-10-20 12:29     ` Pavel Begunkov
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:20 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:20PM +0100, Pavel Begunkov wrote:
> If a bio was just allocated, its flags should be zero and there is no
> need to clear them. Add __bio_set_dev(), which is faster and doesn't
> care about clearing flags.

This is entirely the wrong way around.  Please add a new bio_reset_dev
for the no more than about two hand full callers that actually had
a device set and just optimize bio_set_dev.
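
A minimal sketch of that direction (not the patch as posted; flag
handling per the existing helper):

	/* common case: a freshly allocated bio, flags are already zero */
	static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
	{
		bio->bi_bdev = bdev;
		bio_associate_blkg(bio);
	}

	/* for the few callers that re-target an already-used bio */
	static inline void bio_reset_dev(struct bio *bio, struct block_device *bdev)
	{
		bio_clear_flag(bio, BIO_REMAPPED);
		if (bio->bi_bdev != bdev)
			bio_clear_flag(bio, BIO_THROTTLED);
		bio_set_dev(bio, bdev);
	}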

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 15/16] block: optimise blk_may_split for normal rw
  2021-10-19 21:24 ` [PATCH 15/16] block: optimise blk_may_split for normal rw Pavel Begunkov
@ 2021-10-20  6:25   ` Christoph Hellwig
  2021-10-20 13:38     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:25 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:24PM +0100, Pavel Begunkov wrote:
> +	unsigned int op = bio_op(bio);
> +
> +	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
> +		switch (op) {
> +		case REQ_OP_DISCARD:
> +		case REQ_OP_SECURE_ERASE:
> +		case REQ_OP_WRITE_ZEROES:
> +		case REQ_OP_WRITE_SAME:
> +			return true; /* non-trivial splitting decisions */
> +		default:
> +			break;
> +		}

Nesting the if and the switch is too ugly to live.  If you want ifs do
just them.  But I'd really like to see numbers for this, also compared
to explicitly checking for REQ_OP_READ and REQ_OP_WRITE and maybe using
__builtin_expect for those values.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 16/16] block: optimise submit_bio_checks for normal rw
  2021-10-19 21:24 ` [PATCH 16/16] block: optimise submit_bio_checks " Pavel Begunkov
@ 2021-10-20  6:26   ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:26 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:25PM +0100, Pavel Begunkov wrote:
> Optimise the switch in submit_bio_checks() for reads, writes and
> flushes. REQ_OP_READ/WRITE/FLUSH take numbers from 0 to 2, so the added
> checks are compiled into a single condition:
> 
> if (op <= REQ_OP_FLUSH) {} else { switch() ... };

Same comment as for the previous one.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/16] block: optimise blk_flush_plug_list
  2021-10-19 21:24 ` [PATCH 08/16] block: optimise blk_flush_plug_list Pavel Begunkov
@ 2021-10-20  6:29   ` Christoph Hellwig
  2021-10-20 12:26     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:29 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:17PM +0100, Pavel Begunkov wrote:
> First, don't init a callback list if there are no plug callbacks. Also,
> replace the internals of the function with a do-while loop.

So the check to not call into the callback code when there are none,
which is the usual case, totally makes sense.  But what is the point of
the rest?
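
A minimal sketch of that first part, assuming the function otherwise
keeps its current structure:

	void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
	{
		/* the common case has no plug callbacks registered */
		if (!list_empty(&plug->cb_list))
			flush_plug_callbacks(plug, from_schedule);
		blk_mq_flush_plug_list(plug, from_schedule);
	}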

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks
  2021-10-19 21:24 ` [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks Pavel Begunkov
@ 2021-10-20  6:29   ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:29 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:18PM +0100, Pavel Begunkov wrote:
> Combine pos and len checks and mark unlikely. Also, don't reexpand if
> it's not truncated.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
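
One plausible reading of the combined check, as a hedged sketch (local
variable names assumed):

	size_t count = iov_iter_count(to);
	size_t shorted = 0;

	if (unlikely(pos + count > size)) {
		if (pos >= size)
			return 0;
		size -= pos;
		if (count > size) {
			shorted = count - size;
			iov_iter_truncate(to, size);
		}
	}
	ret = generic_file_read_iter(iocb, to);
	/* only pay for the reexpand if we actually truncated */
	if (unlikely(shorted))
		iov_iter_reexpand(to, iov_iter_count(to) + shorted);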

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/16] block: optimise blkdev_bio_end_io()
  2021-10-19 21:24 ` [PATCH 10/16] block: optimise blkdev_bio_end_io() Pavel Begunkov
@ 2021-10-20  6:30   ` Christoph Hellwig
  2021-10-20 12:29     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:30 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:19PM +0100, Pavel Begunkov wrote:
> Save dio->flags in a variable, so it doesn't reload it a bunch of times.
> Also cache iocb in a variable for the same reason.

Same question again, does this really make a difference?  We really
shouldn't have to try to work around the compiler like this (even if
this is relatively harmless in the end).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/16] block: add single bio async direct IO helper
  2021-10-19 21:24 ` [PATCH 12/16] block: add single bio async direct IO helper Pavel Begunkov
@ 2021-10-20  6:36   ` Christoph Hellwig
  2021-10-20 12:35     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:36 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:21PM +0100, Pavel Begunkov wrote:
> +	bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
> +	dio = container_of(bio, struct blkdev_dio, bio);
> +	__bio_set_dev(bio, bdev);
> +	bio->bi_iter.bi_sector = pos >> 9;

SECTOR_SHIFT.

> +	bio->bi_write_hint = iocb->ki_hint;
> +	bio->bi_end_io = blkdev_bio_end_io_async;
> +	bio->bi_ioprio = iocb->ki_ioprio;
> +	dio->flags = 0;
> +	dio->iocb = iocb;
> +
> +	ret = bio_iov_iter_get_pages(bio, iter);
> +	if (unlikely(ret)) {
> +		bio->bi_status = BLK_STS_IOERR;
> +		bio_endio(bio);
> +		return BLK_STS_IOERR;

This function does not return a blk_status_t, so this is wrong (and
sparse should have complained).  I also don't think the error path
here should go through the bio for error handling but just do a put and
return the error.

> +	if (iov_iter_rw(iter) == READ) {
> +		bio->bi_opf = REQ_OP_READ;
> +		if (iter_is_iovec(iter)) {
> +			dio->flags |= DIO_SHOULD_DIRTY;
> +			bio_set_pages_dirty(bio);
> +		}
> +	} else {
> +		bio->bi_opf = dio_bio_write_op(iocb);
> +		task_io_account_write(bio->bi_iter.bi_size);
> +	}
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		bio->bi_opf |= REQ_NOWAIT;

This code is entirely duplicated, please move it into an (inline) helper.

> +	/*
> +	 * Don't plug for HIPRI/polled IO, as those should go straight
> +	 * to issue
> +	 */

This comment seems misplaced as the function does not use plugging at
all.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 13/16] block: add async version of bio_set_polled
  2021-10-19 21:24 ` [PATCH 13/16] block: add async version of bio_set_polled Pavel Begunkov
@ 2021-10-20  6:37   ` Christoph Hellwig
  2021-10-20 12:58     ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:37 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:22PM +0100, Pavel Begunkov wrote:
> If we know that an iocb is async we can optimise bio_set_polled() a bit,
> add a new helper bio_set_polled_async().

This avoids a single if.  Why?  I'm really worried about all these
micro-optimizations that just make the code harder and harder to
maintain.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 14/16] block: skip advance when async and not needed
  2021-10-19 21:24 ` [PATCH 14/16] block: skip advance when async and not needed Pavel Begunkov
@ 2021-10-20  6:41   ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20  6:41 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe

On Tue, Oct 19, 2021 at 10:24:23PM +0100, Pavel Begunkov wrote:
> Nobody cares about the iov iterator's state if we return -EIOCBQUEUED,
> so now that we have __blkdev_direct_IO_async(), which gets pages only
> once, we can skip the expensive iov_iter_advance(). It's around 1-2% of
> all CPU spent.

This is a pretty horrible code structure.  If you do want to optimize
this case just call __bio_iov_bvec_set directly from your fast path for
the iov_iter_is_bvec case.  And please add a big fat comment explaining
it and document in the commit why this matters using actual numbers on
a real workload.
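
I.e., a hedged sketch of the suggested shape, assuming __bio_iov_bvec_set()
gets exposed to block/fops.c:

	if (iov_iter_is_bvec(iter)) {
		/* the caller pins the pages; just point the bio at the bvecs */
		__bio_iov_bvec_set(bio, iter);
		ret = 0;
	} else {
		ret = bio_iov_iter_get_pages(bio, iter);
	}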

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/16] block: don't bloat enter_queue with percpu_ref
  2021-10-20  6:12   ` Christoph Hellwig
@ 2021-10-20 12:08     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:12, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:13PM +0100, Pavel Begunkov wrote:
>> percpu_ref_put() is inlined for performance and bloats the binary; we
>> don't care about the fail case of blk_try_enter_queue(), so we can
>> replace it with a call to blk_queue_exit().

Thanks for going through the series!

> Does this make a difference for you?

It did with a different compiler, extra patches and a different
base, but checking with for-5.16/block the binary size stays the same.


> That being said using the proper helpers always seems like a good idea,
> so:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/16] block: inline a part of bio_release_pages()
  2021-10-20  6:13   ` Christoph Hellwig
@ 2021-10-20 12:19     ` Pavel Begunkov
  2021-10-20 14:15     ` Jens Axboe
  1 sibling, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:13, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:14PM +0100, Pavel Begunkov wrote:
>> Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function
>> call.
> 
> Fine with me, but it seems like we're really getting into benchmarketing
> at some point..

Well, we certainly don't want to get to that point, but at least this one
is in io_uring's hot path, i.e. fixed buffers.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/16] block: clean up blk_mq_submit_bio() merging
  2021-10-20  6:16   ` Christoph Hellwig
@ 2021-10-20 12:20     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:16, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:15PM +0100, Pavel Begunkov wrote:
>> Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a
>> common if, so we don't check it twice. Also honor bio_mergeable() for
>> blk_mq_sched_bio_merge().
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   block/blk-mq-sched.c |  2 +-
>>   block/blk-mq-sched.h | 12 +-----------
>>   block/blk-mq.c       | 13 +++++++------
>>   3 files changed, 9 insertions(+), 18 deletions(-)
>>
[...]
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index bab1fccda6ca..218bfaa98591 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2482,12 +2482,13 @@ void blk_mq_submit_bio(struct bio *bio)
>>   	if (!bio_integrity_prep(bio))
>>   		goto queue_exit;
>>   
>> -	if (!is_flush_fua && !blk_queue_nomerges(q) &&
>> -	    blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
>> -		goto queue_exit;
>> -
>> -	if (blk_mq_sched_bio_merge(q, bio, nr_segs))
>> -		goto queue_exit;
>> +	if (!blk_queue_nomerges(q) && bio_mergeable(bio)) {
> 
> bio_mergeable just checks REQ_NOMERGE_FLAGS, which includes
> REQ_PREFLUSH and REQ_FUA...
> 
>> +		if (!is_flush_fua &&
> 
> ... so this is not needed.

Missed it, thanks. I also need to update the commit message then.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-20  6:17   ` Christoph Hellwig
@ 2021-10-20 12:23     ` Pavel Begunkov
  2021-10-20 12:37       ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:17, Christoph Hellwig wrote:
> Spelling error in the subject.
> 
> On Tue, Oct 19, 2021 at 10:24:16PM +0100, Pavel Begunkov wrote:
>> Flushing is tightly coupled with blk-mq and almost all
>> blk_flush_plug_list() callees are in blk-mq.c. So move the whole thing
>> there so the compiler is able to apply more optimisations and inline more.
> 
> No, it isn't.  The whole callback handling is all about bio based
> drivers.

How about leaving flush_plug_callbacks() in blk-core.c but moving
everything else?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/16] block: optimise blk_flush_plug_list
  2021-10-20  6:29   ` Christoph Hellwig
@ 2021-10-20 12:26     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:29, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:17PM +0100, Pavel Begunkov wrote:
>> First, don't init a callback list if there are no plug callbacks. Also,
>> replace the internals of the function with a do-while loop.
> 
> So the check to not call into the callback code when there are none,
> which is the usual case, totally makes sense.  But what is the point of
> the rest?

I don't care much about that part, can leave it alone, especially if
blk_flush_plug_list() stays in blk-core.c. I'll likely squash this
patch into the previous one.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/16] block: optimise blkdev_bio_end_io()
  2021-10-20  6:30   ` Christoph Hellwig
@ 2021-10-20 12:29     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:30, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:19PM +0100, Pavel Begunkov wrote:
>> Save dio->flags in a variable, so it doesn't reload it a bunch of times.
>> Also cache iocb in a variable for the same reason.
> 
> Same question again, does this really make a difference?  We really

No, will remove the patch

> shouldn't have to try to work around the compiler like this (even if
> this is relatively harmless in the end).


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/16] block: add optimised version bio_set_dev()
  2021-10-20  6:20   ` Christoph Hellwig
@ 2021-10-20 12:29     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:20, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:20PM +0100, Pavel Begunkov wrote:
>> If a bio was just allocated, its flags should be zero and there is no
>> need to clear them. Add __bio_set_dev(), which is faster and doesn't
>> care about clearing flags.
> 
> This is entirely the wrong way around.  Please add a new bio_reset_dev
> for the no more than about two hand full callers that actually had
> a device set and just optimize bio_set_dev.

makes sense, thanks

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/16] block: add single bio async direct IO helper
  2021-10-20  6:36   ` Christoph Hellwig
@ 2021-10-20 12:35     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:36, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:21PM +0100, Pavel Begunkov wrote:
>> +	bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
>> +	dio = container_of(bio, struct blkdev_dio, bio);
>> +	__bio_set_dev(bio, bdev);
>> +	bio->bi_iter.bi_sector = pos >> 9;
> 
> SECTOR_SHIFT.
> 
>> +	bio->bi_write_hint = iocb->ki_hint;
>> +	bio->bi_end_io = blkdev_bio_end_io_async;
>> +	bio->bi_ioprio = iocb->ki_ioprio;
>> +	dio->flags = 0;
>> +	dio->iocb = iocb;
>> +
>> +	ret = bio_iov_iter_get_pages(bio, iter);
>> +	if (unlikely(ret)) {
>> +		bio->bi_status = BLK_STS_IOERR;
>> +		bio_endio(bio);
>> +		return BLK_STS_IOERR;
> 
> This function does not return a blk_status_t, so this is wrong (and
> sparse should have complained).  I also don't think the error path
> here should go hrough the bio for error handling but just do a put and
> return the error.

My bad; following __blkdev_direct_IO(), it was intended to be
blk_status_to_errno(BLK_STS_IOERR), but just returning the error is
much better.

> 
>> +	if (iov_iter_rw(iter) == READ) {
>> +		bio->bi_opf = REQ_OP_READ;
>> +		if (iter_is_iovec(iter)) {
>> +			dio->flags |= DIO_SHOULD_DIRTY;
>> +			bio_set_pages_dirty(bio);
>> +		}
>> +	} else {
>> +		bio->bi_opf = dio_bio_write_op(iocb);
>> +		task_io_account_write(bio->bi_iter.bi_size);
>> +	}
>> +
>> +	if (iocb->ki_flags & IOCB_NOWAIT)
>> +		bio->bi_opf |= REQ_NOWAIT;
> 
> This code is entirely duplicated, please move it into an (inline) helper.

I'll try it out, thanks
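
Perhaps something like this hypothetical helper (name and placement
assumed; it folds only the opf computation, not the dirty/accounting
side effects):

	static unsigned int dio_bio_op(struct kiocb *iocb, struct iov_iter *iter)
	{
		unsigned int op = (iov_iter_rw(iter) == READ) ?
				  REQ_OP_READ : dio_bio_write_op(iocb);

		if (iocb->ki_flags & IOCB_NOWAIT)
			op |= REQ_NOWAIT;
		return op;
	}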

>> +	/*
>> +	 * Don't plug for HIPRI/polled IO, as those should go straight
>> +	 * to issue
>> +	 */
> 
> This comment seems misplaced as the function does not use plugging at
> all.

will kill it


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-20 12:23     ` Pavel Begunkov
@ 2021-10-20 12:37       ` Christoph Hellwig
  2021-10-20 13:18         ` Pavel Begunkov
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20 12:37 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Christoph Hellwig, linux-block, Jens Axboe

On Wed, Oct 20, 2021 at 01:23:05PM +0100, Pavel Begunkov wrote:
> How about leaving flush_plug_callbacks() in blk-core.c but moving
> everything else?

That's at least a little better.  I'd still prefer to keep the wrappers
out as well.  I've been wondering a bit if the whole callback handling
needs a rework.  Let me think about this a bit more, maybe I'll have
patches later today.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 13/16] block: add async version of bio_set_polled
  2021-10-20  6:37   ` Christoph Hellwig
@ 2021-10-20 12:58     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 12:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:37, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:22PM +0100, Pavel Begunkov wrote:
>> If we know that an iocb is async we can optimise bio_set_polled() a bit,
>> add a new helper bio_set_polled_async().
> 
> This avoids a single if.  Why?  I'm really worried about all these
> micro-optimizations that just make the code harder and harder to
> maintain.

Not really just one: it also moves the REQ_NOWAIT setting, and alias
analysis doesn't work well here, e.g. the compiler can't conclude
anything about the relation b/w @iocb and @bio, so it can do nothing
but store/load to/from memory.
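
For reference, the two helpers side by side (a sketch; the exact
signature of the async variant may differ in the patch):

	static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
	{
		bio->bi_opf |= REQ_POLLED;
		if (!is_sync_kiocb(kiocb))
			bio->bi_opf |= REQ_NOWAIT;
	}

	/* iocb already known to be async: both flags in one store */
	static inline void bio_set_polled_async(struct bio *bio)
	{
		bio->bi_opf |= REQ_POLLED | REQ_NOWAIT;
	}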

Assembly related to that HIPRI path before:

# block/fops.c:378: 	if (iocb->ki_flags & IOCB_NOWAIT)
	movl	32(%rbx), %eax	# iocb_31(D)->ki_flags, _20
# block/fops.c:378: 	if (iocb->ki_flags & IOCB_NOWAIT)
	testb	$8, %al	#, _20
	je	.L200	#,
# block/fops.c:379: 		bio->bi_opf |= REQ_NOWAIT;
	orl	$2097152, 16(%r12)	#, bio_40->bi_opf
# block/fops.c:380: 	if (iocb->ki_flags & IOCB_HIPRI) {
	movl	32(%rbx), %eax	# iocb_31(D)->ki_flags, _20
.L200:
# block/fops.c:380: 	if (iocb->ki_flags & IOCB_HIPRI) {
	testb	$1, %al	#, _20
	je	.L201	#,
# ./include/linux/bio.h:748: 	bio->bi_opf |= REQ_POLLED;
	movl	16(%r12), %eax	# bio_40->bi_opf, _73
	movl	%eax, %edx	# _73, tmp138
	orl	$16777216, %edx	#, tmp138
	movl	%edx, 16(%r12)	# tmp138, bio_40->bi_opf
# ./include/linux/bio.h:749: 	if (!is_sync_kiocb(kiocb))
	cmpq	$0, 16(%rbx)	#, iocb_31(D)->ki_complete
	je	.L202	#,
# ./include/linux/bio.h:750: 		bio->bi_opf |= REQ_NOWAIT;
	orl	$18874368, %eax	#, tmp139
	movl	%eax, 16(%r12)	# tmp139, bio_40->bi_opf
.L202:
# block/fops.c:382: 		submit_bio(bio);
	movq	%r12, %rdi	# bio,
	call	submit_bio	#

After:

# block/fops.c:378: 	if (iocb->ki_flags & IOCB_HIPRI) {
	movl	32(%rbx), %eax	# iocb_30(D)->ki_flags, _20
# block/fops.c:378: 	if (iocb->ki_flags & IOCB_HIPRI) {
	testb	$1, %al	#, _20
	je	.L210	#,
.L213:
# ./include/linux/bio.h:755: 	bio->bi_opf |= REQ_POLLED | REQ_NOWAIT;
	orl	$18874368, 16(%r12)	#, bio_39->bi_opf
# block/fops.c:380: 		submit_bio(bio);
	movq	%r12, %rdi	# bio,
	call	submit_bio	#


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/16] blocK: move plug flush functions to blk-mq.c
  2021-10-20 12:37       ` Christoph Hellwig
@ 2021-10-20 13:18         ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 13:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 13:37, Christoph Hellwig wrote:
> On Wed, Oct 20, 2021 at 01:23:05PM +0100, Pavel Begunkov wrote:
>> How about leaving flush_plug_callbacks() in blk-core.c but moving
>> everything else?
> 
> That's at least a little better.  I'd still prefer to keep the wrappers
> out as well.  I've been wondering a bit if the whole callback handling
> needs a rework.  Let me think about this a bit more, maybe I'll have
> patches later today.

Sure, I'll leave it to you. If you're changing parts around, you can
also fold in the part of 8/16 that avoids the extra list init.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 15/16] block: optimise blk_may_split for normal rw
  2021-10-20  6:25   ` Christoph Hellwig
@ 2021-10-20 13:38     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 13:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Jens Axboe

On 10/20/21 07:25, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:24PM +0100, Pavel Begunkov wrote:
>> +	unsigned int op = bio_op(bio);
>> +
>> +	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
>> +		switch (op) {
>> +		case REQ_OP_DISCARD:
>> +		case REQ_OP_SECURE_ERASE:
>> +		case REQ_OP_WRITE_ZEROES:
>> +		case REQ_OP_WRITE_SAME:
>> +			return true; /* non-trivial splitting decisions */
>> +		default:
>> +			break;
>> +		}
> 
> Nesting the if and the switch is too ugly to live.  If you want ifs do
> just them.  But I'd really like to see numbers for this, also compared
> to epxlicitly checking for REQ_OP_READ and REQ_OP_WRITE and maybe using
> __builtin_expect for those values.

What I want to get from the compiler is:

if (op <= REQ_OP_FLUSH)
	goto after_switch_label;
else switch () { ... }


I was trying to hint at it somehow (gcc 11.1),

(void)__builtin_expect(op <= FLUSH, 1);

No luck, the asm doesn't change. I'm not sure why people don't like
the nesting, so let me ask: which one is better?

1)

if (op == read || op == write ...)
	goto label;
else switch () { }

2)

if (op == read || op == write ...)
	goto label;
else if () ...
else if () ...
else if () ...


For the numbers, I had profiling for the whole series (nullblk):

+    2.82%     2.82%  io_uring  [kernel.vmlinux]    [k] submit_bio_checks
+    2.51%     2.50%  io_uring  [kernel.vmlinux]    [k] submit_bio_checks

The relative % for "after" should grow because of the other
optimisations, so the difference should be _a bit_ larger. Need to
retest.

And some asm (for submit_bio_checks()) for comparison. Before:

# block/blk-core.c:823: 		switch (op) {
	cmpl	$9, %eax	#, op
	je	.L616	#,
	ja	.L617	#,
	cmpl	$5, %eax	#, op
	je	.L618	#,
	cmpl	$7, %eax	#, op
	jne	.L696	#,
	...
.L696:
	cmpl	$3, %eax	#, op
	jne	.L621	#,
	...
.L621 (label after switch)

After:

# block/blk-core.c:822: 	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
	cmpb	$2, %al	#, _18
# block/blk-core.c:822: 	if (op != REQ_OP_READ && op != REQ_OP_WRITE && op != REQ_OP_FLUSH) {
	jbe	.L616	#,
	...
.L616 (label after switch)


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: (subset) [PATCH 00/16] block optimisation round
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (17 preceding siblings ...)
  2021-10-20  0:21 ` Jens Axboe
@ 2021-10-20 14:12 ` Jens Axboe
  2021-10-20 14:54 ` Pavel Begunkov
  19 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2021-10-20 14:12 UTC (permalink / raw)
  To: linux-block, Pavel Begunkov; +Cc: Jens Axboe

On Tue, 19 Oct 2021 22:24:09 +0100, Pavel Begunkov wrote:
> Jens tried out a similar series with some not yet sent additions:
> 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.
> 
> 12/16 is bulky, but it nicely drives the numbers. Moreover, with
> it we can get rid of some no longer used optimisations in
> __blkdev_direct_IO() because it always serves multiple bios.
> E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
> and it can _probably_ be converted to a chained bio.
> 
> [...]

Applied, thanks!

[01/16] block: turn macro helpers into inline functions
        (no commit info)
[02/16] block: convert leftovers to bdev_get_queue
        (no commit info)
[03/16] block: optimise req_bio_endio()
        (no commit info)
[04/16] block: don't bloat enter_queue with percpu_ref
        (no commit info)
[05/16] block: inline a part of bio_release_pages()
        (no commit info)

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/16] block: inline a part of bio_release_pages()
  2021-10-20  6:13   ` Christoph Hellwig
  2021-10-20 12:19     ` Pavel Begunkov
@ 2021-10-20 14:15     ` Jens Axboe
  2021-10-20 17:29       ` Christoph Hellwig
  1 sibling, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2021-10-20 14:15 UTC (permalink / raw)
  To: Christoph Hellwig, Pavel Begunkov; +Cc: linux-block

On 10/20/21 12:13 AM, Christoph Hellwig wrote:
> On Tue, Oct 19, 2021 at 10:24:14PM +0100, Pavel Begunkov wrote:
>> Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function
>> call.
> 
> Fine with me, but it seems like we're really getting into benchmarketing
> at some point..

I just want to be very clear that neither mine nor Pavel's work is
trying to get into benchmarketing. There are very tangible performance
benefits, and they are backed up by code analysis and testing. That
said, the point is of course not to make this any harder to follow or
maintain, but we are doing suboptimal things in various places and those
should get cleaned up, particularly when they impact performance. Are
some a bit eager? Perhaps, but let's not let that cloud the perception
of the effort as a whole.

The fact that 5.15-rc6 will do ~5.5M/core and post these optimizations
we'll be at 9M/core is a solid 60%+ improvement. That's not
benchmarketing, that's just making the stack more efficient.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/16] block optimisation round
  2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
                   ` (18 preceding siblings ...)
  2021-10-20 14:12 ` (subset) " Jens Axboe
@ 2021-10-20 14:54 ` Pavel Begunkov
  19 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-20 14:54 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe, Christoph Hellwig

On 10/19/21 22:24, Pavel Begunkov wrote:
> Jens tried out a similar series with some not yet sent additions:
> 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.
> 
> 12/16 is bulky, but it nicely drives the numbers. Moreover, with
> it we can get rid of some no longer used optimisations in
> __blkdev_direct_IO() because it always serves multiple bios.
> E.g. there is no need for conditional referencing with DIO_MULTI_BIO,
> and it can _probably_ be converted to a chained bio.

Some numbers. Using null_blk is not perfect, but empirically, judging
from the numbers Jens posts, his Optane setup usually gives somewhat
comparable results in terms of % difference (probably divide the
percentage difference by 2 for the worst case).

modprobe null_blk no_sched=1 irqmode=1 completion_nsec=0 submit_queues=16 poll_queues=32
echo 0 > /sys/block/nullb0/queue/iostats
echo 2 > /sys/block/nullb0/queue/nomerges
nice -n -20 taskset -c 0 ./io_uring -d32 -s32 -c32 -p1 -B1 -F1 -b512 /dev/nullb0
# polled=1, fixedbufs=1, register_files=1, buffered=0 QD=32, sq_ring=32, cq_ring=64

# baseline (for-5.16/block)

IOPS=4304768, IOS/call=32/32, inflight=32 (32)
IOPS=4289824, IOS/call=32/32, inflight=32 (32)
IOPS=4227808, IOS/call=32/32, inflight=32 (32)
IOPS=4187008, IOS/call=32/32, inflight=32 (32)
IOPS=4196992, IOS/call=32/32, inflight=32 (32)
IOPS=4208384, IOS/call=32/32, inflight=32 (32)
IOPS=4233888, IOS/call=32/32, inflight=32 (32)
IOPS=4266432, IOS/call=32/32, inflight=32 (32)
IOPS=4232352, IOS/call=32/32, inflight=32 (32)

# + patch 14/16 (skip advance)

IOPS=4367424, IOS/call=32/32, inflight=0 (16)
IOPS=4401088, IOS/call=32/32, inflight=32 (32)
IOPS=4400544, IOS/call=32/32, inflight=0 (29)
IOPS=4400768, IOS/call=32/32, inflight=32 (32)
IOPS=4409568, IOS/call=32/32, inflight=32 (32)
IOPS=4373888, IOS/call=32/32, inflight=32 (32)
IOPS=4392544, IOS/call=32/32, inflight=32 (32)
IOPS=4368192, IOS/call=32/32, inflight=32 (32)
IOPS=4362976, IOS/call=32/32, inflight=32 (32)

Comparing profiling. Before:
+    1.75%  io_uring  [kernel.vmlinux]  [k] bio_iov_iter_get_pages
+    0.90%  io_uring  [kernel.vmlinux]  [k] iov_iter_advance

After:
+    0.91%  io_uring  [kernel.vmlinux]  [k] bio_iov_iter_get_pages_hint
[no iov_iter_advance]

# + patches 15,16 (switch optimisation)

IOPS=4485984, IOS/call=32/32, inflight=32 (32)
IOPS=4500384, IOS/call=32/32, inflight=32 (32)
IOPS=4524512, IOS/call=32/32, inflight=32 (32)
IOPS=4507424, IOS/call=32/32, inflight=32 (32)
IOPS=4497216, IOS/call=32/32, inflight=32 (32)
IOPS=4496832, IOS/call=32/32, inflight=32 (32)
IOPS=4505632, IOS/call=32/32, inflight=32 (32)
IOPS=4476224, IOS/call=32/32, inflight=32 (32)
IOPS=4478592, IOS/call=32/32, inflight=32 (32)
IOPS=4480128, IOS/call=32/32, inflight=32 (32)
IOPS=4468640, IOS/call=32/32, inflight=32 (32)

Before:
+    1.92%  io_uring  [kernel.vmlinux]  [k] submit_bio_checks
+    5.56%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio
After:
+    1.66%  io_uring  [kernel.vmlinux]  [k] submit_bio_checks
+    5.49%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

0.3% difference from perf, ~2% from absolute numbers, which is
most probably just a coincidence. But 0.3% looks realistic.


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/16] block: inline a part of bio_release_pages()
  2021-10-20 14:15     ` Jens Axboe
@ 2021-10-20 17:29       ` Christoph Hellwig
  0 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2021-10-20 17:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Pavel Begunkov, linux-block

On Wed, Oct 20, 2021 at 08:15:25AM -0600, Jens Axboe wrote:
> I just want to be very clear that neither mine nor Pavel's work is
> trying to get into benchmarketing. There are very tangible performance
> benefits, and they are backed up by code analysis and testing. That
> said, the point is of course not to make this any harder to follow or
> maintain, but we are doing suboptimal things in various places and those
> should get cleaned up, particularly when they impact performance. Are
> some a bit eager? Perhaps, but let's not let that cloud the perception
> of the effort as a whole.

A lot of it seems generally useful.  But optimizing for BIO_NO_PAGE_REF
is really special.  Besides a few in-kernel special cases this is all
about the io_uring pre-registered buffers, which are very much a
special case and not a normal workload at all.  In this case it is just
an extra conditional moved into a few callers so I think we're ok, but
I'm really worried about optimizing this benchmark fast path over the
code everyone actually uses.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/16] block: optimise req_bio_endio()
  2021-10-19 21:24 ` [PATCH 03/16] block: optimise req_bio_endio() Pavel Begunkov
  2021-10-20  6:11   ` Christoph Hellwig
@ 2021-10-22  9:58   ` Shinichiro Kawasaki
  2021-10-22 10:58     ` Pavel Begunkov
  1 sibling, 1 reply; 57+ messages in thread
From: Shinichiro Kawasaki @ 2021-10-22  9:58 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: linux-block, Jens Axboe, Damien Le Moal

Hello Pavel,

Recently I tried out the for-next branch and observed that a simple dd
command to zonefs files causes an I/O error.

$ sudo dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 oflag=direct
dd: error writing '/mnt/seq/0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.00409641 s, 0.0 kB/s

At that time, kernel reported warnings.

[90713.298721][ T2735] zonefs (nvme0n1) WARNING: inode 1: invalid size 0 (should be 4096)
[90713.299761][ T2735] zonefs (nvme0n1) WARNING: remounting filesystem read-only

I bisected and found that this patch triggers the error and warnings. I think
a one-liner change is needed in this patch. Please find it below, inline.


On Oct 19, 2021 / 22:24, Pavel Begunkov wrote:
> First, get rid of an extra branch and chain the error checks. Also
> reshuffle it with bio_advance(), so it goes closer to the final check;
> with that, the compiler loads rq->rq_flags only once, and also doesn't
> reload bio->bi_iter.bi_size if bio_advance() didn't actually advance
> the iter.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  block/blk-mq.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3481a8712234..bab1fccda6ca 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -617,25 +617,23 @@ void blk_mq_free_plug_rqs(struct blk_plug *plug)
>  static void req_bio_endio(struct request *rq, struct bio *bio,
>  			  unsigned int nbytes, blk_status_t error)
>  {
> -	if (error)
> +	if (unlikely(error)) {
>  		bio->bi_status = error;
> -
> -	if (unlikely(rq->rq_flags & RQF_QUIET))
> -		bio_set_flag(bio, BIO_QUIET);
> -
> -	bio_advance(bio, nbytes);
> -
> -	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
> +	} else if (req_op(rq) == REQ_OP_ZONE_APPEND) {
>  		/*
>  		 * Partial zone append completions cannot be supported as the
>  		 * BIO fragments may end up not being written sequentially.
>  		 */
> -		if (bio->bi_iter.bi_size)
> +		if (bio->bi_iter.bi_size == nbytes)

I think the line above should be,

		if (bio->bi_iter.bi_size != nbytes)

Before applying the patch, the if statement checked "bi_size is not zero".
After applying the patch, bio_advance(bio, nbytes) is moved after this check,
so bi_size has not yet been decremented by nbytes and the check should be
"bi_size is not nbytes". With this modification, the I/O error and the
warnings go away.

>  			bio->bi_status = BLK_STS_IOERR;
>  		else
>  			bio->bi_iter.bi_sector = rq->__sector;
>  	}
>  
> +	bio_advance(bio, nbytes);
> +
> +	if (unlikely(rq->rq_flags & RQF_QUIET))
> +		bio_set_flag(bio, BIO_QUIET);
>  	/* don't actually finish bio if it's part of flush sequence */
>  	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
>  		bio_endio(bio);
> -- 
> 2.33.1
> 

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/16] block: optimise req_bio_endio()
  2021-10-22  9:58   ` Shinichiro Kawasaki
@ 2021-10-22 10:58     ` Pavel Begunkov
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Begunkov @ 2021-10-22 10:58 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-block, Jens Axboe, Damien Le Moal

On 10/22/21 10:58, Shinichiro Kawasaki wrote:
> Hello Pavel,
> 
> Recently I tried out the for-next branch and observed that a simple dd
> command to zonefs files causes an I/O error.
> 
> $ sudo dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 oflag=direct
> dd: error writing '/mnt/seq/0': Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.00409641 s, 0.0 kB/s
> 
> At that time, kernel reported warnings.
> 
> [90713.298721][ T2735] zonefs (nvme0n1) WARNING: inode 1: invalid size 0 (should be 4096)
> [90713.299761][ T2735] zonefs (nvme0n1) WARNING: remounting filesystem read-only
> 
> I bisected and found that this patch triggers the error and warnings. I think
> a one-liner change is needed in this patch. Please find it below, inline.

[...]
>> -	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
>> +	} else if (req_op(rq) == REQ_OP_ZONE_APPEND) {
>>   		/*
>>   		 * Partial zone append completions cannot be supported as the
>>   		 * BIO fragments may end up not being written sequentially.
>>   		 */
>> -		if (bio->bi_iter.bi_size)
>> +		if (bio->bi_iter.bi_size == nbytes)
> 
> I think the line above should be,
> 
> 		if (bio->bi_iter.bi_size != nbytes)

You're right, that was a stupid mistake, thanks!

Jens, will you fold it in or would you prefer a patch?


> Before applying the patch, the if statement checked "bi_size is not zero".
> After applying the patch, bio_advance(bio, nbytes) moved after this check.
> Then bi_size is not decremented by nbytes and the check should be "bi_size is
> not nbytes". With this modification, the I/O error and the warnings go away.
> 
>>   			bio->bi_status = BLK_STS_IOERR;
>>   		else
>>   			bio->bi_iter.bi_sector = rq->__sector;
>>   	}
>>   
>> +	bio_advance(bio, nbytes);
>> +
>> +	if (unlikely(rq->rq_flags & RQF_QUIET))
>> +		bio_set_flag(bio, BIO_QUIET);
>>   	/* don't actually finish bio if it's part of flush sequence */
>>   	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
>>   		bio_endio(bio);
>> -- 
>> 2.33.1
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-10-22 10:59 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-19 21:24 [PATCH 00/16] block optimisation round Pavel Begunkov
2021-10-19 21:24 ` [PATCH 01/16] block: turn macro helpers into inline functions Pavel Begunkov
2021-10-20  6:09   ` Christoph Hellwig
2021-10-19 21:24 ` [PATCH 02/16] block: convert leftovers to bdev_get_queue Pavel Begunkov
2021-10-19 22:34   ` Chaitanya Kulkarni
2021-10-19 21:24 ` [PATCH 03/16] block: optimise req_bio_endio() Pavel Begunkov
2021-10-20  6:11   ` Christoph Hellwig
2021-10-22  9:58   ` Shinichiro Kawasaki
2021-10-22 10:58     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 04/16] block: don't bloat enter_queue with percpu_ref Pavel Begunkov
2021-10-19 22:32   ` Chaitanya Kulkarni
2021-10-20  6:12   ` Christoph Hellwig
2021-10-20 12:08     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 05/16] block: inline a part of bio_release_pages() Pavel Begunkov
2021-10-20  6:13   ` Christoph Hellwig
2021-10-20 12:19     ` Pavel Begunkov
2021-10-20 14:15     ` Jens Axboe
2021-10-20 17:29       ` Christoph Hellwig
2021-10-19 21:24 ` [PATCH 06/16] block: clean up blk_mq_submit_bio() merging Pavel Begunkov
2021-10-20  6:16   ` Christoph Hellwig
2021-10-20 12:20     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 07/16] blocK: move plug flush functions to blk-mq.c Pavel Begunkov
2021-10-19 22:34   ` Chaitanya Kulkarni
2021-10-20  6:17   ` Christoph Hellwig
2021-10-20 12:23     ` Pavel Begunkov
2021-10-20 12:37       ` Christoph Hellwig
2021-10-20 13:18         ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 08/16] block: optimise blk_flush_plug_list Pavel Begunkov
2021-10-20  6:29   ` Christoph Hellwig
2021-10-20 12:26     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 09/16] block: optimise boundary blkdev_read_iter's checks Pavel Begunkov
2021-10-20  6:29   ` Christoph Hellwig
2021-10-19 21:24 ` [PATCH 10/16] block: optimise blkdev_bio_end_io() Pavel Begunkov
2021-10-20  6:30   ` Christoph Hellwig
2021-10-20 12:29     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 11/16] block: add optimised version bio_set_dev() Pavel Begunkov
2021-10-19 22:36   ` Chaitanya Kulkarni
2021-10-20  6:20   ` Christoph Hellwig
2021-10-20 12:29     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 12/16] block: add single bio async direct IO helper Pavel Begunkov
2021-10-20  6:36   ` Christoph Hellwig
2021-10-20 12:35     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 13/16] block: add async version of bio_set_polled Pavel Begunkov
2021-10-20  6:37   ` Christoph Hellwig
2021-10-20 12:58     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 14/16] block: skip advance when async and not needed Pavel Begunkov
2021-10-20  6:41   ` Christoph Hellwig
2021-10-19 21:24 ` [PATCH 15/16] block: optimise blk_may_split for normal rw Pavel Begunkov
2021-10-20  6:25   ` Christoph Hellwig
2021-10-20 13:38     ` Pavel Begunkov
2021-10-19 21:24 ` [PATCH 16/16] block: optimise submit_bio_checks " Pavel Begunkov
2021-10-20  6:26   ` Christoph Hellwig
2021-10-19 23:31 ` [PATCH 00/16] block optimisation round Jens Axboe
2021-10-20  0:21 ` Jens Axboe
2021-10-20  0:22   ` Jens Axboe
2021-10-20 14:12 ` (subset) " Jens Axboe
2021-10-20 14:54 ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).