* [PATCH V3] block: optimize for small block size IO
From: Ming Lei @ 2019-10-29 10:51 UTC
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Coly Li, Christoph Hellwig, Keith Busch,
	linux-bcache

__blk_queue_split() may be a bit heavy for small block size (such as
512B or 4KB) IO, so introduce a flag that records whether the bio
spans multiple pages, and only try to split the bio when the
multi-page flag is set.

A ~3% - 5% IOPS improvement can be observed in an io_uring test over
null_blk (MQ); the io_uring test code is from fio/t/io_uring.c.
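
For reference, one way to reproduce (the exact options here are an
assumption, not taken from the report above) is to load null_blk in MQ
mode and run the io_uring tool against it with its default 4KB reads:

	modprobe null_blk queue_mode=2 submit_queues=4 irqmode=0
	t/io_uring -d 128 -s 32 -c 32 /dev/nullb0

run from a fio source checkout.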

bch_bio_map() should be the only builder that doesn't use
bio_add_page(), so forcibly mark bios built via bch_bio_map() as
MULTI_PAGE.

RAID5 has a similar usage, but there the bio really is a single-page
bio, so it doesn't need to be handled.
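
For context, bch_bio_map() fills the bvec table by hand instead of
calling bio_add_page(); a simplified sketch (paraphrased from
drivers/md/bcache/util.c, not the exact code):

	static void bch_bio_map_sketch(struct bio *bio, void *base)
	{
		size_t size = bio->bi_iter.bi_size;
		struct bio_vec *bv = bio->bi_io_vec;

		for (; size; bio->bi_vcnt++, bv++) {
			/* fill each bvec directly, one page at a time */
			bv->bv_len = min_t(size_t, PAGE_SIZE, size);
			if (base) {
				bv->bv_page = virt_to_page(base);
				base += bv->bv_len;
			}
			size -= bv->bv_len;
		}
	}

Since bio_add_page() is never called here, the BIO_MULTI_PAGE flag has
to be set explicitly at the end of bch_bio_map(), as done below.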

Cc: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: linux-bcache@vger.kernel.org
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V3:
	- simplify check in __bio_add_page() as suggested by Christoph
V2:
	- share bit flag with passthrough IO
	- deal with adding multipage in one bio_add_page()

 block/bio.c               | 9 +++++++++
 block/blk-merge.c         | 4 ++++
 block/bounce.c            | 3 +++
 drivers/md/bcache/util.c  | 2 ++
 include/linux/blk_types.h | 3 +++
 5 files changed, 21 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 8f0ed6228fc5..eeb81679689b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -583,6 +583,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio_set_flag(bio, BIO_CLONED);
 	if (bio_flagged(bio_src, BIO_THROTTLED))
 		bio_set_flag(bio, BIO_THROTTLED);
+	if (bio_flagged(bio_src, BIO_MULTI_PAGE))
+		bio_set_flag(bio, BIO_MULTI_PAGE);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
@@ -757,6 +759,9 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		if (page_is_mergeable(bv, page, len, off, same_page)) {
 			bv->bv_len += len;
 			bio->bi_iter.bi_size += len;
+
+			if (!*same_page)
+				bio_set_flag(bio, BIO_MULTI_PAGE);
 			return true;
 		}
 	}
@@ -789,6 +794,10 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
 
+	if (!bio_flagged(bio, BIO_MULTI_PAGE) && (bio->bi_vcnt >= 2 ||
+				bv->bv_len > PAGE_SIZE))
+		bio_set_flag(bio, BIO_MULTI_PAGE);
+
 	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
 		bio_set_flag(bio, BIO_WORKINGSET);
 }
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 48e6725b32ee..737bbec9e153 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,6 +309,10 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 				nr_segs);
 		break;
 	default:
+		if (!bio_flagged(*bio, BIO_MULTI_PAGE)) {
+			*nr_segs = 1;
+			return;
+		}
 		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
 		break;
 	}
diff --git a/block/bounce.c b/block/bounce.c
index f8ed677a1bf7..4b18a2accccc 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -253,6 +253,9 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
+	if (bio_flagged(bio_src, BIO_MULTI_PAGE))
+		bio_set_flag(bio, BIO_MULTI_PAGE);
+
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 62fb917f7a4f..71f5cbb6fdd6 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -253,6 +253,8 @@ start:		bv->bv_len	= min_t(size_t, PAGE_SIZE - bv->bv_offset,
 
 		size -= bv->bv_len;
 	}
+
+	bio_set_flag(bio, BIO_MULTI_PAGE);
 }
 
 /**
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d688b96d1d63..10b9a3539716 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -220,6 +220,9 @@ enum {
 				 * throttling rules. Don't do it again. */
 	BIO_TRACE_COMPLETION,	/* bio_endio() should trace the final completion
 				 * of this bio. */
+	BIO_MULTI_PAGE = BIO_USER_MAPPED,
+				/* used to optimize small BS IO from FS, so
+				 * share the bit flag with passthrough IO */
 	BIO_QUEUE_ENTERED,	/* can use blk_queue_enter_live() */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
 	BIO_FLAG_LAST
-- 
2.20.1



* Re: [PATCH V3] block: optimize for small block size IO
From: Christoph Hellwig @ 2019-10-29 11:04 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Coly Li, Christoph Hellwig, Keith Busch,
	linux-bcache

I still haven't seen an explanation why this simple thing doesn't work
just as well:

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 48e6725b32ee..f3073700166f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,6 +309,11 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 				nr_segs);
 		break;
 	default:
+		if ((*bio)->bi_vcnt == 1 &&
+		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
+			*nr_segs = 1;
+			return;
+		}
 		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
 		break;
 	}


* Re: [PATCH V3] block: optimize for small block size IO
From: Bart Van Assche @ 2019-10-29 15:09 UTC
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Coly Li, Christoph Hellwig, Keith Busch, linux-bcache

On 10/29/19 3:51 AM, Ming Lei wrote:
> __blk_queue_split() may be a bit heavy for small block size (such as
> 512B or 4KB) IO, so introduce a flag that records whether the bio
> spans multiple pages, and only try to split the bio when the
> multi-page flag is set.
> 
> A ~3% - 5% IOPS improvement can be observed in an io_uring test over
> null_blk (MQ); the io_uring test code is from fio/t/io_uring.c.
> 
> bch_bio_map() should be the only builder that doesn't use
> bio_add_page(), so forcibly mark bios built via bch_bio_map() as
> MULTI_PAGE.
> 
> RAID5 has a similar usage, but there the bio really is a single-page
> bio, so it doesn't need to be handled.

Although this patch looks fine to me, I'm concerned about the new flag. 
That's additional state information and something that could get out of 
sync. Has it been considered to implement this optimization by handling
bio->bi_vcnt == 1 separately at the start of blk_bio_segment_split()?
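
Something like this hypothetical sketch (illustrative only, not the
actual kernel code):

	static struct bio *blk_bio_segment_split(struct request_queue *q,
						 struct bio *bio,
						 struct bio_set *bs,
						 unsigned *segs)
	{
		/* hypothetical early exit for the single-page case */
		if (bio->bi_vcnt == 1 &&
		    bio->bi_io_vec[0].bv_len <= PAGE_SIZE) {
			*segs = 1;
			return NULL;	/* NULL means no split is needed */
		}

		/* ... existing bio_for_each_bvec() splitting logic,
		 * returning the split bio ... */
	}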

Thanks,

Bart.


* Re: [PATCH V3] block: optimize for small block size IO
From: Jens Axboe @ 2019-10-29 15:11 UTC
  To: Christoph Hellwig, Ming Lei
  Cc: linux-block, Coly Li, Keith Busch, linux-bcache

On 10/29/19 5:04 AM, Christoph Hellwig wrote:
> I still haven't seen an explanation why this simple thing doesn't work
> just as well:
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 48e6725b32ee..f3073700166f 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -309,6 +309,11 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
>   				nr_segs);
>   		break;
>   	default:
> +		if ((*bio)->bi_vcnt == 1 &&
> +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
> +			*nr_segs = 1;
> +			return;
> +		}
>   		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
>   		break;
>   	}

I used that as a hack months ago for testing, but never ran it through
a full round of testing.

As a reference, either one of these bumps my poll performance by 3% on a
fast device that's core-limited.

-- 
Jens Axboe



* Re: [PATCH V3] block: optimize for small block size IO
From: Ming Lei @ 2019-10-30 0:21 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Coly Li, Keith Busch, linux-bcache

On Tue, Oct 29, 2019 at 04:04:25AM -0700, Christoph Hellwig wrote:
> I still haven't seen an explanation why this simple thing doesn't work
> just as well:
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 48e6725b32ee..f3073700166f 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -309,6 +309,11 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
>  				nr_segs);
>  		break;
>  	default:
> +		if ((*bio)->bi_vcnt == 1 &&
> +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
> +			*nr_segs = 1;
> +			return;
> +		}
>  		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
>  		break;
>  	}

This bio (*bio) may be a fast-cloned bio from somewhere (DM, MD, ...),
so the above check can't work in some cases.


thanks,
Ming



* Re: [PATCH V3] block: optimize for small block size IO
From: Christoph Hellwig @ 2019-10-30 13:54 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Coly Li, Keith Busch,
	linux-bcache

On Wed, Oct 30, 2019 at 08:21:26AM +0800, Ming Lei wrote:
> > +		if ((*bio)->bi_vcnt == 1 &&
> > +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
> > +			*nr_segs = 1;
> > +			return;
> > +		}
> >  		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
> >  		break;
> >  	}
> 
> This bio (*bio) may be a fast-cloned bio from somewhere (DM, MD, ...),
> so the above check can't work in some cases.

Please explain how it doesn't work.  In the worst case it will give us
a false negative, that is, we don't take the fast path when in theory
we could, but then again fast-cloned bios will have so much overhead
that it should not matter.


* Re: [PATCH V3] block: optimize for small block size IO
From: Ming Lei @ 2019-10-31 1:33 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Coly Li, Keith Busch, linux-bcache

On Wed, Oct 30, 2019 at 06:54:26AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 30, 2019 at 08:21:26AM +0800, Ming Lei wrote:
> > > +		if ((*bio)->bi_vcnt == 1 &&
> > > +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
> > > +			*nr_segs = 1;
> > > +			return;
> > > +		}
> > >  		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
> > >  		break;
> > >  	}
> > 
> > This bio (*bio) may be a fast-cloned bio from somewhere (DM, MD, ...),
> > so the above check can't work in some cases.
> 
> Please explain how it doesn't work.  In the worst case it will give us
> a false negative, that is, we don't take the fast path when in theory
> we could, but then again fast-cloned bios will have so much overhead
> that it should not matter.

In theory, we shouldn't use a fast-cloned bio's .bi_vcnt; so far it is
always zero, and in the future a fast-cloned bio's .bi_vcnt might be
used for some other purpose.

Also, this bio may not be a fast-cloned bio but one that has already
been split (such as a 4k bio whose front 512 bytes were split out).
Then it depends on whether the remaining bio needs to be split again;
in reality, that doesn't look possible.

In short, the above approach is something of a hack, but it should work
in practice. I am fine with it if a comment explaining this usage is
added.
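
For reference, a condensed sketch of why the clone's bvec-related
fields can't be trusted (paraphrasing __bio_clone_fast() in
block/bio.c; simplified, not the full function):

	void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
	{
		/* the clone borrows the source's bvec table ... */
		bio->bi_io_vec = bio_src->bi_io_vec;
		/* ... and its iterator, which may already be advanced */
		bio->bi_iter = bio_src->bi_iter;
		/* .bi_vcnt is never copied, so it stays 0 in the clone */
	}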


Thanks,
Ming



* Re: [PATCH V3] block: optimize for small block size IO
From: Jens Axboe @ 2019-11-01 14:47 UTC
  To: Ming Lei
  Cc: linux-block, Coly Li, Christoph Hellwig, Keith Busch, linux-bcache

On 10/29/19 4:51 AM, Ming Lei wrote:
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 48e6725b32ee..737bbec9e153 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -309,6 +309,10 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
>   				nr_segs);
>   		break;
>   	default:
> +		if (!bio_flagged(*bio, BIO_MULTI_PAGE)) {
> +			*nr_segs = 1;
> +			return;
> +		}
>   		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
>   		break;
>   	}

Can we just make that:

  	default:
		if (!bio_flagged(*bio, BIO_MULTI_PAGE)) {
			*nr_segs = 1;
			split = NULL;
		} else {
  			split = blk_bio_segment_split(q, *bio, &q->bio_split,
							nr_segs);
		}
  		break;

Otherwise this looks fine to me, and the win is palpable.

-- 
Jens Axboe


