From: Changheun Lee <nanich.lee@samsung.com>
To: damien.lemoal@wdc.com
Cc: Avri.Altman@wdc.com, Johannes.Thumshirn@wdc.com,
	alex_y_xu@yahoo.ca, alim.akhtar@samsung.com,
	asml.silence@gmail.com, axboe@kernel.dk, bgoncalv@redhat.com,
	bvanassche@acm.org, cang@codeaurora.org,
	gregkh@linuxfoundation.org, hch@infradead.org,
	jaegeuk@kernel.org, jejb@linux.ibm.com, jisoo2146.oh@samsung.com,
	junho89.kim@samsung.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	martin.petersen@oracle.com, ming.lei@redhat.com,
	mj0123.lee@samsung.com, nanich.lee@samsung.com, osandov@fb.com,
	patchwork-bot@kernel.org, seunghwan.hyun@samsung.com,
	sookwan7.kim@samsung.com, tj@kernel.org, tom.leiming@gmail.com,
	woosung2.lee@samsung.com, yi.zhang@redhat.com,
	yt0928.kim@samsung.com
Subject: Re: [PATCH v12 1/3] bio: control bio max size
Date: Fri,  4 Jun 2021 16:34:59 +0900
Message-ID: <20210604073459.29235-1-nanich.lee@samsung.com>
In-Reply-To: <DM6PR04MB70812AF342F46F453696A447E73B9@DM6PR04MB7081.namprd04.prod.outlook.com>

> On 2021/06/04 14:22, Changheun Lee wrote:
> > bio size can grow up to 4GB after multi-page bvec has been enabled.
> > But sometimes a large bio leads to inefficient behavior.
> > Controlling the bio max size helps to reduce this inefficiency.
> > 
> > Below is an example of inefficient behavior.
> > In the case of large chunk direct I/O (a 32MB chunk read from user space),
> > all pages for the 32MB would be merged into one bio structure if the pages'
> > physical addresses are contiguous. This delays submission until the
> > merge completes. The bio max size should be limited to a proper size.
> > 
> > When a 32MB chunk read with the direct I/O option comes from userspace,
> > the kernel currently behaves as shown below in the do_direct_IO() loop.
> > The timeline is as follows.
> > 
> >  | bio merge for 32MB. total 8,192 pages are merged.
> >  | total elapsed time is over 2ms.
> >  |------------------ ... ----------------------->|
> >                                                  | 8,192 pages merged a bio.
> >                                                  | at this time, first bio submit is done.
> >                                                  | 1 bio is split to 32 read request and issue.
> >                                                  |--------------->
> >                                                   |--------------->
> >                                                    |--------------->
> >                                                               ......
> >                                                                    |--------------->
> >                                                                     |--------------->|
> >                           total 19ms elapsed to complete 32MB read done from device. |
> > 
> > If the bio max size is limited to 1MB, the behavior changes as below.
> > 
> >  | bio merge for 1MB. 256 pages are merged for each bio.
> >  | total 32 bio will be made.
> >  | total elapsed time is over 2ms. it's same.
> >  | but, first bio submit timing is fast. about 100us.
> >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> >       | 256 pages merged a bio.
> >       | at this time, first bio submit is done.
> >       | and 1 read request is issued for 1 bio.
> >       |--------------->
> >            |--------------->
> >                 |--------------->
> >                                       ......
> >                                                  |--------------->
> >                                                   |--------------->|
> >         total 17ms elapsed to complete 32MB read done from device. |
> > 
> > As a result, read requests are issued earlier if the bio max size is limited.
> > With the current kernel behavior and multi-page bvec, a very large bio can
> > be created, and that delays issuing the first I/O request.
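
As a rough check of the page and bio counts used in the two timelines above,
here is a small userspace sketch. It assumes 4KB pages, which is what the
"8,192 pages for 32MB" figure implies; the helper names are only for
illustration and do not exist in the kernel.

```c
/* Assumption: 4KB pages, as implied by 8,192 pages for a 32MB read. */
#define SKETCH_PAGE_SIZE 4096ull

/* Number of pages needed to cover `bytes`, rounding up. */
unsigned long long pages_for(unsigned long long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Number of bios needed when each bio is capped at `max_bio_bytes`. */
unsigned long long bios_for(unsigned long long io_bytes,
			    unsigned long long max_bio_bytes)
{
	return (io_bytes + max_bio_bytes - 1) / max_bio_bytes;
}
```

With a 32MB read, pages_for() gives 8,192 pages in total, and bios_for() with
a 1MB cap gives 32 bios of 256 pages each. With a cap of UINT_MAX the whole
read fits in a single bio.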
> > 
> > Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
> > ---
> >  block/bio.c            | 17 ++++++++++++++---
> >  block/blk-settings.c   | 19 +++++++++++++++++++
> >  include/linux/bio.h    |  4 +++-
> >  include/linux/blkdev.h |  3 +++
> >  4 files changed, 39 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 44205dfb6b60..73b673f1684e 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -255,6 +255,13 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> >  }
> >  EXPORT_SYMBOL(bio_init);
> >  
> > +unsigned int bio_max_bytes(struct bio *bio)
> > +{
> > +	struct block_device *bdev = bio->bi_bdev;
> > +
> > +	return bdev ? bdev->bd_disk->queue->limits.max_bio_bytes : UINT_MAX;
> > +}
> 
> unsigned int bio_max_bytes(struct bio *bio)
> {
> 	struct block_device *bdev = bio->bi_bdev;
> 
> 	if (!bdev)
> 		return UINT_MAX;
> 	return bdev->bd_disk->queue->limits.max_bio_bytes;
> }
> 
> is a lot more readable...
> Also, I remember there were some problems with bd_disk possibly being NULL.
> Was that fixed?

Thank you for the review. I would prefer to keep the current style; I think it
is readable enough as it is. The NULL bd_disk concern was only a suspicion:
bd_disk is not NULL if bdev is not NULL.
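
To show that the two forms discussed above behave the same, here is a minimal
userspace sketch. The structures are simplified stand-ins that keep only the
fields on the pointer chain, not the real kernel definitions, and the function
name suffixes are added for illustration. Like the patch, it assumes bd_disk
and queue are valid whenever bdev is not NULL.

```c
#include <limits.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed to follow bio -> bdev -> disk -> queue -> limits). */
struct queue_limits { unsigned int max_bio_bytes; };
struct request_queue { struct queue_limits limits; };
struct gendisk { struct request_queue *queue; };
struct block_device { struct gendisk *bd_disk; };
struct bio { struct block_device *bi_bdev; };

/* Ternary form, as in the patch. */
unsigned int bio_max_bytes_ternary(const struct bio *bio)
{
	const struct block_device *bdev = bio->bi_bdev;

	return bdev ? bdev->bd_disk->queue->limits.max_bio_bytes : UINT_MAX;
}

/* Early-return form, as suggested in the review. */
unsigned int bio_max_bytes_early_return(const struct bio *bio)
{
	const struct block_device *bdev = bio->bi_bdev;

	if (!bdev)
		return UINT_MAX;
	return bdev->bd_disk->queue->limits.max_bio_bytes;
}
```

Both functions return UINT_MAX for a bio with no block device attached and the
queue limit otherwise, so the choice between them is only about readability.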

> 
> > +
> >  /**
> >   * bio_reset - reinitialize a bio
> >   * @bio:	bio to reset
> > @@ -866,7 +873,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> >  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> >  
> >  		if (page_is_mergeable(bv, page, len, off, same_page)) {
> > -			if (bio->bi_iter.bi_size > UINT_MAX - len) {
> > +			if (bio->bi_iter.bi_size > bio_max_bytes(bio) - len) {
> >  				*same_page = false;
> >  				return false;
> >  			}
> > @@ -995,6 +1002,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  {
> >  	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
> >  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
> > +	unsigned int bytes_left = bio_max_bytes(bio) - bio->bi_iter.bi_size;
> >  	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
> >  	struct page **pages = (struct page **)bv;
> >  	bool same_page = false;
> > @@ -1010,7 +1018,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
> >  	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
> >  
> > -	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> > +	size = iov_iter_get_pages(iter, pages, bytes_left, nr_pages,
> > +				  &offset);
> >  	if (unlikely(size <= 0))
> >  		return size ? size : -EFAULT;
> >  
> > @@ -1038,6 +1047,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> >  {
> >  	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
> >  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
> > +	unsigned int bytes_left = bio_max_bytes(bio) - bio->bi_iter.bi_size;
> >  	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> >  	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
> >  	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
> > @@ -1058,7 +1068,8 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> >  	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
> >  	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
> >  
> > -	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> > +	size = iov_iter_get_pages(iter, pages, bytes_left, nr_pages,
> > +				  &offset);
> >  	if (unlikely(size <= 0))
> >  		return size ? size : -EFAULT;
> >  
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 902c40d67120..e270e31519a1 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -32,6 +32,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
> >   */
> >  void blk_set_default_limits(struct queue_limits *lim)
> >  {
> > +	lim->max_bio_bytes = UINT_MAX;
> >  	lim->max_segments = BLK_MAX_SEGMENTS;
> >  	lim->max_discard_segments = 1;
> >  	lim->max_integrity_segments = 0;
> > @@ -100,6 +101,24 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
> >  }
> >  EXPORT_SYMBOL(blk_queue_bounce_limit);
> >  
> > +/**
> > + * blk_queue_max_bio_bytes - set bio max size for queue
> 
> blk_queue_max_bio_bytes - set max_bio_bytes queue limit
> 
> And then you can drop the not very useful description.

OK, I will do that.

> 
> > + * @q: the request queue for the device
> > + * @bytes : bio max bytes to be set
> > + *
> > + * Description:
> > + *    Set proper bio max size to optimize queue operating.
> > + **/
> > +void blk_queue_max_bio_bytes(struct request_queue *q, unsigned int bytes)
> > +{
> > +	struct queue_limits *limits = &q->limits;
> > +	unsigned int max_bio_bytes = round_up(bytes, PAGE_SIZE);
> > +
> > +	limits->max_bio_bytes = max_t(unsigned int, max_bio_bytes,
> > +				      BIO_MAX_VECS * PAGE_SIZE);
> > +}
> > +EXPORT_SYMBOL(blk_queue_max_bio_bytes);
> 
> Setting of the stacked limits is still missing.

max_bio_bytes for a stacked device is just the default (UINT_MAX) in this
patch, because blk_set_stacking_limits() calls blk_set_default_limits().
I'll continue working on stacked devices after this patch series.
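
For illustration, here is a minimal userspace model of how the limit is set in
this patch and why a stacked device ends up with UINT_MAX. The model_* names,
the 4KB page size and the value 256 for BIO_MAX_VECS are assumptions of this
sketch. The last function is purely hypothetical: it shows one way a stacked
device could inherit the smaller limit, and it is not part of this patch.

```c
#include <limits.h>

#define MODEL_PAGE_SIZE    4096u  /* assumption: 4KB pages */
#define MODEL_BIO_MAX_VECS 256u   /* assumption: value of BIO_MAX_VECS */

struct queue_limits { unsigned int max_bio_bytes; };

/* Models blk_set_default_limits() for the one field of interest. */
void model_set_default_limits(struct queue_limits *lim)
{
	lim->max_bio_bytes = UINT_MAX;
}

/* Models blk_set_stacking_limits(): it starts from the defaults. */
void model_set_stacking_limits(struct queue_limits *lim)
{
	model_set_default_limits(lim);
}

/* Models blk_queue_max_bio_bytes(): round up to a page multiple, then
 * enforce a minimum of BIO_MAX_VECS pages. Assumes `bytes` is well below
 * UINT_MAX so that the rounding cannot wrap. */
void model_set_max_bio_bytes(struct queue_limits *lim, unsigned int bytes)
{
	unsigned int rounded = (bytes + MODEL_PAGE_SIZE - 1) /
			       MODEL_PAGE_SIZE * MODEL_PAGE_SIZE;
	unsigned int min_bytes = MODEL_BIO_MAX_VECS * MODEL_PAGE_SIZE;

	lim->max_bio_bytes = rounded > min_bytes ? rounded : min_bytes;
}

/* Hypothetical, not in this patch: let the top device take the smaller
 * of its own limit and the bottom device's limit. */
void model_stack_max_bio_bytes(struct queue_limits *top,
			       const struct queue_limits *bottom)
{
	if (bottom->max_bio_bytes < top->max_bio_bytes)
		top->max_bio_bytes = bottom->max_bio_bytes;
}
```

In this model a request for 4096 bytes is raised to the 1MB minimum, and a
stacked device keeps UINT_MAX until something like the hypothetical
model_stack_max_bio_bytes() is applied.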

> 
> > +
> >  /**
> >   * blk_queue_max_hw_sectors - set max sectors for a request for this queue
> >   * @q:  the request queue for the device
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index a0b4cfdf62a4..3959cc1a0652 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
> >  	return NULL;
> >  }
> >  
> > +extern unsigned int bio_max_bytes(struct bio *bio);
> > +
> >  /**
> >   * bio_full - check if the bio is full
> >   * @bio:	bio to check
> > @@ -119,7 +121,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
> >  	if (bio->bi_vcnt >= bio->bi_max_vecs)
> >  		return true;
> >  
> > -	if (bio->bi_iter.bi_size > UINT_MAX - len)
> > +	if (bio->bi_iter.bi_size > bio_max_bytes(bio) - len)
> >  		return true;
> >  
> >  	return false;
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 1255823b2bc0..861888501fc0 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -326,6 +326,8 @@ enum blk_bounce {
> >  };
> >  
> >  struct queue_limits {
> > +	unsigned int		max_bio_bytes;
> > +
> >  	enum blk_bounce		bounce;
> >  	unsigned long		seg_boundary_mask;
> >  	unsigned long		virt_boundary_mask;
> > @@ -1132,6 +1134,7 @@ extern void blk_abort_request(struct request *);
> >   * Access functions for manipulating queue properties
> >   */
> >  extern void blk_cleanup_queue(struct request_queue *);
> > +extern void blk_queue_max_bio_bytes(struct request_queue *, unsigned int);
> >  void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce limit);
> >  extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
> >  extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
> > 
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

Thank you,
Changheun Lee
