* [PATCH v10 0/1] bio: limit bio max size
       [not found] <CGME20210513102018epcas1p2a69f8e50cdf8380e433aea1a9303d04c@epcas1p2.samsung.com>
@ 2021-05-13 10:02 ` Changheun Lee
       [not found]   ` <CGME20210513102019epcas1p35d6740527dacd9bff7d8b07316e4cbfd@epcas1p3.samsung.com>
  2021-05-13 11:23   ` [PATCH v10 0/1] " Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Changheun Lee @ 2021-05-13 10:02 UTC (permalink / raw)
  To: alex_y_xu, yi.zhang, jaegeuk, bgoncalv, Johannes.Thumshirn,
	asml.silence, axboe, bvanassche, damien.lemoal, gregkh, hch,
	linux-block, linux-kernel, ming.lei, osandov, patchwork-bot, tj,
	tom.leiming
  Cc: jisoo2146.oh, junho89.kim, mj0123.lee, seunghwan.hyun,
	sookwan7.kim, woosung2.lee, yt0928.kim, Changheun Lee

 v10 modifications:
 1. bio_max_bytes is set to "BIO_MAX_VECS * PAGE_SIZE" as a minimum.
 2. fix maxsize when iov_iter_get_pages() is called in
    __bio_iov_append_get_pages()

Changheun Lee (1):
  bio: limit bio max size

 block/bio.c            | 17 ++++++++++++++---
 block/blk-settings.c   |  7 +++++++
 include/linux/bio.h    |  4 +++-
 include/linux/blkdev.h |  2 ++
 4 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.29.0



* [PATCH v10 1/1] bio: limit bio max size
       [not found]   ` <CGME20210513102019epcas1p35d6740527dacd9bff7d8b07316e4cbfd@epcas1p3.samsung.com>
@ 2021-05-13 10:02     ` Changheun Lee
  0 siblings, 0 replies; 6+ messages in thread
From: Changheun Lee @ 2021-05-13 10:02 UTC (permalink / raw)
  To: alex_y_xu, yi.zhang, jaegeuk, bgoncalv, Johannes.Thumshirn,
	asml.silence, axboe, bvanassche, damien.lemoal, gregkh, hch,
	linux-block, linux-kernel, ming.lei, osandov, patchwork-bot, tj,
	tom.leiming
  Cc: jisoo2146.oh, junho89.kim, mj0123.lee, seunghwan.hyun,
	sookwan7.kim, woosung2.lee, yt0928.kim, Changheun Lee

bio size can grow up to 4GB when multi-page bvec is enabled, but this
can lead to inefficient behavior. In the case of large chunk direct I/O
- a 32MB chunk read in user space - all pages for the 32MB are merged
into one bio structure if the pages' physical addresses are contiguous.
This delays submission until the merge completes, so the bio max size
should be limited to a proper value.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior in the do_direct_IO() loop is shown in the
timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |------------------ ... ----------------------->|
                                                 | 8,192 pages merged a bio.
                                                 | at this time, first bio submit is done.
                                                 | 1 bio is split to 32 read request and issue.
                                                 |--------------->
                                                  |--------------->
                                                   |--------------->
                                                              ......
                                                                   |--------------->
                                                                    |--------------->|
                          total 19ms elapsed to complete 32MB read done from device. |

If the bio max size is limited to 1MB, the behavior changes as below.

 | bio merge for 1MB. 256 pages are merged into each bio.
 | 32 bios will be made in total.
 | total elapsed time is still over 2ms.
 | but the first bio is submitted much earlier, after about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
      | 256 pages merged a bio.
      | at this time, first bio submit is done.
      | and 1 read request is issued for 1 bio.
      |--------------->
           |--------------->
                |--------------->
                                      ......
                                                 |--------------->
                                                  |--------------->|
        total 17ms elapsed to complete 32MB read done from device. |

As a result, read requests are issued earlier if the bio max size is
limited. With the current multi-page bvec behavior a very large bio can
be created, which delays the issue of the first I/O request.
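
For reference, the access pattern described above corresponds roughly to
a user-space loop like the sketch below. This is only an illustrative
reproducer; the file path, chunk handling, and error handling are
placeholders, not the exact test program used for the measurements.

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE	(32UL << 20)	/* one 32MB read() per iteration */

int main(void)
{
	void *buf;
	int fd;

	/* O_DIRECT needs an aligned buffer; page alignment is sufficient */
	if (posix_memalign(&buf, 4096, CHUNK_SIZE))
		return 1;

	/* "/data/testfile" is a placeholder for the file under test */
	fd = open("/data/testfile", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	/* each 32MB read is built into one large bio when all pages merge */
	while (read(fd, buf, CHUNK_SIZE) > 0)
		;

	close(fd);
	free(buf);
	return 0;
}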

Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
---
 block/bio.c            | 17 ++++++++++++++---
 block/blk-settings.c   |  7 +++++++
 include/linux/bio.h    |  4 +++-
 include/linux/blkdev.h |  2 ++
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 44205dfb6b60..a79be441990a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -255,6 +255,13 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+	struct block_device *bdev = bio->bi_bdev;
+
+	return bdev ? bdev->bd_disk->queue->limits.bio_max_bytes : UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:	bio to reset
@@ -866,7 +873,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
 		if (page_is_mergeable(bv, page, len, off, same_page)) {
-			if (bio->bi_iter.bi_size > UINT_MAX - len) {
+			if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
 				*same_page = false;
 				return false;
 			}
@@ -995,6 +1002,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned int bytes_left = bio_max_size(bio) - bio->bi_iter.bi_size;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
 	bool same_page = false;
@@ -1010,7 +1018,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	size = iov_iter_get_pages(iter, pages, bytes_left, nr_pages,
+				  &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1038,6 +1047,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned int bytes_left = bio_max_size(bio) - bio->bi_iter.bi_size;
 	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
 	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
@@ -1058,7 +1068,8 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	size = iov_iter_get_pages(iter, pages, bytes_left, nr_pages,
+				  &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 902c40d67120..6c67244d899e 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -32,6 +32,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
  */
 void blk_set_default_limits(struct queue_limits *lim)
 {
+	lim->bio_max_bytes = UINT_MAX;
 	lim->max_segments = BLK_MAX_SEGMENTS;
 	lim->max_discard_segments = 1;
 	lim->max_integrity_segments = 0;
@@ -140,6 +141,12 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
 				 limits->logical_block_size >> SECTOR_SHIFT);
 	limits->max_sectors = max_sectors;
 
+	if (check_shl_overflow(max_sectors, SECTOR_SHIFT,
+				&limits->bio_max_bytes))
+		limits->bio_max_bytes = UINT_MAX;
+	limits->bio_max_bytes = max_t(unsigned int, limits->bio_max_bytes,
+				      BIO_MAX_VECS * PAGE_SIZE);
+
 	q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(blk_queue_max_hw_sectors);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..f1a99f0a240c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
+extern unsigned int bio_max_size(struct bio *bio);
+
 /**
  * bio_full - check if the bio is full
  * @bio:	bio to check
@@ -119,7 +121,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
 	if (bio->bi_vcnt >= bio->bi_max_vecs)
 		return true;
 
-	if (bio->bi_iter.bi_size > UINT_MAX - len)
+	if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
 		return true;
 
 	return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1255823b2bc0..9fb255b48a57 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -326,6 +326,8 @@ enum blk_bounce {
 };
 
 struct queue_limits {
+	unsigned int		bio_max_bytes;
+
 	enum blk_bounce		bounce;
 	unsigned long		seg_boundary_mask;
 	unsigned long		virt_boundary_mask;
-- 
2.29.0



* Re: [PATCH v10 0/1] bio: limit bio max size
  2021-05-13 10:02 ` [PATCH v10 0/1] bio: limit bio max size Changheun Lee
       [not found]   ` <CGME20210513102019epcas1p35d6740527dacd9bff7d8b07316e4cbfd@epcas1p3.samsung.com>
@ 2021-05-13 11:23   ` Christoph Hellwig
       [not found]     ` <CGME20210514065054epcas1p4bd5c92a59d4010da4447ef62f65fdd4b@epcas1p4.samsung.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-05-13 11:23 UTC (permalink / raw)
  To: Changheun Lee
  Cc: alex_y_xu, yi.zhang, jaegeuk, bgoncalv, Johannes.Thumshirn,
	asml.silence, axboe, bvanassche, damien.lemoal, gregkh, hch,
	linux-block, linux-kernel, ming.lei, osandov, patchwork-bot, tj,
	tom.leiming, jisoo2146.oh, junho89.kim, mj0123.lee,
	seunghwan.hyun, sookwan7.kim, woosung2.lee, yt0928.kim

FYI, I still think this whole limit is a bad idea, and instead we
need to fix the algorithms to not waste time iterating over the bios
again and again in non-optimal ways.  I still haven't seen an answer
if your original latency problems are reproducible with the iomap
direct I/O implementation, and we've also not even started the work on
making some of the common iterators less stupid.


* Re: [PATCH v10 0/1] bio: limit bio max size
       [not found]     ` <CGME20210514065054epcas1p4bd5c92a59d4010da4447ef62f65fdd4b@epcas1p4.samsung.com>
@ 2021-05-14  6:32       ` Changheun Lee
  2021-05-17 10:11         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Changheun Lee @ 2021-05-14  6:32 UTC (permalink / raw)
  To: hch
  Cc: Johannes.Thumshirn, alex_y_xu, asml.silence, axboe, bgoncalv,
	bvanassche, damien.lemoal, gregkh, jaegeuk, jisoo2146.oh,
	junho89.kim, linux-block, linux-kernel, ming.lei, mj0123.lee,
	nanich.lee, osandov, patchwork-bot, seunghwan.hyun, sookwan7.kim,
	tj, tom.leiming, woosung2.lee, yi.zhang, yt0928.kim

> FYI, I still think this whole limit is a bad idea, and instead we
> need to fix the algorithms to not waste time iterating over the bios
> again and again in non-optimal ways.  I still haven't seen an answer
> if your original latency problems are reproducible with the iomap
> direct I/O implementation, and we've also not even started the work on
> making some of the common iterators less stupid.

Sorry, I missed your request.
Actually, an iomap direct I/O implementation in f2fs, replacing
do_direct_IO(), is not easy for me, so I tested in an ext4 environment
instead, where iomap is already applied. The same performance
degradation is reproduced there.

I tested a 512MB file read with direct I/O, with a 64MB chunk size.
 - on SCSI disk, with no limit of bio max size(4GB) : avg. 630 MB/s
 - on SCSI disk, with limit bio max size to 1MB     : avg. 645 MB/s
 - on ramdisk, with no limit of bio max size(4GB)   : avg. 2749 MB/s
 - on ramdisk, with limit bio max size to 1MB       : avg. 3068 MB/s

I set ramdisk environment as below.
 - dd if=/dev/zero of=/mnt/ramdisk.img bs=$((1024*1024)) count=1024
 - mkfs.ext4 /mnt/ramdisk.img
 - mkdir /mnt/ext4ramdisk
 - mount -o loop /mnt/ramdisk.img /mnt/ext4ramdisk

With a low-performance disk, the bio submit delay caused by a large bio
is only a small portion of the total time, so it is hard to notice. But
it shows up on a high-performance disk.

Limiting the bio size does not affect stacked drivers; it is for
physical disk drivers. As you advise, fixing the algorithms to remove
the wasted time is needed in each module, and I agree with you. But I
also think it is reasonable to submit a bio once its size exceeds the
disk's transfer size.

Thanks,
Changheun Lee.


* Re: [PATCH v10 0/1] bio: limit bio max size
  2021-05-14  6:32       ` Changheun Lee
@ 2021-05-17 10:11         ` Christoph Hellwig
       [not found]           ` <CGME20210526034859epcas1p3b3ab803406c983df431f89b6f9097f08@epcas1p3.samsung.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-05-17 10:11 UTC (permalink / raw)
  To: Changheun Lee
  Cc: hch, Johannes.Thumshirn, alex_y_xu, asml.silence, axboe,
	bgoncalv, bvanassche, damien.lemoal, gregkh, jaegeuk,
	jisoo2146.oh, junho89.kim, linux-block, linux-kernel, ming.lei,
	mj0123.lee, osandov, patchwork-bot, seunghwan.hyun, sookwan7.kim,
	tj, tom.leiming, woosung2.lee, yi.zhang, yt0928.kim

On Fri, May 14, 2021 at 03:32:41PM +0900, Changheun Lee wrote:
> I tested a 512MB file read with direct I/O, with a 64MB chunk size.
>  - on SCSI disk, with no limit of bio max size(4GB) : avg. 630 MB/s
>  - on SCSI disk, with limit bio max size to 1MB     : avg. 645 MB/s
>  - on ramdisk, with no limit of bio max size(4GB)   : avg. 2749 MB/s
>  - on ramdisk, with limit bio max size to 1MB       : avg. 3068 MB/s
> 
> I set ramdisk environment as below.
>  - dd if=/dev/zero of=/mnt/ramdisk.img bs=$((1024*1024)) count=1024
>  - mkfs.ext4 /mnt/ramdisk.img
>  - mkdir /mnt/ext4ramdisk
>  - mount -o loop /mnt/ramdisk.img /mnt/ext4ramdisk
> 
> With a low-performance disk, the bio submit delay caused by a large bio
> is only a small portion of the total time, so it is hard to notice. But
> it shows up on a high-performance disk.

So let's attack the problem properly:

 1) switch f2fs to a direct I/O implementation that does not suck
 2) look into optimizing the iomap code to e.g. submit the bio once
    it is larger than queue_io_opt() without failing to add to a bio
    which would be annoying for things like huge pages.
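
Purely as an illustration of point 2) - this is a sketch of the idea,
not the actual iomap code - the early-submit check could look roughly
like this:

/*
 * Illustrative sketch only: submit the in-flight bio early once it has
 * grown past the queue's optimal I/O size, instead of packing it all
 * the way up to the 4GB bi_size limit.
 */
static bool bio_reached_io_opt(struct bio *bio)
{
	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
	unsigned int io_opt = queue_io_opt(q);

	return io_opt && bio->bi_iter.bi_size >= io_opt;
}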


* Re: [PATCH v10 0/1] bio: limit bio max size
       [not found]           ` <CGME20210526034859epcas1p3b3ab803406c983df431f89b6f9097f08@epcas1p3.samsung.com>
@ 2021-05-26  3:30             ` Changheun Lee
  0 siblings, 0 replies; 6+ messages in thread
From: Changheun Lee @ 2021-05-26  3:30 UTC (permalink / raw)
  To: hch
  Cc: Johannes.Thumshirn, alex_y_xu, asml.silence, axboe, bgoncalv,
	bvanassche, damien.lemoal, gregkh, jaegeuk, jisoo2146.oh,
	junho89.kim, linux-block, linux-kernel, ming.lei, mj0123.lee,
	nanich.lee, osandov, patchwork-bot, seunghwan.hyun, sookwan7.kim,
	tj, tom.leiming, woosung2.lee, yi.zhang, yt0928.kim

> On Fri, May 14, 2021 at 03:32:41PM +0900, Changheun Lee wrote:
> > I tested a 512MB file read with direct I/O, with a 64MB chunk size.
> >  - on SCSI disk, with no limit of bio max size(4GB) : avg. 630 MB/s
> >  - on SCSI disk, with limit bio max size to 1MB     : avg. 645 MB/s
> >  - on ramdisk, with no limit of bio max size(4GB)   : avg. 2749 MB/s
> >  - on ramdisk, with limit bio max size to 1MB       : avg. 3068 MB/s
> > 
> > I set ramdisk environment as below.
> >  - dd if=/dev/zero of=/mnt/ramdisk.img bs=$((1024*1024)) count=1024
> >  - mkfs.ext4 /mnt/ramdisk.img
> >  - mkdir /mnt/ext4ramdisk
> >  - mount -o loop /mnt/ramdisk.img /mnt/ext4ramdisk
> > 
> > With a low-performance disk, the bio submit delay caused by a large bio
> > is only a small portion of the total time, so it is hard to notice. But
> > it shows up on a high-performance disk.
> 
> So let's attack the problem properly:
> 
> 1) switch f2fs to a direct I/O implementation that does not suck
> 2) look into optimizing the iomap code to e.g. submit the bio once
>    it is larger than queue_io_opt() without failing to add to a bio
>    which would be annoying for things like huge pages.

There is a bio submit delay in iomap_dio_bio_actor() too. As the bio
size increases, bio_iov_iter_get_pages() in iomap_dio_bio_actor() takes
more time. I measured how much time is spent in bio_iov_iter_get_pages()
for each bio size with ftrace.

I added trace marks at the position below.
--------------
static loff_t
iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
		                struct iomap_dio *dio, struct iomap *iomap)
{
	... snip ...

	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
	do {
		... snip ...

		trace_mark_begin_end('B', "iomap_dio_bio_actor",
			"bio_iov_iter_get_pages", "bi_size", bio->bi_iter.bi_size, 0);
		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
		trace_mark_begin_end('E', "iomap_dio_bio_actor",
			"bio_iov_iter_get_pages", "bi_size", bio->bi_iter.bi_size, 0);
		... snip ...

	} while (nr_pages);
	... snip ...
}

The bio submit delay was 0.834ms for a 32MB bio (measured from the
begin mark to block_bio_queue).
----------
4154.574861: mark_begin_end: B|11511|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=0;
4154.575317: mark_begin_end: E|11511|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=34181120;
4154.575695: block_bio_queue: 7,5 R 719672 + 66760 [tiotest]

The bio submit delay was 0.027ms for a 1MB bio.
----------
4868.617791: mark_begin_end: B|19510|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=0;
4868.617807: mark_begin_end: E|19510|iomap_dio_bio_actor:bio_iov_iter_get_pages|bi_size=1048576;
4868.617818: block_bio_queue: 7,5 R 1118208 + 2048 [tiotest]

To optimize this, the current patch, or a similar approach, is needed in
bio_iov_iter_get_pages(). Is it OK to add a new function to set the bio
max size, as below, and to call it wherever a bio max size limit is
needed?

void blk_queue_bio_max_size(struct request_queue *q, unsigned int bytes)
{
	struct queue_limits *limits = &q->limits;
	unsigned int bio_max_size = round_up(bytes, PAGE_SIZE);

	limits->bio_max_bytes = max_t(unsigned int, bio_max_size,
			              BIO_MAX_VECS * PAGE_SIZE);
}
EXPORT_SYMBOL(blk_queue_bio_max_size);
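
A driver that wants such a cap could then call this from its queue
setup path, for example (hypothetical usage, with q being the driver's
request queue):

	/* hypothetical driver setup: cap merged bio size at 1MB */
	blk_queue_bio_max_size(q, SZ_1M);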


