All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marek Szyprowski <m.szyprowski@samsung.com>
To: Changheun Lee <nanich.lee@samsung.com>,
	bvanassche@acm.org, Johannes.Thumshirn@wdc.com,
	asml.silence@gmail.com, axboe@kernel.dk, damien.lemoal@wdc.com,
	gregkh@linuxfoundation.org, hch@infradead.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ming.lei@redhat.com, osandov@fb.com, patchwork-bot@kernel.org,
	tj@kernel.org, tom.leiming@gmail.com
Cc: jisoo2146.oh@samsung.com, junho89.kim@samsung.com,
	mj0123.lee@samsung.com, seunghwan.hyun@samsung.com,
	sookwan7.kim@samsung.com, woosung2.lee@samsung.com,
	yt0928.kim@samsung.com,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: Re: [PATCH v8] bio: limit bio max size
Date: Mon, 26 Apr 2021 15:18:19 +0200	[thread overview]
Message-ID: <cb011703-4781-fd8e-d628-3dc488e0de7d@samsung.com> (raw)
In-Reply-To: <20210421094745.29660-1-nanich.lee@samsung.com>

Hi Changheu,

On 21.04.2021 11:47, Changheun Lee wrote:
> bio size can grow up to 4GB when muli-page bvec is enabled.
> but sometimes it would lead to inefficient behaviors.
> in case of large chunk direct I/O, - 32MB chunk read in user space -
> all pages for 32MB would be merged to a bio structure if the pages
> physical addresses are contiguous. it makes some delay to submit
> until merge complete. bio max size should be limited to a proper size.
>
> When 32MB chunk read with direct I/O option is coming from userspace,
> kernel behavior is below now in do_direct_IO() loop. it's timeline.
>
>   | bio merge for 32MB. total 8,192 pages are merged.
>   | total elapsed time is over 2ms.
>   |------------------ ... ----------------------->|
>                                                   | 8,192 pages merged a bio.
>                                                   | at this time, first bio submit is done.
>                                                   | 1 bio is split to 32 read request and issue.
>                                                   |--------------->
>                                                    |--------------->
>                                                     |--------------->
>                                                                ......
>                                                                     |--------------->
>                                                                      |--------------->|
>                            total 19ms elapsed to complete 32MB read done from device. |
>
> If bio max size is limited with 1MB, behavior is changed below.
>
>   | bio merge for 1MB. 256 pages are merged for each bio.
>   | total 32 bio will be made.
>   | total elapsed time is over 2ms. it's same.
>   | but, first bio submit timing is fast. about 100us.
>   |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
>        | 256 pages merged a bio.
>        | at this time, first bio submit is done.
>        | and 1 read request is issued for 1 bio.
>        |--------------->
>             |--------------->
>                  |--------------->
>                                        ......
>                                                   |--------------->
>                                                    |--------------->|
>          total 17ms elapsed to complete 32MB read done from device. |
>
> As a result, read request issue timing is faster if bio max size is limited.
> Current kernel behavior with multipage bvec, super large bio can be created.
> And it lead to delay first I/O request issue.
>
> Signed-off-by: Changheun Lee <nanich.lee@samsung.com>

This patch landed in linux-next 20210426 as commit 42fb54fbc707 ("bio: 
limit bio max size"). Sadly it causes the following regression during 
boot on my test systems:

8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000023c
pgd = (ptrval)
[0000023c] *pgd=00000000
Internal error: Oops: 5 [#2] SMP ARM
Modules linked in:
CPU: 0 PID: 186 Comm: systemd-udevd Tainted: G      D 
5.12.0-next-20210426 #3045
Hardware name: Generic DT based system
PC is at bio_add_hw_page+0x58/0x1fc
LR is at bio_add_pc_page+0x40/0x5c
pc : [<c06c5bf0>]    lr : [<c06c5dd4>]    psr: 20000013
sp : c3cc7de0  ip : ffffffff  fp : 00000000
r10: 00000cc0  r9 : c20b2000  r8 : c21b5680
r7 : dbc51b80  r6 : c30d0540  r5 : 00000014  r4 : c21b5680
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c30d0540
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 43ccc06a  DAC: 00000051
Register r0 information: slab request_queue start c30d0540 pointer offset 0
Register r1 information: NULL pointer
Register r2 information: NULL pointer
Register r3 information: NULL pointer
Register r4 information: slab kmalloc-128 start c21b5680 pointer offset 
0 size 128
Register r5 information: non-paged memory
Register r6 information: slab request_queue start c30d0540 pointer offset 0
Register r7 information: non-slab/vmalloc memory
Register r8 information: slab kmalloc-128 start c21b5680 pointer offset 
0 size 128
Register r9 information: slab kmalloc-4k start c20b2000 pointer offset 0 
size 4096
Register r10 information: non-paged memory
Register r11 information: NULL pointer
Register r12 information: non-paged memory
Process systemd-udevd (pid: 186, stack limit = 0x(ptrval))
Stack: (0xc3cc7de0 to 0xc3cc8000)
...
[<c06c5bf0>] (bio_add_hw_page) from [<c06c5dd4>] (bio_add_pc_page+0x40/0x5c)
[<c06c5dd4>] (bio_add_pc_page) from [<c06cf0ac>] 
(blk_rq_map_kern+0x234/0x304)
[<c06cf0ac>] (blk_rq_map_kern) from [<c0a54634>] (serial_show+0x64/0xd4)
[<c0a54634>] (serial_show) from [<c0a228ac>] (dev_attr_show+0x18/0x48)
[<c0a228ac>] (dev_attr_show) from [<c054721c>] (sysfs_kf_seq_show+0x88/0xf4)
[<c054721c>] (sysfs_kf_seq_show) from [<c04d7a44>] 
(seq_read_iter+0x10c/0x4bc)
[<c04d7a44>] (seq_read_iter) from [<c04adf60>] (vfs_read+0x1d4/0x2e0)
[<c04adf60>] (vfs_read) from [<c04ae47c>] (ksys_read+0x5c/0xd0)
[<c04ae47c>] (ksys_read) from [<c03000c0>] (ret_fast_syscall+0x0/0x58)
Exception stack(0xc3cc7fa8 to 0xc3cc7ff0)
...
Code: e1520003 9a000021 e5942004 e5941020 (e592223c)
---[ end trace 51c4d8003ec70244 ]---

It can be also reproduced with qemu and ARM 32bit virt machine. 
Reverting it on top of linux-next 20210426 fixes the issue. If you need 
more information, let me know.

> ---
>   block/bio.c            | 9 ++++++++-
>   block/blk-settings.c   | 5 +++++
>   include/linux/bio.h    | 4 +++-
>   include/linux/blkdev.h | 2 ++
>   4 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 50e579088aca..9e5061ecc317 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -255,6 +255,13 @@ void bio_init(struct bio *bio, struct bio_vec *table,
>   }
>   EXPORT_SYMBOL(bio_init);
>   
> +unsigned int bio_max_size(struct bio *bio)
> +{
> +	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +
> +	return q->limits.bio_max_bytes;
> +}
> +
>   /**
>    * bio_reset - reinitialize a bio
>    * @bio:	bio to reset
> @@ -866,7 +873,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>   		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>   
>   		if (page_is_mergeable(bv, page, len, off, same_page)) {
> -			if (bio->bi_iter.bi_size > UINT_MAX - len) {
> +			if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>   				*same_page = false;
>   				return false;
>   			}
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b4aa2f37fab6..cd3dcb5afe50 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -37,6 +37,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
>    */
>   void blk_set_default_limits(struct queue_limits *lim)
>   {
> +	lim->bio_max_bytes = UINT_MAX;
>   	lim->max_segments = BLK_MAX_SEGMENTS;
>   	lim->max_discard_segments = 1;
>   	lim->max_integrity_segments = 0;
> @@ -168,6 +169,10 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
>   				 limits->logical_block_size >> SECTOR_SHIFT);
>   	limits->max_sectors = max_sectors;
>   
> +	if (check_shl_overflow(max_sectors, SECTOR_SHIFT,
> +				&limits->bio_max_bytes))
> +		limits->bio_max_bytes = UINT_MAX;
> +
>   	q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
>   }
>   EXPORT_SYMBOL(blk_queue_max_hw_sectors);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d0246c92a6e8..e5add63da3af 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
>   	return NULL;
>   }
>   
> +extern unsigned int bio_max_size(struct bio *bio);
> +
>   /**
>    * bio_full - check if the bio is full
>    * @bio:	bio to check
> @@ -119,7 +121,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
>   	if (bio->bi_vcnt >= bio->bi_max_vecs)
>   		return true;
>   
> -	if (bio->bi_iter.bi_size > UINT_MAX - len)
> +	if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
>   		return true;
>   
>   	return false;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 158aefae1030..c205d60ac611 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -312,6 +312,8 @@ enum blk_zoned_model {
>   };
>   
>   struct queue_limits {
> +	unsigned int		bio_max_bytes;
> +
>   	unsigned long		bounce_pfn;
>   	unsigned long		seg_boundary_mask;
>   	unsigned long		virt_boundary_mask;

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


  parent reply	other threads:[~2021-04-26 13:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20210421100544epcas1p13c2c86e84102f0955dd591f72e45756a@epcas1p1.samsung.com>
2021-04-21  9:47 ` [PATCH v8] bio: limit bio max size Changheun Lee
2021-04-21 16:29   ` Bart Van Assche
     [not found]     ` <CGME20210422000922epcas1p4bb2d0220652f3c497f063719f82bc829@epcas1p4.samsung.com>
2021-04-21 23:51       ` Changheun Lee
2021-04-22 20:33   ` Bart Van Assche
2021-04-23 11:36   ` Damien Le Moal
2021-04-23 14:38   ` Jens Axboe
2021-04-25  2:20   ` [bio] 803f54ef52: BUG:kernel_NULL_pointer_dereference,address kernel test robot
2021-04-25  2:20     ` kernel test robot
2021-04-26 13:18   ` Marek Szyprowski [this message]
2021-04-26 20:09     ` [PATCH v8] bio: limit bio max size Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb011703-4781-fd8e-d628-3dc488e0de7d@samsung.com \
    --to=m.szyprowski@samsung.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=b.zolnierkie@samsung.com \
    --cc=bvanassche@acm.org \
    --cc=damien.lemoal@wdc.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=jisoo2146.oh@samsung.com \
    --cc=junho89.kim@samsung.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=mj0123.lee@samsung.com \
    --cc=nanich.lee@samsung.com \
    --cc=osandov@fb.com \
    --cc=patchwork-bot@kernel.org \
    --cc=seunghwan.hyun@samsung.com \
    --cc=sookwan7.kim@samsung.com \
    --cc=tj@kernel.org \
    --cc=tom.leiming@gmail.com \
    --cc=woosung2.lee@samsung.com \
    --cc=yt0928.kim@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.