From: Changheun Lee <nanich.lee@samsung.com>
To: ming.lei@redhat.com
Cc: Johannes.Thumshirn@wdc.com, asml.silence@gmail.com,
	axboe@kernel.dk, damien.lemoal@wdc.com, hch@infradead.org,
	jisoo2146.oh@samsung.com, junho89.kim@samsung.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	mj0123.lee@samsung.com, nanich.lee@samsung.com, osandov@fb.com,
	patchwork-bot@kernel.org, seunghwan.hyun@samsung.com,
	sookwan7.kim@samsung.com, tj@kernel.org, tom.leiming@gmail.com,
	woosung2.lee@samsung.com, yt0928.kim@samsung.com
Subject: Re: [PATCH v4 1/2] bio: limit bio max size
Date: Tue,  2 Feb 2021 13:12:04 +0900
Message-ID: <20210202041204.28995-1-nanich.lee@samsung.com>
In-Reply-To: <20210201071413.GC9481@T590>

> On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > > bio size can grow up to 4GB when multi-page bvec is enabled,
> > > > but sometimes this leads to inefficient behavior.
> > > > In the case of large-chunk direct I/O - a 32MB chunk read in user space -
> > > > all pages for the 32MB are merged into one bio structure if the pages'
> > > > physical addresses are contiguous. This delays submission until the
> > > > merge is complete. bio max size should be limited to a proper size.
> > > > 
> > > > When a 32MB chunk read with the direct I/O option comes from userspace,
> > > > the current kernel behavior is as below. This is the timeline.
> > > > 
> > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > >  | total elapsed time is over 2ms.
> > > >  |------------------ ... ----------------------->|
> > > >                                                  | 8,192 pages merged a bio.
> > > >                                                  | at this time, first bio submit is done.
> > > >                                                  | 1 bio is split to 32 read request and issue.
> > > >                                                  |--------------->
> > > >                                                   |--------------->
> > > >                                                    |--------------->
> > > >                                                               ......
> > > >                                                                    |--------------->
> > > >                                                                     |--------------->|
> > > >                           total 19ms elapsed to complete 32MB read done from device. |
> > > > 
> > > > If bio max size is limited to 1MB, the behavior changes as below.
> > > > 
> > > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > >  | in total, 32 bios will be made.
> > > >  | total elapsed time is still over 2ms - the same as before.
> > > >  | but the first bio submit happens much earlier, after about 100us.
> > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > >       | 256 pages merged a bio.
> > > >       | at this time, first bio submit is done.
> > > >       | and 1 read request is issued for 1 bio.
> > > >       |--------------->
> > > >            |--------------->
> > > >                 |--------------->
> > > >                                       ......
> > > >                                                  |--------------->
> > > >                                                   |--------------->|
> > > >         total 17ms elapsed to complete 32MB read done from device. |
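
(To make the scenario above concrete: the I/O pattern is essentially one plain
32MB O_DIRECT read per call. A minimal user-space sketch that reproduces the
pattern is below - the device path, the 2MB buffer alignment and the missing
error handling are my assumptions for illustration, not the benchmark's actual
code.)

/* Hypothetical reproduction of the 32MB direct read described above. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (32UL << 20)                      /* 32MB read chunk */

int main(void)
{
        void *buf;
        int fd = open("/dev/sde", O_RDONLY | O_DIRECT); /* assumed device path */

        if (fd < 0 || posix_memalign(&buf, 2UL << 20, CHUNK))  /* 2MB aligned */
                return 1;

        /* one 32MB chunk -> one large bio built page by page today */
        ssize_t n = read(fd, buf, CHUNK);
        printf("read %zd bytes\n", n);

        free(buf);
        close(fd);
        return 0;
}
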
> > > 
> > > Can you share with us whether enabling THP in your application avoids this
> > > issue? BTW, you need to make the 32MB buffer aligned with the huge page size.
> > > IMO, THP fits your case perfectly.
> > > 
> > 
> > THP is already enabled in my environment, as shown below. It has no effect.
> > 
> > cat /sys/kernel/mm/transparent_hugepage/enabled
> > [always] madvise never
> 
> The 32MB user buffer needs to be huge page size aligned. If your system
> supports bcc/bpftrace, it is quite easy to check if the buffer is
> aligned.
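
(For reference, the condition to check is just whether the buffer address is a
multiple of the 2MB huge page size. Once the buffer address is captured - e.g.
with a tracer as you suggest - the test itself is a simple mask check. A trivial
sketch, where the malloc'd buffer only stands in for the benchmark's real
buffer:)

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        void *buf = malloc(32UL << 20);         /* stand-in for the app's buffer */
        uintptr_t mask = (2UL << 20) - 1;       /* 2MB huge page size - 1 */

        printf("buf=%p 2MB-aligned=%d\n", buf, ((uintptr_t)buf & mask) == 0);
        free(buf);
        return 0;
}
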
> 
> > 
> > This issue was reported by a performance benchmark application in the open
> > market. I can't control how applications in the open market behave, and it's
> > not only my own case: this issue can occur in many mobile environments.
> > At least, I have confirmed this problem on Exynos and Qualcomm chipsets.
> 
> You just said it takes 2ms to build the 32MB bio, but you never investigated the
> reason. I guess it is from get_user_pages_fast(), but it may be something else. Can
> you dig further into the reason? Maybe it is an arm64-specific issue.
> 
> BTW, bio_iov_iter_get_pages() just takes ~200us on one x86_64 VM with THP, which is
> observed via bcc/funclatency when running the following workload:
> 

I think you focused on bio_iov_iter_get_pages() because I only commented on the
page merge delay. Sorry about that - I missed some details of this issue.
Actually, there are many operations in the while-loop of do_direct_IO(), and the
page merge is just one of them. The page merge is done via dio_send_cur_page()
in that while-loop. Below is the call stack.

__bio_try_merge_page+0x4c/0x614
bio_add_page+0x40/0x12c
dio_send_cur_page+0x13c/0x374
submit_page_section+0xb4/0x304
do_direct_IO+0x3d4/0x854
do_blockdev_direct_IO+0x488/0xa18
__blockdev_direct_IO+0x30/0x3c
f2fs_direct_IO+0x6d0/0xb80
generic_file_read_iter+0x284/0x45c
f2fs_file_read_iter+0x3c/0xac
__vfs_read+0x19c/0x204
vfs_read+0xa4/0x144

The 2ms delay is not caused by the page merge operation alone; it includes many
other operations too. But all of those operations, including the page merge,
have to be executed more times as the bio size grows.
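
Just to make the intent concrete, the effect of the cap can be sketched as a
standalone simplification. This is NOT the actual kernel code; the 1MB cap value
and the structure below are assumptions taken only from the example numbers
above:

/* Standalone simplification of the "limit bio max size" idea. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE        4096u
#define BIO_MAX_SIZE_CAP (1u << 20)     /* assumed 1MB cap, vs. ~4GB today */

struct bio_sketch {
        uint32_t bi_size;               /* bytes merged into this bio so far */
};

/* The merge loop stops at the cap, so a full bio is submitted after 256
 * pages (~100us in the timeline above) instead of 8,192 pages (~2ms). */
static bool can_merge_page(const struct bio_sketch *bio, uint32_t len)
{
        return bio->bi_size + len <= BIO_MAX_SIZE_CAP;
}

int main(void)
{
        struct bio_sketch bio = { 0 };
        unsigned int pages = 0;

        while (can_merge_page(&bio, PAGE_SIZE)) {
                bio.bi_size += PAGE_SIZE;
                pages++;
        }
        printf("pages merged before submit: %u\n", pages); /* 256 with a 1MB cap */
        return 0;
}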

> [root@ktest-01 test]# cat fio.job
> [global]
> bs=32768k
> rw=randread
> iodepth=1
> ioengine=psync
> direct=1
> runtime=20
> time_based
> 
> group_reporting=0
> ramp_time=5
> 
> [diotest]
> filename=/dev/sde
> 
> 
> [root@ktest-01 func]# /usr/share/bcc/tools/funclatency bio_iov_iter_get_pages
> Tracing 1 functions for "bio_iov_iter_get_pages"... Hit Ctrl-C to end.
> ^C
> nsecs               : count     distribution
> 0 -> 1          : 0        |                                        |
> 2 -> 3          : 0        |                                        |
> 4 -> 7          : 0        |                                        |
> 8 -> 15         : 0        |                                        |
> 16 -> 31         : 0        |                                        |
> 32 -> 63         : 0        |                                        |
> 64 -> 127        : 0        |                                        |
> 128 -> 255        : 0        |                                        |
> 256 -> 511        : 0        |                                        |
> 512 -> 1023       : 0        |                                        |
> 1024 -> 2047       : 0        |                                        |
> 2048 -> 4095       : 0        |                                        |
> 4096 -> 8191       : 0        |                                        |
> 8192 -> 16383      : 0        |                                        |
> 16384 -> 32767      : 0        |                                        |
> 32768 -> 65535      : 0        |                                        |
> 65536 -> 131071     : 0        |                                        |
> 131072 -> 262143     : 1842     |****************************************|
> 262144 -> 524287     : 125      |**                                      |
> 524288 -> 1048575    : 6        |                                        |
> 1048576 -> 2097151    : 0        |                                        |
> 2097152 -> 4194303    : 1        |                                        |
> 4194304 -> 8388607    : 0        |                                        |
> 8388608 -> 16777215   : 1        |                                        |
> Detaching...
> 
> 
> 
> -- 
> Ming
> 

---
Changheun Lee
Samsung Electronics
