linux-kernel.vger.kernel.org archive mirror
From: Changheun Lee <nanich.lee@samsung.com>
To: damien.lemoal@wdc.com
Cc: Johannes.Thumshirn@wdc.com, axboe@kernel.dk,
	jisoo2146.oh@samsung.com, junho89.kim@samsung.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	mj0123.lee@samsung.com, nanich.lee@samsung.com,
	seunghwan.hyun@samsung.com, sookwan7.kim@samsung.com,
	tj@kernel.org, woosung2.lee@samsung.com, yt0928.kim@samsung.com
Subject: Re: Re: [PATCH] bio: limit bio max size.
Date: Wed, 13 Jan 2021 15:39:36 +0900	[thread overview]
Message-ID: <20210113063936.4189-1-nanich.lee@samsung.com> (raw)
In-Reply-To: <CH2PR04MB6522091DE371F5EB7EE98200E7A90@CH2PR04MB6522.namprd04.prod.outlook.com>

>On 2021/01/13 13:01, Changheun Lee wrote:
>>> On 2021/01/12 21:14, Changheun Lee wrote:
>>>>> On 2021/01/12 17:52, Changheun Lee wrote:
>>>>>> From: "Changheun Lee" <nanich.lee@samsung.com>
>>>>>>
>>>>>> bio size can grow up to 4GB when multi-page bvec is enabled,
>>>>>> but sometimes this leads to inefficient behavior.
>>>>>> In the case of large-chunk direct I/O - e.g. a 64MB chunk read in user
>>>>>> space - all pages for the 64MB are merged into a single bio structure
>>>>>> if their memory addresses are physically contiguous. This delays the
>>>>>> first submit until the merge is complete.
>>>>>> The bio max size should be limited to a proper value.
>>>>>
>>>>> But merging physically contiguous pages into the same bvec + later automatic bio
>>>>> split on submit should give you better throughput for large IOs compared to
>>>>> having to issue a bio chain of smaller BIOs that are arbitrarily sized and will
>>>>> likely need splitting anyway (because of DMA boundaries etc).
>>>>>
>>>>> Do you have a specific case where you see higher performance with this patch
>>>>> applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary and too small
>>>>> considering that many hardware can execute larger IOs than that.
>>>>>
>>>>
>>>> When I tested a 32MB chunk read with O_DIRECT on Android, all pages of
>>>> the 32MB were merged into a single bio structure,
>>>> and the elapsed time to complete the merge was about 2ms.
>>>> That means the first bio submit happened only after 2ms.
>>>> If the bio size is limited to 1MB with this patch, the first bio submit
>>>> happens after about 100us, triggered by the bio_full() check.
>>>
>>> bio_submit() will split the large BIO case into multiple requests, while the
>>> small BIO case will likely result in only one or two requests. That likely
>>> explains the time difference here. However, for the large case, the 2ms will
>>> issue ALL requests needed for processing the entire 32MB user IO, while the
>>> 1MB bio case will need 32 different bio_submit() calls. So what is the actual
>>> total latency difference for the entire 32MB user IO ? That is, I think, what
>>> needs to be compared here.
>>>
>>> Also, what is your device max_sectors_kb and max queue depth ?
>>>
>> 
>> Without this patch, total latency for the 32MB read is about 19ms including
>> merge time. With this patch, total latency is about 17ms, also including
>> merge time.
>> The actual 32MB read time from the device is the same - about 16.7ms - at
>> the driver layer. There is no need to hold more I/O than max_sectors_kb
>> during bio merging.
>> My device is UFS; max_sectors_kb is 1MB and queue depth is 32.
>
>This may be due to the CPU being slow: it takes time to build the 32MB BIO,
>during which the device is idling. With the 1MB BIO limit, the first BIO hits
>the drive much more quickly, reducing idle time of the device and leading to
>higher throughput. But there are 31 more BIOs to build and issue after the first
>one... That does create a pipeline of requests keeping the device busy, but that
>also likely keeps your CPU a lot more busy building these additional BIOs.
>Overall, do you see a difference in CPU load for the 32MB BIO case vs the 1MB
>max BIO case ? Any increase in CPU load with the lower BIO size limit ?
>

CPU load is higher than in the 32MB bio case, because bio submit runs in
parallel with the merging. But I tested at the same CPU operating frequency,
so there is no difference in CPU speed.
Logically, the bio max size is 4GB now. I know that is not realistic, but
merging 4GB into a single bio would take a long time even on a fast CPU.
This is why I think the bio max size should be limited.

>> 
>>>> It's not a large delay and can't be observed with a low-speed device,
>>>> but reducing the merge delay is needed for high-speed devices.
>>>> With this patch I improved 512MB sequential read performance from
>>>> 1900MB/s to 2000MB/s on an Android platform.
>>>> As you said, 1MB might be too small for some devices,
>>>> so a method is needed to resize or select the bio max size.
>>>
>>> At the very least, I think that such limit should not be arbitrary as your patch
>>> proposes but rely on the device characteristics (e.g.
>>> max_hw_sectors_kb/max_sectors_kb and queue depth).
>>>
>> 
>> I agree with your opinion; I had the same idea. But that needs deeper
>> investigation: the proper time to set the limit, bio structure modifications, etc.
>
>Why would you need any BIO structure modifications ? Your patch is on the right
>track if limiting the BIO size is the right solution (I am still not completely
>convinced). E.g., the code:
>
>if (page_is_mergeable(bv, page, len, off, same_page)) {
>	if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>		*same_page = false;
>		return false;
>	}
>
>could just become:
>
>if (page_is_mergeable(bv, page, len, off, same_page)) {
>	if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>		*same_page = false;
>		return false;
>	}
>
>With bio_max_size() being something like:
>
>static inline size_t bio_max_size(struct bio *bio)
>{
>	sector_t max_sectors = blk_queue_get_max_sectors(bio->bi_disk->queue,
>							 bio_op(bio));
>
>	return max_sectors << SECTOR_SHIFT;
>}
>
>Note that this is not super efficient as a BIO maximum size depends on the BIO
>offset too (its start sector). So writing something similar to
>blk_rq_get_max_sectors() would probably be better.

Good suggestion. :)

>
>> The current patch is a simple one that sets a default bio max size.
>> Before multi-page bvec was applied, the bio max size in kernel 4.x was 1MB,
>> bounded by BIO_MAX_PAGES.
>> So I think a 1MB bio max size is a reasonable default.
>
>max_sectors_kb is always defined for any block device so I do not think there is
>a need for any arbitrary default value.
>
>Since such optimization likely very much depend on the speed of the system CPU
>and of the storage device used, it may be a good idea to have this configurable
>through sysfs. That is, bio_max_size() simply returns UINT_MAX leading to no
>change from the current behavior if the optimization is disabled (default) and
>max_sectors_kb if it is enabled.
>

OK, I agree with you. That will be the best approach for everyone for now.
I'll try to implement this.

>> 
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
>>>>>> ---
>>>>>>  block/bio.c         | 2 +-
>>>>>>  include/linux/bio.h | 3 ++-
>>>>>>  2 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/bio.c b/block/bio.c
>>>>>> index 1f2cc1fbe283..dbe14d675f28 100644
>>>>>> --- a/block/bio.c
>>>>>> +++ b/block/bio.c
>>>>>> @@ -877,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>>>>>>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>>>>>  
>>>>>>  		if (page_is_mergeable(bv, page, len, off, same_page)) {
>>>>>> -			if (bio->bi_iter.bi_size > UINT_MAX - len) {
>>>>>> +			if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>>>>>>  				*same_page = false;
>>>>>>  				return false;
>>>>>>  			}
>>>>>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>>>>>> index 1edda614f7ce..0f49b354b1f6 100644
>>>>>> --- a/include/linux/bio.h
>>>>>> +++ b/include/linux/bio.h
>>>>>> @@ -20,6 +20,7 @@
>>>>>>  #endif
>>>>>>  
>>>>>>  #define BIO_MAX_PAGES		256
>>>>>> +#define BIO_MAX_SIZE		(BIO_MAX_PAGES * PAGE_SIZE)
>>>>>>  
>>>>>>  #define bio_prio(bio)			(bio)->bi_ioprio
>>>>>>  #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
>>>>>> @@ -113,7 +114,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
>>>>>>  	if (bio->bi_vcnt >= bio->bi_max_vecs)
>>>>>>  		return true;
>>>>>>  
>>>>>> -	if (bio->bi_iter.bi_size > UINT_MAX - len)
>>>>>> +	if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len)
>>>>>>  		return true;
>>>>>>  
>>>>>>  	return false;
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Damien Le Moal
>>>>> Western Digital Research
>>>>
>>>
>>>
>>> -- 
>>> Damien Le Moal
>>> Western Digital Research
>>>
>> 
>> ---
>> Changheun Lee
>> Samsung Electronics
>> 
>> 
>
>
>-- 
>Damien Le Moal
>Western Digital Research
>

---
Changheun Lee
Samsung Electronics


Thread overview: 18+ messages
     [not found] <CGME20210112084819epcas1p2389fe5fd665e4ff7b41ad92344547294@epcas1p2.samsung.com>
2021-01-12  8:33 ` [PATCH] bio: limit bio max size Changheun Lee
2021-01-12  9:16   ` Damien Le Moal
     [not found]     ` <CGME20210112121356epcas1p124baa93f10eb3400539ba4db27c18955@epcas1p1.samsung.com>
2021-01-12 11:58       ` Changheun Lee
2021-01-13  1:16         ` Damien Le Moal
     [not found]           ` <CGME20210113040146epcas1p230596c7c3760471dca442d1f7ce4dc55@epcas1p2.samsung.com>
2021-01-13  3:46             ` Changheun Lee
2021-01-13  5:53               ` Damien Le Moal
     [not found]                 ` <CGME20210113065444epcas1p4b8ee3edb314a06b1a9f92fd0e38ca856@epcas1p4.samsung.com>
2021-01-13  6:39                   ` Changheun Lee [this message]
2021-01-13  7:12                     ` Damien Le Moal
2021-01-13  9:19               ` Ming Lei
2021-01-13  9:28                 ` Damien Le Moal
2021-01-13 10:24                   ` Ming Lei
2021-01-13 11:16                     ` Damien Le Moal
2021-01-13 11:47                       ` Ming Lei
2021-01-13 12:02                         ` Damien Le Moal
2021-01-14  3:52                           ` Ming Lei
2021-01-14  4:00                             ` Damien Le Moal
     [not found]                               ` <CGME20210114045019epcas1p16d4f5f258c2a3b290540ac640745764d@epcas1p1.samsung.com>
2021-01-14  4:35                                 ` Changheun Lee
2021-01-17 12:47   ` [bio] 70c9aa94e8: WARNING:at_block/bio.c:#bio_iov_iter_get_pages kernel test robot
