* [PATCH] blk-mq: modify hybrid sleep time to aggressive
@ 2020-11-18  0:47 Dongjoo Seo
  2020-11-18  7:07 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Dongjoo Seo @ 2020-11-18  0:47 UTC (permalink / raw)
  To: axboe, hch, ming.lei; +Cc: linux-block

The current sleep time for hybrid polling is half of the mean completion time.
Sleeping for half of the mean is good for reducing cpu utilization,
but cpu utilization is still high.
This patch helps reduce cpu utilization further.

1 and 2 below are my test hardware setups.

1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G

   |  Classic Polling   |  Hybrid Polling    |  This patch
   | cpu util | IOPS(k) | cpu util | IOPS(k) | cpu util | IOPS(k)
---+----------+---------+----------+---------+----------+--------
1. |  99.96   |  491    |  56.98   |  467    |  35.98   |  442
2. |  99.94   |  582    |  56.3    |  582    |  35.28   |  582

cpu util is the sum of sys and user utilization.

I used 4k random reads for this test because that case is the
worst case for I/O performance. My fio setup is below.

name=pollTest
ioengine=pvsync2
hipri
direct=1
size=100%
randrepeat=0
time_based
ramp_time=0
norandommap
refill_buffers
log_avg_msec=1000
log_max_value=1
group_reporting
filename=/dev/nvme0n1
[rd_rnd_qd_1_4k_1w]
bs=4k
iodepth=32
numjobs=[num of cpus]
rw=randread
runtime=60
write_bw_log=bw_rd_rnd_qd_1_4k_1w
write_iops_log=iops_rd_rnd_qd_1_4k_1w
write_lat_log=lat_rd_rnd_qd_1_4k_1w
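For anyone reproducing this, hybrid polling has to be selected on the queue first. As far as I know the sysfs knobs look like this (the device name is an example; see Documentation/block/queue-sysfs.rst for the authoritative description):

```shell
# io_poll_delay selects the polling mode:
#   -1 -> classic polling: spin until the request completes
#    0 -> hybrid polling: sleep for an estimated time, then poll
#   >0 -> sleep that fixed number of microseconds, then poll
echo 0 > /sys/block/nvme0n1/queue/io_poll_delay

# io_poll must report 1 for polled I/O to be usable at all
cat /sys/block/nvme0n1/queue/io_poll
```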

Thanks

Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1b25ec2fe9be..c3d578416899 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 		return ret;
 
 	if (q->poll_stat[bucket].nr_samples)
-		ret = (q->poll_stat[bucket].mean + 1) / 2;
-
+		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
 	return ret;
 }
 
-- 
2.17.1



* Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive
  2020-11-18  0:47 [PATCH] blk-mq: modify hybrid sleep time to aggressive Dongjoo Seo
@ 2020-11-18  7:07 ` Christoph Hellwig
  2020-11-18  7:16   ` Damien Le Moal
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2020-11-18  7:07 UTC (permalink / raw)
  To: Dongjoo Seo; +Cc: axboe, hch, ming.lei, linux-block, Damien Le Moal

Adding Damien who wrote this code.

On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
> Current sleep time for hybrid polling is half of mean time.
> The 'half' sleep time is good for minimizing the cpu utilization.
> But, the problem is that its cpu utilization is still high.
> this patch can help to minimize the cpu utilization side.
> 
> Below 1,2 is my test hardware sets.
> 
> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
> 
>         |  Classic Polling | Hybrid Polling  | this Patch
> -----------------------------------------------------------------
>         cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
> -----------------------------------------------------------------
> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
> -----------------------------------------------------------------
> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
> 
> cpu util means that sum of sys and user util.
> 
> I used 4k rand read for this test.
> because that case is worst case of I/O performance side.
> below one is my fio setup.
> 
> name=pollTest
> ioengine=pvsync2
> hipri
> direct=1
> size=100%
> randrepeat=0
> time_based
> ramp_time=0
> norandommap
> refill_buffers
> log_avg_msec=1000
> log_max_value=1
> group_reporting
> filename=/dev/nvme0n1
> [rd_rnd_qd_1_4k_1w]
> bs=4k
> iodepth=32
> numjobs=[num of cpus]
> rw=randread
> runtime=60
> write_bw_log=bw_rd_rnd_qd_1_4k_1w
> write_iops_log=iops_rd_rnd_qd_1_4k_1w
> write_lat_log=lat_rd_rnd_qd_1_4k_1w
> 
> Thanks
> 
> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
> ---
>  block/blk-mq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 1b25ec2fe9be..c3d578416899 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>  		return ret;
>  
>  	if (q->poll_stat[bucket].nr_samples)
> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
> -
> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>  	return ret;
>  }
>  
> -- 
> 2.17.1
> 
---end quoted text---


* Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive
  2020-11-18  7:07 ` Christoph Hellwig
@ 2020-11-18  7:16   ` Damien Le Moal
  2020-11-18  9:26     ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2020-11-18  7:16 UTC (permalink / raw)
  To: hch, Dongjoo Seo; +Cc: axboe, ming.lei, linux-block, sbates

On 2020/11/18 16:07, Christoph Hellwig wrote:
> Adding Damien who wrote this code.

Nope. It wasn't me. I think it was Stephen Bates:

commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")

So +Stephen.


> 
> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>> Current sleep time for hybrid polling is half of mean time.
>> The 'half' sleep time is good for minimizing the cpu utilization.
>> But, the problem is that its cpu utilization is still high.
>> this patch can help to minimize the cpu utilization side.
>>
>> Below 1,2 is my test hardware sets.
>>
>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
>>
>>         |  Classic Polling | Hybrid Polling  | this Patch
>> -----------------------------------------------------------------
>>         cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
>> -----------------------------------------------------------------
>> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
>> -----------------------------------------------------------------
>> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
>>
>> cpu util means that sum of sys and user util.
>>
>> I used 4k rand read for this test.
>> because that case is worst case of I/O performance side.
>> below one is my fio setup.
>>
>> name=pollTest
>> ioengine=pvsync2
>> hipri
>> direct=1
>> size=100%
>> randrepeat=0
>> time_based
>> ramp_time=0
>> norandommap
>> refill_buffers
>> log_avg_msec=1000
>> log_max_value=1
>> group_reporting
>> filename=/dev/nvme0n1
>> [rd_rnd_qd_1_4k_1w]
>> bs=4k
>> iodepth=32
>> numjobs=[num of cpus]
>> rw=randread
>> runtime=60
>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>
>> Thanks
>>
>> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
>> ---
>>  block/blk-mq.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 1b25ec2fe9be..c3d578416899 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>>  		return ret;
>>  
>>  	if (q->poll_stat[bucket].nr_samples)
>> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
>> -
>> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>  	return ret;
>>  }
>>  
>> -- 
>> 2.17.1
>>
> ---end quoted text---
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive
  2020-11-18  7:16   ` Damien Le Moal
@ 2020-11-18  9:26     ` Pavel Begunkov
       [not found]       ` <7F6FFFCB-3FD1-4A7B-8D30-FF4BBAD4AEA4@gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2020-11-18  9:26 UTC (permalink / raw)
  To: Damien Le Moal, hch, Dongjoo Seo; +Cc: axboe, ming.lei, linux-block, sbates

On 18/11/2020 07:16, Damien Le Moal wrote:
> On 2020/11/18 16:07, Christoph Hellwig wrote:
>> Adding Damien who wrote this code.
> 
> Nope. It wasn't me. I think it was Stephen Bates:
> 
> commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
> 
> So +Stephen.
>>
>> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>>> Current sleep time for hybrid polling is half of mean time.
>>> The 'half' sleep time is good for minimizing the cpu utilization.
>>> But, the problem is that its cpu utilization is still high.
>>> this patch can help to minimize the cpu utilization side.

This won't work well. When I was experimenting I saw that half the mean
is actually too much for fast enough requests, like <20us 4K writes;
it oversleeps them. Even more, I'm afraid of getting into a vicious
cycle, where oversleeping increases the statistical mean, which increases
the sleep time, which again increases the stat mean, and so on. That is
what happened for me when the scheme was too aggressive.

I actually sent patches [1] for automatic dynamic sleep time
adjustment once, but nobody cared.

[1] https://lkml.org/lkml/2019/4/30/117

>>>
>>> Below 1,2 is my test hardware sets.
>>>
>>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
>>>
>>>         |  Classic Polling | Hybrid Polling  | this Patch
>>> -----------------------------------------------------------------
>>>         cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
>>> -----------------------------------------------------------------
>>> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
>>> -----------------------------------------------------------------
>>> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
>>>
>>> cpu util means that sum of sys and user util.
>>>
>>> I used 4k rand read for this test.
>>> because that case is worst case of I/O performance side.
>>> below one is my fio setup.
>>>
>>> name=pollTest
>>> ioengine=pvsync2
>>> hipri
>>> direct=1
>>> size=100%
>>> randrepeat=0
>>> time_based
>>> ramp_time=0
>>> norandommap
>>> refill_buffers
>>> log_avg_msec=1000
>>> log_max_value=1
>>> group_reporting
>>> filename=/dev/nvme0n1
>>> [rd_rnd_qd_1_4k_1w]
>>> bs=4k
>>> iodepth=32
>>> numjobs=[num of cpus]
>>> rw=randread
>>> runtime=60
>>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>>
>>> Thanks
>>>
>>> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
>>> ---
>>>  block/blk-mq.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 1b25ec2fe9be..c3d578416899 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>>>  		return ret;
>>>  
>>>  	if (q->poll_stat[bucket].nr_samples)
>>> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
>>> -
>>> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>>  	return ret;
>>>  }
>>>  
>>> -- 
>>> 2.17.1
>>>
>> ---end quoted text---
>>
> 
> 

-- 
Pavel Begunkov


* Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive
       [not found]       ` <7F6FFFCB-3FD1-4A7B-8D30-FF4BBAD4AEA4@gmail.com>
@ 2020-11-18 14:17         ` Pavel Begunkov
       [not found]           ` <CABM9hu3FE6ZZL=oWznbJUw2i9i8qJ1AYKotg_uEeAe1Vu+8Ong@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2020-11-18 14:17 UTC (permalink / raw)
  To: dongjoo seo; +Cc: Damien Le Moal, hch, axboe, ming.lei, linux-block, sbates

On 18/11/2020 10:35, dongjoo seo wrote:
> I agree with your opinion, and your patch is also a good approach.
> How about combining them? An adaptive solution with 3/4.

I couldn't disclose numbers back then, but thanks to the steeply skewed
latency distribution of NAND SSDs, it was actually adjusting itself
to ~3/4 automatically for QD1 and long enough requests (~75+ us).
Also, if "max(sleep_ns, half_mean)" is removed, it kept the
time below 1/2 for fast requests (less than ~30us), and that is a
good thing because it was constantly oversleeping them.
Though new ultra low-latency SSDs have come out since then.

The real problem is to find anyone who actually uses it, otherwise
it's just a chunk of dead code. Do you? Anyone? I remember once it
was completely broken for months, but that was barely noticed.


> Because if we get intensive workloads, then we need to
> decrease overall cpu utilization even with [1].
> 
> [1] https://lkml.org/lkml/2019/4/30/117 <https://lkml.org/lkml/2019/4/30/117>
> 
>> On Nov 18, 2020, at 6:26 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 18/11/2020 07:16, Damien Le Moal wrote:
>>> On 2020/11/18 16:07, Christoph Hellwig wrote:
>>>> Adding Damien who wrote this code.
>>>
>>> Nope. It wasn't me. I think it was Stephen Bates:
>>>
>>> commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
>>>
>>> So +Stephen.
>>>>
>>>> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>>>>> Current sleep time for hybrid polling is half of mean time.
>>>>> The 'half' sleep time is good for minimizing the cpu utilization.
>>>>> But, the problem is that its cpu utilization is still high.
>>>>> this patch can help to minimize the cpu utilization side.
>>
>> This won't work well. When I was experimenting I saw that half mean
>> is actually is too much for fast enough requests, like <20us 4K writes,
>> it's oversleeping them. Even more I'm afraid of getting in a vicious
>> cycle, when oversleeping increases statistical mean, that increases
>> sleep time, that again increases stat mean, and so on. That what
>> happened for me when the scheme was too aggressive.
>>
>> I actually sent once patches [1] for automatic dynamic sleep time
>> adjustment, but nobody cared.
>>
>> [1] https://lkml.org/lkml/2019/4/30/117 <https://lkml.org/lkml/2019/4/30/117>
>>
>>>>>
>>>>> Below 1,2 is my test hardware sets.
>>>>>
>>>>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>>>>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G
>>>>>
>>>>>        |  Classic Polling | Hybrid Polling  | this Patch
>>>>> -----------------------------------------------------------------
>>>>>        cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
>>>>> -----------------------------------------------------------------
>>>>> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
>>>>> -----------------------------------------------------------------
>>>>> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
>>>>>
>>>>> cpu util means that sum of sys and user util.
>>>>>
>>>>> I used 4k rand read for this test.
>>>>> because that case is worst case of I/O performance side.
>>>>> below one is my fio setup.
>>>>>
>>>>> name=pollTest
>>>>> ioengine=pvsync2
>>>>> hipri
>>>>> direct=1
>>>>> size=100%
>>>>> randrepeat=0
>>>>> time_based
>>>>> ramp_time=0
>>>>> norandommap
>>>>> refill_buffers
>>>>> log_avg_msec=1000
>>>>> log_max_value=1
>>>>> group_reporting
>>>>> filename=/dev/nvme0n1
>>>>> [rd_rnd_qd_1_4k_1w]
>>>>> bs=4k
>>>>> iodepth=32
>>>>> numjobs=[num of cpus]
>>>>> rw=randread
>>>>> runtime=60
>>>>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>>>>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>>>>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>>>>
>>>>> Thanks
>>>>>
>>>>> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
>>>>> ---
>>>>> block/blk-mq.c | 3 +--
>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 1b25ec2fe9be..c3d578416899 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>>>>> 		return ret;
>>>>>
>>>>> 	if (q->poll_stat[bucket].nr_samples)
>>>>> -		ret = (q->poll_stat[bucket].mean + 1) / 2;
>>>>> -
>>>>> +		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> -- 
>>>>> 2.17.1
>>>>>
>>>> ---end quoted text---
>>>>
>>>
>>>
>>
>> -- 
>> Pavel Begunkov
> 
> 

-- 
Pavel Begunkov


* Re: [PATCH] blk-mq: modify hybrid sleep time to aggressive
       [not found]           ` <CABM9hu3FE6ZZL=oWznbJUw2i9i8qJ1AYKotg_uEeAe1Vu+8Ong@mail.gmail.com>
@ 2020-11-19 17:51             ` Pavel Begunkov
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Begunkov @ 2020-11-19 17:51 UTC (permalink / raw)
  To: dongjoo seo; +Cc: Damien Le Moal, hch, axboe, ming.lei, linux-block, sbates

On 18/11/2020 15:12, dongjoo seo wrote:
> Actually, I worked with many-core machines with NVMe devices.
> They have 120~192 cpu cores and lots of memory and disk devices.
> For that, I checked the scalability of the block layer and the NVMe device driver.

I rather mean not test setups but production servers or large
sets of users actually using it.

AFAIK "normal users" like laptops and so on don't even use iopoll;
that's too wasteful. Maybe someone here can correct me.

> During that process, I found polling methodologies like io_uring and SPDK.

Just to note, io_uring is not iopoll-centric; internally it
goes through a similar code path as HIPRI reads/writes, i.e. using
the mentioned NVMe driver polling.

> But to apply those engines, we need to modify the application's code.
> That's why we are interested in polling in the NVMe device driver.
> 
> And you are right that a normal user cannot recognize the
> difference between the previous approach and the polling-based one.
> However, I believe the spread of ultra low-latency devices will make
> this relevant.

Ultra low-latency devices make hybrid polling not as useful as
before because a) it's harder to get the statistics right with all the
discrepancies in the host system, and b) tighter gaps make the relative
overhead of that sleep higher and make oversleeping easier.

> 
> Anyway, Thank you for your opinion.
> 
On Wed, Nov 18, 2020 at 11:20 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
> 
>> On 18/11/2020 10:35, dongjoo seo wrote:
>>> I agree with your opinion. And your patch is also good approach.
>>> How about combine it? Adaptive solution with 3/4.
>>
>> I couldn't disclose numbers back then, but thanks to a steep skewed
>> latency distribution of NAND/SSDs, it actually was automatically
>> adjusting it to ~3/4 for QD1 and long enough requests (~75+ us).
>> Also, if "max(sleep_ns, half_mean)" is removed, it was keeping the
>> time below 1/2 for fast requests (less than ~30us), and that is a
>> good thing because it was constantly oversleeping them.
>> Though new ultra low-latency SSDs came since then.
>>
>> The real problem is to find anyone who actually uses it, otherwise
>> it's just a chunk of dead code. Do you? Anyone? I remember once it
>> was completely broken for months, but that was barely noticed.
>>
>>
>>> Because, if we get the intensive workloads then we need to
>>> decrease the whole cpu utilization even with [1].
>>>
>>> [1] https://lkml.org/lkml/2019/4/30/117 <
>> https://lkml.org/lkml/2019/4/30/117>
>>>
>>>> On Nov 18, 2020, at 6:26 PM, Pavel Begunkov <asml.silence@gmail.com>
>> wrote:
>>>>
>>>> On 18/11/2020 07:16, Damien Le Moal wrote:
>>>>> On 2020/11/18 16:07, Christoph Hellwig wrote:
>>>>>> Adding Damien who wrote this code.
>>>>>
>>>>> Nope. It wasn't me. I think it was Stephen Bates:
>>>>>
>>>>> commit 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
>>>>>
>>>>> So +Stephen.
>>>>>>
>>>>>> On Wed, Nov 18, 2020 at 09:47:46AM +0900, Dongjoo Seo wrote:
>>>>>>> Current sleep time for hybrid polling is half of mean time.
>>>>>>> The 'half' sleep time is good for minimizing the cpu utilization.
>>>>>>> But, the problem is that its cpu utilization is still high.
>>>>>>> this patch can help to minimize the cpu utilization side.
>>>>
>>>> This won't work well. When I was experimenting I saw that half mean
>>>> is actually is too much for fast enough requests, like <20us 4K writes,
>>>> it's oversleeping them. Even more I'm afraid of getting in a vicious
>>>> cycle, when oversleeping increases statistical mean, that increases
>>>> sleep time, that again increases stat mean, and so on. That what
>>>> happened for me when the scheme was too aggressive.
>>>>
>>>> I actually sent once patches [1] for automatic dynamic sleep time
>>>> adjustment, but nobody cared.
>>>>
>>>> [1] https://lkml.org/lkml/2019/4/30/117 <
>> https://lkml.org/lkml/2019/4/30/117>
>>>>
>>>>>>>
>>>>>>> Below 1,2 is my test hardware sets.
>>>>>>>
>>>>>>> 1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
>>>>>>> 2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA
>> 480G
>>>>>>>
>>>>>>>        |  Classic Polling | Hybrid Polling  | this Patch
>>>>>>> -----------------------------------------------------------------
>>>>>>>        cpu util | IOPS(k) | cpu util | IOPS | cpu util | IOPS  |
>>>>>>> -----------------------------------------------------------------
>>>>>>> 1.       99.96   |   491   |  56.98   | 467  | 35.98    | 442   |
>>>>>>> -----------------------------------------------------------------
>>>>>>> 2.       99.94   |   582   |  56.3    | 582  | 35.28    | 582   |
>>>>>>>
>>>>>>> cpu util means that sum of sys and user util.
>>>>>>>
>>>>>>> I used 4k rand read for this test.
>>>>>>> because that case is worst case of I/O performance side.
>>>>>>> below one is my fio setup.
>>>>>>>
>>>>>>> name=pollTest
>>>>>>> ioengine=pvsync2
>>>>>>> hipri
>>>>>>> direct=1
>>>>>>> size=100%
>>>>>>> randrepeat=0
>>>>>>> time_based
>>>>>>> ramp_time=0
>>>>>>> norandommap
>>>>>>> refill_buffers
>>>>>>> log_avg_msec=1000
>>>>>>> log_max_value=1
>>>>>>> group_reporting
>>>>>>> filename=/dev/nvme0n1
>>>>>>> [rd_rnd_qd_1_4k_1w]
>>>>>>> bs=4k
>>>>>>> iodepth=32
>>>>>>> numjobs=[num of cpus]
>>>>>>> rw=randread
>>>>>>> runtime=60
>>>>>>> write_bw_log=bw_rd_rnd_qd_1_4k_1w
>>>>>>> write_iops_log=iops_rd_rnd_qd_1_4k_1w
>>>>>>> write_lat_log=lat_rd_rnd_qd_1_4k_1w
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
>>>>>>> ---
>>>>>>> block/blk-mq.c | 3 +--
>>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>>>> index 1b25ec2fe9be..c3d578416899 100644
>>>>>>> --- a/block/blk-mq.c
>>>>>>> +++ b/block/blk-mq.c
>>>>>>> @@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct
>> request_queue *q,
>>>>>>>           return ret;
>>>>>>>
>>>>>>>   if (q->poll_stat[bucket].nr_samples)
>>>>>>> -         ret = (q->poll_stat[bucket].mean + 1) / 2;
>>>>>>> -
>>>>>>> +         ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
>>>>>>>   return ret;
>>>>>>> }
>>>>>>>
>>>>>>> --
>>>>>>> 2.17.1
>>>>>>>
>>>>>> ---end quoted text---
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Pavel Begunkov
>>>
>>>
>>
>> --
>> Pavel Begunkov
>>
> 

-- 
Pavel Begunkov


* [PATCH] blk-mq: modify hybrid sleep time to aggressive
@ 2020-11-16 16:43 Dongjoo Seo
  0 siblings, 0 replies; 7+ messages in thread
From: Dongjoo Seo @ 2020-11-16 16:43 UTC (permalink / raw)
  To: linux-block

The current sleep time for hybrid polling is half of the mean completion time.
Sleeping for half of the mean is good for reducing cpu utilization,
but cpu utilization is still high.
This patch helps reduce cpu utilization further.

1 and 2 below are my test hardware setups.

1. Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz + Samsung 970 pro 1Tb
2. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz + INTEL SSDPED1D480GA 480G

   |  Classic Polling   |  Hybrid Polling    |  This patch
   | cpu util | IOPS(k) | cpu util | IOPS(k) | cpu util | IOPS(k)
---+----------+---------+----------+---------+----------+--------
1. |  99.96   |  491    |  56.98   |  467    |  35.98   |  442
2. |  99.94   |  582    |  56.3    |  582    |  35.28   |  582

cpu util is the sum of sys and user utilization.

I used 4k random reads for this test because that case is the
worst case for I/O performance. My fio setup is below.

name=pollTest
ioengine=pvsync2
hipri
direct=1
size=100%
randrepeat=0
time_based
ramp_time=0
norandommap
refill_buffers
log_avg_msec=1000
log_max_value=1
group_reporting
filename=/dev/nvme0n1
[rd_rnd_qd_1_4k_1w]
bs=4k
iodepth=32
numjobs=[num of cpus]
rw=randread
runtime=60
write_bw_log=bw_rd_rnd_qd_1_4k_1w
write_iops_log=iops_rd_rnd_qd_1_4k_1w
write_lat_log=lat_rd_rnd_qd_1_4k_1w

Thanks

Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1b25ec2fe9be..c3d578416899 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3749,8 +3749,7 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 		return ret;
 
 	if (q->poll_stat[bucket].nr_samples)
-		ret = (q->poll_stat[bucket].mean + 1) / 2;
-
+		ret = (q->poll_stat[bucket].mean + 1) * 3 / 4;
 	return ret;
 }
 
-- 
2.17.1



