linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] blk-iocost: fix false positive lagging
@ 2022-05-26 13:35 Chengming Zhou
  2022-05-26 17:43 ` Tejun Heo
  2022-05-28  8:17 ` Chengming Zhou
  0 siblings, 2 replies; 5+ messages in thread
From: Chengming Zhou @ 2022-05-26 13:35 UTC (permalink / raw)
  To: tj, axboe; +Cc: linux-block, linux-kernel, Chengming Zhou

I found many false positive lagging during iocost test.

Since iocg->vtime will be advanced to (vnow - margins.target)
in hweight_after_donation(), which called throw away excess,
the iocg->done_vtime will also be advanced that much.

       period_at_vtime  <--period_vtime-->  vnow
              |                              |
  --------------------------------------------------->
        |<--->|
     margins.target
        |->
  vtime, done_vtime

If that iocg has some inflight io when vnow, but its done_vtime
is before period_at_vtime, ioc_timer_fn() will think it has
lagging io, even these io maybe issued just before now.

This patch change the condition to check if vdone is before
(period_at_vtime - margins.target) instead of period_at_vtime.

But there is another problem that this patch doesn't fix.
Since vtime will be advanced, we can't check if vtime is
after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
whether this iocg pin lagging for too long.

Maybe we can add lagging_periods in iocg to record how many
periods this iocg pin lagging, but I don't know when to clean it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 block/blk-iocost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 33a11ba971ea..42e301b7527b 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 		    time_after64(vtime, vdone) &&
 		    time_after64(vtime, now.vnow -
 				 MAX_LAGGING_PERIODS * period_vtime) &&
-		    time_before64(vdone, now.vnow - period_vtime))
+		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
 			nr_lagging++;
 
 		/*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] blk-iocost: fix false positive lagging
  2022-05-26 13:35 [PATCH] blk-iocost: fix false positive lagging Chengming Zhou
@ 2022-05-26 17:43 ` Tejun Heo
  2022-05-26 23:43   ` [Phishing Risk] [External] " Chengming Zhou
  2022-05-28  8:17 ` Chengming Zhou
  1 sibling, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2022-05-26 17:43 UTC (permalink / raw)
  To: Chengming Zhou; +Cc: axboe, linux-block, linux-kernel

Hello,

On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote:
> I found many false positive lagging during iocost test.
> 
> Since iocg->vtime will be advanced to (vnow - margins.target)
> in hweight_after_donation(), which called throw away excess,
> the iocg->done_vtime will also be advanced that much.
> 
>        period_at_vtime  <--period_vtime-->  vnow
>               |                              |
>   --------------------------------------------------->
>         |<--->|
>      margins.target
>         |->
>   vtime, done_vtime

All it does is shifting the vtime (and done_vtime) within the current window
so that we don't build up budget too lage a budget possibly spanning
multiple periods. The lagging detection is supposed to detect IOs which are
issued two+ periods ago which didn't finish in the last period. So, I don't
think the above sliding up the window affects that detection given that the
lagging detection is done before the window sliding. All it's checking is
whether there still are in-flight IOs which were issued two+ windows ago, so
how the last window has been fast forwarded shouldn't affect the detection,
no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Phishing Risk] [External] Re: [PATCH] blk-iocost: fix false positive lagging
  2022-05-26 17:43 ` Tejun Heo
@ 2022-05-26 23:43   ` Chengming Zhou
  0 siblings, 0 replies; 5+ messages in thread
From: Chengming Zhou @ 2022-05-26 23:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-block, linux-kernel

Hello,

On 2022/5/27 01:43, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote:
>> I found many false positive lagging during iocost test.
>>
>> Since iocg->vtime will be advanced to (vnow - margins.target)
>> in hweight_after_donation(), which called throw away excess,
>> the iocg->done_vtime will also be advanced that much.
>>
>>        period_at_vtime  <--period_vtime-->  vnow
>>               |                              |
>>   --------------------------------------------------->
>>         |<--->|
>>      margins.target
>>         |->
>>   vtime, done_vtime
> 
> All it does is shifting the vtime (and done_vtime) within the current window
> so that we don't build up budget too lage a budget possibly spanning
> multiple periods. 

Yes, this is necessary. Suppose in the last timer, the iocg doesn't have inflights
and have excess, then iocg->vtime = iocg->done_vtime = (period_at_vtime - margins.target)

> The lagging detection is supposed to detect IOs which are
> issued two+ periods ago which didn't finish in the last period. So, I don't

Yes, I understand.

> think the above sliding up the window affects that detection given that the
> lagging detection is done before the window sliding. All it's checking is
> whether there still are in-flight IOs which were issued two+ windows ago, so
> how the last window has been fast forwarded shouldn't affect the detection,
> no?

Right, the lagging detection is done before the window sliding in this period timer.
The conditions that it checks vtime, done_vtime have been slided in the last timer.

time_after64(vtime, vdone) &&
time_after64(vtime, now.vnow - MAX_LAGGING_PERIODS * period_vtime) &&
time_before64(vdone, now.vnow - period_vtime)

The first condition says it has some inflights, the second condition is always true
if vtime has been slided in the last timer, the third condition will be true if the
cost of io completed since last timer < ioc->margins.target.

So I think it doesn't check correctly whether it has inflights that were issued two+
windows ago.

Thanks.

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] blk-iocost: fix false positive lagging
  2022-05-26 13:35 [PATCH] blk-iocost: fix false positive lagging Chengming Zhou
  2022-05-26 17:43 ` Tejun Heo
@ 2022-05-28  8:17 ` Chengming Zhou
  2022-06-01 12:32   ` Chengming Zhou
  1 sibling, 1 reply; 5+ messages in thread
From: Chengming Zhou @ 2022-05-28  8:17 UTC (permalink / raw)
  To: tj, axboe; +Cc: linux-block, linux-kernel

On 2022/5/26 21:35, Chengming Zhou wrote:
> I found many false positive lagging during iocost test.
> 
> Since iocg->vtime will be advanced to (vnow - margins.target)
> in hweight_after_donation(), which called throw away excess,
> the iocg->done_vtime will also be advanced that much.
> 
>        period_at_vtime  <--period_vtime-->  vnow
>               |                              |
>   --------------------------------------------------->
>         |<--->|
>      margins.target
>         |->
>   vtime, done_vtime
> 
> If that iocg has some inflight io when vnow, but its done_vtime
> is before period_at_vtime, ioc_timer_fn() will think it has
> lagging io, even these io maybe issued just before now.
> 
> This patch change the condition to check if vdone is before
> (period_at_vtime - margins.target) instead of period_at_vtime.
> 
> But there is another problem that this patch doesn't fix.
> Since vtime will be advanced, we can't check if vtime is
> after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
> whether this iocg pin lagging for too long.
> 
> Maybe we can add lagging_periods in iocg to record how many
> periods this iocg pin lagging, but I don't know when to clean it.

Hello tejun, I add lagging_periods in iocg based on the original patch,
to record how many periods this iocg pin lagging. So we can use it to
avoid letting cmds which take a very long time pin lagging for too long.

Thanks.


diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 33a11ba971ea..998bb38ffb37 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -541,6 +541,8 @@ struct ioc_gq {
        u64                             indebt_since;
        u64                             indelay_since;

+       int                             lagging_periods;
+
        /* this iocg's depth in the hierarchy and ancestors including self */
        int                             level;
        struct ioc_gq                   *ancestors[];
@@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer)
                if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
                    !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
                    time_after64(vtime, vdone) &&
-                   time_after64(vtime, now.vnow -
-                                MAX_LAGGING_PERIODS * period_vtime) &&
-                   time_before64(vdone, now.vnow - period_vtime))
-                       nr_lagging++;
+                   time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) {
+                       if (iocg->lagging_periods < MAX_LAGGING_PERIODS) {
+                               nr_lagging++;
+                               iocg->lagging_periods++;
+                       }
+               } else if (iocg->lagging_periods)
+                       iocg->lagging_periods = 0;

                /*
                 * Determine absolute usage factoring in in-flight IOs to avoid


> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  block/blk-iocost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 33a11ba971ea..42e301b7527b 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer)
>  		    time_after64(vtime, vdone) &&
>  		    time_after64(vtime, now.vnow -
>  				 MAX_LAGGING_PERIODS * period_vtime) &&
> -		    time_before64(vdone, now.vnow - period_vtime))
> +		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
>  			nr_lagging++;
>  
>  		/*

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] blk-iocost: fix false positive lagging
  2022-05-28  8:17 ` Chengming Zhou
@ 2022-06-01 12:32   ` Chengming Zhou
  0 siblings, 0 replies; 5+ messages in thread
From: Chengming Zhou @ 2022-06-01 12:32 UTC (permalink / raw)
  To: tj, axboe; +Cc: linux-block, linux-kernel

On 2022/5/28 16:17, Chengming Zhou wrote:
> On 2022/5/26 21:35, Chengming Zhou wrote:
>> I found many false positive lagging during iocost test.
>>
>> Since iocg->vtime will be advanced to (vnow - margins.target)
>> in hweight_after_donation(), which called throw away excess,
>> the iocg->done_vtime will also be advanced that much.
>>
>>        period_at_vtime  <--period_vtime-->  vnow
>>               |                              |
>>   --------------------------------------------------->
>>         |<--->|
>>      margins.target
>>         |->
>>   vtime, done_vtime
>>
>> If that iocg has some inflight io when vnow, but its done_vtime
>> is before period_at_vtime, ioc_timer_fn() will think it has
>> lagging io, even these io maybe issued just before now.
>>
>> This patch change the condition to check if vdone is before
>> (period_at_vtime - margins.target) instead of period_at_vtime.
>>
>> But there is another problem that this patch doesn't fix.
>> Since vtime will be advanced, we can't check if vtime is
>> after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell
>> whether this iocg pin lagging for too long.
>>
>> Maybe we can add lagging_periods in iocg to record how many
>> periods this iocg pin lagging, but I don't know when to clean it.
> 
> Hello tejun, I add lagging_periods in iocg based on the original patch,
> to record how many periods this iocg pin lagging. So we can use it to
> avoid letting cmds which take a very long time pin lagging for too long.
> 
> Thanks.
> 
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 33a11ba971ea..998bb38ffb37 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -541,6 +541,8 @@ struct ioc_gq {
>         u64                             indebt_since;
>         u64                             indelay_since;
> 
> +       int                             lagging_periods;
> +
>         /* this iocg's depth in the hierarchy and ancestors including self */
>         int                             level;
>         struct ioc_gq                   *ancestors[];
> @@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer)
>                 if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
>                     !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
>                     time_after64(vtime, vdone) &&
> -                   time_after64(vtime, now.vnow -
> -                                MAX_LAGGING_PERIODS * period_vtime) &&
> -                   time_before64(vdone, now.vnow - period_vtime))
> -                       nr_lagging++;
> +                   time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) {
> +                       if (iocg->lagging_periods < MAX_LAGGING_PERIODS) {
> +                               nr_lagging++;
> +                               iocg->lagging_periods++;
> +                       }
> +               } else if (iocg->lagging_periods)
> +                       iocg->lagging_periods = 0;
> 
>                 /*
>                  * Determine absolute usage factoring in in-flight IOs to avoid
> 

Hi, I tested with this version, previous false laggings are gone. So I wonder
if I should send v2 for review?

Thanks!

> 
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  block/blk-iocost.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
>> index 33a11ba971ea..42e301b7527b 100644
>> --- a/block/blk-iocost.c
>> +++ b/block/blk-iocost.c
>> @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer)
>>  		    time_after64(vtime, vdone) &&
>>  		    time_after64(vtime, now.vnow -
>>  				 MAX_LAGGING_PERIODS * period_vtime) &&
>> -		    time_before64(vdone, now.vnow - period_vtime))
>> +		    time_before64(vdone, ioc->period_at_vtime - ioc->margins.target))
>>  			nr_lagging++;
>>  
>>  		/*

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-06-01 12:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-26 13:35 [PATCH] blk-iocost: fix false positive lagging Chengming Zhou
2022-05-26 17:43 ` Tejun Heo
2022-05-26 23:43   ` [Phishing Risk] [External] " Chengming Zhou
2022-05-28  8:17 ` Chengming Zhou
2022-06-01 12:32   ` Chengming Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).