Re: [PATCH -next v3 2/2] blk-throttle: fix io hung due to configuration updates

From: "Michal Koutný" <mkoutny@suse.com>
To: "yukuai (C)" <yukuai3@huawei.com>
Cc: tj@kernel.org, axboe@kernel.dk, ming.lei@redhat.com,
	geert@linux-m68k.org, cgroups@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com
Subject: Re: [PATCH -next v3 2/2] blk-throttle: fix io hung due to configuration updates
Date: Thu, 19 May 2022 18:10:26 +0200	[thread overview]
Message-ID: <20220519161026.GG16096@blackbody.suse.cz> (raw)
In-Reply-To: <a8953189-af42-0225-3031-daf61347524a@huawei.com>

On Thu, May 19, 2022 at 08:14:28PM +0800, "yukuai (C)" <yukuai3@huawei.com> wrote:
> tg_with_in_bps_limit:
>  jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
>  tmp = bps_limit * jiffy_elapsed_rnd;
>  do_div(tmp, HZ);
>  bytes_allowed = tmp; -> how many bytes are allowed in this slice,
> 		         incluing dispatched.
>  if (tg->bytes_disp[rw] + bio_size <= bytes_allowed)
>   *wait = 0 -> no need to wait if this bio is within limit
> 
>  extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>  -> extra_bytes is based on 'bytes_disp'
> 
> For example:
> 
> 1) bps_limit is 2k, we issue two io, (1k and 9k)
> 2) the first io(1k) will be dispatched, bytes_disp = 1k, slice_start = 0
>    the second io(9k) is waiting for (9 - (2 - 1)) / 2 = 4 s

The 2nd io arrived at 1s, the wait time is 4s, i.e. it can be dispatched
at 5s (i.e. 10k/*2kB/s = 5s).

> 3) after 3 s, we update bps_limit to 1k, then new waiting is caculated:
> 
> without this patch:  bytes_disp = 0, slict_start =3:
> bytes_allowed = 1k	                            <--- why 1k and not 0?
> extra_bytes = 9k - 1k = 8k
> wait = 8s

This looks like it was calculated at time 4s (1s after new config was
set).

> 
> whth this patch: bytes_disp = 0.5k, slice_start =  0,
> bytes_allowed = 1k * 3 + 1k = 4k
> extra_bytes =  0.5k + 9k - 4k = 5.5k
> wait = 5.5s

This looks like calculated at 4s, so the IO would be waiting till
4s+5.5s = 9.5s.

As I don't know why using time 4s, I'll shift this calculation to the
time 3s (when the config changes):

bytes_disp = 0.5k, slice_start =  0,
bytes_allowed = 1k * 3  = 3k
extra_bytes =  0.5k + 9k - 3k = 7.5k
wait = 7.5s

In absolute time, the IO would wait till 3s+7.5s = 10.5s

OK, either your 9.5s or my 10.5s looks weird (although earlier than
original 4s+8s=12s).
However, the IO should ideally only wait till

    3s + (9k -   (6k    -    1k)     ) / 1k/s =
         bio - (allowed - dispatched)  / new_limit

   =3s + 4k / 1k/s = 7s

   ('allowed' is based on old limit)

Or in another example, what if you change the config from 2k/s to ∞k/s
(unlimited, let's neglect the arithmetic overflow that you handle
explicitly, imagine a big number but not so big to be greater than
division result).

In such a case, the wait time should be zero, i.e. IO should be
dispatched right at the time of config change.
(With your patch that still calculates >0 wait time (and the original
behavior gives >0 wait too.)

> I hope I can expliain it clearly...

Yes, thanks for pointing me to relevant parts.
I hope I grasped them correctly.

IOW, your patch and formula make the wait time shorter but still IO can
be delayed indefinitely if you pass a sequence of new configs. (AFAIU)

Regards,
Michal