All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
@ 2016-09-12  6:45 Hou Tao
  2016-09-13 13:46 ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Hou Tao @ 2016-09-12  6:45 UTC (permalink / raw)
  To: linux-block; +Cc: linux-kernel, Jens Axboe, Vivek Goyal

Due to commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
the slack of timer increases when the timeout increases:

 So for HZ=250 we end up with the following granularity levels:
  Level Offset   Granularity                  Range
      0      0          4 ms                 0 ms -        252 ms
      1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
      2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)

When the slack is bigger than throtl_slice (100ms), there will be
a problem: throtl_slice_used() will always return true, a new slice
will always be genereated, and the bio will be throttled forever.

The following is a example:

 echo 253:0 512 > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
 fio --readonly --direct=1 --filename=/dev/vda --size=4K --rate=4K \
		 --rw=read --ioengine=libaio --iodepth=16 --name 1

 the slack of 8s-timer is about 302ms.

 throtl / [R] bio. bdisp=0 sz=4096 bps=512 iodisp=0 iops=4294967295 queued=0/0
 throtl schedule timer. delay=8000 jiffies=4295784850
 throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
 throtl / [R] new slice start=4295793152 end=4295793252 jiffies=4295793152
 throtl / [R] extend slice start=4295793152 end=4295801200 jiffies=4295793152
 throtl schedule timer. delay=8000 jiffies=4295793152
 throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
 throtl / [R] new slice start=4295801344 end=4295801444 jiffies=4295801344
 throtl / [R] extend slice start=4295801344 end=4295809400 jiffies=4295801344
 throtl schedule timer. delay=8000 jiffies=4295801344

Fix it by checking the delayed dispatch in tg_may_dispatch():
1. If there is any dispatched bio, the time slice must have been used,
   so it's OK to renew the time slice.
2. If there is no queued bio, the time slice must have been expired,
   so it's Ok to renew the time slice.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 block/blk-throttle.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index f1aba26..91f8140 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -591,13 +591,20 @@ static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
 		   tg->slice_end[rw], jiffies);
 }
 
+static bool throtl_is_delayed_disp(struct throtl_grp *tg, bool rw)
+{
+	return (time_after(jiffies, tg->slice_end[rw]) &&
+			!tg->bytes_disp[rw] && !tg->io_disp[rw] &&
+			tg->service_queue.nr_queued[rw]) ? true : false;
+}
+
 /* Determine if previously allocated or extended slice is complete or not */
 static bool throtl_slice_used(struct throtl_grp *tg, bool rw)
 {
 	if (time_in_range(jiffies, tg->slice_start[rw], tg->slice_end[rw]))
 		return false;
 
-	return 1;
+	return true;
 }
 
 /* Trim the used slices and adjust slice start accordingly */
@@ -782,7 +789,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * existing slice to make sure it is at least throtl_slice interval
 	 * long since now.
 	 */
-	if (throtl_slice_used(tg, rw))
+	if (throtl_slice_used(tg, rw) && !throtl_is_delayed_disp(tg, rw))
 		throtl_start_new_slice(tg, rw);
 	else {
 		if (time_before(tg->slice_end[rw], jiffies + throtl_slice))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-12  6:45 [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel Hou Tao
@ 2016-09-13 13:46 ` Vivek Goyal
  2016-09-17  1:26   ` Hou Tao
  2016-09-19 21:06   ` Vivek Goyal
  0 siblings, 2 replies; 7+ messages in thread
From: Vivek Goyal @ 2016-09-13 13:46 UTC (permalink / raw)
  To: Hou Tao; +Cc: linux-block, linux-kernel, Jens Axboe, Tejun Heo, Thomas Gleixner

On Mon, Sep 12, 2016 at 02:45:30PM +0800, Hou Tao wrote:
> Due to commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
> the slack of timer increases when the timeout increases:
> 
>  So for HZ=250 we end up with the following granularity levels:
>   Level Offset   Granularity                  Range
>       0      0          4 ms                 0 ms -        252 ms
>       1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
>       2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
> 
> When the slack is bigger than throtl_slice (100ms), there will be
> a problem: throtl_slice_used() will always return true, a new slice
> will always be genereated, and the bio will be throttled forever.
> 
> The following is a example:
> 
>  echo 253:0 512 > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
>  fio --readonly --direct=1 --filename=/dev/vda --size=4K --rate=4K \
> 		 --rw=read --ioengine=libaio --iodepth=16 --name 1
> 
>  the slack of 8s-timer is about 302ms.
> 
>  throtl / [R] bio. bdisp=0 sz=4096 bps=512 iodisp=0 iops=4294967295 queued=0/0
>  throtl schedule timer. delay=8000 jiffies=4295784850
>  throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
>  throtl / [R] new slice start=4295793152 end=4295793252 jiffies=4295793152
>  throtl / [R] extend slice start=4295793152 end=4295801200 jiffies=4295793152
>  throtl schedule timer. delay=8000 jiffies=4295793152
>  throtl / dispatch nr_queued=1 read=1 write=0, bdisp=0/0, iodisp=0/0
>  throtl / [R] new slice start=4295801344 end=4295801444 jiffies=4295801344
>  throtl / [R] extend slice start=4295801344 end=4295809400 jiffies=4295801344
>  throtl schedule timer. delay=8000 jiffies=4295801344
> 
> Fix it by checking the delayed dispatch in tg_may_dispatch():
> 1. If there is any dispatched bio, the time slice must have been used,
>    so it's OK to renew the time slice.
> 2. If there is no queued bio, the time slice must have been expired,
>    so it's Ok to renew the time slice.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  block/blk-throttle.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index f1aba26..91f8140 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -591,13 +591,20 @@ static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
>  		   tg->slice_end[rw], jiffies);
>  }
>  
> +static bool throtl_is_delayed_disp(struct throtl_grp *tg, bool rw)
> +{
> +	return (time_after(jiffies, tg->slice_end[rw]) &&
> +			!tg->bytes_disp[rw] && !tg->io_disp[rw] &&
> +			tg->service_queue.nr_queued[rw]) ? true : false;
> +}

Hi Hou Tao,

[ CC Tejun and Thomas ]

Thanks for the patch. I can reproduce it. I am wondering that why are you
doing so many checks. Can't we just check if throttle group is empty or
not. If it is empty and slice has expired, then start a new slice. If
throttle group is not empty, then we know slice has to be an active slice
and should be extended (despite the fact that it might have expired
because timer function got called later than we expected it to be).

Can you please try following patch. It seems to resolve the issue for me.

Vivek


Subject: blk-throttle: Extend slice if throttle group is not empty

Right now, if slice is expired, we start a new slice. If a bio is
queued, we keep on extending slice by throtle_slice interval (100ms).

This worked well as long as pending timer function got executed with-in
few milli seconds of scheduled time. But looks like with recent changes
in timer subsystem, slack can be much longer depending on the expiry time
of the scheduled timer.

commit 500462a9de65 ("timers: Switch to a non-cascading wheel")

This means, by the time timer function gets executed, it is possible the
delay from scheduled time is more than 100ms. That means current code
will conclude that existing slice has expired and a new one needs to
be started. New slice will be 100ms by default and that will not be
sufficient to meet rate requirement of group given the bio size and
bio will not be dispatched and we will start a new timer function to
wait. And when that timer expires, same process will repeat and we
will wait again and this can easily be an infinite loop.

Solve this issue by starting a new slice only if throttle gropup is
empty. If it is not empty, that means there should be an active slice
going on. Ideally it should not be expired but given the slack, it is
possible that it has expired.

Reported-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-throttle.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: rhvgoyal-linux/block/blk-throttle.c
===================================================================
--- rhvgoyal-linux.orig/block/blk-throttle.c	2016-09-13 08:55:33.616200176 -0400
+++ rhvgoyal-linux/block/blk-throttle.c	2016-09-13 09:17:10.664200176 -0400
@@ -780,9 +780,11 @@ static bool tg_may_dispatch(struct throt
 	/*
 	 * If previous slice expired, start a new one otherwise renew/extend
 	 * existing slice to make sure it is at least throtl_slice interval
-	 * long since now.
+	 * long since now. New slice is started only for empty throttle group.
+	 * If there is queued bio, that means there should be an active
+	 * slice and it should be extended instead.
 	 */
-	if (throtl_slice_used(tg, rw))
+	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
 		throtl_start_new_slice(tg, rw);
 	else {
 		if (time_before(tg->slice_end[rw], jiffies + throtl_slice))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-13 13:46 ` Vivek Goyal
@ 2016-09-17  1:26   ` Hou Tao
  2016-09-19 21:06   ` Vivek Goyal
  1 sibling, 0 replies; 7+ messages in thread
From: Hou Tao @ 2016-09-17  1:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-block, Jens Axboe, Tejun Heo, Thomas Gleixner


> Thanks for the patch. I can reproduce it. I am wondering that why are you
> doing so many checks. Can't we just check if throttle group is empty or
> not. If it is empty and slice has expired, then start a new slice. If
> throttle group is not empty, then we know slice has to be an active slice
> and should be extended (despite the fact that it might have expired
> because timer function got called later than we expected it to be).
Yes, only checking the empty status of bio queue is enough.

The check of dispatched ios and bytes was used to ensure the time slice
had not been used, but the logic is incorrect: even if the time slice
had been used, the extend of time slice is needed if the timer slacked.

> Can you please try following patch. It seems to resolve the issue for me.
It works for me.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-13 13:46 ` Vivek Goyal
  2016-09-17  1:26   ` Hou Tao
@ 2016-09-19 21:06   ` Vivek Goyal
  2016-09-19 21:11     ` Jens Axboe
  1 sibling, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2016-09-19 21:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, Tejun Heo, Thomas Gleixner, Hou Tao

On Tue, Sep 13, 2016 at 09:46:46AM -0400, Vivek Goyal wrote:
> 
> Hi Hou Tao,
> 
> [ CC Tejun and Thomas ]
> 
> Thanks for the patch. I can reproduce it. I am wondering that why are you
> doing so many checks. Can't we just check if throttle group is empty or
> not. If it is empty and slice has expired, then start a new slice. If
> throttle group is not empty, then we know slice has to be an active slice
> and should be extended (despite the fact that it might have expired
> because timer function got called later than we expected it to be).
> 
> Can you please try following patch. It seems to resolve the issue for me.
> 
> Vivek

Hi Jens,

Can you please pick this patch. It seems to fix the reported issued.
Please let me know if you prefer a separate posting.

Vivek

> 
> 
> Subject: blk-throttle: Extend slice if throttle group is not empty
> 
> Right now, if slice is expired, we start a new slice. If a bio is
> queued, we keep on extending slice by throtle_slice interval (100ms).
> 
> This worked well as long as pending timer function got executed with-in
> few milli seconds of scheduled time. But looks like with recent changes
> in timer subsystem, slack can be much longer depending on the expiry time
> of the scheduled timer.
> 
> commit 500462a9de65 ("timers: Switch to a non-cascading wheel")
> 
> This means, by the time timer function gets executed, it is possible the
> delay from scheduled time is more than 100ms. That means current code
> will conclude that existing slice has expired and a new one needs to
> be started. New slice will be 100ms by default and that will not be
> sufficient to meet rate requirement of group given the bio size and
> bio will not be dispatched and we will start a new timer function to
> wait. And when that timer expires, same process will repeat and we
> will wait again and this can easily be an infinite loop.
> 
> Solve this issue by starting a new slice only if throttle gropup is
> empty. If it is not empty, that means there should be an active slice
> going on. Ideally it should not be expired but given the slack, it is
> possible that it has expired.
> 
> Reported-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  block/blk-throttle.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Index: rhvgoyal-linux/block/blk-throttle.c
> ===================================================================
> --- rhvgoyal-linux.orig/block/blk-throttle.c	2016-09-13 08:55:33.616200176 -0400
> +++ rhvgoyal-linux/block/blk-throttle.c	2016-09-13 09:17:10.664200176 -0400
> @@ -780,9 +780,11 @@ static bool tg_may_dispatch(struct throt
>  	/*
>  	 * If previous slice expired, start a new one otherwise renew/extend
>  	 * existing slice to make sure it is at least throtl_slice interval
> -	 * long since now.
> +	 * long since now. New slice is started only for empty throttle group.
> +	 * If there is queued bio, that means there should be an active
> +	 * slice and it should be extended instead.
>  	 */
> -	if (throtl_slice_used(tg, rw))
> +	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
>  		throtl_start_new_slice(tg, rw);
>  	else {
>  		if (time_before(tg->slice_end[rw], jiffies + throtl_slice))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-19 21:06   ` Vivek Goyal
@ 2016-09-19 21:11     ` Jens Axboe
  2016-09-27 12:57       ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2016-09-19 21:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-block, linux-kernel, Tejun Heo, Thomas Gleixner, Hou Tao

On 09/19/2016 03:06 PM, Vivek Goyal wrote:
> On Tue, Sep 13, 2016 at 09:46:46AM -0400, Vivek Goyal wrote:
>>
>> Hi Hou Tao,
>>
>> [ CC Tejun and Thomas ]
>>
>> Thanks for the patch. I can reproduce it. I am wondering that why are you
>> doing so many checks. Can't we just check if throttle group is empty or
>> not. If it is empty and slice has expired, then start a new slice. If
>> throttle group is not empty, then we know slice has to be an active slice
>> and should be extended (despite the fact that it might have expired
>> because timer function got called later than we expected it to be).
>>
>> Can you please try following patch. It seems to resolve the issue for me.
>>
>> Vivek
>
> Hi Jens,
>
> Can you please pick this patch. It seems to fix the reported issued.
> Please let me know if you prefer a separate posting.

I'll apply it, thanks Vivek.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-19 21:11     ` Jens Axboe
@ 2016-09-27 12:57       ` Vivek Goyal
  2016-09-27 14:10         ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2016-09-27 12:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, Tejun Heo, Thomas Gleixner, Hou Tao

On Mon, Sep 19, 2016 at 03:11:10PM -0600, Jens Axboe wrote:
> On 09/19/2016 03:06 PM, Vivek Goyal wrote:
> > On Tue, Sep 13, 2016 at 09:46:46AM -0400, Vivek Goyal wrote:
> > > 
> > > Hi Hou Tao,
> > > 
> > > [ CC Tejun and Thomas ]
> > > 
> > > Thanks for the patch. I can reproduce it. I am wondering that why are you
> > > doing so many checks. Can't we just check if throttle group is empty or
> > > not. If it is empty and slice has expired, then start a new slice. If
> > > throttle group is not empty, then we know slice has to be an active slice
> > > and should be extended (despite the fact that it might have expired
> > > because timer function got called later than we expected it to be).
> > > 
> > > Can you please try following patch. It seems to resolve the issue for me.
> > > 
> > > Vivek
> > 
> > Hi Jens,
> > 
> > Can you please pick this patch. It seems to fix the reported issued.
> > Please let me know if you prefer a separate posting.
> 
> I'll apply it, thanks Vivek.

Hi Jens,

Did you get a chance to apply this one. Can't find it in for-next branch.

Vivek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel
  2016-09-27 12:57       ` Vivek Goyal
@ 2016-09-27 14:10         ` Vivek Goyal
  0 siblings, 0 replies; 7+ messages in thread
From: Vivek Goyal @ 2016-09-27 14:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, Tejun Heo, Thomas Gleixner, Hou Tao

On Tue, Sep 27, 2016 at 08:57:22AM -0400, Vivek Goyal wrote:
> On Mon, Sep 19, 2016 at 03:11:10PM -0600, Jens Axboe wrote:
> > On 09/19/2016 03:06 PM, Vivek Goyal wrote:
> > > On Tue, Sep 13, 2016 at 09:46:46AM -0400, Vivek Goyal wrote:
> > > > 
> > > > Hi Hou Tao,
> > > > 
> > > > [ CC Tejun and Thomas ]
> > > > 
> > > > Thanks for the patch. I can reproduce it. I am wondering that why are you
> > > > doing so many checks. Can't we just check if throttle group is empty or
> > > > not. If it is empty and slice has expired, then start a new slice. If
> > > > throttle group is not empty, then we know slice has to be an active slice
> > > > and should be extended (despite the fact that it might have expired
> > > > because timer function got called later than we expected it to be).
> > > > 
> > > > Can you please try following patch. It seems to resolve the issue for me.
> > > > 
> > > > Vivek
> > > 
> > > Hi Jens,
> > > 
> > > Can you please pick this patch. It seems to fix the reported issued.
> > > Please let me know if you prefer a separate posting.
> > 
> > I'll apply it, thanks Vivek.
> 
> Hi Jens,
> 
> Did you get a chance to apply this one. Can't find it in for-next branch.
> 

Oh, just noticed that you sent it for 4.8. Sorry for the noise.

Vivek

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-09-27 14:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-12  6:45 [PATCH] blk-throttle: fix infinite throttling caused by non-cascading timer wheel Hou Tao
2016-09-13 13:46 ` Vivek Goyal
2016-09-17  1:26   ` Hou Tao
2016-09-19 21:06   ` Vivek Goyal
2016-09-19 21:11     ` Jens Axboe
2016-09-27 12:57       ` Vivek Goyal
2016-09-27 14:10         ` Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.