* [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
@ 2019-01-14 19:21 Liu Bo
  2019-01-17 23:03 ` Liu Bo
  2019-01-18 12:58 ` Jens Axboe
  0 siblings, 2 replies; 11+ messages in thread
From: Liu Bo @ 2019-01-14 19:21 UTC (permalink / raw)
  To: linux-block; +Cc: Josef Bacik

Our test reported the following stack, and vmcore showed that
->inflight counter is -1.

[ffffc9003fcc38d0] __schedule at ffffffff8173d95d
[ffffc9003fcc3958] schedule at ffffffff8173de26
[ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
[ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
[ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
[ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
[ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
[ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
[ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
[ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
[ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
[ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
[ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
[ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
[ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
[ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
[ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
[ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
[ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e

The ->inflight counter may be negative (-1) if

0) blk-throttle had been enabled when the IO was issued, so its bio
has an associated blkg,

1) blk-iolatency was disabled when the IO was issued, so the
iolatency_grp in this blkg was not yet available,

2) blk-iolatency was then enabled before this IO completed, so the
iolatency_grp had become available by the time the IO reached endio,

3) and so the ->inflight counter is decremented from 0 to -1.

Use atomic_dec_if_positive() instead of atomic_dec() to avoid the
negative inflight counter.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 block/blk-iolatency.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 159af76d60e6..da4ede5c5ddf 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -660,7 +660,12 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 		}
 		rqw = &iolat->rq_wait;
 
-		atomic_dec(&rqw->inflight);
+		/*
+		 * If the blkg was set up by another throttler (e.g. blk-throttle)
+		 * and blk-iolatency was enabled while this IO was in flight,
+		 * blk-iolatency's ->inflight counter may still be 0 here.
+		 */
+		atomic_dec_if_positive(&rqw->inflight);
 		if (!enabled || iolat->min_lat_nsec == 0)
 			goto next;
 		iolatency_record_time(iolat, &bio->bi_issue, now,
@@ -692,7 +697,8 @@ static void blkcg_iolatency_cleanup(struct rq_qos *rqos, struct bio *bio)
 			goto next;
 
 		rqw = &iolat->rq_wait;
-		atomic_dec(&rqw->inflight);
+		/* see comment in blkcg_iolatency_done_bio() */
+		atomic_dec_if_positive(&rqw->inflight);
 		wake_up(&rqw->wait);
 next:
 		blkg = blkg->parent;
-- 
2.20.1.2.gb21ebb6
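
As an aside, here is a minimal user-space model of the race described
above and of why atomic_dec_if_positive() avoids the underflow. C11
atomics stand in for the kernel's atomic_t; this is an illustrative
sketch, not the blk-iolatency code.

#include <stdatomic.h>
#include <stdio.h>

/* User-space stand-in for the kernel's atomic_dec_if_positive():
 * decrement only if the result stays >= 0, and return old - 1. */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}

int main(void)
{
	/* blk-iolatency was off at submit time, so this IO was never
	 * counted and ->inflight is still 0 when its endio runs. */
	atomic_int inflight = 0;

	/* endio with a plain decrement: the counter underflows to -1,
	 * the state seen in the vmcore above */
	atomic_fetch_sub(&inflight, 1);
	printf("plain dec:       %d\n", atomic_load(&inflight));

	atomic_store(&inflight, 0);

	/* endio with dec-if-positive: the uncounted IO is simply ignored */
	dec_if_positive(&inflight);
	printf("dec_if_positive: %d\n", atomic_load(&inflight));

	return 0;
}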



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-14 19:21 [PATCH] blk-iolatency: fix IO hang due to negative inflight counter Liu Bo
@ 2019-01-17 23:03 ` Liu Bo
  2019-01-18 12:58 ` Jens Axboe
  1 sibling, 0 replies; 11+ messages in thread
From: Liu Bo @ 2019-01-17 23:03 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-block, Josef Bacik

ping?

thanks,
liubo



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-14 19:21 [PATCH] blk-iolatency: fix IO hang due to negative inflight counter Liu Bo
  2019-01-17 23:03 ` Liu Bo
@ 2019-01-18 12:58 ` Jens Axboe
  2019-01-18 16:21   ` Josef Bacik
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2019-01-18 12:58 UTC (permalink / raw)
  To: Liu Bo, linux-block; +Cc: Josef Bacik

On 1/14/19 12:21 PM, Liu Bo wrote:
> Our test reported the following stack, and vmcore showed that
> ->inflight counter is -1.
> 
> The ->inflight counter may be negative (-1) if
> 
> 0) blk-throttle had been enabled when the IO was issued, so its bio
> has an associated blkg,
> 
> 1) blk-iolatency was disabled when the IO was issued, so the
> iolatency_grp in this blkg was not yet available,
> 
> 2) blk-iolatency was then enabled before this IO completed, so the
> iolatency_grp had become available by the time the IO reached endio,
> 
> 3) and so the ->inflight counter is decremented from 0 to -1.
> 
> Use atomic_dec_if_positive() instead of atomic_dec() to avoid the
> negative inflight counter.

The problem with that is that it'll hide a lot of other issues, too.
Any way we can either track if this rqw is in flight, and only dec
if it is, or quiesce when enabling?

-- 
Jens Axboe



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-18 12:58 ` Jens Axboe
@ 2019-01-18 16:21   ` Josef Bacik
  2019-01-18 16:28     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-01-18 16:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Liu Bo, linux-block, Josef Bacik

On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote:
> The problem with that is that it'll hide a lot of other issues, too.
> Any way we can either track if this rqw is in flight, and only dec
> if it is, or quiesce when enabling?
> 

I worried about this too, but really the side-effect of mis-counting here is
just that we let more IO through.  I think maybe we add a debug option that we
can turn on to see if we're messing up accounting, but in general I don't see a
problem with this approach.

The problem we're running into here is there's not really a good way to tag a
bio as "seen by io.latency."  We just have to assume if we're on and there's a
bi_blkg associated that we saw it at submit time.  We can't just add a flag for
every io controller that starts tracking inflight io's, so for now I think this
is a reasonable solution.  Thanks,

Josef
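
As a rough sketch of the per-IO tagging idea discussed above (remember
at submit time whether this IO was counted, and only undo that at
completion), in user-space C with made-up names; as noted, struct bio
has no spare flag per controller, so this is illustrative only:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int inflight;
static _Atomic bool iolatency_enabled;

/* Hypothetical per-IO state: one "counted by io.latency" bit that
 * would have to live in struct bio. */
struct io {
	bool counted;
};

static void submit(struct io *io)
{
	if (atomic_load(&iolatency_enabled)) {
		atomic_fetch_add(&inflight, 1);
		io->counted = true;
	}
}

static void complete(struct io *io)
{
	/* Only undo what submit actually did, so flipping the setting
	 * while the IO is in flight cannot drive the counter negative,
	 * and a genuine accounting bug would still show up. */
	if (io->counted)
		atomic_fetch_sub(&inflight, 1);
}

int main(void)
{
	struct io io = { .counted = false };

	submit(&io);                            /* io.latency off: not counted */
	atomic_store(&iolatency_enabled, true); /* enabled mid-flight */
	complete(&io);                          /* no decrement, inflight stays 0 */

	return atomic_load(&inflight);
}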


* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-18 16:21   ` Josef Bacik
@ 2019-01-18 16:28     ` Jens Axboe
  2019-01-18 16:43       ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2019-01-18 16:28 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Liu Bo, linux-block

On 1/18/19 9:21 AM, Josef Bacik wrote:
> On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote:
>> The problem with that is that it'll hide a lot of other issues, too.
>> Any way we can either track if this rqw is in flight, and only dec
>> if it is, or quiesce when enabling?
>>
> 
> I worried about this too, but really the side-effect of mis-counting here is
> just that we let more IO through.  I think maybe we add a debug option that we
> can turn on to see if we're messing up accounting, but in general I don't see a
> problem with this approach.

The problem is that an accounting bug elsewhere (like a missing increment)
will now go unnoticed, which could completely screw it up. An occasional
blip like the one described is totally fine, but that's not guaranteed to be
the case.

> The problem we're running into here is there's not really a good way to tag a
> bio as "seen by io.latency."  We just have to assume if we're on and there's a
> bi_blkg associated that we saw it at submit time.  We can't just add a flag for
> every io controller that starts tracking inflight io's, so for now I think this
> is a reasonable solution.  Thanks,

Can we quiesce instead when enabling/disabling?

-- 
Jens Axboe



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-18 16:28     ` Jens Axboe
@ 2019-01-18 16:43       ` Josef Bacik
  2019-01-19  1:39         ` Liu Bo
  0 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2019-01-18 16:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Josef Bacik, Liu Bo, linux-block

On Fri, Jan 18, 2019 at 09:28:06AM -0700, Jens Axboe wrote:
> On 1/18/19 9:21 AM, Josef Bacik wrote:
> > On Fri, Jan 18, 2019 at 05:58:18AM -0700, Jens Axboe wrote:
> >> The problem with that is that it'll hide a lot of other issues, too.
> >> Any way we can either track if this rqw is in flight, and only dec
> >> if it is, or quiesce when enabling?
> >>
> > 
> > I worried about this too, but really the side-effect of mis-counting here is
> > just that we let more IO through.  I think maybe we add a debug option that we
> > can turn on to see if we're messing up accounting, but in general I don't see a
> > problem with this approach.
> 
> The problem is that an accounting bug elsewhere (like a missing increment)
> will now go unnoticed, which could completely screw it up. An occasional
> blip like the one described is totally fine, but that's not guaranteed to be
> the case.
> 

Yeah I agree it's kind of shitty.

> > The problem we're running into here is there's not really a good way to tag a
> > bio as "seen by io.latency."  We just have to assume if we're on and there's a
> > bi_blkg associated that we saw it at submit time.  We can't just add a flag for
> > every io controller that starts tracking inflight io's, so for now I think this
> > is a reasonable solution.  Thanks,
> 
> Can we quiesce instead when enabling/disabling?

Actually that's not a bad idea, you want to look into that Liu?  That way we can
make sure everything is correct always.  Thanks,

Josef


* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-18 16:43       ` Josef Bacik
@ 2019-01-19  1:39         ` Liu Bo
  2019-01-19  1:51           ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Liu Bo @ 2019-01-19  1:39 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jens Axboe, Liu Bo, linux-block

On Fri, Jan 18, 2019 at 8:43 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Fri, Jan 18, 2019 at 09:28:06AM -0700, Jens Axboe wrote:
> > Can we quiesce instead when enabling/disabling?
>
> Actually that's not a bad idea, you want to look into that Liu?  That way we can
> make sure everything is correct always.  Thanks,
>

Just to confirm, does "quiesce when enabling/disabling" mean "waiting
for inflight IOs to complete when enabling/disabling iolatency"?

thanks,
liubo


* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-19  1:39         ` Liu Bo
@ 2019-01-19  1:51           ` Jens Axboe
  2019-01-23  7:52             ` Liu Bo
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2019-01-19  1:51 UTC (permalink / raw)
  To: Liu Bo, Josef Bacik; +Cc: Liu Bo, linux-block

On 1/18/19 6:39 PM, Liu Bo wrote:
> Just to confirm, does "quiesce when enabling/disabling" mean "waiting
> for inflight IOs to complete when enabling/disabling iolatency"?

Precisely.

-- 
Jens Axboe



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-19  1:51           ` Jens Axboe
@ 2019-01-23  7:52             ` Liu Bo
  2019-01-23 15:09               ` Jens Axboe
  2019-01-23 17:26               ` Josef Bacik
  0 siblings, 2 replies; 11+ messages in thread
From: Liu Bo @ 2019-01-23  7:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Josef Bacik, Liu Bo, linux-block

On Fri, Jan 18, 2019 at 5:51 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/18/19 6:39 PM, Liu Bo wrote:
> > Just to confirm, does "quiesce when enabling/disabling" mean "waiting
> > for inflight IOs to complete when enabling/disabling iolatency"?
>

So it might not be easy to quiesce the queue, because...

iolatency_set_limit()
   blkg_conf_prep() // returns with rcu read lock and queue lock held

neither freeze_queue nor quiesce_queue is allowed in an rcu read-side
context.

If we do it outside of blkg_conf_prep/finish(), I feel like the
overhead is more than it's worth.

Any thoughts?

thanks,
liubo

> Precisely.
>
> --
> Jens Axboe
>


* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-23  7:52             ` Liu Bo
@ 2019-01-23 15:09               ` Jens Axboe
  2019-01-23 17:26               ` Josef Bacik
  1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2019-01-23 15:09 UTC (permalink / raw)
  To: Liu Bo; +Cc: Josef Bacik, Liu Bo, linux-block

On 1/23/19 12:52 AM, Liu Bo wrote:
> So it might not be easy to quiesce the queue, because...
>
> iolatency_set_limit()
>    blkg_conf_prep() // returns with rcu read lock and queue lock held
>
> neither freeze_queue nor quiesce_queue is allowed in an rcu read-side
> context.
>
> If we do it outside of blkg_conf_prep/finish(), I feel like the
> overhead is more than it's worth.

I think it's hard to say it's more overhead than it's worth, when the
current situation is clearly buggy and the proposed fix has definite
flaws. The only other alternative I can see is to have a flag per
throttling entity, which isn't going to work due to flag shortage.

So I'd very much encourage you to continue going down the path of making
quiesce work. You could potentially do a callback based solution, if the
context within the prep/finish is too limiting?

-- 
Jens Axboe



* Re: [PATCH] blk-iolatency: fix IO hang due to negative inflight counter
  2019-01-23  7:52             ` Liu Bo
  2019-01-23 15:09               ` Jens Axboe
@ 2019-01-23 17:26               ` Josef Bacik
  1 sibling, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2019-01-23 17:26 UTC (permalink / raw)
  To: Liu Bo; +Cc: Jens Axboe, Josef Bacik, Liu Bo, linux-block

On Tue, Jan 22, 2019 at 11:52:06PM -0800, Liu Bo wrote:
> So it might not be easy to quiesce the queue, because...
>
> iolatency_set_limit()
>    blkg_conf_prep() // returns with rcu read lock and queue lock held
>
> neither freeze_queue nor quiesce_queue is allowed in an rcu read-side
> context.
>
> If we do it outside of blkg_conf_prep/finish(), I feel like the
> overhead is more than it's worth.

I had to do this for io.weight yesterday; I just put it in
blkg_conf_prep/finish.  I don't think that's unreasonable: it's not like we mess
with settings a lot, it'll get set up on cgroup creation before something is
really doing IO, and it saves us if things are already running.  Thanks,

Josef
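
A rough user-space model of what quiescing around the setting change
buys: drain everything in flight, as a queue freeze would, before
flipping the flag, so submission and completion always agree on the
setting. Illustrative only, not the actual blkg_conf_prep() /
blkg_conf_finish() code:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int queue_inflight;   /* everything in flight, what a freeze waits on */
static int iolat_inflight;   /* only the IOs blk-iolatency counted */
static bool iolatency_enabled;

static void submit(void)
{
	pthread_mutex_lock(&lock);
	queue_inflight++;
	if (iolatency_enabled)
		iolat_inflight++;
	pthread_mutex_unlock(&lock);
}

static void complete(void)
{
	pthread_mutex_lock(&lock);
	/* The setting cannot have changed since submit (see below), so
	 * this decrement always matches an increment and never underflows. */
	if (iolatency_enabled)
		iolat_inflight--;
	if (--queue_inflight == 0)
		pthread_cond_signal(&drained);
	pthread_mutex_unlock(&lock);
}

/* The "quiesce" step: wait until nothing at all is in flight before
 * changing the setting. */
static void set_iolatency_enabled(bool on)
{
	pthread_mutex_lock(&lock);
	while (queue_inflight > 0)
		pthread_cond_wait(&drained, &lock);
	iolatency_enabled = on;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	submit();                        /* io.latency off, not counted */
	complete();
	set_iolatency_enabled(true);     /* would block until the queue drains */
	submit();
	complete();
	return iolat_inflight;           /* 0 */
}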

