stable.vger.kernel.org archive mirror
* 4.19 LTS high /proc/diskstats io_ticks
@ 2020-03-25 10:02 Rantala, Tommi T. (Nokia - FI/Espoo)
From: Rantala, Tommi T. (Nokia - FI/Espoo) @ 2020-03-25 10:02 UTC (permalink / raw)
  To: axboe, osandov; +Cc: stable, linux-kernel, linux-block

Hi,

Tools like sar and iostat are reporting abnormally high %util with 4.19.y
running in VM (the disk is almost idle):

  $ sar -dp
  Linux 4.19.107-1.x86_64   03/25/20    _x86_64_   (6 CPU)

  00:00:00        DEV       tps      ...     %util
  00:10:00        vda      0.55      ...     98.07
  ...
  10:00:00        vda      0.44      ...     99.74
  Average:        vda      0.48      ...     98.98

The numbers look reasonable for the partition:

  # iostat -x -p ALL 1 1
  Linux 4.19.107-1.x86_64   03/25/20    _x86_64_  (6 CPU)

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            10.51    0.00    8.58    0.05    0.11   80.75

  Device            r/s     ...  %util
  vda              0.02     ...  98.25
  vda1             0.01     ...   0.09


Lots of io_ticks in /proc/diskstats:

# cat /proc/uptime
45787.03 229321.29

# grep vda /proc/diskstats
 253      0 vda 760 0 38498 731 28165 43212 1462928 157514 0 44690263 44812032 0 0 0 0
 253      1 vda1 350 0 19074 293 26169 43212 1462912 154931 0 41560 150998 0 0 0 0


Other people are apparently seeing this too with 4.19:
https://kudzia.eu/b/2019/09/iostat-x-1-reporting-100-utilization-of-nearly-idle-nvme-drives/


I also see this only in 4.19.y, and bisected it to the commit below (based on
the Fixes tag, it should have been taken to 4.14 too...):

commit 6131837b1de66116459ef4413e26fdbc70d066dc
Author: Omar Sandoval <osandov@fb.com>
Date:   Thu Apr 26 00:21:58 2018 -0700

  blk-mq: count allocated but not started requests in iostats inflight

  In the legacy block case, we increment the counter right after we
  allocate the request, not when the driver handles it. In both the legacy
  and blk-mq cases, part_inc_in_flight() is called from
  blk_account_io_start() right after we've allocated the request. blk-mq
  only considers requests started requests as inflight, but this is
  inconsistent with the legacy definition and the intention in the code.
  This removes the started condition and instead counts all allocated
  requests.

  Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
  Signed-off-by: Omar Sandoval <osandov@fb.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c3621453ad87..5450cbc61f8d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -95,18 +95,15 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
 {
        struct mq_inflight *mi = priv;
 
-       if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
-               /*
-                * index[0] counts the specific partition that was asked
-                * for. index[1] counts the ones that are active on the
-                * whole device, so increment that if mi->part is indeed
-                * a partition, and not a whole device.
-                */
-               if (rq->part == mi->part)
-                       mi->inflight[0]++;
-               if (mi->part->partno)
-                       mi->inflight[1]++;
-       }
+       /*
+        * index[0] counts the specific partition that was asked for. index[1]
+        * counts the ones that are active on the whole device, so increment
+        * that if mi->part is indeed a partition, and not a whole device.
+        */
+       if (rq->part == mi->part)
+               mi->inflight[0]++;
+       if (mi->part->partno)
+               mi->inflight[1]++;
 }
 
 void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,



If I get it right, when the disk is idle, and some request is allocated,
part_round_stats() with this commit will now add all ticks between
previous I/O and current time (now - part->stamp) to io_ticks.

Before the commit, part_round_stats() called right after request allocation saw
an in-flight count of zero, so it only updated part->stamp without adding
anything to io_ticks.

Any thoughts how to best fix this in 4.19?
I see the io_ticks accounting has been reworked in 5.0, do we need to
backport those to 4.19, or any ill effects if this commit is reverted in
4.19?

-Tommi



* Re: 4.19 LTS high /proc/diskstats io_ticks
From: Greg KH @ 2020-03-25 10:07 UTC (permalink / raw)
  To: Rantala, Tommi T. (Nokia - FI/Espoo)
  Cc: axboe, osandov, stable, linux-kernel, linux-block

On Wed, Mar 25, 2020 at 10:02:41AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> Hi,
> 
> Tools like sar and iostat are reporting abnormally high %util with 4.19.y
> running in VM (the disk is almost idle):
> 
>   $ sar -dp
>   Linux 4.19.107-1.x86_64   03/25/20    _x86_64_   (6 CPU)
> 
>   00:00:00        DEV       tps      ...     %util
>   00:10:00        vda      0.55      ...     98.07
>   ...
>   10:00:00        vda      0.44      ...     99.74
>   Average:        vda      0.48      ...     98.98
> 
> The numbers look reasonable for the partition:
> 
>   # iostat -x -p ALL 1 1
>   Linux 4.19.107-1.x86_64   03/25/20    _x86_64_  (6 CPU)
> 
>   avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>             10.51    0.00    8.58    0.05    0.11   80.75
> 
>   Device            r/s     ...  %util
>   vda              0.02     ...  98.25
>   vda1             0.01     ...   0.09
> 
> 
> Lots of io_ticks in /proc/diskstats:
> 
> # cat /proc/uptime
> 45787.03 229321.29
> 
> # grep vda /proc/diskstats
>  253      0 vda 760 0 38498 731 28165 43212 1462928 157514 0 44690263 44812032 0 0 0 0
>  253      1 vda1 350 0 19074 293 26169 43212 1462912 154931 0 41560 150998 0 0 0 0
> 
> 
> Other people are apparently seeing this too with 4.19:
> https://kudzia.eu/b/2019/09/iostat-x-1-reporting-100-utilization-of-nearly-idle-nvme-drives/
> 
> 
> I also see this only in 4.19.y, and bisected it to the commit below (based on
> the Fixes tag, it should have been taken to 4.14 too...):
> 
> commit 6131837b1de66116459ef4413e26fdbc70d066dc
> Author: Omar Sandoval <osandov@fb.com>
> Date:   Thu Apr 26 00:21:58 2018 -0700
> 
>   blk-mq: count allocated but not started requests in iostats inflight
> 
>   In the legacy block case, we increment the counter right after we
>   allocate the request, not when the driver handles it. In both the legacy
>   and blk-mq cases, part_inc_in_flight() is called from
>   blk_account_io_start() right after we've allocated the request. blk-mq
>   only considers requests started requests as inflight, but this is
>   inconsistent with the legacy definition and the intention in the code.
>   This removes the started condition and instead counts all allocated
>   requests.
> 
>   Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
>   Signed-off-by: Omar Sandoval <osandov@fb.com>
>   Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c3621453ad87..5450cbc61f8d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -95,18 +95,15 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>  {
>         struct mq_inflight *mi = priv;
>  
> -       if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
> -               /*
> -                * index[0] counts the specific partition that was asked
> -                * for. index[1] counts the ones that are active on the
> -                * whole device, so increment that if mi->part is indeed
> -                * a partition, and not a whole device.
> -                */
> -               if (rq->part == mi->part)
> -                       mi->inflight[0]++;
> -               if (mi->part->partno)
> -                       mi->inflight[1]++;
> -       }
> +       /*
> +        * index[0] counts the specific partition that was asked for. index[1]
> +        * counts the ones that are active on the whole device, so increment
> +        * that if mi->part is indeed a partition, and not a whole device.
> +        */
> +       if (rq->part == mi->part)
> +               mi->inflight[0]++;
> +       if (mi->part->partno)
> +               mi->inflight[1]++;
>  }
>  
>  void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
> 
> 
> 
> If I get it right, when the disk is idle, and some request is allocated,
> part_round_stats() with this commit will now add all ticks between
> previous I/O and current time (now - part->stamp) to io_ticks.
> 
> Before the commit, part_round_stats() called right after request allocation saw
> an in-flight count of zero, so it only updated part->stamp without adding
> anything to io_ticks.

So this is "false" reporting?  There's really no load?

> Any thoughts how to best fix this in 4.19?
> I see the io_ticks accounting has been reworked in 5.0, do we need to
> backport those to 4.19, or any ill effects if this commit is reverted in
> 4.19?

Do you see this issue in 5.4?  What's keeping you from moving to 5.4.y?

And if this isn't a real issue, is that a problem too?

As you can test this, if you have a set of patches backported that could
resolve it, can you send them to us?

thanks,

greg k-h


* Re: 4.19 LTS high /proc/diskstats io_ticks
From: Rantala, Tommi T. (Nokia - FI/Espoo) @ 2020-03-25 11:22 UTC (permalink / raw)
  To: gregkh
  Cc: snitzer, stable, axboe, linux-kernel, osandov, linux-block, mpatocka

On Wed, 2020-03-25 at 11:07 +0100, Greg KH wrote:
> On Wed, Mar 25, 2020 at 10:02:41AM +0000, Rantala, Tommi T. (Nokia -
> FI/Espoo) wrote:
> > 
> > Other people are apparently seeing this too with 4.19:
> > https://kudzia.eu/b/2019/09/iostat-x-1-reporting-100-utilization-of-nearly-idle-nvme-drives/
> > 
> > 
> > I also see this only in 4.19.y, and bisected it to the commit below (based on
> > the Fixes tag, it should have been taken to 4.14 too...):
> > 
> > commit 6131837b1de66116459ef4413e26fdbc70d066dc
> > Author: Omar Sandoval <osandov@fb.com>
> > Date:   Thu Apr 26 00:21:58 2018 -0700
> > 
> >   blk-mq: count allocated but not started requests in iostats inflight
> > 
> >   In the legacy block case, we increment the counter right after we
> >   allocate the request, not when the driver handles it. In both the legacy
> >   and blk-mq cases, part_inc_in_flight() is called from
> >   blk_account_io_start() right after we've allocated the request. blk-mq
> >   only considers requests started requests as inflight, but this is
> >   inconsistent with the legacy definition and the intention in the code.
> >   This removes the started condition and instead counts all allocated
> >   requests.
> > 
> >   Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
> >   Signed-off-by: Omar Sandoval <osandov@fb.com>
> >   Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > 
> > 
> > If I get it right, when the disk is idle, and some request is allocated,
> > part_round_stats() with this commit will now add all ticks between
> > previous I/O and current time (now - part->stamp) to io_ticks.
> > 
> > Before the commit, part_round_stats() would only update part->stamp when
> > called after request allocation.
> 
> So this is a "false" reporting?  there's really no load?

Correct.

> > Any thoughts how to best fix this in 4.19?
> > I see the io_ticks accounting has been reworked in 5.0, do we need to
> > backport those to 4.19, or any ill effects if this commit is reverted in
> > 4.19?
> 
> Do you see this issue in 5.4?  What's keeping you from moving to 5.4.y?

Yes it's fixed in 5.0 and later.
Fixing it in 4.19.x would benefit debian stable users too. :-)
BTW is the EOL for 4.19 still end of 2020?

> And if this isn't a real issue, is that a problem too?
> 
> As you can test this, if you have a set of patches backported that could
> resolve it, can you send them to us?

Based on a quick look it's solved by this; I'll need to check whether it's
enough to fix it:

commit 5b18b5a737600fd20ba2045f320d5926ebbf341a
Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Thu Dec 6 11:41:19 2018 -0500

    block: delete part_round_stats and switch to less precise counting




> 
> thanks,
> 
> greg k-h



* Re: 4.19 LTS high /proc/diskstats io_ticks
From: Jens Axboe @ 2020-03-25 15:24 UTC (permalink / raw)
  To: Rantala, Tommi T. (Nokia - FI/Espoo), gregkh
  Cc: snitzer, stable, linux-kernel, osandov, linux-block, mpatocka

On 3/25/20 5:22 AM, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
> Based on a quick look it's solved by this; I'll need to check whether it's
> enough to fix it:
> 
> commit 5b18b5a737600fd20ba2045f320d5926ebbf341a
> Author: Mikulas Patocka <mpatocka@redhat.com>
> Date:   Thu Dec 6 11:41:19 2018 -0500
> 
>     block: delete part_round_stats and switch to less precise counting

Unfortunately that has its own set of issues, which are being fixed just
now...

-- 
Jens Axboe

