linux-block.vger.kernel.org archive mirror
* [regression] very inaccurate %util of iostat
@ 2020-03-24  3:19 Ming Lei
  2020-03-24  3:53 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2020-03-24  3:19 UTC (permalink / raw)
  To: linux-block, Jens Axboe, Mike Snitzer, mpatocka

Hi Guys,

Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
changes the calculation of 'io_ticks' significantly.

In theory, io_ticks counts the time during which any IO is in flight or in queue,
so it has to rely on counting in-flight IOs.

However, commit 5b18b5a73760 changes io_ticks accounting to the
following:

	stamp = READ_ONCE(part->stamp);
	if (unlikely(stamp != now)) {
		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
			__part_stat_add(part, io_ticks, 1);
	}

This approach doesn't use any in-flight IO information: it simply adds 1
whenever the stamp differs from the previous one, whether or not any IO
is in flight.

Now, even when there is very heavy IO on a disk, %util can still be much
less than 100%, especially on HDD. The reason is that IO latency can be much
longer than 1ms (one jiffy at 1000HZ), so the above calculation is very
inaccurate.

Another extreme example: if IOs take a long time to complete, such as
during an IO stall, %util may show 0% utilization instead of 100%.
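
For reference, below is a minimal sketch (my illustration, not kernel or
iostat source) of how an iostat-style tool turns io_ticks into %util: it
samples field 13 of /proc/diskstats (milliseconds spent doing IO) twice and
divides the delta by the wall-clock interval. With the accounting above, a
single 10ms IO at 1000HZ bumps io_ticks by only 2 ticks (one at submission,
one at completion) instead of 10, which is exactly the underestimation
described. The device name "sda" is just an example.

	/* Sketch only: approximate an iostat-style %util from /proc/diskstats. */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static unsigned long long read_io_ticks(const char *dev)
	{
		char line[256], name[32];
		unsigned int major, minor;
		unsigned long long s[11], ticks = 0;
		FILE *f = fopen("/proc/diskstats", "r");

		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "%u %u %31s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
				   &major, &minor, name, &s[0], &s[1], &s[2], &s[3], &s[4],
				   &s[5], &s[6], &s[7], &s[8], &s[9], &s[10]) == 14 &&
			    !strcmp(name, dev)) {
				ticks = s[9];	/* field 13: io_ticks, in ms */
				break;
			}
		}
		fclose(f);
		return ticks;
	}

	int main(void)
	{
		unsigned long long t0 = read_io_ticks("sda");

		sleep(1);	/* 1000ms sampling interval */
		printf("%%util ~= %.1f%%\n",
		       (read_io_ticks("sda") - t0) / 1000.0 * 100.0);
		return 0;
	}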

Thanks, 
Ming



* Re: very inaccurate %util of iostat
  2020-03-24  3:19 [regression] very inaccurate %util of iostat Ming Lei
@ 2020-03-24  3:53 ` Mike Snitzer
  2020-10-23  2:48   ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2020-03-24  3:53 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Mon, Mar 23 2020 at 11:19pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> Hi Guys,
> 
> Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> changes the calculation of 'io_ticks' significantly.
>
> In theory, io_ticks counts the time during which any IO is in flight or in queue,
> so it has to rely on counting in-flight IOs.
>
> However, commit 5b18b5a73760 changes io_ticks accounting to the
> following:
> 
> 	stamp = READ_ONCE(part->stamp);
> 	if (unlikely(stamp != now)) {
> 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> 			__part_stat_add(part, io_ticks, 1);
> 	}
> 
> This approach doesn't use any in-flight IO information: it simply adds 1
> whenever the stamp differs from the previous one, whether or not any IO
> is in flight.
>
> Now, even when there is very heavy IO on a disk, %util can still be much
> less than 100%, especially on HDD. The reason is that IO latency can be much
> longer than 1ms (one jiffy at 1000HZ), so the above calculation is very
> inaccurate.
>
> Another extreme example: if IOs take a long time to complete, such as
> during an IO stall, %util may show 0% utilization instead of 100%.

Hi Ming,

Your email triggered a memory of someone else (Konstantin Khlebnikov)
having reported and fixed this relatively recently; please see this
patchset: https://lkml.org/lkml/2020/3/2/336

Obviously this needs fixing.  If you have time to review/polish the
proposed patches that'd be great.

Mike



* Re: very inaccurate %util of iostat
  2020-03-24  3:53 ` Mike Snitzer
@ 2020-10-23  2:48   ` Weiping Zhang
  2020-10-23  5:50     ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Weiping Zhang @ 2020-10-23  2:48 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Mon, Mar 23 2020 at 11:19pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
> > Hi Guys,
> >
> > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > changes the calculation of 'io_ticks' significantly.
> >
> > In theory, io_ticks counts the time during which any IO is in flight or in queue,
> > so it has to rely on counting in-flight IOs.
> >
> > However, commit 5b18b5a73760 changes io_ticks accounting to the
> > following:
> >
> >       stamp = READ_ONCE(part->stamp);
> >       if (unlikely(stamp != now)) {
> >               if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> >                       __part_stat_add(part, io_ticks, 1);
> >       }
> >
> > This approach doesn't use any in-flight IO information: it simply adds 1
> > whenever the stamp differs from the previous one, whether or not any IO
> > is in flight.
> >
> > Now, even when there is very heavy IO on a disk, %util can still be much
> > less than 100%, especially on HDD. The reason is that IO latency can be much
> > longer than 1ms (one jiffy at 1000HZ), so the above calculation is very
> > inaccurate.
> >
> > Another extreme example: if IOs take a long time to complete, such as
> > during an IO stall, %util may show 0% utilization instead of 100%.
>
> Hi Ming,
>
> Your email triggered a memory of someone else (Konstantin Khlebnikov)
> having reported and fixed this relatively recently; please see this
> patchset: https://lkml.org/lkml/2020/3/2/336
>
> Obviously this needs fixing.  If you have time to review/polish the
> proposed patches that'd be great.
>
> Mike
>

Hi,

Commit 5b18b5a73760 makes %util larger than the real value when the IO
inflight count is <= 1;
even with commit 2b8bd423614, the problem still exists.

static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
{
        unsigned long stamp;
again:
        stamp = READ_ONCE(part->stamp);
        if (unlikely(stamp != now)) {
                if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
                        __part_stat_add(part, io_ticks, end ? now - stamp : 1);
        }
        if (part->partno) {
                part = &part_to_disk(part)->part0;
                goto again;
        }
}

When a new IO starts, blk_account_io_start => update_io_ticks adds 1 jiffy to
io_ticks even if there was no IO in flight before, so it always adds an extra
jiffy. So we need to know whether there was any IO in flight before.



Before commit 5b18b5a73760,
io_ticks was not incremented if nothing was in flight when a new IO started.
static void part_round_stats_single(struct request_queue *q,
                                    struct hd_struct *part, unsigned long now,
                                    unsigned int inflight)
{
        if (inflight) {
                __part_stat_add(part, time_in_queue,
                                inflight * (now - part->stamp));
                __part_stat_add(part, io_ticks, (now - part->stamp));
        }
        part->stamp = now;
}
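
One possible direction, sketched here only as an illustration (it assumes the
in-flight count is available at the call sites, and is not the merged fix),
would be to keep the cmpxchg scheme but make the submission-time tick
conditional on the in-flight count:

static void update_io_ticks(struct hd_struct *part, unsigned long now,
                            bool end, unsigned int inflight)
{
        unsigned long stamp;
again:
        stamp = READ_ONCE(part->stamp);
        if (unlikely(stamp != now)) {
                if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
                        /*
                         * On completion, charge the elapsed time; on
                         * submission, charge one tick only if some IO was
                         * already in flight, so the first IO after an idle
                         * period no longer adds an extra jiffy.
                         */
                        if (end)
                                __part_stat_add(part, io_ticks, now - stamp);
                        else if (inflight)
                                __part_stat_add(part, io_ticks, 1);
                }
        }
        if (part->partno) {
                part = &part_to_disk(part)->part0;
                goto again;
        }
}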


Reproduce:
fio -name=test -ioengine=sync -bs=4K -rw=write
-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
-runtime=300 -rate=2m,2m


* Re: very inaccurate %util of iostat
  2020-10-23  2:48   ` Weiping Zhang
@ 2020-10-23  5:50     ` Weiping Zhang
       [not found]       ` <156351603710306@mail.yandex-team.ru>
  0 siblings, 1 reply; 5+ messages in thread
From: Weiping Zhang @ 2020-10-23  5:50 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > On Mon, Mar 23 2020 at 11:19pm -0400,
> > Ming Lei <ming.lei@redhat.com> wrote:
> >
> > > Hi Guys,
> > >
> > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > changes the calculation of 'io_ticks' significantly.
> > >
> > > In theory, io_ticks counts the time during which any IO is in flight or in queue,
> > > so it has to rely on counting in-flight IOs.
> > >
> > > However, commit 5b18b5a73760 changes io_ticks accounting to the
> > > following:
> > >
> > >       stamp = READ_ONCE(part->stamp);
> > >       if (unlikely(stamp != now)) {
> > >               if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > >                       __part_stat_add(part, io_ticks, 1);
> > >       }
> > >
> > > This approach doesn't use any in-flight IO information: it simply adds 1
> > > whenever the stamp differs from the previous one, whether or not any IO
> > > is in flight.
> > >
> > > Now, even when there is very heavy IO on a disk, %util can still be much
> > > less than 100%, especially on HDD. The reason is that IO latency can be much
> > > longer than 1ms (one jiffy at 1000HZ), so the above calculation is very
> > > inaccurate.
> > >
> > > Another extreme example: if IOs take a long time to complete, such as
> > > during an IO stall, %util may show 0% utilization instead of 100%.
> >
> > Hi Ming,
> >
> > Your email triggered a memory of someone else (Konstantin Khlebnikov)
> > having reported and fixed this relatively recently; please see this
> > patchset: https://lkml.org/lkml/2020/3/2/336
> >
> > Obviously this needs fixing.  If you have time to review/polish the
> > proposed patches that'd be great.
> >
> > Mike
> >
>
> Hi,
>
> Commit 5b18b5a73760 makes %util larger than the real value when the IO
> inflight count is <= 1;
> even with commit 2b8bd423614, the problem still exists.
>
> static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> {
>         unsigned long stamp;
> again:
>         stamp = READ_ONCE(part->stamp);
>         if (unlikely(stamp != now)) {
>                 if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
>                         __part_stat_add(part, io_ticks, end ? now - stamp : 1);
>         }
>         if (part->partno) {
>                 part = &part_to_disk(part)->part0;
>                 goto again;
>         }
> }
>
> When a new IO starts, blk_account_io_start => update_io_ticks adds 1 jiffy to
> io_ticks even if there was no IO in flight before, so it always adds an extra
> jiffy. So we need to know whether there was any IO in flight before.
>
>
>
> Before commit 5b18b5a73760,
> io_ticks was not incremented if nothing was in flight when a new IO started.
> static void part_round_stats_single(struct request_queue *q,
>                                     struct hd_struct *part, unsigned long now,
>                                     unsigned int inflight)
> {
>         if (inflight) {
>                 __part_stat_add(part, time_in_queue,
>                                 inflight * (now - part->stamp));
>                 __part_stat_add(part, io_ticks, (now - part->stamp));
>         }
>         part->stamp = now;
> }
>
>
> Reproduce:
> fio -name=test -ioengine=sync -bs=4K -rw=write
> -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> -runtime=300 -rate=2m,2m



Let me explain in more detail.

I ran the following command on the same host with different kernel versions.

fio -name=test -ioengine=sync -bs=4K -rw=write
-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
-runtime=300 -rate=2m,2m

If we run fio in sync direct IO mode, IOs are processed one by one; you can
see that 512 IOs complete in one second (rate=2m / bs=4K = 512 IOPS).


kernel: 4.19.0

Device:  rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
vda        0.00    0.00  0.00  512.00   0.00   2.00      8.00      0.21   0.40     0.00     0.40   0.40  20.60

The average IO latency is 0.4ms,
so the disk time spent in one second should be 0.4 * 512 = 204.8 ms;
that means %util should be about 20%.
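
Put as a formula (my arithmetic, same numbers as above):

    expected %util = IOPS * average IO latency / interval
                   = 512 * 0.4ms / 1000ms ~= 20.5%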


kernel: commit f9893351ac
In the latest kernel, commit f9893351ac, I got the following numbers.
Because update_io_ticks adds an extra 1 jiffy (1ms) for every IO, the
IO latency becomes 1 + 0.4 = 1.4ms;
1.4 * 512 = 716.8ms, so %util shows about 72%.

Device   r/s     w/s  rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
vda     0.00  512.00   0.00   2.00    0.00    0.00   0.00   0.00     0.00     0.40    0.20      0.00      4.00   1.41  72.10

There is a big gap in %util.


* Re: very inaccurate %util of iostat
       [not found]       ` <156351603710306@mail.yandex-team.ru>
@ 2020-10-27  4:26         ` Weiping Zhang
  0 siblings, 0 replies; 5+ messages in thread
From: Weiping Zhang @ 2020-10-27  4:26 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Mike Snitzer, Ming Lei, linux-block, Jens Axboe, mpatocka

On Mon, Oct 26, 2020 at 7:17 PM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> 23.10.2020, 08:51, "Weiping Zhang" <zwp10758@gmail.com>:
>
> On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@gmail.com> wrote:
>
>
>  On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
>  >
>  > On Mon, Mar 23 2020 at 11:19pm -0400,
>  > Ming Lei <ming.lei@redhat.com> wrote:
>  >
>  > > Hi Guys,
>  > >
>  > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
>  > > changes the calculation of 'io_ticks' significantly.
>  > >
>  > > In theory, io_ticks counts the time during which any IO is in flight or in queue,
>  > > so it has to rely on counting in-flight IOs.
>  > >
>  > > However, commit 5b18b5a73760 changes io_ticks accounting to the
>  > > following:
>  > >
>  > > 	stamp = READ_ONCE(part->stamp);
>  > > 	if (unlikely(stamp != now)) {
>  > > 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
>  > > 			__part_stat_add(part, io_ticks, 1);
>  > > 	}
>  > >
>  > > This approach doesn't use any in-flight IO information: it simply adds 1
>  > > whenever the stamp differs from the previous one, whether or not any IO
>  > > is in flight.
>  > >
>  > > Now, even when there is very heavy IO on a disk, %util can still be much
>  > > less than 100%, especially on HDD. The reason is that IO latency can be much
>  > > longer than 1ms (one jiffy at 1000HZ), so the above calculation is very
>  > > inaccurate.
>  > >
>  > > Another extreme example: if IOs take a long time to complete, such as
>  > > during an IO stall, %util may show 0% utilization instead of 100%.
>  >
>  > Hi Ming,
>  >
>  > Your email triggered a memory of someone else (Konstantin Khlebnikov)
>  > having reported and fixed this relatively recently; please see this
>  > patchset: https://lkml.org/lkml/2020/3/2/336
>  >
>  > Obviously this needs fixing. If you have time to review/polish the
>  > proposed patches that'd be great.
>  >
>  > Mike
>  >
>
>  Hi,
>
>  Commit 5b18b5a73760 makes %util larger than the real value when the IO
>  inflight count is <= 1;
>  even with commit 2b8bd423614, the problem still exists.
>
>  static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
>  {
>          unsigned long stamp;
>  again:
>          stamp = READ_ONCE(part->stamp);
>          if (unlikely(stamp != now)) {
>                  if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
>                          __part_stat_add(part, io_ticks, end ? now - stamp : 1);
>          }
>          if (part->partno) {
>                  part = &part_to_disk(part)->part0;
>                  goto again;
>          }
>  }
>
>  When a new IO starts, blk_account_io_start => update_io_ticks adds 1 jiffy to
>  io_ticks even if there was no IO in flight before, so it always adds an extra
>  jiffy. So we need to know whether there was any IO in flight before.
>
>
>
>  Before commit 5b18b5a73760,
>  io_ticks was not incremented if nothing was in flight when a new IO started.
>  static void part_round_stats_single(struct request_queue *q,
>                                      struct hd_struct *part, unsigned long now,
>                                      unsigned int inflight)
>  {
>          if (inflight) {
>                  __part_stat_add(part, time_in_queue,
>                                  inflight * (now - part->stamp));
>                  __part_stat_add(part, io_ticks, (now - part->stamp));
>          }
>          part->stamp = now;
>  }
>
>
>  Reproduce:
>  fio -name=test -ioengine=sync -bs=4K -rw=write
>  -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
>  -runtime=300 -rate=2m,2m
>
>
>
>
> Let me explain in more detail.
>
> I ran the following command on the same host with different kernel versions.
>
> fio -name=test -ioengine=sync -bs=4K -rw=write
> -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> -runtime=300 -rate=2m,2m
>
> If we run fio in sync direct IO mode, IOs are processed one by one; you can
> see that 512 IOs complete in one second (rate=2m / bs=4K = 512 IOPS).
>
>
> kernel: 4.19.0
>
> Device:  rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> vda        0.00    0.00  0.00  512.00   0.00   2.00      8.00      0.21   0.40     0.00     0.40   0.40  20.60
>
> The average IO latency is 0.4ms,
> so the disk time spent in one second should be 0.4 * 512 = 204.8 ms;
> that means %util should be about 20%.
>
>
> kernel: commit f9893351ac
> In the latest kernel, commit f9893351ac, I got the following numbers.
> Because update_io_ticks adds an extra 1 jiffy (1ms) for every IO, the
> IO latency becomes 1 + 0.4 = 1.4ms;
> 1.4 * 512 = 716.8ms, so %util shows about 72%.
>
> Device   r/s     w/s  rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> vda     0.00  512.00   0.00   2.00    0.00    0.00   0.00   0.00     0.00     0.40    0.20      0.00      4.00   1.41  72.10
>
> There is a big gap in %util.
>
>
> Yep. The meaning of this statistic has changed. Now it approximates the count of jiffies during which at least one request was running, i.e. every tiny request always accounts for the whole jiffy in which it started/finished.
>
When the time between IOs (Q2Q) is >= 1 jiffy, an extra jiffy is added
every time an IO starts, which is really not reasonable.


modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000
fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0
-size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &

                w/s  w_await  %util
-----------------------------------
before patch   1024     0.15  100
after patch    1024     0.15   14.5
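
A rough sanity check of those numbers (my arithmetic, assuming HZ=1000):

    true busy time ~= 1024 IO/s * 0.15ms ~= 154ms per second   ->  ~15% util
    old accounting:   1024 submissions/s touch nearly every
                      1ms jiffy, each charging a full tick     ->  ~100% util

so 14.5% is close to the real utilization, while 100% is the artifact.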

The gap is really unacceptable.

The test patch:
https://github.com/dublio/linux/commits/fix_util_v5

> %util is still useful - it shows how disk activity is grouped across time.
>
> You could get a precise "utilization", in the form of the average queue depth, from the read/write time.
It's hard to work out how much the in-flight IOs overlap each other
when the device has a high queue depth.
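
For reference, the average queue depth Konstantin mentions can be derived the
same way iostat computes aqu-sz, from field 14 of /proc/diskstats
(time_in_queue, the weighted milliseconds spent doing IO):

    aqu-sz = delta(time_in_queue) / interval_in_ms

Unlike io_ticks, this grows with the number of overlapping IOs, so it stays
meaningful at high queue depth.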

