linux-block.vger.kernel.org archive mirror
* [PATCH v5 0/2] fix inaccurate io_ticks
@ 2020-10-27  4:54 Weiping Zhang
  2020-11-04  3:26 ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-10-27  4:54 UTC (permalink / raw)
  To: axboe, ming.lei, snitzer, mpatocka; +Cc: linux-block

Hi,

This patchset includes two patches:

01. block: fix inaccurate io_ticks
Fix io_ticks accounting when a new IO starts and there is no inflight IO before it.

02. blk-mq: break earlier when iterating hctxs
An optimization for blk_mq_queue_inflight and blk_mq_part_is_in_flight:
these two functions only need to know whether there is any inflight IO,
and do not care how many inflight IOs there are.
After this patch, blk_mq_queue_inflight stops iterating over the remaining
hctxs once it finds an inflight IO, and blk_mq_part_is_in_flight stops
iterating over the remaining set bits/hctxs once it finds an inflight IO.

Changes since v4:
 * only check inflight in update_io_ticks when a new IO starts, at most once per jiffy.

Changes since v3:
 * add a parameter to blk_mq_queue_tag_busy_iter so it can break out earlier
   when iterating over the hctxs of a queue, since blk_mq_part_is_in_flight
   and blk_mq_queue_inflight do not care how many IOs are inflight.

Changes since v2:
* use the blk_mq_queue_tag_busy_iter framework instead of open-coding.
* call update_io_ticks before updating inflight in __part_start_io_acct

Changes since v1:
* avoid iterating over the whole tagset; return directly once a set bit is found.
* fix some typos in the commit messages

Weiping Zhang (2):
  block: fix inaccurate io_ticks
  blk-mq: break earlier when iterating hctxs

 block/blk-core.c       | 19 ++++++++++----
 block/blk-mq-tag.c     | 11 ++++++--
 block/blk-mq-tag.h     |  2 +-
 block/blk-mq.c         | 58 +++++++++++++++++++++++++++++++++++++++---
 block/blk-mq.h         |  1 +
 block/blk.h            |  1 +
 block/genhd.c          | 13 ++++++++++
 include/linux/blk-mq.h |  1 +
 8 files changed, 94 insertions(+), 12 deletions(-)

-- 
2.18.4



* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-10-27  4:54 [PATCH v5 0/2] fix inaccurate io_ticks Weiping Zhang
@ 2020-11-04  3:26 ` Weiping Zhang
  2020-11-17  3:01   ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-11-04  3:26 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, Mike Snitzer, mpatocka, linux-block

Ping

On Wed, Oct 28, 2020 at 6:59 AM Weiping Zhang
<zhangweiping@didiglobal.com> wrote:
> [...]


* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-04  3:26 ` Weiping Zhang
@ 2020-11-17  3:01   ` Weiping Zhang
  2020-11-17  3:27     ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-11-17  3:01 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, Mike Snitzer, mpatocka, linux-block

Hi Jens,

Ping

On Wed, Nov 4, 2020 at 11:26 AM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> Ping
>
> On Wed, Oct 28, 2020 at 6:59 AM Weiping Zhang
> <zhangweiping@didiglobal.com> wrote:
> > [...]


* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-17  3:01   ` Weiping Zhang
@ 2020-11-17  3:27     ` Ming Lei
  2020-11-17  4:59       ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2020-11-17  3:27 UTC (permalink / raw)
  To: Weiping Zhang; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> Hi Jens,
> 
> Ping

Hello Weiping,

I am not sure we have to fix this issue: adding blk_mq_queue_inflight()
back to the IO path brings a cost which turns out to be visible, and I did
get a soft lockup report on Azure NVMe because of this kind of cost.

BTW, supposing the io accounting issue needs to be fixed, I am just
wondering why not simply revert 5b18b5a73760 ("block: delete
part_round_stats and switch to less precise counting"); the original
way had worked for decades.


Thanks,
Ming



* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-17  3:27     ` Ming Lei
@ 2020-11-17  4:59       ` Weiping Zhang
  2020-11-17  7:40         ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-11-17  4:59 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Tue, Nov 17, 2020 at 11:28 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> > Hi Jens,
> >
> > Ping
>
> Hello Weiping,
>
> I am not sure we have to fix this issue: adding blk_mq_queue_inflight()
> back to the IO path brings a cost which turns out to be visible, and I did
> get a soft lockup report on Azure NVMe because of this kind of cost.
>
Have you tested v5? This patch is different from v1: v1 gets the inflight
count for each IO, while v5 has been changed to get it at most once per jiffy.

If it still happens with v5, can we reproduce it on null_blk?

> BTW, supposing the io accounting issue needs to be fixed, I am just
> wondering why not simply revert 5b18b5a73760 ("block: delete
> part_round_stats and switch to less precise counting"); the original
> way had worked for decades.
>
This patch is better than before: it breaks early once it finds an inflight
IO on any CPU; only in the worst case (the IO is running on the last CPU)
does it iterate over all CPUs.
>
> Thanks,
> Ming
>


* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-17  4:59       ` Weiping Zhang
@ 2020-11-17  7:40         ` Ming Lei
  2020-11-18  5:55           ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2020-11-17  7:40 UTC (permalink / raw)
  To: Weiping Zhang; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Tue, Nov 17, 2020 at 12:59:46PM +0800, Weiping Zhang wrote:
> On Tue, Nov 17, 2020 at 11:28 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> > > Hi Jens,
> > >
> > > Ping
> >
> > Hello Weiping,
> >
> > I am not sure we have to fix this issue: adding blk_mq_queue_inflight()
> > back to the IO path brings a cost which turns out to be visible, and I did
> > get a soft lockup report on Azure NVMe because of this kind of cost.
> >
> Have you tested v5? This patch is different from v1: v1 gets the inflight
> count for each IO, while v5 has been changed to get it at most once per jiffy.

I meant the issue can be reproduced on kernels before 5b18b5a73760 ("block:
delete part_round_stats and switch to less precise counting").

Also, do we really need to fix this issue? I understand device
utilization becomes inaccurate at very small load, but is it really
worth adding runtime load to the fast path to fix it?

> 
> If it still happens with v5, can we reproduce it on null_blk?

No, I just saw a report on Azure NVMe.

> 
> > BTW, supposing the io accounting issue needs to be fixed, I am just
> > wondering why not simply revert 5b18b5a73760 ("block: delete
> > part_round_stats and switch to less precise counting"); the original
> > way had worked for decades.
> >
> This patch is better than before: it breaks early once it finds an inflight
> IO on any CPU; only in the worst case (the IO is running on the last CPU)
> does it iterate over all CPUs.

Please see the following case:

1) one device has 256 hw queues, and the system has 256 cpu cores, and
each hw queue's depth is 1k.

2) there isn't any io load on CPUs(0 ~ 254)

3) heavy io load is run on CPU 255

So with your approach the code still needs to iterate over hw queues 0 to 254,
and that load isn't something which can be ignored, especially since it is
just for io accounting.


Thanks,
Ming



* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-17  7:40         ` Ming Lei
@ 2020-11-18  5:55           ` Weiping Zhang
  2020-11-26 11:23             ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-11-18  5:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Tue, Nov 17, 2020 at 3:40 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Nov 17, 2020 at 12:59:46PM +0800, Weiping Zhang wrote:
> > On Tue, Nov 17, 2020 at 11:28 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> > > > Hi Jens,
> > > >
> > > > Ping
> > >
> > > Hello Weiping,
> > >
> > > I am not sure we have to fix this issue: adding blk_mq_queue_inflight()
> > > back to the IO path brings a cost which turns out to be visible, and I did
> > > get a soft lockup report on Azure NVMe because of this kind of cost.
> > >
> > Have you tested v5? This patch is different from v1: v1 gets the inflight
> > count for each IO, while v5 has been changed to get it at most once per jiffy.
>
> I meant the issue can be reproduced on kernels before 5b18b5a73760 ("block:
> delete part_round_stats and switch to less precise counting").
>
> Also, do we really need to fix this issue? I understand device
> utilization becomes inaccurate at very small load, but is it really
> worth adding runtime load to the fast path to fix it?
>
Hello Ming,

The problem is that it is hard for the user to know how busy the disk is:
for a small load it shows high utilization, and for a heavy load it also
shows high utilization, which makes %util meaningless.

The following test case shows a big gap with same workload:

modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000
fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0
-size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &

              w/s    w_await  %util
-----------------------------------
before patch  1024   0.15     100
after patch   1024   0.15     14.5

I know that for a very fast disk, adding such accounting to the fast path
is harmful; maybe we can add an interface to enable/disable io_ticks
accounting, like what /sys/block/<disk>/queue/iostat does.

e.g. /sys/block/<disk>/queue/iostat_io_ticks:
when 0 is written to it, io_ticks accounting is disabled entirely.

Or any other good idea ?

> >
> > If it still happens with v5, can we reproduce it on null_blk?
>
> > No, I just saw a report on Azure NVMe.
>
> >
> > > BTW, supposing the io accounting issue needs to be fixed, I am just
> > > wondering why not simply revert 5b18b5a73760 ("block: delete
> > > part_round_stats and switch to less precise counting"); the original
> > > way had worked for decades.
> > >
> > This patch is better than before: it breaks early once it finds an inflight
> > IO on any CPU; only in the worst case (the IO is running on the last CPU)
> > does it iterate over all CPUs.
>
Yes, that is the worst case.
Actually v5 has two improvements compared to before 5b18b5a73760:
1. for IO end, v5 does not get the inflight count;
2. for IO start, v5 just finds the first inflight IO on any CPU, and only
in the worst case does it do the same as before.

> Please see the following case:
>
> 1) one device has 256 hw queues, and the system has 256 cpu cores, and
> each hw queue's depth is 1k.
>
> 2) there isn't any io load on CPUs(0 ~ 254)
>
> 3) heavy io load is run on CPU 255
>
> So with your approach the code still needs to iterate over hw queues 0 to 254,
> and that load isn't something which can be ignored, especially since it is
> just for io accounting.
>
>
> Thanks,
> Ming
>
Thanks


* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-18  5:55           ` Weiping Zhang
@ 2020-11-26 11:23             ` Weiping Zhang
  2020-12-17 17:03               ` Weiping Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Weiping Zhang @ 2020-11-26 11:23 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

Ping

On Wed, Nov 18, 2020 at 1:55 PM Weiping Zhang <zwp10758@gmail.com> wrote:
> [...]


* Re: [PATCH v5 0/2] fix inaccurate io_ticks
  2020-11-26 11:23             ` Weiping Zhang
@ 2020-12-17 17:03               ` Weiping Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Weiping Zhang @ 2020-12-17 17:03 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

Ping again.


end of thread, other threads:[~2020-12-17 17:04 UTC | newest]

Thread overview: 9+ messages
2020-10-27  4:54 [PATCH v5 0/2] fix inaccurate io_ticks Weiping Zhang
2020-11-04  3:26 ` Weiping Zhang
2020-11-17  3:01   ` Weiping Zhang
2020-11-17  3:27     ` Ming Lei
2020-11-17  4:59       ` Weiping Zhang
2020-11-17  7:40         ` Ming Lei
2020-11-18  5:55           ` Weiping Zhang
2020-11-26 11:23             ` Weiping Zhang
2020-12-17 17:03               ` Weiping Zhang
