* [regression] very inaccurate %util of iostat
@ 2020-03-24  3:19 Ming Lei
  2020-03-24  3:53 ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread

From: Ming Lei @ 2020-03-24  3:19 UTC (permalink / raw)
  To: linux-block, Jens Axboe, Mike Snitzer, mpatocka

Hi Guys,

Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less
precise counting") changed the calculation of 'io_ticks' significantly.

In theory, io_ticks counts the time during which any IO is in flight or
in queue, so it has to rely on counting in-flight IOs.

However, commit 5b18b5a73760 changed io_ticks accounting to the
following:

	stamp = READ_ONCE(part->stamp);
	if (unlikely(stamp != now)) {
		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
			__part_stat_add(part, io_ticks, 1);
	}

This does not use any in-flight IO information: it simply adds 1
whenever the stamp differs from the previous one, whether or not any IO
is in flight.

As a result, even under very heavy disk IO, %util can stay far below
100%, especially on HDDs, because IO latency can be much more than 1ms
with HZ=1000, so the above calculation is very inaccurate.

Another extreme example: if IOs take a long time to complete, such as
during an IO stall, %util may show 0% utilization instead of 100%.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: very inaccurate %util of iostat
  2020-03-24  3:19 [regression] very inaccurate %util of iostat Ming Lei
@ 2020-03-24  3:53 ` Mike Snitzer
  2020-10-23  2:48   ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread

From: Mike Snitzer @ 2020-03-24  3:53 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Mon, Mar 23 2020 at 11:19pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> Hi Guys,
>
> Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> changes calculation of 'io_ticks' a lot.
>
> In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> so it has to rely on in-flight counting of IO.
>
> However, commit 5b18b5a73760 changes io_ticks's accounting into the
> following way:
>
> 	stamp = READ_ONCE(part->stamp);
> 	if (unlikely(stamp != now)) {
> 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> 			__part_stat_add(part, io_ticks, 1);
> 	}
>
> So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> changes compared with previous stamp, no matter if there is any in-flight
> IO or not.
>
> Now when there is very heavy IO on disks, %util is still much less than
> 100%, especially on HDD, the reason could be that IO latency can be much more
> than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
>
> Another extreme example is that if IOs take long time to complete, such
> as IO stall, %util may show 0% utilization, instead of 100%.

Hi Ming,

Your email triggered a memory of someone else (Konstantin Khlebnikov)
having reported and fixed this relatively recently; please see this
patchset: https://lkml.org/lkml/2020/3/2/336

Obviously this needs fixing.  If you have time to review/polish the
proposed patches, that'd be great.

Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: very inaccurate %util of iostat
  2020-03-24  3:53 ` Mike Snitzer
@ 2020-10-23  2:48   ` Weiping Zhang
  2020-10-23  5:50     ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread

From: Weiping Zhang @ 2020-10-23  2:48 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Mon, Mar 23 2020 at 11:19pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
> > Hi Guys,
> >
> > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > changes calculation of 'io_ticks' a lot.
> >
> > In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> > so it has to rely on in-flight counting of IO.
> >
> > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > following way:
> >
> > 	stamp = READ_ONCE(part->stamp);
> > 	if (unlikely(stamp != now)) {
> > 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > 			__part_stat_add(part, io_ticks, 1);
> > 	}
> >
> > So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> > changes compared with previous stamp, no matter if there is any in-flight
> > IO or not.
> >
> > Now when there is very heavy IO on disks, %util is still much less than
> > 100%, especially on HDD, the reason could be that IO latency can be much more
> > than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
> >
> > Another extreme example is that if IOs take long time to complete, such
> > as IO stall, %util may show 0% utilization, instead of 100%.
>
> Hi Ming,
>
> Your email triggered a memory of someone else (Konstantin Khlebnikov)
> having reported and fixed this relatively recently, please see this
> patchset: https://lkml.org/lkml/2020/3/2/336
>
> Obviously this needs fixing. If you have time to review/polish the
> proposed patches that'd be great.
>
> Mike
>

Hi,

Commit 5b18b5a73760 makes io.util larger than the real value when the IO
inflight count is <= 1; even with commit 2b8bd423614, the problem still
exists.

static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
{
	unsigned long stamp;
again:
	stamp = READ_ONCE(part->stamp);
	if (unlikely(stamp != now)) {
		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
	}
}

When a new IO starts, blk_account_io_start => update_io_ticks adds 1
jiffy to io_ticks even if there was no IO in flight before, so it always
adds an extra jiffy. So we need to know whether there was any inflight
IO before.

Before commit 5b18b5a73760, io_ticks was not incremented if there was no
inflight IO when a new IO started:

static void part_round_stats_single(struct request_queue *q,
				    struct hd_struct *part, unsigned long now,
				    unsigned int inflight)
{
	if (inflight) {
		__part_stat_add(part, time_in_queue,
				inflight * (now - part->stamp));
		__part_stat_add(part, io_ticks, (now - part->stamp));
	}
	part->stamp = now;
}

Reproduce:
fio -name=test -ioengine=sync -bs=4K -rw=write \
	-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1 \
	-runtime=300 -rate=2m,2m

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: very inaccurate %util of iostat
  2020-10-23  2:48 ` Weiping Zhang
@ 2020-10-23  5:50   ` Weiping Zhang
       [not found]     ` <156351603710306@mail.yandex-team.ru>
  0 siblings, 1 reply; 5+ messages in thread

From: Weiping Zhang @ 2020-10-23  5:50 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Ming Lei, linux-block, Jens Axboe, mpatocka, Konstantin Khlebnikov

On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > On Mon, Mar 23 2020 at 11:19pm -0400,
> > Ming Lei <ming.lei@redhat.com> wrote:
> >
> > > Hi Guys,
> > >
> > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > changes calculation of 'io_ticks' a lot.
> > >
> > > In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> > > so it has to rely on in-flight counting of IO.
> > >
> > > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > > following way:
> > >
> > > 	stamp = READ_ONCE(part->stamp);
> > > 	if (unlikely(stamp != now)) {
> > > 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > 			__part_stat_add(part, io_ticks, 1);
> > > 	}
> > >
> > > So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> > > changes compared with previous stamp, no matter if there is any in-flight
> > > IO or not.
> > >
> > > Now when there is very heavy IO on disks, %util is still much less than
> > > 100%, especially on HDD, the reason could be that IO latency can be much more
> > > than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
> > >
> > > Another extreme example is that if IOs take long time to complete, such
> > > as IO stall, %util may show 0% utilization, instead of 100%.
> >
> > Hi Ming,
> >
> > Your email triggered a memory of someone else (Konstantin Khlebnikov)
> > having reported and fixed this relatively recently, please see this
> > patchset: https://lkml.org/lkml/2020/3/2/336
> >
> > Obviously this needs fixing. If you have time to review/polish the
> > proposed patches that'd be great.
> >
> > Mike
> >
>
> Hi,
>
> commit 5b18b5a73760 makes io.util larger than the real, when IO
> inflight count <= 1,
> even with the commit 2b8bd423614, the problem is exist too.
>
> static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> {
> 	unsigned long stamp;
> again:
> 	stamp = READ_ONCE(part->stamp);
> 	if (unlikely(stamp != now)) {
> 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> 			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
> 	}
>
> when a new start, blk_account_io_start => update_io_ticks and add 1 jiffy to
> io_ticks, even there is no IO before, so it will always add an extra 1 jiffy.
> So we should know is there any inflight IO before.
>
> Before commit 5b18b5a73760,
> The io_ticks will not be added, if there is no inflight when start a new IO.
> static void part_round_stats_single(struct request_queue *q,
> 				    struct hd_struct *part, unsigned long now,
> 				    unsigned int inflight)
> {
> 	if (inflight) {
> 		__part_stat_add(part, time_in_queue,
> 				inflight * (now - part->stamp));
> 		__part_stat_add(part, io_ticks, (now - part->stamp));
> 	}
> 	part->stamp = now;
> }
>
> Reproduce:
> fio -name=test -ioengine=sync -bs=4K -rw=write
> -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> -runtime=300 -rate=2m,2m

Let me explain in more detail.

I ran the following command on the same host with different kernel
versions:

fio -name=test -ioengine=sync -bs=4K -rw=write \
	-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1 \
	-runtime=300 -rate=2m,2m

If we run fio in sync direct-IO mode, IOs are processed one by one; you
can see that 512 IOs complete per second.
kernel: 4.19.0

Device:  rrqm/s wrqm/s   r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda        0.00   0.00  0.00  512.00   0.00   2.00     8.00     0.21   0.40    0.00    0.40   0.40  20.60

The average IO latency is 0.4ms, so the disk time spent in one second
should be 0.4 * 512 = 204.8 ms; that means %util should be about 20%.

kernel: commit f9893351ac

On the latest kernel at commit f9893351ac I got the following numbers.
Because update_io_ticks adds an extra 1 jiffy (1ms) for every IO, the
effective per-IO accounting becomes 1 + 0.4 = 1.4ms, and
1.4 * 512 = 716.8ms, so %util shows about 72%.

Device     r/s     w/s  rMB/s  wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda       0.00  512.00   0.00   2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00   1.41  72.10

There is a big gap in %util.

^ permalink raw reply	[flat|nested] 5+ messages in thread
[parent not found: <156351603710306@mail.yandex-team.ru>]
* Re: very inaccurate %util of iostat
       [not found] ` <156351603710306@mail.yandex-team.ru>
@ 2020-10-27  4:26   ` Weiping Zhang
  0 siblings, 0 replies; 5+ messages in thread

From: Weiping Zhang @ 2020-10-27  4:26 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Mike Snitzer, Ming Lei, linux-block, Jens Axboe, mpatocka

On Mon, Oct 26, 2020 at 7:17 PM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> 23.10.2020, 08:51, "Weiping Zhang" <zwp10758@gmail.com>:
> > On Fri, Oct 23, 2020 at 10:48 AM Weiping Zhang <zwp10758@gmail.com> wrote:
> > > On Tue, Mar 24, 2020 at 11:54 AM Mike Snitzer <snitzer@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 23 2020 at 11:19pm -0400,
> > > > Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > > Hi Guys,
> > > > >
> > > > > Commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > > > changes calculation of 'io_ticks' a lot.
> > > > >
> > > > > In theory, io_ticks counts the time when there is any IO in-flight or in-queue,
> > > > > so it has to rely on in-flight counting of IO.
> > > > >
> > > > > However, commit 5b18b5a73760 changes io_ticks's accounting into the
> > > > > following way:
> > > > >
> > > > > 	stamp = READ_ONCE(part->stamp);
> > > > > 	if (unlikely(stamp != now)) {
> > > > > 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > > > 			__part_stat_add(part, io_ticks, 1);
> > > > > 	}
> > > > >
> > > > > So this way doesn't use any in-flight IO's info, simply adding 1 if stamp
> > > > > changes compared with previous stamp, no matter if there is any in-flight
> > > > > IO or not.
> > > > >
> > > > > Now when there is very heavy IO on disks, %util is still much less than
> > > > > 100%, especially on HDD, the reason could be that IO latency can be much more
> > > > > than 1ms in case of 1000HZ, so the above calculation is very inaccurate.
> > > > >
> > > > > Another extreme example is that if IOs take long time to complete, such
> > > > > as IO stall, %util may show 0% utilization, instead of 100%.
> > > >
> > > > Hi Ming,
> > > >
> > > > Your email triggered a memory of someone else (Konstantin Khlebnikov)
> > > > having reported and fixed this relatively recently, please see this
> > > > patchset: https://lkml.org/lkml/2020/3/2/336
> > > >
> > > > Obviously this needs fixing. If you have time to review/polish the
> > > > proposed patches that'd be great.
> > > >
> > > > Mike
> > >
> > > Hi,
> > >
> > > commit 5b18b5a73760 makes io.util larger than the real, when IO
> > > inflight count <= 1,
> > > even with the commit 2b8bd423614, the problem is exist too.
> > >
> > > static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> > > {
> > > 	unsigned long stamp;
> > > again:
> > > 	stamp = READ_ONCE(part->stamp);
> > > 	if (unlikely(stamp != now)) {
> > > 		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > 			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
> > > 	}
> > >
> > > when a new start, blk_account_io_start => update_io_ticks and add 1 jiffy to
> > > io_ticks, even there is no IO before, so it will always add an extra 1 jiffy.
> > > So we should know is there any inflight IO before.
> > >
> > > Before commit 5b18b5a73760,
> > > The io_ticks will not be added, if there is no inflight when start a new IO.
> > > static void part_round_stats_single(struct request_queue *q,
> > > 				    struct hd_struct *part, unsigned long now,
> > > 				    unsigned int inflight)
> > > {
> > > 	if (inflight) {
> > > 		__part_stat_add(part, time_in_queue,
> > > 				inflight * (now - part->stamp));
> > > 		__part_stat_add(part, io_ticks, (now - part->stamp));
> > > 	}
> > > 	part->stamp = now;
> > > }
> > >
> > > Reproduce:
> > > fio -name=test -ioengine=sync -bs=4K -rw=write
> > > -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> > > -runtime=300 -rate=2m,2m
> >
> > Let's explain in more detail.
> >
> > I run the following command on a host, with different kernel version.
> >
> > fio -name=test -ioengine=sync -bs=4K -rw=write
> > -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> > -runtime=300 -rate=2m,2m
> >
> > If we run fio in a sync direct io mode, IO will be proccessed one by one,
> > you can see that there are 512 IOs completed in one second.
> >
> > kernel: 4.19.0
> >
> > Device:  rrqm/s wrqm/s   r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> > vda        0.00   0.00  0.00  512.00   0.00   2.00     8.00     0.21   0.40    0.00    0.40   0.40  20.60
> >
> > The averate io.latency is 0.4ms
> > So the disk time cost in one second should be 0.4 * 512 = 204.8 ms,
> > that means, %util should be 20%.
> >
> > kernel: commit f9893351ac
> > In the latest kernel commit f9893351ac, I got the follow number,
> > Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
> > io.latency will be 1 + 0.4 = 1.4ms,
> > 1.4 * 512 = 716.8ms, so the %util show it about 72%.
> >
> > Device     r/s     w/s  rMB/s  wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
> > vda       0.00  512.00   0.00   2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00   1.41  72.10
> >
> > There is a big gap for %util.
>
> Yep. The meaning of this statistic has changed: it now approximates the
> count of jiffies in which at least one request was running, i.e. every
> tiny request accounts a whole jiffy in which it started or finished.

When Q2Q >= 1 jiffy, an extra jiffy is added at each IO start, which is
really not reasonable:

modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000

fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0 \
	-size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &

               w/s   w_await   %util
-----------------------------------------------
before patch  1024      0.15     100
after patch   1024      0.15    14.5

The gap is really unacceptable.

The test patch: https://github.com/dublio/linux/commits/fix_util_v5

> %util is still useful - it shows how disk activity is grouped across
> time.
>
> You could get precise "utilization" in the form of average queue depth
> from the read/write time counters.

It's hard to work out how much inflight IOs overlap each other when the
device has a high queue depth.

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-10-27  4:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-24  3:19 [regression] very inaccurate %util of iostat Ming Lei
2020-03-24  3:53 ` Mike Snitzer
2020-10-23  2:48   ` Weiping Zhang
2020-10-23  5:50     ` Weiping Zhang
     [not found]       ` <156351603710306@mail.yandex-team.ru>
2020-10-27  4:26         ` Weiping Zhang