From: Weiping Zhang <zhangweiping@didiglobal.com>
To: <axboe@kernel.dk>, <tj@kernel.org>
Cc: <linux-block@vger.kernel.org>, <cgroups@vger.kernel.org>
Subject: [RFC 0/3] blkcg: add blk-iotrack
Date: Sat, 21 Mar 2020 09:20:36 +0800
Message-ID: <cover.1584728740.git.zhangweiping@didiglobal.com>

Hi all,

This patchset adds a monitor-only module, blk-iotrack, for the block
cgroup.

It contains the kernel-space blk-iotrack and the user-space tool
iotrack; you can also write your own tool to do further data analysis.

blk-iotrack is designed to track various IO statistics of a block
cgroup and is built on the rq_qos framework. It only tracks IO and
does not do any throttling.
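
A minimal sketch of that hookup, assuming a new RQ_QOS_IOTRACK id and
iotrack_* names (the real identifiers in patch 3 may differ): only the
done_bio hook is filled in, so nothing is ever throttled.

#include <linux/slab.h>
#include "blk-rq-qos.h"

/* sketch only: RQ_QOS_IOTRACK, iotrack_done_bio and blk_iotrack_init
 * are illustrative names, not necessarily the patch's */
static void iotrack_done_bio(struct rq_qos *rqos, struct bio *bio)
{
	/* account the completed bio, see the sketches below */
}

static struct rq_qos_ops iotrack_rqos_ops = {
	.done_bio	= iotrack_done_bio,	/* called once per completed bio */
	/* no .throttle callback: the policy only observes */
};

int blk_iotrack_init(struct request_queue *q)
{
	struct rq_qos *rqos;

	rqos = kzalloc(sizeof(*rqos), GFP_KERNEL);
	if (!rqos)
		return -ENOMEM;
	rqos->q = q;
	rqos->id = RQ_QOS_IOTRACK;	/* hypothetical new rq_qos id */
	rqos->ops = &iotrack_rqos_ops;
	rq_qos_add(q, rqos);		/* attach to the queue's rq_qos chain */
	return 0;
}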

Compared to blk-iolatency, it provides 8 configurable latency buckets
via /sys/fs/cgroup/io.iotrack.lat_thresh; blk-iotrack accounts the
number of IOs whose latency is less than the corresponding threshold.
In this way we can get the cgroup's latency distribution. The default
latency buckets are 50us, 100us, 200us, 400us, 1ms, 2ms, 4ms and 8ms.
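
For illustration, the bucket accounting could look like the sketch
below (names and layout are assumptions, not patch 3's code). Counting
is cumulative: an IO is counted in every bucket whose threshold its
latency stays under, which is why the %hit0..%hit7 columns in the
tables below rise toward 100%.

#define IOTRACK_NR_BUCKETS	8

/* the default thresholds from above, in nanoseconds; runtime values
 * would come from io.iotrack.lat_thresh */
static u64 iotrack_lat_thresh_ns[IOTRACK_NR_BUCKETS] = {
	50000, 100000, 200000, 400000,
	1000000, 2000000, 4000000, 8000000,
};

static void iotrack_account_buckets(u64 hits[IOTRACK_NR_BUCKETS], u64 lat_ns)
{
	int i;

	/* cumulative: a 150us IO lands in buckets 2..7 */
	for (i = 0; i < IOTRACK_NR_BUCKETS; i++)
		if (lat_ns < iotrack_lat_thresh_ns[i])
			hits[i]++;
}

%hitN is then hits[N] * 100 / total completed IOs.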

Compared to io.stat.{rbytes,wbytes,rios,wios,dbytes,dios}, it accounts
IOs when they complete instead of when they are submitted. If an IO is
throttled by the IO scheduler or another throttling policy, there is a
gap between the two: such IOs have not completed yet.

Patch 2 records a timestamp for each bio when it is issued to the
disk driver, so we can get the disk latency in rq_qos_done_bio; this
is also called the D2C time. In rq_qos_done_bio, blk-iotrack also
records the total latency (now - bio_issue_time), which can be treated
as the Q2C time. In this way, we can get the percentage
%d2c = D2C/Q2C for each cgroup. It is very useful for detecting
whether the main latency comes from the disk or from software, e.g.
the IO scheduler or another block cgroup throttling policy.
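
Filling in the done_bio hook from the first sketch: bio_issue_time()
and bi_issue already exist in mainline, while bio_issue_drv_time() and
iotrack_stat_of() below stand in for the driver-issue timestamp added
by patch 2 and the per-cgroup stats lookup (assumed names).

struct iotrack_stat {
	u64 q2c_ns;	/* sum of queue-to-complete latencies */
	u64 d2c_ns;	/* sum of device-to-complete latencies */
};

static void iotrack_done_bio(struct rq_qos *rqos, struct bio *bio)
{
	struct iotrack_stat *stat = iotrack_stat_of(bio);	/* assumed */
	u64 now = ktime_get_ns();

	stat->q2c_ns += now - bio_issue_time(&bio->bi_issue);	/* Q2C */
	stat->d2c_ns += now - bio_issue_drv_time(bio);		/* D2C, assumed */
}

%d2c for a cgroup is then stat->d2c_ns * 100 / stat->q2c_ns.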

The user-space tool, called iotrack, collects these basic IO
statistics and then generates more valuable metrics at the cgroup
level. From iotrack, you can get a cgroup's percentage of the whole
disk's IOs, bytes, total_time and disk_time, which makes it easy to
evaluate the real weight of a weight-based policy (bfq, blk-iocost).
iotrack generates many more metrics for reads and writes; for more
details, please visit: https://github.com/dublio/iotrack.
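
As a sketch of the math behind the %-columns (my reading of the metric
names, not iotrack's actual source): each per-cgroup counter is
sampled twice and its delta is divided by the root cgroup's delta over
the same interval.

#include <stdint.h>

/* share of the whole disk owned by one cgroup, in percent */
static double pct(uint64_t cgrp_delta, uint64_t root_delta)
{
	return root_delta ? 100.0 * cgrp_delta / root_delta : 0.0;
}

For example, in the table below test1 completes 30206 of the root's
44588 io/s, and pct(30206, 44588) = 67.74 is exactly its %io column.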

Test results for two fio jobs doing 4K random reads:
test1 cgroup bfq weight = 800
test2 cgroup bfq weight = 100

Device      io/s   MB/s    %io    %MB    %tm   %dtm  %d2c %hit0 %hit1 %hit2 %hit3 %hit4 %hit5  %hit6  %hit7 cgroup
nvme1n1 44588.00 174.17 100.00 100.00 100.00 100.00 38.46  0.25 45.27 95.90 98.33 99.47 99.85  99.92  99.95 /
nvme1n1 30206.00 117.99  67.74  67.74  29.44  67.29 87.90  0.35 47.82 99.22 99.98 99.99 99.99 100.00 100.00 /test1
nvme1n1 14370.00  56.13  32.23  32.23  70.55  32.69 17.82  0.03 39.89 88.92 94.88 98.37 99.53  99.77  99.85 /test2

* The root block cgroup "/" shows the IO statistics for the whole SSD.

* test1 uses 67% of the disk's IOPS and bandwidth.

* %dtm stands for on-disk time; the test1 cgroup gets 67% of the whole
	disk's time, so test1 gets more disk time than test2.

* For test2's %d2c, only 17% of its latency is spent on the hardware
	disk, which means the main latency comes from software: it was
	throttled in software.


Patch 1 and patch 2 are preparation patches.
The last patch implements blk-iotrack.

Weiping Zhang (3):
  update the real issue size when bio_split
  bio: track timestamp of submitting bio the disk driver
  blkcg: add blk-iotrack

 block/Kconfig              |   6 +
 block/Makefile             |   1 +
 block/bio.c                |  13 ++
 block/blk-cgroup.c         |   4 +
 block/blk-iotrack.c        | 436 +++++++++++++++++++++++++++++++++++++
 block/blk-mq.c             |   3 +
 block/blk-rq-qos.h         |   3 +
 block/blk.h                |   7 +
 include/linux/blk-cgroup.h |   6 +
 include/linux/blk_types.h  |  38 ++++
 10 files changed, 517 insertions(+)
 create mode 100644 block/blk-iotrack.c

-- 
2.18.1


WARNING: multiple messages have this Message-ID
From: Weiping Zhang <zhangweiping@didiglobal.com>
To: <axboe@kernel.dk>, <tj@kernel.org>
Cc: <linux-block@vger.kernel.org>, <cgroups@vger.kernel.org>
Subject: [RFC PATCH v2 0/3] blkcg: add blk-iotrack
Date: Fri, 27 Mar 2020 14:27:13 +0800
Message-ID: <cover.1584728740.git.zhangweiping@didiglobal.com>

Hi all,

This patchset adds a monitor-only module, blk-iotrack, for the block
cgroup.

It contains the kernel-space blk-iotrack and the user-space tool
iotrack; you can also write your own tool to do further data analysis.

blk-iotrack is designed to track various IO statistics of a block
cgroup and is built on the rq_qos framework. It only tracks IO and
does not do any throttling.

Compared to blk-iolatency, it provides 8 configurable latency buckets
via /sys/fs/cgroup/io.iotrack.lat_thresh; blk-iotrack accounts the
number of IOs whose latency is less than the corresponding threshold.
In this way we can get the cgroup's latency distribution. The default
latency buckets are 50us, 100us, 200us, 400us, 1ms, 2ms, 4ms and 8ms.

Compared to io.stat.{rbytes,wbytes,rios,wios,dbytes,dios}, it accounts
IOs when they complete instead of when they are submitted. If an IO is
throttled by the IO scheduler or another throttling policy, there is a
gap between the two: such IOs have not completed yet.

Patch 2 records a timestamp for each bio when it is issued to the
disk driver, so we can get the disk latency in rq_qos_done_bio; this
is also called the D2C time. In rq_qos_done_bio, blk-iotrack also
records the total latency (now - bio_issue_time), which can be treated
as the Q2C time. In this way, we can get the percentage
%d2c = D2C/Q2C for each cgroup. It is very useful for detecting
whether the main latency comes from the disk or from software, e.g.
the IO scheduler or another block cgroup throttling policy.

The user-space tool, called iotrack, collects these basic IO
statistics and then generates more valuable metrics at the cgroup
level. From iotrack, you can get a cgroup's percentage of the whole
disk's IOs (%io), bytes (%byte), total_time and disk_time, which makes
it easy to evaluate the real weight of a weight-based policy (bfq,
blk-iocost). Besides the basic io/s and MB/s, iotrack also shows %io
and many other read and write metrics; for more details, please visit:
https://github.com/dublio/iotrack.

Test results for two fio jobs doing 4K random reads:
test1 cgroup bfq weight = 800
test2 cgroup bfq weight = 100
numjobs=8, iodepth=32

Device   rrqm/s wrqm/s r/s    w/s    rMB/s  wMB/s  avgrqkb  avgqu-sz await    r_await  w_await  svctm    %util    conc
nvme1n1  0      0      217341 0      848.98 0.00   4.00     475.03   2.28     2.28     0.00     0.00     100.20   474.08

Device   io/s   MB/s   %io    %byte  %tm    %dtm   %d2c ad2c aq2c   %hit0 %hit1 %hit2 %hit3 %hit4 %hit5 %hit6 %hit7 cgroup
nvme1n1  217345 849.00 100.00 100.00 100.00 100.00 4.09 0.09 2.28   23.97 62.43 89.88 98.44 99.88 99.88 99.88 99.88 /
nvme1n1  193183 754.62 88.88  88.88  45.91  84.54  7.52 0.09 1.18   26.85 64.87 90.71 98.40 99.88 99.88 99.88 99.88 /test1
nvme1n1  24235  94.67  11.15  11.15  54.09  15.48  1.17 0.13 11.06  0.98  43.00 83.31 98.77 99.87 99.87 99.87 99.87 /test2

* The root block cgroup "/" shows the IO statistics for the whole SSD.

* test1 uses 88% of the disk's IOPS and bandwidth.

* %dtm stands for on-disk time; the test1 cgroup gets 85% of the whole
	disk's time, so test1 gets more disk time than test2.

* For test2's %d2c, only 1.17% of its latency is spent on the hardware
	disk, which means the main latency comes from software: it was
	throttled in software.

* aq2c: average Q2C time in ms; test2's aq2c (11.06 ms) is much higher
	than test1's (1.18 ms). See the consistency check after this list.

* For the latency distribution, %hit1 counts IOs finishing within
	100us: 64% of test1's IOs versus 43% of test2's, so test1's
	latency is better than test2's.
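
As a consistency check, %d2c matches ad2c/aq2c within rounding of the
displayed averages: 0.09/2.28 = 3.9% for "/" (%d2c 4.09), 0.09/1.18 =
7.6% for test1 (%d2c 7.52), and 0.13/11.06 = 1.2% for test2 (%d2c 1.17).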

For more detail test report, please visit:
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test

Patch 1 and patch 2 are preparation patches.
The last patch implements blk-iotrack.

Changes since v1:
* Fix bio issue_size when splitting a bio; the v1 patch would clear
  issue_time.

Weiping Zhang (3):
  update the real issue size when bio_split
  bio: track timestamp of submitting bio the disk driver
  blkcg: add blk-iotrack

 block/Kconfig              |   6 +
 block/Makefile             |   1 +
 block/bio.c                |  13 ++
 block/blk-cgroup.c         |   4 +
 block/blk-iotrack.c        | 436 +++++++++++++++++++++++++++++++++++++
 block/blk-mq.c             |   3 +
 block/blk-rq-qos.h         |   3 +
 block/blk.h                |   7 +
 include/linux/blk-cgroup.h |   6 +
 include/linux/blk_types.h  |  38 ++++
 10 files changed, 517 insertions(+)
 create mode 100644 block/blk-iotrack.c

-- 
2.18.1



Thread overview: 13+ messages
2020-03-21  1:20 Weiping Zhang [this message]
2020-03-21  1:21 ` [RFC 1/3] update the real issue size when bio_split Weiping Zhang
2020-03-21  1:21 ` [RFC 2/3] bio: track timestamp of submitting bio the disk driver Weiping Zhang
2020-03-21  1:21 ` [RFC 3/3] blkcg: add blk-iotrack Weiping Zhang
2020-03-24 18:27 ` [RFC 0/3] " Tejun Heo
2020-03-25 12:49   ` Weiping Zhang
2020-03-25 14:12     ` Tejun Heo
2020-03-25 16:45       ` Weiping Zhang
2020-03-26 15:08         ` Weiping Zhang
2020-03-26 16:14           ` Tejun Heo
2020-03-26 16:27             ` Weiping Zhang
2020-03-31 14:19               ` Tejun Heo
2020-03-27  6:27 ` [RFC PATCH v2 " Weiping Zhang
