From: Weiping Zhang <zwp10758@gmail.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Bart Van Assche <bvanassche@acm.org>,
linux-nvme@lists.infradead.org, Ming Lei <ming.lei@redhat.com>,
linux-block@vger.kernel.org, Minwoo Im <minwoo.im.dev@gmail.com>,
cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
"Nadolski, Edmund" <edmund.nadolski@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme
Date: Tue, 31 Mar 2020 14:17:06 +0800 [thread overview]
Message-ID: <CAA70yB62_6JD_8dJTGPjnjJfyJSa1xqiCVwwNYtsTCUXQR5uCA@mail.gmail.com> (raw)
In-Reply-To: <CAA70yB5qAj8YnNiPVD5zmPrrTr0A0F3v2cC6t2S1Fb0kiECLfw@mail.gmail.com>
> > On the driver implementation, the number of module parameters being
> > added here is problematic. We already have 2 special classes of queues,
> > and defining this at the module level is considered too coarse when
> > the system has different devices on opposite ends of the capability
> > spectrum. For example, users want polled queues for the fast devices,
> > and none for the slower tier. We just don't have a good mechanism to
> > define per-controller resources, and more queue classes will make this
> > problem worse.
> >
> We can add a new "string" module parameter that contains a model number;
> in most cases, the same product line shares a common model-number prefix,
> so nvme can distinguish devices with different performance (high or low
> end). Before creating the IO queues, the nvme driver can read the
> device's Model Number (40 bytes) and compare it with the module
> parameter to decide how many IO queues to allocate for each disk:
>
> /* if model_number is MODEL_ANY, these parameters will be applied to
>  * all nvme devices. */
> char dev_io_queues[1024] =
>         "model_number=MODEL_ANY,poll=0,read=0,wrr_low=0,wrr_medium=0,wrr_high=0,wrr_urgent=0";
>
> /* these parameters only affect nvme disks whose model number is "XXX" */
> char dev_io_queues[1024] =
>         "model_number=XXX,poll=1,read=2,wrr_low=3,wrr_medium=4,wrr_high=5,wrr_urgent=0";
>
> struct dev_io_queues {
>         char model_number[40];
>         unsigned int poll;
>         unsigned int read;
>         unsigned int wrr_low;
>         unsigned int wrr_medium;
>         unsigned int wrr_high;
>         unsigned int wrr_urgent;
> };
>
> We can use these two variables to store the IO queue configurations:
>
> /* default values for all disks except those whose model number is
>  * listed in io_queues_cfg */
> struct dev_io_queues io_queues_def = {};
>
> /* user-defined values for a specific model number */
> struct dev_io_queues io_queues_cfg = {};
>
> If we need multiple configurations (> 2), we can also extend
> dev_io_queues to support that.
>
Hi Maintainers,

If we add a patch to support these queue counts at the controller level
instead of the module level, shall we add WRR?

Recently I did some cgroup io weight testing:
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test

I think a proper io weight policy should consider the high-weight
cgroup's IOPS and latency while also taking the whole disk's throughput
into account; that is to say, the policy should trade off carefully
between a cgroup's IO performance and the whole disk's throughput. I
know one policy cannot do everything perfectly, but the test results
below show that nvme-wrr works well for both the cgroup's latency and
IOPS and for the whole disk's throughput.
Notes:
blk-iocost: only qos model was set, no percentile latency target.
nvme-wrr: weights were set by:

h=64; m=32; l=8; ab=0
nvme set-feature /dev/nvme1n1 -f 1 -v $(printf "0x%x\n" $(($ab<<0 | $l<<8 | $m<<16 | $h<<24)))
echo "$major:$minor high" > /sys/fs/cgroup/test1/io.wrr
echo "$major:$minor low" > /sys/fs/cgroup/test2/io.wrr
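For reference, the set-feature value above packs the four fields of the NVMe Arbitration feature (Feature ID 01h) into one dword: Arbitration Burst in bits 2:0 and the low/medium/high priority weights in bits 15:8, 23:16, and 31:24 (the spec defines the weights as 0's based, i.e. weight+1 commands per round). A small sketch of that packing (the helper name nvme_arb_dword is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Pack the NVMe Arbitration feature (Feature ID 01h) dword:
 * bits 2:0   Arbitration Burst (2^n commands; 111b = no limit)
 * bits 15:8  Low Priority Weight
 * bits 23:16 Medium Priority Weight
 * bits 31:24 High Priority Weight
 */
static uint32_t nvme_arb_dword(uint32_t hpw, uint32_t mpw,
			       uint32_t lpw, uint32_t ab)
{
	return (ab << 0) | (lpw << 8) | (mpw << 16) | (hpw << 24);
}
```

With h=64, m=32, l=8, ab=0 as in the shell snippet, this yields 0x40200800, the same value passed to `nvme set-feature -f 1 -v`.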
Randread vs Randread:
cgroup.test1.weight : cgroup.test2.weight = 8 : 1
high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
low weight cgroup test2: randread, fio: numjobs=8, iodepth=32, bs=4K

test case         bw       iops   rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
================================================================================
bfq_test1       767226   191806      1333.30        0.00      536.00        0.00
bfq_test2        94607    23651     10816.06        0.00      610.00        0.00
iocost_test1   1457718   364429       701.76        0.00     1630.00        0.00
iocost_test2   1466337   366584       697.62        0.00     1613.00        0.00
none_test1     1456585   364146       702.22        0.00     1646.00        0.00
none_test2     1463090   365772       699.12        0.00     1613.00        0.00
wrr_test1      2635391   658847       387.94        0.00     1236.00        0.00
wrr_test2       365428    91357      2801.00        0.00     5537.00        0.00
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#215-summary-fio-output
Randread vs Seq Write:
cgroup.test1.weight : cgroup.test2.weight = 8 : 1
high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
low weight cgroup test2: seq write, fio: numjobs=1, iodepth=32, bs=256K

test case         bw       iops   rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
================================================================================
bfq_test1       814327   203581      1256.19        0.00      593.00        0.00
bfq_test2       104758      409         0.00    78196.32        0.00  1052770.00
iocost_test1    270467    67616      3784.02        0.00     9371.00        0.00
iocost_test2   1541575     6021         0.00     5313.02        0.00     6848.00
none_test1      271708    67927      3767.01        0.00     9502.00        0.00
none_test2     1541951     6023         0.00     5311.50        0.00     6848.00
wrr_test1       775005   193751      1320.17        0.00     4112.00        0.00
wrr_test2      1198319     4680         0.00     6835.30        0.00     8847.00
https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#225-summary-fio-output
Thanks
Weiping
2020-02-04 3:30 [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 1/4] block: add weighted round robin for blkcgroup Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 2/4] nvme: add get_ams for nvme_ctrl_ops Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 3/4] nvme-pci: rename module parameter write_queues to read_queues Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 4/4] nvme: add support weighted round robin queue Weiping Zhang
2020-02-04 15:42 ` [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Keith Busch
2020-02-16 8:09 ` Weiping Zhang
2020-03-31 6:17 ` Weiping Zhang [this message]
2020-03-31 10:29 ` Paolo Valente
2020-03-31 14:36 ` Tejun Heo
2020-03-31 15:47 ` Weiping Zhang
2020-03-31 15:51 ` Tejun Heo
2020-03-31 15:52 ` Christoph Hellwig
2020-03-31 15:54 ` Tejun Heo
2020-03-31 16:31 ` Weiping Zhang
2020-03-31 16:33 ` Christoph Hellwig
2020-03-31 16:52 ` Weiping Zhang