From: Paolo Valente <paolo.valente@linaro.org>
To: Weiping Zhang <zwp10758@gmail.com>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Tejun Heo <tj@kernel.org>, Christoph Hellwig <hch@lst.de>,
Bart Van Assche <bvanassche@acm.org>,
Minwoo Im <minwoo.im.dev@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ming Lei <ming.lei@redhat.com>,
"Nadolski, Edmund" <edmund.nadolski@intel.com>,
linux-block@vger.kernel.org, cgroups@vger.kernel.org,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme
Date: Tue, 31 Mar 2020 12:29:05 +0200 [thread overview]
Message-ID: <7CD57B83-F067-4918-878C-BAC413C6A2B3@linaro.org> (raw)
In-Reply-To: <CAA70yB62_6JD_8dJTGPjnjJfyJSa1xqiCVwwNYtsTCUXQR5uCA@mail.gmail.com>
> On 31 Mar 2020, at 08:17, Weiping Zhang <zwp10758@gmail.com> wrote:
>
>>> On the driver implementation, the number of module parameters being
>>> added here is problematic. We already have 2 special classes of queues,
>>> and defining this at the module level is considered too coarse when
>>> the system has different devices on opposite ends of the capability
>>> spectrum. For example, users want polled queues for the fast devices,
>>> and none for the slower tier. We just don't have a good mechanism to
>>> define per-controller resources, and more queue classes will make this
>>> problem worse.
>>>
>> We can add a new "string" module parameter that contains a model
>> number; in most cases, products in the same line share a common
>> model-number prefix, so nvme can use it to distinguish devices of
>> different performance classes (high end or low end).
>> Before creating the I/O queues, the nvme driver can read the device's
>> Model Number (40 bytes) and compare it with the module parameter to
>> decide how many I/O queues to allocate for each disk:
>>
>> /* If model_number is MODEL_ANY, these parameters apply to all
>>  * nvme devices. */
>> char dev_io_queues[1024] = "model_number=MODEL_ANY,poll=0,read=0,wrr_low=0,wrr_medium=0,wrr_high=0,wrr_urgent=0";
>>
>> /* These parameters only affect nvme disks whose model number is "XXX". */
>> char dev_io_queues[1024] = "model_number=XXX,poll=1,read=2,wrr_low=3,wrr_medium=4,wrr_high=5,wrr_urgent=0";
>>
>> struct dev_io_queues {
>> char model_number[40];
>> unsigned int poll;
>> unsigned int read;
>> unsigned int wrr_low;
>> unsigned int wrr_medium;
>> unsigned int wrr_high;
>> unsigned int wrr_urgent;
>> };
>>
>> We can use these two variables to store I/O queue configurations:
>>
>> /* Default values for all disks whose model number is not listed
>>  * in io_queues_cfg. */
>> struct dev_io_queues io_queues_def = {};
>>
>> /* User-defined values for a specific model number. */
>> struct dev_io_queues io_queues_cfg = {};
>>
>> If we need multiple configurations (> 2), we can also extend
>> dev_io_queues to support them.
>>
>
> Hi Maintainers,
>
> If we add a patch to configure these queue counts at the controller
> level instead of the module level, shall we add WRR?
>
> Recently I did some cgroup I/O weight testing:
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test
> I think a proper I/O weight policy should consider the high-weight
> cgroup's IOPS and latency while also taking the whole disk's
> throughput into account; that is, the policy should make a careful
> trade-off between a cgroup's I/O performance and the whole disk's
> throughput. I know one policy cannot do everything perfectly, but
> from the test results nvme-wrr works well.
>
> From the following test results, nvme-wrr does well on both the
> cgroups' latency and IOPS and the whole disk's throughput.
>
> Notes:
> blk-iocost: only the QoS model was set, not the percentile latency targets.
> nvme-wrr: weights were set by:
> h=64; m=32; l=8; ab=0
> nvme set-feature /dev/nvme1n1 -f 1 -v $(printf "0x%x\n" $(($ab<<0 | $l<<8 | $m<<16 | $h<<24)))
> echo "$major:$minor high" > /sys/fs/cgroup/test1/io.wrr
> echo "$major:$minor low" > /sys/fs/cgroup/test2/io.wrr
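[The feature value being set above packs the arbitration burst and the three priority weights into a single 32-bit dword, per the NVMe Arbitration feature (Feature ID 01h): AB in bits 2:0, and the low/medium/high priority weights in bytes 1 through 3. A small helper makes that layout explicit; the function name is just for illustration:]

```c
#include <stdint.h>

/* Pack the NVMe Arbitration feature (Feature ID 01h) dword:
 * AB in bits 2:0, LPW in bits 15:8, MPW in bits 23:16,
 * HPW in bits 31:24. */
static uint32_t nvme_arb_feature(uint32_t ab, uint32_t low,
				 uint32_t medium, uint32_t high)
{
	return (ab & 0x7) |
	       ((low & 0xff) << 8) |
	       ((medium & 0xff) << 16) |
	       ((high & 0xff) << 24);
}
```

With the values used above (h=64, m=32, l=8, ab=0) this yields 0x40200800, matching the shell arithmetic passed to nvme set-feature.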
>
>
> Randread vs Randread:
> cgroup.test1.weight : cgroup.test2.weight = 8 : 1
> high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
> low weight cgroup test2: randread, fio: numjobs=8, iodepth=32, bs=4K
>
> test case       bw        iops    rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
> ================================================================================
> bfq_test1        767226   191806     1333.30        0.00      536.00        0.00
> bfq_test2         94607    23651    10816.06        0.00      610.00        0.00
> iocost_test1    1457718   364429      701.76        0.00     1630.00        0.00
> iocost_test2    1466337   366584      697.62        0.00     1613.00        0.00
> none_test1      1456585   364146      702.22        0.00     1646.00        0.00
> none_test2      1463090   365772      699.12        0.00     1613.00        0.00
> wrr_test1       2635391   658847      387.94        0.00     1236.00        0.00
> wrr_test2        365428    91357     2801.00        0.00     5537.00        0.00
>
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#215-summary-fio-output
>
>
Glad to see that BFQ meets the configured weights. Sad to see how much
it suffers in terms of IOPS on your system.
Good job with your scheduler!
However, as for I/O control, the hard-to-control cases are not the
ones with constantly full, deep queues. BFQ's complexity stems from the
need to also control the tough cases. An example is sync I/O with an
I/O depth of one competing against async I/O. On the other hand, those
use cases may not be of interest for your scheduler.
Thanks,
Paolo
> Randread vs Seq Write:
> cgroup.test1.weight : cgroup.test2.weight = 8 : 1
> high weight cgroup test1: randread, fio: numjobs=8, iodepth=32, bs=4K
> low weight cgroup test2: seq write, fio: numjobs=1, iodepth=32, bs=256K
>
> test case       bw        iops    rd_avg_lat  wr_avg_lat  rd_p99_lat  wr_p99_lat
> ================================================================================
> bfq_test1        814327   203581     1256.19        0.00      593.00        0.00
> bfq_test2        104758      409        0.00    78196.32        0.00  1052770.00
> iocost_test1     270467    67616     3784.02        0.00     9371.00        0.00
> iocost_test2    1541575     6021        0.00     5313.02        0.00     6848.00
> none_test1       271708    67927     3767.01        0.00     9502.00        0.00
> none_test2      1541951     6023        0.00     5311.50        0.00     6848.00
> wrr_test1        775005   193751     1320.17        0.00     4112.00        0.00
> wrr_test2       1198319     4680        0.00     6835.30        0.00     8847.00
>
>
> https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test#225-summary-fio-output
>
> Thanks
> Weiping
Thread overview: 17+ messages
2020-02-04 3:30 [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 1/4] block: add weighted round robin for blkcgroup Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 2/4] nvme: add get_ams for nvme_ctrl_ops Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 3/4] nvme-pci: rename module parameter write_queues to read_queues Weiping Zhang
2020-02-04 3:31 ` [PATCH v5 4/4] nvme: add support weighted round robin queue Weiping Zhang
2020-02-04 15:42 ` [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Keith Busch
2020-02-16 8:09 ` Weiping Zhang
2020-03-31 6:17 ` Weiping Zhang
2020-03-31 10:29 ` Paolo Valente [this message]
2020-03-31 14:36 ` Tejun Heo
2020-03-31 15:47 ` Weiping Zhang
2020-03-31 15:51 ` Tejun Heo
2020-03-31 15:52 ` Christoph Hellwig
2020-03-31 15:54 ` Tejun Heo
2020-03-31 16:31 ` Weiping Zhang
2020-03-31 16:33 ` Christoph Hellwig
2020-03-31 16:52 ` Weiping Zhang