From: Weiping Zhang <zwp10758@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Minwoo Im <minwoo.im.dev@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ming Lei <ming.lei@redhat.com>,
	"Nadolski, Edmund" <edmund.nadolski@intel.com>,
	linux-block@vger.kernel.org, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme
Date: Tue, 31 Mar 2020 23:47:41 +0800
Message-ID: <CAA70yB51=VQrL+2wC+DL8cYmGVACb2_w5UHc4XFn7MgZjUJaeg@mail.gmail.com>
In-Reply-To: <20200331143635.GS162390@mtj.duckdns.org>

On Tue, Mar 31, 2020 at 10:36 PM, Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Weiping.
>
> On Tue, Mar 31, 2020 at 02:17:06PM +0800, Weiping Zhang wrote:
> > Recently I did some cgroup io weight testing:
> > https://github.com/dublio/iotrack/wiki/cgroup-io-weight-test
> > I think a proper io weight policy should consider a high-weight cgroup's
> > iops and latency and also take the whole disk's throughput into account;
> > that is to say, the policy should trade off more carefully between a
> > cgroup's IO performance and the whole disk's throughput. I know one policy
> > cannot do all things perfectly, but from the test results nvme-wrr works well.
>
> That's w/o iocost QoS targets configured, right? iocost should be able to
> achieve similar results as wrr with QoS configured.
>
Yes, I have not set the QoS target.
> > From the following test results, nvme-wrr works well for both the cgroup's
> > latency and iops and the whole disk's throughput.
>
> As I wrote before, the issues I see with wrr are the following:
>
> * Hardware dependent. Some will work ok or even fantastic. Many others will do
>   horribly.
>
> * Lack of configuration granularity. We can't configure it granular enough to
>   serve hierarchical configuration.
>
> * Likely not a huge problem with the deep QD of nvmes but lack of queue depth
>   control can lead to loss of latency control and thus loss of protection for
>   low concurrency workloads when pitched against workloads which can saturate
>   QD.
>
> All that said, given the feature is available, I don't see any reason not to
> allow its use, but I don't think it fits the cgroup interface model given the
> hardware dependency and coarse granularity. For these cases, I think the right
> thing to do is to use cgroups to provide tagging information - i.e. build a
> dedicated interface which takes a cgroup fd or ino as the tag and associate
> configurations that way. There already are other use cases which use cgroups
> this way (e.g. perf).
>
Do you mean dropping the "io.wrr" / "blkio.wrr" files from cgroup, and using a
dedicated interface like /dev/xxx or /proc/xxx instead?

I see the perf code does:

	/* perf resolves the cgroup from a directory fd passed in by userspace */
	struct fd f = fdget(fd);
	struct cgroup_subsys_state *css =
		css_tryget_online_from_dir(f.file->f_path.dentry,
					   &perf_event_cgrp_subsys);

It looks like the same approach can be applied to the block cgroup.
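A rough, untested sketch of how that could look for the io controller, assuming
a dedicated configuration entry point that takes a cgroup directory fd:
css_tryget_online_from_dir(), fdget()/fdput(), css_put(), io_cgrp_subsys and
css_to_blkcg() are existing kernel symbols, while wrr_set_weight_by_cgroup_fd()
and nvme_wrr_set_weight() are made-up placeholders for the new interface and
the per-policy hook:

static int wrr_set_weight_by_cgroup_fd(int fd, unsigned int weight)
{
	struct cgroup_subsys_state *css;
	struct fd f = fdget(fd);
	int ret;

	if (!f.file)
		return -EBADF;

	/* resolve the io (blkcg) css from the cgroup directory dentry */
	css = css_tryget_online_from_dir(f.file->f_path.dentry,
					 &io_cgrp_subsys);
	if (IS_ERR(css)) {
		ret = PTR_ERR(css);
		goto out;
	}

	/* nvme_wrr_set_weight() stands in for the real weight-setting hook */
	ret = nvme_wrr_set_weight(css_to_blkcg(css), weight);

	css_put(css);
out:
	fdput(f);
	return ret;
}

That would keep the weight configuration out of the cgroupfs files while still
using the cgroup itself as the tag, as you suggested.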

Thanks for your help.

Thread overview: 46+ messages
2020-02-04  3:30 [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Weiping Zhang
2020-02-04  3:31 ` [PATCH v5 1/4] block: add weighted round robin for blkcgroup Weiping Zhang
2020-02-04  3:31 ` [PATCH v5 2/4] nvme: add get_ams for nvme_ctrl_ops Weiping Zhang
2020-02-04  3:31 ` [PATCH v5 3/4] nvme-pci: rename module parameter write_queues to read_queues Weiping Zhang
2020-02-04  3:31 ` [PATCH v5 4/4] nvme: add support weighted round robin queue Weiping Zhang
2020-02-04 15:42 ` [PATCH v5 0/4] Add support Weighted Round Robin for blkcg and nvme Keith Busch
2020-02-16  8:09   ` Weiping Zhang
2020-03-31  6:17     ` Weiping Zhang
2020-03-31 10:29       ` Paolo Valente
2020-03-31 14:36       ` Tejun Heo
2020-03-31 15:47         ` Weiping Zhang [this message]
2020-03-31 15:51           ` Tejun Heo
2020-03-31 15:52             ` Christoph Hellwig
2020-03-31 15:54               ` Tejun Heo
2020-03-31 16:31               ` Weiping Zhang
2020-03-31 16:33                 ` Christoph Hellwig
2020-03-31 16:52                   ` Weiping Zhang
