Linux-Block Archive on lore.kernel.org
From: Tejun Heo <tj@kernel.org>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: hannes@cmpxchg.org, clm@fb.com, dennisz@fb.com,
	Josef Bacik <jbacik@fb.com>,
	kernel-team@fb.com, newella@fb.com, lizefan@huawei.com,
	axboe@kernel.dk, Paolo Valente <paolo.valente@linaro.org>,
	Rik van Riel <riel@surriel.com>,
	josef@toxicpanda.com, cgroups@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/10] blkcg: implement blk-iocost
Date: Mon, 14 Oct 2019 08:36:43 -0700
Message-ID: <20191014153643.GD18794@devbig004.ftw2.facebook.com> (raw)
In-Reply-To: <20191009153629.GA5400@blackbody.suse.cz>

Hello,

On Wed, Oct 09, 2019 at 05:36:29PM +0200, Michal Koutný wrote:
> Because I'm not fully convinced using the root cgroup for the latter is
> a good idea and I don't have a better one (what about
> /sys/kernel/cgroup/?), I'd like to question the former to potentially
> postpone finding the place for its parameters :-)

Yeah, I mean, I don't know.  If these params were useful outside the
iocost controller itself, under /sys/block would be a better place, but
they're kind of tightly tied to vrate.  We could likely talk on the
subject for a really long time, probably because there's no clearly
technically better choice here, so...

> On Wed, Aug 28, 2019 at 03:05:58PM -0700, Tejun Heo <tj@kernel.org> wrote:
> > [...]
> > Please see the top comment in blk-iocost.c and documentation for
> > more details.
> I admit I didn't grasp the explanations in the cgroup-v2.rst, perhaps
> some of the explanations from blk-iocost.c would be useful there as
> well.
> 
> IIUC, the controls are supposed to be abstracted and generic to express
> high-level ideas and be independent of particular details.
> Here a bunch of parameters is introduced whose tuning may become a
> complex optimization task.
> 
> What is the metric that is the QoS controller striving to guarantee?
> How does it differ from the io.latency policy?

Yeah, it's kinda unfortunate that it requires this many parameters but
at least my opinion is that that's reflecting the inherent
complexities of the underlying devices and how workloads interact with
them.  Andy knows and can explain this a lot better than me but here's
what we're working on:

For the cost model, the plan is to build a database of device
model-specific cost parameters which are loaded during boot.  The cost
model parameters are pretty straightforward to determine, so hopefully
this won't be too difficult.

For QoS parameters, Andy is currently working on a method to determine
the set of parameters which are at the edge of the total work cliff,
i.e. the point where tightening QoS params further starts to
significantly reduce the total amount of work the device can do.  These
would be the neutral parameters to use for a given device unless there
are overriding latency requirements, so it's likely that they can be
part of the model-specific parameter set.
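
To make the "edge of the total work cliff" idea concrete, here is a
minimal sketch.  It is not Andy's method, which isn't described in this
thread.  It assumes we already have measurements of total work at
several QoS latency targets; find_cliff_edge(), the sample format and
the 5% tolerance are all hypothetical names and values chosen for
illustration.

```python
def find_cliff_edge(samples, tolerance=0.05):
    """Return the tightest latency target before total work falls off.

    samples:   list of (latency_target_us, total_work) pairs (hypothetical
               measurements; smaller target == tighter QoS).
    tolerance: fraction of the best observed work we are willing to lose
               (an assumed threshold, not something from the patch).
    """
    # Walk from the loosest setting toward the tightest one.
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)
    best = max(work for _, work in ordered)
    edge = None
    for target, work in ordered:
        if work < best * (1 - tolerance):
            break  # we have stepped over the cliff
        edge = target
    return edge


# Made-up numbers: work stays flat down to 2500us, then drops sharply.
measurements = [(10000, 1000), (5000, 990), (2500, 960), (1000, 700), (500, 300)]
edge = find_cliff_edge(measurements)
```

With the made-up numbers above, the loop stops at the 1000us sample, so
the 2500us target is reported as the edge.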

We're currently deploying the controller to a lot of machines and
gathering data to verify model accuracy and controller behavior.
It's working pretty well already and once the methods become more
mature, we'll upstream them (to whichever projects they end up
belonging to).


> > [...] 
> > + * 2-2. Vrate Adjustment
> > + * [...] When this delay becomes noticeable, it's a clear
> > + * indication that the device is saturated and we lower the vrate.  This
> > + * saturation signal is fairly conservative as it only triggers when both
> > + * hardware and software queues are filled up, and is used as the default
> > + * busy signal.
> (The following paragraph is based only on a naïve understanding of the
> block layer.) So the device's vrate is lowered, causing its vtime to
> grow more slowly, i.e. postponing the issuing of IO for all cgroups
> accessing the device. But what's the purpose of this? If the queues fill
> up, wouldn't all be naturally pushed back by the longer queue time
> anyway? And wouldn't slowing down the device's vtime just cause queueing
> elsewhere?
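
For readers following the quoted question, here is a minimal sketch of
the vtime mechanics it describes.  It is a simplification and not the
code in blk-iocost.c: cgroup weights, margins and the vrate feedback
loop are left out, and Device, Group and issued_over() are hypothetical
names.

```python
class Device:
    """Per-device virtual clock; vnow advances at vrate relative to wall time."""

    def __init__(self, vrate=1.0):
        self.vrate = vrate
        self.vnow = 0.0

    def advance(self, wall_dt):
        self.vnow += wall_dt * self.vrate


class Group:
    """Per-cgroup budget tracker (weights deliberately omitted)."""

    def __init__(self):
        self.vtime = 0.0

    def try_issue(self, dev, cost):
        # An IO may go out only if charging its cost keeps the group's
        # vtime at or behind the device's virtual clock.
        if self.vtime + cost > dev.vnow:
            return False
        self.vtime += cost
        return True


def issued_over(ticks, vrate, cost=1.0):
    """Count IOs one always-busy group gets to issue over `ticks` wall ticks."""
    dev, grp = Device(vrate), Group()
    issued = 0
    for _ in range(ticks):
        dev.advance(1.0)
        while grp.try_issue(dev, cost):
            issued += 1
    return issued


full = issued_over(100, vrate=1.0)
half = issued_over(100, vrate=0.5)
```

In this toy model, halving vrate halves the number of IOs the group can
issue over the same wall time, so the throttling happens before the
requests ever reach the device queues.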

Nothing can issue IOs indefinitely without some of them completing, and
the total amount of work a workload can do is tied to the completion
latencies.  Most IO devices have a queue depth which is at some level
reasonable given the performance characteristics of the device;
otherwise, the device would develop a really fat pipe in it which can
frustrate various use cases.  On top of that, the block layer adds some
limited amount of queueing to avoid command bubbles (2x qd, usually),
so, while definitely not stringent in any way, the queueing is already
regulated so that things don't get too crazy.
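
As a back-of-the-envelope illustration of why bounded queueing also
bounds latency, Little's law gives the time a request spends queued as
outstanding requests divided by throughput.  The sketch below assumes
the total number of outstanding requests is capped at about 2x the
device queue depth, which is one reading of the "2x qd" remark above;
worst_case_wait_ms() and the device numbers are hypothetical.

```python
def worst_case_wait_ms(queue_depth, iops, sw_factor=2):
    """Rough upper bound on queueing delay via Little's law (W = L / throughput).

    queue_depth: device queue depth (assumed value)
    iops:        sustained completions per second (assumed value)
    sw_factor:   assumed cap on total outstanding requests, as a multiple
                 of queue_depth
    """
    max_outstanding = sw_factor * queue_depth
    return max_outstanding / iops * 1000.0


# Made-up devices: a fast SSD and a spinning disk, both with qd=32.
ssd_ms = worst_case_wait_ms(queue_depth=32, iops=50000)
hdd_ms = worst_case_wait_ms(queue_depth=32, iops=200)
```

With these made-up numbers the SSD bound is on the order of a
millisecond while the disk bound is a few hundred milliseconds, which
is the kind of gap that makes qd-based regulation adequate for some
workloads and not for latency-sensitive ones.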

Regulating based on qd may not be enough for latency-sensitive
synchronous workloads; however, for a lot of workloads which have
in-kernel windowing mechanisms, such as reading file contents or copying
them, it can provide a reasonable level of protection to keep the
windowing mechanisms effective without sacrificing a noticeable amount
of total work.

Thanks.

-- 
tejun
