From: Paolo Valente <paolo.valente@unimore.it>
To: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>, Shaohua Li <shli@fb.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jens Axboe <axboe@fb.com>,
	Kernel-team@fb.com, jmoyer@redhat.com,
	Mark Brown <broonie@kernel.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Ulf Hansson <ulf.hansson@linaro.org>
Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit
Date: Tue, 4 Oct 2016 18:22:28 +0200
Message-ID: <A5525664-DF90-4604-B64A-E793BBC0CB6A@unimore.it>
In-Reply-To: <20161004155616.GB4205@htj.duckdns.org>


> On 04 Oct 2016, at 17:56, Tejun Heo <tj@kernel.org> wrote:
> 
> Hello, Vivek.
> 
> On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote:
>> On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote:
>>> Hi,
>>> 
>>> The background is we don't have an ioscheduler for blk-mq yet, so we can't
>>> prioritize processes/cgroups.
>> 
>> So this is an interim solution till we have ioscheduler for blk-mq?
> 
> It's a common permanent solution which applies to both !mq and mq.
> 
>>> This patch set tries to add basic arbitration
>>> between cgroups with blk-throttle. It adds a new limit io.high for
>>> blk-throttle. It's only for cgroup2.
>>> 
>>> io.max is a hard limit throttling. cgroups with a max limit never dispatch more
>>> IO than their max limit. While io.high is a best effort throttling. cgroups
>>> with high limit can run above their high limit at appropriate time.
>>> Specifically, if all cgroups reach their high limit, all cgroups can run above
>>> their high limit. If any cgroup runs under its high limit, all other cgroups
>>> will run according to their high limit.
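
As a rough illustration of the rule described in the quoted paragraph
above (not the patch set's actual code), the upgrade/downgrade decision
can be sketched as follows. The names Group, high_limit, max_limit,
demand and effective_limits are hypothetical, the units are arbitrary,
and the sketch assumes high_limit <= max_limit for every group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    high_limit: float  # best-effort limit (io.high), arbitrary units
    max_limit: float   # hard limit (io.max), same units
    demand: float      # rate the cgroup is currently trying to reach


def effective_limits(groups):
    """Return the limit each group is held to under the quoted rule.

    If every group has reached its high limit, all of them may exceed it
    (only the hard max limit still applies).  If any group is below its
    high limit, every group is held to its high limit.
    """
    all_reached_high = all(g.demand >= g.high_limit for g in groups)
    return {
        g.name: (g.max_limit if all_reached_high else g.high_limit)
        for g in groups
    }


if __name__ == "__main__":
    busy = [Group("a", 100, 200, 150), Group("b", 50, 80, 60)]
    print(effective_limits(busy))
```

Note that this only captures the steady-state rule; how quickly the
state is detected and switched is a separate matter.
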
>> 
>> Hi Shaohua,
>> 
>> I still don't understand why we should not implement a weight based
>> proportional IO mechanism and how this mechanism is better than proportional IO.
> 
> Oh, if we actually can implement proportional IO control, it'd be
> great.  The problem is that we have no way of knowing IO cost for
> highspeed ssd devices.  CFQ gets around the problem by using the
> walltime as the measure of resource usage and scheduling time slices,
> which works fine for rotating disks but horribly for highspeed ssds.
> 

Could you please elaborate on this point?  BFQ measures service in
sectors served and, on all the fast devices on which we have tested
it, it accurately distributes bandwidth as desired, redistributes
excess bandwidth without any issue, and guarantees high responsiveness
and low latency at application and system level (e.g., ~0 drop rate in
video playback, with any background workload tested).
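
To make "sectors served as the measure of service" concrete, here is a
minimal sketch of weighted fair queueing that charges each queue by the
sectors it transfers rather than by wall-clock time.  It is plain
virtual-time accounting under simplifying assumptions (every queue is
always backlogged, requests have a fixed size per queue), not BFQ's
actual scheduler; the function and parameter names are hypothetical.

```python
def simulate(weights, request_sectors, total_requests):
    """Serve requests one at a time, charging service in sectors.

    weights:         queue name -> weight (larger means a bigger share)
    request_sectors: queue name -> sectors per request for that queue
    Returns queue name -> total sectors served.
    """
    vtime = {name: 0.0 for name in weights}   # per-queue virtual time
    served = {name: 0 for name in weights}    # sectors served so far
    for _ in range(total_requests):
        # Pick the queue that has received the least weighted service;
        # ties are broken by name to keep the run deterministic.
        name = min(vtime, key=lambda n: (vtime[n], n))
        sectors = request_sectors[name]
        served[name] += sectors
        # The charge depends only on sectors and weight, not on how long
        # the device took to complete the request.
        vtime[name] += sectors / weights[name]
    return served


if __name__ == "__main__":
    result = simulate({"a": 2, "b": 1}, {"a": 8, "b": 8}, 300)
    print(result, result["a"] / result["b"])
```

In this toy model the sectors served converge to the ratio of the
weights regardless of per-request service time; whether that ratio is
the right notion of fairness on a given device is exactly the question
raised here.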

Could you please suggest some test that would show how sector-based
guarantees fail?

Thanks,
Paolo

> We can get some semblance of proportional control by just counting bw
> or iops but both break down badly as a means to measure the actual
> resource consumption depending on the workload.  While limit based
> control is more tedious to configure, it doesn't misrepresent what's
> going on and is a lot less likely to produce surprising outcomes.
> 
> We *can* try to concoct something which tries to do proportional
> control for highspeed ssds but that's gonna be quite a bit of
> complexity and I'm not so sure it'd be justifiable given that we can't
> even figure out measurement of the most basic operating unit.
> 
>> Agreed that we have issues with proportional IO and we don't have good
>> solutions for these problems. But I can't see how this mechanism
>> will overcome these problems either.
> 
> It mostly defers the burden to the one who's configuring the limits
> and expects it to know the characteristics of the device and workloads
> and configure accordingly.  It's quite a bit more tedious to use but
> should be able to cover good portion of use cases without being overly
> complicated.  I agree that it'd be nice to have a simple proportional
> control but as you said can't see a good solution for it at the
> moment.
> 
>> IIRC, biggest issue with proportional IO was that a low prio group might
>> fill up the device queue with plenty of IO requests and later when high
>> prio cgroup comes, it will still experience latencies anyway. And solution
>> to the problem probably would be to get some awareness in device about 
>> priority of request and map weights to those priority. That way higher
>> prio requests get prioritized.
> 
> Nah, the real problem is that we can't even decide what the
> proportions should be based on.  The most fundamental part is missing.
> 
>> Or run device at lower queue depth. That will improve latencies but might
>> reduce overall throughput.
> 
> And that we can't do this (and thus basically operate close to
> scheduling time slices) for highspeed ssds.
> 
>> Or throttle number of buffered writes (as Jens's writeback throttling
>> patches were doing). Buffered writes seem to be biggest culprit for
>> increased latencies and being able to control these should help.
> 
> That's a different topic.
> 
>> ioprio/weight based proportional IO mechanism is much more generic and
>> much easier to configure for any kind of storage. io.high is absolute
>> limit and makes it much harder to configure. One needs to know a lot
>> about underlying volume/device's bandwidth (which varies a lot anyway
>> based on workload).
> 
> Yeap, no disagreement there, but it still is a workable solution.
> 
>> IMHO, we seem to be trying to cater to one specific use case using
>> this mechanism. Something ioprio/weight based will be much more
>> generic and we should explore implementing that along with building
>> notion of ioprio in devices. When these two work together, we might
>> be able to see good results. Just software mechanism alone might not
>> be enough.
> 
> I don't think it's catering to specific use cases.  It is a generic
> mechanism which demands knowledge and experimentation to configure.
> It's more a way for the kernel to cop out and defer figuring out
> device characteristics to userland.  If you have a better idea, I'm
> all ears.
> 
> Thanks.
> 
> -- 
> tejun


--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/





