All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Valente <paolo.valente@unimore.it>
To: Shaohua Li <shli@fb.com>
Cc: Tejun Heo <tj@kernel.org>, Vivek Goyal <vgoyal@redhat.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jens Axboe <axboe@fb.com>,
	Kernel-team@fb.com, jmoyer@redhat.com,
	Mark Brown <broonie@kernel.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Ulf Hansson <ulf.hansson@linaro.org>
Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit
Date: Tue, 4 Oct 2016 19:43:48 +0200	[thread overview]
Message-ID: <EE90477C-FD56-48C1-8A45-97AF39A8BB0C@unimore.it> (raw)
In-Reply-To: <20161004172852.GB73678@anikkar-mbp.local.dhcp.thefacebook.com>


> Il giorno 04 ott 2016, alle ore 19:28, Shaohua Li <shli@fb.com> ha scritto:
> 
> On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote:
>> 
>>> Il giorno 04 ott 2016, alle ore 18:27, Tejun Heo <tj@kernel.org> ha scritto:
>>> 
>>> Hello,
>>> 
>>> On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote:
>>>> Could you please elaborate more on this point?  BFQ uses sectors
>>>> served to measure service, and, on the all the fast devices on which
>>>> we have tested it, it accurately distributes
>>>> bandwidth as desired, redistributes excess bandwidth with any issue,
>>>> and guarantees high responsiveness and low latency at application and
>>>> system level (e.g., ~0 drop rate in video playback, with any background
>>>> workload tested).
>>> 
>>> The same argument as before.  Bandwidth is a very bad measure of IO
>>> resources spent.  For specific use cases (like desktop or whatever),
>>> this can work but not generally.
>>> 
>> 
>> Actually, we have already discussed this point, and IMHO the arguments
>> that (apparently) convinced you that bandwidth is the most relevant
>> service guarantee for I/O in desktops and the like, prove that
>> bandwidth is the most important service guarantee in servers too.
>> 
>> Again, all the examples I can think of seem to confirm it:
>> . file hosting: a good service must guarantee reasonable read/write,
>> i.e., download/upload, speeds to users
>> . file streaming: a good service must guarantee low drop rates, and
>> this can be guaranteed only by guaranteeing bandwidth and latency
>> . web hosting: high bandwidth and low latency needed here too
>> . clouds: high bw and low latency needed to let, e.g., users of VMs
>> enjoy high responsiveness and, for example, reasonable file-copy
>> time
>> ...
>> 
>> To put in yet another way, with packet I/O in, e.g., clouds, there are
>> basically the same issues, and the main goal is again guaranteeing
>> bandwidth and low latency among nodes.
>> 
>> Could you please provide a concrete server example (assuming we still
>> agree about desktops), where I/O bandwidth does not matter while time
>> does?
> 
> I don't think IO bandwidth does not matter. The problem is bandwidth can't
> measure IO cost. For example, you can't say 8k IO costs 2x IO resource than 4k
> IO.
> 

For what goal do you need to be able to say this, once you succeeded
in guaranteeing bandwidth and low latency to each
process/client/group/node/user?

>>>> Could you please suggest me some test to show how sector-based
>>>> guarantees fails?
>>> 
>>> Well, mix 4k random and sequential workloads and try to distribute the
>>> acteual IO resources.
>>> 
>> 
>> 
>> If I'm not mistaken, we have already gone through this example too,
>> and I thought we agreed on what service scheme worked best, again
>> focusing only on desktops.  To make a long story short(er), here is a
>> snippet from one of our last exchanges.
>> 
>> ----------
>> 
>> On Sat, Apr 16, 2016 at 12:08:44AM +0200, Paolo Valente wrote:
>>> Maybe the source of confusion is the fact that a simple sector-based,
>>> proportional share scheduler always distributes total bandwidth
>>> according to weights. The catch is the additional BFQ rule: random
>>> workloads get only time isolation, and are charged for full budgets,
>>> so as to not affect the schedule of quasi-sequential workloads. So,
>>> the correct claim for BFQ is that it distributes total bandwidth
>>> according to weights (only) when all competing workloads are
>>> quasi-sequential. If some workloads are random, then these workloads
>>> are just time scheduled. This does break proportional-share bandwidth
>>> distribution with mixed workloads, but, much more importantly, saves
>>> both total throughput and individual bandwidths of quasi-sequential
>>> workloads.
>>> 
>>> We could then check whether I did succeed in tuning timeouts and
>>> budgets so as to achieve the best tradeoffs. But this is probably a
>>> second-order problem as of now.
> 
> I don't see why random/sequential matters for SSD. what really matters is
> request size and IO depth. Time scheduling is skeptical too, as workloads can
> dispatch all IO within almost 0 time in high queue depth disks.
> 

That's an orthogonal issue.  If what matter is, e.g., size, then it is
enough to replace "sequential I/O" with "large-request I/O".  In case
I have been too vague, here is an example: I mean that, e.g, in an I/O
scheduler you replace the function that computes whether a queue is
seeky based on request distance, with a function based on
request size.  And this is exactly what has been already done, for
example, in CFQ:

	if (blk_queue_nonrot(cfqd->queue))
		cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
	else
		cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);

Thanks,
Paolo

> Thanks,
> Shaohua


--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/






  reply	other threads:[~2016-10-04 17:44 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-03 21:20 [PATCH V3 00/11] block-throttle: add .high limit Shaohua Li
2016-10-03 21:20 ` [PATCH v3 01/11] block-throttle: prepare support multiple limits Shaohua Li
2016-10-03 21:20 ` [PATCH v3 02/11] block-throttle: add .high interface Shaohua Li
2016-10-03 21:20 ` [PATCH v3 03/11] block-throttle: configure bps/iops limit for cgroup in high limit Shaohua Li
2016-10-03 21:20 ` [PATCH v3 04/11] block-throttle: add upgrade logic for LIMIT_HIGH state Shaohua Li
2016-10-03 21:20 ` [PATCH v3 05/11] block-throttle: add downgrade logic Shaohua Li
2016-10-03 21:20 ` [PATCH v3 06/11] blk-throttle: make sure expire time isn't too big Shaohua Li
2016-10-03 21:20 ` [PATCH v3 07/11] blk-throttle: make throtl_slice tunable Shaohua Li
2016-10-03 21:20 ` [PATCH v3 08/11] blk-throttle: detect completed idle cgroup Shaohua Li
2016-10-03 21:20 ` [PATCH v3 09/11] block-throttle: make bandwidth change smooth Shaohua Li
2016-10-03 21:20 ` [PATCH v3 10/11] block-throttle: add a simple idle detection Shaohua Li
2016-10-03 21:20 ` [PATCH v3 11/11] blk-throttle: ignore idle cgroup limit Shaohua Li
2016-10-04 13:28 ` [PATCH V3 00/11] block-throttle: add .high limit Vivek Goyal
2016-10-04 15:56   ` Tejun Heo
2016-10-04 16:22     ` Paolo Valente
2016-10-04 16:27       ` Tejun Heo
2016-10-04 17:01         ` Paolo Valente
2016-10-04 17:28           ` Shaohua Li
2016-10-04 17:43             ` Paolo Valente [this message]
2016-10-04 18:28               ` Shaohua Li
2016-10-04 19:49                 ` Paolo Valente
2016-10-04 18:54               ` Tejun Heo
2016-10-04 19:02                 ` Paolo Valente
2016-10-04 19:14                   ` Tejun Heo
2016-10-04 19:29                     ` Paolo Valente
2016-10-04 20:27                       ` Tejun Heo
2016-10-05 12:37                         ` Paolo Valente
2016-10-05 13:12                           ` Vivek Goyal
2016-10-05 14:04                             ` Paolo Valente
2016-10-05 14:49                           ` Tejun Heo
2016-10-05 18:30                             ` Shaohua Li
2016-10-05 19:08                               ` Shaohua Li
2016-10-05 19:57                                 ` Paolo Valente
2016-10-05 20:36                                   ` Shaohua Li
2016-10-06  7:22                                     ` Paolo Valente
2016-10-05 19:47                               ` Paolo Valente
2016-10-05 20:07                                 ` Paolo Valente
2016-10-05 20:46                                 ` Shaohua Li
2016-10-06  7:58                                   ` Paolo Valente
2016-10-06 13:15                                     ` Paolo Valente
2016-10-06 17:49                                       ` Vivek Goyal
2016-10-06 18:01                                         ` Paolo Valente
2016-10-06 18:32                                           ` Vivek Goyal
2016-10-06 20:51                                             ` Paolo Valente
2016-10-06 19:44                                         ` Mark Brown
2016-10-06 19:57                                     ` Shaohua Li
2016-10-06 22:24                                       ` Paolo Valente
     [not found]                         ` <CACsaVZ+AqSXHTRdpdrQQp6PuynEPeB-5YOyweWsenjvuKsD12w@mail.gmail.com>
2016-10-09  1:15                           ` Fwd: " Kyle Sanderson
2016-10-14 16:40                             ` Tejun Heo
2016-10-14 17:13                               ` Paolo Valente
2016-10-14 18:35                                 ` Tejun Heo
2016-10-16 19:02                                   ` Paolo Valente
2016-10-18  5:15                                     ` Kyle Sanderson
2016-10-06  8:04                     ` Linus Walleij
2016-10-06 11:03                       ` Mark Brown
2016-10-06 11:57                         ` Austin S. Hemmelgarn
2016-10-06 12:50                           ` Paolo Valente
2016-10-06 13:52                             ` Austin S. Hemmelgarn
2016-10-06 15:05                               ` Paolo Valente
2016-10-06 15:10                                 ` Austin S. Hemmelgarn
2016-10-08 10:46                       ` Heinz Diehl
2016-10-04 18:12     ` Vivek Goyal
2016-10-04 18:50       ` Tejun Heo
2016-10-04 18:56         ` Paolo Valente
2016-10-04 17:08   ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EE90477C-FD56-48C1-8A45-97AF39A8BB0C@unimore.it \
    --to=paolo.valente@unimore.it \
    --cc=Kernel-team@fb.com \
    --cc=axboe@fb.com \
    --cc=broonie@kernel.org \
    --cc=jmoyer@redhat.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shli@fb.com \
    --cc=tj@kernel.org \
    --cc=ulf.hansson@linaro.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.