linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	Fabio Checconi <fchecconi@gmail.com>,
	Arianna Avanzini <avanzini.arianna@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	broonie@kernel.org
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Date: Tue, 1 Mar 2016 13:46:56 -0500
Message-ID: <20160301184656.GI3965@htj.duckdns.org>
In-Reply-To: <72E81252-203C-4EB7-8459-B9B7060029C6@linaro.org>

Hello, Paolo.

Sorry about the delay.

On Sat, Feb 20, 2016 at 11:23:43AM +0100, Paolo Valente wrote:
> Before replying to your points, I want to stress that I'm not a
> champion of budget-based scheduling at all costs. Budget-based
> scheduling just seems to provide tight bandwidth and latency
> guarantees that are practically impossible to get with time-based
> scheduling. I will try to explain this fact better, and to provide
> also a numerical example, in my replies to your points.

I do like the budget-based scheduling.  It just feels that the budget
is based on the wrong unit.

...
> I think I got your point. In any case, a queue is not punished *after*
> it has consumed an undue amount of the resource, because a queue just
> cannot get to consume an undue amount of the resource. There is a
> timeout: a queue cannot be served for more than a pre-defined maximum
> time slice.
> 
> Even if a queue expires before that timeout, BFQ checks anyway, on the
> expiration of the queue, whether the queue is so slow to not deserve
> accurate service-based guarantees. This is done to achieve additional
> robustness. In fact, if service-based guarantees were provided to a
> very slow queue that, for some reason, never causes the timeout to
> expire, then the queue would happen to be served too often, i.e., to
> get the undue amount of IO resource you mention.

I see.  Once a queue starts timing out its slice, it gets switched to
time based scheduling; however, as you mentioned, workloads which
generate moderate random IOs would still get preferential treatment
over workloads which generate sequential IOs, by definition.
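To make that concern concrete, here is a toy calculation (all rates are
assumed, illustrative values, not measurements) of how equal byte
budgets translate into very unequal shares of device time:

```python
# Two queues receive the same byte budget, but the device serves them
# at very different rates, so their device-time shares diverge.

BUDGET_MB = 8  # identical byte budget for both queues (assumed value)

rates_mb_s = {
    "sequential": 100.0,  # streaming reads, near the device's peak rate
    "random": 5.0,        # seeky reads, heavily rate-limited by seeks
}

# Device time each queue occupies to consume its budget.
time_spent = {name: BUDGET_MB / rate for name, rate in rates_mb_s.items()}

for name, t in time_spent.items():
    print(f"{name}: {t:.2f} s of device time for {BUDGET_MB} MB")

# The seeky queue holds the device 20x longer for the same budget --
# the preferential treatment in device time noted above -- unless the
# slice timeout caps it and switches it to time based service.
```

The slice timeout discussed above bounds the 1.60 s case, but any
seeky queue that stays under the timeout still collects a device-time
advantage proportional to its slowness.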

...
> Your metaphor is clear, but it does not seem to match what I expect
> from my storage device. As a user of my PC, I’m concerned, e.g., about
> how long it takes to copy a large file. I’m not concerned about what
> percentage of time will be guaranteed to my file copy if other
> processes happen to do I/O in parallel. As a similar example, a good

The underlying device is fundamentally incapable of giving guarantees
like that.  The only way to get a (quasi) bandwidth guarantee from a
disk device is either to ensure that the IO is almost completely
sequential or to leave enough buffer in capacity for the expected
seekiness of the IO pattern.
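A back-of-the-envelope model shows why no fixed bandwidth number can be
promised independently of the IO pattern (the seek penalty and transfer
rate below are assumed, typical-looking figures for a rotating disk):

```python
# Rough model: each random request pays an average seek + rotational
# penalty, then transfers at the media's sequential rate.

seek_ms = 8.0          # assumed average seek + rotational latency
seq_rate_mb_s = 120.0  # assumed sequential transfer rate

def effective_bw(request_kb):
    """Achieved bandwidth (MB/s) for a stream of random reads of
    request_kb each."""
    transfer_s = (request_kb / 1024.0) / seq_rate_mb_s
    total_s = seek_ms / 1000.0 + transfer_s
    return (request_kb / 1024.0) / total_s

for kb in (4, 64, 1024):
    print(f"{kb:5d} KB requests -> {effective_bw(kb):6.1f} MB/s")
```

Under these assumptions the same device delivers well under 1 MB/s for
small random reads and over 60 MB/s for large sequential-ish ones --
two orders of magnitude apart, entirely determined by the IO pattern.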

For use cases where the differences in seekiness across workloads are
accidental - e.g. all are trying to stream different files but some
files are more fragmented by accident - using bandwidth as the
resource unit would be helpful in mitigating the random gaps that the
user shouldn't be bothered by, but that'd be focusing on a pretty
narrow set of use cases.

Workloads are varied and the underlying device performs wildly
differently depending on the specific IO pattern.  Because rotating
disks suck so
badly, it's true that there's a lot of wiggle room in what the IO
scheduler can do.  People are accustomed to dealing with random
behaviors.  That said, it still doesn't feel comfortable to use the
obviously wrong unit as the fundamental basis of resource
distribution.

> file-hosting service must probably guarantee reasonable read/write,
> i.e., download/upload, speeds to users (of course, I/O speed matters
> only if the bottleneck is not the network). Most likely, a user of
> such a service does not care (directly) about how much resource-time
> is guaranteed to the I/O related to his/her downloads/uploads.
>
> With a budget-based service scheme, you can easily provide these
> service guarantees accurately. In particular, you can achieve this
> goal even if the bandwidth fluctuates, provided that fluctuations
> depend mostly on I/O patterns. That is, it must hold true that the
> device achieves about the same bandwidth with the same I/O
> pattern. This is exactly the case with both rotational and

So, yes, I see that bandwidth based control would yield a better
result for this specific use case but at the same time this is a very
specialized use case and probably the *only* use case where bandwidth
based distribution makes sense - equivalent logically sequential
workloads where the specifics of IO pattern are completely accidental.
We can't really design the whole thing around that single use case.
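The difference between the two schemes for that file-hosting case can
be sketched in a few lines (equal weights; the per-user rates are
assumed values, with user B's file fragmented so the device runs at
half rate while serving B):

```python
# Two equal-weight users copy files; user B's file is fragmented,
# halving the device rate while B is served (assumed rates).

rate = {"A": 100.0, "B": 50.0}  # MB/s while each user is being served

def time_based(slice_s=0.1, rounds=100):
    # Equal time slices: bytes delivered track each user's own rate.
    return {u: rate[u] * slice_s * rounds for u in rate}

def budget_based(budget_mb=10.0, rounds=100):
    # Equal byte budgets: both users receive the same bytes per round;
    # B simply holds the device longer during each of its turns.
    return {u: budget_mb * rounds for u in rate}

print("time-based  :", time_based())    # A gets 2x the bytes of B
print("budget-based:", budget_based())  # equal bytes for A and B
```

This is the "stick to the line" behavior: budget based distribution
equalizes the delivered bytes, while time based distribution lets each
user's throughput follow the speed of its own access pattern.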

> non-rotational devices. With a time-based scheme, it is impossible to
> provide these service guarantees, if bandwidth fluctuates and
> requirements are minimally stringent. I will try to show it with a
> simple numerical example. Before this numerical example, I would like
> to report a last practical example.

I agree that bandwidth based distribution would behave better for this
use case but think the above paragraph is going a bit too far.
Bandwidth based distribution can stick to the line better, but that
just means that time based scheduling would need a bit more buffer to
achieve the same level of behavioral consistency.  It's not like
bandwidth can actually be guaranteed no matter what we do.

...
> To achieve, with a time-based scheduler, the same precise and stable
> bandwidth distribution as with a budget-based scheduler, the only
> solution is to change weights dynamically, as a function of how the
> throughput achieved by B varies in its time slots. Such a solution
> would be definitely complex, if ever feasible and stable.

Isn't that a use case specifically carved out for bandwidth based
distribution?  Imagine how this would work when e.g. there are mostly
sequential IO workloads and fluctuating random workloads.  Addition of
another sequential workload would behave as expected but addition of
random workloads would cripple everyone to the same level if the
random workloads play their cards right.

One side of the coin is "if I have two parallel file copies, they
proceed at the same speed regardless of how they're distributed across
the disk" and the other is "but if I start an application which
intermittently issues random IOs, my copies take 5x longer".  Isn't
"the two parallel copies mostly keep the same pace, may deviate a bit,
but the addition of random IOs doesn't collapse the whole thing" a
better proposition?
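The collapse scenario above can be put in numbers (all rates and the
budget are assumed, illustrative values): with equal byte budgets,
every queue must wait for every other queue to finish the same budget,
so the round time is dominated by the seekiest queue.

```python
# Equal byte budgets with one seeky queue in the mix: each queue's
# delivered bandwidth is budget / round_time, so everyone converges
# toward the pace set by the slowest queue.

rates = {"copy1": 100.0, "copy2": 100.0, "random_app": 5.0}  # MB/s
budget_mb = 8.0

# One full round: each queue is served its entire budget in turn.
round_time = sum(budget_mb / r for r in rates.values())

for name in rates:
    print(f"{name}: {budget_mb / round_time:.1f} MB/s during the mix")

# Solo, each copy would run at ~100 MB/s; in the mix every queue --
# including the fast sequential copies -- gets ~4.5 MB/s, because the
# random queue's budget dominates the round.
```

Whether BFQ actually behaves this way depends on how aggressively the
slice timeout demotes the random queue to time based service, which is
exactly the question below.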

Hmmm... it could be that I'm mistaken on how trigger-happy the switch
to time based scheduling is.  Maybe it's sensitive enough that
bandwidth based scheduling is only applied to workloads which are
mostly sequential.  I'm sorry if I'm being too dense on this point but
can you please give me some examples of what would happen when
sequential workloads and random ones mix?

Thanks.

-- 
tejun
