linux-kernel.vger.kernel.org archive mirror
From: Paolo Valente <paolo.valente@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	Fabio Checconi <fchecconi@gmail.com>,
	Arianna Avanzini <avanzini.arianna@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Mark Brown <broonie@kernel.org>
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Date: Fri, 15 Apr 2016 16:20:44 +0200	[thread overview]
Message-ID: <427F5DF5-507A-4657-8279-B6A8FD98F6D8@linaro.org> (raw)
In-Reply-To: <20160414162953.GG12583@htj.duckdns.org>


Il giorno 14/apr/2016, alle ore 18:29, Tejun Heo <tj@kernel.org> ha scritto:

> Hello, Paolo.
> 
> On Thu, Apr 14, 2016 at 12:23:14PM +0200, Paolo Valente wrote:
> ...
>>>> 1) Stable(r) and tight bandwidth distribution for mostly-sequential
>>>> reads/writes
>>> 
>>> So, yeah, the above makes total sense.
>>> 
>>>> 2) Stable(r) and high responsiveness
>>>> 3) Stable(r) and low latency for soft real-time applications
>>>> 4) Faster execution of dev tasks, such as compile and git operations
>>>> (checkout, merge, …), in the presence of background workloads, and
>>>> while guaranteeing a high responsiveness too
>>> 
>>> But can you please enlighten me on why 2-4 are inherently tied to
>>> bandwidth-based scheduling?
>> 
>> Goals 2-4 are obtained by granting a higher share of the throughput
>> to the applications to privilege. The more stably and accurately the
>> underlying scheduling engine is able to enforce the desired bandwidth
>> distribution, the more stably and accurately higher shares can be
>> guaranteed. Then 2-4 follows from 1, i.e., from that BFQ guarantees
>> a stabler and tight(er) bandwidth distribution.
> 
> 4) makes sense as a lot of that workload would be at least
> quasi-sequential but I can't tell why 2) and 3) would depend on
> bandwidth based scheduling.  They're about recognizing workloads which
> can benefit from low latency and treating them accordingly.  Why would
> making the underlying scheduling time based change that?
> 

Because, in BFQ, "treating them accordingly" means raising their
weights so that they receive a higher share of the throughput (more
rigid solutions, such as priority scheduling, would easily lead to
starvation). With time-based scheduling, the throughput share
guaranteed for a given weight is less stable than with sector-based
scheduling. So, to provide the same latency guarantees as
sector-based scheduling, weights would have to be raised further.
This would throttle unprivileged processes more, without improving
stability.

With time-based scheduling, latency guarantees between two privileged
applications may also vary more, even if both applications perform
quasi-sequential I/O. In fact, their throughput shares would fluctuate
depending, e.g., on where the sectors they request are located on the
disk.
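To make the difference concrete, here is a toy model of my own (an
illustration, not BFQ code): two equal-weight workloads share a
rotating disk, and the per-workload service rates and the 120/80 MB/s
figures are made-up numbers. Under time-based scheduling each workload
gets 50% of the *time*, so the bandwidth each actually receives depends
on how fast the disk happens to serve its request pattern; under
sector-based scheduling the weight ratio is enforced on the service
itself.

```python
def bandwidth_shares(rates, weights):
    """Throughput shares under time-based scheduling.

    rates[i]   = MB/s the disk sustains on workload i's request pattern
    weights[i] = scheduling weight of workload i (fraction of disk time)
    """
    total_w = sum(weights)
    # Each workload is served for weights[i]/total_w of the time,
    # at whatever rate its own pattern allows.
    served = [r * w / total_w for r, w in zip(rates, weights)]
    total = sum(served)
    return [s / total for s in served]

def sector_shares(weights):
    """Throughput shares under sector-based scheduling: the weight
    ratio is applied directly to the amount of service."""
    total = sum(weights)
    return [w / total for w in weights]

# Both workloads quasi-sequential, but A's files sit on faster tracks:
print(bandwidth_shares([120.0, 80.0], [1, 1]))  # time-based: [0.6, 0.4]
print(sector_shares([1, 1]))                    # sector-based: [0.5, 0.5]
```

In the time-based case the shares drift with sector placement even
though the weights are equal; in the sector-based case they do not.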

>>> To summarize,
>>> 
>>> 1. I still don't understand why bandwidth-based scheduling is better
>>>  (sorry).  The only reason I can think of is that most workloads
>>>  that we care about are at least quasi-sequential and can benefit
>>>  from ignoring randomness to a certain degree.  Is that it?
>>> 
>> 
>> If I have understood correctly, you refer to that maximum ~30%
>> throughput loss that a quasi-sequential workload can incur (because of
>> some randomness or of other unlucky accidents). If so, then I think
>> you fully got the point.
> 
> Alright, I see.
> 
>>> 2. I don't think strict fairness is all that important for IO
>>>  scheduling in general.  Whatever gives us the best overall result
>>>  should work, so if bandwidth based scheduling does that great;
>>>  however, fairness does matter across cgroups.  A cgroup configured
>>>  to receive 50% of IO resources should get close to that no matter
>>>  what others are doing, would bfq be able to do that?
>> 
>> BFQ guarantees 50% of the bandwidth of the resource, not 50% of the
>> time. In this respect, with 50% of the time instead of 50% of the
> 
> So, across cgroups, I don't think we can pretend that bandwidth is the
> resource.  There should be a reasonable level of isolation.  Bandwidth
> for a rotating disk is a byproduct which can fluctuate widely.  "You
> have 50% of the total disk bandwidth" doesn't mean anything if that
> bandwidth can easily fluctuate a hundred fold.
> 

I agree that, if a system serves a workload whose characteristics
change significantly every 100ms, or even more frequently, and in an
unpredictable way, then both time-based and sector-based scheduling
provide exactly the same level of bandwidth guarantees. That is,
almost no bandwidth guarantee.

But AFAIK many systems, services and applications do not behave in such
a way. On the contrary, their IO patterns are rather stable.

So, if we choose time-based scheduling on the grounds that an
unpredictable system would see no benefit from sector-based
scheduling, then we just throw away all the benefits that
sector-based scheduling provides on more stable systems.

>> bandwidth, a group suffers from the bandwidth fluctuation, higher
>> latency and throughput loss problems that I have tried to highlight.
>> Plus, it is not possible to easily answer to questions like, e.g.: "how
>> long would it take to copy this file"?.
> 
> It's actually a lot more difficult to answer that with bandwidth
> scheduling.  Let's say cgroup A has 50% of disk time.  Sure, there are
> inaccuracies, but it should be able to get close to the ballpark -
> let's be lax and say between 30% and 45% of raw sequential bandwidth.
> It isn't ideal but now imagine bandwidth based scheduling.  Depending
> on what the others are doing, it may get 5% or even lower of the raw
> sequential bandwidth.  It isn't isolating anything.
> 

Definitely. Nevertheless, my point remains the same: we have to
consider one system at a time. If the workload of the system is highly
variable and completely unpredictable, then it is hard to provide any
bandwidth guarantee with any solution.

But if the workload has a minimum of stability, then sector-based
scheduling either wins or provides the same guarantees as time-based
scheduling.
For example, a concrete instance of your low-bandwidth example may be
one where you have one quasi-sequential workload W, competing with
nine random workloads. In this case, if, e.g., all workloads have the
same weight, then BFQ would schedule the resource like a time-based
scheduler: one full budget (which lasts for about one time slice) for
workload W, followed by one time slice for each of the other
workloads. Then there would be no service-guarantee loss with respect
to time-based scheduling.
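The example above can be sketched as a toy round-robin model (my own
illustration, not BFQ's actual code; the slice length, budget size and
service rates below are invented numbers). Each queue is served until
its sector budget is exhausted or a budget timeout fires, which is a
simplification of BFQ's real budget-timeout mechanism: a slow random
queue is preempted after roughly one time slice, so in this adversarial
case BFQ degenerates to plain time-based round robin.

```python
SLICE_MS = 100.0  # nominal time slice / budget timeout

def time_per_round(rate_sectors_per_ms, budget_sectors):
    """Time a queue holds the disk in one round: served until the
    budget is exhausted or the budget timeout fires."""
    return min(budget_sectors / rate_sectors_per_ms, SLICE_MS)

# W streams at 200 sectors/ms; its budget is sized to last ~one slice.
# The nine random workloads manage only 2 sectors/ms, so for them the
# timeout fires long before the budget is used up.
seq_time = time_per_round(200.0, 20000.0)   # 100 ms, budget-limited
rand_time = time_per_round(2.0, 20000.0)    # 100 ms, timeout-limited

round_time = seq_time + 9 * rand_time
print(seq_time / round_time)  # W's time share: 0.1, same as time-based RR
```

So in this worst case nothing is lost with respect to time-based
scheduling, while in the stable cases the earlier shares are preserved.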

In contrast, in all the other examples I have mentioned so far
(file-hosting, streaming, video/audio playback against background
workloads, application start-up, ...) sector-based scheduling would be
clearly beneficial even in a hierarchical setting.

In the end, if we give up sector scheduling for cgroups, we can only
lose some benefits. Unless I'm still missing some even more important
problem (sorry about that).

>> In any case, it is of course possible to get time distribution also
>> with BFQ, by 'just' letting it work in the time domain. However,
>> changing BFQ to operate in the time domain, or, probably much better,
>> extending BFQ to operate correctly in both domains, would be a lot of
>> work. I don't know whether it would be worth the effort and the
>> extra complexity.
> 
> As I wrote before, as fairness isn't that important for normal
> scheduling, if empirical data show that bandwidth based scheduling is
> beneficial for most common workloads, that's awesome especially given
> that CFQ has plenty of issues.  I don't think cgroup case is workable
> as currently implemented tho.
> 

I was thinking about some solution to achieve both goals. An option is
probably to let BFQ work in a double mode: sector-based within groups
and time-based among groups. However, I find it a little messy and
confusing.
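A rough sketch of what this double mode would compute (hypothetical,
not an existing BFQ feature; the group names and weights below are
made up): disk *time* is split across groups by group weight, and the
service delivered within each group's time is split across its
processes by process weight.

```python
def two_level_shares(groups):
    """groups: {group_name: (group_weight, {process_name: proc_weight})}

    Returns each process's nominal overall share: its group's
    time-based share times its sector-based share within the group.
    """
    total_gw = sum(gw for gw, _ in groups.values())
    shares = {}
    for gname, (gw, procs) in groups.items():
        group_time = gw / total_gw          # time-based across groups
        total_pw = sum(procs.values())
        for pname, pw in procs.items():
            # sector-based within the group
            shares[pname] = group_time * pw / total_pw
    return shares

print(two_level_shares({
    "interactive": (2, {"editor": 1, "browser": 1}),
    "batch":       (1, {"backup": 1}),
}))
# e.g. editor, browser and backup each end up with ~1/3 here
```

The messiness is precisely that the two levels measure different
resources, so a process's overall "share" mixes time and sectors.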

Other ideas/solutions? I have no better proposal at the moment :(

Thanks,
Paolo


> Thanks.
> 
> -- 
> tejun

Thread overview: 103+ messages
2016-02-01 22:12 [PATCH RFC 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-02-10 23:04   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 07/22] block, cfq: get rid of workload type Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-02-10 23:05   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-02-11 22:22   ` Tejun Heo
2016-02-12  0:35     ` Mark Brown
2016-02-17 15:57       ` Tejun Heo
2016-02-17 16:02         ` Mark Brown
2016-02-17 17:04           ` Tejun Heo
2016-02-17 18:13             ` Jonathan Corbet
2016-02-17 19:45               ` Tejun Heo
2016-02-17 19:56                 ` Jonathan Corbet
2016-02-17 20:14                   ` Tejun Heo
2016-02-17  9:02     ` Paolo Valente
2016-02-17 17:02       ` Tejun Heo
2016-02-20 10:23         ` Paolo Valente
2016-02-20 11:02           ` Paolo Valente
2016-03-01 18:46           ` Tejun Heo
2016-03-04 17:29             ` Linus Walleij
2016-03-04 17:39               ` Christoph Hellwig
2016-03-04 18:10                 ` Austin S. Hemmelgarn
2016-03-11 11:16                   ` Christoph Hellwig
2016-03-11 13:38                     ` Austin S. Hemmelgarn
2016-03-05 12:18                 ` Linus Walleij
2016-03-11 11:17                   ` Christoph Hellwig
2016-03-11 11:24                     ` Nikolay Borisov
2016-03-11 11:49                       ` Christoph Hellwig
2016-03-11 14:53                     ` Linus Walleij
2016-03-09  6:55                 ` Paolo Valente
2016-04-13 19:54                 ` Tejun Heo
2016-04-14  5:03                   ` Mark Brown
2016-03-09  6:34             ` Paolo Valente
2016-04-13 20:41               ` Tejun Heo
2016-04-14 10:23                 ` Paolo Valente
2016-04-14 16:29                   ` Tejun Heo
2016-04-15 14:20                     ` Paolo Valente [this message]
2016-04-15 15:08                       ` Tejun Heo
2016-04-15 16:17                         ` Paolo Valente
2016-04-15 19:29                           ` Tejun Heo
2016-04-15 22:08                             ` Paolo Valente
2016-04-15 22:45                               ` Tejun Heo
2016-04-16  6:03                                 ` Paolo Valente
2016-04-15 14:49                     ` Linus Walleij
2016-02-01 22:12 ` [PATCH RFC 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-02-11 22:28   ` Tejun Heo
2016-02-17  9:07     ` Paolo Valente
2016-02-17 17:14       ` Tejun Heo
2016-02-17 17:45         ` Tejun Heo
2016-04-20  9:32     ` Paolo
2016-04-22 18:13       ` Tejun Heo
2016-04-22 18:19         ` Paolo Valente
2016-04-22 18:41           ` Tejun Heo
2016-04-22 19:05             ` Paolo Valente
2016-04-22 19:32               ` Tejun Heo
2016-04-23  7:07                 ` Paolo Valente
2016-04-25 19:24                   ` Tejun Heo
2016-04-25 20:30                     ` Paolo
2016-05-06 20:20                       ` Paolo Valente
2016-05-12 13:11                         ` Paolo
2016-07-27 16:13                         ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 07/22] block, cfq: get rid of workload type Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 13/22] block, bfq: add more fairness with writes and slow processes Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 14/22] block, bfq: improve responsiveness Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 22/22] block, bfq: handle bursts of queue activations Paolo Valente
2016-07-28 16:50                           ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo
2016-02-01 22:12 ` [PATCH RFC 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 13/22] block, bfq: add more fairness to boost throughput and reduce latency Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 14/22] block, bfq: improve responsiveness Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 22/22] block, bfq: handle bursts of queue activations Paolo Valente
