Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler

From: Tejun Heo <tj@kernel.org>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	Fabio Checconi <fchecconi@gmail.com>,
	Arianna Avanzini <avanzini.arianna@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	broonie@kernel.org
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Date: Wed, 13 Apr 2016 16:41:10 -0400
Message-ID: <20160413204110.GF20142@htj.duckdns.org>
In-Reply-To: <E0694788-2787-4D99-88FC-8EAC5E335CBE@linaro.org>

Hello,

Sorry about the long delay.

On Wed, Mar 09, 2016 at 07:34:15AM +0100, Paolo Valente wrote:
> This is probably the focal point of our discussion. Unfortunately, I
> am still not convinced of your claim. In fact, basing budgets on
> sectors (service), instead of time, still seems to me the only way to
> provide the stronger bandwidth and low-latency guarantees that I have
> tried to highlight in my previous email. And these guarantees do not
> seem to concern only a single special case, but several, common use
> cases for server and desktop systems. I will try to repeat these facts
> more concisely, and hopefully more clearly, in my replies to next
> points.

I'm still trying to get my head wrapped around why basing the
scheduling on bandwidth would have those benefits because the
connection isn't intuitive to me at all.  If you're saying that most
workloads care about bandwidth a lot more and the specifics of their
IO patterns are mostly accidental and should be discounted for
scheduling decisions, I can understand how that could be.  Is that
what you're saying?

> > I see.  Once a queue starts timing out its slice, it gets switched to
> > time based scheduling; however, as you mentioned, workloads which
> > generate moderate random IOs would still get preferential treatment
> > over workloads which generate sequential IOs, by definition.
> 
> Exactly. However, there is still something I don’t fully understand in
> your doubt. With BFQ, workloads that generate moderate random IOs
> would actually do less damage to throughput, on average, than with
> CFQ. In fact, with CFQ the queues containing these IOs would
> systematically get a full time slice, while there are two
> possibilities with BFQ:
> 1) If the degree of randomness of (the IOs in) these queues is not too
> high, then these queues are likely to finish budgets before
> timeouts. In this case, with BFQ these queues get less service than
> with CFQ, and thus can waste throughput less.
> 2) If the degree of randomness of these queues is very high, then they
> consume full time slices with BFQ, exactly as with CFQ.
> 
> Of course, performance may differ if time slices, i.e., timeouts,
> differ between BFQ and CFQ, but this is easy to tune, if needed.

Hmm.. the above doesn't really make sense to me.  So you're saying that
bandwidth based control only cuts down the slice a random workload
would use and thus wouldn't benefit them; however, that cut-down of
slice is based on bandwidth consumption, so it would kick in a lot
more for sequential workloads.  It wouldn't make a random workload's
slice longer than the timeout but it would make sequential ones'
slices shorter.  What am I missing?
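
To make the question concrete, here's a toy sketch (Python, not kernel
code) of how I'm reading the slice logic: a queue stays in service
until it either consumes its sector budget or hits the timeout.  The
function name and all the numbers are made up for illustration; the
actual bfq accounting is surely more involved.

```python
def slice_time_ms(rate_sectors_per_ms, budget_sectors, timeout_ms):
    """Time a queue stays in service in this toy model.

    The queue is expired when it has consumed its sector budget or
    when the timeout fires, whichever comes first.
    """
    time_to_use_budget = budget_sectors / rate_sectors_per_ms
    return min(time_to_use_budget, timeout_ms)


# Hypothetical numbers, not measurements.
BUDGET_SECTORS = 16000
TIMEOUT_MS = 125

# A fast sequential reader exhausts the budget long before the timeout.
sequential_slice = slice_time_ms(400, BUDGET_SECTORS, TIMEOUT_MS)

# A random reader never gets through the budget and runs into the timeout.
random_slice = slice_time_ms(8, BUDGET_SECTORS, TIMEOUT_MS)
```

With these assumed numbers the sequential queue is expired after 40ms
while the random one runs for the full 125ms, which is the asymmetry
I'm asking about.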

> > Workloads are varied and underlying device performs wildly differently
> > depending on the specific IO pattern.  Because rotating disks suck so
> > badly, it's true that there's a lot of wiggle room in what the IO
> > scheduler can do.  People are accustomed to dealing with random
> > behaviors.  That said, it still doesn't feel comfortable to use the
> > obviously wrong unit as the fundamental basis of resource
> > distribution.
> 
> Actually this does not seem to match our (admittedly limited)
> experience with: low-to-high-end rotational devices, RAIDS, SSDs, SD
> cards and eMMCs. When stimulated with the same patterns in our tests,
> these devices always responded with about the same IO service
> times. And this seems to comply with intuition, because, apart from
> different initial cache states, the same stimuli cause about the same
> arm movements, cache hits/misses, and circuitry operations.

Oh, that's not what I meant.  If you feed the same sequence of IOs,
they would behave similarly.  What I meant was that the composition of
IOs themselves would change significantly depending on how different
types of workloads get scheduled.

> > So, yes, I see that bandwidth based control would yield a better
> > result for this specific use case but at the same time this is a very
> > specialized use case and probably the *only* use case where bandwidth
> > based distribution makes sense - equivalent logically sequential
> > workloads where the specifics of IO pattern are completely accidental.
> > We can't really design the whole thing around that single use case.
> 
> Actually, the tight bandwidth guarantees that I have tried to
> highlight are the ground on which the other low-latency guarantees are
> built. So, to sum up the set of guarantees that Linus discussed in
> more detail in his email, BFQ mainly guarantees, even in the presence
> of throughput fluctuations, and thanks also to sector-based
> scheduling:
> 1) Stable(r) and tight bandwidth distribution for mostly-sequential
> reads/writes

So, yeah, the above makes total sense.

> 2) Stable(r) and high responsiveness
> 3) Stable(r) and low latency for soft real-time applications
> 4) Faster execution of dev tasks, such as compile and git operations
> (checkout, merge, …), in the presence of background workloads, and
> while guaranteeing a high responsiveness too

But can you please enlighten me on why 2-4 are inherently tied to
bandwidth-based scheduling?

> > Hmmm... it could be that I'm mistaken on how trigger happy the switch
> > to time based scheduling is.  Maybe it's sensitive enough that
> > bandwidth based scheduling is only applied to workloads which are
> > mostly sequential.  I'm sorry if I'm being too dense on this point but
> > can you please give me some examples on what would happen when
> > sequential workloads and random ones mix?
> > 
> 
> In the simplest case,
> . sequential workloads would get sector-based service guarantees, with
> the resulting stability and low-latency properties that I have tried
> to highlight;
> . random workloads would get time-based service, and thus similar
> service guarantees as with CFQ (actually guarantees would still be
> tighter, because of the more accurate scheduling policy of BFQ).

But don't the above inherently mean that sequential workloads would
get less in terms of IO time?
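
As a rough illustration of what I mean, here's a toy model (Python,
not kernel code) where each queue gets one slice per round, sequential
queues are expired by the sector budget and random ones by the
timeout.  Strict alternation with equal weights is my assumption for
the sketch, not a claim about what bfq actually does, and the rates,
budget and timeout are hypothetical.

```python
def round_shares(rates, budget_sectors, timeout_ms):
    """Per-queue (time share, sector share) over one round.

    rates maps a queue name to its service rate in sectors per ms.
    Each queue is served once per round, until it uses up the sector
    budget or hits the timeout, whichever comes first.
    """
    served = {}
    for name, rate in rates.items():
        slice_ms = min(budget_sectors / rate, timeout_ms)
        served[name] = (slice_ms, slice_ms * rate)

    total_ms = sum(ms for ms, _ in served.values())
    total_sectors = sum(sectors for _, sectors in served.values())
    return {
        name: (ms / total_ms, sectors / total_sectors)
        for name, (ms, sectors) in served.items()
    }


# Hypothetical rates: one fast sequential queue, one slow random queue.
shares = round_shares(
    {"sequential": 400.0, "random": 8.0},
    budget_sectors=16000,
    timeout_ms=125,
)
seq_time_share, seq_sector_share = shares["sequential"]
```

In this model the sequential queue ends up with about a quarter of the
device time while moving the vast majority of the sectors.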

To summarize,

1. I still don't understand why bandwidth-based scheduling is better
   (sorry).  The only reason I can think of is that most workloads
   that we care about are at least quasi-sequential and can benefit
   from ignoring randomness to a certain degree.  Is that it?

2. I don't think strict fairness is all that important for IO
   scheduling in general.  Whatever gives us the best overall result
   should work, so if bandwidth based scheduling does that, great;
   however, fairness does matter across cgroups.  A cgroup configured
   to receive 50% of IO resources should get close to that no matter
   what others are doing.  Would bfq be able to do that?

Thanks.

-- 
tejun