From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932957AbcDNQ36 (ORCPT <rfc822;w@1wt.eu>);
	Thu, 14 Apr 2016 12:29:58 -0400
Received: from mail-qg0-f67.google.com ([209.85.192.67]:34315 "EHLO
	mail-qg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932370AbcDNQ34 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 14 Apr 2016 12:29:56 -0400
Date: Thu, 14 Apr 2016 12:29:53 -0400
From: Tejun Heo <tj@kernel.org>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>, Fabio Checconi <fchecconi@gmail.com>,
        Arianna Avanzini <avanzini.arianna@gmail.com>,
        linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Ulf Hansson <ulf.hansson@linaro.org>,
        Linus Walleij <linus.walleij@linaro.org>, broonie@kernel.org
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O
 scheduler
Message-ID: <20160414162953.GG12583@htj.duckdns.org>
References: <1454364778-25179-1-git-send-email-paolo.valente@linaro.org>
 <1454364778-25179-10-git-send-email-paolo.valente@linaro.org>
 <20160211222210.GC3741@mtj.duckdns.org>
 <8FDE2B10-9BD2-4741-917F-5A37A74E5B58@linaro.org>
 <20160217170206.GU3741@mtj.duckdns.org>
 <72E81252-203C-4EB7-8459-B9B7060029C6@linaro.org>
 <20160301184656.GI3965@htj.duckdns.org>
 <E0694788-2787-4D99-88FC-8EAC5E335CBE@linaro.org>
 <20160413204110.GF20142@htj.duckdns.org>
 <2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Paolo.

On Thu, Apr 14, 2016 at 12:23:14PM +0200, Paolo Valente wrote:
...
> >> 1) Stable(r) and tight bandwidth distribution for mostly-sequential
> >> reads/writes
> > 
> > So, yeah, the above makes total sense.
> > 
> >> 2) Stable(r) and high responsiveness
> >> 3) Stable(r) and low latency for soft real-time applications
> >> 4) Faster execution of dev tasks, such as compile and git operations
> >> (checkout, merge, …), in the presence of background workloads, and
> >> while guaranteeing a high responsiveness too
> > 
> > But can you please enlighten me on why 2-4 are inherently tied to
> > bandwidth-based scheduling?
> 
> Goals 2-4 are obtained by granting a higher share of the throughput
> to the applications to privilege. The more stably and accurately the
> underlying scheduling engine is able to enforce the desired bandwidth
> distribution, the more stably and accurately higher shares can be
> guaranteed. So 2-4 follow from 1, i.e., from the fact that BFQ
> guarantees a more stable and tight(er) bandwidth distribution.

4) makes sense, as a lot of that workload would be at least
quasi-sequential, but I can't tell why 2) and 3) would depend on
bandwidth-based scheduling.  They're about recognizing workloads which
can benefit from low latency and treating them accordingly.  Why would
making the underlying scheduling time-based change that?

> > To summarize,
> > 
> > 1. I still don't understand why bandwidth-based scheduling is better
> >   (sorry).  The only reason I can think of is that most workloads
> >   that we care about are at least quasi-sequential and can benefit
> >   from ignoring randomness to a certain degree.  Is that it?
> > 
> 
> If I have understood correctly, you are referring to the maximum ~30%
> throughput loss that a quasi-sequential workload can incur (because of
> some randomness or other unlucky accidents). If so, then I think
> you fully got the point.

Alright, I see.

> > 2. I don't think strict fairness is all that important for IO
> >   scheduling in general.  Whatever gives us the best overall result
> >   should work, so if bandwidth-based scheduling does that, great;
> >   however, fairness does matter across cgroups.  A cgroup configured
> >   to receive 50% of IO resources should get close to that no matter
> >   what others are doing.  Would bfq be able to do that?
> 
> BFQ guarantees 50% of the bandwidth of the resource, not 50% of the
> time. In this respect, with 50% of the time instead of 50% of the

So, across cgroups, I don't think we can pretend that bandwidth is the
resource.  There should be a reasonable level of isolation.  Bandwidth
for a rotating disk is a byproduct which can fluctuate widely.  "You
have 50% of the total disk bandwidth" doesn't mean anything if that
bandwidth can easily fluctuate a hundredfold.
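
To put a rough number on that fluctuation, here is a minimal sketch.
The disk figures (150 MB/s sequential, 10 ms of seek plus rotational
delay per random request, 4 KiB requests) are assumptions picked for
illustration, not measurements, and the function name is made up for
this example.

```python
# Assumed, illustrative figures for a rotating disk (not measured).
SEQ_BW = 150e6        # bytes/s when streaming sequentially
SEEK_S = 0.010        # seconds of seek + rotational delay per random request
REQ_BYTES = 4096      # size of each random request


def random_bandwidth(req_bytes, seq_bw, seek_s):
    """Bytes/s achieved when every request pays a full positioning delay."""
    time_per_request = seek_s + req_bytes / seq_bw
    return req_bytes / time_per_request


rand_bw = random_bandwidth(REQ_BYTES, SEQ_BW, SEEK_S)
ratio = SEQ_BW / rand_bw

print(f"sequential: {SEQ_BW / 1e6:.1f} MB/s")
print(f"random 4K:  {rand_bw / 1e6:.2f} MB/s")
print(f"ratio:      {ratio:.0f}x")
```

Under these assumptions the same disk delivers well over a hundred
times less bandwidth for the random pattern, so "50% of the bandwidth"
is 50% of a number that depends on what everyone else is doing.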

> bandwidth, a group suffers from the bandwidth fluctuation, higher
> latency and throughput loss problems that I have tried to highlight.
> Plus, it is not possible to easily answer questions like, e.g., "how
> long would it take to copy this file?"

It's actually a lot more difficult to answer that with bandwidth
scheduling.  Let's say cgroup A has 50% of disk time.  Sure, there are
inaccuracies, but it should be able to get close to the ballpark -
let's be lax and say between 30% and 45% of raw sequential bandwidth.
It isn't ideal, but now imagine bandwidth-based scheduling.  Depending
on what the others are doing, it may get 5% or even less of the raw
sequential bandwidth.  It isn't isolating anything.
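
The comparison can be sketched with a toy model.  It assumes cgroup A
is purely sequential, cgroup B is purely random, and both have equal
weight.  It models idealized time-proportional and byte-proportional
sharing only; it is not a model of the actual CFQ or BFQ code, and it
ignores switching costs, idling and budget timeouts.  The disk figures
and function names are assumptions made up for the illustration.

```python
# Assumed, illustrative figures (not measured).
SEQ_BW = 150e6    # bytes/s cgroup A gets while it owns the disk (sequential)
RAND_BW = 0.4e6   # bytes/s cgroup B gets while it owns the disk (random)


def time_share_bandwidth(time_fraction, own_bw):
    """Time-based sharing: A owns the disk for a fixed fraction of time."""
    return time_fraction * own_bw


def equal_bytes_bandwidth(bw_a, bw_b):
    """Byte-based 50/50 sharing: both groups move x bytes per second.

    Serving x bytes costs x / bw_a seconds for A and x / bw_b for B,
    and the two must fit in one second: x / bw_a + x / bw_b = 1.
    """
    return 1.0 / (1.0 / bw_a + 1.0 / bw_b)


a_time_based = time_share_bandwidth(0.5, SEQ_BW)
a_byte_based = equal_bytes_bandwidth(SEQ_BW, RAND_BW)

print(f"A, time-based: {a_time_based / SEQ_BW:.1%} of raw sequential")
print(f"A, byte-based: {a_byte_based / SEQ_BW:.2%} of raw sequential")
```

With these numbers, time-based sharing leaves A at half of the raw
sequential bandwidth regardless of B, while equal byte shares drag A
down to roughly B's random throughput, far below the 5% mentioned
above.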

> In any case, it is of course possible to get time distribution also
> with BFQ, by 'just' letting it work in the time domain. However,
> changing BFQ to operate in the time domain, or, probably much better,
> extending BFQ to operate correctly in both domains, would be a lot of
> work. I don't know whether it would be worth the effort and the
> extra complexity.

As I wrote before, as fairness isn't that important for normal
scheduling, if empirical data show that bandwidth-based scheduling is
beneficial for most common workloads, that's awesome, especially given
that CFQ has plenty of issues.  I don't think the cgroup case is
workable as currently implemented, though.

Thanks.

-- 
tejun