From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752593AbcDOQSF (ORCPT <rfc822;w@1wt.eu>);
	Fri, 15 Apr 2016 12:18:05 -0400
Received: from mail-wm0-f51.google.com ([74.125.82.51]:35816 "EHLO
	mail-wm0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752499AbcDOQSC convert rfc822-to-8bit (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 15 Apr 2016 12:18:02 -0400
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Content-Type: text/plain; charset=windows-1252
From: Paolo Valente <paolo.valente@linaro.org>
In-Reply-To: <20160415150835.GI12583@htj.duckdns.org>
Date: Fri, 15 Apr 2016 18:17:55 +0200
Cc: Jens Axboe <axboe@kernel.dk>, Fabio Checconi <fchecconi@gmail.com>,
        Arianna Avanzini <avanzini.arianna@gmail.com>,
        linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Ulf Hansson <ulf.hansson@linaro.org>,
        Linus Walleij <linus.walleij@linaro.org>,
        Mark Brown <broonie@kernel.org>
Content-Transfer-Encoding: 8BIT
Message-Id: <700B77C8-CB01-41C3-96E7-ED2C0B5A85D0@linaro.org>
References: <20160211222210.GC3741@mtj.duckdns.org> <8FDE2B10-9BD2-4741-917F-5A37A74E5B58@linaro.org> <20160217170206.GU3741@mtj.duckdns.org> <72E81252-203C-4EB7-8459-B9B7060029C6@linaro.org> <20160301184656.GI3965@htj.duckdns.org> <E0694788-2787-4D99-88FC-8EAC5E335CBE@linaro.org> <20160413204110.GF20142@htj.duckdns.org> <2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org> <20160414162953.GG12583@htj.duckdns.org> <427F5DF5-507A-4657-8279-B6A8FD98F6D8@linaro.org> <20160415150835.GI12583@htj.duckdns.org>
To: Tejun Heo <tj@kernel.org>
X-Mailer: Apple Mail (2.1878.6)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 15 Apr 2016, at 17:08, Tejun Heo <tj@kernel.org> wrote:

> Hello, Paolo.
> 
> On Fri, Apr 15, 2016 at 04:20:44PM +0200, Paolo Valente wrote:
>>> It's actually a lot more difficult to answer that with bandwidth
>>> scheduling.  Let's say cgroup A has 50% of disk time.  Sure, there are
>>> inaccuracies, but it should be able to get close to the ballpark -
>>> let's be lax and say between 30% and 45% of raw sequential bandwidth.
>>> It isn't ideal but now imagine bandwidth based scheduling.  Depending
>>> on what the others are doing, it may get 5% or even lower of the raw
>>> sequential bandwidth.  It isn't isolating anything.
>> 
>> Definitely. Nevertheless my point is still about the same: we have to
>> consider one system at a time. If the workload of the system is highly
>> variable and completely unpredictable, then it is hard to provide any
>> bandwidth guarantee with any solution.
> 
> I don't think that is true with time based scheduling.  If you
> allocate 50% of time, it'll get close to 50% of IO time which
> translates to bandwidth which is lower than 50% but still in the
> ballpark.

But this is the same minimal service guarantee that you get with BFQ
in any case. I'm sorry for having been so confusing that I failed to
make this central point clear :(

>  That is very different from "we can't guarantee anything if
> the other workloads are highly variable".
> 


If you have 50% of the time, but
. you don't know anything about your workload properties, and
. the device speed can vary by two orders of magnitude,
then you can't provide any bandwidth guarantee with any scheduler. Of
course, I'm neglecting the minimal, trivial guarantee of "getting a
fraction of the minimum possible speed of the device".
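
To make the arithmetic behind this point concrete, here is a minimal
sketch. It assumes a deliberately simplified model in which the
bandwidth a group obtains is just its time share multiplied by the rate
at which the device serves it while it holds the device. The function
name and the two rates are hypothetical values chosen for illustration,
not measurements and not anything taken from BFQ or CFQ.

```python
def bandwidth_under_time_share(time_share, service_rate_mb_s):
    """Bandwidth (MB/s) a group gets under a simplified time-share model.

    time_share: fraction of device time granted to the group, in [0, 1].
    service_rate_mb_s: rate at which the device serves this group's
        requests while the group holds the device (assumed constant).
    """
    if not 0.0 <= time_share <= 1.0:
        raise ValueError("time_share must be in [0, 1]")
    return time_share * service_rate_mb_s


# Hypothetical service rates, two orders of magnitude apart.
SLOW_RATE_MB_S = 1.0    # assumed: seek-bound, random-like pattern
FAST_RATE_MB_S = 100.0  # assumed: sequential-like pattern

worst = bandwidth_under_time_share(0.5, SLOW_RATE_MB_S)
best = bandwidth_under_time_share(0.5, FAST_RATE_MB_S)

print(f"worst: {worst} MB/s, best: {best} MB/s, ratio: {best / worst:.0f}x")
```

In this toy model the same 50% time share yields bandwidths that differ
by the full factor of the speed range, so the only bound one can state
in advance is the worst-case one.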

If you have 50% of the time allocated for a quasi-sequential workload,
then bandwidth and latencies may vary by an uncontrollable 30 or 40%,
depending on what you and the other groups do.

With the same device, if you have 50% of the bandwidth allocated with
BFQ for a quasi-sequential workload, then you can provide bandwidth
and latencies that vary by at most a (still uncontrollable) 3 or
4%, depending on what you and the other groups do.

This improvement is shown, e.g., in my--admittedly boring--numerical
example, and is confirmed by my experimental results so far.

> So, I get that for a lot of workload, especially interactive ones, IO
> patterns are quasi-sequential and bw based scheduling is beneficial
> and we don't care that much about fairness in general; however, it's
> problematic that it would make the behavior of proportional control
> quite surprising.

If I have somehow convinced you with what I wrote above, then I hope
we might agree that any surprising behavior of BFQ with cgroups would
just be a matter of bugs.

Thanks,
Paolo

> 
>>> As I wrote before, as fairness isn't that important for normal
>>> scheduling, if empirical data show that bandwidth based scheduling is
>>> beneficial for most common workloads, that's awesome especially given
>>> that CFQ has plenty of issues.  I don't think cgroup case is workable
>>> as currently implemented tho.
>> 
>> I was thinking about some solution to achieve both goals. An option is
>> probably to let BFQ work in a double mode: sector-based within groups
>> and time-based among groups. However, I find it a little messy and
>> confusing.
>> 
>> Other ideas/solutions? I have no better proposal at the moment :(
> 
> No idea.  I don't think isolation could work without time based
> scheduling at some level tho. :(
> 
> Thanks.
> 
> -- 
> tejun