Date: Fri, 15 Apr 2016 11:08:35 -0400
From: Tejun Heo
To: Paolo Valente
Cc: Jens Axboe, Fabio Checconi, Arianna Avanzini,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulf Hansson, Linus Walleij, Mark Brown
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Message-ID: <20160415150835.GI12583@htj.duckdns.org>
References: <20160211222210.GC3741@mtj.duckdns.org>
	<8FDE2B10-9BD2-4741-917F-5A37A74E5B58@linaro.org>
	<20160217170206.GU3741@mtj.duckdns.org>
	<72E81252-203C-4EB7-8459-B9B7060029C6@linaro.org>
	<20160301184656.GI3965@htj.duckdns.org>
	<20160413204110.GF20142@htj.duckdns.org>
	<2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org>
	<20160414162953.GG12583@htj.duckdns.org>
	<427F5DF5-507A-4657-8279-B6A8FD98F6D8@linaro.org>
In-Reply-To: <427F5DF5-507A-4657-8279-B6A8FD98F6D8@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Paolo.

On Fri, Apr 15, 2016 at 04:20:44PM +0200, Paolo Valente wrote:
> > It's actually a lot more difficult to answer that with bandwidth
> > scheduling.  Let's say cgroup A has 50% of disk time.  Sure, there are
> > inaccuracies, but it should be able to get close to the ballpark -
> > let's be lax and say between 30% and 45% of raw sequential bandwidth.
> > It isn't ideal but now imagine bandwidth based scheduling.  Depending
> > on what the others are doing, it may get 5% or even lower of the raw
> > sequential bandwidth.  It isn't isolating anything.
>
> Definitely. Nevertheless my point is still about the same: we have to
> consider one system at a time. If the workload of the system is highly
> variable and completely unpredictable, then it is hard to provide any
> bandwidth guarantee with any solution.

I don't think that is true with time based scheduling.  If you
allocate 50% of time, it'll get close to 50% of IO time which
translates to bandwidth which is lower than 50% but still in the
ballpark.  That is very different from "we can't guarantee anything if
the other workloads are highly variable".

So, I get that for a lot of workload, especially interactive ones, IO
patterns are quasi-sequential and bw based scheduling is beneficial
and we don't care that much about fairness in general; however, it's
problematic that it would make the behavior of proportional control
quite surprising.

> > As I wrote before, as fairness isn't that important for normal
> > scheduling, if empirical data show that bandwidth based scheduling is
> > beneficial for most common workloads, that's awesome especially given
> > that CFQ has plenty of issues.  I don't think cgroup case is workable
> > as currently implemented tho.
>
> I was thinking about some solution to achieve both goals. An option is
> probably to let BFQ work in a double mode: sector-based within groups
> and time-based among groups. However, I find it a little messy and
> confusing.
>
> Other ideas/solutions? I have no better proposal at the moment :(

No idea.  I don't think isolation could work without time based
scheduling at some level tho. :(

Thanks.

-- 
tejun
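
The time-versus-bandwidth argument above can be sketched with a back-of-the-envelope model. This is only an illustration under stated assumptions: two groups share one device, group A does sequential IO and group B does random IO, and the device rates (100 MB/s sequential, 5 MB/s random) are made-up numbers, not figures from this thread. The function names are hypothetical, and the model ignores seek and switching overhead, so the time-based result is an idealized upper bound rather than the 30% to 45% range mentioned above.

```python
# Assumed raw device rates in MB/s (illustrative values, not from the thread).
SEQ_RATE = 100.0   # what a sequential reader gets with the device to itself
RAND_RATE = 5.0    # what a random reader gets with the device to itself


def time_share_bandwidth(time_share, own_rate):
    """Bandwidth of a group that is given `time_share` of device time.

    The result depends only on the group's own access pattern,
    not on what the other groups are doing.
    """
    return time_share * own_rate


def equal_sector_bandwidth(rates):
    """Per-group bandwidth when every group is served equal amounts of data.

    Serving one unit of data to each group takes sum(1 / rate) time,
    so each group's bandwidth is the reciprocal of that sum. A slow
    neighbour therefore drags everyone down to roughly its own speed.
    """
    return 1.0 / sum(1.0 / r for r in rates)


# Group A (sequential) with 50% of device time.
a_time_based = time_share_bandwidth(0.5, SEQ_RATE)

# Group A (sequential) sharing sectors equally with group B (random).
a_sector_based = equal_sector_bandwidth([SEQ_RATE, RAND_RATE])

print(f"time based:   {a_time_based:.1f} MB/s "
      f"({100 * a_time_based / SEQ_RATE:.0f}% of raw sequential)")
print(f"sector based: {a_sector_based:.1f} MB/s "
      f"({100 * a_sector_based / SEQ_RATE:.1f}% of raw sequential)")
```

With these assumed rates, the time-based share stays tied to A's own pattern, while the equal-sector share falls below 5% of raw sequential bandwidth as soon as the neighbour is a slow random reader, which is the isolation concern raised in the message.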