From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932957AbcDNQ36 (ORCPT ); Thu, 14 Apr 2016 12:29:58 -0400
Received: from mail-qg0-f67.google.com ([209.85.192.67]:34315 "EHLO
	mail-qg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932370AbcDNQ34 (ORCPT ); Thu, 14 Apr 2016 12:29:56 -0400
Date: Thu, 14 Apr 2016 12:29:53 -0400
From: Tejun Heo
To: Paolo Valente
Cc: Jens Axboe, Fabio Checconi, Arianna Avanzini,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulf Hansson, Linus Walleij, broonie@kernel.org
Subject: Re: [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler
Message-ID: <20160414162953.GG12583@htj.duckdns.org>
References: <1454364778-25179-1-git-send-email-paolo.valente@linaro.org>
	<1454364778-25179-10-git-send-email-paolo.valente@linaro.org>
	<20160211222210.GC3741@mtj.duckdns.org>
	<8FDE2B10-9BD2-4741-917F-5A37A74E5B58@linaro.org>
	<20160217170206.GU3741@mtj.duckdns.org>
	<72E81252-203C-4EB7-8459-B9B7060029C6@linaro.org>
	<20160301184656.GI3965@htj.duckdns.org>
	<20160413204110.GF20142@htj.duckdns.org>
	<2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <2B664E4D-857C-4BBA-BE77-97EA6CC3F270@linaro.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Paolo.

On Thu, Apr 14, 2016 at 12:23:14PM +0200, Paolo Valente wrote:
...
> >> 1) Stable(r) and tight bandwidth distribution for mostly-sequential
> >> reads/writes
> >
> > So, yeah, the above makes total sense.
> >
> >> 2) Stable(r) and high responsiveness
> >> 3) Stable(r) and low latency for soft real-time applications
> >> 4) Faster execution of dev tasks, such as compile and git operations
> >> (checkout, merge, …), in the presence of background workloads, and
> >> while guaranteeing a high responsiveness too
> >
> > But can you please enlighten me on why 2-4 are inherently tied to
> > bandwidth-based scheduling?
>
> Goals 2-4 are obtained by granting a higher share of the throughput
> to the applications to privilege. The more stably and accurately the
> underlying scheduling engine is able to enforce the desired bandwidth
> distribution, the more stably and accurately higher shares can be
> guaranteed. Then 2-4 follow from 1, i.e., from the fact that BFQ
> guarantees a stabler and tight(er) bandwidth distribution.

4) makes sense as a lot of that workload would be at least
quasi-sequential but I can't tell why 2) and 3) would depend on
bandwidth based scheduling.  They're about recognizing workloads which
can benefit from low latency and treating them accordingly.  Why would
making the underlying scheduling time based change that?

> > To summarize,
> >
> > 1. I still don't understand why bandwidth-based scheduling is better
> >    (sorry).  The only reason I can think of is that most workloads
> >    that we care about are at least quasi-sequential and can benefit
> >    from ignoring randomness to a certain degree.  Is that it?
> >
>
> If I have understood correctly, you refer to that maximum ~30%
> throughput loss that a quasi-sequential workload can incur (because of
> some randomness or of other unlucky accidents). If so, then I think
> you fully got the point.

Alright, I see.

> > 2. I don't think strict fairness is all that important for IO
> >    scheduling in general.  Whatever gives us the best overall result
> >    should work, so if bandwidth based scheduling does that, great;
> >    however, fairness does matter across cgroups.
> >    A cgroup configured to receive 50% of IO resources should get
> >    close to that no matter what others are doing, would bfq be able
> >    to do that?
>
> BFQ guarantees 50% of the bandwidth of the resource, not 50% of the
> time. In this respect, with 50% of the time instead of 50% of the

So, across cgroups, I don't think we can pretend that bandwidth is the
resource.  There should be a reasonable level of isolation.  Bandwidth
for a rotating disk is a byproduct which can fluctuate widely.  "You
have 50% of the total disk bandwidth" doesn't mean anything if that
bandwidth can easily fluctuate a hundred fold.

> bandwidth, a group suffers from the bandwidth fluctuation, higher
> latency and throughput loss problems that I have tried to highlight.
> Plus, it is not possible to easily answer questions like, e.g.: "how
> long would it take to copy this file?"

It's actually a lot more difficult to answer that with bandwidth
scheduling.  Let's say cgroup A has 50% of disk time.  Sure, there are
inaccuracies, but it should be able to get close to the ballpark -
let's be lax and say between 30% and 45% of raw sequential bandwidth.
It isn't ideal but now imagine bandwidth based scheduling.  Depending
on what the others are doing, it may get 5% or even lower of the raw
sequential bandwidth.  It isn't isolating anything.

> In any case, it is of course possible to get time distribution also
> with BFQ, by 'just' letting it work in the time domain. However,
> changing BFQ to operate in the time domain, or, probably much better,
> extending BFQ to operate correctly in both domains, would be a lot of
> work. I don't know whether it would be worth the effort and the
> extra complexity.

As I wrote before, as fairness isn't that important for normal
scheduling, if empirical data show that bandwidth based scheduling is
beneficial for most common workloads, that's awesome especially given
that CFQ has plenty of issues.  I don't think cgroup case is workable
as currently implemented tho.
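
To put rough numbers on the isolation argument, here is a minimal
sketch in Python.  It is an idealized model, not how CFQ or BFQ is
implemented: the function names and the two rates (100 MB/s for a
sequential group, 1 MB/s for a seeky one) are made up for illustration,
and switching overhead between groups is ignored.

```python
def time_shares(rates, weights):
    """Throughput of each group when disk *time* is split by weight."""
    total = sum(weights)
    return [r * w / total for r, w in zip(rates, weights)]


def bandwidth_shares(rates, weights):
    """Throughput of each group when *bytes served* are split by weight.

    Serving w_i units of data for group i costs w_i / r_i seconds, so one
    round takes sum(w_j / r_j) seconds and group i moves w_i units in it.
    """
    round_time = sum(w / r for r, w in zip(rates, weights))
    return [w / round_time for w in weights]


# Hypothetical standalone rates in MB/s: group A is sequential, B is seeky.
SEQ_RATE = 100.0
RANDOM_RATE = 1.0

rates = [SEQ_RATE, RANDOM_RATE]
weights = [1, 1]  # both groups configured for 50%

a_time = time_shares(rates, weights)[0]
a_bw = bandwidth_shares(rates, weights)[0]

print(f"time-based:      A gets {a_time:.2f} MB/s "
      f"({a_time / SEQ_RATE:.0%} of raw sequential)")
print(f"bandwidth-based: A gets {a_bw:.2f} MB/s "
      f"({a_bw / SEQ_RATE:.0%} of raw sequential)")
```

With these assumed rates, A keeps 50 MB/s under a time split but drops
to just under 1 MB/s under a bandwidth split, because its throughput is
pinned to whatever the seeky group can manage.  A real time-based
scheduler would land below the ideal 50%, which is where the 30% to 45%
ballpark above comes from.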

Thanks.

--
tejun