Date: Wed, 22 Apr 2009 12:22:41 +0200
From: Andrea Righi
To: KAMEZAWA Hiroyuki
Cc: randy.dunlap@oracle.com, Carl Henrik Lunde, Jens Axboe,
    eric.rannaud@gmail.com, Balbir Singh, fernando@oss.ntt.co.jp,
    dradford@bluehost.com, Gui@smtp1.linux-foundation.org,
    agk@sourceware.org, subrata@linux.vnet.ibm.com, Paul Menage,
    Theodore Tso, akpm@linux-foundation.org,
    containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
    dave@linux.vnet.ibm.com, matt@bluehost.com, roberto@unbit.it,
    ngupta@google.com
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

On Wed, Apr 22, 2009 at 10:21:53AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 22 Apr 2009 09:33:49 +0900
> KAMEZAWA Hiroyuki wrote:
>
> > > And this should be probably strictly connected to the IO controller. If
> > > we throttle or delay the dispatching/submission of some IO requests
> > > without throttling the dirty pages rate a cgroup could completely waste
> > > its own available memory with dirty (hard and slow to reclaim) pages.
> > >
> > > That is in part the approach I used in io-throttle v12, adding a hook in
> > > balance_dirty_pages_ratelimited_nr() to throttle the current task when
> > > cgroup's IO limits are exceeded. Argh!
> > >
> > > So, another proposal could be to re-add in io-throttle v14 the old hook
> > > also in balance_dirty_pages_ratelimited_nr().
> > >
> > > In this way io-throttle would:
> > >
> > > - use page_cgroup infrastructure and page_cgroup->flags to encode the
> > >   cgroup id that firstly dirtied a generic page
> > > - account and opportunely throttle sync and writeback IO requests in
> > >   submit_bio()
> > > - at the same time throttle the tasks in
> > >   balance_dirty_pages_ratelimited_nr() if the cgroup they belong has
> > >   exhausted the IO BW (or quota, share, etc. in case of proportional BW
> > >   limit)
> > >
> >
> > IMHO, io-controller should just work as I/O subsystem as bdi. Now, per-bdi dirty_ratio
> > is supported and it seems to work well.
> >
> > Can't we write a function like bdi_writeout_fraction() ?;
> > It will be a simple choice.
> >
> One more thing, if you want dirty_ratio for throttling I/O not for supporting page reclaim,
> Something like task_dirty_limit() will be appropriate.
>
> Thanks,
> -Kame

Actually I was proposing something quite similar, if I've understood well.
Just add a hook in balance_dirty_pages() to throttle tasks in cgroups that
have exhausted their IO BW. The way to do so would be similar to the per-bdi
write throttling: take into account the IO requests previously submitted per
cgroup and the pages dirtied per cgroup (keeping in mind that pages are not
necessarily dirtied by the owner of the page), and apply something like
congestion_wait() to throttle the tasks in the cgroups that exceeded the BW
limit.

Maybe we can just introduce cgroup_dirty_limit(), simply replicating what
we're doing for task_dirty_limit(), but using per-cgroup statistics of
course.

I can change the io-throttle controller to do so. This feature should also
be valid for the proportional BW approach.

BTW, Vivek's proposal to also dispatch IO requests according to cgroup
proportional BW limits can still be valid and is worth testing IMHO. But we
must also find a way to tell the right cgroup: hey! stop wasting memory on
dirty pages, because you've directly or indirectly generated too much IO in
the system and I'm throttling and/or not scheduling your IO requests.

Objections?

-Andrea