From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755171Ab0KXNUW (ORCPT ); Wed, 24 Nov 2010 08:20:22 -0500 Received: from mga11.intel.com ([192.55.52.93]:28066 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754012Ab0KXNUU (ORCPT ); Wed, 24 Nov 2010 08:20:20 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.59,248,1288594800"; d="scan'208";a="860956809" Date: Wed, 24 Nov 2010 21:20:13 +0800 From: Wu Fengguang To: Peter Zijlstra Cc: Andrew Morton , Jan Kara , "Li, Shaohua" , Christoph Hellwig , Dave Chinner , "Theodore Ts'o" , Chris Mason , Mel Gorman , Rik van Riel , KOSAKI Motohiro , linux-mm , "linux-fsdevel@vger.kernel.org" , LKML Subject: Re: [PATCH 06/13] writeback: bdi write bandwidth estimation Message-ID: <20101124132012.GA12117@localhost> References: <20101117042720.033773013@intel.com> <20101117042850.002299964@intel.com> <1290596732.2072.450.camel@laptop> <20101124121046.GA8333@localhost> <1290603047.2072.465.camel@laptop> <20101124131437.GE10413@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20101124131437.GE10413@localhost> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 24, 2010 at 09:14:37PM +0800, Wu Fengguang wrote: > On Wed, Nov 24, 2010 at 08:50:47PM +0800, Peter Zijlstra wrote: > > On Wed, 2010-11-24 at 20:10 +0800, Wu Fengguang wrote: > > > > > + /* > > > > > + * When there lots of tasks throttled in balance_dirty_pages(), they > > > > > + * will each try to update the bandwidth for the same period, making > > > > > + * the bandwidth drift much faster than the desired rate (as in the > > > > > + * single dirtier case). So do some rate limiting. > > > > > + */ > > > > > + if (jiffies - bdi->write_bandwidth_update_time < elapsed) > > > > > + goto snapshot; > > > > > > > > Why this goto snapshot and not simply return? This is the second call > > > > (bdi_update_bandwidth equivalent). > > > > > > Good question. The loop inside balance_dirty_pages() normally run only > > > once, however wb_writeback() may loop over and over again. If we just > > > return here, the condition > > > > > > (jiffies - bdi->write_bandwidth_update_time < elapsed) > > > > > > cannot be reset, then future bdi_update_bandwidth() calls in the same > > > wb_writeback() loop will never find it OK to update the bandwidth. > > > > But the thing is, you don't want to reset that, it might loop so fast > > you'll throttle all of them, if you keep the pre-throttle value you'll > > eventually pass, no? > > It (let's name it A) only resets the _local_ vars bw_* when it's sure > by the condition > > (jiffies - bdi->write_bandwidth_update_time < elapsed) this will be true if someone else has _done_ overlapped estimation, otherwise it will equal: jiffies - bdi->write_bandwidth_update_time == elapsed Sorry the comment needs updating. Thanks, Fengguang > that someone else (name B) has updated the _global_ bandwidth in the > time range we planned. So there may be some time in A's range that is > not covered by B, but sure the range is not totally bypassed without > updating the bandwidth. > > > > It does assume no races between CPUs.. We may need some per-cpu based > > > estimation. > > > > But that multi-writer race is valid even for the balance_dirty_pages() > > call, two or more could interleave on the bw_time and bw_written > > variables. > > The race will only exist in each task's local vars (their bw_* will > overlap). But the update bdi->write_bandwidth* will be safeguarded > by the above check. When the task is scheduled back, it may find > updated write_bandwidth_update_time and hence give up his estimation. > This is rather tricky.. > > Thanks, > Fengguang