Date: Wed, 1 Jun 2011 11:19:00 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Damien Wyart, Ingo Molnar, Mike Galbraith, linux-kernel@vger.kernel.org
Subject: Re: Very high CPU load when idle with 3.0-rc1
Message-ID: <20110601181900.GI2274@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20110530055924.GA9169@brouette>
 <1306755291.1200.2872.camel@twins>
 <20110530162354.GQ2668@linux.vnet.ibm.com>
 <1306775989.2497.527.camel@laptop>
 <20110530212833.GS2668@linux.vnet.ibm.com>
 <1306791219.23844.12.camel@twins>
 <20110531014543.GU2668@linux.vnet.ibm.com>
 <1306926339.2353.191.camel@twins>
 <20110601143743.GA2274@linux.vnet.ibm.com>
 <1306947513.2497.624.camel@laptop>
In-Reply-To: <1306947513.2497.624.camel@laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 01, 2011 at 06:58:33PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 07:37 -0700, Paul E. McKenney wrote:
>
> > > > I considered that, but working out when it is OK to deboost them is
> > > > decidedly non-trivial.
> > >
> > > Where exactly is the problem there? The boost lasts for as long as it
> > > takes to finish the grace period, right? There's a distinct set of
> > > callbacks associated with each grace-period, right? In which case you
> > > can de-boost your thread the moment you're done processing that set.
> > >
> > > Or am I simply confused about how all this is supposed to work?
> >
> > The main complications are: (1) the fact that it is hard to tell exactly
> > which grace period to wait for, this one or the next one, and (2) the
> > fact that callbacks get shuffled when CPUs go offline.
>
> I can't say I would worry too much about 2, hotplug and RT don't really
> go hand-in-hand anyway.

Perhaps not, but I do need to handle the combination.

> On 1 however, is that due to the boost condition?

The boost condition is straightforward.  By default, if a grace period
lasts for more than 500 milliseconds, boosting starts.  So the obvious
answer is "deboost when the grace period ends", but different CPUs
become aware of the end at different times, so it is still a bit fuzzy.

> I must admit that my thought there is somewhat fuzzy since I just
> realized I don't actually know the exact condition to start boosting,
> but suppose we boost because the queue is too large, then waiting for
> the current grace period might not reduce the queue length, as most
> callbacks might actually be for the next.
>
> If however the condition is grace period duration, then completion of
> the current grace period is sufficient, since the whole boost condition
> is defined as such. [ if the next is also exceeding the time limit,
> that's a whole next boost ]

Don't get me wrong -- it can be done.  Just a bit ugly due to the fact
that different CPUs have different views of when the grace period ends.

> > That said, it might be possible if we are willing to live with some
> > approximate behavior.  For example, always waiting for the next grace
> > period (rather than the current one) to finish, and boosting through the
> > extra callbacks in case where a given CPU "adopts" callbacks that must
> > be boosted when that CPU also has some callbacks whose priority must be
> > boosted and some that need not be.
>
> That might make sense, but I must admit to not fully understanding the
> whole current/next thing yet.
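The timing-based boost condition described above ("boost once a grace period exceeds 500 ms, deboost when that grace period ends") can be sketched in userspace C.  This is purely illustrative: the struct, function names, and millisecond-based timekeeping are invented here, not the in-kernel RCU code, which tracks this per-CPU and in jiffies.

```c
#include <stdbool.h>

#define RCU_BOOST_DELAY_MS 500UL	/* default threshold from the discussion */

/* Hypothetical per-grace-period state; the kernel's real state is per-CPU. */
struct gp_state {
	unsigned long gp_start_ms;	/* when the current grace period began */
	bool boosting;			/* currently boosting readers? */
};

/* Called periodically; now_ms is the current time in milliseconds. */
static bool should_start_boost(struct gp_state *gs, unsigned long now_ms)
{
	if (!gs->boosting && now_ms - gs->gp_start_ms > RCU_BOOST_DELAY_MS) {
		gs->boosting = true;	/* boost until this grace period ends */
		return true;
	}
	return false;
}

/* Called when this CPU notices that the grace period has ended. */
static void gp_ended(struct gp_state *gs, unsigned long now_ms)
{
	gs->boosting = false;		/* deboost: the condition is per-GP */
	gs->gp_start_ms = now_ms;	/* next grace period times out afresh */
}
```

The fuzziness Paul describes is precisely that `gp_ended()` fires at different times on different CPUs, so "the moment the grace period ends" is not a single global instant.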
And I cannot claim to have thought it through thoroughly, for that matter.

> > The reason I am not all that excited about taking this approach is that
> > it doesn't help worst-case latency.
>
> Well, not running at the top most prio does help those tasks running at
> a higher priority, so in that regard it does reduce the jitter for a
> number of tasks.

By default, boosting is to RT prio 1, so it shouldn't bother most RT
processes.

> Also, I guess there's the whole question of what prio to boost to which
> I somehow totally forgot about, which is a non-trivial thing in its own
> right, since there isn't really someone blocked on grace period
> completion (although in the special case of someone calling sync_rcu it
> is clear).

I am not all that excited about synchronize_rcu() controlling the boost
priority, but having synchronize_rcu_expedited() do so might make sense.
But I would want someone to come up with a situation needing this first.
Other than that, it is similar to working out what priority softirq
should run at in PREEMPT_RT.

> > Plus the current implementation is just a less-precise approximation.
> > (Sorry, couldn't resist!)
>
> Appreciated, on a similar note I still need to actually look at all this
> (preempt) tree-rcu stuff to learn how exactly it works.

And I do need to document it.  For one thing, I usually find a few bugs
when I do that.  For another, the previous documentation is getting
quite dated.

							Thanx, Paul
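The "wait for the next grace period rather than the current one" approximation discussed in the thread can be sketched with a completed-grace-period counter.  This is a hypothetical illustration, not the in-tree RCU code: the idea is to snapshot the counter when boosting starts and deboost only after it has advanced by two, so that even a CPU with a stale view of the current grace period's end is safely covered, at the cost of boosting through one extra grace period.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: "completed" counts grace periods known to have
 * finished.  Requiring an advance of two guarantees that the grace
 * period that was in progress at snapshot time has ended on all CPUs,
 * trading worst-case boost duration for a simple deboost rule.
 */
static bool safe_to_deboost(unsigned long completed_now,
			    unsigned long completed_at_boost)
{
	/* Unsigned subtraction behaves correctly across counter wrap. */
	return completed_now - completed_at_boost >= 2;
}
```

This matches Paul's caveat that the approach only bounds average behavior: waiting an extra grace period does nothing for worst-case latency.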