From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752376AbbF2H5G (ORCPT ); Mon, 29 Jun 2015 03:57:06 -0400
Received: from casper.infradead.org ([85.118.1.10]:39509 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752113AbbF2H44 (ORCPT );
	Mon, 29 Jun 2015 03:56:56 -0400
Date: Mon, 29 Jun 2015 09:56:46 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Oleg Nesterov , tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk,
	torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150629075645.GD19282@twins.programming.kicks-ass.net>
References: <20150624164200.GP3644@twins.programming.kicks-ass.net>
	<20150624171004.GG3717@linux.vnet.ibm.com>
	<20150624175830.GS3644@twins.programming.kicks-ass.net>
	<20150625032303.GO3717@linux.vnet.ibm.com>
	<20150625110734.GX3644@twins.programming.kicks-ass.net>
	<20150625134726.GR3717@linux.vnet.ibm.com>
	<20150625142011.GU19282@twins.programming.kicks-ass.net>
	<20150625145133.GT3717@linux.vnet.ibm.com>
	<20150626123207.GZ19282@twins.programming.kicks-ass.net>
	<20150626161415.GY3717@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150626161415.GY3717@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 26, 2015 at 09:14:28AM -0700, Paul E. McKenney wrote:
> > To me it just makes more sense to have a single RCU state machine. With
> > expedited we'll push it as fast as we can, but no faster.
>
> Suppose that someone invokes synchronize_sched_expedited(), but there
> is no normal grace period in flight.
> Then each CPU will note its own
> quiescent state, but when it later might have tried to push it up the
> tree, it will see that there is no grace period in effect, and will
> therefore not bother.

Right, I did mention the force grace period machinery to make sure we
start one before poking :-)

> OK, we could have synchronize_sched_expedited() tell the grace-period
> kthread to start a grace period if one was not already in progress.

I had indeed forgotten that got farmed out to the kthread; on which, my
poor desktop seems to have spent ~140 minutes of its (most recent)
existence poking RCU things.

    7 root      20   0       0      0      0 S   0.0  0.0  56:34.66 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0  20:58.19 rcuos/0
    9 root      20   0       0      0      0 S   0.0  0.0  18:50.75 rcuos/1
   10 root      20   0       0      0      0 S   0.0  0.0  18:30.62 rcuos/2
   11 root      20   0       0      0      0 S   0.0  0.0  17:33.24 rcuos/3
   12 root      20   0       0      0      0 S   0.0  0.0   2:43.54 rcuos/4
   13 root      20   0       0      0      0 S   0.0  0.0   3:00.31 rcuos/5
   14 root      20   0       0      0      0 S   0.0  0.0   3:09.27 rcuos/6
   15 root      20   0       0      0      0 S   0.0  0.0   2:52.98 rcuos/7

Which is almost as much time as my konsole:

 2853 peterz    20   0  586240 103664  41848 S   1.0  0.3 147:39.50 konsole

Which seems somewhat excessive. But who knows.

> OK, the grace-period kthread could tell synchronize_sched_expedited()
> when it has finished initializing the grace period, though this is
> starting to get a bit on the Rube Goldberg side. But this -still- is
> not good enough, because even though the grace-period kthread has fully
> initialized the new grace period, the individual CPUs are unaware of it.

Right, so over the weekend -- I had postponed reading this rather long
email for I was knackered -- I had figured that because we trickle the
GP completion up, you probably equally trickle the GP start down of
sorts and there might be 'interesting' things there.

> And they will therefore continue to ignore any quiescent state that they
> encounter, because they cannot prove that it actually happened after
> the start of the current grace period.

Right, badness :-)

Although here I'll once again go ahead and say something ignorant; how
come that's a problem? Surely if we know the kthread thing has finished
starting a GP, any one CPU issuing a full memory barrier (as would be
implied by switching to the stop worker) must then indeed observe that
global state, due to that transitivity thing?

That is, I'm having a wee bit of bother seeing how you'd need
manipulation of global variables as you allude to below.

> But this -still- isn't good enough, because
> idle CPUs never will become aware of the new grace period -- by design,
> as they are supposed to be able to sleep through an arbitrary number of
> grace periods.

Yes, I'm sure. Waking up seems like a serializing experience though; but
I suppose that's not good enough if we wake up right before we force
start the GP.

> I feel like there is a much easier way, but cannot yet articulate it.
> I came across a couple of complications and a blind alley with it thus
> far, but it still looks promising. I expect to be able to generate
> actual code for it within a few days, but right now it is just weird
> abstract shapes in my head. (Sorry, if I knew how to describe them,
> I could just write the code! When I do write the code, it will probably
> seem obvious and trivial, that being the usual outcome...)

Hehe, glad to have been of help :-)
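
The bookkeeping problem from the quoted text (a CPU only credits a
quiescent state to a grace period it already knows has started) can be
sketched as a toy model. This is not kernel code and makes no claim about
how RCU is actually implemented: GlobalState, Cpu, start_gp(),
note_new_gp() and quiescent_state() are hypothetical names invented purely
for illustration, and memory ordering is not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class GlobalState:
    """Hypothetical stand-in for the global grace-period (GP) state."""
    gp_seq: int = 0               # number of the most recently started GP
    gp_in_progress: bool = False  # True once a GP has been started


@dataclass
class Cpu:
    """Hypothetical per-CPU view; an idle CPU never calls note_new_gp()."""
    seen_gp: int = 0  # newest GP this CPU has noticed

    def note_new_gp(self, g: GlobalState) -> None:
        # Models the GP start "trickling down" to this CPU.
        self.seen_gp = g.gp_seq

    def quiescent_state(self, g: GlobalState) -> bool:
        # A quiescent state is only counted if this CPU already knows the
        # current GP has started; otherwise it cannot prove the ordering.
        return g.gp_in_progress and self.seen_gp == g.gp_seq


def start_gp(g: GlobalState) -> None:
    # What the grace-period kthread would do in this toy model.
    g.gp_seq += 1
    g.gp_in_progress = True


def demo() -> list:
    g, cpu = GlobalState(), Cpu()
    results = [cpu.quiescent_state(g)]      # no GP in flight: ignored
    start_gp(g)
    results.append(cpu.quiescent_state(g))  # GP started, CPU unaware: ignored
    cpu.note_new_gp(g)
    results.append(cpu.quiescent_state(g))  # CPU has seen the GP: counted
    return results
```

In this model, starting a grace period on behalf of an expedited request
is not enough by itself: until note_new_gp() has run on a given CPU, its
quiescent states are discarded, and an idle CPU never runs it.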