Date: Thu, 25 Jun 2015 13:07:34 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
    riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150625110734.GX3644@twins.programming.kicks-ass.net>
In-Reply-To: <20150625032303.GO3717@linux.vnet.ibm.com>

On Wed, Jun 24, 2015 at 08:23:17PM -0700, Paul E. McKenney wrote:
> Here is what I had in mind, where you don't have any global trashing
> except when the ->expedited_sequence gets updated.  Passes mild
> rcutorture testing.
>
> 	/*
> +	 * Each pass through the following loop works its way
> +	 * up the rcu_node tree, returning if others have done the
> +	 * work or otherwise falls through holding the root rnp's
> +	 * ->exp_funnel_mutex.  The mapping from CPU to rcu_node structure
> +	 * can be inexact, as it is just promoting locality and is not
> +	 * strictly needed for correctness.
> 	 */
> +	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> +	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
> +		if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone1, s))
> 			return;
> +		mutex_lock(&rnp0->exp_funnel_mutex);
> +		if (rnp1)
> +			mutex_unlock(&rnp1->exp_funnel_mutex);
> +		rnp1 = rnp0;
> +	}
> +	rnp0 = rnp1;  /* rcu_get_root(rsp), AKA root rcu_node structure. */
> +	if (sync_sched_exp_wd(rsp, rnp0, &rsp->expedited_workdone2, s))
> +		return;

I'm still somewhat confused by the whole strictly ordered sequence vs.
this non-ordered 'polling' of global state.

This funnel thing basically waits a random amount of time, depending on
the contention on these mutexes, and tries again, ultimately serializing
on the root funnel mutex (see the boiled-down sketch below).

So on the one hand you have to strictly order these expedited callers,
but then you don't want to actually process them in order. If 'by magic'
you manage to process the 3rd in the queue, you can drop the 2nd because
it will have waited long enough. OTOH the 2nd will have waited too long.

You also do not take the actual RCU state machine into account -- this
is a parallel state machine.

Can't we integrate the force-quiescent-state machinery with the
expedited machinery -- that is, instead of building parallel state, use
the expedited thing to push the regular machinery forward?
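Just to make sure we are talking about the same pattern, here it is
boiled down to a freestanding userspace toy: pthread mutexes standing in
for ->exp_funnel_mutex, a bare counter for ->expedited_sequence. All the
names below are made up; this is my restatement of the idiom, not your
patch.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct funnel_node {
	pthread_mutex_t lock;
	struct funnel_node *parent;		/* NULL at the root */
};

static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long done_seq;		/* highest sequence already completed */

static bool work_done(unsigned long snap)
{
	bool done;

	pthread_mutex_lock(&seq_lock);
	done = done_seq >= snap;
	pthread_mutex_unlock(&seq_lock);
	return done;
}

/* Returns true if we did the work ourselves, false if somebody beat us. */
static bool funnel_and_do_work(struct funnel_node *leaf, unsigned long snap)
{
	struct funnel_node *held = NULL, *node;
	bool did_work = false;

	for (node = leaf; node; node = node->parent) {
		if (work_done(snap)) {		/* somebody else finished it */
			if (held)
				pthread_mutex_unlock(&held->lock);
			return false;
		}
		pthread_mutex_lock(&node->lock);
		if (held)
			pthread_mutex_unlock(&held->lock);
		held = node;			/* hand-over-hand towards the root */
	}

	/* Only one caller at a time gets here, holding the root's lock. */
	if (!work_done(snap)) {
		pthread_mutex_lock(&seq_lock);
		done_seq = snap;		/* stand-in for the actual work */
		pthread_mutex_unlock(&seq_lock);
		did_work = true;
	}
	pthread_mutex_unlock(&held->lock);
	return did_work;
}

int main(void)
{
	struct funnel_node root = { .parent = NULL };
	struct funnel_node leaf = { .parent = &root };

	pthread_mutex_init(&root.lock, NULL);
	pthread_mutex_init(&leaf.lock, NULL);

	printf("first caller did the work:  %d\n", funnel_and_do_work(&leaf, 1));
	printf("second caller did the work: %d\n", funnel_and_do_work(&leaf, 1));
	return 0;
}

A caller that loses a race at some intermediate node simply sits on that
node's mutex for however long its current owner needs, which is where
the random wait above comes from; only the caller that reaches the root
with the work still undone actually does it.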
We can use the stop_machine calls to force the local RCU state forward;
after all, we _know_ we just made a context switch into the stopper
thread. All we need to do is disable interrupts to hold off the tick
(which normally drives the state machine) and unconditionally advance
our state.

If we use the regular GP machinery, we also don't have to strongly order
the callers: just stick them on whatever GP was active when they came in
and let them roll. This allows much better (and more natural) concurrent
processing.
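Something like the below is what I have in mind -- a completely untested
sketch. I'm assuming rcu_sched_qs() (or whatever the right "note a
quiescent state on this CPU" helper is) can be used from the stopper;
sync_sched_exp_stop_cpu() and sync_sched_expedited_push() are made-up
names:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/irqflags.h>
#include <linux/rcupdate.h>
#include <linux/stop_machine.h>

/*
 * Runs in the stopper thread; getting here already implied a context
 * switch on this CPU, so with interrupts off (holding off the tick,
 * which normally drives the state machine) we can note the quiescent
 * state directly.
 */
static int sync_sched_exp_stop_cpu(void *unused)
{
	unsigned long flags;

	local_irq_save(flags);		/* hold off the tick */
	rcu_sched_qs();			/* assumed QS-reporting helper */
	local_irq_restore(flags);
	return 0;
}

/* Push every online CPU through a quiescent state via the stopper. */
static void sync_sched_expedited_push(void)
{
	int cpu;

	get_online_cpus();
	for_each_online_cpu(cpu)
		stop_one_cpu(cpu, sync_sched_exp_stop_cpu, NULL);
	put_online_cpus();
}

The regular GP machinery then notices these quiescent states and
completes the grace period; the expedited path only accelerates the
existing machinery instead of keeping its own parallel bookkeeping.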