Date: Thu, 8 Oct 2015 08:33:51 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com,
	oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
Message-ID: <20151008153351.GC3910@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151006162907.GA12020@linux.vnet.ibm.com>
	<1444148977-14108-1-git-send-email-paulmck@linux.vnet.ibm.com>
	<1444148977-14108-2-git-send-email-paulmck@linux.vnet.ibm.com>
	<20151006202937.GX3604@twins.programming.kicks-ass.net>
	<20151006205850.GW3910@linux.vnet.ibm.com>
	<20151007075114.GW2881@worktop.programming.kicks-ass.net>
	<20151007143325.GF3910@linux.vnet.ibm.com>
	<20151007144024.GI17308@twins.programming.kicks-ass.net>
	<20151007164858.GQ3910@linux.vnet.ibm.com>
	<20151008094933.GK3816@twins.programming.kicks-ass.net>
In-Reply-To: <20151008094933.GK3816@twins.programming.kicks-ass.net>

On Thu, Oct 08, 2015 at 11:49:33AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:48:58AM -0700, Paul E. McKenney wrote:
> 
> > > Some implementation choice requires this barrier upgrade -- and in
> > > another email I suggest it's the whole tree thing: we need to firmly
> > > establish the state of one level before propagating the state up, etc.
> > > 
> > > Now I'm not entirely sure this is fully correct, but it's the best I
> > > could come up with.
> > 
> > It is pretty close.  Ignoring dyntick idle for the moment, things
> > go (very) roughly like this:
> > 
> > o	The RCU grace-period kthread notices that a new grace period
> > 	is needed.  It initializes the tree, which includes acquiring
> > 	every rcu_node structure's ->lock.
> > 
> > o	CPU A notices that there is a new grace period.  It acquires
> > 	the ->lock of its leaf rcu_node structure, which forces full
> > 	ordering against the grace-period kthread.
> 
> If the kthread took _all_ rcu_node locks, then this does not require the
> barrier upgrade because they will share a lock variable.
> 
> > o	Some time later, that CPU A realizes that it has passed
> > 	through a quiescent state, and again acquires its leaf rcu_node
> > 	structure's ->lock, again enforcing full ordering, but this
> > 	time against all CPUs corresponding to this same leaf rcu_node
> > 	structure that previously noticed quiescent states for this
> > 	same grace period.  Also against all prior readers on this
> > 	same CPU.
> 
> This again reads like the same lock variable is involved, and therefore
> the barrier upgrade is not required for this.
> 
> > o	Some time later, CPU B (corresponding to that same leaf
> > 	rcu_node structure) is the last of that leaf's group of CPUs
> > 	to notice a quiescent state.  It has also acquired that leaf's
> > 	->lock, again forcing ordering against its prior RCU read-side
> > 	critical sections, but also against all the prior RCU
> > 	read-side critical sections of all other CPUs corresponding
> > 	to this same leaf.
> 
> same lock var again..
> 
> > o	CPU B therefore moves up the tree, acquiring the parent
> > 	rcu_node structures' ->lock.  In so doing, it forces full
> > 	ordering against all prior RCU read-side critical sections
> > 	of all CPUs corresponding to all leaf rcu_node structures
> > 	subordinate to the current (non-leaf) rcu_node structure.
> 
> And here we iterate the tree and get another lock var involved, so here
> the barrier upgrade will actually do something.

Yep.  And I am way too lazy to sort out exactly which acquisitions
really truly need smp_mb__after_unlock_lock() and which don't.
Besides, if I tried to sort it out, I would occasionally get it wrong,
and that would be a real pain to debug.

Therefore, I simply do smp_mb__after_unlock_lock() on all acquisitions
of the rcu_node structures' ->lock fields.  I can actually validate
that!  ;-)
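In other words, the coding rule amounts to something like the following
sketch.  (The helper name is made up for illustration -- this is the
shape of the rule, not a quote of the actual tree.c code, and the
irq-disabling variants follow the same pattern.)

	/*
	 * Illustrative only: every acquisition of an rcu_node
	 * structure's ->lock is immediately followed by the barrier
	 * upgrade, so there are no exceptions to reason about.
	 */
	static void example_rcu_node_lock(struct rcu_node *rnp)
	{
		raw_spin_lock(&rnp->lock);
		smp_mb__after_unlock_lock();	/* Upgrade to full barrier. */
	}

The matching raw_spin_unlock(&rnp->lock) is unchanged.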
> > o	And so on, up the tree.
> 
> idem..
> 
> > o	When CPU C reaches the root of the tree, and realizes that
> > 	it is the last CPU to report a quiescent state for the
> > 	current grace period, its acquisition of the root rcu_node
> > 	structure's ->lock has forced full ordering against all
> > 	RCU read-side critical sections that started before this
> > 	grace period -- on all CPUs.
> 
> Right, which makes the full-barrier transitivity thing important.
> 
> > 	CPU C therefore awakens the grace-period kthread.
> > 
> > o	When the grace-period kthread wakes up, it does cleanup,
> > 	which (you guessed it!) requires acquiring the ->lock of
> > 	each rcu_node structure.  This not only forces full ordering
> > 	against each pre-existing RCU read-side critical section,
> > 	it also sets up things so that...
> 
> Again, if it takes _all_ rcu_nodes, it also shares a lock variable and
> hence the upgrade is not required.
> 
> > o	When CPU D notices that the grace period ended, it does so
> > 	while holding its leaf rcu_node structure's ->lock.  This
> > 	forces full ordering against all relevant RCU read-side
> > 	critical sections.  This ordering prevails when CPU D later
> > 	starts invoking RCU callbacks.
> 
> That also does not seem to require the upgrade..
> 
> > Hey, you asked!!!  ;-)
> 
> No, I asked what all the barrier upgrade was for; most of the above does
> not seem to rely on it at all.
> 
> The only place this upgrade matters is the UNLOCK x + LOCK y scenario,
> as also per the comment above smp_mb__after_unlock_lock().
> 
> Any other ordering does not rely on this but on the other primitives,
> and is irrelevant to the barrier upgrade.

I am still keeping an smp_mb__after_unlock_lock() after every ->lock
acquisition.  Trying to track which needs it and which does not is
asking for subtle bugs.
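To spell out the one case where the upgrade really does buy something --
the UNLOCK x + LOCK y scenario you mention -- here is a rough two-lock
sketch.  (The locks "x" and "y" and the variables "a" and "b" are made
up for illustration; this is not RCU code.)

	static DEFINE_RAW_SPINLOCK(x);
	static DEFINE_RAW_SPINLOCK(y);
	static int a, b;

	static void unlock_x_then_lock_y(void)
	{
		raw_spin_lock(&x);
		WRITE_ONCE(a, 1);		/* Access before the UNLOCK. */
		raw_spin_unlock(&x);		/* UNLOCK x. */

		raw_spin_lock(&y);		/* LOCK y, a different lock. */
		smp_mb__after_unlock_lock();	/* Upgrade UNLOCK x + LOCK y. */
		WRITE_ONCE(b, 1);		/* Access after the LOCK. */
		raw_spin_unlock(&y);
	}

Without the smp_mb__after_unlock_lock(), other CPUs are not guaranteed
to see the store to "a" ordered before the store to "b", because UNLOCK
followed by LOCK of a _different_ lock variable is not by itself a full
barrier.  With it, the UNLOCK x + LOCK y combination acts as a full,
transitive memory barrier, which is what the tree iteration relies on
when moving from one rcu_node structure's ->lock to its parent's.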
> > Again, this is a cartoon-like view of the ordering that leaves out a
> > lot of details, but it should get across the gist of the ordering.
> 
> So the ordering I'm interested in is the bit that is provided by the
> barrier upgrade, and that seems very limited and directly pertains to
> the tree iteration, ensuring it's fully separated and transitive.
> 
> So I'll stick to the explanation that the barrier upgrade is purely for
> the tree iteration, to separate and make transitive the tree-level state.

Fair enough, but I will be sticking to the simple coding rule that keeps
RCU out of trouble!

							Thanx, Paul