From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
Date: Thu, 8 Oct 2015 08:33:51 -0700
Message-ID: <20151008153351.GC3910@linux.vnet.ibm.com>
In-Reply-To: <20151008094933.GK3816@twins.programming.kicks-ass.net>

On Thu, Oct 08, 2015 at 11:49:33AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:48:58AM -0700, Paul E. McKenney wrote:
>
> > > Some implementation choice requires this barrier upgrade -- and in
> > > another email I suggest it's the whole tree thing, we need to firmly
> > > establish the state of one level before propagating the state up etc.
> > >
> > > Now I'm not entirely sure this is fully correct, but it's the best I
> > > could come up with.
> >
> > It is pretty close. Ignoring dyntick idle for the moment, things
> > go (very) roughly like this:
> >
> > o The RCU grace-period kthread notices that a new grace period
> > is needed. It initializes the tree, which includes acquiring
> > every rcu_node structure's ->lock.
> >
> > o CPU A notices that there is a new grace period. It acquires
> > the ->lock of its leaf rcu_node structure, which forces full
> > ordering against the grace-period kthread.
>
> If the kthread took _all_ rcu_node locks, then this does not require the
> barrier upgrade because they will share a lock variable.
>
> > o Some time later, that CPU A realizes that it has passed
> > through a quiescent state, and again acquires its leaf rcu_node
> > structure's ->lock, again enforcing full ordering, but this
> > time against all CPUs corresponding to this same leaf rcu_node
> > structure that previously noticed quiescent states for this
> > same grace period. Also against all prior readers on this
> > same CPU.
>
> This again reads like the same lock variable is involved, and therefore
> the barrier upgrade is not required for this.
>
> > o Some time later, CPU B (corresponding to that same leaf
> > rcu_node structure) is the last of that leaf's group of CPUs
> > to notice a quiescent state. It has also acquired that leaf's
> > ->lock, again forcing ordering against its prior RCU read-side
> > critical sections, but also against all the prior RCU
> > read-side critical sections of all other CPUs corresponding
> > to this same leaf.
>
> same lock var again..
>
> > o CPU B therefore moves up the tree, acquiring the parent
> > rcu_node structures' ->lock. In so doing, it forces full
> > ordering against all prior RCU read-side critical sections
> > of all CPUs corresponding to all leaf rcu_node structures
> > subordinate to the current (non-leaf) rcu_node structure.
>
> And here we iterate the tree and get another lock var involved, here the
> barrier upgrade will actually do something.

Yep. And I am way too lazy to sort out exactly which acquisitions really
truly need smp_mb__after_unlock_lock() and which don't. Besides, if I
tried to sort it out, I would occasionally get it wrong, and this would be
a real pain to debug. Therefore, I simply do smp_mb__after_unlock_lock()
on all acquisitions of the rcu_node structures' ->lock fields. I can
actually validate that! ;-)

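
For illustration, the "fence on every acquisition" rule can be sketched in
userspace C. This is a minimal sketch under stated assumptions, not the
kernel code: struct node, node_lock(), node_unlock(), and node_clear_bits()
are hypothetical names, a pthread mutex stands in for the rcu_node ->lock,
and a C11 sequentially consistent fence stands in for
smp_mb__after_unlock_lock(). The point is that callers never take the lock
directly, so no acquisition can miss the fence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an rcu_node structure. */
struct node {
	pthread_mutex_t lock;
	unsigned long qsmask;	/* bit set: that CPU still owes a quiescent state */
	struct node *parent;	/* NULL at the root */
};

/*
 * Acquire the node's lock, then unconditionally issue a full fence.
 * The fence is an assumed stand-in for smp_mb__after_unlock_lock().
 */
static inline void node_lock(struct node *np)
{
	pthread_mutex_lock(&np->lock);
	atomic_thread_fence(memory_order_seq_cst);
}

static inline void node_unlock(struct node *np)
{
	pthread_mutex_unlock(&np->lock);
}

/* Clear the given bits under the lock and return the bits still set. */
static unsigned long node_clear_bits(struct node *np, unsigned long mask)
{
	unsigned long left;

	assert(np != NULL);
	node_lock(np);
	np->qsmask &= ~mask;
	left = np->qsmask;
	node_unlock(np);
	return left;
}
```

Because every path goes through node_lock(), checking the rule reduces to
checking that nothing else calls pthread_mutex_lock() on a node's lock.
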
> > o And so on, up the tree.
>
> idem..
>
> > o When CPU C reaches the root of the tree, and realizes that
> > it is the last CPU to report a quiescent state for the
> > current grace period, its acquisition of the root rcu_node
> > structure's ->lock has forced full ordering against all
> > RCU read-side critical sections that started before this
> > grace period -- on all CPUs.
>
> Right, which makes the full barrier transitivity thing important
>
> > CPU C therefore awakens the grace-period kthread.
>
> > o When the grace-period kthread wakes up, it does cleanup,
> > which (you guessed it!) requires acquiring the ->lock of
> > each rcu_node structure. This not only forces full ordering
> > against each pre-existing RCU read-side critical section,
> > it also sets up things so that...
>
> Again, if it takes _all_ rcu_nodes, it also shares a lock variable and
> hence the upgrade is not required.
>
> > o When CPU D notices that the grace period ended, it does so
> > while holding its leaf rcu_node structure's ->lock. This
> > forces full ordering against all relevant RCU read-side
> > critical sections. This ordering prevails when CPU D later
> > starts invoking RCU callbacks.
>
> Does also not seem to require the upgrade..
>
> > Hey, you asked!!! ;-)
>
> No, I asked what all the barrier upgrade was for, most of the above does
> not seem to rely on that at all.
>
> The only place this upgrade matters is the UNLOCK x + LOCK y scenario,
> as also per the comment above smp_mb__after_unlock_lock().
>
> Any other ordering is not on this but on the other primitives and
> irrelevant to the barrier upgrade.

I am still keeping an smp_mb__after_unlock_lock() after every ->lock.
Trying to track which needs it and which does not is asking for
subtle bugs.

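
The UNLOCK x + LOCK y case shows up when a quiescent-state report walks up
the tree. The following is a rough userspace sketch of that walk, assuming
a hypothetical struct tnode with a pthread mutex in place of ->lock and a
C11 sequentially consistent fence in place of smp_mb__after_unlock_lock().
The names tnode, tnode_lock(), and report_qs() are illustrative only and
the real RCU code handles many more cases.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the rcu_node tree. */
struct tnode {
	pthread_mutex_t lock;
	unsigned long qsmask;	/* children/CPUs still owing a quiescent state */
	unsigned long grpmask;	/* this node's bit in its parent's qsmask */
	struct tnode *parent;	/* NULL at the root */
};

/* Lock plus full fence on every acquisition (assumed fence stand-in). */
static void tnode_lock(struct tnode *np)
{
	pthread_mutex_lock(&np->lock);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Report quiescent states for the bits in mask at node np and propagate
 * upward while each level becomes empty. Returns 1 if this report
 * emptied the root, 0 otherwise.
 */
static int report_qs(struct tnode *np, unsigned long mask)
{
	for (;;) {
		tnode_lock(np);
		np->qsmask &= ~mask;
		if (np->qsmask != 0 || np->parent == NULL) {
			int root_done = (np->qsmask == 0);

			pthread_mutex_unlock(&np->lock);
			return root_done;
		}
		mask = np->grpmask;		/* our bit in the parent */
		pthread_mutex_unlock(&np->lock);	/* UNLOCK x ... */
		np = np->parent;		/* ... LOCK y on the next pass */
	}
}
```

The step from a child to its parent is the only place where the unlock and
the following lock involve different lock variables, which is where the
fence in tnode_lock() does real work in this sketch.
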
> > Again, this is a cartoon-like view of the ordering that leaves out a
> > lot of details, but it should get across the gist of the ordering.
>
> So the ordering I'm interested in, is the bit that is provided by the
> barrier upgrade, and that seems very limited and directly pertains to
> the tree iteration, ensuring it's fully separated and transitive.
>
> So I'll stick to the explanation that the barrier upgrade is purely for the
> tree iteration, to separate and make transitive the tree level state.

Fair enough, but I will be sticking to the simple coding rule that keeps
RCU out of trouble!

							Thanx, Paul