All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	tj@kernel.org, mingo@redhat.com, linux-kernel@vger.kernel.org,
	der.herr@hofr.at, dave@stgolabs.net, riel@redhat.com,
	viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Date: Thu, 25 Jun 2015 07:51:46 -0700	[thread overview]
Message-ID: <20150625145133.GT3717@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150625142011.GU19282@twins.programming.kicks-ass.net>

On Thu, Jun 25, 2015 at 04:20:11PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2015 at 06:47:55AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 25, 2015 at 01:07:34PM +0200, Peter Zijlstra wrote:
> > > I'm still somewhat confused by the whole strict order sequence vs this
> > > non ordered 'polling' of global state.
> > > 
> > > This funnel thing basically waits random times depending on the
> > > contention of these mutexes and tries again. Ultimately serializing on
> > > the root funnel thing.
> > 
> > Not random at all!
> 
> No, they are random per, definition it depends on the amount of
> contention and since that's random, the rest it too.

Not sure how to parse this one.  ;-)

> > The whole funnel is controlled by the root ->exp_funnel_mutex holder,
> > who is going to hold the lock for a single expedited grace period, then
> > release it.  This means that any time a task acquires a lock, there is
> > very likely to have been a recent state change.  Hence the checks after
> > each lock acquisition.
> > 
> > So in the heavy-use case, what tends to happen is that there are one
> > or two expedited grace periods, and then the entire queue of waiters
> > acquiring ->exp_funnel_mutex simply evaporates -- they can make use of
> > the expedited grace period whose completion resulted in their acquisition
> > completing and thus them being awakened.  No fuss, no muss, no unnecessary
> > contention or cache thrashing.
> 
> Plenty of cache trashing, since your 'tree' is not at all cache aligned
> or even remotely coherent with the actual machine topology -- I'll keep
> reminding you :-)

And, as I keep reminding you, if you actually show me system-level data
demonstrating that this is a real problem, I might consider taking some
action.  And also reminding you that in the meantime, you can experiment
by setting the fanout sizes to match a given system and see if it makes
any visible difference.  (Yes, I do understand the odd numbering of
hyperthreads, but you can still run a reasonable experiment.)

> But I must admit that the workings of the sequence thing elided me this
> morning. Yes that's much better than the strict ticket order of before.

OK, good!

> > > You also do not take the actual RCU state machine into account -- this
> > > is a parallel state.
> > > 
> > > Can't we integrate the force quiescent state machinery with the
> > > expedited machinery -- that is instead of building a parallel state, use
> > > the expedited thing to push the regular machine forward?
> > > 
> > > We can use the stop_machine calls to force the local RCU state forward,
> > > after all, we _know_ we just made a context switch into the stopper
> > > thread. All we need to do is disable interrupts to hold off the tick
> > > (which normally drives the state machine) and just unconditionally
> > > advance our state.
> > > 
> > > If we use the regular GP machinery, you also don't have to strongly
> > > order the callers, just stick them on whatever GP was active when they
> > > came in and let them roll, this allows much better (and more natural)
> > > concurrent processing.
> > 
> > That gets quite complex, actually.  Lots of races with the normal grace
> > periods doing one thing or another.
> 
> How so? I'm probably missing several years of RCU trickery and detail
> again, but since we can advance from the tick, we should be able to
> advance from the stop work with IRQs disabled with equal ease.
> 
> And since the stop work and the tick are fully serialized, there cannot
> be any races there.
> 
> And the stop work against other CPUs is the exact same races you already
> had with tick vs tick.
> 
> So please humour me and explain how all this is far more complicated ;-)

Yeah, I do need to get RCU design/implementation documentation put together.

In the meantime, RCU's normal grace-period machinery is designed to be
quite loosely coupled.  The idea is that almost all actions occur locally,
reducing contention and cache thrashing.  But an expedited grace period
needs tight coupling in order to be able to complete quickly.  Making
something that switches between loose and tight coupling in short order
is not at all simple.

> > However, it should be quite easy to go the other way and make the normal
> > grace-period processing take advantage of expedited grace periods that
> > happened to occur at the right time.  I will look into this, thank you
> > for the nudge!
> 
> That should already be happening, right? Since we force context
> switches, the tick driven RCU state machine will observe those and make
> progress -- assuming it was trying to make progress at all of course.

It is to an extent, but I believe that I can do better.  On the other hand,
it is quite possible that this is a 6AM delusion on my part.  ;-)

If it is not a delusion, the eventual solution will likely be a much more
satisfying answer to your "why not merge into the normal RCU grace period
machinery" question.  But I need to complete reworking the expedited
machinery first!

							Thanx, Paul


  reply	other threads:[~2015-06-25 14:52 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-22 12:16 [RFC][PATCH 00/13] percpu rwsem -v2 Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 01/13] rcu: Create rcu_sync infrastructure Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 02/13] rcusync: Introduce struct rcu_sync_ops Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 03/13] rcusync: Add the CONFIG_PROVE_RCU checks Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 04/13] rcusync: Introduce rcu_sync_dtor() Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 05/13] percpu-rwsem: Optimize readers and reduce global impact Peter Zijlstra
2015-06-22 23:02   ` Oleg Nesterov
2015-06-23  7:28   ` Nicholas Mc Guire
2015-06-25 19:08     ` Peter Zijlstra
2015-06-25 19:17       ` Tejun Heo
2015-06-29  9:32         ` Peter Zijlstra
2015-06-29 15:12           ` Tejun Heo
2015-06-29 15:14             ` Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 06/13] percpu-rwsem: Provide percpu_down_read_trylock() Peter Zijlstra
2015-06-22 23:08   ` Oleg Nesterov
2015-06-22 12:16 ` [RFC][PATCH 07/13] sched: Reorder task_struct Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 08/13] percpu-rwsem: DEFINE_STATIC_PERCPU_RWSEM Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 09/13] hotplug: Replace hotplug lock with percpu-rwsem Peter Zijlstra
2015-06-22 22:57   ` Oleg Nesterov
2015-06-23  7:16     ` Peter Zijlstra
2015-06-23 17:01       ` Oleg Nesterov
2015-06-23 17:53         ` Peter Zijlstra
2015-06-24 13:50           ` Oleg Nesterov
2015-06-24 14:13             ` Peter Zijlstra
2015-06-24 15:12               ` Oleg Nesterov
2015-06-24 16:15                 ` Peter Zijlstra
2015-06-28 23:56             ` [PATCH 0/3] percpu-rwsem: introduce percpu_rw_semaphore->recursive mode Oleg Nesterov
2015-06-28 23:56               ` [PATCH 1/3] rcusync: introduce rcu_sync_struct->exclusive mode Oleg Nesterov
2015-06-28 23:56               ` [PATCH 2/3] percpu-rwsem: don't use percpu_rw_semaphore->rw_sem to exclude writers Oleg Nesterov
2015-06-28 23:56               ` [PATCH 3/3] percpu-rwsem: introduce percpu_rw_semaphore->recursive mode Oleg Nesterov
2015-06-22 12:16 ` [RFC][PATCH 10/13] fs/locks: Replace lg_global with a percpu-rwsem Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 11/13] fs/locks: Replace lg_local with a per-cpu spinlock Peter Zijlstra
2015-06-23  0:19   ` Oleg Nesterov
2015-06-22 12:16 ` [RFC][PATCH 12/13] stop_machine: Remove lglock Peter Zijlstra
2015-06-22 22:21   ` Oleg Nesterov
2015-06-23 10:09     ` Peter Zijlstra
2015-06-23 10:55       ` Peter Zijlstra
2015-06-23 11:20         ` Peter Zijlstra
2015-06-23 13:08           ` Peter Zijlstra
2015-06-23 16:36             ` Oleg Nesterov
2015-06-23 17:30             ` Paul E. McKenney
2015-06-23 18:04               ` Peter Zijlstra
2015-06-23 18:26                 ` Paul E. McKenney
2015-06-23 19:05                   ` Paul E. McKenney
2015-06-24  2:23                     ` Paul E. McKenney
2015-06-24  8:32                       ` Peter Zijlstra
2015-06-24  9:31                         ` Peter Zijlstra
2015-06-24 13:48                           ` Paul E. McKenney
2015-06-24 15:01                         ` Paul E. McKenney
2015-06-24 15:34                           ` Peter Zijlstra
2015-06-24  7:35                   ` Peter Zijlstra
2015-06-24  8:42                     ` Ingo Molnar
2015-06-24 13:39                       ` Paul E. McKenney
2015-06-24 13:43                         ` Ingo Molnar
2015-06-24 14:03                           ` Paul E. McKenney
2015-06-24 14:50                     ` Paul E. McKenney
2015-06-24 15:01                       ` Peter Zijlstra
2015-06-24 15:27                         ` Paul E. McKenney
2015-06-24 15:40                           ` Peter Zijlstra
2015-06-24 16:09                             ` Paul E. McKenney
2015-06-24 16:42                               ` Peter Zijlstra
2015-06-24 17:10                                 ` Paul E. McKenney
2015-06-24 17:20                                   ` Paul E. McKenney
2015-06-24 17:29                                     ` Peter Zijlstra
2015-06-24 17:28                                   ` Peter Zijlstra
2015-06-24 17:32                                     ` Peter Zijlstra
2015-06-24 18:14                                     ` Peter Zijlstra
2015-06-24 17:58                                   ` Peter Zijlstra
2015-06-25  3:23                                     ` Paul E. McKenney
2015-06-25 11:07                                       ` Peter Zijlstra
2015-06-25 13:47                                         ` Paul E. McKenney
2015-06-25 14:20                                           ` Peter Zijlstra
2015-06-25 14:51                                             ` Paul E. McKenney [this message]
2015-06-26 12:32                                               ` Peter Zijlstra
2015-06-26 16:14                                                 ` Paul E. McKenney
2015-06-29  7:56                                                   ` Peter Zijlstra
2015-06-30 21:32                                                     ` Paul E. McKenney
2015-07-01 11:56                                                       ` Peter Zijlstra
2015-07-01 15:56                                                         ` Paul E. McKenney
2015-07-01 16:16                                                           ` Peter Zijlstra
2015-07-01 18:45                                                             ` Paul E. McKenney
2015-06-23 14:39         ` Paul E. McKenney
2015-06-23 16:20       ` Oleg Nesterov
2015-06-23 17:24         ` Oleg Nesterov
2015-06-25 19:18           ` Peter Zijlstra
2015-06-22 12:16 ` [RFC][PATCH 13/13] locking: " Peter Zijlstra
2015-06-22 12:36 ` [RFC][PATCH 00/13] percpu rwsem -v2 Peter Zijlstra
2015-06-22 18:11 ` Daniel Wagner
2015-06-22 19:05   ` Peter Zijlstra
2015-06-23  9:35     ` Daniel Wagner
2015-06-23 10:00       ` Ingo Molnar
2015-06-23 14:34       ` Peter Zijlstra
2015-06-23 14:56         ` Daniel Wagner
2015-06-23 17:50           ` Peter Zijlstra
2015-06-23 19:36             ` Peter Zijlstra
2015-06-24  8:46               ` Ingo Molnar
2015-06-24  9:01                 ` Peter Zijlstra
2015-06-24  9:18                 ` Daniel Wagner
2015-07-01  5:57                   ` Daniel Wagner
2015-07-01 21:54                     ` Linus Torvalds
2015-07-02  9:41                       ` Peter Zijlstra
2015-07-20  5:53                         ` Daniel Wagner
2015-07-20 18:44                           ` Linus Torvalds
2015-06-22 20:06 ` Linus Torvalds
2015-06-23 16:10 ` Davidlohr Bueso
2015-06-23 16:21   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150625145133.GT3717@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dave@stgolabs.net \
    --cc=der.herr@hofr.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.