From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
Date: Sun, 8 Mar 2009 21:28:24 -0700	[thread overview]
Message-ID: <20090309042824.GU10625@linux.vnet.ibm.com> (raw)
In-Reply-To: <49B4854C.1010805@cn.fujitsu.com>

On Mon, Mar 09, 2009 at 10:56:12AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Sun, Mar 08, 2009 at 10:58:43AM +0800, Lai Jiangshan wrote:
> >> Paul E. McKenney wrote:
> >>> On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
> >>>> [RFC]
> >>>> I don't like this patch, but I have thought about it for several days
> >>>> and cannot come up with a better one.
> >>>>
> >>>> I very much hope that rcu_barrier() can be called anywhere (in any
> >>>> sleepable context), but get_online_cpus() is a very coarse lock, and
> >>>> it limits rcu_barrier().
> >>>>
> >>>> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
> >>>> void rcu_barrier(void)
> >>>> {
> >>>> 	for each rcu_data {
> >>>> 		lock rcu_data;
> >>>> 		if rcu_data is not empty, queue a callback for rcu_barrier;
> >>>> 		unlock rcu_data;
> >>>> 	}
> >>>> }
> >>>> But we cannot use this algorithm for rcuclassic and rcutree: their
> >>>> rcu_data structures have no spinlock protecting callback queuing.
> >>>>
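For illustration, a minimal C sketch of the per-CPU-lock scheme above might
look as follows; the rcu_data layout and call_rcu_locked() here are
hypothetical, since, as noted, the real rcuclassic/rcutree rcu_data carries
no such lock:

#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/completion.h>
#include <linux/atomic.h>
#include <linux/rcupdate.h>

/* Hypothetical per-CPU state; NOT the real rcuclassic/rcutree layout. */
struct rcu_data {
	spinlock_t	lock;		/* protects cblist */
	struct rcu_head	*cblist;	/* pending callbacks, NULL if none */
	struct rcu_head	barrier_head;	/* pre-allocated barrier callback */
};
static DEFINE_PER_CPU(struct rcu_data, rcu_data);

static DEFINE_MUTEX(rcu_barrier_mutex);	/* one barrier at a time */
static atomic_t rcu_barrier_cpu_count;
static DECLARE_COMPLETION(rcu_barrier_completion);

static void rcu_barrier_callback(struct rcu_head *unused)
{
	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
		complete(&rcu_barrier_completion);
}

void rcu_barrier(void)
{
	int cpu;

	mutex_lock(&rcu_barrier_mutex);
	init_completion(&rcu_barrier_completion);
	/* Start at 1 so no CPU can complete us before the scan finishes. */
	atomic_set(&rcu_barrier_cpu_count, 1);

	for_each_possible_cpu(cpu) {
		struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

		spin_lock(&rdp->lock);
		if (rdp->cblist) {
			/* Non-empty: queue a barrier callback behind it. */
			atomic_inc(&rcu_barrier_cpu_count);
			call_rcu_locked(rdp, &rdp->barrier_head,
					rcu_barrier_callback); /* hypothetical */
		}
		spin_unlock(&rdp->lock);
	}
	/* Drop the initial count and wait for the queued callbacks. */
	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
		complete(&rcu_barrier_completion);
	wait_for_completion(&rcu_barrier_completion);
	mutex_unlock(&rcu_barrier_mutex);
}

The point of the per-rcu_data lock is that a barrier callback can be queued
behind whatever callbacks a CPU holds at that instant, and any migration
would move the barrier callback along with them, so no get_online_cpus()
is needed.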
> >>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>>>
> >>>> CPU hotplug may happen asynchronously, so some RCU callbacks may still
> >>>> be queued on a dead CPU.  rcu_barrier() needs to wait for those
> >>>> callbacks to complete as well, so we must ensure that callbacks on a
> >>>> dead CPU are migrated to an online CPU.
> >>> Hmmm...  I thought that on_each_cpu() took care of interlocking with
> >>> CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
> >>> the RCU callbacks do get migrated from the CPU going offline.  Are you
> >>> seeing a sequence of events that finds a hole in this approach?
> >>>
> >>> Now, if a CPU were to go offline in the middle of smp_call_function()
> >>> there could be trouble, but I was under the impression that the
> >>> preempt_disable() in on_each_cpu() prevented this from happening.
> >>>
> >>> So, please tell me more!
> >>>
> >> preempt_disable() ensures that an online CPU stays online until
> >> preempt_enable(), but a preempt_disable()/preempt_enable() pair cannot
> >> ensure that RCU callbacks have already been migrated.
> >>
> >>
> >> rcu_barrier()			|	_cpu_down()
> >> 				|		__cpu_die() (cpu D is dead)
> >> 	........................|............................
> >> 	on_each_cpu()		|
> >> 	........................|...........................
> >> 	wait_for_completion()	|		rcu_offline_cpu() (move cpu D's
> >> 				|			rcu callbacks to A,B,or C)
> >>
> >>
> >> on_each_cpu() does not queue rcu_barrier_callback on CPU D (it is dead),
> >> so rcu_barrier() will not wait for the callbacks that were originally
> >> queued on CPU D.
> >>
> >> We need to ensure that callbacks on a dead CPU are migrated to an
> >> online CPU before we call on_each_cpu().
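For example, the migration step could be driven from rcu_barrier() itself.
A sketch only: rcu_migrate_dead_cbs() is a hypothetical helper, and the
rest only roughly follows the shape of the existing _rcu_barrier():

void rcu_barrier(void)
{
	mutex_lock(&rcu_barrier_mutex);
	/*
	 * Close the window in the diagram above: pull over any callbacks
	 * still parked on CPUs that have already gone offline, so the
	 * barrier callbacks queued below stand behind them.
	 */
	rcu_migrate_dead_cbs();				/* hypothetical */

	init_completion(&rcu_barrier_completion);
	/* Start at 1 so no early callback completes us prematurely. */
	atomic_set(&rcu_barrier_cpu_count, 1);
	on_each_cpu(rcu_barrier_func, NULL, 1);		/* queue one cb per CPU */
	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
		complete(&rcu_barrier_completion);
	wait_for_completion(&rcu_barrier_completion);
	mutex_unlock(&rcu_barrier_mutex);
}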
> > 
> > Good catch!!!  I did indeed miss that possibility.  :-/
> > 
> > Hmmmm...  rcu_barrier() already acquires a global mutex, and is an
> > infrequent operation, so I am not all that worried about the scalability.
> 
> I do not worry about the scalability either.  What I worry about is that
> when we use get_online_cpus(), rcu_barrier() can no longer be called
> from just any sleepable context.
>
> Most locks in the kernel are acquired after cpu_hotplug.lock, so a path
> that already holds one of those locks cannot call get_online_cpus()
> without risking an ABBA deadlock.  So if we use get_online_cpus() in
> rcu_barrier(), we cannot use rcu_barrier() in most areas of the kernel.

Yes, that could be painful.

> > But I agree that there should be a better way to do this.  One approach
> > might be to have the dying CPU enqueue the rcu_barrier() callback on its
> > own list when it goes offline, during the stop_machine() time period.
> > This enqueuing operation would require some care -- it would be necessary
> > to check to see if the callback was already on the list, for example,
> > as well as to properly adjust the rcu_barrier_completion() state.
> > 
> > Of course, it would also be necessary to handle the case where an
> > rcu_barrier() callback was enqueued when there was no rcu_barrier()
> > in flight, preferably by preventing this from happening.
> > 
> > An entirely different approach would be to steal a trick from CPU
> > designers, and use a count of the number of rcu_barrier() calls (this
> > counter could be a single bit).  Have a per-CPU counter of the number
> > of callbacks outstanding for each counter value.  Then rcu_barrier()
> > simply increments the rcu_barrier() counter, and waits until the
> > number of outstanding callbacks corresponding to the old value drops
> > to zero.  This would get rid of the need for rcu_barrier() to enqueue
> > callbacks, preventing the scenario above from arising in the first
> > place.
> 
> Will you implement it in one of these better ways?

I would start after I get the idr crash resolved -- first patch already
done (http://lkml.org/lkml/2009/3/7/133), but problems remain.  So if
you want to get started, I would be happy to review.  Your tail-pointer
trick would be needed to allow the counts to be correctly maintained
when migrating callbacks, of course.
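For concreteness, here is a minimal sketch of the counter-based scheme
described above; every identifier in it is hypothetical.  Each callback is
counted against the barrier generation in force when it is enqueued, and
the count (like the callback itself) would have to follow any migration
off a dying CPU -- which is where the tail-pointer trick comes in:

#include <linux/percpu.h>
#include <linux/mutex.h>
#include <linux/atomic.h>
#include <linux/sched.h>
#include <linux/types.h>

static DEFINE_MUTEX(rcu_barrier_mutex);
static int rcu_barrier_gen;			/* a single bit suffices */
static DEFINE_PER_CPU(atomic_t [2], rcu_cb_outstanding);

/* Call wherever a callback is enqueued; the return value is stored with
 * the callback and travels with it if the callback is migrated. */
static int rcu_cb_account(int cpu)
{
	int gen = rcu_barrier_gen & 1;

	atomic_inc(&per_cpu(rcu_cb_outstanding, cpu)[gen]);
	return gen;
}

/* Call when a callback is invoked; gen is the value from enqueue time. */
static void rcu_cb_done(int cpu, int gen)
{
	atomic_dec(&per_cpu(rcu_cb_outstanding, cpu)[gen]);
}

void rcu_barrier(void)
{
	int old, cpu;
	bool busy;

	mutex_lock(&rcu_barrier_mutex);
	old = rcu_barrier_gen & 1;
	rcu_barrier_gen++;	/* new callbacks count against the new bit */

	/* No callbacks of our own to enqueue: just wait for the old
	 * generation's per-CPU counts to drain to zero everywhere. */
	do {
		busy = false;
		for_each_possible_cpu(cpu)
			if (atomic_read(&per_cpu(rcu_cb_outstanding, cpu)[old]))
				busy = true;
		if (busy)
			schedule_timeout_uninterruptible(1);
	} while (busy);
	mutex_unlock(&rcu_barrier_mutex);
}

Since rcu_barrier() no longer enqueues anything, the dead-CPU scenario
above cannot leave it waiting on a callback that was never queued.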

							Thanx, Paul

Thread overview: 12+ messages
2009-03-07 10:54 [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu Lai Jiangshan
2009-03-07 17:29 ` Paul E. McKenney
2009-03-08  2:58   ` Lai Jiangshan
2009-03-08  6:20     ` Paul E. McKenney
2009-03-09  2:56       ` Lai Jiangshan
2009-03-09  4:28         ` Paul E. McKenney [this message]
2009-03-08 16:00 ` Ingo Molnar
2009-03-19  3:06   ` Lai Jiangshan
2009-03-19  4:05     ` Paul E. McKenney
     [not found]       ` <20090319082237.GA32179@elte.hu>
2009-03-20  9:40         ` Lai Jiangshan
2009-03-20 20:00           ` [tip:core/rcu] rcu: " Lai Jiangshan
2009-03-30 22:12           ` Lai Jiangshan
