* [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
@ 2009-03-07 10:54 Lai Jiangshan
  2009-03-07 17:29 ` Paul E. McKenney
  2009-03-08 16:00 ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-07 10:54 UTC (permalink / raw)
  To: Ingo Molnar, Paul E. McKenney, Peter Zijlstra, LKML

[RFC]
I don't like this patch, but I have thought about it for some days and
cannot come up with a better one.

I would very much like rcu_barrier() to be callable anywhere (in any
sleepable context). But get_online_cpus() is a very coarse lock, and it
limits rcu_barrier().

We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
void rcu_barrier(void)
{
	for each rcu_data {
		lock rcu_data;
		if rcu_data is not empty, queue a callback for rcu_barrier;
		unlock rcu_data;
	}
}
But we cannot use this algorithm for rcuclassic and rcutree:
their rcu_data structures have no spinlock for queuing callbacks.
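
In C-like form, the idea would look roughly like the sketch below.
(Illustration only: the rcu_data layout and the
queue_rcu_barrier_callback() helper are invented here; they do not
exist in rcuclassic or rcutree.)

	/* Hypothetical per-CPU data with a lock guarding its callback list. */
	struct rcu_data_sketch {
		spinlock_t	lock;
		struct rcu_head	*cblist;	/* pending callbacks, NULL if empty */
	};
	static DEFINE_PER_CPU(struct rcu_data_sketch, rcu_data_sketch);

	void rcu_barrier_sketch(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			struct rcu_data_sketch *rdp = &per_cpu(rcu_data_sketch, cpu);

			spin_lock(&rdp->lock);
			/* Only CPUs that actually hold callbacks need a barrier callback. */
			if (rdp->cblist)
				queue_rcu_barrier_callback(rdp);	/* invented helper */
			spin_unlock(&rdp->lock);
		}
		/* ...then wait for the queued barrier callbacks, as _rcu_barrier() does. */
	}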

From: Lai Jiangshan <laijs@cn.fujitsu.com>

cpu hotplug may happen asynchronously; some rcu callbacks may still
be on a dead cpu. rcu_barrier() also needs to wait for these rcu
callbacks to complete, so we must ensure that callbacks on a dead cpu
are migrated to an online cpu.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cae8a05..4665d18 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -129,6 +129,8 @@ static void rcu_barrier_func(void *type)
 static void _rcu_barrier(enum rcu_barrier type)
 {
 	BUG_ON(in_interrupt());
+	/* Ensure callbacks in dead cpu are migrated to online cpu */
+	get_online_cpus();
 	/* Take cpucontrol mutex to protect against CPU hotplug */
 	mutex_lock(&rcu_barrier_mutex);
 	init_completion(&rcu_barrier_completion);
@@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
 	mutex_unlock(&rcu_barrier_mutex);
+	put_online_cpus();
 }
 
 /**



* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-07 10:54 [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu Lai Jiangshan
@ 2009-03-07 17:29 ` Paul E. McKenney
  2009-03-08  2:58   ` Lai Jiangshan
  2009-03-08 16:00 ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2009-03-07 17:29 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, Peter Zijlstra, LKML

On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
> [RFC]
> I don't like this patch, but I have thought about it for some days and
> cannot come up with a better one.
> 
> I would very much like rcu_barrier() to be callable anywhere (in any
> sleepable context). But get_online_cpus() is a very coarse lock, and it
> limits rcu_barrier().
> 
> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
> void rcu_barrier(void)
> {
> 	for each rcu_data {
> 		lock rcu_data;
> 		if rcu_data is not empty, queue a callback for rcu_barrier;
> 		unlock rcu_data;
> 	}
> }
> But we cannot use this algorithm for rcuclassic and rcutree:
> their rcu_data structures have no spinlock for queuing callbacks.
> 
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> cpu hotplug may happen asynchronously; some rcu callbacks may still
> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
> callbacks to complete, so we must ensure that callbacks on a dead cpu
> are migrated to an online cpu.

Hmmm...  I thought that on_each_cpu() took care of interlocking with
CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
the RCU callbacks do get migrated from the CPU going offline.  Are you
seeing a sequence of events that finds a hole in this approach?

Now, if a CPU were to go offline in the middle of smp_call_function()
there could be trouble, but I was under the impression that the
preempt_disable() in on_each_cpu() prevented this from happening.
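
For reference, on_each_cpu() at that time looked roughly like the
following (a simplified sketch from memory, not the exact kernel/smp.c
code):

	int on_each_cpu(void (*func)(void *info), void *info, int wait)
	{
		int ret;

		preempt_disable();	/* holds off stop_machine(): no CPU can complete going offline */
		ret = smp_call_function(func, info, wait);	/* all other online CPUs */
		local_irq_disable();
		func(info);		/* ...and this CPU */
		local_irq_enable();
		preempt_enable();
		return ret;
	}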

So, please tell me more!

							Thanx, Paul

> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index cae8a05..4665d18 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -129,6 +129,8 @@ static void rcu_barrier_func(void *type)
>  static void _rcu_barrier(enum rcu_barrier type)
>  {
>  	BUG_ON(in_interrupt());
> +	/* Ensure callbacks in dead cpu are migrated to online cpu */
> +	get_online_cpus();
>  	/* Take cpucontrol mutex to protect against CPU hotplug */
>  	mutex_lock(&rcu_barrier_mutex);
>  	init_completion(&rcu_barrier_completion);
> @@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
>  		complete(&rcu_barrier_completion);
>  	wait_for_completion(&rcu_barrier_completion);
>  	mutex_unlock(&rcu_barrier_mutex);
> +	put_online_cpus();
>  }
> 
>  /**
> 


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-07 17:29 ` Paul E. McKenney
@ 2009-03-08  2:58   ` Lai Jiangshan
  2009-03-08  6:20     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-08  2:58 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Peter Zijlstra, LKML

Paul E. McKenney wrote:
> On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
>> [RFC]
>> I don't like this patch, but I have thought about it for some days and
>> cannot come up with a better one.
>>
>> I would very much like rcu_barrier() to be callable anywhere (in any
>> sleepable context). But get_online_cpus() is a very coarse lock, and it
>> limits rcu_barrier().
>>
>> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
>> void rcu_barrier(void)
>> {
>> 	for each rcu_data {
>> 		lock rcu_data;
>> 		if rcu_data is not empty, queue a callback for rcu_barrier;
>> 		unlock rcu_data;
>> 	}
>> }
>> But we cannot use this algorithm for rcuclassic and rcutree:
>> their rcu_data structures have no spinlock for queuing callbacks.
>>
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>
>> cpu hotplug may happen asynchronously; some rcu callbacks may still
>> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
>> callbacks to complete, so we must ensure that callbacks on a dead cpu
>> are migrated to an online cpu.
> 
> Hmmm...  I thought that on_each_cpu() took care of interlocking with
> CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
> the RCU callbacks do get migrated from the CPU going offline.  Are you
> seeing a sequence of events that finds a hole in this approach?
> 
> Now, if a CPU were to go offline in the middle of smp_call_function()
> there could be trouble, but I was under the impression that the
> preempt_disable() in on_each_cpu() prevented this from happening.
> 
> So, please tell me more!
> 

preempt_disable() ensures that an online cpu stays online until
preempt_enable(), but preempt_disable()/preempt_enable() cannot ensure
that the rcu callbacks have been migrated.


rcu_barrier()			|	_cpu_down()
				|		__cpu_die() (cpu D is dead)
	........................|............................
	on_each_cpu()		|
	........................|...........................
	wait_for_completion()	|		rcu_offline_cpu() (move cpu D's
				|			rcu callbacks to A,B,or C)


on_each_cpu() does not queue rcu_barrier_callback() on cpu D (it is dead),
so rcu_barrier() will not wait for the callbacks that were originally
on cpu D.

We need to ensure that callbacks on the dead cpu are migrated to an
online cpu before we call on_each_cpu().

Thanks, Lai.


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-08  2:58   ` Lai Jiangshan
@ 2009-03-08  6:20     ` Paul E. McKenney
  2009-03-09  2:56       ` Lai Jiangshan
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2009-03-08  6:20 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, Peter Zijlstra, LKML

On Sun, Mar 08, 2009 at 10:58:43AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
> >> [RFC]
> >> I don't like this patch, but I have thought about it for some days and
> >> cannot come up with a better one.
> >>
> >> I would very much like rcu_barrier() to be callable anywhere (in any
> >> sleepable context). But get_online_cpus() is a very coarse lock, and it
> >> limits rcu_barrier().
> >>
> >> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
> >> void rcu_barrier(void)
> >> {
> >> 	for each rcu_data {
> >> 		lock rcu_data;
> >> 		if rcu_data is not empty, queue a callback for rcu_barrier;
> >> 		unlock rcu_data;
> >> 	}
> >> }
> >> But we cannot use this algorithm for rcuclassic and rcutree:
> >> their rcu_data structures have no spinlock for queuing callbacks.
> >>
> >> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>
> >> cpu hotplug may happen asynchronously; some rcu callbacks may still
> >> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
> >> callbacks to complete, so we must ensure that callbacks on a dead cpu
> >> are migrated to an online cpu.
> > 
> > Hmmm...  I thought that on_each_cpu() took care of interlocking with
> > CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
> > the RCU callbacks do get migrated from the CPU going offline.  Are you
> > seeing a sequence of events that finds a hole in this approach?
> > 
> > Now, if a CPU were to go offline in the middle of smp_call_function()
> > there could be trouble, but I was under the impression that the
> > preempt_disable() in on_each_cpu() prevented this from happening.
> > 
> > So, please tell me more!
> > 
> 
> preempt_disable() ensures that an online cpu stays online until
> preempt_enable(), but preempt_disable()/preempt_enable() cannot ensure
> that the rcu callbacks have been migrated.
> 
> 
> rcu_barrier()			|	_cpu_down()
> 				|		__cpu_die() (cpu D is dead)
> 	........................|............................
> 	on_each_cpu()		|
> 	........................|...........................
> 	wait_for_completion()	|		rcu_offline_cpu() (move cpu D's
> 				|			rcu callbacks to A,B,or C)
> 
> 
> on_each_cpu() does not queue rcu_barrier_callback() on cpu D (it is dead),
> so rcu_barrier() will not wait for the callbacks that were originally
> on cpu D.
> 
> We need to ensure that callbacks on the dead cpu are migrated to an
> online cpu before we call on_each_cpu().

Good catch!!!  I did indeed miss that possibility.  :-/

Hmmmm...  rcu_barrier() already acquires a global mutex, and is an
infrequent operation, so I am not all that worried about the scalability.

But I agree that there should be a better way to do this.  One approach
might be to have the dying CPU enqueue the rcu_barrier() callback on its
own list when it goes offline, during the stop_machine() time period.
This enqueuing operation would require some care -- it would be necessary
to check whether the callback was already on the list, for example,
as well as to properly adjust the rcu_barrier_completion() state.

Of course, it would also be necessary to handle the case where an
rcu_barrier() callback was enqueued when there was no rcu_barrier()
in flight, preferably by preventing this from happening.

An entirely different approach would be to steal a trick from CPU
designers, and use a count of the number of rcu_barrier() calls (this
counter could be a single bit).  Have a per-CPU counter of the number
of callbacks outstanding for each counter value.  Then rcu_barrier()
simply increments the rcu_barrier() counter, and waits until the
number of outstanding callbacks corresponding to the old value drops
to zero.  This would get rid of the need for rcu_barrier() to enqueue
callbacks, preventing the scenario above from arising in the first
place.
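
A very rough sketch of that counting scheme (every name here is
invented for illustration):

	static atomic_t rcu_barrier_seq;	/* count of rcu_barrier() calls; 1 bit would do */
	static DEFINE_PER_CPU(atomic_t, rcu_cbs_outstanding[2]);	/* per-CPU counts, by parity */

	void rcu_barrier_sketch(void)
	{
		/* Parity of the epoch being retired. */
		int old = (atomic_inc_return(&rcu_barrier_seq) - 1) & 1;
		int cpu, pending;

		do {	/* wait for the old epoch's callbacks to drain */
			pending = 0;
			for_each_possible_cpu(cpu)
				pending += atomic_read(&per_cpu(rcu_cbs_outstanding, cpu)[old]);
			if (pending)
				schedule_timeout_uninterruptible(1);
		} while (pending);
	}

	/*
	 * call_rcu() would atomic_inc() the counter for the current parity
	 * when enqueuing a callback, and the callback-invocation path would
	 * do the matching atomic_dec(); migrating callbacks would have to
	 * move their counts along with them.
	 */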

Other thoughts?

And again, good catch!!!

						Thanx, Paul


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-07 10:54 [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu Lai Jiangshan
  2009-03-07 17:29 ` Paul E. McKenney
@ 2009-03-08 16:00 ` Ingo Molnar
  2009-03-19  3:06   ` Lai Jiangshan
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2009-03-08 16:00 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Paul E. McKenney, Peter Zijlstra, LKML


* Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> [RFC]
> I don't like this patch, but I have thought about it for some days and
> cannot come up with a better one.

Interesting find. Found via code review or via testing? If via 
testing, what is the symptom of the bug when it hits - did you 
see CPU hotplug stress-tests hanging? Crashing too perhaps? How 
frequently did it occur?

It also appears that we introduced this bug around 2.6.26 (when 
the CPU hotplug code was rewritten) - or perhaps we had it even 
before that.

We are late in the -rc's, so i'm trying to scope out whether this 
is a 2.6.29 must-have or whether it can wait for 2.6.30.

	Ingo


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-08  6:20     ` Paul E. McKenney
@ 2009-03-09  2:56       ` Lai Jiangshan
  2009-03-09  4:28         ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-09  2:56 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, Peter Zijlstra, LKML

Paul E. McKenney wrote:
> On Sun, Mar 08, 2009 at 10:58:43AM +0800, Lai Jiangshan wrote:
>> Paul E. McKenney wrote:
>>> On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
>>>> [RFC]
>>>> I don't like this patch, but I have thought about it for some days and
>>>> cannot come up with a better one.
>>>>
>>>> I would very much like rcu_barrier() to be callable anywhere (in any
>>>> sleepable context). But get_online_cpus() is a very coarse lock, and it
>>>> limits rcu_barrier().
>>>>
>>>> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
>>>> void rcu_barrier(void)
>>>> {
>>>> 	for each rcu_data {
>>>> 		lock rcu_data;
>>>> 		if rcu_data is not empty, queue a callback for rcu_barrier;
>>>> 		unlock rcu_data;
>>>> 	}
>>>> }
>>>> But we cannot use this algorithm for rcuclassic and rcutree:
>>>> their rcu_data structures have no spinlock for queuing callbacks.
>>>>
>>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>>
>>>> cpu hotplug may happen asynchronously; some rcu callbacks may still
>>>> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
>>>> callbacks to complete, so we must ensure that callbacks on a dead cpu
>>>> are migrated to an online cpu.
>>> Hmmm...  I thought that on_each_cpu() took care of interlocking with
>>> CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
>>> the RCU callbacks do get migrated from the CPU going offline.  Are you
>>> seeing a sequence of events that finds a hole in this approach?
>>>
>>> Now, if a CPU were to go offline in the middle of smp_call_function()
>>> there could be trouble, but I was under the impression that the
>>> preempt_disable() in on_each_cpu() prevented this from happening.
>>>
>>> So, please tell me more!
>>>
>> preempt_disable() ensures that an online cpu stays online until
>> preempt_enable(), but preempt_disable()/preempt_enable() cannot ensure
>> that the rcu callbacks have been migrated.
>>
>>
>> rcu_barrier()			|	_cpu_down()
>> 				|		__cpu_die() (cpu D is dead)
>> 	........................|............................
>> 	on_each_cpu()		|
>> 	........................|...........................
>> 	wait_for_completion()	|		rcu_offline_cpu() (move cpu D's
>> 				|			rcu callbacks to A,B,or C)
>>
>>
>> on_each_cpu() does not queue rcu_barrier_callback() on cpu D (it is dead),
>> so rcu_barrier() will not wait for the callbacks that were originally
>> on cpu D.
>>
>> We need to ensure that callbacks on the dead cpu are migrated to an
>> online cpu before we call on_each_cpu().
> 
> Good catch!!!  I did indeed miss that possibility.  :-/
> 
> Hmmmm...  rcu_barrier() already acquires a global mutex, and is an
> infrequent operation, so I am not all that worried about the scalability.

I do not worry about scalability either.
What I worry about is that with get_online_cpus(), rcu_barrier() cannot
be called from just anywhere (any sleepable context).

Most locks in the kernel are acquired after cpu_hotplug.lock; if a path
already holds one of those locks, it cannot call get_online_cpus()
(to avoid an ABBA deadlock). So if we use get_online_cpus() in
rcu_barrier(), we cannot use rcu_barrier() in most areas of the kernel.
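
For example, with an invented lock L, the deadlock would look like:

	some kernel path		|	_cpu_down()
	mutex_lock(&L)			|	(holds cpu_hotplug.lock)
	rcu_barrier()			|	a notifier does mutex_lock(&L)
	  get_online_cpus()		|		(waits for L)
	  (waits for cpu_hotplug.lock)	|

Each side holds the lock the other one needs.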

> 
> But I agree that there should be a better way to do this.  One approach
> might be to have the dying CPU enqueue the rcu_barrier() callback on its
> own list when it goes offline, during the stop_machine() time period.
> This enqueuing operation would require some care -- it would be necessary
> to check whether the callback was already on the list, for example,
> as well as to properly adjust the rcu_barrier_completion() state.
> 
> Of course, it would also be necessary to handle the case where an
> rcu_barrier() callback was enqueued when there was no rcu_barrier()
> in flight, preferably by preventing this from happening.
> 
> An entirely different approach would be to steal a trick from CPU
> designers, and use a count of the number of rcu_barrier() calls (this
> counter could be a single bit).  Have a per-CPU counter of the number
> of callbacks outstanding for each counter value.  Then rcu_barrier()
> simply increments the rcu_barrier() counter, and waits until the
> number of outstanding callbacks corresponding to the old value drops
> to zero.  This would get rid of the need for rcu_barrier() to enqueue
> callbacks, preventing the scenario above from arising in the first
> place.
> 

Will you implement one of these better approaches?

Lai


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-09  2:56       ` Lai Jiangshan
@ 2009-03-09  4:28         ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2009-03-09  4:28 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, Peter Zijlstra, LKML

On Mon, Mar 09, 2009 at 10:56:12AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Sun, Mar 08, 2009 at 10:58:43AM +0800, Lai Jiangshan wrote:
> >> Paul E. McKenney wrote:
> >>> On Sat, Mar 07, 2009 at 06:54:38PM +0800, Lai Jiangshan wrote:
> >>>> [RFC]
> >>>> I don't like this patch, but I have thought about it for some days and
> >>>> cannot come up with a better one.
> >>>>
> >>>> I would very much like rcu_barrier() to be callable anywhere (in any
> >>>> sleepable context). But get_online_cpus() is a very coarse lock, and it
> >>>> limits rcu_barrier().
> >>>>
> >>>> We can avoid get_online_cpus() easily for rcupreempt by using a new rcu_barrier:
> >>>> void rcu_barrier(void)
> >>>> {
> >>>> 	for each rcu_data {
> >>>> 		lock rcu_data;
> >>>> 		if rcu_data is not empty, queue a callback for rcu_barrier;
> >>>> 		unlock rcu_data;
> >>>> 	}
> >>>> }
> >>>> But we cannot use this algorithm for rcuclassic and rcutree:
> >>>> their rcu_data structures have no spinlock for queuing callbacks.
> >>>>
> >>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>>>
> >>>> cpu hotplug may happen asynchronously; some rcu callbacks may still
> >>>> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
> >>>> callbacks to complete, so we must ensure that callbacks on a dead cpu
> >>>> are migrated to an online cpu.
> >>> Hmmm...  I thought that on_each_cpu() took care of interlocking with
> >>> CPU hotplug via smp_call_function().  During a CPU-hotplug operation,
> >>> the RCU callbacks do get migrated from the CPU going offline.  Are you
> >>> seeing a sequence of events that finds a hole in this approach?
> >>>
> >>> Now, if a CPU were to go offline in the middle of smp_call_function()
> >>> there could be trouble, but I was under the impression that the
> >>> preempt_disable() in on_each_cpu() prevented this from happening.
> >>>
> >>> So, please tell me more!
> >>>
> >> preempt_disable() ensures that an online cpu stays online until
> >> preempt_enable(), but preempt_disable()/preempt_enable() cannot ensure
> >> that the rcu callbacks have been migrated.
> >>
> >>
> >> rcu_barrier()			|	_cpu_down()
> >> 				|		__cpu_die() (cpu D is dead)
> >> 	........................|............................
> >> 	on_each_cpu()		|
> >> 	........................|...........................
> >> 	wait_for_completion()	|		rcu_offline_cpu() (move cpu D's
> >> 				|			rcu callbacks to A,B,or C)
> >>
> >>
> >> on_each_cpu() does not queue rcu_barrier_callback() on cpu D (it is dead),
> >> so rcu_barrier() will not wait for the callbacks that were originally
> >> on cpu D.
> >>
> >> We need to ensure that callbacks on the dead cpu are migrated to an
> >> online cpu before we call on_each_cpu().
> > 
> > Good catch!!!  I did indeed miss that possibility.  :-/
> > 
> > Hmmmm...  rcu_barrier() already acquires a global mutex, and is an
> > infrequent operation, so I am not all that worried about the scalability.
> 
> I do not worry about scalability either.
> What I worry about is that with get_online_cpus(), rcu_barrier() cannot
> be called from just anywhere (any sleepable context).
> 
> Most locks in the kernel are acquired after cpu_hotplug.lock; if a path
> already holds one of those locks, it cannot call get_online_cpus()
> (to avoid an ABBA deadlock). So if we use get_online_cpus() in
> rcu_barrier(), we cannot use rcu_barrier() in most areas of the kernel.

Yes, that could be painful.

> > But I agree that there should be a better way to do this.  One approach
> > might be to have the dying CPU enqueue the rcu_barrier() callback on its
> > own list when it goes offline, during the stop_machine() time period.
> > This enqueuing operation would require some care -- it would be necessary
> > to check whether the callback was already on the list, for example,
> > as well as to properly adjust the rcu_barrier_completion() state.
> > 
> > Of course, it would also be necessary to handle the case where an
> > rcu_barrier() callback was enqueued when there was no rcu_barrier()
> > in flight, preferably by preventing this from happening.
> > 
> > An entirely different approach would be to steal a trick from CPU
> > designers, and use a count of the number of rcu_barrier() calls (this
> > counter could be a single bit).  Have a per-CPU counter of the number
> > of callbacks outstanding for each counter value.  Then rcu_barrier()
> > simply increments the rcu_barrier() counter, and waits until the
> > number of outstanding callbacks corresponding to the old value drops
> > to zero.  This would get rid of the need for rcu_barrier() to enqueue
> > callbacks, preventing the scenario above from arising in the first
> > place.
> 
> Will you implement one of these better approaches?

I would start after I get the idr crash resolved -- first patch already
done (http://lkml.org/lkml/2009/3/7/133), but problems remain.  So if
you want to get started, I would be happy to review.  Your tail-pointer
trick would be needed to allow the counts to be correctly maintained
when migrating callbacks, of course.

							Thanx, Paul


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-08 16:00 ` Ingo Molnar
@ 2009-03-19  3:06   ` Lai Jiangshan
  2009-03-19  4:05     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-19  3:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Paul E. McKenney, Peter Zijlstra, LKML

Ingo Molnar wrote:
> * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
>> [RFC]
>> I don't like this patch, but I have thought about it for some days and
>> cannot come up with a better one.
> 
> Interesting find. Found via code review or via testing? If via 
> testing, what is the symptom of the bug when it hits - did you 
> see CPU hotplug stress-tests hanging? Crashing too perhaps? How 
> frequently did it occur?

I found this bug when I was testing the draft version of kfree_rcu() (V3).

I noticed that kfree_rcu_cpu_notify() is called earlier than
rcu_cpu_notify(). This means rcu_barrier() is called before the RCU
callbacks are migrated, so I expected a lockup. But the lockup did not
actually occur; when I explored why, I found that rcu_barrier() does not
handle cpu hotplug correctly. There are actually two bugs here.

kfree_rcu() V3 (V4 is available too; it will be sent soon):
http://lkml.org/lkml/2009/3/6/156

The V1 fix of this bug:
http://lkml.org/lkml/2009/3/7/38

The fix for the other bug (it also changes the scheduler's code):
http://lkml.org/lkml/2009/3/7/39

Subject: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu (V2)

cpu hotplug may happen asynchronously; some rcu callbacks may still
be on a dead cpu. rcu_barrier() also needs to wait for these rcu
callbacks to complete, so we must ensure that callbacks on a dead cpu
are migrated to an online cpu.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cae8a05..2c7b845 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -122,6 +122,8 @@ static void rcu_barrier_func(void *type)
 	}
 }
 
+static inline void wait_migrated_callbacks(void);
+
 /*
  * Orchestrate the specified type of RCU barrier, waiting for all
  * RCU callbacks of the specified type to complete.
@@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
 	mutex_unlock(&rcu_barrier_mutex);
+	wait_migrated_callbacks();
 }
 
 /**
@@ -176,9 +179,50 @@ void rcu_barrier_sched(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
+static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
+static struct rcu_head rcu_migrate_head[3];
+static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
+
+static void rcu_migrate_callback(struct rcu_head *notused)
+{
+	if (atomic_dec_and_test(&rcu_migrate_type_count))
+		wake_up(&rcu_migrate_wq);
+}
+
+static inline void wait_migrated_callbacks(void)
+{
+	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
+}
+
+static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
+		unsigned long action, void *hcpu)
+{
+	if (action == CPU_DYING) {
+		/*
+		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
+		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
+		 * returns, all online cpus have queued rcu_barrier_func(),
+		 * and the dead cpu (if it exists) queues rcu_migrate_callback()s.
+		 *
+		 * These callbacks ensure _rcu_barrier() waits for all
+		 * RCU callbacks of the specified type to complete.
+		 */
+		atomic_set(&rcu_migrate_type_count, 3);
+		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
+		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
+		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
+	} else if (action == CPU_POST_DEAD) {
+		/* rcu_migrate_head is protected by cpu_add_remove_lock */
+		wait_migrated_callbacks();
+	}
+
+	return NOTIFY_OK;
+}
+
 void __init rcu_init(void)
 {
 	__rcu_init();
+	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
 }
 
 void rcu_scheduler_starting(void)



* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-19  3:06   ` Lai Jiangshan
@ 2009-03-19  4:05     ` Paul E. McKenney
       [not found]       ` <20090319082237.GA32179@elte.hu>
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2009-03-19  4:05 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, Peter Zijlstra, LKML

On Thu, Mar 19, 2009 at 11:06:39AM +0800, Lai Jiangshan wrote:
> Ingo Molnar wrote:
> > * Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> > 
> >> [RFC]
> >> I don't like this patch, but I have thought about it for some days and
> >> cannot come up with a better one.
> > 
> > Interesting find. Found via code review or via testing? If via 
> > testing, what is the symptom of the bug when it hits - did you 
> > see CPU hotplug stress-tests hanging? Crashing too perhaps? How 
> > frequently did it occur?
> 
> I found this bug when I was testing the draft version of kfree_rcu() (V3).
> 
> I noticed that kfree_rcu_cpu_notify() is called earlier than
> rcu_cpu_notify(). This means rcu_barrier() is called before the RCU
> callbacks are migrated, so I expected a lockup. But the lockup did not
> actually occur; when I explored why, I found that rcu_barrier() does not
> handle cpu hotplug correctly. There are actually two bugs here.
> 
> kfree_rcu() V3 (V4 is available too; it will be sent soon):
> http://lkml.org/lkml/2009/3/6/156
> 
> The V1 fix of this bug:
> http://lkml.org/lkml/2009/3/7/38
> 
> The fix for the other bug (it also changes the scheduler's code):
> http://lkml.org/lkml/2009/3/7/39
> 
> Subject: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu (V2)
> 
> cpu hotplug may happen asynchronously; some rcu callbacks may still
> be on a dead cpu. rcu_barrier() also needs to wait for these rcu
> callbacks to complete, so we must ensure that callbacks on a dead cpu
> are migrated to an online cpu.

Good stuff, Lai!!!  Simpler than any of the approaches that I was
considering, and, better yet, independent of the underlying RCU
implementation!!!

I was initially worried that wake_up() might wake only one of two
possible wait_event()s, namely rcu_barrier() and the CPU_POST_DEAD code,
but the fact that wait_event() clears WQ_FLAG_EXCLUSIVE avoids that issue.
I was also worried about the fact that different RCU implementations have
different mappings of call_rcu(), call_rcu_bh(), and call_rcu_sched(), but
this is OK as well because we just get an extra (harmless) callback in the
case that they map together (for example, Classic RCU has call_rcu_sched()
mapping to call_rcu()).
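
A minimal illustration of that first point, using names from the patch
below (this is not additional code, just the existing pieces laid out):

	/*
	 * Two tasks can sleep on rcu_migrate_wq at the same time: one in
	 * _rcu_barrier()->wait_migrated_callbacks(), and one in the
	 * CPU_POST_DEAD notifier.  Both use wait_event(), which adds a
	 * non-exclusive waiter, so the single wake_up() below wakes both.
	 */
	if (atomic_dec_and_test(&rcu_migrate_type_count))
		wake_up(&rcu_migrate_wq);	/* wakes every non-exclusive waiter */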

Overlap of CPU-hotplug operations is prevented by cpu_add_remove_lock,
and any stray callbacks that arrive (for example, from irq handlers
running on the dying CPU) either are ahead of the CPU_DYING callbacks on
the one hand (and thus accounted for), or happened after the rcu_barrier()
started on the other (and thus don't need to be accounted for).

So...

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index cae8a05..2c7b845 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -122,6 +122,8 @@ static void rcu_barrier_func(void *type)
>  	}
>  }
> 
> +static inline void wait_migrated_callbacks(void);
> +
>  /*
>   * Orchestrate the specified type of RCU barrier, waiting for all
>   * RCU callbacks of the specified type to complete.
> @@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
>  		complete(&rcu_barrier_completion);
>  	wait_for_completion(&rcu_barrier_completion);
>  	mutex_unlock(&rcu_barrier_mutex);
> +	wait_migrated_callbacks();
>  }
> 
>  /**
> @@ -176,9 +179,50 @@ void rcu_barrier_sched(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_barrier_sched);
> 
> +static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
> +static struct rcu_head rcu_migrate_head[3];
> +static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
> +
> +static void rcu_migrate_callback(struct rcu_head *notused)
> +{
> +	if (atomic_dec_and_test(&rcu_migrate_type_count))
> +		wake_up(&rcu_migrate_wq);
> +}
> +
> +static inline void wait_migrated_callbacks(void)
> +{
> +	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
> +}
> +
> +static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
> +		unsigned long action, void *hcpu)
> +{
> +	if (action == CPU_DYING) {
> +		/*
> +		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
> +		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
> +		 * returns, all online cpus have queued rcu_barrier_func(),
> +		 * and the dead cpu (if it exists) queues rcu_migrate_callback()s.
> +		 *
> +		 * These callbacks ensure _rcu_barrier() waits for all
> +		 * RCU callbacks of the specified type to complete.
> +		 */
> +		atomic_set(&rcu_migrate_type_count, 3);
> +		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
> +		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
> +		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
> +	} else if (action == CPU_POST_DEAD) {
> +		/* rcu_migrate_head is protected by cpu_add_remove_lock */
> +		wait_migrated_callbacks();
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
>  void __init rcu_init(void)
>  {
>  	__rcu_init();
> +	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
>  }
> 
>  void rcu_scheduler_starting(void)
> 


* Re: [PATCH] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
       [not found]       ` <20090319082237.GA32179@elte.hu>
@ 2009-03-20  9:40         ` Lai Jiangshan
  2009-03-20 20:00           ` [tip:core/rcu] rcu: " Lai Jiangshan
  2009-03-30 22:12           ` Lai Jiangshan
  0 siblings, 2 replies; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-20  9:40 UTC (permalink / raw)
  To: Ingo Molnar, Paul E. McKenney, Peter Zijlstra, LKML

Ingo Molnar wrote:
> 
> Lai, would you mind resending the full patch with an updated 
> changelog that adds Paul's review tag, against latest -tip?
>

Subject: [PATCH -tip] rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu (V2)

cpu hotplug may happen asynchronously; some rcu callbacks may still
be on a dead cpu. rcu_barrier() also needs to wait for these rcu
callbacks to complete, so we must ensure that callbacks on a dead cpu
are migrated to an online cpu.

Paul E. McKenney's review:

  Good stuff, Lai!!!  Simpler than any of the approaches that I was
  considering, and, better yet, independent of the underlying RCU
  implementation!!!

  I was initially worried that wake_up() might wake only one of two
  possible wait_event()s, namely rcu_barrier() and the CPU_POST_DEAD code,
  but the fact that wait_event() clears WQ_FLAG_EXCLUSIVE avoids that issue.
  I was also worried about the fact that different RCU implementations have
  different mappings of call_rcu(), call_rcu_bh(), and call_rcu_sched(), but
  this is OK as well because we just get an extra (harmless) callback in the
  case that they map together (for example, Classic RCU has call_rcu_sched()
  mapping to call_rcu()).

  Overlap of CPU-hotplug operations is prevented by cpu_add_remove_lock,
  and any stray callbacks that arrive (for example, from irq handlers
  running on the dying CPU) either are ahead of the CPU_DYING callbacks on
  the one hand (and thus accounted for), or happened after the rcu_barrier()
  started on the other (and thus don't need to be accounted for).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cae8a05..2c7b845 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -122,6 +122,8 @@ static void rcu_barrier_func(void *type)
 	}
 }
 
+static inline void wait_migrated_callbacks(void);
+
 /*
  * Orchestrate the specified type of RCU barrier, waiting for all
  * RCU callbacks of the specified type to complete.
@@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
 	mutex_unlock(&rcu_barrier_mutex);
+	wait_migrated_callbacks();
 }
 
 /**
@@ -176,9 +179,50 @@ void rcu_barrier_sched(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
+static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
+static struct rcu_head rcu_migrate_head[3];
+static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
+
+static void rcu_migrate_callback(struct rcu_head *notused)
+{
+	if (atomic_dec_and_test(&rcu_migrate_type_count))
+		wake_up(&rcu_migrate_wq);
+}
+
+static inline void wait_migrated_callbacks(void)
+{
+	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
+}
+
+static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
+		unsigned long action, void *hcpu)
+{
+	if (action == CPU_DYING) {
+		/*
+		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
+		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
+		 * returns, all online cpus have queued rcu_barrier_func(),
+		 * and the dead cpu (if it exists) queues rcu_migrate_callback()s.
+		 *
+		 * These callbacks ensure _rcu_barrier() waits for all
+		 * RCU callbacks of the specified type to complete.
+		 */
+		atomic_set(&rcu_migrate_type_count, 3);
+		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
+		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
+		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
+	} else if (action == CPU_POST_DEAD) {
+		/* rcu_migrate_head is protected by cpu_add_remove_lock */
+		wait_migrated_callbacks();
+	}
+
+	return NOTIFY_OK;
+}
+
 void __init rcu_init(void)
 {
 	__rcu_init();
+	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
 }
 
 void rcu_scheduler_starting(void)



* [tip:core/rcu] rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-20  9:40         ` Lai Jiangshan
@ 2009-03-20 20:00           ` Lai Jiangshan
  2009-03-30 22:12           ` Lai Jiangshan
  1 sibling, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-20 20:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulmck, hpa, mingo, peterz, tglx, laijs, mingo

Commit-ID:  04cb9ac11ba898049c99dc67ede6e68f6d700dee
Gitweb:     http://git.kernel.org/tip/04cb9ac11ba898049c99dc67ede6e68f6d700dee
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Fri, 20 Mar 2009 17:40:06 +0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Mar 2009 19:37:33 +0100

rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu

cpu hotplug may happen asynchronously; some rcu callbacks may still
be on a dead cpu. rcu_barrier() also needs to wait for these rcu
callbacks to complete, so we must ensure that callbacks on a dead cpu
are migrated to an online cpu.

Paul E. McKenney's review:

  Good stuff, Lai!!!  Simpler than any of the approaches that I was
  considering, and, better yet, independent of the underlying RCU
  implementation!!!

  I was initially worried that wake_up() might wake only one of two
  possible wait_event()s, namely rcu_barrier() and the CPU_POST_DEAD code,
  but the fact that wait_event() clears WQ_FLAG_EXCLUSIVE avoids that issue.
  I was also worried about the fact that different RCU implementations have
  different mappings of call_rcu(), call_rcu_bh(), and call_rcu_sched(), but
  this is OK as well because we just get an extra (harmless) callback in the
  case that they map together (for example, Classic RCU has call_rcu_sched()
  mapping to call_rcu()).

  Overlap of CPU-hotplug operations is prevented by cpu_add_remove_lock,
  and any stray callbacks that arrive (for example, from irq handlers
  running on the dying CPU) either are ahead of the CPU_DYING callbacks on
  the one hand (and thus accounted for), or happened after the rcu_barrier()
  started on the other (and thus don't need to be accounted for).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49C36476.1010400@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/rcupdate.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cae8a05..2c7b845 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -122,6 +122,8 @@ static void rcu_barrier_func(void *type)
 	}
 }
 
+static inline void wait_migrated_callbacks(void);
+
 /*
  * Orchestrate the specified type of RCU barrier, waiting for all
  * RCU callbacks of the specified type to complete.
@@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
 	mutex_unlock(&rcu_barrier_mutex);
+	wait_migrated_callbacks();
 }
 
 /**
@@ -176,9 +179,50 @@ void rcu_barrier_sched(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
+static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
+static struct rcu_head rcu_migrate_head[3];
+static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
+
+static void rcu_migrate_callback(struct rcu_head *notused)
+{
+	if (atomic_dec_and_test(&rcu_migrate_type_count))
+		wake_up(&rcu_migrate_wq);
+}
+
+static inline void wait_migrated_callbacks(void)
+{
+	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
+}
+
+static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
+		unsigned long action, void *hcpu)
+{
+	if (action == CPU_DYING) {
+		/*
+		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
+		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
+		 * returns, all online cpus have queued rcu_barrier_func(),
+		 * and the dead cpu (if it exists) queues rcu_migrate_callback()s.
+		 *
+		 * These callbacks ensure _rcu_barrier() waits for all
+		 * RCU callbacks of the specified type to complete.
+		 */
+		atomic_set(&rcu_migrate_type_count, 3);
+		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
+		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
+		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
+	} else if (action == CPU_POST_DEAD) {
+		/* rcu_migrate_head is protected by cpu_add_remove_lock */
+		wait_migrated_callbacks();
+	}
+
+	return NOTIFY_OK;
+}
+
 void __init rcu_init(void)
 {
 	__rcu_init();
+	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
 }
 
 void rcu_scheduler_starting(void)


* [tip:core/rcu] rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu
  2009-03-20  9:40         ` Lai Jiangshan
  2009-03-20 20:00           ` [tip:core/rcu] rcu: " Lai Jiangshan
@ 2009-03-30 22:12           ` Lai Jiangshan
  1 sibling, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2009-03-30 22:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulmck, hpa, mingo, peterz, tglx, laijs, mingo

Commit-ID:  f69b17d7e745d8edd7c0d90390cbaa77e63c5ea3
Gitweb:     http://git.kernel.org/tip/f69b17d7e745d8edd7c0d90390cbaa77e63c5ea3
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Fri, 20 Mar 2009 17:40:06 +0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 31 Mar 2009 00:09:37 +0200

rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu

cpu hotplug may happen asynchronously; some rcu callbacks may still
be on a dead cpu. rcu_barrier() also needs to wait for these rcu
callbacks to complete, so we must ensure that callbacks on a dead cpu
are migrated to an online cpu.

Paul E. McKenney's review:

  Good stuff, Lai!!!  Simpler than any of the approaches that I was
  considering, and, better yet, independent of the underlying RCU
  implementation!!!

  I was initially worried that wake_up() might wake only one of two
  possible wait_event()s, namely rcu_barrier() and the CPU_POST_DEAD code,
  but the fact that wait_event() clears WQ_FLAG_EXCLUSIVE avoids that issue.
  I was also worried about the fact that different RCU implementations have
  different mappings of call_rcu(), call_rcu_bh(), and call_rcu_sched(), but
  this is OK as well because we just get an extra (harmless) callback in the
  case that they map together (for example, Classic RCU has call_rcu_sched()
  mapping to call_rcu()).

  Overlap of CPU-hotplug operations is prevented by cpu_add_remove_lock,
  and any stray callbacks that arrive (for example, from irq handlers
  running on the dying CPU) either are ahead of the CPU_DYING callbacks on
  the one hand (and thus accounted for), or happened after the rcu_barrier()
  started on the other (and thus don't need to be accounted for).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49C36476.1010400@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/rcupdate.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index cae8a05..2c7b845 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -122,6 +122,8 @@ static void rcu_barrier_func(void *type)
 	}
 }
 
+static inline void wait_migrated_callbacks(void);
+
 /*
  * Orchestrate the specified type of RCU barrier, waiting for all
  * RCU callbacks of the specified type to complete.
@@ -147,6 +149,7 @@ static void _rcu_barrier(enum rcu_barrier type)
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
 	mutex_unlock(&rcu_barrier_mutex);
+	wait_migrated_callbacks();
 }
 
 /**
@@ -176,9 +179,50 @@ void rcu_barrier_sched(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
+static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
+static struct rcu_head rcu_migrate_head[3];
+static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
+
+static void rcu_migrate_callback(struct rcu_head *notused)
+{
+	if (atomic_dec_and_test(&rcu_migrate_type_count))
+		wake_up(&rcu_migrate_wq);
+}
+
+static inline void wait_migrated_callbacks(void)
+{
+	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
+}
+
+static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
+		unsigned long action, void *hcpu)
+{
+	if (action == CPU_DYING) {
+		/*
+		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
+		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
+		 * returns, all online cpus have queued rcu_barrier_func(),
+		 * and the dead cpu (if it exists) queues rcu_migrate_callback()s.
+		 *
+		 * These callbacks ensure _rcu_barrier() waits for all
+		 * RCU callbacks of the specified type to complete.
+		 */
+		atomic_set(&rcu_migrate_type_count, 3);
+		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
+		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
+		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
+	} else if (action == CPU_POST_DEAD) {
+		/* rcu_migrate_head is protected by cpu_add_remove_lock */
+		wait_migrated_callbacks();
+	}
+
+	return NOTIFY_OK;
+}
+
 void __init rcu_init(void)
 {
 	__rcu_init();
+	hotcpu_notifier(rcu_barrier_cpu_hotplug, 0);
 }
 
 void rcu_scheduler_starting(void)

