From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
	linux-kernel@vger.kernel.org, cl@linux-foundation.org,
	mingo@elte.hu, akpm@linux-foundation.org, dipankar@in.ibm.com,
	josht@linux.vnet.ibm.com, schamp@sgi.com, niv@us.ibm.com,
	dvhltc@us.ibm.com, ego@in.ibm.com, rostedt@goodmis.org,
	peterz@infradead.org, benh@kernel.crashing.org,
	davem@davemloft.net, tony.luck@intel.com
Subject: Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU implementation
Date: Sun, 31 Aug 2008 10:20:01 -0700	[thread overview]
Message-ID: <20080831172001.GA7015@linux.vnet.ibm.com> (raw)
In-Reply-To: <48BA7944.8070402@colorfullife.com>

On Sun, Aug 31, 2008 at 12:58:12PM +0200, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>>
>>> Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet 
>>> how to handle read-side critical sections in CPU_DYING handlers.
>>> Interrupts after CPU_DYING could be handled by rcu_irq_enter(), 
>>> rcu_irq_exit() [yes, they exist on x86: the arch code enables the local 
>>> interrupts in order to process the currently queued interrupts]
>>>     
>>
>> My feeling is that CPU online/offline will be quite rare, so it should
>> be OK to clean up after the races in force_quiescent_state(), which in
>> this version is called every three ticks in a given grace period.
>>   
> If cpu offline calls can fail, then the problem appears to be 
> unsolvable:
> If I get it right, the offlining process looks like this:
> * one cpu in the system makes the CPU_DOWN_PREPARE notifier call. These 
> calls can sleep (e.g. slab sleeps on semaphores). The cpu that goes offline 
> is still alive, still doing arbitrary work. cpu_quiet calls on behalf of 
> the cpu would be wrong.
> * stop_machine: all cpus schedule to a special kernel thread [1], only the 
> dying cpu runs.
> * The cpu that goes offline calls the CPU_DYING notifiers.
> * __cpu_disable(): The cpu that goes offline checks whether it's possible 
> to take the cpu offline. At least on i386, this can fail.
> On success:
> * at least on i386: the cpu that goes offline handles outstanding 
> interrupts. I'm not sure, perhaps even softirqs are handled.
> * the cpu stops handling interrupts.
> * stop machine leaves, the remaining cpus continue their work.

As I understand it, this is the point where the dying CPU disables
interrupts and removes itself from the online masks.  Though I would
feel better if there was an smp_mb() after the last local_irq_disable()
and before the remove_cpu_from_maps()!
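
Something like the following ordering is what I have in mind
(illustrative sketch only, not actual kernel code):

	/* Offline path, as seen by the dying CPU.  Sketch only. */
	local_irq_disable();	/* no more irq handlers, hence no new RCU
				 * read-side critical sections on this CPU */
	smp_mb();		/* order prior RCU reads before other CPUs
				 * can see our online bit clear */
	cpu_clear(cpu, cpu_online_map);	/* now safe to treat us as quiet */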

> * The CPU_DEAD notifiers are called. They can sleep.
> On failure:
> * all cpus continue their work. call_rcu, synchronize_rcu(), ...
> * some time later: the CPU_DOWN_FAILED callbacks are called.
>
> Is that description correct?

Gautham?

> Then:
> - treating a cpu as always quiet after the rcu notifier was called with 
> CPU_DOWN_PREPARE is wrong: the target cpu still runs normal code: user 
> space, kernel space, interrupts, whatever. The target cpu still accepts 
> interrupts, thus treating it as "normal" should work.

Indeed!  My current code doesn't declare them offline until the CPU_DEAD
notifiers are called.  And force_quiescent_state() does not consider
them to be offline until after they have cleared their bit in
cpu_online_map, which does not happen until the outgoing CPU has
disabled interrupts, at least on x86.  So my current code should be
OK on x86.

It -looks- like stop_cpu() expects to be called with irqs disabled,
but I don't see what would be disabling irqs.  (Don't kthreads normally
start with irqs enabled?)  Ah, I see it -- the stop_cpu() threads
sequence through a state machine, and one of the states disables
irqs for everyone.
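
For reference, that state machine has roughly the following shape
(paraphrased from memory for illustration only -- see
kernel/stop_machine.c for the real code):

	enum stopmachine_state {
		STOPMACHINE_NONE,
		STOPMACHINE_PREPARE,
		STOPMACHINE_DISABLE_IRQ,	/* every CPU disables irqs here */
		STOPMACHINE_RUN,		/* only the active CPU runs */
		STOPMACHINE_EXIT,
	};

	static void stop_cpu_loop(enum stopmachine_state *state)
	{
		enum stopmachine_state curstate = STOPMACHINE_NONE;

		/* Each stopped CPU spins, acting on each state change. */
		do {
			cpu_relax();
			if (*state != curstate) {
				curstate = *state;
				if (curstate == STOPMACHINE_DISABLE_IRQ)
					local_irq_disable();
			}
		} while (curstate != STOPMACHINE_EXIT);
	}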

So the only problem would occur in architectures that re-enable irqs
in the middle of __cpu_disable(), as x86 does (but x86 correctly orders
the clearing of the cpu_online_mask bit, so is OK).  This of course
has the added benefit that irq handlers aren't running on a CPU that
is marked offline.

Checking other architectures:

o	ARM arch/arm/kernel/smp.c __cpu_disable() does not re-enable
	irqs, so is OK.

!	arch/ia64/kernel/smpboot.c __cpu_disable() clears itself
	from cpu_online_map before flushing pending irqs, which
	might include RCU read-side critical sections.  I believe that
	the "cpu_clear(cpu, cpu_online_map)" must move to after the
	"fixup_irqs()".

o	arch/powerpc/kernel/smp.c __cpu_disable() does not disable
	irqs directly, but calls subarch-specific functions noted
	below.

o	arch/powerpc/platforms/powermac/smp.c smp_core99_cpu_disable()
	does not appear to re-enable irqs, so should be OK.

!	arch/powerpc/kernel/smp.c generic_cpu_disable() clears itself
	from cpu_online_map before invoking fixup_irqs(), which
	momentarily enables irqs.  I believe that the "cpu_clear(cpu,
	cpu_online_map)" must move to after the "fixup_irqs()".

?	arch/powerpc/platforms/pseries/hotplug-cpu.c pseries_cpu_disable()
	clears itself from the cpu_online_mask before calling
	xics_migrate_irqs_away().  This function rejects already-pending
	irqs, then redirects future irqs.  Not clear to me what happens
	if an irq arrives between the reject and the immediately
	following removal from the global interrupt queue.

o	arch/s390/kernel/smp.c __cpu_disable() does not reenable irqs
	so is OK.

!	arch/sparc64/kernel/smp.c __cpu_disable() clears its bit before
	re-enabling interrupts.  I believe that the "cpu_clear(cpu,
	cpu_online_map)" needs to happen after the local_irq_disable().

?	include/asm-parisc/smp.h __cpu_disable() just returns without
	doing anything.  This means pa-risc does not support hotplug
	CPU?  If so, no problem.

I am sending (untested) patches separately for the amusement of the
arch maintainers.  In each case, the fix has roughly the shape sketched
below.
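
(Sketch only -- the feasibility checks and the fixup_irqs() details are
arch-specific, so no one arch's code looks exactly like this.)

	/* Common fix pattern: flush pending irqs while this CPU is still
	 * marked online, and only then clear our bit in cpu_online_map
	 * with irqs disabled. */
	int __cpu_disable(void)
	{
		int cpu = smp_processor_id();

		/* arch-specific feasibility checks; may return -EBUSY */

		fixup_irqs();		/* may briefly enable irqs, so our
					 * cpu_online_map bit must still be set */
		local_irq_disable();
		cpu_clear(cpu, cpu_online_map);	/* only now appear offline */
		return 0;
	}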

> __cpu_disable() success:
> - after CPU_DYING, a cpu is either in an interrupt or outside read-side 
> critical sections. Parallel synchronize_rcu() calls are impossible until 
> the cpu is dead. call_rcu() is probably possible.
> - The CPU_DEAD notifiers are called. a synchronize_rcu() call before the 
> rcu notifier is called is possible.
> __cpu_disable() failure:
> - CPU_DYING is called, but the cpu remains fully alive. The system comes 
> fully alive again.
> - some time later, CPU_DEAD is called.
>
> With the current CPU_DYING callback, it's impossible to be both 
> deadlock-free and race-free with the given conditions. If __cpu_disable() 
> succeeds, then the cpu must be treated as gone and always idle. If 
> __cpu_disable() fails, then the cpu must be treated as fully there. Doing 
> both things at the same time is impossible. Waiting until CPU_DOWN_FAILED 
> or CPU_DEAD is called is impossible, too: Either synchronize_rcu() in a 
> CPU_DEAD notifier [called before the rcu notifier] would deadlock or 
> read-side critical sections on the not-killed cpu would race.

Assuming that the ordering of processing pending irqs and marking the
CPU offline in cpu_online_mask can be resolved as noted above, it should
work fine -- if a CPU's bit is clear, we can safely ignore it.  The race
can be resolved by checking the CPU's bit in force_quiescent_state().
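
Concretely, the check I have in mind looks something like this (sketch
only -- "rsp" and "cpus_needing_qs" are illustrative names, not the
actual tree-RCU data structures):

	/* When forcing a quiescent state, treat any CPU whose
	 * cpu_online_map bit is clear as already quiescent: the bit is
	 * cleared with irqs disabled, so no RCU read-side critical
	 * sections can remain on that CPU. */
	static void force_quiescent_state_sketch(struct rcu_state *rsp)
	{
		int cpu;

		for_each_cpu_mask(cpu, rsp->cpus_needing_qs) {
			if (!cpu_online(cpu))
				cpu_quiet(cpu, rsp);	/* safely ignore it */
		}
	}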

Or am I missing something?

> What about moving the CPU_DYING notifier calls behind the __cpu_disable() 
> call?
> Any other solutions?

RCU should ignore the CPU_DYING notifier calls -- only the CPU_DEAD.*
calls should be processed for CPUs being offlined.  Right?
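
In other words, something like the following (a sketch of the policy,
not the actual notifier code):

	static int rcu_cpu_notify(struct notifier_block *self,
				  unsigned long action, void *hcpu)
	{
		long cpu = (long)hcpu;

		switch (action) {
		case CPU_DEAD:
		case CPU_DEAD_FROZEN:
			rcu_offline_cpu(cpu);	/* adopt callbacks, mark CPU quiet */
			break;
		default:
			break;	/* CPU_DYING and friends deliberately ignored */
		}
		return NOTIFY_OK;
	}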

> Btw, as far as I can see, rcupreempt would deadlock if a CPU_DEAD notifier 
> uses synchronize_rcu().
> Probably no one will ever succeed in triggering the deadlock:
> - cpu goes offline.
> - the other cpus in the system are restarted.
> - one cpu does the CPU_DEAD notifier calls.
> - before the rcu notifier is called with CPU_DEAD:
> - one CPU_DEAD notifier sleeps.
> - while CPU_DEAD is sleeping: on the same cpu: kmem_cache_destroy is 
> called. get_online_cpus immediately succeeds.
> - kmem_cache_destroy acquires the cache_chain_mutex.
> - kmem_cache_destroy does synchronize_rcu(), it sleeps.
> - CPU_DEAD processing continues, the slab CPU_DEAD notifier tries to 
> acquire the cache_chain_mutex. It sleeps, too.
> --> deadlock, because the already dead cpu will never signal itself as 
> quiet. Thus synchronize_rcu() will never succeed, thus the slab CPU_DEAD 
> notifier will never return, thus rcu_offline_cpu() is never called.

It is entirely possible that rcu_try_flip_waitack() and
rcu_try_flip_waitmb() need to check the AND of rcu_cpu_online_map and
cpu_online_map.  If this really is a problem (and it might well be),
then the easiest fix is to check for cpu_is_offline(cpu) in both
rcu_try_flip_waitmb_needed() and rcu_try_flip_waitack_needed(), and to
do so in both versions (dynticks and non-dynticks) of each function.
Thoughts?
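
That is, roughly (sketch of the easy fix; the real functions take their
arguments and per-CPU state from rcupreempt.c):

	static int rcu_try_flip_waitack_needed(int cpu)
	{
		if (cpu_is_offline(cpu))
			return 0;	/* offline CPUs cannot be in RCU
					 * read-side critical sections, so
					 * never wait on them */
		/* existing dynticks/non-dynticks checks go here */
		return 1;
	}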

> --
>    Manfred
> [1] open question: with rcu_preempt, is it possible that these cpus could 
> be inside read side critical sections?

Yes, they could.  Well, tasks that were previously running on them
might be preempted or blocked waiting on locks while still in RCU
read-side critical sections.  However, when a given CPU goes offline,
rcu_preempt moves that CPU's counters to some surviving CPU.  So RCU
knows that these tasks are still in RCU read-side critical sections,
and will therefore wait for them.
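
The counter motion is conceptually something like this (sketch with
illustrative names -- rcu_flipctr is per-CPU state along the lines of
rcupreempt.c, and rcu_counter_lock is hypothetical):

	/* Fold the outgoing CPU's read-side counters into a surviving
	 * CPU's, so that preempted or blocked readers are still waited
	 * for after the CPU disappears. */
	static void rcu_move_flipctrs(int dying_cpu, int surviving_cpu)
	{
		unsigned long flags;
		int i;

		spin_lock_irqsave(&rcu_counter_lock, flags);
		for (i = 0; i < 2; i++) {	/* both halves of the flip counter */
			per_cpu(rcu_flipctr, surviving_cpu)[i] +=
				per_cpu(rcu_flipctr, dying_cpu)[i];
			per_cpu(rcu_flipctr, dying_cpu)[i] = 0;
		}
		spin_unlock_irqrestore(&rcu_counter_lock, flags);
	}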

							Thanx, Paul
