* Lost IPIs during CPU Hotplug
@ 2013-05-20 12:37 Heiko Carstens
From: Heiko Carstens @ 2013-05-20 12:37 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Shaohua Li, Peter Zijlstra
  Cc: linux-kernel

I just got a dump from a system running a 3.0.something kernel; I think
the problem exists in current kernels as well.

The test case was an I/O-intensive workload combined with cpu hotplug stress.

When trying to bring a cpu online we got an endless loop on the cpu that
issued the cpu_up and called smp_call_function_single() within its cpu
hotplug notifier:

(cpu 1)
		generic_exec_single at 19230c
 #0 [7edd77b98] smp_call_function_single at 19261c
 #1 [7edd77c00] iucv_cpu_notify at 4e2896
 #2 [7edd77c50] notifier_call_chain at 4eaf7e
 #3 [7edd77ca8] __raw_notifier_call_chain at 17e342
 #4 [7edd77d00] _cpu_up at 4dfcde
 #5 [7edd77d60] cpu_up at 4dfde2
 #6 [7edd77d88] store_online at 4dde2e
 #7 [7edd77db8] sysfs_write_file at 2e7284
 #8 [7edd77e10] vfs_write at 260434
 #9 [7edd77e70] sys_write at 260608
#10 [7edd77eb8] sysc_noemu at 4e8d80

generic_exec_single() was told to send an IPI to cpu 3. Afterwards it
busy-loops, waiting for a confirmation (wait == 1) that never arrives:
[...]
	if (wait)
		csd_lock_wait(csd);
[...]

Looking at the dump, cpu 3 never actually received the IPI. The reason
seems to be that generic_exec_single() only sends an IPI to the remote cpu
if that cpu's call_single_queue is empty; otherwise it assumes some other
cpu has already sent one or is about to. The dump, however, shows 108 (!)
pending IPIs for cpu 3, all of them "trigger_softirq" requests, so no IPI
was sent.
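
For reference, a simplified sketch of the 3.0-era kernel/smp.c logic being
described (reconstructed here for illustration; the exact upstream code
differs in details):

	static void csd_lock_wait(struct call_single_data *csd)
	{
		/* spin until the target cpu has run the callback and
		 * unlocked the csd */
		while (csd->flags & CSD_FLAG_LOCK)
			cpu_relax();
	}

	static void generic_exec_single(int cpu, struct call_single_data *csd,
					int wait)
	{
		struct call_single_queue *dst = &per_cpu(call_single_queue, cpu);
		unsigned long flags;
		int ipi;

		raw_spin_lock_irqsave(&dst->lock, flags);
		/* only the cpu that makes the queue non-empty sends the IPI;
		 * everybody else assumes one is already on its way */
		ipi = list_empty(&dst->list);
		list_add_tail(&csd->list, &dst->list);
		raw_spin_unlock_irqrestore(&dst->lock, flags);

		if (ipi)
			arch_send_call_function_single_ipi(cpu);

		if (wait)
			csd_lock_wait(csd);
	}

With 108 stale entries already on cpu 3's list, list_empty() is false, so
no IPI is sent, the callback never runs, and csd_lock_wait() spins forever.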

It looks to me like the IPI(s) were lost when cpu 3 was brought down earlier:

While stop_machine_cpu_stop() executed its state machine to offline cpu 3,
it may have happened that, on the transition to the STOPMACHINE_DISABLE_IRQ
state, cpu 3 disabled irqs before cpu 1 did. With its irqs still enabled,
cpu 1 could then have received an interrupt whose bottom half sent an IPI
to cpu 3, which never saw it before finally going offline.
(Besides, there is no run-time guarantee of when an IPI will be seen by
 the receiving cpu anyway.)
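
In other words, an interleaving roughly like this (hypothetical
reconstruction of the race):

	cpu 1 (irqs still enabled)           cpu 3 (going down)
	--------------------------           ------------------
	                                     state -> STOPMACHINE_DISABLE_IRQ
	                                     local_irq_disable()
	takes an interrupt; its bottom
	half queues a csd on cpu 3's
	call_single_queue and sends the IPI
	local_irq_disable()                  IPI pended but never delivered
	...                                  cpu 3 goes offline; the queued
	                                     csd entries stay behind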

So we end up with cpu 3 offline but with a pending IPI and a non-empty
call_single_queue, which suppresses any further IPIs to cpu 3 once it
comes online again.

So it looks to me like the generic smp code needs yet another CPU_DYING
cpu hotplug notifier, one which looks for pending IPIs on the cpu being
brought down and executes them; a rough sketch follows below.
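
Something along these lines, perhaps (completely untested sketch; the
notifier name is made up). CPU_DYING runs on the dying cpu itself, with
irqs disabled inside stop_machine, so draining the queue directly from
the notifier should be safe:

	/* untested sketch -- hotplug_flush_csd is a made-up name */
	static int hotplug_flush_csd(struct notifier_block *nfb,
				     unsigned long action, void *hcpu)
	{
		switch (action & ~CPU_TASKS_FROZEN) {
		case CPU_DYING:
			/* run everything still queued on this cpu's
			 * call_single_queue before the cpu disappears */
			generic_smp_call_function_single_interrupt();
			break;
		}
		return NOTIFY_OK;
	}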

Does that make sense?



* Re: Lost IPIs during CPU Hotplug
@ 2013-05-21 14:52 Heiko Carstens
From: Heiko Carstens @ 2013-05-21 14:52 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Thomas Gleixner, Andrew Morton,
	Linus Torvalds, Shaohua Li, Peter Zijlstra, linux-kernel

On Mon, May 20, 2013 at 02:37:43PM +0200, Heiko Carstens wrote:
> I just got a dump from a system running a 3.0.something kernel; I think
> the problem exists in current kernels as well.
> 
> The test case was an I/O-intensive workload combined with cpu hotplug stress.
> 
> When trying to bring a cpu online we got an endless loop on the cpu that
> issued the cpu_up and called smp_call_function_single() within its cpu
> hotplug notifier:

[...]

> It looks to me like the IPI(s) were lost when cpu 3 was brought down earlier:

[...]

> So it looks to me like the generic smp code needs yet another CPU_DYING
> cpu hotplug notifier, one which looks for pending IPIs on the cpu being
> brought down and executes them.
> 
> Does that make sense?

Ok, I was able to reproduce it. The fix should go into the s390-specific
arch code, in __cpu_disable(), just before the cpu removes itself from the
cpu online mask.
No idea why this has never been seen before.
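
I.e. conceptually something like this (sketch of the idea only, not the
actual patch; the surrounding s390 specifics of __cpu_disable() are
omitted):

	int __cpu_disable(void)
	{
		/* proposed: execute whatever is still queued on this cpu's
		 * call_single_queue so no stale entries survive the offline */
		generic_smp_call_function_single_interrupt();

		/* existing step: drop out of the cpu online mask */
		set_cpu_online(smp_processor_id(), false);
		return 0;
	}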

