linux-kernel.vger.kernel.org archive mirror
* [patch, 2.6.10-rc2-mm3, x86] fix reboot hang / APIC errors
@ 2004-11-26 10:40 Ingo Molnar
From: Ingo Molnar @ 2004-11-26 10:40 UTC
  To: Andrew Morton; +Cc: linux-kernel


The attached patch fixes a reboot hang on SMP/x86 systems: in certain
circumstances the system hangs, printing an endless stream of:

 APIC error on CPU1: 00(04)
 APIC error on CPU1: 04(04)
 APIC error on CPU1: 04(04)

The bug is this: sys_reboot() calls device_shutdown(), which calls each
registered driver and shuts it down. One of them is lapic0, which is
lethal: it disables the local APIC on the currently executing CPU.
This is buggy in two ways: 1) there's no guarantee that the reboot
process won't migrate to another CPU at this point (it's still a fully
functioning kernel), and 2) if another CPU tries to send a cross-CPU
IPI after this CPU's local APIC has been disabled, that other CPU will
get an infinite stream of APIC error interrupts, locking the system up
in a flood of messages. The reboot never happens.
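A minimal userspace sketch of the failure mode, under loudly stated
assumptions: the model_* names, the two-CPU setup and the three IPI
attempts are all hypothetical, and an IPI aimed at a CPU whose local
APIC is disabled is modelled simply as one APIC error on the sending
CPU per attempt (the real system sees an endless stream). It shows the
buggy order: a device_shutdown()-time handler disables the APIC of
whichever CPU runs it while the other CPU is still live.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2

struct model_cpu {
	bool apic_enabled;        /* local APIC still accepts IPIs */
	unsigned int apic_errors; /* APIC errors reported on this CPU */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

static void model_boot(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		cpus[i].apic_enabled = true;
		cpus[i].apic_errors = 0;
	}
}

/* Assumption: an IPI aimed at a disabled local APIC counts as one APIC
 * error on the sending CPU. Returns true if the IPI was accepted. */
static bool model_send_ipi(int from, int to)
{
	assert(from >= 0 && from < MODEL_NR_CPUS);
	assert(to >= 0 && to < MODEL_NR_CPUS);
	if (!cpus[to].apic_enabled) {
		cpus[from].apic_errors++;
		return false;
	}
	return true;
}

/* Stand-in for the lapic class's .shutdown method: it disables the
 * local APIC of whichever CPU happens to execute it. */
static void model_lapic_shutdown(int this_cpu)
{
	cpus[this_cpu].apic_enabled = false;
}

/* Buggy order: the handler runs during device_shutdown() on reboot_cpu,
 * while other_cpu is still live and keeps sending IPIs to it.
 * Returns the number of APIC errors seen on other_cpu. */
static unsigned int model_buggy_reboot(int reboot_cpu, int other_cpu)
{
	model_boot();
	model_lapic_shutdown(reboot_cpu);
	for (int attempt = 0; attempt < 3; attempt++)
		model_send_ipi(other_cpu, reboot_cpu);
	return cpus[other_cpu].apic_errors;
}
```

In this model, model_buggy_reboot(0, 1) leaves CPU0's local APIC
disabled and charges every IPI attempt from CPU1 as an APIC error on
CPU1, which mirrors the "APIC error on CPU1" lines in the log.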

The fix is to go back to what we always did: disable the local APIC
only in the very final moments, as part of the smp_send_stop() logic.
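The fixed ordering can be sketched in a similar hypothetical model
(the model_* names are illustrative, not kernel functions). The
assumption is that the rebooting CPU first stops every other CPU, each
of which goes offline and disables its own local APIC, and only then
disables its own. In the kernel the stop work runs in an IPI handler
on each target CPU; the model simply runs it inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 4

struct model_cpu {
	bool online;              /* still executing kernel code */
	bool apic_enabled;        /* local APIC still accepts IPIs */
	unsigned int apic_errors; /* APIC errors reported on this CPU */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

static void model_boot(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		cpus[i] = (struct model_cpu){ .online = true,
					      .apic_enabled = true,
					      .apic_errors = 0 };
}

/* Only running CPUs send IPIs; an IPI to a disabled APIC is counted
 * as an APIC error on the sender (an assumption of the model). */
static void model_send_ipi(int from, int to)
{
	assert(cpus[from].online);
	if (!cpus[to].apic_enabled)
		cpus[from].apic_errors++;
}

/* Work done by a CPU that receives the stop request: go offline, then
 * disable its own local APIC, so it never sends another IPI. */
static void model_stop_this_cpu(int cpu)
{
	cpus[cpu].online = false;
	cpus[cpu].apic_enabled = false;
}

/* smp_send_stop()-style final step: stop every other CPU while all
 * APICs are still enabled, and only then disable our own APIC. */
static void model_send_stop(int this_cpu)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu == this_cpu || !cpus[cpu].online)
			continue;
		model_send_ipi(this_cpu, cpu);
		model_stop_this_cpu(cpu);
	}
	cpus[this_cpu].apic_enabled = false;
}

static unsigned int model_total_errors(void)
{
	unsigned int sum = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += cpus[i].apic_errors;
	return sum;
}
```

Because every stop IPI is sent while the target APICs are still
enabled, and no stopped CPU sends IPIs afterwards, the model records
zero APIC errors and ends with every local APIC disabled.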

AFAICS, this is a fresh changeset that came in via the ACPI BK tree; it
has not been committed to Linus' tree yet. The bug was found via
PREEMPT_RT, where the race is much more likely to trigger, but there's
no reason why it could not occur in the generic kernel as well.

Similarly, I think it's unfortunate that the IO-APIC driver has a
shutdown handler as well. While I cannot see a bug scenario right now,
I think it's quite fragile to disable all external interrupts
(including the timer interrupt!) at this point. Again, this is best
done in machine_restart() (which already does it).

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/kernel/apic.c.orig
+++ linux/arch/i386/kernel/apic.c
@@ -654,9 +654,12 @@ static int lapic_resume(struct sys_devic
 }
 
 
+/*
+ * This device has no shutdown method - fully functioning local APICs
+ * are needed on every CPU up until machine_halt/restart/poweroff.
+ */
 static struct sysdev_class lapic_sysclass = {
 	set_kset_name("lapic"),
-	.shutdown	= lapic_shutdown,
 	.resume		= lapic_resume,
 	.suspend	= lapic_suspend,
 };
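To illustrate the effect of the hunk above, here is a hypothetical
model of a shutdown walk over sysdev-like classes. It assumes, as is
the usual kernel convention for optional methods, that the walker
simply skips a class whose .shutdown pointer is NULL; model_class and
model_shutdown_all are made-up names, not the sysdev API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a sysdev-like class; not the kernel's struct. */
struct model_class {
	const char *name;
	void (*shutdown)(void); /* optional: may be NULL */
	void (*resume)(void);
	void (*suspend)(void);
};

static int lapic_disabled; /* set once something disables the APIC */

static void model_lapic_shutdown(void) { lapic_disabled = 1; }
static void model_lapic_resume(void) { }
static void model_lapic_suspend(void) { }

/* After the patch: no .shutdown method (left NULL by the initializer). */
static struct model_class lapic_class_patched = {
	.name    = "lapic",
	.resume  = model_lapic_resume,
	.suspend = model_lapic_suspend,
};

/* Before the patch, for comparison. */
static struct model_class lapic_class_orig = {
	.name     = "lapic",
	.shutdown = model_lapic_shutdown,
	.resume   = model_lapic_resume,
	.suspend  = model_lapic_suspend,
};

/* Calls each class's shutdown method, skipping classes that have none.
 * Returns how many shutdown methods were actually called. */
static int model_shutdown_all(struct model_class **classes, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		assert(classes[i] != NULL);
		if (classes[i]->shutdown) {
			classes[i]->shutdown();
			called++;
		}
	}
	return called;
}
```

With the patched class the walk leaves the local APIC alone, so it
stays fully functional until the final machine_halt/restart/poweroff
path disables it.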

