linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
       [not found] <bug-11629-10286@http.bugzilla.kernel.org/>
@ 2008-09-23 21:30 ` Andrew Morton
  2008-09-23 21:50   ` Johannes Berg
                     ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Andrew Morton @ 2008-09-23 21:30 UTC (permalink / raw)
  To: johannes; +Cc: linuxppc-dev, Thomas, Ingo Molnar, Gleixner, bugme-daemon

On Tue, 23 Sep 2008 14:20:55 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11629
> 
>            Summary: quad G5 fails to shut down
>            Product: Platform Specific/Hardware
>            Version: 2.5
>      KernelVersion: 2.6.27-rc6+wireless bits
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PPC-64
>         AssignedTo: anton@samba.org
>         ReportedBy: johannes@sipsolutions.net
> 
> 
> Latest working kernel version: don't remember, pretty sure 2.6.26 worked but
> have been running development kernels forever
> Earliest failing kernel version: n/a, haven't checked, has been failing for a
> while
> Distribution: debian/unstable
> Hardware Environment: quad G5 powermac
> Software Environment: ?
> Problem Description: machine fails to shut down, 
> 
> Steps to reproduce:
> shut down the machine, see it fail.
> 
> http://johannes.sipsolutions.net/files/dsc_2274.jpg
> 

That went splat in the clockevents code.

I saw a pull request with several fixes float past this morning:

From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Andrew Morton <akpm@linux-foundation.org>
Subject: [git pull] timer fixes
Date: Tue, 23 Sep 2008 21:41:00 +0200

Fingers crossed..

I'll mark this as a post-2.6.26 regression, although you seem a bit
wobbly on that.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-23 21:30 ` [Bugme-new] [Bug 11629] New: quad G5 fails to shut down Andrew Morton
@ 2008-09-23 21:50   ` Johannes Berg
  2008-09-23 22:45   ` Johannes Berg
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2008-09-23 21:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc-dev, Thomas Gleixner, Ingo Molnar, bugme-daemon

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

On Tue, 2008-09-23 at 14:30 -0700, Andrew Morton wrote:

> > http://johannes.sipsolutions.net/files/dsc_2274.jpg
> > 
> 
> That went splat in the clockevents code.

Yup.

> I saw a pull request with several fixes float past this morning:
> 
> From: Ingo Molnar <mingo@elte.hu>
> To: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Andrew Morton <akpm@linux-foundation.org>
> Subject: [git pull] timer fixes
> Date: Tue, 23 Sep 2008 21:41:00 +0200
> 
> Fingers crossed..

Ah, cool, I'll pull it in once I've verified this other problem I have
with ftrace on the same box.

> I'll mark this as a post-2.6.26 regression, although you seem a bit
> wobbly on that.

I was fairly sure it was ok with 2.6.26 but after checking it appears
that the crash is a post-2.6.26 regression, but the failure to shutdown
is not. Sorry, I've been following wireless-testing.git on this box for
so long it slipped my mind.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-23 21:30 ` [Bugme-new] [Bug 11629] New: quad G5 fails to shut down Andrew Morton
  2008-09-23 21:50   ` Johannes Berg
@ 2008-09-23 22:45   ` Johannes Berg
  2008-09-23 23:10     ` Johannes Berg
  2008-09-24 10:07   ` Johannes Berg
  2008-09-25  8:56   ` Johannes Berg
  3 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2008-09-23 22:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc-dev, Thomas Gleixner, Ingo Molnar, bugme-daemon

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

On Tue, 2008-09-23 at 14:30 -0700, Andrew Morton wrote:

> > http://johannes.sipsolutions.net/files/dsc_2274.jpg
> > 
> 
> That went splat in the clockevents code.
> 
> I saw a pull request with several fixes float past this morning:
> 
> From: Ingo Molnar <mingo@elte.hu>
> To: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Andrew Morton <akpm@linux-foundation.org>
> Subject: [git pull] timer fixes
> Date: Tue, 23 Sep 2008 21:41:00 +0200
> 
> Fingers crossed..

Well, that does seem to fix the regression. That is, it no longer
crashes. It still doesn't shut down, but that seems older and possibly
partially my fault as I wrote the CPU hotplug implementation (thought it
works fine for hibernate, never mind the fact that resuming from
hibernate hangs)

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-23 22:45   ` Johannes Berg
@ 2008-09-23 23:10     ` Johannes Berg
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2008-09-23 23:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc-dev, Thomas Gleixner, Ingo Molnar, bugme-daemon

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

On Wed, 2008-09-24 at 00:45 +0200, Johannes Berg wrote:

> Well, that does seem to fix the regression. That is, it no longer
> crashes.

Spoke too early, it seems that the crash isn't exactly reproducible all
the time nor happens at the same spot all the time. I'd suspect that the
timer code isn't to blame but something in the IPI code or so.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-23 21:30 ` [Bugme-new] [Bug 11629] New: quad G5 fails to shut down Andrew Morton
  2008-09-23 21:50   ` Johannes Berg
  2008-09-23 22:45   ` Johannes Berg
@ 2008-09-24 10:07   ` Johannes Berg
  2008-09-25  8:56   ` Johannes Berg
  3 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2008-09-24 10:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc-dev, bugme-daemon, Thomas Gleixner, Ingo Molnar

On Tue, 2008-09-23 at 14:30 -0700, Andrew Morton wrote:

> > Steps to reproduce:
> > shut down the machine, see it fail.
> > 
> > http://johannes.sipsolutions.net/files/dsc_2274.jpg
> > 
> 
> That went splat in the clockevents code.

This is still strange, it appears that when the CPUs fail to offline, we
still call into the clockevents code and it rightfully crashes, but I'll
check that out separately.

I've found the problem, below is a tentative fix. The problem has been
around for quite a while, here's the whole commit that "caused" it
(because it's so short):

commit 4047727e5ae33f9b8d2b7766d1994ea6e5ec2991
Author: Mark Lord <lkml@rtr.ca>
Date:   Mon Oct 1 01:20:10 2007 -0700

    Fix SMP poweroff hangs
    
    We need to disable all CPUs other than the boot CPU (usually 0) before
    attempting to power-off modern SMP machines.  This fixes the
    hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
    new toybox.
    
    Signed-off-by: Mark Lord <mlord@pobox.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/kernel/sys.c b/kernel/sys.c
index 1b33b05..8ae2e63 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -32,6 +32,7 @@
 #include <linux/getcpu.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/seccomp.h>
+#include <linux/cpu.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -878,6 +879,7 @@ void kernel_power_off(void)
        kernel_shutdown_prepare(SYSTEM_POWER_OFF);
        if (pm_power_off_prepare)
                pm_power_off_prepare();
+       disable_nonboot_cpus();
        sysdev_shutdown();
        printk(KERN_EMERG "Power down.\n");
        machine_power_off();


And here's the fix.

From: Johannes Berg <johannes@sipsolutions.net>
Subject: powerpc: fix shutdown

I tracked down the shutdown regression to CPUs not dying
when being shut down during power-off. This turns out to
be due to the system_state being SYSTEM_POWER_OFF, which
this code doesn't take as a valid state for shutting off
CPUs in.

This has never made sense to me, but when I added hotplug
code to implement hibernate I only "made it work" and did
not question the need to check the system_state. Thomas
Gleixner helped me dig, but the only thing we found is
that it was added with the original commit that added CPU
hotplug support.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Schopp <jschopp@austin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/idle.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -36,9 +36,7 @@
 #ifdef CONFIG_HOTPLUG_CPU
 /* this is used for software suspend, and that shuts down
  * CPUs even while the system is still booting... */
-#define cpu_should_die()	(cpu_is_offline(smp_processor_id()) && \
-				   (system_state == SYSTEM_RUNNING     \
-				 || system_state == SYSTEM_BOOTING))
+#define cpu_should_die()	cpu_is_offline(smp_processor_id())
 #else
 #define cpu_should_die()	0
 #endif

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-23 21:30 ` [Bugme-new] [Bug 11629] New: quad G5 fails to shut down Andrew Morton
                     ` (2 preceding siblings ...)
  2008-09-24 10:07   ` Johannes Berg
@ 2008-09-25  8:56   ` Johannes Berg
  2008-09-26 19:17     ` Joel Schopp
  3 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2008-09-25  8:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc-dev, bugme-daemon, Thomas Gleixner, Ingo Molnar

I wrote:

> And here's the fix.

Well, that should of course remove that comment there too. New version
below.

From: Johannes Berg <johannes@sipsolutions.net>
Subject: powerpc: fix shutdown

I tracked down the shutdown regression to CPUs not dying
when being shut down during power-off. This turns out to
be due to the system_state being SYSTEM_POWER_OFF, which
this code doesn't take as a valid state for shutting off
CPUs in.

This has never made sense to me, but when I added hotplug
code to implement hibernate I only "made it work" and did
not question the need to check the system_state. Thomas
Gleixner helped me dig, but the only thing we found is
that it was added with the original commit that added CPU
hotplug support.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Schopp <jschopp@austin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/idle.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- everything.orig/arch/powerpc/kernel/idle.c	2008-09-01 12:28:37.000000000 +0200
+++ everything/arch/powerpc/kernel/idle.c	2008-09-25 10:55:45.000000000 +0200
@@ -34,11 +34,7 @@
 #include <asm/smp.h>
 
 #ifdef CONFIG_HOTPLUG_CPU
-/* this is used for software suspend, and that shuts down
- * CPUs even while the system is still booting... */
-#define cpu_should_die()	(cpu_is_offline(smp_processor_id()) && \
-				   (system_state == SYSTEM_RUNNING     \
-				 || system_state == SYSTEM_BOOTING))
+#define cpu_should_die()	cpu_is_offline(smp_processor_id())
 #else
 #define cpu_should_die()	0
 #endif

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bugme-new] [Bug 11629] New: quad G5 fails to shut down
  2008-09-25  8:56   ` Johannes Berg
@ 2008-09-26 19:17     ` Joel Schopp
  0 siblings, 0 replies; 7+ messages in thread
From: Joel Schopp @ 2008-09-26 19:17 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linuxppc-dev, Thomas Gleixner, bugme-daemon, Andrew Morton, Ingo Molnar


> From: Johannes Berg <johannes@sipsolutions.net>
> Subject: powerpc: fix shutdown
>
> I tracked down the shutdown regression to CPUs not dying
> when being shut down during power-off. This turns out to
> be due to the system_state being SYSTEM_POWER_OFF, which
> this code doesn't take as a valid state for shutting off
> CPUs in.
>
> This has never made sense to me, but when I added hotplug
> code to implement hibernate I only "made it work" and did
> not question the need to check the system_state. Thomas
> Gleixner helped me dig, but the only thing we found is
> that it was added with the original commit that added CPU
> hotplug support.
>
> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Joel Schopp <jschopp@austin.ibm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>   
Tested this patch on a Power6 partition with cpu hotplug online/offline 
and some shutdown and reboot cycles.  Didn't see any ill effects.

Acked-by: Joel Schopp <jschopp@austin.ibm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-09-26 19:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-11629-10286@http.bugzilla.kernel.org/>
2008-09-23 21:30 ` [Bugme-new] [Bug 11629] New: quad G5 fails to shut down Andrew Morton
2008-09-23 21:50   ` Johannes Berg
2008-09-23 22:45   ` Johannes Berg
2008-09-23 23:10     ` Johannes Berg
2008-09-24 10:07   ` Johannes Berg
2008-09-25  8:56   ` Johannes Berg
2008-09-26 19:17     ` Joel Schopp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).