* [PATCH] x86/smp: Fix cpuN startup panic
@ 2012-08-07  9:50 Chen, LinX Z
  2012-08-07 16:33 ` Jiang Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Chen, LinX Z @ 2012-08-07  9:50 UTC
  To: linux-kernel; +Cc: mingo, tglx, hpa, yanmin_zhang

From: Lin Chen <linx.z.chen@intel.com>

We hit a panic while running a CPU hotplug test:
<0>[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
<0>[  627.982864]
<4>[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
<4>[  627.982883] Call Trace:
<4>[  627.982903]  [<c18f2977>] panic+0x66/0x16c
<4>[  627.982918]  [<c12234cc>] ? default_get_apic_id+0x1c/0x40
<4>[  627.982931]  [<c18ef96d>] start_secondary+0xda/0x252

While the BSP is bringing up an AP, the BSP may be preempted before it
finishes the AP's STARTUP sequence (setting the AP's bit in
cpu_callout_mask), which leaves the AP busy-waiting for that bit. At
present, the AP waits for 2 seconds and then panics.

This patch lets the AP wait until the BSP finishes the startup sequence,
and emits a WARNING for every 2 seconds that the AP is still waiting.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Lin Chen <linx.z.chen@intel.com>
---
  arch/x86/kernel/smpboot.c |   11 ++++++-----
  1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7c5a8c3..a9e3379 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
  	 * Waiting 2s total for startup (udelay is not yet working)
  	 */
  	timeout = jiffies + 2*HZ;
-	while (time_before(jiffies, timeout)) {
+	while (1) {
  		/*
  		 * Has the boot CPU finished it's STARTUP sequence?
  		 */
  		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
  			break;
  		cpu_relax();
+		if (!time_before(jiffies, timeout)) {
+			WARN(1, "%s: CPU%d started up but did not get a callout!\n",
+					__func__, cpuid);
+			timeout = jiffies + 2*HZ;
+		}
  	}

-	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
-		      __func__, cpuid);
-	}

  	/*
  	 * the boot CPU has finished the init stage and is spinning
-- 
1.7.1


* Re: [PATCH] x86/smp: Fix cpuN startup panic
  2012-08-07  9:50 [PATCH] x86/smp: Fix cpuN startup panic Chen, LinX Z
@ 2012-08-07 16:33 ` Jiang Liu
  2012-08-07 23:20   ` Yanmin Zhang
  0 siblings, 1 reply; 3+ messages in thread
From: Jiang Liu @ 2012-08-07 16:33 UTC
  To: Chen, LinX Z; +Cc: linux-kernel, mingo, tglx, hpa, yanmin_zhang

On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> From: Lin Chen <linx.z.chen@intel.com>
> 
> We hit a panic while doing cpu hotplug test.
> <0>[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
> <0>[  627.982864]
> <4>[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> <4>[  627.982883] Call Trace:
> <4>[  627.982903]  [<c18f2977>] panic+0x66/0x16c
> <4>[  627.982918]  [<c12234cc>] ? default_get_apic_id+0x1c/0x40
> <4>[  627.982931]  [<c18ef96d>] start_secondary+0xda/0x252
> 
> During BSP bootup AP, it is possible that BSP be preempted before
> finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
> AP busy wait for it. At present, AP will wait for 2 seconds then panic.
> 
> This patch let AP waits until BSP finish the startup sequence and gives
> WARNING when BSP is preempted more than 2 seconds.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: Lin Chen <linx.z.chen@intel.com>
> ---
>  arch/x86/kernel/smpboot.c |   11 ++++++-----
>  1 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7c5a8c3..a9e3379 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
>       * Waiting 2s total for startup (udelay is not yet working)
>       */
>      timeout = jiffies + 2*HZ;
> -    while (time_before(jiffies, timeout)) {
> +    while (1) {
Hi Yanmin,

	This seems a little risky. What if a slave CPU can't be booted due to hardware errors?
	Regards!
	Gerry

>          /*
>           * Has the boot CPU finished it's STARTUP sequence?
>           */
>          if (cpumask_test_cpu(cpuid, cpu_callout_mask))
>              break;
>          cpu_relax();
> +        if (!time_before(jiffies, timeout)) {
> +            WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> +                    __func__, cpuid);
> +            timeout = jiffies + 2*HZ;
> +        }
>      }
> 
> -    if (!time_before(jiffies, timeout)) {
> -        panic("%s: CPU%d started up but did not get a callout!\n",
> -              __func__, cpuid);
> -    }
> 
>      /*
>       * the boot CPU has finished the init stage and is spinning



* Re: [PATCH] x86/smp: Fix cpuN startup panic
  2012-08-07 16:33 ` Jiang Liu
@ 2012-08-07 23:20   ` Yanmin Zhang
  0 siblings, 0 replies; 3+ messages in thread
From: Yanmin Zhang @ 2012-08-07 23:20 UTC
  To: Jiang Liu; +Cc: Chen, LinX Z, linux-kernel, mingo, tglx, hpa

On Wed, 2012-08-08 at 00:33 +0800, Jiang Liu wrote:
> On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> > From: Lin Chen <linx.z.chen@intel.com>
> > 
> > We hit a panic while doing cpu hotplug test.
> > <0>[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
> > <0>[  627.982864]
> > <4>[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> > <4>[  627.982883] Call Trace:
> > <4>[  627.982903]  [<c18f2977>] panic+0x66/0x16c
> > <4>[  627.982918]  [<c12234cc>] ? default_get_apic_id+0x1c/0x40
> > <4>[  627.982931]  [<c18ef96d>] start_secondary+0xda/0x252
> > 
> > During BSP bootup AP, it is possible that BSP be preempted before
> > finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
> > AP busy wait for it. At present, AP will wait for 2 seconds then panic.
> > 
> > This patch let AP waits until BSP finish the startup sequence and gives
> > WARNING when BSP is preempted more than 2 seconds.
> > 
> > Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> > Signed-off-by: Lin Chen <linx.z.chen@intel.com>
> > ---
> >  arch/x86/kernel/smpboot.c |   11 ++++++-----
> >  1 files changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7c5a8c3..a9e3379 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
> >       * Waiting 2s total for startup (udelay is not yet working)
> >       */
> >      timeout = jiffies + 2*HZ;
> > -    while (time_before(jiffies, timeout)) {
> > +    while (1) {
> Hi Yanmin,
> 
> 	Seems a little risky, what if a slave CPU can't be booted due to hardware errors?
The loop runs on the slave CPU (the AP). Basically, there is a handshake
between the BSP and the AP, and the patch doesn't change the BSP code. So
when a slave CPU fails to boot, the BSP still goes ahead and the kernel
keeps working.
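
To make the handshake argument concrete, here is a minimal
single-threaded sketch. It assumes the BSP polls for the AP's callin a
bounded number of times and then gives up; struct sim_ap,
sim_bsp_bring_up() and max_polls are hypothetical names used only for
illustration, not kernel code.

```c
#include <stdbool.h>

enum bringup_result { AP_ONLINE, AP_NOT_RESPONDING };

/* Hypothetical model of one AP. */
struct sim_ap {
	bool alive;			/* false: AP never starts (e.g. bad hardware) */
	unsigned int callin_delay;	/* BSP polls that pass before the AP calls in */
};

/*
 * Model the BSP side of the handshake: allow the AP to proceed (callout),
 * then poll a bounded number of times for the AP's answer (callin).
 * The bound lives on the BSP, so it holds however long the AP spins.
 */
static enum bringup_result sim_bsp_bring_up(const struct sim_ap *ap,
					    unsigned int max_polls)
{
	const bool callout = true;	/* BSP has set the AP's callout bit */
	bool callin = false;

	for (unsigned int poll = 0; poll < max_polls; poll++) {
		/* A live AP sees the callout and eventually calls in. */
		if (ap->alive && callout && poll >= ap->callin_delay)
			callin = true;

		if (callin)
			return AP_ONLINE;
	}

	/* The BSP gives up on this AP and carries on with the others. */
	return AP_NOT_RESPONDING;
}
```

In this model a dead AP, or one slower than the BSP's bound, simply comes
back as AP_NOT_RESPONDING, and the BSP continues.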
 
> 	Regards!
> 	Gerry
> 
> >          /*
> >           * Has the boot CPU finished it's STARTUP sequence?
> >           */
> >          if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> >              break;
> >          cpu_relax();
> > +        if (!time_before(jiffies, timeout)) {
> > +            WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> > +                    __func__, cpuid);
> > +            timeout = jiffies + 2*HZ;
> > +        }
> >      }
> > 
> > -    if (!time_before(jiffies, timeout)) {
> > -        panic("%s: CPU%d started up but did not get a callout!\n",
> > -              __func__, cpuid);
> > -    }
> > 
> >      /*
> >       * the boot CPU has finished the init stage and is spinning
> 



