All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 1/2] kexec: return error of machine_kexec() fails
@ 2013-06-12 20:01 Stephen Warren
  2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
  2013-06-21 22:41 ` [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Warren @ 2013-06-12 20:01 UTC (permalink / raw)
  To: linux-arm-kernel

From: Stephen Warren <swarren@nvidia.com>

Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
sys_reboot() would return -EINVAL. This patch restores this behaviour
for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
return.

This situation can occur on ARM, where kexec requires disabling all but
one CPU using CPU hotplug. However, if hotplug isn't supported by the
particular HW the kernel is running on, then kexec cannot succeed.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
v2: New patch.

Eric, Patch 2/2 doesn't really depend on this in particular, except to
avoid returning 0 back to user-space on failure, and thus generating:

kexec failed: Success

Hence, I think you can just apply this to your tree, and Russell will
apply 2/2 to his. If not, perhaps you can ack it so Russell can take
both?
---
 kernel/kexec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 59f7b55..bde1190 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1702,6 +1702,8 @@ int kernel_kexec(void)
 		pm_restore_console();
 		unlock_system_sleep();
 	}
+#else
+	error = -EINVAL;
 #endif
 
  Unlock:
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-12 20:01 [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren
@ 2013-06-12 20:01 ` Stephen Warren
  2013-06-13  1:58   ` zhangfei gao
                     ` (2 more replies)
  2013-06-21 22:41 ` [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren
  1 sibling, 3 replies; 12+ messages in thread
From: Stephen Warren @ 2013-06-12 20:01 UTC (permalink / raw)
  To: linux-arm-kernel

From: Stephen Warren <swarren@nvidia.com>

Add comments to machine_shutdown()/halt()/power_off()/restart() that
describe their purpose and/or requirements re: CPUs being active/not.

In machine_shutdown(), replace the call to smp_send_stop() with a call to
disable_nonboot_cpus(). This completely disables all but one CPU, thus
satisfying the requirement that only a single CPU be active for kexec.
Adjust Kconfig dependencies for this change.

In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
rather than via machine_shutdown(); these functions don't need to
completely de-activate all CPUs using hotplug, but rather just quiesce
them.

Remove smp_kill_cpus(), and its call from smp_send_stop().
smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
smp_ops.cpu_die() on the target CPUs first. At least some implementations
of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
for example. Since smp_send_stop() is only used for shutdown, halt, and
power-off, there is no need to attempt any kind of CPU hotplug here.

Adjust Kconfig to reflect that machine_shutdown() (and hence kexec)
relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee
that hotplug will work, or even that hotplug is implemented for a
particular piece of HW that a multi-platform zImage runs on. Hence, add
error-checking to machine_kexec() to determine whether it did work.

Suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
v2:
* Removed some unnecessary #ifdefs.
* Simplified Kconfig depends for kexec.
* Replace BUG()s with kinder error-checking code.
* Significantly structurally reworked.
---
 arch/arm/Kconfig                |  2 +-
 arch/arm/kernel/machine_kexec.c |  4 ++++
 arch/arm/kernel/process.c       | 43 +++++++++++++++++++++++++++++++++++------
 arch/arm/kernel/smp.c           | 13 -------------
 4 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 42d6ea2..ecab9f7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2028,7 +2028,7 @@ config XIP_PHYS_ADDR
 
 config KEXEC
 	bool "Kexec system call (EXPERIMENTAL)"
-	depends on (!SMP || HOTPLUG_CPU)
+	depends on (!SMP || PM_SLEEP_SMP)
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.  It is like a reboot
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 8ef8c93..4fb074c 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -134,6 +134,10 @@ void machine_kexec(struct kimage *image)
 	unsigned long reboot_code_buffer_phys;
 	void *reboot_code_buffer;
 
+	if (num_online_cpus() > 1) {
+		pr_err("kexec: error: multiple CPUs still online\n");
+		return;
+	}
 
 	page_list = image->head & PAGE_MASK;
 
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 282de48..6e8931c 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -184,30 +184,61 @@ int __init reboot_setup(char *str)
 
 __setup("reboot=", reboot_setup);
 
+/*
+ * Called by kexec, immediately prior to machine_kexec().
+ *
+ * This must completely disable all secondary CPUs; simply causing those CPUs
+ * to execute e.g. a RAM-based pin loop is not sufficient. This allows the
+ * kexec'd kernel to use any and all RAM as it sees fit, without having to
+ * avoid any code or data used by any SW CPU pin loop. The CPU hotplug
+ * functionality embodied in disable_nonboot_cpus() to achieve this.
+ */
 void machine_shutdown(void)
 {
-#ifdef CONFIG_SMP
-	smp_send_stop();
-#endif
+	disable_nonboot_cpus();
 }
 
+/*
+ * Halting simply requires that the secondary CPUs stop performing any
+ * activity (executing tasks, handling interrupts). smp_send_stop()
+ * achieves this.
+ */
 void machine_halt(void)
 {
-	machine_shutdown();
+	smp_send_stop();
+
 	local_irq_disable();
 	while (1);
 }
 
+/*
+ * Power-off simply requires that the secondary CPUs stop performing any
+ * activity (executing tasks, handling interrupts). smp_send_stop()
+ * achieves this. When the system power is turned off, it will take all CPUs
+ * with it.
+ */
 void machine_power_off(void)
 {
-	machine_shutdown();
+	smp_send_stop();
+
 	if (pm_power_off)
 		pm_power_off();
 }
 
+/*
+ * Restart requires that the secondary CPUs stop performing any activity
+ * while the primary CPU resets the system. Systems with a single CPU can
+ * use soft_restart() as their machine descriptor's .restart hook, since that
+ * will cause the only available CPU to reset. Systems with multiple CPUs must
+ * provide a HW restart implementation, to ensure that all CPUs reset at once.
+ * This is required so that any code running after reset on the primary CPU
+ * doesn't have to co-ordinate with other CPUs to ensure they aren't still
+ * executing pre-reset code, and using RAM that the primary CPU's code wishes
+ * to use. Implementing such co-ordination would be essentially impossible.
+ */
 void machine_restart(char *cmd)
 {
-	machine_shutdown();
+	smp_send_stop();
 
 	arm_pm_restart(reboot_mode, cmd);
 
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 550d63c..5919eb4 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -651,17 +651,6 @@ void smp_send_reschedule(int cpu)
 	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static void smp_kill_cpus(cpumask_t *mask)
-{
-	unsigned int cpu;
-	for_each_cpu(cpu, mask)
-		platform_cpu_kill(cpu);
-}
-#else
-static void smp_kill_cpus(cpumask_t *mask) { }
-#endif
-
 void smp_send_stop(void)
 {
 	unsigned long timeout;
@@ -679,8 +668,6 @@ void smp_send_stop(void)
 
 	if (num_online_cpus() > 1)
 		pr_warning("SMP: failed to stop secondary CPUs\n");
-
-	smp_kill_cpus(&mask);
 }
 
 /*
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
@ 2013-06-13  1:58   ` zhangfei gao
  2013-06-13 15:05     ` Stephen Warren
  2013-06-13  7:45   ` Will Deacon
  2013-06-17 18:58   ` Stephen Warren
  2 siblings, 1 reply; 12+ messages in thread
From: zhangfei gao @ 2013-06-13  1:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 13, 2013 at 4:01 AM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> Add comments to machine_shutdown()/halt()/power_off()/restart() that
> describe their purpose and/or requirements re: CPUs being active/not.
>
> In machine_shutdown(), replace the call to smp_send_stop() with a call to
> disable_nonboot_cpus(). This completely disables all but one CPU, thus
> satisfying the requirement that only a single CPU be active for kexec.
> Adjust Kconfig dependencies for this change.
>
> In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
> rather than via machine_shutdown(); these functions don't need to
> completely de-activate all CPUs using hotplug, but rather just quiesce
> them.
>
> Remove smp_kill_cpus(), and its call from smp_send_stop().
> smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
> smp_ops.cpu_die() on the target CPUs first. At least some implementations
> of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
> for example. Since smp_send_stop() is only used for shutdown, halt, and
> power-off, there is no need to attempt any kind of CPU hotplug here.
>

Hi, Setphen

We also met the issue: fail to reboot system from panic.
Solve it by the following change.

cpu_die is function kill cpu itself
cpu_kill is functoin to check whethe the cpu is dead or not

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index e171a0e..624e97c 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -554,6 +554,9 @@ static void ipi_cpu_stop(unsigned int cpu)
        local_fiq_disable();
        local_irq_disable();

+#ifdef CONFIG_HOTPLUG_CPU
+       cpu_die();
+#endif
        while (1)
                cpu_relax();
 }

What do you think?

Thanks

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
  2013-06-13  1:58   ` zhangfei gao
@ 2013-06-13  7:45   ` Will Deacon
  2013-06-13 14:59     ` Stephen Warren
  2013-06-13 17:10     ` Russell King - ARM Linux
  2013-06-17 18:58   ` Stephen Warren
  2 siblings, 2 replies; 12+ messages in thread
From: Will Deacon @ 2013-06-13  7:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stephen,

On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Add comments to machine_shutdown()/halt()/power_off()/restart() that
> describe their purpose and/or requirements re: CPUs being active/not.
> 
> In machine_shutdown(), replace the call to smp_send_stop() with a call to
> disable_nonboot_cpus(). This completely disables all but one CPU, thus
> satisfying the requirement that only a single CPU be active for kexec.
> Adjust Kconfig dependencies for this change.
> 
> In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
> rather than via machine_shutdown(); these functions don't need to
> completely de-activate all CPUs using hotplug, but rather just quiesce
> them.
> 
> Remove smp_kill_cpus(), and its call from smp_send_stop().
> smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
> smp_ops.cpu_die() on the target CPUs first. At least some implementations
> of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
> for example. Since smp_send_stop() is only used for shutdown, halt, and
> power-off, there is no need to attempt any kind of CPU hotplug here.
> 
> Adjust Kconfig to reflect that machine_shutdown() (and hence kexec)
> relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee
> that hotplug will work, or even that hotplug is implemented for a
> particular piece of HW that a multi-platform zImage runs on. Hence, add
> error-checking to machine_kexec() to determine whether it did work.

[...]

> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 282de48..6e8931c 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -184,30 +184,61 @@ int __init reboot_setup(char *str)
>  
>  __setup("reboot=", reboot_setup);
>  
> +/*
> + * Called by kexec, immediately prior to machine_kexec().
> + *
> + * This must completely disable all secondary CPUs; simply causing those CPUs
> + * to execute e.g. a RAM-based pin loop is not sufficient. This allows the
> + * kexec'd kernel to use any and all RAM as it sees fit, without having to
> + * avoid any code or data used by any SW CPU pin loop. The CPU hotplug
> + * functionality embodied in disable_nonboot_cpus() to achieve this.
> + */
>  void machine_shutdown(void)
>  {
> -#ifdef CONFIG_SMP
> -	smp_send_stop();
> -#endif
> +	disable_nonboot_cpus();
>  }

Any reason not to move this into machine_kexec and leave machine_shutdown
as a nop?

Anyway, I'm on holiday without internet until Tuesday, so for these two
patches:

  Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-13  7:45   ` Will Deacon
@ 2013-06-13 14:59     ` Stephen Warren
  2013-06-13 17:11       ` Russell King - ARM Linux
  2013-06-13 17:10     ` Russell King - ARM Linux
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Warren @ 2013-06-13 14:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/13/2013 01:45 AM, Will Deacon wrote:
> Hi Stephen,
> 
> On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> Add comments to machine_shutdown()/halt()/power_off()/restart() that
>> describe their purpose and/or requirements re: CPUs being active/not.
>>
>> In machine_shutdown(), replace the call to smp_send_stop() with a call to
>> disable_nonboot_cpus(). This completely disables all but one CPU, thus
>> satisfying the requirement that only a single CPU be active for kexec.
>> Adjust Kconfig dependencies for this change.
>>
>> In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
>> rather than via machine_shutdown(); these functions don't need to
>> completely de-activate all CPUs using hotplug, but rather just quiesce
>> them.
>>
>> Remove smp_kill_cpus(), and its call from smp_send_stop().
>> smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
>> smp_ops.cpu_die() on the target CPUs first. At least some implementations
>> of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
>> for example. Since smp_send_stop() is only used for shutdown, halt, and
>> power-off, there is no need to attempt any kind of CPU hotplug here.
>>
>> Adjust Kconfig to reflect that machine_shutdown() (and hence kexec)
>> relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee
>> that hotplug will work, or even that hotplug is implemented for a
>> particular piece of HW that a multi-platform zImage runs on. Hence, add
>> error-checking to machine_kexec() to determine whether it did work.

>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c

>> +/*
>> + * Called by kexec, immediately prior to machine_kexec().
>> + *
>> + * This must completely disable all secondary CPUs; simply causing those CPUs
>> + * to execute e.g. a RAM-based pin loop is not sufficient. This allows the
>> + * kexec'd kernel to use any and all RAM as it sees fit, without having to
>> + * avoid any code or data used by any SW CPU pin loop. The CPU hotplug
>> + * functionality embodied in disable_nonboot_cpus() to achieve this.
>> + */
>>  void machine_shutdown(void)
>>  {
>> -#ifdef CONFIG_SMP
>> -	smp_send_stop();
>> -#endif
>> +	disable_nonboot_cpus();
>>  }
> 
> Any reason not to move this into machine_kexec and leave machine_shutdown
> as a nop?

Well, the function needs to exist to link the kernel, so I figured it
may as well do something:-)

Judging by the function names, it seems like machine_shutdown() should
put the machine into a state where kexec can work, and machine_kexec()
should do the final "jump". That matches doing disable_nonboot_cpus() in
here. I don't know much about KEXEC_JUMP, but if it were ever
implemented for ARM, I'm not sure we'd want to do the
disable_nonboot_cpus() in that case(?), and machine_shutdown() is called
if !KEXEC_JUMP, but machine_kexec() is called for both.

It also means I get to put that comment right there, which is next to
the equivalent commands for machine_halt/power_off/restart, rather than
in some completely different place in machine_kexec.c.

> Anyway, I'm on holiday without internet until Tuesday, so for these two
> patches:
> 
>   Acked-by: Will Deacon <will.deacon@arm.com>

Thanks!

Russell, does this look good to you? I assume I should put this into the
ARM patch tracker.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-13  1:58   ` zhangfei gao
@ 2013-06-13 15:05     ` Stephen Warren
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Warren @ 2013-06-13 15:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/12/2013 07:58 PM, zhangfei gao wrote:
> On Thu, Jun 13, 2013 at 4:01 AM, Stephen Warren <swarren@wwwdotorg.org> wrote:
...
> Hi, Setphen
> 
> We also met the issue: fail to reboot system from panic.
> Solve it by the following change.
> 
> cpu_die is function kill cpu itself
> cpu_kill is functoin to check whethe the cpu is dead or not
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c

> @@ -554,6 +554,9 @@ static void ipi_cpu_stop(unsigned int cpu)

> +#ifdef CONFIG_HOTPLUG_CPU
> +       cpu_die();
> +#endif
>         while (1)
>                 cpu_relax();
>  }
> 
> What do you think?

That probably works too, since it pairs cpu_die/cpu_kill, although the
patch I sent ends up running less code during the halt/power_off/restart
path, which seems like a good idea; less stuff to potentially break.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-13  7:45   ` Will Deacon
  2013-06-13 14:59     ` Stephen Warren
@ 2013-06-13 17:10     ` Russell King - ARM Linux
  1 sibling, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2013-06-13 17:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 13, 2013 at 08:45:51AM +0100, Will Deacon wrote:
> Hi Stephen,
> 
> On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote:
> >  void machine_shutdown(void)
> >  {
> > -#ifdef CONFIG_SMP
> > -	smp_send_stop();
> > -#endif
> > +	disable_nonboot_cpus();
> >  }
> 
> Any reason not to move this into machine_kexec and leave machine_shutdown
> as a nop?

If you take a peek at kernel/kexec.c, machine_shutdown() is only invoked
if we haven't frozen processes and already disabled via hotplug the other
CPUs - which will have already been done via disable_nonboot_cpus() in
that path.

So moving it to kexec would mean that path calls disable_nonboot_cpus()
twice, which probably isn't a good thing.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-13 14:59     ` Stephen Warren
@ 2013-06-13 17:11       ` Russell King - ARM Linux
  2013-06-14  4:53         ` zhangfei gao
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King - ARM Linux @ 2013-06-13 17:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 13, 2013 at 08:59:40AM -0600, Stephen Warren wrote:
> Russell, does this look good to you? I assume I should put this into the
> ARM patch tracker.

I think it's fine.  It would be nice if zhangfei gao would test these
patches and confirm whether they fix it there as well.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-13 17:11       ` Russell King - ARM Linux
@ 2013-06-14  4:53         ` zhangfei gao
  0 siblings, 0 replies; 12+ messages in thread
From: zhangfei gao @ 2013-06-14  4:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 14, 2013 at 1:11 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Jun 13, 2013 at 08:59:40AM -0600, Stephen Warren wrote:
>> Russell, does this look good to you? I assume I should put this into the
>> ARM patch tracker.
>
> I think it's fine.  It would be nice if zhangfei gao would test these
> patches and confirm whether they fix it there as well.

It also works here, pass two hours stress test of panic -> reboot.

Tested-by:  Zhangfei Gao <zhangfei.gao@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
  2013-06-13  1:58   ` zhangfei gao
  2013-06-13  7:45   ` Will Deacon
@ 2013-06-17 18:58   ` Stephen Warren
  2013-06-17 19:41     ` Russell King - ARM Linux
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Warren @ 2013-06-17 18:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/12/2013 02:01 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Add comments to machine_shutdown()/halt()/power_off()/restart() that
> describe their purpose and/or requirements re: CPUs being active/not.
> 
> In machine_shutdown(), replace the call to smp_send_stop() with a call to
> disable_nonboot_cpus(). This completely disables all but one CPU, thus
> satisfying the requirement that only a single CPU be active for kexec.
> Adjust Kconfig dependencies for this change.
> 
> In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
> rather than via machine_shutdown(); these functions don't need to
> completely de-activate all CPUs using hotplug, but rather just quiesce
> them.
> 
> Remove smp_kill_cpus(), and its call from smp_send_stop().
> smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
> smp_ops.cpu_die() on the target CPUs first. At least some implementations
> of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
> for example. Since smp_send_stop() is only used for shutdown, halt, and
> power-off, there is no need to attempt any kind of CPU hotplug here.
> 
> Adjust Kconfig to reflect that machine_shutdown() (and hence kexec)
> relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee
> that hotplug will work, or even that hotplug is implemented for a
> particular piece of HW that a multi-platform zImage runs on. Hence, add
> error-checking to machine_kexec() to determine whether it did work.

Russell,

The patch which initially triggered the problem [shutdown/reboot hangs
on Tegra] (cf7df37 "reboot: rigrate shutdown/reboot to boot cpu") ended
up going into v3.10; I assumed it was only going into v3.11.

Is it possible to take this patch for v3.10 rather than v3.11? (or is
your git-curr branch for 3.10; that's where your patchd told me this was
applied.)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown
  2013-06-17 18:58   ` Stephen Warren
@ 2013-06-17 19:41     ` Russell King - ARM Linux
  0 siblings, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2013-06-17 19:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 17, 2013 at 12:58:53PM -0600, Stephen Warren wrote:
> On 06/12/2013 02:01 PM, Stephen Warren wrote:
> > From: Stephen Warren <swarren@nvidia.com>
> > 
> > Add comments to machine_shutdown()/halt()/power_off()/restart() that
> > describe their purpose and/or requirements re: CPUs being active/not.
> > 
> > In machine_shutdown(), replace the call to smp_send_stop() with a call to
> > disable_nonboot_cpus(). This completely disables all but one CPU, thus
> > satisfying the requirement that only a single CPU be active for kexec.
> > Adjust Kconfig dependencies for this change.
> > 
> > In machine_halt()/power_off()/restart(), call smp_send_stop() directly,
> > rather than via machine_shutdown(); these functions don't need to
> > completely de-activate all CPUs using hotplug, but rather just quiesce
> > them.
> > 
> > Remove smp_kill_cpus(), and its call from smp_send_stop().
> > smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling
> > smp_ops.cpu_die() on the target CPUs first. At least some implementations
> > of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra,
> > for example. Since smp_send_stop() is only used for shutdown, halt, and
> > power-off, there is no need to attempt any kind of CPU hotplug here.
> > 
> > Adjust Kconfig to reflect that machine_shutdown() (and hence kexec)
> > relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee
> > that hotplug will work, or even that hotplug is implemented for a
> > particular piece of HW that a multi-platform zImage runs on. Hence, add
> > error-checking to machine_kexec() to determine whether it did work.
> 
> Russell,
> 
> The patch which initially triggered the problem [shutdown/reboot hangs
> on Tegra] (cf7df37 "reboot: rigrate shutdown/reboot to boot cpu") ended
> up going into v3.10; I assumed it was only going into v3.11.
> 
> Is it possible to take this patch for v3.10 rather than v3.11? (or is
> your git-curr branch for 3.10; that's where your patchd told me this was
> applied.)

The concern I have is that we're now at -rc6, and my "fixes" branch for
this time around is getting much larger than previous ones:

 5 files changed, 14 insertions(+), 11 deletions(-)

 3 files changed, 2 insertions(+), 3 deletions(-)

 7 files changed, 45 insertions(+), 5 deletions(-)

and it's currently looking like:

 7 files changed, 61 insertions(+), 9 deletions(-)

yes, not huge, but it's the wrong direction - and consider I've dropped
one thing from fixes this morning because they were actually broken...
and you're asking me to include this:

 4 files changed, 42 insertions(+), 20 deletions(-)

into that, which'll make it:

11 files changed, 103 insertions(+), 29 deletions(-)

and it really starts to look like things are heading in the wrong
direction... especially as far as Linus would be concerned for -rc6...

I will try though. :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 1/2] kexec: return error of machine_kexec() fails
  2013-06-12 20:01 [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren
  2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
@ 2013-06-21 22:41 ` Stephen Warren
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Warren @ 2013-06-21 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/12/2013 02:01 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
> sys_reboot() would return -EINVAL. This patch restores this behaviour
> for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
> return.
> 
> This situation can occur on ARM, where kexec requires disabling all but
> one CPU using CPU hotplug. However, if hotplug isn't supported by the
> particular HW the kernel is running on, then kexec cannot succeed.

Eric, does this look reasonable to go into 3.11? Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-06-21 22:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-12 20:01 [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren
2013-06-12 20:01 ` [PATCH V2 2/2] ARM: decouple CPU offlining from reboot/shutdown Stephen Warren
2013-06-13  1:58   ` zhangfei gao
2013-06-13 15:05     ` Stephen Warren
2013-06-13  7:45   ` Will Deacon
2013-06-13 14:59     ` Stephen Warren
2013-06-13 17:11       ` Russell King - ARM Linux
2013-06-14  4:53         ` zhangfei gao
2013-06-13 17:10     ` Russell King - ARM Linux
2013-06-17 18:58   ` Stephen Warren
2013-06-17 19:41     ` Russell King - ARM Linux
2013-06-21 22:41 ` [PATCH V2 1/2] kexec: return error of machine_kexec() fails Stephen Warren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.