* [PATCH 0/2] Fix hibernate on SMP spin-table systems
@ 2016-06-17 9:34 James Morse
2016-06-17 9:34 ` [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel James Morse
2016-06-17 9:34 ` [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs James Morse
0 siblings, 2 replies; 11+ messages in thread
From: James Morse @ 2016-06-17 9:34 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
These two patches prevent hibernate on systems that use spin-tables and
have multiple CPUs. It also wires up 'cpus_stuck_in_kernel' which was
added for v4.7.
Prior to 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N"), we would
bring all the CPUs we would ever have up during boot. On a system with
spin tables and multiple CPUS the core hibernate code would prevent
hibernation because it can't disable secondary CPUs.
After 44dbcc93ab67, when we boot with 'maxcpus=1', we no longer bring
all the CPUs up, but we do move them into the secondary_holding_pen.
Resuming from hibernate will overwrite the secondary_holding_pen,
potentially releasing the secondary CPUs. If the kernel has been
loaded at a different physical address over hibernate and resume
the secondary_holding_pen may be at a different location after resume.
The core code can't help us with this, because these CPUs don't show
up in 'num_online_cpus()'
These two patches fix the problem by detecting multiple 'possible cpus'
that we have no mechanism to take offline and preventing hibernate [2].
This only happens for spin-table systems with multiple CPUs.
Kexec needs the same checks, so the 'or spin-tables' logic[0] got added
to the helper function.
(This problem was spotted on another thread [1])
[0] http://www.spinics.net/lists/arm-kernel/msg510097.html
[1] http://www.spinics.net/lists/arm-kernel/msg511880.html
[2] Failing to hibernate an SMP spin-tables system booted with maxcpus=1
---------------------%<---------------------
root@localhost:~# echo disk > /sys/power/state
[12248.197718] PM: Syncing filesystems ... done.
[12248.197727] Freezing user space processes ... (elapsed 0.001 seconds) done.
[12248.203197] PM: Preallocating image memory... done (allocated 50769 pages)
[12261.838699] PM: Allocated 203076 kbytes in 13.63 seconds (14.89 MB/s)
[12261.838760] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[12261.840732] Suspending console(s) (use no_console_suspend to debug)
[12261.842540] PM: freeze of devices complete after 1.732 msecs
[12261.843897] PM: late freeze of devices complete after 1.333 msecs
[12261.845191] PM: noirq freeze of devices complete after 1.272 msecs
[12261.845201] Disabling non-boot CPUs ...
[12261.845206] hibernate: Can't hibernate: no mechanism to offline secondary CPUs.
[12261.845206] PM: Error -16 creating hibernation image
[12261.846140] PM: noirq recover of devices complete after 0.908 msecs
[12261.847160] PM: early recover of devices complete after 0.940 msecs
[12262.765886] PM: recover of devices complete after 1.452 msecs
[12262.769191] Restarting tasks ... done.
-bash: echo: write error: Device or resource busy
root at localhost:~#
---------------------%<---------------------
James Morse (2):
arm64: smp: Add function to determine if cpus are stuck in the kernel
arm64: hibernate: Don't hibernate on systems with stuck CPUs
arch/arm64/include/asm/smp.h | 20 ++++++++++++++++++++
arch/arm64/kernel/hibernate.c | 6 ++++++
arch/arm64/kernel/smp.c | 13 +++++++++++++
3 files changed, 39 insertions(+)
--
2.8.0.rc3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 9:34 [PATCH 0/2] Fix hibernate on SMP spin-table systems James Morse
@ 2016-06-17 9:34 ` James Morse
2016-06-17 10:27 ` Mark Rutland
2016-06-17 16:37 ` Geoff Levand
2016-06-17 9:34 ` [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs James Morse
1 sibling, 2 replies; 11+ messages in thread
From: James Morse @ 2016-06-17 9:34 UTC (permalink / raw)
To: linux-arm-kernel
kernel/smp.c has a fancy counter that keeps track of the number of CPUs
it marked as not-present and left in cpu_park_loop(). If there are any
CPUs spinning in here, features like kexec or hibernate may release them
by overwriting this memory.
This problem also occurs on machines using spin-tables to release
secondary cores.
After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
we bring all known cpus into the secondary holding pen, but may not bring
them up depending on 'maxcpus'. This memory can't be re-used by kexec
or hibernate.
Add a function cpus_are_stuck_in_kernel() to determine if either of these
cases have occurred.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
---
arch/arm64/include/asm/smp.h | 20 ++++++++++++++++++++
arch/arm64/kernel/smp.c | 13 +++++++++++++
2 files changed, 33 insertions(+)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 433e50405274..4be755bcc07a 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -124,6 +124,26 @@ static inline void cpu_panic_kernel(void)
cpu_park_loop();
}
+/*
+ * Kernel features such as hibernate and kexec depend on cpu hotplug to know
+ * they can replace any kernel memory they are not using themselves.
+ *
+ * There are two corner cases:
+ * If a secondary CPU fails to come online, (e.g. due to mismatched features),
+ * it will try to call cpu_die(). If this fails, it increases the counter
+ * cpus_stuck_in_kernel and sits in cpu_park_loop(). The memory containing
+ * this function must not be re-used for anything else as the 'stuck' core
+ * is executing it.
+ *
+ * CPUs are also considered stuck in the kernel if we have multiple CPUs
+ * and no way to offline secondary CPUs. This happens when secondaries
+ * are released via spin-table, these CPUs are moved into the kernel's
+ * secondary_holding_pen, which must not be overwritten.
+ *
+ * This function is used to inhibit features like kexec and hibernate.
+ */
+bool cpus_are_stuck_in_kernel(void);
+
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 678e0842cb3b..e197502f94fd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
{
return -EINVAL;
}
+
+bool cpus_are_stuck_in_kernel(void)
+{
+ bool ret = !!cpus_stuck_in_kernel;
+#ifdef CONFIG_HOTPLUG_CPU
+ int any_cpu = raw_smp_processor_id();
+
+ if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
+ ret = true;
+#endif
+
+ return ret;
+}
--
2.8.0.rc3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs
2016-06-17 9:34 [PATCH 0/2] Fix hibernate on SMP spin-table systems James Morse
2016-06-17 9:34 ` [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel James Morse
@ 2016-06-17 9:34 ` James Morse
2016-06-17 10:28 ` Mark Rutland
1 sibling, 1 reply; 11+ messages in thread
From: James Morse @ 2016-06-17 9:34 UTC (permalink / raw)
To: linux-arm-kernel
Hibernate relies on cpu hotplug to prevent secondary cores executing
the kernel text while it is being restored.
Add a call to cpus_are_stuck_in_kernel() to determine if there are
CPUs not counted by 'num_online_cpus()', and prevent hibernate in this
case.
Fixes: 82869ac57b5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/kernel/hibernate.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index f8df75d740f4..21ab5df9fa76 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -33,6 +33,7 @@
#include <asm/pgtable.h>
#include <asm/pgtable-hwdef.h>
#include <asm/sections.h>
+#include <asm/smp.h>
#include <asm/suspend.h>
#include <asm/virt.h>
@@ -236,6 +237,11 @@ int swsusp_arch_suspend(void)
unsigned long flags;
struct sleep_stack_data state;
+ if (cpus_are_stuck_in_kernel()) {
+ pr_err("Can't hibernate: no mechanism to offline secondary CPUs.\n");
+ return -EBUSY;
+ }
+
local_dbg_save(flags);
if (__cpu_suspend_enter(&state)) {
--
2.8.0.rc3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 9:34 ` [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel James Morse
@ 2016-06-17 10:27 ` Mark Rutland
2016-06-17 11:16 ` Suzuki K Poulose
2016-06-17 16:37 ` Geoff Levand
1 sibling, 1 reply; 11+ messages in thread
From: Mark Rutland @ 2016-06-17 10:27 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jun 17, 2016 at 10:34:56AM +0100, James Morse wrote:
> kernel/smp.c has a fancy counter that keeps track of the number of CPUs
> it marked as not-present and left in cpu_park_loop(). If there are any
> CPUs spinning in here, features like kexec or hibernate may release them
> by overwriting this memory.
>
> This problem also occurs on machines using spin-tables to release
> secondary cores.
> After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
> we bring all known cpus into the secondary holding pen, but may not bring
> them up depending on 'maxcpus'. This memory can't be re-used by kexec
> or hibernate.
>
> Add a function cpus_are_stuck_in_kernel() to determine if either of these
> cases have occurred.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> arch/arm64/include/asm/smp.h | 20 ++++++++++++++++++++
> arch/arm64/kernel/smp.c | 13 +++++++++++++
> 2 files changed, 33 insertions(+)
>
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 433e50405274..4be755bcc07a 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -124,6 +124,26 @@ static inline void cpu_panic_kernel(void)
> cpu_park_loop();
> }
>
> +/*
> + * Kernel features such as hibernate and kexec depend on cpu hotplug to know
> + * they can replace any kernel memory they are not using themselves.
> + *
> + * There are two corner cases:
> + * If a secondary CPU fails to come online, (e.g. due to mismatched features),
> + * it will try to call cpu_die(). If this fails, it increases the counter
> + * cpus_stuck_in_kernel and sits in cpu_park_loop(). The memory containing
> + * this function must not be re-used for anything else as the 'stuck' core
> + * is executing it.
It might also be stuck in __no_granule_support, if it never made it to C
code. In that case, the CPU in charge of bringing up that new CPU will
increment the counter in __cpu_up.
There might be other reasons we do something like that in future, so it
might be better to be a little less specific and say something like:
If a secondary CPU enters the kernel but fails to come online,
(e.g. due to mismatched features), and cannot exit the kernel,
we increment cpus_stuck_in_kernel and leave the CPU in a
quiesecent loop within the kernel text. The memory containing
this loop must not be re-used for anything else as the 'stuck'
core is executing it.
Otherwise, this looks good. FWIW, either way:
Acked-by: Mark Rutland <mark.rutland@arm.com>
Thanks,
Mark.
> + *
> + * CPUs are also considered stuck in the kernel if we have multiple CPUs
> + * and no way to offline secondary CPUs. This happens when secondaries
> + * are released via spin-table, these CPUs are moved into the kernel's
> + * secondary_holding_pen, which must not be overwritten.
> + *
> + * This function is used to inhibit features like kexec and hibernate.
> + */
> +bool cpus_are_stuck_in_kernel(void);
> +
> #endif /* ifndef __ASSEMBLY__ */
>
> #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 678e0842cb3b..e197502f94fd 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
> {
> return -EINVAL;
> }
> +
> +bool cpus_are_stuck_in_kernel(void)
> +{
> + bool ret = !!cpus_stuck_in_kernel;
> +#ifdef CONFIG_HOTPLUG_CPU
> + int any_cpu = raw_smp_processor_id();
> +
> + if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
> + ret = true;
> +#endif
> +
> + return ret;
> +}
> --
> 2.8.0.rc3
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs
2016-06-17 9:34 ` [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs James Morse
@ 2016-06-17 10:28 ` Mark Rutland
0 siblings, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2016-06-17 10:28 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jun 17, 2016 at 10:34:57AM +0100, James Morse wrote:
> Hibernate relies on cpu hotplug to prevent secondary cores executing
> the kernel text while it is being restored.
>
> Add a call to cpus_are_stuck_in_kernel() to determine if there are
> CPUs not counted by 'num_online_cpus()', and prevent hibernate in this
> case.
>
> Fixes: 82869ac57b5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
> Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Mark.
> ---
> arch/arm64/kernel/hibernate.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index f8df75d740f4..21ab5df9fa76 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -33,6 +33,7 @@
> #include <asm/pgtable.h>
> #include <asm/pgtable-hwdef.h>
> #include <asm/sections.h>
> +#include <asm/smp.h>
> #include <asm/suspend.h>
> #include <asm/virt.h>
>
> @@ -236,6 +237,11 @@ int swsusp_arch_suspend(void)
> unsigned long flags;
> struct sleep_stack_data state;
>
> + if (cpus_are_stuck_in_kernel()) {
> + pr_err("Can't hibernate: no mechanism to offline secondary CPUs.\n");
> + return -EBUSY;
> + }
> +
> local_dbg_save(flags);
>
> if (__cpu_suspend_enter(&state)) {
> --
> 2.8.0.rc3
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 10:27 ` Mark Rutland
@ 2016-06-17 11:16 ` Suzuki K Poulose
2016-06-17 16:48 ` James Morse
0 siblings, 1 reply; 11+ messages in thread
From: Suzuki K Poulose @ 2016-06-17 11:16 UTC (permalink / raw)
To: linux-arm-kernel
On 17/06/16 11:27, Mark Rutland wrote:
> On Fri, Jun 17, 2016 at 10:34:56AM +0100, James Morse wrote:
>> kernel/smp.c has a fancy counter that keeps track of the number of CPUs
>> it marked as not-present and left in cpu_park_loop(). If there are any
>> CPUs spinning in here, features like kexec or hibernate may release them
>> by overwriting this memory.
>>
>> This problem also occurs on machines using spin-tables to release
>> secondary cores.
>> After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
>> we bring all known cpus into the secondary holding pen, but may not bring
>> them up depending on 'maxcpus'. This memory can't be re-used by kexec
>> or hibernate.
>>
>> Add a function cpus_are_stuck_in_kernel() to determine if either of these
>> cases have occurred.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> arch/arm64/include/asm/smp.h | 20 ++++++++++++++++++++
>> arch/arm64/kernel/smp.c | 13 +++++++++++++
>> 2 files changed, 33 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
>> index 433e50405274..4be755bcc07a 100644
>> --- a/arch/arm64/include/asm/smp.h
>> +++ b/arch/arm64/include/asm/smp.h
>> @@ -124,6 +124,26 @@ static inline void cpu_panic_kernel(void)
>> cpu_park_loop();
>> }
>>
>> +/*
>> + * Kernel features such as hibernate and kexec depend on cpu hotplug to know
>> + * they can replace any kernel memory they are not using themselves.
>> + *
>> + * There are two corner cases:
>> + * If a secondary CPU fails to come online, (e.g. due to mismatched features),
>> + * it will try to call cpu_die(). If this fails, it increases the counter
>> + * cpus_stuck_in_kernel and sits in cpu_park_loop(). The memory containing
>> + * this function must not be re-used for anything else as the 'stuck' core
>> + * is executing it.
>
> It might also be stuck in __no_granule_support, if it never made it to C
> code. In that case, the CPU in charge of bringing up that new CPU will
> increment the counter in __cpu_up.
Just to clarify, *in all the cases*, the CPU in charge of bringing up updates
the cpus_stuck_in_kernel.
>
> There might be other reasons we do something like that in future, so it
> might be better to be a little less specific and say something like:
>
> If a secondary CPU enters the kernel but fails to come online,
> (e.g. due to mismatched features), and cannot exit the kernel,
> we increment cpus_stuck_in_kernel and leave the CPU in a
> quiesecent loop within the kernel text. The memory containing
> this loop must not be re-used for anything else as the 'stuck'
> core is executing it.
Agree.
>> +bool cpus_are_stuck_in_kernel(void);
>> +
>> #endif /* ifndef __ASSEMBLY__ */
>>
>> #endif /* ifndef __ASM_SMP_H */
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 678e0842cb3b..e197502f94fd 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
>> {
>> return -EINVAL;
>> }
>> +
>> +bool cpus_are_stuck_in_kernel(void)
>> +{
>> + bool ret = !!cpus_stuck_in_kernel;
>> +#ifdef CONFIG_HOTPLUG_CPU
>> + int any_cpu = raw_smp_processor_id();
>> +
>> + if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
>> + ret = true;
>> +#endif
Minor nit: Moving the cpu_die check to a static inline function with
an obvious name might make the code look better.
return !!cpus_stuck_in_kernel || !have_cpu_die() ?
Eitherway,
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cheers
Suzuki
>> +
>> + return ret;
>> +}
>> --
>> 2.8.0.rc3
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 9:34 ` [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel James Morse
2016-06-17 10:27 ` Mark Rutland
@ 2016-06-17 16:37 ` Geoff Levand
2016-06-17 16:39 ` Suzuki K Poulose
2016-06-17 16:42 ` Mark Rutland
1 sibling, 2 replies; 11+ messages in thread
From: Geoff Levand @ 2016-06-17 16:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 2016-06-17 at 10:34 +0100, James Morse wrote:
> +bool cpus_are_stuck_in_kernel(void)
> +{
> +> > bool ret = !!cpus_stuck_in_kernel;
> +#ifdef CONFIG_HOTPLUG_CPU
How about using 'if (IS_ENABLED(CONFIG_HOTPLUG_CPU))' here?
> + int any_cpu = raw_smp_processor_id();
> +
> +> > if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
> +> > > ret = true;
> +#endif
-Geoff
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 16:37 ` Geoff Levand
@ 2016-06-17 16:39 ` Suzuki K Poulose
2016-06-17 16:42 ` Mark Rutland
1 sibling, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2016-06-17 16:39 UTC (permalink / raw)
To: linux-arm-kernel
On 17/06/16 17:37, Geoff Levand wrote:
> On Fri, 2016-06-17 at 10:34 +0100, James Morse wrote:
>> +bool cpus_are_stuck_in_kernel(void)
>> +{
>> +> > bool ret = !!cpus_stuck_in_kernel;
>> +#ifdef CONFIG_HOTPLUG_CPU
>
> How about using 'if (IS_ENABLED(CONFIG_HOTPLUG_CPU))' here?
>
>> + int any_cpu = raw_smp_processor_id();
>> +
>> +> > if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
>> +> > > ret = true;
>> +#endif
>
That won't work, as the cpu_ops->cpu_die is defined only if CONFIG_HOTPLUG_CPU.
May be we should fix that.
Suzuki
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 16:37 ` Geoff Levand
2016-06-17 16:39 ` Suzuki K Poulose
@ 2016-06-17 16:42 ` Mark Rutland
1 sibling, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2016-06-17 16:42 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jun 17, 2016 at 09:37:12AM -0700, Geoff Levand wrote:
> On Fri, 2016-06-17 at 10:34 +0100, James Morse wrote:
> > +bool cpus_are_stuck_in_kernel(void)
> > +{
> > +> > bool ret = !!cpus_stuck_in_kernel;
> > +#ifdef CONFIG_HOTPLUG_CPU
>
> How about using 'if (IS_ENABLED(CONFIG_HOTPLUG_CPU))' here?
I wanted to suggest that too, but ...
> > + int any_cpu = raw_smp_processor_id();
> > +
> > +> > if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
> > +> > > ret = true;
... here we'd blow up as cpu_ops::cpu_die is only defined when
CONFIG_HOTPLUG_CPU is selected.
So we'll have to use an ifdef, but we could instead ifdef a whole stub
function (e.g. have_cpu_die()), as Suzuki suggested.
Thanks,
Mark.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 11:16 ` Suzuki K Poulose
@ 2016-06-17 16:48 ` James Morse
2016-06-21 18:07 ` Will Deacon
0 siblings, 1 reply; 11+ messages in thread
From: James Morse @ 2016-06-17 16:48 UTC (permalink / raw)
To: linux-arm-kernel
Hi Suzuki,
On 17/06/16 12:16, Suzuki K Poulose wrote:
> On 17/06/16 11:27, Mark Rutland wrote:
>> On Fri, Jun 17, 2016 at 10:34:56AM +0100, James Morse wrote:
>>> kernel/smp.c has a fancy counter that keeps track of the number of CPUs
>>> it marked as not-present and left in cpu_park_loop(). If there are any
>>> CPUs spinning in here, features like kexec or hibernate may release them
>>> by overwriting this memory.
>>>
>>> This problem also occurs on machines using spin-tables to release
>>> secondary cores.
>>> After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
>>> we bring all known cpus into the secondary holding pen, but may not bring
>>> them up depending on 'maxcpus'. This memory can't be re-used by kexec
>>> or hibernate.
>>>
>>> Add a function cpus_are_stuck_in_kernel() to determine if either of these
>>> cases have occurred.
>> It might also be stuck in __no_granule_support, if it never made it to C
>> code. In that case, the CPU in charge of bringing up that new CPU will
>> increment the counter in __cpu_up.
>
> Just to clarify, *in all the cases*, the CPU in charge of bringing up updates
> the cpus_stuck_in_kernel.
Ah, my mistake. I will switch it for Mark's suggestion.
>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>> index 678e0842cb3b..e197502f94fd 100644
>>> --- a/arch/arm64/kernel/smp.c
>>> +++ b/arch/arm64/kernel/smp.c
>>> @@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
>>> {
>>> return -EINVAL;
>>> }
>>> +
>>> +bool cpus_are_stuck_in_kernel(void)
>>> +{
>>> + bool ret = !!cpus_stuck_in_kernel;
>>> +#ifdef CONFIG_HOTPLUG_CPU
>>> + int any_cpu = raw_smp_processor_id();
>>> +
>>> + if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
>>> + ret = true;
>>> +#endif
>
> Minor nit: Moving the cpu_die check to a static inline function with
> an obvious name might make the code look better.
>
> return !!cpus_stuck_in_kernel || !have_cpu_die() ?
>
That would be better!
> Eitherway,
>
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Thanks,
James
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
2016-06-17 16:48 ` James Morse
@ 2016-06-21 18:07 ` Will Deacon
0 siblings, 0 replies; 11+ messages in thread
From: Will Deacon @ 2016-06-21 18:07 UTC (permalink / raw)
To: linux-arm-kernel
HI James,
On Fri, Jun 17, 2016 at 05:48:35PM +0100, James Morse wrote:
> On 17/06/16 12:16, Suzuki K Poulose wrote:
> > On 17/06/16 11:27, Mark Rutland wrote:
> >> On Fri, Jun 17, 2016 at 10:34:56AM +0100, James Morse wrote:
> >>> kernel/smp.c has a fancy counter that keeps track of the number of CPUs
> >>> it marked as not-present and left in cpu_park_loop(). If there are any
> >>> CPUs spinning in here, features like kexec or hibernate may release them
> >>> by overwriting this memory.
> >>>
> >>> This problem also occurs on machines using spin-tables to release
> >>> secondary cores.
> >>> After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
> >>> we bring all known cpus into the secondary holding pen, but may not bring
> >>> them up depending on 'maxcpus'. This memory can't be re-used by kexec
> >>> or hibernate.
> >>>
> >>> Add a function cpus_are_stuck_in_kernel() to determine if either of these
> >>> cases have occurred.
>
> >> It might also be stuck in __no_granule_support, if it never made it to C
> >> code. In that case, the CPU in charge of bringing up that new CPU will
> >> increment the counter in __cpu_up.
> >
> > Just to clarify, *in all the cases*, the CPU in charge of bringing up updates
> > the cpus_stuck_in_kernel.
>
> Ah, my mistake. I will switch it for Mark's suggestion.
>
> >>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >>> index 678e0842cb3b..e197502f94fd 100644
> >>> --- a/arch/arm64/kernel/smp.c
> >>> +++ b/arch/arm64/kernel/smp.c
> >>> @@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
> >>> {
> >>> return -EINVAL;
> >>> }
> >>> +
> >>> +bool cpus_are_stuck_in_kernel(void)
> >>> +{
> >>> + bool ret = !!cpus_stuck_in_kernel;
> >>> +#ifdef CONFIG_HOTPLUG_CPU
> >>> + int any_cpu = raw_smp_processor_id();
> >>> +
> >>> + if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
> >>> + ret = true;
> >>> +#endif
> >
> > Minor nit: Moving the cpu_die check to a static inline function with
> > an obvious name might make the code look better.
> >
> > return !!cpus_stuck_in_kernel || !have_cpu_die() ?
> >
>
> That would be better!
Can you post a new version of this, please?
Will
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-06-21 18:07 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-17 9:34 [PATCH 0/2] Fix hibernate on SMP spin-table systems James Morse
2016-06-17 9:34 ` [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel James Morse
2016-06-17 10:27 ` Mark Rutland
2016-06-17 11:16 ` Suzuki K Poulose
2016-06-17 16:48 ` James Morse
2016-06-21 18:07 ` Will Deacon
2016-06-17 16:37 ` Geoff Levand
2016-06-17 16:39 ` Suzuki K Poulose
2016-06-17 16:42 ` Mark Rutland
2016-06-17 9:34 ` [PATCH 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs James Morse
2016-06-17 10:28 ` Mark Rutland
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.