[PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
@ 2016-06-25 16:18 Chen Yu
  2016-06-25 16:18 ` [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL Chen Yu
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Chen Yu @ 2016-06-25 16:18 UTC (permalink / raw)
  To: linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Chen Yu

Currently it is reported that, when system is trying to resume
from hibernation, the nonboot CPUs might be incorrectly woken up
and hang there. The reason for this is because of inconsistent
page tables across hibernation resume. To avoid this situation,
use hlt instead of mwait to put nonboot CPUs in a more safe state
and just let them watch the boot CPU to do all the things and
wakes them up later.

The first three patches are preparation for the fourth one, please
refer to [4/4] for detail.

Chen Yu (4):
  PM / sleep: Avoid accessing frozen_cpus if it is NULL
  PM / sleep: Introduce arch-specific hook for disable/enable nonboot
    cpus
  PM / hibernate: introduce a flag to indicate resuming from hibernation
  x86, hotplug: Use hlt instead of mwait when resuming from hibernation

 arch/x86/kernel/smpboot.c | 15 +++++++++++++++
 include/linux/cpu.h       |  2 ++
 include/linux/suspend.h   |  7 +++++++
 kernel/cpu.c              | 38 ++++++++++++++++++++++++++++++++++++++
 kernel/power/hibernate.c  |  3 +++
 5 files changed, 65 insertions(+)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL
  2016-06-25 16:18 [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation Chen Yu
@ 2016-06-25 16:18 ` Chen Yu
  2016-06-25 16:51   ` Pavel Machek
  2016-06-25 16:18 ` [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus Chen Yu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Chen Yu @ 2016-06-25 16:18 UTC (permalink / raw)
  To: linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Chen Yu

frozen_cpus might be NULL if the allocation in previous
alloc_frozen_cpus failed, when CONFIG_CPUMASK_OFFSTACK
is set.

This patch avoid accessing this cpumask if it is NULL.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d948e44..d25266e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1021,6 +1021,8 @@ int disable_nonboot_cpus(void)
 {
 	int cpu, first_cpu, error = 0;
 
+	if (frozen_cpus == NULL)
+		return -ENOMEM;
 	cpu_maps_update_begin();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/*
@@ -1072,6 +1074,8 @@ void enable_nonboot_cpus(void)
 {
 	int cpu, error;
 
+	if (frozen_cpus == NULL)
+		return;
 	/* Allow everyone to use the CPU hotplug again */
 	cpu_maps_update_begin();
 	WARN_ON(--cpu_hotplug_disabled < 0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-06-25 16:18 [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation Chen Yu
  2016-06-25 16:18 ` [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL Chen Yu
@ 2016-06-25 16:18 ` Chen Yu
  2016-06-25 16:51   ` Pavel Machek
  2016-10-07 19:31   ` Andy Lutomirski
  2016-06-25 16:18 ` [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation Chen Yu
  2016-06-25 16:19 ` [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when " Chen Yu
  3 siblings, 2 replies; 21+ messages in thread
From: Chen Yu @ 2016-06-25 16:18 UTC (permalink / raw)
  To: linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Chen Yu

There is requirement that we need to do some arch-specific
operations before putting the nonboot CPUs offline/online.
One of the requirements comes from the hibernation resume
process on x86_64, we need to kick all the offlin-CPUs
online and offline again, in order to put them in a safe
state, thus to avoid possible unwilling wake up during
hibernation resume.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/cpu.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d25266e..ce6e5e4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1017,6 +1017,16 @@ EXPORT_SYMBOL_GPL(cpu_up);
 #ifdef CONFIG_PM_SLEEP_SMP
 static cpumask_var_t frozen_cpus;
 
+void __weak arch_disable_nonboot_cpus_pre(void)
+{
+
+}
+
+void __weak arch_enable_nonboot_cpus_pre(void)
+{
+
+}
+
 int disable_nonboot_cpus(void)
 {
 	int cpu, first_cpu, error = 0;
@@ -1024,6 +1034,7 @@ int disable_nonboot_cpus(void)
 	if (frozen_cpus == NULL)
 		return -ENOMEM;
 	cpu_maps_update_begin();
+	arch_disable_nonboot_cpus_pre();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/*
 	 * We take down all of the non-boot CPUs in one shot to avoid races
@@ -1078,6 +1089,7 @@ void enable_nonboot_cpus(void)
 		return;
 	/* Allow everyone to use the CPU hotplug again */
 	cpu_maps_update_begin();
+	arch_enable_nonboot_cpus_pre();
 	WARN_ON(--cpu_hotplug_disabled < 0);
 	if (cpumask_empty(frozen_cpus))
 		goto out;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation
  2016-06-25 16:18 [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation Chen Yu
  2016-06-25 16:18 ` [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL Chen Yu
  2016-06-25 16:18 ` [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus Chen Yu
@ 2016-06-25 16:18 ` Chen Yu
  2016-06-25 16:53   ` Pavel Machek
  2016-06-25 16:19 ` [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when " Chen Yu
  3 siblings, 1 reply; 21+ messages in thread
From: Chen Yu @ 2016-06-25 16:18 UTC (permalink / raw)
  To: linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Chen Yu

Sometime we need to do some operations before resuming from
hibernation, so introduce a flag to indicate this stage.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 include/linux/suspend.h  | 7 +++++++
 kernel/power/hibernate.c | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8b6ec7e..422e87a 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -384,6 +384,12 @@ extern bool system_entering_hibernation(void);
 extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
+extern bool in_resume_hibernate;
+
+static inline bool hibernation_in_resume(void)
+{
+	return in_resume_hibernate;
+}
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
@@ -395,6 +401,7 @@ static inline void hibernation_set_ops(const struct platform_hibernation_ops *op
 static inline int hibernate(void) { return -ENOSYS; }
 static inline bool system_entering_hibernation(void) { return false; }
 static inline bool hibernation_available(void) { return false; }
+static inline bool hibernation_in_resume(void) { return false; }
 #endif /* CONFIG_HIBERNATION */
 
 /* Hibernation and suspend events */
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index fca9254..41be909 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -43,6 +43,7 @@ static char resume_file[256] = CONFIG_PM_STD_PARTITION;
 dev_t swsusp_resume_device;
 sector_t swsusp_resume_block;
 __visible int in_suspend __nosavedata;
+bool in_resume_hibernate;
 
 enum {
 	HIBERNATION_INVALID,
@@ -433,6 +434,7 @@ static int resume_target_kernel(bool platform_mode)
 	if (error)
 		goto Cleanup;
 
+	in_resume_hibernate = true;
 	error = disable_nonboot_cpus();
 	if (error)
 		goto Enable_cpus;
@@ -474,6 +476,7 @@ static int resume_target_kernel(bool platform_mode)
 	local_irq_enable();
 
  Enable_cpus:
+	in_resume_hibernate = false;
 	enable_nonboot_cpus();
 
  Cleanup:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-06-25 16:18 [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation Chen Yu
                   ` (2 preceding siblings ...)
  2016-06-25 16:18 ` [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation Chen Yu
@ 2016-06-25 16:19 ` Chen Yu
  2016-10-07 19:47   ` Andy Lutomirski
  3 siblings, 1 reply; 21+ messages in thread
From: Chen Yu @ 2016-06-25 16:19 UTC (permalink / raw)
  To: linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Chen Yu

Here's the story of what the problem is, why this
happened, and why this patch looks like this:

Stress test from Varun Koyyalagunta reports that, the
nonboot CPU would hang occasionally, when resuming from
hibernation. Further investigation shows that, the precise
stage when nonboot CPU hangs, is the time when the nonboot
CPU been woken up incorrectly, and tries to monitor the
mwait_ptr for the second time, then an exception is
triggered due to illegal vaddr access, say, something like,
'Unable to handler kernel address of 0xffff8800ba800010...'

Further investigation shows that, the exception is caused
by accessing a page without PRESENT flag, because the pte entry
for this vaddr is zero. Here's the scenario how this problem
happens: Page table for direct mapping is allocated dynamically
by kernel_physical_mapping_init, it is possible that in the
resume process, when the boot CPU is trying to write back pages
to their original address, and just right to writes to the monitor
mwait_ptr then wakes up one of the nonboot CPUs, since the page
table currently used by the nonboot CPU might not the same as it
is before the hibernation, an exception might occur due to
inconsistent page table.

First try is to get rid of this problem by changing the monitor
address from task.flag to zero page, because one one would write
to zero page. But this still have problem because of ping-pong
wake up situation in mwait_play_dead:

One possible implementation of a clflush is a read-invalidate snoop,
which is what a store might look like, so cflush might break the mwait.

1. CPU1 wait at zero page
2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
then the nonboot CPUs never sleep for long.

So it's better to monitor different address for each
nonboot CPUs, however since there is only one zero page, at most:
PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
on a x86_64, apparently it's not enough for servers, maybe more
zero pages are required.

So choose the solution as Brian suggested, to put the nonboot CPUs
into hlt before resuming. But Rafael has mentioned that, if some of
the CPUs have already been offline before hibernation, then the problem
is still there. So this patch tries to kick the already offline CPUs woken
up and fall into hlt, and then put the rest online CPUs into hlt.
In this way, all the nonboot CPUs will wait at a safe state,
without touching any memory during s/r. (It's not safe to modify
mwait_play_dead, because once previous offline CPUs are woken up,
it will either access text code, whose page table is not safe anymore
across hibernation, due to:
Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
rodata").

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-and-tested-by: Varun Koyyalagunta <cpudebug@centtech.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/smpboot.c | 15 +++++++++++++++
 include/linux/cpu.h       |  2 ++
 kernel/cpu.c              | 22 ++++++++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fafe8b9..86672be 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include <linux/stackprotector.h>
 #include <linux/gfp.h>
 #include <linux/cpuidle.h>
+#include <linux/suspend.h>

 #include <asm/acpi.h>
 #include <asm/desc.h>
@@ -1341,6 +1342,18 @@ void arch_enable_nonboot_cpus_end(void)
 	mtrr_aps_init();
 }

+void arch_disable_nonboot_cpus_pre(void)
+{
+	if (!hibernation_in_resume())
+		return;
+	kick_offline_cpus(1);
+}
+
+void arch_enable_nonboot_cpus_prev(void)
+{
+	kick_offline_cpus(0);
+}
+
 /*
  * Early setup to make printk work.
  */
@@ -1642,6 +1655,8 @@ void native_play_dead(void)
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);

+	if (hibernation_in_resume())
+		hlt_play_dead();
 	mwait_play_dead();	/* Only returns on failure */
 	if (cpuidle_play_dead())
 		hlt_play_dead();
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 21597dc..d408200 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -241,9 +241,11 @@ static inline void cpu_hotplug_done(void) {}
 #ifdef CONFIG_PM_SLEEP_SMP
 extern int disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
+extern int kick_offline_cpus(int first);
 #else /* !CONFIG_PM_SLEEP_SMP */
 static inline int disable_nonboot_cpus(void) { return 0; }
 static inline void enable_nonboot_cpus(void) {}
+static inline int kick_offline_cpus(int first) { return 0; }
 #endif /* !CONFIG_PM_SLEEP_SMP */

 void cpu_startup_entry(enum cpuhp_state state);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index ce6e5e4..538a382 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1016,6 +1016,25 @@ EXPORT_SYMBOL_GPL(cpu_up);

 #ifdef CONFIG_PM_SLEEP_SMP
 static cpumask_var_t frozen_cpus;
+static cpumask_var_t offline_cpus;
+
+int kick_offline_cpus(int first)
+{
+	int cpu;
+
+	if (offline_cpus == NULL)
+		return -ENOMEM;
+
+	if (first)
+		cpumask_andnot(offline_cpus, cpu_possible_mask, cpu_online_mask);
+
+	for_each_cpu(cpu, offline_cpus) {
+		_cpu_up(cpu, 1, CPUHP_ONLINE);
+		_cpu_down(cpu, 1, CPUHP_OFFLINE);
+	}
+
+	return 0;
+}

 void __weak arch_disable_nonboot_cpus_pre(void)
 {
@@ -1120,6 +1139,9 @@ static int __init alloc_frozen_cpus(void)
 {
 	if (!alloc_cpumask_var(&frozen_cpus, GFP_KERNEL|__GFP_ZERO))
 		return -ENOMEM;
+	/* No need to free frozen_cpus if failed. */
+	if (!alloc_cpumask_var(&offline_cpus, GFP_KERNEL|__GFP_ZERO))
+		return -ENOMEM;
 	return 0;
 }
 core_initcall(alloc_frozen_cpus);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL
  2016-06-25 16:18 ` [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL Chen Yu
@ 2016-06-25 16:51   ` Pavel Machek
  2016-06-26  1:16     ` Chen, Yu C
  2016-06-26  4:25     ` Chen, Yu C
  0 siblings, 2 replies; 21+ messages in thread
From: Pavel Machek @ 2016-06-25 16:51 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

On Sun 2016-06-26 00:18:30, Chen Yu wrote:
> frozen_cpus might be NULL if the allocation in previous
> alloc_frozen_cpus failed, when CONFIG_CPUMASK_OFFSTACK
> is set.
> 
> This patch avoid accessing this cpumask if it is NULL.
> 
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  kernel/cpu.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d948e44..d25266e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1021,6 +1021,8 @@ int disable_nonboot_cpus(void)
>  {
>  	int cpu, first_cpu, error = 0;
>  
> +	if (frozen_cpus == NULL)
> +		return -ENOMEM;
>  	cpu_maps_update_begin();
>  	first_cpu = cpumask_first(cpu_online_mask);
>  	/*

I'd say that whoever allocates frozen_cpus should just abort the
hibernation if there's not enough memory for the operation...? This
seems like checking for the problem too late.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-06-25 16:18 ` [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus Chen Yu
@ 2016-06-25 16:51   ` Pavel Machek
  2016-06-26  1:19     ` Chen, Yu C
  2016-10-07 19:31   ` Andy Lutomirski
  1 sibling, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2016-06-25 16:51 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

On Sun 2016-06-26 00:18:41, Chen Yu wrote:
> There is requirement that we need to do some arch-specific
> operations before putting the nonboot CPUs offline/online.
> One of the requirements comes from the hibernation resume
> process on x86_64, we need to kick all the offlin-CPUs
> online and offline again, in order to put them in a safe
> state, thus to avoid possible unwilling wake up during
> hibernation resume.

You may want to check english here.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation
  2016-06-25 16:18 ` [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation Chen Yu
@ 2016-06-25 16:53   ` Pavel Machek
  2016-06-26  1:34     ` Chen, Yu C
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2016-06-25 16:53 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

On Sun 2016-06-26 00:18:52, Chen Yu wrote:
> Sometime we need to do some operations before resuming from
> hibernation, so introduce a flag to indicate this stage.
> 
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  include/linux/suspend.h  | 7 +++++++
>  kernel/power/hibernate.c | 3 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 8b6ec7e..422e87a 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -384,6 +384,12 @@ extern bool system_entering_hibernation(void);
>  extern bool hibernation_available(void);
>  asmlinkage int swsusp_save(void);
>  extern struct pbe *restore_pblist;
> +extern bool in_resume_hibernate;

in_resume_hibernation? But it is pretty sad if we need another such
state function...
								Pavel

> +static inline bool hibernation_in_resume(void)
> +{
> +	return in_resume_hibernate;
> +}
>  #else /* CONFIG_HIBERNATION */
>  static inline void register_nosave_region(unsigned long b, unsigned long e) {}
>  static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
> @@ -395,6 +401,7 @@ static inline void hibernation_set_ops(const struct platform_hibernation_ops *op
>  static inline int hibernate(void) { return -ENOSYS; }
>  static inline bool system_entering_hibernation(void) { return false; }
>  static inline bool hibernation_available(void) { return false; }
> +static inline bool hibernation_in_resume(void) { return false; }
>  #endif /* CONFIG_HIBERNATION */
>  
>  /* Hibernation and suspend events */
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index fca9254..41be909 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -43,6 +43,7 @@ static char resume_file[256] = CONFIG_PM_STD_PARTITION;
>  dev_t swsusp_resume_device;
>  sector_t swsusp_resume_block;
>  __visible int in_suspend __nosavedata;
> +bool in_resume_hibernate;
>  
>  enum {
>  	HIBERNATION_INVALID,
> @@ -433,6 +434,7 @@ static int resume_target_kernel(bool platform_mode)
>  	if (error)
>  		goto Cleanup;
>  
> +	in_resume_hibernate = true;
>  	error = disable_nonboot_cpus();
>  	if (error)
>  		goto Enable_cpus;
> @@ -474,6 +476,7 @@ static int resume_target_kernel(bool platform_mode)
>  	local_irq_enable();
>  
>   Enable_cpus:
> +	in_resume_hibernate = false;
>  	enable_nonboot_cpus();
>  
>   Cleanup:

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL
  2016-06-25 16:51   ` Pavel Machek
@ 2016-06-26  1:16     ` Chen, Yu C
  2016-06-26  4:25     ` Chen, Yu C
  1 sibling, 0 replies; 21+ messages in thread
From: Chen, Yu C @ 2016-06-26  1:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

Hi,

> -----Original Message-----
> From: Pavel Machek [mailto:pavel@ucw.cz]
> Sent: Sunday, June 26, 2016 12:51 AM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; x86@kernel.org; Rafael J. Wysocki; Len Brown;
> Peter Zijlstra; H. Peter Anvin; Borislav Petkov; Brian Gerst; Thomas Gleixner;
> Ingo Molnar; Varun Koyyalagunta; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it
> is NULL
> 
> On Sun 2016-06-26 00:18:30, Chen Yu wrote:
> > frozen_cpus might be NULL if the allocation in previous
> > alloc_frozen_cpus failed, when CONFIG_CPUMASK_OFFSTACK is set.
> >
> > This patch avoid accessing this cpumask if it is NULL.
> >
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > ---
> >  kernel/cpu.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c index d948e44..d25266e 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1021,6 +1021,8 @@ int disable_nonboot_cpus(void)  {
> >  	int cpu, first_cpu, error = 0;
> >
> > +	if (frozen_cpus == NULL)
> > +		return -ENOMEM;
> >  	cpu_maps_update_begin();
> >  	first_cpu = cpumask_first(cpu_online_mask);
> >  	/*
> 
> I'd say that whoever allocates frozen_cpus should just abort the hibernation if
> there's not enough memory for the operation...? This seems like checking for
> the problem too late.
> 
The allocation of frozen_cpus is alloc_frozen_cpus, which is in core_initcall,
and do_initcall during boot up  seems not to care about the return value from 
these functions.  So I think either we add the check in disable_nonboot_cpus, or
we can set noresume = 1 and nohibernate = 1 if alloc_frozen_cpus fails.

thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-06-25 16:51   ` Pavel Machek
@ 2016-06-26  1:19     ` Chen, Yu C
  0 siblings, 0 replies; 21+ messages in thread
From: Chen, Yu C @ 2016-06-26  1:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel


Hi,


> -----Original Message-----
> From: Pavel Machek [mailto:pavel@ucw.cz]
> Sent: Sunday, June 26, 2016 12:52 AM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; x86@kernel.org; Rafael J. Wysocki; Len Brown;
> Peter Zijlstra; H. Peter Anvin; Borislav Petkov; Brian Gerst; Thomas Gleixner;
> Ingo Molnar; Varun Koyyalagunta; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for
> disable/enable nonboot cpus
> 
> On Sun 2016-06-26 00:18:41, Chen Yu wrote:
> > There is requirement that we need to do some arch-specific operations
> > before putting the nonboot CPUs offline/online.
> > One of the requirements comes from the hibernation resume process on
> > x86_64, we need to kick all the offlin-CPUs online and offline again,
> > in order to put them in a safe state, thus to avoid possible unwilling
> > wake up during hibernation resume.
> 
> You may want to check english here.
Ah, sorry, I'll rewrite the log, and [PATCH 4/4] might contain the detail of this issue.

thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation
  2016-06-25 16:53   ` Pavel Machek
@ 2016-06-26  1:34     ` Chen, Yu C
  0 siblings, 0 replies; 21+ messages in thread
From: Chen, Yu C @ 2016-06-26  1:34 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

Hi,

> -----Original Message-----
> From: Pavel Machek [mailto:pavel@ucw.cz]
> Sent: Sunday, June 26, 2016 12:53 AM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; x86@kernel.org; Rafael J. Wysocki; Len Brown;
> Peter Zijlstra; H. Peter Anvin; Borislav Petkov; Brian Gerst; Thomas Gleixner;
> Ingo Molnar; Varun Koyyalagunta; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate
> resuming from hibernation
> 
> On Sun 2016-06-26 00:18:52, Chen Yu wrote:
> > Sometime we need to do some operations before resuming from
> > hibernation, so introduce a flag to indicate this stage.
> >
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > ---
> >  include/linux/suspend.h  | 7 +++++++
> >  kernel/power/hibernate.c | 3 +++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/include/linux/suspend.h b/include/linux/suspend.h index
> > 8b6ec7e..422e87a 100644
> > --- a/include/linux/suspend.h
> > +++ b/include/linux/suspend.h
> > @@ -384,6 +384,12 @@ extern bool system_entering_hibernation(void);
> >  extern bool hibernation_available(void);  asmlinkage int
> > swsusp_save(void);  extern struct pbe *restore_pblist;
> > +extern bool in_resume_hibernate;
> 
> in_resume_hibernation? But it is pretty sad if we need another such
> state function...
At first I tried not to add any new flags, and just let all the CPUs fall into hlt if it is
triggered by disable_nonboot_cpus, but as depicted in [PATCH 4/4], 
there is a scenario that,   what about the CPUs already  offline before
the hibernation? They might be in deep cstate by mwait, and mwait is not
safe during resume.
We have to kick all the offline CPUs online again, and then
put them offline to hlt state, thus to avoid the unsafe memory access during resume.
(The memory mapping for text code is not safe during resume, so it's hard to fix it
in mwait_play_dead, and the kick mechanism seems to be a suitable choice).
In order to restrict the impact of the kick mechanism to a minimal scope, I used this
flag to make this  mechanism only work for hibernation resume phase.

thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL
  2016-06-25 16:51   ` Pavel Machek
  2016-06-26  1:16     ` Chen, Yu C
@ 2016-06-26  4:25     ` Chen, Yu C
  1 sibling, 0 replies; 21+ messages in thread
From: Chen, Yu C @ 2016-06-26  4:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel


> -----Original Message-----
> From: Chen, Yu C
> Sent: Sunday, June 26, 2016 9:17 AM
> To: 'Pavel Machek'
> Cc: linux-pm@vger.kernel.org; x86@kernel.org; Rafael J. Wysocki; Len Brown;
> Peter Zijlstra; H. Peter Anvin; Borislav Petkov; Brian Gerst; Thomas Gleixner;
> Ingo Molnar; Varun Koyyalagunta; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it
> is NULL
> 
> Hi,
> 
> > -----Original Message-----
> > From: Pavel Machek [mailto:pavel@ucw.cz]
> > Sent: Sunday, June 26, 2016 12:51 AM
> > To: Chen, Yu C
> > Cc: linux-pm@vger.kernel.org; x86@kernel.org; Rafael J. Wysocki; Len
> > Brown; Peter Zijlstra; H. Peter Anvin; Borislav Petkov; Brian Gerst;
> > Thomas Gleixner; Ingo Molnar; Varun Koyyalagunta;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing
> > frozen_cpus if it is NULL
> >
[cut]
> >
> > I'd say that whoever allocates frozen_cpus should just abort the
> > hibernation if there's not enough memory for the operation...? This
> > seems like checking for the problem too late.
> >
> The allocation of frozen_cpus is alloc_frozen_cpus, which is in core_initcall, and
> do_initcall during boot up  seems not to care about the return value from these
> functions.  So I think either we add the check in disable_nonboot_cpus, or we
> can set noresume = 1 and nohibernate = 1 if alloc_frozen_cpus fails.
> 
Besides, disable_nonboot_cpus is not only used for hibernation, but also for
suspend to ram, kexec, etc, so it might be better to check before we use
it, thus to avoid introducing new flags IMO.

thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-06-25 16:18 ` [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus Chen Yu
  2016-06-25 16:51   ` Pavel Machek
@ 2016-10-07 19:31   ` Andy Lutomirski
  2016-10-08 16:58     ` Chen Yu
  2016-10-11 15:42     ` Pavel Machek
  1 sibling, 2 replies; 21+ messages in thread
From: Andy Lutomirski @ 2016-10-07 19:31 UTC (permalink / raw)
  To: Chen Yu, linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel

On 06/25/2016 09:18 AM, Chen Yu wrote:
> There is requirement that we need to do some arch-specific
> operations before putting the nonboot CPUs offline/online.
> One of the requirements comes from the hibernation resume
> process on x86_64, we need to kick all the offlin-CPUs
> online and offline again, in order to put them in a safe
> state, thus to avoid possible unwilling wake up during
> hibernation resume.
>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  kernel/cpu.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d25266e..ce6e5e4 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1017,6 +1017,16 @@ EXPORT_SYMBOL_GPL(cpu_up);
>  #ifdef CONFIG_PM_SLEEP_SMP
>  static cpumask_var_t frozen_cpus;
>
> +void __weak arch_disable_nonboot_cpus_pre(void)

I don't like using __weak.  It penalizes code size on architectures that 
don't hook these functions.  My preferred pattern is:

include/linux/something.h:

#include <asm/something.h>

#ifndef arch_do_xyz
static inline void arch_do_xyz() {}
#endif

arch/whatever/asm/something.h:

extern void arch_do_xyz();  /* or static inline... */
#define arch_do_xyz


This is totally free for architectures that don't have the hooks and it 
can potentially be inlined on architectures that do have the hooks. 
Everyone wins except that it's about five additional lines of code.

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-06-25 16:19 ` [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when " Chen Yu
@ 2016-10-07 19:47   ` Andy Lutomirski
  2016-10-08 10:31     ` Rafael J. Wysocki
  2016-10-08 16:54     ` Chen Yu
  0 siblings, 2 replies; 21+ messages in thread
From: Andy Lutomirski @ 2016-10-07 19:47 UTC (permalink / raw)
  To: Chen Yu, linux-pm, x86
  Cc: Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, linux-kernel, Borislav Petkov

On 06/25/2016 09:19 AM, Chen Yu wrote:
> Here's the story of what the problem is, why this
> happened, and why this patch looks like this:
>
> Stress test from Varun Koyyalagunta reports that, the
> nonboot CPU would hang occasionally, when resuming from
> hibernation. Further investigation shows that, the precise
> stage when nonboot CPU hangs, is the time when the nonboot
> CPU been woken up incorrectly, and tries to monitor the
> mwait_ptr for the second time, then an exception is
> triggered due to illegal vaddr access, say, something like,
> 'Unable to handler kernel address of 0xffff8800ba800010...'
>
> Further investigation shows that, the exception is caused
> by accessing a page without PRESENT flag, because the pte entry
> for this vaddr is zero. Here's the scenario how this problem
> happens: Page table for direct mapping is allocated dynamically
> by kernel_physical_mapping_init, it is possible that in the
> resume process, when the boot CPU is trying to write back pages
> to their original address, and just right to writes to the monitor
> mwait_ptr then wakes up one of the nonboot CPUs, since the page
> table currently used by the nonboot CPU might not the same as it
> is before the hibernation, an exception might occur due to
> inconsistent page table.
>
> First try is to get rid of this problem by changing the monitor
> address from task.flag to zero page, because one one would write
> to zero page. But this still have problem because of ping-pong
> wake up situation in mwait_play_dead:
>
> One possible implementation of a clflush is a read-invalidate snoop,
> which is what a store might look like, so cflush might break the mwait.
>
> 1. CPU1 wait at zero page
> 2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
> 3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
> then the nonboot CPUs never sleep for long.
>
> So it's better to monitor different address for each
> nonboot CPUs, however since there is only one zero page, at most:
> PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
> on a x86_64, apparently it's not enough for servers, maybe more
> zero pages are required.
>
> So choose the solution as Brian suggested, to put the nonboot CPUs
> into hlt before resuming. But Rafael has mentioned that, if some of
> the CPUs have already been offline before hibernation, then the problem
> is still there. So this patch tries to kick the already offline CPUs woken
> up and fall into hlt, and then put the rest online CPUs into hlt.
> In this way, all the nonboot CPUs will wait at a safe state,
> without touching any memory during s/r. (It's not safe to modify
> mwait_play_dead, because once previous offline CPUs are woken up,
> it will either access text code, whose page table is not safe anymore
> across hibernation, due to:
> Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
> rodata").
>

I realize I'm extremely late to the party, but I must admit that I don't 
get it.  Sure, hibernation resume can spuriously wake the non-boot CPU, 
but at some point it has to wake up for real.  What ensures that the 
text it was running (native_play_dead or whatever) is still there when 
it wakes up?

Or does the hibernation resume code actually send the remote CPU an 
INIT-SIPI sequence a la wakeup_secondary_cpu_via_init()?  If so, this 
seems a bit odd to me.  Shouldn't we kick the CPU all the way to the 
wait-for-SIPI state rather than getting it to play dead via hlt or 
mwait?  Or perhaps there should be a real hibernation trampoline: 
allocate a safe page, have all CPUs jump there and spin (via hlt or 
mwait), do the resume, and then kick the other CPUs awake and have them 
jump back to real code.

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-10-07 19:47   ` Andy Lutomirski
@ 2016-10-08 10:31     ` Rafael J. Wysocki
  2016-10-16 16:50       ` Andy Lutomirski
  2016-10-08 16:54     ` Chen Yu
  1 sibling, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2016-10-08 10:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Chen Yu, Linux PM, the arch/x86 maintainers, Rafael J. Wysocki,
	Len Brown, Peter Zijlstra, H. Peter Anvin, Borislav Petkov,
	Pavel Machek, Brian Gerst, Thomas Gleixner, Ingo Molnar,
	Varun Koyyalagunta, Linux Kernel Mailing List, Borislav Petkov

On Fri, Oct 7, 2016 at 9:47 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On 06/25/2016 09:19 AM, Chen Yu wrote:
>>
>> Here's the story of what the problem is, why this
>> happened, and why this patch looks like this:
>>
>> Stress test from Varun Koyyalagunta reports that, the
>> nonboot CPU would hang occasionally, when resuming from
>> hibernation. Further investigation shows that, the precise
>> stage when nonboot CPU hangs, is the time when the nonboot
>> CPU been woken up incorrectly, and tries to monitor the
>> mwait_ptr for the second time, then an exception is
>> triggered due to illegal vaddr access, say, something like,
>> 'Unable to handler kernel address of 0xffff8800ba800010...'
>>
>> Further investigation shows that, the exception is caused
>> by accessing a page without PRESENT flag, because the pte entry
>> for this vaddr is zero. Here's the scenario how this problem
>> happens: Page table for direct mapping is allocated dynamically
>> by kernel_physical_mapping_init, it is possible that in the
>> resume process, when the boot CPU is trying to write back pages
>> to their original address, and just right to writes to the monitor
>> mwait_ptr then wakes up one of the nonboot CPUs, since the page
>> table currently used by the nonboot CPU might not the same as it
>> is before the hibernation, an exception might occur due to
>> inconsistent page table.
>>
>> First try is to get rid of this problem by changing the monitor
>> address from task.flag to zero page, because one one would write
>> to zero page. But this still have problem because of ping-pong
>> wake up situation in mwait_play_dead:
>>
>> One possible implementation of a clflush is a read-invalidate snoop,
>> which is what a store might look like, so cflush might break the mwait.
>>
>> 1. CPU1 wait at zero page
>> 2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
>> 3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
>> then the nonboot CPUs never sleep for long.
>>
>> So it's better to monitor different address for each
>> nonboot CPUs, however since there is only one zero page, at most:
>> PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
>> on a x86_64, apparently it's not enough for servers, maybe more
>> zero pages are required.
>>
>> So choose the solution as Brian suggested, to put the nonboot CPUs
>> into hlt before resuming. But Rafael has mentioned that, if some of
>> the CPUs have already been offline before hibernation, then the problem
>> is still there. So this patch tries to kick the already offline CPUs woken
>> up and fall into hlt, and then put the rest online CPUs into hlt.
>> In this way, all the nonboot CPUs will wait at a safe state,
>> without touching any memory during s/r. (It's not safe to modify
>> mwait_play_dead, because once previous offline CPUs are woken up,
>> it will either access text code, whose page table is not safe anymore
>> across hibernation, due to:
>> Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
>> rodata").
>>
>
> I realize I'm extremely late to the party, but I must admit that I don't get
> it.  Sure, hibernation resume can spuriously wake the non-boot CPU, but at
> some point it has to wake up for real.

You mean during resume?  We reinit from scratch then.

> What ensures that the text it was
> running (native_play_dead or whatever) is still there when it wakes up?
>
> Or does the hibernation resume code actually send the remote CPU an
> INIT-SIPI sequence a la wakeup_secondary_cpu_via_init()?

That's what happens AFAICS.

> If so, this seems
> a bit odd to me.  Shouldn't we kick the CPU all the way to the wait-for-SIPI
> state rather than getting it to play dead via hlt or mwait?

We could do that.  It would be a bit cleaner than using the "hlt play
dead" thing, but the practical difference would be very small (if
observable at all).

> Or perhaps
> there should be a real hibernation trampoline: allocate a safe page, have
> all CPUs jump there and spin (via hlt or mwait), do the resume, and then
> kick the other CPUs awake and have them jump back to real code.

That would require extra synchronization between the image and restore
kernels (the trampoline would have to be the same virtual address and
the same code in both), so I'd rather not do that if not absolutely
necessary.

The wait-for-SIPI idea seems viable to me though.  Do we have code to
do that already?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-10-07 19:47   ` Andy Lutomirski
  2016-10-08 10:31     ` Rafael J. Wysocki
@ 2016-10-08 16:54     ` Chen Yu
  1 sibling, 0 replies; 21+ messages in thread
From: Chen Yu @ 2016-10-08 16:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Pavel Machek, Brian Gerst,
	Thomas Gleixner, Ingo Molnar, Varun Koyyalagunta, linux-kernel,
	Borislav Petkov

Hi Andy,
On Fri, Oct 07, 2016 at 12:47:24PM -0700, Andy Lutomirski wrote:
> On 06/25/2016 09:19 AM, Chen Yu wrote:
> >Here's the story of what the problem is, why this
> >happened, and why this patch looks like this:
> >
> >Stress test from Varun Koyyalagunta reports that, the
> >nonboot CPU would hang occasionally, when resuming from
> >hibernation. Further investigation shows that, the precise
> >stage when nonboot CPU hangs, is the time when the nonboot
> >CPU been woken up incorrectly, and tries to monitor the
> >mwait_ptr for the second time, then an exception is
> >triggered due to illegal vaddr access, say, something like,
> >'Unable to handler kernel address of 0xffff8800ba800010...'
> >
> >Further investigation shows that, the exception is caused
> >by accessing a page without PRESENT flag, because the pte entry
> >for this vaddr is zero. Here's the scenario how this problem
> >happens: Page table for direct mapping is allocated dynamically
> >by kernel_physical_mapping_init, it is possible that in the
> >resume process, when the boot CPU is trying to write back pages
> >to their original address, and just right to writes to the monitor
> >mwait_ptr then wakes up one of the nonboot CPUs, since the page
> >table currently used by the nonboot CPU might not the same as it
> >is before the hibernation, an exception might occur due to
> >inconsistent page table.
> >
> >First try is to get rid of this problem by changing the monitor
> >address from task.flag to zero page, because one one would write
> >to zero page. But this still have problem because of ping-pong
> >wake up situation in mwait_play_dead:
> >
> >One possible implementation of a clflush is a read-invalidate snoop,
> >which is what a store might look like, so cflush might break the mwait.
> >
> >1. CPU1 wait at zero page
> >2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
> >3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
> >then the nonboot CPUs never sleep for long.
> >
> >So it's better to monitor different address for each
> >nonboot CPUs, however since there is only one zero page, at most:
> >PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
> >on a x86_64, apparently it's not enough for servers, maybe more
> >zero pages are required.
> >
> >So choose the solution as Brian suggested, to put the nonboot CPUs
> >into hlt before resuming. But Rafael has mentioned that, if some of
> >the CPUs have already been offline before hibernation, then the problem
> >is still there. So this patch tries to kick the already offline CPUs woken
> >up and fall into hlt, and then put the rest online CPUs into hlt.
> >In this way, all the nonboot CPUs will wait at a safe state,
> >without touching any memory during s/r. (It's not safe to modify
> >mwait_play_dead, because once previous offline CPUs are woken up,
> >it will either access text code, whose page table is not safe anymore
> >across hibernation, due to:
> >Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
> >rodata").
> >
> 
> I realize I'm extremely late to the party, but I must admit that I don't get
> it.  Sure, hibernation resume can spuriously wake the non-boot CPU, but at
> some point it has to wake up for real.  What ensures that the text it was
> running (native_play_dead or whatever) is still there when it wakes up?
BTW, If I understand correctly, when the nonboot CPUs are woken up from hlt by
INIT,INIT,STARTUP sequence, these CPUs do not need the text mapping, as Rafael
mentioned, they are reinitialized from scratch(react in realmode after received
STARTUP).
> 
> --Andy
Thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-10-07 19:31   ` Andy Lutomirski
@ 2016-10-08 16:58     ` Chen Yu
  2016-10-11 15:42     ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Chen Yu @ 2016-10-08 16:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-pm, x86, Rafael J. Wysocki, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Pavel Machek, Brian Gerst,
	Thomas Gleixner, Ingo Molnar, Varun Koyyalagunta, linux-kernel

Hi Andy,
On Fri, Oct 07, 2016 at 12:31:41PM -0700, Andy Lutomirski wrote:
> On 06/25/2016 09:18 AM, Chen Yu wrote:
> >There is requirement that we need to do some arch-specific
> >operations before putting the nonboot CPUs offline/online.
> >One of the requirements comes from the hibernation resume
> >process on x86_64, we need to kick all the offlin-CPUs
> >online and offline again, in order to put them in a safe
> >state, thus to avoid possible unwilling wake up during
> >hibernation resume.
> >
> >Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> >---
> > kernel/cpu.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> >diff --git a/kernel/cpu.c b/kernel/cpu.c
> >index d25266e..ce6e5e4 100644
> >--- a/kernel/cpu.c
> >+++ b/kernel/cpu.c
> >@@ -1017,6 +1017,16 @@ EXPORT_SYMBOL_GPL(cpu_up);
> > #ifdef CONFIG_PM_SLEEP_SMP
> > static cpumask_var_t frozen_cpus;
> >
> >+void __weak arch_disable_nonboot_cpus_pre(void)
> 
> I don't like using __weak.  It penalizes code size on architectures that
> don't hook these functions.  My preferred pattern is:
> 
> include/linux/something.h:
> 
> #include <asm/something.h>
> 
> #ifndef arch_do_xyz
> static inline void arch_do_xyz() {}
> #endif
> 
> arch/whatever/asm/something.h:
> 
> extern void arch_do_xyz();  /* or static inline... */
> #define arch_do_xyz
> 
> 
> This is totally free for architectures that don't have the hooks and it can
> potentially be inlined on architectures that do have the hooks. Everyone
> wins except that it's about five additional lines of code.
> 
Thanks for your guidance, this patch was deprecated previously, but I'll
take your advice the next time when writing similar patch.  
> --Andy
Thanks,
Yu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus
  2016-10-07 19:31   ` Andy Lutomirski
  2016-10-08 16:58     ` Chen Yu
@ 2016-10-11 15:42     ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2016-10-11 15:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Chen Yu, linux-pm, x86, Rafael J. Wysocki, Len Brown,
	Peter Zijlstra, H. Peter Anvin, Borislav Petkov, Brian Gerst,
	Thomas Gleixner, Ingo Molnar, Varun Koyyalagunta, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1800 bytes --]

On Fri 2016-10-07 12:31:41, Andy Lutomirski wrote:
> On 06/25/2016 09:18 AM, Chen Yu wrote:
> >There is requirement that we need to do some arch-specific
> >operations before putting the nonboot CPUs offline/online.
> >One of the requirements comes from the hibernation resume
> >process on x86_64, we need to kick all the offlin-CPUs
> >online and offline again, in order to put them in a safe
> >state, thus to avoid possible unwilling wake up during
> >hibernation resume.
> >
> >Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> >---
> > kernel/cpu.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> >diff --git a/kernel/cpu.c b/kernel/cpu.c
> >index d25266e..ce6e5e4 100644
> >--- a/kernel/cpu.c
> >+++ b/kernel/cpu.c
> >@@ -1017,6 +1017,16 @@ EXPORT_SYMBOL_GPL(cpu_up);
> > #ifdef CONFIG_PM_SLEEP_SMP
> > static cpumask_var_t frozen_cpus;
> >
> >+void __weak arch_disable_nonboot_cpus_pre(void)
> 
> I don't like using __weak.  It penalizes code size on architectures that
> don't hook these functions.  My preferred pattern is:
> 
> include/linux/something.h:
> 
> #include <asm/something.h>
> 
> #ifndef arch_do_xyz
> static inline void arch_do_xyz() {}
> #endif
> 
> arch/whatever/asm/something.h:
> 
> extern void arch_do_xyz();  /* or static inline... */
> #define arch_do_xyz
> 
> 
> This is totally free for architectures that don't have the hooks and it can
> potentially be inlined on architectures that do have the hooks. Everyone
> wins except that it's about five additional lines of code.

Well... 5 additional lines may be worse than few bytes in the object
file...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-10-08 10:31     ` Rafael J. Wysocki
@ 2016-10-16 16:50       ` Andy Lutomirski
  2016-10-18  0:30         ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2016-10-16 16:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Lutomirski, Chen Yu, Linux PM, the arch/x86 maintainers,
	Rafael J. Wysocki, Len Brown, Peter Zijlstra, H. Peter Anvin,
	Borislav Petkov, Pavel Machek, Brian Gerst, Thomas Gleixner,
	Ingo Molnar, Varun Koyyalagunta, Linux Kernel Mailing List,
	Borislav Petkov

On Sat, Oct 8, 2016 at 3:31 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Oct 7, 2016 at 9:47 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On 06/25/2016 09:19 AM, Chen Yu wrote:
>>>
>>> Here's the story of what the problem is, why this
>>> happened, and why this patch looks like this:
>>>
>>> Stress test from Varun Koyyalagunta reports that, the
>>> nonboot CPU would hang occasionally, when resuming from
>>> hibernation. Further investigation shows that, the precise
>>> stage when nonboot CPU hangs, is the time when the nonboot
>>> CPU been woken up incorrectly, and tries to monitor the
>>> mwait_ptr for the second time, then an exception is
>>> triggered due to illegal vaddr access, say, something like,
>>> 'Unable to handler kernel address of 0xffff8800ba800010...'
>>>
>>> Further investigation shows that, the exception is caused
>>> by accessing a page without PRESENT flag, because the pte entry
>>> for this vaddr is zero. Here's the scenario how this problem
>>> happens: Page table for direct mapping is allocated dynamically
>>> by kernel_physical_mapping_init, it is possible that in the
>>> resume process, when the boot CPU is trying to write back pages
>>> to their original address, and just right to writes to the monitor
>>> mwait_ptr then wakes up one of the nonboot CPUs, since the page
>>> table currently used by the nonboot CPU might not the same as it
>>> is before the hibernation, an exception might occur due to
>>> inconsistent page table.
>>>
>>> First try is to get rid of this problem by changing the monitor
>>> address from task.flag to zero page, because one one would write
>>> to zero page. But this still have problem because of ping-pong
>>> wake up situation in mwait_play_dead:
>>>
>>> One possible implementation of a clflush is a read-invalidate snoop,
>>> which is what a store might look like, so cflush might break the mwait.
>>>
>>> 1. CPU1 wait at zero page
>>> 2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
>>> 3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
>>> then the nonboot CPUs never sleep for long.
>>>
>>> So it's better to monitor different address for each
>>> nonboot CPUs, however since there is only one zero page, at most:
>>> PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
>>> on a x86_64, apparently it's not enough for servers, maybe more
>>> zero pages are required.
>>>
>>> So choose the solution as Brian suggested, to put the nonboot CPUs
>>> into hlt before resuming. But Rafael has mentioned that, if some of
>>> the CPUs have already been offline before hibernation, then the problem
>>> is still there. So this patch tries to kick the already offline CPUs woken
>>> up and fall into hlt, and then put the rest online CPUs into hlt.
>>> In this way, all the nonboot CPUs will wait at a safe state,
>>> without touching any memory during s/r. (It's not safe to modify
>>> mwait_play_dead, because once previous offline CPUs are woken up,
>>> it will either access text code, whose page table is not safe anymore
>>> across hibernation, due to:
>>> Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
>>> rodata").
>>>
>>
>> I realize I'm extremely late to the party, but I must admit that I don't get
>> it.  Sure, hibernation resume can spuriously wake the non-boot CPU, but at
>> some point it has to wake up for real.
>
> You mean during resume?  We reinit from scratch then.
>
>> What ensures that the text it was
>> running (native_play_dead or whatever) is still there when it wakes up?
>>
>> Or does the hibernation resume code actually send the remote CPU an
>> INIT-SIPI sequence a la wakeup_secondary_cpu_via_init()?
>
> That's what happens AFAICS.
>
>> If so, this seems
>> a bit odd to me.  Shouldn't we kick the CPU all the way to the wait-for-SIPI
>> state rather than getting it to play dead via hlt or mwait?
>
> We could do that.  It would be a bit cleaner than using the "hlt play
> dead" thing, but the practical difference would be very small (if
> observable at all).

Probably true.  It might be worth changing the "hlt" path to something like:

asm volatile ("hlt");
WARN(1, "CPU woke directly from halt-for-resume -- should have been
woken by SIPI\n");

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-10-16 16:50       ` Andy Lutomirski
@ 2016-10-18  0:30         ` Rafael J. Wysocki
  2016-10-18  1:21           ` Andy Lutomirski
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2016-10-18  0:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rafael J. Wysocki, Andy Lutomirski, Chen Yu, Linux PM,
	the arch/x86 maintainers, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Pavel Machek, Brian Gerst,
	Thomas Gleixner, Ingo Molnar, Varun Koyyalagunta,
	Linux Kernel Mailing List, Borislav Petkov

On Sunday, October 16, 2016 09:50:23 AM Andy Lutomirski wrote:
> On Sat, Oct 8, 2016 at 3:31 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Fri, Oct 7, 2016 at 9:47 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> On 06/25/2016 09:19 AM, Chen Yu wrote:
> >>>
> >>> Here's the story of what the problem is, why this
> >>> happened, and why this patch looks like this:
> >>>
> >>> Stress test from Varun Koyyalagunta reports that, the
> >>> nonboot CPU would hang occasionally, when resuming from
> >>> hibernation. Further investigation shows that, the precise
> >>> stage when nonboot CPU hangs, is the time when the nonboot
> >>> CPU been woken up incorrectly, and tries to monitor the
> >>> mwait_ptr for the second time, then an exception is
> >>> triggered due to illegal vaddr access, say, something like,
> >>> 'Unable to handler kernel address of 0xffff8800ba800010...'
> >>>
> >>> Further investigation shows that, the exception is caused
> >>> by accessing a page without PRESENT flag, because the pte entry
> >>> for this vaddr is zero. Here's the scenario how this problem
> >>> happens: Page table for direct mapping is allocated dynamically
> >>> by kernel_physical_mapping_init, it is possible that in the
> >>> resume process, when the boot CPU is trying to write back pages
> >>> to their original address, and just right to writes to the monitor
> >>> mwait_ptr then wakes up one of the nonboot CPUs, since the page
> >>> table currently used by the nonboot CPU might not the same as it
> >>> is before the hibernation, an exception might occur due to
> >>> inconsistent page table.
> >>>
> >>> First try is to get rid of this problem by changing the monitor
> >>> address from task.flag to zero page, because one one would write
> >>> to zero page. But this still have problem because of ping-pong
> >>> wake up situation in mwait_play_dead:
> >>>
> >>> One possible implementation of a clflush is a read-invalidate snoop,
> >>> which is what a store might look like, so cflush might break the mwait.
> >>>
> >>> 1. CPU1 wait at zero page
> >>> 2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
> >>> 3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
> >>> then the nonboot CPUs never sleep for long.
> >>>
> >>> So it's better to monitor different address for each
> >>> nonboot CPUs, however since there is only one zero page, at most:
> >>> PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
> >>> on a x86_64, apparently it's not enough for servers, maybe more
> >>> zero pages are required.
> >>>
> >>> So choose the solution as Brian suggested, to put the nonboot CPUs
> >>> into hlt before resuming. But Rafael has mentioned that, if some of
> >>> the CPUs have already been offline before hibernation, then the problem
> >>> is still there. So this patch tries to kick the already offline CPUs woken
> >>> up and fall into hlt, and then put the rest online CPUs into hlt.
> >>> In this way, all the nonboot CPUs will wait at a safe state,
> >>> without touching any memory during s/r. (It's not safe to modify
> >>> mwait_play_dead, because once previous offline CPUs are woken up,
> >>> it will either access text code, whose page table is not safe anymore
> >>> across hibernation, due to:
> >>> Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
> >>> rodata").
> >>>
> >>
> >> I realize I'm extremely late to the party, but I must admit that I don't get
> >> it.  Sure, hibernation resume can spuriously wake the non-boot CPU, but at
> >> some point it has to wake up for real.
> >
> > You mean during resume?  We reinit from scratch then.
> >
> >> What ensures that the text it was
> >> running (native_play_dead or whatever) is still there when it wakes up?
> >>
> >> Or does the hibernation resume code actually send the remote CPU an
> >> INIT-SIPI sequence a la wakeup_secondary_cpu_via_init()?
> >
> > That's what happens AFAICS.
> >
> >> If so, this seems
> >> a bit odd to me.  Shouldn't we kick the CPU all the way to the wait-for-SIPI
> >> state rather than getting it to play dead via hlt or mwait?
> >
> > We could do that.  It would be a bit cleaner than using the "hlt play
> > dead" thing, but the practical difference would be very small (if
> > observable at all).
> 
> Probably true.  It might be worth changing the "hlt" path to something like:
> 
> asm volatile ("hlt");
> WARN(1, "CPU woke directly from halt-for-resume -- should have been
> woken by SIPI\n");

The visibility of that warning would be sort of limited, though, because the
only case it might show up is when the system went belly up due to an unhandled
page fault.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when resuming from hibernation
  2016-10-18  0:30         ` Rafael J. Wysocki
@ 2016-10-18  1:21           ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2016-10-18  1:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Andy Lutomirski, Chen Yu, Linux PM,
	the arch/x86 maintainers, Len Brown, Peter Zijlstra,
	H. Peter Anvin, Borislav Petkov, Pavel Machek, Brian Gerst,
	Thomas Gleixner, Ingo Molnar, Varun Koyyalagunta,
	Linux Kernel Mailing List, Borislav Petkov

On Mon, Oct 17, 2016 at 5:30 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Sunday, October 16, 2016 09:50:23 AM Andy Lutomirski wrote:
>> On Sat, Oct 8, 2016 at 3:31 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> > On Fri, Oct 7, 2016 at 9:47 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> On 06/25/2016 09:19 AM, Chen Yu wrote:
>> >>>
>> >>> Here's the story of what the problem is, why this
>> >>> happened, and why this patch looks like this:
>> >>>
>> >>> Stress test from Varun Koyyalagunta reports that, the
>> >>> nonboot CPU would hang occasionally, when resuming from
>> >>> hibernation. Further investigation shows that, the precise
>> >>> stage when nonboot CPU hangs, is the time when the nonboot
>> >>> CPU been woken up incorrectly, and tries to monitor the
>> >>> mwait_ptr for the second time, then an exception is
>> >>> triggered due to illegal vaddr access, say, something like,
>> >>> 'Unable to handler kernel address of 0xffff8800ba800010...'
>> >>>
>> >>> Further investigation shows that, the exception is caused
>> >>> by accessing a page without PRESENT flag, because the pte entry
>> >>> for this vaddr is zero. Here's the scenario how this problem
>> >>> happens: Page table for direct mapping is allocated dynamically
>> >>> by kernel_physical_mapping_init, it is possible that in the
>> >>> resume process, when the boot CPU is trying to write back pages
>> >>> to their original address, and just right to writes to the monitor
>> >>> mwait_ptr then wakes up one of the nonboot CPUs, since the page
>> >>> table currently used by the nonboot CPU might not the same as it
>> >>> is before the hibernation, an exception might occur due to
>> >>> inconsistent page table.
>> >>>
>> >>> First try is to get rid of this problem by changing the monitor
>> >>> address from task.flag to zero page, because one one would write
>> >>> to zero page. But this still have problem because of ping-pong
>> >>> wake up situation in mwait_play_dead:
>> >>>
>> >>> One possible implementation of a clflush is a read-invalidate snoop,
>> >>> which is what a store might look like, so cflush might break the mwait.
>> >>>
>> >>> 1. CPU1 wait at zero page
>> >>> 2. CPU2 cflush zero page, wake CPU1 up, then CPU2 waits at zero page
>> >>> 3. CPU1 is woken up, and invoke cflush zero page, thus wake up CPU2 again.
>> >>> then the nonboot CPUs never sleep for long.
>> >>>
>> >>> So it's better to monitor different address for each
>> >>> nonboot CPUs, however since there is only one zero page, at most:
>> >>> PAGE_SIZE/L1_CACHE_LINE CPUs are satisfied, which is usually 64
>> >>> on a x86_64, apparently it's not enough for servers, maybe more
>> >>> zero pages are required.
>> >>>
>> >>> So choose the solution as Brian suggested, to put the nonboot CPUs
>> >>> into hlt before resuming. But Rafael has mentioned that, if some of
>> >>> the CPUs have already been offline before hibernation, then the problem
>> >>> is still there. So this patch tries to kick the already offline CPUs woken
>> >>> up and fall into hlt, and then put the rest online CPUs into hlt.
>> >>> In this way, all the nonboot CPUs will wait at a safe state,
>> >>> without touching any memory during s/r. (It's not safe to modify
>> >>> mwait_play_dead, because once previous offline CPUs are woken up,
>> >>> it will either access text code, whose page table is not safe anymore
>> >>> across hibernation, due to:
>> >>> Commit ab76f7b4ab23 ("x86/mm: Set NX on gap between __ex_table and
>> >>> rodata").
>> >>>
>> >>
>> >> I realize I'm extremely late to the party, but I must admit that I don't get
>> >> it.  Sure, hibernation resume can spuriously wake the non-boot CPU, but at
>> >> some point it has to wake up for real.
>> >
>> > You mean during resume?  We reinit from scratch then.
>> >
>> >> What ensures that the text it was
>> >> running (native_play_dead or whatever) is still there when it wakes up?
>> >>
>> >> Or does the hibernation resume code actually send the remote CPU an
>> >> INIT-SIPI sequence a la wakeup_secondary_cpu_via_init()?
>> >
>> > That's what happens AFAICS.
>> >
>> >> If so, this seems
>> >> a bit odd to me.  Shouldn't we kick the CPU all the way to the wait-for-SIPI
>> >> state rather than getting it to play dead via hlt or mwait?
>> >
>> > We could do that.  It would be a bit cleaner than using the "hlt play
>> > dead" thing, but the practical difference would be very small (if
>> > observable at all).
>>
>> Probably true.  It might be worth changing the "hlt" path to something like:
>>
>> asm volatile ("hlt");
>> WARN(1, "CPU woke directly from halt-for-resume -- should have been
>> woken by SIPI\n");
>
> The visibility of that warning would be sort of limited, though, because the
> only case it might show up is when the system went belly up due to an unhandled
> page fault.

Righto.

Maybe at least add a comment that the hlt isn't intended to ever
resume on the next instruction?

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-10-18  1:22 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-25 16:18 [PATCH 0/4][RFC v2] x86, hotplug: Use hlt instead of mwait when resuming from hibernation Chen Yu
2016-06-25 16:18 ` [PATCH 1/4][RFC v2] PM / sleep: Avoid accessing frozen_cpus if it is NULL Chen Yu
2016-06-25 16:51   ` Pavel Machek
2016-06-26  1:16     ` Chen, Yu C
2016-06-26  4:25     ` Chen, Yu C
2016-06-25 16:18 ` [PATCH 2/4][RFC v2] PM / sleep: Introduce arch-specific hook for disable/enable nonboot cpus Chen Yu
2016-06-25 16:51   ` Pavel Machek
2016-06-26  1:19     ` Chen, Yu C
2016-10-07 19:31   ` Andy Lutomirski
2016-10-08 16:58     ` Chen Yu
2016-10-11 15:42     ` Pavel Machek
2016-06-25 16:18 ` [PATCH 3/4][RFC v2] PM / hibernate: introduce a flag to indicate resuming from hibernation Chen Yu
2016-06-25 16:53   ` Pavel Machek
2016-06-26  1:34     ` Chen, Yu C
2016-06-25 16:19 ` [PATCH 4/4] x86, hotplug: Use hlt instead of mwait when " Chen Yu
2016-10-07 19:47   ` Andy Lutomirski
2016-10-08 10:31     ` Rafael J. Wysocki
2016-10-16 16:50       ` Andy Lutomirski
2016-10-18  0:30         ` Rafael J. Wysocki
2016-10-18  1:21           ` Andy Lutomirski
2016-10-08 16:54     ` Chen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).