linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
@ 2016-06-06 14:19 Chen Yu
  2016-06-06 14:25 ` Peter Zijlstra
  2016-06-07  8:03 ` Pavel Machek
  0 siblings, 2 replies; 10+ messages in thread
From: Chen Yu @ 2016-06-06 14:19 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Pavel Machek, Len Brown,
	Borislav Petkov, Peter Zijlstra, Zhu Guihua, Juergen Gross,
	Chen Yu

Stress test from Varun Koyyalagunta reports that, the
nonboot CPU would hang occasionally, when resuming from
hibernation. Further investigation shows that, the precise
phase when nonboot CPU hangs, is the time when the nonboot
CPU been woken up incorrectly, and tries to monitor the
mwait_ptr for the second time, then an exception is
triggered due to illegal vaddr access, say, something like,
'Unable to handler kernel address of 0xffff8800ba800010...'

One of the possible scenarios for this issue is illustrated below,
when the boot CPU tries to resume from hibernation:
1. puts the nonboot CPUs offline, so the nonboot CPUs are monitoring
   at the address of the task_struct.flags.
2. boot CPU copies pages to their original address, which includes
   task_struct.flags, thus wakes up one of the nonboot CPUs.
3. nonboot CPU tries to monitor the task_struct.flags again, but since
   the page table for task_struct.flags has been overwritten by
   boot CPU, and there is probably a changed across hibernation
   (because of inconsistence of e820 memory map), an exception is
   triggered.

As suggested by Rafael and Len, this patch tries to monitor a zero
page instead of task_struct.flags, if it comes from hibernation resume
process. The zero page should be safe because it is located in .bss
and page table for kernel mapping of text/data/bss should keeps
unchanged according to hibernation semantic.

Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/smpboot.c | 16 +++++++++++++++-
 include/linux/suspend.h   |  7 +++++++
 kernel/power/hibernate.c  |  3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fafe8b9..b2732ae 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include <linux/stackprotector.h>
 #include <linux/gfp.h>
 #include <linux/cpuidle.h>
+#include <linux/suspend.h>
 
 #include <asm/acpi.h>
 #include <asm/desc.h>
@@ -1595,8 +1596,21 @@ static inline void mwait_play_dead(void)
 	 * This should be a memory location in a cache line which is
 	 * unlikely to be touched by other processors.  The actual
 	 * content is immaterial as it is not actually modified in any way.
+	 *
+	 * However in hibernation resume process, this address could be
+	 * touched by BSP when restoring page frames, if the page table
+	 * for this address is not coherent across hibernation(due to
+	 * inconsistence of e820 memory map), access from APs might
+	 * cause exception. So change the mwait address to zero page,
+	 * which is located in .bss, in this way we can avoid illegal
+	 * access from APs because page table for kernel mapping
+	 * of text/data/bss should keeps unchanged according to
+	 * hibernation semantic.
 	 */
-	mwait_ptr = &current_thread_info()->flags;
+	if (hibernation_in_resume())
+		mwait_ptr = empty_zero_page;
+	else
+		mwait_ptr = &current_thread_info()->flags;
 
 	wbinvd();
 
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8b6ec7e..422e87a 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -384,6 +384,12 @@ extern bool system_entering_hibernation(void);
 extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
+extern bool in_resume_hibernate;
+
+static inline bool hibernation_in_resume(void)
+{
+	return in_resume_hibernate;
+}
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
@@ -395,6 +401,7 @@ static inline void hibernation_set_ops(const struct platform_hibernation_ops *op
 static inline int hibernate(void) { return -ENOSYS; }
 static inline bool system_entering_hibernation(void) { return false; }
 static inline bool hibernation_available(void) { return false; }
+static inline bool hibernation_in_resume(void) { return false; }
 #endif /* CONFIG_HIBERNATION */
 
 /* Hibernation and suspend events */
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index fca9254..13c229a 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -43,6 +43,7 @@ static char resume_file[256] = CONFIG_PM_STD_PARTITION;
 dev_t swsusp_resume_device;
 sector_t swsusp_resume_block;
 __visible int in_suspend __nosavedata;
+bool in_resume_hibernate;
 
 enum {
 	HIBERNATION_INVALID,
@@ -433,7 +434,9 @@ static int resume_target_kernel(bool platform_mode)
 	if (error)
 		goto Cleanup;
 
+	in_resume_hibernate = true;
 	error = disable_nonboot_cpus();
+	in_resume_hibernate = false;
 	if (error)
 		goto Enable_cpus;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 14:19 [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation Chen Yu
@ 2016-06-06 14:25 ` Peter Zijlstra
  2016-06-06 15:59   ` Chen, Yu C
  2016-06-07  8:03 ` Pavel Machek
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2016-06-06 14:25 UTC (permalink / raw)
  To: Chen Yu
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Pavel Machek, Len Brown,
	Borislav Petkov, Zhu Guihua, Juergen Gross

On Mon, Jun 06, 2016 at 10:19:09PM +0800, Chen Yu wrote:
> @@ -1595,8 +1596,21 @@ static inline void mwait_play_dead(void)
>  	 * This should be a memory location in a cache line which is
>  	 * unlikely to be touched by other processors.  The actual
>  	 * content is immaterial as it is not actually modified in any way.
> +	 *
> +	 * However in hibernation resume process, this address could be
> +	 * touched by BSP when restoring page frames, if the page table
> +	 * for this address is not coherent across hibernation(due to
> +	 * inconsistence of e820 memory map), access from APs might
> +	 * cause exception. So change the mwait address to zero page,
> +	 * which is located in .bss, in this way we can avoid illegal
> +	 * access from APs because page table for kernel mapping
> +	 * of text/data/bss should keeps unchanged according to
> +	 * hibernation semantic.
>  	 */
> -	mwait_ptr = &current_thread_info()->flags;
> +	if (hibernation_in_resume())
> +		mwait_ptr = empty_zero_page;
> +	else
> +		mwait_ptr = &current_thread_info()->flags;

Why is this conditional? Is there any case in which the zero page is not
also correct?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 14:25 ` Peter Zijlstra
@ 2016-06-06 15:59   ` Chen, Yu C
  2016-06-06 16:40     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Chen, Yu C @ 2016-06-06 15:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Pavel Machek, Brown, Len,
	Borislav Petkov, Zhu Guihua, Juergen Gross

Hi Peter,

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Monday, June 06, 2016 10:25 PM
> To: Chen, Yu C
> Cc: x86@kernel.org; linux-kernel@vger.kernel.org; Thomas Gleixner; Ingo
> Molnar; H . Peter Anvin; linux-pm@vger.kernel.org; Rafael J . Wysocki; Pavel
> Machek; Brown, Len; Borislav Petkov; Zhu Guihua; Juergen Gross
> Subject: Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when
> resuming from hibernation
> 
> On Mon, Jun 06, 2016 at 10:19:09PM +0800, Chen Yu wrote:
> > @@ -1595,8 +1596,21 @@ static inline void mwait_play_dead(void)
> >  	 * This should be a memory location in a cache line which is
> >  	 * unlikely to be touched by other processors.  The actual
> >  	 * content is immaterial as it is not actually modified in any way.
> > +	 *
> > +	 * However in hibernation resume process, this address could be
> > +	 * touched by BSP when restoring page frames, if the page table
> > +	 * for this address is not coherent across hibernation(due to
> > +	 * inconsistence of e820 memory map), access from APs might
> > +	 * cause exception. So change the mwait address to zero page,
> > +	 * which is located in .bss, in this way we can avoid illegal
> > +	 * access from APs because page table for kernel mapping
> > +	 * of text/data/bss should keeps unchanged according to
> > +	 * hibernation semantic.
> >  	 */
> > -	mwait_ptr = &current_thread_info()->flags;
> > +	if (hibernation_in_resume())
> > +		mwait_ptr = empty_zero_page;
> > +	else
> > +		mwait_ptr = &current_thread_info()->flags;
> 
> Why is this conditional? Is there any case in which the zero page is not also
> correct?
I'm thinking of avoid unnecessary wakeup for normal CPU offline,  for example, 
if one  driver uses  the zero page and access it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 15:59   ` Chen, Yu C
@ 2016-06-06 16:40     ` Peter Zijlstra
  2016-06-06 17:34       ` Brian Gerst
  2016-06-06 21:05       ` H. Peter Anvin
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2016-06-06 16:40 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Pavel Machek, Brown, Len,
	Borislav Petkov, Zhu Guihua, Juergen Gross

On Mon, Jun 06, 2016 at 03:59:06PM +0000, Chen, Yu C wrote:

> > > +	if (hibernation_in_resume())
> > > +		mwait_ptr = empty_zero_page;
> > > +	else
> > > +		mwait_ptr = &current_thread_info()->flags;
> > 
> > Why is this conditional? Is there any case in which the zero page is not also
> > correct?
> I'm thinking of avoid unnecessary wakeup for normal CPU offline,  for example, 
> if one  driver uses  the zero page and access it.

Writing to the zero page would be a major fail.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 16:40     ` Peter Zijlstra
@ 2016-06-06 17:34       ` Brian Gerst
  2016-06-06 21:05       ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: Brian Gerst @ 2016-06-06 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chen, Yu C, x86, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, linux-pm, Rafael J . Wysocki, Pavel Machek,
	Brown, Len, Borislav Petkov, Zhu Guihua, Juergen Gross

On Mon, Jun 6, 2016 at 12:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 06, 2016 at 03:59:06PM +0000, Chen, Yu C wrote:
>
>> > > + if (hibernation_in_resume())
>> > > +         mwait_ptr = empty_zero_page;
>> > > + else
>> > > +         mwait_ptr = &current_thread_info()->flags;
>> >
>> > Why is this conditional? Is there any case in which the zero page is not also
>> > correct?
>> I'm thinking of avoid unnecessary wakeup for normal CPU offline,  for example,
>> if one  driver uses  the zero page and access it.
>
> Writing to the zero page would be a major fail.

I would think the safest thing to do during resume from hibernation is
to use hlt instead of mwait, so there is no dependency on any memory
address.  It doesn't need the power management features of mwait
either because the CPU will be reset soon after the restored kernel
resumes.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 16:40     ` Peter Zijlstra
  2016-06-06 17:34       ` Brian Gerst
@ 2016-06-06 21:05       ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2016-06-06 21:05 UTC (permalink / raw)
  To: Peter Zijlstra, Chen, Yu C
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, linux-pm,
	Rafael J . Wysocki, Pavel Machek, Brown, Len, Borislav Petkov,
	Zhu Guihua, Juergen Gross

On 06/06/16 09:40, Peter Zijlstra wrote:
> On Mon, Jun 06, 2016 at 03:59:06PM +0000, Chen, Yu C wrote:
> 
>>>> +	if (hibernation_in_resume())
>>>> +		mwait_ptr = empty_zero_page;
>>>> +	else
>>>> +		mwait_ptr = &current_thread_info()->flags;
>>>
>>> Why is this conditional? Is there any case in which the zero page is not also
>>> correct?
>> I'm thinking of avoid unnecessary wakeup for normal CPU offline,  for example, 
>> if one  driver uses  the zero page and access it.
> 
> Writing to the zero page would be a major fail.
> 

One implementation of MWAIT is for the waiting processor to claim the
line in an exclusive state, which would be highly undesirable for the
zero page.  I don't actually know if this is done by any real
implementation, however.

Either way, the problem at hand seems limited to post-hibernation, as
for regular CPU offline the page tables will be fully populated, so it
seems like it should not be an issue.

	-hpa

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-06 14:19 [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation Chen Yu
  2016-06-06 14:25 ` Peter Zijlstra
@ 2016-06-07  8:03 ` Pavel Machek
  2016-06-07  8:44   ` Chen Yu
  1 sibling, 1 reply; 10+ messages in thread
From: Pavel Machek @ 2016-06-07  8:03 UTC (permalink / raw)
  To: Chen Yu
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Len Brown, Borislav Petkov,
	Peter Zijlstra, Zhu Guihua, Juergen Gross

On Mon 2016-06-06 22:19:09, Chen Yu wrote:
> Stress test from Varun Koyyalagunta reports that, the
> nonboot CPU would hang occasionally, when resuming from
> hibernation. Further investigation shows that, the precise
> phase when nonboot CPU hangs, is the time when the nonboot
> CPU been woken up incorrectly, and tries to monitor the
> mwait_ptr for the second time, then an exception is
> triggered due to illegal vaddr access, say, something like,
> 'Unable to handler kernel address of 0xffff8800ba800010...'
> 
> One of the possible scenarios for this issue is illustrated below,
> when the boot CPU tries to resume from hibernation:
> 1. puts the nonboot CPUs offline, so the nonboot CPUs are monitoring
>    at the address of the task_struct.flags.
> 2. boot CPU copies pages to their original address, which includes
>    task_struct.flags, thus wakes up one of the nonboot CPUs.
> 3. nonboot CPU tries to monitor the task_struct.flags again, but since
>    the page table for task_struct.flags has been overwritten by
>    boot CPU, and there is probably a changed across hibernation
>    (because of inconsistence of e820 memory map), an exception is
>    triggered.

If memory map changes between suspend and resume, there'll be fun. If
that's suspected, should we attach md5 sum of e820 to the hibernation
image?

I doubt mwait fix is enough to catch everything in that case...

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-07  8:03 ` Pavel Machek
@ 2016-06-07  8:44   ` Chen Yu
  2016-06-07  9:13     ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Yu @ 2016-06-07  8:44 UTC (permalink / raw)
  To: Pavel Machek
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	linux-pm, Rafael J . Wysocki, Len Brown, Borislav Petkov,
	Peter Zijlstra, Zhu Guihua, Juergen Gross, Zheng, Lv

Hi Pavel,

On 2016年06月07日 16:03, Pavel Machek wrote:
> On Mon 2016-06-06 22:19:09, Chen Yu wrote:
>> Stress test from Varun Koyyalagunta reports that, the
>> nonboot CPU would hang occasionally, when resuming from
>> hibernation. Further investigation shows that, the precise
>> phase when nonboot CPU hangs, is the time when the nonboot
>> CPU been woken up incorrectly, and tries to monitor the
>> mwait_ptr for the second time, then an exception is
>> triggered due to illegal vaddr access, say, something like,
>> 'Unable to handler kernel address of 0xffff8800ba800010...'
>>
>> One of the possible scenarios for this issue is illustrated below,
>> when the boot CPU tries to resume from hibernation:
>> 1. puts the nonboot CPUs offline, so the nonboot CPUs are monitoring
>>     at the address of the task_struct.flags.
>> 2. boot CPU copies pages to their original address, which includes
>>     task_struct.flags, thus wakes up one of the nonboot CPUs.
>> 3. nonboot CPU tries to monitor the task_struct.flags again, but since
>>     the page table for task_struct.flags has been overwritten by
>>     boot CPU, and there is probably a changed across hibernation
>>     (because of inconsistence of e820 memory map), an exception is
>>     triggered.
> If memory map changes between suspend and resume, there'll be fun. If
> that's suspected, should we attach md5 sum of e820 to the hibernation
> image?
Actually what I described  in the  scenario might be not so accurate,
it might not be related to inconsistence of e820 map,
because there is no guarantee that boot kernel and resume kernel
have the same memory layout(page table).

I've re-checked the logs from reporter, it seems that, the fault
access is caused by accessing a page without PRESENT flag,
and the pte entry for this vaddr is zero:

    // DATA ADDRESS TRANSLATION, virtual address = 0xFFFF8800BA803E88
    //   DATA TABLE WALK, virtual address = 0xFFFF8800BA803E88
    pml4read   0x0001C12880    0x01FD2067; // pgd
    pdptread   0x0001FD2010    0x9BEBF063; // pud
    pderead    0x009BEBFEA0    0x2D9D9063; // pmd
    pteread    0x002D9D9018    0x00000000; // pte

The last line above is a pte entry located at physical address 0x2d9d9018,
with value zero, thus access to this vaddr results in a page-not-present exception.


Since some of the pud/pde/pte are allocated dynamically during bootup
(kernel_physical_mapping_init),
it is possible that, when the boot cpu writes to this vaddr,
the page table(especially for pud/pmd/pte) for this vaddr
are not the same as it is before hibernation,  thus an exception would be
triggered due to incorrect page table, even e820 is consistent.

I'm doing more test to verify this.

thanks,
Yu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-07  8:44   ` Chen Yu
@ 2016-06-07  9:13     ` Borislav Petkov
  2016-06-07  9:43       ` Chen Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2016-06-07  9:13 UTC (permalink / raw)
  To: Chen Yu
  Cc: Pavel Machek, x86, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, linux-pm, Rafael J . Wysocki, Len Brown,
	Peter Zijlstra, Zhu Guihua, Juergen Gross, Zheng, Lv,
	Brian Gerst

On Tue, Jun 07, 2016 at 04:44:24PM +0800, Chen Yu wrote:
> I'm doing more test to verify this.

I think it would be better if you looked into doing HLT as Brian
suggested, instead of touching any memory during s/r.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation
  2016-06-07  9:13     ` Borislav Petkov
@ 2016-06-07  9:43       ` Chen Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Chen Yu @ 2016-06-07  9:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, x86, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, linux-pm, Rafael J . Wysocki, Len Brown,
	Peter Zijlstra, Zhu Guihua, Juergen Gross, Zheng, Lv,
	Brian Gerst



On 2016年06月07日 17:13, Borislav Petkov wrote:
> On Tue, Jun 07, 2016 at 04:44:24PM +0800, Chen Yu wrote:
>> I'm doing more test to verify this.
> I think it would be better if you looked into doing HLT as Brian
> suggested, instead of touching any memory during s/r.
>
OK, I'll also test this solution. thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-06-07  9:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-06 14:19 [PATCH][RFC] x86, hotplug: Use zero page for monitor when resuming from hibernation Chen Yu
2016-06-06 14:25 ` Peter Zijlstra
2016-06-06 15:59   ` Chen, Yu C
2016-06-06 16:40     ` Peter Zijlstra
2016-06-06 17:34       ` Brian Gerst
2016-06-06 21:05       ` H. Peter Anvin
2016-06-07  8:03 ` Pavel Machek
2016-06-07  8:44   ` Chen Yu
2016-06-07  9:13     ` Borislav Petkov
2016-06-07  9:43       ` Chen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).