Re: PROBLEM: Resume form hibernate broken by setting NX on gap

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
       [not found] <573DF82D.50006@deltatee.com>
@ 2016-05-20  7:15 ` Ingo Molnar
  2016-05-20 11:34   ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2016-05-20  7:15 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Stephen Smalley, Kees Cook, Ingo Molnar, x86, linux-pm,
	linux-kernel, Andy Lutomirski, Borislav Petkov, Denys Vlasenko,
	Brian Gerst


* Logan Gunthorpe <logang@deltatee.com> wrote:

> Hi,
> 
> I have been working on a bug that causes my laptop to freeze during
> resume from hibernation. I did a bisect to find the offending commit:
> 
> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
> 
> There is more information in the bugzilla report [1] that
> I've been working on but I will summarize things below.
> 
> I've experienced intermittent but reproducible freezes when resuming
> from hibernation since about kernel version 3.19. The freeze was
> significantly more reproducible when a few applications were loaded
> before hibernation and would largely not happen if hibernated
> immediately after booting to a desktop. I did some tracing work to find
> that the kernel gets as far as the resume_image call in
> swsusp_arch_resume and I could not find any response from the image
> kernel when I hit the bug. I also did testing that seemed to rule out
> this being caused by a problematic driver.
> 
> I did a successful bisect between 3.18 and 3.19 which found a bug in
> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
> Then, I did a second bisect with a ported version of the fix to the
> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
> with what appears to be the exact same symptoms. Reverting that commit
> in recent kernels up to and including 4.6 fixes the issue and restores
> reliable hibernation. However, it's not at all clear to me why that
> commit would cause this issue or how to fix the issue without reverting.

I've attached that commit below and also Cc:-ed a few more people who might have 
an idea about why this regressed. Worst-case we'll have to revert it.

Thanks,

	Ingo

=================>
>From ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 Mon Sep 17 00:00:00 2001
From: Stephen Smalley <sds@tycho.nsa.gov>
Date: Thu, 1 Oct 2015 09:04:22 -0400
Subject: [PATCH] x86/mm: Set NX on gap between __ex_table and rodata

Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables.  Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

  Before:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

  After:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/init_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30564e2752d3..df48430c279b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
 	 * has been zapped already via cleanup_highmem().
 	 */
 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
 
 	rodata_test();
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20  7:15 ` PROBLEM: Resume form hibernate broken by setting NX on gap Ingo Molnar
@ 2016-05-20 11:34   ` Rafael J. Wysocki
  2016-05-20 13:56     ` Stephen Smalley
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20 11:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Logan Gunthorpe, Stephen Smalley, Kees Cook, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Logan Gunthorpe <logang@deltatee.com> wrote:
>
>> Hi,
>>
>> I have been working on a bug that causes my laptop to freeze during
>> resume from hibernation. I did a bisect to find the offending commit:
>>
>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>
>> There is more information in the bugzilla report [1] that
>> I've been working on but I will summarize things below.
>>
>> I've experienced intermittent but reproducible freezes when resuming
>> from hibernation since about kernel version 3.19. The freeze was
>> significantly more reproducible when a few applications were loaded
>> before hibernation and would largely not happen if hibernated
>> immediately after booting to a desktop. I did some tracing work to find
>> that the kernel gets as far as the resume_image call in
>> swsusp_arch_resume and I could not find any response from the image
>> kernel when I hit the bug. I also did testing that seemed to rule out
>> this being caused by a problematic driver.
>>
>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>> Then, I did a second bisect with a ported version of the fix to the
>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>> with what appears to be the exact same symptoms. Reverting that commit
>> in recent kernels up to and including 4.6 fixes the issue and restores
>> reliable hibernation. However, it's not at all clear to me why that
>> commit would cause this issue or how to fix the issue without reverting.
>
> I've attached that commit below and also Cc:-ed a few more people who might have
> an idea about why this regressed. Worst-case we'll have to revert it.

Without looking deep into mm, my theory would be that after this patch
the final jump from the boot kernel to the image kernel's trampoline
code during resume may crash the kernel if the trampoline page turns
out to be NX in the boot kernel (it has to be executable in both the
boot and the image kernels).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20 11:34   ` Rafael J. Wysocki
@ 2016-05-20 13:56     ` Stephen Smalley
  2016-05-20 21:46       ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Smalley @ 2016-05-20 13:56 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ingo Molnar
  Cc: Logan Gunthorpe, Kees Cook, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>
>>> Hi,
>>>
>>> I have been working on a bug that causes my laptop to freeze during
>>> resume from hibernation. I did a bisect to find the offending commit:
>>>
>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>
>>> There is more information in the bugzilla report [1] that
>>> I've been working on but I will summarize things below.
>>>
>>> I've experienced intermittent but reproducible freezes when resuming
>>> from hibernation since about kernel version 3.19. The freeze was
>>> significantly more reproducible when a few applications were loaded
>>> before hibernation and would largely not happen if hibernated
>>> immediately after booting to a desktop. I did some tracing work to find
>>> that the kernel gets as far as the resume_image call in
>>> swsusp_arch_resume and I could not find any response from the image
>>> kernel when I hit the bug. I also did testing that seemed to rule out
>>> this being caused by a problematic driver.
>>>
>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>>> Then, I did a second bisect with a ported version of the fix to the
>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>>> with what appears to be the exact same symptoms. Reverting that commit
>>> in recent kernels up to and including 4.6 fixes the issue and restores
>>> reliable hibernation. However, it's not at all clear to me why that
>>> commit would cause this issue or how to fix the issue without reverting.
>>
>> I've attached that commit below and also Cc:-ed a few more people who might have
>> an idea about why this regressed. Worst-case we'll have to revert it.
> 
> Without looking deep into mm, my theory would be that after this patch
> the final jump from the boot kernel to the image kernel's trampoline
> code during resume may crash the kernel if the trampoline page turns
> out to be NX in the boot kernel (it has to be executable in both the
> boot and the image kernels).

So, pardon my ignorance, but where is this trampoline page placed in
kernel memory?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20 13:56     ` Stephen Smalley
@ 2016-05-20 21:46       ` Rafael J. Wysocki
  2016-05-20 21:59         ` Kees Cook
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20 21:46 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Rafael J. Wysocki, Ingo Molnar, Logan Gunthorpe, Kees Cook,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have been working on a bug that causes my laptop to freeze during
>>>> resume from hibernation. I did a bisect to find the offending commit:
>>>>
>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>
>>>> There is more information in the bugzilla report [1] that
>>>> I've been working on but I will summarize things below.
>>>>
>>>> I've experienced intermittent but reproducible freezes when resuming
>>>> from hibernation since about kernel version 3.19. The freeze was
>>>> significantly more reproducible when a few applications were loaded
>>>> before hibernation and would largely not happen if hibernated
>>>> immediately after booting to a desktop. I did some tracing work to find
>>>> that the kernel gets as far as the resume_image call in
>>>> swsusp_arch_resume and I could not find any response from the image
>>>> kernel when I hit the bug. I also did testing that seemed to rule out
>>>> this being caused by a problematic driver.
>>>>
>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>>>> Then, I did a second bisect with a ported version of the fix to the
>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>>>> with what appears to be the exact same symptoms. Reverting that commit
>>>> in recent kernels up to and including 4.6 fixes the issue and restores
>>>> reliable hibernation. However, it's not at all clear to me why that
>>>> commit would cause this issue or how to fix the issue without reverting.
>>>
>>> I've attached that commit below and also Cc:-ed a few more people who might have
>>> an idea about why this regressed. Worst-case we'll have to revert it.
>>
>> Without looking deep into mm, my theory would be that after this patch
>> the final jump from the boot kernel to the image kernel's trampoline
>> code during resume may crash the kernel if the trampoline page turns
>> out to be NX in the boot kernel (it has to be executable in both the
>> boot and the image kernels).
>
> So, pardon my ignorance, but where is this trampoline page placed in
> kernel memory?

On 32-bit its location has to be the same in both the boot and the
image kernels and that's within kernel text in both cases, so that
shouldn't be a problem.

On 64-bit its location depends on the image kernel and specifically on
the location of the restore_registers routine in it.  The (virtual)
address of that routine is stored in the restore_jump_address
variable, so the page containing it (the trampoline page) can be found
with the help of that.

swsusp_arch_resume() sets up a temporary kernel mapping to finalize
the image restoration and that page must not be NX in that mapping for
things to work.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20 21:46       ` Rafael J. Wysocki
@ 2016-05-20 21:59         ` Kees Cook
  2016-05-20 22:16           ` Kees Cook
  2016-06-10 22:11           ` Rafael J. Wysocki
  0 siblings, 2 replies; 25+ messages in thread
From: Kees Cook @ 2016-05-20 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Smalley, Ingo Molnar, Logan Gunthorpe, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>>
>>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have been working on a bug that causes my laptop to freeze during
>>>>> resume from hibernation. I did a bisect to find the offending commit:
>>>>>
>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>
>>>>> There is more information in the bugzilla report [1] that
>>>>> I've been working on but I will summarize things below.
>>>>>
>>>>> I've experienced intermittent but reproducible freezes when resuming
>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>> significantly more reproducible when a few applications were loaded
>>>>> before hibernation and would largely not happen if hibernated
>>>>> immediately after booting to a desktop. I did some tracing work to find
>>>>> that the kernel gets as far as the resume_image call in
>>>>> swsusp_arch_resume and I could not find any response from the image
>>>>> kernel when I hit the bug. I also did testing that seemed to rule out
>>>>> this being caused by a problematic driver.
>>>>>
>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>>>>> Then, I did a second bisect with a ported version of the fix to the
>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>>>>> with what appears to be the exact same symptoms. Reverting that commit
>>>>> in recent kernels up to and including 4.6 fixes the issue and restores
>>>>> reliable hibernation. However, it's not at all clear to me why that
>>>>> commit would cause this issue or how to fix the issue without reverting.
>>>>
>>>> I've attached that commit below and also Cc:-ed a few more people who might have
>>>> an idea about why this regressed. Worst-case we'll have to revert it.
>>>
>>> Without looking deep into mm, my theory would be that after this patch
>>> the final jump from the boot kernel to the image kernel's trampoline
>>> code during resume may crash the kernel if the trampoline page turns
>>> out to be NX in the boot kernel (it has to be executable in both the
>>> boot and the image kernels).
>>
>> So, pardon my ignorance, but where is this trampoline page placed in
>> kernel memory?
>
> On 32-bit its location has to be the same in both the boot and the
> image kernels and that's within kernel text in both cases, so that
> shouldn't be a problem.
>
> On 64-bit its location depends on the image kernel and specifically on
> the location of the restore_registers routine in it.  The (virtual)
> address of that routine is stored in the restore_jump_address
> variable, so the page containing it (the trampoline page) can be found
> with the help of that.
>
> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
> the image restoration and that page must not be NX in that mapping for
> things to work.

It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
-> get_image_page() path sets the page executable...

Untested, but I wonder if this work work in swsusp_arch_resume()
before the memcpy?

(apologies for any gmail-based whitespace mangling...)

diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 009947d419a6..c2f3ecc45bd4 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -12,6 +12,7 @@
 #include <linux/smp.h>
 #include <linux/suspend.h>

+#include <asm/cacheflush.h>
 #include <asm/init.h>
 #include <asm/proto.h>
 #include <asm/page.h>
@@ -89,6 +90,7 @@ int swsusp_arch_resume(void)
        relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC);
        if (!relocated_restore_code)
                return -ENOMEM;
+       set_memory_x((unsigned long)relocated_restore_code, 1);
        memcpy(relocated_restore_code, &core_restore_code,
               &restore_registers - &core_restore_code);


-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20 21:59         ` Kees Cook
@ 2016-05-20 22:16           ` Kees Cook
       [not found]             ` <573FC081.20006@deltatee.com>
  2016-06-10 22:11           ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Kees Cook @ 2016-05-20 22:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Smalley, Ingo Molnar, Logan Gunthorpe, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>>>
>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have been working on a bug that causes my laptop to freeze during
>>>>>> resume from hibernation. I did a bisect to find the offending commit:
>>>>>>
>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>>
>>>>>> There is more information in the bugzilla report [1] that
>>>>>> I've been working on but I will summarize things below.
>>>>>>
>>>>>> I've experienced intermittent but reproducible freezes when resuming
>>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>>> significantly more reproducible when a few applications were loaded
>>>>>> before hibernation and would largely not happen if hibernated
>>>>>> immediately after booting to a desktop. I did some tracing work to find
>>>>>> that the kernel gets as far as the resume_image call in
>>>>>> swsusp_arch_resume and I could not find any response from the image
>>>>>> kernel when I hit the bug. I also did testing that seemed to rule out
>>>>>> this being caused by a problematic driver.
>>>>>>
>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>>>>>> Then, I did a second bisect with a ported version of the fix to the
>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>>>>>> with what appears to be the exact same symptoms. Reverting that commit
>>>>>> in recent kernels up to and including 4.6 fixes the issue and restores
>>>>>> reliable hibernation. However, it's not at all clear to me why that
>>>>>> commit would cause this issue or how to fix the issue without reverting.
>>>>>
>>>>> I've attached that commit below and also Cc:-ed a few more people who might have
>>>>> an idea about why this regressed. Worst-case we'll have to revert it.
>>>>
>>>> Without looking deep into mm, my theory would be that after this patch
>>>> the final jump from the boot kernel to the image kernel's trampoline
>>>> code during resume may crash the kernel if the trampoline page turns
>>>> out to be NX in the boot kernel (it has to be executable in both the
>>>> boot and the image kernels).
>>>
>>> So, pardon my ignorance, but where is this trampoline page placed in
>>> kernel memory?
>>
>> On 32-bit its location has to be the same in both the boot and the
>> image kernels and that's within kernel text in both cases, so that
>> shouldn't be a problem.
>>
>> On 64-bit its location depends on the image kernel and specifically on
>> the location of the restore_registers routine in it.  The (virtual)
>> address of that routine is stored in the restore_jump_address
>> variable, so the page containing it (the trampoline page) can be found
>> with the help of that.
>>
>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
>> the image restoration and that page must not be NX in that mapping for
>> things to work.
>
> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
> -> get_image_page() path sets the page executable...
>
> Untested, but I wonder if this work work in swsusp_arch_resume()
> before the memcpy?

I can't type today, it seems. It should read "... if this would work ..."

If you can test this and it works for you, I'll send a proper patch... :P

-Kees

>
> (apologies for any gmail-based whitespace mangling...)
>
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 009947d419a6..c2f3ecc45bd4 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -12,6 +12,7 @@
>  #include <linux/smp.h>
>  #include <linux/suspend.h>
>
> +#include <asm/cacheflush.h>
>  #include <asm/init.h>
>  #include <asm/proto.h>
>  #include <asm/page.h>
> @@ -89,6 +90,7 @@ int swsusp_arch_resume(void)
>         relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC);
>         if (!relocated_restore_code)
>                 return -ENOMEM;
> +       set_memory_x((unsigned long)relocated_restore_code, 1);
>         memcpy(relocated_restore_code, &core_restore_code,
>                &restore_registers - &core_restore_code);
>
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 25+ messages in thread

[parent not found: <573FC081.20006@deltatee.com>]

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
       [not found]             ` <573FC081.20006@deltatee.com>
@ 2016-05-21 16:39               ` Kees Cook
       [not found]                 ` <575A3E95.5090100@deltatee.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2016-05-21 16:39 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> On 20/05/16 04:16 PM, Kees Cook wrote:
>>
>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>>
>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org>
>>> wrote:
>>>>
>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov>
>>>> wrote:
>>>>>
>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have been working on a bug that causes my laptop to freeze during
>>>>>>>> resume from hibernation. I did a bisect to find the offending
>>>>>>>> commit:
>>>>>>>>
>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>>>>
>>>>>>>> There is more information in the bugzilla report [1] that
>>>>>>>> I've been working on but I will summarize things below.
>>>>>>>>
>>>>>>>> I've experienced intermittent but reproducible freezes when resuming
>>>>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>>>>> significantly more reproducible when a few applications were loaded
>>>>>>>> before hibernation and would largely not happen if hibernated
>>>>>>>> immediately after booting to a desktop. I did some tracing work to
>>>>>>>> find
>>>>>>>> that the kernel gets as far as the resume_image call in
>>>>>>>> swsusp_arch_resume and I could not find any response from the image
>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule
>>>>>>>> out
>>>>>>>> this being caused by a problematic driver.
>>>>>>>>
>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in
>>>>>>>> 4.4.
>>>>>>>> Then, I did a second bisect with a ported version of the fix to the
>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break
>>>>>>>> hibernation
>>>>>>>> with what appears to be the exact same symptoms. Reverting that
>>>>>>>> commit
>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and
>>>>>>>> restores
>>>>>>>> reliable hibernation. However, it's not at all clear to me why that
>>>>>>>> commit would cause this issue or how to fix the issue without
>>>>>>>> reverting.
>>>>>>>
>>>>>>>
>>>>>>> I've attached that commit below and also Cc:-ed a few more people who
>>>>>>> might have
>>>>>>> an idea about why this regressed. Worst-case we'll have to revert it.
>>>>>>
>>>>>>
>>>>>> Without looking deep into mm, my theory would be that after this patch
>>>>>> the final jump from the boot kernel to the image kernel's trampoline
>>>>>> code during resume may crash the kernel if the trampoline page turns
>>>>>> out to be NX in the boot kernel (it has to be executable in both the
>>>>>> boot and the image kernels).
>>>>>
>>>>>
>>>>> So, pardon my ignorance, but where is this trampoline page placed in
>>>>> kernel memory?
>>>>
>>>>
>>>> On 32-bit its location has to be the same in both the boot and the
>>>> image kernels and that's within kernel text in both cases, so that
>>>> shouldn't be a problem.
>>>>
>>>> On 64-bit its location depends on the image kernel and specifically on
>>>> the location of the restore_registers routine in it.  The (virtual)
>>>> address of that routine is stored in the restore_jump_address
>>>> variable, so the page containing it (the trampoline page) can be found
>>>> with the help of that.
>>>>
>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
>>>> the image restoration and that page must not be NX in that mapping for
>>>> things to work.
>>>
>>>
>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
>>> -> get_image_page() path sets the page executable...
>>>
>>> Untested, but I wonder if this work work in swsusp_arch_resume()
>>> before the memcpy?
>>
>>
>> I can't type today, it seems. It should read "... if this would work ..."
>>
>> If you can test this and it works for you, I'll send a proper patch... :P
>>
>> -Kees
>>
>
> Hi Kees,
>
> Thanks. I tried the patch but it only resulted in a kernel warning and
> freeze. I've attached a photo showing as much of the messages as I could
> get.
>
> Logan

Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one out.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 25+ messages in thread

[parent not found: <575A3E95.5090100@deltatee.com>]

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
       [not found]                 ` <575A3E95.5090100@deltatee.com>
@ 2016-06-10 18:09                   ` Kees Cook
  2016-06-10 18:16                     ` Logan Gunthorpe
  2016-06-10 21:27                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 25+ messages in thread
From: Kees Cook @ 2016-06-10 18:09 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hey,
>
> I've still be trying to figure this out as I have time.
>
> I tried printing a couple restore addresses and nothing I can find seems
> anywhere near the rodata/ex_table boundary.
>
> I tried with the (badly formatted) below and got the following. Nothing too
> surprising. I've attached a kallsyms that matches the kernel for reference.
>
> restore_code: ffff880157c3b000
> jump_addr: ffffffff81446be0
>
>
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 009947d..6efedb7 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
>         memcpy(relocated_restore_code, &core_restore_code,
>                &restore_registers - &core_restore_code);
>
> +       pr_info("restore_code: %p\n", relocated_restore_code);
> +       pr_info("jump_addr: %lx\n", restore_jump_address);
> +

Also interesting would be the "relocated_restore_code" address, as
well as a dump of /sys/kernel/debug/kernel_page_tables (from
CONFIG_X86_PTDUMP).

I'm baffled by the problem, but the best I can understand is the the
relocated_restore_code range isn't executable (which should be visible
from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
see how to solve that since my original patch didn't work.

Rafael, is this something you have time to look at quickly?

-Kees

>         restore_image();
>         return 0;
>  }
>
>
> Thanks,
>
> Logan
>
>
>
> On 21/05/16 10:39 AM, Kees Cook wrote:
>>
>> On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com>
>> wrote:
>>>
>>> On 20/05/16 04:16 PM, Kees Cook wrote:
>>>>
>>>>
>>>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been working on a bug that causes my laptop to freeze
>>>>>>>>>> during
>>>>>>>>>> resume from hibernation. I did a bisect to find the offending
>>>>>>>>>> commit:
>>>>>>>>>>
>>>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>>>>>>
>>>>>>>>>> There is more information in the bugzilla report [1] that
>>>>>>>>>> I've been working on but I will summarize things below.
>>>>>>>>>>
>>>>>>>>>> I've experienced intermittent but reproducible freezes when
>>>>>>>>>> resuming
>>>>>>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>>>>>>> significantly more reproducible when a few applications were
>>>>>>>>>> loaded
>>>>>>>>>> before hibernation and would largely not happen if hibernated
>>>>>>>>>> immediately after booting to a desktop. I did some tracing work to
>>>>>>>>>> find
>>>>>>>>>> that the kernel gets as far as the resume_image call in
>>>>>>>>>> swsusp_arch_resume and I could not find any response from the
>>>>>>>>>> image
>>>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule
>>>>>>>>>> out
>>>>>>>>>> this being caused by a problematic driver.
>>>>>>>>>>
>>>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug
>>>>>>>>>> in
>>>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in
>>>>>>>>>> 4.4.
>>>>>>>>>> Then, I did a second bisect with a ported version of the fix to
>>>>>>>>>> the
>>>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break
>>>>>>>>>> hibernation
>>>>>>>>>> with what appears to be the exact same symptoms. Reverting that
>>>>>>>>>> commit
>>>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and
>>>>>>>>>> restores
>>>>>>>>>> reliable hibernation. However, it's not at all clear to me why
>>>>>>>>>> that
>>>>>>>>>> commit would cause this issue or how to fix the issue without
>>>>>>>>>> reverting.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've attached that commit below and also Cc:-ed a few more people
>>>>>>>>> who
>>>>>>>>> might have
>>>>>>>>> an idea about why this regressed. Worst-case we'll have to revert
>>>>>>>>> it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Without looking deep into mm, my theory would be that after this
>>>>>>>> patch
>>>>>>>> the final jump from the boot kernel to the image kernel's trampoline
>>>>>>>> code during resume may crash the kernel if the trampoline page turns
>>>>>>>> out to be NX in the boot kernel (it has to be executable in both the
>>>>>>>> boot and the image kernels).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, pardon my ignorance, but where is this trampoline page placed in
>>>>>>> kernel memory?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 32-bit its location has to be the same in both the boot and the
>>>>>> image kernels and that's within kernel text in both cases, so that
>>>>>> shouldn't be a problem.
>>>>>>
>>>>>> On 64-bit its location depends on the image kernel and specifically on
>>>>>> the location of the restore_registers routine in it.  The (virtual)
>>>>>> address of that routine is stored in the restore_jump_address
>>>>>> variable, so the page containing it (the trampoline page) can be found
>>>>>> with the help of that.
>>>>>>
>>>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
>>>>>> the image restoration and that page must not be NX in that mapping for
>>>>>> things to work.
>>>>>
>>>>>
>>>>>
>>>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
>>>>> -> get_image_page() path sets the page executable...
>>>>>
>>>>> Untested, but I wonder if this work work in swsusp_arch_resume()
>>>>> before the memcpy?
>>>>
>>>>
>>>>
>>>> I can't type today, it seems. It should read "... if this would work
>>>> ..."
>>>>
>>>> If you can test this and it works for you, I'll send a proper patch...
>>>> :P
>>>>
>>>> -Kees
>>>>
>>>
>>> Hi Kees,
>>>
>>> Thanks. I tried the patch but it only resulted in a kernel warning and
>>> freeze. I've attached a photo showing as much of the messages as I could
>>> get.
>>>
>>> Logan
>>
>>
>> Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one
>> out.
>>
>> -Kees
>>
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 18:09                   ` Kees Cook
@ 2016-06-10 18:16                     ` Logan Gunthorpe
  2016-06-10 18:18                       ` Kees Cook
  2016-06-10 21:27                     ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-10 18:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

Hey,

On 10/06/16 12:09 PM, Kees Cook wrote:
>> restore_code: ffff880157c3b000
>> jump_addr: ffffffff81446be0
>>
>>
>> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
>> index 009947d..6efedb7 100644
>> --- a/arch/x86/power/hibernate_64.c
>> +++ b/arch/x86/power/hibernate_64.c
>> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
>>         memcpy(relocated_restore_code, &core_restore_code,
>>                &restore_registers - &core_restore_code);
>>
>> +       pr_info("restore_code: %p\n", relocated_restore_code);
>> +       pr_info("jump_addr: %lx\n", restore_jump_address);
>> +
> 
> Also interesting would be the "relocated_restore_code" address, as
> well as a dump of /sys/kernel/debug/kernel_page_tables (from
> CONFIG_X86_PTDUMP).

Is that not what I printed? If not, can you give me a better hint as to
what you're looking for so I can spin another kernel? I'll also provide
the kernel_page_tables once I do that.

> I'm baffled by the problem, but the best I can understand is the the
> relocated_restore_code range isn't executable (which should be visible
> from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
> see how to solve that since my original patch didn't work.

Yeah this is definitely a baffling problem.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 18:16                     ` Logan Gunthorpe
@ 2016-06-10 18:18                       ` Kees Cook
  0 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2016-06-10 18:18 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Fri, Jun 10, 2016 at 11:16 AM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hey,
>
> On 10/06/16 12:09 PM, Kees Cook wrote:
>>> restore_code: ffff880157c3b000
>>> jump_addr: ffffffff81446be0
>>>
>>>
>>> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
>>> index 009947d..6efedb7 100644
>>> --- a/arch/x86/power/hibernate_64.c
>>> +++ b/arch/x86/power/hibernate_64.c
>>> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
>>>         memcpy(relocated_restore_code, &core_restore_code,
>>>                &restore_registers - &core_restore_code);
>>>
>>> +       pr_info("restore_code: %p\n", relocated_restore_code);
>>> +       pr_info("jump_addr: %lx\n", restore_jump_address);
>>> +
>>
>> Also interesting would be the "relocated_restore_code" address, as
>> well as a dump of /sys/kernel/debug/kernel_page_tables (from
>> CONFIG_X86_PTDUMP).
>
> Is that not what I printed? If not, can you give me a better hint as to

Oh, whoops, sorry, I saw "restore_code" in the pr_info and
"relocate_restore_code" in the memcpy and didn't scan the right thing
in the pr_info line. :)

> what you're looking for so I can spin another kernel? I'll also provide
> the kernel_page_tables once I do that.

Cool, thanks.

>
>> I'm baffled by the problem, but the best I can understand is the the
>> relocated_restore_code range isn't executable (which should be visible
>> from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
>> see how to solve that since my original patch didn't work.
>
> Yeah this is definitely a baffling problem.

-Kees


-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 18:09                   ` Kees Cook
  2016-06-10 18:16                     ` Logan Gunthorpe
@ 2016-06-10 21:27                     ` Rafael J. Wysocki
  2016-06-10 22:29                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-10 21:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Logan Gunthorpe, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote:
> On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> > Hey,
> >
> > I've still be trying to figure this out as I have time.
> >
> > I tried printing a couple restore addresses and nothing I can find seems
> > anywhere near the rodata/ex_table boundary.
> >
> > I tried with the (badly formatted) below and got the following. Nothing too
> > surprising. I've attached a kallsyms that matches the kernel for reference.
> >
> > restore_code: ffff880157c3b000
> > jump_addr: ffffffff81446be0
> >
> >
> > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> > index 009947d..6efedb7 100644
> > --- a/arch/x86/power/hibernate_64.c
> > +++ b/arch/x86/power/hibernate_64.c
> > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
> >         memcpy(relocated_restore_code, &core_restore_code,
> >                &restore_registers - &core_restore_code);
> >
> > +       pr_info("restore_code: %p\n", relocated_restore_code);
> > +       pr_info("jump_addr: %lx\n", restore_jump_address);
> > +
> 
> Also interesting would be the "relocated_restore_code" address, as
> well as a dump of /sys/kernel/debug/kernel_page_tables (from
> CONFIG_X86_PTDUMP).
> 
> I'm baffled by the problem, but the best I can understand is the the
> relocated_restore_code range isn't executable (which should be visible
> from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
> see how to solve that since my original patch didn't work.
> 
> Rafael, is this something you have time to look at quickly?

Well, not really, but I'll do my best to look at it in the next few days.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 21:27                     ` Rafael J. Wysocki
@ 2016-06-10 22:29                       ` Rafael J. Wysocki
  2016-06-10 22:28                         ` Logan Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-10 22:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Logan Gunthorpe, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Friday, June 10, 2016 11:27:29 PM Rafael J. Wysocki wrote:
> On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote:
> > On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> > > Hey,
> > >
> > > I've still be trying to figure this out as I have time.
> > >
> > > I tried printing a couple restore addresses and nothing I can find seems
> > > anywhere near the rodata/ex_table boundary.
> > >
> > > I tried with the (badly formatted) below and got the following. Nothing too
> > > surprising. I've attached a kallsyms that matches the kernel for reference.
> > >
> > > restore_code: ffff880157c3b000
> > > jump_addr: ffffffff81446be0
> > >
> > >
> > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> > > index 009947d..6efedb7 100644
> > > --- a/arch/x86/power/hibernate_64.c
> > > +++ b/arch/x86/power/hibernate_64.c
> > > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
> > >         memcpy(relocated_restore_code, &core_restore_code,
> > >                &restore_registers - &core_restore_code);
> > >
> > > +       pr_info("restore_code: %p\n", relocated_restore_code);
> > > +       pr_info("jump_addr: %lx\n", restore_jump_address);
> > > +
> > 
> > Also interesting would be the "relocated_restore_code" address, as
> > well as a dump of /sys/kernel/debug/kernel_page_tables (from
> > CONFIG_X86_PTDUMP).
> > 
> > I'm baffled by the problem, but the best I can understand is the the
> > relocated_restore_code range isn't executable (which should be visible
> > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
> > see how to solve that since my original patch didn't work.
> > 
> > Rafael, is this something you have time to look at quickly?
> 
> Well, not really, but I'll do my best to look at it in the next few days.

OK, I have a theory, but I need a bit of help.

This may be a dumb question, but I don't quite remember the answer readily.

Given a physical address, how do I get the corresponding virtual one under
the kernel identity mapping?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 22:29                       ` Rafael J. Wysocki
@ 2016-06-10 22:28                         ` Logan Gunthorpe
  2016-06-10 22:33                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-10 22:28 UTC (permalink / raw)
  To: Rafael J. Wysocki, Kees Cook
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst



On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
> OK, I have a theory, but I need a bit of help.
> 
> This may be a dumb question, but I don't quite remember the answer readily.
> 
> Given a physical address, how do I get the corresponding virtual one under
> the kernel identity mapping?

Is that not just 'phys_to_virt(x)'??

Logan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 22:28                         ` Logan Gunthorpe
@ 2016-06-10 22:33                           ` Rafael J. Wysocki
  2016-06-11  0:13                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-10 22:33 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
> 
> On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
> > OK, I have a theory, but I need a bit of help.
> > 
> > This may be a dumb question, but I don't quite remember the answer readily.
> > 
> > Given a physical address, how do I get the corresponding virtual one under
> > the kernel identity mapping?
> 
> Is that not just 'phys_to_virt(x)'??

Ah that.  Thanks!

Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-10 22:33                           ` Rafael J. Wysocki
@ 2016-06-11  0:13                             ` Rafael J. Wysocki
  2016-06-11  1:47                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-11  0:13 UTC (permalink / raw)
  To: Logan Gunthorpe, Kees Cook
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar,
	the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote:
> On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
> > 
> > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
> > > OK, I have a theory, but I need a bit of help.
> > > 
> > > This may be a dumb question, but I don't quite remember the answer readily.
> > > 
> > > Given a physical address, how do I get the corresponding virtual one under
> > > the kernel identity mapping?
> > 
> > Is that not just 'phys_to_virt(x)'??
> 
> Ah that.  Thanks!

So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully
will make it a bit clearer what happens there.

In particular, it follows from it quite clearly that the restore will only work
if the virtual address of the trampoline (&restore_registers) in the image
kernel happens to match the virtual address of the same chunk of memory in the
boot kernel (and after it has switched to the temporary page tables).

That appears to be the case most of the time, or hibernation would randomly
break for people all over, but I'm not actually sure if that's guaranteed to
happen.  If not, that very well may explain unexpected failures in there.

If it is not guaranteed to happen, the solution would be to pass the physical
address of that memory from the image kernel to the boot kernel instead of its
virtual address, but simply applying virt_to_phys() to &restore_registers in
arch_hibernation_header_save(), storing that in rdr->jump_address and then applying
phys_to_virt() to rdr->jump_address in arch_hibernation_header_restore() doesn't
work.

Any hints on how to do it correctly?

Thanks,
Rafael

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] x86 / hibernate: Simplify passing of trampoline address

During resume from hibernation, the boot kernel has to jump to
the image kernel's trampoline code to pass control to it.  On
x86_64 the address of that code is passed from the image kernel
to the boot kernel in the image header.  Currently, the way in
which that address is saved by the image kernel is not entirely
straightforward (assembly code is involved in that unnecessarily
etc), so simplify it to make it easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 arch/x86/power/hibernate_64.c     |    6 +++---
 arch/x86/power/hibernate_asm_64.S |    3 ---
 2 files changed, 3 insertions(+), 6 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_
  * Address to jump to in the last phase of restore in order to get to the image
  * kernel's text (this value is passed in the image header).
  */
-unsigned long restore_jump_address __visible;
+void *restore_jump_address __visible;

 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -108,7 +108,7 @@ int pfn_is_nosave(unsigned long pfn)
 }

 struct restore_data_record {
-	unsigned long jump_address;
+	void *jump_address;
 	unsigned long cr3;
 	unsigned long magic;
 };
@@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a

 	if (max_size < sizeof(struct restore_data_record))
 		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
+	rdr->jump_address = &restore_registers;
 	rdr->cr3 = restore_cr3;
 	rdr->magic = RESTORE_MAGIC;
 	return 0;
Index: linux-pm/arch/x86/power/hibernate_asm_64.S
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
+++ linux-pm/arch/x86/power/hibernate_asm_64.S
@@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend)
 	pushfq
 	popq	pt_regs_flags(%rax)

-	/* save the address of restore_registers */
-	movq	$restore_registers, %rax
-	movq	%rax, restore_jump_address(%rip)
 	/* save cr3 */
 	movq	%cr3, %rax
 	movq	%rax, restore_cr3(%rip)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-11  0:13                             ` Rafael J. Wysocki
@ 2016-06-11  1:47                               ` Rafael J. Wysocki
  2016-06-11 11:48                                 ` Rafael J. Wysocki
  2016-06-11 17:39                                 ` Logan Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-11  1:47 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote:
> On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote:
> > On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
> > > 
> > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
> > > > OK, I have a theory, but I need a bit of help.
> > > > 
> > > > This may be a dumb question, but I don't quite remember the answer readily.
> > > > 
> > > > Given a physical address, how do I get the corresponding virtual one under
> > > > the kernel identity mapping?
> > > 
> > > Is that not just 'phys_to_virt(x)'??
> > 
> > Ah that.  Thanks!
> 
> So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully
> will make it a bit clearer what happens there.
> 
> In particular, it follows from it quite clearly that the restore will only work
> if the virtual address of the trampoline (&restore_registers) in the image
> kernel happens to match the virtual address of the same chunk of memory in the
> boot kernel (and after it has switched to the temporary page tables).
> 
> That appears to be the case most of the time, or hibernation would randomly
> break for people all over, but I'm not actually sure if that's guaranteed to
> happen.  If not, that very well may explain unexpected failures in there.
> 
> If it is not guaranteed to happen, the solution would be to pass the physical
> address of that memory from the image kernel to the boot kernel instead of its
> virtual address,

OK, the appended patch does the trick for me.

Logan, can you please try it?  It shouldn't break things, but I'm wondering if
the problem you're having after commit ab76f7b4ab is still reproducible with it.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] x86 / hibernate: Pass physical address of the trampoline

During resume from hibernation, the boot kernel has to jump to
the image kernel's trampoline code to pass control to it.  On
x86_64 the address of that code is passed from the image kernel
to the boot kernel in the image header.

Currently, the way in which that address is saved by the image kernel
is not entirely straightforward (assembly code is involved in that
unnecessarily etc).  Moreover, the current code passes the virtual
address of the trampoline with the assumption that the image and boot
kernels will always use the same virtual addresses for the chunk of
physical address space occupied by the trampoline code.

In case that assumption is not valid, pass the physical address of
the trampoline code from the image kernel to the boot kernel and
update RESTORE_MAGIC to avoid situations in which the boot kernel
may use a different trampoline address passing convention from the
one used by the image kernel.

Also drop the assembly code chunk updating restore_jump_address
in swsusp_arch_suspend() as it is not necessary any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 arch/x86/power/hibernate_64.c     |    8 ++++----
 arch/x86/power/hibernate_asm_64.S |    3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_
  * Address to jump to in the last phase of restore in order to get to the image
  * kernel's text (this value is passed in the image header).
  */
-unsigned long restore_jump_address __visible;
+void *restore_jump_address __visible;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -113,7 +113,7 @@ struct restore_data_record {
 	unsigned long magic;
 };
 
-#define RESTORE_MAGIC	0x0123456789ABCDEFUL
+#define RESTORE_MAGIC	0x0123456789ABCDF0UL
 
 /**
  *	arch_hibernation_header_save - populate the architecture specific part
@@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a
 
 	if (max_size < sizeof(struct restore_data_record))
 		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
+	rdr->jump_address = (unsigned long)__pa_symbol(&restore_registers);
 	rdr->cr3 = restore_cr3;
 	rdr->magic = RESTORE_MAGIC;
 	return 0;
@@ -141,7 +141,7 @@ int arch_hibernation_header_restore(void
 {
 	struct restore_data_record *rdr = addr;
 
-	restore_jump_address = rdr->jump_address;
+	restore_jump_address = (void *)(rdr->jump_address + __START_KERNEL_map);
 	restore_cr3 = rdr->cr3;
 	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
 }
Index: linux-pm/arch/x86/power/hibernate_asm_64.S
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
+++ linux-pm/arch/x86/power/hibernate_asm_64.S
@@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	/* save the address of restore_registers */
-	movq	$restore_registers, %rax
-	movq	%rax, restore_jump_address(%rip)
 	/* save cr3 */
 	movq	%cr3, %rax
 	movq	%rax, restore_cr3(%rip)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-11  1:47                               ` Rafael J. Wysocki
@ 2016-06-11 11:48                                 ` Rafael J. Wysocki
  2016-06-11 16:35                                   ` Logan Gunthorpe
  2016-06-11 17:39                                 ` Logan Gunthorpe
  1 sibling, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-11 11:48 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Saturday, June 11, 2016 03:47:59 AM Rafael J. Wysocki wrote:
> On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote:
> > On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote:
> > > On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
> > > > 
> > > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
> > > > > OK, I have a theory, but I need a bit of help.
> > > > > 
> > > > > This may be a dumb question, but I don't quite remember the answer readily.
> > > > > 
> > > > > Given a physical address, how do I get the corresponding virtual one under
> > > > > the kernel identity mapping?
> > > > 
> > > > Is that not just 'phys_to_virt(x)'??
> > > 
> > > Ah that.  Thanks!
> > 
> > So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully
> > will make it a bit clearer what happens there.
> > 
> > In particular, it follows from it quite clearly that the restore will only work
> > if the virtual address of the trampoline (&restore_registers) in the image
> > kernel happens to match the virtual address of the same chunk of memory in the
> > boot kernel (and after it has switched to the temporary page tables).
> > 
> > That appears to be the case most of the time, or hibernation would randomly
> > break for people all over, but I'm not actually sure if that's guaranteed to
> > happen.  If not, that very well may explain unexpected failures in there.
> > 
> > If it is not guaranteed to happen, the solution would be to pass the physical
> > address of that memory from the image kernel to the boot kernel instead of its
> > virtual address,
> 
> OK, the appended patch does the trick for me.
> 
> Logan, can you please try it?  It shouldn't break things, but I'm wondering if
> the problem you're having after commit ab76f7b4ab is still reproducible with it.

No, it won't help, and I think I know what's going on.

The temporary page tables set up by set_up_temporary_mappings() re-use the
original boot kernel's text mapping, so if the trampoline code address
falls into a non-executable page in that mapping, the boot kernel can't
pass control to the image kernel.

If that theory is correct, the patch below should help, but IMO it should
be applied regardless to eliminate this particular failure mode.

I'll prepare another patch to pass the physical address on top of this one.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] x86 / hibernate: Ensure that 64-bit trampoline code is executable

During resume from hibernation, the boot kernel has to jump to
the image kernel's trampoline code to pass control to it.  On
x86_64 the address of that code is passed from the image kernel
to the boot kernel in the image header.

That address is interpreted with respect to the original boot
kernel's kernel text memory mapping, so if the page containing it
is not executable in that mapping, the jump to it will crash the
kernel, so prevent that from happening.

While at it, clean up the way in which the trampoline code address
is saved by the image kernel (assembly code is involved in that
unnecessarily etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 arch/x86/power/hibernate_64.c     |   15 ++++++++++++---
 arch/x86/power/hibernate_asm_64.S |    3 ---
 2 files changed, 12 insertions(+), 6 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -12,6 +12,7 @@
 #include <linux/smp.h>
 #include <linux/suspend.h>
 
+#include <asm/cacheflush.h>
 #include <asm/init.h>
 #include <asm/proto.h>
 #include <asm/page.h>
@@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_
  * Address to jump to in the last phase of restore in order to get to the image
  * kernel's text (this value is passed in the image header).
  */
-unsigned long restore_jump_address __visible;
+void *restore_jump_address __visible;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -91,6 +92,14 @@ int swsusp_arch_resume(void)
 		return -ENOMEM;
 	memcpy(relocated_restore_code, &core_restore_code,
 	       &restore_registers - &core_restore_code);
+	/*
+	 * The temporary memory mapping set up by set_up_temporary_mappings()
+	 * above re-uses the original kernel text mapping.  If the page with
+	 * restore_jump_address is not executable in that mapping, it won't be
+	 * possible to pass control to the image kernel, so prevent that from
+	 * happening.
+	 */
+	set_memory_x((unsigned long)restore_jump_address, 1);
 
 	restore_image();
 	return 0;
@@ -108,7 +117,7 @@ int pfn_is_nosave(unsigned long pfn)
 }
 
 struct restore_data_record {
-	unsigned long jump_address;
+	void *jump_address;
 	unsigned long cr3;
 	unsigned long magic;
 };
@@ -126,7 +135,7 @@ int arch_hibernation_header_save(void *a
 
 	if (max_size < sizeof(struct restore_data_record))
 		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
+	rdr->jump_address = &restore_registers;
 	rdr->cr3 = restore_cr3;
 	rdr->magic = RESTORE_MAGIC;
 	return 0;
Index: linux-pm/arch/x86/power/hibernate_asm_64.S
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
+++ linux-pm/arch/x86/power/hibernate_asm_64.S
@@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	/* save the address of restore_registers */
-	movq	$restore_registers, %rax
-	movq	%rax, restore_jump_address(%rip)
 	/* save cr3 */
 	movq	%cr3, %rax
 	movq	%rax, restore_cr3(%rip)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-11 11:48                                 ` Rafael J. Wysocki
@ 2016-06-11 16:35                                   ` Logan Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-11 16:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

Hey Rafael,

Thank for looking into this. I tried the patch below applied to v4.6 and 
I still got a lockup on resume. Additionally there was a kernel warning 
at arch/x86/mm/pageattr.c:1414 change_page_attr_set_clr+0x2bb/0x440 and 
another one right afterwards at kernel/smp.c:416 
smp_call_function_many+0xb5/0x250.

Regardless, Ill try the other patch you sent shortly.

Logan

On 11/06/16 05:48 AM, Rafael J. Wysocki wrote:
> On Saturday, June 11, 2016 03:47:59 AM Rafael J. Wysocki wrote:
>> On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote:
>>> On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote:
>>>> On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
>>>>>
>>>>> On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
>>>>>> OK, I have a theory, but I need a bit of help.
>>>>>>
>>>>>> This may be a dumb question, but I don't quite remember the answer readily.
>>>>>>
>>>>>> Given a physical address, how do I get the corresponding virtual one under
>>>>>> the kernel identity mapping?
>>>>>
>>>>> Is that not just 'phys_to_virt(x)'??
>>>>
>>>> Ah that.  Thanks!
>>>
>>> So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully
>>> will make it a bit clearer what happens there.
>>>
>>> In particular, it follows from it quite clearly that the restore will only work
>>> if the virtual address of the trampoline (&restore_registers) in the image
>>> kernel happens to match the virtual address of the same chunk of memory in the
>>> boot kernel (and after it has switched to the temporary page tables).
>>>
>>> That appears to be the case most of the time, or hibernation would randomly
>>> break for people all over, but I'm not actually sure if that's guaranteed to
>>> happen.  If not, that very well may explain unexpected failures in there.
>>>
>>> If it is not guaranteed to happen, the solution would be to pass the physical
>>> address of that memory from the image kernel to the boot kernel instead of its
>>> virtual address,
>>
>> OK, the appended patch does the trick for me.
>>
>> Logan, can you please try it?  It shouldn't break things, but I'm wondering if
>> the problem you're having after commit ab76f7b4ab is still reproducible with it.
>
> No, it won't help, and I think I know what's going on.
>
> The temporary page tables set up by set_up_temporary_mappings() re-use the
> original boot kernel's text mapping, so if the trampoline code address
> falls into a non-executable page in that mapping, the boot kernel can't
> pass control to the image kernel.
>
> If that theory is correct, the patch below should help, but IMO it should
> be applied regardless to eliminate this particular failure mode.
>
> I'll prepare another patch to pass the physical address on top of this one.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] x86 / hibernate: Ensure that 64-bit trampoline code is executable
>
> During resume from hibernation, the boot kernel has to jump to
> the image kernel's trampoline code to pass control to it.  On
> x86_64 the address of that code is passed from the image kernel
> to the boot kernel in the image header.
>
> That address is interpreted with respect to the original boot
> kernel's kernel text memory mapping, so if the page containing it
> is not executable in that mapping, the jump to it will crash the
> kernel, so prevent that from happening.
>
> While at it, clean up the way in which the trampoline code address
> is saved by the image kernel (assembly code is involved in that
> unnecessarily etc).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   arch/x86/power/hibernate_64.c     |   15 ++++++++++++---
>   arch/x86/power/hibernate_asm_64.S |    3 ---
>   2 files changed, 12 insertions(+), 6 deletions(-)
>
> Index: linux-pm/arch/x86/power/hibernate_64.c
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_64.c
> +++ linux-pm/arch/x86/power/hibernate_64.c
> @@ -12,6 +12,7 @@
>   #include <linux/smp.h>
>   #include <linux/suspend.h>
>
> +#include <asm/cacheflush.h>
>   #include <asm/init.h>
>   #include <asm/proto.h>
>   #include <asm/page.h>
> @@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_
>    * Address to jump to in the last phase of restore in order to get to the image
>    * kernel's text (this value is passed in the image header).
>    */
> -unsigned long restore_jump_address __visible;
> +void *restore_jump_address __visible;
>
>   /*
>    * Value of the cr3 register from before the hibernation (this value is passed
> @@ -91,6 +92,14 @@ int swsusp_arch_resume(void)
>   		return -ENOMEM;
>   	memcpy(relocated_restore_code, &core_restore_code,
>   	       &restore_registers - &core_restore_code);
> +	/*
> +	 * The temporary memory mapping set up by set_up_temporary_mappings()
> +	 * above re-uses the original kernel text mapping.  If the page with
> +	 * restore_jump_address is not executable in that mapping, it won't be
> +	 * possible to pass control to the image kernel, so prevent that from
> +	 * happening.
> +	 */
> +	set_memory_x((unsigned long)restore_jump_address, 1);
>
>   	restore_image();
>   	return 0;
> @@ -108,7 +117,7 @@ int pfn_is_nosave(unsigned long pfn)
>   }
>
>   struct restore_data_record {
> -	unsigned long jump_address;
> +	void *jump_address;
>   	unsigned long cr3;
>   	unsigned long magic;
>   };
> @@ -126,7 +135,7 @@ int arch_hibernation_header_save(void *a
>
>   	if (max_size < sizeof(struct restore_data_record))
>   		return -EOVERFLOW;
> -	rdr->jump_address = restore_jump_address;
> +	rdr->jump_address = &restore_registers;
>   	rdr->cr3 = restore_cr3;
>   	rdr->magic = RESTORE_MAGIC;
>   	return 0;
> Index: linux-pm/arch/x86/power/hibernate_asm_64.S
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
> +++ linux-pm/arch/x86/power/hibernate_asm_64.S
> @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend)
>   	pushfq
>   	popq	pt_regs_flags(%rax)
>
> -	/* save the address of restore_registers */
> -	movq	$restore_registers, %rax
> -	movq	%rax, restore_jump_address(%rip)
>   	/* save cr3 */
>   	movq	%cr3, %rax
>   	movq	%rax, restore_cr3(%rip)
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-11  1:47                               ` Rafael J. Wysocki
  2016-06-11 11:48                                 ` Rafael J. Wysocki
@ 2016-06-11 17:39                                 ` Logan Gunthorpe
  2016-06-12  1:05                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-11 17:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

Hey Rafael,

I tried this patch as well and there was no change.

I have a couple tentative observations to make though. None of this is 
100% clear to me so please correct me if I'm wrong anywhere:

1) Commit ab76f7b4ab only extends the NX bit between __ex_table and 
rodata; which, by my understanding, shouldn't be used by anything. And 
__ex_table and rodata are fixed by the kernel's binary so both symbols 
should be the same in both the image kernel and the boot kernel given 
that both are running from the same binary.

2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, 
when it's in place, it only works some of the time. Given that commit is 
only extending the NX region a bit, if there is some random mismatch, 
why does it never reach rodata? In other words, why is rodata a magic 
line that seems to work all the time -- why doesn't this random mismatch 
ever extend into the rodata region? rodata isn't _that_ far away from 
the end of ex_table.

Anyway, thanks again for looking into this.

Logan


On 10/06/16 07:47 PM, Rafael J. Wysocki wrote:
> On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote:
>> On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote:
>>> On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote:
>>>>
>>>> On 10/06/16 04:29 PM, Rafael J. Wysocki wrote:
>>>>> OK, I have a theory, but I need a bit of help.
>>>>>
>>>>> This may be a dumb question, but I don't quite remember the answer readily.
>>>>>
>>>>> Given a physical address, how do I get the corresponding virtual one under
>>>>> the kernel identity mapping?
>>>>
>>>> Is that not just 'phys_to_virt(x)'??
>>>
>>> Ah that.  Thanks!
>>
>> So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully
>> will make it a bit clearer what happens there.
>>
>> In particular, it follows from it quite clearly that the restore will only work
>> if the virtual address of the trampoline (&restore_registers) in the image
>> kernel happens to match the virtual address of the same chunk of memory in the
>> boot kernel (and after it has switched to the temporary page tables).
>>
>> That appears to be the case most of the time, or hibernation would randomly
>> break for people all over, but I'm not actually sure if that's guaranteed to
>> happen.  If not, that very well may explain unexpected failures in there.
>>
>> If it is not guaranteed to happen, the solution would be to pass the physical
>> address of that memory from the image kernel to the boot kernel instead of its
>> virtual address,
>
> OK, the appended patch does the trick for me.
>
> Logan, can you please try it?  It shouldn't break things, but I'm wondering if
> the problem you're having after commit ab76f7b4ab is still reproducible with it.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] x86 / hibernate: Pass physical address of the trampoline
>
> During resume from hibernation, the boot kernel has to jump to
> the image kernel's trampoline code to pass control to it.  On
> x86_64 the address of that code is passed from the image kernel
> to the boot kernel in the image header.
>
> Currently, the way in which that address is saved by the image kernel
> is not entirely straightforward (assembly code is involved in that
> unnecessarily etc).  Moreover, the current code passes the virtual
> address of the trampoline with the assumption that the image and boot
> kernels will always use the same virtual addresses for the chunk of
> physical address space occupied by the trampoline code.
>
> In case that assumption is not valid, pass the physical address of
> the trampoline code from the image kernel to the boot kernel and
> update RESTORE_MAGIC to avoid situations in which the boot kernel
> may use a different trampoline address passing convention from the
> one used by the image kernel.
>
> Also drop the assembly code chunk updating restore_jump_address
> in swsusp_arch_suspend() as it is not necessary any more.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   arch/x86/power/hibernate_64.c     |    8 ++++----
>   arch/x86/power/hibernate_asm_64.S |    3 ---
>   2 files changed, 4 insertions(+), 7 deletions(-)
>
> Index: linux-pm/arch/x86/power/hibernate_64.c
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_64.c
> +++ linux-pm/arch/x86/power/hibernate_64.c
> @@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_
>    * Address to jump to in the last phase of restore in order to get to the image
>    * kernel's text (this value is passed in the image header).
>    */
> -unsigned long restore_jump_address __visible;
> +void *restore_jump_address __visible;
>
>   /*
>    * Value of the cr3 register from before the hibernation (this value is passed
> @@ -113,7 +113,7 @@ struct restore_data_record {
>   	unsigned long magic;
>   };
>
> -#define RESTORE_MAGIC	0x0123456789ABCDEFUL
> +#define RESTORE_MAGIC	0x0123456789ABCDF0UL
>
>   /**
>    *	arch_hibernation_header_save - populate the architecture specific part
> @@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a
>
>   	if (max_size < sizeof(struct restore_data_record))
>   		return -EOVERFLOW;
> -	rdr->jump_address = restore_jump_address;
> +	rdr->jump_address = (unsigned long)__pa_symbol(&restore_registers);
>   	rdr->cr3 = restore_cr3;
>   	rdr->magic = RESTORE_MAGIC;
>   	return 0;
> @@ -141,7 +141,7 @@ int arch_hibernation_header_restore(void
>   {
>   	struct restore_data_record *rdr = addr;
>
> -	restore_jump_address = rdr->jump_address;
> +	restore_jump_address = (void *)(rdr->jump_address + __START_KERNEL_map);
>   	restore_cr3 = rdr->cr3;
>   	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
>   }
> Index: linux-pm/arch/x86/power/hibernate_asm_64.S
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
> +++ linux-pm/arch/x86/power/hibernate_asm_64.S
> @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend)
>   	pushfq
>   	popq	pt_regs_flags(%rax)
>
> -	/* save the address of restore_registers */
> -	movq	$restore_registers, %rax
> -	movq	%rax, restore_jump_address(%rip)
>   	/* save cr3 */
>   	movq	%cr3, %rax
>   	movq	%rax, restore_cr3(%rip)
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-11 17:39                                 ` Logan Gunthorpe
@ 2016-06-12  1:05                                   ` Rafael J. Wysocki
  2016-06-12  4:48                                     ` Logan Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-12  1:05 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Saturday, June 11, 2016 11:39:48 AM Logan Gunthorpe wrote:
> Hey Rafael,
> 
> I tried this patch as well and there was no change.
> 
> I have a couple tentative observations to make though. None of this is 
> 100% clear to me so please correct me if I'm wrong anywhere:
> 
> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and 
> rodata; which, by my understanding, shouldn't be used by anything. And 
> __ex_table and rodata are fixed by the kernel's binary so both symbols 
> should be the same in both the image kernel and the boot kernel given 
> that both are running from the same binary.

Well, what if the kernel is relocated?

> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, 
> when it's in place, it only works some of the time. Given that commit is 
> only extending the NX region a bit, if there is some random mismatch, 
> why does it never reach rodata? In other words, why is rodata a magic 
> line that seems to work all the time -- why doesn't this random mismatch 
> ever extend into the rodata region? rodata isn't _that_ far away from 
> the end of ex_table.

That's a very good question. :-)

Overall, it looks like re-using the boot kernel text mapping in the temporary
page tables is a bad idea.  I guess a temporary kernel text mapping is needed
too or at least the existing one has to be modified to cover the trampoline
code properly.

> Anyway, thanks again for looking into this.

No problem.  I haven't helped much so far, though ...

Can you please check if the patch below makes any difference?

---
 arch/x86/power/hibernate_64.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -12,6 +12,7 @@
 #include <linux/smp.h>
 #include <linux/suspend.h>
 
+#include <asm/cacheflush.h>
 #include <asm/init.h>
 #include <asm/proto.h>
 #include <asm/page.h>
@@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_
  * Address to jump to in the last phase of restore in order to get to the image
  * kernel's text (this value is passed in the image header).
  */
-unsigned long restore_jump_address __visible;
+void *restore_jump_address __visible;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -108,7 +109,7 @@ int pfn_is_nosave(unsigned long pfn)
 }
 
 struct restore_data_record {
-	unsigned long jump_address;
+	void *jump_address;
 	unsigned long cr3;
 	unsigned long magic;
 };
@@ -126,7 +127,7 @@ int arch_hibernation_header_save(void *a
 
 	if (max_size < sizeof(struct restore_data_record))
 		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
+	rdr->jump_address = &restore_registers;
 	rdr->cr3 = restore_cr3;
 	rdr->magic = RESTORE_MAGIC;
 	return 0;
@@ -140,8 +141,18 @@ int arch_hibernation_header_save(void *a
 int arch_hibernation_header_restore(void *addr)
 {
 	struct restore_data_record *rdr = addr;
+	unsigned long text_end, all_end;
+
+	if (rdr->magic != RESTORE_MAGIC)
+		return -EINVAL;
 
 	restore_jump_address = rdr->jump_address;
 	restore_cr3 = rdr->cr3;
-	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
+
+	text_end = PFN_ALIGN(&__stop___ex_table);
+	all_end = roundup((unsigned long)restore_jump_address, PMD_SIZE);
+	if (all_end > text_end)
+		set_memory_x(text_end, (all_end - text_end) >> PAGE_SHIFT);
+
+	return 0;
 }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-12  1:05                                   ` Rafael J. Wysocki
@ 2016-06-12  4:48                                     ` Logan Gunthorpe
  2016-06-12 14:31                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-12  4:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

Hey,

On 11/06/16 07:05 PM, Rafael J. Wysocki wrote:
>> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and
>> rodata; which, by my understanding, shouldn't be used by anything. And
>> __ex_table and rodata are fixed by the kernel's binary so both symbols
>> should be the same in both the image kernel and the boot kernel given
>> that both are running from the same binary.
>
> Well, what if the kernel is relocated?

Ah, I'm sure I don't fully grasp the implications of that but I would 
assume that if the image kernel were located somewhere else it would 
still be far away from the boot kernel's ex_table/rodata boundary.

>> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though,
>> when it's in place, it only works some of the time. Given that commit is
>> only extending the NX region a bit, if there is some random mismatch,
>> why does it never reach rodata? In other words, why is rodata a magic
>> line that seems to work all the time -- why doesn't this random mismatch
>> ever extend into the rodata region? rodata isn't _that_ far away from
>> the end of ex_table.
>
> That's a very good question. :-)

Yeah, I guess if we knew the answer we'd understand what was going on 
and have a fix.


> Can you please check if the patch below makes any difference?

I'm afraid it's no different. The kernel still freezes on resume. 
Though, no warnings with this one.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-12  4:48                                     ` Logan Gunthorpe
@ 2016-06-12 14:31                                       ` Rafael J. Wysocki
  2016-06-12 16:11                                         ` Logan Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-12 14:31 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Saturday, June 11, 2016 10:48:08 PM Logan Gunthorpe wrote:
> Hey,

Hi,

> On 11/06/16 07:05 PM, Rafael J. Wysocki wrote:
> >> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and
> >> rodata; which, by my understanding, shouldn't be used by anything. And
> >> __ex_table and rodata are fixed by the kernel's binary so both symbols
> >> should be the same in both the image kernel and the boot kernel given
> >> that both are running from the same binary.
> >
> > Well, what if the kernel is relocated?
> 
> Ah, I'm sure I don't fully grasp the implications of that but I would 
> assume that if the image kernel were located somewhere else it would 
> still be far away from the boot kernel's ex_table/rodata boundary.

Probably.

> >> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though,
> >> when it's in place, it only works some of the time. Given that commit is
> >> only extending the NX region a bit, if there is some random mismatch,
> >> why does it never reach rodata? In other words, why is rodata a magic
> >> line that seems to work all the time -- why doesn't this random mismatch
> >> ever extend into the rodata region? rodata isn't _that_ far away from
> >> the end of ex_table.
> >
> > That's a very good question. :-)
> 
> Yeah, I guess if we knew the answer we'd understand what was going on 
> and have a fix.

Right.

Appended is one more patch to try.

It actually fixes a theoretical problem, so I'll need to add comments to it
(as it is far from obvious IMO) and a changelog etc and post it as a proper
submission.

So the concern is that the page copying done in the loop in core_restore_code()
may corrupt the kernel text part of the temporary memory mapping used at that
time, because that's the original kernel text mapping from the boot kernel.

That doesn't matter for the loop itself (as that code runs from a "safe" page
guaranteed not to be overwritten), but it (quite theoretically) may matter for
the final jump to the image kernel's restore_registers().  [I realized that
it might be a problem only after I had started to think about the problem
you reported.]

As a bonus, the patch also eliminates the possible concern about the
executability of the memory mapped via the kernel text mapping in the boot
kernel, so IMO it's worth to give it a shot.  I've tested it lightly on
one machine, but I guess it would just crash right away if there were
any problems in it.

Thanks,
Rafael


---
 arch/x86/power/hibernate_64.c     |   46 ++++++++++++++++++++++++++++++++++----
 arch/x86/power/hibernate_asm_64.S |   28 +++++++++++++----------
 2 files changed, 58 insertions(+), 16 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -27,7 +27,8 @@ extern asmlinkage __visible int restore_
  * Address to jump to in the last phase of restore in order to get to the image
  * kernel's text (this value is passed in the image header).
  */
-unsigned long restore_jump_address __visible;
+void *restore_jump_address __visible;
+unsigned long jump_address_phys;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -37,6 +38,9 @@ unsigned long restore_cr3 __visible;
 
 pgd_t *temp_level4_pgt __visible;
 
+void *restore_pgd_addr __visible;
+pgd_t restore_pgd __visible;
+
 void *relocated_restore_code __visible;
 
 static void *alloc_pgt_page(void *context)
@@ -44,6 +48,33 @@ static void *alloc_pgt_page(void *contex
 	return (void *)get_safe_page(GFP_ATOMIC);
 }
 
+static int prepare_temporary_text_mapping(void)
+{
+	unsigned long vaddr = (unsigned long)restore_jump_address;
+	unsigned long paddr = jump_address_phys & PMD_MASK;
+	pmd_t *pmd;
+	pud_t *pud;
+
+	pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+	if (!pud)
+		return -ENOMEM;
+
+	restore_pgd = __pgd(__pa(pud) | _KERNPG_TABLE);
+
+	pud += pud_index(vaddr);
+	pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+	if (!pmd)
+		return -ENOMEM;
+
+	set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+
+	pmd += pmd_index(vaddr);
+	set_pmd(pmd, __pmd(paddr | __PAGE_KERNEL_LARGE_EXEC));
+
+	restore_pgd_addr = temp_level4_pgt + pgd_index(vaddr);
+	return 0;
+}
+
 static int set_up_temporary_mappings(void)
 {
 	struct x86_mapping_info info = {
@@ -63,6 +94,10 @@ static int set_up_temporary_mappings(voi
 	set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
 		init_level4_pgt[pgd_index(__START_KERNEL_map)]);
 
+	result = prepare_temporary_text_mapping();
+	if (result)
+		return result;
+
 	/* Set up the direct mapping from scratch */
 	for (i = 0; i < nr_pfn_mapped; i++) {
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
@@ -108,12 +143,13 @@ int pfn_is_nosave(unsigned long pfn)
 }
 
 struct restore_data_record {
-	unsigned long jump_address;
+	void *jump_address;
+	unsigned long jump_address_phys;
 	unsigned long cr3;
 	unsigned long magic;
 };
 
-#define RESTORE_MAGIC	0x0123456789ABCDEFUL
+#define RESTORE_MAGIC	0x123456789ABCDEF0UL
 
 /**
  *	arch_hibernation_header_save - populate the architecture specific part
@@ -126,7 +162,8 @@ int arch_hibernation_header_save(void *a
 
 	if (max_size < sizeof(struct restore_data_record))
 		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
+	rdr->jump_address = &restore_registers;
+	rdr->jump_address_phys = __pa_symbol(&restore_registers);
 	rdr->cr3 = restore_cr3;
 	rdr->magic = RESTORE_MAGIC;
 	return 0;
@@ -142,6 +179,7 @@ int arch_hibernation_header_restore(void
 	struct restore_data_record *rdr = addr;
 
 	restore_jump_address = rdr->jump_address;
+	jump_address_phys = rdr->jump_address_phys;
 	restore_cr3 = rdr->cr3;
 	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
 }
Index: linux-pm/arch/x86/power/hibernate_asm_64.S
===================================================================
--- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
+++ linux-pm/arch/x86/power/hibernate_asm_64.S
@@ -72,8 +72,10 @@ ENTRY(restore_image)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* prepare to jump to the image kernel */
-	movq	restore_jump_address(%rip), %rax
 	movq	restore_cr3(%rip), %rbx
+	movq	restore_jump_address(%rip), %r10
+	movq	restore_pgd(%rip), %r8
+	movq	restore_pgd_addr(%rip), %r9
 
 	/* prepare to copy image data to their original locations */
 	movq	restore_pblist(%rip), %rdx
@@ -96,20 +98,22 @@ ENTRY(core_restore_code)
 	/* progress to the next pbe */
 	movq	pbe_next(%rdx), %rdx
 	jmp	.Lloop
+
 .Ldone:
+	/* switch over to the temporary kernel text mapping */
+	movq	%r8, (%r9)
+	/* flush TLB */
+	movq	%rax, %rdx
+	andq	$~(X86_CR4_PGE), %rdx
+	movq	%rdx, %cr4;  # turn off PGE
+	movq	%cr3, %rcx;  # flush TLB
+	movq	%rcx, %cr3;
+	movq	%rax, %cr4;  # turn PGE back on
 	/* jump to the restore_registers address from the image header */
-	jmpq	*%rax
-	/*
-	 * NOTE: This assumes that the boot kernel's text mapping covers the
-	 * image kernel's page containing restore_registers and the address of
-	 * this page is the same as in the image kernel's text mapping (it
-	 * should always be true, because the text mapping is linear, starting
-	 * from 0, and is supposed to cover the entire kernel text for every
-	 * kernel).
-	 *
-	 * code below belongs to the image kernel
-	 */
+	jmpq	*%r10
 
+	 /* code below belongs to the image kernel */
+	.align PAGE_SIZE
 ENTRY(restore_registers)
 	FRAME_BEGIN
 	/* go back to the original page tables */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-12 14:31                                       ` Rafael J. Wysocki
@ 2016-06-12 16:11                                         ` Logan Gunthorpe
  2016-06-13 13:43                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Logan Gunthorpe @ 2016-06-12 16:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

Hey Rafael,

Awesome, this patch fixes the problem! Nice work.

Thanks,

Logan

On 12/06/16 08:31 AM, Rafael J. Wysocki wrote:
> On Saturday, June 11, 2016 10:48:08 PM Logan Gunthorpe wrote:
>> Hey,
>
> Hi,
>
>> On 11/06/16 07:05 PM, Rafael J. Wysocki wrote:
>>>> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and
>>>> rodata; which, by my understanding, shouldn't be used by anything. And
>>>> __ex_table and rodata are fixed by the kernel's binary so both symbols
>>>> should be the same in both the image kernel and the boot kernel given
>>>> that both are running from the same binary.
>>>
>>> Well, what if the kernel is relocated?
>>
>> Ah, I'm sure I don't fully grasp the implications of that but I would
>> assume that if the image kernel were located somewhere else it would
>> still be far away from the boot kernel's ex_table/rodata boundary.
>
> Probably.
>
>>>> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though,
>>>> when it's in place, it only works some of the time. Given that commit is
>>>> only extending the NX region a bit, if there is some random mismatch,
>>>> why does it never reach rodata? In other words, why is rodata a magic
>>>> line that seems to work all the time -- why doesn't this random mismatch
>>>> ever extend into the rodata region? rodata isn't _that_ far away from
>>>> the end of ex_table.
>>>
>>> That's a very good question. :-)
>>
>> Yeah, I guess if we knew the answer we'd understand what was going on
>> and have a fix.
>
> Right.
>
> Appended is one more patch to try.
>
> It actually fixes a theoretical problem, so I'll need to add comments to it
> (as it is far from obvious IMO) and a changelog etc and post it as a proper
> submission.
>
> So the concern is that the page copying done in the loop in core_restore_code()
> may corrupt the kernel text part of the temporary memory mapping used at that
> time, because that's the original kernel text mapping from the boot kernel.
>
> That doesn't matter for the loop itself (as that code runs from a "safe" page
> guaranteed not to be overwritten), but it (quite theoretically) may matter for
> the final jump to the image kernel's restore_registers().  [I realized that
> it might be a problem only after I had started to think about the problem
> you reported.]
>
> As a bonus, the patch also eliminates the possible concern about the
> executability of the memory mapped via the kernel text mapping in the boot
> kernel, so IMO it's worth to give it a shot.  I've tested it lightly on
> one machine, but I guess it would just crash right away if there were
> any problems in it.
>
> Thanks,
> Rafael
>
>
> ---
>   arch/x86/power/hibernate_64.c     |   46 ++++++++++++++++++++++++++++++++++----
>   arch/x86/power/hibernate_asm_64.S |   28 +++++++++++++----------
>   2 files changed, 58 insertions(+), 16 deletions(-)
>
> Index: linux-pm/arch/x86/power/hibernate_64.c
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_64.c
> +++ linux-pm/arch/x86/power/hibernate_64.c
> @@ -27,7 +27,8 @@ extern asmlinkage __visible int restore_
>    * Address to jump to in the last phase of restore in order to get to the image
>    * kernel's text (this value is passed in the image header).
>    */
> -unsigned long restore_jump_address __visible;
> +void *restore_jump_address __visible;
> +unsigned long jump_address_phys;
>
>   /*
>    * Value of the cr3 register from before the hibernation (this value is passed
> @@ -37,6 +38,9 @@ unsigned long restore_cr3 __visible;
>
>   pgd_t *temp_level4_pgt __visible;
>
> +void *restore_pgd_addr __visible;
> +pgd_t restore_pgd __visible;
> +
>   void *relocated_restore_code __visible;
>
>   static void *alloc_pgt_page(void *context)
> @@ -44,6 +48,33 @@ static void *alloc_pgt_page(void *contex
>   	return (void *)get_safe_page(GFP_ATOMIC);
>   }
>
> +static int prepare_temporary_text_mapping(void)
> +{
> +	unsigned long vaddr = (unsigned long)restore_jump_address;
> +	unsigned long paddr = jump_address_phys & PMD_MASK;
> +	pmd_t *pmd;
> +	pud_t *pud;
> +
> +	pud = (pud_t *)get_safe_page(GFP_ATOMIC);
> +	if (!pud)
> +		return -ENOMEM;
> +
> +	restore_pgd = __pgd(__pa(pud) | _KERNPG_TABLE);
> +
> +	pud += pud_index(vaddr);
> +	pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
> +	if (!pmd)
> +		return -ENOMEM;
> +
> +	set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> +
> +	pmd += pmd_index(vaddr);
> +	set_pmd(pmd, __pmd(paddr | __PAGE_KERNEL_LARGE_EXEC));
> +
> +	restore_pgd_addr = temp_level4_pgt + pgd_index(vaddr);
> +	return 0;
> +}
> +
>   static int set_up_temporary_mappings(void)
>   {
>   	struct x86_mapping_info info = {
> @@ -63,6 +94,10 @@ static int set_up_temporary_mappings(voi
>   	set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
>   		init_level4_pgt[pgd_index(__START_KERNEL_map)]);
>
> +	result = prepare_temporary_text_mapping();
> +	if (result)
> +		return result;
> +
>   	/* Set up the direct mapping from scratch */
>   	for (i = 0; i < nr_pfn_mapped; i++) {
>   		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> @@ -108,12 +143,13 @@ int pfn_is_nosave(unsigned long pfn)
>   }
>
>   struct restore_data_record {
> -	unsigned long jump_address;
> +	void *jump_address;
> +	unsigned long jump_address_phys;
>   	unsigned long cr3;
>   	unsigned long magic;
>   };
>
> -#define RESTORE_MAGIC	0x0123456789ABCDEFUL
> +#define RESTORE_MAGIC	0x123456789ABCDEF0UL
>
>   /**
>    *	arch_hibernation_header_save - populate the architecture specific part
> @@ -126,7 +162,8 @@ int arch_hibernation_header_save(void *a
>
>   	if (max_size < sizeof(struct restore_data_record))
>   		return -EOVERFLOW;
> -	rdr->jump_address = restore_jump_address;
> +	rdr->jump_address = &restore_registers;
> +	rdr->jump_address_phys = __pa_symbol(&restore_registers);
>   	rdr->cr3 = restore_cr3;
>   	rdr->magic = RESTORE_MAGIC;
>   	return 0;
> @@ -142,6 +179,7 @@ int arch_hibernation_header_restore(void
>   	struct restore_data_record *rdr = addr;
>
>   	restore_jump_address = rdr->jump_address;
> +	jump_address_phys = rdr->jump_address_phys;
>   	restore_cr3 = rdr->cr3;
>   	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
>   }
> Index: linux-pm/arch/x86/power/hibernate_asm_64.S
> ===================================================================
> --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S
> +++ linux-pm/arch/x86/power/hibernate_asm_64.S
> @@ -72,8 +72,10 @@ ENTRY(restore_image)
>   	movq	%rax, %cr4;  # turn PGE back on
>
>   	/* prepare to jump to the image kernel */
> -	movq	restore_jump_address(%rip), %rax
>   	movq	restore_cr3(%rip), %rbx
> +	movq	restore_jump_address(%rip), %r10
> +	movq	restore_pgd(%rip), %r8
> +	movq	restore_pgd_addr(%rip), %r9
>
>   	/* prepare to copy image data to their original locations */
>   	movq	restore_pblist(%rip), %rdx
> @@ -96,20 +98,22 @@ ENTRY(core_restore_code)
>   	/* progress to the next pbe */
>   	movq	pbe_next(%rdx), %rdx
>   	jmp	.Lloop
> +
>   .Ldone:
> +	/* switch over to the temporary kernel text mapping */
> +	movq	%r8, (%r9)
> +	/* flush TLB */
> +	movq	%rax, %rdx
> +	andq	$~(X86_CR4_PGE), %rdx
> +	movq	%rdx, %cr4;  # turn off PGE
> +	movq	%cr3, %rcx;  # flush TLB
> +	movq	%rcx, %cr3;
> +	movq	%rax, %cr4;  # turn PGE back on
>   	/* jump to the restore_registers address from the image header */
> -	jmpq	*%rax
> -	/*
> -	 * NOTE: This assumes that the boot kernel's text mapping covers the
> -	 * image kernel's page containing restore_registers and the address of
> -	 * this page is the same as in the image kernel's text mapping (it
> -	 * should always be true, because the text mapping is linear, starting
> -	 * from 0, and is supposed to cover the entire kernel text for every
> -	 * kernel).
> -	 *
> -	 * code below belongs to the image kernel
> -	 */
> +	jmpq	*%r10
>
> +	 /* code below belongs to the image kernel */
> +	.align PAGE_SIZE
>   ENTRY(restore_registers)
>   	FRAME_BEGIN
>   	/* go back to the original page tables */
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-06-12 16:11                                         ` Logan Gunthorpe
@ 2016-06-13 13:43                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-13 13:43 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Sunday, June 12, 2016 10:11:38 AM Logan Gunthorpe wrote:
> Hey Rafael,
> 
> Awesome, this patch fixes the problem! Nice work.

Great, thanks for testing!

I've just sent an "official" version of it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PROBLEM: Resume form hibernate broken by setting NX on gap
  2016-05-20 21:59         ` Kees Cook
  2016-05-20 22:16           ` Kees Cook
@ 2016-06-10 22:11           ` Rafael J. Wysocki
  1 sibling, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2016-06-10 22:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Logan Gunthorpe,
	Ingo Molnar, the arch/x86 maintainers, linux-pm,
	Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Friday, May 20, 2016 02:59:30 PM Kees Cook wrote:
> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
> >>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >>>>
> >>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I have been working on a bug that causes my laptop to freeze during
> >>>>> resume from hibernation. I did a bisect to find the offending commit:
> >>>>>
> >>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
> >>>>>
> >>>>> There is more information in the bugzilla report [1] that
> >>>>> I've been working on but I will summarize things below.
> >>>>>
> >>>>> I've experienced intermittent but reproducible freezes when resuming
> >>>>> from hibernation since about kernel version 3.19. The freeze was
> >>>>> significantly more reproducible when a few applications were loaded
> >>>>> before hibernation and would largely not happen if hibernated
> >>>>> immediately after booting to a desktop. I did some tracing work to find
> >>>>> that the kernel gets as far as the resume_image call in
> >>>>> swsusp_arch_resume and I could not find any response from the image
> >>>>> kernel when I hit the bug. I also did testing that seemed to rule out
> >>>>> this being caused by a problematic driver.
> >>>>>
> >>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
> >>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
> >>>>> Then, I did a second bisect with a ported version of the fix to the
> >>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
> >>>>> with what appears to be the exact same symptoms. Reverting that commit
> >>>>> in recent kernels up to and including 4.6 fixes the issue and restores
> >>>>> reliable hibernation. However, it's not at all clear to me why that
> >>>>> commit would cause this issue or how to fix the issue without reverting.
> >>>>
> >>>> I've attached that commit below and also Cc:-ed a few more people who might have
> >>>> an idea about why this regressed. Worst-case we'll have to revert it.
> >>>
> >>> Without looking deep into mm, my theory would be that after this patch
> >>> the final jump from the boot kernel to the image kernel's trampoline
> >>> code during resume may crash the kernel if the trampoline page turns
> >>> out to be NX in the boot kernel (it has to be executable in both the
> >>> boot and the image kernels).
> >>
> >> So, pardon my ignorance, but where is this trampoline page placed in
> >> kernel memory?
> >
> > On 32-bit its location has to be the same in both the boot and the
> > image kernels and that's within kernel text in both cases, so that
> > shouldn't be a problem.
> >
> > On 64-bit its location depends on the image kernel and specifically on
> > the location of the restore_registers routine in it.  The (virtual)
> > address of that routine is stored in the restore_jump_address
> > variable, so the page containing it (the trampoline page) can be found
> > with the help of that.
> >
> > swsusp_arch_resume() sets up a temporary kernel mapping to finalize
> > the image restoration and that page must not be NX in that mapping for
> > things to work.
> 
> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
> -> get_image_page() path sets the page executable...
> 
> Untested, but I wonder if this work work in swsusp_arch_resume()
> before the memcpy?
> 
> (apologies for any gmail-based whitespace mangling...)
> 
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 009947d419a6..c2f3ecc45bd4 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -12,6 +12,7 @@
>  #include <linux/smp.h>
>  #include <linux/suspend.h>
> 
> +#include <asm/cacheflush.h>
>  #include <asm/init.h>
>  #include <asm/proto.h>
>  #include <asm/page.h>
> @@ -89,6 +90,7 @@ int swsusp_arch_resume(void)
>         relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC);
>         if (!relocated_restore_code)
>                 return -ENOMEM;
> +       set_memory_x((unsigned long)relocated_restore_code, 1);
>         memcpy(relocated_restore_code, &core_restore_code,
>                &restore_registers - &core_restore_code);
> 
> 

We may not be on the right track with this I'm afraid.

When the temporary kernel mapping is set up in swsusp_arch_resume(), it passes
__PAGE_KERNEL_LARGE_EXEC as pmd_flag in the x86_mapping_info, so all memory
should be executable after we switch to that which happens right at the
beginning of restore_image().

So restore_image() and the code it jumps to should be fine from the
executability perspective (including the final jump to restore_jump_address).

Or am I missingy anything?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-06-13 13:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <573DF82D.50006@deltatee.com>
2016-05-20  7:15 ` PROBLEM: Resume form hibernate broken by setting NX on gap Ingo Molnar
2016-05-20 11:34   ` Rafael J. Wysocki
2016-05-20 13:56     ` Stephen Smalley
2016-05-20 21:46       ` Rafael J. Wysocki
2016-05-20 21:59         ` Kees Cook
2016-05-20 22:16           ` Kees Cook
     [not found]             ` <573FC081.20006@deltatee.com>
2016-05-21 16:39               ` Kees Cook
     [not found]                 ` <575A3E95.5090100@deltatee.com>
2016-06-10 18:09                   ` Kees Cook
2016-06-10 18:16                     ` Logan Gunthorpe
2016-06-10 18:18                       ` Kees Cook
2016-06-10 21:27                     ` Rafael J. Wysocki
2016-06-10 22:29                       ` Rafael J. Wysocki
2016-06-10 22:28                         ` Logan Gunthorpe
2016-06-10 22:33                           ` Rafael J. Wysocki
2016-06-11  0:13                             ` Rafael J. Wysocki
2016-06-11  1:47                               ` Rafael J. Wysocki
2016-06-11 11:48                                 ` Rafael J. Wysocki
2016-06-11 16:35                                   ` Logan Gunthorpe
2016-06-11 17:39                                 ` Logan Gunthorpe
2016-06-12  1:05                                   ` Rafael J. Wysocki
2016-06-12  4:48                                     ` Logan Gunthorpe
2016-06-12 14:31                                       ` Rafael J. Wysocki
2016-06-12 16:11                                         ` Logan Gunthorpe
2016-06-13 13:43                                           ` Rafael J. Wysocki
2016-06-10 22:11           ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.