* Re: PROBLEM: Resume form hibernate broken by setting NX on gap [not found] <573DF82D.50006@deltatee.com> @ 2016-05-20 7:15 ` Ingo Molnar 2016-05-20 11:34 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2016-05-20 7:15 UTC (permalink / raw) To: Logan Gunthorpe Cc: Stephen Smalley, Kees Cook, Ingo Molnar, x86, linux-pm, linux-kernel, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst * Logan Gunthorpe <logang@deltatee.com> wrote: > Hi, > > I have been working on a bug that causes my laptop to freeze during > resume from hibernation. I did a bisect to find the offending commit: > > [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata > > There is more information in the bugzilla report [1] that > I've been working on but I will summarize things below. > > I've experienced intermittent but reproducible freezes when resuming > from hibernation since about kernel version 3.19. The freeze was > significantly more reproducible when a few applications were loaded > before hibernation and would largely not happen if hibernated > immediately after booting to a desktop. I did some tracing work to find > that the kernel gets as far as the resume_image call in > swsusp_arch_resume and I could not find any response from the image > kernel when I hit the bug. I also did testing that seemed to rule out > this being caused by a problematic driver. > > I did a successful bisect between 3.18 and 3.19 which found a bug in > commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. > Then, I did a second bisect with a ported version of the fix to the > first bug and found commit ab76f7b4ab in 4.3 to also break hibernation > with what appears to be the exact same symptoms. Reverting that commit > in recent kernels up to and including 4.6 fixes the issue and restores > reliable hibernation. However, it's not at all clear to me why that > commit would cause this issue or how to fix the issue without reverting. I've attached that commit below and also Cc:-ed a few more people who might have an idea about why this regressed. Worst-case we'll have to revert it. Thanks, Ingo =================> >From ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 Mon Sep 17 00:00:00 2001 From: Stephen Smalley <sds@tycho.nsa.gov> Date: Thu, 1 Oct 2015 09:04:22 -0400 Subject: [PATCH] x86/mm: Set NX on gap between __ex_table and rodata Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org> --- arch/x86/mm/init_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 30564e2752d3..df48430c279b 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1132,7 +1132,7 @@ void mark_rodata_ro(void) * has been zapped already via cleanup_highmem(). */ all_end = roundup((unsigned long)_brk_end, PMD_SIZE); - set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT); + set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT); rodata_test(); ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 7:15 ` PROBLEM: Resume form hibernate broken by setting NX on gap Ingo Molnar @ 2016-05-20 11:34 ` Rafael J. Wysocki 2016-05-20 13:56 ` Stephen Smalley 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-05-20 11:34 UTC (permalink / raw) To: Ingo Molnar Cc: Logan Gunthorpe, Stephen Smalley, Kees Cook, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: > > * Logan Gunthorpe <logang@deltatee.com> wrote: > >> Hi, >> >> I have been working on a bug that causes my laptop to freeze during >> resume from hibernation. I did a bisect to find the offending commit: >> >> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >> >> There is more information in the bugzilla report [1] that >> I've been working on but I will summarize things below. >> >> I've experienced intermittent but reproducible freezes when resuming >> from hibernation since about kernel version 3.19. The freeze was >> significantly more reproducible when a few applications were loaded >> before hibernation and would largely not happen if hibernated >> immediately after booting to a desktop. I did some tracing work to find >> that the kernel gets as far as the resume_image call in >> swsusp_arch_resume and I could not find any response from the image >> kernel when I hit the bug. I also did testing that seemed to rule out >> this being caused by a problematic driver. >> >> I did a successful bisect between 3.18 and 3.19 which found a bug in >> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >> Then, I did a second bisect with a ported version of the fix to the >> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >> with what appears to be the exact same symptoms. Reverting that commit >> in recent kernels up to and including 4.6 fixes the issue and restores >> reliable hibernation. However, it's not at all clear to me why that >> commit would cause this issue or how to fix the issue without reverting. > > I've attached that commit below and also Cc:-ed a few more people who might have > an idea about why this regressed. Worst-case we'll have to revert it. Without looking deep into mm, my theory would be that after this patch the final jump from the boot kernel to the image kernel's trampoline code during resume may crash the kernel if the trampoline page turns out to be NX in the boot kernel (it has to be executable in both the boot and the image kernels). Thanks, Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 11:34 ` Rafael J. Wysocki @ 2016-05-20 13:56 ` Stephen Smalley 2016-05-20 21:46 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Stephen Smalley @ 2016-05-20 13:56 UTC (permalink / raw) To: Rafael J. Wysocki, Ingo Molnar Cc: Logan Gunthorpe, Kees Cook, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: > On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >> >> * Logan Gunthorpe <logang@deltatee.com> wrote: >> >>> Hi, >>> >>> I have been working on a bug that causes my laptop to freeze during >>> resume from hibernation. I did a bisect to find the offending commit: >>> >>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>> >>> There is more information in the bugzilla report [1] that >>> I've been working on but I will summarize things below. >>> >>> I've experienced intermittent but reproducible freezes when resuming >>> from hibernation since about kernel version 3.19. The freeze was >>> significantly more reproducible when a few applications were loaded >>> before hibernation and would largely not happen if hibernated >>> immediately after booting to a desktop. I did some tracing work to find >>> that the kernel gets as far as the resume_image call in >>> swsusp_arch_resume and I could not find any response from the image >>> kernel when I hit the bug. I also did testing that seemed to rule out >>> this being caused by a problematic driver. >>> >>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >>> Then, I did a second bisect with a ported version of the fix to the >>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >>> with what appears to be the exact same symptoms. Reverting that commit >>> in recent kernels up to and including 4.6 fixes the issue and restores >>> reliable hibernation. However, it's not at all clear to me why that >>> commit would cause this issue or how to fix the issue without reverting. >> >> I've attached that commit below and also Cc:-ed a few more people who might have >> an idea about why this regressed. Worst-case we'll have to revert it. > > Without looking deep into mm, my theory would be that after this patch > the final jump from the boot kernel to the image kernel's trampoline > code during resume may crash the kernel if the trampoline page turns > out to be NX in the boot kernel (it has to be executable in both the > boot and the image kernels). So, pardon my ignorance, but where is this trampoline page placed in kernel memory? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 13:56 ` Stephen Smalley @ 2016-05-20 21:46 ` Rafael J. Wysocki 2016-05-20 21:59 ` Kees Cook 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-05-20 21:46 UTC (permalink / raw) To: Stephen Smalley Cc: Rafael J. Wysocki, Ingo Molnar, Logan Gunthorpe, Kees Cook, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: > On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>> >>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>> >>>> Hi, >>>> >>>> I have been working on a bug that causes my laptop to freeze during >>>> resume from hibernation. I did a bisect to find the offending commit: >>>> >>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>> >>>> There is more information in the bugzilla report [1] that >>>> I've been working on but I will summarize things below. >>>> >>>> I've experienced intermittent but reproducible freezes when resuming >>>> from hibernation since about kernel version 3.19. The freeze was >>>> significantly more reproducible when a few applications were loaded >>>> before hibernation and would largely not happen if hibernated >>>> immediately after booting to a desktop. I did some tracing work to find >>>> that the kernel gets as far as the resume_image call in >>>> swsusp_arch_resume and I could not find any response from the image >>>> kernel when I hit the bug. I also did testing that seemed to rule out >>>> this being caused by a problematic driver. >>>> >>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >>>> Then, I did a second bisect with a ported version of the fix to the >>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >>>> with what appears to be the exact same symptoms. Reverting that commit >>>> in recent kernels up to and including 4.6 fixes the issue and restores >>>> reliable hibernation. However, it's not at all clear to me why that >>>> commit would cause this issue or how to fix the issue without reverting. >>> >>> I've attached that commit below and also Cc:-ed a few more people who might have >>> an idea about why this regressed. Worst-case we'll have to revert it. >> >> Without looking deep into mm, my theory would be that after this patch >> the final jump from the boot kernel to the image kernel's trampoline >> code during resume may crash the kernel if the trampoline page turns >> out to be NX in the boot kernel (it has to be executable in both the >> boot and the image kernels). > > So, pardon my ignorance, but where is this trampoline page placed in > kernel memory? On 32-bit its location has to be the same in both the boot and the image kernels and that's within kernel text in both cases, so that shouldn't be a problem. On 64-bit its location depends on the image kernel and specifically on the location of the restore_registers routine in it. The (virtual) address of that routine is stored in the restore_jump_address variable, so the page containing it (the trampoline page) can be found with the help of that. swsusp_arch_resume() sets up a temporary kernel mapping to finalize the image restoration and that page must not be NX in that mapping for things to work. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 21:46 ` Rafael J. Wysocki @ 2016-05-20 21:59 ` Kees Cook 2016-05-20 22:16 ` Kees Cook 2016-06-10 22:11 ` Rafael J. Wysocki 0 siblings, 2 replies; 25+ messages in thread From: Kees Cook @ 2016-05-20 21:59 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Stephen Smalley, Ingo Molnar, Logan Gunthorpe, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: >> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>>> >>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have been working on a bug that causes my laptop to freeze during >>>>> resume from hibernation. I did a bisect to find the offending commit: >>>>> >>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>> >>>>> There is more information in the bugzilla report [1] that >>>>> I've been working on but I will summarize things below. >>>>> >>>>> I've experienced intermittent but reproducible freezes when resuming >>>>> from hibernation since about kernel version 3.19. The freeze was >>>>> significantly more reproducible when a few applications were loaded >>>>> before hibernation and would largely not happen if hibernated >>>>> immediately after booting to a desktop. I did some tracing work to find >>>>> that the kernel gets as far as the resume_image call in >>>>> swsusp_arch_resume and I could not find any response from the image >>>>> kernel when I hit the bug. I also did testing that seemed to rule out >>>>> this being caused by a problematic driver. >>>>> >>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >>>>> Then, I did a second bisect with a ported version of the fix to the >>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >>>>> with what appears to be the exact same symptoms. Reverting that commit >>>>> in recent kernels up to and including 4.6 fixes the issue and restores >>>>> reliable hibernation. However, it's not at all clear to me why that >>>>> commit would cause this issue or how to fix the issue without reverting. >>>> >>>> I've attached that commit below and also Cc:-ed a few more people who might have >>>> an idea about why this regressed. Worst-case we'll have to revert it. >>> >>> Without looking deep into mm, my theory would be that after this patch >>> the final jump from the boot kernel to the image kernel's trampoline >>> code during resume may crash the kernel if the trampoline page turns >>> out to be NX in the boot kernel (it has to be executable in both the >>> boot and the image kernels). >> >> So, pardon my ignorance, but where is this trampoline page placed in >> kernel memory? > > On 32-bit its location has to be the same in both the boot and the > image kernels and that's within kernel text in both cases, so that > shouldn't be a problem. > > On 64-bit its location depends on the image kernel and specifically on > the location of the restore_registers routine in it. The (virtual) > address of that routine is stored in the restore_jump_address > variable, so the page containing it (the trampoline page) can be found > with the help of that. > > swsusp_arch_resume() sets up a temporary kernel mapping to finalize > the image restoration and that page must not be NX in that mapping for > things to work. It looks like nothing in the swsusp_arch_resume() -> get_safe_page() -> get_image_page() path sets the page executable... Untested, but I wonder if this work work in swsusp_arch_resume() before the memcpy? (apologies for any gmail-based whitespace mangling...) diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c index 009947d419a6..c2f3ecc45bd4 100644 --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -12,6 +12,7 @@ #include <linux/smp.h> #include <linux/suspend.h> +#include <asm/cacheflush.h> #include <asm/init.h> #include <asm/proto.h> #include <asm/page.h> @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); if (!relocated_restore_code) return -ENOMEM; + set_memory_x((unsigned long)relocated_restore_code, 1); memcpy(relocated_restore_code, &core_restore_code, &restore_registers - &core_restore_code); -Kees -- Kees Cook Chrome OS & Brillo Security ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 21:59 ` Kees Cook @ 2016-05-20 22:16 ` Kees Cook [not found] ` <573FC081.20006@deltatee.com> 2016-06-10 22:11 ` Rafael J. Wysocki 1 sibling, 1 reply; 25+ messages in thread From: Kees Cook @ 2016-05-20 22:16 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Stephen Smalley, Ingo Molnar, Logan Gunthorpe, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: >> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: >>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>>>> >>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have been working on a bug that causes my laptop to freeze during >>>>>> resume from hibernation. I did a bisect to find the offending commit: >>>>>> >>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>> >>>>>> There is more information in the bugzilla report [1] that >>>>>> I've been working on but I will summarize things below. >>>>>> >>>>>> I've experienced intermittent but reproducible freezes when resuming >>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>> significantly more reproducible when a few applications were loaded >>>>>> before hibernation and would largely not happen if hibernated >>>>>> immediately after booting to a desktop. I did some tracing work to find >>>>>> that the kernel gets as far as the resume_image call in >>>>>> swsusp_arch_resume and I could not find any response from the image >>>>>> kernel when I hit the bug. I also did testing that seemed to rule out >>>>>> this being caused by a problematic driver. >>>>>> >>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >>>>>> Then, I did a second bisect with a ported version of the fix to the >>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >>>>>> with what appears to be the exact same symptoms. Reverting that commit >>>>>> in recent kernels up to and including 4.6 fixes the issue and restores >>>>>> reliable hibernation. However, it's not at all clear to me why that >>>>>> commit would cause this issue or how to fix the issue without reverting. >>>>> >>>>> I've attached that commit below and also Cc:-ed a few more people who might have >>>>> an idea about why this regressed. Worst-case we'll have to revert it. >>>> >>>> Without looking deep into mm, my theory would be that after this patch >>>> the final jump from the boot kernel to the image kernel's trampoline >>>> code during resume may crash the kernel if the trampoline page turns >>>> out to be NX in the boot kernel (it has to be executable in both the >>>> boot and the image kernels). >>> >>> So, pardon my ignorance, but where is this trampoline page placed in >>> kernel memory? >> >> On 32-bit its location has to be the same in both the boot and the >> image kernels and that's within kernel text in both cases, so that >> shouldn't be a problem. >> >> On 64-bit its location depends on the image kernel and specifically on >> the location of the restore_registers routine in it. The (virtual) >> address of that routine is stored in the restore_jump_address >> variable, so the page containing it (the trampoline page) can be found >> with the help of that. >> >> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >> the image restoration and that page must not be NX in that mapping for >> things to work. > > It looks like nothing in the swsusp_arch_resume() -> get_safe_page() > -> get_image_page() path sets the page executable... > > Untested, but I wonder if this work work in swsusp_arch_resume() > before the memcpy? I can't type today, it seems. It should read "... if this would work ..." If you can test this and it works for you, I'll send a proper patch... :P -Kees > > (apologies for any gmail-based whitespace mangling...) > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d419a6..c2f3ecc45bd4 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -12,6 +12,7 @@ > #include <linux/smp.h> > #include <linux/suspend.h> > > +#include <asm/cacheflush.h> > #include <asm/init.h> > #include <asm/proto.h> > #include <asm/page.h> > @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) > relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); > if (!relocated_restore_code) > return -ENOMEM; > + set_memory_x((unsigned long)relocated_restore_code, 1); > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > > -Kees > > -- > Kees Cook > Chrome OS & Brillo Security -- Kees Cook Chrome OS & Brillo Security ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <573FC081.20006@deltatee.com>]
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap [not found] ` <573FC081.20006@deltatee.com> @ 2016-05-21 16:39 ` Kees Cook [not found] ` <575A3E95.5090100@deltatee.com> 0 siblings, 1 reply; 25+ messages in thread From: Kees Cook @ 2016-05-21 16:39 UTC (permalink / raw) To: Logan Gunthorpe Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > On 20/05/16 04:16 PM, Kees Cook wrote: >> >> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >>> >>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> >>> wrote: >>>> >>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> >>>> wrote: >>>>> >>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>>>> >>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>>>>>> >>>>>>> >>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have been working on a bug that causes my laptop to freeze during >>>>>>>> resume from hibernation. I did a bisect to find the offending >>>>>>>> commit: >>>>>>>> >>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>>>> >>>>>>>> There is more information in the bugzilla report [1] that >>>>>>>> I've been working on but I will summarize things below. >>>>>>>> >>>>>>>> I've experienced intermittent but reproducible freezes when resuming >>>>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>>>> significantly more reproducible when a few applications were loaded >>>>>>>> before hibernation and would largely not happen if hibernated >>>>>>>> immediately after booting to a desktop. I did some tracing work to >>>>>>>> find >>>>>>>> that the kernel gets as far as the resume_image call in >>>>>>>> swsusp_arch_resume and I could not find any response from the image >>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule >>>>>>>> out >>>>>>>> this being caused by a problematic driver. >>>>>>>> >>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in >>>>>>>> 4.4. >>>>>>>> Then, I did a second bisect with a ported version of the fix to the >>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break >>>>>>>> hibernation >>>>>>>> with what appears to be the exact same symptoms. Reverting that >>>>>>>> commit >>>>>>>> in recent kernels up to and including 4.6 fixes the issue and >>>>>>>> restores >>>>>>>> reliable hibernation. However, it's not at all clear to me why that >>>>>>>> commit would cause this issue or how to fix the issue without >>>>>>>> reverting. >>>>>>> >>>>>>> >>>>>>> I've attached that commit below and also Cc:-ed a few more people who >>>>>>> might have >>>>>>> an idea about why this regressed. Worst-case we'll have to revert it. >>>>>> >>>>>> >>>>>> Without looking deep into mm, my theory would be that after this patch >>>>>> the final jump from the boot kernel to the image kernel's trampoline >>>>>> code during resume may crash the kernel if the trampoline page turns >>>>>> out to be NX in the boot kernel (it has to be executable in both the >>>>>> boot and the image kernels). >>>>> >>>>> >>>>> So, pardon my ignorance, but where is this trampoline page placed in >>>>> kernel memory? >>>> >>>> >>>> On 32-bit its location has to be the same in both the boot and the >>>> image kernels and that's within kernel text in both cases, so that >>>> shouldn't be a problem. >>>> >>>> On 64-bit its location depends on the image kernel and specifically on >>>> the location of the restore_registers routine in it. The (virtual) >>>> address of that routine is stored in the restore_jump_address >>>> variable, so the page containing it (the trampoline page) can be found >>>> with the help of that. >>>> >>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >>>> the image restoration and that page must not be NX in that mapping for >>>> things to work. >>> >>> >>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page() >>> -> get_image_page() path sets the page executable... >>> >>> Untested, but I wonder if this work work in swsusp_arch_resume() >>> before the memcpy? >> >> >> I can't type today, it seems. It should read "... if this would work ..." >> >> If you can test this and it works for you, I'll send a proper patch... :P >> >> -Kees >> > > Hi Kees, > > Thanks. I tried the patch but it only resulted in a kernel warning and > freeze. I've attached a photo showing as much of the messages as I could > get. > > Logan Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one out. -Kees -- Kees Cook Chrome OS & Brillo Security ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <575A3E95.5090100@deltatee.com>]
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap [not found] ` <575A3E95.5090100@deltatee.com> @ 2016-06-10 18:09 ` Kees Cook 2016-06-10 18:16 ` Logan Gunthorpe 2016-06-10 21:27 ` Rafael J. Wysocki 0 siblings, 2 replies; 25+ messages in thread From: Kees Cook @ 2016-06-10 18:09 UTC (permalink / raw) To: Logan Gunthorpe Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hey, > > I've still be trying to figure this out as I have time. > > I tried printing a couple restore addresses and nothing I can find seems > anywhere near the rodata/ex_table boundary. > > I tried with the (badly formatted) below and got the following. Nothing too > surprising. I've attached a kallsyms that matches the kernel for reference. > > restore_code: ffff880157c3b000 > jump_addr: ffffffff81446be0 > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d..6efedb7 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > + pr_info("restore_code: %p\n", relocated_restore_code); > + pr_info("jump_addr: %lx\n", restore_jump_address); > + Also interesting would be the "relocated_restore_code" address, as well as a dump of /sys/kernel/debug/kernel_page_tables (from CONFIG_X86_PTDUMP). I'm baffled by the problem, but the best I can understand is the the relocated_restore_code range isn't executable (which should be visible from finding it in /sys/kernel/debug/kernel_page_tables), but I don't see how to solve that since my original patch didn't work. Rafael, is this something you have time to look at quickly? -Kees > restore_image(); > return 0; > } > > > Thanks, > > Logan > > > > On 21/05/16 10:39 AM, Kees Cook wrote: >> >> On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com> >> wrote: >>> >>> On 20/05/16 04:16 PM, Kees Cook wrote: >>>> >>>> >>>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> >>>> wrote: >>>>> >>>>> >>>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> >>>>> wrote: >>>>>> >>>>>> >>>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have been working on a bug that causes my laptop to freeze >>>>>>>>>> during >>>>>>>>>> resume from hibernation. I did a bisect to find the offending >>>>>>>>>> commit: >>>>>>>>>> >>>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>>>>>> >>>>>>>>>> There is more information in the bugzilla report [1] that >>>>>>>>>> I've been working on but I will summarize things below. >>>>>>>>>> >>>>>>>>>> I've experienced intermittent but reproducible freezes when >>>>>>>>>> resuming >>>>>>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>>>>>> significantly more reproducible when a few applications were >>>>>>>>>> loaded >>>>>>>>>> before hibernation and would largely not happen if hibernated >>>>>>>>>> immediately after booting to a desktop. I did some tracing work to >>>>>>>>>> find >>>>>>>>>> that the kernel gets as far as the resume_image call in >>>>>>>>>> swsusp_arch_resume and I could not find any response from the >>>>>>>>>> image >>>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule >>>>>>>>>> out >>>>>>>>>> this being caused by a problematic driver. >>>>>>>>>> >>>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug >>>>>>>>>> in >>>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in >>>>>>>>>> 4.4. >>>>>>>>>> Then, I did a second bisect with a ported version of the fix to >>>>>>>>>> the >>>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break >>>>>>>>>> hibernation >>>>>>>>>> with what appears to be the exact same symptoms. Reverting that >>>>>>>>>> commit >>>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and >>>>>>>>>> restores >>>>>>>>>> reliable hibernation. However, it's not at all clear to me why >>>>>>>>>> that >>>>>>>>>> commit would cause this issue or how to fix the issue without >>>>>>>>>> reverting. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I've attached that commit below and also Cc:-ed a few more people >>>>>>>>> who >>>>>>>>> might have >>>>>>>>> an idea about why this regressed. Worst-case we'll have to revert >>>>>>>>> it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Without looking deep into mm, my theory would be that after this >>>>>>>> patch >>>>>>>> the final jump from the boot kernel to the image kernel's trampoline >>>>>>>> code during resume may crash the kernel if the trampoline page turns >>>>>>>> out to be NX in the boot kernel (it has to be executable in both the >>>>>>>> boot and the image kernels). >>>>>>> >>>>>>> >>>>>>> >>>>>>> So, pardon my ignorance, but where is this trampoline page placed in >>>>>>> kernel memory? >>>>>> >>>>>> >>>>>> >>>>>> On 32-bit its location has to be the same in both the boot and the >>>>>> image kernels and that's within kernel text in both cases, so that >>>>>> shouldn't be a problem. >>>>>> >>>>>> On 64-bit its location depends on the image kernel and specifically on >>>>>> the location of the restore_registers routine in it. The (virtual) >>>>>> address of that routine is stored in the restore_jump_address >>>>>> variable, so the page containing it (the trampoline page) can be found >>>>>> with the help of that. >>>>>> >>>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >>>>>> the image restoration and that page must not be NX in that mapping for >>>>>> things to work. >>>>> >>>>> >>>>> >>>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page() >>>>> -> get_image_page() path sets the page executable... >>>>> >>>>> Untested, but I wonder if this work work in swsusp_arch_resume() >>>>> before the memcpy? >>>> >>>> >>>> >>>> I can't type today, it seems. It should read "... if this would work >>>> ..." >>>> >>>> If you can test this and it works for you, I'll send a proper patch... >>>> :P >>>> >>>> -Kees >>>> >>> >>> Hi Kees, >>> >>> Thanks. I tried the patch but it only resulted in a kernel warning and >>> freeze. I've attached a photo showing as much of the messages as I could >>> get. >>> >>> Logan >> >> >> Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one >> out. >> >> -Kees >> > -- Kees Cook Chrome OS & Brillo Security ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 18:09 ` Kees Cook @ 2016-06-10 18:16 ` Logan Gunthorpe 2016-06-10 18:18 ` Kees Cook 2016-06-10 21:27 ` Rafael J. Wysocki 1 sibling, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-10 18:16 UTC (permalink / raw) To: Kees Cook Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst Hey, On 10/06/16 12:09 PM, Kees Cook wrote: >> restore_code: ffff880157c3b000 >> jump_addr: ffffffff81446be0 >> >> >> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c >> index 009947d..6efedb7 100644 >> --- a/arch/x86/power/hibernate_64.c >> +++ b/arch/x86/power/hibernate_64.c >> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) >> memcpy(relocated_restore_code, &core_restore_code, >> &restore_registers - &core_restore_code); >> >> + pr_info("restore_code: %p\n", relocated_restore_code); >> + pr_info("jump_addr: %lx\n", restore_jump_address); >> + > > Also interesting would be the "relocated_restore_code" address, as > well as a dump of /sys/kernel/debug/kernel_page_tables (from > CONFIG_X86_PTDUMP). Is that not what I printed? If not, can you give me a better hint as to what you're looking for so I can spin another kernel? I'll also provide the kernel_page_tables once I do that. > I'm baffled by the problem, but the best I can understand is the the > relocated_restore_code range isn't executable (which should be visible > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > see how to solve that since my original patch didn't work. Yeah this is definitely a baffling problem. Thanks, Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 18:16 ` Logan Gunthorpe @ 2016-06-10 18:18 ` Kees Cook 0 siblings, 0 replies; 25+ messages in thread From: Kees Cook @ 2016-06-10 18:18 UTC (permalink / raw) To: Logan Gunthorpe Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Fri, Jun 10, 2016 at 11:16 AM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hey, > > On 10/06/16 12:09 PM, Kees Cook wrote: >>> restore_code: ffff880157c3b000 >>> jump_addr: ffffffff81446be0 >>> >>> >>> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c >>> index 009947d..6efedb7 100644 >>> --- a/arch/x86/power/hibernate_64.c >>> +++ b/arch/x86/power/hibernate_64.c >>> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) >>> memcpy(relocated_restore_code, &core_restore_code, >>> &restore_registers - &core_restore_code); >>> >>> + pr_info("restore_code: %p\n", relocated_restore_code); >>> + pr_info("jump_addr: %lx\n", restore_jump_address); >>> + >> >> Also interesting would be the "relocated_restore_code" address, as >> well as a dump of /sys/kernel/debug/kernel_page_tables (from >> CONFIG_X86_PTDUMP). > > Is that not what I printed? If not, can you give me a better hint as to Oh, whoops, sorry, I saw "restore_code" in the pr_info and "relocate_restore_code" in the memcpy and didn't scan the right thing in the pr_info line. :) > what you're looking for so I can spin another kernel? I'll also provide > the kernel_page_tables once I do that. Cool, thanks. > >> I'm baffled by the problem, but the best I can understand is the the >> relocated_restore_code range isn't executable (which should be visible >> from finding it in /sys/kernel/debug/kernel_page_tables), but I don't >> see how to solve that since my original patch didn't work. > > Yeah this is definitely a baffling problem. -Kees -- Kees Cook Chrome OS & Brillo Security ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 18:09 ` Kees Cook 2016-06-10 18:16 ` Logan Gunthorpe @ 2016-06-10 21:27 ` Rafael J. Wysocki 2016-06-10 22:29 ` Rafael J. Wysocki 1 sibling, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-10 21:27 UTC (permalink / raw) To: Kees Cook Cc: Logan Gunthorpe, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote: > On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > > Hey, > > > > I've still be trying to figure this out as I have time. > > > > I tried printing a couple restore addresses and nothing I can find seems > > anywhere near the rodata/ex_table boundary. > > > > I tried with the (badly formatted) below and got the following. Nothing too > > surprising. I've attached a kallsyms that matches the kernel for reference. > > > > restore_code: ffff880157c3b000 > > jump_addr: ffffffff81446be0 > > > > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > > index 009947d..6efedb7 100644 > > --- a/arch/x86/power/hibernate_64.c > > +++ b/arch/x86/power/hibernate_64.c > > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > > memcpy(relocated_restore_code, &core_restore_code, > > &restore_registers - &core_restore_code); > > > > + pr_info("restore_code: %p\n", relocated_restore_code); > > + pr_info("jump_addr: %lx\n", restore_jump_address); > > + > > Also interesting would be the "relocated_restore_code" address, as > well as a dump of /sys/kernel/debug/kernel_page_tables (from > CONFIG_X86_PTDUMP). > > I'm baffled by the problem, but the best I can understand is the the > relocated_restore_code range isn't executable (which should be visible > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > see how to solve that since my original patch didn't work. > > Rafael, is this something you have time to look at quickly? Well, not really, but I'll do my best to look at it in the next few days. Thanks, Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 21:27 ` Rafael J. Wysocki @ 2016-06-10 22:29 ` Rafael J. Wysocki 2016-06-10 22:28 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-10 22:29 UTC (permalink / raw) To: Kees Cook Cc: Logan Gunthorpe, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Friday, June 10, 2016 11:27:29 PM Rafael J. Wysocki wrote: > On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote: > > On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > > > Hey, > > > > > > I've still be trying to figure this out as I have time. > > > > > > I tried printing a couple restore addresses and nothing I can find seems > > > anywhere near the rodata/ex_table boundary. > > > > > > I tried with the (badly formatted) below and got the following. Nothing too > > > surprising. I've attached a kallsyms that matches the kernel for reference. > > > > > > restore_code: ffff880157c3b000 > > > jump_addr: ffffffff81446be0 > > > > > > > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > > > index 009947d..6efedb7 100644 > > > --- a/arch/x86/power/hibernate_64.c > > > +++ b/arch/x86/power/hibernate_64.c > > > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > > > memcpy(relocated_restore_code, &core_restore_code, > > > &restore_registers - &core_restore_code); > > > > > > + pr_info("restore_code: %p\n", relocated_restore_code); > > > + pr_info("jump_addr: %lx\n", restore_jump_address); > > > + > > > > Also interesting would be the "relocated_restore_code" address, as > > well as a dump of /sys/kernel/debug/kernel_page_tables (from > > CONFIG_X86_PTDUMP). > > > > I'm baffled by the problem, but the best I can understand is the the > > relocated_restore_code range isn't executable (which should be visible > > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > > see how to solve that since my original patch didn't work. > > > > Rafael, is this something you have time to look at quickly? > > Well, not really, but I'll do my best to look at it in the next few days. OK, I have a theory, but I need a bit of help. This may be a dumb question, but I don't quite remember the answer readily. Given a physical address, how do I get the corresponding virtual one under the kernel identity mapping? Thanks, Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 22:29 ` Rafael J. Wysocki @ 2016-06-10 22:28 ` Logan Gunthorpe 2016-06-10 22:33 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-10 22:28 UTC (permalink / raw) To: Rafael J. Wysocki, Kees Cook Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > OK, I have a theory, but I need a bit of help. > > This may be a dumb question, but I don't quite remember the answer readily. > > Given a physical address, how do I get the corresponding virtual one under > the kernel identity mapping? Is that not just 'phys_to_virt(x)'?? Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 22:28 ` Logan Gunthorpe @ 2016-06-10 22:33 ` Rafael J. Wysocki 2016-06-11 0:13 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-10 22:33 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > > OK, I have a theory, but I need a bit of help. > > > > This may be a dumb question, but I don't quite remember the answer readily. > > > > Given a physical address, how do I get the corresponding virtual one under > > the kernel identity mapping? > > Is that not just 'phys_to_virt(x)'?? Ah that. Thanks! Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-10 22:33 ` Rafael J. Wysocki @ 2016-06-11 0:13 ` Rafael J. Wysocki 2016-06-11 1:47 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-11 0:13 UTC (permalink / raw) To: Logan Gunthorpe, Kees Cook Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote: > On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: > > > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > > > OK, I have a theory, but I need a bit of help. > > > > > > This may be a dumb question, but I don't quite remember the answer readily. > > > > > > Given a physical address, how do I get the corresponding virtual one under > > > the kernel identity mapping? > > > > Is that not just 'phys_to_virt(x)'?? > > Ah that. Thanks! So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully will make it a bit clearer what happens there. In particular, it follows from it quite clearly that the restore will only work if the virtual address of the trampoline (&restore_registers) in the image kernel happens to match the virtual address of the same chunk of memory in the boot kernel (and after it has switched to the temporary page tables). That appears to be the case most of the time, or hibernation would randomly break for people all over, but I'm not actually sure if that's guaranteed to happen. If not, that very well may explain unexpected failures in there. If it is not guaranteed to happen, the solution would be to pass the physical address of that memory from the image kernel to the boot kernel instead of its virtual address, but simply applying virt_to_phys() to &restore_registers in arch_hibernation_header_save(), storing that in rdr->jump_address and then applying phys_to_virt() to rdr->jump_address in arch_hibernation_header_restore() doesn't work. Any hints on how to do it correctly? Thanks, Rafael --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Subject: [PATCH] x86 / hibernate: Simplify passing of trampoline address During resume from hibernation, the boot kernel has to jump to the image kernel's trampoline code to pass control to it. On x86_64 the address of that code is passed from the image kernel to the boot kernel in the image header. Currently, the way in which that address is saved by the image kernel is not entirely straightforward (assembly code is involved in that unnecessarily etc), so simplify it to make it easier to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- arch/x86/power/hibernate_64.c | 6 +++--- arch/x86/power/hibernate_asm_64.S | 3 --- 2 files changed, 3 insertions(+), 6 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_ * Address to jump to in the last phase of restore in order to get to the image * kernel's text (this value is passed in the image header). */ -unsigned long restore_jump_address __visible; +void *restore_jump_address __visible; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -108,7 +108,7 @@ int pfn_is_nosave(unsigned long pfn) } struct restore_data_record { - unsigned long jump_address; + void *jump_address; unsigned long cr3; unsigned long magic; }; @@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = restore_jump_address; + rdr->jump_address = &restore_registers; rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; return 0; Index: linux-pm/arch/x86/power/hibernate_asm_64.S =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S +++ linux-pm/arch/x86/power/hibernate_asm_64.S @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend) pushfq popq pt_regs_flags(%rax) - /* save the address of restore_registers */ - movq $restore_registers, %rax - movq %rax, restore_jump_address(%rip) /* save cr3 */ movq %cr3, %rax movq %rax, restore_cr3(%rip) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-11 0:13 ` Rafael J. Wysocki @ 2016-06-11 1:47 ` Rafael J. Wysocki 2016-06-11 11:48 ` Rafael J. Wysocki 2016-06-11 17:39 ` Logan Gunthorpe 0 siblings, 2 replies; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-11 1:47 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote: > On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote: > > On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: > > > > > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > > > > OK, I have a theory, but I need a bit of help. > > > > > > > > This may be a dumb question, but I don't quite remember the answer readily. > > > > > > > > Given a physical address, how do I get the corresponding virtual one under > > > > the kernel identity mapping? > > > > > > Is that not just 'phys_to_virt(x)'?? > > > > Ah that. Thanks! > > So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully > will make it a bit clearer what happens there. > > In particular, it follows from it quite clearly that the restore will only work > if the virtual address of the trampoline (&restore_registers) in the image > kernel happens to match the virtual address of the same chunk of memory in the > boot kernel (and after it has switched to the temporary page tables). > > That appears to be the case most of the time, or hibernation would randomly > break for people all over, but I'm not actually sure if that's guaranteed to > happen. If not, that very well may explain unexpected failures in there. > > If it is not guaranteed to happen, the solution would be to pass the physical > address of that memory from the image kernel to the boot kernel instead of its > virtual address, OK, the appended patch does the trick for me. Logan, can you please try it? It shouldn't break things, but I'm wondering if the problem you're having after commit ab76f7b4ab is still reproducible with it. Thanks, Rafael --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Subject: [PATCH] x86 / hibernate: Pass physical address of the trampoline During resume from hibernation, the boot kernel has to jump to the image kernel's trampoline code to pass control to it. On x86_64 the address of that code is passed from the image kernel to the boot kernel in the image header. Currently, the way in which that address is saved by the image kernel is not entirely straightforward (assembly code is involved in that unnecessarily etc). Moreover, the current code passes the virtual address of the trampoline with the assumption that the image and boot kernels will always use the same virtual addresses for the chunk of physical address space occupied by the trampoline code. In case that assumption is not valid, pass the physical address of the trampoline code from the image kernel to the boot kernel and update RESTORE_MAGIC to avoid situations in which the boot kernel may use a different trampoline address passing convention from the one used by the image kernel. Also drop the assembly code chunk updating restore_jump_address in swsusp_arch_suspend() as it is not necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- arch/x86/power/hibernate_64.c | 8 ++++---- arch/x86/power/hibernate_asm_64.S | 3 --- 2 files changed, 4 insertions(+), 7 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_ * Address to jump to in the last phase of restore in order to get to the image * kernel's text (this value is passed in the image header). */ -unsigned long restore_jump_address __visible; +void *restore_jump_address __visible; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -113,7 +113,7 @@ struct restore_data_record { unsigned long magic; }; -#define RESTORE_MAGIC 0x0123456789ABCDEFUL +#define RESTORE_MAGIC 0x0123456789ABCDF0UL /** * arch_hibernation_header_save - populate the architecture specific part @@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = restore_jump_address; + rdr->jump_address = (unsigned long)__pa_symbol(&restore_registers); rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; return 0; @@ -141,7 +141,7 @@ int arch_hibernation_header_restore(void { struct restore_data_record *rdr = addr; - restore_jump_address = rdr->jump_address; + restore_jump_address = (void *)(rdr->jump_address + __START_KERNEL_map); restore_cr3 = rdr->cr3; return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL; } Index: linux-pm/arch/x86/power/hibernate_asm_64.S =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S +++ linux-pm/arch/x86/power/hibernate_asm_64.S @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend) pushfq popq pt_regs_flags(%rax) - /* save the address of restore_registers */ - movq $restore_registers, %rax - movq %rax, restore_jump_address(%rip) /* save cr3 */ movq %cr3, %rax movq %rax, restore_cr3(%rip) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-11 1:47 ` Rafael J. Wysocki @ 2016-06-11 11:48 ` Rafael J. Wysocki 2016-06-11 16:35 ` Logan Gunthorpe 2016-06-11 17:39 ` Logan Gunthorpe 1 sibling, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-11 11:48 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Saturday, June 11, 2016 03:47:59 AM Rafael J. Wysocki wrote: > On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote: > > On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote: > > > On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: > > > > > > > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > > > > > OK, I have a theory, but I need a bit of help. > > > > > > > > > > This may be a dumb question, but I don't quite remember the answer readily. > > > > > > > > > > Given a physical address, how do I get the corresponding virtual one under > > > > > the kernel identity mapping? > > > > > > > > Is that not just 'phys_to_virt(x)'?? > > > > > > Ah that. Thanks! > > > > So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully > > will make it a bit clearer what happens there. > > > > In particular, it follows from it quite clearly that the restore will only work > > if the virtual address of the trampoline (&restore_registers) in the image > > kernel happens to match the virtual address of the same chunk of memory in the > > boot kernel (and after it has switched to the temporary page tables). > > > > That appears to be the case most of the time, or hibernation would randomly > > break for people all over, but I'm not actually sure if that's guaranteed to > > happen. If not, that very well may explain unexpected failures in there. > > > > If it is not guaranteed to happen, the solution would be to pass the physical > > address of that memory from the image kernel to the boot kernel instead of its > > virtual address, > > OK, the appended patch does the trick for me. > > Logan, can you please try it? It shouldn't break things, but I'm wondering if > the problem you're having after commit ab76f7b4ab is still reproducible with it. No, it won't help, and I think I know what's going on. The temporary page tables set up by set_up_temporary_mappings() re-use the original boot kernel's text mapping, so if the trampoline code address falls into a non-executable page in that mapping, the boot kernel can't pass control to the image kernel. If that theory is correct, the patch below should help, but IMO it should be applied regardless to eliminate this particular failure mode. I'll prepare another patch to pass the physical address on top of this one. Thanks, Rafael --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Subject: [PATCH] x86 / hibernate: Ensure that 64-bit trampoline code is executable During resume from hibernation, the boot kernel has to jump to the image kernel's trampoline code to pass control to it. On x86_64 the address of that code is passed from the image kernel to the boot kernel in the image header. That address is interpreted with respect to the original boot kernel's kernel text memory mapping, so if the page containing it is not executable in that mapping, the jump to it will crash the kernel, so prevent that from happening. While at it, clean up the way in which the trampoline code address is saved by the image kernel (assembly code is involved in that unnecessarily etc). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- arch/x86/power/hibernate_64.c | 15 ++++++++++++--- arch/x86/power/hibernate_asm_64.S | 3 --- 2 files changed, 12 insertions(+), 6 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -12,6 +12,7 @@ #include <linux/smp.h> #include <linux/suspend.h> +#include <asm/cacheflush.h> #include <asm/init.h> #include <asm/proto.h> #include <asm/page.h> @@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_ * Address to jump to in the last phase of restore in order to get to the image * kernel's text (this value is passed in the image header). */ -unsigned long restore_jump_address __visible; +void *restore_jump_address __visible; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -91,6 +92,14 @@ int swsusp_arch_resume(void) return -ENOMEM; memcpy(relocated_restore_code, &core_restore_code, &restore_registers - &core_restore_code); + /* + * The temporary memory mapping set up by set_up_temporary_mappings() + * above re-uses the original kernel text mapping. If the page with + * restore_jump_address is not executable in that mapping, it won't be + * possible to pass control to the image kernel, so prevent that from + * happening. + */ + set_memory_x((unsigned long)restore_jump_address, 1); restore_image(); return 0; @@ -108,7 +117,7 @@ int pfn_is_nosave(unsigned long pfn) } struct restore_data_record { - unsigned long jump_address; + void *jump_address; unsigned long cr3; unsigned long magic; }; @@ -126,7 +135,7 @@ int arch_hibernation_header_save(void *a if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = restore_jump_address; + rdr->jump_address = &restore_registers; rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; return 0; Index: linux-pm/arch/x86/power/hibernate_asm_64.S =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S +++ linux-pm/arch/x86/power/hibernate_asm_64.S @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend) pushfq popq pt_regs_flags(%rax) - /* save the address of restore_registers */ - movq $restore_registers, %rax - movq %rax, restore_jump_address(%rip) /* save cr3 */ movq %cr3, %rax movq %rax, restore_cr3(%rip) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-11 11:48 ` Rafael J. Wysocki @ 2016-06-11 16:35 ` Logan Gunthorpe 0 siblings, 0 replies; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-11 16:35 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst Hey Rafael, Thank for looking into this. I tried the patch below applied to v4.6 and I still got a lockup on resume. Additionally there was a kernel warning at arch/x86/mm/pageattr.c:1414 change_page_attr_set_clr+0x2bb/0x440 and another one right afterwards at kernel/smp.c:416 smp_call_function_many+0xb5/0x250. Regardless, Ill try the other patch you sent shortly. Logan On 11/06/16 05:48 AM, Rafael J. Wysocki wrote: > On Saturday, June 11, 2016 03:47:59 AM Rafael J. Wysocki wrote: >> On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote: >>> On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote: >>>> On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: >>>>> >>>>> On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: >>>>>> OK, I have a theory, but I need a bit of help. >>>>>> >>>>>> This may be a dumb question, but I don't quite remember the answer readily. >>>>>> >>>>>> Given a physical address, how do I get the corresponding virtual one under >>>>>> the kernel identity mapping? >>>>> >>>>> Is that not just 'phys_to_virt(x)'?? >>>> >>>> Ah that. Thanks! >>> >>> So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully >>> will make it a bit clearer what happens there. >>> >>> In particular, it follows from it quite clearly that the restore will only work >>> if the virtual address of the trampoline (&restore_registers) in the image >>> kernel happens to match the virtual address of the same chunk of memory in the >>> boot kernel (and after it has switched to the temporary page tables). >>> >>> That appears to be the case most of the time, or hibernation would randomly >>> break for people all over, but I'm not actually sure if that's guaranteed to >>> happen. If not, that very well may explain unexpected failures in there. >>> >>> If it is not guaranteed to happen, the solution would be to pass the physical >>> address of that memory from the image kernel to the boot kernel instead of its >>> virtual address, >> >> OK, the appended patch does the trick for me. >> >> Logan, can you please try it? It shouldn't break things, but I'm wondering if >> the problem you're having after commit ab76f7b4ab is still reproducible with it. > > No, it won't help, and I think I know what's going on. > > The temporary page tables set up by set_up_temporary_mappings() re-use the > original boot kernel's text mapping, so if the trampoline code address > falls into a non-executable page in that mapping, the boot kernel can't > pass control to the image kernel. > > If that theory is correct, the patch below should help, but IMO it should > be applied regardless to eliminate this particular failure mode. > > I'll prepare another patch to pass the physical address on top of this one. > > Thanks, > Rafael > > --- > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Subject: [PATCH] x86 / hibernate: Ensure that 64-bit trampoline code is executable > > During resume from hibernation, the boot kernel has to jump to > the image kernel's trampoline code to pass control to it. On > x86_64 the address of that code is passed from the image kernel > to the boot kernel in the image header. > > That address is interpreted with respect to the original boot > kernel's kernel text memory mapping, so if the page containing it > is not executable in that mapping, the jump to it will crash the > kernel, so prevent that from happening. > > While at it, clean up the way in which the trampoline code address > is saved by the image kernel (assembly code is involved in that > unnecessarily etc). > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > arch/x86/power/hibernate_64.c | 15 ++++++++++++--- > arch/x86/power/hibernate_asm_64.S | 3 --- > 2 files changed, 12 insertions(+), 6 deletions(-) > > Index: linux-pm/arch/x86/power/hibernate_64.c > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_64.c > +++ linux-pm/arch/x86/power/hibernate_64.c > @@ -12,6 +12,7 @@ > #include <linux/smp.h> > #include <linux/suspend.h> > > +#include <asm/cacheflush.h> > #include <asm/init.h> > #include <asm/proto.h> > #include <asm/page.h> > @@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_ > * Address to jump to in the last phase of restore in order to get to the image > * kernel's text (this value is passed in the image header). > */ > -unsigned long restore_jump_address __visible; > +void *restore_jump_address __visible; > > /* > * Value of the cr3 register from before the hibernation (this value is passed > @@ -91,6 +92,14 @@ int swsusp_arch_resume(void) > return -ENOMEM; > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > + /* > + * The temporary memory mapping set up by set_up_temporary_mappings() > + * above re-uses the original kernel text mapping. If the page with > + * restore_jump_address is not executable in that mapping, it won't be > + * possible to pass control to the image kernel, so prevent that from > + * happening. > + */ > + set_memory_x((unsigned long)restore_jump_address, 1); > > restore_image(); > return 0; > @@ -108,7 +117,7 @@ int pfn_is_nosave(unsigned long pfn) > } > > struct restore_data_record { > - unsigned long jump_address; > + void *jump_address; > unsigned long cr3; > unsigned long magic; > }; > @@ -126,7 +135,7 @@ int arch_hibernation_header_save(void *a > > if (max_size < sizeof(struct restore_data_record)) > return -EOVERFLOW; > - rdr->jump_address = restore_jump_address; > + rdr->jump_address = &restore_registers; > rdr->cr3 = restore_cr3; > rdr->magic = RESTORE_MAGIC; > return 0; > Index: linux-pm/arch/x86/power/hibernate_asm_64.S > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S > +++ linux-pm/arch/x86/power/hibernate_asm_64.S > @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend) > pushfq > popq pt_regs_flags(%rax) > > - /* save the address of restore_registers */ > - movq $restore_registers, %rax > - movq %rax, restore_jump_address(%rip) > /* save cr3 */ > movq %cr3, %rax > movq %rax, restore_cr3(%rip) > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-11 1:47 ` Rafael J. Wysocki 2016-06-11 11:48 ` Rafael J. Wysocki @ 2016-06-11 17:39 ` Logan Gunthorpe 2016-06-12 1:05 ` Rafael J. Wysocki 1 sibling, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-11 17:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst Hey Rafael, I tried this patch as well and there was no change. I have a couple tentative observations to make though. None of this is 100% clear to me so please correct me if I'm wrong anywhere: 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and rodata; which, by my understanding, shouldn't be used by anything. And __ex_table and rodata are fixed by the kernel's binary so both symbols should be the same in both the image kernel and the boot kernel given that both are running from the same binary. 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, when it's in place, it only works some of the time. Given that commit is only extending the NX region a bit, if there is some random mismatch, why does it never reach rodata? In other words, why is rodata a magic line that seems to work all the time -- why doesn't this random mismatch ever extend into the rodata region? rodata isn't _that_ far away from the end of ex_table. Anyway, thanks again for looking into this. Logan On 10/06/16 07:47 PM, Rafael J. Wysocki wrote: > On Saturday, June 11, 2016 02:13:45 AM Rafael J. Wysocki wrote: >> On Saturday, June 11, 2016 12:33:31 AM Rafael J. Wysocki wrote: >>> On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: >>>> >>>> On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: >>>>> OK, I have a theory, but I need a bit of help. >>>>> >>>>> This may be a dumb question, but I don't quite remember the answer readily. >>>>> >>>>> Given a physical address, how do I get the corresponding virtual one under >>>>> the kernel identity mapping? >>>> >>>> Is that not just 'phys_to_virt(x)'?? >>> >>> Ah that. Thanks! >> >> So first, appended is a cleanup of the 64-bit hibernate/restore code that hopefully >> will make it a bit clearer what happens there. >> >> In particular, it follows from it quite clearly that the restore will only work >> if the virtual address of the trampoline (&restore_registers) in the image >> kernel happens to match the virtual address of the same chunk of memory in the >> boot kernel (and after it has switched to the temporary page tables). >> >> That appears to be the case most of the time, or hibernation would randomly >> break for people all over, but I'm not actually sure if that's guaranteed to >> happen. If not, that very well may explain unexpected failures in there. >> >> If it is not guaranteed to happen, the solution would be to pass the physical >> address of that memory from the image kernel to the boot kernel instead of its >> virtual address, > > OK, the appended patch does the trick for me. > > Logan, can you please try it? It shouldn't break things, but I'm wondering if > the problem you're having after commit ab76f7b4ab is still reproducible with it. > > Thanks, > Rafael > > --- > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Subject: [PATCH] x86 / hibernate: Pass physical address of the trampoline > > During resume from hibernation, the boot kernel has to jump to > the image kernel's trampoline code to pass control to it. On > x86_64 the address of that code is passed from the image kernel > to the boot kernel in the image header. > > Currently, the way in which that address is saved by the image kernel > is not entirely straightforward (assembly code is involved in that > unnecessarily etc). Moreover, the current code passes the virtual > address of the trampoline with the assumption that the image and boot > kernels will always use the same virtual addresses for the chunk of > physical address space occupied by the trampoline code. > > In case that assumption is not valid, pass the physical address of > the trampoline code from the image kernel to the boot kernel and > update RESTORE_MAGIC to avoid situations in which the boot kernel > may use a different trampoline address passing convention from the > one used by the image kernel. > > Also drop the assembly code chunk updating restore_jump_address > in swsusp_arch_suspend() as it is not necessary any more. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > arch/x86/power/hibernate_64.c | 8 ++++---- > arch/x86/power/hibernate_asm_64.S | 3 --- > 2 files changed, 4 insertions(+), 7 deletions(-) > > Index: linux-pm/arch/x86/power/hibernate_64.c > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_64.c > +++ linux-pm/arch/x86/power/hibernate_64.c > @@ -27,7 +27,7 @@ extern asmlinkage __visible int restore_ > * Address to jump to in the last phase of restore in order to get to the image > * kernel's text (this value is passed in the image header). > */ > -unsigned long restore_jump_address __visible; > +void *restore_jump_address __visible; > > /* > * Value of the cr3 register from before the hibernation (this value is passed > @@ -113,7 +113,7 @@ struct restore_data_record { > unsigned long magic; > }; > > -#define RESTORE_MAGIC 0x0123456789ABCDEFUL > +#define RESTORE_MAGIC 0x0123456789ABCDF0UL > > /** > * arch_hibernation_header_save - populate the architecture specific part > @@ -126,7 +126,7 @@ int arch_hibernation_header_save(void *a > > if (max_size < sizeof(struct restore_data_record)) > return -EOVERFLOW; > - rdr->jump_address = restore_jump_address; > + rdr->jump_address = (unsigned long)__pa_symbol(&restore_registers); > rdr->cr3 = restore_cr3; > rdr->magic = RESTORE_MAGIC; > return 0; > @@ -141,7 +141,7 @@ int arch_hibernation_header_restore(void > { > struct restore_data_record *rdr = addr; > > - restore_jump_address = rdr->jump_address; > + restore_jump_address = (void *)(rdr->jump_address + __START_KERNEL_map); > restore_cr3 = rdr->cr3; > return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL; > } > Index: linux-pm/arch/x86/power/hibernate_asm_64.S > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S > +++ linux-pm/arch/x86/power/hibernate_asm_64.S > @@ -44,9 +44,6 @@ ENTRY(swsusp_arch_suspend) > pushfq > popq pt_regs_flags(%rax) > > - /* save the address of restore_registers */ > - movq $restore_registers, %rax > - movq %rax, restore_jump_address(%rip) > /* save cr3 */ > movq %cr3, %rax > movq %rax, restore_cr3(%rip) > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-11 17:39 ` Logan Gunthorpe @ 2016-06-12 1:05 ` Rafael J. Wysocki 2016-06-12 4:48 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-12 1:05 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Saturday, June 11, 2016 11:39:48 AM Logan Gunthorpe wrote: > Hey Rafael, > > I tried this patch as well and there was no change. > > I have a couple tentative observations to make though. None of this is > 100% clear to me so please correct me if I'm wrong anywhere: > > 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and > rodata; which, by my understanding, shouldn't be used by anything. And > __ex_table and rodata are fixed by the kernel's binary so both symbols > should be the same in both the image kernel and the boot kernel given > that both are running from the same binary. Well, what if the kernel is relocated? > 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, > when it's in place, it only works some of the time. Given that commit is > only extending the NX region a bit, if there is some random mismatch, > why does it never reach rodata? In other words, why is rodata a magic > line that seems to work all the time -- why doesn't this random mismatch > ever extend into the rodata region? rodata isn't _that_ far away from > the end of ex_table. That's a very good question. :-) Overall, it looks like re-using the boot kernel text mapping in the temporary page tables is a bad idea. I guess a temporary kernel text mapping is needed too or at least the existing one has to be modified to cover the trampoline code properly. > Anyway, thanks again for looking into this. No problem. I haven't helped much so far, though ... Can you please check if the patch below makes any difference? --- arch/x86/power/hibernate_64.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -12,6 +12,7 @@ #include <linux/smp.h> #include <linux/suspend.h> +#include <asm/cacheflush.h> #include <asm/init.h> #include <asm/proto.h> #include <asm/page.h> @@ -27,7 +28,7 @@ extern asmlinkage __visible int restore_ * Address to jump to in the last phase of restore in order to get to the image * kernel's text (this value is passed in the image header). */ -unsigned long restore_jump_address __visible; +void *restore_jump_address __visible; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -108,7 +109,7 @@ int pfn_is_nosave(unsigned long pfn) } struct restore_data_record { - unsigned long jump_address; + void *jump_address; unsigned long cr3; unsigned long magic; }; @@ -126,7 +127,7 @@ int arch_hibernation_header_save(void *a if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = restore_jump_address; + rdr->jump_address = &restore_registers; rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; return 0; @@ -140,8 +141,18 @@ int arch_hibernation_header_save(void *a int arch_hibernation_header_restore(void *addr) { struct restore_data_record *rdr = addr; + unsigned long text_end, all_end; + + if (rdr->magic != RESTORE_MAGIC) + return -EINVAL; restore_jump_address = rdr->jump_address; restore_cr3 = rdr->cr3; - return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL; + + text_end = PFN_ALIGN(&__stop___ex_table); + all_end = roundup((unsigned long)restore_jump_address, PMD_SIZE); + if (all_end > text_end) + set_memory_x(text_end, (all_end - text_end) >> PAGE_SHIFT); + + return 0; } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-12 1:05 ` Rafael J. Wysocki @ 2016-06-12 4:48 ` Logan Gunthorpe 2016-06-12 14:31 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-12 4:48 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst Hey, On 11/06/16 07:05 PM, Rafael J. Wysocki wrote: >> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and >> rodata; which, by my understanding, shouldn't be used by anything. And >> __ex_table and rodata are fixed by the kernel's binary so both symbols >> should be the same in both the image kernel and the boot kernel given >> that both are running from the same binary. > > Well, what if the kernel is relocated? Ah, I'm sure I don't fully grasp the implications of that but I would assume that if the image kernel were located somewhere else it would still be far away from the boot kernel's ex_table/rodata boundary. >> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, >> when it's in place, it only works some of the time. Given that commit is >> only extending the NX region a bit, if there is some random mismatch, >> why does it never reach rodata? In other words, why is rodata a magic >> line that seems to work all the time -- why doesn't this random mismatch >> ever extend into the rodata region? rodata isn't _that_ far away from >> the end of ex_table. > > That's a very good question. :-) Yeah, I guess if we knew the answer we'd understand what was going on and have a fix. > Can you please check if the patch below makes any difference? I'm afraid it's no different. The kernel still freezes on resume. Though, no warnings with this one. Thanks, Logan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-12 4:48 ` Logan Gunthorpe @ 2016-06-12 14:31 ` Rafael J. Wysocki 2016-06-12 16:11 ` Logan Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-12 14:31 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Saturday, June 11, 2016 10:48:08 PM Logan Gunthorpe wrote: > Hey, Hi, > On 11/06/16 07:05 PM, Rafael J. Wysocki wrote: > >> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and > >> rodata; which, by my understanding, shouldn't be used by anything. And > >> __ex_table and rodata are fixed by the kernel's binary so both symbols > >> should be the same in both the image kernel and the boot kernel given > >> that both are running from the same binary. > > > > Well, what if the kernel is relocated? > > Ah, I'm sure I don't fully grasp the implications of that but I would > assume that if the image kernel were located somewhere else it would > still be far away from the boot kernel's ex_table/rodata boundary. Probably. > >> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, > >> when it's in place, it only works some of the time. Given that commit is > >> only extending the NX region a bit, if there is some random mismatch, > >> why does it never reach rodata? In other words, why is rodata a magic > >> line that seems to work all the time -- why doesn't this random mismatch > >> ever extend into the rodata region? rodata isn't _that_ far away from > >> the end of ex_table. > > > > That's a very good question. :-) > > Yeah, I guess if we knew the answer we'd understand what was going on > and have a fix. Right. Appended is one more patch to try. It actually fixes a theoretical problem, so I'll need to add comments to it (as it is far from obvious IMO) and a changelog etc and post it as a proper submission. So the concern is that the page copying done in the loop in core_restore_code() may corrupt the kernel text part of the temporary memory mapping used at that time, because that's the original kernel text mapping from the boot kernel. That doesn't matter for the loop itself (as that code runs from a "safe" page guaranteed not to be overwritten), but it (quite theoretically) may matter for the final jump to the image kernel's restore_registers(). [I realized that it might be a problem only after I had started to think about the problem you reported.] As a bonus, the patch also eliminates the possible concern about the executability of the memory mapped via the kernel text mapping in the boot kernel, so IMO it's worth to give it a shot. I've tested it lightly on one machine, but I guess it would just crash right away if there were any problems in it. Thanks, Rafael --- arch/x86/power/hibernate_64.c | 46 ++++++++++++++++++++++++++++++++++---- arch/x86/power/hibernate_asm_64.S | 28 +++++++++++++---------- 2 files changed, 58 insertions(+), 16 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -27,7 +27,8 @@ extern asmlinkage __visible int restore_ * Address to jump to in the last phase of restore in order to get to the image * kernel's text (this value is passed in the image header). */ -unsigned long restore_jump_address __visible; +void *restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -37,6 +38,9 @@ unsigned long restore_cr3 __visible; pgd_t *temp_level4_pgt __visible; +void *restore_pgd_addr __visible; +pgd_t restore_pgd __visible; + void *relocated_restore_code __visible; static void *alloc_pgt_page(void *context) @@ -44,6 +48,33 @@ static void *alloc_pgt_page(void *contex return (void *)get_safe_page(GFP_ATOMIC); } +static int prepare_temporary_text_mapping(void) +{ + unsigned long vaddr = (unsigned long)restore_jump_address; + unsigned long paddr = jump_address_phys & PMD_MASK; + pmd_t *pmd; + pud_t *pud; + + pud = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!pud) + return -ENOMEM; + + restore_pgd = __pgd(__pa(pud) | _KERNPG_TABLE); + + pud += pud_index(vaddr); + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!pmd) + return -ENOMEM; + + set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE)); + + pmd += pmd_index(vaddr); + set_pmd(pmd, __pmd(paddr | __PAGE_KERNEL_LARGE_EXEC)); + + restore_pgd_addr = temp_level4_pgt + pgd_index(vaddr); + return 0; +} + static int set_up_temporary_mappings(void) { struct x86_mapping_info info = { @@ -63,6 +94,10 @@ static int set_up_temporary_mappings(voi set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), init_level4_pgt[pgd_index(__START_KERNEL_map)]); + result = prepare_temporary_text_mapping(); + if (result) + return result; + /* Set up the direct mapping from scratch */ for (i = 0; i < nr_pfn_mapped; i++) { mstart = pfn_mapped[i].start << PAGE_SHIFT; @@ -108,12 +143,13 @@ int pfn_is_nosave(unsigned long pfn) } struct restore_data_record { - unsigned long jump_address; + void *jump_address; + unsigned long jump_address_phys; unsigned long cr3; unsigned long magic; }; -#define RESTORE_MAGIC 0x0123456789ABCDEFUL +#define RESTORE_MAGIC 0x123456789ABCDEF0UL /** * arch_hibernation_header_save - populate the architecture specific part @@ -126,7 +162,8 @@ int arch_hibernation_header_save(void *a if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = restore_jump_address; + rdr->jump_address = &restore_registers; + rdr->jump_address_phys = __pa_symbol(&restore_registers); rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; return 0; @@ -142,6 +179,7 @@ int arch_hibernation_header_restore(void struct restore_data_record *rdr = addr; restore_jump_address = rdr->jump_address; + jump_address_phys = rdr->jump_address_phys; restore_cr3 = rdr->cr3; return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL; } Index: linux-pm/arch/x86/power/hibernate_asm_64.S =================================================================== --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S +++ linux-pm/arch/x86/power/hibernate_asm_64.S @@ -72,8 +72,10 @@ ENTRY(restore_image) movq %rax, %cr4; # turn PGE back on /* prepare to jump to the image kernel */ - movq restore_jump_address(%rip), %rax movq restore_cr3(%rip), %rbx + movq restore_jump_address(%rip), %r10 + movq restore_pgd(%rip), %r8 + movq restore_pgd_addr(%rip), %r9 /* prepare to copy image data to their original locations */ movq restore_pblist(%rip), %rdx @@ -96,20 +98,22 @@ ENTRY(core_restore_code) /* progress to the next pbe */ movq pbe_next(%rdx), %rdx jmp .Lloop + .Ldone: + /* switch over to the temporary kernel text mapping */ + movq %r8, (%r9) + /* flush TLB */ + movq %rax, %rdx + andq $~(X86_CR4_PGE), %rdx + movq %rdx, %cr4; # turn off PGE + movq %cr3, %rcx; # flush TLB + movq %rcx, %cr3; + movq %rax, %cr4; # turn PGE back on /* jump to the restore_registers address from the image header */ - jmpq *%rax - /* - * NOTE: This assumes that the boot kernel's text mapping covers the - * image kernel's page containing restore_registers and the address of - * this page is the same as in the image kernel's text mapping (it - * should always be true, because the text mapping is linear, starting - * from 0, and is supposed to cover the entire kernel text for every - * kernel). - * - * code below belongs to the image kernel - */ + jmpq *%r10 + /* code below belongs to the image kernel */ + .align PAGE_SIZE ENTRY(restore_registers) FRAME_BEGIN /* go back to the original page tables */ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-12 14:31 ` Rafael J. Wysocki @ 2016-06-12 16:11 ` Logan Gunthorpe 2016-06-13 13:43 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Logan Gunthorpe @ 2016-06-12 16:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst Hey Rafael, Awesome, this patch fixes the problem! Nice work. Thanks, Logan On 12/06/16 08:31 AM, Rafael J. Wysocki wrote: > On Saturday, June 11, 2016 10:48:08 PM Logan Gunthorpe wrote: >> Hey, > > Hi, > >> On 11/06/16 07:05 PM, Rafael J. Wysocki wrote: >>>> 1) Commit ab76f7b4ab only extends the NX bit between __ex_table and >>>> rodata; which, by my understanding, shouldn't be used by anything. And >>>> __ex_table and rodata are fixed by the kernel's binary so both symbols >>>> should be the same in both the image kernel and the boot kernel given >>>> that both are running from the same binary. >>> >>> Well, what if the kernel is relocated? >> >> Ah, I'm sure I don't fully grasp the implications of that but I would >> assume that if the image kernel were located somewhere else it would >> still be far away from the boot kernel's ex_table/rodata boundary. > > Probably. > >>>> 2) When ab76f7b4ab is reverted, hibernation seems to work 100%. Though, >>>> when it's in place, it only works some of the time. Given that commit is >>>> only extending the NX region a bit, if there is some random mismatch, >>>> why does it never reach rodata? In other words, why is rodata a magic >>>> line that seems to work all the time -- why doesn't this random mismatch >>>> ever extend into the rodata region? rodata isn't _that_ far away from >>>> the end of ex_table. >>> >>> That's a very good question. :-) >> >> Yeah, I guess if we knew the answer we'd understand what was going on >> and have a fix. > > Right. > > Appended is one more patch to try. > > It actually fixes a theoretical problem, so I'll need to add comments to it > (as it is far from obvious IMO) and a changelog etc and post it as a proper > submission. > > So the concern is that the page copying done in the loop in core_restore_code() > may corrupt the kernel text part of the temporary memory mapping used at that > time, because that's the original kernel text mapping from the boot kernel. > > That doesn't matter for the loop itself (as that code runs from a "safe" page > guaranteed not to be overwritten), but it (quite theoretically) may matter for > the final jump to the image kernel's restore_registers(). [I realized that > it might be a problem only after I had started to think about the problem > you reported.] > > As a bonus, the patch also eliminates the possible concern about the > executability of the memory mapped via the kernel text mapping in the boot > kernel, so IMO it's worth to give it a shot. I've tested it lightly on > one machine, but I guess it would just crash right away if there were > any problems in it. > > Thanks, > Rafael > > > --- > arch/x86/power/hibernate_64.c | 46 ++++++++++++++++++++++++++++++++++---- > arch/x86/power/hibernate_asm_64.S | 28 +++++++++++++---------- > 2 files changed, 58 insertions(+), 16 deletions(-) > > Index: linux-pm/arch/x86/power/hibernate_64.c > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_64.c > +++ linux-pm/arch/x86/power/hibernate_64.c > @@ -27,7 +27,8 @@ extern asmlinkage __visible int restore_ > * Address to jump to in the last phase of restore in order to get to the image > * kernel's text (this value is passed in the image header). > */ > -unsigned long restore_jump_address __visible; > +void *restore_jump_address __visible; > +unsigned long jump_address_phys; > > /* > * Value of the cr3 register from before the hibernation (this value is passed > @@ -37,6 +38,9 @@ unsigned long restore_cr3 __visible; > > pgd_t *temp_level4_pgt __visible; > > +void *restore_pgd_addr __visible; > +pgd_t restore_pgd __visible; > + > void *relocated_restore_code __visible; > > static void *alloc_pgt_page(void *context) > @@ -44,6 +48,33 @@ static void *alloc_pgt_page(void *contex > return (void *)get_safe_page(GFP_ATOMIC); > } > > +static int prepare_temporary_text_mapping(void) > +{ > + unsigned long vaddr = (unsigned long)restore_jump_address; > + unsigned long paddr = jump_address_phys & PMD_MASK; > + pmd_t *pmd; > + pud_t *pud; > + > + pud = (pud_t *)get_safe_page(GFP_ATOMIC); > + if (!pud) > + return -ENOMEM; > + > + restore_pgd = __pgd(__pa(pud) | _KERNPG_TABLE); > + > + pud += pud_index(vaddr); > + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); > + if (!pmd) > + return -ENOMEM; > + > + set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE)); > + > + pmd += pmd_index(vaddr); > + set_pmd(pmd, __pmd(paddr | __PAGE_KERNEL_LARGE_EXEC)); > + > + restore_pgd_addr = temp_level4_pgt + pgd_index(vaddr); > + return 0; > +} > + > static int set_up_temporary_mappings(void) > { > struct x86_mapping_info info = { > @@ -63,6 +94,10 @@ static int set_up_temporary_mappings(voi > set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), > init_level4_pgt[pgd_index(__START_KERNEL_map)]); > > + result = prepare_temporary_text_mapping(); > + if (result) > + return result; > + > /* Set up the direct mapping from scratch */ > for (i = 0; i < nr_pfn_mapped; i++) { > mstart = pfn_mapped[i].start << PAGE_SHIFT; > @@ -108,12 +143,13 @@ int pfn_is_nosave(unsigned long pfn) > } > > struct restore_data_record { > - unsigned long jump_address; > + void *jump_address; > + unsigned long jump_address_phys; > unsigned long cr3; > unsigned long magic; > }; > > -#define RESTORE_MAGIC 0x0123456789ABCDEFUL > +#define RESTORE_MAGIC 0x123456789ABCDEF0UL > > /** > * arch_hibernation_header_save - populate the architecture specific part > @@ -126,7 +162,8 @@ int arch_hibernation_header_save(void *a > > if (max_size < sizeof(struct restore_data_record)) > return -EOVERFLOW; > - rdr->jump_address = restore_jump_address; > + rdr->jump_address = &restore_registers; > + rdr->jump_address_phys = __pa_symbol(&restore_registers); > rdr->cr3 = restore_cr3; > rdr->magic = RESTORE_MAGIC; > return 0; > @@ -142,6 +179,7 @@ int arch_hibernation_header_restore(void > struct restore_data_record *rdr = addr; > > restore_jump_address = rdr->jump_address; > + jump_address_phys = rdr->jump_address_phys; > restore_cr3 = rdr->cr3; > return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL; > } > Index: linux-pm/arch/x86/power/hibernate_asm_64.S > =================================================================== > --- linux-pm.orig/arch/x86/power/hibernate_asm_64.S > +++ linux-pm/arch/x86/power/hibernate_asm_64.S > @@ -72,8 +72,10 @@ ENTRY(restore_image) > movq %rax, %cr4; # turn PGE back on > > /* prepare to jump to the image kernel */ > - movq restore_jump_address(%rip), %rax > movq restore_cr3(%rip), %rbx > + movq restore_jump_address(%rip), %r10 > + movq restore_pgd(%rip), %r8 > + movq restore_pgd_addr(%rip), %r9 > > /* prepare to copy image data to their original locations */ > movq restore_pblist(%rip), %rdx > @@ -96,20 +98,22 @@ ENTRY(core_restore_code) > /* progress to the next pbe */ > movq pbe_next(%rdx), %rdx > jmp .Lloop > + > .Ldone: > + /* switch over to the temporary kernel text mapping */ > + movq %r8, (%r9) > + /* flush TLB */ > + movq %rax, %rdx > + andq $~(X86_CR4_PGE), %rdx > + movq %rdx, %cr4; # turn off PGE > + movq %cr3, %rcx; # flush TLB > + movq %rcx, %cr3; > + movq %rax, %cr4; # turn PGE back on > /* jump to the restore_registers address from the image header */ > - jmpq *%rax > - /* > - * NOTE: This assumes that the boot kernel's text mapping covers the > - * image kernel's page containing restore_registers and the address of > - * this page is the same as in the image kernel's text mapping (it > - * should always be true, because the text mapping is linear, starting > - * from 0, and is supposed to cover the entire kernel text for every > - * kernel). > - * > - * code below belongs to the image kernel > - */ > + jmpq *%r10 > > + /* code below belongs to the image kernel */ > + .align PAGE_SIZE > ENTRY(restore_registers) > FRAME_BEGIN > /* go back to the original page tables */ > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-06-12 16:11 ` Logan Gunthorpe @ 2016-06-13 13:43 ` Rafael J. Wysocki 0 siblings, 0 replies; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-13 13:43 UTC (permalink / raw) To: Logan Gunthorpe Cc: Kees Cook, Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Sunday, June 12, 2016 10:11:38 AM Logan Gunthorpe wrote: > Hey Rafael, > > Awesome, this patch fixes the problem! Nice work. Great, thanks for testing! I've just sent an "official" version of it. Thanks, Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: PROBLEM: Resume form hibernate broken by setting NX on gap 2016-05-20 21:59 ` Kees Cook 2016-05-20 22:16 ` Kees Cook @ 2016-06-10 22:11 ` Rafael J. Wysocki 1 sibling, 0 replies; 25+ messages in thread From: Rafael J. Wysocki @ 2016-06-10 22:11 UTC (permalink / raw) To: Kees Cook Cc: Rafael J. Wysocki, Stephen Smalley, Ingo Molnar, Logan Gunthorpe, Ingo Molnar, the arch/x86 maintainers, linux-pm, Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst On Friday, May 20, 2016 02:59:30 PM Kees Cook wrote: > On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: > >> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: > >>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: > >>>> > >>>> * Logan Gunthorpe <logang@deltatee.com> wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> I have been working on a bug that causes my laptop to freeze during > >>>>> resume from hibernation. I did a bisect to find the offending commit: > >>>>> > >>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata > >>>>> > >>>>> There is more information in the bugzilla report [1] that > >>>>> I've been working on but I will summarize things below. > >>>>> > >>>>> I've experienced intermittent but reproducible freezes when resuming > >>>>> from hibernation since about kernel version 3.19. The freeze was > >>>>> significantly more reproducible when a few applications were loaded > >>>>> before hibernation and would largely not happen if hibernated > >>>>> immediately after booting to a desktop. I did some tracing work to find > >>>>> that the kernel gets as far as the resume_image call in > >>>>> swsusp_arch_resume and I could not find any response from the image > >>>>> kernel when I hit the bug. I also did testing that seemed to rule out > >>>>> this being caused by a problematic driver. > >>>>> > >>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in > >>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. > >>>>> Then, I did a second bisect with a ported version of the fix to the > >>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation > >>>>> with what appears to be the exact same symptoms. Reverting that commit > >>>>> in recent kernels up to and including 4.6 fixes the issue and restores > >>>>> reliable hibernation. However, it's not at all clear to me why that > >>>>> commit would cause this issue or how to fix the issue without reverting. > >>>> > >>>> I've attached that commit below and also Cc:-ed a few more people who might have > >>>> an idea about why this regressed. Worst-case we'll have to revert it. > >>> > >>> Without looking deep into mm, my theory would be that after this patch > >>> the final jump from the boot kernel to the image kernel's trampoline > >>> code during resume may crash the kernel if the trampoline page turns > >>> out to be NX in the boot kernel (it has to be executable in both the > >>> boot and the image kernels). > >> > >> So, pardon my ignorance, but where is this trampoline page placed in > >> kernel memory? > > > > On 32-bit its location has to be the same in both the boot and the > > image kernels and that's within kernel text in both cases, so that > > shouldn't be a problem. > > > > On 64-bit its location depends on the image kernel and specifically on > > the location of the restore_registers routine in it. The (virtual) > > address of that routine is stored in the restore_jump_address > > variable, so the page containing it (the trampoline page) can be found > > with the help of that. > > > > swsusp_arch_resume() sets up a temporary kernel mapping to finalize > > the image restoration and that page must not be NX in that mapping for > > things to work. > > It looks like nothing in the swsusp_arch_resume() -> get_safe_page() > -> get_image_page() path sets the page executable... > > Untested, but I wonder if this work work in swsusp_arch_resume() > before the memcpy? > > (apologies for any gmail-based whitespace mangling...) > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d419a6..c2f3ecc45bd4 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -12,6 +12,7 @@ > #include <linux/smp.h> > #include <linux/suspend.h> > > +#include <asm/cacheflush.h> > #include <asm/init.h> > #include <asm/proto.h> > #include <asm/page.h> > @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) > relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); > if (!relocated_restore_code) > return -ENOMEM; > + set_memory_x((unsigned long)relocated_restore_code, 1); > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > We may not be on the right track with this I'm afraid. When the temporary kernel mapping is set up in swsusp_arch_resume(), it passes __PAGE_KERNEL_LARGE_EXEC as pmd_flag in the x86_mapping_info, so all memory should be executable after we switch to that which happens right at the beginning of restore_image(). So restore_image() and the code it jumps to should be fine from the executability perspective (including the final jump to restore_jump_address). Or am I missingy anything? Thanks, Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2016-06-13 13:39 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <573DF82D.50006@deltatee.com> 2016-05-20 7:15 ` PROBLEM: Resume form hibernate broken by setting NX on gap Ingo Molnar 2016-05-20 11:34 ` Rafael J. Wysocki 2016-05-20 13:56 ` Stephen Smalley 2016-05-20 21:46 ` Rafael J. Wysocki 2016-05-20 21:59 ` Kees Cook 2016-05-20 22:16 ` Kees Cook [not found] ` <573FC081.20006@deltatee.com> 2016-05-21 16:39 ` Kees Cook [not found] ` <575A3E95.5090100@deltatee.com> 2016-06-10 18:09 ` Kees Cook 2016-06-10 18:16 ` Logan Gunthorpe 2016-06-10 18:18 ` Kees Cook 2016-06-10 21:27 ` Rafael J. Wysocki 2016-06-10 22:29 ` Rafael J. Wysocki 2016-06-10 22:28 ` Logan Gunthorpe 2016-06-10 22:33 ` Rafael J. Wysocki 2016-06-11 0:13 ` Rafael J. Wysocki 2016-06-11 1:47 ` Rafael J. Wysocki 2016-06-11 11:48 ` Rafael J. Wysocki 2016-06-11 16:35 ` Logan Gunthorpe 2016-06-11 17:39 ` Logan Gunthorpe 2016-06-12 1:05 ` Rafael J. Wysocki 2016-06-12 4:48 ` Logan Gunthorpe 2016-06-12 14:31 ` Rafael J. Wysocki 2016-06-12 16:11 ` Logan Gunthorpe 2016-06-13 13:43 ` Rafael J. Wysocki 2016-06-10 22:11 ` Rafael J. Wysocki
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.