All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Ingo Molnar <mingo@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Brian Gerst <brgerst@gmail.com>
Subject: Re: PROBLEM: Resume form hibernate broken by setting NX on gap
Date: Fri, 10 Jun 2016 11:09:22 -0700	[thread overview]
Message-ID: <CAGXu5jK15P5CAQd09NzG31YMPEKJGXbhXiYDQ2sRC=WBhAFTrA@mail.gmail.com> (raw)
In-Reply-To: <575A3E95.5090100@deltatee.com>

On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hey,
>
> I've still be trying to figure this out as I have time.
>
> I tried printing a couple restore addresses and nothing I can find seems
> anywhere near the rodata/ex_table boundary.
>
> I tried with the (badly formatted) below and got the following. Nothing too
> surprising. I've attached a kallsyms that matches the kernel for reference.
>
> restore_code: ffff880157c3b000
> jump_addr: ffffffff81446be0
>
>
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 009947d..6efedb7 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void)
>         memcpy(relocated_restore_code, &core_restore_code,
>                &restore_registers - &core_restore_code);
>
> +       pr_info("restore_code: %p\n", relocated_restore_code);
> +       pr_info("jump_addr: %lx\n", restore_jump_address);
> +

Also interesting would be the "relocated_restore_code" address, as
well as a dump of /sys/kernel/debug/kernel_page_tables (from
CONFIG_X86_PTDUMP).

I'm baffled by the problem, but the best I can understand is the the
relocated_restore_code range isn't executable (which should be visible
from finding it in /sys/kernel/debug/kernel_page_tables), but I don't
see how to solve that since my original patch didn't work.

Rafael, is this something you have time to look at quickly?

-Kees

>         restore_image();
>         return 0;
>  }
>
>
> Thanks,
>
> Logan
>
>
>
> On 21/05/16 10:39 AM, Kees Cook wrote:
>>
>> On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com>
>> wrote:
>>>
>>> On 20/05/16 04:16 PM, Kees Cook wrote:
>>>>
>>>>
>>>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been working on a bug that causes my laptop to freeze
>>>>>>>>>> during
>>>>>>>>>> resume from hibernation. I did a bisect to find the offending
>>>>>>>>>> commit:
>>>>>>>>>>
>>>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>>>>>>
>>>>>>>>>> There is more information in the bugzilla report [1] that
>>>>>>>>>> I've been working on but I will summarize things below.
>>>>>>>>>>
>>>>>>>>>> I've experienced intermittent but reproducible freezes when
>>>>>>>>>> resuming
>>>>>>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>>>>>>> significantly more reproducible when a few applications were
>>>>>>>>>> loaded
>>>>>>>>>> before hibernation and would largely not happen if hibernated
>>>>>>>>>> immediately after booting to a desktop. I did some tracing work to
>>>>>>>>>> find
>>>>>>>>>> that the kernel gets as far as the resume_image call in
>>>>>>>>>> swsusp_arch_resume and I could not find any response from the
>>>>>>>>>> image
>>>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule
>>>>>>>>>> out
>>>>>>>>>> this being caused by a problematic driver.
>>>>>>>>>>
>>>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug
>>>>>>>>>> in
>>>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in
>>>>>>>>>> 4.4.
>>>>>>>>>> Then, I did a second bisect with a ported version of the fix to
>>>>>>>>>> the
>>>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break
>>>>>>>>>> hibernation
>>>>>>>>>> with what appears to be the exact same symptoms. Reverting that
>>>>>>>>>> commit
>>>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and
>>>>>>>>>> restores
>>>>>>>>>> reliable hibernation. However, it's not at all clear to me why
>>>>>>>>>> that
>>>>>>>>>> commit would cause this issue or how to fix the issue without
>>>>>>>>>> reverting.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've attached that commit below and also Cc:-ed a few more people
>>>>>>>>> who
>>>>>>>>> might have
>>>>>>>>> an idea about why this regressed. Worst-case we'll have to revert
>>>>>>>>> it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Without looking deep into mm, my theory would be that after this
>>>>>>>> patch
>>>>>>>> the final jump from the boot kernel to the image kernel's trampoline
>>>>>>>> code during resume may crash the kernel if the trampoline page turns
>>>>>>>> out to be NX in the boot kernel (it has to be executable in both the
>>>>>>>> boot and the image kernels).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, pardon my ignorance, but where is this trampoline page placed in
>>>>>>> kernel memory?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 32-bit its location has to be the same in both the boot and the
>>>>>> image kernels and that's within kernel text in both cases, so that
>>>>>> shouldn't be a problem.
>>>>>>
>>>>>> On 64-bit its location depends on the image kernel and specifically on
>>>>>> the location of the restore_registers routine in it.  The (virtual)
>>>>>> address of that routine is stored in the restore_jump_address
>>>>>> variable, so the page containing it (the trampoline page) can be found
>>>>>> with the help of that.
>>>>>>
>>>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
>>>>>> the image restoration and that page must not be NX in that mapping for
>>>>>> things to work.
>>>>>
>>>>>
>>>>>
>>>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
>>>>> -> get_image_page() path sets the page executable...
>>>>>
>>>>> Untested, but I wonder if this work work in swsusp_arch_resume()
>>>>> before the memcpy?
>>>>
>>>>
>>>>
>>>> I can't type today, it seems. It should read "... if this would work
>>>> ..."
>>>>
>>>> If you can test this and it works for you, I'll send a proper patch...
>>>> :P
>>>>
>>>> -Kees
>>>>
>>>
>>> Hi Kees,
>>>
>>> Thanks. I tried the patch but it only resulted in a kernel warning and
>>> freeze. I've attached a photo showing as much of the messages as I could
>>> get.
>>>
>>> Logan
>>
>>
>> Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one
>> out.
>>
>> -Kees
>>
>



-- 
Kees Cook
Chrome OS & Brillo Security

  parent reply	other threads:[~2016-06-10 18:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <573DF82D.50006@deltatee.com>
2016-05-20  7:15 ` PROBLEM: Resume form hibernate broken by setting NX on gap Ingo Molnar
2016-05-20 11:34   ` Rafael J. Wysocki
2016-05-20 13:56     ` Stephen Smalley
2016-05-20 21:46       ` Rafael J. Wysocki
2016-05-20 21:59         ` Kees Cook
2016-05-20 22:16           ` Kees Cook
     [not found]             ` <573FC081.20006@deltatee.com>
2016-05-21 16:39               ` Kees Cook
     [not found]                 ` <575A3E95.5090100@deltatee.com>
2016-06-10 18:09                   ` Kees Cook [this message]
2016-06-10 18:16                     ` Logan Gunthorpe
2016-06-10 18:18                       ` Kees Cook
2016-06-10 21:27                     ` Rafael J. Wysocki
2016-06-10 22:29                       ` Rafael J. Wysocki
2016-06-10 22:28                         ` Logan Gunthorpe
2016-06-10 22:33                           ` Rafael J. Wysocki
2016-06-11  0:13                             ` Rafael J. Wysocki
2016-06-11  1:47                               ` Rafael J. Wysocki
2016-06-11 11:48                                 ` Rafael J. Wysocki
2016-06-11 16:35                                   ` Logan Gunthorpe
2016-06-11 17:39                                 ` Logan Gunthorpe
2016-06-12  1:05                                   ` Rafael J. Wysocki
2016-06-12  4:48                                     ` Logan Gunthorpe
2016-06-12 14:31                                       ` Rafael J. Wysocki
2016-06-12 16:11                                         ` Logan Gunthorpe
2016-06-13 13:43                                           ` Rafael J. Wysocki
2016-06-10 22:11           ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGXu5jK15P5CAQd09NzG31YMPEKJGXbhXiYDQ2sRC=WBhAFTrA@mail.gmail.com' \
    --to=keescook@chromium.org \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=rafael@kernel.org \
    --cc=sds@tycho.nsa.gov \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.