From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030184AbcG1PRM (ORCPT ); Thu, 28 Jul 2016 11:17:12 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45478 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751660AbcG1PRL (ORCPT ); Thu, 28 Jul 2016 11:17:11 -0400 Date: Thu, 28 Jul 2016 10:17:07 -0500 From: Josh Poimboeuf To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , Ingo Molnar , Borislav Petkov , Pavel Machek , Linux PM list , Linux Kernel Mailing List , Thomas Gleixner , shuzzle@mailbox.org Subject: [PATCH] x86/asm/power: Fix hibernation return address corruption Message-ID: <20160728151707.nmtkzri4jtumaq6h@treble> References: <16541580.dFLT14ScxF@vostro.rjw.lan> <20160727221738.ilmm6mnvlmzmvfnt@treble> <1703386.0Oti4h65x8@vostro.rjw.lan> <1670639.GCDubuBUSI@vostro.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1670639.GCDubuBUSI@vostro.rjw.lan> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 28 Jul 2016 15:17:10 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 28, 2016 at 01:29:49AM +0200, Rafael J. Wysocki wrote: > On Thursday, July 28, 2016 01:20:53 AM Rafael J. Wysocki wrote: > > On Wednesday, July 27, 2016 05:17:38 PM Josh Poimboeuf wrote: > > > On Thu, Jul 28, 2016 at 12:12:15AM +0200, Rafael J. Wysocki wrote: > > > > On Wednesday, July 27, 2016 12:59:18 PM Josh Poimboeuf wrote: > > > > > Hm... I have a theory, but I'm not sure about it. I noticed that > > > > > x86_acpi_enter_sleep_state(), > > > > > > > > I think you mean x86_acpi_suspend_lowlevel(). > > > > > > Oops! > > > > > > > > which is involved in suspend, overwrites > > > > > several global variables (e.g, initial_code) which are used by the CPU > > > > > boot code in head_64.S. But surprisingly, it doesn't restore those > > > > > variables to their original values after it resumes. > > > > > > > > Is the head_64.S code also used to bring up offline CPUs? > > > > > > Yes. > > > > OK > > > > So it is really interesting why and how that stuff works for everybody. > > > > Basically, CPU online should fail after a suspend-resume cycle, but it > > doesn't most of the time AFAICS. > > do_boot_cpu() restores those values, so I think we're safe from that angle. > > That should apply to the CPU online during resume from hibernation too. Yeah, my theory was bogus. And as it turns out, the bug reporter made a mistake in the bisect. The actual offending commit was apparently: ef0f3ed5a4ac ("x86/asm/power: Create stack frames in hibernate_asm_64.S") Amazingly enough, I authored that patch as well. I think "git bisect" doesn't like me! Here's the fix: ---- From: Josh Poimboeuf Subject: [PATCH] x86/asm/power: Fix hibernation return address corruption In kernel bug 150021, a kernel panic was reported when restoring a hibernate image. Only a picture of the oops was reported, so I can't paste the whole thing here. But here are the most interesting parts: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffff8804615cfd78 ... RIP: ffff8804615cfd78 RSP: ffff8804615f0000 RBP: ffff8804615cfdc0 ... Call Trace: do_signal+0x23 exit_to_usermode_loop+0x64 ... The RIP is on the same page as RBP, so it apparently started executing on the stack. The bug was bisected to commit ef0f3ed5a4ac ("x86/asm/power: Create stack frames in hibernate_asm_64.S"), which in retrospect seems quite dangerous, since that code saves and restores the stack pointer from a global variable ('saved_context'). There are a lot of moving parts in the hibernate save and restore paths, so I don't know exactly what caused the panic. Presumably, a FRAME_END was executed without the corresponding FRAME_BEGIN, or vice versa. That would corrupt the return address on the stack and would be consistent with the details of the above panic. Instead of doing the frame pointer save/restore around the bounds of the affected functions, instead just do it around the call to swsusp_save(). That has the same effect of ensuring that if swsusp_save() sleeps, the frame pointers will be correct. It's also a much more obviously safe way to do it than the original patch. And objtool still doesn't report any warnings. Fixes: ef0f3ed5a4ac ("x86/asm/power: Create stack frames in hibernate_asm_64.S") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=150021 Reported-by: Tested-by: Cc: Signed-off-by: Josh Poimboeuf --- arch/x86/power/hibernate_asm_64.S | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S index 3177c2b..8eee0e9 100644 --- a/arch/x86/power/hibernate_asm_64.S +++ b/arch/x86/power/hibernate_asm_64.S @@ -24,7 +24,6 @@ #include ENTRY(swsusp_arch_suspend) - FRAME_BEGIN movq $saved_context, %rax movq %rsp, pt_regs_sp(%rax) movq %rbp, pt_regs_bp(%rax) @@ -48,6 +47,7 @@ ENTRY(swsusp_arch_suspend) movq %cr3, %rax movq %rax, restore_cr3(%rip) + FRAME_BEGIN call swsusp_save FRAME_END ret @@ -104,7 +104,6 @@ ENTRY(core_restore_code) /* code below belongs to the image kernel */ .align PAGE_SIZE ENTRY(restore_registers) - FRAME_BEGIN /* go back to the original page tables */ movq %r9, %cr3 @@ -145,6 +144,5 @@ ENTRY(restore_registers) /* tell the hibernation core that we've just restored the memory */ movq %rax, in_suspend(%rip) - FRAME_END ret ENDPROC(restore_registers) -- 2.7.4