From: "Rafael J. Wysocki"
To: Ingo Molnar
Cc: Logan Gunthorpe, Stephen Smalley, Kees Cook, Ingo Molnar, "the arch/x86 maintainers", "linux-pm@vger.kernel.org", Linux Kernel Mailing List, Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst
Subject: Re: PROBLEM: Resume form hibernate broken by setting NX on gap
Date: Fri, 20 May 2016 13:34:27 +0200
In-Reply-To: <20160520071517.GB14191@gmail.com>
References: <573DF82D.50006@deltatee.com> <20160520071517.GB14191@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar wrote:
>
> * Logan Gunthorpe wrote:
>
>> Hi,
>>
>> I have been working on a bug that causes my laptop to freeze during
>> resume from hibernation. I did a bisect to find the offending commit:
>>
>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>
>> There is more information in the bugzilla report [1] that
>> I've been working on but I will summarize things below.
>>
>> I've experienced intermittent but reproducible freezes when resuming
>> from hibernation since about kernel version 3.19. The freeze was
>> significantly more reproducible when a few applications were loaded
>> before hibernation and would largely not happen if hibernated
>> immediately after booting to a desktop.
>> I did some tracing work to find
>> that the kernel gets as far as the resume_image call in
>> swsusp_arch_resume and I could not find any response from the image
>> kernel when I hit the bug. I also did testing that seemed to rule out
>> this being caused by a problematic driver.
>>
>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>> Then, I did a second bisect with a ported version of the fix to the
>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>> with what appears to be the exact same symptoms. Reverting that commit
>> in recent kernels up to and including 4.6 fixes the issue and restores
>> reliable hibernation. However, it's not at all clear to me why that
>> commit would cause this issue or how to fix the issue without reverting.
>
> I've attached that commit below and also Cc:-ed a few more people who might have
> an idea about why this regressed. Worst-case we'll have to revert it.

Without looking deep into mm, my theory would be that after this patch
the final jump from the boot kernel to the image kernel's trampoline
code during resume may crash the kernel if the trampoline page turns
out to be NX in the boot kernel (it has to be executable in both the
boot and the image kernels).

Thanks,
Rafael