From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935176AbdK2R6s (ORCPT ); Wed, 29 Nov 2017 12:58:48 -0500 Received: from mail.kernel.org ([198.145.29.99]:54296 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935063AbdK2R6F (ORCPT ); Wed, 29 Nov 2017 12:58:05 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1752121998 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: AGs4zMarZF8qNJS6IhJ9RiXcxOQ75wSq1KKcNk/4x+wNumxIzSbz13XSuJ4I/KeuQTJQWgH+nEXqTBDnBR1mPjusHhM= MIME-Version: 1.0 In-Reply-To: <20171129070951.hjjjpbyilzaak4ig@gmail.com> References: <0fede9f9-88b0-a6e7-1027-dfb2019b8ef2@linux.intel.com> <20171129070951.hjjjpbyilzaak4ig@gmail.com> From: Andy Lutomirski Date: Wed, 29 Nov 2017 09:57:43 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] x86/entry/64: Fix native_load_gs_index() SWAPGS handling with IRQ state tracing enabled To: Ingo Molnar Cc: Jarkko Nikula , linux-kernel , Andy Lutomirski , Thomas Gleixner , Peter Zijlstra , Linus Torvalds , Borislav Petkov Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 28, 2017 at 11:09 PM, Ingo Molnar wrote: > > * Jarkko Nikula wrote: > >> Hi >> >> Suspend-to-ram and resume stopped working on v4.15-rc1 and I bisected it to >> commit ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to >> native_load_gs_index()"). >> >> I noticed it on Intel Kabylake (core) and Apollolake (atom) based prototype >> machines. Symptoms are that machine appears to enter into suspend but >> resumes instantly and hangs. Unfortunately no logs. >> >> If I revert ca37e57bbe0c on v4.15-rc1 it works as expected. > > Hm, that commit looks broken with irq-tracing enabled. > Does the patch below fix it? > > In fact the exception handler itself appears to have broken GS handling as well - > I suspect it never triggers in practice, because it was broken forever. > > Andy, do you concur? No. > > On a related note, we should definitely extend the 'intended GS state' annotation > comments I did in this patch to all SWAPGS instances - this way code review has a > much higher chance of finding discrepancies between intent and actual code. Agreed. I'll send a patch. > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -945,16 +945,16 @@ idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0 > */ > ENTRY(native_load_gs_index) > FRAME_BEGIN > + SWAPGS /* switch from user GS to kernel GS */ No, we start with kernel GS. It was correct before. > pushfq > DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) > TRACE_IRQS_OFF > - SWAPGS > .Lgs_change: > movl %edi, %gs > 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE > - SWAPGS > TRACE_IRQS_FLAGS (%rsp) > popfq > + SWAPGS /* switch from kernel GS to user GS */ > FRAME_END > ret > ENDPROC(native_load_gs_index) > @@ -964,7 +964,7 @@ EXPORT_SYMBOL(native_load_gs_index) > .section .fixup, "ax" > /* running with kernelgs */ > bad_gs: > - SWAPGS /* switch back to user gs */ > + SWAPGS /* switch back to user GS, to modify GS */ > .macro ZAP_GS > /* This can't be a string because the preprocessor needs to see it. */ > movl $__USER_DS, %eax > @@ -973,6 +973,7 @@ EXPORT_SYMBOL(native_load_gs_index) > ALTERNATIVE "", "ZAP_GS", X86_BUG_NULL_SEG > xorl %eax, %eax > movl %eax, %gs > + SWAPGS /* switch to kernel GS again before continuing */ Which we don't want to do because the landing site expects user GS. I suspect we're hitting an entirely different bug, that we're blowing up if we WARN too early in resume.