From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423076AbcBQRxK (ORCPT ); Wed, 17 Feb 2016 12:53:10 -0500 Received: from mail-ob0-f169.google.com ([209.85.214.169]:36270 "EHLO mail-ob0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1423027AbcBQRxD (ORCPT ); Wed, 17 Feb 2016 12:53:03 -0500 MIME-Version: 1.0 In-Reply-To: <20160217092911.GA2023@pd.tnic> References: <20160211192741.GG5565@pd.tnic> <20160212170010.GE4099@pd.tnic> <20160215191422.GB32716@pd.tnic> <20160217081646.GA32354@gmail.com> <20160217092911.GA2023@pd.tnic> From: Andy Lutomirski Date: Wed, 17 Feb 2016 09:52:41 -0800 Message-ID: Subject: Re: WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130() To: Borislav Petkov Cc: X86 ML , "linux-kernel@vger.kernel.org" , Ingo Molnar Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Feb 17, 2016 1:29 AM, "Borislav Petkov" wrote: > > On Wed, Feb 17, 2016 at 09:16:46AM +0100, Ingo Molnar wrote: > > So I'm wondering why this started triggering only now. Is this a pre-existing bug > > that somehow got triggered via: > > > > 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs > > > > ? > > Well, that's an interesting question. See, the thing is, I triggered > this only *once* by accident and I haven't seen it ever since. > > The "reliable" "reproducer" I used to debug this was Andy's suggestion > to stick a schedule() in __fpu__restore_sig(). > > So the answer to that question is not easy. > > BUT(!), regardless, the bug still needs to be fixed because my tracing > here > > https://lkml.kernel.org/r/20160215191422.GB32716@pd.tnic > > showed that getting preempted after setting > > fpu->fpstate_active = 1; > > leads to the WARN. Because - and please doublecheck me on that - when > we're in __switch_to() and the task which already has ->fpstate_active > set and it is the next task to which we're going to switch to, when it > enters switch_fpu_prepare(), it does: > > fpu.preload = static_cpu_has(X86_FEATURE_FPU) && > new_fpu->fpstate_active && > ^^^^^^^^^^^^^^^^^^^^^^^ > > so that fpu.preload is set now. > > A bit later in that same function: > > /* Don't change CR0.TS if we just switch! */ > if (fpu.preload) { > new_fpu->counter++; > __fpregs_activate(new_fpu); > ^^^^^^^^^^^^^^^^^ > > ->fpregs_active gets set here and when the task returns to > __fpu__restore_sig(), fpu__restore() sets it again, leading to the WARN. > > Mind you, this happens on 32-bit only because there we sigreturn with > irqs enabled. Look at the call trace. Not quite. IRQs are on in the 64-bit case, too, but the problematic code doesn't run because the 64-bit sigreturn case doesn't need to fiddle with the fpu layout. --Andy