From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756848AbcBQIQw (ORCPT ); Wed, 17 Feb 2016 03:16:52 -0500
Received: from mail-wm0-f46.google.com ([74.125.82.46]:36070 "EHLO
	mail-wm0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756504AbcBQIQu (ORCPT );
	Wed, 17 Feb 2016 03:16:50 -0500
Date: Wed, 17 Feb 2016 09:16:46 +0100
From: Ingo Molnar
To: Andy Lutomirski
Cc: Borislav Petkov, "linux-kernel@vger.kernel.org", X86 ML
Subject: Re: WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()
Message-ID: <20160217081646.GA32354@gmail.com>
References: <20160211192741.GG5565@pd.tnic> <20160212170010.GE4099@pd.tnic> <20160215191422.GB32716@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Andy Lutomirski wrote:

> On Feb 15, 2016 12:14 PM, "Borislav Petkov" wrote:
> >
> > ---
> > From: Borislav Petkov
> > Date: Mon, 15 Feb 2016 19:50:33 +0100
> > Subject: [RFC PATCH] x86/FPU: Fix double FPU regs activation
> >
> > On the entry_INT80_32->do_syscall_32_irqs_on path on 32-bit we run with
> > interrupts enabled.
>
> I would change this a little bit.
>
> sys_sigreturn calls fpu__restore_sig with interrupts enabled. When
> restoring a 32-bit signal frame, it can happen that...
>
> > And it can happen that we get preempted right after
> > setting ->fpstate_active in a task's FPU.
> >
> > After we get preempted, we switch between tasks merrily and eventually
> > are about to switch to that task above whose ->fpstate_active we
> > set. We enter __switch_to() and do switch_fpu_prepare(). Our task gets
> > ->fpregs_active set, we find ourselves back on the call stack below and
> > especially in __fpu__restore_sig() which sets ->fpregs_active again.
> >
> > Leading to that whoops below.

So I'm wondering why this started triggering only now. Is this a pre-existing bug
that somehow got triggered via:

  58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs

?

If yes then we need a plausible theory of how that never triggered on modern Intel
CPUs that had eagerfpu enabled for years.

Or perhaps was it caused by one of the other changes in tip:x86/fpu:

  c6ab109f7e0e x86/fpu: Speed up lazy FPU restores slightly
  a20d7297045f x86/fpu: Fold fpu_copy() into fpu__copy()
  5ed73f40735c x86/fpu: Fix FNSAVE usage in eagerfpu mode
  4ecd16ec7059 x86/fpu: Fix math emulation in eager fpu mode

?

Which would make this a recently introduced regression.

Thanks,

	Ingo