From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756989AbcBQJfW (ORCPT );
	Wed, 17 Feb 2016 04:35:22 -0500
Received: from mail-wm0-f52.google.com ([74.125.82.52]:35354 "EHLO
	mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756964AbcBQJfQ (ORCPT );
	Wed, 17 Feb 2016 04:35:16 -0500
Date: Wed, 17 Feb 2016 10:35:12 +0100
From: Ingo Molnar
To: Borislav Petkov
Cc: Andy Lutomirski, "linux-kernel@vger.kernel.org", X86 ML
Subject: Re: WARNING: CPU: 0 PID: 3031 at
	./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()
Message-ID: <20160217093511.GC19001@gmail.com>
References: <20160211192741.GG5565@pd.tnic>
	<20160212170010.GE4099@pd.tnic>
	<20160215191422.GB32716@pd.tnic>
	<20160217081646.GA32354@gmail.com>
	<20160217092911.GA2023@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160217092911.GA2023@pd.tnic>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Borislav Petkov wrote:

> On Wed, Feb 17, 2016 at 09:16:46AM +0100, Ingo Molnar wrote:
> > So I'm wondering why this started triggering only now. Is this a pre-existing bug
> > that somehow got triggered via:
> >
> >   58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
> >
> > ?
>
> Well, that's an interesting question. See, the thing is, I triggered
> this only *once* by accident and I haven't seen it ever since.
>
> The "reliable" "reproducer" I used to debug this was Andy's suggestion
> to stick a schedule() in __fpu__restore_sig().
>
> So the answer to that question is not easy.
>
> BUT(!), regardless, the bug still needs to be fixed because my tracing
> here

The fix is absolutely needed, I just would like deeper analysis about how it wasn't
seen before.
> > If yes then we need a plausible theory of how that never triggered on modern
> > Intel CPUs that had eagerfpu enabled for years.
>
> AFAICT, it triggers - and the window is very small at that - only on
> 32-bit. If at all.

So it probably triggers on vanilla v4.4 (or v4.5-rc4) as well, with no recent FPU
bits applied?

> I can certainly try to test all those but I don't have a reliable reproducer.
> The only thing I could do is check out each of those commits and stick a
> schedule() in __fpu__restore_sig() and see what happens.
>
> But if my analysis above is right, none of those would matter because of the
> mechanism of how the warn happens...

So if you stick a schedule() into vanilla and it triggers then I think we can
declare it an existing bug. (and then the fix also needs Cc: stable)

Thanks,

	Ingo