From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2992443AbcB0NNq (ORCPT <rfc822;w@1wt.eu>);
	Sat, 27 Feb 2016 08:13:46 -0500
Received: from mail.skyhub.de ([78.46.96.112]:51136 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756241AbcB0NNp (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Sat, 27 Feb 2016 08:13:45 -0500
Date: Sat, 27 Feb 2016 14:13:37 +0100
From: Borislav Petkov <bp@alien8.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: kernel test robot <ying.huang@linux.intel.com>,
        Andy Lutomirski <luto@kernel.org>, lkp@01.org,
        LKML <linux-kernel@vger.kernel.org>,
        yu-cheng yu <yu-cheng.yu@intel.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>,
        Rik van Riel <riel@redhat.com>,
        Quentin Casasnovas <quentin.casasnovas@oracle.com>,
        Peter Zijlstra <peterz@infradead.org>, Oleg Nesterov <oleg@redhat.com>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        "H. Peter Anvin" <hpa@zytor.com>, Fenghua Yu <fenghua.yu@intel.com>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        Andy Lutomirski <luto@amacapital.net>
Subject: Re: [lkp] [x86/fpu] 58122bf1d8: WARNING: CPU: 0 PID: 1 at
 arch/x86/include/asm/fpu/internal.h:529 fpu__restore+0x28f/0x9ab()
Message-ID: <20160227131337.GB5261@pd.tnic>
References: <87d1rk9str.fsf@yhuang-dev.intel.com>
 <20160226074940.GA28911@pd.tnic>
 <20160227120211.GA25164@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160227120211.GA25164@gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 27, 2016 at 01:02:11PM +0100, Ingo Molnar wrote:
> So I'm wondering, why did this commit:
> 
>   58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
> 

Hmm, so looking at switch_fpu_prepare():

        /*
         * If the task has used the math, pre-load the FPU on xsave processors
         * or if the past 5 consecutive context-switches used math.
         */
        fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
                      new_fpu->fpstate_active &&
                      (use_eager_fpu() || new_fpu->counter > 5);
                       ^^^^^^^^^^^^^^^

and later:

        if (old_fpu->fpregs_active) {

                ...

                /* Don't change CR0.TS if we just switch! */
                if (fpu.preload) {
                        ...
                        __fpregs_activate(new_fpu);


so I can see a possible link between 58122bf1d856 and what we're seeing.
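
To spell out the link I mean, here is a small userspace sketch of the
preload check quoted above. It is not kernel code: struct fpu_model,
preload_decision() and preload_selfcheck() are names made up for
illustration, and the has_fpu/eager booleans are assumed stand-ins for
static_cpu_has(X86_FEATURE_FPU) and use_eager_fpu().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two struct fpu fields the check reads. */
struct fpu_model {
	bool fpstate_active;	/* task has used the FPU */
	unsigned int counter;	/* consecutive switches that used math */
};

/*
 * Same boolean shape as the quoted fpu.preload expression.
 * has_fpu and eager are assumptions standing in for
 * static_cpu_has(X86_FEATURE_FPU) and use_eager_fpu().
 */
bool preload_decision(bool has_fpu, bool eager,
		      const struct fpu_model *new_fpu)
{
	return has_fpu &&
	       new_fpu->fpstate_active &&
	       (eager || new_fpu->counter > 5);
}

/* Illustrative check: eager mode makes the counter irrelevant. */
void preload_selfcheck(void)
{
	struct fpu_model t = { .fpstate_active = true, .counter = 0 };

	assert(!preload_decision(true, false, &t));	/* lazy, cold counter */
	assert(preload_decision(true, true, &t));	/* eager, cold counter */

	t.counter = 6;
	assert(preload_decision(true, false, &t));	/* lazy, warm counter */
}
```

With eager set, any task with fpstate_active gets preloaded no matter
what the counter says, so the fpu.preload branch is taken on switches
where a lazy setup would have skipped it. That is the only difference
this sketch shows; it says nothing about where the warning comes from.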

But as I told you offlist, I couldn't confirm that this commit is the
culprit because my reproducer is only a simulated one. So I'm thinking
the 0day guys have a more reliable one.

> trigger the warning, while it never triggered on CPUs that were already 
> eagerfpu=on for years?

That I can't explain... yet.

FWIW, the one time I saw the splat, it happened on a 32-bit IVB machine,
which has always been eagerfpu=on.

> There must be something we are still missing I think.

Yeah.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.