Date: Mon, 7 Nov 2016 10:55:24 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, rt@linutronix.de, linux-edac@vger.kernel.org, x86@kernel.org, Thomas Gleixner
Subject: Re: [PATCH 22/25] x86/mcheck: Do the init in one place
Message-ID: <20161107185524.GA2536@intel.com>
In-Reply-To: <20161107184532.xj6wzdjlzwhshcmf@pd.tnic>
References: <20161103145021.28528-1-bigeasy@linutronix.de> <20161103145021.28528-23-bigeasy@linutronix.de> <20161107184532.xj6wzdjlzwhshcmf@pd.tnic>

On Mon, Nov 07, 2016 at 07:45:32PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2016 at 03:50:18PM +0100, Sebastian Andrzej Siewior wrote:
> > Part of the init (memory allocation and so on) is done in
> > mcheck_cpu_init(). Moving the allocation to mcheck_init_device()
> > (where the hotplug calls are initialized) makes it necessary to move
> > the callback (mcheck_cpu_init()) there, too.
> >
> > The callback is now removed from identify_cpu() and registered as a
> > hotplug callback. It is invoked as the very first one, which is shortly
> > after the original point of invocation (see smp_store_cpu_info() and
> > notify_cpu_starting() in smp_callin()).
> >
> > One "visible" difference is that MCE for the boot CPU is not enabled at
> > identify_boot_cpu() time but at device_initcall_sync() time. In both
> > cases, userland is not running yet.
>
> Uh, hm, I'm not sure about this. The issue I see is that the longer we
> delay enabling MCE reporting, and especially setting CR4[MCE], the
> wider the window in which an MCE during early boot causes a shutdown
> (which is what happens if CR4[MCE]=0b).
>
> Perhaps we should split the init into a very early part, which doesn't
> need to be part of hotplug, and the rest, which can do mce_disable_cpu()
> and mce_reenable_cpu().
>
> Tony, how do you see this?

I don't think that helps as much as you'd like it to (at least on Intel).
A broadcast machine check that finds the boot CPU has set CR4[MCE]=1 is
still going to end up in reset if any other CPU still has CR4[MCE]=0.

-Tony