Date: Thu, 9 Apr 2015 11:13:21 +0200
From: Ingo Molnar
To: Naoya Horiguchi
Cc: Borislav Petkov, Tony Luck, Prarit Bhargava, Vivek Goyal, linux-kernel@vger.kernel.org, Junichi Nomura, Kiyoshi Ueda
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump
Message-ID: <20150409091321.GA9811@gmail.com>
In-Reply-To: <20150409083908.GA25764@hori1.linux.bs1.fc.nec.co.jp>

* Naoya Horiguchi wrote:

> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> >
> > * Borislav Petkov wrote:
> >
> > > Btw, Ingo had some reservations about this. Ingo?
> >
> > Yeah, so my concerns are the following:
> >
> > > kexec disables (or "shoots down") all CPUs other than the crashing
> > > CPU before entering the 2nd kernel.
> > > However, MCA is still enabled, so if an MCE happens and broadcasts
> > > to the CPUs after the main thread starts the 2nd kernel (which might
> > > not have initialized its MCE handler yet, or might decide not to
> > > enable it), the MCE handler runs only on the other CPUs (not on the
> > > main thread), leading to a kernel panic during MCE synchronization.
> > > The user-visible effect of this bug is a kdump failure.
> >
> > So the thing is, when we boot up the second kernel there will be a
> > window where the old handler isn't valid (because the new kernel has
> > its own pagetables, etc.) and the new handler is not installed yet.
> >
> > If an MCE hits that window, it's bad luck (unless the bootup sequence
> > is rearchitected significantly to allow cross-kernel inheritance of
> > MCE handlers).
> >
> > So I think we can ignore _that_ race.
> >
> > The more significant question is: what happens when an MCE arrives
> > while the kdump is proceeding, as kdumps can take a long time to
> > finish when there's a lot of RAM?
>
> Without this patch, an MCE makes idling CPUs wake up and needlessly
> run the MCE handler, which disturbs memory and so harms the kdump.
> This patch improves not only the transition phase, but also that
> window.

The way the kdump code stops CPUs already 'disturbs' the state of those
CPUs.

> > But ... since the 'shootdown' is analogous to a CPU hotplug
> > CPU-down sequence, I suppose that the existing MCE code should
> > already properly handle the case where an MCE arrives on a
> > (supposedly) dead CPU, right?
>
> Currently not, so Tony mentioned an idea about it (although it is not
> included in this patch).
>
> > In that case installing a separate MCE handler looks like the
> > wrong thing.
>
> One difference between kdump and CPU offline is whether we need to
> handle MCEs at that point or not. In the CPU offline situation, running
> CPUs have to continue their normal operations, so it's important to
> handle MCEs (i.e.
> log and/or take recovery action), so I think that should be done in
> our main MCE handler, do_machine_check().

I disagree: if offline CPUs are still active and can produce MCEs, then
they should be reported regardless of whether they were shot down by the
CPU hotplug code or by kdump.

> But that's not the case in the kdump situation (logging or recovering
> is not possible/necessary any more). So it seems to make sense to me
> to separate the handler.

I disagree: for example, logging to the screen is still possible and
should be done if there's an uncorrectable error.

So I agree that MCE policy should be made non-fatal during kdump, but I
disagree that it needs a separate handler: the regular MCE handling
routines should handle kdump gracefully.

Thanks,

	Ingo
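The single-handler policy argued above can be sketched as a small model in C. This is a minimal sketch under stated assumptions, not kernel code: mce_decide(), struct mce_event, MCE_REPORT and MCE_PANIC are hypothetical names, the broadcast rendezvous is reduced to a count of CPUs that checked in, and normal-mode recovery is simplified to "panic on an uncorrectable error". What it shows is one code path whose policy changes during kdump: an incomplete rendezvous (because the crashing CPU is already running the 2nd kernel) or an uncorrectable error is reported rather than treated as fatal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout: a model of the policy, not kernel code. */
enum mce_action {
    MCE_REPORT, /* log the error (e.g. to the console) and keep running */
    MCE_PANIC,  /* stop the machine */
};

struct mce_event {
    bool uncorrectable;   /* hardware could not correct the error */
    bool kdump_active;    /* the crash kernel has been entered */
    int  cpus_checked_in; /* CPUs that reached the broadcast rendezvous */
    int  cpus_expected;   /* CPUs the handler waits for */
};

/* One handler for every case: kdump changes the policy, not the code path. */
enum mce_action mce_decide(const struct mce_event *ev)
{
    assert(ev->cpus_checked_in <= ev->cpus_expected);

    /* The CPU already running the 2nd kernel never checks in, so the
     * rendezvous can be incomplete even though nothing else is wrong. */
    bool synced = ev->cpus_checked_in == ev->cpus_expected;

    if (ev->kdump_active)
        return MCE_REPORT; /* non-fatal during kdump, but still reported */

    if (!synced)
        return MCE_PANIC;  /* rendezvous timeout in normal operation */

    /* Simplification: a real handler may attempt recovery instead. */
    return ev->uncorrectable ? MCE_PANIC : MCE_REPORT;
}
```

For example, with four CPUs expected and only three checked in, an uncorrectable event with kdump_active set yields MCE_REPORT, while the same incomplete rendezvous outside kdump yields MCE_PANIC.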