Date: Fri, 6 Mar 2015 09:34:21 +0100
From: Borislav Petkov
To: Naoya Horiguchi
Cc: "Luck, Tony", Prarit Bhargava, Vivek Goyal, "linux-kernel@vger.kernel.org", Junichi Nomura, Kiyoshi Ueda
Subject: Re: [PATCH v6] x86: mce: kexec: switch MCE handler for kexec/kdump
Message-ID: <20150306083421.GD3514@pd.tnic>
In-Reply-To: <20150306025911.GA3619@hori1.linux.bs1.fc.nec.co.jp>

On Fri, Mar 06, 2015 at 02:59:13AM +0000, Naoya Horiguchi wrote:
> From 8890e9976c525a4b480bf5f86008641688de8c11 Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi
> Date: Fri, 6 Mar 2015 11:52:10 +0900
> Subject: [PATCH v6] x86: mce: kexec: switch MCE handler for kexec/kdump
>
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel.
> But the MCE handler is still enabled after that,
> so if MCE happens and broadcasts over the CPUs after the main thread starts
> the 2nd kernel (which might not initialize MCE device yet, or might decide
> not to enable it,) MCE handler runs only on the other CPUs (not on the main
> thread,) leading to kernel panic with MCE synchronization. The user-visible
> effect of this bug is kdump failure.
>
> Our standard MCE handler do_machine_check() assumes some about system's
> status and it's hard to alter it to cover kexec/kdump context, so let's add
> another kdump-specific one and switch to it.
>
> Note that this problem exists since current MCE handler was implemented in
> 2.6.32, and recently commit 716079f66eac ("mce: Panic when a core has reached
> a timeout") made it more visible by changing the default behavior of the
> synchronization timeout from "ignore" to "panic".
>
> Signed-off-by: Naoya Horiguchi

...

> +static void machine_check_under_kdump(struct pt_regs *regs, long error_code)
> +{
> +	struct mce m = {};
> +	char *msg = NULL;
> +	char *nmsg = NULL;
> +	int i;
> +	int worst = 0;
> +	int severity;
> +	int ret;

if you do here

	if (mce_cfg.disabled)
		return;

you can use the simple rdmsrl variants and not the _safe() ones with
exception handling.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
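
To illustrate the suggestion above, here is a minimal userspace sketch of the guard pattern: bail out first when machine checks are disabled, then read the bank status registers with a plain accessor and no per-read error handling. This is not the kernel code from the patch. Every name in it (mce_cfg_sketch, fake_status, read_status_raw, scan_banks_under_kdump, NBANKS) is a hypothetical stand-in, and the MSRs are modelled by an ordinary array. The mail writes mce_cfg; as far as I know the mainline struct is called mca_cfg, so treat the name as an assumption too. The only hardware fact relied on is that bit 63 of an IA32_MCi_STATUS register is the VAL bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MCE configuration state. */
struct mce_config_sketch {
	bool disabled;
};
static struct mce_config_sketch mce_cfg_sketch;

#define NBANKS     4            /* assumed bank count, for illustration only */
#define STATUS_VAL (1ULL << 63) /* VAL bit of IA32_MCi_STATUS */

/* Pretend bank status registers; a real handler would read MSRs instead. */
static uint64_t fake_status[NBANKS];
static int raw_reads; /* counts how often the plain accessor was used */

/* Stand-in for a plain rdmsrl-style read: no fault handling, no error code. */
static uint64_t read_status_raw(int bank)
{
	assert(bank >= 0 && bank < NBANKS);
	raw_reads++;
	return fake_status[bank];
}

/*
 * Sketch of the handler prologue: return early when machine checks are
 * disabled, so the loop below never touches a register in that case and
 * does not need a _safe()-style variant with a return value to check.
 * Returns the number of banks with a valid error, or -1 if disabled.
 */
static int scan_banks_under_kdump(void)
{
	int valid = 0;

	if (mce_cfg_sketch.disabled)
		return -1;

	for (int i = 0; i < NBANKS; i++) {
		if (read_status_raw(i) & STATUS_VAL)
			valid++;
	}
	return valid;
}
```

The point of the design is that the check is done once at the top instead of after every read, which keeps the body of the handler free of error paths. Whether that guard is sufficient on real hardware is for the patch discussion to settle; the sketch only shows the control flow.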