From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S932331AbbCEJmJ (ORCPT ); Thu, 5 Mar 2015 04:42:09 -0500
Received: from TYO202.gate.nec.co.jp ([210.143.35.52]:39102 "EHLO
 tyo202.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S932208AbbCEJmG convert rfc822-to-8bit (ORCPT );
 Thu, 5 Mar 2015 04:42:06 -0500
From: Naoya Horiguchi
To: Borislav Petkov
CC: "Luck, Tony", Prarit Bhargava, Vivek Goyal,
 "linux-kernel@vger.kernel.org", Junichi Nomura, Kiyoshi Ueda
Subject: Re: [PATCH v5] x86: mce: kexec: switch MCE handler for kexec/kdump
Thread-Topic: [PATCH v5] x86: mce: kexec: switch MCE handler for kexec/kdump
Thread-Index: AQHQVw/svlkiumClHUGHJf+lXEDEK50M/9iAgAALQgA=
Date: Thu, 5 Mar 2015 09:37:52 +0000
Message-ID: <20150305093752.GA11764@hori1.linux.bs1.fc.nec.co.jp>
References: <1425373306-26187-1-git-send-email-n-horiguchi@ah.jp.nec.com>
 <3908561D78D1C84285E8C5FCA982C28F329F5837@ORSMSX114.amr.corp.intel.com>
 <20150304074117.GA30501@hori1.linux.bs1.fc.nec.co.jp>
 <3908561D78D1C84285E8C5FCA982C28F329F835A@ORSMSX114.amr.corp.intel.com>
 <20150305012447.GA16001@hori1.linux.bs1.fc.nec.co.jp>
 <20150305064509.GA16012@hori1.linux.bs1.fc.nec.co.jp>
 <20150305085735.GE3915@pd.tnic>
In-Reply-To: <20150305085735.GE3915@pd.tnic>
Accept-Language: ja-JP, en-US
Content-Language: ja-JP
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.128.101.9]
Content-Type: text/plain; charset="iso-2022-jp"
Content-ID: <63EC43BAD664954DA2DDEAAE4244779F@gisp.nec.co.jp>
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 05, 2015 at 09:57:35AM +0100, Borislav Petkov wrote:
> On Thu, Mar 05, 2015 at 06:45:10AM +0000, Naoya Horiguchi wrote:
...
> >
> > Signed-off-by: Naoya Horiguchi
> > Cc: [2.6.32+]
>
> I don't think you can CC stable on something which looks like a new
> feature to me.

OK, I'll drop it.

> It is most likely that distros will pick it up separately.
>
> > ---
> > ChangeLog v4 -> v5:
> > - drop MCE_UC/AR_SEVERITY re-ordering
> > - move most of code to arch/x86/kernel/crash.c
> > - export some MCE internal variables/routines via arch/x86/include/asm/mce.h
> >
> > ChangeLog v3 -> v4:
> > - fixed AR and UC order in enum severity_level because UC is severer than AR
> >   by definition. Current code is not affected by this wrong order by chance.
> > - check severity in machine_check_under_kdump(), and call mce_panic() if the
> >   resultant severity is as bad as or worse than MCE_AR_SEVERITY.
> > - use static global variable kdump_cpu instead of mca_cfg->kdump_cpu
> > - reduce "#ifdef CONFIG_KEXEC"
> > - add "#ifdef CONFIG_X86_MCE" for declaration of machine_check_under_kdump()
> >   in mce.h
> > - update comment on switch_mce_handler_for_kdump()
> >
> > ChangeLog v2 -> v3
> > - go to "switch MCE handler" approach
> >
> > ChangeLog v1 -> v2
> > - clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using
> >   global flag to ignore MCE events.
> > - fixed the description of the problem
> > ---
> >  arch/x86/include/asm/mce.h                | 19 +++++++
> >  arch/x86/kernel/cpu/mcheck/mce-internal.h | 13 -----
> >  arch/x86/kernel/cpu/mcheck/mce.c          | 10 ++--
> >  arch/x86/kernel/crash.c                   | 87 +++++++++++++++++++++++++++++++
> >  4 files changed, 111 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 51b26e895933..fbb385611a14 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -248,4 +248,23 @@ struct cper_sec_mem_err;
> >  extern void apei_mce_report_mem_error(int corrected,
> >  				      struct cper_sec_mem_err *mem_err);
> >
> > +enum severity_level {
> > +	MCE_NO_SEVERITY,
> > +	MCE_DEFERRED_SEVERITY,
> > +	MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,
> > +	MCE_KEEP_SEVERITY,
> > +	MCE_SOME_SEVERITY,
> > +	MCE_AO_SEVERITY,
> > +	MCE_UC_SEVERITY,
> > +	MCE_AR_SEVERITY,
> > +	MCE_PANIC_SEVERITY,
> > +};
> > +
> > +int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
> > +
> > +extern void mce_panic(char *msg, struct mce *final, char *exp);
>
> mce_panic is doing a lot of MCE-specific stuff like flushing out mcelog
> etc. I don't think you need all that in your case - I think in your case
> you simply want to panic().

Doesn't print_mce() help?

> > +extern u64 mce_rdmsrl(u32 msr);
> > +extern void mce_wrmsrl(u32 msr, u64 v);
>
> Those wrap error injection. I don't think you need that either - use
> generic rd/wrmsr* functions.
>
> > +extern inline void mce_gather_info(struct mce *m, struct pt_regs *regs);
>
> This has a vm86 mode special case. Also probably not needed for you. You
> can simply read MSR_IA32_MCG_STATUS in a simplified, private version.
>
> > +extern void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
>
> Now this is exposing a really MCE-internal function. You probably should add a
>
> 	mce_callback(bank, m, regs);
>
> in a prepatch and call it from your code.

Currently quirk_no_way_out() is set only for a specific CPU model, so even
if we define another callback for the kdump code, wouldn't pointing it at a
real function still need to be done in an MCE-internal function,
__mcheck_cpu_apply_quirks()?

It seems that in this version I relied too much on reusing existing code,
so I'll do only what I really need. Then, if the model-specific behavior
(what quirk_sandybridge_ifu() does) doesn't affect
machine_check_under_kdump()'s behavior, is simply not copying this part
the right thing to do?

> With the above simplified versions used, the rest of the patch becomes
> almost trivial.

Other than that, I'm OK with rewriting it in the simplified form.

Thanks,
Naoya Horiguchi
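
For illustration only, the callback question above can be sketched in user
space as follows. This is not code from the patch. The enum is copied from
the quoted v5 diff; everything prefixed demo_ (the struct, the hook pointer,
the quirk and the check function) is a hypothetical stand-in, and the panic
threshold assumes the rule from the v3 -> v4 changelog (severity as bad as
or worse than MCE_AR_SEVERITY). The hook stays NULL unless model-specific
setup code installs it, so the kdump-side check has to tolerate both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the quoted v5 diff (UC still sorts below AR here). */
enum severity_level {
	MCE_NO_SEVERITY,
	MCE_DEFERRED_SEVERITY,
	MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,
	MCE_KEEP_SEVERITY,
	MCE_SOME_SEVERITY,
	MCE_AO_SEVERITY,
	MCE_UC_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,
};

/* Hypothetical stand-in for struct mce; only what the sketch needs. */
struct demo_mce {
	unsigned long long mcgstatus;
	int quirk_applied;
};

/*
 * Hypothetical hook. It is NULL by default and would only be set by
 * model-specific setup code, which is the point raised in the mail.
 */
static void (*demo_quirk_hook)(int bank, struct demo_mce *m);

/* Hypothetical model-specific quirk: only bank 0 is touched. */
static void demo_model_quirk(int bank, struct demo_mce *m)
{
	if (bank == 0)
		m->quirk_applied = 1;
}

/* Assumed rule: panic when severity is AR or worse. */
static int demo_should_panic(enum severity_level sev)
{
	return sev >= MCE_AR_SEVERITY;
}

/* Simplified kdump-side check: run the hook if present, then decide. */
static int demo_kdump_check(int bank, struct demo_mce *m,
			    enum severity_level sev)
{
	assert(m != NULL);
	if (demo_quirk_hook)
		demo_quirk_hook(bank, m);
	return demo_should_panic(sev);
}
```

With the hook left unset, demo_kdump_check() decides on severity alone. Once
setup code assigns demo_quirk_hook = demo_model_quirk, the same call also
runs the quirk first. Dropping the hook entirely corresponds to the "stop
copying this part" option asked about above.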