From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753114AbbETJ2F (ORCPT );
	Wed, 20 May 2015 05:28:05 -0400
Received: from cantor2.suse.de ([195.135.220.15]:56599 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752331AbbETJ2D (ORCPT );
	Wed, 20 May 2015 05:28:03 -0400
Date: Wed, 20 May 2015 11:28:00 +0200
From: Borislav Petkov
To: "Chen, Gong" , tony.luck@intel.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4 Rebase] x86, MCE: Avoid potential deadlock in MCE context
Message-ID: <20150520092800.GB3645@pd.tnic>
References: <1432150538-3120-1-git-send-email-gong.chen@linux.intel.com>
	<1432150538-3120-5-git-send-email-gong.chen@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1432150538-3120-5-git-send-email-gong.chen@linux.intel.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 20, 2015 at 03:35:38PM -0400, Chen, Gong wrote:
> Printing in MCE context is a no-no, currently, as printk is not
> NMI-safe. If some of the notifiers on the MCE chain call *printk*, we
> may deadlock. In order to avoid that, delay printk into process context
> to fix it.
>
> Background info at: https://lkml.org/lkml/2014/6/27/26
>
> Reported-by: Xie XiuQi
> Signed-off-by: Chen, Gong
> Link: http://lkml.kernel.org/r/1406797523-28710-6-git-send-email-gong.chen@linux.intel.com
> [ Boris: rewrite a bit. ]
> Signed-off-by: Borislav Petkov
> ---
>  arch/x86/include/asm/mce.h               | 1 +
>  arch/x86/kernel/cpu/mcheck/mce-apei.c    | 2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c         | 8 ++++++--
>  arch/x86/kernel/cpu/mcheck/mce_intel.c   | 1 -
>  arch/x86/kernel/cpu/mcheck/therm_throt.c | 1 +
>  arch/x86/kernel/cpu/mcheck/threshold.c   | 1 +
>  6 files changed, 10 insertions(+), 4 deletions(-)

....

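The quoted commit message describes deferring printk out of MCE context. As a rough illustration of that general pattern only, here is a minimal userspace sketch, not the kernel code from the patch: the restricted context pushes an event into a lock-free single-producer/single-consumer ring without printing, and a later, safe context drains the ring and reports. All names (event_ring, ring_push, ring_drain, print_event, demo_ring, RING_SIZE) are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 8u /* hypothetical capacity */

/* Index arithmetic below relies on unsigned wraparound, which needs a power of two. */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct event_ring {
    int slots[RING_SIZE];
    atomic_uint head; /* next slot to write, advanced by the producer only */
    atomic_uint tail; /* next slot to read, advanced by the consumer only */
};

/* Restricted context: takes no locks and does no printing. */
static bool ring_push(struct event_ring *r, int event)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false; /* full: drop the event rather than block */

    r->slots[head % RING_SIZE] = event;
    /* Publish the slot contents before the new head becomes visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Safe context: reports everything queued so far and returns the count. */
static size_t ring_drain(struct event_ring *r, void (*report)(int event))
{
    size_t n = 0;
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    while (tail != head) {
        report(r->slots[tail % RING_SIZE]);
        tail++;
        n++;
    }
    /* Hand the drained slots back to the producer. */
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}

/* Example reporter: printing happens only here, never in ring_push(). */
static void print_event(int event)
{
    printf("deferred event %d\n", event);
}

static struct event_ring demo_ring; /* static storage, so it starts out empty */
```

A caller would invoke ring_push(&demo_ring, code) from the handler and ring_drain(&demo_ring, print_event) later from ordinary context. Dropping on a full ring is a choice made for this sketch; the point is only that the producer side never blocks and never prints.
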
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 1af51b1586d7..2733f275237d 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -427,6 +427,7 @@ static inline void __smp_thermal_interrupt(void)
>  {
>  	inc_irq_stat(irq_thermal_count);
>  	smp_thermal_vector();
> +	mce_queue_irq_work();

Hmm, at a second glance, this looks wrong. I think we should do that
call in intel_thermal_interrupt().

>  asmlinkage __visible void smp_thermal_interrupt(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
> index 7245980186ee..d695faa234eb 100644
> --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> @@ -22,6 +22,7 @@ static inline void __smp_threshold_interrupt(void)
>  {
>  	inc_irq_stat(irq_threshold_count);
>  	mce_threshold_vector();
> +	mce_queue_irq_work();

Same here. mce_queue_irq_work() call should be issued in both AMD and
Intel threshold handlers but not in the generic one which is unlikely to
queue any MCE... Right?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
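
The placement question raised in the reply above can be sketched in plain userspace C. This is only an illustration of the shape of the argument, not kernel code: the generic entry point dispatches through a function pointer, the default handler logs nothing and so requests no deferred work, and only a vendor-specific handler that actually logs an event requests it. Every name here (pending_events, work_queued, queue_deferred_work, default_threshold_handler, vendor_threshold_handler, threshold_vector, threshold_interrupt_entry) is hypothetical; queue_deferred_work() is merely a stand-in for the role mce_queue_irq_work() plays in the patch.

```c
#include <stdio.h>

static int pending_events; /* events logged but not yet reported */
static int work_queued;    /* number of times deferred work was requested */

/* Hypothetical stand-in for requesting deferred reporting work. */
static void queue_deferred_work(void)
{
    work_queued++;
}

/* Default handler: nothing is logged, so there is nothing to defer. */
static void default_threshold_handler(void)
{
}

/* Vendor handler: logs an event, then asks for the deferred work itself. */
static void vendor_threshold_handler(void)
{
    pending_events++;
    queue_deferred_work();
}

/* Dispatch pointer; a vendor setup routine would install its own handler. */
static void (*threshold_vector)(void) = default_threshold_handler;

/* Generic entry point: dispatches only, with no deferred-work request here. */
static void threshold_interrupt_entry(void)
{
    threshold_vector();
}
```

With the request placed in the vendor handler, an interrupt that reaches the default handler leaves work_queued untouched, whereas a request in threshold_interrupt_entry() would fire on every interrupt regardless of whether anything was logged.
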