linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] printk: ignore recursion_bug flag when MCE in progress
@ 2012-05-23  1:58 ShuoX Liu
  2012-05-23 10:01 ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-23  1:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: andi, Andrew Morton, Yanmin Zhang

From: ShuoX Liu <shuox.liu@intel.com>

When MCE happens in printk, we ignore recursion_bug to make sure
some MCE logs printed out. Re-use mce_entry variable.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
I found mce_entry was introduced by commit 553f265f, but it's not
used now. Why not removed?
---
 arch/x86/include/asm/mce.h       |    2 --
 arch/x86/kernel/cpu/mcheck/mce.c |    2 --
 include/linux/kernel.h           |    1 +
 kernel/printk.c                  |    4 +++-
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 441520e..aeda4cc 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
 
-extern atomic_t mce_entry;
-
 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 11c9166..6073354 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
 
 #define SPINUNIT 100	/* 100ns */
 
-atomic_t mce_entry;
-
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
 /*
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 645231c..24af685 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
 extern void bust_spinlocks(int yes);
 extern void wake_up_klogd(void);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
+extern atomic_t mce_entry;
 extern int panic_timeout;
 extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
diff --git a/kernel/printk.c b/kernel/printk.c
index 473afdb..2bae087 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -79,6 +79,7 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+atomic_t mce_entry;
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!oops_in_progress && !atomic_read(&mce_entry)
+				&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] printk: ignore recursion_bug flag when MCE in progress
  2012-05-23  1:58 [PATCH] printk: ignore recursion_bug flag when MCE in progress ShuoX Liu
@ 2012-05-23 10:01 ` Borislav Petkov
  2012-05-24  0:38   ` Yanmin Zhang
  2012-05-24  5:59   ` [PATCH v2] printk: ignore recursion_bug flag in HW error handle process ShuoX Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-05-23 10:01 UTC (permalink / raw)
  To: ShuoX Liu; +Cc: linux-kernel, andi, Andrew Morton, Yanmin Zhang, Tony Luck

+ Tony

On Wed, May 23, 2012 at 09:58:34AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> When MCE happens in printk, we ignore recursion_bug to make sure
> some MCE logs printed out. Re-use mce_entry variable.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
> I found mce_entry was introduced by commit 553f265f, but it's not
> used now. Why not removed?
> ---
>  arch/x86/include/asm/mce.h       |    2 --
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 --
>  include/linux/kernel.h           |    1 +
>  kernel/printk.c                  |    4 +++-
>  4 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 441520e..aeda4cc 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
>  DECLARE_PER_CPU(unsigned, mce_exception_count);
>  DECLARE_PER_CPU(unsigned, mce_poll_count);
>  
> -extern atomic_t mce_entry;
> -
>  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
>  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 11c9166..6073354 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
>  
>  #define SPINUNIT 100	/* 100ns */
>  
> -atomic_t mce_entry;
> -
>  DEFINE_PER_CPU(unsigned, mce_exception_count);
>  
>  /*
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 645231c..24af685 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
>  extern void bust_spinlocks(int yes);
>  extern void wake_up_klogd(void);
>  extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
> +extern atomic_t mce_entry;
>  extern int panic_timeout;
>  extern int panic_on_oops;
>  extern int panic_on_unrecovered_nmi;
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 473afdb..2bae087 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -79,6 +79,7 @@ int console_printk[4] = {
>  int oops_in_progress;
>  EXPORT_SYMBOL(oops_in_progress);
>  
> +atomic_t mce_entry;
>  /*
>   * console_sem protects the console_drivers list, and also
>   * provides serialisation for access to the entire console
> @@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>  		 * recursion and return - but flag the recursion so that
>  		 * it can be printed at the next appropriate moment:
>  		 */
> -		if (!oops_in_progress && !lockdep_recursing(current)) {
> +		if (!oops_in_progress && !atomic_read(&mce_entry)

This is leaking x86-specific (MCE) stuff in generic kernel code. I think
it would be more appropriate to add a in_hw_error() helper or similar
and define it on each arch. I can very well imagine other architectures
would like to print hw error info too...

Hmmm.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] printk: ignore recursion_bug flag when MCE in progress
  2012-05-23 10:01 ` Borislav Petkov
@ 2012-05-24  0:38   ` Yanmin Zhang
  2012-05-24  5:59   ` [PATCH v2] printk: ignore recursion_bug flag in HW error handle process ShuoX Liu
  1 sibling, 0 replies; 42+ messages in thread
From: Yanmin Zhang @ 2012-05-24  0:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: ShuoX Liu, linux-kernel, andi, Andrew Morton, Tony Luck

On Wed, 2012-05-23 at 12:01 +0200, Borislav Petkov wrote:
> + Tony
> 
> On Wed, May 23, 2012 at 09:58:34AM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <shuox.liu@intel.com>
> > 
> > When MCE happens in printk, we ignore recursion_bug to make sure
> > some MCE logs printed out. Re-use mce_entry variable.
> > 
> > Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> > Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> > ---
> > I found mce_entry was introduced by commit 553f265f, but it's not
> > used now. Why not removed?
> > ---
> >  arch/x86/include/asm/mce.h       |    2 --
> >  arch/x86/kernel/cpu/mcheck/mce.c |    2 --
> >  include/linux/kernel.h           |    1 +
> >  kernel/printk.c                  |    4 +++-
> >  4 files changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 441520e..aeda4cc 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
> >  DECLARE_PER_CPU(unsigned, mce_exception_count);
> >  DECLARE_PER_CPU(unsigned, mce_poll_count);
> >  
> > -extern atomic_t mce_entry;
> > -
> >  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
> >  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
> >  
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 11c9166..6073354 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
> >  
> >  #define SPINUNIT 100	/* 100ns */
> >  
> > -atomic_t mce_entry;
> > -
> >  DEFINE_PER_CPU(unsigned, mce_exception_count);
> >  
> >  /*
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 645231c..24af685 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -354,6 +354,7 @@ unsigned long int_sqrt(unsigned long);
> >  extern void bust_spinlocks(int yes);
> >  extern void wake_up_klogd(void);
> >  extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
> > +extern atomic_t mce_entry;
> >  extern int panic_timeout;
> >  extern int panic_on_oops;
> >  extern int panic_on_unrecovered_nmi;
> > diff --git a/kernel/printk.c b/kernel/printk.c
> > index 473afdb..2bae087 100644
> > --- a/kernel/printk.c
> > +++ b/kernel/printk.c
> > @@ -79,6 +79,7 @@ int console_printk[4] = {
> >  int oops_in_progress;
> >  EXPORT_SYMBOL(oops_in_progress);
> >  
> > +atomic_t mce_entry;
> >  /*
> >   * console_sem protects the console_drivers list, and also
> >   * provides serialisation for access to the entire console
> > @@ -864,7 +865,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> >  		 * recursion and return - but flag the recursion so that
> >  		 * it can be printed at the next appropriate moment:
> >  		 */
> > -		if (!oops_in_progress && !lockdep_recursing(current)) {
> > +		if (!oops_in_progress && !atomic_read(&mce_entry)
> 
> This is leaking x86-specific (MCE) stuff in generic kernel code. I think
> it would be more appropriate to add a in_hw_error() helper or similar
> and define it on each arch. I can very well imagine other architectures
> would like to print hw error info too...
Good idea. We would do so to make it more generic.

Thanks,
Yanmin



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] printk: ignore recursion_bug flag in HW error handle process
  2012-05-23 10:01 ` Borislav Petkov
  2012-05-24  0:38   ` Yanmin Zhang
@ 2012-05-24  5:59   ` ShuoX Liu
  2012-05-24  6:11     ` Borislav Petkov
  1 sibling, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-24  5:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, andi, Andrew Morton, Yanmin Zhang, Tony Luck,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

When MCE happens in printk, we ignore recursion_bug to make sure MCE logs printed out.

According to Boris' suggestion, we add some helper functions.
1) hw_error_enter: Call it when specific arch begins to process a hardware error.
2) hw_error_exit: Call it when specific arch finishes the processing of a hardware error.
3) in_hw_error():indicates whether HW error handling is in processing.

Each arch could call the helpers in their arch-dependent HW error handlers.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 arch/x86/include/asm/mce.h       |    2 --
 arch/x86/kernel/cpu/mcheck/mce.c |    6 ++----
 include/linux/cpu.h              |   17 +++++++++++++++++
 kernel/cpu.c                     |    3 +++
 kernel/printk.c                  |    3 ++-
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 441520e..aeda4cc 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
 
-extern atomic_t mce_entry;
-
 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..aaf41d2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
 
 #define SPINUNIT 100	/* 100ns */
 
-atomic_t mce_entry;
-
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
 /*
@@ -1015,7 +1013,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	char *msg = "Unknown";
 
-	atomic_inc(&mce_entry);
+	hw_error_enter();
 
 	this_cpu_inc(mce_exception_count);
 
@@ -1143,7 +1141,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		mce_report_event(regs);
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 out:
-	atomic_dec(&mce_entry);
+	hw_error_exit();
 	sync_core();
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 7230bb5..beb56d0 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -210,4 +210,21 @@ static inline int disable_nonboot_cpus(void) { return 0; }
 static inline void enable_nonboot_cpus(void) {}
 #endif /* !CONFIG_PM_SLEEP_SMP */
 
+/* HW error handle status helpers */
+extern atomic_t hw_error;
+static inline void hw_error_enter(void)
+{
+	atomic_inc(&hw_error);
+}
+
+static inline void hw_error_exit(void)
+{
+	atomic_dec(&hw_error);
+}
+
+static inline int in_hw_error(void)
+{
+	return atomic_read(&hw_error);
+}
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 0e6353c..54b9c1f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -679,3 +679,6 @@ void init_cpu_online(const struct cpumask *src)
 {
 	cpumask_copy(to_cpumask(cpu_online_bits), src);
 }
+
+/* hardware error status */
+atomic_t hw_error __read_mostly = ATOMIC_INIT(0);
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..6f0f216 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1295,7 +1295,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!oops_in_progress && !in_hw_error()
+				&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process
  2012-05-24  5:59   ` [PATCH v2] printk: ignore recursion_bug flag in HW error handle process ShuoX Liu
@ 2012-05-24  6:11     ` Borislav Petkov
  2012-05-24 22:56       ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2012-05-24  6:11 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, andi, Andrew Morton, Yanmin Zhang, Tony Luck, Ingo Molnar

On Thu, May 24, 2012 at 01:59:38PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> When MCE happens in printk, we ignore recursion_bug to make sure MCE logs printed out.
> 
> According to Boris' suggestion, we add some helper functions.
> 1) hw_error_enter: Call it when specific arch begins to process a hardware error.
> 2) hw_error_exit: Call it when specific arch finishes the processing of a hardware error.
> 3) in_hw_error():indicates whether HW error handling is in processing.
> 
> Each arch could call the helpers in their arch-dependent HW error handlers.

Yep, looks better, thanks. Design question though:

> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  arch/x86/include/asm/mce.h       |    2 --
>  arch/x86/kernel/cpu/mcheck/mce.c |    6 ++----
>  include/linux/cpu.h              |   17 +++++++++++++++++
>  kernel/cpu.c                     |    3 +++
>  kernel/printk.c                  |    3 ++-
>  5 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 441520e..aeda4cc 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -187,8 +187,6 @@ int mce_available(struct cpuinfo_x86 *c);
>  DECLARE_PER_CPU(unsigned, mce_exception_count);
>  DECLARE_PER_CPU(unsigned, mce_poll_count);
>  
> -extern atomic_t mce_entry;
> -
>  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
>  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..aaf41d2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -61,8 +61,6 @@ int mce_disabled __read_mostly;
>  
>  #define SPINUNIT 100	/* 100ns */
>  
> -atomic_t mce_entry;
> -
>  DEFINE_PER_CPU(unsigned, mce_exception_count);
>  
>  /*
> @@ -1015,7 +1013,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>  	char *msg = "Unknown";
>  
> -	atomic_inc(&mce_entry);
> +	hw_error_enter();
>  
>  	this_cpu_inc(mce_exception_count);
>  
> @@ -1143,7 +1141,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		mce_report_event(regs);
>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  out:
> -	atomic_dec(&mce_entry);
> +	hw_error_exit();
>  	sync_core();
>  }
>  EXPORT_SYMBOL_GPL(do_machine_check);
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 7230bb5..beb56d0 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -210,4 +210,21 @@ static inline int disable_nonboot_cpus(void) { return 0; }
>  static inline void enable_nonboot_cpus(void) {}
>  #endif /* !CONFIG_PM_SLEEP_SMP */
>  
> +/* HW error handle status helpers */
> +extern atomic_t hw_error;
> +static inline void hw_error_enter(void)
> +{
> +	atomic_inc(&hw_error);
> +}
> +
> +static inline void hw_error_exit(void)
> +{
> +	atomic_dec(&hw_error);
> +}
> +
> +static inline int in_hw_error(void)
> +{
> +	return atomic_read(&hw_error);
> +}

Shouldn't those be generic empty functions and each arch implement their
own with the stuff they want to do on the respective architecture when
they get a hardware error?

Andrew, Ingo?

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process
  2012-05-24  6:11     ` Borislav Petkov
@ 2012-05-24 22:56       ` Andrew Morton
  2012-05-25  0:30         ` Yanmin Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2012-05-24 22:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ShuoX Liu, linux-kernel, andi, Yanmin Zhang, Tony Luck, Ingo Molnar

On Thu, 24 May 2012 08:11:45 +0200
Borislav Petkov <bp@alien8.de> wrote:

> > +/* HW error handle status helpers */
> > +extern atomic_t hw_error;
> > +static inline void hw_error_enter(void)
> > +{
> > +	atomic_inc(&hw_error);
> > +}
> > +
> > +static inline void hw_error_exit(void)
> > +{
> > +	atomic_dec(&hw_error);
> > +}
> > +
> > +static inline int in_hw_error(void)
> > +{
> > +	return atomic_read(&hw_error);
> > +}
> 
> Shouldn't those be generic empty functions and each arch implement their
> own with the stuff they want to do on the respective architecture when
> they get a hardware error?

This code needs documentation.

Specifically, it should clearly explain (and hence define) what a
"hardware error" *is*, and for what purpose this code exists.

Because as it stands, this interface is hopelessly vague.  Once one
sees that it is *specifically* used for handling mce within a printk,
it all makes sense.

And with that understanding comes the realisation that the interface is
poorly named.  It will not be used for any purpose other than adjusting
printk() behavior so it should mention printk() in its name and in its
comments and probably it should all be moved into printk.h.

Futhermore, this code is not really related to MCE or hardware or
anything else.  It is simply a way in which callers can suppress
printk()'s recursion check.  Callers are free to use it for reasons
other than "hardware errors".

And once all that is done, and this interface becomes part of printk()
then no, there is no need to add per-arch hooks.  An arch can call into
printk_recursion_check_disable() and printk_recursion_chack_enable() -
nice and simple.


IOW, the title of this patch should be

	[patch 1/2] printk: add interface for disabling recursion check
	[patch 2/2] x86 mce: use new printk recursion disabling interface

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] printk: ignore recursion_bug flag in HW error handle process
  2012-05-24 22:56       ` Andrew Morton
@ 2012-05-25  0:30         ` Yanmin Zhang
  2012-05-25  7:19           ` [PATCH 1/2] printk: add interface for disabling recursion check ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Yanmin Zhang @ 2012-05-25  0:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Borislav Petkov, ShuoX Liu, linux-kernel, andi, Tony Luck, Ingo Molnar

On Thu, 2012-05-24 at 15:56 -0700, Andrew Morton wrote:
> On Thu, 24 May 2012 08:11:45 +0200
> Borislav Petkov <bp@alien8.de> wrote:
> 
> > > +/* HW error handle status helpers */
> > > +extern atomic_t hw_error;
> > > +static inline void hw_error_enter(void)
> > > +{
> > > +	atomic_inc(&hw_error);
> > > +}
> > > +
> > > +static inline void hw_error_exit(void)
> > > +{
> > > +	atomic_dec(&hw_error);
> > > +}
> > > +
> > > +static inline int in_hw_error(void)
> > > +{
> > > +	return atomic_read(&hw_error);
> > > +}
> > 
> > Shouldn't those be generic empty functions and each arch implement their
> > own with the stuff they want to do on the respective architecture when
> > they get a hardware error?
> 
> This code needs documentation.
> 
> Specifically, it should clearly explain (and hence define) what a
> "hardware error" *is*, and for what purpose this code exists.
> 
> Because as it stands, this interface is hopelessly vague.  Once one
> sees that it is *specifically* used for handling mce within a printk,
> it all makes sense.
> 
> And with that understanding comes the realisation that the interface is
> poorly named.  It will not be used for any purpose other than adjusting
> printk() behavior so it should mention printk() in its name and in its
> comments and probably it should all be moved into printk.h.
> 
> Futhermore, this code is not really related to MCE or hardware or
> anything else.  It is simply a way in which callers can suppress
> printk()'s recursion check.  Callers are free to use it for reasons
> other than "hardware errors".
> 
> And once all that is done, and this interface becomes part of printk()
> then no, there is no need to add per-arch hooks.  An arch can call into
> printk_recursion_check_disable() and printk_recursion_chack_enable() -
> nice and simple.
> 
> 
> IOW, the title of this patch should be
> 
> 	[patch 1/2] printk: add interface for disabling recursion check
> 	[patch 2/2] x86 mce: use new printk recursion disabling interface
Andrew,

Thanks for the detailed comment. It's more reasonable to bind it to printk.
We would follow it to create new patches.

Yanmin



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/2] printk: add interface for disabling recursion check
  2012-05-25  0:30         ` Yanmin Zhang
@ 2012-05-25  7:19           ` ShuoX Liu
  2012-05-25  7:21             ` [PATCH 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  2012-05-25 16:09             ` [PATCH 1/2] printk: add interface for disabling recursion check Luck, Tony
  0 siblings, 2 replies; 42+ messages in thread
From: ShuoX Liu @ 2012-05-25  7:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Andrew Morton, Borislav Petkov, andi, Tony Luck,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to print some
important information. So we add these interfaces in printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   16 +++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..61067fe 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,18 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1307,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-25  7:19           ` [PATCH 1/2] printk: add interface for disabling recursion check ShuoX Liu
@ 2012-05-25  7:21             ` ShuoX Liu
  2012-05-25  7:41               ` Borislav Petkov
  2012-05-25 16:09             ` [PATCH 1/2] printk: add interface for disabling recursion check Luck, Tony
  1 sibling, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-25  7:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Andrew Morton, Borislav Petkov, andi, Tony Luck,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..365c35d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	char *msg = "Unknown";
 
+	printk_recursion_check_disable();
 	atomic_inc(&mce_entry);
 
 	this_cpu_inc(mce_exception_count);
@@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 out:
 	atomic_dec(&mce_entry);
+	printk_recursion_check_enable();
 	sync_core();
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-25  7:21             ` [PATCH 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-05-25  7:41               ` Borislav Petkov
  2012-05-25  8:00                 ` ShuoX Liu
  2012-05-28  2:07                 ` ShuoX Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-05-25  7:41 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Andrew Morton, andi, Tony Luck, Ingo Molnar

On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> Disable printk recursion to make sure MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..365c35d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>  	char *msg = "Unknown";
>  
> +	printk_recursion_check_disable();
>  	atomic_inc(&mce_entry);
>  
>  	this_cpu_inc(mce_exception_count);
> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  out:
>  	atomic_dec(&mce_entry);
> +	printk_recursion_check_enable();

Looks like those should be at the beginning and the end of print_mce() -
do_machine_check() could exit without printing an MCE and disabling the
recursion check then is superfluous, methinks.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-25  7:41               ` Borislav Petkov
@ 2012-05-25  8:00                 ` ShuoX Liu
  2012-05-28  2:07                 ` ShuoX Liu
  1 sibling, 0 replies; 42+ messages in thread
From: ShuoX Liu @ 2012-05-25  8:00 UTC (permalink / raw)
  To: Borislav Petkov, linux-kernel, Yanmin Zhang, Andrew Morton, andi,
	Tony Luck, Ingo Molnar

On 2012年05月25日 15:41, Borislav Petkov wrote:

> On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
>> From: ShuoX Liu <shuox.liu@intel.com>
>>
>> Disable printk recursion to make sure MCE logs printed out.
>>
>> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
>> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 2afcbd2..365c35d 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>>  	char *msg = "Unknown";
>>  
>> +	printk_recursion_check_disable();
>>  	atomic_inc(&mce_entry);
>>  
>>  	this_cpu_inc(mce_exception_count);
>> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>>  out:
>>  	atomic_dec(&mce_entry);
>> +	printk_recursion_check_enable();
> 
> Looks like those should be at the beginning and the end of print_mce() -
> do_machine_check() could exit without printing an MCE and disabling the
> recursion check then is superfluous, methinks.

Thanks for comment. It seems you are right. I will check the code next
Monday.

> 
> Thanks.
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/2] printk: add interface for disabling recursion check
  2012-05-25  7:19           ` [PATCH 1/2] printk: add interface for disabling recursion check ShuoX Liu
  2012-05-25  7:21             ` [PATCH 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-05-25 16:09             ` Luck, Tony
  2012-05-28  0:30               ` Yanmin Zhang
  1 sibling, 1 reply; 42+ messages in thread
From: Luck, Tony @ 2012-05-25 16:09 UTC (permalink / raw)
  To: Liu, ShuoX, linux-kernel
  Cc: Yanmin Zhang, Andrew Morton, Borislav Petkov, andi, Ingo Molnar

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 437 bytes --]

+void printk_recursion_check_enable(void)
+{
+	atomic_dec(&recursion_check_disabled);
+}


Is it worth a BUG_ON() in here to check that recursion_check_disabled
is >=1 before blindly decrementing it? Or is this interface so simple
that nobody would ever get this wrong?

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/2] printk: add interface for disabling recursion check
  2012-05-25 16:09             ` [PATCH 1/2] printk: add interface for disabling recursion check Luck, Tony
@ 2012-05-28  0:30               ` Yanmin Zhang
  2012-05-28  2:54                 ` [PATCH v3 " ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Yanmin Zhang @ 2012-05-28  0:30 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Liu, ShuoX, linux-kernel, Andrew Morton, Borislav Petkov, andi,
	Ingo Molnar

On Fri, 2012-05-25 at 16:09 +0000, Luck, Tony wrote:
> +void printk_recursion_check_enable(void)
> +{
> +	atomic_dec(&recursion_check_disabled);
> +}
> 
> 
> Is it worth a BUG_ON() in here to check that recursion_check_disabled
> is >=1 before blindly decrementing it? Or is this interface so simple
> that nobody would ever get this wrong?
Tony,

The interface is clear and simple. But a WARN_ON checking is better
to have. We would add WARN_ON.

Thanks,
Yanmin



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-25  7:41               ` Borislav Petkov
  2012-05-25  8:00                 ` ShuoX Liu
@ 2012-05-28  2:07                 ` ShuoX Liu
  2012-05-30  9:08                   ` Borislav Petkov
  1 sibling, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-28  2:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Yanmin Zhang, Andrew Morton, andi, Tony Luck, Ingo Molnar

On 2012年05月25日 15:41, Borislav Petkov wrote:

> On Fri, May 25, 2012 at 03:21:12PM +0800, ShuoX Liu wrote:
>> From: ShuoX Liu <shuox.liu@intel.com>
>>
>> Disable printk recursion to make sure MCE logs printed out.
>>
>> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
>> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index 2afcbd2..365c35d 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>>  	char *msg = "Unknown";
>>  
>> +	printk_recursion_check_disable();
>>  	atomic_inc(&mce_entry);
>>  
>>  	this_cpu_inc(mce_exception_count);
>> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>>  out:
>>  	atomic_dec(&mce_entry);
>> +	printk_recursion_check_enable();
> 
> Looks like those should be at the beginning and the end of print_mce() -
> do_machine_check() could exit without printing an MCE and disabling the
> recursion check then is superfluous, methinks.

Boris,
I checked code and found some other functions in do_machine_check() also
would printk something. Such as add_taint(). So i think we'd better
place the recursion check at the beginning and the end of
do_machine_check(). Also more printks later(maybe) added will benefit
from this. Do you agree?

> 
> Thanks.
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/2] printk: add interface for disabling recursion check
  2012-05-28  0:30               ` Yanmin Zhang
@ 2012-05-28  2:54                 ` ShuoX Liu
  2012-05-28  2:56                   ` [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-28  2:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Luck, Tony, Andrew Morton, Borislav Petkov, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-28  2:54                 ` [PATCH v3 " ShuoX Liu
@ 2012-05-28  2:56                   ` ShuoX Liu
  2012-06-04 17:12                     ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-05-28  2:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Luck, Tony, Andrew Morton, Borislav Petkov, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..365c35d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	char *msg = "Unknown";
 
+	printk_recursion_check_disable();
 	atomic_inc(&mce_entry);
 
 	this_cpu_inc(mce_exception_count);
@@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 out:
 	atomic_dec(&mce_entry);
+	printk_recursion_check_enable();
 	sync_core();
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-28  2:07                 ` ShuoX Liu
@ 2012-05-30  9:08                   ` Borislav Petkov
  2012-05-31  0:30                     ` Yanmin Zhang
  2012-06-04  3:04                     ` [PATCH v4 1/2] printk: add interface for disabling recursion check ShuoX Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-05-30  9:08 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Andrew Morton, andi, Tony Luck, Ingo Molnar

On Mon, May 28, 2012 at 10:07:59AM +0800, ShuoX Liu wrote:
> Boris,
> I checked code and found some other functions in do_machine_check() also
> would printk something. Such as add_taint(). So i think we'd better
> place the recursion check at the beginning and the end of
> do_machine_check(). Also more printks later(maybe) added will benefit
> from this. Do you agree?

I'm not sure we want to disable printk recursion for add_taint() - it
doesn't spit out any useful information wrt MCE so we could ignore it.

Btw, I forgot to ask: this printk recursion disabling, do you have a
real usecase where you don't get the MCE info in dmesg and with your
patch it works or is this purely hypothetical?

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-30  9:08                   ` Borislav Petkov
@ 2012-05-31  0:30                     ` Yanmin Zhang
  2012-06-04  3:04                     ` [PATCH v4 1/2] printk: add interface for disabling recursion check ShuoX Liu
  1 sibling, 0 replies; 42+ messages in thread
From: Yanmin Zhang @ 2012-05-31  0:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ShuoX Liu, linux-kernel, Andrew Morton, andi, Tony Luck, Ingo Molnar

On Wed, 2012-05-30 at 11:08 +0200, Borislav Petkov wrote:
> On Mon, May 28, 2012 at 10:07:59AM +0800, ShuoX Liu wrote:
> > Boris,
> > I checked code and found some other functions in do_machine_check() also
> > would printk something. Such as add_taint(). So i think we'd better
> > place the recursion check at the beginning and the end of
> > do_machine_check(). Also more printks later(maybe) added will benefit
> > from this. Do you agree?
> 
> I'm not sure we want to disable printk recursion for add_taint() - it
> doesn't spit out any useful information wrt MCE so we could ignore it.
add_taint might be not a good case here. We could move the recursion check
flag setting around mce_panic.


> 
> Btw, I forgot to ask: this printk recursion disabling, do you have a
> real usecase where you don't get the MCE info in dmesg and with your
> patch it works or is this purely hypothetical?
We hit it when running a MTBF testing on a Android atom mobile.

Thanks,
Yanmin



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 1/2] printk: add interface for disabling recursion check
  2012-05-30  9:08                   ` Borislav Petkov
  2012-05-31  0:30                     ` Yanmin Zhang
@ 2012-06-04  3:04                     ` ShuoX Liu
  2012-06-04  3:07                       ` [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  1 sibling, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-04  3:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Andrew Morton, andi, Tony Luck,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-04  3:04                     ` [PATCH v4 1/2] printk: add interface for disabling recursion check ShuoX Liu
@ 2012-06-04  3:07                       ` ShuoX Liu
  2012-06-22 23:41                         ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-04  3:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Andrew Morton, andi, Tony Luck,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

Disable printk recursion to make sure MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
We hit it when running a MTBF testing on a Android atom mobile.
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..906e838 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 {
 	int i, apei_err = 0;
 
+	printk_recursion_check_disable();
 	if (!fake_panic) {
 		/*
 		 * Make sure only one CPU runs in machine check panic
@@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+	printk_recursion_check_enable();
 }
 
 /* Support code for software error injection */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface
  2012-05-28  2:56                   ` [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-04 17:12                     ` Borislav Petkov
  2012-06-05  0:32                       ` Yanmin Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2012-06-04 17:12 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Mon, May 28, 2012 at 10:56:04AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> Disable printk recursion to make sure MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..365c35d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
>  	char *msg = "Unknown";
>  
> +	printk_recursion_check_disable();
>  	atomic_inc(&mce_entry);
>  
>  	this_cpu_inc(mce_exception_count);
> @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  out:
>  	atomic_dec(&mce_entry);
> +	printk_recursion_check_enable();
>  	sync_core();

This is still adding the recursion check things in do_machine_check
instead of in print_mce.

Also, this commit message above needs more explanation why we want to
disable the recursion check on an MCE, maybe an example...

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-04 17:12                     ` Borislav Petkov
@ 2012-06-05  0:32                       ` Yanmin Zhang
  2012-06-05  8:14                         ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: Yanmin Zhang @ 2012-06-05  0:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ShuoX Liu, linux-kernel, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar, andi, Ingo Molnar

On Mon, 2012-06-04 at 19:12 +0200, Borislav Petkov wrote:
> On Mon, May 28, 2012 at 10:56:04AM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <shuox.liu@intel.com>
> > 
> > Disable printk recursion to make sure MCE logs printed out.
> > 
> > Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> > Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..365c35d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -1015,6 +1015,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> >  	char *msg = "Unknown";
> >  
> > +	printk_recursion_check_disable();
> >  	atomic_inc(&mce_entry);
> >  
> >  	this_cpu_inc(mce_exception_count);
> > @@ -1144,6 +1145,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >  	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> >  out:
> >  	atomic_dec(&mce_entry);
> > +	printk_recursion_check_enable();
> >  	sync_core();
> 
> This is still adding the recursion check things in do_machine_check
> instead of in print_mce.
Boris,

You are seeing an old patch. The new patches V4 are:
https://lkml.org/lkml/2012/6/3/176
https://lkml.org/lkml/2012/6/3/177


The new patches change mce_panic instead of do_machine_check.

> 
> Also, this commit message above needs more explanation why we want to
> disable the recursion check on an MCE, maybe an example...
V4 doesn't have the comment. We would add it in V5.

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-05  0:32                       ` Yanmin Zhang
@ 2012-06-05  8:14                         ` Borislav Petkov
  2012-06-05  9:53                           ` [PATCH v5 1/2] printk: add interface for disabling recursion check ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2012-06-05  8:14 UTC (permalink / raw)
  To: Yanmin Zhang
  Cc: ShuoX Liu, linux-kernel, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Tue, Jun 05, 2012 at 08:32:40AM +0800, Yanmin Zhang wrote:
> You are seeing an old patch. The new patches V4 are:
> https://lkml.org/lkml/2012/6/3/176
> https://lkml.org/lkml/2012/6/3/177
> 
> The new patches change mce_panic instead of do_machine_check.

Ok.

> > Also, this commit message above needs more explanation why we want to
> > disable the recursion check on an MCE, maybe an example...
> V4 doesn't have the comment. We would add it in V5.

Cool, thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v5 1/2] printk: add interface for disabling recursion check
  2012-06-05  8:14                         ` Borislav Petkov
@ 2012-06-05  9:53                           ` ShuoX Liu
  2012-06-05  9:55                             ` [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-05  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-05  9:53                           ` [PATCH v5 1/2] printk: add interface for disabling recursion check ShuoX Liu
@ 2012-06-05  9:55                             ` ShuoX Liu
  2012-06-05 15:15                               ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-05  9:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in mce_panic, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..906e838 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 {
 	int i, apei_err = 0;
 
+	printk_recursion_check_disable();
 	if (!fake_panic) {
 		/*
 		 * Make sure only one CPU runs in machine check panic
@@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+	printk_recursion_check_enable();
 }
 
 /* Support code for software error injection */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-05  9:55                             ` [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-05 15:15                               ` Borislav Petkov
  2012-06-06  0:36                                 ` Yanmin Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2012-06-05 15:15 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Tue, Jun 05, 2012 at 05:55:22PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
> 
> We hit it when running MTBF testing on Android ATOM mobiles.
> 
> Here in mce_panic, we choose to disable printk recursion to make sure
> MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..906e838 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>  {
>  	int i, apei_err = 0;
>  
> +	printk_recursion_check_disable();
>  	if (!fake_panic) {
>  		/*
>  		 * Make sure only one CPU runs in machine check panic
> @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>  		panic(msg);
>  	} else
>  		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> +	printk_recursion_check_enable();

Ok, let me ask this again: why not disable the printk recursion check in
the function that actually _prints_ the MCE, i.e. print_mce() instead of
here in mce_panic() which does a whole bunch of other stuff and it can
also return without printing any MCE to dmesg?

Are you interested in seeing the printk's from mce_panic? Why?

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-05 15:15                               ` Borislav Petkov
@ 2012-06-06  0:36                                 ` Yanmin Zhang
  2012-06-06  8:31                                   ` [PATCH v6 1/2] printk: add interface for disabling recursion check ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Yanmin Zhang @ 2012-06-06  0:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ShuoX Liu, linux-kernel, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Tue, 2012-06-05 at 17:15 +0200, Borislav Petkov wrote:
> On Tue, Jun 05, 2012 at 05:55:22PM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <shuox.liu@intel.com>
> > 
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> > 
> > We hit it when running MTBF testing on Android ATOM mobiles.
> > 
> > Here in mce_panic, we choose to disable printk recursion to make sure
> > MCE logs printed out.
> > 
> > Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> > Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..906e838 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> >  {
> >  	int i, apei_err = 0;
> >  
> > +	printk_recursion_check_disable();
> >  	if (!fake_panic) {
> >  		/*
> >  		 * Make sure only one CPU runs in machine check panic
> > @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> >  		panic(msg);
> >  	} else
> >  		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> > +	printk_recursion_check_enable();
> 
> Ok, let me ask this again: why not disable the printk recursion check in
> the function that actually _prints_ the MCE, i.e. print_mce() instead of
> here in mce_panic() which does a whole bunch of other stuff and it can
> also return without printing any MCE to dmesg?
Sorry for forgetting the return checking.

> Are you interested in seeing the printk's from mce_panic? Why?
How about moving the disabling checking in print_mce? It seems not clean if we disable
the printk recursion checking just around every printk statement.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 1/2] printk: add interface for disabling recursion check
  2012-06-06  0:36                                 ` Yanmin Zhang
@ 2012-06-06  8:31                                   ` ShuoX Liu
  2012-06-06  8:34                                     ` [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-06  8:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Borislav Petkov, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-06  8:31                                   ` [PATCH v6 1/2] printk: add interface for disabling recursion check ShuoX Liu
@ 2012-06-06  8:34                                     ` ShuoX Liu
  2012-06-06 15:22                                       ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: ShuoX Liu @ 2012-06-06  8:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yanmin Zhang, Borislav Petkov, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in mce_panic, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
v6: move the disabling checking in print_mce

---
 arch/x86/kernel/cpu/mcheck/mce.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..6056e94 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
 {
 	int ret = 0;
 
+	printk_recursion_check_disable();
 	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
 	       m->extcpu, m->mcgstatus, m->bank, m->status);
 
@@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
 	 * (if the CPU has an implementation for that)
 	 */
 	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
-	if (ret == NOTIFY_STOP)
+	if (ret == NOTIFY_STOP) {
+		printk_recursion_check_enable();
 		return;
+	}
 
 	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
+	printk_recursion_check_enable();
 }
 
 #define PANIC_TIMEOUT 5 /* 5 seconds */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-06  8:34                                     ` [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-06 15:22                                       ` Borislav Petkov
  2012-06-07  2:13                                         ` Yanmin Zhang
  2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-06 15:22 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Wed, Jun 06, 2012 at 04:34:21PM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
> 
> We hit it when running MTBF testing on Android ATOM mobiles.
> 
> Here in mce_panic, we choose to disable printk recursion to make sure
> MCE logs printed out.

Just a minor nitpick: this should say "print_mce" or you can simply
remove the whole sentence - commit message is fine without it too.

Aside of that, those look ok to me.

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-06 15:22                                       ` Borislav Petkov
@ 2012-06-07  2:13                                         ` Yanmin Zhang
  2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
  1 sibling, 0 replies; 42+ messages in thread
From: Yanmin Zhang @ 2012-06-07  2:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: ShuoX Liu, linux-kernel, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Wed, 2012-06-06 at 17:22 +0200, Borislav Petkov wrote:
> On Wed, Jun 06, 2012 at 04:34:21PM +0800, ShuoX Liu wrote:
> > From: ShuoX Liu <shuox.liu@intel.com>
> > 
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> > 
> > We hit it when running MTBF testing on Android ATOM mobiles.
> > 
> > Here in mce_panic, we choose to disable printk recursion to make sure
> > MCE logs printed out.
> 
> Just a minor nitpick: this should say "print_mce" or you can simply
> remove the whole sentence - commit message is fine without it too.
Thanks for your very careful/detailed comments. We would make it perfect.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v7 1/2] printk: add interface for disabling recursion check
  2012-06-06 15:22                                       ` Borislav Petkov
  2012-06-07  2:13                                         ` Yanmin Zhang
@ 2012-06-07  2:57                                         ` ShuoX Liu
  2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
                                                             ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: ShuoX Liu @ 2012-06-07  2:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

With some special scenario, such as Machine Check Exception happened
in printk, we want to bypass printk recursion check to printk some
important information. So we add these interfaces of printk.

1) printk_recursion_check_disable() for disabling recursion check
2) printk_recursion_check_enable() for enabling recursion check

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 include/linux/printk.h |    3 +++
 kernel/printk.c        |   17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..da48ec7 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -42,6 +42,9 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }
 
+void printk_recursion_check_disable(void);
+void printk_recursion_check_enable(void);
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..0580f67 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -78,6 +78,19 @@ int console_printk[4] = {
 int oops_in_progress;
 EXPORT_SYMBOL(oops_in_progress);
 
+static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
+
+void printk_recursion_check_disable(void)
+{
+	atomic_inc(&recursion_check_disabled);
+}
+
+void printk_recursion_check_enable(void)
+{
+	WARN_ON(atomic_read(&recursion_check_disabled) < 1);
+	atomic_dec(&recursion_check_disabled);
+}
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * recursion and return - but flag the recursion so that
 		 * it can be printed at the next appropriate moment:
 		 */
-		if (!oops_in_progress && !lockdep_recursing(current)) {
+		if (!atomic_read(&recursion_check_disabled)
+			&& !oops_in_progress
+			&& !lockdep_recursing(current)) {
 			recursion_bug = 1;
 			goto out_restore_irqs;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
@ 2012-06-07  3:00                                           ` ShuoX Liu
  2012-06-08 12:34                                             ` Borislav Petkov
  2012-06-25  9:17                                             ` Ingo Molnar
  2012-06-07 13:19                                           ` [PATCH v7 1/2] printk: add interface for disabling recursion check bing deng
  2012-06-08 12:30                                           ` Borislav Petkov
  2 siblings, 2 replies; 42+ messages in thread
From: ShuoX Liu @ 2012-06-07  3:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, Yanmin Zhang, Luck, Tony, Andrew Morton, andi,
	Ingo Molnar

From: ShuoX Liu <shuox.liu@intel.com>

On x86 machines, some times MCE happens just when kernel calls printk
to output some log info to serial console, while usually MCE module in
kernel is used to print out some hardware error information, such like
bad cache or bad memory bank. That causes printk recursion and printk
would omit MCE printk output.

We hit it when running MTBF testing on Android ATOM mobiles.

Here in print_mce, we choose to disable printk recursion to make sure
MCE logs printed out.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2afcbd2..6056e94 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
 {
 	int ret = 0;
 
+	printk_recursion_check_disable();
 	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
 	       m->extcpu, m->mcgstatus, m->bank, m->status);
 
@@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
 	 * (if the CPU has an implementation for that)
 	 */
 	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
-	if (ret == NOTIFY_STOP)
+	if (ret == NOTIFY_STOP) {
+		printk_recursion_check_enable();
 		return;
+	}
 
 	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
+	printk_recursion_check_enable();
 }
 
 #define PANIC_TIMEOUT 5 /* 5 seconds */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 1/2] printk: add interface for disabling recursion check
  2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
  2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-07 13:19                                           ` bing deng
  2012-06-07 13:38                                             ` Borislav Petkov
  2012-06-08 12:30                                           ` Borislav Petkov
  2 siblings, 1 reply; 42+ messages in thread
From: bing deng @ 2012-06-07 13:19 UTC (permalink / raw)
  To: shuox.liu
  Cc: linux-kernel, Borislav Petkov, Yanmin Zhang, Luck, Tony,
	Andrew Morton, andi, Ingo Molnar

On Thu, Jun 7, 2012 at 2:57 AM, ShuoX Liu <shuox.liu@intel.com> wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
>
> With some special scenario, such as Machine Check Exception happened
> in printk, we want to bypass printk recursion check to printk some
> important information. So we add these interfaces of printk.
>
> 1) printk_recursion_check_disable() for disabling recursion check
> 2) printk_recursion_check_enable() for enabling recursion check
>
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  include/linux/printk.h |    3 +++
>  kernel/printk.c        |   17 ++++++++++++++++-
>  2 files changed, 19 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 1bec2f7..da48ec7 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -42,6 +42,9 @@ static inline void console_verbose(void)
>                console_loglevel = 15;
>  }
>
> +void printk_recursion_check_disable(void);
> +void printk_recursion_check_enable(void);
> +
>  struct va_format {
>        const char *fmt;
>        va_list *va;
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 32462d2..0580f67 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -78,6 +78,19 @@ int console_printk[4] = {
>  int oops_in_progress;
>  EXPORT_SYMBOL(oops_in_progress);
>
> +static atomic_t recursion_check_disabled = ATOMIC_INIT(0);
> +
> +void printk_recursion_check_disable(void)
> +{
> +       atomic_inc(&recursion_check_disabled);
> +}
> +
> +void printk_recursion_check_enable(void)
> +{
> +       WARN_ON(atomic_read(&recursion_check_disabled) < 1);
> +       atomic_dec(&recursion_check_disabled);
> +}
> +
>  /*
>  * console_sem protects the console_drivers list, and also
>  * provides serialisation for access to the entire console
> @@ -1295,7 +1308,9 @@ asmlinkage int vprintk_emit(int facility, int level,
>                 * recursion and return - but flag the recursion so that
>                 * it can be printed at the next appropriate moment:
>                 */
> -               if (!oops_in_progress && !lockdep_recursing(current)) {
> +               if (!atomic_read(&recursion_check_disabled)
> +                       && !oops_in_progress
> +                       && !lockdep_recursing(current)) {
>                        recursion_bug = 1;
>                        goto out_restore_irqs;
>                }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Hi Shuo,

    Should the two function be export by EXPORT_SYMBOL? The the other
module can use it.

-- 
Best Regards,
Bing

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 1/2] printk: add interface for disabling recursion check
  2012-06-07 13:19                                           ` [PATCH v7 1/2] printk: add interface for disabling recursion check bing deng
@ 2012-06-07 13:38                                             ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-07 13:38 UTC (permalink / raw)
  To: bing deng
  Cc: shuox.liu, linux-kernel, Yanmin Zhang, Luck, Tony, Andrew Morton,
	andi, Ingo Molnar

On Thu, Jun 07, 2012 at 01:19:16PM +0000, bing deng wrote:
> Should the two function be export by EXPORT_SYMBOL? The the other
> module can use it.

Not needed right now - functions are only used in built-in code and not
in modules.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 1/2] printk: add interface for disabling recursion check
  2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
  2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  2012-06-07 13:19                                           ` [PATCH v7 1/2] printk: add interface for disabling recursion check bing deng
@ 2012-06-08 12:30                                           ` Borislav Petkov
  2 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-08 12:30 UTC (permalink / raw)
  To: ShuoX Liu
  Cc: linux-kernel, Yanmin Zhang, Luck, Tony, Andrew Morton, andi, Ingo Molnar

On Thu, Jun 07, 2012 at 10:57:26AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> With some special scenario, such as Machine Check Exception happened
> in printk, we want to bypass printk recursion check to printk some
> important information. So we add these interfaces of printk.
> 
> 1) printk_recursion_check_disable() for disabling recursion check
> 2) printk_recursion_check_enable() for enabling recursion check
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>

Acked-by: Borislav Petkov <borislav.petkov@amd.com>

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-08 12:34                                             ` Borislav Petkov
  2012-06-25  9:17                                             ` Ingo Molnar
  1 sibling, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-08 12:34 UTC (permalink / raw)
  To: ShuoX Liu, Andrew Morton
  Cc: linux-kernel, Yanmin Zhang, Luck, Tony, andi, Ingo Molnar

On Thu, Jun 07, 2012 at 11:00:03AM +0800, ShuoX Liu wrote:
> From: ShuoX Liu <shuox.liu@intel.com>
> 
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
> 
> We hit it when running MTBF testing on Android ATOM mobiles.
> 
> Here in print_mce, we choose to disable printk recursion to make sure
> MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>

Ok,

those patches have to go in hand in hand and since the first one touches
generic code, maybe Andrew is the right guy for the job of picking them
up, provided there are no other objections against them.

Acked-by: Borislav Petkov <borislav.petkov@amd.com>

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-04  3:07                       ` [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
@ 2012-06-22 23:41                         ` Andrew Morton
  2012-06-26 20:45                           ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2012-06-22 23:41 UTC (permalink / raw)
  To: shuox.liu
  Cc: linux-kernel, Borislav Petkov, Yanmin Zhang, andi, Tony Luck,
	Ingo Molnar

On Mon, 04 Jun 2012 11:07:11 +0800
ShuoX Liu <shuox.liu@intel.com> wrote:

> From: ShuoX Liu <shuox.liu@intel.com>
> 
> Disable printk recursion to make sure MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
> We hit it when running a MTBF testing on a Android atom mobile.
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..906e838 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -306,6 +306,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>  {
>  	int i, apei_err = 0;
>  
> +	printk_recursion_check_disable();
>  	if (!fake_panic) {
>  		/*
>  		 * Make sure only one CPU runs in machine check panic
> @@ -360,6 +361,7 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>  		panic(msg);
>  	} else
>  		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
> +	printk_recursion_check_enable();
>  }
>  
>  /* Support code for software error injection */

A couple of things here.

a) mce_panic() has a "return" statement deep inside.  So we return
   from mce_panic() with the recursion check disabled.  whoops.

b) adding a nice comment is nice.

--- a/arch/x86/kernel/cpu/mcheck/mce.c~x86-mce-use-new-printk-recursion-disabling-interface-fix
+++ a/arch/x86/kernel/cpu/mcheck/mce.c
@@ -303,11 +303,10 @@ static void wait_for_panic(void)
 	panic("Panicing machine check CPU died");
 }
 
-static void mce_panic(char *msg, struct mce *final, char *exp)
+static void __mce_panic(char *msg, struct mce *final, char *exp)
 {
 	int i, apei_err = 0;
 
-	printk_recursion_check_disable();
 	if (!fake_panic) {
 		/*
 		 * Make sure only one CPU runs in machine check panic
@@ -362,6 +361,17 @@ static void mce_panic(char *msg, struct 
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
+}
+
+/*
+ * If an MCE happens to occur during the execution of a printk(), we want the
+ * MCE information to be displayed.  But printk()'s recursion checking prevents
+ * that.  So temporarily disable it.
+ */
+static void mce_panic(char *msg, struct mce *final, char *exp)
+{
+	printk_recursion_check_disable();
+	__mce_panic(msg, final, exp);
 	printk_recursion_check_enable();
 }
 
_


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
  2012-06-08 12:34                                             ` Borislav Petkov
@ 2012-06-25  9:17                                             ` Ingo Molnar
  2012-06-25 13:14                                               ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2012-06-25  9:17 UTC (permalink / raw)
  To: ShuoX Liu, Andrew Morton
  Cc: linux-kernel, Borislav Petkov, Yanmin Zhang, Luck, Tony,
	Andrew Morton, andi, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner


* ShuoX Liu <shuox.liu@intel.com> wrote:

> From: ShuoX Liu <shuox.liu@intel.com>
> 
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
> 
> We hit it when running MTBF testing on Android ATOM mobiles.
> 
> Here in print_mce, we choose to disable printk recursion to make sure
> MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..6056e94 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
>  {
>  	int ret = 0;
>  
> +	printk_recursion_check_disable();
>  	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
>  	       m->extcpu, m->mcgstatus, m->bank, m->status);
>  
> @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
>  	 * (if the CPU has an implementation for that)
>  	 */
>  	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> -	if (ret == NOTIFY_STOP)
> +	if (ret == NOTIFY_STOP) {
> +		printk_recursion_check_enable();
>  		return;
> +	}
>  
>  	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> +	printk_recursion_check_enable();

Ok, this looks useful and it solves a real problem, but I'd 
prefer a better interface: instead of exposing the guts of 
printk to drivers in an unsafe manner (and allowing them to keep 
printk in an unsafe state indefinitely), shouldn't we instead 
introduce printk_emergency() (and variants) that just disable 
the recursion check, do the printk and then enable them?

That way drivers cannot possibly leave the recursion check 
disabled permanently and it would also be much more obvious 
*which* actual printk sites are affected by this exception.

In theory this could be achieved via a new, super-high-prio 
printk level: KERN_CRASH or so, which the core printk code could 
check - without introducing a new side-facility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-25  9:17                                             ` Ingo Molnar
@ 2012-06-25 13:14                                               ` Peter Zijlstra
  2012-06-26 20:43                                                 ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2012-06-25 13:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: ShuoX Liu, Andrew Morton, linux-kernel, Borislav Petkov,
	Yanmin Zhang, Luck, Tony, andi, Ingo Molnar, Thomas Gleixner

On Mon, 2012-06-25 at 11:17 +0200, Ingo Molnar wrote:
> * ShuoX Liu <shuox.liu@intel.com> wrote:
> 
> > From: ShuoX Liu <shuox.liu@intel.com>
> > 
> > On x86 machines, some times MCE happens just when kernel calls printk
> > to output some log info to serial console, while usually MCE module in
> > kernel is used to print out some hardware error information, such like
> > bad cache or bad memory bank. That causes printk recursion and printk
> > would omit MCE printk output.
> > 
> > We hit it when running MTBF testing on Android ATOM mobiles.
> > 
> > Here in print_mce, we choose to disable printk recursion to make sure
> > MCE logs printed out.
> > 
> > Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
> > Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 2afcbd2..6056e94 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
> >  {
> >  	int ret = 0;
> >  
> > +	printk_recursion_check_disable();
> >  	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
> >  	       m->extcpu, m->mcgstatus, m->bank, m->status);
> >  
> > @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
> >  	 * (if the CPU has an implementation for that)
> >  	 */
> >  	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> > -	if (ret == NOTIFY_STOP)
> > +	if (ret == NOTIFY_STOP) {
> > +		printk_recursion_check_enable();
> >  		return;
> > +	}
> >  
> >  	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> > +	printk_recursion_check_enable();
> 
> Ok, this looks useful and it solves a real problem, but I'd 
> prefer a better interface: instead of exposing the guts of 
> printk to drivers in an unsafe manner (and allowing them to keep 
> printk in an unsafe state indefinitely), shouldn't we instead 
> introduce printk_emergency() (and variants) that just disable 
> the recursion check, do the printk and then enable them?

I really don't see why you'd do anything like the above. For one the
oops_in_progress thing is exported, so you could simply use that, you're
going to panic the machine here anyway, right?

Furthermore the whole recursion stuff is broken anyway, see:
  http://marc.info/?l=linux-kernel&m=132446646923333

That also introduces recursive_printk() which you could (ab)use if
needed.

That said, its not entirely clear to me what context you're in when
you're calling this print_mce(), it looks like its only used from
mce_panic().

That already does bust_spinlocks() which already increments the
oops_in_progress thing, so WTF are you doing anyway?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-25 13:14                                               ` Peter Zijlstra
@ 2012-06-26 20:43                                                 ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-26 20:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, ShuoX Liu, Andrew Morton, linux-kernel,
	Borislav Petkov, Yanmin Zhang, Luck, Tony, andi, Ingo Molnar,
	Thomas Gleixner

On Mon, Jun 25, 2012 at 03:14:21PM +0200, Peter Zijlstra wrote:
> That already does bust_spinlocks() which already increments the
> oops_in_progress thing, so WTF are you doing anyway?

Hmm, that's interesting. So Shuox said they hit this in some sort of
"MTBF testing on Android ATOM mobiles":

http://marc.info/?l=linux-kernel&m=133903834107577&w=2

And I'm guessing they want this probably because they set fake_panic so
that we don't bust_spinlocks() in mce_panic().

Shoux, Yanmin, what's the dealio here?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface
  2012-06-22 23:41                         ` Andrew Morton
@ 2012-06-26 20:45                           ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2012-06-26 20:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: shuox.liu, linux-kernel, Borislav Petkov, Yanmin Zhang, andi,
	Tony Luck, Ingo Molnar

On Fri, Jun 22, 2012 at 04:41:43PM -0700, Andrew Morton wrote:
> A couple of things here.
> 
> a) mce_panic() has a "return" statement deep inside.  So we return
>    from mce_panic() with the recursion check disabled.  whoops.

Yeah, you're staring at v4 and we fixed this later (currently discussing
v7, please follow this thread :))

> b) adding a nice comment is nice.

Agreed, if the actual use case pans out and we really need this
interface...

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2012-06-26 20:46 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-23  1:58 [PATCH] printk: ignore recursion_bug flag when MCE in progress ShuoX Liu
2012-05-23 10:01 ` Borislav Petkov
2012-05-24  0:38   ` Yanmin Zhang
2012-05-24  5:59   ` [PATCH v2] printk: ignore recursion_bug flag in HW error handle process ShuoX Liu
2012-05-24  6:11     ` Borislav Petkov
2012-05-24 22:56       ` Andrew Morton
2012-05-25  0:30         ` Yanmin Zhang
2012-05-25  7:19           ` [PATCH 1/2] printk: add interface for disabling recursion check ShuoX Liu
2012-05-25  7:21             ` [PATCH 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-05-25  7:41               ` Borislav Petkov
2012-05-25  8:00                 ` ShuoX Liu
2012-05-28  2:07                 ` ShuoX Liu
2012-05-30  9:08                   ` Borislav Petkov
2012-05-31  0:30                     ` Yanmin Zhang
2012-06-04  3:04                     ` [PATCH v4 1/2] printk: add interface for disabling recursion check ShuoX Liu
2012-06-04  3:07                       ` [PATCH v4 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-06-22 23:41                         ` Andrew Morton
2012-06-26 20:45                           ` Borislav Petkov
2012-05-25 16:09             ` [PATCH 1/2] printk: add interface for disabling recursion check Luck, Tony
2012-05-28  0:30               ` Yanmin Zhang
2012-05-28  2:54                 ` [PATCH v3 " ShuoX Liu
2012-05-28  2:56                   ` [PATCH v3 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-06-04 17:12                     ` Borislav Petkov
2012-06-05  0:32                       ` Yanmin Zhang
2012-06-05  8:14                         ` Borislav Petkov
2012-06-05  9:53                           ` [PATCH v5 1/2] printk: add interface for disabling recursion check ShuoX Liu
2012-06-05  9:55                             ` [PATCH v5 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-06-05 15:15                               ` Borislav Petkov
2012-06-06  0:36                                 ` Yanmin Zhang
2012-06-06  8:31                                   ` [PATCH v6 1/2] printk: add interface for disabling recursion check ShuoX Liu
2012-06-06  8:34                                     ` [PATCH v6 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-06-06 15:22                                       ` Borislav Petkov
2012-06-07  2:13                                         ` Yanmin Zhang
2012-06-07  2:57                                         ` [PATCH v7 1/2] printk: add interface for disabling recursion check ShuoX Liu
2012-06-07  3:00                                           ` [PATCH v7 2/2] x86 mce: use new printk recursion disabling interface ShuoX Liu
2012-06-08 12:34                                             ` Borislav Petkov
2012-06-25  9:17                                             ` Ingo Molnar
2012-06-25 13:14                                               ` Peter Zijlstra
2012-06-26 20:43                                                 ` Borislav Petkov
2012-06-07 13:19                                           ` [PATCH v7 1/2] printk: add interface for disabling recursion check bing deng
2012-06-07 13:38                                             ` Borislav Petkov
2012-06-08 12:30                                           ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).