linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] x86: Do not panic if mce=2 is passed
From: Yinghai Lu @ 2016-09-16 20:23 UTC
  To: Tony Luck, Borislav Petkov
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, linux-edac,
	Yinghai Lu

From: Yinghai Lu <yinghai.lu@oracle.com>

For UE recovery support, we currently need mce=2 on the command line
and also need to disable panic_on_oops via sysctl.

But other users may still need panic_on_oops to always be set to 1.

We can remove the panic_on_oops check from the mce-severity path.

That should be OK: on the default path, when mce=2 is not passed,
tolerant is 0, so those users will still get MCE_PANIC_SEVERITY returned.

Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>


---
 arch/x86/kernel/cpu/mcheck/mce-severity.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce
 			*msg = s->msg;
 		s->covered = 1;
 		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
-			if (panic_on_oops || tolerant < 1)
+			if (tolerant < 1)
 				return MCE_PANIC_SEVERITY;
 		}
 		return s->sev;
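
To make the behavioral change concrete, here is a minimal user-space sketch of the graded branch before and after the patch. The enum values, the context type and the grade_before()/grade_after() helpers are simplified, hypothetical stand-ins for the kernel code; the only assumption carried over is the ordering MCE_UC_SEVERITY < MCE_AR_SEVERITY < MCE_PANIC_SEVERITY that the `>=` comparison in the diff relies on.

```c
#include <assert.h>

/*
 * Simplified subset of the MCE severity levels (hypothetical values).
 * Only the relative order UC < AR < PANIC matters for this sketch.
 */
enum severity {
	MCE_NO_SEVERITY,
	MCE_AO_SEVERITY,
	MCE_UC_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,
};

/* Where the error was consumed (simplified). */
enum context { IN_USER, IN_KERNEL };

/* Grading of a matched table entry as done before the patch. */
static enum severity grade_before(enum severity sev, enum context ctx,
				  int tolerant, int panic_on_oops)
{
	assert(tolerant >= 0 && tolerant <= 3);
	if (sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
		/* panic_on_oops alone was enough to force a panic */
		if (panic_on_oops || tolerant < 1)
			return MCE_PANIC_SEVERITY;
	}
	return sev;
}

/* Grading after the patch: only the tolerance level can force a panic. */
static enum severity grade_after(enum severity sev, enum context ctx,
				 int tolerant)
{
	assert(tolerant >= 0 && tolerant <= 3);
	if (sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
		if (tolerant < 1)
			return MCE_PANIC_SEVERITY;
	}
	return sev;
}
```

With tolerant = 2 (mce=2) and panic_on_oops = 1, an in-kernel MCE_UC_SEVERITY error is promoted to MCE_PANIC_SEVERITY by grade_before() but stays MCE_UC_SEVERITY with grade_after(); with tolerant = 0 both return MCE_PANIC_SEVERITY.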

* RE: [RFC PATCH] x86: Do not panic if mce=2 is passed
From: Luck, Tony @ 2016-09-16 20:28 UTC
  To: Yinghai Lu, Borislav Petkov
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, linux-edac,
	Yinghai Lu

> For UE recovery support, we currently need mce=2 on the command line
> and also need to disable panic_on_oops via sysctl.

Please explain. I've never given mce=2 on the command line, and have
had my kernel recover from thousands of (injected) UE memory errors.

-Tony

* Re: [RFC PATCH] x86: Do not panic if mce=2 is passed
From: Borislav Petkov @ 2016-09-18 18:39 UTC
  To: Luck, Tony
  Cc: Yinghai Lu, the arch/x86 maintainers, Linux Kernel Mailing List,
	linux-edac, Yinghai Lu

On Fri, Sep 16, 2016 at 08:28:44PM +0000, Luck, Tony wrote:
> > For UE recovery support, we currently need mce=2 on the command line
> > and also need to disable panic_on_oops via sysctl.
> 
> Please explain. I've never given mce=2 on the command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.

So frankly, that panic_on_oops doesn't make a whole lotta sense to me.

It promotes MCEs with severity MCE_UC_SEVERITY and higher that hit in
kernel context (ctx == IN_KERNEL) to a panic.

So let's look at those:

	MCE_UC_SEVERITY,	- we don't do anything special in the kernel for
				those so just as well.
	MCE_AR_SEVERITY,	- those end up in the memory failure code if
				they're memory errors
	MCE_PANIC_SEVERITY,	- causes panic

so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:

	 * Tolerant levels:
	 * 0: always panic on uncorrected errors, log corrected errors
	 * 1: panic or SIGBUS on uncorrected errors, log corrected errors
	 * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
	 * 3: never panic or SIGBUS, log all errors (for testing only)

IOW, I think that patch makes sense, but please double-check my logic
above first.
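
The tolerant levels quoted above can be restated as a small lookup. This is a hypothetical sketch (struct uc_policy and uc_policy_for() are not kernel code) that only transcribes the comment, to show that whether a panic is mandatory for an uncorrected error is already decided by the tolerance level alone.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical transcription of the "Tolerant levels" comment:
 * which outcomes each level allows for an uncorrected error.
 */
struct uc_policy {
	bool always_panic; /* panic on every uncorrected error */
	bool may_sigbus;   /* the consuming task may get SIGBUS instead */
	bool may_log_only; /* the error may just be logged (if possible) */
	bool never_panic;  /* panic is never an outcome (testing only) */
};

static struct uc_policy uc_policy_for(int tolerant)
{
	assert(tolerant >= 0 && tolerant <= 3);

	switch (tolerant) {
	case 0:  /* always panic */
		return (struct uc_policy){ .always_panic = true };
	case 1:  /* panic or SIGBUS */
		return (struct uc_policy){ .may_sigbus = true };
	case 2:  /* SIGBUS or log (if possible) */
		return (struct uc_policy){ .may_sigbus = true,
					   .may_log_only = true };
	default: /* 3: never panic or SIGBUS, log all errors */
		return (struct uc_policy){ .may_log_only = true,
					   .never_panic = true };
	}
}
```

Only level 0 makes a panic mandatory, which matches the `tolerant < 1` test that the patch keeps; an additional panic_on_oops check would override levels 2 and 3, where the comment prefers SIGBUS or logging.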

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [RFC PATCH] x86: Do not panic if mce=2 is passed
From: Borislav Petkov @ 2016-10-31 10:57 UTC
  To: Yinghai Lu
  Cc: Tony Luck, the arch/x86 maintainers, Linux Kernel Mailing List,
	linux-edac, Yinghai Lu

On Fri, Sep 16, 2016 at 01:23:25PM -0700, Yinghai Lu wrote:
> From: Yinghai Lu <yinghai.lu@oracle.com>
> 
> For UE recovery support, we currently need mce=2 on the command line
> and also need to disable panic_on_oops via sysctl.
> 
> But other users may still need panic_on_oops to always be set to 1.
> 
> We can remove the panic_on_oops check from the mce-severity path.
> 
> That should be OK: on the default path, when mce=2 is not passed,
> tolerant is 0, so those users will still get MCE_PANIC_SEVERITY returned.
> 
> Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
> 
> 
> ---
>  arch/x86/kernel/cpu/mcheck/mce-severity.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce
>  			*msg = s->msg;
>  		s->covered = 1;
>  		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
> -			if (panic_on_oops || tolerant < 1)
> +			if (tolerant < 1)
>  				return MCE_PANIC_SEVERITY;
>  		}
>  		return s->sev;

Applied,
thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

* [tip:ras/core] x86/MCE: Do not look at panic_on_oops in the severity grading
From: tip-bot for Yinghai Lu @ 2016-11-08 16:18 UTC
  To: linux-tip-commits
  Cc: linux-kernel, mingo, tony.luck, hpa, linux-edac, tglx, x86,
	yinghai.lu, bp

Commit-ID:  f5e886ef9b45a3dbfd42b054a13c755894ea8402
Gitweb:     http://git.kernel.org/tip/f5e886ef9b45a3dbfd42b054a13c755894ea8402
Author:     Yinghai Lu <yinghai.lu@oracle.com>
AuthorDate: Fri, 16 Sep 2016 13:23:25 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 8 Nov 2016 17:10:12 +0100

x86/MCE: Do not look at panic_on_oops in the severity grading

The MCE tolerance levels control whether we panic on a machine check or do
something else like generating a signal and logging error information. This
is controlled by the mce=<level> command line parameter.

However, if panic_on_oops is set, it will force a panic for such an MCE
even though the user didn't want to.

So don't check panic_on_oops in the severity grading anymore.

One of the use cases for that is recovery from uncorrectable errors with
mce=2.

 [ Boris: rewrite commit message. ]

Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20160916202325.4972-1-yinghai@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 631356c..c7efbcf 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce *m, int tolerant, char **msg, bool is_e
 			*msg = s->msg;
 		s->covered = 1;
 		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
-			if (panic_on_oops || tolerant < 1)
+			if (tolerant < 1)
 				return MCE_PANIC_SEVERITY;
 		}
 		return s->sev;
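
The commit message says the tolerance level is chosen with the mce=<level> command line parameter. As an illustration only, the hypothetical helper below shows how a numeric value such as mce=2 maps onto the 0..3 levels discussed in this thread. It is not the kernel's option parser, which also accepts non-numeric options such as mce=off.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the numeric part of an "mce=" value such as
 * "2" into a tolerance level. Anything after a comma is ignored in this
 * sketch; non-numeric mce= options are rejected rather than handled.
 */
static bool parse_mce_level(const char *arg, int *tolerant)
{
	char *end;
	long val;

	if (arg == NULL || !isdigit((unsigned char)arg[0]))
		return false;           /* not a numeric mce= value */

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || (*end != '\0' && *end != ','))
		return false;           /* overflow or trailing garbage */
	if (val > 3)
		return false;           /* outside the 0..3 levels listed above */

	*tolerant = (int)val;
	assert(*tolerant >= 0 && *tolerant <= 3);
	return true;
}
```

For example, "2" yields tolerant = 2, while "4" and "off" are rejected by this sketch.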
