linux-edac.vger.kernel.org archive mirror
* [PATCH 0/2] Dump stack after certain machine checks
@ 2022-09-22 19:51 Tony Luck
  2022-09-22 19:51 ` [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel Tony Luck
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Tony Luck @ 2022-09-22 19:51 UTC
  To: Borislav Petkov
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel, Tony Luck

In general it isn't very useful to dump the kernel stack in the panic
from a fatal machine check. The problem is almost always hardware
related, so knowing how the kernel got to the routine that triggered the
machine check isn't useful.

But Linux now has the capability to recover from most user mode and a
few kernel mode memory related machine checks. Validation folks are
testing that out and occasionally bring a kernel log like this to me:

[69608.047771] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
[69608.021729] mce: [Hardware Error]: TSC 7874eb580177 ADDR 43bb84bd00 MISC 86 PPIN 9f061818e1a92082 
[69608.047773] Kernel panic - not syncing: Fatal local machine check
[69608.021720] mce: [Hardware Error]: RIP 10:<ffffffff8b767517> {copy_page+0x7/0x10}

All I can tell them is that Linux was copying a page and hit poison in
the source of the copy. But there are lots of reasons why Linux may be
copying a page. A stack trace would help figure out if:
1) the test was bad and just injected an error into the wrong location
2) an injected error sat around in memory and was later consumed

Case 2 will help identify places where Linux might use a "safe" copy
function that returns an error to the caller which may attempt some sort
of recovery.

Patch 1 cleans up the Intel severity calculation by using a new severity
table entry instead of some, now dubious, code to adjust the severity
for errors in kernel context.

Patch 2 adds a new severity level that triggers printing a stack trace.

I've only updated the Intel severity calculation to use this new
severity level. I'm not sure if AMD also has situations where this would
be useful. If so, then mce_severity_amd() would need to be updated too
to return different severity for IN_KERNEL and IN_KERNEL_RECOV cases.

I've tested this out on systems that do both broadcast and local machine
checks.

Tony Luck (2):
  x86/mce: Use severity table to handle uncorrected errors in kernel
  x86/mce: Dump the stack for recoverable machine checks in kernel
    context

 arch/x86/kernel/cpu/mce/internal.h |  1 +
 arch/x86/kernel/cpu/mce/core.c     | 11 +++++++++--
 arch/x86/kernel/cpu/mce/severity.c | 10 ++++++----
 3 files changed, 16 insertions(+), 6 deletions(-)


base-commit: 521a547ced6477c54b4b0cc206000406c221b4d6
-- 
2.37.3
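
The "safe" copy idea from the cover letter can be pictured with a small userspace model. This is only a sketch under stated assumptions: model_copy_safe(), the CHUNK granularity, the poisoned[] map and the "return the number of bytes not copied" convention are all hypothetical and chosen for illustration. Real poison is reported by hardware through a machine check, not through a lookup table.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 64	/* hypothetical poison granularity, e.g. one cache line */

/*
 * Copy len bytes from src to dst, stopping at the first chunk of src
 * that is marked poisoned. poisoned[i] != 0 marks chunk i as bad.
 *
 * Returns the number of bytes NOT copied, so 0 means success and any
 * other value lets the caller attempt some form of recovery instead
 * of crashing.
 */
static size_t model_copy_safe(void *dst, const void *src, size_t len,
			      const unsigned char *poisoned)
{
	size_t done = 0;

	while (done < len) {
		size_t n = len - done < CHUNK ? len - done : CHUNK;

		if (poisoned[done / CHUNK])
			break;	/* would have been a machine check */

		memcpy((char *)dst + done, (const char *)src + done, n);
		done += n;
	}

	return len - done;
}
```

A caller that gets a non-zero return knows how much of the destination is valid and can fail the operation with an error rather than consume the bad data. A stack trace from the panic path shows which callers are worth converting to this pattern.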



* [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel
  2022-09-22 19:51 [PATCH 0/2] Dump stack after certain machine checks Tony Luck
@ 2022-09-22 19:51 ` Tony Luck
  2022-09-22 19:51 ` [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context Tony Luck
  2022-10-31 10:30 ` [PATCH 0/2] Dump stack after certain machine checks Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Tony Luck @ 2022-09-22 19:51 UTC
  To: Borislav Petkov
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel, Tony Luck

mce_severity_intel() has a special case to promote UC and AR errors
in kernel context to PANIC severity.

The "AR" case is already handled with separate entries in the severity
table for all instruction fetch errors, and for those data fetch errors
that are not in a recoverable area of the kernel (i.e. an area that has
an extable fixup entry).

Add an entry to the severity table for UC errors in kernel context that
reports severity = PANIC. Delete the special case code from
mce_severity_intel().

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/severity.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index 00483d1c27e4..c4477162c07d 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -202,6 +202,11 @@ static struct severity {
 		PANIC, "Overflowed uncorrected",
 		BITSET(MCI_STATUS_OVER|MCI_STATUS_UC)
 		),
+	MCESEV(
+		PANIC, "Uncorrected in kernel",
+		BITSET(MCI_STATUS_UC),
+		KERNEL
+		),
 	MCESEV(
 		UC, "Uncorrected",
 		BITSET(MCI_STATUS_UC)
@@ -391,9 +396,6 @@ static noinstr int mce_severity_intel(struct mce *m, struct pt_regs *regs, char
 			*msg = s->msg;
 		s->covered = 1;
 
-		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL)
-			return MCE_PANIC_SEVERITY;
-
 		return s->sev;
 	}
 }
-- 
2.37.3
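
A note on why the new table entry sits where it does: the following is a minimal userspace sketch, assuming the severity table is scanned from top to bottom and the first matching entry wins. Under that assumption the new "Uncorrected in kernel" entry must come before the generic "Uncorrected" entry. The struct layout, the bit positions and the names rule, classify(), STATUS_* and CTX_* are simplified stand-ins invented for illustration; they are not the kernel's MCESEV() machinery.

```c
#include <stddef.h>

/* Simplified stand-ins; the kernel's real definitions differ. */
enum sev { SEV_UC, SEV_PANIC };
enum ctx { CTX_ANY, CTX_USER, CTX_KERNEL };

#define STATUS_OVER	(1u << 0)	/* hypothetical bit positions */
#define STATUS_UC	(1u << 1)

struct rule {
	enum sev sev;
	const char *msg;
	unsigned int mask;	/* status bits to examine */
	unsigned int result;	/* required value of those bits */
	enum ctx ctx;		/* CTX_ANY matches every context */
};

/* Order matters: more specific entries come first. */
static const struct rule rules[] = {
	{ SEV_PANIC, "Overflowed uncorrected",
	  STATUS_OVER | STATUS_UC, STATUS_OVER | STATUS_UC, CTX_ANY },
	{ SEV_PANIC, "Uncorrected in kernel",
	  STATUS_UC, STATUS_UC, CTX_KERNEL },
	{ SEV_UC, "Uncorrected",
	  STATUS_UC, STATUS_UC, CTX_ANY },
};

/* Return the first rule matching both status bits and context. */
static const struct rule *classify(unsigned int status, enum ctx ctx)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct rule *r = &rules[i];

		if ((status & r->mask) != r->result)
			continue;
		if (r->ctx != CTX_ANY && r->ctx != ctx)
			continue;
		return r;
	}

	return NULL;	/* no rule matched */
}
```

In this model, classify(STATUS_UC, CTX_KERNEL) selects "Uncorrected in kernel" with panic severity, while the same status in user context falls through to the generic "Uncorrected" entry. That is the behaviour the deleted special case used to provide for UC errors.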



* [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
  2022-09-22 19:51 [PATCH 0/2] Dump stack after certain machine checks Tony Luck
  2022-09-22 19:51 ` [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel Tony Luck
@ 2022-09-22 19:51 ` Tony Luck
  2022-10-31 16:44   ` Borislav Petkov
  2022-10-31 10:30 ` [PATCH 0/2] Dump stack after certain machine checks Borislav Petkov
  2 siblings, 1 reply; 9+ messages in thread
From: Tony Luck @ 2022-09-22 19:51 UTC
  To: Borislav Petkov
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel, Tony Luck

It isn't generally useful to dump the stack for a fatal machine check.
The error was detected by hardware when some parity or ECC check failed;
software isn't the problem.

But the kernel now has a few places where it can recover from a machine
check by treating it as an error, e.g. when copying parameters for system
calls from an application.

In order to ease the hunt for additional code flows where machine check
errors can be recovered it is useful to know, for example, why the
kernel was copying a page. Perhaps that code sequence can be modified to
handle machine checks as errors.

Add a new machine check severity value to indicate when a stack dump
may be useful.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/internal.h |  1 +
 arch/x86/kernel/cpu/mce/core.c     | 11 +++++++++--
 arch/x86/kernel/cpu/mce/severity.c |  2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 7e03f5b7f6bd..f03aaff79e39 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -18,6 +18,7 @@ enum severity_level {
 	MCE_UC_SEVERITY,
 	MCE_AR_SEVERITY,
 	MCE_PANIC_SEVERITY,
+	MCE_PANIC_STACKDUMP_SEVERITY,
 };
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..69ec63eaa625 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
+#include <linux/sched/debug.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -254,6 +255,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 			wait_for_panic();
 		barrier();
 
+		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
+			show_stack(NULL, NULL, KERN_DEFAULT);
+
 		bust_spinlocks(1);
 		console_verbose();
 	} else {
@@ -864,6 +868,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 					  struct pt_regs *regs)
 {
 	char *tmp = *msg;
+	int severity;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -876,9 +881,11 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 			quirk_sandybridge_ifu(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		severity = mce_severity(m, regs, &tmp, true);
+		if (severity >= MCE_PANIC_SEVERITY) {
 			mce_read_aux(m, i);
 			*msg = tmp;
+			m->severity = severity;
 			return 1;
 		}
 	}
@@ -994,7 +1001,7 @@ static void mce_reign(void)
 	 */
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
-		mce_severity(m, NULL, &msg, true);
+		m->severity = mce_severity(m, NULL, &msg, true);
 		mce_panic("Fatal machine check", m, msg);
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..89d083c5bd06 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -174,7 +174,7 @@ static struct severity {
 		USER
 		),
 	MCESEV(
-		PANIC, "Data load in unrecoverable area of kernel",
+		PANIC_STACKDUMP, "Data load in unrecoverable area of kernel",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		KERNEL
 		),
-- 
2.37.3
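
A note on the enum placement in the patch above: because MCE_PANIC_STACKDUMP_SEVERITY is appended after MCE_PANIC_SEVERITY, the existing ">= MCE_PANIC_SEVERITY" comparisons in mce_no_way_out() and mce_reign() still treat the new level as fatal, and only mce_panic() needs an equality test to decide on the stack dump. The sketch below models just that ordering argument. The enum is trimmed to the four members visible in the diff, and is_fatal() and wants_stack_dump() are hypothetical helper names, not kernel functions.

```c
#include <stdbool.h>

/* Trimmed copy: only the members visible in the diff, in the same order. */
enum severity_level {
	MCE_UC_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,
	MCE_PANIC_STACKDUMP_SEVERITY,
};

/* Mirrors the ">= MCE_PANIC_SEVERITY" tests used on the panic path. */
static bool is_fatal(enum severity_level sev)
{
	return sev >= MCE_PANIC_SEVERITY;
}

/* Mirrors the equality test added to mce_panic(). */
static bool wants_stack_dump(enum severity_level sev)
{
	return sev == MCE_PANIC_STACKDUMP_SEVERITY;
}
```

Both panic levels are fatal in this model, but only the new one requests a stack dump. Inserting the new value anywhere below MCE_PANIC_SEVERITY would have broken the first property.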



* Re: [PATCH 0/2] Dump stack after certain machine checks
  2022-09-22 19:51 [PATCH 0/2] Dump stack after certain machine checks Tony Luck
  2022-09-22 19:51 ` [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel Tony Luck
  2022-09-22 19:51 ` [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context Tony Luck
@ 2022-10-31 10:30 ` Borislav Petkov
  2022-11-01 17:36   ` Yazen Ghannam
  2 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2022-10-31 10:30 UTC
  To: Tony Luck, Yazen Ghannam
  Cc: Smita Koralahalli, Carlos Bilbao, x86, linux-edac, linux-kernel

On Thu, Sep 22, 2022 at 12:51:34PM -0700, Tony Luck wrote:
> I've only updated the Intel severity calculation to use this new
> severity level. I'm not sure if AMD also has situations where this would
> be useful. If so, then mce_severity_amd() would need to be updated too
> to return different severity for IN_KERNEL and IN_KERNEL_RECOV cases.

I'd look into Yazen's direction for that...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
  2022-09-22 19:51 ` [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context Tony Luck
@ 2022-10-31 16:44   ` Borislav Petkov
  2022-10-31 17:13     ` Luck, Tony
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2022-10-31 16:44 UTC
  To: Tony Luck
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel

On Thu, Sep 22, 2022 at 12:51:36PM -0700, Tony Luck wrote:
> @@ -254,6 +255,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  			wait_for_panic();
>  		barrier();
>  
> +		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
> +			show_stack(NULL, NULL, KERN_DEFAULT);

So this is kinda weird, IMO:

1. If the error has raised a MCE, then we will dump stack anyway.

2. If the error is the result of consuming poison or some other deferred
type which doesn't raise an exception immediately, then we have missed
it because we don't have the stack at the time the error got detected by
the hardware.

3. If all you wanna do is avoid useless stack traces, you can simply
ignore them. :)

IOW, it will dump stack in the cases we're interested in and it will
dump stack in a couple of other PANIC cases. So? We simply ignore the
latter.

But I don't see the point of adding code just so that we can suppress
the uninteresting ones...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
  2022-10-31 16:44   ` Borislav Petkov
@ 2022-10-31 17:13     ` Luck, Tony
  2022-10-31 18:36       ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2022-10-31 17:13 UTC
  To: Borislav Petkov
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel

> 1. If the error has raised a MCE, then we will dump stack anyway.

I don't see stack dumps for machine check panics. I don't have any non-standard
settings (I think). Nor do I see them in the panic messages that other folks send
to me.

Are you setting some CONFIG or command line option to get a stack dump?

-Tony


* Re: [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
  2022-10-31 17:13     ` Luck, Tony
@ 2022-10-31 18:36       ` Borislav Petkov
  2022-10-31 19:20         ` Luck, Tony
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2022-10-31 18:36 UTC
  To: Luck, Tony
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel

On Mon, Oct 31, 2022 at 05:13:10PM +0000, Luck, Tony wrote:
> > 1. If the error has raised a MCE, then we will dump stack anyway.
> 
> I don't see stack dumps for machine check panics. I don't have any non-standard
> settings (I think). Nor do I see them in the panic messages that other folks send
> to me.
> 
> Are you setting some CONFIG or command line option to get a stack dump?

Well, if one were sane, one would assume that one would expect to see a
stack dump when the machine panics, right? I mean, it is only fair...

And there's an attempt:

#ifdef CONFIG_DEBUG_BUGVERBOSE 
        /*
         * Avoid nested stack-dumping if a panic occurs during oops processing
         */
        if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
                dump_stack();
#endif

but that oops_in_progress thing is stopping us:

[   13.706764] mce: [Hardware Error]: CPU 2: Machine Check Exception: 6 Bank 4: fe000010000b0c0f
[   13.706781] mce: [Hardware Error]: RIP 10:<ffffffff8103bbcb> {trigger_mce+0xb/0x10}
[   13.706791] mce: [Hardware Error]: TSC c83826d14 ADDR e1101add1e550012 MISC cafebeef 
[   13.706795] mce: [Hardware Error]: PROCESSOR 2:a00f11 TIME 1667244167 SOCKET 0 APIC 2 microcode 1000065
[   13.706809] mce: [Hardware Error]: Machine check: Processor Context Corrupt
[   13.706810] panic: on entry: oops_in_progress: 1
[   13.706812] panic: before bust_spinlocks oops_in_progress: 1
[   13.706813] Kernel panic - not syncing: Fatal local machine check
[   13.706814] panic: taint: 0, oops_in_progress: 2
[   13.707133] Kernel Offset: disabled

as panic() is being entered with oops_in_progress already set to 1. That
oops_in_progress thing looks like it is being used for console unblanking.

Looking at

  026ee1f66aaa ("panic: fix stack dump print on direct call to panic()")

it hints that panic() might've been called twice for oops_in_progress to
be already 1 on entry.

I guess we need to figure out why that is...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
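
The counter values in the log above can be reproduced with a small userspace model. This is a sketch under assumptions, not a diagnosis: it assumes each bust_spinlocks(1) call increments oops_in_progress (consistent with the 1 -> 2 step in the log), and it assumes the bust_spinlocks(1) call visible in the mce_panic() diff context of patch 2 runs before panic() is entered. Under those assumptions the counter would already be 1 on entry without panic() having been called twice; that reading is not confirmed anywhere in this thread. All model_* names are hypothetical.

```c
#include <stdbool.h>

static int oops_in_progress;	/* model of the kernel global */

/* Assumed behaviour of bust_spinlocks(1): bump the counter. */
static void model_bust_spinlocks(void)
{
	++oops_in_progress;
}

/*
 * Model of the check quoted from panic(): the counter is bumped first,
 * then the stack is dumped only if the counter is still <= 1.
 */
static bool model_panic_dumps_stack(bool tainted_die)
{
	model_bust_spinlocks();
	return !tainted_die && oops_in_progress <= 1;
}

/* A direct call to panic(): counter starts at 0. */
static bool model_direct_panic(void)
{
	oops_in_progress = 0;
	return model_panic_dumps_stack(false);
}

/* A call through an mce_panic()-like wrapper that bumps the counter first. */
static bool model_mce_panic(void)
{
	oops_in_progress = 0;
	model_bust_spinlocks();		/* counter is 1 on entry to panic() */
	return model_panic_dumps_stack(false);
}
```

In the model, the direct path dumps the stack and the wrapper path does not, ending with the counter at 2 as in the log.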


* RE: [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
  2022-10-31 18:36       ` Borislav Petkov
@ 2022-10-31 19:20         ` Luck, Tony
  0 siblings, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2022-10-31 19:20 UTC
  To: Borislav Petkov
  Cc: Yazen Ghannam, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel

> Well, if one were sane, one would assume that one would expect to see a
> stack dump when the machine panics, right? I mean, it is only fair...

Stack dump from a machine check wasn't at all useful until h/w and Linux started
supporting recoverable machine checks. The stack dump is there to help diagnose
and fix s/w problems. But a machine check isn't a software problem.

So I was pretty happy with the status quo of not getting a stack dump from
a machine check panic.

With recoverable machine checks there are some cases where there might
be an opportunity to change the kernel to avoid a crash. See my patches that
akpm just took into the "mm" tree to recover when the kernel hits poison during
a copy-on-write:

https://lore.kernel.org/all/20221021200120.175753-1-tony.luck@intel.com/

or the patches from Google to recover when khugepaged hits poison:

https://lore.kernel.org/linux-mm/20221010160142.1087120-1-jiaqiyan@google.com/


To identify additional opportunities to make the kernel more resilient, it would be useful
to get a kernel stack trace in the specific case of a recoverable data consumption
machine check while executing in the kernel.

> And there's an attempt:
>
> #ifdef CONFIG_DEBUG_BUGVERBOSE
>         /*
>          * Avoid nested stack-dumping if a panic occurs during oops processing
>          */
>         if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
>                 dump_stack();
> #endif
>
> but that oops_in_progress thing is stopping us: 

...

> it hints that panic() might've been called twice for oops_in_progress to
> be already 1 on entry.
>
> I guess we need to figure out why that is...

It might be interesting, but a distraction from the goal of my patch to only
dump the stack for recoverable machine checks in kernel code.

-Tony


* Re: [PATCH 0/2] Dump stack after certain machine checks
  2022-10-31 10:30 ` [PATCH 0/2] Dump stack after certain machine checks Borislav Petkov
@ 2022-11-01 17:36   ` Yazen Ghannam
  0 siblings, 0 replies; 9+ messages in thread
From: Yazen Ghannam @ 2022-11-01 17:36 UTC
  To: Borislav Petkov
  Cc: Tony Luck, Smita Koralahalli, Carlos Bilbao, x86, linux-edac,
	linux-kernel

On Mon, Oct 31, 2022 at 11:30:03AM +0100, Borislav Petkov wrote:
> On Thu, Sep 22, 2022 at 12:51:34PM -0700, Tony Luck wrote:
> > I've only updated the Intel severity calculation to use this new
> > severity level. I'm not sure if AMD also has situations where this would
> > be useful. If so, then mce_severity_amd() would need to be updated too
> > to return different severity for IN_KERNEL and IN_KERNEL_RECOV cases.
> 
> I'd look into Yazen's direction for that...
>

Yes, I think this is something we can look into. I'm not aware of any
situations at the moment. But I'd like to start focusing more on the various
recovery paths and corner cases.

Thanks,
Yazen


Thread overview: 9+ messages
2022-09-22 19:51 [PATCH 0/2] Dump stack after certain machine checks Tony Luck
2022-09-22 19:51 ` [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel Tony Luck
2022-09-22 19:51 ` [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context Tony Luck
2022-10-31 16:44   ` Borislav Petkov
2022-10-31 17:13     ` Luck, Tony
2022-10-31 18:36       ` Borislav Petkov
2022-10-31 19:20         ` Luck, Tony
2022-10-31 10:30 ` [PATCH 0/2] Dump stack after certain machine checks Borislav Petkov
2022-11-01 17:36   ` Yazen Ghannam
