Linux-EDAC Archive on lore.kernel.org
* [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

Hi.

Here is a series of smallish fixes/cleanups for the handling of MCEs.

Note that, except for patches 2 and 4, the patches are only compile-tested.

Regards
Jan

Jan H. Schönherr (6):
  x86/mce: Take action on UCNA/Deferred errors again
  x86/mce: Make mce=nobootlog work again
  x86/mce: Fix possibly incorrect severity calculation on AMD
  x86/mce: Fix handling of optional message string
  x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
  x86/mce: Remove mce_inject_log() in favor of mce_log()

 arch/x86/kernel/cpu/mce/core.c     | 59 ++++++++++++------------------
 arch/x86/kernel/cpu/mce/inject.c   |  2 +-
 arch/x86/kernel/cpu/mce/internal.h |  2 -
 3 files changed, 25 insertions(+), 38 deletions(-)

-- 
2.22.0.3.gb49bb57c8208.dirty



* [PATCH 1/6] x86/mce: Take action on UCNA/Deferred errors again
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

Linux 3.19 commit fa92c5869426 ("x86, mce: Support memory error recovery
for both UCNA and Deferred error in machine_check_poll") added handling
of UCNA and Deferred errors by adding them to the ring for SRAO errors.

Later, Linux 4.3 commit fd4cf79fcc4b ("x86/mce: Remove the MCE ring for
Action Optional errors") switched storage from the SRAO ring to the
unified pool that is still in use today. In order to only act on the
intended errors, a filter for MCE_AO_SEVERITY was used -- effectively
removing handling of UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors once more.

Fixes: fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 743370ee4983..d5a8b99f7ba3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -595,14 +595,16 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 	struct mce *mce = (struct mce *)data;
 	unsigned long pfn;
 
-	if (!mce)
+	if (!mce || !mce_usable_address(mce))
 		return NOTIFY_DONE;
 
-	if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
-		pfn = mce->addr >> PAGE_SHIFT;
-		if (!memory_failure(pfn, 0))
-			set_mce_nospec(pfn);
-	}
+	if (mce->severity != MCE_AO_SEVERITY &&
+	    mce->severity != MCE_DEFERRED_SEVERITY)
+		return NOTIFY_DONE;
+
+	pfn = mce->addr >> PAGE_SHIFT;
+	if (!memory_failure(pfn, 0))
+		set_mce_nospec(pfn);
 
 	return NOTIFY_OK;
 }
-- 
2.22.0.3.gb49bb57c8208.dirty

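For readers following the severity logic, here is a minimal userspace sketch of the filter order in the patched notifier. The enum, struct and function names are hypothetical stand-ins, not kernel API. The sketch assumes (as we understand the kernel's severity enum of this period) that the UCNA grade is an alias of the deferred grade, which is why checking MCE_DEFERRED_SEVERITY covers both UCNA and Deferred errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's severity grades.
 * Assumption: UCNA aliases the deferred grade.
 */
enum sketch_sev {
	SKETCH_NO_SEV,
	SKETCH_DEFERRED_SEV,
	SKETCH_UCNA_SEV = SKETCH_DEFERRED_SEV,
	SKETCH_KEEP_SEV,
	SKETCH_AO_SEV,
	SKETCH_AR_SEV,
	SKETCH_PANIC_SEV,
};

/* Hypothetical record holding only what the filter inspects. */
struct sketch_mce {
	enum sketch_sev severity;
	bool usable_addr;	/* stands in for mce_usable_address() */
};

/*
 * Mirrors the early returns of the patched srao_decode_notifier():
 * true means the page would be handed to memory_failure().
 */
static bool sketch_should_offline(const struct sketch_mce *m)
{
	if (m == NULL || !m->usable_addr)
		return false;

	return m->severity == SKETCH_AO_SEV ||
	       m->severity == SKETCH_DEFERRED_SEV;
}
```

In this model, a record graded SKETCH_UCNA_SEV with a usable address now passes the filter. Before the patch, only SKETCH_AO_SEV did.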


* [PATCH 2/6] x86/mce: Make mce=nobootlog work again
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

Since Linux 4.5 commit 8b38937b7ab5 ("x86/mce: Do not enter deferred
errors into the generic pool twice") the mce=nobootlog option has become
mostly ineffective (after being only slightly ineffective before), as
the code is taking actions on MCEs left over from boot when they have a
usable address.

Move the check for MCP_DONTLOG a bit outward to make it effective again.

Also, since Linux 4.12 commit 011d82611172 ("RAS: Add a Corrected Errors
Collector") the two branches of the remaining "if" at the bottom of
machine_check_poll() do the same. Unify them.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d5a8b99f7ba3..81ab25d5357a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -760,24 +760,17 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 log_it:
 		error_seen = true;
 
-		mce_read_aux(&m, i);
-
-		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
-
-		/*
-		 * Don't get the IP here because it's unlikely to
-		 * have anything to do with the actual error location.
-		 */
-		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
-			mce_log(&m);
-		else if (mce_usable_address(&m)) {
+		if (!(flags & MCP_DONTLOG)) {
+			mce_read_aux(&m, i);
+			m.severity = mce_severity(&m, mca_cfg.tolerant, NULL,
+						  false);
 			/*
-			 * Although we skipped logging this, we still want
-			 * to take action. Add to the pool so the registered
-			 * notifiers will see it.
+			 * Don't get the IP here because it's unlikely to
+			 * have anything to do with the actual error location.
 			 */
-			if (!mce_gen_pool_add(&m))
-				mce_schedule_work();
+
+			if (!mca_cfg.dont_log_ce || mce_usable_address(&m))
+				mce_log(&m);
 		}
 
 		/*
-- 
2.22.0.3.gb49bb57c8208.dirty

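The decision the patch leaves at the bottom of machine_check_poll() can be modelled as a small pure function. This is a sketch, not kernel code: SKETCH_MCP_DONTLOG and its value, the enum and the function name are all hypothetical. The parameters dont_log_ce and usable_addr stand in for mca_cfg.dont_log_ce and mce_usable_address(). Following the commit message, the sketch assumes that the former "add to the pool" branch and mce_log() now do the same thing, so both collapse into a single outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; only its role (suppress logging) matters. */
#define SKETCH_MCP_DONTLOG 0x1u

enum sketch_poll_action { SKETCH_SKIP, SKETCH_LOG };

/*
 * Model of the patched logic: with the "don't log" flag set, nothing is
 * read, graded or queued. Otherwise the record is logged unless
 * corrected-error logging is disabled AND the address is unusable.
 */
static enum sketch_poll_action sketch_poll_decide(unsigned int flags,
						  bool dont_log_ce,
						  bool usable_addr)
{
	if (flags & SKETCH_MCP_DONTLOG)
		return SKETCH_SKIP;

	if (!dont_log_ce || usable_addr)
		return SKETCH_LOG;

	return SKETCH_SKIP;
}
```

As we understand it, mce=nobootlog makes the boot-time poll pass MCP_DONTLOG. In this model, every leftover record then yields SKETCH_SKIP, which is the behaviour the patch restores. The model leaves out clearing the bank's status MSR, which happens in both the original and the reworked versions.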


* [PATCH 3/6] x86/mce: Fix possibly incorrect severity calculation on AMD
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

The function mce_severity_amd_smca() requires m->bank to be initialized
for correct operation. Fix the one case where mce_severity() is called
without doing so.

Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 81ab25d5357a..6afb9de251f2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -809,8 +809,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 		if (quirk_no_way_out)
 			quirk_no_way_out(i, m, regs);
 
+		m->bank = i;
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			m->bank = i;
 			mce_read_aux(m, i);
 			*msg = tmp;
 			return 1;
-- 
2.22.0.3.gb49bb57c8208.dirty

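A toy bank scan shows why the position of `m->bank = i` matters. Everything here is hypothetical: the struct, the grader and its "uncorrected errors in bank 2 are fatal" policy exist only to make the grader depend on the bank index. According to the commit message, mce_severity_amd_smca() also depends on the bank index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record: just the bank index and an "uncorrected" bit. */
struct sketch_rec {
	int bank;
	bool uc;
};

/*
 * Toy grader that reads r->bank. The policy is invented for
 * illustration: 0 = nothing, 1 = recoverable, 2 = panic.
 */
static int sketch_grade(const struct sketch_rec *r)
{
	if (!r->uc)
		return 0;
	return r->bank == 2 ? 2 : 1;
}

/*
 * Scan banks in the style of mce_no_way_out(). The bank index is stored
 * BEFORE grading, which is the ordering the fix establishes. Returns the
 * first bank graded fatal, or -1 if there is none.
 */
static int sketch_first_fatal_bank(const bool *uc, int nbanks)
{
	struct sketch_rec r = { .bank = -1, .uc = false };

	for (int i = 0; i < nbanks; i++) {
		r.uc = uc[i];
		r.bank = i;
		if (sketch_grade(&r) >= 2)
			return i;
	}
	return -1;
}
```

Suppose `r.bank = i` were moved inside the `if`, as in the pre-patch code. The grader would then only ever see -1, and in this model the bank 2 error would never be graded fatal.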


* [PATCH 4/6] x86/mce: Fix handling of optional message string
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

The function mce_severity() is not required to update its msg argument.
In fact, mce_severity_amd() doesn't. Fix some code paths that assume
that it is always updated.

In particular, this avoids returning uninitialized data in
mce_no_way_out(), which may be used later for printing.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6afb9de251f2..b11a74e3fea9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -809,10 +809,12 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 		if (quirk_no_way_out)
 			quirk_no_way_out(i, m, regs);
 
+		tmp = NULL;
 		m->bank = i;
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
 			mce_read_aux(m, i);
-			*msg = tmp;
+			if (tmp)
+				*msg = tmp;
 			return 1;
 		}
 	}
@@ -1313,6 +1315,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
+			msg = "Unknown";
 			mce_severity(&m, cfg->tolerant, &msg, true);
 			mce_panic("Local fatal machine check!", &m, msg);
 		}
-- 
2.22.0.3.gb49bb57c8208.dirty

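Here is a self-contained sketch of the caller-side pattern the patch adopts for an optional out-parameter. The grader and the wrapper are hypothetical stand-ins for mce_severity() and mce_no_way_out(), and the message string is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical grader that, like mce_severity_amd() per the commit
 * message, may return a grade without writing *msg.
 */
static int sketch_grade_msg(int fatal, int sets_msg, const char **msg)
{
	if (sets_msg)
		*msg = "hypothetical grader message";
	return fatal ? 2 : 0;
}

/*
 * Caller pattern from the patch: reset the temporary before each call
 * and overwrite the caller's *msg only if the grader produced one.
 */
static int sketch_no_way_out(int fatal, int sets_msg, const char **msg)
{
	const char *tmp = NULL;

	if (sketch_grade_msg(fatal, sets_msg, &tmp) >= 2) {
		if (tmp)
			*msg = tmp;
		return 1;
	}
	return 0;
}
```

If the grader stays silent, the caller's previous message survives instead of being replaced by an uninitialized pointer.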


* [PATCH 5/6] x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

In commit b2f9d678e28c ("x86/mce: Check for faults tagged in
EXTABLE_CLASS_FAULT exception table entries") another call to mce_panic()
was introduced. Pass the message of the handled MCE to that instance
of mce_panic() as well, as there doesn't seem to be a reason not to.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b11a74e3fea9..677e9079e5ba 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1351,7 +1351,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		ist_end_non_atomic();
 	} else {
 		if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
-			mce_panic("Failed kernel mode recovery", &m, NULL);
+			mce_panic("Failed kernel mode recovery", &m, msg);
 	}
 
 out_ist:
-- 
2.22.0.3.gb49bb57c8208.dirty



* [PATCH 6/6] x86/mce: Remove mce_inject_log() in favor of mce_log()
From: Jan H. Schönherr @ 2019-12-10  0:07 UTC
  To: Tony Luck, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

The mutex in mce_inject_log() became unnecessary with Linux 4.12 commit
5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),
though the original reason for its presence only vanished with Linux 4.14
commit 7298f08ea887 ("x86/mcelog: Get rid of RCU remnants").

Drop the mutex. And as that makes mce_inject_log() identical to mce_log(),
get rid of the former in favor of the latter.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 arch/x86/kernel/cpu/mce/core.c     | 11 +----------
 arch/x86/kernel/cpu/mce/inject.c   |  2 +-
 arch/x86/kernel/cpu/mce/internal.h |  2 --
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 677e9079e5ba..44cccae097cb 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -53,8 +53,6 @@
 
 #include "internal.h"
 
-static DEFINE_MUTEX(mce_log_mutex);
-
 /* sysfs synchronization */
 static DEFINE_MUTEX(mce_sysfs_mutex);
 
@@ -156,14 +154,7 @@ void mce_log(struct mce *m)
 	if (!mce_gen_pool_add(m))
 		irq_work_queue(&mce_irq_work);
 }
-
-void mce_inject_log(struct mce *m)
-{
-	mutex_lock(&mce_log_mutex);
-	mce_log(m);
-	mutex_unlock(&mce_log_mutex);
-}
-EXPORT_SYMBOL_GPL(mce_inject_log);
+EXPORT_SYMBOL_GPL(mce_log);
 
 static struct notifier_block mce_srao_nb;
 
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 1f30117b24ba..3413b41b8d55 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -494,7 +494,7 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;
 
 	if (inj_type == SW_INJ) {
-		mce_inject_log(&i_mce);
+		mce_log(&i_mce);
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 43031db429d2..1eb1a9343188 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -78,8 +78,6 @@ static inline int apei_clear_mce(u64 record_id)
 }
 #endif
 
-void mce_inject_log(struct mce *m);
-
 /*
  * We consider records to be equivalent if bank+status+addr+misc all match.
  * This is only used when the system is going down because of a fatal error
-- 
2.22.0.3.gb49bb57c8208.dirty



* Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
From: Luck, Tony @ 2019-12-11  0:25 UTC
  To: Jan H. Schönherr
  Cc: Borislav Petkov, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

On Tue, Dec 10, 2019 at 01:07:27AM +0100, Jan H. Schönherr wrote:
> Hi.
> 
> Here is a series of smallish fixes/cleanups for the handling of MCEs.
> 
> Note that, except for patches 2 and 4, the patches are only compile-tested.

I tried some UC injections with these patches applied. Stuff
still works. But thanks to the UCNA/Deferred patch I see:

	Memory failure: 0x5fe3284: already hardware poisoned

which is expected because on Intel we see an SRAR error in bank 1
and the machine check takes the page offline.  Then we see a UCNA
from another bank for the same address and try to take the same
page offline again.

This is a return to the old behavior, but might surprise some folks.

One nit-pick - I think we should rename the srao_decode_notifier()
function since it now handles both SRAO and UCNA. Not sure what a
good name is ... something like "uc_decode_notifier()" since it takes
action for multiple classes of uncorrected errors?

All the patches look OK to me (modulo the above rename suggestion).
So consider this a "Reviewed-by" for all six.

Thanks in particular for hunting down what happened to the UCNA
action ... I was looking for it a couple of months ago and didn't
have time to complete the search.

-Tony


* Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
From: Jan H. Schönherr @ 2019-12-12 12:25 UTC
  To: Luck, Tony
  Cc: Borislav Petkov, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

On 11/12/2019 01.25, Luck, Tony wrote:
> On Tue, Dec 10, 2019 at 01:07:27AM +0100, Jan H. Schönherr wrote:
>> Hi.
>>
>> Here is a series of smallish fixes/cleanups for the handling of MCEs.
>>
>> Note that, except for patches 2 and 4, the patches are only compile-tested.
> 
> I tried some UC injections with these patches applied. Stuff
> still works. But thanks to the UCNA/Deferred patch I see:
> 
> 	Memory failure: 0x5fe3284: already hardware poisoned
> 
> which is expected because on Intel we see an SRAR error in bank 1
> and the machine check takes the page offline.  Then we see a UCNA
> from another bank for the same address and try to take the same
> page offline again.

I'm not a subject matter expert, but there are cases where we
get only a UCNA, right? So the change still makes sense?

> This is a return to the old behavior, but might surprise some folks.

I'll add this as a note to the commit message.

> One nit-pick - I think we should rename the srao_decode_notifier()
> function since it now handles both SRAO and UCNA. Not sure what a
> good name is ... something like "uc_decode_notifier()" since it takes
> action for multiple classes of uncorrected errors?

This and names like "uncorrected_memory_error_notifier()" seem to imply
a wider scope than the function actually has. That brings me to another
question: should the scope be wider?

Instead of filtering for usable addresses and specific severities, we
could for example filter for (similar to cec_add_mce()):

  mce_is_memory_error(m) &&
  !mce_is_correctable(m) &&
  mce_usable_address(m)

Would that make sense? Or does that violate something I'm not aware of?

> All the patches look OK to me (modulo the above rename suggestion).
> So consider this a "Reviewed-by" for all six.

Thank you for reviewing. :)

Regards
Jan


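Here is the wider filter floated above, written out as a predicate. The struct and its fields are hypothetical. In the kernel, the three inputs would come from mce_is_memory_error(), mce_is_correctable() and mce_usable_address(), the helpers named in the mail. Unlike the severity-based check in patch 1, this predicate ignores the graded severity altogether, which is exactly the scope question being asked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded properties of one error record. */
struct sketch_err_props {
	bool memory_error;	/* mce_is_memory_error() */
	bool correctable;	/* mce_is_correctable() */
	bool usable_addr;	/* mce_usable_address() */
};

/* Act on any uncorrected memory error that carries a usable address. */
static bool sketch_wide_filter(const struct sketch_err_props *p)
{
	return p->memory_error && !p->correctable && p->usable_addr;
}
```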

* Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
From: Borislav Petkov @ 2019-12-16 16:52 UTC
  To: Jan H. Schönherr, Yazen Ghannam
  Cc: Luck, Tony, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

On Thu, Dec 12, 2019 at 01:25:31PM +0100, Jan H. Schönherr wrote:
> This and names like "uncorrected_memory_error_notifier()" seem to imply
> a wider scope than the function actually has. That brings me to another
> question: should the scope be wider?
> 
> Instead of filtering for usable addresses and specific severities, we
> could for example filter for (similar to cec_add_mce()):
> 
>   mce_is_memory_error(m) &&
>   !mce_is_correctable(m) &&
>   mce_usable_address(m)

There's a comment above that code which says what that function wants:

	/* We eat only correctable DRAM errors with usable addresses. */

> Would that make sense? Or does that violate something I'm not aware of?

So this should be a decision of the two CPU vendors basically answering
the question: for which error severities do you want the kernel to poison
pages?

Basically a question for Tony and Yazen. CCed.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 2/6] x86/mce: Make mce=nobootlog work again
From: Borislav Petkov @ 2019-12-16 17:15 UTC
  To: Jan H. Schönherr
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On Tue, Dec 10, 2019 at 01:07:29AM +0100, Jan H. Schönherr wrote:
> Since Linux 4.5 commit 8b38937b7ab5 ("x86/mce: Do not enter deferred

You don't have to go figure out the kernel version each time you quote
a commit - most people should be able to do git describe or git tag
--contains :)

> errors into the generic pool twice") the mce=nobootlog option has become
> mostly ineffective (after being only slightly ineffective before), as
> the code is taking actions on MCEs left over from boot when they have a
> usable address.
> 
> Move the check for MCP_DONTLOG a bit outward to make it effective again.
> 
> Also, since Linux 4.12 commit 011d82611172 ("RAS: Add a Corrected Errors
> Collector") the two branches of the remaining "if" at the bottom of
> machine_check_poll() do the same. Unify them.
> 
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 25 +++++++++----------------
>  1 file changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index d5a8b99f7ba3..81ab25d5357a 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -760,24 +760,17 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  log_it:
>  		error_seen = true;
>  
> -		mce_read_aux(&m, i);
> -
> -		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> -
> -		/*
> -		 * Don't get the IP here because it's unlikely to
> -		 * have anything to do with the actual error location.
> -		 */
> -		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
> -			mce_log(&m);
> -		else if (mce_usable_address(&m)) {
> +		if (!(flags & MCP_DONTLOG)) {

I hate that double-negation logic we have in the code. :-\

	if (! ... DONT...

Can you pls flip the logic here?

	if (flags & MCP_DONTLOG)
		goto clear_bank;

	/* logging code */

clear_bank:
	mce_wrmsrl(msr_ops.status(i), 0);

This way you'll save an indentation level too. Something like this (I
took your patch and mangled it):

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5f42f25bac8f..2b43caaba70d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -763,29 +763,20 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 log_it:
 		error_seen = true;
 
-		mce_read_aux(&m, i);
+		if (flags & MCP_DONTLOG)
+			goto clear_bank;
 
+		mce_read_aux(&m, i);
 		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		/*
-		 * Don't get the IP here because it's unlikely to
-		 * have anything to do with the actual error location.
+		 * Don't get the IP here because it's unlikely to have anything
+		 * to do with the actual error location.
 		 */
-		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
+		if (!mca_cfg.dont_log_ce || mce_usable_address(&m))
 			mce_log(&m);
-		else if (mce_usable_address(&m)) {
-			/*
-			 * Although we skipped logging this, we still want
-			 * to take action. Add to the pool so the registered
-			 * notifiers will see it.
-			 */
-			if (!mce_gen_pool_add(&m))
-				mce_schedule_work();
-		}
 
-		/*
-		 * Clear state for this bank.
-		 */
+clear_bank:
 		mce_wrmsrl(msr_ops.status(i), 0);
 	}
 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 3/6] x86/mce: Fix possibly incorrect severity calculation on AMD
From: Borislav Petkov @ 2019-12-16 17:26 UTC
  To: Jan H. Schönherr, Yazen Ghannam
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On Tue, Dec 10, 2019 at 01:07:30AM +0100, Jan H. Schönherr wrote:
> The function mce_severity_amd_smca() requires m->bank to be initialized
> for correct operation. Fix the one case where mce_severity() is called
> without doing so.
> 
> Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
> Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 81ab25d5357a..6afb9de251f2 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -809,8 +809,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>  		if (quirk_no_way_out)
>  			quirk_no_way_out(i, m, regs);
>  
> +		m->bank = i;
>  		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
> -			m->bank = i;
>  			mce_read_aux(m, i);
>  			*msg = tmp;
>  			return 1;
> -- 

Good catch. This should go to Linus now.

Yazen, any objections?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH 3/6] x86/mce: Fix possibly incorrect severity calculation on AMD
From: Ghannam, Yazen @ 2019-12-16 17:35 UTC
  To: Borislav Petkov, Jan H. Schönherr
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, December 16, 2019 12:26 PM
> To: Jan H. Schönherr <jschoenh@amazon.de>; Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: Tony Luck <tony.luck@intel.com>; linux-edac@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar
> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; x86@kernel.org
> Subject: Re: [PATCH 3/6] x86/mce: Fix possibly incorrect severity calculation on AMD
> 
> On Tue, Dec 10, 2019 at 01:07:30AM +0100, Jan H. Schönherr wrote:
> > The function mce_severity_amd_smca() requires m->bank to be initialized
> > for correct operation. Fix the one case where mce_severity() is called
> > without doing so.
> >
> > Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
> > Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
> > Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 81ab25d5357a..6afb9de251f2 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -809,8 +809,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
> >  		if (quirk_no_way_out)
> >  			quirk_no_way_out(i, m, regs);
> >
> > +		m->bank = i;
> >  		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
> > -			m->bank = i;
> >  			mce_read_aux(m, i);
> >  			*msg = tmp;
> >  			return 1;
> > --
> 
> Good catch. This should go to Linus now.
> 
> Yazen, any objections?
> 

No objections.

Thanks,
Yazen


* Re: [PATCH 4/6] x86/mce: Fix handling of optional message string
From: Borislav Petkov @ 2019-12-16 17:37 UTC
  To: Jan H. Schönherr
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On Tue, Dec 10, 2019 at 01:07:31AM +0100, Jan H. Schönherr wrote:
> The function mce_severity() is not required to update its msg argument.
> In fact, mce_severity_amd() doesn't. Fix some code paths that assume
> that it is always updated.
> 
> In particular, this avoids returning uninitialized data in
> mce_no_way_out(), which may be used later for printing.
> 
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 6afb9de251f2..b11a74e3fea9 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -809,10 +809,12 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>  		if (quirk_no_way_out)
>  			quirk_no_way_out(i, m, regs);
>  
> +		tmp = NULL;
>  		m->bank = i;
>  		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
>  			mce_read_aux(m, i);
> -			*msg = tmp;
> +			if (tmp)
> +				*msg = tmp;
>  			return 1;
>  		}
>  	}
> @@ -1313,6 +1315,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		 * make sure we have the right "msg".
>  		 */
>  		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
> +			msg = "Unknown";
>  			mce_severity(&m, cfg->tolerant, &msg, true);
>  			mce_panic("Local fatal machine check!", &m, msg);
>  		}
> -- 

Can we get rid of all that silliness of dealing with a possibly
uninitialized pointer in the callers and simply do at the beginning of
mce_panic():

	if (!msg)
		msg = "Unknown";

?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

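Borislav's alternative moves the default into the callee. Here is a minimal sketch, assuming a hypothetical sketch_panic_line() that formats a line instead of panicking. The "reason: msg" layout is invented and is not mce_panic()'s real output format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for mce_panic(): it only formats the line it
 * would print, so the NULL-message default can be checked in userspace.
 */
static void sketch_panic_line(char *buf, size_t len, const char *reason,
			      const char *msg)
{
	if (!msg)
		msg = "Unknown";	/* single default, applied in the callee */

	snprintf(buf, len, "%s: %s", reason, msg);
}
```

This keeps callers free to pass NULL. One caveat, as far as the sketch shows: a callee-side default only catches NULL. A local that is never initialized, such as tmp in mce_no_way_out() before patch 4, would presumably still need an explicit NULL reset in the caller.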

* Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
From: Luck, Tony @ 2019-12-16 21:59 UTC
  To: Borislav Petkov
  Cc: Jan H. Schönherr, Yazen Ghannam, linux-edac,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On Mon, Dec 16, 2019 at 05:52:07PM +0100, Borislav Petkov wrote:
> On Thu, Dec 12, 2019 at 01:25:31PM +0100, Jan H. Schönherr wrote:
> > This and names like "uncorrected_memory_error_notifier()" seem to imply
> > a wider scope than the function actually has. That brings me to another
> > question: should the scope be wider?
> > 
> > Instead of filtering for usable addresses and specific severities, we
> > could for example filter for (similar to cec_add_mce()):
> > 
> >   mce_is_memory_error(m) &&
> >   !mce_is_correctable(m) &&
> >   mce_usable_address(m)
> 
> There's a comment above that code which says what that function wants:
> 
> 	/* We eat only correctable DRAM errors with usable addresses. */
> 
> > Would that make sense? Or does that violate anything, that I'm not aware of?
> 
> So this should be a decision of the two CPU vendors basically answering
> the question: for which error severities you want the kernel to poison
> pages?
> 
> Basically a question for Tony and Yazen. CCed.

Using Intel naming, I'd like the SRAO and UCNA severity uncorrected
errors to take the soft offline path to stop using pages.

-Tony

* RE: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
  2019-12-16 21:59       ` Luck, Tony
@ 2019-12-17  1:19         ` Ghannam, Yazen
  2019-12-17  7:34           ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Ghannam, Yazen @ 2019-12-17  1:19 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov
  Cc: Jan H. Schönherr, linux-edac, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

> -----Original Message-----
> From: Luck, Tony <tony.luck@intel.com>
> Sent: Monday, December 16, 2019 4:59 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Jan H. Schönherr <jschoenh@amazon.de>; Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-edac@vger.kernel.org; Thomas
> Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; x86@kernel.org
> Subject: Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
> 
> On Mon, Dec 16, 2019 at 05:52:07PM +0100, Borislav Petkov wrote:
> > On Thu, Dec 12, 2019 at 01:25:31PM +0100, Jan H. Schönherr wrote:
> > > This and names like "uncorrected_memory_error_notifier()" seem to imply
> > > a wider scope than the function actually has. That brings me to another
> > > question: should the scope be wider?
> > >
> > > Instead of filtering for usable addresses and specific severities, we
> > > could for example filter for (similar to cec_add_mce()):
> > >
> > >   mce_is_memory_error(m) &&
> > >   !mce_is_correctable(m) &&
> > >   mce_usable_address(m)
> >
> > There's a comment above that code which says what that function wants:
> >
> > 	/* We eat only correctable DRAM errors with usable addresses. */
> >
> > > Would that make sense? Or does that violate anything, that I'm not aware of?
> >
> > So this should be a decision of the two CPU vendors basically answering
> > the question: for which error severities you want the kernel to poison
> > pages?
> >
> > Basically a question for Tony and Yazen. CCed.
> 
> Using Intel naming, I'd like the SRAO and UCNA severity uncorrected
> errors to take the soft offline path to stop using pages.
> 

For AMD, I'd like that no errors are handled in the SRAO notifier for now.

The DEFERRED severity could be used, but I'd like to do more testing beforehand.

Thanks,
Yazen



* Re: [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling
  2019-12-17  1:19         ` Ghannam, Yazen
@ 2019-12-17  7:34           ` Borislav Petkov
  0 siblings, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2019-12-17  7:34 UTC (permalink / raw)
  To: Ghannam, Yazen
  Cc: Luck, Tony, Jan H. Schönherr, linux-edac, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86

On Tue, Dec 17, 2019 at 01:19:30AM +0000, Ghannam, Yazen wrote:
> > Using Intel naming, I'd like the SRAO and UCNA severity uncorrected
> > errors to take the soft offline path to stop using pages.

Ok.

> For AMD, I'd like that no errors are handled in the SRAO notifier for now.
> 
> The DEFERRED severity could be used, but I'd like to do more testing beforehand.

If that notifier should be disabled on AMD, we should make it explicit
and not rely on a severity define. But sure, do some testing first
before you know what exactly needs to happen.
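
A minimal sketch of what making the vendor decision explicit could look like, assuming a simple vendor tag. `enum cpu_vendor`, `notifier_enabled()` and `notifier_wants()` are hypothetical and are not the kernel's interfaces. The point is that AMD is excluded by a visible check, not as a side effect of which severity values a vendor happens to produce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor tag for illustration only. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical grades; DEFERRED and UCNA are kept distinct on purpose. */
enum sev_grade { SEV_KEEP, SEV_DEFERRED, SEV_UCNA, SEV_AO };

/* The vendor policy, stated once and explicitly. */
static bool notifier_enabled(enum cpu_vendor v)
{
	return v != VENDOR_AMD;	/* AMD: disabled for now, pending testing */
}

/* Severity filtering only applies once the vendor gate has passed. */
static bool notifier_wants(enum cpu_vendor v, enum sev_grade s)
{
	if (!notifier_enabled(v))
		return false;
	return s == SEV_AO || s == SEV_UCNA;
}
```

With this shape, enabling the notifier on AMD later (for example for deferred errors, once tested) would be a change to `notifier_enabled()` and the accepted grades rather than to the severity definitions.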

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 4/6] x86/mce: Fix handling of optional message string
  2019-12-16 17:37   ` Borislav Petkov
@ 2019-12-19 17:49     ` Jan H. Schönherr
  2019-12-20 10:01       ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jan H. Schönherr @ 2019-12-19 17:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On 16/12/2019 18.37, Borislav Petkov wrote:
> On Tue, Dec 10, 2019 at 01:07:31AM +0100, Jan H. Schönherr wrote:
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 6afb9de251f2..b11a74e3fea9 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -809,10 +809,12 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>>  		if (quirk_no_way_out)
>>  			quirk_no_way_out(i, m, regs);
>>  
>> +		tmp = NULL;
>>  		m->bank = i;
>>  		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
>>  			mce_read_aux(m, i);
>> -			*msg = tmp;
>> +			if (tmp)
>> +				*msg = tmp;
>>  			return 1;
>>  		}
>>  	}
>> @@ -1313,6 +1315,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>  		 * make sure we have the right "msg".
>>  		 */
>>  		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
>> +			msg = "Unknown";
>>  			mce_severity(&m, cfg->tolerant, &msg, true);
>>  			mce_panic("Local fatal machine check!", &m, msg);
>>  		}
>> -- 
> 
> Can we get rid of all that silliness of dealing with a possibly
> uninitialized pointer in the callers and simply do at the beginning of
> mce_panic():
> 
> 	if (!msg)
> 		msg = "Unknown";
> 
> ?
> 

Not quite. mce_panic() already handles NULL as a value for "exp" (not "msg").

We still need to pass NULL or a proper pointer. Not some uninitialized, potentially
random data.

So, at the very least we need to initialize "tmp" in mce_no_way_out(), if you're looking
for a minimal patch.

This would turn the (non-existing) description of the "msg" argument of mce_severity()
from an assumed:
  "an implementation may or may not update a provided *msg argument"
to:
  "an implementation must either always update a provided *msg argument, or it must never do so"
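
A minimal userspace sketch of the contract discussed above, assuming a severity helper that writes `*msg` only on some paths. `grade()`, `report()`, `UNKNOWN_MSG` and `FATAL_MSG` are hypothetical stand-ins for `mce_severity()` and the message handling on the `mce_panic()` path, not kernel code. The caller initializes the out-pointer to NULL, so it never reads an indeterminate value; whether the consumer then prints a fallback or simply omits the message is a separate choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical messages used only for this illustration. */
static const char UNKNOWN_MSG[] = "Unknown";
static const char FATAL_MSG[] = "Fatal status";

/*
 * Hypothetical stand-in for mce_severity(): it writes *msg only on the
 * fatal path and leaves it untouched otherwise, i.e. it "may or may not
 * update" the caller's pointer.
 */
static int grade(int status, const char **msg)
{
	assert(msg != NULL);	/* the out-pointer itself must be valid */
	if (status >= 3) {
		*msg = FATAL_MSG;
		return 3;
	}
	return status;
}

/* Hypothetical consumer: NULL means "no message was provided". */
static const char *report(const char *msg)
{
	return msg ? msg : UNKNOWN_MSG;
}
```

With `const char *tmp = NULL; grade(1, &tmp);`, `tmp` stays NULL and `report(tmp)` yields the fallback. Without the `= NULL`, the same sequence would read an uninitialized pointer.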

Regards
Jan

* Re: [PATCH 4/6] x86/mce: Fix handling of optional message string
  2019-12-19 17:49     ` Jan H. Schönherr
@ 2019-12-20 10:01       ` Borislav Petkov
  0 siblings, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2019-12-20 10:01 UTC (permalink / raw)
  To: Jan H. Schönherr
  Cc: Tony Luck, linux-edac, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On Thu, Dec 19, 2019 at 06:49:32PM +0100, Jan H. Schönherr wrote:
> Not quite. mce_panic() already handles NULL as a value for "exp" (not "msg").
> 
> We still need to pass NULL or a proper pointer. Not some uninitialized, potentially
> random data.
> 
> So, at the very least we need to initialize "tmp" in mce_no_way_out(), if you're looking
> for a minimal patch.

Yes, sure, this is what I'm thinking of.

And yes, I'm not going to cry if we don't print the immensely helpful
"Unknown" anymore.</sarcasm>

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 68dd4b358740..fd23f9f53379 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -798,7 +798,7 @@ EXPORT_SYMBOL_GPL(machine_check_poll);
 static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 			  struct pt_regs *regs)
 {
-	char *tmp;
+	char *tmp = NULL;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -1223,8 +1223,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	struct mca_config *cfg = &mca_cfg;
 	int cpu = smp_processor_id();
-	char *msg = "Unknown";
 	struct mce m, *final;
+	char *msg = NULL;
 	int worst = 0;
 
 	/*

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

end of thread

Thread overview: 19+ messages
2019-12-10  0:07 [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling Jan H. Schönherr
2019-12-10  0:07 ` [PATCH 1/6] x86/mce: Take action on UCNA/Deferred errors again Jan H. Schönherr
2019-12-10  0:07 ` [PATCH 2/6] x86/mce: Make mce=nobootlog work again Jan H. Schönherr
2019-12-16 17:15   ` Borislav Petkov
2019-12-10  0:07 ` [PATCH 3/6] x86/mce: Fix possibly incorrect severity calculation on AMD Jan H. Schönherr
2019-12-16 17:26   ` Borislav Petkov
2019-12-16 17:35     ` Ghannam, Yazen
2019-12-10  0:07 ` [PATCH 4/6] x86/mce: Fix handling of optional message string Jan H. Schönherr
2019-12-16 17:37   ` Borislav Petkov
2019-12-19 17:49     ` Jan H. Schönherr
2019-12-20 10:01       ` Borislav Petkov
2019-12-10  0:07 ` [PATCH 5/6] x86/mce: Pass MCE message to mce_panic() on failed kernel recovery Jan H. Schönherr
2019-12-10  0:07 ` [PATCH 6/6] x86/mce: Remove mce_inject_log() in favor of mce_log() Jan H. Schönherr
2019-12-11  0:25 ` [PATCH 0/6] x86/mce: Various fixes and cleanups for MCE handling Luck, Tony
2019-12-12 12:25   ` Jan H. Schönherr
2019-12-16 16:52     ` Borislav Petkov
2019-12-16 21:59       ` Luck, Tony
2019-12-17  1:19         ` Ghannam, Yazen
2019-12-17  7:34           ` Borislav Petkov
