[PATCH 0/7] RAS queue

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/7] RAS queue
@ 2018-06-22  9:54 Borislav Petkov
  2018-06-22  9:54 ` [PATCH 1/7] x86/mce: Always use 64-bit timestamps Borislav Petkov
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Hi guys,

please queue on tip:ras/core.

Thx.

Arnd Bergmann (1):
  x86/mce: Always use 64-bit timestamps

Borislav Petkov (5):
  x86/mce: Carve out the crashing_cpu check
  x86/mce: Remove !banks check
  x86/mce: Carve out bank scanning code
  x86/mce: Cleanup __mc_scan_banks()
  x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()

Tony Luck (1):
  x86/mce: Fix incorrect "Machine check from unknown source" message

 arch/x86/kernel/cpu/mcheck/mce.c | 244 +++++++++++++++++--------------
 1 file changed, 134 insertions(+), 110 deletions(-)

-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] x86/mce: Always use 64-bit timestamps
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22  9:54 ` [PATCH 2/7] x86/mce: Fix incorrect "Machine check from unknown source" message Borislav Petkov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Arnd Bergmann <arnd@arndb.de>

The machine check timestamp uses get_seconds(), which returns an
'unsigned long' number that might overflow on 32-bit architectures (in
the distant future) and is therefore deprecated.

The normal replacement would be ktime_get_real_seconds(), but that needs
to use a sequence lock that might cause a deadlock if the MCE happens at
just the wrong moment. The __ktime_get_real_seconds() skips that lock
and is safer here, but has a miniscule risk of returning the wrong time
when we read it on a 32-bit architecture at the same time as updating
the epoch, i.e. from before y2106 overflow time to after, or vice versa.

This seems to be an acceptable risk in this particular case, and is the
same thing we do in kdb.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: y2038@lists.linaro.org
Link: http://lkml.kernel.org/r/20180618100759.1921750-1-arnd@arndb.de
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e4cf6ff1c2e1..b887415652ed 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -123,8 +123,8 @@ void mce_setup(struct mce *m)
 {
 	memset(m, 0, sizeof(struct mce));
 	m->cpu = m->extcpu = smp_processor_id();
-	/* We hope get_seconds stays lockless */
-	m->time = get_seconds();
+	/* need the internal __ version to avoid deadlocks */
+	m->time = __ktime_get_real_seconds();
 	m->cpuvendor = boot_cpu_data.x86_vendor;
 	m->cpuid = cpuid_eax(1);
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
-- 
2.17.0.582.gccdcbd54c

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] x86/mce: Fix incorrect "Machine check from unknown source" message
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
  2018-06-22  9:54 ` [PATCH 1/7] x86/mce: Always use 64-bit timestamps Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22  9:54 ` [PATCH 3/7] x86/mce: Carve out the crashing_cpu check Borislav Petkov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Tony Luck <tony.luck@intel.com>

Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually every happens).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
[ Remove unneeded severity assignment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b887415652ed..c3d08a69194d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1205,13 +1205,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
+	 * Local machine check may already know that we have to panic.
+	 * Broadcast machine check begins rendezvous in mce_start()
 	 * Go through all banks in exclusion of the other CPUs. This way we
 	 * don't report duplicated events on shared banks because the first one
-	 * to see it will clear it. If this is a Local MCE, then no need to
-	 * perform rendezvous.
+	 * to see it will clear it.
 	 */
-	if (!lmce)
+	if (lmce) {
+		if (no_way_out)
+			mce_panic("Fatal local machine check", &m, msg);
+	} else {
 		order = mce_start(&no_way_out);
+	}
 
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
@@ -1287,12 +1292,17 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 			no_way_out = worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
-		 * Local MCE skipped calling mce_reign()
-		 * If we found a fatal error, we need to panic here.
+		 * If there was a fatal machine check we should have
+		 * already called mce_panic earlier in this function.
+		 * Since we re-read the banks, we might have found
+		 * something new. Check again to see if we found a
+		 * fatal error. We call "mce_severity()" again to
+		 * make sure we have the right "msg".
 		 */
-		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
-			mce_panic("Machine check from unknown source",
-				NULL, NULL);
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
+			mce_severity(&m, cfg->tolerant, &msg, true);
+			mce_panic("Local fatal machine check!", &m, msg);
+		}
 	}
 
 	/*
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] x86/mce: Carve out the crashing_cpu check
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
  2018-06-22  9:54 ` [PATCH 1/7] x86/mce: Always use 64-bit timestamps Borislav Petkov
  2018-06-22  9:54 ` [PATCH 2/7] x86/mce: Fix incorrect "Machine check from unknown source" message Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22 12:41   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2018-06-22  9:54 ` [PATCH 4/7] x86/mce: Remove !banks check Borislav Petkov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Carve out the rendezvous handler timeout avoidance check into a separate
function in order to simplify the #MC handler.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 64 ++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c3d08a69194d..ea1521ec7e5b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1102,6 +1102,34 @@ static void mce_unmap_kpfn(unsigned long pfn)
 }
 #endif
 
+
+/*
+ * Cases where we avoid rendezvous handler timeout:
+ * 1) If this CPU is offline.
+ *
+ * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+ *  skip those CPUs which remain looping in the 1st kernel - see
+ *  crash_nmi_callback().
+ *
+ * Note: there still is a small window between kexec-ing and the new,
+ * kdump kernel establishing a new #MC handler where a broadcasted MCE
+ * might not get handled properly.
+ */
+static bool __mc_check_crashing_cpu(int cpu)
+{
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
+		u64 mcgstatus;
+
+		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
+		if (mcgstatus & MCG_STATUS_RIPV) {
+			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+			return true;
+		}
+	}
+	return false;
+}
+
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1116,60 +1144,42 @@ static void mce_unmap_kpfn(unsigned long pfn)
  */
 void do_machine_check(struct pt_regs *regs, long error_code)
 {
+	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
+	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	struct mca_config *cfg = &mca_cfg;
+	int cpu = smp_processor_id();
+	char *msg = "Unknown";
 	struct mce m, *final;
-	int i;
 	int worst = 0;
 	int severity;
+	int i;
 
 	/*
 	 * Establish sequential order between the CPUs entering the machine
 	 * check handler.
 	 */
 	int order = -1;
+
 	/*
 	 * If no_way_out gets set, there is no safe way to recover from this
 	 * MCE.  If mca_cfg.tolerant is cranked up, we'll try anyway.
 	 */
 	int no_way_out = 0;
+
 	/*
 	 * If kill_it gets set, there might be a way to recover from this
 	 * error.
 	 */
 	int kill_it = 0;
-	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
-	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
-	char *msg = "Unknown";
 
 	/*
 	 * MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
 	 * on Intel.
 	 */
 	int lmce = 1;
-	int cpu = smp_processor_id();
-
-	/*
-	 * Cases where we avoid rendezvous handler timeout:
-	 * 1) If this CPU is offline.
-	 *
-	 * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
-	 *  skip those CPUs which remain looping in the 1st kernel - see
-	 *  crash_nmi_callback().
-	 *
-	 * Note: there still is a small window between kexec-ing and the new,
-	 * kdump kernel establishing a new #MC handler where a broadcasted MCE
-	 * might not get handled properly.
-	 */
-	if (cpu_is_offline(cpu) ||
-	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
-		u64 mcgstatus;
 
-		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
-		if (mcgstatus & MCG_STATUS_RIPV) {
-			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-			return;
-		}
-	}
+	if (__mc_check_crashing_cpu(cpu))
+		return;
 
 	ist_enter(regs);
 
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] x86/mce: Remove !banks check
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
                   ` (2 preceding siblings ...)
  2018-06-22  9:54 ` [PATCH 3/7] x86/mce: Carve out the crashing_cpu check Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22 12:41   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2018-06-22  9:54 ` [PATCH 5/7] x86/mce: Carve out bank scanning code Borislav Petkov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

If we don't have MCA banks, we won't see machine checks anyway. Drop the
check.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ea1521ec7e5b..05d2d15a95bd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1185,9 +1185,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	this_cpu_inc(mce_exception_count);
 
-	if (!cfg->banks)
-		goto out;
-
 	mce_gather_info(&m, regs);
 	m.tsc = rdtsc();
 
@@ -1327,7 +1324,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	if (worst > 0)
 		mce_report_event(regs);
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-out:
+
 	sync_core();
 
 	if (worst != MCE_AR_SEVERITY && !kill_it)
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] x86/mce: Carve out bank scanning code
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
                   ` (3 preceding siblings ...)
  2018-06-22  9:54 ` [PATCH 4/7] x86/mce: Remove !banks check Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22 12:42   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2018-06-22  9:54 ` [PATCH 6/7] x86/mce: Cleanup __mc_scan_banks() Borislav Petkov
  2018-06-22  9:54 ` [PATCH 7/7] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() Borislav Petkov
  6 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Carve out the scan loop into a separate function.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 134 ++++++++++++++++---------------
 1 file changed, 71 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 05d2d15a95bd..5e834bad4eaf 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1130,6 +1130,76 @@ static bool __mc_check_crashing_cpu(int cpu)
 	return false;
 }
 
+static void __mc_scan_banks(struct mce *m, struct mce *final,
+			    unsigned long *toclear, unsigned long *valid_banks,
+			    int no_way_out, int *worst)
+{
+	struct mca_config *cfg = &mca_cfg;
+	int severity, i;
+
+	for (i = 0; i < cfg->banks; i++) {
+		__clear_bit(i, toclear);
+		if (!test_bit(i, valid_banks))
+			continue;
+		if (!mce_banks[i].ctl)
+			continue;
+
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
+
+		m->status = mce_rdmsrl(msr_ops.status(i));
+		if ((m->status & MCI_STATUS_VAL) == 0)
+			continue;
+
+		/*
+		 * Non uncorrected or non signaled errors are handled by
+		 * machine_check_poll. Leave them alone, unless this panics.
+		 */
+		if (!(m->status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
+			!no_way_out)
+			continue;
+
+		/*
+		 * Set taint even when machine check was not enabled.
+		 */
+		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
+		severity = mce_severity(m, cfg->tolerant, NULL, true);
+
+		/*
+		 * When machine check was for corrected/deferred handler don't
+		 * touch, unless we're panicing.
+		 */
+		if ((severity == MCE_KEEP_SEVERITY ||
+		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
+			continue;
+		__set_bit(i, toclear);
+		if (severity == MCE_NO_SEVERITY) {
+			/*
+			 * Machine check event was not enabled. Clear, but
+			 * ignore.
+			 */
+			continue;
+		}
+
+		mce_read_aux(m, i);
+
+		/* assuming valid severity level != 0 */
+		m->severity = severity;
+
+		mce_log(m);
+
+		if (severity > *worst) {
+			*final = *m;
+			*worst = severity;
+		}
+	}
+
+	/* mce_clear_state will clear *final, save locally for use later */
+	*m = *final;
+}
+
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1151,8 +1221,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	char *msg = "Unknown";
 	struct mce m, *final;
 	int worst = 0;
-	int severity;
-	int i;
 
 	/*
 	 * Establish sequential order between the CPUs entering the machine
@@ -1225,67 +1293,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		order = mce_start(&no_way_out);
 	}
 
-	for (i = 0; i < cfg->banks; i++) {
-		__clear_bit(i, toclear);
-		if (!test_bit(i, valid_banks))
-			continue;
-		if (!mce_banks[i].ctl)
-			continue;
-
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
-
-		m.status = mce_rdmsrl(msr_ops.status(i));
-		if ((m.status & MCI_STATUS_VAL) == 0)
-			continue;
-
-		/*
-		 * Non uncorrected or non signaled errors are handled by
-		 * machine_check_poll. Leave them alone, unless this panics.
-		 */
-		if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
-			!no_way_out)
-			continue;
-
-		/*
-		 * Set taint even when machine check was not enabled.
-		 */
-		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
-
-		severity = mce_severity(&m, cfg->tolerant, NULL, true);
-
-		/*
-		 * When machine check was for corrected/deferred handler don't
-		 * touch, unless we're panicing.
-		 */
-		if ((severity == MCE_KEEP_SEVERITY ||
-		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
-			continue;
-		__set_bit(i, toclear);
-		if (severity == MCE_NO_SEVERITY) {
-			/*
-			 * Machine check event was not enabled. Clear, but
-			 * ignore.
-			 */
-			continue;
-		}
-
-		mce_read_aux(&m, i);
-
-		/* assuming valid severity level != 0 */
-		m.severity = severity;
-
-		mce_log(&m);
-
-		if (severity > worst) {
-			*final = m;
-			worst = severity;
-		}
-	}
-
-	/* mce_clear_state will clear *final, save locally for use later */
-	m = *final;
+	__mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] x86/mce: Cleanup __mc_scan_banks()
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
                   ` (4 preceding siblings ...)
  2018-06-22  9:54 ` [PATCH 5/7] x86/mce: Carve out bank scanning code Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22 12:42   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2018-06-22  9:54 ` [PATCH 7/7] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() Borislav Petkov
  6 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Correct comments, improve readability, simplify.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5e834bad4eaf..5c38d1f861f2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1141,6 +1141,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
 			continue;
+
 		if (!mce_banks[i].ctl)
 			continue;
 
@@ -1149,39 +1150,35 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		m->bank = i;
 
 		m->status = mce_rdmsrl(msr_ops.status(i));
-		if ((m->status & MCI_STATUS_VAL) == 0)
+		if (!(m->status & MCI_STATUS_VAL))
 			continue;
 
 		/*
-		 * Non uncorrected or non signaled errors are handled by
-		 * machine_check_poll. Leave them alone, unless this panics.
+		 * Corrected or non-signaled errors are handled by
+		 * machine_check_poll(). Leave them alone, unless this panics.
 		 */
 		if (!(m->status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
 			!no_way_out)
 			continue;
 
-		/*
-		 * Set taint even when machine check was not enabled.
-		 */
+		/* Set taint even when machine check was not enabled. */
 		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
 		severity = mce_severity(m, cfg->tolerant, NULL, true);
 
 		/*
 		 * When machine check was for corrected/deferred handler don't
-		 * touch, unless we're panicing.
+		 * touch, unless we're panicking.
 		 */
 		if ((severity == MCE_KEEP_SEVERITY ||
 		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
 			continue;
+
 		__set_bit(i, toclear);
-		if (severity == MCE_NO_SEVERITY) {
-			/*
-			 * Machine check event was not enabled. Clear, but
-			 * ignore.
-			 */
+
+		/* Machine check event was not enabled. Clear, but ignore. */
+		if (severity == MCE_NO_SEVERITY)
 			continue;
-		}
 
 		mce_read_aux(m, i);
 
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
  2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
                   ` (5 preceding siblings ...)
  2018-06-22  9:54 ` [PATCH 6/7] x86/mce: Cleanup __mc_scan_banks() Borislav Petkov
@ 2018-06-22  9:54 ` Borislav Petkov
  2018-06-22 12:39   ` [tip:ras/core] " tip-bot for Borislav Petkov
  6 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2018-06-22  9:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Tony: read the rest of the MCA bank to populate the struct mce fully.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5c38d1f861f2..9a16f15f79eb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -772,23 +772,25 @@ EXPORT_SYMBOL_GPL(machine_check_poll);
 static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 			  struct pt_regs *regs)
 {
-	int i, ret = 0;
 	char *tmp;
+	int i;
 
 	for (i = 0; i < mca_cfg.banks; i++) {
 		m->status = mce_rdmsrl(msr_ops.status(i));
-		if (m->status & MCI_STATUS_VAL) {
-			__set_bit(i, validp);
-			if (quirk_no_way_out)
-				quirk_no_way_out(i, m, regs);
-		}
+		if (!(m->status & MCI_STATUS_VAL))
+			continue;
+
+		__set_bit(i, validp);
+		if (quirk_no_way_out)
+			quirk_no_way_out(i, m, regs);
 
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
+			mce_read_aux(m, i);
 			*msg = tmp;
-			ret = 1;
+			return 1;
 		}
 	}
-	return ret;
+	return 0;
 }
 
 /*
-- 
2.17.0.582.gccdcbd54c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
  2018-06-22  9:54 ` [PATCH 7/7] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() Borislav Petkov
@ 2018-06-22 12:39   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-06-22 12:39 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: stable, mingo, linux-kernel, bp, tglx, hpa, tony.luck

Commit-ID:  1f74c8a64798e2c488f86efc97e308b85fb7d7aa
Gitweb:     https://git.kernel.org/tip/1f74c8a64798e2c488f86efc97e308b85fb7d7aa
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 22 Jun 2018 11:54:28 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 22 Jun 2018 14:35:50 +0200

x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()

mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Tony: read the rest of the MCA bank to populate the struct mce fully.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de

---
 arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index cd76380af79f..7e6f51a9d917 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -772,23 +772,25 @@ EXPORT_SYMBOL_GPL(machine_check_poll);
 static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 			  struct pt_regs *regs)
 {
-	int i, ret = 0;
 	char *tmp;
+	int i;
 
 	for (i = 0; i < mca_cfg.banks; i++) {
 		m->status = mce_rdmsrl(msr_ops.status(i));
-		if (m->status & MCI_STATUS_VAL) {
-			__set_bit(i, validp);
-			if (quirk_no_way_out)
-				quirk_no_way_out(i, m, regs);
-		}
+		if (!(m->status & MCI_STATUS_VAL))
+			continue;
+
+		__set_bit(i, validp);
+		if (quirk_no_way_out)
+			quirk_no_way_out(i, m, regs);
 
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
+			mce_read_aux(m, i);
 			*msg = tmp;
-			ret = 1;
+			return 1;
 		}
 	}
-	return ret;
+	return 0;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Carve out the crashing_cpu check
  2018-06-22  9:54 ` [PATCH 3/7] x86/mce: Carve out the crashing_cpu check Borislav Petkov
@ 2018-06-22 12:41   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-06-22 12:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, linux-kernel, bp, tglx, hpa

Commit-ID:  d3d6923cd1ae61a493a405f2f001293ab62eb092
Gitweb:     https://git.kernel.org/tip/d3d6923cd1ae61a493a405f2f001293ab62eb092
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 22 Jun 2018 11:54:24 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 22 Jun 2018 14:37:22 +0200

x86/mce: Carve out the crashing_cpu check

Carve out the rendezvous handler timeout avoidance check into a separate
function in order to simplify the #MC handler.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-4-bp@alien8.de

---
 arch/x86/kernel/cpu/mcheck/mce.c | 64 +++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d62201e40027..18804834dbc0 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1104,6 +1104,34 @@ static void mce_unmap_kpfn(unsigned long pfn)
 }
 #endif
 
+
+/*
+ * Cases where we avoid rendezvous handler timeout:
+ * 1) If this CPU is offline.
+ *
+ * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+ *  skip those CPUs which remain looping in the 1st kernel - see
+ *  crash_nmi_callback().
+ *
+ * Note: there still is a small window between kexec-ing and the new,
+ * kdump kernel establishing a new #MC handler where a broadcasted MCE
+ * might not get handled properly.
+ */
+static bool __mc_check_crashing_cpu(int cpu)
+{
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
+		u64 mcgstatus;
+
+		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
+		if (mcgstatus & MCG_STATUS_RIPV) {
+			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+			return true;
+		}
+	}
+	return false;
+}
+
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1118,60 +1146,42 @@ static void mce_unmap_kpfn(unsigned long pfn)
  */
 void do_machine_check(struct pt_regs *regs, long error_code)
 {
+	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
+	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
 	struct mca_config *cfg = &mca_cfg;
+	int cpu = smp_processor_id();
+	char *msg = "Unknown";
 	struct mce m, *final;
-	int i;
 	int worst = 0;
 	int severity;
+	int i;
 
 	/*
 	 * Establish sequential order between the CPUs entering the machine
 	 * check handler.
 	 */
 	int order = -1;
+
 	/*
 	 * If no_way_out gets set, there is no safe way to recover from this
 	 * MCE.  If mca_cfg.tolerant is cranked up, we'll try anyway.
 	 */
 	int no_way_out = 0;
+
 	/*
 	 * If kill_it gets set, there might be a way to recover from this
 	 * error.
 	 */
 	int kill_it = 0;
-	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
-	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
-	char *msg = "Unknown";
 
 	/*
 	 * MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
 	 * on Intel.
 	 */
 	int lmce = 1;
-	int cpu = smp_processor_id();
-
-	/*
-	 * Cases where we avoid rendezvous handler timeout:
-	 * 1) If this CPU is offline.
-	 *
-	 * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
-	 *  skip those CPUs which remain looping in the 1st kernel - see
-	 *  crash_nmi_callback().
-	 *
-	 * Note: there still is a small window between kexec-ing and the new,
-	 * kdump kernel establishing a new #MC handler where a broadcasted MCE
-	 * might not get handled properly.
-	 */
-	if (cpu_is_offline(cpu) ||
-	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
-		u64 mcgstatus;
 
-		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
-		if (mcgstatus & MCG_STATUS_RIPV) {
-			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-			return;
-		}
-	}
+	if (__mc_check_crashing_cpu(cpu))
+		return;
 
 	ist_enter(regs);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Remove !banks check
  2018-06-22  9:54 ` [PATCH 4/7] x86/mce: Remove !banks check Borislav Petkov
@ 2018-06-22 12:41   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-06-22 12:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: bp, tglx, mingo, linux-kernel, hpa

Commit-ID:  45deca7d96e0e33061b7d12932b381114db18f60
Gitweb:     https://git.kernel.org/tip/45deca7d96e0e33061b7d12932b381114db18f60
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 22 Jun 2018 11:54:25 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 22 Jun 2018 14:37:22 +0200

x86/mce: Remove !banks check

If we don't have MCA banks, we won't see machine checks anyway. Drop the
check.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-5-bp@alien8.de

---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 18804834dbc0..8ac2ea2a874a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1187,9 +1187,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	this_cpu_inc(mce_exception_count);
 
-	if (!cfg->banks)
-		goto out;
-
 	mce_gather_info(&m, regs);
 	m.tsc = rdtsc();
 
@@ -1329,7 +1326,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	if (worst > 0)
 		mce_report_event(regs);
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-out:
+
 	sync_core();
 
 	if (worst != MCE_AR_SEVERITY && !kill_it)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Carve out bank scanning code
  2018-06-22  9:54 ` [PATCH 5/7] x86/mce: Carve out bank scanning code Borislav Petkov
@ 2018-06-22 12:42   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-06-22 12:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, tglx, mingo, bp

Commit-ID:  f35565e3986dcc5dde52649d461543cde1b745cd
Gitweb:     https://git.kernel.org/tip/f35565e3986dcc5dde52649d461543cde1b745cd
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 22 Jun 2018 11:54:26 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 22 Jun 2018 14:37:23 +0200

x86/mce: Carve out bank scanning code

Carve out the scan loop into a separate function.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-6-bp@alien8.de

---
 arch/x86/kernel/cpu/mcheck/mce.c | 134 +++++++++++++++++++++------------------
 1 file changed, 71 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8ac2ea2a874a..d23fcd8d11c9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1132,6 +1132,76 @@ static bool __mc_check_crashing_cpu(int cpu)
 	return false;
 }
 
+static void __mc_scan_banks(struct mce *m, struct mce *final,
+			    unsigned long *toclear, unsigned long *valid_banks,
+			    int no_way_out, int *worst)
+{
+	struct mca_config *cfg = &mca_cfg;
+	int severity, i;
+
+	for (i = 0; i < cfg->banks; i++) {
+		__clear_bit(i, toclear);
+		if (!test_bit(i, valid_banks))
+			continue;
+		if (!mce_banks[i].ctl)
+			continue;
+
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
+
+		m->status = mce_rdmsrl(msr_ops.status(i));
+		if ((m->status & MCI_STATUS_VAL) == 0)
+			continue;
+
+		/*
+		 * Non uncorrected or non signaled errors are handled by
+		 * machine_check_poll. Leave them alone, unless this panics.
+		 */
+		if (!(m->status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
+			!no_way_out)
+			continue;
+
+		/*
+		 * Set taint even when machine check was not enabled.
+		 */
+		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
+		severity = mce_severity(m, cfg->tolerant, NULL, true);
+
+		/*
+		 * When machine check was for corrected/deferred handler don't
+		 * touch, unless we're panicing.
+		 */
+		if ((severity == MCE_KEEP_SEVERITY ||
+		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
+			continue;
+		__set_bit(i, toclear);
+		if (severity == MCE_NO_SEVERITY) {
+			/*
+			 * Machine check event was not enabled. Clear, but
+			 * ignore.
+			 */
+			continue;
+		}
+
+		mce_read_aux(m, i);
+
+		/* assuming valid severity level != 0 */
+		m->severity = severity;
+
+		mce_log(m);
+
+		if (severity > *worst) {
+			*final = *m;
+			*worst = severity;
+		}
+	}
+
+	/* mce_clear_state will clear *final, save locally for use later */
+	*m = *final;
+}
+
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1153,8 +1223,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	char *msg = "Unknown";
 	struct mce m, *final;
 	int worst = 0;
-	int severity;
-	int i;
 
 	/*
 	 * Establish sequential order between the CPUs entering the machine
@@ -1227,67 +1295,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		order = mce_start(&no_way_out);
 	}
 
-	for (i = 0; i < cfg->banks; i++) {
-		__clear_bit(i, toclear);
-		if (!test_bit(i, valid_banks))
-			continue;
-		if (!mce_banks[i].ctl)
-			continue;
-
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
-
-		m.status = mce_rdmsrl(msr_ops.status(i));
-		if ((m.status & MCI_STATUS_VAL) == 0)
-			continue;
-
-		/*
-		 * Non uncorrected or non signaled errors are handled by
-		 * machine_check_poll. Leave them alone, unless this panics.
-		 */
-		if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
-			!no_way_out)
-			continue;
-
-		/*
-		 * Set taint even when machine check was not enabled.
-		 */
-		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
-
-		severity = mce_severity(&m, cfg->tolerant, NULL, true);
-
-		/*
-		 * When machine check was for corrected/deferred handler don't
-		 * touch, unless we're panicing.
-		 */
-		if ((severity == MCE_KEEP_SEVERITY ||
-		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
-			continue;
-		__set_bit(i, toclear);
-		if (severity == MCE_NO_SEVERITY) {
-			/*
-			 * Machine check event was not enabled. Clear, but
-			 * ignore.
-			 */
-			continue;
-		}
-
-		mce_read_aux(&m, i);
-
-		/* assuming valid severity level != 0 */
-		m.severity = severity;
-
-		mce_log(&m);
-
-		if (severity > worst) {
-			*final = m;
-			worst = severity;
-		}
-	}
-
-	/* mce_clear_state will clear *final, save locally for use later */
-	m = *final;
+	__mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Cleanup __mc_scan_banks()
  2018-06-22  9:54 ` [PATCH 6/7] x86/mce: Cleanup __mc_scan_banks() Borislav Petkov
@ 2018-06-22 12:42   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-06-22 12:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, linux-kernel, hpa, mingo, bp

Commit-ID:  d5c84ef202d7db9c838448229f62d66560b951e5
Gitweb:     https://git.kernel.org/tip/d5c84ef202d7db9c838448229f62d66560b951e5
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 22 Jun 2018 11:54:27 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 22 Jun 2018 14:37:23 +0200

x86/mce: Cleanup __mc_scan_banks()

Correct comments, improve readability, simplify.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-7-bp@alien8.de

---
 arch/x86/kernel/cpu/mcheck/mce.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d23fcd8d11c9..888b51bef13d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1143,6 +1143,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
 			continue;
+
 		if (!mce_banks[i].ctl)
 			continue;
 
@@ -1151,39 +1152,35 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		m->bank = i;
 
 		m->status = mce_rdmsrl(msr_ops.status(i));
-		if ((m->status & MCI_STATUS_VAL) == 0)
+		if (!(m->status & MCI_STATUS_VAL))
 			continue;
 
 		/*
-		 * Non uncorrected or non signaled errors are handled by
-		 * machine_check_poll. Leave them alone, unless this panics.
+		 * Corrected or non-signaled errors are handled by
+		 * machine_check_poll(). Leave them alone, unless this panics.
 		 */
 		if (!(m->status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
 			!no_way_out)
 			continue;
 
-		/*
-		 * Set taint even when machine check was not enabled.
-		 */
+		/* Set taint even when machine check was not enabled. */
 		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
 		severity = mce_severity(m, cfg->tolerant, NULL, true);
 
 		/*
 		 * When machine check was for corrected/deferred handler don't
-		 * touch, unless we're panicing.
+		 * touch, unless we're panicking.
 		 */
 		if ((severity == MCE_KEEP_SEVERITY ||
 		     severity == MCE_UCNA_SEVERITY) && !no_way_out)
 			continue;
+
 		__set_bit(i, toclear);
-		if (severity == MCE_NO_SEVERITY) {
-			/*
-			 * Machine check event was not enabled. Clear, but
-			 * ignore.
-			 */
+
+		/* Machine check event was not enabled. Clear, but ignore. */
+		if (severity == MCE_NO_SEVERITY)
 			continue;
-		}
 
 		mce_read_aux(m, i);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-06-22 12:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-22  9:54 [PATCH 0/7] RAS queue Borislav Petkov
2018-06-22  9:54 ` [PATCH 1/7] x86/mce: Always use 64-bit timestamps Borislav Petkov
2018-06-22  9:54 ` [PATCH 2/7] x86/mce: Fix incorrect "Machine check from unknown source" message Borislav Petkov
2018-06-22  9:54 ` [PATCH 3/7] x86/mce: Carve out the crashing_cpu check Borislav Petkov
2018-06-22 12:41   ` [tip:ras/core] " tip-bot for Borislav Petkov
2018-06-22  9:54 ` [PATCH 4/7] x86/mce: Remove !banks check Borislav Petkov
2018-06-22 12:41   ` [tip:ras/core] " tip-bot for Borislav Petkov
2018-06-22  9:54 ` [PATCH 5/7] x86/mce: Carve out bank scanning code Borislav Petkov
2018-06-22 12:42   ` [tip:ras/core] " tip-bot for Borislav Petkov
2018-06-22  9:54 ` [PATCH 6/7] x86/mce: Cleanup __mc_scan_banks() Borislav Petkov
2018-06-22 12:42   ` [tip:ras/core] " tip-bot for Borislav Petkov
2018-06-22  9:54 ` [PATCH 7/7] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() Borislav Petkov
2018-06-22 12:39   ` [tip:ras/core] " tip-bot for Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).