linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH tip:ras/core 1/2] x86/MCE: Extend table to report action optional errors through CMCI too
@ 2017-12-04 16:54 Borislav Petkov
  2017-12-04 16:54 ` [PATCH tip:ras/core 2/2] x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems Borislav Petkov
  0 siblings, 1 reply; 2+ messages in thread
From: Borislav Petkov @ 2017-12-04 16:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML, linux-edac

From: Xie XiuQi <xiexiuqi@huawei.com>

According to the Intel SDM Volume 3B (253669-063US, July 2017), action
optional (SRAO) errors can be reported either via MCE or CMC:

  In cases when SRAO is signaled via CMCI the error signature is
  indicated via UC=1, PCC=0, S=0.

  Type(*1)	UC	EN	PCC	S	AR	Signaling
  ---------------------------------------------------------------
  UC		1	1	1	x	x	MCE
  SRAR		1	1	0	1	1	MCE
  SRAO		1	x(*2)	0	x(*2)	0	MCE/CMC
  UCNA		1	x	0	0	0	CMC
  CE		0	x	x	x	x	CMC

  NOTES:
  1. SRAR, SRAO and UCNA errors are supported by the processor only
     when IA32_MCG_CAP[24] (MCG_SER_P) is set.
  2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

And there is a description in 15.6.2 UCR Error Reporting and Logging,
for bit S:

  S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
  exception was generated for the UCR error reported in this MC bank...
  When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
  was not signaled via a machine check exception and instead was reported
  as a corrected machine check (CMC).

So merge the two cases and just remove the S=0 check for SRAO in
mce_severity().

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chen Wei <chenwei68@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1511575548-41992-1-git-send-email-xiexiuqi@huawei.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4ca632a06e0b..5bbd06f38ff6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -59,6 +59,7 @@ static struct severity {
 #define  MCGMASK(x, y)	.mcgmask = x, .mcgres = y
 #define  MASK(x, y)	.mask = x, .result = y
 #define MCI_UC_S (MCI_STATUS_UC|MCI_STATUS_S)
+#define MCI_UC_AR (MCI_STATUS_UC|MCI_STATUS_AR)
 #define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
 #define	MCI_ADDR (MCI_STATUS_ADDRV|MCI_STATUS_MISCV)
 
@@ -101,6 +102,22 @@ static struct severity {
 		NOSER, BITCLR(MCI_STATUS_UC)
 		),
 
+	/*
+	 * known AO MCACODs reported via MCE or CMC:
+	 *
+	 * SRAO could be signaled either via a machine check exception or
+	 * CMCI with the corresponding bit S 1 or 0. So we don't need to
+	 * check bit S for SRAO.
+	 */
+	MCESEV(
+		AO, "Action optional: memory scrubbing error",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_AR|MCACOD_SCRUBMSK, MCI_STATUS_UC|MCACOD_SCRUB)
+		),
+	MCESEV(
+		AO, "Action optional: last level cache writeback error",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_AR|MCACOD, MCI_STATUS_UC|MCACOD_L3WB)
+		),
+
 	/* ignore OVER for UCNA */
 	MCESEV(
 		UCNA, "Uncorrected no action required",
@@ -149,15 +166,6 @@ static struct severity {
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_SAR)
 		),
 
-	/* known AO MCACODs: */
-	MCESEV(
-		AO, "Action optional: memory scrubbing error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_SCRUBMSK, MCI_UC_S|MCACOD_SCRUB)
-		),
-	MCESEV(
-		AO, "Action optional: last level cache writeback error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
-		),
 	MCESEV(
 		SOME, "Action optional: unknown MCACOD",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_S)
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH tip:ras/core 2/2] x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems
  2017-12-04 16:54 [PATCH tip:ras/core 1/2] x86/MCE: Extend table to report action optional errors through CMCI too Borislav Petkov
@ 2017-12-04 16:54 ` Borislav Petkov
  0 siblings, 0 replies; 2+ messages in thread
From: Borislav Petkov @ 2017-12-04 16:54 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML, Tony Luck, linux-edac

From: Yazen Ghannam <yazen.ghannam@amd.com>

The McaIntrCfg register (MSRC000_0410), previously known as
CU_DEFER_ERR, is used on SMCA systems to set the LVT offset for the
Threshold and Deferred error interrupts.

This register was used on non-SMCA systems to also set the Deferred
interrupt type in bits 2:1. However, these bits are reserved on SMCA
systems.

Only set MSRC000_0410[2:1] on non-SMCA systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20171120162646.5210-1-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 486f640b02ef..a38ab1fa53a2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -407,7 +407,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	    (deferred_error_int_vector != amd_deferred_error_interrupt))
 		deferred_error_int_vector = amd_deferred_error_interrupt;
 
-	low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
+	if (!mce_flags.smca)
+		low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
+
 	wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-12-04 16:54 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-04 16:54 [PATCH tip:ras/core 1/2] x86/MCE: Extend table to report action optional errors through CMCI too Borislav Petkov
2017-12-04 16:54 ` [PATCH tip:ras/core 2/2] x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).