All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/5] Updates to EDAC and AMD MCE driver
@ 2016-02-29 22:32 Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition Aravind Gopalakrishnan
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

This patchset mainly provides necessary EDAC bits to decode errors
occuring on Scalable MCA enabled processors and also updates AMD MCE
driver to program the correct MCx_MISC register address for upcoming
processors.

Patches 1, 2 and 3 are meant for the upcoming processors.

Patches 4 and 5 are either fixing or adding comments to help in
understanding the code and do not introduce any functional changes.

Patch 1: Move MSR definition to mce.h
Patch 2: Updates to EDAC driver to decode the new error signatures
Patch 3: Fix logic to obtain correct block address
Patch 4: Fix deferred error comment
Patch 5: Add comments to amd_nb.h to describe threshold_block structure

Tested V2 patches for regressions on Fam15h, Fam10h systems
and found none.

Note 1: Introduced new patch for moving MCx_CONFIG MSR to mce.h
Note 2: The enums ans amd_hwid_mappings[] array are placed in arch/x86
	as there are follow-up patches which need the struct there

Changes from V1: (per Boris suggestions)
  - Simplify error decoding routines
  - Move headers to mce.h
  - Rename enumerations and struct members (to be more descriptive)
  - Drop gerund usage
  - Remove comments that are spelling out the code

Aravind Gopalakrishnan (5):
  x86/mce: Move MCx_CONFIG MSR definition
  EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Add comments for easier understanding

 arch/x86/include/asm/amd_nb.h        |  18 +-
 arch/x86/include/asm/mce.h           |  63 ++++++-
 arch/x86/include/asm/msr-index.h     |   4 -
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 108 +++++++----
 drivers/edac/mce_amd.c               | 342 ++++++++++++++++++++++++++++++++++-
 5 files changed, 486 insertions(+), 49 deletions(-)

-- 
2.7.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition
  2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
@ 2016-02-29 22:32 ` Aravind Gopalakrishnan
  2016-03-08 13:12   ` [tip:ras/core] x86/mce: Move MCx_CONFIG MSR definitions tip-bot for Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

Since this is contained to only MCE code, move
the MSR definiton there instead of adding to msr-index

Per discussion here:
http://marc.info/?l=linux-kernel&m=145633699026474&w=2

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h       | 4 ++++
 arch/x86/include/asm/msr-index.h | 4 ----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 18d2ba9..e8b09b3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,6 +91,10 @@
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE	"MACHINECHECK"
 
+/* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 75a5bb6..984ab75 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -269,10 +269,6 @@
 #define MSR_IA32_MC0_CTL2		0x00000280
 #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
 
-/* 'SMCA': AMD64 Scalable MCA */
-#define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
-#define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
-
 #define MSR_P6_PERFCTR0			0x000000c1
 #define MSR_P6_PERFCTR1			0x000000c2
 #define MSR_P6_EVNTSEL0			0x00000186
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition Aravind Gopalakrishnan
@ 2016-02-29 22:32 ` Aravind Gopalakrishnan
  2016-03-02 10:50   ` Borislav Petkov
  2016-02-29 22:32 ` [PATCH V2 3/5] x86/mce/AMD: Fix logic to obtain block address Aravind Gopalakrishnan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

For Scalable MCA enabled processors, errors are listed
per IP block. And since it is not required for an IP to
map to a particular bank, we need to use HWID and McaType
values from the MCx_IPID register to figure out which IP
a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register
to indicate Task context is corrupt.

Add logic here to decode errors from all known IP
blocks for Fam17h Model 00-0fh and to print TCC errors.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h           |  53 ++++++
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  11 ++
 drivers/edac/mce_amd.c               | 342 ++++++++++++++++++++++++++++++++++-
 3 files changed, 405 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e8b09b3..e83bbd6 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,18 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED	(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON	(1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC		(1ULL<<55)  /* Task context corrupt */
+
+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank.
+ *  - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ *    But should not be used to determine MSR numbers.
+ *  - TCC bit is present in MCx_STATUS.
+ */
+#define MCI_CONFIG_MCAX		0x1
+#define MCI_IPID_MCATYPE	0xFFFF0000
+#define MCI_IPID_HWID		0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -93,7 +105,9 @@
 
 /* 'SMCA': AMD64 Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID		0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 /*
  * This structure contains all data related to the MCE log.  Also
@@ -292,4 +306,43 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
 				      struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerate new IP types and HWID values
+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum amd_ip_types {
+	SMCA_F17H_CORE_BLOCK = 0,	/* Core errors */
+	SMCA_DF_BLOCK,			/* Data Fabric */
+	SMCA_UMC_BLOCK,			/* Unified Memory Controller */
+	SMCA_PB_BLOCK,			/* Parameter Block */
+	SMCA_PSP_BLOCK,			/* Platform Security Processor */
+	SMCA_SMU_BLOCK,			/* System Management Unit */
+	N_AMD_IP_TYPES
+};
+
+struct amd_hwid {
+	const char *amd_ipname;
+	unsigned int amd_hwid_value;
+};
+
+extern struct amd_hwid amd_hwid_mappings[N_AMD_IP_TYPES];
+
+enum amd_core_mca_blocks {
+	SMCA_LS_BLOCK = 0,	/* Load Store */
+	SMCA_IF_BLOCK,		/* Instruction Fetch */
+	SMCA_L2_CACHE_BLOCK,	/* L2 cache */
+	SMCA_DE_BLOCK,		/* Decoder unit */
+	RES,			/* Reserved */
+	SMCA_EX_BLOCK,		/* Execution unit */
+	SMCA_FP_BLOCK,		/* Floating Point */
+	SMCA_L3_CACHE_BLOCK	/* L3 cache */
+};
+
+enum amd_df_mca_blocks {
+	SMCA_CS_BLOCK = 0,	/* Coherent Slave */
+	SMCA_PIE_BLOCK		/* Power management, Interrupts, etc */
+};
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27b..13f15cb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,17 @@ static const char * const th_names[] = {
 	"execution_unit",
 };
 
+/* Define HWID to IP type mappings for Scalable MCA */
+struct amd_hwid amd_hwid_mappings[] = {
+	[SMCA_F17H_CORE_BLOCK]	= { "f17h_core", 0xB0 },
+	[SMCA_DF_BLOCK]		= { "df", 0x2E },
+	[SMCA_UMC_BLOCK]	= { "umc", 0x96 },
+	[SMCA_PB_BLOCK]		= { "pb", 0x5 },
+	[SMCA_PSP_BLOCK]	= { "psp", 0xFF },
+	[SMCA_SMU_BLOCK]	= { "smu", 0x1 },
+};
+EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index e3a945c..409448e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -147,6 +147,136 @@ static const char * const mc6_mce_desc[] = {
 	"Status Register File",
 };
 
+/* Scalable MCA error strings */
+
+static const char * const f17h_ls_mce_desc[] = {
+	"Load queue parity",
+	"Store queue parity",
+	"Miss address buffer payload parity",
+	"L1 TLB parity",
+	"",						/* reserved */
+	"DC tag error type 6",
+	"DC tag error type 1",
+	"Internal error type 1",
+	"Internal error type 2",
+	"Sys Read data error thread 0",
+	"Sys read data error thread 1",
+	"DC tag error type 2",
+	"DC data error type 1 (poison comsumption)",
+	"DC data error type 2",
+	"DC data error type 3",
+	"DC tag error type 4",
+	"L2 TLB parity",
+	"PDC parity error",
+	"DC tag error type 3",
+	"DC tag error type 5",
+	"L2 fill data error",
+};
+
+static const char * const f17h_if_mce_desc[] = {
+	"microtag probe port parity error",
+	"IC microtag or full tag multi-hit error",
+	"IC full tag parity",
+	"IC data array parity",
+	"Decoupling queue phys addr parity error",
+	"L0 ITLB parity error",
+	"L1 ITLB parity error",
+	"L2 ITLB parity error",
+	"BPQ snoop parity on Thread 0",
+	"BPQ snoop parity on Thread 1",
+	"L1 BTB multi-match error",
+	"L2 BTB multi-match error",
+};
+
+static const char * const f17h_l2_mce_desc[] = {
+	"L2M tag multi-way-hit error",
+	"L2M tag ECC error",
+	"L2M data ECC error",
+	"HW assert",
+};
+
+static const char * const f17h_de_mce_desc[] = {
+	"uop cache tag parity error",
+	"uop cache data parity error",
+	"Insn buffer parity error",
+	"Insn dispatch queue parity error",
+	"Fetch address FIFO parity",
+	"Patch RAM data parity",
+	"Patch RAM sequencer parity",
+	"uop buffer parity"
+};
+
+static const char * const f17h_ex_mce_desc[] = {
+	"Watchdog timeout error",
+	"Phy register file parity",
+	"Flag register file parity",
+	"Immediate displacement register file parity",
+	"Address generator payload parity",
+	"EX payload parity",
+	"Checkpoint queue parity",
+	"Retire dispatch queue parity",
+};
+
+static const char * const f17h_fp_mce_desc[] = {
+	"Physical register file parity",
+	"Freelist parity error",
+	"Schedule queue parity",
+	"NSQ parity error",
+	"Retire queue parity",
+	"Status register file parity",
+};
+
+static const char * const f17h_l3_mce_desc[] = {
+	"Shadow tag macro ECC error",
+	"Shadow tag macro multi-way-hit error",
+	"L3M tag ECC error",
+	"L3M tag multi-way-hit error",
+	"L3M data ECC error",
+	"XI parity, L3 fill done channel error",
+	"L3 victim queue parity",
+	"L3 HW assert",
+};
+
+static const char * const f17h_cs_mce_desc[] = {
+	"Illegal request from transport layer",
+	"Address violation",
+	"Security violation",
+	"Illegal response from transport layer",
+	"Unexpected response",
+	"Parity error on incoming request or probe response data",
+	"Parity error on incoming read response data",
+	"Atomic request parity",
+	"ECC error on probe filter access",
+};
+
+static const char * const f17h_pie_mce_desc[] = {
+	"HW assert",
+	"Internal PIE register security violation",
+	"Error on GMI link",
+	"Poison data written to internal PIE register",
+};
+
+static const char * const f17h_umc_mce_desc[] = {
+	"DRAM ECC error",
+	"Data poison error on DRAM",
+	"SDP parity error",
+	"Advanced peripheral bus error",
+	"Command/address parity error",
+	"Write data CRC error",
+};
+
+static const char * const f17h_pb_mce_desc[] = {
+	"Parameter Block RAM ECC error",
+};
+
+static const char * const f17h_psp_mce_desc[] = {
+	"PSP RAM ECC or parity error",
+};
+
+static const char * const f17h_smu_mce_desc[] = {
+	"SMU RAM ECC or parity error",
+};
+
 static bool f12h_mc0_mce(u16 ec, u8 xec)
 {
 	bool ret = false;
@@ -731,6 +861,192 @@ static bool amd_filter_mce(struct mce *m)
 	return false;
 }
 
+static void decode_f17hcore_errors(u8 xec, unsigned int mca_type)
+{
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	switch (mca_type) {
+	case SMCA_LS_BLOCK:
+		error_desc_array = f17h_ls_mce_desc;
+		ip_name = "LS";
+		len = ARRAY_SIZE(f17h_ls_mce_desc) - 1;
+
+		if (xec == 0x4) {
+			pr_cont("Unrecognized error code from LS MCA bank\n");
+			return;
+		}
+
+		break;
+
+	case SMCA_IF_BLOCK:
+		error_desc_array = f17h_if_mce_desc;
+		ip_name = "IF";
+		len = ARRAY_SIZE(f17h_if_mce_desc) - 1;
+		break;
+
+	case SMCA_L2_CACHE_BLOCK:
+		error_desc_array = f17h_l2_mce_desc;
+		ip_name = "L2_Cache";
+		len = ARRAY_SIZE(f17h_l2_mce_desc) - 1;
+		break;
+
+	case SMCA_DE_BLOCK:
+		error_desc_array = f17h_de_mce_desc;
+		ip_name = "DE";
+		len = ARRAY_SIZE(f17h_de_mce_desc) - 1;
+		break;
+
+	case SMCA_EX_BLOCK:
+		error_desc_array = f17h_ex_mce_desc;
+		ip_name = "EX";
+		len = ARRAY_SIZE(f17h_ex_mce_desc) - 1;
+		break;
+
+	case SMCA_FP_BLOCK:
+		error_desc_array = f17h_fp_mce_desc;
+		ip_name = "FP";
+		len = ARRAY_SIZE(f17h_fp_mce_desc) - 1;
+		break;
+
+	case SMCA_L3_CACHE_BLOCK:
+		error_desc_array = f17h_l3_mce_desc;
+		ip_name = "L3_Cache";
+		len = ARRAY_SIZE(f17h_l3_mce_desc) - 1;
+		break;
+
+	default:
+		pr_cont("Unrecognized Mca Type value for F17h Core. Unable to decode errors\n");
+		return;
+	}
+
+	if (xec > len) {
+		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		return;
+	}
+
+	pr_cont("%s.\n", error_desc_array[xec]);
+}
+
+static void decode_df_errors(u8 xec, unsigned int mca_type)
+{
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	switch (mca_type) {
+	case  SMCA_CS_BLOCK:
+		error_desc_array = f17h_cs_mce_desc;
+		ip_name = "CS";
+		len = ARRAY_SIZE(f17h_cs_mce_desc) - 1;
+		break;
+
+	case SMCA_PIE_BLOCK:
+		error_desc_array = f17h_pie_mce_desc;
+		ip_name = "PIE";
+		len = ARRAY_SIZE(f17h_pie_mce_desc) - 1;
+		break;
+
+	default:
+		pr_cont("Unrecognized Mca Type value for DF. Unable to decode errors\n");
+		return;
+	}
+
+	if (xec > len) {
+		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		return;
+	}
+
+	pr_cont("%s.\n", error_desc_array[xec]);
+}
+
+/* Decode errors according to Scalable MCA specification */
+static void decode_smca_errors(struct mce *m)
+{
+	u32 low, high;
+	u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
+	unsigned int hwid, mca_type, i;
+	u8 xec = XEC(m->status, xec_mask);
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	if (rdmsr_safe(addr, &low, &high)) {
+		pr_emerg("Invalid IP block specified, error information is unreliable.\n");
+		return;
+	}
+
+	hwid = high & MCI_IPID_HWID;
+	mca_type = (high & MCI_IPID_MCATYPE) >> 16;
+
+	pr_emerg(HW_ERR "MC%d IPID value: 0x%08x%08x\n", m->bank, high, low);
+
+	/*
+	 * Based on hwid and mca_type values,
+	 * decode errors from respective IPs.
+	 * Note: mca_type values make sense only
+	 * in the context of an hwid
+	 */
+	for (i = 0; i < ARRAY_SIZE(amd_hwid_mappings); i++)
+		if (amd_hwid_mappings[i].amd_hwid_value == hwid)
+			break;
+
+	switch (i) {
+	case SMCA_F17H_CORE_BLOCK:
+		ip_name = (mca_type == SMCA_L3_CACHE_BLOCK) ?
+			  "L3 Cache" : "F17h Core";
+		break;
+
+	case SMCA_DF_BLOCK:
+		ip_name = "DF";
+		break;
+
+	case SMCA_UMC_BLOCK:
+		error_desc_array = f17h_umc_mce_desc;
+		ip_name = "UMC";
+		len = ARRAY_SIZE(f17h_umc_mce_desc) - 1;
+		break;
+
+	case SMCA_PB_BLOCK:
+		error_desc_array = f17h_pb_mce_desc;
+		ip_name = "PB";
+		len = ARRAY_SIZE(f17h_pb_mce_desc) - 1;
+		break;
+
+	case SMCA_PSP_BLOCK:
+		error_desc_array = f17h_psp_mce_desc;
+		ip_name = "PSP";
+		len = ARRAY_SIZE(f17h_psp_mce_desc) - 1;
+		break;
+
+	case SMCA_SMU_BLOCK:
+		error_desc_array = f17h_smu_mce_desc;
+		ip_name = "SMU";
+		len = ARRAY_SIZE(f17h_smu_mce_desc) - 1;
+		break;
+
+	default:
+		pr_emerg(HW_ERR "HWID:%d does not match any existing IPs\n", hwid);
+		return;
+	}
+
+	pr_emerg(HW_ERR "%s Error: ", ip_name);
+
+	if (i == SMCA_F17H_CORE_BLOCK) {
+		decode_f17hcore_errors(xec, mca_type);
+	} else if (i == SMCA_DF_BLOCK) {
+		decode_df_errors(xec, mca_type);
+	} else {
+		if (xec > len) {
+			pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+			return;
+		}
+
+		pr_cont("%s.\n", error_desc_array[xec]);
+	}
+}
+
 static const char *decode_error_status(struct mce *m)
 {
 	if (m->status & MCI_STATUS_UC) {
@@ -752,6 +1068,7 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct cpuinfo_x86 *c = &cpu_data(m->extcpu);
 	int ecc;
+	u32 ebx = cpuid_ebx(0x80000007);
 
 	if (amd_filter_mce(m))
 		return NOTIFY_STOP;
@@ -769,11 +1086,20 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"),
 		((m->status & MCI_STATUS_ADDRV)	? "AddrV" : "-"));
 
-	if (c->x86 == 0x15 || c->x86 == 0x16)
+	if (c->x86 >= 0x15)
 		pr_cont("|%s|%s",
 			((m->status & MCI_STATUS_DEFERRED) ? "Deferred" : "-"),
 			((m->status & MCI_STATUS_POISON)   ? "Poison"   : "-"));
 
+	if (!!(ebx & BIT(3))) {
+		u32 low, high;
+		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+
+		if (!rdmsr_safe(addr, &low, &high) &&
+		    (low & MCI_CONFIG_MCAX))
+			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
+	}
+
 	/* do the two bits[14:13] together */
 	ecc = (m->status >> 45) & 0x3;
 	if (ecc)
@@ -784,6 +1110,11 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (m->status & MCI_STATUS_ADDRV)
 		pr_emerg(HW_ERR "MC%d Error Address: 0x%016llx\n", m->bank, m->addr);
 
+	if (!!(ebx & BIT(3))) {
+		decode_smca_errors(m);
+		goto err_code;
+	}
+
 	if (!fam_ops)
 		goto err_code;
 
@@ -834,6 +1165,7 @@ static struct notifier_block amd_mce_dec_nb = {
 static int __init mce_amd_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 ebx = cpuid_ebx(0x80000007);
 
 	if (c->x86_vendor != X86_VENDOR_AMD)
 		return -ENODEV;
@@ -888,6 +1220,14 @@ static int __init mce_amd_init(void)
 		fam_ops->mc2_mce = f16h_mc2_mce;
 		break;
 
+	case 0x17:
+		xec_mask = 0x3f;
+		if (!(ebx & BIT(3))) {
+			printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n");
+			return 0;
+		}
+		break;
+
 	default:
 		printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
 		kfree(fam_ops);
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 3/5] x86/mce/AMD: Fix logic to obtain block address
  2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
@ 2016-02-29 22:32 ` Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 4/5] x86/mce: Clarify comments regarding deferred error Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 5/5] x86/mce/AMD: Add comments for easier understanding Aravind Gopalakrishnan
  4 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Insted, it simply indicates the prescence of additional MSRs.

Fixing the logic here to gather MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors
and we fall back to existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h           |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 90 ++++++++++++++++++++++++------------
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e83bbd6..69f8bda 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE	"MACHINECHECK"
 
 /* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0	0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID		0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x)	(MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 13f15cb..a155eaa 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -286,6 +286,54 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+			     unsigned int bank, unsigned int block)
+{
+	u32 addr = 0, offset = 0;
+
+	if (mce_flags.smca) {
+		if (!block) {
+			addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+		} else {
+			/*
+			 * For SMCA enabled processors, BLKPTR field
+			 * of the first MISC register (MCx_MISC0) indicates
+			 * presence of additional MISC register set (MISC1-4)
+			 */
+			u32 low, high;
+
+			if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+				       &low, &high) ||
+			    !(low & MCI_CONFIG_MCAX))
+				goto nextaddr_out;
+
+			if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+					&low, &high) &&
+			    (low & MASK_BLKPTR_LO))
+				addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+		}
+
+		goto nextaddr_out;
+	}
+
+	/* Fall back to method we used for older processors */
+	switch (block) {
+	case 0:
+		addr = MSR_IA32_MCx_MISC(bank);
+		break;
+	case 1:
+		offset = ((low & MASK_BLKPTR_LO) >> 21);
+		if (offset)
+			addr = MCG_XBLK_ADDR + offset;
+		break;
+	default:
+		addr = ++current_addr;
+	}
+
+nextaddr_out:
+		return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 			int offset, u32 misc_high)
@@ -348,16 +396,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		for (block = 0; block < NR_BLOCKS; ++block) {
-			if (block == 0)
-				address = MSR_IA32_MCx_MISC(bank);
-			else if (block == 1) {
-				address = (low & MASK_BLKPTR_LO) >> 21;
-				if (!address)
-					break;
-
-				address += MCG_XBLK_ADDR;
-			} else
-				++address;
+			address = get_block_address(address, low, high,
+						    bank, block);
+			if (!address)
+				break;
 
 			if (rdmsr_safe(address, &low, &high))
 				break;
@@ -462,16 +504,10 @@ static void amd_threshold_interrupt(void)
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		for (block = 0; block < NR_BLOCKS; ++block) {
-			if (block == 0) {
-				address = MSR_IA32_MCx_MISC(bank);
-			} else if (block == 1) {
-				address = (low & MASK_BLKPTR_LO) >> 21;
-				if (!address)
-					break;
-				address += MCG_XBLK_ADDR;
-			} else {
-				++address;
-			}
+			address = get_block_address(address, low, high,
+						    bank, block);
+			if (!address)
+				break;
 
 			if (rdmsr_safe(address, &low, &high))
 				break;
@@ -691,16 +727,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	if (err)
 		goto out_free;
 recurse:
-	if (!block) {
-		address = (low & MASK_BLKPTR_LO) >> 21;
-		if (!address)
-			return 0;
-		address += MCG_XBLK_ADDR;
-	} else {
-		++address;
-	}
+	address = get_block_address(address, low, high, bank, ++block);
+
+	if (!address)
+		return 0;
 
-	err = allocate_threshold_blocks(cpu, bank, ++block, address);
+	err = allocate_threshold_blocks(cpu, bank, block, address);
 	if (err)
 		goto out_free;
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 4/5] x86/mce: Clarify comments regarding deferred error
  2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
                   ` (2 preceding siblings ...)
  2016-02-29 22:32 ` [PATCH V2 3/5] x86/mce/AMD: Fix logic to obtain block address Aravind Gopalakrishnan
@ 2016-02-29 22:32 ` Aravind Gopalakrishnan
  2016-02-29 22:32 ` [PATCH V2 5/5] x86/mce/AMD: Add comments for easier understanding Aravind Gopalakrishnan
  4 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

The Deferred field indicates if we have a Deferred error.
Deferred errors indicate errors that hardware could not
fix. But it still does not cause any interruption to program
flow. So it does not generate any #MC and UC bit in MCx_STATUS
is not set.

Fixing comment here. No functional change

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 69f8bda..3b45e36 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR	 (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED	(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED	(1ULL<<44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON	(1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC		(1ULL<<55)  /* Task context corrupt */
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 5/5] x86/mce/AMD: Add comments for easier understanding
  2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
                   ` (3 preceding siblings ...)
  2016-02-29 22:32 ` [PATCH V2 4/5] x86/mce: Clarify comments regarding deferred error Aravind Gopalakrishnan
@ 2016-02-29 22:32 ` Aravind Gopalakrishnan
  4 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-02-29 22:32 UTC (permalink / raw)
  To: bp, tony.luck, hpa, mingo, tglx, dougthompson, mchehab
  Cc: x86, linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

In an attempt to aid in understand of what threshold_block
structure holds, assing comments to describe the members here.
Also, trimming comments around threshold_restart_bank()
and updating copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/amd_nb.h        | 18 +++++++++---------
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  7 ++-----
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3c56ef1..bc01c0a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -27,15 +27,15 @@ struct amd_l3_cache {
 };
 
 struct threshold_block {
-	unsigned int		block;
-	unsigned int		bank;
-	unsigned int		cpu;
-	u32			address;
-	u16			interrupt_enable;
-	bool			interrupt_capable;
-	u16			threshold_limit;
-	struct kobject		kobj;
-	struct list_head	miscj;
+	unsigned int		block;			/* Threshold block number within bank */
+	unsigned int		bank;			/* MCA bank the block belongs to */
+	unsigned int		cpu;			/* CPU which controls the MCA bank */
+	u32			address;		/* MSR address for the block */
+	u16			interrupt_enable;	/* Enable/ Disable APIC interrupt upon threshold error */
+	bool			interrupt_capable;	/* Specifies if interrupt is possible from the block */
+	u16			threshold_limit;	/* Value upon which threshold interrupt is generated */
+	struct kobject		kobj;			/* sysfs object */
+	struct list_head	miscj;			/* Add multiple threshold blocks within a bank to the list */
 };
 
 struct threshold_bank {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index a155eaa..ebb63ec 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,5 +1,5 @@
 /*
- *  (c) 2005-2015 Advanced Micro Devices, Inc.
+ *  (c) 2005-2016 Advanced Micro Devices, Inc.
  *  Your use of this code is subject to the terms and conditions of the
  *  GNU general public license version 2. See "COPYING" or
  *  http://www.gnu.org/licenses/gpl.html
@@ -183,10 +183,7 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
 	return 1;
 };
 
-/*
- * Called via smp_call_function_single(), must be called with correct
- * cpu affinity.
- */
+/* Reprogram MCx_MISC MSR behind this threshold bank */
 static void threshold_restart_bank(void *_tr)
 {
 	struct thresh_restart *tr = _tr;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  2016-02-29 22:32 ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
@ 2016-03-02 10:50   ` Borislav Petkov
  2016-03-02 10:52     ` [PATCH 1/3] x86/mce/AMD, EDAC: " Borislav Petkov
                       ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 10:50 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On Mon, Feb 29, 2016 at 04:32:56PM -0600, Aravind Gopalakrishnan wrote:
> For Scalable MCA enabled processors, errors are listed
> per IP block. And since it is not required for an IP to
> map to a particular bank, we need to use HWID and McaType
> values from the MCx_IPID register to figure out which IP
> a given bank represents.
> 
> We also have a new bit (TCC) in the MCx_STATUS register
> to indicate Task context is corrupt.
> 
> Add logic here to decode errors from all known IP
> blocks for Fam17h Model 00-0fh and to print TCC errors.
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  arch/x86/include/asm/mce.h           |  53 ++++++
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |  11 ++
>  drivers/edac/mce_amd.c               | 342 ++++++++++++++++++++++++++++++++++-
>  3 files changed, 405 insertions(+), 1 deletion(-)

Ok, applied with a bunch of changes ontop. I'm sending them as a reply
to this message. The second patch is relying on the assumption that a
hwid of 0 is invalid. Is that so?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  2016-03-02 10:50   ` Borislav Petkov
@ 2016-03-02 10:52     ` Borislav Petkov
  2016-03-02 10:53     ` [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding Borislav Petkov
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 10:52 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

From: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Date: Mon, 29 Feb 2016 16:32:56 -0600
Subject: [PATCH 1/3] x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA
 errors

For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

Boris:
- reorganize function placement in drivers/edac/mce_amd.c
- reflow comments

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1456785179-14378-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h           |  53 ++++++
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  12 ++
 drivers/edac/mce_amd.c               | 342 ++++++++++++++++++++++++++++++++++-
 3 files changed, 406 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index f9d4b8d4baf2..6f1380064471 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,18 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED	(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON	(1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC		(1ULL<<55)  /* Task context corrupt */
+
+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank.
+ *  - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ *    But should not be used to determine MSR numbers.
+ *  - TCC bit is present in MCx_STATUS.
+ */
+#define MCI_CONFIG_MCAX		0x1
+#define MCI_IPID_MCATYPE	0xFFFF0000
+#define MCI_IPID_HWID		0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -93,7 +105,9 @@
 
 /* 'SMCA': AMD64 Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID		0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x)	(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 /*
  * This structure contains all data related to the MCE log.  Also
@@ -291,4 +305,43 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
 				      struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerate new IP types and HWID values in AMD processors which support
+ * Scalable MCA.
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum amd_ip_types {
+	SMCA_F17H_CORE_BLOCK = 0,	/* Core errors */
+	SMCA_DF_BLOCK,			/* Data Fabric */
+	SMCA_UMC_BLOCK,			/* Unified Memory Controller */
+	SMCA_PB_BLOCK,			/* Parameter Block */
+	SMCA_PSP_BLOCK,			/* Platform Security Processor */
+	SMCA_SMU_BLOCK,			/* System Management Unit */
+	N_AMD_IP_TYPES
+};
+
+struct amd_hwid {
+	const char *amd_ipname;
+	unsigned int amd_hwid_value;
+};
+
+extern struct amd_hwid amd_hwid_mappings[N_AMD_IP_TYPES];
+
+enum amd_core_mca_blocks {
+	SMCA_LS_BLOCK = 0,	/* Load Store */
+	SMCA_IF_BLOCK,		/* Instruction Fetch */
+	SMCA_L2_CACHE_BLOCK,	/* L2 cache */
+	SMCA_DE_BLOCK,		/* Decoder unit */
+	RES,			/* Reserved */
+	SMCA_EX_BLOCK,		/* Execution unit */
+	SMCA_FP_BLOCK,		/* Floating Point */
+	SMCA_L3_CACHE_BLOCK	/* L3 cache */
+};
+
+enum amd_df_mca_blocks {
+	SMCA_CS_BLOCK = 0,	/* Coherent Slave */
+	SMCA_PIE_BLOCK		/* Power management, Interrupts, etc */
+};
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27bd5797..3188cd9eb9b5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,18 @@ static const char * const th_names[] = {
 	"execution_unit",
 };
 
+/* Define HWID to IP type mappings for Scalable MCA */
+struct amd_hwid amd_hwid_mappings[] =
+{
+	[SMCA_F17H_CORE_BLOCK]	= { "f17h_core",	0xB0 },
+	[SMCA_DF_BLOCK]		= { "data fabric",	0x2E },
+	[SMCA_UMC_BLOCK]	= { "UMC",		0x96 },
+	[SMCA_PB_BLOCK]		= { "param block",	0x5 },
+	[SMCA_PSP_BLOCK]	= { "PSP",		0xFF },
+	[SMCA_SMU_BLOCK]	= { "SMU",		0x1 },
+};
+EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index e3a945ce374b..6820d17fea9c 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -147,6 +147,135 @@ static const char * const mc6_mce_desc[] = {
 	"Status Register File",
 };
 
+/* Scalable MCA error strings */
+static const char * const f17h_ls_mce_desc[] = {
+	"Load queue parity",
+	"Store queue parity",
+	"Miss address buffer payload parity",
+	"L1 TLB parity",
+	"",						/* reserved */
+	"DC tag error type 6",
+	"DC tag error type 1",
+	"Internal error type 1",
+	"Internal error type 2",
+	"Sys Read data error thread 0",
+	"Sys read data error thread 1",
+	"DC tag error type 2",
+	"DC data error type 1 (poison comsumption)",
+	"DC data error type 2",
+	"DC data error type 3",
+	"DC tag error type 4",
+	"L2 TLB parity",
+	"PDC parity error",
+	"DC tag error type 3",
+	"DC tag error type 5",
+	"L2 fill data error",
+};
+
+static const char * const f17h_if_mce_desc[] = {
+	"microtag probe port parity error",
+	"IC microtag or full tag multi-hit error",
+	"IC full tag parity",
+	"IC data array parity",
+	"Decoupling queue phys addr parity error",
+	"L0 ITLB parity error",
+	"L1 ITLB parity error",
+	"L2 ITLB parity error",
+	"BPQ snoop parity on Thread 0",
+	"BPQ snoop parity on Thread 1",
+	"L1 BTB multi-match error",
+	"L2 BTB multi-match error",
+};
+
+static const char * const f17h_l2_mce_desc[] = {
+	"L2M tag multi-way-hit error",
+	"L2M tag ECC error",
+	"L2M data ECC error",
+	"HW assert",
+};
+
+static const char * const f17h_de_mce_desc[] = {
+	"uop cache tag parity error",
+	"uop cache data parity error",
+	"Insn buffer parity error",
+	"Insn dispatch queue parity error",
+	"Fetch address FIFO parity",
+	"Patch RAM data parity",
+	"Patch RAM sequencer parity",
+	"uop buffer parity"
+};
+
+static const char * const f17h_ex_mce_desc[] = {
+	"Watchdog timeout error",
+	"Phy register file parity",
+	"Flag register file parity",
+	"Immediate displacement register file parity",
+	"Address generator payload parity",
+	"EX payload parity",
+	"Checkpoint queue parity",
+	"Retire dispatch queue parity",
+};
+
+static const char * const f17h_fp_mce_desc[] = {
+	"Physical register file parity",
+	"Freelist parity error",
+	"Schedule queue parity",
+	"NSQ parity error",
+	"Retire queue parity",
+	"Status register file parity",
+};
+
+static const char * const f17h_l3_mce_desc[] = {
+	"Shadow tag macro ECC error",
+	"Shadow tag macro multi-way-hit error",
+	"L3M tag ECC error",
+	"L3M tag multi-way-hit error",
+	"L3M data ECC error",
+	"XI parity, L3 fill done channel error",
+	"L3 victim queue parity",
+	"L3 HW assert",
+};
+
+static const char * const f17h_cs_mce_desc[] = {
+	"Illegal request from transport layer",
+	"Address violation",
+	"Security violation",
+	"Illegal response from transport layer",
+	"Unexpected response",
+	"Parity error on incoming request or probe response data",
+	"Parity error on incoming read response data",
+	"Atomic request parity",
+	"ECC error on probe filter access",
+};
+
+static const char * const f17h_pie_mce_desc[] = {
+	"HW assert",
+	"Internal PIE register security violation",
+	"Error on GMI link",
+	"Poison data written to internal PIE register",
+};
+
+static const char * const f17h_umc_mce_desc[] = {
+	"DRAM ECC error",
+	"Data poison error on DRAM",
+	"SDP parity error",
+	"Advanced peripheral bus error",
+	"Command/address parity error",
+	"Write data CRC error",
+};
+
+static const char * const f17h_pb_mce_desc[] = {
+	"Parameter Block RAM ECC error",
+};
+
+static const char * const f17h_psp_mce_desc[] = {
+	"PSP RAM ECC or parity error",
+};
+
+static const char * const f17h_smu_mce_desc[] = {
+	"SMU RAM ECC or parity error",
+};
+
 static bool f12h_mc0_mce(u16 ec, u8 xec)
 {
 	bool ret = false;
@@ -691,6 +820,193 @@ static void decode_mc6_mce(struct mce *m)
 	pr_emerg(HW_ERR "Corrupted MC6 MCE info?\n");
 }
 
+static void decode_f17h_core_errors(u8 xec, unsigned int mca_type)
+{
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	switch (mca_type) {
+	case SMCA_LS_BLOCK:
+		error_desc_array = f17h_ls_mce_desc;
+		ip_name = "LS";
+		len = ARRAY_SIZE(f17h_ls_mce_desc) - 1;
+
+		if (xec == 0x4) {
+			pr_cont("Unrecognized error code from LS MCA bank\n");
+			return;
+		}
+
+		break;
+
+	case SMCA_IF_BLOCK:
+		error_desc_array = f17h_if_mce_desc;
+		ip_name = "IF";
+		len = ARRAY_SIZE(f17h_if_mce_desc) - 1;
+		break;
+
+	case SMCA_L2_CACHE_BLOCK:
+		error_desc_array = f17h_l2_mce_desc;
+		ip_name = "L2_Cache";
+		len = ARRAY_SIZE(f17h_l2_mce_desc) - 1;
+		break;
+
+	case SMCA_DE_BLOCK:
+		error_desc_array = f17h_de_mce_desc;
+		ip_name = "DE";
+		len = ARRAY_SIZE(f17h_de_mce_desc) - 1;
+		break;
+
+	case SMCA_EX_BLOCK:
+		error_desc_array = f17h_ex_mce_desc;
+		ip_name = "EX";
+		len = ARRAY_SIZE(f17h_ex_mce_desc) - 1;
+		break;
+
+	case SMCA_FP_BLOCK:
+		error_desc_array = f17h_fp_mce_desc;
+		ip_name = "FP";
+		len = ARRAY_SIZE(f17h_fp_mce_desc) - 1;
+		break;
+
+	case SMCA_L3_CACHE_BLOCK:
+		error_desc_array = f17h_l3_mce_desc;
+		ip_name = "L3_Cache";
+		len = ARRAY_SIZE(f17h_l3_mce_desc) - 1;
+		break;
+
+	default:
+		pr_cont("Unrecognized Mca Type value for F17h Core. Unable to decode errors\n");
+		return;
+	}
+
+	if (xec > len) {
+		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		return;
+	}
+
+	pr_cont("%s.\n", error_desc_array[xec]);
+}
+
+static void decode_df_errors(u8 xec, unsigned int mca_type)
+{
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	switch (mca_type) {
+	case  SMCA_CS_BLOCK:
+		error_desc_array = f17h_cs_mce_desc;
+		ip_name = "CS";
+		len = ARRAY_SIZE(f17h_cs_mce_desc) - 1;
+		break;
+
+	case SMCA_PIE_BLOCK:
+		error_desc_array = f17h_pie_mce_desc;
+		ip_name = "PIE";
+		len = ARRAY_SIZE(f17h_pie_mce_desc) - 1;
+		break;
+
+	default:
+		pr_cont("Unrecognized Mca Type value for DF. Unable to decode errors\n");
+		return;
+	}
+
+	if (xec > len) {
+		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		return;
+	}
+
+	pr_cont("%s.\n", error_desc_array[xec]);
+}
+
+/* Decode errors according to Scalable MCA specification */
+static void decode_smca_errors(struct mce *m)
+{
+	u32 low, high;
+	u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
+	unsigned int hwid, mca_type, i;
+	u8 xec = XEC(m->status, xec_mask);
+	const char * const *error_desc_array;
+	char *ip_name;
+	size_t len;
+
+	if (rdmsr_safe(addr, &low, &high)) {
+		pr_emerg("Invalid IP block specified, error information is unreliable.\n");
+		return;
+	}
+
+	hwid = high & MCI_IPID_HWID;
+	mca_type = (high & MCI_IPID_MCATYPE) >> 16;
+
+	pr_emerg(HW_ERR "MC%d IPID value: 0x%08x%08x\n", m->bank, high, low);
+
+	/*
+	 * Based on hwid and mca_type values,
+	 * decode errors from respective IPs.
+	 * Note: mca_type values make sense only
+	 * in the context of an hwid
+	 */
+	for (i = 0; i < ARRAY_SIZE(amd_hwid_mappings); i++)
+		if (amd_hwid_mappings[i].amd_hwid_value == hwid)
+			break;
+
+	switch (i) {
+	case SMCA_F17H_CORE_BLOCK:
+		ip_name = (mca_type == SMCA_L3_CACHE_BLOCK) ?
+			  "L3 Cache" : "F17h Core";
+		break;
+
+	case SMCA_DF_BLOCK:
+		ip_name = "DF";
+		break;
+
+	case SMCA_UMC_BLOCK:
+		error_desc_array = f17h_umc_mce_desc;
+		ip_name = "UMC";
+		len = ARRAY_SIZE(f17h_umc_mce_desc) - 1;
+		break;
+
+	case SMCA_PB_BLOCK:
+		error_desc_array = f17h_pb_mce_desc;
+		ip_name = "PB";
+		len = ARRAY_SIZE(f17h_pb_mce_desc) - 1;
+		break;
+
+	case SMCA_PSP_BLOCK:
+		error_desc_array = f17h_psp_mce_desc;
+		ip_name = "PSP";
+		len = ARRAY_SIZE(f17h_psp_mce_desc) - 1;
+		break;
+
+	case SMCA_SMU_BLOCK:
+		error_desc_array = f17h_smu_mce_desc;
+		ip_name = "SMU";
+		len = ARRAY_SIZE(f17h_smu_mce_desc) - 1;
+		break;
+
+	default:
+		pr_emerg(HW_ERR "HWID:%d does not match any existing IPs\n", hwid);
+		return;
+	}
+
+	pr_emerg(HW_ERR "%s Error: ", ip_name);
+
+	if (i == SMCA_F17H_CORE_BLOCK) {
+		decode_f17h_core_errors(xec, mca_type);
+	} else if (i == SMCA_DF_BLOCK) {
+		decode_df_errors(xec, mca_type);
+	} else {
+		if (xec > len) {
+			pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+			return;
+		}
+
+		pr_cont("%s.\n", error_desc_array[xec]);
+	}
+}
+
+
 static inline void amd_decode_err_code(u16 ec)
 {
 	if (INT_ERROR(ec)) {
@@ -752,6 +1068,7 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct cpuinfo_x86 *c = &cpu_data(m->extcpu);
 	int ecc;
+	u32 ebx = cpuid_ebx(0x80000007);
 
 	if (amd_filter_mce(m))
 		return NOTIFY_STOP;
@@ -769,11 +1086,20 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"),
 		((m->status & MCI_STATUS_ADDRV)	? "AddrV" : "-"));
 
-	if (c->x86 == 0x15 || c->x86 == 0x16)
+	if (c->x86 >= 0x15)
 		pr_cont("|%s|%s",
 			((m->status & MCI_STATUS_DEFERRED) ? "Deferred" : "-"),
 			((m->status & MCI_STATUS_POISON)   ? "Poison"   : "-"));
 
+	if (!!(ebx & BIT(3))) {
+		u32 low, high;
+		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+
+		if (!rdmsr_safe(addr, &low, &high) &&
+		    (low & MCI_CONFIG_MCAX))
+			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
+	}
+
 	/* do the two bits[14:13] together */
 	ecc = (m->status >> 45) & 0x3;
 	if (ecc)
@@ -784,6 +1110,11 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (m->status & MCI_STATUS_ADDRV)
 		pr_emerg(HW_ERR "MC%d Error Address: 0x%016llx\n", m->bank, m->addr);
 
+	if (!!(ebx & BIT(3))) {
+		decode_smca_errors(m);
+		goto err_code;
+	}
+
 	if (!fam_ops)
 		goto err_code;
 
@@ -834,6 +1165,7 @@ static struct notifier_block amd_mce_dec_nb = {
 static int __init mce_amd_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 ebx = cpuid_ebx(0x80000007);
 
 	if (c->x86_vendor != X86_VENDOR_AMD)
 		return -ENODEV;
@@ -888,6 +1220,14 @@ static int __init mce_amd_init(void)
 		fam_ops->mc2_mce = f16h_mc2_mce;
 		break;
 
+	case 0x17:
+		xec_mask = 0x3f;
+		if (!(ebx & BIT(3))) {
+			printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n");
+			return 0;
+		}
+		break;
+
 	default:
 		printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
 		kfree(fam_ops);
-- 
2.3.5


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 10:50   ` Borislav Petkov
  2016-03-02 10:52     ` [PATCH 1/3] x86/mce/AMD, EDAC: " Borislav Petkov
@ 2016-03-02 10:53     ` Borislav Petkov
  2016-03-02 15:52       ` Aravind Gopalakrishnan
  2016-03-02 10:54     ` [PATCH 3/3] EDAC, mce_amd: Correct error paths Borislav Petkov
  2016-03-02 15:57     ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
  3 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 10:53 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

From: Borislav Petkov <bp@suse.de>
Date: Wed, 2 Mar 2016 11:23:13 +0100
Subject: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding

Merge all IP blocks into a single enum. This allows for easier block
name use later. Drop superfluous "_BLOCK" from the enum names.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h           | 46 +++++++++----------
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 27 +++++++----
 drivers/edac/mce_amd.c               | 88 +++++++++++++++---------------------
 3 files changed, 76 insertions(+), 85 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6f1380064471..4a197cb25593 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -311,37 +311,33 @@ extern void apei_mce_report_mem_error(int corrected,
  */
 #ifdef CONFIG_X86_MCE_AMD
 enum amd_ip_types {
-	SMCA_F17H_CORE_BLOCK = 0,	/* Core errors */
-	SMCA_DF_BLOCK,			/* Data Fabric */
-	SMCA_UMC_BLOCK,			/* Unified Memory Controller */
-	SMCA_PB_BLOCK,			/* Parameter Block */
-	SMCA_PSP_BLOCK,			/* Platform Security Processor */
-	SMCA_SMU_BLOCK,			/* System Management Unit */
+	SMCA_F17H_CORE = 0,	/* Core errors */
+	SMCA_LS,		/* - Load Store */
+	SMCA_IF,		/* - Instruction Fetch */
+	SMCA_L2_CACHE,		/* - L2 cache */
+	SMCA_DE,		/* - Decoder unit */
+	RES,			/* - Reserved */
+	SMCA_EX,		/* - Execution unit */
+	SMCA_FP,		/* - Floating Point */
+	SMCA_L3_CACHE,		/* - L3 cache */
+
+	SMCA_DF,		/* Data Fabric */
+	SMCA_CS,		/* - Coherent Slave */
+	SMCA_PIE,		/* - Power management, Interrupts, etc */
+
+	SMCA_UMC,		/* Unified Memory Controller */
+	SMCA_PB,		/* Parameter Block */
+	SMCA_PSP,		/* Platform Security Processor */
+	SMCA_SMU,		/* System Management Unit */
 	N_AMD_IP_TYPES
 };
 
 struct amd_hwid {
-	const char *amd_ipname;
-	unsigned int amd_hwid_value;
+	const char *name;
+	unsigned int hwid;
 };
 
-extern struct amd_hwid amd_hwid_mappings[N_AMD_IP_TYPES];
-
-enum amd_core_mca_blocks {
-	SMCA_LS_BLOCK = 0,	/* Load Store */
-	SMCA_IF_BLOCK,		/* Instruction Fetch */
-	SMCA_L2_CACHE_BLOCK,	/* L2 cache */
-	SMCA_DE_BLOCK,		/* Decoder unit */
-	RES,			/* Reserved */
-	SMCA_EX_BLOCK,		/* Execution unit */
-	SMCA_FP_BLOCK,		/* Floating Point */
-	SMCA_L3_CACHE_BLOCK	/* L3 cache */
-};
+extern struct amd_hwid amd_hwids[N_AMD_IP_TYPES];
 
-enum amd_df_mca_blocks {
-	SMCA_CS_BLOCK = 0,	/* Coherent Slave */
-	SMCA_PIE_BLOCK		/* Power management, Interrupts, etc */
-};
 #endif
-
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 3188cd9eb9b5..c184c92a00ab 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -72,16 +72,25 @@ static const char * const th_names[] = {
 };
 
 /* Define HWID to IP type mappings for Scalable MCA */
-struct amd_hwid amd_hwid_mappings[] =
-{
-	[SMCA_F17H_CORE_BLOCK]	= { "f17h_core",	0xB0 },
-	[SMCA_DF_BLOCK]		= { "data fabric",	0x2E },
-	[SMCA_UMC_BLOCK]	= { "UMC",		0x96 },
-	[SMCA_PB_BLOCK]		= { "param block",	0x5 },
-	[SMCA_PSP_BLOCK]	= { "PSP",		0xFF },
-	[SMCA_SMU_BLOCK]	= { "SMU",		0x1 },
+struct amd_hwid amd_hwids[] =
+{
+	[SMCA_F17H_CORE] = { "F17h core",	0xB0 },
+	[SMCA_LS]	 = { "Load-Store",	0x0 },
+	[SMCA_IF]	 = { "IFetch",		0x0 },
+	[SMCA_L2_CACHE]  = { "L2 Cache",	0x0 },
+	[SMCA_DE]	 = { "Decoder",		0x0 },
+	[SMCA_EX]	 = { "Execution",	0x0 },
+	[SMCA_FP]	 = { "Floating Point",	0x0 },
+	[SMCA_L3_CACHE]  = { "L3 Cache",	0x0 },
+	[SMCA_DF]	 = { "Data Fabric",	0x2E },
+	[SMCA_CS]	 = { "Coherent Slave",	0x0 },
+	[SMCA_PIE]	 = { "PwrMan/Intr",	0x0 },
+	[SMCA_UMC]	 = { "UMC",		0x96 },
+	[SMCA_PB]	 = { "Param Block",	0x5 },
+	[SMCA_PSP]	 = { "PSP",		0xFF },
+	[SMCA_SMU]	 = { "SMU",		0x1 },
 };
-EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+EXPORT_SYMBOL_GPL(amd_hwids);
 
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 6820d17fea9c..0f9953cbde4e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -820,58 +820,51 @@ static void decode_mc6_mce(struct mce *m)
 	pr_emerg(HW_ERR "Corrupted MC6 MCE info?\n");
 }
 
-static void decode_f17h_core_errors(u8 xec, unsigned int mca_type)
+static void decode_f17h_core_errors(const char *ip_name, u8 xec, unsigned int mca_type)
 {
 	const char * const *error_desc_array;
-	char *ip_name;
 	size_t len;
 
+	pr_emerg(HW_ERR "%s Error: ", ip_name);
+
 	switch (mca_type) {
-	case SMCA_LS_BLOCK:
+	case SMCA_LS:
 		error_desc_array = f17h_ls_mce_desc;
-		ip_name = "LS";
 		len = ARRAY_SIZE(f17h_ls_mce_desc) - 1;
 
 		if (xec == 0x4) {
 			pr_cont("Unrecognized error code from LS MCA bank\n");
 			return;
 		}
-
 		break;
 
-	case SMCA_IF_BLOCK:
+	case SMCA_IF:
 		error_desc_array = f17h_if_mce_desc;
-		ip_name = "IF";
 		len = ARRAY_SIZE(f17h_if_mce_desc) - 1;
 		break;
 
-	case SMCA_L2_CACHE_BLOCK:
+	case SMCA_L2_CACHE:
 		error_desc_array = f17h_l2_mce_desc;
-		ip_name = "L2_Cache";
 		len = ARRAY_SIZE(f17h_l2_mce_desc) - 1;
 		break;
 
-	case SMCA_DE_BLOCK:
+	case SMCA_DE:
 		error_desc_array = f17h_de_mce_desc;
-		ip_name = "DE";
 		len = ARRAY_SIZE(f17h_de_mce_desc) - 1;
 		break;
 
-	case SMCA_EX_BLOCK:
+	case SMCA_EX:
 		error_desc_array = f17h_ex_mce_desc;
-		ip_name = "EX";
 		len = ARRAY_SIZE(f17h_ex_mce_desc) - 1;
 		break;
 
-	case SMCA_FP_BLOCK:
+	case SMCA_FP:
 		error_desc_array = f17h_fp_mce_desc;
-		ip_name = "FP";
 		len = ARRAY_SIZE(f17h_fp_mce_desc) - 1;
 		break;
 
-	case SMCA_L3_CACHE_BLOCK:
+	case SMCA_L3_CACHE:
 		error_desc_array = f17h_l3_mce_desc;
-		ip_name = "L3_Cache";
 		len = ARRAY_SIZE(f17h_l3_mce_desc) - 1;
 		break;
 
@@ -881,7 +874,7 @@ static void decode_f17h_core_errors(u8 xec, unsigned int mca_type)
 	}
 
 	if (xec > len) {
-		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);
 		return;
 	}
 
@@ -891,19 +884,18 @@ static void decode_f17h_core_errors(u8 xec, unsigned int mca_type)
 static void decode_df_errors(u8 xec, unsigned int mca_type)
 {
 	const char * const *error_desc_array;
-	char *ip_name;
 	size_t len;
 
+	pr_emerg(HW_ERR "Data Fabric Error: ");
+
 	switch (mca_type) {
-	case  SMCA_CS_BLOCK:
+	case  SMCA_CS:
 		error_desc_array = f17h_cs_mce_desc;
-		ip_name = "CS";
 		len = ARRAY_SIZE(f17h_cs_mce_desc) - 1;
 		break;
 
-	case SMCA_PIE_BLOCK:
+	case SMCA_PIE:
 		error_desc_array = f17h_pie_mce_desc;
-		ip_name = "PIE";
 		len = ARRAY_SIZE(f17h_pie_mce_desc) - 1;
 		break;
 
@@ -913,7 +905,7 @@ static void decode_df_errors(u8 xec, unsigned int mca_type)
 	}
 
 	if (xec > len) {
-		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);
 		return;
 	}
 
@@ -928,7 +920,7 @@ static void decode_smca_errors(struct mce *m)
 	unsigned int hwid, mca_type, i;
 	u8 xec = XEC(m->status, xec_mask);
 	const char * const *error_desc_array;
-	char *ip_name;
+	const char *ip_name;
 	size_t len;
 
 	if (rdmsr_safe(addr, &low, &high)) {
@@ -947,41 +939,39 @@ static void decode_smca_errors(struct mce *m)
 	 * Note: mca_type values make sense only
 	 * in the context of an hwid
 	 */
-	for (i = 0; i < ARRAY_SIZE(amd_hwid_mappings); i++)
-		if (amd_hwid_mappings[i].amd_hwid_value == hwid)
+	for (i = 0; i < ARRAY_SIZE(amd_hwids); i++)
+		if (amd_hwids[i].hwid == hwid)
 			break;
 
 	switch (i) {
-	case SMCA_F17H_CORE_BLOCK:
-		ip_name = (mca_type == SMCA_L3_CACHE_BLOCK) ?
-			  "L3 Cache" : "F17h Core";
+	case SMCA_F17H_CORE:
+		ip_name = (mca_type == SMCA_L3_CACHE) ?
+			   "L3 Cache" : "F17h Core";
+
+		return decode_f17h_core_errors(ip_name, xec, mca_type);
 		break;
 
-	case SMCA_DF_BLOCK:
-		ip_name = "DF";
+	case SMCA_DF:
+		return decode_df_errors(xec, mca_type);
 		break;
 
-	case SMCA_UMC_BLOCK:
+	case SMCA_UMC:
 		error_desc_array = f17h_umc_mce_desc;
-		ip_name = "UMC";
 		len = ARRAY_SIZE(f17h_umc_mce_desc) - 1;
 		break;
 
-	case SMCA_PB_BLOCK:
+	case SMCA_PB:
 		error_desc_array = f17h_pb_mce_desc;
-		ip_name = "PB";
 		len = ARRAY_SIZE(f17h_pb_mce_desc) - 1;
 		break;
 
-	case SMCA_PSP_BLOCK:
+	case SMCA_PSP:
 		error_desc_array = f17h_psp_mce_desc;
-		ip_name = "PSP";
 		len = ARRAY_SIZE(f17h_psp_mce_desc) - 1;
 		break;
 
-	case SMCA_SMU_BLOCK:
+	case SMCA_SMU:
 		error_desc_array = f17h_smu_mce_desc;
-		ip_name = "SMU";
 		len = ARRAY_SIZE(f17h_smu_mce_desc) - 1;
 		break;
 
@@ -990,20 +980,16 @@ static void decode_smca_errors(struct mce *m)
 		return;
 	}
 
-	pr_emerg(HW_ERR "%s Error: ", ip_name);
+	ip_name = amd_hwids[mca_type].name;
 
-	if (i == SMCA_F17H_CORE_BLOCK) {
-		decode_f17h_core_errors(xec, mca_type);
-	} else if (i == SMCA_DF_BLOCK) {
-		decode_df_errors(xec, mca_type);
-	} else {
-		if (xec > len) {
-			pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
-			return;
-		}
+	pr_emerg(HW_ERR "%s Error: ", ip_name);
 
-		pr_cont("%s.\n", error_desc_array[xec]);
+	if (xec > len) {
+		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		return;
 	}
+
+	pr_cont("%s.\n", error_desc_array[xec]);
 }
 
 
-- 
2.3.5


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] EDAC, mce_amd: Correct error paths
  2016-03-02 10:50   ` Borislav Petkov
  2016-03-02 10:52     ` [PATCH 1/3] x86/mce/AMD, EDAC: " Borislav Petkov
  2016-03-02 10:53     ` [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding Borislav Petkov
@ 2016-03-02 10:54     ` Borislav Petkov
  2016-03-02 15:56       ` Aravind Gopalakrishnan
  2016-03-02 15:57     ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
  3 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 10:54 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

From: Borislav Petkov <bp@suse.de>
Date: Wed, 2 Mar 2016 11:46:58 +0100
Subject: [PATCH 3/3] EDAC, mce_amd: Correct error paths

We need to unwind properly when we fail to find the proper decoding
functions. Streamline error messages to resemble the rest of this file,
while at it and do some minor stylistic changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/mce_amd.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 0f9953cbde4e..81495a360eea 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -833,7 +833,7 @@ static void decode_f17h_core_errors(const char *ip_name, u8 xec, unsigned int mc
 		len = ARRAY_SIZE(f17h_ls_mce_desc) - 1;
 
 		if (xec == 0x4) {
-			pr_cont("Unrecognized error code from LS MCA bank\n");
+			pr_cont("Unrecognized LS MCA error code.\n");
 			return;
 		}
 		break;
@@ -869,12 +869,12 @@ static void decode_f17h_core_errors(const char *ip_name, u8 xec, unsigned int mc
 		break;
 
 	default:
-		pr_cont("Unrecognized Mca Type value for F17h Core. Unable to decode errors\n");
+		pr_cont("Corrupted MCA Core info.\n");
 		return;
 	}
 
 	if (xec > len) {
-		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);
+		pr_cont("Unrecognized %s MCA bank error code.\n", amd_hwids[mca_type].name);
 		return;
 	}
 
@@ -900,12 +900,12 @@ static void decode_df_errors(u8 xec, unsigned int mca_type)
 		break;
 
 	default:
-		pr_cont("Unrecognized Mca Type value for DF. Unable to decode errors\n");
+		pr_cont("Corrupted MCA Data Fabric info.\n");
 		return;
 	}
 
 	if (xec > len) {
-		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);
+		pr_cont("Unrecognized %s MCA bank error code.\n", amd_hwids[mca_type].name);
 		return;
 	}
 
@@ -915,12 +915,12 @@ static void decode_df_errors(u8 xec, unsigned int mca_type)
 /* Decode errors according to Scalable MCA specification */
 static void decode_smca_errors(struct mce *m)
 {
-	u32 low, high;
 	u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
 	unsigned int hwid, mca_type, i;
 	u8 xec = XEC(m->status, xec_mask);
 	const char * const *error_desc_array;
 	const char *ip_name;
+	u32 low, high;
 	size_t len;
 
 	if (rdmsr_safe(addr, &low, &high)) {
@@ -934,10 +934,8 @@ static void decode_smca_errors(struct mce *m)
 	pr_emerg(HW_ERR "MC%d IPID value: 0x%08x%08x\n", m->bank, high, low);
 
 	/*
-	 * Based on hwid and mca_type values,
-	 * decode errors from respective IPs.
-	 * Note: mca_type values make sense only
-	 * in the context of an hwid
+	 * Based on hwid and mca_type values, decode errors from respective IPs.
+	 * Note: mca_type values make sense only in the context of a hwid.
 	 */
 	for (i = 0; i < ARRAY_SIZE(amd_hwids); i++)
 		if (amd_hwids[i].hwid == hwid)
@@ -976,7 +974,7 @@ static void decode_smca_errors(struct mce *m)
 		break;
 
 	default:
-		pr_emerg(HW_ERR "HWID:%d does not match any existing IPs\n", hwid);
+		pr_emerg(HW_ERR "HWID:%d does not match any existing IPs.\n", hwid);
 		return;
 	}
 
@@ -985,7 +983,7 @@ static void decode_smca_errors(struct mce *m)
 	pr_emerg(HW_ERR "%s Error: ", ip_name);
 
 	if (xec > len) {
-		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
+		pr_cont("Unrecognized %s MCA bank error code.\n", ip_name);
 		return;
 	}
 
@@ -1151,7 +1149,7 @@ static struct notifier_block amd_mce_dec_nb = {
 static int __init mce_amd_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
-	u32 ebx = cpuid_ebx(0x80000007);
+	u32 ebx;
 
 	if (c->x86_vendor != X86_VENDOR_AMD)
 		return -ENODEV;
@@ -1207,17 +1205,18 @@ static int __init mce_amd_init(void)
 		break;
 
 	case 0x17:
+		ebx	 = cpuid_ebx(0x80000007);
 		xec_mask = 0x3f;
+
 		if (!(ebx & BIT(3))) {
-			printk(KERN_WARNING "Decoding supported only on Scalable MCA enabled processors\n");
-			return 0;
+			printk(KERN_WARNING "Decoding supported only on Scalable MCA processors.\n");
+			goto err_out;
 		}
 		break;
 
 	default:
 		printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
-		kfree(fam_ops);
-		fam_ops = NULL;
+		goto err_out;
 	}
 
 	pr_info("MCE: In-kernel MCE decoding enabled.\n");
@@ -1225,6 +1224,11 @@ static int __init mce_amd_init(void)
 	mce_register_decode_chain(&amd_mce_dec_nb);
 
 	return 0;
+
+err_out:
+	kfree(fam_ops);
+	fam_ops = NULL;
+	return -EINVAL;
 }
 early_initcall(mce_amd_init);
 
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 10:53     ` [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding Borislav Petkov
@ 2016-03-02 15:52       ` Aravind Gopalakrishnan
  2016-03-02 16:21         ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-03-02 15:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On 3/2/2016 4:53 AM, Borislav Petkov wrote:
> Merge all IP blocks into a single enum. This allows for easier block
> name use later. Drop superfluous "_BLOCK" from the enum names.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
>
>   enum amd_ip_types {
> -	SMCA_F17H_CORE_BLOCK = 0,	/* Core errors */
> -	SMCA_DF_BLOCK,			/* Data Fabric */
> -	SMCA_UMC_BLOCK,			/* Unified Memory Controller */
> -	SMCA_PB_BLOCK,			/* Parameter Block */
> -	SMCA_PSP_BLOCK,			/* Platform Security Processor */
> -	SMCA_SMU_BLOCK,			/* System Management Unit */
> +	SMCA_F17H_CORE = 0,	/* Core errors */
> +	SMCA_LS,		/* - Load Store */
> +	SMCA_IF,		/* - Instruction Fetch */
> +	SMCA_L2_CACHE,		/* - L2 cache */
> +	SMCA_DE,		/* - Decoder unit */
> +	RES,			/* - Reserved */
> +	SMCA_EX,		/* - Execution unit */
> +	SMCA_FP,		/* - Floating Point */
> +	SMCA_L3_CACHE,		/* - L3 cache */
> +
> +	SMCA_DF,		/* Data Fabric */
> +	SMCA_CS,		/* - Coherent Slave */
> +	SMCA_PIE,		/* - Power management, Interrupts, etc */
> +
> +	SMCA_UMC,		/* Unified Memory Controller */
> +	SMCA_PB,		/* Parameter Block */
> +	SMCA_PSP,		/* Platform Security Processor */
> +	SMCA_SMU,		/* System Management Unit */
>   	N_AMD_IP_TYPES
>   };
>   

No, this would break the logic in EDAC.
The main reason I placed it in separate enums is because the "mca_type" 
values map to the enum.

These blocks-

+	SMCA_LS,		/* - Load Store */
+	SMCA_IF,		/* - Instruction Fetch */
+	SMCA_L2_CACHE,		/* - L2 cache */
+	SMCA_DE,		/* - Decoder unit */
+	RES,			/* - Reserved */
+	SMCA_EX,		/* - Execution unit */
+	SMCA_FP,		/* - Floating Point */
+	SMCA_L3_CACHE,		/* - L3 cache */


have the same hwid value (0xb0). But they differ in mca_type values. And 
in exactly that order.
(LS is mca_type 0, IF is mca_type 1 and so on..)

So, after we get mca_type value from the MSR (mca_type = (high & 
MCI_IPID_MCATYPE) >> 16),
We check for LS (=0) or IF (=1) ...
With this change, LS block is assigned 1 due to the ordering in enum.

And this logic applies to "DF" block as well.  (whose component blocks 
are "coherent slave" and "pie" which have mca_type values of 0 and 1 
respectively)
"DF" and "F17h_core" are essentially parent blocks and CS, PIE, LS, IF 
etc are children. hwid indicates the parent, mca_type indicates the child..

>
>   
>   /* Define HWID to IP type mappings for Scalable MCA */
> -struct amd_hwid amd_hwid_mappings[] =
> -{
> -	[SMCA_F17H_CORE_BLOCK]	= { "f17h_core",	0xB0 },
> -	[SMCA_DF_BLOCK]		= { "data fabric",	0x2E },
> -	[SMCA_UMC_BLOCK]	= { "UMC",		0x96 },
> -	[SMCA_PB_BLOCK]		= { "param block",	0x5 },
> -	[SMCA_PSP_BLOCK]	= { "PSP",		0xFF },
> -	[SMCA_SMU_BLOCK]	= { "SMU",		0x1 },
> +struct amd_hwid amd_hwids[] =
> +{
> +	[SMCA_F17H_CORE] = { "F17h core",	0xB0 },
> +	[SMCA_LS]	 = { "Load-Store",	0x0 },
> +	[SMCA_IF]	 = { "IFetch",		0x0 },
> +	[SMCA_L2_CACHE]  = { "L2 Cache",	0x0 },
> +	[SMCA_DE]	 = { "Decoder",		0x0 },
> +	[SMCA_EX]	 = { "Execution",	0x0 },
> +	[SMCA_FP]	 = { "Floating Point",	0x0 },
> +	[SMCA_L3_CACHE]  = { "L3 Cache",	0x0 },
> +	[SMCA_DF]	 = { "Data Fabric",	0x2E },
> +	[SMCA_CS]	 = { "Coherent Slave",	0x0 },
> +	[SMCA_PIE]	 = { "PwrMan/Intr",	0x0 },
> +	[SMCA_UMC]	 = { "UMC",		0x96 },
> +	[SMCA_PB]	 = { "Param Block",	0x5 },
> +	[SMCA_PSP]	 = { "PSP",		0xFF },
> +	[SMCA_SMU]	 = { "SMU",		0x1 },
>   };
> -EXPORT_SYMBOL_GPL(amd_hwid_mappings);
> +EXPORT_SYMBOL_GPL(amd_hwids);
>   

These strings are what I intend to use for the sysfs interface.
So, I am not sure if "PwrMan/Intr" would work when I need to create the 
kobj..

Also, the legacy names use snake_case-
static const char * const th_names[] = {
         "load_store",
         "insn_fetch",
         "combined_unit",
         "",
         "northbridge",
         "execution_unit",
};

So, I think we should continue this approach and have something like this-
static const char * const amd_core_mcablock_names[] = {
         [SMCA_LS]         = "load_store",
         [SMCA_IF]         = "insn_fetch",
         [SMCA_L2_CACHE]   = "l2_cache",
         [SMCA_DE]         = "decode_unit",
         [RES]                   = "",
         [SMCA_EX]         = "execution_unit",
         [SMCA_FP]         = "floating_point",
         [SMCA_L3_CACHE]   = "l3_cache",
};

static const char * const amd_df_mcablock_names[] = {
         [SMCA_CS]  = "coherent_slave",
         [SMCA_PIE] = "pie",
};

(Split arrays again because I feel it'd be better to have arrays ordered 
according to mca_type values)

Expanding "df" to "data_fabric" and "pb" to "param_block" is fine with me.

>
>   
>   	if (xec > len) {
> -		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
> +		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);

This wouldn't work because of the mca_type ordering as mentioned above.
(Or it should be amd_core_mcablock_names[mca_type])

>   
>   	if (xec > len) {
> -		pr_cont("Unrecognized error code from %s MCA bank\n", ip_name);
> +		pr_cont("Unrecognized error code from %s MCA bank\n", amd_hwids[mca_type].name);
>   		return;
>   	}
>   

Ditto.

>
>   
> -	pr_emerg(HW_ERR "%s Error: ", ip_name);
> +	ip_name = amd_hwids[mca_type].name;

This should be amd_hwids[i].name
We shouldn't be using mca_type value as index..


Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] EDAC, mce_amd: Correct error paths
  2016-03-02 10:54     ` [PATCH 3/3] EDAC, mce_amd: Correct error paths Borislav Petkov
@ 2016-03-02 15:56       ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-03-02 15:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On 3/2/2016 4:54 AM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> Date: Wed, 2 Mar 2016 11:46:58 +0100
> Subject: [PATCH 3/3] EDAC, mce_amd: Correct error paths
>
> We need to unwind properly when we fail to find the proper decoding
> functions. Streamline error messages to resemble the rest of this file,
> while at it and do some minor stylistic changes.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>

Looks good. Thanks.

Reviewed-by: Aravind Gopalakrishnan<aravind.gopalakrishnan@amd.com>

> -
>   
>   	default:
>   		printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
> -		kfree(fam_ops);
> -		fam_ops = NULL;
> +		goto err_out;
>   	}
>   
>   	pr_info("MCE: In-kernel MCE decoding enabled.\n");
> @@ -1225,6 +1224,11 @@ static int __init mce_amd_init(void)
>   	mce_register_decode_chain(&amd_mce_dec_nb);
>   
>   	return 0;
> +
> +err_out:
> +	kfree(fam_ops);
> +	fam_ops = NULL;
> +	return -EINVAL;

Thanks! Sorry I missed this.

-Aravind.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  2016-03-02 10:50   ` Borislav Petkov
                       ` (2 preceding siblings ...)
  2016-03-02 10:54     ` [PATCH 3/3] EDAC, mce_amd: Correct error paths Borislav Petkov
@ 2016-03-02 15:57     ` Aravind Gopalakrishnan
  3 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-03-02 15:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On 3/2/2016 4:50 AM, Borislav Petkov wrote:
>
> Ok, applied with a bunch of changes ontop.


Thanks!

>   The second patch is relying on the assumption that a
> hwid of 0 is invalid. Is that so?
>

Yes, HWID of 0 is invalid.

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 15:52       ` Aravind Gopalakrishnan
@ 2016-03-02 16:21         ` Borislav Petkov
  2016-03-02 16:27           ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 16:21 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On Wed, Mar 02, 2016 at 09:52:23AM -0600, Aravind Gopalakrishnan wrote:
> So, I think we should continue this approach and have something like this-
> static const char * const amd_core_mcablock_names[] = {
>         [SMCA_LS]         = "load_store",
>         [SMCA_IF]         = "insn_fetch",
>         [SMCA_L2_CACHE]   = "l2_cache",
>         [SMCA_DE]         = "decode_unit",
>         [RES]                   = "",
>         [SMCA_EX]         = "execution_unit",
>         [SMCA_FP]         = "floating_point",
>         [SMCA_L3_CACHE]   = "l3_cache",
> };
> 
> static const char * const amd_df_mcablock_names[] = {
>         [SMCA_CS]  = "coherent_slave",
>         [SMCA_PIE] = "pie",
> };
> 
> (Split arrays again because I feel it'd be better to have arrays ordered
> according to mca_type values)

Ok, care to take the patch and redo it as you suggest?

I really don't want to be assigning strings each time during decoding.

Also, make sure the strings are as human readable as possible and so
that users can at least have an idea what we're saying. "load_store"
is better than "LS", "insn_fetch" is better than "IF", etc. Some
abbreviations should remain, though. "platform_security_processor" is
yucky and I guess there we can stick to "PSP". Ditto for "SMU"...

Making the unabbreviated lowercase for sysfs usage is fine too, of
course.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 16:21         ` Borislav Petkov
@ 2016-03-02 16:27           ` Aravind Gopalakrishnan
  2016-03-02 16:38             ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-03-02 16:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On 3/2/2016 10:21 AM, Borislav Petkov wrote:
> On Wed, Mar 02, 2016 at 09:52:23AM -0600, Aravind Gopalakrishnan wrote:
>> So, I think we should continue this approach and have something like this-
>> static const char * const amd_core_mcablock_names[] = {
>>          [SMCA_LS]         = "load_store",
>>          [SMCA_IF]         = "insn_fetch",
>>          [SMCA_L2_CACHE]   = "l2_cache",
>>          [SMCA_DE]         = "decode_unit",
>>          [RES]                   = "",
>>          [SMCA_EX]         = "execution_unit",
>>          [SMCA_FP]         = "floating_point",
>>          [SMCA_L3_CACHE]   = "l3_cache",
>> };
>>
>> static const char * const amd_df_mcablock_names[] = {
>>          [SMCA_CS]  = "coherent_slave",
>>          [SMCA_PIE] = "pie",
>> };
>>
>> (Split arrays again because I feel it'd be better to have arrays ordered
>> according to mca_type values)
> Ok, care to take the patch and redo it as you suggest?

Sure. I was going to introduce these strings as part of patch to update 
sysfs code to
understand the new banks anyway. So it's already in the works:)

> I really don't want to be assigning strings each time during decoding.

Ok, Will update the EDAC to use the existing string array.

> Also, make sure the strings are as human readable as possible and so
> that users can at least have an idea what we're saying. "load_store"
> is better than "LS", "insn_fetch" is better than "IF", etc. Some
> abbreviations should remain, though. "platform_security_processor" is
> yucky and I guess there we can stick to "PSP". Ditto for "SMU"...

Understood. Will do as you suggest.

> Making the unabbreviated lowercase for sysfs usage is fine too, of
> course.
>
>

So, have you pushed the set of patches you applied somewhere? (bp.git?)
I can work on top of those and it will be easier to rebase on top of tip.git
once the patches find their way there..

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 16:27           ` Aravind Gopalakrishnan
@ 2016-03-02 16:38             ` Borislav Petkov
  2016-03-02 16:54               ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2016-03-02 16:38 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On Wed, Mar 02, 2016 at 10:27:57AM -0600, Aravind Gopalakrishnan wrote:
> So, have you pushed the set of patches you applied somewhere? (bp.git?)
> I can work on top of those and it will be easier to rebase on top of tip.git
> once the patches find their way there..

No.

But you can take the three here, merge them again into a single patch
and do the changes ontot.

I made them into three to show you more easily what should be changed.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding
  2016-03-02 16:38             ` Borislav Petkov
@ 2016-03-02 16:54               ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Aravind Gopalakrishnan @ 2016-03-02 16:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, hpa, mingo, tglx, dougthompson, mchehab, x86,
	linux-edac, linux-kernel, ashok.raj, gong.chen, len.brown,
	peterz, ak, alexander.shishkin

On 3/2/2016 10:38 AM, Borislav Petkov wrote:
> But you can take the three here, merge them again into a single patch
> and do the changes ontot.
>
> I made them into three to show you more easily what should be changed.
>
>


Ok, I'll just spin a V3 of the entire patchset with all your suggested 
changes then..

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:ras/core] x86/mce: Move MCx_CONFIG MSR definitions
  2016-02-29 22:32 ` [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition Aravind Gopalakrishnan
@ 2016-03-08 13:12   ` tip-bot for Aravind Gopalakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Aravind Gopalakrishnan @ 2016-03-08 13:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tony.luck, bp, mingo, tglx, peterz, Aravind.Gopalakrishnan,
	linux-edac, bp, linux-kernel, torvalds

Commit-ID:  adc53f2e0ae2fcff10a4b981df14729ffb1482fc
Gitweb:     http://git.kernel.org/tip/adc53f2e0ae2fcff10a4b981df14729ffb1482fc
Author:     Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:17 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:14 +0100

x86/mce: Move MCx_CONFIG MSR definitions

Those MSRs are used only by the MCE code so move them there.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1456785179-14378-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mce.h       | 4 ++++
 arch/x86/include/asm/msr-index.h | 4 ----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2ea4527..80ba0a8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,6 +91,10 @@
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE	"MACHINECHECK"
 
+/* AMD Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5523465..b05402e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -264,10 +264,6 @@
 #define MSR_IA32_MC0_CTL2		0x00000280
 #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
 
-/* 'SMCA': AMD64 Scalable MCA */
-#define MSR_AMD64_SMCA_MC0_CONFIG	0xc0002004
-#define MSR_AMD64_SMCA_MCx_CONFIG(x)	(MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
-
 #define MSR_P6_PERFCTR0			0x000000c1
 #define MSR_P6_PERFCTR1			0x000000c2
 #define MSR_P6_EVNTSEL0			0x00000186

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-03-08 13:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-29 22:32 [PATCH V2 0/5] Updates to EDAC and AMD MCE driver Aravind Gopalakrishnan
2016-02-29 22:32 ` [PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition Aravind Gopalakrishnan
2016-03-08 13:12   ` [tip:ras/core] x86/mce: Move MCx_CONFIG MSR definitions tip-bot for Aravind Gopalakrishnan
2016-02-29 22:32 ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
2016-03-02 10:50   ` Borislav Petkov
2016-03-02 10:52     ` [PATCH 1/3] x86/mce/AMD, EDAC: " Borislav Petkov
2016-03-02 10:53     ` [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding Borislav Petkov
2016-03-02 15:52       ` Aravind Gopalakrishnan
2016-03-02 16:21         ` Borislav Petkov
2016-03-02 16:27           ` Aravind Gopalakrishnan
2016-03-02 16:38             ` Borislav Petkov
2016-03-02 16:54               ` Aravind Gopalakrishnan
2016-03-02 10:54     ` [PATCH 3/3] EDAC, mce_amd: Correct error paths Borislav Petkov
2016-03-02 15:56       ` Aravind Gopalakrishnan
2016-03-02 15:57     ` [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors Aravind Gopalakrishnan
2016-02-29 22:32 ` [PATCH V2 3/5] x86/mce/AMD: Fix logic to obtain block address Aravind Gopalakrishnan
2016-02-29 22:32 ` [PATCH V2 4/5] x86/mce: Clarify comments regarding deferred error Aravind Gopalakrishnan
2016-02-29 22:32 ` [PATCH V2 5/5] x86/mce/AMD: Add comments for easier understanding Aravind Gopalakrishnan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.