All of lore.kernel.org
 help / color / mirror / Atom feed
* [v2,1/2] EDAC, sb_edac: Early return on ADDRV bit and address type test
@ 2018-09-07 14:57 Qiuxu Zhuo
  0 siblings, 0 replies; only message in thread
From: Qiuxu Zhuo @ 2018-09-07 14:57 UTC (permalink / raw)
  To: bp; +Cc: tony.luck, mchehab, arozansk, patrickg, linux-edac, Qiuxu Zhuo

The sb_edac driver decodes error address without checking whether
the address is the right type that it can decode. The decoded result
is meaningless/wrong, if the error address is not a physical address.

Tested on a SandyBridge (BIOS SE5C600.86B.99.99.x069.071520130923)
machine with kernel configuration '*_APEI_GHES=y'. Found that if the
"mce" structure was passed from GHES, the bit 59(MISCV) of "mce->status"
was always zero which meant "mce->misc" was invalid, therefor we can't
get the error address type from "mce->misc". In this case the driver
shouldn't do decoding for the error address, but it still gave out
the decoded result.

Test steps:
- sudo modprobe einj
- sudo echo 0x8 > /sys/kernel/debug/apei/einj/error_type
- sudo echo 1 > /sys/kernel/debug/apei/einj/error_inject

Test log:
   EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
   EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
   EDAC sbridge MC0: TSC 0
   EDAC sbridge MC0: ADDR bb672400
   EDAC sbridge MC0: MISC 0
   ...
   EDAC MC0: 0 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 ...
   ...

So decide early if we have all the information to decode by adding
a test that the address is the right type (looking as MCi_MISC bits{8:6}),
and moving the test for bit 58(ADDRV) into sbridge_mce_check_error().
Also a clean up: get rid of the duplicated test whether it's a memory
controller error in sbridge_mce_output_error()

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
v1->v2:
0001: - Update commit message, no function change
0002: - Update commit message
      - Add comments for get_ha()
      - Add function pointer check for get_ha() before using it.

 drivers/edac/sb_edac.c | 68 ++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 4a89c8093307..2eaa745d056a 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2904,35 +2904,27 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	 *	cccc = channel
 	 * If the mask doesn't match, report an error to the parsing logic
 	 */
-	if (! ((errcode & 0xef80) == 0x80)) {
-		optype = "Can't parse: it is not a mem";
-	} else {
-		switch (optypenum) {
-		case 0:
-			optype = "generic undef request error";
-			break;
-		case 1:
-			optype = "memory read error";
-			break;
-		case 2:
-			optype = "memory write error";
-			break;
-		case 3:
-			optype = "addr/cmd error";
-			break;
-		case 4:
-			optype = "memory scrubbing error";
-			break;
-		default:
-			optype = "reserved";
-			break;
-		}
+	switch (optypenum) {
+	case 0:
+		optype = "generic undef request error";
+		break;
+	case 1:
+		optype = "memory read error";
+		break;
+	case 2:
+		optype = "memory write error";
+		break;
+	case 3:
+		optype = "addr/cmd error";
+		break;
+	case 4:
+		optype = "memory scrubbing error";
+		break;
+	default:
+		optype = "reserved";
+		break;
 	}
 
-	/* Only decode errors with an valid address (ADDRV) */
-	if (!GET_BITFIELD(m->status, 58, 58))
-		return;
-
 	if (pvt->info.type == KNIGHTS_LANDING) {
 		if (channel == 14) {
 			edac_dbg(0, "%s%s err_code:%04x:%04x EDRAM bank %d\n",
@@ -3038,17 +3030,11 @@ static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val,
 {
 	struct mce *mce = (struct mce *)data;
 	struct mem_ctl_info *mci;
-	struct sbridge_pvt *pvt;
 	char *type;
 
 	if (edac_get_report_status() == EDAC_REPORTING_DISABLED)
 		return NOTIFY_DONE;
 
-	mci = get_mci_for_node_id(mce->socketid, IMC0);
-	if (!mci)
-		return NOTIFY_DONE;
-	pvt = mci->pvt_info;
-
 	/*
 	 * Just let mcelog handle it if the error is
 	 * outside the memory controller. A memory error
@@ -3058,6 +3044,22 @@ static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val,
 	if ((mce->status & 0xefff) >> 7 != 1)
 		return NOTIFY_DONE;
 
+	/* Check ADDRV bit in STATUS */
+	if (!GET_BITFIELD(mce->status, 58, 58))
+		return NOTIFY_DONE;
+
+	/* Check MISCV bit in STATUS */
+	if (!GET_BITFIELD(mce->status, 59, 59))
+		return NOTIFY_DONE;
+
+	/* Check address type in MISC (physical address only) */
+	if (GET_BITFIELD(mce->misc, 6, 8) != 2)
+		return NOTIFY_DONE;
+
+	mci = get_mci_for_node_id(mce->socketid, IMC0);
+	if (!mci)
+		return NOTIFY_DONE;
+
 	if (mce->mcgstatus & MCG_STATUS_MCIP)
 		type = "Exception";
 	else

^ permalink raw reply related	[flat|nested] only message in thread

only message in thread, other threads:[~2018-09-07 14:57 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-07 14:57 [v2,1/2] EDAC, sb_edac: Early return on ADDRV bit and address type test Qiuxu Zhuo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.