[RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs

* [RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs
@ 2012-03-06 13:31 Borislav Petkov
  2012-03-06 13:31 ` [PATCH 1/3] mce: Add a msg string to the MCE tracepoint Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-03-06 13:31 UTC
  To: Tony Luck; +Cc: Ingo Molnar, EDAC devel, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

Third version of the patchset, with the latest addition of
"hijacking" EDAC printk output when a RAS agent is running, i.e., when
/sys/devices/system/ras/agent is 1. Btw, I'm open to better suggestions
on how to track whether a RAS agent is running. Currently, it is a bool
which is visible through sysfs and which userspace can turn on and off.

Changelog:

* V2:

Here's a second version of the patchset with the buffer enlarging ripped
out. 1/4 in the series could go in independently since it is a cleanup,
I'll add it to a for-next testing branch if there are no objections.

* V1:

This is an initial, more or less serious attempt to collect decoded
MCE info into a buffer and jettison it into userspace using the MCE
tracepoint trace_mce_record(). This initial approach needs userspace to
do

$ echo 1 > /sys/devices/system/ras/agent

and decoded MCE info gets collected into a buffer. Then, when decoding
is finished, the tracepoint is called and the MCE info, along with the
decoded information, lands in the ring buffer where userspace consumers
can pick it up.
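
For illustration, here is a minimal userspace sketch of how an agent might
flip that switch from C instead of the shell. The sysfs path is the one
named above and only exists with this patchset applied; the helper names
ras_agent_set() and ras_agent_get() are hypothetical and not part of the
series.

```c
#include <stdio.h>

/* Path from the cover letter; present only with this patchset applied. */
#define RAS_AGENT_PATH "/sys/devices/system/ras/agent"

/*
 * Hypothetical helper: write "1" or "0" to the agent file at @path.
 * Returns 0 on success, -1 on failure.
 */
int ras_agent_set(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read the flag back.
 * Returns 0 or 1, or -1 if the file cannot be read or parsed.
 */
int ras_agent_get(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	else
		val = !!val;

	fclose(f);
	return val;
}
```

An agent would presumably call ras_agent_set(RAS_AGENT_PATH, 1) at startup
and ras_agent_set(RAS_AGENT_PATH, 0) on exit, so that dmesg output resumes
once nobody is consuming the tracepoint.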

Also, the commit messages of the individual patches contain additional info.

For example, the data looks like this:

mcegen.py-2318  [001] .N..   580.902409: mce_record: [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|-|PCC|AddrV|CECC]: 0xd604c00006080a41 MC4_ADDR: 0x0000000000000016
[Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
[Hardware Error]: ERR_ADDR: 0x16 row: 0, channel: 0
[Hardware Error]: cache level: L1, mem/io: MEM, mem-tx: DWR, part-proc: RES (no timeout)
[Hardware Error]: CPU: 0, MCGc/s: 0/0, MC4: d604c00006080a41, ADDR/MISC: 0000000000000016/dead57ac1ba0babe, RIP: 00:<0000000000000000>, TSC: 0, TIME: 0)

       mcegen.py-2326  [001] .N..   598.795494: mce_record: [Hardware Error]: CPU:0 MC4_STATUS[Over|UE|MiscV|PCC|-|UECC]: 0xfa002000001c011b
[Hardware Error]: Northbridge Error (node 0): L3 ECC data cache error.
[Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
[Hardware Error]: CPU: 0, MCGc/s: 0/0, MC4: fa002000001c011b, ADDR/MISC: 0000000000000016/dead57ac1ba0babe, RIP: 00:<0000000000000000>, TSC: 0, TIME: 0)

mcegen.py-2343  [013] .N..   619.620698: mce_record: [Hardware Error]: CPU:0 MC4_STATUS[-|UE|MiscV|PCC|-|UECC]: 0xba002100000f001b
[Hardware Error]: Northbridge Error (node 0): GART Table Walk data error.
[Hardware Error]: cache level: L3/GEN, tx: GEN
[Hardware Error]: CPU: 0, MCGc/s: 0/0, MC4: ba002100000f001b, ADDR/MISC: 0000000000000016/dead57ac1ba0babe, RIP: 00:<0000000000000000>, TSC: 0, TIME: 0)
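
To make the bracketed flags in the MC4_STATUS lines above easier to read,
here is a small userspace sketch that rebuilds them from the raw status
value. The Over/UC/MiscV/AddrV/PCC bit positions are the architectural
MCi_STATUS ones; treating bits 46:45 as the CECC/UECC pair and printing
"-" when neither is set are assumptions of this sketch, and
mce_status_flags() is a hypothetical name, not a function from the series.

```c
#include <stdint.h>
#include <stdio.h>

#define BIT_64(n)		(UINT64_C(1) << (n))

/* Architectural MCi_STATUS bits */
#define MCI_STATUS_OVER		BIT_64(62)
#define MCI_STATUS_UC		BIT_64(61)
#define MCI_STATUS_MISCV	BIT_64(59)
#define MCI_STATUS_ADDRV	BIT_64(58)
#define MCI_STATUS_PCC		BIT_64(57)

/*
 * Hypothetical helper: format the flag list in the same order as the
 * sample output, i.e. Over|UE-or-CE|MiscV|PCC|AddrV|ECC.
 * Returns what snprintf() returns.
 */
int mce_status_flags(uint64_t status, char *buf, size_t len)
{
	/* assumption: bits 46:45 encode corrected/uncorrected ECC */
	unsigned int ecc = (status >> 45) & 0x3;
	const char *ecc_s = "-";

	if (ecc == 2)
		ecc_s = "CECC";
	else if (ecc)
		ecc_s = "UECC";

	return snprintf(buf, len, "%s|%s|%s|%s|%s|%s",
			(status & MCI_STATUS_OVER)  ? "Over"  : "-",
			(status & MCI_STATUS_UC)    ? "UE"    : "CE",
			(status & MCI_STATUS_MISCV) ? "MiscV" : "-",
			(status & MCI_STATUS_PCC)   ? "PCC"   : "-",
			(status & MCI_STATUS_ADDRV) ? "AddrV" : "-",
			ecc_s);
}
```

Feeding it the three status values from the samples (0xd604c00006080a41,
0xfa002000001c011b, 0xba002100000f001b) should give back the flag lists
shown in the corresponding MC4_STATUS brackets.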

As always, reviews and comments are welcome.

Thanks.

* [PATCH 1/3] mce: Add a msg string to the MCE tracepoint
  2012-03-06 13:31 [RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Borislav Petkov
@ 2012-03-06 13:31 ` Borislav Petkov
  2012-03-06 13:31 ` [PATCH 2/3] x86, RAS: Add a decoded msg buffer Borislav Petkov
  2012-03-06 13:31 ` [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer Borislav Petkov
  2 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-03-06 13:31 UTC
  To: Tony Luck; +Cc: Ingo Molnar, EDAC devel, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

The idea here is to pass an additional decoded MCE message through
the tracepoint and into the ring buffer for userspace to consume. The
designated consumers are RAS daemons and other tools collecting RAS
information.

Drop unneeded fields while at it, thus saving some room in the ring
buffer.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 +-
 include/trace/events/mce.h       |   19 +++++++------------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5a11ae2e9e91..072e020ecaf3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -144,7 +144,7 @@ void mce_log(struct mce *mce)
 	int ret = 0;
 
 	/* Emit the trace record: */
-	trace_mce_record(mce);
+	trace_mce_record("", mce);
 
 	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 	if (ret == NOTIFY_STOP)
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 4cbbcef6baa8..6fb5c2e0c5e1 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -10,11 +10,12 @@
 
 TRACE_EVENT(mce_record,
 
-	TP_PROTO(struct mce *m),
+	TP_PROTO(const char *msg, struct mce *m),
 
-	TP_ARGS(m),
+	TP_ARGS(msg, m),
 
 	TP_STRUCT__entry(
+		__string(	msg,		msg		)
 		__field(	u64,		mcgcap		)
 		__field(	u64,		mcgstatus	)
 		__field(	u64,		status		)
@@ -24,15 +25,13 @@ TRACE_EVENT(mce_record,
 		__field(	u64,		tsc		)
 		__field(	u64,		walltime	)
 		__field(	u32,		cpu		)
-		__field(	u32,		cpuid		)
-		__field(	u32,		apicid		)
 		__field(	u32,		socketid	)
 		__field(	u8,		cs		)
 		__field(	u8,		bank		)
-		__field(	u8,		cpuvendor	)
 	),
 
 	TP_fast_assign(
+		__assign_str(msg,	msg);
 		__entry->mcgcap		= m->mcgcap;
 		__entry->mcgstatus	= m->mcgstatus;
 		__entry->status		= m->status;
@@ -42,25 +41,21 @@ TRACE_EVENT(mce_record,
 		__entry->tsc		= m->tsc;
 		__entry->walltime	= m->time;
 		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
 		__entry->socketid	= m->socketid;
 		__entry->cs		= m->cs;
 		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+	TP_printk("%s" HW_ERR "(CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, TIME: %llu, SOCKET: %u)",
+		__get_str(msg),
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
 		__entry->addr, __entry->misc,
 		__entry->cs, __entry->ip,
 		__entry->tsc,
-		__entry->cpuvendor, __entry->cpuid,
 		__entry->walltime,
-		__entry->socketid,
-		__entry->apicid)
+		__entry->socketid)
 );
 
 #endif /* _TRACE_MCE_H */
-- 
1.7.8.rc0


* [PATCH 2/3] x86, RAS: Add a decoded msg buffer
  2012-03-06 13:31 [RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Borislav Petkov
  2012-03-06 13:31 ` [PATCH 1/3] mce: Add a msg string to the MCE tracepoint Borislav Petkov
@ 2012-03-06 13:31 ` Borislav Petkov
  2012-03-06 13:31 ` [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer Borislav Petkov
  2 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-03-06 13:31 UTC
  To: Tony Luck; +Cc: Ingo Molnar, EDAC devel, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

Echoing 1 into /sys/devices/system/ras/agent causes the ras_printk()
function to buffer a string describing a hardware error. This is meant
for userspace daemons which are running on the system and are going to
consume decoded information through the MCE tracepoint.
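
As a rough illustration of the buffering behaviour described here, the
following userspace model accumulates message fragments and hands out the
whole string in one go. It only covers the "agent is running" path, with
no printk fallback. The model_* names are hypothetical, the HW_ERR value
is assumed to match the kernel's "[Hardware Error]: " prefix, and the
bounds check on the prefix copy is an addition of this sketch rather than
something taken from the patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define ERR_BUF_SZ	500
#define HW_ERR		"[Hardware Error]: "	/* assumed prefix */
#define PR_CONT		(1UL << 8)

static char err_str[ERR_BUF_SZ];
static size_t dec_len;

/*
 * Append a formatted fragment to the buffer. A fragment without
 * PR_CONT starts a new line and gets the HW_ERR prefix first.
 * Returns 0 on success, -1 if the buffer filled up and the text
 * was truncated.
 */
int model_ras_printk(unsigned long flags, const char *fmt, ...)
{
	va_list args;
	size_t left;
	int n;

	if (!(flags & PR_CONT)) {
		left = ERR_BUF_SZ - dec_len;
		n = snprintf(err_str + dec_len, left, "%s", HW_ERR);
		if (n < 0 || (size_t)n >= left) {
			dec_len = ERR_BUF_SZ - 1;
			return -1;
		}
		dec_len += n;
	}

	left = ERR_BUF_SZ - dec_len;

	va_start(args, fmt);
	n = vsnprintf(err_str + dec_len, left, fmt, args);
	va_end(args);

	if (n < 0 || (size_t)n >= left) {
		/* full; vsnprintf() already NUL-terminated the buffer */
		dec_len = ERR_BUF_SZ - 1;
		return -1;
	}

	dec_len += n;
	return 0;
}

/* Getting the string implies the buffer is considered empty afterwards. */
const char *model_ras_get_decoded_err(void)
{
	dec_len = 0;
	return err_str;
}
```

A decoder would call model_ras_printk(0, ...) to open a line, then
model_ras_printk(PR_CONT, ...) for each continuation, and finally pass the
result of model_ras_get_decoded_err() to the tracepoint.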

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/Kconfig           |    9 +++
 arch/x86/Makefile          |    3 +
 arch/x86/include/asm/ras.h |   15 +++++
 arch/x86/ras/Makefile      |    1 +
 arch/x86/ras/ras.c         |  145 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/ras.h
 create mode 100644 arch/x86/ras/Makefile
 create mode 100644 arch/x86/ras/ras.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bed94e189fa..bda1480241b2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -657,6 +657,15 @@ config X86_CYCLONE_TIMER
 	def_bool y
 	depends on X86_SUMMIT
 
+config X86_RAS
+	def_bool y
+	prompt "X86 RAS features"
+	---help---
+	A collection of Reliability, Availability and Serviceability
+	software features which aim to enable hardware error logging
+	and reporting. Leave it at 'y' unless you really know what
+	you're doing.
+
 source "arch/x86/Kconfig.cpu"
 
 config HPET_TIMER
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 209ba1294592..a6b6bb1f308b 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -146,6 +146,9 @@ drivers-$(CONFIG_OPROFILE) += arch/x86/oprofile/
 # suspend and hibernation support
 drivers-$(CONFIG_PM) += arch/x86/power/
 
+# RAS support
+core-y += arch/x86/ras/
+
 drivers-$(CONFIG_FB) += arch/x86/video/
 
 ####
diff --git a/arch/x86/include/asm/ras.h b/arch/x86/include/asm/ras.h
new file mode 100644
index 000000000000..b51838514259
--- /dev/null
+++ b/arch/x86/include/asm/ras.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_RAS_H
+#define _ASM_X86_RAS_H
+
+#define ERR_BUF_SZ 500
+
+extern bool ras_agent;
+
+#define PR_EMERG	BIT(0)
+#define PR_WARNING	BIT(4)
+#define PR_CONT		BIT(8)
+
+extern const char *ras_get_decoded_err(void);
+extern void ras_printk(unsigned long flags, const char *fmt, ...);
+
+#endif /* _ASM_X86_RAS_H */
diff --git a/arch/x86/ras/Makefile b/arch/x86/ras/Makefile
new file mode 100644
index 000000000000..7a70bb5cd057
--- /dev/null
+++ b/arch/x86/ras/Makefile
@@ -0,0 +1 @@
+obj-y		:= ras.o
diff --git a/arch/x86/ras/ras.c b/arch/x86/ras/ras.c
new file mode 100644
index 000000000000..868d732c6cd4
--- /dev/null
+++ b/arch/x86/ras/ras.c
@@ -0,0 +1,145 @@
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/ras.h>
+
+static size_t dec_len;
+static char *err_str;
+
+/*
+ * If true, userspace has an agent running and eating all the
+ * tracing data we're sending out so there's no dmesg output
+ */
+bool ras_agent;
+EXPORT_SYMBOL_GPL(ras_agent);
+
+/* getting the string implies the current buffer is emptied */
+const char *ras_get_decoded_err(void)
+{
+	dec_len = 0;
+	return err_str;
+}
+
+void ras_printk(unsigned long flags, const char *fmt, ...)
+{
+	va_list args;
+	char *buf;
+	size_t left;
+	int i;
+
+	/* add a HW_ERR prefix to a newly started line */
+	if (!(flags & PR_CONT)) {
+		strcpy(err_str + dec_len, HW_ERR);
+		dec_len += strlen(HW_ERR);
+	}
+
+	left = ERR_BUF_SZ - dec_len - 1;
+	buf = err_str + dec_len;
+
+	va_start(args, fmt);
+	i = vsnprintf(buf, left, fmt, args);
+	va_end(args);
+
+	if (i >= left) {
+		pr_err("Error decode buffer truncated.\n");
+		dec_len = ERR_BUF_SZ-1;
+		err_str[dec_len] = '\n';
+	} else
+		dec_len += i;
+
+	if (!ras_agent) {
+		if (flags & PR_EMERG)
+			pr_emerg("%s", buf);
+		else if (flags & PR_WARNING)
+			pr_warning("%s", buf);
+		else if (flags & PR_CONT)
+			pr_cont("%s", buf);
+
+		dec_len = 0;
+	}
+}
+EXPORT_SYMBOL_GPL(ras_printk);
+
+struct bus_type ras_subsys = {
+	.name	  = "ras",
+	.dev_name = "ras",
+};
+
+struct ras_attr {
+	struct attribute attr;
+	ssize_t (*show) (struct kobject *kobj, struct ras_attr *attr, char *bf);
+	ssize_t (*store)(struct kobject *kobj, struct ras_attr *attr,
+			 const char *buf, size_t count);
+};
+
+#define RAS_ATTR(_name, _mode, _show, _store)	\
+static struct ras_attr ras_attr_##_name = __ATTR(_name, _mode, _show, _store)
+
+static ssize_t ras_agent_show(struct kobject *kobj,
+			      struct ras_attr *attr,
+			      char *buf)
+{
+	return sprintf(buf, "%.1d\n", ras_agent);
+}
+
+static ssize_t ras_agent_store(struct kobject *kobj,
+			       struct ras_attr *attr,
+			       const char *buf, size_t count)
+{
+	int ret = 0;
+	unsigned long value;
+
+	ret = kstrtoul(buf, 10, &value);
+	if (ret < 0) {
+		printk(KERN_ERR "Wrong value for ras_agent field.\n");
+		return ret;
+	}
+
+	ras_agent = !!value;
+
+	return count;
+}
+
+RAS_ATTR(agent, 0644, ras_agent_show, ras_agent_store);
+
+static struct attribute *ras_root_attrs[] = {
+	&ras_attr_agent.attr,
+	NULL
+};
+
+static const struct attribute_group ras_root_attr_group = {
+	.attrs = ras_root_attrs,
+};
+
+static const struct attribute_group *ras_root_attr_groups[] = {
+	&ras_root_attr_group,
+	NULL,
+};
+
+static int __init ras_init(void)
+{
+	int err = 0;
+
+	err = subsys_system_register(&ras_subsys, ras_root_attr_groups);
+	if (err) {
+		printk(KERN_ERR "Error registering toplevel RAS sysfs node.\n");
+		return err;
+	}
+
+	/* no freeing of this since ras.c is compiled-in only */
+	err_str = kzalloc(ERR_BUF_SZ, GFP_KERNEL);
+	if (!err_str) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	return 0;
+
+err_alloc:
+	bus_unregister(&ras_subsys);
+
+	return err;
+}
+subsys_initcall(ras_init);
-- 
1.7.8.rc0


* [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-06 13:31 [RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Borislav Petkov
  2012-03-06 13:31 ` [PATCH 1/3] mce: Add a msg string to the MCE tracepoint Borislav Petkov
  2012-03-06 13:31 ` [PATCH 2/3] x86, RAS: Add a decoded msg buffer Borislav Petkov
@ 2012-03-06 13:31 ` Borislav Petkov
  2012-03-06 15:42   ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-03-06 13:31 UTC
  To: Tony Luck; +Cc: Ingo Molnar, EDAC devel, LKML, Borislav Petkov

From: Borislav Petkov <borislav.petkov@amd.com>

This is an initial version of the patch which converts MCE decoding
facilities to use the RAS printk buffer. When there's no userspace agent
running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
default printk'ing into dmesg which is what we've been doing so far.
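
One detail worth spelling out is how the converted edac_mc_printk() turns
a printk level into a ras_printk() flag: it takes the digit at index 1 of
the level string and uses it as a bit number. The userspace sketch below
mirrors that mapping. It assumes level strings of the form "<N>", as in
kernels of this era, and level_to_pr_flag() is a hypothetical name used
only for illustration.

```c
#define BIT(n)		(1UL << (n))

/* Flags as defined in asm/ras.h by patch 2/3 */
#define PR_EMERG	BIT(0)
#define PR_WARNING	BIT(4)

/* Assumed printk level strings: "<0>" ... "<7>" */
#define KERN_EMERG	"<0>"
#define KERN_ERR	"<3>"
#define KERN_WARNING	"<4>"

/*
 * Hypothetical helper: same arithmetic as the macro, i.e. the digit
 * in the level string selects the bit.
 */
unsigned long level_to_pr_flag(const char *level)
{
	return BIT((unsigned int)(level[1] - '0'));
}
```

With this mapping, KERN_EMERG lands on PR_EMERG and KERN_WARNING on
PR_WARNING, which is why those flags sit at bits 0 and 4. Any other level,
KERN_ERR for example, yields a bit that has no PR_* name in asm/ras.h.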

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 drivers/edac/amd64_edac.c |    3 +-
 drivers/edac/edac_core.h  |   13 +++-
 drivers/edac/edac_mc.c    |   23 +++--
 drivers/edac/mce_amd.c    |  217 ++++++++++++++++++++++++---------------------
 4 files changed, 142 insertions(+), 114 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c9eee6d33e9a..1d8feadb3610 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1,6 +1,7 @@
-#include "amd64_edac.h"
 #include <asm/amd_nb.h>
+#include <asm/ras.h>
 
+#include "amd64_edac.h"
 static struct edac_pci_ctl_info *amd64_ctl_pci;
 
 static int report_gart_errors;
diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index e48ab3108ad8..c06c82046e83 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -49,8 +49,17 @@
 #define edac_printk(level, prefix, fmt, arg...) \
 	printk(level "EDAC " prefix ": " fmt, ##arg)
 
-#define edac_mc_printk(mci, level, fmt, arg...) \
-	printk(level "EDAC MC%d: " fmt, mci->mc_idx, ##arg)
+#define edac_mc_printk(mci, level, fmt, arg...)					\
+({										\
+	if (ras_agent) {							\
+		unsigned pr_lvl = BIT((unsigned)(level[1] - '0'));		\
+										\
+		ras_printk(pr_lvl, HW_ERR "EDAC MC%d: " fmt,			\
+			   mci->mc_idx, ##arg);					\
+	}									\
+	else									\
+		printk(level "EDAC MC%d: " fmt, mci->mc_idx, ##arg);		\
+})
 
 #define edac_mc_chipset_printk(mci, level, prefix, fmt, arg...) \
 	printk(level "EDAC " prefix " MC%d: " fmt, mci->mc_idx, ##arg)
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index ca6c04d350ee..e7091dbb516f 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -30,8 +30,10 @@
 #include <asm/uaccess.h>
 #include <asm/page.h>
 #include <asm/edac.h>
+#include <asm/ras.h>
 #include "edac_core.h"
 #include "edac_module.h"
+#include "mce_amd.h"
 
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
@@ -704,11 +706,14 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
 		edac_mc_printk(mci, KERN_WARNING,
-			"CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
-			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
-			page_frame_number, offset_in_page,
-			mci->csrows[row].grain, syndrome, row, channel,
-			mci->csrows[row].channels[channel].label, msg);
+				"CE page 0x%lx, offset 0x%lx, grain %d,"
+				" syndrome 0x%lx, row %d, channel %d,"
+				" label \"%s\": %s\n",
+				page_frame_number, offset_in_page,
+				mci->csrows[row].grain, syndrome,
+				row, channel,
+				mci->csrows[row].channels[channel].label,
+				msg);
 
 	mci->ce_count++;
 	mci->csrows[row].ce_count++;
@@ -782,10 +787,10 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 
 	if (edac_mc_get_log_ue())
 		edac_mc_printk(mci, KERN_EMERG,
-			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
-			"labels \"%s\": %s\n", page_frame_number,
-			offset_in_page, mci->csrows[row].grain, row,
-			labels, msg);
+			       "UE page 0x%lx, offset 0x%lx, grain %d,"
+			       " row %d, labels \"%s\": %s\n",
+			       page_frame_number, offset_in_page,
+			       mci->csrows[row].grain, row, labels, msg);
 
 	if (edac_mc_get_panic_on_ue())
 		panic("EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, "
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index bd926ea2e00c..e347d3680e13 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1,5 +1,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <trace/events/mce.h>
+#include <asm/ras.h>
 
 #include "mce_amd.h"
 
@@ -137,9 +139,9 @@ static bool f12h_dc_mce(u16 ec, u8 xec)
 		ret = true;
 
 		if (ll == LL_L2)
-			pr_cont("during L1 linefill from L2.\n");
+			ras_printk(PR_CONT, "during L1 linefill from L2.\n");
 		else if (ll == LL_L1)
-			pr_cont("Data/Tag %s error.\n", R4_MSG(ec));
+			ras_printk(PR_CONT, "Data/Tag %s error.\n", R4_MSG(ec));
 		else
 			ret = false;
 	}
@@ -149,7 +151,7 @@ static bool f12h_dc_mce(u16 ec, u8 xec)
 static bool f10h_dc_mce(u16 ec, u8 xec)
 {
 	if (R4(ec) == R4_GEN && LL(ec) == LL_L1) {
-		pr_cont("during data scrub.\n");
+		ras_printk(PR_CONT, "during data scrub.\n");
 		return true;
 	}
 	return f12h_dc_mce(ec, xec);
@@ -158,7 +160,7 @@ static bool f10h_dc_mce(u16 ec, u8 xec)
 static bool k8_dc_mce(u16 ec, u8 xec)
 {
 	if (BUS_ERROR(ec)) {
-		pr_cont("during system linefill.\n");
+		ras_printk(PR_CONT, "during system linefill.\n");
 		return true;
 	}
 
@@ -178,14 +180,14 @@ static bool f14h_dc_mce(u16 ec, u8 xec)
 		switch (r4) {
 		case R4_DRD:
 		case R4_DWR:
-			pr_cont("Data/Tag parity error due to %s.\n",
+			ras_printk(PR_CONT, "Data/Tag parity error due to %s.\n",
 				(r4 == R4_DRD ? "load/hw prf" : "store"));
 			break;
 		case R4_EVICT:
-			pr_cont("Copyback parity error on a tag miss.\n");
+			ras_printk(PR_CONT, "Copyback parity error on a tag miss.\n");
 			break;
 		case R4_SNOOP:
-			pr_cont("Tag parity error during snoop.\n");
+			ras_printk(PR_CONT, "Tag parity error during snoop.\n");
 			break;
 		default:
 			ret = false;
@@ -195,17 +197,17 @@ static bool f14h_dc_mce(u16 ec, u8 xec)
 		if ((II(ec) != II_MEM && II(ec) != II_IO) || LL(ec) != LL_LG)
 			return false;
 
-		pr_cont("System read data error on a ");
+		ras_printk(PR_CONT, "System read data error on a ");
 
 		switch (r4) {
 		case R4_RD:
-			pr_cont("TLB reload.\n");
+			ras_printk(PR_CONT, "TLB reload.\n");
 			break;
 		case R4_DWR:
-			pr_cont("store.\n");
+			ras_printk(PR_CONT, "store.\n");
 			break;
 		case R4_DRD:
-			pr_cont("load.\n");
+			ras_printk(PR_CONT, "load.\n");
 			break;
 		default:
 			ret = false;
@@ -225,28 +227,29 @@ static bool f15h_dc_mce(u16 ec, u8 xec)
 
 		switch (xec) {
 		case 0x0:
-			pr_cont("Data Array access error.\n");
+			ras_printk(PR_CONT, "Data Array access error.\n");
 			break;
 
 		case 0x1:
-			pr_cont("UC error during a linefill from L2/NB.\n");
+			ras_printk(PR_CONT, "UC error during a linefill "
+					    "from L2/NB.\n");
 			break;
 
 		case 0x2:
 		case 0x11:
-			pr_cont("STQ access error.\n");
+			ras_printk(PR_CONT, "STQ access error.\n");
 			break;
 
 		case 0x3:
-			pr_cont("SCB access error.\n");
+			ras_printk(PR_CONT, "SCB access error.\n");
 			break;
 
 		case 0x10:
-			pr_cont("Tag error.\n");
+			ras_printk(PR_CONT, "Tag error.\n");
 			break;
 
 		case 0x12:
-			pr_cont("LDQ access error.\n");
+			ras_printk(PR_CONT, "LDQ access error.\n");
 			break;
 
 		default:
@@ -255,9 +258,9 @@ static bool f15h_dc_mce(u16 ec, u8 xec)
 	} else if (BUS_ERROR(ec)) {
 
 		if (!xec)
-			pr_cont("during system linefill.\n");
+			ras_printk(PR_CONT, "during system linefill.\n");
 		else
-			pr_cont(" Internal %s condition.\n",
+			ras_printk(PR_CONT, " Internal %s condition.\n",
 				((xec == 1) ? "livelock" : "deadlock"));
 	} else
 		ret = false;
@@ -270,12 +273,12 @@ static void amd_decode_dc_mce(struct mce *m)
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, xec_mask);
 
-	pr_emerg(HW_ERR "Data Cache Error: ");
+	ras_printk(PR_EMERG, "Data Cache Error: ");
 
 	/* TLB error signatures are the same across families */
 	if (TLB_ERROR(ec)) {
 		if (TT(ec) == TT_DATA) {
-			pr_cont("%s TLB %s.\n", LL_MSG(ec),
+			ras_printk(PR_CONT, "%s TLB %s.\n", LL_MSG(ec),
 				((xec == 2) ? "locked miss"
 					    : (xec ? "multimatch" : "parity")));
 			return;
@@ -283,7 +286,7 @@ static void amd_decode_dc_mce(struct mce *m)
 	} else if (fam_ops->dc_mce(ec, xec))
 		;
 	else
-		pr_emerg(HW_ERR "Corrupted DC MCE info?\n");
+		ras_printk(PR_EMERG, "Corrupted DC MCE info?\n");
 }
 
 static bool k8_ic_mce(u16 ec, u8 xec)
@@ -295,19 +298,19 @@ static bool k8_ic_mce(u16 ec, u8 xec)
 		return false;
 
 	if (ll == 0x2)
-		pr_cont("during a linefill from L2.\n");
+		ras_printk(PR_CONT, "during a linefill from L2.\n");
 	else if (ll == 0x1) {
 		switch (R4(ec)) {
 		case R4_IRD:
-			pr_cont("Parity error during data load.\n");
+			ras_printk(PR_CONT, "Parity error during data load.\n");
 			break;
 
 		case R4_EVICT:
-			pr_cont("Copyback Parity/Victim error.\n");
+			ras_printk(PR_CONT, "Copyback Parity/Victim error.\n");
 			break;
 
 		case R4_SNOOP:
-			pr_cont("Tag Snoop error.\n");
+			ras_printk(PR_CONT, "Tag Snoop error.\n");
 			break;
 
 		default:
@@ -330,9 +333,9 @@ static bool f14h_ic_mce(u16 ec, u8 xec)
 			ret = false;
 
 		if (r4 == R4_IRD)
-			pr_cont("Data/tag array parity error for a tag hit.\n");
+			ras_printk(PR_CONT, "Data/tag array parity error for a tag hit.\n");
 		else if (r4 == R4_SNOOP)
-			pr_cont("Tag error during snoop/victimization.\n");
+			ras_printk(PR_CONT, "Tag error during snoop/victimization.\n");
 		else
 			ret = false;
 	}
@@ -348,15 +351,16 @@ static bool f15h_ic_mce(u16 ec, u8 xec)
 
 	switch (xec) {
 	case 0x0 ... 0xa:
-		pr_cont("%s.\n", f15h_ic_mce_desc[xec]);
+		ras_printk(PR_CONT, "%s.\n", f15h_ic_mce_desc[xec]);
 		break;
 
 	case 0xd:
-		pr_cont("%s.\n", f15h_ic_mce_desc[xec-2]);
+		ras_printk(PR_CONT, "%s.\n", f15h_ic_mce_desc[xec-2]);
 		break;
 
 	case 0x10 ... 0x14:
-		pr_cont("Decoder %s parity error.\n", f15h_ic_mce_desc[xec-4]);
+		ras_printk(PR_CONT, "Decoder %s parity error.\n",
+				    f15h_ic_mce_desc[xec-4]);
 		break;
 
 	default:
@@ -370,19 +374,20 @@ static void amd_decode_ic_mce(struct mce *m)
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, xec_mask);
 
-	pr_emerg(HW_ERR "Instruction Cache Error: ");
+	ras_printk(PR_EMERG, "Instruction Cache Error: ");
 
 	if (TLB_ERROR(ec))
-		pr_cont("%s TLB %s.\n", LL_MSG(ec),
+		ras_printk(PR_CONT, "%s TLB %s.\n", LL_MSG(ec),
 			(xec ? "multimatch" : "parity error"));
 	else if (BUS_ERROR(ec)) {
 		bool k8 = (boot_cpu_data.x86 == 0xf && (m->status & BIT_64(58)));
 
-		pr_cont("during %s.\n", (k8 ? "system linefill" : "NB data read"));
+		ras_printk(PR_CONT, "during %s.\n", (k8 ? "system linefill"
+							: "NB data read"));
 	} else if (fam_ops->ic_mce(ec, xec))
 		;
 	else
-		pr_emerg(HW_ERR "Corrupted IC MCE info?\n");
+		ras_printk(PR_EMERG, "Corrupted IC MCE info?\n");
 }
 
 static void amd_decode_bu_mce(struct mce *m)
@@ -390,30 +395,33 @@ static void amd_decode_bu_mce(struct mce *m)
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, xec_mask);
 
-	pr_emerg(HW_ERR "Bus Unit Error");
+	ras_printk(PR_EMERG, "Bus Unit Error");
 
 	if (xec == 0x1)
-		pr_cont(" in the write data buffers.\n");
+		ras_printk(PR_CONT, " in the write data buffers.\n");
 	else if (xec == 0x3)
-		pr_cont(" in the victim data buffers.\n");
+		ras_printk(PR_CONT, " in the victim data buffers.\n");
 	else if (xec == 0x2 && MEM_ERROR(ec))
-		pr_cont(": %s error in the L2 cache tags.\n", R4_MSG(ec));
+		ras_printk(PR_CONT, ": %s error in the L2 cache tags.\n",
+			   R4_MSG(ec));
 	else if (xec == 0x0) {
 		if (TLB_ERROR(ec))
-			pr_cont(": %s error in a Page Descriptor Cache or "
-				"Guest TLB.\n", TT_MSG(ec));
+			ras_printk(PR_CONT, ": %s error in a Page Descriptor "
+					    "Cache or Guest TLB.\n",
+					    TT_MSG(ec));
 		else if (BUS_ERROR(ec))
-			pr_cont(": %s/ECC error in data read from NB: %s.\n",
-				R4_MSG(ec), PP_MSG(ec));
+			ras_printk(PR_CONT, ": %s/ECC error in data read from NB: %s.\n",
+					    R4_MSG(ec), PP_MSG(ec));
 		else if (MEM_ERROR(ec)) {
 			u8 r4 = R4(ec);
 
 			if (r4 >= 0x7)
-				pr_cont(": %s error during data copyback.\n",
-					R4_MSG(ec));
+				ras_printk(PR_CONT, ": %s error during data copyback.\n",
+						    R4_MSG(ec));
 			else if (r4 <= 0x1)
-				pr_cont(": %s parity/ECC error during data "
-					"access from L2.\n", R4_MSG(ec));
+				ras_printk(PR_CONT, ": %s parity/ECC error "
+						    "during data access from L2.\n",
+						    R4_MSG(ec));
 			else
 				goto wrong_bu_mce;
 		} else
@@ -424,7 +432,7 @@ static void amd_decode_bu_mce(struct mce *m)
 	return;
 
 wrong_bu_mce:
-	pr_emerg(HW_ERR "Corrupted BU MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted BU MCE info?\n");
 }
 
 static void amd_decode_cu_mce(struct mce *m)
@@ -432,28 +440,28 @@ static void amd_decode_cu_mce(struct mce *m)
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, xec_mask);
 
-	pr_emerg(HW_ERR "Combined Unit Error: ");
+	ras_printk(PR_EMERG, "Combined Unit Error: ");
 
 	if (TLB_ERROR(ec)) {
 		if (xec == 0x0)
-			pr_cont("Data parity TLB read error.\n");
+			ras_printk(PR_CONT, "Data parity TLB read error.\n");
 		else if (xec == 0x1)
-			pr_cont("Poison data provided for TLB fill.\n");
+			ras_printk(PR_CONT, "Poison data provided for TLB fill.\n");
 		else
 			goto wrong_cu_mce;
 	} else if (BUS_ERROR(ec)) {
 		if (xec > 2)
 			goto wrong_cu_mce;
 
-		pr_cont("Error during attempted NB data read.\n");
+		ras_printk(PR_CONT, "Error during attempted NB data read.\n");
 	} else if (MEM_ERROR(ec)) {
 		switch (xec) {
 		case 0x4 ... 0xc:
-			pr_cont("%s.\n", f15h_cu_mce_desc[xec - 0x4]);
+			ras_printk(PR_CONT, "%s.\n", f15h_cu_mce_desc[xec - 0x4]);
 			break;
 
 		case 0x10 ... 0x14:
-			pr_cont("%s.\n", f15h_cu_mce_desc[xec - 0x7]);
+			ras_printk(PR_CONT, "%s.\n", f15h_cu_mce_desc[xec - 0x7]);
 			break;
 
 		default:
@@ -464,7 +472,7 @@ static void amd_decode_cu_mce(struct mce *m)
 	return;
 
 wrong_cu_mce:
-	pr_emerg(HW_ERR "Corrupted CU MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted CU MCE info?\n");
 }
 
 static void amd_decode_ls_mce(struct mce *m)
@@ -473,12 +481,12 @@ static void amd_decode_ls_mce(struct mce *m)
 	u8 xec = XEC(m->status, xec_mask);
 
 	if (boot_cpu_data.x86 >= 0x14) {
-		pr_emerg("You shouldn't be seeing an LS MCE on this cpu family,"
-			 " please report on LKML.\n");
+		ras_printk(PR_EMERG, "You shouldn't be seeing an LS MCE on this"
+				     " cpu family, please report on LKML.\n");
 		return;
 	}
 
-	pr_emerg(HW_ERR "Load Store Error");
+	ras_printk(PR_EMERG, "Load Store Error");
 
 	if (xec == 0x0) {
 		u8 r4 = R4(ec);
@@ -486,14 +494,14 @@ static void amd_decode_ls_mce(struct mce *m)
 		if (!BUS_ERROR(ec) || (r4 != R4_DRD && r4 != R4_DWR))
 			goto wrong_ls_mce;
 
-		pr_cont(" during %s.\n", R4_MSG(ec));
+		ras_printk(PR_CONT, " during %s.\n", R4_MSG(ec));
 	} else
 		goto wrong_ls_mce;
 
 	return;
 
 wrong_ls_mce:
-	pr_emerg(HW_ERR "Corrupted LS MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted LS MCE info?\n");
 }
 
 static bool k8_nb_mce(u16 ec, u8 xec)
@@ -502,15 +510,15 @@ static bool k8_nb_mce(u16 ec, u8 xec)
 
 	switch (xec) {
 	case 0x1:
-		pr_cont("CRC error detected on HT link.\n");
+		ras_printk(PR_CONT, "CRC error detected on HT link.\n");
 		break;
 
 	case 0x5:
-		pr_cont("Invalid GART PTE entry during GART table walk.\n");
+		ras_printk(PR_CONT, "Invalid GART PTE entry during GART table walk.\n");
 		break;
 
 	case 0x6:
-		pr_cont("Unsupported atomic RMW received from an IO link.\n");
+		ras_printk(PR_CONT, "Unsupported atomic RMW received from an IO link.\n");
 		break;
 
 	case 0x0:
@@ -518,11 +526,11 @@ static bool k8_nb_mce(u16 ec, u8 xec)
 		if (boot_cpu_data.x86 == 0x11)
 			return false;
 
-		pr_cont("DRAM ECC error detected on the NB.\n");
+		ras_printk(PR_CONT, "DRAM ECC error detected on the NB.\n");
 		break;
 
 	case 0xd:
-		pr_cont("Parity error on the DRAM addr/ctl signals.\n");
+		ras_printk(PR_CONT, "Parity error on the DRAM addr/ctl signals.\n");
 		break;
 
 	default:
@@ -552,9 +560,9 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
 
 	case 0xf:
 		if (TLB_ERROR(ec))
-			pr_cont("GART Table Walk data error.\n");
+			ras_printk(PR_CONT, "GART Table Walk data error.\n");
 		else if (BUS_ERROR(ec))
-			pr_cont("DMA Exclusion Vector Table Walk error.\n");
+			ras_printk(PR_CONT, "DMA Exclusion Vector Table Walk error.\n");
 		else
 			ret = false;
 
@@ -563,7 +571,7 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
 
 	case 0x19:
 		if (boot_cpu_data.x86 == 0x15)
-			pr_cont("Compute Unit Data Error.\n");
+			ras_printk(PR_CONT, "Compute Unit Data Error.\n");
 		else
 			ret = false;
 
@@ -581,7 +589,7 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
 		break;
 	}
 
-	pr_cont("%s.\n", f10h_nb_mce_desc[xec - offset]);
+	ras_printk(PR_CONT, "%s.\n", f10h_nb_mce_desc[xec - offset]);
 
 out:
 	return ret;
@@ -599,27 +607,27 @@ void amd_decode_nb_mce(struct mce *m)
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, 0x1f);
 
-	pr_emerg(HW_ERR "Northbridge Error (node %d): ", node_id);
+	ras_printk(PR_EMERG, "Northbridge Error (node %d): ", node_id);
 
 	switch (xec) {
 	case 0x2:
-		pr_cont("Sync error (sync packets on HT link detected).\n");
+		ras_printk(PR_CONT, "Sync error (sync packets on HT link detected).\n");
 		return;
 
 	case 0x3:
-		pr_cont("HT Master abort.\n");
+		ras_printk(PR_CONT, "HT Master abort.\n");
 		return;
 
 	case 0x4:
-		pr_cont("HT Target abort.\n");
+		ras_printk(PR_CONT, "HT Target abort.\n");
 		return;
 
 	case 0x7:
-		pr_cont("NB Watchdog timeout.\n");
+		ras_printk(PR_CONT, "NB Watchdog timeout.\n");
 		return;
 
 	case 0x9:
-		pr_cont("SVM DMA Exclusion Vector error.\n");
+		ras_printk(PR_CONT, "SVM DMA Exclusion Vector error.\n");
 		return;
 
 	default:
@@ -636,7 +644,7 @@ void amd_decode_nb_mce(struct mce *m)
 	return;
 
 wrong_nb_mce:
-	pr_emerg(HW_ERR "Corrupted NB MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted NB MCE info?\n");
 }
 EXPORT_SYMBOL_GPL(amd_decode_nb_mce);
 
@@ -651,80 +659,80 @@ static void amd_decode_fr_mce(struct mce *m)
 	if (c->x86 != 0x15 && xec != 0x0)
 		goto wrong_fr_mce;
 
-	pr_emerg(HW_ERR "%s Error: ",
+	ras_printk(PR_EMERG, "%s Error: ",
 		 (c->x86 == 0x15 ? "Execution Unit" : "FIROB"));
 
 	if (xec == 0x0 || xec == 0xc)
-		pr_cont("%s.\n", fr_ex_mce_desc[xec]);
+		ras_printk(PR_CONT, "%s.\n", fr_ex_mce_desc[xec]);
 	else if (xec < 0xd)
-		pr_cont("%s parity error.\n", fr_ex_mce_desc[xec]);
+		ras_printk(PR_CONT, "%s parity error.\n", fr_ex_mce_desc[xec]);
 	else
 		goto wrong_fr_mce;
 
 	return;
 
 wrong_fr_mce:
-	pr_emerg(HW_ERR "Corrupted FR MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted FR MCE info?\n");
 }
 
 static void amd_decode_fp_mce(struct mce *m)
 {
 	u8 xec = XEC(m->status, xec_mask);
 
-	pr_emerg(HW_ERR "Floating Point Unit Error: ");
+	ras_printk(PR_EMERG, "Floating Point Unit Error: ");
 
 	switch (xec) {
 	case 0x1:
-		pr_cont("Free List");
+		ras_printk(PR_CONT, "Free List");
 		break;
 
 	case 0x2:
-		pr_cont("Physical Register File");
+		ras_printk(PR_CONT, "Physical Register File");
 		break;
 
 	case 0x3:
-		pr_cont("Retire Queue");
+		ras_printk(PR_CONT, "Retire Queue");
 		break;
 
 	case 0x4:
-		pr_cont("Scheduler table");
+		ras_printk(PR_CONT, "Scheduler table");
 		break;
 
 	case 0x5:
-		pr_cont("Status Register File");
+		ras_printk(PR_CONT, "Status Register File");
 		break;
 
 	default:
 		goto wrong_fp_mce;
 		break;
 	}
-
-	pr_cont(" parity error.\n");
+	ras_printk(PR_CONT, " parity error.\n");
 
 	return;
 
 wrong_fp_mce:
-	pr_emerg(HW_ERR "Corrupted FP MCE info?\n");
+	ras_printk(PR_EMERG, "Corrupted FP MCE info?\n");
 }
 
 static inline void amd_decode_err_code(u16 ec)
 {
 
-	pr_emerg(HW_ERR "cache level: %s", LL_MSG(ec));
+	ras_printk(PR_EMERG, "cache level: %s", LL_MSG(ec));
 
 	if (BUS_ERROR(ec))
-		pr_cont(", mem/io: %s", II_MSG(ec));
+		ras_printk(PR_CONT, ", mem/io: %s", II_MSG(ec));
 	else
-		pr_cont(", tx: %s", TT_MSG(ec));
+		ras_printk(PR_CONT, ", tx: %s", TT_MSG(ec));
 
 	if (MEM_ERROR(ec) || BUS_ERROR(ec)) {
-		pr_cont(", mem-tx: %s", R4_MSG(ec));
+		ras_printk(PR_CONT, ", mem-tx: %s", R4_MSG(ec));
 
 		if (BUS_ERROR(ec))
-			pr_cont(", part-proc: %s (%s)", PP_MSG(ec), TO_MSG(ec));
+			ras_printk(PR_CONT, ", part-proc: %s (%s)",
+					    PP_MSG(ec), TO_MSG(ec));
 	}
 
-	pr_cont("\n");
+	ras_printk(PR_CONT, "\n");
 }
 
 /*
@@ -752,7 +760,7 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (amd_filter_mce(m))
 		return NOTIFY_STOP;
 
-	pr_emerg(HW_ERR "CPU:%d\tMC%d_STATUS[%s|%s|%s|%s|%s",
+	ras_printk(PR_EMERG, "CPU:%d MC%d_STATUS[%s|%s|%s|%s|%s",
 		m->extcpu, m->bank,
 		((m->status & MCI_STATUS_OVER)	? "Over"  : "-"),
 		((m->status & MCI_STATUS_UC)	? "UE"	  : "CE"),
@@ -761,19 +769,22 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_ADDRV)	? "AddrV" : "-"));
 
 	if (c->x86 == 0x15)
-		pr_cont("|%s|%s",
+		ras_printk(PR_CONT, "|%s|%s",
 			((m->status & BIT_64(44)) ? "Deferred" : "-"),
 			((m->status & BIT_64(43)) ? "Poison"   : "-"));
 
 	/* do the two bits[14:13] together */
 	ecc = (m->status >> 45) & 0x3;
 	if (ecc)
-		pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));
+		ras_printk(PR_CONT, "|%sECC", ((ecc == 2) ? "C" : "U"));
 
-	pr_cont("]: 0x%016llx\n", m->status);
+	ras_printk(PR_CONT, "]: 0x%016llx", m->status);
 
 	if (m->status & MCI_STATUS_ADDRV)
-		pr_emerg(HW_ERR "\tMC%d_ADDR: 0x%016llx\n", m->bank, m->addr);
+		ras_printk(PR_CONT, " MC%d_ADDR: 0x%016llx",
+			   m->bank, m->addr);
+
+	ras_printk(PR_CONT, "\n");
 
 	switch (m->bank) {
 	case 0:
@@ -813,6 +824,8 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 	amd_decode_err_code(m->status & 0xffff);
 
+	trace_mce_record(ras_get_decoded_err(), m);
+
 	return NOTIFY_STOP;
 }
 EXPORT_SYMBOL_GPL(amd_decode_mce);
@@ -882,10 +895,10 @@ static int __init mce_amd_init(void)
 		return -EINVAL;
 	}
 
-	pr_info("MCE: In-kernel MCE decoding enabled.\n");
-
 	mce_register_decode_chain(&amd_mce_dec_nb);
 
+	pr_info("MCE: In-kernel MCE decoding enabled.\n");
+
 	return 0;
 }
 early_initcall(mce_amd_init);
-- 
1.7.8.rc0
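
For reference, the flag summary that amd_decode_mce() assembles in the hunks above can be reproduced in userspace. What follows is a minimal model, not the driver code: decode_status() is a hypothetical helper, the bit positions are the architectural MCi_STATUS ones, and the third and fourth fields (MiscV, PCC) are assumed from the sample output in the cover letter, since those lines fall outside the hunks shown. The family 0x15 Deferred/Poison fields are left out. Note that the shift by 45 selects bits 46:45 of the full 64-bit status, which are bits 14:13 of its upper 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural MCi_STATUS flag positions. */
#define STATUS_OVER	(1ULL << 62)
#define STATUS_UC	(1ULL << 61)
#define STATUS_MISCV	(1ULL << 59)
#define STATUS_ADDRV	(1ULL << 58)
#define STATUS_PCC	(1ULL << 57)

/* Hypothetical helper: write the flag summary into buf and return buf. */
static char *decode_status(uint64_t status, char *buf, size_t len)
{
	/* Bits 46:45 of the 64-bit value: 2 means corrected, 1 or 3 uncorrected. */
	unsigned int ecc = (status >> 45) & 0x3;
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "%s|%s|%s|%s|%s",
		     (status & STATUS_OVER)  ? "Over"  : "-",
		     (status & STATUS_UC)    ? "UE"    : "CE",
		     (status & STATUS_MISCV) ? "MiscV" : "-",	/* assumed field */
		     (status & STATUS_PCC)   ? "PCC"   : "-",	/* assumed field */
		     (status & STATUS_ADDRV) ? "AddrV" : "-");

	/* Append the ECC field only when one of the two bits is set. */
	if (ecc && n > 0 && (size_t)n < len)
		snprintf(buf + n, len - n, "|%sECC", (ecc == 2) ? "C" : "U");

	return buf;
}
```

Fed the status value from the cover letter's sample trace, 0xd604c00006080a41, this yields Over|CE|-|PCC|AddrV|CECC, which matches the summary printed there.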



* Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-06 13:31 ` [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer Borislav Petkov
@ 2012-03-06 15:42   ` Mauro Carvalho Chehab
  2012-03-12 16:18     ` Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Mauro Carvalho Chehab @ 2012-03-06 15:42 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Tony Luck, Ingo Molnar, EDAC devel, LKML, Borislav Petkov

On 06-03-2012 10:31, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@amd.com>
> 
> This is an initial version of the patch which converts MCE decoding
> facilities to use the RAS printk buffer. When there's no userspace agent
> running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
> default printk'ing into dmesg which is what we've been doing so far.
> 
> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
> ---
>  drivers/edac/amd64_edac.c |    3 +-
>  drivers/edac/edac_core.h  |   13 +++-
>  drivers/edac/edac_mc.c    |   23 +++--
>  drivers/edac/mce_amd.c    |  217 ++++++++++++++++++++++++---------------------
>  4 files changed, 142 insertions(+), 114 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index c9eee6d33e9a..1d8feadb3610 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1,6 +1,7 @@
> -#include "amd64_edac.h"
>  #include <asm/amd_nb.h>
> +#include <asm/ras.h>
>  
> +#include "amd64_edac.h"
>  static struct edac_pci_ctl_info *amd64_ctl_pci;
>  
>  static int report_gart_errors;
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index e48ab3108ad8..c06c82046e83 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -49,8 +49,17 @@
>  #define edac_printk(level, prefix, fmt, arg...) \
>  	printk(level "EDAC " prefix ": " fmt, ##arg)
>  
> -#define edac_mc_printk(mci, level, fmt, arg...) \
> -	printk(level "EDAC MC%d: " fmt, mci->mc_idx, ##arg)
> +#define edac_mc_printk(mci, level, fmt, arg...)					\
> +({										\
> +	if (ras_agent) {							\
> +		unsigned pr_lvl = BIT((unsigned)(level[1] - '0'));		\
> +										\
> +		ras_printk(pr_lvl, HW_ERR "EDAC MC%d: " fmt,			\
> +			   mci->mc_idx, ##arg);					\
> +	}									\
> +	else									\
> +		printk(level "EDAC MC%d: " fmt, mci->mc_idx, ##arg);		\
> +})


NAK.


>  
>  #define edac_mc_chipset_printk(mci, level, prefix, fmt, arg...) \
>  	printk(level "EDAC " prefix " MC%d: " fmt, mci->mc_idx, ##arg)
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index ca6c04d350ee..e7091dbb516f 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -30,8 +30,10 @@
>  #include <asm/uaccess.h>
>  #include <asm/page.h>
>  #include <asm/edac.h>
> +#include <asm/ras.h>
>  #include "edac_core.h"
>  #include "edac_module.h"
> +#include "mce_amd.h"
>  
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
> @@ -704,11 +706,14 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
>  	if (edac_mc_get_log_ce())
>  		/* FIXME - put in DIMM location */
>  		edac_mc_printk(mci, KERN_WARNING,
> -			"CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
> -			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
> -			page_frame_number, offset_in_page,
> -			mci->csrows[row].grain, syndrome, row, channel,
> -			mci->csrows[row].channels[channel].label, msg);
> +				"CE page 0x%lx, offset 0x%lx, grain %d,"
> +				" syndrome 0x%lx, row %d, channel %d,"
> +				" label \"%s\": %s\n",
> +				page_frame_number, offset_in_page,
> +				mci->csrows[row].grain, syndrome,
> +				row, channel,
> +				mci->csrows[row].channels[channel].label,
> +				msg);
>  
>  	mci->ce_count++;
>  	mci->csrows[row].ce_count++;
> @@ -782,10 +787,10 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
>  
>  	if (edac_mc_get_log_ue())
>  		edac_mc_printk(mci, KERN_EMERG,
> -			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
> -			"labels \"%s\": %s\n", page_frame_number,
> -			offset_in_page, mci->csrows[row].grain, row,
> -			labels, msg);
> +			       "UE page 0x%lx, offset 0x%lx, grain %d,"
> +			       " row %d, labels \"%s\": %s\n",
> +			       page_frame_number, offset_in_page,
> +			       mci->csrows[row].grain, row, labels, msg);
>  
>  	if (edac_mc_get_panic_on_ue())
>  		panic("EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, "
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index bd926ea2e00c..e347d3680e13 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -1,5 +1,7 @@
>  #include <linux/module.h>
>  #include <linux/slab.h>
> +#include <trace/events/mce.h>
> +#include <asm/ras.h>
>  
>  #include "mce_amd.h"
>  
> @@ -137,9 +139,9 @@ static bool f12h_dc_mce(u16 ec, u8 xec)
>  		ret = true;
>  
>  		if (ll == LL_L2)
> -			pr_cont("during L1 linefill from L2.\n");
> +			ras_printk(PR_CONT, "during L1 linefill from L2.\n");
>  		else if (ll == LL_L1)
> -			pr_cont("Data/Tag %s error.\n", R4_MSG(ec));
> +			ras_printk(PR_CONT, "Data/Tag %s error.\n", R4_MSG(ec));
>  		else
>  			ret = false;
>  	}
> @@ -149,7 +151,7 @@ static bool f12h_dc_mce(u16 ec, u8 xec)
>  static bool f10h_dc_mce(u16 ec, u8 xec)
>  {
>  	if (R4(ec) == R4_GEN && LL(ec) == LL_L1) {
> -		pr_cont("during data scrub.\n");
> +		ras_printk(PR_CONT, "during data scrub.\n");
>  		return true;
>  	}
>  	return f12h_dc_mce(ec, xec);
> @@ -158,7 +160,7 @@ static bool f10h_dc_mce(u16 ec, u8 xec)
>  static bool k8_dc_mce(u16 ec, u8 xec)
>  {
>  	if (BUS_ERROR(ec)) {
> -		pr_cont("during system linefill.\n");
> +		ras_printk(PR_CONT, "during system linefill.\n");
>  		return true;
>  	}
>  
> @@ -178,14 +180,14 @@ static bool f14h_dc_mce(u16 ec, u8 xec)
>  		switch (r4) {
>  		case R4_DRD:
>  		case R4_DWR:
> -			pr_cont("Data/Tag parity error due to %s.\n",
> +			ras_printk(PR_CONT, "Data/Tag parity error due to %s.\n",
>  				(r4 == R4_DRD ? "load/hw prf" : "store"));
>  			break;
>  		case R4_EVICT:
> -			pr_cont("Copyback parity error on a tag miss.\n");
> +			ras_printk(PR_CONT, "Copyback parity error on a tag miss.\n");
>  			break;
>  		case R4_SNOOP:
> -			pr_cont("Tag parity error during snoop.\n");
> +			ras_printk(PR_CONT, "Tag parity error during snoop.\n");
>  			break;
>  		default:
>  			ret = false;
> @@ -195,17 +197,17 @@ static bool f14h_dc_mce(u16 ec, u8 xec)
>  		if ((II(ec) != II_MEM && II(ec) != II_IO) || LL(ec) != LL_LG)
>  			return false;
>  
> -		pr_cont("System read data error on a ");
> +		ras_printk(PR_CONT, "System read data error on a ");
>  
>  		switch (r4) {
>  		case R4_RD:
> -			pr_cont("TLB reload.\n");
> +			ras_printk(PR_CONT, "TLB reload.\n");
>  			break;
>  		case R4_DWR:
> -			pr_cont("store.\n");
> +			ras_printk(PR_CONT, "store.\n");
>  			break;
>  		case R4_DRD:
> -			pr_cont("load.\n");
> +			ras_printk(PR_CONT, "load.\n");
>  			break;
>  		default:
>  			ret = false;
> @@ -225,28 +227,29 @@ static bool f15h_dc_mce(u16 ec, u8 xec)
>  
>  		switch (xec) {
>  		case 0x0:
> -			pr_cont("Data Array access error.\n");
> +			ras_printk(PR_CONT, "Data Array access error.\n");
>  			break;
>  
>  		case 0x1:
> -			pr_cont("UC error during a linefill from L2/NB.\n");
> +			ras_printk(PR_CONT, "UC error during a linefill "
> +					    "from L2/NB.\n");
>  			break;
>  
>  		case 0x2:
>  		case 0x11:
> -			pr_cont("STQ access error.\n");
> +			ras_printk(PR_CONT, "STQ access error.\n");
>  			break;
>  
>  		case 0x3:
> -			pr_cont("SCB access error.\n");
> +			ras_printk(PR_CONT, "SCB access error.\n");
>  			break;
>  
>  		case 0x10:
> -			pr_cont("Tag error.\n");
> +			ras_printk(PR_CONT, "Tag error.\n");
>  			break;
>  
>  		case 0x12:
> -			pr_cont("LDQ access error.\n");
> +			ras_printk(PR_CONT, "LDQ access error.\n");
>  			break;
>  
>  		default:
> @@ -255,9 +258,9 @@ static bool f15h_dc_mce(u16 ec, u8 xec)
>  	} else if (BUS_ERROR(ec)) {
>  
>  		if (!xec)
> -			pr_cont("during system linefill.\n");
> +			ras_printk(PR_CONT, "during system linefill.\n");
>  		else
> -			pr_cont(" Internal %s condition.\n",
> +			ras_printk(PR_CONT, " Internal %s condition.\n",
>  				((xec == 1) ? "livelock" : "deadlock"));
>  	} else
>  		ret = false;
> @@ -270,12 +273,12 @@ static void amd_decode_dc_mce(struct mce *m)
>  	u16 ec = EC(m->status);
>  	u8 xec = XEC(m->status, xec_mask);
>  
> -	pr_emerg(HW_ERR "Data Cache Error: ");
> +	ras_printk(PR_EMERG, "Data Cache Error: ");
>  
>  	/* TLB error signatures are the same across families */
>  	if (TLB_ERROR(ec)) {
>  		if (TT(ec) == TT_DATA) {
> -			pr_cont("%s TLB %s.\n", LL_MSG(ec),
> +			ras_printk(PR_CONT, "%s TLB %s.\n", LL_MSG(ec),
>  				((xec == 2) ? "locked miss"
>  					    : (xec ? "multimatch" : "parity")));
>  			return;
> @@ -283,7 +286,7 @@ static void amd_decode_dc_mce(struct mce *m)
>  	} else if (fam_ops->dc_mce(ec, xec))
>  		;
>  	else
> -		pr_emerg(HW_ERR "Corrupted DC MCE info?\n");
> +		ras_printk(PR_EMERG, "Corrupted DC MCE info?\n");
>  }
>  
>  static bool k8_ic_mce(u16 ec, u8 xec)
> @@ -295,19 +298,19 @@ static bool k8_ic_mce(u16 ec, u8 xec)
>  		return false;
>  
>  	if (ll == 0x2)
> -		pr_cont("during a linefill from L2.\n");
> +		ras_printk(PR_CONT, "during a linefill from L2.\n");
>  	else if (ll == 0x1) {
>  		switch (R4(ec)) {
>  		case R4_IRD:
> -			pr_cont("Parity error during data load.\n");
> +			ras_printk(PR_CONT, "Parity error during data load.\n");
>  			break;
>  
>  		case R4_EVICT:
> -			pr_cont("Copyback Parity/Victim error.\n");
> +			ras_printk(PR_CONT, "Copyback Parity/Victim error.\n");
>  			break;
>  
>  		case R4_SNOOP:
> -			pr_cont("Tag Snoop error.\n");
> +			ras_printk(PR_CONT, "Tag Snoop error.\n");
>  			break;
>  
>  		default:
> @@ -330,9 +333,9 @@ static bool f14h_ic_mce(u16 ec, u8 xec)
>  			ret = false;
>  
>  		if (r4 == R4_IRD)
> -			pr_cont("Data/tag array parity error for a tag hit.\n");
> +			ras_printk(PR_CONT, "Data/tag array parity error for a tag hit.\n");
>  		else if (r4 == R4_SNOOP)
> -			pr_cont("Tag error during snoop/victimization.\n");
> +			ras_printk(PR_CONT, "Tag error during snoop/victimization.\n");
>  		else
>  			ret = false;
>  	}
> @@ -348,15 +351,16 @@ static bool f15h_ic_mce(u16 ec, u8 xec)
>  
>  	switch (xec) {
>  	case 0x0 ... 0xa:
> -		pr_cont("%s.\n", f15h_ic_mce_desc[xec]);
> +		ras_printk(PR_CONT, "%s.\n", f15h_ic_mce_desc[xec]);
>  		break;
>  
>  	case 0xd:
> -		pr_cont("%s.\n", f15h_ic_mce_desc[xec-2]);
> +		ras_printk(PR_CONT, "%s.\n", f15h_ic_mce_desc[xec-2]);
>  		break;
>  
>  	case 0x10 ... 0x14:
> -		pr_cont("Decoder %s parity error.\n", f15h_ic_mce_desc[xec-4]);
> +		ras_printk(PR_CONT, "Decoder %s parity error.\n",
> +				    f15h_ic_mce_desc[xec-4]);
>  		break;
>  
>  	default:
> @@ -370,19 +374,20 @@ static void amd_decode_ic_mce(struct mce *m)
>  	u16 ec = EC(m->status);
>  	u8 xec = XEC(m->status, xec_mask);
>  
> -	pr_emerg(HW_ERR "Instruction Cache Error: ");
> +	ras_printk(PR_EMERG, "Instruction Cache Error: ");
>  
>  	if (TLB_ERROR(ec))
> -		pr_cont("%s TLB %s.\n", LL_MSG(ec),
> +		ras_printk(PR_CONT, "%s TLB %s.\n", LL_MSG(ec),
>  			(xec ? "multimatch" : "parity error"));
>  	else if (BUS_ERROR(ec)) {
>  		bool k8 = (boot_cpu_data.x86 == 0xf && (m->status & BIT_64(58)));
>  
> -		pr_cont("during %s.\n", (k8 ? "system linefill" : "NB data read"));
> +		ras_printk(PR_CONT, "during %s.\n", (k8 ? "system linefill"
> +							: "NB data read"));
>  	} else if (fam_ops->ic_mce(ec, xec))
>  		;
>  	else
> -		pr_emerg(HW_ERR "Corrupted IC MCE info?\n");
> +		ras_printk(PR_EMERG, "Corrupted IC MCE info?\n");
>  }
>  
>  static void amd_decode_bu_mce(struct mce *m)
> @@ -390,30 +395,33 @@ static void amd_decode_bu_mce(struct mce *m)
>  	u16 ec = EC(m->status);
>  	u8 xec = XEC(m->status, xec_mask);
>  
> -	pr_emerg(HW_ERR "Bus Unit Error");
> +	ras_printk(PR_EMERG, "Bus Unit Error");
>  
>  	if (xec == 0x1)
> -		pr_cont(" in the write data buffers.\n");
> +		ras_printk(PR_CONT, " in the write data buffers.\n");
>  	else if (xec == 0x3)
> -		pr_cont(" in the victim data buffers.\n");
> +		ras_printk(PR_CONT, " in the victim data buffers.\n");
>  	else if (xec == 0x2 && MEM_ERROR(ec))
> -		pr_cont(": %s error in the L2 cache tags.\n", R4_MSG(ec));
> +		ras_printk(PR_CONT, ": %s error in the L2 cache tags.\n",
> +			   R4_MSG(ec));
>  	else if (xec == 0x0) {
>  		if (TLB_ERROR(ec))
> -			pr_cont(": %s error in a Page Descriptor Cache or "
> -				"Guest TLB.\n", TT_MSG(ec));
> +			ras_printk(PR_CONT, ": %s error in a Page Descriptor "
> +					    "Cache or Guest TLB.\n",
> +					    TT_MSG(ec));
>  		else if (BUS_ERROR(ec))
> -			pr_cont(": %s/ECC error in data read from NB: %s.\n",
> -				R4_MSG(ec), PP_MSG(ec));
> +			ras_printk(PR_CONT, ": %s/ECC error in data read from NB: %s.\n",
> +					    R4_MSG(ec), PP_MSG(ec));
>  		else if (MEM_ERROR(ec)) {
>  			u8 r4 = R4(ec);
>  
>  			if (r4 >= 0x7)
> -				pr_cont(": %s error during data copyback.\n",
> -					R4_MSG(ec));
> +				ras_printk(PR_CONT, ": %s error during data copyback.\n",
> +						    R4_MSG(ec));
>  			else if (r4 <= 0x1)
> -				pr_cont(": %s parity/ECC error during data "
> -					"access from L2.\n", R4_MSG(ec));
> +				ras_printk(PR_CONT, ": %s parity/ECC error "
> +						    "during data access from L2.\n",
> +						    R4_MSG(ec));
>  			else
>  				goto wrong_bu_mce;
>  		} else
> @@ -424,7 +432,7 @@ static void amd_decode_bu_mce(struct mce *m)
>  	return;
>  
>  wrong_bu_mce:
> -	pr_emerg(HW_ERR "Corrupted BU MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted BU MCE info?\n");
>  }
>  
>  static void amd_decode_cu_mce(struct mce *m)
> @@ -432,28 +440,28 @@ static void amd_decode_cu_mce(struct mce *m)
>  	u16 ec = EC(m->status);
>  	u8 xec = XEC(m->status, xec_mask);
>  
> -	pr_emerg(HW_ERR "Combined Unit Error: ");
> +	ras_printk(PR_EMERG, "Combined Unit Error: ");
>  
>  	if (TLB_ERROR(ec)) {
>  		if (xec == 0x0)
> -			pr_cont("Data parity TLB read error.\n");
> +			ras_printk(PR_CONT, "Data parity TLB read error.\n");
>  		else if (xec == 0x1)
> -			pr_cont("Poison data provided for TLB fill.\n");
> +			ras_printk(PR_CONT, "Poison data provided for TLB fill.\n");
>  		else
>  			goto wrong_cu_mce;
>  	} else if (BUS_ERROR(ec)) {
>  		if (xec > 2)
>  			goto wrong_cu_mce;
>  
> -		pr_cont("Error during attempted NB data read.\n");
> +		ras_printk(PR_CONT, "Error during attempted NB data read.\n");
>  	} else if (MEM_ERROR(ec)) {
>  		switch (xec) {
>  		case 0x4 ... 0xc:
> -			pr_cont("%s.\n", f15h_cu_mce_desc[xec - 0x4]);
> +			ras_printk(PR_CONT, "%s.\n", f15h_cu_mce_desc[xec - 0x4]);
>  			break;
>  
>  		case 0x10 ... 0x14:
> -			pr_cont("%s.\n", f15h_cu_mce_desc[xec - 0x7]);
> +			ras_printk(PR_CONT, "%s.\n", f15h_cu_mce_desc[xec - 0x7]);
>  			break;
>  
>  		default:
> @@ -464,7 +472,7 @@ static void amd_decode_cu_mce(struct mce *m)
>  	return;
>  
>  wrong_cu_mce:
> -	pr_emerg(HW_ERR "Corrupted CU MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted CU MCE info?\n");
>  }
>  
>  static void amd_decode_ls_mce(struct mce *m)
> @@ -473,12 +481,12 @@ static void amd_decode_ls_mce(struct mce *m)
>  	u8 xec = XEC(m->status, xec_mask);
>  
>  	if (boot_cpu_data.x86 >= 0x14) {
> -		pr_emerg("You shouldn't be seeing an LS MCE on this cpu family,"
> -			 " please report on LKML.\n");
> +		ras_printk(PR_EMERG, "You shouldn't be seeing an LS MCE on this"
> +				     " cpu family, please report on LKML.\n");
>  		return;
>  	}
>  
> -	pr_emerg(HW_ERR "Load Store Error");
> +	ras_printk(PR_EMERG, "Load Store Error");
>  
>  	if (xec == 0x0) {
>  		u8 r4 = R4(ec);
> @@ -486,14 +494,14 @@ static void amd_decode_ls_mce(struct mce *m)
>  		if (!BUS_ERROR(ec) || (r4 != R4_DRD && r4 != R4_DWR))
>  			goto wrong_ls_mce;
>  
> -		pr_cont(" during %s.\n", R4_MSG(ec));
> +		ras_printk(PR_CONT, " during %s.\n", R4_MSG(ec));
>  	} else
>  		goto wrong_ls_mce;
>  
>  	return;
>  
>  wrong_ls_mce:
> -	pr_emerg(HW_ERR "Corrupted LS MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted LS MCE info?\n");
>  }
>  
>  static bool k8_nb_mce(u16 ec, u8 xec)
> @@ -502,15 +510,15 @@ static bool k8_nb_mce(u16 ec, u8 xec)
>  
>  	switch (xec) {
>  	case 0x1:
> -		pr_cont("CRC error detected on HT link.\n");
> +		ras_printk(PR_CONT, "CRC error detected on HT link.\n");
>  		break;
>  
>  	case 0x5:
> -		pr_cont("Invalid GART PTE entry during GART table walk.\n");
> +		ras_printk(PR_CONT, "Invalid GART PTE entry during GART table walk.\n");
>  		break;
>  
>  	case 0x6:
> -		pr_cont("Unsupported atomic RMW received from an IO link.\n");
> +		ras_printk(PR_CONT, "Unsupported atomic RMW received from an IO link.\n");
>  		break;
>  
>  	case 0x0:
> @@ -518,11 +526,11 @@ static bool k8_nb_mce(u16 ec, u8 xec)
>  		if (boot_cpu_data.x86 == 0x11)
>  			return false;
>  
> -		pr_cont("DRAM ECC error detected on the NB.\n");
> +		ras_printk(PR_CONT, "DRAM ECC error detected on the NB.\n");
>  		break;
>  
>  	case 0xd:
> -		pr_cont("Parity error on the DRAM addr/ctl signals.\n");
> +		ras_printk(PR_CONT, "Parity error on the DRAM addr/ctl signals.\n");
>  		break;
>  
>  	default:
> @@ -552,9 +560,9 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
>  
>  	case 0xf:
>  		if (TLB_ERROR(ec))
> -			pr_cont("GART Table Walk data error.\n");
> +			ras_printk(PR_CONT, "GART Table Walk data error.\n");
>  		else if (BUS_ERROR(ec))
> -			pr_cont("DMA Exclusion Vector Table Walk error.\n");
> +			ras_printk(PR_CONT, "DMA Exclusion Vector Table Walk error.\n");
>  		else
>  			ret = false;
>  
> @@ -563,7 +571,7 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
>  
>  	case 0x19:
>  		if (boot_cpu_data.x86 == 0x15)
> -			pr_cont("Compute Unit Data Error.\n");
> +			ras_printk(PR_CONT, "Compute Unit Data Error.\n");
>  		else
>  			ret = false;
>  
> @@ -581,7 +589,7 @@ static bool f10h_nb_mce(u16 ec, u8 xec)
>  		break;
>  	}
>  
> -	pr_cont("%s.\n", f10h_nb_mce_desc[xec - offset]);
> +	ras_printk(PR_CONT, "%s.\n", f10h_nb_mce_desc[xec - offset]);
>  
>  out:
>  	return ret;
> @@ -599,27 +607,27 @@ void amd_decode_nb_mce(struct mce *m)
>  	u16 ec = EC(m->status);
>  	u8 xec = XEC(m->status, 0x1f);
>  
> -	pr_emerg(HW_ERR "Northbridge Error (node %d): ", node_id);
> +	ras_printk(PR_EMERG, "Northbridge Error (node %d): ", node_id);
>  
>  	switch (xec) {
>  	case 0x2:
> -		pr_cont("Sync error (sync packets on HT link detected).\n");
> +		ras_printk(PR_CONT, "Sync error (sync packets on HT link detected).\n");
>  		return;
>  
>  	case 0x3:
> -		pr_cont("HT Master abort.\n");
> +		ras_printk(PR_CONT, "HT Master abort.\n");
>  		return;
>  
>  	case 0x4:
> -		pr_cont("HT Target abort.\n");
> +		ras_printk(PR_CONT, "HT Target abort.\n");
>  		return;
>  
>  	case 0x7:
> -		pr_cont("NB Watchdog timeout.\n");
> +		ras_printk(PR_CONT, "NB Watchdog timeout.\n");
>  		return;
>  
>  	case 0x9:
> -		pr_cont("SVM DMA Exclusion Vector error.\n");
> +		ras_printk(PR_CONT, "SVM DMA Exclusion Vector error.\n");
>  		return;
>  
>  	default:
> @@ -636,7 +644,7 @@ void amd_decode_nb_mce(struct mce *m)
>  	return;
>  
>  wrong_nb_mce:
> -	pr_emerg(HW_ERR "Corrupted NB MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted NB MCE info?\n");
>  }
>  EXPORT_SYMBOL_GPL(amd_decode_nb_mce);
>  
> @@ -651,80 +659,80 @@ static void amd_decode_fr_mce(struct mce *m)
>  	if (c->x86 != 0x15 && xec != 0x0)
>  		goto wrong_fr_mce;
>  
> -	pr_emerg(HW_ERR "%s Error: ",
> +	ras_printk(PR_EMERG, "%s Error: ",
>  		 (c->x86 == 0x15 ? "Execution Unit" : "FIROB"));
>  
>  	if (xec == 0x0 || xec == 0xc)
> -		pr_cont("%s.\n", fr_ex_mce_desc[xec]);
> +		ras_printk(PR_CONT, "%s.\n", fr_ex_mce_desc[xec]);
>  	else if (xec < 0xd)
> -		pr_cont("%s parity error.\n", fr_ex_mce_desc[xec]);
> +		ras_printk(PR_CONT, "%s parity error.\n", fr_ex_mce_desc[xec]);
>  	else
>  		goto wrong_fr_mce;
>  
>  	return;
>  
>  wrong_fr_mce:
> -	pr_emerg(HW_ERR "Corrupted FR MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted FR MCE info?\n");
>  }
>  
>  static void amd_decode_fp_mce(struct mce *m)
>  {
>  	u8 xec = XEC(m->status, xec_mask);
>  
> -	pr_emerg(HW_ERR "Floating Point Unit Error: ");
> +	ras_printk(PR_EMERG, "Floating Point Unit Error: ");
>  
>  	switch (xec) {
>  	case 0x1:
> -		pr_cont("Free List");
> +		ras_printk(PR_CONT, "Free List");
>  		break;
>  
>  	case 0x2:
> -		pr_cont("Physical Register File");
> +		ras_printk(PR_CONT, "Physical Register File");
>  		break;
>  
>  	case 0x3:
> -		pr_cont("Retire Queue");
> +		ras_printk(PR_CONT, "Retire Queue");
>  		break;
>  
>  	case 0x4:
> -		pr_cont("Scheduler table");
> +		ras_printk(PR_CONT, "Scheduler table");
>  		break;
>  
>  	case 0x5:
> -		pr_cont("Status Register File");
> +		ras_printk(PR_CONT, "Status Register File");
>  		break;
>  
>  	default:
>  		goto wrong_fp_mce;
>  		break;
>  	}
> -
> -	pr_cont(" parity error.\n");
> +	ras_printk(PR_CONT, " parity error.\n");
>  
>  	return;
>  
>  wrong_fp_mce:
> -	pr_emerg(HW_ERR "Corrupted FP MCE info?\n");
> +	ras_printk(PR_EMERG, "Corrupted FP MCE info?\n");
>  }
>  
>  static inline void amd_decode_err_code(u16 ec)
>  {
>  
> -	pr_emerg(HW_ERR "cache level: %s", LL_MSG(ec));
> +	ras_printk(PR_EMERG, "cache level: %s", LL_MSG(ec));
>  
>  	if (BUS_ERROR(ec))
> -		pr_cont(", mem/io: %s", II_MSG(ec));
> +		ras_printk(PR_CONT, ", mem/io: %s", II_MSG(ec));
>  	else
> -		pr_cont(", tx: %s", TT_MSG(ec));
> +		ras_printk(PR_CONT, ", tx: %s", TT_MSG(ec));
>  
>  	if (MEM_ERROR(ec) || BUS_ERROR(ec)) {
> -		pr_cont(", mem-tx: %s", R4_MSG(ec));
> +		ras_printk(PR_CONT, ", mem-tx: %s", R4_MSG(ec));
>  
>  		if (BUS_ERROR(ec))
> -			pr_cont(", part-proc: %s (%s)", PP_MSG(ec), TO_MSG(ec));
> +			ras_printk(PR_CONT, ", part-proc: %s (%s)",
> +					    PP_MSG(ec), TO_MSG(ec));
>  	}
>  
> -	pr_cont("\n");
> +	ras_printk(PR_CONT, "\n");
>  }
>  
>  /*
> @@ -752,7 +760,7 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  	if (amd_filter_mce(m))
>  		return NOTIFY_STOP;
>  
> -	pr_emerg(HW_ERR "CPU:%d\tMC%d_STATUS[%s|%s|%s|%s|%s",
> +	ras_printk(PR_EMERG, "CPU:%d MC%d_STATUS[%s|%s|%s|%s|%s",
>  		m->extcpu, m->bank,
>  		((m->status & MCI_STATUS_OVER)	? "Over"  : "-"),
>  		((m->status & MCI_STATUS_UC)	? "UE"	  : "CE"),
> @@ -761,19 +769,22 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  		((m->status & MCI_STATUS_ADDRV)	? "AddrV" : "-"));
>  
>  	if (c->x86 == 0x15)
> -		pr_cont("|%s|%s",
> +		ras_printk(PR_CONT, "|%s|%s",
>  			((m->status & BIT_64(44)) ? "Deferred" : "-"),
>  			((m->status & BIT_64(43)) ? "Poison"   : "-"));
>  
>  	/* do the two bits[14:13] together */
>  	ecc = (m->status >> 45) & 0x3;
>  	if (ecc)
> -		pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));
> +		ras_printk(PR_CONT, "|%sECC", ((ecc == 2) ? "C" : "U"));
>  
> -	pr_cont("]: 0x%016llx\n", m->status);
> +	ras_printk(PR_CONT, "]: 0x%016llx", m->status);
>  
>  	if (m->status & MCI_STATUS_ADDRV)
> -		pr_emerg(HW_ERR "\tMC%d_ADDR: 0x%016llx\n", m->bank, m->addr);
> +		ras_printk(PR_CONT, " MC%d_ADDR: 0x%016llx",
> +			   m->bank, m->addr);
> +
> +	ras_printk(PR_CONT, "\n");
>  
>  	switch (m->bank) {
>  	case 0:
> @@ -813,6 +824,8 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  
>  	amd_decode_err_code(m->status & 0xffff);
>  
> +	trace_mce_record(ras_get_decoded_err(), m);
> +
>  	return NOTIFY_STOP;
>  }
>  EXPORT_SYMBOL_GPL(amd_decode_mce);
> @@ -882,10 +895,10 @@ static int __init mce_amd_init(void)
>  		return -EINVAL;
>  	}
>  
> -	pr_info("MCE: In-kernel MCE decoding enabled.\n");
> -
>  	mce_register_decode_chain(&amd_mce_dec_nb);
>  
> +	pr_info("MCE: In-kernel MCE decoding enabled.\n");
> +
>  	return 0;
>  }
>  early_initcall(mce_amd_init);
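
The edac_mc_printk() macro quoted near the top of this message derives a bitmask from the printk level string by indexing its second character. A minimal userspace sketch of that conversion follows; level_to_mask() is a hypothetical name, and the sketch assumes level prefixes of the form "<N>" with a single digit in position 1, which is what the macro's level[1] - '0' expression relies on.

```c
#include <assert.h>

#define BIT(n)	(1U << (n))

/*
 * Map a level prefix such as "<0>" or "<4>" to a one-hot mask,
 * mirroring BIT((unsigned)(level[1] - '0')) from the macro.
 * Assumes the digit sits at index 1 of the string.
 */
static unsigned int level_to_mask(const char *level)
{
	assert(level != 0);
	assert(level[1] >= '0' && level[1] <= '7');

	return BIT((unsigned int)(level[1] - '0'));
}
```

With this, level_to_mask("<0>") gives 0x1 and level_to_mask("<4>") gives 0x10. The conversion only works when the level argument is a string literal whose second character is the digit, so callers passing anything else would need a different mapping.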



* RE: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-06 15:42   ` Mauro Carvalho Chehab
@ 2012-03-12 16:18     ` Luck, Tony
  2012-03-12 16:26       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2012-03-12 16:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: Ingo Molnar, EDAC devel, LKML, Borislav Petkov

> This is an initial version of the patch which converts MCE decoding
> facilities to use the RAS printk buffer. When there's no userspace agent
> running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
> default printk'ing into dmesg which is what we've been doing so far.

This looks unpleasant if your userspace agent sets this sysfs file and
then dies (or gets killed).

Perhaps you need some device file that the agent keeps open (so if the
agent goes away, the kernel gets a "close" on the device to tell it
to revert). But even with this sort of solution you would still have
to worry about races.

-Tony
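
The device-file idea can be modelled in a few lines. This is only a userspace sketch under stated assumptions: ras_dev_open(), ras_dev_release() and ras_agent_present() are hypothetical names standing in for a character device's open and release handlers and for the check the error path would make. The point it illustrates is that the kernel runs the release handler once the last reference to the open file is dropped, which also happens when the holding process is killed, so the agent cannot leave the flag stuck at 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of agents currently holding the (hypothetical) RAS device open. */
static atomic_int ras_agent_users;

/* Models the device's open handler: an agent has attached. */
static void ras_dev_open(void)
{
	atomic_fetch_add(&ras_agent_users, 1);
}

/*
 * Models the release handler. It runs on an explicit close() and
 * equally when the process exits or is killed with the file open.
 */
static void ras_dev_release(void)
{
	int old = atomic_fetch_sub(&ras_agent_users, 1);

	assert(old > 0);	/* release without a matching open is a bug */
	(void)old;
}

/* What the error path would consult instead of a sysfs bool. */
static bool ras_agent_present(void)
{
	return atomic_load(&ras_agent_users) > 0;
}
```

A counter rather than a bool is used here so that a second agent opening the device does not get switched off when the first one closes it. Whether more than one agent should be allowed at all is a separate design question.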


* Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-12 16:18     ` Luck, Tony
@ 2012-03-12 16:26       ` Borislav Petkov
  2012-03-12 16:59         ` Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-03-12 16:26 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mauro Carvalho Chehab, Borislav Petkov, Ingo Molnar, EDAC devel, LKML

On Mon, Mar 12, 2012 at 04:18:57PM +0000, Luck, Tony wrote:
> > This is an initial version of the patch which converts MCE decoding
> > facilities to use the RAS printk buffer. When there's no userspace agent
> > running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
> > default printk'ing into dmesg which is what we've been doing so far.
> 
> This looks unpleasant if your userspace agent sets this sysfs file and
> then dies (or gets killed).

Yeah, having a sysfs file like that felt unpleasant - I was hoping
someone would point me to a better solution...

> Perhaps you need some device file that the agent keeps open (so if the
> agent goes away, the kernel gets a "close" on the device to tell it
> to revert). But even with this sort of solution you would still have
> to worry about races.

Sounds better, especially the close-on-exit part. Please elaborate on
the races...

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* RE: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-12 16:26       ` Borislav Petkov
@ 2012-03-12 16:59         ` Luck, Tony
  2012-03-12 18:03           ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2012-03-12 16:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Mauro Carvalho Chehab, Ingo Molnar, EDAC devel, LKML

> Sounds better, especially the close-on-exit part. Please elaborate on
> the races...

Errors are happening asynchronously to everything. Race looks like:

Daemon exits (or is killed)
   <<<< race begins here
kernel close routine called
close routine updates your global variable
   <<<< race ends here

-Tony
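
The window described above can be made concrete with a small deterministic model. This is a sketch only: struct ras_state, report_error(), daemon_exit() and close_routine() are hypothetical names, and the model steps through the events by hand instead of using real concurrency. It shows that an error reported after the daemon is gone but before the close routine has cleared the flag is still routed to the RAS buffer, where nobody will read it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sink { SINK_DMESG, SINK_RAS_BUFFER };

struct ras_state {
	bool agent_flag;	/* what the kernel believes */
	bool daemon_alive;	/* ground truth, not visible to the error path */
	int stranded;		/* errors buffered while no reader existed */
};

/* Error path: routes purely on the flag, as the patch description says. */
static enum sink report_error(struct ras_state *s)
{
	assert(s != NULL);

	if (!s->agent_flag)
		return SINK_DMESG;

	if (!s->daemon_alive)
		s->stranded++;	/* inside the race window */

	return SINK_RAS_BUFFER;
}

/* Race begins: the daemon is gone, the flag is still set. */
static void daemon_exit(struct ras_state *s)
{
	s->daemon_alive = false;
}

/* Race ends: the close routine updates the global variable. */
static void close_routine(struct ras_state *s)
{
	s->agent_flag = false;
}
```

Calling daemon_exit(), then report_error(), then close_routine() leaves stranded at 1, while any report_error() after close_routine() goes to dmesg again. Making the flag atomic does not shrink this window, because the flag is accurate about what the kernel knows; what is stale is the kernel's knowledge of the daemon.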




* Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-12 16:59         ` Luck, Tony
@ 2012-03-12 18:03           ` Borislav Petkov
  2012-03-27 17:06             ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-03-12 18:03 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar, EDAC devel, LKML

On Mon, Mar 12, 2012 at 04:59:37PM +0000, Luck, Tony wrote:
> > Sounds better, especially the close-on-exit part. Please elaborate on
> > the races...
> 
> Errors are happening asynchronously to everything. Race looks like:
> 
> Daemon exits (or is killed)
>    <<<< race begins here
> kernel close routine called
> close routine updates your global variable
>    <<<< race ends here

Well, in that case, we're going to miss logging a single error, or log
it incompletely.

Unless we make the global variable atomic and have the daemon zero it
as the first thing it does when it starts going away. If it is killed,
then we probably need some sanity-checking functionality which checks
periodically whether the daemon is still alive ...

This probably needs more meditation.

-- 
Regards/Gruss,
Boris.



* Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-12 18:03           ` Borislav Petkov
@ 2012-03-27 17:06             ` Borislav Petkov
  2012-03-27 18:35               ` Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-03-27 17:06 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Mauro Carvalho Chehab, Ingo Molnar, EDAC devel, LKML

On Mon, Mar 12, 2012 at 07:03:59PM +0100, Borislav Petkov wrote:
> On Mon, Mar 12, 2012 at 04:59:37PM +0000, Luck, Tony wrote:
> > > Sounds better, especially the close-on-exit part. Please elaborate on
> > > the races...
> > 
> > Errors are happening asynchronously to everything. Race looks like:
> > 
> > Daemon exits (or is killed)
> >    <<<< race begins here
> > kernel close routine called
> > close routine updates your global variable
> >    <<<< race ends here
> 
> Well, in that case, we're going to miss logging a single error, or log
> it incompletely.
> 
> Unless we make the global variable atomic and make the daemon zero it
> as the first action it does when it starts going away. If it is killed,
> then we probably need some sanity-checking functionality which checks
> periodically whether the daemon is still alive ...
> 
> This probably needs more meditation.

Ok, hm, how about we add a timer which runs for a safe period of say...
a couple of minutes after the error has been logged into the buffer.

Before it expires we expect that the userspace daemon comes in and
consumes the information - we test explicitly whether it wrote to some
file - or implicitly by checking whether the buffer got emptied in the
meantime (the exact method is still TBD).

In any case, if during the safe period of time we haven't received
confirmation from userspace that the item has been consumed, we switch
irreversibly back to the kernel log buffer and reissue the decoded info
through printk.

This way we

* don't introduce a device file with a ->close

* remain race-agnostic: either the timeout has happened and userspace
hasn't consumed the decoded data or it worked just fine and we continue
on with our merry error collection.

If other errors happen while the timer is running, we log them as usual
and restart the timer to give the newest error an equal chance. Error
size shouldn't overflow the buffer because we're reserving 4 pages per
CPU currently and this can easily be enlarged...
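
The timeout logic can be sketched roughly as follows. This is a
userspace model with time passed in explicitly, not kernel timer code;
SAFE_PERIOD_SECS, struct ras_state and the three functions are
hypothetical names, and the 120 second value is just a placeholder for
"a couple of minutes":

```c
#include <stdbool.h>

#define SAFE_PERIOD_SECS 120	/* assumed value for the safe period */

enum ras_sink { SINK_BUFFER, SINK_PRINTK };

struct ras_state {
	enum ras_sink sink;	/* where decoded errors currently go */
	bool pending;		/* unconsumed data sits in the buffer */
	long deadline;		/* time at which the safe period expires */
};

/* An error was decoded at time 'now' (seconds). Returns the sink used. */
enum ras_sink ras_log_error(struct ras_state *s, long now)
{
	if (s->sink == SINK_BUFFER) {
		s->pending = true;
		/* Restart the timer so the newest error gets a full period. */
		s->deadline = now + SAFE_PERIOD_SECS;
	}
	return s->sink;
}

/* Userspace confirmed consumption, or the buffer was found empty. */
void ras_consumed(struct ras_state *s)
{
	s->pending = false;
}

/*
 * Timer check: returns true when the safe period expired with data
 * still pending, i.e. the caller should reissue it through printk.
 */
bool ras_timer_tick(struct ras_state *s, long now)
{
	if (s->sink == SINK_BUFFER && s->pending && now >= s->deadline) {
		s->sink = SINK_PRINTK;	/* irreversible in this proposal */
		s->pending = false;
		return true;
	}
	return false;
}
```

Once ras_timer_tick() has returned true, ras_log_error() keeps
returning SINK_PRINTK, which models the irreversible switch.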

Hmm, thoughts..?

-- 
Regards/Gruss,
Boris.



* RE: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-27 17:06             ` Borislav Petkov
@ 2012-03-27 18:35               ` Luck, Tony
  2012-03-27 19:11                 ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2012-03-27 18:35 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Mauro Carvalho Chehab, Ingo Molnar, EDAC devel, LKML

> In any case, if during the safe period of time we haven't received
> confirmation from userspace that the item has been consumed, we switch
> irreversibly back to the kernel log buffer and reissue the decoded info
> through printk.

I'm not sure I like irreversible things.

Here's the life cycle:

1) System boots ... we have a window during this time where there is
   no daemon (or any user space at all).

2) Daemon gets started from /etc/init.d or systemd script

3) (optional) New version of daemon installed in update (old daemon is terminated, new one starts).

4) System is shutdown - all daemons terminated

5) System actually halts.

So we clearly have some gaps where there isn't a daemon.  Most of them should
be pretty short ... but I worry about the gap from #1 to #2 - which can be pretty
long if we need to fsck some disks (or we are on some crazy big system that takes
many minutes just to find and spin-up all the disks).

-Tony


* Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
  2012-03-27 18:35               ` Luck, Tony
@ 2012-03-27 19:11                 ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-03-27 19:11 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar, EDAC devel, LKML

On Tue, Mar 27, 2012 at 06:35:37PM +0000, Luck, Tony wrote:
> > In any case, if during the safe period of time we haven't received
> > confirmation from userspace that the item has been consumed, we switch
> > irreversibly back to the kernel log buffer and reissue the decoded info
> > through printk.
> 
> I'm not sure I like irreversible things.
> 
> Here's the life cycle:
> 
> 1) System boots ... we have a window during this time where there is
>    no daemon (or any user space at all).
> 
> 2) Daemon gets started from /etc/init.d or systemd script
> 
> 3) (optional) New version of daemon installed in update (old daemon is terminated, new one starts).
> 
> 4) System is shutdown - all daemons terminated
> 
> 5) System actually halts.
> 
> 
> So we clearly have some gaps where there isn't a daemon.  Most of them should
> be pretty short ... but I worry about the gap from #1 to #2 - which can be pretty
> long if we need to fsck some disks (or we are on some crazy big system that takes
> many minutes just to find and spin-up all the disks).

Well, currently we queue MCEs for later consumption before the decoder
chains have been registered etc: 0937195715713. We probably could delay
the draining of the buffer until we have userspace and daemon running.

The problem with this is that the buffer size is limited: 32 struct mce's,
and it can overflow pretty fast on a b0rked system which spews a lot of
MCEs during boot.

We probably could provide for enlarging that when needed as a Kconfig or
a boot option using early memblock allocations or whatever...

Then, after maybe a configurable period of uptime (it should be chosen
to be safe for most systems out there and the others could configure in
a higher timeout if they need to) we start spewing out decoded MCEs into
dmesg unless a daemon has drained the buffers before that.

Or something to that effect...

Concerning the irreversibility, we could probably teach the code to stop
printk'ing MCEs if the daemon has been restarted in the meantime...
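
A rough userspace sketch of the boot-time queue plus the uptime-based
fallback, under stated assumptions: struct mce_stub is a placeholder and
not the kernel's struct mce, and boot_queue_add(), boot_queue_drain()
and pick_sink() are hypothetical helpers. Only the queue length of 32
comes from the description above:

```c
#include <stdbool.h>
#include <stddef.h>

#define BOOT_QUEUE_LEN 32	/* the 32 entries mentioned above */

/* Placeholder record, not the kernel's struct mce. */
struct mce_stub {
	unsigned long long status;
};

struct boot_queue {
	struct mce_stub ent[BOOT_QUEUE_LEN];
	size_t count;		/* entries currently queued */
	size_t dropped;		/* errors lost to overflow */
};

/* Queue an early error; returns false and counts a drop when full. */
bool boot_queue_add(struct boot_queue *q, struct mce_stub m)
{
	if (q->count == BOOT_QUEUE_LEN) {
		q->dropped++;
		return false;
	}
	q->ent[q->count++] = m;
	return true;
}

/* Empty the queue; returns how many entries were handed out. */
size_t boot_queue_drain(struct boot_queue *q)
{
	size_t n = q->count;

	q->count = 0;
	return n;
}

enum sink { SINK_HOLD, SINK_DAEMON, SINK_DMESG };

/* Decide where decoded MCEs go, given daemon state and uptime. */
enum sink pick_sink(bool daemon_running, long uptime, long timeout)
{
	if (daemon_running)
		return SINK_DAEMON;	/* also covers a restarted daemon */
	return uptime < timeout ? SINK_HOLD : SINK_DMESG;
}
```

Checking daemon_running first is what makes the fallback reversible in
this sketch: a restarted daemon takes over again even after the timeout
has passed.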

Thanks.

-- 
Regards/Gruss,
Boris.




Thread overview: 12+ messages
2012-03-06 13:31 [RFC -v3 PATCH 0/3] RAS: Use MCE tracepoint for decoded MCEs Borislav Petkov
2012-03-06 13:31 ` [PATCH 1/3] mce: Add a msg string to the MCE tracepoint Borislav Petkov
2012-03-06 13:31 ` [PATCH 2/3] x86, RAS: Add a decoded msg buffer Borislav Petkov
2012-03-06 13:31 ` [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer Borislav Petkov
2012-03-06 15:42   ` Mauro Carvalho Chehab
2012-03-12 16:18     ` Luck, Tony
2012-03-12 16:26       ` Borislav Petkov
2012-03-12 16:59         ` Luck, Tony
2012-03-12 18:03           ` Borislav Petkov
2012-03-27 17:06             ` Borislav Petkov
2012-03-27 18:35               ` Luck, Tony
2012-03-27 19:11                 ` Borislav Petkov
