* New eMCA trace event interface
@ 2014-05-15  8:30 Chen, Gong
  2014-05-15  8:30 ` [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event Chen, Gong
                   ` (7 more replies)
  0 siblings, 8 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi

[PATCH 1/7 v5] trace, RAS: Add basic RAS trace event
[PATCH 2/7 v3] trace, AER: Move trace into unified interface
[PATCH 3/7 v4] CPER: Adjust code flow of some functions
[PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem
[PATCH 5/7 v5] trace, RAS: Add eMCA trace event interface
[PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
[PATCH 7/7] RAS, extlog: Adjust init flow


This patch series adds a new eMCA trace event interface. To avoid conflicts
with the existing interfaces, it uses a new unified trace event stub in the
kernel. The new trace interface is mutually exclusive with console messages;
the choice is made by a knob under debugfs. The knob is a reference counter:
opening it increments the counter and closing it decrements the counter.
While the counter is greater than 0, events go to the trace; otherwise,
messages are routed to the console.

dmesg output therefore never overlaps with trace output: only one of the two
is active at any given time.
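
The routing rule above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: the names knob_open(), knob_release() and pick_route() are hypothetical and do not appear in the series, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of the debugfs knob; names are illustrative. */
enum log_route { ROUTE_CONSOLE, ROUTE_TRACE };

/* Number of readers that currently hold the knob open. */
static atomic_int consumers = 0;

/* Models a reader opening the knob: one more consumer. */
static void knob_open(void)
{
	atomic_fetch_add(&consumers, 1);
}

/* Models a reader closing the knob: one consumer fewer. */
static void knob_release(void)
{
	int prev = atomic_fetch_sub(&consumers, 1);

	assert(prev > 0);	/* a release without a matching open is a bug */
}

/* Trace is used while at least one consumer exists, console otherwise. */
static enum log_route pick_route(void)
{
	return atomic_load(&consumers) > 0 ? ROUTE_TRACE : ROUTE_CONSOLE;
}
```

With no reader, pick_route() yields ROUTE_CONSOLE; it switches to ROUTE_TRACE after the first knob_open() and back to the console only once every open has been matched by a knob_release().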

When dmesg is used, you will get:

...
[  157.802455] {1}Hardware error detected on CPU0
[  157.802460] {1}It has been corrected by h/w and requires no further action
[  157.802463] {1}event severity: corrected
[  157.802465] {1} Error 0, type: corrected
[  157.802467] {1}  section_type: memory error
[  157.802469] {1}  physical_address: 0x000000042c201000
[  157.802472] {1}  node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 25232 column: 1408
[  157.802474] {1}  DIMM location: Memriser1 CHANNEL A DIMM 0
[  416.121727] {2}Hardware error detected on CPU0
[  416.121732] {2}It has been corrected by h/w and requires no further action
[  416.121734] {2}event severity: corrected
[  416.121736] {2} Error 0, type: corrected
[  416.121738] {2}  section_type: memory error
[  416.121740] {2}  physical_address: 0x000000042e0fd000
[  416.121742] {2}  node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 27279 column: 1480
[  416.121744] {2}  DIMM location: Memriser1 CHANNEL A DIMM 0
...

When trace is used, you will get:

...
# tracer: nop
# 
#  entries-in-buffer/entries-written: 2/2   #P:60
# 
#                               _-----=> irqs-off
#                              / _----=> need-resched
#                             | / _---=> hardirq/softirq
#                             || / _--=> preempt-depth
#                             ||| /     delay
#            TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#               | |       |   ||||       |         |
            <idle>-0     [000] dNh3   281.772573: extlog_mem_event: 1 corrected error: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 0x0000000074516000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 7329 column: 656 FRU: 00000000-0000-0000-0000-000000000000
            <idle>-0     [000] d.h3   364.449573: extlog_mem_event: 2 corrected errors: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 0x0000000424b0b000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 26320 column: 1176 FRU: 00000000-0000-0000-0000-000000000000

v3 -> v2: adjust RAS subsystem format & bunch of minor adjustments.
v2 -> v1: merge the comments from Tony Luck & Borislav Petkov.



* [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-15  8:30 ` [PATCH 2/7 v3] trace, AER: Move trace into unified interface Chen, Gong
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

To avoid confusion and conflicting usage among RAS related trace events,
add a unified RAS trace event stub.

v5 -> v4: remove explicit RAS menuconfig.
v4 -> v3: change dependency rule of RAS_TRACE.
v3 -> v2: fix dependency in Kconfig.
v2 -> v1: adjust Kconfig to take RAS as a separate subsystem.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/Kconfig        |  2 ++
 drivers/Makefile       |  1 +
 drivers/edac/Kconfig   |  1 +
 drivers/edac/edac_mc.c |  3 ---
 drivers/ras/Kconfig    |  6 ++++++
 drivers/ras/Makefile   |  1 +
 drivers/ras/ras.c      | 12 ++++++++++++
 7 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 drivers/ras/Kconfig
 create mode 100644 drivers/ras/Makefile
 create mode 100644 drivers/ras/ras.c

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0a0a90f..547e73a 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -174,4 +174,6 @@ source "drivers/powercap/Kconfig"
 
 source "drivers/mcb/Kconfig"
 
+source "drivers/ras/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index d05d81b..71a4985 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -157,3 +157,4 @@ obj-$(CONFIG_NTB)		+= ntb/
 obj-$(CONFIG_FMC)		+= fmc/
 obj-$(CONFIG_POWERCAP)		+= powercap/
 obj-$(CONFIG_MCB)		+= mcb/
+obj-$(CONFIG_RAS)		+= ras/
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 878f090..1589a86 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -72,6 +72,7 @@ config EDAC_MCE_INJ
 
 config EDAC_MM_EDAC
 	tristate "Main Memory EDAC (Error Detection And Correction) reporting"
+	select RAS_TRACE
 	help
 	  Some systems are able to detect and correct errors in main
 	  memory.  EDAC can report statistics on memory error
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 33edd67..28c1695 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,9 +33,6 @@
 #include <asm/edac.h>
 #include "edac_core.h"
 #include "edac_module.h"
-
-#define CREATE_TRACE_POINTS
-#define TRACE_INCLUDE_PATH ../../include/ras
 #include <ras/ras_event.h>
 
 /* lock to memory controller's control array */
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
new file mode 100644
index 0000000..85febfd
--- /dev/null
+++ b/drivers/ras/Kconfig
@@ -0,0 +1,6 @@
+config RAS_TRACE
+	def_bool n
+	select RAS
+
+config RAS
+	bool
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
new file mode 100644
index 0000000..223e806
--- /dev/null
+++ b/drivers/ras/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_RAS) += ras.o
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
new file mode 100644
index 0000000..b0c6ed1
--- /dev/null
+++ b/drivers/ras/ras.c
@@ -0,0 +1,12 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Authors:
+ *	Chen, Gong <gong.chen@linux.intel.com>
+ */
+
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
-- 
2.0.0.rc0



* [PATCH 2/7 v3] trace, AER: Move trace into unified interface
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
  2014-05-15  8:30 ` [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-21 10:19   ` Borislav Petkov
  2014-05-15  8:30 ` [PATCH 3/7 v4] CPER: Adjust code flow of some functions Chen, Gong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

AER currently uses a separate trace interface. For consistency,
move it into the unified RAS trace interface.

v3 -> v2: change dependency rule of RAS_TRACE.
v2 -> v1: remove unnecessary dependency in drivers/ras/Kconfig.
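
For readers unfamiliar with how the aer_event output is built, the sketch below imitates in userspace C what __print_flags() does with the status word: it joins the names of all set bits with a delimiter. decode_flags() and struct flag_name are hypothetical helpers written for this illustration; only the bit names are taken from the aer_correctable_errors table in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
	uint32_t mask;
	const char *name;
};

#define BIT(n) (UINT32_C(1) << (n))

/* Bit names copied from the aer_correctable_errors table in this patch. */
static const struct flag_name correctable[] = {
	{ BIT(0),  "Receiver Error" },
	{ BIT(6),  "Bad TLP" },
	{ BIT(7),  "Bad DLLP" },
	{ BIT(8),  "RELAY_NUM Rollover" },
	{ BIT(12), "Replay Timer Timeout" },
	{ BIT(13), "Advisory Non-Fatal" },
};

/*
 * Hypothetical helper: write the names of all bits set in `status` into
 * `out`, separated by `delim`. Output is truncated if `out` is too small.
 * Returns the length of the string stored in `out`.
 */
static size_t decode_flags(uint32_t status, const char *delim,
			   const struct flag_name *tbl, size_t n_tbl,
			   char *out, size_t out_len)
{
	size_t used = 0;

	assert(out && out_len > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n_tbl; i++) {
		int w;

		if (!(status & tbl[i].mask))
			continue;
		/* Prepend the delimiter for every name except the first. */
		w = snprintf(out + used, out_len - used, "%s%s",
			     used ? delim : "", tbl[i].name);
		if (w < 0)
			break;
		if ((size_t)w >= out_len - used)
			return strlen(out);	/* truncated: report what fit */
		used += (size_t)w;
	}
	return used;
}
```

Calling decode_flags() with a status of BIT(0) | BIT(6) and "|" as the delimiter produces the same kind of string that appears after "severity=..." in the trace line.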

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/pci/pcie/aer/Kconfig           |  1 +
 drivers/pci/pcie/aer/aerdrv_errprint.c |  4 +-
 include/ras/ras_event.h                | 64 ++++++++++++++++++++++++++++
 include/trace/events/ras.h             | 77 ----------------------------------
 4 files changed, 66 insertions(+), 80 deletions(-)
 delete mode 100644 include/trace/events/ras.h

diff --git a/drivers/pci/pcie/aer/Kconfig b/drivers/pci/pcie/aer/Kconfig
index 50e94e0..c611384 100644
--- a/drivers/pci/pcie/aer/Kconfig
+++ b/drivers/pci/pcie/aer/Kconfig
@@ -5,6 +5,7 @@
 config PCIEAER
 	boolean "Root Port Advanced Error Reporting support"
 	depends on PCIEPORTBUS
+	select RAS_TRACE
 	default y
 	help
 	  This enables PCI Express Root Port Advanced Error Reporting
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 34ff702..73e73b7 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -22,9 +22,7 @@
 #include <linux/cper.h>
 
 #include "aerdrv.h"
-
-#define CREATE_TRACE_POINTS
-#include <trace/events/ras.h>
+#include <ras/ras_event.h>
 
 #define AER_AGENT_RECEIVER		0
 #define AER_AGENT_REQUESTER		1
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 21cdb0b..acbcbb8 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -8,6 +8,7 @@
 #include <linux/tracepoint.h>
 #include <linux/edac.h>
 #include <linux/ktime.h>
+#include <linux/aer.h>
 
 /*
  * Hardware Events Report
@@ -94,6 +95,69 @@ TRACE_EVENT(mc_event,
 		  __get_str(driver_detail))
 );
 
+/*
+ * PCIe AER Trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event on a PCIe device. The event report has
+ * the following structure:
+ *
+ * char * dev_name -	The name of the slot where the device resides
+ *			([domain:]bus:device.function).
+ * u32 status -		Either the correctable or uncorrectable register
+ *			indicating what error or errors have been seen
+ * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
+ */
+
+#define aer_correctable_errors		\
+	{BIT(0),	"Receiver Error"},		\
+	{BIT(6),	"Bad TLP"},			\
+	{BIT(7),	"Bad DLLP"},			\
+	{BIT(8),	"RELAY_NUM Rollover"},		\
+	{BIT(12),	"Replay Timer Timeout"},	\
+	{BIT(13),	"Advisory Non-Fatal"}
+
+#define aer_uncorrectable_errors		\
+	{BIT(4),	"Data Link Protocol"},		\
+	{BIT(12),	"Poisoned TLP"},		\
+	{BIT(13),	"Flow Control Protocol"},	\
+	{BIT(14),	"Completion Timeout"},		\
+	{BIT(15),	"Completer Abort"},		\
+	{BIT(16),	"Unexpected Completion"},	\
+	{BIT(17),	"Receiver Overflow"},		\
+	{BIT(18),	"Malformed TLP"},		\
+	{BIT(19),	"ECRC"},			\
+	{BIT(20),	"Unsupported Request"}
+
+TRACE_EVENT(aer_event,
+	TP_PROTO(const char *dev_name,
+		 const u32 status,
+		 const u8 severity),
+
+	TP_ARGS(dev_name, status, severity),
+
+	TP_STRUCT__entry(
+		__string(	dev_name,	dev_name	)
+		__field(	u32,		status		)
+		__field(	u8,		severity	)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->status		= status;
+		__entry->severity	= severity;
+	),
+
+	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
+		__get_str(dev_name),
+		__entry->severity == AER_CORRECTABLE ? "Corrected" :
+			__entry->severity == AER_FATAL ?
+			"Fatal" : "Uncorrected, non-fatal",
+		__entry->severity == AER_CORRECTABLE ?
+		__print_flags(__entry->status, "|", aer_correctable_errors) :
+		__print_flags(__entry->status, "|", aer_uncorrectable_errors))
+);
+
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
deleted file mode 100644
index 1c875ad..0000000
--- a/include/trace/events/ras.h
+++ /dev/null
@@ -1,77 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM ras
-
-#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_AER_H
-
-#include <linux/tracepoint.h>
-#include <linux/aer.h>
-
-
-/*
- * PCIe AER Trace event
- *
- * These events are generated when hardware detects a corrected or
- * uncorrected event on a PCIe device. The event report has
- * the following structure:
- *
- * char * dev_name -	The name of the slot where the device resides
- *			([domain:]bus:device.function).
- * u32 status -		Either the correctable or uncorrectable register
- *			indicating what error or errors have been seen
- * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
- */
-
-#define aer_correctable_errors		\
-	{BIT(0),	"Receiver Error"},		\
-	{BIT(6),	"Bad TLP"},			\
-	{BIT(7),	"Bad DLLP"},			\
-	{BIT(8),	"RELAY_NUM Rollover"},		\
-	{BIT(12),	"Replay Timer Timeout"},	\
-	{BIT(13),	"Advisory Non-Fatal"}
-
-#define aer_uncorrectable_errors		\
-	{BIT(4),	"Data Link Protocol"},		\
-	{BIT(12),	"Poisoned TLP"},		\
-	{BIT(13),	"Flow Control Protocol"},	\
-	{BIT(14),	"Completion Timeout"},		\
-	{BIT(15),	"Completer Abort"},		\
-	{BIT(16),	"Unexpected Completion"},	\
-	{BIT(17),	"Receiver Overflow"},		\
-	{BIT(18),	"Malformed TLP"},		\
-	{BIT(19),	"ECRC"},			\
-	{BIT(20),	"Unsupported Request"}
-
-TRACE_EVENT(aer_event,
-	TP_PROTO(const char *dev_name,
-		 const u32 status,
-		 const u8 severity),
-
-	TP_ARGS(dev_name, status, severity),
-
-	TP_STRUCT__entry(
-		__string(	dev_name,	dev_name	)
-		__field(	u32,		status		)
-		__field(	u8,		severity	)
-	),
-
-	TP_fast_assign(
-		__assign_str(dev_name, dev_name);
-		__entry->status		= status;
-		__entry->severity	= severity;
-	),
-
-	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
-		__get_str(dev_name),
-		__entry->severity == AER_CORRECTABLE ? "Corrected" :
-			__entry->severity == AER_FATAL ?
-			"Fatal" : "Uncorrected, non-fatal",
-		__entry->severity == AER_CORRECTABLE ?
-		__print_flags(__entry->status, "|", aer_correctable_errors) :
-		__print_flags(__entry->status, "|", aer_uncorrectable_errors))
-);
-
-#endif /* _TRACE_AER_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
-- 
2.0.0.rc0



* [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
  2014-05-15  8:30 ` [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event Chen, Gong
  2014-05-15  8:30 ` [PATCH 2/7 v3] trace, AER: Move trace into unified interface Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-21 11:05   ` Borislav Petkov
  2014-05-15  8:30 ` [PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem Chen, Gong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

Some code can be reorganized into common functions so that other users can
call it.

v4 -> v3: minor adjustment to make the output format more readable.
v3 -> v2: Fix a bug when calculating string length & minor fix.
v2 -> v1: Use scnprintf to simplify the code.
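
The append pattern that scnprintf enables in cper_mem_err_location() can be tried out in userspace. The following is a sketch under assumptions: scn_append() is a hypothetical stand-in for the kernel's scnprintf() built on vsnprintf(), and struct mem_loc is an invented subset of the memory error fields, not the real struct cper_sec_mem_err.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define REC_LEN 256	/* mirrors CPER_REC_LEN from this patch */

/*
 * Userspace stand-in for scnprintf(): returns the number of characters
 * actually stored (excluding the NUL), which is never more than size - 1.
 */
static int scn_append(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int w;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	w = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (w < 0)
		return 0;
	return (size_t)w >= size ? (int)(size - 1) : w;
}

/* Hypothetical subset of the location fields, for illustration only. */
struct mem_loc {
	int has_node, node;
	int has_card, card;
	int has_row, row;
};

/* Append each valid field to msg; msg must hold REC_LEN bytes. */
static int format_location(const struct mem_loc *m, char *msg)
{
	size_t len = REC_LEN - 1;
	size_t n = 0;

	assert(m && msg);
	msg[0] = '\0';
	if (m->has_node)
		n += scn_append(msg + n, len - n, "node: %d ", m->node);
	if (m->has_card)
		n += scn_append(msg + n, len - n, "card: %d ", m->card);
	if (m->has_row)
		n += scn_append(msg + n, len - n, "row: %d ", m->row);
	return (int)n;
}
```

Because the helper returns what was stored rather than what would have been stored, the running offset n can never move past the end of the buffer, so each append needs no separate bounds check.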

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/firmware/efi/cper.c | 158 +++++++++++++++++++++++++++++---------------
 include/linux/cper.h        |  11 +++
 2 files changed, 114 insertions(+), 55 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 1491dd4..a53723a 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -34,6 +34,10 @@
 #include <linux/aer.h>
 
 #define INDENT_SP	" "
+
+static char mem_location[CPER_REC_LEN];
+static char dimm_location[CPER_REC_LEN];
+
 /*
  * CPER record ID need to be unique even after reboot, because record
  * ID is used as index for ERST storage, while CPER records from
@@ -50,18 +54,19 @@ u64 cper_next_record_id(void)
 }
 EXPORT_SYMBOL_GPL(cper_next_record_id);
 
-static const char *cper_severity_strs[] = {
+static const char * const severity_strs[] = {
 	"recoverable",
 	"fatal",
 	"corrected",
 	"info",
 };
 
-static const char *cper_severity_str(unsigned int severity)
+const char *cper_severity_str(unsigned int severity)
 {
-	return severity < ARRAY_SIZE(cper_severity_strs) ?
-		cper_severity_strs[severity] : "unknown";
+	return severity < ARRAY_SIZE(severity_strs) ?
+		severity_strs[severity] : "unknown";
 }
+EXPORT_SYMBOL_GPL(cper_severity_str);
 
 /*
  * cper_print_bits - print strings for set bits
@@ -100,32 +105,32 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 		printk("%s\n", buf);
 }
 
-static const char * const cper_proc_type_strs[] = {
+static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
 };
 
-static const char * const cper_proc_isa_strs[] = {
+static const char * const proc_isa_strs[] = {
 	"IA32",
 	"IA64",
 	"X64",
 };
 
-static const char * const cper_proc_error_type_strs[] = {
+static const char * const proc_error_type_strs[] = {
 	"cache error",
 	"TLB error",
 	"bus error",
 	"micro-architectural error",
 };
 
-static const char * const cper_proc_op_strs[] = {
+static const char * const proc_op_strs[] = {
 	"unknown or generic",
 	"data read",
 	"data write",
 	"instruction execution",
 };
 
-static const char * const cper_proc_flag_strs[] = {
+static const char * const proc_flag_strs[] = {
 	"restartable",
 	"precise IP",
 	"overflow",
@@ -137,26 +142,26 @@ static void cper_print_proc_generic(const char *pfx,
 {
 	if (proc->validation_bits & CPER_PROC_VALID_TYPE)
 		printk("%s""processor_type: %d, %s\n", pfx, proc->proc_type,
-		       proc->proc_type < ARRAY_SIZE(cper_proc_type_strs) ?
-		       cper_proc_type_strs[proc->proc_type] : "unknown");
+		       proc->proc_type < ARRAY_SIZE(proc_type_strs) ?
+		       proc_type_strs[proc->proc_type] : "unknown");
 	if (proc->validation_bits & CPER_PROC_VALID_ISA)
 		printk("%s""processor_isa: %d, %s\n", pfx, proc->proc_isa,
-		       proc->proc_isa < ARRAY_SIZE(cper_proc_isa_strs) ?
-		       cper_proc_isa_strs[proc->proc_isa] : "unknown");
+		       proc->proc_isa < ARRAY_SIZE(proc_isa_strs) ?
+		       proc_isa_strs[proc->proc_isa] : "unknown");
 	if (proc->validation_bits & CPER_PROC_VALID_ERROR_TYPE) {
 		printk("%s""error_type: 0x%02x\n", pfx, proc->proc_error_type);
 		cper_print_bits(pfx, proc->proc_error_type,
-				cper_proc_error_type_strs,
-				ARRAY_SIZE(cper_proc_error_type_strs));
+				proc_error_type_strs,
+				ARRAY_SIZE(proc_error_type_strs));
 	}
 	if (proc->validation_bits & CPER_PROC_VALID_OPERATION)
 		printk("%s""operation: %d, %s\n", pfx, proc->operation,
-		       proc->operation < ARRAY_SIZE(cper_proc_op_strs) ?
-		       cper_proc_op_strs[proc->operation] : "unknown");
+		       proc->operation < ARRAY_SIZE(proc_op_strs) ?
+		       proc_op_strs[proc->operation] : "unknown");
 	if (proc->validation_bits & CPER_PROC_VALID_FLAGS) {
 		printk("%s""flags: 0x%02x\n", pfx, proc->flags);
-		cper_print_bits(pfx, proc->flags, cper_proc_flag_strs,
-				ARRAY_SIZE(cper_proc_flag_strs));
+		cper_print_bits(pfx, proc->flags, proc_flag_strs,
+				ARRAY_SIZE(proc_flag_strs));
 	}
 	if (proc->validation_bits & CPER_PROC_VALID_LEVEL)
 		printk("%s""level: %d\n", pfx, proc->level);
@@ -177,7 +182,7 @@ static void cper_print_proc_generic(const char *pfx,
 		printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
-static const char *cper_mem_err_type_strs[] = {
+static const char * const mem_err_type_strs[] = {
 	"unknown",
 	"no error",
 	"single-bit ECC",
@@ -196,58 +201,101 @@ static const char *cper_mem_err_type_strs[] = {
 	"physical memory map-out event",
 };
 
-static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+const char *cper_mem_err_type_str(unsigned int etype)
 {
-	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
-		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
-	if (mem->validation_bits & CPER_MEM_VALID_PA)
-		printk("%s""physical_address: 0x%016llx\n",
-		       pfx, mem->physical_addr);
-	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
-		printk("%s""physical_address_mask: 0x%016llx\n",
-		       pfx, mem->physical_addr_mask);
+	return etype < ARRAY_SIZE(mem_err_type_strs) ?
+		mem_err_type_strs[etype] : "unknown";
+}
+EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
+
+int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg)
+{
+	u32 len, n;
+
+	if (!msg)
+		return 0;
+
+	n = 0;
+	len = CPER_REC_LEN - 1;
 	if (mem->validation_bits & CPER_MEM_VALID_NODE)
-		pr_debug("node: %d\n", mem->node);
+		n += scnprintf(msg + n, len - n, "node: %d ", mem->node);
 	if (mem->validation_bits & CPER_MEM_VALID_CARD)
-		pr_debug("card: %d\n", mem->card);
+		n += scnprintf(msg + n, len - n, "card: %d ", mem->card);
 	if (mem->validation_bits & CPER_MEM_VALID_MODULE)
-		pr_debug("module: %d\n", mem->module);
+		n += scnprintf(msg + n, len - n, "module: %d ", mem->module);
 	if (mem->validation_bits & CPER_MEM_VALID_RANK_NUMBER)
-		pr_debug("rank: %d\n", mem->rank);
+		n += scnprintf(msg + n, len - n, "rank: %d ", mem->rank);
 	if (mem->validation_bits & CPER_MEM_VALID_BANK)
-		pr_debug("bank: %d\n", mem->bank);
+		n += scnprintf(msg + n, len - n, "bank: %d ", mem->bank);
 	if (mem->validation_bits & CPER_MEM_VALID_DEVICE)
-		pr_debug("device: %d\n", mem->device);
+		n += scnprintf(msg + n, len - n, "device: %d ", mem->device);
 	if (mem->validation_bits & CPER_MEM_VALID_ROW)
-		pr_debug("row: %d\n", mem->row);
+		n += scnprintf(msg + n, len - n, "row: %d ", mem->row);
 	if (mem->validation_bits & CPER_MEM_VALID_COLUMN)
-		pr_debug("column: %d\n", mem->column);
+		n += scnprintf(msg + n, len - n, "column: %d ", mem->column);
 	if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION)
-		pr_debug("bit_position: %d\n", mem->bit_pos);
+		n += scnprintf(msg + n, len - n, "bit_position: %d ",
+			       mem->bit_pos);
 	if (mem->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
-		pr_debug("requestor_id: 0x%016llx\n", mem->requestor_id);
+		n += scnprintf(msg + n, len - n, "requestor_id: 0x%016llx ",
+			       mem->requestor_id);
 	if (mem->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
-		pr_debug("responder_id: 0x%016llx\n", mem->responder_id);
+		n += scnprintf(msg + n, len - n, "responder_id: 0x%016llx ",
+			       mem->responder_id);
 	if (mem->validation_bits & CPER_MEM_VALID_TARGET_ID)
-		pr_debug("target_id: 0x%016llx\n", mem->target_id);
+		scnprintf(msg + n, len - n, "target_id: 0x%016llx ",
+			  mem->target_id);
+
+	return n;
+}
+EXPORT_SYMBOL_GPL(cper_mem_err_location);
+
+int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg)
+{
+	u32 len, n;
+	const char *bank = NULL, *device = NULL;
+
+	if (!msg || !(mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE))
+		return 0;
+
+	n = 0;
+	len = CPER_REC_LEN - 1;
+	dmi_memdev_name(mem->mem_dev_handle, &bank, &device);
+	if (bank && device)
+		n = snprintf(msg, len, "DIMM location: %s %s", bank, device);
+	else
+		n = snprintf(msg, len,
+			     "DIMM location: not present. DMI handle: 0x%.4x",
+			     mem->mem_dev_handle);
+
+	return n;
+}
+EXPORT_SYMBOL_GPL(cper_dimm_err_location);
+
+static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+{
+	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
+		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
+	if (mem->validation_bits & CPER_MEM_VALID_PA)
+		printk("%s""physical_address: 0x%016llx\n",
+		       pfx, mem->physical_addr);
+	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
+		printk("%s""physical_address_mask: 0x%016llx\n",
+		       pfx, mem->physical_addr_mask);
+	memset(mem_location, 0, CPER_REC_LEN);
+	if (cper_mem_err_location(mem, mem_location))
+		printk("%s%s\n", pfx, mem_location);
 	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
 		u8 etype = mem->error_type;
 		printk("%s""error_type: %d, %s\n", pfx, etype,
-		       etype < ARRAY_SIZE(cper_mem_err_type_strs) ?
-		       cper_mem_err_type_strs[etype] : "unknown");
-	}
-	if (mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		const char *bank = NULL, *device = NULL;
-		dmi_memdev_name(mem->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
-			printk("%s""DIMM location: %s %s", pfx, bank, device);
-		else
-			printk("%s""DIMM DMI handle: 0x%.4x",
-			       pfx, mem->mem_dev_handle);
+		       cper_mem_err_type_str(etype));
 	}
+	memset(dimm_location, 0, CPER_REC_LEN);
+	if (cper_dimm_err_location(mem, dimm_location))
+		printk("%s%s\n", pfx, dimm_location);
 }
 
-static const char *cper_pcie_port_type_strs[] = {
+static const char * const pcie_port_type_strs[] = {
 	"PCIe end point",
 	"legacy PCI end point",
 	"unknown",
@@ -266,8 +314,8 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
 {
 	if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
 		printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
-		       pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
-		       cper_pcie_port_type_strs[pcie->port_type] : "unknown");
+		       pcie->port_type < ARRAY_SIZE(pcie_port_type_strs) ?
+		       pcie_port_type_strs[pcie->port_type] : "unknown");
 	if (pcie->validation_bits & CPER_PCIE_VALID_VERSION)
 		printk("%s""version: %d.%d\n", pfx,
 		       pcie->version.major, pcie->version.minor);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 2fc0ec3..dc84337 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -36,6 +36,13 @@
 #define CPER_RECORD_REV				0x0100
 
 /*
+ * CPER record length contains the CPER fields which are relevant for further
+ * handling of a memory error in userspace (we don't carry all the fields
+ * defined in the UEFI spec because some of them don't make any sense.)
+ * Currently, a length of 256 should be more than enough.
+ */
+#define CPER_REC_LEN					256
+/*
  * Severity difinition for error_severity in struct cper_record_header
  * and section_severity in struct cper_section_descriptor
  */
@@ -395,7 +402,11 @@ struct cper_sec_pcie {
 #pragma pack()
 
 u64 cper_next_record_id(void);
+const char *cper_severity_str(unsigned int);
+const char *cper_mem_err_type_str(unsigned int);
 void cper_print_bits(const char *prefix, unsigned int bits,
 		     const char * const strs[], unsigned int strs_size);
+int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg);
+int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg);
 
 #endif
-- 
2.0.0.rc0



* [PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
                   ` (2 preceding siblings ...)
  2014-05-15  8:30 ` [PATCH 3/7 v4] CPER: Adjust code flow of some functions Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-15  8:30 ` [PATCH 5/7 v5] trace, RAS: Add eMCA trace event interface Chen, Gong
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

Implement a new debugfs interface for the RAS subsystem.
A file named daemon_active is added there.
This file tracks whether a user space daemon has enabled the
perf/trace interface. One can see which daemon holds it open
via "lsof /path/to/debugfs/ras/daemon_active".

v2 -> v1: Change file access mode from 0444 to 0400.
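
From the daemon's side, registering as a consumer amounts to opening the file and keeping the descriptor for as long as trace events are wanted. The sketch below shows this in userspace C; the function names are hypothetical, and the path assumes debugfs is mounted at /sys/kernel/debug, which this patch does not specify. With the 0400 mode the open is expected to succeed only for root.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Assumed location: debugfs is commonly mounted at /sys/kernel/debug, but
 * the mount point is system-specific and is not stated in this patch.
 */
#define RAS_KNOB_PATH "/sys/kernel/debug/ras/daemon_active"

/*
 * Open the knob read-only and hand back the descriptor. The caller keeps
 * it open while it wants events routed to the trace. Returns -1 on failure.
 */
static int ras_consumer_register(const char *path)
{
	assert(path);
	return open(path, O_RDONLY);
}

/* Close the descriptor, which drops this consumer's reference. */
static int ras_consumer_unregister(int fd)
{
	return fd >= 0 ? close(fd) : -1;
}
```

A daemon would call ras_consumer_register(RAS_KNOB_PATH) at startup and ras_consumer_unregister() on exit; if the process dies, the kernel closes the descriptor and the release hook still runs.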

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/ras/Makefile  |  2 +-
 drivers/ras/debugfs.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/ras/ras.c     | 14 +++++++++++++
 include/linux/ras.h   | 15 ++++++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ras/debugfs.c
 create mode 100644 include/linux/ras.h

diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 223e806..d7f7334 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_RAS) += ras.o
+obj-$(CONFIG_RAS) += ras.o debugfs.o
diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
new file mode 100644
index 0000000..d0bc389
--- /dev/null
+++ b/drivers/ras/debugfs.c
@@ -0,0 +1,57 @@
+#include <linux/debugfs.h>
+
+struct dentry *ras_debugfs_dir;
+EXPORT_SYMBOL_GPL(ras_debugfs_dir);
+
+static atomic_t trace_count = ATOMIC_INIT(0);
+
+int ras_userspace_consumers(void)
+{
+	return atomic_read(&trace_count);
+}
+EXPORT_SYMBOL_GPL(ras_userspace_consumers);
+
+static int trace_show(struct seq_file *m, void *v)
+{
+	return atomic_read(&trace_count);
+}
+
+static int trace_open(struct inode *inode, struct file *file)
+{
+	atomic_inc(&trace_count);
+	return single_open(file, trace_show, NULL);
+}
+
+static int trace_release(struct inode *inode, struct file *file)
+{
+	atomic_dec(&trace_count);
+	return single_release(inode, file);
+}
+
+static const struct file_operations trace_fops = {
+	.open    = trace_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = trace_release,
+};
+
+int __init ras_add_daemon_trace(void)
+{
+	struct dentry *fentry;
+
+	if (!ras_debugfs_dir)
+		return -ENOENT;
+
+	fentry = debugfs_create_file("daemon_active", S_IRUSR, ras_debugfs_dir,
+				     NULL, &trace_fops);
+	if (!fentry)
+		return -ENODEV;
+
+	return 0;
+
+}
+
+void __init ras_debugfs_init(void)
+{
+	ras_debugfs_dir = debugfs_create_dir("ras", NULL);
+}
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b0c6ed1..4cac43a 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -5,8 +5,22 @@
  *	Chen, Gong <gong.chen@linux.intel.com>
  */
 
+#include <linux/init.h>
+#include <linux/ras.h>
+
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
 #include <ras/ras_event.h>
 
+static int __init ras_init(void)
+{
+	int rc = 0;
+
+	ras_debugfs_init();
+	rc = ras_add_daemon_trace();
+
+	return rc;
+}
+subsys_initcall(ras_init);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
diff --git a/include/linux/ras.h b/include/linux/ras.h
new file mode 100644
index 0000000..af53248
--- /dev/null
+++ b/include/linux/ras.h
@@ -0,0 +1,15 @@
+#ifndef __RAS_H__
+#define __RAS_H__
+
+#ifdef CONFIG_DEBUG_FS
+extern struct dentry *ras_debugfs_dir;
+int ras_userspace_consumers(void);
+void ras_debugfs_init(void);
+int ras_add_daemon_trace(void);
+#else
+static inline int ras_userspace_consumers(void) { return 0; }
+static inline void ras_debugfs_init(void) { return; }
+static inline int ras_add_daemon_trace(void) { return 0; }
+#endif
+
+#endif
-- 
2.0.0.rc0



* [PATCH 5/7 v5] trace, RAS: Add eMCA trace event interface
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
                   ` (3 preceding siblings ...)
  2014-05-15  8:30 ` [PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-15  8:30 ` [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log Chen, Gong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

Add a trace interface that reports all H/W error related information.

v5 -> v4: Add the physical address mask (LSB) to the trace.
v4 -> v3: change ras trace dependency rule.
v3 -> v2: minor adjustment according to the suggestion from Boris.
v2 -> v1: spinlock is not needed anymore.
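
The "addr LSB" value added in v5 is the index of the lowest set bit of the physical address mask. A userspace sketch of that calculation and of the pa_info string follows; mask_lsb() and format_pa_info() are hypothetical names, and the GCC/Clang builtin __builtin_ctzll() is assumed here as a substitute for the kernel's __ffs64().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Index of the lowest set bit; a stand-in for the kernel's __ffs64(). */
static unsigned int mask_lsb(uint64_t mask)
{
	assert(mask != 0);	/* the builtin is undefined for zero */
	return (unsigned int)__builtin_ctzll(mask);
}

/*
 * Build a string like the pa_info field of the trace event. A mask of 0
 * means "no valid mask" in this sketch. Returns the stored length.
 */
static size_t format_pa_info(char *buf, size_t size, uint64_t pa,
			     uint64_t mask)
{
	size_t n;
	int w;

	assert(buf && size > 0);
	buf[0] = '\0';
	w = snprintf(buf, size, "physical addr: 0x%016" PRIx64 " ", pa);
	if (w < 0)
		return 0;
	/* snprintf reports the untruncated length, so clamp it. */
	n = (size_t)w < size ? (size_t)w : size - 1;
	if (mask != 0 && n < size - 1) {
		w = snprintf(buf + n, size - n, "addr LSB: 0x%x ",
			     mask_lsb(mask));
		if (w > 0)
			n += (size_t)w < size - n ? (size_t)w : size - n - 1;
	}
	return n;
}
```

For a mask of 0xfffffffffffff000, which describes a 4 KiB granule, mask_lsb() gives 12 and the string ends in "addr LSB: 0xc ".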

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/Kconfig       |  4 +++-
 drivers/acpi/acpi_extlog.c | 58 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c          |  1 +
 include/ras/ras_event.h    | 57 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ab686b3..5af6013 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -353,6 +353,7 @@ config ACPI_EXTLOG
 	tristate "Extended Error Log support"
 	depends on X86_MCE && X86_LOCAL_APIC
 	select UEFI_CPER
+	select RAS_TRACE
 	default n
 	help
 	  Certain usages such as Predictive Failure Analysis (PFA) require
@@ -367,6 +368,7 @@ config ACPI_EXTLOG
 
 	  Enhanced MCA Logging allows firmware to provide additional error
 	  information to system software, synchronous with MCE or CMCI. This
-	  driver adds support for that functionality.
+	  driver adds support for that functionality with corresponding
+	  tracepoint which carries that information to userspace.
 
 endif	# ACPI
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index c4a5d87..b1dcb5b 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -16,6 +16,7 @@
 #include <asm/mce.h>
 
 #include "apei/apei-internal.h"
+#include <ras/ras_event.h>
 
 #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
 
@@ -43,6 +44,9 @@ struct extlog_l1_head {
 
 static int old_edac_report_status;
 
+static char mem_location[CPER_REC_LEN];
+static char dimm_location[CPER_REC_LEN];
+
 static u8 extlog_dsm_uuid[] __initdata = "663E35AF-CC10-41A4-88EA-5470AF055295";
 
 /* L1 table related physical address */
@@ -69,6 +73,34 @@ static u32 l1_percpu_entry;
 #define ELOG_ENTRY_ADDR(phyaddr) \
 	(phyaddr - elog_base + (u8 *)elog_addr)
 
+static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
+			       u64 err_count, u32 severity,
+			       struct cper_sec_mem_err *mem)
+{
+	u32 etype = ~0U;
+	char pa_info[64];
+	u8 n = 0;
+
+	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+		etype = mem->error_type;
+
+	memset(pa_info, 0, 64);
+	if (mem->validation_bits & CPER_MEM_VALID_PA)
+		n = snprintf(pa_info, 63, "physical addr: 0x%016llx ",
+			     mem->physical_addr);
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
+		snprintf(pa_info + n, 63 - n, "addr LSB: 0x%x ",
+			 (u8)__ffs64(mem->physical_addr_mask));
+
+	memset(mem_location, 0, CPER_REC_LEN);
+	cper_mem_err_location(mem, mem_location);
+	memset(dimm_location, 0, CPER_REC_LEN);
+	cper_dimm_err_location(mem, dimm_location);
+	trace_extlog_mem_event(etype, fru_id, err_count, severity,
+			       dimm_location, pa_info, mem_location, fru_text);
+}
+
 static struct acpi_generic_status *extlog_elog_entry_check(int cpu, int bank)
 {
 	int idx;
@@ -137,8 +169,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	struct mce *mce = (struct mce *)data;
 	int	bank = mce->bank;
 	int	cpu = mce->extcpu;
-	struct acpi_generic_status *estatus;
-	int rc;
+	struct acpi_generic_status *estatus, *tmp;
+	struct acpi_generic_data *gdata;
+	const uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
+	uuid_le *sec_type;
+	static u64 err_count;
 
 	estatus = extlog_elog_entry_check(cpu, bank);
 	if (estatus == NULL)
@@ -148,7 +184,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	/* clear record status to enable BIOS to update it again */
 	estatus->block_status = 0;
 
-	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
+	tmp = (struct acpi_generic_status *)elog_buf;
+	print_extlog_rcd(NULL, tmp, cpu);
+
+	/* log event via trace */
+	err_count++;
+	gdata = (struct acpi_generic_data *)(tmp + 1);
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+		fru_id = (uuid_le *)gdata->fru_id;
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+		fru_text = gdata->fru_text;
+	sec_type = (uuid_le *)gdata->section_type;
+	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		if (gdata->error_data_length >= sizeof(*mem_err))
+			__trace_mem_error(fru_id, fru_text, err_count,
+					  gdata->error_severity, mem_err);
+	}
 
 	return NOTIFY_STOP;
 }
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 4cac43a..da227a3 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -23,4 +23,5 @@ static int __init ras_init(void)
 }
 subsys_initcall(ras_init);
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index acbcbb8..ac6e6d1 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -9,6 +9,63 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/aer.h>
+#include <linux/cper.h>
+
+
+/*
+ * MCE Extended Error Log trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event.
+ *
+ */
+
+/* memory trace event */
+
+TRACE_EVENT(extlog_mem_event,
+	TP_PROTO(u32 etype,
+		 const uuid_le *fru_id,
+		 u64 error_count,
+		 u32 severity,
+		 char *dimm_info,
+		 char *pa_info,
+		 char *mem_loc,
+		 char *fru_text),
+
+	TP_ARGS(etype, fru_id, error_count, severity, dimm_info, pa_info,
+		mem_loc, fru_text),
+
+	TP_STRUCT__entry(
+		__field(u32, etype)
+		__field(u64, error_count)
+		__field(u32, severity)
+		__string(dimm_info, dimm_info)
+		__string(pa_info, pa_info)
+		__string(mem_loc, mem_loc)
+		__dynamic_array(char, fru, CPER_REC_LEN)
+	),
+
+	TP_fast_assign(
+		__entry->error_count = error_count;
+		__entry->severity = severity;
+		__entry->etype = etype;
+		__assign_str(dimm_info, dimm_info);
+		__assign_str(pa_info, pa_info);
+		__assign_str(mem_loc, mem_loc);
+		snprintf(__get_dynamic_array(fru), CPER_REC_LEN - 1,
+			 "FRU: %pUl %.20s", fru_id, fru_text);
+	),
+
+	TP_printk("%llu %s error%s: %s %s%s%s%s",
+		  __entry->error_count,
+		  cper_severity_str(__entry->severity),
+		  __entry->error_count > 1 ? "s" : "",
+		  cper_mem_err_type_str(__entry->etype),
+		  __get_str(dimm_info),
+		  __get_str(pa_info),
+		  __get_str(mem_loc),
+		  __get_str(fru))
+);
 
 /*
  * Hardware Events Report
-- 
2.0.0.rc0


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
                   ` (4 preceding siblings ...)
  2014-05-15  8:30 ` [PATCH 5/7 v5] trace, RAS: Add eMCA trace event interface Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-21 11:06   ` Borislav Petkov
  2014-05-15  8:30 ` [PATCH 7/7] RAS, extlog: Adjust init flow Chen, Gong
  2014-05-28  3:32 ` new trace output format Chen, Gong
  7 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC (permalink / raw)
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

To avoid saving two copies of one H/W event, add a new
file under debugfs to control where the event log is saved.
While this file is open, perf/trace is used and the kernel
stops printing the event log to the console. Once the file
is closed, the kernel prints the event log to the console
again.

v3 -> v2: minor adjustment to make the flow cleaner.
v2 -> v1: move counter operation from *read* to *open*.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index b1dcb5b..c1dab37 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -12,6 +12,7 @@
 #include <linux/cper.h>
 #include <linux/ratelimit.h>
 #include <linux/edac.h>
+#include <linux/ras.h>
 #include <asm/cpu.h>
 #include <asm/mce.h>
 
@@ -185,7 +186,11 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	estatus->block_status = 0;
 
 	tmp = (struct acpi_generic_status *)elog_buf;
-	print_extlog_rcd(NULL, tmp, cpu);
+
+	if (ras_userspace_consumers() == 0) {
+		print_extlog_rcd(NULL, tmp, cpu);
+		goto out;
+	}
 
 	/* log event via trace */
 	err_count++;
@@ -202,6 +207,7 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 					  gdata->error_severity, mem_err);
 	}
 
+out:
 	return NOTIFY_STOP;
 }
 
-- 
2.0.0.rc0


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 7/7] RAS, extlog: Adjust init flow
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
                   ` (5 preceding siblings ...)
  2014-05-15  8:30 ` [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log Chen, Gong
@ 2014-05-15  8:30 ` Chen, Gong
  2014-05-28  3:32 ` new trace output format Chen, Gong
  7 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-15  8:30 UTC (permalink / raw)
  To: tony.luck, bp, m.chehab; +Cc: linux-acpi, Chen, Gong

If the platform lacks eMCA capability, there is no need
to check for a conflict with the EDAC driver.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index c1dab37..216eb94 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -254,19 +254,16 @@ static int __init extlog_init(void)
 	u64 cap;
 	int rc;
 
+	rdmsrl(MSR_IA32_MCG_CAP, cap);
+
+	if (!(cap & MCG_ELOG_P) || !extlog_get_l1addr())
+		return -ENODEV;
+
 	if (get_edac_report_status() == EDAC_REPORTING_FORCE) {
 		pr_warn("Not loading eMCA, error reporting force-enabled through EDAC.\n");
 		return -EPERM;
 	}
 
-	rc = -ENODEV;
-	rdmsrl(MSR_IA32_MCG_CAP, cap);
-	if (!(cap & MCG_ELOG_P))
-		return rc;
-
-	if (!extlog_get_l1addr())
-		return rc;
-
 	rc = -EINVAL;
 	/* get L1 header to fetch necessary information */
 	l1_hdr_size = sizeof(struct extlog_l1_head);
-- 
2.0.0.rc0


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7 v3] trace, AER: Move trace into unified interface
  2014-05-15  8:30 ` [PATCH 2/7 v3] trace, AER: Move trace into unified interface Chen, Gong
@ 2014-05-21 10:19   ` Borislav Petkov
  2014-05-22  0:03     ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-21 10:19 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 15, 2014 at 04:30:41AM -0400, Chen, Gong wrote:
> AER uses a separate trace interface by now. To make it
> consistent, move it into unified RAS trace interface.
> 
> v3 -> v2: change dependency rule of RAS_TRACE.
> v2 -> v1: remove unnecessary dependency in drivers/ras/Kconfig.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/pci/pcie/aer/Kconfig           |  1 +
>  drivers/pci/pcie/aer/aerdrv_errprint.c |  4 +-
>  include/ras/ras_event.h                | 64 ++++++++++++++++++++++++++++
>  include/trace/events/ras.h             | 77 ----------------------------------

Ok, I don't understand: all the rest of the kernel holds tracepoints in
include/trace/events/. Why are you moving this at all?

Why can't the new extlog tracepoint in patch 5 be added to
include/trace/events/ras.h too?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-15  8:30 ` [PATCH 3/7 v4] CPER: Adjust code flow of some functions Chen, Gong
@ 2014-05-21 11:05   ` Borislav Petkov
  2014-05-21 23:51     ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-21 11:05 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 15, 2014 at 04:30:42AM -0400, Chen, Gong wrote:
> +const char *cper_mem_err_type_str(unsigned int etype)
>  {
> -	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> -	if (mem->validation_bits & CPER_MEM_VALID_PA)
> -		printk("%s""physical_address: 0x%016llx\n",
> -		       pfx, mem->physical_addr);
> -	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> -		printk("%s""physical_address_mask: 0x%016llx\n",

The physical address mask is still not part of the tracepoint as a u8 as
we talked.

> -		       pfx, mem->physical_addr_mask);
> +	return etype < ARRAY_SIZE(mem_err_type_strs) ?
> +		mem_err_type_strs[etype] : "unknown";
> +}
> +EXPORT_SYMBOL_GPL(cper_mem_err_type_str);

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-15  8:30 ` [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log Chen, Gong
@ 2014-05-21 11:06   ` Borislav Petkov
  2014-05-21 23:46     ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-21 11:06 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 15, 2014 at 04:30:45AM -0400, Chen, Gong wrote:
> To avoid saving two copies for one H/W event, add a new
> file under debugfs to control how to save event log.
> Once this file is opened, the perf/trace will be used,
> in the meanwhile, kernel will stop to print event log
> to the console. On the other hand, if this file is closed,
> kernel will print event log to the console again.
> 
> v3 -> v2: minor adjustment to make flow cleanly.
> v2 -> v1: move counter operation from *read* to *open*.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/acpi/acpi_extlog.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index b1dcb5b..c1dab37 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -12,6 +12,7 @@
>  #include <linux/cper.h>
>  #include <linux/ratelimit.h>
>  #include <linux/edac.h>
> +#include <linux/ras.h>
>  #include <asm/cpu.h>
>  #include <asm/mce.h>
>  
> @@ -185,7 +186,11 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	estatus->block_status = 0;
>  
>  	tmp = (struct acpi_generic_status *)elog_buf;
> -	print_extlog_rcd(NULL, tmp, cpu);
> +
> +	if (ras_userspace_consumers() == 0) {

	if (!ras_userspace_consumers())

> +		print_extlog_rcd(NULL, tmp, cpu);
> +		goto out;
> +	}
>  
>  	/* log event via trace */
>  	err_count++;
> @@ -202,6 +207,7 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  					  gdata->error_severity, mem_err);
>  	}
>  
> +out:
>  	return NOTIFY_STOP;
>  }
>  
> -- 
> 2.0.0.rc0
> 
> 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-21 11:06   ` Borislav Petkov
@ 2014-05-21 23:46     ` Chen, Gong
  2014-05-22 11:11       ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-21 23:46 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 01:06:31PM +0200, Borislav Petkov wrote:
> > +	if (ras_userspace_consumers() == 0) {
> 
> 	if (!ras_userspace_consumers())
> 
No, it is not a pointer so I don't think it is very
meaningful just to save some bytes.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-21 11:05   ` Borislav Petkov
@ 2014-05-21 23:51     ` Chen, Gong
  2014-05-22 10:52       ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-21 23:51 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 01:05:21PM +0200, Borislav Petkov wrote:
> > +const char *cper_mem_err_type_str(unsigned int etype)
> >  {
> > -	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> > -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> > -	if (mem->validation_bits & CPER_MEM_VALID_PA)
> > -		printk("%s""physical_address: 0x%016llx\n",
> > -		       pfx, mem->physical_addr);
> > -	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> > -		printk("%s""physical_address_mask: 0x%016llx\n",
> 
> The physical address mask is still not part of the tracepoint as a u8 as
> we talked.
> 
I thought our discussion was only about the trace part. But it is OK with
me to make the whole style aligned.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7 v3] trace, AER: Move trace into unified interface
  2014-05-21 10:19   ` Borislav Petkov
@ 2014-05-22  0:03     ` Chen, Gong
  2014-05-22 10:41       ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-22  0:03 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 12:19:03PM +0200, Borislav Petkov wrote:
> >  drivers/pci/pcie/aer/Kconfig           |  1 +
> >  drivers/pci/pcie/aer/aerdrv_errprint.c |  4 +-
> >  include/ras/ras_event.h                | 64 ++++++++++++++++++++++++++++
> >  include/trace/events/ras.h             | 77 ----------------------------------
> 
> Ok, I don't understand: all the rest of the kernel holds tracepoints in
> include/trace/events/. Why are you moving this at all?
> 
> Why can't the new extlog tracepoint in patch 5 be added to
> include/trace/events/ras.h too?
> 
OMG. Again? A long time ago Mauro strongly recommended that I not put new
tracepoints under include/trace/events/ras.h; on the contrary, they should
be put into include/ras/ras_event.h. But now the same story plays out
again. :-(

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7 v3] trace, AER: Move trace into unified interface
  2014-05-22  0:03     ` Chen, Gong
@ 2014-05-22 10:41       ` Borislav Petkov
  0 siblings, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-22 10:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Chen, Gong, tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 08:03:48PM -0400, Chen, Gong wrote:
> On Wed, May 21, 2014 at 12:19:03PM +0200, Borislav Petkov wrote:
> > >  drivers/pci/pcie/aer/Kconfig           |  1 +
> > >  drivers/pci/pcie/aer/aerdrv_errprint.c |  4 +-
> > >  include/ras/ras_event.h                | 64 ++++++++++++++++++++++++++++
> > >  include/trace/events/ras.h             | 77 ----------------------------------
> > 
> > Ok, I don't understand: all the rest of the kernel holds tracepoints in
> > include/trace/events/. Why are you moving this at all?
> > 
> > Why can't the new extlog tracepoint in patch 5 be added to
> > include/trace/events/ras.h too?
> > 
> OMG. Again? A long time ago Mauro strongly recommended that I not put new
> tracepoints under include/trace/events/ras.h; on the contrary, they should
> be put into include/ras/ras_event.h. But now the same story plays out
> again. :-(

Hey Steve, can you help us out please? What is the rationale for where
tracepoints live? Do we put them all in include/trace/events/, or only
the sufficiently generic ones there, or do we define our own include
path, e.g. include/ras/ras_event.h?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-21 23:51     ` Chen, Gong
@ 2014-05-22 10:52       ` Borislav Petkov
  2014-05-23  1:49         ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-22 10:52 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 07:51:59PM -0400, Chen, Gong wrote:
> On Wed, May 21, 2014 at 01:05:21PM +0200, Borislav Petkov wrote:
> > > +const char *cper_mem_err_type_str(unsigned int etype)
> > >  {
> > > -	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> > > -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> > > -	if (mem->validation_bits & CPER_MEM_VALID_PA)
> > > -		printk("%s""physical_address: 0x%016llx\n",
> > > -		       pfx, mem->physical_addr);
> > > -	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> > > -		printk("%s""physical_address_mask: 0x%016llx\n",
> > 
> > The physical address mask is still not part of the tracepoint as a u8 as
> > we talked.
> > 
> I thought our discussion was only about the trace part. But it is OK with
> me to make the whole style aligned.

No, I'm not talking about style - I'm talking about adding the physical address
mask to the tracepoint call:

+TRACE_EVENT(extlog_mem_event,
+       TP_PROTO(u32 etype,
+                const uuid_le *fru_id,
+                u64 error_count,

Btw, is that the error_count we're reporting?? You surely can't claim
that we'll ever report 2^64-1 errors, right?

I'd make that u32 and I'd call it

		u32 error_number;

as it is a counter we're incrementing.

+                u32 severity,

That severity can surely be u8 - we can't have 2^32-1 severities in any
normal case - I see only 5. I'm sure 256 is plenty.

And now that we slimmed some of those insanely-sized members, we can add

		u8 pa_mask_lsb

or something to that effect.

Makes sense?
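
A rough user-space sketch of why a u8 is enough for the mask LSB: the index of
the lowest set bit of a 64-bit mask is always in the range 0..63.
pa_mask_lsb() below is a hypothetical stand-in for the kernel's __ffs64(),
written as a plain loop so it builds with any C compiler. Like __ffs64(), it
is only meaningful for a non-zero mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's __ffs64():
 * returns the index (0..63) of the lowest set bit of a non-zero mask.
 */
static uint8_t pa_mask_lsb(uint64_t mask)
{
	uint8_t bit = 0;

	/* The result is undefined for 0, so reject it explicitly here. */
	assert(mask != 0);

	/* Shift right until the lowest set bit reaches position 0. */
	while (!(mask & 1)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}
```

For a 4 KiB-aligned mask such as 0xfffffffffffff000 the function returns 12,
which fits comfortably in one byte.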

+                char *dimm_info,
+                char *pa_info,
+                char *mem_loc,
+                char *fru_text),

+TRACE_EVENT(extlog_mem_event,


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-21 23:46     ` Chen, Gong
@ 2014-05-22 11:11       ` Borislav Petkov
  2014-05-23  1:40         ` Chen, Gong
  2014-05-28  3:27         ` [PATCH 6/7 v4] " Chen, Gong
  0 siblings, 2 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-22 11:11 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Wed, May 21, 2014 at 07:46:26PM -0400, Chen, Gong wrote:
> On Wed, May 21, 2014 at 01:06:31PM +0200, Borislav Petkov wrote:
> > > +	if (ras_userspace_consumers() == 0) {
> > 
> > 	if (!ras_userspace_consumers())
> > 
> No, it is not a pointer so I don't think it is very
> meaningful just to save some bytes.

Btw, this is exactly why your patches take too long to review - you like
to debate more instead of listening to the maintainers. Next time you
want to speed up the process, just think about that.

I think the amount of time I wasted explaining all the crap to you is
more than I've spent actually reviewing your patches. How about you do
what you're told for a change and stop changing agreed-upon stuff after
review? Then I don't have to review it all over again from the beginning,
which makes both our lives easier.

As to the question why you should listen to the maintainers: that's
because we get to maintain your code after you go and do something else
so it better be readable to us.

Now to answer your direct question:

	if (!ras_userspace_consumers())

reads straight away as "if there are no ras userspace consumers" instead
of "if the number of the ras userspace consumers is zero".

Got it?!
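
For illustration, the counter-based routing discussed in this subthread can be
sketched in user-space C. This is not the kernel code: trace_consumers,
knob_open(), knob_release() and pick_sink() are hypothetical names, and C11
atomics are an assumption standing in for whatever the RAS debugfs code
actually uses. Only ras_userspace_consumers() and the "!" test come from the
thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the debugfs knob's reference counter. */
static atomic_int trace_consumers;

/* Would run from the knob's open() handler: one more trace consumer. */
static void knob_open(void)
{
	atomic_fetch_add(&trace_consumers, 1);
}

/* Would run from the knob's release() handler: one consumer fewer. */
static void knob_release(void)
{
	int prev = atomic_fetch_sub(&trace_consumers, 1);

	/* A release without a matching open would be a bug. */
	assert(prev > 0);
}

/* Mirrors the role of ras_userspace_consumers(): non-zero while open. */
static int userspace_consumers(void)
{
	return atomic_load(&trace_consumers);
}

enum sink { SINK_CONSOLE, SINK_TRACE };

/* "If there are no userspace consumers", print to the console. */
static enum sink pick_sink(void)
{
	if (!userspace_consumers())
		return SINK_CONSOLE;
	return SINK_TRACE;
}
```

Opening the knob once flips pick_sink() from SINK_CONSOLE to SINK_TRACE; it
flips back only after the last matching release.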

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-22 11:11       ` Borislav Petkov
@ 2014-05-23  1:40         ` Chen, Gong
  2014-05-28  3:27         ` [PATCH 6/7 v4] " Chen, Gong
  1 sibling, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-23  1:40 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 22, 2014 at 01:11:40PM +0200, Borislav Petkov wrote:
> Date: Thu, 22 May 2014 13:11:40 +0200
> From: Borislav Petkov <bp@alien8.de>
> To: "Chen, Gong" <gong.chen@linux.intel.com>
> Cc: tony.luck@intel.com, m.chehab@samsung.com, linux-acpi@vger.kernel.org
> Subject: Re: [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save
>  event log
> User-Agent: Mutt/1.5.23 (2014-03-12)
> 
> On Wed, May 21, 2014 at 07:46:26PM -0400, Chen, Gong wrote:
> > On Wed, May 21, 2014 at 01:06:31PM +0200, Borislav Petkov wrote:
> > > > +	if (ras_userspace_consumers() == 0) {
> > > 
> > > 	if (!ras_userspace_consumers())
> > > 
> > No, it is not a pointer so I don't think it is very
> > meaningful just to save some bytes.
> 
> Btw, this is exactly why your patches take too long to review - you like
> to debate more instead of listening to the maintainers. Next time you
> want to speed up the process, just think about that.
> 
Sorry, I'm afraid I can't agree with you. Why should I just follow you
blindly without any voice of my own? Even if sometimes I act like an
idiot. If I can't say anything about such a small thing, do you expect
me to speak up about something like kernel infrastructure?

OK, I will change it soon, as you wish.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-22 10:52       ` Borislav Petkov
@ 2014-05-23  1:49         ` Chen, Gong
  2014-05-23  9:37           ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-23  1:49 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 22, 2014 at 12:52:42PM +0200, Borislav Petkov wrote:
> > > > +const char *cper_mem_err_type_str(unsigned int etype)
> > > >  {
> > > > -	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> > > > -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> > > > -	if (mem->validation_bits & CPER_MEM_VALID_PA)
> > > > -		printk("%s""physical_address: 0x%016llx\n",
> > > > -		       pfx, mem->physical_addr);
> > > > -	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> > > > -		printk("%s""physical_address_mask: 0x%016llx\n",
> > > 
> > > The physical address mask is still not part of the tracepoint as a u8 as
> > > we talked.
> > > 
> > I thought our discussion was only about the trace part. But it is OK with
> > me to make the whole style aligned.
> 
> No, I'm not talking about style - I'm talking about adding the physical address
> mask to the tracepoint call:

If so, it has been there already. Maybe you should check patch 5/7. I merge
pa/pa_mask into pa_info as a whole to avoid too much calculation/logic in
trace.

> +TRACE_EVENT(extlog_mem_event,
> +       TP_PROTO(u32 etype,
> +                const uuid_le *fru_id,
> +                u64 error_count,
> 
> Btw, is that the error_count we're reporting?? You surely can't claim
> that we'll ever report 2^64-1 errors, right?
> 
> I'd make that u32 and I'd call it
> 
> 		u32 error_number;
Fine.
> 
> as it is a counter we're incrementing.
> 
> +                u32 severity,
Fine.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-23  1:49         ` Chen, Gong
@ 2014-05-23  9:37           ` Borislav Petkov
  2014-05-23 10:11             ` Borislav Petkov
  2014-05-26  2:07             ` Chen, Gong
  0 siblings, 2 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-23  9:37 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Thu, May 22, 2014 at 09:49:10PM -0400, Chen, Gong wrote:
> If so, it has been there already. Maybe you should check patch
> 5/7. I merge pa/pa_mask into pa_info as a whole to avoid too much
> calculation/logic in trace.

My bad, how did I miss that:

> +static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
> +                              u64 err_count, u32 severity,
> +                              struct cper_sec_mem_err *mem)
> +{
> +       u32 etype = ~0U;
> +       char pa_info[64];
> +       u8 n = 0;
> +
> +       if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +               etype = mem->error_type;

More SNAFU: mem->error_type is u8 and you're saving it into a u32. What
possible reason can you have for that?

> +
> +       memset(pa_info, 0, 64);
> +       if (mem->validation_bits & CPER_MEM_VALID_PA)
> +               n = snprintf(pa_info, 63, "physical addr: 0x%016llx ",
> +                            mem->physical_addr);
> +
> +       if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> +               snprintf(pa_info + n, 63 - n, "addr LSB: 0x%x ",
> +                        (u8)__ffs64(mem->physical_addr_mask));

So pa_info is 64 bytes!!! For what, a u64 and a u8? That's 9 bytes.

You don't seem to get the idea that we cannot allow ourselves to waste
bytes in an error record like that. How hard is it to comprehend? Am I
telling it wrong or do you need someone else to explain it to you?

Ok, here's how the tracepoint should look:

	TP_PROTO(u32 error_number,
		 u8 etype,
		 u8 severity,
		 u64 pa,
		 u8 pa_mask_lsb,
		 const uuid_le *fru_id,
		 char *dimm_info,
		 char *mem_loc,
		 char *fru_text)

Now if you have valid technical reasons why it shouldn't be done
that way, do tell. Otherwise do it exactly this way without changing
anything.

Or if you don't wanna, I can do it instead - it'll be much easier for me
than reviewing it again.
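
The size argument above can be checked with a small user-space sketch. It
assumes the same two format strings as the quoted patch; struct pa_fields,
pa_fields_payload() and format_pa_info() are hypothetical names used only for
this comparison and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PA_INFO_LEN 64	/* size of the pa_info buffer in the quoted patch */

/* Numeric form: the physical address and the LSB index of its mask. */
struct pa_fields {
	uint64_t pa;
	uint8_t  pa_mask_lsb;
};

/* Bytes of actual information in the numeric form (padding excluded). */
static size_t pa_fields_payload(void)
{
	return sizeof(uint64_t) + sizeof(uint8_t);
}

/*
 * Text form, using the format strings from the quoted patch.
 * Returns the length snprintf() reports, i.e. the would-be length
 * if the buffer were too small.
 */
static int format_pa_info(char *buf, size_t len, const struct pa_fields *f)
{
	int n = snprintf(buf, len, "physical addr: 0x%016llx ",
			 (unsigned long long)f->pa);

	/* Stop on error or if the first part already filled the buffer. */
	if (n < 0 || (size_t)n >= len)
		return n;

	n += snprintf(buf + n, len - (size_t)n, "addr LSB: 0x%x ",
		      (unsigned int)f->pa_mask_lsb);
	return n;
}
```

With pa = 0x42c201000 and pa_mask_lsb = 12, the formatted text is 48
characters long, against 9 bytes of payload for the two numeric fields.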

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-23  9:37           ` Borislav Petkov
@ 2014-05-23 10:11             ` Borislav Petkov
  2014-05-26  1:59               ` Chen, Gong
  2014-05-26  2:07             ` Chen, Gong
  1 sibling, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-23 10:11 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Fri, May 23, 2014 at 11:37:03AM +0200, Borislav Petkov wrote:
> Or if you don't wanna, I can do it instead - it'll be much easier for
> me than reviewing it again.

Here's a version with the suggested changes incorporated that builds
fine here:

--
Index: linux/drivers/acpi/Kconfig
===================================================================
--- linux.orig/drivers/acpi/Kconfig	2014-05-23 11:14:33.856625534 +0200
+++ linux/drivers/acpi/Kconfig	2014-05-23 11:14:33.840625534 +0200
@@ -370,6 +370,7 @@ config ACPI_EXTLOG
 	tristate "Extended Error Log support"
 	depends on X86_MCE && X86_LOCAL_APIC
 	select UEFI_CPER
+	select RAS_TRACE
 	default n
 	help
 	  Certain usages such as Predictive Failure Analysis (PFA) require
@@ -384,6 +385,7 @@ config ACPI_EXTLOG
 
 	  Enhanced MCA Logging allows firmware to provide additional error
 	  information to system software, synchronous with MCE or CMCI. This
-	  driver adds support for that functionality.
+	  driver adds support for that functionality with corresponding
+	  tracepoint which carries that information to userspace.
 
 endif	# ACPI
Index: linux/drivers/acpi/acpi_extlog.c
===================================================================
--- linux.orig/drivers/acpi/acpi_extlog.c	2014-05-23 11:14:33.856625534 +0200
+++ linux/drivers/acpi/acpi_extlog.c	2014-05-23 12:09:56.000000000 +0200
@@ -16,6 +16,7 @@
 #include <asm/mce.h>
 
 #include "apei/apei-internal.h"
+#include <ras/ras_event.h>
 
 #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
 
@@ -43,6 +44,9 @@ struct extlog_l1_head {
 
 static int old_edac_report_status;
 
+static char mem_location[CPER_REC_LEN];
+static char dimm_location[CPER_REC_LEN];
+
 static u8 extlog_dsm_uuid[] __initdata = "663E35AF-CC10-41A4-88EA-5470AF055295";
 
 /* L1 table related physical address */
@@ -69,6 +73,30 @@ static u32 l1_percpu_entry;
 #define ELOG_ENTRY_ADDR(phyaddr) \
 	(phyaddr - elog_base + (u8 *)elog_addr)
 
+static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
+			       u64 err_count, u32 severity,
+			       struct cper_sec_mem_err *mem)
+{
+	u8 etype = -1, pa_mask_lsb = 0;
+	u64 pa = 0;
+
+	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+		etype = mem->error_type;
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA)
+		pa = mem->physical_addr;
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
+		pa_mask_lsb = (u8)__ffs64(mem->physical_addr_mask);
+
+	memset(mem_location, 0, CPER_REC_LEN);
+	cper_mem_err_location(mem, mem_location);
+	memset(dimm_location, 0, CPER_REC_LEN);
+	cper_dimm_err_location(mem, dimm_location);
+	trace_extlog_mem_event(err_count, etype, severity, pa, pa_mask_lsb,
+			       fru_id, dimm_location, mem_location, fru_text);
+}
+
 static struct acpi_generic_status *extlog_elog_entry_check(int cpu, int bank)
 {
 	int idx;
@@ -137,8 +165,12 @@ static int extlog_print(struct notifier_
 	struct mce *mce = (struct mce *)data;
 	int	bank = mce->bank;
 	int	cpu = mce->extcpu;
-	struct acpi_generic_status *estatus;
-	int rc;
+	struct acpi_generic_status *estatus, *tmp;
+	struct acpi_generic_data *gdata;
+	const uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
+	uuid_le *sec_type;
+	static u64 err_count;
 
 	estatus = extlog_elog_entry_check(cpu, bank);
 	if (estatus == NULL)
@@ -148,7 +180,23 @@ static int extlog_print(struct notifier_
 	/* clear record status to enable BIOS to update it again */
 	estatus->block_status = 0;
 
-	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
+	tmp = (struct acpi_generic_status *)elog_buf;
+	print_extlog_rcd(NULL, tmp, cpu);
+
+	/* log event via trace */
+	err_count++;
+	gdata = (struct acpi_generic_data *)(tmp + 1);
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+		fru_id = (uuid_le *)gdata->fru_id;
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+		fru_text = gdata->fru_text;
+	sec_type = (uuid_le *)gdata->section_type;
+	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		if (gdata->error_data_length >= sizeof(*mem_err))
+			__trace_mem_error(fru_id, fru_text, err_count,
+					  gdata->error_severity, mem_err);
+	}
 
 	return NOTIFY_STOP;
 }
Index: linux/drivers/ras/ras.c
===================================================================
--- linux.orig/drivers/ras/ras.c	2014-05-23 11:14:33.856625534 +0200
+++ linux/drivers/ras/ras.c	2014-05-23 11:14:33.840625534 +0200
@@ -23,4 +23,5 @@ static int __init ras_init(void)
 }
 subsys_initcall(ras_init);
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
Index: linux/include/ras/ras_event.h
===================================================================
--- linux.orig/include/ras/ras_event.h	2014-05-23 11:14:33.856625534 +0200
+++ linux/include/ras/ras_event.h	2014-05-23 12:10:45.252569816 +0200
@@ -9,6 +9,63 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/aer.h>
+#include <linux/cper.h>
+
+
+/*
+ * MCE Extended Error Log trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event.
+ */
+
+TRACE_EVENT(extlog_mem_event,
+	TP_PROTO(u32 error_number,
+		 u8 etype,
+		 u8 severity,
+		 u64 pa,
+		 u8 pa_mask_lsb,
+		 const uuid_le *fru_id,
+		 const char *dimm_info,
+		 const char *mem_loc,
+		 const char *fru_text),
+
+	TP_ARGS(error_number, etype, severity, pa, pa_mask_lsb,
+		fru_id, dimm_info, mem_loc, fru_text),
+
+	TP_STRUCT__entry(
+		__field(u32, error_number)
+		__field(u8, etype)
+		__field(u8, severity)
+		__field(u64, pa)
+		__field(u8, pa_mask_lsb)
+		__string(dimm_info, dimm_info)
+		__string(mem_loc, mem_loc)
+		__dynamic_array(char, fru, CPER_REC_LEN)
+	),
+
+	TP_fast_assign(
+		__entry->error_number = error_number;
+		__entry->etype = etype;
+		__entry->severity = severity;
+		__entry->pa = pa;
+		__entry->pa_mask_lsb = pa_mask_lsb;
+		__assign_str(dimm_info, dimm_info);
+		__assign_str(mem_loc, mem_loc);
+		snprintf(__get_dynamic_array(fru), CPER_REC_LEN - 1,
+			 "FRU: %pUl %.20s", fru_id, fru_text);
+	),
+
+	TP_printk("%d %s error: %s %s %llx (mask lsb: %x), %s%s",
+		  __entry->error_number,
+		  cper_severity_str(__entry->severity),
+		  cper_mem_err_type_str(__entry->etype),
+		  __get_str(dimm_info),
+		  __entry->pa,
+		  __entry->pa_mask_lsb,
+		  __get_str(mem_loc),
+		  __get_str(fru))
+);
 
 /*
  * Hardware Events Report

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-23 10:11             ` Borislav Petkov
@ 2014-05-26  1:59               ` Chen, Gong
  2014-05-26 10:21                 ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-26  1:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Fri, May 23, 2014 at 12:11:43PM +0200, Borislav Petkov wrote:
> +	TP_printk("%d %s error: %s %s %llx (mask lsb: %x), %s%s",
What if pa_mask_lsb does not exist? It will show something like:

extlog_mem_event: 1 corrected error: unknown DIMM location:
Memriser1 CHANNEL A DIMM 0 0x0000000074516000 (mask lsb: ), node: 0 card: 0
module: 0 rank: 0 bank: 0 row:
7329 column: 656 FRU: 00000000-0000-0000-0000-000000000000

Even worse, if pa does not exist, it will show:

extlog_mem_event: 1 corrected error: unknown DIMM location:
Memriser1 CHANNEL A DIMM 0  (mask lsb: ), node: 0 card: 0
module: 0 rank: 0 bank: 0 row:
7329 column: 656 FRU: 00000000-0000-0000-0000-000000000000

What I want is to make the output format more graceful.

> +		  __entry->error_number,
> +		  cper_severity_str(__entry->severity),
> +		  cper_mem_err_type_str(__entry->etype),
> +		  __get_str(dimm_info),
> +		  __entry->pa,
> +		  __entry->pa_mask_lsb,
> +		  __get_str(mem_loc),
> +		  __get_str(fru))
> +);
>  

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-23  9:37           ` Borislav Petkov
  2014-05-23 10:11             ` Borislav Petkov
@ 2014-05-26  2:07             ` Chen, Gong
  2014-05-26 10:23               ` Borislav Petkov
  1 sibling, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-26  2:07 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Fri, May 23, 2014 at 11:37:03AM +0200, Borislav Petkov wrote:
> On Thu, May 22, 2014 at 09:49:10PM -0400, Chen, Gong wrote:
> > If so, it has been there already. Maybe you should check patch
> > 5/7. I merge pa/pa_mask into pa_info as a whole to avoid too much
> > calculation/logic in trace.
> 
> My bad, how did I miss that:
> 
> > +static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
> > +                              u64 err_count, u32 severity,
> > +                              struct cper_sec_mem_err *mem)
> > +{
> > +       u32 etype = ~0U;
> > +       char pa_info[64];
> > +       u8 n = 0;
> > +
> > +       if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> > +               etype = mem->error_type;
> 
> More SNAFU: mem->error_type is u8 and you're saving it into a u32. What
> possible reason can you have for that?
OK, I will fix it.
> 
> > +
> > +       memset(pa_info, 0, 64);
> > +       if (mem->validation_bits & CPER_MEM_VALID_PA)
> > +               n = snprintf(pa_info, 63, "physical addr: 0x%016llx ",
> > +                            mem->physical_addr);
> > +
> > +       if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> > +               snprintf(pa_info + n, 63 - n, "addr LSB: 0x%x ",
> > +                        (u8)__ffs64(mem->physical_addr_mask));
> 
> So pa_info is 64 bytes!!! For what, a u64 and a u8? That's 9 bytes.
> 
Oh, subconsciously I am always afraid of some kind of buffer overflow attack.
You are right, here it is obviously too wasteful. I can shrink it to 10
bytes. Please see my other reply for why I hope to use a string.

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-26  1:59               ` Chen, Gong
@ 2014-05-26 10:21                 ` Borislav Petkov
  2014-05-26 10:42                   ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-26 10:21 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Sun, May 25, 2014 at 09:59:44PM -0400, Chen, Gong wrote:
> On Fri, May 23, 2014 at 12:11:43PM +0200, Borislav Petkov wrote:
> > +	TP_printk("%d %s error: %s %s %llx (mask lsb: %x), %s%s",
> > What if pa_mask_lsb does not exist?

Then you make it the default which says that all bits in the mask are
invalid: -1, i.e. 255.

This becomes part of the interface then, just like phys_addr is
0xfffff..., i.e. -1, in the invalid case.
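
A minimal user-space sketch of that convention, assuming hypothetical names
(PA_INVALID, PA_MASK_LSB_INVALID, format_pa) that are not part of the patch:
a consumer compares the raw fields against the all-ones defaults before
interpreting them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; the thread only agrees on all-ones defaults. */
#define PA_INVALID          UINT64_MAX	/* (u64)-1: no valid physical address */
#define PA_MASK_LSB_INVALID UINT8_MAX	/* (u8)-1 == 255: no valid address mask */

/* Format the address fields, spelling out the invalid markers. */
static int format_pa(char *buf, size_t len, uint64_t pa, uint8_t pa_mask_lsb)
{
	char pa_str[24], lsb_str[16];

	assert(buf != NULL && len > 0);

	if (pa == PA_INVALID)
		snprintf(pa_str, sizeof(pa_str), "invalid");
	else
		snprintf(pa_str, sizeof(pa_str), "%016" PRIx64, pa);

	if (pa_mask_lsb == PA_MASK_LSB_INVALID)
		snprintf(lsb_str, sizeof(lsb_str), "invalid");
	else
		snprintf(lsb_str, sizeof(lsb_str), "%u", (unsigned int)pa_mask_lsb);

	return snprintf(buf, len, "physical addr: %s (mask lsb: %s)",
			pa_str, lsb_str);
}
```

With both fields left at their defaults, format_pa() renders
"physical addr: invalid (mask lsb: invalid)" rather than an empty slot.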

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-26  2:07             ` Chen, Gong
@ 2014-05-26 10:23               ` Borislav Petkov
  0 siblings, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-26 10:23 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Sun, May 25, 2014 at 10:07:45PM -0400, Chen, Gong wrote:
> I can shrink it to 10 bytes. Please see my other reply for why I
> hope to use a string.

No need - just initialize both the physical address and mask to invalid
values and put a comment somewhere stating which values are
invalid.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 3/7 v4] CPER: Adjust code flow of some functions
  2014-05-26 10:21                 ` Borislav Petkov
@ 2014-05-26 10:42                   ` Chen, Gong
  0 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-26 10:42 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, linux-acpi

On Mon, May 26, 2014 at 12:21:51PM +0200, Borislav Petkov wrote:
> On Sun, May 25, 2014 at 09:59:44PM -0400, Chen, Gong wrote:
> > On Fri, May 23, 2014 at 12:11:43PM +0200, Borislav Petkov wrote:
> > > +	TP_printk("%d %s error: %s %s %llx (mask lsb: %x), %s%s",
> > > What if pa_mask_lsb does not exist?
> 
> Then you make it the default which says that all bits in the mask are
> invalid: -1, i.e. 255.
> 
> This becomes part of the interface then, just like phys_addr is
> 0xfffff..., i.e. -1, in the invalid case.
> 
OK, fine with me. I will update it soon.

* [PATCH 6/7 v4] trace, eMCA: Add a knob to adjust where to save event log
  2014-05-22 11:11       ` Borislav Petkov
  2014-05-23  1:40         ` Chen, Gong
@ 2014-05-28  3:27         ` Chen, Gong
  1 sibling, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-28  3:27 UTC (permalink / raw)
  To: bp; +Cc: tony.luck, m.chehab, linux-acpi, Chen, Gong

To avoid saving two copies of one H/W event, add a new
file under debugfs to control where the event log is saved.
While this file is open, perf/trace is used and the
kernel stops printing the event log to the console.
Once the file is closed, the kernel prints the event log
to the console again.
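
The open/close behaviour described above can be sketched in user space
as a reference count. This is only an illustration under assumptions:
trace_count, knob_open(), knob_release() and route_event() are
hypothetical names, and userspace_consumers() merely stands in for
ras_userspace_consumers(), whose implementation is not part of this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the count behind the debugfs file. */
static atomic_int trace_count;

/* Would run from the file's open handler: one more trace consumer. */
static void knob_open(void)
{
	atomic_fetch_add(&trace_count, 1);
}

/* Would run from the file's release handler: one consumer gone. */
static void knob_release(void)
{
	int old = atomic_fetch_sub(&trace_count, 1);

	assert(old > 0);	/* a release without a matching open is a bug */
}

/* Plays the role ras_userspace_consumers() has in the diff below. */
static bool userspace_consumers(void)
{
	return atomic_load(&trace_count) > 0;
}

enum sink { SINK_CONSOLE, SINK_TRACE };

/* Exactly one sink per event: trace while a consumer exists, else console. */
static enum sink route_event(void)
{
	return userspace_consumers() ? SINK_TRACE : SINK_CONSOLE;
}
```

Counting opens rather than keeping a single flag means a second reader
closing the file does not switch output back to the console while the
first reader is still attached.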

v4 -> v3: format adjustment.
v3 -> v2: minor adjustment to make flow cleanly.
v2 -> v1: move counter operation from *read* to *open*.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 8815b73..07a6fe8 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -12,6 +12,7 @@
 #include <linux/cper.h>
 #include <linux/ratelimit.h>
 #include <linux/edac.h>
+#include <linux/ras.h>
 #include <asm/cpu.h>
 #include <asm/mce.h>
 
@@ -181,7 +182,11 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	estatus->block_status = 0;
 
 	tmp = (struct acpi_generic_status *)elog_buf;
-	print_extlog_rcd(NULL, tmp, cpu);
+
+	if (!ras_userspace_consumers()) {
+		print_extlog_rcd(NULL, tmp, cpu);
+		goto out;
+	}
 
 	/* log event via trace */
 	err_number++;
@@ -198,6 +203,7 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 					  (u8)gdata->error_severity, mem_err);
 	}
 
+out:
 	return NOTIFY_STOP;
 }
 
-- 
2.0.0.rc2


* new trace output format
  2014-05-15  8:30 New eMCA trace event interface Chen, Gong
                   ` (6 preceding siblings ...)
  2014-05-15  8:30 ` [PATCH 7/7] RAS, extlog: Adjust init flow Chen, Gong
@ 2014-05-28  3:32 ` Chen, Gong
  2014-05-28  3:32   ` [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface Chen, Gong
  2014-05-28 16:23   ` new trace output format Borislav Petkov
  7 siblings, 2 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-28  3:32 UTC (permalink / raw)
  To: bp; +Cc: tony.luck, m.chehab, linux-acpi

Hi, Boris

Here is the new output format:

          <idle>-0     [000] d.h.   242.340120: extlog_mem_event: 1 corrected error: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 0000000441a07000 (mask lsb: ff), node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 32336 column: 1056 FRU: 00000000-0000-0000-0000-000000000000
          <idle>-0     [000] d.h.   349.380406: extlog_mem_event: 2 corrected error: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 00000004416ae000 (mask lsb: ff), node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 31802 column: 1064 FRU: 00000000-0000-0000-0000-000000000000

I added an extra "physical addr:" prefix. If you don't like it, please feel free to remove it.

* [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28  3:32 ` new trace output format Chen, Gong
@ 2014-05-28  3:32   ` Chen, Gong
  2014-05-28 15:28     ` Steven Rostedt
  2014-05-28 16:23   ` new trace output format Borislav Petkov
  1 sibling, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-28  3:32 UTC (permalink / raw)
  To: bp; +Cc: tony.luck, m.chehab, linux-acpi, Chen, Gong

Add trace interface to elaborate all H/W error related information.

v6 -> v5: format adjustment.
v5 -> v4: Add physical mask(LSB) in trace.
v4 -> v3: change ras trace dependency rule.
v3 -> v2: minor adjustment according to the suggestion from Boris.
v2 -> v1: spinlock is not needed anymore.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/Kconfig       |  4 +++-
 drivers/acpi/acpi_extlog.c | 54 +++++++++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c          |  1 +
 include/ras/ras_event.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index a34a228..099a2d5 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -370,6 +370,7 @@ config ACPI_EXTLOG
 	tristate "Extended Error Log support"
 	depends on X86_MCE && X86_LOCAL_APIC
 	select UEFI_CPER
+	select RAS_TRACE
 	default n
 	help
 	  Certain usages such as Predictive Failure Analysis (PFA) require
@@ -384,6 +385,7 @@ config ACPI_EXTLOG
 
 	  Enhanced MCA Logging allows firmware to provide additional error
 	  information to system software, synchronous with MCE or CMCI. This
-	  driver adds support for that functionality.
+	  driver adds support for that functionality with corresponding
+	  tracepoint which carries that information to userspace.
 
 endif	# ACPI
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index c4a5d87..8815b73 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -16,6 +16,7 @@
 #include <asm/mce.h>
 
 #include "apei/apei-internal.h"
+#include <ras/ras_event.h>
 
 #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
 
@@ -43,6 +44,9 @@ struct extlog_l1_head {
 
 static int old_edac_report_status;
 
+static char mem_location[CPER_REC_LEN];
+static char dimm_location[CPER_REC_LEN];
+
 static u8 extlog_dsm_uuid[] __initdata = "663E35AF-CC10-41A4-88EA-5470AF055295";
 
 /* L1 table related physical address */
@@ -69,6 +73,30 @@ static u32 l1_percpu_entry;
 #define ELOG_ENTRY_ADDR(phyaddr) \
 	(phyaddr - elog_base + (u8 *)elog_addr)
 
+static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
+			       u32 err_number, u8 severity,
+			       struct cper_sec_mem_err *mem)
+{
+	u8 etype = ~0, pa_mask_lsb = ~0;
+	u64 pa = ~0ull;
+
+	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+		etype = mem->error_type;
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA)
+		pa = mem->physical_addr;
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
+		pa_mask_lsb = (u8)__ffs64(mem->physical_addr_mask);
+
+	memset(mem_location, 0, CPER_REC_LEN);
+	cper_mem_err_location(mem, mem_location);
+	memset(dimm_location, 0, CPER_REC_LEN);
+	cper_dimm_err_location(mem, dimm_location);
+	trace_extlog_mem_event(err_number, etype, severity, pa, pa_mask_lsb,
+			       fru_id, dimm_location, mem_location, fru_text);
+}
+
 static struct acpi_generic_status *extlog_elog_entry_check(int cpu, int bank)
 {
 	int idx;
@@ -137,8 +165,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	struct mce *mce = (struct mce *)data;
 	int	bank = mce->bank;
 	int	cpu = mce->extcpu;
-	struct acpi_generic_status *estatus;
-	int rc;
+	struct acpi_generic_status *estatus, *tmp;
+	struct acpi_generic_data *gdata;
+	const uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
+	uuid_le *sec_type;
+	static u32 err_number;
 
 	estatus = extlog_elog_entry_check(cpu, bank);
 	if (estatus == NULL)
@@ -148,7 +180,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	/* clear record status to enable BIOS to update it again */
 	estatus->block_status = 0;
 
-	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
+	tmp = (struct acpi_generic_status *)elog_buf;
+	print_extlog_rcd(NULL, tmp, cpu);
+
+	/* log event via trace */
+	err_number++;
+	gdata = (struct acpi_generic_data *)(tmp + 1);
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+		fru_id = (uuid_le *)gdata->fru_id;
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+		fru_text = gdata->fru_text;
+	sec_type = (uuid_le *)gdata->section_type;
+	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		if (gdata->error_data_length >= sizeof(*mem_err))
+			__trace_mem_error(fru_id, fru_text, err_number,
+					  (u8)gdata->error_severity, mem_err);
+	}
 
 	return NOTIFY_STOP;
 }
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 4cac43a..da227a3 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -23,4 +23,5 @@ static int __init ras_init(void)
 }
 subsys_initcall(ras_init);
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index acbcbb8..4d3bc92 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -9,6 +9,65 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/aer.h>
+#include <linux/cper.h>
+
+
+/*
+ * MCE Extended Error Log trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event.
+ */
+
+/* memory trace event */
+
+TRACE_EVENT(extlog_mem_event,
+	TP_PROTO(u32 error_number,
+		 u8 etype,
+		 u8 severity,
+		 u64 pa,
+		 u8 pa_mask_lsb,
+		 const uuid_le *fru_id,
+		 const char *dimm_info,
+		 const char *mem_loc,
+		 const char *fru_text),
+
+	TP_ARGS(error_number, etype, severity, pa, pa_mask_lsb, fru_id,
+		dimm_info, mem_loc, fru_text),
+
+	TP_STRUCT__entry(
+		__field(u32, error_number)
+		__field(u8, etype)
+		__field(u8, severity)
+		__field(u64, pa)
+		__field(u8, pa_mask_lsb)
+		__string(dimm_info, dimm_info)
+		__string(mem_loc, mem_loc)
+		__dynamic_array(char, fru, CPER_REC_LEN)
+	),
+
+	TP_fast_assign(
+		__entry->error_number = error_number;
+		__entry->etype = etype;
+		__entry->severity = severity;
+		__entry->pa = pa;
+		__entry->pa_mask_lsb = pa_mask_lsb;
+		__assign_str(dimm_info, dimm_info);
+		__assign_str(mem_loc, mem_loc);
+		snprintf(__get_dynamic_array(fru), CPER_REC_LEN - 1,
+			 "FRU: %pUl %.20s", fru_id, fru_text);
+	),
+
+	TP_printk("%d %s error: %s %s physical addr: %016llx (mask lsb: %x), %s%s",
+		  __entry->error_number,
+		  cper_severity_str(__entry->severity),
+		  cper_mem_err_type_str(__entry->etype),
+		  __get_str(dimm_info),
+		  __entry->pa,
+		  __entry->pa_mask_lsb,
+		  __get_str(mem_loc),
+		  __get_str(fru))
+);
 
 /*
  * Hardware Events Report
-- 
2.0.0.rc2


* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28  3:32   ` [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface Chen, Gong
@ 2014-05-28 15:28     ` Steven Rostedt
  2014-05-28 16:34       ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2014-05-28 15:28 UTC (permalink / raw)
  To: Chen, Gong; +Cc: bp, tony.luck, m.chehab, linux-acpi, LKML

Added LKML

On Tue, 27 May 2014 23:32:18 -0400
"Chen, Gong" <gong.chen@linux.intel.com> wrote:

> Add trace interface to elaborate all H/W error related information.
> 
> v6 -> v5: format adjustment.
> v5 -> v4: Add physical mask(LSB) in trace.
> v4 -> v3: change ras trace dependency rule.
> v3 -> v2: minor adjustment according to the suggestion from Boris.
> v2 -> v1: spinlock is not needed anymore.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/acpi/Kconfig       |  4 +++-
>  drivers/acpi/acpi_extlog.c | 54 +++++++++++++++++++++++++++++++++++++++---
>  drivers/ras/ras.c          |  1 +
>  include/ras/ras_event.h    | 59 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 114 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index a34a228..099a2d5 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -370,6 +370,7 @@ config ACPI_EXTLOG
>  	tristate "Extended Error Log support"
>  	depends on X86_MCE && X86_LOCAL_APIC
>  	select UEFI_CPER
> +	select RAS_TRACE
>  	default n
>  	help
>  	  Certain usages such as Predictive Failure Analysis (PFA) require
> @@ -384,6 +385,7 @@ config ACPI_EXTLOG
>  
>  	  Enhanced MCA Logging allows firmware to provide additional error
>  	  information to system software, synchronous with MCE or CMCI. This
> -	  driver adds support for that functionality.
> +	  driver adds support for that functionality with corresponding
> +	  tracepoint which carries that information to userspace.
>  
>  endif	# ACPI
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index c4a5d87..8815b73 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -16,6 +16,7 @@
>  #include <asm/mce.h>
>  
>  #include "apei/apei-internal.h"
> +#include <ras/ras_event.h>
>  
>  #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
>  
> @@ -43,6 +44,9 @@ struct extlog_l1_head {
>  
>  static int old_edac_report_status;
>  
> +static char mem_location[CPER_REC_LEN];
> +static char dimm_location[CPER_REC_LEN];
> +
>  static u8 extlog_dsm_uuid[] __initdata = "663E35AF-CC10-41A4-88EA-5470AF055295";
>  
>  /* L1 table related physical address */
> @@ -69,6 +73,30 @@ static u32 l1_percpu_entry;
>  #define ELOG_ENTRY_ADDR(phyaddr) \
>  	(phyaddr - elog_base + (u8 *)elog_addr)
>  
> +static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
> +			       u32 err_number, u8 severity,
> +			       struct cper_sec_mem_err *mem)
> +{
> +	u8 etype = ~0, pa_mask_lsb = ~0;
> +	u64 pa = ~0ull;
> +
> +	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +		etype = mem->error_type;
> +
> +	if (mem->validation_bits & CPER_MEM_VALID_PA)
> +		pa = mem->physical_addr;
> +
> +	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> +		pa_mask_lsb = (u8)__ffs64(mem->physical_addr_mask);
> +
> +	memset(mem_location, 0, CPER_REC_LEN);
> +	cper_mem_err_location(mem, mem_location);
> +	memset(dimm_location, 0, CPER_REC_LEN);
> +	cper_dimm_err_location(mem, dimm_location);
> +	trace_extlog_mem_event(err_number, etype, severity, pa, pa_mask_lsb,
> +			       fru_id, dimm_location, mem_location, fru_text);

This seems like a lot of work for a tracepoint. Why all the strings?
Ideally, you want to record in the fast path the minimum raw data and
reconstruct it at the time it is read.

> +}
> +
>  static struct acpi_generic_status *extlog_elog_entry_check(int cpu, int bank)
>  {
>  	int idx;
> @@ -137,8 +165,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	struct mce *mce = (struct mce *)data;
>  	int	bank = mce->bank;
>  	int	cpu = mce->extcpu;
> -	struct acpi_generic_status *estatus;
> -	int rc;
> +	struct acpi_generic_status *estatus, *tmp;
> +	struct acpi_generic_data *gdata;
> +	const uuid_le *fru_id = &NULL_UUID_LE;
> +	char *fru_text = "";
> +	uuid_le *sec_type;
> +	static u32 err_number;
>  
>  	estatus = extlog_elog_entry_check(cpu, bank);
>  	if (estatus == NULL)
> @@ -148,7 +180,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	/* clear record status to enable BIOS to update it again */
>  	estatus->block_status = 0;
>  
> -	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
> +	tmp = (struct acpi_generic_status *)elog_buf;
> +	print_extlog_rcd(NULL, tmp, cpu);
> +
> +	/* log event via trace */
> +	err_number++;
> +	gdata = (struct acpi_generic_data *)(tmp + 1);
> +	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
> +		fru_id = (uuid_le *)gdata->fru_id;
> +	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
> +		fru_text = gdata->fru_text;
> +	sec_type = (uuid_le *)gdata->section_type;
> +	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
> +		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
> +		if (gdata->error_data_length >= sizeof(*mem_err))
> +			__trace_mem_error(fru_id, fru_text, err_number,
> +					  (u8)gdata->error_severity, mem_err);
> +	}
>  
>  	return NOTIFY_STOP;
>  }
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 4cac43a..da227a3 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -23,4 +23,5 @@ static int __init ras_init(void)
>  }
>  subsys_initcall(ras_init);
>  
> +EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index acbcbb8..4d3bc92 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -9,6 +9,65 @@
>  #include <linux/edac.h>
>  #include <linux/ktime.h>
>  #include <linux/aer.h>
> +#include <linux/cper.h>
> +
> +
> +/*
> + * MCE Extended Error Log trace event
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event.
> + */
> +
> +/* memory trace event */
> +
> +TRACE_EVENT(extlog_mem_event,
> +	TP_PROTO(u32 error_number,
> +		 u8 etype,
> +		 u8 severity,
> +		 u64 pa,
> +		 u8 pa_mask_lsb,
> +		 const uuid_le *fru_id,
> +		 const char *dimm_info,
> +		 const char *mem_loc,
> +		 const char *fru_text),
> +
> +	TP_ARGS(error_number, etype, severity, pa, pa_mask_lsb, fru_id,
> +		dimm_info, mem_loc, fru_text),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, error_number)
> +		__field(u8, etype)
> +		__field(u8, severity)
> +		__field(u64, pa)
> +		__field(u8, pa_mask_lsb)
> +		__string(dimm_info, dimm_info)
> +		__string(mem_loc, mem_loc)
> +		__dynamic_array(char, fru, CPER_REC_LEN)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->error_number = error_number;
> +		__entry->etype = etype;
> +		__entry->severity = severity;
> +		__entry->pa = pa;
> +		__entry->pa_mask_lsb = pa_mask_lsb;
> +		__assign_str(dimm_info, dimm_info);
> +		__assign_str(mem_loc, mem_loc);
> +		snprintf(__get_dynamic_array(fru), CPER_REC_LEN - 1,
> +			 "FRU: %pUl %.20s", fru_id, fru_text);

For example, here don't use snprintf(). Save that processing for the
TP_printk(), as that is done at time of read. Again, only store the
minimum raw data, and reconstruct it later. Why slow down the fast path?
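
A rough user-space sketch of that split, not the eventual kernel code:
the record keeps the raw FRU bytes and the formatting runs only when the
record is read. struct fru_record, fru_record_store() and
fru_record_render() are hypothetical names, and the UUID byte order is
my understanding of how "%pUl" prints a little-endian UUID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FRU_TEXT_LEN 20	/* matches the "%.20s" precision in the patch */

/* Hypothetical record layout: raw bytes only, nothing preformatted. */
struct fru_record {
	uint8_t id[16];			/* raw little-endian UUID bytes */
	char text[FRU_TEXT_LEN + 1];	/* NUL-terminated copy of fru_text */
};

/* Fast path: two bounded copies, no snprintf(). */
static void fru_record_store(struct fru_record *rec, const uint8_t id[16],
			     const char *text)
{
	assert(rec != NULL && id != NULL && text != NULL);
	memcpy(rec->id, id, sizeof(rec->id));
	strncpy(rec->text, text, FRU_TEXT_LEN);
	rec->text[FRU_TEXT_LEN] = '\0';
}

/* Slow path, run at read time: build the "FRU: <uuid> <text>" string. */
static int fru_record_render(const struct fru_record *rec, char *buf, size_t len)
{
	/* Byte order assumed for little-endian UUID printing. */
	static const uint8_t le[16] = { 3, 2, 1, 0, 5, 4, 7, 6,
					8, 9, 10, 11, 12, 13, 14, 15 };
	char uuid[37];	/* 32 hex digits + 4 dashes + NUL */
	int i, pos = 0;

	for (i = 0; i < 16; i++) {
		pos += sprintf(uuid + pos, "%02x", rec->id[le[i]]);
		if (i == 3 || i == 5 || i == 7 || i == 9)
			uuid[pos++] = '-';
	}
	uuid[pos] = '\0';

	return snprintf(buf, len, "FRU: %s %s", uuid, rec->text);
}
```

The store side costs a fixed 36 bytes per record and no format parsing;
everything string-shaped happens in the reader's context.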

-- Steve

> +	),
> +
> +	TP_printk("%d %s error: %s %s physical addr: %016llx (mask lsb: %x), %s%s",
> +		  __entry->error_number,
> +		  cper_severity_str(__entry->severity),
> +		  cper_mem_err_type_str(__entry->etype),
> +		  __get_str(dimm_info),
> +		  __entry->pa,
> +		  __entry->pa_mask_lsb,
> +		  __get_str(mem_loc),
> +		  __get_str(fru))
> +);
>  
>  /*
>   * Hardware Events Report

* Re: new trace output format
  2014-05-28  3:32 ` new trace output format Chen, Gong
  2014-05-28  3:32   ` [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface Chen, Gong
@ 2014-05-28 16:23   ` Borislav Petkov
  1 sibling, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-28 16:23 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, linux-acpi

On Tue, May 27, 2014 at 11:32:17PM -0400, Chen, Gong wrote:
> Hi, Boris
> 
> Here is the new output format:
> 
>           <idle>-0     [000] d.h.   242.340120: extlog_mem_event: 1 corrected error: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 0000000441a07000 (mask lsb: ff), node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 32336 column: 1056 FRU: 00000000-0000-0000-0000-000000000000
>           <idle>-0     [000] d.h.   349.380406: extlog_mem_event: 2 corrected error: unknown DIMM location: Memriser1 CHANNEL A DIMM 0 physical addr: 00000004416ae000 (mask lsb: ff), node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 31802 column: 1064 FRU: 00000000-0000-0000-0000-000000000000
> 
> I added an extra "physical addr:" prefix. If you don't like it, please feel free to remove it.

That's fine - it is not part of the error record anyway.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28 15:28     ` Steven Rostedt
@ 2014-05-28 16:34       ` Borislav Petkov
  2014-05-28 16:56         ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Borislav Petkov @ 2014-05-28 16:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Chen, Gong, tony.luck, m.chehab, linux-acpi, LKML

On Wed, May 28, 2014 at 11:28:32AM -0400, Steven Rostedt wrote:
> > +static void __trace_mem_error(const uuid_le *fru_id, char *fru_text,
> > +			       u32 err_number, u8 severity,
> > +			       struct cper_sec_mem_err *mem)
> > +{
> > +	u8 etype = ~0, pa_mask_lsb = ~0;
> > +	u64 pa = ~0ull;
> > +
> > +	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> > +		etype = mem->error_type;
> > +
> > +	if (mem->validation_bits & CPER_MEM_VALID_PA)
> > +		pa = mem->physical_addr;
> > +
> > +	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> > +		pa_mask_lsb = (u8)__ffs64(mem->physical_addr_mask);
> > +
> > +	memset(mem_location, 0, CPER_REC_LEN);
> > +	cper_mem_err_location(mem, mem_location);
> > +	memset(dimm_location, 0, CPER_REC_LEN);
> > +	cper_dimm_err_location(mem, dimm_location);
> > +	trace_extlog_mem_event(err_number, etype, severity, pa, pa_mask_lsb,
> > +			       fru_id, dimm_location, mem_location, fru_text);
> 
> This seems like a lot of work for a tracepoint. Why all the strings?
> Ideally, you want to record in the fast path the minimum raw data and
> reconstruct it at the time it is read.

Well, they're constructed from a bunch of values which are checked for
validity first:

http://lkml.kernel.org/r/1400142646-10127-4-git-send-email-gong.chen@linux.intel.com

We probably could get rid of the fru* things by reading them out from
ACPI and enumerating them and issuing only an index here which the slow
path decodes.

The others we can split into fields again which should definitely make
the record smaller. Fields are defined in struct cper_sec_mem_err.

The thing is, this TP needs to be designed properly before we expose it
to the world and so that hopefully most future uses are covered.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28 16:34       ` Borislav Petkov
@ 2014-05-28 16:56         ` Steven Rostedt
  2014-05-29  7:43           ` Chen, Gong
  2014-05-30  9:22           ` Chen, Gong
  0 siblings, 2 replies; 53+ messages in thread
From: Steven Rostedt @ 2014-05-28 16:56 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Chen, Gong, tony.luck, m.chehab, linux-acpi, LKML

On Wed, 28 May 2014 18:34:52 +0200
Borislav Petkov <bp@alien8.de> wrote:
 
> Well, they're constructed from a bunch of values which are checked for
> validity first:
> 
> http://lkml.kernel.org/r/1400142646-10127-4-git-send-email-gong.chen@linux.intel.com

OK, looks like you are saving a bunch of integers.

> 
> We probably could get rid of the fru* things by reading them out from
> ACPI and enumerating them and issuing only an index here which the slow
> path decodes.
> 
> The others we can split into fields again which should definitely make
> the record smaller. Fields are defined in struct cper_sec_mem_err.
> 
> The thing is, this TP needs to be designed properly before we expose it
> to the world and so that hopefully most future uses are covered.

My concern is passing in a large string and wasting a lot of the ring
buffer space. The max you can hold per event is just under a page size
(4k). And all these strings add up. If it happens to be 512bytes, then
you end up with one event per page.

Instead of making that a huge string, what about a dynamic array of
special structures?


struct __attribute__((__packed__)) cper_sec_mem_rec {
	short type;
	int data;
};


static struct cper_sec_mem_rec mem_location[CPER_REC_LEN];

then have the:

 	if (mem->validation_bits & CPER_MEM_VALID_NODE) {
		msg[n].type = CPER_MEM_VALID_NODE_TYPE;
		msg[n++].data = mem->node;
	}
	if (mem->validation_bits & CPER_MEM_VALID_CARD) {
		msg[n].type = CPER_MEM_VALID_CARD_TYPE;
		msg[n++].data = mem->card;
	}
	if (mem->validation_bits & CPER_MEM_VALID_MODULE) {
[ and so on ]


Then return the array of msg and the length and you can record this
into the dynamic array in the tracepoint.

Then you have the type and data saved, the TP_printk() can parse it.
You can create your own handler for it to do it nicely (see the ftrace
version of __print_symbol() and friends).

Makes the tracepoint much more compact.

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28 16:56         ` Steven Rostedt
@ 2014-05-29  7:43           ` Chen, Gong
  2014-05-29 10:35             ` Borislav Petkov
  2014-05-29 13:12             ` Steven Rostedt
  2014-05-30  9:22           ` Chen, Gong
  1 sibling, 2 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-29  7:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Borislav Petkov, tony.luck, m.chehab, linux-acpi, LKML

[-- Attachment #1: Type: text/plain, Size: 2435 bytes --]

On Wed, May 28, 2014 at 12:56:25PM -0400, Steven Rostedt wrote:
> My concern is passing in a large string and wasting a lot of the ring
> buffer space. The max you can hold per event is just under a page size
> (4k). And all these strings add up. If it happens to be 512bytes, then
> you end up with one event per page.
I don't understand why you say this wastes memory. I just pass
a char *, not a string array. And most of the time these strings are only
partially full: about 1/5 to 1/4 of the space is used.

> 
> Instead of making that a huge string, what about a dynamic array of
> special structures?
> 
> 
> struct __attribute__((__packed__)) cper_sec_mem_rec {
> 	short type;
> 	int data;
> };
> 
> 
> static struct cper_sec_mem_rec mem_location[CPER_REC_LEN];
> 
> then have the:
> 
>  	if (mem->validation_bits & CPER_MEM_VALID_NODE) {
> 		msg[n].type = CPER_MEM_VALID_NODE_TYPE;
> 		msg[n++].data = mem->node;
> 	}
> 	if (mem->validation_bits & CPER_MEM_VALID_CARD) {
> 		msg[n].type = CPER_MEM_VALID_CARD_TYPE;
> 		msg[n++].data = mem->card;
> 	}
> 	if (mem->validation_bits & CPER_MEM_VALID_MODULE) {
> [ and so on ]
> 

This function is not only for perf but also for dmesg. So the key is how
to handle the two strings: dimm_location and mem_location.

I read some __print_symbolic implementations like btrfs trace,

#define show_ref_type(type)                                     \
        __print_symbolic(type,                                  \
        { BTRFS_TREE_BLOCK_REF_KEY,     "TREE_BLOCK_REF" },     \
        { BTRFS_EXTENT_DATA_REF_KEY,    "EXTENT_DATA_REF" },    \
        { BTRFS_EXTENT_REF_V0_KEY,      "EXTENT_REF_V0" },      \
        { BTRFS_SHARED_BLOCK_REF_KEY,   "SHARED_BLOCK_REF" },   \
        { BTRFS_SHARED_DATA_REF_KEY,    "SHARED_DATA_REF" })

So for this case, maybe we need a macro like:

#define show_dimm_location(type)                                \
        __print_symbolic(type,                                  \
        { CPER_MEM_VALID_NODE,          "node" },               \
        { CPER_MEM_VALID_CARD,          "card" },               \
        { CPER_MEM_VALID_MODULE,        "module" },             \
        ...

IMO, it is just another implementation method, maybe more graceful,
but I don't see how it can save space. Again, the original functions
work for both trace and dmesg. If we add such an interface, it looks
a little redundant.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-29  7:43           ` Chen, Gong
@ 2014-05-29 10:35             ` Borislav Petkov
  2014-05-29 13:12             ` Steven Rostedt
  1 sibling, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-29 10:35 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Steven Rostedt, tony.luck, m.chehab, linux-acpi, LKML

On Thu, May 29, 2014 at 03:43:45AM -0400, Chen, Gong wrote:
> On Wed, May 28, 2014 at 12:56:25PM -0400, Steven Rostedt wrote:
> > My concern is passing in a large string and wasting a lot of the ring
> > buffer space. The max you can hold per event is just under a page size
> > (4k). And all these strings add up. If it happens to be 512bytes, then
> > you end up with one event per page.
> I just don't understand why you say wasting memory. I just pass
> a char * not a string array.

And this char * points to where?

+               __string(dimm_info, dimm_info)
+               __string(mem_loc, mem_loc)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-29  7:43           ` Chen, Gong
  2014-05-29 10:35             ` Borislav Petkov
@ 2014-05-29 13:12             ` Steven Rostedt
  2014-05-30  2:56               ` Chen, Gong
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2014-05-29 13:12 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Borislav Petkov, tony.luck, m.chehab, linux-acpi, LKML

On Thu, 29 May 2014 03:43:45 -0400
"Chen, Gong" <gong.chen@linux.intel.com> wrote:

> On Wed, May 28, 2014 at 12:56:25PM -0400, Steven Rostedt wrote:
> > My concern is passing in a large string and wasting a lot of the ring
> > buffer space. The max you can hold per event is just under a page size
> > (4k). And all these strings add up. If it happens to be 512bytes, then
> > you end up with one event per page.
> I just don't understand why you say wasting memory. I just pass
> a char * not a string array. And most of time these strings are partial full,
> about 1/5 ~ 1/4 spaces are used.

What do you think gets recorded in the ring buffer? The pointer to the
string? No! You copy the entire string into the ring buffer, with
markers and all. How big is that string? 60 chars? 80? I see you
recording meta data there too:

 	if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION)
-		pr_debug("bit_position: %d\n", mem->bit_pos);
+		n += scnprintf(msg + n, len - n, "bit_position: %d ",
+			       mem->bit_pos);

You record "bit_position: <number>\n"

That by itself is 15 characters not counting the number of characters
used to write the number. All you need to record for that is a two byte
type (perhaps even one byte) and a int (4 bytes) for a total of 6
bytes.  And this is just one example, you have this for all the cases.
That will fill up fast!
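
A minimal user-space sketch of that size comparison, assuming GCC or
Clang on a platform where short is 2 bytes and int is 4 bytes. The
helper names bit_pos_text_len() and bit_pos_packed_len() are
hypothetical and exist only for this illustration:

```c
#include <stdio.h>
#include <stddef.h>

/* Packed (type, data) pair as proposed earlier in this thread. */
struct __attribute__((__packed__)) cper_sec_mem_rec {
	short type;
	int data;
};

/* Bytes the text form occupies, not counting the terminating NUL. */
static size_t bit_pos_text_len(int bit_pos)
{
	/* snprintf() with a NULL buffer and size 0 only computes the length. */
	return (size_t)snprintf(NULL, 0, "bit_position: %d ", bit_pos);
}

/* Bytes the packed form occupies; this does not depend on the value. */
static size_t bit_pos_packed_len(void)
{
	return sizeof(struct cper_sec_mem_rec);
}
```

Calling bit_pos_text_len() for a one-digit value and comparing it with
bit_pos_packed_len() shows the per-field gap, which repeats for every
valid field in the record.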

> 
> > 
> > Instead of making that a huge string, what about a dynamic array of
> > special structures?
> > 
> > 
> > struct __attribute__((__packed__)) cper_sec_mem_rec {
> > 	short type;
> > 	int data;
> > };
> > 
> > 
> > static struct cper_sec_mem_rec mem_location[CPER_REC_LEN];
> > 
> > then have the:
> > 
> >  	if (mem->validation_bits & CPER_MEM_VALID_NODE) {
> > 		msg[n].type = CPER_MEM_VALID_NODE_TYPE;
> > 		msg[n++].data = mem->node;
> > 	}
> > 	if (mem->validation_bits & CPER_MEM_VALID_CARD) {
> > 		msg[n].type = CPER_MEM_VALID_CARD_TYPE;
> > 		msg[n++].data = mem->card;
> > 	}
> > 	if (mem->validation_bits & CPER_MEM_VALID_MODULE) {
> > [ and so on ]
> > 
> 
> This function is not only for perf but for dmesg. So key is how
> to handle two strings: dimm_location and mem_location.
> 
> I read some __print_symbolic implementations like btrfs trace,
> 
> #define show_ref_type(type)                                     \
>         __print_symbolic(type,                                  \
>         { BTRFS_TREE_BLOCK_REF_KEY,     "TREE_BLOCK_REF" },     \
>         { BTRFS_EXTENT_DATA_REF_KEY,    "EXTENT_DATA_REF" },    \
>         { BTRFS_EXTENT_REF_V0_KEY,      "EXTENT_REF_V0" },      \
>         { BTRFS_SHARED_BLOCK_REF_KEY,   "SHARED_BLOCK_REF" },   \
>         { BTRFS_SHARED_DATA_REF_KEY,    "SHARED_DATA_REF" })
> 
> So for this case, maybe we need a macro like:
> 
> #define show_dimm_location(type)                                \
>         __print_symbolic(type,                                  \
>         { CPER_MEM_VALID_NODE,          "node" },               \
>         { CPER_MEM_VALID_CARD,          "card" },               \
>         { CPER_MEM_VALID_MODULE,        "module" },             \
>         ...
> 
> IMO, it is just another implementation method, maybe more graceful,
> but I don't know how it can save space. Again, original functions
> work both for trace and dmesg. If we add such interface, it looks
> a little bit repeated.

I'm not sure you could use that, as you save an array of data. The
print_symbolic() won't work with an array.

You can still use the same code for both the tracepoint (perf and
ftrace) and for dmesg. You need to write a packed array that is
returned as well as a way to convert that array into a human readable
string for later processing. The dmesg version would just have them
both together whereas the tracepoint records the packed version on the
ring buffer and the TP_printk() will use the extraction.

That is, dmesg has the compress and extraction in one place where the
tracepoint has them in two different places.
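
A rough user-space sketch of that split, assuming three location
fields only. The VALID_* bits, TYPE_* codes, struct mem_err and both
function names are hypothetical stand-ins, not the real CPER
definitions:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical validity bits and type codes, for illustration only. */
enum { VALID_NODE = 1 << 0, VALID_CARD = 1 << 1, VALID_MODULE = 1 << 2 };
enum { TYPE_NODE, TYPE_CARD, TYPE_MODULE, TYPE_MAX };

/* Stand-in for the raw error section. */
struct mem_err {
	unsigned int validation_bits;
	int node, card, module;
};

struct __attribute__((__packed__)) mem_rec {
	short type;
	int data;
};

/* "Compress": one packed entry per valid field; returns the entry count. */
static int mem_err_pack(const struct mem_err *mem, struct mem_rec *rec)
{
	int n = 0;

	if (mem->validation_bits & VALID_NODE) {
		rec[n].type = TYPE_NODE;
		rec[n++].data = mem->node;
	}
	if (mem->validation_bits & VALID_CARD) {
		rec[n].type = TYPE_CARD;
		rec[n++].data = mem->card;
	}
	if (mem->validation_bits & VALID_MODULE) {
		rec[n].type = TYPE_MODULE;
		rec[n++].data = mem->module;
	}
	return n;
}

/* "Extract": packed entries back to text; returns the length written. */
static int mem_err_unpack(const struct mem_rec *rec, int n,
			  char *buf, size_t len)
{
	static const char * const names[TYPE_MAX] = { "node", "card", "module" };
	size_t off = 0;
	int i, w;

	if (len)
		buf[0] = '\0';
	for (i = 0; i < n && off < len; i++) {
		if (rec[i].type < 0 || rec[i].type >= TYPE_MAX)
			continue;	/* skip unknown types */
		w = snprintf(buf + off, len - off, "%s: %d ",
			     names[rec[i].type], rec[i].data);
		if (w < 0)
			break;
		if ((size_t)w >= len - off) {
			off = len - 1;	/* output was truncated */
			break;
		}
		off += (size_t)w;
	}
	return (int)off;
}
```

In this sketch a dmesg-style caller would run mem_err_pack() and
mem_err_unpack() back to back, while a tracepoint would store only the
packed array and run the extraction when the event is read.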

Understand?

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-29 13:12             ` Steven Rostedt
@ 2014-05-30  2:56               ` Chen, Gong
  0 siblings, 0 replies; 53+ messages in thread
From: Chen, Gong @ 2014-05-30  2:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Borislav Petkov, tony.luck, m.chehab, linux-acpi, LKML

[-- Attachment #1: Type: text/plain, Size: 1841 bytes --]

On Thu, May 29, 2014 at 09:12:51AM -0400, Steven Rostedt wrote:
> What do you think gets recorded in the ring buffer? The pointer to the
> string? No! You copy the entire string into the ring buffer, with
> markers and all. How big is that string? 60 chars? 80? I see you
> recording meta data there too:
I'm not a tracing expert, so please correct me if I'm wrong. It looks
like all trace data should be pushed into the ring buffer as raw
material, and when it needs to be printed via TP_printk, the raw data
is expanded (unzipped, as you put it) into another temporary buffer,
maybe the kernel printk buffer directly.

[...]
> > #define show_dimm_location(type)                                \
> >         __print_symbolic(type,                                  \
> >         { CPER_MEM_VALID_NODE,          "node" },               \
> >         { CPER_MEM_VALID_CARD,          "card" },               \
> >         { CPER_MEM_VALID_MODULE,        "module" },             \
> >         ...
> > 
> 
> I'm not sure you could use that, as you save an array of data. The
> print_symbolic() won't work with an array.
Just FWIW, one could call it again and again. But that looks too
awkward, so I will use another style.

> 
> You can still use the same code for both the tracepoint (perf and
> ftrace) and for dmesg. You need to write a packed array that is
> returned as well as a way to convert that array into a human readable
> string for later processing. The dmesg version would just have them
> both together whereas the tracepoint records the packed version on the
> ring buffer and the TP_printk() will use the extraction.
> 
> That is, dmesg has the compress and extraction in one place where the
> tracepoint has them in two different places.
> 
> Understand?
Yes, I feel the same way. Zip/unzip is unavoidable.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-28 16:56         ` Steven Rostedt
  2014-05-29  7:43           ` Chen, Gong
@ 2014-05-30  9:22           ` Chen, Gong
  2014-05-30 10:07             ` Borislav Petkov
  1 sibling, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-05-30  9:22 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Borislav Petkov, tony.luck, m.chehab, linux-acpi, LKML

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

On Wed, May 28, 2014 at 12:56:25PM -0400, Steven Rostedt wrote:
> Instead of making that a huge string, what about a dynamic array of
> special structures?
> 
> 
> struct __attribute__((__packed__)) cper_sec_mem_rec {
> 	short type;
> 	int data;
> };
> 
> 
Hi, Steven & Boris

We have two big strings: one for memory error location, the other
for DIMM error location. Since DIMM error location depends on some
other conditions, how about just converting memory error location
to a compact mode but leaving DIMM error location alone? 

For memory error location, I will utilize type offset to save one
more byte, furthermore, I want to drop requestor_id, responder_id
and target_id. 1) They are very rare (I've never seen them by now)
2) They are u64 but not u16. So to keep whole struct clean I want
to use following struct. We can extend it later when necessary.

struct __attribute__((__packed__)) cper_sec_mem_rec {
	u8 type;
	u16 data;
};

So whole struct is just 3 bytes. Even if all fields are valid, we
have 3 * 9 = 27 bytes in total for a record in the ring buffer.
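
A small sketch to check that arithmetic in user space, assuming GCC or
Clang. The u8/u16 typedefs stand in for the kernel types, and
MEM_LOC_FIELDS and mem_loc_record_bytes() are hypothetical names used
only here:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint8_t u8;	/* stand-ins for the kernel's fixed-width types */
typedef uint16_t u16;

struct __attribute__((__packed__)) cper_sec_mem_rec {
	u8 type;
	u16 data;
};

/* Number of location fields assumed in the calculation above. */
#define MEM_LOC_FIELDS 9

/* Ring buffer bytes needed when nfields entries are valid. */
static size_t mem_loc_record_bytes(size_t nfields)
{
	return nfields * sizeof(struct cper_sec_mem_rec);
}
```

Without the packed attribute a typical ABI would pad the struct to
4 bytes, so the attribute is what keeps each entry at 3 bytes.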

Make sense?

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-30  9:22           ` Chen, Gong
@ 2014-05-30 10:07             ` Borislav Petkov
  2014-05-30 21:16               ` Tony Luck
  2014-05-30 23:03                 ` Luck, Tony
  0 siblings, 2 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-30 10:07 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Steven Rostedt, tony.luck, m.chehab, linux-acpi, LKML

On Fri, May 30, 2014 at 05:22:32AM -0400, Chen, Gong wrote:
> We have two big chunk string. One for memory error location, the other
> for DIMM error location. Since DIMM error location depends on some
> other conditions, how about just converting memory error location to a
> compact mode but leaving DIMM error location alone?

Please elaborate, what conditions? DIMM silk screen labels or so? Maybe
we can generate a mapping between text labels and indices and we can
dump the indices in the tracepoint and do the mapping back to strings in
userspace...?

> For memory error location, I will utilize type offset to save one
> more byte, furthermore, I want to drop requestor_id, responder_id
> and target_id. 1) They are very rare (I've never seen them by now)

My concern is, are we sure we're never going to need them at all? Tony,
what's your take on this?

> 2) They are u64 but not u16. So to keep whole struct clean I want
> to use following struct. We can extend it later when necessary.
> 
> struct __attribute__((__packed__)) cper_sec_mem_rec {
> 	u8 type;
> 	u16 data;
> };
> 
> So whole struct is just 3 bytes. Even if all fields are valid, we
> have 3 * 9 = 27 bytes in total for a record in the ring buffer.
> 
> Make sense?

That is definitely much better than what we have now.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-30 10:07             ` Borislav Petkov
@ 2014-05-30 21:16               ` Tony Luck
  2014-05-30 21:26                 ` Borislav Petkov
  2014-05-30 23:03                 ` Luck, Tony
  1 sibling, 1 reply; 53+ messages in thread
From: Tony Luck @ 2014-05-30 21:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Chen, Gong, Steven Rostedt, m.chehab, linux-acpi, LKML

On Fri, May 30, 2014 at 3:07 AM, Borislav Petkov <bp@alien8.de> wrote:
> Please elaborate, what conditions? DIMM silk screen labels or so? Maybe
> we can generate a mapping between text labels and indices and we can
> dump the indices in the tracepoint and do the mapping back to strings in
> userspace...?

The UEFI error record gives us the SMBIOS "handle" (2-byte index). We
use that to look up the bank and device locator strings ... which should be
the silk-screen labels (in a correctly written BIOS).

So we could just have the tracepoint save the "handle" and do the
decode later.  If we want to keep doing mappings in the kernel (so console
logs can say "DIMM location: CPU 0 DIMM_C1" rather than
"SMBIOS handle 0x0015") - and would like to make things easier
for ourselves - we could have dmi_memdev_walk() do a bit more
work so we can just index an allocated array of strings that are
the concatenation of the bank/device locators.
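
A user-space sketch of such a handle-to-label table. Everything here is
hypothetical (memdev_add(), memdev_lookup(), the table sizes), and it
searches by handle linearly instead of indexing the array directly:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_MEMDEV 64	/* arbitrary capacity for this sketch */
#define LABEL_LEN  64

struct memdev_label {
	uint16_t handle;		/* 2-byte SMBIOS handle */
	char label[LABEL_LEN];		/* "<bank locator> <device locator>" */
};

static struct memdev_label memdev_table[MAX_MEMDEV];
static size_t memdev_count;

/* Called once per memory device while walking the DMI table. */
static int memdev_add(uint16_t handle, const char *bank, const char *device)
{
	struct memdev_label *e;

	if (memdev_count >= MAX_MEMDEV)
		return -1;
	e = &memdev_table[memdev_count++];
	e->handle = handle;
	snprintf(e->label, sizeof(e->label), "%s %s", bank, device);
	return 0;
}

/* Slow-path decode: map a recorded handle back to its label. */
static const char *memdev_lookup(uint16_t handle)
{
	size_t i;

	for (i = 0; i < memdev_count; i++)
		if (memdev_table[i].handle == handle)
			return memdev_table[i].label;
	return NULL;	/* unknown handle */
}
```

With a table like this the tracepoint only needs to store the 2-byte
handle, and the label is resolved when the event is printed.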

-Tony

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-30 21:16               ` Tony Luck
@ 2014-05-30 21:26                 ` Borislav Petkov
  0 siblings, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-05-30 21:26 UTC (permalink / raw)
  To: Tony Luck; +Cc: Chen, Gong, Steven Rostedt, m.chehab, linux-acpi, LKML

On Fri, May 30, 2014 at 02:16:06PM -0700, Tony Luck wrote:
> On Fri, May 30, 2014 at 3:07 AM, Borislav Petkov <bp@alien8.de> wrote:
> > Please elaborate, what conditions? DIMM silk screen labels or so? Maybe
> > we can generate a mapping between text labels and indices and we can
> > dump the indices in the tracepoint and do the mapping back to strings in
> > userspace...?
> 
> The UEFI error record gives us the SMBIOS "handle" (2-byte index). We
> use that to look up the bank and device locator strings ... which should be
> the silk-screen labels (in a correctly written BIOS).

Ok, sounds straightforward.

> So we could just have the tracepoint save the "handle" and do the
> decode later. If we want to keep doing mappings in the kernel (so
> console logs can say "DIMM location: CPU 0 DIMM_C1" rather than
> "SMBIOS handle 0x0015") - and would like to make things easier for
> ourselves - we could have dmi_memdev_walk() do a bit more work
> so we can just index an allocated array of strings that are the
> concatenation of the bank/device locators.

Right, we probably would need both: if a userspace agent is active, it'd
need to resolve the handle and otherwise we'll be doing it in the kernel
with dmi.

But it all sounds very doable and nice to me.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-30 10:07             ` Borislav Petkov
@ 2014-05-30 23:03                 ` Luck, Tony
  2014-05-30 23:03                 ` Luck, Tony
  1 sibling, 0 replies; 53+ messages in thread
From: Luck, Tony @ 2014-05-30 23:03 UTC (permalink / raw)
  To: Borislav Petkov, Chen, Gong; +Cc: Steven Rostedt, m.chehab, linux-acpi, LKML

>> For memory error location, I will utilize type offset to save one
>> more byte, furthermore, I want to drop requestor_id, responder_id
>> and target_id. 1) They are very rare (I've never seen them by now)
>
> My concern is, are we sure we're never going to need them at all? Tony,
> what's your take on this?

They may seem rare because our BIOS doesn't bother to provide them.
Other BIOS writers may be more diligent.

I flip-flop on the issue of how much detail to log.  For the majority
of users it is enough to just point at the DIMM.  That's the thing that
they can easily replace.

But OEMs and large scale users often want to know every tiny detail
so they can look for patterns between errors reported across a large
fleet. So I hate to drop information on the floor that might be useful
to someone later.

All of this stuff only applies to server systems - so quibbling over
a handful of *bytes* in an error record on a system that has tens,
hundreds or even thousands of *gigabytes* of memory seems
a bit pointless.

-Tony

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-30 23:03                 ` Luck, Tony
  (?)
@ 2014-05-31  1:07                 ` Steven Rostedt
  2014-06-02 16:22                   ` Luck, Tony
  -1 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2014-05-31  1:07 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, Chen, Gong, m.chehab, linux-acpi, LKML

On Fri, 30 May 2014 23:03:27 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

 
> All of this stuff only applies to server systems - so quibbling over
> a handful of *bytes* in an error record on a system that has tens,
> hundreds or even thousands of *gigabytes* of memory seems
> a bit pointless.

But there's still only a limited number of bytes in the ring buffer no
matter what the system, thus we still need to quibble over it.

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-05-31  1:07                 ` Steven Rostedt
@ 2014-06-02 16:22                   ` Luck, Tony
  2014-06-02 16:57                     ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Luck, Tony @ 2014-06-02 16:22 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Borislav Petkov, Chen, Gong, m.chehab, linux-acpi, LKML

>> All of this stuff only applies to server systems - so quibbling over
>> a handful of *bytes* in an error record on a system that has tens,
>> hundreds or even thousands of *gigabytes* of memory seems
>> a bit pointless.
>
> But there's still only a limited number of bytes in the ring buffer no
> matter what the system, thus we still need to quibble over it.

To which I'll counter that the trace ring buffer can handle tracing of
events like page faults and context switches (can't it?) that happen
at a rate of thousands per second.  Our eMCA records will normally
happen at a rate of X per month (where X may well be less than one).
If there is a storm of errors - we disable CMCI interrupts and revert
to polling.  We declare a "storm" as just 15 events in a second. If we
switch to polling, then we won't poll faster than once per second.

So worst case is that we are seeing some steady flow of events that
don't quite trigger the storm detector ... about 14 events per second.

-Tony

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-02 16:22                   ` Luck, Tony
@ 2014-06-02 16:57                     ` Steven Rostedt
  2014-06-03  8:36                       ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2014-06-02 16:57 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, Chen, Gong, m.chehab, linux-acpi, LKML

On Mon, 2 Jun 2014 16:22:19 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

 
> To which I'll counter that the trace ring buffer can handle tracing of
> events like page faults and context switches (can't it?) that happen
> at a rate of thousands per second.  Our eMCA records will normally
> happen at a rate of X per month (where X may well be less than one).
> If there is a storm of errors - we disable CMCI interrupts and revert
> to polling.  We declare a "storm" as just 15 events in a second. If we
> switch to polling, then we won't poll faster than once per second.
> 
> So worst case is that we are seeing some steady flow of events that
> don't quite trigger the storm detector ... about 14 events per second.

Also matters how big you expect these events to be. If you get a
"christmas tree" set of flags, how big will that event grow with all
the descriptions attached?

The max event size after all headers is 4056 bytes. If you go over
that, the event is ignored.

-- Steve

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-02 16:57                     ` Steven Rostedt
@ 2014-06-03  8:36                       ` Chen, Gong
  2014-06-03 14:35                         ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-06-03  8:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

On Mon, Jun 02, 2014 at 12:57:48PM -0400, Steven Rostedt wrote:
> Also matters how big you expect these events to be. If you get a
> "christmas tree" set of flags, how big will that event grow with all
> the descriptions attached?
> 
> The max event size after all headers is 4056 bytes. If you go over
> that, the event is ignored.
> 
Hi, Steven

Normally, the length of one eMCA trace record is between 200 and 256 bytes.
When a CMCI storm happens, about 15 CMCI events are recorded before the
switch to poll mode. Because I don't rate-limit the trace, all of them
are recorded, so in a serious storm some records will be lost. But they
are repeated, similar records, so the loss may not be a big issue.

Back to how to print the trace record. To avoid wasting buffer space, I
need to format the data when TP_printk is called. The content to print
is an array of [name, value] pairs, but we don't know how many items are
valid. Here is the question: I can't create a dynamic printk format
like "%s %d, %s %d, ..." in TP_printk. So the only way I see is
printing them all, even if some of them are invalid, which means 12
groups of "%s %d", or something like "%.*s" to make the output format
graceful. Is this what we want?

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-03  8:36                       ` Chen, Gong
@ 2014-06-03 14:35                         ` Steven Rostedt
  2014-06-04 18:32                           ` Steven Rostedt
  2014-06-06  6:51                           ` Chen, Gong
  0 siblings, 2 replies; 53+ messages in thread
From: Steven Rostedt @ 2014-06-03 14:35 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML

On Tue, 3 Jun 2014 04:36:07 -0400
"Chen, Gong" <gong.chen@linux.intel.com> wrote:

> On Mon, Jun 02, 2014 at 12:57:48PM -0400, Steven Rostedt wrote:
> > Also matters how big you expect these events to be. If you get a
> > "christmas tree" set of flags, how big will that event grow with all
> > the descriptions attached?
> > 
> > The max event size after all headers is 4056 bytes. If you go over
> > that, the event is ignored.
> > 
> Hi, Steven
> 
> Normally, the length of one eMCA trace record is between 200 and 256 bytes.
> When a CMCI storm happens, about 15 CMCI events are recorded before the
> system switches to poll mode. Because I don't rate-limit the trace, all of
> them are supposed to be recorded, so in a serious storm some records will
> be lost. But they are repeated, similar records, so the *loss* is probably
> not a big issue.
> 
> Back to how to print the trace record. To avoid wasting buffer space, I need
> to format the data when TP_printk is called. The content to print is an
> array of [name, value] pairs, but we don't know how many items are valid.
> Here is the question: I can't create a dynamic printk format like
> "%s %d, %s %d, ..." in TP_printk. So the only way I can see is to print
> them all, even though some of them are invalid, which means 12 groups of
> "%s %d", or something like "%.*s" to keep the output format graceful.
> Is this what we want?

You can create a helper function to call (needs to be placed in a .c
file).

Note, there's a pointer to a trace_seq structure "p" that is available.
Hmm, I should add a get_dynamic_array_len(field), to give you the
length. I'll add that now. I also don't like the trace_seq being "p" as
that is too generic. Maybe I'll change that to "__trace_seq" or
something not so generic.


Anyway, have something like this:


	TP_printk("%s", emca_parse_events(p, __get_dynamic_array(field),
			__get_dynamic_array_len(field)));

I'll still need to add that __get_dynamic_array_len() helper. I'll send
you something tonight.

Then you write the emca_parse_events() as:


const char *emca_parse_events(struct trace_seq *p,
		struct cper_sec_mem_rec *data, int len)
{
	const char *ret = p->buffer + p->len;
	int i;

	len = len / sizeof(struct cper_sec_mem_rec);
	for (i = 0; i < len; i++) {
		switch (data[i].type) {
		case FOO:
			trace_seq_printf(p, "BAR: %d\n", data[i].data);
			break;
		[..]
		}
	}
	trace_seq_putc(p, '\0');

	return ret;
}

-- Steve
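
For readers who want to try the helper pattern outside the kernel, here is a
minimal userspace sketch of the same idea: remember where the sequence
buffer currently ends, append only the valid items, NUL-terminate, and return
the saved start so the caller can print it with a fixed "%s". Everything
named *_demo is a hypothetical stand-in (the real struct trace_seq and
trace_seq_printf() live in the kernel), and the [type, value] layout is only
an assumption for illustration.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct trace_seq (simplified, hypothetical). */
struct seq_demo {
	char buffer[256];
	size_t len;
};

/* Hypothetical [type, value] item; type 0 marks an unused slot in this sketch. */
struct item_demo {
	int type;
	int data;
};

/* Append formatted text at the current position; silently truncate when full. */
static void seq_demo_printf(struct seq_demo *p, const char *fmt, ...)
{
	size_t room = sizeof(p->buffer) - p->len;
	va_list ap;
	int n;

	if (room <= 1)
		return;
	va_start(ap, fmt);
	n = vsnprintf(p->buffer + p->len, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		p->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Same shape as the helper above: save start, append valid items, terminate. */
static const char *parse_items_demo(struct seq_demo *p,
				    const struct item_demo *data, size_t bytes)
{
	size_t i, n = bytes / sizeof(*data);
	const char *ret;

	if (p->len >= sizeof(p->buffer))
		return "";	/* no room left, not even for a terminator */
	ret = p->buffer + p->len;

	for (i = 0; i < n; i++) {
		switch (data[i].type) {
		case 1:
			seq_demo_printf(p, "node: %d ", data[i].data);
			break;
		case 2:
			seq_demo_printf(p, "rank: %d ", data[i].data);
			break;
		default:
			break;	/* invalid slot: print nothing */
		}
	}
	/* Terminate this chunk so the caller can hand it to "%s". */
	p->buffer[p->len++] = '\0';

	return ret;
}
```

Because the length is advanced past the terminator, a second call on the same
buffer starts a fresh string and leaves the earlier one intact.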


* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-03 14:35                         ` Steven Rostedt
@ 2014-06-04 18:32                           ` Steven Rostedt
  2014-06-06  6:51                           ` Chen, Gong
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2014-06-04 18:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chen, Gong, Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML

On Tue, 3 Jun 2014 10:35:44 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:


> I'll still need to add that __get_dynamic_array_len() helper. I'll send
> you something tonight.
> 

I got caught up in other work, but I wrote it this morning and I'm
adding it to my 3.16 queue. Thus, you can use this:

-- Steve

From beba4bb096201ceec0e8cfb7ce3172a53015bdaf Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Wed, 4 Jun 2014 14:29:33 -0400
Subject: [PATCH] tracing: Add __get_dynamic_array_len() macro for trace events

If a trace event uses a dynamic array for something other than a string
then there's currently no way the TP_printk() can figure out what size
it is. A __get_dynamic_array_len() is required to know the length.

This also simplifies the __get_bitmask() macro which required it as well,
but instead just hardcoded it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/ftrace.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 9b7a989..0fd06fe 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -203,6 +203,10 @@
 #define __get_dynamic_array(field)	\
 		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
 
+#undef __get_dynamic_array_len
+#define __get_dynamic_array_len(field)	\
+		((__entry->__data_loc_##field >> 16) & 0xffff)
+
 #undef __get_str
 #define __get_str(field) (char *)__get_dynamic_array(field)
 
@@ -211,7 +215,7 @@
 	({								\
 		void *__bitmask = __get_dynamic_array(field);		\
 		unsigned int __bitmask_size;				\
-		__bitmask_size = (__entry->__data_loc_##field >> 16) & 0xffff; \
+		__bitmask_size = __get_dynamic_array_len(field);	\
 		ftrace_print_bitmask_seq(p, __bitmask, __bitmask_size);	\
 	})
 
@@ -636,6 +640,7 @@ static inline void ftrace_test_probe_##call(void)			\
 #undef __print_symbolic
 #undef __print_hex
 #undef __get_dynamic_array
+#undef __get_dynamic_array_len
 #undef __get_str
 #undef __get_bitmask
 
@@ -700,6 +705,10 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
 #define __get_dynamic_array(field)	\
 		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
 
+#undef __get_dynamic_array_len
+#define __get_dynamic_array_len(field)	\
+		((__entry->__data_loc_##field >> 16) & 0xffff)
+
 #undef __get_str
 #define __get_str(field) (char *)__get_dynamic_array(field)
 
-- 
1.8.1.4



* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-03 14:35                         ` Steven Rostedt
  2014-06-04 18:32                           ` Steven Rostedt
@ 2014-06-06  6:51                           ` Chen, Gong
  2014-06-06 15:21                             ` Steven Rostedt
  1 sibling, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-06-06  6:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML


On Tue, Jun 03, 2014 at 10:35:44AM -0400, Steven Rostedt wrote:
> Note, there's a pointer to a trace_seq structure "p" that is available.
> Hmm, I should add a get_dynamic_array_len(field), to give you the
> length. I'll add that now. I also don't like the trace_seq being "p" as
> that is too generic. Maybe I'll change that to "__trace_seq" or
> something not so generic.
> 
> 
> Anyway, have something like this:
> 
> 
> 	TP_printk("%s", emca_parse_events(p, __get_dynamic_array(field),
> 			__get_dynamic_array_len(field)));
> 

Hi, Steven & other guys

Here is the new patch for eMCA trace. I am posting it here as an RFC patch
for discussion. Once it is OK, I will resend a complete new patch series.
Before showing the patch, I want to highlight a few things:

1. I don't use key/value pairs because they are not the most efficient
encoding: a record often has 9+ valid fields, which means at least 9 bytes
for the types (assuming the type is packed via __ffs). The original field,
a combination of bit flags in one u64, is enough to cover everything
without the *ffs* expense. As for fields with different lengths, storing
them at minimum cost is a disaster.

2. I want to share more code, such as the error decoding. If I used
key/value I would have to rewrite code and add more obscure enums,
definitions, etc. For now I just adopt a lightweight compact struct to save
ring buffer space. Please refer to the code.

3. I use the same buffer to store the decoded strings for the DIMM
location, the memory location, and even the intermediate trace result,
because in our current design trace output and dmesg output are mutually
exclusive, so there is no need to worry about potential contention.

One more thing: all I save is the raw data, so when it is read from user
space the TP_printk part can be bypassed and there is no extra storage
expense.
Am I right, Steven?
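
As a rough illustration of the size argument in point 1, the sketch below
mirrors struct cper_mem_err_compact from the patch with fixed-width userspace
types and counts how many one-byte type tags a [type, value] encoding would
need for a given validation bitmask. The names ending in _demo are
hypothetical, and the one-byte-per-tag cost is the assumption stated in
point 1, not a measured number.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct cper_mem_err_compact from the patch. */
struct mem_err_compact_demo {
	uint64_t validation_bits;
	uint16_t node;
	uint16_t card;
	uint16_t module;
	uint16_t bank;
	uint16_t device;
	uint16_t row;
	uint16_t column;
	uint16_t bit_pos;
	uint64_t requestor_id;
	uint64_t responder_id;
	uint64_t target_id;
	uint16_t rank;
	uint16_t mem_array_handle;
	uint16_t mem_dev_handle;
};

/*
 * Bytes spent on type tags by a [type, value] encoding, assuming one byte
 * per valid field: simply the number of set bits in the validation mask.
 */
static size_t kv_tag_bytes_demo(uint64_t validation_bits)
{
	size_t n = 0;

	while (validation_bits) {
		validation_bits &= validation_bits - 1;	/* clear lowest set bit */
		n++;
	}
	return n;
}

/* Bytes the compact layout spends on validity information: one u64. */
static size_t compact_tag_bytes_demo(void)
{
	return sizeof(((struct mem_err_compact_demo *)0)->validation_bits);
}
```

With nine valid fields the tag bytes alone already exceed the single u64
bitmask, which is the trade-off point 1 describes. The whole compact struct
is 54 bytes of fields, typically padded to 56 by the compiler.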

Here is the dmesg log:
[  267.627996] {1}Hardware error detected on CPU15
[  267.628007] {1}It has been corrected by h/w and requires no further action
[  267.628011] {1}event severity: corrected
[  267.628017] {1} Error 0, type: corrected
[  267.628022] {1}  section_type: memory error
[  267.628026] {1}  physical_address: 0x000000084f5d8000
[  267.628037] {1}  node: 2 card: 0 module: 0 rank: 0 bank: 0 row: 28717 column: 1528
[  267.628057] {1}  DIMM location: Memriser3 CHANNEL A DIMM 0
[  316.135078] {2}Hardware error detected on CPU15
[  316.135095] {2}It has been corrected by h/w and requires no further action
[  316.135102] {2}event severity: corrected
[  316.135107] {2} Error 0, type: corrected
[  316.135111] {2}  section_type: memory error
[  316.135114] {2}  physical_address: 0x000000084f6c0000
[  316.135120] {2}  node: 2 card: 0 module: 0 rank: 0 bank: 0 row: 28732 column: 1504
[  316.135124] {2}  DIMM location: Memriser3 CHANNEL A DIMM 0

Here is the trace log (when dmesg is disabled):
          <idle>-0     [015] d.h3   207.156998: extlog_mem_event: {1} corrected error: unknown physical addr: 00000008527e2000 (mask lsb: ff) node: 2 card: 0 module: 0 rank: 0 bank: 0 row: 29758 column: 1616 DIMM location: Memriser3 CHANNEL A DIMM 0 FRU: 00000000-0000-0000-0000-000000000000
          <idle>-0     [015] d.h3   339.574445: extlog_mem_event: {2} corrected error: unknown physical addr: 0000000851dba000 (mask lsb: ff) node: 2 card: 0 module: 0 rank: 0 bank: 0 row: 29803 column: 1592 DIMM location: Memriser3 CHANNEL A DIMM 0 FRU: 00000000-0000-0000-0000-000000000000
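
A note on reading "mask lsb: ff" in the lines above: in the patch that
follows, TP_fast_assign stores ~0 in the u8 pa_mask_lsb field when the
physical address mask is not marked valid, and "%x" then prints ff. The
userspace sketch below reproduces that rule. The names ending in _demo are
hypothetical, the validity bit value is only illustrative, and the extra
check for a zero mask is my own guard (the patch calls __ffs64() without
one).

```c
#include <stdint.h>

/* Illustrative validity bit; the kernel code tests CPER_MEM_VALID_PA_MASK. */
#define DEMO_MEM_VALID_PA_MASK	0x0004ULL

/* Index of the lowest set bit; the caller must pass a non-zero word. */
static unsigned int ffs64_demo(uint64_t word)
{
	unsigned int bit = 0;

	while (!(word & 1)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/* Lowest set bit of the mask when it is valid, otherwise the 0xff sentinel. */
static uint8_t pa_mask_lsb_demo(uint64_t validation_bits, uint64_t pa_mask)
{
	if ((validation_bits & DEMO_MEM_VALID_PA_MASK) && pa_mask)
		return (uint8_t)ffs64_demo(pa_mask);
	return (uint8_t)~0;
}
```

A valid mask of 0xfffffffffffff000 would give 12 here, that is, an error
address known with 4 KiB granularity.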

--------8<--------
From 864bd0f2d2121b8c7a941e6e776952378a6c636c Mon Sep 17 00:00:00 2001
From: "Chen, Gong" <gong.chen@linux.intel.com>
Date: Fri, 6 Jun 2014 01:46:45 -0400
Subject: [PATCH 5/7] trace, RAS: Add eMCA trace event interface

Add trace interface to elaborate all H/W error related information.

v7 -> v6: compact trace info to save some trace buffer space
v6 -> v5: format adjustment.
v5 -> v4: Add physical mask(LSB) in trace.
v4 -> v3: change ras trace dependency rule.
v3 -> v2: minor adjustment according to the suggestion from Boris.
v2 -> v1: spinlock is not needed anymore.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/Kconfig        |  4 ++-
 drivers/acpi/acpi_extlog.c  | 27 +++++++++++++++++---
 drivers/firmware/efi/cper.c | 48 ++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c           |  1 +
 include/linux/cper.h        | 21 +++++++++++++++
 include/ras/ras_event.h     | 62 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 155 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index a34a228..099a2d5 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -370,6 +370,7 @@ config ACPI_EXTLOG
 	tristate "Extended Error Log support"
 	depends on X86_MCE && X86_LOCAL_APIC
 	select UEFI_CPER
+	select RAS_TRACE
 	default n
 	help
 	  Certain usages such as Predictive Failure Analysis (PFA) require
@@ -384,6 +385,7 @@ config ACPI_EXTLOG
 
 	  Enhanced MCA Logging allows firmware to provide additional error
 	  information to system software, synchronous with MCE or CMCI. This
-	  driver adds support for that functionality.
+	  driver adds support for that functionality with corresponding
+	  tracepoint which carries that information to userspace.
 
 endif	# ACPI
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index c4a5d87..3c4a8aa 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -16,6 +16,7 @@
 #include <asm/mce.h>
 
 #include "apei/apei-internal.h"
+#include <ras/ras_event.h>
 
 #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
 
@@ -137,8 +138,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	struct mce *mce = (struct mce *)data;
 	int	bank = mce->bank;
 	int	cpu = mce->extcpu;
-	struct acpi_generic_status *estatus;
-	int rc;
+	struct acpi_generic_status *estatus, *tmp;
+	struct acpi_generic_data *gdata;
+	const uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
+	uuid_le *sec_type;
+	static u32 err_seq;
 
 	estatus = extlog_elog_entry_check(cpu, bank);
 	if (estatus == NULL)
@@ -148,7 +153,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	/* clear record status to enable BIOS to update it again */
 	estatus->block_status = 0;
 
-	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
+	tmp = (struct acpi_generic_status *)elog_buf;
+	print_extlog_rcd(NULL, tmp, cpu);
+
+	/* log event via trace */
+	err_seq++;
+	gdata = (struct acpi_generic_data *)(tmp + 1);
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+		fru_id = (uuid_le *)gdata->fru_id;
+	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+		fru_text = gdata->fru_text;
+	sec_type = (uuid_le *)gdata->section_type;
+	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+		struct cper_sec_mem_err *mem = (void *)(gdata + 1);
+		if (gdata->error_data_length >= sizeof(*mem))
+			trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
+					       (u8)gdata->error_severity);
+	}
 
 	return NOTIFY_STOP;
 }
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 64d182f..23d0cef 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -207,7 +207,7 @@ const char *cper_mem_err_type_str(unsigned int etype)
 }
 EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
 
-int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg)
+int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
 	u32 len, n;
 
@@ -249,7 +249,7 @@ int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg)
 	return n;
 }
 
-int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg)
+int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
 	u32 len, n;
 	const char *bank = NULL, *device = NULL;
@@ -271,8 +271,47 @@ int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg)
 	return n;
 }
 
+void cper_mem_err_pack(const struct cper_sec_mem_err *mem, void *data)
+{
+	struct cper_mem_err_compact *cmem = (struct cper_mem_err_compact *)data;
+
+	cmem->validation_bits = mem->validation_bits;
+	cmem->node = mem->node;
+	cmem->card = mem->card;
+	cmem->module = mem->module;
+	cmem->bank = mem->bank;
+	cmem->device = mem->device;
+	cmem->row = mem->row;
+	cmem->column = mem->column;
+	cmem->bit_pos = mem->bit_pos;
+	cmem->requestor_id = mem->requestor_id;
+	cmem->responder_id = mem->responder_id;
+	cmem->target_id = mem->target_id;
+	cmem->rank = mem->rank;
+	cmem->mem_array_handle = mem->mem_array_handle;
+	cmem->mem_dev_handle = mem->mem_dev_handle;
+}
+EXPORT_SYMBOL_GPL(cper_mem_err_pack);
+
+const char *cper_mem_err_unpack(struct trace_seq *p, void *data)
+{
+	struct cper_mem_err_compact *cmem = (struct cper_mem_err_compact *)data;
+	const char *ret = p->buffer + p->len;
+
+	if (cper_mem_err_location(cmem, rcd_decode_str))
+		trace_seq_printf(p, "%s", rcd_decode_str);
+	if (cper_dimm_err_location(cmem, rcd_decode_str))
+		trace_seq_printf(p, "%s", rcd_decode_str);
+	trace_seq_putc(p, '\0');
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cper_mem_err_unpack);
+
 static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
 {
+	struct cper_mem_err_compact cmem;
+
 	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
 		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
 	if (mem->validation_bits & CPER_MEM_VALID_PA)
@@ -281,14 +320,15 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
 	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
 		printk("%s""physical_address_mask: 0x%016llx\n",
 		       pfx, mem->physical_addr_mask);
-	if (cper_mem_err_location(mem, rcd_decode_str))
+	cper_mem_err_pack(mem, &cmem);
+	if (cper_mem_err_location(&cmem, rcd_decode_str))
 		printk("%s%s\n", pfx, rcd_decode_str);
 	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
 		u8 etype = mem->error_type;
 		printk("%s""error_type: %d, %s\n", pfx, etype,
 		       cper_mem_err_type_str(etype));
 	}
-	if (cper_dimm_err_location(mem, rcd_decode_str))
+	if (cper_dimm_err_location(&cmem, rcd_decode_str))
 		printk("%s%s\n", pfx, rcd_decode_str);
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 4cac43a..da227a3 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -23,4 +23,5 @@ static int __init ras_init(void)
 }
 subsys_initcall(ras_init);
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index ed088b9..3548160 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -22,6 +22,7 @@
 #define LINUX_CPER_H
 
 #include <linux/uuid.h>
+#include <linux/trace_seq.h>
 
 /* CPER record signature and the size */
 #define CPER_SIG_RECORD				"CPER"
@@ -363,6 +364,24 @@ struct cper_sec_mem_err {
 	__u16	mem_dev_handle;		/* module handle in UEFI 2.4 */
 };
 
+struct cper_mem_err_compact {
+	__u64	validation_bits;
+	__u16	node;
+	__u16	card;
+	__u16	module;
+	__u16	bank;
+	__u16	device;
+	__u16	row;
+	__u16	column;
+	__u16	bit_pos;
+	__u64	requestor_id;
+	__u64	responder_id;
+	__u64	target_id;
+	__u16	rank;
+	__u16	mem_array_handle;
+	__u16	mem_dev_handle;
+};
+
 struct cper_sec_pcie {
 	__u64		validation_bits;
 	__u32		port_type;
@@ -406,5 +425,7 @@ const char *cper_severity_str(unsigned int);
 const char *cper_mem_err_type_str(unsigned int);
 void cper_print_bits(const char *prefix, unsigned int bits,
 		     const char * const strs[], unsigned int strs_size);
+void cper_mem_err_pack(const struct cper_sec_mem_err *, void *);
+const char *cper_mem_err_unpack(struct trace_seq *, void *);
 
 #endif
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index acbcbb8..66cd7bb 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -9,6 +9,68 @@
 #include <linux/edac.h>
 #include <linux/ktime.h>
 #include <linux/aer.h>
+#include <linux/cper.h>
+
+/*
+ * MCE Extended Error Log trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event.
+ */
+
+/* memory trace event */
+
+TRACE_EVENT(extlog_mem_event,
+	TP_PROTO(struct cper_sec_mem_err *mem,
+		 u32 err_seq,
+		 const uuid_le *fru_id,
+		 const char *fru_text,
+		 u8 sev),
+
+	TP_ARGS(mem, err_seq, fru_id, fru_text, sev),
+
+	TP_STRUCT__entry(
+		__field(u32, err_seq)
+		__field(u8, etype)
+		__field(u8, sev)
+		__field(u64, pa)
+		__field(u8, pa_mask_lsb)
+		__dynamic_array(char, fru, 48)
+		__dynamic_array(u8, data, sizeof(struct cper_mem_err_compact))
+	),
+
+	TP_fast_assign(
+		__entry->err_seq = err_seq;
+		if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+			__entry->etype = mem->error_type;
+		else
+			__entry->etype = ~0;
+		__entry->sev = sev;
+		if (mem->validation_bits & CPER_MEM_VALID_PA)
+			__entry->pa = mem->physical_addr;
+		else
+			__entry->pa = ~0ull;
+
+		if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
+			__entry->pa_mask_lsb =
+				(u8)__ffs64(mem->physical_addr_mask);
+		else
+			__entry->pa_mask_lsb = ~0;
+		snprintf(__get_dynamic_array(fru), 47,
+			 "FRU: %pUl %.20s", fru_id, fru_text);
+		cper_mem_err_pack(mem, __get_dynamic_array(data));
+	),
+
+	TP_printk("{%d} %s error: %s physical addr: %016llx (mask lsb: %x) %s%s",
+		  __entry->err_seq,
+		  cper_severity_str(__entry->sev),
+		  cper_mem_err_type_str(__entry->etype),
+		  __entry->pa,
+		  __entry->pa_mask_lsb,
+		  cper_mem_err_unpack(p, __get_dynamic_array(data)),
+		  __get_str(fru))
+);
 
 /*
  * Hardware Events Report
-- 
2.0.0.rc2


* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-06  6:51                           ` Chen, Gong
@ 2014-06-06 15:21                             ` Steven Rostedt
  2014-06-09  1:10                               ` Chen, Gong
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2014-06-06 15:21 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML

On Fri, 6 Jun 2014 02:51:41 -0400
"Chen, Gong" <gong.chen@linux.intel.com> wrote:


> +/*
> + * MCE Extended Error Log trace event
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event.
> + */
> +
> +/* memory trace event */
> +
> +TRACE_EVENT(extlog_mem_event,
> +	TP_PROTO(struct cper_sec_mem_err *mem,
> +		 u32 err_seq,
> +		 const uuid_le *fru_id,
> +		 const char *fru_text,
> +		 u8 sev),
> +
> +	TP_ARGS(mem, err_seq, fru_id, fru_text, sev),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, err_seq)
> +		__field(u8, etype)
> +		__field(u8, sev)
> +		__field(u64, pa)
> +		__field(u8, pa_mask_lsb)
> +		__dynamic_array(char, fru, 48)
> +		__dynamic_array(u8, data, sizeof(struct cper_mem_err_compact))

For constant size arrays, don't use __dynamic_array() just use
__array().

Although I'd get rid of the fru and replace that with:

		__field(unsigned long, fru_id)
		__string(fru_text, fru_text)


> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_seq = err_seq;
> +		if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +			__entry->etype = mem->error_type;
> +		else
> +			__entry->etype = ~0;
> +		__entry->sev = sev;
> +		if (mem->validation_bits & CPER_MEM_VALID_PA)
> +			__entry->pa = mem->physical_addr;
> +		else
> +			__entry->pa = ~0ull;
> +
> +		if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> +			__entry->pa_mask_lsb =
> +				(u8)__ffs64(mem->physical_addr_mask);
> +		else
> +			__entry->pa_mask_lsb = ~0;
> +		snprintf(__get_dynamic_array(fru), 47,
> +			 "FRU: %pUl %.20s", fru_id, fru_text);

Although, why not just save the id and text straight? Why format it
here?

	__entry->fru_id = fru_id;
	__assign_str(fru_text, fru_text);

> +		cper_mem_err_pack(mem, __get_dynamic_array(data));
> +	),
> +
> +	TP_printk("{%d} %s error: %s physical addr: %016llx (mask lsb: %x) %s%s",

	TP_printk("{%d} %s error: %s physical addr: %016llx (mask lsb: %x) %s FRU: %pUl %.20s",

> +		  __entry->err_seq,
> +		  cper_severity_str(__entry->sev),
> +		  cper_mem_err_type_str(__entry->etype),
> +		  __entry->pa,
> +		  __entry->pa_mask_lsb,
> +		  cper_mem_err_unpack(p, __get_dynamic_array(data)),
> +		  __get_str(fru))

		__entry->fru_id,
		__get_str(fru_text))

-- Steve

> +);
>  
>  /*
>   * Hardware Events Report



* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-06 15:21                             ` Steven Rostedt
@ 2014-06-09  1:10                               ` Chen, Gong
  2014-06-09 10:22                                 ` Borislav Petkov
  0 siblings, 1 reply; 53+ messages in thread
From: Chen, Gong @ 2014-06-09  1:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Luck, Tony, Borislav Petkov, m.chehab, linux-acpi, LKML


On Fri, Jun 06, 2014 at 11:21:27AM -0400, Steven Rostedt wrote:
> 
> On Fri, 6 Jun 2014 02:51:41 -0400
> "Chen, Gong" <gong.chen@linux.intel.com> wrote:
> 
> 
> > +/*
> > + * MCE Extended Error Log trace event
> > + *
> > + * These events are generated when hardware detects a corrected or
> > + * uncorrected event.
> > + */
> > +
> > +/* memory trace event */
> > +
> > +TRACE_EVENT(extlog_mem_event,
> > +	TP_PROTO(struct cper_sec_mem_err *mem,
> > +		 u32 err_seq,
> > +		 const uuid_le *fru_id,
> > +		 const char *fru_text,
> > +		 u8 sev),
> > +
> > +	TP_ARGS(mem, err_seq, fru_id, fru_text, sev),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(u32, err_seq)
> > +		__field(u8, etype)
> > +		__field(u8, sev)
> > +		__field(u64, pa)
> > +		__field(u8, pa_mask_lsb)
> > +		__dynamic_array(char, fru, 48)
> > +		__dynamic_array(u8, data, sizeof(struct cper_mem_err_compact))
> 
> For constant size arrays, don't use __dynamic_array() just use
> __array().
> 
Thanks a lot! I will update it.

BTW, any comments from other guys? Boris, Tony? If not, I will send out the
new version tomorrow.


* Re: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface
  2014-06-09  1:10                               ` Chen, Gong
@ 2014-06-09 10:22                                 ` Borislav Petkov
  0 siblings, 0 replies; 53+ messages in thread
From: Borislav Petkov @ 2014-06-09 10:22 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Steven Rostedt, Luck, Tony, m.chehab, linux-acpi, LKML

On Sun, Jun 08, 2014 at 09:10:15PM -0400, Chen, Gong wrote:
> BTW, any comments from other guys? Boris, Tony? If not, I will send out the
> new version tomorrow.

Looks ok at a first glance - I'll take a deeper look at your new version
with Steve's comments incorporated tomorrow - today's holiday here so no
work for me.

What dragged me to read emails I don't even remember. :-)

Thanks.


end of thread, other threads:[~2014-06-09 10:22 UTC | newest]

Thread overview: 53+ messages
2014-05-15  8:30 New eMCA trace event interface Chen, Gong
2014-05-15  8:30 ` [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event Chen, Gong
2014-05-15  8:30 ` [PATCH 2/7 v3] trace, AER: Move trace into unified interface Chen, Gong
2014-05-21 10:19   ` Borislav Petkov
2014-05-22  0:03     ` Chen, Gong
2014-05-22 10:41       ` Borislav Petkov
2014-05-15  8:30 ` [PATCH 3/7 v4] CPER: Adjust code flow of some functions Chen, Gong
2014-05-21 11:05   ` Borislav Petkov
2014-05-21 23:51     ` Chen, Gong
2014-05-22 10:52       ` Borislav Petkov
2014-05-23  1:49         ` Chen, Gong
2014-05-23  9:37           ` Borislav Petkov
2014-05-23 10:11             ` Borislav Petkov
2014-05-26  1:59               ` Chen, Gong
2014-05-26 10:21                 ` Borislav Petkov
2014-05-26 10:42                   ` Chen, Gong
2014-05-26  2:07             ` Chen, Gong
2014-05-26 10:23               ` Borislav Petkov
2014-05-15  8:30 ` [PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem Chen, Gong
2014-05-15  8:30 ` [PATCH 5/7 v5] trace, RAS: Add eMCA trace event interface Chen, Gong
2014-05-15  8:30 ` [PATCH 6/7 v3] trace, eMCA: Add a knob to adjust where to save event log Chen, Gong
2014-05-21 11:06   ` Borislav Petkov
2014-05-21 23:46     ` Chen, Gong
2014-05-22 11:11       ` Borislav Petkov
2014-05-23  1:40         ` Chen, Gong
2014-05-28  3:27         ` [PATCH 6/7 v4] " Chen, Gong
2014-05-15  8:30 ` [PATCH 7/7] RAS, extlog: Adjust init flow Chen, Gong
2014-05-28  3:32 ` new trace output format Chen, Gong
2014-05-28  3:32   ` [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface Chen, Gong
2014-05-28 15:28     ` Steven Rostedt
2014-05-28 16:34       ` Borislav Petkov
2014-05-28 16:56         ` Steven Rostedt
2014-05-29  7:43           ` Chen, Gong
2014-05-29 10:35             ` Borislav Petkov
2014-05-29 13:12             ` Steven Rostedt
2014-05-30  2:56               ` Chen, Gong
2014-05-30  9:22           ` Chen, Gong
2014-05-30 10:07             ` Borislav Petkov
2014-05-30 21:16               ` Tony Luck
2014-05-30 21:26                 ` Borislav Petkov
2014-05-30 23:03               ` Luck, Tony
2014-05-30 23:03                 ` Luck, Tony
2014-05-31  1:07                 ` Steven Rostedt
2014-06-02 16:22                   ` Luck, Tony
2014-06-02 16:57                     ` Steven Rostedt
2014-06-03  8:36                       ` Chen, Gong
2014-06-03 14:35                         ` Steven Rostedt
2014-06-04 18:32                           ` Steven Rostedt
2014-06-06  6:51                           ` Chen, Gong
2014-06-06 15:21                             ` Steven Rostedt
2014-06-09  1:10                               ` Chen, Gong
2014-06-09 10:22                                 ` Borislav Petkov
2014-05-28 16:23   ` new trace output format Borislav Petkov
