linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event
       [not found] ` <1402475691-30045-2-git-send-email-gong.chen@linux.intel.com>
@ 2014-06-11 18:59   ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2014-06-11 18:59 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, rostedt, linux-acpi, lkml

On Wed, Jun 11, 2014 at 04:34:45AM -0400, Chen, Gong wrote:
> To avoid confusion and conflicting usage of RAS-related trace events,
> add a unified RAS trace event stub.
> 
> v5 -> v4: remove explicit RAS menuconfig.
> v4 -> v3: change dependency rule of RAS_TRACE.
> v3 -> v2: fix dependency in Kconfig.
> v2 -> v1: adjust Kconfig to take RAS as a separate subsystem.

Let's simplify it a little - I've dropped RAS_TRACE for now. We can
carve it out later, when needed.

---
From: "Chen, Gong" <gong.chen@linux.intel.com>
Subject: [PATCH 1/7 v5] trace, RAS: Add basic RAS trace event

To avoid confusion and conflicting usage of RAS-related trace events,
add a unified RAS trace event stub.

Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/Kconfig        |  2 ++
 drivers/Makefile       |  1 +
 drivers/edac/Kconfig   |  1 +
 drivers/edac/edac_mc.c |  3 ---
 drivers/ras/Kconfig    |  6 ++++++
 drivers/ras/Makefile   |  1 +
 drivers/ras/ras.c      | 12 ++++++++++++
 7 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 drivers/ras/Kconfig
 create mode 100644 drivers/ras/Makefile
 create mode 100644 drivers/ras/ras.c

Index: linux/drivers/Kconfig
===================================================================
--- linux.orig/drivers/Kconfig	2014-06-11 17:14:23.782437196 +0200
+++ linux/drivers/Kconfig	2014-06-11 17:14:23.770437196 +0200
@@ -176,4 +176,6 @@ source "drivers/powercap/Kconfig"
 
 source "drivers/mcb/Kconfig"
 
+source "drivers/ras/Kconfig"
+
 endmenu
Index: linux/drivers/Makefile
===================================================================
--- linux.orig/drivers/Makefile	2014-06-11 17:14:23.782437196 +0200
+++ linux/drivers/Makefile	2014-06-11 17:14:23.770437196 +0200
@@ -158,3 +158,4 @@ obj-$(CONFIG_NTB)		+= ntb/
 obj-$(CONFIG_FMC)		+= fmc/
 obj-$(CONFIG_POWERCAP)		+= powercap/
 obj-$(CONFIG_MCB)		+= mcb/
+obj-$(CONFIG_RAS)		+= ras/
Index: linux/drivers/edac/Kconfig
===================================================================
--- linux.orig/drivers/edac/Kconfig	2014-06-11 17:14:23.782437196 +0200
+++ linux/drivers/edac/Kconfig	2014-06-11 17:24:18.142427373 +0200
@@ -72,6 +72,7 @@ config EDAC_MCE_INJ
 
 config EDAC_MM_EDAC
 	tristate "Main Memory EDAC (Error Detection And Correction) reporting"
+	select RAS
 	help
 	  Some systems are able to detect and correct errors in main
 	  memory.  EDAC can report statistics on memory error
Index: linux/drivers/edac/edac_mc.c
===================================================================
--- linux.orig/drivers/edac/edac_mc.c	2014-06-11 17:14:23.782437196 +0200
+++ linux/drivers/edac/edac_mc.c	2014-06-11 17:14:23.770437196 +0200
@@ -33,9 +33,6 @@
 #include <asm/edac.h>
 #include "edac_core.h"
 #include "edac_module.h"
-
-#define CREATE_TRACE_POINTS
-#define TRACE_INCLUDE_PATH ../../include/ras
 #include <ras/ras_event.h>
 
 /* lock to memory controller's control array */
Index: linux/drivers/ras/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/ras/Kconfig	2014-06-11 17:24:00.846427659 +0200
@@ -0,0 +1,2 @@
+config RAS
+	bool
Index: linux/drivers/ras/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/ras/Makefile	2014-06-11 17:14:23.774437196 +0200
@@ -0,0 +1 @@
+obj-$(CONFIG_RAS) += ras.o
Index: linux/drivers/ras/ras.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/ras/ras.c	2014-06-11 17:14:23.774437196 +0200
@@ -0,0 +1,12 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Authors:
+ *	Chen, Gong <gong.chen@linux.intel.com>
+ */
+
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
--

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 2/7 v3] trace, AER: Move trace into unified interface
       [not found] ` <1402475691-30045-3-git-send-email-gong.chen@linux.intel.com>
@ 2014-06-11 19:00   ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2014-06-11 19:00 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, rostedt, linux-acpi, lkml

On Wed, Jun 11, 2014 at 04:34:46AM -0400, Chen, Gong wrote:
> AER currently uses a separate trace interface. To make it
> consistent, move it into the unified RAS trace interface.
> 
> v3 -> v2: change dependency rule of RAS_TRACE.
> v2 -> v1: remove unnecessary dependency in drivers/ras/Kconfig.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/pci/pcie/aer/Kconfig           |  1 +
>  drivers/pci/pcie/aer/aerdrv_errprint.c |  4 +-
>  include/ras/ras_event.h                | 64 ++++++++++++++++++++++++++++
>  include/trace/events/ras.h             | 77 ----------------------------------
>  4 files changed, 66 insertions(+), 80 deletions(-)
>  delete mode 100644 include/trace/events/ras.h
> 
> diff --git a/drivers/pci/pcie/aer/Kconfig b/drivers/pci/pcie/aer/Kconfig
> index 50e94e0..c611384 100644
> --- a/drivers/pci/pcie/aer/Kconfig
> +++ b/drivers/pci/pcie/aer/Kconfig
> @@ -5,6 +5,7 @@
>  config PCIEAER
>  	boolean "Root Port Advanced Error Reporting support"
>  	depends on PCIEPORTBUS
> +	select RAS_TRACE
>  	default y
>  	help
>  	  This enables PCI Express Root Port Advanced Error Reporting

With this hunk changed to

Index: b/drivers/pci/pcie/aer/Kconfig
===================================================================
--- a/drivers/pci/pcie/aer/Kconfig      2014-06-11 17:33:57.298417802 +0200
+++ b/drivers/pci/pcie/aer/Kconfig      2014-06-11 17:34:16.302417487 +0200
@@ -5,6 +5,7 @@
 config PCIEAER
        boolean "Root Port Advanced Error Reporting support"
        depends on PCIEPORTBUS
+       select RAS
        default y
        help
          This enables PCI Express Root Port Advanced Error Reporting
--

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 4/7 v2] RAS, debugfs: Add debugfs interface for RAS subsystem
       [not found] ` <1402475691-30045-5-git-send-email-gong.chen@linux.intel.com>
@ 2014-06-11 19:01   ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2014-06-11 19:01 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, rostedt, linux-acpi, lkml

On Wed, Jun 11, 2014 at 04:34:48AM -0400, Chen, Gong wrote:
> Implement a new debugfs interface for the RAS subsystem.
> A file named daemon_active is added there accordingly.
> This file is used to track if user space daemon enables
> perf/trace interface or not. One can track which daemon
> opens it via "lsof /path/to/debugfs/ras/daemon_active".
> 
> v2 -> v1: Change file access mode from 0444 to 0400.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/ras/Makefile  |  2 +-
>  drivers/ras/debugfs.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/ras/ras.c     | 14 +++++++++++++
>  include/linux/ras.h   | 15 ++++++++++++++
>  4 files changed, 87 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/ras/debugfs.c
>  create mode 100644 include/linux/ras.h
> 
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 223e806..d7f7334 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_RAS) += ras.o
> +obj-$(CONFIG_RAS) += ras.o debugfs.o
> diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
> new file mode 100644
> index 0000000..d0bc389
> --- /dev/null
> +++ b/drivers/ras/debugfs.c
> @@ -0,0 +1,57 @@
> +#include <linux/debugfs.h>
> +
> +struct dentry *ras_debugfs_dir;
> +EXPORT_SYMBOL_GPL(ras_debugfs_dir);

No need to export this. Revised version below:

---
From: "Chen, Gong" <gong.chen@linux.intel.com>

Implement a new debugfs interface for the RAS subsystem.
A file named daemon_active is added there accordingly.
This file is used to track whether a user space daemon accesses
the perf/trace interface. One can track which daemon
opens it via "lsof /path/to/debugfs/ras/daemon_active".

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-5-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/ras/Makefile  |  2 +-
 drivers/ras/debugfs.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/ras/ras.c     | 14 +++++++++++++
 include/linux/ras.h   | 15 ++++++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ras/debugfs.c
 create mode 100644 include/linux/ras.h

Index: linux/drivers/ras/Makefile
===================================================================
--- linux.orig/drivers/ras/Makefile	2014-06-11 17:54:21.738397566 +0200
+++ linux/drivers/ras/Makefile	2014-06-11 17:54:21.726397566 +0200
@@ -1 +1 @@
-obj-$(CONFIG_RAS) += ras.o
+obj-$(CONFIG_RAS) += ras.o debugfs.o
Index: linux/drivers/ras/debugfs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/ras/debugfs.c	2014-06-11 17:58:47.214393178 +0200
@@ -0,0 +1,56 @@
+#include <linux/debugfs.h>
+
+static struct dentry *ras_debugfs_dir;
+
+static atomic_t trace_count = ATOMIC_INIT(0);
+
+int ras_userspace_consumers(void)
+{
+	return atomic_read(&trace_count);
+}
+EXPORT_SYMBOL_GPL(ras_userspace_consumers);
+
+static int trace_show(struct seq_file *m, void *v)
+{
+	return atomic_read(&trace_count);
+}
+
+static int trace_open(struct inode *inode, struct file *file)
+{
+	atomic_inc(&trace_count);
+	return single_open(file, trace_show, NULL);
+}
+
+static int trace_release(struct inode *inode, struct file *file)
+{
+	atomic_dec(&trace_count);
+	return single_release(inode, file);
+}
+
+static const struct file_operations trace_fops = {
+	.open    = trace_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = trace_release,
+};
+
+int __init ras_add_daemon_trace(void)
+{
+	struct dentry *fentry;
+
+	if (!ras_debugfs_dir)
+		return -ENOENT;
+
+	fentry = debugfs_create_file("daemon_active", S_IRUSR, ras_debugfs_dir,
+				     NULL, &trace_fops);
+	if (!fentry)
+		return -ENODEV;
+
+	return 0;
+
+}
+
+void __init ras_debugfs_init(void)
+{
+	ras_debugfs_dir = debugfs_create_dir("ras", NULL);
+}
Index: linux/drivers/ras/ras.c
===================================================================
--- linux.orig/drivers/ras/ras.c	2014-06-11 17:54:21.738397566 +0200
+++ linux/drivers/ras/ras.c	2014-06-11 17:54:21.730397566 +0200
@@ -5,8 +5,22 @@
  *	Chen, Gong <gong.chen@linux.intel.com>
  */
 
+#include <linux/init.h>
+#include <linux/ras.h>
+
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
 #include <ras/ras_event.h>
 
+static int __init ras_init(void)
+{
+	int rc = 0;
+
+	ras_debugfs_init();
+	rc = ras_add_daemon_trace();
+
+	return rc;
+}
+subsys_initcall(ras_init);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
Index: linux/include/linux/ras.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/include/linux/ras.h	2014-06-11 17:58:43.350393242 +0200
@@ -0,0 +1,14 @@
+#ifndef __RAS_H__
+#define __RAS_H__
+
+#ifdef CONFIG_DEBUG_FS
+int ras_userspace_consumers(void);
+void ras_debugfs_init(void);
+int ras_add_daemon_trace(void);
+#else
+static inline int ras_userspace_consumers(void) { return 0; }
+static inline void ras_debugfs_init(void) { return; }
+static inline int ras_add_daemon_trace(void) { return 0; }
+#endif
+
+#endif

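The open/release counting in drivers/ras/debugfs.c above can be modelled in
user space. This is only a minimal sketch, not kernel code: every model_*
name is hypothetical, C11 atomics stand in for atomic_t, and the caller that
picks between a tracepoint and printk is an assumption about how
ras_userspace_consumers() might be used, not something this patch adds.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the static trace_count counter. */
static atomic_int model_trace_count;

/* Models trace_open(): one more process holds daemon_active open. */
static void model_open(void)
{
	atomic_fetch_add(&model_trace_count, 1);
}

/* Models trace_release(): that process closed the file again. */
static void model_release(void)
{
	atomic_fetch_sub(&model_trace_count, 1);
}

/* Models ras_userspace_consumers(): number of current openers. */
static int model_userspace_consumers(void)
{
	return atomic_load(&model_trace_count);
}

/*
 * Assumed caller (not part of the patch): report through the tracepoint
 * when a daemon is listening, otherwise fall back to the kernel log.
 */
static const char *model_report_path(void)
{
	return model_userspace_consumers() ? "tracepoint" : "printk";
}
```

The counter only says that somebody holds the file open; identifying the
process is left to tools such as lsof, as the commit message notes.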
-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 5/7 v7] trace, RAS: Add eMCA trace event interface
       [not found] ` <1402475691-30045-6-git-send-email-gong.chen@linux.intel.com>
@ 2014-06-11 19:02   ` Borislav Petkov
  2014-06-12  2:42     ` Chen, Gong
  0 siblings, 1 reply; 5+ messages in thread
From: Borislav Petkov @ 2014-06-11 19:02 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, m.chehab, rostedt, linux-acpi, lkml

On Wed, Jun 11, 2014 at 04:34:49AM -0400, Chen, Gong wrote:
> Add trace interface to elaborate all H/W error related information.
> 
> v7 -> v6: compact trace info to save trace buffer space.
> v6 -> v5: format adjustment.
> v5 -> v4: Add physical mask(LSB) in trace.
> v4 -> v3: change ras trace dependency rule.
> v3 -> v2: minor adjustment according to the suggestion from Boris.
> v2 -> v1: spinlock is not needed anymore.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/acpi/Kconfig        |  4 ++-
>  drivers/acpi/acpi_extlog.c  | 27 ++++++++++++++++---
>  drivers/firmware/efi/cper.c | 48 +++++++++++++++++++++++++++++++---
>  drivers/ras/ras.c           |  1 +
>  include/linux/cper.h        | 21 +++++++++++++++
>  include/ras/ras_event.h     | 63 +++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 156 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index a34a228..099a2d5 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -370,6 +370,7 @@ config ACPI_EXTLOG
>  	tristate "Extended Error Log support"
>  	depends on X86_MCE && X86_LOCAL_APIC
>  	select UEFI_CPER
> +	select RAS_TRACE
>  	default n
>  	help
>  	  Certain usages such as Predictive Failure Analysis (PFA) require
> @@ -384,6 +385,7 @@ config ACPI_EXTLOG
>  
>  	  Enhanced MCA Logging allows firmware to provide additional error
>  	  information to system software, synchronous with MCE or CMCI. This
> -	  driver adds support for that functionality.
> +	  driver adds support for that functionality with corresponding
> +	  tracepoint which carries that information to userspace.
>  
>  endif	# ACPI
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index 1853341..e61da95 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -16,6 +16,7 @@
>  #include <asm/mce.h>
>  
>  #include "apei/apei-internal.h"
> +#include <ras/ras_event.h>
>  
>  #define EXT_ELOG_ENTRY_MASK	GENMASK_ULL(51, 0) /* elog entry address mask */
>  
> @@ -137,8 +138,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	struct mce *mce = (struct mce *)data;
>  	int	bank = mce->bank;
>  	int	cpu = mce->extcpu;
> -	struct acpi_generic_status *estatus;
> -	int rc;
> +	struct acpi_generic_status *estatus, *tmp;
> +	struct acpi_generic_data *gdata;
> +	const uuid_le *fru_id = &NULL_UUID_LE;
> +	char *fru_text = "";
> +	uuid_le *sec_type;
> +	static u32 err_seq;
>  
>  	estatus = extlog_elog_entry_check(cpu, bank);
>  	if (estatus == NULL)
> @@ -148,7 +153,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	/* clear record status to enable BIOS to update it again */
>  	estatus->block_status = 0;
>  
> -	rc = print_extlog_rcd(NULL, (struct acpi_generic_status *)elog_buf, cpu);
> +	tmp = (struct acpi_generic_status *)elog_buf;
> +	print_extlog_rcd(NULL, tmp, cpu);
> +
> +	/* log event via trace */
> +	err_seq++;
> +	gdata = (struct acpi_generic_data *)(tmp + 1);
> +	if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
> +		fru_id = (uuid_le *)gdata->fru_id;
> +	if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
> +		fru_text = gdata->fru_text;
> +	sec_type = (uuid_le *)gdata->section_type;
> +	if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
> +		struct cper_sec_mem_err *mem = (void *)(gdata + 1);
> +		if (gdata->error_data_length >= sizeof(*mem))
> +			trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> +					       (u8)gdata->error_severity);
> +	}
>  
>  	return NOTIFY_STOP;
>  }
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 83b56b61..85d6d30 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -207,7 +207,7 @@ const char *cper_mem_err_type_str(unsigned int etype)
>  }
>  EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
>  
> -int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg)
> +int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
>  {
>  	u32 len, n;
>  
> @@ -249,7 +249,7 @@ int cper_mem_err_location(const struct cper_sec_mem_err *mem, char *msg)
>  	return n;
>  }
>  
> -int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg)
> +int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
>  {
>  	u32 len, n;
>  	const char *bank = NULL, *device = NULL;
> @@ -271,8 +271,47 @@ int cper_dimm_err_location(const struct cper_sec_mem_err *mem, char *msg)
>  	return n;
>  }
>  
> +void cper_mem_err_pack(const struct cper_sec_mem_err *mem, void *data)
> +{
> +	struct cper_mem_err_compact *cmem = (struct cper_mem_err_compact *)data;
> +
> +	cmem->validation_bits = mem->validation_bits;
> +	cmem->node = mem->node;
> +	cmem->card = mem->card;
> +	cmem->module = mem->module;
> +	cmem->bank = mem->bank;
> +	cmem->device = mem->device;
> +	cmem->row = mem->row;
> +	cmem->column = mem->column;
> +	cmem->bit_pos = mem->bit_pos;
> +	cmem->requestor_id = mem->requestor_id;
> +	cmem->responder_id = mem->responder_id;
> +	cmem->target_id = mem->target_id;
> +	cmem->rank = mem->rank;
> +	cmem->mem_array_handle = mem->mem_array_handle;
> +	cmem->mem_dev_handle = mem->mem_dev_handle;
> +}
> +EXPORT_SYMBOL_GPL(cper_mem_err_pack);

Why do we export this one and the one below? What .config warrants this?

CONFIG_ACPI_EXTLOG=m doesn't need them, AFAICT.

> +const char *cper_mem_err_unpack(struct trace_seq *p, void *data)
> +{
> +	struct cper_mem_err_compact *cmem = (struct cper_mem_err_compact *)data;
> +	const char *ret = p->buffer + p->len;
> +
> +	if (cper_mem_err_location(cmem, rcd_decode_str))
> +		trace_seq_printf(p, "%s", rcd_decode_str);
> +	if (cper_dimm_err_location(cmem, rcd_decode_str))
> +		trace_seq_printf(p, "%s", rcd_decode_str);
> +	trace_seq_putc(p, '\0');
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(cper_mem_err_unpack);
> +
>  static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
>  {
> +	struct cper_mem_err_compact cmem;
> +
>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
>  		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
>  	if (mem->validation_bits & CPER_MEM_VALID_PA)
> @@ -281,14 +320,15 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
>  	if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
>  		printk("%s""physical_address_mask: 0x%016llx\n",
>  		       pfx, mem->physical_addr_mask);
> -	if (cper_mem_err_location(mem, rcd_decode_str))
> +	cper_mem_err_pack(mem, &cmem);
> +	if (cper_mem_err_location(&cmem, rcd_decode_str))
>  		printk("%s%s\n", pfx, rcd_decode_str);
>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
>  		u8 etype = mem->error_type;
>  		printk("%s""error_type: %d, %s\n", pfx, etype,
>  		       cper_mem_err_type_str(etype));
>  	}
> -	if (cper_dimm_err_location(mem, rcd_decode_str))
> +	if (cper_dimm_err_location(&cmem, rcd_decode_str))
>  		printk("%s%s\n", pfx, rcd_decode_str);
>  }
>  
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 4cac43a..da227a3 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -23,4 +23,5 @@ static int __init ras_init(void)
>  }
>  subsys_initcall(ras_init);
>  
> +EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index ed088b9..3548160 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -22,6 +22,7 @@
>  #define LINUX_CPER_H
>  
>  #include <linux/uuid.h>
> +#include <linux/trace_seq.h>
>  
>  /* CPER record signature and the size */
>  #define CPER_SIG_RECORD				"CPER"
> @@ -363,6 +364,24 @@ struct cper_sec_mem_err {
>  	__u16	mem_dev_handle;		/* module handle in UEFI 2.4 */
>  };
>  
> +struct cper_mem_err_compact {
> +	__u64	validation_bits;
> +	__u16	node;
> +	__u16	card;
> +	__u16	module;
> +	__u16	bank;
> +	__u16	device;
> +	__u16	row;
> +	__u16	column;
> +	__u16	bit_pos;
> +	__u64	requestor_id;
> +	__u64	responder_id;
> +	__u64	target_id;
> +	__u16	rank;
> +	__u16	mem_array_handle;
> +	__u16	mem_dev_handle;
> +};
> +
>  struct cper_sec_pcie {
>  	__u64		validation_bits;
>  	__u32		port_type;
> @@ -406,5 +425,7 @@ const char *cper_severity_str(unsigned int);
>  const char *cper_mem_err_type_str(unsigned int);
>  void cper_print_bits(const char *prefix, unsigned int bits,
>  		     const char * const strs[], unsigned int strs_size);
> +void cper_mem_err_pack(const struct cper_sec_mem_err *, void *);
> +const char *cper_mem_err_unpack(struct trace_seq *, void *);
>  
>  #endif
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index acbcbb8..c5e58db 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -9,6 +9,69 @@
>  #include <linux/edac.h>
>  #include <linux/ktime.h>
>  #include <linux/aer.h>
> +#include <linux/cper.h>
> +
> +/*
> + * MCE Extended Error Log trace event
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event.
> + */
> +
> +/* memory trace event */
> +
> +TRACE_EVENT(extlog_mem_event,
> +	TP_PROTO(struct cper_sec_mem_err *mem,
> +		 u32 err_seq,
> +		 const uuid_le *fru_id,
> +		 const char *fru_text,
> +		 u8 sev),
> +
> +	TP_ARGS(mem, err_seq, fru_id, fru_text, sev),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, err_seq)
> +		__field(u8, etype)
> +		__field(u8, sev)
> +		__field(u64, pa)
> +		__field(u8, pa_mask_lsb)
> +		__array(u8, fru_id, 40)

How did you come up with this magic number? Why isn't that sizeof(uuid_le)?

> +		__string(fru_text, fru_text)
> +		__array(u8, data, sizeof(struct cper_mem_err_compact))
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_seq = err_seq;
> +		if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +			__entry->etype = mem->error_type;
> +		else
> +			__entry->etype = ~0;
> +		__entry->sev = sev;
> +		if (mem->validation_bits & CPER_MEM_VALID_PA)
> +			__entry->pa = mem->physical_addr;
> +		else
> +			__entry->pa = ~0ull;
> +
> +		if (mem->validation_bits & CPER_MEM_VALID_PA_MASK)
> +			__entry->pa_mask_lsb =
> +				(u8)__ffs64(mem->physical_addr_mask);

No need for the linebreak here - just let it stick out.
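For readers unfamiliar with what ends up in pa_mask_lsb: it is the 0-based
index of the lowest set bit of physical_addr_mask, or all-ones when the mask
is not valid. A user-space sketch, assuming nothing beyond the hunk above
(the model_* names are hypothetical, and treating a zero mask as invalid is
an extra guard added here, not something the patch does):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's __ffs64(): 0-based index of the
 * least significant set bit. Like the kernel helper, the result is only
 * meaningful for a non-zero argument.
 */
static unsigned int model_ffs64(uint64_t word)
{
	unsigned int bit = 0;

	while (!(word & 1)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Mirrors the TP_fast_assign logic: store the LSB index when the mask is
 * valid, otherwise 0xff (the u8 value of ~0). The mask == 0 check is an
 * assumption of this sketch.
 */
static uint8_t model_pa_mask_lsb(uint64_t mask, int mask_valid)
{
	if (!mask_valid || mask == 0)
		return 0xff;
	return (uint8_t)model_ffs64(mask);
}
```

With a mask of 0xfffffffffffff000 this yields 12, i.e. the error is
localized to a 4K granule.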

> +		else
> +			__entry->pa_mask_lsb = ~0;
> +		snprintf(__entry->fru_id, 39, "%pUl", fru_id);

Yeah, I didn't catch the reasoning behind why we need to convert the FRU
into a string and not leave it simply as u8[16]...

> +		__assign_str(fru_text, fru_text);
> +		cper_mem_err_pack(mem, __entry->data);
> +	),
> +
> +	TP_printk("{%d} %s error: %s physical addr: %016llx (mask lsb: %x) %sFRU: %s %.20s",
> +		  __entry->err_seq,
> +		  cper_severity_str(__entry->sev),
> +		  cper_mem_err_type_str(__entry->etype),
> +		  __entry->pa,
> +		  __entry->pa_mask_lsb,
> +		  cper_mem_err_unpack(p, __entry->data),
> +		  __entry->fru_id,
> +		  __get_str(fru_text))
> +);
>  
>  /*
>   * Hardware Events Report
> -- 
> 2.0.0.rc2
> 
> 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 5/7 v7] trace, RAS: Add eMCA trace event interface
  2014-06-11 19:02   ` [PATCH 5/7 v7] trace, RAS: Add eMCA trace event interface Borislav Petkov
@ 2014-06-12  2:42     ` Chen, Gong
  0 siblings, 0 replies; 5+ messages in thread
From: Chen, Gong @ 2014-06-12  2:42 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, m.chehab, rostedt, linux-acpi, lkml

On Wed, Jun 11, 2014 at 09:02:15PM +0200, Borislav Petkov wrote:
> > +EXPORT_SYMBOL_GPL(cper_mem_err_pack);
> 
> Why do we export this one and the one below? What .config warrants this?
> 
> CONFIG_ACPI_EXTLOG=m doesn't need them, AFAICT.
> 
Right. acpi_extlog doesn't use them. They can be exported later, when needed.

> > +	TP_STRUCT__entry(
> > +		__field(u32, err_seq)
> > +		__field(u8, etype)
> > +		__field(u8, sev)
> > +		__field(u64, pa)
> > +		__field(u8, pa_mask_lsb)
> > +		__array(u8, fru_id, 40)
> 
> How did you come up with this magic number? Why isn't that sizeof(uuid_le)?
Because I want to convert it into a string.

> > +		snprintf(__entry->fru_id, 39, "%pUl", fru_id);
> 
> Yeah, I didn't catch the reasoning behind why we need to convert the FRU
> into a string and not leave it simply as u8[16]...
Fair enough. It can be compressed a little bit more.
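To make the size trade-off discussed above concrete: the textual form of a
UUID is 36 characters plus a terminating NUL, while the raw value is 16
bytes, so keeping u8[16] in the ring buffer and formatting only at print
time is the smaller option. The following is a user-space sketch of that
formatting step, not the kernel implementation; the model_* names are
hypothetical, and the byte order is the usual little-endian UUID text layout
that "%pUl" is expected to produce.

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_UUID_SIZE		16	/* raw bytes stored in the event */
#define MODEL_UUID_STRING_LEN	36	/* 32 hex digits + 4 dashes */

/*
 * Format a raw little-endian UUID as text. The first three fields are
 * byte-swapped, the remaining bytes are emitted in storage order.
 * "out" must have room for MODEL_UUID_STRING_LEN + 1 bytes.
 */
static void model_format_uuid_le(const uint8_t raw[MODEL_UUID_SIZE],
				 char out[MODEL_UUID_STRING_LEN + 1])
{
	static const int idx[MODEL_UUID_SIZE] = {
		3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15
	};
	char *p = out;
	int i;

	for (i = 0; i < MODEL_UUID_SIZE; i++) {
		/* Two hex digits per byte; snprintf also writes a NUL. */
		p += snprintf(p, 3, "%02x", raw[idx[i]]);
		/* Dashes follow the 4th, 6th, 8th and 10th byte. */
		if (i == 3 || i == 5 || i == 7 || i == 9)
			*p++ = '-';
	}
	*p = '\0';
}
```

Storing the raw 16 bytes instead of a 40-byte string would save 24 bytes per
event in this model, which is in line with the "compact trace info" goal of
v7.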

