linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>,
	Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: [PATCH 09/13] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
Date: Thu, 29 Mar 2012 13:45:42 -0300	[thread overview]
Message-ID: <1333039546-5590-10-git-send-email-mchehab@redhat.com> (raw)
In-Reply-To: <1333039546-5590-1-git-send-email-mchehab@redhat.com>

Adds a trace class for handle hardware events

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

In order to provide a proper hardware event subsystem, let's
encapsulate the memory error hardware events into a trace facility.

As no agreement was reached so far for the MCA-based trace events, for
now, let's add events only for memory errors. A latter patch can change
the tracepoint, for events originated via MCA.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_core.h        |    2 +-
 drivers/edac/edac_mc.c          |   11 ++++-
 include/trace/events/hw_event.h |  107 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 include/trace/events/hw_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index 8aadd83..f7c8c8a 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -469,7 +469,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 138437c..721d5cc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,9 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/hw_event.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -381,6 +384,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
+	trace_hw_event_init("edac", (unsigned)edac_index);
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -900,7 +906,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	unsigned long remapped_page;
 	/* FIXME: too much for stack: move it to some pre-alocated area */
@@ -1038,6 +1044,9 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			"page 0x%lx offset 0x%lx grain %d",
 			page_frame_number, offset_in_page, grain);
 
+	trace_mc_error(type, mci->mc_idx, msg, label, location,
+		       detail, other_detail);
+
 	if (type == HW_EVENT_ERR_CORRECTED) {
 		if (edac_mc_get_log_ce())
 			edac_mc_printk(mci, KERN_WARNING,
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
new file mode 100644
index 0000000..1fabfe2
--- /dev/null
+++ b/include/trace/events/hw_event.h
@@ -0,0 +1,107 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hw_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Anomaly Report Mecanism (HARM) events
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ *        MCE subsystem.
+ */
+
+DECLARE_EVENT_CLASS(hw_event_class,
+	TP_PROTO(const char *type, unsigned int instance),
+	TP_ARGS(type, instance),
+
+	TP_STRUCT__entry(
+		__string(	type,		type			)
+		__field(	unsigned int,	instance		)
+	),
+
+	TP_fast_assign(
+		__assign_str(type, type);
+		__entry->instance = instance;
+	),
+
+	TP_printk("Initialized %s#%d\n",
+		__get_str(type),
+		__entry->instance)
+);
+
+/*
+ * This event indicates that a hardware collection mechanism is started
+ */
+DEFINE_EVENT(hw_event_class, hw_event_init,
+
+	TP_PROTO(const char *type, unsigned int instance),
+
+	TP_ARGS(type, instance)
+);
+
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_error,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *msg,
+		 const char *label,
+		 const char *location,
+		 const char *detail,
+		 const char *driver_detail),
+
+	TP_ARGS(err_type, mc_index, msg, label, location,
+		detail, driver_detail),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
+		__field(	unsigned int,	mc_index		)
+		__string(	msg,		msg			)
+		__string(	label,		label			)
+		__string(	detail,		detail			)
+		__string(	location,	location		)
+		__string(	driver_detail,	driver_detail		)
+	),
+
+	TP_fast_assign(
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, detail);
+		__assign_str(driver_detail, driver_detail);
+	),
+
+	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (%s %s %s)",
+		  __entry->mc_index,
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __get_str(location),
+		  __get_str(detail),
+		  __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8


  parent reply	other threads:[~2012-03-29 16:47 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-29 16:45 [PATCH 00/13] Convert EDAC internal strutures to support all types of Memory Controllers Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 01/13] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-03-30 10:50   ` Borislav Petkov
2012-03-30 13:26     ` Mauro Carvalho Chehab
2012-03-30 15:38       ` Borislav Petkov
2012-04-16  8:41     ` Mauro Carvalho Chehab
2012-04-16 11:02       ` Borislav Petkov
2012-04-16 11:44         ` Mauro Carvalho Chehab
2012-04-16 13:21           ` Borislav Petkov
2012-03-29 16:45 ` [PATCH 02/13] edac: move dimm properties to struct memset_info Mauro Carvalho Chehab
2012-03-30 13:10   ` Borislav Petkov
2012-03-30 13:22     ` Mauro Carvalho Chehab
2012-03-30 17:03   ` Borislav Petkov
2012-04-16  8:56     ` Mauro Carvalho Chehab
2012-04-16 13:31       ` Borislav Petkov
2012-03-29 16:45 ` [PATCH 03/13] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-04-02 12:33   ` Borislav Petkov
2012-03-29 16:45 ` [PATCH 04/13] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-04-02 13:18   ` Borislav Petkov
2012-03-29 16:45 ` [PATCH 05/13] edac: Fix core support for MC's that see DIMMS instead of ranks Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 06/13] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 07/13] edac: Cleanup the logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 08/13] i5400_edac: improve debug messages to better represent the filled memory Mauro Carvalho Chehab
2012-03-29 16:45 ` Mauro Carvalho Chehab [this message]
2012-03-29 16:45 ` [PATCH 10/13] i5000_edac: Fix the logic that retrieves memory information Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 11/13] e752x_edac: provide more info about how DIMMS/ranks are mapped Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 12/13] edac: Rename the parent dev to pdev Mauro Carvalho Chehab
2012-03-29 16:45 ` [PATCH 13/13] edac: use Documentation-nano format for some data structs Mauro Carvalho Chehab
2012-03-29 20:46 ` [PATCH 00/13] Convert EDAC internal strutures to support all types of Memory Controllers Aristeu Rozanski Filho
2012-04-02 13:59 ` Borislav Petkov
2012-04-16 12:58   ` Mauro Carvalho Chehab
2012-04-16 14:06     ` Borislav Petkov
2012-04-16 20:12 ` [EDAC PATCH v13 0/7] Convert EDAC core to work with non-csrow-based memory controllers Mauro Carvalho Chehab
2012-04-16 20:12   ` [EDAC PATCH v13 1/7] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-04-26 14:26     ` Borislav Petkov
2012-04-16 20:12   ` [EDAC PATCH v13 3/7] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-04-16 20:12   ` [EDAC PATCH v13 4/7] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-04-17 18:48     ` Borislav Petkov
2012-04-17 19:28       ` Mauro Carvalho Chehab
2012-04-17 21:40         ` Borislav Petkov
2012-04-18 12:58           ` Mauro Carvalho Chehab
2012-04-18 17:53           ` [PATCH] " Mauro Carvalho Chehab
2012-04-16 20:12   ` [EDAC PATCH v13 5/7] edac: rewrite edac_align_ptr() Mauro Carvalho Chehab
2012-04-18 14:06     ` Borislav Petkov
2012-04-18 15:25       ` Borislav Petkov
2012-04-18 18:15       ` Mauro Carvalho Chehab
2012-04-18 18:19       ` [PATCH] " Mauro Carvalho Chehab
2012-04-23 14:05         ` Borislav Petkov
2012-04-23 15:19           ` Mauro Carvalho Chehab
2012-04-23 15:26             ` Mauro Carvalho Chehab
2012-04-16 20:12   ` [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers Mauro Carvalho Chehab
2012-04-23 17:49     ` Borislav Petkov
2012-04-23 18:30       ` Mauro Carvalho Chehab
2012-04-23 18:56         ` Mauro Carvalho Chehab
2012-04-23 19:19           ` [PATCH] edac.h: Add generic layers for describing a memory location Mauro Carvalho Chehab
2012-04-23 20:07             ` Mauro Carvalho Chehab
2012-04-24 10:46               ` Borislav Petkov
2012-04-24 10:40         ` [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers Borislav Petkov
2012-04-24 11:46           ` Mauro Carvalho Chehab
2012-04-24 12:42             ` Mauro Carvalho Chehab
2012-04-24 12:49               ` [PATCH] edac.h: Add generic layers for describing a memory location Mauro Carvalho Chehab
2012-04-24 13:09                 ` Borislav Petkov
2012-04-24 13:22                   ` Mauro Carvalho Chehab
2012-04-24 13:38                     ` Borislav Petkov
2012-04-24 16:39                       ` Mauro Carvalho Chehab
2012-04-24 16:49                         ` Borislav Petkov
2012-04-24 17:38                           ` Mauro Carvalho Chehab
     [not found]                             ` <1335291342-14922-1-git-send-email-mchehab@redhat.com>
2012-04-24 18:15                               ` [PATCH EDACv16 2/2] amd64_edac: convert driver to use the new edac ABI Mauro Carvalho Chehab
2012-04-27 10:42                                 ` Mauro Carvalho Chehab
2012-04-27 13:33                               ` [PATCH EDACv16 1/2] edac: Change internal representation to work with layers Borislav Petkov
2012-04-27 14:11                                 ` Joe Perches
2012-04-27 15:12                                   ` Borislav Petkov
2012-04-27 16:07                                   ` Mauro Carvalho Chehab
2012-04-28  8:52                                     ` Borislav Petkov
2012-04-28 20:38                                       ` Joe Perches
2012-04-29 14:25                                       ` Mauro Carvalho Chehab
2012-04-29 15:11                                         ` Mauro Carvalho Chehab
2012-04-29 16:03                                           ` Joe Perches
2012-04-29 17:18                                             ` Mauro Carvalho Chehab
2012-04-29 16:20                                           ` Mauro Carvalho Chehab
2012-04-29 16:43                                             ` Joe Perches
2012-04-29 17:39                                               ` Mauro Carvalho Chehab
2012-04-30  7:47                                                 ` Borislav Petkov
2012-04-30 11:09                                                   ` Mauro Carvalho Chehab
2012-04-30 11:15                                                     ` Borislav Petkov
2012-04-30 11:46                                                       ` Mauro Carvalho Chehab
2012-04-27 15:36                                 ` Mauro Carvalho Chehab
2012-04-28  9:05                                   ` Borislav Petkov
2012-04-29 13:49                                     ` Mauro Carvalho Chehab
2012-04-30  8:15                                       ` Borislav Petkov
2012-04-30 10:58                                         ` Mauro Carvalho Chehab
2012-04-30 11:11                                           ` Borislav Petkov
2012-04-30 11:45                                             ` Mauro Carvalho Chehab
2012-04-30 12:38                                               ` Borislav Petkov
2012-04-30 13:00                                                 ` Mauro Carvalho Chehab
2012-04-30 13:53                                                   ` Mauro Carvalho Chehab
2012-04-30 15:02                                                     ` [PATCH v2] edac_mc: Cleanup per-dimm_info debug messages Mauro Carvalho Chehab
2012-04-30 15:10                                                       ` Mauro Carvalho Chehab
2012-04-30 15:20                                                         ` Borislav Petkov
2012-04-30 15:33                                                           ` Mauro Carvalho Chehab
2012-04-30 16:16                                                       ` Joe Perches
2012-04-30 16:47                                                         ` Mauro Carvalho Chehab
2012-04-30 16:44                                                       ` [PATCHv3] " Mauro Carvalho Chehab
2012-04-30 11:37                                         ` [PATCH EDACv16 1/2] edac: Change internal representation to work with layers Mauro Carvalho Chehab
2012-04-27 17:52                                 ` Mauro Carvalho Chehab
2012-04-28  9:16                                   ` Borislav Petkov
2012-04-28 17:07                                     ` Joe Perches
2012-04-29 14:02                                       ` Mauro Carvalho Chehab
2012-04-29 14:16                                     ` Mauro Carvalho Chehab
2012-04-30  7:59                                       ` Borislav Petkov
2012-04-30 11:23                                         ` Mauro Carvalho Chehab
2012-04-30 12:51                                           ` Borislav Petkov
2012-04-24 12:55             ` [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers Borislav Petkov
2012-04-24 13:11               ` Mauro Carvalho Chehab
2012-04-24 13:32                 ` Borislav Petkov
2012-04-24 14:24                   ` Mauro Carvalho Chehab
2012-04-24 16:27                     ` Borislav Petkov
2012-04-24 17:24                       ` Mauro Carvalho Chehab
2012-04-25 17:19                         ` Borislav Petkov
2012-04-25 17:47                           ` Mauro Carvalho Chehab
2012-04-25 18:32                             ` Luck, Tony
2012-04-25 18:44                               ` Mauro Carvalho Chehab
2012-04-26 14:11                             ` Borislav Petkov
2012-04-26 14:25                               ` Mauro Carvalho Chehab
2012-04-26 14:59                                 ` Mauro Carvalho Chehab
2012-04-25 17:55                           ` Luck, Tony
2012-04-24 17:31                       ` Luck, Tony
2012-04-16 20:21   ` [EDAC_ABI PATCH v13 00/26] Use the new EDAC kernel ABI on drivers Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 01/26] amd64_edac: convert driver to use the new edac ABI Mauro Carvalho Chehab
2012-05-07 14:31       ` Borislav Petkov
2012-05-07 16:12         ` Mauro Carvalho Chehab
2012-05-07 16:17           ` Borislav Petkov
2012-05-07 16:59             ` Mauro Carvalho Chehab
2012-05-07 19:49               ` Borislav Petkov
2012-05-07 16:24           ` Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 02/26] amd76x_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 03/26] cell_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 04/26] cpc925_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 05/26] e752x_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 06/26] e7xxx_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 07/26] i3000_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 08/26] i3200_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 09/26] i5000_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 10/26] i5100_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 11/26] i5400_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 12/26] i7300_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 13/26] i7core_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 14/26] i82443bxgx_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 15/26] i82860_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 16/26] i82875p_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 17/26] i82975x_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 18/26] mpc85xx_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 19/26] mv64x60_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 20/26] pasemi_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 21/26] ppc4xx_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 22/26] r82600_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 23/26] sb_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 24/26] tile_edac: " Mauro Carvalho Chehab
2012-04-26 19:47       ` Chris Metcalf
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 25/26] x38_edac: " Mauro Carvalho Chehab
2012-04-16 20:21     ` [EDAC_ABI PATCH v13 26/26] edac: Remove the legacy EDAC ABI Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1333039546-5590-10-git-send-email-mchehab@redhat.com \
    --to=mchehab@redhat.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).