linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
@ 2012-05-10 19:56 Mauro Carvalho Chehab
  2012-05-10 20:40 ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-10 19:56 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Add a new tracepoint-based hardware events report method for
outputing Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

As part of a hardware event subsystem, let's encapsulate the memory
error hardware events into a trace facility.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_core.h |    2 +-
 drivers/edac/edac_mc.c   |   25 +++++++++++----
 include/ras/hw_event.h   |   77 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 8 deletions(-)
 create mode 100644 include/ras/hw_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..11f0178 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/hw_event.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	/* FIXME: too much for stack: move it to some pre-alocated area */
 	char detail[80], location[80];
@@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	}
 
 	/* Memory type dependent details about the error */
-	if (type == HW_EVENT_ERR_CORRECTED) {
+	if (type == HW_EVENT_ERR_CORRECTED)
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
 			page_frame_number, offset_in_page,
 			grain, syndrome);
-		edac_ce_error(mci, pos, msg, location, label, detail,
-			      other_detail, enable_per_layer_report,
-			      page_frame_number, offset_in_page, grain);
-	} else {
+	else
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d",
 			page_frame_number, offset_in_page, grain);
 
+	/* Report the error via the trace interface */
+	trace_mc_error(type, mci->mc_idx, msg, label, location,
+		       detail, other_detail);
+
+	/* Report the error via the edac_mc_printk() interface */
+	if (type == HW_EVENT_ERR_CORRECTED)
+		edac_ce_error(mci, pos, msg, location, label, detail,
+			      other_detail, enable_per_layer_report,
+			      page_frame_number, offset_in_page, grain);
+	else
 		edac_ue_error(mci, pos, msg, location, label, detail,
 			      other_detail, enable_per_layer_report);
-	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
new file mode 100644
index 0000000..66981ef
--- /dev/null
+++ b/include/ras/hw_event.h
@@ -0,0 +1,77 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hw_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Events Report
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ *        MCE subsystem.
+ */
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_error,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *msg,
+		 const char *label,
+		 const char *location,
+		 const char *detail,
+		 const char *driver_detail),
+
+	TP_ARGS(err_type, mc_index, msg, label, location,
+		detail, driver_detail),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
+		__field(	unsigned int,	mc_index		)
+		__string(	msg,		msg			)
+		__string(	label,		label			)
+		__string(	detail,		detail			)
+		__string(	location,	location		)
+		__string(	driver_detail,	driver_detail		)
+	),
+
+	TP_fast_assign(
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, detail);
+		__assign_str(driver_detail, driver_detail);
+	),
+
+	TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
+		  __entry->mc_index,
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __get_str(location),
+		  __get_str(detail),
+		  __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 19:56 [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Mauro Carvalho Chehab
@ 2012-05-10 20:40 ` Borislav Petkov
  2012-05-10 20:55   ` Mauro Carvalho Chehab
                     ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Borislav Petkov @ 2012-05-10 20:40 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Tony Luck

It should be:

Subject: RAS: Add a tracepoint for reporting memory controller errors

and not

Subject: edac, ras/hw_event.h: use events to handle hw issues

because events don't handle hw issues.

On Thu, May 10, 2012 at 04:56:28PM -0300, Mauro Carvalho Chehab wrote:
> Add a new tracepoint-based hardware events report method for
> outputing Memory Controller events.

reporting

> 
> Part of the description bellow is shamelessly copied from Tony
> Luck's notes about the Hardware Error BoF during LPC 2010 [1].
> Tony, thanks for your notes and discussions to generate the
> h/w error reporting requirements.
> 
> [1] http://lwn.net/Articles/416669/
> 
>     We have several subsystems & methods for reporting hardware errors:
> 
>     1) EDAC ("Error Detection and Correction").  In its original form
>     this consisted of a platform specific driver that read topology
>     information and error counts from chipset registers and reported
>     the results via a sysfs interface.
> 
>     2) mcelog - x86 specific decoding of machine check bank registers
>     reporting in binary form via /dev/mcelog. Recent additions make use
>     of the APEI extensions that were documented in version 4.0a of the
>     ACPI specification to acquire more information about errors without
>     having to rely reading chipset registers directly. A user level
>     programs decodes into somewhat human readable format.
> 
>     3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>     decodes errors reported via machine check bank registers in AMD
>     processors to the console log using printk();
> 
>     Each of these mechanisms has a band of followers ... and none
>     of them appear to meet all the needs of all users.
> 
> As part of a hardware event subsystem, let's encapsulate the memory
> error hardware events into a trace facility.
> 
> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> Hoever, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A latter patch is planned to change the tracepoint, for those types
> of event.
> 
> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
> Cc: Doug Thompson <norsk5@yahoo.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> ---
>  drivers/edac/edac_core.h |    2 +-
>  drivers/edac/edac_mc.c   |   25 +++++++++++----
>  include/ras/hw_event.h   |   77 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 96 insertions(+), 8 deletions(-)
>  create mode 100644 include/ras/hw_event.h
> 
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index f06ce9a..eee7360 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog);
> +			  const void *arch_log);
>  
>  /*
>   * edac_device APIs
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index e5b5563..11f0178 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -33,6 +33,10 @@
>  #include "edac_core.h"
>  #include "edac_module.h"
>  
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH ../../include/ras
> +#include <ras/hw_event.h>
> +
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
>  static LIST_HEAD(mc_devices);
> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  	 * which will perform kobj unregistration and the actual free
>  	 * will occur during the kobject callback operation
>  	 */
> +
>  	return mci;
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
> @@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog)
> +			  const void *arch_log)
>  {
>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>  	char detail[80], location[80];
> @@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  	}
>  
>  	/* Memory type dependent details about the error */
> -	if (type == HW_EVENT_ERR_CORRECTED) {
> +	if (type == HW_EVENT_ERR_CORRECTED)
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>  			page_frame_number, offset_in_page,
>  			grain, syndrome);
> -		edac_ce_error(mci, pos, msg, location, label, detail,
> -			      other_detail, enable_per_layer_report,
> -			      page_frame_number, offset_in_page, grain);
> -	} else {
> +	else
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d",
>  			page_frame_number, offset_in_page, grain);
>  
> +	/* Report the error via the trace interface */
> +	trace_mc_error(type, mci->mc_idx, msg, label, location,
> +		       detail, other_detail);
> +
> +	/* Report the error via the edac_mc_printk() interface */
> +	if (type == HW_EVENT_ERR_CORRECTED)
> +		edac_ce_error(mci, pos, msg, location, label, detail,
> +			      other_detail, enable_per_layer_report,
> +			      page_frame_number, offset_in_page, grain);
> +	else
>  		edac_ue_error(mci, pos, msg, location, label, detail,
>  			      other_detail, enable_per_layer_report);
> -	}
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
> new file mode 100644
> index 0000000..66981ef
> --- /dev/null
> +++ b/include/ras/hw_event.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM hw_event

			ras

> +
> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_MC_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +#include <linux/ktime.h>
> +
> +/*
> + * Hardware Events Report
> + *
> + * Those events are generated when hardware detected a corrected or
> + * uncorrected event, and are meant to replace the current API to report
> + * errors defined on both EDAC and MCE subsystems.
> + *
> + * FIXME: Add events for handling memory errors originated from the
> + *        MCE subsystem.
> + */
> +
> +/*
> + * Hardware-independent Memory Controller specific events
> + */
> +
> +/*
> + * Default error mechanisms for Memory Controller errors (CE and UE)
> + */
> +TRACE_EVENT(mc_error,
> +
> +	TP_PROTO(const unsigned int err_type,
> +		 const unsigned int mc_index,
> +		 const char *msg,
> +		 const char *label,
> +		 const char *location,
> +		 const char *detail,
> +		 const char *driver_detail),
> +
> +	TP_ARGS(err_type, mc_index, msg, label, location,
> +		detail, driver_detail),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	err_type		)
> +		__field(	unsigned int,	mc_index		)
> +		__string(	msg,		msg			)
> +		__string(	label,		label			)
> +		__string(	detail,		detail			)
> +		__string(	location,	location		)
> +		__string(	driver_detail,	driver_detail		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_type		= err_type;
> +		__entry->mc_index		= mc_index;
> +		__assign_str(msg, msg);
> +		__assign_str(label, label);
> +		__assign_str(location, location);
> +		__assign_str(detail, detail);
> +		__assign_str(driver_detail, driver_detail);
> +	),
> +
> +	TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",

This still says "mce" and it should say "MC" or "mem_ctl" or similar.

> +		  __entry->mc_index,
> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		  __get_str(msg),
> +		  __get_str(label),
> +		  __get_str(location),
> +		  __get_str(detail),
> +		  __get_str(driver_detail))
> +);
> +
> +#endif /* _TRACE_HW_EVENT_MC_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

I'm assuming you have all those changes on the experimental branch so
that I can continue reviewing from there?

@Tony: this is adding a RAS tracepoint for memory controller errors
coming from EDAC (for now), any objections/issues?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 20:40 ` Borislav Petkov
@ 2012-05-10 20:55   ` Mauro Carvalho Chehab
  2012-05-10 22:46     ` Steven Rostedt
  2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
  2012-05-10 21:10   ` [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Luck, Tony
  2 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-10 20:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Tony Luck

Em 10-05-2012 17:40, Borislav Petkov escreveu:
> It should be:
> 
> Subject: RAS: Add a tracepoint for reporting memory controller errors
> 
> and not
> 
> Subject: edac, ras/hw_event.h: use events to handle hw issues
> 
> because events don't handle hw issues.
> 
> On Thu, May 10, 2012 at 04:56:28PM -0300, Mauro Carvalho Chehab wrote:
>> Add a new tracepoint-based hardware events report method for
>> outputing Memory Controller events.
> 
> reporting
> 
>>
>> Part of the description bellow is shamelessly copied from Tony
>> Luck's notes about the Hardware Error BoF during LPC 2010 [1].
>> Tony, thanks for your notes and discussions to generate the
>> h/w error reporting requirements.
>>
>> [1] http://lwn.net/Articles/416669/
>>
>>     We have several subsystems & methods for reporting hardware errors:
>>
>>     1) EDAC ("Error Detection and Correction").  In its original form
>>     this consisted of a platform specific driver that read topology
>>     information and error counts from chipset registers and reported
>>     the results via a sysfs interface.
>>
>>     2) mcelog - x86 specific decoding of machine check bank registers
>>     reporting in binary form via /dev/mcelog. Recent additions make use
>>     of the APEI extensions that were documented in version 4.0a of the
>>     ACPI specification to acquire more information about errors without
>>     having to rely reading chipset registers directly. A user level
>>     programs decodes into somewhat human readable format.
>>
>>     3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>>     decodes errors reported via machine check bank registers in AMD
>>     processors to the console log using printk();
>>
>>     Each of these mechanisms has a band of followers ... and none
>>     of them appear to meet all the needs of all users.
>>
>> As part of a hardware event subsystem, let's encapsulate the memory
>> error hardware events into a trace facility.
>>
>> NOTE: The original patch was providing an additional mechanism for
>> MCA-based trace events that also contained MCA error register data.
>> Hoever, as no agreement was reached so far for the MCA-based trace
>> events, for now, let's add events only for memory errors.
>> A latter patch is planned to change the tracepoint, for those types
>> of event.
>>
>> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
>> Cc: Doug Thompson <norsk5@yahoo.com>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
>> ---
>>  drivers/edac/edac_core.h |    2 +-
>>  drivers/edac/edac_mc.c   |   25 +++++++++++----
>>  include/ras/hw_event.h   |   77 ++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 96 insertions(+), 8 deletions(-)
>>  create mode 100644 include/ras/hw_event.h
>>
>> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
>> index f06ce9a..eee7360 100644
>> --- a/drivers/edac/edac_core.h
>> +++ b/drivers/edac/edac_core.h
>> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  			  const int layer2,
>>  			  const char *msg,
>>  			  const char *other_detail,
>> -			  const void *mcelog);
>> +			  const void *arch_log);
>>  
>>  /*
>>   * edac_device APIs
>> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
>> index e5b5563..11f0178 100644
>> --- a/drivers/edac/edac_mc.c
>> +++ b/drivers/edac/edac_mc.c
>> @@ -33,6 +33,10 @@
>>  #include "edac_core.h"
>>  #include "edac_module.h"
>>  
>> +#define CREATE_TRACE_POINTS
>> +#define TRACE_INCLUDE_PATH ../../include/ras
>> +#include <ras/hw_event.h>
>> +
>>  /* lock to memory controller's control array */
>>  static DEFINE_MUTEX(mem_ctls_mutex);
>>  static LIST_HEAD(mc_devices);
>> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>>  	 * which will perform kobj unregistration and the actual free
>>  	 * will occur during the kobject callback operation
>>  	 */
>> +
>>  	return mci;
>>  }
>>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
>> @@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  			  const int layer2,
>>  			  const char *msg,
>>  			  const char *other_detail,
>> -			  const void *mcelog)
>> +			  const void *arch_log)
>>  {
>>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>>  	char detail[80], location[80];
>> @@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  	}
>>  
>>  	/* Memory type dependent details about the error */
>> -	if (type == HW_EVENT_ERR_CORRECTED) {
>> +	if (type == HW_EVENT_ERR_CORRECTED)
>>  		snprintf(detail, sizeof(detail),
>>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>>  			page_frame_number, offset_in_page,
>>  			grain, syndrome);
>> -		edac_ce_error(mci, pos, msg, location, label, detail,
>> -			      other_detail, enable_per_layer_report,
>> -			      page_frame_number, offset_in_page, grain);
>> -	} else {
>> +	else
>>  		snprintf(detail, sizeof(detail),
>>  			"page:0x%lx offset:0x%lx grain:%d",
>>  			page_frame_number, offset_in_page, grain);
>>  
>> +	/* Report the error via the trace interface */
>> +	trace_mc_error(type, mci->mc_idx, msg, label, location,
>> +		       detail, other_detail);
>> +
>> +	/* Report the error via the edac_mc_printk() interface */
>> +	if (type == HW_EVENT_ERR_CORRECTED)
>> +		edac_ce_error(mci, pos, msg, location, label, detail,
>> +			      other_detail, enable_per_layer_report,
>> +			      page_frame_number, offset_in_page, grain);
>> +	else
>>  		edac_ue_error(mci, pos, msg, location, label, detail,
>>  			      other_detail, enable_per_layer_report);
>> -	}
>>  }
>>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
>> diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
>> new file mode 100644
>> index 0000000..66981ef
>> --- /dev/null
>> +++ b/include/ras/hw_event.h
>> @@ -0,0 +1,77 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM hw_event
> 
> 			ras

Then, the header name should be called as "ras.h", otherwise an error happens:

In file included from include/ras/hw_event.h:77:0,
                 from drivers/edac/edac_mc.c:38:
include/trace/define_trace.h:79:43: fatal error: ../../include/ras/ras.h: Arquivo ou diretório não encontrado

> 
>> +
>> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HW_EVENT_MC_H
>> +
>> +#include <linux/tracepoint.h>
>> +#include <linux/edac.h>
>> +#include <linux/ktime.h>
>> +
>> +/*
>> + * Hardware Events Report
>> + *
>> + * Those events are generated when hardware detected a corrected or
>> + * uncorrected event, and are meant to replace the current API to report
>> + * errors defined on both EDAC and MCE subsystems.
>> + *
>> + * FIXME: Add events for handling memory errors originated from the
>> + *        MCE subsystem.
>> + */
>> +
>> +/*
>> + * Hardware-independent Memory Controller specific events
>> + */
>> +
>> +/*
>> + * Default error mechanisms for Memory Controller errors (CE and UE)
>> + */
>> +TRACE_EVENT(mc_error,
>> +
>> +	TP_PROTO(const unsigned int err_type,
>> +		 const unsigned int mc_index,
>> +		 const char *msg,
>> +		 const char *label,
>> +		 const char *location,
>> +		 const char *detail,
>> +		 const char *driver_detail),
>> +
>> +	TP_ARGS(err_type, mc_index, msg, label, location,
>> +		detail, driver_detail),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned int,	err_type		)
>> +		__field(	unsigned int,	mc_index		)
>> +		__string(	msg,		msg			)
>> +		__string(	label,		label			)
>> +		__string(	detail,		detail			)
>> +		__string(	location,	location		)
>> +		__string(	driver_detail,	driver_detail		)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->err_type		= err_type;
>> +		__entry->mc_index		= mc_index;
>> +		__assign_str(msg, msg);
>> +		__assign_str(label, label);
>> +		__assign_str(location, location);
>> +		__assign_str(detail, detail);
>> +		__assign_str(driver_detail, driver_detail);
>> +	),
>> +
>> +	TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
> 
> This still says "mce" and it should say "MC" or "mem_ctl" or similar.
> 
>> +		  __entry->mc_index,
>> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>> +			"Fatal" : "Uncorrected"),
>> +		  __get_str(msg),
>> +		  __get_str(label),
>> +		  __get_str(location),
>> +		  __get_str(detail),
>> +		  __get_str(driver_detail))
>> +);
>> +
>> +#endif /* _TRACE_HW_EVENT_MC_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
> 
> I'm assuming you have all those changes on the experimental branch so
> that I can continue reviewing from there?

Yes.

> 
> @Tony: this is adding a RAS tracepoint for memory controller errors
> coming from EDAC (for now), any objections/issues?
> 
> Thanks.
> 

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCHv23] RAS: use events to handle hw issues
  2012-05-10 20:40 ` Borislav Petkov
  2012-05-10 20:55   ` Mauro Carvalho Chehab
@ 2012-05-10 21:00   ` Mauro Carvalho Chehab
  2012-05-11 10:04     ` Borislav Petkov
                       ` (2 more replies)
  2012-05-10 21:10   ` [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Luck, Tony
  2 siblings, 3 replies; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-10 21:00 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_core.h |    2 +-
 drivers/edac/edac_mc.c   |   25 +++++++++++----
 include/ras/ras.h        |   77 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 8 deletions(-)
 create mode 100644 include/ras/ras.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..7493adb 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	/* FIXME: too much for stack: move it to some pre-alocated area */
 	char detail[80], location[80];
@@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	}
 
 	/* Memory type dependent details about the error */
-	if (type == HW_EVENT_ERR_CORRECTED) {
+	if (type == HW_EVENT_ERR_CORRECTED)
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
 			page_frame_number, offset_in_page,
 			grain, syndrome);
-		edac_ce_error(mci, pos, msg, location, label, detail,
-			      other_detail, enable_per_layer_report,
-			      page_frame_number, offset_in_page, grain);
-	} else {
+	else
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d",
 			page_frame_number, offset_in_page, grain);
 
+	/* Report the error via the trace interface */
+	trace_mc_error(type, mci->mc_idx, msg, label, location,
+		       detail, other_detail);
+
+	/* Report the error via the edac_mc_printk() interface */
+	if (type == HW_EVENT_ERR_CORRECTED)
+		edac_ce_error(mci, pos, msg, location, label, detail,
+			      other_detail, enable_per_layer_report,
+			      page_frame_number, offset_in_page, grain);
+	else
 		edac_ue_error(mci, pos, msg, location, label, detail,
 			      other_detail, enable_per_layer_report);
-	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/ras.h b/include/ras/ras.h
new file mode 100644
index 0000000..13ea4ee
--- /dev/null
+++ b/include/ras/ras.h
@@ -0,0 +1,77 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Events Report
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ *        MCE subsystem.
+ */
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_error,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *msg,
+		 const char *label,
+		 const char *location,
+		 const char *detail,
+		 const char *driver_detail),
+
+	TP_ARGS(err_type, mc_index, msg, label, location,
+		detail, driver_detail),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
+		__field(	unsigned int,	mc_index		)
+		__string(	msg,		msg			)
+		__string(	label,		label			)
+		__string(	detail,		detail			)
+		__string(	location,	location		)
+		__string(	driver_detail,	driver_detail		)
+	),
+
+	TP_fast_assign(
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, detail);
+		__assign_str(driver_detail, driver_detail);
+	),
+
+	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
+		  __entry->mc_index,
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __get_str(location),
+		  __get_str(detail),
+		  __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* RE: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 20:40 ` Borislav Petkov
  2012-05-10 20:55   ` Mauro Carvalho Chehab
  2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
@ 2012-05-10 21:10   ` Luck, Tony
  2012-05-10 22:07     ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 43+ messages in thread
From: Luck, Tony @ 2012-05-10 21:10 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

>> +	TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>
> This still says "mce" and it should say "MC" or "mem_ctl" or similar.

I'm trying to look at how this will look to an end user who is not intimately
acquainted with the internals of how memory subsystems work.

Whether the string starts with "mce" or "MC" or whatever ... what will the
user do with the mc_index that is printed with that first %d?  I don't think
it helps them find the DIMM when they open the box.  I suppose it is useful
if there are multiple messages ... and they see that the same memory controller
is mentioned in each. But I almost think it belongs inside the parentheses at
the end as the "low level details that most users won't need to care about.

Next %s is "Corrected" or "Fatal" or "Uncorrected" ... that's good.

What are the options for the next "%s" (msg)?

"memory stick"?? I suppose "DIMM" is a bit implementation dependent (SIMMs
are long gone ... but perhaps there will be some new acronym for stacked
memory ... STIMS :-) )

Then label (from SMBIOS) ... then the internal details. Good.

-Tony

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 21:10   ` [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Luck, Tony
@ 2012-05-10 22:07     ` Mauro Carvalho Chehab
  2012-05-10 22:37       ` Luck, Tony
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-10 22:07 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Em 10-05-2012 18:10, Luck, Tony escreveu:
>>> +	TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>>
>> This still says "mce" and it should say "MC" or "mem_ctl" or similar.
> 
> I'm trying to look at how this will look to an end user who is not intimately
> acquainted with the internals of how memory subsystems work.

This is what patch v23 prints on sb_edac:

# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 30/30   #P:32
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/u:6-201   [007] .N..   186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1  page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
     kworker/u:6-201   [007] .N..   186.239536: mc_error: [Hardware Error]: mem_ctl#1: Corrected error memory read error on memory stick "DIMM_E2" (channel:0 slot:0  page:0x93180b offset:0x927 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=1 channel=2/mask=4 rank=0)

There are still some space to improve the fields provided by the drivers.

> Whether the string starts with "mce" or "MC" or whatever ... what will the
> user do with the mc_index that is printed with that first %d?  I don't think
> it helps them find the DIMM when they open the box.  

Well, it helps to match the memory information on the trace with the sysfs nodes
that are memory-controller based and with the dmesg info.

Calling it as "mc" is more coherent with the dmesg prints.

> I suppose it is useful
> if there are multiple messages ... and they see that the same memory controller
> is mentioned in each. But I almost think it belongs inside the parentheses at
> the end as the "low level details that most users won't need to care about.

I don't have a strong preference, although I think it is better to have it at
the beginning.
 
> Next %s is "Corrected" or "Fatal" or "Uncorrected" ... that's good.
> 
> What are the options for the next "%s" (msg)?

The type of the error. In the above, it is "memory read error".
 
> "memory stick"?? I suppose "DIMM" is a bit implementation dependent (SIMMs
> are long gone ... but perhaps there will be some new acronym for stacked
> memory ... STIMS :-) )

Memory stick is described at edac.h as:

 * Memory Stick:	A printed circuit board that aggregates multiple
 *			memory devices in parallel.  In general, this is the
 *			Field Replaceable Unit (FRU) which gets replaced, in
 *			the case of excessive errors. Most often it is also
 *			called DIMM (Dual Inline Memory Module).
 *

DIMM is implementation dependent. As EDAC is also used on non-x86 archs, calling
it as DIMM on ARM is probably wrong.

Calling it as "STIMS" (or any other unusual acronym) seems worse ;)

> 
> Then label (from SMBIOS) ... then the internal details. Good.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 22:07     ` Mauro Carvalho Chehab
@ 2012-05-10 22:37       ` Luck, Tony
  2012-05-11  1:48         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Luck, Tony @ 2012-05-10 22:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

     kworker/u:6-201   [007] .N..   186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1  page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
     
The word "error" appears *five* times on this line (once with a capital E).
I feel beaten, bruised and ready to give up on this machine with just one
actual error reported :-)

We could get rid of one by:
 s/Corrected error memory read error/Corrected memory read error/

(though we'd need to see if things still read well for all other "msg" options.

Or perhaps it could say:
   ... Corrected error: memory read on memory stick ...
or even:
   ... Corrected error: read on memory stick ...

This part could get shortened too:
   mc_error: [Hardware Error]:
will mc_error ever report something that isn't a "Hardware Error"?
I don't think we have to preserve this legacy string when moving
to a new reporting mechanism.


> There are still some space to improve the fields provided by the drivers.
Apart from reporting "channel" twice, that doesn't look too bad. Maybe
the "1 error(s)" could say "count: 1"?

-Tony


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 20:55   ` Mauro Carvalho Chehab
@ 2012-05-10 22:46     ` Steven Rostedt
  2012-05-10 23:16       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Rostedt @ 2012-05-10 22:46 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar, Tony Luck

On Thu, 2012-05-10 at 17:55 -0300, Mauro Carvalho Chehab wrote:

> >>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> >> diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
> >> new file mode 100644
> >> index 0000000..66981ef
> >> --- /dev/null
> >> +++ b/include/ras/hw_event.h
> >> @@ -0,0 +1,77 @@
> >> +#undef TRACE_SYSTEM
> >> +#define TRACE_SYSTEM hw_event
> > 
> > 			ras
> 
> Then, the header name should be called as "ras.h", otherwise an error happens:
> 
> In file included from include/ras/hw_event.h:77:0,
>                  from drivers/edac/edac_mc.c:38:
> include/trace/define_trace.h:79:43: fatal error: ../../include/ras/ras.h: Arquivo ou diretório não encontrado

>From samples/trace_events/trace-events-sample.h:

/*
 * TRACE_INCLUDE_FILE is not needed if the filename and TRACE_SYSTEM are equal
 */
#define TRACE_INCLUDE_FILE trace-events-sample

Thus, you could do:

+++ b/include/ras/hw_event.h

+undef TRACE_SYSTEM
+define TRACE_SYSTEM ras

[...]

#define TRACE_INCLUDE_FILE hw_event


-- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 22:46     ` Steven Rostedt
@ 2012-05-10 23:16       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-10 23:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar, Tony Luck

Em 10-05-2012 19:46, Steven Rostedt escreveu:
> On Thu, 2012-05-10 at 17:55 -0300, Mauro Carvalho Chehab wrote:
> 
>>>>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
>>>> diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
>>>> new file mode 100644
>>>> index 0000000..66981ef
>>>> --- /dev/null
>>>> +++ b/include/ras/hw_event.h
>>>> @@ -0,0 +1,77 @@
>>>> +#undef TRACE_SYSTEM
>>>> +#define TRACE_SYSTEM hw_event
>>>
>>> 			ras
>>
>> Then, the header name should be called as "ras.h", otherwise an error happens:
>>
>> In file included from include/ras/hw_event.h:77:0,
>>                  from drivers/edac/edac_mc.c:38:
>> include/trace/define_trace.h:79:43: fatal error: ../../include/ras/ras.h: Arquivo ou diretório não encontrado
> 
> From samples/trace_events/trace-events-sample.h:
> 
> /*
>  * TRACE_INCLUDE_FILE is not needed if the filename and TRACE_SYSTEM are equal
>  */
> #define TRACE_INCLUDE_FILE trace-events-sample
> 
> Thus, you could do:
> 
> +++ b/include/ras/hw_event.h
> 
> +undef TRACE_SYSTEM
> +define TRACE_SYSTEM ras
> 
> [...]
> 
> #define TRACE_INCLUDE_FILE hw_event
> 
> 
> -- Steve

Thanks, Steve!

I'll rename it to include/ras/ras_event.h

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-10 22:37       ` Luck, Tony
@ 2012-05-11  1:48         ` Mauro Carvalho Chehab
  2012-05-11 10:25           ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11  1:48 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Em 10-05-2012 19:37, Luck, Tony escreveu:
>      kworker/u:6-201   [007] .N..   186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1  page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
>      
> The word "error" appears *five* times on this line (once with a capital E).
> I feel beaten, bruised and ready to give up on this machine with just one
> actual error reported :-)

:)

Several of them come from the driver-provided details.

The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
The sb-edac driver contributes with "memory read error" and "1 error(s)".

We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:

	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",

replacing mc_error by something else is not hard, but this is the name of the trace call:

TRACE_EVENT(mc_error,
...

Maybe the better is to do s/mc_error/mc_event/g.

The error count msg ("1 error(s)") could be replaced by "count:1".

> 
> We could get rid of one by:
>  s/Corrected error memory read error/Corrected memory read error/

This is the hardest possible solution ;) Changing it will cause weird messages
all over EDAC drivers ;)

This is how sb_edac.c provides the "memory read error" string:

                switch (optypenum) {
               	case 0:
                       	optype = "generic undef request error";
                       	break;
                case 1:
                        optype = "memory read error";
                        break;
                case 2:
                        optype = "memory write error";
                        break;
                case 3:
                        optype = "addr/cmd error";
                        break;
               	case 4:
                        optype = "memory scrubbing error";
                        break;
               	default:
                        optype = "reserved";
                       	break;

In the last case of switch, for this driver, the error would be printed as "Corrected reserved".

On i7core_edac, there's also one error that would be weird:

        switch (optypenum) {
        case 0:
                optype = "generic undef request";
               	break;

Drivers like i5100_edac also provide error messages without the word "error" on it, like:

static const char *i5100_err_msg(unsigned err)
{
        static const char *merrs[] = {
                "unknown", /* 0 */
                "uncorrectable data ECC on replay", /* 1 */
                "unknown", /* 2 */
                "unknown", /* 3 */
                "aliased uncorrectable demand data ECC", /* 4 */
                "aliased uncorrectable spare-copy data ECC", /* 5 */
                "aliased uncorrectable patrol data ECC", /* 6 */
                "unknown", /* 7 */
                "unknown", /* 8 */
                "unknown", /* 9 */
                "non-aliased uncorrectable demand data ECC", /* 10 */
                "non-aliased uncorrectable spare-copy data ECC", /* 11 */
                "non-aliased uncorrectable patrol data ECC", /* 12 */
...

On _several_ drivers, the error type is simply the name of the driver, or blank:

amd76x_edac.c:
                       	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
                                             mci->csrows[row]->first_page, 0, 0,
                                             row, 0, -1,
                                             mci->ctl_name, "", NULL);

i3200_edac:
                        edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
                                             0, 0, 0,
                                             eccerrlog_row(channel, log),
                                             -1, -1,
                                             "i3000 UE", "", NULL);

Btw, you should not forget that, while simple usecases will be to just read the
/sys/kernel/debug/tracing/trace file, a monitoring tool will use the
binary data information, and store each trace field on a separate database
field.

So, the contents of the error message field should be consistent.

> 
> (though we'd need to see if things still read well for all other "msg" options.
> 
> Or perhaps it could say:
>    ... Corrected error: memory read on memory stick ...
> or even:
>    ... Corrected error: read on memory stick ...
> 
> This part could get shortened too:
>    mc_error: [Hardware Error]:
> will mc_error ever report something that isn't a "Hardware Error"?
> I don't think we have to preserve this legacy string when moving
> to a new reporting mechanism.
> 
> 
>> There are still some space to improve the fields provided by the drivers.
> Apart from reporting "channel" twice, that doesn't look too bad. Maybe
> the "1 error(s)" could say "count: 1"?

Agreed.

See the enclosed patch. The TP_printk() message after it is:

mc_event: Corrected error:memory read error on memory stick "DIMM_A1" (mc:0 channel:0 slot:0  page:0x1a3706 offset:0xff1 grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:8 rank:0)

If ok, I'll merge the edac core part together with this changeset, and the 
sb_edac part together with the patch that cleans the sb_edac logs.

Regards,
Mauro

--

edac: Improve error messages on sb-edac and edac-mc

From: Mauro Carvalho Chehab <mchehab@redhat.com>

After this patch, /sys/kernel/debug/tracing/trace displays:

     kworker/u:6-201   [007] .N..   161.136624: mc_event: Corrected error:memory read error on memory stick "DIMM_A1" (mc:0 channel:0 slot:0  page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:1)
     kworker/u:6-201   [007] .N..   161.155708: mc_event: Corrected error:memory read error on memory stick "DIMM_E1" (mc:1 channel:0 slot:0  page:0x987f45 offset:0x14c grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:1 channel_mask:4 rank:0)
     kworker/u:6-201   [007] .N..   161.174817: mc_event: Corrected error:memory read error on memory stick "DIMM_C1" (mc:0 channel:0 slot:1  page:0x2bf618 offset:0xb2e grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:4 rank:4)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 0550cb4..b7492e8 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -35,7 +35,7 @@
 
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
-#include <ras/ras.h>
+#include <ras/ras_event.h>
 
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
@@ -1174,7 +1174,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			page_frame_number, offset_in_page, grain);
 
 	/* Report the error via the trace interface */
-	trace_mc_error(type, mci->mc_idx, msg, label, location,
+	trace_mc_event(type, mci->mc_idx, msg, label, location,
 		       detail, other_detail);
 
 	/* Report the error via the edac_mc_printk() interface */
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 69c807c..60dbefe 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -786,7 +786,7 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 				 u8 *socket,
 				 long *channel_mask,
 				 u8 *rank,
-				 char *area_type, char *msg)
+				 char **area_type, char *msg)
 {
 	struct mem_ctl_info	*new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
@@ -841,7 +841,7 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 		sprintf(msg, "Can't discover the memory socket");
 		return -EINVAL;
 	}
-	area_type = get_dram_attr(reg);
+	*area_type = get_dram_attr(reg);
 	interleave_mode = INTERLEAVE_MODE(reg);
 
 	pci_read_config_dword(pvt->pci_sad0, interleave_list[n_sads],
@@ -1339,7 +1339,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	struct mem_ctl_info *new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
 	enum hw_event_mc_err_type tp_event;
-	char *type, *optype, msg[256], *recoverable_msg;
+	char *type, *optype, msg[256];
 	bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
 	bool overflow = GET_BITFIELD(m->status, 62, 62);
 	bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
@@ -1352,7 +1352,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	long channel_mask, first_channel;
 	u8  rank, socket;
 	int rc, dimm;
-	char *area_type = "Unknown";
+	char *area_type = NULL;
 
 	if (uncorrected_error) {
 		if (ripv) {
@@ -1404,7 +1404,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	}
 
 	rc = get_memory_error_data(mci, m->addr, &socket,
-				   &channel_mask, &rank, area_type, msg);
+				   &channel_mask, &rank, &area_type, msg);
 	if (rc < 0)
 		goto err_parsing;
 	new_mci = get_mci_for_node_id(socket);
@@ -1424,10 +1424,6 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	else
 		dimm = 2;
 
-	if (uncorrected_error && recoverable)
-		recoverable_msg = " recoverable";
-	else
-		recoverable_msg = "";
 
 	/*
 	 * FIXME: On some memory configurations (mirror, lockstep), the
@@ -1436,14 +1432,13 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	 * to the group of dimm's where the error may be happening.
 	 */
 	snprintf(msg, sizeof(msg),
-		 "%d error(s)%s: %s%s: Err=%04x:%04x socket=%d channel=%ld/mask=%ld rank=%d",
+		 "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
 		 core_err_cnt,
 		 overflow ? " OVERFLOW" : "",
+		 (uncorrected_error && recoverable) ? " recoverable" : "",
 		 area_type,
-		 recoverable_msg,
 		 mscod, errcode,
 		 socket,
-		 first_channel,
 		 channel_mask,
 		 rank);
 
diff --git a/include/ras/ras.h b/include/ras/ras_event.h
similarity index 80%
rename from include/ras/ras.h
rename to include/ras/ras_event.h
index 13ea4ee..66f6a43 100644
--- a/include/ras/ras.h
+++ b/include/ras/ras_event.h
@@ -1,5 +1,6 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event
 
 #if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_HW_EVENT_MC_H
@@ -26,25 +27,25 @@
 /*
  * Default error mechanisms for Memory Controller errors (CE and UE)
  */
-TRACE_EVENT(mc_error,
+TRACE_EVENT(mc_event,
 
 	TP_PROTO(const unsigned int err_type,
 		 const unsigned int mc_index,
-		 const char *msg,
+		 const char *error_msg,
 		 const char *label,
 		 const char *location,
-		 const char *detail,
+		 const char *core_detail,
 		 const char *driver_detail),
 
-	TP_ARGS(err_type, mc_index, msg, label, location,
-		detail, driver_detail),
+	TP_ARGS(err_type, mc_index, error_msg, label, location,
+		core_detail, driver_detail),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	err_type		)
 		__field(	unsigned int,	mc_index		)
-		__string(	msg,		msg			)
+		__string(	msg,		error_msg		)
 		__string(	label,		label			)
-		__string(	detail,		detail			)
+		__string(	detail,		core_detail		)
 		__string(	location,	location		)
 		__string(	driver_detail,	driver_detail		)
 	),
@@ -52,20 +53,20 @@ TRACE_EVENT(mc_error,
 	TP_fast_assign(
 		__entry->err_type		= err_type;
 		__entry->mc_index		= mc_index;
-		__assign_str(msg, msg);
+		__assign_str(msg, error_msg);
 		__assign_str(label, label);
 		__assign_str(location, location);
-		__assign_str(detail, detail);
+		__assign_str(detail, core_detail);
 		__assign_str(driver_detail, driver_detail);
 	),
 
-	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
-		  __entry->mc_index,
+	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
 		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
 			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
 			"Fatal" : "Uncorrected"),
 		  __get_str(msg),
 		  __get_str(label),
+		  __entry->mc_index,
 		  __get_str(location),
 		  __get_str(detail),
 		  __get_str(driver_detail))

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCHv23] RAS: use events to handle hw issues
  2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
@ 2012-05-11 10:04     ` Borislav Petkov
  2012-05-11 14:54     ` [PATCH v.23-2] RAS: use tracepoint " Mauro Carvalho Chehab
  2012-05-12 14:13     ` [PATCH v24] RAS: Add a tracepoint for reporting memory controller events Mauro Carvalho Chehab
  2 siblings, 0 replies; 43+ messages in thread
From: Borislav Petkov @ 2012-05-11 10:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Let me repeat myself:

> It should be:
> 
> Subject: RAS: Add a tracepoint for reporting memory controller errors
> 
> and not
> 
> Subject: edac, ras/hw_event.h: use events to handle hw issues
> 
> because events don't handle hw issues.

You don't use events to handle hardware issues - you use tracepoints to
report hardware issues! And besides, "event" is so overloaded in kernel
land that actually refraining from its use makes me keep at least a bit
of sanity when looking at yet another header called something event.h.

Please be more precise when naming your patches.

Thanks.

On Thu, May 10, 2012 at 06:00:28PM -0300, Mauro Carvalho Chehab wrote:
> Add a new tracepoint-based hardware events report method for
> reporting Memory Controller events.
> 
> Part of the description bellow is shamelessly copied from Tony
> Luck's notes about the Hardware Error BoF during LPC 2010 [1].
> Tony, thanks for your notes and discussions to generate the
> h/w error reporting requirements.
> 
> [1] http://lwn.net/Articles/416669/
> 
>     We have several subsystems & methods for reporting hardware errors:
> 
>     1) EDAC ("Error Detection and Correction").  In its original form
>     this consisted of a platform specific driver that read topology
>     information and error counts from chipset registers and reported
>     the results via a sysfs interface.
> 
>     2) mcelog - x86 specific decoding of machine check bank registers
>     reporting in binary form via /dev/mcelog. Recent additions make use
>     of the APEI extensions that were documented in version 4.0a of the
>     ACPI specification to acquire more information about errors without
>     having to rely reading chipset registers directly. A user level
>     programs decodes into somewhat human readable format.
> 
>     3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>     decodes errors reported via machine check bank registers in AMD
>     processors to the console log using printk();
> 
>     Each of these mechanisms has a band of followers ... and none
>     of them appear to meet all the needs of all users.
> 
> As part of a RAS subsystem, let's encapsulate the memory error hardware
> events into a trace facility.
> 
> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> Hoever, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A latter patch is planned to change the tracepoint, for those types
> of event.
> 
> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
> Cc: Doug Thompson <norsk5@yahoo.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> ---
>  drivers/edac/edac_core.h |    2 +-
>  drivers/edac/edac_mc.c   |   25 +++++++++++----
>  include/ras/ras.h        |   77 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 96 insertions(+), 8 deletions(-)
>  create mode 100644 include/ras/ras.h
> 
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index f06ce9a..eee7360 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog);
> +			  const void *arch_log);
>  
>  /*
>   * edac_device APIs
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index e5b5563..7493adb 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -33,6 +33,10 @@
>  #include "edac_core.h"
>  #include "edac_module.h"
>  
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH ../../include/ras
> +#include <ras/ras.h>
> +
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
>  static LIST_HEAD(mc_devices);
> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  	 * which will perform kobj unregistration and the actual free
>  	 * will occur during the kobject callback operation
>  	 */
> +
>  	return mci;
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
> @@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog)
> +			  const void *arch_log)
>  {
>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>  	char detail[80], location[80];
> @@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  	}
>  
>  	/* Memory type dependent details about the error */
> -	if (type == HW_EVENT_ERR_CORRECTED) {
> +	if (type == HW_EVENT_ERR_CORRECTED)
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>  			page_frame_number, offset_in_page,
>  			grain, syndrome);
> -		edac_ce_error(mci, pos, msg, location, label, detail,
> -			      other_detail, enable_per_layer_report,
> -			      page_frame_number, offset_in_page, grain);
> -	} else {
> +	else
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d",
>  			page_frame_number, offset_in_page, grain);
>  
> +	/* Report the error via the trace interface */
> +	trace_mc_error(type, mci->mc_idx, msg, label, location,
> +		       detail, other_detail);
> +
> +	/* Report the error via the edac_mc_printk() interface */
> +	if (type == HW_EVENT_ERR_CORRECTED)
> +		edac_ce_error(mci, pos, msg, location, label, detail,
> +			      other_detail, enable_per_layer_report,
> +			      page_frame_number, offset_in_page, grain);
> +	else
>  		edac_ue_error(mci, pos, msg, location, label, detail,
>  			      other_detail, enable_per_layer_report);
> -	}
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> diff --git a/include/ras/ras.h b/include/ras/ras.h
> new file mode 100644
> index 0000000..13ea4ee
> --- /dev/null
> +++ b/include/ras/ras.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras
> +
> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_MC_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +#include <linux/ktime.h>
> +
> +/*
> + * Hardware Events Report
> + *
> + * Those events are generated when hardware detected a corrected or
> + * uncorrected event, and are meant to replace the current API to report
> + * errors defined on both EDAC and MCE subsystems.
> + *
> + * FIXME: Add events for handling memory errors originated from the
> + *        MCE subsystem.
> + */
> +
> +/*
> + * Hardware-independent Memory Controller specific events
> + */
> +
> +/*
> + * Default error mechanisms for Memory Controller errors (CE and UE)
> + */
> +TRACE_EVENT(mc_error,
> +
> +	TP_PROTO(const unsigned int err_type,
> +		 const unsigned int mc_index,
> +		 const char *msg,
> +		 const char *label,
> +		 const char *location,
> +		 const char *detail,
> +		 const char *driver_detail),
> +
> +	TP_ARGS(err_type, mc_index, msg, label, location,
> +		detail, driver_detail),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	err_type		)
> +		__field(	unsigned int,	mc_index		)
> +		__string(	msg,		msg			)
> +		__string(	label,		label			)
> +		__string(	detail,		detail			)
> +		__string(	location,	location		)
> +		__string(	driver_detail,	driver_detail		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_type		= err_type;
> +		__entry->mc_index		= mc_index;
> +		__assign_str(msg, msg);
> +		__assign_str(label, label);
> +		__assign_str(location, location);
> +		__assign_str(detail, detail);
> +		__assign_str(driver_detail, driver_detail);
> +	),
> +
> +	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
> +		  __entry->mc_index,
> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		  __get_str(msg),
> +		  __get_str(label),
> +		  __get_str(location),
> +		  __get_str(detail),
> +		  __get_str(driver_detail))
> +);
> +
> +#endif /* _TRACE_HW_EVENT_MC_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> -- 
> 1.7.8
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-11  1:48         ` Mauro Carvalho Chehab
@ 2012-05-11 10:25           ` Borislav Petkov
  2012-05-11 12:37             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-11 10:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Thu, May 10, 2012 at 10:48:40PM -0300, Mauro Carvalho Chehab wrote:
> Em 10-05-2012 19:37, Luck, Tony escreveu:
> >      kworker/u:6-201   [007] .N..   186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1  page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
> >      
> > The word "error" appears *five* times on this line (once with a capital E).
> > I feel beaten, bruised and ready to give up on this machine with just one
> > actual error reported :-)
> 
> :)
> 
> Several of them come from the driver-provided details.
> 
> The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
> The sb-edac driver contributes with "memory read error" and "1 error(s)".
> 
> We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:
> 
> 	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
> 
> replacing mc_error by something else is not hard, but this is the name of the trace call:
> 
> TRACE_EVENT(mc_error,
> ...
> 
> Maybe the better is to do s/mc_error/mc_event/g.

HW_ERR is the "official" prefix used by the MCE code in the kernel.
Maybe we can shorten it but it is needed to raise attention when staring
at dmesg output.

Now, since this tracepoint is not dmesg, we don't need it there at all
since we know that trace_mc_error reports memory errors.

"mc_error" is also not needed.

> The error count msg ("1 error(s)") could be replaced by "count:1".

Is there even a possibility to report more than one error when invoking
trace_mc_error once? If not, simply drop the count completely.

> > We could get rid of one by:
> >  s/Corrected error memory read error/Corrected memory read error/
> 
> This is the hardest possible solution ;) Changing it will cause weird messages
> all over EDAC drivers ;)

I agree with Tony here - repeating error a gazillion times on one report
only is a "naaah!"

Here's how it should look:

kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)

* count is gone
* MC-drivers shouldn't say "error" when reporting an error
* UE/CE moves into the brackets
* socket moves earlier in the brackets, and keep the whole deal hierarchical.
* drop "err_code" what is that?
* drop second "socket"
* drop "area" Area "DRAM" - are there other?
* what is "channel_mask"?
* move "rank" to earlier

Now this is an output format I can get on board with.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-11 10:25           ` Borislav Petkov
@ 2012-05-11 12:37             ` Mauro Carvalho Chehab
  2012-05-11 17:24               ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11 12:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 11-05-2012 07:25, Borislav Petkov escreveu:
> On Thu, May 10, 2012 at 10:48:40PM -0300, Mauro Carvalho Chehab wrote:
>> Em 10-05-2012 19:37, Luck, Tony escreveu:
>>>      kworker/u:6-201   [007] .N..   186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1  page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
>>>      
>>> The word "error" appears *five* times on this line (once with a capital E).
>>> I feel beaten, bruised and ready to give up on this machine with just one
>>> actual error reported :-)
>>
>> :)
>>
>> Several of them come from the driver-provided details.
>>
>> The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
>> The sb-edac driver contributes with "memory read error" and "1 error(s)".
>>
>> We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:
>>
>> 	TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>>
>> replacing mc_error by something else is not hard, but this is the name of the trace call:
>>
>> TRACE_EVENT(mc_error,
>> ...
>>
>> Maybe the better is to do s/mc_error/mc_event/g.
> 
> HW_ERR is the "official" prefix used by the MCE code in the kernel.
> Maybe we can shorten it but it is needed to raise attention when staring
> at dmesg output.
> 
> Now, since this tracepoint is not dmesg, we don't need it there at all
> since we know that trace_mc_error reports memory errors.

IMO, we can get rid of it on trace, keeping it at dmesg.

> "mc_error" is also not needed.

Some name is required there. This parameter affects:
	- the name of the tracepoint function;
	- the TP_printk() output;
	- the trace filter name.

mc_event is probably a good name.

>> The error count msg ("1 error(s)") could be replaced by "count:1".
> 
> Is there even a possibility to report more than one error when invoking
> trace_mc_error once? If not, simply drop the count completely.

Good point. It makes sense to add a count parameter to the error handling core, 
in order to handle "count". AFAIKT, the only two drivers that currently reports
error counts are sb_edac and i7core_edac.

With regards to sb_edac, it also needs another logic to be handled by the core:
channel_mask. This is required to handle lockstep mode and mirror mode.

Adding support for it is already on my TODO list. I'll also put "count" 
on my TODO list.

>>> We could get rid of one by:
>>>  s/Corrected error memory read error/Corrected memory read error/
>>
>> This is the hardest possible solution ;) Changing it will cause weird messages
>> all over EDAC drivers ;)
> 
> I agree with Tony here - repeating error a gazillion times on one report
> only is a "naaah!"
> 
> Here's how it should look:
> 
> kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)

As already explained: the error_msg is not consistent among the drivers.

So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
work on other drivers.

Changing this field on each driver requires deep knowledge of each memory
controller, and not all datasheets are publicly available. 

For example, this is the code for Freescale MPC85xx EDAC driver:

	if (err_detect & DDR_EDE_SBE)
		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
				     pfn, err_addr & ~PAGE_MASK, syndrome,
				     row_index, 0, -1,
				     mci->ctl_name, "", NULL);

	if (err_detect & DDR_EDE_MBE)
		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
				     pfn, err_addr & ~PAGE_MASK, syndrome,
				     row_index, 0, -1,
				     mci->ctl_name, "", NULL);

It uses a blank value for the error message. putting the error_msg at the beginning,
as you proposed would print: 

[Hardware Error]:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )

> * count is gone

While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.

The right thing here seems to move it to a core property and increment the error counters
according with the error count.

> * MC-drivers shouldn't say "error" when reporting an error
> * UE/CE moves into the brackets

It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.

> * socket moves earlier in the brackets, and keep the whole deal hierarchical.

Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
were running the code when an error was generated, not the CPU that were managing the memory.

The CPU managing the memory is called "memory controller".

It might be used by something that would kill the affected task, but doing such things
is probably not easy.

So, the hierarchical information that translates into "DIMM_A1" is "mc:0 channel:0 slot:0".

> * drop "err_code" what is that?

In this case:
	u32 errcode = GET_BITFIELD(m->status, 0, 15);

> * drop second "socket"
> * drop "area" Area "DRAM" - are there other?

Yes. there are 4 types of area at sb_edac.

> * what is "channel_mask"?

It is a bitmask mask showing all channels affected by an error, on sb_edac.
Handing it is on my TODO list.

> * move "rank" to earlier

Why? This is the least relevant information provided by the driver-specific logic:

	snprintf(msg, sizeof(msg),
		 "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
		 core_err_cnt,
		 overflow ? " OVERFLOW" : "",
		 (uncorrected_error && recoverable) ? " recoverable" : "",
		 area_type,
		 mscod, errcode,
		 socket,
		 channel_mask,
		 rank);

Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
the user.

Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
have some specific equipments to test the DIMMs.

So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
the driver results with some other testing tools I'm using on driver's development.


> Now this is an output format I can get on board with.
> 

Regards,
Mauro.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
  2012-05-11 10:04     ` Borislav Petkov
@ 2012-05-11 14:54     ` Mauro Carvalho Chehab
  2012-05-11 17:02       ` Luck, Tony
  2012-05-11 17:06       ` Borislav Petkov
  2012-05-12 14:13     ` [PATCH v24] RAS: Add a tracepoint for reporting memory controller events Mauro Carvalho Chehab
  2 siblings, 2 replies; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11 14:54 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

The tracepoint printk will be displayed like:

mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])

Where:
	[error msg] is the driver-specific error message
		    (e. g. "memory read", "bus error", ...);
	[location] is the location in terms of memory controller and
		   branch/channel/slot, channel/slot or csrow/channel;
	[label] is the memory stick label;
	[edac_mc detail] describes the address location of the error
			 and the syndrome;
	[driver detail] is driver-specifig error message details,
			when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their MIBs.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_core.h |    2 +-
 drivers/edac/edac_mc.c   |   25 ++++++++++----
 include/ras/ras_event.h  |   78 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+), 8 deletions(-)
 create mode 100644 include/ras/ras_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..0153cd98 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	/* FIXME: too much for stack: move it to some pre-alocated area */
 	char detail[80], location[80];
@@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	}
 
 	/* Memory type dependent details about the error */
-	if (type == HW_EVENT_ERR_CORRECTED) {
+	if (type == HW_EVENT_ERR_CORRECTED)
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
 			page_frame_number, offset_in_page,
 			grain, syndrome);
-		edac_ce_error(mci, pos, msg, location, label, detail,
-			      other_detail, enable_per_layer_report,
-			      page_frame_number, offset_in_page, grain);
-	} else {
+	else
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d",
 			page_frame_number, offset_in_page, grain);
 
+	/* Report the error via the trace interface */
+	trace_mc_event(type, mci->mc_idx, msg, label, location,
+		       detail, other_detail);
+
+	/* Report the error via the edac_mc_printk() interface */
+	if (type == HW_EVENT_ERR_CORRECTED)
+		edac_ce_error(mci, pos, msg, location, label, detail,
+			      other_detail, enable_per_layer_report,
+			      page_frame_number, offset_in_page, grain);
+	else
 		edac_ue_error(mci, pos, msg, location, label, detail,
 			      other_detail, enable_per_layer_report);
-	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
new file mode 100644
index 0000000..66f6a43
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,78 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Events Report
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ *        MCE subsystem.
+ */
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_event,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *error_msg,
+		 const char *label,
+		 const char *location,
+		 const char *core_detail,
+		 const char *driver_detail),
+
+	TP_ARGS(err_type, mc_index, error_msg, label, location,
+		core_detail, driver_detail),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
+		__field(	unsigned int,	mc_index		)
+		__string(	msg,		error_msg		)
+		__string(	label,		label			)
+		__string(	detail,		core_detail		)
+		__string(	location,	location		)
+		__string(	driver_detail,	driver_detail		)
+	),
+
+	TP_fast_assign(
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
+		__assign_str(msg, error_msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, core_detail);
+		__assign_str(driver_detail, driver_detail);
+	),
+
+	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __entry->mc_index,
+		  __get_str(location),
+		  __get_str(detail),
+		  __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* RE: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 14:54     ` [PATCH v.23-2] RAS: use tracepoint " Mauro Carvalho Chehab
@ 2012-05-11 17:02       ` Luck, Tony
  2012-05-11 18:53         ` Mauro Carvalho Chehab
  2012-05-11 17:06       ` Borislav Petkov
  1 sibling, 1 reply; 43+ messages in thread
From: Luck, Tony @ 2012-05-11 17:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

> For example:
>
> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

This is looking so much better.

I looked through your examples from drivers on what text we might see
in the "memory read" position ... and agree that it would be a lot of
work to make them all come up with grammatically clean messages, especially
for all the poorly documented (or undocumented) "default/unknown/..." cases.

Back to my "does the casual user really need to know" soapbox. What different
actions do we expect a user to take when we tell them "read error" or "write
error" or "unknown error"?  I'm beginning to think that this belongs inside
the brackets! Perhaps as:  type:"memory read"?

Then we'd have:

mc_event: Corrected error on memory stick "DIMM_1A" (bunch of stuff for deep diagnosis by vendor)

Knowing that the error was Corrected/Uncorrected is vital to the user. It lets them know
the urgency with which they need to take action ... we need to educate them that a few
"Corrected" errors are perfectly normal and nothing to raise blood pressure about.

Knowing which memory stick was involved - also very important. If they do take action,
they should be able to swap out the memory stick that was the source of the problem.

Everything else is just for memory geeks like me, you and Boris (and OEMs who want to
diagnose root cause of problems they see by pattern analysis across errors from multiple
machines with DIMMS from different batches/vendors).

-Tony

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 14:54     ` [PATCH v.23-2] RAS: use tracepoint " Mauro Carvalho Chehab
  2012-05-11 17:02       ` Luck, Tony
@ 2012-05-11 17:06       ` Borislav Petkov
  2012-05-11 17:10         ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-11 17:06 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

>From Documentation/SubmittingPatches:

"2) Describe your changes.

Describe the technical detail of the change(s) your patch includes.

Be as specific as possible."

"use tracepoint to handle hw issues" is not specific - it is bullshit
bingo with a random word generator.

I'm getting tired of this crap: either take the suggestion as is,
without randomly changing it for no reason whatsoever, or give a good
reason why it is not a good suggestion and propose a better one!

Randomly changing it in the next version of the patch for another, even
crappier one doesn't help.

On Fri, May 11, 2012 at 11:54:14AM -0300, Mauro Carvalho Chehab wrote:
> Add a new tracepoint-based hardware events report method for
> reporting Memory Controller events.
> 
> Part of the description bellow is shamelessly copied from Tony
> Luck's notes about the Hardware Error BoF during LPC 2010 [1].
> Tony, thanks for your notes and discussions to generate the
> h/w error reporting requirements.
> 
> [1] http://lwn.net/Articles/416669/
> 
>     We have several subsystems & methods for reporting hardware errors:
> 
>     1) EDAC ("Error Detection and Correction").  In its original form
>     this consisted of a platform specific driver that read topology
>     information and error counts from chipset registers and reported
>     the results via a sysfs interface.
> 
>     2) mcelog - x86 specific decoding of machine check bank registers
>     reporting in binary form via /dev/mcelog. Recent additions make use
>     of the APEI extensions that were documented in version 4.0a of the
>     ACPI specification to acquire more information about errors without
>     having to rely reading chipset registers directly. A user level
>     programs decodes into somewhat human readable format.
> 
>     3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>     decodes errors reported via machine check bank registers in AMD
>     processors to the console log using printk();
> 
>     Each of these mechanisms has a band of followers ... and none
>     of them appear to meet all the needs of all users.
> 
> As part of a RAS subsystem, let's encapsulate the memory error hardware
> events into a trace facility.
> 
> The tracepoint printk will be displayed like:
> 
> mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])
> 
> Where:
> 	[error msg] is the driver-specific error message
> 		    (e. g. "memory read", "bus error", ...);
> 	[location] is the location in terms of memory controller and
> 		   branch/channel/slot, channel/slot or csrow/channel;
> 	[label] is the memory stick label;
> 	[edac_mc detail] describes the address location of the error
> 			 and the syndrome;
> 	[driver detail] is driver-specifig error message details,
> 			when needed/provided (e. g. "area:DMA", ...)
> 
> For example:
> 
> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> 
> Of course, any userspace tools meant to handle errors should not parse
> the above data. They should, instead, use the binary fields provided by
> the tracepoint, mapping them directly into their MIBs.
> 
> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> Hoever, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A latter patch is planned to change the tracepoint, for those types
> of event.
> 
> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
> Cc: Doug Thompson <norsk5@yahoo.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> ---
>  drivers/edac/edac_core.h |    2 +-
>  drivers/edac/edac_mc.c   |   25 ++++++++++----
>  include/ras/ras_event.h  |   78 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 97 insertions(+), 8 deletions(-)
>  create mode 100644 include/ras/ras_event.h
> 
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index f06ce9a..eee7360 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog);
> +			  const void *arch_log);
>  
>  /*
>   * edac_device APIs
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index e5b5563..0153cd98 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -33,6 +33,10 @@
>  #include "edac_core.h"
>  #include "edac_module.h"
>  
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH ../../include/ras
> +#include <ras/ras_event.h>
> +
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
>  static LIST_HEAD(mc_devices);
> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  	 * which will perform kobj unregistration and the actual free
>  	 * will occur during the kobject callback operation
>  	 */
> +
>  	return mci;
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
> @@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog)
> +			  const void *arch_log)
>  {
>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>  	char detail[80], location[80];
> @@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  	}
>  
>  	/* Memory type dependent details about the error */
> -	if (type == HW_EVENT_ERR_CORRECTED) {
> +	if (type == HW_EVENT_ERR_CORRECTED)
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>  			page_frame_number, offset_in_page,
>  			grain, syndrome);
> -		edac_ce_error(mci, pos, msg, location, label, detail,
> -			      other_detail, enable_per_layer_report,
> -			      page_frame_number, offset_in_page, grain);
> -	} else {
> +	else
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d",
>  			page_frame_number, offset_in_page, grain);
>  
> +	/* Report the error via the trace interface */
> +	trace_mc_event(type, mci->mc_idx, msg, label, location,
> +		       detail, other_detail);
> +
> +	/* Report the error via the edac_mc_printk() interface */
> +	if (type == HW_EVENT_ERR_CORRECTED)
> +		edac_ce_error(mci, pos, msg, location, label, detail,
> +			      other_detail, enable_per_layer_report,
> +			      page_frame_number, offset_in_page, grain);
> +	else
>  		edac_ue_error(mci, pos, msg, location, label, detail,
>  			      other_detail, enable_per_layer_report);
> -	}
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> new file mode 100644
> index 0000000..66f6a43
> --- /dev/null
> +++ b/include/ras/ras_event.h
> @@ -0,0 +1,78 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras
> +#define TRACE_INCLUDE_FILE ras_event
> +
> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_MC_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +#include <linux/ktime.h>
> +
> +/*
> + * Hardware Events Report
> + *
> + * Those events are generated when hardware detected a corrected or
> + * uncorrected event, and are meant to replace the current API to report
> + * errors defined on both EDAC and MCE subsystems.
> + *
> + * FIXME: Add events for handling memory errors originated from the
> + *        MCE subsystem.
> + */
> +
> +/*
> + * Hardware-independent Memory Controller specific events
> + */
> +
> +/*
> + * Default error mechanisms for Memory Controller errors (CE and UE)
> + */
> +TRACE_EVENT(mc_event,
> +
> +	TP_PROTO(const unsigned int err_type,
> +		 const unsigned int mc_index,
> +		 const char *error_msg,
> +		 const char *label,
> +		 const char *location,
> +		 const char *core_detail,
> +		 const char *driver_detail),
> +
> +	TP_ARGS(err_type, mc_index, error_msg, label, location,
> +		core_detail, driver_detail),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	err_type		)
> +		__field(	unsigned int,	mc_index		)
> +		__string(	msg,		error_msg		)
> +		__string(	label,		label			)
> +		__string(	detail,		core_detail		)
> +		__string(	location,	location		)
> +		__string(	driver_detail,	driver_detail		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_type		= err_type;
> +		__entry->mc_index		= mc_index;
> +		__assign_str(msg, error_msg);
> +		__assign_str(label, label);
> +		__assign_str(location, location);
> +		__assign_str(detail, core_detail);
> +		__assign_str(driver_detail, driver_detail);
> +	),
> +
> +	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		  __get_str(msg),
> +		  __get_str(label),
> +		  __entry->mc_index,
> +		  __get_str(location),
> +		  __get_str(detail),
> +		  __get_str(driver_detail))
> +);
> +
> +#endif /* _TRACE_HW_EVENT_MC_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> -- 
> 1.7.8
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 17:06       ` Borislav Petkov
@ 2012-05-11 17:10         ` Mauro Carvalho Chehab
  2012-05-11 22:31           ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11 17:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 11-05-2012 14:06, Borislav Petkov escreveu:
> From Documentation/SubmittingPatches:
> 
> "2) Describe your changes.
> 
> Describe the technical detail of the change(s) your patch includes.
> 
> Be as specific as possible."
> 
> "use tracepoint to handle hw issues" is not specific - it is bullshit
> bingo with a random word generator.
> 
> I'm getting tired of this crap: either take the suggestion as is,
> without randomly changing it for no reason whatsoever, or give a good
> reason why it is not a good suggestion and propose a better one!
> 
> Randomly changing it in the next version of the patch for another, even
> crappier one doesn't help.

Well, tell me what title you want for this patch, instead of coming with
offending comments like above, and I'll change it to match your desire.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-11 12:37             ` Mauro Carvalho Chehab
@ 2012-05-11 17:24               ` Borislav Petkov
  2012-05-11 18:38                 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-11 17:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Fri, May 11, 2012 at 09:37:48AM -0300, Mauro Carvalho Chehab wrote:
> > HW_ERR is the "official" prefix used by the MCE code in the kernel.
> > Maybe we can shorten it but it is needed to raise attention when staring
> > at dmesg output.
> > 
> > Now, since this tracepoint is not dmesg, we don't need it there at all
> > since we know that trace_mc_error reports memory errors.
> 
> IMO, we can get rid of it on trace, keeping it at dmesg.
> 
> > "mc_error" is also not needed.
> 
> Some name is required there. This parameter affects:
> 	- the name of the tracepoint function;
> 	- the TP_printk() output;
> 	- the trace filter name.
> 
> mc_event is probably a good name.

If this tracepoint is going to report memory errors and we drop HW_ERR,
then I guess "mc_error" is the most fitting one because it says what the
message is.

> >> The error count msg ("1 error(s)") could be replaced by "count:1".
> > 
> > Is there even a possibility to report more than one error when invoking
> > trace_mc_error once? If not, simply drop the count completely.
> 
> Good point. It makes sense to add a count parameter to the error handling core, 
> in order to handle "count". AFAIKT, the only two drivers that currently reports
> error counts are sb_edac and i7core_edac.

What does that mean? You report multiple errors with one tracepoint
invocation? How do you report error details then?

If you only report single errors, error count will always be 1 so drop
it.

> With regards to sb_edac, it also needs another logic to be handled by
> the core: channel_mask. This is required to handle lockstep mode and
> mirror mode.

What does "handle lockstep mode" mean and how is that important for me
staring at the tracepoint output.

[ … ]

> > Here's how it should look:
> > 
> > kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
> 
> As already explained: the error_msg is not consistent among the drivers.
> 
> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
> work on other drivers.
> 
> Changing this field on each driver requires deep knowledge of each memory
> controller, and not all datasheets are publicly available. 
> 
> For example, this is the code for Freescale MPC85xx EDAC driver:
> 
> 	if (err_detect & DDR_EDE_SBE)
> 		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
> 				     row_index, 0, -1,
> 				     mci->ctl_name, "", NULL);
> 
> 	if (err_detect & DDR_EDE_MBE)
> 		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
> 				     row_index, 0, -1,
> 				     mci->ctl_name, "", NULL);
> 
> It uses a blank value for the error message. putting the error_msg at the beginning,
> as you proposed would print: 
> 
> [Hardware Error]:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )

that's fine since we know the tracepoint purpose:

 "mc_error:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"

is pretty clear to me, IMHO.

> > * count is gone
> 
> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
> 
> The right thing here seems to move it to a core property and increment the error counters
> according with the error count.

Incrementing the error counters shouldn't have anything to do with the
error reporting.

> > * MC-drivers shouldn't say "error" when reporting an error
> > * UE/CE moves into the brackets
> 
> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.

Why?

> > * socket moves earlier in the brackets, and keep the whole deal hierarchical.
> 
> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
> were running the code when an error was generated, not the CPU that were managing the memory.

Then drop it if it's not relevant.

> > * drop "err_code" what is that?
> 
> In this case:
> 	u32 errcode = GET_BITFIELD(m->status, 0, 15);

Either decode it like I do in amd_decode_err_code() or remove it
completely - no naked bit values.

> > * drop second "socket"
> > * drop "area" Area "DRAM" - are there other?
> 
> Yes. there are 4 types of area at sb_edac.

And they are?

> > * what is "channel_mask"?
> 
> It is a bitmask mask showing all channels affected by an error, on sb_edac.
> Handing it is on my TODO list.

What can the user do with it when it sees it reported?

> > * move "rank" to earlier
> 
> Why? This is the least relevant information provided by the driver-specific logic:
> 
> 	snprintf(msg, sizeof(msg),
> 		 "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
> 		 core_err_cnt,
> 		 overflow ? " OVERFLOW" : "",
> 		 (uncorrected_error && recoverable) ? " recoverable" : "",
> 		 area_type,
> 		 mscod, errcode,
> 		 socket,
> 		 channel_mask,
> 		 rank);
> 
> Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
> the user.
> 
> Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
> it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
> have some specific equipments to test the DIMMs.
> 
> So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
> the driver results with some other testing tools I'm using on driver's development.

Then drop it and keep a local version for your testing needs only.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-11 17:24               ` Borislav Petkov
@ 2012-05-11 18:38                 ` Mauro Carvalho Chehab
  2012-05-14 13:34                   ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11 18:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 11-05-2012 14:24, Borislav Petkov escreveu:
> On Fri, May 11, 2012 at 09:37:48AM -0300, Mauro Carvalho Chehab wrote:
>>> HW_ERR is the "official" prefix used by the MCE code in the kernel.
>>> Maybe we can shorten it but it is needed to raise attention when staring
>>> at dmesg output.
>>>
>>> Now, since this tracepoint is not dmesg, we don't need it there at all
>>> since we know that trace_mc_error reports memory errors.
>>
>> IMO, we can get rid of it on trace, keeping it at dmesg.
>>
>>> "mc_error" is also not needed.
>>
>> Some name is required there. This parameter affects:
>> 	- the name of the tracepoint function;
>> 	- the TP_printk() output;
>> 	- the trace filter name.
>>
>> mc_event is probably a good name.
> 
> If this tracepoint is going to report memory errors and we drop HW_ERR,
> then I guess "mc_error" is the most fitting one because it says what the
> message is.

True. Yet, "mc_event" fits better, as some drivers can also report events
that aren't errors. So, calling it as "mc_error", as on my original patch
is actually wrong. For example, i5100 can report those types of events:

		"spare copy initiated", /* 20 */
		"spare copy completed", /* 21 */

Spare copy events don't fit into "errors". Other drivers can also report events
like passing some temperature thresholds that aren't really errors.

Eventually, we might need yet-another severity type for those types of events,
as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.

>>>> The error count msg ("1 error(s)") could be replaced by "count:1".
>>>
>>> Is there even a possibility to report more than one error when invoking
>>> trace_mc_error once? If not, simply drop the count completely.
>>
>> Good point. It makes sense to add a count parameter to the error handling core, 
>> in order to handle "count". AFAIKT, the only two drivers that currently reports
>> error counts are sb_edac and i7core_edac.
> 
> What does that mean? You report multiple errors with one tracepoint
> invocation? How do you report error details then?

When a burst of errors happen at the same memory address (within a grain), a 
few memory controllers can merge them into a single report, providing an error
count and a overflow flag, if the burst is higher than the register size.

> If you only report single errors, error count will always be 1 so drop
> it.

No, it is not single errors. It is multiple errors at the same address/grain.

>> With regards to sb_edac, it also needs another logic to be handled by
>> the core: channel_mask. This is required to handle lockstep mode and
>> mirror mode.
> 
> What does "handle lockstep mode" mean and how is that important for me
> staring at the tracepoint output.

That were already explained dozens of time on the patch review threads:

The error check on lookstep mode is calculated over a 128 bits cacheline.
The memory controller sometimes is not able to distinguish if the error 
happened on the first channel or on the second channel.

So, either one (or the two) envolved DIMMs should be responsible for it.

> 
> [ … ]
> 
>>> Here's how it should look:
>>>
>>> kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
>>
>> As already explained: the error_msg is not consistent among the drivers.
>>
>> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
>> work on other drivers.
>>
>> Changing this field on each driver requires deep knowledge of each memory
>> controller, and not all datasheets are publicly available. 
>>
>> For example, this is the code for Freescale MPC85xx EDAC driver:
>>
>> 	if (err_detect & DDR_EDE_SBE)
>> 		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
>> 				     row_index, 0, -1,
>> 				     mci->ctl_name, "", NULL);
>>
>> 	if (err_detect & DDR_EDE_MBE)
>> 		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
>> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
>> 				     row_index, 0, -1,
>> 				     mci->ctl_name, "", NULL);
>>
>> It uses a blank value for the error message. putting the error_msg at the beginning,
>> as you proposed would print: 
>>
>> [Hardware Error]:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )
> 
> that's fine since we know the tracepoint purpose:
> 
>  "mc_error:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"
> 
> is pretty clear to me, IMHO.

yes, but: 
	mc_event:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)

is not clear if this is either an error or something else.

> 
>>> * count is gone
>>
>> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
>>
>> The right thing here seems to move it to a core property and increment the error counters
>> according with the error count.
> 
> Incrementing the error counters shouldn't have anything to do with the
> error reporting.

Why? The error counters is part of the API. Some utilities like edac-ctl actually don't even
care about the error logs. They only look into the error counting.

The API has 3 ways to report errors:
	- via counters;
	- via tracepoints;
	- via dmesg.

The error handling routine should update the error on all of them.

Of course, we may opt to remove error counting from the Kernel as a hole, letting it for
userspace handling. Even so, the driver needs to provide how many errors were reported,
in order to implement it on userspace.

>>> * MC-drivers shouldn't say "error" when reporting an error
>>> * UE/CE moves into the brackets
>>
>> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
> 
> Why?

Also already discussed dozens of time: this is the error severity. The way users handle
UE errors are different than the way they handle CE, e. g. CE errors are "tolerable". 
UE errors might not be.

> 
>>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
>>
>> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
>> were running the code when an error was generated, not the CPU that were managing the memory.
> 
> Then drop it if it's not relevant.

Because irrelevant != "not very relevant".

I might eventually drop it on some latter cleanups on sb_edac driver.

>>> * drop "err_code" what is that?
>>
>> In this case:
>> 	u32 errcode = GET_BITFIELD(m->status, 0, 15);
> 
> Either decode it like I do in amd_decode_err_code() or remove it
> completely - no naked bit values.

It is decoded, but providing it might help with debugging.

I might eventually drop it on some latter cleanups on sb_edac driver.

> 
>>> * drop second "socket"
>>> * drop "area" Area "DRAM" - are there other?
>>
>> Yes. there are 4 types of area at sb_edac.
> 
> And they are?

See the driver:

static char *get_dram_attr(u32 reg)
{
	switch(DRAM_ATTR(reg)) {
		case 0:
			return "DRAM";
		case 1:
			return "MMCFG";
		case 2:
			return "NXM";
		default:
			return "unknown";
	}
}

> 
>>> * what is "channel_mask"?
>>
>> It is a bitmask mask showing all channels affected by an error, on sb_edac.
>> Handing it is on my TODO list.
> 
> What can the user do with it when it sees it reported?

Instead of blaming just one DIMM, blaming 2 or 4 DIMMs.

>>> * move "rank" to earlier
>>
>> Why? This is the least relevant information provided by the driver-specific logic:
>>
>> 	snprintf(msg, sizeof(msg),
>> 		 "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
>> 		 core_err_cnt,
>> 		 overflow ? " OVERFLOW" : "",
>> 		 (uncorrected_error && recoverable) ? " recoverable" : "",
>> 		 area_type,
>> 		 mscod, errcode,
>> 		 socket,
>> 		 channel_mask,
>> 		 rank);
>>
>> Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
>> the user.
>>
>> Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
>> it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
>> have some specific equipments to test the DIMMs.
>>
>> So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
>> the driver results with some other testing tools I'm using on driver's development.
> 
> Then drop it and keep a local version for your testing needs only.

It is too early to do that, as there literally thousands of ways to configure SB Memory Controllers,
and I was not able to test all of them.

So, at least while we don't have a report of having all possible SB modes tested by somebody,
I won't be dropping things that will help me to address bugzillas opened for this driver.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 17:02       ` Luck, Tony
@ 2012-05-11 18:53         ` Mauro Carvalho Chehab
  2012-05-11 20:07           ` Tony Luck
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-11 18:53 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 11-05-2012 14:02, Luck, Tony escreveu:
>> For example:
>>
>> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> 
> This is looking so much better.
> 
> I looked through your examples from drivers on what text we might see
> in the "memory read" position ... and agree that it would be a lot of
> work to make them all come up with grammatically clean messages, especially
> for all the poorly documented (or undocumented) "default/unknown/..." cases.
> 
> Back to my "does the casual user really need to know" soapbox. What different
> actions do we expect a user to take when we tell them "read error" or "write
> error" or "unknown error"?  I'm beginning to think that this belongs inside
> the brackets! Perhaps as:  type:"memory read"?

Indeed, read/write errors are equal for the user, but other events like (i5100):

		"SPD protocol error", /* 18 */
		"spare copy initiated", /* 20 */
		"spare copy completed", /* 21 */

Or (i5000):
	"Northbound CRC error on non-redundant retry";
	">Tmid Thermal event with intelligent throttling disabled";
			specific = "DIMM-spare copy started";
			specific = "DIMM-spare copy completed";

May mean that the DIMM is ok, but the error maybe on some other part of the system
(like an overheated cabinet, a badly-inserted DIMM or PCI device or maybe just some
 data mirroring in progress).

So, IMHO, keeping it at the main part of the error is valuable, at least when the
driver can generate such kinds of event.

> Then we'd have:
> 
> mc_event: Corrected error on memory stick "DIMM_1A" (bunch of stuff for deep diagnosis by vendor)

With the current implementation, this can actually be done at driver-basis: just fill
error_msg with a blank string and add all details at the driver-specific error detail.

> Knowing that the error was Corrected/Uncorrected is vital to the user. It lets them know
> the urgency with which they need to take action ... we need to educate them that a few
> "Corrected" errors are perfectly normal and nothing to raise blood pressure about.
> 
> Knowing which memory stick was involved - also very important. If they do take action,
> they should be able to swap out the memory stick that was the source of the problem.

> Everything else is just for memory geeks like me, you and Boris (and OEMs who want to
> diagnose root cause of problems they see by pattern analysis across errors from multiple
> machines with DIMMS from different batches/vendors).

Agreed (except for the cases like the above described).

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 18:53         ` Mauro Carvalho Chehab
@ 2012-05-11 20:07           ` Tony Luck
  0 siblings, 0 replies; 43+ messages in thread
From: Tony Luck @ 2012-05-11 20:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Fri, May 11, 2012 at 11:53 AM, Mauro Carvalho Chehab
<mchehab@redhat.com> wrote:
> With the current implementation, this can actually be done at driver-basis: just fill
> error_msg with a blank string and add all details at the driver-specific error detail.

Cool. Then we just need some documentation telling EDAC driver
writers that error_msg is only intended for information that will help
a casual user understand what (if anything) to do about the error.

-Tony

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 17:10         ` Mauro Carvalho Chehab
@ 2012-05-11 22:31           ` Borislav Petkov
  2012-05-11 22:35             ` Luck, Tony
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-11 22:31 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

On Fri, May 11, 2012 at 02:10:30PM -0300, Mauro Carvalho Chehab wrote:
> Well, tell me what title you want for this patch, instead of coming
> with offending comments like above, and I'll change it to match your
> desire.

Already told you but here it is again:

> Let me repeat myself:
> 
> > It should be:
> >
> > Subject: RAS: Add a tracepoint for reporting memory controller errors
> >
> > and not
> >
> > Subject: edac, ras/hw_event.h: use events to handle hw issues
> >
> > because events don't handle hw issues.
> 
> You don't use events to handle hardware issues - you use tracepoints to
> report hardware issues! And besides, "event" is so overloaded in kernel
> land that actually refraining from its use makes me keep at least a bit
> of sanity when looking at yet another header called something event.h.
> 
> Please be more precise when naming your patches.
> 
> Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v.23-2] RAS: use tracepoint to handle hw issues
  2012-05-11 22:31           ` Borislav Petkov
@ 2012-05-11 22:35             ` Luck, Tony
  0 siblings, 0 replies; 43+ messages in thread
From: Luck, Tony @ 2012-05-11 22:35 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

> > Subject: RAS: Add a tracepoint for reporting memory controller errors

s/errors/events/ ... since Mauro has pointed out that some events like:
"spare copy initiated" and "spare copy completed" may be reported using
this tracepoint ... and they are not errors

-Tony

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v24] RAS: Add a tracepoint for reporting memory controller events
  2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
  2012-05-11 10:04     ` Borislav Petkov
  2012-05-11 14:54     ` [PATCH v.23-2] RAS: use tracepoint " Mauro Carvalho Chehab
@ 2012-05-12 14:13     ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-12 14:13 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

The tracepoint printk will be displayed like:

mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])

Where:
	[error msg] is the driver-specific error message
		    (e. g. "memory read", "bus error", ...);
	[location] is the location in terms of memory controller and
		   branch/channel/slot, channel/slot or csrow/channel;
	[label] is the memory stick label;
	[edac_mc detail] describes the address location of the error
			 and the syndrome;
	[driver detail] is driver-specifig error message details,
			when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their MIBs.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_core.h |    2 +-
 drivers/edac/edac_mc.c   |   46 +++++++++++++++++++++++----
 include/ras/ras_event.h  |   78 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 8 deletions(-)
 create mode 100644 include/ras/ras_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..eb078ec 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -972,6 +977,27 @@ static void edac_ue_error(struct mem_ctl_info *mci,
 }
 
 #define OTHER_LABEL " or "
+
+/**
+ * edac_mc_handle_error - reports a memory event to userspace
+ *
+ * @type:		severity of the error (CE/UE/Fatal)
+ * @mci:		a struct mem_ctl_info pointer
+ * @page_frame_number:	mem page where the error occurred
+ * @offset_in_page:	offset of the error inside the page
+ * @syndrome:		ECC syndrome
+ * @layer0:		Memory layer0 position
+ * @layer1:		Memory layer2 position
+ * @layer2:		Memory layer3 position
+ * @msg:		Message meaningful to the end users that
+ *			explains the event
+ * @other_detail:	Technical details about the event that
+ *			may help hardware manufacturers and
+ *			EDAC developers to analyse the event
+ * @arch_log:		Architecture-specific struct that can
+ *			be used to add extended information to the
+ *			tracepoint, like dumping MCE registers.
+ */
 void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  struct mem_ctl_info *mci,
 			  const unsigned long page_frame_number,
@@ -982,7 +1008,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	/* FIXME: too much for stack: move it to some pre-alocated area */
 	char detail[80], location[80];
@@ -1119,21 +1145,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	}
 
 	/* Memory type dependent details about the error */
-	if (type == HW_EVENT_ERR_CORRECTED) {
+	if (type == HW_EVENT_ERR_CORRECTED)
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
 			page_frame_number, offset_in_page,
 			grain, syndrome);
-		edac_ce_error(mci, pos, msg, location, label, detail,
-			      other_detail, enable_per_layer_report,
-			      page_frame_number, offset_in_page, grain);
-	} else {
+	else
 		snprintf(detail, sizeof(detail),
 			"page:0x%lx offset:0x%lx grain:%d",
 			page_frame_number, offset_in_page, grain);
 
+	/* Report the error via the trace interface */
+	trace_mc_event(type, mci->mc_idx, msg, label, location,
+		       detail, other_detail);
+
+	/* Report the error via the edac_mc_printk() interface */
+	if (type == HW_EVENT_ERR_CORRECTED)
+		edac_ce_error(mci, pos, msg, location, label, detail,
+			      other_detail, enable_per_layer_report,
+			      page_frame_number, offset_in_page, grain);
+	else
 		edac_ue_error(mci, pos, msg, location, label, detail,
 			      other_detail, enable_per_layer_report);
-	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
new file mode 100644
index 0000000..66f6a43
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,78 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Events Report
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ *        MCE subsystem.
+ */
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_event,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *error_msg,
+		 const char *label,
+		 const char *location,
+		 const char *core_detail,
+		 const char *driver_detail),
+
+	TP_ARGS(err_type, mc_index, error_msg, label, location,
+		core_detail, driver_detail),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
+		__field(	unsigned int,	mc_index		)
+		__string(	msg,		error_msg		)
+		__string(	label,		label			)
+		__string(	detail,		core_detail		)
+		__string(	location,	location		)
+		__string(	driver_detail,	driver_detail		)
+	),
+
+	TP_fast_assign(
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
+		__assign_str(msg, error_msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, core_detail);
+		__assign_str(driver_detail, driver_detail);
+	),
+
+	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __entry->mc_index,
+		  __get_str(location),
+		  __get_str(detail),
+		  __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-11 18:38                 ` Mauro Carvalho Chehab
@ 2012-05-14 13:34                   ` Borislav Petkov
  2012-05-14 14:27                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-14 13:34 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Fri, May 11, 2012 at 03:38:37PM -0300, Mauro Carvalho Chehab wrote:
> True. Yet, "mc_event" fits better, as some drivers can also report events
> that aren't errors. So, calling it as "mc_error", as on my original patch
> is actually wrong. For example, i5100 can report those types of events:
> 
> 		"spare copy initiated", /* 20 */
> 		"spare copy completed", /* 21 */
> 
> Spare copy events don't fit into "errors". Other drivers can also report events
> like passing some temperature thresholds that aren't really errors.
> 
> Eventually, we might need yet-another severity type for those types of events,
> as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.

Stupid i5100... :-)

> >>>> The error count msg ("1 error(s)") could be replaced by "count:1".
> >>>
> >>> Is there even a possibility to report more than one error when invoking
> >>> trace_mc_error once? If not, simply drop the count completely.
> >>
> >> Good point. It makes sense to add a count parameter to the error handling core, 
> >> in order to handle "count". AFAIKT, the only two drivers that currently reports
> >> error counts are sb_edac and i7core_edac.
> > 
> > What does that mean? You report multiple errors with one tracepoint
> > invocation? How do you report error details then?
> 
> When a burst of errors happen at the same memory address (within a grain), a 
> few memory controllers can merge them into a single report, providing an error

What does "a few memory controllers can merge them into a single report"
mean exactly? Which few? Do you have an example output?

> count and a overflow flag, if the burst is higher than the register size.
> 
> > If you only report single errors, error count will always be 1 so drop
> > it.
> 
> No, it is not single errors. It is multiple errors at the same address/grain.

Oh, so the SB MC can report multiple errors with one MCE or whatever it uses?

> >> With regards to sb_edac, it also needs another logic to be handled by
> >> the core: channel_mask. This is required to handle lockstep mode and
> >> mirror mode.
> > 
> > What does "handle lockstep mode" mean and how is that important for me
> > staring at the tracepoint output.
> 
> That were already explained dozens of time on the patch review threads:
> 
> The error check on lookstep mode is calculated over a 128 bits cacheline.
> The memory controller sometimes is not able to distinguish if the error 
> happened on the first channel or on the second channel.

Ah, channel interleaving.

> So, either one (or the two) envolved DIMMs should be responsible for it.

Ok, so in that case having channel mask in there doesn't begin to
explain the user what that means - only you and Intel people know that.

In that case, you probably want to make it much more explicit:

"kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on unknown memory... (... channel: ...)"

like in the case with interleaved csrow controllers or something to that
effect saying that you cannot pinpoint the DIMM but it could be either
of the n DIMMs on channels ....

?

> >>> Here's how it should look:
> >>>
> >>> kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
> >>
> >> As already explained: the error_msg is not consistent among the drivers.
> >>
> >> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
> >> work on other drivers.
> >>
> >> Changing this field on each driver requires deep knowledge of each memory
> >> controller, and not all datasheets are publicly available. 
> >>
> >> For example, this is the code for Freescale MPC85xx EDAC driver:
> >>
> >> 	if (err_detect & DDR_EDE_SBE)
> >> 		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> >> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
> >> 				     row_index, 0, -1,
> >> 				     mci->ctl_name, "", NULL);
> >>
> >> 	if (err_detect & DDR_EDE_MBE)
> >> 		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
> >> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
> >> 				     row_index, 0, -1,
> >> 				     mci->ctl_name, "", NULL);
> >>
> >> It uses a blank value for the error message. putting the error_msg at the beginning,
> >> as you proposed would print: 
> >>
> >> [Hardware Error]:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )
> > 
> > that's fine since we know the tracepoint purpose:
> > 
> >  "mc_error:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"
> > 
> > is pretty clear to me, IMHO.
> 
> yes, but: 
> 	mc_event:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)
> 
> is not clear if this is either an error or something else.

Ok, I'd say we call it "mc_error" and i5100's two messages about spare
copying simply lose. We're reporting predominantly errors here anyway.

> >>> * count is gone
> >>
> >> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
> >>
> >> The right thing here seems to move it to a core property and increment the error counters
> >> according with the error count.
> > 
> > Incrementing the error counters shouldn't have anything to do with the
> > error reporting.
> 
> Why?

Because incrementing error counters and reporting errors should be two
different things.

> >>> * MC-drivers shouldn't say "error" when reporting an error
> >>> * UE/CE moves into the brackets
> >>
> >> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
> > 
> > Why?
> 
> Also already discussed dozens of time: this is the error severity. The way users handle
> UE errors are different than the way they handle CE, e. g. CE errors are "tolerable". 
> UE errors might not be.

And why does that warrant having the error type in the beginning of the
message? I can still read it if it is in the brackets a couple of chars
later on.

> >>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
> >>
> >> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
> >> were running the code when an error was generated, not the CPU that were managing the memory.
> > 
> > Then drop it if it's not relevant.
> 
> Because irrelevant != "not very relevant".
> 
> I might eventually drop it on some latter cleanups on sb_edac driver.

You still don't give me a valid, technical reason to keep it.

And yes, it _IS_ black or white: the socket either tells you a valuable
piece of information which you need for handling the error after
reporting it, or not, is useless, and no one needs to have it in the
error log.

It is that simple.

> 
> >>> * drop "err_code" what is that?
> >>
> >> In this case:
> >> 	u32 errcode = GET_BITFIELD(m->status, 0, 15);
> > 
> > Either decode it like I do in amd_decode_err_code() or remove it
> > completely - no naked bit values.
> 
> It is decoded, but providing it might help with debugging.

I only see naked bit values, where is it decoded? It should be properly
decoded in the final error message that the user gets to see.

> >>> * drop second "socket"
> >>> * drop "area" Area "DRAM" - are there other?
> >>
> >> Yes. there are 4 types of area at sb_edac.
> > 
> > And they are?
> 
> See the driver:
> 
> static char *get_dram_attr(u32 reg)
> {
> 	switch(DRAM_ATTR(reg)) {
> 		case 0:
> 			return "DRAM";
> 		case 1:
> 			return "MMCFG";
> 		case 2:
> 			return "NXM";
> 		default:
> 			return "unknown";
> 	}

You mean 3 - "unknown" is not a memory region.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-14 13:34                   ` Borislav Petkov
@ 2012-05-14 14:27                     ` Mauro Carvalho Chehab
  2012-05-15 15:09                       ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-14 14:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Hmm... I've already released 2 versions after this one, addressing all the pointed
issues pointed by you and Tony at:

	http://permalink.gmane.org/gmane.linux.kernel/1296116

It would be good if you could comment against the latest patch version.

Em 14-05-2012 10:34, Borislav Petkov escreveu:
> On Fri, May 11, 2012 at 03:38:37PM -0300, Mauro Carvalho Chehab wrote:
>> True. Yet, "mc_event" fits better, as some drivers can also report events
>> that aren't errors. So, calling it as "mc_error", as on my original patch
>> is actually wrong. For example, i5100 can report those types of events:
>>
>> 		"spare copy initiated", /* 20 */
>> 		"spare copy completed", /* 21 */
>>
>> Spare copy events don't fit into "errors". Other drivers can also report events
>> like passing some temperature thresholds that aren't really errors.
>>
>> Eventually, we might need yet-another severity type for those types of events,
>> as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.
> 
> Stupid i5100... :-)

:)

> 
>>>>>> The error count msg ("1 error(s)") could be replaced by "count:1".
>>>>>
>>>>> Is there even a possibility to report more than one error when invoking
>>>>> trace_mc_error once? If not, simply drop the count completely.
>>>>
>>>> Good point. It makes sense to add a count parameter to the error handling core, 
>>>> in order to handle "count". AFAIKT, the only two drivers that currently reports
>>>> error counts are sb_edac and i7core_edac.
>>>
>>> What does that mean? You report multiple errors with one tracepoint
>>> invocation? How do you report error details then?
>>
>> When a burst of errors happen at the same memory address (within a grain), a 
>> few memory controllers can merge them into a single report, providing an error
> 
> What does "a few memory controllers can merge them into a single report"
> mean exactly? Which few? Do you have an example output?

sb_edac and i7core_edac are two examples. Other memory controllers might have this
feature, but, AFAIKT, no other drivers implement it.

I don't have any example handy.

>> count and a overflow flag, if the burst is higher than the register size.
>>
>>> If you only report single errors, error count will always be 1 so drop
>>> it.
>>
>> No, it is not single errors. It is multiple errors at the same address/grain.
> 
> Oh, so the SB MC can report multiple errors with one MCE or whatever it uses?

Yes.

> 
>>>> With regards to sb_edac, it also needs another logic to be handled by
>>>> the core: channel_mask. This is required to handle lockstep mode and
>>>> mirror mode.
>>>
>>> What does "handle lockstep mode" mean and how is that important for me
>>> staring at the tracepoint output.
>>
>> That were already explained dozens of time on the patch review threads:
>>
>> The error check on lookstep mode is calculated over a 128 bits cacheline.
>> The memory controller sometimes is not able to distinguish if the error 
>> happened on the first channel or on the second channel.
> 
> Ah, channel interleaving.

It is due to channel interleaving, but some chipsets provide more than one
ECC algorithm: one algo uses 72 bits for ECC while others use 144 bits for ECC.

On those, when channel interleaving is enabled and lockstep is disabled, it uses
two 72 bits ECC, one for each DIMM, with points to a single DIMM on errors,
at the expense of correcting less errors.

>> So, either one (or the two) envolved DIMMs should be responsible for it.
> 
> Ok, so in that case having channel mask in there doesn't begin to
> explain the user what that means - only you and Intel people know that.

With the current way, yes.

> In that case, you probably want to make it much more explicit:
> 
> "kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on unknown memory... (... channel: ...)"
> 
> like in the case with interleaved csrow controllers or something to that
> effect saying that you cannot pinpoint the DIMM but it could be either
> of the n DIMMs on channels ....
> 
> ?

What I intend to do is to display all affected DIMMs, like:

Corrected error: memory read on "DIMM1 or DIMM2" ... 

but filtering the "channel_mask" requires more core changes, so I won't start 
working on it before merging the current patch series.

>>>>> Here's how it should look:
>>>>>
>>>>> kworker/u:6-201   [007] .N..   161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
>>>>
>>>> As already explained: the error_msg is not consistent among the drivers.
>>>>
>>>> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
>>>> work on other drivers.
>>>>
>>>> Changing this field on each driver requires deep knowledge of each memory
>>>> controller, and not all datasheets are publicly available. 
>>>>
>>>> For example, this is the code for Freescale MPC85xx EDAC driver:
>>>>
>>>> 	if (err_detect & DDR_EDE_SBE)
>>>> 		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>>>> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
>>>> 				     row_index, 0, -1,
>>>> 				     mci->ctl_name, "", NULL);
>>>>
>>>> 	if (err_detect & DDR_EDE_MBE)
>>>> 		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
>>>> 				     pfn, err_addr & ~PAGE_MASK, syndrome,
>>>> 				     row_index, 0, -1,
>>>> 				     mci->ctl_name, "", NULL);
>>>>
>>>> It uses a blank value for the error message. putting the error_msg at the beginning,
>>>> as you proposed would print: 
>>>>
>>>> [Hardware Error]:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )
>>>
>>> that's fine since we know the tracepoint purpose:
>>>
>>>  "mc_error:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"
>>>
>>> is pretty clear to me, IMHO.
>>
>> yes, but: 
>> 	mc_event:  on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)
>>
>> is not clear if this is either an error or something else.
> 
> Ok, I'd say we call it "mc_error" and i5100's two messages about spare
> copying simply lose. We're reporting predominantly errors here anyway.

It is not only i5100 that have non-error events like that. It is ok to call the event function
as *_handle_error, as this is something internal to the subsystem, but, when outputting to
userspace, we should avoid calling "error" something that it isn't an error.

>>>>> * count is gone
>>>>
>>>> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
>>>>
>>>> The right thing here seems to move it to a core property and increment the error counters
>>>> according with the error count.
>>>
>>> Incrementing the error counters shouldn't have anything to do with the
>>> error reporting.
>>
>> Why?
> 
> Because incrementing error counters and reporting errors should be two
> different things.

It is not different things: it is just different ways of exporting the same
information. Some tools only use the error counters currently to output memory errors
(see edac-util).

>>>>> * MC-drivers shouldn't say "error" when reporting an error
>>>>> * UE/CE moves into the brackets
>>>>
>>>> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
>>>
>>> Why?
>>
>> Also already discussed dozens of time: this is the error severity. The way users handle
>> UE errors are different than the way they handle CE, e. g. CE errors are "tolerable". 
>> UE errors might not be.
> 
> And why does that warrant having the error type in the beginning of the
> message? I can still read it if it is in the brackets a couple of chars
> later on.

What were agreed (at rebase version 3?) is that user-relevant info is outside parenthesis, 
while developers/manufactures relevant info is inside parenthesis.

>>>>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
>>>>
>>>> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
>>>> were running the code when an error was generated, not the CPU that were managing the memory.
>>>
>>> Then drop it if it's not relevant.
>>
>> Because irrelevant != "not very relevant".
>>
>> I might eventually drop it on some latter cleanups on sb_edac driver.
> 
> You still don't give me a valid, technical reason to keep it.
> 
> And yes, it _IS_ black or white: the socket either tells you a valuable
> piece of information which you need for handling the error after
> reporting it, or not, is useless, and no one needs to have it in the
> error log.
> 
> It is that simple.

As I said, "rank" on sb_edac is relevant for me, as the driver's author,
when handling issues related to this driver.

Please, don't pretend to know better what's relevant at sb_edac than 
the driver's author.

>>>>> * drop "err_code" what is that?
>>>>
>>>> In this case:
>>>> 	u32 errcode = GET_BITFIELD(m->status, 0, 15);
>>>
>>> Either decode it like I do in amd_decode_err_code() or remove it
>>> completely - no naked bit values.
>>
>> It is decoded, but providing it might help with debugging.
> 
> I only see naked bit values, where is it decoded? It should be properly
> decoded in the final error message that the user gets to see.

It is decoded at the sb_driver. See the drivers's code for further details.

>>>>> * drop second "socket"
>>>>> * drop "area" Area "DRAM" - are there other?
>>>>
>>>> Yes. there are 4 types of area at sb_edac.
>>>
>>> And they are?
>>
>> See the driver:
>>
>> static char *get_dram_attr(u32 reg)
>> {
>> 	switch(DRAM_ATTR(reg)) {
>> 		case 0:
>> 			return "DRAM";
>> 		case 1:
>> 			return "MMCFG";
>> 		case 2:
>> 			return "NXM";
>> 		default:
>> 			return "unknown";
>> 	}
> 
> You mean 3 - "unknown" is not a memory region.
> 

Yes. So, there are actually 3 types of area.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-14 14:27                     ` Mauro Carvalho Chehab
@ 2012-05-15 15:09                       ` Borislav Petkov
  2012-05-15 16:05                         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-15 15:09 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Mon, May 14, 2012 at 11:27:09AM -0300, Mauro Carvalho Chehab wrote:
> Hmm... I've already released 2 versions after this one, addressing all the pointed
> issues pointed by you and Tony at:
> 
> 	http://permalink.gmane.org/gmane.linux.kernel/1296116
> 
> It would be good if you could comment against the latest patch version.

Ok, here it is:

---
> commit 771823672b7c244b9a57523c955ead9fd87f2412
> Author: Mauro Carvalho Chehab <mchehab@redhat.com>
> Date:   Thu Feb 23 08:10:34 2012 -0300
> 
>     RAS: Add a tracepoint for reporting memory controller events
>     
>     Add a new tracepoint-based hardware events report method for
>     reporting Memory Controller events.
>     
>     Part of the description bellow is shamelessly copied from Tony
>     Luck's notes about the Hardware Error BoF during LPC 2010 [1].
>     Tony, thanks for your notes and discussions to generate the
>     h/w error reporting requirements.
>     
>     [1] http://lwn.net/Articles/416669/
>     
>         We have several subsystems & methods for reporting hardware errors:
>     
>         1) EDAC ("Error Detection and Correction").  In its original form
>         this consisted of a platform specific driver that read topology
>         information and error counts from chipset registers and reported
>         the results via a sysfs interface.
>     
>         2) mcelog - x86 specific decoding of machine check bank registers
>         reporting in binary form via /dev/mcelog. Recent additions make use
>         of the APEI extensions that were documented in version 4.0a of the
>         ACPI specification to acquire more information about errors without
>         having to rely reading chipset registers directly. A user level
>         programs decodes into somewhat human readable format.
>     
>         3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>         decodes errors reported via machine check bank registers in AMD
>         processors to the console log using printk();
>     
>         Each of these mechanisms has a band of followers ... and none
>         of them appear to meet all the needs of all users.
>     
>     As part of a RAS subsystem, let's encapsulate the memory error hardware
>     events into a trace facility.
>     
>     The tracepoint printk will be displayed like:
>     
>     mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])
>     
>     Where:
>     	[error msg] is the driver-specific error message
>     		    (e. g. "memory read", "bus error", ...);
>     	[location] is the location in terms of memory controller and
>     		   branch/channel/slot, channel/slot or csrow/channel;
>     	[label] is the memory stick label;
>     	[edac_mc detail] describes the address location of the error
>     			 and the syndrome;
>     	[driver detail] is driver-specifig error message details,
>     			when needed/provided (e. g. "area:DMA", ...)

Ah ok, so count, area and rank are driver-specific stuff and they're not part of
the mandatory string output, I guess that's fine.

Here's what an error looks like on my system here:

       mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )

There's still this trailing " " at the end of the error line which
shouldn't be there and also two spaces between "channel" and "page".

Also, according to the output above "amd64_edac" is supposed to be
[error msg] which is strange.

I believe this comes from this call in f1x_map_sysaddr_to_csrow():

	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
                             page, offset, syndrome,
                             csrow, chan, -1,
                             EDAC_MOD_STR, "", NULL);

I guess you want to do the following instead:

       mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)

maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
corrected/uncorrected error?

Also, make sure the calls in the other drivers don't generate such
strange output.

>     For example:
>     
>     mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
				you need a space after "error:"?

>     Of course, any userspace tools meant to handle errors should not parse
>     the above data. They should, instead, use the binary fields provided by
>     the tracepoint, mapping them directly into their MIBs.

What is a MIB?

>     NOTE: The original patch was providing an additional mechanism for
>     MCA-based trace events that also contained MCA error register data.
>     Hoever, as no agreement was reached so far for the MCA-based trace
>     events, for now, let's add events only for memory errors.
>     A latter patch is planned to change the tracepoint, for those types
>     of event.
>     
>     Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
>     Cc: Doug Thompson <norsk5@yahoo.com>
>     Cc: Steven Rostedt <rostedt@goodmis.org>
>     Cc: Frederic Weisbecker <fweisbec@gmail.com>
>     Cc: Ingo Molnar <mingo@redhat.com>
>     Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> 
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index f06ce9ab692c..eee73605c5a0 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog);
> +			  const void *arch_log);
>  
>  /*
>   * edac_device APIs
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index e5b55632359f..eb078ec62044 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -33,6 +33,10 @@
>  #include "edac_core.h"
>  #include "edac_module.h"
>  
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH ../../include/ras
> +#include <ras/ras_event.h>
> +
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
>  static LIST_HEAD(mc_devices);
> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  	 * which will perform kobj unregistration and the actual free
>  	 * will occur during the kobject callback operation
>  	 */
> +
>  	return mci;
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
> @@ -972,6 +977,27 @@ static void edac_ue_error(struct mem_ctl_info *mci,
>  }
>  
>  #define OTHER_LABEL " or "
> +
> +/**
> + * edac_mc_handle_error - reports a memory event to userspace
> + *
> + * @type:		severity of the error (CE/UE/Fatal)
> + * @mci:		a struct mem_ctl_info pointer
> + * @page_frame_number:	mem page where the error occurred
> + * @offset_in_page:	offset of the error inside the page
> + * @syndrome:		ECC syndrome
> + * @layer0:		Memory layer0 position
> + * @layer1:		Memory layer2 position
> + * @layer2:		Memory layer3 position
> + * @msg:		Message meaningful to the end users that
> + *			explains the event
> + * @other_detail:	Technical details about the event that
> + *			may help hardware manufacturers and
> + *			EDAC developers to analyse the event

					analyze it.

> + * @arch_log:		Architecture-specific struct that can
> + *			be used to add extended information to the
> + *			tracepoint, like dumping MCE registers.
> + */
>  void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  struct mem_ctl_info *mci,
>  			  const unsigned long page_frame_number,
> @@ -982,7 +1008,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  			  const int layer2,
>  			  const char *msg,
>  			  const char *other_detail,
> -			  const void *mcelog)
> +			  const void *arch_log)
>  {
>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>  	char detail[80], location[80];
> @@ -1119,21 +1145,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>  	}
>  
>  	/* Memory type dependent details about the error */
> -	if (type == HW_EVENT_ERR_CORRECTED) {
> +	if (type == HW_EVENT_ERR_CORRECTED)
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>  			page_frame_number, offset_in_page,
>  			grain, syndrome);
> -		edac_ce_error(mci, pos, msg, location, label, detail,
> -			      other_detail, enable_per_layer_report,
> -			      page_frame_number, offset_in_page, grain);
> -	} else {
> +	else
>  		snprintf(detail, sizeof(detail),
>  			"page:0x%lx offset:0x%lx grain:%d",
>  			page_frame_number, offset_in_page, grain);
>  
> +	/* Report the error via the trace interface */
> +	trace_mc_event(type, mci->mc_idx, msg, label, location,
> +		       detail, other_detail);
> +
> +	/* Report the error via the edac_mc_printk() interface */
> +	if (type == HW_EVENT_ERR_CORRECTED)
> +		edac_ce_error(mci, pos, msg, location, label, detail,
> +			      other_detail, enable_per_layer_report,
> +			      page_frame_number, offset_in_page, grain);
> +	else
>  		edac_ue_error(mci, pos, msg, location, label, detail,
>  			      other_detail, enable_per_layer_report);
> -	}
>  }
>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> new file mode 100644
> index 000000000000..66f6a43151dc
> --- /dev/null
> +++ b/include/ras/ras_event.h
> @@ -0,0 +1,78 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras
> +#define TRACE_INCLUDE_FILE ras_event
> +
> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_MC_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +#include <linux/ktime.h>
> +
> +/*
> + * Hardware Events Report
> + *
> + * Those events are generated when hardware detected a corrected or
> + * uncorrected event, and are meant to replace the current API to report
> + * errors defined on both EDAC and MCE subsystems.
> + *
> + * FIXME: Add events for handling memory errors originated from the
> + *        MCE subsystem.
> + */
> +
> +/*
> + * Hardware-independent Memory Controller specific events
> + */
> +
> +/*
> + * Default error mechanisms for Memory Controller errors (CE and UE)
> + */
> +TRACE_EVENT(mc_event,
> +
> +	TP_PROTO(const unsigned int err_type,
> +		 const unsigned int mc_index,
> +		 const char *error_msg,
> +		 const char *label,
> +		 const char *location,
> +		 const char *core_detail,
> +		 const char *driver_detail),
> +
> +	TP_ARGS(err_type, mc_index, error_msg, label, location,
> +		core_detail, driver_detail),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	err_type		)
> +		__field(	unsigned int,	mc_index		)
> +		__string(	msg,		error_msg		)
> +		__string(	label,		label			)
> +		__string(	detail,		core_detail		)
> +		__string(	location,	location		)
> +		__string(	driver_detail,	driver_detail		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->err_type		= err_type;
> +		__entry->mc_index		= mc_index;
> +		__assign_str(msg, error_msg);
> +		__assign_str(label, label);
> +		__assign_str(location, location);
> +		__assign_str(detail, core_detail);
> +		__assign_str(driver_detail, driver_detail);
> +	),
> +
> +	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		  __get_str(msg),
> +		  __get_str(label),
> +		  __entry->mc_index,
> +		  __get_str(location),
> +		  __get_str(detail),
> +		  __get_str(driver_detail))
> +);
> +
> +#endif /* _TRACE_HW_EVENT_MC_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-15 15:09                       ` Borislav Petkov
@ 2012-05-15 16:05                         ` Mauro Carvalho Chehab
  2012-05-15 16:38                           ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-15 16:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 15-05-2012 12:09, Borislav Petkov escreveu:
> On Mon, May 14, 2012 at 11:27:09AM -0300, Mauro Carvalho Chehab wrote:
>> Hmm... I've already released 2 versions after this one, addressing all the pointed
>> issues pointed by you and Tony at:
>>
>> 	http://permalink.gmane.org/gmane.linux.kernel/1296116
>>
>> It would be good if you could comment against the latest patch version.
> 
> Ok, here it is:
> 
> ---
>> commit 771823672b7c244b9a57523c955ead9fd87f2412
>> Author: Mauro Carvalho Chehab <mchehab@redhat.com>
>> Date:   Thu Feb 23 08:10:34 2012 -0300
>>
>>     RAS: Add a tracepoint for reporting memory controller events
>>     
>>     Add a new tracepoint-based hardware events report method for
>>     reporting Memory Controller events.
>>     
>>     Part of the description bellow is shamelessly copied from Tony
>>     Luck's notes about the Hardware Error BoF during LPC 2010 [1].
>>     Tony, thanks for your notes and discussions to generate the
>>     h/w error reporting requirements.
>>     
>>     [1] http://lwn.net/Articles/416669/
>>     
>>         We have several subsystems & methods for reporting hardware errors:
>>     
>>         1) EDAC ("Error Detection and Correction").  In its original form
>>         this consisted of a platform specific driver that read topology
>>         information and error counts from chipset registers and reported
>>         the results via a sysfs interface.
>>     
>>         2) mcelog - x86 specific decoding of machine check bank registers
>>         reporting in binary form via /dev/mcelog. Recent additions make use
>>         of the APEI extensions that were documented in version 4.0a of the
>>         ACPI specification to acquire more information about errors without
>>         having to rely reading chipset registers directly. A user level
>>         programs decodes into somewhat human readable format.
>>     
>>         3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
>>         decodes errors reported via machine check bank registers in AMD
>>         processors to the console log using printk();
>>     
>>         Each of these mechanisms has a band of followers ... and none
>>         of them appear to meet all the needs of all users.
>>     
>>     As part of a RAS subsystem, let's encapsulate the memory error hardware
>>     events into a trace facility.
>>     
>>     The tracepoint printk will be displayed like:
>>     
>>     mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])
>>     
>>     Where:
>>     	[error msg] is the driver-specific error message
>>     		    (e. g. "memory read", "bus error", ...);
>>     	[location] is the location in terms of memory controller and
>>     		   branch/channel/slot, channel/slot or csrow/channel;
>>     	[label] is the memory stick label;
>>     	[edac_mc detail] describes the address location of the error
>>     			 and the syndrome;
>>     	[driver detail] is driver-specifig error message details,
>>     			when needed/provided (e. g. "area:DMA", ...)
> 
> Ah ok, so count, area and rank are driver-specific stuff and they're not part of
> the mandatory string output, I guess that's fine.

Yep.

> 
> Here's what an error looks like on my system here:
> 
>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> 
> There's still this trailing " " at the end of the error line which
> shouldn't be there and also two spaces between "channel" and "page".

If you take a look at the trace printk:

+ TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+                 "Fatal" : "Uncorrected"),
+           __get_str(msg),
+           __get_str(label),
+           __entry->mc_index,
+           __get_str(location),
+           __get_str(detail),
+           __get_str(driver_detail))

There are not extra spaces there. The first extra space is probably because
there is an extra space at the label string. This should be easy to fix.

The other extra space at the end is because amd64 currently doesn't provide
driver_detail information.

> Also, according to the output above "amd64_edac" is supposed to be
> [error msg] which is strange.
> 
> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
> 
> 	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>                              page, offset, syndrome,
>                              csrow, chan, -1,
>                              EDAC_MOD_STR, "", NULL);
> 
> I guess you want to do the following instead:
> 
>        mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
> 
> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
> corrected/uncorrected error?

The issue here is because amd64_edac (just like a few other drivers) use
its driver name (EDAC_MOD_STR) as the error message, instead of using 
something meaningful, like "read error" or "ECC error".
 
> Also, make sure the calls in the other drivers don't generate such
> strange output.

The same kind of strange output is also at the printk's, and it is there
even with the current calls: the output syntax is broken on several drivers,
and fixing for some will break for the others.

So, this needs to be reviewed driver-per-driver. I'll handle that with the
machines I have for test. For the other drivers, maybe we can just fill 
the error_msg information with "error".

Anyway, I intend to work on that after merging this huge patch series.
As Tony said, a change on that is not trivial.

>>     For example:
>>     
>>     mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> 				you need a space after "error:"?
> 
>>     Of course, any userspace tools meant to handle errors should not parse
>>     the above data. They should, instead, use the binary fields provided by
>>     the tracepoint, mapping them directly into their MIBs.
> 
> What is a MIB?

Management Information Base. This is how anyone that works with Element
Management calls the model of information that represents each management
property. It is generally written using ITU-T ASN.1 syntax. Almost all
management software use that.

[1] http://en.wikipedia.org/wiki/Management_information_base

> 
>>     NOTE: The original patch was providing an additional mechanism for
>>     MCA-based trace events that also contained MCA error register data.
>>     Hoever, as no agreement was reached so far for the MCA-based trace
>>     events, for now, let's add events only for memory errors.
>>     A latter patch is planned to change the tracepoint, for those types
>>     of event.
>>     
>>     Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
>>     Cc: Doug Thompson <norsk5@yahoo.com>
>>     Cc: Steven Rostedt <rostedt@goodmis.org>
>>     Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>     Cc: Ingo Molnar <mingo@redhat.com>
>>     Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
>>
>> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
>> index f06ce9ab692c..eee73605c5a0 100644
>> --- a/drivers/edac/edac_core.h
>> +++ b/drivers/edac/edac_core.h
>> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  			  const int layer2,
>>  			  const char *msg,
>>  			  const char *other_detail,
>> -			  const void *mcelog);
>> +			  const void *arch_log);
>>  
>>  /*
>>   * edac_device APIs
>> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
>> index e5b55632359f..eb078ec62044 100644
>> --- a/drivers/edac/edac_mc.c
>> +++ b/drivers/edac/edac_mc.c
>> @@ -33,6 +33,10 @@
>>  #include "edac_core.h"
>>  #include "edac_module.h"
>>  
>> +#define CREATE_TRACE_POINTS
>> +#define TRACE_INCLUDE_PATH ../../include/ras
>> +#include <ras/ras_event.h>
>> +
>>  /* lock to memory controller's control array */
>>  static DEFINE_MUTEX(mem_ctls_mutex);
>>  static LIST_HEAD(mc_devices);
>> @@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>>  	 * which will perform kobj unregistration and the actual free
>>  	 * will occur during the kobject callback operation
>>  	 */
>> +
>>  	return mci;
>>  }
>>  EXPORT_SYMBOL_GPL(edac_mc_alloc);
>> @@ -972,6 +977,27 @@ static void edac_ue_error(struct mem_ctl_info *mci,
>>  }
>>  
>>  #define OTHER_LABEL " or "
>> +
>> +/**
>> + * edac_mc_handle_error - reports a memory event to userspace
>> + *
>> + * @type:		severity of the error (CE/UE/Fatal)
>> + * @mci:		a struct mem_ctl_info pointer
>> + * @page_frame_number:	mem page where the error occurred
>> + * @offset_in_page:	offset of the error inside the page
>> + * @syndrome:		ECC syndrome
>> + * @layer0:		Memory layer0 position
>> + * @layer1:		Memory layer2 position
>> + * @layer2:		Memory layer3 position
>> + * @msg:		Message meaningful to the end users that
>> + *			explains the event
>> + * @other_detail:	Technical details about the event that
>> + *			may help hardware manufacturers and
>> + *			EDAC developers to analyse the event
> 
> 					analyze it.

Analyse is the same as analyze [2].

[2] http://dictionary.reference.com/browse/analyse

> 
>> + * @arch_log:		Architecture-specific struct that can
>> + *			be used to add extended information to the
>> + *			tracepoint, like dumping MCE registers.
>> + */
>>  void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  			  struct mem_ctl_info *mci,
>>  			  const unsigned long page_frame_number,
>> @@ -982,7 +1008,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  			  const int layer2,
>>  			  const char *msg,
>>  			  const char *other_detail,
>> -			  const void *mcelog)
>> +			  const void *arch_log)
>>  {
>>  	/* FIXME: too much for stack: move it to some pre-alocated area */
>>  	char detail[80], location[80];
>> @@ -1119,21 +1145,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
>>  	}
>>  
>>  	/* Memory type dependent details about the error */
>> -	if (type == HW_EVENT_ERR_CORRECTED) {
>> +	if (type == HW_EVENT_ERR_CORRECTED)
>>  		snprintf(detail, sizeof(detail),
>>  			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
>>  			page_frame_number, offset_in_page,
>>  			grain, syndrome);
>> -		edac_ce_error(mci, pos, msg, location, label, detail,
>> -			      other_detail, enable_per_layer_report,
>> -			      page_frame_number, offset_in_page, grain);
>> -	} else {
>> +	else
>>  		snprintf(detail, sizeof(detail),
>>  			"page:0x%lx offset:0x%lx grain:%d",
>>  			page_frame_number, offset_in_page, grain);
>>  
>> +	/* Report the error via the trace interface */
>> +	trace_mc_event(type, mci->mc_idx, msg, label, location,
>> +		       detail, other_detail);
>> +
>> +	/* Report the error via the edac_mc_printk() interface */
>> +	if (type == HW_EVENT_ERR_CORRECTED)
>> +		edac_ce_error(mci, pos, msg, location, label, detail,
>> +			      other_detail, enable_per_layer_report,
>> +			      page_frame_number, offset_in_page, grain);
>> +	else
>>  		edac_ue_error(mci, pos, msg, location, label, detail,
>>  			      other_detail, enable_per_layer_report);
>> -	}
>>  }
>>  EXPORT_SYMBOL_GPL(edac_mc_handle_error);
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> new file mode 100644
>> index 000000000000..66f6a43151dc
>> --- /dev/null
>> +++ b/include/ras/ras_event.h
>> @@ -0,0 +1,78 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM ras
>> +#define TRACE_INCLUDE_FILE ras_event
>> +
>> +#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HW_EVENT_MC_H
>> +
>> +#include <linux/tracepoint.h>
>> +#include <linux/edac.h>
>> +#include <linux/ktime.h>
>> +
>> +/*
>> + * Hardware Events Report
>> + *
>> + * Those events are generated when hardware detected a corrected or
>> + * uncorrected event, and are meant to replace the current API to report
>> + * errors defined on both EDAC and MCE subsystems.
>> + *
>> + * FIXME: Add events for handling memory errors originated from the
>> + *        MCE subsystem.
>> + */
>> +
>> +/*
>> + * Hardware-independent Memory Controller specific events
>> + */
>> +
>> +/*
>> + * Default error mechanisms for Memory Controller errors (CE and UE)
>> + */
>> +TRACE_EVENT(mc_event,
>> +
>> +	TP_PROTO(const unsigned int err_type,
>> +		 const unsigned int mc_index,
>> +		 const char *error_msg,
>> +		 const char *label,
>> +		 const char *location,
>> +		 const char *core_detail,
>> +		 const char *driver_detail),
>> +
>> +	TP_ARGS(err_type, mc_index, error_msg, label, location,
>> +		core_detail, driver_detail),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned int,	err_type		)
>> +		__field(	unsigned int,	mc_index		)
>> +		__string(	msg,		error_msg		)
>> +		__string(	label,		label			)
>> +		__string(	detail,		core_detail		)
>> +		__string(	location,	location		)
>> +		__string(	driver_detail,	driver_detail		)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->err_type		= err_type;
>> +		__entry->mc_index		= mc_index;
>> +		__assign_str(msg, error_msg);
>> +		__assign_str(label, label);
>> +		__assign_str(location, location);
>> +		__assign_str(detail, core_detail);
>> +		__assign_str(driver_detail, driver_detail);
>> +	),
>> +
>> +	TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
>> +		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>> +			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>> +			"Fatal" : "Uncorrected"),
>> +		  __get_str(msg),
>> +		  __get_str(label),
>> +		  __entry->mc_index,
>> +		  __get_str(location),
>> +		  __get_str(detail),
>> +		  __get_str(driver_detail))
>> +);
>> +
>> +#endif /* _TRACE_HW_EVENT_MC_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-15 16:05                         ` Mauro Carvalho Chehab
@ 2012-05-15 16:38                           ` Borislav Petkov
  2012-05-16 11:22                             ` Mauro Carvalho Chehab
  2012-05-16 12:48                             ` Steven Rostedt
  0 siblings, 2 replies; 43+ messages in thread
From: Borislav Petkov @ 2012-05-15 16:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
> > Here's what an error looks like on my system here:
> > 
> >        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> > 
> > There's still this trailing " " at the end of the error line which
> > shouldn't be there and also two spaces between "channel" and "page".
> 
> If you take a look at the trace printk:
> 
> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> +                 "Fatal" : "Uncorrected"),
> +           __get_str(msg),
> +           __get_str(label),
> +           __entry->mc_index,
> +           __get_str(location),
> +           __get_str(detail),
> +           __get_str(driver_detail))
> 
> There are not extra spaces there. The first extra space is probably because
> there is an extra space at the label string. This should be easy to fix.
> 
> The other extra space at the end is because amd64 currently doesn't provide
> driver_detail information.

Remind me again why do we need two strings: detail and driver_detail?

Because they could very well be lumped together with a single "%s"
format - "(mc:%d %s)" - and be printed.

And detail will always contain something which is not the empty string,
so problem solved.

> > Also, according to the output above "amd64_edac" is supposed to be
> > [error msg] which is strange.
> > 
> > I believe this comes from this call in f1x_map_sysaddr_to_csrow():
> > 
> > 	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> >                              page, offset, syndrome,
> >                              csrow, chan, -1,
> >                              EDAC_MOD_STR, "", NULL);
> > 
> > I guess you want to do the following instead:
> > 
> >        mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
> > 
> > maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
> > corrected/uncorrected error?
> 
> The issue here is because amd64_edac (just like a few other drivers) use
> its driver name (EDAC_MOD_STR) as the error message, instead of using 
> something meaningful, like "read error" or "ECC error".

No, the issue is here that edac_mc_handle_ce() used to say "CE..."
and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
anymore. In other words, you need to add the "CE/UE" thing to the string
based on the HW_EVENT_ERR_* flag or something to that effect.

[ … ]

> >>     Of course, any userspace tools meant to handle errors should not parse
> >>     the above data. They should, instead, use the binary fields provided by
> >>     the tracepoint, mapping them directly into their MIBs.
> > 
> > What is a MIB?
> 
> Management Information Base. This is how anyone that works with Element
> Management calls the model of information that represents each management
> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> management software use that.
> 
> [1] http://en.wikipedia.org/wiki/Management_information_base

That looks like an ACPI or some other idiotic spec speak, pls remove it.

[ … ]

> >> + * edac_mc_handle_error - reports a memory event to userspace
> >> + *
> >> + * @type:		severity of the error (CE/UE/Fatal)
> >> + * @mci:		a struct mem_ctl_info pointer
> >> + * @page_frame_number:	mem page where the error occurred
> >> + * @offset_in_page:	offset of the error inside the page
> >> + * @syndrome:		ECC syndrome
> >> + * @layer0:		Memory layer0 position
> >> + * @layer1:		Memory layer2 position
> >> + * @layer2:		Memory layer3 position
> >> + * @msg:		Message meaningful to the end users that
> >> + *			explains the event
> >> + * @other_detail:	Technical details about the event that
> >> + *			may help hardware manufacturers and
> >> + *			EDAC developers to analyse the event
> > 
> > 					analyze it.
> 
> Analyse is the same as analyze [2].

I know that. What I meant is

s/EDAC developers to analyse the event/EDAC developers to analyse it/

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-15 16:38                           ` Borislav Petkov
@ 2012-05-16 11:22                             ` Mauro Carvalho Chehab
  2012-05-16 13:16                               ` Borislav Petkov
  2012-05-16 12:48                             ` Steven Rostedt
  1 sibling, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-16 11:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 15-05-2012 13:38, Borislav Petkov escreveu:
> On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
>>> Here's what an error looks like on my system here:
>>>
>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>>
>>> There's still this trailing " " at the end of the error line which
>>> shouldn't be there and also two spaces between "channel" and "page".
>>
>> If you take a look at the trace printk:
>>
>> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
>> +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>> +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>> +                 "Fatal" : "Uncorrected"),
>> +           __get_str(msg),
>> +           __get_str(label),
>> +           __entry->mc_index,
>> +           __get_str(location),
>> +           __get_str(detail),
>> +           __get_str(driver_detail))
>>
>> There are not extra spaces there. The first extra space is probably because
>> there is an extra space at the label string. This should be easy to fix.
>>
>> The other extra space at the end is because amd64 currently doesn't provide
>> driver_detail information.
> 
> Remind me again why do we need two strings: detail and driver_detail?
> 
> Because they could very well be lumped together with a single "%s"
> format - "(mc:%d %s)" - and be printed.

As already explained, after merging two different fields, they can't be easily
unmerged.

The main reason for moving from printk to tracepoint is to allow userspace
programs to get data in binary format, avoiding them to parse a printk message.

So, it is more important to provide the right information at the trace fields
than the niceness of the printk data.

> 
> And detail will always contain something which is not the empty string,
> so problem solved.
> 
>>> Also, according to the output above "amd64_edac" is supposed to be
>>> [error msg] which is strange.
>>>
>>> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
>>>
>>> 	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>>>                              page, offset, syndrome,
>>>                              csrow, chan, -1,
>>>                              EDAC_MOD_STR, "", NULL);
>>>
>>> I guess you want to do the following instead:
>>>
>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
>>>
>>> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
>>> corrected/uncorrected error?
>>
>> The issue here is because amd64_edac (just like a few other drivers) use
>> its driver name (EDAC_MOD_STR) as the error message, instead of using 
>> something meaningful, like "read error" or "ECC error".
> 
> No, the issue is here that edac_mc_handle_ce() used to say "CE..."
> and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
> anymore. In other words, you need to add the "CE/UE" thing to the string
> based on the HW_EVENT_ERR_* flag or something to that effect.


Huh? 

>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )

CE/UE is there. The only change is that, instead of using acronyms, it is now
saying "Corrected error"/"Uncorrected error", as the idea is to provide something
that the user will better understand.

> [ … ]
> 
>>>>     Of course, any userspace tools meant to handle errors should not parse
>>>>     the above data. They should, instead, use the binary fields provided by
>>>>     the tracepoint, mapping them directly into their MIBs.
>>>
>>> What is a MIB?
>>
>> Management Information Base. This is how anyone that works with Element
>> Management calls the model of information that represents each management
>> property. It is generally written using ITU-T ASN.1 syntax. Almost all
>> management software use that.
>>
>> [1] http://en.wikipedia.org/wiki/Management_information_base
> 
> That looks like an ACPI or some other idiotic spec speak, pls remove it.

No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP, 
and defined by ITU-T (M.3000 series) and by ISO.

It is used by everybody that knows a little bit about error management,
as all management systems and protocols (like SNMP) are based on MIB
concepts. This concept is also used by other international forums, like
IETF, to define the Information Model to be used to manage the machine
and/or network resources. For example, the TCP/IP stack management base
is defined at IETF RFC1213.

> 
> [ … ]
> 
>>>> + * edac_mc_handle_error - reports a memory event to userspace
>>>> + *
>>>> + * @type:		severity of the error (CE/UE/Fatal)
>>>> + * @mci:		a struct mem_ctl_info pointer
>>>> + * @page_frame_number:	mem page where the error occurred
>>>> + * @offset_in_page:	offset of the error inside the page
>>>> + * @syndrome:		ECC syndrome
>>>> + * @layer0:		Memory layer0 position
>>>> + * @layer1:		Memory layer2 position
>>>> + * @layer2:		Memory layer3 position
>>>> + * @msg:		Message meaningful to the end users that
>>>> + *			explains the event
>>>> + * @other_detail:	Technical details about the event that
>>>> + *			may help hardware manufacturers and
>>>> + *			EDAC developers to analyse the event
>>>
>>> 					analyze it.
>>
>> Analyse is the same as analyze [2].
> 
> I know that. What I meant is
> 
> s/EDAC developers to analyse the event/EDAC developers to analyse it/
> 

Ah, OK, I'll fix that.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-15 16:38                           ` Borislav Petkov
  2012-05-16 11:22                             ` Mauro Carvalho Chehab
@ 2012-05-16 12:48                             ` Steven Rostedt
  2012-05-16 15:24                               ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 43+ messages in thread
From: Steven Rostedt @ 2012-05-16 12:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

On Tue, 2012-05-15 at 18:38 +0200, Borislav Petkov wrote:
> On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
> > > Here's what an error looks like on my system here:
> > > 
> > >        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> > > 
> > > There's still this trailing " " at the end of the error line which
> > > shouldn't be there and also two spaces between "channel" and "page".
> > 
> > If you take a look at the trace printk:
> > 
> > + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> > +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> > +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> > +
> > +           __get_str(msg),
> > +           __get_str(label),
> > +           __entry->mc_index,
> > +           __get_str(location),
> > +           __get_str(detail),
> > +           __get_str(driver_detail))
> > 
> > There are not extra spaces there. The first extra space is probably because
> > there is an extra space at the label string. This should be easy to fix.
> > 
> > The other extra space at the end is because amd64 currently doesn't provide
> > driver_detail information.
> 
> Remind me again why do we need two strings: detail and driver_detail?
> 
> Because they could very well be lumped together with a single "%s"
> format - "(mc:%d %s)" - and be printed.
> 
> And detail will always contain something which is not the empty string,
> so problem solved.

Here's another trick if you want to get rid of the space and keep both
fields:

TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
                 "Fatal" : "Uncorrected"),

           __get_str(msg),
           __get_str(label),
           __entry->mc_index,
           __get_str(location),
           __get_str(detail),
	   strlen(__get_str(detail)) &&
	    strlen(__get_str(driver_detail) ? " ": "",
           __get_str(driver_detail))

-- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 11:22                             ` Mauro Carvalho Chehab
@ 2012-05-16 13:16                               ` Borislav Petkov
  2012-05-16 13:27                                 ` Steven Rostedt
  2012-05-16 15:16                                 ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 43+ messages in thread
From: Borislav Petkov @ 2012-05-16 13:16 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

On Wed, May 16, 2012 at 08:22:07AM -0300, Mauro Carvalho Chehab wrote:
> >> If you take a look at the trace printk:
> >>
> >> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> >> +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> >> +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> >> +                 "Fatal" : "Uncorrected"),
> >> +           __get_str(msg),
> >> +           __get_str(label),
> >> +           __entry->mc_index,
> >> +           __get_str(location),
> >> +           __get_str(detail),
> >> +           __get_str(driver_detail))
> >>
> >> There are not extra spaces there. The first extra space is probably because
> >> there is an extra space at the label string. This should be easy to fix.
> >>
> >> The other extra space at the end is because amd64 currently doesn't provide
> >> driver_detail information.
> > 
> > Remind me again why do we need two strings: detail and driver_detail?
> > 
> > Because they could very well be lumped together with a single "%s"
> > format - "(mc:%d %s)" - and be printed.
> 
> As already explained, after merging two different fields, they can't be easily
> unmerged.
> 
> The main reason for moving from printk to tracepoint is to allow userspace
> programs to get data in binary format, avoiding them to parse a printk message.
> 
> So, it is more important to provide the right information at the trace fields
> than the niceness of the printk data.

This doesn't answer my question. My question was: "why can't 'detail'
and 'driver_detail' be a single parameter, i.e. 'detail' and this way
solve both pretty printing and getting binary data problems?

> > And detail will always contain something which is not the empty string,
> > so problem solved.
> > 
> >>> Also, according to the output above "amd64_edac" is supposed to be
> >>> [error msg] which is strange.
> >>>
> >>> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
> >>>
> >>> 	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> >>>                              page, offset, syndrome,
> >>>                              csrow, chan, -1,
> >>>                              EDAC_MOD_STR, "", NULL);
> >>>
> >>> I guess you want to do the following instead:
> >>>
> >>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
> >>>
> >>> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
> >>> corrected/uncorrected error?
> >>
> >> The issue here is because amd64_edac (just like a few other drivers) use
> >> its driver name (EDAC_MOD_STR) as the error message, instead of using 
> >> something meaningful, like "read error" or "ECC error".
> > 
> > No, the issue is here that edac_mc_handle_ce() used to say "CE..."
> > and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
> > anymore. In other words, you need to add the "CE/UE" thing to the string
> > based on the HW_EVENT_ERR_* flag or something to that effect.
> 
> 
> Huh? 
> 
> >>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> 
> CE/UE is there. The only change is that, instead of using acronyms, it is now
> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
> that the user will better understand.

Ok, so let's change the string output from the above version to:

 "mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"

I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].

> >>>>     Of course, any userspace tools meant to handle errors should not parse
> >>>>     the above data. They should, instead, use the binary fields provided by
> >>>>     the tracepoint, mapping them directly into their MIBs.
> >>>
> >>> What is a MIB?
> >>
> >> Management Information Base. This is how anyone that works with Element
> >> Management calls the model of information that represents each management
> >> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> >> management software use that.
> >>
> >> [1] http://en.wikipedia.org/wiki/Management_information_base
> > 
> > That looks like an ACPI or some other idiotic spec speak, pls remove it.
> 
> No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP, 
> and defined by ITU-T (M.3000 series) and by ISO.
> 
> It is used by everybody that knows a little bit about error management,
> as all management systems and protocols (like SNMP) are based on MIB
> concepts. This concept is also used by other international forums, like
> IETF, to define the Information Model to be used to manage the machine
> and/or network resources. For example, the TCP/IP stack management base
> is defined at IETF RFC1213.

More idiotic drivel.

Oh, well, you won't let go of this crap, so at least add the wikipedia
URL to the "MIB" reference so that mere mortals like me who don't know
jack of error management can understand what you're talking about.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 13:16                               ` Borislav Petkov
@ 2012-05-16 13:27                                 ` Steven Rostedt
  2012-05-16 13:32                                   ` Borislav Petkov
  2012-05-16 15:16                                 ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 43+ messages in thread
From: Steven Rostedt @ 2012-05-16 13:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

On Wed, 2012-05-16 at 15:16 +0200, Borislav Petkov wrote:

> > >>>>     Of course, any userspace tools meant to handle errors should not parse
> > >>>>     the above data. They should, instead, use the binary fields provided by
> > >>>>     the tracepoint, mapping them directly into their MIBs.
> > >>>
> > >>> What is a MIB?
> > >>
> > >> Management Information Base. This is how anyone that works with Element
> > >> Management calls the model of information that represents each management
> > >> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> > >> management software use that.
> > >>
> > >> [1] http://en.wikipedia.org/wiki/Management_information_base
> > > 
> > > That looks like an ACPI or some other idiotic spec speak, pls remove it.
> > 
> > No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP, 
> > and defined by ITU-T (M.3000 series) and by ISO.
> > 
> > It is used by everybody that knows a little bit about error management,
> > as all management systems and protocols (like SNMP) are based on MIB
> > concepts. This concept is also used by other international forums, like
> > IETF, to define the Information Model to be used to manage the machine
> > and/or network resources. For example, the TCP/IP stack management base
> > is defined at IETF RFC1213.
> 
> More idiotic drivel.
> 
> Oh, well, you won't let go of this crap, so at least add the wikipedia
> URL to the "MIB" reference so that mere mortals like me who don't know
> jack of error management can understand what you're talking about.

I should change goodmis to goodmib. That sounds better. For those that
don't know, goodmis comes from a project I started a long time ago. 

 Good Object Oriented Database Management Information System

Which was inspired by a database I wrote at a company I worked for in
the 90's. Called "Object Based Information v1" or OBI-1.

:-)

-- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 13:27                                 ` Steven Rostedt
@ 2012-05-16 13:32                                   ` Borislav Petkov
  2012-05-16 13:47                                     ` Steven Rostedt
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-16 13:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

On Wed, May 16, 2012 at 09:27:51AM -0400, Steven Rostedt wrote:
> I should change goodmis to goodmib. That sounds better. For those that
> don't know, goodmis comes from a project I started a long time ago.

I always thought "goodmis" meant the good thing you missed :-) And I
wanted to ask you about it the next time we're at a conf somewhere.

>  Good Object Oriented Database Management Information System

Which one was the bad OODBMIS?

> Which was inspired by a database I wrote at a company I worked for in
> the 90's. Called "Object Based Information v1" or OBI-1.

LOOL.

For the same reason, x86 hw guys came up with an instruction called LEA.

:-)

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 13:32                                   ` Borislav Petkov
@ 2012-05-16 13:47                                     ` Steven Rostedt
  0 siblings, 0 replies; 43+ messages in thread
From: Steven Rostedt @ 2012-05-16 13:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

On Wed, 2012-05-16 at 15:32 +0200, Borislav Petkov wrote:

> >  Good Object Oriented Database Management Information System
> 
> Which one was the bad OODBMIS?

I was originally going to be called the Gnu OODMIS, until I found out I
needed to sign over my wife, children, and left testicle to the FSF to
do so. Thus I just did s/Gnu/Good/.

-- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 13:16                               ` Borislav Petkov
  2012-05-16 13:27                                 ` Steven Rostedt
@ 2012-05-16 15:16                                 ` Mauro Carvalho Chehab
  2012-05-16 15:47                                   ` Borislav Petkov
  1 sibling, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-16 15:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 16-05-2012 10:16, Borislav Petkov escreveu:
> On Wed, May 16, 2012 at 08:22:07AM -0300, Mauro Carvalho Chehab wrote:
>>>> If you take a look at the trace printk:
>>>>
>>>> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
>>>> +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>>>> +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>>>> +                 "Fatal" : "Uncorrected"),
>>>> +           __get_str(msg),
>>>> +           __get_str(label),
>>>> +           __entry->mc_index,
>>>> +           __get_str(location),
>>>> +           __get_str(detail),
>>>> +           __get_str(driver_detail))
>>>>
>>>> There are not extra spaces there. The first extra space is probably because
>>>> there is an extra space at the label string. This should be easy to fix.
>>>>
>>>> The other extra space at the end is because amd64 currently doesn't provide
>>>> driver_detail information.
>>>
>>> Remind me again why do we need two strings: detail and driver_detail?
>>>
>>> Because they could very well be lumped together with a single "%s"
>>> format - "(mc:%d %s)" - and be printed.
>>
>> As already explained, after merging two different fields, they can't be easily
>> unmerged.
>>
>> The main reason for moving from printk to tracepoint is to allow userspace
>> programs to get data in binary format, avoiding them to parse a printk message.
>>
>> So, it is more important to provide the right information at the trace fields
>> than the niceness of the printk data.
> 
> This doesn't answer my question. My question was: "why can't 'detail'
> and 'driver_detail' be a single parameter, i.e. 'detail' and this way
> solve both pretty printing and getting binary data problems?

This is the 24th version of this very same patch... We started reviewing this patch
in January... All arguments about why this is a separate data were already said and
discussed.

In summary: all edac messages provide "detail" as this contains the error location
in terms of channel/slot. So, any MIB for EDAC could handle those parameters properly.
With regards to driver_detail, this have per-driver details. So, per-driver MIB is
required for them, if some userspace program wants to properly store that information.

Merging those two separate fields together only makes harder for userspace to store
the error detail information on their MIB.

> 
>>> And detail will always contain something which is not the empty string,
>>> so problem solved.
>>>
>>>>> Also, according to the output above "amd64_edac" is supposed to be
>>>>> [error msg] which is strange.
>>>>>
>>>>> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
>>>>>
>>>>> 	        edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>>>>>                              page, offset, syndrome,
>>>>>                              csrow, chan, -1,
>>>>>                              EDAC_MOD_STR, "", NULL);
>>>>>
>>>>> I guess you want to do the following instead:
>>>>>
>>>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
>>>>>
>>>>> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
>>>>> corrected/uncorrected error?
>>>>
>>>> The issue here is because amd64_edac (just like a few other drivers) use
>>>> its driver name (EDAC_MOD_STR) as the error message, instead of using 
>>>> something meaningful, like "read error" or "ECC error".
>>>
>>> No, the issue is here that edac_mc_handle_ce() used to say "CE..."
>>> and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
>>> anymore. In other words, you need to add the "CE/UE" thing to the string
>>> based on the HW_EVENT_ERR_* flag or something to that effect.
>>
>>
>> Huh? 
>>
>>>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>
>> CE/UE is there. The only change is that, instead of using acronyms, it is now
>> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
>> that the user will better understand.
> 
> Ok, so let's change the string output from the above version to:
> 
>  "mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"
> 
> I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].

As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...

If you think we need to add an extra field for the driver name, then let's add a
"driver_name" field there, but a driver's name shouldn't be merged with an
error type message, as it is abusing the syntax, and the syntax of each field
should be clear enough to allow it to be stored on a MIB.

In any case, those drivers that fill the error msg with something else should
be changes to write there something like "ECC error" instead of "amd64_edac:".

Btw, I'm almost convinced that we should break the "detail" into its components,
e. g., instead of one string, it should be:
	int layer0, layer1, layer2;		/* Location */
	unsigned long memory_address; 		/* or maybe page/offset */
	u32 grain;
	unsigned syndrome;

That would be better from MIB point of view.

> 
>>>>>>     Of course, any userspace tools meant to handle errors should not parse
>>>>>>     the above data. They should, instead, use the binary fields provided by
>>>>>>     the tracepoint, mapping them directly into their MIBs.
>>>>>
>>>>> What is a MIB?
>>>>
>>>> Management Information Base. This is how anyone that works with Element
>>>> Management calls the model of information that represents each management
>>>> property. It is generally written using ITU-T ASN.1 syntax. Almost all
>>>> management software use that.
>>>>
>>>> [1] http://en.wikipedia.org/wiki/Management_information_base
>>>
>>> That looks like an ACPI or some other idiotic spec speak, pls remove it.
>>
>> No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP, 
>> and defined by ITU-T (M.3000 series) and by ISO.
>>
>> It is used by everybody that knows a little bit about error management,
>> as all management systems and protocols (like SNMP) are based on MIB
>> concepts. This concept is also used by other international forums, like
>> IETF, to define the Information Model to be used to manage the machine
>> and/or network resources. For example, the TCP/IP stack management base
>> is defined at IETF RFC1213.
> 
> More idiotic drivel.
> 
> Oh, well, you won't let go of this crap, so at least add the wikipedia
> URL to the "MIB" reference so that mere mortals like me who don't know
> jack of error management can understand what you're talking about.

Ok, I'll add it.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 12:48                             ` Steven Rostedt
@ 2012-05-16 15:24                               ` Mauro Carvalho Chehab
  2012-05-16 17:05                                 ` Steven Rostedt
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-16 15:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

Em 16-05-2012 09:48, Steven Rostedt escreveu:
> On Tue, 2012-05-15 at 18:38 +0200, Borislav Petkov wrote:
>> On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
>>>> Here's what an error looks like on my system here:
>>>>
>>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>>>
>>>> There's still this trailing " " at the end of the error line which
>>>> shouldn't be there and also two spaces between "channel" and "page".
>>>
>>> If you take a look at the trace printk:
>>>
>>> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
>>> +           (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>>> +                 ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>>> +
>>> +           __get_str(msg),
>>> +           __get_str(label),
>>> +           __entry->mc_index,
>>> +           __get_str(location),
>>> +           __get_str(detail),
>>> +           __get_str(driver_detail))
>>>
>>> There are not extra spaces there. The first extra space is probably because
>>> there is an extra space at the label string. This should be easy to fix.
>>>
>>> The other extra space at the end is because amd64 currently doesn't provide
>>> driver_detail information.
>>
>> Remind me again why do we need two strings: detail and driver_detail?
>>
>> Because they could very well be lumped together with a single "%s"
>> format - "(mc:%d %s)" - and be printed.
>>
>> And detail will always contain something which is not the empty string,
>> so problem solved.
> 
> Here's another trick if you want to get rid of the space and keep both
> fields:
> 
> TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
>            (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>                  ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>                  "Fatal" : "Uncorrected"),
> 
>            __get_str(msg),
>            __get_str(label),
>            __entry->mc_index,
>            __get_str(location),
>            __get_str(detail),
> 	   strlen(__get_str(detail)) &&
> 	    strlen(__get_str(driver_detail) ? " ": "",
>            __get_str(driver_detail))

Great! I'll use that trick, thanks!
> 
> -- Steve
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 15:16                                 ` Mauro Carvalho Chehab
@ 2012-05-16 15:47                                   ` Borislav Petkov
  2012-05-16 16:52                                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-16 15:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
> > This doesn't answer my question. My question was: "why can't 'detail'
> > and 'driver_detail' be a single parameter, i.e. 'detail' and this way
> > solve both pretty printing and getting binary data problems?
> 
> This is the 24th version of this very same patch...

This would have been the case if you'd split your patches and sent them
in 10-15 patches sets like normal people, _after_ gathering all review feedback.

But, you wanted to fix EDAC and the whole world while at it and besides,
your patches caused boot errors here and there.

Then, you went and rebased the whole patchset after me reviewing one or
two and incremented for that rebase the version number.

So v24 means nothing to me - you might just as well use it for your
internal tracking.

> In summary: all edac messages provide "detail" as this contains the
> error location in terms of channel/slot. So, any MIB for EDAC could
> handle those parameters properly. With regards to driver_detail, this
> have per-driver details. So, per-driver MIB is required for them, if
> some userspace program wants to properly store that information.
>
> Merging those two separate fields together only makes harder for
> userspace to store the error detail information on their MIB.

There's that MIB crap again.

And it doesn't make it harder for anything because in userspace you can
do everything with those strings, cut them, replace them, whatever your
heart desires, even store the correct error detail information "on their
MIB." Basically, you have one string in userspace and you can massage
the hell out of it and even fit it to the MIB or whatever...

[ … ]

> >>>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> >>
> >> CE/UE is there. The only change is that, instead of using acronyms, it is now
> >> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
> >> that the user will better understand.
> > 
> > Ok, so let's change the string output from the above version to:
> > 
> >  "mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"
> > 
> > I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].
> 
> As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
> that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...
> 
> If you think we need to add an extra field for the driver name, then let's add a
> "driver_name" field there, but a driver's name shouldn't be merged with an
> error type message, as it is abusing the syntax, and the syntax of each field
> should be clear enough to allow it to be stored on a MIB.
> 
> In any case, those drivers that fill the error msg with something else should
> be changes to write there something like "ECC error" instead of "amd64_edac:".
> 
> Btw, I'm almost convinced that we should break the "detail" into its components,
> e. g., instead of one string, it should be:
> 	int layer0, layer1, layer2;		/* Location */
> 	unsigned long memory_address; 		/* or maybe page/offset */
> 	u32 grain;
> 	unsigned syndrome;
> 
> That would be better from MIB point of view.

Dude, enough with this MIB crap already!

Simply move the EDAC_MOD_STR thing earlier in the tracepoint so that you
can get:

"mc_event:" EDAC_MOD_STR [ERROR_TYPE] [DETAIL]

EDAC_MOD_STR is _always_ the driver name:

0 amd64_edac.h      148 #define EDAC_MOD_STR   "amd64_edac"
1 amd76x_edac.c      23 #define EDAC_MOD_STR "amd76x_edac"
2 amd8111_edac.c     37 #define AMD8111_EDAC_MOD_STR "amd8111_edac"
3 amd8131_edac.c     37 #define AMD8131_EDAC_MOD_STR "amd8131_edac"
4 cpc925_edac.c      34 #define CPC925_EDAC_MOD_STR "cpc925_edac"
5 e752x_edac.c       28 #define EDAC_MOD_STR "e752x_edac"
6 e7xxx_edac.c       33 #define EDAC_MOD_STR "e7xxx_edac"
7 i3000_edac.c       21 #define EDAC_MOD_STR  "i3000_edac"
8 i3200_edac.c       22 #define EDAC_MOD_STR        "i3200_edac"
9 i5000_edac.c       31 #define EDAC_MOD_STR      "i5000_edac"
a i5400_edac.c       38 #define EDAC_MOD_STR      "i5400_edac"
b i7300_edac.c       36 #define EDAC_MOD_STR      "i7300_edac"
c i7core_edac.c      65 #define EDAC_MOD_STR      "i7core_edac"
d i82443bxgx_edac.c  36 #define EDAC_MOD_STR    "i82443bxgx_edac"
e i82860_edac.c      20 #define EDAC_MOD_STR "i82860_edac"
f i82875p_edac.c     24 #define EDAC_MOD_STR  "i82875p_edac"
g i82975x_edac.c     20 #define EDAC_MOD_STR  "i82975x_edac"
h mpc85xx_edac.h     15 #define EDAC_MOD_STR "MPC85xx_edac"
i mv64x60_edac.h     16 #define EDAC_MOD_STR "MV64x60_edac"
j r82600_edac.c      26 #define EDAC_MOD_STR "r82600_edac"
k sb_edac.c          38 #define EDAC_MOD_STR      "sbridge_edac"
l x38_edac.c         21 #define EDAC_MOD_STR  "x38_edac"

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 15:47                                   ` Borislav Petkov
@ 2012-05-16 16:52                                     ` Mauro Carvalho Chehab
  2012-05-16 19:59                                       ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Mauro Carvalho Chehab @ 2012-05-16 16:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

Em 16-05-2012 12:47, Borislav Petkov escreveu:
> On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
>>> This doesn't answer my question. My question was: "why can't 'detail'
>>> and 'driver_detail' be a single parameter, i.e. 'detail' and this way
>>> solve both pretty printing and getting binary data problems?
>>
>> This is the 24th version of this very same patch...
> 
> This would have been the case if you'd split your patches and sent them
> in 10-15 patches sets like normal people, _after_ gathering all review feedback.

The first version had only 3 patches. Patches were being added there due to
the huge delay to get them reviewed/accepted.

> But, you wanted to fix EDAC and the whole world while at it and besides,
> your patches caused boot errors here and there.
> 
> Then, you went and rebased the whole patchset after me reviewing one or
> two and incremented for that rebase the version number.

Because you requested that by not accepting incremental patches at the end
of the series.

> So v24 means nothing to me - you might just as well use it for your
> internal tracking.
> 
>> In summary: all edac messages provide "detail" as this contains the
>> error location in terms of channel/slot. So, any MIB for EDAC could
>> handle those parameters properly. With regards to driver_detail, this
>> have per-driver details. So, per-driver MIB is required for them, if
>> some userspace program wants to properly store that information.
>>
>> Merging those two separate fields together only makes harder for
>> userspace to store the error detail information on their MIB.
> 
> There's that MIB crap again.
> 
> And it doesn't make it harder for anything because in userspace you can
> do everything with those strings, cut them, replace them, whatever your
> heart desires, even store the correct error detail information "on their
> MIB." Basically, you have one string in userspace and you can massage
> the hell out of it and even fit it to the MIB or whatever...

The rationale behind providing binary information instead of a printk is
to avoid kernel to spend time with printk formatting and userspace to
parse it and try to recover the original data.

If this weren't a requirement, the better would be to just not use any 
tracepoint for errors at all.

Also, the API should be handled in a way that it will work on userspace.

I'm foreseen the need of implementing the error counters on userspace
on my plans for the edac-utils userspace (or at least, some advanced 
error counter logic, like burst detection, decay, etc).

For that to be properly implemented, userspace may need to get access
to each of the components of the "detail" field.

> 
> [ … ]
> 
>>>>>>>        mcegen.py-2868  [007] .N..   178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>>>
>>>> CE/UE is there. The only change is that, instead of using acronyms, it is now
>>>> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
>>>> that the user will better understand.
>>>
>>> Ok, so let's change the string output from the above version to:
>>>
>>>  "mcegen.py-2868  [007] .N..   178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1  page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"
>>>
>>> I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].
>>
>> As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
>> that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...
>>
>> If you think we need to add an extra field for the driver name, then let's add a
>> "driver_name" field there, but a driver's name shouldn't be merged with an
>> error type message, as it is abusing the syntax, and the syntax of each field
>> should be clear enough to allow it to be stored on a MIB.
>>
>> In any case, those drivers that fill the error msg with something else should
>> be changes to write there something like "ECC error" instead of "amd64_edac:".
>>
>> Btw, I'm almost convinced that we should break the "detail" into its components,
>> e. g., instead of one string, it should be:
>> 	int layer0, layer1, layer2;		/* Location */
>> 	unsigned long memory_address; 		/* or maybe page/offset */
>> 	u32 grain;
>> 	unsigned syndrome;
>>
>> That would be better from MIB point of view.
> 
> Dude, enough with this MIB crap already!

Your feedback about the patches are very welcome, but it seems that you 
have no experience with MIB handling and/or with userspace logic for
error management, so, please stop nonsense about something that doesn't
seem to be on your expertise area.

> Simply move the EDAC_MOD_STR thing earlier in the tracepoint so that you
> can get:
> 
> "mc_event:" EDAC_MOD_STR [ERROR_TYPE] [DETAIL]
> 
> EDAC_MOD_STR is _always_ the driver name:
> 
> 0 amd64_edac.h      148 #define EDAC_MOD_STR   "amd64_edac"
> 1 amd76x_edac.c      23 #define EDAC_MOD_STR "amd76x_edac"

> 2 amd8111_edac.c     37 #define AMD8111_EDAC_MOD_STR "amd8111_edac"
> 3 amd8131_edac.c     37 #define AMD8131_EDAC_MOD_STR "amd8131_edac"
> 4 cpc925_edac.c      34 #define CPC925_EDAC_MOD_STR "cpc925_edac"

Those above use another name.

> 5 e752x_edac.c       28 #define EDAC_MOD_STR "e752x_edac"
> 6 e7xxx_edac.c       33 #define EDAC_MOD_STR "e7xxx_edac"
> 7 i3000_edac.c       21 #define EDAC_MOD_STR  "i3000_edac"
> 8 i3200_edac.c       22 #define EDAC_MOD_STR        "i3200_edac"
> 9 i5000_edac.c       31 #define EDAC_MOD_STR      "i5000_edac"
> a i5400_edac.c       38 #define EDAC_MOD_STR      "i5400_edac"
> b i7300_edac.c       36 #define EDAC_MOD_STR      "i7300_edac"
> c i7core_edac.c      65 #define EDAC_MOD_STR      "i7core_edac"
> d i82443bxgx_edac.c  36 #define EDAC_MOD_STR    "i82443bxgx_edac"
> e i82860_edac.c      20 #define EDAC_MOD_STR "i82860_edac"
> f i82875p_edac.c     24 #define EDAC_MOD_STR  "i82875p_edac"
> g i82975x_edac.c     20 #define EDAC_MOD_STR  "i82975x_edac"
> h mpc85xx_edac.h     15 #define EDAC_MOD_STR "MPC85xx_edac"
> i mv64x60_edac.h     16 #define EDAC_MOD_STR "MV64x60_edac"
> j r82600_edac.c      26 #define EDAC_MOD_STR "r82600_edac"
> k sb_edac.c          38 #define EDAC_MOD_STR      "sbridge_edac"
> l x38_edac.c         21 #define EDAC_MOD_STR  "x38_edac"
> 

Ok, I'll add a driver_name field with *EDAC_MOD_STR, on a separate
patchset (as this will require touching, again, on every single 
driver).

Regards,
Mauro

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 15:24                               ` Mauro Carvalho Chehab
@ 2012-05-16 17:05                                 ` Steven Rostedt
  0 siblings, 0 replies; 43+ messages in thread
From: Steven Rostedt @ 2012-05-16 17:05 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker,
	Ingo Molnar

On Wed, 2012-05-16 at 12:24 -0300, Mauro Carvalho Chehab wrote:

> > Here's another trick if you want to get rid of the space and keep both
> > fields:
> > 
> > TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
> >            (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> >                  ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> >                  "Fatal" : "Uncorrected"),
> > 
> >            __get_str(msg),
> >            __get_str(label),
> >            __entry->mc_index,
> >            __get_str(location),
> >            __get_str(detail),
> > 	   strlen(__get_str(detail)) &&
> > 	    strlen(__get_str(driver_detail) ? " ": "",
> >            __get_str(driver_detail))
> 
> Great! I'll use that trick, thanks!

strlen() may be too overblown. The following should work and is more
efficient:

	(((char *)__get_str(detail))[0] &&
	 ((char *)__get_str(driver_detail))[0]) ? " " : "",

-- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 16:52                                     ` Mauro Carvalho Chehab
@ 2012-05-16 19:59                                       ` Borislav Petkov
  2012-05-16 20:27                                         ` Luck, Tony
  0 siblings, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2012-05-16 19:59 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Luck, Tony, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

On Wed, May 16, 2012 at 01:52:21PM -0300, Mauro Carvalho Chehab wrote:
> Em 16-05-2012 12:47, Borislav Petkov escreveu:
> > On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
> >>> This doesn't answer my question. My question was: "why can't 'detail'
> >>> and 'driver_detail' be a single parameter, i.e. 'detail' and this way
> >>> solve both pretty printing and getting binary data problems?
> >>
> >> This is the 24th version of this very same patch...
> > 
> > This would have been the case if you'd split your patches and sent them
> > in 10-15 patches sets like normal people, _after_ gathering all review feedback.
> 
> The first version had only 3 patches. Patches were being added there due to
> the huge delay to get them reviewed/accepted.
> 
> > But, you wanted to fix EDAC and the whole world while at it and besides,
> > your patches caused boot errors here and there.
> > 
> > Then, you went and rebased the whole patchset after me reviewing one or
> > two and incremented for that rebase the version number.
> 
> Because you requested that by not accepting incremental patches at the end
> of the series.

No, because this is how patch series are done. Incremental patches are
for people who get paid on the amount of patches they get into the
kernel.

> > So v24 means nothing to me - you might just as well use it for your
> > internal tracking.
> > 
> >> In summary: all edac messages provide "detail" as this contains the
> >> error location in terms of channel/slot. So, any MIB for EDAC could
> >> handle those parameters properly. With regards to driver_detail, this
> >> have per-driver details. So, per-driver MIB is required for them, if
> >> some userspace program wants to properly store that information.
> >>
> >> Merging those two separate fields together only makes harder for
> >> userspace to store the error detail information on their MIB.
> > 
> > There's that MIB crap again.
> > 
> > And it doesn't make it harder for anything because in userspace you can
> > do everything with those strings, cut them, replace them, whatever your
> > heart desires, even store the correct error detail information "on their
> > MIB." Basically, you have one string in userspace and you can massage
> > the hell out of it and even fit it to the MIB or whatever...
> 
> The rationale behind providing binary information instead of a printk is
> to avoid kernel to spend time with printk formatting and userspace to
> parse it and try to recover the original data.
> 
> If this weren't a requirement, the better would be to just not use any 
> tracepoint for errors at all.
> 
> Also, the API should be handled in a way that it will work on userspace.

No, userspace will be doing parsing because it is the only sensible
thing to do. The kernel's job is to carry out enough information for
the user to handle the error in a way which is just enough. No more, no
less. It is in no f*cking way expected to make it pretty and in suitable
portions so that userspace can consume it.

Ok, this took longer than I thought and I'm getting really tired of the
crap so let me save you the trouble:

* either make the RAS tracepoint patch the way I'm suggesting.

* or give a really good reason for doing it differently (and yes,
userspace will be doing parsing).

* or consider it nacked.

It is that simple.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 19:59                                       ` Borislav Petkov
@ 2012-05-16 20:27                                         ` Luck, Tony
  2012-05-16 21:05                                           ` Borislav Petkov
  0 siblings, 1 reply; 43+ messages in thread
From: Luck, Tony @ 2012-05-16 20:27 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List,
	Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar

> No, userspace will be doing parsing because it is the only sensible
> thing to do. The kernel's job is to carry out enough information for
> the user to handle the error in a way which is just enough. No more, no
> less. It is in no f*cking way expected to make it pretty and in suitable
> portions so that userspace can consume it.

My requirement on prettiness was just that the information useful
to a normal system owner come first. Enough to know how serious the
problem is, and which component is affected. The fine details must
be in the "( ... )" part - and will generally only be of interest
to very sophisticated users.

We've reached that goal.

Looking at the next part ... which is how to write useful analysis
tools that study these logs. We have some common elements in trace
records - but a there is (deliberately) plenty of scope for drivers
to add their own customizations. If these bits are going to be
used by the user mode tools - then they will have per-driver
parsing code that will be tightly linked to the version of the driver.

Are we happy with this? Once the patch goes it, we have created an
ABI which will be hard to change.

-Tony

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues
  2012-05-16 20:27                                         ` Luck, Tony
@ 2012-05-16 21:05                                           ` Borislav Petkov
  0 siblings, 0 replies; 43+ messages in thread
From: Borislav Petkov @ 2012-05-16 21:05 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, Doug Thompson, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar

On Wed, May 16, 2012 at 08:27:24PM +0000, Luck, Tony wrote:
> > No, userspace will be doing parsing because it is the only sensible
> > thing to do. The kernel's job is to carry out enough information for
> > the user to handle the error in a way which is just enough. No more, no
> > less. It is in no f*cking way expected to make it pretty and in suitable
> > portions so that userspace can consume it.
> 
> My requirement on prettiness was just that the information useful
> to a normal system owner come first. Enough to know how serious the
> problem is, and which component is affected. The fine details must
> be in the "( ... )" part - and will generally only be of interest
> to very sophisticated users.
> 
> We've reached that goal.
> 
> Looking at the next part ... which is how to write useful analysis
> tools that study these logs. We have some common elements in trace
> records - but a there is (deliberately) plenty of scope for drivers
> to add their own customizations. If these bits are going to be
> used by the user mode tools - then they will have per-driver
> parsing code that will be tightly linked to the version of the driver.

This is exactly what I'm trying to explain - the kernel carries out the
information, userspace parses it the way it sees fit.

> Are we happy with this? Once the patch goes it, we have created an
> ABI which will be hard to change.

Yes, very true.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-05-16 21:05 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-10 19:56 [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Mauro Carvalho Chehab
2012-05-10 20:40 ` Borislav Petkov
2012-05-10 20:55   ` Mauro Carvalho Chehab
2012-05-10 22:46     ` Steven Rostedt
2012-05-10 23:16       ` Mauro Carvalho Chehab
2012-05-10 21:00   ` [PATCHv23] RAS: " Mauro Carvalho Chehab
2012-05-11 10:04     ` Borislav Petkov
2012-05-11 14:54     ` [PATCH v.23-2] RAS: use tracepoint " Mauro Carvalho Chehab
2012-05-11 17:02       ` Luck, Tony
2012-05-11 18:53         ` Mauro Carvalho Chehab
2012-05-11 20:07           ` Tony Luck
2012-05-11 17:06       ` Borislav Petkov
2012-05-11 17:10         ` Mauro Carvalho Chehab
2012-05-11 22:31           ` Borislav Petkov
2012-05-11 22:35             ` Luck, Tony
2012-05-12 14:13     ` [PATCH v24] RAS: Add a tracepoint for reporting memory controller events Mauro Carvalho Chehab
2012-05-10 21:10   ` [PATCH v22] edac, ras/hw_event.h: use events to handle hw issues Luck, Tony
2012-05-10 22:07     ` Mauro Carvalho Chehab
2012-05-10 22:37       ` Luck, Tony
2012-05-11  1:48         ` Mauro Carvalho Chehab
2012-05-11 10:25           ` Borislav Petkov
2012-05-11 12:37             ` Mauro Carvalho Chehab
2012-05-11 17:24               ` Borislav Petkov
2012-05-11 18:38                 ` Mauro Carvalho Chehab
2012-05-14 13:34                   ` Borislav Petkov
2012-05-14 14:27                     ` Mauro Carvalho Chehab
2012-05-15 15:09                       ` Borislav Petkov
2012-05-15 16:05                         ` Mauro Carvalho Chehab
2012-05-15 16:38                           ` Borislav Petkov
2012-05-16 11:22                             ` Mauro Carvalho Chehab
2012-05-16 13:16                               ` Borislav Petkov
2012-05-16 13:27                                 ` Steven Rostedt
2012-05-16 13:32                                   ` Borislav Petkov
2012-05-16 13:47                                     ` Steven Rostedt
2012-05-16 15:16                                 ` Mauro Carvalho Chehab
2012-05-16 15:47                                   ` Borislav Petkov
2012-05-16 16:52                                     ` Mauro Carvalho Chehab
2012-05-16 19:59                                       ` Borislav Petkov
2012-05-16 20:27                                         ` Luck, Tony
2012-05-16 21:05                                           ` Borislav Petkov
2012-05-16 12:48                             ` Steven Rostedt
2012-05-16 15:24                               ` Mauro Carvalho Chehab
2012-05-16 17:05                                 ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).