linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
@ 2012-02-10  0:00 Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2012-02-10  0:00 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List, bp, tony.luck

This is the third version of the HERM patches.

This patch series aims to solve some problems found in the kernel's
hardware error reporting mechanisms:

	- MCE events generate processor-specific messages. Decoding them
requires arch-specific and CPU-specific information. In some cases,
the same CPU outputs different things on different CPU steppings;

	- The EDAC core is outdated: it assumes that all drivers talk to
memories via a chip select signal, using one or two channels. Drivers
for modern architectures need to feed fake data to the EDAC core;

	- EDAC has several error functions for memory errors; their
usage is confusing, and some drivers could be providing more information,
but they're limited by the API's rigid constraints. For example, single-channel
drivers could be reporting errors against a single DIMM, even on a traditional
memory architecture, but the EDAC function calls don't allow it;

	- When an error event arises on modern x86 processors, an MCE
event is generated. Such an error could be enriched with parsed information,
complemented by some additional data available in non-MCE registers,
generating just one event with the complete (MCE log + parsed info)
event information.

While HERM is meant to be generic, the current focus is to fix the issues
with memory errors.

This series incorporates feedback from Boris and Tony regarding
integrating memory error events with MCE, where supported.

With regard to memory errors, HERM allows specifying any memory hierarchy
(currently limited to up to 3 layers below the memory controller, which
covers all currently supported memory architectures). Extending it should
be easy, if needed later.
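As an illustration only, a hierarchy of up to 3 layers below the memory
controller could be modeled as an array of (type, size) pairs, with a
DIMM's per-layer position flattened into a linear index. The struct,
enum and function names below are hypothetical, not the ones introduced
by this series:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the memory hierarchy below one memory
 * controller. These names are illustrative; they are not the
 * structures added by the HERM patches.
 */
enum mem_layer_type { LAYER_BRANCH, LAYER_CHANNEL, LAYER_SLOT };

struct mem_layer {
	enum mem_layer_type type;
	unsigned int size;		/* elements at this layer */
};

#define MAX_MEM_LAYERS 3		/* the "up to 3 layers" limit */

/* Number of DIMMs addressed by the hierarchy (product of the sizes). */
static unsigned int total_dimms(const struct mem_layer *layers, size_t n)
{
	unsigned int total = 1;
	size_t i;

	assert(n > 0 && n <= MAX_MEM_LAYERS);
	for (i = 0; i < n; i++)
		total *= layers[i].size;
	return total;
}

/*
 * Flatten a per-layer position (e.g. {channel, slot}) into a linear
 * DIMM index; the last layer varies fastest. Returns -1 if any
 * position is out of range for its layer.
 */
static int dimm_index(const struct mem_layer *layers, size_t n,
		      const unsigned int *pos)
{
	int idx = 0;
	size_t i;

	assert(n > 0 && n <= MAX_MEM_LAYERS);
	for (i = 0; i < n; i++) {
		if (pos[i] >= layers[i].size)
			return -1;
		idx = idx * (int)layers[i].size + (int)pos[i];
	}
	return idx;
}
```

With a Sandy Bridge-EP style layout of {channel: 4, slot: 3},
total_dimms() gives 12 and position {1, 2} maps to index 5. A
MC/Branch/Channel/Slot layout (such as i5400) would just use three layers.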

The old sysfs nodes are still supported. Later patches will allow
disabling the old sysfs nodes.

All errors still generate printk messages, as before, but they'll
also generate perf events like:

            bash-1680  [001]   152.349448: mc_error: [Hardware Error]: mce#0: Uncorrected error FAKE ERROR on label "mc#0channel#2slot#2 " (channel 2 slot 2  page 0x0 offset 0x0 grain 0 for EDAC testing only)
     kworker/u:5-198   [006]  1341.771535: mc_error_mce: mce#0: Corrected error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 " (channel 0 slot 0  page 0x3a2db9 offset 0x7ac grain 32 syndrome 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC: 00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0, PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=2/mask=4 rank=1)
     kworker/u:5-198   [006]  1341.792536: mc_error_mce: mce#0: Corrected error Can't discover the memory rank for ch addr 0x60f2a6d76 on label "any memory" ( page 0x0 offset 0x0 grain 32 syndrome 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC: 0000000c1e54dab6/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0, PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 )
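A minimal user-space sketch for consuming these events. It assumes
debugfs is mounted at /sys/kernel/debug (so ftrace lives under
/sys/kernel/debug/tracing), that the events sit under the hw_event trace
system defined by this series, and that they are named mc_error and
mc_error_mce as in the sample output above. The helper names are
hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the path of a tracepoint's "enable" file,
 * assuming the ftrace layout <tracing_dir>/events/<system>/<event>/.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int trace_enable_path(char *buf, size_t len, const char *tracing_dir,
			     const char *system, const char *event)
{
	int n = snprintf(buf, len, "%s/events/%s/%s/enable",
			 tracing_dir, system, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical filter: return 1 if a trace_pipe line came from one of
 * the memory error events seen in the sample output (mc_error or
 * mc_error_mce), 0 otherwise.
 */
static int is_mc_error_line(const char *line)
{
	return strstr(line, " mc_error: ") != NULL ||
	       strstr(line, " mc_error_mce: ") != NULL;
}
```

Enabling an event would then mean writing 1 to that enable file (as
root), and lines read from trace_pipe could be filtered with
is_mc_error_line().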

New sysfs nodes are now provided, to match the real memory architecture.

For example, on a Sandy Bridge-EP machine, with up to 4 channels, and up
to 3 DIMMs per channel:

/sys/devices/system/edac/mc/mc0/
├── ce_channel0
├── ce_channel0_slot0
├── ce_channel0_slot1
├── ce_channel0_slot2
├── ce_channel1
├── ce_channel1_slot0
├── ce_channel1_slot1
├── ce_channel1_slot2
├── ce_channel2
├── ce_channel2_slot0
├── ce_channel2_slot1
├── ce_channel2_slot2
├── ce_channel3
├── ce_channel3_slot0
├── ce_channel3_slot1
├── ce_channel3_slot2
├── ce_count
├── ce_noinfo_count
├── dimm0
│   ├── dimm_dev_type
│   ├── dimm_edac_mode
│   ├── dimm_label
│   ├── dimm_location
│   ├── dimm_mem_type
│   └── dimm_size
├── dimm1
│   ├── dimm_dev_type
│   ├── dimm_edac_mode
│   ├── dimm_label
│   ├── dimm_location
│   ├── dimm_mem_type
│   └── dimm_size
├── fake_inject
├── ue_channel0
├── ue_channel0_slot0
├── ue_channel0_slot1
├── ue_channel0_slot2
├── ue_channel1
├── ue_channel1_slot0
├── ue_channel1_slot1
├── ue_channel1_slot2
├── ue_channel2
├── ue_channel2_slot0
├── ue_channel2_slot1
├── ue_channel2_slot2
├── ue_channel3
├── ue_channel3_slot0
├── ue_channel3_slot1
├── ue_channel3_slot2
├── ue_count
└── ue_noinfo_count
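The per-layer counter names above follow a simple pattern. The
hypothetical helpers below (not part of the patches) generate them and
check the count: with 4 channels and 3 slots per channel there are 4
channel-level plus 12 slot-level nodes, i.e. 16 per error type, matching
the ce_* and ue_* entries listed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: name of a per-layer error counter node.
 * prefix is "ce" or "ue"; slot < 0 selects the channel-level counter.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int counter_node_name(char *buf, size_t len, const char *prefix,
			     unsigned int channel, int slot)
{
	int n;

	if (slot < 0)
		n = snprintf(buf, len, "%s_channel%u", prefix, channel);
	else
		n = snprintf(buf, len, "%s_channel%u_slot%d", prefix,
			     channel, slot);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Counter nodes of one type: one per channel plus one per slot. */
static unsigned int counter_nodes_per_type(unsigned int channels,
					   unsigned int slots)
{
	return channels + channels * slots;
}
```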

One of the above nodes (fake_inject) allows testing the error report
mechanism by providing a simple, driver-independent way to inject errors.
This node is enabled only when CONFIG_EDAC_DEBUG is enabled. It is limited
to testing the core EDAC report mechanisms, but it helps to check whether
the tracing events are properly attributed to the right DIMMs.
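From user space, injecting is just a write to that node, equivalent to
an "echo 1 >" into it. A minimal sketch, assuming the
/sys/devices/system/edac/mc/mc<N>/fake_inject layout from the tree above
and a kernel built with CONFIG_EDAC_DEBUG; both function names are
hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build .../mc<N>/fake_inject for controller mc. */
static int fake_inject_path(char *buf, size_t len, unsigned int mc)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/edac/mc/mc%u/fake_inject", mc);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write "1\n" (what "echo 1 >" writes) to the
 * node. Writing the real sysfs node needs root. Returns 0 or -1.
 */
static int edac_fake_inject(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```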

There's currently one assumption in the above that might not be
true: it assumes that the last element of the hierarchy points to
a single memory stick, called "dimm" in the sysfs hierarchy. This
may not be true with dual/quad rank memories on some memory controllers.
Further testing is needed to double-check it. I intend to do that once
I have access to csrow/channel based machines that I can equip with a mix
of single-rank and dual- or quad-rank memories (still trying to obtain
such hardware).

The memory error handling function can now report more than one DIMM
when it is not possible to pinpoint a single one.

For example:
	# echo 1 >/sys/devices/system/edac/mc/mc0/fake_inject  && dmesg |tail -1
	[ 2878.130704] EDAC MC0: CE FAKE ERROR on mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2  (channel 1 page 0x0 offset 0x0 grain 0 syndrome 0x0 for EDAC testing only)

All DIMMs present on channel 1 are listed, since one of them (it is not
known which) was responsible for the error.
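The "blame every DIMM on the channel" behaviour can be sketched as
follows. This is a hypothetical user-space helper, not the edac_mc code:
a negative slot means the driver could only narrow the error down to a
channel, so the labels of all slots on that channel are listed, using the
mc#0channel#1slot#0 style seen in the dmesg output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: write the label(s) of the DIMM(s) to blame into
 * buf, separated by spaces. If slot < 0, every slot of the channel is
 * listed. Returns the number of DIMMs listed, or -1 on error (slot out
 * of range or buffer too small).
 */
static int blame_dimms(char *buf, size_t len, unsigned int mc,
		       unsigned int channel, int slot, unsigned int nr_slots)
{
	unsigned int first = 0, last = nr_slots, s;
	size_t used = 0;
	int count = 0;

	if (slot >= 0) {
		if ((unsigned int)slot >= nr_slots)
			return -1;
		first = (unsigned int)slot;
		last = first + 1;
	}
	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (s = first; s < last; s++) {
		int n = snprintf(buf + used, len - used,
				 "%smc#%uchannel#%uslot#%u",
				 count ? " " : "", mc, channel, s);

		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
		count++;
	}
	return count;
}
```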

Regarding the output, errors are now reported in a more
user-friendly way; the EDAC core will output:

- the timestamp;
- the memory controller;
- whether the error is corrected, uncorrected or fatal;
- the error message (driver specific, for example "read error", "scrubbing
  error", etc.);
- the affected memory labels.

Other technical details are provided inside parentheses, in order to
give hardware manufacturers, OEMs, etc. more details and let them
discover which DRAM has problems, if they want/need to.
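To make that layout concrete, a hypothetical formatter producing a line
in the shape of the dmesg example above (controller, severity, driver
message, labels, then the technical details in parentheses) might look
like this. The timestamp is left to printk, and the enum and severity
strings are assumptions, not taken from the patches:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical severity levels; the strings are assumptions. */
enum hw_err_severity { HW_ERR_CORRECTED, HW_ERR_UNCORRECTED, HW_ERR_FATAL };

static const char *severity_str(enum hw_err_severity sev)
{
	switch (sev) {
	case HW_ERR_CORRECTED:
		return "CE";
	case HW_ERR_UNCORRECTED:
		return "UE";
	case HW_ERR_FATAL:
		return "Fatal";
	}
	return "Unknown";
}

/*
 * Hypothetical formatter: "EDAC MC<n>: <sev> <msg> on <labels> (<details>)".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_mc_error(char *buf, size_t len, unsigned int mc,
			   enum hw_err_severity sev, const char *msg,
			   const char *labels, const char *details)
{
	int n = snprintf(buf, len, "EDAC MC%u: %s %s on %s (%s)",
			 mc, severity_str(sev), msg, labels, details);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```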

Ah, now that the memory architecture is properly represented, the DIMM
labels are automatically filled in by the mc_alloc function call, so that
they reflect the memory architecture.

For example, in the case of Sandy Bridge, a DIMM can be described as:
	mc#0channel#1slot#0

This matches the way the memory is referred to in the technical
documentation and, hopefully, in the OEM manuals for the motherboard. So it
should be simpler for OEMs and system administrators to identify which
memory is broken, and/or to relabel it with the motherboard-specific
nomenclature using a tool like edac-utils.
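Generating such a default label only needs the layer names and the DIMM
position in each layer. A sketch under that assumption; the helper is
hypothetical and simply concatenates "<layer>#<index>" after the
"mc#<n>" prefix:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a default DIMM label such as
 * "mc#0channel#1slot#0" from per-layer names and positions.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_dimm_label(char *buf, size_t len, unsigned int mc,
			    const char *const *layer_names,
			    const unsigned int *pos, size_t nr_layers)
{
	size_t used, i;
	int n = snprintf(buf, len, "mc#%u", mc);

	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;
	for (i = 0; i < nr_layers; i++) {
		n = snprintf(buf + used, len - used, "%s#%u",
			     layer_names[i], pos[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

For a MC/Branch/Channel/Slot hierarchy such as i5400, the same pattern
would give labels like mc#0branch#1channel#0slot#2 (illustrative, not
taken from a real system).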

Currently tested on Nehalem and Sandy Bridge. On both, the memory hierarchy
is MC/Channel/Slot. I should be testing tomorrow with i5400, where the
hierarchy is MC/Branch/Channel/Slot.

This series should compile on all architectures (I compile-tested the last
patch, which changes some bits in all drivers, on x86_64, i386, ppc32, ppc64
and tilepro). All drivers compiled fine, even the one marked as BROKEN.

Of course, tests and feedback are welcome!

Regards,
Mauro

Mauro Carvalho Chehab (31):
  events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  events/hw_event: use __string() trace macros for events
  hw_event: Consolidate uncorrected/corrected error msgs into one
  drivers/edac: rename channel_info to csrow_channel_info
  edac: Create a dimm struct and move the labels into it
  edac: Add per dimm's sysfs nodes
  edac: Prepare to push down to drivers the filling of the dimm_info
  edac: Better describe the memory concepts. The memory terms
    changed over time since EDAC was originally written: new
    concepts were introduced, and some things have different
    meanings, depending on the memory architecture. Better define
    those terms, and better describe each supported memory type.
  i5400_edac: Convert it to report memory with the new location
  i7300_edac: Convert it to report memory with the new location
  edac: move dimm properties to struct dimm_info
  edac: Don't initialize csrow's first_page & friends when not needed
  edac: move nr_pages to dimm struct
  edac: Add per-dimm sysfs show nodes
  edac: DIMM location cleanup
  edac/ppc4xx_edac: Fix compilation
  edac-mc: Allow reporting errors on a non-csrow oriented way
  edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums
  edac: rework memory layer hierarchy description
  edac: Export MC hierarchy counters for CE and UE
  hw_event: Add x86 MCE events on it
  amd64_edac: convert it to use the MCE log tracepoint where applicable
  edac: Simplify logs for i7core and sb edac drivers
  edac_mc: Some clenups at the log message
  edac: Add a sysfs node to test the EDAC error report facility
  edac_mc: Fix the enable label filter logic
  edac: Initialize the dimm label with the known information
  edac: don't OOPS if the csrow is not visible
  edac: Fix sysfs csrow?/*ce*count counters
  edac: Fix new error counts
  edac: Fix per layer error count counters

 arch/x86/kernel/cpu/mcheck/mce.c |    2 +-
 drivers/edac/amd64_edac.c        |  217 +++++++-----
 drivers/edac/amd64_edac_dbg.c    |    6 +-
 drivers/edac/amd64_edac_inj.c    |   24 +-
 drivers/edac/amd76x_edac.c       |   44 ++-
 drivers/edac/cell_edac.c         |   42 ++-
 drivers/edac/cpc925_edac.c       |   93 +++--
 drivers/edac/e752x_edac.c        |   94 +++--
 drivers/edac/e7xxx_edac.c        |   88 +++--
 drivers/edac/edac_core.h         |   48 +--
 drivers/edac/edac_device.c       |   27 +-
 drivers/edac/edac_mc.c           |  719 ++++++++++++++++++++++++--------------
 drivers/edac/edac_mc_sysfs.c     |  560 +++++++++++++++++++++++++++---
 drivers/edac/edac_module.h       |    2 +-
 drivers/edac/edac_pci.c          |    7 +-
 drivers/edac/i3000_edac.c        |   51 ++-
 drivers/edac/i3200_edac.c        |   57 ++--
 drivers/edac/i5000_edac.c        |   89 +++--
 drivers/edac/i5100_edac.c        |   98 +++---
 drivers/edac/i5400_edac.c        |   99 ++----
 drivers/edac/i7300_edac.c        |  114 +++----
 drivers/edac/i7core_edac.c       |  265 ++++----------
 drivers/edac/i82443bxgx_edac.c   |   43 ++-
 drivers/edac/i82860_edac.c       |   57 ++-
 drivers/edac/i82875p_edac.c      |   53 ++-
 drivers/edac/i82975x_edac.c      |   58 +++-
 drivers/edac/mpc85xx_edac.c      |   45 ++-
 drivers/edac/mv64x60_edac.c      |   47 ++-
 drivers/edac/pasemi_edac.c       |   51 ++--
 drivers/edac/ppc4xx_edac.c       |   62 ++--
 drivers/edac/r82600_edac.c       |   42 ++-
 drivers/edac/sb_edac.c           |  201 ++++-------
 drivers/edac/tile_edac.c         |   33 ++-
 drivers/edac/x38_edac.c          |   54 ++--
 include/linux/edac.h             |  518 ++++++++++++++++++++--------
 include/trace/events/hw_event.h  |  370 ++++++++++++++++++++
 include/trace/events/mce.h       |   69 ----
 37 files changed, 2868 insertions(+), 1581 deletions(-)
 create mode 100644 include/trace/events/hw_event.h
 delete mode 100644 include/trace/events/mce.h

-- 
1.7.8



* [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Add a trace class to handle hardware events.

Part of the description below is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and for the discussions that led to the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely on reading chipset registers directly. A user-level
    program decodes them into a somewhat human-readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

In order to provide a proper hardware event subsystem, let's
encapsulate hardware events into a common trace facility, and
make both the edac and mce drivers use it. After that, common
facilities can be moved into a new core for the hardware event
reporting subsystem. This patch is the first of a series, and only
touches mce.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c          |   32 ++++
 include/trace/events/hw_event.h |  322 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 354 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/hw_event.h

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index d69144a..2b8382e 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -34,6 +34,9 @@
 #include "edac_core.h"
 #include "edac_module.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/hw_event.h>
+
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
@@ -224,6 +227,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	 * which will perform kobj unregistration and the actual free
 	 * will occur during the kobject callback operation
 	 */
+
+	trace_hw_event_init("mce", (unsigned)edac_index);
+
 	return mci;
 }
 EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -685,6 +691,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 	/* FIXME - maybe make panic on INTERNAL ERROR an option */
 	if (row >= mci->nr_csrows || row < 0) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "CE", "row", row, 0, mci->nr_csrows);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: row out of range "
 			"(%d >= %d)\n", row, mci->nr_csrows);
@@ -694,6 +701,8 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 
 	if (channel >= mci->csrows[row].nr_channels || channel < 0) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "CE", "channel", channel,
+				      0, mci->csrows[row].nr_channels);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: channel out of range "
 			"(%d >= %d)\n", channel,
@@ -702,6 +711,9 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
+	trace_mc_corrected_error(mci, page_frame_number, offset_in_page,
+				syndrome, row, channel, msg);
+
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
 		edac_mc_printk(mci, KERN_WARNING,
@@ -737,6 +749,7 @@ EXPORT_SYMBOL_GPL(edac_mc_handle_ce);
 
 void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci, const char *msg)
 {
+	trace_mc_corrected_error_no_info(mci, msg);
 	if (edac_mc_get_log_ce())
 		edac_mc_printk(mci, KERN_WARNING,
 			"CE - no information available: %s\n", msg);
@@ -761,6 +774,8 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 	/* FIXME - maybe make panic on INTERNAL ERROR an option */
 	if (row >= mci->nr_csrows || row < 0) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "UE", "row", row,
+				      0, mci->nr_csrows);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: row out of range "
 			"(%d >= %d)\n", row, mci->nr_csrows);
@@ -781,6 +796,8 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 		pos += chars;
 	}
 
+	trace_mc_uncorrected_error(mci, page_frame_number, offset_in_page,
+				row, msg, labels);
 	if (edac_mc_get_log_ue())
 		edac_mc_printk(mci, KERN_EMERG,
 			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
@@ -801,6 +818,7 @@ EXPORT_SYMBOL_GPL(edac_mc_handle_ue);
 
 void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci, const char *msg)
 {
+	trace_mc_uncorrected_error_no_info(mci, msg);
 	if (edac_mc_get_panic_on_ue())
 		panic("EDAC MC%d: Uncorrected Error", mci->mc_idx);
 
@@ -828,6 +846,9 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
+
+		trace_mc_out_of_range(mci, "UE FBDIMM", "row", csrow,
+				      0, mci->nr_csrows);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: row out of range (%d >= %d)\n",
 			csrow, mci->nr_csrows);
@@ -837,6 +858,8 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 
 	if (channela >= mci->csrows[csrow].nr_channels) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "UE FBDIMM", "channel-a", channela,
+				      0, mci->csrows[csrow].nr_channels);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: channel-a out of range "
 			"(%d >= %d)\n",
@@ -847,6 +870,8 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 
 	if (channelb >= mci->csrows[csrow].nr_channels) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "UE FBDIMM", "channel-b", channelb,
+				      0, mci->csrows[csrow].nr_channels);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: channel-b out of range "
 			"(%d >= %d)\n",
@@ -866,6 +891,8 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	chars = snprintf(pos, len + 1, "-%s",
 			 mci->csrows[csrow].channels[channelb].label);
 
+	trace_mc_uncorrected_error_fbd(mci, csrow, channela, channelb,
+				       msg, labels);
 	if (edac_mc_get_log_ue())
 		edac_mc_printk(mci, KERN_EMERG,
 			"UE row %d, channel-a= %d channel-b= %d "
@@ -890,6 +917,8 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 	/* Ensure boundary values */
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "CE FBDIMM", "row", csrow,
+				      0, mci->nr_csrows);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: row out of range (%d >= %d)\n",
 			csrow, mci->nr_csrows);
@@ -898,6 +927,8 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 	}
 	if (channel >= mci->csrows[csrow].nr_channels) {
 		/* something is wrong */
+		trace_mc_out_of_range(mci, "CE FBDIMM", "channel", channel,
+				      0, mci->csrows[csrow].nr_channels);
 		edac_mc_printk(mci, KERN_ERR,
 			"INTERNAL ERROR: channel out of range (%d >= %d)\n",
 			channel, mci->csrows[csrow].nr_channels);
@@ -905,6 +936,7 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
+	trace_mc_corrected_error_fbd(mci, csrow, channel, msg);
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
 		edac_mc_printk(mci, KERN_WARNING,
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
new file mode 100644
index 0000000..3735c6f
--- /dev/null
+++ b/include/trace/events/hw_event.h
@@ -0,0 +1,322 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hw_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+
+/*
+ * Hardware Anomaly Report Mechanism (HARM) events
+ *
+ * These events are generated when the hardware detects a corrected or
+ * uncorrected event, and are meant to replace the current error reporting
+ * APIs defined in both the EDAC and MCE subsystems.
+ */
+
+DECLARE_EVENT_CLASS(hw_event_class,
+	TP_PROTO(const char *type, unsigned int instance),
+	TP_ARGS(type, instance),
+
+	TP_STRUCT__entry(
+		__field(	const char *,	type			)
+		__field(	unsigned int,	instance		)
+	),
+
+	TP_fast_assign(
+		__entry->type	= type;
+		__entry->instance = instance;
+	),
+
+	TP_printk("Initialized %s#%d\n",
+		__entry->type,
+		__entry->instance)
+);
+
+/*
+ * This event indicates that a hardware collection mechanism is started
+ */
+DEFINE_EVENT(hw_event_class, hw_event_init,
+
+	TP_PROTO(const char *type, unsigned int instance),
+
+	TP_ARGS(type, instance)
+);
+
+
+/*
+ * Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_corrected_error,
+
+	TP_PROTO(struct mem_ctl_info *mci,
+		unsigned long page_frame_number,
+		unsigned long offset_in_page, unsigned long syndrome,
+		int row, int channel, const char *msg),
+
+	TP_ARGS(mci, page_frame_number, offset_in_page, syndrome, row,
+		channel, msg),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	mc_index		)
+		__field(	unsigned long,	page_frame_number	)
+		__field(	unsigned long,	offset_in_page		)
+		__field(	u32,		grain			)
+		__field(	unsigned long,	syndrome		)
+		__field(	int,		row			)
+		__field(	int,		channel			)
+		__field(	const char *,	label			)
+		__field(	const char *,	msg			)
+	),
+
+	TP_fast_assign(
+		__entry->mc_index		= mci->mc_idx;
+		__entry->page_frame_number	= page_frame_number;
+		__entry->offset_in_page		= offset_in_page;
+		__entry->grain			= mci->csrows[row].grain;
+		__entry->syndrome		= syndrome;
+		__entry->row			= row;
+		__entry->channel		= channel;
+		__entry->label			= mci->csrows[row].channels[channel].label;
+		__entry->msg			= msg;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Corrected error %s on label \"%s\" "
+			 "(page 0x%lux, offset 0x%lux, grain %ud, "
+			 "syndrome 0x%lux, row %d, channel %d)\n",
+		__entry->mc_index,
+		__entry->msg,
+		__entry->label,
+		__entry->page_frame_number,
+		__entry->offset_in_page,
+		__entry->grain,
+		__entry->syndrome,
+		__entry->row,
+		__entry->channel)
+);
+
+TRACE_EVENT(mc_uncorrected_error,
+
+	TP_PROTO(struct mem_ctl_info *mci,
+		unsigned long page_frame_number,
+		unsigned long offset_in_page,
+		int row, const char *msg, const char *label),
+
+	TP_ARGS(mci, page_frame_number, offset_in_page,
+		row, msg, label),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	mc_index		)
+		__field(	unsigned long,	page_frame_number	)
+		__field(	unsigned long,	offset_in_page		)
+		__field(	u32,		grain			)
+		__field(	int,		row			)
+		__field(	const char *,	msg			)
+		__field(	const char *,	label			)
+	),
+
+	TP_fast_assign(
+		__entry->mc_index		= mci->mc_idx;
+		__entry->page_frame_number	= page_frame_number;
+		__entry->offset_in_page		= offset_in_page;
+		__entry->grain			= mci->csrows[row].grain;
+		__entry->row			= row;
+		__entry->msg			= msg;
+		__entry->label			= label;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Uncorrected error %s on label \"%s\""
+			 "(page 0x%lux, offset 0x%lux, grain %ud, row %d)\n",
+		__entry->mc_index,
+		__entry->msg,
+		__entry->label,
+		__entry->page_frame_number,
+		__entry->offset_in_page,
+		__entry->grain,
+		__entry->row)
+);
+
+
+/*
+ * Fully-Buffered memory hardware in general doesn't provide syndrome/grain/row
+ * information for all types of errors. So, we need to either have another
+ * trace event or add a bitmapped field to indicate that some info is not
+ * provided and use the previously-declared event. It seemed easier and less
+ * confusing to create a different event for such cases.
+ */
+TRACE_EVENT(mc_corrected_error_fbd,
+
+	TP_PROTO(struct mem_ctl_info *mci,
+		int row, int channel, const char *msg),
+
+	TP_ARGS(mci, row, channel, msg),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	mc_index		)
+		__field(	int,		row			)
+		__field(	int,		channel			)
+		__field(	const char *,	label			)
+		__field(	const char *,	msg			)
+	),
+
+	TP_fast_assign(
+		__entry->mc_index		= mci->mc_idx;
+		__entry->row			= row;
+		__entry->channel		= channel;
+		__entry->label			= mci->csrows[row].channels[channel].label;
+		__entry->msg			= msg;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Corrected Error %s on label \"%s\" "
+			 "(row %d, channel %d)\n",
+		__entry->mc_index,
+		__entry->msg,
+		__entry->label,
+		__entry->row,
+		__entry->channel)
+);
+
+TRACE_EVENT(mc_uncorrected_error_fbd,
+
+	TP_PROTO(struct mem_ctl_info *mci,
+		int row, int channela, int channelb,
+		const char *msg, const char *label),
+
+	TP_ARGS(mci, row, channela, channelb, msg, label),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	mc_index		)
+		__field(	int,		row			)
+		__field(	int,		channela		)
+		__field(	int,		channelb		)
+		__field(	const char *,	msg			)
+		__field(	const char *,	label			)
+	),
+
+	TP_fast_assign(
+		__entry->mc_index		= mci->mc_idx;
+		__entry->row			= row;
+		__entry->channela		= channela;
+		__entry->channelb		= channelb;
+		__entry->msg			= msg;
+		__entry->label			= label;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Uncorrected Error %s on label \"%s\" "
+			 "(row %d, channels: %d, %d)\n",
+		__entry->mc_index,
+		__entry->msg,
+		__entry->label,
+		__entry->row,
+		__entry->channela,
+		__entry->channelb)
+);
+
+/*
+ * The Memory controller driver needs to discover the memory topology, in
+ * order to associate a hardware error with the memory label. If, for any
+ * reason, it receives an error for a channel or row that is not supposed
+ * to be there, an error event needs to be generated to indicate:
+ *	- that a Corrected or Uncorrected error was received;
+ *	- that the driver has a bug and, for that particular hardware, was
+ *	  not capable of detecting the hardware architecture
+ * If such an error is ever received, a bug against the kernel driver must
+ * be filed.
+ */
+
+TRACE_EVENT(mc_out_of_range,
+	TP_PROTO(struct mem_ctl_info *mci, const char *type, const char *field,
+		int invalid_val, int min, int max),
+
+	TP_ARGS(mci, type, field, invalid_val, min, max),
+
+	TP_STRUCT__entry(
+		__field(	const char *,	type			)
+		__field(	const char *,	field			)
+		__field(	unsigned int,	mc_index		)
+		__field(	int,		invalid_val		)
+		__field(	int,		min			)
+		__field(	int,		max			)
+	),
+
+	TP_fast_assign(
+		__entry->type			= type;
+		__entry->field			= field;
+		__entry->mc_index		= mci->mc_idx;
+		__entry->invalid_val		= invalid_val;
+		__entry->min			= min;
+		__entry->max			= max;
+	),
+
+	TP_printk(HW_ERR "mce#%d %s: %s=%d is not between %d and %d\n",
+		__entry->mc_index,
+		__entry->type,
+		__entry->field,
+		__entry->invalid_val,
+		__entry->min,
+		__entry->max)
+);
+
+/*
+ * In some cases, a corrected or uncorrected error is detected but can't
+ * be properly handled, either because another error overwrote the
+ * error registers that detail it, or because of some internal problem
+ * in the driver. The events below are meant for those error types.
+ */
+TRACE_EVENT(mc_corrected_error_no_info,
+	TP_PROTO(struct mem_ctl_info *mci, const char *msg),
+
+	TP_ARGS(mci, msg),
+
+	TP_STRUCT__entry(
+		__field(	const char *,	msg			)
+		__field(	unsigned int,	mc_index		)
+	),
+
+	TP_fast_assign(
+		__entry->msg			= msg;
+		__entry->mc_index		= mci->mc_idx;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Corrected Error: %s\n",
+		__entry->mc_index,
+		__entry->msg)
+);
+
+TRACE_EVENT(mc_uncorrected_error_no_info,
+	TP_PROTO(struct mem_ctl_info *mci, const char *msg),
+
+	TP_ARGS(mci, msg),
+
+	TP_STRUCT__entry(
+		__field(	const char *,	msg			)
+		__field(	unsigned int,	mc_index		)
+	),
+
+	TP_fast_assign(
+		__entry->msg			= msg;
+		__entry->mc_index		= mci->mc_idx;
+	),
+
+	TP_printk(HW_ERR "mce#%d: Uncorrected Error: %s\n",
+		__entry->mc_index,
+		__entry->msg)
+);
+
+
+
+/*
+ * MCE Events placeholder. Please add non-memory events that come from the
+ * MCE driver here
+ */
+
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8



* [PATCH v3 02/31] events/hw_event: use __string() trace macros for events
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

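Some of the data passed to these events lives in temporarily allocated
buffers. Just storing the string pointers won't work in such cases, since
the buffers may be gone (or reused) by the time the trace record is read.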
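A user-space analogy (not kernel code) of the problem: storing only the
pointer, as __field(const char *, ...) does, means the record shows
whatever the buffer holds when it is finally read, while copying the
bytes, as __string()/__assign_str() do, freezes the value at event time.
The fixed-size array and the struct/function names are simplifications
for illustration; the real macros size the string field dynamically in
the ring buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Analogy of __field(const char *, msg): only the pointer is kept. */
struct rec_ptr {
	const char *msg;
};

/* Analogy of __string(msg, msg): the bytes live inside the record. */
struct rec_copy {
	char msg[32];			/* fixed size only for this sketch */
};

static void record_ptr(struct rec_ptr *r, const char *msg)
{
	r->msg = msg;			/* breaks if msg's buffer is reused */
}

static void record_copy(struct rec_copy *r, const char *msg)
{
	/* like __assign_str(): copy the string (truncated to fit here) */
	snprintf(r->msg, sizeof(r->msg), "%s", msg);
}
```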

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 include/trace/events/hw_event.h |   78 +++++++++++++++++++-------------------
 1 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index 3735c6f..078a099 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -20,17 +20,17 @@ DECLARE_EVENT_CLASS(hw_event_class,
 	TP_ARGS(type, instance),
 
 	TP_STRUCT__entry(
-		__field(	const char *,	type			)
+		__string(	type,		type			)
 		__field(	unsigned int,	instance		)
 	),
 
 	TP_fast_assign(
-		__entry->type	= type;
+		__assign_str(type, type);
 		__entry->instance = instance;
 	),
 
 	TP_printk("Initialized %s#%d\n",
-		__entry->type,
+		__get_str(type),
 		__entry->instance)
 );
 
@@ -70,8 +70,8 @@ TRACE_EVENT(mc_corrected_error,
 		__field(	unsigned long,	syndrome		)
 		__field(	int,		row			)
 		__field(	int,		channel			)
-		__field(	const char *,	label			)
-		__field(	const char *,	msg			)
+		__string(	label,		mci->csrows[row].channels[channel].label)
+		__string(	msg,		msg			)
 	),
 
 	TP_fast_assign(
@@ -82,16 +82,16 @@ TRACE_EVENT(mc_corrected_error,
 		__entry->syndrome		= syndrome;
 		__entry->row			= row;
 		__entry->channel		= channel;
-		__entry->label			= mci->csrows[row].channels[channel].label;
-		__entry->msg			= msg;
+		__assign_str(label, mci->csrows[row].channels[channel].label);
+		__assign_str(msg, msg);
 	),
 
 	TP_printk(HW_ERR "mce#%d: Corrected error %s on label \"%s\" "
 			 "(page 0x%lux, offset 0x%lux, grain %ud, "
 			 "syndrome 0x%lux, row %d, channel %d)\n",
 		__entry->mc_index,
-		__entry->msg,
-		__entry->label,
+		__get_str(msg),
+		__get_str(label),
 		__entry->page_frame_number,
 		__entry->offset_in_page,
 		__entry->grain,
@@ -116,8 +116,8 @@ TRACE_EVENT(mc_uncorrected_error,
 		__field(	unsigned long,	offset_in_page		)
 		__field(	u32,		grain			)
 		__field(	int,		row			)
-		__field(	const char *,	msg			)
-		__field(	const char *,	label			)
+		__string(	msg,		msg			)
+		__string(	label,		label			)
 	),
 
 	TP_fast_assign(
@@ -126,15 +126,15 @@ TRACE_EVENT(mc_uncorrected_error,
 		__entry->offset_in_page		= offset_in_page;
 		__entry->grain			= mci->csrows[row].grain;
 		__entry->row			= row;
-		__entry->msg			= msg;
-		__entry->label			= label;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
 	),
 
 	TP_printk(HW_ERR "mce#%d: Uncorrected error %s on label \"%s\""
 			 "(page 0x%lux, offset 0x%lux, grain %ud, row %d)\n",
 		__entry->mc_index,
-		__entry->msg,
-		__entry->label,
+		__get_str(msg),
+		__get_str(label),
 		__entry->page_frame_number,
 		__entry->offset_in_page,
 		__entry->grain,
@@ -160,23 +160,23 @@ TRACE_EVENT(mc_corrected_error_fbd,
 		__field(	unsigned int,	mc_index		)
 		__field(	int,		row			)
 		__field(	int,		channel			)
-		__field(	const char *,	label			)
-		__field(	const char *,	msg			)
+		__string(	label,		mci->csrows[row].channels[channel].label)
+		__string(	msg,		msg			)
 	),
 
 	TP_fast_assign(
 		__entry->mc_index		= mci->mc_idx;
 		__entry->row			= row;
 		__entry->channel		= channel;
-		__entry->label			= mci->csrows[row].channels[channel].label;
-		__entry->msg			= msg;
+		__assign_str(label, mci->csrows[row].channels[channel].label);
+		__assign_str(msg, msg);
 	),
 
 	TP_printk(HW_ERR "mce#%d: Corrected Error %s on label \"%s\" "
 			 "(row %d, channel %d)\n",
 		__entry->mc_index,
-		__entry->msg,
-		__entry->label,
+		__get_str(msg),
+		__get_str(label),
 		__entry->row,
 		__entry->channel)
 );
@@ -194,8 +194,8 @@ TRACE_EVENT(mc_uncorrected_error_fbd,
 		__field(	int,		row			)
 		__field(	int,		channela		)
 		__field(	int,		channelb		)
-		__field(	const char *,	msg			)
-		__field(	const char *,	label			)
+		__string(	msg,		msg			)
+		__string(	label,		label			)
 	),
 
 	TP_fast_assign(
@@ -203,15 +203,15 @@ TRACE_EVENT(mc_uncorrected_error_fbd,
 		__entry->row			= row;
 		__entry->channela		= channela;
 		__entry->channelb		= channelb;
-		__entry->msg			= msg;
-		__entry->label			= label;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
 	),
 
 	TP_printk(HW_ERR "mce#%d: Uncorrected Error %s on label \"%s\" "
 			 "(row %d, channels: %d, %d)\n",
 		__entry->mc_index,
-		__entry->msg,
-		__entry->label,
+		__get_str(msg),
+		__get_str(label),
 		__entry->row,
 		__entry->channela,
 		__entry->channelb)
@@ -236,8 +236,8 @@ TRACE_EVENT(mc_out_of_range,
 	TP_ARGS(mci, type, field, invalid_val, min, max),
 
 	TP_STRUCT__entry(
-		__field(	const char *,	type			)
-		__field(	const char *,	field			)
+		__string(	type,		type			)
+		__string(	field,		field			)
 		__field(	unsigned int,	mc_index		)
 		__field(	int,		invalid_val		)
 		__field(	int,		min			)
@@ -245,8 +245,8 @@ TRACE_EVENT(mc_out_of_range,
 	),
 
 	TP_fast_assign(
-		__entry->type			= type;
-		__entry->field			= field;
+		__assign_str(type, type);
+		__assign_str(field, field);
 		__entry->mc_index		= mci->mc_idx;
 		__entry->invalid_val		= invalid_val;
 		__entry->min			= min;
@@ -255,8 +255,8 @@ TRACE_EVENT(mc_out_of_range,
 
 	TP_printk(HW_ERR "mce#%d %s: %s=%d is not between %d and %d\n",
 		__entry->mc_index,
-		__entry->type,
-		__entry->field,
+		__get_str(type),
+		__get_str(field),
 		__entry->invalid_val,
 		__entry->min,
 		__entry->max)
@@ -274,18 +274,18 @@ TRACE_EVENT(mc_corrected_error_no_info,
 	TP_ARGS(mci, msg),
 
 	TP_STRUCT__entry(
-		__field(	const char *,	msg			)
+	__string(	msg,			msg			)
 		__field(	unsigned int,	mc_index		)
 	),
 
 	TP_fast_assign(
-		__entry->msg			= msg;
+		__assign_str(msg, msg);
 		__entry->mc_index		= mci->mc_idx;
 	),
 
 	TP_printk(HW_ERR "mce#%d: Corrected Error: %s\n",
 		__entry->mc_index,
-		__entry->msg)
+		__get_str(msg))
 );
 
 TRACE_EVENT(mc_uncorrected_error_no_info,
@@ -294,18 +294,18 @@ TRACE_EVENT(mc_uncorrected_error_no_info,
 	TP_ARGS(mci, msg),
 
 	TP_STRUCT__entry(
-		__field(	const char *,	msg			)
+		__string(	msg,		msg			)
 		__field(	unsigned int,	mc_index		)
 	),
 
 	TP_fast_assign(
-		__entry->msg			= msg;
+		__assign_str(msg, msg);
 		__entry->mc_index		= mci->mc_idx;
 	),
 
 	TP_printk(HW_ERR "mce#%d: Uncorrected Error: %s\n",
 		__entry->mc_index,
-		__entry->msg)
+		__get_str(msg))
 );
 
 
-- 
1.7.8



* [PATCH v3 03/31] hw_event: Consolidate uncorrected/corrected error msgs into one
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

This is an RFC patch, consolidating the corrected and uncorrected
error trace events (including the FB-DIMM and no-info variants)
into a single mc_error event. Not sure if this is the best thing
to do, but it simplifies the error tracepoint while still keeping
the technical details that may be needed by someone debugging the
driver, or by the vendors to double-check what's happening inside
the system.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c          |   51 +++++++--
 include/linux/edac.h            |    6 +
 include/trace/events/hw_event.h |  231 ++++-----------------------------------
 3 files changed, 68 insertions(+), 220 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 2b8382e..5038239 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -685,6 +685,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		int row, int channel, const char *msg)
 {
 	unsigned long remapped_page;
+	char detail[80];
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -711,8 +712,15 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
-	trace_mc_corrected_error(mci, page_frame_number, offset_in_page,
-				syndrome, row, channel, msg);
+	/* Memory type dependent details about the error */
+	snprintf(detail, sizeof(detail),
+		 " (page 0x%lx, offset 0x%lx, grain %d, "
+		 "syndrome 0x%lx, row %d, channel %d)\n",
+		 page_frame_number, offset_in_page,
+		 mci->csrows[row].grain, syndrome, row, channel);
+	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
+		       mci->csrows[row].channels[channel].label,
+		       msg, detail);
 
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
@@ -749,7 +757,8 @@ EXPORT_SYMBOL_GPL(edac_mc_handle_ce);
 
 void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci, const char *msg)
 {
-	trace_mc_corrected_error_no_info(mci, msg);
+	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
+		       "unknown", msg, "");
 	if (edac_mc_get_log_ce())
 		edac_mc_printk(mci, KERN_WARNING,
 			"CE - no information available: %s\n", msg);
@@ -768,6 +777,7 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 	char *pos = labels;
 	int chan;
 	int chars;
+	char detail[80];
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -796,8 +806,15 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 		pos += chars;
 	}
 
-	trace_mc_uncorrected_error(mci, page_frame_number, offset_in_page,
-				row, msg, labels);
+	/* Memory type dependent details about the error */
+	snprintf(detail, sizeof(detail),
+		 "page 0x%lx, offset 0x%lx, grain %d, row %d ",
+		 page_frame_number, offset_in_page,
+		 mci->csrows[row].grain, row);
+	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
+		       labels,
+		       msg, detail);
+
 	if (edac_mc_get_log_ue())
 		edac_mc_printk(mci, KERN_EMERG,
 			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
@@ -818,7 +835,8 @@ EXPORT_SYMBOL_GPL(edac_mc_handle_ue);
 
 void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci, const char *msg)
 {
-	trace_mc_uncorrected_error_no_info(mci, msg);
+	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
+		       "unknown", msg, "");
 	if (edac_mc_get_panic_on_ue())
 		panic("EDAC MC%d: Uncorrected Error", mci->mc_idx);
 
@@ -843,6 +861,7 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	char labels[len + 1];
 	char *pos = labels;
 	int chars;
+	char detail[80];
 
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
@@ -891,8 +910,13 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	chars = snprintf(pos, len + 1, "-%s",
 			 mci->csrows[csrow].channels[channelb].label);
 
-	trace_mc_uncorrected_error_fbd(mci, csrow, channela, channelb,
-				       msg, labels);
+	/* Memory type dependent details about the error */
+	snprintf(detail, sizeof(detail),
+		 "row %d, channel-a= %d channel-b= %d ",
+		 csrow, channela, channelb);
+	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
+		       labels,
+		       msg, detail);
 	if (edac_mc_get_log_ue())
 		edac_mc_printk(mci, KERN_EMERG,
 			"UE row %d, channel-a= %d channel-b= %d "
@@ -913,7 +937,7 @@ EXPORT_SYMBOL(edac_mc_handle_fbd_ue);
 void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 			unsigned int csrow, unsigned int channel, char *msg)
 {
-
+	char detail[80];
 	/* Ensure boundary values */
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
@@ -936,7 +960,14 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
-	trace_mc_corrected_error_fbd(mci, csrow, channel, msg);
+	/* Memory type dependent details about the error */
+	snprintf(detail, sizeof(detail),
+		 "(row %d, channel %d)\n",
+		 csrow, channel);
+	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
+		       mci->csrows[csrow].channels[channel].label,
+		       msg, detail);
+
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
 		edac_mc_printk(mci, KERN_WARNING,
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 055b248..3ba99d7 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -66,6 +66,12 @@ enum dev_type {
 #define DEV_FLAG_X32		BIT(DEV_X32)
 #define DEV_FLAG_X64		BIT(DEV_X64)
 
+enum hw_event_mc_err_type {
+	HW_EVENT_ERR_CORRECTED,
+	HW_EVENT_ERR_UNCORRECTED,
+	HW_EVENT_ERR_FATAL,
+};
+
 /* memory types */
 enum mem_type {
 	MEM_EMPTY = 0,		/* Empty csrow */
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index 078a099..fee7ed2 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -52,183 +52,42 @@ DEFINE_EVENT(hw_event_class, hw_event_init,
 /*
  * Default error mechanisms for Memory Controller errors (CE and UE)
  */
-TRACE_EVENT(mc_corrected_error,
+TRACE_EVENT(mc_error,
 
-	TP_PROTO(struct mem_ctl_info *mci,
-		unsigned long page_frame_number,
-		unsigned long offset_in_page, unsigned long syndrome,
-		int row, int channel, const char *msg),
+	TP_PROTO(unsigned int err_type,
+		 unsigned int mc_index,
+		 const char *label,
+		 const char *msg,
+		 const char *detail),
 
-	TP_ARGS(mci, page_frame_number, offset_in_page, syndrome, row,
-		channel, msg),
+	TP_ARGS(err_type, mc_index, label, msg, detail),
 
 	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type		)
 		__field(	unsigned int,	mc_index		)
-		__field(	unsigned long,	page_frame_number	)
-		__field(	unsigned long,	offset_in_page		)
-		__field(	u32,		grain			)
-		__field(	unsigned long,	syndrome		)
-		__field(	int,		row			)
-		__field(	int,		channel			)
-		__string(	label,		mci->csrows[row].channels[channel].label)
-		__string(	msg,		msg			)
-	),
-
-	TP_fast_assign(
-		__entry->mc_index		= mci->mc_idx;
-		__entry->page_frame_number	= page_frame_number;
-		__entry->offset_in_page		= offset_in_page;
-		__entry->grain			= mci->csrows[row].grain;
-		__entry->syndrome		= syndrome;
-		__entry->row			= row;
-		__entry->channel		= channel;
-		__assign_str(label, mci->csrows[row].channels[channel].label);
-		__assign_str(msg, msg);
-	),
-
-	TP_printk(HW_ERR "mce#%d: Corrected error %s on label \"%s\" "
-			 "(page 0x%lux, offset 0x%lux, grain %ud, "
-			 "syndrome 0x%lux, row %d, channel %d)\n",
-		__entry->mc_index,
-		__get_str(msg),
-		__get_str(label),
-		__entry->page_frame_number,
-		__entry->offset_in_page,
-		__entry->grain,
-		__entry->syndrome,
-		__entry->row,
-		__entry->channel)
-);
-
-TRACE_EVENT(mc_uncorrected_error,
-
-	TP_PROTO(struct mem_ctl_info *mci,
-		unsigned long page_frame_number,
-		unsigned long offset_in_page,
-		int row, const char *msg, const char *label),
-
-	TP_ARGS(mci, page_frame_number, offset_in_page,
-		row, msg, label),
-
-	TP_STRUCT__entry(
-		__field(	unsigned int,	mc_index		)
-		__field(	unsigned long,	page_frame_number	)
-		__field(	unsigned long,	offset_in_page		)
-		__field(	u32,		grain			)
-		__field(	int,		row			)
-		__string(	msg,		msg			)
 		__string(	label,		label			)
-	),
-
-	TP_fast_assign(
-		__entry->mc_index		= mci->mc_idx;
-		__entry->page_frame_number	= page_frame_number;
-		__entry->offset_in_page		= offset_in_page;
-		__entry->grain			= mci->csrows[row].grain;
-		__entry->row			= row;
-		__assign_str(msg, msg);
-		__assign_str(label, label);
-	),
-
-	TP_printk(HW_ERR "mce#%d: Uncorrected error %s on label \"%s\""
-			 "(page 0x%lux, offset 0x%lux, grain %ud, row %d)\n",
-		__entry->mc_index,
-		__get_str(msg),
-		__get_str(label),
-		__entry->page_frame_number,
-		__entry->offset_in_page,
-		__entry->grain,
-		__entry->row)
-);
-
-
-/*
- * Fully-Buffered memory hardware in general don't provide syndrome/grain/row
- * information for all types of errors. So, we need to either have another
- * trace event or add a bitmapped field to indicate that some info are not
- * provided and use the previously-declared event. It seemed easier and less
- * confusing to create a different event for such cases
- */
-TRACE_EVENT(mc_corrected_error_fbd,
-
-	TP_PROTO(struct mem_ctl_info *mci,
-		int row, int channel, const char *msg),
-
-	TP_ARGS(mci, row, channel, msg),
-
-	TP_STRUCT__entry(
-		__field(	unsigned int,	mc_index		)
-		__field(	int,		row			)
-		__field(	int,		channel			)
-		__string(	label,		mci->csrows[row].channels[channel].label)
 		__string(	msg,		msg			)
+		__string(	detail,		detail			)
 	),
 
 	TP_fast_assign(
-		__entry->mc_index		= mci->mc_idx;
-		__entry->row			= row;
-		__entry->channel		= channel;
-		__assign_str(label, mci->csrows[row].channels[channel].label);
-		__assign_str(msg, msg);
-	),
-
-	TP_printk(HW_ERR "mce#%d: Corrected Error %s on label \"%s\" "
-			 "(row %d, channel %d)\n",
-		__entry->mc_index,
-		__get_str(msg),
-		__get_str(label),
-		__entry->row,
-		__entry->channel)
-);
-
-TRACE_EVENT(mc_uncorrected_error_fbd,
-
-	TP_PROTO(struct mem_ctl_info *mci,
-		int row, int channela, int channelb,
-		const char *msg, const char *label),
-
-	TP_ARGS(mci, row, channela, channelb, msg, label),
-
-	TP_STRUCT__entry(
-		__field(	unsigned int,	mc_index		)
-		__field(	int,		row			)
-		__field(	int,		channela		)
-		__field(	int,		channelb		)
-		__string(	msg,		msg			)
-		__string(	label,		label			)
-	),
-
-	TP_fast_assign(
-		__entry->mc_index		= mci->mc_idx;
-		__entry->row			= row;
-		__entry->channela		= channela;
-		__entry->channelb		= channelb;
-		__assign_str(msg, msg);
+		__entry->err_type		= err_type;
+		__entry->mc_index		= mc_index;
 		__assign_str(label, label);
+		__assign_str(msg, msg);
+		__assign_str(detail, detail);
 	),
 
-	TP_printk(HW_ERR "mce#%d: Uncorrected Error %s on label \"%s\" "
-			 "(row %d, channels: %d, %d)\n",
-		__entry->mc_index,
-		__get_str(msg),
-		__get_str(label),
-		__entry->row,
-		__entry->channela,
-		__entry->channelb)
+	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" %s\n",
+		  __entry->mc_index,
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __get_str(detail))
 );
 
-/*
- * The Memory controller driver needs to discover the memory topology, in
- * order to associate a hardware error with the memory label. If, for any
- * reason, it receives an error for a channel or row that are not supposed
- * to be there, an error event needs to be generated to indicate:
- *	- that a Corrected or Uncorrected error was received;
- *	- that the driver has a bug and, for that particular hardware, was
- *	  not capable of detecting the hardware architecture
- * If one of such errors is ever received, a bug to the kernel driver must
- * be filled.
- */
-
 TRACE_EVENT(mc_out_of_range,
 	TP_PROTO(struct mem_ctl_info *mci, const char *type, const char *field,
 		int invalid_val, int min, int max),
@@ -263,54 +122,6 @@ TRACE_EVENT(mc_out_of_range,
 );
 
 /*
- * On some cases, a corrected or uncorrected error was detected, but it
- * couldn't be properly handled, or because another error overrided the
- * error registers that details the error or because of some internal problem
- * on the driver. Those events bellow are meant for those error types.
- */
-TRACE_EVENT(mc_corrected_error_no_info,
-	TP_PROTO(struct mem_ctl_info *mci, const char *msg),
-
-	TP_ARGS(mci, msg),
-
-	TP_STRUCT__entry(
-	__string(	msg,			msg			)
-		__field(	unsigned int,	mc_index		)
-	),
-
-	TP_fast_assign(
-		__assign_str(msg, msg);
-		__entry->mc_index		= mci->mc_idx;
-	),
-
-	TP_printk(HW_ERR "mce#%d: Corrected Error: %s\n",
-		__entry->mc_index,
-		__get_str(msg))
-);
-
-TRACE_EVENT(mc_uncorrected_error_no_info,
-	TP_PROTO(struct mem_ctl_info *mci, const char *msg),
-
-	TP_ARGS(mci, msg),
-
-	TP_STRUCT__entry(
-		__string(	msg,		msg			)
-		__field(	unsigned int,	mc_index		)
-	),
-
-	TP_fast_assign(
-		__assign_str(msg, msg);
-		__entry->mc_index		= mci->mc_idx;
-	),
-
-	TP_printk(HW_ERR "mce#%d: Uncorrected Error: %s\n",
-		__entry->mc_index,
-		__get_str(msg))
-);
-
-
-
-/*
  * MCE Events placeholder. Please add non-memory events that come from the
  * MCE driver here
  */
-- 
1.7.8



* [PATCH v3 04/31] drivers/edac: rename channel_info to csrow_channel_info
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Newer memory architectures use the term "channel" with a different
meaning.

On a traditional architecture, the memory controller directly
accesses the memory ranks via chip select rows.

The memory controller and the memory configuration can be single-channel
or dual-channel. Single-channel means 64-bit access, while dual-channel
means 128-bit parallel access, provided by two memory sticks that are
accessed at the same time. The addresses between channel A and channel B
are interleaved. The memories on each channel should be identical for
this to work.

When two channels are provided, the DIMMs are generally called
DIMM 1A/DIMM 1B (where 1A means DIMM 1 on channel A, and so on).

On some modern memory architectures, like FB-DIMM, there's a chip,
called the Advanced Memory Buffer (AMB), that serves as the interface
between the memory controller and the memory chips. So, the memory
controller sees independent memory channels and doesn't actually select
the memory via a "chip select". Instead, it passes the DIMM slot it wants
to access to the AMB.

It is up to the AMB to talk with the csrows of the DRAM chips.

The bus that exchanges information between the memory controller and
the CPU is also called "channel", but it is not associated with
channel interleaving, as each channel is independent. The entire
csrow concept is not even visible to the memory controller, as handling
csrows is a task for the AMB.

Newer memory controllers, like the one found on Intel Sandy Bridge
processors, don't use the channel A/channel B interleaving scheme to
provide 128 bits, even when working with normal DDR3 DIMMs. Instead,
they have more channels (3 or 4), support several interleaving schemes,
and the csrow concept is not directly visible to the memory controller.

The drivers that support such newer memory architecture models
currently need to fake information and abuse the EDAC structures,
as the subsystem was conceived with the idea that the csrow would always
be visible to the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels in a consistent way, as those concepts don't
apply to the memory controllers they're talking to.

In order to fix it, let's rename struct channel_info to
struct csrow_channel_info, to make it clearer that this channel info
will only be used when the csrow concept applies.

Later patches will provide a better way to represent the memory
hierarchy, in a way that works with other memory architectures.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |    6 +++---
 include/linux/edac.h   |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 5038239..8776f30 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -43,7 +43,7 @@ static LIST_HEAD(mc_devices);
 
 #ifdef CONFIG_EDAC_DEBUG
 
-static void edac_mc_dump_channel(struct channel_info *chan)
+static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 {
 	debugf4("\tchannel = %p\n", chan);
 	debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
@@ -160,7 +160,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 {
 	struct mem_ctl_info *mci;
 	struct csrow_info *csi, *csrow;
-	struct channel_info *chi, *chp, *chan;
+	struct csrow_channel_info *chi, *chp, *chan;
 	void *pvt;
 	unsigned size;
 	int row, chn;
@@ -185,7 +185,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	 * rather than an imaginary chunk of memory located at address 0.
 	 */
 	csi = (struct csrow_info *)(((char *)mci) + ((unsigned long)csi));
-	chi = (struct channel_info *)(((char *)mci) + ((unsigned long)chi));
+	chi = (struct csrow_channel_info *)(((char *)mci) + ((unsigned long)chi));
 	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
 	/* setup index and various internal pointers */
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 3ba99d7..6e3ab94 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -191,7 +191,7 @@ enum scrub_type {
  * Socket:		A physical connector on the motherboard that accepts
  *			a single memory stick.
  *
- * Channel:		Set of memory devices on a memory stick that must be
+ * Csrow-channel:	Set of memory devices on a memory stick that must be
  *			grouped in parallel with one or more additional
  *			channels from other memory sticks.  This parallel
  *			grouping of the output from multiple channels are
@@ -249,7 +249,7 @@ enum scrub_type {
  * PS - I enjoyed writing all that about as much as you enjoyed reading it.
  */
 
-struct channel_info {
+struct csrow_channel_info {
 	int chan_idx;		/* channel index */
 	u32 ce_count;		/* Correctable Errors for this CHANNEL */
 	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
@@ -276,7 +276,7 @@ struct csrow_info {
 
 	/* channel information for this csrow */
 	u32 nr_channels;
-	struct channel_info *channels;
+	struct csrow_channel_info *channels;
 };
 
 struct mcidev_sysfs_group {
-- 
1.7.8



* [PATCH v3 05/31] edac: Create a dimm struct and move the labels into it
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The way a DIMM is currently represented implies that it is
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're hidden behind a chip like the AMB
on FB-DIMMs, for example.

This forced drivers to fake a csrow struct, making a mess of
the original csrow/channel concept.

Move the DIMM labels into a per-DIMM struct, and store there
the real location of the socket: csrow/channel on csrow-based
architectures, or channel/DIMM number on modern architectures.

On three drivers based on modern architectures
(i5100_edac, sb_edac and i7core_edac), the labels were
filled inside the driver as a way to avoid losing the
channel/DIMM number. Those drivers were converted to
properly fill the DIMM location properties internally.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a later conversion, as
they also fake the csrows internally.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c       |   95 ++++++++++++++++++++++++++++-------------
 drivers/edac/edac_mc_sysfs.c |   15 ++++--
 drivers/edac/i5100_edac.c    |   28 ++++++++++---
 drivers/edac/i7core_edac.c   |   18 ++++++--
 drivers/edac/i82975x_edac.c  |   13 +++++-
 drivers/edac/sb_edac.c       |   18 ++++++--
 include/linux/edac.h         |   31 +++++++++++++-
 7 files changed, 164 insertions(+), 54 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 8776f30..93ef044 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -48,7 +48,8 @@ static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 	debugf4("\tchannel = %p\n", chan);
 	debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
 	debugf4("\tchannel->ce_count = %d\n", chan->ce_count);
-	debugf4("\tchannel->label = '%s'\n", chan->label);
+	if (chan->dimm)
+		debugf4("\tchannel->label = '%s'\n", chan->dimm->label);
 	debugf4("\tchannel->csrow = %p\n\n", chan->csrow);
 }
 
@@ -161,6 +162,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	struct mem_ctl_info *mci;
 	struct csrow_info *csi, *csrow;
 	struct csrow_channel_info *chi, *chp, *chan;
+	struct dimm_info *dimm;
 	void *pvt;
 	unsigned size;
 	int row, chn;
@@ -174,7 +176,8 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	mci = (struct mem_ctl_info *)0;
 	csi = edac_align_ptr(&mci[1], sizeof(*csi));
 	chi = edac_align_ptr(&csi[nr_csrows], sizeof(*chi));
-	pvt = edac_align_ptr(&chi[nr_chans * nr_csrows], sz_pvt);
+	dimm = edac_align_ptr(&chi[nr_chans * nr_csrows], sizeof(*dimm));
+	pvt = edac_align_ptr(&dimm[nr_chans * nr_csrows], sz_pvt);
 	size = ((unsigned long)pvt) + sz_pvt;
 
 	mci = kzalloc(size, GFP_KERNEL);
@@ -186,11 +189,13 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	 */
 	csi = (struct csrow_info *)(((char *)mci) + ((unsigned long)csi));
 	chi = (struct csrow_channel_info *)(((char *)mci) + ((unsigned long)chi));
+	dimm = (struct dimm_info *)(((char *)mci) + ((unsigned long)dimm));
 	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
 	/* setup index and various internal pointers */
 	mci->mc_idx = edac_index;
 	mci->csrows = csi;
+	mci->dimms  = dimm;
 	mci->pvt_info = pvt;
 	mci->nr_csrows = nr_csrows;
 
@@ -507,18 +512,37 @@ EXPORT_SYMBOL(edac_mc_find);
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc(struct mem_ctl_info *mci)
 {
+	int i, j;
+	struct dimm_info *dimm;
+
 	debugf0("%s()\n", __func__);
 
+	/*
+	 * If nr_dimms is not filled, that means that the driver itself
+	 * was not converted to use the new struct, or that the driver
+	 * is for a csrow-based device.
+	 * Fill the dimms accordingly.
+	 */
+	if (!mci->nr_dimms) {
+		mci->dimm_loc_type = DIMM_LOC_CSROW;
+		dimm = mci->dimms;
+		for (i = 0; i < mci->nr_csrows; i++) {
+			for (j = 0; j < mci->csrows[i].nr_channels; j++) {
+				mci->csrows[i].channels[j].dimm = dimm;
+				dimm->location.csrow = i;
+				dimm->location.csrow_channel = j;
+				dimm++;
+				mci->nr_dimms++;
+			}
+		}
+	}
 #ifdef CONFIG_EDAC_DEBUG
 	if (edac_debug_level >= 3)
 		edac_mc_dump_mci(mci);
 
 	if (edac_debug_level >= 4) {
-		int i;
 
 		for (i = 0; i < mci->nr_csrows; i++) {
-			int j;
-
 			edac_mc_dump_csrow(&mci->csrows[i]);
 			for (j = 0; j < mci->csrows[i].nr_channels; j++)
 				edac_mc_dump_channel(&mci->csrows[i].
@@ -685,7 +709,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		int row, int channel, const char *msg)
 {
 	unsigned long remapped_page;
-	char detail[80];
+	char detail[80], *label = NULL;
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -712,6 +736,9 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
+	if (mci->csrows[row].channels[channel].dimm)
+		label = mci->csrows[row].channels[channel].dimm->label;
+
 	/* Memory type dependent details about the error */
 	snprintf(detail, sizeof(detail),
 		 " (page 0x%lx, offset 0x%lx, grain %d, "
@@ -719,8 +746,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		 page_frame_number, offset_in_page,
 		 mci->csrows[row].grain, syndrome, row, channel);
 	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
-		       mci->csrows[row].channels[channel].label,
-		       msg, detail);
+		       label, msg, detail);
 
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
@@ -729,7 +755,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
 			page_frame_number, offset_in_page,
 			mci->csrows[row].grain, syndrome, row, channel,
-			mci->csrows[row].channels[channel].label, msg);
+			label, msg);
 
 	mci->ce_count++;
 	mci->csrows[row].ce_count++;
@@ -777,7 +803,7 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 	char *pos = labels;
 	int chan;
 	int chars;
-	char detail[80];
+	char detail[80], *label = NULL;
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -793,17 +819,21 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 		return;
 	}
 
-	chars = snprintf(pos, len + 1, "%s",
-			 mci->csrows[row].channels[0].label);
-	len -= chars;
-	pos += chars;
+	if (mci->csrows[row].channels[0].dimm) {
+		label = mci->csrows[row].channels[0].dimm->label;
+		chars = snprintf(pos, len + 1, "%s", label);
+		len -= chars;
+		pos += chars;
+	}
 
 	for (chan = 1; (chan < mci->csrows[row].nr_channels) && (len > 0);
 		chan++) {
-		chars = snprintf(pos, len + 1, ":%s",
-				 mci->csrows[row].channels[chan].label);
-		len -= chars;
-		pos += chars;
+		if (mci->csrows[row].channels[chan].dimm) {
+			label = mci->csrows[row].channels[chan].dimm->label;
+			chars = snprintf(pos, len + 1, ":%s", label);
+			len -= chars;
+			pos += chars;
+		}
 	}
 
 	/* Memory type dependent details about the error */
@@ -861,7 +891,7 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	char labels[len + 1];
 	char *pos = labels;
 	int chars;
-	char detail[80];
+	char detail[80], *label;
 
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
@@ -903,12 +933,15 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	mci->csrows[csrow].ue_count++;
 
 	/* Generate the DIMM labels from the specified channels */
-	chars = snprintf(pos, len + 1, "%s",
-			 mci->csrows[csrow].channels[channela].label);
-	len -= chars;
-	pos += chars;
-	chars = snprintf(pos, len + 1, "-%s",
-			 mci->csrows[csrow].channels[channelb].label);
+	if (mci->csrows[csrow].channels[channela].dimm) {
+		label = mci->csrows[csrow].channels[channela].dimm->label;
+		chars = snprintf(pos, len + 1, "%s", label);
+		len -= chars;
+		pos += chars;
+	}
+	if (mci->csrows[csrow].channels[channelb].dimm)
+		chars = snprintf(pos, len + 1, "-%s",
+				mci->csrows[csrow].channels[channelb].dimm->label);
 
 	/* Memory type dependent details about the error */
 	snprintf(detail, sizeof(detail),
@@ -937,7 +970,7 @@ EXPORT_SYMBOL(edac_mc_handle_fbd_ue);
 void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 			unsigned int csrow, unsigned int channel, char *msg)
 {
-	char detail[80];
+	char detail[80], *label = NULL;
 	/* Ensure boundary values */
 	if (csrow >= mci->nr_csrows) {
 		/* something is wrong */
@@ -964,16 +997,18 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 	snprintf(detail, sizeof(detail),
 		 "(row %d, channel %d)\n",
 		 csrow, channel);
+
+	if (mci->csrows[csrow].channels[channel].dimm)
+		label = mci->csrows[csrow].channels[channel].dimm->label;
+
 	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
-		       mci->csrows[csrow].channels[channel].label,
-		       msg, detail);
+		       label, msg, detail);
 
 	if (edac_mc_get_log_ce())
 		/* FIXME - put in DIMM location */
 		edac_mc_printk(mci, KERN_WARNING,
 			"CE row %d, channel %d, label \"%s\": %s\n",
-			csrow, channel,
-			mci->csrows[csrow].channels[channel].label, msg);
+			csrow, channel, label, msg);
 
 	mci->ce_count++;
 	mci->csrows[csrow].ce_count++;
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 29ffa35..a439bed 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -170,11 +170,13 @@ static ssize_t channel_dimm_label_show(struct csrow_info *csrow,
 				char *data, int channel)
 {
 	/* if field has not been initialized, there is nothing to send */
-	if (!csrow->channels[channel].label[0])
+	if (!csrow->channels[channel].dimm)
+		return 0;
+	if (!csrow->channels[channel].dimm->label[0])
 		return 0;
 
 	return snprintf(data, EDAC_MC_LABEL_LEN, "%s\n",
-			csrow->channels[channel].label);
+			csrow->channels[channel].dimm->label);
 }
 
 static ssize_t channel_dimm_label_store(struct csrow_info *csrow,
@@ -183,9 +185,12 @@ static ssize_t channel_dimm_label_store(struct csrow_info *csrow,
 {
 	ssize_t max_size = 0;
 
+	if (!csrow->channels[channel].dimm)
+		return -EINVAL;
+
 	max_size = min((ssize_t) count, (ssize_t) EDAC_MC_LABEL_LEN - 1);
-	strncpy(csrow->channels[channel].label, data, max_size);
-	csrow->channels[channel].label[max_size] = '\0';
+	strncpy(csrow->channels[channel].dimm->label, data, max_size);
+	csrow->channels[channel].dimm->label[max_size] = '\0';
 
 	return max_size;
 }
@@ -952,7 +957,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 	/* CSROW error: backout what has already been registered,  */
 fail1:
 	for (i--; i >= 0; i--) {
-		if (csrow->nr_pages > 0) {
+		if (mci->csrows[i].nr_pages > 0) {
 			kobject_put(&mci->csrows[i].kobj);
 		}
 	}
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index bcbdeec..302e43b 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -428,12 +428,16 @@ static void i5100_handle_ce(struct mem_ctl_info *mci,
 			    const char *msg)
 {
 	const int csrow = i5100_rank_to_csrow(mci, chan, rank);
+	char *label = NULL;
+
+	if (mci->csrows[csrow].channels[0].dimm)
+		label = mci->csrows[csrow].channels[0].dimm->label;
 
 	printk(KERN_ERR
 		"CE chan %d, bank %u, rank %u, syndrome 0x%lx, "
 		"cas %u, ras %u, csrow %u, label \"%s\": %s\n",
 		chan, bank, rank, syndrome, cas, ras,
-		csrow, mci->csrows[csrow].channels[0].label, msg);
+		csrow, label, msg);
 
 	mci->ce_count++;
 	mci->csrows[csrow].ce_count++;
@@ -450,12 +454,16 @@ static void i5100_handle_ue(struct mem_ctl_info *mci,
 			    const char *msg)
 {
 	const int csrow = i5100_rank_to_csrow(mci, chan, rank);
+	char *label = NULL;
+
+	if (mci->csrows[csrow].channels[0].dimm)
+		label = mci->csrows[csrow].channels[0].dimm->label;
 
 	printk(KERN_ERR
 		"UE chan %d, bank %u, rank %u, syndrome 0x%lx, "
 		"cas %u, ras %u, csrow %u, label \"%s\": %s\n",
 		chan, bank, rank, syndrome, cas, ras,
-		csrow, mci->csrows[csrow].channels[0].label, msg);
+		csrow, label, msg);
 
 	mci->ue_count++;
 	mci->csrows[csrow].ue_count++;
@@ -840,7 +848,10 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 	int i;
 	unsigned long total_pages = 0UL;
 	struct i5100_priv *priv = mci->pvt_info;
+	struct dimm_info *dimm;
 
+	dimm = mci->dimms;
+	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
 	for (i = 0; i < mci->nr_csrows; i++) {
 		const unsigned long npages = i5100_npages(mci, i);
 		const unsigned chan = i5100_csrow_to_chan(mci, i);
@@ -871,11 +882,16 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 		mci->csrows[i].channels[0].chan_idx = 0;
 		mci->csrows[i].channels[0].ce_count = 0;
 		mci->csrows[i].channels[0].csrow = mci->csrows + i;
-		snprintf(mci->csrows[i].channels[0].label,
-			 sizeof(mci->csrows[i].channels[0].label),
-			 "DIMM%u", i5100_rank_to_slot(mci, chan, rank));
-
 		total_pages += npages;
+
+		mci->csrows[i].channels[0].dimm = dimm;
+		dimm->location.mc_channel = chan;
+		dimm->location.mc_dimm_number = rank;
+		snprintf(dimm->label, sizeof(dimm->label),
+			 "DIMM%u",
+			 i5100_rank_to_slot(mci, chan, rank));
+		mci->nr_dimms++;
+		dimm++;
 	}
 }
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 70ad892..4819df8 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -592,7 +592,7 @@ static int i7core_get_active_channels(const u8 socket, unsigned *channels,
 	return 0;
 }
 
-static int get_dimm_config(const struct mem_ctl_info *mci)
+static int get_dimm_config(struct mem_ctl_info *mci)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	struct csrow_info *csr;
@@ -602,6 +602,7 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 	unsigned long last_page = 0;
 	enum edac_type mode;
 	enum mem_type mtype;
+	struct dimm_info *dimm;
 
 	/* Get data from the MC register, function 0 */
 	pdev = pvt->pci_mcr[0];
@@ -638,6 +639,8 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 		numrow(pvt->info.max_dod >> 6),
 		numcol(pvt->info.max_dod >> 9));
 
+	dimm = mci->dimms;
+	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
 	for (i = 0; i < NUM_CHANS; i++) {
 		u32 data, dimm_dod[3], value[8];
 
@@ -744,12 +747,17 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 				csr->dtype = DEV_UNKNOWN;
 			}
 
+			csr->channels[0].dimm = dimm;
+			dimm->location.mc_channel = i;
+			dimm->location.mc_dimm_number = j;
+			snprintf(dimm->label, sizeof(dimm->label),
+				 "CPU#%uChannel#%u_DIMM#%u",
+				 pvt->i7core_dev->socket, i, j);
+			mci->nr_dimms++;
+			dimm++;
+
 			csr->edac_mode = mode;
 			csr->mtype = mtype;
-			snprintf(csr->channels[0].label,
-					sizeof(csr->channels[0].label),
-					"CPU#%uChannel#%u_DIMM#%u",
-					pvt->i7core_dev->socket, i, j);
 
 			csrow++;
 		}
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index a5da732..4a4026e 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -364,6 +364,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	u8 value;
 	u32 cumul_size;
 	int index, chan;
+	struct dimm_info *dimm;
 
 	last_cumul_size = 0;
 
@@ -376,6 +377,8 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	 *
 	 */
 
+	mci->dimm_loc_type = DIMM_LOC_CSROW;
+	dimm = mci->dimms;
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
 
@@ -398,10 +401,16 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		 *   [0-7] for single-channel; i.e. csrow->nr_channels = 1
 		 *   [0-3] for dual-channel; i.e. csrow->nr_channels = 2
 		 */
-		for (chan = 0; chan < csrow->nr_channels; chan++)
-			strncpy(csrow->channels[chan].label,
+		for (chan = 0; chan < csrow->nr_channels; chan++) {
+			mci->csrows[index].channels[chan].dimm = dimm;
+			dimm->location.csrow = index;
+			dimm->location.csrow_channel = chan;
+			strncpy(csrow->channels[chan].dimm->label,
 					labels[(index >> 1) + (chan * 2)],
 					EDAC_MC_LABEL_LEN);
+			dimm++;
+			mci->nr_dimms++;
+		}
 
 		if (cumul_size == last_cumul_size)
 			continue;	/* not populated */
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 7a402bf..34fa898 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -550,7 +550,7 @@ static int sbridge_get_active_channels(const u8 bus, unsigned *channels,
 	return 0;
 }
 
-static int get_dimm_config(const struct mem_ctl_info *mci)
+static int get_dimm_config(struct mem_ctl_info *mci)
 {
 	struct sbridge_pvt *pvt = mci->pvt_info;
 	struct csrow_info *csr;
@@ -560,6 +560,7 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 	u32 reg;
 	enum edac_type mode;
 	enum mem_type mtype;
+	struct dimm_info *dimm;
 
 	pci_read_config_dword(pvt->pci_br, SAD_TARGET, &reg);
 	pvt->sbridge_dev->source_id = SOURCE_ID(reg);
@@ -611,6 +612,8 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 	/* On all supported DDR3 DIMM types, there are 8 banks available */
 	banks = 8;
 
+	dimm = mci->dimms;
+	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
 	for (i = 0; i < NUM_CHANNELS; i++) {
 		u32 mtr;
 
@@ -650,12 +653,17 @@ static int get_dimm_config(const struct mem_ctl_info *mci)
 				csr->channels[0].chan_idx = i;
 				csr->channels[0].ce_count = 0;
 				pvt->csrow_map[i][j] = csrow;
-				snprintf(csr->channels[0].label,
-					 sizeof(csr->channels[0].label),
-					 "CPU_SrcID#%u_Channel#%u_DIMM#%u",
-					 pvt->sbridge_dev->source_id, i, j);
 				last_page += npages;
 				csrow++;
+
+				csr->channels[0].dimm = dimm;
+				dimm->location.mc_channel = i;
+				dimm->location.mc_dimm_number = j;
+				snprintf(dimm->label, sizeof(dimm->label),
+					 "CPU_SrcID#%u_Channel#%u_DIMM#%u",
+					 pvt->sbridge_dev->source_id, i, j);
+				mci->nr_dimms++;
+				dimm++;
 			}
 		}
 	}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 6e3ab94..9f4deed 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -249,10 +249,31 @@ enum scrub_type {
  * PS - I enjoyed writing all that about as much as you enjoyed reading it.
  */
 
+enum dimm_location_type {
+	DIMM_LOC_CSROW,
+	DIMM_LOC_MC_CHANNEL,
+};
+
+/* FIXME: add a per-dimm ce error count */
+struct dimm_info {
+	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
+	unsigned memory_controller;
+	union {
+		struct {
+			unsigned mc_channel;
+			unsigned mc_dimm_number;
+		};
+		struct {
+			unsigned csrow;
+			unsigned csrow_channel;
+		};
+	} location;
+};
+
 struct csrow_channel_info {
 	int chan_idx;		/* channel index */
 	u32 ce_count;		/* Correctable Errors for this CHANNEL */
-	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
+	struct dimm_info *dimm;
 	struct csrow_info *csrow;	/* the parent */
 };
 
@@ -353,6 +374,14 @@ struct mem_ctl_info {
 	int mc_idx;
 	int nr_csrows;
 	struct csrow_info *csrows;
+
+	/*
+	 * DIMM info. Will eventually replace the entire csrow_info some day
+	 */
+	enum dimm_location_type dimm_loc_type;
+	unsigned nr_dimms;
+	struct dimm_info *dimms;
+
 	/*
 	 * FIXME - what about controllers on other busses? - IDs must be
 	 * unique.  dev pointer should be sufficiently unique, but
-- 
1.7.8
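
A minimal userspace sketch of the NULL-safe label join that the
edac_mc.c hunks above perform, assuming simplified stand-ins for
struct dimm_info and struct csrow_channel_info (the names struct dimm,
struct channel, join_labels() and LABEL_LEN are hypothetical). It skips
channels whose dimm pointer is NULL and clamps at the end of the buffer
instead of relying on the remaining length going non-positive. Unlike
the patch, it only emits a ':' between labels that were actually
printed, so a missing channel 0 does not yield a leading ':'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LABEL_LEN 31	/* hypothetical stand-in for EDAC_MC_LABEL_LEN */

/* Simplified stand-in for struct dimm_info. */
struct dimm {
	char label[LABEL_LEN + 1];
};

/* Simplified stand-in for struct csrow_channel_info. */
struct channel {
	struct dimm *dimm;	/* may be NULL if no DIMM was mapped */
};

/*
 * Join the labels of every channel that has a DIMM, separated by ':'.
 * Returns the number of characters stored in buf (excluding the NUL).
 */
static size_t join_labels(const struct channel *chans, int nr_chans,
			  char *buf, size_t size)
{
	size_t used = 0;
	int first = 1;
	int i;

	assert(buf && size > 0);
	buf[0] = '\0';
	for (i = 0; i < nr_chans; i++) {
		size_t room = size - used;
		int n;

		if (!chans[i].dimm)
			continue;	/* no DIMM behind this channel: skip */

		n = snprintf(buf + used, room, "%s%s",
			     first ? "" : ":", chans[i].dimm->label);
		first = 0;
		if (n < 0) {		/* encoding error: stop here */
			buf[used] = '\0';
			break;
		}
		if ((size_t)n >= room) {
			/* Truncated: snprintf() NUL-terminated at the end. */
			used = size - 1;
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```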


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 06/31] edac: Add per dimm's sysfs nodes
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 05/31] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 07/31] edac: Prepare to push down to drivers the filling of the dimm_info Mauro Carvalho Chehab
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Instead of exporting DIMM information only through per-csrow
directories, add pure per-DIMM attributes. This helps to map the
DIMM properties when csrow information is not available.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc_sysfs.c |  172 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/edac.h         |    2 +
 2 files changed, 173 insertions(+), 1 deletions(-)
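
The sysfs plumbing below relies on the kernel's container_of() idiom:
the generic show handler only receives pointers to the embedded kobject
and attribute, and recovers the enclosing struct dimm_info and struct
dimmdev_attribute from them. A minimal userspace sketch of that
dispatch, assuming a container_of() rebuilt from offsetof() and
simplified stand-in types; the struct layouts and the explicit buffer
size argument are hypothetical, not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace re-creation of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kobject { const char *name; };		/* simplified stand-in */
struct attribute { const char *name; };		/* simplified stand-in */

struct dimm_info {
	char label[32];
	struct kobject kobj;	/* embedded, as in this patch */
};

struct dimmdev_attribute {
	struct attribute attr;
	int (*show)(struct dimm_info *, char *, size_t);
};

static int label_show(struct dimm_info *dimm, char *buf, size_t size)
{
	return snprintf(buf, size, "%s\n", dimm->label);
}

/* Generic entry point: it only sees the embedded kobject and attribute. */
static int dimmdev_show(struct kobject *kobj, struct attribute *attr,
			char *buf, size_t size)
{
	struct dimm_info *dimm;
	struct dimmdev_attribute *da;

	assert(kobj && attr);
	dimm = container_of(kobj, struct dimm_info, kobj);
	da = container_of(attr, struct dimmdev_attribute, attr);

	if (da->show)
		return da->show(dimm, buf, size);
	return -EIO;	/* same fallback as the patch when show() is absent */
}

static struct dimmdev_attribute attr_label = {
	.attr = { .name = "label" },
	.show = label_show,
};
```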

diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index a439bed..54b24cb 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -415,6 +415,156 @@ err_out:
 	return err;
 }
 
+/* dimm specific attribute structure */
+struct dimmdev_attribute {
+	struct attribute attr;
+	 ssize_t(*show) (struct dimm_info *, char *);
+	 ssize_t(*store) (struct dimm_info *, const char *, size_t);
+};
+
+#define DIMMDEV_ATTR(_name,_mode,_show,_store)	\
+static struct dimmdev_attribute attr_##_name = {			\
+	.attr = {.name = __stringify(_name), .mode = _mode },	\
+	.show   = _show,					\
+	.store  = _store,					\
+};
+
+#define to_dimm(k) container_of(k, struct dimm_info, kobj)
+#define to_dimmdev_attr(a) container_of(a, struct dimmdev_attribute, attr)
+
+/* Set of show/store higher level functions for default dimm attributes */
+static ssize_t dimmdev_show(struct kobject *kobj,
+			struct attribute *attr, char *buffer)
+{
+	struct dimm_info *dimm = to_dimm(kobj);
+	struct dimmdev_attribute *dimmdev_attr = to_dimmdev_attr(attr);
+
+	if (dimmdev_attr->show)
+		return dimmdev_attr->show(dimm, buffer);
+	return -EIO;
+}
+
+static ssize_t dimmdev_store(struct kobject *kobj, struct attribute *attr,
+			const char *buffer, size_t count)
+{
+	struct dimm_info *dimm = to_dimm(kobj);
+	struct dimmdev_attribute *dimmdev_attr = to_dimmdev_attr(attr);
+
+	if (dimmdev_attr->store)
+		return dimmdev_attr->store(dimm,
+					buffer,
+					count);
+	return -EIO;
+}
+
+static const struct sysfs_ops dimmfs_ops = {
+	.show = dimmdev_show,
+	.store = dimmdev_store
+};
+
+/* show/store functions for the DIMM location and label attributes */
+static ssize_t dimmdev_location_show(struct dimm_info *dimm, char *data)
+{
+	if (dimm->mci->dimm_loc_type == DIMM_LOC_CSROW)
+		return sprintf(data, "csrow %u, channel %u\n",
+			       dimm->location.csrow,
+			       dimm->location.csrow_channel);
+	else
+		return sprintf(data, "channel %u, dimm %u\n",
+			       dimm->location.mc_channel,
+			       dimm->location.mc_dimm_number);
+}
+
+static ssize_t dimmdev_label_show(struct dimm_info *dimm, char *data)
+{
+	/* if field has not been initialized, there is nothing to send */
+	if (!dimm->label[0])
+		return 0;
+
+	return snprintf(data, EDAC_MC_LABEL_LEN, "%s\n", dimm->label);
+}
+
+static ssize_t dimmdev_label_store(struct dimm_info *dimm,
+					const char *data,
+					size_t count)
+{
+	ssize_t max_size = 0;
+
+	max_size = min((ssize_t) count, (ssize_t) EDAC_MC_LABEL_LEN - 1);
+	strncpy(dimm->label, data, max_size);
+	dimm->label[max_size] = '\0';
+
+	return max_size;
+}
+
+/* default dimm<id>/attribute files */
+DIMMDEV_ATTR(label, S_IRUGO | S_IWUSR, dimmdev_label_show, dimmdev_label_store);
+DIMMDEV_ATTR(location, S_IRUGO, dimmdev_location_show, NULL);
+
+/* default attributes of the DIMM<id> object */
+static struct dimmdev_attribute *default_dimm_attr[] = {
+	&attr_label,
+	&attr_location,
+	NULL,
+};
+
+/* No memory to release for this kobj */
+static void edac_dimm_instance_release(struct kobject *kobj)
+{
+	struct mem_ctl_info *mci;
+	struct dimm_info *cs;
+
+	debugf1("%s()\n", __func__);
+
+	cs = container_of(kobj, struct dimm_info, kobj);
+	mci = cs->mci;
+
+	kobject_put(&mci->edac_mci_kobj);
+}
+
+/* the kobj_type instance for a DIMM */
+static struct kobj_type ktype_dimm = {
+	.release = edac_dimm_instance_release,
+	.sysfs_ops = &dimmfs_ops,
+	.default_attrs = (struct attribute **)default_dimm_attr,
+};
+/* Create a DIMM object under the specified edac_mc_device */
+static int edac_create_dimm_object(struct mem_ctl_info *mci,
+					struct dimm_info *dimm, int index)
+{
+	struct kobject *kobj_mci = &mci->edac_mci_kobj;
+	struct kobject *kobj;
+	int err;
+
+	/* generate ..../edac/mc/mc<id>/dimm<index>   */
+	memset(&dimm->kobj, 0, sizeof(dimm->kobj));
+	dimm->mci = mci;	/* include container up link */
+
+	/* bump the mci instance's kobject's ref count */
+	kobj = kobject_get(&mci->edac_mci_kobj);
+	if (!kobj) {
+		err = -ENODEV;
+		goto err_out;
+	}
+
+	/* Instantiate the dimm object */
+	err = kobject_init_and_add(&dimm->kobj, &ktype_dimm, kobj_mci,
+				   "dimm%d", index);
+	if (err)
+		goto err_release_top_kobj;
+
+	kobject_uevent(&dimm->kobj, KOBJ_ADD);
+	return 0;
+
+	/* error unwind stack */
+err_release_top_kobj:
+	kobject_put(&mci->edac_mci_kobj);
+
+err_out:
+	return err;
+}
+
+
 /* default sysfs methods and data structures for the main MCI kobject */
 
 static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
@@ -905,7 +1055,7 @@ static void edac_remove_mci_instance_attributes(struct mem_ctl_info *mci,
  */
 int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 {
-	int i;
+	int i, j;
 	int err;
 	struct csrow_info *csrow;
 	struct kobject *kobj_mci = &mci->edac_mci_kobj;
@@ -952,8 +1102,24 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 		}
 	}
 
+	/*
+	 * Make directories for each DIMM object under the mc<id> kobject
+	 */
+	for (j = 0; j < mci->nr_dimms; j++) {
+		err = edac_create_dimm_object(mci, &mci->dimms[j], j);
+		if (err) {
+			debugf1("%s() failure: create dimm %d obj\n",
+				__func__, j);
+			goto fail2;
+		}
+	}
+
 	return 0;
 
+fail2:
+	for (j--; j >= 0; j--)
+		kobject_put(&mci->dimms[j].kobj);
+
 	/* CSROW error: backout what has already been registered,  */
 fail1:
 	for (i--; i >= 0; i--) {
@@ -984,6 +1150,10 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 
 	/* remove all csrow kobjects */
 	debugf4("%s()  unregister this mci kobj\n", __func__);
+	for (i = 0; i < mci->nr_dimms; i++) {
+		debugf0("%s()  unreg dimm-%d\n", __func__, i);
+		kobject_put(&mci->dimms[i].kobj);
+	}
 	for (i = 0; i < mci->nr_csrows; i++) {
 		if (mci->csrows[i].nr_pages > 0) {
 			debugf0("%s()  unreg csrow-%d\n", __func__, i);
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 9f4deed..027e478 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -268,6 +268,8 @@ struct dimm_info {
 			unsigned csrow_channel;
 		};
 	} location;
+	struct kobject kobj;		/* sysfs kobject for this dimm */
+	struct mem_ctl_info *mci;	/* the parent */
 };
 
 struct csrow_channel_info {
-- 
1.7.8
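
The location attribute reads one of two views of the same union,
selected by mci->dimm_loc_type. A minimal sketch, assuming a standalone
struct with the same union layout as the edac.h hunk (C11 anonymous
structs) and a hypothetical location_show() helper that takes the
location type explicitly; it prints with %u since the fields are
unsigned. Because both views alias the same storage, formatting with
the wrong type silently reinterprets csrow/csrow_channel as
mc_channel/mc_dimm_number, so the type tag must match how the driver
filled the union.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum dimm_location_type {
	DIMM_LOC_CSROW,
	DIMM_LOC_MC_CHANNEL,
};

/* Same union layout as struct dimm_info's location field. */
struct dimm_info {
	union {
		struct {
			unsigned mc_channel;
			unsigned mc_dimm_number;
		};
		struct {
			unsigned csrow;
			unsigned csrow_channel;
		};
	} location;
};

/* Format the location according to the (hypothetical) explicit tag. */
static int location_show(enum dimm_location_type type,
			 const struct dimm_info *dimm, char *buf, size_t size)
{
	assert(buf && size > 0);
	if (type == DIMM_LOC_CSROW)
		return snprintf(buf, size, "csrow %u, channel %u\n",
				dimm->location.csrow,
				dimm->location.csrow_channel);
	return snprintf(buf, size, "channel %u, dimm %u\n",
			dimm->location.mc_channel,
			dimm->location.mc_dimm_number);
}
```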


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 07/31] edac: Prepare to push down to drivers the filling of the dimm_info
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 06/31] edac: Add per dimm's sysfs nodes Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 08/31] edac: Better describe the memory concepts Mauro Carvalho Chehab
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Several pieces of data currently stored per csrow will be moved
into the dimm structs themselves. Because of that, the mci->dimms
array will need to be filled inside the drivers.

Prepare for that by initializing the dimm fields during mci
allocation. With this change, the driver changes will be smaller,
as drivers won't need to touch fields they don't currently
initialize.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c      |   43 ++++++++++++++++++++-----------------------
 drivers/edac/i5100_edac.c   |    1 +
 drivers/edac/i7core_edac.c  |    1 +
 drivers/edac/i82975x_edac.c |    1 +
 drivers/edac/sb_edac.c      |    1 +
 5 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 93ef044..37dca79 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -214,6 +214,24 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 		}
 	}
 
+	/*
+	 * By default, assume that a per-csrow arrangement is used, as
+	 * most drivers are based on that assumption.
+	 */
+	if (!mci->nr_dimms) {
+		mci->dimm_loc_type = DIMM_LOC_CSROW;
+		dimm = mci->dimms;
+		for (row = 0; row < mci->nr_csrows; row++) {
+			for (chn = 0; chn < mci->csrows[row].nr_channels; chn++) {
+				mci->csrows[row].channels[chn].dimm = dimm;
+				dimm->location.csrow = row;
+				dimm->location.csrow_channel = chn;
+				dimm++;
+				mci->nr_dimms++;
+			}
+		}
+	}
+
 	mci->op_state = OP_ALLOC;
 	INIT_LIST_HEAD(&mci->grp_kobj_list);
 
@@ -512,37 +530,16 @@ EXPORT_SYMBOL(edac_mc_find);
 /* FIXME - should a warning be printed if no error detection? correction? */
 int edac_mc_add_mc(struct mem_ctl_info *mci)
 {
-	int i, j;
-	struct dimm_info *dimm;
-
 	debugf0("%s()\n", __func__);
 
-	/*
-	 * If nr_dimms is not filled, that means that the driver itself
-	 * were not converted to use the new struct, or that the driver
-	 * is for a csrow-based device.
-	 * Fill the dimms accordingly.
-	 */
-	if (!mci->nr_dimms) {
-		mci->dimm_loc_type = DIMM_LOC_CSROW;
-		dimm = mci->dimms;
-		for (i = 0; i < mci->nr_csrows; i++) {
-			for (j = 0; j < mci->csrows[i].nr_channels; j++) {
-				mci->csrows[i].channels[j].dimm = dimm;
-				dimm->location.csrow = i;
-				dimm->location.csrow_channel = j;
-				dimm++;
-				mci->nr_dimms++;
-			}
-		}
-	}
 #ifdef CONFIG_EDAC_DEBUG
 	if (edac_debug_level >= 3)
 		edac_mc_dump_mci(mci);
 
 	if (edac_debug_level >= 4) {
-
+		int i;
 		for (i = 0; i < mci->nr_csrows; i++) {
+			int j;
 			edac_mc_dump_csrow(&mci->csrows[i]);
 			for (j = 0; j < mci->csrows[i].nr_channels; j++)
 				edac_mc_dump_channel(&mci->csrows[i].
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 302e43b..52939ca 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -852,6 +852,7 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 
 	dimm = mci->dimms;
 	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
+	mci->nr_dimms = 0;
 	for (i = 0; i < mci->nr_csrows; i++) {
 		const unsigned long npages = i5100_npages(mci, i);
 		const unsigned chan = i5100_csrow_to_chan(mci, i);
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 4819df8..d6dd9bf 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -641,6 +641,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 	dimm = mci->dimms;
 	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
+	mci->nr_dimms = 0;
 	for (i = 0; i < NUM_CHANS; i++) {
 		u32 data, dimm_dod[3], value[8];
 
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index 4a4026e..47f023e 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -379,6 +379,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 
 	mci->dimm_loc_type = DIMM_LOC_CSROW;
 	dimm = mci->dimms;
+	mci->nr_dimms = 0;
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
 
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 34fa898..43fc65e 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -614,6 +614,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 	dimm = mci->dimms;
 	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
+	mci->nr_dimms = 0;
 	for (i = 0; i < NUM_CHANNELS; i++) {
 		u32 mtr;
 
-- 
1.7.8
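
A minimal sketch of the default csrow-based fill that this patch moves
from edac_mc_add_mc() into edac_mc_alloc(), assuming simplified
stand-in structs (struct mc_sketch, struct csrow, struct channel and
default_fill_dimms() are hypothetical names). Each csrow channel gets
the next free dimm_info slot. Since the fill now happens at allocation
time, nr_dimms is already non-zero when a driver runs, which is why the
converted drivers reset mci->nr_dimms to 0 before describing their own
layout.

```c
#include <assert.h>
#include <stddef.h>

enum dimm_location_type { DIMM_LOC_CSROW, DIMM_LOC_MC_CHANNEL };

/* Only the csrow view of the location is modeled here. */
struct dimm_info {
	unsigned csrow;
	unsigned csrow_channel;
};

struct channel { struct dimm_info *dimm; };

struct csrow {
	int nr_channels;
	struct channel *channels;
};

struct mc_sketch {
	enum dimm_location_type dimm_loc_type;
	int nr_csrows;
	struct csrow *csrows;
	unsigned nr_dimms;
	struct dimm_info *dimms;	/* preallocated, >= total channels */
};

static void default_fill_dimms(struct mc_sketch *mci)
{
	struct dimm_info *dimm = mci->dimms;
	int row, chn;

	if (mci->nr_dimms)
		return;		/* layout already described elsewhere */

	mci->dimm_loc_type = DIMM_LOC_CSROW;
	for (row = 0; row < mci->nr_csrows; row++) {
		for (chn = 0; chn < mci->csrows[row].nr_channels; chn++) {
			/* Link the channel to the next free DIMM slot. */
			mci->csrows[row].channels[chn].dimm = dimm;
			dimm->csrow = row;
			dimm->csrow_channel = chn;
			dimm++;
			mci->nr_dimms++;
		}
	}
}
```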


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 08/31] edac: Better describe the memory concepts
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 07/31] edac: Prepare to push down to drivers the filling of the dimm_info Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 09/31] i5400_edac: Convert it to report memory with the new location Mauro Carvalho Chehab
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The memory terminology has changed over time since EDAC was originally
written: new concepts were introduced, and some terms have different
meanings depending on the memory architecture. Better define those
terms, and better describe each supported memory type.

No functional changes; only comments were touched.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 include/linux/edac.h |  176 ++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 134 insertions(+), 42 deletions(-)

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 027e478..0f700e3 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -72,25 +72,91 @@ enum hw_event_mc_err_type {
 	HW_EVENT_ERR_FATAL,
 };
 
-/* memory types */
+/**
+ * enum mem_type - memory types
+ *
+ * @MEM_EMPTY:		Empty csrow
+ * @MEM_RESERVED:	Reserved csrow type
+ * @MEM_UNKNOWN:	Unknown csrow type
+ * @MEM_FPM:		Fast page mode
+ *			An old asynchronous memory technology, packaged as
+ *			SIMMs, popular between 1987-1995.
+ * @MEM_EDO:		Extended data out
+ *			Asynchronous memory technology used on early Pentiums
+ *			(1995-1998).
+ * @MEM_BEDO:		Burst Extended data out
+ *			EDO memories with performance improvements.
+ * @MEM_SDR:		Single data rate SDRAM
+ *			First type of synchronous RAM specified by JEDEC,
+ *			popular between 1998-2002. The JEDEC standards defined
+ *			3 types of sticks: PC66, PC100 and PC133. There were
+ *			also some non-official overclocked sticks like PC150
+ *			and PC166.
+ *			There are 4 pins for chip select: pins 0 and 2 are
+ *			for rank 0; pins 1 and 3 are for rank 1, if the memory
+ *			is dual-rank.
+ * @MEM_RDR:		Registered single data rate SDRAM
+ * @MEM_DDR:		Double data rate SDRAM, used between 2002 and 2005.
+ *			The JEDEC standards defined sticks PC1600, PC2100,
+ *			PC2700 and PC3200. Non-official overclocked sticks
+ *			also exist.
+ *			On DDR memories, there are one or two channels. In
+ *			single-channel mode, one x72 ECC DIMM is accessed in
+ *			order to provide 64 bits of data. In dual-channel
+ *			mode, two DIMMs are used simultaneously, in order to
+ *			provide 128 bits of data. The "cschannel" concept
+ *			used by EDAC refers to this kind of channel.
+ * @MEM_RDDR:		Registered Double data rate SDRAM
+ *			This is a variant of the DDR memories.
+ *			A registered memory has a buffer inside it, hiding
+ *			part of the memory details from the memory controller.
+ * @MEM_RMBS:		Rambus DRAM
+ *			Rambus uses a high-speed multi-drop serial bus to
+ *			communicate with each RDRAM chip, used on some
+ *			machines between 2000-2002 (Pentium III and IV).
+ * @MEM_DDR2:		DDR2 RAM, as described at JEDEC JESD79-2F.
+ *			Those memories are labeled as "PC2-" instead of "PC" to
+ *			differentiate them from DDR.
+ * @MEM_FB_DDR2:	Fully-Buffered DDR2, as described in JEDEC Std No. 205
+ *			and JESD206.
+ *			A FB-DIMM channel consists of 14 unidirectional signal
+ *			pairs (Northbound path) from the memories to the MC
+ *			and 10 unidirectional signal pairs (Southbound path)
+ *			from the MC to the DIMMs.
+ *			Those memories are x72 ECC DIMMs. Up to 8 DIMMs can
+ *			be connected per channel. When used with 128 bits,
+ *			two channels are needed. The grouping of those two
+ *			channels is called "branch".
+ * @MEM_RDDR2:		Registered DDR2 RAM
+ *			This is a variant of the DDR2 memories.
+ * @MEM_XDR:		Rambus XDR
+ *			It is an evolution of the original RAMBUS memories,
+ *			created to compete with DDR2. It was not used on any
+ *			x86 arch, but the cell_edac PPC memory controller uses it.
+ * @MEM_DDR3:		DDR3 RAM
+ *			Those memories are labeled as "PC3-" to differentiate
+ *			them from DDR and DDR2.
+ * @MEM_RDDR3:		Registered DDR3 RAM
+ *			This is a variant of the DDR3 memories.
+ */
 enum mem_type {
-	MEM_EMPTY = 0,		/* Empty csrow */
-	MEM_RESERVED,		/* Reserved csrow type */
-	MEM_UNKNOWN,		/* Unknown csrow type */
-	MEM_FPM,		/* Fast page mode */
-	MEM_EDO,		/* Extended data out */
-	MEM_BEDO,		/* Burst Extended data out */
-	MEM_SDR,		/* Single data rate SDRAM */
-	MEM_RDR,		/* Registered single data rate SDRAM */
-	MEM_DDR,		/* Double data rate SDRAM */
-	MEM_RDDR,		/* Registered Double data rate SDRAM */
-	MEM_RMBS,		/* Rambus DRAM */
-	MEM_DDR2,		/* DDR2 RAM */
-	MEM_FB_DDR2,		/* fully buffered DDR2 */
-	MEM_RDDR2,		/* Registered DDR2 RAM */
-	MEM_XDR,		/* Rambus XDR */
-	MEM_DDR3,		/* DDR3 RAM */
-	MEM_RDDR3,		/* Registered DDR3 RAM */
+	MEM_EMPTY = 0,
+	MEM_RESERVED,
+	MEM_UNKNOWN,
+	MEM_FPM,
+	MEM_EDO,
+	MEM_BEDO,
+	MEM_SDR,
+	MEM_RDR,
+	MEM_DDR,
+	MEM_RDDR,
+	MEM_RMBS,
+	MEM_DDR2,
+	MEM_FB_DDR2,
+	MEM_RDDR2,
+	MEM_XDR,
+	MEM_DDR3,
+	MEM_RDDR3,
 };
 
 #define MEM_FLAG_EMPTY		BIT(MEM_EMPTY)
@@ -168,8 +234,9 @@ enum scrub_type {
 #define OP_OFFLINE		0x300
 
 /*
- * There are several things to be aware of that aren't at all obvious:
+ * Concepts used at the EDAC subsystem
  *
+ * There are several things to be aware of that aren't at all obvious:
  *
  * SOCKETS, SOCKET SETS, BANKS, ROWS, CHIP-SELECT ROWS, CHANNELS, etc..
  *
@@ -178,36 +245,61 @@ enum scrub_type {
  * creating a common ground for discussion, terms and their definitions
  * will be established.
  *
- * Memory devices:	The individual chip on a memory stick.  These devices
- *			commonly output 4 and 8 bits each.  Grouping several
- *			of these in parallel provides 64 bits which is common
- *			for a memory stick.
+ * Memory devices:	The individual DRAM chips on a memory stick.  These
+ *			devices commonly output 4 or 8 bits each (x4, x8).
+ *			Grouping several of these in parallel provides the
+ *			number of bits that the memory controller expects:
+ *			typically 72 bits, in order to provide 64 bits of
+ *			ECC-corrected data.
  *
  * Memory Stick:	A printed circuit board that aggregates multiple
- *			memory devices in parallel.  This is the atomic
- *			memory component that is purchaseable by Joe consumer
- *			and loaded into a memory socket.
+ *			memory devices in parallel.  In general, this is the
+ *			Field Replaceable Unit (FRU) that the end user
+ *			replaces. It is typically packaged as a DIMM.
  *
  * Socket:		A physical connector on the motherboard that accepts
  *			a single memory stick.
  *
- * Csrow-channel:	Set of memory devices on a memory stick that must be
- *			grouped in parallel with one or more additional
- *			channels from other memory sticks.  This parallel
- *			grouping of the output from multiple channels are
- *			necessary for the smallest granularity of memory access.
- *			Some memory controllers are capable of single channel -
- *			which means that memory sticks can be loaded
- *			individually.  Other memory controllers are only
- *			capable of dual channel - which means that memory
- *			sticks must be loaded as pairs (see "socket set").
+ * Branch:		The highest hierarchy level on a Fully-Buffered DIMM
+ *			memory controller. Typically, it contains two channels.
+ *			Two channels in the same branch can be used in single
+ *			mode or in lockstep mode.
+ *			When lockstep is enabled, the cache line is larger,
+ *			but it generally brings some performance penalty.
+ *			Also, it is generally not possible to point to just one
+ *			memory stick when an error occurs, as the error
+ *			correction code is calculated using two DIMMs instead
+ *			of one. Because of that, it is capable of correcting
+ *			more errors than in single mode.
+ *
+ * Channel:		A memory controller channel, responsible to communicate
+ *			with a group of DIMM's. Each channel has its own
+ *			independent control (command) and data bus, and can
+ *			be used independently or grouped.
+ *
+ * Single-channel:	The data accessed by the memory controller is contained
+ *			in one DIMM only. E.g. if the data is 64 bits wide,
+ *			the data flows to the CPU using one 64-bit parallel
+ *			access.
+ *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
+ *			FB-DIMM and RAMBUS use a different concept for channel,
+ *			so this concept doesn't apply there.
+ *
+ * Double-channel:	The data size accessed by the memory controller is
+ *			contained in two DIMMs accessed at the same time.
+ *			E.g. if the DIMM is 64 bits wide, the data flows to
+ *			the CPU using a 128-bit parallel access.
+ *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
+ *			FB-DIMM and RAMBUS use a different concept for channel,
+ *			so this concept doesn't apply there.
  *
- * Chip-select row:	All of the memory devices that are selected together.
- *			for a single, minimum grain of memory access.
- *			This selects all of the parallel memory devices across
- *			all of the parallel channels.  Common chip-select rows
- *			for single channel are 64 bits, for dual channel 128
- *			bits.
+ * Chip-select row:	This is the name of the memory controller signal used
+ *			to select the DRAM chips to be used. It may not be
+ *			visible to the memory controller, as some memory buffer
+ *			chip may be responsible for controlling it.
+ *			On devices where it is visible, it controls the DIMM
+ *			(or the DIMM pair, in dual-channel mode) that is
+ *			accessed by the memory controller.
  *
  * Single-Ranked stick:	A Single-ranked stick has 1 chip-select row of memory.
  *			Motherboards commonly drive two chip-select pins to
-- 
1.7.8
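The single-channel, double-channel and lockstep definitions in the glossary above reduce to simple arithmetic. The following is a minimal sketch, not part of EDAC: the helper names are hypothetical, and it assumes that exactly one DIMM per channel takes part in each access.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: width, in bits, of one memory access when
 * 'ganged_dimms' DIMMs of 'dimm_width_bits' each are accessed at the
 * same time (1 = single-channel, 2 = double-channel or lockstep).
 */
static unsigned int access_width_bits(unsigned int dimm_width_bits,
				      unsigned int ganged_dimms)
{
	return dimm_width_bits * ganged_dimms;
}

/*
 * Hypothetical helper: when the ECC word spans several DIMMs, an error
 * can generally only be attributed to the whole group, not to a single
 * memory stick.
 */
static bool error_points_to_single_dimm(unsigned int ganged_dimms)
{
	return ganged_dimms == 1;
}
```

For 64-bit DIMMs, access_width_bits(64, 1) yields 64 and access_width_bits(64, 2) yields 128, matching the single- and double-channel examples in the comment.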



* [PATCH v3 09/31] i5400_edac: Convert it to report memory with the new location
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 08/31] edac: Better describe the memory concepts The memory terms changed along the time, since when EDAC were originally written: new concepts were introduced, and some things have different meanings, depending on the memory architecture. Better define those terms, and better describe each supported memory type Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 10/31] i7300_edac: " Mauro Carvalho Chehab
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

In this driver, the memory controller supports only FB-DIMMs.

The memory hierarchy here has three layers below the memory
controller:
	- two branches;
	- each branch has two channels;
	- each channel can select up to 4 DIMMs via the
	  FB-DIMM AMB (Advanced Memory Buffer) chip.

As EDAC currently limits memory controllers to two hierarchy
levels, this patch groups branches and channels together.
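To illustrate how branches and channels can be folded into the two levels EDAC supports, here is a minimal sketch. The constants and helper names are hypothetical, and the branch-major numbering (flat channel = branch * 2 + channel) is an assumption for illustration, not necessarily what the driver uses.

```c
/* Assumed topology, taken from the description above. */
#define SKETCH_BRANCHES			2
#define SKETCH_CHANNELS_PER_BRANCH	2
#define SKETCH_DIMMS_PER_CHANNEL	4

/* Fold (branch, channel) into the single channel level EDAC exposes. */
static int sketch_flat_channel(int branch, int channel)
{
	return branch * SKETCH_CHANNELS_PER_BRANCH + channel;
}

/* Inverse mapping: recover the branch from a flattened channel number. */
static int sketch_branch_of(int flat_channel)
{
	return flat_channel / SKETCH_CHANNELS_PER_BRANCH;
}

/* Total number of DIMM slots the hierarchy can address. */
static int sketch_total_dimm_slots(void)
{
	return SKETCH_BRANCHES * SKETCH_CHANNELS_PER_BRANCH *
	       SKETCH_DIMMS_PER_CHANNEL;
}
```

With this numbering, flat channels 0 and 1 belong to branch 0 and channels 2 and 3 to branch 1, so no branch information is lost by the grouping; the controller can address up to 16 DIMMs.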

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/i5400_edac.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 74d6ec34..92af805 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1137,6 +1137,7 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 	int csrow_megs;
 	int channel;
 	int csrow;
+	struct dimm_info *dimm;
 
 	pvt = mci->pvt_info;
 
@@ -1145,6 +1146,9 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 
 	empty = 1;		/* Assume NO memory */
 
+	dimm = mci->dimms;
+	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
+	mci->nr_dimms = 0;
 	for (csrow = 0; csrow < max_csrows; csrow++) {
 		p_csrow = &mci->csrows[csrow];
 
@@ -1163,6 +1167,9 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 		p_csrow->page_mask = 0xFFF;
 
 		p_csrow->grain = 8;
+		p_csrow->dtype = MTR_DRAM_WIDTH(mtr) ? DEV_X8 : DEV_X4;
+		p_csrow->mtype = MEM_RDDR2;
+		p_csrow->edac_mode = EDAC_SECDED;
 
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++)
@@ -1170,16 +1177,10 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 
 		p_csrow->nr_pages = csrow_megs << 8;
 
-		/* Assume DDR2 for now */
-		p_csrow->mtype = MEM_FB_DDR2;
-
-		/* ask what device type on this row */
-		if (MTR_DRAM_WIDTH(mtr))
-			p_csrow->dtype = DEV_X8;
-		else
-			p_csrow->dtype = DEV_X4;
-
-		p_csrow->edac_mode = EDAC_S8ECD8ED;
+		dimm->location.mc_channel = channel;
+		dimm->location.mc_dimm_number = csrow / pvt->maxch;
+		mci->nr_dimms++;
+		dimm++;
 
 		empty = 0;
 	}
-- 
1.7.8



* [PATCH v3 10/31] i7300_edac: Convert it to report memory with the new location
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 09/31] i5400_edac: Convert it to report memory with the new location Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 11/31] edac: move dimm properties to struct dimm_info Mauro Carvalho Chehab
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

In this driver, the memory controller supports only FB-DIMMs.

The memory hierarchy here has three layers below the memory
controller:
	- two branches;
	- each branch has two channels;
	- each channel can select up to 8 DIMMs via the
	  FB-DIMM AMB (Advanced Memory Buffer) chip.

As EDAC currently limits memory controllers to two hierarchy
levels, this patch groups branches and channels together.
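As a rough illustration of the resulting (channel, DIMM) locations, the sketch below walks the same slot, branch and channel loop order as i7300_init_csrows() and records one location per position. It assumes every slot is populated (the real driver skips empty ones) and a branch-major channel numbering; the struct and helper names are hypothetical stand-ins for the location fields of struct dimm_info.

```c
/* Hypothetical mirror of the location fields set by this patch. */
struct sketch_location {
	int mc_channel;		/* branch and channel folded together */
	int mc_dimm_number;	/* slot within the channel */
};

#define SKETCH_MAX_SLOTS	8
#define SKETCH_MAX_BRANCHES	2
#define SKETCH_CH_PER_BRANCH	2

/*
 * Fill 'out' (room for 32 entries) in slot/branch/channel order and
 * return the number of locations written.
 */
static int sketch_fill_locations(struct sketch_location *out)
{
	int n = 0;

	for (int slot = 0; slot < SKETCH_MAX_SLOTS; slot++)
		for (int branch = 0; branch < SKETCH_MAX_BRANCHES; branch++)
			for (int ch = 0; ch < SKETCH_CH_PER_BRANCH; ch++) {
				out[n].mc_channel =
					branch * SKETCH_CH_PER_BRANCH + ch;
				out[n].mc_dimm_number = slot;
				n++;
			}
	return n;
}
```

A fully populated controller yields 2 x 2 x 8 = 32 locations, with the channel changing fastest and the slot slowest, mirroring the order in which the driver increments its dimm pointer.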

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/i7300_edac.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 6104dba..ddd5842 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -779,6 +779,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 	int mtr;
 	int ch, branch, slot, channel;
 	u32 last_page = 0, nr_pages;
+	struct dimm_info *dimm;
 
 	pvt = mci->pvt_info;
 
@@ -803,6 +804,10 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 	}
 
 	/* Get the set of MTR[0-7] regs by each branch */
+	dimm = mci->dimms;
+	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
+	mci->nr_dimms = 0;
+	nr_pages = 0;
 	for (slot = 0; slot < MAX_SLOTS; slot++) {
 		int where = mtr_regs[slot];
 		for (branch = 0; branch < MAX_BRANCHES; branch++) {
@@ -815,6 +820,9 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 				dinfo = &pvt->dimm_info[slot][channel];
 				p_csrow = &mci->csrows[slot];
 
+				dimm->location.mc_channel = channel;
+				dimm->location.mc_dimm_number = slot;
+
 				mtr = decode_mtr(pvt, slot, ch, branch,
 						 dinfo, p_csrow, &nr_pages);
 				/* if no DIMMS on this row, continue */
@@ -828,6 +836,9 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 				p_csrow->last_page = last_page;
 
 				rc = 0;
+
+				mci->nr_dimms++;
+				dimm++;
 			}
 		}
 	}
-- 
1.7.8



* [PATCH v3 11/31] edac: move dimm properties to struct dimm_info
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 10/31] i7300_edac: " Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 12/31] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Although all DIMMs within a csrow should share the same
properties, some memory controllers aren't able to access
csrows, as they're hidden by other chips.

So, the per-csrow data needs to go away. As a first step,
move grain, mtype, dtype and edac_mode to the per-DIMM
struct (struct dimm_info).
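The following simplified sketch shows the data-layout change this patch makes: properties that used to live in the csrow are written to every DIMM behind it, and csrow-level readers (such as the sysfs show functions) fall back to the first channel's DIMM. The structs and helpers are hypothetical stand-ins, not the real struct csrow_info or struct dimm_info.

```c
/* Hypothetical, reduced stand-ins for the EDAC types. */
enum sketch_mem_type { SKETCH_MEM_RDDR2, SKETCH_MEM_FB_DDR2 };

struct sketch_dimm_info {
	unsigned int grain;		/* error granularity, in bytes */
	enum sketch_mem_type mtype;	/* dtype/edac_mode would go here too */
};

struct sketch_channel {
	struct sketch_dimm_info *dimm;
};

struct sketch_csrow {
	int nr_channels;
	struct sketch_channel channels[2];
};

/* Propagate one set of properties to every DIMM behind a csrow. */
static void sketch_set_csrow_props(struct sketch_csrow *csrow,
				   unsigned int grain,
				   enum sketch_mem_type mtype)
{
	for (int j = 0; j < csrow->nr_channels; j++) {
		csrow->channels[j].dimm->grain = grain;
		csrow->channels[j].dimm->mtype = mtype;
	}
}

/* Csrow-level readers now look at the first channel's DIMM. */
static enum sketch_mem_type
sketch_csrow_mtype(const struct sketch_csrow *csrow)
{
	return csrow->channels[0].dimm->mtype;
}
```

Keeping the loop in each driver, rather than in the core, lets drivers that know per-DIMM details (for example, different device widths per channel) set them individually later.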

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c      |   30 +++++++++++-------
 drivers/edac/amd76x_edac.c     |   10 ++++--
 drivers/edac/cell_edac.c       |   10 +++++-
 drivers/edac/cpc925_edac.c     |   62 ++++++++++++++++++++-----------------
 drivers/edac/e752x_edac.c      |   44 ++++++++++++++------------
 drivers/edac/e7xxx_edac.c      |   44 +++++++++++++++-----------
 drivers/edac/edac_mc.c         |   66 ++++++++++++++++++----------------------
 drivers/edac/edac_mc_sysfs.c   |    6 ++--
 drivers/edac/i3000_edac.c      |   18 ++++++-----
 drivers/edac/i3200_edac.c      |   18 ++++++-----
 drivers/edac/i5000_edac.c      |   24 +++++++--------
 drivers/edac/i5100_edac.c      |   16 +++-------
 drivers/edac/i5400_edac.c      |    9 ++---
 drivers/edac/i7300_edac.c      |   19 ++++++-----
 drivers/edac/i7core_edac.c     |   18 +++++-----
 drivers/edac/i82443bxgx_edac.c |   13 +++++---
 drivers/edac/i82860_edac.c     |   11 ++++--
 drivers/edac/i82875p_edac.c    |   17 +++++++---
 drivers/edac/i82975x_edac.c    |   10 ++++--
 drivers/edac/mpc85xx_edac.c    |   13 +++++---
 drivers/edac/mv64x60_edac.c    |   18 ++++++-----
 drivers/edac/pasemi_edac.c     |   10 ++++--
 drivers/edac/ppc4xx_edac.c     |   13 +++++---
 drivers/edac/r82600_edac.c     |   10 ++++--
 drivers/edac/sb_edac.c         |   18 +++++-----
 drivers/edac/tile_edac.c       |   13 ++++----
 drivers/edac/x38_edac.c        |   17 +++++-----
 include/linux/edac.h           |   21 ++++++++-----
 28 files changed, 315 insertions(+), 263 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c9eee6d..3e7bddc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2168,7 +2168,9 @@ static int init_csrows(struct mem_ctl_info *mci)
 	struct amd64_pvt *pvt = mci->pvt_info;
 	u64 input_addr_min, input_addr_max, sys_addr, base, mask;
 	u32 val;
-	int i, empty = 1;
+	int i, j, empty = 1;
+	enum mem_type mtype;
+	enum edac_type edac_mode;
 
 	amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
@@ -2202,7 +2204,21 @@ static int init_csrows(struct mem_ctl_info *mci)
 		csrow->page_mask = ~mask;
 		/* 8 bytes of resolution */
 
-		csrow->mtype = amd64_determine_memory_type(pvt, i);
+		mtype = amd64_determine_memory_type(pvt, i);
+
+		/*
+		 * determine whether CHIPKILL or JUST ECC or NO ECC is operating
+		 */
+		if (pvt->nbcfg & NBCFG_ECC_ENABLE)
+			edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL) ?
+				    EDAC_S4ECD4ED : EDAC_SECDED;
+		else
+			edac_mode = EDAC_NONE;
+
+		for (j = 0; j < pvt->channel_count; j++) {
+			csrow->channels[j].dimm->mtype = mtype;
+			csrow->channels[j].dimm->edac_mode = edac_mode;
+		}
 
 		debugf1("  for MC node %d csrow %d:\n", pvt->mc_node_id, i);
 		debugf1("    input_addr_min: 0x%lx input_addr_max: 0x%lx\n",
@@ -2214,16 +2230,6 @@ static int init_csrows(struct mem_ctl_info *mci)
 			"last_page: 0x%lx\n",
 			(unsigned)csrow->nr_pages,
 			csrow->first_page, csrow->last_page);
-
-		/*
-		 * determine whether CHIPKILL or JUST ECC or NO ECC is operating
-		 */
-		if (pvt->nbcfg & NBCFG_ECC_ENABLE)
-			csrow->edac_mode =
-			    (pvt->nbcfg & NBCFG_CHIPKILL) ?
-			    EDAC_S4ECD4ED : EDAC_SECDED;
-		else
-			csrow->edac_mode = EDAC_NONE;
 	}
 
 	return empty;
diff --git a/drivers/edac/amd76x_edac.c b/drivers/edac/amd76x_edac.c
index e47e73b..2a63ed0 100644
--- a/drivers/edac/amd76x_edac.c
+++ b/drivers/edac/amd76x_edac.c
@@ -186,11 +186,13 @@ static void amd76x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 			enum edac_type edac_mode)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	u32 mba, mba_base, mba_mask, dms;
 	int index;
 
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
 
 		/* find the DRAM Chip Select Base address and mask */
 		pci_read_config_dword(pdev,
@@ -206,10 +208,10 @@ static void amd76x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		csrow->nr_pages = (mba_mask + 1) >> PAGE_SHIFT;
 		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
 		csrow->page_mask = mba_mask >> PAGE_SHIFT;
-		csrow->grain = csrow->nr_pages << PAGE_SHIFT;
-		csrow->mtype = MEM_RDDR;
-		csrow->dtype = ((dms >> index) & 0x1) ? DEV_X4 : DEV_UNKNOWN;
-		csrow->edac_mode = edac_mode;
+		dimm->grain = csrow->nr_pages << PAGE_SHIFT;
+		dimm->mtype = MEM_RDDR;
+		dimm->dtype = ((dms >> index) & 0x1) ? DEV_X4 : DEV_UNKNOWN;
+		dimm->edac_mode = edac_mode;
 	}
 }
 
diff --git a/drivers/edac/cell_edac.c b/drivers/edac/cell_edac.c
index 9a6a274..94fbb12 100644
--- a/drivers/edac/cell_edac.c
+++ b/drivers/edac/cell_edac.c
@@ -124,8 +124,10 @@ static void cell_edac_check(struct mem_ctl_info *mci)
 static void __devinit cell_edac_init_csrows(struct mem_ctl_info *mci)
 {
 	struct csrow_info		*csrow = &mci->csrows[0];
+	struct dimm_info		*dimm;
 	struct cell_edac_priv		*priv = mci->pvt_info;
 	struct device_node		*np;
+	int				j;
 
 	for (np = NULL;
 	     (np = of_find_node_by_name(np, "memory")) != NULL;) {
@@ -142,8 +144,12 @@ static void __devinit cell_edac_init_csrows(struct mem_ctl_info *mci)
 		csrow->first_page = r.start >> PAGE_SHIFT;
 		csrow->nr_pages = resource_size(&r) >> PAGE_SHIFT;
 		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
-		csrow->mtype = MEM_XDR;
-		csrow->edac_mode = EDAC_SECDED;
+
+		for (j = 0; j < csrow->nr_channels; j++) {
+			dimm = csrow->channels[j].dimm;
+			dimm->mtype = MEM_XDR;
+			dimm->edac_mode = EDAC_SECDED;
+		}
 		dev_dbg(mci->dev,
 			"Initialized on node %d, chanmask=0x%x,"
 			" first_page=0x%lx, nr_pages=0x%x\n",
diff --git a/drivers/edac/cpc925_edac.c b/drivers/edac/cpc925_edac.c
index a774c0d..ee90f3d 100644
--- a/drivers/edac/cpc925_edac.c
+++ b/drivers/edac/cpc925_edac.c
@@ -329,7 +329,8 @@ static void cpc925_init_csrows(struct mem_ctl_info *mci)
 {
 	struct cpc925_mc_pdata *pdata = mci->pvt_info;
 	struct csrow_info *csrow;
-	int index;
+	struct dimm_info *dimm;
+	int index, j;
 	u32 mbmr, mbbar, bba;
 	unsigned long row_size, last_nr_pages = 0;
 
@@ -354,32 +355,35 @@ static void cpc925_init_csrows(struct mem_ctl_info *mci)
 		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
 		last_nr_pages = csrow->last_page + 1;
 
-		csrow->mtype = MEM_RDDR;
-		csrow->edac_mode = EDAC_SECDED;
-
-		switch (csrow->nr_channels) {
-		case 1: /* Single channel */
-			csrow->grain = 32; /* four-beat burst of 32 bytes */
-			break;
-		case 2: /* Dual channel */
-		default:
-			csrow->grain = 64; /* four-beat burst of 64 bytes */
-			break;
-		}
-
-		switch ((mbmr & MBMR_MODE_MASK) >> MBMR_MODE_SHIFT) {
-		case 6: /* 0110, no way to differentiate X8 VS X16 */
-		case 5:	/* 0101 */
-		case 8: /* 1000 */
-			csrow->dtype = DEV_X16;
-			break;
-		case 7: /* 0111 */
-		case 9: /* 1001 */
-			csrow->dtype = DEV_X8;
-			break;
-		default:
-			csrow->dtype = DEV_UNKNOWN;
-			break;
+		for (j = 0; j < csrow->nr_channels; j++) {
+			dimm = csrow->channels[j].dimm;
+			dimm->mtype = MEM_RDDR;
+			dimm->edac_mode = EDAC_SECDED;
+
+			switch (csrow->nr_channels) {
+			case 1: /* Single channel */
+				dimm->grain = 32; /* four-beat burst of 32 bytes */
+				break;
+			case 2: /* Dual channel */
+			default:
+				dimm->grain = 64; /* four-beat burst of 64 bytes */
+				break;
+			}
+
+			switch ((mbmr & MBMR_MODE_MASK) >> MBMR_MODE_SHIFT) {
+			case 6: /* 0110, no way to differentiate X8 VS X16 */
+			case 5:	/* 0101 */
+			case 8: /* 1000 */
+				dimm->dtype = DEV_X16;
+				break;
+			case 7: /* 0111 */
+			case 9: /* 1001 */
+				dimm->dtype = DEV_X8;
+				break;
+			default:
+				dimm->dtype = DEV_UNKNOWN;
+				break;
+			}
 		}
 	}
 }
@@ -962,9 +966,9 @@ static int __devinit cpc925_probe(struct platform_device *pdev)
 		goto err2;
 	}
 
-	nr_channels = cpc925_mc_get_channels(vbase);
+	nr_channels = cpc925_mc_get_channels(vbase) + 1;
 	mci = edac_mc_alloc(sizeof(struct cpc925_mc_pdata),
-			CPC925_NR_CSROWS, nr_channels + 1, edac_mc_idx);
+			CPC925_NR_CSROWS, nr_channels, edac_mc_idx);
 	if (!mci) {
 		cpc925_printk(KERN_ERR, "No memory for mem_ctl_info\n");
 		res = -ENOMEM;
diff --git a/drivers/edac/e752x_edac.c b/drivers/edac/e752x_edac.c
index 1af531a..db291ea 100644
--- a/drivers/edac/e752x_edac.c
+++ b/drivers/edac/e752x_edac.c
@@ -1044,7 +1044,7 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	int drc_drbg;		/* DRB granularity 0=64mb, 1=128mb */
 	int drc_ddim;		/* DRAM Data Integrity Mode 0=none, 2=edac */
 	u8 value;
-	u32 dra, drc, cumul_size;
+	u32 dra, drc, cumul_size, i;
 
 	dra = 0;
 	for (index = 0; index < 4; index++) {
@@ -1053,7 +1053,7 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		dra |= dra_reg << (index * 8);
 	}
 	pci_read_config_dword(pdev, E752X_DRC, &drc);
-	drc_chan = dual_channel_active(ddrcsr);
+	drc_chan = dual_channel_active(ddrcsr) ? 1 : 0;
 	drc_drbg = drc_chan + 1;	/* 128 in dual mode, 64 in single */
 	drc_ddim = (drc >> 20) & 0x3;
 
@@ -1080,24 +1080,28 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
-		csrow->mtype = MEM_RDDR;	/* only one type supported */
-		csrow->dtype = mem_dev ? DEV_X4 : DEV_X8;
-
-		/*
-		 * if single channel or x8 devices then SECDED
-		 * if dual channel and x4 then S4ECD4ED
-		 */
-		if (drc_ddim) {
-			if (drc_chan && mem_dev) {
-				csrow->edac_mode = EDAC_S4ECD4ED;
-				mci->edac_cap |= EDAC_FLAG_S4ECD4ED;
-			} else {
-				csrow->edac_mode = EDAC_SECDED;
-				mci->edac_cap |= EDAC_FLAG_SECDED;
-			}
-		} else
-			csrow->edac_mode = EDAC_NONE;
+
+		for (i = 0; i < drc_chan + 1; i++) {
+			struct dimm_info *dimm = csrow->channels[i].dimm;
+			dimm->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
+			dimm->mtype = MEM_RDDR;	/* only one type supported */
+			dimm->dtype = mem_dev ? DEV_X4 : DEV_X8;
+
+			/*
+			 * if single channel or x8 devices then SECDED
+			 * if dual channel and x4 then S4ECD4ED
+			 */
+			if (drc_ddim) {
+				if (drc_chan && mem_dev) {
+					dimm->edac_mode = EDAC_S4ECD4ED;
+					mci->edac_cap |= EDAC_FLAG_S4ECD4ED;
+				} else {
+					dimm->edac_mode = EDAC_SECDED;
+					mci->edac_cap |= EDAC_FLAG_SECDED;
+				}
+			} else
+				dimm->edac_mode = EDAC_NONE;
+		}
 	}
 }
 
diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 6ffb6d2..178d2af 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -347,11 +347,12 @@ static void e7xxx_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 			int dev_idx, u32 drc)
 {
 	unsigned long last_cumul_size;
-	int index;
+	int index, j;
 	u8 value;
 	u32 dra, cumul_size;
 	int drc_chan, drc_drbg, drc_ddim, mem_dev;
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 
 	pci_read_config_dword(pdev, E7XXX_DRA, &dra);
 	drc_chan = dual_channel_active(drc, dev_idx);
@@ -381,24 +382,29 @@ static void e7xxx_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
-		csrow->mtype = MEM_RDDR;	/* only one type supported */
-		csrow->dtype = mem_dev ? DEV_X4 : DEV_X8;
-
-		/*
-		 * if single channel or x8 devices then SECDED
-		 * if dual channel and x4 then S4ECD4ED
-		 */
-		if (drc_ddim) {
-			if (drc_chan && mem_dev) {
-				csrow->edac_mode = EDAC_S4ECD4ED;
-				mci->edac_cap |= EDAC_FLAG_S4ECD4ED;
-			} else {
-				csrow->edac_mode = EDAC_SECDED;
-				mci->edac_cap |= EDAC_FLAG_SECDED;
-			}
-		} else
-			csrow->edac_mode = EDAC_NONE;
+
+		for (j = 0; j < drc_chan + 1; j++) {
+			dimm = csrow->channels[j].dimm;
+
+			dimm->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
+			dimm->mtype = MEM_RDDR;	/* only one type supported */
+			dimm->dtype = mem_dev ? DEV_X4 : DEV_X8;
+
+			/*
+			 * if single channel or x8 devices then SECDED
+			 * if dual channel and x4 then S4ECD4ED
+			 */
+			if (drc_ddim) {
+				if (drc_chan && mem_dev) {
+					dimm->edac_mode = EDAC_S4ECD4ED;
+					mci->edac_cap |= EDAC_FLAG_S4ECD4ED;
+				} else {
+					dimm->edac_mode = EDAC_SECDED;
+					mci->edac_cap |= EDAC_FLAG_SECDED;
+				}
+			} else
+				dimm->edac_mode = EDAC_NONE;
+		}
 	}
 }
 
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 37dca79..3ceddae 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -47,8 +47,7 @@ static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 {
 	debugf4("\tchannel = %p\n", chan);
 	debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
-	debugf4("\tchannel->ce_count = %d\n", chan->ce_count);
-	if (chan->dimm)
+	debugf4("\tchannel->ce_count = %d\n", chan->dimm->ce_count);
 		debugf4("\tchannel->label = '%s'\n", chan->dimm->label);
 	debugf4("\tchannel->csrow = %p\n\n", chan->csrow);
 }
@@ -707,6 +706,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 {
 	unsigned long remapped_page;
 	char detail[80], *label = NULL;
+	u32 grain;
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -733,15 +733,15 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 		return;
 	}
 
-	if (mci->csrows[row].channels[channel].dimm)
-		label = mci->csrows[row].channels[channel].dimm->label;
+	label = mci->csrows[row].channels[channel].dimm->label;
+	grain = mci->csrows[row].channels[channel].dimm->grain;
 
 	/* Memory type dependent details about the error */
 	snprintf(detail, sizeof(detail),
 		 " (page 0x%lx, offset 0x%lx, grain %d, "
 		 "syndrome 0x%lx, row %d, channel %d)\n",
 		 page_frame_number, offset_in_page,
-		 mci->csrows[row].grain, syndrome, row, channel);
+		 grain, syndrome, row, channel);
 	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
 		       label, msg, detail);
 
@@ -751,11 +751,12 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 			"CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
 			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
 			page_frame_number, offset_in_page,
-			mci->csrows[row].grain, syndrome, row, channel,
+			grain, syndrome, row, channel,
 			label, msg);
 
 	mci->ce_count++;
 	mci->csrows[row].ce_count++;
+	mci->csrows[row].channels[channel].dimm->ce_count++;
 	mci->csrows[row].channels[channel].ce_count++;
 
 	if (mci->scrub_mode & SCRUB_SW_SRC) {
@@ -772,8 +773,7 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
 			mci->ctl_page_to_phys(mci, page_frame_number) :
 			page_frame_number;
 
-		edac_mc_scrub_block(remapped_page, offset_in_page,
-				mci->csrows[row].grain);
+		edac_mc_scrub_block(remapped_page, offset_in_page, grain);
 	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_ce);
@@ -801,6 +801,7 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 	int chan;
 	int chars;
 	char detail[80], *label = NULL;
+	u32 grain;
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
@@ -816,28 +817,24 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 		return;
 	}
 
-	if (mci->csrows[row].channels[0].dimm) {
-		label = mci->csrows[row].channels[0].dimm->label;
-		chars = snprintf(pos, len + 1, "%s", label);
-		len -= chars;
-		pos += chars;
-	}
+	grain = mci->csrows[row].channels[0].dimm->grain;
+	label = mci->csrows[row].channels[0].dimm->label;
+	chars = snprintf(pos, len + 1, "%s", label);
+	len -= chars;
+	pos += chars;
 
 	for (chan = 1; (chan < mci->csrows[row].nr_channels) && (len > 0);
 		chan++) {
-		if (mci->csrows[row].channels[chan].dimm) {
-			label = mci->csrows[row].channels[chan].dimm->label;
-			chars = snprintf(pos, len + 1, ":%s", label);
-			len -= chars;
-			pos += chars;
-		}
+		label = mci->csrows[row].channels[chan].dimm->label;
+		chars = snprintf(pos, len + 1, ":%s", label);
+		len -= chars;
+		pos += chars;
 	}
 
 	/* Memory type dependent details about the error */
 	snprintf(detail, sizeof(detail),
 		 "page 0x%lx, offset 0x%lx, grain %d, row %d ",
-		 page_frame_number, offset_in_page,
-	         mci->csrows[row].grain, row);
+		 page_frame_number, offset_in_page, grain, row);
 	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
 		       labels,
 		       msg, detail);
@@ -846,14 +843,13 @@ void edac_mc_handle_ue(struct mem_ctl_info *mci,
 		edac_mc_printk(mci, KERN_EMERG,
 			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
 			"labels \"%s\": %s\n", page_frame_number,
-			offset_in_page, mci->csrows[row].grain, row,
-			labels, msg);
+			offset_in_page, grain, row, labels, msg);
 
 	if (edac_mc_get_panic_on_ue())
 		panic("EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, "
 			"row %d, labels \"%s\": %s\n", mci->mc_idx,
 			page_frame_number, offset_in_page,
-			mci->csrows[row].grain, row, labels, msg);
+			grain, row, labels, msg);
 
 	mci->ue_count++;
 	mci->csrows[row].ue_count++;
@@ -930,15 +926,13 @@ void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
 	mci->csrows[csrow].ue_count++;
 
 	/* Generate the DIMM labels from the specified channels */
-	if (mci->csrows[csrow].channels[channela].dimm) {
-		label = mci->csrows[csrow].channels[channela].dimm->label;
-		chars = snprintf(pos, len + 1, "%s", label);
-		len -= chars;
-		pos += chars;
-	}
-	if (mci->csrows[csrow].channels[channela].dimm)
-		chars = snprintf(pos, len + 1, "-%s",
-				mci->csrows[csrow].channels[channelb].dimm->label);
+	label = mci->csrows[csrow].channels[channela].dimm->label;
+	chars = snprintf(pos, len + 1, "%s", label);
+	len -= chars;
+	pos += chars;
+
+	chars = snprintf(pos, len + 1, "-%s",
+			mci->csrows[csrow].channels[channelb].dimm->label);
 
 	/* Memory type dependent details about the error */
 	snprintf(detail, sizeof(detail),
@@ -995,8 +989,7 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 		 "(row %d, channel %d)\n",
 		 csrow, channel);
 
-	if (mci->csrows[csrow].channels[channel].dimm)
-		label = mci->csrows[csrow].channels[channel].dimm->label;
+	label = mci->csrows[csrow].channels[channel].dimm->label;
 
 	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
 		       label, msg, detail);
@@ -1009,6 +1002,7 @@ void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
 
 	mci->ce_count++;
 	mci->csrows[csrow].ce_count++;
+	mci->csrows[csrow].channels[channel].dimm->ce_count++;
 	mci->csrows[csrow].channels[channel].ce_count++;
 }
 EXPORT_SYMBOL(edac_mc_handle_fbd_ce);
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 54b24cb..1571d99 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -150,19 +150,19 @@ static ssize_t csrow_size_show(struct csrow_info *csrow, char *data,
 static ssize_t csrow_mem_type_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%s\n", mem_types[csrow->mtype]);
+	return sprintf(data, "%s\n", mem_types[csrow->channels[0].dimm->mtype]);
 }
 
 static ssize_t csrow_dev_type_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%s\n", dev_types[csrow->dtype]);
+	return sprintf(data, "%s\n", dev_types[csrow->channels[0].dimm->dtype]);
 }
 
 static ssize_t csrow_edac_mode_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%s\n", edac_caps[csrow->edac_mode]);
+	return sprintf(data, "%s\n", edac_caps[csrow->channels[0].dimm->edac_mode]);
 }
 
 /* show/store functions for DIMM Label attributes */
diff --git a/drivers/edac/i3000_edac.c b/drivers/edac/i3000_edac.c
index c0510b3..1498c5f 100644
--- a/drivers/edac/i3000_edac.c
+++ b/drivers/edac/i3000_edac.c
@@ -304,7 +304,7 @@ static int i3000_is_interleaved(const unsigned char *c0dra,
 static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	int rc;
-	int i;
+	int i, j;
 	struct mem_ctl_info *mci = NULL;
 	unsigned long last_cumul_size;
 	int interleaved, nr_channels;
@@ -386,19 +386,21 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 			cumul_size <<= 1;
 		debugf3("MC: %s(): (%d) cumul_size 0x%x\n",
 			__func__, i, cumul_size);
-		if (cumul_size == last_cumul_size) {
-			csrow->mtype = MEM_EMPTY;
+		if (cumul_size == last_cumul_size)
 			continue;
-		}
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = I3000_DEAP_GRAIN;
-		csrow->mtype = MEM_DDR2;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = EDAC_UNKNOWN;
+
+		for (j = 0; j < nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			dimm->grain = I3000_DEAP_GRAIN;
+			dimm->mtype = MEM_DDR2;
+			dimm->dtype = DEV_UNKNOWN;
+			dimm->edac_mode = EDAC_UNKNOWN;
+		}
 	}
 
 	/*
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index aa08497..38d1e87 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -330,7 +330,7 @@ static unsigned long drb_to_nr_pages(
 static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	int rc;
-	int i;
+	int i, j;
 	struct mem_ctl_info *mci = NULL;
 	unsigned long last_page;
 	u16 drbs[I3200_CHANNELS][I3200_RANKS_PER_CHANNEL];
@@ -386,20 +386,22 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 			i / I3200_RANKS_PER_CHANNEL,
 			i % I3200_RANKS_PER_CHANNEL);
 
-		if (nr_pages == 0) {
-			csrow->mtype = MEM_EMPTY;
+		if (nr_pages == 0)
 			continue;
-		}
 
 		csrow->first_page = last_page + 1;
 		last_page += nr_pages;
 		csrow->last_page = last_page;
 		csrow->nr_pages = nr_pages;
 
-		csrow->grain = nr_pages << PAGE_SHIFT;
-		csrow->mtype = MEM_DDR2;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = EDAC_UNKNOWN;
+		for (j = 0; j < nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+
+			dimm->grain = nr_pages << PAGE_SHIFT;
+			dimm->mtype = MEM_DDR2;
+			dimm->dtype = DEV_UNKNOWN;
+			dimm->edac_mode = EDAC_UNKNOWN;
+		}
 	}
 
 	i3200_clear_error_info(mci);
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index 4dc3ac2..e612f1e 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -1268,25 +1268,23 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 		p_csrow->last_page = 9 + csrow * 20;
 		p_csrow->page_mask = 0xFFF;
 
-		p_csrow->grain = 8;
-
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++) {
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
-		}
+			p_csrow->channels[channel].dimm->grain = 8;
 
-		p_csrow->nr_pages = csrow_megs << 8;
+			/* Assume DDR2 for now */
+			p_csrow->channels[channel].dimm->mtype = MEM_FB_DDR2;
 
-		/* Assume DDR2 for now */
-		p_csrow->mtype = MEM_FB_DDR2;
+			/* ask what device type on this row */
+			if (MTR_DRAM_WIDTH(mtr))
+				p_csrow->channels[channel].dimm->dtype = DEV_X8;
+			else
+				p_csrow->channels[channel].dimm->dtype = DEV_X4;
 
-		/* ask what device type on this row */
-		if (MTR_DRAM_WIDTH(mtr))
-			p_csrow->dtype = DEV_X8;
-		else
-			p_csrow->dtype = DEV_X4;
-
-		p_csrow->edac_mode = EDAC_S8ECD8ED;
+			p_csrow->channels[channel].dimm->edac_mode = EDAC_S8ECD8ED;
+		}
+		p_csrow->nr_pages = csrow_megs << 8;
 
 		empty = 0;
 	}
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 52939ca..1884c36 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -867,27 +867,21 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 		 */
 		mci->csrows[i].first_page = total_pages;
 		mci->csrows[i].last_page = total_pages + npages - 1;
-		mci->csrows[i].page_mask = 0UL;
-
 		mci->csrows[i].nr_pages = npages;
-		mci->csrows[i].grain = 32;
 		mci->csrows[i].csrow_idx = i;
-		mci->csrows[i].dtype =
-			(priv->mtr[chan][rank].width == 4) ? DEV_X4 : DEV_X8;
-		mci->csrows[i].ue_count = 0;
-		mci->csrows[i].ce_count = 0;
-		mci->csrows[i].mtype = MEM_RDDR2;
-		mci->csrows[i].edac_mode = EDAC_SECDED;
 		mci->csrows[i].mci = mci;
 		mci->csrows[i].nr_channels = 1;
-		mci->csrows[i].channels[0].chan_idx = 0;
-		mci->csrows[i].channels[0].ce_count = 0;
 		mci->csrows[i].channels[0].csrow = mci->csrows + i;
 		total_pages += npages;
 
 		mci->csrows[i].channels[0].dimm = dimm;
 		dimm->location.mc_channel = chan;
 		dimm->location.mc_dimm_number = rank;
+		dimm->grain = 32;
+		dimm->dtype = (priv->mtr[chan][rank].width == 4) ?
+			      DEV_X4 : DEV_X8;
+		dimm->mtype = MEM_RDDR2;
+		dimm->edac_mode = EDAC_SECDED;
 		snprintf(dimm->label, sizeof(dimm->label),
 			 "DIMM%u",
 			 i5100_rank_to_slot(mci, chan, rank));
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 92af805..35784f2 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1166,11 +1166,6 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 		p_csrow->last_page = 9 + csrow * 20;
 		p_csrow->page_mask = 0xFFF;
 
-		p_csrow->grain = 8;
-		p_csrow->dtype = MTR_DRAM_WIDTH(mtr) ? DEV_X8 : DEV_X4;
-		p_csrow->mtype = MEM_RDDR2;
-		p_csrow->edac_mode = EDAC_SECDED;
-
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++)
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
@@ -1179,6 +1174,10 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 
 		dimm->location.mc_channel = channel;
 		dimm->location.mc_dimm_number = csrow / pvt->maxch;
+		dimm->grain = 8;
+		dimm->dtype = MTR_DRAM_WIDTH(mtr) ? DEV_X8 : DEV_X4;
+		dimm->mtype = MEM_RDDR2;
+		dimm->edac_mode = EDAC_SECDED;
 		mci->nr_dimms++;
 		dimm++;
 
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index ddd5842..21a8c35 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -618,6 +618,7 @@ static int decode_mtr(struct i7300_pvt *pvt,
 		      int slot, int ch, int branch,
 		      struct i7300_dimm_info *dinfo,
 		      struct csrow_info *p_csrow,
+		      struct dimm_info *dimm,
 		      u32 *nr_pages)
 {
 	int mtr, ans, addrBits, channel;
@@ -663,10 +664,7 @@ static int decode_mtr(struct i7300_pvt *pvt,
 	debugf2("\t\tNUMCOL: %s\n", numcol_toString[MTR_DIMM_COLS(mtr)]);
 	debugf2("\t\tSIZE: %d MB\n", dinfo->megabytes);
 
-	p_csrow->grain = 8;
-	p_csrow->mtype = MEM_FB_DDR2;
 	p_csrow->csrow_idx = slot;
-	p_csrow->page_mask = 0;
 
 	/*
 	 * The type of error detection actually depends of the
@@ -677,15 +675,17 @@ static int decode_mtr(struct i7300_pvt *pvt,
 	 * See datasheet Sections 7.3.6 to 7.3.8
 	 */
 
+	dimm->grain = 8;
+	dimm->mtype = MEM_FB_DDR2;
 	if (IS_SINGLE_MODE(pvt->mc_settings_a)) {
-		p_csrow->edac_mode = EDAC_SECDED;
+		dimm->edac_mode = EDAC_SECDED;
 		debugf2("\t\tECC code is 8-byte-over-32-byte SECDED+ code\n");
 	} else {
 		debugf2("\t\tECC code is on Lockstep mode\n");
 		if (MTR_DRAM_WIDTH(mtr) == 8)
-			p_csrow->edac_mode = EDAC_S8ECD8ED;
+			dimm->edac_mode = EDAC_S8ECD8ED;
 		else
-			p_csrow->edac_mode = EDAC_S4ECD4ED;
+			dimm->edac_mode = EDAC_S4ECD4ED;
 	}
 
 	/* ask what device type on this row */
@@ -694,9 +694,9 @@ static int decode_mtr(struct i7300_pvt *pvt,
 			IS_SCRBALGO_ENHANCED(pvt->mc_settings) ?
 					    "enhanced" : "normal");
 
-		p_csrow->dtype = DEV_X8;
+		dimm->dtype = DEV_X8;
 	} else
-		p_csrow->dtype = DEV_X4;
+		dimm->dtype = DEV_X4;
 
 	return mtr;
 }
@@ -824,7 +824,8 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 				dimm->location.mc_dimm_number = slot;
 
 				mtr = decode_mtr(pvt, slot, ch, branch,
-						 dinfo, p_csrow, &nr_pages);
+						 dinfo, p_csrow, dimm,
+						 &nr_pages);
 				/* if no DIMMS on this row, continue */
 				if (!MTR_DIMMS_PRESENT(mtr))
 					continue;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index d6dd9bf..66879e6 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -725,7 +725,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			csr->nr_pages = npages;
 
 			csr->page_mask = 0;
-			csr->grain = 8;
 			csr->csrow_idx = csrow;
 			csr->nr_channels = 1;
 
@@ -736,30 +735,31 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 			switch (banks) {
 			case 4:
-				csr->dtype = DEV_X4;
+				dimm->dtype = DEV_X4;
 				break;
 			case 8:
-				csr->dtype = DEV_X8;
+				dimm->dtype = DEV_X8;
 				break;
 			case 16:
-				csr->dtype = DEV_X16;
+				dimm->dtype = DEV_X16;
 				break;
 			default:
-				csr->dtype = DEV_UNKNOWN;
+				dimm->dtype = DEV_UNKNOWN;
 			}
 
 			csr->channels[0].dimm = dimm;
+
 			dimm->location.mc_channel = i;
 			dimm->location.mc_dimm_number = j;
 			snprintf(dimm->label, sizeof(dimm->label),
 				 "CPU#%uChannel#%u_DIMM#%u",
 				 pvt->i7core_dev->socket, i, j);
+			dimm->grain = 8;
+			dimm->edac_mode = mode;
+			dimm->mtype = mtype;
+
 			mci->nr_dimms++;
 			dimm++;
-
-			csr->edac_mode = mode;
-			csr->mtype = mtype;
-
 			csrow++;
 		}
 
diff --git a/drivers/edac/i82443bxgx_edac.c b/drivers/edac/i82443bxgx_edac.c
index 4329d39..1e19492 100644
--- a/drivers/edac/i82443bxgx_edac.c
+++ b/drivers/edac/i82443bxgx_edac.c
@@ -12,7 +12,7 @@
  * 440GX fix by Jason Uhlenkott <juhlenko@akamai.com>.
  *
  * Written with reference to 82443BX Host Bridge Datasheet:
- * http://download.intel.com/design/chipsets/datashts/29063301.pdf 
+ * http://download.intel.com/design/chipsets/datashts/29063301.pdf
  * references to this document given in [].
  *
  * This module doesn't support the 440LX, but it may be possible to
@@ -189,6 +189,7 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 				enum mem_type mtype)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	int index;
 	u8 drbar, dramc;
 	u32 row_base, row_high_limit, row_high_limit_last;
@@ -197,6 +198,8 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 	row_high_limit_last = 0;
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
+
 		pci_read_config_byte(pdev, I82443BXGX_DRB + index, &drbar);
 		debugf1("MC%d: %s: %s() Row=%d DRB = %#0x\n",
 			mci->mc_idx, __FILE__, __func__, index, drbar);
@@ -219,12 +222,12 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 		csrow->last_page = (row_high_limit >> PAGE_SHIFT) - 1;
 		csrow->nr_pages = csrow->last_page - csrow->first_page + 1;
 		/* EAP reports in 4kilobyte granularity [61] */
-		csrow->grain = 1 << 12;
-		csrow->mtype = mtype;
+		dimm->grain = 1 << 12;
+		dimm->mtype = mtype;
 		/* I don't think 440BX can tell you device type? FIXME? */
-		csrow->dtype = DEV_UNKNOWN;
+		dimm->dtype = DEV_UNKNOWN;
 		/* Mode is global to all rows on 440BX */
-		csrow->edac_mode = edac_mode;
+		dimm->edac_mode = edac_mode;
 		row_high_limit_last = row_high_limit;
 	}
 }
diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index 931a057..acbd924 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -140,6 +140,7 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 	u16 value;
 	u32 cumul_size;
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	int index;
 
 	pci_read_config_word(pdev, I82860_MCHCFG, &mchcfg_ddim);
@@ -153,6 +154,8 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 	 */
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
+
 		pci_read_config_word(pdev, I82860_GBA + index * 2, &value);
 		cumul_size = (value & I82860_GBA_MASK) <<
 			(I82860_GBA_SHIFT - PAGE_SHIFT);
@@ -166,10 +169,10 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = 1 << 12;	/* I82860_EAP has 4KiB reolution */
-		csrow->mtype = MEM_RMBS;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = mchcfg_ddim ? EDAC_SECDED : EDAC_NONE;
+		dimm->grain = 1 << 12;	/* I82860_EAP has 4KiB reolution */
+		dimm->mtype = MEM_RMBS;
+		dimm->dtype = DEV_UNKNOWN;
+		dimm->edac_mode = mchcfg_ddim ? EDAC_SECDED : EDAC_NONE;
 	}
 }
 
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index 33864c6..81f79e2 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -342,11 +342,13 @@ static void i82875p_init_csrows(struct mem_ctl_info *mci,
 				void __iomem * ovrfl_window, u32 drc)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
+	unsigned nr_chans = dual_channel_active(drc) + 1;
 	unsigned long last_cumul_size;
 	u8 value;
 	u32 drc_ddim;		/* DRAM Data Integrity Mode 0=none,2=edac */
 	u32 cumul_size;
-	int index;
+	int index, j;
 
 	drc_ddim = (drc >> 18) & 0x1;
 	last_cumul_size = 0;
@@ -371,10 +373,15 @@ static void i82875p_init_csrows(struct mem_ctl_info *mci,
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = 1 << 12;	/* I82875P_EAP has 4KiB reolution */
-		csrow->mtype = MEM_DDR;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = drc_ddim ? EDAC_SECDED : EDAC_NONE;
+
+		for (j = 0; j < nr_chans; j++) {
+			dimm = csrow->channels[j].dimm;
+
+			dimm->grain = 1 << 12;	/* I82875P_EAP has 4KiB reolution */
+			dimm->mtype = MEM_DDR;
+			dimm->dtype = DEV_UNKNOWN;
+			dimm->edac_mode = drc_ddim ? EDAC_SECDED : EDAC_NONE;
+		}
 	}
 }
 
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index 47f023e..9e1bca5 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -365,6 +365,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	u32 cumul_size;
 	int index, chan;
 	struct dimm_info *dimm;
+	enum dev_type dtype;
 
 	last_cumul_size = 0;
 
@@ -402,6 +403,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		 *   [0-7] for single-channel; i.e. csrow->nr_channels = 1
 		 *   [0-3] for dual-channel; i.e. csrow->nr_channels = 2
 		 */
+		dtype = i82975x_dram_type(mch_window, index);
 		for (chan = 0; chan < csrow->nr_channels; chan++) {
 			mci->csrows[index].channels[chan].dimm = dimm;
 			dimm->location.csrow = index;
@@ -409,6 +411,10 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 			strncpy(csrow->channels[chan].dimm->label,
 					labels[(index >> 1) + (chan * 2)],
 					EDAC_MC_LABEL_LEN);
+			dimm->grain = 1 << 6;	/* I82975X_EAP has 64B resolution */
+			dimm->dtype = dtype;
+			dimm->mtype = MEM_DDR2; /* I82975x supports only DDR2 */
+			dimm->edac_mode = EDAC_SECDED; /* only supported */
 			dimm++;
 			mci->nr_dimms++;
 		}
@@ -420,10 +426,6 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		csrow->last_page = cumul_size - 1;
 		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
-		csrow->grain = 1 << 6;	/* I82975X_EAP has 64B resolution */
-		csrow->mtype = MEM_DDR2; /* I82975x supports only DDR2 */
-		csrow->dtype = i82975x_dram_type(mch_window, index);
-		csrow->edac_mode = EDAC_SECDED; /* only supported */
 	}
 }
 
diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 73464a6..fb92916 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -883,6 +883,7 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 {
 	struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	u32 sdram_ctl;
 	u32 sdtype;
 	enum mem_type mtype;
@@ -929,6 +930,8 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 		u32 end;
 
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
+
 		cs_bnds = in_be32(pdata->mc_vbase + MPC85XX_MC_CS_BNDS_0 +
 				  (index * MPC85XX_MC_CS_BNDS_OFS));
 
@@ -945,12 +948,12 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 		csrow->first_page = start;
 		csrow->last_page = end;
 		csrow->nr_pages = end + 1 - start;
-		csrow->grain = 8;
-		csrow->mtype = mtype;
-		csrow->dtype = DEV_UNKNOWN;
+		dimm->grain = 8;
+		dimm->mtype = mtype;
+		dimm->dtype = DEV_UNKNOWN;
 		if (sdram_ctl & DSC_X32_EN)
-			csrow->dtype = DEV_X32;
-		csrow->edac_mode = EDAC_SECDED;
+			dimm->dtype = DEV_X32;
+		dimm->edac_mode = EDAC_SECDED;
 	}
 }
 
diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
index 7e5ff36..12d7fe0 100644
--- a/drivers/edac/mv64x60_edac.c
+++ b/drivers/edac/mv64x60_edac.c
@@ -656,6 +656,8 @@ static void mv64x60_init_csrows(struct mem_ctl_info *mci,
 				struct mv64x60_mc_pdata *pdata)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
+
 	u32 devtype;
 	u32 ctl;
 
@@ -664,30 +666,30 @@ static void mv64x60_init_csrows(struct mem_ctl_info *mci,
 	ctl = in_le32(pdata->mc_vbase + MV64X60_SDRAM_CONFIG);
 
 	csrow = &mci->csrows[0];
-	csrow->first_page = 0;
+	dimm = csrow->channels[0].dimm;
 	csrow->nr_pages = pdata->total_mem >> PAGE_SHIFT;
 	csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
-	csrow->grain = 8;
+	dimm->grain = 8;
 
-	csrow->mtype = (ctl & MV64X60_SDRAM_REGISTERED) ? MEM_RDDR : MEM_DDR;
+	dimm->mtype = (ctl & MV64X60_SDRAM_REGISTERED) ? MEM_RDDR : MEM_DDR;
 
 	devtype = (ctl >> 20) & 0x3;
 	switch (devtype) {
 	case 0x0:
-		csrow->dtype = DEV_X32;
+		dimm->dtype = DEV_X32;
 		break;
 	case 0x2:		/* could be X8 too, but no way to tell */
-		csrow->dtype = DEV_X16;
+		dimm->dtype = DEV_X16;
 		break;
 	case 0x3:
-		csrow->dtype = DEV_X4;
+		dimm->dtype = DEV_X4;
 		break;
 	default:
-		csrow->dtype = DEV_UNKNOWN;
+		dimm->dtype = DEV_UNKNOWN;
 		break;
 	}
 
-	csrow->edac_mode = EDAC_SECDED;
+	dimm->edac_mode = EDAC_SECDED;
 }
 
 static int __devinit mv64x60_mc_err_probe(struct platform_device *pdev)
diff --git a/drivers/edac/pasemi_edac.c b/drivers/edac/pasemi_edac.c
index 7f71ee4..4e53270 100644
--- a/drivers/edac/pasemi_edac.c
+++ b/drivers/edac/pasemi_edac.c
@@ -135,11 +135,13 @@ static int pasemi_edac_init_csrows(struct mem_ctl_info *mci,
 				   enum edac_type edac_mode)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	u32 rankcfg;
 	int index;
 
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
 
 		pci_read_config_dword(pdev,
 				      MCDRAM_RANKCFG + (index * 12),
@@ -177,10 +179,10 @@ static int pasemi_edac_init_csrows(struct mem_ctl_info *mci,
 		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
 		last_page_in_mmc += csrow->nr_pages;
 		csrow->page_mask = 0;
-		csrow->grain = PASEMI_EDAC_ERROR_GRAIN;
-		csrow->mtype = MEM_DDR;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = edac_mode;
+		dimm->grain = PASEMI_EDAC_ERROR_GRAIN;
+		dimm->mtype = MEM_DDR;
+		dimm->dtype = DEV_UNKNOWN;
+		dimm->edac_mode = edac_mode;
 	}
 	return 0;
 }
diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 3840096..7c98f66 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -895,7 +895,7 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 	enum mem_type mtype;
 	enum dev_type dtype;
 	enum edac_type edac_mode;
-	int row;
+	int row, j;
 	u32 mbxcf, size;
 	static u32 ppc4xx_last_page;
 
@@ -975,15 +975,18 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 		 * possible values would be the PLB width (16), the
 		 * page size (PAGE_SIZE) or the memory width (2 or 4).
 		 */
+		for (j = 0; j < csi->nr_channels; j++) {
+			struct dimm_info *dimm = csi->channels[j].dimm;
 
-		csi->grain	= 1;
+			dimm->grain	= 1;
 
-		csi->mtype	= mtype;
-		csi->dtype	= dtype;
+			dimm->mtype	= mtype;
+			dimm->dtype	= dtype;
 
-		csi->edac_mode	= edac_mode;
+			dimm->edac_mode	= edac_mode;
 
 		ppc4xx_last_page += csi->nr_pages;
+		}
 	}
 
  done:
diff --git a/drivers/edac/r82600_edac.c b/drivers/edac/r82600_edac.c
index b153674..c8b774d 100644
--- a/drivers/edac/r82600_edac.c
+++ b/drivers/edac/r82600_edac.c
@@ -216,6 +216,7 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 			u8 dramcr)
 {
 	struct csrow_info *csrow;
+	struct dimm_info *dimm;
 	int index;
 	u8 drbar;		/* SDRAM Row Boundary Address Register */
 	u32 row_high_limit, row_high_limit_last;
@@ -227,6 +228,7 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
+		dimm = csrow->channels[0].dimm;
 
 		/* find the DRAM Chip Select Base address and mask */
 		pci_read_config_byte(pdev, R82600_DRBA + index, &drbar);
@@ -250,13 +252,13 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		csrow->nr_pages = csrow->last_page - csrow->first_page + 1;
 		/* Error address is top 19 bits - so granularity is      *
 		 * 14 bits                                               */
-		csrow->grain = 1 << 14;
-		csrow->mtype = reg_sdram ? MEM_RDDR : MEM_DDR;
+		dimm->grain = 1 << 14;
+		dimm->mtype = reg_sdram ? MEM_RDDR : MEM_DDR;
 		/* FIXME - check that this is unknowable with this chipset */
-		csrow->dtype = DEV_UNKNOWN;
+		dimm->dtype = DEV_UNKNOWN;
 
 		/* Mode is global on 82600 */
-		csrow->edac_mode = ecc_on ? EDAC_SECDED : EDAC_NONE;
+		dimm->edac_mode = ecc_on ? EDAC_SECDED : EDAC_NONE;
 		row_high_limit_last = row_high_limit;
 	}
 }
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 43fc65e..537a06e 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -637,22 +637,18 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 					pvt->sbridge_dev->mc, i, j,
 					size, npages,
 					banks, ranks, rows, cols);
-				csr = &mci->csrows[csrow];
 
+				/*
+				 * Fake stuff. This controller doesn't see
+				 * csrows.
+				 */
+				csr = &mci->csrows[csrow];
 				csr->first_page = last_page;
 				csr->last_page = last_page + npages - 1;
-				csr->page_mask = 0UL;	/* Unused */
 				csr->nr_pages = npages;
-				csr->grain = 32;
 				csr->csrow_idx = csrow;
-				csr->dtype = (banks == 8) ? DEV_X8 : DEV_X4;
-				csr->ce_count = 0;
-				csr->ue_count = 0;
-				csr->mtype = mtype;
-				csr->edac_mode = mode;
 				csr->nr_channels = 1;
 				csr->channels[0].chan_idx = i;
-				csr->channels[0].ce_count = 0;
 				pvt->csrow_map[i][j] = csrow;
 				last_page += npages;
 				csrow++;
@@ -660,6 +656,10 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				csr->channels[0].dimm = dimm;
 				dimm->location.mc_channel = i;
 				dimm->location.mc_dimm_number = j;
+				dimm->grain = 32;
+				dimm->dtype = (banks == 8) ? DEV_X8 : DEV_X4;
+				dimm->mtype = mtype;
+				dimm->edac_mode = mode;
 				snprintf(dimm->label, sizeof(dimm->label),
 					 "CPU_SrcID#%u_Channel#%u_DIMM#%u",
 					 pvt->sbridge_dev->source_id, i, j);
diff --git a/drivers/edac/tile_edac.c b/drivers/edac/tile_edac.c
index 1d5cf06..db7d2ae 100644
--- a/drivers/edac/tile_edac.c
+++ b/drivers/edac/tile_edac.c
@@ -84,6 +84,7 @@ static int __devinit tile_edac_init_csrows(struct mem_ctl_info *mci)
 	struct csrow_info	*csrow = &mci->csrows[0];
 	struct tile_edac_priv	*priv = mci->pvt_info;
 	struct mshim_mem_info	mem_info;
+	struct dimm_info *dimm = csrow->channels[0].dimm;
 
 	if (hv_dev_pread(priv->hv_devhdl, 0, (HV_VirtAddr)&mem_info,
 		sizeof(struct mshim_mem_info), MSHIM_MEM_INFO_OFF) !=
@@ -93,16 +94,16 @@ static int __devinit tile_edac_init_csrows(struct mem_ctl_info *mci)
 	}
 
 	if (mem_info.mem_ecc)
-		csrow->edac_mode = EDAC_SECDED;
+		dimm->edac_mode = EDAC_SECDED;
 	else
-		csrow->edac_mode = EDAC_NONE;
+		dimm->edac_mode = EDAC_NONE;
 	switch (mem_info.mem_type) {
 	case DDR2:
-		csrow->mtype = MEM_DDR2;
+		dimm->mtype = MEM_DDR2;
 		break;
 
 	case DDR3:
-		csrow->mtype = MEM_DDR3;
+		dimm->mtype = MEM_DDR3;
 		break;
 
 	default:
@@ -112,8 +113,8 @@ static int __devinit tile_edac_init_csrows(struct mem_ctl_info *mci)
 	csrow->first_page = 0;
 	csrow->nr_pages = mem_info.mem_size >> PAGE_SHIFT;
 	csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
-	csrow->grain = TILE_EDAC_ERROR_GRAIN;
-	csrow->dtype = DEV_UNKNOWN;
+	dimm->grain = TILE_EDAC_ERROR_GRAIN;
+	dimm->dtype = DEV_UNKNOWN;
 
 	return 0;
 }
diff --git a/drivers/edac/x38_edac.c b/drivers/edac/x38_edac.c
index b6f47de..52c8d69 100644
--- a/drivers/edac/x38_edac.c
+++ b/drivers/edac/x38_edac.c
@@ -317,7 +317,7 @@ static unsigned long drb_to_nr_pages(
 static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	int rc;
-	int i;
+	int i, j;
 	struct mem_ctl_info *mci = NULL;
 	unsigned long last_page;
 	u16 drbs[X38_CHANNELS][X38_RANKS_PER_CHANNEL];
@@ -372,20 +372,21 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 			i / X38_RANKS_PER_CHANNEL,
 			i % X38_RANKS_PER_CHANNEL);
 
-		if (nr_pages == 0) {
-			csrow->mtype = MEM_EMPTY;
+		if (nr_pages == 0)
 			continue;
-		}
 
 		csrow->first_page = last_page + 1;
 		last_page += nr_pages;
 		csrow->last_page = last_page;
 		csrow->nr_pages = nr_pages;
 
-		csrow->grain = nr_pages << PAGE_SHIFT;
-		csrow->mtype = MEM_DDR2;
-		csrow->dtype = DEV_UNKNOWN;
-		csrow->edac_mode = EDAC_UNKNOWN;
+		for (j = 0; j < x38_channel_num; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			dimm->grain = nr_pages << PAGE_SHIFT;
+			dimm->mtype = MEM_DDR2;
+			dimm->dtype = DEV_UNKNOWN;
+			dimm->edac_mode = EDAC_UNKNOWN;
+		}
 	}
 
 	x38_clear_error_info(mci);
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0f700e3..4e6420c 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -362,6 +362,13 @@ struct dimm_info {
 	} location;
 	struct kobject kobj;		/* sysfs kobject for this csrow */
 	struct mem_ctl_info *mci;	/* the parent */
+
+	u32 grain;		/* granularity of reported error in bytes */
+	enum dev_type dtype;	/* memory device type */
+	enum mem_type mtype;	/* memory dimm type */
+	enum edac_type edac_mode;	/* EDAC mode for this dimm */
+
+	u32 ce_count;		/* Correctable Errors for this dimm */
 };
 
 struct csrow_channel_info {
@@ -372,19 +379,17 @@ struct csrow_channel_info {
 };
 
 struct csrow_info {
-	unsigned long first_page;	/* first page number in dimm */
-	unsigned long last_page;	/* last page number in dimm */
+	unsigned long first_page;	/* first page number in csrow */
+	unsigned long last_page;	/* last page number in csrow */
+	u32 nr_pages;			/* number of pages in csrow */
 	unsigned long page_mask;	/* used for interleaving -
 					 * 0UL for non intlv
 					 */
-	u32 nr_pages;		/* number of pages in csrow */
-	u32 grain;		/* granularity of reported error in bytes */
-	int csrow_idx;		/* the chip-select row */
-	enum dev_type dtype;	/* memory device type */
+	int csrow_idx;			/* the chip-select row */
+
 	u32 ue_count;		/* Uncorrectable Errors for this csrow */
 	u32 ce_count;		/* Correctable Errors for this csrow */
-	enum mem_type mtype;	/* memory csrow type */
-	enum edac_type edac_mode;	/* EDAC mode for this csrow */
+
 	struct mem_ctl_info *mci;	/* the parent */
 
 	struct kobject kobj;	/* sysfs kobject for this csrow */
-- 
1.7.8

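Across the drivers in this patch, the same pattern repeats. For every csrow, the driver walks its channels and sets `grain`, `mtype`, `dtype` and `edac_mode` on the `dimm_info` attached to each channel, instead of on the csrow itself. The sketch below shows that pattern in plain userspace C. It is not kernel code. The structures are hypothetical, cut-down stand-ins that keep only the fields this patch adds to `struct dimm_info`. The enum values are placeholders, not the real `include/linux/edac.h` definitions. `init_csrow_dimms()` is a hypothetical helper modelled on the i3200/x38 hunks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the kernel enums (placeholder values). */
enum mem_type  { MEM_EMPTY, MEM_DDR, MEM_DDR2 };
enum dev_type  { DEV_UNKNOWN, DEV_X4, DEV_X8 };
enum edac_type { EDAC_UNKNOWN, EDAC_NONE, EDAC_SECDED };

#define SKETCH_PAGE_SHIFT   12	/* assumption: 4 KiB pages */
#define SKETCH_MAX_CHANNELS 2	/* assumption: at most two channels */

/* Only the per-DIMM properties added by this patch. */
struct dimm_info {
	uint32_t grain;			/* granularity of reported error in bytes */
	enum dev_type dtype;		/* memory device type */
	enum mem_type mtype;		/* memory dimm type */
	enum edac_type edac_mode;	/* EDAC mode for this dimm */
};

struct csrow_channel_info {
	struct dimm_info *dimm;
};

struct csrow_info {
	uint32_t nr_pages;
	int nr_channels;
	struct csrow_channel_info channels[SKETCH_MAX_CHANNELS];
};

/*
 * Mirrors the i3200/x38 hunks: properties that used to live in
 * struct csrow_info are now set on every channel's DIMM.
 */
static void init_csrow_dimms(struct csrow_info *csrow)
{
	int j;

	assert(csrow->nr_channels <= SKETCH_MAX_CHANNELS);
	for (j = 0; j < csrow->nr_channels; j++) {
		struct dimm_info *dimm = csrow->channels[j].dimm;

		dimm->grain = csrow->nr_pages << SKETCH_PAGE_SHIFT;
		dimm->mtype = MEM_DDR2;
		dimm->dtype = DEV_UNKNOWN;
		dimm->edac_mode = EDAC_UNKNOWN;
	}
}
```

Each DIMM now carries its own type and ECC mode. A driver that can tell DIMMs apart, as i5000 does per channel, therefore no longer has to report a single value for the whole csrow.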


* [PATCH v3 12/31] edac: Don't initialize csrow's first_page & friends when not needed
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 11/31] edac: move dimm properties to struct dimm_info Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 13/31] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Almost all edac drivers initialize first_page, last_page and
page_mask. Those vars are used inside the EDAC core, in order to
calculate the csrow affected by an error, by using the routine
edac_mc_find_csrow_by_page().

However, very few drivers actually use it:
        e752x_edac.c
        e7xxx_edac.c
        i3000_edac.c
        i82443bxgx_edac.c
        i82860_edac.c
        i82875p_edac.c
        i82975x_edac.c
        r82600_edac.c

There are also a few other drivers that use those vars in their own
internal address calculations.

All the others are just wasting time by initializing that
data.
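The lookup that consumes those three fields can be sketched as follows. This is a simplified userspace restatement of the matching rule, not the EDAC core source. `struct csrow_range` and `find_csrow_by_page()` are hypothetical names. The sketch assumes the core's rule: a page belongs to a csrow when it lies in [first_page, last_page] and its masked bits equal the masked bits of first_page. A page_mask of 0 disables that interleave check. Rows with no pages are skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced csrow: only the fields the lookup reads. */
struct csrow_range {
	unsigned long first_page;	/* first page number in csrow */
	unsigned long last_page;	/* last page number in csrow */
	unsigned long page_mask;	/* 0UL for non-interleaved rows */
	unsigned int nr_pages;		/* 0 means the row is empty */
};

/* Return the index of the csrow containing @page, or -1 if none does. */
static int find_csrow_by_page(const struct csrow_range *rows, int nr_rows,
			      unsigned long page)
{
	int i;

	assert(rows != NULL || nr_rows == 0);
	for (i = 0; i < nr_rows; i++) {
		const struct csrow_range *r = &rows[i];

		if (r->nr_pages == 0)
			continue;	/* skip empty rows */
		if (page < r->first_page || page > r->last_page)
			continue;	/* outside this row's page range */
		/* interleave check: masked bits must match the row's base */
		if ((page & r->page_mask) == (r->first_page & r->page_mask))
			return i;
	}
	return -1;
}
```

With two rows that interleave on bit 0 (page_mask = 1, first_page 0 and 1), even pages resolve to the first row and odd pages to the second. Only drivers that reach this lookup, or that read the fields themselves, get anything out of filling them in.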

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c   |   38 ++------------------------------------
 drivers/edac/i3200_edac.c   |    5 -----
 drivers/edac/i5000_edac.c   |    5 -----
 drivers/edac/i5100_edac.c   |    2 --
 drivers/edac/i5400_edac.c   |    5 -----
 drivers/edac/i7300_edac.c   |    5 +----
 drivers/edac/i7core_edac.c  |    5 -----
 drivers/edac/mv64x60_edac.c |    1 -
 drivers/edac/ppc4xx_edac.c  |    7 -------
 drivers/edac/sb_edac.c      |    2 --
 drivers/edac/tile_edac.c    |    2 --
 drivers/edac/x38_edac.c     |    5 -----
 12 files changed, 3 insertions(+), 79 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3e7bddc..b1b1551 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -715,25 +715,6 @@ static inline u64 input_addr_to_sys_addr(struct mem_ctl_info *mci,
 				     input_addr_to_dram_addr(mci, input_addr));
 }
 
-/*
- * Find the minimum and maximum InputAddr values that map to the given @csrow.
- * Pass back these values in *input_addr_min and *input_addr_max.
- */
-static void find_csrow_limits(struct mem_ctl_info *mci, int csrow,
-			      u64 *input_addr_min, u64 *input_addr_max)
-{
-	struct amd64_pvt *pvt;
-	u64 base, mask;
-
-	pvt = mci->pvt_info;
-	BUG_ON((csrow < 0) || (csrow >= pvt->csels[0].b_cnt));
-
-	get_cs_base_and_mask(pvt, csrow, 0, &base, &mask);
-
-	*input_addr_min = base & ~mask;
-	*input_addr_max = base | mask;
-}
-
 /* Map the Error address to a PAGE and PAGE OFFSET. */
 static inline void error_address_to_page_and_offset(u64 error_address,
 						    u32 *page, u32 *offset)
@@ -2166,7 +2147,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 {
 	struct csrow_info *csrow;
 	struct amd64_pvt *pvt = mci->pvt_info;
-	u64 input_addr_min, input_addr_max, sys_addr, base, mask;
+	u64 base, mask;
 	u32 val;
 	int i, j, empty = 1;
 	enum mem_type mtype;
@@ -2194,14 +2175,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 
 		empty = 0;
 		csrow->nr_pages = amd64_csrow_nr_pages(pvt, 0, i);
-		find_csrow_limits(mci, i, &input_addr_min, &input_addr_max);
-		sys_addr = input_addr_to_sys_addr(mci, input_addr_min);
-		csrow->first_page = (u32) (sys_addr >> PAGE_SHIFT);
-		sys_addr = input_addr_to_sys_addr(mci, input_addr_max);
-		csrow->last_page = (u32) (sys_addr >> PAGE_SHIFT);
-
 		get_cs_base_and_mask(pvt, i, 0, &base, &mask);
-		csrow->page_mask = ~mask;
 		/* 8 bytes of resolution */
 
 		mtype = amd64_determine_memory_type(pvt, i);
@@ -2221,15 +2195,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 		}
 
 		debugf1("  for MC node %d csrow %d:\n", pvt->mc_node_id, i);
-		debugf1("    input_addr_min: 0x%lx input_addr_max: 0x%lx\n",
-			(unsigned long)input_addr_min,
-			(unsigned long)input_addr_max);
-		debugf1("    sys_addr: 0x%lx  page_mask: 0x%lx\n",
-			(unsigned long)sys_addr, csrow->page_mask);
-		debugf1("    nr_pages: %u  first_page: 0x%lx "
-			"last_page: 0x%lx\n",
-			(unsigned)csrow->nr_pages,
-			csrow->first_page, csrow->last_page);
+		debugf1("    nr_pages: %u\n", csrow->nr_pages);
 	}
 
 	return empty;
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index 38d1e87..8086693 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -332,7 +332,6 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
-	unsigned long last_page;
 	u16 drbs[I3200_CHANNELS][I3200_RANKS_PER_CHANNEL];
 	bool stacked;
 	void __iomem *window;
@@ -377,7 +376,6 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	 * cumulative; the last one will contain the total memory
 	 * contained in all ranks.
 	 */
-	last_page = -1UL;
 	for (i = 0; i < mci->nr_csrows; i++) {
 		unsigned long nr_pages;
 		struct csrow_info *csrow = &mci->csrows[i];
@@ -389,9 +387,6 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 		if (nr_pages == 0)
 			continue;
 
-		csrow->first_page = last_page + 1;
-		last_page += nr_pages;
-		csrow->last_page = last_page;
 		csrow->nr_pages = nr_pages;
 
 		for (j = 0; j < nr_channels; j++) {
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index e612f1e..f00f684 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -1263,11 +1263,6 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 		if (!MTR_DIMMS_PRESENT(mtr) && !MTR_DIMMS_PRESENT(mtr1))
 			continue;
 
-		/* FAKE OUT VALUES, FIXME */
-		p_csrow->first_page = 0 + csrow * 20;
-		p_csrow->last_page = 9 + csrow * 20;
-		p_csrow->page_mask = 0xFFF;
-
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++) {
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 1884c36..76489dc 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -865,8 +865,6 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 		 * FIXME: these two are totally bogus -- I don't see how to
 		 * map them correctly to this structure...
 		 */
-		mci->csrows[i].first_page = total_pages;
-		mci->csrows[i].last_page = total_pages + npages - 1;
 		mci->csrows[i].nr_pages = npages;
 		mci->csrows[i].csrow_idx = i;
 		mci->csrows[i].mci = mci;
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 35784f2..015a368 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1161,11 +1161,6 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 		if (!MTR_DIMMS_PRESENT(mtr))
 			continue;
 
-		/* FAKE OUT VALUES, FIXME */
-		p_csrow->first_page = 0 + csrow * 20;
-		p_csrow->last_page = 9 + csrow * 20;
-		p_csrow->page_mask = 0xFFF;
-
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++)
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 21a8c35..30453fa 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -778,7 +778,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 	int rc = -ENODEV;
 	int mtr;
 	int ch, branch, slot, channel;
-	u32 last_page = 0, nr_pages;
+	u32 nr_pages;
 	struct dimm_info *dimm;
 
 	pvt = mci->pvt_info;
@@ -832,9 +832,6 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 
 				/* Update per_csrow memory count */
 				p_csrow->nr_pages += nr_pages;
-				p_csrow->first_page = last_page;
-				last_page += nr_pages;
-				p_csrow->last_page = last_page;
 
 				rc = 0;
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 66879e6..4425ab9 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -599,7 +599,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 	struct pci_dev *pdev;
 	int i, j;
 	int csrow = 0;
-	unsigned long last_page = 0;
 	enum edac_type mode;
 	enum mem_type mtype;
 	struct dimm_info *dimm;
@@ -719,12 +718,8 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			npages = MiB_TO_PAGES(size);
 
 			csr = &mci->csrows[csrow];
-			csr->first_page = last_page + 1;
-			last_page += npages;
-			csr->last_page = last_page;
 			csr->nr_pages = npages;
 
-			csr->page_mask = 0;
 			csr->csrow_idx = csrow;
 			csr->nr_channels = 1;
 
diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
index 12d7fe0..d2e3c39 100644
--- a/drivers/edac/mv64x60_edac.c
+++ b/drivers/edac/mv64x60_edac.c
@@ -668,7 +668,6 @@ static void mv64x60_init_csrows(struct mem_ctl_info *mci,
 	csrow = &mci->csrows[0];
 	dimm = csrow->channels[0].dimm;
 	csrow->nr_pages = pdata->total_mem >> PAGE_SHIFT;
-	csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
 	dimm->grain = 8;
 
 	dimm->mtype = (ctl & MV64X60_SDRAM_REGISTERED) ? MEM_RDDR : MEM_DDR;
diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 7c98f66..6dc000e 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -897,7 +897,6 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 	enum edac_type edac_mode;
 	int row, j;
 	u32 mbxcf, size;
-	static u32 ppc4xx_last_page;
 
 	/* Establish the memory type and width */
 
@@ -959,10 +958,6 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 			goto done;
 		}
 
-		csi->first_page = ppc4xx_last_page;
-		csi->last_page	= csi->first_page + csi->nr_pages - 1;
-		csi->page_mask	= 0;
-
 		/*
 		 * It's unclear exactly what grain should be set to
 		 * here. The SDRAM_ECCES register allows resolution of
@@ -984,8 +979,6 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 			dimm->dtype	= dtype;
 
 			dimm->edac_mode	= edac_mode;
-
-		ppc4xx_last_page += csi->nr_pages;
 		}
 	}
 
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 537a06e..080ba3d 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -643,8 +643,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				 * csrows.
 				 */
 				csr = &mci->csrows[csrow];
-				csr->first_page = last_page;
-				csr->last_page = last_page + npages - 1;
 				csr->nr_pages = npages;
 				csr->csrow_idx = csrow;
 				csr->nr_channels = 1;
diff --git a/drivers/edac/tile_edac.c b/drivers/edac/tile_edac.c
index db7d2ae..ba0917b 100644
--- a/drivers/edac/tile_edac.c
+++ b/drivers/edac/tile_edac.c
@@ -110,9 +110,7 @@ static int __devinit tile_edac_init_csrows(struct mem_ctl_info *mci)
 		return -1;
 	}
 
-	csrow->first_page = 0;
 	csrow->nr_pages = mem_info.mem_size >> PAGE_SHIFT;
-	csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
 	dimm->grain = TILE_EDAC_ERROR_GRAIN;
 	dimm->dtype = DEV_UNKNOWN;
 
diff --git a/drivers/edac/x38_edac.c b/drivers/edac/x38_edac.c
index 52c8d69..7be10dd 100644
--- a/drivers/edac/x38_edac.c
+++ b/drivers/edac/x38_edac.c
@@ -319,7 +319,6 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
-	unsigned long last_page;
 	u16 drbs[X38_CHANNELS][X38_RANKS_PER_CHANNEL];
 	bool stacked;
 	void __iomem *window;
@@ -363,7 +362,6 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	 * cumulative; the last one will contain the total memory
 	 * contained in all ranks.
 	 */
-	last_page = -1UL;
 	for (i = 0; i < mci->nr_csrows; i++) {
 		unsigned long nr_pages;
 		struct csrow_info *csrow = &mci->csrows[i];
@@ -375,9 +373,6 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 		if (nr_pages == 0)
 			continue;
 
-		csrow->first_page = last_page + 1;
-		last_page += nr_pages;
-		csrow->last_page = last_page;
 		csrow->nr_pages = nr_pages;
 
 		for (j = 0; j < x38_channel_num; j++) {
-- 
1.7.8



* [PATCH v3 13/31] edac: move nr_pages to dimm struct
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The number of pages is a property of the DIMM, not of the csrow, so
move nr_pages from struct csrow_info to struct dimm_info. After this
change, per-DIMM sysfs nodes can be added that properly represent the
characteristics of each physical memory socket.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c      |   12 +++------
 drivers/edac/amd76x_edac.c     |    6 ++--
 drivers/edac/cell_edac.c       |    8 ++++--
 drivers/edac/cpc925_edac.c     |    8 ++++--
 drivers/edac/e752x_edac.c      |    6 +++-
 drivers/edac/e7xxx_edac.c      |    5 ++-
 drivers/edac/edac_mc.c         |   29 +++++++++++++---------
 drivers/edac/edac_mc_sysfs.c   |   52 ++++++++++++++++++++++++++++++----------
 drivers/edac/i3000_edac.c      |    6 +++-
 drivers/edac/i3200_edac.c      |    3 +-
 drivers/edac/i5000_edac.c      |   14 ++++++----
 drivers/edac/i5100_edac.c      |   10 +-------
 drivers/edac/i5400_edac.c      |    3 +-
 drivers/edac/i7300_edac.c      |   17 ++----------
 drivers/edac/i7core_edac.c     |    9 +-----
 drivers/edac/i82443bxgx_edac.c |    2 +-
 drivers/edac/i82860_edac.c     |    2 +-
 drivers/edac/i82875p_edac.c    |    5 ++-
 drivers/edac/i82975x_edac.c    |    6 +++-
 drivers/edac/mpc85xx_edac.c    |    3 +-
 drivers/edac/mv64x60_edac.c    |    3 +-
 drivers/edac/pasemi_edac.c     |   14 +++++-----
 drivers/edac/ppc4xx_edac.c     |    5 ++-
 drivers/edac/r82600_edac.c     |    3 +-
 drivers/edac/sb_edac.c         |    5 +---
 drivers/edac/tile_edac.c       |    2 +-
 drivers/edac/x38_edac.c        |    4 +-
 include/linux/edac.h           |   10 ++++---
 28 files changed, 135 insertions(+), 117 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b1b1551..613d5f1 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2126,14 +2126,8 @@ static u32 amd64_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr)
 
 	nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode) << (20 - PAGE_SHIFT);
 
-	/*
-	 * If dual channel then double the memory size of single channel.
-	 * Channel count is 1 or 2
-	 */
-	nr_pages <<= (pvt->channel_count - 1);
-
 	debugf0("  (csrow=%d) DBAM map index= %d\n", csrow_nr, cs_mode);
-	debugf0("    nr_pages= %u  channel-count = %d\n",
+	debugf0("    nr_pages/dimm= %u  channel-count = %d\n",
 		nr_pages, pvt->channel_count);
 
 	return nr_pages;
@@ -2174,7 +2168,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 			i, pvt->mc_node_id);
 
 		empty = 0;
-		csrow->nr_pages = amd64_csrow_nr_pages(pvt, 0, i);
+		nr_pages = amd64_csrow_nr_pages(pvt, 0, i);
 		get_cs_base_and_mask(pvt, i, 0, &base, &mask);
 		/* 8 bytes of resolution */
 
@@ -2192,6 +2186,8 @@ static int init_csrows(struct mem_ctl_info *mci)
 		for (j = 0; j < pvt->channel_count; j++) {
 			csrow->channels[j].dimm->mtype = mtype;
 			csrow->channels[j].dimm->edac_mode = edac_mode;
+			csrow->channels[j].dimm->nr_pages = nr_pages;
+
 		}
 
 		debugf1("  for MC node %d csrow %d:\n", pvt->mc_node_id, i);
diff --git a/drivers/edac/amd76x_edac.c b/drivers/edac/amd76x_edac.c
index 2a63ed0..1532750 100644
--- a/drivers/edac/amd76x_edac.c
+++ b/drivers/edac/amd76x_edac.c
@@ -205,10 +205,10 @@ static void amd76x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 		mba_mask = ((mba & 0xff80) << 16) | 0x7fffffUL;
 		pci_read_config_dword(pdev, AMD76X_DRAM_MODE_STATUS, &dms);
 		csrow->first_page = mba_base >> PAGE_SHIFT;
-		csrow->nr_pages = (mba_mask + 1) >> PAGE_SHIFT;
-		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
+		dimm->nr_pages = (mba_mask + 1) >> PAGE_SHIFT;
+		csrow->last_page = csrow->first_page + dimm->nr_pages - 1;
 		csrow->page_mask = mba_mask >> PAGE_SHIFT;
-		dimm->grain = csrow->nr_pages << PAGE_SHIFT;
+		dimm->grain = dimm->nr_pages << PAGE_SHIFT;
 		dimm->mtype = MEM_RDDR;
 		dimm->dtype = ((dms >> index) & 0x1) ? DEV_X4 : DEV_UNKNOWN;
 		dimm->edac_mode = edac_mode;
diff --git a/drivers/edac/cell_edac.c b/drivers/edac/cell_edac.c
index 94fbb12..09e1b5d 100644
--- a/drivers/edac/cell_edac.c
+++ b/drivers/edac/cell_edac.c
@@ -128,6 +128,7 @@ static void __devinit cell_edac_init_csrows(struct mem_ctl_info *mci)
 	struct cell_edac_priv		*priv = mci->pvt_info;
 	struct device_node		*np;
 	int				j;
+	u32				nr_pages;
 
 	for (np = NULL;
 	     (np = of_find_node_by_name(np, "memory")) != NULL;) {
@@ -142,19 +143,20 @@ static void __devinit cell_edac_init_csrows(struct mem_ctl_info *mci)
 		if (of_node_to_nid(np) != priv->node)
 			continue;
 		csrow->first_page = r.start >> PAGE_SHIFT;
-		csrow->nr_pages = resource_size(&r) >> PAGE_SHIFT;
-		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
+		nr_pages = resource_size(&r) >> PAGE_SHIFT;
+		csrow->last_page = csrow->first_page + nr_pages - 1;
 
 		for (j = 0; j < csrow->nr_channels; j++) {
 			dimm = csrow->channels[j].dimm;
 			dimm->mtype = MEM_XDR;
 			dimm->edac_mode = EDAC_SECDED;
+			dimm->nr_pages = nr_pages / csrow->nr_channels;
 		}
 		dev_dbg(mci->dev,
 			"Initialized on node %d, chanmask=0x%x,"
 			" first_page=0x%lx, nr_pages=0x%x\n",
 			priv->node, priv->chanmask,
-			csrow->first_page, csrow->nr_pages);
+			csrow->first_page, dimm->nr_pages);
 		break;
 	}
 }
diff --git a/drivers/edac/cpc925_edac.c b/drivers/edac/cpc925_edac.c
index ee90f3d..7b764a8 100644
--- a/drivers/edac/cpc925_edac.c
+++ b/drivers/edac/cpc925_edac.c
@@ -332,7 +332,7 @@ static void cpc925_init_csrows(struct mem_ctl_info *mci)
 	struct dimm_info *dimm;
 	int index, j;
 	u32 mbmr, mbbar, bba;
-	unsigned long row_size, last_nr_pages = 0;
+	unsigned long row_size, nr_pages, last_nr_pages = 0;
 
 	get_total_mem(pdata);
 
@@ -351,12 +351,14 @@ static void cpc925_init_csrows(struct mem_ctl_info *mci)
 
 		row_size = bba * (1UL << 28);	/* 256M */
 		csrow->first_page = last_nr_pages;
-		csrow->nr_pages = row_size >> PAGE_SHIFT;
-		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
+		nr_pages = row_size >> PAGE_SHIFT;
+		csrow->last_page = csrow->first_page + nr_pages - 1;
 		last_nr_pages = csrow->last_page + 1;
 
 		for (j = 0; j < csrow->nr_channels; j++) {
 			dimm = csrow->channels[j].dimm;
+
+			dimm->nr_pages = nr_pages / csrow->nr_channels;
 			dimm->mtype = MEM_RDDR;
 			dimm->edac_mode = EDAC_SECDED;
 
diff --git a/drivers/edac/e752x_edac.c b/drivers/edac/e752x_edac.c
index db291ea..310f657 100644
--- a/drivers/edac/e752x_edac.c
+++ b/drivers/edac/e752x_edac.c
@@ -1044,7 +1044,7 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	int drc_drbg;		/* DRB granularity 0=64mb, 1=128mb */
 	int drc_ddim;		/* DRAM Data Integrity Mode 0=none, 2=edac */
 	u8 value;
-	u32 dra, drc, cumul_size, i;
+	u32 dra, drc, cumul_size, i, nr_pages;
 
 	dra = 0;
 	for (index = 0; index < 4; index++) {
@@ -1078,11 +1078,13 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
+		nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 
 		for (i = 0; i < drc_chan + 1; i++) {
 			struct dimm_info *dimm = csrow->channels[i].dimm;
+
+			dimm->nr_pages = nr_pages / (drc_chan + 1);
 			dimm->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
 			dimm->mtype = MEM_RDDR;	/* only one type supported */
 			dimm->dtype = mem_dev ? DEV_X4 : DEV_X8;
diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 178d2af..2005d80 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -349,7 +349,7 @@ static void e7xxx_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	unsigned long last_cumul_size;
 	int index, j;
 	u8 value;
-	u32 dra, cumul_size;
+	u32 dra, cumul_size, nr_pages;
 	int drc_chan, drc_drbg, drc_ddim, mem_dev;
 	struct csrow_info *csrow;
 	struct dimm_info *dimm;
@@ -380,12 +380,13 @@ static void e7xxx_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
+		nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 
 		for (j = 0; j < drc_chan + 1; j++) {
 			dimm = csrow->channels[j].dimm;
 
+			dimm->nr_pages = nr_pages / (drc_chan + 1);
 			dimm->grain = 1 << 12;	/* 4KiB - resolution of CELOG */
 			dimm->mtype = MEM_RDDR;	/* only one type supported */
 			dimm->dtype = mem_dev ? DEV_X4 : DEV_X8;
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 3ceddae..f33d603 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -47,22 +47,23 @@ static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 {
 	debugf4("\tchannel = %p\n", chan);
 	debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
-	debugf4("\tchannel->ce_count = %d\n", chan->dimm->ce_count);
-		debugf4("\tchannel->label = '%s'\n", chan->dimm->label);
 	debugf4("\tchannel->csrow = %p\n\n", chan->csrow);
+
+	debugf4("\tdimm->ce_count = %d\n", chan->dimm->ce_count);
+	debugf4("\tdimm->label = '%s'\n", chan->dimm->label);
+	debugf4("\tdimm->nr_pages = 0x%x\n", chan->dimm->nr_pages);
 }
 
 static void edac_mc_dump_csrow(struct csrow_info *csrow)
 {
 	debugf4("\tcsrow = %p\n", csrow);
 	debugf4("\tcsrow->csrow_idx = %d\n", csrow->csrow_idx);
-	debugf4("\tcsrow->first_page = 0x%lx\n", csrow->first_page);
-	debugf4("\tcsrow->last_page = 0x%lx\n", csrow->last_page);
-	debugf4("\tcsrow->page_mask = 0x%lx\n", csrow->page_mask);
-	debugf4("\tcsrow->nr_pages = 0x%x\n", csrow->nr_pages);
 	debugf4("\tcsrow->nr_channels = %d\n", csrow->nr_channels);
 	debugf4("\tcsrow->channels = %p\n", csrow->channels);
 	debugf4("\tcsrow->mci = %p\n\n", csrow->mci);
+	debugf4("\tcsrow->first_page = 0x%lx\n", csrow->first_page);
+	debugf4("\tcsrow->last_page = 0x%lx\n", csrow->last_page);
+	debugf4("\tcsrow->page_mask = 0x%lx\n", csrow->page_mask);
 }
 
 static void edac_mc_dump_mci(struct mem_ctl_info *mci)
@@ -663,15 +664,19 @@ static void edac_mc_scrub_block(unsigned long page, unsigned long offset,
 int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 {
 	struct csrow_info *csrows = mci->csrows;
-	int row, i;
+	int row, i, j, n;
 
 	debugf1("MC%d: %s(): 0x%lx\n", mci->mc_idx, __func__, page);
 	row = -1;
 
 	for (i = 0; i < mci->nr_csrows; i++) {
 		struct csrow_info *csrow = &csrows[i];
-
-		if (csrow->nr_pages == 0)
+		n = 0;
+		for (j = 0; j < csrow->nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			n += dimm->nr_pages;
+		}
+		if (n == 0)
 			continue;
 
 		debugf3("MC%d: %s(): first(0x%lx) page(0x%lx) last(0x%lx) "
@@ -680,9 +685,9 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 			csrow->page_mask);
 
 		if ((page >= csrow->first_page) &&
-		    (page <= csrow->last_page) &&
-		    ((page & csrow->page_mask) ==
-		     (csrow->first_page & csrow->page_mask))) {
+		(page <= csrow->last_page) &&
+		((page & csrow->page_mask) ==
+		(csrow->first_page & csrow->page_mask))) {
 			row = i;
 			break;
 		}
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 1571d99..62b5029 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -144,7 +144,7 @@ static ssize_t csrow_ce_count_show(struct csrow_info *csrow, char *data,
 static ssize_t csrow_size_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%u\n", PAGES_TO_MiB(csrow->nr_pages));
+	return sprintf(data, "%u\n", PAGES_TO_MiB(csrow->channels[0].dimm->nr_pages));
 }
 
 static ssize_t csrow_mem_type_show(struct csrow_info *csrow, char *data,
@@ -674,16 +674,17 @@ static ssize_t mci_ctl_name_show(struct mem_ctl_info *mci, char *data)
 
 static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data)
 {
-	int total_pages, csrow_idx;
+	int total_pages, csrow_idx, j;
 
 	for (total_pages = csrow_idx = 0; csrow_idx < mci->nr_csrows;
-		csrow_idx++) {
+	     csrow_idx++) {
 		struct csrow_info *csrow = &mci->csrows[csrow_idx];
 
-		if (!csrow->nr_pages)
-			continue;
+		for (j = 0; j < csrow->nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
 
-		total_pages += csrow->nr_pages;
+			total_pages += dimm->nr_pages;
+		}
 	}
 
 	return sprintf(data, "%u\n", PAGES_TO_MiB(total_pages));
@@ -1089,10 +1090,15 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 	/* Make directories for each CSROW object under the mc<id> kobject
 	 */
 	for (i = 0; i < mci->nr_csrows; i++) {
+		int n = 0;
+
 		csrow = &mci->csrows[i];
+		for (j = 0; j < csrow->nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			n += dimm->nr_pages;
+		}
 
-		/* Only expose populated CSROWs */
-		if (csrow->nr_pages > 0) {
+		if (n > 0) {
 			err = edac_create_csrow_object(mci, csrow, i);
 			if (err) {
 				debugf1("%s() failure: create csrow %d obj\n",
@@ -1106,6 +1112,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 	 * Make directories for each DIMM object under the mc<id> kobject
 	 */
 	for (j = 0; j < mci->nr_dimms; j++) {
+		/* Only expose populated DIMMs */
+		if (mci->dimms[j].nr_pages == 0)
+			continue;
 		err = edac_create_dimm_object(mci, &mci->dimms[j] , j);
 		if (err) {
 			debugf1("%s() failure: create dimm %d obj\n",
@@ -1117,13 +1126,22 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 	return 0;
 
 fail2:
-	for (j--; j >= 0; j--)
-		kobject_put(&mci->dimms[i].kobj);
+	for (j--; j >= 0; j--) {
+		if (mci->dimms[j].nr_pages)
+			kobject_put(&mci->dimms[j].kobj);
+	}
 
 	/* CSROW error: backout what has already been registered,  */
 fail1:
 	for (i--; i >= 0; i--) {
-		if (mci->csrows[i].nr_pages > 0) {
+		int n = 0;
+
+		csrow = &mci->csrows[i];
+		for (j = 0; j < csrow->nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			n += dimm->nr_pages;
+		}
+		if (n > 0) {
 			kobject_put(&mci->csrows[i].kobj);
 		}
 	}
@@ -1144,7 +1162,8 @@ fail0:
  */
 void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 {
-	int i;
+	struct csrow_info *csrow;
+	int i, j;
 
 	debugf0("%s()\n", __func__);
 
@@ -1155,7 +1174,14 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 		kobject_put(&mci->dimms[i].kobj);
 	}
 	for (i = 0; i < mci->nr_csrows; i++) {
-		if (mci->csrows[i].nr_pages > 0) {
+		int n = 0;
+
+		csrow = &mci->csrows[i];
+		for (j = 0; j < csrow->nr_channels; j++) {
+			struct dimm_info *dimm = csrow->channels[j].dimm;
+			n += dimm->nr_pages;
+		}
+		if (n > 0) {
 			debugf0("%s()  unreg csrow-%d\n", __func__, i);
 			kobject_put(&mci->csrows[i].kobj);
 		}
diff --git a/drivers/edac/i3000_edac.c b/drivers/edac/i3000_edac.c
index 1498c5f..bf8a230 100644
--- a/drivers/edac/i3000_edac.c
+++ b/drivers/edac/i3000_edac.c
@@ -306,7 +306,7 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
-	unsigned long last_cumul_size;
+	unsigned long last_cumul_size, nr_pages;
 	int interleaved, nr_channels;
 	unsigned char dra[I3000_RANKS / 2], drb[I3000_RANKS];
 	unsigned char *c0dra = dra, *c1dra = &dra[I3000_RANKS_PER_CHANNEL / 2];
@@ -391,11 +391,13 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
+		nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 
 		for (j = 0; j < nr_channels; j++) {
 			struct dimm_info *dimm = csrow->channels[j].dimm;
+
+			dimm->nr_pages = nr_pages / nr_channels;
 			dimm->grain = I3000_DEAP_GRAIN;
 			dimm->mtype = MEM_DDR2;
 			dimm->dtype = DEV_UNKNOWN;
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index 8086693..b3dc867 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -387,11 +387,10 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 		if (nr_pages == 0)
 			continue;
 
-		csrow->nr_pages = nr_pages;
-
 		for (j = 0; j < nr_channels; j++) {
 			struct dimm_info *dimm = csrow->channels[j].dimm;
 
+			dimm->nr_pages = nr_pages / nr_channels;
 			dimm->grain = nr_pages << PAGE_SHIFT;
 			dimm->mtype = MEM_DDR2;
 			dimm->dtype = DEV_UNKNOWN;
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index f00f684..e8d32e8 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -1236,6 +1236,7 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 {
 	struct i5000_pvt *pvt;
 	struct csrow_info *p_csrow;
+	struct dimm_info *dimm;
 	int empty, channel_count;
 	int max_csrows;
 	int mtr, mtr1;
@@ -1265,21 +1266,22 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 
 		csrow_megs = 0;
 		for (channel = 0; channel < pvt->maxch; channel++) {
+			dimm = p_csrow->channels[channel].dimm;
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
-			p_csrow->channels[channel].dimm->grain = 8;
+			dimm->grain = 8;
 
 			/* Assume DDR2 for now */
-			p_csrow->channels[channel].dimm->mtype = MEM_FB_DDR2;
+			dimm->mtype = MEM_FB_DDR2;
 
 			/* ask what device type on this row */
 			if (MTR_DRAM_WIDTH(mtr))
-				p_csrow->channels[channel].dimm->dtype = DEV_X8;
+				dimm->dtype = DEV_X8;
 			else
-				p_csrow->channels[channel].dimm->dtype = DEV_X4;
+				dimm->dtype = DEV_X4;
 
-			p_csrow->channels[channel].dimm->edac_mode = EDAC_S8ECD8ED;
+			dimm->edac_mode = EDAC_S8ECD8ED;
+			dimm->nr_pages = pvt->dimm_info[csrow][channel].megabytes << 8;
 		}
-		p_csrow->nr_pages = csrow_megs << 8;
 
 		empty = 0;
 	}
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 76489dc..075d3c7 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -861,18 +861,10 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 		if (!npages)
 			continue;
 
-		/*
-		 * FIXME: these two are totally bogus -- I don't see how to
-		 * map them correctly to this structure...
-		 */
-		mci->csrows[i].nr_pages = npages;
-		mci->csrows[i].csrow_idx = i;
-		mci->csrows[i].mci = mci;
-		mci->csrows[i].nr_channels = 1;
-		mci->csrows[i].channels[0].csrow = mci->csrows + i;
 		total_pages += npages;
 
 		mci->csrows[i].channels[0].dimm = dimm;
+		dimm->nr_pages = npages;
 		dimm->location.mc_channel = chan;
 		dimm->location.mc_dimm_number = rank;
 		dimm->grain = 32;
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 015a368..cbba3df 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1165,8 +1165,7 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 		for (channel = 0; channel < pvt->maxch; channel++)
 			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
 
-		p_csrow->nr_pages = csrow_megs << 8;
-
+		dimm->nr_pages = csrow_megs << 8;
 		dimm->location.mc_channel = channel;
 		dimm->location.mc_dimm_number = csrow / pvt->maxch;
 		dimm->grain = 8;
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 30453fa..73969d0 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -617,9 +617,7 @@ static void i7300_enable_error_reporting(struct mem_ctl_info *mci)
 static int decode_mtr(struct i7300_pvt *pvt,
 		      int slot, int ch, int branch,
 		      struct i7300_dimm_info *dinfo,
-		      struct csrow_info *p_csrow,
-		      struct dimm_info *dimm,
-		      u32 *nr_pages)
+		      struct dimm_info *dimm)
 {
 	int mtr, ans, addrBits, channel;
 
@@ -651,7 +649,6 @@ static int decode_mtr(struct i7300_pvt *pvt,
 	addrBits -= 3;	/* 8 bits per bytes */
 
 	dinfo->megabytes = 1 << addrBits;
-	*nr_pages = dinfo->megabytes << 8;
 
 	debugf2("\t\tWIDTH: x%d\n", MTR_DRAM_WIDTH(mtr));
 
@@ -664,8 +661,6 @@ static int decode_mtr(struct i7300_pvt *pvt,
 	debugf2("\t\tNUMCOL: %s\n", numcol_toString[MTR_DIMM_COLS(mtr)]);
 	debugf2("\t\tSIZE: %d MB\n", dinfo->megabytes);
 
-	p_csrow->csrow_idx = slot;
-
 	/*
 	 * The type of error detection actually depends of the
 	 * mode of operation. When it is just one single memory chip, at
@@ -675,6 +670,7 @@ static int decode_mtr(struct i7300_pvt *pvt,
 	 * See datasheet Sections 7.3.6 to 7.3.8
 	 */
 
+	dimm->nr_pages = MiB_TO_PAGES(dinfo->megabytes);
 	dimm->grain = 8;
 	dimm->mtype = MEM_FB_DDR2;
 	if (IS_SINGLE_MODE(pvt->mc_settings_a)) {
@@ -774,7 +770,6 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 {
 	struct i7300_pvt *pvt;
 	struct i7300_dimm_info *dinfo;
-	struct csrow_info *p_csrow;
 	int rc = -ENODEV;
 	int mtr;
 	int ch, branch, slot, channel;
@@ -807,7 +802,6 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 	dimm = mci->dimms;
 	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
 	mci->nr_dimms = 0;
-	nr_pages = 0;
 	for (slot = 0; slot < MAX_SLOTS; slot++) {
 		int where = mtr_regs[slot];
 		for (branch = 0; branch < MAX_BRANCHES; branch++) {
@@ -818,21 +812,16 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 				int channel = to_channel(ch, branch);
 
 				dinfo = &pvt->dimm_info[slot][channel];
-				p_csrow = &mci->csrows[slot];
 
 				dimm->location.mc_channel = channel;
 				dimm->location.mc_dimm_number = slot;
 
 				mtr = decode_mtr(pvt, slot, ch, branch,
-						 dinfo, p_csrow, dimm,
-						 &nr_pages);
+						 dinfo, dimm);
 				/* if no DIMMS on this row, continue */
 				if (!MTR_DIMMS_PRESENT(mtr))
 					continue;
 
-				/* Update per_csrow memory count */
-				p_csrow->nr_pages += nr_pages;
-
 				rc = 0;
 
 				mci->nr_dimms++;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 4425ab9..cbee6ad 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -718,16 +718,11 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			npages = MiB_TO_PAGES(size);
 
 			csr = &mci->csrows[csrow];
-			csr->nr_pages = npages;
-
-			csr->csrow_idx = csrow;
-			csr->nr_channels = 1;
-
-			csr->channels[0].chan_idx = i;
-			csr->channels[0].ce_count = 0;
 
 			pvt->csrow_map[i][j] = csrow;
 
+			dimm->nr_pages = npages;
+
 			switch (banks) {
 			case 4:
 				dimm->dtype = DEV_X4;
diff --git a/drivers/edac/i82443bxgx_edac.c b/drivers/edac/i82443bxgx_edac.c
index 1e19492..74166ae 100644
--- a/drivers/edac/i82443bxgx_edac.c
+++ b/drivers/edac/i82443bxgx_edac.c
@@ -220,7 +220,7 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 		row_base = row_high_limit_last;
 		csrow->first_page = row_base >> PAGE_SHIFT;
 		csrow->last_page = (row_high_limit >> PAGE_SHIFT) - 1;
-		csrow->nr_pages = csrow->last_page - csrow->first_page + 1;
+		dimm->nr_pages = csrow->last_page - csrow->first_page + 1;
 		/* EAP reports in 4kilobyte granularity [61] */
 		dimm->grain = 1 << 12;
 		dimm->mtype = mtype;
diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index acbd924..48e0ecd 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -167,7 +167,7 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
+		dimm->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 		dimm->grain = 1 << 12;	/* I82860_EAP has 4KiB reolution */
 		dimm->mtype = MEM_RMBS;
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index 81f79e2..dc207dc 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -347,7 +347,7 @@ static void i82875p_init_csrows(struct mem_ctl_info *mci,
 	unsigned long last_cumul_size;
 	u8 value;
 	u32 drc_ddim;		/* DRAM Data Integrity Mode 0=none,2=edac */
-	u32 cumul_size;
+	u32 cumul_size, nr_pages;
 	int index, j;
 
 	drc_ddim = (drc >> 18) & 0x1;
@@ -371,12 +371,13 @@ static void i82875p_init_csrows(struct mem_ctl_info *mci,
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
+		nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 
 		for (j = 0; j < nr_chans; j++) {
 			dimm = csrow->channels[j].dimm;
 
+			dimm->nr_pages = nr_pages / nr_chans;
 			dimm->grain = 1 << 12;	/* I82875P_EAP has 4KiB reolution */
 			dimm->mtype = MEM_DDR;
 			dimm->dtype = DEV_UNKNOWN;
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index 9e1bca5..b70ea1e 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -362,7 +362,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	struct csrow_info *csrow;
 	unsigned long last_cumul_size;
 	u8 value;
-	u32 cumul_size;
+	u32 cumul_size, nr_pages;
 	int index, chan;
 	struct dimm_info *dimm;
 	enum dev_type dtype;
@@ -397,6 +397,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		debugf3("%s(): (%d) cumul_size 0x%x\n", __func__, index,
 			cumul_size);
 
+		nr_pages = cumul_size - last_cumul_size;
 		/*
 		 * Initialise dram labels
 		 * index values:
@@ -406,6 +407,8 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		dtype = i82975x_dram_type(mch_window, index);
 		for (chan = 0; chan < csrow->nr_channels; chan++) {
 			mci->csrows[index].channels[chan].dimm = dimm;
+
+			dimm->nr_pages = nr_pages / csrow->nr_channels;
 			dimm->location.csrow = index;
 			dimm->location.csrow_channel = chan;
 			strncpy(csrow->channels[chan].dimm->label,
@@ -424,7 +427,6 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 
 		csrow->first_page = last_cumul_size;
 		csrow->last_page = cumul_size - 1;
-		csrow->nr_pages = cumul_size - last_cumul_size;
 		last_cumul_size = cumul_size;
 	}
 }
diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index fb92916..c1d9e15 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -947,7 +947,8 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 
 		csrow->first_page = start;
 		csrow->last_page = end;
-		csrow->nr_pages = end + 1 - start;
+
+		dimm->nr_pages = end + 1 - start;
 		dimm->grain = 8;
 		dimm->mtype = mtype;
 		dimm->dtype = DEV_UNKNOWN;
diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
index d2e3c39..281e245 100644
--- a/drivers/edac/mv64x60_edac.c
+++ b/drivers/edac/mv64x60_edac.c
@@ -667,7 +667,8 @@ static void mv64x60_init_csrows(struct mem_ctl_info *mci,
 
 	csrow = &mci->csrows[0];
 	dimm = csrow->channels[0].dimm;
-	csrow->nr_pages = pdata->total_mem >> PAGE_SHIFT;
+
+	dimm->nr_pages = pdata->total_mem >> PAGE_SHIFT;
 	dimm->grain = 8;
 
 	dimm->mtype = (ctl & MV64X60_SDRAM_REGISTERED) ? MEM_RDDR : MEM_DDR;
diff --git a/drivers/edac/pasemi_edac.c b/drivers/edac/pasemi_edac.c
index 4e53270..3fcefda 100644
--- a/drivers/edac/pasemi_edac.c
+++ b/drivers/edac/pasemi_edac.c
@@ -153,20 +153,20 @@ static int pasemi_edac_init_csrows(struct mem_ctl_info *mci,
 		switch ((rankcfg & MCDRAM_RANKCFG_TYPE_SIZE_M) >>
 			MCDRAM_RANKCFG_TYPE_SIZE_S) {
 		case 0:
-			csrow->nr_pages = 128 << (20 - PAGE_SHIFT);
+			dimm->nr_pages = 128 << (20 - PAGE_SHIFT);
 			break;
 		case 1:
-			csrow->nr_pages = 256 << (20 - PAGE_SHIFT);
+			dimm->nr_pages = 256 << (20 - PAGE_SHIFT);
 			break;
 		case 2:
 		case 3:
-			csrow->nr_pages = 512 << (20 - PAGE_SHIFT);
+			dimm->nr_pages = 512 << (20 - PAGE_SHIFT);
 			break;
 		case 4:
-			csrow->nr_pages = 1024 << (20 - PAGE_SHIFT);
+			dimm->nr_pages = 1024 << (20 - PAGE_SHIFT);
 			break;
 		case 5:
-			csrow->nr_pages = 2048 << (20 - PAGE_SHIFT);
+			dimm->nr_pages = 2048 << (20 - PAGE_SHIFT);
 			break;
 		default:
 			edac_mc_printk(mci, KERN_ERR,
@@ -176,8 +176,8 @@ static int pasemi_edac_init_csrows(struct mem_ctl_info *mci,
 		}
 
 		csrow->first_page = last_page_in_mmc;
-		csrow->last_page = csrow->first_page + csrow->nr_pages - 1;
-		last_page_in_mmc += csrow->nr_pages;
+		csrow->last_page = csrow->first_page + dimm->nr_pages - 1;
+		last_page_in_mmc += dimm->nr_pages;
 		csrow->page_mask = 0;
 		dimm->grain = PASEMI_EDAC_ERROR_GRAIN;
 		dimm->mtype = MEM_DDR;
diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 6dc000e..0f06f14 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -896,7 +896,7 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 	enum dev_type dtype;
 	enum edac_type edac_mode;
 	int row, j;
-	u32 mbxcf, size;
+	u32 mbxcf, size, nr_pages;
 
 	/* Establish the memory type and width */
 
@@ -947,7 +947,7 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 		case SDRAM_MBCF_SZ_2GB:
 		case SDRAM_MBCF_SZ_4GB:
 		case SDRAM_MBCF_SZ_8GB:
-			csi->nr_pages = SDRAM_MBCF_SZ_TO_PAGES(size);
+			nr_pages = SDRAM_MBCF_SZ_TO_PAGES(size);
 			break;
 		default:
 			ppc4xx_edac_mc_printk(KERN_ERR, mci,
@@ -973,6 +973,7 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 		for (j = 0; j < csi->nr_channels; j++) {
 			struct dimm_info *dimm = csi->channels[j].dimm;
 
+			dimm->nr_pages  = nr_pages;
 			dimm->grain	= 1;
 
 			dimm->mtype	= mtype;
diff --git a/drivers/edac/r82600_edac.c b/drivers/edac/r82600_edac.c
index c8b774d..a4b0626 100644
--- a/drivers/edac/r82600_edac.c
+++ b/drivers/edac/r82600_edac.c
@@ -249,7 +249,8 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 
 		csrow->first_page = row_base >> PAGE_SHIFT;
 		csrow->last_page = (row_high_limit >> PAGE_SHIFT) - 1;
-		csrow->nr_pages = csrow->last_page - csrow->first_page + 1;
+
+		dimm->nr_pages = csrow->last_page - csrow->first_page + 1;
 		/* Error address is top 19 bits - so granularity is      *
 		 * 14 bits                                               */
 		dimm->grain = 1 << 14;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 080ba3d..9266e3f 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -643,15 +643,12 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				 * csrows.
 				 */
 				csr = &mci->csrows[csrow];
-				csr->nr_pages = npages;
-				csr->csrow_idx = csrow;
-				csr->nr_channels = 1;
-				csr->channels[0].chan_idx = i;
 				pvt->csrow_map[i][j] = csrow;
 				last_page += npages;
 				csrow++;
 
 				csr->channels[0].dimm = dimm;
+				dimm->nr_pages = npages;
 				dimm->location.mc_channel = i;
 				dimm->location.mc_dimm_number = j;
 				dimm->grain = 32;
diff --git a/drivers/edac/tile_edac.c b/drivers/edac/tile_edac.c
index ba0917b..6314ff9 100644
--- a/drivers/edac/tile_edac.c
+++ b/drivers/edac/tile_edac.c
@@ -110,7 +110,7 @@ static int __devinit tile_edac_init_csrows(struct mem_ctl_info *mci)
 		return -1;
 	}
 
-	csrow->nr_pages = mem_info.mem_size >> PAGE_SHIFT;
+	dimm->nr_pages = mem_info.mem_size >> PAGE_SHIFT;
 	dimm->grain = TILE_EDAC_ERROR_GRAIN;
 	dimm->dtype = DEV_UNKNOWN;
 
diff --git a/drivers/edac/x38_edac.c b/drivers/edac/x38_edac.c
index 7be10dd..0de288f 100644
--- a/drivers/edac/x38_edac.c
+++ b/drivers/edac/x38_edac.c
@@ -373,10 +373,10 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 		if (nr_pages == 0)
 			continue;
 
-		csrow->nr_pages = nr_pages;
-
 		for (j = 0; j < x38_channel_num; j++) {
 			struct dimm_info *dimm = csrow->channels[j].dimm;
+
+			dimm->nr_pages = nr_pages / x38_channel_num;
 			dimm->grain = nr_pages << PAGE_SHIFT;
 			dimm->mtype = MEM_DDR2;
 			dimm->dtype = DEV_UNKNOWN;
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 4e6420c..753187d 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -368,6 +368,8 @@ struct dimm_info {
 	enum mem_type mtype;	/* memory dimm type */
 	enum edac_type edac_mode;	/* EDAC mode for this dimm */
 
+	u32 nr_pages;			/* number of pages in csrow */
+
 	u32 ce_count;		/* Correctable Errors for this dimm */
 };
 
@@ -379,13 +381,13 @@ struct csrow_channel_info {
 };
 
 struct csrow_info {
+	int csrow_idx;			/* the chip-select row */
+
+	/* Used only by edac_mc_find_csrow_by_page() */
 	unsigned long first_page;	/* first page number in csrow */
 	unsigned long last_page;	/* last page number in csrow */
-	u32 nr_pages;			/* number of pages in csrow */
 	unsigned long page_mask;	/* used for interleaving -
-					 * 0UL for non intlv
-					 */
-	int csrow_idx;			/* the chip-select row */
+					 * 0UL for non intlv */
 
 	u32 ue_count;		/* Uncorrectable Errors for this csrow */
 	u32 ce_count;		/* Correctable Errors for this csrow */
-- 
1.7.8



* [PATCH v3 14/31] edac: Add per-dimm sysfs show nodes
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (12 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 13/31] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 15/31] edac: DIMM location cleanup Mauro Carvalho Chehab
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Add sysfs nodes to describe DIMM properties: size, memory type,
device type and EDAC mode.

With this change, the physical memory characteristics should
be presented, as detected by the memory controller.
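The previous patch moved nr_pages into struct dimm_info, so a csrow's size is now the sum of its DIMMs' pages. The x38 hunk, for instance, gives each channel nr_pages / x38_channel_num. Below is a simplified userspace sketch of the per-DIMM and per-csrow size computations. The sketch_* types and macros are hypothetical stand-ins, not the kernel's definitions, and SKETCH_PAGE_SHIFT = 12 assumes 4 KiB pages.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12
/* Pages -> MiB: one MiB is 2^(20 - PAGE_SHIFT) pages. */
#define SKETCH_PAGES_TO_MiB(pages) ((pages) >> (20 - SKETCH_PAGE_SHIFT))
#define SKETCH_MAX_CHANNELS 2

/* Hypothetical, trimmed-down stand-in for struct dimm_info. */
struct sketch_dimm {
	unsigned int nr_pages;
};

/* Hypothetical, trimmed-down stand-in for struct csrow_info. */
struct sketch_csrow {
	int nr_channels;
	struct sketch_dimm *dimm[SKETCH_MAX_CHANNELS];
};

/* Size of one DIMM, as a per-DIMM "dimm_size" node would report it. */
static unsigned int dimm_size_mib(const struct sketch_dimm *dimm)
{
	return SKETCH_PAGES_TO_MiB(dimm->nr_pages);
}

/* Size of a csrow: add up the pages of every channel's DIMM first. */
static unsigned int csrow_size_mib(const struct sketch_csrow *csrow)
{
	unsigned int nr_pages = 0;
	int i;

	assert(csrow->nr_channels <= SKETCH_MAX_CHANNELS);
	for (i = 0; i < csrow->nr_channels; i++)
		nr_pages += csrow->dimm[i]->nr_pages;
	return SKETCH_PAGES_TO_MiB(nr_pages);
}
```

Summing before converting mirrors the csrow_size_show() hunk below. Reading only channels[0] would report half the csrow on a dual-channel controller whose driver splits pages per channel.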

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc_sysfs.c |   44 +++++++++++++++++++++++++++++++++++++----
 1 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 62b5029..d175e48 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -144,7 +144,13 @@ static ssize_t csrow_ce_count_show(struct csrow_info *csrow, char *data,
 static ssize_t csrow_size_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%u\n", PAGES_TO_MiB(csrow->channels[0].dimm->nr_pages));
+	int i;
+	u32 nr_pages = 0;
+
+	for (i = 0; i < csrow->nr_channels; i++)
+		nr_pages += csrow->channels[i].dimm->nr_pages;
+
+	return sprintf(data, "%u\n", PAGES_TO_MiB(nr_pages));
 }
 
 static ssize_t csrow_mem_type_show(struct csrow_info *csrow, char *data,
@@ -497,14 +503,42 @@ static ssize_t dimmdev_label_store(struct dimm_info *dimm,
 	return max_size;
 }
 
+static ssize_t dimmdev_size_show(struct dimm_info *dimm, char *data)
+{
+	return sprintf(data, "%u\n", PAGES_TO_MiB(dimm->nr_pages));
+}
+
+static ssize_t dimmdev_mem_type_show(struct dimm_info *dimm, char *data)
+{
+	return sprintf(data, "%s\n", mem_types[dimm->mtype]);
+}
+
+static ssize_t dimmdev_dev_type_show(struct dimm_info *dimm, char *data)
+{
+	return sprintf(data, "%s\n", dev_types[dimm->dtype]);
+}
+
+static ssize_t dimmdev_edac_mode_show(struct dimm_info *dimm, char *data)
+{
+	return sprintf(data, "%s\n", edac_caps[dimm->edac_mode]);
+}
+
 /* default cwrow<id>/attribute files */
-DIMMDEV_ATTR(label, S_IRUGO | S_IWUSR, dimmdev_label_show, dimmdev_label_store);
-DIMMDEV_ATTR(location, S_IRUGO, dimmdev_location_show, NULL);
+DIMMDEV_ATTR(dimm_label, S_IRUGO | S_IWUSR, dimmdev_label_show, dimmdev_label_store);
+DIMMDEV_ATTR(dimm_location, S_IRUGO, dimmdev_location_show, NULL);
+DIMMDEV_ATTR(dimm_size, S_IRUGO, dimmdev_size_show, NULL);
+DIMMDEV_ATTR(dimm_mem_type, S_IRUGO, dimmdev_mem_type_show, NULL);
+DIMMDEV_ATTR(dimm_dev_type, S_IRUGO, dimmdev_dev_type_show, NULL);
+DIMMDEV_ATTR(dimm_edac_mode, S_IRUGO, dimmdev_edac_mode_show, NULL);
 
 /* default attributes of the DIMM<id> object */
 static struct dimmdev_attribute *default_dimm_attr[] = {
-	&attr_label,
-	&attr_location,
+	&attr_dimm_label,
+	&attr_dimm_location,
+	&attr_dimm_size,
+	&attr_dimm_mem_type,
+	&attr_dimm_dev_type,
+	&attr_dimm_edac_mode,
 	NULL,
 };
 
-- 
1.7.8



* [PATCH v3 15/31] edac: DIMM location cleanup
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (13 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 14/31] edac: Add per-dimm sysfs show nodes Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 16/31] edac/ppc4xx_edac: Fix compilation Mauro Carvalho Chehab
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Clean up the DIMM location information:
	- replace the location union in struct dimm_info with per-level
	  fields and drop dimm_loc_type from struct mem_ctl_info;
	- make the location sysfs node code more flexible;
	- clean up the code inside the drivers that
	  fills the dimm location properties.
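With this cleanup, each location field is set to -1 when that hierarchy level does not apply, and the sysfs show routine prints only the levels that do. Below is a bounded userspace sketch of that idea. struct sketch_dimm_loc and format_location() are hypothetical names. Unlike the patch's chain of sprintf() calls, the sketch uses snprintf() against an explicit buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror of the new per-level fields; -1 = not applicable. */
struct sketch_dimm_loc {
	int mc_branch;
	int mc_channel;
	int csrow;
	int csrow_channel;
	int mc_dimm_number;
};

/*
 * Append "name value " for every level >= 0, in the same order as
 * dimmdev_location_show(). Returns the number of characters written.
 * On truncation, the partially written field is dropped.
 */
static size_t format_location(const struct sketch_dimm_loc *loc,
			      char *buf, size_t len)
{
	const struct {
		const char *name;
		int val;
	} level[] = {
		{ "branch",     loc->mc_branch },
		{ "channel",    loc->mc_channel },
		{ "csrow",      loc->csrow },
		{ "cs_channel", loc->csrow_channel },
		{ "dimm",       loc->mc_dimm_number },
	};
	size_t used = 0, i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(level) / sizeof(level[0]); i++) {
		int n;

		if (level[i].val < 0)
			continue;	/* level not used by this controller */
		n = snprintf(buf + used, len - used, "%s %d ",
			     level[i].name, level[i].val);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial field */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Skipping negative fields lets one routine serve both csrow-based drivers and FB-DIMM style drivers. Csrow-based drivers set csrow and csrow_channel. FB-DIMM style drivers set branch, channel and DIMM number.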

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c    |    6 ++--
 drivers/edac/edac_mc.c       |    8 ++++--
 drivers/edac/edac_mc_sysfs.c |   26 +++++++++++++++++-------
 drivers/edac/i5100_edac.c    |   44 ++++++++++++++++++++---------------------
 drivers/edac/i5400_edac.c    |   32 ++++++++++++------------------
 drivers/edac/i7300_edac.c    |   15 ++++++++-----
 drivers/edac/i7core_edac.c   |   19 +++++++----------
 drivers/edac/i82975x_edac.c  |   14 ++++--------
 drivers/edac/sb_edac.c       |   11 ++-------
 include/linux/edac.h         |   27 ++++++++-----------------
 10 files changed, 94 insertions(+), 108 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 613d5f1..3cba6a5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2142,7 +2142,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 	struct csrow_info *csrow;
 	struct amd64_pvt *pvt = mci->pvt_info;
 	u64 base, mask;
-	u32 val;
+	u32 val, nr_pages;
 	int i, j, empty = 1;
 	enum mem_type mtype;
 	enum edac_type edac_mode;
@@ -2186,12 +2186,12 @@ static int init_csrows(struct mem_ctl_info *mci)
 		for (j = 0; j < pvt->channel_count; j++) {
 			csrow->channels[j].dimm->mtype = mtype;
 			csrow->channels[j].dimm->edac_mode = edac_mode;
-			csrow->channels[j].dimm->n_pages = npages;
+			csrow->channels[j].dimm->nr_pages = nr_pages;
 
 		}
 
 		debugf1("  for MC node %d csrow %d:\n", pvt->mc_node_id, i);
-		debugf1("    nr_pages: %u\n", csrow->nr_pages);
+		debugf1("    nr_pages: %u\n", nr_pages);
 	}
 
 	return empty;
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f33d603..ee3f0f8 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -219,13 +219,15 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	 * as most drivers are based on such assumption.
 	 */
 	if (!mci->nr_dimms) {
-		mci->dimm_loc_type = DIMM_LOC_CSROW;
 		dimm = mci->dimms;
 		for (row = 0; row < mci->nr_csrows; row++) {
 			for (chn = 0; chn < mci->csrows[row].nr_channels; chn++) {
 				mci->csrows[row].channels[chn].dimm = dimm;
-				dimm->location.csrow = row;
-				dimm->location.csrow_channel = chn;
+				dimm->mc_branch = -1;
+				dimm->mc_channel = -1;
+				dimm->mc_dimm_number = -1;
+				dimm->csrow = row;
+				dimm->csrow_channel = chn;
 				dimm++;
 				mci->nr_dimms++;
 			}
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index d175e48..64b4c76 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -471,14 +471,24 @@ static const struct sysfs_ops dimmfs_ops = {
 /* show/store functions for DIMM Label attributes */
 static ssize_t dimmdev_location_show(struct dimm_info *dimm, char *data)
 {
-	if (dimm->mci->dimm_loc_type == DIMM_LOC_CSROW)
-		return sprintf(data, "csrow %d, channel %d\n",
-			       dimm->location.csrow,
-			       dimm->location.csrow_channel);
-	else
-		return sprintf(data, "channel %d, dimm %d\n",
-			       dimm->location.mc_channel,
-			       dimm->location.mc_dimm_number);
+	char *p = data;
+
+	if (dimm->mc_branch >= 0)
+		p += sprintf(p, "branch %d ", dimm->mc_branch);
+
+	if (dimm->mc_channel >= 0)
+		p += sprintf(p, "channel %d ", dimm->mc_channel);
+
+	if (dimm->csrow >= 0)
+		p += sprintf(p, "csrow %d ", dimm->csrow);
+
+	if (dimm->csrow_channel >= 0)
+		p += sprintf(p, "cs_channel %d ", dimm->csrow_channel);
+
+	if (dimm->mc_dimm_number >= 0)
+		p += sprintf(p, "dimm %d ", dimm->mc_dimm_number);
+
+	return p - data;
 }
 
 static ssize_t dimmdev_label_show(struct dimm_info *dimm, char *data)
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 075d3c7..f9baee3 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -848,35 +848,33 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 	int i;
 	unsigned long total_pages = 0UL;
 	struct i5100_priv *priv = mci->pvt_info;
-	struct dimm_info *dimm;
 
-	dimm = mci->dimms;
-	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
-	mci->nr_dimms = 0;
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->nr_dimms; i++) {
 		const unsigned long npages = i5100_npages(mci, i);
 		const unsigned chan = i5100_csrow_to_chan(mci, i);
 		const unsigned rank = i5100_csrow_to_rank(mci, i);
+		struct dimm_info *dimm = &mci->dimms[i];
 
-		if (!npages)
-			continue;
-
-		total_pages += npages;
-
-		mci->csrows[i].channels[0].dimm = dimm;
 		dimm->nr_pages = npages;
-		dimm->location.mc_channel = chan;
-		dimm->location.mc_dimm_number = rank;
-		dimm->grain = 32;
-		dimm->dtype = (priv->mtr[chan][rank].width == 4) ?
-			      DEV_X4 : DEV_X8;
-		dimm->mtype = MEM_RDDR2;
-		dimm->edac_mode = EDAC_SECDED;
-		snprintf(dimm->label, sizeof(dimm->label),
-			 "DIMM%u",
-			 i5100_rank_to_slot(mci, chan, rank));
-		mci->nr_dimms++;
-		dimm++;
+
+		dimm->mc_branch = -1;
+		dimm->mc_channel = chan;
+		dimm->mc_dimm_number = rank;
+		dimm->csrow = -1;
+		dimm->csrow_channel = -1;
+
+		if (npages) {
+			total_pages += npages;
+
+			dimm->grain = 32;
+			dimm->dtype = (priv->mtr[chan][rank].width == 4) ?
+				DEV_X4 : DEV_X8;
+			dimm->mtype = MEM_RDDR2;
+			dimm->edac_mode = EDAC_SECDED;
+			snprintf(dimm->label, sizeof(dimm->label),
+				"DIMM%u",
+				i5100_rank_to_slot(mci, chan, rank));
+		}
 	}
 }
 
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index cbba3df..6b07450 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1130,14 +1130,12 @@ static void i5400_get_mc_regs(struct mem_ctl_info *mci)
 static int i5400_init_csrows(struct mem_ctl_info *mci)
 {
 	struct i5400_pvt *pvt;
-	struct csrow_info *p_csrow;
 	int empty, channel_count;
 	int max_csrows;
 	int mtr;
-	int csrow_megs;
+	int size_mb;
 	int channel;
-	int csrow;
-	struct dimm_info *dimm;
+	int slot;
 
 	pvt = mci->pvt_info;
 
@@ -1146,34 +1144,30 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 
 	empty = 1;		/* Assume NO memory */
 
-	dimm = mci->dimms;
-	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
-	mci->nr_dimms = 0;
-	for (csrow = 0; csrow < max_csrows; csrow++) {
-		p_csrow = &mci->csrows[csrow];
+	for (slot = 0; slot < mci->nr_dimms; slot++) {
+		struct dimm_info *dimm = &mci->dimms[slot];
+		channel = slot % pvt->maxch;
 
-		p_csrow->csrow_idx = csrow;
+		dimm->mc_branch = channel / 2;
+		dimm->mc_channel = channel % 2;
+		dimm->mc_dimm_number = slot / pvt->maxch;
+		dimm->csrow = -1;
+		dimm->csrow_channel = -1;
 
 		/* use branch 0 for the basis */
-		mtr = determine_mtr(pvt, csrow, 0);
+		mtr = determine_mtr(pvt, slot, 0);
 
 		/* if no DIMMS on this row, continue */
 		if (!MTR_DIMMS_PRESENT(mtr))
 			continue;
 
-		csrow_megs = 0;
-		for (channel = 0; channel < pvt->maxch; channel++)
-			csrow_megs += pvt->dimm_info[csrow][channel].megabytes;
+		size_mb =  pvt->dimm_info[slot / pvt->maxch][channel].megabytes;
 
-		dimm->nr_pages = csrow_megs << 8;
-		dimm->location.mc_channel = channel;
-		dimm->location.mc_dimm_number = csrow / pvt->maxch;
+		dimm->nr_pages = size_mb << 8;
 		dimm->grain = 8;
 		dimm->dtype = MTR_DRAM_WIDTH(mtr) ? DEV_X8 : DEV_X4;
 		dimm->mtype = MEM_RDDR2;
 		dimm->edac_mode = EDAC_SECDED;
-		mci->nr_dimms++;
-		dimm++;
 
 		empty = 0;
 	}
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 73969d0..0838ec2 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -773,7 +773,6 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 	int rc = -ENODEV;
 	int mtr;
 	int ch, branch, slot, channel;
-	u32 nr_pages;
 	struct dimm_info *dimm;
 
 	pvt = mci->pvt_info;
@@ -800,7 +799,6 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 
 	/* Get the set of MTR[0-7] regs by each branch */
 	dimm = mci->dimms;
-	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
 	mci->nr_dimms = 0;
 	for (slot = 0; slot < MAX_SLOTS; slot++) {
 		int where = mtr_regs[slot];
@@ -813,19 +811,24 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 
 				dinfo = &pvt->dimm_info[slot][channel];
 
-				dimm->location.mc_channel = channel;
-				dimm->location.mc_dimm_number = slot;
+				dimm->mc_branch = branch;
+				dimm->mc_channel = ch;
+				dimm->mc_dimm_number = slot;
+				dimm->csrow = -1;
+				dimm->csrow_channel = -1;
 
 				mtr = decode_mtr(pvt, slot, ch, branch,
 						 dinfo, dimm);
+
+				mci->nr_dimms++;
+				dimm++;
+
 				/* if no DIMMS on this row, continue */
 				if (!MTR_DIMMS_PRESENT(mtr))
 					continue;
 
 				rc = 0;
 
-				mci->nr_dimms++;
-				dimm++;
 			}
 		}
 	}
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index cbee6ad..c6c649d 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -601,7 +601,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 	int csrow = 0;
 	enum edac_type mode;
 	enum mem_type mtype;
-	struct dimm_info *dimm;
 
 	/* Get data from the MC register, function 0 */
 	pdev = pvt->pci_mcr[0];
@@ -638,9 +637,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 		numrow(pvt->info.max_dod >> 6),
 		numcol(pvt->info.max_dod >> 9));
 
-	dimm = mci->dimms;
-	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
-	mci->nr_dimms = 0;
 	for (i = 0; i < NUM_CHANS; i++) {
 		u32 data, dimm_dod[3], value[8];
 
@@ -693,9 +689,16 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			(data & REGISTERED_DIMM) ? 'R' : 'U');
 
 		for (j = 0; j < 3; j++) {
+			struct dimm_info *dimm = &mci->dimms[i * 3 + j];
 			u32 banks, ranks, rows, cols;
 			u32 size, npages;
 
+			dimm->mc_branch = -1;
+			dimm->mc_channel = i;
+			dimm->mc_dimm_number = j;
+			dimm->csrow = -1;
+			dimm->csrow_channel = -1;
+
 			if (!DIMM_PRESENT(dimm_dod[j]))
 				continue;
 
@@ -718,6 +721,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			npages = MiB_TO_PAGES(size);
 
 			csr = &mci->csrows[csrow];
+			csr->channels[0].dimm = dimm;
 
 			pvt->csrow_map[i][j] = csrow;
 
@@ -737,19 +741,12 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				dimm->dtype = DEV_UNKNOWN;
 			}
 
-			csr->channels[0].dimm = dimm;
-
-			dimm->location.mc_channel = i;
-			dimm->location.mc_dimm_number = j;
 			snprintf(dimm->label, sizeof(dimm->label),
 				 "CPU#%uChannel#%u_DIMM#%u",
 				 pvt->i7core_dev->socket, i, j);
 			dimm->grain = 8;
 			dimm->edac_mode = mode;
 			dimm->mtype = mtype;
-
-			mci->nr_dimms++;
-			dimm++;
 			csrow++;
 		}
 
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index b70ea1e..d7dc455 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -378,9 +378,6 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	 *
 	 */
 
-	mci->dimm_loc_type = DIMM_LOC_CSROW;
-	dimm = mci->dimms;
-	mci->nr_dimms = 0;
 	for (index = 0; index < mci->nr_csrows; index++) {
 		csrow = &mci->csrows[index];
 
@@ -406,11 +403,12 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 		 */
 		dtype = i82975x_dram_type(mch_window, index);
 		for (chan = 0; chan < csrow->nr_channels; chan++) {
-			mci->csrows[index].channels[chan].dimm = dimm;
+			dimm = mci->csrows[index].channels[chan].dimm;
+
+			if (!nr_pages)
+				continue;
 
 			dimm->nr_pages = nr_pages / csrow->nr_channels;
-			dimm->location.csrow = index;
-			dimm->location.csrow_channel = chan;
 			strncpy(csrow->channels[chan].dimm->label,
 					labels[(index >> 1) + (chan * 2)],
 					EDAC_MC_LABEL_LEN);
@@ -418,11 +416,9 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 			dimm->dtype = dtype;
 			dimm->mtype = MEM_DDR2; /* I82975x supports only DDR2 */
 			dimm->edac_mode = EDAC_SECDED; /* only supported */
-			dimm++;
-			mci->nr_dimms++;
 		}
 
-		if (cumul_size == last_cumul_size)
+		if (!nr_pages)
 			continue;	/* not populated */
 
 		csrow->first_page = last_cumul_size;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9266e3f..981262b 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -560,7 +560,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 	u32 reg;
 	enum edac_type mode;
 	enum mem_type mtype;
-	struct dimm_info *dimm;
 
 	pci_read_config_dword(pvt->pci_br, SAD_TARGET, &reg);
 	pvt->sbridge_dev->source_id = SOURCE_ID(reg);
@@ -612,13 +611,11 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 	/* On all supported DDR3 DIMM types, there are 8 banks available */
 	banks = 8;
 
-	dimm = mci->dimms;
-	mci->dimm_loc_type = DIMM_LOC_MC_CHANNEL;
-	mci->nr_dimms = 0;
 	for (i = 0; i < NUM_CHANNELS; i++) {
 		u32 mtr;
 
 		for (j = 0; j < ARRAY_SIZE(mtr_regs); j++) {
+			struct dimm_info *dimm = &mci->dimms[j];
 			pci_read_config_dword(pvt->pci_tad[i],
 					      mtr_regs[j], &mtr);
 			debugf4("Channel #%d  MTR%d = %x\n", i, j, mtr);
@@ -649,8 +646,8 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 				csr->channels[0].dimm = dimm;
 				dimm->nr_pages = npages;
-				dimm->location.mc_channel = i;
-				dimm->location.mc_dimm_number = j;
+				dimm->mc_channel = i;
+				dimm->mc_dimm_number = j;
 				dimm->grain = 32;
 				dimm->dtype = (banks == 8) ? DEV_X8 : DEV_X4;
 				dimm->mtype = mtype;
@@ -658,8 +655,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				snprintf(dimm->label, sizeof(dimm->label),
 					 "CPU_SrcID#%u_Channel#%u_DIMM#%u",
 					 pvt->sbridge_dev->source_id, i, j);
-				mci->nr_dimms++;
-				dimm++;
 			}
 		}
 	}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 753187d..652be25 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -341,25 +341,17 @@ enum scrub_type {
  * PS - I enjoyed writing all that about as much as you enjoyed reading it.
  */
 
-enum dimm_location_type {
-	DIMM_LOC_CSROW,
-	DIMM_LOC_MC_CHANNEL,
-};
-
-/* FIXME: add a per-dimm ce error count */
+/* FIXME: add the proper per-location error counts */
 struct dimm_info {
 	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
-	unsigned memory_controller;
-	union {
-		struct {
-			unsigned mc_channel;
-			unsigned mc_dimm_number;
-		};
-		struct {
-			unsigned csrow;
-			unsigned csrow_channel;
-		};
-	} location;
+
+	/* Memory location data */
+	int mc_branch;
+	int mc_channel;
+	int csrow;
+	int mc_dimm_number;
+	int csrow_channel;
+
 	struct kobject kobj;		/* sysfs kobject for this csrow */
 	struct mem_ctl_info *mci;	/* the parent */
 
@@ -479,7 +471,6 @@ struct mem_ctl_info {
 	/*
 	 * DIMM info. Will eventually remove the entire csrows_info some day
 	 */
-	enum dimm_location_type dimm_loc_type;
 	unsigned nr_dimms;
 	struct dimm_info *dimms;
 
-- 
1.7.8



* [PATCH v3 16/31] edac/ppc4xx_edac: Fix compilation
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (14 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 15/31] edac: DIMM location cleanup Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 17/31] edac-mc: Allow reporting errors on a non-csrow oriented way Mauro Carvalho Chehab
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

It seems that nobody is cross-compiling for this arch anymore...

drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe':
drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove'
...
drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function)
drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value]

Acked-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/ppc4xx_edac.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 0f06f14..1adaddf 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -184,7 +184,7 @@ struct ppc4xx_ecc_status {
 
 /* Function Prototypes */
 
-static int ppc4xx_edac_probe(struct platform_device *device)
+static int ppc4xx_edac_probe(struct platform_device *device);
 static int ppc4xx_edac_remove(struct platform_device *device);
 
 /* Global Variables */
@@ -1065,7 +1065,7 @@ ppc4xx_edac_mc_init(struct mem_ctl_info *mci,
 
 	mci->mod_name		= PPC4XX_EDAC_MODULE_NAME;
 	mci->mod_ver		= PPC4XX_EDAC_MODULE_REVISION;
-	mci->ctl_name		= match->compatible,
+	mci->ctl_name		= ppc4xx_edac_match->compatible,
 	mci->dev_name		= np->full_name;
 
 	/* Initialize callbacks */
-- 
1.7.8



* [PATCH v3 17/31] edac-mc: Allow reporting errors on a non-csrow oriented way
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (15 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 16/31] edac/ppc4xx_edac: Fix compilation Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 18/31] edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums Mauro Carvalho Chehab
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The EDAC core was written with the idea that memory controllers
are able to directly access csrows, and that the channels are
selected within a csrow.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some advanced memory controllers don't present a per-csrow
view.

So, change the allocation and error report routines to allow
them to work with all types of architectures.
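In the amd64 hunks below, the new edac_mc_handle_error() calls pass a scope plus location fields, using -1 for unknown levels. Below is a hypothetical sketch of how a driver might pick the narrowest scope the decoded address supports, mirroring the chan >= 0 test in f1x_map_sysaddr_to_csrow(). The SKETCH_SCOPE_* names stand in for HW_EVENT_SCOPE_* and are not the kernel's definitions. dimms_implicated() is an illustrative assumption about what each scope covers, and it assumes every csrow has the same number of channels.

```c
#include <assert.h>

/* Hypothetical stand-ins for the HW_EVENT_SCOPE_* values. */
enum sketch_scope {
	SKETCH_SCOPE_MC,		/* only the controller is known */
	SKETCH_SCOPE_MC_CSROW,		/* csrow known, channel unknown */
	SKETCH_SCOPE_MC_CSROW_CHANNEL,	/* csrow and channel both known */
};

/* Pick the narrowest scope the decoded location allows; -1 = unknown. */
static enum sketch_scope pick_scope(int csrow, int channel)
{
	if (csrow < 0)
		return SKETCH_SCOPE_MC;
	if (channel < 0)
		return SKETCH_SCOPE_MC_CSROW;
	return SKETCH_SCOPE_MC_CSROW_CHANNEL;
}

/* How many DIMMs a single report at this scope points at. */
static int dimms_implicated(enum sketch_scope scope, int nr_csrows,
			    int nr_channels)
{
	switch (scope) {
	case SKETCH_SCOPE_MC:
		return nr_csrows * nr_channels;
	case SKETCH_SCOPE_MC_CSROW:
		return nr_channels;
	case SKETCH_SCOPE_MC_CSROW_CHANNEL:
		return 1;
	}
	assert(0 && "unknown scope");
	return 0;
}
```

This reflects the change of approach in the f1x hunk. Previously, when the channel was unknown, the driver looped over every channel and called the CE handler once per channel. Now it can make a single report at csrow scope.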

This allowed removing several hacks from the FB-DIMM and RAMBUS
memory controller drivers.

Compile-tested only, on all platforms (x86_64, i386, tile and several
ppc subarchs).

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c       |  145 +++++--
 drivers/edac/amd76x_edac.c      |   27 +-
 drivers/edac/cell_edac.c        |   21 +-
 drivers/edac/cpc925_edac.c      |   21 +-
 drivers/edac/e752x_edac.c       |   40 ++-
 drivers/edac/e7xxx_edac.c       |   42 ++-
 drivers/edac/edac_core.h        |   74 ++--
 drivers/edac/edac_device.c      |   27 +-
 drivers/edac/edac_mc.c          |  837 +++++++++++++++++++++++++--------------
 drivers/edac/edac_mc_sysfs.c    |   80 +++--
 drivers/edac/edac_module.h      |    2 +-
 drivers/edac/edac_pci.c         |    7 +-
 drivers/edac/i3000_edac.c       |   27 +-
 drivers/edac/i3200_edac.c       |   33 +-
 drivers/edac/i5000_edac.c       |   48 ++-
 drivers/edac/i5100_edac.c       |   73 ++---
 drivers/edac/i5400_edac.c       |   40 +-
 drivers/edac/i7300_edac.c       |   61 ++--
 drivers/edac/i7core_edac.c      |  102 +++--
 drivers/edac/i82443bxgx_edac.c  |   26 +-
 drivers/edac/i82860_edac.c      |   46 ++-
 drivers/edac/i82875p_edac.c     |   32 +-
 drivers/edac/i82975x_edac.c     |   31 +-
 drivers/edac/mpc85xx_edac.c     |   23 +-
 drivers/edac/mv64x60_edac.c     |   21 +-
 drivers/edac/pasemi_edac.c      |   24 +-
 drivers/edac/ppc4xx_edac.c      |   29 +-
 drivers/edac/r82600_edac.c      |   28 +-
 drivers/edac/sb_edac.c          |   89 +++--
 drivers/edac/tile_edac.c        |   12 +-
 drivers/edac/x38_edac.c         |   29 +-
 include/linux/edac.h            |  298 ++++++++------
 include/trace/events/hw_event.h |   40 ++-
 33 files changed, 1518 insertions(+), 917 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3cba6a5..139e774 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1039,6 +1039,37 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	int channel, csrow;
 	u32 page, offset;
 
+	error_address_to_page_and_offset(sys_addr, &page, &offset);
+
+	/*
+	 * Find out which node the error address belongs to. This may be
+	 * different from the node that detected the error.
+	 */
+	src_mci = find_mc_by_sys_addr(mci, sys_addr);
+	if (!src_mci) {
+		amd64_mc_err(mci, "failed to map error addr 0x%lx to a node\n",
+			     (unsigned long)sys_addr);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     page, offset, syndrome,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "failed to map error addr to a node");
+		return;
+	}
+
+	/* Now map the sys_addr to a CSROW */
+	csrow = sys_addr_to_csrow(src_mci, sys_addr);
+	if (csrow < 0) {
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     page, offset, syndrome,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "failed to map error addr to a csrow");
+		return;
+	}
+
 	/* CHIPKILL enabled */
 	if (pvt->nbcfg & NBCFG_CHIPKILL) {
 		channel = get_channel_from_ecc_syndrome(mci, syndrome);
@@ -1048,9 +1079,15 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 			 * 2 DIMMs is in error. So we need to ID 'both' of them
 			 * as suspect.
 			 */
-			amd64_mc_warn(mci, "unknown syndrome 0x%04x - possible "
-					   "error reporting race\n", syndrome);
-			edac_mc_handle_ce_no_info(mci, EDAC_MOD_STR);
+			amd64_mc_warn(src_mci, "unknown syndrome 0x%04x - "
+				      "possible error reporting race\n",
+				      syndrome);
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     page, offset, syndrome,
+					     -1, -1, -1, csrow, -1,
+					     EDAC_MOD_STR,
+					     "unknown syndrome - possible error reporting race");
 			return;
 		}
 	} else {
@@ -1065,28 +1102,11 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 		channel = ((sys_addr & BIT(3)) != 0);
 	}
 
-	/*
-	 * Find out which node the error address belongs to. This may be
-	 * different from the node that detected the error.
-	 */
-	src_mci = find_mc_by_sys_addr(mci, sys_addr);
-	if (!src_mci) {
-		amd64_mc_err(mci, "failed to map error addr 0x%lx to a node\n",
-			     (unsigned long)sys_addr);
-		edac_mc_handle_ce_no_info(mci, EDAC_MOD_STR);
-		return;
-	}
-
-	/* Now map the sys_addr to a CSROW */
-	csrow = sys_addr_to_csrow(src_mci, sys_addr);
-	if (csrow < 0) {
-		edac_mc_handle_ce_no_info(src_mci, EDAC_MOD_STR);
-	} else {
-		error_address_to_page_and_offset(sys_addr, &page, &offset);
-
-		edac_mc_handle_ce(src_mci, page, offset, syndrome, csrow,
-				  channel, EDAC_MOD_STR);
-	}
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+			     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, src_mci,
+			     page, offset, syndrome,
+			     -1, -1, -1, csrow, channel,
+			     EDAC_MOD_STR, "");
 }
 
 static int ddr2_cs_size(unsigned i, bool dct_width)
@@ -1567,16 +1587,22 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	struct amd64_pvt *pvt = mci->pvt_info;
 	u32 page, offset;
 	int nid, csrow, chan = 0;
+	enum hw_event_error_scope scope;
+
+	error_address_to_page_and_offset(sys_addr, &page, &offset);
 
 	csrow = f1x_translate_sysaddr_to_cs(pvt, sys_addr, &nid, &chan);
 
 	if (csrow < 0) {
-		edac_mc_handle_ce_no_info(mci, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     page, offset, syndrome,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "failed to map error addr to a csrow");
 		return;
 	}
 
-	error_address_to_page_and_offset(sys_addr, &page, &offset);
-
 	/*
 	 * We need the syndromes for channel detection only when we're
 	 * ganged. Otherwise @chan should already contain the channel at
@@ -1585,16 +1611,16 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	if (dct_ganging_enabled(pvt))
 		chan = get_channel_from_ecc_syndrome(mci, syndrome);
 
 	if (chan >= 0)
-		edac_mc_handle_ce(mci, page, offset, syndrome, csrow, chan,
-				  EDAC_MOD_STR);
+		scope = HW_EVENT_SCOPE_MC_CSROW_CHANNEL;
 	else
-		/*
-		 * Channel unknown, report all channels on this CSROW as failed.
-		 */
-		for (chan = 0; chan < mci->csrows[csrow].nr_channels; chan++)
-			edac_mc_handle_ce(mci, page, offset, syndrome,
-					  csrow, chan, EDAC_MOD_STR);
+		scope = HW_EVENT_SCOPE_MC_CSROW;
+
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				scope, mci,
+				page, offset, syndrome,
+				-1, -1, -1, csrow, chan,
+				EDAC_MOD_STR, "");
 }
 
 /*
@@ -1875,7 +1907,12 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
 	/* Ensure that the Error Address is VALID */
 	if (!(m->status & MCI_STATUS_ADDRV)) {
 		amd64_mc_err(mci, "HW has no ERROR_ADDRESS available\n");
-		edac_mc_handle_ce_no_info(mci, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "HW has no ERROR_ADDRESS available");
 		return;
 	}
 
@@ -1899,11 +1936,17 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 
 	if (!(m->status & MCI_STATUS_ADDRV)) {
 		amd64_mc_err(mci, "HW has no ERROR_ADDRESS available\n");
-		edac_mc_handle_ue_no_info(log_mci, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "HW has no ERROR_ADDRESS available");
 		return;
 	}
 
 	sys_addr = get_error_address(m);
+	error_address_to_page_and_offset(sys_addr, &page, &offset);
 
 	/*
 	 * Find out which node the error address belongs to. This may be
@@ -1913,7 +1956,12 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 	if (!src_mci) {
 		amd64_mc_err(mci, "ERROR ADDRESS (0x%lx) NOT mapped to a MC\n",
 				  (unsigned long)sys_addr);
-		edac_mc_handle_ue_no_info(log_mci, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     page, offset, 0,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "ERROR ADDRESS NOT mapped to a MC");
 		return;
 	}
 
@@ -1923,10 +1971,18 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 	if (csrow < 0) {
 		amd64_mc_err(mci, "ERROR_ADDRESS (0x%lx) NOT mapped to CS\n",
 				  (unsigned long)sys_addr);
-		edac_mc_handle_ue_no_info(log_mci, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC, mci,
+				     page, offset, 0,
+				     -1, -1, -1, -1, -1,
+				     EDAC_MOD_STR,
+				     "ERROR ADDRESS NOT mapped to CS");
 	} else {
-		error_address_to_page_and_offset(sys_addr, &page, &offset);
-		edac_mc_handle_ue(log_mci, page, offset, csrow, EDAC_MOD_STR);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW, mci,
+				     page, offset, 0,
+				     -1, -1, -1, csrow, -1,
+				     EDAC_MOD_STR, "");
 	}
 }
 
@@ -2520,7 +2576,10 @@ static int amd64_init_one_instance(struct pci_dev *F2)
 		goto err_siblings;
 
 	ret = -ENOMEM;
-	mci = edac_mc_alloc(0, pvt->csels[0].b_cnt, pvt->channel_count, nid);
+	/* FIXME: Assuming one DIMM per csrow channel */
+	mci = edac_mc_alloc(nid, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, pvt->csels[0].b_cnt * pvt->channel_count,
+			    pvt->csels[0].b_cnt, pvt->channel_count, nid);
 	if (!mci)
 		goto err_siblings;
 
diff --git a/drivers/edac/amd76x_edac.c b/drivers/edac/amd76x_edac.c
index 1532750..7e6bbf8 100644
--- a/drivers/edac/amd76x_edac.c
+++ b/drivers/edac/amd76x_edac.c
@@ -29,7 +29,6 @@
 	edac_mc_chipset_printk(mci, level, "amd76x", fmt, ##arg)
 
 #define AMD76X_NR_CSROWS 8
-#define AMD76X_NR_CHANS  1
 #define AMD76X_NR_DIMMS  4
 
 /* AMD 76x register addresses - device 0 function 0 - PCI bridge */
@@ -146,8 +145,12 @@ static int amd76x_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors) {
 			row = (info->ecc_mode_status >> 4) & 0xf;
-			edac_mc_handle_ue(mci, mci->csrows[row].first_page, 0,
-					row, mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, mci->csrows[row].first_page,
+					     0, 0,
+					     -1, -1, row, row, 0,
+					     mci->ctl_name, "");
 		}
 	}
 
@@ -159,8 +162,12 @@ static int amd76x_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors) {
 			row = info->ecc_mode_status & 0xf;
-			edac_mc_handle_ce(mci, mci->csrows[row].first_page, 0,
-					0, row, 0, mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, mci->csrows[row].first_page,
+					     0, 0,
+					     -1, -1, row, row, 0,
+					     mci->ctl_name, "");
 		}
 	}
 
@@ -190,7 +197,7 @@ static void amd76x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	u32 mba, mba_base, mba_mask, dms;
 	int index;
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 		dimm = csrow->channels[0].dimm;
 
@@ -240,11 +247,11 @@ static int amd76x_probe1(struct pci_dev *pdev, int dev_idx)
 	debugf0("%s()\n", __func__);
 	pci_read_config_dword(pdev, AMD76X_ECC_MODE_STATUS, &ems);
 	ems_mode = (ems >> 10) & 0x3;
-	mci = edac_mc_alloc(0, AMD76X_NR_CSROWS, AMD76X_NR_CHANS, 0);
-
-	if (mci == NULL) {
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW,
+			    0, 0, AMD76X_NR_CSROWS,
+			    AMD76X_NR_CSROWS, 1, 0);
+	if (mci == NULL)
 		return -ENOMEM;
-	}
 
 	debugf0("%s(): mci = %p\n", __func__, mci);
 	mci->dev = &pdev->dev;
diff --git a/drivers/edac/cell_edac.c b/drivers/edac/cell_edac.c
index 09e1b5d..abe06a4 100644
--- a/drivers/edac/cell_edac.c
+++ b/drivers/edac/cell_edac.c
@@ -48,8 +48,11 @@ static void cell_edac_count_ce(struct mem_ctl_info *mci, int chan, u64 ar)
 	syndrome = (ar & 0x000000001fe00000ul) >> 21;
 
 	/* TODO: Decoding of the error address */
-	edac_mc_handle_ce(mci, csrow->first_page + pfn, offset,
-			  syndrome, 0, chan, "");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				csrow->first_page + pfn, offset, syndrome,
+				-1, -1, -1, 0, chan,
+				"", "");
 }
 
 static void cell_edac_count_ue(struct mem_ctl_info *mci, int chan, u64 ar)
@@ -69,7 +72,11 @@ static void cell_edac_count_ue(struct mem_ctl_info *mci, int chan, u64 ar)
 	offset = address & ~PAGE_MASK;
 
 	/* TODO: Decoding of the error address */
-	edac_mc_handle_ue(mci, csrow->first_page + pfn, offset, 0, "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				csrow->first_page + pfn, offset, 0,
+				-1, -1, -1, 0, chan,
+				"", "");
 }
 
 static void cell_edac_check(struct mem_ctl_info *mci)
@@ -167,7 +174,7 @@ static int __devinit cell_edac_probe(struct platform_device *pdev)
 	struct mem_ctl_info		*mci;
 	struct cell_edac_priv		*priv;
 	u64				reg;
-	int				rc, chanmask;
+	int				rc, chanmask, num_chans;
 
 	regs = cbe_get_cpu_mic_tm_regs(cbe_node_to_cpu(pdev->id));
 	if (regs == NULL)
@@ -192,8 +199,10 @@ static int __devinit cell_edac_probe(struct platform_device *pdev)
 		in_be64(&regs->mic_fir));
 
 	/* Allocate & init EDAC MC data structure */
-	mci = edac_mc_alloc(sizeof(struct cell_edac_priv), 1,
-			    chanmask == 3 ? 2 : 1, pdev->id);
+	num_chans = chanmask == 3 ? 2 : 1;
+	mci = edac_mc_alloc(pdev->id, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, num_chans,
+			    1, num_chans, sizeof(struct cell_edac_priv));
 	if (mci == NULL)
 		return -ENOMEM;
 	priv = mci->pvt_info;
diff --git a/drivers/edac/cpc925_edac.c b/drivers/edac/cpc925_edac.c
index 7b764a8..4a25b92 100644
--- a/drivers/edac/cpc925_edac.c
+++ b/drivers/edac/cpc925_edac.c
@@ -336,7 +336,7 @@ static void cpc925_init_csrows(struct mem_ctl_info *mci)
 
 	get_total_mem(pdata);
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		mbmr = __raw_readl(pdata->vbase + REG_MBMR_OFFSET +
 				   0x20 * index);
 		mbbar = __raw_readl(pdata->vbase + REG_MBBAR_OFFSET +
@@ -555,13 +555,20 @@ static void cpc925_mc_check(struct mem_ctl_info *mci)
 	if (apiexcp & CECC_EXCP_DETECTED) {
 		cpc925_mc_printk(mci, KERN_INFO, "DRAM CECC Fault\n");
 		channel = cpc925_mc_find_channel(mci, syndrome);
-		edac_mc_handle_ce(mci, pfn, offset, syndrome,
-				  csrow, channel, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     pfn, offset, syndrome,
+				     -1, -1, -1, csrow, channel,
+				     mci->ctl_name, "");
 	}
 
 	if (apiexcp & UECC_EXCP_DETECTED) {
 		cpc925_mc_printk(mci, KERN_INFO, "DRAM UECC Fault\n");
-		edac_mc_handle_ue(mci, pfn, offset, csrow, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW, mci,
+				     pfn, offset, 0,
+				     -1, -1, -1, csrow, -1,
+				     mci->ctl_name, "");
 	}
 
 	cpc925_mc_printk(mci, KERN_INFO, "Dump registers:\n");
@@ -969,8 +976,10 @@ static int __devinit cpc925_probe(struct platform_device *pdev)
 	}
 
 	nr_channels = cpc925_mc_get_channels(vbase) + 1;
-	mci = edac_mc_alloc(sizeof(struct cpc925_mc_pdata),
-			CPC925_NR_CSROWS, nr_channels, edac_mc_idx);
+	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, CPC925_NR_CSROWS * nr_channels,
+			    CPC925_NR_CSROWS, nr_channels,
+			    sizeof(struct cpc925_mc_pdata));
 	if (!mci) {
 		cpc925_printk(KERN_ERR, "No memory for mem_ctl_info\n");
 		res = -ENOMEM;
diff --git a/drivers/edac/e752x_edac.c b/drivers/edac/e752x_edac.c
index 310f657..813d965 100644
--- a/drivers/edac/e752x_edac.c
+++ b/drivers/edac/e752x_edac.c
@@ -6,6 +6,9 @@
  *
  * See "enum e752x_chips" below for supported chipsets
  *
+ * Datasheet:
+ *	http://www.intel.in/content/www/in/en/chipsets/e7525-memory-controller-hub-datasheet.html
+ *
  * Written by Tom Zimmerman
  *
  * Contributors:
@@ -350,8 +353,11 @@ static void do_process_ce(struct mem_ctl_info *mci, u16 error_one,
 	channel = !(error_one & 1);
 
 	/* e752x mc reads 34:6 of the DRAM linear address */
-	edac_mc_handle_ce(mci, page, offset_in_page(sec1_add << 4),
-			sec1_syndrome, row, channel, "e752x CE");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+			     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+			     page, offset_in_page(sec1_add << 4), sec1_syndrome,
+			     -1, -1, -1, row, channel,
+			     "e752x CE", "");
 }
 
 static inline void process_ce(struct mem_ctl_info *mci, u16 error_one,
@@ -385,9 +391,13 @@ static void do_process_ue(struct mem_ctl_info *mci, u16 error_one,
 			edac_mc_find_csrow_by_page(mci, block_page);
 
 		/* e752x mc reads 34:6 of the DRAM linear address */
-		edac_mc_handle_ue(mci, block_page,
-				offset_in_page(error_2b << 4),
-				row, "e752x UE from Read");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					HW_EVENT_SCOPE_MC_CSROW, mci,
+					block_page,
+					offset_in_page(error_2b << 4), 0,
+					-1, -1, -1, row, -1,
+					"e752x UE from Read", "");
+
 	}
 	if (error_one & 0x0404) {
 		error_2b = scrb_add;
@@ -401,9 +411,12 @@ static void do_process_ue(struct mem_ctl_info *mci, u16 error_one,
 			edac_mc_find_csrow_by_page(mci, block_page);
 
 		/* e752x mc reads 34:6 of the DRAM linear address */
-		edac_mc_handle_ue(mci, block_page,
-				offset_in_page(error_2b << 4),
-				row, "e752x UE from Scruber");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					HW_EVENT_SCOPE_MC_CSROW, mci,
+					block_page,
+					offset_in_page(error_2b << 4), 0,
+					-1, -1, -1, row, -1,
+					"e752x UE from Scrubber", "");
 	}
 }
 
@@ -426,7 +439,10 @@ static inline void process_ue_no_info_wr(struct mem_ctl_info *mci,
 		return;
 
 	debugf3("%s()\n", __func__);
-	edac_mc_handle_ue_no_info(mci, "e752x UE log memory write");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+			     -1, -1, -1, -1, -1,
+			     "e752x UE log memory write", "");
 }
 
 static void do_process_ded_retry(struct mem_ctl_info *mci, u16 error,
@@ -1062,7 +1078,7 @@ static void e752x_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	 * channel operation).  DRB regs are cumulative; therefore DRB7 will
 	 * contain the total memory contained in all eight rows.
 	 */
-	for (last_cumul_size = index = 0; index < mci->nr_csrows; index++) {
+	for (last_cumul_size = index = 0; index < mci->num_csrows; index++) {
 		/* mem_dev 0=x8, 1=x4 */
 		mem_dev = (dra >> (index * 4 + 2)) & 0x3;
 		csrow = &mci->csrows[remap_csrow_index(mci, index)];
@@ -1258,7 +1274,9 @@ static int e752x_probe1(struct pci_dev *pdev, int dev_idx)
 	/* Dual channel = 1, Single channel = 0 */
 	drc_chan = dual_channel_active(ddrcsr);
 
-	mci = edac_mc_alloc(sizeof(*pvt), E752X_NR_CSROWS, drc_chan + 1, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, E752X_NR_CSROWS * (drc_chan + 1),
+			    E752X_NR_CSROWS, drc_chan + 1, sizeof(*pvt));
 
 	if (mci == NULL) {
 		return -ENOMEM;
diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 2005d80..01f64d3 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -10,6 +10,9 @@
  * Based on work by Dan Hollis <goemon at anime dot net> and others.
  *	http://www.anime.net/~goemon/linux-ecc/
  *
+ * Datasheet:
+ *	http://www.intel.com/content/www/us/en/chipsets/e7501-chipset-memory-controller-hub-datasheet.html
+ *
  * Contributors:
  *	Eric Biederman (Linux Networx)
  *	Tom Zimmerman (Linux Networx)
@@ -71,7 +74,7 @@
 #endif				/* PCI_DEVICE_ID_INTEL_7505_1_ERR */
 
 #define E7XXX_NR_CSROWS		8	/* number of csrows */
-#define E7XXX_NR_DIMMS		8	/* FIXME - is this correct? */
+#define E7XXX_NR_DIMMS		8	/* 2 channels, 4 dimms/channel */
 
 /* E7XXX register addresses - device 0 function 0 */
 #define E7XXX_DRB		0x60	/* DRAM row boundary register (8b) */
@@ -216,13 +219,20 @@ static void process_ce(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
 	row = edac_mc_find_csrow_by_page(mci, page);
 	/* convert syndrome to channel */
 	channel = e7xxx_find_channel(syndrome);
-	edac_mc_handle_ce(mci, page, 0, syndrome, row, channel, "e7xxx CE");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+			     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+			     page, 0, syndrome,
+			     -1, -1, -1, row, channel,
+			     "e7xxx CE", "");
 }
 
 static void process_ce_no_info(struct mem_ctl_info *mci)
 {
 	debugf3("%s()\n", __func__);
-	edac_mc_handle_ce_no_info(mci, "e7xxx CE log register overflow");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+			     -1, -1, -1, -1, -1,
+			     "e7xxx CE log register overflow", "");
 }
 
 static void process_ue(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
@@ -236,13 +246,21 @@ static void process_ue(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
 	/* FIXME - should use PAGE_SHIFT */
 	block_page = error_2b >> 6;	/* convert to 4k address */
 	row = edac_mc_find_csrow_by_page(mci, block_page);
-	edac_mc_handle_ue(mci, block_page, 0, row, "e7xxx UE");
+
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+			     HW_EVENT_SCOPE_MC_CSROW, mci, block_page, 0, 0,
+			     -1, -1, -1, row, -1,
+			     "e7xxx UE", "");
 }
 
 static void process_ue_no_info(struct mem_ctl_info *mci)
 {
 	debugf3("%s()\n", __func__);
-	edac_mc_handle_ue_no_info(mci, "e7xxx UE log register overflow");
+
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+			     -1, -1, -1, -1, -1,
+			     "e7xxx UE log register overflow", "");
 }
 
 static void e7xxx_get_error_info(struct mem_ctl_info *mci,
@@ -365,7 +383,7 @@ static void e7xxx_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	 * channel operation).  DRB regs are cumulative; therefore DRB7 will
 	 * contain the total memory contained in all eight rows.
 	 */
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		/* mem_dev 0=x8, 1=x4 */
 		mem_dev = (dra >> (index * 4 + 3)) & 0x1;
 		csrow = &mci->csrows[index];
@@ -423,7 +441,17 @@ static int e7xxx_probe1(struct pci_dev *pdev, int dev_idx)
 	pci_read_config_dword(pdev, E7XXX_DRC, &drc);
 
 	drc_chan = dual_channel_active(drc, dev_idx);
-	mci = edac_mc_alloc(sizeof(*pvt), E7XXX_NR_CSROWS, drc_chan + 1, 0);
+	/*
+	 * According to the datasheet, this device has a maximum of
+	 * 4 DIMMs per channel, either single-rank or dual-rank. So, the
+	 * total number of DIMMs is 8 (E7XXX_NR_DIMMS).
+	 * That means that each DIMM is mapped as a csrow, and the channel
+	 * maps the rank. So, an error on either channel should be
+	 * attributed to the same DIMM.
+	 */
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, E7XXX_NR_DIMMS,
+			    E7XXX_NR_CSROWS, drc_chan + 1, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index fe90cd4..e4961fd 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -448,8 +448,36 @@ static inline void pci_write_bits32(struct pci_dev *pdev, int offset,
 
 #endif				/* CONFIG_PCI */
 
-extern struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
-					  unsigned nr_chans, int edac_index);
+/**
+ * enum edac_alloc_fill_strategy - Controls the way csrows/cschannels are mapped
+ * @EDAC_ALLOC_FILL_CSROW_CSCHANNEL:	csrows are rows, cschannels are channels.
+ *					This is the default and should be used
+ *					when the memory controller is able to
+ *					see csrows/cschannels. The DIMMs are
+ *					associated with cschannels.
+ * @EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW:	mc_branch/mc_channel are mapped as
+ *					cschannels. DIMMs inside each channel are
+ *					mapped as csrows. Most FB-DIMM drivers
+ *					use this model.
+ * @EDAC_ALLOC_FILL_PRIV:		The driver uses its own mapping model.
+ *					The core leaves the csrows struct
+ *					uninitialized, and the driver is
+ *					responsible for filling it.
+ */
+enum edac_alloc_fill_strategy {
+	EDAC_ALLOC_FILL_CSROW_CSCHANNEL = 0,
+	EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW,
+	EDAC_ALLOC_FILL_PRIV,
+};
+
+struct mem_ctl_info *edac_mc_alloc(int edac_index,
+				   enum edac_alloc_fill_strategy fill_strategy,
+				   unsigned num_branch,
+				   unsigned num_channel,
+				   unsigned num_dimm,
+				   unsigned num_csrows,
+				   unsigned num_cschannel,
+				   unsigned sz_pvt);
 extern int edac_mc_add_mc(struct mem_ctl_info *mci);
 extern void edac_mc_free(struct mem_ctl_info *mci);
 extern struct mem_ctl_info *edac_mc_find(int idx);
@@ -457,35 +485,19 @@ extern struct mem_ctl_info *find_mci_by_dev(struct device *dev);
 extern struct mem_ctl_info *edac_mc_del_mc(struct device *dev);
 extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
 				      unsigned long page);
-
-/*
- * The no info errors are used when error overflows are reported.
- * There are a limited number of error logging registers that can
- * be exausted.  When all registers are exhausted and an additional
- * error occurs then an error overflow register records that an
- * error occurred and the type of error, but doesn't have any
- * further information.  The ce/ue versions make for cleaner
- * reporting logic and function interface - reduces conditional
- * statement clutter and extra function arguments.
- */
-extern void edac_mc_handle_ce(struct mem_ctl_info *mci,
-			      unsigned long page_frame_number,
-			      unsigned long offset_in_page,
-			      unsigned long syndrome, int row, int channel,
-			      const char *msg);
-extern void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci,
-				      const char *msg);
-extern void edac_mc_handle_ue(struct mem_ctl_info *mci,
-			      unsigned long page_frame_number,
-			      unsigned long offset_in_page, int row,
-			      const char *msg);
-extern void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci,
-				      const char *msg);
-extern void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci, unsigned int csrow,
-				  unsigned int channel0, unsigned int channel1,
-				  char *msg);
-extern void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci, unsigned int csrow,
-				  unsigned int channel, char *msg);
+void edac_mc_handle_error(enum hw_event_mc_err_type type,
+			  enum hw_event_error_scope scope,
+			  struct mem_ctl_info *mci,
+			  unsigned long page_frame_number,
+			  unsigned long offset_in_page,
+			  unsigned long syndrome,
+			  int mc_branch,
+			  int mc_channel,
+			  int mc_dimm_number,
+			  int csrow,
+			  int cschannel,
+			  const char *msg,
+			  const char *other_detail);
 
 /*
  * edac_device APIs
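The new edac_mc_alloc() prototype above takes both the real memory hierarchy (num_branch, num_channel, num_dimm) and the emulated csrow view (csrows and csrow channels), and the kernel-doc in edac_mc.c asks that the emulated total be a multiple of the real DIMM count. As a hedged illustration only, the standalone sketch below computes that csrow-channels-per-DIMM factor. The helpers nz() and edac_dimm_factor() are hypothetical, and the sketch is stricter than the patch: edac_mc_alloc() only rejects a factor of 0 (and then falls back to 1), while this sketch also rejects totals that are not exact multiples.

```c
#include <assert.h>
#include <stdio.h>

/* Zero means "not applicable"; treat it as 1, as edac_mc_alloc() does. */
static unsigned nz(unsigned v)
{
	return v ? v : 1;
}

/*
 * Hypothetical helper: returns how many emulated csrow channels map to
 * each DIMM, or 0 if the emulated view is not an exact multiple of the
 * real DIMM count.
 */
static unsigned edac_dimm_factor(unsigned num_branch, unsigned num_channel,
				 unsigned num_dimm, unsigned num_csrows,
				 unsigned num_cschannel)
{
	unsigned tot_dimms = nz(num_branch) * nz(num_channel) * nz(num_dimm);
	unsigned tot_cschans = nz(num_csrows) * nz(num_cschannel);

	assert(tot_dimms > 0);
	if (tot_cschans % tot_dimms != 0) {
		fprintf(stderr, "bad geometry: %u csrow channels for %u DIMMs\n",
			tot_cschans, tot_dimms);
		return 0;
	}
	return tot_cschans / tot_dimms;
}
```

With the values used by the e7xxx conversion (num_dimm = 8, 8 csrows, 2 csrow channels in dual-channel mode) the factor is 2; the amd64, amd76x, cell, cpc925 and e752x calls as converted in this patch all yield 1.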
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index c3f6743..a9a5b6c 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -80,7 +80,7 @@ struct edac_device_ctl_info *edac_device_alloc_ctl_info(
 	unsigned total_size;
 	unsigned count;
 	unsigned instance, block, attr;
-	void *pvt;
+	void *pvt, *p;
 	int err;
 
 	debugf4("%s() instances=%d blocks=%d\n",
@@ -93,35 +93,30 @@ struct edac_device_ctl_info *edac_device_alloc_ctl_info(
 	 * to be at least as stringent as what the compiler would
 	 * provide if we could simply hardcode everything into a single struct.
 	 */
-	dev_ctl = (struct edac_device_ctl_info *)NULL;
+	p = NULL;
+	dev_ctl = edac_align_ptr(&p, sizeof(*dev_ctl), 1);
 
 	/* Calc the 'end' offset past end of ONE ctl_info structure
 	 * which will become the start of the 'instance' array
 	 */
-	dev_inst = edac_align_ptr(&dev_ctl[1], sizeof(*dev_inst));
+	dev_inst = edac_align_ptr(&p, sizeof(*dev_inst), nr_instances);
 
 	/* Calc the 'end' offset past the instance array within the ctl_info
 	 * which will become the start of the block array
 	 */
-	dev_blk = edac_align_ptr(&dev_inst[nr_instances], sizeof(*dev_blk));
+	count = nr_instances * nr_blocks;
+	dev_blk = edac_align_ptr(&p, sizeof(*dev_blk), count);
 
 	/* Calc the 'end' offset past the dev_blk array
 	 * which will become the start of the attrib array, if any.
 	 */
-	count = nr_instances * nr_blocks;
-	dev_attrib = edac_align_ptr(&dev_blk[count], sizeof(*dev_attrib));
-
-	/* Check for case of when an attribute array is specified */
-	if (nr_attrib > 0) {
-		/* calc how many nr_attrib we need */
+	/* calc how many nr_attrib we need */
+	if (nr_attrib > 0)
 		count *= nr_attrib;
+	dev_attrib = edac_align_ptr(&p, sizeof(*dev_attrib), count);
 
-		/* Calc the 'end' offset past the attributes array */
-		pvt = edac_align_ptr(&dev_attrib[count], sz_private);
-	} else {
-		/* no attribute array specificed */
-		pvt = edac_align_ptr(dev_attrib, sz_private);
-	}
+	/* Calc the 'end' offset past the attributes array */
+	pvt = edac_align_ptr(&p, sz_private, 1);
 
 	/* 'pvt' now points to where the private data area is.
 	 * At this point 'pvt' (like dev_inst,dev_blk and dev_attrib)
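The conversion above replaces pointer arithmetic on a NULL struct pointer with a cursor that edac_align_ptr() advances past each array, so the final cursor value is the size of the single allocation. A minimal userspace sketch of the same idea follows. layout_align() is a hypothetical stand-in, and its alignment rule (8 bytes for items larger than 4 bytes, then 4, 2 and 1) is an assumption chosen for illustration, not a copy of the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical cursor-based layout helper. *cursor holds a byte offset
 * from the start of a not-yet-allocated chunk (it starts at 0, the way
 * the patch starts from a NULL pointer). Each call rounds the cursor up
 * to the item's alignment, reserves quant items of size bytes, and
 * returns the offset where that array begins.
 */
static size_t layout_align(size_t *cursor, size_t size, size_t quant)
{
	size_t align, r, start;

	/* Assumed rule: bigger items get stricter alignment, capped at 8. */
	if (size > sizeof(uint32_t))
		align = sizeof(uint64_t);
	else if (size > sizeof(uint16_t))
		align = sizeof(uint32_t);
	else if (size > sizeof(uint8_t))
		align = sizeof(uint16_t);
	else
		align = 1;

	assert((align & (align - 1)) == 0);	/* power of two */
	r = *cursor % align;
	start = r ? *cursor + (align - r) : *cursor;
	*cursor = start + size * quant;
	return start;
}
```

After the last call, the cursor holds the total number of bytes to allocate in one chunk, and each returned offset would then be added to the base of that allocation.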
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index ee3f0f8..55760bc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -48,10 +48,20 @@ static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 	debugf4("\tchannel = %p\n", chan);
 	debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
 	debugf4("\tchannel->csrow = %p\n\n", chan->csrow);
+	debugf4("\tchannel->dimm = %p\n", chan->dimm);
+}
 
-	debugf4("\tdimm->ce_count = %d\n", chan->dimm->ce_count);
-	debugf4("\tdimm->label = '%s'\n", chan->dimm->label);
-	debugf4("\tdimm->nr_pages = 0x%x\n", chan->dimm->nr_pages);
+static void edac_mc_dump_dimm(struct dimm_info *dimm)
+{
+	debugf4("\tdimm = %p\n", dimm);
+	debugf4("\tdimm->label = '%s'\n", dimm->label);
+	debugf4("\tdimm->nr_pages = 0x%x\n", dimm->nr_pages);
+	debugf4("\tdimm location %d.%d.%d.%d.%d\n",
+		dimm->mc_branch, dimm->mc_channel,
+		dimm->mc_dimm_number,
+		dimm->csrow, dimm->cschannel);
+	debugf4("\tdimm->grain = %d\n", dimm->grain);
+	debugf4("\tdimm->nr_pages = 0x%x\n", dimm->nr_pages);
 }
 
 static void edac_mc_dump_csrow(struct csrow_info *csrow)
@@ -73,8 +83,10 @@ static void edac_mc_dump_mci(struct mem_ctl_info *mci)
 	debugf3("\tmci->edac_ctl_cap = %lx\n", mci->edac_ctl_cap);
 	debugf3("\tmci->edac_cap = %lx\n", mci->edac_cap);
 	debugf4("\tmci->edac_check = %p\n", mci->edac_check);
-	debugf3("\tmci->nr_csrows = %d, csrows = %p\n",
-		mci->nr_csrows, mci->csrows);
+	debugf3("\tmci->num_csrows = %d, csrows = %p\n",
+		mci->num_csrows, mci->csrows);
+	debugf3("\tmci->tot_dimms = %d, dimms = %p\n",
+		mci->tot_dimms, mci->dimms);
 	debugf3("\tdev = %p\n", mci->dev);
 	debugf3("\tmod_name:ctl_name = %s:%s\n", mci->mod_name, mci->ctl_name);
 	debugf3("\tpvt_info = %p\n\n", mci->pvt_info);
@@ -113,9 +125,12 @@ EXPORT_SYMBOL_GPL(edac_mem_types);
  * If 'size' is a constant, the compiler will optimize this whole function
  * down to either a no-op or the addition of a constant to the value of 'ptr'.
  */
-void *edac_align_ptr(void *ptr, unsigned size)
+void *edac_align_ptr(void **p, unsigned size, int quant)
 {
 	unsigned align, r;
+	void *ptr = *p;
+
+	*p += size * quant;
 
 	/* Here we assume that the alignment of a "long long" is the most
 	 * stringent alignment that the compiler will ever provide by default.
@@ -137,14 +152,60 @@ void *edac_align_ptr(void *ptr, unsigned size)
 	if (r == 0)
 		return (char *)ptr;
 
+	*p += align - r;
+
 	return (void *)(((unsigned long)ptr) + align - r);
 }
 
 /**
- * edac_mc_alloc: Allocate a struct mem_ctl_info structure
- * @size_pvt:	size of private storage needed
- * @nr_csrows:	Number of CWROWS needed for this MC
- * @nr_chans:	Number of channels for the MC
+ * edac_mc_alloc: Allocate and partially fill a struct mem_ctl_info structure
+ * @edac_index:		Memory controller number
+ * @fill_strategy:	csrow/cschannel filling strategy
+ * @num_branch:		Number of memory controller branches
+ * @num_channel:	Number of memory controller channels
+ * @num_dimm:		Number of dimms per memory controller channel
+ * @num_csrows:		Number of CSROWs accessed via the memory controller
+ * @num_cschannel:	Number of channels per csrow
+ * @sz_pvt:		size of private storage needed
+ *
+ * This routine supports 3 modes of DIMM mapping:
+ *	1) Memory controllers that access DRAMs via some bus interface (FB-DIMM
+ * and RAMBUS memory controllers), or that don't have a chip select view.
+ *
+ * In this case, a branch is generally a group of 2 channels, used in
+ * parallel to provide 128-bit data.
+ *
+ * In the case of FB-DIMMs, the DIMM is addressed via the SPD Address
+ * input selection, used by the AMB to select the DIMM. The MC channel
+ * corresponds to the memory controller channel bus used to access a series
+ * of FB-DIMMs.
+ *
+ * num_branch, num_channel and num_dimm should be set to the real
+ *	parameters of the memory controller.
+ *
+ * The total number of DIMMs is num_branch * num_channel * num_dimm.
+ *
+ * According to JEDEC No. 205, up to 8 FB-DIMMs are possible per channel. Of
+ * course, controllers may have a lower limit.
+ *
+ * num_csrows/num_cschannel should be set to the emulated parameters.
+ * The total number of cschannels (num_csrows * num_cschannel) should be a
+ * multiple of the total number of DIMMs, i.e.:
+ *  factor = (num_csrows * num_cschannel)/(num_branch * num_channel * num_dimm)
+ * should be an integer (typically 1 or num_cschannel).
+ *
+ *	2) The MC uses CSROWs/CS channels to directly select a DRAM chip.
+ * One DIMM exists on every cs channel, for single-rank memories.
+ *	num_branch and num_channel should be 0
+ *	num_dimm should be the total number of DIMMs
+ *	num_csrows * num_cschannel should be equal to num_dimm
+ *
+ *	3) The MC uses CSROWs/CS channels. One DIMM exists on every
+ * csrow. The cs channel is used to indicate the defective chip(s) inside
+ * the memory stick.
+ *	num_branch and num_channel should be 0
+ *	num_dimm should be the total number of DIMMs
+ *	num_csrows should be equal to num_dimm
  *
  * Everything is kmalloc'ed as one big chunk - more efficient.
  * Only can be used if all structures have the same lifetime - otherwise
@@ -156,30 +217,87 @@ void *edac_align_ptr(void *ptr, unsigned size)
  *	NULL allocation failed
  *	struct mem_ctl_info pointer
  */
-struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
-				unsigned nr_chans, int edac_index)
+struct mem_ctl_info *edac_mc_alloc(int edac_index,
+				   enum edac_alloc_fill_strategy fill_strategy,
+				   unsigned num_branch,
+				   unsigned num_channel,
+				   unsigned num_dimm,
+				   unsigned num_csrows,
+				   unsigned num_cschannel,
+				   unsigned sz_pvt)
 {
+	void *ptr;
 	struct mem_ctl_info *mci;
-	struct csrow_info *csi, *csrow;
+	struct csrow_info *csi, *csr;
 	struct csrow_channel_info *chi, *chp, *chan;
 	struct dimm_info *dimm;
+	u32 *ce_branch, *ce_channel, *ce_dimm, *ce_csrow, *ce_cschannel;
+	u32 *ue_branch, *ue_channel, *ue_dimm, *ue_csrow, *ue_cschannel;
 	void *pvt;
-	unsigned size;
-	int row, chn;
+	unsigned size, tot_dimms, count, dimm_div;
+	int i;
 	int err;
+	int mc_branch, mc_channel, mc_dimm_number, csrow, cschannel;
+	int row, chn;
+
+	/*
+	 * Non-pertinent values are expected to be passed as 0, which gives
+	 * this routine a way to detect whether the driver is emulating the
+	 * old sysfs API. However, 0 can't be used as-is, as it would zero
+	 * out the multiplications below, so it is replaced by 1.
+	 */
+	if (num_branch <= 0)
+		num_branch = 1;
+	if (num_channel <= 0)
+		num_channel = 1;
+	if (num_dimm <= 0)
+		num_dimm = 1;
+	if (num_csrows <= 0)
+		num_csrows = 1;
+	if (num_cschannel <= 0)
+		num_cschannel = 1;
+
+	tot_dimms = num_branch * num_channel * num_dimm;
+	dimm_div = (num_csrows * num_cschannel) / tot_dimms;
+	if (dimm_div == 0) {
+		printk(KERN_ERR "%s: dimm_div is wrong: tot_channels/tot_dimms = %d/%d < 1\n",
+			__func__, num_csrows * num_cschannel, tot_dimms);
+		dimm_div = 1;
+	}
+	/* FIXME: change it to debug2() at the final version */
 
 	/* Figure out the offsets of the various items from the start of an mc
 	 * structure.  We want the alignment of each item to be at least as
 	 * stringent as what the compiler would provide if we could simply
 	 * hardcode everything into a single struct.
 	 */
-	mci = (struct mem_ctl_info *)0;
-	csi = edac_align_ptr(&mci[1], sizeof(*csi));
-	chi = edac_align_ptr(&csi[nr_csrows], sizeof(*chi));
-	dimm = edac_align_ptr(&chi[nr_chans * nr_csrows], sizeof(*dimm));
-	pvt = edac_align_ptr(&dimm[nr_chans * nr_csrows], sz_pvt);
+	ptr = NULL;
+	mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
+	csi = edac_align_ptr(&ptr, sizeof(*csi), num_csrows);
+	chi = edac_align_ptr(&ptr, sizeof(*chi), num_csrows * num_cschannel);
+	dimm = edac_align_ptr(&ptr, sizeof(*dimm), tot_dimms);
+
+	count = num_branch;
+	ue_branch = edac_align_ptr(&ptr, sizeof(*ue_branch), count);
+	ce_branch = edac_align_ptr(&ptr, sizeof(*ce_branch), count);
+	count *= num_channel;
+	ue_channel = edac_align_ptr(&ptr, sizeof(*ue_channel), count);
+	ce_channel = edac_align_ptr(&ptr, sizeof(*ce_channel), count);
+	count *= num_dimm;
+	ue_dimm = edac_align_ptr(&ptr, sizeof(*ue_dimm), count);
+	ce_dimm = edac_align_ptr(&ptr, sizeof(*ce_dimm), count);
+
+	count = num_csrows;
+	ue_csrow = edac_align_ptr(&ptr, sizeof(*ue_csrow), count);
+	ce_csrow = edac_align_ptr(&ptr, sizeof(*ce_csrow), count);
+	count *= num_cschannel;
+	ue_cschannel = edac_align_ptr(&ptr, sizeof(*ue_cschannel), count);
+	ce_cschannel = edac_align_ptr(&ptr, sizeof(*ce_cschannel), count);
+
+	pvt = edac_align_ptr(&ptr, sz_pvt, 1);
 	size = ((unsigned long)pvt) + sz_pvt;
 
+	debugf1("%s(): allocating %u bytes for mci data\n", __func__, size);
 	mci = kzalloc(size, GFP_KERNEL);
 	if (mci == NULL)
 		return NULL;
@@ -197,41 +315,121 @@ struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 	mci->csrows = csi;
 	mci->dimms  = dimm;
 	mci->pvt_info = pvt;
-	mci->nr_csrows = nr_csrows;
-
-	for (row = 0; row < nr_csrows; row++) {
-		csrow = &csi[row];
-		csrow->csrow_idx = row;
-		csrow->mci = mci;
-		csrow->nr_channels = nr_chans;
-		chp = &chi[row * nr_chans];
-		csrow->channels = chp;
-
-		for (chn = 0; chn < nr_chans; chn++) {
-			chan = &chp[chn];
-			chan->chan_idx = chn;
-			chan->csrow = csrow;
+
+	mci->tot_dimms = tot_dimms;
+	mci->num_branch = num_branch;
+	mci->num_channel = num_channel;
+	mci->num_dimm = num_dimm;
+	mci->num_csrows = num_csrows;
+	mci->num_cschannel = num_cschannel;
+
+	/*
+	 * Fills the dimm struct
+	 */
+	mc_branch = (num_branch > 0) ? 0 : -1;
+	mc_channel = (num_channel > 0) ? 0 : -1;
+	mc_dimm_number = (num_dimm > 0) ? 0 : -1;
+	if (!num_channel && !num_branch) {
+		csrow = (num_csrows > 0) ? 0 : -1;
+		cschannel = (num_cschannel > 0) ? 0 : -1;
+	} else {
+		csrow = -1;
+		cschannel = -1;
+	}
+
+	debugf4("%s: initializing %d dimms\n", __func__, tot_dimms);
+	for (i = 0; i < tot_dimms; i++) {
+		dimm = &mci->dimms[i];
+
+		dimm->mc_branch = mc_branch;
+		dimm->mc_channel = mc_channel;
+		dimm->mc_dimm_number = mc_dimm_number;
+		dimm->csrow = csrow;
+		dimm->cschannel = cschannel;
+
+		/*
+		 * Increment the location
+		 * On csrow-emulated devices, csrow/cschannel should be -1
+		 */
+		if (!num_channel && !num_branch) {
+			if (num_cschannel) {
+				cschannel = (cschannel + 1) % num_cschannel;
+				if (cschannel)
+					continue;
+			}
+			if (num_csrows) {
+				csrow = (csrow + 1) % num_csrows;
+				if (csrow)
+					continue;
+			}
+		}
+		if (num_dimm) {
+			mc_dimm_number = (mc_dimm_number + 1) % num_dimm;
+			if (mc_dimm_number)
+				continue;
+		}
+		if (num_channel) {
+			mc_channel = (mc_channel + 1) % num_channel;
+			if (mc_channel)
+				continue;
+		}
+		if (num_branch) {
+			mc_branch = (mc_branch + 1) % num_branch;
+			if (mc_branch)
+				continue;
 		}
 	}
 
 	/*
-	 * By default, assumes that a per-csrow arrangement will be used,
-	 * as most drivers are based on such assumption.
+	 * Fills the csrows struct
+	 *
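+	 * NOTE: there are two possible memory arrangements here:
+	 *	EDAC_ALLOC_FILL_CSROW_CSCHANNEL: dimms are laid out csrow by csrow;
+	 *	EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW: each MC channel is seen as a csrow.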
 	 */
-	if (!mci->nr_dimms) {
-		dimm = mci->dimms;
-		for (row = 0; row < mci->nr_csrows; row++) {
-			for (chn = 0; chn < mci->csrows[row].nr_channels; chn++) {
-				mci->csrows[row].channels[chn].dimm = dimm;
-				dimm->mc_branch = -1;
-				dimm->mc_channel = -1;
-				dimm->mc_dimm_number = -1;
-				dimm->csrow = row;
-				dimm->csrow_channel = chn;
-				dimm++;
-				mci->nr_dimms++;
+	switch (fill_strategy) {
+	case EDAC_ALLOC_FILL_CSROW_CSCHANNEL:
+		for (row = 0; row < num_csrows; row++) {
+			csr = &csi[row];
+			csr->csrow_idx = row;
+			csr->mci = mci;
+			csr->nr_channels = num_cschannel;
+			chp = &chi[row * num_cschannel];
+			csr->channels = chp;
+
+			for (chn = 0; chn < num_cschannel; chn++) {
+				int dimm_idx = (chn + row * num_cschannel) /
+						dimm_div;
+				debugf4("%s: csrow(%d,%d) = dimm%d\n",
+					__func__, row, chn, dimm_idx);
+				chan = &chp[chn];
+				chan->chan_idx = chn;
+				chan->csrow = csr;
+				chan->dimm = &dimm[dimm_idx];
 			}
 		}
+		break;
+	case EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW:
+		for (row = 0; row < num_csrows; row++) {
+			csr = &csi[row];
+			csr->csrow_idx = row;
+			csr->mci = mci;
+			csr->nr_channels = num_cschannel;
+			chp = &chi[row * num_cschannel];
+			csr->channels = chp;
+
+			for (chn = 0; chn < num_cschannel; chn++) {
+				int dimm_idx = (chn * num_cschannel + row) /
+						dimm_div;
+				debugf4("%s: csrow(%d,%d) = dimm%d\n",
+					__func__, row, chn, dimm_idx);
+				chan = &chp[chn];
+				chan->chan_idx = chn;
+				chan->csrow = csr;
+				chan->dimm = &dimm[dimm_idx];
+			}
+		}
+	case EDAC_ALLOC_FILL_PRIV:
+		break;
 	}
 
 	mci->op_state = OP_ALLOC;
@@ -522,7 +720,6 @@ EXPORT_SYMBOL(edac_mc_find);
  * edac_mc_add_mc: Insert the 'mci' structure into the mci global list and
  *                 create sysfs entries associated with mci structure
  * @mci: pointer to the mci structure to be added to the list
- * @mc_idx: A unique numeric identifier to be assigned to the 'mci' structure.
  *
  * Return:
  *	0	Success
@@ -540,13 +737,15 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
 
 	if (edac_debug_level >= 4) {
 		int i;
-		for (i = 0; i < mci->nr_csrows; i++) {
+		for (i = 0; i < mci->num_csrows; i++) {
 			int j;
 			edac_mc_dump_csrow(&mci->csrows[i]);
 			for (j = 0; j < mci->csrows[i].nr_channels; j++)
 				edac_mc_dump_channel(&mci->csrows[i].
 						channels[j]);
 		}
+		for (i = 0; i < mci->tot_dimms; i++)
+			edac_mc_dump_dimm(&mci->dimms[i]);
 	}
 #endif
 	mutex_lock(&mem_ctls_mutex);
@@ -671,7 +870,7 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 	debugf1("MC%d: %s(): 0x%lx\n", mci->mc_idx, __func__, page);
 	row = -1;
 
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->num_csrows; i++) {
 		struct csrow_info *csrow = &csrows[i];
 		n = 0;
 		for (j = 0; j < csrow->nr_channels; j++) {
@@ -704,312 +903,338 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 }
 EXPORT_SYMBOL_GPL(edac_mc_find_csrow_by_page);
 
-/* FIXME - setable log (warning/emerg) levels */
-/* FIXME - integrate with evlog: http://evlog.sourceforge.net/ */
-void edac_mc_handle_ce(struct mem_ctl_info *mci,
-		unsigned long page_frame_number,
-		unsigned long offset_in_page, unsigned long syndrome,
-		int row, int channel, const char *msg)
+void edac_increment_ce_error(enum hw_event_error_scope scope,
+			     struct mem_ctl_info *mci,
+			     int mc_branch,
+			     int mc_channel,
+			     int mc_dimm_number,
+			     int csrow,
+			     int cschannel)
 {
-	unsigned long remapped_page;
-	char detail[80], *label = NULL;
-	u32 grain;
+	int index;
 
-	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
+	mci->err.ce_mc++;
 
-	/* FIXME - maybe make panic on INTERNAL ERROR an option */
-	if (row >= mci->nr_csrows || row < 0) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "CE", "row", row, 0, mci->nr_csrows);
-		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: row out of range "
-			"(%d >= %d)\n", row, mci->nr_csrows);
-		edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
+	if (scope == HW_EVENT_SCOPE_MC) {
+		mci->ce_noinfo_count++;
 		return;
 	}
 
-	if (channel >= mci->csrows[row].nr_channels || channel < 0) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "CE", "channel", channel,
-				      0, mci->csrows[row].nr_channels);
-		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: channel out of range "
-			"(%d >= %d)\n", channel,
-			mci->csrows[row].nr_channels);
-		edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
-		return;
+	index = 0;
+	if (mc_branch >= 0) {
+		index = mc_branch;
+		mci->err.ce_branch[index]++;
 	}
+	if (scope == HW_EVENT_SCOPE_MC_BRANCH)
+		return;
+	index *= mci->num_channel;
 
-	label = mci->csrows[row].channels[channel].dimm->label;
-	grain = mci->csrows[row].channels[channel].dimm->grain;
-
-	/* Memory type dependent details about the error */
-	snprintf(detail, sizeof(detail),
-		 " (page 0x%lx, offset 0x%lx, grain %d, "
-		 "syndrome 0x%lx, row %d, channel %d)\n",
-		 page_frame_number, offset_in_page,
-		 grain, syndrome, row, channel);
-	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
-		       label, msg, detail);
-
-	if (edac_mc_get_log_ce())
-		/* FIXME - put in DIMM location */
-		edac_mc_printk(mci, KERN_WARNING,
-			"CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
-			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
-			page_frame_number, offset_in_page,
-			grain, syndrome, row, channel,
-			label, msg);
+	if (mc_channel >= 0) {
+		index += mc_channel;
+		mci->err.ce_channel[index]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_CHANNEL)
+		return;
+	index *= mci->num_dimm;
 
-	mci->ce_count++;
-	mci->csrows[row].ce_count++;
-	mci->csrows[row].channels[channel].dimm->ce_count++;
-	mci->csrows[row].channels[channel].ce_count++;
+	if (mc_dimm_number >= 0) {
+		index += mc_dimm_number;
+		mci->err.ce_dimm[index]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_DIMM)
+		return;
+	index = 0;	/* csrow counters do not nest under dimms */
 
-	if (mci->scrub_mode & SCRUB_SW_SRC) {
-		/*
-		 * Some MC's can remap memory so that it is still available
-		 * at a different address when PCI devices map into memory.
-		 * MC's that can't do this lose the memory where PCI devices
-		 * are mapped.  This mapping is MC dependent and so we call
-		 * back into the MC driver for it to map the MC page to
-		 * a physical (CPU) page which can then be mapped to a virtual
-		 * page - which can then be scrubbed.
-		 */
-		remapped_page = mci->ctl_page_to_phys ?
-			mci->ctl_page_to_phys(mci, page_frame_number) :
-			page_frame_number;
+	if (csrow >= 0) {
+		index += csrow;
+		mci->err.ce_csrow[csrow]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_CSROW)
+		return;
+	index *= mci->num_cschannel;
 
-		edac_mc_scrub_block(remapped_page, offset_in_page, grain);
+	if (cschannel >= 0) {
+		index += cschannel;
+		mci->err.ce_cschannel[index]++;
 	}
 }
-EXPORT_SYMBOL_GPL(edac_mc_handle_ce);
 
-void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci, const char *msg)
+void edac_increment_ue_error(enum hw_event_error_scope scope,
+			     struct mem_ctl_info *mci,
+			     int mc_branch,
+			     int mc_channel,
+			     int mc_dimm_number,
+			     int csrow,
+			     int cschannel)
 {
-	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
-		       "unknown", msg, "");
-	if (edac_mc_get_log_ce())
-		edac_mc_printk(mci, KERN_WARNING,
-			"CE - no information available: %s\n", msg);
+	int index;
 
-	mci->ce_noinfo_count++;
-	mci->ce_count++;
-}
-EXPORT_SYMBOL_GPL(edac_mc_handle_ce_no_info);
+	mci->err.ue_mc++;
 
-void edac_mc_handle_ue(struct mem_ctl_info *mci,
-		unsigned long page_frame_number,
-		unsigned long offset_in_page, int row, const char *msg)
-{
-	int len = EDAC_MC_LABEL_LEN * 4;
-	char labels[len + 1];
-	char *pos = labels;
-	int chan;
-	int chars;
-	char detail[80], *label = NULL;
-	u32 grain;
-
-	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
-
-	/* FIXME - maybe make panic on INTERNAL ERROR an option */
-	if (row >= mci->nr_csrows || row < 0) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "UE", "row", row,
-				      0, mci->nr_csrows);
-		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: row out of range "
-			"(%d >= %d)\n", row, mci->nr_csrows);
-		edac_mc_handle_ue_no_info(mci, "INTERNAL ERROR");
+	if (scope == HW_EVENT_SCOPE_MC) {
+		mci->ue_noinfo_count++;
 		return;
 	}
 
-	grain = mci->csrows[row].channels[0].dimm->grain;
-	label = mci->csrows[row].channels[0].dimm->label;
-	chars = snprintf(pos, len + 1, "%s", label);
-	len -= chars;
-	pos += chars;
-
-	for (chan = 1; (chan < mci->csrows[row].nr_channels) && (len > 0);
-		chan++) {
-		label = mci->csrows[row].channels[chan].dimm->label;
-		chars = snprintf(pos, len + 1, ":%s", label);
-		len -= chars;
-		pos += chars;
+	index = 0;
+	if (mc_branch >= 0) {
+		index = mc_branch;
+		mci->err.ue_branch[index]++;
 	}
+	if (scope == HW_EVENT_SCOPE_MC_BRANCH)
+		return;
+	index *= mci->num_channel;
 
-	/* Memory type dependent details about the error */
-	snprintf(detail, sizeof(detail),
-		 "page 0x%lx, offset 0x%lx, grain %d, row %d ",
-		 page_frame_number, offset_in_page, grain, row);
-	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
-		       labels,
-		       msg, detail);
-
-	if (edac_mc_get_log_ue())
-		edac_mc_printk(mci, KERN_EMERG,
-			"UE page 0x%lx, offset 0x%lx, grain %d, row %d, "
-			"labels \"%s\": %s\n", page_frame_number,
-			offset_in_page, grain, row, labels, msg);
-
-	if (edac_mc_get_panic_on_ue())
-		panic("EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, "
-			"row %d, labels \"%s\": %s\n", mci->mc_idx,
-			page_frame_number, offset_in_page,
-			grain, row, labels, msg);
+	if (mc_channel >= 0) {
+		index += mc_channel;
+		mci->err.ue_channel[index]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_CHANNEL)
+		return;
+	index *= mci->num_dimm;
 
-	mci->ue_count++;
-	mci->csrows[row].ue_count++;
-}
-EXPORT_SYMBOL_GPL(edac_mc_handle_ue);
+	if (mc_dimm_number >= 0) {
+		index += mc_dimm_number;
+		mci->err.ue_dimm[index]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_DIMM)
+		return;
+	index = 0;	/* csrow counters do not nest under dimms */
 
-void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci, const char *msg)
-{
-	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
-		       "unknown", msg, "");
-	if (edac_mc_get_panic_on_ue())
-		panic("EDAC MC%d: Uncorrected Error", mci->mc_idx);
+	if (csrow >= 0) {
+		index += csrow;
+		mci->err.ue_csrow[csrow]++;
+	}
+	if (scope == HW_EVENT_SCOPE_MC_CSROW)
+		return;
+	index *= mci->num_cschannel;
 
-	if (edac_mc_get_log_ue())
-		edac_mc_printk(mci, KERN_WARNING,
-			"UE - no information available: %s\n", msg);
-	mci->ue_noinfo_count++;
-	mci->ue_count++;
+	if (cschannel >= 0) {
+		index += cschannel;
+		mci->err.ue_cschannel[index]++;
+	}
 }
-EXPORT_SYMBOL_GPL(edac_mc_handle_ue_no_info);
 
-/*************************************************************
- * On Fully Buffered DIMM modules, this help function is
- * called to process UE events
- */
-void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
-			unsigned int csrow,
-			unsigned int channela,
-			unsigned int channelb, char *msg)
+void edac_mc_handle_error(enum hw_event_mc_err_type type,
+			  enum hw_event_error_scope scope,
+			  struct mem_ctl_info *mci,
+			  unsigned long page_frame_number,
+			  unsigned long offset_in_page,
+			  unsigned long syndrome,
+			  int mc_branch,
+			  int mc_channel,
+			  int mc_dimm_number,
+			  int csrow,
+			  int cschannel,
+			  const char *msg,
+			  const char *other_detail)
 {
-	int len = EDAC_MC_LABEL_LEN * 4;
-	char labels[len + 1];
-	char *pos = labels;
-	int chars;
-	char detail[80], *label;
+	unsigned long remapped_page;
+	/* FIXME: too much for stack. Move it to some pre-allocated area */
+	char detail[80 + strlen(other_detail)];
+	char label[(EDAC_MC_LABEL_LEN + 2) * mci->tot_dimms], *p;
+	char location[80];
+	int i;
+	u32 grain;
 
-	if (csrow >= mci->nr_csrows) {
-		/* something is wrong */
+	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
-		trace_mc_out_of_range(mci, "UE FBDIMM", "row", csrow,
-				      0, mci->nr_csrows);
+	/* Check if the event report is consistent */
+	if ((scope == HW_EVENT_SCOPE_MC_CSROW_CHANNEL) &&
+	    (cschannel >= mci->num_cschannel)) {
+		trace_mc_out_of_range(mci, type == HW_EVENT_ERR_CORRECTED ? "CE" : "UE",
+					"cs channel", cschannel, 0, mci->num_cschannel);
 		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: row out of range (%d >= %d)\n",
-			csrow, mci->nr_csrows);
-		edac_mc_handle_ue_no_info(mci, "INTERNAL ERROR");
+				"INTERNAL ERROR: cs channel out of range (%d >= %d)\n",
+				cschannel, mci->num_cschannel);
+		if (type == HW_EVENT_ERR_CORRECTED)
+			mci->err.ce_mc++;
+		else
+			mci->err.ue_mc++;
 		return;
+	} else if (scope != HW_EVENT_SCOPE_MC_CSROW_CHANNEL) {
+		cschannel = -1;
 	}
 
-	if (channela >= mci->csrows[csrow].nr_channels) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "UE FBDIMM", "channel-a", channela,
-				      0, mci->csrows[csrow].nr_channels);
+	if ((scope <= HW_EVENT_SCOPE_MC_CSROW) &&
+	    (csrow >= mci->num_csrows)) {
+		trace_mc_out_of_range(mci, type == HW_EVENT_ERR_CORRECTED ? "CE" : "UE",
+					"csrow", csrow, 0, mci->num_csrows);
 		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: channel-a out of range "
-			"(%d >= %d)\n",
-			channela, mci->csrows[csrow].nr_channels);
-		edac_mc_handle_ue_no_info(mci, "INTERNAL ERROR");
+				"INTERNAL ERROR: csrow out of range (%d >= %d)\n",
+				csrow, mci->num_csrows);
+		if (type == HW_EVENT_ERR_CORRECTED)
+			mci->err.ce_mc++;
+		else
+			mci->err.ue_mc++;
 		return;
+	} else if (scope > HW_EVENT_SCOPE_MC_CSROW) {
+		csrow = -1;
 	}
 
-	if (channelb >= mci->csrows[csrow].nr_channels) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "UE FBDIMM", "channel-b", channelb,
-				      0, mci->csrows[csrow].nr_channels);
+	if ((scope <= HW_EVENT_SCOPE_MC_CSROW) &&
+	    (mc_dimm_number >= mci->num_dimm)) {
+		trace_mc_out_of_range(mci, type == HW_EVENT_ERR_CORRECTED ? "CE" : "UE",
+					"dimm_number", mc_dimm_number, 0, mci->num_dimm);
 		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: channel-b out of range "
-			"(%d >= %d)\n",
-			channelb, mci->csrows[csrow].nr_channels);
-		edac_mc_handle_ue_no_info(mci, "INTERNAL ERROR");
+				"INTERNAL ERROR: dimm_number out of range (%d >= %d)\n",
+				mc_dimm_number, mci->num_dimm);
+		if (type == HW_EVENT_ERR_CORRECTED)
+			mci->err.ce_mc++;
+		else
+			mci->err.ue_mc++;
 		return;
+	} else if (scope > HW_EVENT_SCOPE_MC_CSROW) {
+		mc_dimm_number = -1;
 	}
 
-	mci->ue_count++;
-	mci->csrows[csrow].ue_count++;
-
-	/* Generate the DIMM labels from the specified channels */
-	label = mci->csrows[csrow].channels[channela].dimm->label;
-	chars = snprintf(pos, len + 1, "%s", label);
-	len -= chars;
-	pos += chars;
-
-	chars = snprintf(pos, len + 1, "-%s",
-			mci->csrows[csrow].channels[channelb].dimm->label);
-
-	/* Memory type dependent details about the error */
-	snprintf(detail, sizeof(detail),
-		 "row %d, channel-a= %d channel-b= %d ",
-		 csrow, channela, channelb);
-	trace_mc_error(HW_EVENT_ERR_UNCORRECTED, mci->mc_idx,
-		       labels,
-		       msg, detail);
-	if (edac_mc_get_log_ue())
-		edac_mc_printk(mci, KERN_EMERG,
-			"UE row %d, channel-a= %d channel-b= %d "
-			"labels \"%s\": %s\n", csrow, channela, channelb,
-			labels, msg);
-
-	if (edac_mc_get_panic_on_ue())
-		panic("UE row %d, channel-a= %d channel-b= %d "
-			"labels \"%s\": %s\n", csrow, channela,
-			channelb, labels, msg);
-}
-EXPORT_SYMBOL(edac_mc_handle_fbd_ue);
-
-/*************************************************************
- * On Fully Buffered DIMM modules, this help function is
- * called to process CE events
- */
-void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
-			unsigned int csrow, unsigned int channel, char *msg)
-{
-	char detail[80], *label = NULL;
-	/* Ensure boundary values */
-	if (csrow >= mci->nr_csrows) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "CE FBDIMM", "row", csrow,
-				      0, mci->nr_csrows);
+	if ((scope <= HW_EVENT_SCOPE_MC_CHANNEL) &&
+	    (mc_channel >= mci->num_channel)) {
+		trace_mc_out_of_range(mci, type == HW_EVENT_ERR_CORRECTED ? "CE" : "UE",
+					"mc_channel", mc_channel, 0, mci->num_channel);
 		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: row out of range (%d >= %d)\n",
-			csrow, mci->nr_csrows);
-		edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
+				"INTERNAL ERROR: mc_channel out of range (%d >= %d)\n",
+				mc_channel, mci->num_channel);
+		if (type == HW_EVENT_ERR_CORRECTED)
+			mci->err.ce_mc++;
+		else
+			mci->err.ue_mc++;
 		return;
+	} else if (scope > HW_EVENT_SCOPE_MC_CHANNEL) {
+		mc_channel = -1;
 	}
-	if (channel >= mci->csrows[csrow].nr_channels) {
-		/* something is wrong */
-		trace_mc_out_of_range(mci, "UE FBDIMM", "channel", channel,
-				      0, mci->csrows[csrow].nr_channels);
+
+	if ((scope <= HW_EVENT_SCOPE_MC_BRANCH) &&
+	    (mc_branch >= mci->num_branch)) {
+		trace_mc_out_of_range(mci, type == HW_EVENT_ERR_CORRECTED ? "CE" : "UE",
+					"branch", mc_branch, 0, mci->num_branch);
 		edac_mc_printk(mci, KERN_ERR,
-			"INTERNAL ERROR: channel out of range (%d >= %d)\n",
-			channel, mci->csrows[csrow].nr_channels);
-		edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
+				"INTERNAL ERROR: mc_branch out of range (%d >= %d)\n",
+				mc_branch, mci->num_branch);
+		if (type == HW_EVENT_ERR_CORRECTED)
+			mci->err.ce_mc++;
+		else
+			mci->err.ue_mc++;
 		return;
+	} else if (scope > HW_EVENT_SCOPE_MC_BRANCH) {
+		mc_branch = -1;
 	}
 
-	/* Memory type dependent details about the error */
-	snprintf(detail, sizeof(detail),
-		 "(row %d, channel %d)\n",
-		 csrow, channel);
+	/*
+	 * Get the dimm label/grain that applies to the match criteria.
+	 * As the error algorithm may not be able to point to just one memory,
+	 * the logic here will get all possible labels that could potentially
+	 * be affected by the error.
+	 * On FB-DIMM memory controllers, for uncorrected errors, it is common
+	 * to know only the MC branch and the MC dimm (also called "rank"),
+	 * but not the channel, as the memories are arranged in pairs,
+	 * where each memory belongs to a separate channel within the same
+	 * branch.
+	 * It will also get the max grain over the error match range.
+	 */
+	grain = 0;
+	p = label;
+	for (i = 0; i < mci->tot_dimms; i++) {
+		struct dimm_info *dimm = &mci->dimms[i];
 
-	label = mci->csrows[csrow].channels[channel].dimm->label;
+		if (mc_branch >= 0 && mc_branch != dimm->mc_branch)
+			continue;
 
-	trace_mc_error(HW_EVENT_ERR_CORRECTED, mci->mc_idx,
-		       label, msg, detail);
+		if (mc_channel >= 0 && mc_channel != dimm->mc_channel)
+			continue;
 
-	if (edac_mc_get_log_ce())
-		/* FIXME - put in DIMM location */
-		edac_mc_printk(mci, KERN_WARNING,
-			"CE row %d, channel %d, label \"%s\": %s\n",
-			csrow, channel, label, msg);
+		if (mc_dimm_number >= 0 &&
+		    mc_dimm_number != dimm->mc_dimm_number)
+			continue;
+
+		if (csrow >= 0 && csrow != dimm->csrow)
+			continue;
+		if (cschannel >= 0 && cschannel != dimm->cschannel)
+			continue;
+
+		if (dimm->grain > grain)
+			grain = dimm->grain;
+
+		strcpy(p, dimm->label);
+		p += strlen(p);
+		*p++ = ' ';
+	}
+	*p = '\0';
+
+	/* Fill the RAM location data */
+	p = location;
+	if (mc_branch >= 0)
+		p += sprintf(p, "branch %d ", mc_branch);
+
+	if (mc_channel >= 0)
+		p += sprintf(p, "channel %d ", mc_channel);
+
+	if (mc_dimm_number >= 0)
+		p += sprintf(p, "dimm %d ", mc_dimm_number);
 
-	mci->ce_count++;
-	mci->csrows[csrow].ce_count++;
-	mci->csrows[csrow].channels[channel].dimm->ce_count++;
-	mci->csrows[csrow].channels[channel].ce_count++;
+	if (csrow >= 0)
+		p += sprintf(p, "csrow %d ", csrow);
+
+	if (cschannel >= 0)
+		p += sprintf(p, "cs_channel %d ", cschannel);
+
+
+	/* Memory type dependent details about the error */
+	if (type == HW_EVENT_ERR_CORRECTED)
+		snprintf(detail, sizeof(detail),
+			"page 0x%lx offset 0x%lx grain %u syndrome 0x%lx",
+			page_frame_number, offset_in_page,
+			grain, syndrome);
+	else
+		snprintf(detail, sizeof(detail),
+			"page 0x%lx offset 0x%lx grain %u",
+			page_frame_number, offset_in_page, grain);
+
+	trace_mc_error(type, mci->mc_idx, msg, label, mc_branch, mc_channel,
+		       mc_dimm_number, csrow, cschannel,
+		       detail, other_detail);
+
+	if (type == HW_EVENT_ERR_CORRECTED) {
+		if (edac_mc_get_log_ce())
+			edac_mc_printk(mci, KERN_WARNING,
+				       "CE %s label \"%s\" (location: %d.%d.%d.%d.%d %s %s)\n",
+				       msg, label, mc_branch, mc_channel,
+				       mc_dimm_number, csrow, cschannel,
+				       detail, other_detail);
+		edac_increment_ce_error(scope, mci, mc_branch, mc_channel,
+					mc_dimm_number, csrow, cschannel);
+
+		if (mci->scrub_mode & SCRUB_SW_SRC) {
+			/*
+			 * Some MC's can remap memory so that it is still
+			 * available at a different address when PCI devices
+			 * map into memory.
+			 * MC's that can't do this lose the memory where PCI
+			 * devices are mapped. This mapping is MC dependent
+			 * and so we call back into the MC driver for it to
+			 * map the MC page to a physical (CPU) page which can
+			 * then be mapped to a virtual page - which can then
+			 * be scrubbed.
+			 */
+			remapped_page = mci->ctl_page_to_phys ?
+				mci->ctl_page_to_phys(mci, page_frame_number) :
+				page_frame_number;
+
+			edac_mc_scrub_block(remapped_page,
+					    offset_in_page, grain);
+		}
+	} else {
+		if (edac_mc_get_log_ue())
+			edac_mc_printk(mci, KERN_EMERG,
+				"UE %s label \"%s\" (%s %s %s)\n",
+				msg, label, location, detail, other_detail);
+
+		if (edac_mc_get_panic_on_ue())
+			panic("UE %s label \"%s\" (%s %s %s)\n",
+			      msg, label, location, detail, other_detail);
+
+		edac_increment_ue_error(scope, mci, mc_branch, mc_channel,
+					mc_dimm_number, csrow, cschannel);
+	}
 }
-EXPORT_SYMBOL(edac_mc_handle_fbd_ce);
+EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 64b4c76..a6f611f 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -132,13 +132,17 @@ static const char *edac_caps[] = {
 static ssize_t csrow_ue_count_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%u\n", csrow->ue_count);
+	struct mem_ctl_info *mci = csrow->mci;
+
+	return sprintf(data, "%u\n", mci->err.ue_csrow[csrow->csrow_idx]);
 }
 
 static ssize_t csrow_ce_count_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	return sprintf(data, "%u\n", csrow->ce_count);
+	struct mem_ctl_info *mci = csrow->mci;
+
+	return sprintf(data, "%u\n", mci->err.ce_csrow[csrow->csrow_idx]);
 }
 
 static ssize_t csrow_size_show(struct csrow_info *csrow, char *data,
@@ -205,7 +209,10 @@ static ssize_t channel_dimm_label_store(struct csrow_info *csrow,
 static ssize_t channel_ce_count_show(struct csrow_info *csrow,
 				char *data, int channel)
 {
-	return sprintf(data, "%u\n", csrow->channels[channel].ce_count);
+	struct mem_ctl_info *mci = csrow->mci;
+	int index = csrow->csrow_idx * mci->num_cschannel + channel;
+
+	return sprintf(data, "%u\n", mci->err.ce_cschannel[index]);
 }
 
 /* csrow specific attribute structure */
@@ -479,14 +486,14 @@ static ssize_t dimmdev_location_show(struct dimm_info *dimm, char *data)
 	if (dimm->mc_channel >= 0)
 		p += sprintf(p, "channel %d ", dimm->mc_channel);
 
+	if (dimm->mc_dimm_number >= 0)
+		p += sprintf(p, "dimm %d ", dimm->mc_dimm_number);
+
 	if (dimm->csrow >= 0)
 		p += sprintf(p, "csrow %d ", dimm->csrow);
 
-	if (dimm->csrow_channel >= 0)
-		p += sprintf(p, "cs_channel %d ", dimm->csrow_channel);
-
-	if (dimm->mc_dimm_number >= 0)
-		p += sprintf(p, "dimm %d ", dimm->mc_dimm_number);
+	if (dimm->cschannel >= 0)
+		p += sprintf(p, "cs_channel %d ", dimm->cschannel);
 
 	return p - data;
 }
@@ -614,22 +621,27 @@ err_out:
 static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
 					const char *data, size_t count)
 {
-	int row, chan;
-
+	int num;
+	mci->err.ue_mc = 0;
+	mci->err.ce_mc = 0;
 	mci->ue_noinfo_count = 0;
 	mci->ce_noinfo_count = 0;
-	mci->ue_count = 0;
-	mci->ce_count = 0;
 
-	for (row = 0; row < mci->nr_csrows; row++) {
-		struct csrow_info *ri = &mci->csrows[row];
-
-		ri->ue_count = 0;
-		ri->ce_count = 0;
-
-		for (chan = 0; chan < ri->nr_channels; chan++)
-			ri->channels[chan].ce_count = 0;
-	}
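+	num = mci->num_branch;
+	memset(mci->err.ue_branch, 0, num * sizeof(*mci->err.ue_branch));
+	memset(mci->err.ce_branch, 0, num * sizeof(*mci->err.ce_branch));
+	num *= mci->num_channel;
+	memset(mci->err.ue_channel, 0, num * sizeof(*mci->err.ue_channel));
+	memset(mci->err.ce_channel, 0, num * sizeof(*mci->err.ce_channel));
+	num *= mci->num_dimm;
+	memset(mci->err.ue_dimm, 0, num * sizeof(*mci->err.ue_dimm));
+	memset(mci->err.ce_dimm, 0, num * sizeof(*mci->err.ce_dimm));
+	num = mci->num_csrows;
+	memset(mci->err.ue_csrow, 0, num * sizeof(*mci->err.ue_csrow));
+	memset(mci->err.ce_csrow, 0, num * sizeof(*mci->err.ce_csrow));
+	num *= mci->num_cschannel;
+	memset(mci->err.ue_cschannel, 0, num * sizeof(*mci->err.ue_cschannel));
+	memset(mci->err.ce_cschannel, 0, num * sizeof(*mci->err.ce_cschannel));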
 
 	mci->start_time = jiffies;
 	return count;
@@ -688,12 +700,12 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
 /* default attribute files for the MCI object */
 static ssize_t mci_ue_count_show(struct mem_ctl_info *mci, char *data)
 {
-	return sprintf(data, "%d\n", mci->ue_count);
+	return sprintf(data, "%d\n", mci->err.ue_mc);
 }
 
 static ssize_t mci_ce_count_show(struct mem_ctl_info *mci, char *data)
 {
-	return sprintf(data, "%d\n", mci->ce_count);
+	return sprintf(data, "%d\n", mci->err.ce_mc);
 }
 
 static ssize_t mci_ce_noinfo_show(struct mem_ctl_info *mci, char *data)
@@ -720,7 +732,7 @@ static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data)
 {
 	int total_pages, csrow_idx, j;
 
-	for (total_pages = csrow_idx = 0; csrow_idx < mci->nr_csrows;
+	for (total_pages = csrow_idx = 0; csrow_idx < mci->num_csrows;
 	     csrow_idx++) {
 		struct csrow_info *csrow = &mci->csrows[csrow_idx];
 
@@ -1133,7 +1145,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 
 	/* Make directories for each CSROW object under the mc<id> kobject
 	 */
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->num_csrows; i++) {
 		int n = 0;
 
 		csrow = &mci->csrows[i];
@@ -1155,11 +1167,17 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 	/*
 	 * Make directories for each DIMM object under the mc<id> kobject
 	 */
-	for (j = 0; j < mci->nr_dimms; j++) {
-		/* Only expose populated CSROWs */
-		if (mci->dimms[j].nr_pages == 0)
+	for (j = 0; j < mci->tot_dimms; j++) {
+		struct dimm_info *dimm = &mci->dimms[j];
+		/* Only expose populated DIMMs */
+		if (dimm->nr_pages == 0)
 			continue;
-		err = edac_create_dimm_object(mci, &mci->dimms[j] , j);
+
+		debugf1("%s creating dimm%d, located at %d.%d.%d.%d.%d\n",
+			__func__, j, dimm->mc_branch, dimm->mc_channel,
+			dimm->mc_dimm_number, dimm->csrow, dimm->cschannel);
+
+		err = edac_create_dimm_object(mci, dimm, j);
 		if (err) {
 			debugf1("%s() failure: create dimm %d obj\n",
 				__func__, j);
@@ -1213,11 +1231,11 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 
 	/* remove all csrow kobjects */
 	debugf4("%s()  unregister this mci kobj\n", __func__);
-	for (i = 0; i < mci->nr_dimms; i++) {
+	for (i = 0; i < mci->tot_dimms; i++) {
 		debugf0("%s()  unreg dimm-%d\n", __func__, i);
 		kobject_put(&mci->dimms[i].kobj);
 	}
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->num_csrows; i++) {
 		int n = 0;
 
 		csrow = &mci->csrows[i];
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index 17aabb7..4206401 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -52,7 +52,7 @@ extern void edac_device_reset_delay_period(struct edac_device_ctl_info
 					   *edac_dev, unsigned long value);
 extern void edac_mc_reset_delay_period(int value);
 
-extern void *edac_align_ptr(void *ptr, unsigned size);
+extern void *edac_align_ptr(void **p, unsigned size, int quant);
 
 /*
  * EDAC PCI functions
diff --git a/drivers/edac/edac_pci.c b/drivers/edac/edac_pci.c
index 2b378207..f4baa73 100644
--- a/drivers/edac/edac_pci.c
+++ b/drivers/edac/edac_pci.c
@@ -43,13 +43,14 @@ struct edac_pci_ctl_info *edac_pci_alloc_ctl_info(unsigned int sz_pvt,
 						const char *edac_pci_name)
 {
 	struct edac_pci_ctl_info *pci;
-	void *pvt;
+	void *p, *pvt;
 	unsigned int size;
 
 	debugf1("%s()\n", __func__);
 
-	pci = (struct edac_pci_ctl_info *)0;
-	pvt = edac_align_ptr(&pci[1], sz_pvt);
+	p = NULL;
+	pci = edac_align_ptr(&p, sizeof(*pci), 1);
+	pvt = edac_align_ptr(&p, 1, sz_pvt);
 	size = ((unsigned long)pvt) + sz_pvt;
 
 	/* Alloc the needed control struct memory */
diff --git a/drivers/edac/i3000_edac.c b/drivers/edac/i3000_edac.c
index bf8a230..77c06af 100644
--- a/drivers/edac/i3000_edac.c
+++ b/drivers/edac/i3000_edac.c
@@ -245,7 +245,10 @@ static int i3000_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & I3000_ERRSTS_BITS) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
@@ -256,10 +259,18 @@ static int i3000_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, pfn);
 
 	if (info->errsts & I3000_ERRSTS_UE)
-		edac_mc_handle_ue(mci, pfn, offset, row, "i3000 UE");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW, mci,
+				     pfn, offset, 0,
+				     -1, -1, -1, row, -1,
+				     "i3000 UE", "");
 	else
-		edac_mc_handle_ce(mci, pfn, offset, info->derrsyn, row,
-				multi_chan ? channel : 0, "i3000 CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     pfn, offset, info->derrsyn,
+				     -1, -1, -1, row,
+				     multi_chan ? channel : 0,
+				     "i3000 CE", "");
 
 	return 1;
 }
@@ -347,7 +358,11 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 	 */
 	interleaved = i3000_is_interleaved(c0dra, c1dra, c0drb, c1drb);
 	nr_channels = interleaved ? 2 : 1;
-	mci = edac_mc_alloc(0, I3000_RANKS / nr_channels, nr_channels, 0);
+
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, I3000_RANKS,
+			    I3000_RANKS / nr_channels, nr_channels,
+			    0);
 	if (!mci)
 		return -ENOMEM;
 
@@ -375,7 +390,7 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 	 * If we're in interleaved mode then we're only walking through
 	 * the ranks of controller 0, so we double all the values we see.
 	 */
-	for (last_cumul_size = i = 0; i < mci->nr_csrows; i++) {
+	for (last_cumul_size = i = 0; i < mci->num_csrows; i++) {
 		u8 value;
 		u32 cumul_size;
 		struct csrow_info *csrow = &mci->csrows[i];
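i3000 maps the error PFN to a csrow with edac_mc_find_csrow_by_page() before reporting. Conceptually this is the same range test that mpc85xx_mc_check() open-codes (first_page <= pfn <= last_page). The following is a simplified, hypothetical version of that lookup. The real EDAC helper may apply extra checks (for example a page mask), and struct sketch_csrow only carries the fields this sketch needs.

```c
#include <assert.h>

/* Hypothetical, minimal stand-in for struct csrow_info. */
struct sketch_csrow {
	unsigned long first_page;
	unsigned long last_page;
	unsigned long nr_pages;	/* 0 means the row is empty */
};

/* Return the index of the row whose page range contains page, or -1. */
static int sketch_find_csrow_by_page(const struct sketch_csrow *rows,
				     int num_csrows, unsigned long page)
{
	int i;

	for (i = 0; i < num_csrows; i++) {
		const struct sketch_csrow *row = &rows[i];

		if (!row->nr_pages)
			continue;	/* skip unpopulated rows */
		if (page >= row->first_page && page <= row->last_page)
			return i;
	}
	return -1;
}
```

The sketch returns -1 when no row matches. A caller that indexes mci->csrows[row] directly, as the new i82860 code does, would need to reject a negative row first.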
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index b3dc867..6f04a50 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -21,6 +21,7 @@
 
 #define PCI_DEVICE_ID_INTEL_3200_HB    0x29f0
 
+#define I3200_DIMMS		4
 #define I3200_RANKS		8
 #define I3200_RANKS_PER_CHANNEL	4
 #define I3200_CHANNELS		2
@@ -228,21 +229,29 @@ static void i3200_process_error_info(struct mem_ctl_info *mci,
 		return;
 
 	if ((info->errsts ^ info->errsts2) & I3200_ERRSTS_BITS) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
 	for (channel = 0; channel < nr_channels; channel++) {
 		log = info->eccerrlog[channel];
 		if (log & I3200_ECCERRLOG_UE) {
-			edac_mc_handle_ue(mci, 0, 0,
-				eccerrlog_row(channel, log),
-				"i3200 UE");
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     0, 0, 0,
+					     -1, -1, -1,
+					     eccerrlog_row(channel, log), -1,
+					     "i3200 UE", "");
 		} else if (log & I3200_ECCERRLOG_CE) {
-			edac_mc_handle_ce(mci, 0, 0,
-				eccerrlog_syndrome(log),
-				eccerrlog_row(channel, log), 0,
-				"i3200 CE");
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     0, 0, eccerrlog_syndrome(log),
+					     -1, -1, -1,
+					     eccerrlog_row(channel, log), -1,
+					     "i3200 CE", "");
 		}
 	}
 }
@@ -346,8 +355,10 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	i3200_get_drbs(window, drbs);
 	nr_channels = how_many_channels(pdev);
 
-	mci = edac_mc_alloc(sizeof(struct i3200_priv), I3200_RANKS,
-		nr_channels, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, I3200_DIMMS,
+			    I3200_RANKS, nr_channels,
+			    sizeof(struct i3200_priv));
 	if (!mci)
 		return -ENOMEM;
 
@@ -376,7 +387,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	 * cumulative; the last one will contain the total memory
 	 * contained in all ranks.
 	 */
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->num_csrows; i++) {
 		unsigned long nr_pages;
 		struct csrow_info *csrow = &mci->csrows[i];
 
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index e8d32e8..5fec235 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -533,13 +533,15 @@ static void i5000_process_fatal_error_info(struct mem_ctl_info *mci,
 
 	/* Form out message */
 	snprintf(msg, sizeof(msg),
-		 "(Branch=%d DRAM-Bank=%d RDWR=%s RAS=%d CAS=%d "
-		 "FATAL Err=0x%x (%s))",
-		 branch >> 1, bank, rdwr ? "Write" : "Read", ras, cas,
-		 allErrors, specific);
+		 "Bank=%d RAS=%d CAS=%d FATAL Err=0x%x (%s)",
+		 bank, ras, cas, allErrors, specific);
 
 	/* Call the helper to output message */
-	edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
+	edac_mc_handle_error(HW_EVENT_ERR_FATAL,
+			     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+			     branch >> 1, -1, rank, -1, -1,
+			     rdwr ? "Write error" : "Read error",
+			     msg);
 }
 
 /*
@@ -633,13 +635,15 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 
 		/* Form out message */
 		snprintf(msg, sizeof(msg),
-			 "(Branch=%d DRAM-Bank=%d RDWR=%s RAS=%d "
-			 "CAS=%d, UE Err=0x%x (%s))",
-			 branch >> 1, bank, rdwr ? "Write" : "Read", ras, cas,
-			 ue_errors, specific);
+			 "Rank=%d Bank=%d RAS=%d CAS=%d, UE Err=0x%x (%s)",
+			 rank, bank, ras, cas, ue_errors, specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+				channel >> 1, -1, rank, -1, -1,
+				rdwr ? "Write error" : "Read error",
+				msg);
 	}
 
 	/* Check correctable errors */
@@ -685,13 +689,17 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 
 		/* Form out message */
 		snprintf(msg, sizeof(msg),
-			 "(Branch=%d DRAM-Bank=%d RDWR=%s RAS=%d "
+			 "(Branch=%d Bank=%d RDWR=%s RAS=%d "
 			 "CAS=%d, CE Err=0x%x (%s))", branch >> 1, bank,
 			 rdwr ? "Write" : "Read", ras, cas, ce_errors,
 			 specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_fbd_ce(mci, rank, channel, msg);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				HW_EVENT_SCOPE_MC_CHANNEL, mci, 0, 0, 0,
+				channel >> 1, channel % 2, rank, -1, -1,
+				rdwr ? "Write error" : "Read error",
+				msg);
 	}
 
 	if (!misc_messages)
@@ -731,11 +739,13 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 
 		/* Form out message */
 		snprintf(msg, sizeof(msg),
-			 "(Branch=%d Err=%#x (%s))", branch >> 1,
-			 misc_errors, specific);
+			 "Err=%#x (%s)", misc_errors, specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_fbd_ce(mci, 0, 0, msg);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+				branch >> 1, -1, -1, -1, -1,
+				"Misc error", msg);
 	}
 }
 
@@ -1251,6 +1261,10 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 
 	empty = 1;		/* Assume NO memory */
 
+	/*
+	 * TODO: it would be better not to use csrow here, and to fill the
+	 * dimm_info structs directly, based on branch, channel and DIMM number
+	 */
 	for (csrow = 0; csrow < max_csrows; csrow++) {
 		p_csrow = &mci->csrows[csrow];
 
@@ -1378,7 +1392,9 @@ static int i5000_probe1(struct pci_dev *pdev, int dev_idx)
 		__func__, num_channels, num_dimms_per_channel, num_csrows);
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(sizeof(*pvt), num_csrows, num_channels, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    2, num_channels, num_dimms_per_channel,
+			    num_csrows, num_channels, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index f9baee3..24b03b8 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -410,14 +410,6 @@ static int i5100_csrow_to_chan(const struct mem_ctl_info *mci, int csrow)
 	return csrow / priv->ranksperchan;
 }
 
-static unsigned i5100_rank_to_csrow(const struct mem_ctl_info *mci,
-				    int chan, int rank)
-{
-	const struct i5100_priv *priv = mci->pvt_info;
-
-	return chan * priv->ranksperchan + rank;
-}
-
 static void i5100_handle_ce(struct mem_ctl_info *mci,
 			    int chan,
 			    unsigned bank,
@@ -427,21 +419,18 @@ static void i5100_handle_ce(struct mem_ctl_info *mci,
 			    unsigned ras,
 			    const char *msg)
 {
-	const int csrow = i5100_rank_to_csrow(mci, chan, rank);
-	char *label = NULL;
-
-	if (mci->csrows[csrow].channels[0].dimm)
-		label = mci->csrows[csrow].channels[0].dimm->label;
-
-	printk(KERN_ERR
-		"CE chan %d, bank %u, rank %u, syndrome 0x%lx, "
-		"cas %u, ras %u, csrow %u, label \"%s\": %s\n",
-		chan, bank, rank, syndrome, cas, ras,
-		csrow, label, msg);
-
-	mci->ce_count++;
-	mci->csrows[csrow].ce_count++;
-	mci->csrows[csrow].channels[0].ce_count++;
+	char detail[80];
+
+	/* Form out message */
+	snprintf(detail, sizeof(detail),
+		 "bank %u, cas %u, ras %u\n",
+		 bank, cas, ras);
+
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+			     HW_EVENT_SCOPE_MC_DIMM, mci,
+			     0, 0, syndrome,
+			     0, chan, rank, -1, -1,
+			     msg, detail);
 }
 
 static void i5100_handle_ue(struct mem_ctl_info *mci,
@@ -453,20 +442,18 @@ static void i5100_handle_ue(struct mem_ctl_info *mci,
 			    unsigned ras,
 			    const char *msg)
 {
-	const int csrow = i5100_rank_to_csrow(mci, chan, rank);
-	char *label = NULL;
-
-	if (mci->csrows[csrow].channels[0].dimm)
-		label = mci->csrows[csrow].channels[0].dimm->label;
-
-	printk(KERN_ERR
-		"UE chan %d, bank %u, rank %u, syndrome 0x%lx, "
-		"cas %u, ras %u, csrow %u, label \"%s\": %s\n",
-		chan, bank, rank, syndrome, cas, ras,
-		csrow, label, msg);
-
-	mci->ue_count++;
-	mci->csrows[csrow].ue_count++;
+	char detail[80];
+
+	/* Form out message */
+	snprintf(detail, sizeof(detail),
+		 "bank %u, cas %u, ras %u\n",
+		 bank, cas, ras);
+
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+			     HW_EVENT_SCOPE_MC_DIMM, mci,
+			     0, 0, syndrome,
+			     0, chan, rank, -1, -1,
+			     msg, detail);
 }
 
 static void i5100_read_log(struct mem_ctl_info *mci, int chan,
@@ -849,7 +836,7 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 	unsigned long total_pages = 0UL;
 	struct i5100_priv *priv = mci->pvt_info;
 
-	for (i = 0; i < mci->nr_dimms; i++) {
+	for (i = 0; i < mci->tot_dimms; i++) {
 		const unsigned long npages = i5100_npages(mci, i);
 		const unsigned chan = i5100_csrow_to_chan(mci, i);
 		const unsigned rank = i5100_csrow_to_rank(mci, i);
@@ -857,12 +844,6 @@ static void __devinit i5100_init_csrows(struct mem_ctl_info *mci)
 
 		dimm->nr_pages = npages;
 
-		dimm->mc_branch = -1;
-		dimm->mc_channel = chan;
-		dimm->mc_dimm_number = rank;
-		dimm->csrow = -1;
-		dimm->csrow_channel = -1;
-
 		if (npages) {
 			total_pages += npages;
 
@@ -943,7 +924,9 @@ static int __devinit i5100_init_one(struct pci_dev *pdev,
 		goto bail_ch1;
 	}
 
-	mci = edac_mc_alloc(sizeof(*priv), ranksperch * 2, 1, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    1, 2, ranksperch,
+			    ranksperch * 2, 1, sizeof(*priv));
 	if (!mci) {
 		ret = -ENOMEM;
 		goto bail_disable_ch1;
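With HW_EVENT_SCOPE_MC_DIMM, the i5100 handlers now pass (chan, rank) straight to edac_mc_handle_error() instead of translating them into a csrow first. For reference, the translation being dropped is plain arithmetic over ranksperchan csrows per channel (the controller is still allocated with ranksperch * 2 csrows). The sketch mirrors i5100_rank_to_csrow() and i5100_csrow_to_chan() from this patch. Treating sketch_csrow_to_rank() as the matching remainder is an assumption, since the body of i5100_csrow_to_rank() is not shown.

```c
#include <assert.h>

/* Same arithmetic as the removed i5100_rank_to_csrow(). */
static unsigned sketch_rank_to_csrow(unsigned ranksperchan, unsigned chan,
				     unsigned rank)
{
	return chan * ranksperchan + rank;
}

/* Same arithmetic as i5100_csrow_to_chan(). */
static unsigned sketch_csrow_to_chan(unsigned ranksperchan, unsigned csrow)
{
	return csrow / ranksperchan;
}

/* Assumed inverse for the rank: the remainder of the same division. */
static unsigned sketch_csrow_to_rank(unsigned ranksperchan, unsigned csrow)
{
	return csrow % ranksperchan;
}
```

With four ranks per channel, channel 1 rank 2 maps to csrow 6, and the two helpers recover (1, 2) from it.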
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 6b07450..c7455da 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -532,13 +532,15 @@ static void i5400_proccess_non_recoverable_info(struct mem_ctl_info *mci,
 	int ras, cas;
 	int errnum;
 	char *type = NULL;
+	enum hw_event_mc_err_type tp_event = HW_EVENT_ERR_UNCORRECTED;
 
 	if (!allErrors)
 		return;		/* if no error, return now */
 
-	if (allErrors &  ERROR_FAT_MASK)
+	if (allErrors &  ERROR_FAT_MASK) {
 		type = "FATAL";
-	else if (allErrors & FERR_NF_UNCORRECTABLE)
+		tp_event = HW_EVENT_ERR_FATAL;
+	} else if (allErrors & FERR_NF_UNCORRECTABLE)
 		type = "NON-FATAL uncorrected";
 	else
 		type = "NON-FATAL recoverable";
@@ -566,13 +568,14 @@ static void i5400_proccess_non_recoverable_info(struct mem_ctl_info *mci,
 
 	/* Form out message */
 	snprintf(msg, sizeof(msg),
-		 "%s (Branch=%d DRAM-Bank=%d Buffer ID = %d RDWR=%s "
-		 "RAS=%d CAS=%d %s Err=0x%lx (%s))",
-		 type, branch >> 1, bank, buf_id, rdwr_str(rdwr), ras, cas,
-		 type, allErrors, error_name[errnum]);
-
-	/* Call the helper to output message */
-	edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
+		 "Bank=%d Buffer ID = %d RAS=%d CAS=%d Err=0x%lx (%s)",
+		 bank, buf_id, ras, cas, allErrors, error_name[errnum]);
+
+	edac_mc_handle_error(tp_event,
+			     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+			     branch >> 1, -1, rank, -1, -1,
+			     rdwr ? "Write error" : "Read error",
+			     msg);
 }
 
 /*
@@ -642,8 +645,11 @@ static void i5400_process_nonfatal_error_info(struct mem_ctl_info *mci,
 			 branch >> 1, bank, rdwr_str(rdwr), ras, cas,
 			 allErrors, error_name[errnum]);
 
-		/* Call the helper to output message */
-		edac_mc_handle_fbd_ce(mci, rank, channel, msg);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+				     branch >> 1, channel % 2, rank, -1, -1,
+				     rdwr ? "Write error" : "Read error",
+				     msg);
 
 		return;
 	}
@@ -1144,16 +1150,10 @@ static int i5400_init_csrows(struct mem_ctl_info *mci)
 
 	empty = 1;		/* Assume NO memory */
 
-	for (slot = 0; slot < mci->nr_dimms; slot++) {
+	for (slot = 0; slot < mci->tot_dimms; slot++) {
 		struct dimm_info *dimm = &mci->dimms[slot];
 		channel = slot % pvt->maxch;
 
-		dimm->mc_branch = channel / 2;
-		dimm->mc_channel = channel % 2;
-		dimm->mc_dimm_number = slot / pvt->maxch;
-		dimm->csrow = -1;
-		dimm->csrow_channel = -1;
-
 		/* use branch 0 for the basis */
 		mtr = determine_mtr(pvt, slot, 0);
 
@@ -1239,7 +1239,9 @@ static int i5400_probe1(struct pci_dev *pdev, int dev_idx)
 		__func__, num_channels, num_dimms_per_channel, num_csrows);
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(sizeof(*pvt), num_csrows, num_channels, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    2, num_channels, num_dimms_per_channel,
+			    num_csrows, num_channels, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
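The dimm->mc_* assignments removed from i5400_init_csrows() encode a fixed slot layout: slots are interleaved across maxch channels, and every two channels form a branch. The same two-channels-per-branch split appears in the i5000 correctable-error call, which passes channel >> 1 and channel % 2. A sketch of the decomposition, using the arithmetic from the removed lines. struct sketch_dimm_loc and sketch_slot_to_loc() are hypothetical names, and the sketch assumes two channels per branch, as the removed code did.

```c
#include <assert.h>

/* Hypothetical location triple for one DIMM slot. */
struct sketch_dimm_loc {
	int branch;
	int channel;	/* channel within the branch */
	int dimm;	/* DIMM number on that channel */
};

/* Same arithmetic as the dimm->mc_* assignments dropped from i5400. */
static struct sketch_dimm_loc sketch_slot_to_loc(int slot, int maxch)
{
	int channel = slot % maxch;	/* global channel index */
	struct sketch_dimm_loc loc = {
		.branch = channel / 2,
		.channel = channel % 2,
		.dimm = slot / maxch,
	};

	return loc;
}
```

With four channels, slot 6 sits on global channel 2, which is channel 0 of branch 1, as the second DIMM on that channel.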
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 0838ec2..33f9ac2 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -464,17 +464,15 @@ static void i7300_process_fbd_error(struct mem_ctl_info *mci)
 				FERR_FAT_FBD, error_reg);
 
 		snprintf(pvt->tmp_prt_buffer, PAGE_SIZE,
-			"FATAL (Branch=%d DRAM-Bank=%d %s "
-			"RAS=%d CAS=%d Err=0x%lx (%s))",
-			branch, bank,
-			is_wr ? "RDWR" : "RD",
-			ras, cas,
-			errors, specific);
-
-		/* Call the helper to output message */
-		edac_mc_handle_fbd_ue(mci, rank, branch << 1,
-				      (branch << 1) + 1,
-				      pvt->tmp_prt_buffer);
+			 "Bank=%d RAS=%d CAS=%d Err=0x%lx (%s)",
+			 bank, ras, cas, errors, specific);
+
+		edac_mc_handle_error(HW_EVENT_ERR_FATAL,
+				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
+				     branch, -1, rank, -1, -1,
+				     is_wr ? "Write error" : "Read error",
+				     pvt->tmp_prt_buffer);
+
 	}
 
 	/* read in the 1st NON-FATAL error register */
@@ -513,23 +511,15 @@ static void i7300_process_fbd_error(struct mem_ctl_info *mci)
 
 		/* Form out message */
 		snprintf(pvt->tmp_prt_buffer, PAGE_SIZE,
-			"Corrected error (Branch=%d, Channel %d), "
-			" DRAM-Bank=%d %s "
-			"RAS=%d CAS=%d, CE Err=0x%lx, Syndrome=0x%08x(%s))",
-			branch, channel,
-			bank,
-			is_wr ? "RDWR" : "RD",
-			ras, cas,
-			errors, syndrome, specific);
-
-		/*
-		 * Call the helper to output message
-		 * NOTE: Errors are reported per-branch, and not per-channel
-		 *	 Currently, we don't know how to identify the right
-		 *	 channel.
-		 */
-		edac_mc_handle_fbd_ce(mci, rank, channel,
-				      pvt->tmp_prt_buffer);
+			 "DRAM-Bank=%d RAS=%d CAS=%d, Err=0x%lx (%s)",
+			 bank, ras, cas, errors, specific);
+
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0,
+				     syndrome,
+				     branch >> 1, channel % 2, rank, -1, -1,
+				     is_wr ? "Write error" : "Read error",
+				     pvt->tmp_prt_buffer);
 	}
 	return;
 }
@@ -799,7 +789,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 
 	/* Get the set of MTR[0-7] regs by each branch */
 	dimm = mci->dimms;
-	mci->nr_dimms = 0;
+	mci->tot_dimms = 0;
 	for (slot = 0; slot < MAX_SLOTS; slot++) {
 		int where = mtr_regs[slot];
 		for (branch = 0; branch < MAX_BRANCHES; branch++) {
@@ -811,16 +801,10 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 
 				dinfo = &pvt->dimm_info[slot][channel];
 
-				dimm->mc_branch = branch;
-				dimm->mc_channel = ch;
-				dimm->mc_dimm_number = slot;
-				dimm->csrow = -1;
-				dimm->csrow_channel = -1;
-
 				mtr = decode_mtr(pvt, slot, ch, branch,
 						 dinfo, dimm);
 
-				mci->nr_dimms++;
+				mci->tot_dimms++;
 				dimm++;
 
 				/* if no DIMMS on this row, continue */
@@ -1078,7 +1062,10 @@ static int __devinit i7300_init_one(struct pci_dev *pdev,
 		__func__, num_channels, num_dimms_per_channel, num_csrows);
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(sizeof(*pvt), num_csrows, num_channels, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    MAX_BRANCHES, num_channels / MAX_BRANCHES,
+			    num_dimms_per_channel,
+			    num_csrows, num_channels, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index c6c649d..f63c0f4 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -598,7 +598,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 	struct csrow_info *csr;
 	struct pci_dev *pdev;
 	int i, j;
-	int csrow = 0;
+	int csrow = 0, cschannel = 0;
 	enum edac_type mode;
 	enum mem_type mtype;
 
@@ -693,12 +693,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			u32 banks, ranks, rows, cols;
 			u32 size, npages;
 
-			dimm->mc_branch = -1;
-			dimm->mc_channel = i;
-			dimm->mc_dimm_number = j;
-			dimm->csrow = -1;
-			dimm->csrow_channel = -1;
-
 			if (!DIMM_PRESENT(dimm_dod[j]))
 				continue;
 
@@ -710,8 +704,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			/* DDR3 has 8 I/O banks */
 			size = (rows * cols * banks * ranks) >> (20 - 3);
 
-			pvt->channel[i].dimms++;
-
 			debugf0("\tdimm %d %d Mb offset: %x, "
 				"bank: %d, rank: %d, row: %#x, col: %#x\n",
 				j, size,
@@ -720,11 +712,16 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 			npages = MiB_TO_PAGES(size);
 
-			csr = &mci->csrows[csrow];
-			csr->channels[0].dimm = dimm;
-
 			pvt->csrow_map[i][j] = csrow;
 
+			csr = &mci->csrows[csrow];
+			csr->channels[cschannel].dimm = dimm;
+			cschannel++;
+			if (cschannel >= MAX_DIMMS) {
+				cschannel = 0;
+				csrow++;
+			}
+
 			dimm->nr_pages = npages;
 
 			switch (banks) {
@@ -766,6 +763,17 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				(value[j] & ((1 << 24) - 1)));
 	}
 
+	/* Clear the csrow channel slots left unused above */
+	while (csrow < NUM_CHANS && cschannel < MAX_DIMMS) {
+		csr = &mci->csrows[csrow];
+		csr->channels[cschannel].dimm = NULL;
+		cschannel++;
+		if (cschannel >= MAX_DIMMS) {
+			cschannel = 0;
+			csrow++;
+		}
+	}
+
 	return 0;
 }
 
@@ -1568,17 +1576,14 @@ static void i7core_rdimm_update_csrow(struct mem_ctl_info *mci,
 				      const int dimm,
 				      const int add)
 {
-	char *msg;
-	struct i7core_pvt *pvt = mci->pvt_info;
-	int row = pvt->csrow_map[chan][dimm], i;
+	int i;
 
 	for (i = 0; i < add; i++) {
-		msg = kasprintf(GFP_KERNEL, "Corrected error "
-				"(Socket=%d channel=%d dimm=%d)",
-				pvt->i7core_dev->socket, chan, dimm);
-
-		edac_mc_handle_fbd_ce(mci, row, 0, msg);
-		kfree (msg);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_DIMM, mci,
+				     0, 0, 0,
+				     0, chan, dimm, -1, -1,
+				     "error", "");
 	}
 }
 
@@ -1744,7 +1749,10 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	char *type, *optype, *err, *msg;
+	enum hw_event_mc_err_type tp_event;
 	unsigned long error = m->status & 0x1ff0000l;
+	bool uncorrected_error = m->status & 1ll << 61;
+	bool ripv = m->mcgstatus & 1;
 	u32 optypenum = (m->status >> 4) & 0x07;
 	u32 core_err_cnt = (m->status >> 38) & 0x7fff;
 	u32 dimm = (m->misc >> 16) & 0x3;
@@ -1753,10 +1761,18 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 	u32 errnum = find_first_bit(&error, 32);
 	int csrow;
 
-	if (m->mcgstatus & 1)
-		type = "FATAL";
-	else
-		type = "NON_FATAL";
+	if (uncorrected_error) {
+		if (ripv) {
+			type = "FATAL";
+			tp_event = HW_EVENT_ERR_FATAL;
+		} else {
+			type = "NON_FATAL";
+			tp_event = HW_EVENT_ERR_UNCORRECTED;
+		}
+	} else {
+		type = "CORRECTED";
+		tp_event = HW_EVENT_ERR_CORRECTED;
+	}
 
 	switch (optypenum) {
 	case 0:
@@ -1811,25 +1827,26 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 		err = "unknown";
 	}
 
-	/* FIXME: should convert addr into bank and rank information */
 	msg = kasprintf(GFP_ATOMIC,
-		"%s (addr = 0x%08llx, cpu=%d, Dimm=%d, Channel=%d, "
-		"syndrome=0x%08x, count=%d, Err=%08llx:%08llx (%s: %s))\n",
-		type, (long long) m->addr, m->cpu, dimm, channel,
-		syndrome, core_err_cnt, (long long)m->status,
-		(long long)m->misc, optype, err);
-
-	debugf0("%s", msg);
+		"addr=0x%08llx cpu=%d count=%d Err=%08llx:%08llx (%s: %s)\n",
+		(long long) m->addr, m->cpu, core_err_cnt,
+		(long long)m->status, (long long)m->misc, optype, err);
 
 	csrow = pvt->csrow_map[channel][dimm];
 
-	/* Call the helper to output message */
-	if (m->mcgstatus & 1)
-		edac_mc_handle_fbd_ue(mci, csrow, 0,
-				0 /* FIXME: should be channel here */, msg);
-	else if (!pvt->is_registered)
-		edac_mc_handle_fbd_ce(mci, csrow,
-				0 /* FIXME: should be channel here */, msg);
+	/*
+	 * Call the helper to output message
+	 * FIXME: what to do if core_err_cnt > 1? Currently, it generates
+	 * only one event
+	 */
+	if (uncorrected_error || !pvt->is_registered)
+		edac_mc_handle_error(tp_event,
+				     HW_EVENT_SCOPE_MC_DIMM, mci,
+				     m->addr >> PAGE_SHIFT,
+				     m->addr & ~PAGE_MASK,
+				     syndrome,
+				     0, channel, dimm, -1, -1,
+				     err, msg);
 
 	kfree(msg);
 }
@@ -2256,7 +2273,10 @@ static int i7core_register_mci(struct i7core_dev *i7core_dev)
 		return rc;
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(sizeof(*pvt), csrows, channels, i7core_dev->socket);
+
+	mci = edac_mc_alloc(i7core_dev->socket, EDAC_ALLOC_FILL_PRIV,
+			    1, NUM_CHANS, MAX_DIMMS,
+			    MAX_DIMMS, NUM_CHANS, sizeof(*pvt));
 	if (unlikely(!mci))
 		return -ENOMEM;
 
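i7core_mce_output_error() now derives both the event type and the page/offset pair from the raw MCE record. This sketch decodes the same fields: the operation type (bits 6:4 of status), the corrected-error count (bits 52:38), the DIMM (bits 17:16 of misc) and the page/offset split of addr. It takes UC from bit 61 of IA32_MCi_STATUS, where the Intel SDM defines it, and RIPV from bit 0 of IA32_MCG_STATUS. It keeps the patch's mapping of UC with RIPV set to a fatal event. struct sketch_mce is a hypothetical subset of struct mce, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Hypothetical subset of struct mce with the fields this hunk reads. */
struct sketch_mce {
	uint64_t status;	/* IA32_MCi_STATUS */
	uint64_t mcgstatus;	/* IA32_MCG_STATUS */
	uint64_t misc;		/* IA32_MCi_MISC */
	uint64_t addr;		/* IA32_MCi_ADDR */
};

enum sketch_err_type { SKETCH_CORRECTED, SKETCH_UNCORRECTED, SKETCH_FATAL };

struct sketch_decoded {
	enum sketch_err_type type;
	uint32_t optype;	/* bits 6:4 of status */
	uint32_t core_err_cnt;	/* bits 52:38 of status */
	uint32_t dimm;		/* bits 17:16 of misc */
	uint64_t page;		/* addr >> PAGE_SHIFT */
	uint64_t offset;	/* addr within the page */
};

static struct sketch_decoded sketch_decode(const struct sketch_mce *m)
{
	bool uncorrected = (m->status >> 61) & 1;	/* MCi_STATUS.UC */
	bool ripv = m->mcgstatus & 1;			/* MCG_STATUS.RIPV */
	struct sketch_decoded d;

	/* Same classification as the hunk: UC with RIPV set is "fatal". */
	if (uncorrected)
		d.type = ripv ? SKETCH_FATAL : SKETCH_UNCORRECTED;
	else
		d.type = SKETCH_CORRECTED;

	d.optype = (m->status >> 4) & 0x07;
	d.core_err_cnt = (m->status >> 38) & 0x7fff;
	d.dimm = (m->misc >> 16) & 0x3;
	d.page = m->addr >> SKETCH_PAGE_SHIFT;
	d.offset = m->addr & ((1ULL << SKETCH_PAGE_SHIFT) - 1);
	return d;
}
```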
diff --git a/drivers/edac/i82443bxgx_edac.c b/drivers/edac/i82443bxgx_edac.c
index 74166ae..0992549 100644
--- a/drivers/edac/i82443bxgx_edac.c
+++ b/drivers/edac/i82443bxgx_edac.c
@@ -156,19 +156,23 @@ static int i82443bxgx_edacmc_process_error_info(struct mem_ctl_info *mci,
 	if (info->eap & I82443BXGX_EAP_OFFSET_SBE) {
 		error_found = 1;
 		if (handle_errors)
-			edac_mc_handle_ce(mci, page, pageoffset,
-				/* 440BX/GX don't make syndrome information
-				 * available */
-				0, edac_mc_find_csrow_by_page(mci, page), 0,
-				mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, page, pageoffset, 0,
+					     -1, -1, -1,
+					     edac_mc_find_csrow_by_page(mci, page),
+					     0, mci->ctl_name, "");
 	}
 
 	if (info->eap & I82443BXGX_EAP_OFFSET_MBE) {
 		error_found = 1;
 		if (handle_errors)
-			edac_mc_handle_ue(mci, page, pageoffset,
-					edac_mc_find_csrow_by_page(mci, page),
-					mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, page, pageoffset, 0,
+					     -1, -1, -1,
+					     edac_mc_find_csrow_by_page(mci, page),
+					     0, mci->ctl_name, "");
 	}
 
 	return error_found;
@@ -196,7 +200,7 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 
 	pci_read_config_byte(pdev, I82443BXGX_DRAMC, &dramc);
 	row_high_limit_last = 0;
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 		dimm = csrow->channels[0].dimm;
 
@@ -248,7 +252,9 @@ static int i82443bxgx_edacmc_probe1(struct pci_dev *pdev, int dev_idx)
 	if (pci_read_config_dword(pdev, I82443BXGX_NBXCFG, &nbxcfg))
 		return -EIO;
 
-	mci = edac_mc_alloc(0, I82443BXGX_NR_CSROWS, I82443BXGX_NR_CHANS, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, I82443BXGX_NR_CSROWS,
+			    I82443BXGX_NR_CSROWS, I82443BXGX_NR_CHANS, 0);
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index 48e0ecd..3ab8a7a 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -99,6 +99,7 @@ static int i82860_process_error_info(struct mem_ctl_info *mci,
 				struct i82860_error_info *info,
 				int handle_errors)
 {
+	struct dimm_info *dimm;
 	int row;
 
 	if (!(info->errsts2 & 0x0003))
@@ -108,18 +109,31 @@ static int i82860_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0003) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
 	info->eap >>= PAGE_SHIFT;
 	row = edac_mc_find_csrow_by_page(mci, info->eap);
+	dimm = mci->csrows[row].channels[0].dimm;
 
 	if (info->errsts & 0x0002)
-		edac_mc_handle_ue(mci, info->eap, 0, row, "i82860 UE");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_DIMM, mci,
+				     info->eap, 0, 0,
+				     dimm->mc_branch, dimm->mc_channel,
+				     dimm->mc_dimm_number, -1, -1,
+				     "i82860 UE", "");
 	else
-		edac_mc_handle_ce(mci, info->eap, 0, info->derrsyn, row, 0,
-				"i82860 UE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_DIMM, mci,
+				     info->eap, 0, info->derrsyn,
+				     dimm->mc_branch, dimm->mc_channel,
+				     dimm->mc_dimm_number, -1, -1,
+				     "i82860 CE", "");
 
 	return 1;
 }
@@ -152,7 +166,7 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 	 * cumulative; therefore GRA15 will contain the total memory contained
 	 * in all eight rows.
 	 */
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 		dimm = csrow->channels[0].dimm;
 
@@ -181,15 +195,21 @@ static int i82860_probe1(struct pci_dev *pdev, int dev_idx)
 	struct mem_ctl_info *mci;
 	struct i82860_error_info discard;
 
-	/* RDRAM has channels but these don't map onto the abstractions that
-	   edac uses.
-	   The device groups from the GRA registers seem to map reasonably
-	   well onto the notion of a chip select row.
-	   There are 16 GRA registers and since the name is associated with
-	   the channel and the GRA registers map to physical devices so we are
-	   going to make 1 channel for group.
+	/*
+	 * RDRAM has channels but these don't map onto the csrow abstraction.
+	 * According to the datasheet, there are 2 Rambus channels, supporting
+	 * up to 16 direct RDRAM devices.
+	 * The device groups from the GRA registers seem to map reasonably
+	 * well onto the notion of a chip select row.
+	 * There are 16 GRA registers, and since the name is associated with
+	 * the channel and the GRA registers map to physical devices, we
+	 * create one channel per group.
 	 */
-	mci = edac_mc_alloc(0, 16, 1, 0);
+
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    1, 2 /* channels */, 8 /* sticks per channel */,
+			    16, 1,
+			    0);
 
 	if (!mci)
 		return -ENOMEM;
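The i82860 error path reads ERRSTS twice. If the error bits (mask 0x0003) differ between the two reads, a new error was latched in between. The driver reports that as "UE overwrote CE" (previously via edac_mc_handle_ce_no_info()) and then continues with the second snapshot, where bit 0x0002 selects UE versus CE. A sketch of that classification as a pure function. The sketch_event flags are hypothetical; the masks are the ones used in the i82860 hunk.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ERRSTS_BITS	0x0003	/* error mask used by the i82860 hunk */
#define SKETCH_ERRSTS_UE	0x0002	/* UE bit tested by the i82860 hunk */

/* Hypothetical event flags returned by the sketch. */
enum sketch_event {
	SKETCH_NONE	= 0,
	SKETCH_CE	= 1 << 0,
	SKETCH_UE	= 1 << 1,
	SKETCH_LOST_CE	= 1 << 2,	/* "UE overwrote CE" */
};

/* errsts: first read of ERRSTS; errsts2: second read of ERRSTS. */
static unsigned sketch_classify(uint16_t errsts, uint16_t errsts2)
{
	unsigned events = SKETCH_NONE;

	/* Nothing latched at all. */
	if (!(errsts2 & SKETCH_ERRSTS_BITS))
		return SKETCH_NONE;

	/* The status changed between the reads: report the overwrite... */
	if ((errsts ^ errsts2) & SKETCH_ERRSTS_BITS) {
		events |= SKETCH_LOST_CE;
		errsts = errsts2;	/* ...and trust the second snapshot. */
	}

	events |= (errsts & SKETCH_ERRSTS_UE) ? SKETCH_UE : SKETCH_CE;
	return events;
}
```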
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index dc207dc..74afaba 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -38,7 +38,8 @@
 #endif				/* PCI_DEVICE_ID_INTEL_82875_6 */
 
 /* four csrows in dual channel, eight in single channel */
-#define I82875P_NR_CSROWS(nr_chans) (8/(nr_chans))
+#define I82875P_NR_DIMMS		8
+#define I82875P_NR_CSROWS(nr_chans)	(I82875P_NR_DIMMS / (nr_chans))
 
 /* Intel 82875p register addresses - device 0 function 0 - DRAM Controller */
 #define I82875P_EAP		0x58	/* Error Address Pointer (32b)
@@ -235,7 +236,10 @@ static int i82875p_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0081) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
@@ -243,11 +247,18 @@ static int i82875p_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, info->eap);
 
 	if (info->errsts & 0x0080)
-		edac_mc_handle_ue(mci, info->eap, 0, row, "i82875p UE");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW, mci,
+				     info->eap, 0, 0,
+				     -1, -1, -1, row, -1,
+				     "i82875p UE", "");
 	else
-		edac_mc_handle_ce(mci, info->eap, 0, info->derrsyn, row,
-				multi_chan ? (info->des & 0x1) : 0,
-				"i82875p CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     info->eap, 0, info->derrsyn,
+				     -1, -1, -1, row,
+				     multi_chan ? (info->des & 0x1) : 0,
+				     "i82875p CE", "");
 
 	return 1;
 }
@@ -359,7 +370,7 @@ static void i82875p_init_csrows(struct mem_ctl_info *mci,
 	 * contain the total memory contained in all eight rows.
 	 */
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 
 		value = readb(ovrfl_window + I82875P_DRB + index);
@@ -405,9 +416,10 @@ static int i82875p_probe1(struct pci_dev *pdev, int dev_idx)
 		return -ENODEV;
 	drc = readl(ovrfl_window + I82875P_DRC);
 	nr_chans = dual_channel_active(drc) + 1;
-	mci = edac_mc_alloc(sizeof(*pvt), I82875P_NR_CSROWS(nr_chans),
-			nr_chans, 0);
-
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, I82875P_NR_DIMMS,
+			    I82875P_NR_CSROWS(nr_chans), nr_chans,
+			    sizeof(*pvt));
 	if (!mci) {
 		rc = -ENOMEM;
 		goto fail0;
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index d7dc455..33feeba 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -29,7 +29,8 @@
 #define PCI_DEVICE_ID_INTEL_82975_0	0x277c
 #endif				/* PCI_DEVICE_ID_INTEL_82975_0 */
 
-#define I82975X_NR_CSROWS(nr_chans)		(8/(nr_chans))
+#define I82975X_NR_DIMMS		8
+#define I82975X_NR_CSROWS(nr_chans)	(I82975X_NR_DIMMS / (nr_chans))
 
 /* Intel 82975X register addresses - device 0 function 0 - DRAM Controller */
 #define I82975X_EAP		0x58	/* Dram Error Address Pointer (32b)
@@ -289,7 +290,10 @@ static int i82975x_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0003) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
@@ -303,11 +307,18 @@ static int i82975x_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, page);
 
 	if (info->errsts & 0x0002)
-		edac_mc_handle_ue(mci, page, offst , row, "i82975x UE");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW, mci,
+				     page, offst, 0,
+				     -1, -1, -1, row, -1,
+				     "i82975x UE", "");
 	else
-		edac_mc_handle_ce(mci, page, offst, info->derrsyn, row,
-				multi_chan ? chan : 0,
-				"i82975x CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     page, offst, info->derrsyn,
+				     -1, -1, -1, row,
+				     multi_chan ? chan : 0,
+				     "i82975x CE", "");
 
 	return 1;
 }
@@ -378,7 +389,7 @@ static void i82975x_init_csrows(struct mem_ctl_info *mci,
 	 *
 	 */
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 
 		value = readb(mch_window + I82975X_DRB + index +
@@ -533,8 +544,10 @@ static int i82975x_probe1(struct pci_dev *pdev, int dev_idx)
 	chans = dual_channel_active(mch_window) + 1;
 
 	/* assuming only one controller, index thus is 0 */
-	mci = edac_mc_alloc(sizeof(*pvt), I82975X_NR_CSROWS(chans),
-					chans, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, I82975X_NR_DIMMS,
+			    I82975X_NR_CSROWS(chans), chans,
+			    sizeof(*pvt));
 	if (!mci) {
 		rc = -ENOMEM;
 		goto fail1;
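The i82975x hunks replace the literal 8 with I82975X_NR_DIMMS and now pass both the DIMM count and the csrow count to edac_mc_alloc(). The sizing arithmetic can be checked on its own with the minimal sketch below. The helper name sketch_i82975x_csrows() is hypothetical, and it assumes dual_channel_active() returns 0 or 1, as the "chans = dual_channel_active(mch_window) + 1" line implies.

```c
#include <assert.h>

/* Same definitions as the i82975x_edac.c hunk above. */
#define I82975X_NR_DIMMS		8
#define I82975X_NR_CSROWS(nr_chans)	(I82975X_NR_DIMMS / (nr_chans))

/*
 * Hypothetical helper: mirrors "chans = dual_channel_active(mch_window) + 1"
 * and returns the csrow count handed to edac_mc_alloc().
 */
static int sketch_i82975x_csrows(int dual_channel_active)
{
	int chans = dual_channel_active + 1;

	assert(chans == 1 || chans == 2);
	return I82975X_NR_CSROWS(chans);
}
```

With one channel the controller exposes 8 csrows; with two channels it exposes 4 csrows of two channels each, so csrows times channels equals I82975X_NR_DIMMS in both cases.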
diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index c1d9e15..f7c3a67 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -812,7 +812,7 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 	err_addr = in_be32(pdata->mc_vbase + MPC85XX_MC_CAPTURE_ADDRESS);
 	pfn = err_addr >> PAGE_SHIFT;
 
-	for (row_index = 0; row_index < mci->nr_csrows; row_index++) {
+	for (row_index = 0; row_index < mci->num_csrows; row_index++) {
 		csrow = &mci->csrows[row_index];
 		if ((pfn >= csrow->first_page) && (pfn <= csrow->last_page))
 			break;
@@ -850,16 +850,22 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 	mpc85xx_mc_printk(mci, KERN_ERR, "PFN: %#8.8x\n", pfn);
 
 	/* we are out of range */
-	if (row_index == mci->nr_csrows)
+	if (row_index == mci->num_csrows)
 		mpc85xx_mc_printk(mci, KERN_ERR, "PFN out of range!\n");
 
 	if (err_detect & DDR_EDE_SBE)
-		edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
-				  syndrome, row_index, 0, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     pfn, err_addr & ~PAGE_MASK, syndrome,
+				     -1, -1, -1, row_index, 0,
+				     mci->ctl_name, "");
 
 	if (err_detect & DDR_EDE_MBE)
-		edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
-				  row_index, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     pfn, err_addr & ~PAGE_MASK, syndrome,
+				     -1, -1, -1, row_index, 0,
+				     mci->ctl_name, "");
 
 	out_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT, err_detect);
 }
@@ -925,7 +931,7 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 		}
 	}
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		u32 start;
 		u32 end;
 
@@ -969,7 +975,8 @@ static int __devinit mpc85xx_mc_err_probe(struct platform_device *op)
 	if (!devres_open_group(&op->dev, mpc85xx_mc_err_probe, GFP_KERNEL))
 		return -ENOMEM;
 
-	mci = edac_mc_alloc(sizeof(*pdata), 4, 1, edac_mc_idx);
+	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, 4, 4, 1, sizeof(*pdata));
 	if (!mci) {
 		devres_release_group(&op->dev, mpc85xx_mc_err_probe);
 		return -ENOMEM;
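mpc85xx_mc_check() turns the captured error address into a page frame number and then scans the csrows for the one whose [first_page, last_page] range contains it. If the loop runs off the end, row_index equals num_csrows and the driver logs "PFN out of range!". A minimal user-space sketch of that lookup follows; struct sketch_csrow and the 4 KiB page size are assumptions standing in for struct csrow_info and the kernel's PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the first_page/last_page fields of csrow_info. */
struct sketch_csrow {
	unsigned long first_page;
	unsigned long last_page;
};

/*
 * Return the index of the csrow whose page range contains the page of
 * err_addr, or num_csrows if none does (the driver's "out of range" case).
 */
static int sketch_find_csrow(const struct sketch_csrow *rows, int num_csrows,
			     uint32_t err_addr)
{
	unsigned long pfn = err_addr >> SKETCH_PAGE_SHIFT;
	int row_index;

	assert(num_csrows >= 0);
	for (row_index = 0; row_index < num_csrows; row_index++)
		if (pfn >= rows[row_index].first_page &&
		    pfn <= rows[row_index].last_page)
			break;

	return row_index;
}
```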
diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
index 281e245..96a675a 100644
--- a/drivers/edac/mv64x60_edac.c
+++ b/drivers/edac/mv64x60_edac.c
@@ -611,12 +611,19 @@ static void mv64x60_mc_check(struct mem_ctl_info *mci)
 
 	/* first bit clear in ECC Err Reg, 1 bit error, correctable by HW */
 	if (!(reg & 0x1))
-		edac_mc_handle_ce(mci, err_addr >> PAGE_SHIFT,
-				  err_addr & PAGE_MASK, syndrome, 0, 0,
-				  mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     err_addr >> PAGE_SHIFT,
+				     err_addr & ~PAGE_MASK, syndrome,
+				     -1, -1, -1, 0, 0,
+				     mci->ctl_name, "");
 	else	/* 2 bit error, UE */
-		edac_mc_handle_ue(mci, err_addr >> PAGE_SHIFT,
-				  err_addr & PAGE_MASK, 0, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     err_addr >> PAGE_SHIFT,
+				     err_addr & ~PAGE_MASK, 0,
+				     -1, -1, -1, 0, 0,
+				     mci->ctl_name, "");
 
 	/* clear the error */
 	out_le32(pdata->mc_vbase + MV64X60_SDRAM_ERR_ADDR, 0);
@@ -703,7 +710,9 @@ static int __devinit mv64x60_mc_err_probe(struct platform_device *pdev)
 	if (!devres_open_group(&pdev->dev, mv64x60_mc_err_probe, GFP_KERNEL))
 		return -ENOMEM;
 
-	mci = edac_mc_alloc(sizeof(struct mv64x60_mc_pdata), 1, 1, edac_mc_idx);
+	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, 1,
+			    1, 1, sizeof(struct mv64x60_mc_pdata));
 	if (!mci) {
 		printk(KERN_ERR "%s: No memory for CPU err\n", __func__);
 		devres_release_group(&pdev->dev, mv64x60_mc_err_probe);
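mv64x60_mc_check() classifies the error from bit 0 of the ECC error register (clear: single-bit error, corrected by hardware; set: two-bit error, uncorrectable) and reports the address as a page frame number plus an offset. In the kernel, PAGE_MASK is ~(PAGE_SIZE - 1), so addr & PAGE_MASK is the page-aligned base, while the offset within the page is addr & ~PAGE_MASK, the form the mpc85xx conversion uses. The sketch below illustrates both points under an assumed 4 KiB page size; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK	(~((1UL << SKETCH_PAGE_SHIFT) - 1))

enum sketch_ecc_kind { SKETCH_CE, SKETCH_UE };

/* Bit 0 of the ECC error register: clear = 1-bit (CE), set = 2-bit (UE). */
static enum sketch_ecc_kind sketch_classify(uint32_t ecc_reg)
{
	return (ecc_reg & 0x1) ? SKETCH_UE : SKETCH_CE;
}

/* PAGE_MASK keeps the page-aligned base of the address. */
static unsigned long sketch_page_base(unsigned long addr)
{
	return addr & SKETCH_PAGE_MASK;
}

/* ~PAGE_MASK keeps only the offset within the page. */
static unsigned long sketch_page_offset(unsigned long addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}
```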
diff --git a/drivers/edac/pasemi_edac.c b/drivers/edac/pasemi_edac.c
index 3fcefda..0d0a545 100644
--- a/drivers/edac/pasemi_edac.c
+++ b/drivers/edac/pasemi_edac.c
@@ -110,15 +110,20 @@ static void pasemi_edac_process_error_info(struct mem_ctl_info *mci, u32 errsta)
 	/* uncorrectable/multi-bit errors */
 	if (errsta & (MCDEBUG_ERRSTA_MBE_STATUS |
 		      MCDEBUG_ERRSTA_RFL_STATUS)) {
-		edac_mc_handle_ue(mci, mci->csrows[cs].first_page, 0,
-				  cs, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     mci->csrows[cs].first_page, 0, 0,
+				     -1, -1, -1, cs, 0,
+				     mci->ctl_name, "");
 	}
 
 	/* correctable/single-bit errors */
-	if (errsta & MCDEBUG_ERRSTA_SBE_STATUS) {
-		edac_mc_handle_ce(mci, mci->csrows[cs].first_page, 0,
-				  0, cs, 0, mci->ctl_name);
-	}
+	if (errsta & MCDEBUG_ERRSTA_SBE_STATUS)
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     mci->csrows[cs].first_page, 0, 0,
+				     -1, -1, -1, cs, 0,
+				     mci->ctl_name, "");
 }
 
 static void pasemi_edac_check(struct mem_ctl_info *mci)
@@ -139,7 +144,7 @@ static int pasemi_edac_init_csrows(struct mem_ctl_info *mci,
 	u32 rankcfg;
 	int index;
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 		dimm = csrow->channels[0].dimm;
 
@@ -207,8 +212,9 @@ static int __devinit pasemi_edac_probe(struct pci_dev *pdev,
 		MCDEBUG_ERRCTL1_RFL_LOG_EN;
 	pci_write_config_dword(pdev, MCDEBUG_ERRCTL1, errctl1);
 
-	mci = edac_mc_alloc(0, PASEMI_EDAC_NR_CSROWS, PASEMI_EDAC_NR_CHANS,
-				system_mmc_id++);
+	mci = edac_mc_alloc(system_mmc_id++, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, PASEMI_EDAC_NR_CSROWS,
+			    PASEMI_EDAC_NR_CSROWS, PASEMI_EDAC_NR_CHANS, 0);
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 1adaddf..2e393cb 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -214,7 +214,7 @@ static struct platform_driver ppc4xx_edac_driver = {
  * TODO: The row and channel parameters likely need to be dynamically
  * set based on the aforementioned variant controller realizations.
  */
-static const unsigned ppc4xx_edac_nr_csrows = 2;
+static const unsigned ppc4xx_edac_num_csrows = 2;
 static const unsigned ppc4xx_edac_nr_chans = 1;
 
 /*
@@ -330,7 +330,7 @@ ppc4xx_edac_generate_bank_message(const struct mem_ctl_info *mci,
 	size -= n;
 	total += n;
 
-	for (rows = 0, row = 0; row < mci->nr_csrows; row++) {
+	for (rows = 0, row = 0; row < mci->num_csrows; row++) {
 		if (ppc4xx_edac_check_bank_error(status, row)) {
 			n = snprintf(buffer, size, "%s%u",
 					(rows++ ? ", " : ""), row);
@@ -725,9 +725,12 @@ ppc4xx_edac_handle_ce(struct mem_ctl_info *mci,
 
 	ppc4xx_edac_generate_message(mci, status, message, sizeof(message));
 
-	for (row = 0; row < mci->nr_csrows; row++)
+	for (row = 0; row < mci->num_csrows; row++)
 		if (ppc4xx_edac_check_bank_error(status, row))
-			edac_mc_handle_ce_no_info(mci, message);
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+					     -1, -1, -1, -1, -1,
+					     message, "");
 }
 
 /**
@@ -753,9 +756,13 @@ ppc4xx_edac_handle_ue(struct mem_ctl_info *mci,
 
 	ppc4xx_edac_generate_message(mci, status, message, sizeof(message));
 
-	for (row = 0; row < mci->nr_csrows; row++)
+	for (row = 0; row < mci->num_csrows; row++)
 		if (ppc4xx_edac_check_bank_error(status, row))
-			edac_mc_handle_ue(mci, page, offset, row, message);
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     page, offset, 0,
+					     -1, -1, -1, row, -1,
+					     message, "");
 }
 
 /**
@@ -917,7 +924,7 @@ ppc4xx_edac_init_csrows(struct mem_ctl_info *mci, u32 mcopt1)
 	 * 1:1 with a controller bank/rank.
 	 */
 
-	for (row = 0; row < mci->nr_csrows; row++) {
+	for (row = 0; row < mci->num_csrows; row++) {
 		struct csrow_info *csi = &mci->csrows[row];
 
 		/*
@@ -1279,10 +1286,12 @@ static int __devinit ppc4xx_edac_probe(struct platform_device *op)
 	 * initialization.
 	 */
 
-	mci = edac_mc_alloc(sizeof(struct ppc4xx_edac_pdata),
-			    ppc4xx_edac_nr_csrows,
+	mci = edac_mc_alloc(ppc4xx_edac_instance,
+			    EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, ppc4xx_edac_num_csrows * ppc4xx_edac_nr_chans,
+			    ppc4xx_edac_num_csrows,
 			    ppc4xx_edac_nr_chans,
-			    ppc4xx_edac_instance);
+			    sizeof(struct ppc4xx_edac_pdata));
 
 	if (mci == NULL) {
 		ppc4xx_edac_printk(KERN_ERR, "%s: "
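ppc4xx_edac_generate_bank_message() walks the csrows and appends each row flagged by ppc4xx_edac_check_bank_error() to a comma-separated list, advancing through a fixed buffer with snprintf(). A self-contained sketch of that accumulation pattern is below. The one-bit-per-row status encoding in sketch_check_bank_error() is an assumption, not the controller's real register layout, and the truncation check is an addition for safety.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: bit N of 'status' flags an error on row N. */
static int sketch_check_bank_error(unsigned int status, unsigned int row)
{
	return (status >> row) & 1;
}

/*
 * Write "0, 2, ..." for each flagged row, tracking the remaining space
 * as the driver does (size -= n; total += n). Returns the number of
 * characters written, excluding the terminating NUL.
 */
static int sketch_bank_message(char *buffer, size_t size, unsigned int status,
			       unsigned int num_csrows)
{
	unsigned int row, rows = 0;
	int n, total = 0;

	assert(size > 0);
	buffer[0] = '\0';

	for (row = 0; row < num_csrows; row++) {
		if (!sketch_check_bank_error(status, row))
			continue;
		n = snprintf(buffer, size, "%s%u", rows++ ? ", " : "", row);
		if (n < 0 || (size_t)n >= size)
			break;		/* output truncated: stop appending */
		buffer += n;
		size -= n;
		total += n;
	}
	return total;
}
```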
diff --git a/drivers/edac/r82600_edac.c b/drivers/edac/r82600_edac.c
index a4b0626..214bc48 100644
--- a/drivers/edac/r82600_edac.c
+++ b/drivers/edac/r82600_edac.c
@@ -179,10 +179,13 @@ static int r82600_process_error_info(struct mem_ctl_info *mci,
 		error_found = 1;
 
 		if (handle_errors)
-			edac_mc_handle_ce(mci, page, 0,	/* not avail */
-					syndrome,
-					edac_mc_find_csrow_by_page(mci, page),
-					0, mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, page, 0, syndrome,
+					     -1, -1, -1,
+					     edac_mc_find_csrow_by_page(mci, page),
+					     0,
+					     mci->ctl_name, "");
 	}
 
 	if (info->eapr & BIT(1)) {	/* UE? */
@@ -190,9 +193,13 @@ static int r82600_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors)
 			/* 82600 doesn't give enough info */
-			edac_mc_handle_ue(mci, page, 0,
-					edac_mc_find_csrow_by_page(mci, page),
-					mci->ctl_name);
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+					     mci, page, 0, 0,
+					     -1, -1, -1,
+					     edac_mc_find_csrow_by_page(mci, page),
+					     0,
+					     mci->ctl_name, "");
 	}
 
 	return error_found;
@@ -226,7 +233,7 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 	reg_sdram = dramcr & BIT(4);
 	row_high_limit_last = 0;
 
-	for (index = 0; index < mci->nr_csrows; index++) {
+	for (index = 0; index < mci->num_csrows; index++) {
 		csrow = &mci->csrows[index];
 		dimm = csrow->channels[0].dimm;
 
@@ -281,7 +288,10 @@ static int r82600_probe1(struct pci_dev *pdev, int dev_idx)
 	debugf2("%s(): sdram refresh rate = %#0x\n", __func__,
 		sdram_refresh_rate);
 	debugf2("%s(): DRAMC register = %#0x\n", __func__, dramcr);
-	mci = edac_mc_alloc(0, R82600_NR_CSROWS, R82600_NR_CHANS, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, R82600_NR_DIMMS,
+			    R82600_NR_CSROWS, R82600_NR_CHANS,
+			    0);
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 981262b..5df6ade 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -646,8 +646,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 				csr->channels[0].dimm = dimm;
 				dimm->nr_pages = npages;
-				dimm->mc_channel = i;
-				dimm->mc_dimm_number = j;
 				dimm->grain = 32;
 				dimm->dtype = (banks == 8) ? DEV_X8 : DEV_X4;
 				dimm->mtype = mtype;
@@ -834,11 +832,10 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 				 u8 *socket,
 				 long *channel_mask,
 				 u8 *rank,
-				 char *area_type)
+				 char *area_type, char *msg)
 {
 	struct mem_ctl_info	*new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
-	char			msg[256];
 	int 			n_rir, n_sads, n_tads, sad_way, sck_xch;
 	int			sad_interl, idx, base_ch;
 	int			interleave_mode;
@@ -859,12 +856,10 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	 */
 	if ((addr > (u64) pvt->tolm) && (addr < (1L << 32))) {
 		sprintf(msg, "Error at TOLM area, on addr 0x%08Lx", addr);
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	if (addr >= (u64)pvt->tohm) {
 		sprintf(msg, "Error at MMIOH area, on addr 0x%016Lx", addr);
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 
@@ -881,7 +876,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 		limit = SAD_LIMIT(reg);
 		if (limit <= prv) {
 			sprintf(msg, "Can't discover the memory socket");
-			edac_mc_handle_ce_no_info(mci, msg);
 			return -EINVAL;
 		}
 		if  (addr <= limit)
@@ -890,7 +884,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	}
 	if (n_sads == MAX_SAD) {
 		sprintf(msg, "Can't discover the memory socket");
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	area_type = get_dram_attr(reg);
@@ -931,7 +924,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 		break;
 	default:
 		sprintf(msg, "Can't discover socket interleave");
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	*socket = sad_interleave[idx];
@@ -946,7 +938,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	if (!new_mci) {
 		sprintf(msg, "Struct for socket #%u wasn't initialized",
 			*socket);
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	mci = new_mci;
@@ -962,7 +953,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 		limit = TAD_LIMIT(reg);
 		if (limit <= prv) {
 			sprintf(msg, "Can't discover the memory channel");
-			edac_mc_handle_ce_no_info(mci, msg);
 			return -EINVAL;
 		}
 		if  (addr <= limit)
@@ -1002,7 +992,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 		break;
 	default:
 		sprintf(msg, "Can't discover the TAD target");
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	*channel_mask = 1 << base_ch;
@@ -1016,7 +1005,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 			break;
 		default:
 			sprintf(msg, "Invalid mirror set. Can't decode addr");
-			edac_mc_handle_ce_no_info(mci, msg);
 			return -EINVAL;
 		}
 	} else
@@ -1044,7 +1032,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	if (offset > addr) {
 		sprintf(msg, "Can't calculate ch addr: TAD offset 0x%08Lx is too high for addr 0x%08Lx!",
 			offset, addr);
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	addr -= offset;
@@ -1084,7 +1071,6 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	if (n_rir == MAX_RIR_RANGES) {
 		sprintf(msg, "Can't discover the memory rank for ch addr 0x%08Lx",
 			ch_addr);
-		edac_mc_handle_ce_no_info(mci, msg);
 		return -EINVAL;
 	}
 	rir_way = RIR_WAY(reg);
@@ -1398,7 +1384,8 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 {
 	struct mem_ctl_info *new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
-	char *type, *optype, *msg, *recoverable_msg;
+	enum hw_event_mc_err_type tp_event;
+	char *type, *optype, msg[256], *recoverable_msg;
 	bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
 	bool overflow = GET_BITFIELD(m->status, 62, 62);
 	bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
@@ -1413,10 +1400,18 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	int csrow, rc, dimm;
 	char *area_type = "Unknown";
 
-	if (ripv)
-		type = "NON_FATAL";
-	else
-		type = "FATAL";
+	if (uncorrected_error) {
+		if (ripv) {
+			type = "NON_FATAL";
+			tp_event = HW_EVENT_ERR_UNCORRECTED;
+		} else {
+			type = "FATAL";
+			tp_event = HW_EVENT_ERR_FATAL;
+		}
+	} else {
+		type = "CORRECTED";
+		tp_event = HW_EVENT_ERR_CORRECTED;
+	}
 
 	/*
 	 * According with Table 15-9 of the Intel Archictecture spec vol 3A,
@@ -1434,19 +1429,19 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	} else {
 		switch (optypenum) {
 		case 0:
-			optype = "generic undef request";
+			optype = "generic undef request error";
 			break;
 		case 1:
-			optype = "memory read";
+			optype = "memory read error";
 			break;
 		case 2:
-			optype = "memory write";
+			optype = "memory write error";
 			break;
 		case 3:
-			optype = "addr/cmd";
+			optype = "addr/cmd error";
 			break;
 		case 4:
-			optype = "memory scrubbing";
+			optype = "memory scrubbing error";
 			break;
 		default:
 			optype = "reserved";
@@ -1455,13 +1450,13 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	}
 
 	rc = get_memory_error_data(mci, m->addr, &socket,
-				   &channel_mask, &rank, area_type);
+				   &channel_mask, &rank, area_type, msg);
 	if (rc < 0)
-		return;
+		goto err_parsing;
 	new_mci = get_mci_for_node_id(socket);
 	if (!new_mci) {
-		edac_mc_handle_ce_no_info(mci, "Error: socket got corrupted!");
-		return;
+		strcpy(msg, "Error: socket got corrupted!");
+		goto err_parsing;
 	}
 	mci = new_mci;
 	pvt = mci->pvt_info;
@@ -1487,18 +1482,14 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	 * Probably, we can just discard it, as the channel information
 	 * comes from the get_memory_error_data() address decoding
 	 */
-	msg = kasprintf(GFP_ATOMIC,
-			"%d %s error(s): %s on %s area %s%s: cpu=%d Err=%04x:%04x (ch=%d), "
-			"addr = 0x%08llx => socket=%d, Channel=%ld(mask=%ld), rank=%d\n",
+	snprintf(msg, sizeof(msg),
+			"%d error(s)%s: %s%s: cpu=%d Err=%04x:%04x addr = 0x%08llx socket=%d Channel=%ld(mask=%ld), rank=%d\n",
 			core_err_cnt,
+			overflow ? " OVERFLOW" : "",
 			area_type,
-			optype,
-			type,
 			recoverable_msg,
-			overflow ? "OVERFLOW" : "",
 			m->cpu,
 			mscod, errcode,
-			channel,		/* 1111b means not specified */
 			(long long) m->addr,
 			socket,
 			first_channel,		/* This is the real channel on SB */
@@ -1507,13 +1498,21 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 
 	debugf0("%s", msg);
 
+	/* FIXME: need support for channel mask */
+
 	/* Call the helper to output message */
-	if (uncorrected_error)
-		edac_mc_handle_fbd_ue(mci, csrow, 0, 0, msg);
-	else
-		edac_mc_handle_fbd_ce(mci, csrow, 0, msg);
+	edac_mc_handle_error(tp_event,
+			     HW_EVENT_SCOPE_MC_DIMM, mci,
+			     m->addr >> PAGE_SHIFT, m->addr & ~PAGE_MASK, 0,
+			     0, channel, dimm, -1, -1,
+			     optype, msg);
+	return;
+err_parsing:
+	edac_mc_handle_error(tp_event,
+			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+			     -1, -1, -1, -1, -1,
+			     msg, "");
 
-	kfree(msg);
 }
 
 /*
@@ -1676,15 +1675,17 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev)
 {
 	struct mem_ctl_info *mci;
 	struct sbridge_pvt *pvt;
-	int rc, channels, csrows;
+	int rc, channels, dimms;
 
 	/* Check the number of active and not disabled channels */
-	rc = sbridge_get_active_channels(sbridge_dev->bus, &channels, &csrows);
+	rc = sbridge_get_active_channels(sbridge_dev->bus, &channels, &dimms);
 	if (unlikely(rc < 0))
 		return rc;
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(sizeof(*pvt), csrows, channels, sbridge_dev->mc);
+	mci = edac_mc_alloc(sbridge_dev->mc, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    1, channels, dimms,
+			    dimms, channels, sizeof(*pvt));
 	if (unlikely(!mci))
 		return -ENOMEM;
 
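The sb_edac hunk derives the event type from two bits: UC (bit 61 of MCi_STATUS) and RIPV (bit 0 of MCG_STATUS). RIPV set means execution can be restarted reliably at the saved instruction pointer, which is why the code being replaced labels that case NON_FATAL. A minimal sketch of the classification follows. SKETCH_GET_BITFIELD re-implements the driver's GET_BITFIELD() for a 64-bit value, and the enum is a hypothetical mirror of the hw_event_mc_err_type values used in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits lo..hi (inclusive) of v, like the driver's GET_BITFIELD(). */
#define SKETCH_GET_BITFIELD(v, lo, hi) \
	(((v) >> (lo)) & ((1ULL << ((hi) - (lo) + 1)) - 1))

/* Hypothetical mirror of the three hw_event_mc_err_type values used here. */
enum sketch_err_type {
	SKETCH_ERR_CORRECTED,
	SKETCH_ERR_UNCORRECTED,
	SKETCH_ERR_FATAL,
};

static enum sketch_err_type sketch_classify_mce(uint64_t mcgstatus,
						uint64_t status)
{
	int ripv = SKETCH_GET_BITFIELD(mcgstatus, 0, 0);
	int uc = SKETCH_GET_BITFIELD(status, 61, 61);

	if (!uc)
		return SKETCH_ERR_CORRECTED;
	/* RIPV set: execution can restart, so the UE is not fatal. */
	return ripv ? SKETCH_ERR_UNCORRECTED : SKETCH_ERR_FATAL;
}
```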
diff --git a/drivers/edac/tile_edac.c b/drivers/edac/tile_edac.c
index 6314ff9..19ac19e 100644
--- a/drivers/edac/tile_edac.c
+++ b/drivers/edac/tile_edac.c
@@ -71,7 +71,11 @@ static void tile_edac_check(struct mem_ctl_info *mci)
 	if (mem_error.sbe_count != priv->ce_count) {
 		dev_dbg(mci->dev, "ECC CE err on node %d\n", priv->node);
 		priv->ce_count = mem_error.sbe_count;
-		edac_mc_handle_ce(mci, 0, 0, 0, 0, 0, mci->ctl_name);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+				     0, 0, 0,
+				     -1, -1, -1, 0, 0,
+				     mci->ctl_name, "");
 	}
 }
 
@@ -131,8 +135,10 @@ static int __devinit tile_edac_mc_probe(struct platform_device *pdev)
 		return -EINVAL;
 
 	/* A TILE MC has a single channel and one chip-select row. */
-	mci = edac_mc_alloc(sizeof(struct tile_edac_priv),
-		TILE_EDAC_NR_CSROWS, TILE_EDAC_NR_CHANS, pdev->id);
+	mci = edac_mc_alloc(pdev->id, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    0, 0, TILE_EDAC_NR_CSROWS,
+			    TILE_EDAC_NR_CSROWS, TILE_EDAC_NR_CHANS,
+			    sizeof(struct tile_edac_priv));
 	if (mci == NULL)
 		return -ENOMEM;
 	priv = mci->pvt_info;
diff --git a/drivers/edac/x38_edac.c b/drivers/edac/x38_edac.c
index 0de288f..27cf304 100644
--- a/drivers/edac/x38_edac.c
+++ b/drivers/edac/x38_edac.c
@@ -215,19 +215,29 @@ static void x38_process_error_info(struct mem_ctl_info *mci,
 		return;
 
 	if ((info->errsts ^ info->errsts2) & X38_ERRSTS_BITS) {
-		edac_mc_handle_ce_no_info(mci, "UE overwrote CE");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
+				     -1, -1, -1, -1, -1,
+				     "UE overwrote CE", "");
 		info->errsts = info->errsts2;
 	}
 
 	for (channel = 0; channel < x38_channel_num; channel++) {
 		log = info->eccerrlog[channel];
 		if (log & X38_ECCERRLOG_UE) {
-			edac_mc_handle_ue(mci, 0, 0,
-				eccerrlog_row(channel, log), "x38 UE");
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     0, 0, 0,
+					     -1, -1, -1,
+					     eccerrlog_row(channel, log), -1,
+					     "x38 UE", "");
 		} else if (log & X38_ECCERRLOG_CE) {
-			edac_mc_handle_ce(mci, 0, 0,
-				eccerrlog_syndrome(log),
-				eccerrlog_row(channel, log), 0, "x38 CE");
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
+					     HW_EVENT_SCOPE_MC_CSROW, mci,
+					     0, 0, eccerrlog_syndrome(log),
+					     -1, -1, -1,
+					     eccerrlog_row(channel, log), -1,
+					     "x38 CE", "");
 		}
 	}
 }
@@ -334,7 +344,10 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	how_many_channel(pdev);
 
 	/* FIXME: unconventional pvt_info usage */
-	mci = edac_mc_alloc(0, X38_RANKS, x38_channel_num, 0);
+	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
+			    -1, -1, X38_RANKS,
+			    X38_RANKS, x38_channel_num,
+			    0);
 	if (!mci)
 		return -ENOMEM;
 
@@ -362,7 +375,7 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	 * cumulative; the last one will contain the total memory
 	 * contained in all ranks.
 	 */
-	for (i = 0; i < mci->nr_csrows; i++) {
+	for (i = 0; i < mci->num_csrows; i++) {
 		unsigned long nr_pages;
 		struct csrow_info *csrow = &mci->csrows[i];
 
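The enum hw_event_error_scope documentation added to include/linux/edac.h below says that an error is counted for the entity it is attributed to and for every entity above it, and that the enum's ordering matters. The sketch that follows models only the hierarchical MC/branch/channel/DIMM part of that rule on a hypothetical topology of 2 branches, 2 channels per branch and 2 DIMMs per channel; the counters, names and layout are illustrative assumptions, not the EDAC core's data structures.

```c
#include <assert.h>

/* Hypothetical topology: 2 branches x 2 channels x 2 DIMMs per channel. */
#define SKETCH_NR_BRANCHES		2
#define SKETCH_CHANS_PER_BRANCH		2
#define SKETCH_DIMMS_PER_CHAN		2
#define SKETCH_NR_CHANS	(SKETCH_NR_BRANCHES * SKETCH_CHANS_PER_BRANCH)
#define SKETCH_NR_DIMMS	(SKETCH_NR_CHANS * SKETCH_DIMMS_PER_CHAN)

/* Same relative order as the first four hw_event_error_scope entries. */
enum sketch_scope {
	SKETCH_SCOPE_MC,
	SKETCH_SCOPE_BRANCH,
	SKETCH_SCOPE_CHANNEL,
	SKETCH_SCOPE_DIMM,
};

struct sketch_counts {
	unsigned int mc;
	unsigned int branch[SKETCH_NR_BRANCHES];
	unsigned int chan[SKETCH_NR_CHANS];
	unsigned int dimm[SKETCH_NR_DIMMS];
};

/*
 * Count an error at 'scope' and at every coarser level above it.
 * 'chan' is a global channel index and 'dimm' the slot within that
 * channel; arguments finer than 'scope' are ignored (callers pass -1).
 */
static void sketch_count_error(struct sketch_counts *c, enum sketch_scope scope,
			       int branch, int chan, int dimm)
{
	if (scope >= SKETCH_SCOPE_CHANNEL) {
		assert(chan >= 0 && chan < SKETCH_NR_CHANS);
		branch = chan / SKETCH_CHANS_PER_BRANCH;  /* derive the parent */
	}

	c->mc++;
	if (scope >= SKETCH_SCOPE_BRANCH) {
		assert(branch >= 0 && branch < SKETCH_NR_BRANCHES);
		c->branch[branch]++;
	}
	if (scope >= SKETCH_SCOPE_CHANNEL)
		c->chan[chan]++;
	if (scope >= SKETCH_SCOPE_DIMM) {
		assert(dimm >= 0 && dimm < SKETCH_DIMMS_PER_CHAN);
		c->dimm[chan * SKETCH_DIMMS_PER_CHAN + dimm]++;
	}
}
```

A channel-scoped error on channel 0 bumps the MC, branch 0 and channel 0 counters but leaves every DIMM counter untouched, matching the example in the header comment.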
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 652be25..d9fb796 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -12,6 +12,114 @@
 #ifndef _LINUX_EDAC_H_
 #define _LINUX_EDAC_H_
 
+/*
+ * Concepts used at the EDAC subsystem
+ *
+ * There are several things to be aware of that aren't at all obvious:
+ *
+ * SOCKETS, SOCKET SETS, BANKS, ROWS, CHIP-SELECT ROWS, CHANNELS, etc.
+ *
+ * These are some of the many terms that are thrown about that don't always
+ * mean what people think they mean (Inconceivable!).  In the interest of
+ * creating a common ground for discussion, terms and their definitions
+ * will be established.
+ *
+ * Memory devices:	The individual DRAM chips on a memory stick.  These
+ *			devices commonly output 4 or 8 bits each (x4, x8).
+ *			Grouping several of these in parallel provides the
+ *			number of bits that the memory controller expects:
+ *			typically 72 bits, in order to provide 64 bits of ECC
+ *			corrected data.
+ *
+ * Memory Stick:	A printed circuit board that aggregates multiple
+ *			memory devices in parallel.  In general, this is the
+ *			Field Replaceable Unit (FRU) that the final consumer
+ *			cares to replace. It is typically packaged as a DIMM.
+ *
+ * Socket:		A physical connector on the motherboard that accepts
+ *			a single memory stick.
+ *
+ * Branch:		The highest level of the hierarchy on a Fully-Buffered
+ *			DIMM memory controller. Typically, it contains two
+ *			channels. Two channels in the same branch can be used
+ *			in single mode or in lockstep mode.
+ *			When lockstep is enabled, the cache line is larger,
+ *			but it generally brings some performance penalty.
+ *			Also, it is generally not possible to point to just one
+ *			memory stick when an error occurs, as the error
+ *			correction code is calculated using two DIMMs instead
+ *			of one. Because of that, it can correct more errors
+ *			than single mode.
+ *
+ * Channel:		A memory controller channel, responsible for
+ *			communicating with a group of DIMMs. Each channel has
+ *			its own independent control (command) and data bus,
+ *			and can be used independently or grouped.
+ *
+ * Single-channel:	The data accessed by the memory controller is contained
+ *			in one DIMM only. E.g., if the data is 64 bits wide,
+ *			it flows to the CPU using one 64-bit parallel
+ *			access.
+ *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
+ *			FB-DIMM and RAMBUS use a different concept for channel,
+ *			so this concept doesn't apply there.
+ *
+ * Double-channel:	The data size accessed by the memory controller is
+ *			contained in two DIMMs accessed at the same time.
+ *			E.g., if the DIMM is 64 bits wide, the data flows to
+ *			the CPU using a 128-bit parallel access.
+ *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
+ *			FB-DIMM and RAMBUS use a different concept for channel,
+ *			so this concept doesn't apply there.
+ *
+ * Chip-select row:	This is the name of the memory controller signal used
+ *			to select the DRAM chips to be used. It may not be
+ *			visible to the memory controller, as a memory buffer
+ *			chip may be responsible for controlling it.
+ *			On devices where it is visible, it controls the DIMM
+ *			(or the DIMM pair, in dual-channel mode) that is
+ *			accessed by the memory controller.
+ *
+ * Single-Ranked stick:	A Single-ranked stick has 1 chip-select row of memory.
+ *			Motherboards commonly drive two chip-select pins to
+ *			a memory stick. A single-ranked stick will occupy
+ *			only one of those rows. The other will be unused.
+ *
+ * Double-Ranked stick:	A double-ranked stick has two chip-select rows which
+ *			access different sets of memory devices.  The two
+ *			rows cannot be accessed concurrently.
+ *
+ * Double-sided stick:	DEPRECATED TERM, see Double-Ranked stick.
+ *			A double-sided stick has two chip-select rows which
+ *			access different sets of memory devices.  The two
+ *			rows cannot be accessed concurrently.  "Double-sided"
+ *			is irrespective of the memory devices being mounted
+ *			on both sides of the memory stick.
+ *
+ * Socket set:		All of the memory sticks that are required for
+ *			a single memory access or all of the memory sticks
+ *			spanned by a chip-select row.  A single socket set
+ *			has two chip-select rows and if double-sided sticks
+ *			are used, these will occupy those chip-select rows.
+ *
+ * Bank:		This term is avoided because it is unclear when
+ *			needing to distinguish between chip-select rows and
+ *			socket sets.
+ *
+ * Controller pages:
+ *
+ * Physical pages:
+ *
+ * Virtual pages:
+ *
+ *
+ * STRUCTURE ORGANIZATION AND CHOICES
+ *
+ *
+ *
+ * PS - I enjoyed writing all that about as much as you enjoyed reading it.
+ */
+
 #include <linux/atomic.h>
 #include <linux/sysdev.h>
 
@@ -73,6 +181,40 @@ enum hw_event_mc_err_type {
 };
 
 /**
+ * enum hw_event_error_scope - scope of a memory error
+ * @HW_EVENT_SCOPE_MC:		error can be anywhere inside the MC
+ * @HW_EVENT_SCOPE_MC_BRANCH:	error can be on any DIMM inside the branch
+ * @HW_EVENT_SCOPE_MC_CHANNEL:	error can be on any DIMM inside the MC channel
+ * @HW_EVENT_SCOPE_MC_DIMM:	error is on a specific DIMM
+ * @HW_EVENT_SCOPE_MC_CSROW:	error can be on any DIMM inside the csrow
+ * @HW_EVENT_SCOPE_MC_CSROW_CHANNEL: error is on a CSROW channel
+ *
+ * Depending on the error detection algorithm, the memory topology and even
+ * the MC capabilities, some errors can't be attributed to just one DIMM, but
+ * to a group of memory sockets. Depending on where the error occurs, the
+ * EDAC core will increment the error count for that entity and for all
+ * the entities above it. For example, assuming a system with 1 memory
+ * controller, 2 branches, 2 MC channels and 4 DIMMs, if an error
+ * happens at channel 0, the error counts for channel 0, for branch 0 and
+ * for memory controller 0 will be incremented. The DIMM error counts won't
+ * be incremented, as, in this example, the driver can't be 100% sure in which
+ * DIMM the error actually occurred.
+ *
+ * The order here is important, as edac_mc_handle_error() uses it to decide
+ * which location parameters are meaningful. The smallest value should be
+ * the whole memory controller, and the last ones should be the most
+ * fine-grained locations, e.g. a DIMM.
+ */
+enum hw_event_error_scope {
+	HW_EVENT_SCOPE_MC,
+	HW_EVENT_SCOPE_MC_BRANCH,
+	HW_EVENT_SCOPE_MC_CHANNEL,
+	HW_EVENT_SCOPE_MC_DIMM,
+	HW_EVENT_SCOPE_MC_CSROW,
+	HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+};
+
+/**
  * enum mem_type - memory types
  *
  * @MEM_EMPTY		Empty csrow
@@ -233,114 +375,6 @@ enum scrub_type {
 #define OP_RUNNING_POLL_INTR	0x203
 #define OP_OFFLINE		0x300
 
-/*
- * Concepts used at the EDAC subsystem
- *
- * There are several things to be aware of that aren't at all obvious:
- *
- * SOCKETS, SOCKET SETS, BANKS, ROWS, CHIP-SELECT ROWS, CHANNELS, etc..
- *
- * These are some of the many terms that are thrown about that don't always
- * mean what people think they mean (Inconceivable!).  In the interest of
- * creating a common ground for discussion, terms and their definitions
- * will be established.
- *
- * Memory devices:	The individual DRAM chips on a memory stick.  These
- *			devices commonly output 4 and 8 bits each (x4, x8).
- *			Grouping several of these in parallel provides the
- *			number of bits that the memory controller expects:
- *			typically 72 bits, in order to provide 64 bits of ECC
- *			corrected data.
- *
- * Memory Stick:	A printed circuit board that aggregates multiple
- *			memory devices in parallel.  In general, this is the
- *			First replaceable unit (FRU) that the final consumer
- *			cares to replace. It is typically encapsulated as DIMMs
- *
- * Socket:		A physical connector on the motherboard that accepts
- *			a single memory stick.
- *
- * Branch:		The highest hierarchy on a Fully-Buffered DIMM memory
- *			controller. Typically, it contains two channels.
- *			Two channels at the same branch can be used in single
- *			mode or in lockstep mode.
- *			When lockstep is enabled, the cache line is higher,
- *			but it generally brings some performance penalty.
- *			Also, it is generally not possible to point to just one
- *			memory stick when an error occurs, as the error
- *			correction code is calculated using two dimms instead
- *			of one. Due to that, it is capable of correcting more
- *			errors than on single mode.
- *
- * Channel:		A memory controller channel, responsible to communicate
- *			with a group of DIMM's. Each channel has its own
- *			independent control (command) and data bus, and can
- *			be used independently or grouped.
- *
- * Single-channel:	The data accessed by the memory controller is contained
- *			into one dimm only. E. g. if the data is 64 bits-wide,
- *			the data flows to the CPU using one 64 bits parallel
- *			access.
- *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
- *			FB-DIMM and RAMBUS use a different concept for channel,
- *			so this concept doesn't apply there.
- *
- * Double-channel:	The data size accessed by the memory controller is
- *			contained into two dimms accessed at the same time.
- *			E. g. if the DIMM is 64 bits-wide, the data flows to
- *			the CPU using a 128 bits parallel access.
- *			Typically used with SDR, DDR, DDR2 and DDR3 memories.
- *			FB-DIMM and RAMBUS uses a different concept for channel,
- *			so this concept doesn't apply there.
- *
- * Chip-select row:	This is the name of the memory controller signal used
- *			to select the DRAM chips to be used. It may not be
- *			visible by the memory controller, as some memory buffer
- *			chip may be responsible to control it.
- *			On devices where it is visible, it controls the DIMM
- *			(or the DIMM pair, in dual-channel mode) that is
- *			accessed by the memory controller.
- *
- * Single-Ranked stick:	A Single-ranked stick has 1 chip-select row of memory.
- *			Motherboards commonly drive two chip-select pins to
- *			a memory stick. A single-ranked stick, will occupy
- *			only one of those rows. The other will be unused.
- *
- * Double-Ranked stick:	A double-ranked stick has two chip-select rows which
- *			access different sets of memory devices.  The two
- *			rows cannot be accessed concurrently.
- *
- * Double-sided stick:	DEPRECATED TERM, see Double-Ranked stick.
- *			A double-sided stick has two chip-select rows which
- *			access different sets of memory devices.  The two
- *			rows cannot be accessed concurrently.  "Double-sided"
- *			is irrespective of the memory devices being mounted
- *			on both sides of the memory stick.
- *
- * Socket set:		All of the memory sticks that are required for
- *			a single memory access or all of the memory sticks
- *			spanned by a chip-select row.  A single socket set
- *			has two chip-select rows and if double-sided sticks
- *			are used these will occupy those chip-select rows.
- *
- * Bank:		This term is avoided because it is unclear when
- *			needing to distinguish between chip-select rows and
- *			socket sets.
- *
- * Controller pages:
- *
- * Physical pages:
- *
- * Virtual pages:
- *
- *
- * STRUCTURE ORGANIZATION AND CHOICES
- *
- *
- *
- * PS - I enjoyed writing all that about as much as you enjoyed reading it.
- */
-
 /* FIXME: add the proper per-location error counts */
 struct dimm_info {
 	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
@@ -348,9 +382,9 @@ struct dimm_info {
 	/* Memory location data */
 	int mc_branch;
 	int mc_channel;
-	int csrow;
 	int mc_dimm_number;
-	int csrow_channel;
+	int csrow;
+	int cschannel;
 
 	struct kobject kobj;		/* sysfs kobject for this csrow */
 	struct mem_ctl_info *mci;	/* the parent */
@@ -361,13 +395,10 @@ struct dimm_info {
 	enum edac_type edac_mode;	/* EDAC mode for this dimm */
 
 	u32 nr_pages;			/* number of pages in csrow */
-
-	u32 ce_count;		/* Correctable Errors for this dimm */
 };
 
 struct csrow_channel_info {
 	int chan_idx;		/* channel index */
-	u32 ce_count;		/* Correctable Errors for this CHANNEL */
 	struct dimm_info *dimm;
 	struct csrow_info *csrow;	/* the parent */
 };
@@ -381,9 +412,6 @@ struct csrow_info {
 	unsigned long page_mask;	/* used for interleaving -
 					 * 0UL for non intlv */
 
-	u32 ue_count;		/* Uncorrectable Errors for this csrow */
-	u32 ce_count;		/* Correctable Errors for this csrow */
-
 	struct mem_ctl_info *mci;	/* the parent */
 
 	struct kobject kobj;	/* sysfs kobject for this csrow */
@@ -421,6 +449,24 @@ struct mcidev_sysfs_attribute {
         ssize_t (*store)(struct mem_ctl_info *, const char *,size_t);
 };
 
+/*
+ * Error counters for all possible memory arrangements
+ */
+struct error_counts {
+	u32 ce_mc;
+	u32 *ce_branch;
+	u32 *ce_channel;
+	u32 *ce_dimm;
+	u32 *ce_csrow;
+	u32 *ce_cschannel;
+	u32 ue_mc;
+	u32 *ue_branch;
+	u32 *ue_channel;
+	u32 *ue_dimm;
+	u32 *ue_csrow;
+	u32 *ue_cschannel;
+};
+
 /* MEMORY controller information structure
  */
 struct mem_ctl_info {
@@ -465,13 +511,19 @@ struct mem_ctl_info {
 	unsigned long (*ctl_page_to_phys) (struct mem_ctl_info * mci,
 					   unsigned long page);
 	int mc_idx;
-	int nr_csrows;
 	struct csrow_info *csrows;
 
+	/* Number of allocated entries at each memory location level */
+	unsigned num_branch;
+	unsigned num_channel;
+	unsigned num_dimm;
+	unsigned num_csrows;
+	unsigned num_cschannel;
+
 	/*
 	 * DIMM info. Will eventually remove the entire csrows_info some day
 	 */
-	unsigned nr_dimms;
+	unsigned tot_dimms;
 	struct dimm_info *dimms;
 
 	/*
@@ -486,12 +538,12 @@ struct mem_ctl_info {
 	const char *dev_name;
 	char proc_name[MC_PROC_NAME_MAX_LEN + 1];
 	void *pvt_info;
-	u32 ue_noinfo_count;	/* Uncorrectable Errors w/o info */
-	u32 ce_noinfo_count;	/* Correctable Errors w/o info */
-	u32 ue_count;		/* Total Uncorrectable Errors for this MC */
-	u32 ce_count;		/* Total Correctable Errors for this MC */
 	unsigned long start_time;	/* mci load start time (in jiffies) */
 
+	/* drivers shouldn't access this struct directly */
+	struct error_counts err;
+	unsigned ce_noinfo_count, ue_noinfo_count;
+
 	struct completion complete;
 
 	/* edac sysfs device control */
@@ -504,7 +556,7 @@ struct mem_ctl_info {
 	 * by the low level driver.
 	 *
 	 * Set by the low level driver to provide attributes at the
-	 * controller level, same level as 'ue_count' and 'ce_count' above.
+	 * controller level.
 	 * An array of structures, NULL terminated
 	 *
 	 * If attributes are desired, then set to array of attributes
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index fee7ed2..cbec44a 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -54,38 +54,60 @@ DEFINE_EVENT(hw_event_class, hw_event_init,
  */
 TRACE_EVENT(mc_error,
 
-	TP_PROTO(unsigned int err_type,
-		 unsigned int mc_index,
-		 const char *label,
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
 		 const char *msg,
-		 const char *detail),
+		 const char *label,
+		 const int branch,
+		 const int channel,
+		 const int dimm,
+		 const int csrow,
+		 const int cschannel,
+		 const char *detail,
+		 const char *driver_detail),
 
-	TP_ARGS(err_type, mc_index, label, msg, detail),
+	TP_ARGS(err_type, mc_index, msg, label, branch, channel, dimm, csrow,
+		cschannel, detail, driver_detail),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	err_type		)
 		__field(	unsigned int,	mc_index		)
-		__string(	label,		label			)
+		__field(	int,		branch			)
+		__field(	int,		channel			)
+		__field(	int,		dimm			)
+		__field(	int,		csrow			)
+		__field(	int,		cschannel		)
 		__string(	msg,		msg			)
+		__string(	label,		label			)
 		__string(	detail,		detail			)
+		__string(	driver_detail,	driver_detail		)
 	),
 
 	TP_fast_assign(
 		__entry->err_type		= err_type;
 		__entry->mc_index		= mc_index;
-		__assign_str(label, label);
+		__entry->branch			= branch;
+		__entry->channel		= channel;
+		__entry->dimm			= dimm;
+		__entry->csrow			= csrow;
+		__entry->cschannel		= cschannel;
 		__assign_str(msg, msg);
+		__assign_str(label, label);
 		__assign_str(detail, detail);
+		__assign_str(driver_detail, driver_detail);
 	),
 
-	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" %s\n",
+	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (location %d.%d.%d.%d.%d %s %s)\n",
 		  __entry->mc_index,
 		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
 			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
 			"Fatal" : "Uncorrected"),
 		  __get_str(msg),
 		  __get_str(label),
-		  __get_str(detail))
+		  __entry->branch, __entry->channel, __entry->dimm,
+		  __entry->csrow, __entry->cschannel,
+		  __get_str(detail),
+		  __get_str(driver_detail))
 );
 
 TRACE_EVENT(mc_out_of_range,
-- 
1.7.8
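
The new struct error_counts keeps one controller-wide counter plus one array per hierarchy level, and drivers pass -1 for any level they cannot resolve. The sketch below shows one plausible way such counters could be updated for a single error type; the real struct has separate ce_* and ue_* sets. struct err_counts_sketch, count_error() and the LVL_* names are hypothetical, not the EDAC core's actual code. The level order (branch, channel, dimm, csrow, cschannel) is taken from the mc_error trace event above.

```c
#include <stdint.h>

/* Hypothetical level indices, in the order used by the mc_error event. */
enum { LVL_BRANCH, LVL_CHANNEL, LVL_DIMM, LVL_CSROW, LVL_CSCHANNEL, NUM_LEVELS };

/* Hypothetical, reduced stand-in for struct error_counts (one error type). */
struct err_counts_sketch {
	uint32_t mc;                  /* controller-wide total, always bumped */
	uint32_t *level[NUM_LEVELS];  /* NULL if that level was not allocated */
	unsigned num[NUM_LEVELS];     /* allocated entries per level */
};

/*
 * Count one error. loc[i] < 0 means "position unknown at this level",
 * matching the -1 convention drivers pass to edac_mc_handle_error().
 * Unknown, unallocated or out-of-range levels are simply skipped.
 */
static void count_error(struct err_counts_sketch *c, const int loc[NUM_LEVELS])
{
	c->mc++;
	for (int i = 0; i < NUM_LEVELS; i++) {
		if (!c->level[i] || loc[i] < 0 || (unsigned)loc[i] >= c->num[i])
			continue;
		c->level[i][loc[i]]++;
	}
}
```

With this scheme, an error that the driver can only pin down to a csrow still increments the controller total and that csrow's counter. It leaves the other levels untouched.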



* [PATCH v3 18/31] edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (16 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 17/31] edac-mc: Allow reporting errors on a non-csrow oriented way Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 19/31] edac: rework memory layer hierarchy description Mauro Carvalho Chehab
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 include/linux/edac.h |   96 +++++++++++++++++++++++++++++++++++++------------
 1 files changed, 72 insertions(+), 24 deletions(-)

diff --git a/include/linux/edac.h b/include/linux/edac.h
index d9fb796..1d707f4 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -153,7 +153,19 @@ static inline void opstate_init(void)
 #define EDAC_MC_LABEL_LEN	31
 #define MC_PROC_NAME_MAX_LEN	7
 
-/* memory devices */
+/**
+ * enum dev_type - describe the type of DRAM chips used on the memory stick
+ * @DEV_UNKNOWN:	Can't be determined, or the MC doesn't support detecting it
+ * @DEV_X1:		1 bit for data
+ * @DEV_X2:		2 bits for data
+ * @DEV_X4:		4 bits for data
+ * @DEV_X8:		8 bits for data
+ * @DEV_X16:		16 bits for data
+ * @DEV_X32:		32 bits for data
+ * @DEV_X64:		64 bits for data
+ *
+ * Typical values are x4 and x8.
+ */
 enum dev_type {
 	DEV_UNKNOWN = 0,
 	DEV_X1,
@@ -174,6 +186,19 @@ enum dev_type {
 #define DEV_FLAG_X32		BIT(DEV_X32)
 #define DEV_FLAG_X64		BIT(DEV_X64)
 
+/**
+ * enum hw_event_mc_err_type - type of the detected error
+ *
+ * @HW_EVENT_ERR_CORRECTED:	Corrected Error - Indicates that an ECC
+ * 				corrected error was detected
+ * @HW_EVENT_ERR_UNCORRECTED:	Uncorrected Error - Indicates an error that
+ *				can't be corrected by ECC, but is not
+ *				fatal (e.g. it is in an unused memory area,
+ *				or the memory controller could recover from
+ *				it, for example by retrying the operation).
+ * @HW_EVENT_ERR_FATAL:		Fatal Error - Uncorrected error that could not
+ *				be recovered.
+ */
 enum hw_event_mc_err_type {
 	HW_EVENT_ERR_CORRECTED,
 	HW_EVENT_ERR_UNCORRECTED,
@@ -182,6 +207,7 @@ enum hw_event_mc_err_type {
 
 /**
  * enum hw_event_error_scope - escope of a memory error
+ *
  * @HW_EVENT_ERR_MC:		error can be anywhere inside the MC
  * @HW_EVENT_SCOPE_MC_BRANCH:	error can be on any DIMM inside the branch
  * @HW_EVENT_SCOPE_MC_CHANNEL:	error can be on any DIMM inside the MC channel
@@ -215,7 +241,7 @@ enum hw_event_error_scope {
 };
 
 /**
- * enum mem_type - memory types
+ * enum mem_type - Type of the memory stick
  *
  * @MEM_EMPTY		Empty csrow
  * @MEM_RESERVED:	Reserved csrow type
@@ -319,18 +345,29 @@ enum mem_type {
 #define MEM_FLAG_DDR3		 BIT(MEM_DDR3)
 #define MEM_FLAG_RDDR3		 BIT(MEM_RDDR3)
 
-/* chipset Error Detection and Correction capabilities and mode */
+/** enum edac_type - Error Detection and Correction capabilities and mode
+ * @EDAC_UNKNOWN:	Unknown if ECC is available
+ * @EDAC_NONE:		Doesn't support ECC
+ * @EDAC_RESERVED:	Reserved ECC type
+ * @EDAC_PARITY:	Detects parity errors
+ * @EDAC_EC:		Error Checking - no correction
+ * @EDAC_SECDED:	Single-bit error correction, double-bit error detection
+ * @EDAC_S2ECD2ED:	Chipkill x2 devices - do these exist?
+ * @EDAC_S4ECD4ED:	Chipkill x4 devices
+ * @EDAC_S8ECD8ED:	Chipkill x8 devices
+ * @EDAC_S16ECD16ED:	Chipkill x16 devices
+ */
 enum edac_type {
-	EDAC_UNKNOWN = 0,	/* Unknown if ECC is available */
-	EDAC_NONE,		/* Doesn't support ECC */
-	EDAC_RESERVED,		/* Reserved ECC type */
-	EDAC_PARITY,		/* Detects parity errors */
-	EDAC_EC,		/* Error Checking - no correction */
-	EDAC_SECDED,		/* Single bit error correction, Double detection */
-	EDAC_S2ECD2ED,		/* Chipkill x2 devices - do these exist? */
-	EDAC_S4ECD4ED,		/* Chipkill x4 devices */
-	EDAC_S8ECD8ED,		/* Chipkill x8 devices */
-	EDAC_S16ECD16ED,	/* Chipkill x16 devices */
+	EDAC_UNKNOWN =	0,
+	EDAC_NONE,
+	EDAC_RESERVED,
+	EDAC_PARITY,
+	EDAC_EC,
+	EDAC_SECDED,
+	EDAC_S2ECD2ED,
+	EDAC_S4ECD4ED,
+	EDAC_S8ECD8ED,
+	EDAC_S16ECD16ED,
 };
 
 #define EDAC_FLAG_UNKNOWN	BIT(EDAC_UNKNOWN)
@@ -343,18 +380,29 @@ enum edac_type {
 #define EDAC_FLAG_S8ECD8ED	BIT(EDAC_S8ECD8ED)
 #define EDAC_FLAG_S16ECD16ED	BIT(EDAC_S16ECD16ED)
 
-/* scrubbing capabilities */
+/** enum scrub_type - scrubbing capabilities
+ * @SCRUB_UNKNOWN:		Unknown if scrubber is available
+ * @SCRUB_NONE:			No scrubber
+ * @SCRUB_SW_PROG:		SW progressive (sequential) scrubbing
+ * @SCRUB_SW_SRC:		Software scrub only errors
+ * @SCRUB_SW_PROG_SRC:		Progressive software scrub from an error
+ * @SCRUB_SW_TUNABLE:		Software scrub frequency is tunable
+ * @SCRUB_HW_PROG:		HW progressive (sequential) scrubbing
+ * @SCRUB_HW_SRC:		Hardware scrub only errors
+ * @SCRUB_HW_PROG_SRC:		Progressive hardware scrub from an error
+ * @SCRUB_HW_TUNABLE:		Hardware scrub frequency is tunable
+ */
 enum scrub_type {
-	SCRUB_UNKNOWN = 0,	/* Unknown if scrubber is available */
-	SCRUB_NONE,		/* No scrubber */
-	SCRUB_SW_PROG,		/* SW progressive (sequential) scrubbing */
-	SCRUB_SW_SRC,		/* Software scrub only errors */
-	SCRUB_SW_PROG_SRC,	/* Progressive software scrub from an error */
-	SCRUB_SW_TUNABLE,	/* Software scrub frequency is tunable */
-	SCRUB_HW_PROG,		/* HW progressive (sequential) scrubbing */
-	SCRUB_HW_SRC,		/* Hardware scrub only errors */
-	SCRUB_HW_PROG_SRC,	/* Progressive hardware scrub from an error */
-	SCRUB_HW_TUNABLE	/* Hardware scrub frequency is tunable */
+	SCRUB_UNKNOWN =	0,
+	SCRUB_NONE,
+	SCRUB_SW_PROG,
+	SCRUB_SW_SRC,
+	SCRUB_SW_PROG_SRC,
+	SCRUB_SW_TUNABLE,
+	SCRUB_HW_PROG,
+	SCRUB_HW_SRC,
+	SCRUB_HW_PROG_SRC,
+	SCRUB_HW_TUNABLE
 };
 
 #define SCRUB_FLAG_SW_PROG	BIT(SCRUB_SW_PROG)
-- 
1.7.8
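
The DEV_FLAG_* macros in edac.h turn each documented enum dev_type value into a single bit. A controller's supported device widths can therefore be stored as a mask. This is a minimal user-space sketch, assuming a local stand-in for the kernel's BIT() macro. The enum is a local copy following the documented order, and dev_type_supported() is a hypothetical helper, not an EDAC API.

```c
#include <stdbool.h>

/* User-space stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Local copy of enum dev_type, in the order documented above. */
enum dev_type {
	DEV_UNKNOWN = 0,
	DEV_X1,
	DEV_X2,
	DEV_X4,
	DEV_X8,
	DEV_X16,
	DEV_X32,
	DEV_X64,
};

#define DEV_FLAG_X4	BIT(DEV_X4)
#define DEV_FLAG_X8	BIT(DEV_X8)

/* Hypothetical helper: does a capability mask include device width t? */
static bool dev_type_supported(unsigned long cap_mask, enum dev_type t)
{
	return (cap_mask & BIT(t)) != 0;
}
```

A controller that handles the typical x4 and x8 parts would advertise `DEV_FLAG_X4 | DEV_FLAG_X8`. Testing it against DEV_X16 then returns false.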



* [PATCH v3 19/31] edac: rework memory layer hierarchy description
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (17 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 18/31] edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 20/31] edac: Export MC hierarchy counters for CE and UE Mauro Carvalho Chehab
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The old way of allocating data was confusing. It also
introduced a misconception about what a "channel" is
when csrows are used.

Instead, use a more generic approach: break the memory
controller into layers. Drivers are free to describe the
layers in whatever way best matches their memory
architecture.
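
With layers, the number of DIMM slots is the product of the layer sizes. A per-layer position identifies one slot, such as the csrow and channel that the converted drivers now pass to edac_mc_handle_error(). The sketch below illustrates the idea with hypothetical types (struct mc_layer, enum layer_type). These only mirror the type/size/is_csrow fields used in this patch. The row-major flattening order is an assumption, not necessarily what edac_mc.c does.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the layer description. */
enum layer_type { LAYER_CHIP_SELECT, LAYER_CHANNEL };

struct mc_layer {
	enum layer_type type;
	unsigned size;		/* number of entries at this layer */
	bool is_csrow;		/* layer maps to the legacy csrow concept */
};

/* Total number of DIMM slots: the product of all layer sizes. */
static unsigned total_dimms(const struct mc_layer *layers, unsigned n)
{
	unsigned tot = 1;

	for (unsigned i = 0; i < n; i++)
		tot *= layers[i].size;
	return tot;
}

/*
 * Flatten a per-layer position into a row-major slot index.
 * Returns -1 if any position is unknown (negative) or out of range.
 */
static int dimm_index(const struct mc_layer *layers, unsigned n,
		      const int *pos)
{
	int idx = 0;

	for (unsigned i = 0; i < n; i++) {
		if (pos[i] < 0 || (unsigned)pos[i] >= layers[i].size)
			return -1;
		idx = idx * (int)layers[i].size + pos[i];
	}
	return idx;
}
```

For example, 4 chip-select rows by 2 channels gives 8 slots. Under this ordering, csrow 3 on channel 1 maps to index 3 * 2 + 1 = 7. A position with an unknown channel (-1) resolves to no single slot.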

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c       |  108 +++----
 drivers/edac/amd76x_edac.c      |   35 ++-
 drivers/edac/cell_edac.c        |   31 +-
 drivers/edac/cpc925_edac.c      |   26 +-
 drivers/edac/e752x_edac.c       |   44 ++--
 drivers/edac/e7xxx_edac.c       |   37 +--
 drivers/edac/edac_core.h        |   54 +---
 drivers/edac/edac_mc.c          |  684 +++++++++++++++------------------------
 drivers/edac/edac_mc_sysfs.c    |   93 +++---
 drivers/edac/i3000_edac.c       |   34 +-
 drivers/edac/i3200_edac.c       |   36 +-
 drivers/edac/i5000_edac.c       |   48 ++--
 drivers/edac/i5100_edac.c       |   26 +-
 drivers/edac/i5400_edac.c       |   49 ++--
 drivers/edac/i7300_edac.c       |   46 +--
 drivers/edac/i7core_edac.c      |  168 ++---------
 drivers/edac/i82443bxgx_edac.c  |   28 +-
 drivers/edac/i82860_edac.c      |   36 +--
 drivers/edac/i82875p_edac.c     |   35 +-
 drivers/edac/i82975x_edac.c     |   33 +-
 drivers/edac/mpc85xx_edac.c     |   26 +-
 drivers/edac/mv64x60_edac.c     |   26 +-
 drivers/edac/pasemi_edac.c      |   25 +-
 drivers/edac/ppc4xx_edac.c      |   34 +-
 drivers/edac/r82600_edac.c      |   33 +-
 drivers/edac/sb_edac.c          |   94 ++-----
 drivers/edac/tile_edac.c        |   18 +-
 drivers/edac/x38_edac.c         |   37 ++-
 include/linux/edac.h            |  122 ++++----
 include/trace/events/hw_event.h |   27 +--
 30 files changed, 838 insertions(+), 1255 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 139e774..1b374b5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1049,24 +1049,24 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	if (!src_mci) {
 		amd64_mc_err(mci, "failed to map error addr 0x%lx to a node\n",
 			     (unsigned long)sys_addr);
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     page, offset, syndrome,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "failed to map error addr to a node");
+				     "failed to map error addr to a node",
+				     NULL);
 		return;
 	}
 
 	/* Now map the sys_addr to a CSROW */
 	csrow = sys_addr_to_csrow(src_mci, sys_addr);
 	if (csrow < 0) {
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     page, offset, syndrome,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "failed to map error addr to a csrow");
+				     "failed to map error addr to a csrow",
+				     NULL);
 		return;
 	}
 
@@ -1082,12 +1082,12 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 			amd64_mc_warn(src_mci, "unknown syndrome 0x%04x - "
 				      "possible error reporting race\n",
 				      syndrome);
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 					     page, offset, syndrome,
-					     -1, -1, -1, csrow, -1,
+					     csrow, -1, -1,
 					     EDAC_MOD_STR,
-					     "unknown syndrome - possible error reporting race");
+					     "unknown syndrome - possible error reporting race",
+					     NULL);
 			return;
 		}
 	} else {
@@ -1102,11 +1102,10 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 		channel = ((sys_addr & BIT(3)) != 0);
 	}
 
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-			     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, src_mci,
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, src_mci,
 			     page, offset, syndrome,
-			     -1, -1, -1, csrow, channel,
-			     EDAC_MOD_STR, "");
+			     csrow, channel, -1,
+			     EDAC_MOD_STR, "", NULL);
 }
 
 static int ddr2_cs_size(unsigned i, bool dct_width)
@@ -1587,19 +1586,18 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	struct amd64_pvt *pvt = mci->pvt_info;
 	u32 page, offset;
 	int nid, csrow, chan = 0;
-	enum hw_event_error_scope scope;
 
 	error_address_to_page_and_offset(sys_addr, &page, &offset);
 
 	csrow = f1x_translate_sysaddr_to_cs(pvt, sys_addr, &nid, &chan);
 
 	if (csrow < 0) {
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     page, offset, syndrome,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "failed to map error addr to a csrow");
+				     "failed to map error addr to a csrow",
+				     NULL);
 		return;
 	}
 
@@ -1611,22 +1609,16 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	if (dct_ganging_enabled(pvt))
 		chan = get_channel_from_ecc_syndrome(mci, syndrome);
 
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				HW_EVENT_SCOPE_MC, mci,
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				page, offset, syndrome,
-				-1, -1, -1, -1, -1,
+				-1, -1, -1,
 				EDAC_MOD_STR,
-				"failed to map error addr to a csrow");
-	if (chan >= 0)
-		scope = HW_EVENT_SCOPE_MC_CSROW_CHANNEL;
-	else
-		scope = HW_EVENT_SCOPE_MC_CSROW;
+				"failed to map error addr to a csrow", NULL);
 
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				HW_EVENT_SCOPE_MC, mci,
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				page, offset, syndrome,
-				-1, -1, -1, csrow, chan,
-				EDAC_MOD_STR, "");
+				csrow, chan, -1,
+				EDAC_MOD_STR, "", NULL);
 }
 
 /*
@@ -1907,12 +1899,12 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
 	/* Ensure that the Error Address is VALID */
 	if (!(m->status & MCI_STATUS_ADDRV)) {
 		amd64_mc_err(mci, "HW has no ERROR_ADDRESS available\n");
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     0, 0, 0,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "HW has no ERROR_ADDRESS available");
+				     "HW has no ERROR_ADDRESS available",
+				     NULL);
 		return;
 	}
 
@@ -1936,12 +1928,12 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 
 	if (!(m->status & MCI_STATUS_ADDRV)) {
 		amd64_mc_err(mci, "HW has no ERROR_ADDRESS available\n");
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     0, 0, 0,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "HW has no ERROR_ADDRESS available");
+				     "HW has no ERROR_ADDRESS available",
+				     NULL);
 		return;
 	}
 
@@ -1956,12 +1948,11 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 	if (!src_mci) {
 		amd64_mc_err(mci, "ERROR ADDRESS (0x%lx) NOT mapped to a MC\n",
 				  (unsigned long)sys_addr);
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     page, offset, 0,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "ERROR ADDRESS NOT mapped to a MC");
+				     "ERROR ADDRESS NOT mapped to a MC", NULL);
 		return;
 	}
 
@@ -1971,18 +1962,17 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 	if (csrow < 0) {
 		amd64_mc_err(mci, "ERROR_ADDRESS (0x%lx) NOT mapped to CS\n",
 				  (unsigned long)sys_addr);
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     page, offset, 0,
-				     -1, -1, -1, -1, -1,
+				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "ERROR ADDRESS NOT mapped to CS");
+				     "ERROR ADDRESS NOT mapped to CS",
+				     NULL);
 	} else {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     page, offset, 0,
-				     -1, -1, -1, csrow, -1,
-				     EDAC_MOD_STR, "");
+				     csrow, -1, -1,
+				     EDAC_MOD_STR, "", NULL);
 	}
 }
 
@@ -2542,6 +2532,7 @@ static int amd64_init_one_instance(struct pci_dev *F2)
 	struct amd64_pvt *pvt = NULL;
 	struct amd64_family_type *fam_type = NULL;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	int err = 0, ret;
 	u8 nid = get_node_id(F2);
 
@@ -2576,10 +2567,13 @@ static int amd64_init_one_instance(struct pci_dev *F2)
 		goto err_siblings;
 
 	ret = -ENOMEM;
-	/* FIXME: Assuming one DIMM per csrow channel */
-	mci = edac_mc_alloc(nid, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, pvt->csels[0].b_cnt * pvt->channel_count,
-			    pvt->csels[0].b_cnt, pvt->channel_count, nid);
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = pvt->csels[0].b_cnt;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = pvt->channel_count;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, false, 0);
 	if (!mci)
 		goto err_siblings;
 
diff --git a/drivers/edac/amd76x_edac.c b/drivers/edac/amd76x_edac.c
index 7e6bbf8..4f3e54a 100644
--- a/drivers/edac/amd76x_edac.c
+++ b/drivers/edac/amd76x_edac.c
@@ -145,12 +145,10 @@ static int amd76x_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors) {
 			row = (info->ecc_mode_status >> 4) & 0xf;
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, mci->csrows[row].first_page,
-					     0, 0,
-					     -1, -1, row, row, 0,
-					     mci->ctl_name, "");
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
+					     mci->csrows[row].first_page, 0, 0,
+					     row, 0, -1,
+					     mci->ctl_name, "", NULL);
 		}
 	}
 
@@ -162,12 +160,10 @@ static int amd76x_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors) {
 			row = info->ecc_mode_status & 0xf;
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, mci->csrows[row].first_page,
-					     0, 0,
-					     -1, -1, row, row, 0,
-					     mci->ctl_name, "");
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+					     mci->csrows[row].first_page, 0, 0,
+					     row, 0, -1,
+					     mci->ctl_name, "", NULL);
 		}
 	}
 
@@ -239,7 +235,8 @@ static int amd76x_probe1(struct pci_dev *pdev, int dev_idx)
 		EDAC_SECDED,
 		EDAC_SECDED
 	};
-	struct mem_ctl_info *mci = NULL;
+	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	u32 ems;
 	u32 ems_mode;
 	struct amd76x_error_info discard;
@@ -247,9 +244,15 @@ static int amd76x_probe1(struct pci_dev *pdev, int dev_idx)
 	debugf0("%s()\n", __func__);
 	pci_read_config_dword(pdev, AMD76X_ECC_MODE_STATUS, &ems);
 	ems_mode = (ems >> 10) & 0x3;
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW,
-			    0, 0, AMD76X_NR_CSROWS,
-			    AMD76X_NR_CSROWS, 1, 0);
+
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = AMD76X_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = 1;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
+
 	if (mci == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/edac/cell_edac.c b/drivers/edac/cell_edac.c
index abe06a4..39616a3 100644
--- a/drivers/edac/cell_edac.c
+++ b/drivers/edac/cell_edac.c
@@ -48,11 +48,9 @@ static void cell_edac_count_ce(struct mem_ctl_info *mci, int chan, u64 ar)
 	syndrome = (ar & 0x000000001fe00000ul) >> 21;
 
 	/* TODO: Decoding of the error address */
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
-				csrow->first_page + pfn, offset, syndrome,
-				-1, -1, -1, 0, chan,
-				"", "");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+			     csrow->first_page + pfn, offset, syndrome,
+			     0, chan, -1, "", "", NULL);
 }
 
 static void cell_edac_count_ue(struct mem_ctl_info *mci, int chan, u64 ar)
@@ -72,11 +70,9 @@ static void cell_edac_count_ue(struct mem_ctl_info *mci, int chan, u64 ar)
 	offset = address & ~PAGE_MASK;
 
 	/* TODO: Decoding of the error address */
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
-				csrow->first_page + pfn, offset, 0,
-				-1, -1, -1, 0, chan,
-				"", "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
+			     csrow->first_page + pfn, offset, 0,
+			     0, chan, -1, "", "", NULL);
 }
 
 static void cell_edac_check(struct mem_ctl_info *mci)
@@ -163,7 +159,7 @@ static void __devinit cell_edac_init_csrows(struct mem_ctl_info *mci)
 			"Initialized on node %d, chanmask=0x%x,"
 			" first_page=0x%lx, nr_pages=0x%x\n",
 			priv->node, priv->chanmask,
-			csrow->first_page, dimm->nr_pages);
+			csrow->first_page, nr_pages);
 		break;
 	}
 }
@@ -172,6 +168,7 @@ static int __devinit cell_edac_probe(struct platform_device *pdev)
 {
 	struct cbe_mic_tm_regs __iomem	*regs;
 	struct mem_ctl_info		*mci;
+	struct edac_mc_layer		layers[2];
 	struct cell_edac_priv		*priv;
 	u64				reg;
 	int				rc, chanmask, num_chans;
@@ -200,9 +197,15 @@ static int __devinit cell_edac_probe(struct platform_device *pdev)
 
 	/* Allocate & init EDAC MC data structure */
 	num_chans = chanmask == 3 ? 2 : 1;
-	mci = edac_mc_alloc(pdev->id, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, num_chans,
-			    1, num_chans, sizeof(struct cell_edac_priv));
+
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = 1;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = num_chans;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(pdev->id, ARRAY_SIZE(layers), layers, false,
+			    sizeof(struct cell_edac_priv));
 	if (mci == NULL)
 		return -ENOMEM;
 	priv = mci->pvt_info;
diff --git a/drivers/edac/cpc925_edac.c b/drivers/edac/cpc925_edac.c
index 4a25b92..eb6297d 100644
--- a/drivers/edac/cpc925_edac.c
+++ b/drivers/edac/cpc925_edac.c
@@ -555,20 +555,18 @@ static void cpc925_mc_check(struct mem_ctl_info *mci)
 	if (apiexcp & CECC_EXCP_DETECTED) {
 		cpc925_mc_printk(mci, KERN_INFO, "DRAM CECC Fault\n");
 		channel = cpc925_mc_find_channel(mci, syndrome);
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     pfn, offset, syndrome,
-				     -1, -1, -1, csrow, channel,
-				     mci->ctl_name, "");
+				     csrow, channel, -1,
+				     mci->ctl_name, "", NULL);
 	}
 
 	if (apiexcp & UECC_EXCP_DETECTED) {
 		cpc925_mc_printk(mci, KERN_INFO, "DRAM UECC Fault\n");
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     pfn, offset, 0,
-				     -1, -1, -1, csrow, -1,
-				     mci->ctl_name, "");
+				     csrow, -1, -1,
+				     mci->ctl_name, "", NULL);
 	}
 
 	cpc925_mc_printk(mci, KERN_INFO, "Dump registers:\n");
@@ -940,6 +938,7 @@ static int __devinit cpc925_probe(struct platform_device *pdev)
 {
 	static int edac_mc_idx;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	void __iomem *vbase;
 	struct cpc925_mc_pdata *pdata;
 	struct resource *r;
@@ -976,9 +975,14 @@ static int __devinit cpc925_probe(struct platform_device *pdev)
 	}
 
 	nr_channels = cpc925_mc_get_channels(vbase) + 1;
-	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, CPC925_NR_CSROWS * nr_channels,
-			    CPC925_NR_CSROWS, nr_channels,
+
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = CPC925_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = nr_channels;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), layers, false,
 			    sizeof(struct cpc925_mc_pdata));
 	if (!mci) {
 		cpc925_printk(KERN_ERR, "No memory for mem_ctl_info\n");
diff --git a/drivers/edac/e752x_edac.c b/drivers/edac/e752x_edac.c
index 813d965..1acce46 100644
--- a/drivers/edac/e752x_edac.c
+++ b/drivers/edac/e752x_edac.c
@@ -353,11 +353,10 @@ static void do_process_ce(struct mem_ctl_info *mci, u16 error_one,
 	channel = !(error_one & 1);
 
 	/* e752x mc reads 34:6 of the DRAM linear address */
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-			     HW_EVENT_SCOPE_MC, mci,
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 			     page, offset_in_page(sec1_add << 4), sec1_syndrome,
-			     -1, -1, -1, row, channel,
-			     "e752x CE", "");
+			     row, channel, -1,
+			     "e752x CE", "", NULL);
 }
 
 static inline void process_ce(struct mem_ctl_info *mci, u16 error_one,
@@ -391,12 +390,11 @@ static void do_process_ue(struct mem_ctl_info *mci, u16 error_one,
 			edac_mc_find_csrow_by_page(mci, block_page);
 
 		/* e752x mc reads 34:6 of the DRAM linear address */
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 					block_page,
 					offset_in_page(error_2b << 4), 0,
-					-1, -1, -1, row, -1,
-					"e752x UE from Read", "");
+					row, -1, -1,
+					"e752x UE from Read", "", NULL);
 
 	}
 	if (error_one & 0x0404) {
@@ -411,12 +409,11 @@ static void do_process_ue(struct mem_ctl_info *mci, u16 error_one,
 			edac_mc_find_csrow_by_page(mci, block_page);
 
 		/* e752x mc reads 34:6 of the DRAM linear address */
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 					block_page,
 					offset_in_page(error_2b << 4), 0,
-					-1, -1, -1, row, -1,
-					"e752x UE from Scruber", "");
+					row, -1, -1,
+					"e752x UE from Scruber", "", NULL);
 	}
 }
 
@@ -439,10 +436,9 @@ static inline void process_ue_no_info_wr(struct mem_ctl_info *mci,
 		return;
 
 	debugf3("%s()\n", __func__);
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-			     -1, -1, -1, -1, -1,
-			     "e752x UE log memory write", "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+			     -1, -1, -1,
+			     "e752x UE log memory write", "", NULL);
 }
 
 static void do_process_ded_retry(struct mem_ctl_info *mci, u16 error,
@@ -1248,6 +1244,7 @@ static int e752x_probe1(struct pci_dev *pdev, int dev_idx)
 	u16 pci_data;
 	u8 stat8;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct e752x_pvt *pvt;
 	u16 ddrcsr;
 	int drc_chan;		/* Number of channels 0=1chan,1=2chan */
@@ -1274,13 +1271,16 @@ static int e752x_probe1(struct pci_dev *pdev, int dev_idx)
 	/* Dual channel = 1, Single channel = 0 */
 	drc_chan = dual_channel_active(ddrcsr);
 
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, E752X_NR_CSROWS * (drc_chan + 1),
-			    E752X_NR_CSROWS, drc_chan + 1, sizeof(*pvt));
-
-	if (mci == NULL) {
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = E752X_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = drc_chan + 1;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
+			    false, sizeof(*pvt));
+	if (mci == NULL)
 		return -ENOMEM;
-	}
 
 	debugf3("%s(): init mci\n", __func__);
 	mci->mtype_cap = MEM_FLAG_RDDR;
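For illustration, here is a user-space sketch of how edac_mc_alloc() reduces a layer array, like the one e752x_probe1() builds, to totals. The DIMM count is the product of all layer sizes. The csrow and cschannel counts used by the old sysfs API are the products of the is_csrow layers and of the remaining layers, respectively. The struct and enum names are hypothetical stand-ins for struct edac_mc_layer and its layer types. The value of 8 chip selects is only an example and is not necessarily what E752X_NR_CSROWS expands to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the layer types introduced by this patch */
enum layer_type { LAYER_BRANCH, LAYER_CHANNEL, LAYER_SLOT, LAYER_CHIP_SELECT };

/* Hypothetical mirror of struct edac_mc_layer (type, size, is_csrow) */
struct layer_sketch {
	enum layer_type type;
	unsigned size;
	bool is_csrow;
};

struct mc_totals {
	unsigned dimms;		/* product of all layer sizes */
	unsigned csrows;	/* product of the is_csrow layer sizes */
	unsigned cschannels;	/* product of the other layer sizes */
};

/* Mirrors the totals loop at the top of edac_mc_alloc() */
static struct mc_totals compute_totals(const struct layer_sketch *layers,
				       size_t n_layers)
{
	struct mc_totals t = { 1, 1, 1 };
	size_t i;

	for (i = 0; i < n_layers; i++) {
		t.dimms *= layers[i].size;
		if (layers[i].is_csrow)
			t.csrows *= layers[i].size;
		else
			t.cschannels *= layers[i].size;
	}
	return t;
}
```

With this layout, a dual-channel controller with 8 chip selects ends up with 16 DIMM entries, one per csrow/channel pair.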
diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 01f64d3..f59dc0c 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -219,20 +219,15 @@ static void process_ce(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
 	row = edac_mc_find_csrow_by_page(mci, page);
 	/* convert syndrome to channel */
 	channel = e7xxx_find_channel(syndrome);
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-			     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
-			     page, 0, syndrome,
-			     -1, -1, -1, row, channel,
-			     "e7xxx CE", "");
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, page, 0, syndrome,
+			     row, channel, -1, "e7xxx CE", "", NULL);
 }
 
 static void process_ce_no_info(struct mem_ctl_info *mci)
 {
 	debugf3("%s()\n", __func__);
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-			     -1, -1, -1, -1, -1,
-			     "e7xxx CE log register overflow", "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0, -1, -1, -1,
+			     "e7xxx CE log register overflow", "", NULL);
 }
 
 static void process_ue(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
@@ -247,20 +242,16 @@ static void process_ue(struct mem_ctl_info *mci, struct e7xxx_error_info *info)
 	block_page = error_2b >> 6;	/* convert to 4k address */
 	row = edac_mc_find_csrow_by_page(mci, block_page);
 
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-			     HW_EVENT_SCOPE_MC_CSROW, mci, block_page, 0, 0,
-			     -1, -1, -1, row, -1,
-			     "e7xxx UE", "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, block_page, 0, 0,
+			     row, -1, -1, "e7xxx UE", "", NULL);
 }
 
 static void process_ue_no_info(struct mem_ctl_info *mci)
 {
 	debugf3("%s()\n", __func__);
 
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-			     -1, -1, -1, -1, -1,
-			     "e7xxx UE log register overflow", "");
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0, -1, -1, -1,
+			     "e7xxx UE log register overflow", "", NULL);
 }
 
 static void e7xxx_get_error_info(struct mem_ctl_info *mci,
@@ -431,6 +422,7 @@ static int e7xxx_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	u16 pci_data;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	struct e7xxx_pvt *pvt = NULL;
 	u32 drc;
 	int drc_chan;
@@ -449,10 +441,13 @@ static int e7xxx_probe1(struct pci_dev *pdev, int dev_idx)
 	 * will map the rank. So, an error to either channel should be
 	 * attributed to the same dimm.
 	 */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, E7XXX_NR_DIMMS,
-			    E7XXX_NR_CSROWS, drc_chan + 1, sizeof(*pvt));
-
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = E7XXX_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = drc_chan + 1;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 	if (mci == NULL)
 		return -ENOMEM;
 
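The converted e7xxx and e752x call sites pass (row, channel, -1) as layer0..layer2, which matches the layer order declared at probe time. The sketch below shows how a DIMM's location[] tuple maps to and from its index in mci->dimms[]. It assumes the numbering produced by the per_layer_count loop in edac_mc_alloc(), where the last layer varies fastest, like an odometer. MAX_LAYERS and both helper names are hypothetical.

```c
#include <assert.h>

#define MAX_LAYERS 3	/* hypothetical mirror of EDAC_MAX_LAYERS */

/*
 * Fill loc[0..n_layers-1] with the per-layer position of the DIMM at
 * flat index idx; the last layer varies fastest.
 */
static void dimm_location(unsigned idx, const unsigned *sizes,
			  int n_layers, int loc[MAX_LAYERS])
{
	int j;

	for (j = n_layers - 1; j >= 0; j--) {
		loc[j] = idx % sizes[j];
		idx /= sizes[j];
	}
}

/* Inverse mapping: flat index of the DIMM located at loc[] */
static unsigned dimm_index(const int loc[MAX_LAYERS], const unsigned *sizes,
			   int n_layers)
{
	unsigned idx = 0;
	int j;

	for (j = 0; j < n_layers; j++)
		idx = idx * sizes[j] + loc[j];
	return idx;
}
```

For a layout of 8 csrows by 2 channels, DIMM 5 sits at csrow 2, channel 1.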
diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index e4961fd..1d421d3 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -448,35 +448,10 @@ static inline void pci_write_bits32(struct pci_dev *pdev, int offset,
 
 #endif				/* CONFIG_PCI */
 
-/**
- * enum edac_alloc_fill_strategy - Controls the way csrows/cschannels are mapped
- * @EDAC_ALLOC_FILL_CSROW_CSCHANNEL:	csrows are rows, cschannels are channel.
- *					This is the default and should be used
- *					when the memory controller is able to
- *					see csrows/cschannels. The dimms are
- *					associated with cschannels.
- * @EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW:	mc_branch/mc_channel are mapped as
- *					cschannel. DIMMs inside each channel are
- *					mapped as csrows. Most FBDIMMs drivers
- *					use this model.
- *@EDAC_ALLOC_FILL_PRIV:		The driver uses its own mapping model.
- *					So, the core will leave the csrows
- *					struct unitialized, leaving to the
- *					driver the task of filling it.
- */
-enum edac_alloc_fill_strategy {
-	EDAC_ALLOC_FILL_CSROW_CSCHANNEL = 0,
-	EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW,
-	EDAC_ALLOC_FILL_PRIV,
-};
-
-struct mem_ctl_info *edac_mc_alloc(int edac_index,
-				   enum edac_alloc_fill_strategy fill_strategy,
-				   unsigned num_branch,
-				   unsigned num_channel,
-				   unsigned num_dimm,
-				   unsigned nr_csrows,
-				   unsigned num_cschans,
+struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
+				   unsigned n_layers,
+				   struct edac_mc_layer *layers,
+				   bool rev_order,
 				   unsigned sz_pvt);
 extern int edac_mc_add_mc(struct mem_ctl_info *mci);
 extern void edac_mc_free(struct mem_ctl_info *mci);
@@ -485,19 +460,17 @@ extern struct mem_ctl_info *find_mci_by_dev(struct device *dev);
 extern struct mem_ctl_info *edac_mc_del_mc(struct device *dev);
 extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
 				      unsigned long page);
-void edac_mc_handle_error(enum hw_event_mc_err_type type,
-			  enum hw_event_error_scope scope,
+void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  struct mem_ctl_info *mci,
-			  unsigned long page_frame_number,
-			  unsigned long offset_in_page,
-			  unsigned long syndrome,
-			  int mc_branch,
-			  int mc_channel,
-			  int mc_dimm_number,
-			  int csrow,
-			  int cschannel,
+			  const unsigned long page_frame_number,
+			  const unsigned long offset_in_page,
+			  const unsigned long syndrome,
+			  const int layer0,
+			  const int layer1,
+			  const int layer2,
 			  const char *msg,
-			  const char *other_detail);
+			  const char *other_detail,
+			  const void *mcelog);
 
 /*
  * edac_device APIs
@@ -509,6 +482,7 @@ extern void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
 extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
 				int inst_nr, int block_nr, const char *msg);
 extern int edac_device_alloc_index(void);
+extern const char *edac_layer_name[];
 
 /*
  * edac_pci APIs
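Because edac_layer_name[] is now exported, the human-readable location can be derived from the layer types instead of being hardcoded per driver. The sketch below builds that string and skips any layer whose position is negative (not applicable or unknown), roughly as edac_mc_handle_error() does with sprintf(). This version uses snprintf() to stay within the buffer. The enum, table and function names are hypothetical stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum layer_type { LAYER_BRANCH, LAYER_CHANNEL, LAYER_SLOT, LAYER_CHIP_SELECT };

/* Hypothetical mirror of edac_layer_name[] from this patch */
static const char *const layer_name[] = {
	[LAYER_BRANCH] = "branch",
	[LAYER_CHANNEL] = "channel",
	[LAYER_SLOT] = "slot",
	[LAYER_CHIP_SELECT] = "csrow",
};

/*
 * Build a "csrow 2 channel 1 " style string into buf (len must be > 0),
 * skipping layers whose position is negative.
 */
static void format_location(char *buf, size_t len,
			    const enum layer_type *types,
			    const int *pos, int n_layers)
{
	size_t used = 0;
	int i;

	buf[0] = '\0';
	for (i = 0; i < n_layers; i++) {
		int n;

		if (pos[i] < 0)
			continue;
		n = snprintf(buf + used, len - used, "%s %d ",
			     layer_name[types[i]], pos[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated: stop appending */
		used += n;
	}
}
```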
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 55760bc..6e8faf3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -53,13 +53,18 @@ static void edac_mc_dump_channel(struct csrow_channel_info *chan)
 
 static void edac_mc_dump_dimm(struct dimm_info *dimm)
 {
+	int i;
+
 	debugf4("\tdimm = %p\n", dimm);
 	debugf4("\tdimm->label = '%s'\n", dimm->label);
 	debugf4("\tdimm->nr_pages = 0x%x\n", dimm->nr_pages);
-	debugf4("\tdimm location %d.%d.%d.%d.%d\n",
-		dimm->mc_branch, dimm->mc_channel,
-		dimm->mc_dimm_number,
-		dimm->csrow, dimm->cschannel);
+	debugf4("\tdimm location ");
+	for (i = 0; i < dimm->mci->n_layers; i++) {
+		printk(KERN_CONT "%d", dimm->location[i]);
+		if (i < dimm->mci->n_layers - 1)
+			printk(KERN_CONT ".");
+	}
+	printk(KERN_CONT "\n");
 	debugf4("\tdimm->grain = %d\n", dimm->grain);
 	debugf4("\tdimm->nr_pages = 0x%x\n", dimm->nr_pages);
 }
@@ -160,52 +165,25 @@ void *edac_align_ptr(void **p, unsigned size, int quant)
 /**
  * edac_mc_alloc: Allocate and partially fills a struct mem_ctl_info structure
  * @edac_index:		Memory controller number
- * @fill_strategy:	csrow/cschannel filling strategy
- * @num_branch:		Number of memory controller branches
- * @num_channel:	Number of memory controller channels
- * @num_dimm:		Number of dimms per memory controller channel
- * @num_csrows:		Number of CWROWS accessed via the memory controller
- * @num_cschannel:	Number of csrows channels
+ * @n_layers:		Number of layers in the MC hierarchy
+ * @layers:		Describes each layer as seen by the Memory Controller
+ * @rev_order:		Fill csrows/cs channels in reverse order
  * @size_pvt:		size of private storage needed
  *
- * This routine supports 3 modes of DIMM mapping:
- *	1) the ones that accesses DRAM's via some bus interface (FB-DIMM
- * and RAMBUS memory controllers) or that don't have chip select view
- *
- * In this case, a branch is generally a group of 2 channels, used generally
- * in  parallel to provide 128 bits data.
- *
- * In the case of FB-DIMMs, the dimm is addressed via the SPD Address
- * input selection, used by the AMB to select the DIMM. The MC channel
- * corresponds to the Memory controller channel bus used to see a series
- * of FB-DIMM's.
- *
- * num_branch, num_channel and num_dimm should point to the real
- *	parameters of the memory controller.
- *
- * The total number of dimms is num_branch * num_channel * num_dimm
- *
- * According with JEDEC No. 205, up to 8 FB-DIMMs are possible per channel. Of
- * course, controllers may have a lower limit.
+ * FIXME: rev_order seems to be unneeded, as all callers pass false.
+ * Tests are required, but if this is the case, this field can just be dropped.
  *
- * num_csrows/num_cschannel should point to the emulated parameters.
- * The total number of cschannels (num_csrows * num_cschannel) should be a
- * multiple of the total number dimms, e. g:
- *  factor = (num_csrows * num_cschannel)/(num_branch * num_channel * num_dimm)
- * should be an integer (typically: it is 1 or num_cschannel)
+ * FIXME: drivers handle multi-rank memories in different ways: in some
+ * drivers, one multi-rank memory is mapped as one DIMM, while, in others,
+ * a single multi-rank DIMM would be mapped into several "dimms".
  *
- *	2) The MC uses CSROWS/CS CHANNELS to directly select a DRAM chip.
- * One dimm chip exists on every cs channel, for single-rank memories.
- *	num_branch and num_channel should be 0
- *	num_dimm should be the total number of dimms
- *	num_csrows * num_cschannel should be equal to num_dimm
+ * Non-csrow based drivers (like FB-DIMM and RAMBUS ones) will likely report
+ * such DIMMs properly, but the csrow-based ones will likely do the wrong
+ * thing, as two chip select values are used for dual-rank memories (and 4 for
+ * quad-rank ones). I suspect that this issue could be solved inside the EDAC
+ * core for SDRAM memories, but it requires further study of JEDEC JESD 21C.
  *
- *	3)The MC uses CSROWS/CS CHANNELS. One dimm chip exists on every
- * csrow. The cs channel is used to indicate the defective chip(s) inside
- * the memory stick.
- *	num_branch and num_channel should be 0
- *	num_dimm should be the total number of dimms
- *	num_csrows should be equal to num_dimm
+ * In summary, solving this issue is not easy, as it requires a lot of testing.
  *
  * Everything is kmalloc'ed as one big chunk - more efficient.
  * Only can be used if all structures have the same lifetime - otherwise
@@ -217,87 +195,64 @@ void *edac_align_ptr(void **p, unsigned size, int quant)
  *	NULL allocation failed
  *	struct mem_ctl_info pointer
  */
-struct mem_ctl_info *edac_mc_alloc(int edac_index,
-				   enum edac_alloc_fill_strategy fill_strategy,
-				   unsigned num_branch,
-				   unsigned num_channel,
-				   unsigned num_dimm,
-				   unsigned num_csrows,
-				   unsigned num_cschannel,
+struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
+				   unsigned n_layers,
+				   struct edac_mc_layer *layers,
+				   bool rev_order,
 				   unsigned sz_pvt)
 {
 	void *ptr;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer *lay;
 	struct csrow_info *csi, *csr;
 	struct csrow_channel_info *chi, *chp, *chan;
 	struct dimm_info *dimm;
-	u32 *ce_branch, *ce_channel, *ce_dimm, *ce_csrow, *ce_cschannel;
-	u32 *ue_branch, *ue_channel, *ue_dimm, *ue_csrow, *ue_cschannel;
+	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
 	void *pvt;
-	unsigned size, tot_dimms, count, dimm_div;
-	int i;
+	unsigned size, tot_dimms, count, per_layer_count[EDAC_MAX_LAYERS];
+	unsigned tot_csrows, tot_cschannels;
+	int i, j;
 	int err;
-	int mc_branch, mc_channel, mc_dimm_number, csrow, cschannel;
 	int row, chn;
 
+	BUG_ON(n_layers > EDAC_MAX_LAYERS);
 	/*
-	 * While we expect that non-pertinent values will be filled with
-	 * 0, in order to provide a way for this routine to detect if the
-	 * EDAC is emulating the old sysfs API, we can't actually accept
-	 * 0, as otherwise, a multiply by 0 whould hapen.
+	 * Calculate the total number of DIMMs, csrows and cschannels; the
+	 * csrow/cschannel totals are used by the old API emulation
 	 */
-	if (num_branch <= 0)
-		num_branch = 1;
-	if (num_channel <= 0)
-		num_channel = 1;
-	if (num_dimm <= 0)
-		num_dimm = 1;
-	if (num_csrows <= 0)
-		num_csrows = 1;
-	if (num_cschannel <= 0)
-		num_cschannel = 1;
-
-	tot_dimms = num_branch * num_channel * num_dimm;
-	dimm_div = (num_csrows * num_cschannel) / tot_dimms;
-	if (dimm_div == 0) {
-		printk(KERN_ERR "%s: dimm_div is wrong: tot_channels/tot_dimms = %d/%d < 1\n",
-			__func__, num_csrows * num_cschannel, tot_dimms);
-		dimm_div = 1;
+	tot_dimms = 1;
+	tot_cschannels = 1;
+	tot_csrows = 1;
+	for (i = 0; i < n_layers; i++) {
+		tot_dimms *= layers[i].size;
+		if (layers[i].is_csrow)
+			tot_csrows *= layers[i].size;
+		else
+			tot_cschannels *= layers[i].size;
 	}
-	/* FIXME: change it to debug2() at the final version */
 
 	/* Figure out the offsets of the various items from the start of an mc
 	 * structure.  We want the alignment of each item to be at least as
 	 * stringent as what the compiler would provide if we could simply
 	 * hardcode everything into a single struct.
 	 */
 	ptr = NULL;
 	mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
-	csi = edac_align_ptr(&ptr, sizeof(*csi), num_csrows);
-	chi = edac_align_ptr(&ptr, sizeof(*chi), num_csrows * num_cschannel);
+	lay = edac_align_ptr(&ptr, sizeof(*lay), n_layers);
+	csi = edac_align_ptr(&ptr, sizeof(*csi), tot_csrows);
+	chi = edac_align_ptr(&ptr, sizeof(*chi), tot_csrows * tot_cschannels);
 	dimm = edac_align_ptr(&ptr, sizeof(*dimm), tot_dimms);
-
-	count = num_branch;
-	ue_branch = edac_align_ptr(&ptr, sizeof(*ce_branch), count);
-	ce_branch = edac_align_ptr(&ptr, sizeof(*ce_branch), count);
-	count *= num_channel;
-	ue_channel = edac_align_ptr(&ptr, sizeof(*ce_channel), count);
-	ce_channel = edac_align_ptr(&ptr, sizeof(*ce_channel), count);
-	count *= num_dimm;
-	ue_dimm = edac_align_ptr(&ptr, sizeof(*ce_dimm), count * num_dimm);
-	ce_dimm = edac_align_ptr(&ptr, sizeof(*ce_dimm), count * num_dimm);
-
-	count = num_csrows;
-	ue_csrow = edac_align_ptr(&ptr, sizeof(*ce_dimm), count);
-	ce_csrow = edac_align_ptr(&ptr, sizeof(*ce_dimm), count);
-	count *= num_cschannel;
-	ue_cschannel = edac_align_ptr(&ptr, sizeof(*ce_dimm), count);
-	ce_cschannel = edac_align_ptr(&ptr, sizeof(*ce_dimm), count);
-
+	count = 1;
+	for (i = 0; i < n_layers; i++) {
+		count *= layers[i].size;
+		ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
+		ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
+	}
 	pvt = edac_align_ptr(&ptr, sz_pvt, 1);
 	size = ((unsigned long)pvt) + sz_pvt;
 
-	debugf1("%s(): allocating %u bytes for mci data\n", __func__, size);
+	debugf1("%s(): allocating %u bytes for mci data (%d dimms, %d csrows/channels)\n",
+		__func__, size, tot_dimms, tot_csrows * tot_cschannels);
 	mci = kzalloc(size, GFP_KERNEL);
 	if (mci == NULL)
 		return NULL;
@@ -305,131 +260,97 @@ struct mem_ctl_info *edac_mc_alloc(int edac_index,
 	/* Adjust pointers so they point within the memory we just allocated
 	 * rather than an imaginary chunk of memory located at address 0.
 	 */
+	lay = (struct edac_mc_layer *)(((char *)mci) + ((unsigned long)lay));
 	csi = (struct csrow_info *)(((char *)mci) + ((unsigned long)csi));
 	chi = (struct csrow_channel_info *)(((char *)mci) + ((unsigned long)chi));
 	dimm = (struct dimm_info *)(((char *)mci) + ((unsigned long)dimm));
+	for (i = 0; i < n_layers; i++) {
+		mci->ce_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ce_per_layer[i]));
+		mci->ue_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ue_per_layer[i]));
+	}
 	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
 	/* setup index and various internal pointers */
 	mci->mc_idx = edac_index;
 	mci->csrows = csi;
 	mci->dimms  = dimm;
-	mci->pvt_info = pvt;
-
 	mci->tot_dimms = tot_dimms;
-	mci->num_branch = num_branch;
-	mci->num_channel = num_channel;
-	mci->num_dimm = num_dimm;
-	mci->num_csrows = num_csrows;
-	mci->num_cschannel = num_cschannel;
+	mci->pvt_info = pvt;
+	mci->n_layers = n_layers;
+	mci->layers = lay;
+	memcpy(mci->layers, layers, sizeof(*lay) * n_layers);
+	mci->num_csrows = tot_csrows;
+	mci->num_cschannel = tot_cschannels;
 
 	/*
-	 * Fills the dimm struct
+	 * Fills the csrow struct
 	 */
-	mc_branch = (num_branch > 0) ? 0 : -1;
-	mc_channel = (num_channel > 0) ? 0 : -1;
-	mc_dimm_number = (num_dimm > 0) ? 0 : -1;
-	if (!num_channel && !num_branch) {
-		csrow = (num_csrows > 0) ? 0 : -1;
-		cschannel = (num_cschannel > 0) ? 0 : -1;
-	} else {
-		csrow = -1;
-		cschannel = -1;
+	for (row = 0; row < tot_csrows; row++) {
+		csr = &csi[row];
+		csr->csrow_idx = row;
+		csr->mci = mci;
+		csr->nr_channels = tot_cschannels;
+		chp = &chi[row * tot_cschannels];
+		csr->channels = chp;
+
+		for (chn = 0; chn < tot_cschannels; chn++) {
+			chan = &chp[chn];
+			chan->chan_idx = chn;
+			chan->csrow = csr;
+		}
 	}
 
+	/*
+	 * Fills the dimm struct
+	 */
+	memset(&per_layer_count, 0, sizeof(per_layer_count));
+	row = 0;
+	chn = 0;
 	debugf4("%s: initializing %d dimms\n", __func__, tot_dimms);
 	for (i = 0; i < tot_dimms; i++) {
+		debugf4("%s: dimm%d: row %d, chan %d\n", __func__,
+			i, row, chn);
+		chan = &csi[row].channels[chn];
 		dimm = &mci->dimms[i];
-
-		dimm->mc_branch = mc_branch;
-		dimm->mc_channel = mc_channel;
-		dimm->mc_dimm_number = mc_dimm_number;
-		dimm->csrow = csrow;
-		dimm->cschannel = cschannel;
-
-		/*
-		 * Increment the location
-		 * On csrow-emulated devices, csrow/cschannel should be -1
-		 */
-		if (!num_channel && !num_branch) {
-			if (num_cschannel) {
-				cschannel = (cschannel + 1) % num_cschannel;
-				if (cschannel)
-					continue;
+		dimm->mci = mci;
+
+		/* Copy DIMM location */
+		for (j = 0; j < n_layers; j++)
+			dimm->location[j] = per_layer_count[j];
+
+		/* Link it to the csrows old API data */
+		chan->dimm = dimm;
+		dimm->csrow = row;
+		dimm->cschannel = chn;
+
+		/* Increment csrow location */
+		if (!rev_order) {
+			for (j = n_layers - 1; j >= 0; j--)
+				if (!layers[j].is_csrow)
+					break;
+			chn++;
+			if (chn == tot_cschannels) {
+				chn = 0;
+				row++;
 			}
-			if (num_csrows) {
-				csrow = (csrow + 1) % num_csrows;
-				if (csrow)
-					continue;
+		} else {
+			for (j = n_layers - 1; j >= 0; j--)
+				if (layers[j].is_csrow)
+					break;
+			row++;
+			if (row == tot_csrows) {
+				row = 0;
+				chn++;
 			}
 		}
-		if (num_dimm) {
-			mc_dimm_number = (mc_dimm_number + 1) % num_dimm;
-			if (mc_dimm_number)
-				continue;
-		}
-		if (num_channel) {
-			mc_channel = (mc_channel + 1) % num_channel;
-			if (mc_channel)
-				continue;
-		}
-		if (num_branch) {
-			mc_branch = (mc_branch + 1) % num_branch;
-			if (mc_branch)
-				continue;
-		}
-	}
 
-	/*
-	 * Fills the csrows struct
-	 *
-	 * NOTE: there are two possible memory arrangements here:
-	 *
-	 *
-	 */
-	switch (fill_strategy) {
-	case EDAC_ALLOC_FILL_CSROW_CSCHANNEL:
-		for (row = 0; row < num_csrows; row++) {
-			csr = &csi[row];
-			csr->csrow_idx = row;
-			csr->mci = mci;
-			csr->nr_channels = num_cschannel;
-			chp = &chi[row * num_cschannel];
-			csr->channels = chp;
-
-			for (chn = 0; chn < num_cschannel; chn++) {
-				int dimm_idx = (chn + row * num_cschannel) /
-						dimm_div;
-				debugf4("%s: csrow(%d,%d) = dimm%d\n",
-					__func__, row, chn, dimm_idx);
-				chan = &chp[chn];
-				chan->chan_idx = chn;
-				chan->csrow = csr;
-				chan->dimm = &dimm[dimm_idx];
-			}
+		/* Increment dimm location */
+		for (j = n_layers - 1; j >= 0; j--) {
+			per_layer_count[j]++;
+			if (per_layer_count[j] < layers[j].size)
+				break;
+			per_layer_count[j] = 0;
 		}
-	case EDAC_ALLOC_FILL_MCCHANNEL_IS_CSROW:
-		for (row = 0; row < num_csrows; row++) {
-			csr = &csi[row];
-			csr->csrow_idx = row;
-			csr->mci = mci;
-			csr->nr_channels = num_cschannel;
-			chp = &chi[row * num_cschannel];
-			csr->channels = chp;
-
-			for (chn = 0; chn < num_cschannel; chn++) {
-				int dimm_idx = (chn * num_cschannel + row) /
-						dimm_div;
-				debugf4("%s: csrow(%d,%d) = dimm%d\n",
-					__func__, row, chn, dimm_idx);
-				chan = &chp[chn];
-				chan->chan_idx = chn;
-				chan->csrow = csr;
-				chan->dimm = &dimm[dimm_idx];
-			}
-		}
-	case EDAC_ALLOC_FILL_PRIV:
-		break;
 	}
 
 	mci->op_state = OP_ALLOC;
@@ -886,9 +807,9 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 			csrow->page_mask);
 
 		if ((page >= csrow->first_page) &&
-		(page <= csrow->last_page) &&
-		((page & csrow->page_mask) ==
-		(csrow->first_page & csrow->page_mask))) {
+		    (page <= csrow->last_page) &&
+		    ((page & csrow->page_mask) ==
+		    (csrow->first_page & csrow->page_mask))) {
 			row = i;
 			break;
 		}
@@ -903,221 +824,110 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 }
 EXPORT_SYMBOL_GPL(edac_mc_find_csrow_by_page);
 
-void edac_increment_ce_error(enum hw_event_error_scope scope,
-			     struct mem_ctl_info *mci,
-			     int mc_branch,
-			     int mc_channel,
-			     int mc_dimm_number,
-			     int csrow,
-			     int cschannel)
-{
-	int index;
+const char *edac_layer_name[] = {
+	[EDAC_MC_LAYER_BRANCH] = "branch",
+	[EDAC_MC_LAYER_CHANNEL] = "channel",
+	[EDAC_MC_LAYER_SLOT] = "slot",
+	[EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
+};
+EXPORT_SYMBOL_GPL(edac_layer_name);
 
-	mci->err.ce_mc++;
+static void edac_increment_ce_error(struct mem_ctl_info *mci,
+				    bool enable_filter,
+				    const int pos[EDAC_MAX_LAYERS])
+{
+	int i, index = 0;
 
-	if (scope == HW_EVENT_SCOPE_MC) {
-		mci->ce_noinfo_count = 0;
-		return;
-	}
+	mci->ce_mc++;
 
-	index = 0;
-	if (mc_branch >= 0) {
-		index = mc_branch;
-		mci->err.ce_branch[index]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_BRANCH)
+	if (!enable_filter) {
+		mci->ce_noinfo_count++;
 		return;
-	index *= mci->num_branch;
-
-	if (mc_channel >= 0) {
-		index += mc_channel;
-		mci->err.ce_channel[index]++;
 	}
-	if (scope == HW_EVENT_SCOPE_MC_CHANNEL)
-		return;
-	index *= mci->num_channel;
 
-	if (mc_dimm_number >= 0) {
-		index += mc_dimm_number;
-		mci->err.ce_dimm[index]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_DIMM)
-		return;
-	index *= mci->num_dimm;
-
-	if (csrow >= 0) {
-		index += csrow;
-		mci->err.ce_csrow[csrow]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_CSROW_CHANNEL)
-		return;
-	index *= mci->num_csrows;
-
-	if (cschannel >= 0) {
-		index += cschannel;
-		mci->err.ce_cschannel[index]++;
+	for (i = 0; i < mci->n_layers; i++) {
+		if (pos[i] < 0)
+			break;
+		index += pos[i];
+		mci->ce_per_layer[i][index]++;
+		index *= (i + 1 < mci->n_layers) ? mci->layers[i + 1].size : 1;
 	}
 }
 
-void edac_increment_ue_error(enum hw_event_error_scope scope,
-			     struct mem_ctl_info *mci,
-			     int mc_branch,
-			     int mc_channel,
-			     int mc_dimm_number,
-			     int csrow,
-			     int cschannel)
+static void edac_increment_ue_error(struct mem_ctl_info *mci,
+				    bool enable_filter,
+				    const int pos[EDAC_MAX_LAYERS])
 {
-	int index;
-
-	mci->err.ue_mc++;
+	int i, index = 0;
 
-	if (scope == HW_EVENT_SCOPE_MC) {
-		mci->ue_noinfo_count = 0;
-		return;
-	}
+	mci->ue_mc++;
 
-	index = 0;
-	if (mc_branch >= 0) {
-		index = mc_branch;
-		mci->err.ue_branch[index]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_BRANCH)
+	if (!enable_filter) {
+		mci->ue_noinfo_count++;
 		return;
-	index *= mci->num_branch;
-
-	if (mc_channel >= 0) {
-		index += mc_channel;
-		mci->err.ue_channel[index]++;
 	}
-	if (scope == HW_EVENT_SCOPE_MC_CHANNEL)
-		return;
-	index *= mci->num_channel;
 
-	if (mc_dimm_number >= 0) {
-		index += mc_dimm_number;
-		mci->err.ue_dimm[index]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_DIMM)
-		return;
-	index *= mci->num_dimm;
-
-	if (csrow >= 0) {
-		index += csrow;
-		mci->err.ue_csrow[csrow]++;
-	}
-	if (scope == HW_EVENT_SCOPE_MC_CSROW_CHANNEL)
-		return;
-	index *= mci->num_csrows;
-
-	if (cschannel >= 0) {
-		index += cschannel;
-		mci->err.ue_cschannel[index]++;
+	for (i = 0; i < mci->n_layers; i++) {
+		if (pos[i] < 0)
+			break;
+		index += pos[i];
+		mci->ue_per_layer[i][index]++;
+		index *= (i + 1 < mci->n_layers) ? mci->layers[i + 1].size : 1;
 	}
 }
 
-void edac_mc_handle_error(enum hw_event_mc_err_type type,
-			  enum hw_event_error_scope scope,
+void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  struct mem_ctl_info *mci,
-			  unsigned long page_frame_number,
-			  unsigned long offset_in_page,
-			  unsigned long syndrome,
-			  int mc_branch,
-			  int mc_channel,
-			  int mc_dimm_number,
-			  int csrow,
-			  int cschannel,
+			  const unsigned long page_frame_number,
+			  const unsigned long offset_in_page,
+			  const unsigned long syndrome,
+			  const int layer0,
+			  const int layer1,
+			  const int layer2,
 			  const char *msg,
-			  const char *other_detail)
+			  const char *other_detail,
+			  const void *mcelog)
 {
 	unsigned long remapped_page;
-	/* FIXME: too much for stack. Move it to some pre-alocated area */
+	/* FIXME: too much for stack: move it to some pre-allocated area */
 	char detail[80 + strlen(other_detail)];
 	char label[(EDAC_MC_LABEL_LEN + 2) * mci->tot_dimms], *p;
 	char location[80];
+	int row = -1, chan = -1;
+	int pos[EDAC_MAX_LAYERS] = { layer0, layer1, layer2 };
 	int i;
 	u32 grain;
+	bool enable_filter = false;
 
 	debugf3("MC%d: %s()\n", mci->mc_idx, __func__);
 
 	/* Check if the event report is consistent */
-	if ((scope == HW_EVENT_SCOPE_MC_CSROW_CHANNEL) &&
-	    (cschannel >= mci->num_cschannel)) {
-		trace_mc_out_of_range(mci, "CE", "cs channel", cschannel,
-					0, mci->num_cschannel);
-		edac_mc_printk(mci, KERN_ERR,
-				"INTERNAL ERROR: cs channel out of range (%d >= %d)\n",
-				cschannel, mci->num_cschannel);
-		if (type == HW_EVENT_ERR_CORRECTED)
-			mci->err.ce_mc++;
-		else
-			mci->err.ue_mc++;
-		return;
-	} else {
-		cschannel = -1;
-	}
-
-	if ((scope <= HW_EVENT_SCOPE_MC_CSROW) &&
-	    (csrow >= mci->num_csrows)) {
-		trace_mc_out_of_range(mci, "CE", "csrow", csrow,
-					0, mci->num_csrows);
-		edac_mc_printk(mci, KERN_ERR,
-				"INTERNAL ERROR: csrow out of range (%d >= %d)\n",
-				csrow, mci->num_csrows);
-		if (type == HW_EVENT_ERR_CORRECTED)
-			mci->err.ce_mc++;
-		else
-			mci->err.ue_mc++;
-		return;
-	} else {
-		csrow = -1;
-	}
-
-	if ((scope <= HW_EVENT_SCOPE_MC_CSROW) &&
-	    (mc_dimm_number >= mci->num_dimm)) {
-		trace_mc_out_of_range(mci, "CE", "dimm_number",
-					mc_dimm_number, 0, mci->num_dimm);
-		edac_mc_printk(mci, KERN_ERR,
-				"INTERNAL ERROR: dimm_number out of range (%d >= %d)\n",
-				mc_dimm_number, mci->num_dimm);
-		if (type == HW_EVENT_ERR_CORRECTED)
-			mci->err.ce_mc++;
-		else
-			mci->err.ue_mc++;
-		return;
-	} else {
-		mc_dimm_number = -1;
-	}
-
-	if ((scope <= HW_EVENT_SCOPE_MC_CHANNEL) &&
-	    (mc_channel >= mci->num_dimm)) {
-		trace_mc_out_of_range(mci, "CE", "mc_channel",
-					mc_channel, 0, mci->num_dimm);
-		edac_mc_printk(mci, KERN_ERR,
-				"INTERNAL ERROR: mc_channel out of range (%d >= %d)\n",
-				mc_channel, mci->num_dimm);
-		if (type == HW_EVENT_ERR_CORRECTED)
-			mci->err.ce_mc++;
-		else
-			mci->err.ue_mc++;
-		return;
-	} else {
-		mc_channel = -1;
-	}
-
-	if ((scope <= HW_EVENT_SCOPE_MC_BRANCH) &&
-	    (mc_branch >= mci->num_branch)) {
-		trace_mc_out_of_range(mci, "CE", "branch",
-					mc_branch, 0, mci->num_branch);
-		edac_mc_printk(mci, KERN_ERR,
-				"INTERNAL ERROR: mc_branch out of range (%d >= %d)\n",
-				mc_branch, mci->num_branch);
-		if (type == HW_EVENT_ERR_CORRECTED)
-			mci->err.ce_mc++;
-		else
-			mci->err.ue_mc++;
-		return;
-	} else {
-		mc_branch = -1;
+	for (i = 0; i < mci->n_layers; i++) {
+		if (pos[i] >= (int)mci->layers[i].size) {
+			if (type == HW_EVENT_ERR_CORRECTED) {
+				p = "CE";
+				mci->ce_mc++;
+			} else {
+				p = "UE";
+				mci->ue_mc++;
+			}
+			trace_mc_out_of_range(mci, p,
+					edac_layer_name[mci->layers[i].type],
+					pos[i], 0, mci->layers[i].size);
+			edac_mc_printk(mci, KERN_ERR,
+				       "INTERNAL ERROR: %s value is out of range (%d >= %d)\n",
+				       edac_layer_name[mci->layers[i].type],
+				       pos[i], mci->layers[i].size);
+			/*
+			 * Instead of just returning, use what is known
+			 * about the error. The increment routines and the
+			 * DIMM filter logic will do the right thing by
+			 * pointing to the likely damaged DIMMs.
+			 */
+			pos[i] = -1;
+		}
+		if (pos[i] >= 0)
+			enable_filter = true;
 	}
 
 	/*
@@ -1134,50 +944,70 @@ void edac_mc_handle_error(enum hw_event_mc_err_type type,
 	 */
 	grain = 0;
 	p = label;
+	*p = '\0';
 	for (i = 0; i < mci->tot_dimms; i++) {
 		struct dimm_info *dimm = &mci->dimms[i];
 
-		if (mc_branch >= 0 && mc_branch != dimm->mc_branch)
+		if (layer0 >= 0 && layer0 != dimm->location[0])
 			continue;
-
-		if (mc_channel >= 0 && mc_channel != dimm->mc_channel)
+		if (layer1 >= 0 && layer1 != dimm->location[1])
 			continue;
-
-		if (mc_dimm_number >= 0 &&
-		    mc_dimm_number != dimm->mc_dimm_number)
-			continue;
-
-		if (csrow >= 0 && csrow != dimm->csrow)
-			continue;
-		if (cschannel >= 0 && cschannel != dimm->cschannel)
+		if (layer2 >= 0 && layer2 != dimm->location[2])
 			continue;
 
 		if (dimm->grain > grain)
 			grain = dimm->grain;
 
-		strcpy(p, dimm->label);
-		p[strlen(p)] = ' ';
-		p = p + strlen(p);
+		/*
+		 * If the error is memory-controller wide, there's no sense
+		 * in seeking the affected DIMMs, as everything may be
+		 * affected.
+		 */
+		if (enable_filter) {
+			strcpy(p, dimm->label);
+			p[strlen(p)] = ' ';
+			p = p + strlen(p);
+			*p = '\0';
+
+			/*
+			 * get csrow/channel of the dimm, in order to allow
+			 * incrementing the compat API counters. If the
+			 * matching DIMMs span more than one csrow (or
+			 * channel), mark it as ambiguous (-2).
+			 */
+			if (row == -1)
+				row = dimm->csrow;
+			else if (row >= 0 && row != dimm->csrow)
+				row = -2;
+
+			if (chan == -1)
+				chan = dimm->cschannel;
+			else if (chan >= 0 && chan != dimm->cschannel)
+				chan = -2;
+		}
+	}
+	if (!enable_filter) {
+		strcpy(label, "any memory");
+	} else {
+		if (type == HW_EVENT_ERR_CORRECTED) {
+			if (row >= 0)
+				mci->csrows[row].ce_count++;
+			if (row >= 0 && chan >= 0)
+				mci->csrows[row].channels[chan].ce_count++;
+		} else if (row >= 0) {
+			mci->csrows[row].ue_count++;
+		}
 	}
-	p[strlen(p)] = '\0';
 
 	/* Fill the RAM location data */
 	p = location;
-	if (mc_branch >= 0)
-		p += sprintf(p, "branch %d ", mc_branch);
-
-	if (mc_channel >= 0)
-		p += sprintf(p, "channel %d ", mc_channel);
-
-	if (mc_dimm_number >= 0)
-		p += sprintf(p, "dimm %d ", mc_dimm_number);
-
-	if (csrow >= 0)
-		p += sprintf(p, "csrow %d ", csrow);
-
-	if (cschannel >= 0)
-		p += sprintf(p, "cs_channel %d ", cschannel);
-
+	for (i = 0; i < mci->n_layers; i++) {
+		if (pos[i] < 0)
+			continue;
+		p += sprintf(p, "%s %d ",
+			     edac_layer_name[mci->layers[i].type],
+			     pos[i]);
+	}
 
 	/* Memory type dependent details about the error */
 	if (type == HW_EVENT_ERR_CORRECTED)
@@ -1190,19 +1020,16 @@ void edac_mc_handle_error(enum hw_event_mc_err_type type,
 			"page 0x%lx offset 0x%lx grain %d\n",
 			page_frame_number, offset_in_page, grain);
 
-	trace_mc_error(type, mci->mc_idx, msg, label, mc_branch, mc_channel,
-		       mc_dimm_number, csrow, cschannel,
+	trace_mc_error(type, mci->mc_idx, msg, label, location,
 		       detail, other_detail);
 
 	if (type == HW_EVENT_ERR_CORRECTED) {
 		if (edac_mc_get_log_ce())
 			edac_mc_printk(mci, KERN_WARNING,
-				       "CE %s label \"%s\" (location: %d.%d.%d.%d.%d %s %s)\n",
-				       msg, label, mc_branch, mc_channel,
-				       mc_dimm_number, csrow, cschannel,
+				       "CE %s label \"%s\" (%s %s %s)\n",
+				       msg, label, location,
 				       detail, other_detail);
-		edac_increment_ce_error(scope, mci, mc_branch, mc_channel,
-					mc_dimm_number, csrow, cschannel);
+		edac_increment_ce_error(mci, enable_filter, pos);
 
 		if (mci->scrub_mode & SCRUB_SW_SRC) {
 			/*
@@ -1233,8 +1060,7 @@ void edac_mc_handle_error(enum hw_event_mc_err_type type,
 			panic("UE %s label \"%s\" (%s %s %s)\n",
 			      msg, label, location, detail, other_detail);
 
-		edac_increment_ue_error(scope, mci, mc_branch, mc_channel,
-					mc_dimm_number, csrow, cschannel);
+		edac_increment_ue_error(mci, enable_filter, pos);
 	}
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index a6f611f..245c588 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -132,17 +132,13 @@ static const char *edac_caps[] = {
 static ssize_t csrow_ue_count_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	struct mem_ctl_info *mci = csrow->mci;
-
-	return sprintf(data, "%u\n", mci->err.ue_csrow[csrow->csrow_idx]);
+	return sprintf(data, "%u\n", csrow->ue_count);
 }
 
 static ssize_t csrow_ce_count_show(struct csrow_info *csrow, char *data,
 				int private)
 {
-	struct mem_ctl_info *mci = csrow->mci;
-
-	return sprintf(data, "%u\n", mci->err.ce_csrow[csrow->csrow_idx]);
+	return sprintf(data, "%u\n", csrow->ce_count);
 }
 
 static ssize_t csrow_size_show(struct csrow_info *csrow, char *data,
@@ -209,10 +205,7 @@ static ssize_t channel_dimm_label_store(struct csrow_info *csrow,
 static ssize_t channel_ce_count_show(struct csrow_info *csrow,
 				char *data, int channel)
 {
-	struct mem_ctl_info *mci = csrow->mci;
-	int index = csrow->csrow_idx * mci->num_cschannel + channel;
-
-	return sprintf(data, "%u\n", mci->err.ce_cschannel[index]);
+	return sprintf(data, "%u\n", csrow->channels[channel].ce_count);
 }
 
 /* csrow specific attribute structure */
@@ -478,22 +471,15 @@ static const struct sysfs_ops dimmfs_ops = {
 /* show/store functions for DIMM Label attributes */
 static ssize_t dimmdev_location_show(struct dimm_info *dimm, char *data)
 {
+	struct mem_ctl_info *mci = dimm->mci;
+	int i;
 	char *p = data;
 
-	if (dimm->mc_branch >= 0)
-		p += sprintf(p, "branch %d ", dimm->mc_branch);
-
-	if (dimm->mc_channel >= 0)
-		p += sprintf(p, "channel %d ", dimm->mc_channel);
-
-	if (dimm->mc_dimm_number >= 0)
-		p += sprintf(p, "dimm %d ", dimm->mc_dimm_number);
-
-	if (dimm->csrow >= 0)
-		p += sprintf(p, "csrow %d ", dimm->csrow);
-
-	if (dimm->cschannel >= 0)
-		p += sprintf(p, "cs_channel %d ", dimm->cschannel);
+	for (i = 0; i < mci->n_layers; i++) {
+		p += sprintf(p, "%s %d ",
+			     edac_layer_name[mci->layers[i].type],
+			     dimm->location[i]);
+	}
 
 	return p - data;
 }
@@ -621,27 +607,29 @@ err_out:
 static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
 					const char *data, size_t count)
 {
-	int num;
-	mci->err.ue_mc = 0;
-	mci->err.ce_mc = 0;
+	int cnt, row, chan, i;
+	mci->ue_mc = 0;
+	mci->ce_mc = 0;
 	mci->ue_noinfo_count = 0;
 	mci->ce_noinfo_count = 0;
 
-	num = mci->num_branch;
-	memset(mci->err.ue_branch, 0, num);
-	memset(mci->err.ce_branch, 0, num);
-	num *= mci->num_channel;
-	memset(mci->err.ue_channel, 0, num);
-	memset(mci->err.ce_channel, 0, num);
-	num *= mci->num_dimm;
-	memset(mci->err.ue_dimm, 0, num);
-	memset(mci->err.ce_dimm, 0, num);
-	num *= mci->num_csrows;
-	memset(mci->err.ue_csrow, 0, num);
-	memset(mci->err.ce_csrow, 0, num);
-	num *= mci->num_cschannel;
-	memset(mci->err.ue_cschannel, 0, num);
-	memset(mci->err.ce_cschannel, 0, num);
+
+	for (row = 0; row < mci->num_csrows; row++) {
+		struct csrow_info *ri = &mci->csrows[row];
+
+		ri->ue_count = 0;
+		ri->ce_count = 0;
+
+		for (chan = 0; chan < ri->nr_channels; chan++)
+			ri->channels[chan].ce_count = 0;
+	}
+
+	cnt = 1;
+	for (i = 0; i < mci->n_layers; i++) {
+		cnt *= mci->layers[i].size;
+		memset(mci->ce_per_layer[i], 0, cnt * sizeof(*mci->ce_per_layer[i]));
+		memset(mci->ue_per_layer[i], 0, cnt * sizeof(*mci->ue_per_layer[i]));
+	}
 
 	mci->start_time = jiffies;
 	return count;
@@ -700,12 +688,12 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
 /* default attribute files for the MCI object */
 static ssize_t mci_ue_count_show(struct mem_ctl_info *mci, char *data)
 {
-	return sprintf(data, "%d\n", mci->err.ue_mc);
+	return sprintf(data, "%d\n", mci->ue_mc);
 }
 
 static ssize_t mci_ce_count_show(struct mem_ctl_info *mci, char *data)
 {
-	return sprintf(data, "%d\n", mci->err.ce_mc);
+	return sprintf(data, "%d\n", mci->ce_mc);
 }
 
 static ssize_t mci_ce_noinfo_show(struct mem_ctl_info *mci, char *data)
@@ -1172,11 +1160,18 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 		/* Only expose populated DIMMs */
 		if (dimm->nr_pages == 0)
 			continue;
-
-		debugf1("%s creating dimm%d, located at %d.%d.%d.%d.%d\n",
-			__func__, j, dimm->mc_branch, dimm->mc_channel,
-			dimm->mc_dimm_number, dimm->csrow, dimm->cschannel);
-
+#ifdef CONFIG_EDAC_DEBUG
+		debugf1("%s creating dimm%d, located at ",
+			__func__, j);
+		if (edac_debug_level >= 1) {
+			int lay;
+			for (lay = 0; lay < mci->n_layers; lay++)
+				printk(KERN_CONT "%s %d ",
+					edac_layer_name[mci->layers[lay].type],
+					dimm->location[lay]);
+			printk(KERN_CONT "\n");
+		}
+#endif
 		err = edac_create_dimm_object(mci, dimm, j);
 		if (err) {
 			debugf1("%s() failure: create dimm %d obj\n",
diff --git a/drivers/edac/i3000_edac.c b/drivers/edac/i3000_edac.c
index 77c06af..c366002 100644
--- a/drivers/edac/i3000_edac.c
+++ b/drivers/edac/i3000_edac.c
@@ -245,10 +245,9 @@ static int i3000_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & I3000_ERRSTS_BITS) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1,
+				     "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
@@ -259,18 +258,15 @@ static int i3000_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, pfn);
 
 	if (info->errsts & I3000_ERRSTS_UE)
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     pfn, offset, 0,
-				     -1, -1, -1, row, -1,
-				     "i3000 UE", "");
+				     row, -1, -1,
+				     "i3000 UE", "", NULL);
 	else
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     pfn, offset, info->derrsyn,
-				     -1, -1, -1, row,
-				     multi_chan ? channel : 0,
-				     "i3000 CE", "");
+				     row, multi_chan ? channel : 0, -1,
+				     "i3000 CE", "", NULL);
 
 	return 1;
 }
@@ -317,6 +313,7 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	unsigned long last_cumul_size, nr_pages;
 	int interleaved, nr_channels;
 	unsigned char dra[I3000_RANKS / 2], drb[I3000_RANKS];
@@ -359,10 +356,13 @@ static int i3000_probe1(struct pci_dev *pdev, int dev_idx)
 	interleaved = i3000_is_interleaved(c0dra, c1dra, c0drb, c1drb);
 	nr_channels = interleaved ? 2 : 1;
 
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, I3000_RANKS,
-			    I3000_RANKS / nr_channels, nr_channels,
-			    0);
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = I3000_RANKS / nr_channels;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = nr_channels;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (!mci)
 		return -ENOMEM;
 
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index 6f04a50..1233435 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -229,29 +229,25 @@ static void i3200_process_error_info(struct mem_ctl_info *mci,
 		return;
 
 	if ((info->errsts ^ info->errsts2) & I3200_ERRSTS_BITS) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1, "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
 	for (channel = 0; channel < nr_channels; channel++) {
 		log = info->eccerrlog[channel];
 		if (log & I3200_ECCERRLOG_UE) {
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 					     0, 0, 0,
-					     -1, -1, -1,
-					     eccerrlog_row(channel, log), -1,
-					     "i3000 UE", "");
+					     eccerrlog_row(channel, log),
+					     -1, -1,
+					     "i3000 UE", "", NULL);
 		} else if (log & I3200_ECCERRLOG_CE) {
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 					     0, 0, eccerrlog_syndrome(log),
-					     -1, -1, -1,
-					     eccerrlog_row(channel, log), -1,
-					     "i3000 UE", "");
+					     eccerrlog_row(channel, log),
+					     -1, -1,
+					     "i3200 CE", "", NULL);
 		}
 	}
 }
@@ -341,6 +337,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	u16 drbs[I3200_CHANNELS][I3200_RANKS_PER_CHANNEL];
 	bool stacked;
 	void __iomem *window;
@@ -355,10 +352,13 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 	i3200_get_drbs(window, drbs);
 	nr_channels = how_many_channels(pdev);
 
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, I3200_DIMMS,
-			    I3200_RANKS, nr_channels,
-			    0);
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = I3200_DIMMS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = nr_channels;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (!mci)
 		return -ENOMEM;
 
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index 5fec235..564fe09 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -537,11 +537,10 @@ static void i5000_process_fatal_error_info(struct mem_ctl_info *mci,
 		 bank, ras, cas, allErrors, specific);
 
 	/* Call the helper to output message */
-	edac_mc_handle_error(HW_EVENT_ERR_FATAL,
-			     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-			     branch >> 1, -1, rank, -1, -1,
+	edac_mc_handle_error(HW_EVENT_ERR_FATAL, mci, 0, 0, 0,
+			     branch >> 1, -1, rank,
 			     rdwr ? "Write error" : "Read error",
-			     msg);
+			     msg, NULL);
 }
 
 /*
@@ -639,11 +638,10 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 			 rank, bank, ras, cas, ue_errors, specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-				channel >> 1, -1, rank, -1, -1,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				channel >> 1, -1, rank,
 				rdwr ? "Write error" : "Read error",
-				msg);
+				msg, NULL);
 	}
 
 	/* Check correctable errors */
@@ -695,11 +693,10 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 			 specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				HW_EVENT_SCOPE_MC_CHANNEL, mci, 0, 0, 0,
-				channel >> 1, channel % 2, rank, -1, -1,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 0, 0, 0,
+				channel >> 1, channel % 2, rank,
 				rdwr ? "Write error" : "Read error",
-				msg);
+				msg, NULL);
 	}
 
 	if (!misc_messages)
@@ -742,10 +739,9 @@ static void i5000_process_nonfatal_error_info(struct mem_ctl_info *mci,
 			 "Err=%#x (%s)", misc_errors, specific);
 
 		/* Call the helper to output message */
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-				branch >> 1, -1, -1, -1, -1,
-				"Misc error", msg);
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 0, 0, 0,
+				branch >> 1, -1, -1,
+				"Misc error", msg, NULL);
 	}
 }
 
@@ -1357,10 +1353,10 @@ static void i5000_get_dimm_and_channel_counts(struct pci_dev *pdev,
 static int i5000_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[3];
 	struct i5000_pvt *pvt;
 	int num_channels;
 	int num_dimms_per_channel;
-	int num_csrows;
 
 	debugf0("MC: %s: %s(), pdev bus %u dev=0x%x fn=0x%x\n",
 		__FILE__, __func__,
@@ -1386,15 +1382,21 @@ static int i5000_probe1(struct pci_dev *pdev, int dev_idx)
 	 */
 	i5000_get_dimm_and_channel_counts(pdev, &num_dimms_per_channel,
 					&num_channels);
-	num_csrows = num_dimms_per_channel * 2;
 
-	debugf0("MC: %s(): Number of - Channels= %d  DIMMS= %d  CSROWS= %d\n",
-		__func__, num_channels, num_dimms_per_channel, num_csrows);
+	debugf0("MC: %s(): Number of Branches=2 Channels= %d  DIMMS= %d\n",
+		__func__, num_channels, num_dimms_per_channel);
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    2, num_channels, num_dimms_per_channel,
-			    num_csrows, num_channels, sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_BRANCH;
+	layers[0].size = 2;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = num_channels;
+	layers[1].is_csrow = false;
+	layers[2].type = EDAC_MC_LAYER_SLOT;
+	layers[2].size = num_dimms_per_channel;
+	layers[2].is_csrow = true;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 24b03b8..d594170 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -426,11 +426,10 @@ static void i5100_handle_ce(struct mem_ctl_info *mci,
 		 "bank %u, cas %u, ras %u\n",
 		 bank, cas, ras);
 
-	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-			     HW_EVENT_SCOPE_MC_DIMM, mci,
+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 			     0, 0, syndrome,
-			     0, chan, rank, -1, -1,
-			     msg, detail);
+			     chan, rank, -1,
+			     msg, detail, NULL);
 }
 
 static void i5100_handle_ue(struct mem_ctl_info *mci,
@@ -449,11 +448,10 @@ static void i5100_handle_ue(struct mem_ctl_info *mci,
 		 "bank %u, cas %u, ras %u\n",
 		 bank, cas, ras);
 
-	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-			     HW_EVENT_SCOPE_MC_DIMM, mci,
+	edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 			     0, 0, syndrome,
-			     0, chan, rank, -1, -1,
-			     msg, detail);
+			     chan, rank, -1,
+			     msg, detail, NULL);
 }
 
 static void i5100_read_log(struct mem_ctl_info *mci, int chan,
@@ -864,6 +862,7 @@ static int __devinit i5100_init_one(struct pci_dev *pdev,
 {
 	int rc;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct i5100_priv *priv;
 	struct pci_dev *ch0mm, *ch1mm;
 	int ret = 0;
@@ -924,9 +923,14 @@ static int __devinit i5100_init_one(struct pci_dev *pdev,
 		goto bail_ch1;
 	}
 
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    1, 2, ranksperch,
-			    ranksperch * 2, 1, sizeof(*priv));
+	layers[0].type = EDAC_MC_LAYER_CHANNEL;
+	layers[0].size = 2;
+	layers[0].is_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_SLOT;
+	layers[1].size = ranksperch;
+	layers[1].is_csrow = true;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
+			    false, sizeof(*priv));
 	if (!mci) {
 		ret = -ENOMEM;
 		goto bail_disable_ch1;
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index c7455da..681e97a 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -571,11 +571,10 @@ static void i5400_proccess_non_recoverable_info(struct mem_ctl_info *mci,
 		 "Bank=%d Buffer ID = %d RAS=%d CAS=%d Err=0x%lx (%s)",
 		 bank, buf_id, ras, cas, allErrors, error_name[errnum]);
 
-	edac_mc_handle_error(tp_event,
-			     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-			     branch >> 1, -1, rank, -1, -1,
+	edac_mc_handle_error(tp_event, mci, 0, 0, 0,
+			     branch >> 1, -1, rank,
 			     rdwr ? "Write error" : "Read error",
-			     msg);
+			     msg, NULL);
 }
 
 /*
@@ -645,11 +644,10 @@ static void i5400_process_nonfatal_error_info(struct mem_ctl_info *mci,
 			 branch >> 1, bank, rdwr_str(rdwr), ras, cas,
 			 allErrors, error_name[errnum]);
 
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-				     branch >> 1, channel % 2, rank, -1, -1,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 0, 0, 0,
+				     branch >> 1, channel % 2, rank,
 				     rdwr ? "Write error" : "Read error",
-				     msg);
+				     msg, NULL);
 
 		return;
 	}
@@ -1208,9 +1206,7 @@ static int i5400_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	struct mem_ctl_info *mci;
 	struct i5400_pvt *pvt;
-	int num_channels;
-	int num_dimms_per_channel;
-	int num_csrows;
+	struct edac_mc_layer layers[3];
 
 	if (dev_idx >= ARRAY_SIZE(i5400_devs))
 		return -EINVAL;
@@ -1224,24 +1220,17 @@ static int i5400_probe1(struct pci_dev *pdev, int dev_idx)
 	if (PCI_FUNC(pdev->devfn) != 0)
 		return -ENODEV;
 
-	/* As we don't have a motherboard identification routine to determine
-	 * actual number of slots/dimms per channel, we thus utilize the
-	 * resource as specified by the chipset. Thus, we might have
-	 * have more DIMMs per channel than actually on the mobo, but this
-	 * allows the driver to support up to the chipset max, without
-	 * some fancy mobo determination.
-	 */
-	num_dimms_per_channel = MAX_DIMMS_PER_CHANNEL;
-	num_channels = MAX_CHANNELS;
-	num_csrows = num_dimms_per_channel;
-
-	debugf0("MC: %s(): Number of - Channels= %d  DIMMS= %d  CSROWS= %d\n",
-		__func__, num_channels, num_dimms_per_channel, num_csrows);
-
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    2, num_channels, num_dimms_per_channel,
-			    num_csrows, num_channels, sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_BRANCH;
+	layers[0].size = 2;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = MAX_CHANNELS;
+	layers[1].is_csrow = false;
+	layers[2].type = EDAC_MC_LAYER_SLOT;
+	layers[2].size = MAX_DIMMS_PER_CHANNEL;
+	layers[2].is_csrow = true;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
@@ -1252,8 +1241,8 @@ static int i5400_probe1(struct pci_dev *pdev, int dev_idx)
 
 	pvt = mci->pvt_info;
 	pvt->system_address = pdev;	/* Record this device in our private */
-	pvt->maxch = num_channels;
-	pvt->maxdimmperch = num_dimms_per_channel;
+	pvt->maxch = MAX_CHANNELS;
+	pvt->maxdimmperch = MAX_DIMMS_PER_CHANNEL;
 
 	/* 'get' the pci devices we want to reserve for our use */
 	if (i5400_get_devices(mci, dev_idx))
diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 33f9ac2..7b9c848 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -467,11 +467,10 @@ static void i7300_process_fbd_error(struct mem_ctl_info *mci)
 			 "Bank=%d RAS=%d CAS=%d Err=0x%lx (%s))",
 			 bank, ras, cas, errors, specific);
 
-		edac_mc_handle_error(HW_EVENT_ERR_FATAL,
-				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0, 0,
-				     branch, -1, rank, -1, -1,
+		edac_mc_handle_error(HW_EVENT_ERR_FATAL, mci, 0, 0, 0,
+				     branch, -1, rank,
 				     is_wr ? "Write error" : "Read error",
-				     pvt->tmp_prt_buffer);
+				     pvt->tmp_prt_buffer, NULL);
 
 	}
 
@@ -514,12 +513,11 @@ static void i7300_process_fbd_error(struct mem_ctl_info *mci)
 			 "DRAM-Bank=%d RAS=%d CAS=%d, Err=0x%lx (%s))",
 			 bank, ras, cas, errors, specific);
 
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_BRANCH, mci, 0, 0,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 0, 0,
 				     syndrome,
-				     branch >> 1, channel % 2, rank, -1, -1,
+				     branch >> 1, channel % 2, rank,
 				     is_wr ? "Write error" : "Read error",
-				     pvt->tmp_prt_buffer);
+				     pvt->tmp_prt_buffer, NULL);
 	}
 	return;
 }
@@ -1027,10 +1025,8 @@ static int __devinit i7300_init_one(struct pci_dev *pdev,
 				    const struct pci_device_id *id)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[3];
 	struct i7300_pvt *pvt;
-	int num_channels;
-	int num_dimms_per_channel;
-	int num_csrows;
 	int rc;
 
 	/* wake up device */
@@ -1047,25 +1043,17 @@ static int __devinit i7300_init_one(struct pci_dev *pdev,
 	if (PCI_FUNC(pdev->devfn) != 0)
 		return -ENODEV;
 
-	/* As we don't have a motherboard identification routine to determine
-	 * actual number of slots/dimms per channel, we thus utilize the
-	 * resource as specified by the chipset. Thus, we might have
-	 * have more DIMMs per channel than actually on the mobo, but this
-	 * allows the driver to support up to the chipset max, without
-	 * some fancy mobo determination.
-	 */
-	num_dimms_per_channel = MAX_SLOTS;
-	num_channels = MAX_CHANNELS;
-	num_csrows = MAX_SLOTS * MAX_CHANNELS;
-
-	debugf0("MC: %s(): Number of - Channels= %d  DIMMS= %d  CSROWS= %d\n",
-		__func__, num_channels, num_dimms_per_channel, num_csrows);
-
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    MAX_BRANCHES, num_channels / MAX_BRANCHES,
-			    num_dimms_per_channel,
-			    num_csrows, num_channels, sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_BRANCH;
+	layers[0].size = MAX_BRANCHES;
+	layers[0].is_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = MAX_CHANNELS / MAX_BRANCHES;
+	layers[1].is_csrow = true;
+	layers[2].type = EDAC_MC_LAYER_SLOT;
+	layers[2].size = MAX_SLOTS;
+	layers[2].is_csrow = true;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index f63c0f4..ce75892 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -257,7 +257,6 @@ struct i7core_pvt {
 	struct i7core_channel	channel[NUM_CHANS];
 
 	int		ce_count_available;
-	int 		csrow_map[NUM_CHANS][MAX_DIMMS];
 
 			/* ECC corrected errors counts per udimm */
 	unsigned long	udimm_ce_count[MAX_DIMMS];
@@ -492,113 +491,12 @@ static void free_i7core_dev(struct i7core_dev *i7core_dev)
 /****************************************************************************
 			Memory check routines
  ****************************************************************************/
-static struct pci_dev *get_pdev_slot_func(u8 socket, unsigned slot,
-					  unsigned func)
-{
-	struct i7core_dev *i7core_dev = get_i7core_dev(socket);
-	int i;
-
-	if (!i7core_dev)
-		return NULL;
-
-	for (i = 0; i < i7core_dev->n_devs; i++) {
-		if (!i7core_dev->pdev[i])
-			continue;
-
-		if (PCI_SLOT(i7core_dev->pdev[i]->devfn) == slot &&
-		    PCI_FUNC(i7core_dev->pdev[i]->devfn) == func) {
-			return i7core_dev->pdev[i];
-		}
-	}
-
-	return NULL;
-}
-
-/**
- * i7core_get_active_channels() - gets the number of channels and csrows
- * @socket:	Quick Path Interconnect socket
- * @channels:	Number of channels that will be returned
- * @csrows:	Number of csrows found
- *
- * Since EDAC core needs to know in advance the number of available channels
- * and csrows, in order to allocate memory for csrows/channels, it is needed
- * to run two similar steps. At the first step, implemented on this function,
- * it checks the number of csrows/channels present at one socket.
- * this is used in order to properly allocate the size of mci components.
- *
- * It should be noticed that none of the current available datasheets explain
- * or even mention how csrows are seen by the memory controller. So, we need
- * to add a fake description for csrows.
- * So, this driver is attributing one DIMM memory for one csrow.
- */
-static int i7core_get_active_channels(const u8 socket, unsigned *channels,
-				      unsigned *csrows)
-{
-	struct pci_dev *pdev = NULL;
-	int i, j;
-	u32 status, control;
-
-	*channels = 0;
-	*csrows = 0;
-
-	pdev = get_pdev_slot_func(socket, 3, 0);
-	if (!pdev) {
-		i7core_printk(KERN_ERR, "Couldn't find socket %d fn 3.0!!!\n",
-			      socket);
-		return -ENODEV;
-	}
-
-	/* Device 3 function 0 reads */
-	pci_read_config_dword(pdev, MC_STATUS, &status);
-	pci_read_config_dword(pdev, MC_CONTROL, &control);
-
-	for (i = 0; i < NUM_CHANS; i++) {
-		u32 dimm_dod[3];
-		/* Check if the channel is active */
-		if (!(control & (1 << (8 + i))))
-			continue;
-
-		/* Check if the channel is disabled */
-		if (status & (1 << i))
-			continue;
-
-		pdev = get_pdev_slot_func(socket, i + 4, 1);
-		if (!pdev) {
-			i7core_printk(KERN_ERR, "Couldn't find socket %d "
-						"fn %d.%d!!!\n",
-						socket, i + 4, 1);
-			return -ENODEV;
-		}
-		/* Devices 4-6 function 1 */
-		pci_read_config_dword(pdev,
-				MC_DOD_CH_DIMM0, &dimm_dod[0]);
-		pci_read_config_dword(pdev,
-				MC_DOD_CH_DIMM1, &dimm_dod[1]);
-		pci_read_config_dword(pdev,
-				MC_DOD_CH_DIMM2, &dimm_dod[2]);
-
-		(*channels)++;
-
-		for (j = 0; j < 3; j++) {
-			if (!DIMM_PRESENT(dimm_dod[j]))
-				continue;
-			(*csrows)++;
-		}
-	}
-
-	debugf0("Number of active channels on socket %d: %d\n",
-		socket, *channels);
-
-	return 0;
-}
 
 static int get_dimm_config(struct mem_ctl_info *mci)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
-	struct csrow_info *csr;
 	struct pci_dev *pdev;
 	int i, j;
-	int csrow = 0, cschannel = 0;
 	enum edac_type mode;
 	enum mem_type mtype;
 
@@ -712,16 +610,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 
 			npages = MiB_TO_PAGES(size);
 
-			pvt->csrow_map[i][j] = csrow;
-
-			csr = &mci->csrows[csrow];
-			csr->channels[cschannel].dimm = dimm;
-			cschannel++;
-			if (cschannel >= MAX_DIMMS) {
-				cschannel = 0;
-				csrow++;
-			}
-
 			dimm->nr_pages = npages;
 
 			switch (banks) {
@@ -744,7 +632,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			dimm->grain = 8;
 			dimm->edac_mode = mode;
 			dimm->mtype = mtype;
-			csrow++;
 		}
 
 		pci_read_config_dword(pdev, MC_SAG_CH_0, &value[0]);
@@ -763,17 +650,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 				(value[j] & ((1 << 24) - 1)));
 	}
 
-	/* Clears the unused data */
-	while (csrow < NUM_CHANS && cschannel < MAX_DIMMS) {
-		csr = &mci->csrows[csrow];
-		csr->channels[cschannel].dimm = NULL;
-		cschannel++;
-		if (cschannel >= MAX_DIMMS) {
-			cschannel = 0;
-			csrow++;
-		}
-	}
-
 	return 0;
 }
 
@@ -1571,7 +1447,7 @@ error:
 /****************************************************************************
 			Error check routines
  ****************************************************************************/
-static void i7core_rdimm_update_csrow(struct mem_ctl_info *mci,
+static void i7core_rdimm_update_errcount(struct mem_ctl_info *mci,
 				      const int chan,
 				      const int dimm,
 				      const int add)
@@ -1579,11 +1455,8 @@ static void i7core_rdimm_update_csrow(struct mem_ctl_info *mci,
 	int i;
 
 	for (i = 0; i < add; i++) {
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_DIMM, mci,
-				     0, 0, 0,
-				     0, chan, dimm, -1, -1,
-				     "error", "");
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 0, 0, 0,
+				     chan, dimm, -1, "error", "", NULL);
 	}
 }
 
@@ -1624,11 +1497,11 @@ static void i7core_rdimm_update_ce_count(struct mem_ctl_info *mci,
 
 	/*updated the edac core */
 	if (add0 != 0)
-		i7core_rdimm_update_csrow(mci, chan, 0, add0);
+		i7core_rdimm_update_errcount(mci, chan, 0, add0);
 	if (add1 != 0)
-		i7core_rdimm_update_csrow(mci, chan, 1, add1);
+		i7core_rdimm_update_errcount(mci, chan, 1, add1);
 	if (add2 != 0)
-		i7core_rdimm_update_csrow(mci, chan, 2, add2);
+		i7core_rdimm_update_errcount(mci, chan, 2, add2);
 
 }
 
@@ -1759,7 +1632,6 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 	u32 channel = (m->misc >> 18) & 0x3;
 	u32 syndrome = m->misc >> 32;
 	u32 errnum = find_first_bit(&error, 32);
-	int csrow;
 
 	if (uncorrected_error) {
 		if (ripv) {
@@ -1832,21 +1704,18 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 		(long long) m->addr, m->cpu, core_err_cnt,
 		(long long)m->status, (long long)m->misc, optype, err);
 
-	csrow = pvt->csrow_map[channel][dimm];
-
 	/*
 	 * Call the helper to output message
 	 * FIXME: what to do if core_err_cnt > 1? Currently, it generates
 	 * only one event
 	 */
 	if (uncorrected_error || !pvt->is_registered)
-		edac_mc_handle_error(tp_event,
-				     HW_EVENT_SCOPE_MC_DIMM, mci,
+		edac_mc_handle_error(tp_event, mci,
 				     m->addr >> PAGE_SHIFT,
 				     m->addr & ~PAGE_MASK,
 				     syndrome,
-				     0, channel, dimm, -1, -1,
-				     err, msg);
+				     channel, dimm, -1,
+				     err, msg, m);
 
 	kfree(msg);
 }
@@ -2265,18 +2134,19 @@ static int i7core_register_mci(struct i7core_dev *i7core_dev)
 {
 	struct mem_ctl_info *mci;
 	struct i7core_pvt *pvt;
-	int rc, channels, csrows;
-
-	/* Check the number of active and not disabled channels */
-	rc = i7core_get_active_channels(i7core_dev->socket, &channels, &csrows);
-	if (unlikely(rc < 0))
-		return rc;
+	int rc;
+	struct edac_mc_layer layers[2];
 
 	/* allocate a new MC control structure */
 
-	mci = edac_mc_alloc(EDAC_ALLOC_FILL_PRIV, i7core_dev->socket,
-			    1, NUM_CHANS, MAX_DIMMS,
-			    MAX_DIMMS, NUM_CHANS, sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_CHANNEL;
+	layers[0].size = NUM_CHANS;
+	layers[0].is_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_SLOT;
+	layers[1].size = MAX_DIMMS;
+	layers[1].is_csrow = true;
+	mci = edac_mc_alloc(i7core_dev->socket, ARRAY_SIZE(layers), layers,
+			    false, sizeof(*pvt));
 	if (unlikely(!mci))
 		return -ENOMEM;
 
diff --git a/drivers/edac/i82443bxgx_edac.c b/drivers/edac/i82443bxgx_edac.c
index 0992549..09d39c0 100644
--- a/drivers/edac/i82443bxgx_edac.c
+++ b/drivers/edac/i82443bxgx_edac.c
@@ -156,23 +156,19 @@ static int i82443bxgx_edacmc_process_error_info(struct mem_ctl_info *mci,
 	if (info->eap & I82443BXGX_EAP_OFFSET_SBE) {
 		error_found = 1;
 		if (handle_errors)
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, page, pageoffset, 0,
-					     -1, -1, -1,
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+					     page, pageoffset, 0,
 					     edac_mc_find_csrow_by_page(mci, page),
-					     0, mci->ctl_name, 0);
+					     0, -1, mci->ctl_name, "", NULL);
 	}
 
 	if (info->eap & I82443BXGX_EAP_OFFSET_MBE) {
 		error_found = 1;
 		if (handle_errors)
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, page, pageoffset, 0,
-					     -1, -1, -1,
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
+					     page, pageoffset, 0,
 					     edac_mc_find_csrow_by_page(mci, page),
-					     0, mci->ctl_name, 0);
+					     0, -1, mci->ctl_name, "", NULL);
 	}
 
 	return error_found;
@@ -239,6 +235,7 @@ static void i82443bxgx_init_csrows(struct mem_ctl_info *mci,
 static int i82443bxgx_edacmc_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	u8 dramc;
 	u32 nbxcfg, ecc_mode;
 	enum mem_type mtype;
@@ -252,10 +249,13 @@ static int i82443bxgx_edacmc_probe1(struct pci_dev *pdev, int dev_idx)
 	if (pci_read_config_dword(pdev, I82443BXGX_NBXCFG, &nbxcfg))
 		return -EIO;
 
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, I82443BXGX_NR_CSROWS,
-			    I82443BXGX_NR_CSROWS, I82443BXGX_NR_CHANS, 0);
-
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = I82443BXGX_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = I82443BXGX_NR_CHANS;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (mci == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index 3ab8a7a..85ed3a6 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -109,10 +109,8 @@ static int i82860_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0003) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1, "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
@@ -121,19 +119,15 @@ static int i82860_process_error_info(struct mem_ctl_info *mci,
 	dimm = mci->csrows[row].channels[0].dimm;
 
 	if (info->errsts & 0x0002)
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_DIMM, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     info->eap, 0, 0,
-				     dimm->mc_branch, dimm->mc_channel,
-				     dimm->mc_dimm_number, -1, -1,
-				     "i82860 UE", "");
+				     dimm->location[0], dimm->location[1], -1,
+				     "i82860 UE", "", NULL);
 	else
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_DIMM, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     info->eap, 0, info->derrsyn,
-				     dimm->mc_branch, dimm->mc_channel,
-				     dimm->mc_dimm_number, -1, -1,
-				     "i82860 CE", "");
+				     dimm->location[0], dimm->location[1], -1,
+				     "i82860 CE", "", NULL);
 
 	return 1;
 }
@@ -193,6 +187,7 @@ static void i82860_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev)
 static int i82860_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct i82860_error_info discard;
 
 	/*
@@ -205,12 +200,13 @@ static int i82860_probe1(struct pci_dev *pdev, int dev_idx)
 	 * the channel and the GRA registers map to physical devices so we are
 	 * going to make 1 channel for group.
 	 */
-
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    1, 2 /* channels */, 8 /* sticks per channel */,
-			    16, 1,
-			    0);
-
+	layers[0].type = EDAC_MC_LAYER_CHANNEL;
+	layers[0].size = 2;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_SLOT;
+	layers[1].size = 8;
+	layers[1].is_csrow = true;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (!mci)
 		return -ENOMEM;
 
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index 74afaba..471b26a 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -236,10 +236,9 @@ static int i82875p_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0081) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1,
+				     "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
@@ -247,18 +246,15 @@ static int i82875p_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, info->eap);
 
 	if (info->errsts & 0x0080)
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     info->eap, 0, 0,
-				     -1, -1, -1, row, -1,
-				     "i82875p UE", "");
+				     row, -1, -1,
+				     "i82875p UE", "", NULL);
 	else
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     info->eap, 0, info->derrsyn,
-				     -1, -1, -1, row,
-				     multi_chan ? (info->des & 0x1) : 0,
-				     "i82875p CE", "");
+				     row, multi_chan ? (info->des & 0x1) : 0,
+				     -1, "i82875p CE", "", NULL);
 
 	return 1;
 }
@@ -401,6 +397,7 @@ static int i82875p_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	int rc = -ENODEV;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct i82875p_pvt *pvt;
 	struct pci_dev *ovrfl_pdev;
 	void __iomem *ovrfl_window;
@@ -416,10 +413,14 @@ static int i82875p_probe1(struct pci_dev *pdev, int dev_idx)
 		return -ENODEV;
 	drc = readl(ovrfl_window + I82875P_DRC);
 	nr_chans = dual_channel_active(drc) + 1;
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, I82875P_NR_DIMMS,
-			    I82875P_NR_CSROWS(nr_chans), nr_chans,
-			    sizeof(*pvt));
+
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = I82875P_NR_CSROWS(nr_chans);
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = nr_chans;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 	if (!mci) {
 		rc = -ENOMEM;
 		goto fail0;
diff --git a/drivers/edac/i82975x_edac.c b/drivers/edac/i82975x_edac.c
index 33feeba..c0a683a 100644
--- a/drivers/edac/i82975x_edac.c
+++ b/drivers/edac/i82975x_edac.c
@@ -290,10 +290,8 @@ static int i82975x_process_error_info(struct mem_ctl_info *mci,
 		return 1;
 
 	if ((info->errsts ^ info->errsts2) & 0x0003) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1, "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
@@ -307,18 +305,15 @@ static int i82975x_process_error_info(struct mem_ctl_info *mci,
 	row = edac_mc_find_csrow_by_page(mci, page);
 
 	if (info->errsts & 0x0002)
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     page, offst, 0,
-				     -1, -1, -1, row, -1,
-				     "i82975x UE", "");
+				     row, -1, -1,
+				     "i82975x UE", "", NULL);
 	else
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     page, offst, info->derrsyn,
-				     -1, -1, -1, row,
-				     multi_chan ? chan : 0,
-				     "i82975x CE", "");
+				     row, multi_chan ? chan : 0, -1,
+				     "i82975x CE", "", NULL);
 
 	return 1;
 }
@@ -476,6 +471,7 @@ static int i82975x_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	int rc = -ENODEV;
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct i82975x_pvt *pvt;
 	void __iomem *mch_window;
 	u32 mchbar;
@@ -544,10 +540,13 @@ static int i82975x_probe1(struct pci_dev *pdev, int dev_idx)
 	chans = dual_channel_active(mch_window) + 1;
 
 	/* assuming only one controller, index thus is 0 */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, I82975X_NR_DIMMS,
-			    I82975X_NR_CSROWS(chans), chans,
-			    sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = I82975X_NR_CSROWS(chans);
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = chans;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, sizeof(*pvt));
 	if (!mci) {
 		rc = -ENOMEM;
 		goto fail1;
diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index f7c3a67..d074b71 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -854,18 +854,16 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 		mpc85xx_mc_printk(mci, KERN_ERR, "PFN out of range!\n");
 
 	if (err_detect & DDR_EDE_SBE)
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     pfn, err_addr & ~PAGE_MASK, syndrome,
-				     -1, -1, -1, row_index, 0,
-				     mci->ctl_name, "");
+				     row_index, 0, -1,
+				     mci->ctl_name, "", NULL);
 
 	if (err_detect & DDR_EDE_MBE)
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     pfn, err_addr & ~PAGE_MASK, syndrome,
-				     -1, -1, -1, row_index, 0,
-				     mci->ctl_name, "");
+				     row_index, 0, -1,
+				     mci->ctl_name, "", NULL);
 
 	out_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT, err_detect);
 }
@@ -967,6 +965,7 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 static int __devinit mpc85xx_mc_err_probe(struct platform_device *op)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct mpc85xx_mc_pdata *pdata;
 	struct resource r;
 	u32 sdram_ctl;
@@ -975,8 +974,14 @@ static int __devinit mpc85xx_mc_err_probe(struct platform_device *op)
 	if (!devres_open_group(&op->dev, mpc85xx_mc_err_probe, GFP_KERNEL))
 		return -ENOMEM;
 
-	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, 4, 4, 1, sizeof(*pdata));
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = 4;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = 1;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), layers, false,
+			    sizeof(*pdata));
 	if (!mci) {
 		devres_release_group(&op->dev, mpc85xx_mc_err_probe);
 		return -ENOMEM;
@@ -1165,7 +1170,6 @@ static void __init mpc85xx_mc_clear_rfxe(void *data)
 static int __init mpc85xx_mc_init(void)
 {
 	int res = 0;
-	u32 pvr = 0;
 
 	printk(KERN_INFO "Freescale(R) MPC85xx EDAC driver, "
 	       "(C) 2006 Montavista Software\n");
diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
index 96a675a..a32e9b6 100644
--- a/drivers/edac/mv64x60_edac.c
+++ b/drivers/edac/mv64x60_edac.c
@@ -611,19 +611,17 @@ static void mv64x60_mc_check(struct mem_ctl_info *mci)
 
 	/* first bit clear in ECC Err Reg, 1 bit error, correctable by HW */
 	if (!(reg & 0x1))
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     err_addr >> PAGE_SHIFT,
 				     err_addr & PAGE_MASK, syndrome,
-				     -1, -1, -1, 0, 0,
-				     mci->ctl_name, "");
+				     0, 0, -1,
+				     mci->ctl_name, "", NULL);
 	else	/* 2 bit error, UE */
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     err_addr >> PAGE_SHIFT,
 				     err_addr & PAGE_MASK, 0,
-				     -1, -1, -1, 0, 0,
-				     mci->ctl_name, "");
+				     0, 0, -1,
+				     mci->ctl_name, "", NULL);
 
 	/* clear the error */
 	out_le32(pdata->mc_vbase + MV64X60_SDRAM_ERR_ADDR, 0);
@@ -702,6 +700,7 @@ static void mv64x60_init_csrows(struct mem_ctl_info *mci,
 static int __devinit mv64x60_mc_err_probe(struct platform_device *pdev)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct mv64x60_mc_pdata *pdata;
 	struct resource *r;
 	u32 ctl;
@@ -710,9 +709,14 @@ static int __devinit mv64x60_mc_err_probe(struct platform_device *pdev)
 	if (!devres_open_group(&pdev->dev, mv64x60_mc_err_probe, GFP_KERNEL))
 		return -ENOMEM;
 
-	mci = edac_mc_alloc(edac_mc_idx, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, 1,
-			    1, 1, sizeof(struct mv64x60_mc_pdata));
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = 1;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = 1;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), layers, false,
+			    sizeof(struct mv64x60_mc_pdata));
 	if (!mci) {
 		printk(KERN_ERR "%s: No memory for CPU err\n", __func__);
 		devres_release_group(&pdev->dev, mv64x60_mc_err_probe);
diff --git a/drivers/edac/pasemi_edac.c b/drivers/edac/pasemi_edac.c
index 0d0a545..2959db6 100644
--- a/drivers/edac/pasemi_edac.c
+++ b/drivers/edac/pasemi_edac.c
@@ -110,20 +110,16 @@ static void pasemi_edac_process_error_info(struct mem_ctl_info *mci, u32 errsta)
 	/* uncorrectable/multi-bit errors */
 	if (errsta & (MCDEBUG_ERRSTA_MBE_STATUS |
 		      MCDEBUG_ERRSTA_RFL_STATUS)) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     mci->csrows[cs].first_page, 0, 0,
-				     -1, -1, -1, cs, 0,
-				     mci->ctl_name, "");
+				     cs, 0, -1, mci->ctl_name, "", NULL);
 	}
 
 	/* correctable/single-bit errors */
 	if (errsta & MCDEBUG_ERRSTA_SBE_STATUS)
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     mci->csrows[cs].first_page, 0, 0,
-				     -1, -1, -1, cs, 0,
-				     mci->ctl_name, "");
+				     cs, 0, -1, mci->ctl_name, "", NULL);
 }
 
 static void pasemi_edac_check(struct mem_ctl_info *mci)
@@ -196,6 +192,7 @@ static int __devinit pasemi_edac_probe(struct pci_dev *pdev,
 		const struct pci_device_id *ent)
 {
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	u32 errctl1, errcor, scrub, mcen;
 
 	pci_read_config_dword(pdev, MCCFG_MCEN, &mcen);
@@ -212,10 +209,14 @@ static int __devinit pasemi_edac_probe(struct pci_dev *pdev,
 		MCDEBUG_ERRCTL1_RFL_LOG_EN;
 	pci_write_config_dword(pdev, MCDEBUG_ERRCTL1, errctl1);
 
-	mci = edac_mc_alloc(system_mmc_id++, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, PASEMI_EDAC_NR_CSROWS,
-			    PASEMI_EDAC_NR_CSROWS, PASEMI_EDAC_NR_CHANS, 0);
-
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = PASEMI_EDAC_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = PASEMI_EDAC_NR_CHANS;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(system_mmc_id++, ARRAY_SIZE(layers), layers, false,
+			    0);
 	if (mci == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/edac/ppc4xx_edac.c b/drivers/edac/ppc4xx_edac.c
index 2e393cb..89ffc39 100644
--- a/drivers/edac/ppc4xx_edac.c
+++ b/drivers/edac/ppc4xx_edac.c
@@ -214,7 +214,7 @@ static struct platform_driver ppc4xx_edac_driver = {
  * TODO: The row and channel parameters likely need to be dynamically
  * set based on the aforementioned variant controller realizations.
  */
-static const unsigned ppc4xx_edac_num_csrows = 2;
+static const unsigned ppc4xx_edac_nr_csrows = 2;
 static const unsigned ppc4xx_edac_nr_chans = 1;
 
 /*
@@ -727,10 +727,10 @@ ppc4xx_edac_handle_ce(struct mem_ctl_info *mci,
 
 	for (row = 0; row < mci->num_csrows; row++)
 		if (ppc4xx_edac_check_bank_error(status, row))
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-					     -1, -1, -1, -1, -1,
-					     message, "");
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+					     0, 0, 0,
+					     row, 0, -1,
+					     message, "", NULL);
 }
 
 /**
@@ -758,11 +758,10 @@ ppc4xx_edac_handle_ue(struct mem_ctl_info *mci,
 
 	for (row = 0; row < mci->num_csrows; row++)
 		if (ppc4xx_edac_check_bank_error(status, row))
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 					     page, offset, 0,
-					     -1, -1, -1, -1, -1,
-					     message, "");
+					     row, 0, -1,
+					     message, "", NULL);
 }
 
 /**
@@ -1240,6 +1239,7 @@ static int __devinit ppc4xx_edac_probe(struct platform_device *op)
 	dcr_host_t dcr_host;
 	const struct device_node *np = op->dev.of_node;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	static int ppc4xx_edac_instance;
 
 	/*
@@ -1285,14 +1285,14 @@ static int __devinit ppc4xx_edac_probe(struct platform_device *op)
 	 * controller instance and perform the appropriate
 	 * initialization.
 	 */
-
-	mci = edac_mc_alloc(ppc4xx_edac_instance,
-			    EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, ppc4xx_edac_num_csrows * ppc4xx_edac_nr_chans,
-			    ppc4xx_edac_num_csrows,
-			    ppc4xx_edac_nr_chans,
-			    sizeof(struct ppc4xx_edac_pdata));
-
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = ppc4xx_edac_nr_csrows;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = ppc4xx_edac_nr_chans;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(ppc4xx_edac_instance, ARRAY_SIZE(layers), layers,
+			    false, sizeof(struct ppc4xx_edac_pdata));
 	if (mci == NULL) {
 		ppc4xx_edac_printk(KERN_ERR, "%s: "
 				   "Failed to allocate EDAC MC instance!\n",
diff --git a/drivers/edac/r82600_edac.c b/drivers/edac/r82600_edac.c
index 214bc48..f820c14 100644
--- a/drivers/edac/r82600_edac.c
+++ b/drivers/edac/r82600_edac.c
@@ -179,13 +179,11 @@ static int r82600_process_error_info(struct mem_ctl_info *mci,
 		error_found = 1;
 
 		if (handle_errors)
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, page, 0, syndrome,
-					     -1, -1, -1,
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+					     page, 0, syndrome,
 					     edac_mc_find_csrow_by_page(mci, page),
-					     0,
-					     mci->ctl_name, "");
+					     0, -1,
+					     mci->ctl_name, "", NULL);
 	}
 
 	if (info->eapr & BIT(1)) {	/* UE? */
@@ -193,13 +191,11 @@ static int r82600_process_error_info(struct mem_ctl_info *mci,
 
 		if (handle_errors)
 			/* 82600 doesn't give enough info */
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-					     mci, page, 0, 0,
-					     -1, -1, -1,
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
+					     page, 0, 0,
 					     edac_mc_find_csrow_by_page(mci, page),
-					     0,
-					     mci->ctl_name, "");
+					     0, -1,
+					     mci->ctl_name, "", NULL);
 	}
 
 	return error_found;
@@ -274,6 +270,7 @@ static void r82600_init_csrows(struct mem_ctl_info *mci, struct pci_dev *pdev,
 static int r82600_probe1(struct pci_dev *pdev, int dev_idx)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	u8 dramcr;
 	u32 eapr;
 	u32 scrub_disabled;
@@ -288,11 +285,13 @@ static int r82600_probe1(struct pci_dev *pdev, int dev_idx)
 	debugf2("%s(): sdram refresh rate = %#0x\n", __func__,
 		sdram_refresh_rate);
 	debugf2("%s(): DRAMC register = %#0x\n", __func__, dramcr);
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, R82600_NR_DIMMS,
-			    R82600_NR_CSROWS, R82600_NR_CHANS,
-			    0);
-
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = R82600_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = R82600_NR_CHANS;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (mci == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 5df6ade..4745c94 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -313,8 +313,6 @@ struct sbridge_pvt {
 	struct sbridge_info	info;
 	struct sbridge_channel	channel[NUM_CHANNELS];
 
-	int 			csrow_map[NUM_CHANNELS][MAX_DIMMS];
-
 	/* Memory type detection */
 	bool			is_mirrored, is_lockstep, is_close_pg;
 
@@ -486,29 +484,14 @@ static struct pci_dev *get_pdev_slot_func(u8 bus, unsigned slot,
 }
 
 /**
- * sbridge_get_active_channels() - gets the number of channels and csrows
+ * check_if_ecc_is_active() - Checks if ECC is active
  * bus:		Device bus
- * @channels:	Number of channels that will be returned
- * @csrows:	Number of csrows found
- *
- * Since EDAC core needs to know in advance the number of available channels
- * and csrows, in order to allocate memory for csrows/channels, it is needed
- * to run two similar steps. At the first step, implemented on this function,
- * it checks the number of csrows/channels present at one socket, identified
- * by the associated PCI bus.
- * this is used in order to properly allocate the size of mci components.
- * Note: one csrow is one dimm.
  */
-static int sbridge_get_active_channels(const u8 bus, unsigned *channels,
-				      unsigned *csrows)
+static int check_if_ecc_is_active(const u8 bus)
 {
 	struct pci_dev *pdev = NULL;
-	int i, j;
 	u32 mcmtr;
 
-	*channels = 0;
-	*csrows = 0;
-
 	pdev = get_pdev_slot_func(bus, 15, 0);
 	if (!pdev) {
 		sbridge_printk(KERN_ERR, "Couldn't find PCI device "
@@ -522,41 +505,13 @@ static int sbridge_get_active_channels(const u8 bus, unsigned *channels,
 		sbridge_printk(KERN_ERR, "ECC is disabled. Aborting\n");
 		return -ENODEV;
 	}
-
-	for (i = 0; i < NUM_CHANNELS; i++) {
-		u32 mtr;
-
-		/* Device 15 functions 2 - 5  */
-		pdev = get_pdev_slot_func(bus, 15, 2 + i);
-		if (!pdev) {
-			sbridge_printk(KERN_ERR, "Couldn't find PCI device "
-						 "%2x.%02d.%d!!!\n",
-						 bus, 15, 2 + i);
-			return -ENODEV;
-		}
-		(*channels)++;
-
-		for (j = 0; j < ARRAY_SIZE(mtr_regs); j++) {
-			pci_read_config_dword(pdev, mtr_regs[j], &mtr);
-			debugf1("Bus#%02x channel #%d  MTR%d = %x\n", bus, i, j, mtr);
-			if (IS_DIMM_PRESENT(mtr))
-				(*csrows)++;
-		}
-	}
-
-	debugf0("Number of active channels: %d, number of active dimms: %d\n",
-		*channels, *csrows);
-
 	return 0;
 }
 
 static int get_dimm_config(struct mem_ctl_info *mci)
 {
 	struct sbridge_pvt *pvt = mci->pvt_info;
-	struct csrow_info *csr;
 	int i, j, banks, ranks, rows, cols, size, npages;
-	int csrow = 0;
-	unsigned long last_page = 0;
 	u32 reg;
 	enum edac_type mode;
 	enum mem_type mtype;
@@ -635,16 +590,6 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 					size, npages,
 					banks, ranks, rows, cols);
 
-				/*
-				 * Fake stuff. This controller doesn't see
-				 * csrows.
-				 */
-				csr = &mci->csrows[csrow];
-				pvt->csrow_map[i][j] = csrow;
-				last_page += npages;
-				csrow++;
-
-				csr->channels[0].dimm = dimm;
 				dimm->nr_pages = npages;
 				dimm->grain = 32;
 				dimm->dtype = (banks == 8) ? DEV_X8 : DEV_X4;
@@ -1397,7 +1342,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	u32 optypenum = GET_BITFIELD(m->status, 4, 6);
 	long channel_mask, first_channel;
 	u8  rank, socket;
-	int csrow, rc, dimm;
+	int rc, dimm;
 	char *area_type = "Unknown";
 
 	if (uncorrected_error) {
@@ -1470,8 +1415,6 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	else
 		dimm = 2;
 
-	csrow = pvt->csrow_map[first_channel][dimm];
-
 	if (uncorrected_error && recoverable)
 		recoverable_msg = " recoverable";
 	else
@@ -1501,17 +1444,15 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	/* FIXME: need support for channel mask */
 
 	/* Call the helper to output message */
-	edac_mc_handle_error(tp_event,
-			     HW_EVENT_SCOPE_MC_DIMM, mci,
+	edac_mc_handle_error(tp_event, mci,
 			     m->addr >> PAGE_SHIFT, m->addr & ~PAGE_MASK, 0,
-			     0, channel, dimm, -1, -1,
-			     optype, msg);
+			     channel, dimm, -1,
+			     optype, msg, m);
 	return;
 err_parsing:
-	edac_mc_handle_error(tp_event,
-			     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-			     -1, -1, -1, -1, -1,
-			     msg, "");
+	edac_mc_handle_error(tp_event, mci, 0, 0, 0,
+			     -1, -1, -1,
+			     msg, "", m);
 
 }
 
@@ -1674,18 +1615,25 @@ static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
 static int sbridge_register_mci(struct sbridge_dev *sbridge_dev)
 {
 	struct mem_ctl_info *mci;
+	struct edac_mc_layer layers[2];
 	struct sbridge_pvt *pvt;
-	int rc, channels, dimms;
+	int rc;
 
 	/* Check the number of active and not disabled channels */
-	rc = sbridge_get_active_channels(sbridge_dev->bus, &channels, &dimms);
+	rc = check_if_ecc_is_active(sbridge_dev->bus);
 	if (unlikely(rc < 0))
 		return rc;
 
 	/* allocate a new MC control structure */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    1, channels, dimms,
-			    dimms, channels, sizeof(*pvt));
+	layers[0].type = EDAC_MC_LAYER_CHANNEL;
+	layers[0].size = NUM_CHANNELS;
+	layers[0].is_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_SLOT;
+	layers[1].size = MAX_DIMMS;
+	layers[1].is_csrow = true;
+	mci = edac_mc_alloc(sbridge_dev->mc, ARRAY_SIZE(layers), layers,
+			    false, sizeof(*pvt));
+
 	if (unlikely(!mci))
 		return -ENOMEM;
 
diff --git a/drivers/edac/tile_edac.c b/drivers/edac/tile_edac.c
index 19ac19e..9a91826 100644
--- a/drivers/edac/tile_edac.c
+++ b/drivers/edac/tile_edac.c
@@ -71,11 +71,10 @@ static void tile_edac_check(struct mem_ctl_info *mci)
 	if (mem_error.sbe_count != priv->ce_count) {
 		dev_dbg(mci->dev, "ECC CE err on node %d\n", priv->node);
 		priv->ce_count = mem_error.sbe_count;
-		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-				     HW_EVENT_SCOPE_MC_CSROW_CHANNEL, mci,
+		edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 				     0, 0, 0,
-				     -1, -1, -1, 0, 0,
-				     mci->ctl_name, "");
+				     0, 0, -1,
+				     mci->ctl_name, "", NULL);
 	}
 }
 
@@ -126,6 +125,7 @@ static int __devinit tile_edac_mc_probe(struct platform_device *pdev)
 	char			hv_file[32];
 	int			hv_devhdl;
 	struct mem_ctl_info	*mci;
+	struct edac_mc_layer	layers[2];
 	struct tile_edac_priv	*priv;
 	int			rc;
 
@@ -135,9 +135,13 @@ static int __devinit tile_edac_mc_probe(struct platform_device *pdev)
 		return -EINVAL;
 
 	/* A TILE MC has a single channel and one chip-select row. */
-	mci = edac_mc_alloc(pdev->id, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    0, 0, TILE_EDAC_NR_CSROWS,
-			    TILE_EDAC_NR_CSROWS, TILE_EDAC_NR_CHANS,
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = TILE_EDAC_NR_CSROWS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = TILE_EDAC_NR_CHANS;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(pdev->id, ARRAY_SIZE(layers), layers, false,
 			    sizeof(struct tile_edac_priv));
 	if (mci == NULL)
 		return -ENOMEM;
diff --git a/drivers/edac/x38_edac.c b/drivers/edac/x38_edac.c
index 27cf304..5f3c57f 100644
--- a/drivers/edac/x38_edac.c
+++ b/drivers/edac/x38_edac.c
@@ -215,29 +215,26 @@ static void x38_process_error_info(struct mem_ctl_info *mci,
 		return;
 
 	if ((info->errsts ^ info->errsts2) & X38_ERRSTS_BITS) {
-		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-				     HW_EVENT_SCOPE_MC, mci, 0, 0, 0,
-				     -1, -1, -1, -1, -1,
-				     "UE overwrote CE", "");
+		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 0, 0, 0,
+				     -1, -1, -1,
+				     "UE overwrote CE", "", NULL);
 		info->errsts = info->errsts2;
 	}
 
 	for (channel = 0; channel < x38_channel_num; channel++) {
 		log = info->eccerrlog[channel];
 		if (log & X38_ECCERRLOG_UE) {
-			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 					     0, 0, 0,
-					     -1, -1, -1,
-					     eccerrlog_row(channel, log), -1,
-					     "x38 UE", "");
+					     eccerrlog_row(channel, log),
+					     -1, -1,
+					     "x38 UE", "", NULL);
 		} else if (log & X38_ECCERRLOG_CE) {
-			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED,
-					     HW_EVENT_SCOPE_MC_CSROW, mci,
+			edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
 					     0, 0, eccerrlog_syndrome(log),
-					     -1, -1, -1,
-					     eccerrlog_row(channel, log), -1,
-					     "x38 CE", "");
+					     eccerrlog_row(channel, log),
+					     -1, -1,
+					     "x38 CE", "", NULL);
 		}
 	}
 }
@@ -329,6 +326,7 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	int rc;
 	int i, j;
 	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
 	u16 drbs[X38_CHANNELS][X38_RANKS_PER_CHANNEL];
 	bool stacked;
 	void __iomem *window;
@@ -344,10 +342,13 @@ static int x38_probe1(struct pci_dev *pdev, int dev_idx)
 	how_many_channel(pdev);
 
 	/* FIXME: unconventional pvt_info usage */
-	mci = edac_mc_alloc(0, EDAC_ALLOC_FILL_CSROW_CSCHANNEL,
-			    -1, -1, X38_RANKS,
-			    X38_RANKS, x38_channel_num,
-			    0);
+	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+	layers[0].size = X38_RANKS;
+	layers[0].is_csrow = true;
+	layers[1].type = EDAC_MC_LAYER_CHANNEL;
+	layers[1].size = x38_channel_num;
+	layers[1].is_csrow = false;
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, false, 0);
 	if (!mci)
 		return -ENOMEM;
 
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 1d707f4..4d84e40 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -206,41 +206,6 @@ enum hw_event_mc_err_type {
 };
 
 /**
- * enum hw_event_error_scope - escope of a memory error
- *
- * @HW_EVENT_ERR_MC:		error can be anywhere inside the MC
- * @HW_EVENT_SCOPE_MC_BRANCH:	error can be on any DIMM inside the branch
- * @HW_EVENT_SCOPE_MC_CHANNEL:	error can be on any DIMM inside the MC channel
- * @HW_EVENT_SCOPE_MC_DIMM:	error is on a specific DIMM
- * @HW_EVENT_SCOPE_MC_CSROW:	error can be on any DIMM inside the csrow
- * @HW_EVENT_SCOPE_MC_CSROW_CHANNEL: error is on a CSROW channel
- *
- * Depending on the error detection algorithm, the memory topology and even
- * the MC capabilities, some errors can't be attributed to just one DIMM, but
- * to a group of memory sockets. Depending on where the error occurs, the
- * EDAC core will increment the corresponding error count for that entity,
- * and the upper entities. For example, assuming a system with 1 memory
- * controller 2 branches, 2 MC channels and 4 DIMMS on it, if an error
- * happens at channel 0, the error counts for channel 0, for branch 0 and
- * for the memory controller 0 will be incremented. The DIMM error counts won't
- * be incremented, as, in this example, the driver can't be 100% sure on what
- * memory the error actually occurred.
- *
- * The order here is important, as edac_mc_handle_error() will use it, in order
- * to check what parameters will be used. The smallest number should be
- * the hole memory controller, and the last one should be the more
- * fine-grained detail, e. g.: DIMM.
- */
-enum hw_event_error_scope {
-	HW_EVENT_SCOPE_MC,
-	HW_EVENT_SCOPE_MC_BRANCH,
-	HW_EVENT_SCOPE_MC_CHANNEL,
-	HW_EVENT_SCOPE_MC_DIMM,
-	HW_EVENT_SCOPE_MC_CSROW,
-	HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
-};
-
-/**
  * enum mem_type - Type of the memory stick
  *
  * @MEM_EMPTY		Empty csrow
@@ -423,16 +388,51 @@ enum scrub_type {
 #define OP_RUNNING_POLL_INTR	0x203
 #define OP_OFFLINE		0x300
 
+/**
+ * enum edac_mc_layer_type - memory controller hierarchy layer
+ *
+ * @EDAC_MC_LAYER_BRANCH:	memory layer is named "branch"
+ * @EDAC_MC_LAYER_CHANNEL:	memory layer is named "channel"
+ * @EDAC_MC_LAYER_SLOT:		memory layer is named "slot"
+ * @EDAC_MC_LAYER_CHIP_SELECT:	memory layer is named "chip select"
+ *
+ * This enum is used by the drivers to tell edac_mc_sysfs what name should
+ * be used when describing a memory stick location.
+ */
+enum edac_mc_layer_type {
+	EDAC_MC_LAYER_BRANCH,
+	EDAC_MC_LAYER_CHANNEL,
+	EDAC_MC_LAYER_SLOT,
+	EDAC_MC_LAYER_CHIP_SELECT,
+};
+
+/**
+ * struct edac_mc_layer - describes the memory controller hierarchy
+ * @type:		layer type
+ * @size:		maximum size of the layer
+ * @is_csrow:		This layer is part of the "csrow" when old API
+ *			compatibility mode is enabled. Otherwise, it is
+ *			a channel
+ */
+struct edac_mc_layer {
+	enum edac_mc_layer_type	type;
+	unsigned 		size;
+	bool			is_csrow;
+};
+
+/*
+ * Maximum number of layers used by the memory controller to uniquely
+ * identify a single memory stick.
+ * NOTE: changing it also requires changing edac_mc_handle_error()
+ */
+#define EDAC_MAX_LAYERS		3
+
 /* FIXME: add the proper per-location error counts */
 struct dimm_info {
 	char label[EDAC_MC_LABEL_LEN + 1];	/* DIMM label on motherboard */
 
 	/* Memory location data */
-	int mc_branch;
-	int mc_channel;
-	int mc_dimm_number;
-	int csrow;
-	int cschannel;
+	unsigned location[EDAC_MAX_LAYERS];
 
 	struct kobject kobj;		/* sysfs kobject for this csrow */
 	struct mem_ctl_info *mci;	/* the parent */
@@ -442,13 +442,17 @@ struct dimm_info {
 	enum mem_type mtype;	/* memory dimm type */
 	enum edac_type edac_mode;	/* EDAC mode for this dimm */
 
-	u32 nr_pages;			/* number of pages in csrow */
+	u32 nr_pages;			/* number of pages on this dimm */
+
+	unsigned csrow, cschannel;	/* Points to the old API data */
 };
 
 struct csrow_channel_info {
 	int chan_idx;		/* channel index */
 	struct dimm_info *dimm;
 	struct csrow_info *csrow;	/* the parent */
+
+	u32 ce_count;		/* Correctable Errors for this csrow channel */
 };
 
 struct csrow_info {
@@ -460,6 +464,9 @@ struct csrow_info {
 	unsigned long page_mask;	/* used for interleaving -
 					 * 0UL for non intlv */
 
+	u32 ue_count;		/* Uncorrectable Errors for this csrow */
+	u32 ce_count;		/* Correctable Errors for this csrow */
+
 	struct mem_ctl_info *mci;	/* the parent */
 
 	struct kobject kobj;	/* sysfs kobject for this csrow */
@@ -497,22 +504,9 @@ struct mcidev_sysfs_attribute {
         ssize_t (*store)(struct mem_ctl_info *, const char *,size_t);
 };
 
-/*
- * Error counters for all possible memory arrangements
- */
-struct error_counts {
-	u32 ce_mc;
-	u32 *ce_branch;
-	u32 *ce_channel;
-	u32 *ce_dimm;
-	u32 *ce_csrow;
-	u32 *ce_cschannel;
-	u32 ue_mc;
-	u32 *ue_branch;
-	u32 *ue_channel;
-	u32 *ue_dimm;
-	u32 *ue_csrow;
-	u32 *ue_cschannel;
+struct edac_hierarchy {
+	char		*name;
+	unsigned	nr;
 };
 
 /* MEMORY controller information structure
@@ -560,14 +554,11 @@ struct mem_ctl_info {
 					   unsigned long page);
 	int mc_idx;
 	struct csrow_info *csrows;
+	unsigned num_csrows, num_cschannel;
 
-	/* Number of allocated memory location data */
-	unsigned num_branch;
-	unsigned num_channel;
-	unsigned num_dimm;
-	unsigned num_csrows;
-	unsigned num_cschannel;
-
+	/* Memory Controller hierarchy */
+	unsigned n_layers;
+	struct edac_mc_layer *layers;
 	/*
 	 * DIMM info. Will eventually remove the entire csrows_info some day
 	 */
@@ -589,8 +580,9 @@ struct mem_ctl_info {
 	unsigned long start_time;	/* mci load start time (in jiffies) */
 
 	/* drivers shouldn't access this struct directly */
-	struct error_counts err;
 	unsigned ce_noinfo_count, ue_noinfo_count;
+	unsigned ce_mc, ue_mc;
+	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
 
 	struct completion complete;
 
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index cbec44a..4c455c1 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -58,54 +58,41 @@ TRACE_EVENT(mc_error,
 		 const unsigned int mc_index,
 		 const char *msg,
 		 const char *label,
-		 const int branch,
-		 const int channel,
-		 const int dimm,
-		 const int csrow,
-		 const int cschannel,
+		 const char *location,
 		 const char *detail,
 		 const char *driver_detail),
 
-	TP_ARGS(err_type, mc_index, msg, label, branch, channel, dimm, csrow,
-		cschannel, detail, driver_detail),
+	TP_ARGS(err_type, mc_index, msg, label, location,
+		detail, driver_detail),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	err_type		)
 		__field(	unsigned int,	mc_index		)
-		__field(	int,		branch			)
-		__field(	int,		channel			)
-		__field(	int,		dimm			)
-		__field(	int,		csrow			)
-		__field(	int,		cschannel		)
 		__string(	msg,		msg			)
 		__string(	label,		label			)
 		__string(	detail,		detail			)
+		__string(	location,	location		)
 		__string(	driver_detail,	driver_detail		)
 	),
 
 	TP_fast_assign(
 		__entry->err_type		= err_type;
 		__entry->mc_index		= mc_index;
-		__entry->branch			= branch;
-		__entry->channel		= channel;
-		__entry->dimm			= dimm;
-		__entry->csrow			= csrow;
-		__entry->cschannel		= cschannel;
 		__assign_str(msg, msg);
 		__assign_str(label, label);
+		__assign_str(location, location);
 		__assign_str(detail, detail);
 		__assign_str(driver_detail, driver_detail);
 	),
 
-	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (location %d.%d.%d.%d.%d %s %s)\n",
+	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (%s %s %s)\n",
 		  __entry->mc_index,
 		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
 			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
 			"Fatal" : "Uncorrected"),
 		  __get_str(msg),
 		  __get_str(label),
-		  __entry->branch, __entry->channel, __entry->dimm,
-		  __entry->csrow, __entry->cschannel,
+		  __get_str(location),
 		  __get_str(detail),
 		  __get_str(driver_detail))
 );
-- 
1.7.8


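A minimal userspace sketch of how the new struct edac_mc_layer array describes a controller, and how a DIMM position can be turned into a location string like the one now passed to the mc_error tracepoint. It assumes an i7core-like controller with 3 channels of 3 slots each, and it assumes the names in layer_name[]; the real edac_layer_name[] table lives in edac_mc.c and is not part of this excerpt. The "%s %d " format mirrors dimmdev_location_show() in edac_mc_sysfs.c; whether edac_mc_handle_error() formats the tracepoint string the same way is not visible here. format_location() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirror of the definitions added to include/linux/edac.h */
enum edac_mc_layer_type {
	EDAC_MC_LAYER_BRANCH,
	EDAC_MC_LAYER_CHANNEL,
	EDAC_MC_LAYER_SLOT,
	EDAC_MC_LAYER_CHIP_SELECT,
};

struct edac_mc_layer {
	enum edac_mc_layer_type type;
	unsigned size;
	bool is_csrow;
};

#define EDAC_MAX_LAYERS 3

/* Assumed names; the kernel's edac_layer_name[] is not shown here */
static const char *const layer_name[] = {
	[EDAC_MC_LAYER_BRANCH]      = "branch",
	[EDAC_MC_LAYER_CHANNEL]     = "channel",
	[EDAC_MC_LAYER_SLOT]        = "slot",
	[EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
};

/*
 * Hypothetical helper: writes "<layer name> <index> " for each layer,
 * like dimmdev_location_show(). Returns the string length, or -1 if
 * buf is too small.
 */
static int format_location(char *buf, size_t len,
			   const struct edac_mc_layer *layers,
			   unsigned n_layers, const unsigned *location)
{
	size_t used = 0;
	unsigned i;

	if (len)
		buf[0] = '\0';
	for (i = 0; i < n_layers && i < EDAC_MAX_LAYERS; i++) {
		int n;

		assert(location[i] < layers[i].size);
		n = snprintf(buf + used, len - used, "%s %u ",
			     layer_name[layers[i].type], location[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* buffer too small */
		used += (size_t)n;
	}
	return (int)used;
}
```

For a DIMM at channel 1, slot 2 this yields "channel 1 slot 2 ". A single string, rather than the five fixed integer fields it replaces, appears to let drivers with different hierarchies share one tracepoint format.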

* [PATCH v3 20/31] edac: Export MC hierarchy counters for CE and UE
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (18 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 19/31] edac: rework memory layer hierarchy description Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 21/31] hw_event: Add x86 MCE events on it Mauro Carvalho Chehab
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

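This patch exports per-layer CE and UE counters as files in the memory
controller's sysfs directory: for every layer L except the last one, one
ce<suffix> and one ue<suffix> node per combination of positions in
layers 0..L. The last layer is skipped because its counts are already
the per-DIMM ones. A minimal userspace sketch of how
edac_create_errcount_layer() walks those positions and builds the
suffix; struct layer, MAX_LAYERS, node_suffix() and next_pos() are
hypothetical, and the layer names are assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LAYERS 3	/* mirrors EDAC_MAX_LAYERS */

struct layer {
	const char *name;	/* assumed layer names, e.g. "branch" */
	unsigned size;		/* number of positions in this layer */
};

/*
 * Build the "_<name><index>" suffix for the node at pos[0..layer];
 * the kernel code prefixes it with "ce" or "ue".
 */
static int node_suffix(char *buf, size_t len, const struct layer *layers,
		       unsigned layer, const unsigned *pos)
{
	size_t used = 0;
	unsigned j;

	assert(layer < MAX_LAYERS);
	if (len)
		buf[0] = '\0';
	for (j = 0; j <= layer; j++) {
		int n = snprintf(buf + used, len - used, "_%s%u",
				 layers[j].name, pos[j]);

		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* buffer too small */
		used += (size_t)n;
	}
	return 0;
}

/*
 * Advance pos[0..layer] odometer style (last layer fastest).
 * Returns 0 once every position was visited and pos wrapped to 0.
 */
static int next_pos(const struct layer *layers, unsigned layer, unsigned *pos)
{
	int j;

	for (j = (int)layer; j >= 0; j--) {
		if (++pos[j] < layers[j].size)
			return 1;
		pos[j] = 0;
	}
	return 0;
}
```

With layers branch (2), channel (2) and slot (4), layer 1 yields four
nodes per counter type, starting with ce_branch0_channel0 and
ue_branch0_channel0; the slot layer gets no nodes.
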
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac_dbg.c |    6 +-
 drivers/edac/amd64_edac_inj.c |   24 ++++--
 drivers/edac/edac_mc.c        |   19 ++++-
 drivers/edac/edac_mc_sysfs.c  |  182 +++++++++++++++++++++++++++++++++++++----
 drivers/edac/i7core_edac.c    |   26 ++++---
 include/linux/edac.h          |   17 ++++-
 6 files changed, 232 insertions(+), 42 deletions(-)

diff --git a/drivers/edac/amd64_edac_dbg.c b/drivers/edac/amd64_edac_dbg.c
index e356228..16c517a 100644
--- a/drivers/edac/amd64_edac_dbg.c
+++ b/drivers/edac/amd64_edac_dbg.c
@@ -1,7 +1,8 @@
 #include "amd64_edac.h"
 
 #define EDAC_DCT_ATTR_SHOW(reg)						\
-static ssize_t amd64_##reg##_show(struct mem_ctl_info *mci, char *data)	\
+static ssize_t amd64_##reg##_show(struct mem_ctl_info *mci, char *data,	\
+				  void *priv)				\
 {									\
 	struct amd64_pvt *pvt = mci->pvt_info;				\
 		return sprintf(data, "0x%016llx\n", (u64)pvt->reg);	\
@@ -12,7 +13,8 @@ EDAC_DCT_ATTR_SHOW(dbam0);
 EDAC_DCT_ATTR_SHOW(top_mem);
 EDAC_DCT_ATTR_SHOW(top_mem2);
 
-static ssize_t amd64_hole_show(struct mem_ctl_info *mci, char *data)
+static ssize_t amd64_hole_show(struct mem_ctl_info *mci, char *data,
+			       void *priv)
 {
 	u64 hole_base = 0;
 	u64 hole_offset = 0;
diff --git a/drivers/edac/amd64_edac_inj.c b/drivers/edac/amd64_edac_inj.c
index 303f10e..a6fd957 100644
--- a/drivers/edac/amd64_edac_inj.c
+++ b/drivers/edac/amd64_edac_inj.c
@@ -1,6 +1,7 @@
 #include "amd64_edac.h"
 
-static ssize_t amd64_inject_section_show(struct mem_ctl_info *mci, char *buf)
+static ssize_t amd64_inject_section_show(struct mem_ctl_info *mci, char *buf,
+					 void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	return sprintf(buf, "0x%x\n", pvt->injection.section);
@@ -13,7 +14,8 @@ static ssize_t amd64_inject_section_show(struct mem_ctl_info *mci, char *buf)
  * range: 0..3
  */
 static ssize_t amd64_inject_section_store(struct mem_ctl_info *mci,
-					  const char *data, size_t count)
+					  const char *data, size_t count,
+					  void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -33,7 +35,8 @@ static ssize_t amd64_inject_section_store(struct mem_ctl_info *mci,
 	return ret;
 }
 
-static ssize_t amd64_inject_word_show(struct mem_ctl_info *mci, char *buf)
+static ssize_t amd64_inject_word_show(struct mem_ctl_info *mci, char *buf,
+				      void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	return sprintf(buf, "0x%x\n", pvt->injection.word);
@@ -46,7 +49,8 @@ static ssize_t amd64_inject_word_show(struct mem_ctl_info *mci, char *buf)
  * range: 0..8
  */
 static ssize_t amd64_inject_word_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+					const char *data, size_t count,
+					void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -66,7 +70,8 @@ static ssize_t amd64_inject_word_store(struct mem_ctl_info *mci,
 	return ret;
 }
 
-static ssize_t amd64_inject_ecc_vector_show(struct mem_ctl_info *mci, char *buf)
+static ssize_t amd64_inject_ecc_vector_show(struct mem_ctl_info *mci,
+					    char *buf, void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	return sprintf(buf, "0x%x\n", pvt->injection.bit_map);
@@ -78,7 +83,8 @@ static ssize_t amd64_inject_ecc_vector_show(struct mem_ctl_info *mci, char *buf)
  * DRAM ECC read, it holds the contents of the of the DRAM ECC bits.
  */
 static ssize_t amd64_inject_ecc_vector_store(struct mem_ctl_info *mci,
-					     const char *data, size_t count)
+					     const char *data, size_t count,
+					     void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -104,7 +110,8 @@ static ssize_t amd64_inject_ecc_vector_store(struct mem_ctl_info *mci,
  * fields needed by the injection registers and read the NB Array Data Port.
  */
 static ssize_t amd64_inject_read_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+				       const char *data, size_t count,
+				       void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -137,7 +144,8 @@ static ssize_t amd64_inject_read_store(struct mem_ctl_info *mci,
  * fields needed by the injection registers.
  */
 static ssize_t amd64_inject_write_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+					const char *data, size_t count,
+					void *priv)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	unsigned long value;
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 6e8faf3..37d2c97 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -206,11 +206,13 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 	struct edac_mc_layer *lay;
 	struct csrow_info *csi, *csr;
 	struct csrow_channel_info *chi, *chp, *chan;
+	struct mcidev_sysfs_attribute *erc;
+	struct errcount_attribute_data *ercd;
 	struct dimm_info *dimm;
 	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
 	void *pvt;
 	unsigned size, tot_dimms, count, per_layer_count[EDAC_MAX_LAYERS];
-	unsigned tot_csrows, tot_cschannels;
+	unsigned tot_csrows, tot_cschannels, tot_errcount = 0;
 	int i, j;
 	int err;
 	int row, chn;
@@ -247,7 +249,14 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 		count *= layers[i].size;
 		ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
 		ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
+		if (i < n_layers - 1)
+			tot_errcount += 2 * count;
 	}
+	/*
+	 * The last layer's counts are the per-DIMM counts; don't export them twice
+	 */
+	erc = edac_align_ptr(&ptr, sizeof(*erc), tot_errcount);
+	ercd = edac_align_ptr(&ptr, sizeof(*ercd), tot_errcount);
 	pvt = edac_align_ptr(&ptr, sz_pvt, 1);
 	size = ((unsigned long)pvt) + sz_pvt;
 
@@ -268,6 +277,8 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 		mci->ce_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ce_per_layer[i]));
 		mci->ue_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ue_per_layer[i]));
 	}
+	erc = (struct mcidev_sysfs_attribute *)((char *)mci + ((unsigned long)erc));
+	ercd = (struct errcount_attribute_data *)((char *)mci + ((unsigned long)ercd));
 	pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
 
 	/* setup index and various internal pointers */
@@ -275,6 +286,8 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 	mci->csrows = csi;
 	mci->dimms  = dimm;
 	mci->tot_dimms = tot_dimms;
+	mci->errcount_attr = erc;
+	mci->errcount_attr_data = ercd;
 	mci->pvt_info = pvt;
 	mci->n_layers = n_layers;
 	mci->layers = lay;
@@ -845,7 +858,7 @@ static void edac_increment_ce_error(struct mem_ctl_info *mci,
 		return;
 	}
 
-	for (i = 0; i <= mci->n_layers; i++) {
+	for (i = 0; i < mci->n_layers; i++) {
 		if (pos[i] < 0)
 			break;
 		index += pos[i];
@@ -867,7 +880,7 @@ static void edac_increment_ue_error(struct mem_ctl_info *mci,
 		return;
 	}
 
-	for (i = 0; i <= mci->n_layers; i++) {
+	for (i = 0; i < mci->n_layers; i++) {
 		if (pos[i] < 0)
 			break;
 		index += pos[i];
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 245c588..4e8f0ec 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -475,7 +475,7 @@ static ssize_t dimmdev_location_show(struct dimm_info *dimm, char *data)
 	int i;
 	char *p = data;
 
-	for (i = 0; i <= mci->n_layers; i++) {
+	for (i = 0; i < mci->n_layers; i++) {
 		p += sprintf(p, "%s %d ",
 			     edac_layer_name[mci->layers[i].type],
 			     dimm->location[i]);
@@ -605,7 +605,8 @@ err_out:
 /* default sysfs methods and data structures for the main MCI kobject */
 
 static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+					const char *data, size_t count,
+					void *priv)
 {
 	int cnt, row, chan, i;
 	mci->ue_mc = 0;
@@ -645,7 +646,8 @@ static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
  * the scrub rate.
  */
 static ssize_t mci_sdram_scrub_rate_store(struct mem_ctl_info *mci,
-					  const char *data, size_t count)
+					  const char *data, size_t count,
+					  void *priv)
 {
 	unsigned long bandwidth = 0;
 	int new_bw = 0;
@@ -669,7 +671,8 @@ static ssize_t mci_sdram_scrub_rate_store(struct mem_ctl_info *mci,
 /*
  * ->get_sdram_scrub_rate() return value semantics same as above.
  */
-static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data,
+					 void *priv)
 {
 	int bandwidth = 0;
 
@@ -686,37 +689,44 @@ static ssize_t mci_sdram_scrub_rate_show(struct mem_ctl_info *mci, char *data)
 }
 
 /* default attribute files for the MCI object */
-static ssize_t mci_ue_count_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_ue_count_show(struct mem_ctl_info *mci, char *data,
+				 void *priv)
 {
 	return sprintf(data, "%d\n", mci->ue_mc);
 }
 
-static ssize_t mci_ce_count_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_ce_count_show(struct mem_ctl_info *mci, char *data,
+				 void *priv)
 {
 	return sprintf(data, "%d\n", mci->ce_mc);
 }
 
-static ssize_t mci_ce_noinfo_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_ce_noinfo_show(struct mem_ctl_info *mci, char *data,
+				  void *priv)
 {
 	return sprintf(data, "%d\n", mci->ce_noinfo_count);
 }
 
-static ssize_t mci_ue_noinfo_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_ue_noinfo_show(struct mem_ctl_info *mci, char *data,
+				  void *priv)
 {
 	return sprintf(data, "%d\n", mci->ue_noinfo_count);
 }
 
-static ssize_t mci_seconds_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_seconds_show(struct mem_ctl_info *mci, char *data,
+				void *priv)
 {
 	return sprintf(data, "%ld\n", (jiffies - mci->start_time) / HZ);
 }
 
-static ssize_t mci_ctl_name_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_ctl_name_show(struct mem_ctl_info *mci, char *data,
+				 void *priv)
 {
 	return sprintf(data, "%s\n", mci->ctl_name);
 }
 
-static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data)
+static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data,
+				void *priv)
 {
 	int total_pages, csrow_idx, j;
 
@@ -747,7 +757,8 @@ static ssize_t mcidev_show(struct kobject *kobj, struct attribute *attr,
 	debugf1("%s() mem_ctl_info %p\n", __func__, mem_ctl_info);
 
 	if (mcidev_attr->show)
-		return mcidev_attr->show(mem_ctl_info, buffer);
+		return mcidev_attr->show(mem_ctl_info, buffer,
+					 mcidev_attr->priv);
 
 	return -EIO;
 }
@@ -761,7 +772,8 @@ static ssize_t mcidev_store(struct kobject *kobj, struct attribute *attr,
 	debugf1("%s() mem_ctl_info %p\n", __func__, mem_ctl_info);
 
 	if (mcidev_attr->store)
-		return mcidev_attr->store(mem_ctl_info, buffer, count);
+		return mcidev_attr->store(mem_ctl_info, buffer, count,
+					  mcidev_attr->priv);
 
 	return -EIO;
 }
@@ -773,10 +785,11 @@ static const struct sysfs_ops mci_ops = {
 };
 
 #define MCIDEV_ATTR(_name,_mode,_show,_store)			\
-static struct mcidev_sysfs_attribute mci_attr_##_name = {			\
+static struct mcidev_sysfs_attribute mci_attr_##_name = {	\
 	.attr = {.name = __stringify(_name), .mode = _mode },	\
 	.show   = _show,					\
 	.store  = _store,					\
+	.priv   = NULL,						\
 };
 
 /* default Control file */
@@ -808,6 +821,132 @@ static struct mcidev_sysfs_attribute *mci_attr[] = {
 	NULL
 };
 
+/*
+ * Per layer error count nodes
+ */
+static ssize_t errcount_ce_show(struct mem_ctl_info *mci, char *data,
+				void *priv)
+{
+	struct errcount_attribute_data *ead = priv;
+	int i, index = 0;
+
+	for (i = 0; i < ead->n_layers - 1; i++) {
+		index += ead->pos[i];
+		index *= mci->layers[i + 1].size;
+	}
+	index += ead->pos[i];
+	return sprintf(data, "%u\n",
+		       mci->ce_per_layer[ead->n_layers - 1][index]);
+}
+
+static ssize_t errcount_ue_show(struct mem_ctl_info *mci, char *data,
+				void *priv)
+{
+	struct errcount_attribute_data *ead = priv;
+	int i, index = 0;
+
+	for (i = 0; i < ead->n_layers - 1; i++) {
+		index += ead->pos[i];
+		index *= mci->layers[i + 1].size;
+	}
+	index += ead->pos[i];
+	return sprintf(data, "%u\n",
+		       mci->ue_per_layer[ead->n_layers - 1][index]);
+}
+
+static int edac_create_errcount_layer(struct mem_ctl_info *mci,
+				      struct mcidev_sysfs_attribute **erc,
+				      struct errcount_attribute_data **ercd,
+				      const unsigned layer,
+				      const int count)
+{
+	int err, i, j, pos[EDAC_MAX_LAYERS];
+	char location[80], *p;
+
+	memset(&pos, 0, sizeof(pos));
+	for (i = 0; i < count; i++) {
+		p = location;
+		for (j = 0; j <= layer; j++)
+			p += sprintf(p, "_%s%d",
+				     edac_layer_name[mci->layers[j].type],
+				     pos[j]);
+
+		(*erc)->attr.name = kasprintf(GFP_KERNEL, "ce%s", location);
+		debugf4("%s() creating %s\n", __func__, (*erc)->attr.name);
+		if (!(*erc)->attr.name)
+			return -ENOMEM;
+		(*erc)->attr.mode = S_IRUGO;
+		(*erc)->show = errcount_ce_show;
+		(*erc)->priv = *ercd;
+		(*ercd)->n_layers = layer + 1;
+		memcpy((*ercd)->pos, pos, sizeof(pos));
+		err = sysfs_create_file(&mci->edac_mci_kobj, &(*erc)->attr);
+		if (err < 0) {
+			printk(KERN_ERR "sysfs_create_file failed: %d\n", err);
+			return err;
+		}
+
+		(*erc)++;	/* the UE node needs its own attribute slot */
+		(*erc)->attr.name = kasprintf(GFP_KERNEL, "ue%s", location);
+		debugf4("%s() creating %s\n", __func__, (*erc)->attr.name);
+		if (!(*erc)->attr.name)
+			return -ENOMEM;
+		(*erc)->attr.mode = S_IRUGO;
+		(*erc)->show = errcount_ue_show;
+		/* Same location data as the CE node: share *ercd */
+		(*erc)->priv = *ercd;
+		err = sysfs_create_file(&mci->edac_mci_kobj, &(*erc)->attr);
+		if (err < 0) {
+			printk(KERN_ERR "sysfs_create_file failed: %d\n", err);
+			return err;
+		}
+
+		for (j = layer; j >= 0; j--) {
+			pos[j]++;
+			if (pos[j] < mci->layers[j].size)
+				break;
+			pos[j] = 0;
+		}
+		(*erc)++;
+		(*ercd)++;
+	}
+	return 0;
+}
+
+static void edac_remove_errcount(struct mem_ctl_info *mci)
+{
+	struct mcidev_sysfs_attribute *erc = mci->errcount_attr;
+	unsigned i, n, count = 1, total = 0;	/* total: CE + UE slots */
+
+	for (i = 0; i + 1 < mci->n_layers; i++) {
+		count *= mci->layers[i].size;
+		total += 2 * count;
+	}
+	for (n = 0; n < total && erc[n].attr.name; n++) {
+		sysfs_remove_file(&mci->edac_mci_kobj, &erc[n].attr);
+		kfree(erc[n].attr.name);
+		erc[n].attr.name = NULL;
+	}
+}
+
+static int edac_create_errcount_objects(struct mem_ctl_info *mci)
+{
+	struct mcidev_sysfs_attribute *erc = mci->errcount_attr;
+	struct errcount_attribute_data *ercd = mci->errcount_attr_data;
+	int err, i, count;
+
+	count = 1;
+	for (i = 0; i < mci->n_layers - 1; i++) {
+		count *= mci->layers[i].size;
+		err = edac_create_errcount_layer(mci, &erc, &ercd, i, count);
+		if (err < 0)
+			goto err;
+	}
+	return 0;
+err:
+	edac_remove_errcount(mci);
+	return err;
+}
 
 /*
  * Release of a MC controlling instance
@@ -928,7 +1067,8 @@ static ssize_t inst_grp_show(struct kobject *kobj, struct attribute *attr,
 	debugf1("%s() mem_ctl_info %p\n", __func__, mem_ctl_info);
 
 	if (mcidev_attr->show)
-		return mcidev_attr->show(mem_ctl_info, buffer);
+		return mcidev_attr->show(mem_ctl_info, buffer,
+					 mcidev_attr->priv);
 
 	return -EIO;
 }
@@ -942,7 +1082,8 @@ static ssize_t inst_grp_store(struct kobject *kobj, struct attribute *attr,
 	debugf1("%s() mem_ctl_info %p\n", __func__, mem_ctl_info);
 
 	if (mcidev_attr->store)
-		return mcidev_attr->store(mem_ctl_info, buffer, count);
+		return mcidev_attr->store(mem_ctl_info, buffer, count,
+					  mcidev_attr->priv);
 
 	return -EIO;
 }
@@ -1179,6 +1320,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 			goto fail2;
 		}
 	}
+	edac_create_errcount_objects(mci);
 
 	return 0;
 
@@ -1224,6 +1366,14 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 
 	debugf0("%s()\n", __func__);
 
+	edac_remove_errcount(mci);
+
+	/* remove all dimms kobjects */
+	for (i = 0; i < mci->tot_dimms; i++) {
+		if (mci->dimms[i].nr_pages)
+			kobject_put(&mci->dimms[i].kobj);
+	}
+
 	/* remove all csrow kobjects */
 	debugf4("%s()  unregister this mci kobj\n", __func__);
 	for (i = 0; i < mci->tot_dimms; i++) {
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index ce75892..39d0b14 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -687,7 +687,8 @@ static int disable_inject(const struct mem_ctl_info *mci)
  *	bit 1 - refers to the upper 32-byte half cacheline
  */
 static ssize_t i7core_inject_section_store(struct mem_ctl_info *mci,
-					   const char *data, size_t count)
+					   const char *data, size_t count,
+					   void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -705,7 +706,7 @@ static ssize_t i7core_inject_section_store(struct mem_ctl_info *mci,
 }
 
 static ssize_t i7core_inject_section_show(struct mem_ctl_info *mci,
-					      char *data)
+					      char *data, void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	return sprintf(data, "0x%08x\n", pvt->inject.section);
@@ -720,7 +721,8 @@ static ssize_t i7core_inject_section_show(struct mem_ctl_info *mci,
  *	bit 2 - inject parity error
  */
 static ssize_t i7core_inject_type_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+					const char *data, size_t count,
+					void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -738,7 +740,7 @@ static ssize_t i7core_inject_type_store(struct mem_ctl_info *mci,
 }
 
 static ssize_t i7core_inject_type_show(struct mem_ctl_info *mci,
-					      char *data)
+					      char *data, void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	return sprintf(data, "0x%08x\n", pvt->inject.type);
@@ -755,7 +757,8 @@ static ssize_t i7core_inject_type_show(struct mem_ctl_info *mci,
  *   uncorrectable error to be injected.
  */
 static ssize_t i7core_inject_eccmask_store(struct mem_ctl_info *mci,
-					const char *data, size_t count)
+					const char *data, size_t count,
+					void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	unsigned long value;
@@ -773,7 +776,7 @@ static ssize_t i7core_inject_eccmask_store(struct mem_ctl_info *mci,
 }
 
 static ssize_t i7core_inject_eccmask_show(struct mem_ctl_info *mci,
-					      char *data)
+					      char *data, void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	return sprintf(data, "0x%08x\n", pvt->inject.eccmask);
@@ -793,7 +796,7 @@ static ssize_t i7core_inject_eccmask_show(struct mem_ctl_info *mci,
 #define DECLARE_ADDR_MATCH(param, limit)			\
 static ssize_t i7core_inject_store_##param(			\
 		struct mem_ctl_info *mci,			\
-		const char *data, size_t count)			\
+		const char *data, size_t count, void *priv)	\
 {								\
 	struct i7core_pvt *pvt;					\
 	long value;						\
@@ -820,7 +823,7 @@ static ssize_t i7core_inject_store_##param(			\
 								\
 static ssize_t i7core_inject_show_##param(			\
 		struct mem_ctl_info *mci,			\
-		char *data)					\
+		char *data, void *priv)				\
 {								\
 	struct i7core_pvt *pvt;					\
 								\
@@ -895,7 +898,8 @@ static int write_and_test(struct pci_dev *dev, const int where, const u32 val)
  *    three channels. However, this is not clear at the datasheet.
  */
 static ssize_t i7core_inject_enable_store(struct mem_ctl_info *mci,
-				       const char *data, size_t count)
+					  const char *data, size_t count,
+					  void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	u32 injectmask;
@@ -998,7 +1002,7 @@ static ssize_t i7core_inject_enable_store(struct mem_ctl_info *mci,
 }
 
 static ssize_t i7core_inject_enable_show(struct mem_ctl_info *mci,
-					char *data)
+					char *data, void *priv)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
 	u32 injectmask;
@@ -1020,7 +1024,7 @@ static ssize_t i7core_inject_enable_show(struct mem_ctl_info *mci,
 #define DECLARE_COUNTER(param)					\
 static ssize_t i7core_show_counter_##param(			\
 		struct mem_ctl_info *mci,			\
-		char *data)					\
+		char *data, void *priv)			\
 {								\
 	struct i7core_pvt *pvt = mci->pvt_info;			\
 								\
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 4d84e40..790e1c2 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -500,8 +500,19 @@ struct mcidev_sysfs_attribute {
 	const struct mcidev_sysfs_group *grp;	/* Points to a group of attributes */
 
 	/* Ops for show/store values at the attribute - not used on group */
-        ssize_t (*show)(struct mem_ctl_info *,char *);
-        ssize_t (*store)(struct mem_ctl_info *, const char *,size_t);
+	ssize_t (*show)(struct mem_ctl_info *, char *, void *);
+	ssize_t (*store)(struct mem_ctl_info *, const char *, size_t, void *);
+
+	void *priv;
+};
+
+/*
+ * struct errcount_attribute_data - location of a per-layer error counter
+ */
+struct errcount_attribute_data {
+	int n_layers;
+	int pos[EDAC_MAX_LAYERS];
+	int layer0, layer1, layer2;
 };
 
 struct edac_hierarchy {
@@ -583,6 +594,8 @@ struct mem_ctl_info {
 	unsigned ce_noinfo_count, ue_noinfo_count;
 	unsigned ce_mc, ue_mc;
 	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
+	struct mcidev_sysfs_attribute *errcount_attr;
+	struct errcount_attribute_data *errcount_attr_data;
 
 	struct completion complete;
 
-- 
1.7.8


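The counters behind those sysfs nodes are flat arrays. edac_mc_alloc() gives ce_per_layer[i] and ue_per_layer[i] one entry per combination of positions in layers 0..i (count *= layers[i].size), and reserves 2 * count attribute slots for every layer except the last. A minimal sketch of row-major indexing into such arrays, assuming the layer order of mci->layers; the helper names are hypothetical, and the index arithmetic in edac_increment_ce_error() is only partly visible in this diff, so treat the exact order as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 3	/* mirrors EDAC_MAX_LAYERS */

/* Entries kept for one layer: size[0] * size[1] * ... * size[layer] */
static size_t counters_in_layer(const unsigned *size, unsigned layer)
{
	size_t count = 1;
	unsigned i;

	for (i = 0; i <= layer; i++)
		count *= size[i];
	return count;
}

/*
 * Row-major offset of pos[0..layer] inside that layer's array:
 * ((pos[0] * size[1] + pos[1]) * size[2] + pos[2]) ...
 */
static size_t counter_index(const unsigned *size, unsigned layer,
			    const unsigned *pos)
{
	size_t index = 0;
	unsigned i;

	assert(layer < MAX_LAYERS);
	for (i = 0; i <= layer; i++) {
		assert(pos[i] < size[i]);
		index = index * size[i] + pos[i];
	}
	return index;
}

/* CE + UE nodes for every layer but the last, like tot_errcount */
static size_t errcount_nodes(const unsigned *size, unsigned n_layers)
{
	size_t total = 0;
	unsigned i;

	for (i = 0; i + 1 < n_layers; i++)
		total += 2 * counters_in_layer(size, i);
	return total;
}
```

Any reader of ce_per_layer[i] must produce an index below counters_in_layer(size, i); multiplying by the size of the next layer, not the current one, is what keeps the row-major offset in range.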

* [PATCH v3 21/31] hw_event: Add x86 MCE events on it
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (19 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 20/31] edac: Export MC hierarchy counters for CE and UE Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 22/31] amd64_edac: convert it to use the MCE log tracepoint where applicable Mauro Carvalho Chehab
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

As the x86 architecture defines a way for the CPU to report hardware
errors (MCE), integrate it into the hw_event trace class.

The EDAC parsers can enrich memory error reports by pointing to the
defective DIMM(s), while the MCE log provides additional details about
the error that help OEMs and hardware vendors track down what happened.
So, two new trace events that merge the MCE and memory error
information were created. On x86, the EDAC core uses these new
tracepoints whenever the caller passes an MCE log via arch_log.

This patch is based on feedback from Tony Luck and Borislav Petkov.

sb_edac and i7core_edac should now use the mcelog events, as the extra
parameter for edac_mc_handle_error() was introduced in the previous
changeset.

I opted to convert amd64_edac in a separate patch, as that will
likely make it easier for Borislav to review.
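
A minimal sketch of the tracepoint selection this patch adds to
edac_mc_handle_error(), written as userspace C with stub functions in
place of the real trace_mc_error() and trace_mc_error_mce()
tracepoints. The stubs, SKETCH_CONFIG_X86 and report_mc_error() are
hypothetical. It also shows one way to fold the #ifdef CONFIG_X86 /
#else branches so that the arch-independent call is written only once.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch-only switch standing in for CONFIG_X86 */
#define SKETCH_CONFIG_X86 1

/* Hypothetical stand-ins: they only record which tracepoint fired */
enum fired { FIRED_NONE, FIRED_MC_ERROR, FIRED_MC_ERROR_MCE };
static enum fired last_fired = FIRED_NONE;
static const void *last_log;

static void trace_mc_error_stub(const char *msg)
{
	(void)msg;
	last_fired = FIRED_MC_ERROR;
	last_log = NULL;
}

static void trace_mc_error_mce_stub(const char *msg, const void *arch_log)
{
	(void)msg;
	last_fired = FIRED_MC_ERROR_MCE;
	last_log = arch_log;
}

/*
 * Emit the combined MCE + EDAC event when an arch log is available,
 * otherwise fall through to the arch-independent event.
 */
static void report_mc_error(const char *msg, const void *arch_log)
{
#if SKETCH_CONFIG_X86
	if (arch_log) {
		trace_mc_error_mce_stub(msg, arch_log);
		return;
	}
#else
	(void)arch_log;
#endif
	trace_mc_error_stub(msg);
}
```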

Suggested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 +-
 drivers/edac/edac_core.h         |    2 +-
 drivers/edac/edac_mc.c           |   25 ++++-
 include/trace/events/hw_event.h  |  238 +++++++++++++++++++++++++++++++++++++-
 include/trace/events/mce.h       |   69 -----------
 5 files changed, 259 insertions(+), 77 deletions(-)
 delete mode 100644 include/trace/events/mce.h

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2af127d..c219f72 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -53,7 +53,7 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
 			      lockdep_is_held(&mce_chrdev_read_mutex))
 
 #define CREATE_TRACE_POINTS
-#include <trace/events/mce.h>
+#include <trace/events/hw_event.h>
 
 int mce_disabled __read_mostly;
 
diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index 1d421d3..7caff6e 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -470,7 +470,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog);
+			  const void *arch_log);
 
 /*
  * edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 37d2c97..2dca0e3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -899,7 +899,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	unsigned long remapped_page;
 	/* FIXME: too much for stack: move it to some pre-alocated area */
@@ -924,9 +924,23 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 				p = "UE";
 				mci->ue_mc++;
 			}
+#ifdef CONFIG_X86
+			if (arch_log)
+				trace_mc_out_of_range_mce(mci, p,
+							  edac_layer_name[mci->layers[i].type],
+							  pos[i], 0,
+							  mci->layers[i].size,
+							  arch_log);
+			else
+				trace_mc_out_of_range(mci, p,
+						      edac_layer_name[mci->layers[i].type],
+						      pos[i], 0,
+						      mci->layers[i].size);
+#else
 			trace_mc_out_of_range(mci, p,
 					edac_layer_name[mci->layers[i].type],
 					pos[i], 0, mci->layers[i].size);
+#endif
 			edac_mc_printk(mci, KERN_ERR,
 				       "INTERNAL ERROR: %s value is out of range (%d >= %d)\n",
 				       edac_layer_name[mci->layers[i].type],
@@ -1033,8 +1047,17 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			"page 0x%lx offset 0x%lx grain %d\n",
 			page_frame_number, offset_in_page, grain);
 
+#ifdef CONFIG_X86
+	if (arch_log)
+		trace_mc_error_mce(type, mci->mc_idx, msg, label, location,
+				   detail, other_detail, arch_log);
+	else
+		trace_mc_error(type, mci->mc_idx, msg, label, location,
+			       detail, other_detail);
+#else
 	trace_mc_error(type, mci->mc_idx, msg, label, location,
 		       detail, other_detail);
+#endif
 
 	if (type == HW_EVENT_ERR_CORRECTED) {
 		if (edac_mc_get_log_ce())
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index 4c455c1..ade0185 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -6,6 +6,7 @@
 
 #include <linux/tracepoint.h>
 #include <linux/edac.h>
+#include <linux/ktime.h>
 
 /*
  * Hardware Anomaly Report Mecanism (HARM) events
@@ -13,6 +14,9 @@
  * Those events are generated when hardware detected a corrected or
  * uncorrected event, and are meant to replace the current API to report
  * errors defined on both EDAC and MCE subsystems.
+ *
+ * There are two types of events defined here: arch-independent ones, and
+ * x86 arch events. The x86 arch events are based on x86 MCE architecture.
  */
 
 DECLARE_EVENT_CLASS(hw_event_class,
@@ -46,7 +50,7 @@ DEFINE_EVENT(hw_event_class, hw_event_init,
 
 
 /*
- * Memory Controller specific events
+ * Hardware-independent Memory Controller specific events
  */
 
 /*
@@ -85,7 +89,7 @@ TRACE_EVENT(mc_error,
 		__assign_str(driver_detail, driver_detail);
 	),
 
-	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (%s %s %s)\n",
+	TP_printk(HW_ERR "mce#%d: %s error %s on label \"%s\" (%s %s %s)",
 		  __entry->mc_index,
 		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
 			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
@@ -121,7 +125,7 @@ TRACE_EVENT(mc_out_of_range,
 		__entry->max			= max;
 	),
 
-	TP_printk(HW_ERR "mce#%d %s: %s=%d is not between %d and %d\n",
+	TP_printk(HW_ERR "mce#%d %s: %s=%d is not between %d and %d",
 		__entry->mc_index,
 		__get_str(type),
 		__get_str(field),
@@ -131,9 +135,233 @@ TRACE_EVENT(mc_out_of_range,
 );
 
 /*
- * MCE Events placeholder. Please add non-memory events that come from the
- * MCE driver here
+ * X86 arch-specific events
+ */
+
+#ifdef CONFIG_X86
+#include <asm/mce.h>
+
+/*
+ * Generic MCE event
+ */
+TRACE_EVENT(mce_record,
+
+	TP_PROTO(const struct mce *m),
+
+	TP_ARGS(m),
+
+	TP_STRUCT__entry(
+		__field(	u64,		mcgcap		)
+		__field(	u64,		mcgstatus	)
+		__field(	u64,		status		)
+		__field(	u64,		addr		)
+		__field(	u64,		misc		)
+		__field(	u64,		ip		)
+		__field(	u64,		tsc		)
+		__field(	u64,		walltime	)
+		__field(	u32,		cpu		)
+		__field(	u32,		cpuid		)
+		__field(	u32,		apicid		)
+		__field(	u32,		socketid	)
+		__field(	u8,		cs		)
+		__field(	u8,		bank		)
+		__field(	u8,		cpuvendor	)
+	),
+
+	TP_fast_assign(
+		__entry->mcgcap		= m->mcgcap;
+		__entry->mcgstatus	= m->mcgstatus;
+		__entry->status		= m->status;
+		__entry->addr		= m->addr;
+		__entry->misc		= m->misc;
+		__entry->ip		= m->ip;
+		__entry->tsc		= m->tsc;
+		__entry->walltime	= m->time;
+		__entry->cpu		= m->extcpu;
+		__entry->cpuid		= m->cpuid;
+		__entry->apicid		= m->apicid;
+		__entry->socketid	= m->socketid;
+		__entry->cs		= m->cs;
+		__entry->bank		= m->bank;
+		__entry->cpuvendor	= m->cpuvendor;
+	),
+
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+		__entry->cpu,
+		__entry->mcgcap, __entry->mcgstatus,
+		__entry->bank, __entry->status,
+		__entry->addr, __entry->misc,
+		__entry->cs, __entry->ip,
+		__entry->tsc,
+		__entry->cpuvendor, __entry->cpuid,
+		__entry->walltime,
+		__entry->socketid,
+		__entry->apicid)
+);
+
+/*
+ * MCE event for memory-controller errors
  */
+TRACE_EVENT(mc_error_mce,
+
+	TP_PROTO(const unsigned int err_type,
+		 const unsigned int mc_index,
+		 const char *msg,
+		 const char *label,
+		 const char *location,
+		 const char *detail,
+		 const char *driver_detail,
+		 const struct mce *m),
+
+	TP_ARGS(err_type, mc_index, msg, label, location,
+		detail, driver_detail, m),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	err_type	)
+		__field(	unsigned int,	mc_index	)
+		__string(	msg,		msg		)
+		__string(	label,		label		)
+		__string(	detail,		detail		)
+		__string(	location,	location	)
+		__string(	driver_detail,	driver_detail	)
+		__field(	u64,		mcgcap		)
+		__field(	u64,		mcgstatus	)
+		__field(	u64,		status		)
+		__field(	u64,		addr		)
+		__field(	u64,		misc		)
+		__field(	u64,		ip		)
+		__field(	u64,		tsc		)
+		__field(	u64,		walltime	)
+		__field(	u32,		cpu		)
+		__field(	u32,		cpuid		)
+		__field(	u32,		apicid		)
+		__field(	u32,		socketid	)
+		__field(	u8,		cs		)
+		__field(	u8,		bank		)
+		__field(	u8,		cpuvendor	)
+	),
+
+	TP_fast_assign(
+		__entry->err_type	= err_type;
+		__entry->mc_index	= mc_index;
+		__assign_str(msg, msg);
+		__assign_str(label, label);
+		__assign_str(location, location);
+		__assign_str(detail, detail);
+		__assign_str(driver_detail, driver_detail);
+		__entry->mcgcap		= m->mcgcap;
+		__entry->mcgstatus	= m->mcgstatus;
+		__entry->status		= m->status;
+		__entry->addr		= m->addr;
+		__entry->misc		= m->misc;
+		__entry->ip		= m->ip;
+		__entry->tsc		= m->tsc;
+		__entry->walltime	= m->time;
+		__entry->cpu		= m->extcpu;
+		__entry->cpuid		= m->cpuid;
+		__entry->apicid		= m->apicid;
+		__entry->socketid	= m->socketid;
+		__entry->cs		= m->cs;
+		__entry->bank		= m->bank;
+		__entry->cpuvendor	= m->cpuvendor;
+	),
+
+	TP_printk("mce#%d: %s error %s on label \"%s\" (%s %s CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x %s)",
+		  __entry->mc_index,
+		  (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		  __get_str(msg),
+		  __get_str(label),
+		  __get_str(location),
+		  __get_str(detail),
+		  __entry->cpu,
+		  __entry->mcgcap, __entry->mcgstatus,
+		  __entry->bank, __entry->status,
+		  __entry->addr, __entry->misc,
+		  __entry->cs, __entry->ip,
+		  __entry->tsc,
+		  __entry->cpuvendor, __entry->cpuid,
+		  __entry->walltime,
+		  __entry->socketid,
+		  __entry->apicid,
+		  __get_str(driver_detail))
+);
+
+TRACE_EVENT(mc_out_of_range_mce,
+	TP_PROTO(struct mem_ctl_info *mci, const char *type, const char *field,
+		int invalid_val, int min, int max, const struct mce *m),
+
+	TP_ARGS(mci, type, field, invalid_val, min, max, m),
+
+	TP_STRUCT__entry(
+		__string(	type,		type		)
+		__string(	field,		field		)
+		__field(	unsigned int,	mc_index	)
+		__field(	int,		invalid_val	)
+		__field(	int,		min		)
+		__field(	int,		max		)
+		__field(	u64,		mcgcap		)
+		__field(	u64,		mcgstatus	)
+		__field(	u64,		status		)
+		__field(	u64,		addr		)
+		__field(	u64,		misc		)
+		__field(	u64,		ip		)
+		__field(	u64,		tsc		)
+		__field(	u64,		walltime	)
+		__field(	u32,		cpu		)
+		__field(	u32,		cpuid		)
+		__field(	u32,		apicid		)
+		__field(	u32,		socketid	)
+		__field(	u8,		cs		)
+		__field(	u8,		bank		)
+		__field(	u8,		cpuvendor	)
+	),
+
+	TP_fast_assign(
+		__assign_str(type, type);
+		__assign_str(field, field);
+		__entry->mc_index	= mci->mc_idx;
+		__entry->invalid_val	= invalid_val;
+		__entry->min		= min;
+		__entry->max		= max;
+		__entry->mcgcap		= m->mcgcap;
+		__entry->mcgstatus	= m->mcgstatus;
+		__entry->status		= m->status;
+		__entry->addr		= m->addr;
+		__entry->misc		= m->misc;
+		__entry->ip		= m->ip;
+		__entry->tsc		= m->tsc;
+		__entry->walltime	= m->time;
+		__entry->cpu		= m->extcpu;
+		__entry->cpuid		= m->cpuid;
+		__entry->apicid		= m->apicid;
+		__entry->socketid	= m->socketid;
+		__entry->cs		= m->cs;
+		__entry->bank		= m->bank;
+		__entry->cpuvendor	= m->cpuvendor;
+	),
+
+	TP_printk(HW_ERR "mce#%d %s: %s=%d is not between %d and %d (CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x)",
+		  __entry->mc_index,
+		  __get_str(type),
+		  __get_str(field),
+		  __entry->invalid_val,
+		  __entry->min,
+		  __entry->max,
+		  __entry->cpu,
+		  __entry->mcgcap, __entry->mcgstatus,
+		  __entry->bank, __entry->status,
+		  __entry->addr, __entry->misc,
+		  __entry->cs, __entry->ip,
+		  __entry->tsc,
+		  __entry->cpuvendor, __entry->cpuid,
+		  __entry->walltime,
+		  __entry->socketid,
+		  __entry->apicid)
+);
+
+#endif
 
 
 #endif /* _TRACE_HW_EVENT_MC_H */
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
deleted file mode 100644
index 4cbbcef..0000000
--- a/include/trace/events/mce.h
+++ /dev/null
@@ -1,69 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM mce
-
-#if !defined(_TRACE_MCE_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_MCE_H
-
-#include <linux/ktime.h>
-#include <linux/tracepoint.h>
-#include <asm/mce.h>
-
-TRACE_EVENT(mce_record,
-
-	TP_PROTO(struct mce *m),
-
-	TP_ARGS(m),
-
-	TP_STRUCT__entry(
-		__field(	u64,		mcgcap		)
-		__field(	u64,		mcgstatus	)
-		__field(	u64,		status		)
-		__field(	u64,		addr		)
-		__field(	u64,		misc		)
-		__field(	u64,		ip		)
-		__field(	u64,		tsc		)
-		__field(	u64,		walltime	)
-		__field(	u32,		cpu		)
-		__field(	u32,		cpuid		)
-		__field(	u32,		apicid		)
-		__field(	u32,		socketid	)
-		__field(	u8,		cs		)
-		__field(	u8,		bank		)
-		__field(	u8,		cpuvendor	)
-	),
-
-	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
-	),
-
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
-		__entry->cpu,
-		__entry->mcgcap, __entry->mcgstatus,
-		__entry->bank, __entry->status,
-		__entry->addr, __entry->misc,
-		__entry->cs, __entry->ip,
-		__entry->tsc,
-		__entry->cpuvendor, __entry->cpuid,
-		__entry->walltime,
-		__entry->socketid,
-		__entry->apicid)
-);
-
-#endif /* _TRACE_MCE_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 22/31] amd64_edac: convert it to use the MCE log tracepoint where applicable
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (20 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 21/31] hw_event: Add x86 MCE events on it Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 23/31] edac: Simplify logs for i7core and sb edac drivers Mauro Carvalho Chehab
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Not all families supported by amd64_edac use MCE to report errors.
Whenever an MCE log is available, pass it to the EDAC core, so that it
can generate the combined MCE/memory trace events.
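As a rough illustration of what passing `m` buys: on CONFIG_X86 builds, edac_mc_handle_error() now emits the MCE-enriched trace_mc_error_mce() when it receives a non-NULL arch_log, and the plain trace_mc_error() otherwise. The userspace sketch below models only that choice. `struct mce_stub` and `report_mc_error()` are hypothetical names, and printf() stands in for the tracepoints.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the x86 struct mce; only two fields kept. */
struct mce_stub {
	unsigned long long status;
	unsigned long long addr;
};

/*
 * Models the choice edac_mc_handle_error() makes after this series:
 * a non-NULL arch_log selects the MCE-enriched event, NULL selects the
 * arch-independent one. printf() stands in for the tracepoints.
 * Returns 1 if the MCE-enriched variant was used, 0 otherwise.
 */
static int report_mc_error(const char *msg, const void *arch_log)
{
	assert(msg != NULL);

	if (arch_log) {
		const struct mce_stub *m = arch_log;

		printf("mc_error_mce: %s status=%016llx addr=%016llx\n",
		       msg, m->status, m->addr);
		return 1;
	}
	printf("mc_error: %s\n", msg);
	return 0;
}
```

A caller with no MCE record to offer passes NULL and gets the arch-independent event.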

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/amd64_edac.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1b374b5..aa7ecbb 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1904,7 +1904,7 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
 				     -1, -1, -1,
 				     EDAC_MOD_STR,
 				     "HW has no ERROR_ADDRESS available",
-				     NULL);
+				     m);
 		return;
 	}
 
@@ -1933,7 +1933,7 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 				     -1, -1, -1,
 				     EDAC_MOD_STR,
 				     "HW has no ERROR_ADDRESS available",
-				     NULL);
+				     m);
 		return;
 	}
 
@@ -1952,7 +1952,7 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 				     page, offset, 0,
 				     -1, -1, -1,
 				     EDAC_MOD_STR,
-				     "ERROR ADDRESS NOT mapped to a MC", NULL);
+				     "ERROR ADDRESS NOT mapped to a MC", m);
 		return;
 	}
 
@@ -1967,12 +1967,12 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
 				     -1, -1, -1,
 				     EDAC_MOD_STR,
 				     "ERROR ADDRESS NOT mapped to CS",
-				     NULL);
+				     m);
 	} else {
 		edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
 				     page, offset, 0,
 				     csrow, -1, -1,
-				     EDAC_MOD_STR, "", NULL);
+				     EDAC_MOD_STR, "", m);
 	}
 }
 
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 23/31] edac: Simplify logs for i7core and sb edac drivers
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (21 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 22/31] amd64_edac: convert it to use the MCE log tracepoint where applicable Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 24/31] edac_mc: Some clenups at the log message Mauro Carvalho Chehab
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Now that the MCE log is printed along with the error, we can remove
some redundant information from the log message.
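The shorter message also lets i7core_edac replace the kasprintf(GFP_ATOMIC)/kfree() pair with a fixed 80-byte stack buffer, which removes an allocation that can fail from the error path. A minimal sketch of the snprintf() pattern, assuming a hypothetical helper name `format_short_msg()` and an illustrative optype string:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the shortened i7core-style message ("count=N <optype>") into a
 * caller-provided buffer, as the patch does with a stack msg[80].
 * snprintf() never writes past len and always NUL-terminates when
 * len > 0; its return value is the length the full string would have
 * had, so a result >= len means the message was truncated.
 */
static int format_short_msg(char *buf, size_t len, int core_err_cnt,
			    const char *optype)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "count=%d %s", core_err_cnt, optype);
}
```

With an 8-byte buffer the same call stores only "count=1" but still returns the full length, so truncation is detectable without any allocation.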

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/i7core_edac.c |    9 ++-------
 drivers/edac/sb_edac.c     |   29 ++++++++++++++---------------
 2 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 39d0b14..c30cbf7 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1625,7 +1625,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 				    const struct mce *m)
 {
 	struct i7core_pvt *pvt = mci->pvt_info;
-	char *type, *optype, *err, *msg;
+	char *type, *optype, *err, msg[80];
 	enum hw_event_mc_err_type tp_event;
 	unsigned long error = m->status & 0x1ff0000l;
 	bool uncorrected_error = m->mcgstatus & 1ll << 61;
@@ -1703,10 +1703,7 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 		err = "unknown";
 	}
 
-	msg = kasprintf(GFP_ATOMIC,
-		"addr=0x%08llx cpu=%d count=%d Err=%08llx:%08llx (%s: %s))\n",
-		(long long) m->addr, m->cpu, core_err_cnt,
-		(long long)m->status, (long long)m->misc, optype, err);
+	snprintf(msg, sizeof(msg), "count=%d %s", core_err_cnt, optype);
 
 	/*
 	 * Call the helper to output message
@@ -1720,8 +1717,6 @@ static void i7core_mce_output_error(struct mem_ctl_info *mci,
 				     syndrome,
 				     channel, dimm, -1,
 				     err, msg, m);
-
-	kfree(msg);
 }
 
 /*
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 4745c94..9ae7d9e 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1421,23 +1421,22 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 		recoverable_msg = "";
 
 	/*
-	 * FIXME: What should we do with "channel" information on mcelog?
-	 * Probably, we can just discard it, as the channel information
-	 * comes from the get_memory_error_data() address decoding
+	 * FIXME: On some memory configurations (mirror, lockstep), the
+	 * memory controller can't attribute the error to a single DIMM. The
+	 * EDAC core should handle the channel mask, in order to point to
+	 * the group of DIMMs where the error may be happening.
 	 */
 	snprintf(msg, sizeof(msg),
-			"%d error(s)%s: %s%s: cpu=%d Err=%04x:%04x addr = 0x%08llx socket=%d Channel=%ld(mask=%ld), rank=%d\n",
-			core_err_cnt,
-			overflow ? " OVERFLOW" : "",
-			area_type,
-			recoverable_msg,
-			m->cpu,
-			mscod, errcode,
-			(long long) m->addr,
-			socket,
-			first_channel,		/* This is the real channel on SB */
-			channel_mask,
-			rank);
+		 "%d error(s)%s: %s%s: Err=%04x:%04x socket=%d channel=%ld/mask=%ld rank=%d",
+		 core_err_cnt,
+		 overflow ? " OVERFLOW" : "",
+		 area_type,
+		 recoverable_msg,
+		 mscod, errcode,
+		 socket,
+		 first_channel,
+		 channel_mask,
+		 rank);
 
 	debugf0("%s", msg);
 
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 24/31] edac_mc: Some clenups at the log message
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (22 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 23/31] edac: Simplify logs for i7core and sb edac drivers Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 25/31] edac: Add a sysfs node to test the EDAC error report facility Mauro Carvalho Chehab
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 2dca0e3..c1a34b9 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -916,7 +916,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 
 	/* Check if the event report is consistent */
 	for (i = 0; i < mci->n_layers; i++) {
-		if (pos[i] >= mci->layers[i].size) {
+		if (pos[i] >= (int)mci->layers[i].size) {
 			if (type == HW_EVENT_ERR_CORRECTED) {
 				p = "CE";
 				mci->ce_mc++;
@@ -1014,7 +1014,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 		}
 	}
 	if (!enable_filter) {
-		p = "any memory";
+		strcpy(label, "any memory");
 	} else {
 		if (type == HW_EVENT_ERR_CORRECTED) {
 			if (row >= 0)
@@ -1039,12 +1039,12 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	/* Memory type dependent details about the error */
 	if (type == HW_EVENT_ERR_CORRECTED)
 		snprintf(detail, sizeof(detail),
-			"page 0x%lx offset 0x%lx grain %d syndrome 0x%lx\n",
+			"page 0x%lx offset 0x%lx grain %d syndrome 0x%lx",
 			page_frame_number, offset_in_page,
 			grain, syndrome);
 	else
 		snprintf(detail, sizeof(detail),
-			"page 0x%lx offset 0x%lx grain %d\n",
+			"page 0x%lx offset 0x%lx grain %d",
 			page_frame_number, offset_in_page, grain);
 
 #ifdef CONFIG_X86
@@ -1062,7 +1062,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	if (type == HW_EVENT_ERR_CORRECTED) {
 		if (edac_mc_get_log_ce())
 			edac_mc_printk(mci, KERN_WARNING,
-				       "CE %s label \"%s\" (%s %s %s)\n",
+				       "CE %s on %s (%s%s %s)\n",
 				       msg, label, location,
 				       detail, other_detail);
 		edac_increment_ce_error(mci,enable_filter, pos);
@@ -1089,11 +1089,11 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	} else {
 		if (edac_mc_get_log_ue())
 			edac_mc_printk(mci, KERN_WARNING,
-				"UE %s label \"%s\" (%s %s %s)\n",
+				"UE %s on %s (%s%s %s)\n",
 				msg, label, location, detail, other_detail);
 
 		if (edac_mc_get_panic_on_ue())
-			panic("UE %s label \"%s\" (%s %s %s)\n",
+			panic("UE %s on %s (%s%s %s)\n",
 			      msg, label, location, detail, other_detail);
 
 		edac_increment_ue_error(mci,enable_filter, pos);
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 25/31] edac: Add a sysfs node to test the EDAC error report facility
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (23 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 24/31] edac_mc: Some clenups at the log message Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 26/31] edac_mc: Fix the enable label filter logic Mauro Carvalho Chehab
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Not all hardware supports error injection, and even where it does,
the BIOS may disable it. As we need to test the EDAC error report
facilities and the edac-utils logic, add a way to generate fake
errors when EDAC debug is enabled.
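A sketch of how a string such as "0:1:0" maps onto the three layer positions handed to edac_mc_handle_error(). `parse_fake_location()` is a hypothetical userspace helper; like edac_fake_inject_store(), it uses sscanf() with "%i:%i:%i" and leaves unspecified layers at -1. Unlike the patch, which only rejects a negative (EOF) return, this sketch also rejects input whose first field does not parse.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a fake_inject-style location such as "0:1:0". Layers that are
 * not supplied stay at -1, the value callers of edac_mc_handle_error()
 * use for "unknown", so "2" means layer0 = 2 and the rest unknown.
 * Returns the number of layers parsed, or -1 if nothing could be parsed.
 */
static int parse_fake_location(const char *data, int pos[3])
{
	int n;

	assert(data != NULL);
	pos[0] = pos[1] = pos[2] = -1;
	/* %i accepts decimal, octal (leading 0) and hex (leading 0x). */
	n = sscanf(data, "%i:%i:%i", &pos[0], &pos[1], &pos[2]);
	return n < 1 ? -1 : n;
}
```

The trailing newline that `echo` appends is harmless: sscanf() stops after the third conversion.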

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc_sysfs.c |   62 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/edac.h         |    4 +++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 4e8f0ec..f8132ff 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -744,6 +744,42 @@ static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data,
 	return sprintf(data, "%u\n", PAGES_TO_MiB(total_pages));
 }
 
+#ifdef CONFIG_EDAC_DEBUG
+static ssize_t edac_fake_inject_show(struct mem_ctl_info *mci,
+				     char *data, void *priv)
+{
+	return sprintf(data,
+		       "EDAC fake test engine. Writing a value of the form:\n"
+		       "\t0:1:0\n"
+		       "to this node will call the EDAC core routine to produce a memory error at the given memory location (0, 1, 0).\n"
+		       "The driver's error parsing logic won't be tested. This tool is only useful\n"
+		       "if you're testing the EDAC core tracing facility, or if you need to test\n"
+		       "a userspace application.\n");
+}
+
+static ssize_t edac_fake_inject_store(struct mem_ctl_info *mci,
+				      const char *data, size_t count,
+				      void *priv)
+{
+	static enum hw_event_mc_err_type type = HW_EVENT_ERR_CORRECTED;
+	int err, layer0 = -1, layer1 = -1, layer2= -1;
+	err = sscanf(data, "%i:%i:%i", &layer0, &layer1, &layer2);
+	if (err < 0)
+		return err;
+
+	printk(KERN_DEBUG
+	       "Generating a fake error to %d.%d.%d to test core handling. NOTE: this won't test the driver-specific decoding logic.\n",
+	       layer0, layer1, layer2);
+	edac_mc_handle_error(type, mci, 0, 0, 0,
+			     layer0, layer1, layer2,
+			     "FAKE ERROR", "for EDAC testing only", NULL);
+	if (++type == HW_EVENT_ERR_FATAL)
+		type = HW_EVENT_ERR_CORRECTED;
+
+	return count;
+}
+#endif
+
 #define to_mci(k) container_of(k, struct mem_ctl_info, edac_mci_kobj)
 #define to_mcidev_attr(a) container_of(a,struct mcidev_sysfs_attribute,attr)
 
@@ -875,7 +911,7 @@ static int edac_create_errcount_layer(struct mem_ctl_info *mci,
 		debugf4("%s() creating %s\n", __func__, (*erc)->attr.name);
 		if (!(*erc)->attr.name)
 			return -ENOMEM;
-		(*erc)->attr.mode = S_IRUGO;
+		(*erc)->attr.mode = S_IRUGO | S_IWUSR;
 		(*erc)->show = errcount_ce_show;
 		(*erc)->priv = *ercd;
 		(*ercd)->n_layers = layer + 1;
@@ -1320,7 +1356,24 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
 			goto fail2;
 		}
 	}
-	edac_create_errcount_objects(mci);
+	err = edac_create_errcount_objects(mci);
+	if (err) {
+		debugf1("%s() failure: create error count objects\n",
+			__func__);
+		goto fail2;
+	}
+#ifdef CONFIG_EDAC_DEBUG
+	mci->errinject_attr.attr.name = "fake_inject";
+	mci->errinject_attr.attr.mode = S_IRUGO | S_IWUSR;
+	mci->errinject_attr.show = edac_fake_inject_show;
+	mci->errinject_attr.store = edac_fake_inject_store;
+	err = sysfs_create_file(&mci->edac_mci_kobj, &mci->errinject_attr.attr);
+	if (err < 0) {
+		printk(KERN_ERR
+		       "sysfs_create_file for fake inject failed: %d\n", err);
+		mci->errinject_attr.attr.name = NULL;
+	}
+#endif
 
 	return 0;
 
@@ -1366,6 +1419,11 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 
 	debugf0("%s()\n", __func__);
 
+#ifdef CONFIG_EDAC_DEBUG
+	if (mci->errinject_attr.attr.name)
+		sysfs_remove_file(&mci->edac_mci_kobj,
+				  &mci->errinject_attr.attr);
+#endif
 	edac_remove_errcount(mci);
 
 	/* remove all dimms kobjects */
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 790e1c2..6f03768 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -597,6 +597,10 @@ struct mem_ctl_info {
 	struct mcidev_sysfs_attribute *errcount_attr;
 	struct errcount_attribute_data *errcount_attr_data;
 
+#ifdef CONFIG_EDAC_DEBUG
+	struct mcidev_sysfs_attribute errinject_attr;
+#endif
+
 	struct completion complete;
 
 	/* edac sysfs device control */
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 26/31] edac_mc: Fix the enable label filter logic
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (24 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 25/31] edac: Add a sysfs node to test the EDAC error report facility Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 27/31] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index c1a34b9..5326b6b 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -953,7 +953,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			 */
 			pos[i] = -1;
 		}
-		if (pos[i] > 0)
+		if (pos[i] >= 0)
 			enable_filter = true;
 	}
 
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 27/31] edac: Initialize the dimm label with the known information
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (25 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 26/31] edac_mc: Fix the enable label filter logic Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 28/31] edac: don't OOPS if the csrow is not visible Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

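This patch fills each dimm->label with "mc#<index>" followed by "<layer name>#<position>" for every layer. A userspace sketch of that string building, assuming hypothetical layer names "csrow" and "channel" (the kernel takes them from edac_layer_name[]). The sketch also stops once snprintf() reports truncation, a check the patch's `p += n; len -= n;` loop does not perform; whether dimm->label is always large enough in practice is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layer names; the kernel uses edac_layer_name[layers[j].type]. */
static const char *const layer_name[] = { "csrow", "channel" };

/*
 * Build "mc#<idx>" followed by "<layer>#<pos>" for each layer.
 * snprintf() returns the length it wanted to write, so the loop stops
 * once the buffer is full instead of letting the remaining space go
 * negative. Returns the length of the (possibly truncated) label.
 */
static size_t build_dimm_label(char *buf, size_t len, unsigned mc_idx,
			       const unsigned *pos, int n_layers)
{
	size_t used;
	int n, j;

	assert(len > 0);
	n = snprintf(buf, len, "mc#%u", mc_idx);
	if (n < 0 || (size_t)n >= len)
		return strlen(buf);
	used = n;
	for (j = 0; j < n_layers; j++) {
		n = snprintf(buf + used, len - used, "%s#%u",
			     layer_name[j], pos[j]);
		if (n < 0 || (size_t)n >= len - used)
			return strlen(buf);	/* truncated, still NUL-terminated */
		used += n;
	}
	return used;
}
```

For a DIMM at csrow 1, channel 0 of mc0, this sketch yields "mc#0csrow#1channel#0".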
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 5326b6b..91545a5 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -210,10 +210,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 	struct errcount_attribute_data *ercd;
 	struct dimm_info *dimm;
 	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
-	void *pvt;
+	void *pvt, *p;
 	unsigned size, tot_dimms, count, per_layer_count[EDAC_MAX_LAYERS];
 	unsigned tot_csrows, tot_cschannels, tot_errcount = 0;
-	int i, j;
+	int i, j, n, len;
 	int err;
 	int row, chn;
 
@@ -327,9 +327,22 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 		dimm = &mci->dimms[i];
 		dimm->mci = mci;
 
-		/* Copy DIMM location */
-		for (j = 0; j < n_layers; j++)
+		/*
+		 * Copy DIMM location and initialize the memory location
+		 */
+		len = sizeof(dimm->label);
+		p = dimm->label;
+		n = snprintf(p, len,"mc#%u", edac_index);
+		p += n;
+		len -= n;
+		for (j = 0; j < n_layers; j++) {
+			n = snprintf(p, len,"%s#%u",
+				     edac_layer_name[layers[j].type],
+				     per_layer_count[j]);
+			p += n;
+			len -= n;
 			dimm->location[j] = per_layer_count[j];
+		}
 
 		/* Link it to the csrows old API data */
 		chan->dimm = dimm;
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 28/31] edac: don't OOPS if the csrow is not visible
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (26 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 27/31] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 29/31] edac: Fix sysfs csrow?/*ce*count counters Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

[  119.831403] EDAC DEBUG: edac_mc_handle_error: edac_mc_handle_error: incrementing csrows (-1,0)
[  119.831418] BUG: unable to handle kernel paging request at 000000010000001b
[  119.838677] IP: [<ffffffffa0131276>] edac_mc_handle_error+0x45e/0x8c1 [edac_core]
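The debug line shows row == -1 with chan == 0. Before this fix, the channel counter was incremented whenever chan >= 0, so the code indexed mci->csrows[-1]. A userspace sketch of the guarded update, with hypothetical trimmed-down structures standing in for the real csrow/channel types:

```c
#include <assert.h>

#define N_CHANNELS_STUB 2	/* hypothetical: two channels per csrow */

/* Hypothetical, trimmed-down versions of the legacy csrow structures. */
struct chan_stub { unsigned ce_count; };
struct csrow_stub {
	unsigned ce_count;
	struct chan_stub channels[N_CHANNELS_STUB];
};

/*
 * row/chan use -1 for "unknown" and -2 for "ambiguous". A channel
 * counter lives inside a csrow, so it may only be touched once row is
 * known to be valid.
 */
static void count_ce(struct csrow_stub *csrows, int row, int chan)
{
	if (row < 0)
		return;
	csrows[row].ce_count++;
	if (chan >= 0) {
		assert(chan < N_CHANNELS_STUB);
		csrows[row].channels[chan].ce_count++;
	}
}
```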

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 91545a5..4145fa6 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1029,11 +1029,14 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	if (!enable_filter) {
 		strcpy(label, "any memory");
 	} else {
+		debugf4("%s: incrementing csrows (%d,%d)\n",
+			__func__, row, chan);
 		if (type == HW_EVENT_ERR_CORRECTED) {
-			if (row >= 0)
+			if (row >= 0) {
 				mci->csrows[row].ce_count++;
-			if (chan >= 0)
-				mci->csrows[row].channels[chan].ce_count++;
+				if (chan >= 0)
+					mci->csrows[row].channels[chan].ce_count++;
+			}
 		} else
 			if (row >= 0)
 				mci->csrows[row].ue_count++;
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 29/31] edac: Fix sysfs csrow?/*ce*count counters
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (27 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 28/31] edac: don't OOPS if the csrow is not visible Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 30/31] edac: Fix new error counts Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Fix a bug in the logic that was preventing error counts for the
same csrow/channel from being properly incremented.
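With this change, every DIMM matched by the error updates both row and chan, instead of only one of them depending on mci->layers[i].is_csrow. The merge rule can be sketched as follows; `merge_index()` and `merge_all()` are hypothetical helpers, and the assertion that each DIMM has a non-negative csrow/channel is an assumption of the sketch.

```c
#include <assert.h>

/*
 * Fold one matching DIMM's csrow (or cschannel) into the running value,
 * using the same encoding as edac_mc_handle_error():
 *   -1  nothing seen yet
 *   -2  matching DIMMs disagree, so no single csrow/channel is blamed
 *  >=0  every matching DIMM so far agrees on this value
 */
static int merge_index(int cur, int dimm_val)
{
	assert(dimm_val >= 0);	/* sketch assumption: DIMM has a real index */
	if (cur == -1)
		return dimm_val;
	if (cur >= 0 && cur != dimm_val)
		return -2;
	return cur;
}

/* Apply merge_index() over all DIMMs that match the error location. */
static int merge_all(const int *vals, int n)
{
	int cur = -1, i;

	for (i = 0; i < n; i++)
		cur = merge_index(cur, vals[i]);
	return cur;
}
```

Once a disagreement yields -2, later DIMMs cannot turn it back into a valid index, so the legacy counters are only bumped when the location is unambiguous.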

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 4145fa6..3f74ba9 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1013,23 +1013,22 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			 * get csrow/channel of the dimm, in order to allow
 			 * incrementing the compat API counters
 			 */
-			if (mci->layers[i].is_csrow) {
-				if (row == -1)
-					row = dimm->csrow;
-				else if (row >= 0 && row != dimm->csrow)
-					row = -2;
-			} else {
-				if (chan == -1)
-					chan = dimm->cschannel;
-				else if (chan >= 0 && chan != dimm->cschannel)
-					chan = -2;
-			}
+			debugf4("%s: dimm csrows (%d,%d)\n",
+				__func__, dimm->csrow, dimm->cschannel);
+			if (row == -1)
+				row = dimm->csrow;
+			else if (row >= 0 && row != dimm->csrow)
+				row = -2;
+			if (chan == -1)
+				chan = dimm->cschannel;
+			else if (chan >= 0 && chan != dimm->cschannel)
+				chan = -2;
 		}
 	}
 	if (!enable_filter) {
 		strcpy(label, "any memory");
 	} else {
-		debugf4("%s: incrementing csrows (%d,%d)\n",
+		debugf4("%s: csrow/channel to increment: (%d,%d)\n",
 			__func__, row, chan);
 		if (type == HW_EVENT_ERR_CORRECTED) {
 			if (row >= 0) {
-- 
1.7.8


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 30/31] edac: Fix new error counts
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (28 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 29/31] edac: Fix sysfs csrow?/*ce*count counters Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10  0:01 ` [PATCH v3 31/31] edac: Fix per layer error count counters Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c       |    6 +-----
 drivers/edac/edac_mc_sysfs.c |    9 ++++++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 3f74ba9..6714b36 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -249,12 +249,8 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 		count *= layers[i].size;
 		ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
 		ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
-		if (i < n_layers - 1)
-			tot_errcount += 2 * count;
+		tot_errcount += 2 * count;
 	}
-	/*
-	 * The last error count is equal to DIMM. So, don't export it twice
-	 */
 	erc = edac_align_ptr(&ptr, sizeof(*erc), tot_errcount);
 	ercd = edac_align_ptr(&ptr, sizeof(*ercd), tot_errcount);
 	pvt = edac_align_ptr(&ptr, sz_pvt, 1);
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index f8132ff..1d4ee32 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -866,7 +866,7 @@ static ssize_t errcount_ce_show(struct mem_ctl_info *mci, char *data,
 	struct errcount_attribute_data *ead = priv;
 	int i, index = 0;
 
-	for (i = 0; i < mci->n_layers - 1; i++) {
+	for (i = 0; i < ead->n_layers - 1; i++) {
 		index += ead->pos[i];
 		index *= mci->layers[i].size;
 	}
@@ -881,7 +881,8 @@ static ssize_t errcount_ue_show(struct mem_ctl_info *mci, char *data,
 	struct errcount_attribute_data *ead = priv;
 	int i, index = 0;
 
-	for (i = 0; i < mci->n_layers - 1; i++) {
+
+	for (i = 0; i < ead->n_layers - 1; i++) {
 		index += ead->pos[i];
 		index *= mci->layers[i].size;
 	}
@@ -921,6 +922,8 @@ static int edac_create_errcount_layer(struct mem_ctl_info *mci,
 			printk(KERN_ERR "sysfs_create_file failed: %d\n", err);
 			return err;
 		}
+		(*erc)++;
+		(*ercd)++;
 
 		(*erc)->attr.name = kasprintf(GFP_KERNEL, "ue%s", location);
 		debugf4("%s() creating %s\n", __func__, (*erc)->attr.name);
@@ -972,7 +975,7 @@ static int edac_create_errcount_objects(struct mem_ctl_info *mci)
 	int err, i, count;
 
 	count = 1;
-	for (i = 0; i < mci->n_layers - 1; i++) {
+	for (i = 0; i < mci->n_layers; i++) {
 		count *= mci->layers[i].size;
 		err = edac_create_errcount_layer(mci, &erc, &ercd, i, count);
 		if (err < 0)
-- 
1.7.8


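With this patch, edac_mc_alloc() reserves a CE and a UE counter for every node of every layer, including the last one, and edac_create_errcount_objects() now walks all n_layers. A minimal sketch of the resulting count, assuming the layer sizes are given as a plain array (the helper name is hypothetical):

```c
#include <assert.h>

/*
 * Number of per-layer error-count attributes: layer i has
 * size[0] * ... * size[i] nodes, and each node gets a CE and a UE counter.
 */
static unsigned int total_errcounts(const unsigned int *size, int n_layers)
{
	unsigned int count = 1, total = 0;
	int i;

	assert(n_layers > 0);
	for (i = 0; i < n_layers; i++) {
		count *= size[i];	/* nodes at this depth */
		total += 2 * count;	/* one CE + one UE counter per node */
	}
	return total;
}
```

For the Sandy Bridge-EP example from the cover letter (4 channels, 3 slots per channel) this gives 2 * (4 + 12) = 32, i.e. the 16 ce_channel* and 16 ue_channel* nodes in the listing quoted later in this thread; with the old `i < n_layers - 1` bound, only the 8 per-channel counters were allocated.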
* [PATCH v3 31/31] edac: Fix per layer error count counters
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (29 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 30/31] edac: Fix new error counts Mauro Carvalho Chehab
@ 2012-02-10  0:01 ` Mauro Carvalho Chehab
  2012-02-10 13:26 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Borislav Petkov
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10  0:01 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 drivers/edac/edac_mc.c       |   21 ++++++++++++++-------
 drivers/edac/edac_mc_sysfs.c |   25 ++++++++++++++-----------
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 6714b36..1eeecf7 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -247,10 +247,13 @@ struct mem_ctl_info *edac_mc_alloc(unsigned edac_index,
 	count = 1;
 	for (i = 0; i < n_layers; i++) {
 		count *= layers[i].size;
-		ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
-		ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(unsigned), count);
+		debugf4("%s: errcount layer %d size %d\n", __func__, i, count);
+		ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(u32), count);
+		ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(u32), count);
 		tot_errcount += 2 * count;
 	}
+
+	debugf4("%s: allocating %d error counters\n", __func__, tot_errcount);
 	erc = edac_align_ptr(&ptr, sizeof(*erc), tot_errcount);
 	ercd = edac_align_ptr(&ptr, sizeof(*ercd), tot_errcount);
 	pvt = edac_align_ptr(&ptr, sz_pvt, 1);
@@ -872,7 +875,9 @@ static void edac_increment_ce_error(struct mem_ctl_info *mci,
 			break;
 		index += pos[i];
 		mci->ce_per_layer[i][index]++;
-		index *= mci->layers[i].size;
+
+		if (i < mci->n_layers - 1)
+			index *= mci->layers[i + 1].size;
 	}
 }
 
@@ -894,7 +899,9 @@ static void edac_increment_ue_error(struct mem_ctl_info *mci,
 			break;
 		index += pos[i];
 		mci->ue_per_layer[i][index]++;
-		index *= mci->layers[i].size;
+
+		if (i < mci->n_layers - 1)
+			index *= mci->layers[i + 1].size;
 	}
 }
 
@@ -912,9 +919,8 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 {
 	unsigned long remapped_page;
 	/* FIXME: too much for stack: move it to some pre-alocated area */
-	char detail[80 + strlen(other_detail)];
+	char detail[80], location[80];
 	char label[(EDAC_MC_LABEL_LEN + 2) * mci->tot_dimms], *p;
-	char location[80];
 	int row = -1, chan = -1;
 	int pos[EDAC_MAX_LAYERS] = { layer0, layer1, layer2 };
 	int i;
@@ -1001,8 +1007,9 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 		 */
 		if (enable_filter) {
 			strcpy(p, dimm->label);
-			p[strlen(p)] = ' ';
 			p = p + strlen(p);
+			*p = ' ';
+			p++;
 			*p = '\0';
 
 			/*
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 1d4ee32..1c3d1ed 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -628,8 +628,8 @@ static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
 	cnt = 1;
 	for (i = 0; i < mci->n_layers; i++) {
 		cnt *= mci->layers[i].size;
-		memset(mci->ce_per_layer[i], 0, cnt);
-		memset(mci->ue_per_layer[i], 0, cnt);
+		memset(mci->ce_per_layer[i], 0, cnt * sizeof(u32));
+		memset(mci->ue_per_layer[i], 0, cnt * sizeof(u32));
 	}
 
 	mci->start_time = jiffies;
@@ -866,11 +866,12 @@ static ssize_t errcount_ce_show(struct mem_ctl_info *mci, char *data,
 	struct errcount_attribute_data *ead = priv;
 	int i, index = 0;
 
-	for (i = 0; i < ead->n_layers - 1; i++) {
-		index += ead->pos[i];
-		index *= mci->layers[i].size;
+	for (i = 0; i < ead->n_layers; i++) {
+		if (i < ead->n_layers - 1)
+			index += mci->layers[i + 1].size * ead->pos[i];
+		else
+			index += ead->pos[i];
 	}
-	index += ead->pos[i];
 	return sprintf(data, "%u\n",
 		       mci->ce_per_layer[ead->n_layers - 1][index]);
 }
@@ -881,12 +882,12 @@ static ssize_t errcount_ue_show(struct mem_ctl_info *mci, char *data,
 	struct errcount_attribute_data *ead = priv;
 	int i, index = 0;
 
-
-	for (i = 0; i < ead->n_layers - 1; i++) {
-		index += ead->pos[i];
-		index *= mci->layers[i].size;
+	for (i = 0; i < ead->n_layers; i++) {
+		if (i < ead->n_layers - 1)
+			index += mci->layers[i + 1].size * ead->pos[i];
+		else
+			index += ead->pos[i];
 	}
-	index += ead->pos[i];
 	return sprintf(data, "%u\n",
 		       mci->ue_per_layer[ead->n_layers - 1][index]);
 }
@@ -949,6 +950,7 @@ static int edac_create_errcount_layer(struct mem_ctl_info *mci,
 		(*erc)++;
 		(*ercd)++;
 	}
+
 	return 0;
 }
 
@@ -981,6 +983,7 @@ static int edac_create_errcount_objects(struct mem_ctl_info *mci)
 		if (err < 0)
 			goto err;
 	}
+	debugf4("%s: created %d objects\n", __func__, (unsigned)(erc - mci->errcount_attr));
 	return 0;
 err:
 	edac_remove_errcount(mci);
-- 
1.7.8


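Patch 31 lays out each ce_per_layer[i]/ue_per_layer[i] array in row-major order over layers 0..i: the running index gets pos[i] added and is then multiplied by the size of the next layer. A minimal standalone sketch of that walk, assuming (as the context lines suggest) that the loop stops at the first negative position; the helper names and the plain int arrays are hypothetical. The closed form for three layers is ((pos0 * size1) + pos1) * size2 + pos2, so pos0 is multiplied by every later layer size; as far as I can tell, the errcount_*_show() loop in the hunk above, which adds `layers[i + 1].size * pos[i]`, agrees with it only for up to two layers.

```c
#include <assert.h>

/*
 * Row-major index of node (pos[0], ..., pos[depth - 1]) in the flat
 * counter array of layer depth - 1; size[i] is the size of layer i.
 */
static int layer_index(const int *size, const int *pos, int depth)
{
	int i, index = 0;

	assert(depth > 0);
	for (i = 0; i < depth; i++) {
		assert(pos[i] >= 0 && pos[i] < size[i]);
		index = index * size[i] + pos[i];
	}
	return index;
}

/*
 * Increment the counter of every layer down to the first unknown
 * (negative) position, walking the index like the patched
 * edac_increment_ce_error(). counters[i] must hold
 * size[0] * ... * size[i] entries.
 */
static void increment_layers(unsigned int **counters, const int *size,
			     const int *pos, int n_layers)
{
	int i, index = 0;

	for (i = 0; i < n_layers; i++) {
		if (pos[i] < 0)
			break;
		index += pos[i];
		counters[i][index]++;
		/* scale by the size of the next layer, not the current one */
		if (i < n_layers - 1)
			index *= size[i + 1];
	}
}
```

The same patch also fixes mci_reset_counters_store(): memset() takes a byte count, so clearing cnt u32 counters needs cnt * sizeof(u32).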
* Re: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (30 preceding siblings ...)
  2012-02-10  0:01 ` [PATCH v3 31/31] edac: Fix per layer error count counters Mauro Carvalho Chehab
@ 2012-02-10 13:26 ` Borislav Petkov
  2012-02-10 16:39   ` Mauro Carvalho Chehab
  2012-02-10 16:48 ` [PATCH v3 32/31] edac: restore mce.h file Mauro Carvalho Chehab
  2012-02-13  9:23 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
  33 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-10 13:26 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List, bp, tony.luck

On Thu, Feb 09, 2012 at 10:00:59PM -0200, Mauro Carvalho Chehab wrote:
> The old sysfs nodes are still supported. Latter patches will allow
> disabling the old sysfs nodes.

I wouldn't remove those easily since they're documented as an EDAC
interface in <Documentation/edac.txt> and, as such, are probably used by
people.

> All errors currently generate the printk events, as before, but they'll
> also generate perf events like:

This format needs a bit more massaging:

>
>             bash-1680 [001] 152.349448: mc_error: [Hardware Error]:
> mce#0: Uncorrected error FAKE ERROR on label "mc#0channel#2slot#2 "
> (channel 2 slot 2 page 0x0 offset 0x0 grain 0 for EDAC testing only)

I don't see why the process and PID are relevant to the error reported
so it should probably go. I dunno whether this can easily be done with
the current ftrace code...

> kworker/u:5-198 [006] 1341.771535: mc_error_mce: mce#0: Corrected
>error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 "
>(channel 0 slot 0 page 0x3a2db9 offset 0x7ac grain 32 syndrome
>0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
>00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
>PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s):
>Unknown: Err=0001:0090 socket=0 channel=2/mask=4 rank=1)

This is too much, you probably only want to say:

	Corrected DRAM read error on DIMM "CPU..."

The channel, slot, page etc should be only Kconfigurable for people who
really need it.

>      kworker/u:5-198 [006] 1341.792536: mc_error_mce: mce#0: Corrected
> error Can't discover the memory rank for ch addr 0x60f2a6d76 on
> label "any memory" ( page 0x0 offset 0x0 grain 32 syndrome 0x0
> CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
> 0000000c1e54dab6/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 )

I guess we can report EDAC failures to map the DIMM properly like this.

> New sysfs nodes are now provided, to match the real memory architecture.

... and we need those because...?

> For example, on a Sandy Bridge-EP machine, with up to 4 channels, and up
> to 3 DIMMs per channel:
> 
> /sys/devices/system/edac/mc/mc0/
> ├── ce_channel0
> ├── ce_channel0_slot0
> ├── ce_channel0_slot1
> ├── ce_channel0_slot2
> ├── ce_channel1
> ├── ce_channel1_slot0
> ├── ce_channel1_slot1
> ├── ce_channel1_slot2
> ├── ce_channel2
> ├── ce_channel2_slot0
> ├── ce_channel2_slot1
> ├── ce_channel2_slot2
> ├── ce_channel3
> ├── ce_channel3_slot0
> ├── ce_channel3_slot1
> ├── ce_channel3_slot2
> ├── ce_count
> ├── ce_noinfo_count
> ├── dimm0
> │   ├── dimm_dev_type
> │   ├── dimm_edac_mode
> │   ├── dimm_label
> │   ├── dimm_location
> │   ├── dimm_mem_type
> │   └── dimm_size
> ├── dimm1
> │   ├── dimm_dev_type
> │   ├── dimm_edac_mode
> │   ├── dimm_label
> │   ├── dimm_location
> │   ├── dimm_mem_type
> │   └── dimm_size
> ├── fake_inject
> ├── ue_channel0
> ├── ue_channel0_slot0
> ├── ue_channel0_slot1
> ├── ue_channel0_slot2
> ├── ue_channel1
> ├── ue_channel1_slot0
> ├── ue_channel1_slot1
> ├── ue_channel1_slot2
> ├── ue_channel2
> ├── ue_channel2_slot0
> ├── ue_channel2_slot1
> ├── ue_channel2_slot2
> ├── ue_channel3
> ├── ue_channel3_slot0
> ├── ue_channel3_slot1
> ├── ue_channel3_slot2
> ├── ue_count
> └── ue_noinfo_count
> 
> One of the above nodes allow testing the error report mechanism by
> providing a simple driver-independent way to inject errors (fake_inject).
> This node is enabled only when CONFIG_EDAC_DEBUG is enabled, and it
> is limited to test the core EDAC report mechanisms, but it helps to
> test if the tracing events are properly accredited to the right DIMMs.

What happens with the inject_* sysfs nodes which are in EDAC already?

[..]

> The memory error handling function has now the capability of reporting
> more than one dimm, when it is not possible to put the fingers into
> a single place.
> 
> For example:
> 	# echo 1 >/sys/devices/system/edac/mc/mc0/fake_inject  && dmesg |tail -1
> 	[ 2878.130704] EDAC MC0: CE FAKE ERROR on mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2  (channel 1 page 0x0 offset 0x0 grain 0 syndrome 0x0 for EDAC testing only)
> 
> All dimm memories present on channel 1 are pointed as one of them were
> responsible for the error.

I don't see how this can be of any help? I like the EDAC failure message
better: if we cannot map it properly for some reason, we tell so the
user instead of generating some misleading data.

> 
> With regards to the output, the errors are now reported on a more 
> user-friendly way, e. g. the EDAC core will output:
> 
> - the timestamp;
> - the memory controller;
> - if the error is corrected, uncorrected or fatal;
> - the error message (driver specific, for example "read error", "scrubbing
>   error", etc)
> - the affected memory labels.

"labels"? See above, if we cannot report it properly, we better say so
instead of misleading with multiple labels.
> 
> Other technical details are provided, inside parenthesis, in order to
> allow hardware manufacturers, OEM, etc to have more details on it, and
> discover what DRAM has problems, if they want/need to.

Exactly, "if they want/need to" sounds like a Kconfig option to me which
can be turned on when needed.
> 
> Ah, now that the memory architecture is properly represented, the DIMM
> labels are automatically filled by the mc_alloc function call, in order
> to properly represent the memory architecture.
> 
> For example, in the case of Sandy Bridge, a memory can be described as:
> 	mc#0channel#1slot#0
> 
> This matches the way the memory is known inside the technical information,
> and, hopefully, at the OEM manuals for the motherboard.

This is not always the case. You need the silkscreen labels from the
board manufacturers as they do not always match with the DIMM topology
from the hw vendor. OEM vendor BIOS should do this with a table of
silkscreen labels or something.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-10  0:01 ` [PATCH v3 01/31] events/hw_event: Create a " Mauro Carvalho Chehab
@ 2012-02-10 13:41   ` Borislav Petkov
  2012-02-10 14:17     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-10 13:41 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Linux Edac Mailing List, Linux Kernel Mailing List

On Thu, Feb 09, 2012 at 10:01:00PM -0200, Mauro Carvalho Chehab wrote:
> In order to provide a proper hardware event subsystem, let's
> encapsulate hardware events into a common trace facility, and
> make both edac and mce drivers to use it. After that, common
> facilities can be moved into a new core for hardware events
> reporting subsystem. This patch is the first of a series, and just
> touches at mce.

I think it would work too if you had only one event:

* trace_hw_error(...)

which would have as an argument a string describing it, like
"Uncorrected Memory Read Error", "Memory Read Error (out of range)" "TLB
Multimatch Error" etc., followed by the rest of the error info.

Currently, you're introducing at least 5 trace_* calls _only_ for memory
errors. What about the remaining couples of tens of errors which haven't
been addressed yet?

Thanks.

-- 
Regards/Gruss,
Boris.

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-10 13:41   ` Borislav Petkov
@ 2012-02-10 14:17     ` Mauro Carvalho Chehab
  2012-02-12 12:48       ` Borislav Petkov
  0 siblings, 1 reply; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10 14:17 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Linux Edac Mailing List, Linux Kernel Mailing List

Em 10-02-2012 11:41, Borislav Petkov escreveu:
> On Thu, Feb 09, 2012 at 10:01:00PM -0200, Mauro Carvalho Chehab wrote:
>> In order to provide a proper hardware event subsystem, let's
>> encapsulate hardware events into a common trace facility, and
>> make both edac and mce drivers to use it. After that, common
>> facilities can be moved into a new core for hardware events
>> reporting subsystem. This patch is the first of a series, and just
>> touches at mce.
> 
> I think it would work too if you had only one event:
> 
> * trace_hw_error(...)
> 
> which would have as an argument a string describing it, like
> "Uncorrected Memory Read Error", "Memory Read Error (out of range)" "TLB
> Multimatch Error" etc., followed by the rest of the error info.
> 
> Currently, you're introducing at least 5 trace_* calls _only_ for memory
> errors. What about the remaining couples of tens of errors which haven't
> been addressed yet?

Good point.

The way I see it is that:

- a non-memory related, non-parsed MCE event would generate a "mce_record" trace
	(we need an additional patch to disable it when the error is parsed.
	 I'll address it after finishing the tests with a few other platforms);

As more MCE parsers are added to the core, such events will be generated
less often, and should eventually disappear in the long term.

- a non-x86 event (or an x86 event for a memory controller that is not covered
by MCE events) will use "mc_error";

- an x86 event generated via MCE will use "mc_error_mce".

There are two special events defined when there's a memory error _and_ a driver
bug:

	"mc_out_of_range_mce" and "mc_out_of_range".

While their names and one of their parameters are memory-controller specific,
it should be easy to make them generic enough to be used by other types of errors.

The previous EDAC logic was to generate an out-of-range printk and return. With
the changes I made, EDAC can still report the parsed information, discarding
only the badly parsed value. That's the approach I took, as the other information
there may be useful. With this approach, the MCE information will be shown by
the "mc_error_mce" trace, so we can remove "mc_out_of_range_mce" without losing
any information.
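The "keep the event, drop only the bad value" approach described above could look like the sketch below. It is a hypothetical helper, not code from this series: each decoded position is checked against its layer size, and an out-of-range value is logged as a driver bug (in the spirit of the "channel-b out of range (4 >= 4)" message quoted further down) and replaced by -1, meaning "unknown", so the rest of the event is still reported.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: check decoded per-layer positions against the layer
 * sizes. -1 means "unknown" and is accepted; any other value outside
 * [0, size) is reported as a driver bug and replaced by -1, so the rest
 * of the event can still be reported. Returns how many values were dropped.
 */
static int drop_out_of_range(int *pos, const int *size, int n_layers)
{
	int i, dropped = 0;

	assert(n_layers > 0);
	for (i = 0; i < n_layers; i++) {
		if (pos[i] == -1 || (pos[i] >= 0 && pos[i] < size[i]))
			continue;
		fprintf(stderr,
			"INTERNAL ERROR: layer %d position %d out of range [0, %d)\n",
			i, pos[i], size[i]);
		pos[i] = -1;
		dropped++;
	}
	return dropped;
}
```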

In any case, we can't merge the *_mce events with their non-MCE variants, as the
mce.h header is arch-specific and doesn't exist on the PPC and Tilera architectures.

So, the only event that we can actually remove is "mc_out_of_range_mce", if we let
the core generate two events for badly parsed error events. What do you think?
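The event selection described in this message can be summarized as a small decision function. This is only an illustration of the v3 state as described here: the struct, the enum and the function are hypothetical, and only the event names (in the comments) come from the series. Non-memory errors that don't come from MCE aren't covered yet, so the sketch returns EV_NONE for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trace events named in this thread, as of v3 of the series. */
enum hw_trace_event {
	EV_NONE,		/* not covered by the series yet */
	EV_MCE_RECORD,		/* "mce_record": raw, unparsed MCE */
	EV_MC_ERROR,		/* "mc_error": memory error, non-MCE source */
	EV_MC_ERROR_MCE,	/* "mc_error_mce": memory error via MCE */
	EV_MC_OUT_OF_RANGE,	/* "mc_out_of_range": error + driver bug */
	EV_MC_OUT_OF_RANGE_MCE,	/* "mc_out_of_range_mce": same, via MCE */
};

/* Hypothetical classification of an incoming hardware error. */
struct hw_err_info {
	bool from_mce;		/* reported through an x86 MCE */
	bool is_memory;		/* decoded as a memory-controller error */
	bool out_of_range;	/* the driver decoded an impossible position */
};

static enum hw_trace_event pick_trace_event(const struct hw_err_info *e)
{
	assert(e != NULL);
	if (!e->is_memory)
		return e->from_mce ? EV_MCE_RECORD : EV_NONE;
	if (e->out_of_range)
		return e->from_mce ? EV_MC_OUT_OF_RANGE_MCE : EV_MC_OUT_OF_RANGE;
	return e->from_mce ? EV_MC_ERROR_MCE : EV_MC_ERROR;
}
```

Dropping "mc_out_of_range_mce", as proposed above, would amount to mapping that branch to EV_MC_ERROR_MCE (with the bad value discarded) plus a separate out-of-range event.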

Regards,
Mauro



> 
> Thanks.
> 


* Re: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-10 13:26 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Borislav Petkov
@ 2012-02-10 16:39   ` Mauro Carvalho Chehab
  2012-02-12 12:08     ` Borislav Petkov
  0 siblings, 1 reply; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List, tony.luck

Em 10-02-2012 11:26, Borislav Petkov escreveu:
> On Thu, Feb 09, 2012 at 10:00:59PM -0200, Mauro Carvalho Chehab wrote:
>> The old sysfs nodes are still supported. Latter patches will allow
>> disabling the old sysfs nodes.
> 
> I wouldn't remove those easily since they're documented as an EDAC
> interface in <Documentation/edac.txt> and, as such, are probably used by
> people.

Yes, deprecating it will take several kernel releases. The main client for it
is edac-utils, but sysadmins may have their own scripts.

IMO, we should provide a Kconfig option to allow disabling the legacy sysfs
nodes, but keep them enabled by default. With time, we may remove them together
with the backward-compatibility code.

> 
>> All errors currently generate the printk events, as before, but they'll
>> also generate perf events like:
> 
> This format needs a bit more massaging:
> 
>>
>>             bash-1680 [001] 152.349448: mc_error: [Hardware Error]:
>> mce#0: Uncorrected error FAKE ERROR on label "mc#0channel#2slot#2 "
>> (channel 2 slot 2 page 0x0 offset 0x0 grain 0 for EDAC testing only)
> 
> I don't see why the process and PID are relevant to the error reported
> so it should probably go. I dunno whether this can easily be done with
> the current ftrace code...

Yes, that data is not relevant. I don't know how this is implemented there.
Maybe someone more experienced with the tracing internals could help us
with this.

> 
>> kworker/u:5-198 [006] 1341.771535: mc_error_mce: mce#0: Corrected
>> error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 "
>> (channel 0 slot 0 page 0x3a2db9 offset 0x7ac grain 32 syndrome
>> 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
>> 00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
>> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s):
>> Unknown: Err=0001:0090 socket=0 channel=2/mask=4 rank=1)
> 
> This is too much, you probably only want to say:
> 
> 	Corrected DRAM read error on DIMM "CPU..."
> 
> The channel, slot, page etc should be only Kconfigurable for people who
> really need it.

I'm not sure it is a good idea to remove that information via Kconfig, as
userspace parsers would then need to handle both formats. It is probably
better/easier to pass everything to userspace, and add a filter there.
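A userspace filter of the kind suggested here could, for instance, hide the parenthesized details unless the user asks for them. A minimal sketch, assuming the details always start at the first " (" after the quoted DIMM label, as in the sample trace lines above; the function name is hypothetical and no existing tool is implied.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace filter: cut the parenthesized technical details
 * off an EDAC error line, keeping only the short summary. Assumes the
 * details start at the first " (" after the closing quote of the label.
 */
static void strip_details(char *line)
{
	char *start, *details;

	assert(line != NULL);
	start = strrchr(line, '"');	/* closing quote of the label, if any */
	if (!start)
		start = line;
	details = strstr(start, " (");
	if (details)
		*details = '\0';	/* truncate the line in place */
}
```

Since the full line still reaches userspace, a tool built this way can print either form on request, which is the design choice argued for above.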

>>      kworker/u:5-198 [006] 1341.792536: mc_error_mce: mce#0: Corrected
>> error Can't discover the memory rank for ch addr 0x60f2a6d76 on
>> label "any memory" ( page 0x0 offset 0x0 grain 32 syndrome 0x0
>> CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
>> 0000000c1e54dab6/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
>> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 )
> 
> I guess we can report EDAC failures to map the DIMM properly like this.

Hmm... Yes, that may work, but changing all the drivers to provide the
information in that way is not simple.

There are two different situations here:

1) the driver knows that it was not able to detect the memory rank.
   All it needs to do is fill in a message like the one above. No issue at all.

2) the driver thinks that the information was decoded, but there's
   a bug there. This is what the "out-of-range" error message tracks.

Changing all drivers to provide a message like the above for (2) requires
lots of changes and tests on each driver, and provides very little
benefit: if such an error ever happens, the EDAC driver needs a
fix, and the parsed information is not reliable (the MCE one can still
be used). In such a case, a bug should be filed against the driver.

There's no way to do this properly at the core level, as the way to decode
such out-of-range bugs is memory-architecture dependent. So, a message
like:
	"Can't discover the memory rank for ch addr 0x60f2a6d76"

doesn't make much sense for an FB-DIMM memory driver.

In such drivers, identifying the channel and the slot is easy: all that is
needed is to check which AMB was selected. However, the channel address
is not relevant in this case, and some drivers can't even obtain it, as
the memory controller may be interleaving the memories across different
channels.

Also, an out-of-range error means that the driver maintainer needs to
fix something there. A message like:

	kernel: EDAC MC0: UE - no information available: INTERNAL ERROR 
	kernel: EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)

(this is a real-world example)

is clearer for a driver maintainer than:

	Can't discover the memory rank for ch addr 0x60f2a6d76

as it points out exactly what's wrong in the parser.

(In this specific example, the driver was able to decode the error properly,
but the FB-DIMM error call it was using required two channels, due to one
of the EDAC internal constraints.) As most FB-DIMM drivers weren't able to
attribute a UE error to a single channel, the typical call was to pass
channel and channel + 1. For this specific MC, this is wrong.

>> New sysfs nodes are now provided, to match the real memory architecture.
> 
> ... and we need those because...?

Because manufacturers who want to map their motherboards into EDAC are having
a very hard time, as there's no easy way to match what they have against
random, driver-specific fake csrow/channel information. Anyone wanting to do
such a mapping right now would need to read and understand the EDAC driver. The
only driver where such a mapping is easy seems to be amd64_edac, as it doesn't
support FB-DIMMs, and its memory controller neither abstracts the csrow
information nor provides more than 2 channels.

For example, on the above driver, there's no "channel-b". The error was on
branch 1, channel 1 of that branch (channel 3 of the memory controller,
counting from 0). The only way to discover this is by carefully analyzing the
driver. So, anyone trying to map the motherboard label "DIMM 1D" would quickly
discover that it means branch 1, channel 1, slot 1, but some drivers would map it as:

	csrow 3, channel 1	(branch, channel -> csrow; slot ->channel)
others as:
	csrow 7, channel 0	(branch, channel, slot -> csrow; channel always 0)
and others as:
	csrow 1, channel 3.	(slot -> csrow; branch, channel -> channel)

(yes, all 3 types of mapping exist in the drivers)
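The three mappings above can be reproduced with a few lines of arithmetic. The sketch assumes 2 channels per branch and 2 slots per channel, which is what the numbers above imply; the struct and function names are hypothetical and do not correspond to any particular driver.

```c
#include <assert.h>

/* Assumed topology, inferred from the example: 2 channels/branch, 2 slots/channel. */
#define CHANNELS_PER_BRANCH	2
#define SLOTS_PER_CHANNEL	2

struct legacy_pos {
	int csrow;
	int channel;
};

/* Channel number counted across all branches of the memory controller. */
static int global_channel(int branch, int chan)
{
	assert(chan >= 0 && chan < CHANNELS_PER_BRANCH);
	return branch * CHANNELS_PER_BRANCH + chan;
}

/* (branch, channel) -> csrow; slot -> channel */
static struct legacy_pos map_branch_chan_to_csrow(int branch, int chan, int slot)
{
	struct legacy_pos p = { global_channel(branch, chan), slot };
	return p;
}

/* (branch, channel, slot) -> csrow; channel is always 0 */
static struct legacy_pos map_all_to_csrow(int branch, int chan, int slot)
{
	struct legacy_pos p;

	assert(slot >= 0 && slot < SLOTS_PER_CHANNEL);
	p.csrow = global_channel(branch, chan) * SLOTS_PER_CHANNEL + slot;
	p.channel = 0;
	return p;
}

/* slot -> csrow; (branch, channel) -> channel */
static struct legacy_pos map_slot_to_csrow(int branch, int chan, int slot)
{
	struct legacy_pos p = { slot, global_channel(branch, chan) };
	return p;
}
```

For branch 1, channel 1, slot 1 these give (3, 1), (7, 0) and (1, 3), the same physical DIMM under three different csrow/channel pairs, while a layered description can simply say branch 1, channel 1, slot 1.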

> 
>> For example, on a Sandy Bridge-EP machine, with up to 4 channels, and up
>> to 3 DIMMs per channel:
>>
>> /sys/devices/system/edac/mc/mc0/
>> ├── ce_channel0
>> ├── ce_channel0_slot0
>> ├── ce_channel0_slot1
>> ├── ce_channel0_slot2
>> ├── ce_channel1
>> ├── ce_channel1_slot0
>> ├── ce_channel1_slot1
>> ├── ce_channel1_slot2
>> ├── ce_channel2
>> ├── ce_channel2_slot0
>> ├── ce_channel2_slot1
>> ├── ce_channel2_slot2
>> ├── ce_channel3
>> ├── ce_channel3_slot0
>> ├── ce_channel3_slot1
>> ├── ce_channel3_slot2
>> ├── ce_count
>> ├── ce_noinfo_count
>> ├── dimm0
>> │   ├── dimm_dev_type
>> │   ├── dimm_edac_mode
>> │   ├── dimm_label
>> │   ├── dimm_location
>> │   ├── dimm_mem_type
>> │   └── dimm_size
>> ├── dimm1
>> │   ├── dimm_dev_type
>> │   ├── dimm_edac_mode
>> │   ├── dimm_label
>> │   ├── dimm_location
>> │   ├── dimm_mem_type
>> │   └── dimm_size
>> ├── fake_inject
>> ├── ue_channel0
>> ├── ue_channel0_slot0
>> ├── ue_channel0_slot1
>> ├── ue_channel0_slot2
>> ├── ue_channel1
>> ├── ue_channel1_slot0
>> ├── ue_channel1_slot1
>> ├── ue_channel1_slot2
>> ├── ue_channel2
>> ├── ue_channel2_slot0
>> ├── ue_channel2_slot1
>> ├── ue_channel2_slot2
>> ├── ue_channel3
>> ├── ue_channel3_slot0
>> ├── ue_channel3_slot1
>> ├── ue_channel3_slot2
>> ├── ue_count
>> └── ue_noinfo_count
>>
>> One of the above nodes allow testing the error report mechanism by
>> providing a simple driver-independent way to inject errors (fake_inject).
>> This node is enabled only when CONFIG_EDAC_DEBUG is enabled, and it
>> is limited to test the core EDAC report mechanisms, but it helps to
>> test if the tracing events are properly accredited to the right DIMMs.
> 
> What happens with the inject_* sysfs nodes which are in EDAC already?

They will stay there, as-is. This is just yet another testing feature; it won't
interfere with or supersede the existing ones.

> [..]
> 
>> The memory error handling function has now the capability of reporting
>> more than one dimm, when it is not possible to put the fingers into
>> a single place.
>>
>> For example:
>> 	# echo 1 >/sys/devices/system/edac/mc/mc0/fake_inject  && dmesg |tail -1
>> 	[ 2878.130704] EDAC MC0: CE FAKE ERROR on mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2  (channel 1 page 0x0 offset 0x0 grain 0 syndrome 0x0 for EDAC testing only)
>>
>> All dimm memories present on channel 1 are pointed as one of them were
>> responsible for the error.
> 
> I don't see how this can be of any help? I like the EDAC failure message
> better: if we cannot map it properly for some reason, we tell so the
> user instead of generating some misleading data.

This is not misleading data. Depending on how the ECC code is generated,
there's no way to point to a single DIMM, because two or more memory modules
are used to produce the ECC data.

FB-DIMMs can operate in lockstep mode. If so, UE errors happen on a
memory pair.

If the system admin wants to quickly recover the machine, he needs to know
that, by replacing the 2 affected modules, the machine will work. He can later
put the affected modules in separate hardware, in single-channel mode, in
order to discover which one is broken; pointing to the two affected modules
helps him recover quickly, while still allowing him to further track down
where the problem is.

Btw, on Sandy Bridge, memory can be in both lockstep and mirror mode. So,
if a UE error occurs, 4 DIMMs may be affected.
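When an error can only be narrowed down to a group of DIMMs (a lockstep pair, or a whole channel as in the fake_inject example quoted above), the core lists all candidate labels. A bounded sketch of building such a list, with hypothetical names; it appends each label followed by a space, which is also the operation patch 31 fixes in edac_mc_handle_error() (the old code overwrote the terminating NUL and then called strlen() on the now unterminated string).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join DIMM labels into buf as "label1 label2 ... ", never writing past
 * buf_len bytes. Returns the number of labels that fit.
 */
static int join_labels(char *buf, size_t buf_len, const char *const *labels, int n)
{
	char *p = buf;
	size_t left = buf_len;	/* bytes still available from p on */
	int i;

	assert(buf_len > 0);
	*p = '\0';
	for (i = 0; i < n; i++) {
		size_t len = strlen(labels[i]);

		if (len + 2 > left)	/* label + ' ' + NUL must fit */
			break;
		memcpy(p, labels[i], len);
		p += len;	/* move to the end of the copied label first */
		*p++ = ' ';	/* ...then write the separator */
		*p = '\0';	/* and re-terminate the string */
		left -= len + 1;
	}
	return i;
}
```

Joining the three channel 1 labels from the fake_inject example yields "mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2 ", with the trailing space visible in the quoted dmesg line.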

> 
>>
>> With regards to the output, the errors are now reported on a more 
>> user-friendly way, e. g. the EDAC core will output:
>>
>> - the timestamp;
>> - the memory controller;
>> - if the error is corrected, uncorrected or fatal;
>> - the error message (driver specific, for example "read error", "scrubbing
>>   error", etc)
>> - the affected memory labels.
> 
> "labels"? See above, if we cannot report it properly, we better say so
> instead of misleading with multiple labels.

What is the poor user expected to do in such a case, if he is not pointed to
some memories to test? OK, we can improve the message to make it
clearer that likely just one of the listed memories was affected, but
leaving him with no clue would be a nightmare for users.

>> Other technical details are provided, inside parenthesis, in order to
>> allow hardware manufacturers, OEM, etc to have more details on it, and
>> discover what DRAM has problems, if they want/need to.
> 
> Exactly, "if they want/need to" sounds like a Kconfig option to me which
> can be turned on when needed.

I have yet to see a real use case where the user doesn't want that. He may not be
the final consumer of such data, but what we've seen in practice is that,
when people need to replace bad memory sticks, they go to the machine vendors,
asking for a warranty replacement. The vendors usually request more detailed
info than just "dimm xxx is broken". The rest of the log helps the user show
what actually happened with the memories, and helps the vendor verify that the
complaint is pertinent.

Anyway, as I said before, it would be better for the userspace tool that
retrieves such data to have an option to show the details or not.

>> Ah, now that the memory architecture is properly represented, the DIMM
>> labels are automatically filled by the mc_alloc function call, in order
>> to properly represent the memory architecture.
>>
>> For example, in the case of Sandy Bridge, a memory can be described as:
>> 	mc#0channel#1slot#0
>>
>> This matches the way the memory is known inside the technical information,
>> and, hopefully, at the OEM manuals for the motherboard.
> 
> This is not always the case. You need the silkscreen labels from the
> board manufacturers as they do not always match with the DIMM topology
> from the hw vendor. OEM vendor BIOS should do this with a table of
> silkscreen labels or something.

Yes. However, as I've already explained, OEM vendors don't know what
"csrow 3, channel 2" means, as there are several different ways of mapping
channel#, slot# into csrow/channel, and there are at least 4 or 5 different
mapping schemes inside the drivers.

If you take a look at the existing drivers that don't use csrow/channel,
as a general rule, each driver does its own proprietary fake mapping,
which causes lots of problems for OEMs, as they need a hardware engineer
(and/or the hardware diagram) to get the real data, and a software engineer
to analyze the driver and map it into EDAC's internal fake representation.
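The layer-based labels quoted above (e.g. `mc#0channel#1slot#0`) can be sketched as a label built from a list of layers. The `struct mem_layer` type and `make_dimm_label()` are hypothetical illustrations, not the actual mc_alloc code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one level of the memory hierarchy. */
struct mem_layer {
	const char *name;	/* e.g. "mc", "channel", "slot" */
	int size;		/* number of elements at this level */
};

/*
 * Build a default DIMM label such as "mc#0channel#1slot#0" from a list of
 * layers and the position of the DIMM inside each layer. Returns 0 on
 * success, -1 on an out-of-range index or a too-small buffer.
 */
static int make_dimm_label(char *buf, size_t len,
			   const struct mem_layer *layers, const int *pos,
			   int n_layers)
{
	size_t off = 0;
	int i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < n_layers; i++) {
		int n;

		if (pos[i] < 0 || pos[i] >= layers[i].size)
			return -1;
		n = snprintf(buf + off, len - off, "%s#%d",
			     layers[i].name, pos[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return 0;
}
```

With layers {"mc", 2}, {"channel", 4}, {"slot", 3} (an assumed geometry, for illustration only) and position {0, 1, 0}, the sketch yields `mc#0channel#1slot#0`. An index outside a layer's size is rejected.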

Regards,
Mauro.



* [PATCH v3 32/31] edac: restore mce.h file
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (31 preceding siblings ...)
  2012-02-10 13:26 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Borislav Petkov
@ 2012-02-10 16:48 ` Mauro Carvalho Chehab
  2012-02-13  9:23 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-10 16:48 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Linux Edac Mailing List,
	Linux Kernel Mailing List

The trace infrastructure doesn't support having the same trace header
included from two different places: doing so causes duplicated symbols.
I only noticed it when compiling the entire tree again on a different
machine.

Partially reverts commit 4eb2a29419c1fefd76c8dbcd308b84a4b52faf4d

I'll likely work on another approach, as keeping mce_record in a
different place than the other *mce traces doesn't seem right, but,
for now, let's just take the shortest path.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    2 +-
 include/trace/events/hw_event.h  |   62 +++------------------------------
 include/trace/events/mce.h       |   69 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 57 deletions(-)
 create mode 100644 include/trace/events/mce.h

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c219f72..2af127d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -53,7 +53,7 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
 			      lockdep_is_held(&mce_chrdev_read_mutex))
 
 #define CREATE_TRACE_POINTS
-#include <trace/events/hw_event.h>
+#include <trace/events/mce.h>
 
 int mce_disabled __read_mostly;
 
diff --git a/include/trace/events/hw_event.h b/include/trace/events/hw_event.h
index ade0185..91522b3 100644
--- a/include/trace/events/hw_event.h
+++ b/include/trace/events/hw_event.h
@@ -142,66 +142,16 @@ TRACE_EVENT(mc_out_of_range,
 #include <asm/mce.h>
 
 /*
- * Generic MCE event
+ * MCE event for memory-controller errors
  */
-TRACE_EVENT(mce_record,
-
-	TP_PROTO(const struct mce *m),
-
-	TP_ARGS(m),
-
-	TP_STRUCT__entry(
-		__field(	u64,		mcgcap		)
-		__field(	u64,		mcgstatus	)
-		__field(	u64,		status		)
-		__field(	u64,		addr		)
-		__field(	u64,		misc		)
-		__field(	u64,		ip		)
-		__field(	u64,		tsc		)
-		__field(	u64,		walltime	)
-		__field(	u32,		cpu		)
-		__field(	u32,		cpuid		)
-		__field(	u32,		apicid		)
-		__field(	u32,		socketid	)
-		__field(	u8,		cs		)
-		__field(	u8,		bank		)
-		__field(	u8,		cpuvendor	)
-	),
-
-	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
-	),
-
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
-		__entry->cpu,
-		__entry->mcgcap, __entry->mcgstatus,
-		__entry->bank, __entry->status,
-		__entry->addr, __entry->misc,
-		__entry->cs, __entry->ip,
-		__entry->tsc,
-		__entry->cpuvendor, __entry->cpuid,
-		__entry->walltime,
-		__entry->socketid,
-		__entry->apicid)
-);
 
 /*
- * MCE event for memory-controller errors
+ * NOTE: due to trace constraints, we can't have mc_error_mce in the
+ * same file as mce_record, as they're used by different files. Including
+ * trace headers twice causes duplicated symbols. So, care is needed to
+ * sync changes here with changes in include/trace/events/mce.h.
  */
+
 TRACE_EVENT(mc_error_mce,
 
 	TP_PROTO(const unsigned int err_type,
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
new file mode 100644
index 0000000..4cbbcef
--- /dev/null
+++ b/include/trace/events/mce.h
@@ -0,0 +1,69 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mce
+
+#if !defined(_TRACE_MCE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MCE_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+#include <asm/mce.h>
+
+TRACE_EVENT(mce_record,
+
+	TP_PROTO(struct mce *m),
+
+	TP_ARGS(m),
+
+	TP_STRUCT__entry(
+		__field(	u64,		mcgcap		)
+		__field(	u64,		mcgstatus	)
+		__field(	u64,		status		)
+		__field(	u64,		addr		)
+		__field(	u64,		misc		)
+		__field(	u64,		ip		)
+		__field(	u64,		tsc		)
+		__field(	u64,		walltime	)
+		__field(	u32,		cpu		)
+		__field(	u32,		cpuid		)
+		__field(	u32,		apicid		)
+		__field(	u32,		socketid	)
+		__field(	u8,		cs		)
+		__field(	u8,		bank		)
+		__field(	u8,		cpuvendor	)
+	),
+
+	TP_fast_assign(
+		__entry->mcgcap		= m->mcgcap;
+		__entry->mcgstatus	= m->mcgstatus;
+		__entry->status		= m->status;
+		__entry->addr		= m->addr;
+		__entry->misc		= m->misc;
+		__entry->ip		= m->ip;
+		__entry->tsc		= m->tsc;
+		__entry->walltime	= m->time;
+		__entry->cpu		= m->extcpu;
+		__entry->cpuid		= m->cpuid;
+		__entry->apicid		= m->apicid;
+		__entry->socketid	= m->socketid;
+		__entry->cs		= m->cs;
+		__entry->bank		= m->bank;
+		__entry->cpuvendor	= m->cpuvendor;
+	),
+
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+		__entry->cpu,
+		__entry->mcgcap, __entry->mcgstatus,
+		__entry->bank, __entry->status,
+		__entry->addr, __entry->misc,
+		__entry->cs, __entry->ip,
+		__entry->tsc,
+		__entry->cpuvendor, __entry->cpuid,
+		__entry->walltime,
+		__entry->socketid,
+		__entry->apicid)
+);
+
+#endif /* _TRACE_MCE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.8



* Re: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-10 16:39   ` Mauro Carvalho Chehab
@ 2012-02-12 12:08     ` Borislav Petkov
  2012-02-12 17:10       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-12 12:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List,
	Linux Kernel Mailing List, tony.luck

On Fri, Feb 10, 2012 at 02:39:42PM -0200, Mauro Carvalho Chehab wrote:
> IMO, we should provide a Kconfig to allow disabling the legacy sysfs, but
> keep it enabled by default. With time, we may remove it together with
> the backport code.

Yeah, why do you want to remove them at all, actually? There wasn't any reason
specified.

[..]

> >> kworker/u:5-198 [006] 1341.771535: mc_error_mce: mce#0: Corrected
> >> error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 "
> >> (channel 0 slot 0 page 0x3a2db9 offset 0x7ac grain 32 syndrome
> >> 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
> >> 00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
> >> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s):
> >> Unknown: Err=0001:0090 socket=0 channel=2/mask=4 rank=1)
> > 
> > This is too much, you probably only want to say:
> > 
> > 	Corrected DRAM read error on DIMM "CPU..."
> > 
> > The channel, slot, page etc should be only Kconfigurable for people who
> > really need it.
> 
> Not sure if it is a good idea to remove those info via Kconfig, as this
> would mean that the userspace parsers will need to be prepared to work
> with both ways. It is probably better/easier to pass everything to
> userspace, and add a filter there.

As I said countless times already, the normal case is not interested
in so much information - they want to know only which DIMM caused the
error. Unless someone has compelling reasons to keep that info, I don't
want to burden the reporting path with unnecessary info.

> >>      kworker/u:5-198 [006] 1341.792536: mc_error_mce: mce#0: Corrected
> >> error Can't discover the memory rank for ch addr 0x60f2a6d76 on
> >> label "any memory" ( page 0x0 offset 0x0 grain 32 syndrome 0x0
> >> CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
> >> 0000000c1e54dab6/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
> >> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 )
> > 
> > I guess we can report EDAC failures to map the DIMM properly like this.
> 
> Hmm... Yes, that may work, but changing all the drivers to provide the
> information on that way is not simple.
> 
> There are two different situations here:
> 
> 1) the driver knows that it was not able to detect the memory rank.
>    all it needs to do is to fill the message like above. No issue at all.
> 
> 2) the driver thinks that the information was decoded, but there's
>    a bug there. This is what the "out-of-range" error message tracks.

That's not hard to fix at the driver level - see my other mail about the single
trace_hw_error thing.

> Changing all drivers to provide a message like above for (2) requires
> lots of changes and tests, on each driver, and will provide very
> little benefit: if such error ever happens, the EDAC driver needs a
> fix, and the parsed information is not reliable (the mce one can still
> be used). In such case, a bug against the driver should be filed.
> 
> There's no way of doing it properly at the core level, as the way to decode
> such out-of-range bugs is memory-architecture dependent. So, something
> like:
> 	"Can't discover the memory rank for ch addr 0x60f2a6d76"
> 
> doesn't make much sense for an FB-DIMM memory driver.

This is exactly why the drivers themselves should create the error
message and stick it into the trace_* call.

[snip more]

I don't see a problem with the driver creating a proper/fitting error
message string and sticking it into the trace_* call.

> >> New sysfs nodes are now provided, to match the real memory architecture.
> > 
> > ... and we need those because...?
> 
> Because manufacturers wanting to map their motherboards into EDAC are having
> a very hard time, as there's no easy way to map what they have to random
> driver-specific fake csrows/channel information.

Not fake, they're actually chip select rows and channels == DRAM
controllers.

> Anyone wanting to do such
> mapping right now would need to read and understand the edac driver. The
> only driver where such mapping is easy seems to be amd64_edac, as it doesn't
> support FB-DIMMs, nor does its memory controller abstract csrow information or
> provide more than 2 channels.
> 
> For example, on the above driver, there's no "channel-b". The error was
> on branch 1, channel 1 of branch 1 (the third channel of the memory controller).
> The only way to discover it is by carefully analyzing the driver. So, anyone
> trying to map the motherboard's label DIMM 1D would quickly discover that
> it means branch 1, channel 1, slot 1, but some drivers would map it as:
> 
> 	csrow 3, channel 1	(branch, channel -> csrow; slot ->channel)
> others as:
> 	csrow 7, channel 0	(branch, channel, slot -> csrow; channel always 0)
> and others as:
> 	csrow 1, channel 3.	(slot -> csrow; branch, channel -> channel)
> 
> (yes, all 3 types of mapping exist in the drivers)

So what are you saying? I don't see how your little bit changed
mapping helps. As I told you already, motherboard designers don't always
comply with the placing of the DIMM connectors to the hw vendor spec and
place the channel and slots routing in a conveniently increasing order.
Again, you need the silk screen labels the way the BIOS sees them.

[..]

> > What happens with the inject_* sysfs nodes which are in EDAC already?
> 
> They will stay there as-is. This is just yet another testing feature, and won't
> interfere with or supersede the existing ones.

Then it should be made to use the existing ones or you should have a
compelling reason why you can't reuse the existing ones. If you really
need it, you should stick it in debugfs and behind a debugging Kconfig
option.

> >> The memory error handling function has now the capability of reporting
> >> more than one dimm, when it is not possible to put the fingers into
> >> a single place.
> >>
> >> For example:
> >> 	# echo 1 >/sys/devices/system/edac/mc/mc0/fake_inject  && dmesg |tail -1
> >> 	[ 2878.130704] EDAC MC0: CE FAKE ERROR on mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2  (channel 1 page 0x0 offset 0x0 grain 0 syndrome 0x0 for EDAC testing only)
> >>
> >> All DIMMs present on channel 1 are listed, as one of them is
> >> responsible for the error.
> > 
> > I don't see how this can be of any help? I like the EDAC failure message
> > better: if we cannot map it properly for some reason, we tell the user so
> > user instead of generating some misleading data.
> 
> This is not misleading data. Depending on how the ECC code is generated,
> there's no way to point to a single dimm, because two or more memories are
> used to produce the ECC data.
> 
> FB-DIMM memories can be in lockstep mode. If so, UE errors happen on a
> memory pair.
> 
> If the system admin wants to quickly recover the machine, he needs to know
> that, by replacing the 2 affected memories, the machine will work. He can later
> put the affected memories on a separate hardware, using a single-channel
> mode, in order to discover what's broken, but pointing to the two affected
> memories helps him to recover quickly, while still allowing him to further
> track where the problem is.
> 
> Btw, on Sandy Bridge, a memory can be in both lockstep and mirror mode. So,
> if a UE occurs, 4 DIMMs may be affected.

This sounds very strange, a single UE from multiple DIMMs? Reading
through "4.2 Southbound Commands" of the JEDEC FBDIMM spec, on several
occasions it states that only single DIMMs are being addressed (DS[2:0]
field) when sending commands to them southwards.

@Tony: hey Tony, is it true that FBDIMM can actually do DIMM
interleaving when doing DIMM reads/writes and the ECC calculation is
done on words from different DIMMs? The statement that the ECC word
would effectively protect multiple DIMMs and when an error is reported,
it would mean "multiple DIMMs affected" sounds pretty strange for my
taste...

[..]

> > "labels"? See above, if we cannot report it properly, we better say so
> > instead of misleading with multiple labels.
> 
> What is the poor user expected to do in such a case, if he is not pointed to
> some memories to test? OK, we can improve the message to make it clearer
> that likely just one of the listed memories was affected, but leaving him
> with no clue would be a nightmare for users.

Why are you so fixated on those error messages?! If your driver
is programmed properly, it should detect the DIMM in error just fine,
without any "out-of-range" issues or multiple labels. The error case
should be very, very rare, and in such case, stating that we're in error
is fine!

> >> Other technical details are provided, inside parentheses, in order to
> >> allow hardware manufacturers, OEMs, etc. to have more details on it, and
> >> discover which DRAM has problems, if they want/need to.
> > 
> > Exactly, "if they want/need to" sounds like a Kconfig option to me which
> > can be turned on when needed.
> 
> I have yet to see a real use case where the user doesn't want that. He may
> not be the final consumer of such data, but what we've seen in practice is
> that, when people need to replace bad memory sticks, they go to the machine
> vendors asking for a warranty replacement. The vendors usually request more
> detailed info than just "dimm xxx is broken". The rest of the log helps to
> show what actually happened with the memories, and lets the vendor verify
> that the complaint is pertinent.
> 
> Anyway, as I said before, the best approach would be for the userspace tool
> that retrieves such data to have an option to show the details or not.

Right, let's see which of that bulk of additional info the machine
vendor could use:

     kworker/u:5-198   [006]  1341.771535: mc_error_mce: mce#0: Corrected error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 " (channel 0
slot 0  page 0x3a2db9 offset 0x7ac grain 32 syndrome 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0, PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s): Unknown:
Err=0001:0090 socket=0 channel=2/mask=4 rank=1)

- process name: hmm, no
- pid: nope,
- which logical processor saw the error: hmm, no
- label: oh yeah, this one it wants to know
- memory page: I don't think so
- offset, grain, syndrome: dunno, maybe or maybe not
- MCG, MC5 (it should be 4, btw), ADDR/MISC etc MCE bank MSRs: nah, not really
- TSC... not even worth mentioning

So, it looks to me like this info is only tangentially, if at all, important
to the machine vendor. So, add only the fields which are really
important instead of blindly dumping everything we have into the trace, thus
bloating it unnecessarily and making it less usable.

> >> Ah, now that the memory architecture is properly represented, the DIMM
> >> labels are automatically filled by the mc_alloc function call, in order
> >> to properly represent the memory architecture.
> >>
> >> For example, in the case of Sandy Bridge, a memory can be described as:
> >> 	mc#0channel#1slot#0
> >>
> >> This matches the way the memory is known inside the technical information,
> >> and, hopefully, in the OEM manuals for the motherboard.
> > 
> > This is not always the case. You need the silkscreen labels from the
> > board manufacturers as they do not always match with the DIMM topology
> > from the hw vendor. OEM vendor BIOS should do this with a table of
> > silkscreen labels or something.
> 
> Yes. However, as I've already explained, OEM vendors don't know what
> "csrow 3, channel 2" means, as there are several different ways of mapping
> channel#, slot# into csrow/channel, and there are at least 4 or 5 different
> mapping schemes inside the drivers.
> 
> If you take a look at the existing drivers that don't use csrow/channel,
> as a general rule, each driver does its own proprietary fake mapping,
> which causes lots of problems for OEMs, as they need a hardware engineer
> (and/or the hardware diagram) to get the real data, and a software engineer
> to analyze the driver and map it into EDAC's internal fake representation.

Let me put it more clearly this time: I seriously doubt that having a
slightly different nomenclature than CS and channel, i.e. MC, channel and slot,
would help identify the DIMMs, since board manufacturers don't always
lay them out as the hw vendors wish, for a multitude of reasons. And I'd
guess the SB case is no different. But then again, adding those as a
generated string message which only the relevant drivers issue, and which
doesn't affect the rest of the drivers, is fine with me.

Not when there are new sysfs nodes which are valid only on those systems
and useless on others.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-10 14:17     ` Mauro Carvalho Chehab
@ 2012-02-12 12:48       ` Borislav Petkov
  2012-02-12 17:21         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-12 12:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List

On Fri, Feb 10, 2012 at 12:17:51PM -0200, Mauro Carvalho Chehab wrote:
> Em 10-02-2012 11:41, Borislav Petkov escreveu:
> > On Thu, Feb 09, 2012 at 10:01:00PM -0200, Mauro Carvalho Chehab wrote:
> >> In order to provide a proper hardware event subsystem, let's
> >> encapsulate hardware events into a common trace facility, and
> >> make both edac and mce drivers use it. After that, common
> >> facilities can be moved into a new core for hardware events
> >> reporting subsystem. This patch is the first of a series, and just
> >> touches mce.
> > 
> > I think it would work too if you had only one event:
> > 
> > * trace_hw_error(...)
> > 
> > which would have as an argument a string describing it, like
> > "Uncorrected Memory Read Error", "Memory Read Error (out of range)" "TLB
> > Multimatch Error" etc., followed by the rest of the error info.
> > 
> > Currently, you're introducing at least 5 trace_* calls _only_ for memory
> > errors. What about the remaining couple dozen errors which haven't
> > been addressed yet?
> 
> Good point.
> 
> The way I see it is that:
> 
> - a non-memory related, non-parsed MCE event would generate a "mce_record" trace
> 	(we need an additional patch to disable it when the error is parsed.
> 	 I'll address it after finishing the tests with a few other platforms);
> 
> As more MCE parsers are added to the core, the situations where such an event is
> generated will become rarer, and will eventually disappear in the long term.
> 
> - a non-x86 event (or an x86 event for a memory controller that is not addressed
> by MCE events) will use a "mc_error";
> 
> - a x86 event generated via MCE will use a "mc_error_mce".
> 
> There are two special events defined when there's a memory error _and_ a driver
> bug:
> 
> 	"mc_out_of_range_mce" and "mc_out_of_range".
> 
> While the name of them and one of the parameters are memory-controller specific,
> it should be easy to make it generic enough to be used by other types of errors.
> 
> The previous EDAC logic was to generate an out of range printk and return. With
> the changes I made, it is possible to let EDAC provide the parsed information,
> just discarding the badly parsed value. That's the approach I took, as the
> other information there may be useful. By taking such approach, the MCE information
> will be shown by the "mc_error_mce" trace. So, we can remove the "mc_out_of_range_mce"
> without losing any information.
> 
> In any case, we can't merge the *_mce with the non-mce variant, as the mce.h header
> is arch specific and doesn't exist on PPC and Tilera architectures.
> 
> So, the only event that we can actually remove is "mc_out_of_range_mce", if we let
> the core generate two events for badly parsed error events. What do you think?

As I said already, error messages from the drivers should be very
rare, so they don't need a special trace event.

But most importantly, _ALL_ hw errors could use a single
trace_hw_error() macro which has a single string argument containing all
the required error info as a string since the error format is different
based on the error type. In any case, memory errors are not special! As
I said also before, we cannot have a trace-call for every error type
which adds additional information or which might generate an error while
producing that error info.
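Here is a rough userspace sketch of the single trace_hw_error() approach: the driver formats an error-type-specific string, and one generic event carries it. `trace_hw_error()` is only a stub that records the last message and is not the kernel tracepoint API. `report_hw_error()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace stand-in for a single, generic trace_hw_error() event: it only
 * records the last message so the sketch can run outside the kernel.
 */
static char last_event[256];

static void trace_hw_error(const char *msg)
{
	snprintf(last_event, sizeof(last_event), "%s", msg);
}

/*
 * Hypothetical driver-side helper: the driver builds the error-type-specific
 * text itself (memory, TLB, ...) and hands one string to the generic event.
 */
static void report_hw_error(const char *fmt, ...)
{
	char msg[256];
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	trace_hw_error(msg);
}
```

Memory errors and, say, a TLB multimatch error then go through the same event. Only the string differs.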

-- 
Regards/Gruss,
Boris.



* Re: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-12 12:08     ` Borislav Petkov
@ 2012-02-12 17:10       ` Mauro Carvalho Chehab
  2012-02-13 21:29         ` Luck, Tony
  0 siblings, 1 reply; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-12 17:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List, tony.luck

Em 12-02-2012 10:08, Borislav Petkov escreveu:
> On Fri, Feb 10, 2012 at 02:39:42PM -0200, Mauro Carvalho Chehab wrote:
>> IMO, we should provide a Kconfig to allow disabling the legacy sysfs, but
>> keep it enabled by default. With time, we may remove it together with
>> the backport code.
> 
> Yeah, why do you want to remove them at all, actually? There wasn't any reason
> specified.

There are two reasons:

1) For the API's sake. After having the tools ported to use traces and the newer
nodes, the old stuff will be there for no good reason;

2) keeping them requires the core to track and increment errors on both
csrow/channel counters and the (up to 3) layers counters.

So, it is a cleanup. Yet, there's no reason to deprecate it soon. The
removal of the old stuff may happen a few years after its merge upstream.

> 
> [..]
> 
>>>> kworker/u:5-198 [006] 1341.771535: mc_error_mce: mce#0: Corrected
>>>> error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 "
>>>> (channel 0 slot 0 page 0x3a2db9 offset 0x7ac grain 32 syndrome
>>>> 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
>>>> 00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
>>>> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s):
>>>> Unknown: Err=0001:0090 socket=0 channel=2/mask=4 rank=1)
>>>
>>> This is too much, you probably only want to say:
>>>
>>> 	Corrected DRAM read error on DIMM "CPU..."
>>>
>>> The channel, slot, page etc should be only Kconfigurable for people who
>>> really need it.
>>
>> Not sure if it is a good idea to remove those info via Kconfig, as this
>> would mean that the userspace parsers will need to be prepared to work
>> with both ways. It is probably better/easier to pass everything to
>> userspace, and add a filter there.
> 
> As I said countless times already, the normal case is not interested
> in so much information - they want to know only which DIMM caused the
> error. Unless someone has compelling reasons to keep that info, I don't
> want to burden the reporting path with unnecessary info.

Either way works for me, but, as I said, I doubt that anyone would disable
it. It would be great to hear more opinions on the MLs about removing the
additional info.

>>>>      kworker/u:5-198 [006] 1341.792536: mc_error_mce: mce#0: Corrected
>>>> error Can't discover the memory rank for ch addr 0x60f2a6d76 on
>>>> label "any memory" ( page 0x0 offset 0x0 grain 32 syndrome 0x0
>>>> CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
>>>> 0000000c1e54dab6/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0,
>>>> PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 )
>>>
>>> I guess we can report EDAC failures to map the DIMM properly like this.
>>
>> Hmm... Yes, that may work, but changing all the drivers to provide the
>> information on that way is not simple.
>>
>> There are two different situations here:
>>
>> 1) the driver knows that it was not able to detect the memory rank.
>>    all it needs to do is to fill the message like above. No issue at all.
>>
>> 2) the driver thinks that the information was decoded, but there's
>>    a bug there. This is what the "out-of-range" error message tracks.
> 
> That's not hard to fix at the driver level - see my other mail about the single
> trace_hw_error thing.

Well, such checks may be moved to happen inside the drivers. If the driver
is doing that, the core will never produce such error. In the case of
the drivers I'm maintaining, I prefer to let the core check if the provided
values are sane and, if not, produce a different error message.
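One way the core-side sanity check might look, as a sketch only: `check_error_position()` is hypothetical, and the convention that -1 means "unknown at this layer" is an assumption. Any other index outside the layer's range is treated as a driver decoding bug and gets a distinct out-of-range message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical core-side sanity check. pos[i] is the position the driver
 * decoded for layer i; -1 is assumed to mean "unknown at this layer". Any
 * other value outside [0, sizes[i]) is treated as a driver decoding bug, and
 * a distinct out-of-range message is written to why.
 * Returns 0 if all positions are sane, -1 otherwise.
 */
static int check_error_position(const int *pos, const int *sizes,
				int n_layers, char *why, size_t len)
{
	int i;

	assert(why != NULL && len > 0);
	why[0] = '\0';

	for (i = 0; i < n_layers; i++) {
		if (pos[i] < -1 || pos[i] >= sizes[i]) {
			snprintf(why, len,
				 "layer %d position %d out of range (max %d)",
				 i, pos[i], sizes[i] - 1);
			return -1;
		}
	}
	return 0;
}
```

Because the check lives in one place, every driver gets it without duplicating code, and a bad decode is reported differently from a normal error.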

>> Changing all drivers to provide a message like above for (2) requires
>> lots of changes and tests, on each driver, and will provide very
>> little benefit: if such error ever happens, the EDAC driver needs a
>> fix, and the parsed information is not reliable (the mce one can still
>> be used). In such case, a bug against the driver should be filed.
>>
>> There's no way of doing it properly at the core level, as the way to decode
>> such out-of-range bugs is memory-architecture dependent. So, something
>> like:
>> 	"Can't discover the memory rank for ch addr 0x60f2a6d76"
>>
>> doesn't make much sense for an FB-DIMM memory driver.
> 
> This is exactly why the drivers themselves should create the error
> message 

Agreed. This is there already.

> and stick it into the trace_* call.

Letting the drivers handle the trace_* call directly is not a good approach, as
it would require moving several parts of the edac core into the drivers.
The core is responsible for doing several tasks when an error is produced:
	- get the associated DIMM(s);
	- increment the error counters;
	- generate the traces;
	- generate the printk (part of the old API backward compatibility
that could be removed together with the old sysfs nodes).

Such handling is generic enough to be in the core. Moving it to drivers would
just cause duplicated code to be added to all drivers.
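Here is a simplified sketch of the common steps listed above. It assumes hypothetical `struct dimm`/`struct mem_ctl` types and stub output functions rather than the real EDAC core. Per-DIMM counters are bumped for every affected DIMM, and one trace event is emitted per error. The legacy counter and printk are only touched while the old API is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_DIMMS 8

/* Hypothetical per-DIMM bookkeeping kept by the core. */
struct dimm {
	unsigned long ce_count;		/* new per-layer counter */
	unsigned long csrow_ce_count;	/* legacy csrow/channel counter */
};

struct mem_ctl {
	struct dimm dimms[MAX_DIMMS];
	int n_dimms;
};

/* Stubs standing in for the trace event and the legacy printk. */
static int traces_emitted, printks_emitted;

static void emit_trace(const char *msg)
{
	(void)msg;
	traces_emitted++;
}

static void emit_printk(const char *msg)
{
	printf("EDAC: %s\n", msg);
	printks_emitted++;
}

/*
 * Core entry point: the driver only says which DIMM(s) may be affected and
 * why; the common bookkeeping is done once, here, instead of in every driver.
 */
static void core_handle_ce(struct mem_ctl *mci, const int *idx, int n,
			   const char *msg, bool legacy_api)
{
	int i;

	for (i = 0; i < n; i++) {
		struct dimm *d;

		assert(idx[i] >= 0 && idx[i] < mci->n_dimms);
		d = &mci->dimms[idx[i]];	/* 1. get the associated DIMM(s) */
		d->ce_count++;			/* 2. increment the new counter... */
		if (legacy_api)
			d->csrow_ce_count++;	/*    ...and the legacy one */
	}
	emit_trace(msg);			/* 3. one trace event per error */
	if (legacy_api)
		emit_printk(msg);		/* 4. legacy printk */
}
```

Dropping the legacy API later would then mean removing the `legacy_api` branches in one place, not in every driver.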

> 
> [snip more]
> 
> I don't see a problem with the driver creating a proper/fitting error
> message string and sticking it into the trace_* call.

See above.

> 
>>>> New sysfs nodes are now provided, to match the real memory architecture.
>>>
>>> ... and we need those because...?
>>
>> Because manufacturers wanting to map their motherboards into EDAC are having
>> a very hard time, as there's no easy way to map what they have to random
>> driver-specific fake csrows/channel information.
> 
> Not fake, they're actually chip select rows and channels == DRAM
> controllers.

It is fake. The chip select rows/channels are controlled by the Advanced Memory Buffer
(AMB), on FB-DIMMs. The AMB is inside the DIMM, and provides physical isolation
of the DRAM chips (see JEDEC JESD No. 206).

The FB-DIMM memory controllers aren't capable of (and don't care about) seeing the
DRAM's csrows/channels/ranks used by the AMB. All they see is their own buses
(typically, 4 buses, sub-divided into branches and channels), and the DIMM number
inside each channel. So, they select an entire DIMM, instead of a single DRAM chip.

The branches are used to provide 128-bit wide data access, similar to what the
"channel" provides with a traditional "csrows/channel" approach.

Each such bus looks like this one:
	http://en.wikipedia.org/wiki/File:FB-DIMM_system_organization.svg

Also, as I said, the new Intel memory controllers found on Nehalem and Sandy
Bridge also don't allow seeing the DRAM chips individually. So, the chip select
is not visible there.

>> Anyone wanting to do such
>> mapping right now would need to read and understand the edac driver. The
>> only driver where such mapping is easy seems to be amd64_edac, as it doesn't
>> support FB-DIMMs, nor does its memory controller abstract csrow information or
>> provide more than 2 channels.
>>
>> For example, on the above driver, there's no "channel-b". The error was
>> on branch 1, channel 1 of branch 1 (the third channel of the memory controller).
>> The only way to discover it is by carefully analyzing the driver. So, anyone
>> trying to map the motherboard's label DIMM 1D would quickly discover that
>> it means branch 1, channel 1, slot 1, but some drivers would map it as:
>>
>> 	csrow 3, channel 1	(branch, channel -> csrow; slot ->channel)
>> others as:
>> 	csrow 7, channel 0	(branch, channel, slot -> csrow; channel always 0)
>> and others as:
>> 	csrow 1, channel 3.	(slot -> csrow; branch, channel -> channel)
>>
>> (yes, all 3 types of mapping exist in the drivers)
> 
> So what are you saying? I don't see how your mapping, changed a little
> bit, helps. As I told you already, motherboard designers don't always
> follow the hw vendor spec when placing the DIMM connectors, and don't always
> route the channels and slots in a conveniently increasing order.
> Again, you need the silk screen labels the way the BIOS sees them.

It is not a "little bit changed" mapping. It is the removal of the fake
information introduced by the driver, which helps hw vendors provide
us with maps between the silk screen labels and how they relate to the
memory controller's branch/channel/slot, channel/slot or chip select/channel.
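To make the ambiguity concrete, here is a minimal, hypothetical sketch of the three csrow/channel foldings quoted above, applied to the DIMM silk-screened as 1D (branch 1, channel 1, slot 1). The geometry (2 channels per branch, 2 slots per channel) and the map_a/map_b/map_c names are assumptions picked so that the results match the numbers above; they are not copied from any driver.

```c
/*
 * Hypothetical illustration: three ways a driver can fold an FB-DIMM
 * (branch, channel, slot) location into EDAC's (csrow, channel).
 * Assumed geometry: 2 channels per branch, 2 slots per channel.
 */
#define CHANNELS_PER_BRANCH 2
#define SLOTS_PER_CHANNEL   2

struct fbd_loc  { int branch, channel, slot; };
struct edac_loc { int csrow, channel; };

/* (branch, channel) -> csrow; slot -> channel */
static struct edac_loc map_a(struct fbd_loc l)
{
	struct edac_loc e = {
		.csrow   = l.branch * CHANNELS_PER_BRANCH + l.channel,
		.channel = l.slot,
	};
	return e;
}

/* (branch, channel, slot) -> csrow; channel is always 0 */
static struct edac_loc map_b(struct fbd_loc l)
{
	int ch = l.branch * CHANNELS_PER_BRANCH + l.channel;
	struct edac_loc e = {
		.csrow   = ch * SLOTS_PER_CHANNEL + l.slot,
		.channel = 0,
	};
	return e;
}

/* slot -> csrow; (branch, channel) -> channel */
static struct edac_loc map_c(struct fbd_loc l)
{
	struct edac_loc e = {
		.csrow   = l.slot,
		.channel = l.branch * CHANNELS_PER_BRANCH + l.channel,
	};
	return e;
}
```

The same physical DIMM comes out as csrow 3/channel 1, csrow 7/channel 0 or csrow 1/channel 3, so a vendor table keyed on csrow/channel only works for the one driver it was written against.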

> 
> [..]
> 
>>> What happens with the inject_* sysfs nodes which are in EDAC already?
>>
>> They will stay there, as-is. This is just yet another testing feature, and won't
>> interfere with or supersede the existing ones.
> 
> Then it should be made to use the existing ones or you should have a
> compelling reason why you can't reuse the existing ones. 

The reason is simple: only two or three drivers have error injection, and I
need to test my patchset on several boards that don't provide it.

> If you really
> need it, you should stick it in debugfs and behind a debugging Kconfig
> option.

It is behind the debug Kconfig option. While I agree that moving it to
debugfs makes sense, as does moving all the other error injection
code to debugfs, I won't invest more time in it, as I still have a lot of
work to do testing those changes.

This feature is on a separate patch. If it is not acceptable as is, I'll
just drop it from the final submission, and keep it just for my tests.

>>>> The memory error handling function now has the capability of reporting
>>>> more than one DIMM, when it is not possible to pin the error down to
>>>> a single place.
>>>>
>>>> For example:
>>>> 	# echo 1 >/sys/devices/system/edac/mc/mc0/fake_inject  && dmesg |tail -1
>>>> 	[ 2878.130704] EDAC MC0: CE FAKE ERROR on mc#0channel#1slot#0 mc#0channel#1slot#1 mc#0channel#1slot#2  (channel 1 page 0x0 offset 0x0 grain 0 syndrome 0x0 for EDAC testing only)
>>>>
>>>> All DIMMs present on channel 1 are pointed out, as any one of them could be
>>>> responsible for the error.
>>>
>>> I don't see how this can be of any help? I like the EDAC failure message
>>> better: if we cannot map it properly for some reason, we tell so the
>>> user instead of generating some misleading data.
>>
>> This is not misleading data. Depending on how the ECC code is generated,
>> there's no way to point to a single DIMM, because two or more memories are
>> used to produce the ECC data.
>>
>> FB-DIMM memories can be in lockstep mode. If so, UE errors happen on a
>> memory pair.
>>
>> If the system admin wants to quickly recover the machine, he needs to know
>> that, by replacing the 2 affected memories, the machine will work. He can later
>> put the affected memories in separate hardware, using single-channel
>> mode, in order to discover what's broken; pointing to the two affected
>> memories helps him recover quickly, while still allowing him to further
>> track down where the problem is.
>>
>> Btw, on Sandy Bridge, memory can be in both lockstep and mirror mode. So,
>> if a UE error occurs, 4 DIMMs may be affected.
> 
> This sounds very strange, a single UE from multiple DIMMs? Reading
> through "4.2 Southbound Commands" of the JEDEC FBDIMM spec, on several
> occasions it states that only single DIMMs are being addressed (DS[2:0]
> field) when sending commands to them southwards.
> 
> @Tony: hey Tony, is it true that FBDIMM can actually do DIMM
> interleaving when doing DIMM reads/writes and the ECC calculation is
> done on words from different DIMMs? The statement that the ECC word
> would effectively protect multiple DIMMs and when an error is reported,
> it would mean "multiple DIMMs affected" sounds pretty strange for my
> taste...

If there is a way, I wasn't able to discover it, nor were the other developers
who worked with FB-DIMMs.

Look at the original EDAC error report functions for FB-DIMMs:

void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
			unsigned int csrow,
			unsigned int channela,
			unsigned int channelb, char *msg);

void edac_mc_handle_fbd_ce(struct mem_ctl_info *mci,
			unsigned int csrow, unsigned int channel, char *msg);

For CE, it is able to detect which DIMM has the problem, but, when lockstep is
enabled, an Uncorrected Error would point to the two channels of the affected
branch. Also, when memory mirroring is enabled, there are 4 DIMMs associated with the
same 128-bit memory address. Any one of those memories could be affected by
the error. Only the Sandy Bridge driver handles memory mirroring, and I'll need
to add some extra logic to the location detection algorithm in order to make it
work with it (it is currently on my TODO list).
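A rough, hypothetical sketch of the candidate sets described above: one DIMM for a CE, both channels of the branch for a UE in lockstep, and twice that with mirroring. The dimm_pos type, candidate_dimms(), and the assumption that the mirror copy sits at the same channel/slot on the other branch are illustrative only; real mirroring layouts (e.g. on Sandy Bridge) may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: 2 branches (0 and 1) x 2 channels per branch. */
struct dimm_pos { int branch, channel, slot; };

/*
 * Fill 'out' with every DIMM that could hold the failing data and
 * return how many were written. 'out' must have room for 4 entries.
 * Assumption: with mirroring, the mirror copy sits at the same
 * channel/slot on the other branch.
 */
static size_t candidate_dimms(struct dimm_pos hit, bool uncorrected,
			      bool lockstep, bool mirror,
			      struct dimm_pos out[4])
{
	size_t n = 0;

	/* A corrected error identifies the failing DIMM directly. */
	if (!uncorrected) {
		out[n++] = hit;
		return n;
	}

	for (int b = 0; b < (mirror ? 2 : 1); b++) {
		/* b == 1 selects the (assumed) mirror branch. */
		int branch = b ? 1 - hit.branch : hit.branch;

		if (lockstep) {
			/* The ECC word spans both channels of the branch. */
			for (int ch = 0; ch < 2; ch++)
				out[n++] = (struct dimm_pos){ branch, ch, hit.slot };
		} else {
			out[n++] = (struct dimm_pos){ branch, hit.channel, hit.slot };
		}
	}
	return n;
}
```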

> 
> [..]
> 
>>> "labels"? See above, if we cannot report it properly, we better say so
>>> instead of misleading with multiple labels.
>>
>> What is the poor user expected to do in such a case, if he isn't pointed to
>> some memories to test? OK, we can improve the message to make it
>> clearer that likely just one of the listed memories was affected, but
>> leaving him with no clue would be a nightmare for users.
> 
> Why are you sticking so much to those error messages?! If your driver
> is programmed properly, it should detect the DIMM in error just fine,
> without any "out-of-range" issues or multiple labels. The error case
> should be very very seldom and in such case, stating that we're in error
> is fine!

You're the one who is concerned about it, wanting to integrate it into
something else ;)

Such an error message has a different nature from the normal one: it is a driver
bug. It should be reported as such, and not as an 'ordinary' hardware error
message. That's why it has a different event: it is a kernel bug.

>>>> Other technical details are provided, inside parenthesis, in order to
>>>> allow hardware manufacturers, OEM, etc to have more details on it, and
>>>> discover what DRAM has problems, if they want/need to.
>>>
>>> Exactly, "if they want/need to" sounds like a Kconfig option to me which
>>> can be turned on when needed.
>>
>> I have yet to see a real use case where the user doesn't want that. He may not be
>> the final consumer of such data, but what we've seen, in practice, is that,
>> when people need to replace bad memory sticks, they go to the machine vendors,
>> asking for a warranty replacement. The vendors usually request more detailed
>> info than just "dimm xxx is broken". The rest of the log helps show
>> what actually happened with the memories, and lets the vendor verify that the
>> complaint is pertinent.
>>
>> Anyway, as I said before, it would be better for the userspace tool that
>> retrieves such data to have an option to show the details or not.
> 
> Right, let's see which of that bulk of additional info the machine
> vendor could use:
> 
>      kworker/u:5-198   [006]  1341.771535: mc_error_mce: mce#0: Corrected error memory read error on label "CPU_SrcID#0_Channel#3_DIMM#0 " (channel 0
> slot 0  page 0x3a2db9 offset 0x7ac grain 32 syndrome 0x0 CPU: 0, MCGc/s: 1000c14/0, MC5: 8c00004000010090, ADDR/MISC:
> 00000003a2db97ac/00000020404c4c86, RIP: 00:<0000000000000000>, TSC: 0, PROCESSOR: 0:206d6, TIME: 1328829250, SOCKET: 0, APIC: 0 1 error(s): Unknown:
> Err=0001:0090 socket=0 channel=2/mask=4 rank=1)
> 
> - process name: hmm, no
> - pid: nope,

Trace core changes would be required for those. I won't be addressing them
in the proposed patchset.

> - which logical processor saw the error: hmm, no
> - label: oh yeah, this one it wants to know
> - memory page: I don't think so

	This could be used to detect which processes were affected by a
memory error.
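As a sanity check on the trace line quoted above: assuming 4 KiB pages (PAGE_SHIFT of 12, as on x86), page 0x3a2db9 plus offset 0x7ac recombine into the MCE ADDR value 0x3a2db97ac. err_phys_addr() is a hypothetical helper; mapping that address back to the affected processes would still need a physical-to-process reverse lookup that this sketch doesn't attempt.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12 as on x86. */
#define SKETCH_PAGE_SHIFT 12

/* Rebuild the physical address from the page/offset pair in the event. */
static uint64_t err_phys_addr(uint64_t page, uint64_t offset)
{
	return (page << SKETCH_PAGE_SHIFT) | offset;
}
```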

> - offset, grain, syndrome: dunno, maybe or maybe not
> - MCG, MC5 (it should be 4, btw), ADDR/MISC etc MCE bank MSRs: nah, not really
> - TSC... not even worth mentioning

The logical processor, MCG, MC5...TSC were added here per your request ;)
Or maybe I just misunderstood your request...

You asked me to merge the mce_record trace event with the memory
error event. They're there because of that.

I'm more than happy to discard this new event and use "mc_error" for
MCE-originated errors as well. The MCE drivers can add some of the above information
to the driver-specific detail message (time, error count, error code?).

> So, it looks to me, this info is only tangentially, if at all, important
> to the machine vendor. So, add only the fields which are really
> important instead of blindly dumping all we have into the trace and thus
> bloating the trace unnecessarily and making it less usable.

There is an alternative, though I'm not sure it would work for trace: we can keep
all the info in the trace call, but remove the useless information from
the printk, like:

#ifdef CONFIG_EDAC_FULL_DEBUG
	TP_printk("mce#%d: %s error %s on label \"%s\" (%s %s CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, ADDR/MISC: %016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x %s)",
	...
#else
	TP_printk("mce#%d: %s error %s on label \"%s\"",
	...
#endif

This way, a trace report file will contain everything there, but the displayed
message would be simpler.

Even better, maybe the trace core could be changed to accept two types of
TP_printk parameters: one consuming the full data, and a simplified "user-friendly"
one for those who don't want to know the error details.

With this, if the user ever needs the extra details (for example, because of a
HW vendor request), the information can be later retrieved from the trace report
binary format.
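The "two formats" idea can be illustrated outside the tracing code: keep one record with all the fields and pick the short or the full rendering at output time. The struct, its trimmed field set and format_event() are hypothetical and don't use the kernel tracing API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down record: the buffer always keeps everything. */
struct mc_event_rec {
	int mc;
	const char *type;               /* "Corrected" / "Uncorrected" */
	const char *msg;                /* e.g. "memory read error" */
	const char *label;
	unsigned long long mci_status;  /* raw MCi_STATUS value */
	unsigned long long addr;
};

/*
 * Render the record either in the short, user-friendly form or with the
 * hardware details appended. Returns what snprintf() returns.
 */
static int format_event(char *buf, size_t len,
			const struct mc_event_rec *r, bool full)
{
	if (!full)
		return snprintf(buf, len, "mce#%d: %s error %s on label \"%s\"",
				r->mc, r->type, r->msg, r->label);

	return snprintf(buf, len,
			"mce#%d: %s error %s on label \"%s\" (STATUS: %016llx, ADDR: %016llx)",
			r->mc, r->type, r->msg, r->label,
			r->mci_status, r->addr);
}
```

Because the full record is always stored, nothing is lost when the short form is displayed; only the rendering changes.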

I have no idea how easy it would be to implement that in the trace core.

> 
>>>> Ah, now that the memory architecture is properly represented, the DIMM
>>>> labels are automatically filled in by the mc_alloc function call, so that
>>>> they reflect the actual memory architecture.
>>>>
>>>> For example, in the case of Sandy Bridge, a memory can be described as:
>>>> 	mc#0channel#1slot#0
>>>>
>>>> This matches the way the memory is named in the technical documentation,
>>>> and, hopefully, in the OEM manuals for the motherboard.
>>>
>>> This is not always the case. You need the silkscreen labels from the
>>> board manufacturers as they do not always match with the DIMM topology
>>> from the hw vendor. OEM vendor BIOS should do this with a table of
>>> silkscreen labels or something.
>>
>> Yes. However, as I've already explained, OEM vendors don't know what
>> "csrow 3, channel 2" means, as there are several different ways of mapping
>> channel#, slot# into csrow/channel, and there are at least 4 or 5 different
>> mapping schemes inside the drivers.
>>
>> If you take a look at the existing drivers that don't use csrow/channel,
>> as a general rule, each driver does its own proprietary fake mapping,
>> which causes lots of problems for OEMs, as they need a hardware engineer
>> (and/or the hardware diagram) to get the real data, and a software engineer
>> to analyze the driver and map it into EDAC's internal fake representation.
> 
> Let me put it more clearly this time: I strongly doubt that having a bit
> different nomenclature than CS and channel, i.e. MC, channel and slot
> would help identify the DIMMs since board manufacturers don't always
> lay out them as the hw vendors wish for a multitude of reasons. And I'd
> guess the SB case is not different. But then again, adding those as a
> generated string message which only the relevant drivers issue and it
> doesn't affect the rest of the drivers is fine with me.
> 
> Not when there are new sysfs nodes which are valid only on those systems
> and useless on others.

Let me put it more clearly: fake data is causing trouble. I have some BZs open
due to that. Drivers should not need to lie to the EDAC core by manufacturing
random fake info just for the EDAC core to accept them.

This is a major bug and requires a fix.

Of course, such a fix should not break existing drivers, or require existing
applications to change in order to keep working.

Putting it in numbers:

There are 9 non-x86 drivers (PPC and Tilera). I suspect that at least some of
them are also lying to the EDAC core: on several such drivers, there's just one
channel, and on a few, even just one chip select. As most of them are for
embedded hardware, I doubt that some of that hardware even has DIMMs.

There are currently 18 x86 drivers using EDAC:

6 drivers for 32-bit hardware (most are for first-generation DDR memories):
     amd76x_edac		- X86_32
     e7xxx_edac			- X86_32
     i82860_edac		- X86_32, RAMBUS
     i82875p_edac		- X86_32
     r82600_edac		- X86_32
     i82443bxgx_edac		- X86_32, BROKEN

6 drivers for 64-bit hardware that use csrows/channel:
     amd64_edac			- DDR2, DDR3
     e752x_edac			- Intel Core 2 Duo, DDR2 memories
     i3000_edac			- DDR2
     i3200_edac			- DDR2
     i82975x_edac		- DDR2
     x38_edac			- DDR2

6 drivers for 64-bit hardware where the csrows/channel aren't visible to
the memory controller:

     i5000_edac			- FB-DIMM, fakes csrows/channel with a static map
     i5100_edac			- FB-DIMM, fakes csrows/channel with a static map
     i5400_edac			- FB-DIMM, fakes csrows/channel with a static map
     i7300_edac			- FB-DIMM, fakes csrows/channel with a static map
     i7core_edac		- DDR3, fakes csrows/channel by dynamically getting a
				  random free csrows/channel pair
     sb_edac			- DDR3, fakes csrows/channel by dynamically getting a
				  random free csrows/channel pair

So, of the 12 drivers for 64-bit CPUs, _half_ of them are lying
to EDAC, providing fake information to the kernel and to userspace, each with
its own way of producing the fake csrows/channel info.

This is fixed by this patchset.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-12 12:48       ` Borislav Petkov
@ 2012-02-12 17:21         ` Mauro Carvalho Chehab
  2012-02-12 18:44           ` Borislav Petkov
  0 siblings, 1 reply; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-12 17:21 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Linux Edac Mailing List, Linux Kernel Mailing List

Em 12-02-2012 10:48, Borislav Petkov escreveu:
> On Fri, Feb 10, 2012 at 12:17:51PM -0200, Mauro Carvalho Chehab wrote:
>> Em 10-02-2012 11:41, Borislav Petkov escreveu:
>>> On Thu, Feb 09, 2012 at 10:01:00PM -0200, Mauro Carvalho Chehab wrote:
>>>> In order to provide a proper hardware event subsystem, let's
>>>> encapsulate hardware events into a common trace facility, and
>>>> make both edac and mce drivers to use it. After that, common
>>>> facilities can be moved into a new core for hardware events
>>>> reporting subsystem. This patch is the first of a series, and just
>>>> touches at mce.
>>>
>>> I think it would work too if you had only one event:
>>>
>>> * trace_hw_error(...)
>>>
>>> which would have as an argument a string describing it, like
>>> "Uncorrected Memory Read Error", "Memory Read Error (out of range)" "TLB
>>> Multimatch Error" etc., followed by the rest of the error info.
>>>
>>> Currently, you're introducing at least 5 trace_* calls _only_ for memory
>>> errors. What about the remaining couples of tens of errors which haven't
>>> been addressed yet?
>>
>> Good point.
>>
>> The way I see it is that:
>>
>> - a non-memory related, non-parsed MCE event would generate a "mce_record" trace
>> 	(we need an additional patch to disable it when the error is parsed.
>> 	 I'll address it after finishing the tests with a few other platforms);
>>
>> As more MCE parsers are added at the core, the situations where such event will
>> be generated will reduce, and will eventually disappear in long term.
>>
>> - a non-x86 event (or a x86 event for a memory controller that is not addressed
>> by MCE events) will use a "mc_error";
>>
>> - a x86 event generated via MCE will use a "mc_error_mce".
>>
>> There are two special events defined when there's a memory error _and_ a driver
>> bug:
>>
>> 	"mc_out_of_range_mce" and "mc_out_of_range".
>>
>> While the name of them and one of the parameters are memory-controller specific,
>> it should be easy to make it generic enough to be used by other types of errors.
>>
>> The previous EDAC logic was to generate an out-of-range printk and return. With
>> the changes I made, it is possible to let EDAC provide the parsed information,
>> just discarding the badly parsed value. That's the approach I took, as the
>> other information there may be useful. With this approach, the MCE information
>> will be shown by the "mc_error_mce" trace. So, we can remove "mc_out_of_range_mce"
>> without losing any information.
>>
>> In any case, we can't merge the *_mce with the non-mce variant, as the mce.h header
>> is arch specific and doesn't exist on PPC and tilera architectures.
>>
>> So, the only event that we can actually remove is "mc_out_of_range_mce", if we let
>> the core generate two events for badly parsed error events. What do you think?
> 
> As I said already, error messages from the drivers should be something
> very seldom so they don't need a special trace event.
> 
> But most importantly, _ALL_ hw errors could use a single
> trace_hw_error() macro which has a single string argument containing all
> the required error info as a string since the error format is different
> based on the error type. In any case, memory errors are not special! As
> I said also before, we cannot have a trace-call for every error type
> which adds additional information or which might generate an error while
> producing that error info.

All trace events could be summarized into a single string. That's what the
TP_printk macro does.
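For comparison, the single trace_hw_error() shape Boris proposes boils down to a type string plus one preformatted detail string. A minimal userspace sketch of that shape, using a hypothetical struct hw_error and hw_error_set() helper (not an existing kernel interface):

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical generic event: one type string plus free-form details. */
struct hw_error {
	const char *type;      /* e.g. "Uncorrected Memory Read Error" */
	char detail[128];      /* everything else, already formatted */
};

/* printf-style helper filling the detail string; returns vsnprintf()'s result. */
static int hw_error_set(struct hw_error *e, const char *type,
			const char *fmt, ...)
{
	va_list ap;
	int n;

	e->type = type;
	va_start(ap, fmt);
	n = vsnprintf(e->detail, sizeof(e->detail), fmt, ap);
	va_end(ap);
	return n;
}
```

One trade-off of this shape: once the details are flattened into a string, tools can no longer filter on individual fields (page, syndrome, etc.) as they can with separate TRACE_EVENT() fields.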

As I said before, there's just one trace call for memory error events 
(hw_event:mc_error) on my second RFC.

I added the mce variant (mc_error_mce) in version 3 because of what I
understood from the feedback you gave me in private: that the mce_record
event should be merged with it. I'm more than happy to remove it if I
misunderstood you.

Regards,
Mauro.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-12 17:21         ` Mauro Carvalho Chehab
@ 2012-02-12 18:44           ` Borislav Petkov
  2012-02-12 19:38             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-12 18:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List

On Sun, Feb 12, 2012 at 03:21:42PM -0200, Mauro Carvalho Chehab wrote:
> As I said before, there's just one trace call for memory error events 
> (hw_event:mc_error) on my second RFC.

Are you kidding me:

$ grep -EriIno "trace_.*\W" patch01.txt

...

TRACE_EVENT(mc_corrected_error,
TRACE_EVENT(mc_uncorrected_error,
TRACE_EVENT(mc_corrected_error_fbd,
TRACE_EVENT(mc_uncorrected_error_fbd,
TRACE_EVENT(mc_out_of_range,
TRACE_EVENT(mc_corrected_error_no_info,
TRACE_EVENT(mc_uncorrected_error_no_info,

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-12 18:44           ` Borislav Petkov
@ 2012-02-12 19:38             ` Mauro Carvalho Chehab
  2012-02-13  9:21               ` Borislav Petkov
  0 siblings, 1 reply; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-12 19:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Linux Edac Mailing List, Linux Kernel Mailing List

Em 12-02-2012 16:44, Borislav Petkov escreveu:
> On Sun, Feb 12, 2012 at 03:21:42PM -0200, Mauro Carvalho Chehab wrote:
>> As I said before, there's just one trace call for memory error events 
>> (hw_event:mc_error) on my second RFC.
> 
> Are you kidding me:
> 
> $ grep -EriIno "trace_.*\W" patch01.txt
> 
> ...
> 
> TRACE_EVENT(mc_corrected_error,
> TRACE_EVENT(mc_uncorrected_error,
> TRACE_EVENT(mc_corrected_error_fbd,
> TRACE_EVENT(mc_uncorrected_error_fbd,
> TRACE_EVENT(mc_out_of_range,
> TRACE_EVENT(mc_corrected_error_no_info,
> TRACE_EVENT(mc_uncorrected_error_no_info,
> 

Huh?

See PATCH v3 03/31:  hw_event: Consolidate uncorrected/corrected error msgs into one

Those events got merged there into one hardware event and one
software error event generated due to a hardware trouble
(mc_out_of_range). 

This patch:
	[PATCH v3 21/31] hw_event: Add x86 MCE events on it

adds the mc_error_mce variant per your request.

What is there now is:

$ grep TRACE_EVENT include/trace/events/hw_event.h
TRACE_EVENT(mc_error,
TRACE_EVENT(mc_out_of_range,
TRACE_EVENT(mc_error_mce,
TRACE_EVENT(mc_out_of_range_mce,

And what I've already said is that I'll get rid of
"mc_out_of_range_mce" in the final version, and convert
"mc_out_of_range" into a generic event to report that a hardware
error occurred but the driver has a bug and wasn't able to
parse it.

I only added:
	TRACE_EVENT(mc_error_mce,
per your request to have an arch-specific event with both mce-record
data and mc_error information.

For me, only this hardware-error event is needed:
	TRACE_EVENT(mc_error,

Subsequent patches consolidate the drivers so they use just one function
call to the EDAC core to report errors:
	edac_mc_handle_error()

That function increments the error counts, gets the associated
label(s) and generates the event. It currently also prints the error
message, to preserve backward compatibility.

Mauro.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-12 19:38             ` Mauro Carvalho Chehab
@ 2012-02-13  9:21               ` Borislav Petkov
  2012-02-13 10:23                 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 47+ messages in thread
From: Borislav Petkov @ 2012-02-13  9:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List

On Sun, Feb 12, 2012 at 05:38:08PM -0200, Mauro Carvalho Chehab wrote:
> Em 12-02-2012 16:44, Borislav Petkov escreveu:
> > On Sun, Feb 12, 2012 at 03:21:42PM -0200, Mauro Carvalho Chehab wrote:
> >> As I said before, there's just one trace call for memory error events 
> >> (hw_event:mc_error) on my second RFC.
> > 
> > Are you kidding me:
> > 
> > $ grep -EriIno "trace_.*\W" patch01.txt
> > 
> > ...
> > 
> > TRACE_EVENT(mc_corrected_error,
> > TRACE_EVENT(mc_uncorrected_error,
> > TRACE_EVENT(mc_corrected_error_fbd,
> > TRACE_EVENT(mc_uncorrected_error_fbd,
> > TRACE_EVENT(mc_out_of_range,
> > TRACE_EVENT(mc_corrected_error_no_info,
> > TRACE_EVENT(mc_uncorrected_error_no_info,
> > 
> 
> Huh?
> 
> See PATCH v3 03/31:  hw_event: Consolidate uncorrected/corrected error msgs into one
> 
> Those events got merged there into one hardware event and one
> software error event generated due to a hardware trouble
> (mc_out_of_range).

[..]

Right, and what I was suggesting is to introduce a single trace event
and use it everywhere. Instead, you're converting the edac calls into
trace events and then eliminating them, which creates unnecessary noise.

But, nevermind this, I have a better suggestion: instead of you and me
going back and forth needlessly about the trace events, how about you
concentrate on fixing the FBDIMM drivers (and only those) since this is
the main reason for your patchset, as you say, and let me concentrate
on writing the trace event I mean - I'm currently travelling but I'll
try to hack up something in the next couple of days in order to give
you a better idea of what I mean? The edac drivers can use the standard
edac_printk and friends in the meantime and we can convert them later.

Thanks.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
                   ` (32 preceding siblings ...)
  2012-02-10 16:48 ` [PATCH v3 32/31] edac: restore mce.h file Mauro Carvalho Chehab
@ 2012-02-13  9:23 ` Mauro Carvalho Chehab
  33 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-13  9:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List, bp, tony.luck

Em 09-02-2012 22:00, Mauro Carvalho Chehab escreveu:
> This is the third version of HERM patches.
> 

Forgot to mention, but I'm pushing the HERM patches to my edac tree, on a
separate branch:

	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac.git hw_events

There are a few more patches there, as can be seen via gitweb[1], that I'm
writing in order to fix bugs I'm noticing in my tests. I'll be merging/cleaning
the patches for a final version of this patchset after testing on enough
systems to be sure that no regressions were introduced.

[1] http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=shortlog;h=refs/heads/hw_events

Regards,
Mauro

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 01/31] events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  2012-02-13  9:21               ` Borislav Petkov
@ 2012-02-13 10:23                 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 47+ messages in thread
From: Mauro Carvalho Chehab @ 2012-02-13 10:23 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Linux Edac Mailing List, Linux Kernel Mailing List

Em 13-02-2012 07:21, Borislav Petkov escreveu:
> On Sun, Feb 12, 2012 at 05:38:08PM -0200, Mauro Carvalho Chehab wrote:
>> Em 12-02-2012 16:44, Borislav Petkov escreveu:
>>> On Sun, Feb 12, 2012 at 03:21:42PM -0200, Mauro Carvalho Chehab wrote:
>>>> As I said before, there's just one trace call for memory error events 
>>>> (hw_event:mc_error) on my second RFC.
>>>
>>> Are you kidding me:
>>>
>>> $ grep -EriIno "trace_.*\W" patch01.txt
>>>
>>> ...
>>>
>>> TRACE_EVENT(mc_corrected_error,
>>> TRACE_EVENT(mc_uncorrected_error,
>>> TRACE_EVENT(mc_corrected_error_fbd,
>>> TRACE_EVENT(mc_uncorrected_error_fbd,
>>> TRACE_EVENT(mc_out_of_range,
>>> TRACE_EVENT(mc_corrected_error_no_info,
>>> TRACE_EVENT(mc_uncorrected_error_no_info,
>>>
>>
>> Huh?
>>
>> See PATCH v3 03/31:  hw_event: Consolidate uncorrected/corrected error msgs into one
>>
>> Those events got merged there into one hardware event and one
>> software error event generated due to a hardware trouble
>> (mc_out_of_range).
> 
> [..]
> 
> Right, and what I was suggesting is to introduce a single trace event
> and use it everywhere. Instead, you're converting the edac calls into
> trace events and then eliminating them, which creates unnecessary noise.

I did that for a few reasons: to preserve history for those who reviewed
the original patchset, to document why some changes were needed, and to avoid
rebasing my tree. Also, this way it is simpler to change or remove a patch
if needed.

In the final version, I intend to fold some patches, in order to remove
some dirty details that don't need to be preserved in the upstream history.

> But, nevermind this, I have a better suggestion: instead of you and me
> going back and forth needlessly about the trace events, how about you
> concentrate on fixing the FBDIMM drivers (and only those) since this is
> the main reason for your patchset, as you say, and let me concentrate
> on writing the trace event I mean - I'm currently travelling but I'll
> try to hack up something in the next couple of days in order to give
> you a better idea of what I mean? The edac drivers can use the standard
> edac_printk and friends in the meantime and we can convert them later.

The main reason for this patchset is to implement the changes that were
discussed at the EDAC mini-summits held in 2010 [1][2]. The
FB-DIMM fix is one of the issues that I'm addressing [3].

The fixes needed for the FB-DIMM drivers and for the Intel CPU-integrated memory
controllers (Nehalem and Sandy Bridge) are already done. I'm now
focused on testing them on a wide range of machines, in order to be sure that
they won't cause any regressions. I think I'll be able to test them
on almost all x86 machines and on a few PPC ones.

Anyway, I won't be touching the trace events again. So, feel free to
propose what you mean.

It is probably better if you write a tracing patch against my tree
with your view, as it will be easy for us to review and merge it later.
It should also be easier for you to propose it since, in my tree, all drivers
call a single function to report errors:

	edac_mc_handle_error(), defined in drivers/edac/edac_mc.c.

This is the function that calls the defined events, and it replaces all the
previous ones. All drivers were ported to use it in my tree.

So, for example, in amd64_edac[4], an error is reported like this:

edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, src_mci,
			     page, offset, syndrome,
			     csrow, channel, -1,
			     EDAC_MOD_STR, "", NULL);

for the families that don't use MCE for errors, or:

edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, src_mci,
			     page, offset, syndrome,
			     csrow, channel, -1,
			     EDAC_MOD_STR, "", m);

for the ones that do. The last parameter there is arch-dependent.
The EDAC core calls the x86 variant of the trace call, with the MCE
log information, if the parameter is filled in and the machine is x86
(hmm... there's a bug there... it should be testing for CONFIG_X86_MCE
instead of just CONFIG_X86 - I'll add a patch fixing it).
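A sketch of that dispatch with the guard moved to CONFIG_X86_MCE, as proposed above. The enum values stand in for the trace_mc_error()/trace_mc_error_mce() tracepoints that the TRACE_EVENT() definitions generate; pick_trace() itself is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two tracepoints in hw_event.h. */
enum mc_trace { TRACE_MC_ERROR, TRACE_MC_ERROR_MCE };

/*
 * arch_log is the last argument of edac_mc_handle_error(): the MCE
 * record on x86 MCE-based drivers, NULL otherwise. Only kernels built
 * with MCE support can emit the MCE-enriched event.
 */
static enum mc_trace pick_trace(const void *arch_log)
{
#ifdef CONFIG_X86_MCE
	if (arch_log)
		return TRACE_MC_ERROR_MCE;
#else
	(void)arch_log;   /* unused without MCE support */
#endif
	return TRACE_MC_ERROR;
}
```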

The label is decoded using the (csrow, channel, -1) location. In the case
of this driver, only 2 layers are used, so the final number is -1. The
EDAC core will print the error for the label found at the [csrow][cschannel]
location.

An event where the driver can't decode where the error happened is generated
with:

edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
				     page, offset, syndrome,
				     -1, -1, -1,
				     EDAC_MOD_STR,
				     "failed to map error addr to a node",
				     NULL);

In such a case, the location is unknown (all layers were filled with -1), so
the EDAC core will not look up any labels.

For FB-DIMM drivers, where part of the location is not known (for example, UE
errors where the MC can only point to the branch and DIMM slot, but the
channel can't be determined because, in lockstep mode, both channels of
a branch are used for ECC), the driver will call it with something like:

edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, src_mci,
			     page, offset, syndrome,
			     branch, -1, dimm_slot,
			     "memory read error", "some other detail", NULL);

The core should produce a message like:

	EDAC MC0: UE memory read error on DIMM1A or DIMM1B (branch 0 slot 0 page 0xdeadbeef offset 0xdeadbeef grain 8 syndrome 0x0 some other detail)
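A sketch of how a -1 ("unknown") layer can be expanded into the "DIMM1A or DIMM1B" list: walk every DIMM whose position matches all known layers and join their labels. The three-layer geometry, the label table and build_label_list() are hypothetical; only the -1 convention comes from the calls above.

```c
#include <stdio.h>
#include <string.h>

#define N_BRANCH  2
#define N_CHANNEL 2
#define N_SLOT    1

/* Hypothetical silkscreen labels, indexed [branch][channel][slot]. */
static const char *labels[N_BRANCH][N_CHANNEL][N_SLOT] = {
	{ { "DIMM1A" }, { "DIMM1B" } },
	{ { "DIMM2A" }, { "DIMM2B" } },
};

/* A layer matches if it is unknown (-1) or equal to the position. */
static int layer_matches(int want, int pos)
{
	return want < 0 || want == pos;
}

/*
 * Join the labels of all DIMMs matching (branch, channel, slot) with
 * " or ". If every layer is -1, the location is unknown and no label
 * lookup is done. Returns the number of labels found; len must be >= 1.
 */
static int build_label_list(int branch, int channel, int slot,
			    char *buf, size_t len)
{
	int found = 0;

	buf[0] = '\0';
	if (branch < 0 && channel < 0 && slot < 0)
		return 0;

	for (int b = 0; b < N_BRANCH; b++)
		for (int c = 0; c < N_CHANNEL; c++)
			for (int s = 0; s < N_SLOT; s++) {
				if (!layer_matches(branch, b) ||
				    !layer_matches(channel, c) ||
				    !layer_matches(slot, s))
					continue;
				size_t used = strlen(buf);
				snprintf(buf + used, len - used, "%s%s",
					 found ? " or " : "", labels[b][c][s]);
				found++;
			}
	return found;
}
```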


[1] http://lwn.net/Articles/388292/
[2] http://lwn.net/Articles/416669/
[3] As you can see, changing the EDAC core so that it doesn't force a csrow/channel
    hierarchy is indeed the hardest challenge that this patchset addresses.
    While I'm making a big effort to touch the drivers as little as possible, all drivers
    need to be converted to use the new function prototypes, and to properly describe
    which memory hierarchy is used there. For example, these are the changes made
    to amd64_edac:

    http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=history;f=drivers/edac/amd64_edac.c;h=aa7ecbb48777f7a27ff86c87772facab51f40663;hb=refs/heads/hw_events

    I'll likely be testing my patches on amd64 tomorrow, to be sure that no
    regressions were introduced.

[4] The changes to the amd64_edac error logic that switch it to the new
    reporting scheme are in these patches:

	http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=commitdiff;h=78f9d383a1ab40352c3eb3cf84a7ad93c19652bc
	http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=commitdiff;h=60dae3534f9f3c8408e1e9016e815e9b06d53a2f



* RE: [PATCH v3 00/31] Hardware Events Report Mecanism (HERM)
  2012-02-12 17:10       ` Mauro Carvalho Chehab
@ 2012-02-13 21:29         ` Luck, Tony
  0 siblings, 0 replies; 47+ messages in thread
From: Luck, Tony @ 2012-02-13 21:29 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: Linux Edac Mailing List, Linux Kernel Mailing List

> For CE, it is able to detect which DIMM has the problem, but, when lockstep is
> enabled, an Uncorrected Error would point to the two channels at the affected
> branch. Also, when memory mirroring is enabled, there are 4 DIMMs associated with
> the same 128-bit memory address. Any one of those memories could be affected by
> the error. Only the Sandy Bridge driver handles memory mirroring, and I'll need
> to add some extra logic to the location detection algorithm in order to work
> with it (it is currently on my TODO list).

This looks like a hard problem to solve. In mirror mode, memory writes go to
both sides, while reads come from the master. To determine the location of the
error, you'd need to know which set was the master at the time the error was
detected. Note that this can change (and since we know we just had an error, the
master might have been switched because of the error we just saw). We don't have
any visibility from the OS into what the BIOS is doing with mirrors.

-Tony



Thread overview: 47+ messages
2012-02-10  0:00 [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 01/31] events/hw_event: Create a " Mauro Carvalho Chehab
2012-02-10 13:41   ` Borislav Petkov
2012-02-10 14:17     ` Mauro Carvalho Chehab
2012-02-12 12:48       ` Borislav Petkov
2012-02-12 17:21         ` Mauro Carvalho Chehab
2012-02-12 18:44           ` Borislav Petkov
2012-02-12 19:38             ` Mauro Carvalho Chehab
2012-02-13  9:21               ` Borislav Petkov
2012-02-13 10:23                 ` Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 02/31] events/hw_event: use __string() trace macros for events Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 03/31] hw_event: Consolidate uncorrected/corrected error msgs into one Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 04/31] drivers/edac: rename channel_info to csrow_channel_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 05/31] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 06/31] edac: Add per dimm's sysfs nodes Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 07/31] edac: Prepare to push down to drivers the filling of the dimm_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 08/31] edac: Better describe the memory concepts The memory terms changed along the time, since when EDAC were originally written: new concepts were introduced, and some things have different meanings, depending on the memory architecture. Better define those terms, and better describe each supported memory type Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 09/31] i5400_edac: Convert it to report memory with the new location Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 10/31] i7300_edac: " Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 11/31] edac: move dimm properties to struct dimm_info Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 12/31] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 13/31] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 14/31] edac: Add per-dimm sysfs show nodes Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 15/31] edac: DIMM location cleanup Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 16/31] edac/ppc4xx_edac: Fix compilation Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 17/31] edac-mc: Allow reporting errors on a non-csrow oriented way Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 18/31] edac.h: Use kernel-doc-nano-HOWTO.txt notation for enums Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 19/31] edac: rework memory layer hierarchy description Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 20/31] edac: Export MC hierarchy counters for CE and UE Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 21/31] hw_event: Add x86 MCE events on it Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 22/31] amd64_edac: convert it to use the MCE log tracepoint where applicable Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 23/31] edac: Simplify logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 24/31] edac_mc: Some clenups at the log message Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 25/31] edac: Add a sysfs node to test the EDAC error report facility Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 26/31] edac_mc: Fix the enable label filter logic Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 27/31] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 28/31] edac: don't OOPS if the csrow is not visible Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 29/31] edac: Fix sysfs csrow?/*ce*count counters Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 30/31] edac: Fix new error counts Mauro Carvalho Chehab
2012-02-10  0:01 ` [PATCH v3 31/31] edac: Fix per layer error count counters Mauro Carvalho Chehab
2012-02-10 13:26 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Borislav Petkov
2012-02-10 16:39   ` Mauro Carvalho Chehab
2012-02-12 12:08     ` Borislav Petkov
2012-02-12 17:10       ` Mauro Carvalho Chehab
2012-02-13 21:29         ` Luck, Tony
2012-02-10 16:48 ` [PATCH v3 32/31] edac: restore mce.h file Mauro Carvalho Chehab
2012-02-13  9:23 ` [PATCH v3 00/31] Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
