From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-acpi@vger.kernel.org, Huang Ying <ying.huang@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH EDAC 07/13] edac: add support for raw error reports
Date: Mon, 18 Feb 2013 12:24:29 -0300
Message-ID: <20130218122429.239584aa@redhat.com>
In-Reply-To: <20130218135251.GC16622@pd.tnic>

On Mon, 18 Feb 2013 14:52:51 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Sun, Feb 17, 2013 at 07:44:04AM -0300, Mauro Carvalho Chehab wrote:
> > We could do it for the location. The space for label, however, depends on
> > how many DIMMs are in the system, as multiple dimm's may be present, and
> > the core will point to all possible affected DIMMs.
> > 
> > Ok, perhaps we could just allocate one big area for it (like one page), 
> > as this would very likely be enough for it, and change the logic to take
> > the buffer size into account when filling it.
> 
> Or, in the case where ->label is all dimms on the mci, you simply put
> "All DIMMs on MCI%d" in there and done. Simple.

The core already does this when it has no clue at all about where the
error is.

The core is prepared for the case where the location is only partially
filled, as this is a common scenario in the drivers, and important enough
on some memory controllers.

As already discussed, on most current memory controllers, the controller
can't point to a single DIMM, as the error correction code covers
128 bits (2 DIMMs). It is impossible for the error correction code to
determine on which DIMM an uncorrected error happened[1].

With Nehalem memory controllers, depending on the memory configuration,
the minimal DIMM granularity for an uncorrected error can be even worse:
4 DIMMs, if both the 128-bit error correction code and mirror mode are
enabled.
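
As a rough illustration of the arithmetic above, here is a minimal sketch.
It assumes each DIMM contributes 64 data bits to the ECC word and that
mirroring doubles the set of candidate DIMMs; ue_dimm_granularity() and
DIMM_DATA_BITS are hypothetical names, not part of the EDAC core:

```c
#include <stdbool.h>

/* Assumption: each DIMM contributes 64 data bits to the ECC word. */
#define DIMM_DATA_BITS 64

/*
 * Hypothetical helper (not an EDAC core function): smallest number of
 * DIMMs an uncorrected error can be narrowed down to.
 */
static unsigned int ue_dimm_granularity(unsigned int ecc_data_bits,
					bool mirrored)
{
	unsigned int dimms = ecc_data_bits / DIMM_DATA_BITS;

	if (dimms == 0)
		dimms = 1;	/* an error always involves at least one DIMM */

	/* With mirroring, the mirror copies are candidates as well. */
	return mirrored ? dimms * 2 : dimms;
}
```

With a 128-bit code this gives 2 DIMMs, and 4 DIMMs once mirror mode is
also enabled, matching the numbers above.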

There are some corner cases where the driver simply cannot discover on
which channel, or on which DIMM (or csrow) inside a channel, the error
happened. The error could be associated with some failure in the logic
or on the bus that communicates with the Advanced Memory Buffers on an
FB-DIMM memory controller, for example.

So, the core's real worst-case scenario is when the driver can't
determine on which DIMM inside a channel the error happened. As a channel
can have a large number of DIMMs[2], the area allocated for the label
should be conservative (16? I'm not sure what the worst case is).
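
To make "take the buffer size into account when filling it" concrete,
here is a minimal userspace sketch. It is not the EDAC core code:
label_append() is a hypothetical helper, and the " or " separator is an
assumption made for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append one DIMM label to a fixed-size,
 * NUL-terminated buffer, never writing past 'size' bytes.
 * Returns 0 on success, or -1 if the label does not fit, in which
 * case the buffer keeps its previous content.
 */
static int label_append(char *buf, size_t size, const char *label)
{
	size_t used = strlen(buf);
	int n;

	if (used >= size)
		return -1;

	/* Separate labels with " or " (separator is an assumption). */
	n = snprintf(buf + used, size - used, "%s%s",
		     used ? " or " : "", label);
	if (n < 0 || (size_t)n >= size - used) {
		buf[used] = '\0';	/* roll back the partial write */
		return -1;
	}
	return 0;
}
```

A caller would start from an empty string and stop adding labels as soon
as label_append() returns -1, so a single preallocated area of any size
is filled safely no matter how many DIMMs are affected.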

[1] Such an error might not even be fatal, if that particular address is
unused.

[2] Currently, up to 8, according to:
	$ for i in $(git grep "layers.*size\s*=" drivers/edac|perl -ne 'print "$1 " if (m/\=\s*([A-Z][^\s]+);/);'); do echo $i; git grep $i drivers/edac; done|grep define|perl -ne 'print "$1 " if (m/define\s+[^\s]+\s(\d+)/)'
	8 8 2 2 4 2 3 3 3 8 4 4 3 3 1 1 4 

and
	$ git grep "layers.*size\s*=" drivers/edac|perl -ne 'print "$1 " if (m/\=\s*(\d+);/);'
	1 1 1 1 2 2 8 4 1 1 1 1 

Nothing prevents a future driver from having more than 8 DIMMs per
layer.

-- 

Cheers,
Mauro
