From: "Wang, Zhenyu" <zhenyu.z.wang@intel.com>
To: Russ Anderson <rja@sgi.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RCF] Linux memory error handling
Date: Thu, 16 Jun 2005 10:54:14 +0800
Message-ID: <20050616025414.GA14764@zhen-devel.sh.intel.com>
In-Reply-To: <200506151430.j5FEUD7J1393603@clink.americas.sgi.com>

On 2005.06.15 09:30:13 +0000, Russ Anderson wrote:
> 		[RCF] Linux memory error handling.
> 
> Summary: One of the most common hardware failures in a computer 
> 	is a memory failure.   There have been efforts in various
> 	architectures to support recovery from memory errors.  This
> 	is an attempt to define a common support infrastructure
> 	in Linux to support memory error handling.
> 
> Background:  There has been considerable work on recovering from
> 	Machine Check Aborts (MCAs) in arch/ia64.  One result is
> 	that many memory errors encountered by user applications
> 	no longer cause a kernel panic.  The application is 
> 	terminated, but linux and other applications keep running.
> 	Additional improvements are becoming dependent on mainline
> 	linux support.  That requires involvement of lkml, not
> 	just linux-ia64.

Good RFC! On x86 there is already 'bluesmoke' - http://bluesmoke.sf.net - which does
some simple memory ECC error handling. It is inspired by the old linux-ecc project.
Its current capability is limited to detecting and reporting errors, with configurable
polling and an optional panic on UE (uncorrectable error).

Bluesmoke contains a driver core that hosts the information for each memory
controller, such as DIMM info; currently polling is the only method used for registered
controllers. Everything else is chipset-specific drivers, which are mostly platform
dependent, e.g. e7520, 82875P, etc. Those platforms have also been tested, and bluesmoke's
web page describes some test methods if you really want to try.
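
To make the split between the core and the chipset drivers concrete, here is a rough
userspace sketch of that shape. It is not bluesmoke's actual code or API: every name
in it (struct mem_ctl, core_register, core_poll_once, panic_on_ue, fake_poll) is
made up for illustration, and the "panic" decision is only returned to the caller
rather than acted on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CONTROLLERS 8

/* Error counts a chipset driver reports for one poll of its controller. */
struct mem_err_counts {
    unsigned int ce;   /* correctable errors */
    unsigned int ue;   /* uncorrectable errors */
};

/* One registered memory controller (hypothetical layout, not bluesmoke's). */
struct mem_ctl {
    const char *name;  /* chipset name, e.g. "e7520" */
    int nr_dimms;      /* DIMM info hosted by the core */
    /* Chipset-specific hook: read and clear the error registers. */
    void (*poll)(struct mem_ctl *mc, struct mem_err_counts *out);
    unsigned long ce_total;
    unsigned long ue_total;
    void *priv;        /* driver-private data */
};

static struct mem_ctl *controllers[MAX_CONTROLLERS];
static size_t nr_controllers;
static int panic_on_ue;   /* assumed config knob: 1 = request a panic on UE */

/* Register a controller with the core; returns 0 on success, -1 on failure. */
int core_register(struct mem_ctl *mc)
{
    if (mc == NULL || mc->poll == NULL || nr_controllers == MAX_CONTROLLERS)
        return -1;
    controllers[nr_controllers++] = mc;
    return 0;
}

/*
 * Poll every registered controller once and report what was found.
 * Returns 1 if a UE was seen and panic_on_ue is set, otherwise 0.
 */
int core_poll_once(void)
{
    int want_panic = 0;

    for (size_t i = 0; i < nr_controllers; i++) {
        struct mem_ctl *mc = controllers[i];
        struct mem_err_counts c = { 0, 0 };

        assert(mc->poll != NULL);   /* guaranteed by core_register() */
        mc->poll(mc, &c);
        mc->ce_total += c.ce;
        mc->ue_total += c.ue;

        if (c.ce)
            printf("%s: %u correctable error(s)\n", mc->name, c.ce);
        if (c.ue) {
            fprintf(stderr, "%s: %u uncorrectable error(s)\n", mc->name, c.ue);
            if (panic_on_ue)
                want_panic = 1;
        }
    }
    return want_panic;
}

/* Stand-in for a chipset driver: priv points at pending counts to report. */
void fake_poll(struct mem_ctl *mc, struct mem_err_counts *out)
{
    struct mem_err_counts *pending = mc->priv;

    *out = *pending;
    pending->ce = 0;   /* mimic read-and-clear error registers */
    pending->ue = 0;
}
```

A real chipset driver would replace fake_poll with code that reads the controller's
error registers, and the core would call core_poll_once from a periodic timer.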

NMI handling is still in progress. Dave and Corey's patch is on the sourceforge page, and at

    http://lkml.org/lkml/2004/8/19/140
    http://lkml.org/lkml/2005/3/21/11

Those NMI callbacks have not been added to the chipset drivers yet, and some initial
testing failed; I still don't know why...

thanks
-zhen

