From: "Wang, Zhenyu" <zhenyu.z.wang@intel.com>
To: Russ Anderson <rja@sgi.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RCF] Linux memory error handling
Date: Thu, 16 Jun 2005 10:54:14 +0800
Message-ID: <20050616025414.GA14764@zhen-devel.sh.intel.com>
In-Reply-To: <200506151430.j5FEUD7J1393603@clink.americas.sgi.com>
On 2005.06.15 09:30:13 +0000, Russ Anderson wrote:
> [RCF] Linux memory error handling.
>
> Summary: One of the most common hardware failures in a computer
> is a memory failure. There have been efforts in various
> architectures to support recovery from memory errors. This
> is an attempt to define a common support infrastructure
> in Linux to support memory error handling.
>
> Background: There has been considerable work on recovering from
> Machine Check Aborts (MCAs) in arch/ia64. One result is
> that many memory errors encountered by user applications
> no longer cause a kernel panic. The application is
> terminated, but Linux and other applications keep running.
> Additional improvements are becoming dependent on mainline
> linux support. That requires involvement of lkml, not
> just linux-ia64.
Good RFC! Actually, on x86, 'bluesmoke' (http://bluesmoke.sf.net) is already out
there for some simple memory ECC error handling. It was inspired by the old linux-ecc
project. Its current capability is limited to detecting and reporting errors, with
configurable polling and an optional panic on UE (uncorrectable error).
Bluesmoke contains a driver core that hosts the information for each memory
controller, such as DIMM info; currently only the polling method is used for registered
controllers. Everything else is chipset-specific drivers, which are mostly platform
dependent, e.g. e7520, 82875P, etc. Those platforms have been tested, and bluesmoke's
webpage describes some test methods if you really want to try.
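To make the core/chipset split above concrete, here is a rough user-space model in
Python. It is only a sketch of the idea, not bluesmoke code: the names MemController,
Core, register, poll_once and UncorrectableError are all made up for illustration, and
the "chipset driver" is just a callback returning error counts.

```python
# User-space model of a driver core that hosts per-controller info and polls
# registered controllers. All names are hypothetical, not taken from bluesmoke.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class UncorrectableError(RuntimeError):
    """Stands in for a kernel panic on an uncorrectable error (UE)."""


@dataclass
class MemController:
    name: str                           # chipset name, e.g. "e7520"
    dimms: List[str]                    # DIMM labels kept by the core
    # Chipset-specific hook: returns (corrected, uncorrectable) error counts
    # observed since the previous poll.
    check: Callable[[], Tuple[int, int]]
    ce_count: int = 0                   # running total of corrected errors
    ue_count: int = 0                   # running total of uncorrectable errors


class Core:
    def __init__(self, panic_on_ue: bool = False) -> None:
        self.panic_on_ue = panic_on_ue
        self.controllers: Dict[str, MemController] = {}
        self.log: List[str] = []

    def register(self, mc: MemController) -> None:
        """Add a chipset driver's controller to the set that gets polled."""
        if mc.name in self.controllers:
            raise ValueError(f"controller {mc.name!r} already registered")
        self.controllers[mc.name] = mc

    def poll_once(self) -> None:
        """Poll every registered controller once and report what it found."""
        for mc in self.controllers.values():
            ce, ue = mc.check()
            mc.ce_count += ce
            mc.ue_count += ue
            if ce:
                self.log.append(f"{mc.name}: {ce} corrected error(s)")
            if ue:
                self.log.append(f"{mc.name}: {ue} uncorrectable error(s)")
                if self.panic_on_ue:
                    raise UncorrectableError(f"UE on {mc.name}")


if __name__ == "__main__":
    core = Core(panic_on_ue=False)
    # A fake chipset driver that always reports one corrected error per poll.
    core.register(MemController("fake0", ["DIMM0", "DIMM1"], lambda: (1, 0)))
    core.poll_once()
    print("\n".join(core.log))
```

In this model the core owns the controller list and the reporting policy, while each
chipset driver only supplies its check hook. A real driver would call poll_once from a
timer, and an NMI path would presumably feed the same reporting code.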
NMI handling is still in progress. Dave and Corey's patches are on the sourceforge page,
and at:
http://lkml.org/lkml/2004/8/19/140
http://lkml.org/lkml/2005/3/21/11
Those NMI callbacks have not been added to the chipset drivers yet, and some initial
testing failed; we still don't know why...
thanks
-zhen