linux-kernel.vger.kernel.org archive mirror
From: ebiederman@lnxi.com (Eric W. Biederman)
To: Dave Peterson <dsp@llnl.gov>
Cc: Gunther Mayer <gunther.mayer@gmx.net>,
	"bluesmoke-devel@lists.sourceforge.net" 
	<bluesmoke-devel@lists.sourceforge.net>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: noisy edac
Date: Mon, 30 Jan 2006 20:22:42 -0700
Message-ID: <m3zmldjd31.fsf@maxwell.lnxi.com>
In-Reply-To: <200601301653.15984.dsp@llnl.gov> (Dave Peterson's message of "Mon, 30 Jan 2006 16:53:15 -0800")

Dave Peterson <dsp@llnl.gov> writes:

> On Monday 30 January 2006 16:02, Gunther Mayer wrote:
>> Just printk() the exact driver specific low-level error, even if non-fatal.
>>
>> Single non-fatal errors just show your system recovers correctly.
>>
>> Multiple (e.g. noisy) non-fatal are either an indication of a serious
>> problem
>>   (e.g. after how many corrected ECC errors on the same address in which
>>     time interval will you replace your dimm? How many S-ATA CRC-errors
>>      will indicate marginal bad cabling? )
>> or it shows the problem needs to be root analyzed. But don't disable the
>> messages as this will only hide the real problem.
>>
>> Concerning Non-Fatal PCI Express errors, the error cause registers need
>> to be printed in case of error, too (see Intel Chipset Specifications)
>
> I agree that in general, the kernel should not be silent when errors are
> detected.  However, for a particular type of error, it may be that the
> user is aware of the error (because (s)he has seen the messages), the user
> has determined the root cause, and it turns out that the error is benign.
> Therefore the user may want to suppress further messages of this type to
> avoid cluttering the logs.  If you don't provide that option to the user,
> then it can be viewed as hardcoding a certain aspect of sysadmin policy
> into the kernel.
>
> Depending on the particular type of error, it may be appropriate to just
> offer the user two options: either printk() or be silent.  For other types
> of errors, it may make sense to give the user more than two options (for
> instance ignore, printk(), or panic()).  I think developers of chipset
> drivers can make this decision individually for each type of error.
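Dave's suggestion of per-error-type options (ignore, printk(), or panic()) can be sketched as a small policy table consulted before reporting. This is a hypothetical userspace illustration: the enum names, the `policy[]` table and `report_error()` are invented for the example and are not part of any EDAC driver. In a real driver the table entries would be set from a sysadmin-facing knob such as a module parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reporting policies, one per error type. */
enum err_policy { POLICY_IGNORE, POLICY_LOG, POLICY_PANIC };

/* Hypothetical error types a chipset driver might distinguish. */
enum err_type { ERR_CE, ERR_UE, ERR_PCI_PARITY, ERR_TYPE_COUNT };

/* Defaults chosen by the driver author; the sysadmin may override them. */
static enum err_policy policy[ERR_TYPE_COUNT] = {
	[ERR_CE]         = POLICY_LOG,
	[ERR_UE]         = POLICY_PANIC,
	[ERR_PCI_PARITY] = POLICY_LOG,
};

/*
 * Report one error according to the policy for its type.
 * Returns 1 if a message was logged, 0 if it was suppressed.
 * POLICY_PANIC stands in for panic() by calling abort().
 */
static int report_error(enum err_type type, const char *msg)
{
	assert(type < ERR_TYPE_COUNT);

	switch (policy[type]) {
	case POLICY_IGNORE:
		return 0;
	case POLICY_LOG:
		printf("EDAC: %s\n", msg);
		return 1;
	case POLICY_PANIC:
	default:
		fprintf(stderr, "EDAC panic: %s\n", msg);
		abort();
	}
}
```

Setting `policy[ERR_CE] = POLICY_IGNORE` models a user who has root-caused a benign, recurring corrected error and wants it out of the logs, while uncorrected errors keep their stricter default.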

One piece missing from this conversation is that we need errors reported
in a uniform format.  That is why edac_mc has helper functions.

However there will always be errors that don't fit any particular model.
Could we add an edac_printk(dev, ...) that is similar to dev_printk() but
prints an EDAC header and the device on which the error was found,
leaving the rest of the string user specified?
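A rough userspace sketch of the proposed helper, assuming a fixed "EDAC <device>: " header followed by a caller-supplied printf-style message. `struct device` here is a stand-in holding only a name, and this `edac_printk()` signature is hypothetical: it formats into a caller buffer (so the result can be inspected) instead of calling printk(), and takes no log level.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's struct device; only the name is used. */
struct device {
	const char *name;
};

/*
 * Hypothetical edac_printk(): write "EDAC <dev>: " followed by the
 * caller's formatted message into buf (at most len bytes, always
 * NUL-terminated). Returns the length of the string actually stored.
 */
static int edac_printk(char *buf, size_t len, const struct device *dev,
		       const char *fmt, ...)
{
	va_list ap;
	size_t used;

	assert(buf != NULL && len > 0 && dev != NULL && fmt != NULL);

	/* Fixed header, so every EDAC message is recognizable in the logs. */
	if (snprintf(buf, len, "EDAC %s: ", dev->name) < 0)
		return -1;
	used = strlen(buf);	/* header length after any truncation */

	/* Free-form, driver-specific remainder of the message. */
	va_start(ap, fmt);
	vsnprintf(buf + used, len - used, fmt, ap);
	va_end(ap);

	return (int)strlen(buf);
}
```

For example, a memory controller named `mc0` reporting a corrected error with `edac_printk(buf, sizeof buf, &dev, "CE page 0x%lx", 0x1234UL)` yields `EDAC mc0: CE page 0x1234`. Only the header is fixed; everything after the colon is whatever the driver wants to say.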

For actual control that interface may be too blunt, but at least for people
looking in the logs it allows all of the errors to be detected and harvested.
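To illustrate the harvesting side, here is a minimal sketch of a log scanner that recognizes lines carrying the hypothetical "EDAC <device>: " header and extracts the device name. It assumes the header may appear anywhere in the line, since syslog typically prepends a timestamp and hostname; `edac_parse()` is an invented name, not an existing tool.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical parser for lines in the "EDAC <dev>: <msg>" format.
 * On a match, copies the device name into dev (truncated to fit
 * devlen - 1 characters) and returns 1; otherwise returns 0.
 */
static int edac_parse(const char *line, char *dev, size_t devlen)
{
	const char *p = strstr(line, "EDAC ");
	const char *colon;
	size_t n;

	assert(devlen > 0);
	if (p == NULL)
		return 0;		/* not an EDAC message */

	p += strlen("EDAC ");
	colon = strchr(p, ':');
	if (colon == NULL || colon == p)
		return 0;		/* header present but no device name */

	n = (size_t)(colon - p);
	if (n >= devlen)
		n = devlen - 1;		/* truncate overly long names */
	memcpy(dev, p, n);
	dev[n] = '\0';
	return 1;
}
```

A harvesting tool could feed each line read with fgets() from the system log through `edac_parse()` and keep per-device counts, which is the kind of data needed to decide when a DIMM producing repeated corrected errors should be replaced.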

Eric

Thread overview: 19+ messages
2006-01-27  1:41 noisy edac Dave Jones
2006-01-29 21:52 ` Alan Cox
2006-01-29 23:42   ` Dave Jones
2006-01-30 18:59     ` Doug Thompson
2006-01-30 19:58       ` Dave Peterson
2006-01-30 21:04         ` doug thompson
2006-01-30 22:24           ` Dave Peterson
2006-01-30 23:44             ` Gunther Mayer
2006-01-30 23:52               ` Dave Peterson
2006-01-31  0:02                 ` Gunther Mayer
2006-01-31  0:32                   ` doug thompson
2006-01-31  4:09                     ` Dave Peterson
2006-01-31  0:53                   ` Dave Peterson
2006-01-31  3:22                     ` Eric W. Biederman [this message]
2006-01-31  4:15                       ` Dave Peterson
2006-01-31 16:34                         ` doug thompson
2006-02-02  3:16                       ` [PATCH] EDAC printk() cleanup Dave Peterson
2006-02-02 16:16                         ` doug thompson
2006-01-31  3:28       ` noisy edac Eric W. Biederman
