All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Maciej W. Rozycki" <macro@linux-mips.org>
To: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, Andi Kleen <andi@firstfloor.org>
Subject: Re: [RFC 3/6] x86, NMI, Rename memory parity error to PCI SERR error
Date: Wed, 22 Sep 2010 00:04:30 +0100 (BST)	[thread overview]
Message-ID: <alpine.LFD.2.00.1009212333280.6717@eddie.linux-mips.org> (raw)
In-Reply-To: <1284087065-32722-3-git-send-email-ying.huang@intel.com>

On Fri, 10 Sep 2010, Huang Ying wrote:

> memory parity error is only valid for IBM PC-AT, newer machine use 7
> bit (0x80) of 0x61 port for PCI SERR. While memory error is usually
> reported via MCE. So corresponding function name and kernel log string
> is changed.

 Things perhaps changed over the last few years while I have not been 
watching, but for many years the bit #7 of the NMI status port 
(implemented by the southbridge at 0x61 in the port I/O space) was still 
used for memory parity or ECC errors even after the original IBM PC/AT.  
The usual arrangement was in the event of a memory error the memory 
controller in the northbridge would assert the chip's PCI SERR output 
line, which in turn would be trapped by the southbridge and converted to 
an NMI event while setting the said bit in the NMI status port.  See e.g. 
the 82349HX System Controller datasheet (Intel document number 290551).

 So the name of the error reported is not that unjustified except, of 
course, to be precise the handler would have to scan the state of the SERR 
output reported by all the PCI devices in the PCI configuration space to 
find the originator and then interpret the event accordingly.  Which 
obviously means the only piece of code that could exactly know what the 
reason was is the respective device driver as causes of SERR are 
device-specific and may require processing of device-specific registers to 
determine the cause and/or recover (a device reset may be required in some 
cases).

 Of course using the MCE seems natural and better, especially if the 
exception can be raised synchronously and stop the failing memory load CPU 
instruction from completion -- this is important for parity and MBE ECC 
errors, where in some cases the handler may be able to retry the failing 
operation having refreshed RAM from the backing store or otherwise the 
affected process must be killed (unless a kernel memory location is 
involved that is, where the whole system has to be brought down).

 OTOH, for CPU stores and DMA transactions the event will always be 
asynchronous and an NMI might be a better option, as in the case of parity 
and MBE ECC errors the whole system will probably have to be brought down, 
and with SBE ECC errors scrubbing can be done at any time and otherwise 
(except from logging and/or marking the physical page bad, as required) no 
action is needed.

  Maciej

  parent reply	other threads:[~2010-09-21 23:04 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-10  2:51 [RFC 1/6] x86, NMI, Add symbol definition for NMI magic constants Huang Ying
2010-09-10  2:51 ` [RFC 2/6] x86, NMI, Add touch_nmi_watchdog to io_check_error delay Huang Ying
2010-09-10  2:51 ` [RFC 3/6] x86, NMI, Rename memory parity error to PCI SERR error Huang Ying
2010-09-13  1:02   ` Robert Richter
2010-09-13  2:02     ` Huang Ying
2010-09-16  8:18       ` Robert Richter
2010-09-17  0:08         ` Huang Ying
2010-09-17  9:14           ` Robert Richter
2010-09-19  0:20             ` Huang Ying
2010-09-20  8:00               ` Robert Richter
2010-09-20 12:59                 ` Borislav Petkov
2010-09-21  0:22                   ` Huang Ying
2010-09-21  6:37                     ` Borislav Petkov
2010-09-21 14:08                       ` Doug Thompson
2010-09-21 23:04   ` Maciej W. Rozycki [this message]
2010-09-23  5:37     ` huang ying
2010-09-29  0:26       ` Maciej W. Rozycki
2010-09-10  2:51 ` [RFC 4/6] x86, NMI, Rewrite NMI handler Huang Ying
2010-09-10 15:56   ` Don Zickus
2010-09-10 16:03     ` Andi Kleen
2010-09-10 18:29       ` Don Zickus
2010-09-13  2:09         ` Huang Ying
2010-09-13 14:04           ` Don Zickus
2010-09-14  5:12             ` Huang Ying
2010-09-14 13:37               ` Don Zickus
2010-09-13  1:16   ` Robert Richter
2010-09-10  2:51 ` [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI Huang Ying
2010-09-10 16:02   ` Don Zickus
2010-09-10 16:19     ` Andi Kleen
2010-09-10 18:40       ` Don Zickus
2010-09-13  2:19         ` Huang Ying
2010-09-13 14:11           ` Don Zickus
2010-09-13 15:24             ` Andi Kleen
2010-09-13 15:47               ` Don Zickus
2010-09-13 16:57                 ` Andi Kleen
2010-09-13 17:53                   ` Don Zickus
2010-09-13 18:07                     ` Andi Kleen
2010-09-13 18:23                       ` Don Zickus
2010-09-13 18:36                         ` Andi Kleen
2010-09-13 19:36                           ` Don Zickus
2010-09-13 20:49                             ` Andi Kleen
2010-09-13 21:25                               ` Valdis.Kletnieks
2010-09-14  7:48                                 ` Andi Kleen
2010-09-14 17:54                                   ` Valdis.Kletnieks
2010-09-14 12:21                             ` Ingo Molnar
2010-09-14 13:45                               ` Don Zickus
2010-09-14 19:34                               ` Cyrill Gorcunov
2010-09-15  9:29                                 ` Ingo Molnar
2010-09-10  2:51 ` [RFC 6/6] x86, NMI, Remove do_nmi_callback logic Huang Ying
2010-09-10 16:13   ` Don Zickus
2010-09-13  2:27     ` Huang Ying
2010-09-13  6:24       ` Ingo Molnar
2010-09-10 20:37 ` [RFC 1/6] x86, NMI, Add symbol definition for NMI magic constants Peter Zijlstra
2010-09-10 22:58   ` H. Peter Anvin
2010-09-11  8:50   ` Andi Kleen
2010-09-13  1:30     ` Robert Richter
2010-09-21 21:48 ` Don Zickus
2010-09-21 22:19   ` Andi Kleen
2010-09-22 16:07     ` Don Zickus
2010-09-23  9:29       ` huang ying
2010-09-23 14:16         ` Don Zickus
2010-09-24 11:50           ` huang ying
2010-09-24 14:29             ` Don Zickus
2010-09-23  9:51   ` huang ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.1009212333280.6717@eddie.linux-mips.org \
    --to=macro@linux-mips.org \
    --cc=andi@firstfloor.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.