linux-edac.vger.kernel.org archive mirror
From: James Morse <james.morse@arm.com>
To: "Hawa, Hanna" <hhhawa@amazon.com>
Cc: robh+dt@kernel.org, mark.rutland@arm.com, bp@alien8.de,
	mchehab@kernel.org, davem@davemloft.net,
	gregkh@linuxfoundation.org, nicolas.ferre@microchip.com,
	paulmck@linux.ibm.com, dwmw@amazon.co.uk, benh@amazon.com,
	ronenk@amazon.com, talel@amazon.com, jonnyc@amazon.com,
	hanochu@amazon.com, linux-edac@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC
Date: Wed, 19 Jun 2019 18:22:37 +0100
Message-ID: <44da6863-eb79-a61b-a4bf-9e8c6cacc2b8@arm.com>
In-Reply-To: <bbb9b41d-8ffa-d4c5-c199-2400695cce8d@amazon.com>

Hi Hawa,

On 17/06/2019 14:00, Hawa, Hanna wrote:
>> I don't think it can, on a second reading, it looks to be even more complicated than I
>> thought! That bit is described as disabling forwarding of uncorrected data, but it looks
>> like the uncorrected data never actually reaches the other end. (I'm unsure what 'flush'
>> means in this context.)
>> I was looking for reasons you could 'know' that any reported error was corrected. This was
>> just a bad suggestion!

> Is there interrupt for un-correctable error?

The answer here is somewhere between 'not really' and 'maybe'.
There is a signal you may have wired up as an interrupt, but it's not usable from Linux.

A.8.2 "Asynchronous error signals" of the A57 TRM [0] has:
| nINTERRIRQ output Error indicator for an L2 RAM double-bit ECC error.
("7.6 Asynchronous errors" has more on this).

Errors cause L2ECTLR[30] to get set, and this value is output as a signal; you may have
wired it up as an interrupt.

If you did, beware it's level-sensitive, and can only be cleared by writing to L2ECTLR_EL1.
You shouldn't allow Linux to access this register as it could mess with the L2
configuration, which could also affect your EL3 and any secure-world software.

The arrival of this interrupt doesn't tell you which L2 tripped the error, and you can
only clear it if you write to L2ECTLR_EL1 on a CPU attached to the right L2. So this isn't
actually a shared (peripheral) interrupt.

This stuff is expected to be used by firmware, which can know the affinity constraints of
signals coming in as interrupts.


> Does 'asynchronous errors' in L2 used to report UE?

According to "7.2.4 Error correction code", single-bit errors are always corrected.
A.8.2 quoted above gives the behaviour for double-bit errors.


> In case no interrupt, can we use die-notifier subsystem to check if any error had occur
> while system shutdown?

notify_die() would imply a synchronous exception that killed a thread. SErrors are a whole
lot worse. Before v8.2 these are all treated as 'uncontained': unknown memory corruption,
which in your L2 case is exactly what happened. The arch code will panic().

If your driver can print something useful to help debug the panic(), then a panic_notifier
sounds appropriate. But you can't rely on these notifiers being called, as kdump has some
hooks that affect if/when they run.

(KVM will 'contain' SErrors that come from a guest to the guest, as we know a distinct set
of memory was in use. You may see fatal error counters increasing without the system
panic()ing.)

'Contained' and 'uncontained' are terminology from the v8.2 RAS spec [1].


Thanks,

James


[0]
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
[1]
https://static.docs.arm.com/ddi0587/ca/ARM_DDI_0587C_a_RAS.pdf?_ga=2.148234679.1686960568.1560964184-897392434.1556719556
