From: "Hawa, Hanna" <hhhawa@amazon.com>
To: James Morse <james.morse@arm.com>
Cc: <robh+dt@kernel.org>, <mark.rutland@arm.com>, <bp@alien8.de>,
<mchehab@kernel.org>, <davem@davemloft.net>,
<gregkh@linuxfoundation.org>, <nicolas.ferre@microchip.com>,
<paulmck@linux.ibm.com>, <dwmw@amazon.co.uk>, <benh@amazon.com>,
<ronenk@amazon.com>, <talel@amazon.com>, <jonnyc@amazon.com>,
<hanochu@amazon.com>, <linux-edac@vger.kernel.org>,
<devicetree@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC
Date: Mon, 17 Jun 2019 16:00:45 +0300
Message-ID: <bbb9b41d-8ffa-d4c5-c199-2400695cce8d@amazon.com>
In-Reply-To: <fdc3b458-96eb-1734-c294-2463f37f2244@arm.com>

>>>> +static void al_a57_edac_l2merrsr(void *arg)
>>>> +{
>>>
>>>> + edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>>>
>>> How do we know this is corrected?
>
>>> It looks like L2CTLR_EL1[20] might force fatal 1/0 to map to uncorrected/corrected. Is
>>> this what you are depending on here?
>
>> No - not on this. Reporting all the errors as corrected seems to be bad.
>>
>> Can I depend on the fatal field?
>
> That is described as "set to 1 on the first memory error that caused a Data Abort". I
> assume this is one of the parity-error external-aborts.
>
> If the repeat counter shows, say, 2, and fatal is set, you only know that at least one of
> these errors caused an abort. But it could have been all three. The repeat counter only
> matches against the RAMID and friends, otherwise the error is counted in 'other'.
>
> I don't think there is a right thing to do here, (other than increase the scrubbing
> frequency). As you can only feed one error into edac at a time then:
>
>> if (fatal)
>> 	edac_device_handle_ue(edac_dev, 0, 0, "L2 Error");
>> else
>> 	edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>
> seems reasonable. You're reporting the most severe, and 'other/repeat' counter values just
> go missing.

I print the values of the 'other' and 'repeat' counters so that they are noticed.

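For reference, the reporting policy discussed above could be sketched roughly as follows. This is a standalone user-space model and not the driver code: the bit positions are my reading of the Cortex-A57 TRM description of L2MERRSR_EL1 and should be checked against it, and the names l2merrsr_decode() and struct l2_report are made up for illustration.

```c
#include <stdint.h>

/* Field positions assumed from the Cortex-A57 TRM (L2MERRSR_EL1). */
#define L2MERRSR_FATAL(v)	(((v) >> 63) & 0x1)
#define L2MERRSR_OTHER(v)	(((v) >> 40) & 0xff)
#define L2MERRSR_REPEAT(v)	(((v) >> 32) & 0xff)
#define L2MERRSR_VALID(v)	(((v) >> 31) & 0x1)

/* Hypothetical severity values; the driver would call
 * edac_device_handle_ue() or edac_device_handle_ce() instead. */
enum l2_severity { L2_NONE, L2_CE, L2_UE };

struct l2_report {
	enum l2_severity severity;
	unsigned int repeat;	/* errors matching the recorded RAMID and friends */
	unsigned int other;	/* errors that did not match */
};

static struct l2_report l2merrsr_decode(uint64_t val)
{
	struct l2_report r = { L2_NONE, 0, 0 };

	/* Nothing was recorded, so there is nothing to report. */
	if (!L2MERRSR_VALID(val))
		return r;

	/* Only one error can be fed to EDAC: report the most severe one. */
	r.severity = L2MERRSR_FATAL(val) ? L2_UE : L2_CE;

	/* Keep the counters so they can at least be printed. */
	r.repeat = (unsigned int)L2MERRSR_REPEAT(val);
	r.other = (unsigned int)L2MERRSR_OTHER(val);

	return r;
}
```

With fatal set, the whole record is reported as a single UE even if some of the counted errors were corrected, which is the trade-off described above.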
>
>
>> How can L2CTLR_EL1[20] force fatal?
>
> I don't think it can, on a second reading, it looks to be even more complicated than I
> thought! That bit is described as disabling forwarding of uncorrected data, but it looks
> like the uncorrected data never actually reaches the other end. (I'm unsure what 'flush'
> means in this context.)
> I was looking for reasons you could 'know' that any reported error was corrected. This was
> just a bad suggestion!

Is there an interrupt for uncorrectable errors?
Are the 'asynchronous errors' in L2 used to report UEs?

If there is no interrupt, can we use the die-notifier subsystem to check
whether any error occurred while the system is shutting down?

>>>> + cluster = topology_physical_package_id(cpu);
>>>
>>> Hmm, I'm not sure cluster==package is guaranteed to be true forever.
>>>
>>> If you describe the L2MERRSR_EL1 cpu mapping in your DT you could use that. Otherwise
>>> pulling out the DT using something like the arch code's parse_cluster().
>
>> I rely on this being an Alpine SoC-specific driver.
>
> ... and that the topology code hasn't changed to really know what a package is:
> https://lore.kernel.org/lkml/20190529211340.17087-2-atish.patra@wdc.com/T/#u
>
> As what you really want to know is 'same L2?', and you're holding the cpu_read_lock(),
> would struct cacheinfo's shared_cpu_map be a better fit?
>
> This would be done by something like a cpu-mask of cache:shared_cpu_map's for the L2's
> you've visited. It removes the dependency on package==L2, and insulates you from the
> cpu-numbering not being exactly as you expect.

I'll add a DT property that points to the L2-cache node (phandle); then it
will be easy to create a cpu-mask of all the cores that point to the same
L2 cache.
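
As a rough illustration of the "mask of L2s already visited" idea, here is a small user-space model. It is only a sketch under stated assumptions: the l2_of_cpu[] array stands in for whatever L2 node each CPU's phandle resolves to, plain uint32_t bitmasks stand in for struct cpumask (so at most 32 CPUs), and pick_one_cpu_per_l2() is a hypothetical name, not an existing kernel helper.

```c
#include <stdint.h>

/*
 * l2_of_cpu[cpu]: identifier of the L2 cache this CPU points to
 *                 (stand-in for the resolved DT phandle).
 * shared[cpu]:    filled, for each representative CPU only, with the mask
 *                 of CPUs sharing its L2.
 * Returns a mask with exactly one representative CPU per distinct L2.
 */
static uint32_t pick_one_cpu_per_l2(const int *l2_of_cpu, int ncpus,
				    uint32_t *shared)
{
	uint32_t visited = 0;	/* CPUs whose L2 has already been handled */
	uint32_t reps = 0;
	int cpu, other;

	for (cpu = 0; cpu < ncpus; cpu++) {
		uint32_t mask = 0;

		if (visited & (1u << cpu))
			continue;

		/* Lower-numbered CPUs on this L2 would already have marked us. */
		for (other = cpu; other < ncpus; other++)
			if (l2_of_cpu[other] == l2_of_cpu[cpu])
				mask |= 1u << other;

		shared[cpu] = mask;
		visited |= mask;
		reps |= 1u << cpu;
	}

	return reps;
}
```

The L2 error register would then be read once on each representative CPU, with no assumption about package==L2 or about how the CPUs are numbered.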
Thanks,
Hanna