From: James Morse <james.morse@arm.com>
To: Sascha Hauer <s.hauer@pengutronix.de>
Cc: linux-edac@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Tony Luck <tony.luck@intel.com>,
Robert Richter <rrichter@marvell.com>,
York Sun <york.sun@nxp.com>,
kernel@pengutronix.de, linux-arm-kernel@lists.infradead.org,
devicetree@vger.kernel.org, Rob Herring <robh+dt@kernel.org>
Subject: Re: [PATCH 2/3] drivers/edac: Add L1 and L2 error detection for A53 and A57
Date: Fri, 6 Nov 2020 19:34:07 +0000
Message-ID: <6ea8a824-ba55-6c5b-993d-4c782e396f32@arm.com>
In-Reply-To: <20201013125033.4749-3-s.hauer@pengutronix.de>
Hi Sascha,
On 13/10/2020 13:50, Sascha Hauer wrote:
> The Cortex A53 and A57 cores have error detection capabilities for the
> L1/L2 Caches, this patch adds a driver for them.
>
> Unfortunately there is no robust way to inject errors into the caches,
> so this driver doesn't contain any code to actually test it. It has
> been tested though with code taken from an older version of this driver
> found here: https://lkml.org/lkml/2018/3/14/1203.
> For reasons stated in this thread the error injection code is not suitable
> for mainline, so it is removed from the driver.
> diff --git a/drivers/edac/cortex_arm64_l1_l2.c b/drivers/edac/cortex_arm64_l1_l2.c
> new file mode 100644
> index 000000000000..fb8386eb40ac
> --- /dev/null
> +++ b/drivers/edac/cortex_arm64_l1_l2.c
> @@ -0,0 +1,208 @@
> +static void read_errors(void *data)
> +{
> + struct edac_device_ctl_info *edac_ctl = data;
> + int cpu = smp_processor_id();
> + char msg[MESSAGE_SIZE];
> + u64 cpumerr, l2merr;
> +
> + /* cpumerrsr_el1 */
> + asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (cpumerr));
> + asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (0));
I think you've seen earlier comments on using the sys_reg macros for this. There were
versions of binutils out there that choke on the generic s3_1_c15_c2_2 register syntax.
[...]
> +}
> +
> +static void cortex_arm64_edac_check(struct edac_device_ctl_info *edac_ctl)
> +{
> + struct arm64_pvt *pvt = edac_ctl->pvt_info;
> + call_single_data_t *csd;
> + int cpu;
> +
> + get_online_cpus();
> + for_each_cpu_and(cpu, cpu_online_mask, &pvt->compat_mask) {
> + csd = per_cpu_ptr(pvt->csd_check, cpu);
> + csd->func = read_errors;
> + csd->info = edac_ctl;
> + csd->flags = 0;
> + /* Read CPU L1/L2 errors */
> + smp_call_function_single_async(cpu, csd);
> + /* Wait until flags cleared */
> + smp_cond_load_acquire(&csd->flags, !VAL);
Hmm. We end up spinning, waiting for each CPU in turn to run read_errors(). I can't see
any reason we can't sleep here.
Can't we use smp_call_function_many() here? It already considers cpu_online_mask; you'd
just need to deal with read_errors() being called in parallel with itself.
(Concurrent calls into EDAC are one problem, but two CPUs reading and writing the same L2
register could lead to double counting.)
> + }
> + put_online_cpus();
> +}
> +static int cortex_arm64_edac_probe(struct platform_device *pdev)
> +{
> + struct device_node *np, *dn = pdev->dev.of_node;
> + struct edac_device_ctl_info *edac_ctl;
> + struct device *dev = &pdev->dev;
> + struct of_phandle_iterator it;
> + struct arm64_pvt *pvt;
> + int rc, cpu;
> +
> + edac_ctl = edac_device_alloc_ctl_info(sizeof(*pvt), "cpu_cache",
> + 1, "L", 2, 1, NULL, 0,
> + edac_device_alloc_index());
I used this series on Juno to poke at the user-space interface. It chokes on a big.LITTLE
system, as it can't register "cpu_cache" a second time.
I think we should try to make the topology look like the one in edac_device.h. This means
calling it 'cpu', and registering all the CPUs up front.
On a big.LITTLE system, the second probe() call would need to be careful not to register
them again.
I can have a go at this if you don't have a platform to hand.
(The 'L2-cache' thing in edac_device.h turns out to be impossible, and the 'Lx' you've done
here is the most popular option. I'll post a patch to change the documentation to match
what people are doing.)
[...]
> +}
Thanks,
James