linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Sascha Hauer <s.hauer@pengutronix.de>
Cc: linux-edac@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Robert Richter <rrichter@marvell.com>,
	York Sun <york.sun@nxp.com>,
	kernel@pengutronix.de, linux-arm-kernel@lists.infradead.org,
	devicetree@vger.kernel.org, Rob Herring <robh+dt@kernel.org>
Subject: Re: [PATCH 2/3] drivers/edac: Add L1 and L2 error detection for A53 and A57
Date: Fri, 6 Nov 2020 19:34:07 +0000	[thread overview]
Message-ID: <6ea8a824-ba55-6c5b-993d-4c782e396f32@arm.com> (raw)
In-Reply-To: <20201013125033.4749-3-s.hauer@pengutronix.de>

Hi Sascha,

On 13/10/2020 13:50, Sascha Hauer wrote:
> The Cortex A53 and A57 cores have error detection capabilities for the
> L1/L2 Caches, this patch adds a driver for them.
> 
> Unfortunately there is no robust way to inject errors into the caches,
> so this driver doesn't contain any code to actually test it. It has
> been tested though with code taken from an older version of this driver
> found here: https://lkml.org/lkml/2018/3/14/1203.

> For reasons stated
> in this thread the error injection code is not suitable for mainline,
> so it is removed from the driver.


> diff --git a/drivers/edac/cortex_arm64_l1_l2.c b/drivers/edac/cortex_arm64_l1_l2.c
> new file mode 100644
> index 000000000000..fb8386eb40ac
> --- /dev/null
> +++ b/drivers/edac/cortex_arm64_l1_l2.c
> @@ -0,0 +1,208 @@

> +static void read_errors(void *data)
> +{
> +	struct edac_device_ctl_info *edac_ctl = data;
> +	int cpu = smp_processor_id();
> +	char msg[MESSAGE_SIZE];
> +	u64 cpumerr, l2merr;
> +
> +	/* cpumerrsr_el1 */
> +	asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (cpumerr));
> +	asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (0));

I think you've seen earlier comments on using the sys_reg macros for this. There were
versions of binutils out there that choke on this.

[...]

> +}
> +
> +static void cortex_arm64_edac_check(struct edac_device_ctl_info *edac_ctl)
> +{
> +	struct arm64_pvt *pvt = edac_ctl->pvt_info;
> +	call_single_data_t *csd;
> +	int cpu;
> +
> +	get_online_cpus();
> +	for_each_cpu_and(cpu, cpu_online_mask, &pvt->compat_mask) {
> +		csd = per_cpu_ptr(pvt->csd_check, cpu);
> +		csd->func = read_errors;
> +		csd->info = edac_ctl;
> +		csd->flags = 0;

> +		/* Read CPU L1/L2 errors */
> +		smp_call_function_single_async(cpu, csd);
> +		/* Wait until flags cleared */
> +		smp_cond_load_acquire(&csd->flags, !VAL);

Hmm. We end up waiting for each CPU to schedule something else. I can't see any reason we
can't sleep here.

Can't we use smp_call_function_many() here? It already considers cpu_online_mask, you'd
just need to deal with read_errors() being called in parallel with itself.

(concurrent calls into edac are one problem, but two CPUs read/writing the same L2
register could lead to double counting)


> +	}
> +	put_online_cpus();
> +}


> +static int cortex_arm64_edac_probe(struct platform_device *pdev)
> +{
> +	struct device_node *np, *dn = pdev->dev.of_node;
> +	struct edac_device_ctl_info *edac_ctl;
> +	struct device *dev = &pdev->dev;
> +	struct of_phandle_iterator it;
> +	struct arm64_pvt *pvt;
> +	int rc, cpu;
> +
> +	edac_ctl = edac_device_alloc_ctl_info(sizeof(*pvt), "cpu_cache",
> +					      1, "L", 2, 1, NULL, 0,
> +					      edac_device_alloc_index());

I used this series to test on Juno to poke the user-space interface:
This chokes on a big-little system as it can't register "cpu_cache" a second time.

I think we should try to make the topology look like the one in edac_device.h. This means
calling it 'cpu', and registering all of them up front.
On a big/little system the second probe() call would need to be careful.

I can have a go at this if you don't have a platform to hand.


(The 'L2-cache' thing in edac_device.h turns out to be impossible and the 'Lx' you've done
here is the most popular option. I'll post a patch to change the documentation to what
people are doing)


[...]

> +}


Thanks,

James


  reply	other threads:[~2020-11-06 19:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-13 12:50 [PATCH v2 0/3] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2020-10-13 12:50 ` [PATCH 1/3] dt-bindings: edac: Add binding for L1/L2 error detection for Cortex A53/57 Sascha Hauer
2020-10-14 13:25   ` Rob Herring
2020-10-13 12:50 ` [PATCH 2/3] drivers/edac: Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2020-11-06 19:34   ` James Morse [this message]
2020-10-13 12:50 ` [PATCH 3/3] arm64: dts: ls104x: Add L1/L2 cache edac node Sascha Hauer
2020-10-14 13:25 ` [PATCH v2 0/3] Add L1 and L2 error detection for A53 and A57 Rob Herring
2020-10-14 14:04   ` Sascha Hauer
2020-10-14 15:17     ` Rob Herring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ea8a824-ba55-6c5b-993d-4c782e396f32@arm.com \
    --to=james.morse@arm.com \
    --cc=bp@alien8.de \
    --cc=devicetree@vger.kernel.org \
    --cc=kernel@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=rrichter@marvell.com \
    --cc=s.hauer@pengutronix.de \
    --cc=tony.luck@intel.com \
    --cc=york.sun@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).