linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Sascha Hauer <s.hauer@pengutronix.de>
Cc: linux-edac@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	James Morse <james.morse@arm.com>,
	Robert Richter <rrichter@marvell.com>,
	York Sun <york.sun@nxp.com>,
	kernel@pengutronix.de, linux-arm-kernel@lists.infradead.org,
	Rob Herring <robh+dt@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH 1/2] drivers/edac: Add L1 and L2 error detection for A53 and A57
Date: Fri, 02 Apr 2021 11:06:56 +0100	[thread overview]
Message-ID: <87czvdf1a7.wl-maz@kernel.org> (raw)
In-Reply-To: <20210401110615.15326-2-s.hauer@pengutronix.de>

On Thu, 01 Apr 2021 12:06:14 +0100,
Sascha Hauer <s.hauer@pengutronix.de> wrote:
> 
> The Cortex A53 and A57 cores have error detection capabilities for the
> L1/L2 Caches, this patch adds a driver for them.
> 
> Unfortunately there is no robust way to inject errors into the caches,
> so this driver doesn't contain any code to actually test it. It has
> been tested though with code taken from an older version of this driver
> found here: https://lkml.org/lkml/2018/3/14/1203. For reasons stated
> in this thread the error injection code is not suitable for mainline,
> so it is removed from the driver.
> 
> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
> ---
>  drivers/edac/Kconfig              |   6 +
>  drivers/edac/Makefile             |   1 +
>  drivers/edac/cortex_arm64_l1_l2.c | 218 ++++++++++++++++++++++++++++++
>  3 files changed, 225 insertions(+)
>  create mode 100644 drivers/edac/cortex_arm64_l1_l2.c
> 
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 27d0c4cdc58d..b038aed35e93 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -538,4 +538,10 @@ config EDAC_DMC520
>  	  Support for error detection and correction on the
>  	  SoCs with ARM DMC-520 DRAM controller.
>  
> +config EDAC_CORTEX_ARM64_L1_L2
> +	tristate "ARM Cortex A57/A53"
> +	depends on ARM64
> +	help
> +	  Support for L1/L2 cache error detection on ARM Cortex A57 and A53.

I went through the TRMs for a few other Cortex-A cores, and this
feature looks more common than this comment suggests. At least A35 and
A72 implement something similar (if not strictly identical), probably
owing to their ancestry.

> +
>  endif # EDAC
> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 2d1641a27a28..5849d8bb32ae 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -84,3 +84,4 @@ obj-$(CONFIG_EDAC_QCOM)			+= qcom_edac.o
>  obj-$(CONFIG_EDAC_ASPEED)		+= aspeed_edac.o
>  obj-$(CONFIG_EDAC_BLUEFIELD)		+= bluefield_edac.o
>  obj-$(CONFIG_EDAC_DMC520)		+= dmc520_edac.o
> +obj-$(CONFIG_EDAC_CORTEX_ARM64_L1_L2)	+= cortex_arm64_l1_l2.o
> diff --git a/drivers/edac/cortex_arm64_l1_l2.c b/drivers/edac/cortex_arm64_l1_l2.c
> new file mode 100644
> index 000000000000..3b1e2f3ccab6
> --- /dev/null
> +++ b/drivers/edac/cortex_arm64_l1_l2.c
> @@ -0,0 +1,218 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Cortex A57 and A53 EDAC L1 and L2 cache error detection
> + *
> + * Copyright (c) 2020 Pengutronix, Sascha Hauer <s.hauer@pengutronix.de>
> + *
> + * Based on Code from:
> + * Copyright (c) 2018, NXP Semiconductor
> + * Author: York Sun <york.sun@nxp.com>
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/of_platform.h>
> +#include <linux/of_device.h>
> +#include <linux/bitfield.h>
> +#include <asm/smp_plat.h>
> +
> +#include "edac_module.h"
> +
> +#define DRVNAME				"cortex-arm64-edac"
> +
> +#define CPUMERRSR_EL1_RAMID		GENMASK(30, 24)
> +
> +#define CPUMERRSR_EL1_VALID		BIT(31)
> +#define CPUMERRSR_EL1_FATAL		BIT(63)
> +
> +#define L1_I_TAG_RAM			0x00
> +#define L1_I_DATA_RAM			0x01
> +#define L1_D_TAG_RAM			0x08
> +#define L1_D_DATA_RAM			0x09
> +#define L1_D_DIRTY_RAM			0x14
> +#define TLB_RAM				0x18
> +
> +#define L2MERRSR_EL1_VALID		BIT(31)
> +#define L2MERRSR_EL1_FATAL		BIT(63)
> +
> +struct merrsr {
> +	u64 cpumerr;
> +	u64 l2merr;
> +};
> +
> +#define MESSAGE_SIZE 64
> +
> +#define SYS_CPUMERRSR_EL1			sys_reg(3, 1, 15, 2, 2)
> +#define SYS_L2MERRSR_EL1			sys_reg(3, 1, 15, 2, 3)
> +
> +static struct cpumask compat_mask;
> +
> +static void report_errors(struct edac_device_ctl_info *edac_ctl, int cpu,
> +			  struct merrsr *merrsr)
> +{
> +	char msg[MESSAGE_SIZE];
> +	u64 cpumerr = merrsr->cpumerr;
> +	u64 l2merr = merrsr->l2merr;
> +
> +	if (cpumerr & CPUMERRSR_EL1_VALID) {
> +		const char *str;
> +		bool fatal = cpumerr & CPUMERRSR_EL1_FATAL;
> +
> +		switch (FIELD_GET(CPUMERRSR_EL1_RAMID, cpumerr)) {
> +		case L1_I_TAG_RAM:
> +			str = "L1-I Tag RAM";
> +			break;
> +		case L1_I_DATA_RAM:
> +			str = "L1-I Data RAM";
> +			break;
> +		case L1_D_TAG_RAM:
> +			str = "L1-D Tag RAM";
> +			break;
> +		case L1_D_DATA_RAM:
> +			str = "L1-D Data RAM";
> +			break;
> +		case L1_D_DIRTY_RAM:
> +			str = "L1 Dirty RAM";
> +			break;
> +		case TLB_RAM:
> +			str = "TLB RAM";
> +			break;
> +		default:
> +			str = "unknown";
> +			break;
> +		}
> +
> +		snprintf(msg, MESSAGE_SIZE, "%s %s error(s) on CPU %d",
> +			 str, fatal ? "fatal" : "correctable", cpu);
> +
> +		if (fatal)
> +			edac_device_handle_ue(edac_ctl, cpu, 0, msg);
> +		else
> +			edac_device_handle_ce(edac_ctl, cpu, 0, msg);
> +	}
> +
> +	if (l2merr & L2MERRSR_EL1_VALID) {
> +		bool fatal = l2merr & L2MERRSR_EL1_FATAL;
> +
> +		snprintf(msg, MESSAGE_SIZE, "L2 %s error(s) on CPU %d",
> +			 fatal ? "fatal" : "correctable", cpu);

The shared nature of the L2 makes the CPU it has been detected on
pretty much irrelevant. What you really want here is the CPUID+Way
that is in the register data.

> +		if (fatal)
> +			edac_device_handle_ue(edac_ctl, cpu, 1, msg);
> +		else
> +			edac_device_handle_ce(edac_ctl, cpu, 1, msg);
> +	}
> +}
> +
> +static void read_errors(void *data)
> +{
> +	struct merrsr *merrsr = data;
> +
> +	merrsr->cpumerr = read_sysreg_s(SYS_CPUMERRSR_EL1);
> +	write_sysreg_s(0, SYS_CPUMERRSR_EL1);
> +	merrsr->l2merr = read_sysreg_s(SYS_L2MERRSR_EL1);
> +	write_sysreg_s(0, SYS_L2MERRSR_EL1);

If an error happens between read and write, you lose it. That's not
great. You could improve things by only writing 0 if you have found an
error. You probably also need an isb after the write if you want it to
take effect in a timely manner.

I'm also not sure of how valuable it is to probe for L2 errors on each
CPU, given that it is shared with up to 3 other cores. You probably
want to use the cache topology information for this.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2021-04-02 10:07 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 11:06 [PATCH v5 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-04-01 11:06 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2021-04-02 10:06   ` Marc Zyngier [this message]
2021-04-15 10:15     ` Sascha Hauer
2021-04-01 11:06 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
2021-04-01 15:37   ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2021-02-01 11:57 [PATCH iv4 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-02-01 11:57 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2020-08-13  7:57 [PATCH 0/2] " Sascha Hauer
2020-08-13  7:57 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2020-08-26  8:41   ` Borislav Petkov
2020-10-13 11:13     ` Sascha Hauer
2020-10-13 11:31       ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87czvdf1a7.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=kernel@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mchehab@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=rrichter@marvell.com \
    --cc=s.hauer@pengutronix.de \
    --cc=tony.luck@intel.com \
    --cc=york.sun@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).