linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sascha Hauer <s.hauer@pengutronix.de>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-edac@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	James Morse <james.morse@arm.com>,
	Robert Richter <rrichter@marvell.com>,
	York Sun <york.sun@nxp.com>,
	kernel@pengutronix.de, linux-arm-kernel@lists.infradead.org,
	Rob Herring <robh+dt@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH 1/2] drivers/edac: Add L1 and L2 error detection for A53 and A57
Date: Thu, 15 Apr 2021 12:15:09 +0200	[thread overview]
Message-ID: <20210415101509.GD19819@pengutronix.de> (raw)
In-Reply-To: <87czvdf1a7.wl-maz@kernel.org>

Hi Marc,

Thanks for the input.

On Fri, Apr 02, 2021 at 11:06:56AM +0100, Marc Zyngier wrote:
> > +config EDAC_CORTEX_ARM64_L1_L2
> > +	tristate "ARM Cortex A57/A53"
> > +	depends on ARM64
> > +	help
> > +	  Support for L1/L2 cache error detection on ARM Cortex A57 and A53.
> 
> I went through the TRMs for a few other Cortex-A cores, and this
> feature looks more common than this comment suggests. At least A35 and
> A72 implement something similar (if not strictly identical), probably
> owing to their ancestry.

Ok, I'll add these to the description.

> > +		}
> > +
> > +		snprintf(msg, MESSAGE_SIZE, "%s %s error(s) on CPU %d",
> > +			 str, fatal ? "fatal" : "correctable", cpu);
> > +
> > +		if (fatal)
> > +			edac_device_handle_ue(edac_ctl, cpu, 0, msg);
> > +		else
> > +			edac_device_handle_ce(edac_ctl, cpu, 0, msg);
> > +	}
> > +
> > +	if (l2merr & L2MERRSR_EL1_VALID) {
> > +		bool fatal = l2merr & L2MERRSR_EL1_FATAL;
> > +
> > +		snprintf(msg, MESSAGE_SIZE, "L2 %s error(s) on CPU %d",
> > +			 fatal ? "fatal" : "correctable", cpu);
> 
> The shared nature of the L2 makes the CPU it has been detected on
> pretty much irrelevant. What you really want here is the CPUID+Way
> that is in the register data.

You are right. For the next round I added some more code to decode the
CPUID/Way field. What's still missing then is information which L2
cache has errors in case there is more than one. I wonder if we should
add get_cpu_cacheinfo(cpu)->id to the message or if there's more to it.

> 
> > +		if (fatal)
> > +			edac_device_handle_ue(edac_ctl, cpu, 1, msg);
> > +		else
> > +			edac_device_handle_ce(edac_ctl, cpu, 1, msg);
> > +	}
> > +}
> > +
> > +static void read_errors(void *data)
> > +{
> > +	struct merrsr *merrsr = data;
> > +
> > +	merrsr->cpumerr = read_sysreg_s(SYS_CPUMERRSR_EL1);
> > +	write_sysreg_s(0, SYS_CPUMERRSR_EL1);
> > +	merrsr->l2merr = read_sysreg_s(SYS_L2MERRSR_EL1);
> > +	write_sysreg_s(0, SYS_L2MERRSR_EL1);
> 
> If an error happens between read and write, you lose it. That's not
> great. You could improve things by only writing 0 if you have found an
> error. You probably also need an isb after the write if you want it to
> take effect in a timely manner.

Ok, will change.

> 
> I'm also not sure of how valuable it is to probe for L2 errors on each
> CPU, given that it is shared with up to 3 other cores. You probably
> want to use the cache topology information for this.

I have no idea how l2merr is implemented. When there is only one
register for all CPUs sharing the same L2 cache then it shouldn't
do any harm to read it more than once. The expensive part is
probably to schedule a function on all CPUs, and we have to do that
anyway to read the L1 cache errors.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

  reply	other threads:[~2021-04-15 10:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 11:06 [PATCH v5 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-04-01 11:06 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2021-04-02 10:06   ` Marc Zyngier
2021-04-15 10:15     ` Sascha Hauer [this message]
2021-04-01 11:06 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
2021-04-01 15:37   ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2021-02-01 11:57 [PATCH iv4 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-02-01 11:57 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2020-08-13  7:57 [PATCH 0/2] " Sascha Hauer
2020-08-13  7:57 ` [PATCH 1/2] drivers/edac: " Sascha Hauer
2020-08-26  8:41   ` Borislav Petkov
2020-10-13 11:13     ` Sascha Hauer
2020-10-13 11:31       ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210415101509.GD19819@pengutronix.de \
    --to=s.hauer@pengutronix.de \
    --cc=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=kernel@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mchehab@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=rrichter@marvell.com \
    --cc=tony.luck@intel.com \
    --cc=york.sun@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).