All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: "Shenhar, Talel" <talel@amazon.com>
Cc: <robh+dt@kernel.org>, <tglx@linutronix.de>,
	<jason@lakedaemon.net>, <mark.rutland@arm.com>,
	<nicolas.ferre@microchip.com>, <mchehab+samsung@kernel.org>,
	<shawn.lin@rock-chips.com>, <gregkh@linuxfoundation.org>,
	<dwmw@amazon.co.uk>, <benh@kernel.crashing.org>,
	<linux-kernel@vger.kernel.org>, <devicetree@vger.kernel.org>,
	James Morse <james.morse@arm.com>
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver
Date: Thu, 12 Sep 2019 09:50:22 +0100	[thread overview]
Message-ID: <865zlxsxtd.wl-maz@kernel.org> (raw)
In-Reply-To: <3205f7ae-5568-c064-23ac-ea726246173b@amazon.com>

On Thu, 12 Sep 2019 07:50:03 +0100,
"Shenhar, Talel" <talel@amazon.com> wrote:
> 
> Hi Marc,
> 
> 
> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
> > [+James]
> > 
> > Hi Talel,
> > 
> > On Tue, 10 Sep 2019 20:05:09 +0100,
> > Talel Shenhar <talel@amazon.com> wrote:
> > 
> >> +	log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
> > Do you actually need the implied barriers? I'd expect that relaxed
> > accesses should be enough.
> 
> You are correct. Barriers are not needed, In v1 this driver indeed
> used _relaxed versions.
> 
> Due to request coming from Arnd in v1 patch series I removed it. As
> this is not data path I had no strong objection for removing it.

Independently from whether this has any material impact on performance
(this obviously isn't a hot path, unless it can be directly generated
by userspace or a guest), I believe it is important to use the right
type of accessor, if only because code gets copied around... Others
would probably argue that this is the very reason why we should always
use the "safe" option...

> 
> > 
> >> +	if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
> >> +		return IRQ_NONE;
> >> +
> >> +	log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
> >> +	writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
> >> +
> >> +	addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
> >> +	addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
> >> +	request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
> >> +	bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
> >> +
> >> +	dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
> >> +		addr, request_id, bresp);
> > What is this information? How do we make use of it? Given that this is
> > asynchronous, how do we correlate it to the offending software?
> 
> Indeed this information arriving from the HW is asynchronous.
> 
> There is no direct method to get the offending software.
> 
> There are all kinds of hacks we do to find the offending software once
> we find this error. most of the time its a new patch introduced but
> some of the time is just digging.

OK, so that the moment, this is more of a debug tool than anything
else, right?

> > The whole think looks to me like a poor man's EDAC handling, and I'd
> > expect to be plugged in that subsystem instead. Any reason why this
> > isn't the case? It would certainly make the handling uniform for the
> > user.
> 
> This logic was not plugged into EDAC as there is no "Correctable"
> error here. its just error event. Not all errors are EDAC in the sense
> of Error Detection And *Correction*. There are no correctable errors
> for this driver.

I'd argue the opposite! Because you obviously don't let a read-only
register being written to, the error has been corrected, and you
signal the correction status.

> So plugging it  under EDAC seems like abusing the EDAC system.
> 
> Now that I've emphasize the reason for not putting this under EDAC,
> what do you think? should this "only uncorrectable event" driver
> should be part of EDAC?

My choice would be to plug it into the EDAC subsystem, and report all
interrupts as "Corrected" events. Optionally, and only if you are
debugging something that requires it, report the error as
"Uncorrectable", in which case the EDAC subsystem should trigger a
panic.

At least you'd get the infrastructure, logging and tooling that the
EDAC subsystem offers (parsing the kernel log doesn't really count).

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: "Shenhar, Talel" <talel@amazon.com>
Cc: robh+dt@kernel.org, tglx@linutronix.de, jason@lakedaemon.net,
	mark.rutland@arm.com, nicolas.ferre@microchip.com,
	mchehab+samsung@kernel.org, shawn.lin@rock-chips.com,
	gregkh@linuxfoundation.org, dwmw@amazon.co.uk,
	benh@kernel.crashing.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, James Morse <james.morse@arm.com>
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver
Date: Thu, 12 Sep 2019 09:50:22 +0100	[thread overview]
Message-ID: <865zlxsxtd.wl-maz@kernel.org> (raw)
In-Reply-To: <3205f7ae-5568-c064-23ac-ea726246173b@amazon.com>

On Thu, 12 Sep 2019 07:50:03 +0100,
"Shenhar, Talel" <talel@amazon.com> wrote:
> 
> Hi Marc,
> 
> 
> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
> > [+James]
> > 
> > Hi Talel,
> > 
> > On Tue, 10 Sep 2019 20:05:09 +0100,
> > Talel Shenhar <talel@amazon.com> wrote:
> > 
> >> +	log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
> > Do you actually need the implied barriers? I'd expect that relaxed
> > accesses should be enough.
> 
> You are correct. Barriers are not needed, In v1 this driver indeed
> used _relaxed versions.
> 
> Due to request coming from Arnd in v1 patch series I removed it. As
> this is not data path I had no strong objection for removing it.

Independently from whether this has any material impact on performance
(this obviously isn't a hot path, unless it can be directly generated
by userspace or a guest), I believe it is important to use the right
type of accessor, if only because code gets copied around... Others
would probably argue that this is the very reason why we should always
use the "safe" option...

> 
> > 
> >> +	if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
> >> +		return IRQ_NONE;
> >> +
> >> +	log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
> >> +	writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
> >> +
> >> +	addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
> >> +	addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
> >> +	request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
> >> +	bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
> >> +
> >> +	dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
> >> +		addr, request_id, bresp);
> > What is this information? How do we make use of it? Given that this is
> > asynchronous, how do we correlate it to the offending software?
> 
> Indeed this information arriving from the HW is asynchronous.
> 
> There is no direct method to get the offending software.
> 
> There are all kinds of hacks we do to find the offending software once
> we find this error. most of the time its a new patch introduced but
> some of the time is just digging.

OK, so that the moment, this is more of a debug tool than anything
else, right?

> > The whole think looks to me like a poor man's EDAC handling, and I'd
> > expect to be plugged in that subsystem instead. Any reason why this
> > isn't the case? It would certainly make the handling uniform for the
> > user.
> 
> This logic was not plugged into EDAC as there is no "Correctable"
> error here. its just error event. Not all errors are EDAC in the sense
> of Error Detection And *Correction*. There are no correctable errors
> for this driver.

I'd argue the opposite! Because you obviously don't let a read-only
register being written to, the error has been corrected, and you
signal the correction status.

> So plugging it  under EDAC seems like abusing the EDAC system.
> 
> Now that I've emphasize the reason for not putting this under EDAC,
> what do you think? should this "only uncorrectable event" driver
> should be part of EDAC?

My choice would be to plug it into the EDAC subsystem, and report all
interrupts as "Corrected" events. Optionally, and only if you are
debugging something that requires it, report the error as
"Uncorrectable", in which case the EDAC subsystem should trigger a
panic.

At least you'd get the infrastructure, logging and tooling that the
EDAC subsystem offers (parsing the kernel log doesn't really count).

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.

  reply	other threads:[~2019-09-12  8:50 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-10 19:05 [PATCH v2 0/3] Amazon's Annapurna Labs POS Driver Talel Shenhar
2019-09-10 19:05 ` Talel Shenhar
2019-09-10 19:05 ` [PATCH v2 1/3] dt-bindings: soc: al-pos: Amazon's Annapurna Labs POS Talel Shenhar
2019-09-10 19:05   ` Talel Shenhar
2019-09-18 13:32   ` Rob Herring
2019-09-18 13:44     ` Shenhar, Talel
2019-09-18 13:44       ` Shenhar, Talel
2019-09-10 19:05 ` [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver Talel Shenhar
2019-09-10 19:05   ` Talel Shenhar
2019-09-11 14:15   ` Marc Zyngier
2019-09-11 14:15     ` Marc Zyngier
2019-09-12  6:50     ` [UNVERIFIED SENDER] " Shenhar, Talel
2019-09-12  6:50       ` Shenhar, Talel
2019-09-12  8:50       ` Marc Zyngier [this message]
2019-09-12  8:50         ` Marc Zyngier
2019-09-12  9:19         ` [UNVERIFIED SENDER] " Shenhar, Talel
2019-09-12  9:19           ` Shenhar, Talel
2019-09-19 14:42           ` James Morse
2019-09-22  6:55             ` Shenhar, Talel
2019-09-22  6:55               ` Shenhar, Talel
2019-09-10 19:05 ` [PATCH v2 3/3] soc: amazon: al-pos: cast to u64 before left shifting Talel Shenhar
2019-09-10 19:05   ` Talel Shenhar
2019-09-11 14:18   ` Marc Zyngier
2019-09-11 14:18     ` Marc Zyngier
2019-09-12  6:51     ` [UNVERIFIED SENDER] " Shenhar, Talel
2019-09-12  6:51       ` Shenhar, Talel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=865zlxsxtd.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw@amazon.co.uk \
    --cc=gregkh@linuxfoundation.org \
    --cc=james.morse@arm.com \
    --cc=jason@lakedaemon.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mchehab+samsung@kernel.org \
    --cc=nicolas.ferre@microchip.com \
    --cc=robh+dt@kernel.org \
    --cc=shawn.lin@rock-chips.com \
    --cc=talel@amazon.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.