linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Bharat Bhushan <bbhushan2@marvell.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"maz@kernel.org" <maz@kernel.org>,
	"konrad.dybcio@somainline.org" <konrad.dybcio@somainline.org>,
	"saiprakash.ranjan@codeaurora.org"
	<saiprakash.ranjan@codeaurora.org>,
	"robh@kernel.org" <robh@kernel.org>,
	"marcan@marcan.st" <marcan@marcan.st>,
	"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
	"broonie@kernel.org" <broonie@kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linu Cherian <lcherian@marvell.com>
Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround
Date: Thu, 8 Jul 2021 12:41:57 +0100	[thread overview]
Message-ID: <20210708114157.GC24650@C02TD0UTHF1T.local> (raw)
In-Reply-To: <CO6PR18MB4465687A22FE724E59D4225CE3199@CO6PR18MB4465.namprd18.prod.outlook.com>

On Thu, Jul 08, 2021 at 10:47:42AM +0000, Bharat Bhushan wrote:
> Hi Mark,
> 
> Sorry for the delay, was gathering some details.
> Pease see inline
> 
> > -----Original Message-----
> > From: Mark Rutland <mark.rutland@arm.com>
> > Sent: Monday, July 5, 2021 2:38 PM
> > To: Bharat Bhushan <bbhushan2@marvell.com>
> > Cc: catalin.marinas@arm.com; will@kernel.org; daniel.lezcano@linaro.org;
> > maz@kernel.org; konrad.dybcio@somainline.org;
> > saiprakash.ranjan@codeaurora.org; robh@kernel.org; marcan@marcan.st;
> > suzuki.poulose@arm.com; broonie@kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Linu Cherian
> > <lcherian@marvell.com>
> > Subject: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > Hi Bharat,
> > 
> > On Mon, Jul 05, 2021 at 11:38:43AM +0530, Bharat Bhushan wrote:
> > > CPU pipeline have unpredicted behavior when timer interrupt appears
> > > and then disappears prior to the exception happening. Time interrupt
> > > appears on timer expiry and disappears when timer programming or timer
> > > disable. This typically can happen when a load instruction misses in
> > > the cache,  which can take few hundreds of cycles, and an interrupt
> > > appears after the load instruction starts executing but disappears
> > > before the load instruction completes.
> > 
> > Could you elaborate on the scenario? What sort of unpredictable behaviour can
> > occur? e.g:
> 
> This is a race condition where an instruction (except store, system,
> load atomic and load exclusive) becomes "nop" if interrupt appears and
> disappears before taken by CPU. For example interrupt appears after
> the atomic load instruction starts executing and disappears before the
> atomic load instruction completes, in that case instruction (not all)
> can become "nop". As interrupt disappears before atomic instruction
> completes, cpu continues to execute and while take junk from register
> as other dependent got "nop".

Thanks for this; I have a number of further questions below.

You said this doesn't apply to:

* store
* system
* load atomic
* load exclusive

... but your example explains this happening for an atomic load, which
was in that list. Was the example bad, or was the list wrong?

It's not entirely clear to me which instructions this covers. e.g. is
"system" the entire system instruction class (i.e. all opcodes
0b110101010_0xxxxxxx_xxxxxxxx_xxxxxxxx), or did you mean something more
specific? Does "store" include store-exlcusive?

Other than that list, can this occur for *any* instruction? e.g. MOV,
SHA256*, *DIV?

Does this only apply to a single instruction at a time, or can multiple
instructions "become nop"?

When an instruction "becomes nop", will subsequent instructions see a
consistent architectural state (e.g. GPRs as they were exactly before
the instruction which "becomes nop"), or can they see something else
(e.g. garbage forwarded from register renaming or other internal
microarchitectural state)?

> > * Does the CPU lockup?
> No
> 
> > * Does the CPU take the exception at all?
> No
> 
> > * Does the load behave erroneously?
> No,
> 
> > * Does any CPU state (e.g. GPRs, PC, PSTATE) become corrupted?
> 
> yes, GPRs will get corrupted, will have stale value

As above, is that the prior architectural value of the GPRs, or can that
be some bogus microarchitectural state (e.g. from renaming or other
forwarding paths)?

> > Does the problem manifest when IRQs are masked by DAIF.I, or by
> > CNT*_CTL_EL0.{IMASK,ENABLE} ?
> 
> No, there are no issue if interrupts are masked.

If a write to CNTV_CTL_EL0.IMASK races with the interrupt being
asserted, can that trigger the problem?

If a write to DAIF.I races with the interrupt being asserted, can that
trigger the problem?

From your description so far, this doesn't sound like it is specific to
the timer interrupt. Is it possible for a different interrupt to trigger
this, e.g:

* Can the same happen with another PPI, e.g. the PMU interrupt if that
  gets de-asserted, or there's a race with DAIF.I?

* Can the same happen with an SGI, e.g. if one CPU asserts then
  de-asserts an SGI targetting another CPU, or there's a race with
  DAIF.I?

* Can the same happen with an SPI, e.g. if a device asserts then
  de-asserts its IRQ line, or there's a race with DAIF.I?

If not, *why* does this happen specifically for the timer interrupt?

> > > Workaround of this is to ensure maximum 2us of time gap between timer
> > > interrupt and timer programming which can de-assert timer interrupt.
> > 
> > The code below seems to try to enforce a 2us *minimum*. Which is it supposed
> > to be?
> 
> Yes, it is minimum 2us.
> 
> > 
> > Can you explain *why* this is supposed to help?
> With the workaround interrupt assertion and de-assertion will be minimum 2us apart.

I understood that, but why is that deemed to be sufficient? e.g. is it
somehow guaranteed that the CPU will complete the instruction that would
"become nop" in that time?

> > I don't see how we can guarantee this in a VM, or if the CPU misses on an
> > instruction fetch.
> 
> This errata applies to VM (virtual timer) as well, maybe there is some
> gap in my understanding, how it will be different in VM.
> Can you help with what issue we can have VM?

A VCPU can be pre-empted by the host at *any* time, for an arbitrary
length of time. So e.g. you can have a scenario such as:

1. Guest reads CNTx_TVAL, sees interrupt is 4us in the future and
   decides it does not need to wait
2. Host preempts guest
3. Host does some processing for ~3.9us
4. Host returns to guest, with 0.1us left until the interrupt triggers
5. Guest reprograms CNTx_TVAL, and triggers the erratum

Thanks,
Mark.

  reply	other threads:[~2021-07-08 11:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-05  6:08 [PATCH] clocksource: Add Marvell Errata-38627 workaround Bharat Bhushan
2021-07-05  9:07 ` Mark Rutland
2021-07-05  9:14   ` Mark Rutland
2021-07-08 10:47   ` [EXT] " Bharat Bhushan
2021-07-08 11:41     ` Mark Rutland [this message]
2021-07-13  2:40       ` Bharat Bhushan
2021-07-13 15:38         ` Marc Zyngier
2021-07-13 16:12         ` Mark Rutland
2021-07-26  4:29           ` Bharat Bhushan
2021-07-26 18:03             ` Marc Zyngier
2021-07-05  9:26 ` Marc Zyngier
2021-07-08 10:48   ` [EXT] " Bharat Bhushan
2021-07-11  9:57     ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210708114157.GC24650@C02TD0UTHF1T.local \
    --to=mark.rutland@arm.com \
    --cc=bbhushan2@marvell.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=konrad.dybcio@somainline.org \
    --cc=lcherian@marvell.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcan@marcan.st \
    --cc=maz@kernel.org \
    --cc=robh@kernel.org \
    --cc=saiprakash.ranjan@codeaurora.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).