From: "Zhang, Lei" <zhang.lei@jp.fujitsu.com>
To: 'Mark Rutland' <mark.rutland@arm.com>
Cc: "'catalin.marinas@arm.com'" <catalin.marinas@arm.com>,
	"'will.deacon@arm.com'" <will.deacon@arm.com>,
	"'linux-arm-kernel@lists.infradead.org'" 
	<linux-arm-kernel@lists.infradead.org>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"Zhang, Lei" <zhang.lei@jp.fujitsu.com>
Subject: RE: [PATCH] arm64 memory accesses may cause undefined fault on Fujitsu-A64FX
Date: Tue, 22 Jan 2019 02:05:26 +0000
Message-ID: <8898674D84E3B24BA3A2D289B872026A6A2A2F44@G01JPEXMBKW03>
In-Reply-To: <20190118141758.GC12256@lakrids.cambridge.arm.com>

Hi, Mark

Thanks for your comments, and sorry for the late reply.

> -----Original Message-----
> * Under what conditions can the fault occur? e.g. is this in place of
>   some other fault, or completely spurious?
This fault can occur completely spuriously under
a specific hardware condition and instruction order.
 
> * Does this only occur for data abort? i.e. not instruction aborts?
Yes. This fault only occurs for data aborts.

> * How often does this fault occur?
In my tests, this fault occurs once in every several
OS boot sequences, and after the OS has finished booting,
this fault has never occurred.
In my opinion, this fault rarely occurs
after the OS has finished booting.

> * Does this only apply to Stage-1, or can the same faults be taken at
>   Stage-2?
This fault can be taken only at Stage-1.

> I'm a bit surprised by the single retry. Is there any guarantee that a
> thread will eventually stop delivering this fault code?
I guarantee that, with this patch, a thread will stop
delivering this fault code.
The hardware condition which causes this fault is
reset at exception entry, so the single retry guarantees
that at least one instruction is executed.
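
As a rough illustration of the single-retry idea (not the code from the
patch), the logic can be modelled in userspace C as below. The names
thread_fault_state and handle_undefined_fault are hypothetical, and the
rule "a second fault at the same PC is treated as real" is an assumption
made for this sketch. The state is kept per thread so that one thread's
retry cannot consume another thread's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping for the spurious fault. */
struct thread_fault_state {
	bool     retried;       /* a fault has already been ignored once */
	uint64_t last_fault_pc; /* PC of the instruction that was retried */
};

enum fault_action { FAULT_RETRY, FAULT_KILL };

/*
 * Ignore the first fault at a given PC and retry the instruction.
 * Assumption for this sketch: because the hardware condition is reset
 * at exception entry, the retried instruction should complete, so a
 * second fault at the same PC is not treated as spurious.
 */
static enum fault_action handle_undefined_fault(struct thread_fault_state *st,
						uint64_t pc)
{
	if (!st->retried || st->last_fault_pc != pc) {
		st->retried = true;
		st->last_fault_pc = pc;
		return FAULT_RETRY;
	}

	/* Same instruction faulted again right after a retry. */
	st->retried = false;
	return FAULT_KILL;
}
```

With two independent state structures, a fault in one thread does not
affect the decision taken for the other.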

> Note that all CPUs and threads share the do_bad_ignore_first variable,
> so this is going to behave non-deterministically and kill threads in
> some cases.
> 
> This code is also preemptible, so checking the MIDR here doesn't make
> much sense. Either this is always uniform (and we can check once in the
> errata framework), or it's variable (e.g. on a big.LITTLE system) and we
> need to avoid preemption up until this point.
> 
> Rather than dynamically checking the MIDR, this should use the errata
> framework, and if any A64FX CPU is discovered, set an erratum cap like
> ARM64_WORKAROUND_CONFIG_FUJITSU_ERRATUM_010001, so we can do something
> like:
I will try to provide a new patch reflecting your comments today.
Unfortunately, this bug may occur before
init_cpu_hwcaps_indirect_list() is called.
This means the errata cap may not be available yet. I am trying to
figure out the best way to resolve this problem.
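
For reference, a direct MIDR comparison (the kind of check that would
still work before the errata caps are set up) could look roughly like the
userspace sketch below. It takes the raw MIDR_EL1 value as a parameter;
the helper name midr_is_a64fx and the macro names are hypothetical. The
field layout (implementer in bits [31:24], part number in bits [15:4])
and the Fujitsu/A64FX values 0x46 and 0x001 are what I understand them to
be, and should be checked against cputype.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extraction: implementer [31:24], part number [15:4]. */
#define SKETCH_MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define SKETCH_MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/* Values assumed for this sketch. */
#define SKETCH_IMP_FUJITSU	0x46u
#define SKETCH_PART_A64FX	0x001u

/* Returns true if the raw MIDR value identifies a Fujitsu A64FX core. */
static bool midr_is_a64fx(uint32_t midr)
{
	return SKETCH_MIDR_IMPLEMENTER(midr) == SKETCH_IMP_FUJITSU &&
	       SKETCH_MIDR_PARTNUM(midr) == SKETCH_PART_A64FX;
}
```

This ignores the variant and revision fields, so it matches every A64FX
revision.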

---
Best regards,
Lei Zhang
zhang.lei@jp.fujitsu.com

