linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64/io: Don't use WZR in writel
@ 2019-02-09 18:34 AngeloGioacchino Del Regno
  2019-02-11 10:57 ` Will Deacon
  0 siblings, 1 reply; 20+ messages in thread
From: AngeloGioacchino Del Regno @ 2019-02-09 18:34 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno, Jens Axboe, Will Deacon, Catalin Marinas
  Cc: AngeloGioacchino Del Regno, linux-arm-kernel, linux-kernel,
	linux-arm-msm

From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
From: "Angelo G. Del Regno" <kholk11@gmail.com>
Date: Sat, 9 Feb 2019 18:56:46 +0100
Subject: [PATCH] arm64/io: Don't use WZR in writel

This is a partial revert of commit ee5e41b5f21a
("arm64/io: Allow I/O writes to use {W,X}ZR")

When we try to use the zero register directly on some SoCs,
their security will make them freeze due to a firmware bug.
This behavior is seen with the arm-smmu driver freezing on
TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Allocating a temporary register to store the zero for the
write actually solves the issue on these SoCs.

Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
---
 arch/arm64/include/asm/io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index ee723835c1f4..a0a6d1aeb670 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -49,7 +49,7 @@ static inline void __raw_writew(u16 val, volatile void __iomem *addr)
 #define __raw_writel __raw_writel
 static inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-	asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
 }
 
 #define __raw_writeq __raw_writeq
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-09 18:34 [PATCH] arm64/io: Don't use WZR in writel AngeloGioacchino Del Regno
@ 2019-02-11 10:57 ` Will Deacon
  2019-02-11 11:52   ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Will Deacon @ 2019-02-11 10:57 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno
  Cc: Jens Axboe, Catalin Marinas, linux-arm-kernel, linux-kernel,
	linux-arm-msm

On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
> From: "Angelo G. Del Regno" <kholk11@gmail.com>
> Date: Sat, 9 Feb 2019 18:56:46 +0100
> Subject: [PATCH] arm64/io: Don't use WZR in writel
> 
> This is a partial revert of commit ee5e41b5f21a
> ("arm64/io: Allow I/O writes to use {W,X}ZR")
> 
> When we try to use the zero register directly on some SoCs,
> their security will make them freeze due to a firmware bug.
> This behavior is seen with the arm-smmu driver freezing on
> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Hmm, this sounds very fragile. I hope they're not trapping and emulating
MMIO accesses and treating the zero register as the stack pointer...

Wouldn't this also be triggerable from userspace by mmap()ing either
/dev/mem or e.g. a PCI bar via sysfs?

> Allocating a temporary register to store the zero for the
> write actually solves the issue on these SoCs.

I don't think this catches all MMIO accesses, so I think we need to
understand more about the actual issue here. For example, is it only the
SMMU that causes this problem? Also, any workaround should be specific to
the broken SoCs.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 10:57 ` Will Deacon
@ 2019-02-11 11:52   ` Marc Zyngier
  2019-02-11 14:29     ` AngeloGioacchino Del Regno
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2019-02-11 11:52 UTC (permalink / raw)
  To: Will Deacon, AngeloGioacchino Del Regno
  Cc: Jens Axboe, Catalin Marinas, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On 11/02/2019 10:57, Will Deacon wrote:
> On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
>> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
>> From: "Angelo G. Del Regno" <kholk11@gmail.com>
>> Date: Sat, 9 Feb 2019 18:56:46 +0100
>> Subject: [PATCH] arm64/io: Don't use WZR in writel
>>
>> This is a partial revert of commit ee5e41b5f21a
>> ("arm64/io: Allow I/O writes to use {W,X}ZR")
>>
>> When we try to use the zero register directly on some SoCs,
>> their security will make them freeze due to a firmware bug.>> This behavior is seen with the arm-smmu driver freezing on
>> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

This looks similar to the issue these SoCs have with GICv3, worked
around in 9c8114c20d18.

> Hmm, this sounds very fragile. I hope they're not trapping and emulating
> MMIO accesses and treating the zero register as the stack pointer...

I bet this is the case. The same bug was there in both KVM and Xen. The
only difference is that we fixed it back in December 2015 (at least for
KVM), while some of these SoCs were announced in 2017, and are still
shipping. Great stuff.

> 
> Wouldn't this also be triggerable from userspace by mmap()ing either
> /dev/mem or e.g. a PCI bar via sysfs?
> 
>> Allocating a temporary register to store the zero for the
>> write actually solves the issue on these SoCs.
> 
> I don't think this catches all MMIO accesses, so I think we need to
> understand more about the actual issue here. For example, is it only the
> SMMU that causes this problem? Also, any workaround should be specific to
> the broken SoCs.

Also, nothing would prevent a compiler from generating these accesses.

	M.

Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 11:52   ` Marc Zyngier
@ 2019-02-11 14:29     ` AngeloGioacchino Del Regno
  2019-02-11 14:59       ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: AngeloGioacchino Del Regno @ 2019-02-11 14:29 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Jens Axboe, Catalin Marinas, linux-kernel, linux-arm-kernel,
	linux-arm-msm

Il giorno lun, 11/02/2019 alle 11.52 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 10:57, Will Deacon wrote:
> > On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del
> > Regno wrote:
> > > From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00
> > > 2001
> > > From: "Angelo G. Del Regno" <kholk11@gmail.com>
> > > Date: Sat, 9 Feb 2019 18:56:46 +0100
> > > Subject: [PATCH] arm64/io: Don't use WZR in writel
> > > 
> > > This is a partial revert of commit ee5e41b5f21a
> > > ("arm64/io: Allow I/O writes to use {W,X}ZR")
> > > 
> > > When we try to use the zero register directly on some SoCs,
> > > their security will make them freeze due to a firmware bug.>>
> > > This behavior is seen with the arm-smmu driver freezing on
> > > TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.
> 
> This looks similar to the issue these SoCs have with GICv3, worked
> around in 9c8114c20d18.
> 

Well, yes that's a firmware quirk, of course, due to the "security"
stuff that they have inside...

> > Hmm, this sounds very fragile. I hope they're not trapping and
> > emulating
> > MMIO accesses and treating the zero register as the stack
> > pointer...
> 
> I bet this is the case. The same bug was there in both KVM and Xen.
> The
> only difference is that we fixed it back in December 2015 (at least
> for
> KVM), while some of these SoCs were announced in 2017, and are still
> shipping. Great stuff.

Totally agree, they must be using it as stack pointer.
Poor decision.

> 
> > Wouldn't this also be triggerable from userspace by mmap()ing
> > either
> > /dev/mem or e.g. a PCI bar via sysfs?
> > 
> > > Allocating a temporary register to store the zero for the
> > > write actually solves the issue on these SoCs.
> > 
> > I don't think this catches all MMIO accesses, so I think we need to
> > understand more about the actual issue here. For example, is it
> > only the
> > SMMU that causes this problem? Also, any workaround should be
> > specific to
> > the broken SoCs.
> 
> Also, nothing would prevent a compiler from generating these
> accesses.
> 
> 	M.
> 
> Jazz is not dead. It just smells funny...

While I agree that nothing would prevent a compiler from generating
these accesses, please take in mind that everything worked on
downstream kernels before this change was introduced (which is first
seen downstream on msm-4.9).
So I've discovered it on msm-4.9 while porting the 8996-98, 630-660
to that and I've had a whole lot of head scratching: the arm-smmu
code was apparently right, then I've seen that surprise......
By the way, I can tell you for sure that this bug is not present on
at least SDM845, since that one worked fine even before this fix,
and I imagine that also SDM670 and newest may not be affected.
Also Family-B SoCs are not affected by this bug (MSM8916-36-37-56-76).

Unfortunately, I couldn't think of any other solution on these
Family-A SoCs, also because I'm not totally sure that the only
driver that produces this issue is arm-smmu. When I've fixed it
on the downstream kernel, I've also had some other random freezes
that weren't related to the SMMU: usually qseecom stuff was also
acting funny sometimes.

Also, just one more thing: yes this thing is going ARM64-wide and
- from my findings - it's targeting certain Qualcomm SoCs, but...
I'm not sure that only QC is affected by that, others may as well
have the same stupid bug.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 14:29     ` AngeloGioacchino Del Regno
@ 2019-02-11 14:59       ` Marc Zyngier
  2019-02-11 16:15         ` AngeloGioacchino Del Regno
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Marc Zyngier @ 2019-02-11 14:59 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno, Will Deacon
  Cc: Jens Axboe, Catalin Marinas, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:

[...]

> Also, just one more thing: yes this thing is going ARM64-wide and
> - from my findings - it's targeting certain Qualcomm SoCs, but...
> I'm not sure that only QC is affected by that, others may as well
> have the same stupid bug.
> 

At the moment, only QC SoCs seem to be affected, probably because
everyone else has debugged their hypervisor (or most likely doesn't
bother with shipping one).

In all honesty, we need some information from QC here: which SoCs are
affected, what is the exact nature of the bug, can it be triggered from
EL0. Randomly papering over symptoms is not something I really like
doing, and is likely to generate problems on unaffected systems.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 14:59       ` Marc Zyngier
@ 2019-02-11 16:15         ` AngeloGioacchino Del Regno
  2019-02-11 16:37         ` Robin Murphy
  2019-02-23 18:12         ` Bjorn Andersson
  2 siblings, 0 replies; 20+ messages in thread
From: AngeloGioacchino Del Regno @ 2019-02-11 16:15 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Jens Axboe, Catalin Marinas, linux-kernel, linux-arm-kernel,
	linux-arm-msm

Il giorno lun, 11/02/2019 alle 14.59 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 

Between all the ARM SoCs, as far as I know, the only (?) ones using
actual "smartphones", so actually provisioned SoCs, for upstream
development are using Qualcomm SoCs.. of which, some development
boards are not entirely security enabled, or have got newer firmwares
which can't be used on production phones etc, so.. that's why I said
that I'm not sure that only QC is affected.
It's just relative to what we currently know, but looking at, for
example, MediaTek, I'm not sure that the only bugged hypervisor is on
QC (because MTK is very similar on certain aspects).
I mean, it's highly possible that some other is affected and we don't
know (and we possibly don't care...).

> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered
> from

It'd be wonderful if Qualcomm gives us some information about that.
Would really be helpful and nice from them.

> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 
> Thanks,
> 
> 	M.

I also don't like "randomly papering over symptoms", I totally agree
with you on that... but this change potentially generating problems on
unaffected systems is something I don't really agree on: this is a
partial revert of a commit that was done purely to introduce some
vmlinux (relatively small) size saving on ARM64 and no other reason
(as far as I can read on the original commit), so I really don't think
that my partial revert could ever harm anything.
Though, this is a personal opinion, I can be right, but I can obviously
be totally wrong on that.

Though I have to make this clear: if there's another (better/cleaner)
solution to this issue, I'd be totally happy (and, of course, curious
too) to see it!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 14:59       ` Marc Zyngier
  2019-02-11 16:15         ` AngeloGioacchino Del Regno
@ 2019-02-11 16:37         ` Robin Murphy
  2019-02-23 18:12         ` Bjorn Andersson
  2 siblings, 0 replies; 20+ messages in thread
From: Robin Murphy @ 2019-02-11 16:37 UTC (permalink / raw)
  To: Marc Zyngier, AngeloGioacchino Del Regno, Will Deacon
  Cc: Jens Axboe, Catalin Marinas, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On 11/02/2019 14:59, Marc Zyngier wrote:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
>> Also, just one more thing: yes this thing is going ARM64-wide and
>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>> I'm not sure that only QC is affected by that, others may as well
>> have the same stupid bug.
>>
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.

And even if we *were* to just try papering over the observed extent of 
the issue, I'd still be inclined to confine it to arm-smmu.c where the 
impact is finite and minimal - of the 4 instances of writel(0) there, 3 
of them don't care what the data is (so could just reuse the base 
register or similar) and the other one already has a zero in a GPR to 
hand by construction.

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-11 14:59       ` Marc Zyngier
  2019-02-11 16:15         ` AngeloGioacchino Del Regno
  2019-02-11 16:37         ` Robin Murphy
@ 2019-02-23 18:12         ` Bjorn Andersson
  2019-02-23 18:37           ` Marc Zyngier
  2 siblings, 1 reply; 20+ messages in thread
From: Bjorn Andersson @ 2019-02-23 18:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: AngeloGioacchino Del Regno, Will Deacon, Jens Axboe,
	Catalin Marinas, linux-kernel, linux-arm-kernel, linux-arm-msm

On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:

> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 

The bug at hand is that the XZR is not deemed a valid source in the
virtualization of the SMMU registers. It was identified and fixed for
all platforms that are shipping kernels based on v4.9 or later.

As such Angelo's list of affected platforms covers the high-profile
ones. In particular MSM8996 and MSM8998 is getting pretty good support
upstream, if we can figure out a way around this issue.

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-23 18:12         ` Bjorn Andersson
@ 2019-02-23 18:37           ` Marc Zyngier
  2019-02-24  3:53             ` Bjorn Andersson
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2019-02-23 18:37 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: AngeloGioacchino Del Regno, Will Deacon, Jens Axboe,
	Catalin Marinas, linux-kernel, linux-arm-kernel, linux-arm-msm,
	Robin Murphy

On Sat, 23 Feb 2019 18:12:54 +0000,
Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> 
> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> 
> > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > 
> > [...]
> > 
> > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > I'm not sure that only QC is affected by that, others may as well
> > > have the same stupid bug.
> > > 
> > 
> > At the moment, only QC SoCs seem to be affected, probably because
> > everyone else has debugged their hypervisor (or most likely doesn't
> > bother with shipping one).
> > 
> > In all honesty, we need some information from QC here: which SoCs are
> > affected, what is the exact nature of the bug, can it be triggered from
> > EL0. Randomly papering over symptoms is not something I really like
> > doing, and is likely to generate problems on unaffected systems.
> > 
> 
> The bug at hand is that the XZR is not deemed a valid source in the
> virtualization of the SMMU registers. It was identified and fixed for
> all platforms that are shipping kernels based on v4.9 or later.

When you say "fixed": Do you mean fixed in the firmware? Or by adding
a workaround in the shipped kernel? If the former, is this part of an
official QC statement, with an associated erratum number? Is this
really limited to the SMMU accesses?

> As such Angelo's list of affected platforms covers the high-profile
> ones. In particular MSM8996 and MSM8998 is getting pretty good support
> upstream, if we can figure out a way around this issue.

We'd need an exhaustive list of the affected SoCs, and work out if we
can limit the hack to the SMMU driver (cc'ing Robin, who's the one
who'd know about it).

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-23 18:37           ` Marc Zyngier
@ 2019-02-24  3:53             ` Bjorn Andersson
  2019-03-12 12:36               ` Marc Gonzalez
  0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Andersson @ 2019-02-24  3:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: AngeloGioacchino Del Regno, Will Deacon, Jens Axboe,
	Catalin Marinas, linux-kernel, linux-arm-kernel, linux-arm-msm,
	Robin Murphy

On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:

> On Sat, 23 Feb 2019 18:12:54 +0000,
> Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> > 
> > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > 
> > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > 
> > > [...]
> > > 
> > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > I'm not sure that only QC is affected by that, others may as well
> > > > have the same stupid bug.
> > > > 
> > > 
> > > At the moment, only QC SoCs seem to be affected, probably because
> > > everyone else has debugged their hypervisor (or most likely doesn't
> > > bother with shipping one).
> > > 
> > > In all honesty, we need some information from QC here: which SoCs are
> > > affected, what is the exact nature of the bug, can it be triggered from
> > > EL0. Randomly papering over symptoms is not something I really like
> > > doing, and is likely to generate problems on unaffected systems.
> > > 
> > 
> > The bug at hand is that the XZR is not deemed a valid source in the
> > virtualization of the SMMU registers. It was identified and fixed for
> > all platforms that are shipping kernels based on v4.9 or later.
> 
> When you say "fixed": Do you mean fixed in the firmware? Or by adding
> a workaround in the shipped kernel?

I mean that it's fixed in the firmware.

> If the former, is this part of an official QC statement, with an
> associated erratum number?

I don't know, will get back to you on this one.

> Is this really limited to the SMMU accesses?
> 

Yes.

> > As such Angelo's list of affected platforms covers the high-profile
> > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > upstream, if we can figure out a way around this issue.
> 
> We'd need an exhaustive list of the affected SoCs, and work out if we
> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> who'd know about it).
> 

I will try to compose a list.

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-02-24  3:53             ` Bjorn Andersson
@ 2019-03-12 12:36               ` Marc Gonzalez
  2019-03-18 16:04                 ` Robin Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Gonzalez @ 2019-03-12 12:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Robin Murphy
  Cc: Bjorn Andersson, AngeloGioacchino Del Regno, Jens Axboe,
	Catalin Marinas, LKML, Linux ARM, MSM, Jeffrey Hugo

On 24/02/2019 04:53, Bjorn Andersson wrote:

> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> 
>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>
>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>
>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>
>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>> have the same stupid bug.
>>>>
>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>> bother with shipping one).
>>>>
>>>> In all honesty, we need some information from QC here: which SoCs are
>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>> EL0. Randomly papering over symptoms is not something I really like
>>>> doing, and is likely to generate problems on unaffected systems.
>>>
>>> The bug at hand is that the XZR is not deemed a valid source in the
>>> virtualization of the SMMU registers. It was identified and fixed for
>>> all platforms that are shipping kernels based on v4.9 or later.
>>
>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>> a workaround in the shipped kernel?
> 
> I mean that it's fixed in the firmware.
> 
>> If the former, is this part of an official QC statement, with an
>> associated erratum number?
> 
> I don't know, will get back to you on this one.
> 
>> Is this really limited to the SMMU accesses?
> 
> Yes.
> 
>>> As such Angelo's list of affected platforms covers the high-profile
>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>> upstream, if we can figure out a way around this issue.
>> 
>> We'd need an exhaustive list of the affected SoCs, and work out if we
>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>> who'd know about it).
> 
> I will try to compose a list.

FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:

	/* Invalidate the TLB, just in case */
	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);


With the 'Z' constraint, gcc generates:

	str wzr, [x0]

without the 'Z' constraint, gcc generates:

	mov	w1, 0
	str w1, [x0]


I can work around the problem using the following patch:

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..93117519aed8 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -59,6 +59,11 @@
 
 #include "arm-smmu-regs.h"
 
+static inline void qcom_writel(u32 val, volatile void __iomem *addr)
+{
+	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
+}
+
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
 #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
@@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
 {
 	unsigned int spin_cnt, delay;
 
-	writel_relaxed(0, sync);
+	qcom_writel(0, sync);
 	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
 		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
 			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	}
 
 	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 



Can a quirk be used to work around the issue?
Or can we just "pessimize" the 3 writes for everybody?
(Might be cheaper than a test anyway)

Regards.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-12 12:36               ` Marc Gonzalez
@ 2019-03-18 16:04                 ` Robin Murphy
  2019-03-18 17:00                   ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2019-03-18 16:04 UTC (permalink / raw)
  To: Marc Gonzalez, Marc Zyngier, Will Deacon
  Cc: Bjorn Andersson, AngeloGioacchino Del Regno, Jens Axboe,
	Catalin Marinas, LKML, Linux ARM, MSM, Jeffrey Hugo

On 12/03/2019 12:36, Marc Gonzalez wrote:
> On 24/02/2019 04:53, Bjorn Andersson wrote:
> 
>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>
>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>
>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>
>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>
>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>> have the same stupid bug.
>>>>>
>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>> bother with shipping one).
>>>>>
>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>
>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>
>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>> a workaround in the shipped kernel?
>>
>> I mean that it's fixed in the firmware.
>>
>>> If the former, is this part of an official QC statement, with an
>>> associated erratum number?
>>
>> I don't know, will get back to you on this one.
>>
>>> Is this really limited to the SMMU accesses?
>>
>> Yes.
>>
>>>> As such Angelo's list of affected platforms covers the high-profile
>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>> upstream, if we can figure out a way around this issue.
>>>
>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>> who'd know about it).
>>
>> I will try to compose a list.
> 
> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> 
> 	/* Invalidate the TLB, just in case */
> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> 
> 
> With the 'Z' constraint, gcc generates:
> 
> 	str wzr, [x0]
> 
> without the 'Z' constraint, gcc generates:
> 
> 	mov	w1, 0
> 	str w1, [x0]
> 
> 
> I can work around the problem using the following patch:
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..93117519aed8 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -59,6 +59,11 @@
>   
>   #include "arm-smmu-regs.h"
>   
> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> +{
> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> +}
> +
>   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>   
>   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>   {
>   	unsigned int spin_cnt, delay;
>   
> -	writel_relaxed(0, sync);
> +	qcom_writel(0, sync);
>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>   	}
>   
>   	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>   
>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>   
> 
> 
> 
> Can a quirk be used to work around the issue?
> Or can we just "pessimize" the 3 writes for everybody?
> (Might be cheaper than a test anyway)

If it really is just the SMMU driver which is affected, we can work 
around it for free (not counting the 'cost' of slightly-weird-looking 
code, of course). If the diff below works as expected, I'll write it up 
properly.

Robin.
----->8-----
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..7ff29e33298f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
arm_smmu_device *smmu,
  {
  	unsigned int spin_cnt, delay;

-	writel_relaxed(0, sync);
+	writel_relaxed((unsigned long)sync, sync);
  	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
  		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
  			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)

  	/* Unassigned context banks only need disabling */
  	if (!cfg) {
-		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
+		/*
+		 * For Qualcomm reasons, we want to guarantee that we write a
+		 * zero from a register which is not WZR. Fortunately, the cfg
+		 * logic here plays right into our hands...
+		 */
+		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
  		return;
  	}

@@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
arm_smmu_device *smmu)
  	}

  	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

  	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 16:04                 ` Robin Murphy
@ 2019-03-18 17:00                   ` Russell King - ARM Linux admin
  2019-03-18 17:11                     ` Ard Biesheuvel
  2019-03-18 17:19                     ` Robin Murphy
  0 siblings, 2 replies; 20+ messages in thread
From: Russell King - ARM Linux admin @ 2019-03-18 17:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Marc Gonzalez, Marc Zyngier, Will Deacon, Jens Axboe,
	Jeffrey Hugo, Catalin Marinas, LKML, Bjorn Andersson, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
> On 12/03/2019 12:36, Marc Gonzalez wrote:
> > On 24/02/2019 04:53, Bjorn Andersson wrote:
> > 
> > > On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> > > 
> > > > On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
> > > > > 
> > > > > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > > > > 
> > > > > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > > > > 
> > > > > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > > > > I'm not sure that only QC is affected by that, others may as well
> > > > > > > have the same stupid bug.
> > > > > > 
> > > > > > At the moment, only QC SoCs seem to be affected, probably because
> > > > > > everyone else has debugged their hypervisor (or most likely doesn't
> > > > > > bother with shipping one).
> > > > > > 
> > > > > > In all honesty, we need some information from QC here: which SoCs are
> > > > > > affected, what is the exact nature of the bug, can it be triggered from
> > > > > > EL0. Randomly papering over symptoms is not something I really like
> > > > > > doing, and is likely to generate problems on unaffected systems.
> > > > > 
> > > > > The bug at hand is that the XZR is not deemed a valid source in the
> > > > > virtualization of the SMMU registers. It was identified and fixed for
> > > > > all platforms that are shipping kernels based on v4.9 or later.
> > > > 
> > > > When you say "fixed": Do you mean fixed in the firmware? Or by adding
> > > > a workaround in the shipped kernel?
> > > 
> > > I mean that it's fixed in the firmware.
> > > 
> > > > If the former, is this part of an official QC statement, with an
> > > > associated erratum number?
> > > 
> > > I don't know, will get back to you on this one.
> > > 
> > > > Is this really limited to the SMMU accesses?
> > > 
> > > Yes.
> > > 
> > > > > As such Angelo's list of affected platforms covers the high-profile
> > > > > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > > > > upstream, if we can figure out a way around this issue.
> > > > 
> > > > We'd need an exhaustive list of the affected SoCs, and work out if we
> > > > can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> > > > who'd know about it).
> > > 
> > > I will try to compose a list.
> > 
> > FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> > filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> > MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> > 
> > 	/* Invalidate the TLB, just in case */
> > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > 
> > 
> > With the 'Z' constraint, gcc generates:
> > 
> > 	str wzr, [x0]
> > 
> > without the 'Z' constraint, gcc generates:
> > 
> > 	mov	w1, 0
> > 	str w1, [x0]
> > 
> > 
> > I can work around the problem using the following patch:
> > 
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..93117519aed8 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -59,6 +59,11 @@
> >   #include "arm-smmu-regs.h"
> > +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> > +{
> > +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> > +}
> > +
> >   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
> >   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> > @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> >   {
> >   	unsigned int spin_cnt, delay;
> > -	writel_relaxed(0, sync);
> > +	qcom_writel(0, sync);
> >   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> >   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> >   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >   	}
> >   	/* Invalidate the TLB, just in case */
> > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> >   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> > 
> > 
> > 
> > Can a quirk be used to work around the issue?
> > Or can we just "pessimize" the 3 writes for everybody?
> > (Might be cheaper than a test anyway)
> 
> If it really is just the SMMU driver which is affected, we can work around
> it for free (not counting the 'cost' of slightly-weird-looking code, of
> course). If the diff below works as expected, I'll write it up properly.
> 
> Robin.
> ----->8-----
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..7ff29e33298f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
> *smmu,
>  {
>  	unsigned int spin_cnt, delay;
> 
> -	writel_relaxed(0, sync);
> +	writel_relaxed((unsigned long)sync, sync);
>  	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>  		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>  			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
> arm_smmu_device *smmu, int idx)
> 
>  	/* Unassigned context banks only need disabling */
>  	if (!cfg) {
> -		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
> +		/*
> +		 * For Qualcomm reasons, we want to guarantee that we write a
> +		 * zero from a register which is not WZR. Fortunately, the cfg
> +		 * logic here plays right into our hands...
> +		 */
> +		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
>  		return;
>  	}
> 
> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
> arm_smmu_device *smmu)
>  	}
> 
>  	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> 
>  	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> 

Given what we've seen from Clang for futex stuff in 32-bit ARM, are
you really sure that the above will not result in Clang still spotting
that the value is zero and using a wzr for all these cases?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:00                   ` Russell King - ARM Linux admin
@ 2019-03-18 17:11                     ` Ard Biesheuvel
  2019-03-18 17:19                     ` Robin Murphy
  1 sibling, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2019-03-18 17:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Robin Murphy, Jens Axboe, Marc Gonzalez, Marc Zyngier,
	Catalin Marinas, Will Deacon, LKML, Bjorn Andersson,
	Jeffrey Hugo, MSM, AngeloGioacchino Del Regno, Linux ARM

On Mon, 18 Mar 2019 at 18:01, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
> > On 12/03/2019 12:36, Marc Gonzalez wrote:
> > > On 24/02/2019 04:53, Bjorn Andersson wrote:
> > >
> > > > On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> > > >
> > > > > On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
> > > > > >
> > > > > > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > > > > >
> > > > > > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > > > > >
> > > > > > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > > > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > > > > > I'm not sure that only QC is affected by that, others may as well
> > > > > > > > have the same stupid bug.
> > > > > > >
> > > > > > > At the moment, only QC SoCs seem to be affected, probably because
> > > > > > > everyone else has debugged their hypervisor (or most likely doesn't
> > > > > > > bother with shipping one).
> > > > > > >
> > > > > > > In all honesty, we need some information from QC here: which SoCs are
> > > > > > > affected, what is the exact nature of the bug, can it be triggered from
> > > > > > > EL0. Randomly papering over symptoms is not something I really like
> > > > > > > doing, and is likely to generate problems on unaffected systems.
> > > > > >
> > > > > > The bug at hand is that the XZR is not deemed a valid source in the
> > > > > > virtualization of the SMMU registers. It was identified and fixed for
> > > > > > all platforms that are shipping kernels based on v4.9 or later.
> > > > >
> > > > > When you say "fixed": Do you mean fixed in the firmware? Or by adding
> > > > > a workaround in the shipped kernel?
> > > >
> > > > I mean that it's fixed in the firmware.
> > > >
> > > > > If the former, is this part of an official QC statement, with an
> > > > > associated erratum number?
> > > >
> > > > I don't know, will get back to you on this one.
> > > >
> > > > > Is this really limited to the SMMU accesses?
> > > >
> > > > Yes.
> > > >
> > > > > > As such Angelo's list of affected platforms covers the high-profile
> > > > > > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > > > > > upstream, if we can figure out a way around this issue.
> > > > >
> > > > > We'd need an exhaustive list of the affected SoCs, and work out if we
> > > > > can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> > > > > who'd know about it).
> > > >
> > > > I will try to compose a list.
> > >
> > > FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> > > filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> > > MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> > >
> > >     /* Invalidate the TLB, just in case */
> > >     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > >     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > >
> > >
> > > With the 'Z' constraint, gcc generates:
> > >
> > >     str wzr, [x0]
> > >
> > > without the 'Z' constraint, gcc generates:
> > >
> > >     mov     w1, 0
> > >     str w1, [x0]
> > >
> > >
> > > I can work around the problem using the following patch:
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 045d93884164..93117519aed8 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -59,6 +59,11 @@
> > >   #include "arm-smmu-regs.h"
> > > +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> > > +{
> > > +   asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> > > +}
> > > +
> > >   #define ARM_MMU500_ACTLR_CPRE             (1 << 1)
> > >   #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26)
> > > @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> > >   {
> > >     unsigned int spin_cnt, delay;
> > > -   writel_relaxed(0, sync);
> > > +   qcom_writel(0, sync);
> > >     for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> > >             for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> > >                     if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > >     }
> > >     /* Invalidate the TLB, just in case */
> > > -   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > -   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > > +   qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > +   qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > >     reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> > >
> > >
> > >
> > > Can a quirk be used to work around the issue?
> > > Or can we just "pessimize" the 3 writes for everybody?
> > > (Might be cheaper than a test anyway)
> >
> > If it really is just the SMMU driver which is affected, we can work around
> > it for free (not counting the 'cost' of slightly-weird-looking code, of
> > course). If the diff below works as expected, I'll write it up properly.
> >
> > Robin.
> > ----->8-----
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..7ff29e33298f 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
> > *smmu,
> >  {
> >       unsigned int spin_cnt, delay;
> >
> > -     writel_relaxed(0, sync);
> > +     writel_relaxed((unsigned long)sync, sync);
> >       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> >               for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> >                       if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
> > arm_smmu_device *smmu, int idx)
> >
> >       /* Unassigned context banks only need disabling */
> >       if (!cfg) {
> > -             writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
> > +             /*
> > +              * For Qualcomm reasons, we want to guarantee that we write a
> > +              * zero from a register which is not WZR. Fortunately, the cfg
> > +              * logic here plays right into our hands...
> > +              */
> > +             writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
> >               return;
> >       }
> >
> > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
> > arm_smmu_device *smmu)
> >       }
> >
> >       /* Invalidate the TLB, just in case */
> > -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> >
> >       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> >
>
> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
> you really sure that the above will not result in Clang still spotting
> that the value is zero and using a wzr for all these cases?
>

Yeah, it seems to me that even GCC would still be likely to treat cfg
as a constant zero when fulfilling the asm constraints if it occurs
inside a 'if (!cfg) {}' block.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:00                   ` Russell King - ARM Linux admin
  2019-03-18 17:11                     ` Ard Biesheuvel
@ 2019-03-18 17:19                     ` Robin Murphy
  2019-03-18 17:24                       ` Robin Murphy
  2019-03-18 17:30                       ` Marc Gonzalez
  1 sibling, 2 replies; 20+ messages in thread
From: Robin Murphy @ 2019-03-18 17:19 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Marc Gonzalez, Marc Zyngier, Will Deacon, Jens Axboe,
	Jeffrey Hugo, Catalin Marinas, LKML, Bjorn Andersson, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>
>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>
>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>
>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>
>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>
>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>> have the same stupid bug.
>>>>>>>
>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>> bother with shipping one).
>>>>>>>
>>>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>
>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>
>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>> a workaround in the shipped kernel?
>>>>
>>>> I mean that it's fixed in the firmware.
>>>>
>>>>> If the former, is this part of an official QC statement, with an
>>>>> associated erratum number?
>>>>
>>>> I don't know, will get back to you on this one.
>>>>
>>>>> Is this really limited to the SMMU accesses?
>>>>
>>>> Yes.
>>>>
>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>>>> upstream, if we can figure out a way around this issue.
>>>>>
>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>> who'd know about it).
>>>>
>>>> I will try to compose a list.
>>>
>>> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
>>> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>
>>> 	/* Invalidate the TLB, just in case */
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>
>>> With the 'Z' constraint, gcc generates:
>>>
>>> 	str wzr, [x0]
>>>
>>> without the 'Z' constraint, gcc generates:
>>>
>>> 	mov	w1, 0
>>> 	str w1, [x0]
>>>
>>>
>>> I can work around the problem using the following patch:
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..93117519aed8 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -59,6 +59,11 @@
>>>    #include "arm-smmu-regs.h"
>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>> +{
>>> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>> +}
>>> +
>>>    #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>>>    #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>>>    {
>>>    	unsigned int spin_cnt, delay;
>>> -	writel_relaxed(0, sync);
>>> +	qcom_writel(0, sync);
>>>    	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>    		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>    			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>    	}
>>>    	/* Invalidate the TLB, just in case */
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>    	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>
>>>
>>>
>>> Can a quirk be used to work around the issue?
>>> Or can we just "pessimize" the 3 writes for everybody?
>>> (Might be cheaper than a test anyway)
>>
>> If it really is just the SMMU driver which is affected, we can work around
>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>> course). If the diff below works as expected, I'll write it up properly.
>>
>> Robin.
>> ----->8-----
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 045d93884164..7ff29e33298f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
>> *smmu,
>>   {
>>   	unsigned int spin_cnt, delay;
>>
>> -	writel_relaxed(0, sync);
>> +	writel_relaxed((unsigned long)sync, sync);
>>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>> arm_smmu_device *smmu, int idx)
>>
>>   	/* Unassigned context banks only need disabling */
>>   	if (!cfg) {
>> -		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>> +		/*
>> +		 * For Qualcomm reasons, we want to guarantee that we write a
>> +		 * zero from a register which is not WZR. Fortunately, the cfg
>> +		 * logic here plays right into our hands...
>> +		 */
>> +		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
>>   		return;
>>   	}
>>
>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>> arm_smmu_device *smmu)
>>   	}
>>
>>   	/* Invalidate the TLB, just in case */
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>
>>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>
> 
> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
> you really sure that the above will not result in Clang still spotting
> that the value is zero and using a wzr for all these cases?

The trick is that in the write-only TLBI cases the variable we're 
passing in really is nonzero, so that can't possibly happen. For the 
context bank reset, yes, I am assuming that no complier will ever be 
perverse enough to detect that cfg is not written after the NULL check 
and immediately reallocate it to XZR for no good reason. I'd like to 
think that assumption is going to hold for the reasonable scope of this 
particular workaround, though.

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:19                     ` Robin Murphy
@ 2019-03-18 17:24                       ` Robin Murphy
  2019-03-19 11:45                         ` Robin Murphy
  2019-03-18 17:30                       ` Marc Gonzalez
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2019-03-18 17:24 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Marc Gonzalez, Marc Zyngier, Will Deacon, Jens Axboe,
	Jeffrey Hugo, Catalin Marinas, LKML, Bjorn Andersson, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On 18/03/2019 17:19, Robin Murphy wrote:
> On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
>> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>>
>>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>>
>>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>>
>>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>>
>>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>>
>>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>>> have the same stupid bug.
>>>>>>>>
>>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>>> bother with shipping one).
>>>>>>>>
>>>>>>>> In all honesty, we need some information from QC here: which 
>>>>>>>> SoCs are
>>>>>>>> affected, what is the exact nature of the bug, can it be 
>>>>>>>> triggered from
>>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>>
>>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>>> virtualization of the SMMU registers. It was identified and fixed 
>>>>>>> for
>>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>>
>>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>>> a workaround in the shipped kernel?
>>>>>
>>>>> I mean that it's fixed in the firmware.
>>>>>
>>>>>> If the former, is this part of an official QC statement, with an
>>>>>> associated erratum number?
>>>>>
>>>>> I don't know, will get back to you on this one.
>>>>>
>>>>>> Is this really limited to the SMMU accesses?
>>>>>
>>>>> Yes.
>>>>>
>>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good 
>>>>>>> support
>>>>>>> upstream, if we can figure out a way around this issue.
>>>>>>
>>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>>> who'd know about it).
>>>>>
>>>>> I will try to compose a list.
>>>>
>>>> FWIW, I have just been bitten by this issue. I needed to enable an 
>>>> SMMU to
>>>> filter PCIe EP accesses to system RAM (or something). I'm using an 
>>>> APQ8098
>>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>>
>>>>     /* Invalidate the TLB, just in case */
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>
>>>>
>>>> With the 'Z' constraint, gcc generates:
>>>>
>>>>     str wzr, [x0]
>>>>
>>>> without the 'Z' constraint, gcc generates:
>>>>
>>>>     mov    w1, 0
>>>>     str w1, [x0]
>>>>
>>>>
>>>> I can work around the problem using the following patch:
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 045d93884164..93117519aed8 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -59,6 +59,11 @@
>>>>    #include "arm-smmu-regs.h"
>>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>>> +{
>>>> +    asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>>> +}
>>>> +
>>>>    #define ARM_MMU500_ACTLR_CPRE        (1 << 1)
>>>>    #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
>>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>> arm_smmu_device *smmu,
>>>>    {
>>>>        unsigned int spin_cnt, delay;
>>>> -    writel_relaxed(0, sync);
>>>> +    qcom_writel(0, sync);
>>>>        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>            for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>                if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
>>>> arm_smmu_device *smmu)
>>>>        }
>>>>        /* Invalidate the TLB, just in case */
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>
>>>>
>>>>
>>>> Can a quirk be used to work around the issue?
>>>> Or can we just "pessimize" the 3 writes for everybody?
>>>> (Might be cheaper than a test anyway)
>>>
>>> If it really is just the SMMU driver which is affected, we can work 
>>> around
>>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>>> course). If the diff below works as expected, I'll write it up properly.
>>>
>>> Robin.
>>> ----->8-----
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..7ff29e33298f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
>>> arm_smmu_device
>>> *smmu,
>>>   {
>>>       unsigned int spin_cnt, delay;
>>>
>>> -    writel_relaxed(0, sync);
>>> +    writel_relaxed((unsigned long)sync, sync);
>>>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>           for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>               if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>>> arm_smmu_device *smmu, int idx)
>>>
>>>       /* Unassigned context banks only need disabling */
>>>       if (!cfg) {
>>> -        writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>>> +        /*
>>> +         * For Qualcomm reasons, we want to guarantee that we write a
>>> +         * zero from a register which is not WZR. Fortunately, the cfg
>>> +         * logic here plays right into our hands...
>>> +         */
>>> +        writel_relaxed((unsigned long)cfg, cb_base + 
>>> ARM_SMMU_CB_SCTLR);
>>>           return;
>>>       }
>>>
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>>> arm_smmu_device *smmu)
>>>       }
>>>
>>>       /* Invalidate the TLB, just in case */
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>
>>
>> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
>> you really sure that the above will not result in Clang still spotting
>> that the value is zero and using a wzr for all these cases?
> 
> The trick is that in the write-only TLBI cases the variable we're 
> passing in really is nonzero, so that can't possibly happen. For the 
> context bank reset, yes, I am assuming that no complier will ever be 
> perverse enough to detect that cfg is not written after the NULL check 
> and immediately reallocate it to XZR for no good reason. I'd like to 
> think that assumption is going to hold for the reasonable scope of this 
> particular workaround, though.

Well, crap. So much for that hubris...


00000000000000f0 <arm_smmu_write_context_bank>:
       f0:       52800504        mov     w4, #0x28 
// #40
       f4:       f940240a        ldr     x10, [x0,#72]
       f8:       a9411402        ldp     x2, x5, [x0,#16]
       fc:       9b247c24        smull   x4, w1, w4
      100:       8b040148        add     x8, x10, x4
      104:       1ac52023        lsl     w3, w1, w5
      108:       8b23c042        add     x2, x2, w3, sxtw
      10c:       f9401107        ldr     x7, [x8,#32]
      110:       b5000067        cbnz    x7, 11c 
<arm_smmu_write_context_bank+0x2c>
      114:       b900005f        str     wzr, [x2]


Time to come up with a better SCTLR reset value, I guess.

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:19                     ` Robin Murphy
  2019-03-18 17:24                       ` Robin Murphy
@ 2019-03-18 17:30                       ` Marc Gonzalez
  2019-03-18 17:59                         ` Robin Murphy
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Gonzalez @ 2019-03-18 17:30 UTC (permalink / raw)
  To: Robin Murphy, Russell King - ARM Linux admin
  Cc: Marc Zyngier, Will Deacon, Jens Axboe, Jeffrey Hugo,
	Catalin Marinas, LKML, Bjorn Andersson, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On 18/03/2019 18:19, Robin Murphy wrote:

> For the context bank reset, yes, I am assuming that no complier will
> ever be perverse enough to detect that cfg is not written after the
> NULL check and immediately reallocate it to XZR for no good reason.
> I'd like to think that assumption is going to hold for the reasonable
> scope of this particular workaround, though.

I'm not sure I understand the above paragraph.

In code such as:

	if (val == 0) foo(val);

gcc's algorithm is likely to figure out that the code is equivalent to

	if (val == 0) foo(0)

and perform constant-propagation, etc.

Is that what we're talking about?

Regards.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:30                       ` Marc Gonzalez
@ 2019-03-18 17:59                         ` Robin Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Robin Murphy @ 2019-03-18 17:59 UTC (permalink / raw)
  To: Marc Gonzalez, Russell King - ARM Linux admin
  Cc: Marc Zyngier, Will Deacon, Jens Axboe, Jeffrey Hugo,
	Catalin Marinas, LKML, Bjorn Andersson, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On 18/03/2019 17:30, Marc Gonzalez wrote:
> On 18/03/2019 18:19, Robin Murphy wrote:
> 
>> For the context bank reset, yes, I am assuming that no complier will
>> ever be perverse enough to detect that cfg is not written after the
>> NULL check and immediately reallocate it to XZR for no good reason.
>> I'd like to think that assumption is going to hold for the reasonable
>> scope of this particular workaround, though.
> 
> I'm not sure I understand the above paragraph.
> 
> In code such as:
> 
> 	if (val == 0) foo(val);
> 
> gcc's algorithm is likely to figure out that the code is equivalent to
> 
> 	if (val == 0) foo(0)
> 
> and perform constant-propagation, etc.
> 
> Is that what we're talking about?

As may already be apparent, the way I expect compilers to behave, and 
the way compilers actually behave, are two overlapping but distinct sets 
of concepts...

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/io: Don't use WZR in writel
  2019-03-18 17:24                       ` Robin Murphy
@ 2019-03-19 11:45                         ` Robin Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Robin Murphy @ 2019-03-19 11:45 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Jens Axboe, Marc Gonzalez, Marc Zyngier, Catalin Marinas,
	Will Deacon, LKML, Bjorn Andersson, Jeffrey Hugo, MSM,
	AngeloGioacchino Del Regno, Linux ARM

On 18/03/2019 17:24, Robin Murphy wrote:
> On 18/03/2019 17:19, Robin Murphy wrote:
>> On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
>>> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>>>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>>>
>>>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>>>
>>>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>>>
>>>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>>>
>>>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>>>
>>>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>>>> have the same stupid bug.
>>>>>>>>>
>>>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>>>> everyone else has debugged their hypervisor (or most likely 
>>>>>>>>> doesn't
>>>>>>>>> bother with shipping one).
>>>>>>>>>
>>>>>>>>> In all honesty, we need some information from QC here: which 
>>>>>>>>> SoCs are
>>>>>>>>> affected, what is the exact nature of the bug, can it be 
>>>>>>>>> triggered from
>>>>>>>>> EL0. Randomly papering over symptoms is not something I really 
>>>>>>>>> like
>>>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>>>
>>>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>>>> virtualization of the SMMU registers. It was identified and 
>>>>>>>> fixed for
>>>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>>>
>>>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by 
>>>>>>> adding
>>>>>>> a workaround in the shipped kernel?
>>>>>>
>>>>>> I mean that it's fixed in the firmware.
>>>>>>
>>>>>>> If the former, is this part of an official QC statement, with an
>>>>>>> associated erratum number?
>>>>>>
>>>>>> I don't know, will get back to you on this one.
>>>>>>
>>>>>>> Is this really limited to the SMMU accesses?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good 
>>>>>>>> support
>>>>>>>> upstream, if we can figure out a way around this issue.
>>>>>>>
>>>>>>> We'd need an exhaustive list of the affected SoCs, and work out 
>>>>>>> if we
>>>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>>>> who'd know about it).
>>>>>>
>>>>>> I will try to compose a list.
>>>>>
>>>>> FWIW, I have just been bitten by this issue. I needed to enable an 
>>>>> SMMU to
>>>>> filter PCIe EP accesses to system RAM (or something). I'm using an 
>>>>> APQ8098
>>>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>>>
>>>>>     /* Invalidate the TLB, just in case */
>>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>>
>>>>>
>>>>> With the 'Z' constraint, gcc generates:
>>>>>
>>>>>     str wzr, [x0]
>>>>>
>>>>> without the 'Z' constraint, gcc generates:
>>>>>
>>>>>     mov    w1, 0
>>>>>     str w1, [x0]
>>>>>
>>>>>
>>>>> I can work around the problem using the following patch:
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>> index 045d93884164..93117519aed8 100644
>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>> @@ -59,6 +59,11 @@
>>>>>    #include "arm-smmu-regs.h"
>>>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>>>> +{
>>>>> +    asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>>>> +}
>>>>> +
>>>>>    #define ARM_MMU500_ACTLR_CPRE        (1 << 1)
>>>>>    #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
>>>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>>> arm_smmu_device *smmu,
>>>>>    {
>>>>>        unsigned int spin_cnt, delay;
>>>>> -    writel_relaxed(0, sync);
>>>>> +    qcom_writel(0, sync);
>>>>>        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>>            for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>>                if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
>>>>> arm_smmu_device *smmu)
>>>>>        }
>>>>>        /* Invalidate the TLB, just in case */
>>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>>        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>>
>>>>>
>>>>>
>>>>> Can a quirk be used to work around the issue?
>>>>> Or can we just "pessimize" the 3 writes for everybody?
>>>>> (Might be cheaper than a test anyway)
>>>>
>>>> If it really is just the SMMU driver which is affected, we can work 
>>>> around
>>>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>>>> course). If the diff below works as expected, I'll write it up 
>>>> properly.
>>>>
>>>> Robin.
>>>> ----->8-----
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 045d93884164..7ff29e33298f 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>> arm_smmu_device
>>>> *smmu,
>>>>   {
>>>>       unsigned int spin_cnt, delay;
>>>>
>>>> -    writel_relaxed(0, sync);
>>>> +    writel_relaxed((unsigned long)sync, sync);
>>>>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>           for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>               if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>>>> arm_smmu_device *smmu, int idx)
>>>>
>>>>       /* Unassigned context banks only need disabling */
>>>>       if (!cfg) {
>>>> -        writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>>>> +        /*
>>>> +         * For Qualcomm reasons, we want to guarantee that we write a
>>>> +         * zero from a register which is not WZR. Fortunately, the cfg
>>>> +         * logic here plays right into our hands...
>>>> +         */
>>>> +        writel_relaxed((unsigned long)cfg, cb_base + 
>>>> ARM_SMMU_CB_SCTLR);
>>>>           return;
>>>>       }
>>>>
>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>>>> arm_smmu_device *smmu)
>>>>       }
>>>>
>>>>       /* Invalidate the TLB, just in case */
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>
>>>>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>
>>>
>>> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
>>> you really sure that the above will not result in Clang still spotting
>>> that the value is zero and using a wzr for all these cases?
>>
>> The trick is that in the write-only TLBI cases the variable we're 
>> passing in really is nonzero, so that can't possibly happen. For the 
>> context bank reset, yes, I am assuming that no complier will ever be 
>> perverse enough to detect that cfg is not written after the NULL check 
>> and immediately reallocate it to XZR for no good reason. I'd like to 
>> think that assumption is going to hold for the reasonable scope of 
>> this particular workaround, though.
> 
> Well, crap. So much for that hubris...
> 
> 
> 00000000000000f0 <arm_smmu_write_context_bank>:
>        f0:       52800504        mov     w4, #0x28 // #40
>        f4:       f940240a        ldr     x10, [x0,#72]
>        f8:       a9411402        ldp     x2, x5, [x0,#16]
>        fc:       9b247c24        smull   x4, w1, w4
>       100:       8b040148        add     x8, x10, x4
>       104:       1ac52023        lsl     w3, w1, w5
>       108:       8b23c042        add     x2, x2, w3, sxtw
>       10c:       f9401107        ldr     x7, [x8,#32]
>       110:       b5000067        cbnz    x7, 11c 
> <arm_smmu_write_context_bank+0x2c>
>       114:       b900005f        str     wzr, [x2]
> 
> 
> Time to come up with a better SCTLR reset value, I guess.

Hmm, or perhaps not... The hangs are all reported in the TLB maintenance 
calls, and by the time we get to the first of those we've already 
written the context banks with their initial disabled value, which 
implies that that particular use of WZR isn't a problem (despite being 
the only one which actually consumes the value stored; how bizarre).

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] arm64/io: Don't use WZR in writel
@ 2019-02-09 18:30 AngeloGioacchino Del Regno
  0 siblings, 0 replies; 20+ messages in thread
From: AngeloGioacchino Del Regno @ 2019-02-09 18:30 UTC (permalink / raw)
  To: AngeloGioacchino Del Regno, Jens Axboe, Will Deacon, Catalin Marinas
  Cc: AngeloGioacchino Del Regno, linux-arm-kernel, linux-kernel,
	linux-arm-msm



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-03-19 11:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-09 18:34 [PATCH] arm64/io: Don't use WZR in writel AngeloGioacchino Del Regno
2019-02-11 10:57 ` Will Deacon
2019-02-11 11:52   ` Marc Zyngier
2019-02-11 14:29     ` AngeloGioacchino Del Regno
2019-02-11 14:59       ` Marc Zyngier
2019-02-11 16:15         ` AngeloGioacchino Del Regno
2019-02-11 16:37         ` Robin Murphy
2019-02-23 18:12         ` Bjorn Andersson
2019-02-23 18:37           ` Marc Zyngier
2019-02-24  3:53             ` Bjorn Andersson
2019-03-12 12:36               ` Marc Gonzalez
2019-03-18 16:04                 ` Robin Murphy
2019-03-18 17:00                   ` Russell King - ARM Linux admin
2019-03-18 17:11                     ` Ard Biesheuvel
2019-03-18 17:19                     ` Robin Murphy
2019-03-18 17:24                       ` Robin Murphy
2019-03-19 11:45                         ` Robin Murphy
2019-03-18 17:30                       ` Marc Gonzalez
2019-03-18 17:59                         ` Robin Murphy
  -- strict thread matches above, loose matches on Subject: below --
2019-02-09 18:30 AngeloGioacchino Del Regno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).