All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-08 15:15 ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-08 15:15 UTC (permalink / raw)
  To: devicetree, iommu
  Cc: Laurentiu Tudor, Will Deacon, Robin Murphy, linux-arm-kernel,
	Shawn Guo, Li Yang, Rob Herring, Krzysztof Kozlowski,
	linux-kernel, Michael Walle

Since commit df198b37e72c ("iommu/arm-smmu: Report
IOMMU_CAP_CACHE_COHERENCY better"), the SMMU driver will say that a
device has the IOMMU_CAP_CACHE_COHERENCY capability if the
ARM_SMMU_FEAT_COHERENT_WALK bit was set in smmu->features.

This breaks vfio-pci, as can be seen below:

$ echo 0000:00:00.0 > /sys/bus/pci/drivers/fsl_enetc/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000\:00\:00.0/driver_override
$ echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
[   25.261941] vfio-pci 0000:00:00.0: arm_smmu_capable: smmu features 0xe9e
[   25.268877] vfio-pci 0000:00:00.0: vfio_group_find_or_alloc: device_iommu_capable() returned false
[   25.279271] vfio-pci 0000:00:00.0: vfio_pci_core_register_device: failed to register group dev: -EINVAL
[   25.301377] vfio-pci: probe of 0000:00:00.0 failed with error -22

The ARM_SMMU_FEAT_COHERENT_WALK feature is set in
arm_smmu_device_dt_probe() if the OF node of the SMMU device was set as
dma-coherent. In the case of LS1028A, it wasn't.

Fix that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
The LS1028A is not the only SoC affected by this, it seems.
fsl-ls1088a.dtsi seems to be in the same situation where vfio-pci worked
before. There are also other SoCs which don't have dma-coherent in the
iommu node. There's also something I don't quite like about this patch
technically introducing a regression which requires a device tree update.

Can something different be done about that, or are LS1028A/LS1088A
simply to blame because of breaching the dt-bindings contract, and in
that case, I'll have to suck it up, put a Fixes tag here, write another
patch for LS1088A, and resend?

 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index 383829ec7be7..bcce189c7a0a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -715,6 +715,7 @@ smmu: iommu@5000000 {
 			#global-interrupts = <8>;
 			#iommu-cells = <1>;
 			stream-match-mask = <0x7c00>;
+			dma-coherent;
 			/* global secure fault */
 			interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>,
 			/* combined secure interrupt */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-08 15:15 ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-08 15:15 UTC (permalink / raw)
  To: devicetree, iommu
  Cc: Laurentiu Tudor, Will Deacon, Robin Murphy, linux-arm-kernel,
	Shawn Guo, Li Yang, Rob Herring, Krzysztof Kozlowski,
	linux-kernel, Michael Walle

Since commit df198b37e72c ("iommu/arm-smmu: Report
IOMMU_CAP_CACHE_COHERENCY better"), the SMMU driver will say that a
device has the IOMMU_CAP_CACHE_COHERENCY capability if the
ARM_SMMU_FEAT_COHERENT_WALK bit was set in smmu->features.

This breaks vfio-pci, as can be seen below:

$ echo 0000:00:00.0 > /sys/bus/pci/drivers/fsl_enetc/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000\:00\:00.0/driver_override
$ echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
[   25.261941] vfio-pci 0000:00:00.0: arm_smmu_capable: smmu features 0xe9e
[   25.268877] vfio-pci 0000:00:00.0: vfio_group_find_or_alloc: device_iommu_capable() returned false
[   25.279271] vfio-pci 0000:00:00.0: vfio_pci_core_register_device: failed to register group dev: -EINVAL
[   25.301377] vfio-pci: probe of 0000:00:00.0 failed with error -22

The ARM_SMMU_FEAT_COHERENT_WALK feature is set in
arm_smmu_device_dt_probe() if the OF node of the SMMU device was set as
dma-coherent. In the case of LS1028A, it wasn't.

Fix that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
The LS1028A is not the only SoC affected by this, it seems.
fsl-ls1088a.dtsi seems to be in the same situation where vfio-pci worked
before. There are also other SoCs which don't have dma-coherent in the
iommu node. There's also something I don't quite like about this patch
technically introducing a regression which requires a device tree update.

Can something different be done about that, or are LS1028A/LS1088A
simply to blame because of breaching the dt-bindings contract, and in
that case, I'll have to suck it up, put a Fixes tag here, write another
patch for LS1088A, and resend?

 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index 383829ec7be7..bcce189c7a0a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -715,6 +715,7 @@ smmu: iommu@5000000 {
 			#global-interrupts = <8>;
 			#iommu-cells = <1>;
 			stream-match-mask = <0x7c00>;
+			dma-coherent;
 			/* global secure fault */
 			interrupts = <GIC_SPI 13 IRQ_TYPE_LEVEL_HIGH>,
 			/* combined secure interrupt */
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2022-12-08 15:15 ` Vladimir Oltean
@ 2022-12-08 19:01   ` Robin Murphy
  -1 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2022-12-08 19:01 UTC (permalink / raw)
  To: Vladimir Oltean, devicetree, iommu
  Cc: Laurentiu Tudor, Will Deacon, linux-arm-kernel, Shawn Guo,
	Li Yang, Rob Herring, Krzysztof Kozlowski, linux-kernel,
	Michael Walle

On 08/12/2022 3:15 pm, Vladimir Oltean wrote:
> Since commit df198b37e72c ("iommu/arm-smmu: Report
> IOMMU_CAP_CACHE_COHERENCY better"), the SMMU driver will say that a
> device has the IOMMU_CAP_CACHE_COHERENCY capability if the
> ARM_SMMU_FEAT_COHERENT_WALK bit was set in smmu->features.
> 
> This breaks vfio-pci, as can be seen below:
> 
> $ echo 0000:00:00.0 > /sys/bus/pci/drivers/fsl_enetc/unbind
> $ echo vfio-pci > /sys/bus/pci/devices/0000\:00\:00.0/driver_override
> $ echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> [   25.261941] vfio-pci 0000:00:00.0: arm_smmu_capable: smmu features 0xe9e
> [   25.268877] vfio-pci 0000:00:00.0: vfio_group_find_or_alloc: device_iommu_capable() returned false
> [   25.279271] vfio-pci 0000:00:00.0: vfio_pci_core_register_device: failed to register group dev: -EINVAL
> [   25.301377] vfio-pci: probe of 0000:00:00.0 failed with error -22
> 
> The ARM_SMMU_FEAT_COHERENT_WALK feature is set in
> arm_smmu_device_dt_probe() if the OF node of the SMMU device was set as
> dma-coherent. In the case of LS1028A, it wasn't.
> 
> Fix that.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> The LS1028A is not the only SoC affected by this, it seems.
> fsl-ls1088a.dtsi seems to be in the same situation where vfio-pci worked
> before. There are also other SoCs which don't have dma-coherent in the
> iommu node. There's also something I don't quite like about this patch
> technically introducing a regression which requires a device tree update.
> 
> Can something different be done about that, or are LS1028A/LS1088A
> simply to blame because of breaching the dt-bindings contract, and in
> that case, I'll have to suck it up, put a Fixes tag here, write another
> patch for LS1088A, and resend?

It's more just good fortune that it ever worked properly at all. We have
to make the DT authoritative about coherency because cases exist where
the ID register is misconfigured. You've been telling Linux that that is
the case, and now the message is finally getting through to VFIO. If we
weren't also lazy in io-pgtable-arm about what shareability attribute to
use for IOMMU_CACHE, you would have actually had the broken VFIO
behaviour that that check is now defending against.

I'd argue that you do want to make the DT change, because it's the truth
of the hardware. Even if you did want to keep doing the significant
extra work of maintaining non-coherent pagetables (there is a dubious
snoop latency vs. TLB miss rate argument), that would be better achieved
at the level of the io_pgtable_cfg, not by lying about the entire SMMU.

However, since Jason refactored things at the VFIO end too, it looks like
this should now be consistently checked for every individual device
bound to a VFIO driver, so we might be able to do a bit better, as
below. I'd be rather surprised if anyone ever genuinely built this
topology, but it does happen to be the one other combination that's easy
to infer with reasonable confidence.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 30dab1418e3f..a5ad9d6b51cf 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
  	switch (cap) {
  	case IOMMU_CAP_CACHE_COHERENCY:
  		/* Assume that a coherent TCU implies coherent TBUs */
-		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
+		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
+			device_get_dma_attr(dev) == DEV_DMA_COHERENT;
  	case IOMMU_CAP_NOEXEC:
  		return true;
  	default:

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-08 19:01   ` Robin Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2022-12-08 19:01 UTC (permalink / raw)
  To: Vladimir Oltean, devicetree, iommu
  Cc: Laurentiu Tudor, Will Deacon, linux-arm-kernel, Shawn Guo,
	Li Yang, Rob Herring, Krzysztof Kozlowski, linux-kernel,
	Michael Walle

On 08/12/2022 3:15 pm, Vladimir Oltean wrote:
> Since commit df198b37e72c ("iommu/arm-smmu: Report
> IOMMU_CAP_CACHE_COHERENCY better"), the SMMU driver will say that a
> device has the IOMMU_CAP_CACHE_COHERENCY capability if the
> ARM_SMMU_FEAT_COHERENT_WALK bit was set in smmu->features.
> 
> This breaks vfio-pci, as can be seen below:
> 
> $ echo 0000:00:00.0 > /sys/bus/pci/drivers/fsl_enetc/unbind
> $ echo vfio-pci > /sys/bus/pci/devices/0000\:00\:00.0/driver_override
> $ echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> [   25.261941] vfio-pci 0000:00:00.0: arm_smmu_capable: smmu features 0xe9e
> [   25.268877] vfio-pci 0000:00:00.0: vfio_group_find_or_alloc: device_iommu_capable() returned false
> [   25.279271] vfio-pci 0000:00:00.0: vfio_pci_core_register_device: failed to register group dev: -EINVAL
> [   25.301377] vfio-pci: probe of 0000:00:00.0 failed with error -22
> 
> The ARM_SMMU_FEAT_COHERENT_WALK feature is set in
> arm_smmu_device_dt_probe() if the OF node of the SMMU device was set as
> dma-coherent. In the case of LS1028A, it wasn't.
> 
> Fix that.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> The LS1028A is not the only SoC affected by this, it seems.
> fsl-ls1088a.dtsi seems to be in the same situation where vfio-pci worked
> before. There are also other SoCs which don't have dma-coherent in the
> iommu node. There's also something I don't quite like about this patch
> technically introducing a regression which requires a device tree update.
> 
> Can something different be done about that, or are LS1028A/LS1088A
> simply to blame because of breaching the dt-bindings contract, and in
> that case, I'll have to suck it up, put a Fixes tag here, write another
> patch for LS1088A, and resend?

It's more just good fortune that it ever worked properly at all. We have
to make the DT authoritative about coherency because cases exist where
the ID register is misconfigured. You've been telling Linux that that is
the case, and now the message is finally getting through to VFIO. If we
weren't also lazy in io-pgtable-arm about what shareability attribute to
use for IOMMU_CACHE, you would have actually had the broken VFIO
behaviour that that check is now defending against.

I'd argue that you do want to make the DT change, because it's the truth
of the hardware. Even if you did want to keep doing the significant
extra work of maintaining non-coherent pagetables (there is a dubious
snoop latency vs. TLB miss rate argument), that would be better achieved
at the level of the io_pgtable_cfg, not by lying about the entire SMMU.

However, since Jason refactored things at the VFIO end too, it looks like
this should now be consistently checked for every individual device
bound to a VFIO driver, so we might be able to do a bit better, as
below. I'd be rather surprised if anyone ever genuinely built this
topology, but it does happen to be the one other combination that's easy
to infer with reasonable confidence.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 30dab1418e3f..a5ad9d6b51cf 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
  	switch (cap) {
  	case IOMMU_CAP_CACHE_COHERENCY:
  		/* Assume that a coherent TCU implies coherent TBUs */
-		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
+		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
+			device_get_dma_attr(dev) == DEV_DMA_COHERENT;
  	case IOMMU_CAP_NOEXEC:
  		return true;
  	default:

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2022-12-08 19:01   ` Robin Murphy
@ 2022-12-14 16:53     ` Vladimir Oltean
  -1 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-14 16:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

Hi Robin,

On Thu, Dec 08, 2022 at 07:01:16PM +0000, Robin Murphy wrote:
> It's more just good fortune that it ever worked properly at all.

Thanks for the response. I don't know much about IOMMUs or the ARM SMMU,
I'm trying to understand what you've said. I hope you don't mind a few
foolish questions.

> We have to make the DT authoritative about coherency because cases exist where
> the ID register is misconfigured.

Which ID register? The ARM_SMMU_ID0_CTTW bit in ARM_SMMU_GR0_ID0, as
read by arm_smmu_device_cfg_probe()?

I tried to find more about this bit (driver suggests it's bit 14), but I'm a bit lost.
First of all, I don't know where to find the ID0 register for MMU-500.
I looked at the register summary here and didn't find it:
https://developer.arm.com/documentation/ddi0517/f/programmers-model/register-summary/global-address-space-0-registers-summary?lang=en
Then I googled it and found this page, where it just says that at bit 14
there's indeed something named CTTW (unexplained) which is hardcoded to 0:
https://developer.arm.com/documentation/ddi0517/e/programmers-model/memory-model/reset-values?lang=en
I did however download the SMMU v2 arch spec PDF at
https://developer.arm.com/documentation/ihi0062/latest

and there, I did find it. But I'm not sure why the MMU-500 says it
*should* be hardcoded to 0? Is this what you call "misconfigured"?

On my hardware (both LS1028A and LS1088A), it reads:

$ dmesg | grep smmu
[    4.825109] arm-smmu 5000000.iommu: probing hardware configuration...
[    4.831625] arm-smmu 5000000.iommu: SMMUv2 with:
[    4.836293] arm-smmu 5000000.iommu: GR0_ID0: 0x7c013e80
[    4.841569] arm-smmu 5000000.iommu:  stage 1 translation
[    4.846931] arm-smmu 5000000.iommu:  stage 2 translation
[    4.852293] arm-smmu 5000000.iommu:  nested translation
[    4.857573] arm-smmu 5000000.iommu:  stream matching with 128 register groups
[    4.864776] arm-smmu 5000000.iommu:  64 context banks (0 stage-2 only)
[    4.871372] arm-smmu 5000000.iommu:  Supported page sizes: 0x61311000
[    4.877868] arm-smmu 5000000.iommu:  Stage-1: 48-bit VA -> 48-bit IPA
[    4.884375] arm-smmu 5000000.iommu:  Stage-2: 48-bit IPA -> 48-bit PA
[    4.893113] arm-smmu 5000000.iommu:  preserved 0 boot mappings

On the other hand, in the verbal (no registers) documentation of the
MMU-500 integration in my SoCs, it does say that "The SMMU supports cache
coherency for page table walks and DVM transactions for page table cache
maintenance operations."

Does looking at the CTTW bit make any sense for MMU-500?

> You've been telling Linux that that is the case, and now the message
> is finally getting through to VFIO. If we weren't also lazy in
> io-pgtable-arm about what shareability attribute to use for IOMMU_CACHE,
> you would have actually had the broken VFIO behaviour that that check
> is now defending against.

lazy in io-pgtable-arm == ??

I assume you're talking about something which is (not) done in
arm_lpae_prot_to_pte()? Could you please clarify, as I didn't understand?


IIUC (and I've been wrong before), the IOMMU_CACHE "prot" flag means
that memory mapped by the IOMMU for DMA is coherent w.r.t. CPU caches,
and VFIO specifically needs it because:

	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
	 * restore cache coherency.

i.o.w. user space can't execute cache invalidation instructions (DC CIVAC etc),
which would make cache-coherent DMA transactions the only viable possibility.
Right?

> I'd argue that you do want to make the DT change, because it's the truth
> of the hardware. Even if you did want to keep doing the significant
> extra work of maintaining non-coherent pagetables (there is a dubious
> snoop latency vs. TLB miss rate argument), that would be better achieved
> at the level of the io_pgtable_cfg, not by lying about the entire SMMU.

I do agree that we could set the dma-coherent property in the SMMU node
to skip some of the wmb() instructions in the TLB invalidation procedures
for stage 1/stage 2 translations. I wasn't trying to make any argument
in favor of manually maintaining the cache coherency with the page tables.

I'm just not exactly clear what does the page table walk of the SMMU TCU
have to do with the cache coherence of the DMA transactions forwarded/translated
by the TBUs. I mean, I saw the comment below:

	/* Assume that a coherent TCU implies coherent TBUs */

but I simply don't understand what is it that gives this assumption any
grounds.

> However, since Jason refactored things at the VFIO end too, it looks like
> this should now be consistently checked for every individual device
> bound to a VFIO driver, so we might be able to do a bit better, as
> below.

hmm, the change in vfio_group_find_or_alloc() between iommu_capable(dev->bus)
and device_iommu_capable(dev) took place in commit a9cf69d0e7f2 ("Merge
tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio"), says my
git blame. Pretty strange. I had to use git log --graph to find your
commit specifically: 3b498b665621 ("vfio: Use device_iommu_capable()").

> I'd be rather surprised if anyone ever genuinely built this
> topology, but it does happen to be the one other combination that's easy
> to infer with reasonable confidence.

this topology == ?
non-coherent SMMU page table walks but cache coherent DMA traffic of
SMMU upstream device?

Feasible or not, VFIO doesn't have a problem working with that device
even if the SMMU doesn't do coherent walking of its translation tables,
no?

> 
> Thanks,
> Robin.
> 
> ----->8-----
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 30dab1418e3f..a5ad9d6b51cf 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>  	switch (cap) {
>  	case IOMMU_CAP_CACHE_COHERENCY:
>  		/* Assume that a coherent TCU implies coherent TBUs */
> -		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
> +		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
> +			device_get_dma_attr(dev) == DEV_DMA_COHERENT;

So this works for enetc, thanks. However, do we need to also consider
handling DEV_DMA_NOT_SUPPORTED, and thus, testing != DEV_DMA_NON_COHERENT
could be better than == DEV_DMA_COHERENT?

But from your response and the explanation in commit df198b37e72c
("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better"), I'm not
clear why we would keep looking at the COHERENT_WALK feature at all?

It seems to my layman eyes that we are artificially coupling 2 concepts
that have nothing in common, and bad reporting for one of them is
causing trouble with the other.

>  	case IOMMU_CAP_NOEXEC:
>  		return true;
>  	default:

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-14 16:53     ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-14 16:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

Hi Robin,

On Thu, Dec 08, 2022 at 07:01:16PM +0000, Robin Murphy wrote:
> It's more just good fortune that it ever worked properly at all.

Thanks for the response. I don't know much about IOMMUs or the ARM SMMU,
I'm trying to understand what you've said. I hope you don't mind a few
foolish questions.

> We have to make the DT authoritative about coherency because cases exist where
> the ID register is misconfigured.

Which ID register? The ARM_SMMU_ID0_CTTW bit in ARM_SMMU_GR0_ID0, as
read by arm_smmu_device_cfg_probe()?

I tried to find more about this bit (driver suggests it's bit 14), but I'm a bit lost.
First of all, I don't know where to find the ID0 register for MMU-500.
I looked at the register summary here and didn't find it:
https://developer.arm.com/documentation/ddi0517/f/programmers-model/register-summary/global-address-space-0-registers-summary?lang=en
Then I googled it and found this page, where it just says that at bit 14
there's indeed something named CTTW (unexplained) which is hardcoded to 0:
https://developer.arm.com/documentation/ddi0517/e/programmers-model/memory-model/reset-values?lang=en
I did however download the SMMU v2 arch spec PDF at
https://developer.arm.com/documentation/ihi0062/latest

and there, I did find it. But I'm not sure why the MMU-500 says it
*should* be hardcoded to 0? Is this what you call "misconfigured"?

On my hardware (both LS1028A and LS1088A), it reads:

$ dmesg | grep smmu
[    4.825109] arm-smmu 5000000.iommu: probing hardware configuration...
[    4.831625] arm-smmu 5000000.iommu: SMMUv2 with:
[    4.836293] arm-smmu 5000000.iommu: GR0_ID0: 0x7c013e80
[    4.841569] arm-smmu 5000000.iommu:  stage 1 translation
[    4.846931] arm-smmu 5000000.iommu:  stage 2 translation
[    4.852293] arm-smmu 5000000.iommu:  nested translation
[    4.857573] arm-smmu 5000000.iommu:  stream matching with 128 register groups
[    4.864776] arm-smmu 5000000.iommu:  64 context banks (0 stage-2 only)
[    4.871372] arm-smmu 5000000.iommu:  Supported page sizes: 0x61311000
[    4.877868] arm-smmu 5000000.iommu:  Stage-1: 48-bit VA -> 48-bit IPA
[    4.884375] arm-smmu 5000000.iommu:  Stage-2: 48-bit IPA -> 48-bit PA
[    4.893113] arm-smmu 5000000.iommu:  preserved 0 boot mappings

On the other hand, in the verbal (no registers) documentation of the
MMU-500 integration in my SoCs, it does say that "The SMMU supports cache
coherency for page table walks and DVM transactions for page table cache
maintenance operations."

Does looking at the CTTW bit make any sense for MMU-500?

> You've been telling Linux that that is the case, and now the message
> is finally getting through to VFIO. If we weren't also lazy in
> io-pgtable-arm about what shareability attribute to use for IOMMU_CACHE,
> you would have actually had the broken VFIO behaviour that that check
> is now defending against.

lazy in io-pgtable-arm == ??

I assume you're talking about something which is (not) done in
arm_lpae_prot_to_pte()? Could you please clarify, as I didn't understand?


IIUC (and I've been wrong before), the IOMMU_CACHE "prot" flag means
that memory mapped by the IOMMU for DMA is coherent w.r.t. CPU caches,
and VFIO specifically needs it because:

	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
	 * restore cache coherency.

i.o.w. user space can't execute cache invalidation instructions (DC CIVAC etc),
which would make cache-coherent DMA transactions the only viable possibility.
Right?

> I'd argue that you do want to make the DT change, because it's the truth
> of the hardware. Even if you did want to keep doing the significant
> extra work of maintaining non-coherent pagetables (there is a dubious
> snoop latency vs. TLB miss rate argument), that would be better achieved
> at the level of the io_pgtable_cfg, not by lying about the entire SMMU.

I do agree that we could set the dma-coherent property in the SMMU node
to skip some of the wmb() instructions in the TLB invalidation procedures
for stage 1/stage 2 translations. I wasn't trying to make any argument
in favor of manually maintaining the cache coherency with the page tables.

I'm just not exactly clear what does the page table walk of the SMMU TCU
have to do with the cache coherence of the DMA transactions forwarded/translated
by the TBUs. I mean, I saw the comment below:

	/* Assume that a coherent TCU implies coherent TBUs */

but I simply don't understand what is it that gives this assumption any
grounds.

> However, since Jason refactored things at the VFIO end too, it looks like
> this should now be consistently checked for every individual device
> bound to a VFIO driver, so we might be able to do a bit better, as
> below.

hmm, the change in vfio_group_find_or_alloc() between iommu_capable(dev->bus)
and device_iommu_capable(dev) took place in commit a9cf69d0e7f2 ("Merge
tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio"), says my
git blame. Pretty strange. I had to use git log --graph to find your
commit specifically: 3b498b665621 ("vfio: Use device_iommu_capable()").

> I'd be rather surprised if anyone ever genuinely built this
> topology, but it does happen to be the one other combination that's easy
> to infer with reasonable confidence.

this topology == ?
non-coherent SMMU page table walks but cache coherent DMA traffic of
SMMU upstream device?

Feasible or not, VFIO doesn't have a problem working with that device
even if the SMMU doesn't do coherent walking of its translation tables,
no?

> 
> Thanks,
> Robin.
> 
> ----->8-----
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 30dab1418e3f..a5ad9d6b51cf 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>  	switch (cap) {
>  	case IOMMU_CAP_CACHE_COHERENCY:
>  		/* Assume that a coherent TCU implies coherent TBUs */
> -		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
> +		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
> +			device_get_dma_attr(dev) == DEV_DMA_COHERENT;

So this works for enetc, thanks. However, do we need to also consider
handling DEV_DMA_NOT_SUPPORTED, and thus, testing != DEV_DMA_NON_COHERENT
could be better than == DEV_DMA_COHERENT?

But from your response and the explanation in commit df198b37e72c
("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better"), I'm not
clear why we would keep looking at the COHERENT_WALK feature at all?

It seems to my layman eyes that we are artificially coupling 2 concepts
that have nothing in common, and bad reporting for one of them is
causing trouble with the other.

>  	case IOMMU_CAP_NOEXEC:
>  		return true;
>  	default:

Thanks!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2022-12-14 16:53     ` Vladimir Oltean
@ 2022-12-14 20:33       ` Robin Murphy
  -1 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2022-12-14 20:33 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On 2022-12-14 16:53, Vladimir Oltean wrote:
> Hi Robin,
> 
> On Thu, Dec 08, 2022 at 07:01:16PM +0000, Robin Murphy wrote:
>> It's more just good fortune that it ever worked properly at all.
> 
> Thanks for the response. I don't know much about IOMMUs or the ARM SMMU,
> I'm trying to understand what you've said. I hope you don't mind a few
> foolish questions.
> 
>> We have to make the DT authoritative about coherency because cases exist where
>> the ID register is misconfigured.
> 
> Which ID register? The ARM_SMMU_ID0_CTTW bit in ARM_SMMU_GR0_ID0, as
> read by arm_smmu_device_cfg_probe()?

Yes.

> I tried to find more about this bit (driver suggests it's bit 14), but I'm a bit lost.
> First of all, I don't know where to find the ID0 register for MMU-500.
> I looked at the register summary here and didn't find it:
> https://developer.arm.com/documentation/ddi0517/f/programmers-model/register-summary/global-address-space-0-registers-summary?lang=en
> Then I googled it and found this page, where it just says that at bit 14
> there's indeed something named CTTW (unexplained) which is hardcoded to 0:
> https://developer.arm.com/documentation/ddi0517/e/programmers-model/memory-model/reset-values?lang=en
> I did however download the SMMU v2 arch spec PDF at
> https://developer.arm.com/documentation/ihi0062/latest
> 
> and there, I did find it. But I'm not sure why the MMU-500 says it
> *should* be hardcoded to 0? Is this what you call "misconfigured"?

For some reason, SMMU TRMs seem to have a tradition of describing their 
implementation of architectural registers in weird and unexpected 
places... Anyway that appears to be a documentation bug, since the 
actual SMMU_IDR0.CTTW value should reflect whatever is sampled on the 
cfg_cttw input at reset, per:

https://developer.arm.com/documentation/ddi0517/f/signal-descriptions/miscellaneous-signals/tie-off-signals

> On my hardware (both LS1028A and LS1088A), it reads:
> 
> $ dmesg | grep smmu
> [    4.825109] arm-smmu 5000000.iommu: probing hardware configuration...
> [    4.831625] arm-smmu 5000000.iommu: SMMUv2 with:
> [    4.836293] arm-smmu 5000000.iommu: GR0_ID0: 0x7c013e80
> [    4.841569] arm-smmu 5000000.iommu:  stage 1 translation
> [    4.846931] arm-smmu 5000000.iommu:  stage 2 translation
> [    4.852293] arm-smmu 5000000.iommu:  nested translation
> [    4.857573] arm-smmu 5000000.iommu:  stream matching with 128 register groups
> [    4.864776] arm-smmu 5000000.iommu:  64 context banks (0 stage-2 only)
> [    4.871372] arm-smmu 5000000.iommu:  Supported page sizes: 0x61311000
> [    4.877868] arm-smmu 5000000.iommu:  Stage-1: 48-bit VA -> 48-bit IPA
> [    4.884375] arm-smmu 5000000.iommu:  Stage-2: 48-bit IPA -> 48-bit PA
> [    4.893113] arm-smmu 5000000.iommu:  preserved 0 boot mappings
> 
> On the other hand, in the verbal (no registers) documentation of the
> MMU-500 integration in my SoCs, it does say that "The SMMU supports cache
> coherency for page table walks and DVM transactions for page table cache
> maintenance operations."
> 
> Does looking at the CTTW bit make any sense for MMU-500?

In general, yes. The result above does imply that NXP have inadvertently 
set cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 
showing SMMU_IDR0.CTTW set:

[    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
[    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
[    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
[    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
[    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 
register groups
[    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 
stage-2 only)
[    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 
0x60211000
[    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 
48-bit PA

>> You've been telling Linux that that is the case, and now the message
>> is finally getting through to VFIO. If we weren't also lazy in
>> io-pgtable-arm about what shareability attribute to use for IOMMU_CACHE,
>> you would have actually had the broken VFIO behaviour that that check
>> is now defending against.
> 
> lazy in io-pgtable-arm == ??
> 
> I assume you're talking about something which is (not) done in
> arm_lpae_prot_to_pte()? Could you please clarify, as I didn't understand?

Strictly, a Cacheable mapping for a non-coherent device should probably 
have the Non-Shareable attribute, but since we assume that a 
non-coherent interconnect would ignore the shareability anyway, we don't 
bother, and just always use Inner-Shareable since that's the right value 
for when it *will* matter. Also io-pgtable couldn't make that decision 
itself, since it doesn't know anything about the intended use of the 
mapping other than an input address, and output address, and some 
abstract iommu_prot flags.

> IIUC (and I've been wrong before), the IOMMU_CACHE "prot" flag means
> that memory mapped by the IOMMU for DMA is coherent w.r.t. CPU caches,
> and VFIO specifically needs it because:
> 
> 	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> 	 * restore cache coherency.
> 
> i.o.w. user space can't execute cache invalidation instructions (DC CIVAC etc),
> which would make cache-coherent DMA transactions the only viable possibility.
> Right?

Yes, userspace is in general not allowed to bypass CPU coherency for 
security reasons, so the only way it's viable for a device to operate 
directly on userspace mappings is if that device can snoop caches as 
well. Of course it's technically a bit different when that "userspace" 
is a virtual machine manager that's then going to hand control of the 
device to a VM that *can* do its own non-coherent cache maintenance, but 
I don't think we can easily make that distinction at the VFIO level.

>> I'd argue that you do want to make the DT change, because it's the truth
>> of the hardware. Even if you did want to keep doing the significant
>> extra work of maintaining non-coherent pagetables (there is a dubious
>> snoop latency vs. TLB miss rate argument), that would be better achieved
>> at the level of the io_pgtable_cfg, not by lying about the entire SMMU.
> 
> I do agree that we could set the dma-coherent property in the SMMU node
> to skip some of the wmb() instructions in the TLB invalidation procedures
> for stage 1/stage 2 translations. I wasn't trying to make any argument
> in favor of manually maintaining the cache coherency with the page tables.

FWIW it's not about the TLB maintenance in the driver, it's io-pgtable's 
cache maintenance on PTE updates, and even more so the synchronisation 
thereof, that gets pretty involved.

> I'm just not exactly clear what does the page table walk of the SMMU TCU
> have to do with the cache coherence of the DMA transactions forwarded/translated
> by the TBUs. I mean, I saw the comment below:
> 
> 	/* Assume that a coherent TCU implies coherent TBUs */
> 
> but I simply don't understand what is it that gives this assumption any
> grounds.

In short, it's based on how people tend to design SoCs in practice; I'm 
yet to come across any system where it doesn't hold.

>> However, since Jason refactored things at the VFIO end too, it looks like
>> this should now be consistently checked for every individual device
>> bound to a VFIO driver, so we might be able to do a bit better, as
>> below.
> 
> hmm, the change in vfio_group_find_or_alloc() between iommu_capable(dev->bus)
> and device_iommu_capable(dev) took place in commit a9cf69d0e7f2 ("Merge
> tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio"), says my
> git blame. Pretty strange. I had to use git log --graph to find your
> commit specifically: 3b498b665621 ("vfio: Use device_iommu_capable()").

That's because that line does actually belong to the merge commit, where 
Linus resolved the conflict between my IOMMU patch changing the call and 
Jason's VFIO patch moving the callsite.

>> I'd be rather surprised if anyone ever genuinely built this
>> topology, but it does happen to be the one other combination that's easy
>> to infer with reasonable confidence.
> 
> this topology == ?
> non-coherent SMMU page table walks but cache coherent DMA traffic of
> SMMU upstream device?

Yes.

> Feasible or not, VFIO doesn't have a problem working with that device
> even if the SMMU doesn't do coherent walking of its translation tables,
> no?

As you've quoted from the SoC documentation and demonstrated with this 
patch, this SMMU *is* coherent for both pagetable walks and client 
translation, and that's what matters in terms of VFIO not being 
completely broken. It makes no difference what pagetable walk attributes 
Linux actually uses, the underlying hardware is there to make 
IOMMU_CACHE snoop, which in this one weird corner case means that a 
couple of assumptions based on incorrect information cancel out, and 
something that seemingly shouldn't work, does.

>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> index 30dab1418e3f..a5ad9d6b51cf 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>>   	switch (cap) {
>>   	case IOMMU_CAP_CACHE_COHERENCY:
>>   		/* Assume that a coherent TCU implies coherent TBUs */
>> -		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
>> +		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
>> +			device_get_dma_attr(dev) == DEV_DMA_COHERENT;
> 
> So this works for enetc, thanks. However, do we need to also consider
> handling DEV_DMA_NOT_SUPPORTED, and thus, testing != DEV_DMA_NON_COHERENT
> could be better than == DEV_DMA_COHERENT?

If firmware says the device can't access memory, can we make its memory 
accesses coherent?... Since it's likely to have a pretty bad time with 
VFIO either way, I think we can dodge getting philosophical here :)

> But from your response and the explanation in commit df198b37e72c
> ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better"), I'm not
> clear why we would keep looking at the COHERENT_WALK feature at all?

Because if the SMMU is coherent, then a stage 1 translation with 
IOMMU_CACHE can output cache-snooping attributes even if the device 
can't do so natively.

Of course that isn't necessarily true for stage 2 (without venturing 
into S2CR attribute overrides), so we're still some way off giving an 
accurate answer here, but as the original commit said, the aim was not 
to be right, just less wrong.

I'll try to improve the comment when I spin this tweak into a proper patch.

Thanks,
Robin.

> It seems to my layman eyes that we are artificially coupling 2 concepts
> that have nothing in common, and bad reporting for one of them is
> causing trouble with the other.
> 
>>   	case IOMMU_CAP_NOEXEC:
>>   		return true;
>>   	default:
> 
> Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-14 20:33       ` Robin Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2022-12-14 20:33 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On 2022-12-14 16:53, Vladimir Oltean wrote:
> Hi Robin,
> 
> On Thu, Dec 08, 2022 at 07:01:16PM +0000, Robin Murphy wrote:
>> It's more just good fortune that it ever worked properly at all.
> 
> Thanks for the response. I don't know much about IOMMUs or the ARM SMMU,
> I'm trying to understand what you've said. I hope you don't mind a few
> foolish questions.
> 
>> We have to make the DT authoritative about coherency because cases exist where
>> the ID register is misconfigured.
> 
> Which ID register? The ARM_SMMU_ID0_CTTW bit in ARM_SMMU_GR0_ID0, as
> read by arm_smmu_device_cfg_probe()?

Yes.

> I tried to find more about this bit (driver suggests it's bit 14), but I'm a bit lost.
> First of all, I don't know where to find the ID0 register for MMU-500.
> I looked at the register summary here and didn't find it:
> https://developer.arm.com/documentation/ddi0517/f/programmers-model/register-summary/global-address-space-0-registers-summary?lang=en
> Then I googled it and found this page, where it just says that at bit 14
> there's indeed something named CTTW (unexplained) which is hardcoded to 0:
> https://developer.arm.com/documentation/ddi0517/e/programmers-model/memory-model/reset-values?lang=en
> I did however download the SMMU v2 arch spec PDF at
> https://developer.arm.com/documentation/ihi0062/latest
> 
> and there, I did find it. But I'm not sure why the MMU-500 says it
> *should* be hardcoded to 0? Is this what you call "misconfigured"?

For some reason, SMMU TRMs seem to have a tradition of describing their 
implementation of architectural registers in weird and unexpected 
places... Anyway that appears to be a documentation bug, since the 
actual SMMU_IDR0.CTTW value should reflect whatever is sampled on the 
cfg_cttw input at reset, per:

https://developer.arm.com/documentation/ddi0517/f/signal-descriptions/miscellaneous-signals/tie-off-signals

> On my hardware (both LS1028A and LS1088A), it reads:
> 
> $ dmesg | grep smmu
> [    4.825109] arm-smmu 5000000.iommu: probing hardware configuration...
> [    4.831625] arm-smmu 5000000.iommu: SMMUv2 with:
> [    4.836293] arm-smmu 5000000.iommu: GR0_ID0: 0x7c013e80
> [    4.841569] arm-smmu 5000000.iommu:  stage 1 translation
> [    4.846931] arm-smmu 5000000.iommu:  stage 2 translation
> [    4.852293] arm-smmu 5000000.iommu:  nested translation
> [    4.857573] arm-smmu 5000000.iommu:  stream matching with 128 register groups
> [    4.864776] arm-smmu 5000000.iommu:  64 context banks (0 stage-2 only)
> [    4.871372] arm-smmu 5000000.iommu:  Supported page sizes: 0x61311000
> [    4.877868] arm-smmu 5000000.iommu:  Stage-1: 48-bit VA -> 48-bit IPA
> [    4.884375] arm-smmu 5000000.iommu:  Stage-2: 48-bit IPA -> 48-bit PA
> [    4.893113] arm-smmu 5000000.iommu:  preserved 0 boot mappings
> 
> On the other hand, in the verbal (no registers) documentation of the
> MMU-500 integration in my SoCs, it does say that "The SMMU supports cache
> coherency for page table walks and DVM transactions for page table cache
> maintenance operations."
> 
> Does looking at the CTTW bit make any sense for MMU-500?

In general, yes. The result above does imply that NXP have inadvertently 
set cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 
showing SMMU_IDR0.CTTW set:

[    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
[    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
[    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
[    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
[    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 
register groups
[    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 
stage-2 only)
[    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 
0x60211000
[    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 
48-bit PA

>> You've been telling Linux that that is the case, and now the message
>> is finally getting through to VFIO. If we weren't also lazy in
>> io-pgtable-arm about what shareability attribute to use for IOMMU_CACHE,
>> you would have actually had the broken VFIO behaviour that that check
>> is now defending against.
> 
> lazy in io-pgtable-arm == ??
> 
> I assume you're talking about something which is (not) done in
> arm_lpae_prot_to_pte()? Could you please clarify, as I didn't understand?

Strictly, a Cacheable mapping for a non-coherent device should probably 
have the Non-Shareable attribute, but since we assume that a 
non-coherent interconnect would ignore the shareability anyway, we don't 
bother, and just always use Inner-Shareable since that's the right value 
for when it *will* matter. Also io-pgtable couldn't make that decision 
itself, since it doesn't know anything about the intended use of the 
mapping other than an input address, and output address, and some 
abstract iommu_prot flags.

> IIUC (and I've been wrong before), the IOMMU_CACHE "prot" flag means
> that memory mapped by the IOMMU for DMA is coherent w.r.t. CPU caches,
> and VFIO specifically needs it because:
> 
> 	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> 	 * restore cache coherency.
> 
> i.o.w. user space can't execute cache invalidation instructions (DC CIVAC etc),
> which would make cache-coherent DMA transactions the only viable possibility.
> Right?

Yes, userspace is in general not allowed to bypass CPU coherency for 
security reasons, so the only way it's viable for a device to operate 
directly on userspace mappings is if that device can snoop caches as 
well. Of course it's technically a bit different when that "userspace" 
is a virtual machine manager that's then going to hand control of the 
device to a VM that *can* do its own non-coherent cache maintenance, but 
I don't think we can easily make that distinction at the VFIO level.

>> I'd argue that you do want to make the DT change, because it's the truth
>> of the hardware. Even if you did want to keep doing the significant
>> extra work of maintaining non-coherent pagetables (there is a dubious
>> snoop latency vs. TLB miss rate argument), that would be better achieved
>> at the level of the io_pgtable_cfg, not by lying about the entire SMMU.
> 
> I do agree that we could set the dma-coherent property in the SMMU node
> to skip some of the wmb() instructions in the TLB invalidation procedures
> for stage 1/stage 2 translations. I wasn't trying to make any argument
> in favor of manually maintaining the cache coherency with the page tables.

FWIW it's not about the TLB maintenance in the driver, it's io-pgtable's 
cache maintenance on PTE updates, and even more so the synchronisation 
thereof, that gets pretty involved.

> I'm just not exactly clear what does the page table walk of the SMMU TCU
> have to do with the cache coherence of the DMA transactions forwarded/translated
> by the TBUs. I mean, I saw the comment below:
> 
> 	/* Assume that a coherent TCU implies coherent TBUs */
> 
> but I simply don't understand what is it that gives this assumption any
> grounds.

In short, it's based on how people tend to design SoCs in practice; I'm 
yet to come across any system where it doesn't hold.

>> However, since Jason refactored things at the VFIO end too, it looks like
>> this should now be consistently checked for every individual device
>> bound to a VFIO driver, so we might be able to do a bit better, as
>> below.
> 
> hmm, the change in vfio_group_find_or_alloc() between iommu_capable(dev->bus)
> and device_iommu_capable(dev) took place in commit a9cf69d0e7f2 ("Merge
> tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio"), says my
> git blame. Pretty strange. I had to use git log --graph to find your
> commit specifically: 3b498b665621 ("vfio: Use device_iommu_capable()").

That's because that line does actually belong to the merge commit, where 
Linus resolved the conflict between my IOMMU patch changing the call and 
Jason's VFIO patch moving the callsite.

>> I'd be rather surprised if anyone ever genuinely built this
>> topology, but it does happen to be the one other combination that's easy
>> to infer with reasonable confidence.
> 
> this topology == ?
> non-coherent SMMU page table walks but cache coherent DMA traffic of
> SMMU upstream device?

Yes.

> Feasible or not, VFIO doesn't have a problem working with that device
> even if the SMMU doesn't do coherent walking of its translation tables,
> no?

As you've quoted from the SoC documentation and demonstrated with this 
patch, this SMMU *is* coherent for both pagetable walks and client 
translation, and that's what matters in terms of VFIO not being 
completely broken. It makes no difference what pagetable walk attributes 
Linux actually uses, the underlying hardware is there to make 
IOMMU_CACHE snoop, which in this one weird corner case means that a 
couple of assumptions based on incorrect information cancel out, and 
something that seemingly shouldn't work, does.

>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> index 30dab1418e3f..a5ad9d6b51cf 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -1320,7 +1320,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>>   	switch (cap) {
>>   	case IOMMU_CAP_CACHE_COHERENCY:
>>   		/* Assume that a coherent TCU implies coherent TBUs */
>> -		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK;
>> +		return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK ||
>> +			device_get_dma_attr(dev) == DEV_DMA_COHERENT;
> 
> So this works for enetc, thanks. However, do we need to also consider
> handling DEV_DMA_NOT_SUPPORTED, and thus, testing != DEV_DMA_NON_COHERENT
> could be better than == DEV_DMA_COHERENT?

If firmware says the device can't access memory, can we make its memory 
accesses coherent?... Since it's likely to have a pretty bad time with 
VFIO either way, I think we can dodge getting philosophical here :)

> But from your response and the explanation in commit df198b37e72c
> ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better"), I'm not
> clear why we would keep looking at the COHERENT_WALK feature at all?

Because if the SMMU is coherent, then a stage 1 translation with 
IOMMU_CACHE can output cache-snooping attributes even if the device 
can't do so natively.

Of course that isn't necessarily true for stage 2 (without venturing 
into S2CR attribute overrides), so we're still some way off giving an 
accurate answer here, but as the original commit said, the aim was not 
to be right, just less wrong.

I'll try to improve the comment when I spin this tweak into a proper patch.

Thanks,
Robin.

> It seems to my layman eyes that we are artificially coupling 2 concepts
> that have nothing in common, and bad reporting for one of them is
> causing trouble with the other.
> 
>>   	case IOMMU_CAP_NOEXEC:
>>   		return true;
>>   	default:
> 
> Thanks!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2022-12-14 20:33       ` Robin Murphy
@ 2022-12-19 12:16         ` Vladimir Oltean
  -1 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-19 12:16 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

Hi Robin,

On Wed, Dec 14, 2022 at 08:33:10PM +0000, Robin Murphy wrote:
> > Does looking at the CTTW bit make any sense for MMU-500?
> 
> In general, yes. The result above does imply that NXP have inadvertently set
> cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 showing
> SMMU_IDR0.CTTW set:
> 
> [    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
> [    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
> [    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
> [    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
> [    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 register groups
> [    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 stage-2 only)
> [    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 0x60211000
> [    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 48-bit PA

Thanks for the explanations and the patch you've sent separately.

I have a side question, why is the dev_name() of your SMMU set to
"arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?
I'm asking because I would like to study the mechanism through which
your SMMU platform device get probed, to make sure that it's not
possible, during shutdown, for both platform_driver :: shutdown()
and platform_driver :: remove() methods to get called by the driver core.
This is generally not disallowed, and even possible if the entity who
registers these platform devices has its ->shutdown() method pointing
at ->remove().

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2022-12-19 12:16         ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2022-12-19 12:16 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

Hi Robin,

On Wed, Dec 14, 2022 at 08:33:10PM +0000, Robin Murphy wrote:
> > Does looking at the CTTW bit make any sense for MMU-500?
> 
> In general, yes. The result above does imply that NXP have inadvertently set
> cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 showing
> SMMU_IDR0.CTTW set:
> 
> [    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
> [    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
> [    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
> [    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
> [    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 register groups
> [    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 stage-2 only)
> [    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 0x60211000
> [    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 48-bit PA

Thanks for the explanations and the patch you've sent separately.

I have a side question, why is the dev_name() of your SMMU set to
"arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?
I'm asking because I would like to study the mechanism through which
your SMMU platform device get probed, to make sure that it's not
possible, during shutdown, for both platform_driver :: shutdown()
and platform_driver :: remove() methods to get called by the driver core.
This is generally not disallowed, and even possible if the entity who
registers these platform devices has its ->shutdown() method pointing
at ->remove().

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2022-12-19 12:16         ` Vladimir Oltean
@ 2023-01-03 18:12           ` Robin Murphy
  -1 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2023-01-03 18:12 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On 19/12/2022 12:16 pm, Vladimir Oltean wrote:
> Hi Robin,
> 
> On Wed, Dec 14, 2022 at 08:33:10PM +0000, Robin Murphy wrote:
>>> Does looking at the CTTW bit make any sense for MMU-500?
>>
>> In general, yes. The result above does imply that NXP have inadvertently set
>> cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 showing
>> SMMU_IDR0.CTTW set:
>>
>> [    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
>> [    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
>> [    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
>> [    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
>> [    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 register groups
>> [    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 stage-2 only)
>> [    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 0x60211000
>> [    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 48-bit PA
> 
> Thanks for the explanations and the patch you've sent separately.
> 
> I have a side question, why is the dev_name() of your SMMU set to
> "arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?

This is an ACPI-based machine, where platform device discovery and 
creation is... different :)

SMMUs are among those managed by drivers/acpi/arm64/iort.c

> I'm asking because I would like to study the mechanism through which
> your SMMU platform device get probed, to make sure that it's not
> possible, during shutdown, for both platform_driver :: shutdown()
> and platform_driver :: remove() methods to get called by the driver core.
> This is generally not disallowed, and even possible if the entity who
> registers these platform devices has its ->shutdown() method pointing
> at ->remove().

Yikes, I'd very much hope that that's not a thing!

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2023-01-03 18:12           ` Robin Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Robin Murphy @ 2023-01-03 18:12 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On 19/12/2022 12:16 pm, Vladimir Oltean wrote:
> Hi Robin,
> 
> On Wed, Dec 14, 2022 at 08:33:10PM +0000, Robin Murphy wrote:
>>> Does looking at the CTTW bit make any sense for MMU-500?
>>
>> In general, yes. The result above does imply that NXP have inadvertently set
>> cfg_cttw wrong. For the avoidance of doubt, here's another MMU-500 showing
>> SMMU_IDR0.CTTW set:
>>
>> [    3.014972] arm-smmu arm-smmu.0.auto: probing hardware configuration...
>> [    3.014974] arm-smmu arm-smmu.0.auto: SMMUv2 with:
>> [    3.014976] arm-smmu arm-smmu.0.auto:        stage 2 translation
>> [    3.014977] arm-smmu arm-smmu.0.auto:        coherent table walk
>> [    3.014979] arm-smmu arm-smmu.0.auto:        stream matching with 128 register groups
>> [    3.014981] arm-smmu arm-smmu.0.auto:        128 context banks (128 stage-2 only)
>> [    3.014984] arm-smmu arm-smmu.0.auto:        Supported page sizes: 0x60211000
>> [    3.014986] arm-smmu arm-smmu.0.auto:        Stage-2: 48-bit IPA -> 48-bit PA
> 
> Thanks for the explanations and the patch you've sent separately.
> 
> I have a side question, why is the dev_name() of your SMMU set to
> "arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?

This is an ACPI-based machine, where platform device discovery and 
creation is... different :)

SMMUs are among those managed by drivers/acpi/arm64/iort.c

> I'm asking because I would like to study the mechanism through which
> your SMMU platform device get probed, to make sure that it's not
> possible, during shutdown, for both platform_driver :: shutdown()
> and platform_driver :: remove() methods to get called by the driver core.
> This is generally not disallowed, and even possible if the entity who
> registers these platform devices has its ->shutdown() method pointing
> at ->remove().

Yikes, I'd very much hope that that's not a thing!

Cheers,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
  2023-01-03 18:12           ` Robin Murphy
@ 2023-01-05 12:04             ` Vladimir Oltean
  -1 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2023-01-05 12:04 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On Tue, Jan 03, 2023 at 06:12:29PM +0000, Robin Murphy wrote:
> > I have a side question, why is the dev_name() of your SMMU set to
> > "arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?
> 
> This is an ACPI-based machine, where platform device discovery and creation
> is... different :)
> 
> SMMUs are among those managed by drivers/acpi/arm64/iort.c
> 
> > I'm asking because I would like to study the mechanism through which
> > your SMMU platform device get probed, to make sure that it's not
> > possible, during shutdown, for both platform_driver :: shutdown()
> > and platform_driver :: remove() methods to get called by the driver core.
> > This is generally not disallowed, and even possible if the entity who
> > registers these platform devices has its ->shutdown() method pointing
> > at ->remove().
> 
> Yikes, I'd very much hope that that's not a thing!

Ah, ok. Appears to be fine. Looking at drivers/acpi/arm64/iort.c,
it seems that no one is removing those platform devices.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent
@ 2023-01-05 12:04             ` Vladimir Oltean
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Oltean @ 2023-01-05 12:04 UTC (permalink / raw)
  To: Robin Murphy
  Cc: devicetree, iommu, Laurentiu Tudor, Will Deacon,
	linux-arm-kernel, Shawn Guo, Li Yang, Rob Herring,
	Krzysztof Kozlowski, linux-kernel, Michael Walle

On Tue, Jan 03, 2023 at 06:12:29PM +0000, Robin Murphy wrote:
> > I have a side question, why is the dev_name() of your SMMU set to
> > "arm-smmu.0.auto" (determined by PLATFORM_DEVID_AUTO if I'm not mistaken)?
> 
> This is an ACPI-based machine, where platform device discovery and creation
> is... different :)
> 
> SMMUs are among those managed by drivers/acpi/arm64/iort.c
> 
> > I'm asking because I would like to study the mechanism through which
> > your SMMU platform device get probed, to make sure that it's not
> > possible, during shutdown, for both platform_driver :: shutdown()
> > and platform_driver :: remove() methods to get called by the driver core.
> > This is generally not disallowed, and even possible if the entity who
> > registers these platform devices has its ->shutdown() method pointing
> > at ->remove().
> 
> Yikes, I'd very much hope that that's not a thing!

Ah, ok. Appears to be fine. Looking at drivers/acpi/arm64/iort.c,
it seems that no one is removing those platform devices.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-01-05 22:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-08 15:15 [RFC PATCH] arm64: dts: ls1028a: mark ARM SMMU as DMA coherent Vladimir Oltean
2022-12-08 15:15 ` Vladimir Oltean
2022-12-08 19:01 ` Robin Murphy
2022-12-08 19:01   ` Robin Murphy
2022-12-14 16:53   ` Vladimir Oltean
2022-12-14 16:53     ` Vladimir Oltean
2022-12-14 20:33     ` Robin Murphy
2022-12-14 20:33       ` Robin Murphy
2022-12-19 12:16       ` Vladimir Oltean
2022-12-19 12:16         ` Vladimir Oltean
2023-01-03 18:12         ` Robin Murphy
2023-01-03 18:12           ` Robin Murphy
2023-01-05 12:04           ` Vladimir Oltean
2023-01-05 12:04             ` Vladimir Oltean

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.