All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30  8:03 ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: Ben Skeggs, Christoph Hellwig, Russell King
  Cc: nouveau, Joerg Roedel, iommu, dri-devel, Jordan Crouse,
	Daniel Vetter, linux-tegra, Robin Murphy, linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

An unfortunate interaction between the 32-bit ARM DMA/IOMMU mapping code
and Tegra SMMU driver changes to support IOMMU groups introduced a boot-
time regression on Tegra124. This was caught very late because none of
the standard configurations that are tested on Tegra enable the ARM DMA/
IOMMU mapping code since it is not needed.

The reason for the failure is that the GPU found on Tegra uses a special
bit in physical addresses to determine whether or not a buffer is mapped
through the SMMU. In order to achieve this, the Nouveau driver needs to
explicitly understand which buffers are mapped through the SMMU and
which aren't. Hiding usage of the SMMU behind the DMA API is bound to
fail because the knowledge doesn't exist. Furthermore, the GPU has its
own IOMMU and in most cases doesn't need buffers to be physically or
virtually contiguous. One notable exception is for compressible buffers
which need to be mapped with large pages, which in turn require all the
small pages in a large page to be contiguous. This can be achieved with
an SMMU mapping, though it isn't currently supported in Nouveau. Since
Translating through the SMMU is unnecessary and can have a negative
impact on performance for the common case, so we want to avoid it when
possible.

This series of patches adds a 32-bit ARM specific API that allows a
driver to detach the device from the DMA/IOMMU mapping so that it can
provide its own implementation for dealing with the SMMU. The second
patch makes use of that new API in the Nouveau driver to fix the
regression.

Thierry

Thierry Reding (2):
  ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

 arch/arm/include/asm/dma-mapping.h               |  3 +++
 arch/arm/mm/dma-mapping-nommu.c                  |  4 ++++
 arch/arm/mm/dma-mapping.c                        | 16 ++++++++++++++++
 .../gpu/drm/nouveau/nvkm/engine/device/tegra.c   |  5 +++++
 4 files changed, 28 insertions(+)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 0/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30  8:03 ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

An unfortunate interaction between the 32-bit ARM DMA/IOMMU mapping code
and Tegra SMMU driver changes to support IOMMU groups introduced a boot-
time regression on Tegra124. This was caught very late because none of
the standard configurations that are tested on Tegra enable the ARM DMA/
IOMMU mapping code since it is not needed.

The reason for the failure is that the GPU found on Tegra uses a special
bit in physical addresses to determine whether or not a buffer is mapped
through the SMMU. In order to achieve this, the Nouveau driver needs to
explicitly understand which buffers are mapped through the SMMU and
which aren't. Hiding usage of the SMMU behind the DMA API is bound to
fail because the knowledge doesn't exist. Furthermore, the GPU has its
own IOMMU and in most cases doesn't need buffers to be physically or
virtually contiguous. One notable exception is for compressible buffers
which need to be mapped with large pages, which in turn require all the
small pages in a large page to be contiguous. This can be achieved with
an SMMU mapping, though it isn't currently supported in Nouveau. Since
Translating through the SMMU is unnecessary and can have a negative
impact on performance for the common case, so we want to avoid it when
possible.

This series of patches adds a 32-bit ARM specific API that allows a
driver to detach the device from the DMA/IOMMU mapping so that it can
provide its own implementation for dealing with the SMMU. The second
patch makes use of that new API in the Nouveau driver to fix the
regression.

Thierry

Thierry Reding (2):
  ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping

 arch/arm/include/asm/dma-mapping.h               |  3 +++
 arch/arm/mm/dma-mapping-nommu.c                  |  4 ++++
 arch/arm/mm/dma-mapping.c                        | 16 ++++++++++++++++
 .../gpu/drm/nouveau/nvkm/engine/device/tegra.c   |  5 +++++
 4 files changed, 28 insertions(+)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30  8:03 ` Thierry Reding
@ 2018-05-30  8:03   ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: Ben Skeggs, Christoph Hellwig, Russell King
  Cc: nouveau, Joerg Roedel, iommu, dri-devel, Jordan Crouse,
	Daniel Vetter, linux-tegra, Robin Murphy, linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Implement this function to enable drivers from detaching from any IOMMU
domains that architecture code might have attached them to so that they
can take exclusive control of the IOMMU via the IOMMU API.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- make API 32-bit ARM specific
- avoid extra local variable

Changes in v2:
- fix compilation

 arch/arm/include/asm/dma-mapping.h |  3 +++
 arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
 arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 8436f6ade57d..5960e9f3a9d0 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 #define arch_teardown_dma_ops arch_teardown_dma_ops
 extern void arch_teardown_dma_ops(struct device *dev);
 
+#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
+extern void arm_dma_iommu_detach_device(struct device *dev);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index f448a0663b10..eb781369377b 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 void arch_teardown_dma_ops(struct device *dev)
 {
 }
+
+void arm_dma_iommu_detach_device(struct device *dev)
+{
+}
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index af27f1c22d93..6d8af08b3e7d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
 
 	arm_teardown_iommu_dma_ops(dev);
 }
+
+void arm_dma_iommu_detach_device(struct device *dev)
+{
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+	if (!mapping)
+		return;
+
+	arm_iommu_release_mapping(mapping);
+	arm_iommu_detach_device(dev);
+
+	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
+#endif
+}
+EXPORT_SYMBOL(arm_dma_iommu_detach_device);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30  8:03   ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Implement this function to enable drivers from detaching from any IOMMU
domains that architecture code might have attached them to so that they
can take exclusive control of the IOMMU via the IOMMU API.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- make API 32-bit ARM specific
- avoid extra local variable

Changes in v2:
- fix compilation

 arch/arm/include/asm/dma-mapping.h |  3 +++
 arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
 arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 8436f6ade57d..5960e9f3a9d0 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 #define arch_teardown_dma_ops arch_teardown_dma_ops
 extern void arch_teardown_dma_ops(struct device *dev);
 
+#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
+extern void arm_dma_iommu_detach_device(struct device *dev);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index f448a0663b10..eb781369377b 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 void arch_teardown_dma_ops(struct device *dev)
 {
 }
+
+void arm_dma_iommu_detach_device(struct device *dev)
+{
+}
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index af27f1c22d93..6d8af08b3e7d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
 
 	arm_teardown_iommu_dma_ops(dev);
 }
+
+void arm_dma_iommu_detach_device(struct device *dev)
+{
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+	if (!mapping)
+		return;
+
+	arm_iommu_release_mapping(mapping);
+	arm_iommu_detach_device(dev);
+
+	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
+#endif
+}
+EXPORT_SYMBOL(arm_dma_iommu_detach_device);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30  8:03 ` Thierry Reding
@ 2018-05-30  8:03   ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: Ben Skeggs, Christoph Hellwig, Russell King
  Cc: nouveau, Joerg Roedel, iommu, dri-devel, Jordan Crouse,
	Daniel Vetter, linux-tegra, Robin Murphy, linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation. One
exception to this are compressible buffers which need large pages. In
order to enable these large pages, multiple small pages will have to be
combined into one large (I/O virtually contiguous) mapping via the
IOMMU. However, that is a topic outside the scope of this fix and isn't
currently supported. An implementation will want to explicitly create
these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
mapping would still be required.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- clarify the use of IOMMU mapping for compressible buffers
- squash multiple patches into this

 drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 78597da6313a..d0538af1b967 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
+#if IS_ENABLED(CONFIG_ARM)
+	/* make sure we can use the IOMMU exclusively */
+	arm_dma_iommu_detach_device(dev);
+#endif
+
 	if (!tdev->func->iommu_bit)
 		return;
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30  8:03   ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30  8:03 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation. One
exception to this are compressible buffers which need large pages. In
order to enable these large pages, multiple small pages will have to be
combined into one large (I/O virtually contiguous) mapping via the
IOMMU. However, that is a topic outside the scope of this fix and isn't
currently supported. An implementation will want to explicitly create
these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
mapping would still be required.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v3:
- clarify the use of IOMMU mapping for compressible buffers
- squash multiple patches into this

 drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 78597da6313a..d0538af1b967 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
+#if IS_ENABLED(CONFIG_ARM)
+	/* make sure we can use the IOMMU exclusively */
+	arm_dma_iommu_detach_device(dev);
+#endif
+
 	if (!tdev->func->iommu_bit)
 		return;
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30  8:03   ` Thierry Reding
@ 2018-05-30  9:59       ` Robin Murphy
  -1 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30  9:59 UTC (permalink / raw)
  To: Thierry Reding, Ben Skeggs, Christoph Hellwig, Russell King
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/05/18 09:03, Thierry Reding wrote:
> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> 
> Implement this function to enable drivers from detaching from any IOMMU
> domains that architecture code might have attached them to so that they
> can take exclusive control of the IOMMU via the IOMMU API.
> 
> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
> Changes in v3:
> - make API 32-bit ARM specific
> - avoid extra local variable
> 
> Changes in v2:
> - fix compilation
> 
>   arch/arm/include/asm/dma-mapping.h |  3 +++
>   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
>   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
>   3 files changed, 23 insertions(+)
> 
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 8436f6ade57d..5960e9f3a9d0 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   #define arch_teardown_dma_ops arch_teardown_dma_ops
>   extern void arch_teardown_dma_ops(struct device *dev);
>   
> +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> +extern void arm_dma_iommu_detach_device(struct device *dev);
> +
>   /* do not use this function in a driver */
>   static inline bool is_device_dma_coherent(struct device *dev)
>   {
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index f448a0663b10..eb781369377b 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   void arch_teardown_dma_ops(struct device *dev)
>   {
>   }
> +
> +void arm_dma_iommu_detach_device(struct device *dev)
> +{
> +}
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index af27f1c22d93..6d8af08b3e7d 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
>   
>   	arm_teardown_iommu_dma_ops(dev);
>   }
> +
> +void arm_dma_iommu_detach_device(struct device *dev)
> +{
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +
> +	if (!mapping)
> +		return;
> +
> +	arm_iommu_release_mapping(mapping);

Potentially freeing the mapping before you try to operate on it is never 
the best idea. Plus arm_iommu_detach_device() already releases a 
reference appropriately anyway, so it's a double-free.

> +	arm_iommu_detach_device(dev);
> +
> +	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
> +#endif
> +}
> +EXPORT_SYMBOL(arm_dma_iommu_detach_device);

I really don't see why we need an extra function that essentially just 
duplicates arm_iommu_detach_device(). The only real difference here is 
that here you reset the DMA ops more appropriately, but I think the 
existing function should be fixed to do that anyway, since 
set_dma_ops(dev, NULL) now just behaves as an unconditional fallback to 
the noncoherent arm_dma_ops, which clearly isn't always right.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30  9:59       ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/05/18 09:03, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Implement this function to enable drivers from detaching from any IOMMU
> domains that architecture code might have attached them to so that they
> can take exclusive control of the IOMMU via the IOMMU API.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v3:
> - make API 32-bit ARM specific
> - avoid extra local variable
> 
> Changes in v2:
> - fix compilation
> 
>   arch/arm/include/asm/dma-mapping.h |  3 +++
>   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
>   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
>   3 files changed, 23 insertions(+)
> 
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 8436f6ade57d..5960e9f3a9d0 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   #define arch_teardown_dma_ops arch_teardown_dma_ops
>   extern void arch_teardown_dma_ops(struct device *dev);
>   
> +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> +extern void arm_dma_iommu_detach_device(struct device *dev);
> +
>   /* do not use this function in a driver */
>   static inline bool is_device_dma_coherent(struct device *dev)
>   {
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index f448a0663b10..eb781369377b 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   void arch_teardown_dma_ops(struct device *dev)
>   {
>   }
> +
> +void arm_dma_iommu_detach_device(struct device *dev)
> +{
> +}
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index af27f1c22d93..6d8af08b3e7d 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
>   
>   	arm_teardown_iommu_dma_ops(dev);
>   }
> +
> +void arm_dma_iommu_detach_device(struct device *dev)
> +{
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +
> +	if (!mapping)
> +		return;
> +
> +	arm_iommu_release_mapping(mapping);

Potentially freeing the mapping before you try to operate on it is never 
the best idea. Plus arm_iommu_detach_device() already releases a 
reference appropriately anyway, so it's a double-free.

> +	arm_iommu_detach_device(dev);
> +
> +	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
> +#endif
> +}
> +EXPORT_SYMBOL(arm_dma_iommu_detach_device);

I really don't see why we need an extra function that essentially just 
duplicates arm_iommu_detach_device(). The only real difference here is 
that here you reset the DMA ops more appropriately, but I think the 
existing function should be fixed to do that anyway, since 
set_dma_ops(dev, NULL) now just behaves as an unconditional fallback to 
the noncoherent arm_dma_ops, which clearly isn't always right.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30  8:03   ` Thierry Reding
@ 2018-05-30 10:30       ` Robin Murphy
  -1 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 10:30 UTC (permalink / raw)
  To: Thierry Reding, Ben Skeggs, Christoph Hellwig, Russell King
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/05/18 09:03, Thierry Reding wrote:
> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> 
> Depending on the kernel configuration, early ARM architecture setup code
> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> backed buffers (a special bit in the GPU's MMU page tables indicates the
> memory path to take: via the SMMU or directly to the memory controller).
> Transparently backing DMA memory with an IOMMU prevents Nouveau from
> properly handling such memory accesses and causes memory access faults.
> 
> As a side-note: buffers other than those allocated in instance memory
> don't need to be physically contiguous from the GPU's perspective since
> the GPU can map them into contiguous buffers using its own MMU. Mapping
> these buffers through the IOMMU is unnecessary and will even lead to
> performance degradation because of the additional translation. One
> exception to this are compressible buffers which need large pages. In
> order to enable these large pages, multiple small pages will have to be
> combined into one large (I/O virtually contiguous) mapping via the
> IOMMU. However, that is a topic outside the scope of this fix and isn't
> currently supported. An implementation will want to explicitly create
> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> mapping would still be required.

I wonder if it might make sense to have a hook in iommu_attach_device() 
to notify the arch DMA API code when moving devices between unmanaged 
and DMA ops domains? That seems like it might be the most low-impact way 
to address the overall problem long-term.

> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
> Changes in v3:
> - clarify the use of IOMMU mapping for compressible buffers
> - squash multiple patches into this
> 
>   drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> index 78597da6313a..d0538af1b967 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>   	unsigned long pgsize_bitmap;
>   	int ret;
>   
> +#if IS_ENABLED(CONFIG_ARM)

Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?

> +	/* make sure we can use the IOMMU exclusively */
> +	arm_dma_iommu_detach_device(dev);

As before, I would just use the existing infrastructure the same way the 
Exynos DRM driver currently does in __exynos_iommu_attach() (albeit 
without then reattaching to another DMA ops mapping).

Robin.

> +#endif
> +
>   	if (!tdev->func->iommu_bit)
>   		return;
>   
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30 10:30       ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 10:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/05/18 09:03, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Depending on the kernel configuration, early ARM architecture setup code
> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> backed buffers (a special bit in the GPU's MMU page tables indicates the
> memory path to take: via the SMMU or directly to the memory controller).
> Transparently backing DMA memory with an IOMMU prevents Nouveau from
> properly handling such memory accesses and causes memory access faults.
> 
> As a side-note: buffers other than those allocated in instance memory
> don't need to be physically contiguous from the GPU's perspective since
> the GPU can map them into contiguous buffers using its own MMU. Mapping
> these buffers through the IOMMU is unnecessary and will even lead to
> performance degradation because of the additional translation. One
> exception to this are compressible buffers which need large pages. In
> order to enable these large pages, multiple small pages will have to be
> combined into one large (I/O virtually contiguous) mapping via the
> IOMMU. However, that is a topic outside the scope of this fix and isn't
> currently supported. An implementation will want to explicitly create
> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> mapping would still be required.

I wonder if it might make sense to have a hook in iommu_attach_device() 
to notify the arch DMA API code when moving devices between unmanaged 
and DMA ops domains? That seems like it might be the most low-impact way 
to address the overall problem long-term.

> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v3:
> - clarify the use of IOMMU mapping for compressible buffers
> - squash multiple patches into this
> 
>   drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> index 78597da6313a..d0538af1b967 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>   	unsigned long pgsize_bitmap;
>   	int ret;
>   
> +#if IS_ENABLED(CONFIG_ARM)

Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?

> +	/* make sure we can use the IOMMU exclusively */
> +	arm_dma_iommu_detach_device(dev);

As before, I would just use the existing infrastructure the same way the 
Exynos DRM driver currently does in __exynos_iommu_attach() (albeit 
without then reattaching to another DMA ops mapping).

Robin.

> +#endif
> +
>   	if (!tdev->func->iommu_bit)
>   		return;
>   
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30  9:59       ` Robin Murphy
@ 2018-05-30 12:54           ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 12:54 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 4320 bytes --]

On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> On 30/05/18 09:03, Thierry Reding wrote:
> > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > 
> > Implement this function to enable drivers from detaching from any IOMMU
> > domains that architecture code might have attached them to so that they
> > can take exclusive control of the IOMMU via the IOMMU API.
> > 
> > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > ---
> > Changes in v3:
> > - make API 32-bit ARM specific
> > - avoid extra local variable
> > 
> > Changes in v2:
> > - fix compilation
> > 
> >   arch/arm/include/asm/dma-mapping.h |  3 +++
> >   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> >   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> >   3 files changed, 23 insertions(+)
> > 
> > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > index 8436f6ade57d..5960e9f3a9d0 100644
> > --- a/arch/arm/include/asm/dma-mapping.h
> > +++ b/arch/arm/include/asm/dma-mapping.h
> > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> >   #define arch_teardown_dma_ops arch_teardown_dma_ops
> >   extern void arch_teardown_dma_ops(struct device *dev);
> > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > +
> >   /* do not use this function in a driver */
> >   static inline bool is_device_dma_coherent(struct device *dev)
> >   {
> > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > index f448a0663b10..eb781369377b 100644
> > --- a/arch/arm/mm/dma-mapping-nommu.c
> > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> >   void arch_teardown_dma_ops(struct device *dev)
> >   {
> >   }
> > +
> > +void arm_dma_iommu_detach_device(struct device *dev)
> > +{
> > +}
> > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > index af27f1c22d93..6d8af08b3e7d 100644
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> >   	arm_teardown_iommu_dma_ops(dev);
> >   }
> > +
> > +void arm_dma_iommu_detach_device(struct device *dev)
> > +{
> > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > +
> > +	if (!mapping)
> > +		return;
> > +
> > +	arm_iommu_release_mapping(mapping);
> 
> Potentially freeing the mapping before you try to operate on it is never the
> best idea. Plus arm_iommu_detach_device() already releases a reference
> appropriately anyway, so it's a double-free.

But the reference released by arm_iommu_detach_device() is to balance
out the reference acquired by arm_iommu_attach_device(), isn't it? In
the above, the arm_iommu_release_mapping() is supposed to drop the
final reference which was obtained by arm_iommu_create_mapping(). The
mapping shouldn't go away irrespective of the order in which these
will be called.

> > +	arm_iommu_detach_device(dev);
> > +
> > +	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
> > +#endif
> > +}
> > +EXPORT_SYMBOL(arm_dma_iommu_detach_device);
> 
> I really don't see why we need an extra function that essentially just
> duplicates arm_iommu_detach_device(). The only real difference here is that
> here you reset the DMA ops more appropriately, but I think the existing
> function should be fixed to do that anyway, since set_dma_ops(dev, NULL) now
> just behaves as an unconditional fallback to the noncoherent arm_dma_ops,
> which clearly isn't always right.

The idea behind making this an extra function is that we can call it
unconditionally and it will do the right things. Granted, that already
doesn't quite work as elegantly anymore as I had hoped since this is
now an ARM specific function, so we need an #ifdef guard anyway.

I don't care very strongly either way, so if you and Christoph can both
agree that we just want arm_iommu_detach_device() to call the proper
variant of set_dma_ops(), that's fine with me, too.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30 12:54           ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 12:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> On 30/05/18 09:03, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Implement this function to enable drivers from detaching from any IOMMU
> > domains that architecture code might have attached them to so that they
> > can take exclusive control of the IOMMU via the IOMMU API.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> > Changes in v3:
> > - make API 32-bit ARM specific
> > - avoid extra local variable
> > 
> > Changes in v2:
> > - fix compilation
> > 
> >   arch/arm/include/asm/dma-mapping.h |  3 +++
> >   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> >   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> >   3 files changed, 23 insertions(+)
> > 
> > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > index 8436f6ade57d..5960e9f3a9d0 100644
> > --- a/arch/arm/include/asm/dma-mapping.h
> > +++ b/arch/arm/include/asm/dma-mapping.h
> > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> >   #define arch_teardown_dma_ops arch_teardown_dma_ops
> >   extern void arch_teardown_dma_ops(struct device *dev);
> > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > +
> >   /* do not use this function in a driver */
> >   static inline bool is_device_dma_coherent(struct device *dev)
> >   {
> > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > index f448a0663b10..eb781369377b 100644
> > --- a/arch/arm/mm/dma-mapping-nommu.c
> > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> >   void arch_teardown_dma_ops(struct device *dev)
> >   {
> >   }
> > +
> > +void arm_dma_iommu_detach_device(struct device *dev)
> > +{
> > +}
> > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > index af27f1c22d93..6d8af08b3e7d 100644
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> >   	arm_teardown_iommu_dma_ops(dev);
> >   }
> > +
> > +void arm_dma_iommu_detach_device(struct device *dev)
> > +{
> > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > +
> > +	if (!mapping)
> > +		return;
> > +
> > +	arm_iommu_release_mapping(mapping);
> 
> Potentially freeing the mapping before you try to operate on it is never the
> best idea. Plus arm_iommu_detach_device() already releases a reference
> appropriately anyway, so it's a double-free.

But the reference released by arm_iommu_detach_device() is to balance
out the reference acquired by arm_iommu_attach_device(), isn't it? In
the above, the arm_iommu_release_mapping() is supposed to drop the
final reference which was obtained by arm_iommu_create_mapping(). The
mapping shouldn't go away irrespective of the order in which these
will be called.

> > +	arm_iommu_detach_device(dev);
> > +
> > +	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
> > +#endif
> > +}
> > +EXPORT_SYMBOL(arm_dma_iommu_detach_device);
> 
> I really don't see why we need an extra function that essentially just
> duplicates arm_iommu_detach_device(). The only real difference here is that
> here you reset the DMA ops more appropriately, but I think the existing
> function should be fixed to do that anyway, since set_dma_ops(dev, NULL) now
> just behaves as an unconditional fallback to the noncoherent arm_dma_ops,
> which clearly isn't always right.

The idea behind making this an extra function is that we can call it
unconditionally and it will do the right things. Granted, that already
doesn't quite work as elegantly anymore as I had hoped since this is
now an ARM specific function, so we need an #ifdef guard anyway.

I don't care very strongly either way, so if you and Christoph can both
agree that we just want arm_iommu_detach_device() to call the proper
variant of set_dma_ops(), that's fine with me, too.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180530/38fbec80/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30 10:30       ` Robin Murphy
@ 2018-05-30 13:00           ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 3561 bytes --]

On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
> On 30/05/18 09:03, Thierry Reding wrote:
> > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > 
> > Depending on the kernel configuration, early ARM architecture setup code
> > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > memory path to take: via the SMMU or directly to the memory controller).
> > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > properly handling such memory accesses and causes memory access faults.
> > 
> > As a side-note: buffers other than those allocated in instance memory
> > don't need to be physically contiguous from the GPU's perspective since
> > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > these buffers through the IOMMU is unnecessary and will even lead to
> > performance degradation because of the additional translation. One
> > exception to this are compressible buffers which need large pages. In
> > order to enable these large pages, multiple small pages will have to be
> > combined into one large (I/O virtually contiguous) mapping via the
> > IOMMU. However, that is a topic outside the scope of this fix and isn't
> > currently supported. An implementation will want to explicitly create
> > these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> > mapping would still be required.
> 
> I wonder if it might make sense to have a hook in iommu_attach_device() to
> notify the arch DMA API code when moving devices between unmanaged and DMA
> ops domains? That seems like it might be the most low-impact way to address
> the overall problem long-term.
> 
> > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > ---
> > Changes in v3:
> > - clarify the use of IOMMU mapping for compressible buffers
> > - squash multiple patches into this
> > 
> >   drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > index 78597da6313a..d0538af1b967 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
> >   	unsigned long pgsize_bitmap;
> >   	int ret;
> > +#if IS_ENABLED(CONFIG_ARM)
> 
> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?

Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
a guard to make sure we don't call the function when it isn't available,
but it may still not do anything.

> 
> > +	/* make sure we can use the IOMMU exclusively */
> > +	arm_dma_iommu_detach_device(dev);
> 
> As before, I would just use the existing infrastructure the same way the
> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
> then reattaching to another DMA ops mapping).

That's pretty much what I initially did and which was shot down by
Christoph. As I said earlier, at this point I don't really care what
color the shed will be. Can you and Christoph come to an agreement
on what it should be?

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30 13:00           ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
> On 30/05/18 09:03, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Depending on the kernel configuration, early ARM architecture setup code
> > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > memory path to take: via the SMMU or directly to the memory controller).
> > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > properly handling such memory accesses and causes memory access faults.
> > 
> > As a side-note: buffers other than those allocated in instance memory
> > don't need to be physically contiguous from the GPU's perspective since
> > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > these buffers through the IOMMU is unnecessary and will even lead to
> > performance degradation because of the additional translation. One
> > exception to this are compressible buffers which need large pages. In
> > order to enable these large pages, multiple small pages will have to be
> > combined into one large (I/O virtually contiguous) mapping via the
> > IOMMU. However, that is a topic outside the scope of this fix and isn't
> > currently supported. An implementation will want to explicitly create
> > these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> > mapping would still be required.
> 
> I wonder if it might make sense to have a hook in iommu_attach_device() to
> notify the arch DMA API code when moving devices between unmanaged and DMA
> ops domains? That seems like it might be the most low-impact way to address
> the overall problem long-term.
> 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> > Changes in v3:
> > - clarify the use of IOMMU mapping for compressible buffers
> > - squash multiple patches into this
> > 
> >   drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > index 78597da6313a..d0538af1b967 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
> >   	unsigned long pgsize_bitmap;
> >   	int ret;
> > +#if IS_ENABLED(CONFIG_ARM)
> 
> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?

Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
a guard to make sure we don't call the function when it isn't available,
but it may still not do anything.

> 
> > +	/* make sure we can use the IOMMU exclusively */
> > +	arm_dma_iommu_detach_device(dev);
> 
> As before, I would just use the existing infrastructure the same way the
> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
> then reattaching to another DMA ops mapping).

That's pretty much what I initially did and which was shot down by
Christoph. As I said earlier, at this point I don't really care what
color the shed will be. Can you and Christoph come to an agreement
on what it should be?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180530/b6e5bbcc/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30 12:54           ` Thierry Reding
@ 2018-05-30 13:12             ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 4053 bytes --]

On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
> On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> > On 30/05/18 09:03, Thierry Reding wrote:
> > > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > 
> > > Implement this function to enable drivers from detaching from any IOMMU
> > > domains that architecture code might have attached them to so that they
> > > can take exclusive control of the IOMMU via the IOMMU API.
> > > 
> > > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > ---
> > > Changes in v3:
> > > - make API 32-bit ARM specific
> > > - avoid extra local variable
> > > 
> > > Changes in v2:
> > > - fix compilation
> > > 
> > >   arch/arm/include/asm/dma-mapping.h |  3 +++
> > >   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> > >   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> > >   3 files changed, 23 insertions(+)
> > > 
> > > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > > index 8436f6ade57d..5960e9f3a9d0 100644
> > > --- a/arch/arm/include/asm/dma-mapping.h
> > > +++ b/arch/arm/include/asm/dma-mapping.h
> > > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > >   #define arch_teardown_dma_ops arch_teardown_dma_ops
> > >   extern void arch_teardown_dma_ops(struct device *dev);
> > > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > > +
> > >   /* do not use this function in a driver */
> > >   static inline bool is_device_dma_coherent(struct device *dev)
> > >   {
> > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > > index f448a0663b10..eb781369377b 100644
> > > --- a/arch/arm/mm/dma-mapping-nommu.c
> > > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > >   void arch_teardown_dma_ops(struct device *dev)
> > >   {
> > >   }
> > > +
> > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > +{
> > > +}
> > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > > index af27f1c22d93..6d8af08b3e7d 100644
> > > --- a/arch/arm/mm/dma-mapping.c
> > > +++ b/arch/arm/mm/dma-mapping.c
> > > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> > >   	arm_teardown_iommu_dma_ops(dev);
> > >   }
> > > +
> > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > +{
> > > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > > +
> > > +	if (!mapping)
> > > +		return;
> > > +
> > > +	arm_iommu_release_mapping(mapping);
> > 
> > Potentially freeing the mapping before you try to operate on it is never the
> > best idea. Plus arm_iommu_detach_device() already releases a reference
> > appropriately anyway, so it's a double-free.
> 
> But the reference released by arm_iommu_detach_device() is to balance
> out the reference acquired by arm_iommu_attach_device(), isn't it? In
> the above, the arm_iommu_release_mapping() is supposed to drop the
> final reference which was obtained by arm_iommu_create_mapping(). The
> mapping shouldn't go away irrespective of the order in which these
> will be called.

Going over the DMA/IOMMU code I just remembered that I drew inspiration
from arm_teardown_iommu_dma_ops() for the initial proposal which also
calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
That said, one other possibility to implement this would be to export
the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
and use that instead. linux/dma-mapping.h implements a stub for
architectures that don't provide one, so it should work without any
#ifdef guards.

That combined with the set_dma_ops() fix in arm_iommu_detach_device()
should fix this pretty nicely.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30 13:12             ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
> On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> > On 30/05/18 09:03, Thierry Reding wrote:
> > > From: Thierry Reding <treding@nvidia.com>
> > > 
> > > Implement this function to enable drivers from detaching from any IOMMU
> > > domains that architecture code might have attached them to so that they
> > > can take exclusive control of the IOMMU via the IOMMU API.
> > > 
> > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > ---
> > > Changes in v3:
> > > - make API 32-bit ARM specific
> > > - avoid extra local variable
> > > 
> > > Changes in v2:
> > > - fix compilation
> > > 
> > >   arch/arm/include/asm/dma-mapping.h |  3 +++
> > >   arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> > >   arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> > >   3 files changed, 23 insertions(+)
> > > 
> > > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > > index 8436f6ade57d..5960e9f3a9d0 100644
> > > --- a/arch/arm/include/asm/dma-mapping.h
> > > +++ b/arch/arm/include/asm/dma-mapping.h
> > > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > >   #define arch_teardown_dma_ops arch_teardown_dma_ops
> > >   extern void arch_teardown_dma_ops(struct device *dev);
> > > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > > +
> > >   /* do not use this function in a driver */
> > >   static inline bool is_device_dma_coherent(struct device *dev)
> > >   {
> > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > > index f448a0663b10..eb781369377b 100644
> > > --- a/arch/arm/mm/dma-mapping-nommu.c
> > > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > >   void arch_teardown_dma_ops(struct device *dev)
> > >   {
> > >   }
> > > +
> > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > +{
> > > +}
> > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > > index af27f1c22d93..6d8af08b3e7d 100644
> > > --- a/arch/arm/mm/dma-mapping.c
> > > +++ b/arch/arm/mm/dma-mapping.c
> > > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> > >   	arm_teardown_iommu_dma_ops(dev);
> > >   }
> > > +
> > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > +{
> > > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > > +
> > > +	if (!mapping)
> > > +		return;
> > > +
> > > +	arm_iommu_release_mapping(mapping);
> > 
> > Potentially freeing the mapping before you try to operate on it is never the
> > best idea. Plus arm_iommu_detach_device() already releases a reference
> > appropriately anyway, so it's a double-free.
> 
> But the reference released by arm_iommu_detach_device() is to balance
> out the reference acquired by arm_iommu_attach_device(), isn't it? In
> the above, the arm_iommu_release_mapping() is supposed to drop the
> final reference which was obtained by arm_iommu_create_mapping(). The
> mapping shouldn't go away irrespective of the order in which these
> will be called.

Going over the DMA/IOMMU code I just remembered that I drew inspiration
from arm_teardown_iommu_dma_ops() for the initial proposal which also
calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
That said, one other possibility to implement this would be to export
the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
and use that instead. linux/dma-mapping.h implements a stub for
architectures that don't provide one, so it should work without any
#ifdef guards.

That combined with the set_dma_ops() fix in arm_iommu_detach_device()
should fix this pretty nicely.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180530/64c953e2/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30 13:00           ` Thierry Reding
@ 2018-05-30 13:30             ` Robin Murphy
  -1 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:30 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/05/18 14:00, Thierry Reding wrote:
> On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
>> On 30/05/18 09:03, Thierry Reding wrote:
>>> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>
>>> Depending on the kernel configuration, early ARM architecture setup code
>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>> memory path to take: via the SMMU or directly to the memory controller).
>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>> properly handling such memory accesses and causes memory access faults.
>>>
>>> As a side-note: buffers other than those allocated in instance memory
>>> don't need to be physically contiguous from the GPU's perspective since
>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>> these buffers through the IOMMU is unnecessary and will even lead to
>>> performance degradation because of the additional translation. One
>>> exception to this are compressible buffers which need large pages. In
>>> order to enable these large pages, multiple small pages will have to be
>>> combined into one large (I/O virtually contiguous) mapping via the
>>> IOMMU. However, that is a topic outside the scope of this fix and isn't
>>> currently supported. An implementation will want to explicitly create
>>> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
>>> mapping would still be required.
>>
>> I wonder if it might make sense to have a hook in iommu_attach_device() to
>> notify the arch DMA API code when moving devices between unmanaged and DMA
>> ops domains? That seems like it might be the most low-impact way to address
>> the overall problem long-term.
>>
>>> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> Changes in v3:
>>> - clarify the use of IOMMU mapping for compressible buffers
>>> - squash multiple patches into this
>>>
>>>    drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> index 78597da6313a..d0538af1b967 100644
>>> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>>>    	unsigned long pgsize_bitmap;
>>>    	int ret;
>>> +#if IS_ENABLED(CONFIG_ARM)
>>
>> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
> 
> Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
> only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
> a guard to make sure we don't call the function when it isn't available,
> but it may still not do anything.

Calling a function under condition A, which only does anything under 
condition B, when B depends on A, is identical in behaviour to only 
calling the function under condition B, except needlessly harder to follow.

>>> +	/* make sure we can use the IOMMU exclusively */
>>> +	arm_dma_iommu_detach_device(dev);
>>
>> As before, I would just use the existing infrastructure the same way the
>> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
>> then reattaching to another DMA ops mapping).
> 
> That's pretty much what I initially did and which was shot down by
> Christoph. As I said earlier, at this point I don't really care what
> color the shed will be. Can you and Christoph come to an agreement
> on what it should be?

What I was getting at is that arm_iommu_detach_device() already *is* the 
exact function Christoph was asking for, it just needs a minor fix 
instead of adding explicit set_dma_ops() fiddling at its callsites which 
only obfuscates the fact that it's supposed to be responsible for 
resetting the device's DMA ops already.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30 13:30             ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/05/18 14:00, Thierry Reding wrote:
> On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
>> On 30/05/18 09:03, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Depending on the kernel configuration, early ARM architecture setup code
>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>> memory path to take: via the SMMU or directly to the memory controller).
>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>> properly handling such memory accesses and causes memory access faults.
>>>
>>> As a side-note: buffers other than those allocated in instance memory
>>> don't need to be physically contiguous from the GPU's perspective since
>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>> these buffers through the IOMMU is unnecessary and will even lead to
>>> performance degradation because of the additional translation. One
>>> exception to this are compressible buffers which need large pages. In
>>> order to enable these large pages, multiple small pages will have to be
>>> combined into one large (I/O virtually contiguous) mapping via the
>>> IOMMU. However, that is a topic outside the scope of this fix and isn't
>>> currently supported. An implementation will want to explicitly create
>>> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
>>> mapping would still be required.
>>
>> I wonder if it might make sense to have a hook in iommu_attach_device() to
>> notify the arch DMA API code when moving devices between unmanaged and DMA
>> ops domains? That seems like it might be the most low-impact way to address
>> the overall problem long-term.
>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>> Changes in v3:
>>> - clarify the use of IOMMU mapping for compressible buffers
>>> - squash multiple patches into this
>>>
>>>    drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> index 78597da6313a..d0538af1b967 100644
>>> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>>>    	unsigned long pgsize_bitmap;
>>>    	int ret;
>>> +#if IS_ENABLED(CONFIG_ARM)
>>
>> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
> 
> Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
> only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
> a guard to make sure we don't call the function when it isn't available,
> but it may still not do anything.

Calling a function under condition A, which only does anything under 
condition B, when B depends on A, is identical in behaviour to only 
calling the function under condition B, except needlessly harder to follow.

>>> +	/* make sure we can use the IOMMU exclusively */
>>> +	arm_dma_iommu_detach_device(dev);
>>
>> As before, I would just use the existing infrastructure the same way the
>> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
>> then reattaching to another DMA ops mapping).
> 
> That's pretty much what I initially did and which was shot down by
> Christoph. As I said earlier, at this point I don't really care what
> color the shed will be. Can you and Christoph come to an agreement
> on what it should be?

What I was getting at is that arm_iommu_detach_device() already *is* the 
exact function Christoph was asking for, it just needs a minor fix 
instead of adding explicit set_dma_ops() fiddling at its callsites which 
only obfuscates the fact that it's supposed to be responsible for 
resetting the device's DMA ops already.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30 13:30             ` Robin Murphy
@ 2018-05-30 13:41               ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau, Joerg Roedel, Russell King, dri-devel,
	Christoph Hellwig, Jordan Crouse, iommu, Ben Skeggs,
	Daniel Vetter, linux-tegra, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 4684 bytes --]

On Wed, May 30, 2018 at 02:30:51PM +0100, Robin Murphy wrote:
> On 30/05/18 14:00, Thierry Reding wrote:
> > On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
> > > On 30/05/18 09:03, Thierry Reding wrote:
> > > > From: Thierry Reding <treding@nvidia.com>
> > > > 
> > > > Depending on the kernel configuration, early ARM architecture setup code
> > > > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > > > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > > > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > > > memory path to take: via the SMMU or directly to the memory controller).
> > > > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > > > properly handling such memory accesses and causes memory access faults.
> > > > 
> > > > As a side-note: buffers other than those allocated in instance memory
> > > > don't need to be physically contiguous from the GPU's perspective since
> > > > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > > > these buffers through the IOMMU is unnecessary and will even lead to
> > > > performance degradation because of the additional translation. One
> > > > exception to this are compressible buffers which need large pages. In
> > > > order to enable these large pages, multiple small pages will have to be
> > > > combined into one large (I/O virtually contiguous) mapping via the
> > > > IOMMU. However, that is a topic outside the scope of this fix and isn't
> > > > currently supported. An implementation will want to explicitly create
> > > > these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> > > > mapping would still be required.
> > > 
> > > I wonder if it might make sense to have a hook in iommu_attach_device() to
> > > notify the arch DMA API code when moving devices between unmanaged and DMA
> > > ops domains? That seems like it might be the most low-impact way to address
> > > the overall problem long-term.
> > > 
> > > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > > ---
> > > > Changes in v3:
> > > > - clarify the use of IOMMU mapping for compressible buffers
> > > > - squash multiple patches into this
> > > > 
> > > >    drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
> > > >    1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > index 78597da6313a..d0538af1b967 100644
> > > > --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
> > > >    	unsigned long pgsize_bitmap;
> > > >    	int ret;
> > > > +#if IS_ENABLED(CONFIG_ARM)
> > > 
> > > Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
> > 
> > Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
> > only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
> > a guard to make sure we don't call the function when it isn't available,
> > but it may still not do anything.
> 
> Calling a function under condition A, which only does anything under
> condition B, when B depends on A, is identical in behaviour to only calling
> the function under condition B, except needlessly harder to follow.
> 
> > > > +	/* make sure we can use the IOMMU exclusively */
> > > > +	arm_dma_iommu_detach_device(dev);
> > > 
> > > As before, I would just use the existing infrastructure the same way the
> > > Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
> > > then reattaching to another DMA ops mapping).
> > 
> > That's pretty much what I initially did and which was shot down by
> > Christoph. As I said earlier, at this point I don't really care what
> > color the shed will be. Can you and Christoph come to an agreement
> > on what it should be?
> 
> What I was getting at is that arm_iommu_detach_device() already *is* the
> exact function Christoph was asking for, it just needs a minor fix instead
> of adding explicit set_dma_ops() fiddling at its callsites which only
> obfuscates the fact that it's supposed to be responsible for resetting the
> device's DMA ops already.

It still has the downside of callers having to explicitly check for the
existence of a mapping, otherwise they'll cause a warning to be printed
to the kernel log.

That's not all that bad, though. I'll prepare version 4 with those
changes.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30 13:41               ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 30, 2018 at 02:30:51PM +0100, Robin Murphy wrote:
> On 30/05/18 14:00, Thierry Reding wrote:
> > On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
> > > On 30/05/18 09:03, Thierry Reding wrote:
> > > > From: Thierry Reding <treding@nvidia.com>
> > > > 
> > > > Depending on the kernel configuration, early ARM architecture setup code
> > > > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > > > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > > > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > > > memory path to take: via the SMMU or directly to the memory controller).
> > > > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > > > properly handling such memory accesses and causes memory access faults.
> > > > 
> > > > As a side-note: buffers other than those allocated in instance memory
> > > > don't need to be physically contiguous from the GPU's perspective since
> > > > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > > > these buffers through the IOMMU is unnecessary and will even lead to
> > > > performance degradation because of the additional translation. One
> > > > exception to this are compressible buffers which need large pages. In
> > > > order to enable these large pages, multiple small pages will have to be
> > > > combined into one large (I/O virtually contiguous) mapping via the
> > > > IOMMU. However, that is a topic outside the scope of this fix and isn't
> > > > currently supported. An implementation will want to explicitly create
> > > > these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
> > > > mapping would still be required.
> > > 
> > > I wonder if it might make sense to have a hook in iommu_attach_device() to
> > > notify the arch DMA API code when moving devices between unmanaged and DMA
> > > ops domains? That seems like it might be the most low-impact way to address
> > > the overall problem long-term.
> > > 
> > > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > > ---
> > > > Changes in v3:
> > > > - clarify the use of IOMMU mapping for compressible buffers
> > > > - squash multiple patches into this
> > > > 
> > > >    drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
> > > >    1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > index 78597da6313a..d0538af1b967 100644
> > > > --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> > > > @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
> > > >    	unsigned long pgsize_bitmap;
> > > >    	int ret;
> > > > +#if IS_ENABLED(CONFIG_ARM)
> > > 
> > > Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
> > 
> > Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
> > only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
> > a guard to make sure we don't call the function when it isn't available,
> > but it may still not do anything.
> 
> Calling a function under condition A, which only does anything under
> condition B, when B depends on A, is identical in behaviour to only calling
> the function under condition B, except needlessly harder to follow.
> 
> > > > +	/* make sure we can use the IOMMU exclusively */
> > > > +	arm_dma_iommu_detach_device(dev);
> > > 
> > > As before, I would just use the existing infrastructure the same way the
> > > Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
> > > then reattaching to another DMA ops mapping).
> > 
> > That's pretty much what I initially did and which was shot down by
> > Christoph. As I said earlier, at this point I don't really care what
> > color the shed will be. Can you and Christoph come to an agreement
> > on what it should be?
> 
> What I was getting at is that arm_iommu_detach_device() already *is* the
> exact function Christoph was asking for, it just needs a minor fix instead
> of adding explicit set_dma_ops() fiddling at its callsites which only
> obfuscates the fact that it's supposed to be responsible for resetting the
> device's DMA ops already.

It still has the downside of callers having to explicitly check for the
existence of a mapping, otherwise they'll cause a warning to be printed
to the kernel log.

That's not all that bad, though. I'll prepare version 4 with those
changes.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180530/02b467fa/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30 13:12             ` Thierry Reding
@ 2018-05-30 13:42               ` Robin Murphy
  -1 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:42 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/05/18 14:12, Thierry Reding wrote:
> On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
>> On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
>>> On 30/05/18 09:03, Thierry Reding wrote:
>>>> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>>
>>>> Implement this function to enable drivers from detaching from any IOMMU
>>>> domains that architecture code might have attached them to so that they
>>>> can take exclusive control of the IOMMU via the IOMMU API.
>>>>
>>>> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>> Changes in v3:
>>>> - make API 32-bit ARM specific
>>>> - avoid extra local variable
>>>>
>>>> Changes in v2:
>>>> - fix compilation
>>>>
>>>>    arch/arm/include/asm/dma-mapping.h |  3 +++
>>>>    arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
>>>>    arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
>>>>    3 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
>>>> index 8436f6ade57d..5960e9f3a9d0 100644
>>>> --- a/arch/arm/include/asm/dma-mapping.h
>>>> +++ b/arch/arm/include/asm/dma-mapping.h
>>>> @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>>>    #define arch_teardown_dma_ops arch_teardown_dma_ops
>>>>    extern void arch_teardown_dma_ops(struct device *dev);
>>>> +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
>>>> +extern void arm_dma_iommu_detach_device(struct device *dev);
>>>> +
>>>>    /* do not use this function in a driver */
>>>>    static inline bool is_device_dma_coherent(struct device *dev)
>>>>    {
>>>> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
>>>> index f448a0663b10..eb781369377b 100644
>>>> --- a/arch/arm/mm/dma-mapping-nommu.c
>>>> +++ b/arch/arm/mm/dma-mapping-nommu.c
>>>> @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>>>    void arch_teardown_dma_ops(struct device *dev)
>>>>    {
>>>>    }
>>>> +
>>>> +void arm_dma_iommu_detach_device(struct device *dev)
>>>> +{
>>>> +}
>>>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>>>> index af27f1c22d93..6d8af08b3e7d 100644
>>>> --- a/arch/arm/mm/dma-mapping.c
>>>> +++ b/arch/arm/mm/dma-mapping.c
>>>> @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
>>>>    	arm_teardown_iommu_dma_ops(dev);
>>>>    }
>>>> +
>>>> +void arm_dma_iommu_detach_device(struct device *dev)
>>>> +{
>>>> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
>>>> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
>>>> +
>>>> +	if (!mapping)
>>>> +		return;
>>>> +
>>>> +	arm_iommu_release_mapping(mapping);
>>>
>>> Potentially freeing the mapping before you try to operate on it is never the
>>> best idea. Plus arm_iommu_detach_device() already releases a reference
>>> appropriately anyway, so it's a double-free.
>>
>> But the reference released by arm_iommu_detach_device() is to balance
>> out the reference acquired by arm_iommu_attach_device(), isn't it? In
>> the above, the arm_iommu_release_mapping() is supposed to drop the
>> final reference which was obtained by arm_iommu_create_mapping(). The
>> mapping shouldn't go away irrespective of the order in which these
>> will be called.
> 
> Going over the DMA/IOMMU code I just remembered that I drew inspiration
> from arm_teardown_iommu_dma_ops() for the initial proposal which also
> calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
> That said, one other possibility to implement this would be to export
> the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
> and use that instead. linux/dma-mapping.h implements a stub for
> architectures that don't provide one, so it should work without any
> #ifdef guards.
> 
> That combined with the set_dma_ops() fix in arm_iommu_detach_device()
> should fix this pretty nicely.

OK, having a second look at the ARM code I see I had indeed overlooked 
that extra reference held until arm_teardown_iommu_dma_ops() - mea culpa 
- but frankly that looks wrong anyway, as it basically defeats the point 
of refcounting the mapping at all. AFAICS arm_setup_iommu_dma_ops() 
should just be made to behave 'normally' by unconditionally dropping the 
initial reference after calling __arm_iommu_attach_device(), then we 
don't need all these odd and confusing release calls dotted around at all.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30 13:42               ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/05/18 14:12, Thierry Reding wrote:
> On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
>> On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
>>> On 30/05/18 09:03, Thierry Reding wrote:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> Implement this function to enable drivers from detaching from any IOMMU
>>>> domains that architecture code might have attached them to so that they
>>>> can take exclusive control of the IOMMU via the IOMMU API.
>>>>
>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>> ---
>>>> Changes in v3:
>>>> - make API 32-bit ARM specific
>>>> - avoid extra local variable
>>>>
>>>> Changes in v2:
>>>> - fix compilation
>>>>
>>>>    arch/arm/include/asm/dma-mapping.h |  3 +++
>>>>    arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
>>>>    arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
>>>>    3 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
>>>> index 8436f6ade57d..5960e9f3a9d0 100644
>>>> --- a/arch/arm/include/asm/dma-mapping.h
>>>> +++ b/arch/arm/include/asm/dma-mapping.h
>>>> @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>>>    #define arch_teardown_dma_ops arch_teardown_dma_ops
>>>>    extern void arch_teardown_dma_ops(struct device *dev);
>>>> +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
>>>> +extern void arm_dma_iommu_detach_device(struct device *dev);
>>>> +
>>>>    /* do not use this function in a driver */
>>>>    static inline bool is_device_dma_coherent(struct device *dev)
>>>>    {
>>>> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
>>>> index f448a0663b10..eb781369377b 100644
>>>> --- a/arch/arm/mm/dma-mapping-nommu.c
>>>> +++ b/arch/arm/mm/dma-mapping-nommu.c
>>>> @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>>>    void arch_teardown_dma_ops(struct device *dev)
>>>>    {
>>>>    }
>>>> +
>>>> +void arm_dma_iommu_detach_device(struct device *dev)
>>>> +{
>>>> +}
>>>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>>>> index af27f1c22d93..6d8af08b3e7d 100644
>>>> --- a/arch/arm/mm/dma-mapping.c
>>>> +++ b/arch/arm/mm/dma-mapping.c
>>>> @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
>>>>    	arm_teardown_iommu_dma_ops(dev);
>>>>    }
>>>> +
>>>> +void arm_dma_iommu_detach_device(struct device *dev)
>>>> +{
>>>> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
>>>> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
>>>> +
>>>> +	if (!mapping)
>>>> +		return;
>>>> +
>>>> +	arm_iommu_release_mapping(mapping);
>>>
>>> Potentially freeing the mapping before you try to operate on it is never the
>>> best idea. Plus arm_iommu_detach_device() already releases a reference
>>> appropriately anyway, so it's a double-free.
>>
>> But the reference released by arm_iommu_detach_device() is to balance
>> out the reference acquired by arm_iommu_attach_device(), isn't it? In
>> the above, the arm_iommu_release_mapping() is supposed to drop the
>> final reference which was obtained by arm_iommu_create_mapping(). The
>> mapping shouldn't go away irrespective of the order in which these
>> will be called.
> 
> Going over the DMA/IOMMU code I just remembered that I drew inspiration
> from arm_teardown_iommu_dma_ops() for the initial proposal which also
> calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
> That said, one other possibility to implement this would be to export
> the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
> and use that instead. linux/dma-mapping.h implements a stub for
> architectures that don't provide one, so it should work without any
> #ifdef guards.
> 
> That combined with the set_dma_ops() fix in arm_iommu_detach_device()
> should fix this pretty nicely.

OK, having a second look at the ARM code I see I had indeed overlooked 
that extra reference held until arm_teardown_iommu_dma_ops() - mea culpa 
- but frankly that looks wrong anyway, as it basically defeats the point 
of refcounting the mapping at all. AFAICS arm_setup_iommu_dma_ops() 
should just be made to behave 'normally' by unconditionally dropping the 
initial reference after calling __arm_iommu_attach_device(), then we 
don't need all these odd and confusing release calls dotted around at all.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-05-30 13:41               ` Thierry Reding
@ 2018-05-30 13:46                 ` Robin Murphy
  -1 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:46 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/05/18 14:41, Thierry Reding wrote:
> On Wed, May 30, 2018 at 02:30:51PM +0100, Robin Murphy wrote:
>> On 30/05/18 14:00, Thierry Reding wrote:
>>> On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
>>>> On 30/05/18 09:03, Thierry Reding wrote:
>>>>> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>>>
>>>>> Depending on the kernel configuration, early ARM architecture setup code
>>>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>>>> memory path to take: via the SMMU or directly to the memory controller).
>>>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>>>> properly handling such memory accesses and causes memory access faults.
>>>>>
>>>>> As a side-note: buffers other than those allocated in instance memory
>>>>> don't need to be physically contiguous from the GPU's perspective since
>>>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>>>> these buffers through the IOMMU is unnecessary and will even lead to
>>>>> performance degradation because of the additional translation. One
>>>>> exception to this are compressible buffers which need large pages. In
>>>>> order to enable these large pages, multiple small pages will have to be
>>>>> combined into one large (I/O virtually contiguous) mapping via the
>>>>> IOMMU. However, that is a topic outside the scope of this fix and isn't
>>>>> currently supported. An implementation will want to explicitly create
>>>>> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
>>>>> mapping would still be required.
>>>>
>>>> I wonder if it might make sense to have a hook in iommu_attach_device() to
>>>> notify the arch DMA API code when moving devices between unmanaged and DMA
>>>> ops domains? That seems like it might be the most low-impact way to address
>>>> the overall problem long-term.
>>>>
>>>>> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>>> ---
>>>>> Changes in v3:
>>>>> - clarify the use of IOMMU mapping for compressible buffers
>>>>> - squash multiple patches into this
>>>>>
>>>>>     drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>>>>>     1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> index 78597da6313a..d0538af1b967 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>>>>>     	unsigned long pgsize_bitmap;
>>>>>     	int ret;
>>>>> +#if IS_ENABLED(CONFIG_ARM)
>>>>
>>>> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
>>>
>>> Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
>>> only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
>>> a guard to make sure we don't call the function when it isn't available,
>>> but it may still not do anything.
>>
>> Calling a function under condition A, which only does anything under
>> condition B, when B depends on A, is identical in behaviour to only calling
>> the function under condition B, except needlessly harder to follow.
>>
>>>>> +	/* make sure we can use the IOMMU exclusively */
>>>>> +	arm_dma_iommu_detach_device(dev);
>>>>
>>>> As before, I would just use the existing infrastructure the same way the
>>>> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
>>>> then reattaching to another DMA ops mapping).
>>>
>>> That's pretty much what I initially did and which was shot down by
>>> Christoph. As I said earlier, at this point I don't really care what
>>> color the shed will be. Can you and Christoph come to an agreement
>>> on what it should be?
>>
>> What I was getting at is that arm_iommu_detach_device() already *is* the
>> exact function Christoph was asking for, it just needs a minor fix instead
>> of adding explicit set_dma_ops() fiddling at its callsites which only
>> obfuscates the fact that it's supposed to be responsible for resetting the
>> device's DMA ops already.
> 
> It still has the downside of callers having to explicitly check for the
> existence of a mapping, otherwise they'll cause a warning to be printed
> to the kernel log.

Or we could look at the way it's actually used, and reconsider whether 
the warning is really appropriate. That's always an option ;)

Robin.

> That's not all that bad, though. I'll prepare version 4 with those
> changes.
> 
> Thierry
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-05-30 13:46                 ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2018-05-30 13:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/05/18 14:41, Thierry Reding wrote:
> On Wed, May 30, 2018 at 02:30:51PM +0100, Robin Murphy wrote:
>> On 30/05/18 14:00, Thierry Reding wrote:
>>> On Wed, May 30, 2018 at 11:30:25AM +0100, Robin Murphy wrote:
>>>> On 30/05/18 09:03, Thierry Reding wrote:
>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>
>>>>> Depending on the kernel configuration, early ARM architecture setup code
>>>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>>>> memory path to take: via the SMMU or directly to the memory controller).
>>>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>>>> properly handling such memory accesses and causes memory access faults.
>>>>>
>>>>> As a side-note: buffers other than those allocated in instance memory
>>>>> don't need to be physically contiguous from the GPU's perspective since
>>>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>>>> these buffers through the IOMMU is unnecessary and will even lead to
>>>>> performance degradation because of the additional translation. One
>>>>> exception to this are compressible buffers which need large pages. In
>>>>> order to enable these large pages, multiple small pages will have to be
>>>>> combined into one large (I/O virtually contiguous) mapping via the
>>>>> IOMMU. However, that is a topic outside the scope of this fix and isn't
>>>>> currently supported. An implementation will want to explicitly create
>>>>> these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
>>>>> mapping would still be required.
>>>>
>>>> I wonder if it might make sense to have a hook in iommu_attach_device() to
>>>> notify the arch DMA API code when moving devices between unmanaged and DMA
>>>> ops domains? That seems like it might be the most low-impact way to address
>>>> the overall problem long-term.
>>>>
>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>> ---
>>>>> Changes in v3:
>>>>> - clarify the use of IOMMU mapping for compressible buffers
>>>>> - squash multiple patches into this
>>>>>
>>>>>     drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 5 +++++
>>>>>     1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> index 78597da6313a..d0538af1b967 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
>>>>> @@ -105,6 +105,11 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>>>>>     	unsigned long pgsize_bitmap;
>>>>>     	int ret;
>>>>> +#if IS_ENABLED(CONFIG_ARM)
>>>>
>>>> Wouldn't CONFIG_ARM_DMA_USE_IOMMU be even more appropriate?
>>>
>>> Not necessarily. arm_dma_iommu_detach_device() is always defined on ARM,
>>> only with CONFIG_ARM_DMA_USE_IOMMU=n it will be empty. So this check is
>>> a guard to make sure we don't call the function when it isn't available,
>>> but it may still not do anything.
>>
>> Calling a function under condition A, which only does anything under
>> condition B, when B depends on A, is identical in behaviour to only calling
>> the function under condition B, except needlessly harder to follow.
>>
>>>>> +	/* make sure we can use the IOMMU exclusively */
>>>>> +	arm_dma_iommu_detach_device(dev);
>>>>
>>>> As before, I would just use the existing infrastructure the same way the
>>>> Exynos DRM driver currently does in __exynos_iommu_attach() (albeit without
>>>> then reattaching to another DMA ops mapping).
>>>
>>> That's pretty much what I initially did and which was shot down by
>>> Christoph. As I said earlier, at this point I don't really care what
>>> color the shed will be. Can you and Christoph come to an agreement
>>> on what it should be?
>>
>> What I was getting at is that arm_iommu_detach_device() already *is* the
>> exact function Christoph was asking for, it just needs a minor fix instead
>> of adding explicit set_dma_ops() fiddling at its callsites which only
>> obfuscates the fact that it's supposed to be responsible for resetting the
>> device's DMA ops already.
> 
> It still has the downside of callers having to explicitly check for the
> existence of a mapping, otherwise they'll cause a warning to be printed
> to the kernel log.

Or we could look at the way it's actually used, and reconsider whether 
the warning is really appropriate. That's always an option ;)

Robin.

> That's not all that bad, though. I'll prepare version 4 with those
> changes.
> 
> Thierry
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
  2018-05-30 13:42               ` Robin Murphy
@ 2018-05-30 14:07                   ` Thierry Reding
  -1 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 14:07 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ben Skeggs,
	Daniel Vetter, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 5129 bytes --]

On Wed, May 30, 2018 at 02:42:50PM +0100, Robin Murphy wrote:
> On 30/05/18 14:12, Thierry Reding wrote:
> > On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
> > > On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> > > > On 30/05/18 09:03, Thierry Reding wrote:
> > > > > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > > 
> > > > > Implement this function to enable drivers from detaching from any IOMMU
> > > > > domains that architecture code might have attached them to so that they
> > > > > can take exclusive control of the IOMMU via the IOMMU API.
> > > > > 
> > > > > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > > ---
> > > > > Changes in v3:
> > > > > - make API 32-bit ARM specific
> > > > > - avoid extra local variable
> > > > > 
> > > > > Changes in v2:
> > > > > - fix compilation
> > > > > 
> > > > >    arch/arm/include/asm/dma-mapping.h |  3 +++
> > > > >    arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> > > > >    arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> > > > >    3 files changed, 23 insertions(+)
> > > > > 
> > > > > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > > > > index 8436f6ade57d..5960e9f3a9d0 100644
> > > > > --- a/arch/arm/include/asm/dma-mapping.h
> > > > > +++ b/arch/arm/include/asm/dma-mapping.h
> > > > > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > > > >    #define arch_teardown_dma_ops arch_teardown_dma_ops
> > > > >    extern void arch_teardown_dma_ops(struct device *dev);
> > > > > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > > > > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > > > > +
> > > > >    /* do not use this function in a driver */
> > > > >    static inline bool is_device_dma_coherent(struct device *dev)
> > > > >    {
> > > > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > > > > index f448a0663b10..eb781369377b 100644
> > > > > --- a/arch/arm/mm/dma-mapping-nommu.c
> > > > > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > > > > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > > > >    void arch_teardown_dma_ops(struct device *dev)
> > > > >    {
> > > > >    }
> > > > > +
> > > > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > > > +{
> > > > > +}
> > > > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > > > > index af27f1c22d93..6d8af08b3e7d 100644
> > > > > --- a/arch/arm/mm/dma-mapping.c
> > > > > +++ b/arch/arm/mm/dma-mapping.c
> > > > > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> > > > >    	arm_teardown_iommu_dma_ops(dev);
> > > > >    }
> > > > > +
> > > > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > > > +{
> > > > > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > > > > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > > > > +
> > > > > +	if (!mapping)
> > > > > +		return;
> > > > > +
> > > > > +	arm_iommu_release_mapping(mapping);
> > > > 
> > > > Potentially freeing the mapping before you try to operate on it is never the
> > > > best idea. Plus arm_iommu_detach_device() already releases a reference
> > > > appropriately anyway, so it's a double-free.
> > > 
> > > But the reference released by arm_iommu_detach_device() is to balance
> > > out the reference acquired by arm_iommu_attach_device(), isn't it? In
> > > the above, the arm_iommu_release_mapping() is supposed to drop the
> > > final reference which was obtained by arm_iommu_create_mapping(). The
> > > mapping shouldn't go away irrespective of the order in which these
> > > will be called.
> > 
> > Going over the DMA/IOMMU code I just remembered that I drew inspiration
> > from arm_teardown_iommu_dma_ops() for the initial proposal which also
> > calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
> > That said, one other possibility to implement this would be to export
> > the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
> > and use that instead. linux/dma-mapping.h implements a stub for
> > architectures that don't provide one, so it should work without any
> > #ifdef guards.
> > 
> > That combined with the set_dma_ops() fix in arm_iommu_detach_device()
> > should fix this pretty nicely.
> 
> OK, having a second look at the ARM code I see I had indeed overlooked that
> extra reference held until arm_teardown_iommu_dma_ops() - mea culpa - but
> frankly that looks wrong anyway, as it basically defeats the point of
> refcounting the mapping at all. AFAICS arm_setup_iommu_dma_ops() should just
> be made to behave 'normally' by unconditionally dropping the initial
> reference after calling __arm_iommu_attach_device(), then we don't need all
> these odd and confusing release calls dotted around at all.

Good point. I can follow up with a series to fix the reference counting.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device()
@ 2018-05-30 14:07                   ` Thierry Reding
  0 siblings, 0 replies; 26+ messages in thread
From: Thierry Reding @ 2018-05-30 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 30, 2018 at 02:42:50PM +0100, Robin Murphy wrote:
> On 30/05/18 14:12, Thierry Reding wrote:
> > On Wed, May 30, 2018 at 02:54:46PM +0200, Thierry Reding wrote:
> > > On Wed, May 30, 2018 at 10:59:30AM +0100, Robin Murphy wrote:
> > > > On 30/05/18 09:03, Thierry Reding wrote:
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > 
> > > > > Implement this function to enable drivers from detaching from any IOMMU
> > > > > domains that architecture code might have attached them to so that they
> > > > > can take exclusive control of the IOMMU via the IOMMU API.
> > > > > 
> > > > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > > > ---
> > > > > Changes in v3:
> > > > > - make API 32-bit ARM specific
> > > > > - avoid extra local variable
> > > > > 
> > > > > Changes in v2:
> > > > > - fix compilation
> > > > > 
> > > > >    arch/arm/include/asm/dma-mapping.h |  3 +++
> > > > >    arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
> > > > >    arch/arm/mm/dma-mapping.c          | 16 ++++++++++++++++
> > > > >    3 files changed, 23 insertions(+)
> > > > > 
> > > > > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > > > > index 8436f6ade57d..5960e9f3a9d0 100644
> > > > > --- a/arch/arm/include/asm/dma-mapping.h
> > > > > +++ b/arch/arm/include/asm/dma-mapping.h
> > > > > @@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > > > >    #define arch_teardown_dma_ops arch_teardown_dma_ops
> > > > >    extern void arch_teardown_dma_ops(struct device *dev);
> > > > > +#define arm_dma_iommu_detach_device arm_dma_iommu_detach_device
> > > > > +extern void arm_dma_iommu_detach_device(struct device *dev);
> > > > > +
> > > > >    /* do not use this function in a driver */
> > > > >    static inline bool is_device_dma_coherent(struct device *dev)
> > > > >    {
> > > > > diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> > > > > index f448a0663b10..eb781369377b 100644
> > > > > --- a/arch/arm/mm/dma-mapping-nommu.c
> > > > > +++ b/arch/arm/mm/dma-mapping-nommu.c
> > > > > @@ -241,3 +241,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > > > >    void arch_teardown_dma_ops(struct device *dev)
> > > > >    {
> > > > >    }
> > > > > +
> > > > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > > > +{
> > > > > +}
> > > > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > > > > index af27f1c22d93..6d8af08b3e7d 100644
> > > > > --- a/arch/arm/mm/dma-mapping.c
> > > > > +++ b/arch/arm/mm/dma-mapping.c
> > > > > @@ -2400,3 +2400,19 @@ void arch_teardown_dma_ops(struct device *dev)
> > > > >    	arm_teardown_iommu_dma_ops(dev);
> > > > >    }
> > > > > +
> > > > > +void arm_dma_iommu_detach_device(struct device *dev)
> > > > > +{
> > > > > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > > > > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > > > > +
> > > > > +	if (!mapping)
> > > > > +		return;
> > > > > +
> > > > > +	arm_iommu_release_mapping(mapping);
> > > > 
> > > > Potentially freeing the mapping before you try to operate on it is never the
> > > > best idea. Plus arm_iommu_detach_device() already releases a reference
> > > > appropriately anyway, so it's a double-free.
> > > 
> > > But the reference released by arm_iommu_detach_device() is to balance
> > > out the reference acquired by arm_iommu_attach_device(), isn't it? In
> > > the above, the arm_iommu_release_mapping() is supposed to drop the
> > > final reference which was obtained by arm_iommu_create_mapping(). The
> > > mapping shouldn't go away irrespective of the order in which these
> > > will be called.
> > 
> > Going over the DMA/IOMMU code I just remembered that I drew inspiration
> > from arm_teardown_iommu_dma_ops() for the initial proposal which also
> > calls both arm_iommu_detach_device() and arm_iommu_release_mapping().
> > That said, one other possibility to implement this would be to export
> > the 32-bit and 64-bit ARM implementations of arch_teardown_dma_ops()
> > and use that instead. linux/dma-mapping.h implements a stub for
> > architectures that don't provide one, so it should work without any
> > #ifdef guards.
> > 
> > That combined with the set_dma_ops() fix in arm_iommu_detach_device()
> > should fix this pretty nicely.
> 
> OK, having a second look at the ARM code I see I had indeed overlooked that
> extra reference held until arm_teardown_iommu_dma_ops() - mea culpa - but
> frankly that looks wrong anyway, as it basically defeats the point of
> refcounting the mapping at all. AFAICS arm_setup_iommu_dma_ops() should just
> be made to behave 'normally' by unconditionally dropping the initial
> reference after calling __arm_iommu_attach_device(), then we don't need all
> these odd and confusing release calls dotted around at all.

Good point. I can follow up with a series to fix the reference counting.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180530/547e8bef/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-05-30 14:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30  8:03 [PATCH v3 0/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping Thierry Reding
2018-05-30  8:03 ` Thierry Reding
2018-05-30  8:03 ` [PATCH v3 1/2] ARM: dma-mapping: Implement arm_dma_iommu_detach_device() Thierry Reding
2018-05-30  8:03   ` Thierry Reding
     [not found]   ` <20180530080345.2353-2-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-05-30  9:59     ` Robin Murphy
2018-05-30  9:59       ` Robin Murphy
     [not found]       ` <eee02391-aa25-da84-f98a-b5fed6c69599-5wv7dgnIgG8@public.gmane.org>
2018-05-30 12:54         ` Thierry Reding
2018-05-30 12:54           ` Thierry Reding
2018-05-30 13:12           ` Thierry Reding
2018-05-30 13:12             ` Thierry Reding
2018-05-30 13:42             ` Robin Murphy
2018-05-30 13:42               ` Robin Murphy
     [not found]               ` <d3121aeb-1d75-2ba6-681a-f1e0681e290a-5wv7dgnIgG8@public.gmane.org>
2018-05-30 14:07                 ` Thierry Reding
2018-05-30 14:07                   ` Thierry Reding
2018-05-30  8:03 ` [PATCH v3 2/2] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping Thierry Reding
2018-05-30  8:03   ` Thierry Reding
     [not found]   ` <20180530080345.2353-3-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-05-30 10:30     ` Robin Murphy
2018-05-30 10:30       ` Robin Murphy
     [not found]       ` <7960e4fc-f680-f8d1-0c5a-3ff1e13b3154-5wv7dgnIgG8@public.gmane.org>
2018-05-30 13:00         ` Thierry Reding
2018-05-30 13:00           ` Thierry Reding
2018-05-30 13:30           ` Robin Murphy
2018-05-30 13:30             ` Robin Murphy
2018-05-30 13:41             ` Thierry Reding
2018-05-30 13:41               ` Thierry Reding
2018-05-30 13:46               ` Robin Murphy
2018-05-30 13:46                 ` Robin Murphy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.