All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-25 10:10 ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Thierry Reding <treding@nvidia.com>

Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
I had already sent this out independently to fix a regression that was
introduced in v4.16, but then Christoph pointed out that it should've
been sent to a wider audience and should use a core API rather than
calling into architecture code directly.

I've added it to this series for easier reference and to show the need
for the new API.

 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 78597da6313a..23428a7056e9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -19,6 +19,11 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  */
+
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+#include <asm/dma-iommu.h>
+#endif
+
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -105,6 +110,20 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+	if (dev->archdata.mapping) {
+		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+		arm_iommu_release_mapping(mapping);
+		arm_iommu_detach_device(dev);
+
+		if (dev->archdata.dma_coherent)
+			set_dma_ops(dev, &arm_coherent_dma_ops);
+		else
+			set_dma_ops(dev, &arm_dma_ops);
+	}
+#endif
+
 	if (!tdev->func->iommu_bit)
 		return;
 
-- 
2.17.0

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-25 10:10 ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
I had already sent this out independently to fix a regression that was
introduced in v4.16, but then Christoph pointed out that it should've
been sent to a wider audience and should use a core API rather than
calling into architecture code directly.

I've added it to this series for easier reference and to show the need
for the new API.

 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 78597da6313a..23428a7056e9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -19,6 +19,11 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  */
+
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+#include <asm/dma-iommu.h>
+#endif
+
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -105,6 +110,20 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
+#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
+	if (dev->archdata.mapping) {
+		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+		arm_iommu_release_mapping(mapping);
+		arm_iommu_detach_device(dev);
+
+		if (dev->archdata.dma_coherent)
+			set_dma_ops(dev, &arm_coherent_dma_ops);
+		else
+			set_dma_ops(dev, &arm_dma_ops);
+	}
+#endif
+
 	if (!tdev->func->iommu_bit)
 		return;
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 10:10     ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

The dma_iommu_detach_device() API can be used by drivers to forcibly
detach a device from an IOMMU that architecture code might have attached
to. This is useful for drivers that need explicit control over the IOMMU
using the IOMMU API directly.

Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/base/dma-mapping.c  | 8 ++++++++
 include/linux/dma-mapping.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
index d82566d6e237..18ddf32b10c9 100644
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -366,3 +366,11 @@ void dma_deconfigure(struct device *dev)
 	of_dma_deconfigure(dev);
 	acpi_dma_deconfigure(dev);
 }
+
+void dma_iommu_detach_device(struct device *dev)
+{
+#ifdef arch_iommu_detach_device
+	arch_iommu_detach_device(dev);
+#endif
+}
+EXPORT_SYMBOL(dma_iommu_detach_device);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0f589e..732191a2c64e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -671,6 +671,8 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
 static inline void arch_teardown_dma_ops(struct device *dev) { }
 #endif
 
+extern void dma_iommu_detach_device(struct device *dev);
+
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
 {
 	if (dev->dma_parms && dev->dma_parms->max_segment_size)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-25 10:10     ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

The dma_iommu_detach_device() API can be used by drivers to forcibly
detach a device from an IOMMU that architecture code might have attached
to. This is useful for drivers that need explicit control over the IOMMU
using the IOMMU API directly.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/base/dma-mapping.c  | 8 ++++++++
 include/linux/dma-mapping.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
index d82566d6e237..18ddf32b10c9 100644
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -366,3 +366,11 @@ void dma_deconfigure(struct device *dev)
 	of_dma_deconfigure(dev);
 	acpi_dma_deconfigure(dev);
 }
+
+void dma_iommu_detach_device(struct device *dev)
+{
+#ifdef arch_iommu_detach_device
+	arch_iommu_detach_device(dev);
+#endif
+}
+EXPORT_SYMBOL(dma_iommu_detach_device);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0f589e..732191a2c64e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -671,6 +671,8 @@ static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base,
 static inline void arch_teardown_dma_ops(struct device *dev) { }
 #endif
 
+extern void dma_iommu_detach_device(struct device *dev);
+
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
 {
 	if (dev->dma_parms && dev->dma_parms->max_segment_size)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 10:10     ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Implement this function to enable drivers from detaching from any IOMMU
domains that architecture code might have attached them to so that they
can take exclusive control of the IOMMU via the IOMMU API.

Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
Changes in v2:
- fix compilation

 arch/arm/include/asm/dma-mapping.h |  3 +++
 arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
 arch/arm/mm/dma-mapping.c          | 17 +++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 8436f6ade57d..d6d5bd44f962 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 #define arch_teardown_dma_ops arch_teardown_dma_ops
 extern void arch_teardown_dma_ops(struct device *dev);
 
+#define arch_iommu_detach_device arch_iommu_detach_device
+extern void arch_iommu_detach_device(struct device *dev);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 619f24a42d09..60fef97d8452 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -242,6 +242,10 @@ void arch_teardown_dma_ops(struct device *dev)
 {
 }
 
+void arch_iommu_detach_device(struct device *dev)
+{
+}
+
 #define PREALLOC_DMA_DEBUG_ENTRIES	4096
 
 static int __init dma_debug_do_init(void)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8c398fedbbb6..cc63a25bd088 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2423,3 +2423,20 @@ void arch_teardown_dma_ops(struct device *dev)
 
 	arm_teardown_iommu_dma_ops(dev);
 }
+
+void arch_iommu_detach_device(struct device *dev)
+{
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	const struct dma_map_ops *dma_ops;
+
+	if (!mapping)
+		return;
+
+	arm_iommu_release_mapping(mapping);
+	arm_iommu_detach_device(dev);
+
+	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
+	set_dma_ops(dev, dma_ops);
+#endif
+}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
@ 2018-04-25 10:10     ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Implement this function to enable drivers from detaching from any IOMMU
domains that architecture code might have attached them to so that they
can take exclusive control of the IOMMU via the IOMMU API.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- fix compilation

 arch/arm/include/asm/dma-mapping.h |  3 +++
 arch/arm/mm/dma-mapping-nommu.c    |  4 ++++
 arch/arm/mm/dma-mapping.c          | 17 +++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 8436f6ade57d..d6d5bd44f962 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -103,6 +103,9 @@ extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 #define arch_teardown_dma_ops arch_teardown_dma_ops
 extern void arch_teardown_dma_ops(struct device *dev);
 
+#define arch_iommu_detach_device arch_iommu_detach_device
+extern void arch_iommu_detach_device(struct device *dev);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 619f24a42d09..60fef97d8452 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -242,6 +242,10 @@ void arch_teardown_dma_ops(struct device *dev)
 {
 }
 
+void arch_iommu_detach_device(struct device *dev)
+{
+}
+
 #define PREALLOC_DMA_DEBUG_ENTRIES	4096
 
 static int __init dma_debug_do_init(void)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8c398fedbbb6..cc63a25bd088 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2423,3 +2423,20 @@ void arch_teardown_dma_ops(struct device *dev)
 
 	arm_teardown_iommu_dma_ops(dev);
 }
+
+void arch_iommu_detach_device(struct device *dev)
+{
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	const struct dma_map_ops *dma_ops;
+
+	if (!mapping)
+		return;
+
+	arm_iommu_release_mapping(mapping);
+	arm_iommu_detach_device(dev);
+
+	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
+	set_dma_ops(dev, dma_ops);
+#endif
+}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/5] drm/nouveau: tegra: Use dma_iommu_detach_device()
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 10:10     ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Use the new dma_iommu_detach_device() function to replace the open-coded
equivalent.

Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 23428a7056e9..c0a7f3839cbb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -20,10 +20,6 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-#include <asm/dma-iommu.h>
-#endif
-
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -110,19 +106,8 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-	if (dev->archdata.mapping) {
-		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-		arm_iommu_release_mapping(mapping);
-		arm_iommu_detach_device(dev);
-
-		if (dev->archdata.dma_coherent)
-			set_dma_ops(dev, &arm_coherent_dma_ops);
-		else
-			set_dma_ops(dev, &arm_dma_ops);
-	}
-#endif
+	/* make sure we can use the IOMMU exclusively */
+	dma_iommu_detach_device(dev);
 
 	if (!tdev->func->iommu_bit)
 		return;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/5] drm/nouveau: tegra: Use dma_iommu_detach_device()
@ 2018-04-25 10:10     ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Use the new dma_iommu_detach_device() function to replace the open-coded
equivalent.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 .../drm/nouveau/nvkm/engine/device/tegra.c    | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index 23428a7056e9..c0a7f3839cbb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -20,10 +20,6 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-#include <asm/dma-iommu.h>
-#endif
-
 #include <core/tegra.h>
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
@@ -110,19 +106,8 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
 	unsigned long pgsize_bitmap;
 	int ret;
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-	if (dev->archdata.mapping) {
-		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-		arm_iommu_release_mapping(mapping);
-		arm_iommu_detach_device(dev);
-
-		if (dev->archdata.dma_coherent)
-			set_dma_ops(dev, &arm_coherent_dma_ops);
-		else
-			set_dma_ops(dev, &arm_dma_ops);
-	}
-#endif
+	/* make sure we can use the IOMMU exclusively */
+	dma_iommu_detach_device(dev);
 
 	if (!tdev->func->iommu_bit)
 		return;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 10:10     ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Thierry Reding <treding@nvidia.com>

The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
not opt into but have to explicitly opt out of. This can lead to subtle
bugs that are difficult to track down and not immediately obvious to be
related to this Kconfig option.

To avoid this confusion, always enable the option to expose any lurking
bugs once and allow any regressions introduced by the DMA/IOMMU code to
be caught more quickly in the future.

Note that some drivers still use the Kconfig symbol to provide different
code paths depending on what architecture the code runs on (e.g. 32-bit
ARM vs. 64-bit ARM which have different and incompatible implementations
of the DMA/IOMMU integration code), so leave the symbol in place to keep
those drivers working.

For the long term, it is preferable to transition everyone to the
generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- new patch

 arch/arm/Kconfig               |  2 +-
 arch/arm/include/asm/device.h  |  6 ------
 arch/arm/mm/dma-mapping.c      | 18 ------------------
 drivers/iommu/Kconfig          |  7 -------
 drivers/media/platform/Kconfig |  1 -
 5 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fa0b190f8a38..3c91de78535a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
 	bool
 
 config ARM_DMA_USE_IOMMU
-	bool
+	def_bool y
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
 
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 3234fe9bba6e..c3cf38e16866 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -13,9 +13,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping	*mapping;
-#endif
 #ifdef CONFIG_XEN
 	const struct dma_map_ops *dev_dma_ops;
 #endif
@@ -31,10 +29,6 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 #define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
 
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc63a25bd088..f6c28ed5651a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1174,8 +1174,6 @@ static int __init dma_debug_do_init(void)
 }
 core_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
 static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
 {
 	int prot = 0;
@@ -2366,20 +2364,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev)
 	arm_iommu_release_mapping(mapping);
 }
 
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    const struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2426,7 +2410,6 @@ void arch_teardown_dma_ops(struct device *dev)
 
 void arch_iommu_detach_device(struct device *dev)
 {
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	const struct dma_map_ops *dma_ops;
 
@@ -2438,5 +2421,4 @@ void arch_iommu_detach_device(struct device *dev)
 
 	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
 	set_dma_ops(dev, dma_ops);
-#endif
 }
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index df171cb85822..7f0b3ca76a17 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -226,7 +226,6 @@ config ROCKCHIP_IOMMU
 	depends on ARM || ARM64
 	depends on ARCH_ROCKCHIP || COMPILE_TEST
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMUs found on Rockchip rk32xx SOCs.
 	  These IOMMUs allow virtualization of the address space used by most
@@ -259,7 +258,6 @@ config EXYNOS_IOMMU
 	depends on ARCH_EXYNOS && MMU
 	depends on !CPU_BIG_ENDIAN # revisit driver if we can enable big-endian ptes
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the IOMMU (System MMU) of Samsung Exynos application
 	  processor family. This enables H/W multimedia accelerators to see
@@ -283,7 +281,6 @@ config IPMMU_VMSA
 	depends on ARCH_RENESAS || (COMPILE_TEST && !GENERIC_ATOMIC64)
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the Renesas VMSA-compatible IPMMU Renesas found in the
 	  R-Mobile APE6 and R-Car H2/M2 SoCs.
@@ -304,7 +301,6 @@ config ARM_SMMU
 	depends on (ARM64 || ARM) && MMU
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU if ARM
 	help
 	  Support for implementations of the ARM System MMU architecture
 	  versions 1 and 2.
@@ -344,7 +340,6 @@ config MTK_IOMMU
 	bool "MTK IOMMU Support"
 	depends on ARM || ARM64
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select IOMMU_DMA
 	select IOMMU_IO_PGTABLE_ARMV7S
@@ -361,7 +356,6 @@ config MTK_IOMMU_V1
 	bool "MTK IOMMU Version 1 (M4U gen1) Support"
 	depends on ARM
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select MEMORY
 	select MTK_SMI
@@ -379,7 +373,6 @@ config QCOM_IOMMU
 	depends on HAS_DMA
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMU on certain Qualcomm SoCs.
 
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index e3229f7baed1..5f7135b052ed 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -66,7 +66,6 @@ config VIDEO_OMAP3
 	depends on (ARCH_OMAP3 && OMAP_IOMMU) || COMPILE_TEST
 	depends on COMMON_CLK
 	depends on HAS_DMA && OF
-	select ARM_DMA_USE_IOMMU if OMAP_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
 	select V4L2_FWNODE
-- 
2.17.0

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
@ 2018-04-25 10:10     ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-25 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
not opt into but have to explicitly opt out of. This can lead to subtle
bugs that are difficult to track down and not immediately obvious to be
related to this Kconfig option.

To avoid this confusion, always enable the option to expose any lurking
bugs once and allow any regressions introduced by the DMA/IOMMU code to
be caught more quickly in the future.

Note that some drivers still use the Kconfig symbol to provide different
code paths depending on what architecture the code runs on (e.g. 32-bit
ARM vs. 64-bit ARM which have different and incompatible implementations
of the DMA/IOMMU integration code), so leave the symbol in place to keep
those drivers working.

For the long term, it is preferable to transition everyone to the
generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- new patch

 arch/arm/Kconfig               |  2 +-
 arch/arm/include/asm/device.h  |  6 ------
 arch/arm/mm/dma-mapping.c      | 18 ------------------
 drivers/iommu/Kconfig          |  7 -------
 drivers/media/platform/Kconfig |  1 -
 5 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fa0b190f8a38..3c91de78535a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
 	bool
 
 config ARM_DMA_USE_IOMMU
-	bool
+	def_bool y
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
 
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 3234fe9bba6e..c3cf38e16866 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -13,9 +13,7 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping	*mapping;
-#endif
 #ifdef CONFIG_XEN
 	const struct dma_map_ops *dev_dma_ops;
 #endif
@@ -31,10 +29,6 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 #define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
 
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc63a25bd088..f6c28ed5651a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1174,8 +1174,6 @@ static int __init dma_debug_do_init(void)
 }
 core_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
 static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
 {
 	int prot = 0;
@@ -2366,20 +2364,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev)
 	arm_iommu_release_mapping(mapping);
 }
 
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    const struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2426,7 +2410,6 @@ void arch_teardown_dma_ops(struct device *dev)
 
 void arch_iommu_detach_device(struct device *dev)
 {
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	const struct dma_map_ops *dma_ops;
 
@@ -2438,5 +2421,4 @@ void arch_iommu_detach_device(struct device *dev)
 
 	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
 	set_dma_ops(dev, dma_ops);
-#endif
 }
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index df171cb85822..7f0b3ca76a17 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -226,7 +226,6 @@ config ROCKCHIP_IOMMU
 	depends on ARM || ARM64
 	depends on ARCH_ROCKCHIP || COMPILE_TEST
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMUs found on Rockchip rk32xx SOCs.
 	  These IOMMUs allow virtualization of the address space used by most
@@ -259,7 +258,6 @@ config EXYNOS_IOMMU
 	depends on ARCH_EXYNOS && MMU
 	depends on !CPU_BIG_ENDIAN # revisit driver if we can enable big-endian ptes
 	select IOMMU_API
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the IOMMU (System MMU) of Samsung Exynos application
 	  processor family. This enables H/W multimedia accelerators to see
@@ -283,7 +281,6 @@ config IPMMU_VMSA
 	depends on ARCH_RENESAS || (COMPILE_TEST && !GENERIC_ATOMIC64)
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for the Renesas VMSA-compatible IPMMU Renesas found in the
 	  R-Mobile APE6 and R-Car H2/M2 SoCs.
@@ -304,7 +301,6 @@ config ARM_SMMU
 	depends on (ARM64 || ARM) && MMU
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU if ARM
 	help
 	  Support for implementations of the ARM System MMU architecture
 	  versions 1 and 2.
@@ -344,7 +340,6 @@ config MTK_IOMMU
 	bool "MTK IOMMU Support"
 	depends on ARM || ARM64
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select IOMMU_DMA
 	select IOMMU_IO_PGTABLE_ARMV7S
@@ -361,7 +356,6 @@ config MTK_IOMMU_V1
 	bool "MTK IOMMU Version 1 (M4U gen1) Support"
 	depends on ARM
 	depends on ARCH_MEDIATEK || COMPILE_TEST
-	select ARM_DMA_USE_IOMMU
 	select IOMMU_API
 	select MEMORY
 	select MTK_SMI
@@ -379,7 +373,6 @@ config QCOM_IOMMU
 	depends on HAS_DMA
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
-	select ARM_DMA_USE_IOMMU
 	help
 	  Support for IOMMU on certain Qualcomm SoCs.
 
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index e3229f7baed1..5f7135b052ed 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -66,7 +66,6 @@ config VIDEO_OMAP3
 	depends on (ARCH_OMAP3 && OMAP_IOMMU) || COMPILE_TEST
 	depends on COMMON_CLK
 	depends on HAS_DMA && OF
-	select ARM_DMA_USE_IOMMU if OMAP_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
 	select V4L2_FWNODE
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
  2018-04-25 10:10     ` Thierry Reding
@ 2018-04-25 10:25         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 42+ messages in thread
From: Russell King - ARM Linux @ 2018-04-25 10:25 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Apr 25, 2018 at 12:10:51PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> 
> The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
> not opt into but have to explicitly opt out of. This can lead to subtle
> bugs that are difficult to track down and not immediately obvious to be
> related to this Kconfig option.
> 
> To avoid this confusion, always enable the option to expose any lurking
> bugs once and allow any regressions introduced by the DMA/IOMMU code to
> be caught more quickly in the future.
> 
> Note that some drivers still use the Kconfig symbol to provide different
> code paths depending on what architecture the code runs on (e.g. 32-bit
> ARM vs. 64-bit ARM which have different and incompatible implementations
> of the DMA/IOMMU integration code), so leave the symbol in place to keep
> those drivers working.
> 
> For the long term, it is preferable to transition everyone to the
> generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.
> 
> Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
> Changes in v2:
> - new patch
> 
>  arch/arm/Kconfig               |  2 +-
>  arch/arm/include/asm/device.h  |  6 ------
>  arch/arm/mm/dma-mapping.c      | 18 ------------------
>  drivers/iommu/Kconfig          |  7 -------
>  drivers/media/platform/Kconfig |  1 -
>  5 files changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index fa0b190f8a38..3c91de78535a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
>  	bool
>  
>  config ARM_DMA_USE_IOMMU
> -	bool
> +	def_bool y
>  	select ARM_HAS_SG_CHAIN
>  	select NEED_SG_DMA_LENGTH

This doesn't work - as has recently been discussed with hch, we can't
globally enable NEED_SG_DMA_LENGTH on ARM - the ARM architecture
pre-dates the addition of the DMA length member in the scatterlist,
and not every machine supports the splitting of the DMA length from
the non-DMA length member.  Hence, this will cause a regression,
sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
@ 2018-04-25 10:25         ` Russell King - ARM Linux
  0 siblings, 0 replies; 42+ messages in thread
From: Russell King - ARM Linux @ 2018-04-25 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 12:10:51PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> The ARM_DMA_USE_IOMMU Kconfig option has side-effects that drivers can
> not opt into but have to explicitly opt out of. This can lead to subtle
> bugs that are difficult to track down and not immediately obvious to be
> related to this Kconfig option.
> 
> To avoid this confusion, always enable the option to expose any lurking
> bugs once and allow any regressions introduced by the DMA/IOMMU code to
> be caught more quickly in the future.
> 
> Note that some drivers still use the Kconfig symbol to provide different
> code paths depending on what architecture the code runs on (e.g. 32-bit
> ARM vs. 64-bit ARM which have different and incompatible implementations
> of the DMA/IOMMU integration code), so leave the symbol in place to keep
> those drivers working.
> 
> For the long term, it is preferable to transition everyone to the
> generic DMA/IOMMU integration code in drivers/iommu/dma-iommu.c.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v2:
> - new patch
> 
>  arch/arm/Kconfig               |  2 +-
>  arch/arm/include/asm/device.h  |  6 ------
>  arch/arm/mm/dma-mapping.c      | 18 ------------------
>  drivers/iommu/Kconfig          |  7 -------
>  drivers/media/platform/Kconfig |  1 -
>  5 files changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index fa0b190f8a38..3c91de78535a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -124,7 +124,7 @@ config NEED_SG_DMA_LENGTH
>  	bool
>  
>  config ARM_DMA_USE_IOMMU
> -	bool
> +	def_bool y
>  	select ARM_HAS_SG_CHAIN
>  	select NEED_SG_DMA_LENGTH

This doesn't work - as has recently been discussed with hch, we can't
globally enable NEED_SG_DMA_LENGTH on ARM - the ARM architecture
pre-dates the addition of the DMA length member in the scatterlist,
and not every machine supports the splitting of the DMA length from
the non-DMA length member.  Hence, this will cause a regression,
sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
  2018-04-25 10:25         ` Russell King - ARM Linux
@ 2018-04-25 15:17           ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:17 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: nouveau, Joerg Roedel, dri-devel, Christoph Hellwig, iommu,
	Thierry Reding, Daniel Vetter, linux-tegra, linux-arm-kernel

On Wed, Apr 25, 2018 at 11:25:11AM +0100, Russell King - ARM Linux wrote:
> >  config ARM_DMA_USE_IOMMU
> > -	bool
> > +	def_bool y
> >  	select ARM_HAS_SG_CHAIN
> >  	select NEED_SG_DMA_LENGTH
> 
> This doesn't work - as has recently been discussed with hch, we can't
> globally enable NEED_SG_DMA_LENGTH on ARM - the ARM architecture
> pre-dates the addition of the DMA length member in the scatterlist,
> and not every machine supports the splitting of the DMA length from
> the non-DMA length member.  Hence, this will cause a regression,
> sorry.

Agreed as-is.  However supporting it everywhere should not be much work,
I'll take a look at it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU
@ 2018-04-25 15:17           ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 11:25:11AM +0100, Russell King - ARM Linux wrote:
> >  config ARM_DMA_USE_IOMMU
> > -	bool
> > +	def_bool y
> >  	select ARM_HAS_SG_CHAIN
> >  	select NEED_SG_DMA_LENGTH
> 
> This doesn't work - as has recently been discussed with hch, we can't
> globally enable NEED_SG_DMA_LENGTH on ARM - the ARM architecture
> pre-dates the addition of the DMA length member in the scatterlist,
> and not every machine supports the splitting of the DMA length from
> the non-DMA length member.  Hence, this will cause a regression,
> sorry.

Agreed as-is.  However supporting it everywhere should not be much work,
I'll take a look at it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 15:18     ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:18 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

The series seems to miss a cover letter.

Also I really think this patch original patch shouldn't be in the proper
series.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-25 15:18     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:18 UTC (permalink / raw)
  To: linux-arm-kernel

The series seems to miss a cover letter.

Also I really think this patch original patch shouldn't be in the proper
series.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-25 10:10     ` Thierry Reding
@ 2018-04-25 15:19         ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:19 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> 
> The dma_iommu_detach_device() API can be used by drivers to forcibly
> detach a device from an IOMMU that architecture code might have attached
> to. This is useful for drivers that need explicit control over the IOMMU
> using the IOMMU API directly.

Given that no one else implements it making it a generic API seems
rather confusing.  For now I'd rename it to
arm_dma_iommu_detach_device() and only implement it in arm.

Once I've got the dma mapping implementations consolidated to a small
enough number we could think about something like a device quirk that
tells the architecture to simply never even attach the iommu dma ops
to start with.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-25 15:19         ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> The dma_iommu_detach_device() API can be used by drivers to forcibly
> detach a device from an IOMMU that architecture code might have attached
> to. This is useful for drivers that need explicit control over the IOMMU
> using the IOMMU API directly.

Given that no one else implements it making it a generic API seems
rather confusing.  For now I'd rename it to
arm_dma_iommu_detach_device() and only implement it in arm.

Once I've got the dma mapping implementations consolidated to a small
enough number we could think about something like a device quirk that
tells the architecture to simply never even attach the iommu dma ops
to start with.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
  2018-04-25 10:10     ` Thierry Reding
@ 2018-04-25 15:20         ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:20 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> +void arch_iommu_detach_device(struct device *dev)
> +{
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +	const struct dma_map_ops *dma_ops;
> +
> +	if (!mapping)
> +		return;
> +
> +	arm_iommu_release_mapping(mapping);
> +	arm_iommu_detach_device(dev);
> +
> +	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
> +	set_dma_ops(dev, dma_ops);

Why not simply:

	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
@ 2018-04-25 15:20         ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2018-04-25 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

> +void arch_iommu_detach_device(struct device *dev)
> +{
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +	const struct dma_map_ops *dma_ops;
> +
> +	if (!mapping)
> +		return;
> +
> +	arm_iommu_release_mapping(mapping);
> +	arm_iommu_detach_device(dev);
> +
> +	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
> +	set_dma_ops(dev, dma_ops);

Why not simply:

	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-25 10:10 ` Thierry Reding
@ 2018-04-25 15:28   ` Jordan Crouse
  -1 siblings, 0 replies; 42+ messages in thread
From: Jordan Crouse @ 2018-04-25 15:28 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau, Russell King, dri-devel, Christoph Hellwig, iommu,
	linux-tegra, linux-arm-kernel

On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Depending on the kernel configuration, early ARM architecture setup code
> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> backed buffers (a special bit in the GPU's MMU page tables indicates the
> memory path to take: via the SMMU or directly to the memory controller).
> Transparently backing DMA memory with an IOMMU prevents Nouveau from
> properly handling such memory accesses and causes memory access faults.
> 
> As a side-note: buffers other than those allocated in instance memory
> don't need to be physically contiguous from the GPU's perspective since
> the GPU can map them into contiguous buffers using its own MMU. Mapping
> these buffers through the IOMMU is unnecessary and will even lead to
> performance degradation because of the additional translation.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> I had already sent this out independently to fix a regression that was
> introduced in v4.16, but then Christoph pointed out that it should've
> been sent to a wider audience and should use a core API rather than
> calling into architecture code directly.
> 
> I've added it to this series for easier reference and to show the need
> for the new API.

This is good stuff, I am struggling with something similar on ARM64. One
problem that I wasn't able to fully solve cleanly was that for arm-smmu 
the SMMU HW resources are not released until the domain itself is destroyed
and I never quite figured out a way to swap the default domain cleanly.

This is a problem for the MSM GPU because not only do we run our own IOMMU as
you do we also have a hardware dependency to use context bank 0 to
asynchronously switch the pagetable during rendering.

I'm not sure if this is a problem you have encountered. In any event, this code
gets us a little bit further down the path and at least there is somebody else
out there in the cold dark world that understands my pain. :)

For your interest, here was my half-hearted attempt to avoid creating DMA
domains in the first place based on a blacklist to try to spur a bit of
discussion: https://patchwork.freedesktop.org/series/41573/

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-25 15:28   ` Jordan Crouse
  0 siblings, 0 replies; 42+ messages in thread
From: Jordan Crouse @ 2018-04-25 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Depending on the kernel configuration, early ARM architecture setup code
> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> backed buffers (a special bit in the GPU's MMU page tables indicates the
> memory path to take: via the SMMU or directly to the memory controller).
> Transparently backing DMA memory with an IOMMU prevents Nouveau from
> properly handling such memory accesses and causes memory access faults.
> 
> As a side-note: buffers other than those allocated in instance memory
> don't need to be physically contiguous from the GPU's perspective since
> the GPU can map them into contiguous buffers using its own MMU. Mapping
> these buffers through the IOMMU is unnecessary and will even lead to
> performance degradation because of the additional translation.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> I had already sent this out independently to fix a regression that was
> introduced in v4.16, but then Christoph pointed out that it should've
> been sent to a wider audience and should use a core API rather than
> calling into architecture code directly.
> 
> I've added it to this series for easier reference and to show the need
> for the new API.

This is good stuff, I am struggling with something similar on ARM64. One
problem that I wasn't able to fully solve cleanly was that for arm-smmu 
the SMMU HW resources are not released until the domain itself is destroyed
and I never quite figured out a way to swap the default domain cleanly.

This is a problem for the MSM GPU because not only do we run our own IOMMU as
you do we also have a hardware dependency to use context bank 0 to
asynchronously switch the pagetable during rendering.

I'm not sure if this is a problem you have encountered. In any event, this code
gets us a little bit further down the path and at least there is somebody else
out there in the cold dark world that understands my pain. :)

For your interest, here was my half-hearted attempt to avoid creating DMA
domains in the first place based on a blacklist to try to spur a bit of
discussion: https://patchwork.freedesktop.org/series/41573/

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-25 15:18     ` Christoph Hellwig
@ 2018-04-26 12:09         ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 464 bytes --]

On Wed, Apr 25, 2018 at 08:18:15AM -0700, Christoph Hellwig wrote:
> The series seems to miss a cover letter.
> 
> Also I really think this patch original patch shouldn't be in the proper
> series.

I added a note explaining why I included this. Not everyone in this
discussion had seen the patch and therefore may not be aware of the
problem that the series attempts to fix. I agree that it shouldn't
be merged as part of the series, though.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-26 12:09         ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 08:18:15AM -0700, Christoph Hellwig wrote:
> The series seems to miss a cover letter.
> 
> Also I really think this patch original patch shouldn't be in the proper
> series.

I added a note explaining why I included this. Not everyone in this
discussion had seen the patch and therefore may not be aware of the
problem that the series attempts to fix. I agree that it shouldn't
be merged as part of the series, though.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180426/a91efdbf/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-25 15:19         ` Christoph Hellwig
@ 2018-04-26 12:11             ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 908 bytes --]

On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > 
> > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > detach a device from an IOMMU that architecture code might have attached
> > to. This is useful for drivers that need explicit control over the IOMMU
> > using the IOMMU API directly.
> 
> Given that no one else implements it making it a generic API seems
> rather confusing.  For now I'd rename it to
> arm_dma_iommu_detach_device() and only implement it in arm.

That'd be suboptimal because this code is used on both 32-bit and 64-bit
ARM. If we make the function 32-bit ARM specific then the driver code
would need to use an #ifdef to make sure compilation doesn't break on
64-bit ARM.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-26 12:11             ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > detach a device from an IOMMU that architecture code might have attached
> > to. This is useful for drivers that need explicit control over the IOMMU
> > using the IOMMU API directly.
> 
> Given that no one else implements it making it a generic API seems
> rather confusing.  For now I'd rename it to
> arm_dma_iommu_detach_device() and only implement it in arm.

That'd be suboptimal because this code is used on both 32-bit and 64-bit
ARM. If we make the function 32-bit ARM specific then the driver code
would need to use an #ifdef to make sure compilation doesn't break on
64-bit ARM.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180426/550c5cea/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
  2018-04-25 15:20         ` Christoph Hellwig
@ 2018-04-26 12:14             ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 773 bytes --]

On Wed, Apr 25, 2018 at 08:20:49AM -0700, Christoph Hellwig wrote:
> > +void arch_iommu_detach_device(struct device *dev)
> > +{
> > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > +	const struct dma_map_ops *dma_ops;
> > +
> > +	if (!mapping)
> > +		return;
> > +
> > +	arm_iommu_release_mapping(mapping);
> > +	arm_iommu_detach_device(dev);
> > +
> > +	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
> > +	set_dma_ops(dev, dma_ops);
> 
> Why not simply:
> 
> 	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

I had that initially, but it looked cluttered to me, so I split it up.
I don't care much either way, so I can revert to that if you prefer.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device()
@ 2018-04-26 12:14             ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 08:20:49AM -0700, Christoph Hellwig wrote:
> > +void arch_iommu_detach_device(struct device *dev)
> > +{
> > +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> > +	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> > +	const struct dma_map_ops *dma_ops;
> > +
> > +	if (!mapping)
> > +		return;
> > +
> > +	arm_iommu_release_mapping(mapping);
> > +	arm_iommu_detach_device(dev);
> > +
> > +	dma_ops = arm_get_dma_map_ops(dev->archdata.dma_coherent);
> > +	set_dma_ops(dev, dma_ops);
> 
> Why not simply:
> 
> 	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

I had that initially, but it looked cluttered to me, so I split it up.
I don't care much either way, so I can revert to that if you prefer.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180426/c380f505/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-25 15:28   ` Jordan Crouse
@ 2018-04-26 12:41       ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:41 UTC (permalink / raw)
  To: Christoph Hellwig, Joerg Roedel,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 5321 bytes --]

On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
> On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > 
> > Depending on the kernel configuration, early ARM architecture setup code
> > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > memory path to take: via the SMMU or directly to the memory controller).
> > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > properly handling such memory accesses and causes memory access faults.
> > 
> > As a side-note: buffers other than those allocated in instance memory
> > don't need to be physically contiguous from the GPU's perspective since
> > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > these buffers through the IOMMU is unnecessary and will even lead to
> > performance degradation because of the additional translation.
> > 
> > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > ---
> > I had already sent this out independently to fix a regression that was
> > introduced in v4.16, but then Christoph pointed out that it should've
> > been sent to a wider audience and should use a core API rather than
> > calling into architecture code directly.
> > 
> > I've added it to this series for easier reference and to show the need
> > for the new API.
> 
> This is good stuff, I am struggling with something similar on ARM64. One
> problem that I wasn't able to fully solve cleanly was that for arm-smmu 
> the SMMU HW resources are not released until the domain itself is destroyed
> and I never quite figured out a way to swap the default domain cleanly.
> 
> This is a problem for the MSM GPU because not only do we run our own IOMMU as
> you do we also have a hardware dependency to use context bank 0 to
> asynchronously switch the pagetable during rendering.
> 
> I'm not sure if this is a problem you have encountered.

I don't think I have. Recent chips have similar capabilities, but they
don't have the restriction to a context bank, as far as I understand.
Adding Mikko who's had more exposure to this.

> In any event, this code
> gets us a little bit further down the path and at least there is somebody else
> out there in the cold dark world that understands my pain. :)

This doesn't actually fix anything on 64-bit ARM, and oddly enough I
haven't run into this issue myself on 64-bit ARM either. I think the
reason is that I haven't tested Nouveau on Tegra186 yet, which is the
first SoC which has an ARM SMMU. On prior 64-bit ARM chips we've used
the custom Tegra SMMU and that driver simply forbids creating any DMA
domains, so it will allow only explicit usage of the IOMMU API. There
is code in the generic DMA/IOMMU integration layer to not use the DMA
API with non-DMA IOMMU domains, but that's not true on 32-bit ARM,
unfortunately. It's entirely possible that Tegra186 will show exactly
the same problem that you are describing.

We do use the IOMMU API explicitly in the Tegra DRM driver as well,
and that is something that I've tested on Tegra186 and that I know to
be working. However, the reason why it works there is that the IOMMU
group will contain multiple display controllers, which will again
trigger a special case that will prevent the DMA/IOMMU integration
from setting up a DMA domain for use with those devices.

> For your interest, here was my half-hearted attempt to avoid creating DMA
> domains in the first place based on a blacklist to try to spur a bit of
> discussion: https://patchwork.freedesktop.org/series/41573/

This looks very interesting and simple, but I can imagine that it will
see significant pushback from the ARM SMMU maintainers (if not complete
silence), because it adds SoC-specific knowledge to an otherwise fully
generic driver. Having the GPU driver explicitly detach from the IOMMU
domain sounds like a better option, but I don't see how that would help
with the context bank issue that you're seeing.

One other possibility that I can imagine is to add something to struct
device that could be used by arch_setup_dma_ops() to not attach any of
the IOMMU-backed DMA operations to the device. Unfortunately that code
is called before the driver's ->probe() implementation is called, so a
driver doesn't have an opportunity to set it. Something like
of_dma_configure() could still set that up, perhaps based on some DT
property, though I can already hear the NAK from DT maintainers because
this is, after all, policy, not hardware description.

The last solution that I can think of that might allow us to do this is
to add a flag to struct device_driver (bool explicit_iommu?) that will
be used by the DMA/IOMMU setup code to decide whether or not to attach
to the IOMMU automatically.

Though, again, I'm not sure that would actually solve your bank problem.
That's really something I don't see any other solution than to fix it in
the ARM SMMU driver. Perhaps context bank 0 can always be reserved for
non-DMA domains?

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-26 12:41       ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
> On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Depending on the kernel configuration, early ARM architecture setup code
> > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > memory path to take: via the SMMU or directly to the memory controller).
> > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > properly handling such memory accesses and causes memory access faults.
> > 
> > As a side-note: buffers other than those allocated in instance memory
> > don't need to be physically contiguous from the GPU's perspective since
> > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > these buffers through the IOMMU is unnecessary and will even lead to
> > performance degradation because of the additional translation.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> > I had already sent this out independently to fix a regression that was
> > introduced in v4.16, but then Christoph pointed out that it should've
> > been sent to a wider audience and should use a core API rather than
> > calling into architecture code directly.
> > 
> > I've added it to this series for easier reference and to show the need
> > for the new API.
> 
> This is good stuff, I am struggling with something similar on ARM64. One
> problem that I wasn't able to fully solve cleanly was that for arm-smmu 
> the SMMU HW resources are not released until the domain itself is destroyed
> and I never quite figured out a way to swap the default domain cleanly.
> 
> This is a problem for the MSM GPU because not only do we run our own IOMMU as
> you do we also have a hardware dependency to use context bank 0 to
> asynchronously switch the pagetable during rendering.
> 
> I'm not sure if this is a problem you have encountered.

I don't think I have. Recent chips have similar capabilities, but they
don't have the restriction to a context bank, as far as I understand.
Adding Mikko who's had more exposure to this.

> In any event, this code
> gets us a little bit further down the path and at least there is somebody else
> out there in the cold dark world that understands my pain. :)

This doesn't actually fix anything on 64-bit ARM, and oddly enough I
haven't run into this issue myself on 64-bit ARM either. I think the
reason is that I haven't tested Nouveau on Tegra186 yet, which is the
first SoC which has an ARM SMMU. On prior 64-bit ARM chips we've used
the custom Tegra SMMU and that driver simply forbids creating any DMA
domains, so it will allow only explicit usage of the IOMMU API. There
is code in the generic DMA/IOMMU integration layer to not use the DMA
API with non-DMA IOMMU domains, but that's not true on 32-bit ARM,
unfortunately. It's entirely possible that Tegra186 will show exactly
the same problem that you are describing.

We do use the IOMMU API explicitly in the Tegra DRM driver as well,
and that is something that I've tested on Tegra186 and that I know to
be working. However, the reason why it works there is that the IOMMU
group will contain multiple display controllers, which will again
trigger a special case that will prevent the DMA/IOMMU integration
from setting up a DMA domain for use with those devices.

> For your interest, here was my half-hearted attempt to avoid creating DMA
> domains in the first place based on a blacklist to try to spur a bit of
> discussion: https://patchwork.freedesktop.org/series/41573/

This looks very interesting and simple, but I can imagine that it will
see significant pushback from the ARM SMMU maintainers (if not complete
silence), because it adds SoC-specific knowledge to an otherwise fully
generic driver. Having the GPU driver explicitly detach from the IOMMU
domain sounds like a better option, but I don't see how that would help
with the context bank issue that you're seeing.

One other possibility that I can imagine is to add something to struct
device that could be used by arch_setup_dma_ops() to not attach any of
the IOMMU-backed DMA operations to the device. Unfortunately that code
is called before the driver's ->probe() implementation is called, so a
driver doesn't have an opportunity to set it. Something like
of_dma_configure() could still set that up, perhaps based on some DT
property, though I can already hear the NAK from DT maintainers because
this is, after all, policy, not hardware description.

The last solution that I can think of that might allow us to do this is
to add a flag to struct device_driver (bool explicit_iommu?) that will
be used by the DMA/IOMMU setup code to decide whether or not to attach
to the IOMMU automatically.

Though, again, I'm not sure that would actually solve your bank problem.
That's really something I don't see any other solution than to fix it in
the ARM SMMU driver. Perhaps context bank 0 can always be reserved for
non-DMA domains?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180426/86bf02b0/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-26 12:41       ` Thierry Reding
@ 2018-04-26 12:59         ` Mikko Perttunen
  -1 siblings, 0 replies; 42+ messages in thread
From: Mikko Perttunen @ 2018-04-26 12:59 UTC (permalink / raw)
  To: Thierry Reding, Christoph Hellwig, Joerg Roedel, nouveau,
	Russell King, dri-devel, iommu, Daniel Vetter, linux-tegra,
	linux-arm-kernel

On 26.04.2018 15:41, Thierry Reding wrote:
> On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
>> On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Depending on the kernel configuration, early ARM architecture setup code
>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>> memory path to take: via the SMMU or directly to the memory controller).
>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>> properly handling such memory accesses and causes memory access faults.
>>>
>>> As a side-note: buffers other than those allocated in instance memory
>>> don't need to be physically contiguous from the GPU's perspective since
>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>> these buffers through the IOMMU is unnecessary and will even lead to
>>> performance degradation because of the additional translation.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>> I had already sent this out independently to fix a regression that was
>>> introduced in v4.16, but then Christoph pointed out that it should've
>>> been sent to a wider audience and should use a core API rather than
>>> calling into architecture code directly.
>>>
>>> I've added it to this series for easier reference and to show the need
>>> for the new API.
>>
>> This is good stuff, I am struggling with something similar on ARM64. One
>> problem that I wasn't able to fully solve cleanly was that for arm-smmu
>> the SMMU HW resources are not released until the domain itself is destroyed
>> and I never quite figured out a way to swap the default domain cleanly.
>>
>> This is a problem for the MSM GPU because not only do we run our own IOMMU as
>> you do we also have a hardware dependency to use context bank 0 to
>> asynchronously switch the pagetable during rendering.
>>
>> I'm not sure if this is a problem you have encountered.
>
> I don't think I have. Recent chips have similar capabilities, but they
> don't have the restriction to a context bank, as far as I understand.
> Adding Mikko who's had more exposure to this.

IIRC the only way I've gotten Host1x to work on Tegra186 with IOMMU 
enabled is doing the equivalent of this patch (or actually using the DMA 
API, which may be possible but has some potential issues).

As you said, we don't have a limitation regarding the context bank or 
similar - Host1x handles context switching by changing the sent stream 
ID on the fly (which is quite difficult to deal with from kernel point 
of view as well), and the actual GPU has its own MMU.

Thanks,
Mikko

>
>> In any event, this code
>> gets us a little bit further down the path and at least there is somebody else
>> out there in the cold dark world that understands my pain. :)
>
> This doesn't actually fix anything on 64-bit ARM, and oddly enough I
> haven't run into this issue myself on 64-bit ARM either. I think the
> reason is that I haven't tested Nouveau on Tegra186 yet, which is the
> first SoC which has an ARM SMMU. On prior 64-bit ARM chips we've used
> the custom Tegra SMMU and that driver simply forbids creating any DMA
> domains, so it will allow only explicit usage of the IOMMU API. There
> is code in the generic DMA/IOMMU integration layer to not use the DMA
> API with non-DMA IOMMU domains, but that's not true on 32-bit ARM,
> unfortunately. It's entirely possible that Tegra186 will show exactly
> the same problem that you are describing.
>
> We do use the IOMMU API explicitly in the Tegra DRM driver as well,
> and that is something that I've tested on Tegra186 and that I know to
> be working. However, the reason why it works there is that the IOMMU
> group will contain multiple display controllers, which will again
> trigger a special case that will prevent the DMA/IOMMU integration
> from setting up a DMA domain for use with those devices.
>
>> For your interest, here was my half-hearted attempt to avoid creating DMA
>> domains in the first place based on a blacklist to try to spur a bit of
>> discussion: https://patchwork.freedesktop.org/series/41573/
>
> This looks very interesting and simple, but I can imagine that it will
> see significant pushback from the ARM SMMU maintainers (if not complete
> silence), because it adds SoC-specific knowledge to an otherwise fully
> generic driver. Having the GPU driver explicitly detach from the IOMMU
> domain sounds like a better option, but I don't see how that would help
> with the context bank issue that you're seeing.
>
> One other possibility that I can imagine is to add something to struct
> device that could be used by arch_setup_dma_ops() to not attach any of
> the IOMMU-backed DMA operations to the device. Unfortunately that code
> is called before the driver's ->probe() implementation is called, so a
> driver doesn't have an opportunity to set it. Something like
> of_dma_configure() could still set that up, perhaps based on some DT
> property, though I can already hear the NAK from DT maintainers because
> this is, after all, policy, not hardware description.
>
> The last solution that I can think of that might allow us to do this is
> to add a flag to struct device_driver (bool explicit_iommu?) that will
> be used by the DMA/IOMMU setup code to decide whether or not to attach
> to the IOMMU automatically.
>
> Though, again, I'm not sure that would actually solve your bank problem.
> That's really something I don't see any other solution than to fix it in
> the ARM SMMU driver. Perhaps context bank 0 can always be reserved for
> non-DMA domains?
>
> Thierry
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-26 12:59         ` Mikko Perttunen
  0 siblings, 0 replies; 42+ messages in thread
From: Mikko Perttunen @ 2018-04-26 12:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 26.04.2018 15:41, Thierry Reding wrote:
> On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
>> On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Depending on the kernel configuration, early ARM architecture setup code
>>> may have attached the GPU to a DMA/IOMMU mapping that transparently uses
>>> the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
>>> backed buffers (a special bit in the GPU's MMU page tables indicates the
>>> memory path to take: via the SMMU or directly to the memory controller).
>>> Transparently backing DMA memory with an IOMMU prevents Nouveau from
>>> properly handling such memory accesses and causes memory access faults.
>>>
>>> As a side-note: buffers other than those allocated in instance memory
>>> don't need to be physically contiguous from the GPU's perspective since
>>> the GPU can map them into contiguous buffers using its own MMU. Mapping
>>> these buffers through the IOMMU is unnecessary and will even lead to
>>> performance degradation because of the additional translation.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>> I had already sent this out independently to fix a regression that was
>>> introduced in v4.16, but then Christoph pointed out that it should've
>>> been sent to a wider audience and should use a core API rather than
>>> calling into architecture code directly.
>>>
>>> I've added it to this series for easier reference and to show the need
>>> for the new API.
>>
>> This is good stuff, I am struggling with something similar on ARM64. One
>> problem that I wasn't able to fully solve cleanly was that for arm-smmu
>> the SMMU HW resources are not released until the domain itself is destroyed
>> and I never quite figured out a way to swap the default domain cleanly.
>>
>> This is a problem for the MSM GPU because not only do we run our own IOMMU as
>> you do we also have a hardware dependency to use context bank 0 to
>> asynchronously switch the pagetable during rendering.
>>
>> I'm not sure if this is a problem you have encountered.
>
> I don't think I have. Recent chips have similar capabilities, but they
> don't have the restriction to a context bank, as far as I understand.
> Adding Mikko who's had more exposure to this.

IIRC the only way I've gotten Host1x to work on Tegra186 with IOMMU 
enabled is doing the equivalent of this patch (or actually using the DMA 
API, which may be possible but has some potential issues).

As you said, we don't have a limitation regarding the context bank or 
similar - Host1x handles context switching by changing the sent stream 
ID on the fly (which is quite difficult to deal with from kernel point 
of view as well), and the actual GPU has its own MMU.

Thanks,
Mikko

>
>> In any event, this code
>> gets us a little bit further down the path and at least there is somebody else
>> out there in the cold dark world that understands my pain. :)
>
> This doesn't actually fix anything on 64-bit ARM, and oddly enough I
> haven't run into this issue myself on 64-bit ARM either. I think the
> reason is that I haven't tested Nouveau on Tegra186 yet, which is the
> first SoC which has an ARM SMMU. On prior 64-bit ARM chips we've used
> the custom Tegra SMMU and that driver simply forbids creating any DMA
> domains, so it will allow only explicit usage of the IOMMU API. There
> is code in the generic DMA/IOMMU integration layer to not use the DMA
> API with non-DMA IOMMU domains, but that's not true on 32-bit ARM,
> unfortunately. It's entirely possible that Tegra186 will show exactly
> the same problem that you are describing.
>
> We do use the IOMMU API explicitly in the Tegra DRM driver as well,
> and that is something that I've tested on Tegra186 and that I know to
> be working. However, the reason why it works there is that the IOMMU
> group will contain multiple display controllers, which will again
> trigger a special case that will prevent the DMA/IOMMU integration
> from setting up a DMA domain for use with those devices.
>
>> For your interest, here was my half-hearted attempt to avoid creating DMA
>> domains in the first place based on a blacklist to try to spur a bit of
>> discussion: https://patchwork.freedesktop.org/series/41573/
>
> This looks very interesting and simple, but I can imagine that it will
> see significant pushback from the ARM SMMU maintainers (if not complete
> silence), because it adds SoC-specific knowledge to an otherwise fully
> generic driver. Having the GPU driver explicitly detach from the IOMMU
> domain sounds like a better option, but I don't see how that would help
> with the context bank issue that you're seeing.
>
> One other possibility that I can imagine is to add something to struct
> device that could be used by arch_setup_dma_ops() to not attach any of
> the IOMMU-backed DMA operations to the device. Unfortunately that code
> is called before the driver's ->probe() implementation is called, so a
> driver doesn't have an opportunity to set it. Something like
> of_dma_configure() could still set that up, perhaps based on some DT
> property, though I can already hear the NAK from DT maintainers because
> this is, after all, policy, not hardware description.
>
> The last solution that I can think of that might allow us to do this is
> to add a flag to struct device_driver (bool explicit_iommu?) that will
> be used by the DMA/IOMMU setup code to decide whether or not to attach
> to the IOMMU automatically.
>
> Though, again, I'm not sure that would actually solve your bank problem.
> That's really something I don't see any other solution than to fix it in
> the ARM SMMU driver. Perhaps context bank 0 can always be reserved for
> non-DMA domains?
>
> Thierry
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
  2018-04-26 12:59         ` Mikko Perttunen
@ 2018-04-26 13:14             ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 13:14 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Joerg Roedel,
	Russell King, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 3624 bytes --]

On Thu, Apr 26, 2018 at 03:59:04PM +0300, Mikko Perttunen wrote:
> On 26.04.2018 15:41, Thierry Reding wrote:
> > On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
> > > On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> > > > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > 
> > > > Depending on the kernel configuration, early ARM architecture setup code
> > > > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > > > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > > > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > > > memory path to take: via the SMMU or directly to the memory controller).
> > > > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > > > properly handling such memory accesses and causes memory access faults.
> > > > 
> > > > As a side-note: buffers other than those allocated in instance memory
> > > > don't need to be physically contiguous from the GPU's perspective since
> > > > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > > > these buffers through the IOMMU is unnecessary and will even lead to
> > > > performance degradation because of the additional translation.
> > > > 
> > > > Signed-off-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > ---
> > > > I had already sent this out independently to fix a regression that was
> > > > introduced in v4.16, but then Christoph pointed out that it should've
> > > > been sent to a wider audience and should use a core API rather than
> > > > calling into architecture code directly.
> > > > 
> > > > I've added it to this series for easier reference and to show the need
> > > > for the new API.
> > > 
> > > This is good stuff, I am struggling with something similar on ARM64. One
> > > problem that I wasn't able to fully solve cleanly was that for arm-smmu
> > > the SMMU HW resources are not released until the domain itself is destroyed
> > > and I never quite figured out a way to swap the default domain cleanly.
> > > 
> > > This is a problem for the MSM GPU because not only do we run our own IOMMU as
> > > you do we also have a hardware dependency to use context bank 0 to
> > > asynchronously switch the pagetable during rendering.
> > > 
> > > I'm not sure if this is a problem you have encountered.
> > 
> > I don't think I have. Recent chips have similar capabilities, but they
> > don't have the restriction to a context bank, as far as I understand.
> > Adding Mikko who's had more exposure to this.
> 
> IIRC the only way I've gotten Host1x to work on Tegra186 with IOMMU enabled
> is doing the equivalent of this patch (or actually using the DMA API, which
> may be possible but has some potential issues).
> 
> As you said, we don't have a limitation regarding the context bank or
> similar - Host1x handles context switching by changing the sent stream ID on
> the fly (which is quite difficult to deal with from kernel point of view as
> well), and the actual GPU has its own MMU.

One instance where we still need the system MMU for GPU is to implement
support for big pages, which is required in order to do compression and
better performance in some other use-cases. I don't think we'll need
anything fancy like context switching in that case, though, because we
would use the SMMU exclusively to make sparse allocations look
contiguous to the GPU, so all of the per-process protection would still
be achieved via the GPU MMU.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
@ 2018-04-26 13:14             ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-26 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 26, 2018 at 03:59:04PM +0300, Mikko Perttunen wrote:
> On 26.04.2018 15:41, Thierry Reding wrote:
> > On Wed, Apr 25, 2018 at 09:28:49AM -0600, Jordan Crouse wrote:
> > > On Wed, Apr 25, 2018 at 12:10:47PM +0200, Thierry Reding wrote:
> > > > From: Thierry Reding <treding@nvidia.com>
> > > > 
> > > > Depending on the kernel configuration, early ARM architecture setup code
> > > > may have attached the GPU to a DMA/IOMMU mapping that transparently uses
> > > > the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
> > > > backed buffers (a special bit in the GPU's MMU page tables indicates the
> > > > memory path to take: via the SMMU or directly to the memory controller).
> > > > Transparently backing DMA memory with an IOMMU prevents Nouveau from
> > > > properly handling such memory accesses and causes memory access faults.
> > > > 
> > > > As a side-note: buffers other than those allocated in instance memory
> > > > don't need to be physically contiguous from the GPU's perspective since
> > > > the GPU can map them into contiguous buffers using its own MMU. Mapping
> > > > these buffers through the IOMMU is unnecessary and will even lead to
> > > > performance degradation because of the additional translation.
> > > > 
> > > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > > ---
> > > > I had already sent this out independently to fix a regression that was
> > > > introduced in v4.16, but then Christoph pointed out that it should've
> > > > been sent to a wider audience and should use a core API rather than
> > > > calling into architecture code directly.
> > > > 
> > > > I've added it to this series for easier reference and to show the need
> > > > for the new API.
> > > 
> > > This is good stuff, I am struggling with something similar on ARM64. One
> > > problem that I wasn't able to fully solve cleanly was that for arm-smmu
> > > the SMMU HW resources are not released until the domain itself is destroyed
> > > and I never quite figured out a way to swap the default domain cleanly.
> > > 
> > > This is a problem for the MSM GPU because not only do we run our own IOMMU as
> > > you do we also have a hardware dependency to use context bank 0 to
> > > asynchronously switch the pagetable during rendering.
> > > 
> > > I'm not sure if this is a problem you have encountered.
> > 
> > I don't think I have. Recent chips have similar capabilities, but they
> > don't have the restriction to a context bank, as far as I understand.
> > Adding Mikko who's had more exposure to this.
> 
> IIRC the only way I've gotten Host1x to work on Tegra186 with IOMMU enabled
> is doing the equivalent of this patch (or actually using the DMA API, which
> may be possible but has some potential issues).
> 
> As you said, we don't have a limitation regarding the context bank or
> similar - Host1x handles context switching by changing the sent stream ID on
> the fly (which is quite difficult to deal with from kernel point of view as
> well), and the actual GPU has its own MMU.

One instance where we still need the system MMU for GPU is to implement
support for big pages, which is required in order to do compression and
better performance in some other use-cases. I don't think we'll need
anything fancy like context switching in that case, though, because we
would use the SMMU exclusively to make sparse allocations look
contiguous to the GPU, so all of the per-process protection would still
be achieved via the GPU MMU.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180426/9f1b53c7/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-26 12:11             ` Thierry Reding
@ 2018-04-30 11:02               ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-30 11:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: nouveau, Russell King, dri-devel, iommu, linux-tegra, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1329 bytes --]

On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > > From: Thierry Reding <treding@nvidia.com>
> > > 
> > > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > > detach a device from an IOMMU that architecture code might have attached
> > > to. This is useful for drivers that need explicit control over the IOMMU
> > > using the IOMMU API directly.
> > 
> > Given that no one else implements it making it a generic API seems
> > rather confusing.  For now I'd rename it to
> > arm_dma_iommu_detach_device() and only implement it in arm.
> 
> That'd be suboptimal because this code is used on both 32-bit and 64-bit
> ARM. If we make the function 32-bit ARM specific then the driver code
> would need to use an #ifdef to make sure compilation doesn't break on
> 64-bit ARM.

Do you still want me to make this ARM specific? While I haven't
encountered this issue on 64-bit ARM yet, I think it would happen there
as well, under the right circumstances. I could take a shot at
implementing the equivalent there (which means essentially implementing
it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-30 11:02               ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-30 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > > From: Thierry Reding <treding@nvidia.com>
> > > 
> > > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > > detach a device from an IOMMU that architecture code might have attached
> > > to. This is useful for drivers that need explicit control over the IOMMU
> > > using the IOMMU API directly.
> > 
> > Given that no one else implements it making it a generic API seems
> > rather confusing.  For now I'd rename it to
> > arm_dma_iommu_detach_device() and only implement it in arm.
> 
> That'd be suboptimal because this code is used on both 32-bit and 64-bit
> ARM. If we make the function 32-bit ARM specific then the driver code
> would need to use an #ifdef to make sure compilation doesn't break on
> 64-bit ARM.

Do you still want me to make this ARM specific? While I haven't
encountered this issue on 64-bit ARM yet, I think it would happen there
as well, under the right circumstances. I could take a shot at
implementing the equivalent there (which means essentially implementing
it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180430/c99db340/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-30 11:02               ` Thierry Reding
@ 2018-04-30 11:41                 ` Robin Murphy
  -1 siblings, 0 replies; 42+ messages in thread
From: Robin Murphy @ 2018-04-30 11:41 UTC (permalink / raw)
  To: Thierry Reding, Christoph Hellwig
  Cc: nouveau, Russell King, dri-devel, iommu, linux-tegra, linux-arm-kernel

On 30/04/18 12:02, Thierry Reding wrote:
> On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
>> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
>>> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> The dma_iommu_detach_device() API can be used by drivers to forcibly
>>>> detach a device from an IOMMU that architecture code might have attached
>>>> to. This is useful for drivers that need explicit control over the IOMMU
>>>> using the IOMMU API directly.
>>>
>>> Given that no one else implements it making it a generic API seems
>>> rather confusing.  For now I'd rename it to
>>> arm_dma_iommu_detach_device() and only implement it in arm.
>>
>> That'd be suboptimal because this code is used on both 32-bit and 64-bit
>> ARM. If we make the function 32-bit ARM specific then the driver code
>> would need to use an #ifdef to make sure compilation doesn't break on
>> 64-bit ARM.
> 
> Do you still want me to make this ARM specific? While I haven't
> encountered this issue on 64-bit ARM yet, I think it would happen there
> as well, under the right circumstances. I could take a shot at
> implementing the equivalent there (which means essentially implementing
> it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).

It sounds like things are getting a bit backwards here: iommu-dma should 
have nothing to do with this, since if you've explicitly attached the 
device to your own IOMMU domain then you're already bypassing everything 
it knows about and has control over. Arch code calling into iommu-dma to 
do something which makes arch code not use iommu-dma makes very little 
sense.

AFAICS the only aspect of arm_iommu_detach_device() which actually 
matters in this case is the set_dma_ops() bit, so what we're really 
after is a generic way for drivers to say "Hey, I actually have my own 
MMU (or want to control the one you already know about) so please give 
me direct DMA ops", which the arch code handles as appropriate (maybe 
it's also allowed to fail in cases like swiotlb=force or when there is 
RAM beyond the device's DMA mask).

Robin.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-30 11:41                 ` Robin Murphy
  0 siblings, 0 replies; 42+ messages in thread
From: Robin Murphy @ 2018-04-30 11:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/04/18 12:02, Thierry Reding wrote:
> On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
>> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
>>> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> The dma_iommu_detach_device() API can be used by drivers to forcibly
>>>> detach a device from an IOMMU that architecture code might have attached
>>>> to. This is useful for drivers that need explicit control over the IOMMU
>>>> using the IOMMU API directly.
>>>
>>> Given that no one else implements it making it a generic API seems
>>> rather confusing.  For now I'd rename it to
>>> arm_dma_iommu_detach_device() and only implement it in arm.
>>
>> That'd be suboptimal because this code is used on both 32-bit and 64-bit
>> ARM. If we make the function 32-bit ARM specific then the driver code
>> would need to use an #ifdef to make sure compilation doesn't break on
>> 64-bit ARM.
> 
> Do you still want me to make this ARM specific? While I haven't
> encountered this issue on 64-bit ARM yet, I think it would happen there
> as well, under the right circumstances. I could take a shot at
> implementing the equivalent there (which means essentially implementing
> it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).

It sounds like things are getting a bit backwards here: iommu-dma should 
have nothing to do with this, since if you've explicitly attached the 
device to your own IOMMU domain then you're already bypassing everything 
it knows about and has control over. Arch code calling into iommu-dma to 
do something which makes arch code not use iommu-dma makes very little 
sense.

AFAICS the only aspect of arm_iommu_detach_device() which actually 
matters in this case is the set_dma_ops() bit, so what we're really 
after is a generic way for drivers to say "Hey, I actually have my own 
MMU (or want to control the one you already know about) so please give 
me direct DMA ops", which the arch code handles as appropriate (maybe 
it's also allowed to fail in cases like swiotlb=force or when there is 
RAM beyond the device's DMA mask).

Robin.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-30 11:41                 ` Robin Murphy
@ 2018-04-30 12:12                     ` Thierry Reding
  -1 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-30 12:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 2204 bytes --]

On Mon, Apr 30, 2018 at 12:41:52PM +0100, Robin Murphy wrote:
> On 30/04/18 12:02, Thierry Reding wrote:
> > On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
> > > On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> > > > On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > > > > From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > > 
> > > > > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > > > > detach a device from an IOMMU that architecture code might have attached
> > > > > to. This is useful for drivers that need explicit control over the IOMMU
> > > > > using the IOMMU API directly.
> > > > 
> > > > Given that no one else implements it making it a generic API seems
> > > > rather confusing.  For now I'd rename it to
> > > > arm_dma_iommu_detach_device() and only implement it in arm.
> > > 
> > > That'd be suboptimal because this code is used on both 32-bit and 64-bit
> > > ARM. If we make the function 32-bit ARM specific then the driver code
> > > would need to use an #ifdef to make sure compilation doesn't break on
> > > 64-bit ARM.
> > 
> > Do you still want me to make this ARM specific? While I haven't
> > encountered this issue on 64-bit ARM yet, I think it would happen there
> > as well, under the right circumstances. I could take a shot at
> > implementing the equivalent there (which means essentially implementing
> > it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).
> 
> It sounds like things are getting a bit backwards here: iommu-dma should
> have nothing to do with this, since if you've explicitly attached the device
> to your own IOMMU domain then you're already bypassing everything it knows
> about and has control over. Arch code calling into iommu-dma to do something
> which makes arch code not use iommu-dma makes very little sense.

My understanding is that iommu-dma will set up an IOMMU domain at device
probe time anyway. So even if attaching to an own IOMMU domain will end
up bypassing iommu-dma, we'd still want to clear up the IOMMU domain and
any associated resources, right?

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-30 12:12                     ` Thierry Reding
  0 siblings, 0 replies; 42+ messages in thread
From: Thierry Reding @ 2018-04-30 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 30, 2018 at 12:41:52PM +0100, Robin Murphy wrote:
> On 30/04/18 12:02, Thierry Reding wrote:
> > On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
> > > On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
> > > > On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > 
> > > > > The dma_iommu_detach_device() API can be used by drivers to forcibly
> > > > > detach a device from an IOMMU that architecture code might have attached
> > > > > to. This is useful for drivers that need explicit control over the IOMMU
> > > > > using the IOMMU API directly.
> > > > 
> > > > Given that no one else implements it making it a generic API seems
> > > > rather confusing.  For now I'd rename it to
> > > > arm_dma_iommu_detach_device() and only implement it in arm.
> > > 
> > > That'd be suboptimal because this code is used on both 32-bit and 64-bit
> > > ARM. If we make the function 32-bit ARM specific then the driver code
> > > would need to use an #ifdef to make sure compilation doesn't break on
> > > 64-bit ARM.
> > 
> > Do you still want me to make this ARM specific? While I haven't
> > encountered this issue on 64-bit ARM yet, I think it would happen there
> > as well, under the right circumstances. I could take a shot at
> > implementing the equivalent there (which means essentially implementing
> > it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).
> 
> It sounds like things are getting a bit backwards here: iommu-dma should
> have nothing to do with this, since if you've explicitly attached the device
> to your own IOMMU domain then you're already bypassing everything it knows
> about and has control over. Arch code calling into iommu-dma to do something
> which makes arch code not use iommu-dma makes very little sense.

My understanding is that iommu-dma will set up an IOMMU domain at device
probe time anyway. So even if attaching to an own IOMMU domain will end
up bypassing iommu-dma, we'd still want to clear up the IOMMU domain and
any associated resources, right?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180430/7b66e2f7/attachment.sig>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
  2018-04-30 12:12                     ` Thierry Reding
@ 2018-04-30 12:49                       ` Robin Murphy
  -1 siblings, 0 replies; 42+ messages in thread
From: Robin Murphy @ 2018-04-30 12:49 UTC (permalink / raw)
  To: Thierry Reding
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Daniel Vetter,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/04/18 13:12, Thierry Reding wrote:
> On Mon, Apr 30, 2018 at 12:41:52PM +0100, Robin Murphy wrote:
>> On 30/04/18 12:02, Thierry Reding wrote:
>>> On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
>>>> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
>>>>> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
>>>>>> From: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>>>>
>>>>>> The dma_iommu_detach_device() API can be used by drivers to forcibly
>>>>>> detach a device from an IOMMU that architecture code might have attached
>>>>>> to. This is useful for drivers that need explicit control over the IOMMU
>>>>>> using the IOMMU API directly.
>>>>>
>>>>> Given that no one else implements it making it a generic API seems
>>>>> rather confusing.  For now I'd rename it to
>>>>> arm_dma_iommu_detach_device() and only implement it in arm.
>>>>
>>>> That'd be suboptimal because this code is used on both 32-bit and 64-bit
>>>> ARM. If we make the function 32-bit ARM specific then the driver code
>>>> would need to use an #ifdef to make sure compilation doesn't break on
>>>> 64-bit ARM.
>>>
>>> Do you still want me to make this ARM specific? While I haven't
>>> encountered this issue on 64-bit ARM yet, I think it would happen there
>>> as well, under the right circumstances. I could take a shot at
>>> implementing the equivalent there (which means essentially implementing
>>> it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).
>>
>> It sounds like things are getting a bit backwards here: iommu-dma should
>> have nothing to do with this, since if you've explicitly attached the device
>> to your own IOMMU domain then you're already bypassing everything it knows
>> about and has control over. Arch code calling into iommu-dma to do something
>> which makes arch code not use iommu-dma makes very little sense.
> 
> My understanding is that iommu-dma will set up an IOMMU domain at device
> probe time anyway. So even if attaching to an own IOMMU domain will end
> up bypassing iommu-dma, we'd still want to clear up the IOMMU domain and
> any associated resources, right?

The lifetime of a "proper" IOMMU API default domain is that of the 
iommu_group, so more or less between device_add() and device_del() from 
the perspective of a device in that group. IOW at least one level below 
what that device's driver should be messing with. The domain itself is 
just a little bit of memory and shouldn't have to occupy any hardware 
resources while it's not active (and yes, I know the ARM SMMU driver is 
currently a bit crap in that regard).

Robin.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API
@ 2018-04-30 12:49                       ` Robin Murphy
  0 siblings, 0 replies; 42+ messages in thread
From: Robin Murphy @ 2018-04-30 12:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/04/18 13:12, Thierry Reding wrote:
> On Mon, Apr 30, 2018 at 12:41:52PM +0100, Robin Murphy wrote:
>> On 30/04/18 12:02, Thierry Reding wrote:
>>> On Thu, Apr 26, 2018 at 02:11:36PM +0200, Thierry Reding wrote:
>>>> On Wed, Apr 25, 2018 at 08:19:34AM -0700, Christoph Hellwig wrote:
>>>>> On Wed, Apr 25, 2018 at 12:10:48PM +0200, Thierry Reding wrote:
>>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>>
>>>>>> The dma_iommu_detach_device() API can be used by drivers to forcibly
>>>>>> detach a device from an IOMMU that architecture code might have attached
>>>>>> to. This is useful for drivers that need explicit control over the IOMMU
>>>>>> using the IOMMU API directly.
>>>>>
>>>>> Given that no one else implements it making it a generic API seems
>>>>> rather confusing.  For now I'd rename it to
>>>>> arm_dma_iommu_detach_device() and only implement it in arm.
>>>>
>>>> That'd be suboptimal because this code is used on both 32-bit and 64-bit
>>>> ARM. If we make the function 32-bit ARM specific then the driver code
>>>> would need to use an #ifdef to make sure compilation doesn't break on
>>>> 64-bit ARM.
>>>
>>> Do you still want me to make this ARM specific? While I haven't
>>> encountered this issue on 64-bit ARM yet, I think it would happen there
>>> as well, under the right circumstances. I could take a shot at
>>> implementing the equivalent there (which means essentially implementing
>>> it for drivers/iommu/dma-iommu.c and calling that from 64-bit ARM code).
>>
>> It sounds like things are getting a bit backwards here: iommu-dma should
>> have nothing to do with this, since if you've explicitly attached the device
>> to your own IOMMU domain then you're already bypassing everything it knows
>> about and has control over. Arch code calling into iommu-dma to do something
>> which makes arch code not use iommu-dma makes very little sense.
> 
> My understanding is that iommu-dma will set up an IOMMU domain at device
> probe time anyway. So even if attaching to an own IOMMU domain will end
> up bypassing iommu-dma, we'd still want to clear up the IOMMU domain and
> any associated resources, right?

The lifetime of a "proper" IOMMU API default domain is that of the 
iommu_group, so more or less between device_add() and device_del() from 
the perspective of a device in that group. IOW at least one level below 
what that device's driver should be messing with. The domain itself is 
just a little bit of memory and shouldn't have to occupy any hardware 
resources while it's not active (and yes, I know the ARM SMMU driver is 
currently a bit crap in that regard).

Robin.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-04-30 12:49 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25 10:10 [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping Thierry Reding
2018-04-25 10:10 ` Thierry Reding
     [not found] ` <20180425101051.15349-1-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-25 10:10   ` [PATCH v2 2/5] dma-mapping: Introduce dma_iommu_detach_device() API Thierry Reding
2018-04-25 10:10     ` Thierry Reding
     [not found]     ` <20180425101051.15349-2-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-25 15:19       ` Christoph Hellwig
2018-04-25 15:19         ` Christoph Hellwig
     [not found]         ` <20180425151934.GC16075-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2018-04-26 12:11           ` Thierry Reding
2018-04-26 12:11             ` Thierry Reding
2018-04-30 11:02             ` Thierry Reding
2018-04-30 11:02               ` Thierry Reding
2018-04-30 11:41               ` Robin Murphy
2018-04-30 11:41                 ` Robin Murphy
     [not found]                 ` <e549979d-9f53-a329-da1a-ed5139958762-5wv7dgnIgG8@public.gmane.org>
2018-04-30 12:12                   ` Thierry Reding
2018-04-30 12:12                     ` Thierry Reding
2018-04-30 12:49                     ` Robin Murphy
2018-04-30 12:49                       ` Robin Murphy
2018-04-25 10:10   ` [PATCH v2 3/5] ARM: dma-mapping: Implement arch_iommu_detach_device() Thierry Reding
2018-04-25 10:10     ` Thierry Reding
     [not found]     ` <20180425101051.15349-3-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-25 15:20       ` Christoph Hellwig
2018-04-25 15:20         ` Christoph Hellwig
     [not found]         ` <20180425152049.GD16075-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2018-04-26 12:14           ` Thierry Reding
2018-04-26 12:14             ` Thierry Reding
2018-04-25 10:10   ` [PATCH v2 4/5] drm/nouveau: tegra: Use dma_iommu_detach_device() Thierry Reding
2018-04-25 10:10     ` Thierry Reding
2018-04-25 10:10   ` [PATCH v2 5/5] ARM: Unconditionally enable ARM_DMA_USE_IOMMU Thierry Reding
2018-04-25 10:10     ` Thierry Reding
     [not found]     ` <20180425101051.15349-5-thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-04-25 10:25       ` Russell King - ARM Linux
2018-04-25 10:25         ` Russell King - ARM Linux
2018-04-25 15:17         ` Christoph Hellwig
2018-04-25 15:17           ` Christoph Hellwig
2018-04-25 15:18   ` [PATCH v2 1/5] drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping Christoph Hellwig
2018-04-25 15:18     ` Christoph Hellwig
     [not found]     ` <20180425151815.GB16075-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2018-04-26 12:09       ` Thierry Reding
2018-04-26 12:09         ` Thierry Reding
2018-04-25 15:28 ` Jordan Crouse
2018-04-25 15:28   ` Jordan Crouse
     [not found]   ` <20180425152849.GA2447-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2018-04-26 12:41     ` Thierry Reding
2018-04-26 12:41       ` Thierry Reding
2018-04-26 12:59       ` Mikko Perttunen
2018-04-26 12:59         ` Mikko Perttunen
     [not found]         ` <fd6e0d90-7614-296b-2927-7b2745e8fbde-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2018-04-26 13:14           ` Thierry Reding
2018-04-26 13:14             ` Thierry Reding

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.