* [PATCH v1 0/9] Tegra GART driver clean up and optimization
@ 2018-05-08 18:16 Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 1/9] memory: tegra: Provide facility for integration with the GART driver Dmitry Osipenko
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Hello,

This series addresses multiple shortcomings of the GART driver:

1. Thierry noticed in [0] that the Memory Controller driver uses registers that
   belong to the GART and that, for now, the MC driver only reports the bare
   fact of a GART page fault. The first two patches of the series address this
   shortcoming.

   [0] https://www.spinics.net/lists/linux-tegra/msg33072.html

2. Currently the GART is kept disabled by commit c7e3ca515e784 ("iommu/tegra:
   gart: Do not register with bus"). If the GART is re-enabled, then all
   devices in the system get assigned to the GART, as it is a system-wide
   IOMMU provider. This is wrong simply because the GART doesn't handle all
   of those devices. This series makes the GART accept only devices that are
   explicitly assigned to it in the device tree using an 'iommu' phandle.

3. This series performs a general cleanup of the driver: removing dead code,
   limiting the driver to at most one IOMMU domain, and so on.

4. This series introduces and utilizes the iotlb_sync_map() callback that was
   previously suggested by Joerg Roedel in [1].

   [1] https://www.spinics.net/lists/linux-tegra/msg32914.html

Dmitry Osipenko (9):
  memory: tegra: Provide facility for integration with the GART driver
  iommu/tegra: gart: Provide access to Memory Controller driver
  iommu/tegra: gart: Remove code related to module unloading
  iommu/tegra: gart: Remove pr_fmt and clean up includes
  iommu/tegra: gart: Clean up driver probe failure unwinding
  iommu/tegra: gart: Ignore devices without IOMMU phandle in DT
  iommu/tegra: gart: Provide single domain and group for all devices
  iommu: Introduce iotlb_sync_map callback
  iommu/tegra: gart: Optimize mapping / unmapping performance

 drivers/iommu/iommu.c      |   8 +-
 drivers/iommu/tegra-gart.c | 216 +++++++++++++++++--------------------
 drivers/memory/tegra/mc.c  |  26 ++++-
 include/linux/iommu.h      |   1 +
 include/soc/tegra/mc.h     |  13 +++
 5 files changed, 143 insertions(+), 121 deletions(-)

-- 
2.17.0

* [PATCH v1 1/9] memory: tegra: Provide facility for integration with the GART driver
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 2/9] iommu/tegra: gart: Provide access to Memory Controller driver Dmitry Osipenko
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

In order to report the client's name and access direction on a GART page
fault, the MC driver needs to access GART registers. Add a facility that
provides the MC driver with access to the GART.
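
For illustration, a minimal sketch of how the GART driver is expected to hook
into this facility (this mirrors what patch 2 of the series implements; the
names below follow that patch):

	static u32 tegra_gart_error_addr(struct tegra_mc_gart_handle *handle)
	{
		struct gart_device *gart = container_of(handle, struct gart_device,
							mc_gart_handle);

		return readl(gart->regs + GART_ERROR_ADDR);
	}

	/* in tegra_gart_probe(): fill in the callbacks, hand over the handle */
	gart->mc_gart_handle.error_addr = tegra_gart_error_addr;
	gart->mc_gart_handle.error_req = tegra_gart_error_req;
	tegra_mc_register_gart(&gart->mc_gart_handle);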

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/memory/tegra/mc.c | 26 +++++++++++++++++++++++---
 include/soc/tegra/mc.h    | 13 +++++++++++++
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c
index c81d01caf1a8..49dd7ad1459f 100644
--- a/drivers/memory/tegra/mc.c
+++ b/drivers/memory/tegra/mc.c
@@ -72,6 +72,8 @@ static const struct of_device_id tegra_mc_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, tegra_mc_of_match);
 
+static struct tegra_mc_gart_handle *gart_handle;
+
 static int terga_mc_block_dma_common(struct tegra_mc *mc,
 				     const struct tegra_mc_reset *rst)
 {
@@ -543,6 +545,11 @@ static irqreturn_t tegra_mc_irq(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+void tegra_mc_register_gart(struct tegra_mc_gart_handle *handle)
+{
+	WRITE_ONCE(gart_handle, handle);
+}
+
 static __maybe_unused irqreturn_t tegra20_mc_irq(int irq, void *data)
 {
 	struct tegra_mc *mc = data;
@@ -565,6 +572,7 @@ static __maybe_unused irqreturn_t tegra20_mc_irq(int irq, void *data)
 		switch (BIT(bit)) {
 		case MC_INT_DECERR_EMEM:
 			reg = MC_DECERR_EMEM_OTHERS_STATUS;
+			addr = mc_readl(mc, reg + sizeof(u32));
 			value = mc_readl(mc, reg);
 
 			id = value & mc->soc->client_id_mask;
@@ -575,11 +583,24 @@ static __maybe_unused irqreturn_t tegra20_mc_irq(int irq, void *data)
 			break;
 
 		case MC_INT_INVALID_GART_PAGE:
-			dev_err_ratelimited(mc->dev, "%s\n", error);
-			continue;
+			if (READ_ONCE(gart_handle) == NULL) {
+				dev_err_ratelimited(mc->dev, "%s\n", error);
+				continue;
+			}
+
+			addr = gart_handle->error_addr(gart_handle);
+			value = gart_handle->error_req(gart_handle);
+
+			id = (value >> 1) & mc->soc->client_id_mask;
+			desc = error_names[2];
+
+			if (value & BIT(0))
+				direction = "write";
+			break;
 
 		case MC_INT_SECURITY_VIOLATION:
 			reg = MC_SECURITY_VIOLATION_STATUS;
+			addr = mc_readl(mc, reg + sizeof(u32));
 			value = mc_readl(mc, reg);
 
 			id = value & mc->soc->client_id_mask;
@@ -596,7 +617,6 @@ static __maybe_unused irqreturn_t tegra20_mc_irq(int irq, void *data)
 		}
 
 		client = mc->soc->clients[id].name;
-		addr = mc_readl(mc, reg + sizeof(u32));
 
 		dev_err_ratelimited(mc->dev, "%s: %s%s @%pa: %s (%s)\n",
 				    client, secure, direction, &addr, error,
diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
index b43f37fea096..5bf72eb4dd51 100644
--- a/include/soc/tegra/mc.h
+++ b/include/soc/tegra/mc.h
@@ -162,4 +162,17 @@ struct tegra_mc {
 void tegra_mc_write_emem_configuration(struct tegra_mc *mc, unsigned long rate);
 unsigned int tegra_mc_get_emem_device_count(struct tegra_mc *mc);
 
+struct tegra_mc_gart_handle {
+	u32 (*error_addr)(struct tegra_mc_gart_handle *handle);
+	u32 (*error_req)(struct tegra_mc_gart_handle *handle);
+};
+
+#ifdef CONFIG_TEGRA_MC
+void tegra_mc_register_gart(struct tegra_mc_gart_handle *handle);
+#else
+static inline void tegra_mc_register_gart(struct tegra_mc_gart_handle *handle)
+{
+}
+#endif
+
 #endif /* __SOC_TEGRA_MC_H__ */
-- 
2.17.0

* [PATCH v1 2/9] iommu/tegra: gart: Provide access to Memory Controller driver
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 1/9] memory: tegra: Provide facility for integration with the GART driver Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 3/9] iommu/tegra: gart: Remove code related to module unloading Dmitry Osipenko
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

The GART contains registers needed by the Memory Controller driver. Provide
the MC driver with access to these registers by utilizing its GART-integration
facility.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 89ec24c6952c..de48943bf843 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -31,6 +31,8 @@
 #include <linux/iommu.h>
 #include <linux/of.h>
 
+#include <soc/tegra/mc.h>
+
 #include <asm/cacheflush.h>
 
 /* bitmap of the page sizes currently supported */
@@ -41,6 +43,8 @@
 #define GART_ENTRY_ADDR		(0x28 - GART_REG_BASE)
 #define GART_ENTRY_DATA		(0x2c - GART_REG_BASE)
 #define GART_ENTRY_PHYS_ADDR_VALID	(1 << 31)
+#define GART_ERROR_REQ		(0x30 - GART_REG_BASE)
+#define GART_ERROR_ADDR		(0x34 - GART_REG_BASE)
 
 #define GART_PAGE_SHIFT		12
 #define GART_PAGE_SIZE		(1 << GART_PAGE_SHIFT)
@@ -63,6 +67,8 @@ struct gart_device {
 	struct device		*dev;
 
 	struct iommu_device	iommu;		/* IOMMU Core handle */
+
+	struct tegra_mc_gart_handle mc_gart_handle;
 };
 
 struct gart_domain {
@@ -408,6 +414,20 @@ static int tegra_gart_resume(struct device *dev)
 	return 0;
 }
 
+static u32 tegra_gart_error_addr(struct tegra_mc_gart_handle *handle)
+{
+	struct gart_device *gart = container_of(handle, struct gart_device,
+						mc_gart_handle);
+	return readl(gart->regs + GART_ERROR_ADDR);
+}
+
+static u32 tegra_gart_error_req(struct tegra_mc_gart_handle *handle)
+{
+	struct gart_device *gart = container_of(handle, struct gart_device,
+						mc_gart_handle);
+	return readl(gart->regs + GART_ERROR_REQ);
+}
+
 static int tegra_gart_probe(struct platform_device *pdev)
 {
 	struct gart_device *gart;
@@ -464,6 +484,8 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	gart->regs = gart_regs;
 	gart->iovmm_base = (dma_addr_t)res_remap->start;
 	gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
+	gart->mc_gart_handle.error_addr = tegra_gart_error_addr;
+	gart->mc_gart_handle.error_req = tegra_gart_error_req;
 
 	gart->savedata = vmalloc(sizeof(u32) * gart->page_count);
 	if (!gart->savedata) {
@@ -475,6 +497,7 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	do_gart_setup(gart, NULL);
 
 	gart_handle = gart;
+	tegra_mc_register_gart(&gart->mc_gart_handle);
 
 	return 0;
 }
-- 
2.17.0

* [PATCH v1 3/9] iommu/tegra: gart: Remove code related to module unloading
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 1/9] memory: tegra: Provide facility for integration with the GART driver Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 2/9] iommu/tegra: gart: Provide access to Memory Controller driver Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 4/9] iommu/tegra: gart: Remove pr_fmt and clean up includes Dmitry Osipenko
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

The GART driver is built-in, hence it can't be unloaded. This patch merely
removes the dead code.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 25 +++----------------------
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index de48943bf843..268d29fb9097 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -502,20 +502,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	return 0;
 }
 
-static int tegra_gart_remove(struct platform_device *pdev)
-{
-	struct gart_device *gart = platform_get_drvdata(pdev);
-
-	iommu_device_unregister(&gart->iommu);
-	iommu_device_sysfs_remove(&gart->iommu);
-
-	writel(0, gart->regs + GART_CONFIG);
-	if (gart->savedata)
-		vfree(gart->savedata);
-	gart_handle = NULL;
-	return 0;
-}
-
 static const struct dev_pm_ops tegra_gart_pm_ops = {
 	.suspend	= tegra_gart_suspend,
 	.resume		= tegra_gart_resume,
@@ -529,26 +515,21 @@ MODULE_DEVICE_TABLE(of, tegra_gart_of_match);
 
 static struct platform_driver tegra_gart_driver = {
 	.probe		= tegra_gart_probe,
-	.remove		= tegra_gart_remove,
 	.driver = {
 		.name	= "tegra-gart",
 		.pm	= &tegra_gart_pm_ops,
 		.of_match_table = tegra_gart_of_match,
+		.suppress_bind_attrs = true,
 	},
+	.prevent_deferred_probe = true,
 };
 
 static int tegra_gart_init(void)
 {
 	return platform_driver_register(&tegra_gart_driver);
 }
-
-static void __exit tegra_gart_exit(void)
-{
-	platform_driver_unregister(&tegra_gart_driver);
-}
-
 subsys_initcall(tegra_gart_init);
-module_exit(tegra_gart_exit);
+
 module_param(gart_debug, bool, 0644);
 
 MODULE_PARM_DESC(gart_debug, "Enable GART debugging");
-- 
2.17.0

* [PATCH v1 4/9] iommu/tegra: gart: Remove pr_fmt and clean up includes
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (2 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 3/9] iommu/tegra: gart: Remove code related to module unloading Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 5/9] iommu/tegra: gart: Clean up driver probe failure unwinding Dmitry Osipenko
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Remove unneeded includes and sort the remaining ones in alphabetical order.
Also remove pr_fmt since there are no pr_xxx() calls and it doesn't affect
dev_xxx().

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 268d29fb9097..08e0de4087d1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -17,24 +17,17 @@
  * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
  */
 
-#define pr_fmt(fmt)	"%s(): " fmt, __func__
-
+#include <linux/io.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
-#include <linux/spinlock.h>
+#include <linux/of_device.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 #include <linux/vmalloc.h>
-#include <linux/mm.h>
-#include <linux/list.h>
-#include <linux/device.h>
-#include <linux/io.h>
-#include <linux/iommu.h>
-#include <linux/of.h>
 
 #include <soc/tegra/mc.h>
 
-#include <asm/cacheflush.h>
-
 /* bitmap of the page sizes currently supported */
 #define GART_IOMMU_PGSIZES	(SZ_4K)
 
-- 
2.17.0

* [PATCH v1 5/9] iommu/tegra: gart: Clean up driver probe failure unwinding
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (3 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 4/9] iommu/tegra: gart: Remove pr_fmt and clean up includes Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-08 18:16 ` [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT Dmitry Osipenko
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Properly clean up allocated resources on driver probe failure.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 08e0de4087d1..39305224c48d 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -466,8 +466,7 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	ret = iommu_device_register(&gart->iommu);
 	if (ret) {
 		dev_err(dev, "Failed to register IOMMU\n");
-		iommu_device_sysfs_remove(&gart->iommu);
-		return ret;
+		goto remove_sysfs;
 	}
 
 	gart->dev = &pdev->dev;
@@ -483,7 +482,8 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	gart->savedata = vmalloc(sizeof(u32) * gart->page_count);
 	if (!gart->savedata) {
 		dev_err(dev, "failed to allocate context save area\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto iommu_unregister;
 	}
 
 	platform_set_drvdata(pdev, gart);
@@ -493,6 +493,13 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	tegra_mc_register_gart(&gart->mc_gart_handle);
 
 	return 0;
+
+iommu_unregister:
+	iommu_device_unregister(&gart->iommu);
+remove_sysfs:
+	iommu_device_sysfs_remove(&gart->iommu);
+
+	return ret;
 }
 
 static const struct dev_pm_ops tegra_gart_pm_ops = {
-- 
2.17.0

* [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (4 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 5/9] iommu/tegra: gart: Clean up driver probe failure unwinding Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-11 11:34   ` Robin Murphy
  2018-05-08 18:16 ` [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices Dmitry Osipenko
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

The GART can't handle all devices, so ignore devices that aren't related to
the GART. The device tree must explicitly assign the GART IOMMU to a device
using an IOMMU phandle.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 39305224c48d..5b2d27620350 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -366,6 +366,26 @@ static void gart_iommu_remove_device(struct device *dev)
 	iommu_device_unlink(&gart_handle->iommu, dev);
 }
 
+static int gart_iommu_check_device(struct gart_device *gart,
+				   struct device *dev);
+
+struct iommu_group *gart_iommu_device_group(struct device *dev)
+{
+	int err;
+
+	err = gart_iommu_check_device(gart_handle, dev);
+	if (err)
+		return ERR_PTR(err);
+
+	return generic_device_group(dev);
+}
+
+static int gart_iommu_of_xlate(struct device *dev,
+			       struct of_phandle_args *args)
+{
+	return 0;
+}
+
 static const struct iommu_ops gart_iommu_ops = {
 	.capable	= gart_iommu_capable,
 	.domain_alloc	= gart_iommu_domain_alloc,
@@ -374,14 +394,24 @@ static const struct iommu_ops gart_iommu_ops = {
 	.detach_dev	= gart_iommu_detach_dev,
 	.add_device	= gart_iommu_add_device,
 	.remove_device	= gart_iommu_remove_device,
-	.device_group	= generic_device_group,
+	.device_group	= gart_iommu_device_group,
 	.map		= gart_iommu_map,
 	.map_sg		= default_iommu_map_sg,
 	.unmap		= gart_iommu_unmap,
 	.iova_to_phys	= gart_iommu_iova_to_phys,
 	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
+	.of_xlate	= gart_iommu_of_xlate,
 };
 
+static int gart_iommu_check_device(struct gart_device *gart,
+				   struct device *dev)
+{
+	if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &gart_iommu_ops)
+		return -ENODEV;
+
+	return 0;
+}
+
 static int tegra_gart_suspend(struct device *dev)
 {
 	struct gart_device *gart = dev_get_drvdata(dev);
@@ -462,6 +492,7 @@ static int tegra_gart_probe(struct platform_device *pdev)
 	}
 
 	iommu_device_set_ops(&gart->iommu, &gart_iommu_ops);
+	iommu_device_set_fwnode(&gart->iommu, dev->fwnode);
 
 	ret = iommu_device_register(&gart->iommu);
 	if (ret) {
-- 
2.17.0

* [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (5 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-11 11:12   ` Dmitry Osipenko
  2018-05-11 12:32   ` Robin Murphy
  2018-05-08 18:16 ` [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback Dmitry Osipenko
  2018-05-08 18:17 ` [PATCH v1 9/9] iommu/tegra: gart: Optimize mapping / unmapping performance Dmitry Osipenko
  8 siblings, 2 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

The GART aperture is shared by all devices, hence there is a single IOMMU
domain and group shared by these devices. Allocating a group per device only
wastes resources, and allowing more than one domain is simply wrong because
IOMMU mappings made by users of "different" domains would stomp on each
other.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
 1 file changed, 24 insertions(+), 83 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 5b2d27620350..ebc105c201bd 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -19,7 +19,6 @@
 
 #include <linux/io.h>
 #include <linux/iommu.h>
-#include <linux/list.h>
 #include <linux/module.h>
 #include <linux/of_device.h>
 #include <linux/slab.h>
@@ -44,22 +43,17 @@
 #define GART_PAGE_MASK						\
 	(~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
 
-struct gart_client {
-	struct device		*dev;
-	struct list_head	list;
-};
-
 struct gart_device {
 	void __iomem		*regs;
 	u32			*savedata;
 	u32			page_count;	/* total remappable size */
 	dma_addr_t		iovmm_base;	/* offset to vmm_area */
 	spinlock_t		pte_lock;	/* for pagetable */
-	struct list_head	client;
-	spinlock_t		client_lock;	/* for client list */
 	struct device		*dev;
 
 	struct iommu_device	iommu;		/* IOMMU Core handle */
+	struct iommu_group	*group;		/* Common IOMMU group */
+	struct gart_domain	*domain;	/* Unique IOMMU domain */
 
 	struct tegra_mc_gart_handle mc_gart_handle;
 };
@@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct gart_device *gart,
 static int gart_iommu_attach_dev(struct iommu_domain *domain,
 				 struct device *dev)
 {
-	struct gart_domain *gart_domain = to_gart_domain(domain);
-	struct gart_device *gart = gart_domain->gart;
-	struct gart_client *client, *c;
-	int err = 0;
-
-	client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
-	if (!client)
-		return -ENOMEM;
-	client->dev = dev;
-
-	spin_lock(&gart->client_lock);
-	list_for_each_entry(c, &gart->client, list) {
-		if (c->dev == dev) {
-			dev_err(gart->dev,
-				"%s is already attached\n", dev_name(dev));
-			err = -EINVAL;
-			goto fail;
-		}
-	}
-	list_add(&client->list, &gart->client);
-	spin_unlock(&gart->client_lock);
-	dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
 	return 0;
-
-fail:
-	devm_kfree(gart->dev, client);
-	spin_unlock(&gart->client_lock);
-	return err;
 }
 
 static void gart_iommu_detach_dev(struct iommu_domain *domain,
 				  struct device *dev)
 {
-	struct gart_domain *gart_domain = to_gart_domain(domain);
-	struct gart_device *gart = gart_domain->gart;
-	struct gart_client *c;
-
-	spin_lock(&gart->client_lock);
-
-	list_for_each_entry(c, &gart->client, list) {
-		if (c->dev == dev) {
-			list_del(&c->list);
-			devm_kfree(gart->dev, c);
-			dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
-			goto out;
-		}
-	}
-	dev_err(gart->dev, "Couldn't find\n");
-out:
-	spin_unlock(&gart->client_lock);
 }
 
 static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
 {
-	struct gart_domain *gart_domain;
-	struct gart_device *gart;
-
-	if (type != IOMMU_DOMAIN_UNMANAGED)
-		return NULL;
+	struct gart_device *gart = gart_handle;
 
-	gart = gart_handle;
-	if (!gart)
+	if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)
 		return NULL;
 
-	gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
-	if (!gart_domain)
-		return NULL;
-
-	gart_domain->gart = gart;
-	gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
-	gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
+	gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
+	if (gart->domain) {
+		gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
+		gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
 					gart->page_count * GART_PAGE_SIZE - 1;
-	gart_domain->domain.geometry.force_aperture = true;
+		gart->domain->domain.geometry.force_aperture = true;
+		gart->domain->gart = gart;
+	}
 
-	return &gart_domain->domain;
+	return &gart->domain->domain;
 }
 
 static void gart_iommu_domain_free(struct iommu_domain *domain)
@@ -251,18 +195,7 @@ static void gart_iommu_domain_free(struct iommu_domain *domain)
 	struct gart_domain *gart_domain = to_gart_domain(domain);
 	struct gart_device *gart = gart_domain->gart;
 
-	if (gart) {
-		spin_lock(&gart->client_lock);
-		if (!list_empty(&gart->client)) {
-			struct gart_client *c;
-
-			list_for_each_entry(c, &gart->client, list)
-				gart_iommu_detach_dev(domain, c->dev);
-		}
-		spin_unlock(&gart->client_lock);
-	}
-
-	kfree(gart_domain);
+	kfree(gart->domain);
 }
 
 static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -377,7 +310,7 @@ struct iommu_group *gart_iommu_device_group(struct device *dev)
 	if (err)
 		return ERR_PTR(err);
 
-	return generic_device_group(dev);
+	return gart_handle->group;
 }
 
 static int gart_iommu_of_xlate(struct device *dev,
@@ -502,8 +435,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
 
 	gart->dev = &pdev->dev;
 	spin_lock_init(&gart->pte_lock);
-	spin_lock_init(&gart->client_lock);
-	INIT_LIST_HEAD(&gart->client);
 	gart->regs = gart_regs;
 	gart->iovmm_base = (dma_addr_t)res_remap->start;
 	gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
@@ -517,6 +448,14 @@ static int tegra_gart_probe(struct platform_device *pdev)
 		goto iommu_unregister;
 	}
 
+	gart->group = iommu_group_alloc();
+	if (IS_ERR(gart->group)) {
+		ret = PTR_ERR(gart->group);
+		goto free_savedata;
+	}
+
+	iommu_group_ref_get(gart->group);
+
 	platform_set_drvdata(pdev, gart);
 	do_gart_setup(gart, NULL);
 
@@ -525,6 +464,8 @@ static int tegra_gart_probe(struct platform_device *pdev)
 
 	return 0;
 
+free_savedata:
+	vfree(gart->savedata);
 iommu_unregister:
 	iommu_device_unregister(&gart->iommu);
 remove_sysfs:
-- 
2.17.0

* [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (6 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices Dmitry Osipenko
@ 2018-05-08 18:16 ` Dmitry Osipenko
  2018-05-11 13:02   ` Robin Murphy
  2018-05-08 18:17 ` [PATCH v1 9/9] iommu/tegra: gart: Optimize mapping / unmapping performance Dmitry Osipenko
  8 siblings, 1 reply; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:16 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Introduce an iotlb_sync_map() callback that is invoked at the end of
iommu_map(). This new callback allows IOMMU drivers to avoid syncing after
mapping each contiguous chunk and instead sync only once the whole mapping
is completed, optimizing performance of the mapping operation.
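
As an illustration (hypothetical driver names, modelled on how patch 9 of
this series uses the callback), a driver can leave its .map()/.unmap()
callbacks writing page table entries without flushing and perform a single
flush from the new callback:

	static void foo_iommu_sync(struct iommu_domain *domain)
	{
		struct foo_domain *foo_domain = to_foo_domain(domain);

		/* one hardware flush per iommu_map() / iommu_unmap() call */
		foo_flush_pte_writes(foo_domain->foo);
	}

	static const struct iommu_ops foo_iommu_ops = {
		/* ... other callbacks ... */
		.map		= foo_iommu_map,	/* writes PTEs, no flush */
		.unmap		= foo_iommu_unmap,	/* clears PTEs, no flush */
		.iotlb_sync_map	= foo_iommu_sync,	/* new: runs at the end of iommu_map() */
		.iotlb_sync	= foo_iommu_sync,	/* existing: runs after unmapping */
	};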

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/iommu.c | 8 ++++++--
 include/linux/iommu.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d2aa23202bb9..39b2ee66aa96 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1508,13 +1508,14 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
+	const struct iommu_ops *ops = domain->ops;
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
 	size_t orig_size = size;
 	phys_addr_t orig_paddr = paddr;
 	int ret = 0;
 
-	if (unlikely(domain->ops->map == NULL ||
+	if (unlikely(ops->map == NULL ||
 		     domain->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -1543,7 +1544,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);
 
-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		ret = ops->map(domain, iova, paddr, pgsize, prot);
 		if (ret)
 			break;
 
@@ -1552,6 +1553,9 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		size -= pgsize;
 	}
 
+	if (ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
+
 	/* unroll mapping in case something went wrong */
 	if (ret)
 		iommu_unmap(domain, orig_iova, orig_size - size);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee6eb31..5224aa376377 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -206,6 +206,7 @@ struct iommu_ops {
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
 	void (*iotlb_range_add)(struct iommu_domain *domain,
 				unsigned long iova, size_t size);
+	void (*iotlb_sync_map)(struct iommu_domain *domain);
 	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
-- 
2.17.0

* [PATCH v1 9/9] iommu/tegra: gart: Optimize mapping / unmapping performance
  2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
                   ` (7 preceding siblings ...)
  2018-05-08 18:16 ` [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback Dmitry Osipenko
@ 2018-05-08 18:17 ` Dmitry Osipenko
  8 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-08 18:17 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Currently the GART driver writes one page entry at a time and flushes the
bus write buffer after each update. It is more optimal to aggregate the
writes and flush the bus buffer once at the end; this gives map/unmap a
10-40% performance boost (depending on the size of the mapping) compared to
flushing after each entry update.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/iommu/tegra-gart.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index ebc105c201bd..26d8735d26e8 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -226,7 +226,6 @@ static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
 		}
 	}
 	gart_set_pte(gart, iova, GART_PTE(pfn));
-	FLUSH_GART_REGS(gart);
 	spin_unlock_irqrestore(&gart->pte_lock, flags);
 	return 0;
 }
@@ -243,7 +242,6 @@ static size_t gart_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 
 	spin_lock_irqsave(&gart->pte_lock, flags);
 	gart_set_pte(gart, iova, 0);
-	FLUSH_GART_REGS(gart);
 	spin_unlock_irqrestore(&gart->pte_lock, flags);
 	return bytes;
 }
@@ -319,6 +317,14 @@ static int gart_iommu_of_xlate(struct device *dev,
 	return 0;
 }
 
+static void gart_iommu_sync(struct iommu_domain *domain)
+{
+	struct gart_domain *gart_domain = to_gart_domain(domain);
+	struct gart_device *gart = gart_domain->gart;
+
+	FLUSH_GART_REGS(gart);
+}
+
 static const struct iommu_ops gart_iommu_ops = {
 	.capable	= gart_iommu_capable,
 	.domain_alloc	= gart_iommu_domain_alloc,
@@ -334,6 +340,8 @@ static const struct iommu_ops gart_iommu_ops = {
 	.iova_to_phys	= gart_iommu_iova_to_phys,
 	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
 	.of_xlate	= gart_iommu_of_xlate,
+	.iotlb_sync_map	= gart_iommu_sync,
+	.iotlb_sync	= gart_iommu_sync,
 };
 
 static int gart_iommu_check_device(struct gart_device *gart,
-- 
2.17.0

* Re: [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-08 18:16 ` [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices Dmitry Osipenko
@ 2018-05-11 11:12   ` Dmitry Osipenko
  2018-05-11 12:32   ` Robin Murphy
  1 sibling, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-11 11:12 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 08.05.2018 21:16, Dmitry Osipenko wrote:
> GART aperture is shared by all devices, hence there is a single IOMMU
> domain and group shared by these devices. Allocation of a group per
> device only wastes resources and allowance of having more than one domain
> is simply wrong because IOMMU mappings made by the users of "different"
> domains will stomp on each other.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
>  1 file changed, 24 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index 5b2d27620350..ebc105c201bd 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -19,7 +19,6 @@
>  
>  #include <linux/io.h>
>  #include <linux/iommu.h>
> -#include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of_device.h>
>  #include <linux/slab.h>
> @@ -44,22 +43,17 @@
>  #define GART_PAGE_MASK						\
>  	(~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
>  
> -struct gart_client {
> -	struct device		*dev;
> -	struct list_head	list;
> -};
> -
>  struct gart_device {
>  	void __iomem		*regs;
>  	u32			*savedata;
>  	u32			page_count;	/* total remappable size */
>  	dma_addr_t		iovmm_base;	/* offset to vmm_area */
>  	spinlock_t		pte_lock;	/* for pagetable */
> -	struct list_head	client;
> -	spinlock_t		client_lock;	/* for client list */
>  	struct device		*dev;
>  
>  	struct iommu_device	iommu;		/* IOMMU Core handle */
> +	struct iommu_group	*group;		/* Common IOMMU group */
> +	struct gart_domain	*domain;	/* Unique IOMMU domain */
>  
>  	struct tegra_mc_gart_handle mc_gart_handle;
>  };
> @@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct gart_device *gart,
>  static int gart_iommu_attach_dev(struct iommu_domain *domain,
>  				 struct device *dev)
>  {
> -	struct gart_domain *gart_domain = to_gart_domain(domain);
> -	struct gart_device *gart = gart_domain->gart;
> -	struct gart_client *client, *c;
> -	int err = 0;
> -
> -	client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
> -	if (!client)
> -		return -ENOMEM;
> -	client->dev = dev;
> -
> -	spin_lock(&gart->client_lock);
> -	list_for_each_entry(c, &gart->client, list) {
> -		if (c->dev == dev) {
> -			dev_err(gart->dev,
> -				"%s is already attached\n", dev_name(dev));
> -			err = -EINVAL;
> -			goto fail;
> -		}
> -	}
> -	list_add(&client->list, &gart->client);
> -	spin_unlock(&gart->client_lock);
> -	dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
>  	return 0;
> -
> -fail:
> -	devm_kfree(gart->dev, client);
> -	spin_unlock(&gart->client_lock);
> -	return err;
>  }
>  
>  static void gart_iommu_detach_dev(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> -	struct gart_domain *gart_domain = to_gart_domain(domain);
> -	struct gart_device *gart = gart_domain->gart;
> -	struct gart_client *c;
> -
> -	spin_lock(&gart->client_lock);
> -
> -	list_for_each_entry(c, &gart->client, list) {
> -		if (c->dev == dev) {
> -			list_del(&c->list);
> -			devm_kfree(gart->dev, c);
> -			dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
> -			goto out;
> -		}
> -	}
> -	dev_err(gart->dev, "Couldn't find\n");
> -out:
> -	spin_unlock(&gart->client_lock);
>  }
>  
>  static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>  {
> -	struct gart_domain *gart_domain;
> -	struct gart_device *gart;
> -
> -	if (type != IOMMU_DOMAIN_UNMANAGED)
> -		return NULL;
> +	struct gart_device *gart = gart_handle;
>  
> -	gart = gart_handle;
> -	if (!gart)
> +	if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)
>  		return NULL;
>  
> -	gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
> -	if (!gart_domain)
> -		return NULL;
> -
> -	gart_domain->gart = gart;
> -	gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
> -	gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
> +	gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
> +	if (gart->domain) {
> +		gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
> +		gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
>  					gart->page_count * GART_PAGE_SIZE - 1;
> -	gart_domain->domain.geometry.force_aperture = true;
> +		gart->domain->domain.geometry.force_aperture = true;
> +		gart->domain->gart = gart;
> +	}
>  
> -	return &gart_domain->domain;
> +	return &gart->domain->domain;
>  }

I've missed a NULL check and locking here; this will be fixed in v2. For now
I'll wait for review comments (please review).

* Re: [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT
  2018-05-08 18:16 ` [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT Dmitry Osipenko
@ 2018-05-11 11:34   ` Robin Murphy
  2018-05-11 15:34     ` Dmitry Osipenko
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2018-05-11 11:34 UTC (permalink / raw)
  To: Dmitry Osipenko, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Hi Dmitry,

On 08/05/18 19:16, Dmitry Osipenko wrote:
> GART can't handle all devices, ignore devices that aren't related to GART.
> Device tree must explicitly assign GART IOMMU to the devices.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>   drivers/iommu/tegra-gart.c | 33 ++++++++++++++++++++++++++++++++-
>   1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index 39305224c48d..5b2d27620350 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -366,6 +366,26 @@ static void gart_iommu_remove_device(struct device *dev)
>   	iommu_device_unlink(&gart_handle->iommu, dev);
>   }
>   
> +static int gart_iommu_check_device(struct gart_device *gart,
> +				   struct device *dev);
> +
> +struct iommu_group *gart_iommu_device_group(struct device *dev)
> +{
> +	int err;
> +
> +	err = gart_iommu_check_device(gart_handle, dev);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	return generic_device_group(dev);
> +}
> +
> +static int gart_iommu_of_xlate(struct device *dev,
> +			       struct of_phandle_args *args)
> +{
> +	return 0;
> +}
> +
>   static const struct iommu_ops gart_iommu_ops = {
>   	.capable	= gart_iommu_capable,
>   	.domain_alloc	= gart_iommu_domain_alloc,
> @@ -374,14 +394,24 @@ static const struct iommu_ops gart_iommu_ops = {
>   	.detach_dev	= gart_iommu_detach_dev,
>   	.add_device	= gart_iommu_add_device,
>   	.remove_device	= gart_iommu_remove_device,
> -	.device_group	= generic_device_group,
> +	.device_group	= gart_iommu_device_group,
>   	.map		= gart_iommu_map,
>   	.map_sg		= default_iommu_map_sg,
>   	.unmap		= gart_iommu_unmap,
>   	.iova_to_phys	= gart_iommu_iova_to_phys,
>   	.pgsize_bitmap	= GART_IOMMU_PGSIZES,
> +	.of_xlate	= gart_iommu_of_xlate,
>   };
>   
> +static int gart_iommu_check_device(struct gart_device *gart,
> +				   struct device *dev)
> +{
> +	if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &gart_iommu_ops)
> +		return -ENODEV;

Conceptually, it would be better to verify this in .add_device *before* 
calling iommu_group_get_for_dev(). That would also align with what other 
drivers do, and let you save introducing the .device_group callback 
until the real implementation in the later patch.

Robin.

> +
> +	return 0;
> +}
> +
>   static int tegra_gart_suspend(struct device *dev)
>   {
>   	struct gart_device *gart = dev_get_drvdata(dev);
> @@ -462,6 +492,7 @@ static int tegra_gart_probe(struct platform_device *pdev)
>   	}
>   
>   	iommu_device_set_ops(&gart->iommu, &gart_iommu_ops);
> +	iommu_device_set_fwnode(&gart->iommu, dev->fwnode);
>   
>   	ret = iommu_device_register(&gart->iommu);
>   	if (ret) {
> 

* Re: [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-08 18:16 ` [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices Dmitry Osipenko
  2018-05-11 11:12   ` Dmitry Osipenko
@ 2018-05-11 12:32   ` Robin Murphy
  2018-05-11 20:05     ` Dmitry Osipenko
  1 sibling, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2018-05-11 12:32 UTC (permalink / raw)
  To: Dmitry Osipenko, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 08/05/18 19:16, Dmitry Osipenko wrote:
> GART aperture is shared by all devices, hence there is a single IOMMU
> domain and group shared by these devices. Allocation of a group per
> device only wastes resources and allowance of having more than one domain
> is simply wrong because IOMMU mappings made by the users of "different"
> domains will stomp on each other.

Strictly, that reasoning is a bit backwards - allocating multiple groups 
is the conceptually-wrong thing if the GART cannot differentiate between 
different devices, whereas having multiple domains *exist* is no real 
problem, it's merely that only one can be active at any point in time 
(which will inherently become the case once all devices are grouped 
together).

> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>   drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
>   1 file changed, 24 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index 5b2d27620350..ebc105c201bd 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -19,7 +19,6 @@
>   
>   #include <linux/io.h>
>   #include <linux/iommu.h>
> -#include <linux/list.h>
>   #include <linux/module.h>
>   #include <linux/of_device.h>
>   #include <linux/slab.h>
> @@ -44,22 +43,17 @@
>   #define GART_PAGE_MASK						\
>   	(~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
>   
> -struct gart_client {
> -	struct device		*dev;
> -	struct list_head	list;
> -};
> -
>   struct gart_device {
>   	void __iomem		*regs;
>   	u32			*savedata;
>   	u32			page_count;	/* total remappable size */
>   	dma_addr_t		iovmm_base;	/* offset to vmm_area */
>   	spinlock_t		pte_lock;	/* for pagetable */
> -	struct list_head	client;
> -	spinlock_t		client_lock;	/* for client list */
>   	struct device		*dev;
>   
>   	struct iommu_device	iommu;		/* IOMMU Core handle */
> +	struct iommu_group	*group;		/* Common IOMMU group */
> +	struct gart_domain	*domain;	/* Unique IOMMU domain */
>   
>   	struct tegra_mc_gart_handle mc_gart_handle;
>   };
> @@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct gart_device *gart,
>   static int gart_iommu_attach_dev(struct iommu_domain *domain,
>   				 struct device *dev)
>   {
> -	struct gart_domain *gart_domain = to_gart_domain(domain);
> -	struct gart_device *gart = gart_domain->gart;
> -	struct gart_client *client, *c;
> -	int err = 0;
> -
> -	client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
> -	if (!client)
> -		return -ENOMEM;
> -	client->dev = dev;
> -
> -	spin_lock(&gart->client_lock);
> -	list_for_each_entry(c, &gart->client, list) {
> -		if (c->dev == dev) {
> -			dev_err(gart->dev,
> -				"%s is already attached\n", dev_name(dev));
> -			err = -EINVAL;
> -			goto fail;
> -		}
> -	}
> -	list_add(&client->list, &gart->client);
> -	spin_unlock(&gart->client_lock);
> -	dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
>   	return 0;
> -
> -fail:
> -	devm_kfree(gart->dev, client);
> -	spin_unlock(&gart->client_lock);
> -	return err;
>   }
>   
>   static void gart_iommu_detach_dev(struct iommu_domain *domain,
>   				  struct device *dev)
>   {
> -	struct gart_domain *gart_domain = to_gart_domain(domain);
> -	struct gart_device *gart = gart_domain->gart;
> -	struct gart_client *c;
> -
> -	spin_lock(&gart->client_lock);
> -
> -	list_for_each_entry(c, &gart->client, list) {
> -		if (c->dev == dev) {
> -			list_del(&c->list);
> -			devm_kfree(gart->dev, c);
> -			dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
> -			goto out;
> -		}
> -	}
> -	dev_err(gart->dev, "Couldn't find\n");
> -out:
> -	spin_unlock(&gart->client_lock);
>   }

The .detach_dev callback is optional in the core API now, so you can 
just remove the whole thing.

>   static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>   {
> -	struct gart_domain *gart_domain;
> -	struct gart_device *gart;
> -
> -	if (type != IOMMU_DOMAIN_UNMANAGED)
> -		return NULL;
> +	struct gart_device *gart = gart_handle;
>   
> -	gart = gart_handle;
> -	if (!gart)
> +	if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)

Singleton domains are a little unpleasant given the way the IOMMU API 
expects things to work, but it looks fairly simple to avoid needing that 
at all. AFAICS you could move gart->savedata to something like 
gart_domain->ptes and keep it up-to-date in .map/.unmap, then in 
.attach_dev you just need to do something like:

	if (gart_domain != gart->domain) {
		do_gart_setup(gart, gart_domain->ptes);
		gart->domain = gart_domain;
	}

to context-switch the hardware state when moving the group from one 
domain to another (and as a bonus you would no longer need to do 
anything for suspend, since resume can just look at the current domain 
too). If in practice there's only ever one domain allocated anyway, then 
there's no difference in memory overhead, but you still have the benefit 
of the driver being more consistent with others and allowing that 
flexibility if anyone ever did want to play with it.
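
A rough sketch of how .map() might maintain such a shadow (hypothetical, with 
made-up field names, not code from the series or from this review):

	static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
				  phys_addr_t pa, size_t bytes, int prot)
	{
		struct gart_domain *gart_domain = to_gart_domain(domain);
		struct gart_device *gart = gart_domain->gart;
		unsigned long pte = GART_PTE(__phys_to_pfn(pa));
		unsigned long flags;

		spin_lock_irqsave(&gart->pte_lock, flags);
		gart_set_pte(gart, iova, pte);
		/* shadow used by .attach_dev / resume to reprogram the GART */
		gart_domain->ptes[(iova - gart->iovmm_base) >> GART_PAGE_SHIFT] = pte;
		spin_unlock_irqrestore(&gart->pte_lock, flags);

		return 0;
	}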

>   		return NULL;
>   
> -	gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
> -	if (!gart_domain)
> -		return NULL;
> -
> -	gart_domain->gart = gart;
> -	gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
> -	gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
> +	gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
> +	if (gart->domain) {
> +		gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
> +		gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
>   					gart->page_count * GART_PAGE_SIZE - 1;
> -	gart_domain->domain.geometry.force_aperture = true;
> +		gart->domain->domain.geometry.force_aperture = true;
> +		gart->domain->gart = gart;
> +	}
>   
> -	return &gart_domain->domain;
> +	return &gart->domain->domain;
>   }
>   
>   static void gart_iommu_domain_free(struct iommu_domain *domain)
> @@ -251,18 +195,7 @@ static void gart_iommu_domain_free(struct iommu_domain *domain)
>   	struct gart_domain *gart_domain = to_gart_domain(domain);
>   	struct gart_device *gart = gart_domain->gart;
>   
> -	if (gart) {
> -		spin_lock(&gart->client_lock);
> -		if (!list_empty(&gart->client)) {
> -			struct gart_client *c;
> -
> -			list_for_each_entry(c, &gart->client, list)
> -				gart_iommu_detach_dev(domain, c->dev);
> -		}
> -		spin_unlock(&gart->client_lock);
> -	}
> -
> -	kfree(gart_domain);
> +	kfree(gart->domain);
>   }
>   
>   static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -377,7 +310,7 @@ struct iommu_group *gart_iommu_device_group(struct device *dev)
>   	if (err)
>   		return ERR_PTR(err);
>   
> -	return generic_device_group(dev);
> +	return gart_handle->group;

You should take a reference per device, i.e.:

	return iommu_group_ref_get(gart_handle->group);

otherwise removing devices could unbalance things and result in the 
group getting freed prematurely.

>   }
>   
>   static int gart_iommu_of_xlate(struct device *dev,
> @@ -502,8 +435,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
>   
>   	gart->dev = &pdev->dev;
>   	spin_lock_init(&gart->pte_lock);
> -	spin_lock_init(&gart->client_lock);
> -	INIT_LIST_HEAD(&gart->client);
>   	gart->regs = gart_regs;
>   	gart->iovmm_base = (dma_addr_t)res_remap->start;
>   	gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
> @@ -517,6 +448,14 @@ static int tegra_gart_probe(struct platform_device *pdev)
>   		goto iommu_unregister;
>   	}
>   
> +	gart->group = iommu_group_alloc();
> +	if (IS_ERR(gart->group)) {
> +		ret = PTR_ERR(gart->group);
> +		goto free_savedata;
> +	}
> +
> +	iommu_group_ref_get(gart->group);

You already hold the initial reference from iommu_group_alloc(), so 
there's no need to take a second one at this point.

Robin.

> +
>   	platform_set_drvdata(pdev, gart);
>   	do_gart_setup(gart, NULL);
>   
> @@ -525,6 +464,8 @@ static int tegra_gart_probe(struct platform_device *pdev)
>   
>   	return 0;
>   
> +free_savedata:
> +	vfree(gart->savedata);
>   iommu_unregister:
>   	iommu_device_unregister(&gart->iommu);
>   remove_sysfs:
> 

* Re: [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback
  2018-05-08 18:16 ` [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback Dmitry Osipenko
@ 2018-05-11 13:02   ` Robin Murphy
  2018-05-11 19:58     ` Dmitry Osipenko
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2018-05-11 13:02 UTC (permalink / raw)
  To: Dmitry Osipenko, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 08/05/18 19:16, Dmitry Osipenko wrote:
> Introduce iotlb_sync_map() callback that is invoked in the end of
> iommu_map(). This new callback allows IOMMU drivers to avoid syncing
> on mapping of each contiguous chunk and sync only when whole mapping
> is completed, optimizing performance of the mapping operation.

This looks like a reasonable compromise - for users of 
IO_PGTABLE_QUIRK_TLBI_ON_MAP we can leave the implicit add_flush() in 
place for each individual map call, but at least move the sync() out to 
this callback, which should still be beneficial overall.

Modulo one possible nit below,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>   drivers/iommu/iommu.c | 8 ++++++--
>   include/linux/iommu.h | 1 +
>   2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d2aa23202bb9..39b2ee66aa96 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1508,13 +1508,14 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
>   int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   	      phys_addr_t paddr, size_t size, int prot)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	unsigned long orig_iova = iova;
>   	unsigned int min_pagesz;
>   	size_t orig_size = size;
>   	phys_addr_t orig_paddr = paddr;
>   	int ret = 0;
>   
> -	if (unlikely(domain->ops->map == NULL ||
> +	if (unlikely(ops->map == NULL ||
>   		     domain->pgsize_bitmap == 0UL))
>   		return -ENODEV;
>   
> @@ -1543,7 +1544,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
>   			 iova, &paddr, pgsize);
>   
> -		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
> +		ret = ops->map(domain, iova, paddr, pgsize, prot);
>   		if (ret)
>   			break;
>   
> @@ -1552,6 +1553,9 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   		size -= pgsize;
>   	}
>   
> +	if (ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);

Is it worth trying to skip this in the error case when we're just going 
to undo it immediately?

Robin.

> +
>   	/* unroll mapping in case something went wrong */
>   	if (ret)
>   		iommu_unmap(domain, orig_iova, orig_size - size);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee6eb31..5224aa376377 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -206,6 +206,7 @@ struct iommu_ops {
>   	void (*flush_iotlb_all)(struct iommu_domain *domain);
>   	void (*iotlb_range_add)(struct iommu_domain *domain,
>   				unsigned long iova, size_t size);
> +	void (*iotlb_sync_map)(struct iommu_domain *domain);
>   	void (*iotlb_sync)(struct iommu_domain *domain);
>   	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
>   	int (*add_device)(struct device *dev);
> 

* Re: [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT
  2018-05-11 11:34   ` Robin Murphy
@ 2018-05-11 15:34     ` Dmitry Osipenko
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-11 15:34 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

Hi Robin,

On 11.05.2018 14:34, Robin Murphy wrote:
> Hi Dmitry,
> 
> On 08/05/18 19:16, Dmitry Osipenko wrote:
>> GART can't handle all devices, ignore devices that aren't related to GART.
>> Device tree must explicitly assign GART IOMMU to the devices.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>   drivers/iommu/tegra-gart.c | 33 ++++++++++++++++++++++++++++++++-
>>   1 file changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>> index 39305224c48d..5b2d27620350 100644
>> --- a/drivers/iommu/tegra-gart.c
>> +++ b/drivers/iommu/tegra-gart.c
>> @@ -366,6 +366,26 @@ static void gart_iommu_remove_device(struct device *dev)
>>       iommu_device_unlink(&gart_handle->iommu, dev);
>>   }
>>   +static int gart_iommu_check_device(struct gart_device *gart,
>> +                   struct device *dev);
>> +
>> +struct iommu_group *gart_iommu_device_group(struct device *dev)
>> +{
>> +    int err;
>> +
>> +    err = gart_iommu_check_device(gart_handle, dev);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    return generic_device_group(dev);
>> +}
>> +
>> +static int gart_iommu_of_xlate(struct device *dev,
>> +                   struct of_phandle_args *args)
>> +{
>> +    return 0;
>> +}
>> +
>>   static const struct iommu_ops gart_iommu_ops = {
>>       .capable    = gart_iommu_capable,
>>       .domain_alloc    = gart_iommu_domain_alloc,
>> @@ -374,14 +394,24 @@ static const struct iommu_ops gart_iommu_ops = {
>>       .detach_dev    = gart_iommu_detach_dev,
>>       .add_device    = gart_iommu_add_device,
>>       .remove_device    = gart_iommu_remove_device,
>> -    .device_group    = generic_device_group,
>> +    .device_group    = gart_iommu_device_group,
>>       .map        = gart_iommu_map,
>>       .map_sg        = default_iommu_map_sg,
>>       .unmap        = gart_iommu_unmap,
>>       .iova_to_phys    = gart_iommu_iova_to_phys,
>>       .pgsize_bitmap    = GART_IOMMU_PGSIZES,
>> +    .of_xlate    = gart_iommu_of_xlate,
>>   };
>>   +static int gart_iommu_check_device(struct gart_device *gart,
>> +                   struct device *dev)
>> +{
>> +    if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &gart_iommu_ops)
>> +        return -ENODEV;
> 
> Conceptually, it would be better to verify this in .add_device *before* calling
> iommu_group_get_for_dev(). That would also align with what other drivers do, and
> let you save introducing the .device_group callback until the real
> implementation in the later patch.

Indeed, thank you for the suggestion.
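
For illustration, a hypothetical version of that rearrangement (not part of 
the posted series) could look roughly like:

	static int gart_iommu_add_device(struct device *dev)
	{
		struct iommu_group *group;

		/* reject devices that aren't wired up to the GART in DT */
		if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &gart_iommu_ops)
			return -ENODEV;

		group = iommu_group_get_for_dev(dev);
		if (IS_ERR(group))
			return PTR_ERR(group);

		iommu_group_put(group);
		iommu_device_link(&gart_handle->iommu, dev);

		return 0;
	}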

* [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback
  2018-05-11 13:02   ` Robin Murphy
@ 2018-05-11 19:58     ` Dmitry Osipenko
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-11 19:58 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 11.05.2018 16:02, Robin Murphy wrote:
> On 08/05/18 19:16, Dmitry Osipenko wrote:
>> Introduce iotlb_sync_map() callback that is invoked in the end of
>> iommu_map(). This new callback allows IOMMU drivers to avoid syncing
>> on mapping of each contiguous chunk and sync only when whole mapping
>> is completed, optimizing performance of the mapping operation.
> 
> This looks like a reasonable compromise - for users of
> IO_PGTABLE_QUIRK_TLBI_ON_MAP we can leave the implicit add_flush() in place for
> each individual map call, but at least move the sync() out to this callback,
> which should still be beneficial overall.
> 
> Modulo one possible nit below,
> 
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> 
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>   drivers/iommu/iommu.c | 8 ++++++--
>>   include/linux/iommu.h | 1 +
>>   2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d2aa23202bb9..39b2ee66aa96 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1508,13 +1508,14 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
>>   int iommu_map(struct iommu_domain *domain, unsigned long iova,
>>             phys_addr_t paddr, size_t size, int prot)
>>   {
>> +    const struct iommu_ops *ops = domain->ops;
>>       unsigned long orig_iova = iova;
>>       unsigned int min_pagesz;
>>       size_t orig_size = size;
>>       phys_addr_t orig_paddr = paddr;
>>       int ret = 0;
>>   -    if (unlikely(domain->ops->map == NULL ||
>> +    if (unlikely(ops->map == NULL ||
>>                domain->pgsize_bitmap == 0UL))
>>           return -ENODEV;
>>   @@ -1543,7 +1544,7 @@ int iommu_map(struct iommu_domain *domain, unsigned
>> long iova,
>>           pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
>>                iova, &paddr, pgsize);
>>   -        ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
>> +        ret = ops->map(domain, iova, paddr, pgsize, prot);
>>           if (ret)
>>               break;
>>   @@ -1552,6 +1553,9 @@ int iommu_map(struct iommu_domain *domain, unsigned
>> long iova,
>>           size -= pgsize;
>>       }
>>   +    if (ops->iotlb_sync_map)
>> +        ops->iotlb_sync_map(domain);
> 
> Is it worth trying to skip this in the error case when we're just going to undo
> it immediately?

I can imagine an IOMMU driver that handles syncing for mappings differently
from unmappings; in that case it may not be worth skipping sync_map in the error
case.
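
For illustration, a driver-side hook could look roughly like this - a sketch
based on the existing tegra-gart helpers (to_gart_domain(), FLUSH_GART_REGS()),
not the actual patch 9/9:

/* .map no longer flushes per page; flush once after iommu_map() finishes. */
static void gart_iommu_sync_map(struct iommu_domain *domain)
{
        struct gart_domain *gart_domain = to_gart_domain(domain);

        FLUSH_GART_REGS(gart_domain->gart);
}

with .iotlb_sync_map = gart_iommu_sync_map added to gart_iommu_ops.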

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-11 12:32   ` Robin Murphy
@ 2018-05-11 20:05     ` Dmitry Osipenko
  2018-05-14 18:18       ` Robin Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-11 20:05 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 11.05.2018 15:32, Robin Murphy wrote:
> On 08/05/18 19:16, Dmitry Osipenko wrote:
>> GART aperture is shared by all devices, hence there is a single IOMMU
>> domain and group shared by these devices. Allocation of a group per
>> device only wastes resources, and allowing more than one domain
>> is simply wrong because IOMMU mappings made by the users of "different"
>> domains will stomp on each other.
> 
> Strictly, that reasoning is a bit backwards - allocating multiple groups is the
> conceptually-wrong thing if the GART cannot differentiate between different
> devices, whereas having multiple domains *exist* is no real problem, it's merely
> that only one can be active at any point in time (which will inherently become
> the case once all devices are grouped together).

IIUC, the IOMMU domain represents the address space. There is only one address
space in the case of GART: the GART's aperture. So the GART doesn't
differentiate between different devices, nor between different domains.

>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>   drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
>>   1 file changed, 24 insertions(+), 83 deletions(-)
>>
>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>> index 5b2d27620350..ebc105c201bd 100644
>> --- a/drivers/iommu/tegra-gart.c
>> +++ b/drivers/iommu/tegra-gart.c
>> @@ -19,7 +19,6 @@
>>     #include <linux/io.h>
>>   #include <linux/iommu.h>
>> -#include <linux/list.h>
>>   #include <linux/module.h>
>>   #include <linux/of_device.h>
>>   #include <linux/slab.h>
>> @@ -44,22 +43,17 @@
>>   #define GART_PAGE_MASK                        \
>>       (~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
>>   -struct gart_client {
>> -    struct device        *dev;
>> -    struct list_head    list;
>> -};
>> -
>>   struct gart_device {
>>       void __iomem        *regs;
>>       u32            *savedata;
>>       u32            page_count;    /* total remappable size */
>>       dma_addr_t        iovmm_base;    /* offset to vmm_area */
>>       spinlock_t        pte_lock;    /* for pagetable */
>> -    struct list_head    client;
>> -    spinlock_t        client_lock;    /* for client list */
>>       struct device        *dev;
>>         struct iommu_device    iommu;        /* IOMMU Core handle */
>> +    struct iommu_group    *group;        /* Common IOMMU group */
>> +    struct gart_domain    *domain;    /* Unique IOMMU domain */
>>         struct tegra_mc_gart_handle mc_gart_handle;
>>   };
>> @@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct
>> gart_device *gart,
>>   static int gart_iommu_attach_dev(struct iommu_domain *domain,
>>                    struct device *dev)
>>   {
>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>> -    struct gart_device *gart = gart_domain->gart;
>> -    struct gart_client *client, *c;
>> -    int err = 0;
>> -
>> -    client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
>> -    if (!client)
>> -        return -ENOMEM;
>> -    client->dev = dev;
>> -
>> -    spin_lock(&gart->client_lock);
>> -    list_for_each_entry(c, &gart->client, list) {
>> -        if (c->dev == dev) {
>> -            dev_err(gart->dev,
>> -                "%s is already attached\n", dev_name(dev));
>> -            err = -EINVAL;
>> -            goto fail;
>> -        }
>> -    }
>> -    list_add(&client->list, &gart->client);
>> -    spin_unlock(&gart->client_lock);
>> -    dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
>>       return 0;
>> -
>> -fail:
>> -    devm_kfree(gart->dev, client);
>> -    spin_unlock(&gart->client_lock);
>> -    return err;
>>   }
>>     static void gart_iommu_detach_dev(struct iommu_domain *domain,
>>                     struct device *dev)
>>   {
>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>> -    struct gart_device *gart = gart_domain->gart;
>> -    struct gart_client *c;
>> -
>> -    spin_lock(&gart->client_lock);
>> -
>> -    list_for_each_entry(c, &gart->client, list) {
>> -        if (c->dev == dev) {
>> -            list_del(&c->list);
>> -            devm_kfree(gart->dev, c);
>> -            dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
>> -            goto out;
>> -        }
>> -    }
>> -    dev_err(gart->dev, "Couldn't find\n");
>> -out:
>> -    spin_unlock(&gart->client_lock);
>>   }
> 
> The .detach_dev callback is optional in the core API now, so you can just remove
> the whole thing.

Good catch, thanks!

> 
>>   static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>>   {
>> -    struct gart_domain *gart_domain;
>> -    struct gart_device *gart;
>> -
>> -    if (type != IOMMU_DOMAIN_UNMANAGED)
>> -        return NULL;
>> +    struct gart_device *gart = gart_handle;
>>   -    gart = gart_handle;
>> -    if (!gart)
>> +    if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)
> 
> Singleton domains are a little unpleasant given the way the IOMMU API expects
> things to work, but it looks fairly simple to avoid needing that at all. AFAICS
> you could move gart->savedata to something like gart_domain->ptes and keep it
> up-to-date in .map/.unmap, then in .attach_dev you just need to do something like:
> 
>     if (gart_domain != gart->domain) {
>         do_gart_setup(gart, gart_domain->ptes);
>         gart->domain = gart_domain;
>     }
> 
> to context-switch the hardware state when moving the group from one domain to
> another (and as a bonus you would no longer need to do anything for suspend,
> since resume can just look at the current domain too). If in practice there's
> only ever one domain allocated anyway, then there's no difference in memory
> overhead, but you still have the benefit of the driver being more consistent
> with others and allowing that flexibility if anyone ever did want to play with it.

For starters we'll have a single domain solely used by the GPU with all its
sub-devices. Context switching will be handled by the Tegra DRM driver. Later
we may consider introducing IOMMU support for the video decoder, at least to
provide memory isolation for the buffers that the decoder writes to.

Cross-driver context switching isn't that straightforward, and I think the
Tegra-GART driver shouldn't take care of context switching in any form, only
performing mapping / unmapping operations. There are a couple of variants of
how to deal with the context switching:

1. A simple solution could be to logically split the GART's aperture space into
different domains, but the GART's aperture won't be utilized efficiently with
this approach, wasting quite a lot of IOVA space.

2. In order to utilize the aperture more efficiently, we are going to make the
DRM driver cache IOMMU mappings, such that a graphics buffer is moved to a
cache-eviction list on unmapping and actually unmapped only when that buffer
isn't in use and there is no IOVA space left for another buffer, or on the
buffer's destruction. We'll use DRM's MM scanning helper for that [0][1]. Maybe
we could share access to that MM helper with the video decoder somehow. The
IOMMU API doesn't seem tailored for such a use-case, so having a custom
platform-specific API on top of the IOMMU API would probably be fine, and with
that we could have cross-device/driver context switching handled by the custom
API.

Please let me know if you have any other variants to suggest.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/include/drm/drm_mm.h
[1]
https://github.com/grate-driver/linux/commit/16e017efaa343e23e5a7d2d498915764cc806054

> 
>>           return NULL;
>>   -    gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
>> -    if (!gart_domain)
>> -        return NULL;
>> -
>> -    gart_domain->gart = gart;
>> -    gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
>> -    gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
>> +    gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
>> +    if (gart->domain) {
>> +        gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
>> +        gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
>>                       gart->page_count * GART_PAGE_SIZE - 1;
>> -    gart_domain->domain.geometry.force_aperture = true;
>> +        gart->domain->domain.geometry.force_aperture = true;
>> +        gart->domain->gart = gart;
>> +    }
>>   -    return &gart_domain->domain;
>> +    return &gart->domain->domain;
>>   }
>>     static void gart_iommu_domain_free(struct iommu_domain *domain)
>> @@ -251,18 +195,7 @@ static void gart_iommu_domain_free(struct iommu_domain
>> *domain)
>>       struct gart_domain *gart_domain = to_gart_domain(domain);
>>       struct gart_device *gart = gart_domain->gart;
>>   -    if (gart) {
>> -        spin_lock(&gart->client_lock);
>> -        if (!list_empty(&gart->client)) {
>> -            struct gart_client *c;
>> -
>> -            list_for_each_entry(c, &gart->client, list)
>> -                gart_iommu_detach_dev(domain, c->dev);
>> -        }
>> -        spin_unlock(&gart->client_lock);
>> -    }
>> -
>> -    kfree(gart_domain);
>> +    kfree(gart->domain);
>>   }
>>     static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -377,7 +310,7 @@ struct iommu_group *gart_iommu_device_group(struct device
>> *dev)
>>       if (err)
>>           return ERR_PTR(err);
>>   -    return generic_device_group(dev);
>> +    return gart_handle->group;
> 
> You should take a reference per device, i.e.:
> 
>     return iommu_group_ref_get(gart_handle->group);
> 
> otherwise removing devices could unbalance things and result in the group
> getting freed prematurely.

It seems it would be more correct to remove iommu_group_put() from
gart_iommu_add_device().

> 
>>   }
>>     static int gart_iommu_of_xlate(struct device *dev,
>> @@ -502,8 +435,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>         gart->dev = &pdev->dev;
>>       spin_lock_init(&gart->pte_lock);
>> -    spin_lock_init(&gart->client_lock);
>> -    INIT_LIST_HEAD(&gart->client);
>>       gart->regs = gart_regs;
>>       gart->iovmm_base = (dma_addr_t)res_remap->start;
>>       gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
>> @@ -517,6 +448,14 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>           goto iommu_unregister;
>>       }
>>   +    gart->group = iommu_group_alloc();
>> +    if (IS_ERR(gart->group)) {
>> +        ret = PTR_ERR(gart->group);
>> +        goto free_savedata;
>> +    }
>> +
>> +    iommu_group_ref_get(gart->group);
> 
> You already hold the initial reference from iommu_group_alloc(), so there's no
> need to take a second one at this point.

Yes, it looks like this refcount bump isn't needed here. I'll revisit the
refcounting and correct it in v2 where necessary.

Thank you very much for the review.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-11 20:05     ` Dmitry Osipenko
@ 2018-05-14 18:18       ` Robin Murphy
  2018-05-16 13:43         ` Dmitry Osipenko
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2018-05-14 18:18 UTC (permalink / raw)
  To: Dmitry Osipenko, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 11/05/18 21:05, Dmitry Osipenko wrote:
> On 11.05.2018 15:32, Robin Murphy wrote:
>> On 08/05/18 19:16, Dmitry Osipenko wrote:
>>> GART aperture is shared by all devices, hence there is a single IOMMU
>>> domain and group shared by these devices. Allocation of a group per
>>> device only wastes resources, and allowing more than one domain
>>> is simply wrong because IOMMU mappings made by the users of "different"
>>> domains will stomp on each other.
>>
>> Strictly, that reasoning is a bit backwards - allocating multiple groups is the
>> conceptually-wrong thing if the GART cannot differentiate between different
>> devices, whereas having multiple domains *exist* is no real problem, it's merely
>> that only one can be active at any point in time (which will inherently become
>> the case once all devices are grouped together).
> 
> IIUC, the IOMMU domain represents the address space. There is only one address
> space in the case of GART: the GART's aperture. So the GART doesn't
> differentiate between different devices, nor between different domains.

Right, but that's the same as many other IOMMUs (exynos, rockchip, mtk, 
etc.) - the point is that an IOMMU domain represents *an* address space, 
but if nothing is attached to that domain, it's just a set of logical 
mappings which doesn't need to be backed by real hardware. It's 
specifically *because* these IOMMUs also can't differentiate between 
devices that things work out neatly - there's only one group, which can 
only be attached to a single domain at once, so there is never a time 
when more than one domain needs to be backed by hardware. Think of the 
IOMMU+devices as a CPU and the domains as processes ;)
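
To put that analogy into code - purely a sketch of the suggestion above, where
gart->active_domain and gart_domain->ptes are hypothetical fields rather than
anything that exists in the driver today:

static int gart_iommu_attach_dev(struct iommu_domain *domain,
                                 struct device *dev)
{
        struct gart_domain *gart_domain = to_gart_domain(domain);
        struct gart_device *gart = gart_domain->gart;

        /*
         * "Context switch" only when the active domain changes, by
         * reprogramming the aperture from the new domain's shadow PTEs.
         */
        if (gart->active_domain != gart_domain) {
                do_gart_setup(gart, gart_domain->ptes);
                gart->active_domain = gart_domain;
        }

        return 0;
}

Suspend/resume then just re-runs do_gart_setup() for whichever domain is
currently active.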

>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>    drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
>>>    1 file changed, 24 insertions(+), 83 deletions(-)
>>>
>>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>>> index 5b2d27620350..ebc105c201bd 100644
>>> --- a/drivers/iommu/tegra-gart.c
>>> +++ b/drivers/iommu/tegra-gart.c
>>> @@ -19,7 +19,6 @@
>>>      #include <linux/io.h>
>>>    #include <linux/iommu.h>
>>> -#include <linux/list.h>
>>>    #include <linux/module.h>
>>>    #include <linux/of_device.h>
>>>    #include <linux/slab.h>
>>> @@ -44,22 +43,17 @@
>>>    #define GART_PAGE_MASK                        \
>>>        (~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
>>>    -struct gart_client {
>>> -    struct device        *dev;
>>> -    struct list_head    list;
>>> -};
>>> -
>>>    struct gart_device {
>>>        void __iomem        *regs;
>>>        u32            *savedata;
>>>        u32            page_count;    /* total remappable size */
>>>        dma_addr_t        iovmm_base;    /* offset to vmm_area */
>>>        spinlock_t        pte_lock;    /* for pagetable */
>>> -    struct list_head    client;
>>> -    spinlock_t        client_lock;    /* for client list */
>>>        struct device        *dev;
>>>          struct iommu_device    iommu;        /* IOMMU Core handle */
>>> +    struct iommu_group    *group;        /* Common IOMMU group */
>>> +    struct gart_domain    *domain;    /* Unique IOMMU domain */
>>>          struct tegra_mc_gart_handle mc_gart_handle;
>>>    };
>>> @@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct
>>> gart_device *gart,
>>>    static int gart_iommu_attach_dev(struct iommu_domain *domain,
>>>                     struct device *dev)
>>>    {
>>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>>> -    struct gart_device *gart = gart_domain->gart;
>>> -    struct gart_client *client, *c;
>>> -    int err = 0;
>>> -
>>> -    client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
>>> -    if (!client)
>>> -        return -ENOMEM;
>>> -    client->dev = dev;
>>> -
>>> -    spin_lock(&gart->client_lock);
>>> -    list_for_each_entry(c, &gart->client, list) {
>>> -        if (c->dev == dev) {
>>> -            dev_err(gart->dev,
>>> -                "%s is already attached\n", dev_name(dev));
>>> -            err = -EINVAL;
>>> -            goto fail;
>>> -        }
>>> -    }
>>> -    list_add(&client->list, &gart->client);
>>> -    spin_unlock(&gart->client_lock);
>>> -    dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
>>>        return 0;
>>> -
>>> -fail:
>>> -    devm_kfree(gart->dev, client);
>>> -    spin_unlock(&gart->client_lock);
>>> -    return err;
>>>    }
>>>      static void gart_iommu_detach_dev(struct iommu_domain *domain,
>>>                      struct device *dev)
>>>    {
>>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>>> -    struct gart_device *gart = gart_domain->gart;
>>> -    struct gart_client *c;
>>> -
>>> -    spin_lock(&gart->client_lock);
>>> -
>>> -    list_for_each_entry(c, &gart->client, list) {
>>> -        if (c->dev == dev) {
>>> -            list_del(&c->list);
>>> -            devm_kfree(gart->dev, c);
>>> -            dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
>>> -            goto out;
>>> -        }
>>> -    }
>>> -    dev_err(gart->dev, "Couldn't find\n");
>>> -out:
>>> -    spin_unlock(&gart->client_lock);
>>>    }
>>
>> The .detach_dev callback is optional in the core API now, so you can just remove
>> the whole thing.
> 
> Good catch, thanks!
> 
>>
>>>    static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>>>    {
>>> -    struct gart_domain *gart_domain;
>>> -    struct gart_device *gart;
>>> -
>>> -    if (type != IOMMU_DOMAIN_UNMANAGED)
>>> -        return NULL;
>>> +    struct gart_device *gart = gart_handle;
>>>    -    gart = gart_handle;
>>> -    if (!gart)
>>> +    if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)
>>
>> Singleton domains are a little unpleasant given the way the IOMMU API expects
>> things to work, but it looks fairly simple to avoid needing that at all. AFAICS
>> you could move gart->savedata to something like gart_domain->ptes and keep it
>> up-to-date in .map/.unmap, then in .attach_dev you just need to do something like:
>>
>>      if (gart_domain != gart->domain) {
>>          do_gart_setup(gart, gart_domain->ptes);
>>          gart->domain = gart_domain;
>>      }
>>
>> to context-switch the hardware state when moving the group from one domain to
>> another (and as a bonus you would no longer need to do anything for suspend,
>> since resume can just look at the current domain too). If in practice there's
>> only ever one domain allocated anyway, then there's no difference in memory
>> overhead, but you still have the benefit of the driver being more consistent
>> with others and allowing that flexibility if anyone ever did want to play with it.
> 
> For starters we'll have a single domain solely used by the GPU with all its
> sub-devices. Context switching will be handled by the Tegra DRM driver. Later
> we may consider introducing IOMMU support for the video decoder, at least to
> provide memory isolation for the buffers that the decoder writes to.
>
> Cross-driver context switching isn't that straightforward, and I think the
> Tegra-GART driver shouldn't take care of context switching in any form, only
> performing mapping / unmapping operations. There are a couple of variants of
> how to deal with the context switching:
>
> 1. A simple solution could be to logically split the GART's aperture space into
> different domains, but the GART's aperture won't be utilized efficiently with
> this approach, wasting quite a lot of IOVA space.
>
> 2. In order to utilize the aperture more efficiently, we are going to make the
> DRM driver cache IOMMU mappings, such that a graphics buffer is moved to a
> cache-eviction list on unmapping and actually unmapped only when that buffer
> isn't in use and there is no IOVA space left for another buffer, or on the
> buffer's destruction. We'll use DRM's MM scanning helper for that [0][1]. Maybe
> we could share access to that MM helper with the video decoder somehow. The
> IOMMU API doesn't seem tailored for such a use-case, so having a custom
> platform-specific API on top of the IOMMU API would probably be fine, and with
> that we could have cross-device/driver context switching handled by the custom
> API.

Yes, if the DRM driver has overall control of the domain, then drivers 
for other devices in the group are going to have to cooperate with it in 
terms of IOVA allocation. It might even make sense to have that 
inter-driver interface abstract things down to the map/unmap level, 
since with a limited aperture you really want to avoid mapping the same 
PA to two different IOVAs if at all possible.
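
Purely to illustrate the shape such an inter-driver interface could take
(nothing like this exists, every name below is made up):

struct gart_space_ops {
        /* Map a buffer into the shared aperture, reusing an existing
         * mapping if the same pages are already mapped. */
        dma_addr_t (*map)(struct device *dev, struct sg_table *sgt);
        /* Release the mapping, or merely park it on an eviction list. */
        void (*unmap)(struct device *dev, dma_addr_t iova);
};

The DRM driver would implement these on top of the single GART domain, and the
video decoder driver would consume them instead of calling the IOMMU API
directly.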

> Please let me know if you have any other variants to suggest.
> 
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/include/drm/drm_mm.h
> [1]
> https://github.com/grate-driver/linux/commit/16e017efaa343e23e5a7d2d498915764cc806054
> 
>>
>>>            return NULL;
>>>    -    gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
>>> -    if (!gart_domain)
>>> -        return NULL;
>>> -
>>> -    gart_domain->gart = gart;
>>> -    gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
>>> -    gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
>>> +    gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
>>> +    if (gart->domain) {
>>> +        gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
>>> +        gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
>>>                        gart->page_count * GART_PAGE_SIZE - 1;
>>> -    gart_domain->domain.geometry.force_aperture = true;
>>> +        gart->domain->domain.geometry.force_aperture = true;
>>> +        gart->domain->gart = gart;
>>> +    }
>>>    -    return &gart_domain->domain;
>>> +    return &gart->domain->domain;
>>>    }
>>>      static void gart_iommu_domain_free(struct iommu_domain *domain)
>>> @@ -251,18 +195,7 @@ static void gart_iommu_domain_free(struct iommu_domain
>>> *domain)
>>>        struct gart_domain *gart_domain = to_gart_domain(domain);
>>>        struct gart_device *gart = gart_domain->gart;
>>>    -    if (gart) {
>>> -        spin_lock(&gart->client_lock);
>>> -        if (!list_empty(&gart->client)) {
>>> -            struct gart_client *c;
>>> -
>>> -            list_for_each_entry(c, &gart->client, list)
>>> -                gart_iommu_detach_dev(domain, c->dev);
>>> -        }
>>> -        spin_unlock(&gart->client_lock);
>>> -    }
>>> -
>>> -    kfree(gart_domain);
>>> +    kfree(gart->domain);
>>>    }
>>>      static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
>>> @@ -377,7 +310,7 @@ struct iommu_group *gart_iommu_device_group(struct device
>>> *dev)
>>>        if (err)
>>>            return ERR_PTR(err);
>>>    -    return generic_device_group(dev);
>>> +    return gart_handle->group;
>>
>> You should take a reference per device, i.e.:
>>
>>      return iommu_group_ref_get(gart_handle->group);
>>
>> otherwise removing devices could unbalance things and result in the group
>> getting freed prematurely.
> 
> It seems it would be more correct to remove iommu_group_put() from
> gart_iommu_add_device().

If you're confident that no bus code will ever result in add_device() 
getting called more than once for the same device, then you could get 
away with that. AFAIK it *shouldn't* happen, but I've never managed to 
convince myself that it *can't*.

Robin.

>>
>>>    }
>>>      static int gart_iommu_of_xlate(struct device *dev,
>>> @@ -502,8 +435,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>>          gart->dev = &pdev->dev;
>>>        spin_lock_init(&gart->pte_lock);
>>> -    spin_lock_init(&gart->client_lock);
>>> -    INIT_LIST_HEAD(&gart->client);
>>>        gart->regs = gart_regs;
>>>        gart->iovmm_base = (dma_addr_t)res_remap->start;
>>>        gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
>>> @@ -517,6 +448,14 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>>            goto iommu_unregister;
>>>        }
>>>    +    gart->group = iommu_group_alloc();
>>> +    if (IS_ERR(gart->group)) {
>>> +        ret = PTR_ERR(gart->group);
>>> +        goto free_savedata;
>>> +    }
>>> +
>>> +    iommu_group_ref_get(gart->group);
>>
>> You already hold the initial reference from iommu_group_alloc(), so there's no
>> need to take a second one at this point.
> 
> Yes, it looks like this refcount bump isn't needed here. I'll revisit the
> refcounting and correct it in v2 where necessary.
> 
> Thank you very much for the review.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices
  2018-05-14 18:18       ` Robin Murphy
@ 2018-05-16 13:43         ` Dmitry Osipenko
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Osipenko @ 2018-05-16 13:43 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Thierry Reding, Jonathan Hunter
  Cc: linux-tegra, iommu, linux-kernel

On 14.05.2018 21:18, Robin Murphy wrote:
> On 11/05/18 21:05, Dmitry Osipenko wrote:
>> On 11.05.2018 15:32, Robin Murphy wrote:
>>> On 08/05/18 19:16, Dmitry Osipenko wrote:
>>>> GART aperture is shared by all devices, hence there is a single IOMMU
>>>> domain and group shared by these devices. Allocation of a group per
>>>> device only wastes resources, and allowing more than one domain
>>>> is simply wrong because IOMMU mappings made by the users of "different"
>>>> domains will stomp on each other.
>>>
>>> Strictly, that reasoning is a bit backwards - allocating multiple groups is the
>>> conceptually-wrong thing if the GART cannot differentiate between different
>>> devices, whereas having multiple domains *exist* is no real problem, it's merely
>>> that only one can be active at any point in time (which will inherently become
>>> the case once all devices are grouped together).
>>
>> IIUC, the IOMMU domain represents the address space. There is only one address
>> space in the case of GART: the GART's aperture. So the GART doesn't
>> differentiate between different devices, nor between different domains.
> 
> Right, but that's the same as many other IOMMUs (exynos, rockchip, mtk, etc.) -
> the point is that an IOMMU domain represents *an* address space, but if nothing
> is attached to that domain, it's just a set of logical mappings which doesn't
> need to be backed by real hardware. It's specifically *because* these IOMMUs
> also can't differentiate between devices that things work out neatly - there's
> only one group, which can only be attached to a single domain at once, so there
> is never a time when more than one domain needs to be backed by hardware. Think
> of the IOMMU+devices as a CPU and the domains as processes ;)

I *think* I understand what you are trying to convey. The "domain swapping"
functionality sounds like a good idea, but I don't see any practical application
for such functionality right now. Your suggestion also feels a bit like an
implicit ad hoc solution to me; maybe we could extend the IOMMU API to support
something like "explicit domain swapping" for device drivers if multiple
platforms ever need that.

In the case of the Tegra-GART driver, I'd prefer to allow only a single IOMMU
domain in the system for starters and implement other features on an as-needed
basis.

>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>    drivers/iommu/tegra-gart.c | 107 +++++++++----------------------------
>>>>    1 file changed, 24 insertions(+), 83 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>>>> index 5b2d27620350..ebc105c201bd 100644
>>>> --- a/drivers/iommu/tegra-gart.c
>>>> +++ b/drivers/iommu/tegra-gart.c
>>>> @@ -19,7 +19,6 @@
>>>>      #include <linux/io.h>
>>>>    #include <linux/iommu.h>
>>>> -#include <linux/list.h>
>>>>    #include <linux/module.h>
>>>>    #include <linux/of_device.h>
>>>>    #include <linux/slab.h>
>>>> @@ -44,22 +43,17 @@
>>>>    #define GART_PAGE_MASK                        \
>>>>        (~(GART_PAGE_SIZE - 1) & ~GART_ENTRY_PHYS_ADDR_VALID)
>>>>    -struct gart_client {
>>>> -    struct device        *dev;
>>>> -    struct list_head    list;
>>>> -};
>>>> -
>>>>    struct gart_device {
>>>>        void __iomem        *regs;
>>>>        u32            *savedata;
>>>>        u32            page_count;    /* total remappable size */
>>>>        dma_addr_t        iovmm_base;    /* offset to vmm_area */
>>>>        spinlock_t        pte_lock;    /* for pagetable */
>>>> -    struct list_head    client;
>>>> -    spinlock_t        client_lock;    /* for client list */
>>>>        struct device        *dev;
>>>>          struct iommu_device    iommu;        /* IOMMU Core handle */
>>>> +    struct iommu_group    *group;        /* Common IOMMU group */
>>>> +    struct gart_domain    *domain;    /* Unique IOMMU domain */
>>>>          struct tegra_mc_gart_handle mc_gart_handle;
>>>>    };
>>>> @@ -169,81 +163,31 @@ static inline bool gart_iova_range_valid(struct
>>>> gart_device *gart,
>>>>    static int gart_iommu_attach_dev(struct iommu_domain *domain,
>>>>                     struct device *dev)
>>>>    {
>>>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>>>> -    struct gart_device *gart = gart_domain->gart;
>>>> -    struct gart_client *client, *c;
>>>> -    int err = 0;
>>>> -
>>>> -    client = devm_kzalloc(gart->dev, sizeof(*c), GFP_KERNEL);
>>>> -    if (!client)
>>>> -        return -ENOMEM;
>>>> -    client->dev = dev;
>>>> -
>>>> -    spin_lock(&gart->client_lock);
>>>> -    list_for_each_entry(c, &gart->client, list) {
>>>> -        if (c->dev == dev) {
>>>> -            dev_err(gart->dev,
>>>> -                "%s is already attached\n", dev_name(dev));
>>>> -            err = -EINVAL;
>>>> -            goto fail;
>>>> -        }
>>>> -    }
>>>> -    list_add(&client->list, &gart->client);
>>>> -    spin_unlock(&gart->client_lock);
>>>> -    dev_dbg(gart->dev, "Attached %s\n", dev_name(dev));
>>>>        return 0;
>>>> -
>>>> -fail:
>>>> -    devm_kfree(gart->dev, client);
>>>> -    spin_unlock(&gart->client_lock);
>>>> -    return err;
>>>>    }
>>>>      static void gart_iommu_detach_dev(struct iommu_domain *domain,
>>>>                      struct device *dev)
>>>>    {
>>>> -    struct gart_domain *gart_domain = to_gart_domain(domain);
>>>> -    struct gart_device *gart = gart_domain->gart;
>>>> -    struct gart_client *c;
>>>> -
>>>> -    spin_lock(&gart->client_lock);
>>>> -
>>>> -    list_for_each_entry(c, &gart->client, list) {
>>>> -        if (c->dev == dev) {
>>>> -            list_del(&c->list);
>>>> -            devm_kfree(gart->dev, c);
>>>> -            dev_dbg(gart->dev, "Detached %s\n", dev_name(dev));
>>>> -            goto out;
>>>> -        }
>>>> -    }
>>>> -    dev_err(gart->dev, "Couldn't find\n");
>>>> -out:
>>>> -    spin_unlock(&gart->client_lock);
>>>>    }
>>>
>>> The .detach_dev callback is optional in the core API now, so you can just remove
>>> the whole thing.
>>
>> Good catch, thanks!
>>
>>>
>>>>    static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>>>>    {
>>>> -    struct gart_domain *gart_domain;
>>>> -    struct gart_device *gart;
>>>> -
>>>> -    if (type != IOMMU_DOMAIN_UNMANAGED)
>>>> -        return NULL;
>>>> +    struct gart_device *gart = gart_handle;
>>>>    -    gart = gart_handle;
>>>> -    if (!gart)
>>>> +    if (type != IOMMU_DOMAIN_UNMANAGED || gart->domain)
>>>
>>> Singleton domains are a little unpleasant given the way the IOMMU API expects
>>> things to work, but it looks fairly simple to avoid needing that at all. AFAICS
>>> you could move gart->savedata to something like gart_domain->ptes and keep it
>>> up-to-date in .map/.unmap, then in .attach_dev you just need to do something
>>> like:
>>>
>>>      if (gart_domain != gart->domain) {
>>>          do_gart_setup(gart, gart_domain->ptes);
>>>          gart->domain = gart_domain;
>>>      }
>>>
>>> to context-switch the hardware state when moving the group from one domain to
>>> another (and as a bonus you would no longer need to do anything for suspend,
>>> since resume can just look at the current domain too). If in practice there's
>>> only ever one domain allocated anyway, then there's no difference in memory
>>> overhead, but you still have the benefit of the driver being more consistent
>>> with others and allowing that flexibility if anyone ever did want to play
>>> with it.
>>
>> For starters we'll have a single domain solely used by the GPU with all its
>> sub-devices. Context switching will be handled by the Tegra DRM driver. Later
>> we may consider introducing IOMMU support for the video decoder, at least to
>> provide memory isolation for the buffers that the decoder writes to.
>>
>> Cross-driver context switching isn't that straightforward, and I think the
>> Tegra-GART driver shouldn't take care of context switching in any form, only
>> performing mapping / unmapping operations. There are a couple of variants of
>> how to deal with the context switching:
>>
>> 1. A simple solution could be to logically split the GART's aperture space into
>> different domains, but the GART's aperture won't be utilized efficiently with
>> this approach, wasting quite a lot of IOVA space.
>>
>> 2. In order to utilize the aperture more efficiently, we are going to make the
>> DRM driver cache IOMMU mappings, such that a graphics buffer is moved to a
>> cache-eviction list on unmapping and actually unmapped only when that buffer
>> isn't in use and there is no IOVA space left for another buffer, or on the
>> buffer's destruction. We'll use DRM's MM scanning helper for that [0][1]. Maybe
>> we could share access to that MM helper with the video decoder somehow. The
>> IOMMU API doesn't seem tailored for such a use-case, so having a custom
>> platform-specific API on top of the IOMMU API would probably be fine, and with
>> that we could have cross-device/driver context switching handled by the custom
>> API.
> 
> Yes, if the DRM driver has overall control of the domain, then drivers for other
> devices in the group are going to have to cooperate with it in terms of IOVA
> allocation. It might even make sense to have that inter-driver interface
> abstract things down to the map/unmap level, since with a limited aperture you
> really want to avoid mapping the same PA to two different IOVAs if at all possible.

Yeah, for now those are just thoughts about the inter-driver interfaces and
whatnot; it's more important to get at least the basic things up and running. So
the current plan is to go with a single IOMMU domain and make the DRM driver use
it.
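
A minimal sketch of how the DRM side would pick that domain up, using nothing
but the generic IOMMU API (an illustration, not the actual Tegra DRM code):

static struct iommu_domain *tegra_drm_setup_iommu(struct device *dev)
{
        struct iommu_domain *domain;
        struct iommu_group *group;
        int err;

        domain = iommu_domain_alloc(&platform_bus_type);
        if (!domain)
                return NULL;

        group = iommu_group_get(dev);
        if (!group)
                goto free_domain;

        err = iommu_attach_group(domain, group);
        iommu_group_put(group);
        if (err)
                goto free_domain;

        return domain;

free_domain:
        iommu_domain_free(domain);
        return NULL;
}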

>> Please let me know if you have any other variants to suggest.
>>
>> [0]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/include/drm/drm_mm.h
>>
>> [1]
>> https://github.com/grate-driver/linux/commit/16e017efaa343e23e5a7d2d498915764cc806054
>>
>>
>>>
>>>>            return NULL;
>>>>    -    gart_domain = kzalloc(sizeof(*gart_domain), GFP_KERNEL);
>>>> -    if (!gart_domain)
>>>> -        return NULL;
>>>> -
>>>> -    gart_domain->gart = gart;
>>>> -    gart_domain->domain.geometry.aperture_start = gart->iovmm_base;
>>>> -    gart_domain->domain.geometry.aperture_end = gart->iovmm_base +
>>>> +    gart->domain = kzalloc(sizeof(*gart->domain), GFP_KERNEL);
>>>> +    if (gart->domain) {
>>>> +        gart->domain->domain.geometry.aperture_start = gart->iovmm_base;
>>>> +        gart->domain->domain.geometry.aperture_end = gart->iovmm_base +
>>>>                        gart->page_count * GART_PAGE_SIZE - 1;
>>>> -    gart_domain->domain.geometry.force_aperture = true;
>>>> +        gart->domain->domain.geometry.force_aperture = true;
>>>> +        gart->domain->gart = gart;
>>>> +    }
>>>>    -    return &gart_domain->domain;
>>>> +    return &gart->domain->domain;
>>>>    }
>>>>      static void gart_iommu_domain_free(struct iommu_domain *domain)
>>>> @@ -251,18 +195,7 @@ static void gart_iommu_domain_free(struct iommu_domain
>>>> *domain)
>>>>        struct gart_domain *gart_domain = to_gart_domain(domain);
>>>>        struct gart_device *gart = gart_domain->gart;
>>>>    -    if (gart) {
>>>> -        spin_lock(&gart->client_lock);
>>>> -        if (!list_empty(&gart->client)) {
>>>> -            struct gart_client *c;
>>>> -
>>>> -            list_for_each_entry(c, &gart->client, list)
>>>> -                gart_iommu_detach_dev(domain, c->dev);
>>>> -        }
>>>> -        spin_unlock(&gart->client_lock);
>>>> -    }
>>>> -
>>>> -    kfree(gart_domain);
>>>> +    kfree(gart->domain);
>>>>    }
>>>>      static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova,
>>>> @@ -377,7 +310,7 @@ struct iommu_group *gart_iommu_device_group(struct device
>>>> *dev)
>>>>        if (err)
>>>>            return ERR_PTR(err);
>>>>    -    return generic_device_group(dev);
>>>> +    return gart_handle->group;
>>>
>>> You should take a reference per device, i.e.:
>>>
>>>      return iommu_group_ref_get(gart_handle->group);
>>>
>>> otherwise removing devices could unbalance things and result in the group
>>> getting freed prematurely.
>>
>> It seems it would be more correct to remove iommu_group_put() from
>> gart_iommu_add_device().
> 
> If you're confident that no bus code will ever result in add_device() getting
> called more than once for the same device, then you could get away with that.
> AFAIK it *shouldn't* happen, but I've never managed to convince myself that it
> *can't*
Having refcount incremented more times than decremented won't be a big problem
for the Tegra-GART driver since the singleton IOMMU group is allocated during of
driver probe once and then re-used by all devices. Let's go with keeping code
clean and simple where possible, and change it only if it will cause real problems.
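
To make the variant I have in mind concrete (an illustration, not necessarily
the exact v2 code):

struct iommu_group *gart_iommu_device_group(struct device *dev)
{
        int err;

        err = gart_iommu_check_device(gart_handle, dev);
        if (err)
                return ERR_PTR(err);

        /*
         * Hand out the singleton group allocated at probe time; the
         * matching gart_iommu_add_device() keeps the reference returned
         * by iommu_group_get_for_dev() instead of dropping it.
         */
        return gart_handle->group;
}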

>>>
>>>>    }
>>>>      static int gart_iommu_of_xlate(struct device *dev,
>>>> @@ -502,8 +435,6 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>>>          gart->dev = &pdev->dev;
>>>>        spin_lock_init(&gart->pte_lock);
>>>> -    spin_lock_init(&gart->client_lock);
>>>> -    INIT_LIST_HEAD(&gart->client);
>>>>        gart->regs = gart_regs;
>>>>        gart->iovmm_base = (dma_addr_t)res_remap->start;
>>>>        gart->page_count = (resource_size(res_remap) >> GART_PAGE_SHIFT);
>>>> @@ -517,6 +448,14 @@ static int tegra_gart_probe(struct platform_device *pdev)
>>>>            goto iommu_unregister;
>>>>        }
>>>>    +    gart->group = iommu_group_alloc();
>>>> +    if (IS_ERR(gart->group)) {
>>>> +        ret = PTR_ERR(gart->group);
>>>> +        goto free_savedata;
>>>> +    }
>>>> +
>>>> +    iommu_group_ref_get(gart->group);
>>>
>>> You already hold the initial reference from iommu_group_alloc(), so there's no
>>> need to take a second one at this point.
>>
>> Yes, it looks like this refcount bump isn't needed here. I'll revisit the
>> refcounting and correct it in v2 where necessary.
>>
>> Thank you very much for the review.
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-05-16 13:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-08 18:16 [PATCH v1 0/9] Tegra GART driver clean up and optimization Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 1/9] memory: tegra: Provide facility for integration with the GART driver Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 2/9] iommu/tegra: gart: Provide access to Memory Controller driver Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 3/9] iommu/tegra: gart: Remove code related to module unloading Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 4/9] iommu/tegra: gart: Remove pr_fmt and clean up includes Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 5/9] iommu/tegra: gart: Clean up driver probe failure unwinding Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 6/9] iommu/tegra: gart: Ignore devices without IOMMU phandle in DT Dmitry Osipenko
2018-05-11 11:34   ` Robin Murphy
2018-05-11 15:34     ` Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 7/9] iommu/tegra: gart: Provide single domain and group for all devices Dmitry Osipenko
2018-05-11 11:12   ` Dmitry Osipenko
2018-05-11 12:32   ` Robin Murphy
2018-05-11 20:05     ` Dmitry Osipenko
2018-05-14 18:18       ` Robin Murphy
2018-05-16 13:43         ` Dmitry Osipenko
2018-05-08 18:16 ` [PATCH v1 8/9] iommu: Introduce iotlb_sync_map callback Dmitry Osipenko
2018-05-11 13:02   ` Robin Murphy
2018-05-11 19:58     ` Dmitry Osipenko
2018-05-08 18:17 ` [PATCH v1 9/9] iommu/tegra: gart: Optimize mapping / unmapping performance Dmitry Osipenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).