linux-mediatek.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap
@ 2021-01-07 12:29 Yong Wu
  2021-01-07 12:29 ` [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

This patchset is to improve tlb flushing performance in iommu_map/unmap
for MediaTek IOMMU.

For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP
to do tlb_flush for each a memory chunk. this is so unnecessary. we could
improve it by tlb flushing one time at the end of iommu_map.

For iommu_unmap, currently we have already improve this performance by
gather. But the current gather should take care its granule size. if the
granule size is different, it will do tlb flush and gather again. Our HW
don't care about granule size. thus I gather the range in our file.

After this patchset, we could achieve only tlb flushing once in iommu_map
and iommu_unmap.

Regardless of sg, for each a segment, I did a simple test:
  
  size = 20 * SZ_1M;
  /* the worst case, all are 4k mapping. */
  ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
  iommu_unmap(domain, 0x5bb02000, size);

This is the comparing time(unit is us):
              original-time  after-improve
   map-20M    59943           2347
   unmap-20M  264             36

This patchset also flush tlb once in the iommu_map_sg case.

patch [1/7][2/7][3/7] are for map while the others are for unmap.

change note:
v4: a. base on v5.11-rc1.
    b. Add a little helper _iommu_map.
    c. Fix a build fail for tegra-gart.c. I didn't notice there is another place
    call gart_iommu_sync_map.
    d. Switch gather->end to the read end address("start + end - 1").
    
v3: https://lore.kernel.org/linux-iommu/20201216103607.23050-1-yong.wu@mediatek.com/#r
    Refactor the unmap flow suggested by Robin.
     
v2: https://lore.kernel.org/linux-iommu/20201119061836.15238-1-yong.wu@mediatek.com/
    Refactor all the code.
    base on v5.10-rc1.

Yong Wu (7):
  iommu: Move iotlb_sync_map out from __iommu_map
  iommu: Add iova and size as parameters in iotlb_sync_map
  iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  iommu: Switch gather->end to the inclusive end
  iommu/io-pgtable: Allow io_pgtable_tlb ops optional
  iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
  iommu/mediatek: Remove the tlb-ops for v7s

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +-
 drivers/iommu/iommu.c                       | 23 +++++++---
 drivers/iommu/mtk_iommu.c                   | 47 +++++++++------------
 drivers/iommu/tegra-gart.c                  |  7 ++-
 include/linux/io-pgtable.h                  |  8 ++--
 include/linux/iommu.h                       |  7 +--
 6 files changed, 52 insertions(+), 42 deletions(-)

-- 
2.18.0

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-02-02  1:07   ` Doug Anderson
  2021-01-07 12:29 ` [PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map Yong Wu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

In the end of __iommu_map, It alway call iotlb_sync_map.

This patch moves iotlb_sync_map out from __iommu_map since it is
unnecessary to call this for each sg segment especially iotlb_sync_map
is flush tlb all currently. Add a little helper _iommu_map for this.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..c304a6a30d42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 		size -= pgsize;
 	}
 
-	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
-
 	/* unroll mapping in case something went wrong */
 	if (ret)
 		iommu_unmap(domain, orig_iova, orig_size - size);
@@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 	return ret;
 }
 
+static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
+		      phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+	const struct iommu_ops *ops = domain->ops;
+	int ret;
+
+	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	if (ret == 0 && ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
+
+	return ret;
+}
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
 	might_sleep();
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+	return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
@@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			     struct scatterlist *sg, unsigned int nents, int prot,
 			     gfp_t gfp)
 {
+	const struct iommu_ops *ops = domain->ops;
 	size_t len = 0, mapped = 0;
 	phys_addr_t start;
 	unsigned int i = 0;
@@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			sg = sg_next(sg);
 	}
 
+	if (ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
 	return mapped;
 
 out_err:
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
  2021-01-07 12:29 ` [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-07 12:29 ` [PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
mapping. This patch adds iova and size as the parameters in it. then the
IOMMU driver could flush tlb with the whole range once after iova mapping
to improve performance.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommu.c      | 4 ++--
 drivers/iommu/tegra-gart.c | 7 +++++--
 include/linux/iommu.h      | 3 ++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c304a6a30d42..3d099a31ddca 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
 	if (ret == 0 && ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, size);
 
 	return ret;
 }
@@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	}
 
 	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, mapped);
 	return mapped;
 
 out_err:
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..05e8e19b8269 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
 	return 0;
 }
 
-static void gart_iommu_sync_map(struct iommu_domain *domain)
+static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+				size_t size)
 {
 	FLUSH_GART_REGS(gart_handle);
 }
@@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain)
 static void gart_iommu_sync(struct iommu_domain *domain,
 			    struct iommu_iotlb_gather *gather)
 {
-	gart_iommu_sync_map(domain);
+	size_t length = gather->end - gather->start;
+
+	gart_iommu_sync_map(domain, gather->start, length);
 }
 
 static const struct iommu_ops gart_iommu_ops = {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b3f0e2018c62..9ce0aa9e236b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -246,7 +246,8 @@ struct iommu_ops {
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
-	void (*iotlb_sync_map)(struct iommu_domain *domain);
+	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
  2021-01-07 12:29 ` [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
  2021-01-07 12:29 ` [PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-07 12:29 ` [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end Yong Wu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

Remove IO_PGTABLE_QUIRK_TLBI_ON_MAP to avoid tlb sync for each a small
chunk memory, Use the new iotlb_sync_map to tlb_sync once for whole the
iova range of iommu_map.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/mtk_iommu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 8e56cec532e7..f579fc21971e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -322,7 +322,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
 	dom->cfg = (struct io_pgtable_cfg) {
 		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
 			IO_PGTABLE_QUIRK_NO_PERMS |
-			IO_PGTABLE_QUIRK_TLBI_ON_MAP |
 			IO_PGTABLE_QUIRK_ARM_MTK_EXT,
 		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
 		.ias = 32,
@@ -453,6 +452,14 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 				       data);
 }
 
+static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+			       size_t size)
+{
+	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+
+	mtk_iommu_tlb_flush_range_sync(iova, size, size, data);
+}
+
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
 					  dma_addr_t iova)
 {
@@ -539,6 +546,7 @@ static const struct iommu_ops mtk_iommu_ops = {
 	.unmap		= mtk_iommu_unmap,
 	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
 	.iotlb_sync	= mtk_iommu_iotlb_sync,
+	.iotlb_sync_map	= mtk_iommu_sync_map,
 	.iova_to_phys	= mtk_iommu_iova_to_phys,
 	.probe_device	= mtk_iommu_probe_device,
 	.release_device	= mtk_iommu_release_device,
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (2 preceding siblings ...)
  2021-01-07 12:29 ` [PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-18 18:41   ` Robin Murphy
  2021-01-07 12:29 ` [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional Yong Wu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

Currently gather->end is "unsigned long" which may be overflow in
arch32 in the corner case: 0xfff00000 + 0x100000(iova + size).
Although it doesn't affect the size(end - start), it affects the checking
"gather->end < end"

This patch changes this "end" to the real end address
(end = start + size - 1). Correspondingly, update the length to
"end - start + 1".

Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 drivers/iommu/mtk_iommu.c                   | 2 +-
 drivers/iommu/tegra-gart.c                  | 2 +-
 include/linux/iommu.h                       | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..c70d6e79f534 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,
+	arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
 			       gather->pgsize, true, smmu_domain);
 }
 
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f579fc21971e..66a00a2cb445 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 				 struct iommu_iotlb_gather *gather)
 {
 	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
-	size_t length = gather->end - gather->start;
+	size_t length = gather->end - gather->start + 1;
 
 	if (gather->start == ULONG_MAX)
 		return;
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 05e8e19b8269..6f130e51f072 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
 static void gart_iommu_sync(struct iommu_domain *domain,
 			    struct iommu_iotlb_gather *gather)
 {
-	size_t length = gather->end - gather->start;
+	size_t length = gather->end - gather->start + 1;
 
 	gart_iommu_sync_map(domain, gather->start, length);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ce0aa9e236b..ae8eddd4621b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -170,7 +170,7 @@ enum iommu_dev_features {
  * struct iommu_iotlb_gather - Range information for a pending IOTLB flush
  *
  * @start: IOVA representing the start of the range to be flushed
- * @end: IOVA representing the end of the range to be flushed (exclusive)
+ * @end: IOVA representing the end of the range to be flushed (inclusive)
  * @pgsize: The interval at which to perform the flush
  *
  * This structure is intended to be updated by multiple calls to the
@@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 					       struct iommu_iotlb_gather *gather,
 					       unsigned long iova, size_t size)
 {
-	unsigned long start = iova, end = start + size;
+	unsigned long start = iova, end = start + size - 1;
 
 	/*
 	 * If the new page is disjoint from the current range or is mapped at
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (3 preceding siblings ...)
  2021-01-07 12:29 ` [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-18 18:42   ` Robin Murphy
  2021-01-07 12:29 ` [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once Yong Wu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

This patch allows io_pgtable_tlb ops could be null since the IOMMU drivers
may use the tlb ops from iommu framework.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 include/linux/io-pgtable.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..2a5686ca2ba3 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-	iop->cfg.tlb->tlb_flush_all(iop->cookie);
+	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
+		iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
 			  size_t size, size_t granule)
 {
-	iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
+		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
@@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
 			struct iommu_iotlb_gather * gather, unsigned long iova,
 			size_t granule)
 {
-	if (iop->cfg.tlb->tlb_add_page)
+	if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
 		iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (4 preceding siblings ...)
  2021-01-07 12:29 ` [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-18 18:44   ` Robin Murphy
  2021-01-07 12:29 ` [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s Yong Wu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

In current iommu_unmap, this code is:

	iommu_iotlb_gather_init(&iotlb_gather);
	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
	iommu_iotlb_sync(domain, &iotlb_gather);

We could gather the whole iova range in __iommu_unmap, and then do tlb
synchronization in the iommu_iotlb_sync.

This patch implement this, Gather the range in mtk_iommu_unmap.
then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
we don't call iommu_iotlb_gather_add_page since our tlb synchronization
could be regardless of granule size.

In this way, gather->start is impossible ULONG_MAX, remove the checking.

This patch aims to do tlb synchronization *once* in the iommu_unmap.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/mtk_iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 66a00a2cb445..d3b8a1649093 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 			      struct iommu_iotlb_gather *gather)
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+	unsigned long end = iova + size - 1;
 
+	if (gather->start > iova)
+		gather->start = iova;
+	if (gather->end < end)
+		gather->end = end;
 	return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
@@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
 	size_t length = gather->end - gather->start + 1;
 
-	if (gather->start == ULONG_MAX)
-		return;
-
 	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
 				       data);
 }
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (5 preceding siblings ...)
  2021-01-07 12:29 ` [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once Yong Wu
@ 2021-01-07 12:29 ` Yong Wu
  2021-01-18 18:46   ` Robin Murphy
  2021-01-22 19:32 ` [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon
  2021-01-27 13:19 ` Will Deacon
  8 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2021-01-07 12:29 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek, yong.wu,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

Until now, we have already used the tlb operations from iommu framework,
then the tlb operations for v7s can be removed.

Correspondingly, Switch the paramenter "cookie" to the internal structure.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/mtk_iommu.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d3b8a1649093..86ab577c9520 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
 	return container_of(dom, struct mtk_iommu_domain, domain);
 }
 
-static void mtk_iommu_tlb_flush_all(void *cookie)
+static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 {
-	struct mtk_iommu_data *data = cookie;
-
 	for_each_m4u(data) {
 		writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
 			       data->base + data->plat_data->inv_sel_reg);
@@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
 }
 
 static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-					   size_t granule, void *cookie)
+					   size_t granule,
+					   struct mtk_iommu_data *data)
 {
-	struct mtk_iommu_data *data = cookie;
 	unsigned long flags;
 	int ret;
 	u32 tmp;
@@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 		if (ret) {
 			dev_warn(data->dev,
 				 "Partial TLB flush timed out, falling back to full flush\n");
-			mtk_iommu_tlb_flush_all(cookie);
+			mtk_iommu_tlb_flush_all(data);
 		}
 		/* Clear the CPE status */
 		writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 	}
 }
 
-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
-					    unsigned long iova, size_t granule,
-					    void *cookie)
-{
-	struct mtk_iommu_data *data = cookie;
-	struct iommu_domain *domain = &data->m4u_dom->domain;
-
-	iommu_iotlb_gather_add_page(domain, gather, iova, granule);
-}
-
-static const struct iommu_flush_ops mtk_iommu_flush_ops = {
-	.tlb_flush_all = mtk_iommu_tlb_flush_all,
-	.tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-	.tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
-};
-
 static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 {
 	struct mtk_iommu_data *data = dev_id;
@@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
 		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
 		.ias = 32,
 		.oas = 34,
-		.tlb = &mtk_iommu_flush_ops,
 		.iommu_dev = data->dev,
 	};
 
-- 
2.18.0
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end
  2021-01-07 12:29 ` [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end Yong Wu
@ 2021-01-18 18:41   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2021-01-18 18:41 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

On 2021-01-07 12:29, Yong Wu wrote:
> Currently gather->end is "unsigned long" which may be overflow in
> arch32 in the corner case: 0xfff00000 + 0x100000(iova + size).
> Although it doesn't affect the size(end - start), it affects the checking
> "gather->end < end"
> 
> This patch changes this "end" to the real end address
> (end = start + size - 1). Correspondingly, update the length to
> "end - start + 1".
> 
> Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes")
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
>   drivers/iommu/mtk_iommu.c                   | 2 +-
>   drivers/iommu/tegra-gart.c                  | 2 +-
>   include/linux/iommu.h                       | 4 ++--
>   4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 8ca7415d785d..c70d6e79f534 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
>   {
>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   
> -	arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,
> +	arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
>   			       gather->pgsize, true, smmu_domain);
>   }
>   
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f579fc21971e..66a00a2cb445 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>   				 struct iommu_iotlb_gather *gather)
>   {
>   	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
> -	size_t length = gather->end - gather->start;
> +	size_t length = gather->end - gather->start + 1;
>   
>   	if (gather->start == ULONG_MAX)
>   		return;
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index 05e8e19b8269..6f130e51f072 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
>   static void gart_iommu_sync(struct iommu_domain *domain,
>   			    struct iommu_iotlb_gather *gather)
>   {
> -	size_t length = gather->end - gather->start;
> +	size_t length = gather->end - gather->start + 1;

TBH I don't think there's any need to bother doing precise calculations 
on effectively-uninitialised data (this driver doesn't even do 
address-based invalidation, let alone use the gather mechanism). In fact 
it might make sense to flip things around and define gart_iommu_sync_map 
in terms of gart_iommu_sync now just so there's one less unused argument 
to make up. However we can always do cleanup on top, and right now I'm 
more interested in getting these changes landed, so either way,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

>   	gart_iommu_sync_map(domain, gather->start, length);
>   }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9ce0aa9e236b..ae8eddd4621b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -170,7 +170,7 @@ enum iommu_dev_features {
>    * struct iommu_iotlb_gather - Range information for a pending IOTLB flush
>    *
>    * @start: IOVA representing the start of the range to be flushed
> - * @end: IOVA representing the end of the range to be flushed (exclusive)
> + * @end: IOVA representing the end of the range to be flushed (inclusive)
>    * @pgsize: The interval at which to perform the flush
>    *
>    * This structure is intended to be updated by multiple calls to the
> @@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>   					       struct iommu_iotlb_gather *gather,
>   					       unsigned long iova, size_t size)
>   {
> -	unsigned long start = iova, end = start + size;
> +	unsigned long start = iova, end = start + size - 1;
>   
>   	/*
>   	 * If the new page is disjoint from the current range or is mapped at
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional
  2021-01-07 12:29 ` [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional Yong Wu
@ 2021-01-18 18:42   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2021-01-18 18:42 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

On 2021-01-07 12:29, Yong Wu wrote:
> This patch allows io_pgtable_tlb ops could be null since the IOMMU drivers
> may use the tlb ops from iommu framework.

For the reasons I gave on v3,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   include/linux/io-pgtable.h | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ea727eb1a1a9..2a5686ca2ba3 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {
>   
>   static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
>   {
> -	iop->cfg.tlb->tlb_flush_all(iop->cookie);
> +	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
> +		iop->cfg.tlb->tlb_flush_all(iop->cookie);
>   }
>   
>   static inline void
>   io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
>   			  size_t size, size_t granule)
>   {
> -	iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
> +	if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
> +		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
>   }
>   
>   static inline void
> @@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
>   			struct iommu_iotlb_gather * gather, unsigned long iova,
>   			size_t granule)
>   {
> -	if (iop->cfg.tlb->tlb_add_page)
> +	if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
>   		iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
>   }
>   
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
  2021-01-07 12:29 ` [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once Yong Wu
@ 2021-01-18 18:44   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2021-01-18 18:44 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	linux-kernel, Krzysztof Kozlowski, Christoph Hellwig,
	Tomasz Figa, iommu, David Laight, linux-mediatek,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

On 2021-01-07 12:29, Yong Wu wrote:
> In current iommu_unmap, this code is:
> 
> 	iommu_iotlb_gather_init(&iotlb_gather);
> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> 	iommu_iotlb_sync(domain, &iotlb_gather);
> 
> We could gather the whole iova range in __iommu_unmap, and then do tlb
> synchronization in the iommu_iotlb_sync.
> 
> This patch implement this, Gather the range in mtk_iommu_unmap.
> then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
> we don't call iommu_iotlb_gather_add_page since our tlb synchronization
> could be regardless of granule size.
> 
> In this way, gather->start is impossible ULONG_MAX, remove the checking.
> 
> This patch aims to do tlb synchronization *once* in the iommu_unmap.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   drivers/iommu/mtk_iommu.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 66a00a2cb445..d3b8a1649093 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   			      struct iommu_iotlb_gather *gather)
>   {
>   	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> +	unsigned long end = iova + size - 1;
>   
> +	if (gather->start > iova)
> +		gather->start = iova;
> +	if (gather->end < end)
> +		gather->end = end;
>   	return dom->iop->unmap(dom->iop, iova, size, gather);
>   }
>   
> @@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>   	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
>   	size_t length = gather->end - gather->start + 1;
>   
> -	if (gather->start == ULONG_MAX)
> -		return;
> -
>   	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>   				       data);
>   }
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s
  2021-01-07 12:29 ` [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s Yong Wu
@ 2021-01-18 18:46   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2021-01-18 18:46 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream,
	Tomasz Figa, linux-kernel, Krzysztof Kozlowski,
	Christoph Hellwig, chao.hao, iommu, David Laight, linux-mediatek,
	Matthias Brugger, Greg Kroah-Hartman, kernel-team,
	linux-arm-kernel

On 2021-01-07 12:29, Yong Wu wrote:
> Until now, we have already used the tlb operations from iommu framework,
> then the tlb operations for v7s can be removed.
> 
> Correspondingly, Switch the paramenter "cookie" to the internal structure.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   drivers/iommu/mtk_iommu.c | 27 ++++-----------------------
>   1 file changed, 4 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index d3b8a1649093..86ab577c9520 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
>   	return container_of(dom, struct mtk_iommu_domain, domain);
>   }
>   
> -static void mtk_iommu_tlb_flush_all(void *cookie)
> +static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
>   {
> -	struct mtk_iommu_data *data = cookie;
> -
>   	for_each_m4u(data) {
>   		writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
>   			       data->base + data->plat_data->inv_sel_reg);
> @@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
>   }
>   
>   static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> -					   size_t granule, void *cookie)
> +					   size_t granule,
> +					   struct mtk_iommu_data *data)
>   {
> -	struct mtk_iommu_data *data = cookie;
>   	unsigned long flags;
>   	int ret;
>   	u32 tmp;
> @@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>   		if (ret) {
>   			dev_warn(data->dev,
>   				 "Partial TLB flush timed out, falling back to full flush\n");
> -			mtk_iommu_tlb_flush_all(cookie);
> +			mtk_iommu_tlb_flush_all(data);
>   		}
>   		/* Clear the CPE status */
>   		writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
> @@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>   	}
>   }
>   
> -static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
> -					    unsigned long iova, size_t granule,
> -					    void *cookie)
> -{
> -	struct mtk_iommu_data *data = cookie;
> -	struct iommu_domain *domain = &data->m4u_dom->domain;
> -
> -	iommu_iotlb_gather_add_page(domain, gather, iova, granule);
> -}
> -
> -static const struct iommu_flush_ops mtk_iommu_flush_ops = {
> -	.tlb_flush_all = mtk_iommu_tlb_flush_all,
> -	.tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
> -	.tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
> -};
> -
>   static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>   {
>   	struct mtk_iommu_data *data = dev_id;
> @@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
>   		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
>   		.ias = 32,
>   		.oas = 34,
> -		.tlb = &mtk_iommu_flush_ops,
>   		.iommu_dev = data->dev,
>   	};
>   
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (6 preceding siblings ...)
  2021-01-07 12:29 ` [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s Yong Wu
@ 2021-01-22 19:32 ` Will Deacon
  2021-01-27 13:19 ` Will Deacon
  8 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2021-01-22 19:32 UTC (permalink / raw)
  To: Yong Wu
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	kernel-team, Joerg Roedel, linux-kernel, Krzysztof Kozlowski,
	Christoph Hellwig, Tomasz Figa, iommu, David Laight,
	linux-mediatek, Matthias Brugger, Greg Kroah-Hartman,
	Robin Murphy, linux-arm-kernel

On Thu, Jan 07, 2021 at 08:29:02PM +0800, Yong Wu wrote:
> This patchset is to improve tlb flushing performance in iommu_map/unmap
> for MediaTek IOMMU.
> 
> For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP
> to do tlb_flush for each a memory chunk. this is so unnecessary. we could
> improve it by tlb flushing one time at the end of iommu_map.
> 
> For iommu_unmap, currently we have already improve this performance by
> gather. But the current gather should take care its granule size. if the
> granule size is different, it will do tlb flush and gather again. Our HW
> don't care about granule size. thus I gather the range in our file.
> 
> After this patchset, we could achieve only tlb flushing once in iommu_map
> and iommu_unmap.
> 
> Regardless of sg, for each a segment, I did a simple test:
>   
>   size = 20 * SZ_1M;
>   /* the worst case, all are 4k mapping. */
>   ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
>   iommu_unmap(domain, 0x5bb02000, size);
> 
> This is the comparing time(unit is us):
>               original-time  after-improve
>    map-20M    59943           2347
>    unmap-20M  264             36
> 
> This patchset also flush tlb once in the iommu_map_sg case.
> 
> patch [1/7][2/7][3/7] are for map while the others are for unmap.
> 
> change note:
> v4: a. base on v5.11-rc1.
>     b. Add a little helper _iommu_map.
>     c. Fix a build fail for tegra-gart.c. I didn't notice there is another place
>     call gart_iommu_sync_map.
>     d. Switch gather->end to the read end address("start + end - 1").
>     
> v3: https://lore.kernel.org/linux-iommu/20201216103607.23050-1-yong.wu@mediatek.com/#r
>     Refactor the unmap flow suggested by Robin.
>      
> v2: https://lore.kernel.org/linux-iommu/20201119061836.15238-1-yong.wu@mediatek.com/
>     Refactor all the code.
>     base on v5.10-rc1.
> 
> Yong Wu (7):
>   iommu: Move iotlb_sync_map out from __iommu_map
>   iommu: Add iova and size as parameters in iotlb_sync_map
>   iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
>   iommu: Switch gather->end to the inclusive end
>   iommu/io-pgtable: Allow io_pgtable_tlb ops optional
>   iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
>   iommu/mediatek: Remove the tlb-ops for v7s

For the series:

Acked-by: Will Deacon <will@kernel.org>

Joerg -- how would you like to handle merging this? I suppose either you
could host a separate branch that I could merge if needed, or I could
include this in my pull to you, or something else.

Please let me know what you prefer,

Cheers,

Will

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap
  2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (7 preceding siblings ...)
  2021-01-22 19:32 ` [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon
@ 2021-01-27 13:19 ` Will Deacon
  8 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2021-01-27 13:19 UTC (permalink / raw)
  To: Joerg Roedel, Yong Wu, Robin Murphy
  Cc: anan.sun, youlin.pei, Nicolas Boichat, srv_heupstream,
	Will Deacon, Tomasz Figa, catalin.marinas, linux-kernel,
	Krzysztof Kozlowski, Christoph Hellwig, chao.hao, iommu,
	David Laight, linux-mediatek, Matthias Brugger,
	Greg Kroah-Hartman, kernel-team, linux-arm-kernel

On Thu, 7 Jan 2021 20:29:02 +0800, Yong Wu wrote:
> This patchset is to improve tlb flushing performance in iommu_map/unmap
> for MediaTek IOMMU.
> 
> For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP
> to do tlb_flush for each a memory chunk. this is so unnecessary. we could
> improve it by tlb flushing one time at the end of iommu_map.
> 
> [...]

After discussion with Joerg, I'll queue this (and hopefully the next posting
of your IOMMU driver) along with the Arm SMMU patches, and then send that
all together.

Applied to will (for-joerg/mtk), thanks!

[1/7] iommu: Move iotlb_sync_map out from __iommu_map
      https://git.kernel.org/arm64/c/d8c1df02ac7f
[2/7] iommu: Add iova and size as parameters in iotlb_sync_map
      https://git.kernel.org/arm64/c/2ebbd25873ce
[3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
      https://git.kernel.org/arm64/c/20143451eff0
[4/7] iommu: Switch gather->end to the inclusive end
      https://git.kernel.org/arm64/c/862c3715de8f
[5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional
      https://git.kernel.org/arm64/c/77e0992aee4e
[6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
      https://git.kernel.org/arm64/c/f21ae3b10084
[7/7] iommu/mediatek: Remove the tlb-ops for v7s
      https://git.kernel.org/arm64/c/0954d61a59e3

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map
  2021-01-07 12:29 ` [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
@ 2021-02-02  1:07   ` Doug Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Anderson @ 2021-02-02  1:07 UTC (permalink / raw)
  To: Yong Wu
  Cc: youlin.pei, anan.sun, Nicolas Boichat, srv_heupstream, chao.hao,
	Will Deacon, Joerg Roedel, LKML, Krzysztof Kozlowski,
	Christoph Hellwig, Tomasz Figa,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	David Laight, moderated list:ARM/Mediatek SoC support,
	Matthias Brugger, kernel-team, Greg Kroah-Hartman, Robin Murphy,
	Linux ARM

Hi,

On Thu, Jan 7, 2021 at 4:31 AM Yong Wu <yong.wu@mediatek.com> wrote:
>
> @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
>         return ret;
>  }
>
> +static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
> +                     phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> +       const struct iommu_ops *ops = domain->ops;
> +       int ret;
> +
> +       ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);

The above is broken.  Instead of GFP_KERNEL it should be passing "gfp".


> +       if (ret == 0 && ops->iotlb_sync_map)
> +               ops->iotlb_sync_map(domain);
> +
> +       return ret;
> +}
> +
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>               phys_addr_t paddr, size_t size, int prot)
>  {
>         might_sleep();
> -       return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +       return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>
>  int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
>               phys_addr_t paddr, size_t size, int prot)
>  {
> -       return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> +       return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);

Specifically the above bug means we drop the "GFP_ATOMIC" here.

It means we trigger a warning, like this (on a downstream kernel with
the patch backported):

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
 Call trace:
  dump_backtrace+0x0/0x154
  show_stack+0x20/0x2c
  dump_stack+0xa0/0xfc
  ___might_sleep+0x11c/0x12c
  __might_sleep+0x50/0x84
  __alloc_pages_nodemask+0xf8/0x2bc
  __arm_lpae_alloc_pages+0x48/0x1b4
  __arm_lpae_map+0x124/0x274
  __arm_lpae_map+0x1cc/0x274
  arm_lpae_map+0x140/0x170
  arm_smmu_map+0x78/0xbc
  __iommu_map+0xd4/0x210
  _iommu_map+0x4c/0x84
  iommu_map_atomic+0x44/0x58
  __iommu_dma_map+0x8c/0xc4
  iommu_dma_map_page+0xac/0xf0

---

A quick (but not very tested) fix at:

https://lore.kernel.org/r/20210201170611.1.I64a7b62579287d668d7c89e105dcedf45d641063@changeid/


-Doug

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-02-02  1:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07 12:29 [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
2021-01-07 12:29 ` [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
2021-02-02  1:07   ` Doug Anderson
2021-01-07 12:29 ` [PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map Yong Wu
2021-01-07 12:29 ` [PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
2021-01-07 12:29 ` [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end Yong Wu
2021-01-18 18:41   ` Robin Murphy
2021-01-07 12:29 ` [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional Yong Wu
2021-01-18 18:42   ` Robin Murphy
2021-01-07 12:29 ` [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once Yong Wu
2021-01-18 18:44   ` Robin Murphy
2021-01-07 12:29 ` [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s Yong Wu
2021-01-18 18:46   ` Robin Murphy
2021-01-22 19:32 ` [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon
2021-01-27 13:19 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).