linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap
@ 2020-11-19  6:18 Yong Wu
  2020-11-19  6:18 ` [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

This patchset is to improve tlb flushing performance in iommu_map/unmap
for MediaTek IOMMU.

For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP
to do tlb_flush for each a memory chunk. this is so unnecessary. we could
improve it by tlb flushing one time at the end of iommu_map.

For iommu_unmap, currently we have already improve this performance by
gather. But the current gather should take care its granule size. if the
granule size is different, it will do tlb flush and gather again. Our HW
don't care about granule size. thus I add a flag(granule_ignore) for this
case.

After this patchset, we could achieve only tlb flushing once in iommu_map
and iommu_unmap.

Regardless of sg, for each a segment, I did a simple test:
  
  size = 20 * SZ_1M;
  /* the worst case, all are 4k mapping. */
  ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
  iommu_unmap(domain, 0x5bb02000, size);

This is the comparing time(unit is us):
              original-time  after-improve
   map-20M    59943           2347
   unmap-20M  264             36

This patchset also flush tlb once in the iommu_map_sg case.

patch [1/6][2/6][3/6] are for map while the others are for unmap.

change note:
v2: Refactor all the code.
    base on v5.10-rc1.

v1: https://lore.kernel.org/linux-iommu/20201019113100.23661-1-chao.hao@mediatek.com/

Yong Wu (6):
  iommu: Move iotlb_sync_map out from __iommu_map
  iommu: Add iova and size as parameters in iommu_iotlb_map
  iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  iommu: Add granule_ignore when tlb gather
  iommu/mediatek: Enable granule_ignore for unmap
  iommu/mediatek: Convert tlb_flush_walk to gather_add_page

 drivers/iommu/iommu.c      | 24 +++++++++++++++++++-----
 drivers/iommu/mtk_iommu.c  | 32 ++++++++++++++++++++++++++------
 drivers/iommu/tegra-gart.c |  3 ++-
 include/linux/iommu.h      |  7 +++++--
 4 files changed, 52 insertions(+), 14 deletions(-)

-- 
2.18.0



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-25 16:37   ` Robin Murphy
  2020-11-19  6:18 ` [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map Yong Wu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

In the end of __iommu_map, It alway call iotlb_sync_map.
This patch moves iotlb_sync_map out from __iommu_map since it is
unnecessary to call this for each sg segment especially iotlb_sync_map
is flush tlb all currently.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/iommu.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c470f451a32..decef851fa3a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2407,9 +2407,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 		size -= pgsize;
 	}
 
-	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
-
 	/* unroll mapping in case something went wrong */
 	if (ret)
 		iommu_unmap(domain, orig_iova, orig_size - size);
@@ -2422,15 +2419,29 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
+	const struct iommu_ops *ops = domain->ops;
+	int ret;
+
 	might_sleep();
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	if (ret == 0 && ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+	const struct iommu_ops *ops = domain->ops;
+	int ret;
+
+	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+	if (ret == 0 && ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
@@ -2514,6 +2525,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			     struct scatterlist *sg, unsigned int nents, int prot,
 			     gfp_t gfp)
 {
+	const struct iommu_ops *ops = domain->ops;
 	size_t len = 0, mapped = 0;
 	phys_addr_t start;
 	unsigned int i = 0;
@@ -2544,6 +2556,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			sg = sg_next(sg);
 	}
 
+	if (ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
 	return mapped;
 
 out_err:
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
  2020-11-19  6:18 ` [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-25 16:38   ` Robin Murphy
  2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
mapping. This patch adds iova and size as the parameters in it. then the
IOMMU driver could flush tlb with the whole range once after iova mapping
to improve performance.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/iommu.c      | 6 +++---
 drivers/iommu/tegra-gart.c | 3 ++-
 include/linux/iommu.h      | 3 ++-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index decef851fa3a..df87c8e825f7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2425,7 +2425,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	might_sleep();
 	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
 	if (ret == 0 && ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, size);
 
 	return ret;
 }
@@ -2439,7 +2439,7 @@ int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 
 	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
 	if (ret == 0 && ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, size);
 
 	return ret;
 }
@@ -2557,7 +2557,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	}
 
 	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, mapped);
 	return mapped;
 
 out_err:
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..d15d13a98ed1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
 	return 0;
 }
 
-static void gart_iommu_sync_map(struct iommu_domain *domain)
+static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+				size_t size)
 {
 	FLUSH_GART_REGS(gart_handle);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..794d4085edd3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -244,7 +244,8 @@ struct iommu_ops {
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
-	void (*iotlb_sync_map)(struct iommu_domain *domain);
+	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
  2020-11-19  6:18 ` [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
  2020-11-19  6:18 ` [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-19 15:33   ` kernel test robot
                     ` (2 more replies)
  2020-11-19  6:18 ` [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather Yong Wu
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

Remove IO_PGTABLE_QUIRK_TLBI_ON_MAP to avoid tlb sync for each a small
chunk memory, Use the new iotlb_sync_map to tlb_sync once for whole the
iova range of iommu_map.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
After reading msm_iommu.c, It looks IO_PGTABLE_QUIRK_TLBI_ON_MAP can be
removed.
---
 drivers/iommu/mtk_iommu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index c072cee532c2..8c2d4a225666 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -323,7 +323,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
 	dom->cfg = (struct io_pgtable_cfg) {
 		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
 			IO_PGTABLE_QUIRK_NO_PERMS |
-			IO_PGTABLE_QUIRK_TLBI_ON_MAP |
 			IO_PGTABLE_QUIRK_ARM_MTK_EXT,
 		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
 		.ias = 32,
@@ -454,6 +453,14 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 				       data);
 }
 
+static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+			       size_t size)
+{
+	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+
+	mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
+}
+
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
 					  dma_addr_t iova)
 {
@@ -540,6 +547,7 @@ static const struct iommu_ops mtk_iommu_ops = {
 	.unmap		= mtk_iommu_unmap,
 	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
 	.iotlb_sync	= mtk_iommu_iotlb_sync,
+	.iotlb_sync_map	= mtk_iommu_sync_map,
 	.iova_to_phys	= mtk_iommu_iova_to_phys,
 	.probe_device	= mtk_iommu_probe_device,
 	.release_device	= mtk_iommu_release_device,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (2 preceding siblings ...)
  2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-25 16:38   ` Robin Murphy
  2020-11-19  6:18 ` [PATCH v2 5/6] iommu/mediatek: Enable granule_ignore for unmap Yong Wu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

Add a granule_ignore option when tlb gather for some HW which don't care
about granule when it flush tlb.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 include/linux/iommu.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 794d4085edd3..1aad32238510 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -171,6 +171,7 @@ enum iommu_dev_features {
  * @start: IOVA representing the start of the range to be flushed
  * @end: IOVA representing the end of the range to be flushed (exclusive)
  * @pgsize: The interval at which to perform the flush
+ * @granule_ignore: For tlb flushing that could be regardless of granule.
  *
  * This structure is intended to be updated by multiple calls to the
  * ->unmap() function in struct iommu_ops before eventually being passed
@@ -180,6 +181,7 @@ struct iommu_iotlb_gather {
 	unsigned long		start;
 	unsigned long		end;
 	size_t			pgsize;
+	bool			granule_ignore;
 };
 
 /**
@@ -544,7 +546,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 	 * a different granularity, then sync the TLB so that the gather
 	 * structure can be rewritten.
 	 */
-	if (gather->pgsize != size ||
+	if ((!gather->granule_ignore && gather->pgsize != size) ||
 	    end < gather->start || start > gather->end) {
 		if (gather->pgsize)
 			iommu_iotlb_sync(domain, gather);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/6] iommu/mediatek: Enable granule_ignore for unmap
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (3 preceding siblings ...)
  2020-11-19  6:18 ` [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-19  6:18 ` [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page Yong Wu
  2020-11-25 12:27 ` [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon
  6 siblings, 0 replies; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

MediaTek IOMMU HW don't care about granule when it flush tlb.
In order to flush tlb once when unmap, Enable this flag to gather all
the iova chunk of unmap.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/mtk_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 8c2d4a225666..94786860bd84 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -432,6 +432,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
+	gather->granule_ignore = true;
 	return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (4 preceding siblings ...)
  2020-11-19  6:18 ` [PATCH v2 5/6] iommu/mediatek: Enable granule_ignore for unmap Yong Wu
@ 2020-11-19  6:18 ` Yong Wu
  2020-11-25 16:38   ` Robin Murphy
  2020-11-25 12:27 ` [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon
  6 siblings, 1 reply; 15+ messages in thread
From: Yong Wu @ 2020-11-19  6:18 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, yong.wu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao,
	jun.wen

MediaTek TLB flush don't care about granule. when unmap, it could gather
whole the iova range then do tlb flush once.

In current v7s, If unmap the lvl2 pagetable, the steps are:
step1: set this current pdg to 0.
step2: tlb flush for this lvl2 block iova(1M).
step3: free the lvl2 pagetable.

This patch means we delay the step2 after unmap whole the iova.
the iommu consumer HW should have stopped before it call dma_free_xx,
thus, this delay looks ok.

Since tlb_flush_walk doesn't have the "gather" parameter, so we have to
add this "gather" in ourself private data.

Meanswhile, After this patch, the gather_add_pages will always be called,
then "gather->start == ULONG_MAX" is impossible. remove this checking.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
tlb_flush_walk is designed for tlb flush range, I'm not sure whether it's ok
if adding "gather" as a parameter in tlb_flush_walk. in this version, I put
it into our private data.
---
 drivers/iommu/mtk_iommu.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 94786860bd84..4c8200f4403a 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -128,6 +128,8 @@ struct mtk_iommu_domain {
 	struct io_pgtable_ops		*iop;
 
 	struct iommu_domain		domain;
+
+	struct iommu_iotlb_gather	*gather;
 };
 
 static const struct iommu_ops mtk_iommu_ops;
@@ -227,6 +229,17 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 	}
 }
 
+static void mtk_iommu_tlb_flush_walk(unsigned long iova, size_t size,
+				     size_t granule, void *cookie)
+{
+	struct mtk_iommu_data *data = cookie;
+	struct mtk_iommu_domain *m4u_dom = data->m4u_dom;
+	struct iommu_domain *domain = &m4u_dom->domain;
+
+	/* Gather all the iova and tlb flush once after unmap. */
+	iommu_iotlb_gather_add_page(domain, m4u_dom->gather, iova, size);
+}
+
 static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
 					    unsigned long iova, size_t granule,
 					    void *cookie)
@@ -239,8 +252,8 @@ static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
 
 static const struct iommu_flush_ops mtk_iommu_flush_ops = {
 	.tlb_flush_all = mtk_iommu_tlb_flush_all,
-	.tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-	.tlb_flush_leaf = mtk_iommu_tlb_flush_range_sync,
+	.tlb_flush_walk = mtk_iommu_tlb_flush_walk,
+	.tlb_flush_leaf = mtk_iommu_tlb_flush_walk,
 	.tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
 };
 
@@ -432,6 +445,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
+	dom->gather = gather;
 	gather->granule_ignore = true;
 	return dom->iop->unmap(dom->iop, iova, size, gather);
 }
@@ -447,9 +461,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
 	size_t length = gather->end - gather->start;
 
-	if (gather->start == ULONG_MAX)
-		return;
-
 	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
 				       data);
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
@ 2020-11-19 15:33   ` kernel test robot
  2020-11-19 16:11   ` kernel test robot
  2020-11-25 16:38   ` Robin Murphy
  2 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2020-11-19 15:33 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon, Robin Murphy
  Cc: kbuild-all, Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 3276 bytes --]

Hi Yong,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on iommu/next]
[also build test ERROR on tegra/for-next arm-perf/for-next/perf v5.10-rc4 next-20201119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Yong-Wu/MediaTek-IOMMU-improve-tlb-flush-performance-in-map-unmap/20201119-142620
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: m68k-randconfig-r001-20201119 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/7b8f6281690f7b2c2dbdfdabb6196bc29690c9ed
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yong-Wu/MediaTek-IOMMU-improve-tlb-flush-performance-in-map-unmap/20201119-142620
        git checkout 7b8f6281690f7b2c2dbdfdabb6196bc29690c9ed
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/m68k/include/asm/bug.h:32,
                    from include/linux/bug.h:5,
                    from drivers/iommu/mtk_iommu.c:6:
   include/linux/scatterlist.h: In function 'sg_set_buf':
   arch/m68k/include/asm/page_mm.h:169:49: warning: ordered comparison of pointer with null pointer [-Wextra]
     169 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
         |                                                 ^~
   include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
      78 | # define unlikely(x) __builtin_expect(!!(x), 0)
         |                                          ^
   include/linux/scatterlist.h:143:2: note: in expansion of macro 'BUG_ON'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |  ^~~~~~
   include/linux/scatterlist.h:143:10: note: in expansion of macro 'virt_addr_valid'
     143 |  BUG_ON(!virt_addr_valid(buf));
         |          ^~~~~~~~~~~~~~~
   drivers/iommu/mtk_iommu.c: In function 'mtk_iommu_sync_map':
>> drivers/iommu/mtk_iommu.c:461:54: error: 'struct mtk_iommu_domain' has no member named 'data'
     461 |  mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
         |                                                      ^~

vim +461 drivers/iommu/mtk_iommu.c

   455	
   456	static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
   457				       size_t size)
   458	{
   459		struct mtk_iommu_domain *dom = to_mtk_domain(domain);
   460	
 > 461		mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
   462	}
   463	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23302 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
  2020-11-19 15:33   ` kernel test robot
@ 2020-11-19 16:11   ` kernel test robot
  2020-11-25 16:38   ` Robin Murphy
  2 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2020-11-19 16:11 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon, Robin Murphy
  Cc: kbuild-all, clang-built-linux, Matthias Brugger,
	Krzysztof Kozlowski, Tomasz Figa, linux-mediatek, srv_heupstream,
	linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 8474 bytes --]

Hi Yong,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on iommu/next]
[also build test ERROR on tegra/for-next arm-perf/for-next/perf v5.10-rc4 next-20201119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Yong-Wu/MediaTek-IOMMU-improve-tlb-flush-performance-in-map-unmap/20201119-142620
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: powerpc64-randconfig-r033-20201119 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project b2613fb2f0f53691dd0211895afbb9413457fca7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7b8f6281690f7b2c2dbdfdabb6196bc29690c9ed
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yong-Wu/MediaTek-IOMMU-improve-tlb-flush-performance-in-map-unmap/20201119-142620
        git checkout 7b8f6281690f7b2c2dbdfdabb6196bc29690c9ed
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   __do_insb
   ^
   arch/powerpc/include/asm/io.h:541:56: note: expanded from macro '__do_insb'
   #define __do_insb(p, b, n)      readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/iommu/mtk_iommu.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:142:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:542:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)      readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/iommu/mtk_iommu.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:144:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:543:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)      readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/iommu/mtk_iommu.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:146:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:544:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n)     writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/iommu/mtk_iommu.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:148:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:545:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n)     writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/iommu/mtk_iommu.c:12:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:150:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:546:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n)     writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
>> drivers/iommu/mtk_iommu.c:461:56: error: no member named 'data' in 'struct mtk_iommu_domain'
           mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
                                                            ~~~  ^
   12 warnings and 1 error generated.

vim +461 drivers/iommu/mtk_iommu.c

   455	
   456	static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
   457				       size_t size)
   458	{
   459		struct mtk_iommu_domain *dom = to_mtk_domain(domain);
   460	
 > 461		mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);
   462	}
   463	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37991 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap
  2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
                   ` (5 preceding siblings ...)
  2020-11-19  6:18 ` [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page Yong Wu
@ 2020-11-25 12:27 ` Will Deacon
  6 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2020-11-25 12:27 UTC (permalink / raw)
  To: Yong Wu, robin.murphy
  Cc: Joerg Roedel, Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On Thu, Nov 19, 2020 at 02:18:30PM +0800, Yong Wu wrote:
> This patchset is to improve tlb flushing performance in iommu_map/unmap
> for MediaTek IOMMU.
> 
> For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP
> to do tlb_flush for each a memory chunk. this is so unnecessary. we could
> improve it by tlb flushing one time at the end of iommu_map.
> 
> For iommu_unmap, currently we have already improve this performance by
> gather. But the current gather should take care its granule size. if the
> granule size is different, it will do tlb flush and gather again. Our HW
> don't care about granule size. thus I add a flag(granule_ignore) for this
> case.
> 
> After this patchset, we could achieve only tlb flushing once in iommu_map
> and iommu_unmap.
> 
> Regardless of sg, for each a segment, I did a simple test:
>   
>   size = 20 * SZ_1M;
>   /* the worst case, all are 4k mapping. */
>   ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
>   iommu_unmap(domain, 0x5bb02000, size);
> 
> This is the comparing time(unit is us):
>               original-time  after-improve
>    map-20M    59943           2347
>    unmap-20M  264             36
> 
> This patchset also flush tlb once in the iommu_map_sg case.
> 
> patch [1/6][2/6][3/6] are for map while the others are for unmap.
> 
> change note:
> v2: Refactor all the code.
>     base on v5.10-rc1.

Robin -- please can you take a look at this series? I'm not sure I'll
have time to get to it at the moment and you understand how this stuff
is supposed to work.

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map
  2020-11-19  6:18 ` [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
@ 2020-11-25 16:37   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2020-11-25 16:37 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On 2020-11-19 06:18, Yong Wu wrote:
> In the end of __iommu_map, It alway call iotlb_sync_map.
> This patch moves iotlb_sync_map out from __iommu_map since it is
> unnecessary to call this for each sg segment especially iotlb_sync_map
> is flush tlb all currently.

I don't see a way to avoid the boilerplate that wouldn't end up making 
things even more ugly and complicated, so:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   drivers/iommu/iommu.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8c470f451a32..decef851fa3a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2407,9 +2407,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
>   		size -= pgsize;
>   	}
>   
> -	if (ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> -
>   	/* unroll mapping in case something went wrong */
>   	if (ret)
>   		iommu_unmap(domain, orig_iova, orig_size - size);
> @@ -2422,15 +2419,29 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
>   int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   	      phys_addr_t paddr, size_t size, int prot)
>   {
> +	const struct iommu_ops *ops = domain->ops;
> +	int ret;
> +
>   	might_sleep();
> -	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +	if (ret == 0 && ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);
> +
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(iommu_map);
>   
>   int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
>   	      phys_addr_t paddr, size_t size, int prot)
>   {
> -	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> +	const struct iommu_ops *ops = domain->ops;
> +	int ret;
> +
> +	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> +	if (ret == 0 && ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);
> +
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(iommu_map_atomic);
>   
> @@ -2514,6 +2525,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   			     struct scatterlist *sg, unsigned int nents, int prot,
>   			     gfp_t gfp)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	size_t len = 0, mapped = 0;
>   	phys_addr_t start;
>   	unsigned int i = 0;
> @@ -2544,6 +2556,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   			sg = sg_next(sg);
>   	}
>   
> +	if (ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);
>   	return mapped;
>   
>   out_err:
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map
  2020-11-19  6:18 ` [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map Yong Wu
@ 2020-11-25 16:38   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2020-11-25 16:38 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On 2020-11-19 06:18, Yong Wu wrote:
> iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
> mapping. This patch adds iova and size as the parameters in it. then the
> IOMMU driver could flush tlb with the whole range once after iova mapping
> to improve performance.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   drivers/iommu/iommu.c      | 6 +++---
>   drivers/iommu/tegra-gart.c | 3 ++-
>   include/linux/iommu.h      | 3 ++-
>   3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index decef851fa3a..df87c8e825f7 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2425,7 +2425,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   	might_sleep();
>   	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>   	if (ret == 0 && ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, size);
>   
>   	return ret;
>   }
> @@ -2439,7 +2439,7 @@ int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
>   
>   	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
>   	if (ret == 0 && ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, size);
>   
>   	return ret;
>   }
> @@ -2557,7 +2557,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   	}
>   
>   	if (ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, mapped);
>   	return mapped;
>   
>   out_err:
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index fac720273889..d15d13a98ed1 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
>   	return 0;
>   }
>   
> -static void gart_iommu_sync_map(struct iommu_domain *domain)
> +static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
> +				size_t size)
>   {
>   	FLUSH_GART_REGS(gart_handle);
>   }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..794d4085edd3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -244,7 +244,8 @@ struct iommu_ops {
>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
>   		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
>   	void (*flush_iotlb_all)(struct iommu_domain *domain);
> -	void (*iotlb_sync_map)(struct iommu_domain *domain);
> +	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> +			       size_t size);
>   	void (*iotlb_sync)(struct iommu_domain *domain,
>   			   struct iommu_iotlb_gather *iotlb_gather);
>   	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
  2020-11-19 15:33   ` kernel test robot
  2020-11-19 16:11   ` kernel test robot
@ 2020-11-25 16:38   ` Robin Murphy
  2 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2020-11-25 16:38 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On 2020-11-19 06:18, Yong Wu wrote:
> Remove IO_PGTABLE_QUIRK_TLBI_ON_MAP to avoid tlb sync for each a small
> chunk memory, Use the new iotlb_sync_map to tlb_sync once for whole the
> iova range of iommu_map.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
> After reading msm_iommu.c, It looks IO_PGTABLE_QUIRK_TLBI_ON_MAP can be
> removed.
> ---
>   drivers/iommu/mtk_iommu.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index c072cee532c2..8c2d4a225666 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -323,7 +323,6 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
>   	dom->cfg = (struct io_pgtable_cfg) {
>   		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
>   			IO_PGTABLE_QUIRK_NO_PERMS |
> -			IO_PGTABLE_QUIRK_TLBI_ON_MAP |
>   			IO_PGTABLE_QUIRK_ARM_MTK_EXT,
>   		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
>   		.ias = 32,
> @@ -454,6 +453,14 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>   				       data);
>   }
>   
> +static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
> +			       size_t size)
> +{
> +	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> +
> +	mtk_iommu_tlb_flush_range_sync(iova, size, size, dom->data);

So we have a conflict/dependency against the MT8192 series here - I 
guess we need to make a decision up-front about which series should go 
first. Other than that, though:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> +}
> +
>   static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
>   					  dma_addr_t iova)
>   {
> @@ -540,6 +547,7 @@ static const struct iommu_ops mtk_iommu_ops = {
>   	.unmap		= mtk_iommu_unmap,
>   	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
>   	.iotlb_sync	= mtk_iommu_iotlb_sync,
> +	.iotlb_sync_map	= mtk_iommu_sync_map,
>   	.iova_to_phys	= mtk_iommu_iova_to_phys,
>   	.probe_device	= mtk_iommu_probe_device,
>   	.release_device	= mtk_iommu_release_device,
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather
  2020-11-19  6:18 ` [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather Yong Wu
@ 2020-11-25 16:38   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2020-11-25 16:38 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On 2020-11-19 06:18, Yong Wu wrote:
> Add a granule_ignore option when tlb gather for some HW which don't care
> about granule when it flush tlb.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>   include/linux/iommu.h | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 794d4085edd3..1aad32238510 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -171,6 +171,7 @@ enum iommu_dev_features {
>    * @start: IOVA representing the start of the range to be flushed
>    * @end: IOVA representing the end of the range to be flushed (exclusive)
>    * @pgsize: The interval at which to perform the flush
> + * @granule_ignore: For tlb flushing that could be regardless of granule.
>    *
>    * This structure is intended to be updated by multiple calls to the
>    * ->unmap() function in struct iommu_ops before eventually being passed
> @@ -180,6 +181,7 @@ struct iommu_iotlb_gather {
>   	unsigned long		start;
>   	unsigned long		end;
>   	size_t			pgsize;
> +	bool			granule_ignore;

I can't see that this would ever need to vary on a per-unmap-operation 
basis, so this doesn't seem like the right level of abstraction. AFAICS 
it should simply be hard-coded in the driver logic.

>   };
>   
>   /**
> @@ -544,7 +546,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>   	 * a different granularity, then sync the TLB so that the gather
>   	 * structure can be rewritten.
>   	 */
> -	if (gather->pgsize != size ||
> +	if ((!gather->granule_ignore && gather->pgsize != size) ||

I also think this is a slippery slope in the wrong direction anyway - 
there is likely to be a fair bit of hardware-dependent variation around 
how low-level TLB maintenance works (also consider drivers that may want 
to convert a sufficiently large range to an "invalidate all" operation), 
so if a generic helper function doesn't do the right thing for a given 
driver, that driver should simply not use the helper, and directly 
implement the logic it does need.

Robin.

>   	    end < gather->start || start > gather->end) {
>   		if (gather->pgsize)
>   			iommu_iotlb_sync(domain, gather);
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page
  2020-11-19  6:18 ` [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page Yong Wu
@ 2020-11-25 16:38   ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2020-11-25 16:38 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Will Deacon
  Cc: Matthias Brugger, Krzysztof Kozlowski, Tomasz Figa,
	linux-mediatek, srv_heupstream, linux-kernel, linux-arm-kernel,
	iommu, youlin.pei, Nicolas Boichat, anan.sun, chao.hao, jun.wen

On 2020-11-19 06:18, Yong Wu wrote:
> MediaTek TLB flush don't care about granule. when unmap, it could gather
> whole the iova range then do tlb flush once.
> 
> In current v7s, If unmap the lvl2 pagetable, the steps are:
> step1: set this current pdg to 0.
> step2: tlb flush for this lvl2 block iova(1M).
> step3: free the lvl2 pagetable.
> 
> This patch means we delay the step2 after unmap whole the iova.
> the iommu consumer HW should have stopped before it call dma_free_xx,
> thus, this delay looks ok.

If you can guarantee that no kind of speculative table walks can happen 
(i.e. the hardware has no kind of translation prefetching that could be 
triggered by, say, an unrelated device accessing an adjacent page) then 
I guess it's probably OK to reason that this can be safe.

However at that point I wonder whether you really need to do anything 
for flush_walk at all, if you don't need to differentiate between leaf 
and non-leaf invalidations either. In DMA API usage you should never be 
actually unmapping a table at block granularity, since the matching map 
request would have used a block mapping in the first place, so I guess 
you're hitting this in the case of *mapping* a block over an empty 
table. For that, the subsequent iotlb_sync_map() will overlap the whole 
region anyway. You'd still want to handle the general unmap case 
properly for the sake of correctness, but AFAICS you could just 
manipulate the gather data directly in mtk_iommu_unmap() far more easily 
than threading a special case all the way through io-pgtable.

> Since tlb_flush_walk doesn't have the "gather" parameter, so we have to
> add this "gather" in ourself private data.
> 
> Meanswhile, After this patch, the gather_add_pages will always be called,
> then "gather->start == ULONG_MAX" is impossible. remove this checking.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
> tlb_flush_walk is designed for tlb flush range, I'm not sure whether it's ok
> if adding "gather" as a parameter in tlb_flush_walk. in this version, I put
> it into our private data.
> ---
>   drivers/iommu/mtk_iommu.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 94786860bd84..4c8200f4403a 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -128,6 +128,8 @@ struct mtk_iommu_domain {
>   	struct io_pgtable_ops		*iop;
>   
>   	struct iommu_domain		domain;
> +
> +	struct iommu_iotlb_gather	*gather;
>   };
>   
>   static const struct iommu_ops mtk_iommu_ops;
> @@ -227,6 +229,17 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>   	}
>   }
>   
> +static void mtk_iommu_tlb_flush_walk(unsigned long iova, size_t size,
> +				     size_t granule, void *cookie)
> +{
> +	struct mtk_iommu_data *data = cookie;
> +	struct mtk_iommu_domain *m4u_dom = data->m4u_dom;
> +	struct iommu_domain *domain = &m4u_dom->domain;
> +
> +	/* Gather all the iova and tlb flush once after unmap. */
> +	iommu_iotlb_gather_add_page(domain, m4u_dom->gather, iova, size);
> +}
> +
>   static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
>   					    unsigned long iova, size_t granule,
>   					    void *cookie)
> @@ -239,8 +252,8 @@ static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
>   
>   static const struct iommu_flush_ops mtk_iommu_flush_ops = {
>   	.tlb_flush_all = mtk_iommu_tlb_flush_all,
> -	.tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
> -	.tlb_flush_leaf = mtk_iommu_tlb_flush_range_sync,
> +	.tlb_flush_walk = mtk_iommu_tlb_flush_walk,
> +	.tlb_flush_leaf = mtk_iommu_tlb_flush_walk,
>   	.tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
>   };
>   
> @@ -432,6 +445,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   {
>   	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>   
> +	dom->gather = gather;

Either way, this definitely doesn't work - multiple threads could be 
unmapping different regions in the same domain at the same time. That's 
why the gather structure has to be on each caller's stack in the first 
place.

Robin.

>   	gather->granule_ignore = true;
>   	return dom->iop->unmap(dom->iop, iova, size, gather);
>   }
> @@ -447,9 +461,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>   	struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
>   	size_t length = gather->end - gather->start;
>   
> -	if (gather->start == ULONG_MAX)
> -		return;
> -
>   	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>   				       data);
>   }
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-11-25 16:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19  6:18 [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Yong Wu
2020-11-19  6:18 ` [PATCH v2 1/6] iommu: Move iotlb_sync_map out from __iommu_map Yong Wu
2020-11-25 16:37   ` Robin Murphy
2020-11-19  6:18 ` [PATCH v2 2/6] iommu: Add iova and size as parameters in iommu_iotlb_map Yong Wu
2020-11-25 16:38   ` Robin Murphy
2020-11-19  6:18 ` [PATCH v2 3/6] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range Yong Wu
2020-11-19 15:33   ` kernel test robot
2020-11-19 16:11   ` kernel test robot
2020-11-25 16:38   ` Robin Murphy
2020-11-19  6:18 ` [PATCH v2 4/6] iommu: Add granule_ignore when tlb gather Yong Wu
2020-11-25 16:38   ` Robin Murphy
2020-11-19  6:18 ` [PATCH v2 5/6] iommu/mediatek: Enable granule_ignore for unmap Yong Wu
2020-11-19  6:18 ` [PATCH v2 6/6] iommu/mediatek: Convert tlb_flush_walk to gather_add_page Yong Wu
2020-11-25 16:38   ` Robin Murphy
2020-11-25 12:27 ` [PATCH v2 0/6] MediaTek IOMMU improve tlb flush performance in map/unmap Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).