iommu.lists.linux-foundation.org archive mirror
* [PATCH RFC 0/9] Possible set of VT-d optimizations
@ 2021-01-27 20:00 Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map Chuck Lever
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

Hi-

This collection of patches seems to get the best throughput results
so far. The NFS WRITE result is fully restored, and the NFS READ
result is very close to fully restored.

	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
	Min throughput per process                      = 416956.88 kB/sec 
	Max throughput per process                      = 417910.22 kB/sec
	Avg throughput per process                      = 417372.84 kB/sec
	Min xfer                                        = 1046272.00 kB
	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %


	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
	Min throughput per process                      = 417799.00 kB/sec 
	Max throughput per process                      = 419082.22 kB/sec
	Avg throughput per process                      = 418382.05 kB/sec
	Min xfer                                        = 1046528.00 kB
	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %


	Children see throughput for 12 readers          = 5805484.25 kB/sec
	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
	Min throughput per process                      = 482888.16 kB/sec 
	Max throughput per process                      = 484444.16 kB/sec
	Avg throughput per process                      = 483790.35 kB/sec
	Min xfer                                        = 1045760.00 kB
	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %


	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
	Min throughput per process                      = 483242.97 kB/sec 
	Max throughput per process                      = 485724.41 kB/sec
	Avg throughput per process                      = 484352.26 kB/sec
	Min xfer                                        = 1043456.00 kB
	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %

I've included a simple-minded implementation of a map_sg op for
the Intel IOMMU. This is nothing more than a copy of the loop in
__iommu_map_sg() with the call to __iommu_map() replaced with a
call to intel_iommu_map().
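
For context, the entry point these patches optimize is iommu_map_sg().
A minimal caller looks roughly like this (a sketch with illustrative
variable names; real consumers such as dma-iommu compute the IOVA and
the protection flags themselves):

	size_t mapped;

	mapped = iommu_map_sg(domain, iova, sg, nents,
			      IOMMU_READ | IOMMU_WRITE);
	if (mapped == 0)
		return -EINVAL;	/* iommu_map_sg() unwinds partial mappings */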

---

Chuck Lever (1):
      iommu/vt-d: Introduce map_sg() for Intel IOMMUs

Isaac J. Manjarres (5):
      iommu/io-pgtable: Introduce map_sg() as a page table op
      iommu/io-pgtable-arm: Hook up map_sg()
      iommu/io-pgtable-arm-v7s: Hook up map_sg()
      iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
      iommu/arm-smmu: Hook up map_sg()

Lu Baolu (1):
      iommu/vt-d: Add iotlb_sync_map callback

Yong Wu (2):
      iommu: Move iotlb_sync_map out from __iommu_map
      iommu: Add iova and size as parameters in iotlb_sync_map


 drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
 drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
 drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
 drivers/iommu/iommu.c                 |  47 +++++++--
 drivers/iommu/tegra-gart.c            |   7 +-
 include/linux/iommu.h                 |  16 +++-
 7 files changed, 353 insertions(+), 43 deletions(-)

--
Chuck Lever


* [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
@ 2021-01-27 20:00 ` Chuck Lever
  2021-01-28  2:40   ` Lu Baolu
  2021-01-27 20:00 ` [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map Chuck Lever
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Yong Wu <yong.wu@mediatek.com>

At the end of __iommu_map(), iotlb_sync_map is always called.

This patch moves iotlb_sync_map out of __iommu_map, since it is
unnecessary to call it for each sg segment, especially while
iotlb_sync_map currently flushes the whole TLB. Add a little helper,
_iommu_map, for this.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/iommu.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..c304a6a30d42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 		size -= pgsize;
 	}
 
-	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
-
 	/* unroll mapping in case something went wrong */
 	if (ret)
 		iommu_unmap(domain, orig_iova, orig_size - size);
@@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 	return ret;
 }
 
+static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
+		      phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+	const struct iommu_ops *ops = domain->ops;
+	int ret;
+
+	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
+	if (ret == 0 && ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
+
+	return ret;
+}
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
 	might_sleep();
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
-	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+	return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
@@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			     struct scatterlist *sg, unsigned int nents, int prot,
 			     gfp_t gfp)
 {
+	const struct iommu_ops *ops = domain->ops;
 	size_t len = 0, mapped = 0;
 	phys_addr_t start;
 	unsigned int i = 0;
@@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			sg = sg_next(sg);
 	}
 
+	if (ops->iotlb_sync_map)
+		ops->iotlb_sync_map(domain);
 	return mapped;
 
 out_err:



* [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map Chuck Lever
@ 2021-01-27 20:00 ` Chuck Lever
  2021-01-28  2:44   ` Lu Baolu
  2021-01-28 12:51   ` Joerg Roedel
  2021-01-27 20:00 ` [PATCH RFC 3/9] iommu/vt-d: Add iotlb_sync_map callback Chuck Lever
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Yong Wu <yong.wu@mediatek.com>

iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the
whole mapping. This patch adds iova and size as parameters to it. The
IOMMU driver can then flush the TLB once over the whole range after the
iova mapping completes, to improve performance.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/iommu.c      |    4 ++--
 drivers/iommu/tegra-gart.c |    7 +++++--
 include/linux/iommu.h      |    3 ++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c304a6a30d42..3d099a31ddca 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
 	if (ret == 0 && ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, size);
 
 	return ret;
 }
@@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	}
 
 	if (ops->iotlb_sync_map)
-		ops->iotlb_sync_map(domain);
+		ops->iotlb_sync_map(domain, iova, mapped);
 	return mapped;
 
 out_err:
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..05e8e19b8269 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
 	return 0;
 }
 
-static void gart_iommu_sync_map(struct iommu_domain *domain)
+static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+				size_t size)
 {
 	FLUSH_GART_REGS(gart_handle);
 }
@@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain)
 static void gart_iommu_sync(struct iommu_domain *domain,
 			    struct iommu_iotlb_gather *gather)
 {
-	gart_iommu_sync_map(domain);
+	size_t length = gather->end - gather->start;
+
+	gart_iommu_sync_map(domain, gather->start, length);
 }
 
 static const struct iommu_ops gart_iommu_ops = {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b3f0e2018c62..9ce0aa9e236b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -246,7 +246,8 @@ struct iommu_ops {
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);
-	void (*iotlb_sync_map)(struct iommu_domain *domain);
+	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);



* [PATCH RFC 3/9] iommu/vt-d: Add iotlb_sync_map callback
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map Chuck Lever
@ 2021-01-27 20:00 ` Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 4/9] iommu/io-pgtable: Introduce map_sg() as a page table op Chuck Lever
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Lu Baolu <baolu.lu@linux.intel.com>

Some Intel VT-d hardware implementations don't support memory coherency
for page table walks (as indicated by the Page-Walk-Coherency bit in the
ECAP register), so software must flush the corresponding CPU cache
lines explicitly after each page table entry update.

The iommu_map_sg() code iterates through the given scatter-gather list
and invokes iommu_map() for each element, which calls into the vendor
IOMMU driver through an iommu_ops callback. As a result, a single sg
mapping may lead to multiple cache line flushes, which degrades I/O
performance after commit c588072bba6b5 ("iommu/vt-d: Convert intel
iommu driver to the iommu ops").

Fix this by adding iotlb_sync_map callback and centralizing the clflush
operations after all sg mappings.
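
For reference, the flush itself is the driver's existing helper, which
is a no-op on coherent hardware (quoted from drivers/iommu/intel/iommu.c
for context):

	static void domain_flush_cache(struct dmar_domain *domain,
				       void *addr, int size)
	{
		if (!domain->iommu_coherency)
			clflush_cache_range(addr, size);
	}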

Fixes: c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/linux-iommu/D81314ED-5673-44A6-B597-090E3CB83EB0@oracle.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
[ cel: removed @first_pte, which is no longer used ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/intel/iommu.c |   90 +++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f665322a0991..013097b6d55f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2298,9 +2298,9 @@ static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		 unsigned long phys_pfn, unsigned long nr_pages, int prot)
 {
-	struct dma_pte *first_pte = NULL, *pte = NULL;
 	unsigned int largepage_lvl = 0;
 	unsigned long lvl_pages = 0;
+	struct dma_pte *pte = NULL;
 	phys_addr_t pteval;
 	u64 attr;
 
@@ -2322,7 +2322,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			largepage_lvl = hardware_largepage_caps(domain, iov_pfn,
 					phys_pfn, nr_pages);
 
-			first_pte = pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl);
+			pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl);
 			if (!pte)
 				return -ENOMEM;
 			/* It is large page*/
@@ -2383,34 +2383,14 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		 * recalculate 'pte' and switch back to smaller pages for the
 		 * end of the mapping, if the trailing size is not enough to
 		 * use another superpage (i.e. nr_pages < lvl_pages).
+		 *
+		 * We leave clflush for the leaf pte changes to iotlb_sync_map()
+		 * callback.
 		 */
 		pte++;
 		if (!nr_pages || first_pte_in_page(pte) ||
-		    (largepage_lvl > 1 && nr_pages < lvl_pages)) {
-			domain_flush_cache(domain, first_pte,
-					   (void *)pte - (void *)first_pte);
+		    (largepage_lvl > 1 && nr_pages < lvl_pages))
 			pte = NULL;
-		}
-	}
-
-	return 0;
-}
-
-static int
-domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
-	       unsigned long phys_pfn, unsigned long nr_pages, int prot)
-{
-	int iommu_id, ret;
-	struct intel_iommu *iommu;
-
-	/* Do the real mapping first */
-	ret = __domain_mapping(domain, iov_pfn, phys_pfn, nr_pages, prot);
-	if (ret)
-		return ret;
-
-	for_each_domain_iommu(iommu_id, domain) {
-		iommu = g_iommus[iommu_id];
-		__mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
 	}
 
 	return 0;
@@ -4943,7 +4923,6 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	u64 max_addr;
 	int prot = 0;
-	int ret;
 
 	if (iommu_prot & IOMMU_READ)
 		prot |= DMA_PTE_READ;
@@ -4969,9 +4948,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	/* Round up size to next multiple of PAGE_SIZE, if it and
 	   the low bits of hpa would take us onto the next page */
 	size = aligned_nrpages(hpa, size);
-	ret = domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT,
-			     hpa >> VTD_PAGE_SHIFT, size, prot);
-	return ret;
+	return __domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT,
+				hpa >> VTD_PAGE_SHIFT, size, prot);
 }
 
 static size_t intel_iommu_unmap(struct iommu_domain *domain,
@@ -5478,6 +5456,57 @@ static bool risky_device(struct pci_dev *pdev)
 	return false;
 }
 
+static void clflush_sync_map(struct dmar_domain *domain, unsigned long clf_pfn,
+			     unsigned long clf_pages)
+{
+	struct dma_pte *first_pte = NULL, *pte = NULL;
+	unsigned long lvl_pages = 0;
+	int level = 0;
+
+	while (clf_pages > 0) {
+		if (!pte) {
+			level = 0;
+			pte = pfn_to_dma_pte(domain, clf_pfn, &level);
+			if (WARN_ON(!pte))
+				return;
+			first_pte = pte;
+			lvl_pages = lvl_to_nr_pages(level);
+		}
+
+		if (WARN_ON(!lvl_pages || clf_pages < lvl_pages))
+			return;
+
+		clf_pages -= lvl_pages;
+		clf_pfn += lvl_pages;
+		pte++;
+
+		if (!clf_pages || first_pte_in_page(pte) ||
+		    (level > 1 && clf_pages < lvl_pages)) {
+			domain_flush_cache(domain, first_pte,
+					   (void *)pte - (void *)first_pte);
+			pte = NULL;
+		}
+	}
+}
+
+static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long pages = aligned_nrpages(iova, size);
+	unsigned long pfn = iova >> VTD_PAGE_SHIFT;
+	struct intel_iommu *iommu;
+	int iommu_id;
+
+	if (!dmar_domain->iommu_coherency)
+		clflush_sync_map(dmar_domain, pfn, pages);
+
+	for_each_domain_iommu(iommu_id, dmar_domain) {
+		iommu = g_iommus[iommu_id];
+		__mapping_notify_one(iommu, dmar_domain, pfn, pages);
+	}
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -5490,6 +5519,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.aux_detach_dev		= intel_iommu_aux_detach_device,
 	.aux_get_pasid		= intel_iommu_aux_get_pasid,
 	.map			= intel_iommu_map,
+	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
 	.unmap			= intel_iommu_unmap,
 	.flush_iotlb_all        = intel_flush_iotlb_all,
 	.iotlb_sync		= intel_iommu_tlb_sync,



* [PATCH RFC 4/9] iommu/io-pgtable: Introduce map_sg() as a page table op
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (2 preceding siblings ...)
  2021-01-27 20:00 ` [PATCH RFC 3/9] iommu/vt-d: Add iotlb_sync_map callback Chuck Lever
@ 2021-01-27 20:00 ` Chuck Lever
  2021-01-27 20:00 ` [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg() Chuck Lever
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Isaac J. Manjarres <isaacm@codeaurora.org>

While mapping a scatter-gather list, iommu_map_sg() calls
into the IOMMU driver through an indirect call, which can
call into the io-pgtable code through another indirect call.

This sequence of going through the IOMMU core code, the IOMMU
driver, and finally the io-pgtable code occurs for every
element in the scatter-gather list in the worst case, which
is not optimal.

Introduce a map_sg callback in the io-pgtable ops so that
IOMMU drivers can invoke it with the complete scatter-gather
list, allowing it to be processed entirely within the
io-pgtable code, reducing the number of indirect calls and
boosting overall iommu_map_sg() performance.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/io-pgtable.h |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..6d0e73172603 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -147,6 +147,9 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map:          Map a physically contiguous memory region.
+ * @map_sg:       Map a scatter-gather list of physically contiguous memory
+ *                chunks. The mapped pointer argument is used to store how
+ *                many bytes are mapped.
  * @unmap:        Unmap a physically contiguous memory region.
  * @iova_to_phys: Translate iova to physical address.
  *
@@ -156,6 +159,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
 	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+	int (*map_sg)(struct io_pgtable_ops *ops, unsigned long iova,
+		      struct scatterlist *sg, unsigned int nents, int prot,
+		      gfp_t gfp, size_t *mapped);
 	size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
 			size_t size, struct iommu_iotlb_gather *gather);
 	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,



* [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg()
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (3 preceding siblings ...)
  2021-01-27 20:00 ` [PATCH RFC 4/9] iommu/io-pgtable: Introduce map_sg() as a page table op Chuck Lever
@ 2021-01-27 20:00 ` Chuck Lever
  2021-01-28 13:53   ` Will Deacon
  2021-01-27 20:01 ` [PATCH RFC 6/9] iommu/io-pgtable-arm-v7s: " Chuck Lever
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:00 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Isaac J. Manjarres <isaacm@codeaurora.org>

Implement the map_sg io-pgtable op for the ARM LPAE io-pgtable
code, so that IOMMU drivers can call it when they need to map
a scatter-gather list.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/io-pgtable-arm.c |   86 ++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c          |   12 +++---
 include/linux/iommu.h          |    8 ++++
 3 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..0c11529442b8 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -473,6 +473,91 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
+static int arm_lpae_map_by_pgsize(struct io_pgtable_ops *ops,
+				  unsigned long iova, phys_addr_t paddr,
+				  size_t size, int iommu_prot, gfp_t gfp,
+				  size_t *mapped)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_lpae_iopte *ptep = data->pgd;
+	int ret, lvl = data->start_level;
+	arm_lpae_iopte prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap);
+	long iaext = (s64)(iova + size - 1) >> cfg->ias;
+	size_t pgsize;
+
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
+		       iova, &paddr, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext || (paddr + size - 1) >> cfg->oas))
+		return -ERANGE;
+
+	while (size) {
+		pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size);
+		ret = __arm_lpae_map(data, iova, paddr, pgsize, prot, lvl, ptep,
+				     gfp);
+		if (ret)
+			return ret;
+
+		iova += pgsize;
+		paddr += pgsize;
+		*mapped += pgsize;
+		size -= pgsize;
+	}
+
+	return 0;
+}
+
+static int arm_lpae_map_sg(struct io_pgtable_ops *ops, unsigned long iova,
+			   struct scatterlist *sg, unsigned int nents,
+			   int iommu_prot, gfp_t gfp, size_t *mapped)
+{
+
+	size_t len = 0;
+	unsigned int i = 0;
+	int ret;
+	phys_addr_t start;
+
+	*mapped = 0;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	while (i <= nents) {
+		phys_addr_t s_phys = sg_phys(sg);
+
+		if (len && s_phys != start + len) {
+			ret = arm_lpae_map_by_pgsize(ops, iova + *mapped, start,
+						     len, iommu_prot, gfp,
+						     mapped);
+
+			if (ret)
+				return ret;
+
+			len = 0;
+		}
+
+		if (len) {
+			len += sg->length;
+		} else {
+			len = sg->length;
+			start = s_phys;
+		}
+
+		if (++i < nents)
+			sg = sg_next(sg);
+	}
+
+	return 0;
+}
+
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 				    arm_lpae_iopte *ptep)
 {
@@ -750,6 +835,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 
 	data->iop.ops = (struct io_pgtable_ops) {
 		.map		= arm_lpae_map,
+		.map_sg		= arm_lpae_map_sg,
 		.unmap		= arm_lpae_unmap,
 		.iova_to_phys	= arm_lpae_iova_to_phys,
 	};
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3d099a31ddca..ed879a4d7fac 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2346,8 +2346,8 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
-static size_t iommu_pgsize(struct iommu_domain *domain,
-			   unsigned long addr_merge, size_t size)
+size_t iommu_pgsize(unsigned long pgsize_bitmap, unsigned long addr_merge,
+		    size_t size)
 {
 	unsigned int pgsize_idx;
 	size_t pgsize;
@@ -2366,7 +2366,7 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	pgsize = (1UL << (pgsize_idx + 1)) - 1;
 
 	/* throw away page sizes not supported by the hardware */
-	pgsize &= domain->pgsize_bitmap;
+	pgsize &= pgsize_bitmap;
 
 	/* make sure we're still sane */
 	BUG_ON(!pgsize);
@@ -2412,7 +2412,8 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
 	while (size) {
-		size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+		size_t pgsize = iommu_pgsize(domain->pgsize_bitmap,
+					     iova | paddr, size);
 
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);
@@ -2500,7 +2501,8 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 	 * or we hit an area that isn't mapped.
 	 */
 	while (unmapped < size) {
-		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+		size_t pgsize = iommu_pgsize(domain->pgsize_bitmap, iova,
+					     size - unmapped);
 
 		unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
 		if (!unmapped_page)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ce0aa9e236b..cd5f35022a25 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -439,6 +439,8 @@ extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
+extern size_t iommu_pgsize(unsigned long pgsize_bitmap, unsigned long addr_mer,
+			   size_t size);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
@@ -691,6 +693,12 @@ static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	return NULL;
 }
 
+static inline size_t iommu_pgsize(unsigned long pgsize_bitmap,
+				  unsigned long addr_merge, size_t size)
+{
+	return 0;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, size_t size, int prot)
 {



* [PATCH RFC 6/9] iommu/io-pgtable-arm-v7s: Hook up map_sg()
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (4 preceding siblings ...)
  2021-01-27 20:00 ` [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg() Chuck Lever
@ 2021-01-27 20:01 ` Chuck Lever
  2021-01-27 20:01 ` [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers Chuck Lever
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:01 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Isaac J. Manjarres <isaacm@codeaurora.org>

Implement the map_sg io-pgtable op for the ARMv7s io-pgtable
code, so that IOMMU drivers can call it when they need to map
a scatter-gather list.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/io-pgtable-arm-v7s.c |   90 ++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 1d92ac948db7..8665dabb753b 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -545,6 +545,95 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
+static int arm_v7s_map_by_pgsize(struct io_pgtable_ops *ops,
+				 unsigned long iova, phys_addr_t paddr,
+				 size_t size, int prot, gfp_t gfp,
+				 size_t *mapped)
+{
+	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable *iop = &data->iop;
+	struct io_pgtable_cfg *cfg = &iop->cfg;
+	unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap);
+	int ret;
+	size_t pgsize;
+
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
+		       iova, &paddr, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	if (WARN_ON((iova + size - 1) >= (1ULL << cfg->ias) ||
+		    (paddr + size - 1) >= (1ULL << cfg->oas)))
+		return -ERANGE;
+
+	while (size) {
+		pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size);
+		ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1,
+				    data->pgd, gfp);
+
+		if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) {
+			io_pgtable_tlb_flush_walk(&data->iop, iova, size,
+						  ARM_V7S_BLOCK_SIZE(2));
+		} else {
+			wmb();
+		}
+
+		if (ret)
+			return ret;
+
+		iova += pgsize;
+		paddr += pgsize;
+		*mapped += pgsize;
+		size -= pgsize;
+	}
+
+	return 0;
+}
+
+static int arm_v7s_map_sg(struct io_pgtable_ops *ops, unsigned long iova,
+			  struct scatterlist *sg, unsigned int nents,
+			  int iommu_prot, gfp_t gfp, size_t *mapped)
+{
+	size_t len = 0;
+	unsigned int i = 0;
+	int ret;
+	phys_addr_t start;
+
+	*mapped = 0;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	while (i <= nents) {
+		phys_addr_t s_phys = sg_phys(sg);
+
+		if (len && s_phys != start + len) {
+			ret = arm_v7s_map_by_pgsize(ops, iova + *mapped, start,
+						    len, iommu_prot, gfp,
+						    mapped);
+
+			if (ret)
+				return ret;
+
+			len = 0;
+		}
+
+		if (len) {
+			len += sg->length;
+		} else {
+			len = sg->length;
+			start = s_phys;
+		}
+
+		if (++i < nents)
+			sg = sg_next(sg);
+	}
+
+	return 0;
+}
+
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
@@ -783,6 +872,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 
 	data->iop.ops = (struct io_pgtable_ops) {
 		.map		= arm_v7s_map,
+		.map_sg		= arm_v7s_map_sg,
 		.unmap		= arm_v7s_unmap,
 		.iova_to_phys	= arm_v7s_iova_to_phys,
 	};



* [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (5 preceding siblings ...)
  2021-01-27 20:01 ` [PATCH RFC 6/9] iommu/io-pgtable-arm-v7s: " Chuck Lever
@ 2021-01-27 20:01 ` Chuck Lever
  2021-01-28  2:58   ` Lu Baolu
  2021-01-27 20:01 ` [PATCH RFC 8/9] iommu/arm-smmu: Hook up map_sg() Chuck Lever
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:01 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Isaac J. Manjarres <isaacm@codeaurora.org>

Add support for IOMMU drivers to have their own map_sg() callbacks.
This completes the path for having iommu_map_sg() invoke an IOMMU
driver's map_sg() callback, which can then invoke the io-pgtable
map_sg() callback with the entire scatter-gather list, so that it
can be processed entirely in the io-pgtable layer.

For IOMMU drivers that do not provide a callback, the default
implementation of iterating through the scatter-gather list while
calling iommu_map() will be used.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
[ cel: adjusted new iotlb_sync_map call site ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/iommu.c |   12 ++++++++++++
 include/linux/iommu.h |    5 +++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ed879a4d7fac..bd7adbd0339b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2551,6 +2551,18 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	unsigned int i = 0;
 	int ret;
 
+	if (ops->map_sg) {
+		ret = ops->map_sg(domain, iova, sg, nents, prot, gfp, &mapped);
+
+		if (ops->iotlb_sync_map)
+			ops->iotlb_sync_map(domain, iova, mapped);
+
+		if (ret)
+			goto out_err;
+
+		return mapped;
+	}
+
 	while (i <= nents) {
 		phys_addr_t s_phys = sg_phys(sg);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cd5f35022a25..667edc7b034a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,8 @@ struct iommu_iotlb_gather {
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
+ * @map_sg: map a scatter-gather list of physically contiguous chunks to
+ *          an iommu domain.
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
@@ -243,6 +245,9 @@ struct iommu_ops {
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+	int (*map_sg)(struct iommu_domain *domain, unsigned long iova,
+		      struct scatterlist *sg, unsigned int nents, int prot,
+		      gfp_t gfp, size_t *mapped);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
 	void (*flush_iotlb_all)(struct iommu_domain *domain);



* [PATCH RFC 8/9] iommu/arm-smmu: Hook up map_sg()
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (6 preceding siblings ...)
  2021-01-27 20:01 ` [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers Chuck Lever
@ 2021-01-27 20:01 ` Chuck Lever
  2021-01-27 20:01 ` [PATCH RFC 9/9] iommu/vt-d: Introduce map_sg() for Intel IOMMUs Chuck Lever
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:01 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

From: Isaac J. Manjarres <isaacm@codeaurora.org>

Now that everything is in place for iommu_map_sg() to defer
mapping a scatter-gather list to the io-pgtable layer, implement
the map_sg() callback in the SMMU driver, so that iommu_map_sg()
can invoke it with the entire scatter-gather list that will be
mapped.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..52acc6858512 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1208,6 +1208,24 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	return ret;
 }
 
+static int arm_smmu_map_sg(struct iommu_domain *domain, unsigned long iova,
+			   struct scatterlist *sg, unsigned int nents, int prot,
+			   gfp_t gfp, size_t *mapped)
+{
+	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+	int ret;
+
+	if (!ops)
+		return -ENODEV;
+
+	arm_smmu_rpm_get(smmu);
+	ret = ops->map_sg(ops, iova, sg, nents, prot, gfp, mapped);
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
+}
+
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size, struct iommu_iotlb_gather *gather)
 {
@@ -1624,6 +1642,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_free		= arm_smmu_domain_free,
 	.attach_dev		= arm_smmu_attach_dev,
 	.map			= arm_smmu_map,
+	.map_sg			= arm_smmu_map_sg,
 	.unmap			= arm_smmu_unmap,
 	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
 	.iotlb_sync		= arm_smmu_iotlb_sync,



* [PATCH RFC 9/9] iommu/vt-d: Introduce map_sg() for Intel IOMMUs
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (7 preceding siblings ...)
  2021-01-27 20:01 ` [PATCH RFC 8/9] iommu/arm-smmu: Hook up map_sg() Chuck Lever
@ 2021-01-27 20:01 ` Chuck Lever
  2021-01-28  2:28 ` [PATCH RFC 0/9] Possible set of VT-d optimizations Lu Baolu
  2021-01-28 13:59 ` Robin Murphy
  10 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-27 20:01 UTC (permalink / raw)
  To: baolu.lu; +Cc: isaacm, robin.murphy, iommu, will

Attempt to reduce indirect call overhead when mapping a substantial
scatter-gather list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/iommu/intel/iommu.c |   37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 013097b6d55f..deae39f1477a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4952,6 +4952,42 @@ static int intel_iommu_map(struct iommu_domain *domain,
 				hpa >> VTD_PAGE_SHIFT, size, prot);
 }
 
+static int intel_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+			      struct scatterlist *sg, unsigned int nents,
+			      int prot, gfp_t gfp, size_t *mapped)
+{
+	unsigned int i = 0;
+	phys_addr_t start;
+	size_t len = 0;
+	int ret;
+
+	while (i <= nents) {
+		phys_addr_t s_phys = sg_phys(sg);
+
+		if (len && s_phys != start + len) {
+			ret = intel_iommu_map(domain, iova + *mapped, start,
+					      len, prot, gfp);
+			if (ret)
+				return ret;
+
+			*mapped += len;
+			len = 0;
+		}
+
+		if (len) {
+			len += sg->length;
+		} else {
+			len = sg->length;
+			start = s_phys;
+		}
+
+		if (++i < nents)
+			sg = sg_next(sg);
+	}
+
+	return 0;
+}
+
 static size_t intel_iommu_unmap(struct iommu_domain *domain,
 				unsigned long iova, size_t size,
 				struct iommu_iotlb_gather *gather)
@@ -5519,6 +5555,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.aux_detach_dev		= intel_iommu_aux_detach_device,
 	.aux_get_pasid		= intel_iommu_aux_get_pasid,
 	.map			= intel_iommu_map,
+	.map_sg			= intel_iommu_map_sg,
 	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
 	.unmap			= intel_iommu_unmap,
 	.flush_iotlb_all        = intel_flush_iotlb_all,



* Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (8 preceding siblings ...)
  2021-01-27 20:01 ` [PATCH RFC 9/9] iommu/vt-d: Introduce map_sg() for Intel IOMMUs Chuck Lever
@ 2021-01-28  2:28 ` Lu Baolu
  2021-01-28 13:59 ` Robin Murphy
  10 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2021-01-28  2:28 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, robin.murphy, iommu, will

Hi Chuck,

On 1/28/21 4:00 AM, Chuck Lever wrote:
> Hi-
> 
> This collection of patches seems to get the best throughput results
> so far. The NFS WRITE result is fully restored, and the NFS READ
> result is very close to fully restored.

Very glad to see this. Thanks!

Can you please add the link below if you post a next version?

https://lore.kernel.org/linux-iommu/D81314ED-5673-44A6-B597-090E3CB83EB0@oracle.com/

It helps people understand the regression you have seen.

Best regards,
baolu

> 
> 	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
> 	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
> 	Min throughput per process                      = 416956.88 kB/sec
> 	Max throughput per process                      = 417910.22 kB/sec
> 	Avg throughput per process                      = 417372.84 kB/sec
> 	Min xfer                                        = 1046272.00 kB
> 	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %
> 
> 
> 	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
> 	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
> 	Min throughput per process                      = 417799.00 kB/sec
> 	Max throughput per process                      = 419082.22 kB/sec
> 	Avg throughput per process                      = 418382.05 kB/sec
> 	Min xfer                                        = 1046528.00 kB
> 	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %
> 
> 
> 	Children see throughput for 12 readers          = 5805484.25 kB/sec
> 	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
> 	Min throughput per process                      = 482888.16 kB/sec
> 	Max throughput per process                      = 484444.16 kB/sec
> 	Avg throughput per process                      = 483790.35 kB/sec
> 	Min xfer                                        = 1045760.00 kB
> 	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %
> 
> 
> 	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
> 	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
> 	Min throughput per process                      = 483242.97 kB/sec
> 	Max throughput per process                      = 485724.41 kB/sec
> 	Avg throughput per process                      = 484352.26 kB/sec
> 	Min xfer                                        = 1043456.00 kB
> 	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %
> 
> I've included a simple-minded implementation of a map_sg op for
> the Intel IOMMU. This is nothing more than a copy of the loop in
> __iommu_map_sg() with the call to __iommu_map() replaced with a
> call to intel_iommu_map().
> 
> ---
> 
> Chuck Lever (1):
>        iommu/vt-d: Introduce map_sg() for Intel IOMMUs
> 
> Isaac J. Manjarres (5):
>        iommu/io-pgtable: Introduce map_sg() as a page table op
>        iommu/io-pgtable-arm: Hook up map_sg()
>        iommu/io-pgtable-arm-v7s: Hook up map_sg()
>        iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>        iommu/arm-smmu: Hook up map_sg()
> 
> Lu Baolu (1):
>        iommu/vt-d: Add iotlb_sync_map callback
> 
> Yong Wu (2):
>        iommu: Move iotlb_sync_map out from __iommu_map
>        iommu: Add iova and size as parameters in iotlb_sync_map
> 
> 
>   drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
>   drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
>   drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
>   drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
>   drivers/iommu/iommu.c                 |  47 +++++++--
>   drivers/iommu/tegra-gart.c            |   7 +-
>   include/linux/iommu.h                 |  16 +++-
>   7 files changed, 353 insertions(+), 43 deletions(-)
> 
> --
> Chuck Lever
> 

* Re: [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map
  2021-01-27 20:00 ` [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map Chuck Lever
@ 2021-01-28  2:40   ` Lu Baolu
  0 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2021-01-28  2:40 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, robin.murphy, iommu, will

Hi,

On 1/28/21 4:00 AM, Chuck Lever wrote:
> From: Yong Wu <yong.wu@mediatek.com>
> 
> At the end of __iommu_map(), iotlb_sync_map is always called.
> 
> This patch moves iotlb_sync_map out of __iommu_map, since it is
> unnecessary to call it for each sg segment, especially while
> iotlb_sync_map currently flushes the whole TLB. Add a little helper,
> _iommu_map, for this.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   drivers/iommu/iommu.c |   23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ffeebda8d6de..c304a6a30d42 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
>   		size -= pgsize;
>   	}
>   
> -	if (ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> -
>   	/* unroll mapping in case something went wrong */
>   	if (ret)
>   		iommu_unmap(domain, orig_iova, orig_size - size);
> @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
>   	return ret;
>   }
>   
> +static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
> +		      phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> +	const struct iommu_ops *ops = domain->ops;
> +	int ret;
> +
> +	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
> +	if (ret == 0 && ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);

The previous code called iotlb_sync_map() regardless of whether the
mapping was successful or not. Here the logic changes, and the callback
is only called if the mapping succeeds.

Any reason? It's safer to always call iotlb_sync_map() even in the
failed mapping case. That way, we can ensure cache consistency as much
as possible.
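
In other words, something like this (a minimal sketch; only the
error-path ordering changes relative to the quoted hunk):

	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
	if (ops->iotlb_sync_map)
		ops->iotlb_sync_map(domain);

	return ret;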

Best regards,
baolu

> +
> +	return ret;
> +}
> +
>   int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   	      phys_addr_t paddr, size_t size, int prot)
>   {
>   	might_sleep();
> -	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +	return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>   }
>   EXPORT_SYMBOL_GPL(iommu_map);
>   
>   int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
>   	      phys_addr_t paddr, size_t size, int prot)
>   {
> -	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> +	return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
>   }
>   EXPORT_SYMBOL_GPL(iommu_map_atomic);
>   
> @@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   			     struct scatterlist *sg, unsigned int nents, int prot,
>   			     gfp_t gfp)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	size_t len = 0, mapped = 0;
>   	phys_addr_t start;
>   	unsigned int i = 0;
> @@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   			sg = sg_next(sg);
>   	}
>   
> +	if (ops->iotlb_sync_map)
> +		ops->iotlb_sync_map(domain);
>   	return mapped;
>   
>   out_err:
> 
> 

* Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-27 20:00 ` [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map Chuck Lever
@ 2021-01-28  2:44   ` Lu Baolu
  2021-01-28 12:51   ` Joerg Roedel
  1 sibling, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2021-01-28  2:44 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, robin.murphy, iommu, will

On 1/28/21 4:00 AM, Chuck Lever wrote:
> From: Yong Wu <yong.wu@mediatek.com>
> 
> iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the
> whole mapping. This patch adds iova and size as parameters to it. The
> IOMMU driver can then flush the TLB once over the whole range after the
> iova mapping completes, to improve performance.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   drivers/iommu/iommu.c      |    4 ++--
>   drivers/iommu/tegra-gart.c |    7 +++++--
>   include/linux/iommu.h      |    3 ++-
>   3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index c304a6a30d42..3d099a31ddca 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
>   
>   	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>   	if (ret == 0 && ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, size);
>   
>   	return ret;
>   }
> @@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   	}
>   
>   	if (ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, mapped);
>   	return mapped;
>   
>   out_err:
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index fac720273889..05e8e19b8269 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
>   	return 0;
>   }
>   
> -static void gart_iommu_sync_map(struct iommu_domain *domain)
> +static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
> +				size_t size)
>   {
>   	FLUSH_GART_REGS(gart_handle);
>   }
> @@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain)
>   static void gart_iommu_sync(struct iommu_domain *domain,
>   			    struct iommu_iotlb_gather *gather)
>   {
> -	gart_iommu_sync_map(domain);
> +	size_t length = gather->end - gather->start;
> +
> +	gart_iommu_sync_map(domain, gather->start, length);
>   }
>   
>   static const struct iommu_ops gart_iommu_ops = {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b3f0e2018c62..9ce0aa9e236b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -246,7 +246,8 @@ struct iommu_ops {
>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
>   		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
>   	void (*flush_iotlb_all)(struct iommu_domain *domain);
> -	void (*iotlb_sync_map)(struct iommu_domain *domain);
> +	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> +			       size_t size);
>   	void (*iotlb_sync)(struct iommu_domain *domain,
>   			   struct iommu_iotlb_gather *iotlb_gather);
>   	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> 
> 

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

* Re: [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
  2021-01-27 20:01 ` [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers Chuck Lever
@ 2021-01-28  2:58   ` Lu Baolu
  0 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2021-01-28  2:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, robin.murphy, iommu, will

Hi,

On 1/28/21 4:01 AM, Chuck Lever wrote:
> From: Isaac J. Manjarres <isaacm@codeaurora.org>
> 
> Add support for IOMMU drivers to have their own map_sg() callbacks.
> This completes the path for having iommu_map_sg() invoke an IOMMU
> driver's map_sg() callback, which can then invoke the io-pgtable
> map_sg() callback with the entire scatter-gather list, so that it
> can be processed entirely in the io-pgtable layer.
> 
> For IOMMU drivers that do not provide a callback, the default
> implementation of iterating through the scatter-gather list while
> calling iommu_map() will be used.
> 
> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> [ cel: adjusted new iotlb_sync_map call site ]
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   drivers/iommu/iommu.c |   12 ++++++++++++
>   include/linux/iommu.h |    5 +++++
>   2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ed879a4d7fac..bd7adbd0339b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2551,6 +2551,18 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>   	unsigned int i = 0;
>   	int ret;
>   
> +	if (ops->map_sg) {
> +		ret = ops->map_sg(domain, iova, sg, nents, prot, gfp, &mapped);
> +
> +		if (ops->iotlb_sync_map)
> +			ops->iotlb_sync_map(domain, iova, mapped);
> +
> +		if (ret)
> +			goto out_err;
> +
> +		return mapped;
> +	}
> +
>   	while (i <= nents) {
>   		phys_addr_t s_phys = sg_phys(sg);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index cd5f35022a25..667edc7b034a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -192,6 +192,8 @@ struct iommu_iotlb_gather {
>    * @attach_dev: attach device to an iommu domain
>    * @detach_dev: detach device from an iommu domain
>    * @map: map a physically contiguous memory region to an iommu domain
> + * @map_sg: map a scatter-gather list of physically contiguous chunks to
> + *          an iommu domain.
>    * @unmap: unmap a physically contiguous memory region from an iommu domain
>    * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
>    * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
> @@ -243,6 +245,9 @@ struct iommu_ops {
>   	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>   	int (*map)(struct iommu_domain *domain, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> +	int (*map_sg)(struct iommu_domain *domain, unsigned long iova,
> +		      struct scatterlist *sg, unsigned int nents, int prot,
> +		      gfp_t gfp, size_t *mapped);

I might be oversensitive. But what if the vendor iommu driver uses an
iova beyond the pre-allocated range? It's safer if we could pass an iova
length parameter as well, so that the iommu driver could do a sanity
check.
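
Something like the below, perhaps (the extra @size parameter and the
example_map_sg() name are hypothetical, not part of this series):

	static int example_map_sg(struct iommu_domain *domain,
				  unsigned long iova, size_t size,
				  struct scatterlist *sg, unsigned int nents,
				  int prot, gfp_t gfp, size_t *mapped)
	{
		struct scatterlist *s;
		size_t total = 0;
		unsigned int i;

		/* reject a scatterlist that overruns the reserved range */
		for_each_sg(sg, s, nents, i)
			total += s->length;
		if (total > size)
			return -EINVAL;

		/* ... then map as the implementations in this series do ... */
		return 0;
	}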

Best regards,
baolu

>   	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
>   		     size_t size, struct iommu_iotlb_gather *iotlb_gather);
>   	void (*flush_iotlb_all)(struct iommu_domain *domain);
> 
> 

* Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-27 20:00 ` [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map Chuck Lever
  2021-01-28  2:44   ` Lu Baolu
@ 2021-01-28 12:51   ` Joerg Roedel
  2021-01-28 13:19     ` Will Deacon
  1 sibling, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2021-01-28 12:51 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, robin.murphy, iommu, will

Hi Chuck,

thanks for your optimizations!

On Wed, Jan 27, 2021 at 03:00:32PM -0500, Chuck Lever wrote:
> From: Yong Wu <yong.wu@mediatek.com>
> 
> iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the
> whole mapping. This patch adds iova and size as parameters, so that the
> IOMMU driver can flush the TLB once over the whole range after the iova
> mapping completes, to improve performance.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  drivers/iommu/iommu.c      |    4 ++--
>  drivers/iommu/tegra-gart.c |    7 +++++--
>  include/linux/iommu.h      |    3 ++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index c304a6a30d42..3d099a31ddca 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
>  
>  	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>  	if (ret == 0 && ops->iotlb_sync_map)
> -		ops->iotlb_sync_map(domain);
> +		ops->iotlb_sync_map(domain, iova, size);

How about using 'struct iommu_iotlb_gather' instead of directly passing
the iova/size parameters here? The iotlb_sync() call-back uses it
already.
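
For illustration, a rough sketch of what that could look like; the
modified iotlb_sync_map() signature implied below is an assumption,
not a posted patch:

	struct iommu_iotlb_gather gather;

	iommu_iotlb_gather_init(&gather);
	gather.start = iova;
	gather.end = iova + size - 1;	/* 'end' is inclusive */

	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
	if (ret == 0 && ops->iotlb_sync_map)
		ops->iotlb_sync_map(domain, &gather);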

Regards,

	Joerg

* Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-28 12:51   ` Joerg Roedel
@ 2021-01-28 13:19     ` Will Deacon
  2021-01-28 13:26       ` Joerg Roedel
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-01-28 13:19 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: isaacm, iommu, Chuck Lever, robin.murphy

Hi Joerg,

On Thu, Jan 28, 2021 at 01:51:12PM +0100, Joerg Roedel wrote:
> On Wed, Jan 27, 2021 at 03:00:32PM -0500, Chuck Lever wrote:
> > From: Yong Wu <yong.wu@mediatek.com>
> > 
> > iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the
> > whole mapping. This patch adds iova and size as parameters, so that the
> > IOMMU driver can flush the TLB once over the whole range after the iova
> > mapping completes, to improve performance.
> > 
> > Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  drivers/iommu/iommu.c      |    4 ++--
> >  drivers/iommu/tegra-gart.c |    7 +++++--
> >  include/linux/iommu.h      |    3 ++-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index c304a6a30d42..3d099a31ddca 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  
> >  	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> >  	if (ret == 0 && ops->iotlb_sync_map)
> > -		ops->iotlb_sync_map(domain);
> > +		ops->iotlb_sync_map(domain, iova, size);
> 
> How about using 'struct iommu_iotlb_gather' instead of directly passing
> the iova/size parameters here? The iotlb_sync() call-back uses it
> already.

Heads-up, but I already queued this patch as part of its original series:

https://lore.kernel.org/r/20210107122909.16317-1-yong.wu@mediatek.com

since it's part of the Mediatek series for 5.12.

Would you like me to drop that, or can we stick with passing iova and size
for now, with a view to refactoring it later on?

Will

* Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
  2021-01-28 13:19     ` Will Deacon
@ 2021-01-28 13:26       ` Joerg Roedel
  0 siblings, 0 replies; 22+ messages in thread
From: Joerg Roedel @ 2021-01-28 13:26 UTC (permalink / raw)
  To: Will Deacon; +Cc: isaacm, iommu, Chuck Lever, robin.murphy

On Thu, Jan 28, 2021 at 01:19:29PM +0000, Will Deacon wrote:
> Heads-up, but I already queued this patch as part of its original series:
> 
> https://lore.kernel.org/r/20210107122909.16317-1-yong.wu@mediatek.com
> 
> since it's part of the Mediatek series for 5.12.
> 
> Would you like me to drop that, or can we stick with passing iova and size
> for now, with a view to refactoring it later on?

I think it's okay for now; we can refactor it later.

Regards,

	Joerg

* Re: [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg()
  2021-01-27 20:00 ` [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg() Chuck Lever
@ 2021-01-28 13:53   ` Will Deacon
  0 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-28 13:53 UTC (permalink / raw)
  To: Chuck Lever; +Cc: isaacm, iommu, robin.murphy

On Wed, Jan 27, 2021 at 03:00:53PM -0500, Chuck Lever wrote:
> From: Isaac J. Manjarres <isaacm@codeaurora.org>
> 
> Implement the map_sg io-pgtable op for the ARM LPAE io-pgtable
> code, so that IOMMU drivers can call it when they need to map
> a scatter-gather list.
> 
> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  drivers/iommu/io-pgtable-arm.c |   86 ++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu.c          |   12 +++---
>  include/linux/iommu.h          |    8 ++++
>  3 files changed, 101 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 87def58e79b5..0c11529442b8 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -473,6 +473,91 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	return ret;
>  }
>  
> +static int arm_lpae_map_by_pgsize(struct io_pgtable_ops *ops,
> +				  unsigned long iova, phys_addr_t paddr,
> +				  size_t size, int iommu_prot, gfp_t gfp,
> +				  size_t *mapped)
> +{
> +	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +	arm_lpae_iopte *ptep = data->pgd;
> +	int ret, lvl = data->start_level;
> +	arm_lpae_iopte prot = arm_lpae_prot_to_pte(data, iommu_prot);
> +	unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap);
> +	long iaext = (s64)(iova + size - 1) >> cfg->ias;
> +	size_t pgsize;
> +
> +	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
> +		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
> +		       iova, &paddr, size, min_pagesz);
> +		return -EINVAL;
> +	}
> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext || (paddr + size - 1) >> cfg->oas))
> +		return -ERANGE;
> +
> +	while (size) {
> +		pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size);
> +		ret = __arm_lpae_map(data, iova, paddr, pgsize, prot, lvl, ptep,
> +				     gfp);
> +		if (ret)
> +			return ret;
> +
> +		iova += pgsize;
> +		paddr += pgsize;
> +		*mapped += pgsize;
> +		size -= pgsize;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_lpae_map_sg(struct io_pgtable_ops *ops, unsigned long iova,
> +			   struct scatterlist *sg, unsigned int nents,
> +			   int iommu_prot, gfp_t gfp, size_t *mapped)
> +{
> +
> +	size_t len = 0;
> +	unsigned int i = 0;
> +	int ret;
> +	phys_addr_t start;
> +
> +	*mapped = 0;
> +
> +	/* If no access, then nothing to do */
> +	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> +		return 0;
> +
> +	while (i <= nents) {
> +		phys_addr_t s_phys = sg_phys(sg);
> +
> +		if (len && s_phys != start + len) {
> +			ret = arm_lpae_map_by_pgsize(ops, iova + *mapped, start,
> +						     len, iommu_prot, gfp,
> +						     mapped);
> +
> +			if (ret)
> +				return ret;
> +
> +			len = 0;
> +		}
> +
> +		if (len) {
> +			len += sg->length;
> +		} else {
> +			len = sg->length;
> +			start = s_phys;
> +		}
> +
> +		if (++i < nents)
> +			sg = sg_next(sg);
> +	}
> +
> +	return 0;
> +}

Although I really like the idea of reducing the layering here, I think we
need to figure out a way to reduce the amount of boiler-plate that ends up
in the pgtable code. Otherwise it's pretty unmaintainable.
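
For the sake of discussion, one way to share that boiler-plate might be
a common walker in the io-pgtable core; io_pgtable_map_sg() and the
map_contig callback below are invented names, sketched from the merge
loop in this patch:

	typedef int (*io_pgtable_map_fn)(struct io_pgtable_ops *ops,
					 unsigned long iova, phys_addr_t paddr,
					 size_t size, int prot, gfp_t gfp,
					 size_t *mapped);

	/*
	 * Walk the scatterlist, merging physically contiguous entries, and
	 * hand each merged run to the format-specific map_contig(), which
	 * is expected to advance *mapped (as arm_lpae_map_by_pgsize() does).
	 */
	static int io_pgtable_map_sg(struct io_pgtable_ops *ops,
				     unsigned long iova, struct scatterlist *sg,
				     unsigned int nents, int prot, gfp_t gfp,
				     size_t *mapped, io_pgtable_map_fn map_contig)
	{
		phys_addr_t start = 0;
		size_t len = 0;
		unsigned int i = 0;
		int ret;

		*mapped = 0;

		while (i <= nents) {
			phys_addr_t s_phys = sg_phys(sg);

			/* Flush the merged run once contiguity is broken. */
			if (len && s_phys != start + len) {
				ret = map_contig(ops, iova + *mapped, start,
						 len, prot, gfp, mapped);
				if (ret)
					return ret;
				len = 0;
			}

			if (len) {
				len += sg->length;
			} else {
				len = sg->length;
				start = s_phys;
			}

			if (++i < nents)
				sg = sg_next(sg);
		}

		return 0;
	}

Each io-pgtable format would then only implement its map_contig() step
instead of duplicating the walker.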

Will

* Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
  2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
                   ` (9 preceding siblings ...)
  2021-01-28  2:28 ` [PATCH RFC 0/9] Possible set of VT-d optimizations Lu Baolu
@ 2021-01-28 13:59 ` Robin Murphy
  2021-01-28 14:52   ` Chuck Lever
  10 siblings, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2021-01-28 13:59 UTC (permalink / raw)
  To: Chuck Lever, baolu.lu; +Cc: isaacm, iommu, will

On 2021-01-27 20:00, Chuck Lever wrote:
> Hi-
> 
> This collection of patches seems to get the best throughput results
> so far. The NFS WRITE result is fully restored, and the NFS READ
> result is very close to fully restored.
> 
> 	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
> 	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
> 	Min throughput per process                      = 416956.88 kB/sec
> 	Max throughput per process                      = 417910.22 kB/sec
> 	Avg throughput per process                      = 417372.84 kB/sec
> 	Min xfer                                        = 1046272.00 kB
> 	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %
> 
> 
> 	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
> 	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
> 	Min throughput per process                      = 417799.00 kB/sec
> 	Max throughput per process                      = 419082.22 kB/sec
> 	Avg throughput per process                      = 418382.05 kB/sec
> 	Min xfer                                        = 1046528.00 kB
> 	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %
> 
> 
> 	Children see throughput for 12 readers          = 5805484.25 kB/sec
> 	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
> 	Min throughput per process                      = 482888.16 kB/sec
> 	Max throughput per process                      = 484444.16 kB/sec
> 	Avg throughput per process                      = 483790.35 kB/sec
> 	Min xfer                                        = 1045760.00 kB
> 	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %
> 
> 
> 	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
> 	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
> 	Min throughput per process                      = 483242.97 kB/sec
> 	Max throughput per process                      = 485724.41 kB/sec
> 	Avg throughput per process                      = 484352.26 kB/sec
> 	Min xfer                                        = 1043456.00 kB
> 	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %
> 
> I've included a simple-minded implementation of a map_sg op for
> the Intel IOMMU. This is nothing more than a copy of the loop in
> __iommu_map_sg() with the call to __iommu_map() replaced with a
> call to intel_iommu_map().

...which is the main reason I continue to strongly dislike patches #4-#9 
(#3 definitely seems to make sense either way, now that #1 and #2 are 
going to land). If a common operation is worth optimising anywhere, then 
it deserves optimising everywhere, so we end up with a dozen diverging 
copies of essentially the same code - particularly when the 
driver-specific functionality *is* already in the drivers, so what gets 
duplicated is solely the "generic" parts.

And if there's justification for pushing iommu_map_sg() entirely into 
drivers, then it's verging on self-contradictory not to do the same for 
iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, 
as it happens - are already implementing hacks around the "one call per 
page" interface being inherently inefficient, so the logical thing to do 
here is take a step back and reconsider the fundamental design of the 
whole map/unmap interface. Implementing hacks on top of hacks to make 
particular things faster on particular systems that particular people 
care about is not going to do us any favours in the long run.
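
To make that direction concrete, one conceivable shape for a redesigned
interface, purely a sketch and not something proposed in this thread,
would let a driver map a run of same-size pages per call:

	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
			 int prot, gfp_t gfp, size_t *mapped);

	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
			      size_t pgsize, size_t pgcount,
			      struct iommu_iotlb_gather *gather);

The core would split any request, whether a single page, a range, or a
scatterlist, into runs of equal-sized pages, so every caller would
benefit rather than only iommu_map_sg() users.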

As it stands, I can easily see a weird anti-pattern emerging where 
people start adding code to fake up scatterlists in random drivers 
because they see dma_map_sg() performing paradoxically better than 
dma_map_page().

Robin.

> ---
> 
> Chuck Lever (1):
>        iommu/vt-d: Introduce map_sg() for Intel IOMMUs
> 
> Isaac J. Manjarres (5):
>        iommu/io-pgtable: Introduce map_sg() as a page table op
>        iommu/io-pgtable-arm: Hook up map_sg()
>        iommu/io-pgtable-arm-v7s: Hook up map_sg()
>        iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>        iommu/arm-smmu: Hook up map_sg()
> 
> Lu Baolu (1):
>        iommu/vt-d: Add iotlb_sync_map callback
> 
> Yong Wu (2):
>        iommu: Move iotlb_sync_map out from __iommu_map
>        iommu: Add iova and size as parameters in iotlb_sync_map
> 
> 
>   drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
>   drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
>   drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
>   drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
>   drivers/iommu/iommu.c                 |  47 +++++++--
>   drivers/iommu/tegra-gart.c            |   7 +-
>   include/linux/iommu.h                 |  16 +++-
>   7 files changed, 353 insertions(+), 43 deletions(-)
> 
> --
> Chuck Lever
> 

* Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
  2021-01-28 13:59 ` Robin Murphy
@ 2021-01-28 14:52   ` Chuck Lever
  2021-01-29 11:54     ` Lu Baolu
  2021-02-01 18:08     ` Chuck Lever
  0 siblings, 2 replies; 22+ messages in thread
From: Chuck Lever @ 2021-01-28 14:52 UTC (permalink / raw)
  To: Robin Murphy; +Cc: isaacm, iommu, will



> On Jan 28, 2021, at 8:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-01-27 20:00, Chuck Lever wrote:
>> Hi-
>> This collection of patches seems to get the best throughput results
>> so far. The NFS WRITE result is fully restored, and the NFS READ
>> result is very close to fully restored.
>> 	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
>> 	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
>> 	Min throughput per process                      = 416956.88 kB/sec
>> 	Max throughput per process                      = 417910.22 kB/sec
>> 	Avg throughput per process                      = 417372.84 kB/sec
>> 	Min xfer                                        = 1046272.00 kB
>> 	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %
>> 	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
>> 	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
>> 	Min throughput per process                      = 417799.00 kB/sec
>> 	Max throughput per process                      = 419082.22 kB/sec
>> 	Avg throughput per process                      = 418382.05 kB/sec
>> 	Min xfer                                        = 1046528.00 kB
>> 	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %
>> 	Children see throughput for 12 readers          = 5805484.25 kB/sec
>> 	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
>> 	Min throughput per process                      = 482888.16 kB/sec
>> 	Max throughput per process                      = 484444.16 kB/sec
>> 	Avg throughput per process                      = 483790.35 kB/sec
>> 	Min xfer                                        = 1045760.00 kB
>> 	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %
>> 	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
>> 	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
>> 	Min throughput per process                      = 483242.97 kB/sec
>> 	Max throughput per process                      = 485724.41 kB/sec
>> 	Avg throughput per process                      = 484352.26 kB/sec
>> 	Min xfer                                        = 1043456.00 kB
>> 	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %
>> I've included a simple-minded implementation of a map_sg op for
>> the Intel IOMMU. This is nothing more than a copy of the loop in
>> __iommu_map_sg() with the call to __iommu_map() replaced with a
>> call to intel_iommu_map().
> 
> ...which is the main reason I continue to strongly dislike patches #4-#9 (#3 definitely seems to make sense either way, now that #1 and #2 are going to land). If a common operation is worth optimising anywhere, then it deserves optimising everywhere, so we end up with a dozen diverging copies of essentially the same code - particularly when the driver-specific functionality *is* already in the drivers, so what gets duplicated is solely the "generic" parts.

I don't disagree with that assessment, but I don't immediately see an
alternative API arrangement that would be more successful in the short
term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
do would be to revert:

 - 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
 - c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")

for v5.11, work out the proper API design, and then try the VT-d conversion
again.

IMHO.


> And if there's justification for pushing iommu_map_sg() entirely into drivers, then it's verging on self-contradictory not to do the same for iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as it happens - are already implementing hacks around the "one call per page" interface being inherently inefficient, so the logical thing to do here is take a step back and reconsider the fundamental design of the whole map/unmap interface. Implementing hacks on top of hacks to make particular things faster on particular systems that particular people care about is not going to do us any favours in the long run.
> 
> As it stands, I can easily see a weird anti-pattern emerging where people start adding code to fake up scatterlists in random drivers because they see dma_map_sg() performing paradoxically better than dma_map_page().
> 
> Robin.
> 
>> ---
>> Chuck Lever (1):
>>       iommu/vt-d: Introduce map_sg() for Intel IOMMUs
>> Isaac J. Manjarres (5):
>>       iommu/io-pgtable: Introduce map_sg() as a page table op
>>       iommu/io-pgtable-arm: Hook up map_sg()
>>       iommu/io-pgtable-arm-v7s: Hook up map_sg()
>>       iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>>       iommu/arm-smmu: Hook up map_sg()
>> Lu Baolu (1):
>>       iommu/vt-d: Add iotlb_sync_map callback
>> Yong Wu (2):
>>       iommu: Move iotlb_sync_map out from __iommu_map
>>       iommu: Add iova and size as parameters in iotlb_sync_map
>>  drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
>>  drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
>>  drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
>>  drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
>>  drivers/iommu/iommu.c                 |  47 +++++++--
>>  drivers/iommu/tegra-gart.c            |   7 +-
>>  include/linux/iommu.h                 |  16 +++-
>>  7 files changed, 353 insertions(+), 43 deletions(-)
>> --
>> Chuck Lever

--
Chuck Lever




* Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
  2021-01-28 14:52   ` Chuck Lever
@ 2021-01-29 11:54     ` Lu Baolu
  2021-02-01 18:08     ` Chuck Lever
  1 sibling, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2021-01-29 11:54 UTC (permalink / raw)
  To: Chuck Lever, Robin Murphy; +Cc: isaacm, iommu, will

On 2021/1/28 22:52, Chuck Lever wrote:
> 
> 
>> On Jan 28, 2021, at 8:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-01-27 20:00, Chuck Lever wrote:
>>> Hi-
>>> This collection of patches seems to get the best throughput results
>>> so far. The NFS WRITE result is fully restored, and the NFS READ
>>> result is very close to fully restored.
>>> 	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
>>> 	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
>>> 	Min throughput per process                      = 416956.88 kB/sec
>>> 	Max throughput per process                      = 417910.22 kB/sec
>>> 	Avg throughput per process                      = 417372.84 kB/sec
>>> 	Min xfer                                        = 1046272.00 kB
>>> 	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %
>>> 	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
>>> 	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
>>> 	Min throughput per process                      = 417799.00 kB/sec
>>> 	Max throughput per process                      = 419082.22 kB/sec
>>> 	Avg throughput per process                      = 418382.05 kB/sec
>>> 	Min xfer                                        = 1046528.00 kB
>>> 	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %
>>> 	Children see throughput for 12 readers          = 5805484.25 kB/sec
>>> 	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
>>> 	Min throughput per process                      = 482888.16 kB/sec
>>> 	Max throughput per process                      = 484444.16 kB/sec
>>> 	Avg throughput per process                      = 483790.35 kB/sec
>>> 	Min xfer                                        = 1045760.00 kB
>>> 	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %
>>> 	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
>>> 	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
>>> 	Min throughput per process                      = 483242.97 kB/sec
>>> 	Max throughput per process                      = 485724.41 kB/sec
>>> 	Avg throughput per process                      = 484352.26 kB/sec
>>> 	Min xfer                                        = 1043456.00 kB
>>> 	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %
>>> I've included a simple-minded implementation of a map_sg op for
>>> the Intel IOMMU. This is nothing more than a copy of the loop in
>>> __iommu_map_sg() with the call to __iommu_map() replaced with a
>>> call to intel_iommu_map().
>>
>> ...which is the main reason I continue to strongly dislike patches #4-#9 (#3 definitely seems to make sense either way, now that #1 and #2 are going to land). If a common operation is worth optimising anywhere, then it deserves optimising everywhere, so we end up with a dozen diverging copies of essentially the same code - particularly when the driver-specific functionality *is* already in the drivers, so what gets duplicated is solely the "generic" parts.
> 
> I don't disagree with that assessment, but I don't immediately see an
> alternative API arrangement that would be more successful in the short
> term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
> do would be to revert:
> 
>   - 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
>   - c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
> 
> for v5.11, work out the proper API design, and then try the VT-d conversion
> again.

Perhaps we could introduce an iommu_ops->map/unmap_range() callback and
let the iommu driver choose whether to keep mapping "one call per page"
or to map the whole range at once.
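
As a sketch only, with the name, signature, and fallback below being
assumptions rather than a concrete proposal:

	int (*map_range)(struct iommu_domain *domain, unsigned long iova,
			 phys_addr_t paddr, size_t size, int prot, gfp_t gfp);

	/* Core side: drivers without ->map_range keep today's behavior. */
	static int __iommu_map_range(struct iommu_domain *domain,
				     unsigned long iova, phys_addr_t paddr,
				     size_t size, int prot, gfp_t gfp)
	{
		const struct iommu_ops *ops = domain->ops;

		if (ops->map_range)
			return ops->map_range(domain, iova, paddr, size,
					      prot, gfp);

		return __iommu_map(domain, iova, paddr, size, prot, gfp);
	}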

Best regards,
baolu

> 
> IMHO.
> 
> 
>> And if there's justification for pushing iommu_map_sg() entirely into drivers, then it's verging on self-contradictory not to do the same for iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as it happens - are already implementing hacks around the "one call per page" interface being inherently inefficient, so the logical thing to do here is take a step back and reconsider the fundamental design of the whole map/unmap interface. Implementing hacks on top of hacks to make particular things faster on particular systems that particular people care about is not going to do us any favours in the long run.
>>
>> As it stands, I can easily see a weird anti-pattern emerging where people start adding code to fake up scatterlists in random drivers because they see dma_map_sg() performing paradoxically better than dma_map_page().
>>
>> Robin.
>>
>>> ---
>>> Chuck Lever (1):
>>>        iommu/vt-d: Introduce map_sg() for Intel IOMMUs
>>> Isaac J. Manjarres (5):
>>>        iommu/io-pgtable: Introduce map_sg() as a page table op
>>>        iommu/io-pgtable-arm: Hook up map_sg()
>>>        iommu/io-pgtable-arm-v7s: Hook up map_sg()
>>>        iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>>>        iommu/arm-smmu: Hook up map_sg()
>>> Lu Baolu (1):
>>>        iommu/vt-d: Add iotlb_sync_map callback
>>> Yong Wu (2):
>>>        iommu: Move iotlb_sync_map out from __iommu_map
>>>        iommu: Add iova and size as parameters in iotlb_sync_map
>>>   drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
>>>   drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
>>>   drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
>>>   drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
>>>   drivers/iommu/iommu.c                 |  47 +++++++--
>>>   drivers/iommu/tegra-gart.c            |   7 +-
>>>   include/linux/iommu.h                 |  16 +++-
>>>   7 files changed, 353 insertions(+), 43 deletions(-)
>>> --
>>> Chuck Lever
> 
> --
> Chuck Lever
> 
> 
> 

* Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
  2021-01-28 14:52   ` Chuck Lever
  2021-01-29 11:54     ` Lu Baolu
@ 2021-02-01 18:08     ` Chuck Lever
  1 sibling, 0 replies; 22+ messages in thread
From: Chuck Lever @ 2021-02-01 18:08 UTC (permalink / raw)
  To: Robin Murphy; +Cc: isaacm, iommu, will



> On Jan 28, 2021, at 9:52 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Jan 28, 2021, at 8:59 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>> 
>> On 2021-01-27 20:00, Chuck Lever wrote:
>>> Hi-
>>> This collection of patches seems to get the best throughput results
>>> so far. The NFS WRITE result is fully restored, and the NFS READ
>>> result is very close to fully restored.
>>> 	Children see throughput for 12 initial writers  = 5008474.03 kB/sec
>>> 	Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
>>> 	Min throughput per process                      = 416956.88 kB/sec
>>> 	Max throughput per process                      = 417910.22 kB/sec
>>> 	Avg throughput per process                      = 417372.84 kB/sec
>>> 	Min xfer                                        = 1046272.00 kB
>>> 	CPU Utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %
>>> 	Children see throughput for 12 rewriters        = 5020584.59 kB/sec
>>> 	Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
>>> 	Min throughput per process                      = 417799.00 kB/sec
>>> 	Max throughput per process                      = 419082.22 kB/sec
>>> 	Avg throughput per process                      = 418382.05 kB/sec
>>> 	Min xfer                                        = 1046528.00 kB
>>> 	CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %
>>> 	Children see throughput for 12 readers          = 5805484.25 kB/sec
>>> 	Parent sees throughput for 12 readers           = 5799535.68 kB/sec
>>> 	Min throughput per process                      = 482888.16 kB/sec
>>> 	Max throughput per process                      = 484444.16 kB/sec
>>> 	Avg throughput per process                      = 483790.35 kB/sec
>>> 	Min xfer                                        = 1045760.00 kB
>>> 	CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %
>>> 	Children see throughput for 12 re-readers       = 5812227.16 kB/sec
>>> 	Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
>>> 	Min throughput per process                      = 483242.97 kB/sec
>>> 	Max throughput per process                      = 485724.41 kB/sec
>>> 	Avg throughput per process                      = 484352.26 kB/sec
>>> 	Min xfer                                        = 1043456.00 kB
>>> 	CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %
>>> I've included a simple-minded implementation of a map_sg op for
>>> the Intel IOMMU. This is nothing more than a copy of the loop in
>>> __iommu_map_sg() with the call to __iommu_map() replaced with a
>>> call to intel_iommu_map().
>> 
>> ...which is the main reason I continue to strongly dislike patches #4-#9 (#3 definitely seems to make sense either way, now that #1 and #2 are going to land). If a common operation is worth optimising anywhere, then it deserves optimising everywhere, so we end up with a dozen diverging copies of essentially the same code - particularly when the driver-specific functionality *is* already in the drivers, so what gets duplicated is solely the "generic" parts.
> 
> I don't disagree with that assessment, but I don't immediately see an
> alternative API arrangement that would be more successful in the short
> term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
> do would be to revert:
> 
> - 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
> - c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
> 
> for v5.11, work out the proper API design, and then try the VT-d conversion
> again.
> 
> IMHO.

Are all y'all waiting for me to post such patches? ;-)


>> And if there's justification for pushing iommu_map_sg() entirely into drivers, then it's verging on self-contradictory not to do the same for iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as it happens - are already implementing hacks around the "one call per page" interface being inherently inefficient, so the logical thing to do here is take a step back and reconsider the fundamental design of the whole map/unmap interface. Implementing hacks on top of hacks to make particular things faster on particular systems that particular people care about is not going to do us any favours in the long run.
>> 
>> As it stands, I can easily see a weird anti-pattern emerging where people start adding code to fake up scatterlists in random drivers because they see dma_map_sg() performing paradoxically better than dma_map_page().
>> 
>> Robin.
>> 
>>> ---
>>> Chuck Lever (1):
>>>      iommu/vt-d: Introduce map_sg() for Intel IOMMUs
>>> Isaac J. Manjarres (5):
>>>      iommu/io-pgtable: Introduce map_sg() as a page table op
>>>      iommu/io-pgtable-arm: Hook up map_sg()
>>>      iommu/io-pgtable-arm-v7s: Hook up map_sg()
>>>      iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>>>      iommu/arm-smmu: Hook up map_sg()
>>> Lu Baolu (1):
>>>      iommu/vt-d: Add iotlb_sync_map callback
>>> Yong Wu (2):
>>>      iommu: Move iotlb_sync_map out from __iommu_map
>>>      iommu: Add iova and size as parameters in iotlb_sync_map
>>> drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
>>> drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
>>> drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
>>> drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
>>> drivers/iommu/iommu.c                 |  47 +++++++--
>>> drivers/iommu/tegra-gart.c            |   7 +-
>>> include/linux/iommu.h                 |  16 +++-
>>> 7 files changed, 353 insertions(+), 43 deletions(-)
>>> --
>>> Chuck Lever
> 
> --
> Chuck Lever
> 
> 
> 

--
Chuck Lever




end of thread

Thread overview: 22+ messages
2021-01-27 20:00 [PATCH RFC 0/9] Possible set of VT-d optimizations Chuck Lever
2021-01-27 20:00 ` [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map Chuck Lever
2021-01-28  2:40   ` Lu Baolu
2021-01-27 20:00 ` [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map Chuck Lever
2021-01-28  2:44   ` Lu Baolu
2021-01-28 12:51   ` Joerg Roedel
2021-01-28 13:19     ` Will Deacon
2021-01-28 13:26       ` Joerg Roedel
2021-01-27 20:00 ` [PATCH RFC 3/9] iommu/vt-d: Add iotlb_sync_map callback Chuck Lever
2021-01-27 20:00 ` [PATCH RFC 4/9] iommu/io-pgtable: Introduce map_sg() as a page table op Chuck Lever
2021-01-27 20:00 ` [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg() Chuck Lever
2021-01-28 13:53   ` Will Deacon
2021-01-27 20:01 ` [PATCH RFC 6/9] iommu/io-pgtable-arm-v7s: " Chuck Lever
2021-01-27 20:01 ` [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers Chuck Lever
2021-01-28  2:58   ` Lu Baolu
2021-01-27 20:01 ` [PATCH RFC 8/9] iommu/arm-smmu: Hook up map_sg() Chuck Lever
2021-01-27 20:01 ` [PATCH RFC 9/9] iommu/vt-d: Introduce map_sg() for Intel IOMMUs Chuck Lever
2021-01-28  2:28 ` [PATCH RFC 0/9] Possible set of VT-d optimizations Lu Baolu
2021-01-28 13:59 ` Robin Murphy
2021-01-28 14:52   ` Chuck Lever
2021-01-29 11:54     ` Lu Baolu
2021-02-01 18:08     ` Chuck Lever
