* [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-08-14 13:04 ` Robin Murphy
  0 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-14 13:04 UTC (permalink / raw)
  To: joro@8bytes.org, will.deacon@arm.com, catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.
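
[Editor's note: a rough user-space model of the contention being described — all names here are invented for illustration and none of this is kernel code. The current lookup bumps a per-group refcount around every map/unmap, so concurrent CPUs keep pulling the same cacheline into exclusive state, while a plain read of the cached default-domain pointer lets the line stay shared.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_domain { int id; };

struct fake_group {
	atomic_int refcount;                /* written on every slow lookup */
	struct fake_domain *default_domain; /* only ever read on the fast path */
};

/* Models the get/put pair hidden inside the current generic lookup:
 * the atomic read-modify-write forces exclusive cacheline ownership. */
static struct fake_domain *slow_lookup(struct fake_group *grp)
{
	atomic_fetch_add_explicit(&grp->refcount, 1, memory_order_relaxed);
	struct fake_domain *d = grp->default_domain;
	atomic_fetch_sub_explicit(&grp->refcount, 1, memory_order_relaxed);
	return d;
}

/* Models the proposed fast path: a read-only dereference, so the
 * cacheline can remain shared across all CPUs doing DMA in parallel. */
static struct fake_domain *fast_lookup(struct fake_group *grp)
{
	return grp->default_domain;
}
```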

These are a couple of quick patches based on the idea I had back when
first implementing this lot, but didn't have any way to justify at the
time. The third patch can be ignored for the sake of API discussion, but
is included for completeness. 

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
 drivers/iommu/iommu.c       |  9 +++++++++
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

-- 
2.17.1.dirty

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/3] iommu: Add fast hook for getting DMA domains
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-14 13:04     ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-14 13:04 UTC (permalink / raw)
  To: joro@8bytes.org, will.deacon@arm.com, catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.

In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommu.c | 9 +++++++++
 include/linux/iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63b37563db7e..63c586875df5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/*
+ * For IOMMU_DOMAIN_DMA implementations which already provide their own
+ * guarantees that the group and its default domain are valid and correct.
+ */
+struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+	return dev->iommu_group->default_domain;
+}
+
 /*
  * IOMMU groups are really the natrual working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee6eb31..16f2172698e5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -297,6 +297,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-- 
2.17.1.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/3] iommu/dma: Use fast DMA domain lookup
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-14 13:04     ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-14 13:04 UTC (permalink / raw)
  To: joro@8bytes.org, will.deacon@arm.com, catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

Most parts of iommu-dma already assume they are operating on a default
domain set up by iommu_dma_init_domain(), and can be converted straight
over to avoid the refcounting bottleneck. MSI page mappings may be in
an unmanaged domain with an explicit MSI-only cookie, so retain the
non-specific lookup, but that's OK since they're far from a contended
fast path either way.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8f549bdb866d..a27eedc16b32 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -510,7 +510,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
 		dma_addr_t *handle)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
 	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	*handle = IOMMU_MAPPING_ERROR;
 }
@@ -537,7 +537,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		unsigned long attrs, int prot, dma_addr_t *handle,
 		void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct page **pages;
@@ -625,9 +625,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-		size_t size, int prot)
+		size_t size, int prot, struct iommu_domain *domain)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	size_t iova_off = 0;
 	dma_addr_t iova;
@@ -651,13 +650,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, int prot)
 {
-	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
+	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 /*
@@ -745,7 +745,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		int nents, int prot)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
@@ -830,20 +830,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg = tmp;
 	}
 	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
 }
 
 dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -869,7 +870,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!msi_page)
 		return NULL;
 
-	iova = __iommu_dma_map(dev, msi_addr, size, prot);
+	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
 	if (iommu_dma_mapping_error(dev, iova))
 		goto out_free_page;
 
-- 
2.17.1.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-14 13:04     ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-14 13:04 UTC (permalink / raw)
  To: joro@8bytes.org, will.deacon@arm.com, catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.

Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).
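
[Editor's note: a toy user-space model (invented names, not kernel code) of the two ways to find the address for cache maintenance in __iommu_map_page(). The old path rediscovers the physical address via an IOVA translation walk, even though the caller already holds the struct page; the new path computes it with plain arithmetic.]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* Stands in for the IOMMU page tables walked by iommu_iova_to_phys():
 * indexed by IOVA page number, yields the page's physical base address. */
static uint64_t pt[8];

/* Old path: a translation walk per sync operation */
static uint64_t via_iova_walk(uint64_t iova)
{
	return pt[iova >> PAGE_SHIFT] + (iova & PAGE_MASK);
}

/* New path: equivalent of page_to_phys(page) + offset, no walk needed */
static uint64_t via_page(uint64_t page_phys, uint64_t offset)
{
	return page_phys + offset;
}
```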

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 61e93f0b5482..5d4144093c20 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_unmap_area(phys_to_virt(phys), size, dir);
 }
 
@@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_map_area(phys_to_virt(phys), size, dir);
 }
 
@@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    !iommu_dma_mapping_error(dev, dev_addr))
+		__dma_map_area(page_address(page), size, dir);
 
 	return dev_addr;
 }
-- 
2.17.1.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-14 13:38     ` John Garry
  -1 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2018-08-14 13:38 UTC (permalink / raw)
  To: Robin Murphy, joro@8bytes.org, will.deacon@arm.com,
	catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

On 14/08/2018 14:04, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
>
> These are a couple of quick patches based on the idea I had back when
> first implementing this lot, but didn't have any way to justify at the
> time. The third patch can be ignored for the sake of API discussion, but
> is included for completeness.
>
> Robin.
>
>
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html
>
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
>
>  arch/arm64/mm/dma-mapping.c | 10 +++++-----
>  drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
>  drivers/iommu/iommu.c       |  9 +++++++++
>  include/linux/iommu.h       |  1 +
>  4 files changed, 27 insertions(+), 16 deletions(-)
>

Thanks Robin. So we'll test this now. I anticipate a decent improvement 
across the board since we are now simply dereferencing the device's 
iommu group directly.

Cheers,
John

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-17  7:24   ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2018-08-17  7:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, joro, will.deacon, linuxarm, iommu, guohanjun,
	linux-arm-kernel

I plan to make the arm iommu dma ops generic and move them to
drivers/iommu for the 4.20 merge window.  Because of that it would
be great to create a stable branch or even pull this in through
the dma-mapping tree entirely.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-17  7:24   ` Christoph Hellwig
@ 2018-08-17  9:03       ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2018-08-17  9:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: catalin.marinas@arm.com, linuxarm@huawei.com,
	iommu@lists.linux-foundation.org, guohanjun@huawei.com,
	Robin Murphy, linux-arm-kernel@lists.infradead.org

On Fri, Aug 17, 2018 at 12:24:15AM -0700, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.  Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.

In which case, for the arm64 bits:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains
  2018-08-14 13:04     ` Robin Murphy
@ 2018-08-17  9:36       ` John Garry
  -1 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2018-08-17  9:36 UTC (permalink / raw)
  To: Robin Murphy, joro, will.deacon, catalin.marinas
  Cc: thunder.leizhen, linuxarm, iommu, liudongdong3, guohanjun,
	linux-arm-kernel

On 14/08/2018 14:04, Robin Murphy wrote:
> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
> API callers to retrieve the domain pointer, for DMA ops domains it
> doesn't scale well for large systems and multi-queue devices, since the
> momentary refcount adjustment will lead to exclusive cacheline contention
> when multiple CPUs are operating in parallel on different mappings for
> the same device.
>
> In the case of DMA ops domains, however, this refcounting is actually
> unnecessary, since they already imply that the group exists and is
> managed by platform code and IOMMU internals (by virtue of
> iommu_group_get_for_dev()) such that a reference will already be held
> for the lifetime of the device. Thus we can avoid the bottleneck by
> providing a fast lookup specifically for the DMA code to retrieve the
> default domain it already knows it has set up - a simple read-only
> dereference plays much nicer with cache-coherency protocols.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/iommu.c | 9 +++++++++
>  include/linux/iommu.h | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63b37563db7e..63c586875df5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>
> +/*
> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
> + * guarantees that the group and its default domain are valid and correct.
> + */
> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> +{
> +	return dev->iommu_group->default_domain;
> +}
> +
>  /*
>   * IOMMU groups are really the natrual working unit of the IOMMU, but
>   * the IOMMU API works on domains and devices.  Bridge that gap by
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee6eb31..16f2172698e5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -297,6 +297,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  extern void iommu_detach_device(struct iommu_domain *domain,
>  				struct device *dev);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> +extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);

Hi Robin,

I was wondering whether it's standard to provide a stubbed version of 
this function for !CONFIG_IOMMU_API?

Cheers,
John

>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		     phys_addr_t paddr, size_t size, int prot);
>  extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains
  2018-08-17  9:36       ` John Garry
@ 2018-08-17 11:11           ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-17 11:11 UTC (permalink / raw)
  To: John Garry, joro@8bytes.org, will.deacon@arm.com,
	catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

On 17/08/18 10:36, John Garry wrote:
> On 14/08/2018 14:04, Robin Murphy wrote:
>> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
>> API callers to retrieve the domain pointer, for DMA ops domains it
>> doesn't scale well for large systems and multi-queue devices, since the
>> momentary refcount adjustment will lead to exclusive cacheline contention
>> when multiple CPUs are operating in parallel on different mappings for
>> the same device.
>>
>> In the case of DMA ops domains, however, this refcounting is actually
>> unnecessary, since they already imply that the group exists and is
>> managed by platform code and IOMMU internals (by virtue of
>> iommu_group_get_for_dev()) such that a reference will already be held
>> for the lifetime of the device. Thus we can avoid the bottleneck by
>> providing a fast lookup specifically for the DMA code to retrieve the
>> default domain it already knows it has set up - a simple read-only
>> dereference plays much nicer with cache-coherency protocols.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>  drivers/iommu/iommu.c | 9 +++++++++
>>  include/linux/iommu.h | 1 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 63b37563db7e..63c586875df5 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1379,6 +1379,15 @@ struct iommu_domain 
>> *iommu_get_domain_for_dev(struct device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>>
>> +/*
>> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
>> + * guarantees that the group and its default domain are valid and 
>> correct.
>> + */
>> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
>> +{
>> +    return dev->iommu_group->default_domain;
>> +}
>> +
>>  /*
>>   * IOMMU groups are really the natrual working unit of the IOMMU, but
>>   * the IOMMU API works on domains and devices.  Bridge that gap by
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 19938ee6eb31..16f2172698e5 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -297,6 +297,7 @@ extern int iommu_attach_device(struct iommu_domain 
>> *domain,
>>  extern void iommu_detach_device(struct iommu_domain *domain,
>>                  struct device *dev);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device 
>> *dev);
>> +extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
> 
> Hi Robin,
> 
> I was wondering whether it's standard to provide a stubbed version of 
> this function for !CONFIG_IOMMU_API?

Nope, that's deliberate - this is one of those hooks where any 
legitimate caller will already be dependent on IOMMU_API itself.

Robin.

> 
> Cheers,
> John
> 
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>>               phys_addr_t paddr, size_t size, int prot);
>>  extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long 
>> iova,
>>
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-17  7:24   ` Christoph Hellwig
@ 2018-08-17 11:30       ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-17 11:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: catalin.marinas@arm.com, will.deacon@arm.com, linuxarm@huawei.com,
	iommu@lists.linux-foundation.org, guohanjun@huawei.com,
	linux-arm-kernel@lists.infradead.org

On 17/08/18 08:24, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.

You mean 32-bit arm? The only place that code should move to is 
/dev/null ;) - the plan has always been to convert it to use groups and 
default domains so it can just plug iommu-dma in. The tricky bit is 
either weaning the users of the private arm_iommu_*() API onto generic 
IOMMU API usage, or at least implementing transitional wrappers in a way 
that doesn't break anything.

>  Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.

FWIW I might still need to tweak the first patch depending on how we 
choose to resolve the interaction between DMA ops and unmanaged 
domains[1] - if we want to add fallback paths to iommu-dma instead of 
swizzling device ops then the assumptions for this hook change and it 
will need to reference group->domain rather than group->default_domain.

Robin.


[1] https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg230275.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-17 11:30       ` Robin Murphy
@ 2018-08-17 12:01         ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2018-08-17 12:01 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, joro, will.deacon, linuxarm, Christoph Hellwig,
	iommu, guohanjun, linux-arm-kernel

On Fri, Aug 17, 2018 at 12:30:31PM +0100, Robin Murphy wrote:
> On 17/08/18 08:24, Christoph Hellwig wrote:
> > I plan to make the arm iommu dma ops generic and move them to
> > drivers/iommu for the 4.20 merge window.
> 
> You mean 32-bit arm?

Sorry, I meant the arm64 wrappers for dma-iommu of course.

> The only place that code should move to is /dev/null ;)
> - the plan has always been to convert it to use groups and default domains
> so it can just plug iommu-dma in. The tricky bit is either weaning the users
> of the private arm_iommu_*() API onto generic IOMMU API usage, or at least
> implementing transitional wrappers in a way that doesn't break anything.

Agreed on the arm32 iommu code.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-08-14 13:04 ` Robin Murphy
@ 2018-08-17 13:03     ` John Garry
  -1 siblings, 0 replies; 28+ messages in thread
From: John Garry @ 2018-08-17 13:03 UTC (permalink / raw)
  To: Robin Murphy, joro@8bytes.org, will.deacon@arm.com,
	catalin.marinas@arm.com
  Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org

On 14/08/2018 14:04, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
>
> These are a couple of quick patches based on the idea I had back when
> first implementing this lot, but didn't have any way to justify at the
> time. The third patch can be ignored for the sake of API discussion, but
> is included for completeness.
>
> Robin.
>

Some results:

PCIe NIC iperf test (128 processes, small packets):
Without patchset:
289232.00 rxpck/s

With patchset:
367283 rxpck/s

JFYI, with Leizhen's non-strict mode patchset + Robin's patchset:
1215539 rxpck/s

Leizhen can share non-strict mode results in his own patchset however.

We did also try the storage controller fio test with a lot of SAS SSD 
disks (24 disks, 24 fio processes) for Robin's patchset only, but did 
not see a significant change.

Thanks to Dongdong + chenxiang for testing.

Let me know if you require more info.

Thanks again,
John


>
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html
>
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
>
>  arch/arm64/mm/dma-mapping.c | 10 +++++-----
>  drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
>  drivers/iommu/iommu.c       |  9 +++++++++
>  include/linux/iommu.h       |  1 +
>  4 files changed, 27 insertions(+), 16 deletions(-)
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] iommu: Add fast hook for getting DMA domains
  2018-08-14 13:04     ` Robin Murphy
@ 2018-08-17 15:27         ` Laurentiu Tudor
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurentiu Tudor @ 2018-08-17 15:27 UTC (permalink / raw)
  To: Robin Murphy, joro@8bytes.org, will.deacon@arm.com,
	catalin.marinas@arm.com
  Cc: iommu@lists.linux-foundation.org, linuxarm@huawei.com,
	linux-arm-kernel@lists.infradead.org, guohanjun@huawei.com

Hi Robin,

On 14.08.2018 16:04, Robin Murphy wrote:
> While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
> API callers to retrieve the domain pointer, for DMA ops domains it
> doesn't scale well for large systems and multi-queue devices, since the
> momentary refcount adjustment will lead to exclusive cacheline contention
> when multiple CPUs are operating in parallel on different mappings for
> the same device.
> 
> In the case of DMA ops domains, however, this refcounting is actually
> unnecessary, since they already imply that the group exists and is
> managed by platform code and IOMMU internals (by virtue of
> iommu_group_get_for_dev()) such that a reference will already be held
> for the lifetime of the device. Thus we can avoid the bottleneck by
> providing a fast lookup specifically for the DMA code to retrieve the
> default domain it already knows it has set up - a simple read-only
> dereference plays much nicer with cache-coherency protocols.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/iommu/iommu.c | 9 +++++++++
>   include/linux/iommu.h | 1 +
>   2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63b37563db7e..63c586875df5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1379,6 +1379,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
>   
> +/*
> + * For IOMMU_DOMAIN_DMA implementations which already provide their own
> + * guarantees that the group and its default domain are valid and correct.
> + */
> +struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> +{
> +	return dev->iommu_group->default_domain;
> +}

After some preliminary tests I'm seeing a ~10% performance improvement 
on one of our chips (nxp ls1046a) which is pretty nice. :-)
Any chance of making the function inline?
If not, shouldn't an EXPORT_SYMBOL_GPL be added?

---
Thanks & Best Regards, Laurentiu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
  2018-08-14 13:04     ` Robin Murphy
@ 2018-08-20 15:41         ` Robin Murphy
  -1 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-20 15:41 UTC (permalink / raw)
  To: joro@8bytes.org, will.deacon@arm.com, catalin.marinas@arm.com
  Cc: iommu@lists.linux-foundation.org, linuxarm@huawei.com,
	linux-arm-kernel@lists.infradead.org, guohanjun@huawei.com

On 14/08/18 14:04, Robin Murphy wrote:
> Whilst the symmetry of deferring to the existing sync callback in
> __iommu_map_page() is nice, taking a round-trip through
> iommu_iova_to_phys() is a pretty heavyweight way to get an address we
> can trivially compute from the page we already have. Tweaking it to just
> perform the cache maintenance directly when appropriate doesn't really
> make the code any more complicated, and the runtime efficiency gain can
> only be a benefit.
> 
> Furthermore, the sync operations themselves know they can only be
> invoked on a managed DMA ops domain, so can use the fast specific domain
> lookup to avoid excessive manipulation of the group refcount
> (particularly in the scatterlist cases).
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   arch/arm64/mm/dma-mapping.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 61e93f0b5482..5d4144093c20 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
>   	if (is_device_dma_coherent(dev))
>   		return;
>   
> -	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>   	__dma_unmap_area(phys_to_virt(phys), size, dir);
>   }
>   
> @@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
>   	if (is_device_dma_coherent(dev))
>   		return;
>   
> -	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>   	__dma_map_area(phys_to_virt(phys), size, dir);
>   }
>   
> @@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
>   	int prot = dma_info_to_prot(dir, coherent, attrs);
>   	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
>   
> -	if (!iommu_dma_mapping_error(dev, dev_addr) &&
> -	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
> -		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
> +	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> +	    !iommu_dma_mapping_error(dev, dev_addr))
> +		__dma_map_area(page_address(page), size, dir);

                                                  + offset

Sigh...

And there I was assuming that if I'd bothered to write up a proper 
commit message for the 15-month-old change I was cherry-picking, I must 
have actually tested it at the time.

Robin.

>   
>   	return dev_addr;
>   }
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
@ 2018-08-20 15:41         ` Robin Murphy
  0 siblings, 0 replies; 28+ messages in thread
From: Robin Murphy @ 2018-08-20 15:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 14/08/18 14:04, Robin Murphy wrote:
> Whilst the symmetry of deferring to the existing sync callback in
> __iommu_map_page() is nice, taking a round-trip through
> iommu_iova_to_phys() is a pretty heavyweight way to get an address we
> can trivially compute from the page we already have. Tweaking it to just
> perform the cache maintenance directly when appropriate doesn't really
> make the code any more complicated, and the runtime efficiency gain can
> only be a benefit.
> 
> Furthermore, the sync operations themselves know they can only be
> invoked on a managed DMA ops domain, so can use the fast specific domain
> lookup to avoid excessive manipulation of the group refcount
> (particularly in the scatterlist cases).
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   arch/arm64/mm/dma-mapping.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 61e93f0b5482..5d4144093c20 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
>   	if (is_device_dma_coherent(dev))
>   		return;
>   
> -	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>   	__dma_unmap_area(phys_to_virt(phys), size, dir);
>   }
>   
> @@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
>   	if (is_device_dma_coherent(dev))
>   		return;
>   
> -	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>   	__dma_map_area(phys_to_virt(phys), size, dir);
>   }
>   
> @@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
>   	int prot = dma_info_to_prot(dir, coherent, attrs);
>   	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
>   
> -	if (!iommu_dma_mapping_error(dev, dev_addr) &&
> -	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
> -		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
> +	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> +	    !iommu_dma_mapping_error(dev, dev_addr))
> +		__dma_map_area(page_address(page), size, dir);

                                                  + offset

Sigh...

And there I was assuming that if I'd bothered to write up a proper 
commit message for the 15-month-old change I was cherry-picking, I must 
have actually tested it at the time.

Robin.

>   
>   	return dev_addr;
>   }
> 


Thread overview: 28+ messages (newest: 2018-08-20 15:41 UTC)

2018-08-14 13:04 [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention Robin Murphy
2018-08-14 13:04 ` [PATCH 1/3] iommu: Add fast hook for getting DMA domains Robin Murphy
2018-08-17  9:36   ` John Garry
2018-08-17 11:11     ` Robin Murphy
2018-08-17 15:27   ` Laurentiu Tudor
2018-08-14 13:04 ` [PATCH 2/3] iommu/dma: Use fast DMA domain lookup Robin Murphy
2018-08-14 13:04 ` [PATCH 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops Robin Murphy
2018-08-20 15:41   ` Robin Murphy
2018-08-14 13:38 ` [PATCH 0/3] iommu: Avoid DMA ops domain refcount contention John Garry
2018-08-17 13:03 ` John Garry
2018-08-17  7:24 ` Christoph Hellwig
2018-08-17  9:03   ` Will Deacon
2018-08-17 11:30   ` Robin Murphy
2018-08-17 12:01     ` Christoph Hellwig
