All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-12 15:24 ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
 drivers/iommu/iommu.c       |  9 +++++++++
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

-- 
2.19.0.dirty

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-12 15:24 ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
 drivers/iommu/iommu.c       |  9 +++++++++
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

-- 
2.19.0.dirty

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/3] iommu: Add fast hook for getting DMA domains
  2018-09-12 15:24 ` Robin Murphy
@ 2018-09-12 15:24     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.

In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/iommu.c | 9 +++++++++
 include/linux/iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..9d70344204fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1415,6 +1415,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/*
+ * For IOMMU_DOMAIN_DMA implementations which already provide their own
+ * guarantees that the group and its default domain are valid and correct.
+ */
+struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+	return dev->iommu_group->default_domain;
+}
+
 /*
  * IOMMU groups are really the natrual working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..c783648d4060 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -293,6 +293,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 1/3] iommu: Add fast hook for getting DMA domains
@ 2018-09-12 15:24     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.

In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/iommu.c | 9 +++++++++
 include/linux/iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..9d70344204fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1415,6 +1415,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/*
+ * For IOMMU_DOMAIN_DMA implementations which already provide their own
+ * guarantees that the group and its default domain are valid and correct.
+ */
+struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+	return dev->iommu_group->default_domain;
+}
+
 /*
  * IOMMU groups are really the natrual working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..c783648d4060 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -293,6 +293,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-12 15:24 ` Robin Murphy
@ 2018-09-12 15:24     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Most parts of iommu-dma already assume they are operating on a default
domain set up by iommu_dma_init_domain(), and can be converted straight
over to avoid the refcounting bottleneck. MSI page mappings may be in
an unmanaged domain with an explicit MSI-only cookie, so retain the
non-specific lookup, but that's OK since they're far from a contended
fast path either way.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..320f9ea82f3f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
 		dma_addr_t *handle)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
 	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	*handle = IOMMU_MAPPING_ERROR;
 }
@@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		unsigned long attrs, int prot, dma_addr_t *handle,
 		void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct page **pages;
@@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-		size_t size, int prot)
+		size_t size, int prot, struct iommu_domain *domain)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	size_t iova_off = 0;
 	dma_addr_t iova;
@@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, int prot)
 {
-	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
+	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 /*
@@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		int nents, int prot)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
@@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg = tmp;
 	}
 	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
 }
 
 dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!msi_page)
 		return NULL;
 
-	iova = __iommu_dma_map(dev, msi_addr, size, prot);
+	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
 	if (iommu_dma_mapping_error(dev, iova))
 		goto out_free_page;
 
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-12 15:24     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

Most parts of iommu-dma already assume they are operating on a default
domain set up by iommu_dma_init_domain(), and can be converted straight
over to avoid the refcounting bottleneck. MSI page mappings may be in
an unmanaged domain with an explicit MSI-only cookie, so retain the
non-specific lookup, but that's OK since they're far from a contended
fast path either way.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..320f9ea82f3f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
 		dma_addr_t *handle)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
 	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	*handle = IOMMU_MAPPING_ERROR;
 }
@@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		unsigned long attrs, int prot, dma_addr_t *handle,
 		void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct page **pages;
@@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-		size_t size, int prot)
+		size_t size, int prot, struct iommu_domain *domain)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	size_t iova_off = 0;
 	dma_addr_t iova;
@@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, int prot)
 {
-	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
+	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 /*
@@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		int nents, int prot)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
@@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		sg = tmp;
 	}
 	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
 }
 
 dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+			iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!msi_page)
 		return NULL;
 
-	iova = __iommu_dma_map(dev, msi_addr, size, prot);
+	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
 	if (iommu_dma_mapping_error(dev, iova))
 		goto out_free_page;
 
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
  2018-09-12 15:24 ` Robin Murphy
@ 2018-09-12 15:24     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.

Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).

Acked-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---

v2: Don't be totally broken by forgetting the offset

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 072c51fb07d7..cf017c5bb5e7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_unmap_area(phys_to_virt(phys), size, dir);
 }
 
@@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_map_area(phys_to_virt(phys), size, dir);
 }
 
@@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    !iommu_dma_mapping_error(dev, dev_addr))
+		__dma_map_area(page_address(page) + offset, size, dir);
 
 	return dev_addr;
 }
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
@ 2018-09-12 15:24     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-12 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.

Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: Don't be totally broken by forgetting the offset

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 072c51fb07d7..cf017c5bb5e7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_unmap_area(phys_to_virt(phys), size, dir);
 }
 
@@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
 	if (is_device_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
 	__dma_map_area(phys_to_virt(phys), size, dir);
 }
 
@@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    !iommu_dma_mapping_error(dev, dev_addr))
+		__dma_map_area(page_address(page) + offset, size, dir);
 
 	return dev_addr;
 }
-- 
2.19.0.dirty

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-12 15:24 ` Robin Murphy
@ 2018-09-14 12:48     ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2018-09-14 12:48 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas-5wv7dgnIgG8, guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Robin,

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
> 
> These are a couple of quick patches based on the idea I had back when
> first implementing iommu-dma, but didn't have any way to justify at the
> time. However, the reports of 10-25% better networking performance on v1
> suggest that it's very worthwhile (and far more significant than I ever
> would have guessed).
> 
> As far as merging goes, I don't mind at all whether this goes via IOMMU,
> or via dma-mapping provided Joerg's happy to ack it.

I think it makes most sense for Joerg to take this series via his tree.

Anyway, I've been running them on my TX2 box and things are happy enough,
so:

Tested-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-14 12:48     ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2018-09-14 12:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
> 
> These are a couple of quick patches based on the idea I had back when
> first implementing iommu-dma, but didn't have any way to justify at the
> time. However, the reports of 10-25% better networking performance on v1
> suggest that it's very worthwhile (and far more significant than I ever
> would have guessed).
> 
> As far as merging goes, I don't mind at all whether this goes via IOMMU,
> or via dma-mapping provided Joerg's happy to ack it.

I think it makes most sense for Joerg to take this series via his tree.

Anyway, I've been running them on my TX2 box and things are happy enough,
so:

Tested-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-14 12:48     ` Will Deacon
@ 2018-09-17 11:20         ` John Garry
  -1 siblings, 0 replies; 40+ messages in thread
From: John Garry @ 2018-09-17 11:20 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas-5wv7dgnIgG8, guohanjun-hv44wF8Li93QT0dZR+AlfA,
	Will Deacon, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 14/09/2018 13:48, Will Deacon wrote:
> Hi Robin,
>

Hi Robin,

I just spoke with Dongdong and we will test this version also so that we 
may provide a "Tested-by" tag.

Thanks,
John

> On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
>> John raised the issue[1] that we have some unnecessary refcount contention
>> in the DMA ops path which shows scalability problems now that we have more
>> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
>> sidestepping this by stashing domain references in archdata, but since
>> that's not very nice for architecture-agnostic code, I think it's time to
>> look at a generic API-level solution.
>>
>> These are a couple of quick patches based on the idea I had back when
>> first implementing iommu-dma, but didn't have any way to justify at the
>> time. However, the reports of 10-25% better networking performance on v1
>> suggest that it's very worthwhile (and far more significant than I ever
>> would have guessed).
>>
>> As far as merging goes, I don't mind at all whether this goes via IOMMU,
>> or via dma-mapping provided Joerg's happy to ack it.
>
> I think it makes most sense for Joerg to take this series via his tree.
>
> Anyway, I've been running them on my TX2 box and things are happy enough,
> so:
>
> Tested-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
>
> Will
>
> .
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-17 11:20         ` John Garry
  0 siblings, 0 replies; 40+ messages in thread
From: John Garry @ 2018-09-17 11:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 14/09/2018 13:48, Will Deacon wrote:
> Hi Robin,
>

Hi Robin,

I just spoke with Dongdong and we will test this version also so that we 
may provide a "Tested-by" tag.

Thanks,
John

> On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
>> John raised the issue[1] that we have some unnecessary refcount contention
>> in the DMA ops path which shows scalability problems now that we have more
>> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
>> sidestepping this by stashing domain references in archdata, but since
>> that's not very nice for architecture-agnostic code, I think it's time to
>> look at a generic API-level solution.
>>
>> These are a couple of quick patches based on the idea I had back when
>> first implementing iommu-dma, but didn't have any way to justify at the
>> time. However, the reports of 10-25% better networking performance on v1
>> suggest that it's very worthwhile (and far more significant than I ever
>> would have guessed).
>>
>> As far as merging goes, I don't mind at all whether this goes via IOMMU,
>> or via dma-mapping provided Joerg's happy to ack it.
>
> I think it makes most sense for Joerg to take this series via his tree.
>
> Anyway, I've been running them on my TX2 box and things are happy enough,
> so:
>
> Tested-by: Will Deacon <will.deacon@arm.com>
>
> Will
>
> .
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-14 12:48     ` Will Deacon
@ 2018-09-17 13:33         ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2018-09-17 13:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas-5wv7dgnIgG8, guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, Tom Murphy,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Robin Murphy,
	hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Sep 14, 2018 at 01:48:59PM +0100, Will Deacon wrote:
> > As far as merging goes, I don't mind at all whether this goes via IOMMU,
> > or via dma-mapping provided Joerg's happy to ack it.
> 
> I think it makes most sense for Joerg to take this series via his tree.

FYI, I have WIP patches to move the arm dma-iommu wrappers to common
code:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-maybe-coherent

which will require some synchronization of the involved trees.  In the
end it is iommu code, so the actual iommu patches should probably
go into the iommu tree after all, but it might have to pull in the
dma-mapping branch for that.  Or we just punt it until the next merge
window.  Not sure how fast Tom needs the common dma-iommu code for
the x86 AMD iommu conversion.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-17 13:33         ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2018-09-17 13:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 14, 2018 at 01:48:59PM +0100, Will Deacon wrote:
> > As far as merging goes, I don't mind at all whether this goes via IOMMU,
> > or via dma-mapping provided Joerg's happy to ack it.
> 
> I think it makes most sense for Joerg to take this series via his tree.

FYI, I have WIP patches to move the arm dma-iommu wrappers to common
code:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-maybe-coherent

which will require some synchronization of the involved trees.  In the
end it is iommu code, so the actual iommu patches should probably
go into the iommu tree after all, but it might have to pull in the
dma-mapping branch for that.  Or we just punt it until the next merge
window.  Not sure how fast Tom needs the common dma-iommu code for
the x86 AMD iommu conversion.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-17 13:33         ` Christoph Hellwig
@ 2018-09-18 13:28             ` Tom Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Tom Murphy @ 2018-09-18 13:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	Will Deacon, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Robin Murphy,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

>Not sure how fast Tom needs the common dma-iommu code for the x86 AMD iommu conversion.

I am currently busy working on something else and won't be able to
do/test the x86 AMD iommu conversion anytime soon. So I don't need the
common dma-iommu code anytime soon.

On 17 September 2018 at 14:33, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> On Fri, Sep 14, 2018 at 01:48:59PM +0100, Will Deacon wrote:
>> > As far as merging goes, I don't mind at all whether this goes via IOMMU,
>> > or via dma-mapping provided Joerg's happy to ack it.
>>
>> I think it makes most sense for Joerg to take this series via his tree.
>
> FYI, I have WIP patches to move the arm dma-iommu wrappers to common
> code:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-maybe-coherent
>
> which will require some synchronization of the involved trees.  In the
> end it is iommu code, so the actual iommu patches should probably
> go into the iommu tree after all, but it might have to pull in the
> dma-mapping branch for that.  Or we just punt it until the next merge
> window.  Not sure how fast Tom needs the common dma-iommu code for
> the x86 AMD iommu conversion.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-18 13:28             ` Tom Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Tom Murphy @ 2018-09-18 13:28 UTC (permalink / raw)
  To: linux-arm-kernel

>Not sure how fast Tom needs the common dma-iommu code for the x86 AMD iommu conversion.

I am currently busy working on something else and won't be able to
do/test the x86 AMD iommu conversion anytime soon. So I don't need the
common dma-iommu code anytime soon.

On 17 September 2018 at 14:33, Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Sep 14, 2018 at 01:48:59PM +0100, Will Deacon wrote:
>> > As far as merging goes, I don't mind at all whether this goes via IOMMU,
>> > or via dma-mapping provided Joerg's happy to ack it.
>>
>> I think it makes most sense for Joerg to take this series via his tree.
>
> FYI, I have WIP patches to move the arm dma-iommu wrappers to common
> code:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-maybe-coherent
>
> which will require some synchronization of the involved trees.  In the
> end it is iommu code, so the actual iommu patches should probably
> go into the iommu tree after all, but it might have to pull in the
> dma-mapping branch for that.  Or we just punt it until the next merge
> window.  Not sure how fast Tom needs the common dma-iommu code for
> the x86 AMD iommu conversion.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-17 11:20         ` John Garry
@ 2018-09-20 12:51             ` John Garry
  -1 siblings, 0 replies; 40+ messages in thread
From: John Garry @ 2018-09-20 12:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas-5wv7dgnIgG8, Will Deacon,
	linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA, hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 17/09/2018 12:20, John Garry wrote:
> On 14/09/2018 13:48, Will Deacon wrote:
>> Hi Robin,
>>
>
> Hi Robin,
>
> I just spoke with Dongdong and we will test this version also so that we
> may provide a "Tested-by" tag.
>

I tested this, so for series:
Tested-by: John Garry <john.garry-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Thanks,
John

> Thanks,
> John
>
>> On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
>>> John raised the issue[1] that we have some unnecessary refcount
>>> contention
>>> in the DMA ops path which shows scalability problems now that we have
>>> more
>>> real high-performance hardware using iommu-dma. The x86 IOMMU drivers
>>> are
>>> sidestepping this by stashing domain references in archdata, but since
>>> that's not very nice for architecture-agnostic code, I think it's
>>> time to
>>> look at a generic API-level solution.
>>>
>>> These are a couple of quick patches based on the idea I had back when
>>> first implementing iommu-dma, but didn't have any way to justify at the
>>> time. However, the reports of 10-25% better networking performance on v1
>>> suggest that it's very worthwhile (and far more significant than I ever
>>> would have guessed).
>>>
>>> As far as merging goes, I don't mind at all whether this goes via IOMMU,
>>> or via dma-mapping provided Joerg's happy to ack it.
>>
>> I think it makes most sense for Joerg to take this series via his tree.
>>
>> Anyway, I've been running them on my TX2 box and things are happy enough,
>> so:
>>
>> Tested-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
>>
>> Will
>>
>> .
>>
>
>
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
> .
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-20 12:51             ` John Garry
  0 siblings, 0 replies; 40+ messages in thread
From: John Garry @ 2018-09-20 12:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/09/2018 12:20, John Garry wrote:
> On 14/09/2018 13:48, Will Deacon wrote:
>> Hi Robin,
>>
>
> Hi Robin,
>
> I just spoke with Dongdong and we will test this version also so that we
> may provide a "Tested-by" tag.
>

I tested this, so for series:
Tested-by: John Garry <john.garry@huawei.com>

Thanks,
John

> Thanks,
> John
>
>> On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
>>> John raised the issue[1] that we have some unnecessary refcount
>>> contention
>>> in the DMA ops path which shows scalability problems now that we have
>>> more
>>> real high-performance hardware using iommu-dma. The x86 IOMMU drivers
>>> are
>>> sidestepping this by stashing domain references in archdata, but since
>>> that's not very nice for architecture-agnostic code, I think it's
>>> time to
>>> look at a generic API-level solution.
>>>
>>> These are a couple of quick patches based on the idea I had back when
>>> first implementing iommu-dma, but didn't have any way to justify at the
>>> time. However, the reports of 10-25% better networking performance on v1
>>> suggest that it's very worthwhile (and far more significant than I ever
>>> would have guessed).
>>>
>>> As far as merging goes, I don't mind at all whether this goes via IOMMU,
>>> or via dma-mapping provided Joerg's happy to ack it.
>>
>> I think it makes most sense for Joerg to take this series via his tree.
>>
>> Anyway, I've been running them on my TX2 box and things are happy enough,
>> so:
>>
>> Tested-by: Will Deacon <will.deacon@arm.com>
>>
>> Will
>>
>> .
>>
>
>
> _______________________________________________
> iommu mailing list
> iommu at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
> .
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
  2018-09-12 15:24 ` Robin Murphy
@ 2018-09-25  8:24     ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2018-09-25  8:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: guohanjun-hv44wF8Li93QT0dZR+AlfA, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, linuxarm-hv44wF8Li93QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

Applied, thanks Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention
@ 2018-09-25  8:24     ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2018-09-25  8:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

Applied, thanks Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-12 15:24     ` Robin Murphy
@ 2018-09-28 13:26         ` Marek Szyprowski
  -1 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 13:26 UTC (permalink / raw)
  To: Robin Murphy, joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi All,

On 2018-09-12 17:24, Robin Murphy wrote:
> Most parts of iommu-dma already assume they are operating on a default
> domain set up by iommu_dma_init_domain(), and can be converted straight
> over to avoid the refcounting bottleneck. MSI page mappings may be in
> an unmanaged domain with an explicit MSI-only cookie, so retain the
> non-specific lookup, but that's OK since they're far from a contended
> fast path either way.
>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>

This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
Exynos DRM creates it's own domain, attach all devices which performs
DMA access (typically CRTC devices) to it and uses standard DMA-mapping
calls to allocate/map buffers. This way it can use the same one DMA
address for each buffer regardless of the CRTC/display/processing
device.

This no longer works with this patch. The simplest solution to fix this
is add API to change default_domain to the one allocated by the Exynos
DRM driver.

> ---
>   drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
>   1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 511ff9a1d6d9..320f9ea82f3f 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>   void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
>   		dma_addr_t *handle)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
>   	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
>   	*handle = IOMMU_MAPPING_ERROR;
>   }
> @@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>   		unsigned long attrs, int prot, dma_addr_t *handle,
>   		void (*flush_page)(struct device *, const void *, phys_addr_t))
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct page **pages;
> @@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
>   }
>   
>   static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
> -		size_t size, int prot)
> +		size_t size, int prot, struct iommu_domain *domain)
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	size_t iova_off = 0;
>   	dma_addr_t iova;
> @@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>   dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   		unsigned long offset, size_t size, int prot)
>   {
> -	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
> +	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
> +			iommu_get_dma_domain(dev));
>   }
>   
>   void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>   }
>   
>   /*
> @@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
>   int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		int nents, int prot)
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct scatterlist *s, *prev = NULL;
> @@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>   		sg = tmp;
>   	}
>   	end = sg_dma_address(sg) + sg_dma_len(sg);
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
>   }
>   
>   dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	return __iommu_dma_map(dev, phys, size,
> -			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
> +			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
> +			iommu_get_dma_domain(dev));
>   }
>   
>   void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>   }
>   
>   int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> @@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   	if (!msi_page)
>   		return NULL;
>   
> -	iova = __iommu_dma_map(dev, msi_addr, size, prot);
> +	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
>   	if (iommu_dma_mapping_error(dev, iova))
>   		goto out_free_page;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 13:26         ` Marek Szyprowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi All,

On 2018-09-12 17:24, Robin Murphy wrote:
> Most parts of iommu-dma already assume they are operating on a default
> domain set up by iommu_dma_init_domain(), and can be converted straight
> over to avoid the refcounting bottleneck. MSI page mappings may be in
> an unmanaged domain with an explicit MSI-only cookie, so retain the
> non-specific lookup, but that's OK since they're far from a contended
> fast path either way.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
Exynos DRM creates it's own domain, attach all devices which performs
DMA access (typically CRTC devices) to it and uses standard DMA-mapping
calls to allocate/map buffers. This way it can use the same one DMA
address for each buffer regardless of the CRTC/display/processing
device.

This no longer works with this patch. The simplest solution to fix this
is add API to change default_domain to the one allocated by the Exynos
DRM driver.

> ---
>   drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
>   1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 511ff9a1d6d9..320f9ea82f3f 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>   void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
>   		dma_addr_t *handle)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
>   	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
>   	*handle = IOMMU_MAPPING_ERROR;
>   }
> @@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>   		unsigned long attrs, int prot, dma_addr_t *handle,
>   		void (*flush_page)(struct device *, const void *, phys_addr_t))
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct page **pages;
> @@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
>   }
>   
>   static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
> -		size_t size, int prot)
> +		size_t size, int prot, struct iommu_domain *domain)
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	size_t iova_off = 0;
>   	dma_addr_t iova;
> @@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>   dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>   		unsigned long offset, size_t size, int prot)
>   {
> -	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
> +	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
> +			iommu_get_dma_domain(dev));
>   }
>   
>   void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>   }
>   
>   /*
> @@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
>   int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		int nents, int prot)
>   {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct scatterlist *s, *prev = NULL;
> @@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>   		sg = tmp;
>   	}
>   	end = sg_dma_address(sg) + sg_dma_len(sg);
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
>   }
>   
>   dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	return __iommu_dma_map(dev, phys, size,
> -			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
> +			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
> +			iommu_get_dma_domain(dev));
>   }
>   
>   void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>   }
>   
>   int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> @@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   	if (!msi_page)
>   		return NULL;
>   
> -	iova = __iommu_dma_map(dev, msi_addr, size, prot);
> +	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
>   	if (iommu_dma_mapping_error(dev, iova))
>   		goto out_free_page;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 13:26         ` Marek Szyprowski
@ 2018-09-28 13:52             ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 13:52 UTC (permalink / raw)
  To: Marek Szyprowski, joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 28/09/18 14:26, Marek Szyprowski wrote:
> Hi All,
> 
> On 2018-09-12 17:24, Robin Murphy wrote:
>> Most parts of iommu-dma already assume they are operating on a default
>> domain set up by iommu_dma_init_domain(), and can be converted straight
>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>> non-specific lookup, but that's OK since they're far from a contended
>> fast path either way.
>>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> 
> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
> Exynos DRM creates it's own domain, attach all devices which performs
> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
> calls to allocate/map buffers. This way it can use the same one DMA
> address for each buffer regardless of the CRTC/display/processing
> device.
> 
> This no longer works with this patch. The simplest solution to fix this
> is add API to change default_domain to the one allocated by the Exynos
> DRM driver.

Ugh, I hadn't realised there were any drivers trying to fake up their 
own default domains - that's always going to be fragile. In fact, one of 
my proposals for un-breaking Tegra and other DRM drivers involves 
preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA 
domains at all, which would stop that from working permanently.

Can Exynos not put all the DRM components into a single group, or at 
least just pick one of the real default domains to attach everyone to 
instead of allocating a fake one?

Robin.

>> ---
>>    drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
>>    1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 511ff9a1d6d9..320f9ea82f3f 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>>    void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
>>    		dma_addr_t *handle)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
>>    	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
>>    	*handle = IOMMU_MAPPING_ERROR;
>>    }
>> @@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>>    		unsigned long attrs, int prot, dma_addr_t *handle,
>>    		void (*flush_page)(struct device *, const void *, phys_addr_t))
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	struct iova_domain *iovad = &cookie->iovad;
>>    	struct page **pages;
>> @@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
>>    }
>>    
>>    static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>> -		size_t size, int prot)
>> +		size_t size, int prot, struct iommu_domain *domain)
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	size_t iova_off = 0;
>>    	dma_addr_t iova;
>> @@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>>    dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>>    		unsigned long offset, size_t size, int prot)
>>    {
>> -	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
>> +	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
>> +			iommu_get_dma_domain(dev));
>>    }
>>    
>>    void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
>>    		enum dma_data_direction dir, unsigned long attrs)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>>    }
>>    
>>    /*
>> @@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
>>    int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>    		int nents, int prot)
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	struct iova_domain *iovad = &cookie->iovad;
>>    	struct scatterlist *s, *prev = NULL;
>> @@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>>    		sg = tmp;
>>    	}
>>    	end = sg_dma_address(sg) + sg_dma_len(sg);
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
>>    }
>>    
>>    dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
>>    		size_t size, enum dma_data_direction dir, unsigned long attrs)
>>    {
>>    	return __iommu_dma_map(dev, phys, size,
>> -			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
>> +			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
>> +			iommu_get_dma_domain(dev));
>>    }
>>    
>>    void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
>>    		size_t size, enum dma_data_direction dir, unsigned long attrs)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>>    }
>>    
>>    int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>> @@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>    	if (!msi_page)
>>    		return NULL;
>>    
>> -	iova = __iommu_dma_map(dev, msi_addr, size, prot);
>> +	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
>>    	if (iommu_dma_mapping_error(dev, iova))
>>    		goto out_free_page;
>>    
> 
> Best regards
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 13:52             ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 13:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/09/18 14:26, Marek Szyprowski wrote:
> Hi All,
> 
> On 2018-09-12 17:24, Robin Murphy wrote:
>> Most parts of iommu-dma already assume they are operating on a default
>> domain set up by iommu_dma_init_domain(), and can be converted straight
>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>> non-specific lookup, but that's OK since they're far from a contended
>> fast path either way.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
> Exynos DRM creates it's own domain, attach all devices which performs
> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
> calls to allocate/map buffers. This way it can use the same one DMA
> address for each buffer regardless of the CRTC/display/processing
> device.
> 
> This no longer works with this patch. The simplest solution to fix this
> is add API to change default_domain to the one allocated by the Exynos
> DRM driver.

Ugh, I hadn't realised there were any drivers trying to fake up their 
own default domains - that's always going to be fragile. In fact, one of 
my proposals for un-breaking Tegra and other DRM drivers involves 
preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA 
domains at all, which would stop that from working permanently.

Can Exynos not put all the DRM components into a single group, or at 
least just pick one of the real default domains to attach everyone to 
instead of allocating a fake one?

Robin.

>> ---
>>    drivers/iommu/dma-iommu.c | 23 ++++++++++++-----------
>>    1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 511ff9a1d6d9..320f9ea82f3f 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
>>    void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
>>    		dma_addr_t *handle)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
>>    	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
>>    	*handle = IOMMU_MAPPING_ERROR;
>>    }
>> @@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>>    		unsigned long attrs, int prot, dma_addr_t *handle,
>>    		void (*flush_page)(struct device *, const void *, phys_addr_t))
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	struct iova_domain *iovad = &cookie->iovad;
>>    	struct page **pages;
>> @@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
>>    }
>>    
>>    static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>> -		size_t size, int prot)
>> +		size_t size, int prot, struct iommu_domain *domain)
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	size_t iova_off = 0;
>>    	dma_addr_t iova;
>> @@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>>    dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>>    		unsigned long offset, size_t size, int prot)
>>    {
>> -	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
>> +	return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
>> +			iommu_get_dma_domain(dev));
>>    }
>>    
>>    void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
>>    		enum dma_data_direction dir, unsigned long attrs)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>>    }
>>    
>>    /*
>> @@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
>>    int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>    		int nents, int prot)
>>    {
>> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>>    	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>    	struct iova_domain *iovad = &cookie->iovad;
>>    	struct scatterlist *s, *prev = NULL;
>> @@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>>    		sg = tmp;
>>    	}
>>    	end = sg_dma_address(sg) + sg_dma_len(sg);
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
>>    }
>>    
>>    dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
>>    		size_t size, enum dma_data_direction dir, unsigned long attrs)
>>    {
>>    	return __iommu_dma_map(dev, phys, size,
>> -			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
>> +			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
>> +			iommu_get_dma_domain(dev));
>>    }
>>    
>>    void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
>>    		size_t size, enum dma_data_direction dir, unsigned long attrs)
>>    {
>> -	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
>> +	__iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
>>    }
>>    
>>    int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
>> @@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>    	if (!msi_page)
>>    		return NULL;
>>    
>> -	iova = __iommu_dma_map(dev, msi_addr, size, prot);
>> +	iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
>>    	if (iommu_dma_mapping_error(dev, iova))
>>    		goto out_free_page;
>>    
> 
> Best regards
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 13:52             ` Robin Murphy
@ 2018-09-28 14:21                 ` Marek Szyprowski
  -1 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 14:21 UTC (permalink / raw)
  To: Robin Murphy, joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Robin,

On 2018-09-28 15:52, Robin Murphy wrote:
> On 28/09/18 14:26, Marek Szyprowski wrote:
>> On 2018-09-12 17:24, Robin Murphy wrote:
>>> Most parts of iommu-dma already assume they are operating on a default
>>> domain set up by iommu_dma_init_domain(), and can be converted straight
>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>> non-specific lookup, but that's OK since they're far from a contended
>>> fast path either way.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>
>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>> Exynos DRM creates it's own domain, attach all devices which performs
>> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
>> calls to allocate/map buffers. This way it can use the same one DMA
>> address for each buffer regardless of the CRTC/display/processing
>> device.
>>
>> This no longer works with this patch. The simplest solution to fix this
>> is add API to change default_domain to the one allocated by the Exynos
>> DRM driver.
>
> Ugh, I hadn't realised there were any drivers trying to fake up their 
> own default domains - that's always going to be fragile. In fact, one 
> of my proposals for un-breaking Tegra and other DRM drivers involves 
> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA 
> domains at all, which would stop that from working permanently.

IMHO there should be a way to let drivers like DRM to use DMA-mapping 
infrastructure on their own iommu domain. Better provide such API to 
avoid hacking in the DRM drivers to get this done without help from 
IOMMU core.

> Can Exynos not put all the DRM components into a single group, or at 
> least just pick one of the real default domains to attach everyone to 
> instead of allocating a fake one?

Exynos DRM components are not in the single group. Currently 
iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply 
allocate one group per each iommu-master device in the system.

Exynos DRM already selects one of its component devices as the 'main DMA 
device'. It is being used for managing buffers if no IOMMU is available, 
so it can also use its iommu domain instead of allocating a fake one. 
I've checked and it fixes Exynos DRM now. The question is how to merge 
this fix to keep bisectability.

From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: Fri, 28 Sep 2018 16:17:27 +0200
Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain 
instead
  of a fake one

Instead of allocating a fake IOMMU domain for all Exynos DRM components,
simply reuse the default IOMMU domain of the already selected DMA device.
This allows some design changes in IOMMU framework without breaking IOMMU
support in Exynos DRM.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
  drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
  1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h 
b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index 87f6b5672e11..51d3b7dcd529 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct 
exynos_drm_private *priv,
  static inline int __exynos_iommu_create_mapping(struct 
exynos_drm_private *priv,
                      unsigned long start, unsigned long size)
  {
-    struct iommu_domain *domain;
-    int ret;
-
-    domain = iommu_domain_alloc(priv->dma_dev->bus);
-    if (!domain)
-        return -ENOMEM;
-
-    ret = iommu_get_dma_cookie(domain);
-    if (ret)
-        goto free_domain;
-
-    ret = iommu_dma_init_domain(domain, start, size, NULL);
-    if (ret)
-        goto put_cookie;
-
-    priv->mapping = domain;
+    priv->mapping = iommu_get_dma_domain(priv->dma_dev);
      return 0;
-
-put_cookie:
-    iommu_put_dma_cookie(domain);
-free_domain:
-    iommu_domain_free(domain);
-    return ret;
  }

  static inline void __exynos_iommu_release_mapping(struct 
exynos_drm_private *priv)
  {
-    struct iommu_domain *domain = priv->mapping;
-
-    iommu_put_dma_cookie(domain);
-    iommu_domain_free(domain);
      priv->mapping = NULL;
  }

@@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct 
exynos_drm_private *priv,
  {
      struct iommu_domain *domain = priv->mapping;

-    return iommu_attach_device(domain, dev);
+    if (dev != priv->dma_dev)
+        return iommu_attach_device(domain, dev);
+    return 0;
  }

  static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
@@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct 
exynos_drm_private *priv,
  {
      struct iommu_domain *domain = priv->mapping;

-    iommu_detach_device(domain, dev);
+    if (dev != priv->dma_dev)
+        iommu_detach_device(domain, dev);
  }
  #else
  #error Unsupported architecture and IOMMU/DMA-mapping glue code

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 14:21                 ` Marek Szyprowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 14:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 2018-09-28 15:52, Robin Murphy wrote:
> On 28/09/18 14:26, Marek Szyprowski wrote:
>> On 2018-09-12 17:24, Robin Murphy wrote:
>>> Most parts of iommu-dma already assume they are operating on a default
>>> domain set up by iommu_dma_init_domain(), and can be converted straight
>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>> non-specific lookup, but that's OK since they're far from a contended
>>> fast path either way.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>
>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>> Exynos DRM creates it's own domain, attach all devices which performs
>> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
>> calls to allocate/map buffers. This way it can use the same one DMA
>> address for each buffer regardless of the CRTC/display/processing
>> device.
>>
>> This no longer works with this patch. The simplest solution to fix this
>> is add API to change default_domain to the one allocated by the Exynos
>> DRM driver.
>
> Ugh, I hadn't realised there were any drivers trying to fake up their 
> own default domains - that's always going to be fragile. In fact, one 
> of my proposals for un-breaking Tegra and other DRM drivers involves 
> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA 
> domains at all, which would stop that from working permanently.

IMHO there should be a way to let drivers like DRM to use DMA-mapping 
infrastructure on their own iommu domain. Better provide such API to 
avoid hacking in the DRM drivers to get this done without help from 
IOMMU core.

> Can Exynos not put all the DRM components into a single group, or at 
> least just pick one of the real default domains to attach everyone to 
> instead of allocating a fake one?

Exynos DRM components are not in the single group. Currently 
iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply 
allocate one group per each iommu-master device in the system.

Exynos DRM already selects one of its component devices as the 'main DMA 
device'. It is being used for managing buffers if no IOMMU is available, 
so it can also use its iommu domain instead of allocating a fake one. 
I've checked and it fixes Exynos DRM now. The question is how to merge 
this fix to keep bisectability.

From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: Fri, 28 Sep 2018 16:17:27 +0200
Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain 
instead
 ?of a fake one

Instead of allocating a fake IOMMU domain for all Exynos DRM components,
simply reuse the default IOMMU domain of the already selected DMA device.
This allows some design changes in IOMMU framework without breaking IOMMU
support in Exynos DRM.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 ?drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
 ?1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h 
b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index 87f6b5672e11..51d3b7dcd529 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct 
exynos_drm_private *priv,
 ?static inline int __exynos_iommu_create_mapping(struct 
exynos_drm_private *priv,
 ???? ??? ??? ??? ??? unsigned long start, unsigned long size)
 ?{
-??? struct iommu_domain *domain;
-??? int ret;
-
-??? domain = iommu_domain_alloc(priv->dma_dev->bus);
-??? if (!domain)
-??? ??? return -ENOMEM;
-
-??? ret = iommu_get_dma_cookie(domain);
-??? if (ret)
-??? ??? goto free_domain;
-
-??? ret = iommu_dma_init_domain(domain, start, size, NULL);
-??? if (ret)
-??? ??? goto put_cookie;
-
-??? priv->mapping = domain;
+??? priv->mapping = iommu_get_dma_domain(priv->dma_dev);
 ???? return 0;
-
-put_cookie:
-??? iommu_put_dma_cookie(domain);
-free_domain:
-??? iommu_domain_free(domain);
-??? return ret;
 ?}

 ?static inline void __exynos_iommu_release_mapping(struct 
exynos_drm_private *priv)
 ?{
-??? struct iommu_domain *domain = priv->mapping;
-
-??? iommu_put_dma_cookie(domain);
-??? iommu_domain_free(domain);
 ???? priv->mapping = NULL;
 ?}

@@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct 
exynos_drm_private *priv,
 ?{
 ???? struct iommu_domain *domain = priv->mapping;

-??? return iommu_attach_device(domain, dev);
+??? if (dev != priv->dma_dev)
+??? ??? return iommu_attach_device(domain, dev);
+??? return 0;
 ?}

 ?static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
@@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct 
exynos_drm_private *priv,
 ?{
 ???? struct iommu_domain *domain = priv->mapping;

-??? iommu_detach_device(domain, dev);
+??? if (dev != priv->dma_dev)
+??? ??? iommu_detach_device(domain, dev);
 ?}
 ?#else
 ?#error Unsupported architecture and IOMMU/DMA-mapping glue code

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 14:21                 ` Marek Szyprowski
@ 2018-09-28 15:31                     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 15:31 UTC (permalink / raw)
  To: Marek Szyprowski, joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 28/09/18 15:21, Marek Szyprowski wrote:
> Hi Robin,
> 
> On 2018-09-28 15:52, Robin Murphy wrote:
>> On 28/09/18 14:26, Marek Szyprowski wrote:
>>> On 2018-09-12 17:24, Robin Murphy wrote:
>>>> Most parts of iommu-dma already assume they are operating on a default
>>>> domain set up by iommu_dma_init_domain(), and can be converted straight
>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>>> non-specific lookup, but that's OK since they're far from a contended
>>>> fast path either way.
>>>>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>
>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>>> Exynos DRM creates it's own domain, attach all devices which performs
>>> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
>>> calls to allocate/map buffers. This way it can use the same one DMA
>>> address for each buffer regardless of the CRTC/display/processing
>>> device.
>>>
>>> This no longer works with this patch. The simplest solution to fix this
>>> is add API to change default_domain to the one allocated by the Exynos
>>> DRM driver.
>>
>> Ugh, I hadn't realised there were any drivers trying to fake up their
>> own default domains - that's always going to be fragile. In fact, one
>> of my proposals for un-breaking Tegra and other DRM drivers involves
>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
>> domains at all, which would stop that from working permanently.
> 
> IMHO there should be a way to let drivers like DRM to use DMA-mapping
> infrastructure on their own iommu domain. Better provide such API to
> avoid hacking in the DRM drivers to get this done without help from
> IOMMU core.

The tricky part is how to reconcile that with those other drivers which 
want to do explicit IOMMU management with their own domain but still use 
the DMA API for coherency of the underlying memory. I do have a couple 
of half-formed ideas, but they're not quite there yet.

>> Can Exynos not put all the DRM components into a single group, or at
>> least just pick one of the real default domains to attach everyone to
>> instead of allocating a fake one?
> 
> Exynos DRM components are not in the single group. Currently
> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
> allocate one group per each iommu-master device in the system.

Yeah, a group spanning multiple IOMMU devices would have been a bit of a 
hack for your topology, but still *technically* possible ;)

> Exynos DRM already selects one of its component devices as the 'main DMA
> device'. It is being used for managing buffers if no IOMMU is available,
> so it can also use its iommu domain instead of allocating a fake one.
> I've checked and it fixes Exynos DRM now. The question is how to merge
> this fix to keep bisectability.
> 
> From: Marek Szyprowski <m.szyprowski@samsung.com>
> Date: Fri, 28 Sep 2018 16:17:27 +0200
> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
> instead
>    of a fake one
> 
> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> simply reuse the default IOMMU domain of the already selected DMA device.
> This allows some design changes in IOMMU framework without breaking IOMMU
> support in Exynos DRM.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>    drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
>    1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> index 87f6b5672e11..51d3b7dcd529 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
> exynos_drm_private *priv,
>    static inline int __exynos_iommu_create_mapping(struct
> exynos_drm_private *priv,
>                        unsigned long start, unsigned long size)
>    {
> -    struct iommu_domain *domain;
> -    int ret;
> -
> -    domain = iommu_domain_alloc(priv->dma_dev->bus);
> -    if (!domain)
> -        return -ENOMEM;
> -
> -    ret = iommu_get_dma_cookie(domain);
> -    if (ret)
> -        goto free_domain;
> -
> -    ret = iommu_dma_init_domain(domain, start, size, NULL);
> -    if (ret)
> -        goto put_cookie;
> -
> -    priv->mapping = domain;
> +    priv->mapping = iommu_get_dma_domain(priv->dma_dev);

iommu_get_domain_for_dev(), please - this isn't a performance-critical 
path inside a DMA API implementation, plus without an actual dependency 
on this series maybe there's a chance of sneaking it into 4.19?

(you could always still check domain->type == IOMMU_DOMAIN_DMA if you 
want to be really really paranoid.)

>        return 0;
> -
> -put_cookie:
> -    iommu_put_dma_cookie(domain);
> -free_domain:
> -    iommu_domain_free(domain);
> -    return ret;
>    }
> 
>    static inline void __exynos_iommu_release_mapping(struct
> exynos_drm_private *priv)
>    {
> -    struct iommu_domain *domain = priv->mapping;
> -
> -    iommu_put_dma_cookie(domain);
> -    iommu_domain_free(domain);
>        priv->mapping = NULL;
>    }
> 
> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
> exynos_drm_private *priv,
>    {
>        struct iommu_domain *domain = priv->mapping;
> 
> -    return iommu_attach_device(domain, dev);
> +    if (dev != priv->dma_dev)
> +        return iommu_attach_device(domain, dev);
> +    return 0;
>    }
> 
>    static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
> exynos_drm_private *priv,
>    {
>        struct iommu_domain *domain = priv->mapping;
> 
> -    iommu_detach_device(domain, dev);
> +    if (dev != priv->dma_dev)
> +        iommu_detach_device(domain, dev);

Strictly you could skip that check, as detaching the master device from 
its real default domain is just a no-op, but I guess maintaining the 
symmetry is probably more intuitive.

Robin.

>    }
>    #else
>    #error Unsupported architecture and IOMMU/DMA-mapping glue code
> 
>   > ...
> 
> Best regards
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 15:31                     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 15:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/09/18 15:21, Marek Szyprowski wrote:
> Hi Robin,
> 
> On 2018-09-28 15:52, Robin Murphy wrote:
>> On 28/09/18 14:26, Marek Szyprowski wrote:
>>> On 2018-09-12 17:24, Robin Murphy wrote:
>>>> Most parts of iommu-dma already assume they are operating on a default
>>>> domain set up by iommu_dma_init_domain(), and can be converted straight
>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>>> non-specific lookup, but that's OK since they're far from a contended
>>>> fast path either way.
>>>>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>
>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>>> Exynos DRM creates it's own domain, attach all devices which performs
>>> DMA access (typically CRTC devices) to it and uses standard DMA-mapping
>>> calls to allocate/map buffers. This way it can use the same one DMA
>>> address for each buffer regardless of the CRTC/display/processing
>>> device.
>>>
>>> This no longer works with this patch. The simplest solution to fix this
>>> is add API to change default_domain to the one allocated by the Exynos
>>> DRM driver.
>>
>> Ugh, I hadn't realised there were any drivers trying to fake up their
>> own default domains - that's always going to be fragile. In fact, one
>> of my proposals for un-breaking Tegra and other DRM drivers involves
>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
>> domains at all, which would stop that from working permanently.
> 
> IMHO there should be a way to let drivers like DRM to use DMA-mapping
> infrastructure on their own iommu domain. Better provide such API to
> avoid hacking in the DRM drivers to get this done without help from
> IOMMU core.

The tricky part is how to reconcile that with those other drivers which 
want to do explicit IOMMU management with their own domain but still use 
the DMA API for coherency of the underlying memory. I do have a couple 
of half-formed ideas, but they're not quite there yet.

>> Can Exynos not put all the DRM components into a single group, or at
>> least just pick one of the real default domains to attach everyone to
>> instead of allocating a fake one?
> 
> Exynos DRM components are not in the single group. Currently
> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
> allocate one group per each iommu-master device in the system.

Yeah, a group spanning multiple IOMMU devices would have been a bit of a 
hack for your topology, but still *technically* possible ;)

> Exynos DRM already selects one of its component devices as the 'main DMA
> device'. It is being used for managing buffers if no IOMMU is available,
> so it can also use its iommu domain instead of allocating a fake one.
> I've checked and it fixes Exynos DRM now. The question is how to merge
> this fix to keep bisectability.
> 
> From: Marek Szyprowski <m.szyprowski@samsung.com>
> Date: Fri, 28 Sep 2018 16:17:27 +0200
> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
> instead
>   ?of a fake one
> 
> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> simply reuse the default IOMMU domain of the already selected DMA device.
> This allows some design changes in IOMMU framework without breaking IOMMU
> support in Exynos DRM.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>   ?drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
>   ?1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> index 87f6b5672e11..51d3b7dcd529 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
> exynos_drm_private *priv,
>   ?static inline int __exynos_iommu_create_mapping(struct
> exynos_drm_private *priv,
>   ???? ??? ??? ??? ??? unsigned long start, unsigned long size)
>   ?{
> -??? struct iommu_domain *domain;
> -??? int ret;
> -
> -??? domain = iommu_domain_alloc(priv->dma_dev->bus);
> -??? if (!domain)
> -??? ??? return -ENOMEM;
> -
> -??? ret = iommu_get_dma_cookie(domain);
> -??? if (ret)
> -??? ??? goto free_domain;
> -
> -??? ret = iommu_dma_init_domain(domain, start, size, NULL);
> -??? if (ret)
> -??? ??? goto put_cookie;
> -
> -??? priv->mapping = domain;
> +??? priv->mapping = iommu_get_dma_domain(priv->dma_dev);

iommu_get_domain_for_dev(), please - this isn't a performance-critical 
path inside a DMA API implementation, plus without an actual dependency 
on this series maybe there's a chance of sneaking it into 4.19?

(you could always still check domain->type == IOMMU_DOMAIN_DMA if you 
want to be really really paranoid.)

>   ???? return 0;
> -
> -put_cookie:
> -??? iommu_put_dma_cookie(domain);
> -free_domain:
> -??? iommu_domain_free(domain);
> -??? return ret;
>   ?}
> 
>   ?static inline void __exynos_iommu_release_mapping(struct
> exynos_drm_private *priv)
>   ?{
> -??? struct iommu_domain *domain = priv->mapping;
> -
> -??? iommu_put_dma_cookie(domain);
> -??? iommu_domain_free(domain);
>   ???? priv->mapping = NULL;
>   ?}
> 
> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
> exynos_drm_private *priv,
>   ?{
>   ???? struct iommu_domain *domain = priv->mapping;
> 
> -??? return iommu_attach_device(domain, dev);
> +??? if (dev != priv->dma_dev)
> +??? ??? return iommu_attach_device(domain, dev);
> +??? return 0;
>   ?}
> 
>   ?static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
> exynos_drm_private *priv,
>   ?{
>   ???? struct iommu_domain *domain = priv->mapping;
> 
> -??? iommu_detach_device(domain, dev);
> +??? if (dev != priv->dma_dev)
> +??? ??? iommu_detach_device(domain, dev);

Strictly you could skip that check, as detaching the master device from 
its real default domain is just a no-op, but I guess maintaining the 
symmetry is probably more intuitive.

Robin.

>   ?}
>   ?#else
>   ?#error Unsupported architecture and IOMMU/DMA-mapping glue code
> 
>   > ...
> 
> Best regards
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 15:31                     ` Robin Murphy
@ 2018-09-28 15:33                         ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2018-09-28 15:33 UTC (permalink / raw)
  To: Robin Murphy
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA, hch-jcswGhMUV9g,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Sep 28, 2018 at 04:31:10PM +0100, Robin Murphy wrote:
> The tricky part is how to reconcile that with those other drivers which 
> want to do explicit IOMMU management with their own domain but still use 
> the DMA API for coherency of the underlying memory. I do have a couple of 
> half-formed ideas, but they're not quite there yet.

It think the only sensible answer is that they can't, and we'll need
coherency API in the iommu API (or augmenting it).  Which might be a
real possibility now that I'm almost done lifting the coherency
management to arch hooks instead of dma op methods.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 15:33                         ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2018-09-28 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 28, 2018 at 04:31:10PM +0100, Robin Murphy wrote:
> The tricky part is how to reconcile that with those other drivers which 
> want to do explicit IOMMU management with their own domain but still use 
> the DMA API for coherency of the underlying memory. I do have a couple of 
> half-formed ideas, but they're not quite there yet.

It think the only sensible answer is that they can't, and we'll need
coherency API in the iommu API (or augmenting it).  Which might be a
real possibility now that I'm almost done lifting the coherency
management to arch hooks instead of dma op methods.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 15:33                         ` Christoph Hellwig
@ 2018-09-28 15:46                             ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 28/09/18 16:33, Christoph Hellwig wrote:
> On Fri, Sep 28, 2018 at 04:31:10PM +0100, Robin Murphy wrote:
>> The tricky part is how to reconcile that with those other drivers which
>> want to do explicit IOMMU management with their own domain but still use
>> the DMA API for coherency of the underlying memory. I do have a couple of
>> half-formed ideas, but they're not quite there yet.
> 
> It think the only sensible answer is that they can't, and we'll need
> coherency API in the iommu API (or augmenting it).  Which might be a
> real possibility now that I'm almost done lifting the coherency
> management to arch hooks instead of dma op methods.

I reckon it seems feasible if the few clients that want to had a way to 
allocate their own proper managed domains, and the IOMMU API knew how to 
swizzle between iommu_dma_ops and dma_direct_ops upon attach depending 
on whether the target domain was managed or unmanaged. The worst part is 
liekly to be the difference between that lovely conceptual 
"iommu_dma_ops" and the cross-architecture mess of reality, plus where 
identity domains with a possible need for SWIOTLB come into it.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 15:46                             ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/09/18 16:33, Christoph Hellwig wrote:
> On Fri, Sep 28, 2018 at 04:31:10PM +0100, Robin Murphy wrote:
>> The tricky part is how to reconcile that with those other drivers which
>> want to do explicit IOMMU management with their own domain but still use
>> the DMA API for coherency of the underlying memory. I do have a couple of
>> half-formed ideas, but they're not quite there yet.
> 
> It think the only sensible answer is that they can't, and we'll need
> coherency API in the iommu API (or augmenting it).  Which might be a
> real possibility now that I'm almost done lifting the coherency
> management to arch hooks instead of dma op methods.

I reckon it seems feasible if the few clients that want to had a way to 
allocate their own proper managed domains, and the IOMMU API knew how to 
swizzle between iommu_dma_ops and dma_direct_ops upon attach depending 
on whether the target domain was managed or unmanaged. The worst part is 
liekly to be the difference between that lovely conceptual 
"iommu_dma_ops" and the cross-architecture mess of reality, plus where 
identity domains with a possible need for SWIOTLB come into it.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] drm/exynos: Use selected dma_dev default iommu domain instead of a fake one
       [not found]                     ` <CGME20180928160945eucas1p17ff77fe34854548719880638ecb9c54d@eucas1p1.samsung.com>
@ 2018-09-28 16:09                         ` Marek Szyprowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 16:09 UTC (permalink / raw)
  To: dri-devel, iommu
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	Robin Murphy, hch, linux-arm-kernel, Marek Szyprowski

Instead of allocating a fake IOMMU domain for all Exynos DRM components,
simply reuse the default IOMMU domain of the already selected DMA device.
This allows some design changes in IOMMU framework without breaking IOMMU
support in Exynos DRM.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Inki:
If possible, please consider this patch as a fix for v4.19-rc, this will
help doing a rework in IOMMU and DMA-IOMMU frameworks in v4.20 without
breaking Exynos DRM. It worked for current IOMMU code, but such usage is
considered as a hack.
---
 drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index 87f6b5672e11..797d9ee5f15a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
 static inline int __exynos_iommu_create_mapping(struct exynos_drm_private *priv,
 					unsigned long start, unsigned long size)
 {
-	struct iommu_domain *domain;
-	int ret;
-
-	domain = iommu_domain_alloc(priv->dma_dev->bus);
-	if (!domain)
-		return -ENOMEM;
-
-	ret = iommu_get_dma_cookie(domain);
-	if (ret)
-		goto free_domain;
-
-	ret = iommu_dma_init_domain(domain, start, size, NULL);
-	if (ret)
-		goto put_cookie;
-
-	priv->mapping = domain;
+	priv->mapping = iommu_get_domain_for_dev(priv->dma_dev);
 	return 0;
-
-put_cookie:
-	iommu_put_dma_cookie(domain);
-free_domain:
-	iommu_domain_free(domain);
-	return ret;
 }
 
 static inline void __exynos_iommu_release_mapping(struct exynos_drm_private *priv)
 {
-	struct iommu_domain *domain = priv->mapping;
-
-	iommu_put_dma_cookie(domain);
-	iommu_domain_free(domain);
 	priv->mapping = NULL;
 }
 
@@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct exynos_drm_private *priv,
 {
 	struct iommu_domain *domain = priv->mapping;
 
-	return iommu_attach_device(domain, dev);
+	if (dev != priv->dma_dev)
+		return iommu_attach_device(domain, dev);
+	return 0;
 }
 
 static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
@@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
 {
 	struct iommu_domain *domain = priv->mapping;
 
-	iommu_detach_device(domain, dev);
+	if (dev != priv->dma_dev)
+		iommu_detach_device(domain, dev);
 }
 #else
 #error Unsupported architecture and IOMMU/DMA-mapping glue code
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH] drm/exynos: Use selected dma_dev default iommu domain instead of a fake one
@ 2018-09-28 16:09                         ` Marek Szyprowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

Instead of allocating a fake IOMMU domain for all Exynos DRM components,
simply reuse the default IOMMU domain of the already selected DMA device.
This allows some design changes in IOMMU framework without breaking IOMMU
support in Exynos DRM.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Inki:
If possible, please consider this patch as a fix for v4.19-rc, this will
help doing a rework in IOMMU and DMA-IOMMU frameworks in v4.20 without
breaking Exynos DRM. It worked for current IOMMU code, but such usage is
considered as a hack.
---
 drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index 87f6b5672e11..797d9ee5f15a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
 static inline int __exynos_iommu_create_mapping(struct exynos_drm_private *priv,
 					unsigned long start, unsigned long size)
 {
-	struct iommu_domain *domain;
-	int ret;
-
-	domain = iommu_domain_alloc(priv->dma_dev->bus);
-	if (!domain)
-		return -ENOMEM;
-
-	ret = iommu_get_dma_cookie(domain);
-	if (ret)
-		goto free_domain;
-
-	ret = iommu_dma_init_domain(domain, start, size, NULL);
-	if (ret)
-		goto put_cookie;
-
-	priv->mapping = domain;
+	priv->mapping = iommu_get_domain_for_dev(priv->dma_dev);
 	return 0;
-
-put_cookie:
-	iommu_put_dma_cookie(domain);
-free_domain:
-	iommu_domain_free(domain);
-	return ret;
 }
 
 static inline void __exynos_iommu_release_mapping(struct exynos_drm_private *priv)
 {
-	struct iommu_domain *domain = priv->mapping;
-
-	iommu_put_dma_cookie(domain);
-	iommu_domain_free(domain);
 	priv->mapping = NULL;
 }
 
@@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct exynos_drm_private *priv,
 {
 	struct iommu_domain *domain = priv->mapping;
 
-	return iommu_attach_device(domain, dev);
+	if (dev != priv->dma_dev)
+		return iommu_attach_device(domain, dev);
+	return 0;
 }
 
 static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
@@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
 {
 	struct iommu_domain *domain = priv->mapping;
 
-	iommu_detach_device(domain, dev);
+	if (dev != priv->dma_dev)
+		iommu_detach_device(domain, dev);
 }
 #else
 #error Unsupported architecture and IOMMU/DMA-mapping glue code
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] drm/exynos: Use selected dma_dev default iommu domain instead of a fake one
  2018-09-28 16:09                         ` Marek Szyprowski
@ 2018-09-28 16:13                           ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 16:13 UTC (permalink / raw)
  To: Marek Szyprowski, dri-devel, iommu
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim, hch,
	linux-arm-kernel

On 2018-09-28 5:09 PM, Marek Szyprowski wrote:
> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> simply reuse the default IOMMU domain of the already selected DMA device.
> This allows some design changes in IOMMU framework without breaking IOMMU
> support in Exynos DRM.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Inki:
> If possible, please consider this patch as a fix for v4.19-rc, this will
> help doing a rework in IOMMU and DMA-IOMMU frameworks in v4.20 without
> breaking Exynos DRM. It worked for current IOMMU code, but such usage is
> considered as a hack.
> ---
>   drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
>   1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> index 87f6b5672e11..797d9ee5f15a 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
>   static inline int __exynos_iommu_create_mapping(struct exynos_drm_private *priv,
>   					unsigned long start, unsigned long size)
>   {
> -	struct iommu_domain *domain;
> -	int ret;
> -
> -	domain = iommu_domain_alloc(priv->dma_dev->bus);
> -	if (!domain)
> -		return -ENOMEM;
> -
> -	ret = iommu_get_dma_cookie(domain);
> -	if (ret)
> -		goto free_domain;
> -
> -	ret = iommu_dma_init_domain(domain, start, size, NULL);
> -	if (ret)
> -		goto put_cookie;
> -
> -	priv->mapping = domain;
> +	priv->mapping = iommu_get_domain_for_dev(priv->dma_dev);
>   	return 0;
> -
> -put_cookie:
> -	iommu_put_dma_cookie(domain);
> -free_domain:
> -	iommu_domain_free(domain);
> -	return ret;
>   }
>   
>   static inline void __exynos_iommu_release_mapping(struct exynos_drm_private *priv)
>   {
> -	struct iommu_domain *domain = priv->mapping;
> -
> -	iommu_put_dma_cookie(domain);
> -	iommu_domain_free(domain);
>   	priv->mapping = NULL;
>   }
>   
> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct exynos_drm_private *priv,
>   {
>   	struct iommu_domain *domain = priv->mapping;
>   
> -	return iommu_attach_device(domain, dev);
> +	if (dev != priv->dma_dev)
> +		return iommu_attach_device(domain, dev);
> +	return 0;
>   }
>   
>   static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
>   {
>   	struct iommu_domain *domain = priv->mapping;
>   
> -	iommu_detach_device(domain, dev);
> +	if (dev != priv->dma_dev)
> +		iommu_detach_device(domain, dev);
>   }
>   #else
>   #error Unsupported architecture and IOMMU/DMA-mapping glue code
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] drm/exynos: Use selected dma_dev default iommu domain instead of a fake one
@ 2018-09-28 16:13                           ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2018-09-28 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-09-28 5:09 PM, Marek Szyprowski wrote:
> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> simply reuse the default IOMMU domain of the already selected DMA device.
> This allows some design changes in IOMMU framework without breaking IOMMU
> support in Exynos DRM.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Inki:
> If possible, please consider this patch as a fix for v4.19-rc, this will
> help doing a rework in IOMMU and DMA-IOMMU frameworks in v4.20 without
> breaking Exynos DRM. It worked for current IOMMU code, but such usage is
> considered as a hack.
> ---
>   drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34 ++++-------------------
>   1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> index 87f6b5672e11..797d9ee5f15a 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
>   static inline int __exynos_iommu_create_mapping(struct exynos_drm_private *priv,
>   					unsigned long start, unsigned long size)
>   {
> -	struct iommu_domain *domain;
> -	int ret;
> -
> -	domain = iommu_domain_alloc(priv->dma_dev->bus);
> -	if (!domain)
> -		return -ENOMEM;
> -
> -	ret = iommu_get_dma_cookie(domain);
> -	if (ret)
> -		goto free_domain;
> -
> -	ret = iommu_dma_init_domain(domain, start, size, NULL);
> -	if (ret)
> -		goto put_cookie;
> -
> -	priv->mapping = domain;
> +	priv->mapping = iommu_get_domain_for_dev(priv->dma_dev);
>   	return 0;
> -
> -put_cookie:
> -	iommu_put_dma_cookie(domain);
> -free_domain:
> -	iommu_domain_free(domain);
> -	return ret;
>   }
>   
>   static inline void __exynos_iommu_release_mapping(struct exynos_drm_private *priv)
>   {
> -	struct iommu_domain *domain = priv->mapping;
> -
> -	iommu_put_dma_cookie(domain);
> -	iommu_domain_free(domain);
>   	priv->mapping = NULL;
>   }
>   
> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct exynos_drm_private *priv,
>   {
>   	struct iommu_domain *domain = priv->mapping;
>   
> -	return iommu_attach_device(domain, dev);
> +	if (dev != priv->dma_dev)
> +		return iommu_attach_device(domain, dev);
> +	return 0;
>   }
>   
>   static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct exynos_drm_private *priv,
>   {
>   	struct iommu_domain *domain = priv->mapping;
>   
> -	iommu_detach_device(domain, dev);
> +	if (dev != priv->dma_dev)
> +		iommu_detach_device(domain, dev);
>   }
>   #else
>   #error Unsupported architecture and IOMMU/DMA-mapping glue code
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 15:31                     ` Robin Murphy
@ 2018-09-28 16:18                         ` Marek Szyprowski
  -1 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 16:18 UTC (permalink / raw)
  To: Robin Murphy, joro-zLv9SwRftAIdnm+yROfE0A, hch-jcswGhMUV9g
  Cc: 'Linux Samsung SOC',
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	linuxarm-hv44wF8Li93QT0dZR+AlfA, 대인기,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	guohanjun-hv44wF8Li93QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Robin,

On 2018-09-28 17:31, Robin Murphy wrote:
> On 28/09/18 15:21, Marek Szyprowski wrote:
>> On 2018-09-28 15:52, Robin Murphy wrote:
>>> On 28/09/18 14:26, Marek Szyprowski wrote:
>>>> On 2018-09-12 17:24, Robin Murphy wrote:
>>>>> Most parts of iommu-dma already assume they are operating on a
>>>>> default
>>>>> domain set up by iommu_dma_init_domain(), and can be converted
>>>>> straight
>>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>>>> non-specific lookup, but that's OK since they're far from a contended
>>>>> fast path either way.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>
>>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>>>> Exynos DRM creates it's own domain, attach all devices which performs
>>>> DMA access (typically CRTC devices) to it and uses standard
>>>> DMA-mapping
>>>> calls to allocate/map buffers. This way it can use the same one DMA
>>>> address for each buffer regardless of the CRTC/display/processing
>>>> device.
>>>>
>>>> This no longer works with this patch. The simplest solution to fix
>>>> this
>>>> is add API to change default_domain to the one allocated by the Exynos
>>>> DRM driver.
>>>
>>> Ugh, I hadn't realised there were any drivers trying to fake up their
>>> own default domains - that's always going to be fragile. In fact, one
>>> of my proposals for un-breaking Tegra and other DRM drivers involves
>>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
>>> domains at all, which would stop that from working permanently.
>>
>> IMHO there should be a way to let drivers like DRM to use DMA-mapping
>> infrastructure on their own iommu domain. Better provide such API to
>> avoid hacking in the DRM drivers to get this done without help from
>> IOMMU core.
>
> The tricky part is how to reconcile that with those other drivers
> which want to do explicit IOMMU management with their own domain but
> still use the DMA API for coherency of the underlying memory. I do
> have a couple of half-formed ideas, but they're not quite there yet.

The only idea I have here is to add some flags to struct driver to let
device core and frameworks note that this driver wants to manage
everything on their own. Then such drivers, once they settle their IOMMU
domain, should call some simple API to bless it for DMA API.

>
>>> Can Exynos not put all the DRM components into a single group, or at
>>> least just pick one of the real default domains to attach everyone to
>>> instead of allocating a fake one?
>>
>> Exynos DRM components are not in the single group. Currently
>> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
>> allocate one group per each iommu-master device in the system.
>
> Yeah, a group spanning multiple IOMMU devices would have been a bit of
> a hack for your topology, but still *technically* possible ;)

The question if I want to put all DRM components into single IOMMU
group, how to propagate such knowledge from Exynos DRM driver to Exynos
IOMMU driver (groups are created by IOMMU driver)? Anyway, I don't see
any benefits from using IOMMU groups as for now, especially when each
Exynos DRM component has its own, separate IOMMU (or even more than one,
but that a different story).

>> Exynos DRM already selects one of its component devices as the 'main DMA
>> device'. It is being used for managing buffers if no IOMMU is available,
>> so it can also use its iommu domain instead of allocating a fake one.
>> I've checked and it fixes Exynos DRM now. The question is how to merge
>> this fix to keep bisectability.
>>
>> From: Marek Szyprowski <m.szyprowski@samsung.com>
>> Date: Fri, 28 Sep 2018 16:17:27 +0200
>> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
>> instead
>>    of a fake one
>>
>> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
>> simply reuse the default IOMMU domain of the already selected DMA
>> device.
>> This allows some design changes in IOMMU framework without breaking
>> IOMMU
>> support in Exynos DRM.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>>    drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34
>> ++++-------------------
>>    1 file changed, 6 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> index 87f6b5672e11..51d3b7dcd529 100644
>> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
>> exynos_drm_private *priv,
>>    static inline int __exynos_iommu_create_mapping(struct
>> exynos_drm_private *priv,
>>                        unsigned long start, unsigned long size)
>>    {
>> -    struct iommu_domain *domain;
>> -    int ret;
>> -
>> -    domain = iommu_domain_alloc(priv->dma_dev->bus);
>> -    if (!domain)
>> -        return -ENOMEM;
>> -
>> -    ret = iommu_get_dma_cookie(domain);
>> -    if (ret)
>> -        goto free_domain;
>> -
>> -    ret = iommu_dma_init_domain(domain, start, size, NULL);
>> -    if (ret)
>> -        goto put_cookie;
>> -
>> -    priv->mapping = domain;
>> +    priv->mapping = iommu_get_dma_domain(priv->dma_dev);
>
> iommu_get_domain_for_dev(), please - this isn't a performance-critical
> path inside a DMA API implementation, plus without an actual
> dependency on this series maybe there's a chance of sneaking it into
> 4.19?
>
> (you could always still check domain->type == IOMMU_DOMAIN_DMA if you
> want to be really really paranoid.)
>

I've sent a patch. I hope we will manage to get it as a fix into v4.19,
so it won't block your changes in v4.20.

>>        return 0;
>> -
>> -put_cookie:
>> -    iommu_put_dma_cookie(domain);
>> -free_domain:
>> -    iommu_domain_free(domain);
>> -    return ret;
>>    }
>>
>>    static inline void __exynos_iommu_release_mapping(struct
>> exynos_drm_private *priv)
>>    {
>> -    struct iommu_domain *domain = priv->mapping;
>> -
>> -    iommu_put_dma_cookie(domain);
>> -    iommu_domain_free(domain);
>>        priv->mapping = NULL;
>>    }
>>
>> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
>> exynos_drm_private *priv,
>>    {
>>        struct iommu_domain *domain = priv->mapping;
>>
>> -    return iommu_attach_device(domain, dev);
>> +    if (dev != priv->dma_dev)
>> +        return iommu_attach_device(domain, dev);
>> +    return 0;
>>    }
>>
>>    static inline void __exynos_iommu_detach(struct exynos_drm_private
>> *priv,
>> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
>> exynos_drm_private *priv,
>>    {
>>        struct iommu_domain *domain = priv->mapping;
>>
>> -    iommu_detach_device(domain, dev);
>> +    if (dev != priv->dma_dev)
>> +        iommu_detach_device(domain, dev);
>
> Strictly you could skip that check, as detaching the master device
> from its real default domain is just a no-op, but I guess maintaining
> the symmetry is probably more intuitive.

I would like to keep the symmetry.

> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-09-28 16:18                         ` Marek Szyprowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marek Szyprowski @ 2018-09-28 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 2018-09-28 17:31, Robin Murphy wrote:
> On 28/09/18 15:21, Marek Szyprowski wrote:
>> On 2018-09-28 15:52, Robin Murphy wrote:
>>> On 28/09/18 14:26, Marek Szyprowski wrote:
>>>> On 2018-09-12 17:24, Robin Murphy wrote:
>>>>> Most parts of iommu-dma already assume they are operating on a
>>>>> default
>>>>> domain set up by iommu_dma_init_domain(), and can be converted
>>>>> straight
>>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
>>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
>>>>> non-specific lookup, but that's OK since they're far from a contended
>>>>> fast path either way.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>
>>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
>>>> Exynos DRM creates it's own domain, attach all devices which performs
>>>> DMA access (typically CRTC devices) to it and uses standard
>>>> DMA-mapping
>>>> calls to allocate/map buffers. This way it can use the same one DMA
>>>> address for each buffer regardless of the CRTC/display/processing
>>>> device.
>>>>
>>>> This no longer works with this patch. The simplest solution to fix
>>>> this
>>>> is add API to change default_domain to the one allocated by the Exynos
>>>> DRM driver.
>>>
>>> Ugh, I hadn't realised there were any drivers trying to fake up their
>>> own default domains - that's always going to be fragile. In fact, one
>>> of my proposals for un-breaking Tegra and other DRM drivers involves
>>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
>>> domains at all, which would stop that from working permanently.
>>
>> IMHO there should be a way to let drivers like DRM to use DMA-mapping
>> infrastructure on their own iommu domain. Better provide such API to
>> avoid hacking in the DRM drivers to get this done without help from
>> IOMMU core.
>
> The tricky part is how to reconcile that with those other drivers
> which want to do explicit IOMMU management with their own domain but
> still use the DMA API for coherency of the underlying memory. I do
> have a couple of half-formed ideas, but they're not quite there yet.

The only idea I have here is to add some flags to struct driver to let
device core and frameworks note that this driver wants to manage
everything on their own. Then such drivers, once they settle their IOMMU
domain, should call some simple API to bless it for DMA API.

>
>>> Can Exynos not put all the DRM components into a single group, or at
>>> least just pick one of the real default domains to attach everyone to
>>> instead of allocating a fake one?
>>
>> Exynos DRM components are not in the single group. Currently
>> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
>> allocate one group per each iommu-master device in the system.
>
> Yeah, a group spanning multiple IOMMU devices would have been a bit of
> a hack for your topology, but still *technically* possible ;)

The question if I want to put all DRM components into single IOMMU
group, how to propagate such knowledge from Exynos DRM driver to Exynos
IOMMU driver (groups are created by IOMMU driver)? Anyway, I don't see
any benefits from using IOMMU groups as for now, especially when each
Exynos DRM component has its own, separate IOMMU (or even more than one,
but that a different story).

>> Exynos DRM already selects one of its component devices as the 'main DMA
>> device'. It is being used for managing buffers if no IOMMU is available,
>> so it can also use its iommu domain instead of allocating a fake one.
>> I've checked and it fixes Exynos DRM now. The question is how to merge
>> this fix to keep bisectability.
>>
>> From: Marek Szyprowski <m.szyprowski@samsung.com>
>> Date: Fri, 28 Sep 2018 16:17:27 +0200
>> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
>> instead
>> ? ?of a fake one
>>
>> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
>> simply reuse the default IOMMU domain of the already selected DMA
>> device.
>> This allows some design changes in IOMMU framework without breaking
>> IOMMU
>> support in Exynos DRM.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>> ? ?drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34
>> ++++-------------------
>> ? ?1 file changed, 6 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> index 87f6b5672e11..51d3b7dcd529 100644
>> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
>> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
>> exynos_drm_private *priv,
>> ? ?static inline int __exynos_iommu_create_mapping(struct
>> exynos_drm_private *priv,
>> ? ???? ??? ??? ??? ??? unsigned long start, unsigned long size)
>> ? ?{
>> -??? struct iommu_domain *domain;
>> -??? int ret;
>> -
>> -??? domain = iommu_domain_alloc(priv->dma_dev->bus);
>> -??? if (!domain)
>> -??? ??? return -ENOMEM;
>> -
>> -??? ret = iommu_get_dma_cookie(domain);
>> -??? if (ret)
>> -??? ??? goto free_domain;
>> -
>> -??? ret = iommu_dma_init_domain(domain, start, size, NULL);
>> -??? if (ret)
>> -??? ??? goto put_cookie;
>> -
>> -??? priv->mapping = domain;
>> +??? priv->mapping = iommu_get_dma_domain(priv->dma_dev);
>
> iommu_get_domain_for_dev(), please - this isn't a performance-critical
> path inside a DMA API implementation, plus without an actual
> dependency on this series maybe there's a chance of sneaking it into
> 4.19?
>
> (you could always still check domain->type == IOMMU_DOMAIN_DMA if you
> want to be really really paranoid.)
>

I've sent a patch. I hope we will manage to get it as a fix into v4.19,
so it won't block your changes in v4.20.

>> ? ???? return 0;
>> -
>> -put_cookie:
>> -??? iommu_put_dma_cookie(domain);
>> -free_domain:
>> -??? iommu_domain_free(domain);
>> -??? return ret;
>> ? ?}
>>
>> ? ?static inline void __exynos_iommu_release_mapping(struct
>> exynos_drm_private *priv)
>> ? ?{
>> -??? struct iommu_domain *domain = priv->mapping;
>> -
>> -??? iommu_put_dma_cookie(domain);
>> -??? iommu_domain_free(domain);
>> ? ???? priv->mapping = NULL;
>> ? ?}
>>
>> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
>> exynos_drm_private *priv,
>> ? ?{
>> ? ???? struct iommu_domain *domain = priv->mapping;
>>
>> -??? return iommu_attach_device(domain, dev);
>> +??? if (dev != priv->dma_dev)
>> +??? ??? return iommu_attach_device(domain, dev);
>> +??? return 0;
>> ? ?}
>>
>> ? ?static inline void __exynos_iommu_detach(struct exynos_drm_private
>> *priv,
>> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
>> exynos_drm_private *priv,
>> ? ?{
>> ? ???? struct iommu_domain *domain = priv->mapping;
>>
>> -??? iommu_detach_device(domain, dev);
>> +??? if (dev != priv->dma_dev)
>> +??? ??? iommu_detach_device(domain, dev);
>
> Strictly you could skip that check, as detaching the master device
> from its real default domain is just a no-op, but I guess maintaining
> the symmetry is probably more intuitive.

I would like to keep the symmetry.

> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
  2018-09-28 16:18                         ` Marek Szyprowski
@ 2018-10-01 10:55                           ` Inki Dae
  -1 siblings, 0 replies; 40+ messages in thread
From: Inki Dae @ 2018-10-01 10:55 UTC (permalink / raw)
  To: Marek Szyprowski, DRI mailing list
  Cc: linux-samsung-soc, catalin.marinas, will.deacon, linuxarm, iommu,
	guohanjun, robin.murphy, hch, linux-arm-kernel

 + dri-devel

This patch may also break other ARM DRM drivers. I cced dri-devel so
that they could manage this.

And below relevant link for ARM DRM maintainers,
https://www.spinics.net/lists/arm-kernel/msg676098.html

Thanks,
Inki Dae

2018년 9월 29일 (토) 오전 1:19, Marek Szyprowski <m.szyprowski@samsung.com>님이 작성:
>
> Hi Robin,
>
> On 2018-09-28 17:31, Robin Murphy wrote:
> > On 28/09/18 15:21, Marek Szyprowski wrote:
> >> On 2018-09-28 15:52, Robin Murphy wrote:
> >>> On 28/09/18 14:26, Marek Szyprowski wrote:
> >>>> On 2018-09-12 17:24, Robin Murphy wrote:
> >>>>> Most parts of iommu-dma already assume they are operating on a
> >>>>> default
> >>>>> domain set up by iommu_dma_init_domain(), and can be converted
> >>>>> straight
> >>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
> >>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
> >>>>> non-specific lookup, but that's OK since they're far from a contended
> >>>>> fast path either way.
> >>>>>
> >>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>>>
> >>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
> >>>> Exynos DRM creates it's own domain, attach all devices which performs
> >>>> DMA access (typically CRTC devices) to it and uses standard
> >>>> DMA-mapping
> >>>> calls to allocate/map buffers. This way it can use the same one DMA
> >>>> address for each buffer regardless of the CRTC/display/processing
> >>>> device.
> >>>>
> >>>> This no longer works with this patch. The simplest solution to fix
> >>>> this
> >>>> is add API to change default_domain to the one allocated by the Exynos
> >>>> DRM driver.
> >>>
> >>> Ugh, I hadn't realised there were any drivers trying to fake up their
> >>> own default domains - that's always going to be fragile. In fact, one
> >>> of my proposals for un-breaking Tegra and other DRM drivers involves
> >>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
> >>> domains at all, which would stop that from working permanently.
> >>
> >> IMHO there should be a way to let drivers like DRM to use DMA-mapping
> >> infrastructure on their own iommu domain. Better provide such API to
> >> avoid hacking in the DRM drivers to get this done without help from
> >> IOMMU core.
> >
> > The tricky part is how to reconcile that with those other drivers
> > which want to do explicit IOMMU management with their own domain but
> > still use the DMA API for coherency of the underlying memory. I do
> > have a couple of half-formed ideas, but they're not quite there yet.
>
> The only idea I have here is to add some flags to struct driver to let
> device core and frameworks note that this driver wants to manage
> everything on their own. Then such drivers, once they settle their IOMMU
> domain, should call some simple API to bless it for DMA API.
>
> >
> >>> Can Exynos not put all the DRM components into a single group, or at
> >>> least just pick one of the real default domains to attach everyone to
> >>> instead of allocating a fake one?
> >>
> >> Exynos DRM components are not in the single group. Currently
> >> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
> >> allocate one group per each iommu-master device in the system.
> >
> > Yeah, a group spanning multiple IOMMU devices would have been a bit of
> > a hack for your topology, but still *technically* possible ;)
>
> The question if I want to put all DRM components into single IOMMU
> group, how to propagate such knowledge from Exynos DRM driver to Exynos
> IOMMU driver (groups are created by IOMMU driver)? Anyway, I don't see
> any benefits from using IOMMU groups as for now, especially when each
> Exynos DRM component has its own, separate IOMMU (or even more than one,
> but that a different story).
>
> >> Exynos DRM already selects one of its component devices as the 'main DMA
> >> device'. It is being used for managing buffers if no IOMMU is available,
> >> so it can also use its iommu domain instead of allocating a fake one.
> >> I've checked and it fixes Exynos DRM now. The question is how to merge
> >> this fix to keep bisectability.
> >>
> >> From: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Date: Fri, 28 Sep 2018 16:17:27 +0200
> >> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
> >> instead
> >>    of a fake one
> >>
> >> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> >> simply reuse the default IOMMU domain of the already selected DMA
> >> device.
> >> This allows some design changes in IOMMU framework without breaking
> >> IOMMU
> >> support in Exynos DRM.
> >>
> >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> ---
> >>    drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34
> >> ++++-------------------
> >>    1 file changed, 6 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> index 87f6b5672e11..51d3b7dcd529 100644
> >> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
> >> exynos_drm_private *priv,
> >>    static inline int __exynos_iommu_create_mapping(struct
> >> exynos_drm_private *priv,
> >>                        unsigned long start, unsigned long size)
> >>    {
> >> -    struct iommu_domain *domain;
> >> -    int ret;
> >> -
> >> -    domain = iommu_domain_alloc(priv->dma_dev->bus);
> >> -    if (!domain)
> >> -        return -ENOMEM;
> >> -
> >> -    ret = iommu_get_dma_cookie(domain);
> >> -    if (ret)
> >> -        goto free_domain;
> >> -
> >> -    ret = iommu_dma_init_domain(domain, start, size, NULL);
> >> -    if (ret)
> >> -        goto put_cookie;
> >> -
> >> -    priv->mapping = domain;
> >> +    priv->mapping = iommu_get_dma_domain(priv->dma_dev);
> >
> > iommu_get_domain_for_dev(), please - this isn't a performance-critical
> > path inside a DMA API implementation, plus without an actual
> > dependency on this series maybe there's a chance of sneaking it into
> > 4.19?
> >
> > (you could always still check domain->type == IOMMU_DOMAIN_DMA if you
> > want to be really really paranoid.)
> >
>
> I've sent a patch. I hope we will manage to get it as a fix into v4.19,
> so it won't block your changes in v4.20.
>
> >>        return 0;
> >> -
> >> -put_cookie:
> >> -    iommu_put_dma_cookie(domain);
> >> -free_domain:
> >> -    iommu_domain_free(domain);
> >> -    return ret;
> >>    }
> >>
> >>    static inline void __exynos_iommu_release_mapping(struct
> >> exynos_drm_private *priv)
> >>    {
> >> -    struct iommu_domain *domain = priv->mapping;
> >> -
> >> -    iommu_put_dma_cookie(domain);
> >> -    iommu_domain_free(domain);
> >>        priv->mapping = NULL;
> >>    }
> >>
> >> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
> >> exynos_drm_private *priv,
> >>    {
> >>        struct iommu_domain *domain = priv->mapping;
> >>
> >> -    return iommu_attach_device(domain, dev);
> >> +    if (dev != priv->dma_dev)
> >> +        return iommu_attach_device(domain, dev);
> >> +    return 0;
> >>    }
> >>
> >>    static inline void __exynos_iommu_detach(struct exynos_drm_private
> >> *priv,
> >> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
> >> exynos_drm_private *priv,
> >>    {
> >>        struct iommu_domain *domain = priv->mapping;
> >>
> >> -    iommu_detach_device(domain, dev);
> >> +    if (dev != priv->dma_dev)
> >> +        iommu_detach_device(domain, dev);
> >
> > Strictly you could skip that check, as detaching the master device
> > from its real default domain is just a no-op, but I guess maintaining
> > the symmetry is probably more intuitive.
>
> I would like to keep the symmetry.
>
> > ...
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup
@ 2018-10-01 10:55                           ` Inki Dae
  0 siblings, 0 replies; 40+ messages in thread
From: Inki Dae @ 2018-10-01 10:55 UTC (permalink / raw)
  To: linux-arm-kernel

 + dri-devel

This patch may also break other ARM DRM drivers. I cced dri-devel so
that they could manage this.

And below relevant link for ARM DRM maintainers,
https://www.spinics.net/lists/arm-kernel/msg676098.html

Thanks,
Inki Dae

2018? 9? 29? (?) ?? 1:19, Marek Szyprowski <m.szyprowski@samsung.com>?? ??:
>
> Hi Robin,
>
> On 2018-09-28 17:31, Robin Murphy wrote:
> > On 28/09/18 15:21, Marek Szyprowski wrote:
> >> On 2018-09-28 15:52, Robin Murphy wrote:
> >>> On 28/09/18 14:26, Marek Szyprowski wrote:
> >>>> On 2018-09-12 17:24, Robin Murphy wrote:
> >>>>> Most parts of iommu-dma already assume they are operating on a
> >>>>> default
> >>>>> domain set up by iommu_dma_init_domain(), and can be converted
> >>>>> straight
> >>>>> over to avoid the refcounting bottleneck. MSI page mappings may be in
> >>>>> an unmanaged domain with an explicit MSI-only cookie, so retain the
> >>>>> non-specific lookup, but that's OK since they're far from a contended
> >>>>> fast path either way.
> >>>>>
> >>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>>>
> >>>> This breaks Exynos DRM driver with Exynos IOMMU on ARM64 (Exynos5433).
> >>>> Exynos DRM creates it's own domain, attach all devices which performs
> >>>> DMA access (typically CRTC devices) to it and uses standard
> >>>> DMA-mapping
> >>>> calls to allocate/map buffers. This way it can use the same one DMA
> >>>> address for each buffer regardless of the CRTC/display/processing
> >>>> device.
> >>>>
> >>>> This no longer works with this patch. The simplest solution to fix
> >>>> this
> >>>> is add API to change default_domain to the one allocated by the Exynos
> >>>> DRM driver.
> >>>
> >>> Ugh, I hadn't realised there were any drivers trying to fake up their
> >>> own default domains - that's always going to be fragile. In fact, one
> >>> of my proposals for un-breaking Tegra and other DRM drivers involves
> >>> preventing iommu-dma from trying to operate on non-IOMMU_DOMAIN_DMA
> >>> domains at all, which would stop that from working permanently.
> >>
> >> IMHO there should be a way to let drivers like DRM to use DMA-mapping
> >> infrastructure on their own iommu domain. Better provide such API to
> >> avoid hacking in the DRM drivers to get this done without help from
> >> IOMMU core.
> >
> > The tricky part is how to reconcile that with those other drivers
> > which want to do explicit IOMMU management with their own domain but
> > still use the DMA API for coherency of the underlying memory. I do
> > have a couple of half-formed ideas, but they're not quite there yet.
>
> The only idea I have here is to add some flags to struct driver to let
> device core and frameworks note that this driver wants to manage
> everything on their own. Then such drivers, once they settle their IOMMU
> domain, should call some simple API to bless it for DMA API.
>
> >
> >>> Can Exynos not put all the DRM components into a single group, or at
> >>> least just pick one of the real default domains to attach everyone to
> >>> instead of allocating a fake one?
> >>
> >> Exynos DRM components are not in the single group. Currently
> >> iommu-groups have no meaning on Exynos5433 and Exynos IOMMU simply
> >> allocate one group per each iommu-master device in the system.
> >
> > Yeah, a group spanning multiple IOMMU devices would have been a bit of
> > a hack for your topology, but still *technically* possible ;)
>
> The question if I want to put all DRM components into single IOMMU
> group, how to propagate such knowledge from Exynos DRM driver to Exynos
> IOMMU driver (groups are created by IOMMU driver)? Anyway, I don't see
> any benefits from using IOMMU groups as for now, especially when each
> Exynos DRM component has its own, separate IOMMU (or even more than one,
> but that a different story).
>
> >> Exynos DRM already selects one of its component devices as the 'main DMA
> >> device'. It is being used for managing buffers if no IOMMU is available,
> >> so it can also use its iommu domain instead of allocating a fake one.
> >> I've checked and it fixes Exynos DRM now. The question is how to merge
> >> this fix to keep bisectability.
> >>
> >> From: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Date: Fri, 28 Sep 2018 16:17:27 +0200
> >> Subject: [PATCH] drm/exynos: Use selected dma_dev default iommu domain
> >> instead
> >>    of a fake one
> >>
> >> Instead of allocating a fake IOMMU domain for all Exynos DRM components,
> >> simply reuse the default IOMMU domain of the already selected DMA
> >> device.
> >> This allows some design changes in IOMMU framework without breaking
> >> IOMMU
> >> support in Exynos DRM.
> >>
> >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> ---
> >>    drivers/gpu/drm/exynos/exynos_drm_iommu.h | 34
> >> ++++-------------------
> >>    1 file changed, 6 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> index 87f6b5672e11..51d3b7dcd529 100644
> >> --- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> +++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
> >> @@ -55,37 +55,12 @@ static inline void __exynos_iommu_detach(struct
> >> exynos_drm_private *priv,
> >>    static inline int __exynos_iommu_create_mapping(struct
> >> exynos_drm_private *priv,
> >>                        unsigned long start, unsigned long size)
> >>    {
> >> -    struct iommu_domain *domain;
> >> -    int ret;
> >> -
> >> -    domain = iommu_domain_alloc(priv->dma_dev->bus);
> >> -    if (!domain)
> >> -        return -ENOMEM;
> >> -
> >> -    ret = iommu_get_dma_cookie(domain);
> >> -    if (ret)
> >> -        goto free_domain;
> >> -
> >> -    ret = iommu_dma_init_domain(domain, start, size, NULL);
> >> -    if (ret)
> >> -        goto put_cookie;
> >> -
> >> -    priv->mapping = domain;
> >> +    priv->mapping = iommu_get_dma_domain(priv->dma_dev);
> >
> > iommu_get_domain_for_dev(), please - this isn't a performance-critical
> > path inside a DMA API implementation, plus without an actual
> > dependency on this series maybe there's a chance of sneaking it into
> > 4.19?
> >
> > (you could always still check domain->type == IOMMU_DOMAIN_DMA if you
> > want to be really really paranoid.)
> >
>
> I've sent a patch. I hope we will manage to get it as a fix into v4.19,
> so it won't block your changes in v4.20.
>
> >>        return 0;
> >> -
> >> -put_cookie:
> >> -    iommu_put_dma_cookie(domain);
> >> -free_domain:
> >> -    iommu_domain_free(domain);
> >> -    return ret;
> >>    }
> >>
> >>    static inline void __exynos_iommu_release_mapping(struct
> >> exynos_drm_private *priv)
> >>    {
> >> -    struct iommu_domain *domain = priv->mapping;
> >> -
> >> -    iommu_put_dma_cookie(domain);
> >> -    iommu_domain_free(domain);
> >>        priv->mapping = NULL;
> >>    }
> >>
> >> @@ -94,7 +69,9 @@ static inline int __exynos_iommu_attach(struct
> >> exynos_drm_private *priv,
> >>    {
> >>        struct iommu_domain *domain = priv->mapping;
> >>
> >> -    return iommu_attach_device(domain, dev);
> >> +    if (dev != priv->dma_dev)
> >> +        return iommu_attach_device(domain, dev);
> >> +    return 0;
> >>    }
> >>
> >>    static inline void __exynos_iommu_detach(struct exynos_drm_private
> >> *priv,
> >> @@ -102,7 +79,8 @@ static inline void __exynos_iommu_detach(struct
> >> exynos_drm_private *priv,
> >>    {
> >>        struct iommu_domain *domain = priv->mapping;
> >>
> >> -    iommu_detach_device(domain, dev);
> >> +    if (dev != priv->dma_dev)
> >> +        iommu_detach_device(domain, dev);
> >
> > Strictly you could skip that check, as detaching the master device
> > from its real default domain is just a no-op, but I guess maintaining
> > the symmetry is probably more intuitive.
>
> I would like to keep the symmetry.
>
> > ...
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-10-01 10:55 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-12 15:24 [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention Robin Murphy
2018-09-12 15:24 ` Robin Murphy
     [not found] ` <cover.1536764440.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-09-12 15:24   ` [PATCH v2 1/3] iommu: Add fast hook for getting DMA domains Robin Murphy
2018-09-12 15:24     ` Robin Murphy
2018-09-12 15:24   ` [PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup Robin Murphy
2018-09-12 15:24     ` Robin Murphy
     [not found]     ` <e42771992a73620f23128c0479b2ae91b3e177bf.1536764440.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-09-28 13:26       ` Marek Szyprowski
2018-09-28 13:26         ` Marek Szyprowski
     [not found]         ` <20180928132605eucas1p1d39fedc3be3e4e2c16035c01a40cfab6~Ykz2Rax6e0538205382eucas1p17-MHMrYXj8g+pqW5MlFJXMulaTQe2KTcn/@public.gmane.org>
2018-09-28 13:52           ` Robin Murphy
2018-09-28 13:52             ` Robin Murphy
     [not found]             ` <78a5b373-d569-4461-e258-3104286ba322-5wv7dgnIgG8@public.gmane.org>
2018-09-28 14:21               ` Marek Szyprowski
2018-09-28 14:21                 ` Marek Szyprowski
     [not found]                 ` <20180928142148eucas1p17d6ecbd5c98b3fe1bf008dbcf0626c76~YlkfonoPN1120711207eucas1p1u-MHMrYXj8g+pqW5MlFJXMulaTQe2KTcn/@public.gmane.org>
2018-09-28 15:31                   ` Robin Murphy
2018-09-28 15:31                     ` Robin Murphy
     [not found]                     ` <a9eb4b79-1fb8-eec7-0566-966f84022d7b-5wv7dgnIgG8@public.gmane.org>
2018-09-28 15:33                       ` Christoph Hellwig
2018-09-28 15:33                         ` Christoph Hellwig
     [not found]                         ` <20180928153334.GA9462-jcswGhMUV9g@public.gmane.org>
2018-09-28 15:46                           ` Robin Murphy
2018-09-28 15:46                             ` Robin Murphy
2018-09-28 16:18                       ` Marek Szyprowski
2018-09-28 16:18                         ` Marek Szyprowski
2018-10-01 10:55                         ` Inki Dae
2018-10-01 10:55                           ` Inki Dae
     [not found]                     ` <CGME20180928160945eucas1p17ff77fe34854548719880638ecb9c54d@eucas1p1.samsung.com>
2018-09-28 16:09                       ` [PATCH] drm/exynos: Use selected dma_dev default iommu domain instead of a fake one Marek Szyprowski
2018-09-28 16:09                         ` Marek Szyprowski
2018-09-28 16:13                         ` Robin Murphy
2018-09-28 16:13                           ` Robin Murphy
2018-09-12 15:24   ` [PATCH v2 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops Robin Murphy
2018-09-12 15:24     ` Robin Murphy
2018-09-14 12:48   ` [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention Will Deacon
2018-09-14 12:48     ` Will Deacon
     [not found]     ` <20180914124858.GA4010-5wv7dgnIgG8@public.gmane.org>
2018-09-17 11:20       ` John Garry
2018-09-17 11:20         ` John Garry
     [not found]         ` <dbb9e48f-e31b-b8a1-0287-378c35e9fdb9-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-09-20 12:51           ` John Garry
2018-09-20 12:51             ` John Garry
2018-09-17 13:33       ` Christoph Hellwig
2018-09-17 13:33         ` Christoph Hellwig
     [not found]         ` <20180917133359.GA972-jcswGhMUV9g@public.gmane.org>
2018-09-18 13:28           ` Tom Murphy
2018-09-18 13:28             ` Tom Murphy
2018-09-25  8:24   ` Joerg Roedel
2018-09-25  8:24     ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.