linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
@ 2017-08-17 12:56 Joerg Roedel
  2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel

Hi,

here is a patch-set to introduce an explicit interface to
the IOMMU-API for flushing IOMMU and device IO/TLBs.
Currently the iommu_map(), iommu_map_sg(), and iommu_unmap()
functions have to make sure that all IO/TLBs in the system
are synchronized with the page-table updates they make.

This is very inefficient in some scenarios, for example when
a large address space is unmapped and an IO/TLB flush has to
be done in every call of iommu_unmap(), or when it makes
sense to queue up several page-table changes and flush them
together.

To optimize these scenarios, the need to synchronize with
the IOMMU and device TLBs has been removed from the
map/unmap functions of the IOMMU-API and an interface to
explicitly do the flushes has been introduced.

To make the conversion of existing users of the IOMMU-API
easier, new functions - iommu_map_sync(), iommu_map_sg_sync(),
and iommu_unmap_sync() - have been introduced. These
functions guarantee that the IO/TLBs are synchronized with
any page-table update when they return. The optimizations
made possible by the new interface are left to separate
patch-sets.

Patch 1 just renames a few functions in the AMD-Vi driver
that would otherwise collide with the new TLB-flush
functions from the IOMMU-API.

Patch 2 introduces the new IO/TLB Flush-Interface.

Patches 3-13 convert existing users of the IOMMU-API to use
the *_sync functions for now.

Please review.

Thanks,

	Joerg

Joerg Roedel (13):
  iommu/amd: Rename a few flush functions
  iommu: Introduce Interface for IOMMU TLB Flushing
  vfio/type1: Use synchronized interface of the IOMMU-API
  iommu/dma: Use synchronized interface of the IOMMU-API
  arm: dma-mapping: Use synchronized interface of the IOMMU-API
  drm/etnaviv: Use synchronized interface of the IOMMU-API
  drm/msm: Use synchronized interface of the IOMMU-API
  drm/nouveau/imem/gk20a: Use synchronized interface of the IOMMU-API
  drm/rockchip: Use synchronized interface of the IOMMU-API
  drm/tegra: Use synchronized interface of the IOMMU-API
  gpu: host1x: Use synchronized interface of the IOMMU-API
  IB/usnic: Use synchronized interface of the IOMMU-API
  remoteproc: Use synchronized interface of the IOMMU-API

 arch/arm/mm/dma-mapping.c                          | 21 +++---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c              | 10 +--
 drivers/gpu/drm/msm/msm_iommu.c                    |  5 +-
 .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c    | 12 ++--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c        |  6 +-
 drivers/gpu/drm/tegra/drm.c                        |  6 +-
 drivers/gpu/drm/tegra/gem.c                        |  6 +-
 drivers/gpu/host1x/cdma.c                          |  6 +-
 drivers/gpu/host1x/job.c                           |  6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c           | 10 +--
 drivers/iommu/amd_iommu.c                          | 16 ++---
 drivers/iommu/dma-iommu.c                          |  8 +--
 drivers/iommu/iommu.c                              | 26 +++++++
 drivers/remoteproc/remoteproc_core.c               | 10 +--
 drivers/vfio/vfio_iommu_type1.c                    | 38 +++++-----
 include/linux/iommu.h                              | 80 +++++++++++++++++++++-
 16 files changed, 189 insertions(+), 77 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/13] iommu/amd: Rename a few flush functions
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Rename a few IOMMU cache-flush functions whose names start
with iommu_ so that they start with amd_iommu_ instead. This
prevents name collisions with generic iommu code later on.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/iommu/amd_iommu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 354cbd6..5469457 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1165,7 +1165,7 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
 	return iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_dte_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
 	u32 devid;
 
@@ -1179,7 +1179,7 @@ static void iommu_flush_dte_all(struct amd_iommu *iommu)
  * This function uses heavy locking and may disable irqs for some time. But
  * this is no issue because it is only called during resume.
  */
-static void iommu_flush_tlb_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
 	u32 dom_id;
 
@@ -1193,7 +1193,7 @@ static void iommu_flush_tlb_all(struct amd_iommu *iommu)
 	iommu_completion_wait(iommu);
 }
 
-static void iommu_flush_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_all(struct amd_iommu *iommu)
 {
 	struct iommu_cmd cmd;
 
@@ -1212,7 +1212,7 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
 	iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_irt_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
 	u32 devid;
 
@@ -1225,11 +1225,11 @@ static void iommu_flush_irt_all(struct amd_iommu *iommu)
 void iommu_flush_all_caches(struct amd_iommu *iommu)
 {
 	if (iommu_feature(iommu, FEATURE_IA)) {
-		iommu_flush_all(iommu);
+		amd_iommu_flush_all(iommu);
 	} else {
-		iommu_flush_dte_all(iommu);
-		iommu_flush_irt_all(iommu);
-		iommu_flush_tlb_all(iommu);
+		amd_iommu_flush_dte_all(iommu);
+		amd_iommu_flush_irt_all(iommu);
+		amd_iommu_flush_tlb_all(iommu);
 	}
 }
 
-- 
2.7.4

* [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
  2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 16:32   ` Will Deacon
  2017-08-17 12:56 ` [PATCH 03/13] vfio/type1: Use synchronized interface of the IOMMU-API Joerg Roedel
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Alex Williamson, Will Deacon, Robin Murphy

From: Joerg Roedel <jroedel@suse.de>

With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_map(), iommu_map_sg(), and
iommu_unmap() call.

Unmapping large amounts of address space, as happens when a
KVM domain with assigned devices is destroyed, causes
thousands of unnecessary TLB flushes in the IOMMU hardware,
because the unmap call-back runs for every unmapped
physical page.

With the TLB-flush interface introduced here, the need to
clean the hardware TLBs is removed from the iommu_map/unmap
functions. Users now have to call the new flush functions
explicitly to sync the page-table changes to the hardware.

Three functions are introduced:

	* iommu_flush_tlb_all() - Flushes all TLB entries
	                          associated with that
				  domain. TLB entries are
				  flushed when this function
				  returns.

	* iommu_tlb_range_add() - This will add a given
				  range to the flush queue
				  for this domain.

	* iommu_tlb_sync() - Flushes all queued ranges from
			     the hardware TLBs. Returns when
			     the flush is finished.

The semantics of this interface are intentionally similar
to those of the iommu_gather_ops from the io-pgtable code.

Additionally, this patch introduces synchronized versions
of the iommu_map(), iommu_map_sg(), and iommu_unmap()
functions. They will be used by current users of the
IOMMU-API until those are converted to the unsynchronized
versions with explicit flushing.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/iommu/iommu.c | 26 +++++++++++++++++
 include/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea16..816e248 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
 
 	}
 
+	iommu_flush_tlb_all(domain);
+
 out:
 	iommu_put_resv_regions(dev, &mappings);
 
@@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
+int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+		   phys_addr_t paddr, size_t size, int prot)
+{
+	int ret = iommu_map(domain, iova, paddr, size, prot);
+
+	iommu_tlb_range_add(domain, iova, size);
+	iommu_tlb_sync(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_map_sync);
+
 size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
 	size_t unmapped_page, unmapped = 0;
@@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
+size_t iommu_unmap_sync(struct iommu_domain *domain,
+			unsigned long iova, size_t size)
+{
+	size_t ret = iommu_unmap(domain, iova, size);
+
+	iommu_tlb_range_add(domain, iova, size);
+	iommu_tlb_sync(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_sync);
+
 size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			 struct scatterlist *sg, unsigned int nents, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54ad..7f9c114 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -167,6 +167,10 @@ struct iommu_resv_region {
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
 * to an iommu domain
+ * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
+ * @iotlb_range_add: Add a given iova range to the flush queue for this domain
+ * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
+ *              queue
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
@@ -199,6 +203,10 @@ struct iommu_ops {
 		     size_t size);
 	size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 			 struct scatterlist *sg, unsigned int nents, int prot);
+	void (*flush_iotlb_all)(struct iommu_domain *domain);
+	void (*iotlb_range_add)(struct iommu_domain *domain,
+				unsigned long iova, size_t size);
+	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
@@ -285,8 +293,12 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
+extern int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+			  phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       size_t size);
+			  size_t size);
+extern size_t iommu_unmap_sync(struct iommu_domain *domain,
+			       unsigned long iova, size_t size);
 extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 				struct scatterlist *sg,unsigned int nents,
 				int prot);
@@ -343,6 +355,25 @@ extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr)
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 			      unsigned long iova, int flags);
 
+static inline void iommu_flush_tlb_all(struct iommu_domain *domain)
+{
+	if (domain->ops->flush_iotlb_all)
+		domain->ops->flush_iotlb_all(domain);
+}
+
+static inline void iommu_tlb_range_add(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+	if (domain->ops->iotlb_range_add)
+		domain->ops->iotlb_range_add(domain, iova, size);
+}
+
+static inline void iommu_tlb_sync(struct iommu_domain *domain)
+{
+	if (domain->ops->iotlb_sync)
+		domain->ops->iotlb_sync(domain);
+}
+
 static inline size_t iommu_map_sg(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot)
@@ -350,6 +381,20 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain,
 	return domain->ops->map_sg(domain, iova, sg, nents, prot);
 }
 
+static inline size_t iommu_map_sg_sync(struct iommu_domain *domain,
+				       unsigned long iova,
+				       struct scatterlist *sg,
+				       unsigned int nents, int prot)
+{
+	size_t size = domain->ops->map_sg(domain, iova, sg, nents, prot);
+	if (size > 0) {
+		iommu_tlb_range_add(domain, iova, size);
+		iommu_tlb_sync(domain);
+	}
+
+	return size;
+}
+
 /* PCI device grouping function */
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
@@ -430,12 +475,24 @@ static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	return -ENODEV;
 }
 
+static inline int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+				 phys_addr_t paddr, int gfp_order, int prot)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 			      int gfp_order)
 {
 	return -ENODEV;
 }
 
+static inline int iommu_unmap_sync(struct iommu_domain *domain, unsigned long iova,
+				   int gfp_order)
+{
+	return -ENODEV;
+}
+
 static inline size_t iommu_map_sg(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot)
@@ -443,6 +500,27 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+static inline size_t iommu_map_sg_sync(struct iommu_domain *domain,
+				       unsigned long iova,
+				       struct scatterlist *sg,
+				       unsigned int nents, int prot)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_flush_tlb_all(struct iommu_domain *domain)
+{
+}
+
+static inline void iommu_tlb_range_add(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+}
+
+static inline void iommu_tlb_sync(struct iommu_domain *domain)
+{
+}
+
 static inline int iommu_domain_window_enable(struct iommu_domain *domain,
 					     u32 wnd_nr, phys_addr_t paddr,
 					     u64 size, int prot)
-- 
2.7.4

* [PATCH 03/13] vfio/type1: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
  2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel
  2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Alex Williamson, kvm

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/vfio/vfio_iommu_type1.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 8549cb1..4ad83d4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -672,7 +672,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 				      struct vfio_domain, next);
 
 	list_for_each_entry_continue(d, &iommu->domain_list, next) {
-		iommu_unmap(d->domain, dma->iova, dma->size);
+		iommu_unmap_sync(d->domain, dma->iova, dma->size);
 		cond_resched();
 	}
 
@@ -687,9 +687,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		}
 
 		/*
-		 * To optimize for fewer iommu_unmap() calls, each of which
-		 * may require hardware cache flushing, try to find the
-		 * largest contiguous physical memory chunk to unmap.
+		 * To optimize for fewer iommu_unmap_sync() calls, each of which
+		 * may require hardware cache flushing, try to find the largest
+		 * contiguous physical memory chunk to unmap.
 		 */
 		for (len = PAGE_SIZE;
 		     !domain->fgsp && iova + len < end; len += PAGE_SIZE) {
@@ -698,7 +698,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 				break;
 		}
 
-		unmapped = iommu_unmap(domain->domain, iova, len);
+		unmapped = iommu_unmap_sync(domain->domain, iova, len);
 		if (WARN_ON(!unmapped))
 			break;
 
@@ -877,15 +877,15 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 	int ret = 0;
 
 	for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) {
-		ret = iommu_map(domain->domain, iova,
-				(phys_addr_t)pfn << PAGE_SHIFT,
-				PAGE_SIZE, prot | domain->prot);
+		ret = iommu_map_sync(domain->domain, iova,
+				     (phys_addr_t)pfn << PAGE_SHIFT,
+				     PAGE_SIZE, prot | domain->prot);
 		if (ret)
 			break;
 	}
 
 	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-		iommu_unmap(domain->domain, iova, PAGE_SIZE);
+		iommu_unmap_sync(domain->domain, iova, PAGE_SIZE);
 
 	return ret;
 }
@@ -897,8 +897,9 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 	int ret;
 
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
-				npage << PAGE_SHIFT, prot | d->prot);
+		ret = iommu_map_sync(d->domain, iova,
+				     (phys_addr_t)pfn << PAGE_SHIFT,
+				     npage << PAGE_SHIFT, prot | d->prot);
 		if (ret) {
 			if (ret != -EBUSY ||
 			    map_try_harder(d, iova, pfn, npage, prot))
@@ -912,7 +913,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 
 unwind:
 	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next)
-		iommu_unmap(d->domain, iova, npage << PAGE_SHIFT);
+		iommu_unmap_sync(d->domain, iova, npage << PAGE_SHIFT);
 
 	return ret;
 }
@@ -1102,8 +1103,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				size = npage << PAGE_SHIFT;
 			}
 
-			ret = iommu_map(domain->domain, iova, phys,
-					size, dma->prot | domain->prot);
+			ret = iommu_map_sync(domain->domain, iova, phys,
+					     size, dma->prot | domain->prot);
 			if (ret)
 				return ret;
 
@@ -1133,13 +1134,14 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain)
 	if (!pages)
 		return;
 
-	ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2,
-			IOMMU_READ | IOMMU_WRITE | domain->prot);
+	ret = iommu_map_sync(domain->domain, 0, page_to_phys(pages),
+			     PAGE_SIZE * 2,
+			     IOMMU_READ | IOMMU_WRITE | domain->prot);
 	if (!ret) {
-		size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE);
+		size_t unmapped = iommu_unmap_sync(domain->domain, 0, PAGE_SIZE);
 
 		if (unmapped == PAGE_SIZE)
-			iommu_unmap(domain->domain, PAGE_SIZE, PAGE_SIZE);
+			iommu_unmap_sync(domain->domain, PAGE_SIZE, PAGE_SIZE);
 		else
 			domain->fgsp = true;
 	}
-- 
2.7.4

* [PATCH 04/13] iommu/dma: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (2 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 03/13] vfio/type1: Use synchronized interface of the IOMMU-API Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Robin Murphy,
	Will Deacon, Nate Watterson, Eric Auger, Mitchel Humpherys

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Nate Watterson <nwatters@codeaurora.org>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Mitchel Humpherys <mitchelh@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/iommu/dma-iommu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9d1cebe..38c41a2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -417,7 +417,7 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
 	dma_addr -= iova_off;
 	size = iova_align(iovad, size + iova_off);
 
-	WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+	WARN_ON(iommu_unmap_sync(domain, dma_addr, size) != size);
 	iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
@@ -572,7 +572,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 		sg_miter_stop(&miter);
 	}
 
-	if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
+	if (iommu_map_sg_sync(domain, iova, sgt.sgl, sgt.orig_nents, prot)
 			< size)
 		goto out_free_sg;
 
@@ -631,7 +631,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 	if (!iova)
 		return IOMMU_MAPPING_ERROR;
 
-	if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+	if (iommu_map_sync(domain, iova, phys - iova_off, size, prot)) {
 		iommu_dma_free_iova(cookie, iova, size);
 		return IOMMU_MAPPING_ERROR;
 	}
@@ -791,7 +791,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	 * We'll leave any physical concatenation to the IOMMU driver's
 	 * implementation - it knows better than we do.
 	 */
-	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
+	if (iommu_map_sg_sync(domain, iova, sg, nents, prot) < iova_len)
 		goto out_free_iova;
 
 	return __finalise_sg(dev, sg, nents, iova);
-- 
2.7.4

* [PATCH 05/13] arm: dma-mapping: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (3 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Russell King,
	Laurent Pinchart, Sricharan R, linux-arm-kernel

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Sricharan R <sricharan@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/arm/mm/dma-mapping.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index fcf1473..fae2398 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1444,7 +1444,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size,
 				break;
 
 		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
+		ret = iommu_map_sync(mapping->domain, iova, phys, len,
 				__dma_info_to_prot(DMA_BIDIRECTIONAL, attrs));
 		if (ret < 0)
 			goto fail;
@@ -1453,7 +1453,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size,
 	}
 	return dma_addr;
 fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
+	iommu_unmap_sync(mapping->domain, dma_addr, iova-dma_addr);
 	__free_iova(mapping, dma_addr, size);
 	return ARM_MAPPING_ERROR;
 }
@@ -1469,7 +1469,7 @@ static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t si
 	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
 	iova &= PAGE_MASK;
 
-	iommu_unmap(mapping->domain, iova, size);
+	iommu_unmap_sync(mapping->domain, iova, size);
 	__free_iova(mapping, iova, size);
 	return 0;
 }
@@ -1730,7 +1730,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 
 		prot = __dma_info_to_prot(dir, attrs);
 
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
+		ret = iommu_map_sync(mapping->domain, iova, phys, len, prot);
 		if (ret < 0)
 			goto fail;
 		count += len >> PAGE_SHIFT;
@@ -1740,7 +1740,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 
 	return 0;
 fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
+	iommu_unmap_sync(mapping->domain, iova_base, count * PAGE_SIZE);
 	__free_iova(mapping, iova_base, size);
 	return ret;
 }
@@ -1938,7 +1938,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
 
 	prot = __dma_info_to_prot(dir, attrs);
 
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
+	ret = iommu_map_sync(mapping->domain, dma_addr, page_to_phys(page),
+			     len, prot);
 	if (ret < 0)
 		goto fail;
 
@@ -1988,7 +1989,7 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	iommu_unmap(mapping->domain, iova, len);
+	iommu_unmap_sync(mapping->domain, iova, len);
 	__free_iova(mapping, iova, len);
 }
 
@@ -2016,7 +2017,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 
-	iommu_unmap(mapping->domain, iova, len);
+	iommu_unmap_sync(mapping->domain, iova, len);
 	__free_iova(mapping, iova, len);
 }
 
@@ -2044,7 +2045,7 @@ static dma_addr_t arm_iommu_map_resource(struct device *dev,
 
 	prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO;
 
-	ret = iommu_map(mapping->domain, dma_addr, addr, len, prot);
+	ret = iommu_map_sync(mapping->domain, dma_addr, addr, len, prot);
 	if (ret < 0)
 		goto fail;
 
@@ -2073,7 +2074,7 @@ static void arm_iommu_unmap_resource(struct device *dev, dma_addr_t dma_handle,
 	if (!iova)
 		return;
 
-	iommu_unmap(mapping->domain, iova, len);
+	iommu_unmap_sync(mapping->domain, iova, len);
 	__free_iova(mapping, iova, len);
 }
 
-- 
2.7.4

* [PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (4 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 13:32   ` Lucas Stach
  2017-08-17 12:56 ` [PATCH 07/13] drm/msm: " Joerg Roedel
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Lucas Stach,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware
TLBs are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: etnaviv@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index f103e78..ae0247c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
 
 		VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes);
 
-		ret = iommu_map(domain, da, pa, bytes, prot);
+		ret = iommu_map_sync(domain, da, pa, bytes, prot);
 		if (ret)
 			goto fail;
 
@@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
 	for_each_sg(sgt->sgl, sg, i, j) {
 		size_t bytes = sg_dma_len(sg) + sg->offset;
 
-		iommu_unmap(domain, da, bytes);
+		iommu_unmap_sync(domain, da, bytes);
 		da += bytes;
 	}
 	return ret;
@@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova,
 		size_t bytes = sg_dma_len(sg) + sg->offset;
 		size_t unmapped;
 
-		unmapped = iommu_unmap(domain, da, bytes);
+		unmapped = iommu_unmap_sync(domain, da, bytes);
 		if (unmapped < bytes)
 			return unmapped;
 
@@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, dma_addr_t paddr,
 			mutex_unlock(&mmu->lock);
 			return ret;
 		}
-		ret = iommu_map(mmu->domain, vram_node->start, paddr, size,
+		ret = iommu_map_sync(mmu->domain, vram_node->start, paddr, size,
 				IOMMU_READ);
 		if (ret < 0) {
 			drm_mm_remove_node(vram_node);
@@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu,
 
 	if (mmu->version == ETNAVIV_IOMMU_V2) {
 		mutex_lock(&mmu->lock);
-		iommu_unmap(mmu->domain,iova, size);
+		iommu_unmap_sync(mmu->domain,iova, size);
 		drm_mm_remove_node(vram_node);
 		mutex_unlock(&mmu->lock);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/13] drm/msm: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (5 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-19 15:39   ` Rob Clark
  2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Rob Clark,
	David Airlie, linux-arm-msm, dri-devel, freedreno

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Rob Clark <robdclark@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/msm/msm_iommu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index b23d336..b3525b7 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -64,7 +64,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	size_t ret;
 
 //	pm_runtime_get_sync(mmu->dev);
-	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
+	ret = iommu_map_sg_sync(iommu->domain, iova, sgt->sgl,
+				sgt->nents, prot);
 //	pm_runtime_put_sync(mmu->dev);
 	WARN_ON(ret < 0);
 
@@ -77,7 +78,7 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 
 	pm_runtime_get_sync(mmu->dev);
-	iommu_unmap(iommu->domain, iova, len);
+	iommu_unmap_sync(iommu->domain, iova, len);
 	pm_runtime_put_sync(mmu->dev);
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/13] drm/nouveau/imem/gk20a: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (6 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 07/13] drm/msm: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 09/13] drm/rockchip: " Joerg Roedel
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Ben Skeggs,
	David Airlie, dri-devel, nouveau

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index cd5adbe..3f0de47 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -322,8 +322,9 @@ gk20a_instobj_dtor_iommu(struct nvkm_memory *memory)
 
 	/* Unmap pages from GPU address space and free them */
 	for (i = 0; i < node->base.mem.size; i++) {
-		iommu_unmap(imem->domain,
-			    (r->offset + i) << imem->iommu_pgshift, PAGE_SIZE);
+		iommu_unmap_sync(imem->domain,
+				 (r->offset + i) << imem->iommu_pgshift,
+				 PAGE_SIZE);
 		dma_unmap_page(dev, node->dma_addrs[i], PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
 		__free_page(node->pages[i]);
@@ -458,14 +459,15 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 npages, u32 align,
 	for (i = 0; i < npages; i++) {
 		u32 offset = (r->offset + i) << imem->iommu_pgshift;
 
-		ret = iommu_map(imem->domain, offset, node->dma_addrs[i],
-				PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
+		ret = iommu_map_sync(imem->domain, offset, node->dma_addrs[i],
+				     PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
 		if (ret < 0) {
 			nvkm_error(subdev, "IOMMU mapping failure: %d\n", ret);
 
 			while (i-- > 0) {
 				offset -= PAGE_SIZE;
-				iommu_unmap(imem->domain, offset, PAGE_SIZE);
+				iommu_unmap_sync(imem->domain, offset,
+						 PAGE_SIZE);
 			}
 			goto release_area;
 		}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/13] drm/rockchip: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (7 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Mark Yao,
	David Airlie, Heiko Stuebner, dri-devel, linux-arm-kernel,
	linux-rockchip

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Mark Yao <mark.yao@rock-chips.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b74ac71..6d28224 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -41,8 +41,8 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object *rk_obj)
 
 	rk_obj->dma_addr = rk_obj->mm.start;
 
-	ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-			   rk_obj->sgt->nents, prot);
+	ret = iommu_map_sg_sync(private->domain, rk_obj->dma_addr,
+				rk_obj->sgt->sgl, rk_obj->sgt->nents, prot);
 	if (ret < rk_obj->base.size) {
 		DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
 			  ret, rk_obj->base.size);
@@ -67,7 +67,7 @@ static int rockchip_gem_iommu_unmap(struct rockchip_gem_object *rk_obj)
 	struct drm_device *drm = rk_obj->base.dev;
 	struct rockchip_drm_private *private = drm->dev_private;
 
-	iommu_unmap(private->domain, rk_obj->dma_addr, rk_obj->size);
+	iommu_unmap_sync(private->domain, rk_obj->dma_addr, rk_obj->size);
 
 	mutex_lock(&private->mm_lock);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/13] drm/tegra: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (8 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 09/13] drm/rockchip: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 13:28   ` Thierry Reding
  2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Thierry Reding, David Airlie, Jonathan Hunter, dri-devel,
	linux-tegra

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-tegra@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/tegra/drm.c | 6 +++---
 drivers/gpu/drm/tegra/gem.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 518f4b6..bc4528ee 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1156,8 +1156,8 @@ void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
 	}
 
 	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
-	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
-			size, IOMMU_READ | IOMMU_WRITE);
+	err = iommu_map_sync(tegra->domain, *dma, virt_to_phys(virt),
+			     size, IOMMU_READ | IOMMU_WRITE);
 	if (err < 0)
 		goto free_iova;
 
@@ -1180,7 +1180,7 @@ void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
 		size = PAGE_ALIGN(size);
 
 	if (tegra->domain) {
-		iommu_unmap(tegra->domain, dma, size);
+		iommu_unmap_sync(tegra->domain, dma, size);
 		free_iova(&tegra->carveout.domain,
 			  iova_pfn(&tegra->carveout.domain, dma));
 	}
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 7a39a35..639bc75 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -135,8 +135,8 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, struct tegra_bo *bo)
 
 	bo->paddr = bo->mm->start;
 
-	err = iommu_map_sg(tegra->domain, bo->paddr, bo->sgt->sgl,
-			   bo->sgt->nents, prot);
+	err = iommu_map_sg_sync(tegra->domain, bo->paddr, bo->sgt->sgl,
+				bo->sgt->nents, prot);
 	if (err < 0) {
 		dev_err(tegra->drm->dev, "failed to map buffer: %zd\n", err);
 		goto remove;
@@ -162,7 +162,7 @@ static int tegra_bo_iommu_unmap(struct tegra_drm *tegra, struct tegra_bo *bo)
 		return 0;
 
 	mutex_lock(&tegra->mm_lock);
-	iommu_unmap(tegra->domain, bo->paddr, bo->size);
+	iommu_unmap_sync(tegra->domain, bo->paddr, bo->size);
 	drm_mm_remove_node(bo->mm);
 	mutex_unlock(&tegra->mm_lock);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/13] gpu: host1x: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (9 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 13:29   ` Thierry Reding
  2017-08-17 12:56 ` [PATCH 12/13] IB/usnic: " Joerg Roedel
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Thierry Reding, dri-devel, linux-tegra

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-tegra@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/host1x/cdma.c | 6 +++---
 drivers/gpu/host1x/job.c  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 28541b2..bdf557e 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -55,7 +55,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
 		return;
 
 	if (host1x->domain) {
-		iommu_unmap(host1x->domain, pb->dma, pb->alloc_size);
+		iommu_unmap_sync(host1x->domain, pb->dma, pb->alloc_size);
 		free_iova(&host1x->iova, iova_pfn(&host1x->iova, pb->dma));
 	}
 
@@ -105,8 +105,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 		}
 
 		pb->dma = iova_dma_addr(&host1x->iova, alloc);
-		err = iommu_map(host1x->domain, pb->dma, pb->phys, size,
-				IOMMU_READ);
+		err = iommu_map_sync(host1x->domain, pb->dma, pb->phys, size,
+				     IOMMU_READ);
 		if (err)
 			goto iommu_free_iova;
 	} else {
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index bee5044..70a029c 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -243,7 +243,7 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 				goto unpin;
 			}
 
-			err = iommu_map_sg(host->domain,
+			err = iommu_map_sg_sync(host->domain,
 					iova_dma_addr(&host->iova, alloc),
 					sgt->sgl, sgt->nents, IOMMU_READ);
 			if (err == 0) {
@@ -695,8 +695,8 @@ void host1x_job_unpin(struct host1x_job *job)
 		struct host1x_job_unpin_data *unpin = &job->unpins[i];
 
 		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) {
-			iommu_unmap(host->domain, job->addr_phys[i],
-				    unpin->size);
+			iommu_unmap_sync(host->domain, job->addr_phys[i],
+					 unpin->size);
 			free_iova(&host->iova,
 				iova_pfn(&host->iova, job->addr_phys[i]));
 		}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 12/13] IB/usnic: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (10 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-17 12:56 ` [PATCH 13/13] remoteproc: " Joerg Roedel
  2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson
  13 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Christian Benvenuti, Dave Goodell, Doug Ledford, Sean Hefty,
	Hal Rosenstock, linux-rdma

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Dave Goodell <dgoodell@cisco.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/infiniband/hw/usnic/usnic_uiom.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index c49db7c..e53dc38 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -205,7 +205,7 @@ static void usnic_uiom_unmap_sorted_intervals(struct list_head *intervals,
 		while (size > 0) {
 			/* Workaround for RH 970401 */
 			usnic_dbg("va 0x%lx size 0x%lx", va, PAGE_SIZE);
-			iommu_unmap(pd->domain, va, PAGE_SIZE);
+			iommu_unmap_sync(pd->domain, va, PAGE_SIZE);
 			va += PAGE_SIZE;
 			size -= PAGE_SIZE;
 		}
@@ -282,8 +282,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
 				size = pa_end - pa_start + PAGE_SIZE;
 				usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x",
 					va_start, &pa_start, size, flags);
-				err = iommu_map(pd->domain, va_start, pa_start,
-							size, flags);
+				err = iommu_map_sync(pd->domain, va_start,
+						     pa_start, size, flags);
 				if (err) {
 					usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
 						va_start, &pa_start, size, err);
@@ -299,8 +299,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
 				size = pa - pa_start + PAGE_SIZE;
 				usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n",
 					va_start, &pa_start, size, flags);
-				err = iommu_map(pd->domain, va_start, pa_start,
-						size, flags);
+				err = iommu_map_sync(pd->domain, va_start,
+						     pa_start, size, flags);
 				if (err) {
 					usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
 						va_start, &pa_start, size, err);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/13] remoteproc: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (11 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 12/13] IB/usnic: " Joerg Roedel
@ 2017-08-17 12:56 ` Joerg Roedel
  2017-08-24 19:06   ` Bjorn Andersson
  2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson
  13 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Ohad Ben-Cohen, Bjorn Andersson, linux-remoteproc

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: they no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: linux-remoteproc@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/remoteproc/remoteproc_core.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 564061d..16db19c 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -553,7 +553,8 @@ static int rproc_handle_devmem(struct rproc *rproc, struct fw_rsc_devmem *rsc,
 	if (!mapping)
 		return -ENOMEM;
 
-	ret = iommu_map(rproc->domain, rsc->da, rsc->pa, rsc->len, rsc->flags);
+	ret = iommu_map_sync(rproc->domain, rsc->da,
+			     rsc->pa, rsc->len, rsc->flags);
 	if (ret) {
 		dev_err(dev, "failed to map devmem: %d\n", ret);
 		goto out;
@@ -661,8 +662,8 @@ static int rproc_handle_carveout(struct rproc *rproc,
 			goto dma_free;
 		}
 
-		ret = iommu_map(rproc->domain, rsc->da, dma, rsc->len,
-				rsc->flags);
+		ret = iommu_map_sync(rproc->domain, rsc->da, dma, rsc->len,
+				     rsc->flags);
 		if (ret) {
 			dev_err(dev, "iommu_map failed: %d\n", ret);
 			goto free_mapping;
@@ -823,7 +824,8 @@ static void rproc_resource_cleanup(struct rproc *rproc)
 	list_for_each_entry_safe(entry, tmp, &rproc->mappings, node) {
 		size_t unmapped;
 
-		unmapped = iommu_unmap(rproc->domain, entry->da, entry->len);
+		unmapped = iommu_unmap_sync(rproc->domain, entry->da,
+					    entry->len);
 		if (unmapped != entry->len) {
 			/* nothing much to do besides complaining */
 			dev_err(dev, "failed to unmap %u/%zu\n", entry->len,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/13] drm/tegra: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel
@ 2017-08-17 13:28   ` Thierry Reding
  0 siblings, 0 replies; 34+ messages in thread
From: Thierry Reding @ 2017-08-17 13:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	David Airlie, Jonathan Hunter, dri-devel, linux-tegra

On Thu, Aug 17, 2017 at 02:56:33PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: they no longer guarantee that the hardware TLBs
> are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.
> 
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Jonathan Hunter <jonathanh@nvidia.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-tegra@vger.kernel.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/gpu/drm/tegra/drm.c | 6 +++---
>  drivers/gpu/drm/tegra/gem.c | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)

Acked-by: Thierry Reding <treding@nvidia.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/13] gpu: host1x: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel
@ 2017-08-17 13:29   ` Thierry Reding
  0 siblings, 0 replies; 34+ messages in thread
From: Thierry Reding @ 2017-08-17 13:29 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	dri-devel, linux-tegra

On Thu, Aug 17, 2017 at 02:56:34PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: they no longer guarantee that the hardware TLBs
> are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.
> 
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-tegra@vger.kernel.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/gpu/host1x/cdma.c | 6 +++---
>  drivers/gpu/host1x/job.c  | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)

Acked-by: Thierry Reding <treding@nvidia.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel
@ 2017-08-17 13:32   ` Lucas Stach
  2017-08-17 13:45     ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Lucas Stach @ 2017-08-17 13:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

Hi Joerg,

Am Donnerstag, den 17.08.2017, 14:56 +0200 schrieb Joerg Roedel:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: they no longer guarantee that the hardware TLBs
> are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.

I don't think this is necessary. Etnaviv has managed and batched up TLB
flushes from day 1, as they don't happen through the MMU MMIO interface,
but through the GPU command stream.

So if my understanding of this series is correct, Etnaviv is just fine
with the changed semantics of the unsynchronized map/unmap calls.

Regards,
Lucas

> 
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: etnaviv@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> index f103e78..ae0247c 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
> @@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
>  
>  		VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes);
>  
> -		ret = iommu_map(domain, da, pa, bytes, prot);
> +		ret = iommu_map_sync(domain, da, pa, bytes, prot);
>  		if (ret)
>  			goto fail;
>  
> @@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova,
>  	for_each_sg(sgt->sgl, sg, i, j) {
>  		size_t bytes = sg_dma_len(sg) + sg->offset;
>  
> -		iommu_unmap(domain, da, bytes);
> +		iommu_unmap_sync(domain, da, bytes);
>  		da += bytes;
>  	}
>  	return ret;
> @@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova,
>  		size_t bytes = sg_dma_len(sg) + sg->offset;
>  		size_t unmapped;
>  
> -		unmapped = iommu_unmap(domain, da, bytes);
> +		unmapped = iommu_unmap_sync(domain, da, bytes);
>  		if (unmapped < bytes)
>  			return unmapped;
>  
> @@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, dma_addr_t paddr,
>  			mutex_unlock(&mmu->lock);
>  			return ret;
>  		}
> -		ret = iommu_map(mmu->domain, vram_node->start, paddr, size,
> +		ret = iommu_map_sync(mmu->domain, vram_node->start, paddr, size,
>  				IOMMU_READ);
>  		if (ret < 0) {
>  			drm_mm_remove_node(vram_node);
> @@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu,
>  
>  	if (mmu->version == ETNAVIV_IOMMU_V2) {
>  		mutex_lock(&mmu->lock);
> -		iommu_unmap(mmu->domain,iova, size);
> +		iommu_unmap_sync(mmu->domain,iova, size);
>  		drm_mm_remove_node(vram_node);
>  		mutex_unlock(&mmu->lock);
>  	}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API
  2017-08-17 13:32   ` Lucas Stach
@ 2017-08-17 13:45     ` Joerg Roedel
  2017-08-17 14:03       ` Lucas Stach
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 13:45 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

Hi Lucas,

On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote:
> I don't think this is necessary. Etnaviv has managed and batched up TLB
> flushes from day 1, as they don't happen through the MMU MMIO interface,
> but through the GPU command stream.
> 
> So if my understanding of this series is correct, Etnaviv is just fine
> with the changed semantics of the unsynchronized map/unmap calls.

This is not about any TLB on the GPU that could be flushed through the
GPU command stream, but about the TLB in the IOMMU device. Or is this
actually the same on this hardware? Which IOMMU driver is in use there?


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API
  2017-08-17 13:45     ` Joerg Roedel
@ 2017-08-17 14:03       ` Lucas Stach
  2017-08-17 14:18         ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Lucas Stach @ 2017-08-17 14:03 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

Am Donnerstag, den 17.08.2017, 15:45 +0200 schrieb Joerg Roedel:
> Hi Lucas,
> 
> On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote:
> > I don't think this is necessary. Etnaviv has managed and batched up TLB
> > flushes from day 1, as they don't happen through the MMU MMIO interface,
> > but through the GPU command stream.
> > 
> > So if my understanding of this series is correct, Etnaviv is just fine
> > with the changed semantics of the unsynchronized map/unmap calls.
> 
> This is not about any TLB on the GPU that could be flushed through the
> GPU command stream, but about the TLB in the IOMMU device. Or is this
> actually the same on this hardware? Which IOMMU-driver is use there?

There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
to manage the GPU internal MMU, see
drivers/gpu/drm/etnaviv/etnaviv_iommu.c

Regards,
Lucas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use synchronized interface of the IOMMU-API
  2017-08-17 14:03       ` Lucas Stach
@ 2017-08-17 14:18         ` Joerg Roedel
  2017-08-17 14:30           ` Lucas Stach
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 14:18 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote:
> There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
> to manage the GPU internal MMU, see
> drivers/gpu/drm/etnaviv/etnaviv_iommu.c

That looks like a very fragile construct, because it relies on internal
behavior of the iommu code that can change in the future.

I strongly suggest either making etnaviv_iommu.c a real iommu driver and
moving it to drivers/iommu, or (preferred by me) just calling directly
into the map/unmap functions of this driver from the rest of
etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to
be used there as another layer of indirection.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API
  2017-08-17 14:18         ` Joerg Roedel
@ 2017-08-17 14:30           ` Lucas Stach
  2017-08-17 14:35             ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Lucas Stach @ 2017-08-17 14:30 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

Am Donnerstag, den 17.08.2017, 16:18 +0200 schrieb Joerg Roedel:
> On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote:
> > There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API
> > to manage the GPU internal MMU, see
> > drivers/gpu/drm/etnaviv/etnaviv_iommu.c
> 
> That looks like a very fragile construct, because it relies on internal
> behavior of the iommu code that can change in the future.
> 
> I strongly suggest either making etnaviv_iommu.c a real iommu driver and
> moving it to drivers/iommu, or (preferred by me) just calling directly
> into the map/unmap functions of this driver from the rest of
> etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to
> be used there as another layer of indirection.

Yeah, the IOMMU API being used internally is a historical accident that
we haven't gotten around to rectifying yet. It's on my things-we-need-to-do
list to prune the usage of the IOMMU API in etnaviv.

Regards,
Lucas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API
  2017-08-17 14:30           ` Lucas Stach
@ 2017-08-17 14:35             ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 14:35 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Russell King, Christian Gmeiner, David Airlie, etnaviv,
	dri-devel

On Thu, Aug 17, 2017 at 04:30:48PM +0200, Lucas Stach wrote:
> Yeah, the IOMMU API being used internally is a historical accident that
> we haven't gotten around to rectifying yet. It's on my things-we-need-to-do
> list to prune the usage of the IOMMU API in etnaviv.

Okay, so for the time being I'll drop the etnaviv patch from this series.


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
  2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
                   ` (12 preceding siblings ...)
  2017-08-17 12:56 ` [PATCH 13/13] remoteproc: " Joerg Roedel
@ 2017-08-17 14:35 ` Alex Williamson
  2017-08-17 14:43   ` Joerg Roedel
  13 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2017-08-17 14:35 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel, Suravee Suthikulpanit

On Thu, 17 Aug 2017 14:56:23 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> Hi,
> 
> here is a patch-set to introduce an explicit interface to
> the IOMMU-API to flush IOMMU and device IO/TLBs. Currently
> the iommu_map(), iommu_map_sg(), and iommu_unmap() functions
> have to make sure all IO/TLBs in the system are synchronized
> with the page-table updates they made.
> 
> This is very inefficient in some scenarios, for example when
> a large address space is unmapped and an IO/TLB flush has to
> be done in every call of iommu_unmap(). Or in a scenario
> where it makes sense to queue up some changes to the
> page-tables and flush them together.
> 
> To optimize these scenarios, the need to synchronize with
> the IOMMU and device TLBs has been removed from the
> map/unmap functions of the IOMMU-API and an interface to
> explicitly do the flushes has been introduced.
> 
> To make the conversion of existing users of the IOMMU-API
> easier, new functions - iommu_map_sync(), iommu_map_sg_sync(),
> and iommu_unmap_sync() - have been introduced. These
> functions guarantee that the IO/TLBs are synchronized with
> any page-table update when they return. The optimizations
> possible with the new interface are subject to separate
> patch-sets.

Hi Joerg,

Wouldn't it be much more friendly to downstreams and out-of-tree
drivers to introduce new functions for the async semantics?  i.e.
iommu_map_async(), etc.  The API also seems a little cleaner in that
iommu_map() stands alone, it's synchronous, while iommu_map_async() is
explicitly asynchronous and a _flush() call is needed to finalize it.
What do you see as the advantage of the approach here?  Thanks,

Alex


> Patch 1 just renames a few functions in the AMD-Vi driver
> that would otherwise collide with the new TLB-flush
> functions from the IOMMU-API.
> 
> Patch 2 introduces the new IO/TLB Flush-Interface.
> 
> Patch 3-13 convert existing users of the IOMMU-API to use
> the *_sync functions for now.
> 
> Please review.
> 
> Thanks,
> 
> 	Joerg
> 
> Joerg Roedel (13):
>   iommu/amd: Rename a few flush functions
>   iommu: Introduce Interface for IOMMU TLB Flushing
>   vfio/type1: Use sychronized interface of the IOMMU-API
>   iommu/dma: Use sychronized interface of the IOMMU-API
>   arm: dma-mapping: Use sychronized interface of the IOMMU-API
>   drm/etnaviv: Use sychronized interface of the IOMMU-API
>   drm/msm: Use sychronized interface of the IOMMU-API
>   drm/nouveau/imem/gk20a: Use sychronized interface of the IOMMU-API
>   drm/rockchip: Use sychronized interface of the IOMMU-API
>   drm/tegra: Use sychronized interface of the IOMMU-API
>   gpu: host1x: Use sychronized interface of the IOMMU-API
>   IB/usnic: Use sychronized interface of the IOMMU-API
>   remoteproc: Use sychronized interface of the IOMMU-API
> 
>  arch/arm/mm/dma-mapping.c                          | 21 +++---
>  drivers/gpu/drm/etnaviv/etnaviv_mmu.c              | 10 +--
>  drivers/gpu/drm/msm/msm_iommu.c                    |  5 +-
>  .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c    | 12 ++--
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c        |  6 +-
>  drivers/gpu/drm/tegra/drm.c                        |  6 +-
>  drivers/gpu/drm/tegra/gem.c                        |  6 +-
>  drivers/gpu/host1x/cdma.c                          |  6 +-
>  drivers/gpu/host1x/job.c                           |  6 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c           | 10 +--
>  drivers/iommu/amd_iommu.c                          | 16 ++---
>  drivers/iommu/dma-iommu.c                          |  8 +--
>  drivers/iommu/iommu.c                              | 26 +++++++
>  drivers/remoteproc/remoteproc_core.c               | 10 +--
>  drivers/vfio/vfio_iommu_type1.c                    | 38 +++++-----
>  include/linux/iommu.h                              | 80 +++++++++++++++++++++-
>  16 files changed, 189 insertions(+), 77 deletions(-)
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
  2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson
@ 2017-08-17 14:43   ` Joerg Roedel
  2017-08-17 14:54     ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 14:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-kernel, Suravee Suthikulpanit

Hi Alex,

On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote:
> Wouldn't it be much more friendly to downstreams and out-of-tree
> drivers to introduce new functions for the async semantics?  i.e.
> iommu_map_async(), etc.  The API also seems a little cleaner in that
> iommu_map() stands alone, it's synchronous, while iommu_map_async() is
> explicitly asynchronous and a _flush() call is needed to finalize it.
> What do you see as the advantage of the approach here?  Thanks,

The reason I did it this way was that I want the iommu_map(),
iommu_unmap(), and iommu_map_sg() functions to be considered the _default_
to choose when using the IOMMU-API, because their use is faster than
using the _sync() variants. Or in other words, I want the _sync function
names to imply that they are slower versions of the default ones.
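
The calling convention Joerg describes here — plain map/unmap as the
default, with one explicit flush at the end — can be mocked up in
userspace. Only the function names come from this patch-set; the
counting stubs below are invented purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the IOMMU-API, counting hardware flushes. */
static int tlb_flush_count;

static void iommu_unmap(unsigned long iova, size_t size)
{
	(void)iova; (void)size;		/* page-table update only, no flush */
}

static void iommu_tlb_range_add(unsigned long iova, size_t size)
{
	(void)iova; (void)size;		/* queue the range, no hardware access */
}

static void iommu_tlb_sync(void)
{
	tlb_flush_count++;		/* one flush covers all queued ranges */
}

/* Tear down a large address space: npages unmaps, a single flush. */
static int teardown(unsigned long base, int npages)
{
	for (int i = 0; i < npages; i++) {
		iommu_unmap(base + i * 4096UL, 4096);
		iommu_tlb_range_add(base + i * 4096UL, 4096);
	}
	iommu_tlb_sync();
	return tlb_flush_count;
}
```

With the _sync() variants the same teardown would issue one flush per
iommu_unmap_sync() call instead of one in total.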


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
  2017-08-17 14:43   ` Joerg Roedel
@ 2017-08-17 14:54     ` Alex Williamson
  2017-08-17 15:22       ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2017-08-17 14:54 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel, Suravee Suthikulpanit

On Thu, 17 Aug 2017 16:43:08 +0200
Joerg Roedel <joro@8bytes.org> wrote:

> Hi Alex,
> 
> On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote:
> > Wouldn't it be much more friendly to downstreams and out-of-tree
> > drivers to introduce new functions for the async semantics?  ie.
> > iommu_map_async(), etc.  The API also seems a little cleaner that
> > iommu_map() stands alone, it's synchronous, iommu_map_async() is
> > explicitly asynchronous and a _flush() call is needed to finalize it.
> > What do you see as the advantage to the approach here?  Thanks,  
> 
> The reason I did it this way was that I want the iommu_map(),
> iommu_unmap(), and iommu_map_sg() functions to be considered the _default_
> to choose when using the IOMMU-API, because their use is faster than
> using the _sync() variants. Or in other words, I want the _sync function
> names to imply that they are slower versions of the default ones.

So _sync() does imply that they're slower, but iommu_map() does not
imply that a _flush() is required.  One is a performance issue, the
other is an API usability issue.  If the sync version is used
sub-optimally, it's a performance issue, not a correctness issue.  If
the async version is used without an explicit flush, it's a correctness
issue.  Therefore, I would lean towards making the asynchronous mode
explicit and providing good documentation and comments to steer
developers to the async version.  I think it makes the API harder to
use incorrectly.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
  2017-08-17 14:54     ` Alex Williamson
@ 2017-08-17 15:22       ` Joerg Roedel
  2017-08-23 12:09         ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 15:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-kernel, Suravee Suthikulpanit

On Thu, Aug 17, 2017 at 08:54:07AM -0600, Alex Williamson wrote:
> So _sync() does imply that they're slower, but iommu_map() does not
> imply that a _flush() is required.  One is a performance issue, the
> other is an API usability issue.  If the sync version is used
> sub-optimally, it's a performance issue, not a correctness issue.  If
> the async version is used without an explicit flush, it's a correctness
> issue.  Therefore, I would lean towards making the asynchronous mode
> explicit and providing good documentation and comments to steer
> developers to the async version.  I think it makes the API harder to
> use incorrectly.

I agree that it makes the API a bit more complicated to use. But that
can be solved by documenting it and by converting the main users (for me
that is VFIO and dma-iommu.c) of it to the unsynchronized interface,
because people will look at existing users and just do what they do.

Introducing _sync functions instead of _async ones also has the
side-effect that it puts pressure on the maintainers of the code to
convert it to the async interface, because they now see explicitly that
they use the slow version and start looking for ways to make things
faster.

What I absolutely don't want is for the whole explicit TLB flushing
of the IOMMU-API (as introduced in this patch-set) to be considered some
optional part of the API, as it would be if I just introduced _async
versions of map/unmap/map_sg.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel
@ 2017-08-17 16:32   ` Will Deacon
  2017-08-17 16:50     ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Will Deacon @ 2017-08-17 16:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Alex Williamson, Robin Murphy

Hi Joerg,

I really like the idea of this, but I have a couple of questions and
comments below.

On Thu, Aug 17, 2017 at 02:56:25PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> With the current IOMMU-API the hardware TLBs have to be
> flushed in every iommu_map(), iommu_map_sg(), and
> iommu_unmap() call.
> 
> For unmapping large amounts of address space, like it
> happens when a KVM domain with assigned devices is
> destroyed, this causes thousands of unnecessary TLB flushes
> in the IOMMU hardware because the unmap call-back runs for
> every unmapped physical page.
> 
> With the TLB Flush Interface introduced here the need to
> clean the hardware TLBs is removed from the iommu_map/unmap
> functions. Users now have to explicitly call these functions
> to sync the page-table changes to the hardware.
> 
> Three functions are introduced:
> 
> 	* iommu_flush_tlb_all() - Flushes all TLB entries
> 	                          associated with that
> 				  domain. TLB entries are
> 				  flushed when this function
> 				  returns.
> 
> 	* iommu_tlb_range_add() - This will add a given
> 				  range to the flush queue
> 				  for this domain.
> 
> 	* iommu_tlb_sync() - Flushes all queued ranges from
> 			     the hardware TLBs. Returns when
> 			     the flush is finished.
> 
> The semantic of this interface is intentionally similar to
> the iommu_gather_ops from the io-pgtable code.
> 
> Additionally, this patch introduces synchronized versions of
> the iommu_map(), iommu_map_sg(), and iommu_unmap()
> functions. They will be used by current users of the
> IOMMU-API, before they are optimized to the unsynchronized
> versions.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/iommu/iommu.c | 26 +++++++++++++++++
>  include/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 105 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f6ea16..816e248 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>  
>  	}
>  
> +	iommu_flush_tlb_all(domain);
> +
>  out:
>  	iommu_put_resv_regions(dev, &mappings);
>  
> @@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>  
> +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> +		   phys_addr_t paddr, size_t size, int prot)
> +{
> +	int ret = iommu_map(domain, iova, paddr, size, prot);
> +
> +	iommu_tlb_range_add(domain, iova, size);
> +	iommu_tlb_sync(domain);

Many IOMMUs don't need these callbacks on ->map operations, but they won't
be able to distinguish them easily with this API. Could you add a flags
parameter or something to the iommu_tlb_* functions, please?

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_sync);
> +
>  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
>  	size_t unmapped_page, unmapped = 0;
> @@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>  
> +size_t iommu_unmap_sync(struct iommu_domain *domain,
> +			unsigned long iova, size_t size)
> +{
> +	size_t ret = iommu_unmap(domain, iova, size);
> +
> +	iommu_tlb_range_add(domain, iova, size);
> +	iommu_tlb_sync(domain);

I think we will struggle to implement this efficiently on ARM SMMUv3. The
way invalidation works there is that there is a single in-memory command
queue into which we can put TLB invalidation commands (they are inserted
under a lock). These are then processed asynchronously by the hardware, and
you can complete them by inserting a SYNC command and waiting for that to
be consumed by the SMMU. Sounds like a perfect fit, right?

The problem is that we want to add those invalidation commands as early
as possible, so that they can be processed by the hardware concurrently
with us unmapping other pages. That means adding the invalidation commands
in the ->unmap callback and not bothering to implement ->iotlb_range_add
callback at all. Then, we will do the sync in ->iotlb_sync. This falls
apart if somebody decides to use iommu_flush_tlb_all(), where we would
prefer not to insert all of the invalidation commands in unmap and instead
insert a single invalidate-all command, followed up with a SYNC.

In other words, we really need the information about the invalidation as
part of the unmap call.

Any ideas?

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 16:32   ` Will Deacon
@ 2017-08-17 16:50     ` Joerg Roedel
  2017-08-17 17:17       ` Will Deacon
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 16:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Alex Williamson, Robin Murphy

Hi Will,

On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> I really like the idea of this, but I have a couple of questions and
> comments below.

Great. Together with the common iova-flush, this should make it
possible to solve the performance problems of the dma-iommu code.

> > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > +		   phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > +
> > +	iommu_tlb_range_add(domain, iova, size);
> > +	iommu_tlb_sync(domain);
> 
> Many IOMMUs don't need these callbacks on ->map operations, but they won't
> be able to distinguish them easily with this API. Could you add a flags
> parameter or something to the iommu_tlb_* functions, please?

Yeah, this is only needed for virtualized IOMMUs that have a non-present
cache. My idea was to let the iommu-drivers tell the common code whether
the iommu needs it and the code above just checks a flag and omits the
calls to the flush-functions then.

The problem currently is how to get this information from
'struct iommu_device' to 'struct iommu_domain'. As a workaround I'm
considering a per-domain flag in the iommu drivers which tracks whether
any unmap has happened, so that the flush call-back can do nothing if
there was none.

> I think we will struggle to implement this efficiently on ARM SMMUv3. The
> way invalidation works there is that there is a single in-memory command
> queue into which we can put TLB invalidation commands (they are inserted
> under a lock). These are then processed asynchronously by the hardware, and
> you can complete them by inserting a SYNC command and waiting for that to
> be consumed by the SMMU. Sounds like a perfect fit, right?

Yes, it's basically the same way as it works on AMD-Vi and Intel VT-d.

> The problem is that we want to add those invalidation commands as early
> as possible, so that they can be processed by the hardware concurrently
> with us unmapping other pages.

I think that's a bad idea, because then you re-introduce the performance
problems: everyone will spin on the cmd-queue lock in the unmap path of
the dma-api.

> That means adding the invalidation commands in the ->unmap callback
> and not bothering to implement ->iotlb_range_add callback at all.
> Then, we will do the sync in ->iotlb_sync. This falls apart if
> somebody decides to use iommu_flush_tlb_all(), where we would prefer
> not to insert all of the invalidation commands in unmap and instead
> insert a single invalidate-all command, followed up with a SYNC.

This problem can be solved with the deferred iova flushing code I posted
to the ML. When a queue fills up, iommu_flush_tlb_all() is called and
every entry that was unmapped before can be released. This works well on
x86; are there reasons it wouldn't work on ARM?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 16:50     ` Joerg Roedel
@ 2017-08-17 17:17       ` Will Deacon
  2017-08-17 21:20         ` Joerg Roedel
  0 siblings, 1 reply; 34+ messages in thread
From: Will Deacon @ 2017-08-17 17:17 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Alex Williamson, Robin Murphy

Hi Joerg,

On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> > I really like the idea of this, but I have a couple of questions and
> > comments below.
> 
> Great, this together with the common iova-flush it should make it
> possible to solve the performance problems of the dma-iommu code.
> 
> > > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > > +		   phys_addr_t paddr, size_t size, int prot)
> > > +{
> > > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > > +
> > > +	iommu_tlb_range_add(domain, iova, size);
> > > +	iommu_tlb_sync(domain);
> > 
> > Many IOMMUs don't need these callbacks on ->map operations, but they won't
> > be able to distinguish them easily with this API. Could you add a flags
> > parameter or something to the iommu_tlb_* functions, please?
> 
> Yeah, this is only needed for virtualized IOMMUs that have a non-present
> cache. My idea was to let the iommu-drivers tell the common code whether
> the iommu needs it and the code above just checks a flag and omits the
> calls to the flush-functions then.
> 
> Problem currently is how to get this information from
> 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> consider a per-domain flag in the iommu drivers which checks whether any
> unmap has happened and just do nothing on the flush-call-back if there
> were none.

Given that this can all happen concurrently, I really don't like the idea of
having to track things with a flag. We'd end up introducing atomics and/or
over-invalidating the TLBs.

> > I think we will struggle to implement this efficiently on ARM SMMUv3. The
> > way invalidation works there is that there is a single in-memory command
> > queue into which we can put TLB invalidation commands (they are inserted
> > under a lock). These are then processed asynchronously by the hardware, and
> > you can complete them by inserting a SYNC command and waiting for that to
> > be consumed by the SMMU. Sounds like a perfect fit, right?
> 
> Yes, its basically the same as way as it works on AMD-Vi and Intel VT-d.
> 
> > The problem is that we want to add those invalidation commands as early
> > as possible, so that they can be processed by the hardware concurrently
> > with us unmapping other pages.
> 
> I think that's a bad idea, because then you re-introduce the performance
> problems again because everyone will spin on the cmd-queue lock in the
> unmap path of the dma-api.

We don't actually tend to see issues adding the TLB invalidation commands
under the lock -- the vast majority of the overhead comes from the SYNC.
Besides, I don't see how adding the commands in the ->iotlb_range_add
callback is any better: it still happens on unmap and it still needs to
take the lock.

If we had something like an ->unmap_all_sync callback, we could incorporate
the TLB invalidation into that.

> > That means adding the invalidation commands in the ->unmap callback
> > and not bothering to implement ->iotlb_range_add callback at all.
> > Then, we will do the sync in ->iotlb_sync. This falls apart if
> > somebody decides to use iommu_flush_tlb_all(), where we would prefer
> > not to insert all of the invalidation commands in unmap and instead
> > insert a single invalidate-all command, followed up with a SYNC.
> 
> This problem can be solved with the deferred iova flushing code I posted
> to the ML. When a queue fills up, iommu_flush_tlb_all() is called and
> every entry that was unmapped before can be released. This works well on
> x86, are there reasons it wouldn't on ARM?

There are a few reasons I'm not rushing to move to the deferred flushing
code for ARM:

  1. The performance numbers we have suggest that we can achieve near-native
     performance without needing to do that.

  2. We can free page-table pages in unmap, but that's not safe if we defer
     flushing

  3. Flushing the whole TLB is undesirable and not something we currently
     need to do

  4. There are security implications of deferring the unmap and I'm aware
     of a security research group that use this to gain root privileges.

  5. *If* performance figures end up showing that deferring the flush is
     worthwhile, I would rather use an RCU-based approach for protecting
     the page tables, like we do on the CPU.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 17:17       ` Will Deacon
@ 2017-08-17 21:20         ` Joerg Roedel
  2017-08-18 15:16           ` Will Deacon
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2017-08-17 21:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Alex Williamson, Robin Murphy

Hi Will,

On Thu, Aug 17, 2017 at 06:17:05PM +0100, Will Deacon wrote:
> On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> > Problem currently is how to get this information from
> > 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> > consider a per-domain flag in the iommu drivers which checks whether any
> > unmap has happened and just do nothing on the flush-call-back if there
> > were none.
> 
> Given that this can all happen concurrently, I really don't like the idea of
> having to track things with a flag. We'd end up introducing atomics and/or
> over-invalidating the TLBs.

Okay, I'll look into a better solution for that.

> We don't actually tend to see issues adding the TLB invalidation commands
> under the lock -- the vast majority of the overhead comes from the SYNC.
> Besides, I don't see how adding the commands in the ->iotlb_range_add
> callback is any better: it still happens on unmap and it still needs to
> take the lock.

With the deferred flushing you don't flush anything in the unmap path in
most cases.  All you do there is add the unmapped iova-range to a
per-cpu list (it's actually a ring-buffer). Only when that buffer is full
do you do a flush_tlb_all() on the domain and then free all the iova
ranges.

With the flush-counters you can also see which entries in your buffer
have already been flushed from the IO/TLB by another CPU, so that you
can release them right away without any further flush. This way it's less
likely that the buffer fills up.

In my tests on x86 I got the flush-rate down to ~1800 flushes/sec at a
network packet rate of 1.45 million pps.
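
A minimal single-CPU model of that ring-buffer-plus-flush-counter scheme
(all names below are invented; the real deferred-flushing patches differ
in detail, e.g. per-cpu buffers and locking):

```c
#include <assert.h>

#define RING_SIZE 8

/* Monotonic counter: bumped on every domain-wide IO/TLB flush. */
static unsigned long flushes_done;

struct ring_entry {
	unsigned long iova;
	unsigned long flush_seq;	/* counter value when unmapped */
};

struct flush_ring {
	struct ring_entry e[RING_SIZE];
	int head, tail;			/* tail == head means empty */
};

static void flush_tlb_all(void)
{
	flushes_done++;
}

/* Release entries already covered by a flush done since they were added. */
static void ring_reap(struct flush_ring *r)
{
	while (r->tail != r->head &&
	       r->e[r->tail].flush_seq < flushes_done)
		r->tail = (r->tail + 1) % RING_SIZE;
}

static void ring_add(struct flush_ring *r, unsigned long iova)
{
	ring_reap(r);				/* maybe another flush ran */
	if ((r->head + 1) % RING_SIZE == r->tail) {
		flush_tlb_all();		/* buffer full: flush ... */
		ring_reap(r);			/* ... then free everything */
	}
	r->e[r->head].iova = iova;
	r->e[r->head].flush_seq = flushes_done;
	r->head = (r->head + 1) % RING_SIZE;
}
```

One flush_tlb_all() then covers a whole buffer of unmapped ranges
instead of one flush per unmap.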

> There are a few reasons I'm not rushing to move to the deferred flushing
> code for ARM:
> 
>   1. The performance numbers we have suggest that we can achieve near-native
>      performance without needing to do that.

Hard to believe when all CPUs fight for the cmd-buffer lock, especially
when you have around 96 CPUs :) Can you share the performance numbers
you have and what you measured?

>   2. We can free page-table pages in unmap, but that's not safe if we defer
>      flushing

Right, VT-d has the same problem and solved it with a free-list of pages
that is passed to the deferred flushing code. When the IO/TLB is flushed
it calls back into the driver which then frees the pages.

>   3. Flushing the whole TLB is undesirable and not something we currently
>      need to do

Is the TLB-refill cost higher than the time needed to add a
flush-command for every unmapped range?

>   4. There are security implications of deferring the unmap and I'm aware
>      of a security research group that use this to gain root privileges.

Interesting, can you share more about that?

>   5. *If* performance figures end up showing that deferring the flush is
>      worthwhile, I would rather use an RCU-based approach for protecting
>      the page tables, like we do on the CPU.

Yeah, I don't like the entry_dtor_cb() I introduced for that very much, so if
there are better solutions I am all ears.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
  2017-08-17 21:20         ` Joerg Roedel
@ 2017-08-18 15:16           ` Will Deacon
  0 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2017-08-18 15:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
	Alex Williamson, Robin Murphy

Hi Joerg,

On Thu, Aug 17, 2017 at 11:20:54PM +0200, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 06:17:05PM +0100, Will Deacon wrote:
> > On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> > > Problem currently is how to get this information from
> > > 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> > > consider a per-domain flag in the iommu drivers which checks whether any
> > > unmap has happened and just do nothing on the flush-call-back if there
> > > were none.
> > 
> > Given that this can all happen concurrently, I really don't like the idea of
> > having to track things with a flag. We'd end up introducing atomics and/or
> > over-invalidating the TLBs.
> 
> Okay, I look into a better solution for that.

Thanks. One possibility is that IOMMU drivers requiring TLB invalidation on
->map can set a flag on the domain when they allocate it, which the IOMMU
core can test for later on.

> > We don't actually tend to see issues adding the TLB invalidation commands
> > under the lock -- the vast majority of the overhead comes from the SYNC.
> > Besides, I don't see how adding the commands in the ->iotlb_range_add
> > callback is any better: it still happens on unmap and it still needs to
> > take the lock.
> 
> With the deferred flushing you don't flush anything in the unmap path in
> most cases.  All you do there is to add the unmapped iova-range to a
> per-cpu list (its actually a ring-buffer). Only when that buffer is full
> you do a flush_tlb_all() on the domain and then free all the iova
> ranges.
> 
> With the flush-counters you can also see which entries in your buffer
> have already been flushed from the IO/TLB by another CPU, so that you
> can release them right away without any further flush. This way its less
> likely that the buffer fills up.
> 
> In my tests on x86 I got the flush-rate down to ~1800 flushes/sec at a
> network packet rate of 1.45 million pps.
> 
> > There are a few reasons I'm not rushing to move to the deferred flushing
> > code for ARM:
> > 
> >   1. The performance numbers we have suggest that we can achieve near-native
> >      performance without needing to do that.
> 
> Hard to believe when all CPUs fight for the cmd-buffer lock, especially
> when you have around 96 CPUs :) Can you share the performance numbers
> you have and what you measured?

I can only point to numbers that have already been posted to the list, but
the numbers in this thread here are promising:

https://marc.info/?i=c1d85f28-c57b-4414-3504-16afb3a19ce0%40codeaurora.org

Note that patch 1 in that series doesn't remove the locking, it just removes
the heavy barrier instruction after advancing the queue pointer (and we
can easily do that with your series, deferring it to the sync).

As I said, if the locking does turn out to be a problem and this is backed
up by profiling data, then I'll look into it, but at the moment it doesn't
appear to be the case.

Just a thought, but if we could return a token from unmap to pass to the
TLB invalidation functions, then it would be possible to do a spin_trylock
on the command queue lock in ->unmap. If you get the lock, you put the
command in; if you don't, you set a flag in the token and the
->iotlb_range_add callback would see that flag and insert the commands then.
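
That trylock-token idea can be sketched with a toy single-threaded model
(a bool stands in for the command-queue spinlock, and every name here is
invented; a real version would use spin_trylock() on the SMMU command
queue):

```c
#include <assert.h>
#include <stdbool.h>

struct cmd_queue {
	bool locked;       /* stand-in for the command-queue spinlock */
	int  cmds_queued;
};

struct unmap_token {
	bool inval_pending;  /* set when unmap lost the trylock race */
};

static bool cq_trylock(struct cmd_queue *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void cq_unlock(struct cmd_queue *q)
{
	q->locked = false;
}

static void toy_unmap(struct cmd_queue *q, struct unmap_token *t)
{
	if (cq_trylock(q)) {
		q->cmds_queued++;        /* uncontended: queue command now */
		cq_unlock(q);
	} else {
		t->inval_pending = true; /* contended: defer to range_add */
	}
}

static void toy_range_add(struct cmd_queue *q, struct unmap_token *t)
{
	if (!t->inval_pending)
		return;                  /* nothing was deferred */
	while (!cq_trylock(q))           /* off the hot path: can spin */
		;
	q->cmds_queued++;
	cq_unlock(q);
	t->inval_pending = false;
}
```

The point is that the unmap fast path never spins on the lock; only the
later range-add/sync step, which has to take the lock anyway, pays for
contention.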

> >   2. We can free page-table pages in unmap, but that's not safe if we defer
> >      flushing
> 
> Right, VT-d has the same problem and solved it with a free-list of pages
> that is passed to the deferred flushing code. When the IO/TLB is flushed
> it calls back into the driver which then frees the pages.

There are some situations where we cannot defer. For example, if we map 2MB
at the PMD level in the page table, but 4k of that region is then unmapped.
In this case, we have to allocate a new level but we cannot plumb it into
the table without performing (and completing) TLB maintenance.

> >   3. Flushing the whole TLB is undesirable and not something we currently
> >      need to do
> 
> Is the TLB-refill cost higher than the time needed to add a
> flush-command for every unmapped range?

The TLB-refill cost is likely to be significantly higher for an SMMU than
the CPU, because the TLBs tend to be distributed with a centralised table
walker. Furthermore, doing an invalidate-all on a domain may cause other
masters in the domain to miss a deadline, which is a common complaint we've
had from graphics folks in the past (who basically pin their buffers and
rely on never taking a TLB miss).

> >   4. There are security implications of deferring the unmap and I'm aware
> >      of a security research group that use this to gain root privileges.
> 
> Interesting, can you share more about that?

Unfortunately, the paper isn't public yet so I can't say more right now.
When it's published, I'll point you to it!

> >   5. *If* performance figures end up showing that deferring the flush is
> >      worthwhile, I would rather use an RCU-based approach for protecting
> >      the page tables, like we do on the CPU.
> 
> Yeah, I don't like the entry_dtor_cb() I introduced for that very much, so if
> there are better solutions I am all ears.

I think it should be possible to protect the page tables with RCU and then
run the TLB invalidation, freeing and IOVA reclaim as part of the callback.
It's not simple to implement, though, because you run into concurrency
issues where the callback can run in parallel with unmappers etc.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/13] drm/msm: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 ` [PATCH 07/13] drm/msm: " Joerg Roedel
@ 2017-08-19 15:39   ` Rob Clark
  0 siblings, 0 replies; 34+ messages in thread
From: Rob Clark @ 2017-08-19 15:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, Linux Kernel Mailing List, Suravee Suthikulpanit,
	Joerg Roedel, David Airlie, linux-arm-msm, dri-devel, freedreno

On Thu, Aug 17, 2017 at 8:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> The map and unmap functions of the IOMMU-API changed their
> semantics: They no longer guarantee that the hardware
> TLBs are synchronized with the page-table updates they made.
>
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
>
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.


I like the idea of decoupling TLB invalidation from map (well, that doesn't
really affect any recent hw) and unmap (which does need a TLB flush)..
and for a GPU driver it should be pretty easy to batch up unmaps.

I did slightly prefer adding a new iommu_{map,unmap}_async() rather
than changing semantics of existing API, but either way:

Acked-by: Rob Clark <robdclark@gmail.com>

> Cc: Rob Clark <robdclark@gmail.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: linux-arm-msm@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: freedreno@lists.freedesktop.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/gpu/drm/msm/msm_iommu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index b23d336..b3525b7 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -64,7 +64,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
>         size_t ret;
>
>  //     pm_runtime_get_sync(mmu->dev);
> -       ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
> +       ret = iommu_map_sg_sync(iommu->domain, iova, sgt->sgl,
> +                               sgt->nents, prot);
>  //     pm_runtime_put_sync(mmu->dev);
>         WARN_ON(ret < 0);
>
> @@ -77,7 +78,7 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
>         struct msm_iommu *iommu = to_msm_iommu(mmu);
>
>         pm_runtime_get_sync(mmu->dev);
> -       iommu_unmap(iommu->domain, iova, len);
> +       iommu_unmap_sync(iommu->domain, iova, len);
>         pm_runtime_put_sync(mmu->dev);
>
>         return 0;
> --
> 2.7.4
>

* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
  2017-08-17 15:22       ` Joerg Roedel
@ 2017-08-23 12:09         ` Joerg Roedel
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2017-08-23 12:09 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-kernel

On Thu, Aug 17, 2017 at 05:22:20PM +0200, Joerg Roedel wrote:
> What I absolutely don't want is that the whole explicit TLB flushing
> of the IOMMU-API (as introduced in this patch-set) is considered some
> optional part of the API, as it would be when I just introduce _async
> versions of map/unmap/map_sg.

Okay, forget that :)

The discussions I had around this interface made me change it a little
bit in version 2 of the patch-set, which I will post soon.

I thought a bit more about the iommu_map() code-path. It really doesn't
make any sense to remove the tlb-sync requirement from it, because in
almost all cases the hardware doesn't require any flushes after a map
operation anyway. And in the rare cases where it does - because the
hardware is emulated and slow - the iommu-driver can handle
that by doing a flush in its iommu_ops->map() call-back.

So I removed the iommu_map_sync() and iommu_map_sg_sync() functions from
this series. With those changes it also doesn't make sense anymore
to have different tlb-sync semantics between iommu_map() and
iommu_unmap(). So I ended up introducing a new iommu_unmap_fast()
function which can unmap ranges and return with dirty io-tlbs.

This makes the extension of course look somewhat optional, which I tried
to avoid, but I hope the '_fast' part of the name is enough motivation for
iommu-api users to look into ways to use it in their code.
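
In kernel terms, the resulting usage pattern would look roughly like the
sketch below. It is pieced together from the function names mentioned in
this thread (iommu_unmap_fast() plus the explicit flush interface from
patch 2), so treat the exact signatures as tentative until v2 is posted:

```c
/* iommu_unmap_fast() may return with dirty IO/TLBs, so the caller
 * batches the invalidations and pays for a single sync at the end. */
size_t unmapped;

unmapped = iommu_unmap_fast(domain, iova, size);
iommu_tlb_range_add(domain, iova, unmapped);
/* ... possibly more unmap_fast + range_add calls ... */
iommu_tlb_sync(domain);   /* one sync covers the whole batch */

/* Plain iommu_unmap() keeps the old semantics: unmap + flush + sync
 * in a single call. */
```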


Regards,

	Joerg

* Re: [PATCH 13/13] remoteproc: Use synchronized interface of the IOMMU-API
  2017-08-17 12:56 ` [PATCH 13/13] remoteproc: " Joerg Roedel
@ 2017-08-24 19:06   ` Bjorn Andersson
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Andersson @ 2017-08-24 19:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
	Ohad Ben-Cohen, linux-remoteproc

On Thu 17 Aug 05:56 PDT 2017, Joerg Roedel wrote:

> From: Joerg Roedel <jroedel@suse.de>
> 
> The map and unmap functions of the IOMMU-API changed their
> semantics: They no longer guarantee that the hardware
> TLBs are synchronized with the page-table updates they made.
> 
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
> 
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.
> 
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: linux-remoteproc@vger.kernel.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> ---
>  drivers/remoteproc/remoteproc_core.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 564061d..16db19c 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -553,7 +553,8 @@ static int rproc_handle_devmem(struct rproc *rproc, struct fw_rsc_devmem *rsc,
>  	if (!mapping)
>  		return -ENOMEM;
>  
> -	ret = iommu_map(rproc->domain, rsc->da, rsc->pa, rsc->len, rsc->flags);
> +	ret = iommu_map_sync(rproc->domain, rsc->da,
> +			     rsc->pa, rsc->len, rsc->flags);
>  	if (ret) {
>  		dev_err(dev, "failed to map devmem: %d\n", ret);
>  		goto out;
> @@ -661,8 +662,8 @@ static int rproc_handle_carveout(struct rproc *rproc,
>  			goto dma_free;
>  		}
>  
> -		ret = iommu_map(rproc->domain, rsc->da, dma, rsc->len,
> -				rsc->flags);
> +		ret = iommu_map_sync(rproc->domain, rsc->da, dma, rsc->len,
> +				     rsc->flags);
>  		if (ret) {
>  			dev_err(dev, "iommu_map failed: %d\n", ret);
>  			goto free_mapping;
> @@ -823,7 +824,8 @@ static void rproc_resource_cleanup(struct rproc *rproc)
>  	list_for_each_entry_safe(entry, tmp, &rproc->mappings, node) {
>  		size_t unmapped;
>  
> -		unmapped = iommu_unmap(rproc->domain, entry->da, entry->len);
> +		unmapped = iommu_unmap_sync(rproc->domain, entry->da,
> +					    entry->len);
>  		if (unmapped != entry->len) {
>  			/* nothing much to do besides complaining */
>  			dev_err(dev, "failed to unmap %u/%zu\n", entry->len,
> -- 
> 2.7.4
> 

end of thread, other threads:[~2017-08-24 19:06 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel
2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel
2017-08-17 16:32   ` Will Deacon
2017-08-17 16:50     ` Joerg Roedel
2017-08-17 17:17       ` Will Deacon
2017-08-17 21:20         ` Joerg Roedel
2017-08-18 15:16           ` Will Deacon
2017-08-17 12:56 ` [PATCH 03/13] vfio/type1: Use synchronized interface of the IOMMU-API Joerg Roedel
2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel
2017-08-17 13:32   ` Lucas Stach
2017-08-17 13:45     ` Joerg Roedel
2017-08-17 14:03       ` Lucas Stach
2017-08-17 14:18         ` Joerg Roedel
2017-08-17 14:30           ` Lucas Stach
2017-08-17 14:35             ` Joerg Roedel
2017-08-17 12:56 ` [PATCH 07/13] drm/msm: " Joerg Roedel
2017-08-19 15:39   ` Rob Clark
2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 09/13] drm/rockchip: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel
2017-08-17 13:28   ` Thierry Reding
2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel
2017-08-17 13:29   ` Thierry Reding
2017-08-17 12:56 ` [PATCH 12/13] IB/usnic: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 13/13] remoteproc: " Joerg Roedel
2017-08-24 19:06   ` Bjorn Andersson
2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson
2017-08-17 14:43   ` Joerg Roedel
2017-08-17 14:54     ` Alex Williamson
2017-08-17 15:22       ` Joerg Roedel
2017-08-23 12:09         ` Joerg Roedel
