* [PATCH v3 0/6] iommu/amd: Enable page-selective flushes
@ 2021-06-07 18:25 Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 1/6] iommu/amd: Selective flush on unmap Nadav Amit
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

The previous patch, commit 268aa4548277 ("iommu/amd: Page-specific
invalidations for more than one page") was supposed to enable
page-selective IOTLB flushes on AMD.

Besides the bug that was already fixed by commit a017c567915f
("iommu/amd: Fix wrong parentheses on page-specific invalidations")
there are several remaining matters to enable and benefit from
page-selective IOTLB flushes on AMD:

1. Enable selective flushes on unmap (patch 1)
2. Avoid using flush-queue on vIOMMUs (patch 2)
3. Relaxed flushes when gathering, excluding vIOMMUs (patches 3-5)
4. Syncing once on scatter-gather map operations (patch 6)

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org

--- 

v2->v3:
* Rebase on v5.13-rc5
* Refactoring (patches 4-5) [Robin]
* Rework flush logic (patch 5): more relaxed on native
* Syncing once on scatter-gather operations (patch 6)

v1->v2:
* Rebase on v5.13-rc3

Nadav Amit (5):
  iommu/amd: Selective flush on unmap
  iommu/amd: Do not use flush-queue when NpCache is on
  iommu: Factor iommu_iotlb_gather_is_disjoint() out
  iommu/amd: Tailored gather logic for AMD
  iommu/amd: Sync once for scatter-gather operations

Robin Murphy (1):
  iommu: Improve iommu_iotlb_gather helpers

 drivers/iommu/amd/init.c  |  7 +++-
 drivers/iommu/amd/iommu.c | 72 ++++++++++++++++++++++++++++++---
 drivers/iommu/mtk_iommu.c |  5 +--
 include/linux/iommu.h     | 84 +++++++++++++++++++++++++++++++++------
 4 files changed, 145 insertions(+), 23 deletions(-)

-- 
2.25.1



* [PATCH v3 1/6] iommu/amd: Selective flush on unmap
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on Nadav Amit
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

A recent patch attempted to enable selective page flushes on the AMD
IOMMU but neglected to adapt amd_iommu_iotlb_sync() to use the selective
flushes.

Adapt amd_iommu_iotlb_sync() to use selective flushes and change
amd_iommu_unmap() to collect the flushes. As a defensive measure, to
avoid potential issues like those that the Intel IOMMU driver encountered
recently, flush the page-walk caches by always setting the "pde"
parameter. This can be removed later.
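
For reference, the core unmap path that drives these callbacks looks
roughly as follows (simplified from drivers/iommu/iommu.c, shown here
only as an illustration):

	size_t iommu_unmap(struct iommu_domain *domain,
			   unsigned long iova, size_t size)
	{
		struct iommu_iotlb_gather iotlb_gather;
		size_t ret;

		iommu_iotlb_gather_init(&iotlb_gather);
		/* The driver's ->unmap() should add the range to the gather. */
		ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
		/* The driver's ->iotlb_sync() should flush the gathered range. */
		iommu_iotlb_sync(domain, &iotlb_gather);

		return ret;
	}

Unless amd_iommu_unmap() records the unmapped range in the gather and
amd_iommu_iotlb_sync() flushes that range, the driver keeps falling back
to flushing the whole domain on every unmap.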

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/iommu/amd/iommu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..3e40f6610b6a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2059,12 +2059,17 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 {
 	struct protection_domain *domain = to_pdomain(dom);
 	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
+	size_t r;
 
 	if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
 	    (domain->iop.mode == PAGE_MODE_NONE))
 		return 0;
 
-	return (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
+	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
+
+	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
+
+	return r;
 }
 
 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
@@ -2167,7 +2172,13 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
 static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 				 struct iommu_iotlb_gather *gather)
 {
-	amd_iommu_flush_iotlb_all(domain);
+	struct protection_domain *dom = to_pdomain(domain);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dom->lock, flags);
+	__domain_flush_pages(dom, gather->start, gather->end - gather->start, 1);
+	amd_iommu_domain_flush_complete(dom);
+	spin_unlock_irqrestore(&dom->lock, flags);
 }
 
 static int amd_iommu_def_domain_type(struct device *dev)
-- 
2.25.1



* [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 1/6] iommu/amd: Selective flush on unmap Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-15 13:08   ` Robin Murphy
  2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

Do not use the flush-queue in virtualized environments, where the NpCache
capability of the IOMMU is set. This is required to reduce
virtualization overheads.

This change follows a similar change to Intel's VT-d; a detailed
explanation of the rationale is given in commit 29b32839725f
("iommu/vt-d: Do not use flush-queue when caching-mode is on").

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/iommu/amd/init.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d006724f4dc2..ba3b76ed776d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 	if (ret)
 		return ret;
 
-	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
+	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
+		if (!amd_iommu_unmap_flush)
+			pr_warn_once("IOMMU batching is disabled due to virtualization");
+
 		amd_iommu_np_cache = true;
+		amd_iommu_unmap_flush = true;
+	}
 
 	init_iommu_perf_ctr(iommu);
 
-- 
2.25.1



* [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 1/6] iommu/amd: Selective flush on unmap Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-11 13:50   ` Will Deacon
                     ` (2 more replies)
  2021-06-07 18:25 ` [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out Nadav Amit
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Robin Murphy, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel

From: Robin Murphy <robin.murphy@arm.com>

The Mediatek driver is not the only one which might want a basic
address-based gathering behaviour, so although it's arguably simple
enough to open-code, let's factor it out for the sake of cleanliness.
Let's also take this opportunity to document the intent of these
helpers for clarity.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>

---

Changes from Robin's version:
* Added iommu_iotlb_gather_add_range() stub for !CONFIG_IOMMU_API
* Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page()
---
 drivers/iommu/mtk_iommu.c |  5 +----
 include/linux/iommu.h     | 43 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index e06b8a0e2b56..0af4a91ac7da 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -523,10 +523,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 	unsigned long end = iova + size - 1;
 
-	if (gather->start > iova)
-		gather->start = iova;
-	if (gather->end < end)
-		gather->end = end;
+	iommu_iotlb_gather_add_range(gather, iova, size);
 	return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..f254c62f3720 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -497,6 +497,38 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
 	iommu_iotlb_gather_init(iotlb_gather);
 }
 
+/**
+ * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to build arbitrarily-sized invalidation commands
+ * where only the address range matters, and simply minimising intermediate
+ * syncs is preferred.
+ */
+static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
+						unsigned long iova, size_t size)
+{
+	unsigned long end = iova + size - 1;
+
+	if (gather->start > iova)
+		gather->start = iova;
+	if (gather->end < end)
+		gather->end = end;
+}
+
+/**
+ * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
+ * @domain: IOMMU domain to be invalidated
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to build invalidation commands based on individual
+ * pages, or with page size/table level hints which cannot be gathered if they
+ * differ.
+ */
 static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 					       struct iommu_iotlb_gather *gather,
 					       unsigned long iova, size_t size)
@@ -515,11 +547,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 		gather->pgsize = size;
 	}
 
-	if (gather->end < end)
-		gather->end = end;
-
-	if (gather->start > start)
-		gather->start = start;
+	iommu_iotlb_gather_add_range(gather, iova, size);
 }
 
 /* PCI device grouping function */
@@ -702,6 +730,11 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
 {
 }
 
+static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
+						unsigned long iova, size_t size)
+{
+}
+
 static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
 	return 0;
-- 
2.25.1



* [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
                   ` (2 preceding siblings ...)
  2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-11 13:57   ` Will Deacon
  2021-06-07 18:25 ` [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD Nadav Amit
  2021-06-07 18:25 ` [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations Nadav Amit
  5 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

Refactor iommu_iotlb_gather_add_page() and factor out the logic that
detects whether the IOTLB gather range and a new range are disjoint. To
be used by the next patch that implements different gathering logic for
AMD.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f254c62f3720..b5a2bfc68fb0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
 	iommu_iotlb_gather_init(iotlb_gather);
 }
 
+/**
+ * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
+ *
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to check whether a new range is and the gathered
+ * range are disjoint. For many IOMMUs, flushing the IOMMU in this case is
+ * better than merging the two, which might lead to unnecessary invalidations.
+ */
+static inline
+bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
+				    unsigned long iova, size_t size)
+{
+	unsigned long start = iova, end = start + size - 1;
+
+	return gather->end != 0 &&
+		(end + 1 < gather->start || start > gather->end + 1);
+}
+
+
 /**
  * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
  * @gather: TLB gather data
@@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 					       struct iommu_iotlb_gather *gather,
 					       unsigned long iova, size_t size)
 {
-	unsigned long start = iova, end = start + size - 1;
-
 	/*
 	 * If the new page is disjoint from the current range or is mapped at
 	 * a different granularity, then sync the TLB so that the gather
 	 * structure can be rewritten.
 	 */
-	if (gather->pgsize != size ||
-	    end + 1 < gather->start || start > gather->end + 1) {
-		if (gather->pgsize)
-			iommu_iotlb_sync(domain, gather);
-		gather->pgsize = size;
-	}
+	if ((gather->pgsize && gather->pgsize != size) ||
+	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
+		iommu_iotlb_sync(domain, gather);
 
+	gather->pgsize = size;
 	iommu_iotlb_gather_add_range(gather, iova, size);
 }
 
@@ -730,6 +748,13 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
 {
 }
 
+static inline
+bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
+				    unsigned long iova, size_t size)
+{
+	return false;
+}
+
 static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
 						unsigned long iova, size_t size)
 {
-- 
2.25.1



* [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
                   ` (3 preceding siblings ...)
  2021-06-07 18:25 ` [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-15 12:55   ` Robin Murphy
  2021-06-07 18:25 ` [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations Nadav Amit
  5 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

AMD's IOMMU can flush any range efficiently (i.e., in a single flush).
This is in contrast, for instance, to Intel IOMMUs that have a limit on
the number of pages that can be flushed in a single flush. In addition,
AMD's IOMMU does not care about the page size, so changes of the page
size do not need to trigger a TLB flush.

So in most cases, a TLB flush due to disjoint ranges or page-size
changes is not needed on AMD. Yet, vIOMMUs require the hypervisor to
synchronize the virtualized IOMMU's PTEs with the physical ones. This
process induces overhead, so it is better not to cause unnecessary
flushes, i.e., flushes of PTEs that were not modified.

Implement amd_iommu_iotlb_gather_add_page() and use it instead of the
generic iommu_iotlb_gather_add_page(). Ignore page-size changes and
disjoint regions unless the "non-present cache" feature is reported by
the IOMMU capabilities; its absence is an indication that we are running
on a physical IOMMU. A similar indication is used by VT-d (see "caching
mode"). The new logic retains the same flushing behavior that we had
before the introduction of page-selective IOTLB flushes for AMD.

In virtualized environments, check whether the newly flushed region and
the gathered one are disjoint, and flush if they are. Also check whether
the new region would cause an IOTLB invalidation of a large region that
would include unmodified PTEs. The latter check is done according to the
"order" of the IOTLB flush.
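
As a rough illustration of the order-based check (hypothetical addresses,
not taken from a real workload):

	/*
	 * The gather currently covers [0x12000, 0x13fff]:
	 *     msb_diff     = fls64(0x12000 ^ 0x13fff) = fls64(0x1fff) = 13
	 * Adding the adjacent page [0x14000, 0x14fff] merges the range into
	 * [0x12000, 0x14fff]:
	 *     new_msb_diff = fls64(0x12000 ^ 0x14fff) = fls64(0x6fff) = 15
	 * Since 15 > 13 + 1, flushing the merged range in one command would
	 * also invalidate addresses that were never unmapped, so the gathered
	 * range is flushed first and gathering restarts from the new page.
	 */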

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/iommu/amd/iommu.c | 44 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3e40f6610b6a..128f2e889ced 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2053,6 +2053,48 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	return ret;
 }
 
+static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
+					    struct iommu_iotlb_gather *gather,
+					    unsigned long iova, size_t size)
+{
+	/*
+	 * AMD's IOMMU can flush as many pages as necessary in a single flush.
+	 * Unless we run in a virtual machine, which can be inferred according
+	 * to whether "non-present cache" is on, it is probably best to prefer
+	 * (potentially) too extensive TLB flushing (i.e., more misses) over
+	 * mutliple TLB flushes (i.e., more flushes). For virtual machines the
+	 * hypervisor needs to synchronize the host IOMMU PTEs with those of
+	 * the guest, and the trade-off is different: unnecessary TLB flushes
+	 * should be avoided.
+	 */
+	if (amd_iommu_np_cache && gather->end != 0) {
+		unsigned long start = iova, end = start + size - 1;
+
+		if (iommu_iotlb_gather_is_disjoint(gather, iova, size)) {
+			/*
+			 * If the new page is disjoint from the current range,
+			 * flush.
+			 */
+			iommu_iotlb_sync(domain, gather);
+		} else {
+			/*
+			 * If the order of TLB flushes increases by more than
+			 * 1, it means that we would have to flush PTEs that
+			 * were not modified. In this case, flush.
+			 */
+			unsigned long new_start = min(gather->start, start);
+			unsigned long new_end = max(gather->end, end);
+			int msb_diff = fls64(gather->end ^ gather->start);
+			int new_msb_diff = fls64(new_end ^ new_start);
+
+			if (new_msb_diff > msb_diff + 1)
+				iommu_iotlb_sync(domain, gather);
+		}
+	}
+
+	iommu_iotlb_gather_add_range(gather, iova, size);
+}
+
 static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 			      size_t page_size,
 			      struct iommu_iotlb_gather *gather)
@@ -2067,7 +2109,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 
 	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
 
-	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
+	amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
 
 	return r;
 }
-- 
2.25.1



* [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations
  2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
                   ` (4 preceding siblings ...)
  2021-06-07 18:25 ` [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD Nadav Amit
@ 2021-06-07 18:25 ` Nadav Amit
  2021-06-15 11:25   ` Robin Murphy
  5 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-07 18:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

From: Nadav Amit <namit@vmware.com>

On virtual machines, software must flush the IOTLB after each page table
entry update.

The iommu_map_sg() code iterates through the given scatter-gather list
and invokes iommu_map() for each element in the scatter-gather list,
which calls into the vendor IOMMU driver through an iommu_ops callback.
As a result, a single sg mapping may lead to multiple IOTLB flushes.

Fix this by adding an amd_iommu_iotlb_sync_map() callback and flushing at
this point, after all sg mappings have been set.

This commit follows and is inspired by commit 933fcd01e97e2
("iommu/vt-d: Add iotlb_sync_map callback").

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Jiajun Cao <caojiajun@vmware.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 drivers/iommu/amd/iommu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 128f2e889ced..dd23566f1db8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2027,6 +2027,16 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
 	return ret;
 }
 
+static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
+				     unsigned long iova, size_t size)
+{
+	struct protection_domain *domain = to_pdomain(dom);
+	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
+
+	if (ops->map)
+		domain_flush_np_cache(domain, iova, size);
+}
+
 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 			 phys_addr_t paddr, size_t page_size, int iommu_prot,
 			 gfp_t gfp)
@@ -2045,10 +2055,8 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;
 
-	if (ops->map) {
+	if (ops->map)
 		ret = ops->map(ops, iova, paddr, page_size, prot, gfp);
-		domain_flush_np_cache(domain, iova, page_size);
-	}
 
 	return ret;
 }
@@ -2249,6 +2257,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.attach_dev = amd_iommu_attach_device,
 	.detach_dev = amd_iommu_detach_device,
 	.map = amd_iommu_map,
+	.iotlb_sync_map	= amd_iommu_iotlb_sync_map,
 	.unmap = amd_iommu_unmap,
 	.iova_to_phys = amd_iommu_iova_to_phys,
 	.probe_device = amd_iommu_probe_device,
-- 
2.25.1



* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
@ 2021-06-11 13:50   ` Will Deacon
  2021-06-15 10:42   ` Robin Murphy
  2021-06-15 12:29   ` Yong Wu
  2 siblings, 0 replies; 26+ messages in thread
From: Will Deacon @ 2021-06-11 13:50 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Joerg Roedel, Robin Murphy, Jiajun Cao, Lu Baolu, iommu, linux-kernel

On Mon, Jun 07, 2021 at 11:25:38AM -0700, Nadav Amit wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> The Mediatek driver is not the only one which might want a basic
> address-based gathering behaviour, so although it's arguably simple
> enough to open-code, let's factor it out for the sake of cleanliness.
> Let's also take this opportunity to document the intent of these
> helpers for clarity.
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> ---
> 
> Changes from Robin's version:
> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
> * Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page()
> ---
>  drivers/iommu/mtk_iommu.c |  5 +----
>  include/linux/iommu.h     | 43 ++++++++++++++++++++++++++++++++++-----
>  2 files changed, 39 insertions(+), 9 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will


* Re: [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out
  2021-06-07 18:25 ` [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out Nadav Amit
@ 2021-06-11 13:57   ` Will Deacon
  2021-06-11 16:50     ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Will Deacon @ 2021-06-11 13:57 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Joerg Roedel, Nadav Amit, Jiajun Cao, Robin Murphy, Lu Baolu,
	iommu, linux-kernel

On Mon, Jun 07, 2021 at 11:25:39AM -0700, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> Refactor iommu_iotlb_gather_add_page() and factor out the logic that
> detects whether IOTLB gather range and a new range are disjoint. To be
> used by the next patch that implements different gathering logic for
> AMD.
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 33 insertions(+), 8 deletions(-)

[...]

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index f254c62f3720..b5a2bfc68fb0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
>  	iommu_iotlb_gather_init(iotlb_gather);
>  }
>  
> +/**
> + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
> + *
> + * @gather: TLB gather data
> + * @iova: start of page to invalidate
> + * @size: size of page to invalidate
> + *
> + * Helper for IOMMU drivers to check whether a new range is and the gathered
> + * range are disjoint. 

I can't quite parse this. Delete the "is"?

>     For many IOMMUs, flushing the IOMMU in this case is
> + * better than merging the two, which might lead to unnecessary invalidations.
> + */
> +static inline
> +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
> +				    unsigned long iova, size_t size)
> +{
> +	unsigned long start = iova, end = start + size - 1;
> +
> +	return gather->end != 0 &&
> +		(end + 1 < gather->start || start > gather->end + 1);
> +}
> +
> +
>  /**
>   * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
>   * @gather: TLB gather data
> @@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>  					       struct iommu_iotlb_gather *gather,
>  					       unsigned long iova, size_t size)
>  {
> -	unsigned long start = iova, end = start + size - 1;
> -
>  	/*
>  	 * If the new page is disjoint from the current range or is mapped at
>  	 * a different granularity, then sync the TLB so that the gather
>  	 * structure can be rewritten.
>  	 */
> -	if (gather->pgsize != size ||
> -	    end + 1 < gather->start || start > gather->end + 1) {
> -		if (gather->pgsize)
> -			iommu_iotlb_sync(domain, gather);
> -		gather->pgsize = size;
> -	}
> +	if ((gather->pgsize && gather->pgsize != size) ||
> +	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
> +		iommu_iotlb_sync(domain, gather);
>  
> +	gather->pgsize = size;

Why have you made this unconditional? I think it's ok, but just not sure
if it's necessary or not.

Will


* Re: [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out
  2021-06-11 13:57   ` Will Deacon
@ 2021-06-11 16:50     ` Nadav Amit
  2021-06-15 10:29       ` Will Deacon
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-11 16:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, Jiajun Cao, Robin Murphy, Lu Baolu, iommu, linux-kernel


> On Jun 11, 2021, at 6:57 AM, Will Deacon <will@kernel.org> wrote:
> 
> On Mon, Jun 07, 2021 at 11:25:39AM -0700, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Refactor iommu_iotlb_gather_add_page() and factor out the logic that
>> detects whether IOTLB gather range and a new range are disjoint. To be
>> used by the next patch that implements different gathering logic for
>> AMD.
>> 
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Jiajun Cao <caojiajun@vmware.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++--------
>> 1 file changed, 33 insertions(+), 8 deletions(-)
> 
> [...]
> 
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index f254c62f3720..b5a2bfc68fb0 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
>> 	iommu_iotlb_gather_init(iotlb_gather);
>> }
>> 
>> +/**
>> + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
>> + *
>> + * @gather: TLB gather data
>> + * @iova: start of page to invalidate
>> + * @size: size of page to invalidate
>> + *
>> + * Helper for IOMMU drivers to check whether a new range is and the gathered
>> + * range are disjoint.
> 
> I can't quite parse this. Delete the "is"?

Indeed. Will do (I mean I will do ;-) )

> 
>>    For many IOMMUs, flushing the IOMMU in this case is
>> + * better than merging the two, which might lead to unnecessary invalidations.
>> + */
>> +static inline
>> +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
>> +				    unsigned long iova, size_t size)
>> +{
>> +	unsigned long start = iova, end = start + size - 1;
>> +
>> +	return gather->end != 0 &&
>> +		(end + 1 < gather->start || start > gather->end + 1);
>> +}
>> +
>> +
>> /**
>>  * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
>>  * @gather: TLB gather data
>> @@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>> 					       struct iommu_iotlb_gather *gather,
>> 					       unsigned long iova, size_t size)
>> {
>> -	unsigned long start = iova, end = start + size - 1;
>> -
>> 	/*
>> 	 * If the new page is disjoint from the current range or is mapped at
>> 	 * a different granularity, then sync the TLB so that the gather
>> 	 * structure can be rewritten.
>> 	 */
>> -	if (gather->pgsize != size ||
>> -	    end + 1 < gather->start || start > gather->end + 1) {
>> -		if (gather->pgsize)
>> -			iommu_iotlb_sync(domain, gather);
>> -		gather->pgsize = size;
>> -	}
>> +	if ((gather->pgsize && gather->pgsize != size) ||
>> +	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
>> +		iommu_iotlb_sync(domain, gather);
>> 
>> +	gather->pgsize = size;
> 
> Why have you made this unconditional? I think it's ok, but just not sure
> if it's necessary or not.

In regard to gather->pgsize, this function had (and has) an
invariant, in which gather->pgsize always represents the flushing
granularity of its range. Arguably, “size" should never be
zero, but let's assume for the sake of discussion that it might.

If “size” equals to “gather->pgsize”, then the assignment in
question has no impact.

Otherwise, if “size” is non-zero, then iommu_iotlb_sync() would
initialize the size and range (see iommu_iotlb_gather_init()),
and the invariant is kept.

Otherwise, “size” is zero, and “gather” already holds a range,
so gather->pgsize is non-zero and
(gather->pgsize && gather->pgsize != size) is true. Therefore,
again, iommu_iotlb_sync() would be called and initialize the
size.

I think that this change makes the code much simpler to read.
It probably has no performance impact as “gather” is probably
cached and anyhow accessed shortly after.

If anything, I can add a VM_BUG_ON() to check “size” is
non-zero, although this code seems correct regardless of that.
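
To illustrate, here is a minimal sketch of the sequence (hypothetical
call sites and addresses):

	/*
	 * First call: the gather was just initialized, so pgsize == 0.
	 * No sync happens; pgsize becomes SZ_4K and the range covers the page.
	 */
	iommu_iotlb_gather_add_page(domain, &gather, iova, SZ_4K);

	/*
	 * Second call with a different granularity: pgsize (4K) != size (2M),
	 * so iommu_iotlb_sync() runs and re-initializes the gather; the
	 * unconditional assignment then records 2M for the new, empty range,
	 * which is exactly what the old assignment inside the if-block did.
	 */
	iommu_iotlb_gather_add_page(domain, &gather, iova2, SZ_2M);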




* Re: [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out
  2021-06-11 16:50     ` Nadav Amit
@ 2021-06-15 10:29       ` Will Deacon
  2021-06-15 18:54         ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Will Deacon @ 2021-06-15 10:29 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Joerg Roedel, Jiajun Cao, Robin Murphy, Lu Baolu, iommu, linux-kernel

On Fri, Jun 11, 2021 at 09:50:31AM -0700, Nadav Amit wrote:
> 
> 
> > On Jun 11, 2021, at 6:57 AM, Will Deacon <will@kernel.org> wrote:
> > 
> > On Mon, Jun 07, 2021 at 11:25:39AM -0700, Nadav Amit wrote:
> >> From: Nadav Amit <namit@vmware.com>
> >> 
> >> Refactor iommu_iotlb_gather_add_page() and factor out the logic that
> >> detects whether IOTLB gather range and a new range are disjoint. To be
> >> used by the next patch that implements different gathering logic for
> >> AMD.
> >> 
> >> Cc: Joerg Roedel <joro@8bytes.org>
> >> Cc: Will Deacon <will@kernel.org>
> >> Cc: Jiajun Cao <caojiajun@vmware.com>
> >> Cc: Robin Murphy <robin.murphy@arm.com>
> >> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> >> Cc: iommu@lists.linux-foundation.org
> >> Cc: linux-kernel@vger.kernel.org>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >> include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 33 insertions(+), 8 deletions(-)
> > 
> > [...]
> > 
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index f254c62f3720..b5a2bfc68fb0 100644
> >> --- a/include/linux/iommu.h
> >> +++ b/include/linux/iommu.h
> >> @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
> >> 	iommu_iotlb_gather_init(iotlb_gather);
> >> }
> >> 
> >> +/**
> >> + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
> >> + *
> >> + * @gather: TLB gather data
> >> + * @iova: start of page to invalidate
> >> + * @size: size of page to invalidate
> >> + *
> >> + * Helper for IOMMU drivers to check whether a new range is and the gathered
> >> + * range are disjoint.
> > 
> > I can't quite parse this. Delete the "is"?
> 
> Indeed. Will do (I mean I will do ;-) )
> 
> > 
> >>    For many IOMMUs, flushing the IOMMU in this case is
> >> + * better than merging the two, which might lead to unnecessary invalidations.
> >> + */
> >> +static inline
> >> +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
> >> +				    unsigned long iova, size_t size)
> >> +{
> >> +	unsigned long start = iova, end = start + size - 1;
> >> +
> >> +	return gather->end != 0 &&
> >> +		(end + 1 < gather->start || start > gather->end + 1);
> >> +}
> >> +
> >> +
> >> /**
> >>  * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
> >>  * @gather: TLB gather data
> >> @@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> >> 					       struct iommu_iotlb_gather *gather,
> >> 					       unsigned long iova, size_t size)
> >> {
> >> -	unsigned long start = iova, end = start + size - 1;
> >> -
> >> 	/*
> >> 	 * If the new page is disjoint from the current range or is mapped at
> >> 	 * a different granularity, then sync the TLB so that the gather
> >> 	 * structure can be rewritten.
> >> 	 */
> >> -	if (gather->pgsize != size ||
> >> -	    end + 1 < gather->start || start > gather->end + 1) {
> >> -		if (gather->pgsize)
> >> -			iommu_iotlb_sync(domain, gather);
> >> -		gather->pgsize = size;
> >> -	}
> >> +	if ((gather->pgsize && gather->pgsize != size) ||
> >> +	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
> >> +		iommu_iotlb_sync(domain, gather);
> >> 
> >> +	gather->pgsize = size;
> > 
> > Why have you made this unconditional? I think it's ok, but just not sure
> > if it's necessary or not.
> 
> In regard to gather->pgsize, this function had (and has) an
> invariant, in which gather->pgsize always represents the flushing
> granularity of its range. Arguably, “size" should never be
> zero, but lets assume for the matter of discussion that it might.
> 
> If “size” equals to “gather->pgsize”, then the assignment in
> question has no impact.
> 
> Otherwise, if “size” is non-zero, then iommu_iotlb_sync() would
> initialize the size and range (see iommu_iotlb_gather_init()),
> and the invariant is kept.
> 
> Otherwise, “size” is zero, and “gather” already holds a range,
> so gather->pgsize is non-zero and
> (gather->pgsize && gather->pgsize != size) is true. Therefore,
> again, iommu_iotlb_sync() would be called and initialize the
> size.
> 
> I think that this change makes the code much simpler to read.
> It probably has no performance impact as “gather” is probably
> cached and anyhow accessed shortly after.

Thanks. I was just interested in whether it had a functional impact (I don't
think it does) or whether it was just cleanup.

With the updated comment:

Acked-by: Will Deacon <will@kernel.org>

Will


* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
  2021-06-11 13:50   ` Will Deacon
@ 2021-06-15 10:42   ` Robin Murphy
  2021-06-15 19:05     ` Nadav Amit
  2021-06-15 12:29   ` Yong Wu
  2 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 10:42 UTC (permalink / raw)
  To: Nadav Amit, Joerg Roedel
  Cc: Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel

On 2021-06-07 19:25, Nadav Amit wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> The Mediatek driver is not the only one which might want a basic
> address-based gathering behaviour, so although it's arguably simple
> enough to open-code, let's factor it out for the sake of cleanliness.
> Let's also take this opportunity to document the intent of these
> helpers for clarity.
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Nit: missing your signoff.

> ---
> 
> Changes from Robin's version:
> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API

Out of curiosity, is there any config in which a stub is actually 
needed? Unlike iommu_iotlb_gather_init(), I would have thought that 
these helpers should only ever be called by driver code which already 
depends on IOMMU_API.

Thanks,
Robin.

> * Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page()
> ---
>   drivers/iommu/mtk_iommu.c |  5 +----
>   include/linux/iommu.h     | 43 ++++++++++++++++++++++++++++++++++-----
>   2 files changed, 39 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index e06b8a0e2b56..0af4a91ac7da 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -523,10 +523,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>   	unsigned long end = iova + size - 1;
>   
> -	if (gather->start > iova)
> -		gather->start = iova;
> -	if (gather->end < end)
> -		gather->end = end;
> +	iommu_iotlb_gather_add_range(gather, iova, size);
>   	return dom->iop->unmap(dom->iop, iova, size, gather);
>   }
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..f254c62f3720 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -497,6 +497,38 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
>   	iommu_iotlb_gather_init(iotlb_gather);
>   }
>   
> +/**
> + * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
> + * @gather: TLB gather data
> + * @iova: start of page to invalidate
> + * @size: size of page to invalidate
> + *
> + * Helper for IOMMU drivers to build arbitrarily-sized invalidation commands
> + * where only the address range matters, and simply minimising intermediate
> + * syncs is preferred.
> + */
> +static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
> +						unsigned long iova, size_t size)
> +{
> +	unsigned long end = iova + size - 1;
> +
> +	if (gather->start > iova)
> +		gather->start = iova;
> +	if (gather->end < end)
> +		gather->end = end;
> +}
> +
> +/**
> + * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
> + * @domain: IOMMU domain to be invalidated
> + * @gather: TLB gather data
> + * @iova: start of page to invalidate
> + * @size: size of page to invalidate
> + *
> + * Helper for IOMMU drivers to build invalidation commands based on individual
> + * pages, or with page size/table level hints which cannot be gathered if they
> + * differ.
> + */
>   static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>   					       struct iommu_iotlb_gather *gather,
>   					       unsigned long iova, size_t size)
> @@ -515,11 +547,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>   		gather->pgsize = size;
>   	}
>   
> -	if (gather->end < end)
> -		gather->end = end;
> -
> -	if (gather->start > start)
> -		gather->start = start;
> +	iommu_iotlb_gather_add_range(gather, iova, size);
>   }
>   
>   /* PCI device grouping function */
> @@ -702,6 +730,11 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
>   {
>   }
>   
> +static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
> +						unsigned long iova, size_t size)
> +{
> +}
> +
>   static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
>   {
>   	return 0;
> 


* Re: [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations
  2021-06-07 18:25 ` [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations Nadav Amit
@ 2021-06-15 11:25   ` Robin Murphy
  2021-06-15 18:51     ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 11:25 UTC (permalink / raw)
  To: Nadav Amit, Joerg Roedel
  Cc: linux-kernel, iommu, Nadav Amit, Jiajun Cao, Will Deacon

On 2021-06-07 19:25, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> On virtual machines, software must flush the IOTLB after each page table
> entry update.
> 
> The iommu_map_sg() code iterates through the given scatter-gather list
> and invokes iommu_map() for each element in the scatter-gather list,
> which calls into the vendor IOMMU driver through iommu_ops callback. As
> the result, a single sg mapping may lead to multiple IOTLB flushes.
> 
> Fix this by adding amd_iotlb_sync_map() callback and flushing at this
> point after all sg mappings we set.
> 
> This commit is followed and inspired by commit 933fcd01e97e2
> ("iommu/vt-d: Add iotlb_sync_map callback").
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>   drivers/iommu/amd/iommu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 128f2e889ced..dd23566f1db8 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2027,6 +2027,16 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
>   	return ret;
>   }
>   
> +static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
> +				     unsigned long iova, size_t size)
> +{
> +	struct protection_domain *domain = to_pdomain(dom);
> +	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
> +
> +	if (ops->map)

Not too critical since you're only moving existing code around, but is 
ops->map ever not set? Either way the check ends up looking rather 
out-of-place here :/

It's not very clear what the original intent was - I do wonder whether 
it's supposed to be related to PAGE_MODE_NONE, but given that 
amd_iommu_map() has an explicit check and errors out early in that case, 
we'd never get here anyway. Possibly something to come back and clean up 
later?

Robin.

> +		domain_flush_np_cache(domain, iova, size);
> +}
> +
>   static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>   			 phys_addr_t paddr, size_t page_size, int iommu_prot,
>   			 gfp_t gfp)
> @@ -2045,10 +2055,8 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>   	if (iommu_prot & IOMMU_WRITE)
>   		prot |= IOMMU_PROT_IW;
>   
> -	if (ops->map) {
> +	if (ops->map)
>   		ret = ops->map(ops, iova, paddr, page_size, prot, gfp);
> -		domain_flush_np_cache(domain, iova, page_size);
> -	}
>   
>   	return ret;
>   }
> @@ -2249,6 +2257,7 @@ const struct iommu_ops amd_iommu_ops = {
>   	.attach_dev = amd_iommu_attach_device,
>   	.detach_dev = amd_iommu_detach_device,
>   	.map = amd_iommu_map,
> +	.iotlb_sync_map	= amd_iommu_iotlb_sync_map,
>   	.unmap = amd_iommu_unmap,
>   	.iova_to_phys = amd_iommu_iova_to_phys,
>   	.probe_device = amd_iommu_probe_device,
> 


* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
  2021-06-11 13:50   ` Will Deacon
  2021-06-15 10:42   ` Robin Murphy
@ 2021-06-15 12:29   ` Yong Wu
  2021-06-15 12:41     ` Robin Murphy
  2 siblings, 1 reply; 26+ messages in thread
From: Yong Wu @ 2021-06-15 12:29 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Joerg Roedel, Robin Murphy, linux-kernel, iommu, Jiajun Cao, Will Deacon

On Mon, 2021-06-07 at 11:25 -0700, Nadav Amit wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> The Mediatek driver is not the only one which might want a basic
> address-based gathering behaviour, so although it's arguably simple
> enough to open-code, let's factor it out for the sake of cleanliness.
> Let's also take this opportunity to document the intent of these
> helpers for clarity.
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> ---
> 
> Changes from Robin's version:
> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
> * Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page()
> ---
>  drivers/iommu/mtk_iommu.c |  5 +----
>  include/linux/iommu.h     | 43 ++++++++++++++++++++++++++++++++++-----
>  2 files changed, 39 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index e06b8a0e2b56..0af4a91ac7da 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -523,10 +523,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>  	unsigned long end = iova + size - 1;

Please also help delete this "end".

>  
> -	if (gather->start > iova)
> -		gather->start = iova;
> -	if (gather->end < end)
> -		gather->end = end;
> +	iommu_iotlb_gather_add_range(gather, iova, size);
>  	return dom->iop->unmap(dom->iop, iova, size, gather);
>  }

[snip] 




* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-15 12:29   ` Yong Wu
@ 2021-06-15 12:41     ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 12:41 UTC (permalink / raw)
  To: Yong Wu, Nadav Amit; +Cc: linux-kernel, iommu, Jiajun Cao, Will Deacon

On 2021-06-15 13:29, Yong Wu wrote:
> On Mon, 2021-06-07 at 11:25 -0700, Nadav Amit wrote:
>> From: Robin Murphy <robin.murphy@arm.com>
>>
>> The Mediatek driver is not the only one which might want a basic
>> address-based gathering behaviour, so although it's arguably simple
>> enough to open-code, let's factor it out for the sake of cleanliness.
>> Let's also take this opportunity to document the intent of these
>> helpers for clarity.
>>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Jiajun Cao <caojiajun@vmware.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>
>> ---
>>
>> Changes from Robin's version:
>> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
>> * Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page()
>> ---
>>   drivers/iommu/mtk_iommu.c |  5 +----
>>   include/linux/iommu.h     | 43 ++++++++++++++++++++++++++++++++++-----
>>   2 files changed, 39 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index e06b8a0e2b56..0af4a91ac7da 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -523,10 +523,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>>   	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>>   	unsigned long end = iova + size - 1;
> 
> Please also help delete this "end".

Yes, that was part of my original patch - not sure what happened here :/

Robin.

>>   
>> -	if (gather->start > iova)
>> -		gather->start = iova;
>> -	if (gather->end < end)
>> -		gather->end = end;
>> +	iommu_iotlb_gather_add_range(gather, iova, size);
>>   	return dom->iop->unmap(dom->iop, iova, size, gather);
>>   }
> 
> [snip]
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 


* Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
  2021-06-07 18:25 ` [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD Nadav Amit
@ 2021-06-15 12:55   ` Robin Murphy
  2021-06-15 18:14     ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 12:55 UTC (permalink / raw)
  To: Nadav Amit, Joerg Roedel
  Cc: linux-kernel, iommu, Nadav Amit, Jiajun Cao, Will Deacon

On 2021-06-07 19:25, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
> This is in contrast, for instnace, to Intel IOMMUs that have a limit on
> the number of pages that can be flushed in a single flush.  In addition,
> AMD's IOMMU do not care about the page-size, so changes of the page size
> do not need to trigger a TLB flush.
> 
> So in most cases, a TLB flush due to disjoint range or page-size changes
> are not needed for AMD. Yet, vIOMMUs require the hypervisor to
> synchronize the virtualized IOMMU's PTEs with the physical ones. This
> process induce overheads, so it is better not to cause unnecessary
> flushes, i.e., flushes of PTEs that were not modified.
> 
> Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
> of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
> and disjoint regions unless "non-present cache" feature is reported by
> the IOMMU capabilities, as this is an indication we are running on a
> physical IOMMU. A similar indication is used by VT-d (see "caching
> mode"). The new logic retains the same flushing behavior that we had
> before the introduction of page-selective IOTLB flushes for AMD.
> 
> On virtualized environments, check if the newly flushed region and the
> gathered one are disjoint and flush if it is. Also check whether the new
> region would cause IOTLB invalidation of large region that would include
> unmodified PTE. The latter check is done according to the "order" of the
> IOTLB flush.

If it helps,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

I wonder if it might be more effective to defer the alignment-based 
splitting part to amd_iommu_iotlb_sync() itself, but that could be 
investigated as another follow-up.

Robin.

> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>   drivers/iommu/amd/iommu.c | 44 ++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 3e40f6610b6a..128f2e889ced 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2053,6 +2053,48 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>   	return ret;
>   }
>   
> +static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> +					    struct iommu_iotlb_gather *gather,
> +					    unsigned long iova, size_t size)
> +{
> +	/*
> +	 * AMD's IOMMU can flush as many pages as necessary in a single flush.
> +	 * Unless we run in a virtual machine, which can be inferred according
> +	 * to whether "non-present cache" is on, it is probably best to prefer
> +	 * (potentially) too extensive TLB flushing (i.e., more misses) over
> +	 * mutliple TLB flushes (i.e., more flushes). For virtual machines the
> +	 * hypervisor needs to synchronize the host IOMMU PTEs with those of
> +	 * the guest, and the trade-off is different: unnecessary TLB flushes
> +	 * should be avoided.
> +	 */
> +	if (amd_iommu_np_cache && gather->end != 0) {
> +		unsigned long start = iova, end = start + size - 1;
> +
> +		if (iommu_iotlb_gather_is_disjoint(gather, iova, size)) {
> +			/*
> +			 * If the new page is disjoint from the current range,
> +			 * flush.
> +			 */
> +			iommu_iotlb_sync(domain, gather);
> +		} else {
> +			/*
> +			 * If the order of TLB flushes increases by more than
> +			 * 1, it means that we would have to flush PTEs that
> +			 * were not modified. In this case, flush.
> +			 */
> +			unsigned long new_start = min(gather->start, start);
> +			unsigned long new_end = max(gather->end, end);
> +			int msb_diff = fls64(gather->end ^ gather->start);
> +			int new_msb_diff = fls64(new_end ^ new_start);
> +
> +			if (new_msb_diff > msb_diff + 1)
> +				iommu_iotlb_sync(domain, gather);
> +		}
> +	}
> +
> +	iommu_iotlb_gather_add_range(gather, iova, size);
> +}
> +
>   static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>   			      size_t page_size,
>   			      struct iommu_iotlb_gather *gather)
> @@ -2067,7 +2109,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>   
>   	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
>   
> -	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
> +	amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
>   
>   	return r;
>   }
> 


* Re: [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on
  2021-06-07 18:25 ` [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on Nadav Amit
@ 2021-06-15 13:08   ` Robin Murphy
  2021-06-15 18:26     ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 13:08 UTC (permalink / raw)
  To: Nadav Amit, Joerg Roedel
  Cc: Nadav Amit, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel

On 2021-06-07 19:25, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> Do not use flush-queue on virtualized environments, where the NpCache
> capability of the IOMMU is set. This is required to reduce
> virtualization overheads.
> 
> This change follows a similar change to Intel's VT-d and a detailed
> explanation of the rationale is described in commit 29b32839725f
> ("iommu/vt-d: Do not use flush-queue when caching-mode is on").
> 
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jiajun Cao <caojiajun@vmware.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>   drivers/iommu/amd/init.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index d006724f4dc2..ba3b76ed776d 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
>   	if (ret)
>   		return ret;
>   
> -	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
> +	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
> +		if (!amd_iommu_unmap_flush)
> +			pr_warn_once("IOMMU batching is disabled due to virtualization");

Nit: you can just use pr_warn() (or arguably pr_info()) since the 
explicit conditions already only match once. Speaking of which, it might 
be better to use amd_iommu_np_cache instead, since other patches are 
planning to clean up the last remnants of amd_iommu_unmap_flush.

Robin.

> +
>   		amd_iommu_np_cache = true;
> +		amd_iommu_unmap_flush = true;
> +	}
>   
>   	init_iommu_perf_ctr(iommu);
>   
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
  2021-06-15 12:55   ` Robin Murphy
@ 2021-06-15 18:14     ` Nadav Amit
  2021-06-15 19:20       ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 18:14 UTC (permalink / raw)
  To: Robin Murphy; +Cc: Joerg Roedel, LKML, iommu, Jiajun Cao, Will Deacon



> On Jun 15, 2021, at 5:55 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
>> This is in contrast, for instance, to Intel IOMMUs that have a limit on
>> the number of pages that can be flushed in a single flush.  In addition,
>> AMD's IOMMU does not care about the page-size, so changes of the page size
>> do not need to trigger a TLB flush.
>> So in most cases, a TLB flush due to disjoint ranges or page-size changes
>> is not needed for AMD. Yet, vIOMMUs require the hypervisor to
>> synchronize the virtualized IOMMU's PTEs with the physical ones. This
>> process induces overheads, so it is better not to cause unnecessary
>> flushes, i.e., flushes of PTEs that were not modified.
>> Implement amd_iommu_iotlb_gather_add_page() and use it instead
>> of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
>> and disjoint regions unless the "non-present cache" feature is reported by
>> the IOMMU capabilities, as this is an indication that we are running on a
>> physical IOMMU. A similar indication is used by VT-d (see "caching
>> mode"). The new logic retains the same flushing behavior that we had
>> before the introduction of page-selective IOTLB flushes for AMD.
>> On virtualized environments, check if the newly flushed region and the
>> gathered one are disjoint and flush if they are. Also check whether the new
>> region would cause IOTLB invalidation of a large region that would include
>> unmodified PTEs. The latter check is done according to the "order" of the
>> IOTLB flush.
> 
> If it helps,
> 
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Thanks!


> I wonder if it might be more effective to defer the alignment-based splitting part to amd_iommu_iotlb_sync() itself, but that could be investigated as another follow-up.

Note that the alignment-based splitting is only used for virtualized AMD IOMMUs, so it has no impact on most users.

Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver does a full IOTLB flush whenever it unmaps more than a single page. So, although your idea makes sense, I do not know exactly how to implement it right now, and regardless it is likely to provide a much smaller performance improvement than avoiding full IOTLB flushes would.

Having said that, if I figure out a way to implement it, I will give it a try (although I am admittedly afraid of complicated logic that might cause subtle, mostly undetectable bugs).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on
  2021-06-15 13:08   ` Robin Murphy
@ 2021-06-15 18:26     ` Nadav Amit
  2021-06-15 19:36       ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 18:26 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel



> On Jun 15, 2021, at 6:08 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> Do not use flush-queue on virtualized environments, where the NpCache
>> capability of the IOMMU is set. This is required to reduce
>> virtualization overheads.
>> This change follows a similar change to Intel's VT-d and a detailed
>> explanation of the rationale is described in commit 29b32839725f
>> ("iommu/vt-d: Do not use flush-queue when caching-mode is on").
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Jiajun Cao <caojiajun@vmware.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>>  drivers/iommu/amd/init.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index d006724f4dc2..ba3b76ed776d 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
>>  	if (ret)
>>  		return ret;
>>  -	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>> +	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
>> +		if (!amd_iommu_unmap_flush)
>> +			pr_warn_once("IOMMU batching is disabled due to virtualization");
> 
> Nit: you can just use pr_warn() (or arguably pr_info()) since the explicit conditions already only match once.

Yes, my bad. I will fix it in the next version.

> Speaking of which, it might be better to use amd_iommu_np_cache instead, since other patches are planning to clean up the last remnants of amd_iommu_unmap_flush.

I would prefer that the other patches (the ones that remove amd_iommu_unmap_flush) address this code as well. I certainly do not want to embed amd_iommu_np_cache deep into the flushing logic. IOW: I don’t know exactly what you have in mind, but I prefer that the code stays clear.

This code follows (copies?) the same pattern+logic as commit 5f3116ea8b5 ("iommu/vt-d: Do not use flush-queue when caching-mode is on"). I see that this code was later changed in commit 53255e545807c ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE"), but that commit did not get rid of intel_iommu_strict, so please allow me to use amd_iommu_unmap_flush.

To remind you/me/whoever: disabling batching due to caching-mode/NP-cache is not inherently needed. It was not needed for quite some time on Intel, but somehow along the way the consolidated flushing code broke it, and now it is needed (without intrusive code changes).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations
  2021-06-15 11:25   ` Robin Murphy
@ 2021-06-15 18:51     ` Nadav Amit
  0 siblings, 0 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 18:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, LKML, iommu, Jiajun Cao, Will Deacon,
	suravee.suthikulpanit



> On Jun 15, 2021, at 4:25 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> On virtual machines, software must flush the IOTLB after each page table
>> entry update.
>> The iommu_map_sg() code iterates through the given scatter-gather list
>> and invokes iommu_map() for each element in the scatter-gather list,
>> which calls into the vendor IOMMU driver through the iommu_ops callback. As
>> a result, a single sg mapping may lead to multiple IOTLB flushes.
>> Fix this by adding an amd_iotlb_sync_map() callback and flushing at this
>> point, after all sg mappings are set.
>> This commit follows and is inspired by commit 933fcd01e97e2
>> ("iommu/vt-d: Add iotlb_sync_map callback").
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Jiajun Cao <caojiajun@vmware.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>>  drivers/iommu/amd/iommu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index 128f2e889ced..dd23566f1db8 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -2027,6 +2027,16 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
>>  	return ret;
>>  }
>>  +static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
>> +				     unsigned long iova, size_t size)
>> +{
>> +	struct protection_domain *domain = to_pdomain(dom);
>> +	struct io_pgtable_ops *ops = &domain->iop.iop.ops;
>> +
>> +	if (ops->map)
> 
> Not too critical since you're only moving existing code around, but is ops->map ever not set? Either way the check ends up looking rather out-of-place here :/
> 
> It's not very clear what the original intent was - I do wonder whether it's supposed to be related to PAGE_MODE_NONE, but given that amd_iommu_map() has an explicit check and errors out early in that case, we'd never get here anyway. Possibly something to come back and clean up later?

[ +Suravee ]

According to what I see in the git log, the checks for ops->map (as well as ops->unmap) were relatively recently introduced by Suravee [1] in preparation for AMD IOMMU v2 page tables [2]. Since I do not know what he plans, I prefer not to touch this code.

[1] https://lore.kernel.org/linux-iommu/20200923101442.73157-13-suravee.suthikulpanit@amd.com/
[2] https://lore.kernel.org/linux-iommu/20200923101442.73157-1-suravee.suthikulpanit@amd.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out
  2021-06-15 10:29       ` Will Deacon
@ 2021-06-15 18:54         ` Nadav Amit
  0 siblings, 0 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 18:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, Jiajun Cao, Robin Murphy, Lu Baolu, iommu, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4886 bytes --]



> On Jun 15, 2021, at 3:29 AM, Will Deacon <will@kernel.org> wrote:
> 
> On Fri, Jun 11, 2021 at 09:50:31AM -0700, Nadav Amit wrote:
>> 
>> 
>>> On Jun 11, 2021, at 6:57 AM, Will Deacon <will@kernel.org> wrote:
>>> 
>>> On Mon, Jun 07, 2021 at 11:25:39AM -0700, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> Refactor iommu_iotlb_gather_add_page() and factor out the logic that
>>>> detects whether the IOTLB gather range and a new range are disjoint. To be
>>>> used by the next patch that implements different gathering logic for
>>>> AMD.
>>>> 
>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Jiajun Cao <caojiajun@vmware.com>
>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Cc: iommu@lists.linux-foundation.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++--------
>>>> 1 file changed, 33 insertions(+), 8 deletions(-)
>>> 
>>> [...]
>>> 
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index f254c62f3720..b5a2bfc68fb0 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
>>>> 	iommu_iotlb_gather_init(iotlb_gather);
>>>> }
>>>> 
>>>> +/**
>>>> + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
>>>> + *
>>>> + * @gather: TLB gather data
>>>> + * @iova: start of page to invalidate
>>>> + * @size: size of page to invalidate
>>>> + *
>>>> + * Helper for IOMMU drivers to check whether a new range is and the gathered
>>>> + * range are disjoint.
>>> 
>>> I can't quite parse this. Delete the "is"?
>> 
>> Indeed. Will do (I mean I will do ;-) )
>> 
>>> 
>>>>   For many IOMMUs, flushing the IOMMU in this case is
>>>> + * better than merging the two, which might lead to unnecessary invalidations.
>>>> + */
>>>> +static inline
>>>> +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
>>>> +				    unsigned long iova, size_t size)
>>>> +{
>>>> +	unsigned long start = iova, end = start + size - 1;
>>>> +
>>>> +	return gather->end != 0 &&
>>>> +		(end + 1 < gather->start || start > gather->end + 1);
>>>> +}
>>>> +
>>>> +
>>>> /**
>>>> * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
>>>> * @gather: TLB gather data
>>>> @@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>>>> 					       struct iommu_iotlb_gather *gather,
>>>> 					       unsigned long iova, size_t size)
>>>> {
>>>> -	unsigned long start = iova, end = start + size - 1;
>>>> -
>>>> 	/*
>>>> 	 * If the new page is disjoint from the current range or is mapped at
>>>> 	 * a different granularity, then sync the TLB so that the gather
>>>> 	 * structure can be rewritten.
>>>> 	 */
>>>> -	if (gather->pgsize != size ||
>>>> -	    end + 1 < gather->start || start > gather->end + 1) {
>>>> -		if (gather->pgsize)
>>>> -			iommu_iotlb_sync(domain, gather);
>>>> -		gather->pgsize = size;
>>>> -	}
>>>> +	if ((gather->pgsize && gather->pgsize != size) ||
>>>> +	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
>>>> +		iommu_iotlb_sync(domain, gather);
>>>> 
>>>> +	gather->pgsize = size;
>>> 
>>> Why have you made this unconditional? I think it's ok, but just not sure
>>> if it's necessary or not.
>> 
>> In regard to gather->pgsize, this function had (and has) an
>> invariant, in which gather->pgsize always represents the flushing
>> granularity of its range. Arguably, “size" should never be
>> zero, but let's assume for the sake of discussion that it might.
>> 
>> If “size” equals to “gather->pgsize”, then the assignment in
>> question has no impact.
>> 
>> Otherwise, if “size” is non-zero, then iommu_iotlb_sync() would
>> initialize the size and range (see iommu_iotlb_gather_init()),
>> and the invariant is kept.
>> 
>> Otherwise, “size” is zero, and “gather” already holds a range,
>> so gather->pgsize is non-zero and
>> (gather->pgsize && gather->pgsize != size) is true. Therefore,
>> again, iommu_iotlb_sync() would be called and initialize the
>> size.
>> 
>> I think that this change makes the code much simpler to read.
>> It probably has no performance impact, as “gather” is most likely
>> cached and accessed shortly afterwards anyway.
> 
> Thanks. I was just interested in whether it had a functional impact (I don't
> think it does) or whether it was just cleanup.
> 
> With the updated comment:
> 
> Acked-by: Will Deacon <will@kernel.org>

Thanks. I will add the explanation to the commit log, but not to the code in order not to inflate it too much.


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
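To make the case analysis above concrete, here is a small user-space model of the simplified helper (the struct, the function names and the scenario in main() are assumptions for illustration; the real helpers live in include/linux/iommu.h):

#include <assert.h>
#include <stddef.h>

struct gather { unsigned long start, end; size_t pgsize; };

/* Models iommu_iotlb_gather_init()/iommu_iotlb_sync(): reset the state. */
static void sync_gather(struct gather *g)
{
	g->start = ~0UL;
	g->end = 0;
	g->pgsize = 0;
}

static int is_disjoint(struct gather *g, unsigned long iova, size_t size)
{
	unsigned long start = iova, end = start + size - 1;

	return g->end != 0 && (end + 1 < g->start || start > g->end + 1);
}

static void add_page(struct gather *g, unsigned long iova, size_t size)
{
	/* Differing granularity or disjoint range: sync, which resets pgsize. */
	if ((g->pgsize && g->pgsize != size) || is_disjoint(g, iova, size))
		sync_gather(g);

	/* Harmless when size == pgsize; (re)establishes the invariant otherwise. */
	g->pgsize = size;
	if (iova < g->start)
		g->start = iova;
	if (iova + size - 1 > g->end)
		g->end = iova + size - 1;
}

int main(void)
{
	struct gather g;

	sync_gather(&g);
	add_page(&g, 0x1000, 0x1000);
	add_page(&g, 0x2000, 0x1000);	/* same pgsize, adjacent: range extends */
	assert(g.pgsize == 0x1000 && g.start == 0x1000 && g.end == 0x2fff);
	add_page(&g, 0x10000, 0x2000);	/* new pgsize and disjoint: synced first */
	assert(g.pgsize == 0x2000 && g.start == 0x10000 && g.end == 0x11fff);
	return 0;
}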

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-15 10:42   ` Robin Murphy
@ 2021-06-15 19:05     ` Nadav Amit
  2021-06-15 19:07       ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 19:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel



> On Jun 15, 2021, at 3:42 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Robin Murphy <robin.murphy@arm.com>
>> The Mediatek driver is not the only one which might want a basic
>> address-based gathering behaviour, so although it's arguably simple
>> enough to open-code, let's factor it out for the sake of cleanliness.
>> Let's also take this opportunity to document the intent of these
>> helpers for clarity.
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Jiajun Cao <caojiajun@vmware.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> Nit: missing your signoff.
> 
>> ---
>> Changes from Robin's version:
>> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
> 
> Out of curiosity, is there any config in which a stub is actually needed? Unlike iommu_iotlb_gather_init(), I would have thought that these helpers should only ever be called by driver code which already depends on IOMMU_API.

Indeed, this was only done as a defensive step.

I will remove it. I see no reason for it. Sorry for ruining your patch.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers
  2021-06-15 19:05     ` Nadav Amit
@ 2021-06-15 19:07       ` Nadav Amit
  0 siblings, 0 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 19:07 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel



> On Jun 15, 2021, at 12:05 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> 
> 
>> On Jun 15, 2021, at 3:42 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>> 
>> On 2021-06-07 19:25, Nadav Amit wrote:
>>> From: Robin Murphy <robin.murphy@arm.com>
>>> The Mediatek driver is not the only one which might want a basic
>>> address-based gathering behaviour, so although it's arguably simple
>>> enough to open-code, let's factor it out for the sake of cleanliness.
>>> Let's also take this opportunity to document the intent of these
>>> helpers for clarity.
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Jiajun Cao <caojiajun@vmware.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>> Cc: iommu@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> 
>> Nit: missing your signoff.
>> 
>>> ---
>>> Changes from Robin's version:
>>> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
>> 
>> Out of curiosity, is there any config in which a stub is actually needed? Unlike iommu_iotlb_gather_init(), I would have thought that these helpers should only ever be called by driver code which already depends on IOMMU_API.
> 
> Indeed, this was only done as a defensive step.
> 
> I will remove it. I see no reason for it. Sorry for ruining your patch.

And remove the stub for iommu_iotlb_gather_is_disjoint() as well.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
  2021-06-15 18:14     ` Nadav Amit
@ 2021-06-15 19:20       ` Robin Murphy
  2021-06-15 19:46         ` Nadav Amit
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 19:20 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Joerg Roedel, LKML, iommu, Jiajun Cao, Will Deacon

On 2021-06-15 19:14, Nadav Amit wrote:
> 
> 
>> On Jun 15, 2021, at 5:55 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-06-07 19:25, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
>>> This is in contrast, for instance, to Intel IOMMUs that have a limit on
>>> the number of pages that can be flushed in a single flush.  In addition,
>>> AMD's IOMMU does not care about the page-size, so changes of the page size
>>> do not need to trigger a TLB flush.
>>> So in most cases, a TLB flush due to disjoint ranges or page-size changes
>>> is not needed for AMD. Yet, vIOMMUs require the hypervisor to
>>> synchronize the virtualized IOMMU's PTEs with the physical ones. This
>>> process induces overheads, so it is better not to cause unnecessary
>>> flushes, i.e., flushes of PTEs that were not modified.
>>> Implement amd_iommu_iotlb_gather_add_page() and use it instead
>>> of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
>>> and disjoint regions unless the "non-present cache" feature is reported by
>>> the IOMMU capabilities, as this is an indication that we are running on a
>>> physical IOMMU. A similar indication is used by VT-d (see "caching
>>> mode"). The new logic retains the same flushing behavior that we had
>>> before the introduction of page-selective IOTLB flushes for AMD.
>>> On virtualized environments, check if the newly flushed region and the
>>> gathered one are disjoint and flush if they are. Also check whether the new
>>> region would cause IOTLB invalidation of a large region that would include
>>> unmodified PTEs. The latter check is done according to the "order" of the
>>> IOTLB flush.
>>
>> If it helps,
>>
>> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> 
> Thanks!
> 
> 
>> I wonder if it might be more effective to defer the alignment-based splitting part to amd_iommu_iotlb_sync() itself, but that could be investigated as another follow-up.
> 
> Note that the alignment-based splitting is only used for virtualized AMD IOMMUs, so it has no impact on most users.
> 
> Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver does a full IOTLB flush whenever it unmaps more than a single page. So, although your idea makes sense, I do not know exactly how to implement it right now, and regardless it is likely to provide a much smaller performance improvement than avoiding full IOTLB flushes would.
> 
> Having said that, if I figure out a way to implement it, I will give it a try (although I am admittedly afraid of complicated logic that might cause subtle, mostly undetectable bugs).

I was mainly thinking that when you observe a change in "order" and sync 
to avoid over-invalidating adjacent pages, those pages may still be part 
of the current unmap and you've just not seen them added yet. Hence 
simply gathering contiguous pages regardless of alignment, then breaking 
the total range down into appropriately-aligned commands in the sync 
once you know you've seen everything, seems like it might allow issuing 
fewer commands overall. But I haven't quite grasped the alignment rules 
either, so possibly this is moot anyway.

Robin.
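A rough user-space sketch of that idea — gather everything first, then split the total range into naturally aligned power-of-two blocks at sync time — could look like the following (all names and the greedy splitting policy are assumptions for illustration; the real driver would build invalidation commands rather than print):

#include <stdio.h>
#include <stdint.h>

static void emit_invalidate(uint64_t base, uint64_t size)
{
	printf("invalidate [%#llx, %#llx)\n",
	       (unsigned long long)base, (unsigned long long)(base + size));
}

/* Split [start, end) into naturally aligned power-of-two blocks. */
static void split_range(uint64_t start, uint64_t end)
{
	while (start < end) {
		uint64_t len = end - start;
		/* Largest block that is naturally aligned at 'start'... */
		uint64_t block = start ? (start & -start) : len;

		/* ...and that does not overshoot the end of the range. */
		if (block > len)
			block = len;
		/* Round down to a power of two. */
		while (block & (block - 1))
			block &= block - 1;

		emit_invalidate(start, block);
		start += block;
	}
}

int main(void)
{
	/* Unmapped IOVAs 0x3000-0xffff gathered as one range, split at sync. */
	split_range(0x3000, 0x10000);
	return 0;
}

For 0x3000-0xffff this emits three commands (0x3000-0x3fff, 0x4000-0x7fff, 0x8000-0xffff) instead of one command per page, while never invalidating outside the gathered range.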

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on
  2021-06-15 18:26     ` Nadav Amit
@ 2021-06-15 19:36       ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2021-06-15 19:36 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Joerg Roedel, Will Deacon, Jiajun Cao, Lu Baolu, iommu, linux-kernel

On 2021-06-15 19:26, Nadav Amit wrote:
> 
> 
>> On Jun 15, 2021, at 6:08 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-06-07 19:25, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>> Do not use flush-queue on virtualized environments, where the NpCache
>>> capability of the IOMMU is set. This is required to reduce
>>> virtualization overheads.
>>> This change follows a similar change to Intel's VT-d and a detailed
>>> explanation of the rationale is described in commit 29b32839725f
>>> ("iommu/vt-d: Do not use flush-queue when caching-mode is on").
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Jiajun Cao <caojiajun@vmware.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>> Cc: iommu@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>> ---
>>>   drivers/iommu/amd/init.c | 7 ++++++-
>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index d006724f4dc2..ba3b76ed776d 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
>>>   	if (ret)
>>>   		return ret;
>>>   -	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>>> +	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
>>> +		if (!amd_iommu_unmap_flush)
>>> +			pr_warn_once("IOMMU batching is disabled due to virtualization");
>>
>> Nit: you can just use pr_warn() (or arguably pr_info()) since the explicit conditions already only match once.
> 
> Yes, my bad. I will fix it in the next version.
> 
>> Speaking of which, it might be better to use amd_iommu_np_cache instead, since other patches are planning to clean up the last remnants of amd_iommu_unmap_flush.
> 
> I would prefer that the other patches (the ones that remove amd_iommu_unmap_flush) address this code as well. I certainly do not want to embed amd_iommu_np_cache deep into the flushing logic. IOW: I don’t know exactly what you have in mind, but I prefer that the code stays clear.
> 
> This code follows (copies?) the same pattern+logic as commit 5f3116ea8b5 ("iommu/vt-d: Do not use flush-queue when caching-mode is on"). I see that this code was later changed in commit 53255e545807c ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE"), but that commit did not get rid of intel_iommu_strict, so please allow me to use amd_iommu_unmap_flush.

Sure, it was just a suggestion to pre-resolve one line of merge conflict 
with another series[1] which is also almost ready, and removes those 
local variables for both AMD and Intel. But there will still be other 
conflicts either way, so it's not a big deal.

Robin.

[1] 
https://lore.kernel.org/linux-iommu/1623414043-40745-5-git-send-email-john.garry@huawei.com/

> To remind you/me/whoever: disabling batching due to caching-mode/NP-cache is not inherently needed. It was not needed for quite some time on Intel, but somehow along the way the consolidated flushing code broke it, and now it is needed (without intrusive code changes).
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
  2021-06-15 19:20       ` Robin Murphy
@ 2021-06-15 19:46         ` Nadav Amit
  0 siblings, 0 replies; 26+ messages in thread
From: Nadav Amit @ 2021-06-15 19:46 UTC (permalink / raw)
  To: Robin Murphy; +Cc: Joerg Roedel, LKML, iommu, Jiajun Cao, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 3891 bytes --]



> On Jun 15, 2021, at 12:20 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2021-06-15 19:14, Nadav Amit wrote:
>>> On Jun 15, 2021, at 5:55 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> 
>>> On 2021-06-07 19:25, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
>>>> This is in contrast, for instance, to Intel IOMMUs that have a limit on
>>>> the number of pages that can be flushed in a single flush.  In addition,
>>>> AMD's IOMMU does not care about the page-size, so changes of the page size
>>>> do not need to trigger a TLB flush.
>>>> So in most cases, a TLB flush due to disjoint ranges or page-size changes
>>>> is not needed for AMD. Yet, vIOMMUs require the hypervisor to
>>>> synchronize the virtualized IOMMU's PTEs with the physical ones. This
>>>> process induces overheads, so it is better not to cause unnecessary
>>>> flushes, i.e., flushes of PTEs that were not modified.
>>>> Implement amd_iommu_iotlb_gather_add_page() and use it instead
>>>> of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
>>>> and disjoint regions unless the "non-present cache" feature is reported by
>>>> the IOMMU capabilities, as this is an indication that we are running on a
>>>> physical IOMMU. A similar indication is used by VT-d (see "caching
>>>> mode"). The new logic retains the same flushing behavior that we had
>>>> before the introduction of page-selective IOTLB flushes for AMD.
>>>> On virtualized environments, check if the newly flushed region and the
>>>> gathered one are disjoint and flush if they are. Also check whether the new
>>>> region would cause IOTLB invalidation of a large region that would include
>>>> unmodified PTEs. The latter check is done according to the "order" of the
>>>> IOTLB flush.
>>> 
>>> If it helps,
>>> 
>>> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
>> Thanks!
>>> I wonder if it might be more effective to defer the alignment-based splitting part to amd_iommu_iotlb_sync() itself, but that could be investigated as another follow-up.
>> Note that the alignment-based splitting is only used for virtualized AMD IOMMUs, so it has no impact on most users.
>> Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver does a full IOTLB flush whenever it unmaps more than a single page. So, although your idea makes sense, I do not know exactly how to implement it right now, and regardless it is likely to provide a much smaller performance improvement than avoiding full IOTLB flushes would.
>> Having said that, if I figure out a way to implement it, I will give it a try (although I am admittedly afraid of complicated logic that might cause subtle, mostly undetectable bugs).
> 
> I was mainly thinking that when you observe a change in "order" and sync to avoid over-invalidating adjacent pages, those pages may still be part of the current unmap and you've just not seen them added yet. Hence simply gathering contiguous pages regardless of alignment, then breaking the total range down into appropriately-aligned commands in the sync once you know you've seen everything, seems like it might allow issuing fewer commands overall. But I haven't quite grasped the alignment rules either, so possibly this is moot anyway.

Thanks for explaining. I think that what you propose makes sense. We might already flush more than needed in certain cases (e.g., patterns in which pages are added before and after the gathered range). I doubt these cases actually happen, and the tradeoff between being precise in what you flush (more flushes) and not causing the hypervisor to check unchanged mappings (synchronization cost) is less obvious.

I will see if I can change __domain_flush_pages() to your liking, and see whether it can be part of this series.


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-06-15 19:47 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-07 18:25 [PATCH v3 0/6] iommu/amd: Enable page-selective flushes Nadav Amit
2021-06-07 18:25 ` [PATCH v3 1/6] iommu/amd: Selective flush on unmap Nadav Amit
2021-06-07 18:25 ` [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on Nadav Amit
2021-06-15 13:08   ` Robin Murphy
2021-06-15 18:26     ` Nadav Amit
2021-06-15 19:36       ` Robin Murphy
2021-06-07 18:25 ` [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers Nadav Amit
2021-06-11 13:50   ` Will Deacon
2021-06-15 10:42   ` Robin Murphy
2021-06-15 19:05     ` Nadav Amit
2021-06-15 19:07       ` Nadav Amit
2021-06-15 12:29   ` Yong Wu
2021-06-15 12:41     ` Robin Murphy
2021-06-07 18:25 ` [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out Nadav Amit
2021-06-11 13:57   ` Will Deacon
2021-06-11 16:50     ` Nadav Amit
2021-06-15 10:29       ` Will Deacon
2021-06-15 18:54         ` Nadav Amit
2021-06-07 18:25 ` [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD Nadav Amit
2021-06-15 12:55   ` Robin Murphy
2021-06-15 18:14     ` Nadav Amit
2021-06-15 19:20       ` Robin Murphy
2021-06-15 19:46         ` Nadav Amit
2021-06-07 18:25 ` [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations Nadav Amit
2021-06-15 11:25   ` Robin Murphy
2021-06-15 18:51     ` Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).