* [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

v2 -> v3:
Add a bootup option "iommu_strict_mode" so that the administrator can choose
which mode to use. The first 5 patches are unchanged.
+	iommu_strict_mode=	[arm-smmu-v3]
+		0 - strict mode (default)
+		1 - non-strict mode

v1 -> v2:
Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:
0, IOMMU_STRICT;
1, IOMMU_NON_STRICT;
Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
other IOMMUs which still use strict mode. In other words, this patch series
will not impact other IOMMU drivers. I tried adding a new quirk IO_PGTABLE_QUIRK_NON_STRICT
in io_pgtable_cfg.quirks, but it could not pass the strict mode of the domain from the
SMMUv3 driver to the io-pgtable module.

Add a new member, domain_non_strict, to struct iommu_dma_cookie; this member
will only be initialized when both the domain and the IOMMU driver support
non-strict mode.

v1:
In general, an IOMMU unmap operation follows the steps below:
1. remove the mapping in the page table of the specified iova range
2. execute a tlbi command to invalidate the mapping cached in the TLB
3. wait for the above tlbi operation to finish
4. free the IOVA resource
5. free the physical memory resource

This can be a problem when unmap is performed very frequently, because the
combination of tlbi and wait operations consumes a lot of time. A feasible
method is to defer the tlbi and iova-free operations: when a certain number
has accumulated or a specified time has elapsed, execute only one tlbi_all
command to clean up the TLB, then free the deferred IOVAs. Mark this as
non-strict mode.
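
To make the batching idea concrete, below is a minimal sketch in plain C.
tlbi_all(), tlbi_sync() and free_deferred_iovas() are made-up stand-ins for
the real SMMU commands and IOVA-cache calls, and the threshold and timeout
values are arbitrary:

#define INV_QUEUE_DEPTH		64
#define INV_QUEUE_TIMEOUT_MS	10

struct inv_queue {
	unsigned long	iovas[INV_QUEUE_DEPTH];	/* unmapped, not yet reusable */
	unsigned int	count;
	unsigned long	deadline;		/* flush no later than this */
};

static void inv_queue_add(struct inv_queue *q, unsigned long iova)
{
	q->iovas[q->count++] = iova;

	if (q->count < INV_QUEUE_DEPTH && time_before(jiffies, q->deadline))
		return;				/* keep accumulating */

	tlbi_all();				/* one invalidate-all instead of N tlbis */
	tlbi_sync();				/* one wait instead of N waits */
	free_deferred_iovas(q->iovas, q->count);	/* now safe to reuse */
	q->count = 0;
	q->deadline = jiffies + msecs_to_jiffies(INV_QUEUE_TIMEOUT_MS);
}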

But it must be noted that, although the mapping has already been removed from
the page table, it may still exist in the TLB, and the freed physical memory
may also be reused by others. So an attacker can keep accessing memory through
the just-freed IOVA to obtain sensitive data or corrupt memory. VFIO should
therefore always choose the strict mode.

Some may consider deferring the physical memory free as well, which would
still follow the strict mode. But in the map_sg cases the memory allocation is
not controlled by the IOMMU APIs, so that is not enforceable.

Fortunately, Intel and AMD have already adopted the non-strict mode and put
the queue_iova() operation into the common file dma-iommu.c, and my work is
based on that. The difference is that the arm-smmu-v3 driver calls the common
IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.

Below is the performance data of strict vs non-strict for an NVMe device:
Random Read  IOPS: 146K (strict) vs 573K (non-strict)
Random Write IOPS: 143K (strict) vs 513K (non-strict)


Zhen Lei (6):
  iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
  iommu/dma: add support for non-strict mode
  iommu/amd: use default branch to deal with all non-supported
    capabilities
  iommu/io-pgtable-arm: add support for non-strict mode
  iommu/arm-smmu-v3: add support for non-strict mode
  iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

 Documentation/admin-guide/kernel-parameters.txt | 12 +++++++
 drivers/iommu/amd_iommu.c                       |  4 +--
 drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--
 drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++
 drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------
 include/linux/iommu.h                           |  7 +++++
 6 files changed, 98 insertions(+), 15 deletions(-)

-- 
1.8.3




* [PATCH v3 1/6] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

.flush_iotlb_all cannot just wait for previous tlbi operations to complete;
it should also invalidate all TLB entries of the related domain.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d64710..4402187 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1770,6 +1770,14 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->smmu)
+		arm_smmu_tlb_inv_context(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
 	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1998,7 +2006,7 @@ static void arm_smmu_put_resv_regions(struct device *dev,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-	.flush_iotlb_all	= arm_smmu_iotlb_sync,
+	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
 	.iotlb_sync		= arm_smmu_iotlb_sync,
 	.iova_to_phys		= arm_smmu_iova_to_phys,
 	.add_device		= arm_smmu_add_device,
-- 
1.8.3




* [PATCH v3 2/6] iommu/dma: add support for non-strict mode
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

1. Save the related domain pointer in struct iommu_dma_cookie, so that the
   iovad is able to call domain->ops->flush_iotlb_all to flush the TLB.
2. Add a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to indicate
   that the iommu domain supports non-strict mode.
3. During the iommu domain initialization phase, call capable() to check
   whether it supports non-strict mode. If so, call init_iova_flush_queue
   to register the iovad->flush_cb callback.
4. All unmap (including iova-free) APIs will finally invoke __iommu_dma_unmap
   -->iommu_dma_free_iova. If the domain is non-strict, call queue_iova to
   defer the iova freeing.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/dma-iommu.c | 25 +++++++++++++++++++++++++
 include/linux/iommu.h     |  7 +++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..9f0c77a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,6 +55,7 @@ struct iommu_dma_cookie {
 	};
 	struct list_head		msi_page_list;
 	spinlock_t			msi_lock;
+	struct iommu_domain		*domain_non_strict;
 };
 
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
@@ -257,6 +258,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
 	return ret;
 }
 
+static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
+{
+	struct iommu_dma_cookie *cookie;
+	struct iommu_domain *domain;
+
+	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+	domain = cookie->domain_non_strict;
+
+	domain->ops->flush_iotlb_all(domain);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -272,6 +284,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 		u64 size, struct device *dev)
 {
+	const struct iommu_ops *ops = domain->ops;
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	unsigned long order, base_pfn, end_pfn;
@@ -308,6 +321,15 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	}
 
 	init_iova_domain(iovad, 1UL << order, base_pfn);
+
+	if ((ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) &&
+	    (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_NON_STRICT)) {
+		BUG_ON(!ops->flush_iotlb_all);
+
+		cookie->domain_non_strict = domain;
+		init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
+	}
+
 	if (!dev)
 		return 0;
 
@@ -390,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 	/* The MSI case is only ever cleaning up its most recent allocation */
 	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
 		cookie->msi_iova -= size;
+	else if (cookie->domain_non_strict)
+		queue_iova(iovad, iova_pfn(iovad, iova),
+				size >> iova_shift(iovad), 0);
 	else
 		free_iova_fast(iovad, iova_pfn(iovad, iova),
 				size >> iova_shift(iovad));
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee..82ed979 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -86,6 +86,12 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
 
+#define IOMMU_STRICT		0
+#define IOMMU_NON_STRICT	1
+#define IOMMU_STRICT_MODE_MASK	1UL
+#define IOMMU_DOMAIN_STRICT_MODE(domain)	\
+		(domain->type != IOMMU_DOMAIN_UNMANAGED)
+
 struct iommu_domain {
 	unsigned type;
 	const struct iommu_ops *ops;
@@ -101,6 +107,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_NON_STRICT,		/* IOMMU supports non-strict mode */
 };
 
 /*
-- 
1.8.3




* [PATCH v3 3/6] iommu/amd: use default branch to deal with all non-supported capabilities
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

Avoid the warning below when new capabilities are added:

drivers/iommu/amd_iommu.c: In function 'amd_iommu_capable':
drivers/iommu/amd_iommu.c:3053:2: warning: enumeration value 'IOMMU_CAP_NON_STRICT' not handled in switch [-Wswitch]
     switch (cap) {

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/amd_iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 596b95c..b619e5c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3085,11 +3085,9 @@ static bool amd_iommu_capable(enum iommu_cap cap)
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
 		return (irq_remapping_enabled == 1);
-	case IOMMU_CAP_NOEXEC:
+	default:
 		return false;
 	}
-
-	return false;
 }
 
 static void amd_iommu_get_resv_regions(struct device *dev,
-- 
1.8.3




* [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

To support the non-strict mode, we now only issue tlbi and sync for the
strict mode. But for the non-leaf case, always follow strict mode.

Use the lowest bit of the iova parameter to pass the strict mode:
0, IOMMU_STRICT;
1, IOMMU_NON_STRICT;
Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
other IOMMUs which still use strict mode.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 010a254..9234db3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       unsigned long iova, size_t size, int lvl,
-			       arm_lpae_iopte *ptep);
+			       arm_lpae_iopte *ptep, int strict);
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 				phys_addr_t paddr, arm_lpae_iopte prot,
@@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 		size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
 		tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
+		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
 			return -EINVAL;
 	}
 
@@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 				       unsigned long iova, size_t size,
 				       arm_lpae_iopte blk_pte, int lvl,
-				       arm_lpae_iopte *ptep)
+				       arm_lpae_iopte *ptep, int strict)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte pte, *tablep;
@@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 	}
 
 	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
+		return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
 
 	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
+	if (!strict)
+		io_pgtable_tlb_sync(&data->iop);
+
 	return size;
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       unsigned long iova, size_t size, int lvl,
-			       arm_lpae_iopte *ptep)
+			       arm_lpae_iopte *ptep, int strict)
 {
 	arm_lpae_iopte pte;
 	struct io_pgtable *iop = &data->iop;
@@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			io_pgtable_tlb_sync(iop);
 			ptep = iopte_deref(pte, data);
 			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
-		} else {
+		} else if (strict) {
 			io_pgtable_tlb_add_flush(iop, iova, size, size, true);
 		}
 
@@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		 * minus the part we want to unmap
 		 */
 		return arm_lpae_split_blk_unmap(data, iova, size, pte,
-						lvl + 1, ptep);
+						lvl + 1, ptep, strict);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
 }
 
 static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 			     size_t size)
 {
+	int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte *ptep = data->pgd;
 	int lvl = ARM_LPAE_START_LVL(data);
 
+	iova &= ~IOMMU_STRICT_MODE_MASK;
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
-	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
+	return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
 }
 
 static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-- 
1.8.3




* [PATCH v3 5/6] iommu/arm-smmu-v3: add support for non-strict mode
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

1. Add the IOMMU_CAP_NON_STRICT capability.
2. Dynamically choose strict or non-strict mode based on the iommu domain type.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4402187..4a198a0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1440,6 +1440,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return true;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_NON_STRICT:
+		return true;
 	default:
 		return false;
 	}
@@ -1767,7 +1769,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	return ops->unmap(ops, iova | IOMMU_DOMAIN_STRICT_MODE(domain), size);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -1782,7 +1784,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
 	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
 
-	if (smmu)
+	if (smmu && (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_STRICT))
 		__arm_smmu_tlb_sync(smmu);
 }
 
-- 
1.8.3




* [PATCH v3 6/6] iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
From: Zhen Lei @ 2018-07-12  6:18 UTC
  To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Zhen Lei

Because the non-strict mode introduces a vulnerability window, add a bootup
option so that the administrator can choose which mode to use. The default
mode is IOMMU_STRICT.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++
 drivers/iommu/arm-smmu-v3.c                     | 32 ++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7..0cc80bc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1720,6 +1720,18 @@
 		nobypass	[PPC/POWERNV]
 			Disable IOMMU bypass, using IOMMU for PCI devices.
 
+	iommu_strict_mode=	[arm-smmu-v3]
+		0 - strict mode
+		    Make sure all related TLB entries are invalidated before
+		    the memory is released.
+		1 - non-strict mode
+		    Defer the TLB invalidation and release the memory first.
+		    This mode introduces a vulnerability window: an untrusted
+		    device can access the reused memory because the TLB entries
+		    may still be valid. Please consider this carefully before
+		    choosing this mode. Note that VFIO always uses strict mode.
+		others - strict mode
+
 	iommu.passthrough=
 			[ARM64] Configure DMA to bypass the IOMMU by default.
 			Format: { "0" | "1" }
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4a198a0..9b72fc4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -631,6 +631,24 @@ struct arm_smmu_option_prop {
 	{ 0, NULL},
 };
 
+static u32 iommu_strict_mode __read_mostly = IOMMU_STRICT;
+
+static int __init setup_iommu_strict_mode(char *str)
+{
+	u32 strict_mode = IOMMU_STRICT;
+
+	get_option(&str, &strict_mode);
+	if (strict_mode == IOMMU_NON_STRICT) {
+		iommu_strict_mode = strict_mode;
+		pr_warn("WARNING: iommu non-strict mode is chosen.\n"
+			"It's good for scatter-gather performance but lacks full isolation\n");
+		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+	}
+
+	return 0;
+}
+early_param("iommu_strict_mode", setup_iommu_strict_mode);
+
 static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
 						 struct arm_smmu_device *smmu)
 {
@@ -1441,7 +1459,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	case IOMMU_CAP_NON_STRICT:
-		return true;
+		return (iommu_strict_mode == IOMMU_NON_STRICT) ? true : false;
 	default:
 		return false;
 	}
@@ -1750,6 +1768,14 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	return ret;
 }
 
+static u32 arm_smmu_strict_mode(struct iommu_domain *domain)
+{
+	if (iommu_strict_mode == IOMMU_NON_STRICT)
+		return IOMMU_DOMAIN_STRICT_MODE(domain);
+
+	return IOMMU_STRICT;
+}
+
 static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 			phys_addr_t paddr, size_t size, int prot)
 {
@@ -1769,7 +1795,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova | IOMMU_DOMAIN_STRICT_MODE(domain), size);
+	return ops->unmap(ops, iova | arm_smmu_strict_mode(domain), size);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -1784,7 +1810,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
 	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
 
-	if (smmu && (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_STRICT))
+	if (smmu && (arm_smmu_strict_mode(domain) == IOMMU_STRICT))
 		__arm_smmu_tlb_sync(smmu);
 }
 
-- 
1.8.3




* Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
From: Robin Murphy @ 2018-07-24 21:51 UTC
  To: Zhen Lei, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 2018-07-12 7:18 AM, Zhen Lei wrote:
> v2 -> v3:
> Add a bootup option "iommu_strict_mode" so that the administrator can choose
> which mode to use. The first 5 patches are unchanged.
> +	iommu_strict_mode=	[arm-smmu-v3]
> +		0 - strict mode (default)
> +		1 - non-strict mode
> 
> v1 -> v2:
> Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:
> 0, IOMMU_STRICT;
> 1, IOMMU_NON_STRICT;
> Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
> other IOMMUs which still use strict mode. In other words, this patch series
> will not impact other IOMMU drivers. I tried adding a new quirk IO_PGTABLE_QUIRK_NON_STRICT
> in io_pgtable_cfg.quirks, but it could not pass the strict mode of the domain from the
> SMMUv3 driver to the io-pgtable module.

What exactly is the issue there? We don't have any problem with other 
quirks like NO_DMA, and as I said before, by the time we're allocating 
the io-pgtable in arm_smmu_domain_finalise() we already know everything 
there is to know about the domain.
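
For reference, a quirk-based version could plausibly be as small as the
sketch below; IO_PGTABLE_QUIRK_NON_STRICT is the proposed flag (it does not
exist yet), while io_pgtable_cfg and arm_smmu_domain_finalise() are the
existing pieces:

	/* Sketch: in arm_smmu_domain_finalise(), where the domain type is
	 * already known, the policy can be baked into the config up front. */
	struct io_pgtable_cfg pgtbl_cfg = {
		.pgsize_bitmap	= smmu->pgsize_bitmap,
		.ias		= ias,
		.oas		= oas,
		.tlb		= &arm_smmu_gather_ops,
		.iommu_dev	= smmu->dev,
	};

	if (domain->type == IOMMU_DOMAIN_DMA)
		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;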

> Add a new member, domain_non_strict, to struct iommu_dma_cookie; this member
> will only be initialized when both the domain and the IOMMU driver support
> non-strict mode.
> 
> v1:
> In general, an IOMMU unmap operation follows the steps below:
> 1. remove the mapping in the page table of the specified iova range
> 2. execute a tlbi command to invalidate the mapping cached in the TLB
> 3. wait for the above tlbi operation to finish
> 4. free the IOVA resource
> 5. free the physical memory resource
> 
> This can be a problem when unmap is performed very frequently, because the
> combination of tlbi and wait operations consumes a lot of time. A feasible
> method is to defer the tlbi and iova-free operations: when a certain number
> has accumulated or a specified time has elapsed, execute only one tlbi_all
> command to clean up the TLB, then free the deferred IOVAs. Mark this as
> non-strict mode.
> 
> But it must be noted that, although the mapping has already been removed from
> the page table, it may still exist in the TLB, and the freed physical memory
> may also be reused by others. So an attacker can keep accessing memory through
> the just-freed IOVA to obtain sensitive data or corrupt memory. VFIO should
> therefore always choose the strict mode.
> 
> Some may consider deferring the physical memory free as well, which would
> still follow the strict mode. But in the map_sg cases the memory allocation is
> not controlled by the IOMMU APIs, so that is not enforceable.
> 
> Fortunately, Intel and AMD have already adopted the non-strict mode and put
> the queue_iova() operation into the common file dma-iommu.c, and my work is
> based on that. The difference is that the arm-smmu-v3 driver calls the common
> IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
> 
> Below is the performance data of strict vs non-strict for an NVMe device:
> Random Read  IOPS: 146K (strict) vs 573K (non-strict)
> Random Write IOPS: 143K (strict) vs 513K (non-strict)

How does that compare to passthrough performance? One thing I'm not
entirely clear about is what the realistic use-case for this is - even
if invalidation were infinitely fast, enabling translation still
typically has a fair impact on overall system performance in terms of
latency, power, memory bandwidth, etc., so I can't help wondering what
devices exist today for which performance is critical and robustness* is
unimportant, yet have crippled addressing capabilities such that they
can't just use passthrough.

Robin.


* I don't want to say "security" here, since I'm actually a lot less 
concerned about the theoretical malicious endpoint/wild write scenarios 
than the much more straightforward malfunctioning device and/or 
buggy driver causing use-after-free style memory corruption. Also, I'm 
sick of the word "security"...

> 
> Zhen Lei (6):
>    iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
>    iommu/dma: add support for non-strict mode
>    iommu/amd: use default branch to deal with all non-supported
>      capabilities
>    iommu/io-pgtable-arm: add support for non-strict mode
>    iommu/arm-smmu-v3: add support for non-strict mode
>    iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
> 
>   Documentation/admin-guide/kernel-parameters.txt | 12 +++++++
>   drivers/iommu/amd_iommu.c                       |  4 +--
>   drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--
>   drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++
>   drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------
>   include/linux/iommu.h                           |  7 +++++
>   6 files changed, 98 insertions(+), 15 deletions(-)
> 


* Re: [PATCH v3 2/6] iommu/dma: add support for non-strict mode
From: Robin Murphy @ 2018-07-24 22:01 UTC
  To: Zhen Lei, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 2018-07-12 7:18 AM, Zhen Lei wrote:
> 1. Save the related domain pointer in struct iommu_dma_cookie, so that the
>    iovad is able to call domain->ops->flush_iotlb_all to flush the TLB.
> 2. Add a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to indicate
>    that the iommu domain supports non-strict mode.
> 3. During the iommu domain initialization phase, call capable() to check
>    whether it supports non-strict mode. If so, call init_iova_flush_queue
>    to register the iovad->flush_cb callback.
> 4. All unmap (including iova-free) APIs will finally invoke __iommu_dma_unmap
>    -->iommu_dma_free_iova. If the domain is non-strict, call queue_iova to
>    defer the iova freeing.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/dma-iommu.c | 25 +++++++++++++++++++++++++
>   include/linux/iommu.h     |  7 +++++++
>   2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ddcbbdb..9f0c77a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>   	};
>   	struct list_head		msi_page_list;
>   	spinlock_t			msi_lock;
> +	struct iommu_domain		*domain_non_strict;
>   };
>   
>   static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
> @@ -257,6 +258,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   	return ret;
>   }
>   
> +static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
> +{
> +	struct iommu_dma_cookie *cookie;
> +	struct iommu_domain *domain;
> +
> +	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
> +	domain = cookie->domain_non_strict;
> +
> +	domain->ops->flush_iotlb_all(domain);
> +}
> +
>   /**
>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> @@ -272,6 +284,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   		u64 size, struct device *dev)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	unsigned long order, base_pfn, end_pfn;
> @@ -308,6 +321,15 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   	}
>   
>   	init_iova_domain(iovad, 1UL << order, base_pfn);
> +
> +	if ((ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) &&
> +	    (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_NON_STRICT)) {
> +		BUG_ON(!ops->flush_iotlb_all);
> +
> +		cookie->domain_non_strict = domain;
> +		init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
> +	}
> +
>   	if (!dev)
>   		return 0;
>   
> @@ -390,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>   	/* The MSI case is only ever cleaning up its most recent allocation */
>   	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>   		cookie->msi_iova -= size;
> +	else if (cookie->domain_non_strict)
> +		queue_iova(iovad, iova_pfn(iovad, iova),
> +				size >> iova_shift(iovad), 0);
>   	else
>   		free_iova_fast(iovad, iova_pfn(iovad, iova),
>   				size >> iova_shift(iovad));
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee..82ed979 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,12 @@ struct iommu_domain_geometry {
>   #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
>   				 __IOMMU_DOMAIN_DMA_API)
>   
> +#define IOMMU_STRICT		0
> +#define IOMMU_NON_STRICT	1
> +#define IOMMU_STRICT_MODE_MASK	1UL
> +#define IOMMU_DOMAIN_STRICT_MODE(domain)	\
> +		(domain->type != IOMMU_DOMAIN_UNMANAGED)
> +
>   struct iommu_domain {
>   	unsigned type;
>   	const struct iommu_ops *ops;
> @@ -101,6 +107,7 @@ enum iommu_cap {
>   					   transactions */
>   	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
>   	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> +	IOMMU_CAP_NON_STRICT,		/* IOMMU supports non-strict mode */

This still isn't a capability of the hardware. Non-strict mode is pure 
software policy; *every IOMMU ever* supports it.

If you want to have this kind of per-driver control, make it a domain 
attribute - that's not only more logical, but should also work out a lot 
cleaner overall.
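
For illustration, the attribute-based shape might look like the sketch
below, where DOMAIN_ATTR_NON_STRICT is a hypothetical new enum iommu_attr
value and iommu_domain_set_attr() is the existing interface:

	/* Sketch: in iommu_dma_init_domain(), ask the driver to opt in
	 * instead of testing a global hardware "capability". */
	int attr = 1;

	if (!iommu_domain_set_attr(domain, DOMAIN_ATTR_NON_STRICT, &attr)) {
		cookie->domain_non_strict = domain;
		init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
	}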

Robin.

>   };
>   
>   /*
> 


* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
From: Robin Murphy @ 2018-07-24 22:25 UTC
  To: Zhen Lei, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 2018-07-12 7:18 AM, Zhen Lei wrote:
> To support the non-strict mode, we now only issue tlbi and sync for the
> strict mode. But for the non-leaf case, always follow strict mode.
> 
> Use the lowest bit of the iova parameter to pass the strict mode:
> 0, IOMMU_STRICT;
> 1, IOMMU_NON_STRICT;
> Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
> other IOMMUs which still use strict mode.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
>   1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 010a254..9234db3 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>   
>   static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   			       unsigned long iova, size_t size, int lvl,
> -			       arm_lpae_iopte *ptep);
> +			       arm_lpae_iopte *ptep, int strict);
>   
>   static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>   				phys_addr_t paddr, arm_lpae_iopte prot,
> @@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>   		size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>   
>   		tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
> -		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
> +		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
>   			return -EINVAL;
>   	}
>   
> @@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>   static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>   				       unsigned long iova, size_t size,
>   				       arm_lpae_iopte blk_pte, int lvl,
> -				       arm_lpae_iopte *ptep)
> +				       arm_lpae_iopte *ptep, int strict)

DMA code should never ever be splitting blocks anyway, and frankly the 
TLB maintenance here is dodgy enough (since we can't reasonably do 
break-before-make as VMSA says we should) that I *really* don't want to 
introduce any possibility of making it more asynchronous. I'd much 
rather just hard-code the expectation of strict == true for this.
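
That is, the recursion into the split path could simply pass IOMMU_STRICT
unconditionally, e.g.:

	/* Sketch: block splits always do their maintenance synchronously. */
	return arm_lpae_split_blk_unmap(data, iova, size, pte,
					lvl + 1, ptep, IOMMU_STRICT);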

Robin.

>   {
>   	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>   	arm_lpae_iopte pte, *tablep;
> @@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>   	}
>   
>   	if (unmap_idx < 0)
> -		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
> +		return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
>   
>   	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
> +	if (!strict)
> +		io_pgtable_tlb_sync(&data->iop);
> +
>   	return size;
>   }
>   
>   static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   			       unsigned long iova, size_t size, int lvl,
> -			       arm_lpae_iopte *ptep)
> +			       arm_lpae_iopte *ptep, int strict)
>   {
>   	arm_lpae_iopte pte;
>   	struct io_pgtable *iop = &data->iop;
> @@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   			io_pgtable_tlb_sync(iop);
>   			ptep = iopte_deref(pte, data);
>   			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
> -		} else {
> +		} else if (strict) {
>   			io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>   		}
>   
> @@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   		 * minus the part we want to unmap
>   		 */
>   		return arm_lpae_split_blk_unmap(data, iova, size, pte,
> -						lvl + 1, ptep);
> +						lvl + 1, ptep, strict);
>   	}
>   
>   	/* Keep on walkin' */
>   	ptep = iopte_deref(pte, data);
> -	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
> +	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
>   }
>   
>   static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>   			     size_t size)
>   {
> +	int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   	arm_lpae_iopte *ptep = data->pgd;
>   	int lvl = ARM_LPAE_START_LVL(data);
>   
> +	iova &= ~IOMMU_STRICT_MODE_MASK;
>   	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>   		return 0;
>   
> -	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
> +	return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
>   }
>   
>   static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> 


* Re: [PATCH v3 6/6] iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
From: Robin Murphy @ 2018-07-24 22:46 UTC
  To: Zhen Lei, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 2018-07-12 7:18 AM, Zhen Lei wrote:
> Because the non-strict mode introduces a vulnerability window, add a bootup
> option so that the administrator can choose which mode to use. The default
> mode is IOMMU_STRICT.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++
>   drivers/iommu/arm-smmu-v3.c                     | 32 ++++++++++++++++++++++---
>   2 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index efc7aa7..0cc80bc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1720,6 +1720,18 @@
>   		nobypass	[PPC/POWERNV]
>   			Disable IOMMU bypass, using IOMMU for PCI devices.
>   
> +	iommu_strict_mode=	[arm-smmu-v3]

If anything this should belong to iommu-dma, as that's where the actual 
decision of whether to use a flush queue or not happens. Also it would 
be nice to stick to the iommu.* option namespace in the hope of 
maintaining some consistency.
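
For comparison, an iommu.*-namespaced option handled in core code might look
like this sketch (the option name, variable and placement are illustrative
only):

	/* Sketch for drivers/iommu/iommu.c: a generic strictness switch. */
	static bool iommu_dma_strict __read_mostly = true;

	static int __init iommu_dma_setup(char *str)
	{
		return kstrtobool(str, &iommu_dma_strict);
	}
	early_param("iommu.strict", iommu_dma_setup);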

> +		0 - strict mode
> +		    Make sure all related TLB entries are invalidated before
> +		    the memory is released.
> +		1 - non-strict mode
> +		    Defer the TLB invalidation and release the memory first.
> +		    This mode introduces a vulnerability window: an untrusted
> +		    device can access the reused memory because the TLB entries
> +		    may still be valid. Please consider this carefully before
> +		    choosing this mode. Note that VFIO always uses strict mode.
> +		others - strict mode
> +
>   	iommu.passthrough=
>   			[ARM64] Configure DMA to bypass the IOMMU by default.
>   			Format: { "0" | "1" }
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4a198a0..9b72fc4 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -631,6 +631,24 @@ struct arm_smmu_option_prop {
>   	{ 0, NULL},
>   };
>   
> +static u32 iommu_strict_mode __read_mostly = IOMMU_STRICT;
> +
> +static int __init setup_iommu_strict_mode(char *str)
> +{
> +	u32 strict_mode = IOMMU_STRICT;
> +
> +	get_option(&str, &strict_mode);
> +	if (strict_mode == IOMMU_NON_STRICT) {
> +		iommu_strict_mode = strict_mode;
> +		pr_warn("WARNING: iommu non-strict mode is chosen.\n"
> +			"It's good for scatter-gather performance but lacks full isolation\n");
> +		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
> +	}
> +
> +	return 0;
> +}
> +early_param("iommu_strict_mode", setup_iommu_strict_mode);
> +
>   static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
>   						 struct arm_smmu_device *smmu)
>   {
> @@ -1441,7 +1459,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
>   	case IOMMU_CAP_NOEXEC:
>   		return true;
>   	case IOMMU_CAP_NON_STRICT:
> -		return true;
> +		return (iommu_strict_mode == IOMMU_NON_STRICT) ? true : false;

Ugh. The "completely redundant ternary" idiom hurts my soul :(
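
The comparison already yields a bool, so the branch could be written as just:

	case IOMMU_CAP_NON_STRICT:
		return iommu_strict_mode == IOMMU_NON_STRICT;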

Robin.

>   	default:
>   		return false;
>   	}
> @@ -1750,6 +1768,14 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>   	return ret;
>   }
>   
> +static u32 arm_smmu_strict_mode(struct iommu_domain *domain)
> +{
> +	if (iommu_strict_mode == IOMMU_NON_STRICT)
> +		return IOMMU_DOMAIN_STRICT_MODE(domain);
> +
> +	return IOMMU_STRICT;
> +}
> +
>   static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>   			phys_addr_t paddr, size_t size, int prot)
>   {
> @@ -1769,7 +1795,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>   	if (!ops)
>   		return 0;
>   
> -	return ops->unmap(ops, iova | IOMMU_DOMAIN_STRICT_MODE(domain), size);
> +	return ops->unmap(ops, iova | arm_smmu_strict_mode(domain), size);
>   }
>   
>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> @@ -1784,7 +1810,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>   {
>   	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
>   
> -	if (smmu && (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_STRICT))
> +	if (smmu && (arm_smmu_strict_mode(domain) == IOMMU_STRICT))
>   		__arm_smmu_tlb_sync(smmu);
>   }
>   
> 


* Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
From: Leizhen (ThunderTown) @ 2018-07-26  3:44 UTC
  To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm



On 2018/7/25 5:51, Robin Murphy wrote:
> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>> v2 -> v3:
>> Add a bootup option "iommu_strict_mode" so that the administrator can choose
>> which mode to use. The first 5 patches are unchanged.
>> +    iommu_strict_mode=    [arm-smmu-v3]
>> +        0 - strict mode (default)
>> +        1 - non-strict mode
>>
>> v1 -> v2:
>> Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:
>> 0, IOMMU_STRICT;
>> 1, IOMMU_NON_STRICT;
>> Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
>> other IOMMUs which still use strict mode. In other words, this patch series
>> will not impact other IOMMU drivers. I tried adding a new quirk IO_PGTABLE_QUIRK_NON_STRICT
>> in io_pgtable_cfg.quirks, but it could not pass the strict mode of the domain from the
>> SMMUv3 driver to the io-pgtable module.
> 
> What exactly is the issue there? We don't have any problem with other quirks like NO_DMA, and as I said before, by the time we're allocating the io-pgtable in arm_smmu_domain_finalise() we already know everything there is to know about the domain.

Because userspace can map/unmap and start devices to access memory through VFIO.
So an attacker can:
1. alloc memory
2. map
3. unmap
4. free memory
5. repeatedly access the just-freed memory based on the just-unmapped iova;
   this attack may succeed if the freed memory is reused by others while the
   mapping is still in the TLB

But if only the root user can use VFIO, this is an unnecessary worry. Then both
the normal case and VFIO will use the same strict mode, so that the new quirk
IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.

> 
>> Add a new member, domain_non_strict, to struct iommu_dma_cookie; this member
>> will only be initialized when both the domain and the IOMMU driver support
>> non-strict mode.
>>
>> v1:
>> In general, an IOMMU unmap operation follows the steps below:
>> 1. remove the mapping in the page table of the specified iova range
>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>> 3. wait for the above tlbi operation to finish
>> 4. free the IOVA resource
>> 5. free the physical memory resource
>>
>> This can be a problem when unmap is performed very frequently, because the
>> combination of tlbi and wait operations consumes a lot of time. A feasible
>> method is to defer the tlbi and iova-free operations: when a certain number
>> has accumulated or a specified time has elapsed, execute only one tlbi_all
>> command to clean up the TLB, then free the deferred IOVAs. Mark this as
>> non-strict mode.
>>
>> But it must be noted that, although the mapping has already been removed from
>> the page table, it may still exist in the TLB, and the freed physical memory
>> may also be reused by others. So an attacker can keep accessing memory through
>> the just-freed IOVA to obtain sensitive data or corrupt memory. VFIO should
>> therefore always choose the strict mode.
>>
>> Some may consider deferring the physical memory free as well, which would
>> still follow the strict mode. But in the map_sg cases the memory allocation is
>> not controlled by the IOMMU APIs, so that is not enforceable.
>>
>> Fortunately, Intel and AMD have already adopted the non-strict mode and put
>> the queue_iova() operation into the common file dma-iommu.c, and my work is
>> based on that. The difference is that the arm-smmu-v3 driver calls the common
>> IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
>>
>> Below is the performance data of strict vs non-strict for an NVMe device:
>> Random Read  IOPS: 146K (strict) vs 573K (non-strict)
>> Random Write IOPS: 143K (strict) vs 513K (non-strict)
> 
> How does that compare to passthrough performance? One thing I'm not entirely clear about is what the realistic use-case for this is - even if invalidation were infinitely fast, enabling translation still typically has a fair impact on overall system performance in terms of latency, power, memory bandwidth, etc., so I can't help wondering what devices exist today for which performance is critical and robustness* is unimportant, yet have crippled addressing capabilities such that they can't just use passthrough.
I have no passthrough performance data yet; I will ask my team to collect it.
But we have tested global bypass: Random Read IOPS is 744K, and Random Write
IOPS is the same as non-strict.

I'm also not clear on that. But I think that in most cases the system does not
need to run at full capacity, although it should have that ability. For
example, a system's daily load may be only 30-50%, but the load may increase
to 80%+ on a festival day.

Passthrough is not enough to support VFIO, and virtualization needs the latter.

> 
> Robin.
> 
> 
> * I don't want to say "security" here, since I'm actually a lot less concerned about the theoretical malicious endpoint/wild write scenarios than the much more straightforward malfunctioning device and/or buggy driver causing use-after-free style memory corruption. Also, I'm sick of the word "security"...

OK, we really have no need to consider buggy devices.

> 
>>
>> Zhen Lei (6):
>>    iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
>>    iommu/dma: add support for non-strict mode
>>    iommu/amd: use default branch to deal with all non-supported
>>      capabilities
>>    iommu/io-pgtable-arm: add support for non-strict mode
>>    iommu/arm-smmu-v3: add support for non-strict mode
>>    iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
>>
>>   Documentation/admin-guide/kernel-parameters.txt | 12 +++++++
>>   drivers/iommu/amd_iommu.c                       |  4 +--
>>   drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--
>>   drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++
>>   drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------
>>   include/linux/iommu.h                           |  7 +++++
>>   6 files changed, 98 insertions(+), 15 deletions(-)
>>
> 
> .
> 

-- 
Thanks!
BestRegards



* Re: [PATCH v3 2/6] iommu/dma: add support for non-strict mode
From: Leizhen (ThunderTown) @ 2018-07-26  4:15 UTC
  To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm



On 2018/7/25 6:01, Robin Murphy wrote:
> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>> 1. Save the related domain pointer in struct iommu_dma_cookie, so that the
>>    iovad is able to call domain->ops->flush_iotlb_all to flush the TLB.
>> 2. Add a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to indicate
>>    that the iommu domain supports non-strict mode.
>> 3. During the iommu domain initialization phase, call capable() to check
>>    whether it supports non-strict mode. If so, call init_iova_flush_queue
>>    to register the iovad->flush_cb callback.
>> 4. All unmap (including iova-free) APIs will finally invoke __iommu_dma_unmap
>>    -->iommu_dma_free_iova. If the domain is non-strict, call queue_iova to
>>    defer the iova freeing.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/dma-iommu.c | 25 +++++++++++++++++++++++++
>>   include/linux/iommu.h     |  7 +++++++
>>   2 files changed, 32 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index ddcbbdb..9f0c77a 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>>       };
>>       struct list_head        msi_page_list;
>>       spinlock_t            msi_lock;
>> +    struct iommu_domain        *domain_non_strict;
>>   };
>>     static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>> @@ -257,6 +258,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>       return ret;
>>   }
>>   +static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
>> +{
>> +    struct iommu_dma_cookie *cookie;
>> +    struct iommu_domain *domain;
>> +
>> +    cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
>> +    domain = cookie->domain_non_strict;
>> +
>> +    domain->ops->flush_iotlb_all(domain);
>> +}
>> +
>>   /**
>>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>> @@ -272,6 +284,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>   int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>           u64 size, struct device *dev)
>>   {
>> +    const struct iommu_ops *ops = domain->ops;
>>       struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>       struct iova_domain *iovad = &cookie->iovad;
>>       unsigned long order, base_pfn, end_pfn;
>> @@ -308,6 +321,15 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>       }
>>         init_iova_domain(iovad, 1UL << order, base_pfn);
>> +
>> +    if ((ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) &&
>> +        (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_NON_STRICT)) {
>> +        BUG_ON(!ops->flush_iotlb_all);
>> +
>> +        cookie->domain_non_strict = domain;
>> +        init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
>> +    }
>> +
>>       if (!dev)
>>           return 0;
>>   @@ -390,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>>       /* The MSI case is only ever cleaning up its most recent allocation */
>>       if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>>           cookie->msi_iova -= size;
>> +    else if (cookie->domain_non_strict)
>> +        queue_iova(iovad, iova_pfn(iovad, iova),
>> +                size >> iova_shift(iovad), 0);
>>       else
>>           free_iova_fast(iovad, iova_pfn(iovad, iova),
>>                   size >> iova_shift(iovad));
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 19938ee..82ed979 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -86,6 +86,12 @@ struct iommu_domain_geometry {
>>   #define IOMMU_DOMAIN_DMA    (__IOMMU_DOMAIN_PAGING |    \
>>                    __IOMMU_DOMAIN_DMA_API)
>>   +#define IOMMU_STRICT        0
>> +#define IOMMU_NON_STRICT    1
>> +#define IOMMU_STRICT_MODE_MASK    1UL
>> +#define IOMMU_DOMAIN_STRICT_MODE(domain)    \
>> +        (domain->type != IOMMU_DOMAIN_UNMANAGED)
>> +
>>   struct iommu_domain {
>>       unsigned type;
>>       const struct iommu_ops *ops;
>> @@ -101,6 +107,7 @@ enum iommu_cap {
>>                          transactions */
>>       IOMMU_CAP_INTR_REMAP,        /* IOMMU supports interrupt isolation */
>>       IOMMU_CAP_NOEXEC,        /* IOMMU_NOEXEC flag */
>> +    IOMMU_CAP_NON_STRICT,        /* IOMMU supports non-strict mode */
> 
> This still isn't a capability of the hardware. Non-strict mode is pure software policy; *every IOMMU ever* supports it.
> 
> If you want to have this kind of per-driver control, make it a domain attribute - that's not only more logical, but should also work out a lot cleaner overall.

OK, I will use a quirk, so this capability will be removed in the next version.

> 
> Robin.
> 
>>   };
>>     /*
>>
> 
> .
> 

-- 
Thanks!
BestRegards



* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
From: Leizhen (ThunderTown) @ 2018-07-26  7:20 UTC
  To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel



On 2018/7/25 6:25, Robin Murphy wrote:
> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>> To support the non-strict mode, we now only issue tlbi and sync for the
>> strict mode. But for the non-leaf case, always follow strict mode.
>>
>> Use the lowest bit of the iova parameter to pass the strict mode:
>> 0, IOMMU_STRICT;
>> 1, IOMMU_NON_STRICT;
>> Treat 0 as IOMMU_STRICT, so that the unmap operation remains compatible with
>> other IOMMUs which still use strict mode.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
>>   1 file changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 010a254..9234db3 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>>     static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>                      unsigned long iova, size_t size, int lvl,
>> -                   arm_lpae_iopte *ptep);
>> +                   arm_lpae_iopte *ptep, int strict);
>>     static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>                   phys_addr_t paddr, arm_lpae_iopte prot,
>> @@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>           size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>>             tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
>> -        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
>> +        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
>>               return -EINVAL;
>>       }
>>   @@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>>   static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>                          unsigned long iova, size_t size,
>>                          arm_lpae_iopte blk_pte, int lvl,
>> -                       arm_lpae_iopte *ptep)
>> +                       arm_lpae_iopte *ptep, int strict)
> 
> DMA code should never ever be splitting blocks anyway, and frankly the TLB maintenance here is dodgy enough (since we can't reasonably do break-before make as VMSA says we should) that I *really* don't want to introduce any possibility of making it more asynchronous. I'd much rather just hard-code the expectation of strict == true for this.

OK, I will hard-code strict=true for it.

But since it should never ever happen, why not give a warning at the beginning?

> 
> Robin.
> 
>>   {
>>       struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>       arm_lpae_iopte pte, *tablep;
>> @@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>       }
>>         if (unmap_idx < 0)
>> -        return __arm_lpae_unmap(data, iova, size, lvl, tablep);
>> +        return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
>>         io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
>> +    if (!strict)
>> +        io_pgtable_tlb_sync(&data->iop);
>> +
>>       return size;
>>   }
>>     static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>                      unsigned long iova, size_t size, int lvl,
>> -                   arm_lpae_iopte *ptep)
>> +                   arm_lpae_iopte *ptep, int strict)
>>   {
>>       arm_lpae_iopte pte;
>>       struct io_pgtable *iop = &data->iop;
>> @@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>               io_pgtable_tlb_sync(iop);
>>               ptep = iopte_deref(pte, data);
>>               __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>> -        } else {
>> +        } else if (strict) {
>>               io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>>           }
>>   @@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>            * minus the part we want to unmap
>>            */
>>           return arm_lpae_split_blk_unmap(data, iova, size, pte,
>> -                        lvl + 1, ptep);
>> +                        lvl + 1, ptep, strict);
>>       }
>>         /* Keep on walkin' */
>>       ptep = iopte_deref(pte, data);
>> -    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
>> +    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
>>   }
>>     static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>>                    size_t size)
>>   {
>> +    int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
>>       struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>       arm_lpae_iopte *ptep = data->pgd;
>>       int lvl = ARM_LPAE_START_LVL(data);
>>   +    iova &= ~IOMMU_STRICT_MODE_MASK;
>>       if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>>           return 0;
>>   -    return __arm_lpae_unmap(data, iova, size, lvl, ptep);
>> +    return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
>>   }
>>     static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>
> 
> .
> 

-- 
Thanks!
Best regards


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/6] iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
  2018-07-24 22:46   ` Robin Murphy
@ 2018-07-26  7:41     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 27+ messages in thread
From: Leizhen (ThunderTown) @ 2018-07-26  7:41 UTC (permalink / raw)
  To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm



On 2018/7/25 6:46, Robin Murphy wrote:
> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>> Because the non-strict mode introduces a vulnerability window, so add a
>> bootup option to make the manager can choose which mode to be used. The
>> default mode is IOMMU_STRICT.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++
>>   drivers/iommu/arm-smmu-v3.c                     | 32 ++++++++++++++++++++++---
>>   2 files changed, 41 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index efc7aa7..0cc80bc 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1720,6 +1720,18 @@
>>           nobypass    [PPC/POWERNV]
>>               Disable IOMMU bypass, using IOMMU for PCI devices.
>>   +    iommu_strict_mode=    [arm-smmu-v3]
> 
> If anything this should belong to iommu-dma, as that's where the actual decision of whether to use a flush queue or not happens. Also it would be nice to stick to the iommu.* option namespace in the hope of maintaining some consistency.

How about arm_iommu? Like "s390_iommu" and "intel_iommu". After all, it only affects the SMMU now.

> 
>> +        0 - strict mode
>> +            Make sure all related TLBs are invalidated before the
>> +            memory is released.
>> +        1 - non-strict mode
>> +            Put off TLB invalidation and release the memory first. This mode
>> +            introduces a vulnerability window: an untrusted device can
>> +            access the reused memory because the TLBs may still be valid.
>> +            Please take full consideration before choosing this mode.
>> +            Note that VFIO always uses strict mode.
>> +        others - strict mode
>> +
>>       iommu.passthrough=
>>               [ARM64] Configure DMA to bypass the IOMMU by default.
>>               Format: { "0" | "1" }
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 4a198a0..9b72fc4 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -631,6 +631,24 @@ struct arm_smmu_option_prop {
>>       { 0, NULL},
>>   };
>>   +static u32 iommu_strict_mode __read_mostly = IOMMU_STRICT;
>> +
>> +static int __init setup_iommu_strict_mode(char *str)
>> +{
>> +    u32 strict_mode = IOMMU_STRICT;
>> +
>> +    get_option(&str, &strict_mode);
>> +    if (strict_mode == IOMMU_NON_STRICT) {
>> +        iommu_strict_mode = strict_mode;
>> +        pr_warn("WARNING: iommu non-strict mode is chosen.\n"
>> +            "It's good for scatter-gather performance but lacks full isolation\n");
>> +        add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>> +    }
>> +
>> +    return 0;
>> +}
>> +early_param("iommu_strict_mode", setup_iommu_strict_mode);
>> +
>>   static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
>>                            struct arm_smmu_device *smmu)
>>   {
>> @@ -1441,7 +1459,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
>>       case IOMMU_CAP_NOEXEC:
>>           return true;
>>       case IOMMU_CAP_NON_STRICT:
>> -        return true;
>> +        return (iommu_strict_mode == IOMMU_NON_STRICT) ? true : false;
> 
> Ugh. The "completely redundant ternary" idiom hurts my soul :(

OK, I will modify it.
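i.e. change it to simply:

	case IOMMU_CAP_NON_STRICT:
		return iommu_strict_mode == IOMMU_NON_STRICT;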

> 
> Robin.
> 
>>       default:
>>           return false;
>>       }
>> @@ -1750,6 +1768,14 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>>       return ret;
>>   }
>>   +static u32 arm_smmu_strict_mode(struct iommu_domain *domain)
>> +{
>> +    if (iommu_strict_mode == IOMMU_NON_STRICT)
>> +        return IOMMU_DOMAIN_STRICT_MODE(domain);
>> +
>> +    return IOMMU_STRICT;
>> +}
>> +
>>   static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>               phys_addr_t paddr, size_t size, int prot)
>>   {
>> @@ -1769,7 +1795,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>       if (!ops)
>>           return 0;
>>   -    return ops->unmap(ops, iova | IOMMU_DOMAIN_STRICT_MODE(domain), size);
>> +    return ops->unmap(ops, iova | arm_smmu_strict_mode(domain), size);
>>   }
>>     static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>> @@ -1784,7 +1810,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>>   {
>>       struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
>>   -    if (smmu && (IOMMU_DOMAIN_STRICT_MODE(domain) == IOMMU_STRICT))
>> +    if (smmu && (arm_smmu_strict_mode(domain) == IOMMU_STRICT))
>>           __arm_smmu_tlb_sync(smmu);
>>   }
>>  
> 
> .
> 

-- 
Thanks!
Best regards


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
  2018-07-26  3:44   ` Leizhen (ThunderTown)
@ 2018-07-26 14:16     ` Robin Murphy
  2018-07-27  2:49       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2018-07-26 14:16 UTC (permalink / raw)
  To: Leizhen (ThunderTown),
	Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm

On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
> 
> 
> On 2018/7/25 5:51, Robin Murphy wrote:
>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>> v2 -> v3: Add a bootup option "iommu_strict_mode" to make the
>>> manager can choose which mode to be used. The first 5 patches
>>> have not changed. +    iommu_strict_mode=    [arm-smmu-v3] +
>>> 0 - strict mode (default) +        1 - non-strict mode
>>> 
>>> v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova
>>> parameter to pass the strict mode: 0, IOMMU_STRICT; 1,
>>> IOMMU_NON_STRICT; Treat 0 as IOMMU_STRICT, so that the unmap
>>> operation can compatible with other IOMMUs which still use strict
>>> mode. In other words, this patch series will not impact other
>>> IOMMU drivers. I tried add a new quirk
>>> IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it can
>>> not pass the strict mode of the domain from SMMUv3 driver to
>>> io-pgtable module.
>> 
>> What exactly is the issue there? We don't have any problem with
>> other quirks like NO_DMA, and as I said before, by the time we're
>> allocating the io-pgtable in arm_smmu_domain_finalise() we already
>> know everything there is to know about the domain.
> 
> Because userspace can map/unmap and start devices to access memory
> through VFIO. So the attacker can: 1. alloc memory 2. map 3.
> unmap 4. free memory 5. repeatedly access the just-freed memory
> based on the just-unmapped iova; this attack may succeed if the freed
> memory is reused by others and the mapping is still in the TLB

Right, but that's why we don't set non-strict mode on unmanaged domains; 
what I was asking about was specifically why "it can not pass the strict 
mode of the domain from SMMUv3 driver to io-pgtable module", because we 
don't get anywhere near io-pgtable until we already know whether the 
domain in question can allow lazy unmaps or not.
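For example (a sketch only, assuming a hypothetical global "iommu_strict"
toggle and the NON_STRICT io-pgtable quirk discussed elsewhere in the
thread), arm_smmu_domain_finalise() could just do:

	if (domain->type == IOMMU_DOMAIN_DMA && !iommu_strict)
		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

and unmanaged domains would never see the quirk at all.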

> But if only root user can use VFIO, this is an unnecessary worry.
> Then both normal and VFIO will use the same strict mode, so that the
> new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.
> 
>> 
>>> Add a new member domain_non_strict in struct iommu_dma_cookie,
>>> this member will only be initialized when the related domain and
>>> IOMMU driver support non-strict mode.
>>> 
>>> v1: In common, a IOMMU unmap operation follow the below steps: 1.
>>> remove the mapping in page table of the specified iova range 2.
>>> execute tlbi command to invalid the mapping which is cached in
>>> TLB 3. wait for the above tlbi operation to be finished 4. free
>>> the IOVA resource 5. free the physical memory resource
>>> 
>>> This maybe a problem when unmap is very frequently, the
>>> combination of tlbi and wait operation will consume a lot of
>>> time. A feasible method is put off tlbi and iova-free operation,
>>> when accumulating to a certain number or reaching a specified
>>> time, execute only one tlbi_all command to clean up TLB, then
>>> free the backup IOVAs. Mark as non-strict mode.
>>> 
>>> But it must be noted that, although the mapping has already been
>>> removed in the page table, it maybe still exist in TLB. And the
>>> freed physical memory may also be reused for others. So a
>>> attacker can persistent access to memory based on the just freed
>>> IOVA, to obtain sensible data or corrupt memory. So the VFIO
>>> should always choose the strict mode.
>>> 
>>> Some may consider put off physical memory free also, that will
>>> still follow strict mode. But for the map_sg cases, the memory
>>> allocation is not controlled by IOMMU APIs, so it is not
>>> enforceable.
>>> 
>>> Fortunately, Intel and AMD have already applied the non-strict
>>> mode, and put queue_iova() operation into the common file
>>> dma-iommu.c., and my work is based on it. The difference is that
>>> arm-smmu-v3 driver will call IOMMU common APIs to unmap, but
>>> Intel and AMD IOMMU drivers are not.
>>> 
>>> Below is the performance data of strict vs non-strict for NVMe
>>> device: Randomly Read  IOPS: 146K(strict) vs 573K(non-strict) 
>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>> 
>> How does that compare to passthrough performance? One thing I'm not
>> entirely clear about is what the realistic use-case for this is -
>> even if invalidation were infinitely fast, enabling translation
>> still typically has a fair impact on overall system performance in
>> terms of latency, power, memory bandwidth, etc., so I can't help
>> wonder what devices exist today for which performance is critical
>> and robustness* is unimportant, yet have crippled addressing
>> capabilities such that they can't just use passthrough.
> I have no passthrough performance data yet, I will ask my team to do
> it. But we have tested the Global bypass: Randomly Read IOPS: 744K,
> and Randomly Write IOPS is the same as non-strict.
> 
> I'm also not clear. But I think in most cases, the system does not
> need to run at full capacity, but the system should have that
> ability. For example, a system's daily load may be only 30-50%, but the
> load may increase to 80%+ on festival days.
> 
> Passthrough is not enough to support VFIO, and virtualization needs
> the latter.

Huh? The whole point of passthrough mode is that the IOMMU can still be 
used for VFIO, but without imposing the overhead of dynamic mapping on 
host DMA.

Robin.

>> * I don't want to say "security" here, since I'm actually a lot
>> less concerned about the theoretical malicious endpoint/wild write
>> scenarios than the much more straightforward malfunctioning
>> device and/or buggy driver causing use-after-free style memory
>> corruption. Also, I'm sick of the word "security"...
> 
> OK, we really have no need to consider buggy devices.
> 
>> 
>>> 
>>> Zhen Lei (6): iommu/arm-smmu-v3: fix the implementation of
>>> flush_iotlb_all hook iommu/dma: add support for non-strict mode 
>>> iommu/amd: use default branch to deal with all non-supported 
>>> capabilities iommu/io-pgtable-arm: add support for non-strict
>>> mode iommu/arm-smmu-v3: add support for non-strict mode 
>>> iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
>>> 
>>> Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ 
>>> drivers/iommu/amd_iommu.c                       |  4 +-- 
>>> drivers/iommu/arm-smmu-v3.c                     | 42
>>> +++++++++++++++++++++++-- drivers/iommu/dma-iommu.c
>>> | 25 +++++++++++++++ drivers/iommu/io-pgtable-arm.c
>>> | 23 ++++++++------ include/linux/iommu.h
>>> |  7 +++++ 6 files changed, 98 insertions(+), 15 deletions(-)
>>> 
>> 
>> .
>> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-07-26  7:20     ` Leizhen (ThunderTown)
@ 2018-07-26 14:35       ` Robin Murphy
  2018-08-06  1:32         ` Yang, Shunyong
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2018-07-26 14:35 UTC (permalink / raw)
  To: Leizhen (ThunderTown),
	Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 2018-07-26 8:20 AM, Leizhen (ThunderTown) wrote:
> On 2018/7/25 6:25, Robin Murphy wrote:
>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>> To support the non-strict mode, now we only tlbi and sync for the strict
>>> mode. But for the non-leaf case, always follow strict mode.
>>>
>>> Use the lowest bit of the iova parameter to pass the strict mode:
>>> 0, IOMMU_STRICT;
>>> 1, IOMMU_NON_STRICT;
>>> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with
>>> other IOMMUs which still use strict mode.
>>>
>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>> ---
>>>    drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
>>>    1 file changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>> index 010a254..9234db3 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>                       unsigned long iova, size_t size, int lvl,
>>> -                   arm_lpae_iopte *ptep);
>>> +                   arm_lpae_iopte *ptep, int strict);
>>>      static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>                    phys_addr_t paddr, arm_lpae_iopte prot,
>>> @@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>            size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>>>              tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
>>> -        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
>>> +        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
>>>                return -EINVAL;
>>>        }
>>>    @@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>>>    static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>                           unsigned long iova, size_t size,
>>>                           arm_lpae_iopte blk_pte, int lvl,
>>> -                       arm_lpae_iopte *ptep)
>>> +                       arm_lpae_iopte *ptep, int strict)
>>
>> DMA code should never ever be splitting blocks anyway, and frankly the TLB maintenance here is dodgy enough (since we can't reasonably do break-before make as VMSA says we should) that I *really* don't want to introduce any possibility of making it more asynchronous. I'd much rather just hard-code the expectation of strict == true for this.
> 
> OK, I will hard-code strict=true for it.
> 
> But since it never ever be happened, why did not give a warning at the beginning?

Because DMA code is not the only caller of iommu_map/unmap. It's 
perfectly legal in the IOMMU API to partially unmap a previous mapping 
such that a block entry needs to be split. The DMA API, however, is a 
lot more constrained, and thus by construction the iommu-dma layer will 
never generate a block-splitting iommu_unmap() except as a result of 
illegal DMA API usage, and we obviously do not need to optimise for that 
(you will get a warning about mismatched unmaps under dma-debug, but 
it's a bit too expensive to police in the general case).
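For illustration, a correct DMA API user always unmaps exactly what it
mapped:

	dma_addr_t dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
	...
	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);	/* same iova, same size */

so iommu-dma only ever tears down whole prior mappings and never has to
split a block.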

Robin.

>>>    {
>>>        struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>>        arm_lpae_iopte pte, *tablep;
>>> @@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>        }
>>>          if (unmap_idx < 0)
>>> -        return __arm_lpae_unmap(data, iova, size, lvl, tablep);
>>> +        return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
>>>          io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
>>> +    if (!strict)
>>> +        io_pgtable_tlb_sync(&data->iop);
>>> +
>>>        return size;
>>>    }
>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>                       unsigned long iova, size_t size, int lvl,
>>> -                   arm_lpae_iopte *ptep)
>>> +                   arm_lpae_iopte *ptep, int strict)
>>>    {
>>>        arm_lpae_iopte pte;
>>>        struct io_pgtable *iop = &data->iop;
>>> @@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>                io_pgtable_tlb_sync(iop);
>>>                ptep = iopte_deref(pte, data);
>>>                __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>>> -        } else {
>>> +        } else if (strict) {
>>>                io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>>>            }
>>>    @@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>             * minus the part we want to unmap
>>>             */
>>>            return arm_lpae_split_blk_unmap(data, iova, size, pte,
>>> -                        lvl + 1, ptep);
>>> +                        lvl + 1, ptep, strict);
>>>        }
>>>          /* Keep on walkin' */
>>>        ptep = iopte_deref(pte, data);
>>> -    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
>>> +    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
>>>    }
>>>      static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>>>                     size_t size)
>>>    {
>>> +    int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
>>>        struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>>        arm_lpae_iopte *ptep = data->pgd;
>>>        int lvl = ARM_LPAE_START_LVL(data);
>>>    +    iova &= ~IOMMU_STRICT_MODE_MASK;
>>>        if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>>>            return 0;
>>>    -    return __arm_lpae_unmap(data, iova, size, lvl, ptep);
>>> +    return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
>>>    }
>>>      static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>>
>>
>> .
>>
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
  2018-07-26 14:16     ` Robin Murphy
@ 2018-07-27  2:49       ` Leizhen (ThunderTown)
  2018-07-27  9:37         ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Leizhen (ThunderTown) @ 2018-07-27  2:49 UTC (permalink / raw)
  To: Robin Murphy, Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm



On 2018/7/26 22:16, Robin Murphy wrote:
> On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2018/7/25 5:51, Robin Murphy wrote:
>>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>>> v2 -> v3: Add a bootup option "iommu_strict_mode" to make the
>>>> manager can choose which mode to be used. The first 5 patches
>>>> have not changed. +    iommu_strict_mode=    [arm-smmu-v3] +
>>>> 0 - strict mode (default) +        1 - non-strict mode
>>>>
>>>> v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova
>>>> parameter to pass the strict mode: 0, IOMMU_STRICT; 1,
>>>> IOMMU_NON_STRICT; Treat 0 as IOMMU_STRICT, so that the unmap
>>>> operation can compatible with other IOMMUs which still use strict
>>>> mode. In other words, this patch series will not impact other
>>>> IOMMU drivers. I tried add a new quirk
>>>> IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it can
>>>> not pass the strict mode of the domain from SMMUv3 driver to
>>>> io-pgtable module.
>>>
>>> What exactly is the issue there? We don't have any problem with
>>> other quirks like NO_DMA, and as I said before, by the time we're
>>> allocating the io-pgtable in arm_smmu_domain_finalise() we already
>>> know everything there is to know about the domain.
>>
>> Because userspace can map/unmap and start devices to access memory
>> through VFIO. So the attacker can: 1. alloc memory 2. map 3.
>> unmap 4. free memory 5. repeatedly access the just-freed memory
>> based on the just-unmapped iova; this attack may succeed if the freed
>> memory is reused by others and the mapping is still in the TLB
> 
> Right, but that's why we don't set non-strict mode on unmanaged domains; what I was asking about was specifically why "it can not pass the strict mode of the domain from SMMUv3 driver to io-pgtable module", because we don't get anywhere near io-pgtable until we already know whether the domain in question can allow lazy unmaps or not.

Sorry, it's my fault. I had all along mistakenly thought that "data->iop.cfg" is shared by all domains, until you pointed it out again.

> 
>> But if only root user can use VFIO, this is an unnecessary worry.
>> Then both normal and VFIO will use the same strict mode, so that the
>> new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.
>>
>>>
>>>> Add a new member domain_non_strict in struct iommu_dma_cookie,
>>>> this member will only be initialized when the related domain and
>>>> IOMMU driver support non-strict mode.
>>>>
>>>> v1: In common, a IOMMU unmap operation follow the below steps: 1.
>>>> remove the mapping in page table of the specified iova range 2.
>>>> execute tlbi command to invalid the mapping which is cached in
>>>> TLB 3. wait for the above tlbi operation to be finished 4. free
>>>> the IOVA resource 5. free the physical memory resource
>>>>
>>>> This maybe a problem when unmap is very frequently, the
>>>> combination of tlbi and wait operation will consume a lot of
>>>> time. A feasible method is put off tlbi and iova-free operation,
>>>> when accumulating to a certain number or reaching a specified
>>>> time, execute only one tlbi_all command to clean up TLB, then
>>>> free the backup IOVAs. Mark as non-strict mode.
>>>>
>>>> But it must be noted that, although the mapping has already been
>>>> removed in the page table, it maybe still exist in TLB. And the
>>>> freed physical memory may also be reused for others. So a
>>>> attacker can persistent access to memory based on the just freed
>>>> IOVA, to obtain sensible data or corrupt memory. So the VFIO
>>>> should always choose the strict mode.
>>>>
>>>> Some may consider put off physical memory free also, that will
>>>> still follow strict mode. But for the map_sg cases, the memory
>>>> allocation is not controlled by IOMMU APIs, so it is not
>>>> enforceable.
>>>>
>>>> Fortunately, Intel and AMD have already applied the non-strict
>>>> mode, and put queue_iova() operation into the common file
>>>> dma-iommu.c., and my work is based on it. The difference is that
>>>> arm-smmu-v3 driver will call IOMMU common APIs to unmap, but
>>>> Intel and AMD IOMMU drivers are not.
>>>>
>>>> Below is the performance data of strict vs non-strict for NVMe
>>>> device: Randomly Read  IOPS: 146K(strict) vs 573K(non-strict) Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>>>
>>> How does that compare to passthrough performance? One thing I'm not
>>> entirely clear about is what the realistic use-case for this is -
>>> even if invalidation were infinitely fast, enabling translation
>>> still typically has a fair impact on overall system performance in
>>> terms of latency, power, memory bandwidth, etc., so I can't help
>>> wonder what devices exist today for which performance is critical
>>> and robustness* is unimportant, yet have crippled addressing
>>> capabilities such that they can't just use passthrough.
>> I have no passthrough performance data yet, I will ask my team to do
>> it. But we have tested the Global bypass: Randomly Read IOPS: 744K,
>> and Randomly Write IOPS is the same as non-strict.
>>
>> I'm also not clear. But I think in most cases, the system does not
>> need to run at full capacity, but the system should have that
>> ability. For example, a system's daily load may be only 30-50%, but the
>> load may increase to 80%+ on festival days.
>>
>> Passthrough is not enough to support VFIO, and virtualization needs
>> the latter.
> 
> Huh? The whole point of passthrough mode is that the IOMMU can still be used for VFIO, but without imposing the overhead of dynamic mapping on host DMA.

I said that from my experience. Userspace does not know the PA, so I think the user cannot fill dma_map.iova correctly.

	/* Allocate some space and setup a DMA mapping */
	dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	dma_map.size = 0x1000;
	dma_map.iova = 0x2f00000000UL;						/* user specified */
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

Furthermore, dma_map is also suitable for the iommu_map_sg usage scenario.

> 
> Robin.
> 
>>> * I don't want to say "security" here, since I'm actually a lot
>>> less concerned about the theoretical malicious endpoint/wild write
>>> scenarios than the much more straightforward malfunctioning
>>> device and/or buggy driver causing use-after-free style memory
>>> corruption. Also, I'm sick of the word "security"...
>>
>> OK, we really have no need to consider buggy devices.
>>
>>>
>>>>
>>>> Zhen Lei (6): iommu/arm-smmu-v3: fix the implementation of
>>>> flush_iotlb_all hook iommu/dma: add support for non-strict mode iommu/amd: use default branch to deal with all non-supported capabilities iommu/io-pgtable-arm: add support for non-strict
>>>> mode iommu/arm-smmu-v3: add support for non-strict mode iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"
>>>>
>>>> Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ drivers/iommu/amd_iommu.c                       |  4 +-- drivers/iommu/arm-smmu-v3.c                     | 42
>>>> +++++++++++++++++++++++-- drivers/iommu/dma-iommu.c
>>>> | 25 +++++++++++++++ drivers/iommu/io-pgtable-arm.c
>>>> | 23 ++++++++------ include/linux/iommu.h
>>>> |  7 +++++ 6 files changed, 98 insertions(+), 15 deletions(-)
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 

-- 
Thanks!
Best regards


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3
  2018-07-27  2:49       ` Leizhen (ThunderTown)
@ 2018-07-27  9:37         ` Will Deacon
  0 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2018-07-27  9:37 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Robin Murphy, Jean-Philippe Brucker, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel, LinuxArm

On Fri, Jul 27, 2018 at 10:49:59AM +0800, Leizhen (ThunderTown) wrote:
> On 2018/7/26 22:16, Robin Murphy wrote:
> > On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
> >> Passthrough is not enough to support VFIO, and virtualization need
> >> the later.
> > 
> > Huh? The whole point of passthrough mode is that the IOMMU can still be
> > used for VFIO, but without imposing the overhead of dynamic mapping on
> > host DMA.
> 
> I said that from my experience. Userspace do not known the PA, so I think
> the user can not fill dma_map.iova correctly.
> 
> 	/* Allocate some space and setup a DMA mapping */
> 	dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,
> 			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> 	dma_map.size = 0x1000;
> 	dma_map.iova = 0x2f00000000UL;						/* user specified */
> 	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> 
> 	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

Hmm, I'm pretty sure that's not the case. When passthrough is configured
via iommu.passthrough, it only applies to the default domain and therefore
won't affect the unmanaged domain that VFIO manages explicitly. So VFIO
will continue to use translation, and userspace can pass whatever it likes
for the iova.
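Roughly, paraphrasing the relevant logic (names simplified):

	/* iommu.passthrough only selects the type of the *default* domain */
	if (passthrough)
		iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;

	/* VFIO allocates its own domain, which is always translated */
	domain = iommu_domain_alloc(bus);	/* IOMMU_DOMAIN_UNMANAGED */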

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-07-26 14:35       ` Robin Murphy
@ 2018-08-06  1:32         ` Yang, Shunyong
  2018-08-14  8:33           ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 27+ messages in thread
From: Yang, Shunyong @ 2018-08-06  1:32 UTC (permalink / raw)
  To: Robin Murphy, Leizhen (ThunderTown),
	Jean-Philippe Brucker, Will Deacon, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

Hi, Robin,

On 2018/7/26 22:37, Robin Murphy wrote:
> On 2018-07-26 8:20 AM, Leizhen (ThunderTown) wrote:
>> On 2018/7/25 6:25, Robin Murphy wrote:
>>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>>> To support the non-strict mode, now we only tlbi and sync for the strict
>>>> mode. But for the non-leaf case, always follow strict mode.
>>>>
>>>> Use the lowest bit of the iova parameter to pass the strict mode:
>>>> 0, IOMMU_STRICT;
>>>> 1, IOMMU_NON_STRICT;
>>>> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with
>>>> other IOMMUs which still use strict mode.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>    drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
>>>>    1 file changed, 14 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>> index 010a254..9234db3 100644
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>                       unsigned long iova, size_t size, int lvl,
>>>> -                   arm_lpae_iopte *ptep);
>>>> +                   arm_lpae_iopte *ptep, int strict);
>>>>      static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>>                    phys_addr_t paddr, arm_lpae_iopte prot,
>>>> @@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>>            size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>>>>              tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
>>>> -        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
>>>> +        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
>>>>                return -EINVAL;
>>>>        }
>>>>    @@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>>>>    static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>>                           unsigned long iova, size_t size,
>>>>                           arm_lpae_iopte blk_pte, int lvl,
>>>> -                       arm_lpae_iopte *ptep)
>>>> +                       arm_lpae_iopte *ptep, int strict)
>>>
>>> DMA code should never ever be splitting blocks anyway, and frankly the TLB maintenance here is dodgy enough (since we can't reasonably do break-before make as VMSA says we should) that I *really* don't want to introduce any possibility of making it more asynchronous. I'd much rather just hard-code the expectation of strict == true for this.
>>
>> OK, I will hard-code strict=true for it.
>>
>> But since it never ever be happened, why did not give a warning at the beginning?
> 
> Because DMA code is not the only caller of iommu_map/unmap. It's 
> perfectly legal in the IOMMU API to partially unmap a previous mapping 
> such that a block entry needs to be split. The DMA API, however, is a 
> lot more constrined, and thus by construction the iommu-dma layer will 
> never generate a block-splitting iommu_unmap() except as a result of 
> illegal DMA API usage, and we obviously do not need to optimise for that 
> (you will get a warning about mismatched unmaps under dma-debug, but 
> it's a bit too expensive to police in the general case).
> 

When I was reading the code around arm_lpae_split_blk_unmap(), I was
curious about the scenario in which a block will be split. Now with your
comment "Because DMA code is not the only caller of iommu_map/unmap", it
seems to depend on the user.

Would you please explain this further? I mean, besides DMA, which user
will use iommu_map/unmap, and how does it split a block?

Thanks.
Shunyong.

> 
>>>>    {
>>>>        struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>>>        arm_lpae_iopte pte, *tablep;
>>>> @@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>>        }
>>>>          if (unmap_idx < 0)
>>>> -        return __arm_lpae_unmap(data, iova, size, lvl, tablep);
>>>> +        return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
>>>>          io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
>>>> +    if (!strict)
>>>> +        io_pgtable_tlb_sync(&data->iop);
>>>> +
>>>>        return size;
>>>>    }
>>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>                       unsigned long iova, size_t size, int lvl,
>>>> -                   arm_lpae_iopte *ptep)
>>>> +                   arm_lpae_iopte *ptep, int strict)
>>>>    {
>>>>        arm_lpae_iopte pte;
>>>>        struct io_pgtable *iop = &data->iop;
>>>> @@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>                io_pgtable_tlb_sync(iop);
>>>>                ptep = iopte_deref(pte, data);
>>>>                __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>>>> -        } else {
>>>> +        } else if (strict) {
>>>>                io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>>>>            }
>>>>    @@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>             * minus the part we want to unmap
>>>>             */
>>>>            return arm_lpae_split_blk_unmap(data, iova, size, pte,
>>>> -                        lvl + 1, ptep);
>>>> +                        lvl + 1, ptep, strict);
>>>>        }
>>>>          /* Keep on walkin' */
>>>>        ptep = iopte_deref(pte, data);
>>>> -    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
>>>> +    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
>>>>    }
>>>>      static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>>>>                     size_t size)
>>>>    {
>>>> +    int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
>>>>        struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>>>        arm_lpae_iopte *ptep = data->pgd;
>>>>        int lvl = ARM_LPAE_START_LVL(data);
>>>>    +    iova &= ~IOMMU_STRICT_MODE_MASK;
>>>>        if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>>>>            return 0;
>>>>    -    return __arm_lpae_unmap(data, iova, size, lvl, ptep);
>>>> +    return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
>>>>    }
>>>>      static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>>>
>>>
>>> .
>>>
>>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-06  1:32         ` Yang, Shunyong
@ 2018-08-14  8:33           ` Leizhen (ThunderTown)
  2018-08-14  8:35             ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Leizhen (ThunderTown) @ 2018-08-14  8:33 UTC (permalink / raw)
  To: Yang, Shunyong, Robin Murphy, Jean-Philippe Brucker, Will Deacon,
	Joerg Roedel, linux-arm-kernel, iommu, linux-kernel



On 2018/8/6 9:32, Yang, Shunyong wrote:
> Hi, Robin,
> 
> On 2018/7/26 22:37, Robin Murphy wrote:
>> On 2018-07-26 8:20 AM, Leizhen (ThunderTown) wrote:
>>> On 2018/7/25 6:25, Robin Murphy wrote:
>>>> On 2018-07-12 7:18 AM, Zhen Lei wrote:
>>>>> To support the non-strict mode, now we only tlbi and sync for the strict
>>>>> mode. But for the non-leaf case, always follow strict mode.
>>>>>
>>>>> Use the lowest bit of the iova parameter to pass the strict mode:
>>>>> 0, IOMMU_STRICT;
>>>>> 1, IOMMU_NON_STRICT;
>>>>> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with
>>>>> other IOMMUs which still use strict mode.
>>>>>
>>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>>> ---
>>>>>    drivers/iommu/io-pgtable-arm.c | 23 ++++++++++++++---------
>>>>>    1 file changed, 14 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>>> index 010a254..9234db3 100644
>>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>>> @@ -292,7 +292,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>>>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>>                       unsigned long iova, size_t size, int lvl,
>>>>> -                   arm_lpae_iopte *ptep);
>>>>> +                   arm_lpae_iopte *ptep, int strict);
>>>>>      static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>>>                    phys_addr_t paddr, arm_lpae_iopte prot,
>>>>> @@ -334,7 +334,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>>>            size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>>>>>              tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
>>>>> -        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
>>>>> +        if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
>>>>>                return -EINVAL;
>>>>>        }
>>>>>    @@ -531,7 +531,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>>>>>    static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>>>                           unsigned long iova, size_t size,
>>>>>                           arm_lpae_iopte blk_pte, int lvl,
>>>>> -                       arm_lpae_iopte *ptep)
>>>>> +                       arm_lpae_iopte *ptep, int strict)
>>>>
>>>> DMA code should never ever be splitting blocks anyway, and frankly the TLB maintenance here is dodgy enough (since we can't reasonably do break-before make as VMSA says we should) that I *really* don't want to introduce any possibility of making it more asynchronous. I'd much rather just hard-code the expectation of strict == true for this.
>>>
>>> OK, I will hard-code strict=true for it.
>>>
>>> But since it never ever be happened, why did not give a warning at the beginning?
>>
>> Because DMA code is not the only caller of iommu_map/unmap. It's 
>> perfectly legal in the IOMMU API to partially unmap a previous mapping 
>> such that a block entry needs to be split. The DMA API, however, is a 
>> lot more constrined, and thus by construction the iommu-dma layer will 
>> never generate a block-splitting iommu_unmap() except as a result of 
>> illegal DMA API usage, and we obviously do not need to optimise for that 
>> (you will get a warning about mismatched unmaps under dma-debug, but 
>> it's a bit too expensive to police in the general case).
>>
> 
> When I was reading the code around arm_lpae_split_blk_unmap(), I was
> curious in which scenario a block will be split. Now with your comments
> "Because DMA code is not the only caller of iommu_map/unmap", it seems
> depending on the user.
> 
> Would you please explain this further? I mean besides DMA, which user
> will use iommu_map/umap and how it split a block.

I also think that the arm_lpae_split_blk_unmap() scenario does not exist; maybe
we should remove it and give a warning for this wrong usage.
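Something like this (only a sketch):

	/* in __arm_lpae_unmap(), instead of calling arm_lpae_split_blk_unmap() */
	WARN_ONCE(1, "partial unmap of a block entry is not supported\n");
	return 0;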

> 
> Thanks.
> Shunyong.
> 
>>
>>>>>    {
>>>>>        struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>>>>        arm_lpae_iopte pte, *tablep;
>>>>> @@ -576,15 +576,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>>>>        }
>>>>>          if (unmap_idx < 0)
>>>>> -        return __arm_lpae_unmap(data, iova, size, lvl, tablep);
>>>>> +        return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
>>>>>          io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
>>>>> +    if (!strict)
>>>>> +        io_pgtable_tlb_sync(&data->iop);
>>>>> +
>>>>>        return size;
>>>>>    }
>>>>>      static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>>                       unsigned long iova, size_t size, int lvl,
>>>>> -                   arm_lpae_iopte *ptep)
>>>>> +                   arm_lpae_iopte *ptep, int strict)
>>>>>    {
>>>>>        arm_lpae_iopte pte;
>>>>>        struct io_pgtable *iop = &data->iop;
>>>>> @@ -609,7 +612,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>>                io_pgtable_tlb_sync(iop);
>>>>>                ptep = iopte_deref(pte, data);
>>>>>                __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>>>>> -        } else {
>>>>> +        } else if (strict) {
>>>>>                io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>>>>>            }
>>>>>    @@ -620,25 +623,27 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>>>>             * minus the part we want to unmap
>>>>>             */
>>>>>            return arm_lpae_split_blk_unmap(data, iova, size, pte,
>>>>> -                        lvl + 1, ptep);
>>>>> +                        lvl + 1, ptep, strict);
>>>>>        }
>>>>>          /* Keep on walkin' */
>>>>>        ptep = iopte_deref(pte, data);
>>>>> -    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
>>>>> +    return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
>>>>>    }
>>>>>      static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>>>>>                     size_t size)
>>>>>    {
>>>>> +    int strict = ((iova & IOMMU_STRICT_MODE_MASK) == IOMMU_STRICT);
>>>>>        struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>>>>        arm_lpae_iopte *ptep = data->pgd;
>>>>>        int lvl = ARM_LPAE_START_LVL(data);
>>>>>    +    iova &= ~IOMMU_STRICT_MODE_MASK;
>>>>>        if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>>>>>            return 0;
>>>>>    -    return __arm_lpae_unmap(data, iova, size, lvl, ptep);
>>>>> +    return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
>>>>>    }
>>>>>      static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>>>>
>>>>
>>>> .
>>>>
>>>
>> _______________________________________________
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>
> 
> 
> .
> 

-- 
Thanks!
Best regards


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-14  8:33           ` Leizhen (ThunderTown)
@ 2018-08-14  8:35             ` Will Deacon
  2018-08-14 10:02               ` Robin Murphy
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2018-08-14  8:35 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Yang, Shunyong, Robin Murphy, Jean-Philippe Brucker,
	Joerg Roedel, linux-arm-kernel, iommu, linux-kernel

On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown) wrote:
> On 2018/8/6 9:32, Yang, Shunyong wrote:
> > On 2018/7/26 22:37, Robin Murphy wrote:
> >> Because DMA code is not the only caller of iommu_map/unmap. It's 
> >> perfectly legal in the IOMMU API to partially unmap a previous mapping 
> >> such that a block entry needs to be split. The DMA API, however, is a 
> >> lot more constrined, and thus by construction the iommu-dma layer will 
> >> never generate a block-splitting iommu_unmap() except as a result of 
> >> illegal DMA API usage, and we obviously do not need to optimise for that 
> >> (you will get a warning about mismatched unmaps under dma-debug, but 
> >> it's a bit too expensive to police in the general case).
> >>
> > 
> > When I was reading the code around arm_lpae_split_blk_unmap(), I was
> > curious in which scenario a block will be split. Now with your comments
> > "Because DMA code is not the only caller of iommu_map/unmap", it seems
> > depending on the user.
> > 
> > Would you please explain this further? I mean besides DMA, which user
> > will use iommu_map/umap and how it split a block.
> 
> I also think that arm_lpae_split_blk_unmap() scenario is not exist, maybe
> we should remove it, and give a warning for this wrong usage.

Can't it happen with VFIO?

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-14  8:35             ` Will Deacon
@ 2018-08-14 10:02               ` Robin Murphy
  2018-08-15  1:43                 ` Yang, Shunyong
  0 siblings, 1 reply; 27+ messages in thread
From: Robin Murphy @ 2018-08-14 10:02 UTC (permalink / raw)
  To: Will Deacon, Leizhen (ThunderTown)
  Cc: Yang, Shunyong, Jean-Philippe Brucker, Joerg Roedel,
	linux-arm-kernel, iommu, linux-kernel

On 14/08/18 09:35, Will Deacon wrote:
> On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown) wrote:
>> On 2018/8/6 9:32, Yang, Shunyong wrote:
>>> On 2018/7/26 22:37, Robin Murphy wrote:
>>>> Because DMA code is not the only caller of iommu_map/unmap. It's
>>>> perfectly legal in the IOMMU API to partially unmap a previous mapping
>>>> such that a block entry needs to be split. The DMA API, however, is a
>>>> lot more constrined, and thus by construction the iommu-dma layer will
>>>> never generate a block-splitting iommu_unmap() except as a result of
>>>> illegal DMA API usage, and we obviously do not need to optimise for that
>>>> (you will get a warning about mismatched unmaps under dma-debug, but
>>>> it's a bit too expensive to police in the general case).
>>>>
>>>
>>> When I was reading the code around arm_lpae_split_blk_unmap(), I was
>>> curious in which scenario a block will be split. Now with your comments
>>> "Because DMA code is not the only caller of iommu_map/unmap", it seems
>>> depending on the user.
>>>
>>> Would you please explain this further? I mean besides DMA, which user
>>> will use iommu_map/umap and how it split a block.
>>
>> I also think that arm_lpae_split_blk_unmap() scenario is not exist, maybe
>> we should remove it, and give a warning for this wrong usage.
> 
> Can't it happen with VFIO?

...or GPU drivers, or anyone else managing their own IOMMU domain 
directly. A sequence like this is perfectly legal:

	iommu_map(domain, iova, paddr, SZ_8M, prot);
	...
	iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3);

where if iova and paddr happen to be suitably aligned, the map will lay 
down blocks, and the unmap will then have to split one of them into 
pages to remove half of it. We don't tear our hair out maintaining 
split_blk_unmap() for the fun of it :(
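(Concretely, with a 4K granule: that SZ_8M map lays down four 2M block 
entries at iova + {0, 2, 4, 6}M; the unmap can then remove the block at 
iova + 6M outright, but must split the block at iova + 4M into 4K pages 
in order to clear only [iova + 5M, iova + 6M).)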

Robin.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-14 10:02               ` Robin Murphy
@ 2018-08-15  1:43                 ` Yang, Shunyong
  2018-08-15  7:33                   ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Yang, Shunyong @ 2018-08-15  1:43 UTC (permalink / raw)
  To: robin.murphy, thunder.leizhen, will.deacon
  Cc: linux-arm-kernel, joro, linux-kernel, iommu, jean-philippe.brucker

Hi, Robin,

On Tue, 2018-08-14 at 11:02 +0100, Robin Murphy wrote:
> On 14/08/18 09:35, Will Deacon wrote:
> > On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown)
> > wrote:
> > > On 2018/8/6 9:32, Yang, Shunyong wrote:
> > > > On 2018/7/26 22:37, Robin Murphy wrote:
> > > > > Because DMA code is not the only caller of iommu_map/unmap.
> > > > > It's
> > > > > perfectly legal in the IOMMU API to partially unmap a
> > > > > previous mapping
> > > > > such that a block entry needs to be split. The DMA API,
> > > > > however, is a
> > > > > lot more constrined, and thus by construction the iommu-dma
> > > > > layer will
> > > > > never generate a block-splitting iommu_unmap() except as a
> > > > > result of
> > > > > illegal DMA API usage, and we obviously do not need to
> > > > > optimise for that
> > > > > (you will get a warning about mismatched unmaps under dma-
> > > > > debug, but
> > > > > it's a bit too expensive to police in the general case).
> > > > > 
> > > > 
> > > > When I was reading the code around arm_lpae_split_blk_unmap(),
> > > > I was
> > > > curious in which scenario a block will be split. Now with your
> > > > comments
> > > > "Because DMA code is not the only caller of iommu_map/unmap",
> > > > it seems
> > > > depending on the user.
> > > > 
> > > > Would you please explain this further? I mean besides DMA,
> > > > which user
> > > > will use iommu_map/umap and how it split a block.
> > > 
> > > I also think that arm_lpae_split_blk_unmap() scenario is not
> > > exist, maybe
> > > we should remove it, and give a warning for this wrong usage.
> > 
> > Can't it happen with VFIO?
> 
> ...or GPU drivers, or anyone else managing their own IOMMU domain 
> directly. A sequence like this is perfectly legal:
> 
> 	iommu_map(domain, iova, paddr, SZ_8M, prot);
> 	...
> 	iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3);
> 
> where if iova and paddr happen to be suitably aligned, the map will
> lay 
> down blocks, and the unmap will then have to split one of them into 
> pages to remove half of it. We don't tear our hair out maintaining 
> split_blk_unmap() for the fun of it :(

Thank you for the GPU example. But for VFIO, I remember all memory will
be pinned in the early stage of emulator (such as QEMU) startup. So, at
which operation will the split occur? Maybe a virtio balloon inflation?

Thanks.
Shunyong.

> 
> Robin.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-15  1:43                 ` Yang, Shunyong
@ 2018-08-15  7:33                   ` Will Deacon
  2018-08-15  7:35                     ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2018-08-15  7:33 UTC (permalink / raw)
  To: Yang, Shunyong
  Cc: robin.murphy, thunder.leizhen, linux-arm-kernel, joro,
	linux-kernel, iommu, jean-philippe.brucker

On Wed, Aug 15, 2018 at 01:43:37AM +0000, Yang, Shunyong wrote:
> On Tue, 2018-08-14 at 11:02 +0100, Robin Murphy wrote:
> > On 14/08/18 09:35, Will Deacon wrote:
> > > On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown)
> > > wrote:
> > > > On 2018/8/6 9:32, Yang, Shunyong wrote:
> > > > > On 2018/7/26 22:37, Robin Murphy wrote:
> > > > > > Because DMA code is not the only caller of iommu_map/unmap.
> > > > > > It's
> > > > > > perfectly legal in the IOMMU API to partially unmap a
> > > > > > previous mapping
> > > > > > such that a block entry needs to be split. The DMA API,
> > > > > > however, is a
> > > > > > lot more constrined, and thus by construction the iommu-dma
> > > > > > layer will
> > > > > > never generate a block-splitting iommu_unmap() except as a
> > > > > > result of
> > > > > > illegal DMA API usage, and we obviously do not need to
> > > > > > optimise for that
> > > > > > (you will get a warning about mismatched unmaps under dma-
> > > > > > debug, but
> > > > > > it's a bit too expensive to police in the general case).
> > > > > >
> > > > >
> > > > > When I was reading the code around arm_lpae_split_blk_unmap(),
> > > > > I was
> > > > > curious in which scenario a block will be split. Now with your
> > > > > comments
> > > > > "Because DMA code is not the only caller of iommu_map/unmap",
> > > > > it seems
> > > > > depending on the user.
> > > > >
> > > > > Would you please explain this further? I mean besides DMA,
> > > > > which user
> > > > > will use iommu_map/umap and how it split a block.
> > > >
> > > > I also think that arm_lpae_split_blk_unmap() scenario is not
> > > > exist, maybe
> > > > we should remove it, and give a warning for this wrong usage.
> > >
> > > Can't it happen with VFIO?
> >
> > ...or GPU drivers, or anyone else managing their own IOMMU domain
> > directly. A sequence like this is perfectly legal:
> >
> >     iommu_map(domain, iova, paddr, SZ_8M, prot);
> >     ...
> >     iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3);
> >
> > where if iova and paddr happen to be suitably aligned, the map will
> > lay
> > down blocks, and the unmap will then have to split one of them into
> > pages to remove half of it. We don't tear our hair out maintaining
> > split_blk_unmap() for the fun of it :(
>
> Thank you for the GPU example. But for VFIO, I remember all memory will
> be   pinned in the early stage of emulator (such as qemu) start. So,
> the split will occur at which operation? Maybe virtio balloon inflate?

My memory is pretty hazy here, but I was fairly sure that VFIO didn't
always unmap() with the same granularity as it map()'d, at least for
the v1 interface. Either way, split_blk_unmap() was written because it was
necessary at the time, rather than just for fun!

Will
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 27+ messages in thread
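
To make the arithmetic of Robin's example concrete: with a 4KB granule,
an 8MB mapping whose iova and paddr are 2MB-aligned goes in as four 2MB
block entries, and unmapping 3MB at iova + 5MB removes the blocks
covering [6M, 8M) whole, but the block covering [4M, 6M) is only
half-covered, so it must be split into page entries to keep [4M, 5M)
mapped. The standalone sketch below models only that split decision; it
is illustrative, not the io-pgtable-arm code:

    #include <stdio.h>

    #define SZ_1M 0x100000UL
    #define SZ_2M 0x200000UL
    #define SZ_8M 0x800000UL

    /*
     * For each 2M block entry backing an 8M mapping at iova 0, classify
     * what iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3) does to it.
     * The partially covered block is the case that needs
     * split_blk_unmap().
     */
    int main(void)
    {
        unsigned long ustart = 5 * SZ_1M, uend = ustart + 3 * SZ_1M;
        unsigned long blk;

        for (blk = 0; blk < SZ_8M; blk += SZ_2M) {
            unsigned long end = blk + SZ_2M;

            if (uend <= blk || ustart >= end)
                printf("[%7lx, %7lx): block untouched\n", blk, end);
            else if (ustart <= blk && uend >= end)
                printf("[%7lx, %7lx): block unmapped whole\n", blk, end);
            else
                /* Here [4M, 6M): replaced by a table of 4K page
                 * entries, then [5M, 6M) removed from that table. */
                printf("[%7lx, %7lx): block must be split\n", blk, end);
        }
        return 0;
    }

Compiled and run, this reports [4M, 6M) as the one block needing a split.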

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-15  7:33                   ` Will Deacon
@ 2018-08-15  7:35                     ` Will Deacon
  2018-08-16  0:43                       ` Yang, Shunyong
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2018-08-15  7:35 UTC (permalink / raw)
  To: Yang, Shunyong
  Cc: jean-philippe.brucker, joro, linux-kernel, iommu,
	thunder.leizhen, robin.murphy, linux-arm-kernel

On Wed, Aug 15, 2018 at 08:33:01AM +0100, Will Deacon wrote:
> On Wed, Aug 15, 2018 at 01:43:37AM +0000, Yang, Shunyong wrote:
> > On Tue, 2018-08-14 at 11:02 +0100, Robin Murphy wrote:
> > > On 14/08/18 09:35, Will Deacon wrote:
> > > > On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown)
> > > > wrote:
> > > > > On 2018/8/6 9:32, Yang, Shunyong wrote:
> > > > > > On 2018/7/26 22:37, Robin Murphy wrote:
> > > > > > > Because DMA code is not the only caller of iommu_map/unmap.
> > > > > > > It's perfectly legal in the IOMMU API to partially unmap a
> > > > > > > previous mapping such that a block entry needs to be split.
> > > > > > > The DMA API, however, is a lot more constrained, and thus by
> > > > > > > construction the iommu-dma layer will never generate a
> > > > > > > block-splitting iommu_unmap() except as a result of illegal
> > > > > > > DMA API usage, and we obviously do not need to optimise for
> > > > > > > that (you will get a warning about mismatched unmaps under
> > > > > > > dma-debug, but it's a bit too expensive to police in the
> > > > > > > general case).
> > > > > > >
> > > > > >
> > > > > > When I was reading the code around arm_lpae_split_blk_unmap(),
> > > > > > I was curious in which scenario a block would be split. Now,
> > > > > > with your comment "Because DMA code is not the only caller of
> > > > > > iommu_map/unmap", it seems to depend on the user.
> > > > > >
> > > > > > Would you please explain this further? I mean, besides DMA,
> > > > > > which user will call iommu_map/unmap, and how does it split a
> > > > > > block?
> > > > >
> > > > > I also think the arm_lpae_split_blk_unmap() scenario does not
> > > > > exist; maybe we should remove it and warn about this wrong usage.
> > > >
> > > > Can't it happen with VFIO?
> > >
> > > ...or GPU drivers, or anyone else managing their own IOMMU domain
> > > directly. A sequence like this is perfectly legal:
> > >
> > >     iommu_map(domain, iova, paddr, SZ_8M, prot);
> > >     ...
> > >     iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3);
> > >
> > > where if iova and paddr happen to be suitably aligned, the map will
> > > lay down blocks, and the unmap will then have to split one of them
> > > into pages to remove half of it. We don't tear our hair out
> > > maintaining split_blk_unmap() for the fun of it :(
> >
> > Thank you for the GPU example. But for VFIO, I remember that all guest
> > memory is pinned early during emulator (e.g. QEMU) startup. So at which
> > operation would the split occur? Maybe a virtio balloon inflate?
> 
> My memory is pretty hazy here, but I was fairly sure that VFIO didn't
> always unmap() with the same granularity as it map()'d, at least for
> the v1 interface. Either way, split_blk_unmap() was written because it was
> necessary at the time, rather than just for fun!
> 
> Will
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.

Urgh, sorry about this threatening disclaimer ^^. Please disregard.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread
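
On Will's VFIO point above, the usage in question would look roughly
like the sketch below, via the type1 ioctl interface. This is a hedged
illustration: 'container' is an assumed, already-configured VFIO
container fd, and whether the partial unmap is accepted depends on the
interface version — the v1 type1 interface was permissive about unmap
ranges, while v2 fails attempts to bisect an existing mapping.

    #include <linux/vfio.h>
    #include <sys/ioctl.h>

    /*
     * Map 8M at iova 0, then unmap a 3M window in the middle: the
     * pattern that forces a block split in the pagetable code. The
     * container fd is assumed to be open and configured already.
     */
    static int partial_unmap_demo(int container, void *buf)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (unsigned long)buf,
            .iova  = 0,
            .size  = 8UL << 20,
        };
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova  = 5UL << 20,
            .size  = 3UL << 20,
        };

        if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
            return -1;

        /* v1 accepted this partial range; v2 returns an error rather
         * than split the original 8M mapping. */
        return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }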

* Re: [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode
  2018-08-15  7:35                     ` Will Deacon
@ 2018-08-16  0:43                       ` Yang, Shunyong
  0 siblings, 0 replies; 27+ messages in thread
From: Yang, Shunyong @ 2018-08-16  0:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: jean-philippe.brucker, joro, linux-kernel, iommu,
	thunder.leizhen, robin.murphy, linux-arm-kernel

Hi, Will and Robin,

  Many thanks for your explanations.

Thanks.
Shunyong.

On 2018/8/15 15:35, Will Deacon wrote:
> On Wed, Aug 15, 2018 at 08:33:01AM +0100, Will Deacon wrote:
>> On Wed, Aug 15, 2018 at 01:43:37AM +0000, Yang, Shunyong wrote:
>>> On Tue, 2018-08-14 at 11:02 +0100, Robin Murphy wrote:
>>>> On 14/08/18 09:35, Will Deacon wrote:
>>>>> On Tue, Aug 14, 2018 at 04:33:41PM +0800, Leizhen (ThunderTown)
>>>>> wrote:
>>>>>> On 2018/8/6 9:32, Yang, Shunyong wrote:
>>>>>>> On 2018/7/26 22:37, Robin Murphy wrote:
>>>>>>>> Because DMA code is not the only caller of iommu_map/unmap.
>>>>>>>> It's perfectly legal in the IOMMU API to partially unmap a
>>>>>>>> previous mapping such that a block entry needs to be split.
>>>>>>>> The DMA API, however, is a lot more constrained, and thus by
>>>>>>>> construction the iommu-dma layer will never generate a
>>>>>>>> block-splitting iommu_unmap() except as a result of illegal
>>>>>>>> DMA API usage, and we obviously do not need to optimise for
>>>>>>>> that (you will get a warning about mismatched unmaps under
>>>>>>>> dma-debug, but it's a bit too expensive to police in the
>>>>>>>> general case).
>>>>>>>>
>>>>>>>
>>>>>>> When I was reading the code around arm_lpae_split_blk_unmap(),
>>>>>>> I was curious in which scenario a block would be split. Now,
>>>>>>> with your comment "Because DMA code is not the only caller of
>>>>>>> iommu_map/unmap", it seems to depend on the user.
>>>>>>>
>>>>>>> Would you please explain this further? I mean, besides DMA,
>>>>>>> which user will call iommu_map/unmap, and how does it split a
>>>>>>> block?
>>>>>>
>>>>>> I also think the arm_lpae_split_blk_unmap() scenario does not
>>>>>> exist; maybe we should remove it and warn about this wrong usage.
>>>>>
>>>>> Can't it happen with VFIO?
>>>>
>>>> ...or GPU drivers, or anyone else managing their own IOMMU domain
>>>> directly. A sequence like this is perfectly legal:
>>>>
>>>>     iommu_map(domain, iova, paddr, SZ_8M, prot);
>>>>     ...
>>>>     iommu_unmap(domain, iova + SZ_1M * 5, SZ_1M * 3);
>>>>
>>>> where if iova and paddr happen to be suitably aligned, the map will
>>>> lay down blocks, and the unmap will then have to split one of them
>>>> into pages to remove half of it. We don't tear our hair out
>>>> maintaining split_blk_unmap() for the fun of it :(
>>>
>>> Thank you for the GPU example. But for VFIO, I remember that all guest
>>> memory is pinned early during emulator (e.g. QEMU) startup. So at which
>>> operation would the split occur? Maybe a virtio balloon inflate?
>>
>> My memory is pretty hazy here, but I was fairly sure that VFIO didn't
>> always unmap() with the same granularity as it map()'d, at least for
>> the v1 interface. Either way, split_blk_unmap() was written because it was
>> necessary at the time, rather than just for fun!
>>
>> Will
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
> 
> Urgh, sorry about this threatening disclaimer ^^. Please disregard.
> 
> Will
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-08-16  0:44 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-12  6:18 [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3 Zhen Lei
2018-07-12  6:18 ` [PATCH v3 1/6] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook Zhen Lei
2018-07-12  6:18 ` [PATCH v3 2/6] iommu/dma: add support for non-strict mode Zhen Lei
2018-07-24 22:01   ` Robin Murphy
2018-07-26  4:15     ` Leizhen (ThunderTown)
2018-07-12  6:18 ` [PATCH v3 3/6] iommu/amd: use default branch to deal with all non-supported capabilities Zhen Lei
2018-07-12  6:18 ` [PATCH v3 4/6] iommu/io-pgtable-arm: add support for non-strict mode Zhen Lei
2018-07-24 22:25   ` Robin Murphy
2018-07-26  7:20     ` Leizhen (ThunderTown)
2018-07-26 14:35       ` Robin Murphy
2018-08-06  1:32         ` Yang, Shunyong
2018-08-14  8:33           ` Leizhen (ThunderTown)
2018-08-14  8:35             ` Will Deacon
2018-08-14 10:02               ` Robin Murphy
2018-08-15  1:43                 ` Yang, Shunyong
2018-08-15  7:33                   ` Will Deacon
2018-08-15  7:35                     ` Will Deacon
2018-08-16  0:43                       ` Yang, Shunyong
2018-07-12  6:18 ` [PATCH v3 5/6] iommu/arm-smmu-v3: " Zhen Lei
2018-07-12  6:18 ` [PATCH v3 6/6] iommu/arm-smmu-v3: add bootup option "iommu_strict_mode" Zhen Lei
2018-07-24 22:46   ` Robin Murphy
2018-07-26  7:41     ` Leizhen (ThunderTown)
2018-07-24 21:51 ` [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3 Robin Murphy
2018-07-26  3:44   ` Leizhen (ThunderTown)
2018-07-26 14:16     ` Robin Murphy
2018-07-27  2:49       ` Leizhen (ThunderTown)
2018-07-27  9:37         ` Will Deacon
