* [PATCH 0/7] add non-strict mode support for arm-smmu-v3
@ 2018-05-31  7:42 Zhen Lei
  2018-05-31  7:42 ` [PATCH 1/7] iommu/dma: fix trivial coding style mistake Zhen Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

In general, an IOMMU unmap operation follows the steps below:
1. remove the mapping in the page table of the specified iova range
2. execute a tlbi command to invalidate the mapping cached in the TLB
3. wait for the above tlbi operation to finish
4. free the IOVA resource
5. free the physical memory resource

This can become a problem when unmaps are very frequent: the combination of
tlbi and wait operations consumes a lot of time. A feasible method is to
defer the tlbi and iova-free operations; once a certain number of entries
has accumulated, or a specified time has elapsed, execute a single tlbi_all
command to clean up the TLB, then free the backed-up IOVAs. This is referred
to as non-strict mode.
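
For illustration, below is a minimal, self-contained sketch of that deferred
invalidation idea. It is not the kernel's implementation (the real iova flush
queue in drivers/iommu/iova.c is per-CPU and also timer-driven); the names
flush_queue, fq_defer_iova and FQ_SIZE are made up for this example:

#define FQ_SIZE 256

struct fq_entry {
        unsigned long   iova_pfn;
        unsigned long   pages;
};

struct flush_queue {
        struct fq_entry entries[FQ_SIZE];
        unsigned int    count;          /* number of queued entries */
        /* one tlbi_all plus a wait for its completion */
        void            (*flush_tlb_all)(void *cookie);
        void            (*free_iova)(void *cookie, unsigned long pfn,
                                     unsigned long pages);
        void            *cookie;
};

/* One domain-wide invalidation, then release every backed-up IOVA. */
static void fq_flush_all(struct flush_queue *fq)
{
        unsigned int i;

        fq->flush_tlb_all(fq->cookie);
        for (i = 0; i < fq->count; i++)
                fq->free_iova(fq->cookie, fq->entries[i].iova_pfn,
                              fq->entries[i].pages);
        fq->count = 0;
}

/* Called on unmap instead of tlbi + wait + free (non-strict mode). */
static void fq_defer_iova(struct flush_queue *fq, unsigned long pfn,
                          unsigned long pages)
{
        if (fq->count == FQ_SIZE)       /* queue full: flush now */
                fq_flush_all(fq);

        fq->entries[fq->count].iova_pfn = pfn;
        fq->entries[fq->count].pages = pages;
        fq->count++;
}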

But it must be noted that, although the mapping has already been removed
from the page table, it may still exist in the TLB, and the freed physical
memory may also be reused by others. So an attacker could keep accessing
memory through the just-freed IOVA, to obtain sensitive data or corrupt
memory. VFIO should therefore always choose strict mode.

Some may consider deferring the physical memory free as well, so that
strict-mode semantics are preserved. But for the map_sg cases, the memory
allocation is not controlled by the IOMMU APIs, so this is not enforceable.

Fortunately, Intel and AMD have already applied the non-strict mode and put
the queue_iova() operation into the common file dma-iommu.c, and my work is
based on it. The difference is that the arm-smmu-v3 driver calls the common
IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.

Below is the strict vs. non-strict performance data for an NVMe device:
Random Read  IOPS: 146K (strict) vs. 573K (non-strict)
Random Write IOPS: 143K (strict) vs. 513K (non-strict)


Zhen Lei (7):
  iommu/dma: fix trivial coding style mistake
  iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
  iommu: prepare for the non-strict mode support
  iommu/amd: make sure TLB to be flushed before IOVA freed
  iommu/dma: add support for non-strict mode
  iommu/io-pgtable-arm: add support for non-strict mode
  iommu/arm-smmu-v3: add support for non-strict mode

 drivers/iommu/amd_iommu.c          |  2 +-
 drivers/iommu/arm-smmu-v3.c        | 16 ++++++++++++---
 drivers/iommu/arm-smmu.c           |  2 +-
 drivers/iommu/dma-iommu.c          | 41 ++++++++++++++++++++++++++++++--------
 drivers/iommu/io-pgtable-arm-v7s.c |  6 +++---
 drivers/iommu/io-pgtable-arm.c     | 28 ++++++++++++++------------
 drivers/iommu/io-pgtable.h         |  2 +-
 drivers/iommu/ipmmu-vmsa.c         |  2 +-
 drivers/iommu/msm_iommu.c          |  2 +-
 drivers/iommu/mtk_iommu.c          |  2 +-
 drivers/iommu/qcom_iommu.c         |  2 +-
 include/linux/iommu.h              |  5 +++++
 12 files changed, 76 insertions(+), 34 deletions(-)

-- 
1.8.3

* [PATCH 1/7] iommu/dma: fix trivial coding style mistake
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31 13:03   ` Robin Murphy
  2018-05-31  7:42 ` [PATCH 2/7] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook Zhen Lei
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

The static function iova_reserve_iommu_regions() is only called by
iommu_dma_init_domain(), and the 'if (!dev)' check in iommu_dma_init_domain()
affects only it, so we can safely move the check into it. I think it looks
more natural.

In addition, the local variable 'ret' is only assigned in the
'if (region->type == IOMMU_RESV_MSI)' branch, so the 'if (ret)' check should
also take effect only in that branch; add braces to enclose it.

No functional changes.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/dma-iommu.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb..4e885f7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
 	LIST_HEAD(resv_regions);
 	int ret = 0;
 
+	if (!dev)
+		return 0;
+
 	if (dev_is_pci(dev))
 		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
 
@@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
 		hi = iova_pfn(iovad, region->start + region->length - 1);
 		reserve_iova(iovad, lo, hi);
 
-		if (region->type == IOMMU_RESV_MSI)
+		if (region->type == IOMMU_RESV_MSI) {
 			ret = cookie_init_hw_msi_region(cookie, region->start,
 					region->start + region->length);
-		if (ret)
-			break;
+			if (ret)
+				break;
+		}
 	}
 	iommu_put_resv_regions(dev, &resv_regions);
 
@@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	}
 
 	init_iova_domain(iovad, 1UL << order, base_pfn);
-	if (!dev)
-		return 0;
 
 	return iova_reserve_iommu_regions(dev, domain);
 }
-- 
1.8.3

* [PATCH 2/7] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
  2018-05-31  7:42 ` [PATCH 1/7] iommu/dma: fix trivial coding style mistake Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31  7:42 ` [PATCH 3/7] iommu: prepare for the non-strict mode support Zhen Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

.flush_iotlb_all cannot just wait for previous tlbi operations to complete;
it must also invalidate all TLB entries of the related domain.
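
For readers less familiar with the driver, the distinction this fix relies
on can be summarised as follows (a descriptive note in terms of the SMMUv3
command queue, per the driver of this era):

/*
 * arm_smmu_iotlb_sync()      issues CMD_SYNC only: it waits for
 *                            previously issued TLBI commands to
 *                            complete, but invalidates nothing by
 *                            itself.
 * arm_smmu_tlb_inv_context() issues TLBI_NH_ASID (stage 1) or
 *                            TLBI_S12_VMALL (stage 2) followed by a
 *                            sync, i.e. it really invalidates all the
 *                            TLB entries of the domain.
 */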

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d64710..4402187 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1770,6 +1770,14 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->smmu)
+		arm_smmu_tlb_inv_context(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
 	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1998,7 +2006,7 @@ static void arm_smmu_put_resv_regions(struct device *dev,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-	.flush_iotlb_all	= arm_smmu_iotlb_sync,
+	.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
 	.iotlb_sync		= arm_smmu_iotlb_sync,
 	.iova_to_phys		= arm_smmu_iova_to_phys,
 	.add_device		= arm_smmu_add_device,
-- 
1.8.3

* [PATCH 3/7] iommu: prepare for the non-strict mode support
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
  2018-05-31  7:42 ` [PATCH 1/7] iommu/dma: fix trivial coding style mistake Zhen Lei
  2018-05-31  7:42 ` [PATCH 2/7] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31 13:04   ` Robin Murphy
  2018-05-31  7:42 ` [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed Zhen Lei
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

In general, an IOMMU unmap operation follows the steps below:
1. remove the mapping in the page table of the specified iova range
2. execute a tlbi command to invalidate the mapping cached in the TLB
3. wait for the above tlbi operation to finish
4. free the IOVA resource
5. free the physical memory resource

This can become a problem when unmaps are very frequent: the combination of
tlbi and wait operations consumes a lot of time. A feasible method is to
defer the tlbi and iova-free operations; once a certain number of entries
has accumulated, or a specified time has elapsed, execute a single tlbi_all
command to clean up the TLB, then free the backed-up IOVAs. This is referred
to as non-strict mode.

But it must be noted that, although the mapping has already been removed
from the page table, it may still exist in the TLB, and the freed physical
memory may also be reused by others. So an attacker could keep accessing
memory through the just-freed IOVA, to obtain sensitive data or corrupt
memory. VFIO should therefore always choose strict mode.

This patch just adds a new parameter to the unmap operation, to help the
upper-layer functions choose which mode to apply.

No functional changes.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c        | 2 +-
 drivers/iommu/arm-smmu.c           | 2 +-
 drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
 drivers/iommu/io-pgtable-arm.c     | 6 +++---
 drivers/iommu/io-pgtable.h         | 2 +-
 drivers/iommu/ipmmu-vmsa.c         | 2 +-
 drivers/iommu/msm_iommu.c          | 2 +-
 drivers/iommu/mtk_iommu.c          | 2 +-
 drivers/iommu/qcom_iommu.c         | 2 +-
 include/linux/iommu.h              | 2 ++
 10 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4402187..59b3387 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	return ops->unmap(ops, iova, size, IOMMU_STRICT);
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 69e7c60..253e807 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	return ops->unmap(ops, iova, size, IOMMU_STRICT);
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 10e4a3d..799eced 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 }
 
 static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-			    size_t size)
+			    size_t size, int strict)
 {
 	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
 
@@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
 	size = 1UL << __ffs(cfg.pgsize_bitmap);
 	while (i < loopnr) {
 		iova_start = i * SZ_16M;
-		if (ops->unmap(ops, iova_start + size, size) != size)
+		if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
 			return __FAIL(ops);
 
 		/* Remap of partial unmap */
@@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
 	while (i != BITS_PER_LONG) {
 		size = 1UL << i;
 
-		if (ops->unmap(ops, iova, size) != size)
+		if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
 			return __FAIL(ops);
 
 		if (ops->iova_to_phys(ops, iova + 42))
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 39c2a05..e0f52db 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 }
 
 static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-			     size_t size)
+			     size_t size, int strict)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte *ptep = data->pgd;
@@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 
 		/* Partial unmap */
 		size = 1UL << __ffs(cfg->pgsize_bitmap);
-		if (ops->unmap(ops, SZ_1G + size, size) != size)
+		if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
 			return __FAIL(ops, i);
 
 		/* Remap of partial unmap */
@@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
 		while (j != BITS_PER_LONG) {
 			size = 1UL << j;
 
-			if (ops->unmap(ops, iova, size) != size)
+			if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
 				return __FAIL(ops, i);
 
 			if (ops->iova_to_phys(ops, iova + 42))
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df7909..2908806 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -120,7 +120,7 @@ struct io_pgtable_ops {
 	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
-			size_t size);
+			size_t size, int strict);
 	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
 				    unsigned long iova);
 };
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 40ae6e8..e6d9e11 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
 {
 	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
 
-	return domain->iop->unmap(domain->iop, iova, size);
+	return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
 }
 
 static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 0d33504..180fa3d 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	unsigned long flags;
 
 	spin_lock_irqsave(&priv->pgtlock, flags);
-	len = priv->iop->unmap(priv->iop, iova, len);
+	len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
 	spin_unlock_irqrestore(&priv->pgtlock, flags);
 
 	return len;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f2832a1..54661ed 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 	size_t unmapsz;
 
 	spin_lock_irqsave(&dom->pgtlock, flags);
-	unmapsz = dom->iop->unmap(dom->iop, iova, size);
+	unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
 	spin_unlock_irqrestore(&dom->pgtlock, flags);
 
 	return unmapsz;
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 65b9c99..90abde1 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	 */
 	pm_runtime_get_sync(qcom_domain->iommu->dev);
 	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
-	ret = ops->unmap(ops, iova, size);
+	ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
 	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
 	pm_runtime_put_sync(qcom_domain->iommu->dev);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee..39b3150 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -86,6 +86,8 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
 
+#define IOMMU_STRICT		1
+
 struct iommu_domain {
 	unsigned type;
 	const struct iommu_ops *ops;
-- 
1.8.3

* [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
                   ` (2 preceding siblings ...)
  2018-05-31  7:42 ` [PATCH 3/7] iommu: prepare for the non-strict mode support Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31 13:04   ` Robin Murphy
  2018-05-31  7:42 ` [PATCH 5/7] iommu/dma: add support for non-strict mode Zhen Lei
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

Although the mapping has already been removed from the page table, it may
still exist in the TLB. If the freed IOVA is reused by others before the
flush operation has completed, the new user will not be able to correctly
access its memory.
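
The race being closed can be illustrated like this (the interleaving is
illustrative only):

/*
 * Before this patch (CPU A unmaps while CPU B maps):
 *
 *   CPU A                              CPU B
 *   -----                              -----
 *   clear page-table entries for X
 *   dma_ops_free_iova(X)               <-- X is allocatable again
 *                                      alloc_iova() returns X
 *                                      map X for a new buffer
 *   domain_flush_tlb()
 *   domain_flush_complete()            <-- too late: the device may have
 *                                          already used a stale TLB
 *                                          entry for X
 *
 * Freeing X only after the flush has completed removes the window.
 */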

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8fb8c73..93aa389 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
 	}
 
 	if (amd_iommu_unmap_flush) {
-		dma_ops_free_iova(dma_dom, dma_addr, pages);
 		domain_flush_tlb(&dma_dom->domain);
 		domain_flush_complete(&dma_dom->domain);
+		dma_ops_free_iova(dma_dom, dma_addr, pages);
 	} else {
 		pages = __roundup_pow_of_two(pages);
 		queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
-- 
1.8.3

* [PATCH 5/7] iommu/dma: add support for non-strict mode
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
                   ` (3 preceding siblings ...)
  2018-05-31  7:42 ` [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31 13:04   ` Robin Murphy
  2018-06-01 17:51   ` kbuild test robot
  2018-05-31  7:42 ` [PATCH 6/7] iommu/io-pgtable-arm: " Zhen Lei
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

1. Save the related domain pointer in struct iommu_dma_cookie, so that the
   iovad code can call domain->ops->flush_iotlb_all to flush the TLB.
2. Define a new iommu capability: IOMMU_CAP_NON_STRICT, used to indicate
   that the iommu domain supports non-strict mode.
3. During the iommu domain initialization phase, call capable() to check
   whether non-strict mode is supported. If so, call init_iova_flush_queue
   to register the iovad->flush_cb callback.
4. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
   -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
   iommu supports non-strict mode, and use IOMMU_DOMAIN_IS_STRICT to make
   sure IOMMU_DOMAIN_UNMANAGED domains always follow strict mode.
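
As a reminder, the iova flush-queue API that points 3 and 4 build on is
used roughly like this (a usage sketch only; check the exact signatures in
the tree this applies to, error handling omitted):

/* Invoked by the queue when it decides to flush, i.e. once enough
 * entries have accumulated or its timer fires. */
static void my_flush_cb(struct iova_domain *iovad)
{
        /* issue one domain-wide TLB invalidation and wait for it */
}

static int my_init_domain(struct iova_domain *iovad)
{
        /* at domain initialization time: */
        return init_iova_flush_queue(iovad, my_flush_cb, NULL);
}

static void my_unmap_path(struct iova_domain *iovad, dma_addr_t iova,
                          size_t size)
{
        /* at unmap time, instead of freeing the IOVA immediately: */
        queue_iova(iovad, iova_pfn(iovad, iova),
                        size >> iova_shift(iovad), 0);
}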

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
 include/linux/iommu.h     |  3 +++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4e885f7..2e116d9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,6 +55,7 @@ struct iommu_dma_cookie {
 	};
 	struct list_head		msi_page_list;
 	spinlock_t			msi_lock;
+	struct iommu_domain		*domain;
 };
 
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
@@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 	return PAGE_SIZE;
 }
 
-static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
+static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
+					     enum iommu_dma_cookie_type type)
 {
 	struct iommu_dma_cookie *cookie;
 
@@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
 		spin_lock_init(&cookie->msi_lock);
 		INIT_LIST_HEAD(&cookie->msi_page_list);
 		cookie->type = type;
+		cookie->domain = domain;
 	}
 	return cookie;
 }
@@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+	domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
 	if (!domain->iova_cookie)
 		return -ENOMEM;
 
@@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+	cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
 	if (!cookie)
 		return -ENOMEM;
 
@@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
 	return ret;
 }
 
+static void iova_flush_iotlb_all(struct iova_domain *iovad)
+{
+	struct iommu_dma_cookie *cookie;
+	struct iommu_domain *domain;
+
+	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+	domain = cookie->domain;
+
+	domain->ops->flush_iotlb_all(domain);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 		u64 size, struct device *dev)
 {
+	const struct iommu_ops *ops = domain->ops;
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	unsigned long order, base_pfn, end_pfn;
@@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 
 	init_iova_domain(iovad, 1UL << order, base_pfn);
 
+	if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
+		BUG_ON(!ops->flush_iotlb_all);
+		init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
+	}
+
 	return iova_reserve_iommu_regions(dev, domain);
 }
 EXPORT_SYMBOL(iommu_dma_init_domain);
@@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 	/* The MSI case is only ever cleaning up its most recent allocation */
 	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
 		cookie->msi_iova -= size;
+	else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
+		queue_iova(iovad, iova_pfn(iovad, iova),
+				size >> iova_shift(iovad), 0);
 	else
 		free_iova_fast(iovad, iova_pfn(iovad, iova),
 				size >> iova_shift(iovad));
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 39b3150..01ff569 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -87,6 +87,8 @@ struct iommu_domain_geometry {
 				 __IOMMU_DOMAIN_DMA_API)
 
 #define IOMMU_STRICT		1
+#define IOMMU_DOMAIN_IS_STRICT(domain)	\
+		(domain->type == IOMMU_DOMAIN_UNMANAGED)
 
 struct iommu_domain {
 	unsigned type;
@@ -103,6 +105,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_NON_STRICT,		/* IOMMU supports non-strict mode */
 };
 
 /*
-- 
1.8.3

* [PATCH 6/7] iommu/io-pgtable-arm: add support for non-strict mode
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
                   ` (4 preceding siblings ...)
  2018-05-31  7:42 ` [PATCH 5/7] iommu/dma: add support for non-strict mode Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31  7:42 ` [PATCH 7/7] iommu/arm-smmu-v3: " Zhen Lei
  2018-05-31 11:24 ` [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Robin Murphy
  7 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

To support non-strict mode, only issue tlbi and sync when in strict mode.
For the non-leaf case, however, always follow strict mode.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/io-pgtable-arm.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e0f52db..1a65b7b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -287,7 +287,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       unsigned long iova, size_t size, int lvl,
-			       arm_lpae_iopte *ptep);
+			       arm_lpae_iopte *ptep, int strict);
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 				phys_addr_t paddr, arm_lpae_iopte prot,
@@ -329,7 +329,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 		size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
 		tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
+		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp, IOMMU_STRICT) != sz))
 			return -EINVAL;
 	}
 
@@ -526,7 +526,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 				       unsigned long iova, size_t size,
 				       arm_lpae_iopte blk_pte, int lvl,
-				       arm_lpae_iopte *ptep)
+				       arm_lpae_iopte *ptep, int strict)
 {
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte pte, *tablep;
@@ -571,15 +571,17 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 	}
 
 	if (unmap_idx < 0)
-		return __arm_lpae_unmap(data, iova, size, lvl, tablep);
+		return __arm_lpae_unmap(data, iova, size, lvl, tablep, strict);
+
+	if (strict)
+		io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
 
-	io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
 	return size;
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       unsigned long iova, size_t size, int lvl,
-			       arm_lpae_iopte *ptep)
+			       arm_lpae_iopte *ptep, int strict)
 {
 	arm_lpae_iopte pte;
 	struct io_pgtable *iop = &data->iop;
@@ -604,7 +606,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			io_pgtable_tlb_sync(iop);
 			ptep = iopte_deref(pte, data);
 			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
-		} else {
+		} else if (strict) {
 			io_pgtable_tlb_add_flush(iop, iova, size, size, true);
 		}
 
@@ -615,12 +617,12 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		 * minus the part we want to unmap
 		 */
 		return arm_lpae_split_blk_unmap(data, iova, size, pte,
-						lvl + 1, ptep);
+						lvl + 1, ptep, strict);
 	}
 
 	/* Keep on walkin' */
 	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep, strict);
 }
 
 static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
@@ -633,7 +635,7 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
-	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
+	return __arm_lpae_unmap(data, iova, size, lvl, ptep, strict);
 }
 
 static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-- 
1.8.3

* [PATCH 7/7] iommu/arm-smmu-v3: add support for non-strict mode
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
                   ` (5 preceding siblings ...)
  2018-05-31  7:42 ` [PATCH 6/7] iommu/io-pgtable-arm: " Zhen Lei
@ 2018-05-31  7:42 ` Zhen Lei
  2018-05-31 11:24 ` [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Robin Murphy
  7 siblings, 0 replies; 23+ messages in thread
From: Zhen Lei @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Zhen Lei, Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

1. Add the IOMMU_CAP_NON_STRICT capability.
2. Dynamically choose strict or non-strict mode based on the iommu domain type.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 59b3387..25bccbd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1440,6 +1440,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		return true;
 	case IOMMU_CAP_NOEXEC:
 		return true;
+	case IOMMU_CAP_NON_STRICT:
+		return true;
 	default:
 		return false;
 	}
@@ -1767,7 +1769,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size, IOMMU_STRICT);
+	return ops->unmap(ops, iova, size, IOMMU_DOMAIN_IS_STRICT(domain));
 }
 
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
@@ -1782,7 +1784,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
 	struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
 
-	if (smmu)
+	if (smmu && IOMMU_DOMAIN_IS_STRICT(domain))
 		__arm_smmu_tlb_sync(smmu);
 }
 
-- 
1.8.3

* Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3
  2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
                   ` (6 preceding siblings ...)
  2018-05-31  7:42 ` [PATCH 7/7] iommu/arm-smmu-v3: " Zhen Lei
@ 2018-05-31 11:24 ` Robin Murphy
  2018-05-31 13:49   ` Hanjun Guo
  7 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 11:24 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark, Joerg Roedel,
	linux-mediatek, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

On 31/05/18 08:42, Zhen Lei wrote:
> In general, an IOMMU unmap operation follows the steps below:
> 1. remove the mapping in the page table of the specified iova range
> 2. execute a tlbi command to invalidate the mapping cached in the TLB
> 3. wait for the above tlbi operation to finish
> 4. free the IOVA resource
> 5. free the physical memory resource
> 
> This can become a problem when unmaps are very frequent: the combination of
> tlbi and wait operations consumes a lot of time. A feasible method is to
> defer the tlbi and iova-free operations; once a certain number of entries
> has accumulated, or a specified time has elapsed, execute a single tlbi_all
> command to clean up the TLB, then free the backed-up IOVAs. This is referred
> to as non-strict mode.
> 
> But it must be noted that, although the mapping has already been removed
> from the page table, it may still exist in the TLB, and the freed physical
> memory may also be reused by others. So an attacker could keep accessing
> memory through the just-freed IOVA, to obtain sensitive data or corrupt
> memory. VFIO should therefore always choose strict mode.
> 
> Some may consider deferring the physical memory free as well, so that
> strict-mode semantics are preserved. But for the map_sg cases, the memory
> allocation is not controlled by the IOMMU APIs, so this is not enforceable.
> 
> Fortunately, Intel and AMD have already applied the non-strict mode and put
> the queue_iova() operation into the common file dma-iommu.c, and my work is
> based on it. The difference is that the arm-smmu-v3 driver calls the common
> IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers do not.
> 
> Below is the strict vs. non-strict performance data for an NVMe device:
> Random Read  IOPS: 146K (strict) vs. 573K (non-strict)
> Random Write IOPS: 143K (strict) vs. 513K (non-strict)

What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then 
you'll still be using the rubbish globally-blocking sync implementation. 
If that is the case, I'd be very interested to see how much there is to 
gain from just improving that - I've had a patch kicking around for a 
while[1] (also on a rebased branch at [2]), but don't have the means for 
serious performance testing.

Robin.

[1] 
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20576.html
[2] git://linux-arm.org/linux-rm iommu/smmu

> 
> 
> Zhen Lei (7):
>    iommu/dma: fix trivial coding style mistake
>    iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
>    iommu: prepare for the non-strict mode support
>    iommu/amd: make sure TLB to be flushed before IOVA freed
>    iommu/dma: add support for non-strict mode
>    iommu/io-pgtable-arm: add support for non-strict mode
>    iommu/arm-smmu-v3: add support for non-strict mode
> 
>   drivers/iommu/amd_iommu.c          |  2 +-
>   drivers/iommu/arm-smmu-v3.c        | 16 ++++++++++++---
>   drivers/iommu/arm-smmu.c           |  2 +-
>   drivers/iommu/dma-iommu.c          | 41 ++++++++++++++++++++++++++++++--------
>   drivers/iommu/io-pgtable-arm-v7s.c |  6 +++---
>   drivers/iommu/io-pgtable-arm.c     | 28 ++++++++++++++------------
>   drivers/iommu/io-pgtable.h         |  2 +-
>   drivers/iommu/ipmmu-vmsa.c         |  2 +-
>   drivers/iommu/msm_iommu.c          |  2 +-
>   drivers/iommu/mtk_iommu.c          |  2 +-
>   drivers/iommu/qcom_iommu.c         |  2 +-
>   include/linux/iommu.h              |  5 +++++
>   12 files changed, 76 insertions(+), 34 deletions(-)
> 

* Re: [PATCH 1/7] iommu/dma: fix trivial coding style mistake
  2018-05-31  7:42 ` [PATCH 1/7] iommu/dma: fix trivial coding style mistake Zhen Lei
@ 2018-05-31 13:03   ` Robin Murphy
  2018-06-04 11:05     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 13:03 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark, Joerg Roedel,
	linux-mediatek, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

On 31/05/18 08:42, Zhen Lei wrote:
> The static function iova_reserve_iommu_regions() is only called by
> iommu_dma_init_domain(), and the 'if (!dev)' check in
> iommu_dma_init_domain() affects only it, so we can safely move the check
> into it. I think it looks more natural.

As before, I disagree - the logic of iommu_dma_init_domain() is "we 
expect to have a valid device, but stop here if we don't", and moving 
the check just needlessly obfuscates that. It is not a coincidence that 
the arguments of both functions are in effective order of importance.

> In addition, the local variable 'ret' is only assigned in the
> 'if (region->type == IOMMU_RESV_MSI)' branch, so the 'if (ret)' check
> should also take effect only in that branch; add braces to enclose it.

'ret' is clearly also assigned at its declaration, to cover the (very 
likely) case where we don't enter the loop at all. Thus testing it in 
the loop is harmless, and cluttering that up with extra tabs and braces 
is just noise.

Robin.

> No functional changes.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/dma-iommu.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ddcbbdb..4e885f7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   	LIST_HEAD(resv_regions);
>   	int ret = 0;
>   
> +	if (!dev)
> +		return 0;
> +
>   	if (dev_is_pci(dev))
>   		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>   
> @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   		hi = iova_pfn(iovad, region->start + region->length - 1);
>   		reserve_iova(iovad, lo, hi);
>   
> -		if (region->type == IOMMU_RESV_MSI)
> +		if (region->type == IOMMU_RESV_MSI) {
>   			ret = cookie_init_hw_msi_region(cookie, region->start,
>   					region->start + region->length);
> -		if (ret)
> -			break;
> +			if (ret)
> +				break;
> +		}
>   	}
>   	iommu_put_resv_regions(dev, &resv_regions);
>   
> @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   	}
>   
>   	init_iova_domain(iovad, 1UL << order, base_pfn);
> -	if (!dev)
> -		return 0;
>   
>   	return iova_reserve_iommu_regions(dev, domain);
>   }
> 

* Re: [PATCH 3/7] iommu: prepare for the non-strict mode support
  2018-05-31  7:42 ` [PATCH 3/7] iommu: prepare for the non-strict mode support Zhen Lei
@ 2018-05-31 13:04   ` Robin Murphy
  2018-06-04 11:27     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark, Joerg Roedel,
	linux-mediatek, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

On 31/05/18 08:42, Zhen Lei wrote:
> In general, an IOMMU unmap operation follows the steps below:
> 1. remove the mapping in the page table of the specified iova range
> 2. execute a tlbi command to invalidate the mapping cached in the TLB
> 3. wait for the above tlbi operation to finish
> 4. free the IOVA resource
> 5. free the physical memory resource
> 
> This can become a problem when unmaps are very frequent: the combination of
> tlbi and wait operations consumes a lot of time. A feasible method is to
> defer the tlbi and iova-free operations; once a certain number of entries
> has accumulated, or a specified time has elapsed, execute a single tlbi_all
> command to clean up the TLB, then free the backed-up IOVAs. This is referred
> to as non-strict mode.
> 
> But it must be noted that, although the mapping has already been removed
> from the page table, it may still exist in the TLB, and the freed physical
> memory may also be reused by others. So an attacker could keep accessing
> memory through the just-freed IOVA, to obtain sensitive data or corrupt
> memory. VFIO should therefore always choose strict mode.
> 
> This patch just adds a new parameter to the unmap operation, to help the
> upper-layer functions choose which mode to apply.

This seems like it might be better handled by a flag in 
io_pgtable_cfg->quirks. This interface change on its own looks rather 
invasive, and the fact that it ends up only being used to pass through a 
constant property of the domain (which is already known by the point 
io_pgtable_alloc() is called) implies that it is indeed the wrong level 
of abstraction.
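
For concreteness, a quirk-based version might look roughly like the sketch
below. IO_PGTABLE_QUIRK_NON_STRICT is a made-up name for this example, and
the bit value would have to be whichever quirk bit is free in the tree this
applies to:

/* io-pgtable.h: a new quirk bit instead of a per-call parameter */
#define IO_PGTABLE_QUIRK_NON_STRICT     BIT(4)  /* pick a free bit */

/* arm-smmu-v3.c: decided once, when the domain's tables are allocated */
if (!IOMMU_DOMAIN_IS_STRICT(domain))
        pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

/* io-pgtable-arm.c: the unmap path then checks the config, instead of
 * an argument threaded through every caller */
if (!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT))
        io_pgtable_tlb_add_flush(iop, iova, size, size, true);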

> No functional changes.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/arm-smmu-v3.c        | 2 +-
>   drivers/iommu/arm-smmu.c           | 2 +-
>   drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
>   drivers/iommu/io-pgtable-arm.c     | 6 +++---
>   drivers/iommu/io-pgtable.h         | 2 +-
>   drivers/iommu/ipmmu-vmsa.c         | 2 +-
>   drivers/iommu/msm_iommu.c          | 2 +-
>   drivers/iommu/mtk_iommu.c          | 2 +-
>   drivers/iommu/qcom_iommu.c         | 2 +-
>   include/linux/iommu.h              | 2 ++

Plus things specific to io-pgtable shouldn't really be spilling into the 
core API header either.

Robin.

>   10 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4402187..59b3387 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>   	if (!ops)
>   		return 0;
>   
> -	return ops->unmap(ops, iova, size);
> +	return ops->unmap(ops, iova, size, IOMMU_STRICT);
>   }
>   
>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60..253e807 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	if (!ops)
>   		return 0;
>   
> -	return ops->unmap(ops, iova, size);
> +	return ops->unmap(ops, iova, size, IOMMU_STRICT);
>   }
>   
>   static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 10e4a3d..799eced 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>   }
>   
>   static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> -			    size_t size)
> +			    size_t size, int strict)
>   {
>   	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   
> @@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
>   	size = 1UL << __ffs(cfg.pgsize_bitmap);
>   	while (i < loopnr) {
>   		iova_start = i * SZ_16M;
> -		if (ops->unmap(ops, iova_start + size, size) != size)
> +		if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops);
>   
>   		/* Remap of partial unmap */
> @@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
>   	while (i != BITS_PER_LONG) {
>   		size = 1UL << i;
>   
> -		if (ops->unmap(ops, iova, size) != size)
> +		if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops);
>   
>   		if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 39c2a05..e0f52db 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   }
>   
>   static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> -			     size_t size)
> +			     size_t size, int strict)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   	arm_lpae_iopte *ptep = data->pgd;
> @@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>   
>   		/* Partial unmap */
>   		size = 1UL << __ffs(cfg->pgsize_bitmap);
> -		if (ops->unmap(ops, SZ_1G + size, size) != size)
> +		if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops, i);
>   
>   		/* Remap of partial unmap */
> @@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>   		while (j != BITS_PER_LONG) {
>   			size = 1UL << j;
>   
> -			if (ops->unmap(ops, iova, size) != size)
> +			if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>   				return __FAIL(ops, i);
>   
>   			if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 2df7909..2908806 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -120,7 +120,7 @@ struct io_pgtable_ops {
>   	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot);
>   	size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
> -			size_t size);
> +			size_t size, int strict);
>   	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>   				    unsigned long iova);
>   };
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 40ae6e8..e6d9e11 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
>   {
>   	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>   
> -	return domain->iop->unmap(domain->iop, iova, size);
> +	return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
>   }
>   
>   static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 0d33504..180fa3d 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&priv->pgtlock, flags);
> -	len = priv->iop->unmap(priv->iop, iova, len);
> +	len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&priv->pgtlock, flags);
>   
>   	return len;
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f2832a1..54661ed 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   	size_t unmapsz;
>   
>   	spin_lock_irqsave(&dom->pgtlock, flags);
> -	unmapsz = dom->iop->unmap(dom->iop, iova, size);
> +	unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&dom->pgtlock, flags);
>   
>   	return unmapsz;
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 65b9c99..90abde1 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	 */
>   	pm_runtime_get_sync(qcom_domain->iommu->dev);
>   	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> -	ret = ops->unmap(ops, iova, size);
> +	ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>   	pm_runtime_put_sync(qcom_domain->iommu->dev);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee..39b3150 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,8 @@ struct iommu_domain_geometry {
>   #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
>   				 __IOMMU_DOMAIN_DMA_API)
>   
> +#define IOMMU_STRICT		1
> +
>   struct iommu_domain {
>   	unsigned type;
>   	const struct iommu_ops *ops;
> 

* Re: [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed
  2018-05-31  7:42 ` [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed Zhen Lei
@ 2018-05-31 13:04   ` Robin Murphy
  2018-06-04 11:41     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark, Joerg Roedel,
	linux-mediatek, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

On 31/05/18 08:42, Zhen Lei wrote:
> Although the mapping has already been removed from the page table, it may
> still exist in the TLB. If the freed IOVA is reused by others before the
> flush operation has completed, the new user will not be able to correctly
> access its memory.

This change seems reasonable in isolation, but why is it right in the 
middle of a series which has nothing to do with x86?

Robin.

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/amd_iommu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 8fb8c73..93aa389 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
>   	}
>   
>   	if (amd_iommu_unmap_flush) {
> -		dma_ops_free_iova(dma_dom, dma_addr, pages);
>   		domain_flush_tlb(&dma_dom->domain);
>   		domain_flush_complete(&dma_dom->domain);
> +		dma_ops_free_iova(dma_dom, dma_addr, pages);
>   	} else {
>   		pages = __roundup_pow_of_two(pages);
>   		queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
> 

* Re: [PATCH 5/7] iommu/dma: add support for non-strict mode
  2018-05-31  7:42 ` [PATCH 5/7] iommu/dma: add support for non-strict mode Zhen Lei
@ 2018-05-31 13:04   ` Robin Murphy
  2018-06-04 12:04     ` Leizhen (ThunderTown)
  2018-06-01 17:51   ` kbuild test robot
  1 sibling, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark, Joerg Roedel,
	linux-mediatek, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu

On 31/05/18 08:42, Zhen Lei wrote:
> 1. Save the related domain pointer in struct iommu_dma_cookie, so that the
>     iovad code can call domain->ops->flush_iotlb_all to flush the TLB.
> 2. Define a new iommu capability: IOMMU_CAP_NON_STRICT, used to indicate
>     that the iommu domain supports non-strict mode.
> 3. During the iommu domain initialization phase, call capable() to check
>     whether non-strict mode is supported. If so, call init_iova_flush_queue
>     to register the iovad->flush_cb callback.
> 4. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
>     -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the
>     related iommu supports non-strict mode, and use IOMMU_DOMAIN_IS_STRICT
>     to make sure IOMMU_DOMAIN_UNMANAGED domains always follow strict mode.

Once again, this is a whole load of complexity for a property which 
could just be statically encoded at allocation, e.g. in the cookie type.
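
One way to read that suggestion (the field name 'lazy' and this exact shape
are made up for illustration):

/* Decide once, when the cookie is created, instead of re-deriving it
 * from capabilities and callbacks on every free. */
struct iommu_dma_cookie {
        enum iommu_dma_cookie_type      type;
        union {
                struct iova_domain      iovad;    /* full IOVA cookie */
                dma_addr_t              msi_iova; /* trivial MSI cookie */
        };
        bool                            lazy;   /* set at allocation time */
};

static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
                                dma_addr_t iova, size_t size)
{
        struct iova_domain *iovad = &cookie->iovad;

        if (cookie->type == IOMMU_DMA_MSI_COOKIE)
                cookie->msi_iova -= size;
        else if (cookie->lazy)
                queue_iova(iovad, iova_pfn(iovad, iova),
                                size >> iova_shift(iovad), 0);
        else
                free_iova_fast(iovad, iova_pfn(iovad, iova),
                                size >> iova_shift(iovad));
}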

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
>   include/linux/iommu.h     |  3 +++
>   2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 4e885f7..2e116d9 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>   	};
>   	struct list_head		msi_page_list;
>   	spinlock_t			msi_lock;
> +	struct iommu_domain		*domain;
>   };
>   
>   static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
> @@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>   	return PAGE_SIZE;
>   }
>   
> -static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
> +static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
> +					     enum iommu_dma_cookie_type type)
>   {
>   	struct iommu_dma_cookie *cookie;
>   
> @@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>   		spin_lock_init(&cookie->msi_lock);
>   		INIT_LIST_HEAD(&cookie->msi_page_list);
>   		cookie->type = type;
> +		cookie->domain = domain;
>   	}
>   	return cookie;
>   }
> @@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
>   	if (domain->iova_cookie)
>   		return -EEXIST;
>   
> -	domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
> +	domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
>   	if (!domain->iova_cookie)
>   		return -ENOMEM;
>   
> @@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>   	if (domain->iova_cookie)
>   		return -EEXIST;
>   
> -	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
> +	cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
>   	if (!cookie)
>   		return -ENOMEM;
>   
> @@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   	return ret;
>   }
>   
> +static void iova_flush_iotlb_all(struct iova_domain *iovad)

iommu_dma_flush...

> +{
> +	struct iommu_dma_cookie *cookie;
> +	struct iommu_domain *domain;
> +
> +	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
> +	domain = cookie->domain;
> +
> +	domain->ops->flush_iotlb_all(domain);
> +}
> +
>   /**
>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> @@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   		u64 size, struct device *dev)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	unsigned long order, base_pfn, end_pfn;
> @@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   
>   	init_iova_domain(iovad, 1UL << order, base_pfn);
>   
> +	if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
> +		BUG_ON(!ops->flush_iotlb_all);
> +		init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
> +	}
> +
>   	return iova_reserve_iommu_regions(dev, domain);
>   }
>   EXPORT_SYMBOL(iommu_dma_init_domain);
> @@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>   	/* The MSI case is only ever cleaning up its most recent allocation */
>   	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>   		cookie->msi_iova -= size;
> +	else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
> +		queue_iova(iovad, iova_pfn(iovad, iova),
> +				size >> iova_shift(iovad), 0);
>   	else
>   		free_iova_fast(iovad, iova_pfn(iovad, iova),
>   				size >> iova_shift(iovad));
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 39b3150..01ff569 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -87,6 +87,8 @@ struct iommu_domain_geometry {
>   				 __IOMMU_DOMAIN_DMA_API)
>   
>   #define IOMMU_STRICT		1
> +#define IOMMU_DOMAIN_IS_STRICT(domain)	\
> +		(domain->type == IOMMU_DOMAIN_UNMANAGED)
>   
>   struct iommu_domain {
>   	unsigned type;
> @@ -103,6 +105,7 @@ enum iommu_cap {
>   					   transactions */
>   	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
>   	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> +	IOMMU_CAP_NON_STRICT,		/* IOMMU supports non-strict mode */

This isn't a property of the IOMMU, it depends purely on the driver 
implementation. I think it also doesn't matter anyway - if a caller asks 
for lazy unmapping on their domain but the IOMMU driver just does strict 
unmaps anyway because that's all it supports, there's no actual harm done.

Robin.

>   };
>   
>   /*
> 

* Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3
  2018-05-31 11:24 ` [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Robin Murphy
@ 2018-05-31 13:49   ` Hanjun Guo
  2018-05-31 14:25     ` Robin Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Hanjun Guo @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Robin Murphy, Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Libin, Guozhu Li, Xinwei Hu

Hi Robin,

On 2018/5/31 19:24, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> In general, an IOMMU unmap operation follows the steps below:
>> 1. remove the mapping in the page table of the specified iova range
>> 2. execute a tlbi command to invalidate the mapping cached in the TLB
>> 3. wait for the above tlbi operation to finish
>> 4. free the IOVA resource
>> 5. free the physical memory resource
>>
>> This can become a problem when unmaps are very frequent: the combination
>> of tlbi and wait operations consumes a lot of time. A feasible method is
>> to defer the tlbi and iova-free operations; once a certain number of
>> entries has accumulated, or a specified time has elapsed, execute a single
>> tlbi_all command to clean up the TLB, then free the backed-up IOVAs. This
>> is referred to as non-strict mode.
>>
>> But it must be noted that, although the mapping has already been removed
>> from the page table, it may still exist in the TLB, and the freed physical
>> memory may also be reused by others. So an attacker could keep accessing
>> memory through the just-freed IOVA, to obtain sensitive data or corrupt
>> memory. VFIO should therefore always choose strict mode.
>>
>> Some may consider deferring the physical memory free as well, so that
>> strict-mode semantics are preserved. But for the map_sg cases, the memory
>> allocation is not controlled by the IOMMU APIs, so this is not enforceable.
>>
>> Fortunately, Intel and AMD have already applied the non-strict mode and
>> put the queue_iova() operation into the common file dma-iommu.c, and my
>> work is based on it. The difference is that the arm-smmu-v3 driver calls
>> the common IOMMU APIs to unmap, while the Intel and AMD IOMMU drivers
>> do not.
>>
>> Below is the strict vs. non-strict performance data for an NVMe device:
>> Random Read  IOPS: 146K (strict) vs. 573K (non-strict)
>> Random Write IOPS: 143K (strict) vs. 513K (non-strict)
> 
> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.

The hardware is the new D06, whose SMMU supports MSIs;
it's not D05 :)

Thanks
Hanjun


* Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3
  2018-05-31 13:49   ` Hanjun Guo
@ 2018-05-31 14:25     ` Robin Murphy
  2018-06-01  6:50       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2018-05-31 14:25 UTC (permalink / raw)
  To: Hanjun Guo, Zhen Lei, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Libin, Guozhu Li, Xinwei Hu

On 31/05/18 14:49, Hanjun Guo wrote:
> Hi Robin,
> 
> On 2018/5/31 19:24, Robin Murphy wrote:
>> On 31/05/18 08:42, Zhen Lei wrote:
>>> In common, a IOMMU unmap operation follow the below steps:
>>> 1. remove the mapping in page table of the specified iova range
>>> 2. execute tlbi command to invalid the mapping which is cached in TLB
>>> 3. wait for the above tlbi operation to be finished
>>> 4. free the IOVA resource
>>> 5. free the physical memory resource
>>>
>>> This maybe a problem when unmap is very frequently, the combination of tlbi
>>> and wait operation will consume a lot of time. A feasible method is put off
>>> tlbi and iova-free operation, when accumulating to a certain number or
>>> reaching a specified time, execute only one tlbi_all command to clean up
>>> TLB, then free the backup IOVAs. Mark as non-strict mode.
>>>
>>> But it must be noted that, although the mapping has already been removed in
>>> the page table, it maybe still exist in TLB. And the freed physical memory
>>> may also be reused for others. So a attacker can persistent access to memory
>>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
>>> the VFIO should always choose the strict mode.
>>>
>>> Some may consider put off physical memory free also, that will still follow
>>> strict mode. But for the map_sg cases, the memory allocation is not controlled
>>> by IOMMU APIs, so it is not enforceable.
>>>
>>> Fortunately, Intel and AMD have already applied the non-strict mode, and put
>>> queue_iova() operation into the common file dma-iommu.c., and my work is based
>>> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
>>> unmap, but Intel and AMD IOMMU drivers are not.
>>>
>>> Below is the performance data of strict vs non-strict for NVMe device:
>>> Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)
>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>>
>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
> 
> The hardware is the new D06, whose SMMU supports MSIs;

Cool! Now that profiling is fairly useful since we got rid of most of 
the locks, are you able to get an idea of how the overhead in the normal 
case is distributed between arm_smmu_cmdq_insert_cmd() and 
__arm_smmu_sync_poll_msi()? We're always trying to improve our 
understanding of where command-queue-related overheads turn out to be in 
practice, and there's still potentially room to do nicer things than 
TLBI_NH_ALL ;)

Robin.

> it's not D05 :)
> 
> Thanks
> Hanjun
> 


* Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3
  2018-05-31 14:25     ` Robin Murphy
@ 2018-06-01  6:50       ` Leizhen (ThunderTown)
  2018-06-10 11:13         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-01  6:50 UTC (permalink / raw)
  To: Robin Murphy, Hanjun Guo, Will Deacon, Matthias Brugger,
	Rob Clark, Joerg Roedel, linux-mediatek, linux-arm-msm,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Libin, Guozhu Li, Xinwei Hu



On 2018/5/31 22:25, Robin Murphy wrote:
> On 31/05/18 14:49, Hanjun Guo wrote:
>> Hi Robin,
>>
>> On 2018/5/31 19:24, Robin Murphy wrote:
>>> On 31/05/18 08:42, Zhen Lei wrote:
>>>> In common, a IOMMU unmap operation follow the below steps:
>>>> 1. remove the mapping in page table of the specified iova range
>>>> 2. execute tlbi command to invalid the mapping which is cached in TLB
>>>> 3. wait for the above tlbi operation to be finished
>>>> 4. free the IOVA resource
>>>> 5. free the physical memory resource
>>>>
>>>> This maybe a problem when unmap is very frequently, the combination of tlbi
>>>> and wait operation will consume a lot of time. A feasible method is put off
>>>> tlbi and iova-free operation, when accumulating to a certain number or
>>>> reaching a specified time, execute only one tlbi_all command to clean up
>>>> TLB, then free the backup IOVAs. Mark as non-strict mode.
>>>>
>>>> But it must be noted that, although the mapping has already been removed in
>>>> the page table, it maybe still exist in TLB. And the freed physical memory
>>>> may also be reused for others. So a attacker can persistent access to memory
>>>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
>>>> the VFIO should always choose the strict mode.
>>>>
>>>> Some may consider put off physical memory free also, that will still follow
>>>> strict mode. But for the map_sg cases, the memory allocation is not controlled
>>>> by IOMMU APIs, so it is not enforceable.
>>>>
>>>> Fortunately, Intel and AMD have already applied the non-strict mode, and put
>>>> queue_iova() operation into the common file dma-iommu.c., and my work is based
>>>> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
>>>> unmap, but Intel and AMD IOMMU drivers are not.
>>>>
>>>> Below is the performance data of strict vs non-strict for NVMe device:
>>>> Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)
>>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>>>
>>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
I will try your patch to see how much it can improve. I think the best way
to resolve the globally-blocking sync is for the hardware to provide a 64-bit
CONS register, so that it can never wrap, and the spinlock can also
be removed.
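
Roughly this kind of thing (purely hypothetical; "cons64" is an invented
field, nothing like it exists in the SMMUv3 spec today):

	/*
	 * With a monotonically increasing 64-bit consumer index, "has my
	 * command been consumed?" becomes a plain comparison: no wrap
	 * handling, and no lock needed around the check.
	 */
	static bool arm_smmu_cmd_consumed(struct arm_smmu_queue *q, u64 my_prod)
	{
		return READ_ONCE(q->cons64) > my_prod;
	}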

>>
>> The hardware is the new D06, whose SMMU supports MSIs;
> 
> Cool! Now that profiling is fairly useful since we got rid of most of the locks, are you able to get an idea of how the overhead in the normal case is distributed between arm_smmu_cmdq_insert_cmd() and __arm_smmu_sync_poll_msi()? We're always trying to improve our understanding of where command-queue-related overheads turn out to be in practice, and there's still potentially room to do nicer things than TLBI_NH_ALL ;)
Even if the software has no overhead, there may still be a problem, because
the SMMU needs to execute the commands in sequence, especially before
the globally-blocking sync has been removed. Based on the actual execution time
of a single tlbi and sync, we can get the theoretical upper limit.
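For example, if a single tlbi+sync pair costs on the order of 1us to execute
(an illustrative figure, not a measurement), one command queue that serialises
them can retire at most about one million strict unmaps per second, no matter
how cheap the software path becomes.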

BTW, I will reply to the rest of the mails next week. I'm busy with other things now.

> 
> Robin.
> 
>> it's not D05 :)
>>
>> Thanks
>> Hanjun
>>
> 
> .
> 

-- 
Thanks!
Best Regards


* Re: [PATCH 5/7] iommu/dma: add support for non-strict mode
  2018-05-31  7:42 ` [PATCH 5/7] iommu/dma: add support for non-strict mode Zhen Lei
  2018-05-31 13:04   ` Robin Murphy
@ 2018-06-01 17:51   ` kbuild test robot
  2018-06-04 12:19     ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 23+ messages in thread
From: kbuild test robot @ 2018-06-01 17:51 UTC (permalink / raw)
  To: Zhen Lei
  Cc: kbuild-all, Robin Murphy, Will Deacon, Matthias Brugger,
	Rob Clark, Joerg Roedel, linux-mediatek, linux-arm-msm,
	linux-arm-kernel, iommu, linux-kernel, Zhen Lei, Hanjun Guo,
	Libin, Guozhu Li, Xinwei Hu


Hi Zhen,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc7 next-20180601]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Zhen-Lei/add-non-strict-mode-support-for-arm-smmu-v3/20180602-000418
config: x86_64-randconfig-x008-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers//iommu/amd_iommu.c: In function 'amd_iommu_capable':
>> drivers//iommu/amd_iommu.c:3053:2: warning: enumeration value 'IOMMU_CAP_NON_STRICT' not handled in switch [-Wswitch]
     switch (cap) {
     ^~~~~~

vim +/IOMMU_CAP_NON_STRICT +3053 drivers//iommu/amd_iommu.c

645c4c8d arch/x86/kernel/amd_iommu.c Joerg Roedel 2008-12-02  3050  
ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3051  static bool amd_iommu_capable(enum iommu_cap cap)
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3052  {
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 @3053  	switch (cap) {
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3054  	case IOMMU_CAP_CACHE_COHERENCY:
ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3055  		return true;
bdddadcb drivers/iommu/amd_iommu.c   Joerg Roedel 2012-07-02  3056  	case IOMMU_CAP_INTR_REMAP:
ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3057  		return (irq_remapping_enabled == 1);
cfdeec22 drivers/iommu/amd_iommu.c   Will Deacon  2014-10-27  3058  	case IOMMU_CAP_NOEXEC:
cfdeec22 drivers/iommu/amd_iommu.c   Will Deacon  2014-10-27  3059  		return false;
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3060  	}
80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3061  
ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3062  	return false;
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3063  }
dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3064  

:::::: The code at line 3053 was first introduced by commit
:::::: 80a506b8fdcfa868bb53eb740f928217d0966fc1 x86/amd-iommu: Export cache-coherency capability

:::::: TO: Joerg Roedel <joerg.roedel@amd.com>
:::::: CC: Joerg Roedel <joerg.roedel@amd.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 1/7] iommu/dma: fix trival coding style mistake
  2018-05-31 13:03   ` Robin Murphy
@ 2018-06-04 11:05     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-04 11:05 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu



On 2018/5/31 21:03, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> The static function iova_reserve_iommu_regions() is only called by
>> iommu_dma_init_domain(), and the 'if (!dev)' check in iommu_dma_init_domain()
>> affects it only, so we can safely move the check into it. I think it looks
>> more natural.
> 
> As before, I disagree - the logic of iommu_dma_init_domain() is "we expect to have a valid device, but stop here if we don't", and moving the check just needlessly obfuscates that. It is not a coincidence that the arguments of both functions are in effective order of importance.
OK

> 
>> In addition, the local variable 'ret' is only assigned in the branch of
>> 'if (region->type == IOMMU_RESV_MSI)', so the 'if (ret)' should also only
>> take effect in that branch; add braces to enclose it.
> 
> 'ret' is clearly also assigned at its declaration, to cover the (very likely) case where we don't enter the loop at all. Thus testing it in the loop is harmless, and cluttering that up with extra tabs and braces is just noise.
OK, I will drop this patch in v2

> 
> Robin.
> 
>> No functional changes.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/dma-iommu.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index ddcbbdb..4e885f7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>       LIST_HEAD(resv_regions);
>>       int ret = 0;
>>   +    if (!dev)
>> +        return 0;
>> +
>>       if (dev_is_pci(dev))
>>           iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>>   @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>           hi = iova_pfn(iovad, region->start + region->length - 1);
>>           reserve_iova(iovad, lo, hi);
>>   -        if (region->type == IOMMU_RESV_MSI)
>> +        if (region->type == IOMMU_RESV_MSI) {
>>               ret = cookie_init_hw_msi_region(cookie, region->start,
>>                       region->start + region->length);
>> -        if (ret)
>> -            break;
>> +            if (ret)
>> +                break;
>> +        }
>>       }
>>       iommu_put_resv_regions(dev, &resv_regions);
>>   @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>       }
>>         init_iova_domain(iovad, 1UL << order, base_pfn);
>> -    if (!dev)
>> -        return 0;
>>         return iova_reserve_iommu_regions(dev, domain);
>>   }
>>
> 
> .
> 

-- 
Thanks!
Best Regards


* Re: [PATCH 3/7] iommu: prepare for the non-strict mode support
  2018-05-31 13:04   ` Robin Murphy
@ 2018-06-04 11:27     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-04 11:27 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> In common, a IOMMU unmap operation follow the below steps:
>> 1. remove the mapping in page table of the specified iova range
>> 2. execute tlbi command to invalid the mapping which is cached in TLB
>> 3. wait for the above tlbi operation to be finished
>> 4. free the IOVA resource
>> 5. free the physical memory resource
>>
>> This maybe a problem when unmap is very frequently, the combination of tlbi
>> and wait operation will consume a lot of time. A feasible method is put off
>> tlbi and iova-free operation, when accumulating to a certain number or
>> reaching a specified time, execute only one tlbi_all command to clean up
>> TLB, then free the backup IOVAs. Mark as non-strict mode.
>>
>> But it must be noted that, although the mapping has already been removed in
>> the page table, it maybe still exist in TLB. And the freed physical memory
>> may also be reused for others. So a attacker can persistent access to memory
>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
>> the VFIO should always choose the strict mode.
>>
>> This patch just adds a new parameter to the unmap operation, to let the
>> upper-level functions choose which mode to apply.
> 
> This seems like it might be better handled by a flag in io_pgtable_cfg->quirks. This interface change on its own looks rather invasive, and the fact that it ends up only being used to pass through a constant property of the domain (which is already known by the point io_pgtable_alloc() is called) implies that it is indeed the wrong level of abstraction.
> 
Sounds good. Thanks for your suggestion, I will try it in v2.
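
Perhaps something like this (just a sketch of the quirk-based shape; the
quirk name and bit value below are my assumptions, nothing is settled):

	/* io-pgtable.h: hypothetical quirk, "TLB invalidation may be deferred" */
	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)

	/* at io_pgtable_alloc() time, e.g. in arm_smmu_domain_finalise(): */
	if (!IOMMU_DOMAIN_IS_STRICT(domain))
		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;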

>> No functional changes.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/arm-smmu-v3.c        | 2 +-
>>   drivers/iommu/arm-smmu.c           | 2 +-
>>   drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
>>   drivers/iommu/io-pgtable-arm.c     | 6 +++---
>>   drivers/iommu/io-pgtable.h         | 2 +-
>>   drivers/iommu/ipmmu-vmsa.c         | 2 +-
>>   drivers/iommu/msm_iommu.c          | 2 +-
>>   drivers/iommu/mtk_iommu.c          | 2 +-
>>   drivers/iommu/qcom_iommu.c         | 2 +-
>>   include/linux/iommu.h              | 2 ++
> 
> Plus things specific to io-pgtable shouldn't really be spilling into the core API header either.
> 
> Robin.
> 
>>   10 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 4402187..59b3387 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>       if (!ops)
>>           return 0;
>>   -    return ops->unmap(ops, iova, size);
>> +    return ops->unmap(ops, iova, size, IOMMU_STRICT);
>>   }
>>     static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 69e7c60..253e807 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>       if (!ops)
>>           return 0;
>>   -    return ops->unmap(ops, iova, size);
>> +    return ops->unmap(ops, iova, size, IOMMU_STRICT);
>>   }
>>     static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
>> index 10e4a3d..799eced 100644
>> --- a/drivers/iommu/io-pgtable-arm-v7s.c
>> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
>> @@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>>   }
>>     static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>> -                size_t size)
>> +                size_t size, int strict)
>>   {
>>       struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>   @@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
>>       size = 1UL << __ffs(cfg.pgsize_bitmap);
>>       while (i < loopnr) {
>>           iova_start = i * SZ_16M;
>> -        if (ops->unmap(ops, iova_start + size, size) != size)
>> +        if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
>>               return __FAIL(ops);
>>             /* Remap of partial unmap */
>> @@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
>>       while (i != BITS_PER_LONG) {
>>           size = 1UL << i;
>>   -        if (ops->unmap(ops, iova, size) != size)
>> +        if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>>               return __FAIL(ops);
>>             if (ops->iova_to_phys(ops, iova + 42))
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 39c2a05..e0f52db 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>   }
>>     static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>> -                 size_t size)
>> +                 size_t size, int strict)
>>   {
>>       struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>       arm_lpae_iopte *ptep = data->pgd;
>> @@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>>             /* Partial unmap */
>>           size = 1UL << __ffs(cfg->pgsize_bitmap);
>> -        if (ops->unmap(ops, SZ_1G + size, size) != size)
>> +        if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
>>               return __FAIL(ops, i);
>>             /* Remap of partial unmap */
>> @@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>>           while (j != BITS_PER_LONG) {
>>               size = 1UL << j;
>>   -            if (ops->unmap(ops, iova, size) != size)
>> +            if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>>                   return __FAIL(ops, i);
>>                 if (ops->iova_to_phys(ops, iova + 42))
>> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
>> index 2df7909..2908806 100644
>> --- a/drivers/iommu/io-pgtable.h
>> +++ b/drivers/iommu/io-pgtable.h
>> @@ -120,7 +120,7 @@ struct io_pgtable_ops {
>>       int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
>>              phys_addr_t paddr, size_t size, int prot);
>>       size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
>> -            size_t size);
>> +            size_t size, int strict);
>>       phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>>                       unsigned long iova);
>>   };
>> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
>> index 40ae6e8..e6d9e11 100644
>> --- a/drivers/iommu/ipmmu-vmsa.c
>> +++ b/drivers/iommu/ipmmu-vmsa.c
>> @@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
>>   {
>>       struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>>   -    return domain->iop->unmap(domain->iop, iova, size);
>> +    return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
>>   }
>>     static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
>> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>> index 0d33504..180fa3d 100644
>> --- a/drivers/iommu/msm_iommu.c
>> +++ b/drivers/iommu/msm_iommu.c
>> @@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>>       unsigned long flags;
>>         spin_lock_irqsave(&priv->pgtlock, flags);
>> -    len = priv->iop->unmap(priv->iop, iova, len);
>> +    len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
>>       spin_unlock_irqrestore(&priv->pgtlock, flags);
>>         return len;
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index f2832a1..54661ed 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>>       size_t unmapsz;
>>         spin_lock_irqsave(&dom->pgtlock, flags);
>> -    unmapsz = dom->iop->unmap(dom->iop, iova, size);
>> +    unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
>>       spin_unlock_irqrestore(&dom->pgtlock, flags);
>>         return unmapsz;
>> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
>> index 65b9c99..90abde1 100644
>> --- a/drivers/iommu/qcom_iommu.c
>> +++ b/drivers/iommu/qcom_iommu.c
>> @@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>>        */
>>       pm_runtime_get_sync(qcom_domain->iommu->dev);
>>       spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
>> -    ret = ops->unmap(ops, iova, size);
>> +    ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
>>       spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>>       pm_runtime_put_sync(qcom_domain->iommu->dev);
>>   diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 19938ee..39b3150 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -86,6 +86,8 @@ struct iommu_domain_geometry {
>>   #define IOMMU_DOMAIN_DMA    (__IOMMU_DOMAIN_PAGING |    \
>>                    __IOMMU_DOMAIN_DMA_API)
>>   +#define IOMMU_STRICT        1
>> +
>>   struct iommu_domain {
>>       unsigned type;
>>       const struct iommu_ops *ops;
>>
> 
> .
> 

-- 
Thanks!
Best Regards


* Re: [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed
  2018-05-31 13:04   ` Robin Murphy
@ 2018-06-04 11:41     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-04 11:41 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> Although the mapping has already been removed in the page table, it may
>> still exist in the TLB. Suppose the freed IOVAs are reused by others before
>> the flush operation completes; then the new user cannot correctly access its
>> memory.
> 
> This change seems reasonable in isolation, but why is it right in the middle of a series which has nothing to do with x86?
Because I described more of the background in the previous patch, which may help this patch be understood better.

You're right, I will repost this patch separately.

> 
> Robin.
> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/amd_iommu.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 8fb8c73..93aa389 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
>>       }
>>         if (amd_iommu_unmap_flush) {
>> -        dma_ops_free_iova(dma_dom, dma_addr, pages);
>>           domain_flush_tlb(&dma_dom->domain);
>>           domain_flush_complete(&dma_dom->domain);
>> +        dma_ops_free_iova(dma_dom, dma_addr, pages);
>>       } else {
>>           pages = __roundup_pow_of_two(pages);
>>           queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
>>
> 
> .
> 

-- 
Thanks!
Best Regards


* Re: [PATCH 5/7] iommu/dma: add support for non-strict mode
  2018-05-31 13:04   ` Robin Murphy
@ 2018-06-04 12:04     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-04 12:04 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Matthias Brugger, Rob Clark,
	Joerg Roedel, linux-mediatek, linux-arm-msm, linux-arm-kernel,
	iommu, linux-kernel
  Cc: Hanjun Guo, Libin, Guozhu Li, Xinwei Hu



On 2018/5/31 21:04, Robin Murphy wrote:
> On 31/05/18 08:42, Zhen Lei wrote:
>> 1. Save the related domain pointer in struct iommu_dma_cookie, so that iovad
>>     is able to call domain->ops->flush_iotlb_all to flush the TLB.
>> 2. Define a new iommu capability: IOMMU_CAP_NON_STRICT, which is used to
>>     indicate that the iommu domain supports non-strict mode.
>> 3. During the iommu domain initialization phase, call capable() to check
>>     whether it supports non-strict mode. If so, call init_iova_flush_queue
>>     to register the iovad->flush_cb callback.
>> 4. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
>>     -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
>>     iommu supports non-strict mode or not, and call IOMMU_DOMAIN_IS_STRICT to
>>     make sure the IOMMU_DOMAIN_UNMANAGED domain always follows strict mode.
> 
> Once again, this is a whole load of complexity for a property which could just be statically encoded at allocation, e.g. in the cookie type.
That's right. Passing the domain to the static function iommu_dma_free_iova will be better.
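
For instance (a sketch of the v2 shape; whether the strictness test ends up
on the domain type or the cookie type is still open):

	static void iommu_dma_free_iova(struct iommu_domain *domain,
					dma_addr_t iova, size_t size)
	{
		struct iommu_dma_cookie *cookie = domain->iova_cookie;
		struct iova_domain *iovad = &cookie->iovad;

		/* The MSI case is only ever cleaning up its most recent allocation */
		if (cookie->type == IOMMU_DMA_MSI_COOKIE)
			cookie->msi_iova -= size;
		else if (!IOMMU_DOMAIN_IS_STRICT(domain) && iovad->flush_cb)
			queue_iova(iovad, iova_pfn(iovad, iova),
					size >> iova_shift(iovad), 0);
		else
			free_iova_fast(iovad, iova_pfn(iovad, iova),
					size >> iova_shift(iovad));
	}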

> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>   drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
>>   include/linux/iommu.h     |  3 +++
>>   2 files changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 4e885f7..2e116d9 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>>       };
>>       struct list_head        msi_page_list;
>>       spinlock_t            msi_lock;
>> +    struct iommu_domain        *domain;
>>   };
>>     static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>> @@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>>       return PAGE_SIZE;
>>   }
>>   -static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>> +static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
>> +                         enum iommu_dma_cookie_type type)
>>   {
>>       struct iommu_dma_cookie *cookie;
>>   @@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>>           spin_lock_init(&cookie->msi_lock);
>>           INIT_LIST_HEAD(&cookie->msi_page_list);
>>           cookie->type = type;
>> +        cookie->domain = domain;
>>       }
>>       return cookie;
>>   }
>> @@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
>>       if (domain->iova_cookie)
>>           return -EEXIST;
>>   -    domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
>> +    domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
>>       if (!domain->iova_cookie)
>>           return -ENOMEM;
>>   @@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>>       if (domain->iova_cookie)
>>           return -EEXIST;
>>   -    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +    cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
>>       if (!cookie)
>>           return -ENOMEM;
>>   @@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>       return ret;
>>   }
>>   +static void iova_flush_iotlb_all(struct iova_domain *iovad)
> 
> iommu_dma_flush...
OK

> 
>> +{
>> +    struct iommu_dma_cookie *cookie;
>> +    struct iommu_domain *domain;
>> +
>> +    cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
>> +    domain = cookie->domain;
>> +
>> +    domain->ops->flush_iotlb_all(domain);
>> +}
>> +
>>   /**
>>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>> @@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>>   int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>           u64 size, struct device *dev)
>>   {
>> +    const struct iommu_ops *ops = domain->ops;
>>       struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>       struct iova_domain *iovad = &cookie->iovad;
>>       unsigned long order, base_pfn, end_pfn;
>> @@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>         init_iova_domain(iovad, 1UL << order, base_pfn);
>>   +    if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
>> +        BUG_ON(!ops->flush_iotlb_all);
>> +        init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
>> +    }
>> +
>>       return iova_reserve_iommu_regions(dev, domain);
>>   }
>>   EXPORT_SYMBOL(iommu_dma_init_domain);
>> @@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>>       /* The MSI case is only ever cleaning up its most recent allocation */
>>       if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>>           cookie->msi_iova -= size;
>> +    else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
>> +        queue_iova(iovad, iova_pfn(iovad, iova),
>> +                size >> iova_shift(iovad), 0);
>>       else
>>           free_iova_fast(iovad, iova_pfn(iovad, iova),
>>                   size >> iova_shift(iovad));
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 39b3150..01ff569 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -87,6 +87,8 @@ struct iommu_domain_geometry {
>>                    __IOMMU_DOMAIN_DMA_API)
>>     #define IOMMU_STRICT        1
>> +#define IOMMU_DOMAIN_IS_STRICT(domain)    \
>> +        (domain->type == IOMMU_DOMAIN_UNMANAGED)
>>     struct iommu_domain {
>>       unsigned type;
>> @@ -103,6 +105,7 @@ enum iommu_cap {
>>                          transactions */
>>       IOMMU_CAP_INTR_REMAP,        /* IOMMU supports interrupt isolation */
>>       IOMMU_CAP_NOEXEC,        /* IOMMU_NOEXEC flag */
>> +    IOMMU_CAP_NON_STRICT,        /* IOMMU supports non-strict mode */
> 
> This isn't a property of the IOMMU, it depends purely on the driver implementation. I think it also doesn't matter anyway - if a caller asks for lazy unmapping on their domain but the IOMMU driver just does strict unmaps anyway because that's all it supports, there's no actual harm done.
> 
> Robin.
> 
>>   };
>>     /*
>>
> 
> .
> 

-- 
Thanks!
Best Regards


* Re: [PATCH 5/7] iommu/dma: add support for non-strict mode
  2018-06-01 17:51   ` kbuild test robot
@ 2018-06-04 12:19     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-04 12:19 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Robin Murphy, Will Deacon, Matthias Brugger,
	Rob Clark, Joerg Roedel, linux-mediatek, linux-arm-msm,
	linux-arm-kernel, iommu, linux-kernel, Hanjun Guo, Libin,
	Guozhu Li, Xinwei Hu



On 2018/6/2 1:51, kbuild test robot wrote:
> Hi Zhen,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.17-rc7 next-20180601]
> [cannot apply to iommu/next]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Zhen-Lei/add-non-strict-mode-support-for-arm-smmu-v3/20180602-000418
> config: x86_64-randconfig-x008-201821 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers//iommu/amd_iommu.c: In function 'amd_iommu_capable':
>>> drivers//iommu/amd_iommu.c:3053:2: warning: enumeration value 'IOMMU_CAP_NON_STRICT' not handled in switch [-Wswitch]
>      switch (cap) {
>      ^~~~~~
> 
> vim +/IOMMU_CAP_NON_STRICT +3053 drivers//iommu/amd_iommu.c
> 
> 645c4c8d arch/x86/kernel/amd_iommu.c Joerg Roedel 2008-12-02  3050  
> ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3051  static bool amd_iommu_capable(enum iommu_cap cap)
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3052  {
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27 @3053  	switch (cap) {
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3054  	case IOMMU_CAP_CACHE_COHERENCY:
> ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3055  		return true;
> bdddadcb drivers/iommu/amd_iommu.c   Joerg Roedel 2012-07-02  3056  	case IOMMU_CAP_INTR_REMAP:
> ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3057  		return (irq_remapping_enabled == 1);
> cfdeec22 drivers/iommu/amd_iommu.c   Will Deacon  2014-10-27  3058  	case IOMMU_CAP_NOEXEC:
It seems that it's better to change this to 'default'; a sketch of the resulting switch is at the end of this mail.

> cfdeec22 drivers/iommu/amd_iommu.c   Will Deacon  2014-10-27  3059  		return false;
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3060  	}
> 80a506b8 arch/x86/kernel/amd_iommu.c Joerg Roedel 2010-07-27  3061  
> ab636481 drivers/iommu/amd_iommu.c   Joerg Roedel 2014-09-05  3062  	return false;
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3063  }
> dbb9fd86 arch/x86/kernel/amd_iommu.c Sheng Yang   2009-03-18  3064  
> 
> :::::: The code at line 3053 was first introduced by commit
> :::::: 80a506b8fdcfa868bb53eb740f928217d0966fc1 x86/amd-iommu: Export cache-coherency capability
> 
> :::::: TO: Joerg Roedel <joerg.roedel@amd.com>
> :::::: CC: Joerg Roedel <joerg.roedel@amd.com>
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 
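
For reference, roughly what the switch would look like with that change (a
sketch, not a posted patch):

	static bool amd_iommu_capable(enum iommu_cap cap)
	{
		switch (cap) {
		case IOMMU_CAP_CACHE_COHERENCY:
			return true;
		case IOMMU_CAP_INTR_REMAP:
			return (irq_remapping_enabled == 1);
		default:
			/* covers IOMMU_CAP_NOEXEC, IOMMU_CAP_NON_STRICT, etc. */
			return false;
		}
	}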

-- 
Thanks!
Best Regards


* Re: [PATCH 0/7] add non-strict mode support for arm-smmu-v3
  2018-06-01  6:50       ` Leizhen (ThunderTown)
@ 2018-06-10 11:13         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 23+ messages in thread
From: Leizhen (ThunderTown) @ 2018-06-10 11:13 UTC (permalink / raw)
  To: Robin Murphy, Hanjun Guo, Will Deacon, Matthias Brugger,
	Rob Clark, Joerg Roedel, linux-mediatek, linux-arm-msm,
	linux-arm-kernel, iommu, linux-kernel
  Cc: Libin, Guozhu Li, Xinwei Hu



On 2018/6/1 14:50, Leizhen (ThunderTown) wrote:
> 
> 
> On 2018/5/31 22:25, Robin Murphy wrote:
>> On 31/05/18 14:49, Hanjun Guo wrote:
>>> Hi Robin,
>>>
>>> On 2018/5/31 19:24, Robin Murphy wrote:
>>>> On 31/05/18 08:42, Zhen Lei wrote:
>>>>> In common, a IOMMU unmap operation follow the below steps:
>>>>> 1. remove the mapping in page table of the specified iova range
>>>>> 2. execute tlbi command to invalid the mapping which is cached in TLB
>>>>> 3. wait for the above tlbi operation to be finished
>>>>> 4. free the IOVA resource
>>>>> 5. free the physical memory resource
>>>>>
>>>>> This maybe a problem when unmap is very frequently, the combination of tlbi
>>>>> and wait operation will consume a lot of time. A feasible method is put off
>>>>> tlbi and iova-free operation, when accumulating to a certain number or
>>>>> reaching a specified time, execute only one tlbi_all command to clean up
>>>>> TLB, then free the backup IOVAs. Mark as non-strict mode.
>>>>>
>>>>> But it must be noted that, although the mapping has already been removed in
>>>>> the page table, it maybe still exist in TLB. And the freed physical memory
>>>>> may also be reused for others. So a attacker can persistent access to memory
>>>>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So
>>>>> the VFIO should always choose the strict mode.
>>>>>
>>>>> Some may consider put off physical memory free also, that will still follow
>>>>> strict mode. But for the map_sg cases, the memory allocation is not controlled
>>>>> by IOMMU APIs, so it is not enforceable.
>>>>>
>>>>> Fortunately, Intel and AMD have already applied the non-strict mode, and put
>>>>> queue_iova() operation into the common file dma-iommu.c., and my work is based
>>>>> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
>>>>> unmap, but Intel and AMD IOMMU drivers are not.
>>>>>
>>>>> Below is the performance data of strict vs non-strict for NVMe device:
>>>>> Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)
>>>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)
>>>>
>>>> What hardware is this on? If it's SMMUv3 without MSIs (e.g. D05), then you'll still be using the rubbish globally-blocking sync implementation. If that is the case, I'd be very interested to see how much there is to gain from just improving that - I've had a patch kicking around for a while[1] (also on a rebased branch at [2]), but don't have the means for serious performance testing.
> I will try your patch to see how much it can improve. I think the best way
Hi Robin,

I applied your patch and got the improvement below.

Randomly Read  IOPS: 146K --> 214K
Randomly Write IOPS: 143K --> 212K
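
That is roughly a 1.5x gain from the sync rework alone, compared with the
~3.5-3.9x gain that non-strict mode showed in the cover letter numbers.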

> to resolve the globally-blocking sync is for the hardware to provide a 64-bit
> CONS register, so that it can never wrap, and the spinlock can also
> be removed.
> 
>>>
>>> The hardware is the new D06, whose SMMU supports MSIs;
>>
>> Cool! Now that profiling is fairly useful since we got rid of most of the locks, are you able to get an idea of how the overhead in the normal case is distributed between arm_smmu_cmdq_insert_cmd() and __arm_smmu_sync_poll_msi()? We're always trying to improve our understanding of where command-queue-related overheads turn out to be in practice, and there's still potentially room to do nicer things than TLBI_NH_ALL ;)
> Even if the software has no overhead, there may still be a problem, because
> the SMMU needs to execute the commands in sequence, especially before
> the globally-blocking sync has been removed. Based on the actual execution time
> of a single tlbi and sync, we can get the theoretical upper limit.
> 
> BTW, I will reply to the rest of the mails next week. I'm busy with other things now.
> 
>>
>> Robin.
>>
>>> it's not D05 :)
>>>
>>> Thanks
>>> Hanjun
>>>
>>
>> .
>>
> 

-- 
Thanks!
Best Regards


end of thread

Thread overview: 23+ messages
2018-05-31  7:42 [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Zhen Lei
2018-05-31  7:42 ` [PATCH 1/7] iommu/dma: fix trival coding style mistake Zhen Lei
2018-05-31 13:03   ` Robin Murphy
2018-06-04 11:05     ` Leizhen (ThunderTown)
2018-05-31  7:42 ` [PATCH 2/7] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook Zhen Lei
2018-05-31  7:42 ` [PATCH 3/7] iommu: prepare for the non-strict mode support Zhen Lei
2018-05-31 13:04   ` Robin Murphy
2018-06-04 11:27     ` Leizhen (ThunderTown)
2018-05-31  7:42 ` [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed Zhen Lei
2018-05-31 13:04   ` Robin Murphy
2018-06-04 11:41     ` Leizhen (ThunderTown)
2018-05-31  7:42 ` [PATCH 5/7] iommu/dma: add support for non-strict mode Zhen Lei
2018-05-31 13:04   ` Robin Murphy
2018-06-04 12:04     ` Leizhen (ThunderTown)
2018-06-01 17:51   ` kbuild test robot
2018-06-04 12:19     ` Leizhen (ThunderTown)
2018-05-31  7:42 ` [PATCH 6/7] iommu/io-pgtable-arm: " Zhen Lei
2018-05-31  7:42 ` [PATCH 7/7] iommu/arm-smmu-v3: " Zhen Lei
2018-05-31 11:24 ` [PATCH 0/7] add non-strict mode support for arm-smmu-v3 Robin Murphy
2018-05-31 13:49   ` Hanjun Guo
2018-05-31 14:25     ` Robin Murphy
2018-06-01  6:50       ` Leizhen (ThunderTown)
2018-06-10 11:13         ` Leizhen (ThunderTown)
