linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Use 1st-level for IOVA translation
@ 2019-12-11  2:12 Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 1/6] iommu/vt-d: Identify domains using first level page table Lu Baolu
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

Intel VT-d in scalable mode supports two types of page tables
for DMA translation: the first level page table and the second
level page table. The first level page table uses the same
format as the CPU page table, while the second level page table
keeps compatible with previous formats. The software is able
to choose any one of them for DMA remapping according to the use
case.

This patchset aims to move IOVA (I/O Virtual Address) translation
to 1st-level page table in scalable mode. This will simplify vIOMMU
(IOMMU simulated by VM hypervisor) design by using the two-stage
translation, a.k.a. nested mode translation.

As Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out GIOVA->GPA mappings
and write them to a shadowed page table, which will be used by the
physical IOMMU. Each time when mappings are created or destroyed in
vIOMMU, the simulation software has to intervene. Hence, the changes
on GIOVA->GPA could be shadowed to host.


     .-----------.
     |  vIOMMU   |
     |-----------|                 .--------------------.
     |           |IOTLB flush trap |        QEMU        |
     .-----------. (map/unmap)     |--------------------|
     |GIOVA->GPA |---------------->|    .------------.  |
     '-----------'                 |    | GIOVA->HPA |  |
     |           |                 |    '------------'  |
     '-----------'                 |                    |
                                   |                    |
                                   '--------------------'
                                                |
            <------------------------------------
            |
            v VFIO/IOMMU API
      .-----------.
      |  pIOMMU   |
      |-----------|
      |           |
      .-----------.
      |GIOVA->HPA |
      '-----------'
      |           |
      '-----------'

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and nested translation mode. Regards to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over 1st-level page table to store GIOVA->GPA mapping in vIOMMU,
2) binding vIOMMU 1st level page table to the pIOMMU, 3) using pIOMMU
second level for GPA->HPA translation, and 4) enable nested (a.k.a.
dual-stage) translation in host. Compared with current shadow GIOVA
support, the new approach makes the vIOMMU design simpler and more
efficient as we only need to flush the pIOMMU IOTLB and possible
device-IOTLB when an IOVA mapping in vIOMMU is torn down.

     .-----------.
     |  vIOMMU   |
     |-----------|                 .-----------.
     |           |IOTLB flush trap |   QEMU    |
     .-----------.    (unmap)      |-----------|
     |GIOVA->GPA |---------------->|           |
     '-----------'                 '-----------'
     |           |                       |
     '-----------'                       |
           <------------------------------
           |      VFIO/IOMMU          
           |  cache invalidation and  
           | guest gpd bind interfaces
           v
     .-----------.
     |  pIOMMU   |
     |-----------|
     .-----------.
     |GIOVA->GPA |<---First level
     '-----------'
     | GPA->HPA  |<---Scond level
     '-----------'
     '-----------'

This patch applies the first level page table for IOVA translation
unless the DOMAIN_ATTR_NESTING domain attribution has been set.
Setting of this attribution means the second level will be used to
map gPA (guest physical address) to hPA (host physical address), and
the mappings between gVA (guest virtual address) and gPA will be
maintained by the guest with the page table address binding to host's
first level.

Based-on-idea-by: Ashok Raj <ashok.raj@intel.com>
Based-on-idea-by: Kevin Tian <kevin.tian@intel.com>
Based-on-idea-by: Liu Yi L <yi.l.liu@intel.com>
Based-on-idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Based-on-idea-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Based-on-idea-by: Lu Baolu <baolu.lu@linux.intel.com>

Change log:

v2->v3:
 - The previous version was posted here
   https://lkml.org/lkml/2019/11/27/1831
 - Accept Jacob's suggestion on merging two page tables.

 v1->v2
 - The first series was posted here
   https://lkml.org/lkml/2019/9/23/297
 - Use per domain page table ops to handle different page tables.
 - Use first level for DMA remapping by default on both bare metal
   and vm guest.
 - Code refine according to code review comments for v1.

Lu Baolu (6):
  iommu/vt-d: Identify domains using first level page table
  iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr
  iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup
  iommu/vt-d: Setup pasid entries for iova over first level
  iommu/vt-d: Flush PASID-based iotlb for iova over first level
  iommu/vt-d: Use iova over first level

 drivers/iommu/dmar.c        |  41 ++++++++
 drivers/iommu/intel-iommu.c | 185 ++++++++++++++++++++++++++++++++----
 drivers/iommu/intel-pasid.c |   7 +-
 drivers/iommu/intel-pasid.h |   6 ++
 drivers/iommu/intel-svm.c   |   8 +-
 include/linux/intel-iommu.h |  12 ++-
 6 files changed, 231 insertions(+), 28 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/6] iommu/vt-d: Identify domains using first level page table
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 2/6] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr Lu Baolu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

This checks whether a domain should use the first level page
table for map/unmap and marks it in the domain structure.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 39 +++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 82b9687e82d3..c93fe716e6b0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -307,6 +307,14 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN		BIT(1)
 
+/*
+ * When VT-d works in the scalable mode, it allows DMA translation to
+ * happen through either first level or second level page table. This
+ * bit marks that the DMA translation for the domain goes through the
+ * first level page table, otherwise, it goes through the second level.
+ */
+#define DOMAIN_FLAG_USE_FIRST_LEVEL		BIT(2)
+
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
 		if (domain->iommu_refcnt[idx])
@@ -1714,6 +1722,35 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }
 
+/*
+ * Check and return whether first level is used by default for
+ * DMA translation. Currently, we make it off by setting
+ * first_level_support = 0, and will change it to -1 after all
+ * map/unmap paths support first level page table.
+ */
+static bool first_level_by_default(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	static int first_level_support = 0;
+
+	if (likely(first_level_support != -1))
+		return first_level_support;
+
+	first_level_support = 1;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) {
+			first_level_support = 0;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return first_level_support;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
 	struct dmar_domain *domain;
@@ -1725,6 +1762,8 @@ static struct dmar_domain *alloc_domain(int flags)
 	memset(domain, 0, sizeof(*domain));
 	domain->nid = NUMA_NO_NODE;
 	domain->flags = flags;
+	if (first_level_by_default())
+		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/6] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 1/6] iommu/vt-d: Identify domains using first level page table Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 3/6] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup Lu Baolu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu, Yi Sun

This adds the Intel VT-d specific callback of setting
DOMAIN_ATTR_NESTING domain attribution. It is necessary
to let the VT-d driver know that the domain represents
a virtual machine which requires the IOMMU hardware to
support nested translation mode. Return success if the
IOMMU hardware suports nested mode, otherwise failure.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c93fe716e6b0..2b5a47584baf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,6 +315,12 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_USE_FIRST_LEVEL		BIT(2)
 
+/*
+ * Domain represents a virtual machine which demands iommu nested
+ * translation mode support.
+ */
+#define DOMAIN_FLAG_NESTING_MODE		BIT(3)
+
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
 		if (domain->iommu_refcnt[idx])
@@ -5635,6 +5641,24 @@ static inline bool iommu_pasid_support(void)
 	return ret;
 }
 
+static inline bool nested_mode_support(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	bool ret = true;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+			ret = false;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
 	if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -6013,10 +6037,42 @@ static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 	return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }
 
+static int
+intel_iommu_domain_set_attr(struct iommu_domain *domain,
+			    enum iommu_attr attr, void *data)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long flags;
+	int ret = 0;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+		return -EINVAL;
+
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+		spin_lock_irqsave(&device_domain_lock, flags);
+		if (nested_mode_support() &&
+		    list_empty(&dmar_domain->devices)) {
+			dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
+			dmar_domain->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
+		} else {
+			ret = -ENODEV;
+		}
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
+	.domain_set_attr	= intel_iommu_domain_set_attr,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/6] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 1/6] iommu/vt-d: Identify domains using first level page table Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 2/6] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level Lu Baolu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

Current intel_pasid_setup_first_level() use 5-level paging for
first level translation if CPUs use 5-level paging mode too.
This makes sense for SVA usages since the page table is shared
between CPUs and IOMMUs. But it makes no sense if we only want
to use first level for IOVA translation. Add PASID_FLAG_FL5LP
bit in the flags which indicates whether the 5-level paging
mode should be used.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 7 ++-----
 drivers/iommu/intel-pasid.h | 6 ++++++
 drivers/iommu/intel-svm.c   | 8 ++++++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 3cb569e76642..22b30f10b396 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -477,18 +477,15 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 		pasid_set_sre(pte);
 	}
 
-#ifdef CONFIG_X86
-	/* Both CPU and IOMMU paging mode need to match */
-	if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+	if (flags & PASID_FLAG_FL5LP) {
 		if (cap_5lp_support(iommu->cap)) {
 			pasid_set_flpm(pte, 1);
 		} else {
-			pr_err("VT-d has no 5-level paging support for CPU\n");
+			pr_err("No 5-level paging support for first-level\n");
 			pasid_clear_entry(pte);
 			return -EINVAL;
 		}
 	}
-#endif /* CONFIG_X86 */
 
 	pasid_set_domain_id(pte, did);
 	pasid_set_address_width(pte, iommu->agaw);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index fc8cd8f17de1..92de6df24ccb 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -37,6 +37,12 @@
  */
 #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
 
+/*
+ * The PASID_FLAG_FL5LP flag Indicates using 5-level paging for first-
+ * level translation, otherwise, 4-level paging will be used.
+ */
+#define PASID_FLAG_FL5LP		BIT(1)
+
 struct pasid_dir_entry {
 	u64 val;
 };
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 04023033b79f..d7f2a5358900 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -364,7 +364,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		ret = intel_pasid_setup_first_level(iommu, dev,
 				mm ? mm->pgd : init_mm.pgd,
 				svm->pasid, FLPT_DEFAULT_DID,
-				mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+				(mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+				(cpu_feature_enabled(X86_FEATURE_LA57) ?
+				 PASID_FLAG_FL5LP : 0));
 		spin_unlock(&iommu->lock);
 		if (ret) {
 			if (mm)
@@ -385,7 +387,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		ret = intel_pasid_setup_first_level(iommu, dev,
 						mm ? mm->pgd : init_mm.pgd,
 						svm->pasid, FLPT_DEFAULT_DID,
-						mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+						(mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+						(cpu_feature_enabled(X86_FEATURE_LA57) ?
+						PASID_FLAG_FL5LP : 0));
 		spin_unlock(&iommu->lock);
 		if (ret) {
 			kfree(sdev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
                   ` (2 preceding siblings ...)
  2019-12-11  2:12 ` [PATCH v3 3/6] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  2019-12-13  9:23   ` Liu, Yi L
  2019-12-11  2:12 ` [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb " Lu Baolu
  2019-12-11  2:12 ` [PATCH v3 6/6] iommu/vt-d: Use " Lu Baolu
  5 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 10 ++++----
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2b5a47584baf..83a7abf0c4f0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
 	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline bool domain_use_first_level(struct dmar_domain *domain)
+{
+	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
 				       unsigned long pfn)
 {
@@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		return -EINVAL;
 
 	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
+	if (domain_use_first_level(domain))
+		prot |= DMA_FL_PTE_PRESENT;
 
 	if (!sg) {
 		sg_res = nr_pages;
@@ -2515,6 +2522,36 @@ dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 	return NULL;
 }
 
+static int domain_setup_first_level(struct intel_iommu *iommu,
+				    struct dmar_domain *domain,
+				    struct device *dev,
+				    int pasid)
+{
+	int flags = PASID_FLAG_SUPERVISOR_MODE;
+	struct dma_pte *pgd = domain->pgd;
+	int agaw, level;
+
+	/*
+	 * Skip top levels of page tables for iommu which has
+	 * less agaw than default. Unnecessary for PT mode.
+	 */
+	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd))
+			return -ENOMEM;
+	}
+
+	level = agaw_to_level(agaw);
+	if (level != 4 && level != 5)
+		return -EINVAL;
+
+	flags |= (level == 5) ? PASID_FLAG_FL5LP : 0;
+
+	return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
+					     domain->iommu_did[iommu->seq_id],
+					     flags);
+}
+
 static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 						    int bus, int devfn,
 						    struct device *dev,
@@ -2614,6 +2651,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 		if (hw_pass_through && domain_type_is_si(domain))
 			ret = intel_pasid_setup_pass_through(iommu, domain,
 					dev, PASID_RID2PASID);
+		else if (domain_use_first_level(domain))
+			ret = domain_setup_first_level(iommu, domain, dev,
+					PASID_RID2PASID);
 		else
 			ret = intel_pasid_setup_second_level(iommu, domain,
 					dev, PASID_RID2PASID);
@@ -5369,8 +5409,12 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 		goto attach_failed;
 
 	/* Setup the PASID entry for mediated devices: */
-	ret = intel_pasid_setup_second_level(iommu, domain, dev,
-					     domain->default_pasid);
+	if (domain_use_first_level(domain))
+		ret = domain_setup_first_level(iommu, domain, dev,
+					       domain->default_pasid);
+	else
+		ret = intel_pasid_setup_second_level(iommu, domain, dev,
+						     domain->default_pasid);
 	if (ret)
 		goto table_failed;
 	spin_unlock(&iommu->lock);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aaece25c055f..66b525bad434 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -34,10 +34,12 @@
 #define VTD_STRIDE_SHIFT        (9)
 #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
 
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-#define DMA_PTE_LARGE_PAGE (1 << 7)
-#define DMA_PTE_SNP (1 << 11)
+#define DMA_PTE_READ		(1)
+#define DMA_PTE_WRITE		(2)
+#define DMA_PTE_LARGE_PAGE	(1 << 7)
+#define DMA_PTE_SNP		(1 << 11)
+
+#define DMA_FL_PTE_PRESENT	(1)
 
 #define CONTEXT_TT_MULTI_LEVEL	0
 #define CONTEXT_TT_DEV_IOTLB	1
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
                   ` (3 preceding siblings ...)
  2019-12-11  2:12 ` [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  2019-12-13 11:42   ` Liu, Yi L
  2019-12-11  2:12 ` [PATCH v3 6/6] iommu/vt-d: Use " Lu Baolu
  5 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

When software has changed first-level tables, it should invalidate
the affected IOTLB and the paging-structure-caches using the PASID-
based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/dmar.c        | 41 ++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-iommu.c | 44 ++++++++++++++++++++++++-------------
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 3acfa6a25fa2..fb30d5053664 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB invalidation */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
+		     unsigned long npages, bool ih)
+{
+	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+	/*
+	 * npages == -1 means a PASID-selective invalidation, otherwise,
+	 * a positive value for Page-selective-within-PASID invalidation.
+	 * 0 is not a valid input.
+	 */
+	if (WARN_ON(!npages)) {
+		pr_err("Invalid input npages = %ld\n", npages);
+		return;
+	}
+
+	if (npages == -1) {
+		desc.qw0 = QI_EIOTLB_PASID(pasid) |
+				QI_EIOTLB_DID(did) |
+				QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+				QI_EIOTLB_TYPE;
+		desc.qw1 = 0;
+	} else {
+		int mask = ilog2(__roundup_pow_of_two(npages));
+		unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask));
+
+		if (WARN_ON_ONCE(!ALIGN(addr, align)))
+			addr &= ~(align - 1);
+
+		desc.qw0 = QI_EIOTLB_PASID(pasid) |
+				QI_EIOTLB_DID(did) |
+				QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
+				QI_EIOTLB_TYPE;
+		desc.qw1 = QI_EIOTLB_ADDR(addr) |
+				QI_EIOTLB_IH(ih) |
+				QI_EIOTLB_AM(mask);
+	}
+
+	qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 83a7abf0c4f0..e47f5fe37b59 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1520,18 +1520,24 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 
 	if (ih)
 		ih = 1 << 6;
-	/*
-	 * Fallback to domain selective flush if no PSI support or the size is
-	 * too big.
-	 * PSI requires page size to be 2 ^ x, and the base address is naturally
-	 * aligned to the size
-	 */
-	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
-		iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						DMA_TLB_DSI_FLUSH);
-	else
-		iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
-						DMA_TLB_PSI_FLUSH);
+
+	if (domain_use_first_level(domain)) {
+		qi_flush_piotlb(iommu, did, domain->default_pasid,
+				addr, pages, ih);
+	} else {
+		/*
+		 * Fallback to domain selective flush if no PSI support or
+		 * the size is too big. PSI requires page size to be 2 ^ x,
+		 * and the base address is naturally aligned to the size.
+		 */
+		if (!cap_pgsel_inv(iommu->cap) ||
+		    mask > cap_max_amask_val(iommu->cap))
+			iommu->flush.flush_iotlb(iommu, did, 0, 0,
+							DMA_TLB_DSI_FLUSH);
+		else
+			iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
+							DMA_TLB_PSI_FLUSH);
+	}
 
 	/*
 	 * In caching mode, changes of pages from non-present to present require
@@ -1546,8 +1552,11 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
 					struct dmar_domain *domain,
 					unsigned long pfn, unsigned int pages)
 {
-	/* It's a non-present to present mapping. Only flush if caching mode */
-	if (cap_caching_mode(iommu->cap))
+	/*
+	 * It's a non-present to present mapping. Only flush if caching mode
+	 * and second level.
+	 */
+	if (cap_caching_mode(iommu->cap) && !domain_use_first_level(domain))
 		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
 	else
 		iommu_flush_write_buffer(iommu);
@@ -1564,7 +1573,12 @@ static void iommu_flush_iova(struct iova_domain *iovad)
 		struct intel_iommu *iommu = g_iommus[idx];
 		u16 did = domain->iommu_did[iommu->seq_id];
 
-		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+		if (domain_use_first_level(domain))
+			qi_flush_piotlb(iommu, did,
+					domain->default_pasid, 0, -1, 0);
+		else
+			iommu->flush.flush_iotlb(iommu, did, 0, 0,
+						 DMA_TLB_DSI_FLUSH);
 
 		if (!cap_caching_mode(iommu->cap))
 			iommu_flush_dev_iotlb(get_iommu_domain(iommu, did),
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 66b525bad434..b693540e4b4f 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -648,6 +648,8 @@ extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask);
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
+		     unsigned long npages, bool ih);
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
 extern int dmar_ir_support(void);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 6/6] iommu/vt-d: Use iova over first level
  2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
                   ` (4 preceding siblings ...)
  2019-12-11  2:12 ` [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb " Lu Baolu
@ 2019-12-11  2:12 ` Lu Baolu
  5 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-11  2:12 UTC (permalink / raw)
  To: Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu,
	yi.y.sun, Peter Xu, iommu, kvm, linux-kernel, Lu Baolu

After we make all map/unmap paths support first level page table.
Let's turn it on if hardware supports scalable mode.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e47f5fe37b59..9228f121a040 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1749,15 +1749,13 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 
 /*
  * Check and return whether first level is used by default for
- * DMA translation. Currently, we make it off by setting
- * first_level_support = 0, and will change it to -1 after all
- * map/unmap paths support first level page table.
+ * DMA translation.
  */
 static bool first_level_by_default(void)
 {
 	struct dmar_drhd_unit *drhd;
 	struct intel_iommu *iommu;
-	static int first_level_support = 0;
+	static int first_level_support = -1;
 
 	if (likely(first_level_support != -1))
 		return first_level_support;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-11  2:12 ` [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level Lu Baolu
@ 2019-12-13  9:23   ` Liu, Yi L
  2019-12-14  3:03     ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-13  9:23 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Allen,

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Lu Baolu
> Sent: Wednesday, December 11, 2019 10:12 AM
> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
> first level and second level. The IOMMU driver can choose one from both for IOVA
> translation according to the use case. This sets up the pasid entry if a domain is
> selected to use the first-level page table for iova translation.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>  include/linux/intel-iommu.h | 10 ++++----
>  2 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 2b5a47584baf..83a7abf0c4f0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain
> *domain)
>  	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
> 
> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
> +
>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>  				       unsigned long pfn)
>  {
> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
> *domain, unsigned long iov_pfn,
>  		return -EINVAL;
> 
>  	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> +	if (domain_use_first_level(domain))
> +		prot |= DMA_FL_PTE_PRESENT;

For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
should be cleared when FLPT is used for IOVA.

Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is set, it
means snoop required, then "XD" bit is "0". If bit 11 of prot is "0", it means
this domain is not snooping, so you may want to set "XD" bit as "1". With
such enhancement, I think IOVA over FLPT would have as less difference
with IOVA over SLPT.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-11  2:12 ` [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb " Lu Baolu
@ 2019-12-13 11:42   ` Liu, Yi L
  2019-12-14  3:24     ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-13 11:42 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Allen,

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Lu Baolu
> Sent: Wednesday, December 11, 2019 10:12 AM
> To: Joerg Roedel <joro@8bytes.org>; David Woodhouse <dwmw2@infradead.org>;
> Subject: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
> 
> When software has changed first-level tables, it should invalidate
> the affected IOTLB and the paging-structure-caches using the PASID-
> based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/dmar.c        | 41 ++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel-iommu.c | 44 ++++++++++++++++++++++++-------------
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 72 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 3acfa6a25fa2..fb30d5053664 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16
> sid, u16 pfsid,
>  	qi_submit_sync(&desc, iommu);
>  }
> 
> +/* PASID-based IOTLB invalidation */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
> +		     unsigned long npages, bool ih)
> +{
> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> +	/*
> +	 * npages == -1 means a PASID-selective invalidation, otherwise,
> +	 * a positive value for Page-selective-within-PASID invalidation.
> +	 * 0 is not a valid input.
> +	 */
> +	if (WARN_ON(!npages)) {
> +		pr_err("Invalid input npages = %ld\n", npages);
> +		return;
> +	}
> +
> +	if (npages == -1) {
> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
> +				QI_EIOTLB_DID(did) |
> +				QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> +				QI_EIOTLB_TYPE;
> +		desc.qw1 = 0;
> +	} else {
> +		int mask = ilog2(__roundup_pow_of_two(npages));
> +		unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask));
> +
> +		if (WARN_ON_ONCE(!ALIGN(addr, align)))
> +			addr &= ~(align - 1);
> +
> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
> +				QI_EIOTLB_DID(did) |
> +				QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
> +				QI_EIOTLB_TYPE;
> +		desc.qw1 = QI_EIOTLB_ADDR(addr) |
> +				QI_EIOTLB_IH(ih) |
> +				QI_EIOTLB_AM(mask);
> +	}
> +
> +	qi_submit_sync(&desc, iommu);
> +}
> +
>  /*
>   * Disable Queued Invalidation interface.
>   */
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 83a7abf0c4f0..e47f5fe37b59 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1520,18 +1520,24 @@ static void iommu_flush_iotlb_psi(struct intel_iommu
> *iommu,
> 
>  	if (ih)
>  		ih = 1 << 6;
> -	/*
> -	 * Fallback to domain selective flush if no PSI support or the size is
> -	 * too big.
> -	 * PSI requires page size to be 2 ^ x, and the base address is naturally
> -	 * aligned to the size
> -	 */
> -	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu-
> >cap))
> -		iommu->flush.flush_iotlb(iommu, did, 0, 0,
> -						DMA_TLB_DSI_FLUSH);
> -	else
> -		iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
> -						DMA_TLB_PSI_FLUSH);
> +
> +	if (domain_use_first_level(domain)) {
> +		qi_flush_piotlb(iommu, did, domain->default_pasid,
> +				addr, pages, ih);

I'm not sure if my understanding is correct. But let me tell a story.
Assuming we assign a mdev and a PF/VF to a single VM, then there
will be p_iotlb tagged with PASID_RID2PASID and p_iotlb tagged with
default_pasid. We may want to flush both... If this operation is
invoked per-device, then need to pass in a hint to indicate whether
to use PASID_RID2PASID or default_pasid, or you may just issue two
flush with the two PASID values. Thoughts?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-13  9:23   ` Liu, Yi L
@ 2019-12-14  3:03     ` Lu Baolu
  2019-12-15  9:37       ` Liu, Yi L
  0 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-14  3:03 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Liu Yi,

Thanks for reviewing my patch.

On 12/13/19 5:23 PM, Liu, Yi L wrote:
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
>> Of Lu Baolu
>> Sent: Wednesday, December 11, 2019 10:12 AM
>> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>>
>> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
>> first level and second level. The IOMMU driver can choose one from both for IOVA
>> translation according to the use case. This sets up the pasid entry if a domain is
>> selected to use the first-level page table for iova translation.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>>   include/linux/intel-iommu.h | 10 ++++----
>>   2 files changed, 52 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> 2b5a47584baf..83a7abf0c4f0 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain
>> *domain)
>>   	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
>>
>> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
>> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
>> +
>>   static inline int domain_pfn_supported(struct dmar_domain *domain,
>>   				       unsigned long pfn)
>>   {
>> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
>> *domain, unsigned long iov_pfn,
>>   		return -EINVAL;
>>
>>   	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
>> +	if (domain_use_first_level(domain))
>> +		prot |= DMA_FL_PTE_PRESENT;
> 
> For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> should be cleared when FLPT is used for IOVA.

SNP (bit 11) is only for second level. This bit is ignored for first
level page table walk. We should clear this bit for first level anyway.

> 
> Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is set, it
> means snoop required, then "XD" bit is "0". If bit 11 of prot is "0", it means
> this domain is not snooping, so you may want to set "XD" bit as "1". With
> such enhancement, I think IOVA over FLPT would have as less difference
> with IOVA over SLPT.

XD (bit 63) is only for the first level, and SNP (bit 11) is only for
second level, right? I think we need to always set XD bit for IOVA over
FL case. thoughts?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-13 11:42   ` Liu, Yi L
@ 2019-12-14  3:24     ` Lu Baolu
  2019-12-15  9:22       ` Liu, Yi L
  0 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-14  3:24 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Liu Yi,

On 12/13/19 7:42 PM, Liu, Yi L wrote:
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
>> Of Lu Baolu
>> Sent: Wednesday, December 11, 2019 10:12 AM
>> To: Joerg Roedel <joro@8bytes.org>; David Woodhouse <dwmw2@infradead.org>;
>> Subject: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
>>
>> When software has changed first-level tables, it should invalidate
>> the affected IOTLB and the paging-structure-caches using the PASID-
>> based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/dmar.c        | 41 ++++++++++++++++++++++++++++++++++
>>   drivers/iommu/intel-iommu.c | 44 ++++++++++++++++++++++++-------------
>>   include/linux/intel-iommu.h |  2 ++
>>   3 files changed, 72 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>> index 3acfa6a25fa2..fb30d5053664 100644
>> --- a/drivers/iommu/dmar.c
>> +++ b/drivers/iommu/dmar.c
>> @@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16
>> sid, u16 pfsid,
>>   	qi_submit_sync(&desc, iommu);
>>   }
>>
>> +/* PASID-based IOTLB invalidation */
>> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>> +		     unsigned long npages, bool ih)
>> +{
>> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
>> +
>> +	/*
>> +	 * npages == -1 means a PASID-selective invalidation, otherwise,
>> +	 * a positive value for Page-selective-within-PASID invalidation.
>> +	 * 0 is not a valid input.
>> +	 */
>> +	if (WARN_ON(!npages)) {
>> +		pr_err("Invalid input npages = %ld\n", npages);
>> +		return;
>> +	}
>> +
>> +	if (npages == -1) {
>> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
>> +				QI_EIOTLB_DID(did) |
>> +				QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
>> +				QI_EIOTLB_TYPE;
>> +		desc.qw1 = 0;
>> +	} else {
>> +		int mask = ilog2(__roundup_pow_of_two(npages));
>> +		unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask));
>> +
>> +		if (WARN_ON_ONCE(!ALIGN(addr, align)))
>> +			addr &= ~(align - 1);
>> +
>> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
>> +				QI_EIOTLB_DID(did) |
>> +				QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
>> +				QI_EIOTLB_TYPE;
>> +		desc.qw1 = QI_EIOTLB_ADDR(addr) |
>> +				QI_EIOTLB_IH(ih) |
>> +				QI_EIOTLB_AM(mask);
>> +	}
>> +
>> +	qi_submit_sync(&desc, iommu);
>> +}
>> +
>>   /*
>>    * Disable Queued Invalidation interface.
>>    */
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 83a7abf0c4f0..e47f5fe37b59 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1520,18 +1520,24 @@ static void iommu_flush_iotlb_psi(struct intel_iommu
>> *iommu,
>>
>>   	if (ih)
>>   		ih = 1 << 6;
>> -	/*
>> -	 * Fallback to domain selective flush if no PSI support or the size is
>> -	 * too big.
>> -	 * PSI requires page size to be 2 ^ x, and the base address is naturally
>> -	 * aligned to the size
>> -	 */
>> -	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu-
>>> cap))
>> -		iommu->flush.flush_iotlb(iommu, did, 0, 0,
>> -						DMA_TLB_DSI_FLUSH);
>> -	else
>> -		iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>> -						DMA_TLB_PSI_FLUSH);
>> +
>> +	if (domain_use_first_level(domain)) {
>> +		qi_flush_piotlb(iommu, did, domain->default_pasid,
>> +				addr, pages, ih);
> 
> I'm not sure if my understanding is correct. But let me tell a story.
> Assuming we assign a mdev and a PF/VF to a single VM, then there
> will be p_iotlb tagged with PASID_RID2PASID and p_iotlb tagged with
> default_pasid. We may want to flush both... If this operation is

I assume that SRIOV and SIOV are exclusive. You can't enable both SRIOV
and SIOV on a single device. So the mdev and PF/VF are from different
devices, right?

Or, in SRIOV case, you can wrap a PF or VF as a mediated device. But
this mdev still be backed with a pasid of RID2PASID.

> invoked per-device, then need to pass in a hint to indicate whether
> to use PASID_RID2PASID or default_pasid, or you may just issue two
> flush with the two PASID values. Thoughts?

This is per-domain and each domain has specific domain id and default
pasid (assume default domain is 0 in RID2PASID case).

Best regards,
baolu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-14  3:24     ` Lu Baolu
@ 2019-12-15  9:22       ` Liu, Yi L
  2019-12-17  1:19         ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-15  9:22 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Baolu,

Please check replies below:

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Saturday, December 14, 2019 11:24 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> level
> 
> Hi Liu Yi,
> 
> On 12/13/19 7:42 PM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf
> >> Of Lu Baolu
> >> Sent: Wednesday, December 11, 2019 10:12 AM
> >> To: Joerg Roedel <joro@8bytes.org>; David Woodhouse
> <dwmw2@infradead.org>;
> >> Subject: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> level
> >>
> >> When software has changed first-level tables, it should invalidate
> >> the affected IOTLB and the paging-structure-caches using the PASID-
> >> based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
> >>
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/dmar.c        | 41 ++++++++++++++++++++++++++++++++++
> >>   drivers/iommu/intel-iommu.c | 44 ++++++++++++++++++++++++-------------
> >>   include/linux/intel-iommu.h |  2 ++
> >>   3 files changed, 72 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> >> index 3acfa6a25fa2..fb30d5053664 100644
> >> --- a/drivers/iommu/dmar.c
> >> +++ b/drivers/iommu/dmar.c
> >> @@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu,
> u16
> >> sid, u16 pfsid,
> >>   	qi_submit_sync(&desc, iommu);
> >>   }
> >>
> >> +/* PASID-based IOTLB invalidation */
> >> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
> >> +		     unsigned long npages, bool ih)
> >> +{
> >> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> >> +
> >> +	/*
> >> +	 * npages == -1 means a PASID-selective invalidation, otherwise,
> >> +	 * a positive value for Page-selective-within-PASID invalidation.
> >> +	 * 0 is not a valid input.
> >> +	 */
> >> +	if (WARN_ON(!npages)) {
> >> +		pr_err("Invalid input npages = %ld\n", npages);
> >> +		return;
> >> +	}
> >> +
> >> +	if (npages == -1) {
> >> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
> >> +				QI_EIOTLB_DID(did) |
> >> +				QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> >> +				QI_EIOTLB_TYPE;
> >> +		desc.qw1 = 0;
> >> +	} else {
> >> +		int mask = ilog2(__roundup_pow_of_two(npages));
> >> +		unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask));
> >> +
> >> +		if (WARN_ON_ONCE(!ALIGN(addr, align)))
> >> +			addr &= ~(align - 1);
> >> +
> >> +		desc.qw0 = QI_EIOTLB_PASID(pasid) |
> >> +				QI_EIOTLB_DID(did) |
> >> +				QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
> >> +				QI_EIOTLB_TYPE;
> >> +		desc.qw1 = QI_EIOTLB_ADDR(addr) |
> >> +				QI_EIOTLB_IH(ih) |
> >> +				QI_EIOTLB_AM(mask);
> >> +	}
> >> +
> >> +	qi_submit_sync(&desc, iommu);
> >> +}
> >> +
> >>   /*
> >>    * Disable Queued Invalidation interface.
> >>    */
> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> >> index 83a7abf0c4f0..e47f5fe37b59 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -1520,18 +1520,24 @@ static void iommu_flush_iotlb_psi(struct
> intel_iommu
> >> *iommu,
> >>
> >>   	if (ih)
> >>   		ih = 1 << 6;
> >> -	/*
> >> -	 * Fallback to domain selective flush if no PSI support or the size is
> >> -	 * too big.
> >> -	 * PSI requires page size to be 2 ^ x, and the base address is naturally
> >> -	 * aligned to the size
> >> -	 */
> >> -	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu-
> >>> cap))
> >> -		iommu->flush.flush_iotlb(iommu, did, 0, 0,
> >> -						DMA_TLB_DSI_FLUSH);
> >> -	else
> >> -		iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
> >> -						DMA_TLB_PSI_FLUSH);
> >> +
> >> +	if (domain_use_first_level(domain)) {
> >> +		qi_flush_piotlb(iommu, did, domain->default_pasid,
> >> +				addr, pages, ih);
> >
> > I'm not sure if my understanding is correct. But let me tell a story.
> > Assuming we assign a mdev and a PF/VF to a single VM, then there
> > will be p_iotlb tagged with PASID_RID2PASID and p_iotlb tagged with
> > default_pasid. We may want to flush both... If this operation is
> 
> I assume that SRIOV and SIOV are exclusive. You can't enable both SRIOV
> and SIOV on a single device.

yes, but I'm not talking use them on a single device...

> So the mdev and PF/VF are from different
> devices, right?

yes, the case I mentioned above is: a mdev from a device (say devA),
and another device (say devB). Create mdev on devA and assign it to
a VM together with devB.

> 
> Or, in SRIOV case, you can wrap a PF or VF as a mediated device. But
> this mdev still be backed with a pasid of RID2PASID.

My comment has no business with wrapping PF/VR as mdev...

> 
> > invoked per-device, then need to pass in a hint to indicate whether
> > to use PASID_RID2PASID or default_pasid, or you may just issue two
> > flush with the two PASID values. Thoughts?
> 
> This is per-domain and each domain has specific domain id and default
> pasid (assume default domain is 0 in RID2PASID case).

Ok, let me explain more... default pasid is meaningful only when
the domain has been attached to a device as an aux-domain. right?
If a domain only has one device, and it is attached to this device as
normal domain (normal domain means non aux-domain here). Then
you should flush cache with domain-id and RID2PASID value.
If a domain has one device, and it is attached to this device as
aux-domain. Then you may want to flush cache with domain-id
and default pasid. right?
Then let's come to the case I mentioned in previous email. a mdev
and another device assigned to a single VM. In host, you will have
a domain which has two devices, one device(deva) is attached as
normal domain, another one (devB) is attached as aux-domain. Then
which pasid should be used when the mapping in IOVA page table is
modified? RID2PASID or default pasid? I think both should be used
since the domain means differently to the two devices. If you just
use default pasid, then deva may still be able to use stale caches.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-14  3:03     ` Lu Baolu
@ 2019-12-15  9:37       ` Liu, Yi L
  2019-12-17  2:03         ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-15  9:37 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Baolu,

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Saturday, December 14, 2019 11:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Hi Liu Yi,
> 
> Thanks for reviewing my patch.
> 
> On 12/13/19 5:23 PM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> >> Behalf Of Lu Baolu
> >> Sent: Wednesday, December 11, 2019 10:12 AM
> >> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over
> >> first level
> >>
> >> Intel VT-d in scalable mode supports two types of page tables for IOVA
> translation:
> >> first level and second level. The IOMMU driver can choose one from
> >> both for IOVA translation according to the use case. This sets up the
> >> pasid entry if a domain is selected to use the first-level page table for iova
> translation.
> >>
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
> >>   include/linux/intel-iommu.h | 10 ++++----
> >>   2 files changed, 52 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-iommu.c
> >> b/drivers/iommu/intel-iommu.c index
> >> 2b5a47584baf..83a7abf0c4f0 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct
> >> dmar_domain
> >> *domain)
> >>   	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
> >>
> >> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
> >> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
> >> +
> >>   static inline int domain_pfn_supported(struct dmar_domain *domain,
> >>   				       unsigned long pfn)
> >>   {
> >> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
> >> *domain, unsigned long iov_pfn,
> >>   		return -EINVAL;
> >>
> >>   	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> >> +	if (domain_use_first_level(domain))
> >> +		prot |= DMA_FL_PTE_PRESENT;
> >
> > For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> > should be cleared when FLPT is used for IOVA.
> 
> SNP (bit 11) is only for second level. This bit is ignored for first level page table walk.
> We should clear this bit for first level anyway.

I think this is what I meant above? This patch somehow misses the operation
to clear the bit 11.

> >
> > Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is
> > set, it means snoop required, then "XD" bit is "0". If bit 11 of prot
> > is "0", it means this domain is not snooping, so you may want to set
> > "XD" bit as "1". With such enhancement, I think IOVA over FLPT would
> > have as less difference with IOVA over SLPT.
> 
> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
> think we need to always set XD bit for IOVA over FL case. thoughts?

Oops, I made a mistake here. Please forget SNP bit, there is no way to control SNP
with first level page table.:-)

Actually, it is execute (bit 1) of second level page table which I wanted to say.
If software sets R/W/X permission to an IOVA, with IOVA over second level
page table, it will set bit 1. However, if IOVA is over first level page table, it
may need to clear XD bit. This is what I want to say here. If IOVA doesn’t allow
execute permission, it's ok to always set XD bit for IOVA over FL case. But I
would like to do it just as what we did for R/W permission. R/W permission
relies on the permission configured by the page map caller. right?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-15  9:22       ` Liu, Yi L
@ 2019-12-17  1:19         ` Lu Baolu
  2019-12-17  1:37           ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-17  1:19 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Yi,

On 12/15/19 5:22 PM, Liu, Yi L wrote:
> Ok, let me explain more... default pasid is meaningful only when
> the domain has been attached to a device as an aux-domain. right?

No exactly. Each domain has a specific default pasid, no matter normal
domain (RID based) or aux-domain (PASID based). The difference is for a
normal domain RID2PASID value is used, for an aux-domain the pasid is
allocated from a global pool.

The same concept used in VT-d 3.x scalable mode. For RID based DMA
translation RID2PASID value is used when walking the tables; For PASID
based DMA translation a real pasid in the transaction is used.

> If a domain only has one device, and it is attached to this device as
> normal domain (normal domain means non aux-domain here). Then
> you should flush cache with domain-id and RID2PASID value.
> If a domain has one device, and it is attached to this device as
> aux-domain. Then you may want to flush cache with domain-id
> and default pasid. right?

A domain's counterpart is IOMMU group. So we say attach/detach domain
to/from devices in a group. We don't allow devices with different
default pasid sitting in a same group, right?

> Then let's come to the case I mentioned in previous email. a mdev
> and another device assigned to a single VM. In host, you will have
> a domain which has two devices, one device(deva) is attached as

No. We will have two IOMMU groups and two domains. Correct me if my
understanding is not right.

Best regards,
baolu

> normal domain, another one (devB) is attached as aux-domain. Then
> which pasid should be used when the mapping in IOVA page table is
> modified? RID2PASID or default pasid? I think both should be used
> since the domain means differently to the two devices. If you just
> use default pasid, then deva may still be able to use stale caches.
> 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  1:19         ` Lu Baolu
@ 2019-12-17  1:37           ` Lu Baolu
  2019-12-17  1:39             ` Lu Baolu
  2019-12-17  2:26             ` Liu, Yi L
  0 siblings, 2 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-17  1:37 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi again,

On 12/17/19 9:19 AM, Lu Baolu wrote:
> Hi Yi,
> 
> On 12/15/19 5:22 PM, Liu, Yi L wrote:
>> Ok, let me explain more... default pasid is meaningful only when
>> the domain has been attached to a device as an aux-domain. right?
> 
> No exactly. Each domain has a specific default pasid, no matter normal
> domain (RID based) or aux-domain (PASID based). The difference is for a
> normal domain RID2PASID value is used, for an aux-domain the pasid is
> allocated from a global pool.
> 
> The same concept used in VT-d 3.x scalable mode. For RID based DMA
> translation RID2PASID value is used when walking the tables; For PASID
> based DMA translation a real pasid in the transaction is used.
> 
>> If a domain only has one device, and it is attached to this device as
>> normal domain (normal domain means non aux-domain here). Then
>> you should flush cache with domain-id and RID2PASID value.
>> If a domain has one device, and it is attached to this device as
>> aux-domain. Then you may want to flush cache with domain-id
>> and default pasid. right?
> 
> A domain's counterpart is IOMMU group. So we say attach/detach domain
> to/from devices in a group. We don't allow devices with different
> default pasid sitting in a same group, right?
> 
>> Then let's come to the case I mentioned in previous email. a mdev
>> and another device assigned to a single VM. In host, you will have
>> a domain which has two devices, one device(deva) is attached as
> 
> No. We will have two IOMMU groups and two domains. Correct me if my
> understanding is not right.

Reconsidered this. Unfortunately, my understanding is not right. :-(

A single domain could be attached to multiple IOMMU groups. So it
comes to the issue you concerned. Do I understand it right?

> 
>> normal domain, another one (devB) is attached as aux-domain. Then
>> which pasid should be used when the mapping in IOVA page table is
>> modified? RID2PASID or default pasid? I think both should be used
>> since the domain means differently to the two devices. If you just
>> use default pasid, then deva may still be able to use stale caches.

You are right. I will change it accordingly. The logic should look
like:

if (domain attached to physical device)
	flush_piotlb_with_RID2PASID()
else if (domain_attached_to_mdev_device)
	flush_piotlb_with_default_pasid()

Does this work for you? Thanks for catching this!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  1:37           ` Lu Baolu
@ 2019-12-17  1:39             ` Lu Baolu
  2019-12-17  2:44               ` Liu, Yi L
  2019-12-17  2:26             ` Liu, Yi L
  1 sibling, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-17  1:39 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi,

On 12/17/19 9:37 AM, Lu Baolu wrote:
> You are right. I will change it accordingly. The logic should look
> like:
> 
> if (domain attached to physical device)
>      flush_piotlb_with_RID2PASID()
> else if (domain_attached_to_mdev_device)
>      flush_piotlb_with_default_pasid()
> 

Both! so no "else" here.

Best regards,
baolu

> Does this work for you? Thanks for catching this!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-15  9:37       ` Liu, Yi L
@ 2019-12-17  2:03         ` Lu Baolu
  2019-12-17  2:33           ` Liu, Yi L
  0 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-12-17  2:03 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: baolu.lu, Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian,
	Kevin, Sun, Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi Yi,

On 12/15/19 5:37 PM, Liu, Yi L wrote:
>> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
>> think we need to always set XD bit for IOVA over FL case. thoughts?
> Oops, I made a mistake here. Please forget SNP bit, there is no way to control SNP
> with first level page table.:-)
> 
> Actually, it is execute (bit 1) of second level page table which I wanted to say.
> If software sets R/W/X permission to an IOVA, with IOVA over second level
> page table, it will set bit 1. However, if IOVA is over first level page table, it
> may need to clear XD bit. This is what I want to say here. If IOVA doesn’t allow
> execute permission, it's ok to always set XD bit for IOVA over FL case. But I
> would like to do it just as what we did for R/W permission. R/W permission
> relies on the permission configured by the page map caller. right?

Got your point.

Current driver always cleard X (bit 2) in the second level page table.
So we will always set XD bit (bit 63) in the first level page table.
If we decide to use the X permission, we need a separated patch, right?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  1:37           ` Lu Baolu
  2019-12-17  1:39             ` Lu Baolu
@ 2019-12-17  2:26             ` Liu, Yi L
  2019-12-17  2:36               ` Liu, Yi L
  1 sibling, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-17  2:26 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Tuesday, December 17, 2019 9:37 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> level
> 
> Hi again,
> 
> On 12/17/19 9:19 AM, Lu Baolu wrote:
> > Hi Yi,
> >
> > On 12/15/19 5:22 PM, Liu, Yi L wrote:
> >> Ok, let me explain more... default pasid is meaningful only when
> >> the domain has been attached to a device as an aux-domain. right?
> >
> > No exactly. Each domain has a specific default pasid, no matter normal
> > domain (RID based) or aux-domain (PASID based). The difference is for a
> > normal domain RID2PASID value is used, for an aux-domain the pasid is
> > allocated from a global pool.
> >
> > The same concept used in VT-d 3.x scalable mode. For RID based DMA
> > translation RID2PASID value is used when walking the tables; For PASID
> > based DMA translation a real pasid in the transaction is used.
> >
> >> If a domain only has one device, and it is attached to this device as
> >> normal domain (normal domain means non aux-domain here). Then
> >> you should flush cache with domain-id and RID2PASID value.
> >> If a domain has one device, and it is attached to this device as
> >> aux-domain. Then you may want to flush cache with domain-id
> >> and default pasid. right?
> >
> > A domain's counterpart is IOMMU group. So we say attach/detach domain
> > to/from devices in a group. We don't allow devices with different
> > default pasid sitting in a same group, right?
> >
> >> Then let's come to the case I mentioned in previous email. a mdev
> >> and another device assigned to a single VM. In host, you will have
> >> a domain which has two devices, one device(deva) is attached as
> >
> > No. We will have two IOMMU groups and two domains. Correct me if my
> > understanding is not right.
> 
> Reconsidered this. Unfortunately, my understanding is not right. :-(
> 
> A single domain could be attached to multiple IOMMU groups. So it
> comes to the issue you concerned. Do I understand it right?

yes. Device within the same group has no such issue since such
devices are not able to enabled aux-domain. Now our understanding
are aligned. :-)

> >
> >> normal domain, another one (devB) is attached as aux-domain. Then
> >> which pasid should be used when the mapping in IOVA page table is
> >> modified? RID2PASID or default pasid? I think both should be used
> >> since the domain means differently to the two devices. If you just
> >> use default pasid, then deva may still be able to use stale caches.
> 
> You are right. I will change it accordingly. The logic should look
> like:
> 
> if (domain attached to physical device)
> 	flush_piotlb_with_RID2PASID()
> else if (domain_attached_to_mdev_device)
> 	flush_piotlb_with_default_pasid()
> 
> Does this work for you? Thanks for catching this!

If no else, it would work for scalable mode. ^_^ I noticed you've
already corrected by yourself in another reply. :-) Look forward to
your next version.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
  2019-12-17  2:03         ` Lu Baolu
@ 2019-12-17  2:33           ` Liu, Yi L
  0 siblings, 0 replies; 22+ messages in thread
From: Liu, Yi L @ 2019-12-17  2:33 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

> From: Lu Baolu < baolu.lu@linux.intel.com >
> Sent: Tuesday, December 17, 2019 10:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Hi Yi,
> 
> On 12/15/19 5:37 PM, Liu, Yi L wrote:
> >> XD (bit 63) is only for the first level, and SNP (bit 11) is only for
> >> second level, right? I think we need to always set XD bit for IOVA over FL case.
> thoughts?
> > Oops, I made a mistake here. Please forget SNP bit, there is no way to
> > control SNP with first level page table.:-)
> >
> > Actually, it is execute (bit 1) of second level page table which I wanted to say.
> > If software sets R/W/X permission to an IOVA, with IOVA over second
> > level page table, it will set bit 1. However, if IOVA is over first
> > level page table, it may need to clear XD bit. This is what I want to
> > say here. If IOVA doesn’t allow execute permission, it's ok to always
> > set XD bit for IOVA over FL case. But I would like to do it just as
> > what we did for R/W permission. R/W permission relies on the permission
> configured by the page map caller. right?
> 
> Got your point.
> 
> Current driver always cleard X (bit 2) in the second level page table.
> So we will always set XD bit (bit 63) in the first level page table.

yes, I also noticed X (bit 2) is not used in intel-iommu driver. So I
know why you set XD for IOVA over FL case. But it's a little bit weird
to hard code it. That's why I suggested to relay page map caller's
permission input.

> If we decide to use the X permission, we need a separated patch, right?

sure, it would be a separate patch since current code doesn’t apply
X permission.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  2:26             ` Liu, Yi L
@ 2019-12-17  2:36               ` Liu, Yi L
  2019-12-17  4:13                 ` Lu Baolu
  0 siblings, 1 reply; 22+ messages in thread
From: Liu, Yi L @ 2019-12-17  2:36 UTC (permalink / raw)
  To: Liu, Yi L, Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, December 17, 2019 10:26 AM
> To: Lu Baolu <baolu.lu@linux.intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> level
> 
> > From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> > Sent: Tuesday, December 17, 2019 9:37 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> > Woodhouse <dwmw2@infradead.org>; Alex Williamson
> > <alex.williamson@redhat.com>
> > Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> > level
> >
> > Hi again,
> >
> > On 12/17/19 9:19 AM, Lu Baolu wrote:
> > > Hi Yi,
> > >
> > > On 12/15/19 5:22 PM, Liu, Yi L wrote:
> > >> Ok, let me explain more... default pasid is meaningful only when
> > >> the domain has been attached to a device as an aux-domain. right?
> > >
> > > No exactly. Each domain has a specific default pasid, no matter normal
> > > domain (RID based) or aux-domain (PASID based). The difference is for a
> > > normal domain RID2PASID value is used, for an aux-domain the pasid is
> > > allocated from a global pool.
> > >
> > > The same concept used in VT-d 3.x scalable mode. For RID based DMA
> > > translation RID2PASID value is used when walking the tables; For PASID
> > > based DMA translation a real pasid in the transaction is used.
> > >
> > >> If a domain only has one device, and it is attached to this device as
> > >> normal domain (normal domain means non aux-domain here). Then
> > >> you should flush cache with domain-id and RID2PASID value.
> > >> If a domain has one device, and it is attached to this device as
> > >> aux-domain. Then you may want to flush cache with domain-id
> > >> and default pasid. right?
> > >
> > > A domain's counterpart is IOMMU group. So we say attach/detach domain
> > > to/from devices in a group. We don't allow devices with different
> > > default pasid sitting in a same group, right?
> > >
> > >> Then let's come to the case I mentioned in previous email. a mdev
> > >> and another device assigned to a single VM. In host, you will have
> > >> a domain which has two devices, one device(deva) is attached as
> > >
> > > No. We will have two IOMMU groups and two domains. Correct me if my
> > > understanding is not right.
> >
> > Reconsidered this. Unfortunately, my understanding is not right. :-(
> >
> > A single domain could be attached to multiple IOMMU groups. So it
> > comes to the issue you concerned. Do I understand it right?
> 
> yes. Device within the same group has no such issue since such
> devices are not able to enabled aux-domain. Now our understanding
> are aligned. :-)
> 
> > >
> > >> normal domain, another one (devB) is attached as aux-domain. Then
> > >> which pasid should be used when the mapping in IOVA page table is
> > >> modified? RID2PASID or default pasid? I think both should be used
> > >> since the domain means differently to the two devices. If you just
> > >> use default pasid, then deva may still be able to use stale caches.
> >
> > You are right. I will change it accordingly. The logic should look
> > like:
> >
> > if (domain attached to physical device)
> > 	flush_piotlb_with_RID2PASID()
> > else if (domain_attached_to_mdev_device)
> > 	flush_piotlb_with_default_pasid()
> >
> > Does this work for you? Thanks for catching this!
> 
> If no else, it would work for scalable mode. ^_^ I noticed you've
> already corrected by yourself in another reply. :-) Look forward to
> your next version.

BTW. The discussion in this thread may apply to other cache flush
in your series. Please have a check. At least, there are two places which
need to be updated in this single patch.
 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  1:39             ` Lu Baolu
@ 2019-12-17  2:44               ` Liu, Yi L
  0 siblings, 0 replies; 22+ messages in thread
From: Liu, Yi L @ 2019-12-17  2:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Tuesday, December 17, 2019 9:39 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
> level
> 
> Hi,
> 
> On 12/17/19 9:37 AM, Lu Baolu wrote:
> > You are right. I will change it accordingly. The logic should look
> > like:
> >
> > if (domain attached to physical device)
> >      flush_piotlb_with_RID2PASID()
> > else if (domain_attached_to_mdev_device)
> >      flush_piotlb_with_default_pasid()
> >
> 
> Both! so no "else" here.

aha, if we want to flush more precisely, we may check whether
domain->default_pasid is allocated. If not, it means no mdev is
involved, then we may save a p_iotlb_inv_dsc submission and also
save VT-d hardware circles from doing useless flush. This is just
optimization, it's up to you to pick it or not in this patchset. I'm
fine with flush "both" since it guarantees correctness.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
  2019-12-17  2:36               ` Liu, Yi L
@ 2019-12-17  4:13                 ` Lu Baolu
  0 siblings, 0 replies; 22+ messages in thread
From: Lu Baolu @ 2019-12-17  4:13 UTC (permalink / raw)
  To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson
  Cc: Raj, Ashok, Kumar, Sanjay K, jacob.jun.pan, Tian, Kevin, Sun,
	Yi Y, Peter Xu, iommu, kvm, linux-kernel

Hi,

On 2019/12/17 10:36, Liu, Yi L wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Tuesday, December 17, 2019 10:26 AM
>> To: Lu Baolu <baolu.lu@linux.intel.com>; Joerg Roedel <joro@8bytes.org>; David
>> Woodhouse <dwmw2@infradead.org>; Alex Williamson
>> <alex.williamson@redhat.com>
>> Subject: RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
>> level
>>
>>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
>>> Sent: Tuesday, December 17, 2019 9:37 AM
>>> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
>>> Woodhouse <dwmw2@infradead.org>; Alex Williamson
>>> <alex.williamson@redhat.com>
>>> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first
>>> level
>>>
>>> Hi again,
>>>
>>> On 12/17/19 9:19 AM, Lu Baolu wrote:
>>>> Hi Yi,
>>>>
>>>> On 12/15/19 5:22 PM, Liu, Yi L wrote:
>>>>> Ok, let me explain more... default pasid is meaningful only when
>>>>> the domain has been attached to a device as an aux-domain. right?
>>>> No exactly. Each domain has a specific default pasid, no matter normal
>>>> domain (RID based) or aux-domain (PASID based). The difference is for a
>>>> normal domain RID2PASID value is used, for an aux-domain the pasid is
>>>> allocated from a global pool.
>>>>
>>>> The same concept used in VT-d 3.x scalable mode. For RID based DMA
>>>> translation RID2PASID value is used when walking the tables; For PASID
>>>> based DMA translation a real pasid in the transaction is used.
>>>>
>>>>> If a domain only has one device, and it is attached to this device as
>>>>> normal domain (normal domain means non aux-domain here). Then
>>>>> you should flush cache with domain-id and RID2PASID value.
>>>>> If a domain has one device, and it is attached to this device as
>>>>> aux-domain. Then you may want to flush cache with domain-id
>>>>> and default pasid. right?
>>>> A domain's counterpart is IOMMU group. So we say attach/detach domain
>>>> to/from devices in a group. We don't allow devices with different
>>>> default pasid sitting in a same group, right?
>>>>
>>>>> Then let's come to the case I mentioned in previous email. a mdev
>>>>> and another device assigned to a single VM. In host, you will have
>>>>> a domain which has two devices, one device(deva) is attached as
>>>> No. We will have two IOMMU groups and two domains. Correct me if my
>>>> understanding is not right.
>>> Reconsidered this. Unfortunately, my understanding is not right. :-(
>>>
>>> A single domain could be attached to multiple IOMMU groups. So it
>>> comes to the issue you concerned. Do I understand it right?
>> yes. Device within the same group has no such issue since such
>> devices are not able to enabled aux-domain. Now our understanding
>> are aligned. :-)
>>
>>>>> normal domain, another one (devB) is attached as aux-domain. Then
>>>>> which pasid should be used when the mapping in IOVA page table is
>>>>> modified? RID2PASID or default pasid? I think both should be used
>>>>> since the domain means differently to the two devices. If you just
>>>>> use default pasid, then deva may still be able to use stale caches.
>>> You are right. I will change it accordingly. The logic should look
>>> like:
>>>
>>> if (domain attached to physical device)
>>> 	flush_piotlb_with_RID2PASID()
>>> else if (domain_attached_to_mdev_device)
>>> 	flush_piotlb_with_default_pasid()
>>>
>>> Does this work for you? Thanks for catching this!
>> If no else, it would work for scalable mode. ^_^ I noticed you've
>> already corrected by yourself in another reply. :-) Look forward to
>> your next version.
> BTW. The discussion in this thread may apply to other cache flush
> in your series. Please have a check. At least, there are two places which
> need to be updated in this single patch.

Sure. I will.

Best regards,

baolu
>   
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-12-17  4:13 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11  2:12 [PATCH v3 0/6] Use 1st-level for IOVA translation Lu Baolu
2019-12-11  2:12 ` [PATCH v3 1/6] iommu/vt-d: Identify domains using first level page table Lu Baolu
2019-12-11  2:12 ` [PATCH v3 2/6] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr Lu Baolu
2019-12-11  2:12 ` [PATCH v3 3/6] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup Lu Baolu
2019-12-11  2:12 ` [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level Lu Baolu
2019-12-13  9:23   ` Liu, Yi L
2019-12-14  3:03     ` Lu Baolu
2019-12-15  9:37       ` Liu, Yi L
2019-12-17  2:03         ` Lu Baolu
2019-12-17  2:33           ` Liu, Yi L
2019-12-11  2:12 ` [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb " Lu Baolu
2019-12-13 11:42   ` Liu, Yi L
2019-12-14  3:24     ` Lu Baolu
2019-12-15  9:22       ` Liu, Yi L
2019-12-17  1:19         ` Lu Baolu
2019-12-17  1:37           ` Lu Baolu
2019-12-17  1:39             ` Lu Baolu
2019-12-17  2:44               ` Liu, Yi L
2019-12-17  2:26             ` Liu, Yi L
2019-12-17  2:36               ` Liu, Yi L
2019-12-17  4:13                 ` Lu Baolu
2019-12-11  2:12 ` [PATCH v3 6/6] iommu/vt-d: Use " Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).