* [PATCH v3 00/10] Add Intel VT-d nested translation
@ 2023-05-11 14:51 Yi Liu
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
                   ` (10 more replies)
  0 siblings, 11 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This series adds Intel VT-d nested translation based on the IOMMUFD
nesting infrastructure. As introduced by the iommufd nesting
infrastructure series [1], the iommu core supports new ops to report
iommu hardware information, allocate domains with user data, and sync
the stage-1 IOTLB. The data required in these three paths is
vendor-specific, so

1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_hw_info_vtd are
   defined to report iommu hardware information for Intel VT-d.
2) IOMMU_HWPT_TYPE_VTD_S1 is defined for the Intel VT-d stage-1 page
   table; it is used in both the stage-1 domain allocation and the
   IOTLB syncing paths. struct iommu_hwpt_intel_vtd is defined to pass
   the user_data for Intel VT-d stage-1 domain allocation, and
   struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for
   Intel VT-d stage-1 IOTLB invalidation.

With the above IOMMUFD extensions, the Intel IOMMU driver implements
the three paths to support nested translation.
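
To make these paths concrete, below is a minimal userspace sketch of
the resulting flow. The ioctl plumbing itself comes from the iommufd
series [1]; guest_pgd_gpa and the stage-2 HWPT setup are placeholders:

    struct iommu_hwpt_intel_vtd s1_data = {
            .pgtbl_addr = guest_pgd_gpa,  /* GPA of the guest stage-1 PGD */
            .addr_width = 48,             /* 4-level stage-1 paging */
    };

    /* 1) ioctl(IOMMU_DEVICE_GET_HW_INFO) -> struct iommu_hw_info_vtd
     * 2) ioctl(IOMMU_HWPT_ALLOC) with IOMMU_HWPT_TYPE_VTD_S1, &s1_data
     *    and the stage-2 HWPT as parent
     * 3) ioctl(IOMMU_HWPT_INVALIDATE) with
     *    struct iommu_hwpt_invalidate_intel_vtd whenever the guest
     *    flushes its stage-1 TLB
     */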

The first Intel platform supporting nested translation is Sapphire
Rapids which, unfortunately, has a hardware errata [2] requiring special
treatment. This errata is triggered when a stage-1 page table page (at
any level) is located in a stage-2 read-only region. In that case the
IOMMU hardware may ignore the stage-2 RO permission and still set the
A/D bits in stage-1 page table entries during the page table walk.

A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
this errata to userspace. With that restriction, the user should either
disable nested translation in favor of RO stage-2 mappings, or ensure
there are no RO stage-2 mappings in order to enable nested translation.

The intel-iommu driver is armed with the necessary checks to prevent
such a mix in patch 10 of this series.

QEMU currently does add RO mappings, though. The vfio agent in QEMU
simply maps all valid regions in the GPA address space, which certainly
includes RO regions, e.g. the vbios.

In reality we don't know of any usage relying on DMA reads from the
BIOS region. Hence finding a way to let the user opt out of RO mappings
in QEMU might be an acceptable tradeoff. But how to achieve it cleanly
needs more discussion in the QEMU community. For now we just hacked
QEMU for testing.

Complete kernel code can be found in [3]; the QEMU code can be found in [4].

base-commit: ce9b593b1f74ccd090edc5d2ad397da84baa9946

[1] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.com/
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%2Bnesting

Change log:
v3:
 - Further split the patches in the order of adding helpers for the nested
   domain, iotlb flush, nested domain attachment and nested domain allocation
   callback, then reporting the hw_info to userspace.
 - Add batch support for cache invalidation from userspace
 - Disallow nested translation usage if RO mappings exist in the stage-2
   domain due to the errata on read-only mappings on the Sapphire Rapids
   platform.

v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
 - The iommufd infrastructure is split to be separate series.

v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Lu Baolu (5):
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Disallow nesting on domains with read-only mappings

Yi Liu (5):
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Set the nested domain to a device
  iommu/vt-d: Add iotlb flush for nested domain
  iommu/vt-d: Implement hw_info for iommu capability query

 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  78 ++++++++++++---
 drivers/iommu/intel/iommu.h  |  55 +++++++++--
 drivers/iommu/intel/nested.c | 181 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c  | 151 +++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h  |   2 +
 drivers/iommu/iommufd/main.c |   6 ++
 include/linux/iommu.h        |   1 +
 include/uapi/linux/iommufd.h | 149 ++++++++++++++++++++++++++++
 9 files changed, 603 insertions(+), 22 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

-- 
2.34.1



* [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  6:59   ` Tian, Kevin
                     ` (2 more replies)
  2023-05-11 14:51 ` [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
                   ` (9 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This adds IOMMU_HWPT_TYPE_VTD_S1 for the stage-1 hw_pagetable of Intel
VT-d and the corresponding data structure for the userspace-specified
parameters for domain allocation.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/uapi/linux/iommufd.h | 57 ++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 06dcad6ab082..c2658394827a 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -353,9 +353,64 @@ struct iommu_vfio_ioas {
 /**
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
+ * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
  */
 enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
+	IOMMU_HWPT_TYPE_VTD_S1,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
+ *				     table entry attributes
+ * @IOMMU_VTD_PGTBL_SRE: Supervisor request
+ * @IOMMU_VTD_PGTBL_EAFE: Extended access enable
+ * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
+ * @IOMMU_VTD_PGTBL_PWT: Page-level write through
+ * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
+ * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
+ * @IOMMU_VTD_PGTBL_WPE: Write protect enable
+ */
+enum iommu_hwpt_intel_vtd_flags {
+	IOMMU_VTD_PGTBL_SRE = 1 << 0,
+	IOMMU_VTD_PGTBL_EAFE = 1 << 1,
+	IOMMU_VTD_PGTBL_PCD = 1 << 2,
+	IOMMU_VTD_PGTBL_PWT = 1 << 3,
+	IOMMU_VTD_PGTBL_EMTE = 1 << 4,
+	IOMMU_VTD_PGTBL_CD = 1 << 5,
+	IOMMU_VTD_PGTBL_WPE = 1 << 6,
+	IOMMU_VTD_PGTBL_LAST = 1 << 7,
+};
+
+/**
+ * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
+ *                               stage-1 page table info
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
+ * @pgtbl_addr: The base address of the user-managed stage-1 page table.
+ * @pat: Page attribute table data to compute effective memory type
+ * @emt: Extended memory type
+ * @addr_width: The address width of the untranslated addresses that are
+ *              subjected to the user-managed stage-1 page table.
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific data for creating hw_pagetable to represent
+ * the user-managed stage-1 page table that is used in nested translation.
+ *
+ * In nested translation, the stage-1 page table locates in the address
+ * space that defined by the corresponding stage-2 page table. Hence the
+ * stage-1 page table base address value should not be higher than the
+ * maximum untranslated address of stage-2 page table.
+ *
+ * The paging level of the stage-1 page table should be compatible with
+ * the hardware iommu. Otherwise, the allocation would be failed.
+ */
+struct iommu_hwpt_intel_vtd {
+	__u64 flags;
+	__u64 pgtbl_addr;
+	__u32 pat;
+	__u32 emt;
+	__u32 addr_width;
+	__u32 __reserved;
 };
 
 /**
@@ -391,6 +446,8 @@ enum iommu_hwpt_type {
  * +------------------------------+-------------------------------------+-----------+
  * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
  * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
-- 
2.34.1
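
A hypothetical userspace filling of this structure, for a 4-level
stage-1 table with supervisor requests enabled (guest_pgd_gpa is a
placeholder; the SRE/WPE pairing mirrors the checks added in patch 04):

    struct iommu_hwpt_intel_vtd data = {
            .flags      = IOMMU_VTD_PGTBL_SRE | IOMMU_VTD_PGTBL_WPE,
            .pgtbl_addr = guest_pgd_gpa, /* must lie within the s2 address space */
            .addr_width = 48,            /* 4-level; 57 requires 5-level support */
    };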



* [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  7:02   ` Tian, Kevin
  2023-05-11 14:51 ` [PATCH v3 03/10] iommu/vt-d: Add helper for nested domain allocation Yi Liu
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

The nested domain fields are exclusive with those used for a DMA
remapping domain. Use a union to avoid memory waste.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1c5e1d88862b..e818520f4068 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -596,15 +596,38 @@ struct dmar_domain {
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 
-	struct dma_pte	*pgd;		/* virtual address */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
 	int		iommu_superpage;/* Level of superpages supported:
 					   0 == 4KiB (no superpages), 1 == 2MiB,
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
-	u64		max_addr;	/* maximum mapped address */
+	union {
+		/* DMA remapping domain */
+		struct {
+			/* virtual address */
+			struct dma_pte	*pgd;
+			/* max guest address width */
+			int		gaw;
+			/*
+			 * adjusted guest address width:
+			 *   0: level 2 30-bit
+			 *   1: level 3 39-bit
+			 *   2: level 4 48-bit
+			 *   3: level 5 57-bit
+			 */
+			int		agaw;
+			/* maximum mapped address */
+			u64		max_addr;
+		};
+
+		/* Nested user domain */
+		struct {
+			/* 2-level page table the user domain nested */
+			struct dmar_domain *s2_domain;
+			/* user page table pointer (in GPA) */
+			unsigned long s1_pgtbl;
+			/* page table attributes */
+			struct iommu_hwpt_intel_vtd s1_cfg;
+		};
+	};
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
-- 
2.34.1
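
The union is safe only because the two arms are never live at the same
time; a sketch of the invariant (the helper below is illustrative, not
part of the patch):

    static bool domain_is_nested(struct dmar_domain *domain)
    {
            return domain->domain.type == IOMMU_DOMAIN_NESTED;
    }

    /* pgd/gaw/agaw/max_addr are valid only when !domain_is_nested();
     * s2_domain/s1_pgtbl/s1_cfg are valid only when it returns true.
     */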



* [PATCH v3 03/10] iommu/vt-d: Add helper for nested domain allocation
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
  2023-05-11 14:51 ` [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-11 14:51 ` [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds a helper for accepting user parameters and allocating a
nested domain.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/Makefile |  2 +-
 drivers/iommu/intel/iommu.h  |  2 ++
 drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/intel/nested.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 7af3b8a4f2a0..5dabf081a779 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index e818520f4068..8d0c7970c1ad 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -858,6 +858,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
 void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const union iommu_domain_user_data *user_data);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
new file mode 100644
index 000000000000..f83931bb44b6
--- /dev/null
+++ b/drivers/iommu/intel/nested.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * nested.c - nested mode translation support
+ *
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ *         Jacob Pan <jacob.jun.pan@linux.intel.com>
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+
+#include <linux/iommu.h>
+
+#include "iommu.h"
+
+static void intel_nested_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.free			= intel_nested_domain_free,
+};
+
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const union iommu_domain_user_data *user_data)
+{
+	const struct iommu_hwpt_intel_vtd *vtd = (struct iommu_hwpt_intel_vtd *)user_data;
+	struct dmar_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
+	if (!domain)
+		return NULL;
+
+	domain->use_first_level = true;
+	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s1_pgtbl = vtd->pgtbl_addr;
+	domain->s1_cfg = *vtd;
+	domain->domain.ops = &intel_nested_domain_ops;
+	domain->domain.type = IOMMU_DOMAIN_NESTED;
+	INIT_LIST_HEAD(&domain->devices);
+	spin_lock_init(&domain->lock);
+	xa_init(&domain->iommu_array);
+
+	return &domain->domain;
+}
-- 
2.34.1



* [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (2 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 03/10] iommu/vt-d: Add helper for nested domain allocation Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  7:16   ` Tian, Kevin
  2023-05-11 14:51 ` [PATCH v3 05/10] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. An EINVAL error is
returned if any of the configurations are incompatible with the
hardware capabilities. The caller can retry with another compatible
user domain. The encoding of the fields of each PASID entry is defined
in Section 9.6 of the VT-d spec.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/pasid.c | 151 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   2 +
 2 files changed, 153 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..ee4c7b69c6c6 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -21,6 +21,11 @@
 #include "iommu.h"
 #include "pasid.h"
 
+#define IOMMU_VTD_PGTBL_MTS_MASK	(IOMMU_VTD_PGTBL_CD | \
+					 IOMMU_VTD_PGTBL_EMTE | \
+					 IOMMU_VTD_PGTBL_PCD | \
+					 IOMMU_VTD_PGTBL_PWT)
+
 /*
  * Intel IOMMU system wide PASID name space:
  */
@@ -335,6 +340,15 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -402,6 +416,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -713,3 +736,131 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for guest shared virtual address. In this case, the
+ * first level page tables are used for GVA-GPA translation in the guest,
+ * second level page tables are used for GPA-HPA translation.
+ *
+ * @iommu:      IOMMU which the device belong to
+ * @dev:        Device to be set up for translation
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @domain:     User domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_intel_vtd *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+	int agaw;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err_ratelimited("%s: No nested translation support\n",
+				   iommu->name);
+		return -ENODEV;
+	}
+
+	/*
+	 * Sanity checking performed by caller to make sure address width
+	 * matching in two dimensions: CPU vs. IOMMU, guest vs. host.
+	 */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+#ifdef CONFIG_X86
+	case ADDR_WIDTH_5LEVEL:
+		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+		    !cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+#endif
+	default:
+		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	/*
+	 * Memory type is only applicable to devices inside processor coherent
+	 * domain. Will add MTS support once coherent devices are available.
+	 */
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
+		pr_warn_ratelimited("No memory type support %s\n",
+				    iommu->name);
+		return -EINVAL;
+	}
+
+	agaw = iommu_skip_agaw(s2_domain, iommu, &pgd);
+	if (agaw < 0) {
+		dev_err_ratelimited(dev, "Invalid domain page table\n");
+		return -EINVAL;
+	}
+
+	/* First level PGD (in GPA) must be supported by the second level. */
+	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
+		dev_err_ratelimited(dev,
+				    "Guest PGD %lx not supported, max %llx\n",
+				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_PGTBL_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE)
+		pasid_set_eafe(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..864b12848392 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -111,6 +111,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
-- 
2.34.1
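
All the per-field helpers reduce to the same index arithmetic on the
512-bit PASID entry; a sketch of the mapping from spec bit numbers to
pe->val[] words (pasid_set_generic_bit is illustrative, not part of the
patch):

    /* Each pe->val[i] holds entry bits [64*i .. 64*i+63], so spec bit
     * N lives in val[N / 64] at position N % 64. E.g. SRE (bit 128) is
     * val[2] bit 0 and EAFE (bit 135) is val[2] bit 7, matching the
     * helpers above.
     */
    static inline void pasid_set_generic_bit(struct pasid_entry *pe, int nr)
    {
            pasid_set_bits(&pe->val[nr >> 6], 1ULL << (nr & 63),
                           1ULL << (nr & 63));
    }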



* [PATCH v3 05/10] iommu/vt-d: Make domain attach helpers to be extern
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (3 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-11 14:51 ` [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device Yi Liu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

These helpers are made extern so that they can be used in nested.c.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 17 +++++++----------
 drivers/iommu/intel/iommu.h |  8 ++++++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b871a6afd803..e6536a43dd82 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -277,7 +277,6 @@ static LIST_HEAD(dmar_satc_units);
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
 
-static void device_block_translation(struct device *dev);
 static void intel_iommu_domain_free(struct iommu_domain *domain);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -555,7 +554,7 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 }
 
 /* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
@@ -1740,8 +1739,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 	return domain;
 }
 
-static int domain_attach_iommu(struct dmar_domain *domain,
-			       struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
 	unsigned long ndomains;
@@ -1790,8 +1788,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 	return ret;
 }
 
-static void domain_detach_iommu(struct dmar_domain *domain,
-				struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info;
 
@@ -3994,7 +3991,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
  * all DMA requests without PASID from the device are blocked. If the page
  * table has been set, clean up the data structures.
  */
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
@@ -4101,8 +4098,8 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-static int prepare_domain_attach_device(struct iommu_domain *domain,
-					struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu;
@@ -4342,7 +4339,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)
 						     PASID_RID2PASID);
 }
 
-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	unsigned long flags;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 8d0c7970c1ad..ccb93aed6cf2 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -852,6 +852,14 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
  */
 #define QI_OPT_WAIT_DRAIN		BIT(0)
 
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev);
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
 int dmar_ir_support(void);
 
 void *alloc_pgtable_page(int node, gfp_t gfp);
-- 
2.34.1



* [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (4 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 05/10] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  7:22   ` Tian, Kevin
  2023-05-11 14:51 ` [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

This allows the nested domain to be attached to a device, hence
enabling nested domain usage on Intel VT-d.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index f83931bb44b6..fd38424b78f0 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -11,8 +11,53 @@
 #define pr_fmt(fmt)	"DMAR: " fmt
 
 #include <linux/iommu.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
 
 #include "iommu.h"
+#include "pasid.h"
+
+static int intel_nested_attach_dev(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long flags;
+	int ret = 0;
+
+	if (info->domain)
+		device_block_translation(dev);
+
+	/* Is s2_domain compatible with this IOMMU? */
+	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+	if (ret) {
+		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
+		return ret;
+	}
+
+	ret = domain_attach_iommu(dmar_domain, iommu);
+	if (ret) {
+		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
+		return ret;
+	}
+
+	ret = intel_pasid_setup_nested(iommu, dev,
+				       PASID_RID2PASID, dmar_domain);
+	if (ret) {
+		domain_detach_iommu(dmar_domain, iommu);
+		dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
+		return ret;
+	}
+
+	info->domain = dmar_domain;
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&info->link, &dmar_domain->devices);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	domain_update_iommu_cap(dmar_domain);
+
+	return 0;
+}
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
 {
@@ -20,7 +65,9 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 }
 
 static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.attach_dev		= intel_nested_attach_dev,
 	.free			= intel_nested_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
 };
 
 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
-- 
2.34.1



* [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (5 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  7:33   ` Tian, Kevin
  2023-05-11 14:51 ` [PATCH v3 08/10] iommu/vt-d: Add nested domain allocation Yi Liu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This is needed as the stage-1 page table of the nested domain is
maintained outside the iommu subsystem; hence, it needs to support
iotlb flush requests.

This adds the data structure for flushing the iotlb for the nested
domain allocated with the IOMMU_HWPT_TYPE_VTD_S1 type and the related
callback to accept iotlb flush requests from IOMMUFD.

This only exposes the interface for invalidating the IOTLB, but not
for the device-TLB, as device-TLB invalidation is covered automatically
by the IOTLB invalidation if the affected device is ATS-capable.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 10 +++---
 drivers/iommu/intel/iommu.h  |  6 ++++
 drivers/iommu/intel/nested.c | 69 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c |  6 ++++
 include/uapi/linux/iommufd.h | 59 ++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e6536a43dd82..5f27cee4656a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1474,10 +1474,10 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
 
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
-				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
-				  int ih, int map)
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map)
 {
 	unsigned int aligned_pages = __roundup_pow_of_two(pages);
 	unsigned int mask = ilog2(aligned_pages);
@@ -1550,7 +1550,7 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
 		iommu_flush_write_buffer(iommu);
 }
 
-static void intel_flush_iotlb_all(struct iommu_domain *domain)
+void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct iommu_domain_info *info;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ccb93aed6cf2..581596d90c1b 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -859,6 +859,12 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
 void domain_update_iommu_cap(struct dmar_domain *domain);
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map);
+void intel_flush_iotlb_all(struct iommu_domain *domain);
+
 
 int dmar_ir_support(void);
 
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index fd38424b78f0..d13fbcd3f5a6 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -64,8 +64,77 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 	kfree(to_dmar_domain(domain));
 }
 
+static void intel_nested_invalidate(struct device *dev,
+				    struct dmar_domain *domain,
+				    u64 addr, unsigned long npages)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+
+	if (addr == 0 && npages == -1)
+		intel_flush_iotlb_all(&domain->domain);
+	else
+		iommu_flush_iotlb_psi(iommu, domain,
+				      addr >> VTD_PAGE_SHIFT,
+				      npages, 1, 0);
+}
+
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
+					      void *user_data)
+{
+	struct iommu_hwpt_invalidate_request_intel_vtd *req = user_data;
+	struct iommu_hwpt_invalidate_intel_vtd *inv_info = user_data;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned int entry_size = inv_info->entry_size;
+	u64 uptr = inv_info->inv_data_uptr;
+	u64 nr_uptr = inv_info->entry_nr_uptr;
+	struct device_domain_info *info;
+	u32 entry_nr, index;
+	unsigned long flags;
+	int ret = 0;
+
+	if (WARN_ON(!user_data))
+		return 0;
+
+	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
+		return -EFAULT;
+
+	if (!entry_nr)
+		return -EINVAL;
+
+	for (index = 0; index < entry_nr; index++) {
+		ret = copy_struct_from_user(req, sizeof(*req),
+					    u64_to_user_ptr(uptr + index * entry_size),
+					    entry_size);
+		if (ret) {
+			pr_err_ratelimited("Failed to fetch invalidation request\n");
+			break;
+		}
+
+		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
+		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		spin_lock_irqsave(&dmar_domain->lock, flags);
+		list_for_each_entry(info, &dmar_domain->devices, link)
+			intel_nested_invalidate(info->dev, dmar_domain,
+						req->addr, req->npages);
+		spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	}
+
+	if (ret && put_user(index, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
+		return -EFAULT;
+
+	return ret;
+}
+
 static const struct iommu_domain_ops intel_nested_domain_ops = {
 	.attach_dev		= intel_nested_attach_dev,
+	.cache_invalidate_user	= intel_nested_cache_invalidate_user,
+	.cache_invalidate_user_data_len =
+		sizeof(struct iommu_hwpt_invalidate_intel_vtd),
 	.free			= intel_nested_domain_free,
 	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
 };
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39922f83ce34..b338b082950b 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -282,6 +282,12 @@ union ucmd_buffer {
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
+	/*
+	 * hwpt_type specific structure used in the cache invalidation
+	 * path.
+	 */
+	struct iommu_hwpt_invalidate_intel_vtd vtd;
+	struct iommu_hwpt_invalidate_request_intel_vtd req_vtd;
 };
 
 struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c2658394827a..2e658fa346ad 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -505,6 +505,63 @@ struct iommu_hw_info {
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
 
+/**
+ * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
+ *                                              stage-1 page table cache
+ *                                              invalidation
+ * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
+ *                           leaf PTE caching needs to be invalidated
+ *                           and other paging structure caches can be
+ *                           preserved.
+ */
+enum iommu_hwpt_intel_vtd_invalidate_flags {
+	IOMMU_VTD_QI_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_invalidate_request_intel_vtd - Intel VT-d cache invalidation request
+ * @addr: The start address of the addresses to be invalidated.
+ * @npages: Number of contiguous 4K pages to be invalidated.
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation under nested translation. Userspace uses this structure to
+ * tell host about the impacted caches after modifying the stage-1 page table.
+ *
+ * Invalidating all the caches related to the hw_pagetable by setting
+ * @addr==0 and @npages==__u64(-1).
+ */
+struct iommu_hwpt_invalidate_request_intel_vtd {
+	__u64 addr;
+	__u64 npages;
+	__u32 flags;
+	__u32 __reserved;
+};
+
+/**
+ * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
+ * @flags: Must be 0
+ * @entry_size: Size in bytes of each cache invalidation request
+ * @entry_nr_uptr: User pointer to the number of invalidation requests.
+ *                 Kernel reads it to get the number of requests and
+ *                 updates the buffer with the number of requests that
+ *                 have been processed successfully. This pointer must
+ *                 point to a __u32 type of memory location.
+ * @inv_data_uptr: Pointer to the cache invalidation requests
+ *
+ * The Intel VT-d specific invalidation data for a set of cache invalidation
+ * requests. Kernel loops the requests one-by-one and stops when failure
+ * is encountered. The number of handled requests is reported to user by
+ * writing the buffer pointed by @entry_nr_uptr.
+ */
+struct iommu_hwpt_invalidate_intel_vtd {
+	__u32 flags;
+	__u32 entry_size;
+	__u64 entry_nr_uptr;
+	__u64 inv_data_uptr;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
@@ -520,6 +577,8 @@ struct iommu_hw_info {
  * +==============================+========================================+
  * | @hwpt_type                   |     Data structure in @data_uptr       |
  * +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       | struct iommu_hwpt_invalidate_intel_vtd |
+ * +------------------------------+----------------------------------------+
  */
 struct iommu_hwpt_invalidate {
 	__u32 size;
-- 
2.34.1
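
For completeness, a hypothetical userspace sketch of a two-entry batch
(error handling elided; on failure the kernel writes the number of
entries it completed back through @entry_nr_uptr, as in the loop above):

    struct iommu_hwpt_invalidate_request_intel_vtd reqs[2] = {
            { .addr = 0x1000, .npages = 1, .flags = IOMMU_VTD_QI_FLAGS_LEAF },
            { .addr = 0, .npages = (__u64)-1 },     /* flush-all entry */
    };
    __u32 nr = 2;
    struct iommu_hwpt_invalidate_intel_vtd inv = {
            .entry_size    = sizeof(reqs[0]),
            .entry_nr_uptr = (__u64)(uintptr_t)&nr,
            .inv_data_uptr = (__u64)(uintptr_t)reqs,
    };
    /* &inv is passed as the hwpt_type-specific data of
     * ioctl(IOMMU_HWPT_INVALIDATE) */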



* [PATCH v3 08/10] iommu/vt-d: Add nested domain allocation
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (6 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-11 14:51 ` [PATCH v3 09/10] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds support for the IOMMU_HWPT_TYPE_VTD_S1 type.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++
 include/linux/iommu.h       |  1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5f27cee4656a..51612da4a1ea 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4092,6 +4092,16 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	return NULL;
 }
 
+static struct iommu_domain *
+intel_iommu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
+			      const union iommu_domain_user_data *user_data)
+{
+	if (parent)
+		return intel_nested_domain_alloc(parent, user_data);
+	else
+		return iommu_domain_alloc(dev->bus);
+}
+
 static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
 	if (domain != &si_domain->domain && domain != &blocking_domain)
@@ -4736,9 +4746,18 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static int intel_iommu_domain_user_data_len(u32 hwpt_type)
+{
+	if (hwpt_type != IOMMU_HWPT_TYPE_VTD_S1)
+		return -EOPNOTSUPP;
+	return sizeof(struct iommu_hwpt_intel_vtd);
+};
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
+	.domain_alloc_user	= intel_iommu_domain_alloc_user,
+	.domain_alloc_user_data_len = intel_iommu_domain_user_data_len,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
 	.release_device		= intel_iommu_release_device,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c696d8e7f43b..32324f0756de 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -234,6 +234,7 @@ union iommu_domain_user_data {
 #ifdef CONFIG_IOMMUFD_TEST
 	__u64 test[2];
 #endif
+	struct iommu_hwpt_intel_vtd vtd;
 };
 
 /**
-- 
2.34.1
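
How the iommufd core consumes these two ops is defined in the iommufd
series [1]; the sketch below is only an assumption of the expected
contract (cmd->data_uptr, idev->dev and parent are placeholder names):

    union iommu_domain_user_data data;
    int klen, ret;

    /* driver reports how many bytes of user data this hwpt_type takes */
    klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
    if (klen < 0)
            return ERR_PTR(klen);
    ret = copy_struct_from_user(&data, sizeof(data),
                                u64_to_user_ptr(cmd->data_uptr), klen);
    if (ret)
            return ERR_PTR(ret);
    return ops->domain_alloc_user(idev->dev, parent, &data);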



* [PATCH v3 09/10] iommu/vt-d: Implement hw_info for iommu capability query
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (7 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 08/10] iommu/vt-d: Add nested domain allocation Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-11 14:51 ` [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
  2023-05-24  8:59 ` [PATCH v3 00/10] Add Intel VT-d nested translation Tian, Kevin
  10 siblings, 0 replies; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
 include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 51612da4a1ea..20d4ae1cb8a6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4746,6 +4746,23 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static void *intel_iommu_hw_info(struct device *dev, u32 *length)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_hw_info_vtd *vtd;
+
+	vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
+	if (!vtd)
+		return ERR_PTR(-ENOMEM);
+
+	vtd->cap_reg = iommu->cap;
+	vtd->ecap_reg = iommu->ecap;
+	*length = sizeof(*vtd);
+
+	return vtd;
+}
+
 static int intel_iommu_domain_user_data_len(u32 hwpt_type)
 {
 	if (hwpt_type != IOMMU_HWPT_TYPE_VTD_S1)
@@ -4755,6 +4772,7 @@ static int intel_iommu_domain_user_data_len(u32 hwpt_type)
 
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
+	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.domain_alloc_user_data_len = intel_iommu_domain_user_data_len,
@@ -4769,6 +4787,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
+	.hw_info_type		= IOMMU_HW_INFO_TYPE_INTEL_VTD,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.page_response		= intel_svm_page_response,
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2e658fa346ad..b46270a4cf46 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -464,9 +464,32 @@ struct iommu_hwpt_alloc {
 
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
+ * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
+	IOMMU_HW_INFO_TYPE_INTEL_VTD,
+};
+
+/**
+ * struct iommu_hw_info_vtd - Intel VT-d hardware information
+ *
+ * @flags: Must be 0
+ * @__reserved: Must be 0
+ *
+ * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ *           section 11.4.2 Capability Register.
+ * @ecap_reg: Value of Intel VT-d extended capability register defined in
+ *            VT-d spec section 11.4.3 Extended Capability Register.
+ *
+ * User needs to understand the Intel VT-d specification to decode the
+ * register value.
+ */
+struct iommu_hw_info_vtd {
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 cap_reg;
+	__aligned_u64 ecap_reg;
 };
 
 /**
-- 
2.34.1
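
On the userspace side, a minimal decode could look like the sketch
below. The NEST bit position (ECAP bit 26) is taken from the VT-d spec
and should be treated as an assumption to verify against the spec
revision in use; 'vtd' points at the returned struct iommu_hw_info_vtd:

    /* hypothetical userspace helper, not part of this series;
     * NEST is assumed to be ECAP bit 26 per the VT-d spec */
    #define VTD_ECAP_NEST(e)        (((e) >> 26) & 0x1ULL)

    if (!VTD_ECAP_NEST(vtd->ecap_reg))
            return -ENOTSUP;        /* fall back to stage-2-only usage */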



* [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (8 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 09/10] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
@ 2023-05-11 14:51 ` Yi Liu
  2023-05-24  7:44   ` Tian, Kevin
  2023-05-24  8:59 ` [PATCH v3 00/10] Add Intel VT-d nested translation Tian, Kevin
  10 siblings, 1 reply; 40+ messages in thread
From: Yi Liu @ 2023-05-11 14:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.

As a result, the contents of pages designated by the VMM as Read-Only
can be modified by the IOMMU via PML5E (PML4E for 4-level tables)
accesses as part of the address translation process due to DMAs issued
by the guest.

Disallow nested translation when there are read-only pages in the
corresponding second-stage mappings, and disallow read-only pages from
being configured in the second-stage table of a nested translation.
For the latter, an alternative is to disallow read-only mappings in
any stage-2 domain as long as it has ever been used as a parent. In
this way, we can simply replace the user counter with a flag.

In concept, if the user understands this errata and does expect to
enable nested translation, it should never install any RO mapping
in stage-2 in the entire VM life cycle.

Reference from Sapphire Rapids Specification Update [1], errata details,
SPR17.

[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 13 +++++++++++++
 drivers/iommu/intel/iommu.h  |  4 ++++
 drivers/iommu/intel/nested.c | 22 ++++++++++++++++++++--
 include/uapi/linux/iommufd.h | 12 +++++++++++-
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 20d4ae1cb8a6..42288bd449a0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2150,6 +2150,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	struct dma_pte *first_pte = NULL, *pte = NULL;
 	unsigned int largepage_lvl = 0;
 	unsigned long lvl_pages = 0;
+	unsigned long flags;
 	phys_addr_t pteval;
 	u64 attr;
 
@@ -2159,6 +2160,17 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
 		return -EINVAL;
 
+	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
+		spin_lock_irqsave(&domain->lock, flags);
+		if (domain->nested_users > 0) {
+			spin_unlock_irqrestore(&domain->lock, flags);
+			return -EINVAL;
+		}
+
+		domain->read_only_mapped = true;
+		spin_unlock_irqrestore(&domain->lock, flags);
+	}
+
 	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
 	attr |= DMA_FL_PTE_PRESENT;
 	if (domain->use_first_level) {
@@ -4756,6 +4768,7 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length)
 	if (!vtd)
 		return ERR_PTR(-ENOMEM);
 
+	vtd->flags = IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17;
 	vtd->cap_reg = iommu->cap;
 	vtd->ecap_reg = iommu->ecap;
 	*length = sizeof(*vtd);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 581596d90c1b..95644c6815af 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -616,6 +616,10 @@ struct dmar_domain {
 			int		agaw;
 			/* maximum mapped address */
 			u64		max_addr;
+			/* domain has mappings with read-only permission */
+			bool		read_only_mapped;
+			/* user nested domain count */
+			int		nested_users;
 		};
 
 		/* Nested user domain */
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index d13fbcd3f5a6..9092ce28382c 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -61,7 +61,14 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
 {
-	kfree(to_dmar_domain(domain));
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct dmar_domain *s2_domain = dmar_domain->s2_domain;
+	unsigned long flags;
+
+	spin_lock_irqsave(&s2_domain->lock, flags);
+	s2_domain->nested_users--;
+	spin_unlock_irqrestore(&s2_domain->lock, flags);
+	kfree(dmar_domain);
 }
 
 static void intel_nested_invalidate(struct device *dev,
@@ -143,14 +150,25 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
 					       const union iommu_domain_user_data *user_data)
 {
 	const struct iommu_hwpt_intel_vtd *vtd = (struct iommu_hwpt_intel_vtd *)user_data;
+	struct dmar_domain *s2_dmar_domain = to_dmar_domain(s2_domain);
 	struct dmar_domain *domain;
+	unsigned long flags;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
 	if (!domain)
 		return NULL;
 
+	spin_lock_irqsave(&s2_dmar_domain->lock, flags);
+	if (s2_dmar_domain->read_only_mapped) {
+		spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
+		kfree(domain);
+		return NULL;
+	}
+	s2_dmar_domain->nested_users++;
+	spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
+
 	domain->use_first_level = true;
-	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s2_domain = s2_dmar_domain;
 	domain->s1_pgtbl = vtd->pgtbl_addr;
 	domain->s1_cfg = *vtd;
 	domain->domain.ops = &intel_nested_domain_ops;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index b46270a4cf46..8626f36e0353 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -471,10 +471,20 @@ enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_INTEL_VTD,
 };
 
+/**
+ * enum iommu_hw_info_vtd_flags - Flags for VT-d hw_info
+ * @IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17: If set, disallow nesting on domains
+ *                                   with read-only mapping.
+ *                                   https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
+ */
+enum iommu_hw_info_vtd_flags {
+	IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info_vtd - Intel VT-d hardware information
  *
- * @flags: Must be 0
+ * @flags: Combination of enum iommu_hw_info_vtd_flags
  * @__reserved: Must be 0
  *
  * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
-- 
2.34.1
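
The userspace contract implied by this patch can be summarized with a
short sketch ('vtd' points at the struct iommu_hw_info_vtd returned by
IOMMU_DEVICE_GET_HW_INFO):

    if (vtd->flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
            /* Either never install RO mappings in the stage-2 IOAS so
             * that nested domain allocation keeps working, or keep RO
             * mappings (e.g. the vbios) and give up on nested
             * translation for that stage-2 domain.
             */
    }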



* RE: [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
@ 2023-05-24  6:59   ` Tian, Kevin
  2023-05-25  2:28   ` Zhang, Tina
  2023-05-29 19:53   ` Jason Gunthorpe
  2 siblings, 0 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  6:59 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
>
> @@ -353,9 +353,64 @@ struct iommu_vfio_ioas {
>  /**
>   * enum iommu_hwpt_type - IOMMU HWPT Type
>   * @IOMMU_HWPT_TYPE_DEFAULT: default
> + * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
>   */
>  enum iommu_hwpt_type {
>  	IOMMU_HWPT_TYPE_DEFAULT,
> +	IOMMU_HWPT_TYPE_VTD_S1,
> +};
> +
> +/**
> + * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
> + *				     table entry attributes

No need to have both 'intel' and 'vtd' in one name. There won't
be another vtd. 😊

Also since it's for s1, let's be specific on the naming:

enum iommu_hwpt_vtd_s1_flags

> + * @IOMMU_VTD_PGTBL_SRE: Supervisor request
> + * @IOMMU_VTD_PGTBL_EAFE: Extended access enable
> + * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
> + * @IOMMU_VTD_PGTBL_PWT: Page-level write through
> + * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
> + * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
> + * @IOMMU_VTD_PGTBL_WPE: Write protect enable

similarly above should be IOMMU_VTD_S1_SRE.

PGTBL can be skipped.
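
i.e. something along these lines (a sketch of the suggested naming only,
keeping the original bit values):

enum iommu_hwpt_vtd_s1_flags {
	IOMMU_VTD_S1_SRE = 1 << 0,
	IOMMU_VTD_S1_EAFE = 1 << 1,
	IOMMU_VTD_S1_PCD = 1 << 2,
	IOMMU_VTD_S1_PWT = 1 << 3,
	IOMMU_VTD_S1_EMTE = 1 << 4,
	IOMMU_VTD_S1_CD = 1 << 5,
	IOMMU_VTD_S1_WPE = 1 << 6,
	IOMMU_VTD_S1_LAST = 1 << 7,
};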

> + */
> +enum iommu_hwpt_intel_vtd_flags {
> +	IOMMU_VTD_PGTBL_SRE = 1 << 0,
> +	IOMMU_VTD_PGTBL_EAFE = 1 << 1,
> +	IOMMU_VTD_PGTBL_PCD = 1 << 2,
> +	IOMMU_VTD_PGTBL_PWT = 1 << 3,
> +	IOMMU_VTD_PGTBL_EMTE = 1 << 4,
> +	IOMMU_VTD_PGTBL_CD = 1 << 5,
> +	IOMMU_VTD_PGTBL_WPE = 1 << 6,
> +	IOMMU_VTD_PGTBL_LAST = 1 << 7,
> +};
> +
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + *                               stage-1 page table info

struct iommu_hwpt_vtd_s1

> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.

no need to highlight 'user-managed' in every reference to stage-1

> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + *              subjected to the user-managed stage-1 page table.

"The address width of the stage-1 page table"

> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.

"VT-d specific data for creating a stage-1 page table that ..."

> + *
> + * In nested translation, the stage-1 page table locates in the address
> + * space that defined by the corresponding stage-2 page table. Hence the

"locates in the address space of the specified stage-2 page table"

> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.

"should not exceed the maximum allowed input address of stage-2"

> + *
> + * The paging level of the stage-1 page table should be compatible with
> + * the hardware iommu. Otherwise, the allocation would be failed.

there is no information about paging level in this structure

> + */
> +struct iommu_hwpt_intel_vtd {
> +	__u64 flags;
> +	__u64 pgtbl_addr;
> +	__u32 pat;
> +	__u32 emt;
> +	__u32 addr_width;
> +	__u32 __reserved;
>  };
> 

I'd prefer to move the vendor-specific definitions before 'enum iommu_hwpt_type'
so any newly added type can be easily correlated between the type enumeration
and the corresponding change in the comment of 'struct iommu_hwpt_alloc'.

>  /**
> @@ -391,6 +446,8 @@ enum iommu_hwpt_type {
>   * +------------------------------+-------------------------------------+-----------+
>   * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
>   * +------------------------------+-------------------------------------+-----------+
> + * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
> + * +------------------------------+-------------------------------------+-----------+
>   */
>  struct iommu_hwpt_alloc {
>  	__u32 size;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-05-11 14:51 ` [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-05-24  7:02   ` Tian, Kevin
  2023-05-26  2:56     ` Baolu Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:02 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The nested domain fields are exclusive to those used for a DMA
> remapping domain. Use a union to avoid memory waste.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 1c5e1d88862b..e818520f4068 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -596,15 +596,38 @@ struct dmar_domain {
>  	spinlock_t lock;		/* Protect device tracking lists */
>  	struct list_head devices;	/* all devices' list */
> 
> -	struct dma_pte	*pgd;		/* virtual address */
> -	int		gaw;		/* max guest address width */
> -
> -	/* adjusted guest address width, 0 is level 2 30-bit */
> -	int		agaw;
>  	int		iommu_superpage;/* Level of superpages supported:
>  					   0 == 4KiB (no superpages), 1 == 2MiB,
>  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> -	u64		max_addr;	/* maximum mapped address */
> +	union {
> +		/* DMA remapping domain */
> +		struct {
> +			/* virtual address */
> +			struct dma_pte	*pgd;
> +			/* max guest address width */
> +			int		gaw;
> +			/*
> +			 * adjusted guest address width:
> +			 *   0: level 2 30-bit
> +			 *   1: level 3 39-bit
> +			 *   2: level 4 48-bit
> +			 *   3: level 5 57-bit
> +			 */
> +			int		agaw;
> +			/* maximum mapped address */
> +			u64		max_addr;
> +		};

what about 'nid'?

> +
> +		/* Nested user domain */
> +		struct {
> +			/* 2-level page table the user domain nested */

/* parent page table which the user domain is nested on */

> +			struct dmar_domain *s2_domain;
> +			/* user page table pointer (in GPA) */
> +			unsigned long s1_pgtbl;
> +			/* page table attributes */
> +			struct iommu_hwpt_intel_vtd s1_cfg;
> +		};
> +	};
> 
>  	struct iommu_domain domain;	/* generic domain data structure for
>  					   iommu core */
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-05-11 14:51 ` [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-05-24  7:16   ` Tian, Kevin
  2023-05-26  4:16     ` Baolu Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:16 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong, Jacob Pan

> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
> + * This could be used for guest shared virtual address. In this case, the
> + * first level page tables are used for GVA-GPA translation in the guest,
> + * second level page tables are used for GPA-HPA translation.

it's not just for guest SVA. Actually in this series it's RID_PASID nested
translation.

> + *
> + * @iommu:      IOMMU which the device belong to
> + * @dev:        Device to be set up for translation
> + * @pasid:      PASID to be programmed in the device PASID table
> + * @domain:     User domain nested on a s2 domain

"User stage-1 domain"

> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
> +			     u32 pasid, struct dmar_domain *domain)
> +{
> +	struct iommu_hwpt_intel_vtd *s1_cfg = &domain->s1_cfg;
> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
> +	struct dmar_domain *s2_domain = domain->s2_domain;
> +	u16 did = domain_id_iommu(domain, iommu);
> +	struct dma_pte *pgd = s2_domain->pgd;
> +	struct pasid_entry *pte;
> +	int agaw;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err_ratelimited("%s: No nested translation support\n",
> +				   iommu->name);
> +		return -ENODEV;
> +	}
> +
> +	/*
> +	 * Sanity checking performed by caller to make sure address width

"by caller"? it's checked in this function.

> +	 * matching in two dimensions: CPU vs. IOMMU, guest vs. host.
> +	 */
> +	switch (s1_cfg->addr_width) {
> +	case ADDR_WIDTH_4LEVEL:
> +		break;
> +#ifdef CONFIG_X86
> +	case ADDR_WIDTH_5LEVEL:
> +		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
> +		    !cap_fl5lp_support(iommu->cap)) {
> +			dev_err_ratelimited(dev,
> +					    "5-level paging not supported\n");
> +			return -EINVAL;
> +		}
> +		break;
> +#endif
> +	default:
> +		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
> +				    s1_cfg->addr_width);
> +		return -EINVAL;
> +	}
> +
> +	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) && !ecap_srs(iommu->ecap)) {
> +		pr_err_ratelimited("No supervisor request support on %s\n",
> +				   iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE) && !ecap_eafs(iommu->ecap)) {
> +		pr_err_ratelimited("No extended access flag support on %s\n",
> +				   iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Memory type is only applicable to devices inside processor coherent
> +	 * domain. Will add MTS support once coherent devices are available.
> +	 */
> +	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
> +		pr_warn_ratelimited("No memory type support %s\n",
> +				    iommu->name);
> +		return -EINVAL;
> +	}

If it's unsupported, why expose it in the uAPI at this point?

> +
> +	agaw = iommu_skip_agaw(s2_domain, iommu, &pgd);
> +	if (agaw < 0) {
> +		dev_err_ratelimited(dev, "Invalid domain page table\n");
> +		return -EINVAL;
> +	}

this looks problematic.

static inline int iommu_skip_agaw(struct dmar_domain *domain,
                                  struct intel_iommu *iommu,
                                  struct dma_pte **pgd)
{
	int agaw;

	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
		*pgd = phys_to_virt(dma_pte_addr(*pgd));
		if (!dma_pte_present(*pgd))
			return -EINVAL;
	}

	return agaw;
}

why is it safe to change the pgd level of the s2 domain when it's used as
the parent? this s2 pgtbl might be used by other devices behind
other iommus which already map GPAs at a level which this
iommu doesn't support...

shouldn't we simply fail it as another incompatible condition?
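
i.e. a sketch of simply failing the incompatibility, reusing the local
names from the quoted function:

	/* never modify the s2 pgd while it's shared as a nesting parent */
	if (s2_domain->agaw > iommu->agaw) {
		dev_err_ratelimited(dev, "Incompatible s2 page table level\n");
		return -EINVAL;
	}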

> +
> +	/* First level PGD (in GPA) must be supported by the second level. */
> +	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
> +		dev_err_ratelimited(dev,
> +				    "Guest PGD %lx not supported, max %llx\n",
> +				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
> +		return -EINVAL;
> +	}

I'm not sure how useful this check is. Even if the pgd is sane the
lower level PTEs could include unsupported GPAs. If a guest really
doesn't want to follow the GPA restriction which the vIOMMU reports,
it can easily cause IOMMU faults in many ways.

Then why treat the pgd specially?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device
  2023-05-11 14:51 ` [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device Yi Liu
@ 2023-05-24  7:22   ` Tian, Kevin
  2023-05-26  4:24     ` Baolu Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:22 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> +
> +static int intel_nested_attach_dev(struct iommu_domain *domain,
> +				   struct device *dev)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct intel_iommu *iommu = info->iommu;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (info->domain)
> +		device_block_translation(dev);
> +
> +	/* Is s2_domain compatible with this IOMMU? */
> +	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
> +	if (ret) {
> +		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
> +		return ret;
> +	}

this also includes logic to trim higher page levels:

	/*
	 * Knock out extra levels of page tables if necessary
	 */
	while (iommu->agaw < dmar_domain->agaw) {
		struct dma_pte *pte;

		pte = dmar_domain->pgd;
		if (dma_pte_present(pte)) {
			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
			free_pgtable_page(pte);
		}
		dmar_domain->agaw--;
	}

What's the background of doing such truncation instead of simply
failing the request?

In any case it's probably fine before the domain includes any mapping
but really unreasonable to apply it to an existing s2 when it's used as
a parent.
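
i.e. instead of knocking out levels, simply fail, e.g. (a sketch against
the quoted code):

	/* an existing s2 parent must already fit this IOMMU's agaw */
	if (iommu->agaw < dmar_domain->agaw)
		return -EINVAL;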


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain
  2023-05-11 14:51 ` [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
@ 2023-05-24  7:33   ` Tian, Kevin
  2023-06-08  7:14     ` Liu, Yi L
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:33 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> This is needed as the stage-1 page table of the nested domain is
> maintained outside the iommu subsystem, hence, needs to support iotlb
> flush requests.
> 
> This adds the data structure for flushing iotlb for the nested domain
> allocated with IOMMU_HWPT_TYPE_VTD_S1 type and the related callback
> to accept iotlb flush request from IOMMUFD.
> 
> This only exposes the interface for invalidating IOTLB, but no for
> device-TLB as device-TLB invalidation will be covered automatically
> in IOTLB invalidation if the affected device is ATS-capable.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Following how you split patches in the earlier part of the series, this should
be split into three patches: one to introduce the uAPI changes, a 2nd
to export symbols, and the last to actually add the iotlb flush.

> +static int intel_nested_cache_invalidate_user(struct iommu_domain
> *domain,
> +					      void *user_data)
> +{
> +	struct iommu_hwpt_invalidate_request_intel_vtd *req = user_data;
> +	struct iommu_hwpt_invalidate_intel_vtd *inv_info = user_data;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	unsigned int entry_size = inv_info->entry_size;
> +	u64 uptr = inv_info->inv_data_uptr;
> +	u64 nr_uptr = inv_info->entry_nr_uptr;
> +	struct device_domain_info *info;
> +	u32 entry_nr, index;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (WARN_ON(!user_data))
> +		return 0;

A WARN_ON should lead to an error being returned.

> +
> +	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> +		return -EFAULT;
> +
> +	if (!entry_nr)
> +		return -EINVAL;

Having zero entries is not an error, though. It just means there is no
work to do.
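
Putting these two comments together, the entry handling could look like
below (a sketch only):

	if (WARN_ON(!user_data))
		return -EINVAL;

	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
		return -EFAULT;

	/* zero entries is not an error, there is simply no work to do */
	if (!entry_nr)
		return 0;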

> +
> +	for (index = 0; index < entry_nr; index++) {
> +		ret = copy_struct_from_user(req, sizeof(*req),
> +					    u64_to_user_ptr(uptr + index * entry_size),
> +					    entry_size);
> +		if (ret) {
> +			pr_err_ratelimited("Failed to fetch invalidation request\n");
> +			break;
> +		}
> +
> +		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
> +		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		spin_lock_irqsave(&dmar_domain->lock, flags);
> +		list_for_each_entry(info, &dmar_domain->devices, link)
> +			intel_nested_invalidate(info->dev, dmar_domain,
> +						req->addr, req->npages);
> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> +	}
> +
> +	if (ret && put_user(index, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> +		return -EFAULT;

You want to always update the nr, no matter success or failure.
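
i.e. update it unconditionally before returning, e.g. (a sketch):

	/* report the number of handled requests on success and failure */
	if (put_user(index, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
		return -EFAULT;

	return ret;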

> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 39922f83ce34..b338b082950b 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -282,6 +282,12 @@ union ucmd_buffer {
>  #ifdef CONFIG_IOMMUFD_TEST
>  	struct iommu_test_cmd test;
>  #endif
> +	/*
> +	 * hwpt_type specific structure used in the cache invalidation
> +	 * path.
> +	 */
> +	struct iommu_hwpt_invalidate_intel_vtd vtd;
> +	struct iommu_hwpt_invalidate_request_intel_vtd req_vtd;
>  };

Can you add some explanation in the commit msg about why such
vendor-specific structures must be put in the generic ucmd_buffer?

> 
> +/**
> + * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d

enum iommu_hwpt_vtd_s1_invalidate_flags

> + *                                              stage-1 page table cache
> + *                                              invalidation
> + * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
> + *                           leaf PTE caching needs to be invalidated
> + *                           and other paging structure caches can be
> + *                           preserved.
> + */

what about "Drain Reads" and "Drain Writes"? Is the user allowed/required
to provide those hints?
> +
> +/**
> + * struct iommu_hwpt_invalidate_request_intel_vtd - Intel VT-d cache
> invalidation request

here you put "intel_vtd" at the end of the name. let's follow the same order
as the earlier definitions.

struct iommu_hwpt_vtd_s1_invalidate_desc

> + * @addr: The start address of the addresses to be invalidated.
> + * @npages: Number of contiguous 4K pages to be invalidated.
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific invalidation data for user-managed stage-1 cache
> + * invalidation under nested translation. Userspace uses this structure to
> + * tell host about the impacted caches after modifying the stage-1 page
> table.
> + *
> + * Invalidating all the caches related to the hw_pagetable by setting
> + * @addr==0 and @npages==__u64(-1).
> + */
> +struct iommu_hwpt_invalidate_request_intel_vtd {
> +	__u64 addr;
> +	__u64 npages;
> +	__u32 flags;
> +	__u32 __reserved;
> +};
> +
> +/**
> + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation
> info

iommu_hwpt_vtd_s1_invalidate

> + * @flags: Must be 0
> + * @entry_size: Size in bytes of each cache invalidation request
> + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> + *                 Kernel reads it to get the number of requests and
> + *                 updates the buffer with the number of requests that
> + *                 have been processed successfully. This pointer must
> + *                 point to a __u32 type of memory location.
> + * @inv_data_uptr: Pointer to the cache invalidation requests
> + *
> + * The Intel VT-d specific invalidation data for a set of cache invalidation
> + * requests. Kernel loops the requests one-by-one and stops when failure
> + * is encountered. The number of handled requests is reported to user by
> + * writing the buffer pointed by @entry_nr_uptr.
> + */
> +struct iommu_hwpt_invalidate_intel_vtd {
> +	__u32 flags;
> +	__u32 entry_size;
> +	__u64 entry_nr_uptr;
> +	__u64 inv_data_uptr;
> +};
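
With both renames applied, the pair would read roughly like below (a
sketch of the suggested naming only; the field layout is unchanged):

struct iommu_hwpt_vtd_s1_invalidate_desc {
	__u64 addr;
	__u64 npages;
	__u32 flags;
	__u32 __reserved;
};

struct iommu_hwpt_vtd_s1_invalidate {
	__u32 flags;
	__u32 entry_size;
	__u64 entry_nr_uptr;
	__u64 inv_data_uptr;
};
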
> +
>  /**
>   * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
>   * @size: sizeof(struct iommu_hwpt_invalidate)
> @@ -520,6 +577,8 @@ struct iommu_hw_info {
>   *
>   * +==============================+========================================+
>   * | @hwpt_type                   |     Data structure in @data_uptr       |
>   * +------------------------------+----------------------------------------+
> + * | IOMMU_HWPT_TYPE_VTD_S1       | struct iommu_hwpt_invalidate_intel_vtd |
> + * +------------------------------+----------------------------------------+
>   */
>  struct iommu_hwpt_invalidate {
>  	__u32 size;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-05-11 14:51 ` [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
@ 2023-05-24  7:44   ` Tian, Kevin
  2023-05-26  4:28     ` Baolu Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:44 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> When remapping hardware is configured by system software in scalable mode
> as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
> it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
> in first-stage page-table entries even when second-stage mappings indicate
> that corresponding first-stage page-table is Read-Only.
> 
> As a result, contents of pages designated by VMM as Read-Only can be
> modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
> address translation process due to DMAs issued by Guest.
> 
> Disallow the nested translation when there are read-only pages in the
> corresponding second-stage mappings. And, no read-only pages are allowed
> to be configured in the second-stage table of a nested translation.
> For the latter, an alternative is to disallow read-only mappings in
> any stage-2 domain as long as it's ever been used as a parent. In this
> way, we can simply replace the user counter with a flag.
> 
> In concept, if the user understands this errata and does expect to
> enable nested translation, it should never install any RO mapping
> in stage-2 in the entire VM life cycle.

IMHO the alternative is reasonable and simpler. If the user decides to
enable nesting it should keep the nesting-friendly configuration static,
since whether nesting is enabled on a device depends on the viommu
configuration (i.e. whether the guest attaches the device to the identity
domain or a non-identity domain), and it's not good to break the nesting
setup just because the host inadvertently adds a RO mapping to s2 in
the window between the guest being detached/put back to the identity
domain and then re-attached to an unmanaged domain.
> 
> +	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
> +		spin_lock_irqsave(&domain->lock, flags);
> +		if (domain->nested_users > 0) {
> +			spin_unlock_irqrestore(&domain->lock, flags);
> +			return -EINVAL;
> +		}
> +

this is worth a one-off warning. Same in the other path.
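
e.g. a sketch of the flag-based alternative ('nested_parent' is an
illustrative name, not something in this series):

	/* in intel_nested_domain_alloc(), mark the parent once */
	s2_dmar_domain->nested_parent = true;

	/* in the map path, reject RO mappings on a parent with a one-off warning */
	if (!(prot & DMA_PTE_WRITE) && domain->nested_parent) {
		pr_warn_once("RO mapping not allowed on a nesting parent due to errata\n");
		return -EINVAL;
	}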

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
                   ` (9 preceding siblings ...)
  2023-05-11 14:51 ` [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
@ 2023-05-24  8:59 ` Tian, Kevin
  2023-05-25 18:06   ` Alex Williamson
  2023-05-29 18:43   ` Jason Gunthorpe
  10 siblings, 2 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-05-24  8:59 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> 
> The first Intel platform supporting nested translation is Sapphire
> Rapids which, unfortunately, has a hardware errata [2] requiring special
> treatment. This errata happens when a stage-1 page table page (either
> level) is located in a stage-2 read-only region. In that case the IOMMU
> hardware may ignore the stage-2 RO permission and still set the A/D bit
> in stage-1 page table entries during page table walking.
> 
> A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
> this errata to userspace. With that restriction the user should either
> disable nested translation to favor RO stage-2 mappings or ensure no
> RO stage-2 mapping to enable nested translation.
> 
> Intel-iommu driver is armed with necessary checks to prevent such mix
> in patch10 of this series.
> 
> Qemu currently does add RO mappings though. The vfio agent in Qemu
> simply maps all valid regions in the GPA address space which certainly
> includes RO regions e.g. vbios.
> 
> In reality we don't know a usage relying on DMA reads from the BIOS
> region. Hence finding a way to allow user opt-out RO mappings in
> Qemu might be an acceptable tradeoff. But how to achieve it cleanly
> needs more discussion in Qemu community. For now we just hacked Qemu
> to test.
> 

Hi, Alex,

Want to touch base on your thoughts about this errata before we
actually go to discuss how to handle it in Qemu.

Overall it affects all Sapphire Rapids platforms. Fully disabling nested
translation in the kernel just for this rare vulnerability sounds like overkill.

So we decide to enforce the exclusive check (RO in stage-2 vs. nesting)
in the kernel and expose the restriction to userspace so the VMM can
choose which one to enable based on its own requirement.

At least this looks like a reasonable tradeoff to some proprietary VMMs
which never add RO mappings in stage-2 today.

But we do want to get Qemu to support nested translation on those
platforms, as it is the widely-used reference VMM!

Do you see any major oversight before pursuing such a change in Qemu,
e.g. having a way for the user to opt out of adding RO mappings in stage-2? 😊

Thanks
Kevin 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
  2023-05-24  6:59   ` Tian, Kevin
@ 2023-05-25  2:28   ` Zhang, Tina
  2023-05-29 20:00     ` Jason Gunthorpe
  2023-05-29 19:53   ` Jason Gunthorpe
  2 siblings, 1 reply; 40+ messages in thread
From: Zhang, Tina @ 2023-05-25  2:28 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy,
	baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong

Hi,

> -----Original Message-----
> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
> To: joro@8bytes.org; alex.williamson@redhat.com; jgg@nvidia.com; Tian,
> Kevin <kevin.tian@intel.com>; robin.murphy@arm.com;
> baolu.lu@linux.intel.com
> Cc: cohuck@redhat.com; eric.auger@redhat.com; nicolinc@nvidia.com;
> kvm@vger.kernel.org; mjrosato@linux.ibm.com;
> chao.p.peng@linux.intel.com; Liu, Yi L <yi.l.liu@intel.com>;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> shameerali.kolothum.thodi@huawei.com; lulu@redhat.com;
> suravee.suthikulpanit@amd.com; iommu@lists.linux.dev; linux-
> kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; Duan, Zhenzhong
> <zhenzhong.duan@intel.com>
> Subject: [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1
> domain allocation
> 
> This adds IOMMU_HWPT_TYPE_VTD_S1 for stage-1 hw_pagetable of Intel VT-d
> and the corresponding data structure for the userspace-specified parameter
> for the domain allocation.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  include/uapi/linux/iommufd.h | 57 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 06dcad6ab082..c2658394827a 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -353,9 +353,64 @@ struct iommu_vfio_ioas {
>  /**
>   * enum iommu_hwpt_type - IOMMU HWPT Type
>   * @IOMMU_HWPT_TYPE_DEFAULT: default
> + * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
>   */
>  enum iommu_hwpt_type {
>  	IOMMU_HWPT_TYPE_DEFAULT,
> +	IOMMU_HWPT_TYPE_VTD_S1,
> +};
> +
> +/**
> + * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
> + *				     table entry attributes
> + * @IOMMU_VTD_PGTBL_SRE: Supervisor request
> + * @IOMMU_VTD_PGTBL_EAFE: Extended access enable
> + * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
> + * @IOMMU_VTD_PGTBL_PWT: Page-level write through
> + * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
> + * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
> + * @IOMMU_VTD_PGTBL_WPE: Write protect enable
> + */
> +enum iommu_hwpt_intel_vtd_flags {
> +	IOMMU_VTD_PGTBL_SRE = 1 << 0,
> +	IOMMU_VTD_PGTBL_EAFE = 1 << 1,
> +	IOMMU_VTD_PGTBL_PCD = 1 << 2,
> +	IOMMU_VTD_PGTBL_PWT = 1 << 3,
> +	IOMMU_VTD_PGTBL_EMTE = 1 << 4,
> +	IOMMU_VTD_PGTBL_CD = 1 << 5,
> +	IOMMU_VTD_PGTBL_WPE = 1 << 6,
> +	IOMMU_VTD_PGTBL_LAST = 1 << 7,
> +};
> +
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + *                               stage-1 page table info
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.
> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + *              subjected to the user-managed stage-1 page table.
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.
> + *
> + * In nested translation, the stage-1 page table locates in the address
> + * space that defined by the corresponding stage-2 page table. Hence the
> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.
> + *
> + * The paging level of the stage-1 page table should be compatible with
> + * the hardware iommu. Otherwise, the allocation would be failed.
> + */
> +struct iommu_hwpt_intel_vtd {
> +	__u64 flags;
> +	__u64 pgtbl_addr;
> +	__u32 pat;
> +	__u32 emt;
Do we need the emt field as part of the stage-1 page table info? IIUC,
according to the vt-d spec, the emt field is in stage-2 page table entries.
Since in nested mode we only expose the stage-1 page table to the guest
vt-d driver, I'm curious how the guest vt-d driver populates this EMT?

Thanks
-Tina
> +	__u32 addr_width;
> +	__u32 __reserved;
>  };
> 
>  /**
> @@ -391,6 +446,8 @@ enum iommu_hwpt_type {
>   * +------------------------------+-------------------------------------+-----------+
>   * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
>   * +------------------------------+-------------------------------------+-----------+
> + * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
> + * +------------------------------+-------------------------------------+-----------+
>   */
>  struct iommu_hwpt_alloc {
>  	__u32 size;
> --
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-24  8:59 ` [PATCH v3 00/10] Add Intel VT-d nested translation Tian, Kevin
@ 2023-05-25 18:06   ` Alex Williamson
  2023-05-26 11:25     ` Tian, Kevin
  2023-05-29 18:43   ` Jason Gunthorpe
  1 sibling, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2023-05-25 18:06 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, jgg, robin.murphy, baolu.lu, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, 24 May 2023 08:59:43 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:51 PM
> > 
> > The first Intel platform supporting nested translation is Sapphire
> > Rapids which, unfortunately, has a hardware errata [2] requiring special
> > treatment. This errata happens when a stage-1 page table page (either
> > level) is located in a stage-2 read-only region. In that case the IOMMU
> > hardware may ignore the stage-2 RO permission and still set the A/D bit
> > in stage-1 page table entries during page table walking.
> > 
> > A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
> > this errata to userspace. With that restriction the user should either
> > disable nested translation to favor RO stage-2 mappings or ensure no
> > RO stage-2 mapping to enable nested translation.
> > 
> > Intel-iommu driver is armed with necessary checks to prevent such mix
> > in patch10 of this series.
> > 
> > Qemu currently does add RO mappings though. The vfio agent in Qemu
> > simply maps all valid regions in the GPA address space which certainly
> > includes RO regions e.g. vbios.
> > 
> > In reality we don't know a usage relying on DMA reads from the BIOS
> > region. Hence finding a way to allow user opt-out RO mappings in
> > Qemu might be an acceptable tradeoff. But how to achieve it cleanly
> > needs more discussion in Qemu community. For now we just hacked Qemu
> > to test.
> >   
> 
> Hi, Alex,
> 
> Want to touch base on your thoughts about this errata before we
> actually go to discuss how to handle it in Qemu.
> 
> Overall it affects all Sapphire Rapids platforms. Fully disabling nested
> translation in the kernel just for this rare vulnerability sounds like overkill.
> 
> So we decide to enforce the exclusive check (RO in stage-2 vs. nesting)
> in the kernel and expose the restriction to userspace so the VMM can
> choose which one to enable based on its own requirement.
> 
> At least this looks like a reasonable tradeoff to some proprietary VMMs
> which never add RO mappings in stage-2 today.
> 
> But we do want to get Qemu to support nested translation on those
> platforms, as it is the widely-used reference VMM!
> 
> Do you see any major oversight before pursuing such a change in Qemu,
> e.g. having a way for the user to opt out of adding RO mappings in stage-2? 😊

I don't feel like I have enough info to know what common scenarios are
going to make use of 2-stage and nested configurations and how likely a
user is to need such an opt-out.  If it's likely that a user is going
to encounter this configuration, an opt-out is at best a workaround.
It's a significant support issue if a user needs to generate a failure
in QEMU, notice and decipher any log messages that failure may have
generated, and take action to introduce specific changes in their VM
configuration to support a usage restriction.

For QEMU I might lean more towards an effort to better filter the
mappings we create to avoid these read-only ranges that likely don't
require DMA mappings anyway.

How much does this affect arbitrary userspace vfio drivers?  For
example are there scenarios where running in a VM with a vIOMMU
introduces nested support that's unknown to the user which now prevents
this usage?  An example might be running an L2 guest with a version of
QEMU that does create read-only mappings.  If necessary, how would lack
of read-only mapping support be conveyed to those nested use cases?
Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-05-24  7:02   ` Tian, Kevin
@ 2023-05-26  2:56     ` Baolu Lu
  0 siblings, 0 replies; 40+ messages in thread
From: Baolu Lu @ 2023-05-26  2:56 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 5/24/23 3:02 PM, Tian, Kevin wrote:
>> From: Liu, Yi L<yi.l.liu@intel.com>
>> Sent: Thursday, May 11, 2023 10:51 PM
>>
>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>
>> The nested domain fields are exclusive to those used for a DMA
>> remapping domain. Use a union to avoid memory waste.
>>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
>>   1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 1c5e1d88862b..e818520f4068 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -596,15 +596,38 @@ struct dmar_domain {
>>   	spinlock_t lock;		/* Protect device tracking lists */
>>   	struct list_head devices;	/* all devices' list */
>>
>> -	struct dma_pte	*pgd;		/* virtual address */
>> -	int		gaw;		/* max guest address width */
>> -
>> -	/* adjusted guest address width, 0 is level 2 30-bit */
>> -	int		agaw;
>>   	int		iommu_superpage;/* Level of superpages supported:
>>   					   0 == 4KiB (no superpages), 1 == 2MiB,
>>   					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>> -	u64		max_addr;	/* maximum mapped address */
>> +	union {
>> +		/* DMA remapping domain */
>> +		struct {
>> +			/* virtual address */
>> +			struct dma_pte	*pgd;
>> +			/* max guest address width */
>> +			int		gaw;
>> +			/*
>> +			 * adjusted guest address width:
>> +			 *   0: level 2 30-bit
>> +			 *   1: level 3 39-bit
>> +			 *   2: level 4 48-bit
>> +			 *   3: level 5 57-bit
>> +			 */
>> +			int		agaw;
>> +			/* maximum mapped address */
>> +			u64		max_addr;
>> +		};
> what about 'nid'?


"nid" represents which NUMA node should we allocate pages from for this
domain. It's updated every time when a domain is attached/detached
to/from a device or pasid.

Generally speaking, "nid" is common for all types of domain. But in this
case, only a DMA remapping domain has a need to allocate pages. I intend
to keep it as it for now. There's more cleanup rooms if we limit it only
for DMA remapping domain.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-05-24  7:16   ` Tian, Kevin
@ 2023-05-26  4:16     ` Baolu Lu
  2023-06-07  8:34       ` Liu, Yi L
  2023-06-08  3:35       ` Liu, Yi L
  0 siblings, 2 replies; 40+ messages in thread
From: Baolu Lu @ 2023-05-26  4:16 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 5/24/23 3:16 PM, Tian, Kevin wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>> Sent: Thursday, May 11, 2023 10:51 PM
>>
>> +
>> +/**
>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
>> + * This could be used for guest shared virtual address. In this case, the
>> + * first level page tables are used for GVA-GPA translation in the guest,
>> + * second level page tables are used for GPA-HPA translation.
> 
> it's not just for guest SVA. Actually in this series it's RID_PASID nested
> translation.

Yes.

>> + *
>> + * @iommu:      IOMMU which the device belong to
>> + * @dev:        Device to be set up for translation
>> + * @pasid:      PASID to be programmed in the device PASID table
>> + * @domain:     User domain nested on a s2 domain
> 
> "User stage-1 domain"

Yes.

>> + */
>> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
>> +			     u32 pasid, struct dmar_domain *domain)
>> +{
>> +	struct iommu_hwpt_intel_vtd *s1_cfg = &domain->s1_cfg;
>> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
>> +	struct dmar_domain *s2_domain = domain->s2_domain;
>> +	u16 did = domain_id_iommu(domain, iommu);
>> +	struct dma_pte *pgd = s2_domain->pgd;
>> +	struct pasid_entry *pte;
>> +	int agaw;
>> +
>> +	if (!ecap_nest(iommu->ecap)) {
>> +		pr_err_ratelimited("%s: No nested translation support\n",
>> +				   iommu->name);
>> +		return -ENODEV;
>> +	}
>> +
>> +	/*
>> +	 * Sanity checking performed by caller to make sure address width
> 
> "by caller"? it's checked in this function.

This comment need to be updated.

>> +	 * matching in two dimensions: CPU vs. IOMMU, guest vs. host.
>> +	 */
>> +	switch (s1_cfg->addr_width) {
>> +	case ADDR_WIDTH_4LEVEL:
>> +		break;
>> +#ifdef CONFIG_X86
>> +	case ADDR_WIDTH_5LEVEL:
>> +		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
>> +		    !cap_fl5lp_support(iommu->cap)) {
>> +			dev_err_ratelimited(dev,
>> +					    "5-level paging not supported\n");
>> +			return -EINVAL;
>> +		}
>> +		break;
>> +#endif
>> +	default:
>> +		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
>> +				    s1_cfg->addr_width);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) && !ecap_srs(iommu->ecap)) {
>> +		pr_err_ratelimited("No supervisor request support on %s\n",
>> +				   iommu->name);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE) && !ecap_eafs(iommu->ecap)) {
>> +		pr_err_ratelimited("No extended access flag support on %s\n",
>> +				   iommu->name);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * Memory type is only applicable to devices inside processor coherent
>> +	 * domain. Will add MTS support once coherent devices are available.
>> +	 */
>> +	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
>> +		pr_warn_ratelimited("No memory type support %s\n",
>> +				    iommu->name);
>> +		return -EINVAL;
>> +	}
> 
> If it's unsupported, why expose it in the uAPI at this point?

Agreed. We can remove this flag for now.

>> +
>> +	agaw = iommu_skip_agaw(s2_domain, iommu, &pgd);
>> +	if (agaw < 0) {
>> +		dev_err_ratelimited(dev, "Invalid domain page table\n");
>> +		return -EINVAL;
>> +	}
> 
> this looks problematic.
> 
> static inline int iommu_skip_agaw(struct dmar_domain *domain,
>                                    struct intel_iommu *iommu,
>                                    struct dma_pte **pgd)
> {
> 	int agaw;
> 
> 	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
> 		*pgd = phys_to_virt(dma_pte_addr(*pgd));
> 		if (!dma_pte_present(*pgd))
> 			return -EINVAL;
> 	}
> 
> 	return agaw;
> }
> 
> why is it safe to change the pgd level of the s2 domain when it's used as
> the parent? this s2 pgtbl might be used by other devices behind
> other iommus which already map GPAs at a level which this
> iommu doesn't support...
> 
> shouldn't we simply fail it as another incompatible condition?

You are right. We can change it to this:

	if (domain->agaw > iommu->agaw)
		return -EINVAL;

> 
>> +
>> +	/* First level PGD (in GPA) must be supported by the second level. */
>> +	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
>> +		dev_err_ratelimited(dev,
>> +				    "Guest PGD %lx not supported, max %llx\n",
>> +				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
>> +		return -EINVAL;
>> +	}
> 
> I'm not sure how useful this check is. Even if the pgd is sane the
> lower level PTEs could include unsupported GPAs. If a guest really
> doesn't want to follow the GPA restriction which the vIOMMU reports,
> it can easily cause IOMMU faults in many ways.

You are right.

> Then why treat the pgd specially?

I have no memory about this check now. Yi, any thoughts?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device
  2023-05-24  7:22   ` Tian, Kevin
@ 2023-05-26  4:24     ` Baolu Lu
  0 siblings, 0 replies; 40+ messages in thread
From: Baolu Lu @ 2023-05-26  4:24 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 5/24/23 3:22 PM, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, May 11, 2023 10:51 PM
>>
>> +
>> +static int intel_nested_attach_dev(struct iommu_domain *domain,
>> +				   struct device *dev)
>> +{
>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	struct intel_iommu *iommu = info->iommu;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	if (info->domain)
>> +		device_block_translation(dev);
>> +
>> +	/* Is s2_domain compatible with this IOMMU? */
>> +	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
>> +	if (ret) {
>> +		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
>> +		return ret;
>> +	}
> 
> this also includes logic to trim higher page levels:
> 
> 	/*
> 	 * Knock out extra levels of page tables if necessary
> 	 */
> 	while (iommu->agaw < dmar_domain->agaw) {
> 		struct dma_pte *pte;
> 
> 		pte = dmar_domain->pgd;
> 		if (dma_pte_present(pte)) {
> 			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
> 			free_pgtable_page(pte);
> 		}
> 		dmar_domain->agaw--;
> 	}
> 
> What's the background of doing such truncation instead of simply
> failing the request?

This code has existed for a long time. I'm not sure if it's still reasonable
today.

> In any case it's probably fine before the domain includes any mapping
> but really unreasonable to apply it to an existing s2 when it's used as
> a parent.

But for the new nested translation, it is obviously unreasonable.

Let me revisit it.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-05-24  7:44   ` Tian, Kevin
@ 2023-05-26  4:28     ` Baolu Lu
  0 siblings, 0 replies; 40+ messages in thread
From: Baolu Lu @ 2023-05-26  4:28 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 5/24/23 3:44 PM, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, May 11, 2023 10:51 PM
>>
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> When remapping hardware is configured by system software in scalable mode
>> as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
>> it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
>> in first-stage page-table entries even when second-stage mappings indicate
>> that corresponding first-stage page-table is Read-Only.
>>
>> As a result, contents of pages designated by VMM as Read-Only can be
>> modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
>> address translation process due to DMAs issued by Guest.
>>
>> Disallow the nested translation when there are read-only pages in the
>> corresponding second-stage mappings. And, no read-only pages are allowed
>> to be configured in the second-stage table of a nested translation.
>> For the latter, an alternative is to disallow read-only mappings in
>> any stage-2 domain as long as it's ever been used as a parent. In this
>> way, we can simply replace the user counter with a flag.
>>
>> In concept, if the user understands this errata and does expect to
>> enable nested translation, it should never install any RO mapping
>> in stage-2 in the entire VM life cycle.
> 
> IMHO the alternative is reasonable and simpler. If the user decides to
> enable nesting it should keep the nesting-friendly configuration static,
> since whether nesting is enabled on a device depends on the viommu
> configuration (i.e. whether the guest attaches the device to the identity
> domain or a non-identity domain), and it's not good to break the nesting
> setup just because the host inadvertently adds a RO mapping to s2 in
> the window between the guest being detached/put back to the identity
> domain and then re-attached to an unmanaged domain.

Fair enough.

>>
>> +	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
>> +		spin_lock_irqsave(&domain->lock, flags);
>> +		if (domain->nested_users > 0) {
>> +			spin_unlock_irqrestore(&domain->lock, flags);
>> +			return -EINVAL;
>> +		}
>> +
> 
> this is worth a one-off warning. Same in the other path.

Sure.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-25 18:06   ` Alex Williamson
@ 2023-05-26 11:25     ` Tian, Kevin
  0 siblings, 0 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-05-26 11:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, joro, jgg, robin.murphy, baolu.lu, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, May 26, 2023 2:07 AM
> 
> On Wed, 24 May 2023 08:59:43 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, May 11, 2023 10:51 PM
> > >
> > > The first Intel platform supporting nested translation is Sapphire
> > > Rapids which, unfortunately, has a hardware errata [2] requiring special
> > > treatment. This errata happens when a stage-1 page table page (either
> > > level) is located in a stage-2 read-only region. In that case the IOMMU
> > > hardware may ignore the stage-2 RO permission and still set the A/D bit
> > > in stage-1 page table entries during page table walking.
> > >
> > > A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
> > > this errata to userspace. With that restriction the user should either
> > > disable nested translation to favor RO stage-2 mappings or ensure no
> > > RO stage-2 mapping to enable nested translation.
> > >
> > > Intel-iommu driver is armed with necessary checks to prevent such mix
> > > in patch10 of this series.
> > >
> > > Qemu currently does add RO mappings though. The vfio agent in Qemu
> > > simply maps all valid regions in the GPA address space which certainly
> > > includes RO regions e.g. vbios.
> > >
> > > In reality we don't know a usage relying on DMA reads from the BIOS
> > > region. Hence finding a way to allow user opt-out RO mappings in
> > > Qemu might be an acceptable tradeoff. But how to achieve it cleanly
> > > needs more discussion in Qemu community. For now we just hacked Qemu
> > > to test.
> > >
> >
> > Hi, Alex,
> >
> > Want to touch base on your thoughts about this errata before we
> > actually go to discuss how to handle it in Qemu.
> >
> > Overall it affects all Sapphire Rapids platforms. Fully disabling nested
> > translation in the kernel just for this rare vulnerability sounds like overkill.
> >
> > So we decide to enforce the exclusive check (RO in stage-2 vs. nesting)
> > in the kernel and expose the restriction to userspace so the VMM can
> > choose which one to enable based on its own requirement.
> >
> > At least this looks like a reasonable tradeoff to some proprietary VMMs
> > which never add RO mappings in stage-2 today.
> >
> > But we do want to get Qemu to support nested translation on those
> > platforms, as it is the widely-used reference VMM!
> >
> > Do you see any major oversight before pursuing such a change in Qemu,
> > e.g. having a way for the user to opt out of adding RO mappings in stage-2? 😊
> 
> I don't feel like I have enough info to know what common scenarios are
> going to make use of 2-stage and nested configurations and how likely a
> user is to need such an opt-out.  If it's likely that a user is going
> to encounter this configuration, an opt-out is at best a workaround.
> It's a significant support issue if a user needs to generate a failure
> in QEMU, notice and decipher any log messages that failure may have
> generated, and take action to introduce specific changes in their VM
> configuration to support a usage restriction.

Thanks. This is a good point.

> 
> For QEMU I might lean more towards an effort to better filter the
> mappings we create to avoid these read-only ranges that likely don't
> require DMA mappings anyway.

We thought about having intel-viommu register a discard memory
manager to filter out RO regions in case the kernel reports this errata.

Our original thought was that even with it we may still want to
explicitly let the user opt in, given this configuration doesn't match the
bare metal. But with your explanation doing so probably causes
more trouble than what it tries to achieve.

> 
> How much does this affect arbitrary userspace vfio drivers?  For
> example are there scenarios where running in a VM with a vIOMMU
> introduces nested support that's unknown to the user which now prevents
> this usage?  An example might be running an L2 guest with a version of
> QEMU that does create read-only mappings.  If necessary, how would lack
> of read-only mapping support be conveyed to those nested use cases?

To enable nested translation it's expected to have the guest use
stage-1 while the host uses stage-2. So the L0 QEMU will expose
a vIOMMU with only stage-1 capability to L1.

In that case it's perfectly fine to have RO mappings in stage-1, no
matter whether L1 further creates an L2 guest inside.

Then only L0 QEMU needs to care about this RO thing in stage-2.

In case L0 QEMU exposes a legacy vIOMMU which supports only stage-2,
nesting cannot be enabled. Instead it will fall back to the old
shadowing path, where RO mappings from the guest don't matter either.

Exposing a vIOMMU which supports both stage-1/stage-2/nesting
is another story. But I believe it's far from the point where this becomes
useful, and it's reasonable to just have L0 QEMU not support this
configuration before this errata is fixed. 😊

Thanks,
Kevin

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-24  8:59 ` [PATCH v3 00/10] Add Intel VT-d nested translation Tian, Kevin
  2023-05-25 18:06   ` Alex Williamson
@ 2023-05-29 18:43   ` Jason Gunthorpe
  2023-05-30  0:16     ` Alex Williamson
  2023-06-14  8:07     ` Tian, Kevin
  1 sibling, 2 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 18:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:

> At least this looks like a reasonable tradeoff to some proprietary VMMs
> which never add RO mappings in stage-2 today.

What is the reason for the RO anyhow?

Would it be so bad if it was DMA mapped as RW due to the errata?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
  2023-05-24  6:59   ` Tian, Kevin
  2023-05-25  2:28   ` Zhang, Tina
@ 2023-05-29 19:53   ` Jason Gunthorpe
  2 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 19:53 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On Thu, May 11, 2023 at 07:51:01AM -0700, Yi Liu wrote:
> This adds IOMMU_HWPT_TYPE_VTD_S1 for stage-1 hw_pagetable of Intel VT-d
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + *                               stage-1 page table info
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.
> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + *              subjected to the user-managed stage-1 page table.
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.
> + *
> + * In nested translation, the stage-1 page table is located in the address
> + * space defined by the corresponding stage-2 page table. Hence the
> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.
> + *
> + * The paging level of the stage-1 page table should be compatible with
> + * the hardware iommu. Otherwise, the allocation will fail.
> + */
> +struct iommu_hwpt_intel_vtd {
> +	__u64 flags;
> +	__u64 pgtbl_addr;

__aligned_u64

> +	__u32 pat;
> +	__u32 emt;
> +	__u32 addr_width;
> +	__u32 __reserved;
>  };
>  
>  /**
> @@ -391,6 +446,8 @@ enum iommu_hwpt_type {
>   * +------------------------------+-------------------------------------+-----------+
>   * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
>   * +------------------------------+-------------------------------------+-----------+
> + * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
> + * +------------------------------+-------------------------------------+-----------+

Please don't make ascii art tables.

Note beside the struct what enum it is for
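
For reference, a sketch of the struct with the alignment comment
applied (field names follow the posted patch; the final uAPI may
differ):

struct iommu_hwpt_intel_vtd {
	__u64 flags;
	__aligned_u64 pgtbl_addr;	/* guarantee 8-byte alignment in uAPI */
	__u32 pat;
	__u32 emt;
	__u32 addr_width;
	__u32 __reserved;
};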

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-05-25  2:28   ` Zhang, Tina
@ 2023-05-29 20:00     ` Jason Gunthorpe
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2023-05-29 20:00 UTC (permalink / raw)
  To: Zhang, Tina
  Cc: Liu, Yi L, joro, alex.williamson, Tian, Kevin, robin.murphy,
	baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On Thu, May 25, 2023 at 02:28:18AM +0000, Zhang, Tina wrote:

> > +struct iommu_hwpt_intel_vtd {
> > +	__u64 flags;
> > +	__u64 pgtbl_addr;
> > +	__u32 pat;
> > +	__u32 emt;

> Do we need the emt field as part of the stage-1 page table info?
> IIUC, according to vt-d spec, the emt field is in stage-2 page table
> entries. Since in nested mode we only expose the stage-1 page table to
> the guest vt-d driver, I'm curious how the guest vt-d driver populates
> this EMT?

Indeed. The EMT controls how the iommu HW parses memory that is
managed by the kernel - this simply should not be something that
userspace controls.

The S2 itself has to decide if it is populating the EMT bits in the
IOPTE and if so it could enable EMT. Does userspace need to be
involved here?

The trickier thing is that it feels like EMT and PAT would want to be
per-iova, and we don't have a means for that right now. (And even if we
did, how would qemu decide what to do?)

So probably drop it.

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-29 18:43   ` Jason Gunthorpe
@ 2023-05-30  0:16     ` Alex Williamson
  2023-05-30 16:42       ` Jason Gunthorpe
  2023-06-14  8:07     ` Tian, Kevin
  1 sibling, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2023-05-30  0:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, 29 May 2023 15:43:02 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:
> 
> > At least this looks a reasonable tradeoff to some proprietary VMMs
> > which never adds RO mappings in stage-2 today.  
> 
> What is the reason for the RO anyhow?
> 
> Would it be so bad if it was DMA mapped as RW due to the errata?

What if it's the zero page?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-30  0:16     ` Alex Williamson
@ 2023-05-30 16:42       ` Jason Gunthorpe
  0 siblings, 0 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2023-05-30 16:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Liu, Yi L, joro, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, May 29, 2023 at 06:16:44PM -0600, Alex Williamson wrote:
> On Mon, 29 May 2023 15:43:02 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:
> > 
> > > At least this looks a reasonable tradeoff to some proprietary VMMs
> > > which never adds RO mappings in stage-2 today.  
> > 
> > What is the reason for the RO anyhow?
> > 
> > Would it be so bad if it was DMA mapped as RW due to the errata?
> 
> What if it's the zero page?  Thanks,

GUP doesn't return the zero page if FOLL_WRITE is specified.
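
A minimal illustration of why (kernel-side sketch; error handling
elided): pinning with FOLL_WRITE forces copy-on-write, so the shared
zero page is never handed out for a writable mapping.

#include <linux/mm.h>

/* Pin one user page for writable DMA; never yields the zero page. */
static struct page *pin_one_page_writable(unsigned long uaddr)
{
	struct page *page;

	if (pin_user_pages_fast(uaddr, 1,
				FOLL_WRITE | FOLL_LONGTERM, &page) != 1)
		return NULL;
	return page;
}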

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-05-26  4:16     ` Baolu Lu
@ 2023-06-07  8:34       ` Liu, Yi L
  2023-06-08  3:32         ` Liu, Yi L
  2023-06-08  3:35       ` Liu, Yi L
  1 sibling, 1 reply; 40+ messages in thread
From: Liu, Yi L @ 2023-06-07  8:34 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, May 26, 2023 12:16 PM

> >> +
> >> +	/* First level PGD (in GPA) must be supported by the second level. */
> >> +	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
> >> +		dev_err_ratelimited(dev,
> >> +				    "Guest PGD %lx not supported, max %llx\n",
> >> +				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
> >> +		return -EINVAL;
> >> +	}
> >
> > I'm not sure how useful this check is. Even if the pgd is sane the
> > lower level PTEs could include unsupported GPA's. If a guest really
> > doesn't want to follow the GPA restriction which vIOMMU reports,
> > it can easily cause IOMMU fault in many ways.
> 
> You are right.
> 
> > Then why treating pgd specially?
> 
> I have no memory about this check for now. Yi, any thought?

I don’t think there is another special reason. We just wanted to ensure
the page table base address follows the GPA restriction. But as Kevin
mentioned, HW can detect it anyway. So I think this check can be dropped.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-06-07  8:34       ` Liu, Yi L
@ 2023-06-08  3:32         ` Liu, Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2023-06-08  3:32 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, June 7, 2023 4:34 PM
> 
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Friday, May 26, 2023 12:16 PM
> 
> > >> +
> > >> +	/* First level PGD (in GPA) must be supported by the second level. */
> > >> +	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
> > >> +		dev_err_ratelimited(dev,
> > >> +				    "Guest PGD %lx not supported, max %llx\n",
> > >> +				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
> > >> +		return -EINVAL;
> > >> +	}
> > >
> > > I'm not sure how useful this check is. Even if the pgd is sane the
> > > lower level PTEs could include unsupported GPA's. If a guest really
> > > doesn't want to follow the GPA restriction which vIOMMU reports,
> > > it can easily cause IOMMU fault in many ways.
> >
> > You are right.
> >
> > > Then why treating pgd specially?
> >
> > I have no memory about this check for now. Yi, any thought?
> 
> I don’t think there is another special reason. We just wanted to ensure
> the page table base address follows the GPA restriction. But as Kevin
> mentioned, HW can detect it anyway. So I think this check can be dropped.

Then we also need to remove the words below from the uapi header.

+ Hence the
+ * stage-1 page table base address value should not be higher than the
+ * maximum untranslated address of stage-2 page table.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-05-26  4:16     ` Baolu Lu
  2023-06-07  8:34       ` Liu, Yi L
@ 2023-06-08  3:35       ` Liu, Yi L
  2023-06-08  3:37         ` Baolu Lu
  1 sibling, 1 reply; 40+ messages in thread
From: Liu, Yi L @ 2023-06-08  3:35 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, May 26, 2023 12:16 PM
> On 5/24/23 3:16 PM, Tian, Kevin wrote:
> >> From: Yi Liu <yi.l.liu@intel.com>
> >> Sent: Thursday, May 11, 2023 10:51 PM
> >>
> >> +	/*
> >> +	 * Memory type is only applicable to devices inside processor coherent
> >> +	 * domain. Will add MTS support once coherent devices are available.
> >> +	 */
> >> +	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
> >> +		pr_warn_ratelimited("No memory type support %s\n",
> >> +				    iommu->name);
> >> +		return -EINVAL;
> >> +	}
> >
> > If it's unsupported why exposing them in the uAPI at this point?
> 
> Agreed. We can remove this flag for now.

So we should remove the flags below from the uapi as well, right?

+#define IOMMU_VTD_PGTBL_MTS_MASK	(IOMMU_VTD_PGTBL_CD | \
+					 IOMMU_VTD_PGTBL_EMTE | \
+					 IOMMU_VTD_PGTBL_PCD | \
+					 IOMMU_VTD_PGTBL_PWT)

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation
  2023-06-08  3:35       ` Liu, Yi L
@ 2023-06-08  3:37         ` Baolu Lu
  0 siblings, 0 replies; 40+ messages in thread
From: Baolu Lu @ 2023-06-08  3:37 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 6/8/23 11:35 AM, Liu, Yi L wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Friday, May 26, 2023 12:16 PM
>> On 5/24/23 3:16 PM, Tian, Kevin wrote:
>>>> From: Yi Liu<yi.l.liu@intel.com>
>>>> Sent: Thursday, May 11, 2023 10:51 PM
>>>>
>>>> +	/*
>>>> +	 * Memory type is only applicable to devices inside processor coherent
>>>> +	 * domain. Will add MTS support once coherent devices are available.
>>>> +	 */
>>>> +	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
>>>> +		pr_warn_ratelimited("No memory type support %s\n",
>>>> +				    iommu->name);
>>>> +		return -EINVAL;
>>>> +	}
>>> If it's unsupported why exposing them in the uAPI at this point?
>> Agreed. We can remove this flag for now.
> So we should remove the flags below from the uapi as well, right?
> 
> +#define IOMMU_VTD_PGTBL_MTS_MASK	(IOMMU_VTD_PGTBL_CD | \
> +					 IOMMU_VTD_PGTBL_EMTE | \
> +					 IOMMU_VTD_PGTBL_PCD | \
> +					 IOMMU_VTD_PGTBL_PWT)

I suppose so.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain
  2023-05-24  7:33   ` Tian, Kevin
@ 2023-06-08  7:14     ` Liu, Yi L
  2023-06-08  8:07       ` Baolu Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Liu, Yi L @ 2023-06-08  7:14 UTC (permalink / raw)
  To: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, May 24, 2023 3:34 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:51 PM
> >
> > This is needed as the stage-1 page table of the nested domain is
> > maintained outside the iommu subsystem and hence needs to support iotlb
> > flush requests.
> >
> > This adds the data structure for flushing iotlb for the nested domain
> > allocated with IOMMU_HWPT_TYPE_VTD_S1 type and the related callback
> > to accept iotlb flush request from IOMMUFD.
> >
> > This only exposes the interface for invalidating IOTLB, but not for
> > device-TLB as device-TLB invalidation will be covered automatically
> > in IOTLB invalidation if the affected device is ATS-capable.
> >
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Following how you split patches in the former part of the series, this
> should be split into three patches: one to introduce the uAPI changes,
> a second to export symbols, and a third to actually add the iotlb flush.

Will do.

> > +static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
> > +					      void *user_data)
> > +{
> > +	struct iommu_hwpt_invalidate_request_intel_vtd *req = user_data;
> > +	struct iommu_hwpt_invalidate_intel_vtd *inv_info = user_data;
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	unsigned int entry_size = inv_info->entry_size;
> > +	u64 uptr = inv_info->inv_data_uptr;
> > +	u64 nr_uptr = inv_info->entry_nr_uptr;
> > +	struct device_domain_info *info;
> > +	u32 entry_nr, index;
> > +	unsigned long flags;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!user_data))
> > +		return 0;
> 
> WARN_ON should lead to error returned.

Yes, or maybe just remove it; the caller should provide a valid pointer anyhow.

> > +
> > +	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > +		return -EFAULT;
> > +
> > +	if (!entry_nr)
> > +		return -EINVAL;
> 
> Having zero entries is not an error; it just means there is no work
> to do.
> 
> > +
> > +	for (index = 0; index < entry_nr; index++) {
> > +		ret = copy_struct_from_user(req, sizeof(*req),
> > +					    u64_to_user_ptr(uptr + index * entry_size),
> > +					    entry_size);
> > +		if (ret) {
> > +			pr_err_ratelimited("Failed to fetch invalidation request\n");
> > +			break;
> > +		}
> > +
> > +		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
> > +		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		spin_lock_irqsave(&dmar_domain->lock, flags);
> > +		list_for_each_entry(info, &dmar_domain->devices, link)
> > +			intel_nested_invalidate(info->dev, dmar_domain,
> > +						req->addr, req->npages);
> > +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> > +	}
> > +
> > +	if (ret && put_user(index, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > +		return -EFAULT;
> 
> You want to always update the nr regardless of success or failure.
> 
> > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> > index 39922f83ce34..b338b082950b 100644
> > --- a/drivers/iommu/iommufd/main.c
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -282,6 +282,12 @@ union ucmd_buffer {
> >  #ifdef CONFIG_IOMMUFD_TEST
> >  	struct iommu_test_cmd test;
> >  #endif
> > +	/*
> > +	 * hwpt_type specific structure used in the cache invalidation
> > +	 * path.
> > +	 */
> > +	struct iommu_hwpt_invalidate_intel_vtd vtd;
> > +	struct iommu_hwpt_invalidate_request_intel_vtd req_vtd;
> >  };
> 
> Can you add some explanation in the commit msg about why such
> vendor-specific structures must be put in the generic ucmd_buffer?
> 
> >
> > +/**
> > + * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
> 
> enum iommu_hwpt_vtd_s1_invalidate_flags
> 
> > + *                                              stage-1 page table cache
> > + *                                              invalidation
> > + * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
> > + *                           leaf PTE caching needs to be invalidated
> > + *                           and other paging structure caches can be
> > + *                           preserved.
> > + */
> 
> what about "Drain Reads" and "Drain Writes"? Is the user allowed/required
> to provide those hints?

Got all the other comments. As for these two hints, the two flags come
from the IOTLB invalidation descriptor. Per the description below,
hardware that supports nesting always performs the required drain
without software asking for it, so there appears to be no need to
define them in the uapi.

"Hardware implementation with Major Version 2 or higher (VER_REG),
always performs required drain without software explicitly requesting
a drain in IOTLB invalidation. This field is deprecated and hardware
will always report it as 1 to maintain backward compatibility with
software"

Regards,
Yi Liu

> > +
> > +/**
> > + * struct iommu_hwpt_invalidate_request_intel_vtd - Intel VT-d cache invalidation request
> 
> Here you put "intel_vtd" at the end of the name; let's follow the same
> order as the earlier definitions.
> 
> struct iommu_hwpt_vtd_s1_invalidate_desc
> 
> > + * @addr: The start address of the addresses to be invalidated.
> > + * @npages: Number of contiguous 4K pages to be invalidated.
> > + * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
> > + * @__reserved: Must be 0
> > + *
> > + * The Intel VT-d specific invalidation data for user-managed stage-1 cache
> > + * invalidation under nested translation. Userspace uses this structure to
> > + * tell host about the impacted caches after modifying the stage-1 page table.
> > + *
> > + * Invalidating all the caches related to the hw_pagetable by setting
> > + * @addr==0 and @npages==__u64(-1).
> > + */
> > +struct iommu_hwpt_invalidate_request_intel_vtd {
> > +	__u64 addr;
> > +	__u64 npages;
> > +	__u32 flags;
> > +	__u32 __reserved;
> > +};
> > +
> > +/**
> > + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
> 
> iommu_hwpt_vtd_s1_invalidate
> 
> > + * @flags: Must be 0
> > + * @entry_size: Size in bytes of each cache invalidation request
> > + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> > + *                 Kernel reads it to get the number of requests and
> > + *                 updates the buffer with the number of requests that
> > + *                 have been processed successfully. This pointer must
> > + *                 point to a __u32 type of memory location.
> > + * @inv_data_uptr: Pointer to the cache invalidation requests
> > + *
> > + * The Intel VT-d specific invalidation data for a set of cache invalidation
> > + * requests. Kernel loops the requests one-by-one and stops when failure
> > + * is encountered. The number of handled requests is reported to user by
> > + * writing the buffer pointed by @entry_nr_uptr.
> > + */
> > +struct iommu_hwpt_invalidate_intel_vtd {
> > +	__u32 flags;
> > +	__u32 entry_size;
> > +	__u64 entry_nr_uptr;
> > +	__u64 inv_data_uptr;
> > +};
> > +
> >  /**
> >   * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
> >   * @size: sizeof(struct iommu_hwpt_invalidate)
> > @@ -520,6 +577,8 @@ struct iommu_hw_info {
> >   *
> > +==============================+================================
> > ========+
> >   * | @hwpt_type                   |     Data structure in @data_uptr       |
> >   * +------------------------------+----------------------------------------+
> > + * | IOMMU_HWPT_TYPE_VTD_S1       | struct
> > iommu_hwpt_invalidate_intel_vtd |
> > + * +------------------------------+----------------------------------------+
> >   */
> >  struct iommu_hwpt_invalidate {
> >  	__u32 size;
> > --
> > 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain
  2023-06-08  7:14     ` Liu, Yi L
@ 2023-06-08  8:07       ` Baolu Lu
  2023-06-20  6:22         ` Liu, Yi L
  0 siblings, 1 reply; 40+ messages in thread
From: Baolu Lu @ 2023-06-08  8:07 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 2023/6/8 15:14, Liu, Yi L wrote:
>>> + *                                              stage-1 page table cache
>>> + *                                              invalidation
>>> + * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
>>> + *                           leaf PTE caching needs to be invalidated
>>> + *                           and other paging structure caches can be
>>> + *                           preserved.
>>> + */
>> what about "Drain Reads" and "Drain Writes"? Is the user allowed/required
>> to provide those hints?
> Got all the other comments. As for these two hints, the two flags come
> from the IOTLB invalidation descriptor. Per the description below,
> hardware that supports nesting always performs the required drain
> without software asking for it, so there appears to be no need to
> define them in the uapi.
> 
> "Hardware implementation with Major Version 2 or higher (VER_REG),
> always performs required drain without software explicitly requesting
> a drain in IOTLB invalidation. This field is deprecated and hardware
> will always report it as 1 to maintain backward compatibility with
> software"

Makes sense. Perhaps we can also remove the code below in
__iommu_flush_iotlb():

         /* Note: set drain read/write */
#if 0
         /*
          * This is probably to be super secure.. Looks like we can
          * ignore it without any impact.
          */
         if (cap_read_drain(iommu->cap))
                 val |= DMA_TLB_READ_DRAIN;
#endif
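
With that block gone, the drain handling there would reduce to the
write-drain part that exists today, roughly (a sketch; per the spec
quote above, hardware with VER_REG major version 2 or higher drains
unconditionally anyway):

         if (cap_write_drain(iommu->cap))
                 val |= DMA_TLB_WRITE_DRAIN;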

Best regards,
baolu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-05-29 18:43   ` Jason Gunthorpe
  2023-05-30  0:16     ` Alex Williamson
@ 2023-06-14  8:07     ` Tian, Kevin
  2023-06-14 11:52       ` Jason Gunthorpe
  1 sibling, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2023-06-14  8:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, May 30, 2023 2:43 AM
> 
> On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:
> 
> > At least this looks a reasonable tradeoff to some proprietary VMMs
> > which never adds RO mappings in stage-2 today.
> 
> What is the reason for the RO anyhow?

vfio simply follows the permissions in the CPU address space.

vBIOS regions are marked RO there, hence the RO permission is also
carried into the vfio mappings.

> 
> Would it be so bad if it was DMA mapped as RW due to the errata?
> 

Think of a scenario where the vbios memory is shared by multiple qemu
instances; RW would then allow a malicious VM to modify the shared
content, potentially attacking other VMs.

Skipping the mapping is safest in this regard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-06-14  8:07     ` Tian, Kevin
@ 2023-06-14 11:52       ` Jason Gunthorpe
  2023-06-16  2:29         ` Tian, Kevin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 11:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Jun 14, 2023 at 08:07:30AM +0000, Tian, Kevin wrote:

> Think of a scenario where the vbios memory is shared by multiple qemu
> instances; RW would then allow a malicious VM to modify the shared
> content, potentially attacking other VMs.

qemu would have to map the vbios as MAP_PRIVATE WRITE before the iommu
side could map it writable, so this is not a real worry.
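
Illustrative of the point (standard mmap(2) semantics; error handling
elided): with MAP_PRIVATE, writes land in a private copy-on-write page,
never in the shared file content, so pinning it writable for DMA cannot
corrupt what other instances see.

#include <sys/mman.h>

/* A private mapping: making it DMA-writable forces a COW copy. */
void *map_vbios_private(int fd, size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE, fd, 0);
}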

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 00/10] Add Intel VT-d nested translation
  2023-06-14 11:52       ` Jason Gunthorpe
@ 2023-06-16  2:29         ` Tian, Kevin
  0 siblings, 0 replies; 40+ messages in thread
From: Tian, Kevin @ 2023-06-16  2:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 7:53 PM
> 
> On Wed, Jun 14, 2023 at 08:07:30AM +0000, Tian, Kevin wrote:
> 
> > Think of a scenario where the vbios memory is shared by multiple qemu
> > instances; RW would then allow a malicious VM to modify the shared
> > content, potentially attacking other VMs.
> 
> qemu would have to map the vbios as MAP_PRIVATE WRITE before the iommu
> side could map it writable, so this is not a real worry.
> 

Makes sense.

But IMHO it's still safer to reduce the permission (RO->NP) than to
increase it (RO->RW) when faithfully emulating bare metal behavior is
impossible, especially when there is no real usage counting on it. 😊

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain
  2023-06-08  8:07       ` Baolu Lu
@ 2023-06-20  6:22         ` Liu, Yi L
  0 siblings, 0 replies; 40+ messages in thread
From: Liu, Yi L @ 2023-06-20  6:22 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, June 8, 2023 4:08 PM
> 
> On 2023/6/8 15:14, Liu, Yi L wrote:
> >>> + *                                              stage-1 page table cache
> >>> + *                                              invalidation
> >>> + * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
> >>> + *                           leaf PTE caching needs to be invalidated
> >>> + *                           and other paging structure caches can be
> >>> + *                           preserved.
> >>> + */
> >> what about "Drain Reads" and "Drain Writes"? Is the user allowed/required
> >> to provide those hints?
> > Got all the other comments. As for these two hints, the two flags come
> > from the IOTLB invalidation descriptor. Per the description below,
> > hardware that supports nesting always performs the required drain
> > without software asking for it, so there appears to be no need to
> > define them in the uapi.
> >
> > "Hardware implementation with Major Version 2 or higher (VER_REG),
> > always performs required drain without software explicitly requesting
> > a drain in IOTLB invalidation. This field is deprecated and hardware
> > will always report it as 1 to maintain backward compatibility with
> > software"
> 
> Makes sense. Perhaps we can also remove the code below in
> __iommu_flush_iotlb():
> 
>          /* Note: set drain read/write */
> #if 0
>          /*
>           * This is probably to be super secure.. Looks like we can
>           * ignore it without any impact.
>           */
>          if (cap_read_drain(iommu->cap))
>                  val |= DMA_TLB_READ_DRAIN;
> #endif

This seems to be dead code, but it has been there for a long time, since the commit below.

ba39592764ed20cee09aae5352e603a27bf56b0d

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-06-20  6:22 UTC | newest]

Thread overview: 40+ messages
2023-05-11 14:51 [PATCH v3 00/10] Add Intel VT-d nested translation Yi Liu
2023-05-11 14:51 ` [PATCH v3 01/10] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
2023-05-24  6:59   ` Tian, Kevin
2023-05-25  2:28   ` Zhang, Tina
2023-05-29 20:00     ` Jason Gunthorpe
2023-05-29 19:53   ` Jason Gunthorpe
2023-05-11 14:51 ` [PATCH v3 02/10] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
2023-05-24  7:02   ` Tian, Kevin
2023-05-26  2:56     ` Baolu Lu
2023-05-11 14:51 ` [PATCH v3 03/10] iommu/vt-d: Add helper for nested domain allocation Yi Liu
2023-05-11 14:51 ` [PATCH v3 04/10] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
2023-05-24  7:16   ` Tian, Kevin
2023-05-26  4:16     ` Baolu Lu
2023-06-07  8:34       ` Liu, Yi L
2023-06-08  3:32         ` Liu, Yi L
2023-06-08  3:35       ` Liu, Yi L
2023-06-08  3:37         ` Baolu Lu
2023-05-11 14:51 ` [PATCH v3 05/10] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
2023-05-11 14:51 ` [PATCH v3 06/10] iommu/vt-d: Set the nested domain to a device Yi Liu
2023-05-24  7:22   ` Tian, Kevin
2023-05-26  4:24     ` Baolu Lu
2023-05-11 14:51 ` [PATCH v3 07/10] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
2023-05-24  7:33   ` Tian, Kevin
2023-06-08  7:14     ` Liu, Yi L
2023-06-08  8:07       ` Baolu Lu
2023-06-20  6:22         ` Liu, Yi L
2023-05-11 14:51 ` [PATCH v3 08/10] iommu/vt-d: Add nested domain allocation Yi Liu
2023-05-11 14:51 ` [PATCH v3 09/10] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
2023-05-11 14:51 ` [PATCH v3 10/10] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
2023-05-24  7:44   ` Tian, Kevin
2023-05-26  4:28     ` Baolu Lu
2023-05-24  8:59 ` [PATCH v3 00/10] Add Intel VT-d nested translation Tian, Kevin
2023-05-25 18:06   ` Alex Williamson
2023-05-26 11:25     ` Tian, Kevin
2023-05-29 18:43   ` Jason Gunthorpe
2023-05-30  0:16     ` Alex Williamson
2023-05-30 16:42       ` Jason Gunthorpe
2023-06-14  8:07     ` Tian, Kevin
2023-06-14 11:52       ` Jason Gunthorpe
2023-06-16  2:29         ` Tian, Kevin
