* [PATCH v2 0/5] Add Intel VT-d nested translation
@ 2023-03-09  8:22 Yi Liu
  2023-03-09  8:22 ` [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d Yi Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

This adds Intel VT-d nested translation based on the IOMMUFD nesting
infrastructure. With the iommufd nesting infrastructure series [1], the
iommu core supports new ops to report iommu hardware information, allocate
domains with user data, and sync the stage-1 IOTLB. The data required in
these three paths is vendor-specific, so

1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_hw_info_vtd are
   defined to report iommu hardware information for Intel VT-d.
2) IOMMU_HWPT_TYPE_VTD_S1 is defined for the Intel VT-d stage-1 page
   table. It is used in both the stage-1 domain allocation and the IOTLB
   syncing paths. struct iommu_hwpt_intel_vtd is defined to pass user_data
   for the Intel VT-d stage-1 domain allocation, and
   struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for
   the Intel VT-d stage-1 IOTLB invalidation.

With the above IOMMUFD extensions, the Intel IOMMU driver implements the
three callbacks to support nested translation.
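
As an illustration only (this sketch is editorial and not part of the
series), userspace would fill the stage-1 allocation data defined in
patch 1 roughly as below. The hwpt_alloc() helper and guest_s1_pgd_gpa
are assumptions standing in for the generic IOMMU_HWPT_ALLOC ioctl
wrapper and the guest-provided page table address:

	#include <linux/iommufd.h>	/* uapi header as extended by patch 1 */

	/* hypothetical wrapper around ioctl(iommufd, IOMMU_HWPT_ALLOC, ...) */
	int hwpt_alloc(int iommufd, __u32 parent_hwpt_id, __u32 data_type,
		       const void *data, __u32 data_len, __u32 *out_hwpt_id);

	static int alloc_vtd_s1_hwpt(int iommufd, __u32 s2_hwpt_id,
				     __u64 guest_s1_pgd_gpa, __u32 *s1_hwpt_id)
	{
		struct iommu_hwpt_intel_vtd s1 = {
			.flags      = 0,
			.pgtbl_addr = guest_s1_pgd_gpa, /* GPA, must be covered by stage 2 */
			.addr_width = 48,               /* 4-level stage-1 paging */
		};

		/* data_type selects the VT-d stage-1 parsing added in patch 1 */
		return hwpt_alloc(iommufd, s2_hwpt_id, IOMMU_HWPT_TYPE_VTD_S1,
				  &s1, sizeof(s1), s1_hwpt_id);
	}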

Complete code can be found in [2]; the QEMU code can be found in [3].

base-commit: f01f1c95684dd18c15dd0e51b4fd6e796a0a2c0e

[1] https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.com/
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting

Change log:
v2:
 - The iommufd infrastructure is split out into a separate series.

v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/

Regards,
        Yi Liu

Lu Baolu (4):
  iommu/vt-d: Implement hw_info for iommu capability query
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain support

Yi Liu (1):
  iommufd: Add nesting related data structures for Intel VT-d

 drivers/iommu/intel/Makefile         |   2 +-
 drivers/iommu/intel/iommu.c          |  57 ++++++++---
 drivers/iommu/intel/iommu.h          |  51 ++++++++--
 drivers/iommu/intel/nested.c         | 143 +++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c          | 142 ++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h          |   2 +
 drivers/iommu/iommufd/hw_pagetable.c |   7 +-
 drivers/iommu/iommufd/main.c         |   5 +
 include/uapi/linux/iommufd.h         | 136 +++++++++++++++++++++++++
 9 files changed, 522 insertions(+), 23 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

-- 
2.34.1



* [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d
  2023-03-09  8:22 [PATCH v2 0/5] Add Intel VT-d nested translation Yi Liu
@ 2023-03-09  8:22 ` Yi Liu
  2023-03-15 13:50   ` Liu, Jingqi
  2023-03-20 13:49   ` Jason Gunthorpe
  2023-03-09  8:22 ` [PATCH v2 2/5] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

Add the following data structures for corresponding ioctls:
               iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
              iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
    iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE

Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
header and corresponding type/size arrays.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |   7 +-
 drivers/iommu/iommufd/main.c         |   5 +
 include/uapi/linux/iommufd.h         | 136 +++++++++++++++++++++++++++
 3 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 76ff228dfc1f..36e79db8a17d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -172,6 +172,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
  */
 static const size_t iommufd_hwpt_alloc_data_size[] = {
 	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
+	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_intel_vtd),
 };
 
 /*
@@ -180,6 +181,8 @@ static const size_t iommufd_hwpt_alloc_data_size[] = {
  */
 const u64 iommufd_hwpt_type_bitmaps[] =  {
 	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT) |
+					 BIT_ULL(IOMMU_HWPT_TYPE_VTD_S1),
 };
 
 /* Return true if type is supported, otherwise false */
@@ -324,7 +327,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
  * size of page table type specific invalidate_info, indexed by
  * enum iommu_hwpt_type.
  */
-static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+static const size_t iommufd_hwpt_invalidate_info_size[] = {
+	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_invalidate_intel_vtd),
+};
 
 int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 {
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 7ec3ceac01b3..514db4c26927 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -275,6 +275,11 @@ union ucmd_buffer {
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
+	/*
+	 * data_type specific structure used in the cache invalidation
+	 * path.
+	 */
+	struct iommu_hwpt_invalidate_intel_vtd vtd;
 };
 
 struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e2eff9c56ab3..2a6c326391b2 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -351,9 +351,64 @@ struct iommu_vfio_ioas {
 /**
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
+ * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
  */
 enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
+	IOMMU_HWPT_TYPE_VTD_S1,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
+ *				     table entry attributes
+ * @IOMMU_VTD_PGTBL_SRE: Supervisor request
+ * @IOMMU_VTD_PGTBL_EAFE: Extended access enable
+ * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
+ * @IOMMU_VTD_PGTBL_PWT: Page-level write through
+ * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
+ * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
+ * @IOMMU_VTD_PGTBL_WPE: Write protect enable
+ */
+enum iommu_hwpt_intel_vtd_flags {
+	IOMMU_VTD_PGTBL_SRE = 1 << 0,
+	IOMMU_VTD_PGTBL_EAFE = 1 << 1,
+	IOMMU_VTD_PGTBL_PCD = 1 << 2,
+	IOMMU_VTD_PGTBL_PWT = 1 << 3,
+	IOMMU_VTD_PGTBL_EMTE = 1 << 4,
+	IOMMU_VTD_PGTBL_CD = 1 << 5,
+	IOMMU_VTD_PGTBL_WPE = 1 << 6,
+	IOMMU_VTD_PGTBL_LAST = 1 << 7,
+};
+
+/**
+ * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
+ *                               stage-1 page table info
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
+ * @pgtbl_addr: The base address of the user-managed stage-1 page table.
+ * @pat: Page attribute table data to compute effective memory type
+ * @emt: Extended memory type
+ * @addr_width: The address width of the untranslated addresses that are
+ *              subjected to the user-managed stage-1 page table.
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific data for creating hw_pagetable to represent
+ * the user-managed stage-1 page table that is used in nested translation.
+ *
+ * In nested translation, the stage-1 page table locates in the address
+ * space that defined by the corresponding stage-2 page table. Hence the
+ * stage-1 page table base address value should not be higher than the
+ * maximum untranslated address of stage-2 page table.
+ *
+ * The paging level of the stage-1 page table should be compataible with
+ * the hardware iommu. Otherwise, the allocation would be failed.
+ */
+struct iommu_hwpt_intel_vtd {
+	__u64 flags;
+	__u64 pgtbl_addr;
+	__u32 pat;
+	__u32 emt;
+	__u32 addr_width;
+	__u32 __reserved;
 };
 
 /**
@@ -389,6 +444,8 @@ enum iommu_hwpt_type {
  * +------------------------------+-------------------------------------+-----------+
  * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
  * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -405,9 +462,32 @@ struct iommu_hwpt_alloc {
 
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
+ * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_DEFAULT,
+	IOMMU_HW_INFO_TYPE_INTEL_VTD,
+};
+
+/**
+ * struct iommu_hw_info_vtd - Intel VT-d hardware information
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ *
+ * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ *           section 11.4.2 Capability Register.
+ * @ecap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ *            section 11.4.3 Extended Capability Register.
+ *
+ * User needs to understand the Intel VT-d specification to decode the
+ * register value.
+ */
+struct iommu_hw_info_vtd {
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 cap_reg;
+	__aligned_u64 ecap_reg;
 };
 
 /**
@@ -457,6 +537,60 @@ struct iommu_hw_info {
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
 
+/**
+ * enum iommu_vtd_qi_granularity - Intel VT-d specific granularity of
+ *                                 queued invalidation
+ * @IOMMU_VTD_QI_GRAN_DOMAIN: domain-selective invalidation
+ * @IOMMU_VTD_QI_GRAN_ADDR: page-selective invalidation
+ */
+enum iommu_vtd_qi_granularity {
+	IOMMU_VTD_QI_GRAN_DOMAIN,
+	IOMMU_VTD_QI_GRAN_ADDR,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
+ *                                              stage-1 page table cache
+ *                                              invalidation
+ * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
+ *                           leaf PTE caching needs to be invalidated
+ *                           and other paging structure caches can be
+ *                           preserved.
+ */
+enum iommu_hwpt_intel_vtd_invalidate_flags {
+	IOMMU_VTD_QI_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
+ * @granularity: One of enum iommu_vtd_qi_granularity.
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
+ * @__reserved: Must be 0
+ * @addr: The start address of the addresses to be invalidated.
+ * @granule_size: Page/block size of the mapping in bytes. It is used to
+ *                compute the invalidation range togehter with @nb_granules.
+ * @nb_granules: Number of contiguous granules to be invalidated.
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation under nested translation. Userspace uses this structure to
+ * tell host about the impacted caches after modifying the stage-1 page table.
+ *
+ * @addr, @granule_size and @nb_granules are meaningful when
+ * @granularity==IOMMU_VTD_QI_GRAN_ADDR. Intel VT-d currently only supports
+ * 4kB page size, so @granule_size should be 4KB. @addr should be aligned
+ * with @granule_size * @nb_granules, otherwise invalidation won't take
+ * effect.
+ */
+struct iommu_hwpt_invalidate_intel_vtd {
+	__u8 granularity;
+	__u8 padding[7];
+	__u32 flags;
+	__u32 __reserved;
+	__u64 addr;
+	__u64 granule_size;
+	__u64 nb_granules;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
@@ -473,6 +607,8 @@ struct iommu_hw_info {
  * +==============================+========================================+
  * | @data_type                   |     Data structure in @data_uptr       |
  * +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_VTD_S1       | struct iommu_hwpt_invalidate_intel_vtd |
+ * +------------------------------+----------------------------------------+
  */
 struct iommu_hwpt_invalidate {
 	__u32 size;
-- 
2.34.1
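
Purely as an editorial illustration of the invalidation data above (the
hwpt_invalidate() helper and the iova variable are assumptions, standing
in for the IOMMU_HWPT_INVALIDATE ioctl wrapper): after the guest edits a
run of stage-1 PTEs, userspace describes the affected range with
granule_size and nb_granules, keeping addr aligned to their product:

	#include <linux/iommufd.h>

	/* hypothetical wrapper around ioctl(iommufd, IOMMU_HWPT_INVALIDATE, ...) */
	int hwpt_invalidate(int iommufd, __u32 hwpt_id, __u32 data_type,
			    const void *data, __u32 data_len);

	static int flush_s1_range(int iommufd, __u32 s1_hwpt_id, __u64 iova)
	{
		struct iommu_hwpt_invalidate_intel_vtd inv = {
			.granularity  = IOMMU_VTD_QI_GRAN_ADDR,
			.flags        = IOMMU_VTD_QI_FLAGS_LEAF, /* only leaf PTEs changed */
			.granule_size = 4096,  /* VT-d stage 1 currently supports 4 KiB only */
			.nb_granules  = 4,     /* 16 KiB total */
			/* must be aligned to granule_size * nb_granules, i.e. 16 KiB here */
			.addr         = iova & ~((__u64)4 * 4096 - 1),
		};

		return hwpt_invalidate(iommufd, s1_hwpt_id, IOMMU_HWPT_TYPE_VTD_S1,
				       &inv, sizeof(inv));
	}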



* [PATCH v2 2/5] iommu/vt-d: Implement hw_info for iommu capability query
  2023-03-09  8:22 [PATCH v2 0/5] Add Intel VT-d nested translation Yi Liu
  2023-03-09  8:22 ` [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d Yi Liu
@ 2023-03-09  8:22 ` Yi Liu
  2023-03-09  8:22 ` [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Lu Baolu <baolu.lu@linux.intel.com>

Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++
 drivers/iommu/intel/iommu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7c2f4bd33582..004406209793 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4780,8 +4780,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static void *intel_iommu_hw_info(struct device *dev, u32 *length)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_hw_info_vtd *vtd;
+
+	vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
+	if (!vtd)
+		return ERR_PTR(-ENOMEM);
+
+	vtd->cap_reg = iommu->cap;
+	vtd->ecap_reg = iommu->ecap;
+	*length = sizeof(*vtd);
+
+	return vtd;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
+	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
@@ -4794,6 +4812,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
+	.driver_type		= IOMMU_HW_INFO_TYPE_INTEL_VTD,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.page_response		= intel_svm_page_response,
 #endif
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d6df3b865812..9871de2170eb 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -23,6 +23,7 @@
 #include <linux/bitfield.h>
 #include <linux/xarray.h>
 #include <linux/perf_event.h>
+#include <uapi/linux/iommufd.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
-- 
2.34.1



* [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-03-09  8:22 [PATCH v2 0/5] Add Intel VT-d nested translation Yi Liu
  2023-03-09  8:22 ` [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d Yi Liu
  2023-03-09  8:22 ` [PATCH v2 2/5] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
@ 2023-03-09  8:22 ` Yi Liu
  2023-03-20 13:54   ` Jason Gunthorpe
  2023-03-09  8:22 ` [PATCH v2 4/5] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
  2023-03-09  8:22 ` [PATCH v2 5/5] iommu/vt-d: Add nested domain support Yi Liu
  4 siblings, 1 reply; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Lu Baolu <baolu.lu@linux.intel.com>

The nested domain fields are exclusive to those used for a DMA
remapping domain. Use a union to avoid memory waste.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 9871de2170eb..9446e17dd202 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -599,15 +599,38 @@ struct dmar_domain {
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 
-	struct dma_pte	*pgd;		/* virtual address */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
 	int		iommu_superpage;/* Level of superpages supported:
 					   0 == 4KiB (no superpages), 1 == 2MiB,
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
-	u64		max_addr;	/* maximum mapped address */
+	union {
+		/* DMA remapping domain */
+		struct {
+			/* virtual address */
+			struct dma_pte	*pgd;
+			/* max guest address width */
+			int		gaw;
+			/*
+			 * adjusted guest address width:
+			 *   0: level 2 30-bit
+			 *   1: level 3 39-bit
+			 *   2: level 4 48-bit
+			 *   3: level 5 57-bit
+			 */
+			int		agaw;
+			/* maximum mapped address */
+			u64		max_addr;
+		};
+
+		/* Nested user domain */
+		struct {
+			/* 2-level page table the user domain nested */
+			struct dmar_domain *s2_domain;
+			/* user page table pointer (in GPA) */
+			unsigned long s1_pgtbl;
+			/* page table attributes */
+			struct iommu_hwpt_intel_vtd s1_cfg;
+		};
+	};
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
-- 
2.34.1



* [PATCH v2 4/5] iommu/vt-d: Add helper to setup pasid nested translation
  2023-03-09  8:22 [PATCH v2 0/5] Add Intel VT-d nested translation Yi Liu
                   ` (2 preceding siblings ...)
  2023-03-09  8:22 ` [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-03-09  8:22 ` Yi Liu
  2023-03-09  8:22 ` [PATCH v2 5/5] iommu/vt-d: Add nested domain support Yi Liu
  4 siblings, 0 replies; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. An EINVAL error is
returned if any of the configurations is not compatible with the hardware
capabilities. The caller can retry with another compatible user domain.
The encoding of the fields of each PASID entry is defined in section 9.6
of the VT-d spec.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/pasid.c | 142 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   2 +
 2 files changed, 144 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 633e0a4a01e7..2e8a912b1499 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -21,6 +21,11 @@
 #include "iommu.h"
 #include "pasid.h"
 
+#define IOMMU_VTD_PGTBL_MTS_MASK	(IOMMU_VTD_PGTBL_CD | \
+					 IOMMU_VTD_PGTBL_EMTE | \
+					 IOMMU_VTD_PGTBL_PCD | \
+					 IOMMU_VTD_PGTBL_PWT)
+
 /*
  * Intel IOMMU system wide PASID name space:
  */
@@ -411,6 +416,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -756,3 +770,131 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for guest shared virtual address. In this case, the
+ * first level page tables are used for GVA-GPA translation in the guest,
+ * second level page tables are used for GPA-HPA translation.
+ *
+ * @iommu:      IOMMU which the device belong to
+ * @dev:        Device to be set up for translation
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @domain:     User domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_intel_vtd *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+	int agaw;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err_ratelimited("%s: No nested translation support\n",
+				   iommu->name);
+		return -ENODEV;
+	}
+
+	/*
+	 * Sanity checking performed by caller to make sure address width
+	 * matching in two dimensions: CPU vs. IOMMU, guest vs. host.
+	 */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+#ifdef CONFIG_X86
+	case ADDR_WIDTH_5LEVEL:
+		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+		    !cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+#endif
+	default:
+		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	/*
+	 * Memory type is only applicable to devices inside processor coherent
+	 * domain. Will add MTS support once coherent devices are available.
+	 */
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
+		pr_warn_ratelimited("No memory type support %s\n",
+				    iommu->name);
+		return -EINVAL;
+	}
+
+	agaw = iommu_skip_agaw(s2_domain, iommu, &pgd);
+	if (agaw < 0) {
+		dev_err_ratelimited(dev, "Invalid domain page table\n");
+		return -EINVAL;
+	}
+
+	/* First level PGD (in GPA) must be supported by the second level. */
+	if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
+		dev_err_ratelimited(dev,
+				    "Guest PGD %lx not supported, max %llx\n",
+				    (uintptr_t)s1_gpgd, s2_domain->max_addr);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_PGTBL_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE)
+		pasid_set_eafe(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 20c54e50f533..2a72bbc79532 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -118,6 +118,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
-- 
2.34.1
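
An editorial aside on pasid_set_eafe() above, for readers cross-checking
the bit arithmetic against the spec: PASID table entries are stored as an
array of 64-bit words (pe->val[]), so spec bit 135 lands in word
135 / 64 = 2 at bit 135 % 64 = 7, which is why the helper sets bit 7 of
pe->val[2]. A trivial self-check:

	#include <assert.h>

	int main(void)
	{
		unsigned int eafe_bit = 135;	/* EAFE bit of a scalable-mode PASID entry */

		assert(eafe_bit / 64 == 2);	/* pe->val[2] */
		assert(eafe_bit % 64 == 7);	/* 1 << 7 */
		return 0;
	}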



* [PATCH v2 5/5] iommu/vt-d: Add nested domain support
  2023-03-09  8:22 [PATCH v2 0/5] Add Intel VT-d nested translation Yi Liu
                   ` (3 preceding siblings ...)
  2023-03-09  8:22 ` [PATCH v2 4/5] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-03-09  8:22 ` Yi Liu
  2023-03-13 13:51   ` Liu, Jingqi
       [not found]   ` <bf434709-4bb3-a7bd-4b5a-9f495bd3a104@intel.com>
  4 siblings, 2 replies; 14+ messages in thread
From: Yi Liu @ 2023-03-09  8:22 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds nested domain support to the Intel IOMMU driver. It allows
allocating and freeing a nested domain, attaching the nested domain to a
device, and synchronizing the caches when the userspace-managed page
tables are updated.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  38 ++++++----
 drivers/iommu/intel/iommu.h  |  15 ++++
 drivers/iommu/intel/nested.c | 143 +++++++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+), 16 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 7af3b8a4f2a0..5dabf081a779 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 004406209793..89cc7095afed 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -277,7 +277,6 @@ static LIST_HEAD(dmar_satc_units);
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
 
-static void device_block_translation(struct device *dev);
 static void intel_iommu_domain_free(struct iommu_domain *domain);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -555,7 +554,7 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 }
 
 /* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
@@ -1498,10 +1497,10 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
 
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
-				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
-				  int ih, int map)
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map)
 {
 	unsigned int aligned_pages = __roundup_pow_of_two(pages);
 	unsigned int mask = ilog2(aligned_pages);
@@ -1573,7 +1572,7 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
 		iommu_flush_write_buffer(iommu);
 }
 
-static void intel_flush_iotlb_all(struct iommu_domain *domain)
+void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct iommu_domain_info *info;
@@ -1765,8 +1764,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 	return domain;
 }
 
-static int domain_attach_iommu(struct dmar_domain *domain,
-			       struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
 	unsigned long ndomains;
@@ -1815,8 +1813,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 	return ret;
 }
 
-static void domain_detach_iommu(struct dmar_domain *domain,
-				struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info;
 
@@ -4103,7 +4100,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
  * all DMA requests without PASID from the device are blocked. If the page
  * table has been set, clean up the data structures.
  */
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
@@ -4204,14 +4201,24 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	return NULL;
 }
 
+static struct iommu_domain *
+intel_iommu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
+			      const void *user_data)
+{
+	if (parent)
+		return intel_nested_domain_alloc(parent, user_data);
+	else
+		return iommu_domain_alloc(dev->bus);
+}
+
 static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
 	if (domain != &si_domain->domain && domain != &blocking_domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-static int prepare_domain_attach_device(struct iommu_domain *domain,
-					struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu;
@@ -4450,7 +4457,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)
 						     PASID_RID2PASID);
 }
 
-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	unsigned long flags;
@@ -4801,6 +4808,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
+	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
 	.release_device		= intel_iommu_release_device,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 9446e17dd202..661fb5050e2d 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -848,6 +848,19 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 
 int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		   unsigned int count, unsigned long options);
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev);
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map);
+void intel_flush_iotlb_all(struct iommu_domain *domain);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
 /*
  * Options used in qi_submit_sync:
  * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
@@ -860,6 +873,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
 void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const void *data);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 extern void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
new file mode 100644
index 000000000000..64441d11a84d
--- /dev/null
+++ b/drivers/iommu/intel/nested.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * nested.c - nested mode translation support
+ *
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ *         Jacob Pan <jacob.jun.pan@linux.intel.com>
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+
+#include <linux/iommu.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
+
+#include "iommu.h"
+#include "pasid.h"
+
+static int intel_nested_attach_dev(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long flags;
+	int ret = 0;
+
+	if (info->domain)
+		device_block_translation(dev);
+
+	/* Is s2_domain compatible with this IOMMU? */
+	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+	if (ret) {
+		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
+		return ret;
+	}
+
+	ret = domain_attach_iommu(dmar_domain, iommu);
+	if (ret) {
+		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
+		return ret;
+	}
+
+	ret = intel_pasid_setup_nested(iommu, dev,
+				       PASID_RID2PASID, dmar_domain);
+	if (ret) {
+		domain_detach_iommu(dmar_domain, iommu);
+		dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
+		return ret;
+	}
+
+	info->domain = dmar_domain;
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&info->link, &dmar_domain->devices);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	domain_update_iommu_cap(dmar_domain);
+
+	return 0;
+}
+
+static void intel_nested_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_dmar_domain(domain));
+}
+
+static void intel_nested_invalidate(struct device *dev,
+				    struct dmar_domain *domain,
+				    void *user_data)
+{
+	struct iommu_hwpt_invalidate_intel_vtd *inv_info = user_data;
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+
+	if (WARN_ON(!user_data))
+		return;
+
+	switch (inv_info->granularity) {
+	case IOMMU_VTD_QI_GRAN_ADDR:
+		if (inv_info->granule_size != VTD_PAGE_SIZE ||
+		    !IS_ALIGNED(inv_info->addr, VTD_PAGE_SIZE)) {
+			dev_err_ratelimited(dev, "Invalid invalidation address 0x%llx\n",
+					    inv_info->addr);
+			return;
+		}
+
+		iommu_flush_iotlb_psi(iommu, domain,
+				      inv_info->addr >> VTD_PAGE_SHIFT,
+				      inv_info->nb_granules, 1, 0);
+		break;
+	case IOMMU_VTD_QI_GRAN_DOMAIN:
+		intel_flush_iotlb_all(&domain->domain);
+		break;
+	default:
+		dev_err_ratelimited(dev, "Unsupported IOMMU invalidation type %d\n",
+				    inv_info->granularity);
+		break;
+	}
+}
+
+static void intel_nested_cache_invalidate_user(struct iommu_domain *domain,
+					       void *user_data)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_for_each_entry(info, &dmar_domain->devices, link)
+		intel_nested_invalidate(info->dev, dmar_domain,
+					user_data);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+}
+
+static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.attach_dev		= intel_nested_attach_dev,
+	.cache_invalidate_user	= intel_nested_cache_invalidate_user,
+	.free			= intel_nested_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const void *user_data)
+{
+	const struct iommu_hwpt_intel_vtd *vtd = user_data;
+	struct dmar_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
+	if (!domain)
+		return NULL;
+
+	domain->use_first_level = true;
+	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s1_pgtbl = vtd->pgtbl_addr;
+	domain->s1_cfg = *vtd;
+	domain->domain.ops = &intel_nested_domain_ops;
+	domain->domain.type = IOMMU_DOMAIN_NESTED;
+	INIT_LIST_HEAD(&domain->devices);
+	spin_lock_init(&domain->lock);
+	xa_init(&domain->iommu_array);
+
+	return &domain->domain;
+}
-- 
2.34.1



* Re: [PATCH v2 5/5] iommu/vt-d: Add nested domain support
  2023-03-09  8:22 ` [PATCH v2 5/5] iommu/vt-d: Add nested domain support Yi Liu
@ 2023-03-13 13:51   ` Liu, Jingqi
       [not found]   ` <bf434709-4bb3-a7bd-4b5a-9f495bd3a104@intel.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Liu, Jingqi @ 2023-03-13 13:51 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Jacob Pan


On 3/9/2023 4:22 PM, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
>
> This adds nested domain support to the Intel IOMMU driver. It allows
> allocating and freeing a nested domain, attaching the nested domain to a
> device, and synchronizing the caches when the userspace-managed page
> tables are updated.
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/Makefile |   2 +-
>   drivers/iommu/intel/iommu.c  |  38 ++++++----
>   drivers/iommu/intel/iommu.h  |  15 ++++
>   drivers/iommu/intel/nested.c | 143 +++++++++++++++++++++++++++++++++++
>   4 files changed, 182 insertions(+), 16 deletions(-)
>   create mode 100644 drivers/iommu/intel/nested.c
[...]

> +
> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
> +					       const void *user_data)
> +{
> +	const struct iommu_hwpt_intel_vtd *vtd = user_data;
> +	struct dmar_domain *domain;
> +

Would it be better to add the following check?

     	if (WARN_ON(!user_data))
		return NULL;

> +	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
> +	if (!domain)
> +		return NULL;
> +
> +	domain->use_first_level = true;
> +	domain->s2_domain = to_dmar_domain(s2_domain);
> +	domain->s1_pgtbl = vtd->pgtbl_addr;
> +	domain->s1_cfg = *vtd;
> +	domain->domain.ops = &intel_nested_domain_ops;
> +	domain->domain.type = IOMMU_DOMAIN_NESTED;
> +	INIT_LIST_HEAD(&domain->devices);
> +	spin_lock_init(&domain->lock);
> +	xa_init(&domain->iommu_array);
> +
> +	return &domain->domain;
> +}

Thanks,

Jingqi




* Re: [PATCH v2 5/5] iommu/vt-d: Add nested domain support
       [not found]   ` <bf434709-4bb3-a7bd-4b5a-9f495bd3a104@intel.com>
@ 2023-03-14  1:21     ` Baolu Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Baolu Lu @ 2023-03-14  1:21 UTC (permalink / raw)
  To: Liu, Jingqi, Yi Liu, joro, alex.williamson, jgg, kevin.tian,
	robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Jacob Pan

On 3/13/23 5:11 PM, Liu, Jingqi wrote:
>> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
>> +					       const void *user_data)
>> +{
>> +	const struct iommu_hwpt_intel_vtd *vtd = user_data;
>> +	struct dmar_domain *domain;
>> +
> 
> Would it be better to add the following check?
> 
> 	if (WARN_ON(!user_data))
> 		return NULL;

The iommufd has already done the sanity check. Considering that this
callback is only for iommufd purposes, the individual driver has no need
to check it.

Best regards,
baolu


* Re: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d
  2023-03-09  8:22 ` [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d Yi Liu
@ 2023-03-15 13:50   ` Liu, Jingqi
  2023-03-21  6:31     ` Liu, Yi L
  2023-03-20 13:49   ` Jason Gunthorpe
  1 sibling, 1 reply; 14+ messages in thread
From: Liu, Jingqi @ 2023-03-15 13:50 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest


On 3/9/2023 4:22 PM, Yi Liu wrote:
> Add the following data structures for corresponding ioctls:
>                 iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
>                iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
>      iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
>
> Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
> header and corresponding type/size arrays.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c |   7 +-
>   drivers/iommu/iommufd/main.c         |   5 +
>   include/uapi/linux/iommufd.h         | 136 +++++++++++++++++++++++++++
>   3 files changed, 147 insertions(+), 1 deletion(-)
>
[...]

> +
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + *                               stage-1 page table info
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.
> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + *              subjected to the user-managed stage-1 page table.
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.
> + *
> + * In nested translation, the stage-1 page table locates in the address
> + * space that defined by the corresponding stage-2 page table. Hence the
> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.
> + *
> + * The paging level of the stage-1 page table should be compataible with

s/compataible/compatible

> + * the hardware iommu. Otherwise, the allocation would be failed.
> + */
> +struct iommu_hwpt_intel_vtd {
> +	__u64 flags;
> +	__u64 pgtbl_addr;
> +	__u32 pat;
> +	__u32 emt;
> +	__u32 addr_width;
> +	__u32 __reserved;
>   };
>   

[...]

> +
> +/**
> + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
> + * @granularity: One of enum iommu_vtd_qi_granularity.
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
> + * @__reserved: Must be 0
> + * @addr: The start address of the addresses to be invalidated.
> + * @granule_size: Page/block size of the mapping in bytes. It is used to
> + *                compute the invalidation range togehter with @nb_granules.

s/togehter/together

Thanks,
Jingqi



* Re: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d
  2023-03-09  8:22 ` [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d Yi Liu
  2023-03-15 13:50   ` Liu, Jingqi
@ 2023-03-20 13:49   ` Jason Gunthorpe
  2023-03-21  6:06     ` Liu, Yi L
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 13:49 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:

> +struct iommu_hwpt_invalidate_intel_vtd {
> +	__u8 granularity;
> +	__u8 padding[7];
> +	__u32 flags;
> +	__u32 __reserved;
> +	__u64 addr;
> +	__u64 granule_size;
> +	__u64 nb_granules;
> +};

Is there a reason this has such a weird layout? Put the granularity in
the __reserved slot?

Consider the discussion on ARM if you prefer to use the native HW
command structure instead?

Jason
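
For illustration, the repacking suggested above might look roughly like
the sketch below (this is an editorial guess at the intent, not a layout
from the thread):

	/* granularity widened and moved into the former __reserved slot */
	struct iommu_hwpt_invalidate_intel_vtd {
		__u32 flags;
		__u32 granularity;
		__u64 addr;
		__u64 granule_size;
		__u64 nb_granules;
	};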


* Re: [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-03-09  8:22 ` [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-03-20 13:54   ` Jason Gunthorpe
  2023-03-21  1:58     ` Baolu Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 13:54 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 12:22:05AM -0800, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The nested domain fields are exclusive to those used for a DMA
> remapping domain. Use a union to avoid memory waste.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)

Using unions like this often devolves into a mess.

You'd be better to have more structures

struct intel_iommu_domain {
   struct iommu_domain domain;
   [general fields about attachment]
};

struct intel_iopte_domain {
    struct intel_iommu_domain domain;
    [stuff describing the io page table data, pgd, format, etc]
};

strut intel_s1_domain {
     struct intel_iommu_domain domain;
     struct dmar_domain *s2_domain;
     /* user page table pointer (in GPA) */
     unsigned long s1_pgtbl;
     /* page table attributes */
     struct iommu_hwpt_intel_vtd s1_cfg;
};
static_assert(offset_of(struct intel_s1_domain, domain.domain) ==
              offset_of(struct intel_iommu_domain, domain));

The per-domain ops allow to make this work sensibly

Jason
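
As a small editorial illustration of the last point, with separate
structures the nested-domain ops would recover their specialized type via
the usual container_of() pattern (names follow the sketch above and are
otherwise assumptions):

	static inline struct intel_s1_domain *
	to_s1_domain(struct iommu_domain *domain)
	{
		return container_of(domain, struct intel_s1_domain, domain.domain);
	}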


* Re: [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-03-20 13:54   ` Jason Gunthorpe
@ 2023-03-21  1:58     ` Baolu Lu
  0 siblings, 0 replies; 14+ messages in thread
From: Baolu Lu @ 2023-03-21  1:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Yi Liu
  Cc: baolu.lu, joro, alex.williamson, kevin.tian, robin.murphy,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On 3/20/23 9:54 PM, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:22:05AM -0800, Yi Liu wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> The nested domain fields are exclusive to those used for a DMA
>> remapping domain. Use a union to avoid memory waste.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
>>   1 file changed, 29 insertions(+), 6 deletions(-)
> 
> Using unions like this often devolves into a mess.
> 
> You'd be better to have more structures
> 
> struct intel_iommu_domain {
>     struct iommu_domain domain;
>     [general fields about attachment]
> };
> 
> struct intel_iopte_domain {
>      struct intel_iommu_domain domain;
>      [stuff describing the io page table data, pgd, format, etc]
> };
> 
> strut intel_s1_domain {
>       struct intel_iommu_domain domain;
>       struct dmar_domain *s2_domain;
>       /* user page table pointer (in GPA) */
>       unsigned long s1_pgtbl;
>       /* page table attributes */
>       struct iommu_hwpt_intel_vtd s1_cfg;
> };
> static_assert(offset_of(struct intel_s1_domain, domain.domain) ==
>                offset_of(struct intel_iommu_domain, domain));
> 
> The per-domain ops allow to make this work sensibly

Yes. This will make the data structures clearer.

However, this will lead to significant code changes. I think it would be
more appropriate to put it in a separate refactoring series later.

Best regards,
baolu


* RE: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d
  2023-03-20 13:49   ` Jason Gunthorpe
@ 2023-03-21  6:06     ` Liu, Yi L
  0 siblings, 0 replies; 14+ messages in thread
From: Liu, Yi L @ 2023-03-21  6:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, March 20, 2023 9:50 PM
> 
> On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:
> 
> > +struct iommu_hwpt_invalidate_intel_vtd {
> > +	__u8 granularity;
> > +	__u8 padding[7];
> > +	__u32 flags;
> > +	__u32 __reserved;
> > +	__u64 addr;
> > +	__u64 granule_size;
> > +	__u64 nb_granules;
> > +};
> 
> Is there a reason this has such a weird layout? Put the granularity in
> the __reserved slot?

No special reason. This layout was from the previous merged version.
Will modify it as you suggested.

> Consider the discussion on ARM if you prefer to use the native HW
> command structure instead?

Yes, will think about it. At least granule_size and nb_granules are not
necessary. They were added in the previous abstracted invalidation
uapi structure.

Regards,
Yi Liu


* RE: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d
  2023-03-15 13:50   ` Liu, Jingqi
@ 2023-03-21  6:31     ` Liu, Yi L
  0 siblings, 0 replies; 14+ messages in thread
From: Liu, Yi L @ 2023-03-21  6:31 UTC (permalink / raw)
  To: Liu, Jingqi, joro, alex.williamson, jgg, Tian, Kevin,
	robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Liu, Jingqi <jingqi.liu@intel.com>
> Sent: Wednesday, March 15, 2023 9:51 PM
> 
> 
> On 3/9/2023 4:22 PM, Yi Liu wrote:
> > Add the following data structures for corresponding ioctls:
> >                 iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
> >                iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
> >      iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
> >
> > Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and
> IOMMU_HWPT_TYPE_VTD_S1 to the
> > header and corresponding type/size arrays.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/hw_pagetable.c |   7 +-
> >   drivers/iommu/iommufd/main.c         |   5 +
> >   include/uapi/linux/iommufd.h         | 136
> +++++++++++++++++++++++++++
> >   3 files changed, 147 insertions(+), 1 deletion(-)
> >
> [...]
> 
> > +
> > +/**
> > + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> > + *                               stage-1 page table info
> > + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> > + * @pgtbl_addr: The base address of the user-managed stage-1 page
> table.
> > + * @pat: Page attribute table data to compute effective memory type
> > + * @emt: Extended memory type
> > + * @addr_width: The address width of the untranslated addresses that
> are
> > + *              subjected to the user-managed stage-1 page table.
> > + * @__reserved: Must be 0
> > + *
> > + * The Intel VT-d specific data for creating hw_pagetable to represent
> > + * the user-managed stage-1 page table that is used in nested translation.
> > + *
> > + * In nested translation, the stage-1 page table locates in the address
> > + * space that defined by the corresponding stage-2 page table. Hence
> the
> > + * stage-1 page table base address value should not be higher than the
> > + * maximum untranslated address of stage-2 page table.
> > + *
> > + * The paging level of the stage-1 page table should be compataible with
> 
> s/compataible/compatible
> 
> > + * the hardware iommu. Otherwise, the allocation would be failed.
> > + */
> > +struct iommu_hwpt_intel_vtd {
> > +	__u64 flags;
> > +	__u64 pgtbl_addr;
> > +	__u32 pat;
> > +	__u32 emt;
> > +	__u32 addr_width;
> > +	__u32 __reserved;
> >   };
> >
> 
> [...]
> 
> > +
> > +/**
> > + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache
> invalidation info
> > + * @granularity: One of enum iommu_vtd_qi_granularity.
> > + * @flags: Combination of enum
> iommu_hwpt_intel_vtd_invalidate_flags
> > + * @__reserved: Must be 0
> > + * @addr: The start address of the addresses to be invalidated.
> > + * @granule_size: Page/block size of the mapping in bytes. It is used to
> > + *                compute the invalidation range togehter with @nb_granules.
> 
> s/togehter/together
> 

All above received. Thanks.

Regards,
Yi Liu

