linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/12] Add Intel VT-d nested translation
@ 2023-07-24 11:13 Yi Liu
  2023-07-24 11:13 ` [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
                   ` (11 more replies)
  0 siblings, 12 replies; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This adds Intel VT-d nested translation support based on the IOMMUFD nesting
infrastructure. With the iommufd nesting infrastructure series [1], the iommu
core supports new ops to report iommu hardware information, allocate domains
with user data, and invalidate the stage-1 IOTLB when mappings in the stage-1
page table change. The data required in these three paths is vendor-specific,
so

1) IOMMU_HWPT_TYPE_VTD_S1 is defined for the Intel VT-d stage-1 page
   table; it will be used in the stage-1 domain allocation and IOTLB
   syncing paths. struct iommu_hwpt_vtd_s1 is defined to pass user_data
   for Intel VT-d stage-1 domain allocation, and
   struct iommu_hwpt_vtd_s1_invalidate is defined to pass the data for
   Intel VT-d stage-1 IOTLB invalidation.
2) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_hw_info_vtd are defined
   to report iommu hardware information for Intel VT-d.

With the above IOMMUFD extensions, the Intel iommu driver implements these
three paths to support nested translation.

The first Intel platform supporting nested translation is Sapphire
Rapids which, unfortunately, has a hardware erratum [2] requiring special
treatment. The erratum occurs when a stage-1 page table page (at any
level) is located in a stage-2 read-only region. In that case the IOMMU
hardware may ignore the stage-2 RO permission and still set the A/D bits
in stage-1 page table entries during the page table walk.

A flag, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17, is introduced to report
this erratum to userspace. With that restriction the user should either
disable nested translation in favor of RO stage-2 mappings, or ensure
there are no RO stage-2 mappings in order to enable nested translation.

The intel-iommu driver is armed with the necessary checks to prevent such
a mix in patch 12 of this series.

QEMU currently does add RO mappings, though. The vfio agent in QEMU
simply maps all valid regions in the GPA address space, which certainly
includes RO regions, e.g. the vBIOS.

In reality we are not aware of any usage that relies on DMA reads from the
BIOS region. Hence, finding a way to skip RO regions in QEMU (e.g. via a
discard manager) might be an acceptable tradeoff. The actual change needs
more discussion in the QEMU community; for now we have simply hacked QEMU
for testing.

The complete code can be found in [3]; the corresponding QEMU code can be
found in [4].

[1] https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.com/
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4_nesting

Change log:
v4:
 - Remove ascii art tables (Jason)
 - Drop EMT (Tina, Jason)
 - Drop MTS and related definitions (Kevin)
 - Rename macro IOMMU_VTD_PGTBL_ to IOMMU_VTD_S1_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd_ to iommu_hwpt_vtd_ (Kevin)
 - Rename struct iommu_hwpt_intel_vtd to iommu_hwpt_vtd_s1 (Kevin)
 - Put the vendor-specific hwpt alloc data structure before enum iommu_hwpt_type (Kevin)
 - Do not trim the higher page levels of S2 domain in nested domain attachment as the
   S2 domain may have been used independently. (Kevin)
 - Remove the first-stage pgd check against the maximum address of s2_domain as hw
   can check it anyhow. Ideally every pfn used in the stage-1 page table would be
   checked, but that is not feasible in software, so just leave it to hw. (Kevin)
 - Split the iotlb flush part into an order of uapi, helper and callback implementation (Kevin)
 - Change the policy for the VT-d nesting erratum: disallow RO mappings once a domain
   is used as the parent of a nested domain. This removes the nested_users counting. (Kevin)
 - Minor fix for "make htmldocs"

v3: https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.com/
 - Further split the patches into an order of adding helpers for nested
   domain, iotlb flush, nested domain attachment and nested domain allocation
   callback, then report the hw_info to userspace.
 - Add batch support in cache invalidation from userspace
 - Disallow nested translation usage if RO mappings exist in the stage-2 domain
   due to the erratum on read-only mappings on the Sapphire Rapids platform.

v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
 - The iommufd infrastructure is split to be separate series.

v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Lu Baolu (5):
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Disallow nesting on domains with read-only mappings

Yi Liu (7):
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Set the nested domain to a device
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  iommu/vt-d: Make iotlb flush helpers to be extern
  iommu/vt-d: Add iotlb flush for nested domain
  iommu/vt-d: Implement hw_info for iommu capability query

 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  80 +++++++++++++---
 drivers/iommu/intel/iommu.h  |  55 +++++++++--
 drivers/iommu/intel/nested.c | 174 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c  | 127 +++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h  |   2 +
 drivers/iommu/iommufd/main.c |   6 ++
 include/linux/iommu.h        |   1 +
 include/uapi/linux/iommufd.h | 124 +++++++++++++++++++++++++
 9 files changed, 549 insertions(+), 22 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  6:40   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This adds IOMMU_HWPT_TYPE_VTD_S1 for the stage-1 hw_pagetable of Intel VT-d
and the corresponding data structure for the userspace-specified parameters
for domain allocation.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/uapi/linux/iommufd.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ede822e5acbb..90b0d3f603a7 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -351,12 +351,45 @@ struct iommu_vfio_ioas {
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
+/**
+ * enum iommu_hwpt_vtd_s1_flags - Intel VT-d stage-1 page table
+ *                                entry attributes
+ * @IOMMU_VTD_S1_SRE: Supervisor request
+ * @IOMMU_VTD_S1_EAFE: Extended access enable
+ * @IOMMU_VTD_S1_WPE: Write protect enable
+ */
+enum iommu_hwpt_vtd_s1_flags {
+	IOMMU_VTD_S1_SRE = 1 << 0,
+	IOMMU_VTD_S1_EAFE = 1 << 1,
+	IOMMU_VTD_S1_WPE = 1 << 2,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1 - Intel VT-d specific user-managed stage-1
+ *                            page table info (IOMMU_HWPT_TYPE_VTD_S1)
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_flags
+ * @pgtbl_addr: The base address of the stage-1 page table.
+ * @addr_width: The address width of the stage-1 page table
+ * @__reserved: Must be 0
+ *
+ * VT-d specific data for creating a stage-1 page table that is used
+ * in nested translation.
+ */
+struct iommu_hwpt_vtd_s1 {
+	__aligned_u64 flags;
+	__aligned_u64 pgtbl_addr;
+	__u32 addr_width;
+	__u32 __reserved;
+};
+
 /**
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
+ * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
  */
 enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
+	IOMMU_HWPT_TYPE_VTD_S1,
 };
 
 /**
-- 
2.34.1



* [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
  2023-07-24 11:13 ` [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  6:42   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation Yi Liu
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

The nested domain fields are exclusive with those used for a DMA
remapping domain. Use a union to avoid memory waste.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1c5e1d88862b..565e6ae54d32 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -596,15 +596,38 @@ struct dmar_domain {
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 
-	struct dma_pte	*pgd;		/* virtual address */
-	int		gaw;		/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int		agaw;
 	int		iommu_superpage;/* Level of superpages supported:
 					   0 == 4KiB (no superpages), 1 == 2MiB,
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
-	u64		max_addr;	/* maximum mapped address */
+	union {
+		/* DMA remapping domain */
+		struct {
+			/* virtual address */
+			struct dma_pte	*pgd;
+			/* max guest address width */
+			int		gaw;
+			/*
+			 * adjusted guest address width:
+			 *   0: level 2 30-bit
+			 *   1: level 3 39-bit
+			 *   2: level 4 48-bit
+			 *   3: level 5 57-bit
+			 */
+			int		agaw;
+			/* maximum mapped address */
+			u64		max_addr;
+		};
+
+		/* Nested user domain */
+		struct {
+			/* parent page table which the user domain is nested on */
+			struct dmar_domain *s2_domain;
+			/* user page table pointer (in GPA) */
+			unsigned long s1_pgtbl;
+			/* page table attributes */
+			struct iommu_hwpt_vtd_s1 s1_cfg;
+		};
+	};
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
-- 
2.34.1



* [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
  2023-07-24 11:13 ` [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
  2023-07-24 11:13 ` [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  6:44   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds a helper that accepts user parameters and allocates a nested
domain.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/Makefile |  2 +-
 drivers/iommu/intel/iommu.h  |  2 ++
 drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/intel/nested.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 7af3b8a4f2a0..5dabf081a779 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 565e6ae54d32..b3edad7359c9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -858,6 +858,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
 void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const union iommu_domain_user_data *user_data);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
new file mode 100644
index 000000000000..80a64ba87d46
--- /dev/null
+++ b/drivers/iommu/intel/nested.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * nested.c - nested mode translation support
+ *
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ *         Jacob Pan <jacob.jun.pan@linux.intel.com>
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+
+#include <linux/iommu.h>
+
+#include "iommu.h"
+
+static void intel_nested_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.free			= intel_nested_domain_free,
+};
+
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+					       const union iommu_domain_user_data *user_data)
+{
+	const struct iommu_hwpt_vtd_s1 *vtd = (struct iommu_hwpt_vtd_s1 *)user_data;
+	struct dmar_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
+	if (!domain)
+		return NULL;
+
+	domain->use_first_level = true;
+	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s1_pgtbl = vtd->pgtbl_addr;
+	domain->s1_cfg = *vtd;
+	domain->domain.ops = &intel_nested_domain_ops;
+	domain->domain.type = IOMMU_DOMAIN_NESTED;
+	INIT_LIST_HEAD(&domain->devices);
+	spin_lock_init(&domain->lock);
+	xa_init(&domain->iommu_array);
+
+	return &domain->domain;
+}
-- 
2.34.1



* [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (2 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:10   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. -EINVAL will be returned
if any of the configurations is incompatible with the hardware
capabilities; the caller can then retry with another compatible user
domain. The encoding of the fields of each pasid entry is defined in
section 9.6 of the VT-d spec.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/pasid.c | 127 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   2 +
 2 files changed, 129 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..af9cfd2d5c52 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -335,6 +335,15 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -402,6 +411,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -713,3 +731,112 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for nested translation based vIOMMU. e.g. guest IOVA
+ * and guest shared virtual address. In this case, the first level page
+ * tables are used for GVA/GIOVA-GPA translation in the guest, second level
+ * page tables are used for GPA-HPA translation.
+ *
+ * @iommu:      IOMMU which the device belong to
+ * @dev:        Device to be set up for translation
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @domain:     User stage-1 domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err_ratelimited("%s: No nested translation support\n",
+				   iommu->name);
+		return -ENODEV;
+	}
+
+	/*
+	 * Address width should match in two dimensions: CPU vs. IOMMU,
+	 * guest vs. host.
+	 */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+#ifdef CONFIG_X86
+	case ADDR_WIDTH_5LEVEL:
+		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+		    !cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+#endif
+	default:
+		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if (s2_domain->agaw > iommu->agaw) {
+		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
+		pasid_set_eafe(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, s2_domain->agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..864b12848392 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -111,6 +111,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
-- 
2.34.1



* [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (3 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:14   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device Yi Liu
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This makes the helpers visible to nested.c.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 17 +++++++----------
 drivers/iommu/intel/iommu.h |  8 ++++++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5c8c5cdc36cf..289e8c2417ad 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -277,7 +277,6 @@ static LIST_HEAD(dmar_satc_units);
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
 
-static void device_block_translation(struct device *dev);
 static void intel_iommu_domain_free(struct iommu_domain *domain);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -555,7 +554,7 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 }
 
 /* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
@@ -1732,8 +1731,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
 	return domain;
 }
 
-static int domain_attach_iommu(struct dmar_domain *domain,
-			       struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
 	unsigned long ndomains;
@@ -1782,8 +1780,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 	return ret;
 }
 
-static void domain_detach_iommu(struct dmar_domain *domain,
-				struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info;
 
@@ -3987,7 +3984,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
  * all DMA requests without PASID from the device are blocked. If the page
  * table has been set, clean up the data structures.
  */
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
@@ -4093,8 +4090,8 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-static int prepare_domain_attach_device(struct iommu_domain *domain,
-					struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu;
@@ -4334,7 +4331,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)
 						     PASID_RID2PASID);
 }
 
-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	unsigned long flags;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b3edad7359c9..4b12166a9c3f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -852,6 +852,14 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
  */
 #define QI_OPT_WAIT_DRAIN		BIT(0)
 
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev);
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
 int dmar_ir_support(void);
 
 void *alloc_pgtable_page(int node, gfp_t gfp);
-- 
2.34.1



* [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (4 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:22   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation Yi Liu
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, Jacob Pan

This adds the helper for attaching the nested domain to a device, thereby
enabling nested domain usage on Intel VT-d.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/nested.c | 52 ++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 80a64ba87d46..98164894f22f 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -11,8 +11,58 @@
 #define pr_fmt(fmt)	"DMAR: " fmt
 
 #include <linux/iommu.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
 
 #include "iommu.h"
+#include "pasid.h"
+
+static int intel_nested_attach_dev(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long flags;
+	int ret = 0;
+
+	if (info->domain)
+		device_block_translation(dev);
+
+	if (iommu->agaw < dmar_domain->s2_domain->agaw) {
+		dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n");
+		return -ENODEV;
+	}
+
+	/* Is s2_domain compatible with this IOMMU? */
+	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+	if (ret) {
+		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
+		return ret;
+	}
+
+	ret = domain_attach_iommu(dmar_domain, iommu);
+	if (ret) {
+		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
+		return ret;
+	}
+
+	ret = intel_pasid_setup_nested(iommu, dev,
+				       PASID_RID2PASID, dmar_domain);
+	if (ret) {
+		domain_detach_iommu(dmar_domain, iommu);
+		dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
+		return ret;
+	}
+
+	info->domain = dmar_domain;
+	spin_lock_irqsave(&dmar_domain->lock, flags);
+	list_add(&info->link, &dmar_domain->devices);
+	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	domain_update_iommu_cap(dmar_domain);
+
+	return 0;
+}
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
 {
@@ -20,7 +70,9 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 }
 
 static const struct iommu_domain_ops intel_nested_domain_ops = {
+	.attach_dev		= intel_nested_attach_dev,
 	.free			= intel_nested_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
 };
 
 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
-- 
2.34.1



* [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (5 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:41   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern Yi Liu
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This adds the data structure for flushing the IOTLB of a nested domain
allocated with the IOMMU_HWPT_TYPE_VTD_S1 type.

The cache invalidation path is a performance path, so it's better to avoid
memory allocation there. To achieve this, the path reuses the ucmd_buffer
to copy user data, and the new data structures are added to the ucmd_buffer
union so that the buffer is sized large enough to hold them.

This only supports invalidating the IOTLB, not the device-TLB, as device-TLB
invalidation is covered automatically by the IOTLB invalidation when the
underlying IOMMU driver has enabled ATS for the affected device.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/main.c |  6 ++++
 include/uapi/linux/iommufd.h | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index d49837397dfa..b927ace7f3af 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -485,6 +485,12 @@ union ucmd_buffer {
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
+	/*
+	 * hwpt_type specific structure used in the cache invalidation
+	 * path.
+	 */
+	struct iommu_hwpt_vtd_s1_invalidate vtd;
+	struct iommu_hwpt_vtd_s1_invalidate_desc req_vtd;
 };
 
 struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 90b0d3f603a7..2c1241448c87 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -523,6 +523,64 @@ struct iommu_resv_iova_ranges {
 };
 #define IOMMU_RESV_IOVA_RANGES _IO(IOMMUFD_TYPE, IOMMUFD_CMD_RESV_IOVA_RANGES)
 
+/**
+ * enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d
+ *                                           stage-1 cache invalidation
+ * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
+ *                           leaf PTE caching needs to be invalidated
+ *                           and other paging structure caches can be
+ *                           preserved.
+ */
+enum iommu_hwpt_vtd_s1_invalidate_flags {
+	IOMMU_VTD_QI_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1_invalidate_desc - Intel VT-d stage-1 cache
+ *                                            invalidation descriptor
+ * @addr: The start address of the addresses to be invalidated.
+ * @npages: Number of contiguous 4K pages to be invalidated.
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation under nested translation. Userspace uses this structure to
+ * tell the host about the impacted caches after modifying the stage-1 page table.
+ *
+ * To invalidate all the caches related to the hw_pagetable, set @addr to 0
+ * and @npages to __aligned_u64(-1).
+ */
+struct iommu_hwpt_vtd_s1_invalidate_desc {
+	__aligned_u64 addr;
+	__aligned_u64 npages;
+	__u32 flags;
+	__u32 __reserved;
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
+ *                                       (IOMMU_HWPT_TYPE_VTD_S1)
+ * @flags: Must be 0
+ * @entry_size: Size in bytes of each cache invalidation request
+ * @entry_nr_uptr: User pointer to the number of invalidation requests.
+ *                 Kernel reads it to get the number of requests and
+ *                 updates the buffer with the number of requests that
+ *                 have been processed successfully. This pointer must
+ *                 point to a __u32 type of memory location.
+ * @inv_data_uptr: Pointer to the cache invalidation requests
+ *
+ * The Intel VT-d specific invalidation data for a set of cache invalidation
+ * requests. Kernel loops the requests one-by-one and stops when failure
+ * is encountered. The number of handled requests is reported to user by
+ * writing the buffer pointed by @entry_nr_uptr.
+ */
+struct iommu_hwpt_vtd_s1_invalidate {
+	__u32 flags;
+	__u32 entry_size;
+	__aligned_u64 entry_nr_uptr;
+	__aligned_u64 inv_data_uptr;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (6 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:41   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This makes the helpers visible to nested.c.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 10 +++++-----
 drivers/iommu/intel/iommu.h |  6 ++++++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 289e8c2417ad..3119a79ebc83 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1466,10 +1466,10 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
 
-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
-				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
-				  int ih, int map)
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map)
 {
 	unsigned int aligned_pages = __roundup_pow_of_two(pages);
 	unsigned int mask = ilog2(aligned_pages);
@@ -1542,7 +1542,7 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
 		iommu_flush_write_buffer(iommu);
 }
 
-static void intel_flush_iotlb_all(struct iommu_domain *domain)
+void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct iommu_domain_info *info;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 4b12166a9c3f..5b292213bcb8 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -859,6 +859,12 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
 void domain_update_iommu_cap(struct dmar_domain *domain);
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+			   struct dmar_domain *domain,
+			   unsigned long pfn, unsigned int pages,
+			   int ih, int map);
+void intel_flush_iotlb_all(struct iommu_domain *domain);
+
 
 int dmar_ir_support(void);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (7 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:46   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation Yi Liu
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This implements the .cache_invalidate_user() callback and sets the
.cache_invalidate_user_data_len to support IOTLB flushing for the nested
domain.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/nested.c | 63 ++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 98164894f22f..2739c0d7880d 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -69,8 +69,71 @@ static void intel_nested_domain_free(struct iommu_domain *domain)
 	kfree(to_dmar_domain(domain));
 }
 
+static void intel_nested_invalidate(struct device *dev,
+				    struct dmar_domain *domain,
+				    u64 addr, unsigned long npages)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+
+	if (addr == 0 && npages == -1)
+		intel_flush_iotlb_all(&domain->domain);
+	else
+		iommu_flush_iotlb_psi(iommu, domain,
+				      addr >> VTD_PAGE_SHIFT,
+				      npages, 1, 0);
+}
+
+static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
+					      void *user_data)
+{
+	struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
+	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned int entry_size = inv_info->entry_size;
+	u64 uptr = inv_info->inv_data_uptr;
+	u64 nr_uptr = inv_info->entry_nr_uptr;
+	struct device_domain_info *info;
+	u32 entry_nr, index;
+	unsigned long flags;
+	int ret = 0;
+
+	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
+		return -EFAULT;
+
+	for (index = 0; index < entry_nr; index++) {
+		ret = copy_struct_from_user(req, sizeof(*req),
+					    u64_to_user_ptr(uptr + index * entry_size),
+					    entry_size);
+		if (ret) {
+			pr_err_ratelimited("Failed to fetch invalidation request\n");
+			break;
+		}
+
+		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
+		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		spin_lock_irqsave(&dmar_domain->lock, flags);
+		list_for_each_entry(info, &dmar_domain->devices, link)
+			intel_nested_invalidate(info->dev, dmar_domain,
+						req->addr, req->npages);
+		spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	}
+
+	if (put_user(index, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
+		return -EFAULT;
+
+	return ret;
+}
+
 static const struct iommu_domain_ops intel_nested_domain_ops = {
 	.attach_dev		= intel_nested_attach_dev,
+	.cache_invalidate_user	= intel_nested_cache_invalidate_user,
+	.cache_invalidate_user_data_len =
+		sizeof(struct iommu_hwpt_vtd_s1_invalidate),
 	.free			= intel_nested_domain_free,
 	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (8 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:47   ` Tian, Kevin
  2023-07-24 11:13 ` [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
  2023-07-24 11:13 ` [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

This adds support for the IOMMU_HWPT_TYPE_VTD_S1 hwpt type.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++
 include/linux/iommu.h       |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3119a79ebc83..6977d320c440 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4084,6 +4084,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	return NULL;
 }
 
+static struct iommu_domain *
+intel_iommu_domain_alloc_user(struct device *dev,
+			      enum iommu_hwpt_type hwpt_type,
+			      struct iommu_domain *parent,
+			      const union iommu_domain_user_data *user_data)
+{
+	if (hwpt_type != IOMMU_HWPT_TYPE_DEFAULT &&
+	    hwpt_type != IOMMU_HWPT_TYPE_VTD_S1)
+		return ERR_PTR(-EINVAL);
+
+	if ((hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) == !!parent)
+		return ERR_PTR(-EINVAL);
+
+	if (parent)
+		return intel_nested_domain_alloc(parent, user_data);
+	else
+		return iommu_domain_alloc(dev->bus);
+}
+
 static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
 	if (domain != &si_domain->domain && domain != &blocking_domain)
@@ -4732,6 +4751,7 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
+	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.probe_device		= intel_iommu_probe_device,
 	.probe_finalize		= intel_iommu_probe_finalize,
 	.release_device		= intel_iommu_release_device,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 231920efab84..09b8e800b55e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -240,6 +240,7 @@ union iommu_domain_user_data {
 #ifdef CONFIG_IOMMUFD_TEST
 	__u64 test[2];
 #endif
+	struct iommu_hwpt_vtd_s1 vtd;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (9 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-15 16:31   ` Jason Gunthorpe
  2023-07-24 11:13 ` [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
 include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6977d320c440..ba34827045e6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4748,8 +4748,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
 }
 
+static void *intel_iommu_hw_info(struct device *dev, u32 *length)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_hw_info_vtd *vtd;
+
+	vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
+	if (!vtd)
+		return ERR_PTR(-ENOMEM);
+
+	vtd->cap_reg = iommu->cap;
+	vtd->ecap_reg = iommu->ecap;
+	*length = sizeof(*vtd);
+
+	return vtd;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
+	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.probe_device		= intel_iommu_probe_device,
@@ -4763,6 +4781,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
+	.hw_info_type		= IOMMU_HW_INFO_TYPE_INTEL_VTD,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.page_response		= intel_svm_page_response,
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2c1241448c87..0dfb6f3d8dda 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -435,12 +435,35 @@ struct iommu_hwpt_alloc {
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
+/**
+ * struct iommu_hw_info_vtd - Intel VT-d hardware information
+ *
+ * @flags: Must be 0
+ * @__reserved: Must be 0
+ *
+ * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ *           section 11.4.2 Capability Register.
+ * @ecap_reg: Value of Intel VT-d extended capability register defined in
+ *            VT-d spec section 11.4.3 Extended Capability Register.
+ *
+ * User needs to understand the Intel VT-d specification to decode the
+ * register value.
+ */
+struct iommu_hw_info_vtd {
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 cap_reg;
+	__aligned_u64 ecap_reg;
+};
+
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
  * @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that does not report hardware info
+ * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
+	IOMMU_HW_INFO_TYPE_INTEL_VTD,
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
                   ` (10 preceding siblings ...)
  2023-07-24 11:13 ` [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
@ 2023-07-24 11:13 ` Yi Liu
  2023-08-02  7:59   ` Tian, Kevin
  11 siblings, 1 reply; 75+ messages in thread
From: Yi Liu @ 2023-07-24 11:13 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.

As a result, the contents of pages designated by the VMM as Read-Only can be
modified by the IOMMU via PML5E (PML4E for 4-level tables) accesses as part
of the address translation process for DMAs issued by the guest.

Disallow nested translation when there are read-only pages in the
corresponding second-stage mappings, and disallow configuring read-only
pages in a second-stage table that is already used for nested translation.

For simplicity, the second restriction is not relaxed even when nesting is
later turned off due to a vIOMMU configuration change. Conceptually, if the
user understands this errata and expects to enable nested translation, it
should never install any RO mapping in stage-2 during the entire VM life
cycle. Accordingly, introduce a single sticky bit to mark the parent role
on a domain instead of tracking the role with a counter.

See the Sapphire Rapids Specification Update [1], errata details, SPR17.

[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 14 ++++++++++++++
 drivers/iommu/intel/iommu.h  |  4 ++++
 drivers/iommu/intel/nested.c | 14 +++++++++++++-
 include/uapi/linux/iommufd.h | 12 +++++++++++-
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ba34827045e6..caaa3a58dc94 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2138,6 +2138,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	struct dma_pte *first_pte = NULL, *pte = NULL;
 	unsigned int largepage_lvl = 0;
 	unsigned long lvl_pages = 0;
+	unsigned long flags;
 	phys_addr_t pteval;
 	u64 attr;
 
@@ -2147,6 +2148,18 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
 		return -EINVAL;
 
+	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
+		spin_lock_irqsave(&domain->lock, flags);
+		if (domain->set_nested) {
+			pr_err_ratelimited("No read-only mapping permitted\n");
+			spin_unlock_irqrestore(&domain->lock, flags);
+			return -EINVAL;
+		}
+
+		domain->read_only_mapped = true;
+		spin_unlock_irqrestore(&domain->lock, flags);
+	}
+
 	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
 	attr |= DMA_FL_PTE_PRESENT;
 	if (domain->use_first_level) {
@@ -4758,6 +4771,7 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length)
 	if (!vtd)
 		return ERR_PTR(-ENOMEM);
 
+	vtd->flags = IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17;
 	vtd->cap_reg = iommu->cap;
 	vtd->ecap_reg = iommu->ecap;
 	*length = sizeof(*vtd);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 5b292213bcb8..2a14fab6ac4f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -592,6 +592,10 @@ struct dmar_domain {
 					 * otherwise, goes through the second
 					 * level.
 					 */
+	u8 read_only_mapped:1;		/* domain has mappings with read-only
+					 * permission.
+					 */
+	u8 set_nested:1;		/* has other domains nested on it */
 
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 2739c0d7880d..50934da613fa 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -142,14 +142,26 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
 					       const union iommu_domain_user_data *user_data)
 {
 	const struct iommu_hwpt_vtd_s1 *vtd = (struct iommu_hwpt_vtd_s1 *)user_data;
+	struct dmar_domain *s2_dmar_domain = to_dmar_domain(s2_domain);
 	struct dmar_domain *domain;
+	unsigned long flags;
 
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
 	if (!domain)
 		return NULL;
 
+	spin_lock_irqsave(&s2_dmar_domain->lock, flags);
+	if (s2_dmar_domain->read_only_mapped) {
+		spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
+		pr_err_ratelimited("S2 domain has read-only mappings\n");
+		kfree(domain);
+		return NULL;
+	}
+	s2_dmar_domain->set_nested = true;
+	spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
+
 	domain->use_first_level = true;
-	domain->s2_domain = to_dmar_domain(s2_domain);
+	domain->s2_domain = s2_dmar_domain;
 	domain->s1_pgtbl = vtd->pgtbl_addr;
 	domain->s1_cfg = *vtd;
 	domain->domain.ops = &intel_nested_domain_ops;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 0dfb6f3d8dda..2f8f2dab95a7 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -435,10 +435,20 @@ struct iommu_hwpt_alloc {
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
+/**
+ * enum iommu_hw_info_vtd_flags - Flags for VT-d hw_info
+ * @IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17: If set, disallow nesting on domains
+ *                                   with read-only mappings.
+ *                                   https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
+ */
+enum iommu_hw_info_vtd_flags {
+	IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info_vtd - Intel VT-d hardware information
  *
- * @flags: Must be 0
+ * @flags: Combination of enum iommu_hw_info_vtd_flags
  * @__reserved: Must be 0
  *
  * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  2023-07-24 11:13 ` [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
@ 2023-08-02  6:40   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  6:40 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> 
> +
> +/**
> + * struct iommu_hwpt_vtd_s1 - Intel VT-d specific user-managed stage-1
> + *                            page table info (IOMMU_HWPT_TYPE_VTD_S1)

remove "specific user-managed"

> + * @flags: Combination of enum iommu_hwpt_vtd_s1_flags
> + * @pgtbl_addr: The base address of the stage-1 page table.
> + * @addr_width: The address width of the stage-1 page table
> + * @__reserved: Must be 0
> + *
> + * VT-d specific data for creating a stage-1 page table that is used
> + * in nested translation.

this doesn't add more info. could be removed.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain
  2023-07-24 11:13 ` [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
@ 2023-08-02  6:42   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  6:42 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The nested domain fields are exclusive to those that used for a DMA
> remapping domain. Use union to avoid memory waste.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation
  2023-07-24 11:13 ` [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation Yi Liu
@ 2023-08-02  6:44   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  6:44 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> This adds helper for accepting user parameters and allocate a nested
> domain.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-07-24 11:13 ` [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
@ 2023-08-02  7:10   ` Tian, Kevin
  2023-08-02  8:09     ` Baolu Lu
  2023-08-03  3:13     ` Baolu Lu
  0 siblings, 2 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:10 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
>  }
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA

s/could be/is/

> + * and guest shared virtual address. In this case, the first level page
> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
> + * page tables are used for GPA-HPA translation.

let's be consistent on using stage-1/stage-2

btw the convention is to have 1-line summary, then the list of
parameters followed by detail explanation of the function.

> + *
> + * @iommu:      IOMMU which the device belong to
> + * @dev:        Device to be set up for translation
> + * @pasid:      PASID to be programmed in the device PASID table
> + * @domain:     User stage-1 domain nested on a s2 domain
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device
> *dev,
> +			     u32 pasid, struct dmar_domain *domain)
> +{
> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
> +	struct dmar_domain *s2_domain = domain->s2_domain;
> +	u16 did = domain_id_iommu(domain, iommu);
> +	struct dma_pte *pgd = s2_domain->pgd;
> +	struct pasid_entry *pte;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err_ratelimited("%s: No nested translation support\n",
> +				   iommu->name);
> +		return -ENODEV;
> +	}

-EINVAL

> +
> +	if (s2_domain->agaw > iommu->agaw) {
> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu-
> >name);
> +		return -EINVAL;
> +	}

there is a duplicated check in intel_nested_attach_dev().

> +
> +	pasid_set_slptr(pte, virt_to_phys(pgd));
> +	pasid_set_fault_enable(pte);
> +	pasid_set_domain_id(pte, did);
> +	pasid_set_address_width(pte, s2_domain->agaw);
> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> +	pasid_set_present(pte);
> +	spin_unlock(&iommu->lock);
> +

this lacks of handling of force_snooping

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern
  2023-07-24 11:13 ` [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
@ 2023-08-02  7:14   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:14 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> 
> This makes the helpers visible to nested.c.
> 
> Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device
  2023-07-24 11:13 ` [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device Yi Liu
@ 2023-08-02  7:22   ` Tian, Kevin
  2023-08-03  3:17     ` Baolu Lu
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:22 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> +
> +static int intel_nested_attach_dev(struct iommu_domain *domain,
> +				   struct device *dev)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct intel_iommu *iommu = info->iommu;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (info->domain)
> +		device_block_translation(dev);
> +
> +	if (iommu->agaw < dmar_domain->s2_domain->agaw) {
> +		dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n");
> +		return -ENODEV;
> +	}

this is the check duplicated with patch04.

> +
> +	ret = domain_attach_iommu(dmar_domain, iommu);
> +	if (ret) {
> +		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
> +		return ret;
> +	}
> +

[...]

> +	domain_update_iommu_cap(dmar_domain);

iommu_cap is already updated in domain_attach_iommu().

> 
>  static const struct iommu_domain_ops intel_nested_domain_ops = {
> +	.attach_dev		= intel_nested_attach_dev,
>  	.free			= intel_nested_domain_free,
> +	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,

this is not required. enforce_cache_coherency() will be called on parent
hwpt when it's being created. patch04 should check parent's force_snooping
to set pgsnp in the pasid entry.

As Jason explained it should be done only for kernel owned page table.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-07-24 11:13 ` [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation Yi Liu
@ 2023-08-02  7:41   ` Tian, Kevin
  2023-08-02 13:47     ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:41 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
> 
> This adds the data structure for flushing iotlb for the nested domain
> allocated with IOMMU_HWPT_TYPE_VTD_S1 type.
> 
> The cache invalidation path is a performance path, so it's better to avoid
> memory allocation in such path. To achieve it, this path reuses the
> ucmd_buffer to copy user data. So the new data structures are added in
> the ucmd_buffer union to avoid overflow.

this patch has nothing to do with ucmd_buffer

> +
> +/**
> + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
> + *                                       (IOMMU_HWPT_TYPE_VTD_S1)
> + * @flags: Must be 0
> + * @entry_size: Size in bytes of each cache invalidation request
> + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> + *                 Kernel reads it to get the number of requests and
> + *                 updates the buffer with the number of requests that
> + *                 have been processed successfully. This pointer must
> + *                 point to a __u32 type of memory location.
> + * @inv_data_uptr: Pointer to the cache invalidation requests
> + *
> + * The Intel VT-d specific invalidation data for a set of cache invalidation
> + * requests. Kernel loops the requests one-by-one and stops when failure
> + * is encountered. The number of handled requests is reported to user by
> + * writing the buffer pointed by @entry_nr_uptr.
> + */
> +struct iommu_hwpt_vtd_s1_invalidate {
> +	__u32 flags;
> +	__u32 entry_size;
> +	__aligned_u64 entry_nr_uptr;
> +	__aligned_u64 inv_data_uptr;
> +};
> +

I wonder whether this array can be defined directly in the common
struct iommu_hwpt_invalidate so there is no need for the underlying
iommu driver to further deal with user buffers, including the various
minsz/backward-compat checks.

smmu may not require it since it uses a native queue format, but that
could be considered a special case of a 1-entry array. With careful
coding the added cost should be negligible.

Jason, your thought?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern
  2023-07-24 11:13 ` [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern Yi Liu
@ 2023-08-02  7:41   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:41 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:14 PM
> 
> This makes the helpers visible to nested.c.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-07-24 11:13 ` [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
@ 2023-08-02  7:46   ` Tian, Kevin
  2023-08-03  3:24     ` Baolu Lu
  2023-08-07 15:08     ` Liu, Yi L
  0 siblings, 2 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:46 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:14 PM
> 
> +static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
> +					      void *user_data)
> +{
> +	struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> +	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	unsigned int entry_size = inv_info->entry_size;
> +	u64 uptr = inv_info->inv_data_uptr;
> +	u64 nr_uptr = inv_info->entry_nr_uptr;
> +	struct device_domain_info *info;
> +	u32 entry_nr, index;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> +		return -EFAULT;
> +
> +	for (index = 0; index < entry_nr; index++) {
> +		ret = copy_struct_from_user(req, sizeof(*req),
> +					    u64_to_user_ptr(uptr + index * entry_size),
> +					    entry_size);

If continuing this direction then the driver should also check minsz etc.
for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
since they are uAPI and subject to change.

> +		if (ret) {
> +			pr_err_ratelimited("Failed to fetch invalidation request\n");
> +			break;
> +		}
> +
> +		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
> +		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		spin_lock_irqsave(&dmar_domain->lock, flags);
> +		list_for_each_entry(info, &dmar_domain->devices, link)
> +			intel_nested_invalidate(info->dev, dmar_domain,
> +						req->addr, req->npages);
> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);

Disabling interrupts while invalidating the IOTLB is certainly unacceptable.

Actually there is no need to walk devices. Under dmar_domain there
is already a list of attached IOMMUs.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation
  2023-07-24 11:13 ` [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation Yi Liu
@ 2023-08-02  7:47   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:47 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:14 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> This adds the support for IOMMU_HWPT_TYPE_VTD_S1 type.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-07-24 11:13 ` [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
@ 2023-08-02  7:59   ` Tian, Kevin
  2023-08-03  3:27     ` Baolu Lu
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  7:59 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:14 PM
>
> @@ -2147,6 +2148,18 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>  	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
>  		return -EINVAL;
> 
> +	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
> +		spin_lock_irqsave(&domain->lock, flags);
> +		if (domain->set_nested) {
> +			pr_err_ratelimited("No read-only mapping permitted\n");

"Read-only mapping is disallowed on the domain which serves as the
parent in a nested configuration, due to HW errata (ERRATA_772415_SPR17)"

> +	u8 read_only_mapped:1;		/* domain has mappings with read-only
> +					 * permission.
> +					 */
> +	u8 set_nested:1;		/* has other domains nested on it */

what about "is_parent"?

> 
> +	spin_lock_irqsave(&s2_dmar_domain->lock, flags);
> +	if (s2_dmar_domain->read_only_mapped) {
> +		spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
> +		pr_err_ratelimited("S2 domain has read-only mappings\n");

"Nested configuration is disallowed when the stage-2 domain already
has read-only mappings, due to HW errata (ERRATA_772415_SPR17)"


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-08-02  7:10   ` Tian, Kevin
@ 2023-08-02  8:09     ` Baolu Lu
  2023-08-02  8:11       ` Baolu Lu
  2023-08-03  3:13     ` Baolu Lu
  1 sibling, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-02  8:09 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 2023/8/2 15:10, Tian, Kevin wrote:
>> +
>> +	pasid_set_slptr(pte, virt_to_phys(pgd));
>> +	pasid_set_fault_enable(pte);
>> +	pasid_set_domain_id(pte, did);
>> +	pasid_set_address_width(pte, s2_domain->agaw);
>> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>> +	pasid_set_present(pte);
>> +	spin_unlock(&iommu->lock);
>> +
> this lacks handling of force_snooping

If stage-2 domain has force_snooping attribute set, then we should set
the bit field in the pasid entry, right?

How about below additional change?

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 19ac4084913b..86db81ec91f1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
         if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
                 pasid_set_eafe(pte);

+       if (s2_domain>force_snooping)
+               pasid_set_pgsnp(pte);
+
         pasid_set_slptr(pte, virt_to_phys(pgd));
         pasid_set_fault_enable(pte);
         pasid_set_domain_id(pte, did);

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-08-02  8:09     ` Baolu Lu
@ 2023-08-02  8:11       ` Baolu Lu
  2023-08-02  8:13         ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-02  8:11 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 2023/8/2 16:09, Baolu Lu wrote:
> On 2023/8/2 15:10, Tian, Kevin wrote:
>>> +
>>> +    pasid_set_slptr(pte, virt_to_phys(pgd));
>>> +    pasid_set_fault_enable(pte);
>>> +    pasid_set_domain_id(pte, did);
>>> +    pasid_set_address_width(pte, s2_domain->agaw);
>>> +    pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>>> +    pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>>> +    pasid_set_present(pte);
>>> +    spin_unlock(&iommu->lock);
>>> +
>> this lacks handling of force_snooping
> 
> If stage-2 domain has force_snooping attribute set, then we should set
> the bit field in the pasid entry, right?
> 
> How about below additional change?
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 19ac4084913b..86db81ec91f1 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu 
> *iommu, struct device *dev,
>          if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
>                  pasid_set_eafe(pte);
> 
> +       if (s2_domain>force_snooping)

+       if (s2_domain->force_snooping)

Sorry for typo.

> +               pasid_set_pgsnp(pte);
> +
>          pasid_set_slptr(pte, virt_to_phys(pgd));
>          pasid_set_fault_enable(pte);
>          pasid_set_domain_id(pte, did);
> 
> Best regards,
> baolu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-08-02  8:11       ` Baolu Lu
@ 2023-08-02  8:13         ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-02  8:13 UTC (permalink / raw)
  To: Baolu Lu, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, August 2, 2023 4:12 PM
> 
> On 2023/8/2 16:09, Baolu Lu wrote:
> > On 2023/8/2 15:10, Tian, Kevin wrote:
> >>> +
> >>> +    pasid_set_slptr(pte, virt_to_phys(pgd));
> >>> +    pasid_set_fault_enable(pte);
> >>> +    pasid_set_domain_id(pte, did);
> >>> +    pasid_set_address_width(pte, s2_domain->agaw);
> >>> +    pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> >>> +    pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> >>> +    pasid_set_present(pte);
> >>> +    spin_unlock(&iommu->lock);
> >>> +
> >> this lacks handling of force_snooping
> >
> > If stage-2 domain has force_snooping attribute set, then we should set
> > the bit field in the pasid entry, right?
> >
> > How about below additional change?

yes

> >
> > diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> > index 19ac4084913b..86db81ec91f1 100644
> > --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
> >          if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
> >                  pasid_set_eafe(pte);
> >
> > +       if (s2_domain>force_snooping)
> 
> +       if (s2_domain->force_snooping)
> 
> Sorry for typo.
> 
> > +               pasid_set_pgsnp(pte);
> > +
> >          pasid_set_slptr(pte, virt_to_phys(pgd));
> >          pasid_set_fault_enable(pte);
> >          pasid_set_domain_id(pte, did);
> >
> > Best regards,
> > baolu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-08-02  7:41   ` Tian, Kevin
@ 2023-08-02 13:47     ` Jason Gunthorpe
  2023-08-03  0:38       ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-02 13:47 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Aug 02, 2023 at 07:41:05AM +0000, Tian, Kevin wrote:
> > +/**
> > + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
> > + *                                       (IOMMU_HWPT_TYPE_VTD_S1)
> > + * @flags: Must be 0
> > + * @entry_size: Size in bytes of each cache invalidation request
> > + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> > + *                 Kernel reads it to get the number of requests and
> > + *                 updates the buffer with the number of requests that
> > + *                 have been processed successfully. This pointer must
> > + *                 point to a __u32 type of memory location.
> > + * @inv_data_uptr: Pointer to the cache invalidation requests
> > + *
> > + * The Intel VT-d specific invalidation data for a set of cache invalidation
> > + * requests. Kernel loops the requests one-by-one and stops when failure
> > + * is encountered. The number of handled requests is reported to user by
> > + * writing the buffer pointed by @entry_nr_uptr.
> > + */
> > +struct iommu_hwpt_vtd_s1_invalidate {
> > +	__u32 flags;
> > +	__u32 entry_size;
> > +	__aligned_u64 entry_nr_uptr;
> > +	__aligned_u64 inv_data_uptr;
> > +};
> > +
> 
> > I wonder whether this array can be defined directly in the common
> > struct iommu_hwpt_invalidate so there is no need for the underlying
> > iommu driver to further deal with user buffers, including the various
> > minsz/backward-compat checks.

You want to have an array and another chunk of data?

What is the array for? To do batching?

It means we have to allocate memory on this path, that doesn't seem
like the right direction for a performance improvement..

Having the driver copy in a loop might be better

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-08-02 13:47     ` Jason Gunthorpe
@ 2023-08-03  0:38       ` Tian, Kevin
  2023-08-04 13:04         ` Liu, Yi L
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-03  0:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 2, 2023 9:48 PM
> 
> On Wed, Aug 02, 2023 at 07:41:05AM +0000, Tian, Kevin wrote:
> > > +/**
> > > + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
> > > + *                                       (IOMMU_HWPT_TYPE_VTD_S1)
> > > + * @flags: Must be 0
> > > + * @entry_size: Size in bytes of each cache invalidation request
> > > + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> > > + *                 Kernel reads it to get the number of requests and
> > > + *                 updates the buffer with the number of requests that
> > > + *                 have been processed successfully. This pointer must
> > > + *                 point to a __u32 type of memory location.
> > > + * @inv_data_uptr: Pointer to the cache invalidation requests
> > > + *
> > > + * The Intel VT-d specific invalidation data for a set of cache invalidation
> > > + * requests. Kernel loops the requests one-by-one and stops when failure
> > > + * is encountered. The number of handled requests is reported to user by
> > > + * writing the buffer pointed by @entry_nr_uptr.
> > > + */
> > > +struct iommu_hwpt_vtd_s1_invalidate {
> > > +	__u32 flags;
> > > +	__u32 entry_size;
> > > +	__aligned_u64 entry_nr_uptr;
> > > +	__aligned_u64 inv_data_uptr;
> > > +};
> > > +
> >
> > I wonder whether this array can be defined directly in the common
> > struct iommu_hwpt_invalidate so there is no need for the underlying
> > iommu driver to further deal with user buffers, including the various
> > minsz/backward-compat checks.
> 
> You want to have an array and another chunk of data?
> 
> What is the array for? To do batching?

yes, it's for batching

> 
> It means we have to allocate memory on this path, that doesn't seem
> like the right direction for a performance improvement..

It reuses the ucmd_buffer to avoid memory allocation:

@@ -485,6 +485,12 @@ union ucmd_buffer {
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
+	/*
+	 * hwpt_type specific structure used in the cache invalidation
+	 * path.
+	 */
+	struct iommu_hwpt_vtd_s1_invalidate vtd;
+	struct iommu_hwpt_vtd_s1_invalidate_desc req_vtd;
 };

I don't quite like this way.

> 
> Having the driver copy in a loop might be better
> 

Can you elaborate?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-08-02  7:10   ` Tian, Kevin
  2023-08-02  8:09     ` Baolu Lu
@ 2023-08-03  3:13     ` Baolu Lu
  2023-08-03  3:58       ` Tian, Kevin
  1 sibling, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  3:13 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 2023/8/2 15:10, Tian, Kevin wrote:
>> From: Liu, Yi L<yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:13 PM
>>   }
>> +
>> +/**
>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
>> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA
> s/could be/is/

Ack.

> 
>> + * and guest shared virtual address. In this case, the first level page
>> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
>> + * page tables are used for GPA-HPA translation.
> let's be consistent on using stage-1/stage-2
> 
> btw the convention is to have 1-line summary, then the list of
> parameters followed by detail explanation of the function.
> 

This patch just follows the existing code style in this file. A separate
patch is needed to clean this up.

>> + *
>> + * @iommu:      IOMMU which the device belong to
>> + * @dev:        Device to be set up for translation
>> + * @pasid:      PASID to be programmed in the device PASID table
>> + * @domain:     User stage-1 domain nested on a s2 domain
>> + */
>> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
>> +			     u32 pasid, struct dmar_domain *domain)
>> +{
>> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
>> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
>> +	struct dmar_domain *s2_domain = domain->s2_domain;
>> +	u16 did = domain_id_iommu(domain, iommu);
>> +	struct dma_pte *pgd = s2_domain->pgd;
>> +	struct pasid_entry *pte;
>> +
>> +	if (!ecap_nest(iommu->ecap)) {
>> +		pr_err_ratelimited("%s: No nested translation support\n",
>> +				   iommu->name);
>> +		return -ENODEV;
>> +	}
> -EINVAL

This is in the attach domain path. -EINVAL has the special meaning of
"this domain is not compatible with iommu for the device".

So here, I still think we should return -ENODEV and the caller doesn't
need to retry anymore.

> 
>> +
>> +	if (s2_domain->agaw > iommu->agaw) {
>> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
>> +		return -EINVAL;
>> +	}
> there is a duplicated check in intel_nested_attach_dev().
> 

Yeah, should be removed.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device
  2023-08-02  7:22   ` Tian, Kevin
@ 2023-08-03  3:17     ` Baolu Lu
  0 siblings, 0 replies; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  3:17 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Jacob Pan

On 2023/8/2 15:22, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:13 PM
>> +
>> +static int intel_nested_attach_dev(struct iommu_domain *domain,
>> +				   struct device *dev)
>> +{
>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	struct intel_iommu *iommu = info->iommu;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	if (info->domain)
>> +		device_block_translation(dev);
>> +
>> +	if (iommu->agaw < dmar_domain->s2_domain->agaw) {
>> +		dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n");
>> +		return -ENODEV;
>> +	}
> 
> this is the check duplicated with patch04.

Ack. No need for a check here.

> 
>> +
>> +	ret = domain_attach_iommu(dmar_domain, iommu);
>> +	if (ret) {
>> +		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
>> +		return ret;
>> +	}
>> +
> 
> [...]
> 
>> +	domain_update_iommu_cap(dmar_domain);
> 
> iommu_cap is already updated in domain_attach_iommu().

Ack. We will eventually remove this helper when every domain is
allocated for a specific device.

>>
>>   static const struct iommu_domain_ops intel_nested_domain_ops = {
>> +	.attach_dev		= intel_nested_attach_dev,
>>   	.free			= intel_nested_domain_free,
>> +	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
> 
> this is not required. enforce_cache_coherency() will be called on parent
> hwpt when it's being created. patch04 should check parent's force_snooping
> to set pgsnp in the pasid entry.
> 
> As Jason explained it should be done only for kernel owned page table.

Ack and thanks!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-02  7:46   ` Tian, Kevin
@ 2023-08-03  3:24     ` Baolu Lu
  2023-08-03  4:00       ` Tian, Kevin
  2023-08-07 15:08     ` Liu, Yi L
  1 sibling, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  3:24 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 2023/8/2 15:46, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:14 PM
>>
>> +static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
>> +					      void *user_data)
>> +{
>> +	struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
>> +	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	unsigned int entry_size = inv_info->entry_size;
>> +	u64 uptr = inv_info->inv_data_uptr;
>> +	u64 nr_uptr = inv_info->entry_nr_uptr;
>> +	struct device_domain_info *info;
>> +	u32 entry_nr, index;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
>> +		return -EFAULT;
>> +
>> +	for (index = 0; index < entry_nr; index++) {
>> +		ret = copy_struct_from_user(req, sizeof(*req),
>> +					    u64_to_user_ptr(uptr + index * entry_size),
>> +					    entry_size);
> 
> If continuing this direction then the driver should also check minsz etc.
> for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
> since they are uAPI and subject to change.

Agreed.

> 
>> +		if (ret) {
>> +			pr_err_ratelimited("Failed to fetch invalidation request\n");
>> +			break;
>> +		}
>> +
>> +		if (req->__reserved || (req->flags & ~IOMMU_VTD_QI_FLAGS_LEAF) ||
>> +		    !IS_ALIGNED(req->addr, VTD_PAGE_SIZE)) {
>> +			ret = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		spin_lock_irqsave(&dmar_domain->lock, flags);
>> +		list_for_each_entry(info, &dmar_domain->devices, link)
>> +			intel_nested_invalidate(info->dev, dmar_domain,
>> +						req->addr, req->npages);
>> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> 
> Disabling interrupts while invalidating the IOTLB is certainly unacceptable.
> 
> Actually there is no need to walk devices. Under dmar_domain there
> is already a list of attached IOMMUs.

Walking devices is only necessary when invalidating the device TLB. For
IOTLB invalidation, it only needs to know the IOMMUs.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-08-02  7:59   ` Tian, Kevin
@ 2023-08-03  3:27     ` Baolu Lu
  2023-08-03  4:00       ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  3:27 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 2023/8/2 15:59, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:14 PM
>>
>> @@ -2147,6 +2148,18 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>>   	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
>>   		return -EINVAL;
>>
>> +	if (!(prot & DMA_PTE_WRITE) && !domain->read_only_mapped) {
>> +		spin_lock_irqsave(&domain->lock, flags);
>> +		if (domain->set_nested) {
>> +			pr_err_ratelimited("No read-only mapping permitted\n");
> 
> "Read-only mapping is disallowed on the domain which serves as the
> parent in a nested configuration, due to HW errata (ERRATA_772415_SPR17)"

Ack.

> 
>> +	u8 read_only_mapped:1;		/* domain has mappings with read-only
>> +					 * permission.
>> +					 */
>> +	u8 set_nested:1;		/* has other domains nested on it */
> 
> what about "is_parent"?

"is_parent" is still a bit generic. How about "is_nested_parent"?

> 
>>
>> +	spin_lock_irqsave(&s2_dmar_domain->lock, flags);
>> +	if (s2_dmar_domain->read_only_mapped) {
>> +		spin_unlock_irqrestore(&s2_dmar_domain->lock, flags);
>> +		pr_err_ratelimited("S2 domain has read-only mappings\n");
> 
> "Nested configuration is disallowed when the stage-2 domain already
> has read-only mappings, due to HW errata (ERRATA_772415_SPR17)"

Ack.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation
  2023-08-03  3:13     ` Baolu Lu
@ 2023-08-03  3:58       ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-03  3:58 UTC (permalink / raw)
  To: Baolu Lu, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Jacob Pan

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 11:13 AM
> 
> On 2023/8/2 15:10, Tian, Kevin wrote:
> >> From: Liu, Yi L<yi.l.liu@intel.com>
> >> Sent: Monday, July 24, 2023 7:13 PM
> >> +
> >> +	if (!ecap_nest(iommu->ecap)) {
> >> +		pr_err_ratelimited("%s: No nested translation support\n",
> >> +				   iommu->name);
> >> +		return -ENODEV;
> >> +	}
> > -EINVAL
> 
> This is in the attach domain path. -EINVAL has the special meaning of
> "this domain is not compatible with iommu for the device".
> 
> So here, I still think we should return -ENODEV and the caller doesn't
> need to retry anymore.
> 

You are right. I overlooked that this validation is for a device cap.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-03  3:24     ` Baolu Lu
@ 2023-08-03  4:00       ` Tian, Kevin
  2023-08-03  4:05         ` Baolu Lu
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-03  4:00 UTC (permalink / raw)
  To: Baolu Lu, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 11:25 AM
> 
> On 2023/8/2 15:46, Tian, Kevin wrote:
> >> From: Liu, Yi L <yi.l.liu@intel.com>
> >> Sent: Monday, July 24, 2023 7:14 PM
> >>
> >> +
> >> +		spin_lock_irqsave(&dmar_domain->lock, flags);
> >> +		list_for_each_entry(info, &dmar_domain->devices, link)
> >> +			intel_nested_invalidate(info->dev, dmar_domain,
> >> +						req->addr, req->npages);
> >> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> >
> > Disabling interrupt while invalidating iotlb is certainly unacceptable.
> >
> > Actually there is no need to walk devices. Under dmar_domain there
> > is already a list of attached iommu's.
> 
> Walking device is only necessary when invalidating device TLB. For iotlb
> invalidation, it only needs to know the iommu's.
> 

Even for the device tlb we may consider whether there is a better way
to avoid disabling interrupts. It's a slow path, especially in a guest.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings
  2023-08-03  3:27     ` Baolu Lu
@ 2023-08-03  4:00       ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-03  4:00 UTC (permalink / raw)
  To: Baolu Lu, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 11:27 AM
> 
> On 2023/8/2 15:59, Tian, Kevin wrote:
> >> From: Liu, Yi L <yi.l.liu@intel.com>
> >> Sent: Monday, July 24, 2023 7:14 PM
> >>
> >> +	u8 read_only_mapped:1;		/* domain has mappings with
> >> read-only
> >> +					 * permission.
> >> +					 */
> >> +	u8 set_nested:1;		/* has other domains nested on it */
> >
> > what about "is_parent"?
> 
> "is_parent" is still a bit generic. How about "is_nested_parent"?
> 

looks good.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-03  4:00       ` Tian, Kevin
@ 2023-08-03  4:05         ` Baolu Lu
  2023-08-03  4:13           ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  4:05 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 2023/8/3 12:00, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Thursday, August 3, 2023 11:25 AM
>>
>> On 2023/8/2 15:46, Tian, Kevin wrote:
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Monday, July 24, 2023 7:14 PM
>>>>
>>>> +
>>>> +		spin_lock_irqsave(&dmar_domain->lock, flags);
>>>> +		list_for_each_entry(info, &dmar_domain->devices, link)
>>>> +			intel_nested_invalidate(info->dev, dmar_domain,
>>>> +						req->addr, req->npages);
>>>> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
>>>
>>> Disabling interrupt while invalidating iotlb is certainly unacceptable.
>>>
>>> Actually there is no need to walk devices. Under dmar_domain there
>>> is already a list of attached iommu's.
>>
>> Walking device is only necessary when invalidating device TLB. For iotlb
>> invalidation, it only needs to know the iommu's.
>>
> 
> even for device tlb we may think whether there is any better way
> to avoid disabling interrupt. It's a slow path, especially in a guest.

I tried this before. But some device drivers call iommu_unmap() in the
interrupt critical path. :-( So we have a long way to go.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-03  4:05         ` Baolu Lu
@ 2023-08-03  4:13           ` Tian, Kevin
  2023-08-03  7:36             ` Baolu Lu
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-03  4:13 UTC (permalink / raw)
  To: Baolu Lu, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 12:06 PM
> 
> On 2023/8/3 12:00, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Thursday, August 3, 2023 11:25 AM
> >>
> >> On 2023/8/2 15:46, Tian, Kevin wrote:
> >>>> From: Liu, Yi L <yi.l.liu@intel.com>
> >>>> Sent: Monday, July 24, 2023 7:14 PM
> >>>>
> >>>> +
> >>>> +		spin_lock_irqsave(&dmar_domain->lock, flags);
> >>>> +		list_for_each_entry(info, &dmar_domain->devices, link)
> >>>> +			intel_nested_invalidate(info->dev, dmar_domain,
> >>>> +						req->addr, req->npages);
> >>>> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> >>>
> >>> Disabling interrupt while invalidating iotlb is certainly unacceptable.
> >>>
> >>> Actually there is no need to walk devices. Under dmar_domain there
> >>> is already a list of attached iommu's.
> >>
> >> Walking device is only necessary when invalidating device TLB. For iotlb
> >> invalidation, it only needs to know the iommu's.
> >>
> >
> > even for device tlb we may think whether there is any better way
> > to avoid disabling interrupt. It's a slow path, especially in a guest.
> 
> I ever tried this. But some device drivers call iommu_unmap() in the
> interrupt critical path. :-( So we have a long way to go.
> 

emmm... this path only comes from iommufd and the domain is
user-managed. There won't be any kernel drivers calling iommu_unmap()
on such a domain.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-03  4:13           ` Tian, Kevin
@ 2023-08-03  7:36             ` Baolu Lu
  0 siblings, 0 replies; 75+ messages in thread
From: Baolu Lu @ 2023-08-03  7:36 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On 2023/8/3 12:13, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Thursday, August 3, 2023 12:06 PM
>>
>> On 2023/8/3 12:00, Tian, Kevin wrote:
>>>> From: Baolu Lu <baolu.lu@linux.intel.com>
>>>> Sent: Thursday, August 3, 2023 11:25 AM
>>>>
>>>> On 2023/8/2 15:46, Tian, Kevin wrote:
>>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Sent: Monday, July 24, 2023 7:14 PM
>>>>>>
>>>>>> +
>>>>>> +		spin_lock_irqsave(&dmar_domain->lock, flags);
>>>>>> +		list_for_each_entry(info, &dmar_domain->devices, link)
>>>>>> +			intel_nested_invalidate(info->dev, dmar_domain,
>>>>>> +						req->addr, req->npages);
>>>>>> +		spin_unlock_irqrestore(&dmar_domain->lock, flags);
>>>>>
>>>>> Disabling interrupt while invalidating iotlb is certainly unacceptable.
>>>>>
>>>>> Actually there is no need to walk devices. Under dmar_domain there
>>>>> is already a list of attached iommu's.
>>>>
>>>> Walking device is only necessary when invalidating device TLB. For iotlb
>>>> invalidation, it only needs to know the iommu's.
>>>>
>>>
>>> even for device tlb we may think whether there is any better way
>>> to avoid disabling interrupt. It's a slow path, especially in a guest.
>>
>> I ever tried this. But some device drivers call iommu_unmap() in the
>> interrupt critical path. :-( So we have a long way to go.
>>
> 
> emmm... this path only comes from iommufd and the domain is
> user-managed. There won't be kernel drivers to call iommu_unmap()
> on such domain.

Probably we can use a different lock for the nested domain and add a
comment around the lock with the above explanation.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-08-03  0:38       ` Tian, Kevin
@ 2023-08-04 13:04         ` Liu, Yi L
  2023-08-04 14:03           ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Liu, Yi L @ 2023-08-04 13:04 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, August 3, 2023 8:39 AM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, August 2, 2023 9:48 PM
> >
> > On Wed, Aug 02, 2023 at 07:41:05AM +0000, Tian, Kevin wrote:
> > > > +/**
> > > > + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation
> > > > + *                                       (IOMMU_HWPT_TYPE_VTD_S1)
> > > > + * @flags: Must be 0
> > > > + * @entry_size: Size in bytes of each cache invalidation request
> > > > + * @entry_nr_uptr: User pointer to the number of invalidation requests.
> > > > + *                 Kernel reads it to get the number of requests and
> > > > + *                 updates the buffer with the number of requests that
> > > > + *                 have been processed successfully. This pointer must
> > > > + *                 point to a __u32 type of memory location.
> > > > + * @inv_data_uptr: Pointer to the cache invalidation requests
> > > > + *
> > > > + * The Intel VT-d specific invalidation data for a set of cache invalidation
> > > > + * requests. Kernel loops the requests one-by-one and stops when
> > failure
> > > > + * is encountered. The number of handled requests is reported to user
> > by
> > > > + * writing the buffer pointed by @entry_nr_uptr.
> > > > + */
> > > > +struct iommu_hwpt_vtd_s1_invalidate {
> > > > +	__u32 flags;
> > > > +	__u32 entry_size;
> > > > +	__aligned_u64 entry_nr_uptr;
> > > > +	__aligned_u64 inv_data_uptr;
> > > > +};
> > > > +
> > >
> > > I wonder whether this array can be defined directly in the common
> > > struct iommu_hwpt_invalidate so there is no need for underlying
> > > iommu driver to further deal with user buffers, including various
> > > minsz/backward compat. check.
> >
> > You want to have an array and another chunk of data?
> >
> > What is the array for? To do batching?
> 
> yes, it's for batching
> 
> >
> > It means we have to allocate memory on this path, that doesn't seem
> > like the right direction for a performance improvement..
> 
> It reuses the ucmd_buffer to avoid memory allocation:

I guess your point is to copy each invalidation descriptor in the common
layer and pass the descriptor to the iommu driver, right?

> @@ -485,6 +485,12 @@ union ucmd_buffer {
>  #ifdef CONFIG_IOMMUFD_TEST
>  	struct iommu_test_cmd test;
>  #endif
> +	/*
> +	 * hwpt_type specific structure used in the cache invalidation
> +	 * path.
> +	 */
> +	struct iommu_hwpt_vtd_s1_invalidate vtd;
> +	struct iommu_hwpt_vtd_s1_invalidate_desc req_vtd;
>  };
> 
> I don't quite like this way.

This is because each descriptor is stored in the ucmd_buffer. So we
need to put the struct iommu_hwpt_vtd_s1_invalidate_desc here.

> >
> > Having the driver copy in a loop might be better
> >
> 
> Can you elaborate?

I think Jason means the way in patch 09.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-08-04 13:04         ` Liu, Yi L
@ 2023-08-04 14:03           ` Jason Gunthorpe
  2023-08-07 14:02             ` Liu, Yi L
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-04 14:03 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, joro, alex.williamson, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Aug 04, 2023 at 01:04:57PM +0000, Liu, Yi L wrote:
> > > Having the driver copy in a loop might be better
> > >
> > 
> > Can you elaborate?
> 
> I think Jason means the way in patch 09.

Yeah, you can't reuse the stack buffer for an array, so patch 9 copies
each element uniquely.

This is more calls to copy_to_user, which has some cost

But we avoid a memory allocation

Patch 9 should not abuse the user_data, cast it to the inv_info and
just put req on the stack:

	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
	struct iommu_hwpt_vtd_s1_invalidate_desc req;

But I'm not sure about this entry_size logic, what happens if the
entry_size is larger than the kernel supports? I think it should
fail..

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  2023-08-04 14:03           ` Jason Gunthorpe
@ 2023-08-07 14:02             ` Liu, Yi L
  0 siblings, 0 replies; 75+ messages in thread
From: Liu, Yi L @ 2023-08-07 14:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, joro, alex.williamson, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, August 4, 2023 10:04 PM
> 
> On Fri, Aug 04, 2023 at 01:04:57PM +0000, Liu, Yi L wrote:
> > > > Having the driver copy in a loop might be better
> > > >
> > >
> > > Can you elaborate?
> >
> > I think Jason means the way in patch 09.
> 
> Yeah, you can't reuse the stack buffer for an array, so patch 9 copies
> each element uniquely.
> 
> This is more calls to copy_to_user, which has some cost
> 
> But we avoid a memory allocation

Yes.

> Patch 9 should not abuse the user_data, cast it to the inv_info and
> just put req on the stack:
> 
> 	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> 	struct iommu_hwpt_vtd_s1_invalidate_desc req;

Sure. The way in patch 09 is a bit tricky. The above is better and clearer. 😊

> But I'm not sure about this entry_size logic, what happens if the
> entry_size is larger than the kernel supports? I think it should
> fail..

Yes, it should fail. It will fail in copy_struct_from_user() as I use
it to copy the struct iommu_hwpt_vtd_s1_invalidate_desc.

* -E2BIG:  (@usize > @ksize) and there are non-zero trailing bytes in @src.
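
For reference, here is a userspace sketch of the size semantics that
copy_struct_from_user() applies (modeled loosely on the kernel helper in
include/linux/uaccess.h; the function and demo names below are illustrative
stand-ins, not the kernel API):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the kernel's copy_struct_from_user() size logic: when the
 * user struct is larger than the kernel one (@usize > @ksize), any
 * non-zero trailing byte means the kernel would silently drop data,
 * so the copy is rejected with -E2BIG.
 */
static int copy_struct_size_check(void *dst, size_t ksize,
				  const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;

	if (usize > ksize) {
		const unsigned char *p = (const unsigned char *)src + ksize;
		size_t rest = usize - ksize;

		while (rest--) {
			if (*p++)
				return -E2BIG;
		}
	}
	memset(dst, 0, ksize);	/* fields newer than @usize default to zero */
	memcpy(dst, src, size);
	return 0;
}

/* Trailing bytes all zero: the oversized user struct is accepted. */
static int demo_accept_zero_tail(void)
{
	unsigned char user[8] = { 1, 2, 3, 4, 0, 0, 0, 0 };
	unsigned char kbuf[4];
	int ret = copy_struct_size_check(kbuf, sizeof(kbuf), user, sizeof(user));

	return ret == 0 && kbuf[0] == 1 && kbuf[3] == 4;
}

/* A non-zero trailing byte: the copy must fail with -E2BIG. */
static int demo_reject_nonzero_tail(void)
{
	unsigned char user[8] = { 1, 2, 3, 4, 0, 0, 0, 9 };
	unsigned char kbuf[4];

	return copy_struct_size_check(kbuf, sizeof(kbuf), user, sizeof(user));
}
```

So an entry_size larger than the kernel's struct only succeeds when the
extra bytes are zero, which is exactly the backward-compatible failure
mode wanted here.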

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-02  7:46   ` Tian, Kevin
  2023-08-03  3:24     ` Baolu Lu
@ 2023-08-07 15:08     ` Liu, Yi L
  2023-08-08  3:12       ` Nicolin Chen
  1 sibling, 1 reply; 75+ messages in thread
From: Liu, Yi L @ 2023-08-07 15:08 UTC (permalink / raw)
  To: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, August 2, 2023 3:47 PM
>
> Subject: RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, July 24, 2023 7:14 PM
> >
> > +static int intel_nested_cache_invalidate_user(struct iommu_domain
> > *domain,
> > +					      void *user_data)
> > +{
> > +	struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > +	struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	unsigned int entry_size = inv_info->entry_size;
> > +	u64 uptr = inv_info->inv_data_uptr;
> > +	u64 nr_uptr = inv_info->entry_nr_uptr;
> > +	struct device_domain_info *info;
> > +	u32 entry_nr, index;
> > +	unsigned long flags;
> > +	int ret = 0;
> > +
> > +	if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > +		return -EFAULT;
> > +
> > +	for (index = 0; index < entry_nr; index++) {
> > +		ret = copy_struct_from_user(req, sizeof(*req),
> > +					    u64_to_user_ptr(uptr + index *
> > entry_size),
> > +					    entry_size);
> 
> If continuing this direction then the driver should also check minsz etc.
> for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
> since they are uAPI and subject to change.

Then we need to define a size field in the uapi data structure, copy the
size first, and check minsz before going forward. How about the structures
for hwpt alloc, like struct iommu_hwpt_vtd_s1? Should we check minsz for
them as well?
 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-07 15:08     ` Liu, Yi L
@ 2023-08-08  3:12       ` Nicolin Chen
  2023-08-08 12:34         ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-08  3:12 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, July 24, 2023 7:14 PM
> > >
> > > +static int intel_nested_cache_invalidate_user(struct iommu_domain
> > > *domain,
> > > +                                         void *user_data)
> > > +{
> > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > +   unsigned int entry_size = inv_info->entry_size;
> > > +   u64 uptr = inv_info->inv_data_uptr;
> > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > +   struct device_domain_info *info;
> > > +   u32 entry_nr, index;
> > > +   unsigned long flags;
> > > +   int ret = 0;
> > > +
> > > +   if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > > +           return -EFAULT;
> > > +
> > > +   for (index = 0; index < entry_nr; index++) {
> > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > +                                       u64_to_user_ptr(uptr + index *
> > > entry_size),
> > > +                                       entry_size);
> >
> > If continuing this direction then the driver should also check minsz etc.
> > for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
> > since they are uAPI and subject to change.
> 
> Then needs to define size in the uapi data structure, and copy size first and
> check minsz before going forward. How about the structures for hwpt alloc
> like struct iommu_hwpt_vtd_s1? Should check minsz for them as well?

Assuming that every uAPI data structure needs a min_size, we can
either add a structure holding all min_sizes like iommufd main.c
or have another xx_min_len in iommu_/domain_ops.

Currently, we have the union of structures in iommu.h and also a
xx_data_len in iommu_ops. So, neither of the two places seem to
be optimal from my p.o.v... any suggestion?

Also, alloc allows data_len=0, so a min_size check will only be
applied to the data_len > 0 case.

Thanks
Nic

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-08  3:12       ` Nicolin Chen
@ 2023-08-08 12:34         ` Jason Gunthorpe
  2023-08-08 17:41           ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-08 12:34 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Monday, July 24, 2023 7:14 PM
> > > >
> > > > +static int intel_nested_cache_invalidate_user(struct iommu_domain
> > > > *domain,
> > > > +                                         void *user_data)
> > > > +{
> > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > +   struct device_domain_info *info;
> > > > +   u32 entry_nr, index;
> > > > +   unsigned long flags;
> > > > +   int ret = 0;
> > > > +
> > > > +   if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > > > +           return -EFAULT;
> > > > +
> > > > +   for (index = 0; index < entry_nr; index++) {
> > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > +                                       u64_to_user_ptr(uptr + index *
> > > > entry_size),
> > > > +                                       entry_size);
> > >
> > > If continuing this direction then the driver should also check minsz etc.
> > > for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
> > > since they are uAPI and subject to change.
> > 
> > Then needs to define size in the uapi data structure, and copy size first and
> > check minsz before going forward. How about the structures for hwpt alloc
> > like struct iommu_hwpt_vtd_s1? Should check minsz for them as well?
> 
> Assuming that every uAPI data structure needs a min_size, we can
> either add a structure holding all min_sizes like iommufd main.c
> or have another xx_min_len in iommu_/domain_ops.

If the driver is doing the copy it is OK that the driver does the
min_size check too

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-08 12:34         ` Jason Gunthorpe
@ 2023-08-08 17:41           ` Nicolin Chen
  2023-08-09  8:22             ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-08 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Tue, Aug 08, 2023 at 09:34:03AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> > On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Monday, July 24, 2023 7:14 PM
> > > > >
> > > > > +static int intel_nested_cache_invalidate_user(struct iommu_domain
> > > > > *domain,
> > > > > +                                         void *user_data)
> > > > > +{
> > > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > > +   struct device_domain_info *info;
> > > > > +   u32 entry_nr, index;
> > > > > +   unsigned long flags;
> > > > > +   int ret = 0;
> > > > > +
> > > > > +   if (get_user(entry_nr, (uint32_t __user *)u64_to_user_ptr(nr_uptr)))
> > > > > +           return -EFAULT;
> > > > > +
> > > > > +   for (index = 0; index < entry_nr; index++) {
> > > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > > +                                       u64_to_user_ptr(uptr + index *
> > > > > entry_size),
> > > > > +                                       entry_size);
> > > >
> > > > If continuing this direction then the driver should also check minsz etc.
> > > > for struct iommu_hwpt_vtd_s1_invalidate and iommu_hwpt_vtd_s1_invalidate_desc
> > > > since they are uAPI and subject to change.
> > > 
> > > Then needs to define size in the uapi data structure, and copy size first and
> > > check minsz before going forward. How about the structures for hwpt alloc
> > > like struct iommu_hwpt_vtd_s1? Should check minsz for them as well?
> > 
> > Assuming that every uAPI data structure needs a min_size, we can
> > either add a structure holding all min_sizes like iommufd main.c
> > or have another xx_min_len in iommu_/domain_ops.
> 
> If driver is doing the copy it is OK that driver does the min_size
> check too

Ah, just realized my reply above was missing some context...

Yi and I are having a concern that the core iommu_hpwt_alloc()
and iommu_hwpt_cache_invalidate(), in the nesting series, copy
data without checking the min_sizes. So, we are trying to add
the missing piece into the next version but not sure which way
could be optimal.

It probably makes sense to add cache_invalidate_user_min_len
next to the existing cache_invalidate_user_data_len. For the
iommu_hwpt_alloc, we are missing a data_len, as the core just
uses sizeof(union iommu_domain_user_data) when calling the
copy_struct_from_user().

Perhaps we could add two pairs of data_len/min_len in the ops
structs:
	// iommu_ops
	const size_t domain_alloc_user_data_len; // for sanity&copy
	const size_t domain_alloc_user_min_len; // for sanity only
	// iommu_domain_ops
	const size_t cache_invalidate_user_data_len; // for sanity&copy
	const size_t cache_invalidate_user_min_len; // for sanity only

Thanks
Nic
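
The proposed pairs could be wired up roughly as below. All names here are
hypothetical sketches of the proposal, not fields that exist in the kernel's
iommu_ops/iommu_domain_ops:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Sketch of the proposed per-driver length pairs: data_len is the full
 * size the core copies, min_len is the smallest size an older userspace
 * may legally pass.
 */
struct iommu_ops_sketch {
	size_t domain_alloc_user_data_len;	/* for sanity & copy */
	size_t domain_alloc_user_min_len;	/* for sanity only */
};

/*
 * Core-side sanity check before copy_struct_from_user(): a zero length
 * means "no user data" and is allowed; anything shorter than min_len
 * cannot be a valid (even old) struct.
 */
static int check_user_data_len(const struct iommu_ops_sketch *ops,
			       size_t user_len)
{
	if (user_len && user_len < ops->domain_alloc_user_min_len)
		return -EINVAL;
	return 0;
}

static int demo_len_checks(void)
{
	struct iommu_ops_sketch ops = {
		.domain_alloc_user_data_len = 32,
		.domain_alloc_user_min_len = 16,
	};

	return check_user_data_len(&ops, 8) == -EINVAL &&
	       check_user_data_len(&ops, 0) == 0 &&
	       check_user_data_len(&ops, 24) == 0;
}
```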

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-08 17:41           ` Nicolin Chen
@ 2023-08-09  8:22             ` Tian, Kevin
  2023-08-09  8:50               ` Liu, Yi L
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-09  8:22 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, August 9, 2023 1:42 AM
> 
> On Tue, Aug 08, 2023 at 09:34:03AM -0300, Jason Gunthorpe wrote:
> > On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> > > On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Monday, July 24, 2023 7:14 PM
> > > > > >
> > > > > > +static int intel_nested_cache_invalidate_user(struct
> iommu_domain
> > > > > > *domain,
> > > > > > +                                         void *user_data)
> > > > > > +{
> > > > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > > > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > > > +   struct device_domain_info *info;
> > > > > > +   u32 entry_nr, index;
> > > > > > +   unsigned long flags;
> > > > > > +   int ret = 0;
> > > > > > +
> > > > > > +   if (get_user(entry_nr, (uint32_t __user
> *)u64_to_user_ptr(nr_uptr)))
> > > > > > +           return -EFAULT;
> > > > > > +
> > > > > > +   for (index = 0; index < entry_nr; index++) {
> > > > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > > > +                                       u64_to_user_ptr(uptr + index *
> > > > > > entry_size),
> > > > > > +                                       entry_size);
> > > > >
> > > > > If continuing this direction then the driver should also check minsz etc.
> > > > > for struct iommu_hwpt_vtd_s1_invalidate and
> iommu_hwpt_vtd_s1_invalidate_desc
> > > > > since they are uAPI and subject to change.
> > > >
> > > > Then needs to define size in the uapi data structure, and copy size first
> and
> > > > check minsz before going forward. How about the structures for hwpt
> alloc
> > > > like struct iommu_hwpt_vtd_s1? Should check minsz for them as well?
> > >
> > > Assuming that every uAPI data structure needs a min_size, we can
> > > either add a structure holding all min_sizes like iommufd main.c
> > > or have another xx_min_len in iommu_/domain_ops.
> >
> > If driver is doing the copy it is OK that driver does the min_size
> > check too
> 
> Ah, just realized my reply above was missing a context..
> 
> Yi and I are having a concern that the core iommu_hpwt_alloc()
> and iommu_hwpt_cache_invalidate(), in the nesting series, copy
> data without checking the min_sizes. So, we are trying to add
> the missing piece into the next version but not sure which way
> could be optimal.
> 
> It probably makes sense to add cache_invalidate_user_min_len
> next to the existing cache_invalidate_user_data_len. For the
> iommu_hwpt_alloc, we are missing a data_len, as the core just
> uses sizeof(union iommu_domain_user_data) when calling the
> copy_struct_from_user().
> 
> Perhaps we could add two pairs of data_len/min_len in the ops
> structs:
> 	// iommu_ops
> 	const size_t domain_alloc_user_data_len; // for sanity&copy
> 	const size_t domain_alloc_user_min_len; // for sanity only
> 	// iommu_domain_ops
> 	const size_t cache_invalidate_user_data_len; // for sanity&copy
> 	const size_t cache_invalidate_user_min_len; // for sanity only
> 

What about creating a simple array to track type specific len in
iommufd instead of adding more fields to iommu/domain_ops?
anyway it's iommufd doing the copy and all the type specific
structures are already defined in the uapi header.

and a similar example already exists in union ucmd_buffer which
includes type specific structures to avoid memory copy...


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09  8:22             ` Tian, Kevin
@ 2023-08-09  8:50               ` Liu, Yi L
  2023-08-09  8:58                 ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Liu, Yi L @ 2023-08-09  8:50 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, August 9, 2023 4:23 PM
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, August 9, 2023 1:42 AM
> >
> > On Tue, Aug 08, 2023 at 09:34:03AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> > > > On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > Sent: Monday, July 24, 2023 7:14 PM
> > > > > > >
> > > > > > > +static int intel_nested_cache_invalidate_user(struct
> > iommu_domain
> > > > > > > *domain,
> > > > > > > +                                         void *user_data)
> > > > > > > +{
> > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > > > > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > > > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > > > > +   struct device_domain_info *info;
> > > > > > > +   u32 entry_nr, index;
> > > > > > > +   unsigned long flags;
> > > > > > > +   int ret = 0;
> > > > > > > +
> > > > > > > +   if (get_user(entry_nr, (uint32_t __user
> > *)u64_to_user_ptr(nr_uptr)))
> > > > > > > +           return -EFAULT;
> > > > > > > +
> > > > > > > +   for (index = 0; index < entry_nr; index++) {
> > > > > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > > > > +                                       u64_to_user_ptr(uptr + index *
> > > > > > > entry_size),
> > > > > > > +                                       entry_size);
> > > > > >
> > > > > > If continuing this direction then the driver should also check minsz etc.
> > > > > > for struct iommu_hwpt_vtd_s1_invalidate and
> > iommu_hwpt_vtd_s1_invalidate_desc
> > > > > > since they are uAPI and subject to change.
> > > > >
> > > > > Then needs to define size in the uapi data structure, and copy size first
> > and
> > > > > check minsz before going forward. How about the structures for hwpt
> > alloc
> > > > > like struct iommu_hwpt_vtd_s1? Should check minsz for them as well?
> > > >
> > > > Assuming that every uAPI data structure needs a min_size, we can
> > > > either add a structure holding all min_sizes like iommufd main.c
> > > > or have another xx_min_len in iommu_/domain_ops.
> > >
> > > If driver is doing the copy it is OK that driver does the min_size
> > > check too
> >
> > Ah, just realized my reply above was missing a context..
> >
> > Yi and I are having a concern that the core iommu_hpwt_alloc()
> > and iommu_hwpt_cache_invalidate(), in the nesting series, copy
> > data without checking the min_sizes. So, we are trying to add
> > the missing piece into the next version but not sure which way
> > could be optimal.
> >
> > It probably makes sense to add cache_invalidate_user_min_len
> > next to the existing cache_invalidate_user_data_len. For the
> > iommu_hwpt_alloc, we are missing a data_len, as the core just
> > uses sizeof(union iommu_domain_user_data) when calling the
> > copy_struct_from_user().
> >
> > Perhaps we could add two pairs of data_len/min_len in the ops
> > structs:
> > 	// iommu_ops
> > 	const size_t domain_alloc_user_data_len; // for sanity&copy
> > 	const size_t domain_alloc_user_min_len; // for sanity only
> > 	// iommu_domain_ops
> > 	const size_t cache_invalidate_user_data_len; // for sanity&copy
> > 	const size_t cache_invalidate_user_min_len; // for sanity only
> >
> 
> What about creating a simple array to track type specific len in
> iommufd instead of adding more fields to iommu/domain_ops?
> anyway it's iommufd doing the copy and all the type specific
> structures are already defined in the uapi header.

Then index the array with the type value, is that it? We defined such an
array before for the lengths of the hwpt_alloc and invalidate structures,
but finally dropped it since the array may grow large as new types are added.

> 
> and a similar example already exists in union ucmd_buffer which
> includes type specific structures to avoid memory copy...

I don't quite get it here. ucmd_buffer is a union used to copy any user
data, but here we want to check the minsz of the user data. They seem
unrelated.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09  8:50               ` Liu, Yi L
@ 2023-08-09  8:58                 ` Tian, Kevin
  2023-08-09  9:30                   ` Liu, Yi L
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-09  8:58 UTC (permalink / raw)
  To: Liu, Yi L, Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, August 9, 2023 4:50 PM
> 
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, August 9, 2023 4:23 PM
> >
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, August 9, 2023 1:42 AM
> > >
> > > On Tue, Aug 08, 2023 at 09:34:03AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> > > > > On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > > Sent: Monday, July 24, 2023 7:14 PM
> > > > > > > >
> > > > > > > > +static int intel_nested_cache_invalidate_user(struct
> > > iommu_domain
> > > > > > > > *domain,
> > > > > > > > +                                         void *user_data)
> > > > > > > > +{
> > > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > > > > > +   struct dmar_domain *dmar_domain =
> to_dmar_domain(domain);
> > > > > > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > > > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > > > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > > > > > +   struct device_domain_info *info;
> > > > > > > > +   u32 entry_nr, index;
> > > > > > > > +   unsigned long flags;
> > > > > > > > +   int ret = 0;
> > > > > > > > +
> > > > > > > > +   if (get_user(entry_nr, (uint32_t __user
> > > *)u64_to_user_ptr(nr_uptr)))
> > > > > > > > +           return -EFAULT;
> > > > > > > > +
> > > > > > > > +   for (index = 0; index < entry_nr; index++) {
> > > > > > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > > > > > +                                       u64_to_user_ptr(uptr + index *
> > > > > > > > entry_size),
> > > > > > > > +                                       entry_size);
> > > > > > >
> > > > > > > If continuing this direction then the driver should also check minsz
> etc.
> > > > > > > for struct iommu_hwpt_vtd_s1_invalidate and
> > > iommu_hwpt_vtd_s1_invalidate_desc
> > > > > > > since they are uAPI and subject to change.
> > > > > >
> > > > > > Then needs to define size in the uapi data structure, and copy size
> first
> > > and
> > > > > > check minsz before going forward. How about the structures for
> hwpt
> > > alloc
> > > > > > like struct iommu_hwpt_vtd_s1? Should check minsz for them as
> well?
> > > > >
> > > > > Assuming that every uAPI data structure needs a min_size, we can
> > > > > either add a structure holding all min_sizes like iommufd main.c
> > > > > or have another xx_min_len in iommu_/domain_ops.
> > > >
> > > > If driver is doing the copy it is OK that driver does the min_size
> > > > check too
> > >
> > > Ah, just realized my reply above was missing a context..
> > >
> > > Yi and I are having a concern that the core iommu_hpwt_alloc()
> > > and iommu_hwpt_cache_invalidate(), in the nesting series, copy
> > > data without checking the min_sizes. So, we are trying to add
> > > the missing piece into the next version but not sure which way
> > > could be optimal.
> > >
> > > It probably makes sense to add cache_invalidate_user_min_len
> > > next to the existing cache_invalidate_user_data_len. For the
> > > iommu_hwpt_alloc, we are missing a data_len, as the core just
> > > uses sizeof(union iommu_domain_user_data) when calling the
> > > copy_struct_from_user().
> > >
> > > Perhaps we could add two pairs of data_len/min_len in the ops
> > > structs:
> > > 	// iommu_ops
> > > 	const size_t domain_alloc_user_data_len; // for sanity&copy
> > > 	const size_t domain_alloc_user_min_len; // for sanity only
> > > 	// iommu_domain_ops
> > > 	const size_t cache_invalidate_user_data_len; // for sanity&copy
> > > 	const size_t cache_invalidate_user_min_len; // for sanity only
> > >
> >
> > What about creating a simple array to track type specific len in
> > iommufd instead of adding more fields to iommu/domain_ops?
> > anyway it's iommufd doing the copy and all the type specific
> > structures are already defined in the uapi header.
> 
> Then index the array with type value, is it? Seems like we have defined
> such array before for the length of hwpt_alloc and invalidate structures.
> but finally we dropped it the array may grow largely per new types.

I'm not sure how many types iommufd will support in reality.

It's just my personal feeling that keeping this information contained in
iommufd is cleaner than expanding the iommu core abstraction to assist
iommufd with user buffer copy/verification.

> 
> >
> > and a similar example already exists in union ucmd_buffer which
> > includes type specific structures to avoid memory copy...
> 
> Not quite get here. ucmd_buffer is a union used to copy any user
> data. But here we want to check the minsz of the the user data.
> Seems not related.
> 

That's an example of recording vendor-specific structure information in
iommufd, and it will also grow as new types are added.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09  8:58                 ` Tian, Kevin
@ 2023-08-09  9:30                   ` Liu, Yi L
  2023-08-09 16:24                     ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Liu, Yi L @ 2023-08-09  9:30 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, August 9, 2023 4:58 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, August 9, 2023 4:50 PM
> >
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Wednesday, August 9, 2023 4:23 PM
> > >
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, August 9, 2023 1:42 AM
> > > >
> > > > On Tue, Aug 08, 2023 at 09:34:03AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, Aug 07, 2023 at 08:12:37PM -0700, Nicolin Chen wrote:
> > > > > > On Mon, Aug 07, 2023 at 03:08:29PM +0000, Liu, Yi L wrote:
> > > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > > > Sent: Monday, July 24, 2023 7:14 PM
> > > > > > > > >
> > > > > > > > > +static int intel_nested_cache_invalidate_user(struct
> > > > iommu_domain
> > > > > > > > > *domain,
> > > > > > > > > +                                         void *user_data)
> > > > > > > > > +{
> > > > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate_desc *req = user_data;
> > > > > > > > > +   struct iommu_hwpt_vtd_s1_invalidate *inv_info = user_data;
> > > > > > > > > +   struct dmar_domain *dmar_domain =
> > to_dmar_domain(domain);
> > > > > > > > > +   unsigned int entry_size = inv_info->entry_size;
> > > > > > > > > +   u64 uptr = inv_info->inv_data_uptr;
> > > > > > > > > +   u64 nr_uptr = inv_info->entry_nr_uptr;
> > > > > > > > > +   struct device_domain_info *info;
> > > > > > > > > +   u32 entry_nr, index;
> > > > > > > > > +   unsigned long flags;
> > > > > > > > > +   int ret = 0;
> > > > > > > > > +
> > > > > > > > > +   if (get_user(entry_nr, (uint32_t __user
> > > > *)u64_to_user_ptr(nr_uptr)))
> > > > > > > > > +           return -EFAULT;
> > > > > > > > > +
> > > > > > > > > +   for (index = 0; index < entry_nr; index++) {
> > > > > > > > > +           ret = copy_struct_from_user(req, sizeof(*req),
> > > > > > > > > +                                       u64_to_user_ptr(uptr + index *
> > > > > > > > > entry_size),
> > > > > > > > > +                                       entry_size);
> > > > > > > >
> > > > > > > > If continuing this direction then the driver should also check minsz
> > etc.
> > > > > > > > for struct iommu_hwpt_vtd_s1_invalidate and
> > > > iommu_hwpt_vtd_s1_invalidate_desc
> > > > > > > > since they are uAPI and subject to change.
> > > > > > >
> > > > > > > Then needs to define size in the uapi data structure, and copy size
> > first
> > > > and
> > > > > > > check minsz before going forward. How about the structures for
> > hwpt
> > > > alloc
> > > > > > > like struct iommu_hwpt_vtd_s1? Should check minsz for them as
> > well?
> > > > > >
> > > > > > Assuming that every uAPI data structure needs a min_size, we can
> > > > > > either add a structure holding all min_sizes like iommufd main.c
> > > > > > or have another xx_min_len in iommu_/domain_ops.
> > > > >
> > > > > If driver is doing the copy it is OK that driver does the min_size
> > > > > check too
> > > >
> > > > Ah, just realized my reply above was missing a context..
> > > >
> > > > Yi and I are having a concern that the core iommu_hpwt_alloc()
> > > > and iommu_hwpt_cache_invalidate(), in the nesting series, copy
> > > > data without checking the min_sizes. So, we are trying to add
> > > > the missing piece into the next version but not sure which way
> > > > could be optimal.
> > > >
> > > > It probably makes sense to add cache_invalidate_user_min_len
> > > > next to the existing cache_invalidate_user_data_len. For the
> > > > iommu_hwpt_alloc, we are missing a data_len, as the core just
> > > > uses sizeof(union iommu_domain_user_data) when calling the
> > > > copy_struct_from_user().
> > > >
> > > > Perhaps we could add two pairs of data_len/min_len in the ops
> > > > structs:
> > > > 	// iommu_ops
> > > > 	const size_t domain_alloc_user_data_len; // for sanity&copy
> > > > 	const size_t domain_alloc_user_min_len; // for sanity only
> > > > 	// iommu_domain_ops
> > > > 	const size_t cache_invalidate_user_data_len; // for sanity&copy
> > > > 	const size_t cache_invalidate_user_min_len; // for sanity only
> > > >
> > >
> > > What about creating a simple array to track type specific len in
> > > iommufd instead of adding more fields to iommu/domain_ops?
> > > anyway it's iommufd doing the copy and all the type specific
> > > structures are already defined in the uapi header.
> >
> > Then index the array with type value, is it? Seems like we have defined
> > such array before for the length of hwpt_alloc and invalidate structures.
> > but finally we dropped it the array may grow largely per new types.
> 
> I'm not sure how many types iommufd will support in reality.
>
> Just my personal feeling that having information contained in iommufd
> is cleaner than expanding iommu core abstraction to assist iommufd
> user buffer copy/verification.
> 
> >
> > >
> > > and a similar example already exists in union ucmd_buffer which
> > > includes type specific structures to avoid memory copy...
> >
> > Not quite get here. ucmd_buffer is a union used to copy any user
> > data. But here we want to check the minsz of the the user data.
> > Seems not related.
> >
> 
> that's the example for recording vendor specific structure information
> in iommufd and it will also grow along with new added types.

Yeah, adding new structures to ucmd_buffer may increase its size as well,
if the new one is larger, while an array is guaranteed to grow with every
new entry. I also remember one tricky thing when handling the selftest
type: it is defined as 0xbadbeef, so using it directly to index the array
would go out of bounds, and we needed some special handling for it.
Defining the lengths in iommu_ops is simpler in that respect, though the
selftest may not be critical to deciding the direction.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09  9:30                   ` Liu, Yi L
@ 2023-08-09 16:24                     ` Jason Gunthorpe
  2023-08-09 19:12                       ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-09 16:24 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Nicolin Chen, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Aug 09, 2023 at 09:30:12AM +0000, Liu, Yi L wrote:

> Yeah, adding new structures to ucmd_buffer may increase the size as
> well if the new one is larger. While for an array, if there is new entry,
> it is for sure to increase the size. I remember there is one tricky thing
> when handling the selftest type. E.g. it is defined as 0xbadbeef, if using
> it to index array, it would expire. So we have some special handling on
> it. If defining the things in iommu_ops, it is simpler. Selftest may be
> not so critical to determining the direction though.

Maybe we are trying too hard to make it "easy" on the driver.

Can't we just have the driver invoke some:

driver_iommufd_invalidate_op(??? *opaque)
{
	struct driver_base_struct args;

        rc = iommufd_get_args(opaque, &args, sizeof(args),
	     offsetof(args, last));
	if (rc)
	   return rc;
}

The whole point of this exercise was to avoid the mistake where
drivers code the uapi checks incorrectly. We can achieve the same
outcome by providing a mandatory helper.

Similarly for managing the array of invalidation commands.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09 16:24                     ` Jason Gunthorpe
@ 2023-08-09 19:12                       ` Nicolin Chen
  2023-08-09 19:19                         ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-09 19:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 09, 2023 at 09:30:12AM +0000, Liu, Yi L wrote:
> 
> > Yeah, adding new structures to ucmd_buffer may increase the size as
> > well if the new one is larger. While for an array, if there is new entry,
> > it is for sure to increase the size. I remember there is one tricky thing
> > when handling the selftest type. E.g. it is defined as 0xbadbeef, if using
> > it to index array, it would expire. So we have some special handling on
> > it. If defining the things in iommu_ops, it is simpler. Selftest may be
> > not so critical to determining the direction though.
> 
> Maybe we are trying too hard to make it "easy" on the driver.
> 
> Can't we just have the driver invoke some:
> 
> driver_iommufd_invalidate_op(??? *opaque)
> {
> 	struct driver_base_struct args;
> 
>         rc = iommufd_get_args(opaque, &args, sizeof(args),
> 	     offsetof(args, last));

OK. So, IIUIC, the opaque should be:

struct iommu_user_data {
	void __user *data_uptr;
	size_t data_len;
}user_data;

And the core does a basic sanity check that data_uptr != NULL and
data_len != 0 before passing this to the driver, then does a full
sanity check during the iommufd_get_args (or iommufd_get_user_data?) call.

> The whole point of this excercise was to avoid the mistake where
> drivers code the uapi checks incorrectly. We can achieve the same
> outcome by providing a mandatory helper.

OK. I will rework this part today.

> Similarly for managing the array of invalidation commands.

You mean an embedded uptr inside a driver user data struct right?
Sure, that should go through the new helper too.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09 19:12                       ` Nicolin Chen
@ 2023-08-09 19:19                         ` Jason Gunthorpe
  2023-08-09 20:17                           ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-09 19:19 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 09, 2023 at 09:30:12AM +0000, Liu, Yi L wrote:
> > 
> > > Yeah, adding new structures to ucmd_buffer may increase the size as
> > > well if the new one is larger. While for an array, if there is new entry,
> > > it is for sure to increase the size. I remember there is one tricky thing
> > > when handling the selftest type. E.g. it is defined as 0xbadbeef, if using
> > > it to index array, it would expire. So we have some special handling on
> > > it. If defining the things in iommu_ops, it is simpler. Selftest may be
> > > not so critical to determining the direction though.
> > 
> > Maybe we are trying too hard to make it "easy" on the driver.
> > 
> > Can't we just have the driver invoke some:
> > 
> > driver_iommufd_invalidate_op(??? *opaque)
> > {
> > 	struct driver_base_struct args;
> > 
> >         rc = iommufd_get_args(opaque, &args, sizeof(args),
> > 	     offsetof(args, last));
> 
> OK. So, IIUIC, the opaque should be:
> 
> struct iommu_user_data {
> 	void __user *data_uptr;
> 	size_t data_len;
> }user_data;
> 
> And core does basic sanity of data_uptr != NULL and data_len !=0
> before passing this to driver, and then do a full sanity during
> the iommufd_get_args (or iommufd_get_user_data?) call.

We don't even need to check data_uptr and data_len; the helper should
check the size, and a NULL pointer is caught by the copy from user.
 
> > Similarly for managing the array of invalidation commands.
> 
> You mean an embedded uptr inside a driver user data struct right?
> Sure, that should go through the new helper too.

If we are committed that all drivers have to process an array then put
the array in the top level struct and pass it in the same user_data
struct and use another helper to allow the driver to iterate through
it.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09 19:19                         ` Jason Gunthorpe
@ 2023-08-09 20:17                           ` Nicolin Chen
  2023-08-10  2:49                             ` Tian, Kevin
  2023-08-10  3:28                             ` Liu, Yi L
  0 siblings, 2 replies; 75+ messages in thread
From: Nicolin Chen @ 2023-08-09 20:17 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: Tian, Kevin, joro, alex.williamson, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 09, 2023 at 09:30:12AM +0000, Liu, Yi L wrote:
> > > 
> > > > Yeah, adding new structures to ucmd_buffer may increase the size as
> > > > well if the new one is larger. While for an array, if there is new entry,
> > > > it is for sure to increase the size. I remember there is one tricky thing
> > > > when handling the selftest type. E.g. it is defined as 0xbadbeef, if using
> > > > it to index array, it would expire. So we have some special handling on
> > > > it. If defining the things in iommu_ops, it is simpler. Selftest may be
> > > > not so critical to determining the direction though.
> > > 
> > > Maybe we are trying too hard to make it "easy" on the driver.
> > > 
> > > Can't we just have the driver invoke some:
> > > 
> > > driver_iommufd_invalidate_op(??? *opaque)
> > > {
> > > 	struct driver_base_struct args;
> > > 
> > >         rc = iommufd_get_args(opaque, &args, sizeof(args),
> > > 	     offsetof(args, last));
> > 
> > OK. So, IIUIC, the opaque should be:
> > 
> > struct iommu_user_data {
> > 	void __user *data_uptr;
> > 	size_t data_len;
> > }user_data;
> > 
> > And core does basic sanity of data_uptr != NULL and data_len !=0
> > before passing this to driver, and then do a full sanity during
> > the iommufd_get_args (or iommufd_get_user_data?) call.
> 
> Don't even need to check datA_uptr and data_len, the helper should
> check the size and null is caught by copy from user

I see. I was worried about the alloc path since its data input is
optional upon IOMMU_DOMAIN_UNMANAGED. But this helper should work
for that also.

In that case, we might not even need to define the union with all
structures, in iommu.h.

> > > Similarly for managing the array of invalidation commands.
> > 
> > You mean an embedded uptr inside a driver user data struct right?
> > Sure, that should go through the new helper too.
> 
> If we are committed that all drivers have to process an array then put
> the array in the top level struct and pass it in the same user_data
> struct and use another helper to allow the driver to iterate through
> it.

I see. Both VTD and SMMU pass a uptr to an array of invalidation
commands/requests. The only difference is that SMMU's array is a
ring buffer rather than a plain one indexed from the beginning.
But the helper could take two index inputs, which should work for
the VTD case too. If another IOMMU driver only supports a single
request, rather than an array of requests, we can treat that as a
single-entry array.

Then, the driver-specific data structure will be the array entry
level only.

@Yi,
This seems to be a bigger rework than the top-level struct. Along
with Jason's request for fail_nth below, we'd need to split the
workload between us, or can just continue each other's daily work.
Let me know which one you prefer.
https://lore.kernel.org/linux-iommu/ZNPCtPTcHvITt6fk@nvidia.com/

Thanks!
Nic

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09 20:17                           ` Nicolin Chen
@ 2023-08-10  2:49                             ` Tian, Kevin
  2023-08-10 15:57                               ` Jason Gunthorpe
  2023-08-10  3:28                             ` Liu, Yi L
  1 sibling, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-10  2:49 UTC (permalink / raw)
  To: Nicolin Chen, Liu, Yi L, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, August 10, 2023 4:17 AM
> 
> On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > > Similarly for managing the array of invalidation commands.
> > >
> > > You mean an embedded uptr inside a driver user data struct right?
> > > Sure, that should go through the new helper too.
> >
> > If we are committed that all drivers have to process an array then put
> > the array in the top level struct and pass it in the same user_data
> > struct and use another helper to allow the driver to iterate through
> > it.
> 
> I see. Both VTD and SMMU pass uptr to the arrays of invalidation
> commands/requests. The only difference is that SMMU's array is a
> ring buffer other than a plain one indexing from the beginning.
> But the helper could take two index inputs, which should work for
> VTD case too. If another IOMMU driver only supports one request,
> rather than a array of requests, we can treat that as a single-
> entry array.
> 

I like this approach.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-09 20:17                           ` Nicolin Chen
  2023-08-10  2:49                             ` Tian, Kevin
@ 2023-08-10  3:28                             ` Liu, Yi L
  2023-08-10  3:33                               ` Nicolin Chen
  1 sibling, 1 reply; 75+ messages in thread
From: Liu, Yi L @ 2023-08-10  3:28 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Tian, Kevin, joro, alex.williamson, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, August 10, 2023 4:17 AM
> 
> On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Aug 09, 2023 at 09:30:12AM +0000, Liu, Yi L wrote:
> > > >
> > > > > Yeah, adding new structures to ucmd_buffer may increase the size as
> > > > > well if the new one is larger. While for an array, adding a new entry
> > > > > is sure to increase the size. I remember there is one tricky thing
> > > > > when handling the selftest type. E.g. it is defined as 0xbadbeef; if
> > > > > we use it to index an array, it would go out of bounds. So we have
> > > > > some special handling on it. If defining the things in iommu_ops, it
> > > > > is simpler. Selftest may not be so critical to determining the
> > > > > direction though.
> > > >
> > > > Maybe we are trying too hard to make it "easy" on the driver.
> > > >
> > > > Can't we just have the driver invoke some:
> > > >
> > > > driver_iommufd_invalidate_op(??? *opaque)
> > > > {
> > > > 	struct driver_base_struct args;
> > > >
> > > >         rc = iommufd_get_args(opaque, &args, sizeof(args),
> > > > 	     offsetof(args, last));
> > >
> > > OK. So, IIUIC, the opaque should be:
> > >
> > > struct iommu_user_data {
> > > 	void __user *data_uptr;
> > > 	size_t data_len;
> > > }user_data;
> > >
> > > And the core does basic sanity checks of data_uptr != NULL and
> > > data_len != 0 before passing this to the driver, and then does a full
> > > sanity check during the iommufd_get_args (or iommufd_get_user_data?)
> > > call.
> >
> > Don't even need to check data_uptr and data_len, the helper should
> > check the size, and NULL is caught by copy_from_user
> 
> I see. I was worried about the alloc path since its data input is
> optional for IOMMU_DOMAIN_UNMANAGED. But this helper should work
> for that also.
> 
> In that case, we might not even need to define the union with all
> the structures in iommu.h.
> 
> > > > Similarly for managing the array of invalidation commands.
> > >
> > > You mean an embedded uptr inside a driver user data struct right?
> > > Sure, that should go through the new helper too.
> >
> > If we are committed that all drivers have to process an array then put
> > the array in the top level struct and pass it in the same user_data
> > struct and use another helper to allow the driver to iterate through
> > it.
> 
> I see. Both VTD and SMMU pass a uptr to the arrays of invalidation
> commands/requests. The only difference is that SMMU's array is a
> ring buffer rather than a plain one indexed from the beginning.
> But the helper could take two index inputs, which should work for
> the VTD case too. If another IOMMU driver only supports one request,
> rather than an array of requests, we can treat that as a single-
> entry array.
> 
> Then, the driver-specific data structure will be the array entry
> level only.
> 
> @Yi,
> This seems to be a bigger rework than the top level struct. Along
> with Jason's request for fail_nth below, we'd need to bisect the
> workload between us, or can just continue each other's daily work.
> Let me know which one you prefer.
> https://lore.kernel.org/linux-iommu/ZNPCtPTcHvITt6fk@nvidia.com/

Let me address the fail_nth request first. You may rework the
iommufd_get_user_data(). If I can finish the fail_nth soon,
I may then help to lift the array to the top level. If not, you
may do it as well. 😊 I guess I need to study fail_nth a bit
as well.

Regards,
Yi Liu
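
To make the helper discussed in this thread concrete: its semantics (a size-bounded copy plus a trailing-zero check, in the spirit of the kernel's copy_struct_from_user()) can be sketched in plain userspace C as below. All names here (iommufd_get_user_data, the struct iommu_user_data fields, the error values) follow the thread's proposal and are assumptions rather than a merged API, and memcpy stands in for copy_from_user():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the uAPI carrier discussed in the thread. */
struct iommu_user_data {
	const void *data_uptr;	/* would be void __user * in the kernel */
	size_t data_len;
};

/* Example driver-specific structure; "last" marks the minimum length. */
struct driver_inv_args {
	uint32_t flags;
	uint32_t last;
};

/*
 * Sketch of the proposed helper: copy up to args_size bytes, require at
 * least min_len, and reject trailing non-zero bytes that the kernel-side
 * structure cannot represent.
 */
static int iommufd_get_user_data(const struct iommu_user_data *user_data,
				 void *args, size_t args_size, size_t min_len)
{
	const uint8_t *src = user_data->data_uptr;
	size_t copy_len;

	if (!src || user_data->data_len < min_len)
		return -1;	/* would be -EINVAL in the kernel */

	copy_len = user_data->data_len < args_size ? user_data->data_len
						   : args_size;
	memset(args, 0, args_size);
	memcpy(args, src, copy_len);	/* copy_from_user() in the kernel */

	/* A newer userspace may pass a larger struct; the tail must be 0. */
	for (size_t i = args_size; i < user_data->data_len; i++)
		if (src[i])
			return -2;	/* would be -E2BIG in the kernel */
	return 0;
}
```

This also covers the alloc path mentioned above, since a NULL pointer or short length simply fails the first check.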

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10  3:28                             ` Liu, Yi L
@ 2023-08-10  3:33                               ` Nicolin Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2023-08-10  3:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Jason Gunthorpe, Tian, Kevin, joro, alex.williamson,
	robin.murphy, baolu.lu, cohuck, eric.auger, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong

On Thu, Aug 10, 2023 at 03:28:10AM +0000, Liu, Yi L wrote:

> > @Yi,
> > This seems to be a bigger rework than the top level struct. Along
> > with Jason's request for fail_nth below, we'd need to bisect the
> > workload between us, or can just continue each other's daily work.
> > Let me know which one you prefer.
> > https://lore.kernel.org/linux-iommu/ZNPCtPTcHvITt6fk@nvidia.com/
> 
> Let me address the fail_nth request first. You may rework the
> iommufd_get_user_data(). If I can finish the fail_nth soon,
> I may then help to lift the array to the top level. If not, you
> may do it as well. 😊 I guess I need to study fail_nth a bit
> as well.

Ack. Am working on the iommufd_get_user_data() already. Will
continue the ring buffer thing.

Nic

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10  2:49                             ` Tian, Kevin
@ 2023-08-10 15:57                               ` Jason Gunthorpe
  2023-08-10 17:14                                 ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-10 15:57 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Thu, Aug 10, 2023 at 02:49:59AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, August 10, 2023 4:17 AM
> > 
> > On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > > > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > > > Similarly for managing the array of invalidation commands.
> > > >
> > > > You mean an embedded uptr inside a driver user data struct right?
> > > > Sure, that should go through the new helper too.
> > >
> > > If we are committed that all drivers have to process an array then put
> > > the array in the top level struct and pass it in the same user_data
> > > struct and use another helper to allow the driver to iterate through
> > > it.
> > 
> > I see. Both VTD and SMMU pass a uptr to the arrays of invalidation
> > commands/requests. The only difference is that SMMU's array is a
> > ring buffer rather than a plain one indexed from the beginning.
> > But the helper could take two index inputs, which should work for
> > the VTD case too. If another IOMMU driver only supports one request,
> > rather than an array of requests, we can treat that as a single-
> > entry array.
> > 
> 
> I like this approach.

Do we need to worry about the ring wrap around? It is already the case
that the VMM has to scan the ring and extract the invalidation
commands, wouldn't it already just linearize them?

Is there a use case for invalidation-only SW emulated rings, and do we
care about optimizing for the wrap around case?

Let's answer those questions before designing something complicated :)

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10 15:57                               ` Jason Gunthorpe
@ 2023-08-10 17:14                                 ` Nicolin Chen
  2023-08-10 19:27                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-10 17:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Thu, Aug 10, 2023 at 12:57:04PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 10, 2023 at 02:49:59AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, August 10, 2023 4:17 AM
> > > 
> > > On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > > > > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > > > > Similarly for managing the array of invalidation commands.
> > > > >
> > > > > You mean an embedded uptr inside a driver user data struct right?
> > > > > Sure, that should go through the new helper too.
> > > >
> > > > If we are committed that all drivers have to process an array then put
> > > > the array in the top level struct and pass it in the same user_data
> > > > struct and use another helper to allow the driver to iterate through
> > > > it.
> > > 
> > > I see. Both VTD and SMMU pass a uptr to the arrays of invalidation
> > > commands/requests. The only difference is that SMMU's array is a
> > > ring buffer rather than a plain one indexed from the beginning.
> > > But the helper could take two index inputs, which should work for
> > > the VTD case too. If another IOMMU driver only supports one request,
> > > rather than an array of requests, we can treat that as a single-
> > > entry array.
> > > 
> > 
> > I like this approach.
> 
> Do we need to worry about the ring wrap around? It is already the case
> that the VMM has to scan the ring and extract the invalidation
> commands, wouldn't it already just linearize them?

I haven't got the chance to send the latest vSMMU series but I
pass down the raw user CMDQ to the host to go through, as it'd
be easier to stall the consumer index movement when a command
in the middle fails.

> Is there a use case for invalidation-only SW emulated rings, and do we
> care about optimizing for the wrap around case?

Hmm, why a SW emulated ring?

Yes for the latter question. SMMU kernel driver has something
like Q_WRP and other helpers, so it wasn't difficult to process
the user CMDQ in the same raw form. But it does complicate the
common code if we want to do it there.

Thanks
Nic

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10 17:14                                 ` Nicolin Chen
@ 2023-08-10 19:27                                   ` Jason Gunthorpe
  2023-08-10 21:02                                     ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-10 19:27 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Thu, Aug 10, 2023 at 10:14:37AM -0700, Nicolin Chen wrote:
> On Thu, Aug 10, 2023 at 12:57:04PM -0300, Jason Gunthorpe wrote:
> > On Thu, Aug 10, 2023 at 02:49:59AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Thursday, August 10, 2023 4:17 AM
> > > > 
> > > > On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
> > > > > > On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
> > > > > > > Similarly for managing the array of invalidation commands.
> > > > > >
> > > > > > You mean an embedded uptr inside a driver user data struct right?
> > > > > > Sure, that should go through the new helper too.
> > > > >
> > > > > If we are committed that all drivers have to process an array then put
> > > > > the array in the top level struct and pass it in the same user_data
> > > > > struct and use another helper to allow the driver to iterate through
> > > > > it.
> > > > 
> > > > I see. Both VTD and SMMU pass a uptr to the arrays of invalidation
> > > > commands/requests. The only difference is that SMMU's array is a
> > > > ring buffer rather than a plain one indexed from the beginning.
> > > > But the helper could take two index inputs, which should work for
> > > > the VTD case too. If another IOMMU driver only supports one request,
> > > > rather than an array of requests, we can treat that as a single-
> > > > entry array.
> > > > 
> > > 
> > > I like this approach.
> > 
> > Do we need to worry about the ring wrap around? It is already the case
> > that the VMM has to scan the ring and extract the invalidation
> > commands, wouldn't it already just linearize them?
> 
> I haven't got the chance to send the latest vSMMU series but I
> pass down the raw user CMDQ to the host to go through, as it'd
> be easier to stall the consumer index movement when a command
> in the middle fails.

Don't some commands have to be executed by the VMM?

Even so, it seems straightforward enough for the kernel to report the
number of commands it executed and the VMM can adjust the virtual
consumer index.
> 
> > Is there a use case for invalidation-only SW emulated rings, and do we
> > care about optimizing for the wrap around case?
> 
> Hmm, why a SW emulated ring?

That is what you are building. The VMM catches the write of the
producer pointer and the VMM SW bundles it up to call into the kernel.
 
> Yes for the latter question. SMMU kernel driver has something
> like Q_WRP and other helpers, so it wasn't difficult to process
> the user CMDQ in the same raw form. But it does complicate the
> common code if we want to do it there.

Optimizing wrap around means when the producer/consumer pointers pass
the end of the queue memory we execute one, not two ioctls toward the
kernel. That is possibly a very minor optimization; it depends on how big
the queues are and how frequent multi-entry items will be present.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10 19:27                                   ` Jason Gunthorpe
@ 2023-08-10 21:02                                     ` Nicolin Chen
  2023-08-11  3:52                                       ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-10 21:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Thu, Aug 10, 2023 at 04:27:02PM -0300, Jason Gunthorpe wrote:
 
> > > Do we need to worry about the ring wrap around? It is already the case
> > > that the VMM has to scan the ring and extract the invalidation
> > > commands, wouldn't it already just linearize them?
> > 
> > I haven't got the chance to send the latest vSMMU series but I
> > pass down the raw user CMDQ to the host to go through, as it'd
> > be easier to stall the consumer index movement when a command
> > in the middle fails.
> 
> Don't some commands have to be executed by the VMM?

Well, they do. The VMM would go through the queue and "execute" non-
invalidation commands, then defer the queue to the kernel to go
through it once more. So, the flaw could be that some of the
commands behind the failing TLB flush command got "executed",
though in a real case most of the other commands would be "executed"
standalone with a CMD_SYNC, i.e. not mixed with any invalidation
command.

> Even so, it seems straightforward enough for the kernel to report the
> number of commands it executed and the VMM can adjust the virtual
> consumer index.

It is not that straightforward to revert an array index back to
a consumer index because they might not be 1:1 mapped, since in
theory there could be other commands mixed in-between, although
that is unlikely to happen.

So, another index-mapping array would be needed for this matter.
And this doesn't address the flaw that I mentioned above either.
So, I took the former solution to reduce the complication.

> > > Is there a use case for invalidation-only SW emulated rings, and do we
> > > care about optimizing for the wrap around case?
> > 
> > Hmm, why a SW emulated ring?
> 
> That is what you are building. The VMM catches the write of the
> producer pointer and the VMM SW bundles it up to call into the kernel.

Still not fully getting it. Do you mean a ring that is prepared
by the VMM? I think the only case that we need to handle a ring
is what I did by forwarding the guest CMDQ (a ring) to the host
directly. Not sure why VMM would need another ring for those
linearized invalidation commands. Or maybe I misunderstood..

> > Yes for the latter question. SMMU kernel driver has something
> > like Q_WRP and other helpers, so it wasn't difficult to process
> > the user CMDQ in the same raw form. But it does complicate the
> > common code if we want to do it there.
> 
> Optimizing wrap around means when the producer/consumer pointers pass
> the end of the queue memory we execute one, not two ioctls toward the
> kernel. That is possible a very minor optimization, it depends how big
> the queues are and how frequent multi-entry items will be present.

There could be other commands being issued by other VMs or even
the host between the two ioctls. So probably we'd need to handle
the wrapping case when doing a ring solution?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-10 21:02                                     ` Nicolin Chen
@ 2023-08-11  3:52                                       ` Tian, Kevin
  2023-08-11 16:45                                         ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Tian, Kevin @ 2023-08-11  3:52 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, August 11, 2023 5:03 AM
> 
> > > > Is there a use case for invalidation-only SW emulated rings, and do we
> > > > care about optimizing for the wrap around case?
> > >
> > > Hmm, why a SW emulated ring?
> >
> > That is what you are building. The VMM catches the write of the
> > producer pointer and the VMM SW bundles it up to call into the kernel.
> 
> Still not fully getting it. Do you mean a ring that is prepared
> by the VMM? I think the only case that we need to handle a ring
> is what I did by forwarding the guest CMDQ (a ring) to the host
> directly. Not sure why VMM would need another ring for those
> linearized invalidation commands. Or maybe I misunderstood..
> 

iiuc the point of a ring-based native format is to maximize code reuse
when the later in-kernel fast invalidation path (from kvm to smmu driver)
is enabled. Then both the slow (via vmm) and fast (in-kernel) paths use
the same logic to handle the guest invalidation queue.

But stepping back a bit, supporting an array-based non-native format
could simplify the uAPI design and allow code sharing for the array
among vendor drivers. You can still keep the entry in native format;
then the only difference from the future in-kernel fast path is just
walking an array vs. walking a ring. And the VMM doesn't need to expose
non-invalidate cmds to the kernel only to have them skipped.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-11  3:52                                       ` Tian, Kevin
@ 2023-08-11 16:45                                         ` Nicolin Chen
  2023-08-15  6:18                                           ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-11 16:45 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Aug 11, 2023 at 03:52:52AM +0000, Tian, Kevin wrote:
 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, August 11, 2023 5:03 AM
> >
> > > > > Is there a use case for invalidation-only SW emulated rings, and do we
> > > > > care about optimizing for the wrap around case?
> > > >
> > > > Hmm, why a SW emulated ring?
> > >
> > > That is what you are building. The VMM catches the write of the
> > > producer pointer and the VMM SW bundles it up to call into the kernel.
> >
> > Still not fully getting it. Do you mean a ring that is prepared
> > by the VMM? I think the only case that we need to handle a ring
> > is what I did by forwarding the guest CMDQ (a ring) to the host
> > directly. Not sure why VMM would need another ring for those
> > linearized invalidation commands. Or maybe I misunderstood..
> >
> 
> iiuc the point of a ring-based native format is to maximize code reuse
> when the later in-kernel fast invalidation path (from kvm to smmu driver)
> is enabled. Then both the slow (via vmm) and fast (in-kernel) paths use
> the same logic to handle the guest invalidation queue.

I see. That's about the fast path topic. Thanks for the input.

> But stepping back a bit, supporting an array-based non-native format
> could simplify the uAPI design and allow code sharing for the array
> among vendor drivers. You can still keep the entry in native format;
> then the only difference from the future in-kernel fast path is just
> walking an array vs. walking a ring. And the VMM doesn't need to expose
> non-invalidate cmds to the kernel only to have them skipped.

Ah, so we might still design the uAPI to be ring based at this
moment, yet not support the case CONS > 0, leaving that to an
upgrade in the future.

I will try to estimate how complicated it would be to implement
the ring, to see if we could just start with that. Otherwise, I
will just start with an array.

Thanks
Nic

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-11 16:45                                         ` Nicolin Chen
@ 2023-08-15  6:18                                           ` Nicolin Chen
  2023-08-18 16:56                                             ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-15  6:18 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L, Tian, Kevin
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, Aug 11, 2023 at 09:45:21AM -0700, Nicolin Chen wrote:

> > But stepping back a bit, supporting an array-based non-native format
> > could simplify the uAPI design and allow code sharing for the array
> > among vendor drivers. You can still keep the entry in native format;
> > then the only difference from the future in-kernel fast path is just
> > walking an array vs. walking a ring. And the VMM doesn't need to expose
> > non-invalidate cmds to the kernel only to have them skipped.
> 
> Ah, so we might still design the uAPI to be ring based at this
> moment, yet not support the case CONS > 0, leaving that to an
> upgrade in the future.
> 
> I will try to estimate how complicated it would be to implement
> the ring, to see if we could just start with that. Otherwise, I
> will just start with an array.

I drafted a uAPI structure for a ring-based SW queue. While I am
trying an implementation, I'd like to collect some comments on the
structure, to see if it overall makes sense.

One thing that I couldn't add to this common structure for SMMU
is the hardware error code, which should be encoded in the higher
bits of the consumer index register, following the SMMU spec:
    ERR, bits [30:24] Error reason code.
    - When a command execution error is detected, ERR is set to a
      reason code and then the SMMU_GERROR.CMDQ_ERR global error
      becomes active.
    - The value in this field is UNKNOWN when the CMDQ_ERR global
      error is not active. This field resets to an UNKNOWN value.

But it feels odd to do the same in the generic structure. So,
perhaps an optional @out_error could be added to this structure. Or
is there some other idea?
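
For reference, the SMMU encoding quoted above can be expressed with two small helpers. The helper names and the userspace framing are made up for illustration; only the bit layout (ERR at bits [30:24] of the consumer register) follows the quoted spec text:

```c
#include <assert.h>
#include <stdint.h>

/* SMMU_CMDQ_CONS layout per the quoted spec: ERR is bits [30:24]. */
#define CMDQ_CONS_ERR_SHIFT	24
#define CMDQ_CONS_ERR_MASK	0x7fu

/* Extract the error reason code from a raw consumer register value. */
static inline uint32_t cmdq_cons_err(uint32_t cons)
{
	return (cons >> CMDQ_CONS_ERR_SHIFT) & CMDQ_CONS_ERR_MASK;
}

/* Fold an error reason code into a consumer register value. */
static inline uint32_t cmdq_cons_set_err(uint32_t cons, uint32_t err)
{
	cons &= ~(CMDQ_CONS_ERR_MASK << CMDQ_CONS_ERR_SHIFT);
	return cons | ((err & CMDQ_CONS_ERR_MASK) << CMDQ_CONS_ERR_SHIFT);
}
```

A generic @out_error field would avoid overloading the index bits like this.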

Thanks
Nic

/**
 * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
 * @size: sizeof(struct iommu_hwpt_invalidate)
 * @hwpt_id: HWPT ID of target hardware page table for the invalidation
 * @q_uptr: User pointer to an invalidation queue, which can be used as a flat
 *          array or a circular ring queue. The entries in the queue must have
 *          a fixed width @q_entry_len, each containing a user data structure
 *          for an invalidation request, specific to the given hardware
 *          pagetable.
 * @q_cons_uptr: User pointer to the consumer index (with its wrap flag) of an
 *               invalidation queue. This pointer must point to a __u32 type of
 *               memory location. The consumer index tells the kernel to read
 *               from the entry it points to (and then the next entries) until
 *               the kernel reaches the entry pointed to by the producer index
 *               @q_prod, and allows the kernel to update the consumer index to
 *               where it stops: on success, it should be updated to @q_prod;
 *               otherwise, to the index pointing to the failed entry.
 * @q_prod: Producer index (with its wrap flag) of an invalidation queue. This
 *          index points to the entry next to the last requested entry in the
 *          invalidation queue. In case of using the queue as a flat array, it
 *          equals the number of entries @q_entry_num.
 * @q_index_bits: Effective bits of both indexes. Defines the maximum value an
 *                index can take. Must not be greater than 31 bits. A wrap flag
 *                is defined at the next higher bit adjacent to the index bits:
 *                e.g. if @q_index_bits is 20, @q_prod[19:0] are the index bits
 *                and @q_prod[20] is the wrap flag. The wrap flag, acting like
 *                a sign flag, must be toggled each time an index overflows and
 *                wraps to the lower end of the circular queue.
 * @q_entry_num: Total number of the entries in an invalidation queue
 * @q_entry_len: Length (in bytes) of an entry of an invalidation queue
 *
 * Invalidate the iommu cache for user-managed page table. Modifications on a
 * user-managed page table should be followed by this operation to sync cache.
 *
 * One request supports multiple invalidations via a multi-entry queue:
 *   |<----------- Length of Queue = @q_entry_num * @q_entry_len ------------>|
 *   --------------------------------------------------------------------------
 *   | 0 | 1 | 2 | 3 | ... | @q_entry_num-3 | @q_entry_num-2 | @q_entry_num-1 |
 *   --------------------------------------------------------------------------
 *   ^           ^                          ^                |<-@q_entry_len->|
 *   |           |                          |
 * @q_uptr  @q_cons_uptr                 @q_prod
 *
 * A queue index can wrap its index bits off the high end of the queue and back
 * onto the low end by toggling its wrap flag: e.g. when @q_entry_num=0x10 and
 * @q_index_bits=4, *@q_cons_uptr=0xf and @q_prod=0x11 inputs mean the producer
 * index is wrapped to 0x1 with its wrap flag set, so the kernel reads/handles
 * the entries starting from the consumer index (0xf) and wraps back to 0x0 and
 * 0x1 by toggling the wrap flag, i.e. *@q_cons_uptr has a final value of 0x11.
 */
struct iommu_hwpt_invalidate {
	__u32 size;
	__u32 hwpt_id;
	__aligned_u64 q_uptr;
	__aligned_u64 q_cons_uptr;
	__u32 q_prod;
	__u32 q_index_bits;
	__u32 q_entry_num;
	__u32 q_entry_len;
};
#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
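
To make the wrap-flag arithmetic in the comment above concrete, here is a small userspace sketch. The helper names are invented and this mirrors the proposal rather than any merged kernel code: an index carries @q_index_bits index bits plus one wrap flag bit, advances one entry at a time, and the pending-entry count between consumer and producer falls out of comparing the two wrap flags:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance a queue index that has index_bits index bits plus a wrap flag
 * at bit [index_bits]; bits above the wrap flag stay clear.
 */
static uint32_t queue_inc(uint32_t idx, uint32_t index_bits,
			  uint32_t entry_num)
{
	uint32_t mask = (1u << index_bits) - 1;
	uint32_t wrap = 1u << index_bits;
	uint32_t pos = idx & mask;

	if (++pos == entry_num)			/* off the high end: back to 0 */
		return (idx & wrap) ^ wrap;	/* ...toggling the wrap flag */
	return (idx & wrap) | pos;
}

/* Number of pending entries from cons up to (but excluding) prod. */
static uint32_t queue_pending(uint32_t cons, uint32_t prod,
			      uint32_t index_bits, uint32_t entry_num)
{
	uint32_t mask = (1u << index_bits) - 1;
	uint32_t wrap = 1u << index_bits;
	uint32_t c = cons & mask, p = prod & mask;

	if ((cons & wrap) == (prod & wrap))
		return p - c;		/* same lap: plain distance */
	return entry_num - c + p;	/* producer wrapped past the end */
}
```

With the comment's example (@q_entry_num=0x10, @q_index_bits=4, cons=0xf, prod=0x11), queue_pending() reports two entries, and stepping the consumer twice lands it on 0x11, matching the documented final *@q_cons_uptr.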

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-07-24 11:13 ` [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
@ 2023-08-15 16:31   ` Jason Gunthorpe
  2023-08-16  0:35     ` Baolu Lu
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-15 16:31 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
> Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
>  include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
>  2 files changed, 42 insertions(+)

I would like to pick this patch out of this series to go with the main
get_info stuff so that we have drivers implementing what is merged. I
made the trivial fixup.

Lu are you OK?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-08-15 16:31   ` Jason Gunthorpe
@ 2023-08-16  0:35     ` Baolu Lu
  2023-08-16  1:10       ` Nicolin Chen
  2023-08-16 11:46       ` Jason Gunthorpe
  0 siblings, 2 replies; 75+ messages in thread
From: Baolu Lu @ 2023-08-16  0:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Yi Liu
  Cc: baolu.lu, joro, alex.williamson, kevin.tian, robin.murphy,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On 2023/8/16 0:31, Jason Gunthorpe wrote:
> On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
>> Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
>>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
>> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
>>   include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
>>   2 files changed, 42 insertions(+)
> I would like to pick this patch out of this series to go with the main
> get_info stuff so that we have drivers implementing what is merged. I
> made the trivial fixup.
> 
> Lu are you OK?

Yes.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-08-16  0:35     ` Baolu Lu
@ 2023-08-16  1:10       ` Nicolin Chen
  2023-08-16 11:46       ` Jason Gunthorpe
  1 sibling, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2023-08-16  1:10 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Yi Liu, joro, alex.williamson, kevin.tian,
	robin.murphy, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On Wed, Aug 16, 2023 at 08:35:00AM +0800, Baolu Lu wrote:
 
> On 2023/8/16 0:31, Jason Gunthorpe wrote:
> > On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
> > > Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
> > > 
> > > Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> > > Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu<yi.l.liu@intel.com>
> > > ---
> > >   drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
> > >   include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
> > >   2 files changed, 42 insertions(+)
> > I would like to pick this patch out of this series to go with the main
> > get_info stuff so that we have drivers implementing what is merged. I
> > made the trivial fixup.
> > 
> > Lu are you OK?
> 
> Yes.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Hmm...

We have Yi in the author line and Baolu in the first signed-off
line, and now Baolu with "Reviewed-by" again...

I guess we might need a bit of fix or re-arrange? :)

Thanks
Nic

* Re: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-08-16  0:35     ` Baolu Lu
  2023-08-16  1:10       ` Nicolin Chen
@ 2023-08-16 11:46       ` Jason Gunthorpe
  2023-08-16 11:48         ` Baolu Lu
  1 sibling, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-16 11:46 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On Wed, Aug 16, 2023 at 08:35:00AM +0800, Baolu Lu wrote:
> On 2023/8/16 0:31, Jason Gunthorpe wrote:
> > On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
> > > Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
> > > 
> > > Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> > > Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu<yi.l.liu@intel.com>
> > > ---
> > >   drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
> > >   include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
> > >   2 files changed, 42 insertions(+)
> > I would like to pick this patch out of this series to go with the main
> > get_info stuff so that we have drivers implementing what is merged. I
> > made the trivial fixup.
> > 
> > Lu are you OK?
> 
> Yes.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

I changed this to an acked-by since you helped write the patch :)

Jason

* Re: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-08-16 11:46       ` Jason Gunthorpe
@ 2023-08-16 11:48         ` Baolu Lu
  2023-08-16 12:03           ` Liu, Yi L
  0 siblings, 1 reply; 75+ messages in thread
From: Baolu Lu @ 2023-08-16 11:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Yi Liu, joro, alex.williamson, kevin.tian,
	robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan

On 2023/8/16 19:46, Jason Gunthorpe wrote:
> On Wed, Aug 16, 2023 at 08:35:00AM +0800, Baolu Lu wrote:
>> On 2023/8/16 0:31, Jason Gunthorpe wrote:
>>> On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
>>>> Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
>>>>
>>>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>>>> Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
>>>> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
>>>> ---
>>>>    drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
>>>>    include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
>>>>    2 files changed, 42 insertions(+)
>>> I would like to pick this patch out of this series to go with the main
>>> get_info stuff so that we have drivers implementing what is merged. I
>>> made the trivial fixup.
>>>
>>> Lu are you OK?
>> Yes.
>>
>> Reviewed-by: Lu Baolu<baolu.lu@linux.intel.com>
> I changed this to an acked-by since you helpd write the patch 😄

Okay, fine to me.

Best regards,
baolu

* RE: [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query
  2023-08-16 11:48         ` Baolu Lu
@ 2023-08-16 12:03           ` Liu, Yi L
  0 siblings, 0 replies; 75+ messages in thread
From: Liu, Yi L @ 2023-08-16 12:03 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, August 16, 2023 7:49 PM
> 
> On 2023/8/16 19:46, Jason Gunthorpe wrote:
> > On Wed, Aug 16, 2023 at 08:35:00AM +0800, Baolu Lu wrote:
> >> On 2023/8/16 0:31, Jason Gunthorpe wrote:
> >>> On Mon, Jul 24, 2023 at 04:13:33AM -0700, Yi Liu wrote:
> >>>> Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
> >>>>
> >>>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
> >>>> Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
> >>>> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
> >>>> ---
> >>>>    drivers/iommu/intel/iommu.c  | 19 +++++++++++++++++++
> >>>>    include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++++
> >>>>    2 files changed, 42 insertions(+)
> >>> I would like to pick this patch out of this series to go with the main
> >>> get_info stuff so that we have drivers implementing what is merged. I
> >>> made the trivial fixup.
> >>>
> >>> Lu are you OK?
> >> Yes.
> >>
> >> Reviewed-by: Lu Baolu<baolu.lu@linux.intel.com>
> > I changed this to an acked-by since you helpd write the patch 😄
> 
> Okay, fine to me.

I'm going to send hw_info v8 which includes this patch, so will add below
tag, thanks.

Acked-by: Lu Baolu <baolu.lu@linux.intel.com>

Regards,
Yi Liu

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-15  6:18                                           ` Nicolin Chen
@ 2023-08-18 16:56                                             ` Jason Gunthorpe
  2023-08-18 17:56                                               ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 16:56 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, Aug 14, 2023 at 11:18:46PM -0700, Nicolin Chen wrote:
> On Fri, Aug 11, 2023 at 09:45:21AM -0700, Nicolin Chen wrote:
> 
> > > But if stepping back a bit supporting an array-based non-native format
> > > could simplify the uAPI design and allows code sharing for array among
> > > vendor drivers. You can still keep the entry as native format then the
> > > only difference with future in-kernel fast path is just on walking an array
> > > vs. walking a ring. And VMM doesn't need to expose non-invalidate
> > > cmds to the kernel and then be skipped.
> > 
> > Ah, so we might still design the uAPI to be ring based at this
> > moment, yet don't support a case CONS > 0 to leave that to an
> > upgrade in the future.
> > 
> > I will try estimating a bit how complicated to implement the
> > ring, to see if we could just start with that. Otherwise, will
> > just start with an array.
> 
> I drafted a uAPI structure for a ring-based SW queue. While I am
> trying an implementation, I'd like to collect some comments at the
> structure, to see if it overall makes sense.

I don't think a ring makes a lot of sense at this point. The only thing
it optimizes is a system call if the kernel needs to wrap around the
tail of the ring. It would possibly be better to have a small gather
list than try to describe the ring logic.

Further, the VMM already has to process it, so the VMM already knows
which ops are going to the kernel and which are not. The VMM can just
organize them in a linear list in one way or another. We need to copy
and read this stuff in the VMM anyhow to protect against a hostile VM.

> One thing that I couldn't add to this common structure for SMMU
> is the hardware error code, which should be encoded in the higher
> bits of the consumer index register, following the SMMU spec:
>     ERR, bits [30:24] Error reason code.
>     - When a command execution error is detected, ERR is set to a
>       reason code and then the SMMU_GERROR.CMDQ_ERR global error
>       becomes active.
>     - The value in this field is UNKNOWN when the CMDQ_ERR global
>       error is not active. This field resets to an UNKNOWN value.

The invalidate ioctl should fail in some deterministic way and report
back the error code and the highest array index that maybe could have
triggered it.

The highest array index sounds generic; the error code maybe is too.

Jason

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-18 16:56                                             ` Jason Gunthorpe
@ 2023-08-18 17:56                                               ` Nicolin Chen
  2023-08-18 18:20                                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-18 17:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Aug 18, 2023 at 01:56:54PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 14, 2023 at 11:18:46PM -0700, Nicolin Chen wrote:
> > On Fri, Aug 11, 2023 at 09:45:21AM -0700, Nicolin Chen wrote:
> > 
> > > > But if stepping back a bit supporting an array-based non-native format
> > > > could simplify the uAPI design and allows code sharing for array among
> > > > vendor drivers. You can still keep the entry as native format then the
> > > > only difference with future in-kernel fast path is just on walking an array
> > > > vs. walking a ring. And VMM doesn't need to expose non-invalidate
> > > > cmds to the kernel and then be skipped.
> > > 
> > > Ah, so we might still design the uAPI to be ring based at this
> > > moment, yet don't support a case CONS > 0 to leave that to an
> > > upgrade in the future.
> > > 
> > > I will try estimating a bit how complicated to implement the
> > > ring, to see if we could just start with that. Otherwise, will
> > > just start with an array.
> > 
> > I drafted a uAPI structure for a ring-based SW queue. While I am
> > trying an implementation, I'd like to collect some comments at the
> > structure, to see if it overall makes sense.
> 
> I don't think a ring makes alot of sense at this point. The only thing
> it optimizes is a system call if the kernel needs to wrap around the
> tail of the ring. It would possibly be better to have a small gather
> list than try to describe the ring logic..
> 
> Further, the VMM already has to process it, so the vmm already knows
> what ops are going to kernel are not. The VMM can just organize them
> in a linear list in one way or another. We need to copy and read this
> stuff in the VMM anyhow to protect against a hostile VM.

OK. Then a linear array it is.

> > One thing that I couldn't add to this common structure for SMMU
> > is the hardware error code, which should be encoded in the higher
> > bits of the consumer index register, following the SMMU spec:
> >     ERR, bits [30:24] Error reason code.
> >     - When a command execution error is detected, ERR is set to a
> >       reason code and then the SMMU_GERROR.CMDQ_ERR global error
> >       becomes active.
> >     - The value in this field is UNKNOWN when the CMDQ_ERR global
> >       error is not active. This field resets to an UNKNOWN value.
> 
> The invalidate ioctl should fail in some deterministic way and report
> back the error code and the highest array index that maybe could have
> triggered it.

Yea. Having an error code in the highest bits of array_index,
"array_index != array_max" could be the deterministic way to
indicate a failure. And a kernel errno could also be returned
by the invalidate ioctl.

> The highest array index sounds generic, the error code maybe is too

We could do it in bits and report the error code in its raw form:
	__u32 out_array_index;
	/* number of bits used to report error code in the returned array_index */
	__u32 out_array_index_error_code_bits;
Or just:
	__u32 out_array_index;
	__u32 out_error_code;

Do we have to define a list of generic error code?

Thanks!
Nic

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-18 17:56                                               ` Nicolin Chen
@ 2023-08-18 18:20                                                 ` Jason Gunthorpe
  2023-08-18 18:25                                                   ` Nicolin Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 18:20 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Aug 18, 2023 at 10:56:45AM -0700, Nicolin Chen wrote:

> > The highest array index sounds generic, the error code maybe is too
> 
> We could do in its and report the error code in its raw form:
> 	__u32 out_array_index;
> 	/* number of bits used to report error code in the returned array_index */
> 	__u32 out_array_index_error_code_bits;
> Or just:
> 	__u32 out_array_index;
> 	__u32 out_error_code;
> 
> Do we have to define a list of generic error code?

out_driver_error_code may be OK

Jason

* Re: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-18 18:20                                                 ` Jason Gunthorpe
@ 2023-08-18 18:25                                                   ` Nicolin Chen
  2023-08-21  1:24                                                     ` Tian, Kevin
  0 siblings, 1 reply; 75+ messages in thread
From: Nicolin Chen @ 2023-08-18 18:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Aug 18, 2023 at 03:20:55PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 18, 2023 at 10:56:45AM -0700, Nicolin Chen wrote:
> 
> > > The highest array index sounds generic, the error code maybe is too
> > 
> > We could do in its and report the error code in its raw form:
> > 	__u32 out_array_index;
> > 	/* number of bits used to report error code in the returned array_index */
> > 	__u32 out_array_index_error_code_bits;
> > Or just:
> > 	__u32 out_array_index;
> > 	__u32 out_error_code;
> > 
> > Do we have to define a list of generic error code?
> 
> out_driver_error_code may be OK

Ack. Will implement the array in that way.

Thanks!
Nic

* RE: [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain
  2023-08-18 18:25                                                   ` Nicolin Chen
@ 2023-08-21  1:24                                                     ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2023-08-21  1:24 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, August 19, 2023 2:26 AM
> 
> On Fri, Aug 18, 2023 at 03:20:55PM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 18, 2023 at 10:56:45AM -0700, Nicolin Chen wrote:
> >
> > > > The highest array index sounds generic, the error code maybe is too
> > >
> > > We could do in its and report the error code in its raw form:
> > > 	__u32 out_array_index;
> > > 	/* number of bits used to report error code in the returned
> array_index */
> > > 	__u32 out_array_index_error_code_bits;
> > > Or just:
> > > 	__u32 out_array_index;
> > > 	__u32 out_error_code;
> > >
> > > Do we have to define a list of generic error code?
> >
> > out_driver_error_code may be OK
> 
> Ack. Will implement the array in that way.
> 

Yes. Error code is driver specific. Just having a field to carry it is
sufficient.

end of thread, other threads:[~2023-08-21  1:24 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-24 11:13 [PATCH v4 00/12] Add Intel VT-d nested translation Yi Liu
2023-07-24 11:13 ` [PATCH v4 01/12] iommufd: Add data structure for Intel VT-d stage-1 domain allocation Yi Liu
2023-08-02  6:40   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 02/12] iommu/vt-d: Extend dmar_domain to support nested domain Yi Liu
2023-08-02  6:42   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 03/12] iommu/vt-d: Add helper for nested domain allocation Yi Liu
2023-08-02  6:44   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 04/12] iommu/vt-d: Add helper to setup pasid nested translation Yi Liu
2023-08-02  7:10   ` Tian, Kevin
2023-08-02  8:09     ` Baolu Lu
2023-08-02  8:11       ` Baolu Lu
2023-08-02  8:13         ` Tian, Kevin
2023-08-03  3:13     ` Baolu Lu
2023-08-03  3:58       ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 05/12] iommu/vt-d: Make domain attach helpers to be extern Yi Liu
2023-08-02  7:14   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 06/12] iommu/vt-d: Set the nested domain to a device Yi Liu
2023-08-02  7:22   ` Tian, Kevin
2023-08-03  3:17     ` Baolu Lu
2023-07-24 11:13 ` [PATCH v4 07/12] iommufd: Add data structure for Intel VT-d stage-1 cache invalidation Yi Liu
2023-08-02  7:41   ` Tian, Kevin
2023-08-02 13:47     ` Jason Gunthorpe
2023-08-03  0:38       ` Tian, Kevin
2023-08-04 13:04         ` Liu, Yi L
2023-08-04 14:03           ` Jason Gunthorpe
2023-08-07 14:02             ` Liu, Yi L
2023-07-24 11:13 ` [PATCH v4 08/12] iommu/vt-d: Make iotlb flush helpers to be extern Yi Liu
2023-08-02  7:41   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 09/12] iommu/vt-d: Add iotlb flush for nested domain Yi Liu
2023-08-02  7:46   ` Tian, Kevin
2023-08-03  3:24     ` Baolu Lu
2023-08-03  4:00       ` Tian, Kevin
2023-08-03  4:05         ` Baolu Lu
2023-08-03  4:13           ` Tian, Kevin
2023-08-03  7:36             ` Baolu Lu
2023-08-07 15:08     ` Liu, Yi L
2023-08-08  3:12       ` Nicolin Chen
2023-08-08 12:34         ` Jason Gunthorpe
2023-08-08 17:41           ` Nicolin Chen
2023-08-09  8:22             ` Tian, Kevin
2023-08-09  8:50               ` Liu, Yi L
2023-08-09  8:58                 ` Tian, Kevin
2023-08-09  9:30                   ` Liu, Yi L
2023-08-09 16:24                     ` Jason Gunthorpe
2023-08-09 19:12                       ` Nicolin Chen
2023-08-09 19:19                         ` Jason Gunthorpe
2023-08-09 20:17                           ` Nicolin Chen
2023-08-10  2:49                             ` Tian, Kevin
2023-08-10 15:57                               ` Jason Gunthorpe
2023-08-10 17:14                                 ` Nicolin Chen
2023-08-10 19:27                                   ` Jason Gunthorpe
2023-08-10 21:02                                     ` Nicolin Chen
2023-08-11  3:52                                       ` Tian, Kevin
2023-08-11 16:45                                         ` Nicolin Chen
2023-08-15  6:18                                           ` Nicolin Chen
2023-08-18 16:56                                             ` Jason Gunthorpe
2023-08-18 17:56                                               ` Nicolin Chen
2023-08-18 18:20                                                 ` Jason Gunthorpe
2023-08-18 18:25                                                   ` Nicolin Chen
2023-08-21  1:24                                                     ` Tian, Kevin
2023-08-10  3:28                             ` Liu, Yi L
2023-08-10  3:33                               ` Nicolin Chen
2023-07-24 11:13 ` [PATCH v4 10/12] iommu/vt-d: Add nested domain allocation Yi Liu
2023-08-02  7:47   ` Tian, Kevin
2023-07-24 11:13 ` [PATCH v4 11/12] iommu/vt-d: Implement hw_info for iommu capability query Yi Liu
2023-08-15 16:31   ` Jason Gunthorpe
2023-08-16  0:35     ` Baolu Lu
2023-08-16  1:10       ` Nicolin Chen
2023-08-16 11:46       ` Jason Gunthorpe
2023-08-16 11:48         ` Baolu Lu
2023-08-16 12:03           ` Liu, Yi L
2023-07-24 11:13 ` [PATCH v4 12/12] iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu
2023-08-02  7:59   ` Tian, Kevin
2023-08-03  3:27     ` Baolu Lu
2023-08-03  4:00       ` Tian, Kevin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).