linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/17] iommufd: Add nesting infrastructure
@ 2023-09-21  7:51 Yi Liu
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
                   ` (17 more replies)
  0 siblings, 18 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.

Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>

In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.

This series is based on the iommu hw info reporting series [1]. It first
extends domain_alloc_user to allocate domains with user data and adds new
op for invalidate stage-1 IOTLB for user-managed domains, then extends the
IOMMUFD internal infrastructure to accept user_data and parent hwpt, relay
the user_data/parent to iommu core to allocate user-managed iommu_domain.
After it, extends the ioctl IOMMU_HWPT_ALLOC to accept user data and stage-2
hwpt ID. Along with it, ioctl IOMMU_HWPT_INVALIDATE is added to invalidate
stage-1 IOTLB. This is needed for user-managed hwpts. Selftest is added as
well to cover the new ioctls.

Complete code can be found in [2], QEMU could can be found in [3].

At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.

[1] https://lore.kernel.org/linux-iommu/20230818101033.4100-1-yi.l.liu@intel.com/#r - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v4:
 - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
   and kernel-managed HWPTs
 - Rework invalidate uAPI to be a multi-request array-based design
 - Add a struct iommu_user_data_array and a helper for driver to sanitize
   and copy the entry data from user space invalidation array
 - Add a patch fixing TEST_LENGTH() in selftest program
 - Drop IOMMU_RESV_IOVA_RANGES patches
 - Update kdoc and inline comments
 - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
   this does not change the rule that resv regions should only be added to the
   kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
   as it is needed only by SMMU so far.

v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.com/
 - Add new uAPI things in alphabetical order
 - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
   sanity, replacing the previous op->domain_alloc_user_data_len solution
 - Return ERR_PTR from domain_alloc_user instead of NULL
 - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
 - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
   userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
   page table). (Kevin)
 - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
 - Minor changes per Kevin's inputs

v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.com/
 - Add union iommu_domain_user_data to include all user data structures to avoid
   passing void * in kernel APIs.
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1

v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Lu Baolu (1):
  iommu: Add nested domain support

Nicolin Chen (12):
  iommufd: Unite all kernel-managed members into a struct
  iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  iommufd: Add shared alloc_fn function pointer and mutex pointer
  iommufd: Add user-managed hw_pagetable support
  iommufd: Always setup MSI and anforce cc on kernel-managed domains
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
  iommufd/selftest: Add nested domain allocation for mock domain
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
  iommu: Add hwpt_type with user_data for domain_alloc_user op
  iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc()
  iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE

 drivers/iommu/intel/iommu.c                   |   5 +-
 drivers/iommu/iommufd/device.c                |  51 +++-
 drivers/iommu/iommufd/hw_pagetable.c          | 257 ++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h       |  59 +++-
 drivers/iommu/iommufd/iommufd_test.h          |  40 +++
 drivers/iommu/iommufd/main.c                  |   3 +
 drivers/iommu/iommufd/selftest.c              | 184 ++++++++++++-
 include/linux/iommu.h                         | 110 +++++++-
 include/uapi/linux/iommufd.h                  |  60 +++-
 tools/testing/selftests/iommu/iommufd.c       | 209 +++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |   3 +-
 tools/testing/selftests/iommu/iommufd_utils.h |  91 ++++++-
 12 files changed, 998 insertions(+), 74 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21 12:10   ` Baolu Lu
                     ` (3 more replies)
  2023-09-21  7:51 ` [PATCH v4 02/17] iommu: Add nested domain support Yi Liu
                   ` (16 subsequent siblings)
  17 siblings, 4 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

domain_alloc_user op already accepts user flags for domain allocation,
add hwpt_type and user_data support as well. This op chooses to make the
special parameters opaque to the core. This suits the current usage model
where accessing any of the IOMMU device special parameters does require
a userspace driver that matches the kernel driver. If a need for common
parameters, implemented similarly by several drivers, arises then there
is room in the design to grow a generic parameter set as well.

Define enum iommu_hwpt_type to differentiate the user parameters used by
different iommu vendor drivers in the future. For backward compatibility
and the allocations without any specified hwpt_type or user parameters,
add an IOMMU_HWPT_TYPE_DEFAULT in the list.

Also, add a struct iommu_user_data as a package of data_ptr and data_len
from an iommufd core uAPI structure, then provide a helper function for
a driver to run data length sanity and copy a defined hwpt_type specific
driver user data.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c          |  5 ++-
 drivers/iommu/iommufd/hw_pagetable.c |  4 ++-
 drivers/iommu/iommufd/selftest.c     |  5 ++-
 include/linux/iommu.h                | 51 ++++++++++++++++++++++++++--
 include/uapi/linux/iommufd.h         |  8 +++++
 5 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 491bcde1ff96..4ffc939a71f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4075,7 +4075,10 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 }
 
 static struct iommu_domain *
-intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
+intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
+			      enum iommu_hwpt_type hwpt_type,
+			      struct iommu_domain *parent,
+			      const struct iommu_user_data *user_data)
 {
 	struct iommu_domain *domain;
 	struct intel_iommu *iommu;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 26a8a818ffa3..1d7378a6cbb3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -96,7 +96,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	hwpt->ioas = ioas;
 
 	if (ops->domain_alloc_user) {
-		hwpt->domain = ops->domain_alloc_user(idev->dev, flags);
+		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+						      IOMMU_HWPT_TYPE_DEFAULT,
+						      NULL, NULL);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b54cbfb1862b..2205a552e570 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -171,7 +171,10 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
 }
 
 static struct iommu_domain *
-mock_domain_alloc_user(struct device *dev, u32 flags)
+mock_domain_alloc_user(struct device *dev, u32 flags,
+		       enum iommu_hwpt_type hwpt_type,
+		       struct iommu_domain *parent,
+		       const struct iommu_user_data *user_data)
 {
 	struct iommu_domain *domain;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+	void __user *uptr;
+	size_t len;
+};
+
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -237,11 +273,17 @@ struct iommu_iotlb_gather {
  * @domain_alloc: allocate iommu domain
  * @domain_alloc_user: Allocate an iommu domain corresponding to the input
  *                     parameters like flags defined as enum iommufd_ioas_map_flags
+ *                     and the @hwpt_type that is defined as enum iommu_hwpt_type
  *                     in include/uapi/linux/iommufd.h. Different from the
  *                     domain_alloc op, it requires iommu driver to fully
  *                     initialize a new domain including the generic iommu_domain
- *                     struct. Upon success, a domain is returned. Upon failure,
- *                     ERR_PTR must be returned.
+ *                     struct.
+ *                     Upon success, if the @user_data is valid and the @parent
+ *                     points to a kernel-managed domain, the new domain must be
+ *                     IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
+ *                     NULL while the @user_data can be optionally provided, the
+ *                     new domain must be IOMMU_DOMAIN_UNMANAGED type.
+ *                     Upon failure, ERR_PTR must be returned.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -274,7 +316,10 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
-	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags);
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+						  enum iommu_hwpt_type hwpt_type,
+						  struct iommu_domain *parent,
+						  const struct iommu_user_data *user_data);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
 
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
+ */
+enum iommu_hwpt_type {
+	IOMMU_HWPT_TYPE_DEFAULT,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 02/17] iommu: Add nested domain support
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-10-10 17:21   ` Jason Gunthorpe
  2023-09-21  7:51 ` [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct Yi Liu
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new domain type for a user I/O page table, which is nested on
top of another user space address represented by a UNMANAGED domain. The
mappings of a nested domain are managed by user space software, so it is
not necessary to have map/unmap callbacks. But the updates of the PTEs in
the nested page table will be propagated to the hardware caches on both
IOMMU (IOTLB) and devices (DevTLB/ATC).

A nested domain is allocated by the domain_alloc_user op, and attached to
a device through the existing iommu_attach_device/group() interfaces.

Add a new domain op cache_invalidate_user for the userspace to flush the
hardware caches for a nested domain through iommufd. No wrapper for it,
as it's only supposed to be used by iommufd.

Pass in invalidation requests to the cache_invalidate_user op, in form of
a user data array that conatins a number of invalidation entries. Add an
iommu_user_data_array struct and an iommu_copy_user_data_from_array helper
for iommu drivers to walk through the invalidation request array and fetch
the data entry inside.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 59 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 12e12e5563e6..439e295c91a3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -66,6 +66,9 @@ struct iommu_domain_geometry {
 
 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
 
+#define __IOMMU_DOMAIN_NESTED	(1U << 5)  /* User-managed address space nested
+					      on a stage-2 translation        */
+
 #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
 /*
  * This are the possible domain-types
@@ -92,6 +95,7 @@ struct iommu_domain_geometry {
 				 __IOMMU_DOMAIN_DMA_API |	\
 				 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_NESTED	(__IOMMU_DOMAIN_NESTED)
 
 struct iommu_domain {
 	unsigned type;
@@ -241,6 +245,21 @@ struct iommu_user_data {
 	size_t len;
 };
 
+/**
+ * struct iommu_user_data_array - iommu driver specific user space data array
+ * @uptr: Pointer to the user buffer array for copy_from_user()
+ * @entry_len: The fixed-width length of a entry in the array, in bytes
+ * @entry_num: The number of total entries in the array
+ *
+ * A array having a @entry_num number of @entry_len sized entries, each entry is
+ * user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
+ */
+struct iommu_user_data_array {
+	void __user *uptr;
+	size_t entry_len;
+	int entry_num;
+};
+
 /**
  * iommu_copy_user_data - Copy iommu driver specific user space data
  * @dst_data: Pointer to an iommu driver specific user data that is defined in
@@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
 				     src_data->uptr, src_data->len);
 }
 
+/**
+ * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
+ *                                   from an iommu_user_data_array input
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data_array for user space data array
+ * @index: Index to offset the location in the array to copy user data from
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int
+iommu_copy_user_data_from_array(void *dst_data,
+				const struct iommu_user_data_array *src_array,
+				int index, size_t data_len, size_t min_len)
+{
+	struct iommu_user_data src_data;
+
+	if (WARN_ON(!src_array || index >= src_array->entry_num))
+		return -EINVAL;
+	if (!src_array->entry_num)
+		return -EINVAL;
+	src_data.uptr = src_array->uptr + src_array->entry_len * index;
+	src_data.len = src_array->entry_len;
+	return iommu_copy_user_data(dst_data, &src_data, data_len, min_len);
+}
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -374,6 +421,15 @@ struct iommu_ops {
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
+ * @cache_invalidate_user: Flush hardware cache for user space IO page table.
+ *                         The @domain must be IOMMU_DOMAIN_NESTED. The @array
+ *                         passes in the cache invalidation requests, in form
+ *                         of a driver data structure. The driver must update
+ *                         array->entry_num to report the number of handled
+ *                         invalidation requests. The 32-bit @error_code can
+ *                         forward a driver specific error code to user space.
+ *                         Both the driver data structure and the error code
+ *                         must be defined in  include/uapi/linux/iommufd.h
  * @iova_to_phys: translate iova to physical address
  * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
  *                           including no-snoop TLPs on PCIe or other platform
@@ -403,6 +459,9 @@ struct iommu_domain_ops {
 			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
+	int (*cache_invalidate_user)(struct iommu_domain *domain,
+				     struct iommu_user_data_array *array,
+				     u32 *error_code);
 
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    dma_addr_t iova);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
  2023-09-21  7:51 ` [PATCH v4 02/17] iommu: Add nested domain support Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-26  7:46   ` Tian, Kevin
  2023-10-07 10:08   ` Yan Zhao
  2023-09-21  7:51 ` [PATCH v4 04/17] iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc() Yi Liu
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

The struct iommufd_hw_pagetable has been representing a kernel-managed
HWPT, yet soon will be reused to represent a user-managed HWPT. These
two types of HWPTs has the same IOMMUFD object type and an iommu_domain
object, but have quite different attributes/members.

Add a union in struct iommufd_hw_pagetable and group all the existing
kernel-managed members. One of the following patches will add another
struct for user-managed members.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };
 
 struct iommufd_hw_pagetable *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 04/17] iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc()
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (2 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions Yi Liu
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

iommu core already supports accepting user_data and hwpt_type to allocate
a user iommu_domain, which allows the caller iommufd_hw_pagetable_alloc()
to accept the hwpt_type and user_data too. Thus, pass them in.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          |  3 ++-
 drivers/iommu/iommufd/hw_pagetable.c    | 17 ++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e88fa73a45e6..e04900f101f1 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -540,7 +540,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	}
 
 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-					  0, immediate_attach);
+					  0, IOMMU_HWPT_TYPE_DEFAULT,
+					  NULL, immediate_attach);
 	if (IS_ERR(hwpt)) {
 		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 1d7378a6cbb3..554a9c3d740f 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -62,6 +62,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
  * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
@@ -75,6 +77,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, u32 flags,
+			   enum iommu_hwpt_type hwpt_type,
+			   struct iommu_user_data *user_data,
 			   bool immediate_attach)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
@@ -86,6 +90,11 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);
 
+	if (user_data && hwpt_type == IOMMU_HWPT_TYPE_DEFAULT)
+		return ERR_PTR(-EINVAL);
+	if (user_data && !ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
 	if (IS_ERR(hwpt))
 		return hwpt;
@@ -97,8 +106,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
-						      IOMMU_HWPT_TYPE_DEFAULT,
-						      NULL, NULL);
+						      hwpt_type, NULL,
+						      user_data);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
@@ -174,7 +183,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 
 	mutex_lock(&ioas->mutex);
 	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas,
-					  idev, cmd->flags, false);
+					  idev, cmd->flags,
+					  IOMMU_HWPT_TYPE_DEFAULT,
+					  NULL, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 947a797536e3..1d3b1a74e854 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -13,6 +13,7 @@ struct iommu_domain;
 struct iommu_group;
 struct iommu_option;
 struct iommufd_device;
+struct iommu_user_data;
 
 struct iommufd_ctx {
 	struct file *file;
@@ -248,6 +249,8 @@ struct iommufd_hw_pagetable {
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, u32 flags,
+			   enum iommu_hwpt_type hwpt_type,
+			   struct iommu_user_data *user_data,
 			   bool immediate_attach);
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (3 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 04/17] iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc() Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-26  7:57   ` Tian, Kevin
  2023-10-10 18:49   ` Jason Gunthorpe
  2023-09-21  7:51 ` [PATCH v4 06/17] iommufd: Add shared alloc_fn function pointer and mutex pointer Yi Liu
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

As one of the previous commits mentioned, a user-managed HWPT will have
some different attributes/members. It'd be more clear by having separate
allocators. Since the existing iommufd_hw_pagetable_alloc() serves well
kernel-managed HWPTs, apply some minimal updates to mark it as a kernel-
managed HWPT allocator.

Also, add a pair of function pointers (abort and destroy) in the struct,
to separate different cleanup routines. Then rename the existing cleanup
functions to iommufd_kernel_managed_hwpt_destroy/abort() linked to the
HWPT in the allocator.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 34 ++++++++++++++++++++-----
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 554a9c3d740f..1cc7178121d1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,7 +8,7 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
-void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -27,7 +27,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	refcount_dec(&hwpt->ioas->obj.users);
 }
 
-void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
+}
+
+static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -42,6 +47,11 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
 	iommufd_hw_pagetable_destroy(obj);
 }
 
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
+}
+
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->enforce_cache_coherency)
@@ -57,7 +67,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 }
 
 /**
- * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
+ * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
@@ -66,9 +76,9 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
- * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
- * will be linked to the given ioas and upon return the underlying iommu_domain
- * is fully popoulated.
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_UNMANAGED) and return it as
+ * a kernel-managed hw_pagetable. The HWPT will be linked to the given ioas and
+ * upon return the underlying iommu_domain is fully popoulated.
  *
  * The caller must hold the ioas->mutex until after
  * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
@@ -103,6 +113,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
+	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;
 
 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
@@ -121,6 +133,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		}
 	}
 
+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	/* Driver is buggy by mixing user-managed op in kernel-managed ops */
+	if (WARN_ON_ONCE(hwpt->domain->ops->cache_invalidate_user)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+
 	/*
 	 * Set the coherency mode before we do iopt_table_add_domain() as some
 	 * iommus have a per-PTE bit that controls it and need to decide before
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1d3b1a74e854..3e89c3d530f3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
 
+	void (*abort)(struct iommufd_object *obj);
+	void (*destroy)(struct iommufd_object *obj);
+
 	union {
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 06/17] iommufd: Add shared alloc_fn function pointer and mutex pointer
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (4 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support Yi Liu
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

This allows iommufd_hwpt_alloc() to have a common routine but jump to a
different allocator and hold a different mutex, corresponding to types
of HWPT allocation (either kernel-managed or user-managed). This shared
function pointer takes "pt_obj" as an input that would be coverted into
an IOAS pointer or a parent HWPT pointer.

Then, update the kernel-managed allocator to follow this pt_obj change.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          |  2 +-
 drivers/iommu/iommufd/hw_pagetable.c    | 46 ++++++++++++++++++-------
 drivers/iommu/iommufd/iommufd_private.h |  3 +-
 3 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e04900f101f1..eb120f70a3e3 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -539,7 +539,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		goto out_unlock;
 	}
 
-	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
+	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, &ioas->obj, idev,
 					  0, IOMMU_HWPT_TYPE_DEFAULT,
 					  NULL, immediate_attach);
 	if (IS_ERR(hwpt)) {
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 1cc7178121d1..b2af68776877 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -69,7 +69,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 /**
  * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device
  * @ictx: iommufd context
- * @ioas: IOAS to associate the domain with
+ * @pt_obj: An object to an IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
  * @flags: Flags from userspace
  * @hwpt_type: Requested type of hw_pagetable
@@ -85,12 +85,15 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * the returned hwpt.
  */
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
+iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx,
+			   struct iommufd_object *pt_obj,
 			   struct iommufd_device *idev, u32 flags,
 			   enum iommu_hwpt_type hwpt_type,
 			   struct iommu_user_data *user_data,
 			   bool immediate_attach)
 {
+	struct iommufd_ioas *ioas =
+		container_of(pt_obj, struct iommufd_ioas, obj);
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -184,10 +187,19 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
+	struct iommufd_hw_pagetable *(*alloc_fn)(
+					struct iommufd_ctx *ictx,
+					struct iommufd_object *pt_obj,
+					struct iommufd_device *idev,
+					u32 flags, enum iommu_hwpt_type type,
+					struct iommu_user_data *user_data,
+					bool flag);
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
 	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_object *pt_obj;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
+	struct mutex *mutex;
 	int rc;
 
 	if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
@@ -197,17 +209,26 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 
-	ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id);
-	if (IS_ERR(ioas)) {
-		rc = PTR_ERR(ioas);
+	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj)) {
+		rc = -EINVAL;
 		goto out_put_idev;
 	}
 
-	mutex_lock(&ioas->mutex);
-	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas,
-					  idev, cmd->flags,
-					  IOMMU_HWPT_TYPE_DEFAULT,
-					  NULL, false);
+	switch (pt_obj->type) {
+	case IOMMUFD_OBJ_IOAS:
+		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
+		mutex = &ioas->mutex;
+		alloc_fn = iommufd_hw_pagetable_alloc;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out_put_pt;
+	}
+
+	mutex_lock(mutex);
+	hwpt = alloc_fn(ucmd->ictx, pt_obj, idev, cmd->flags,
+			IOMMU_HWPT_TYPE_DEFAULT, NULL, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
@@ -223,8 +244,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 out_hwpt:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
 out_unlock:
-	mutex_unlock(&ioas->mutex);
-	iommufd_put_object(&ioas->obj);
+	mutex_unlock(mutex);
+out_put_pt:
+	iommufd_put_object(pt_obj);
 out_put_idev:
 	iommufd_put_object(&idev->obj);
 	return rc;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3e89c3d530f3..e4d06ae6b0c5 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -250,7 +250,8 @@ struct iommufd_hw_pagetable {
 };
 
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
+iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx,
+			   struct iommufd_object *pt_obj,
 			   struct iommufd_device *idev, u32 flags,
 			   enum iommu_hwpt_type hwpt_type,
 			   struct iommu_user_data *user_data,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (5 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 06/17] iommufd: Add shared alloc_fn function pointer and mutex pointer Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-26  8:14   ` Tian, Kevin
  2023-09-21  7:51 ` [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains Yi Liu
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

Add a parent hw_pagetable pointer for user-managed hw_pagetables. Similar
to the ioas->mutex, add another mutex in the kernel-managed hw_pagetable
to serialize associating user-managed hw_pagetable allocations. Then, add
user_managed flag too in the struct to ease identifying a HWPT.

Also, add a new allocator iommufd_user_managed_hwpt_alloc() and two pairs
of cleanup functions iommufd_user_managed_hwpt_destroy/abort().

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 112 +++++++++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h |   6 ++
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index b2af68776877..dc3e11a23acf 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,17 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static void iommufd_user_managed_hwpt_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_hw_pagetable *hwpt =
+		container_of(obj, struct iommufd_hw_pagetable, obj);
+
+	if (hwpt->domain)
+		iommu_domain_free(hwpt->domain);
+
+	refcount_dec(&hwpt->parent->obj.users);
+}
+
 static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
@@ -32,6 +43,17 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
 }
 
+static void iommufd_user_managed_hwpt_abort(struct iommufd_object *obj)
+{
+	struct iommufd_hw_pagetable *hwpt =
+		container_of(obj, struct iommufd_hw_pagetable, obj);
+
+	/* The parent->mutex must be held until finalize is called. */
+	lockdep_assert_held(&hwpt->parent->mutex);
+
+	iommufd_hw_pagetable_destroy(obj);
+}
+
 static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
@@ -52,6 +74,82 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
 	container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
 }
 
+/**
+ * iommufd_user_managed_hwpt_alloc() - Get a user-managed hw_pagetable
+ * @ictx: iommufd context
+ * @pt_obj: Parent object to an HWPT to associate the domain with
+ * @idev: Device to get an iommu_domain for
+ * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: user_data pointer
+ * @dummy: never used
+ *
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as
+ * a user-managed hw_pagetable.
+ */
+static struct iommufd_hw_pagetable *
+iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx,
+				struct iommufd_object *pt_obj,
+				struct iommufd_device *idev,
+				u32 flags,
+				enum iommu_hwpt_type hwpt_type,
+				struct iommu_user_data *user_data,
+				bool dummy)
+{
+	struct iommufd_hw_pagetable *parent =
+		container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	struct iommufd_hw_pagetable *hwpt;
+	int rc;
+
+	if (!user_data)
+		return ERR_PTR(-EINVAL);
+	if (parent->auto_domain)
+		return ERR_PTR(-EINVAL);
+	if (!parent->nest_parent)
+		return ERR_PTR(-EINVAL);
+	if (hwpt_type == IOMMU_HWPT_TYPE_DEFAULT)
+		return ERR_PTR(-EINVAL);
+
+	if (!ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	lockdep_assert_held(&parent->mutex);
+
+	hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
+	if (IS_ERR(hwpt))
+		return hwpt;
+
+	refcount_inc(&parent->obj.users);
+	hwpt->parent = parent;
+	hwpt->user_managed = true;
+	hwpt->abort = iommufd_user_managed_hwpt_abort;
+	hwpt->destroy = iommufd_user_managed_hwpt_destroy;
+
+	hwpt->domain = ops->domain_alloc_user(idev->dev, flags, hwpt_type,
+					      parent->domain, user_data);
+	if (IS_ERR(hwpt->domain)) {
+		rc = PTR_ERR(hwpt->domain);
+		hwpt->domain = NULL;
+		goto out_abort;
+	}
+
+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	/* Driver is buggy by missing cache_invalidate_user in domain_ops */
+	if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	return hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
+	return ERR_PTR(rc);
+}
+
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->enforce_cache_coherency)
@@ -112,10 +210,12 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx,
 	if (IS_ERR(hwpt))
 		return hwpt;
 
+	mutex_init(&hwpt->mutex);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	hwpt->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
 	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;
 
@@ -194,8 +294,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 					u32 flags, enum iommu_hwpt_type type,
 					struct iommu_user_data *user_data,
 					bool flag);
+	struct iommufd_hw_pagetable *hwpt, *parent;
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
-	struct iommufd_hw_pagetable *hwpt;
 	struct iommufd_object *pt_obj;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
@@ -221,6 +321,16 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		mutex = &ioas->mutex;
 		alloc_fn = iommufd_hw_pagetable_alloc;
 		break;
+	case IOMMUFD_OBJ_HW_PAGETABLE:
+		parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+		/* No user-managed HWPT on top of an user-managed one */
+		if (parent->user_managed) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+		mutex = &parent->mutex;
+		alloc_fn = iommufd_user_managed_hwpt_alloc;
+		break;
 	default:
 		rc = -EINVAL;
 		goto out_put_pt;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index e4d06ae6b0c5..34940596c2c2 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -237,12 +237,18 @@ struct iommufd_hw_pagetable {
 	void (*abort)(struct iommufd_object *obj);
 	void (*destroy)(struct iommufd_object *obj);
 
+	bool user_managed : 1;
 	union {
+		struct { /* user-managed */
+			struct iommufd_hw_pagetable *parent;
+		};
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
+			struct mutex mutex;
 			bool auto_domain : 1;
 			bool enforce_cache_coherency : 1;
 			bool msi_cookie : 1;
+			bool nest_parent : 1;
 			/* Head at iommufd_ioas::hwpt_list */
 			struct list_head hwpt_item;
 		};
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (6 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-26  8:16   ` Tian, Kevin
  2023-09-21  7:51 ` [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions Yi Liu
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things.
So, they should be only setup on kernel-managed domains. If the attaching
domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
it correctly.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c       | 4 ++++
 drivers/iommu/iommufd/hw_pagetable.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index eb120f70a3e3..104dd061a2a3 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -305,12 +305,16 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	 * domain after request_irq(). If it is not done interrupts will not
 	 * work on this domain.
 	 *
+	 * Note: always set up a msi_cookie on a kernel-manage hw_pagetable.
+	 *
 	 * FIXME: This is conceptually broken for iommufd since we want to allow
 	 * userspace to change the domains, eg switch from an identity IOAS to a
 	 * DMA IOAS. There is currently no way to create a MSI window that
 	 * matches what the IRQ layer actually expects in a newly created
 	 * domain.
 	 */
+	if (hwpt->user_managed)
+		hwpt = hwpt->parent;
 	if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
 		rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
 		if (rc)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index dc3e11a23acf..90fd65859e28 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -152,6 +152,10 @@ iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx,
 
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
+	/* Always enforce cache coherency on a kernel-managed hw_pagetable */
+	if (hwpt->user_managed)
+		hwpt = hwpt->parent;
+
 	if (hwpt->enforce_cache_coherency)
 		return 0;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (7 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-10-07  7:20   ` Yan Zhao
  2023-09-21  7:51 ` [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data Yi Liu
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

The iopt_table_enforce_dev_resv_regions() and iopt_remove_reserved_iova()
require callers to pass in an ioas->iopt pointer. It simply works with a
kernel-managed hw_pagetable by passing in its hwpt->ioas->iopt pointer.
However, now there could be a user-managed hw_pagetable that doesn't have
an ioas pointer. And typically most of device reserved regions should be
enforced to a kernel-managed domain only, although the IOMMU_RESV_SW_MSI
used by SMMU will introduce some complication.

Add a pair of iommufd_device_enforce_rr/iommufd_device_remove_rr helpers
that calls iopt_table_enforce_dev_resv_regions/iopt_remove_reserved_iova
functions after some additional checks. This would also ease any further
extension to support the IOMMU_RESV_SW_MSI complication mentioned above.

For the replace() routine, add another helper to compare ioas pointers,
with the support of user-managed hw_pagetables.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 42 ++++++++++++++++++-------
 drivers/iommu/iommufd/iommufd_private.h | 18 +++++++++++
 2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 104dd061a2a3..10e6ec590ede 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -329,6 +329,28 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	return 0;
 }
 
+static void iommufd_device_remove_rr(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt)
+{
+	if (WARN_ON(!hwpt))
+		return;
+	if (hwpt->user_managed)
+		return;
+	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+}
+
+static int iommufd_device_enforce_rr(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt,
+				     phys_addr_t *sw_msi_start)
+{
+	if (WARN_ON(!hwpt))
+		return -EINVAL;
+	if (hwpt->user_managed)
+		return 0;
+	return iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
+						   sw_msi_start);
+}
+
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev)
 {
@@ -348,8 +370,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 			goto err_unlock;
 	}
 
-	rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
-						 &idev->igroup->sw_msi_start);
+	rc = iommufd_device_enforce_rr(idev, hwpt, &idev->igroup->sw_msi_start);
 	if (rc)
 		goto err_unlock;
 
@@ -375,7 +396,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	mutex_unlock(&idev->igroup->lock);
 	return 0;
 err_unresv:
-	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+	iommufd_device_remove_rr(idev, hwpt);
 err_unlock:
 	mutex_unlock(&idev->igroup->lock);
 	return rc;
@@ -392,7 +413,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 		iommu_detach_group(hwpt->domain, idev->igroup->group);
 		idev->igroup->hwpt = NULL;
 	}
-	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+	iommufd_device_remove_rr(idev, hwpt);
 	mutex_unlock(&idev->igroup->lock);
 
 	/* Caller must destroy hwpt */
@@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	}
 
 	old_hwpt = igroup->hwpt;
-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item) {
-			rc = iopt_table_enforce_dev_resv_regions(
-				&hwpt->ioas->iopt, cur->dev, NULL);
+			rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
 			if (rc)
 				goto err_unresv;
 		}
@@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	if (rc)
 		goto err_unresv;
 
-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item)
-			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
-						  cur->dev);
+			iommufd_device_remove_rr(cur, hwpt);
 	}
-
 	igroup->hwpt = hwpt;
 
 	/*
@@ -483,7 +501,7 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	return old_hwpt;
 err_unresv:
 	list_for_each_entry(cur, &igroup->device_list, group_item)
-		iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev);
+		iommufd_device_remove_rr(cur, hwpt);
 err_unlock:
 	mutex_unlock(&idev->igroup->lock);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 34940596c2c2..b14f23d3f42e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -281,6 +281,24 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 		refcount_dec(&hwpt->obj.users);
 }
 
+static inline bool
+iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt,
+				  struct iommufd_hw_pagetable *new_hwpt)
+{
+	struct iommufd_ioas *old_ioas, *new_ioas;
+
+	WARN_ON(!old_hwpt || !new_hwpt);
+	if (old_hwpt->user_managed)
+		old_ioas = old_hwpt->parent->ioas;
+	else
+		old_ioas = old_hwpt->ioas;
+	if (new_hwpt->user_managed)
+		new_ioas = new_hwpt->parent->ioas;
+	else
+		new_ioas = new_hwpt->ioas;
+	return old_ioas != new_ioas;
+}
+
 struct iommufd_group {
 	struct kref ref;
 	struct mutex lock;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (8 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-10-13 15:19   ` Jason Gunthorpe
  2023-09-21  7:51 ` [PATCH v4 11/17] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
But it can only allocate a hw_pagetable that associates to a given IOAS,
i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables, for two-stage
translation use cases, that require user data input from the user space.

Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
type specified user data. Also, update the @pt_id to accept hwpt_id too
besides an ioas_id. Then, pass them to the downstream alloc_fn().

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
 include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 90fd65859e28..ab25de149ae6 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -300,6 +300,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 					bool flag);
 	struct iommufd_hw_pagetable *hwpt, *parent;
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
+	struct iommu_user_data *data = NULL;
 	struct iommufd_object *pt_obj;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
@@ -308,6 +309,11 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 
 	if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
 		return -EOPNOTSUPP;
+	if (!cmd->data_len && cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT)
+		return -EINVAL;
+	if (cmd->flags & IOMMU_HWPT_ALLOC_NEST_PARENT &&
+	    cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT)
+		return -EINVAL;
 
 	idev = iommufd_get_device(ucmd, cmd->dev_id);
 	if (IS_ERR(idev))
@@ -340,9 +346,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->data_len) {
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data) {
+			rc = -ENOMEM;
+			goto out_put_pt;
+		}
+		data->uptr = u64_to_user_ptr(cmd->data_uptr);
+		data->len = cmd->data_len;
+	}
+
 	mutex_lock(mutex);
 	hwpt = alloc_fn(ucmd->ictx, pt_obj, idev, cmd->flags,
-			IOMMU_HWPT_TYPE_DEFAULT, NULL, false);
+			cmd->hwpt_type, data, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
@@ -359,6 +375,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
 out_unlock:
 	mutex_unlock(mutex);
+	kfree(data);
 out_put_pt:
 	iommufd_put_object(pt_obj);
 out_put_idev:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 3c8660fe9bb1..c46b1f772f20 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -370,15 +370,31 @@ enum iommu_hwpt_type {
  * @size: sizeof(struct iommu_hwpt_alloc)
  * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS to connect this HWPT to
+ * @pt_id: The IOAS or HWPT to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @hwpt_type: One of enum iommu_hwpt_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
- * A HWPT will be created with the IOVA mappings from the given IOAS.
+ * A kernel-managed HWPT will be created with the mappings from the given
+ * IOAS via the @pt_id. The @hwpt_type for this allocation can be set to
+ * either IOMMU_HWPT_TYPE_DEFAULT or a pre-defined type corresponding to
+ * an I/O page table type supported by the underlying IOMMU hardware.
+ *
+ * A user-managed HWPT will be created from a given parent HWPT via the
+ * @pt_id, in which the parent HWPT must be allocated previously via the
+ * same ioctl from a given IOAS (@pt_id). In this case, the @hwpt_type
+ * must be set to a pre-defined type corresponding to an I/O page table
+ * type supported by the underlying IOMMU hardware.
+ *
+ * If the @hwpt_type is set to IOMMU_HWPT_TYPE_DEFAULT, @data_len and
+ * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
+ * must be given.
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -387,6 +403,9 @@ struct iommu_hwpt_alloc {
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 11/17] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (9 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 12/17] iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly Yi Liu
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

In nested translation, the stage-1 page table is user-managed but cached
by the IOMMU hardware, so an update on present page table entries in the
stage-1 page table should be followed with a cache invalidation.

Add an IOMMU_HWPT_INVALIDATE ioctl to support such a cache invalidation.
It takes hwpt_id to specify the iommu_domain, and a multi-entry array to
support multiple invalidation requests in one ioctl.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 33 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  9 +++++++
 drivers/iommu/iommufd/main.c            |  3 +++
 include/uapi/linux/iommufd.h            | 29 ++++++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index ab25de149ae6..72c46de1396b 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -382,3 +382,36 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(&idev->obj);
 	return rc;
 }
+
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
+	struct iommu_user_data_array data_array = {
+		.uptr = u64_to_user_ptr(cmd->reqs_uptr),
+		.entry_len = cmd->req_len,
+		.entry_num = cmd->req_num,
+	};
+	struct iommufd_hw_pagetable *hwpt;
+	int rc = 0;
+
+	if (!cmd->req_len || !cmd->req_num)
+		return -EOPNOTSUPP;
+
+	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	if (!hwpt->user_managed) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, &data_array,
+						      &cmd->out_driver_error_code);
+	cmd->req_num = data_array.entry_num;
+	if (iommufd_ucmd_respond(ucmd, sizeof(*cmd)))
+		return -EFAULT;
+out_put_hwpt:
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b14f23d3f42e..bdbc8dac2fd8 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -270,6 +270,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
 
 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 					    struct iommufd_hw_pagetable *hwpt)
@@ -281,6 +282,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 		refcount_dec(&hwpt->obj.users);
 }
 
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_HW_PAGETABLE),
+			    struct iommufd_hw_pagetable, obj);
+}
+
 static inline bool
 iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt,
 				  struct iommufd_hw_pagetable *new_hwpt)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e71523cbd0de..d9d82a413105 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -307,6 +307,7 @@ union ucmd_buffer {
 	struct iommu_destroy destroy;
 	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
+	struct iommu_hwpt_invalidate cache;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
@@ -342,6 +343,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 __reserved),
+	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
+		 struct iommu_hwpt_invalidate, out_driver_error_code),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c46b1f772f20..2083a0309a9b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_GET_HW_INFO,
+	IOMMUFD_CMD_HWPT_INVALIDATE,
 };
 
 /**
@@ -478,4 +479,32 @@ struct iommu_hw_info {
 	__u32 __reserved;
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
+
+/**
+ * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
+ * @size: sizeof(struct iommu_hwpt_invalidate)
+ * @hwpt_id: HWPT ID of target hardware page table for cache invalidation
+ * @reqs_uptr: User pointer to an array having @req_num of cache invalidation
+ *             requests. The request entries in the array are of fixed width
+ *             @req_len, and contain a user data structure for invalidation
+ *             request specific to the given hardware page table.
+ * @req_len: Length (in bytes) of a request entry in the request array
+ * @req_num: Input the number of cache invalidation requests in the array.
+ *           Output the number of requests successfully handled by kernel.
+ * @out_driver_error_code: Report a driver speicifc error code upon failure
+ *
+ * Invalidate the iommu cache for user-managed page table. Modifications on a
+ * user-managed page table should be followed by this operation to sync cache.
+ * Each ioctl can support one or more cache invalidation requests in the array
+ * that has a total size of @req_len * @req_num.
+ */
+struct iommu_hwpt_invalidate {
+	__u32 size;
+	__u32 hwpt_id;
+	__aligned_u64 reqs_uptr;
+	__u32 req_len;
+	__u32 req_num;
+	__u32 out_driver_error_code;
+};
+#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 12/17] iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (10 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 11/17] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 13/17] iommufd/selftest: Add nested domain allocation for mock domain Yi Liu
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

TEST_LENGTH passing ".size = sizeof(struct _struct) - 1" expects -EINVAL
from "if (ucmd.user_size < op->min_size)" check in iommufd_fops_ioctl().
This has been working when min_size is exactly the size of the structure.

However, if the size of the structure becomes larger than min_size, i.e.
the passing size above is larger than min_size, that min_size sanity no
longer works.

Since the first test in TEST_LENGTH() was to test that min_size sanity
routine, rework it to support a min_size calculation, rather than using
the full size of the structure.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c | 29 ++++++++++++++-----------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 9c129e63d7c7..7a29d68bd1d2 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -86,12 +86,13 @@ TEST_F(iommufd, cmd_fail)
 
 TEST_F(iommufd, cmd_length)
 {
-#define TEST_LENGTH(_struct, _ioctl)                                     \
+#define TEST_LENGTH(_struct, _ioctl, _last)                              \
 	{                                                                \
+		size_t min_size = offsetofend(struct _struct, _last);    \
 		struct {                                                 \
 			struct _struct cmd;                              \
 			uint8_t extra;                                   \
-		} cmd = { .cmd = { .size = sizeof(struct _struct) - 1 }, \
+		} cmd = { .cmd = { .size = min_size - 1 },               \
 			  .extra = UINT8_MAX };                          \
 		int old_errno;                                           \
 		int rc;                                                  \
@@ -112,17 +113,19 @@ TEST_F(iommufd, cmd_length)
 		}                                                        \
 	}
 
-	TEST_LENGTH(iommu_destroy, IOMMU_DESTROY);
-	TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO);
-	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC);
-	TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC);
-	TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES);
-	TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS);
-	TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP);
-	TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY);
-	TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP);
-	TEST_LENGTH(iommu_option, IOMMU_OPTION);
-	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
+	TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id);
+	TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved);
+	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved);
+	TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id);
+	TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES,
+		    out_iova_alignment);
+	TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS,
+		    allowed_iovas);
+	TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP, iova);
+	TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY, src_iova);
+	TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length);
+	TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
+	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
 #undef TEST_LENGTH
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 13/17] iommufd/selftest: Add nested domain allocation for mock domain
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (11 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 12/17] iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 14/17] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs Yi Liu
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

Add nested domain support in the ->domain_alloc_user op with some proper
sanity checks. Then, add a domain_nested_ops for all nested domains.

Also, add an iotlb as a testing property of a nested domain. A following
patch will verify its value for the success of a nested domain allocation
and a cache invalidation request.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  18 +++++
 drivers/iommu/iommufd/selftest.c     | 114 ++++++++++++++++++++++++---
 2 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 3f3644375bf1..7f997234a1a8 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -40,6 +40,11 @@ enum {
 	MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
 };
 
+enum {
+	MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3,
+	MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
+};
+
 struct iommu_test_cmd {
 	__u32 size;
 	__u32 op;
@@ -109,4 +114,17 @@ struct iommu_test_hw_info {
 	__u32 test_reg;
 };
 
+/* Should not be equal to any defined value in enum iommu_hwpt_type */
+#define IOMMU_HWPT_TYPE_SELFTEST		0xdead
+
+/**
+ * struct iommu_hwpt_selftest
+ *
+ * @iotlb: default mock iotlb value, IOMMU_TEST_IOTLB_DEFAULT
+ */
+struct iommu_hwpt_selftest {
+#define IOMMU_TEST_IOTLB_DEFAULT	0xbadbeef
+	__u32 iotlb;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 2205a552e570..bd967317927f 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -88,6 +88,17 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
 struct mock_iommu_domain {
 	struct iommu_domain domain;
 	struct xarray pfns;
+	bool nested : 1;
+	/* mock domain test data */
+	union {
+		struct { /* nested */
+			struct mock_iommu_domain *parent;
+			u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM];
+		};
+		struct { /* parent */
+			enum iommu_hwpt_type hwpt_type;
+		};
+	};
 };
 
 enum selftest_obj_type {
@@ -147,8 +158,12 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type)
 }
 
 static const struct iommu_ops mock_ops;
+static struct iommu_domain_ops domain_nested_ops;
 
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+static struct iommu_domain *
+__mock_domain_alloc_kernel(unsigned int iommu_domain_type,
+			   struct mock_iommu_domain *dummy,
+			   const struct iommu_hwpt_selftest *user_cfg)
 {
 	struct mock_iommu_domain *mock;
 
@@ -156,11 +171,11 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
 		return &mock_blocking_domain;
 
 	if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
 	if (!mock)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
 	mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
 	mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
@@ -170,18 +185,91 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
 	return &mock->domain;
 }
 
+static struct iommu_domain *
+__mock_domain_alloc_nested(unsigned int iommu_domain_type,
+			   struct mock_iommu_domain *mock_parent,
+			   const struct iommu_hwpt_selftest *user_cfg)
+{
+	struct mock_iommu_domain *mock;
+	int i;
+
+	if (iommu_domain_type != IOMMU_DOMAIN_NESTED)
+		return ERR_PTR(-EINVAL);
+
+	if (!user_cfg)
+		return ERR_PTR(-EINVAL);
+
+	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
+	if (!mock)
+		return ERR_PTR(-ENOMEM);
+	mock->nested = true;
+	mock->parent = mock_parent;
+	mock->domain.type = iommu_domain_type;
+	mock->domain.ops = &domain_nested_ops;
+	for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++)
+		mock->iotlb[i] = user_cfg->iotlb;
+	return &mock->domain;
+}
+
+static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+{
+	struct iommu_domain *domain;
+
+	if (iommu_domain_type != IOMMU_DOMAIN_BLOCKED &&
+	    iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)
+		return NULL;
+	domain = __mock_domain_alloc_kernel(iommu_domain_type, NULL, NULL);
+	if (IS_ERR(domain))
+		domain = NULL;
+	return domain;
+}
+
 static struct iommu_domain *
 mock_domain_alloc_user(struct device *dev, u32 flags,
 		       enum iommu_hwpt_type hwpt_type,
 		       struct iommu_domain *parent,
 		       const struct iommu_user_data *user_data)
 {
-	struct iommu_domain *domain;
+	struct iommu_domain *(*alloc_fn)(unsigned int iommu_domain_type,
+					 struct mock_iommu_domain *mock_parent,
+					 const struct iommu_hwpt_selftest *user_cfg);
+	unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED;
+	struct iommu_hwpt_selftest data, *user_cfg = NULL;
+	struct mock_iommu_domain *mock_parent = NULL;
+	size_t min_len, data_len;
+
+	switch (hwpt_type) {
+	case IOMMU_HWPT_TYPE_DEFAULT:
+		if (user_data || parent)
+			return ERR_PTR(-EINVAL);
+		min_len = data_len = 0;
+		alloc_fn = __mock_domain_alloc_kernel;
+		break;
+	default:
+		/* IOMMU_HWPT_TYPE_SELFTEST cannot be a case for a big value */
+		if (hwpt_type != IOMMU_HWPT_TYPE_SELFTEST)
+			return ERR_PTR(-EINVAL);
+		if (!user_data || !parent ||
+		    parent->ops != mock_ops.default_domain_ops)
+			return ERR_PTR(-EINVAL);
+		iommu_domain_type = IOMMU_DOMAIN_NESTED;
+		mock_parent = container_of(parent,
+					   struct mock_iommu_domain, domain);
+		min_len = offsetofend(struct iommu_hwpt_selftest, iotlb);
+		data_len = sizeof(struct iommu_hwpt_selftest);
+		alloc_fn = __mock_domain_alloc_nested;
+		break;
+	}
 
-	domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
-	if (!domain)
-		domain = ERR_PTR(-ENOMEM);
-	return domain;
+	if (user_data) {
+		int rc = iommu_copy_user_data(&data, user_data,
+					      data_len, min_len);
+		if (rc)
+			return ERR_PTR(rc);
+		user_cfg = &data;
+	}
+
+	return alloc_fn(iommu_domain_type, mock_parent, user_cfg);
 }
 
 static void mock_domain_free(struct iommu_domain *domain)
@@ -340,6 +428,11 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static struct iommu_domain_ops domain_nested_ops = {
+	.free = mock_domain_free,
+	.attach_dev = mock_domain_nop_attach,
+};
+
 static inline struct iommufd_hw_pagetable *
 get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
 		 struct mock_iommu_domain **mock)
@@ -352,7 +445,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 	hwpt = container_of(obj, struct iommufd_hw_pagetable, obj);
-	if (hwpt->domain->ops != mock_ops.default_domain_ops) {
+	if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED &&
+	     hwpt->domain->ops != mock_ops.default_domain_ops) ||
+	    (hwpt->domain->type == IOMMU_DOMAIN_NESTED &&
+	     hwpt->domain->ops != &domain_nested_ops)) {
 		iommufd_put_object(&hwpt->obj);
 		return ERR_PTR(-EINVAL);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 14/17] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (12 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 13/17] iommufd/selftest: Add nested domain allocation for mock domain Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 15/17] iommufd/selftest: Add mock_domain_cache_invalidate_user support Yi Liu
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a
user-managed domain for nested HWPTs. Add its coverage for that. Also,
update _test_cmd_hwpt_alloc() and add test_cmd/err_hwpt_alloc_nested().

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 115 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |   3 +-
 tools/testing/selftests/iommu/iommufd_utils.h |  31 +++--
 3 files changed, 141 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 7a29d68bd1d2..db5e59e4abab 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -264,6 +264,121 @@ TEST_F(iommufd_ioas, ioas_destroy)
 	}
 }
 
+TEST_F(iommufd_ioas, alloc_hwpt_nested)
+{
+	const uint32_t min_data_len =
+		offsetofend(struct iommu_hwpt_selftest, iotlb);
+	struct iommu_hwpt_selftest data = {
+		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	uint32_t nested_hwpt_id[2] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t parent_hwpt_id_not_work = 0;
+	uint32_t test_hwpt_id = 0;
+
+	if (self->device_id) {
+		/* Negative tests */
+		test_err_hwpt_alloc(ENOENT, self->ioas_id, self->device_id,
+				    0, &test_hwpt_id);
+		test_err_hwpt_alloc(EINVAL, self->device_id,
+				    self->device_id, 0, &test_hwpt_id);
+
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+				    IOMMU_HWPT_ALLOC_NEST_PARENT,
+				    &parent_hwpt_id);
+
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+				    0, &parent_hwpt_id_not_work);
+
+		/* Negative nested tests */
+		test_err_hwpt_alloc_nested(EINVAL,
+					   self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_DEFAULT,
+					   &data, sizeof(data));
+		test_err_hwpt_alloc_nested(EINVAL,
+					   self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, min_data_len - 1);
+		test_err_hwpt_alloc_nested(EFAULT,
+					   self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   NULL, sizeof(data));
+		test_err_hwpt_alloc_nested(EINVAL,
+					   self->device_id, parent_hwpt_id,
+					   IOMMU_HWPT_ALLOC_NEST_PARENT,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+		test_err_hwpt_alloc_nested(EINVAL, self->device_id,
+					   parent_hwpt_id_not_work,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+
+		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[1],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+
+		/* Negative test: a nested hwpt on top of a nested hwpt */
+		test_err_hwpt_alloc_nested(EINVAL,
+					   self->device_id, nested_hwpt_id[0],
+					   0, &test_hwpt_id,
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+		/* Negative test: parent hwpt now cannot be freed */
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
+
+		/* Attach device to nested_hwpt_id[0] that then will be busy */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[0]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[0]));
+
+		/* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[1]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
+		test_ioctl_destroy(nested_hwpt_id[0]);
+
+		/* Detach from nested_hwpt_id[1] and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+		test_ioctl_destroy(nested_hwpt_id[1]);
+
+		/* Detach from the parent hw_pagetable and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(parent_hwpt_id);
+		test_ioctl_destroy(parent_hwpt_id_not_work);
+	} else {
+		test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id,
+				    0, &parent_hwpt_id);
+		test_err_hwpt_alloc_nested(ENOENT,
+					   self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[0],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+		test_err_hwpt_alloc_nested(ENOENT,
+					   self->device_id, parent_hwpt_id,
+					   0, &nested_hwpt_id[1],
+					   IOMMU_HWPT_TYPE_SELFTEST,
+					   &data, sizeof(data));
+		test_err_mock_domain_replace(ENOENT,
+					     self->stdev_id, nested_hwpt_id[0]);
+		test_err_mock_domain_replace(ENOENT,
+					     self->stdev_id, nested_hwpt_id[1]);
+	}
+}
+
 TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 3d7838506bfe..d3f47f262c04 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,8 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info)))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id))
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+				 IOMMU_HWPT_TYPE_DEFAULT, 0, 0))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index be4970a84977..21d7c7e53bd4 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -103,13 +103,17 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 							   pt_id, NULL))
 
 static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
-				__u32 flags, __u32 *hwpt_id)
+				__u32 flags, __u32 *hwpt_id, __u32 hwpt_type,
+				void *data, size_t data_len)
 {
 	struct iommu_hwpt_alloc cmd = {
 		.size = sizeof(cmd),
 		.flags = flags,
 		.dev_id = device_id,
 		.pt_id = pt_id,
+		.hwpt_type = hwpt_type,
+		.data_len = data_len,
+		.data_uptr = (uint64_t)data,
 	};
 	int ret;
 
@@ -121,12 +125,25 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 	return 0;
 }
 
-#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, \
-					  pt_id, flags, hwpt_id))
-#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \
-	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \
-						  pt_id, flags, hwpt_id))
+#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id)                \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+					  hwpt_id, IOMMU_HWPT_TYPE_DEFAULT,  \
+					  NULL, 0))
+#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id)         \
+	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, \
+						  flags, hwpt_id,             \
+						  IOMMU_HWPT_TYPE_DEFAULT,    \
+						  NULL, 0))
+
+#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id,          \
+				   hwpt_type, data, data_len)                 \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,  \
+					  hwpt_id, hwpt_type, data, data_len))
+#define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id,  \
+				   hwpt_type, data, data_len)                 \
+	EXPECT_ERRNO(_errno,                                                  \
+		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,  \
+					  hwpt_id, hwpt_type, data, data_len))
 
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 15/17] iommufd/selftest: Add mock_domain_cache_invalidate_user support
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (13 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 14/17] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 16/17] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

Add mock_domain_cache_invalidate_user() data structure to support user
space selftest program to cover user cache invalidation pathway.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h | 17 +++++++++++
 drivers/iommu/iommufd/selftest.c     | 44 ++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 7f997234a1a8..748349dd5bf4 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -127,4 +127,21 @@ struct iommu_hwpt_selftest {
 	__u32 iotlb;
 };
 
+/**
+ * struct iommu_hwpt_invalidate_selftest
+ *
+ * @flags: invalidate flags
+ * @iotlb_id: invalidate iotlb entry index
+ *
+ * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @iotlb_id will be ignored
+ */
+struct iommu_hwpt_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_ALL	(1ULL << 0)
+	__u32 flags;
+	__u32 iotlb_id;
+};
+
+#define IOMMU_TEST_INVALIDATE_ERR_FETCH	0xdeadbeee
+#define IOMMU_TEST_INVALIDATE_ERR_REQ	0xdeadbeef
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index bd967317927f..ebb2abc40c4a 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -428,9 +428,53 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static int
+mock_domain_cache_invalidate_user(struct iommu_domain *domain,
+				  struct iommu_user_data_array *array,
+				  u32 *error_code)
+{
+	const size_t min_len =
+		offsetofend(struct iommu_hwpt_invalidate_selftest, iotlb_id);
+	struct mock_iommu_domain *mock =
+		container_of(domain, struct mock_iommu_domain, domain);
+	struct iommu_hwpt_invalidate_selftest inv;
+	int rc = 0;
+	int i, j;
+
+	if (domain->type != IOMMU_DOMAIN_NESTED || !mock->nested)
+		return -EINVAL;
+
+	for (i = 0; i < array->entry_num; i++) {
+		rc = iommu_copy_user_data_from_array(&inv, array,
+						     i, sizeof(inv), min_len);
+		if (rc) {
+			*error_code = IOMMU_TEST_INVALIDATE_ERR_FETCH;
+			goto err;
+		}
+		/* Invalidate all mock iotlb entries and ignore iotlb_id */
+		if (inv.flags & IOMMU_TEST_INVALIDATE_ALL) {
+			for (j = 0; j < MOCK_NESTED_DOMAIN_IOTLB_NUM; j++)
+				mock->iotlb[j] = 0;
+			continue;
+		}
+		/* Treat out-of-boundry iotlb_id as a request error */
+		if (inv.iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX) {
+			*error_code = IOMMU_TEST_INVALIDATE_ERR_REQ;
+			rc = -EINVAL;
+			goto err;
+		}
+		mock->iotlb[inv.iotlb_id] = 0;
+	}
+
+err:
+	array->entry_num = i;
+	return rc;
+}
+
 static struct iommu_domain_ops domain_nested_ops = {
 	.free = mock_domain_free,
 	.attach_dev = mock_domain_nop_attach,
+	.cache_invalidate_user = mock_domain_cache_invalidate_user,
 };
 
 static inline struct iommufd_hw_pagetable *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 16/17] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (14 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 15/17] iommufd/selftest: Add mock_domain_cache_invalidate_user support Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-09-21  7:51 ` [PATCH v4 17/17] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
  2023-10-10 16:53 ` [PATCH v4 00/17] iommufd: Add nesting infrastructure Jason Gunthorpe
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

This allows to test whether IOTLB has been invalidated or not.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  5 ++++
 drivers/iommu/iommufd/selftest.c              | 25 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  4 +++
 tools/testing/selftests/iommu/iommufd_utils.h | 24 ++++++++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 748349dd5bf4..f6c8e271e71d 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -19,6 +19,7 @@ enum {
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
 	IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
+	IOMMU_TEST_OP_MD_CHECK_IOTLB,
 };
 
 enum {
@@ -100,6 +101,10 @@ struct iommu_test_cmd {
 		struct {
 			__u32 ioas_id;
 		} access_replace_ioas;
+		struct {
+			__u32 id;
+			__u32 iotlb;
+		} check_iotlb;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index ebb2abc40c4a..92f3e3f5eeb5 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -759,6 +759,27 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd,
 	return 0;
 }
 
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd,
+				       u32 mockpt_id, unsigned int iotlb_id,
+				       u32 iotlb)
+{
+	struct iommufd_hw_pagetable *hwpt;
+	struct mock_iommu_domain *mock;
+	int rc = 0;
+
+	hwpt = get_md_pagetable(ucmd, mockpt_id, &mock);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	mock = container_of(hwpt->domain, struct mock_iommu_domain, domain);
+
+	if (!mock->nested || iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX ||
+	    mock->iotlb[iotlb_id] != iotlb)
+		rc = -EINVAL;
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
+
 struct selftest_access {
 	struct iommufd_access *access;
 	struct file *file;
@@ -1172,6 +1193,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 		return iommufd_test_md_check_refs(
 			ucmd, u64_to_user_ptr(cmd->check_refs.uptr),
 			cmd->check_refs.length, cmd->check_refs.refs);
+	case IOMMU_TEST_OP_MD_CHECK_IOTLB:
+		return iommufd_test_md_check_iotlb(ucmd, cmd->id,
+						   cmd->check_iotlb.id,
+						   cmd->check_iotlb.iotlb);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd, cmd->id,
 						  cmd->create_access.flags);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index db5e59e4abab..c59299248a98 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -327,6 +327,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   0, &nested_hwpt_id[1],
 					   IOMMU_HWPT_TYPE_SELFTEST,
 					   &data, sizeof(data));
+		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
+					      IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
+					      IOMMU_TEST_IOTLB_DEFAULT);
 
 		/* Negative test: a nested hwpt on top of a nested hwpt */
 		test_err_hwpt_alloc_nested(EINVAL,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 21d7c7e53bd4..2bc41e9a69e8 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -145,6 +145,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,  \
 					  hwpt_id, hwpt_type, data, data_len))
 
+#define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected)                 \
+	({                                                                     \
+		struct iommu_test_cmd test_cmd = {                             \
+			.size = sizeof(test_cmd),                              \
+			.op = IOMMU_TEST_OP_MD_CHECK_IOTLB,                    \
+			.id = hwpt_id,                                         \
+			.check_iotlb = {                                       \
+				.id = iotlb_id,                                \
+				.iotlb = expected,                             \
+			},                                                     \
+		};                                                             \
+		ASSERT_EQ(0,                                                   \
+			  ioctl(self->fd,                                      \
+				_IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \
+				&test_cmd));                                   \
+	})
+
+#define test_cmd_hwpt_check_iotlb_all(hwpt_id, expected)                       \
+	({                                                                     \
+		int i;                                                         \
+		for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++)             \
+			test_cmd_hwpt_check_iotlb(hwpt_id, i, expected);       \
+	})
+
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH v4 17/17] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (15 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 16/17] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
@ 2023-09-21  7:51 ` Yi Liu
  2023-10-10 16:53 ` [PATCH v4 00/17] iommufd: Add nesting infrastructure Jason Gunthorpe
  17 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-09-21  7:51 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan, joao.m.martins

From: Nicolin Chen <nicolinc@nvidia.com>

Add test cases for the IOMMU_HWPT_INVALIDATE ioctl and verify it by using
the new IOMMU_TEST_OP_MD_CHECK_IOTLB.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 61 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 36 +++++++++++
 2 files changed, 97 insertions(+)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c59299248a98..617e11153761 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -116,6 +116,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id);
 	TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved);
 	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved);
+	TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE, out_driver_error_code);
 	TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id);
 	TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES,
 		    out_iova_alignment);
@@ -271,7 +272,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	struct iommu_hwpt_selftest data = {
 		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
 	};
+	struct iommu_hwpt_invalidate_selftest inv_reqs[2] = {0};
 	uint32_t nested_hwpt_id[2] = {};
+	uint32_t num_inv, driver_error;
 	uint32_t parent_hwpt_id = 0;
 	uint32_t parent_hwpt_id_not_work = 0;
 	uint32_t test_hwpt_id = 0;
@@ -342,6 +345,64 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 		EXPECT_ERRNO(EBUSY,
 			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
 
+		/* hwpt_invalidate only supports a user-managed hwpt (nested) */
+		num_inv = 1;
+		test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs,
+					 sizeof(*inv_reqs), &num_inv, NULL);
+		/* Negative test: structure size sanity */
+		num_inv = 1;
+		test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs,
+					 sizeof(*inv_reqs) + 1, &num_inv,
+					 &driver_error);
+		assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_FETCH);
+
+		num_inv = 1;
+		test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs,
+					 1, &num_inv, &driver_error);
+		assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_FETCH);
+
+		/* Invalidate the 1st iotlb entry but fail the 2nd request */
+		num_inv = 2;
+		inv_reqs[0].iotlb_id = 0;
+		inv_reqs[1].iotlb_id = MOCK_NESTED_DOMAIN_IOTLB_ID_MAX + 1;
+		test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs,
+					 sizeof(*inv_reqs), &num_inv,
+					 &driver_error);
+		assert(num_inv == 1);
+		assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_REQ);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1,
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2,
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3,
+					  IOMMU_TEST_IOTLB_DEFAULT);
+
+		/* Invalidate the 2nd iotlb entry and verify */
+		num_inv = 1;
+		inv_reqs[0].iotlb_id = 1;
+		test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs,
+					 sizeof(*inv_reqs), &num_inv);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, 0);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2,
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3,
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		/* Invalidate the 3rd and 4th iotlb entries and verify */
+		num_inv = 2;
+		inv_reqs[0].iotlb_id = 2;
+		inv_reqs[1].iotlb_id = 3;
+		test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs,
+					 sizeof(*inv_reqs), &num_inv);
+		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], 0);
+		/* Invalidate all iotlb entries for nested_hwpt_id[1] and verify */
+		num_inv = 1;
+		inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_ALL;
+		test_cmd_hwpt_invalidate(nested_hwpt_id[1], inv_reqs,
+					 sizeof(*inv_reqs), &num_inv);
+		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], 0);
+
 		/* Attach device to nested_hwpt_id[0] that then will be busy */
 		test_cmd_mock_domain_replace(self->stdev_id,
 					     nested_hwpt_id[0]);
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 2bc41e9a69e8..55906c5da1a9 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -169,6 +169,42 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 			test_cmd_hwpt_check_iotlb(hwpt_id, i, expected);       \
 	})
 
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs,
+				     uint32_t lreq, uint32_t *nreqs,
+				     uint32_t *driver_error)
+{
+	struct iommu_hwpt_invalidate cmd = {
+		.size = sizeof(cmd),
+		.hwpt_id = hwpt_id,
+		.reqs_uptr = (uint64_t)reqs,
+		.req_len = lreq,
+		.req_num = *nreqs,
+	};
+	int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd);
+	*nreqs = cmd.req_num;
+	if (driver_error)
+		*driver_error = cmd.out_driver_error_code;
+	return rc;
+}
+
+#define test_cmd_hwpt_invalidate(hwpt_id, reqs, lreq, nreqs)                 \
+	({                                                                   \
+		uint32_t error, num = *nreqs;                                \
+		ASSERT_EQ(0,                                                 \
+			  _test_cmd_hwpt_invalidate(self->fd, hwpt_id, reqs, \
+						    lreq, nreqs, &error));   \
+		assert(num == *nreqs);                                       \
+		assert(error == 0);                                          \
+	})
+#define test_err_hwpt_invalidate(_errno, hwpt_id, reqs, lreq, nreqs,      \
+				 driver_error)                            \
+	({                                                                \
+		EXPECT_ERRNO(_errno,                                      \
+			     _test_cmd_hwpt_invalidate(self->fd, hwpt_id, \
+						       reqs, lreq, nreqs, \
+						       driver_error));    \
+	})
+
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
@ 2023-09-21 12:10   ` Baolu Lu
  2023-09-21 20:58     ` Nicolin Chen
  2023-09-25  6:22     ` Yi Liu
  2023-09-21 12:12   ` Baolu Lu
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 106+ messages in thread
From: Baolu Lu @ 2023-09-21 12:10 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins

On 2023/9/21 15:51, Yi Liu wrote:
> +/**
> + * iommu_copy_user_data - Copy iommu driver specific user space data
> + * @dst_data: Pointer to an iommu driver specific user data that is defined in
> + *            include/uapi/linux/iommufd.h
> + * @src_data: Pointer to a struct iommu_user_data for user space data info
> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
> + * @min_len: Initial length of user data structure for backward compatibility.
> + *           This should be offsetofend using the last member in the user data
> + *           struct that was initially added to include/uapi/linux/iommufd.h
> + */
> +static inline int iommu_copy_user_data(void *dst_data,
> +				       const struct iommu_user_data *src_data,
> +				       size_t data_len, size_t min_len)
> +{
> +	if (WARN_ON(!dst_data || !src_data))
> +		return -EINVAL;
> +	if (src_data->len < min_len || data_len < src_data->len)
> +		return -EINVAL;
> +	return copy_struct_from_user(dst_data, data_len,
> +				     src_data->uptr, src_data->len);
> +}

I am not sure that I understand the purpose of "min_len" correctly. It
seems like it would always be equal to data_len?

Or, it means the minimal data length that the iommu driver requires?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
  2023-09-21 12:10   ` Baolu Lu
@ 2023-09-21 12:12   ` Baolu Lu
  2023-09-21 16:44     ` Jason Gunthorpe
  2023-09-26  6:56   ` Tian, Kevin
  2023-10-10 16:58   ` Jason Gunthorpe
  3 siblings, 1 reply; 106+ messages in thread
From: Baolu Lu @ 2023-09-21 12:12 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins

On 2023/9/21 15:51, Yi Liu wrote:
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 4a7c5c8fdbb4..3c8660fe9bb1 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
>   	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
>   };
>   
> +/**
> + * enum iommu_hwpt_type - IOMMU HWPT Type
> + * @IOMMU_HWPT_TYPE_DEFAULT: default

How about s/default/vendor agnostic/ ?

> + */
> +enum iommu_hwpt_type {
> +	IOMMU_HWPT_TYPE_DEFAULT,
> +};
> +
>   /**
>    * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
>    * @size: sizeof(struct iommu_hwpt_alloc)

Best regards,
baolu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21 12:12   ` Baolu Lu
@ 2023-09-21 16:44     ` Jason Gunthorpe
  2023-09-22  9:47       ` Robin Murphy
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-09-21 16:44 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
> On 2023/9/21 15:51, Yi Liu wrote:
> > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > index 4a7c5c8fdbb4..3c8660fe9bb1 100644
> > --- a/include/uapi/linux/iommufd.h
> > +++ b/include/uapi/linux/iommufd.h
> > @@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
> >   	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
> >   };
> > +/**
> > + * enum iommu_hwpt_type - IOMMU HWPT Type
> > + * @IOMMU_HWPT_TYPE_DEFAULT: default
> 
> How about s/default/vendor agnostic/ ?

Please don't use the word vendor :)

IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21 12:10   ` Baolu Lu
@ 2023-09-21 20:58     ` Nicolin Chen
  2023-09-25 13:05       ` Jason Gunthorpe
  2023-09-25  6:22     ` Yi Liu
  1 sibling, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-09-21 20:58 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, zhenzhong.duan,
	joao.m.martins

On Thu, Sep 21, 2023 at 08:10:58PM +0800, Baolu Lu wrote:
> On 2023/9/21 15:51, Yi Liu wrote:
> > +/**
> > + * iommu_copy_user_data - Copy iommu driver specific user space data
> > + * @dst_data: Pointer to an iommu driver specific user data that is defined in
> > + *            include/uapi/linux/iommufd.h
> > + * @src_data: Pointer to a struct iommu_user_data for user space data info
> > + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
> > + * @min_len: Initial length of user data structure for backward compatibility.
> > + *           This should be offsetofend using the last member in the user data
> > + *           struct that was initially added to include/uapi/linux/iommufd.h
> > + */
> > +static inline int iommu_copy_user_data(void *dst_data,
> > +                                    const struct iommu_user_data *src_data,
> > +                                    size_t data_len, size_t min_len)
> > +{
> > +     if (WARN_ON(!dst_data || !src_data))
> > +             return -EINVAL;
> > +     if (src_data->len < min_len || data_len < src_data->len)
> > +             return -EINVAL;
> > +     return copy_struct_from_user(dst_data, data_len,
> > +                                  src_data->uptr, src_data->len);
> > +}
> 
> I am not sure that I understand the purpose of "min_len" correctly. It
> seems like it would always be equal to data_len?
> 
> Or, it means the minimal data length that the iommu driver requires?

Hmm, I thought I had made it quite clear with the kdoc that it's
the "initial" length (min_len) v.s. "current" length (data_len),
i.e. min_len was set when the structure was introduced and would
never change while data_len can grow if the structure introduces
a new member. E.g. after this series struct iommu_hwpt_alloc has
a min_len fixed to offsetofend(..., __reserved) but its data_len
is actually increased to offsetofend(..., data_uptr).

Perhaps we could put all min_len defines in uAPI header, like:
include/uapi/linux/gfs2_ondisk.h:442:#define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))
In this way, drivers won't need to deal with that nor have risks
of breaking ABI by changing a min_len.

Also, if we go a bit further to ease the drivers, we could do:

========================================================================================
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 65a363f5e81e..13234e67409c 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -147,6 +147,9 @@ struct iommu_hwpt_selftest {
 	__u32 iotlb;
 };
 
+#define iommu_hwpt_selftest_min_len \
+	(offsetofend(struct iommu_hwpt_selftest, iotlb))
+
 /**
  * struct iommu_hwpt_invalidate_selftest
  *
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 117776d236dc..2cc3a8a3355b 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -263,8 +263,8 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
 	}
 
 	if (user_data) {
-		int rc = iommu_copy_user_data(&data, user_data,
-					      data_len, min_len);
+		int rc = iommu_copy_user_data2(iommu_hwpt_selftest, &data,
+					       user_data);
 		if (rc)
 			return ERR_PTR(rc);
 		user_cfg = &data;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fb2febe7b8d8..db82803b026f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -282,6 +282,10 @@ static inline int iommu_copy_user_data(void *dst_data,
 				     src_data->uptr, src_data->len);
 }
 
+#define iommu_copy_user_data2(dst_struct, dst, src)               \
+	iommu_copy_user_data(dst, src, sizeof(struct dst_struct), \
+			     dst_struct##_min_len)
+
 /**
  * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
  *                                   from an iommu_user_data_array input
========================================================================================

Surely, the end product of the test code above can do:
	iommu_copy_user_data = > __iommu_copy_user_data
	iommu_copy_user_data2 = > iommu_copy_user_data

Thanks
Nicolin

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21 16:44     ` Jason Gunthorpe
@ 2023-09-22  9:47       ` Robin Murphy
  2023-09-25  6:17         ` Yi Liu
  2023-09-26  6:37         ` Tian, Kevin
  0 siblings, 2 replies; 106+ messages in thread
From: Robin Murphy @ 2023-09-22  9:47 UTC (permalink / raw)
  To: Jason Gunthorpe, Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins

On 2023-09-21 17:44, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
>> On 2023/9/21 15:51, Yi Liu wrote:
>>> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
>>> index 4a7c5c8fdbb4..3c8660fe9bb1 100644
>>> --- a/include/uapi/linux/iommufd.h
>>> +++ b/include/uapi/linux/iommufd.h
>>> @@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
>>>    	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
>>>    };
>>> +/**
>>> + * enum iommu_hwpt_type - IOMMU HWPT Type
>>> + * @IOMMU_HWPT_TYPE_DEFAULT: default
>>
>> How about s/default/vendor agnostic/ ?
> 
> Please don't use the word vendor :)
> 
> IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default

Ah yes, a default domain type, not to be confused with any default 
domain type, including the default default domain type. Just in case 
anyone had forgotten how gleefully fun this is :D

I particularly like the bit where we end up with this construct later:

	switch (hwpt_type) {
	case IOMMU_HWPT_TYPE_DEFAULT:
		/* allocate a domain */
	default:
		/* allocate a different domain */
	}

But of course neither case allocates a *default* domain, because it's 
quite obviously the wrong place to be doing that.

I could go on enjoying myself, but basically yeah, "default" can't be a 
type in itself (at best it would be a meta-type which could be 
requested, such that it resolves to some real type to actually 
allocate), so a good name should reflect what the type functionally 
*means* to the user. IIUC the important distinction is that it's an 
abstract kernel-owned pagetable for the user to indirectly control via 
the API, rather than one it owns and writes directly (and thus has to be 
in a specific agreed format).

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-22  9:47       ` Robin Murphy
@ 2023-09-25  6:17         ` Yi Liu
  2023-09-25 12:59           ` Jason Gunthorpe
  2023-09-26  6:37         ` Tian, Kevin
  1 sibling, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-25  6:17 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe, Baolu Lu
  Cc: joro, alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins

On 2023/9/22 17:47, Robin Murphy wrote:
> On 2023-09-21 17:44, Jason Gunthorpe wrote:
>> On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
>>> On 2023/9/21 15:51, Yi Liu wrote:
>>>> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
>>>> index 4a7c5c8fdbb4..3c8660fe9bb1 100644
>>>> --- a/include/uapi/linux/iommufd.h
>>>> +++ b/include/uapi/linux/iommufd.h
>>>> @@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
>>>>        IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
>>>>    };
>>>> +/**
>>>> + * enum iommu_hwpt_type - IOMMU HWPT Type
>>>> + * @IOMMU_HWPT_TYPE_DEFAULT: default
>>>
>>> How about s/default/vendor agnostic/ ?
>>
>> Please don't use the word vendor :)
>>
>> IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default
> 
> Ah yes, a default domain type, not to be confused with any default domain 
> type, including the default default domain type. Just in case anyone had 
> forgotten how gleefully fun this is :D
> 
> I particularly like the bit where we end up with this construct later:
> 
>      switch (hwpt_type) {
>      case IOMMU_HWPT_TYPE_DEFAULT:
>          /* allocate a domain */
>      default:
>          /* allocate a different domain */
>      }
> 
> But of course neither case allocates a *default* domain, because it's quite 
> obviously the wrong place to be doing that.
> 
> I could go on enjoying myself, but basically yeah, "default" can't be a 
> type in itself (at best it would be a meta-type which could be requested, 
> such that it resolves to some real type to actually allocate), so a good 
> name should reflect what the type functionally *means* to the user. IIUC 
> the important distinction is that it's an abstract kernel-owned pagetable 
> for the user to indirectly control via the API, rather than one it owns and 
> writes directly (and thus has to be in a specific agreed format).

yes. It is just what the existing domain_alloc_user op does. Here we add
a hwpt_type as the type can be given by user, so we need to define a 
specific type for it.

Perhaps we can also name it as IOMMU_HWPT_TYPE_UNMANAGED to be aligned with
the domain type naming. IOMMU_HWPT_TYPE_GENERIC is also a good choice.
Please feel free let me know your preference.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21 12:10   ` Baolu Lu
  2023-09-21 20:58     ` Nicolin Chen
@ 2023-09-25  6:22     ` Yi Liu
  2023-09-25  8:01       ` Baolu Lu
  1 sibling, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-09-25  6:22 UTC (permalink / raw)
  To: Baolu Lu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/9/21 20:10, Baolu Lu wrote:
> On 2023/9/21 15:51, Yi Liu wrote:
>> +/**
>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>> + * @dst_data: Pointer to an iommu driver specific user data that is 
>> defined in
>> + *            include/uapi/linux/iommufd.h
>> + * @src_data: Pointer to a struct iommu_user_data for user space data info
>> + * @data_len: Length of current user data structure, i.e. sizeof(struct 
>> _dst)
>> + * @min_len: Initial length of user data structure for backward 
>> compatibility.
>> + *           This should be offsetofend using the last member in the 
>> user data
>> + *           struct that was initially added to 
>> include/uapi/linux/iommufd.h
>> + */
>> +static inline int iommu_copy_user_data(void *dst_data,
>> +                       const struct iommu_user_data *src_data,
>> +                       size_t data_len, size_t min_len)
>> +{
>> +    if (WARN_ON(!dst_data || !src_data))
>> +        return -EINVAL;
>> +    if (src_data->len < min_len || data_len < src_data->len)
>> +        return -EINVAL;
>> +    return copy_struct_from_user(dst_data, data_len,
>> +                     src_data->uptr, src_data->len);
>> +}
> 
> I am not sure that I understand the purpose of "min_len" correctly. It
> seems like it would always be equal to data_len?

no, it will not be equal to data_len once there is extension in the
uAPI structure.

> Or, it means the minimal data length that the iommu driver requires?

it is the minimal data length the uAPI requires. min_len is finalized
per the upstream of the first version of the uAPI.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-25  6:22     ` Yi Liu
@ 2023-09-25  8:01       ` Baolu Lu
  2023-09-25 13:00         ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Baolu Lu @ 2023-09-25  8:01 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, zhenzhong.duan, joao.m.martins

On 2023/9/25 14:22, Yi Liu wrote:
> On 2023/9/21 20:10, Baolu Lu wrote:
>> On 2023/9/21 15:51, Yi Liu wrote:
>>> +/**
>>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>>> + * @dst_data: Pointer to an iommu driver specific user data that is 
>>> defined in
>>> + *            include/uapi/linux/iommufd.h
>>> + * @src_data: Pointer to a struct iommu_user_data for user space 
>>> data info
>>> + * @data_len: Length of current user data structure, i.e. 
>>> sizeof(struct _dst)
>>> + * @min_len: Initial length of user data structure for backward 
>>> compatibility.
>>> + *           This should be offsetofend using the last member in the 
>>> user data
>>> + *           struct that was initially added to 
>>> include/uapi/linux/iommufd.h
>>> + */
>>> +static inline int iommu_copy_user_data(void *dst_data,
>>> +                       const struct iommu_user_data *src_data,
>>> +                       size_t data_len, size_t min_len)
>>> +{
>>> +    if (WARN_ON(!dst_data || !src_data))
>>> +        return -EINVAL;
>>> +    if (src_data->len < min_len || data_len < src_data->len)
>>> +        return -EINVAL;
>>> +    return copy_struct_from_user(dst_data, data_len,
>>> +                     src_data->uptr, src_data->len);
>>> +}
>>
>> I am not sure that I understand the purpose of "min_len" correctly. It
>> seems like it would always be equal to data_len?
> 
> no, it will not be equal to data_len once there is extension in the
> uAPI structure.
> 
>> Or, it means the minimal data length that the iommu driver requires?
> 
> it is the minimal data length the uAPI requires. min_len is finalized
> per the upstream of the first version of the uAPI.

So, it looks like a constant. Perhaps we should document it in the
uapi/iommuf.h and avoid using it as a parameter of a helper function?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-25  6:17         ` Yi Liu
@ 2023-09-25 12:59           ` Jason Gunthorpe
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-09-25 12:59 UTC (permalink / raw)
  To: Yi Liu
  Cc: Robin Murphy, Baolu Lu, joro, alex.williamson, kevin.tian,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Sep 25, 2023 at 02:17:37PM +0800, Yi Liu wrote:

> yes. It is just what the existing domain_alloc_user op does. Here we add
> a hwpt_type as the type can be given by user, so we need to define a
> specific type for it.
> 
> Perhaps we can also name it as IOMMU_HWPT_TYPE_UNMANAGED to be
> aligned with

unmanged is also a weird nonsense name these days. We are slowly
replacing it with paging.

> the domain type naming. IOMMU_HWPT_TYPE_GENERIC is also a good choice.
> Please feel free let me know your preference.

This seems OK to me so far

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-25  8:01       ` Baolu Lu
@ 2023-09-25 13:00         ` Jason Gunthorpe
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-09-25 13:00 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Sep 25, 2023 at 04:01:55PM +0800, Baolu Lu wrote:
> On 2023/9/25 14:22, Yi Liu wrote:
> > On 2023/9/21 20:10, Baolu Lu wrote:
> > > On 2023/9/21 15:51, Yi Liu wrote:
> > > > +/**
> > > > + * iommu_copy_user_data - Copy iommu driver specific user space data
> > > > + * @dst_data: Pointer to an iommu driver specific user data
> > > > that is defined in
> > > > + *            include/uapi/linux/iommufd.h
> > > > + * @src_data: Pointer to a struct iommu_user_data for user
> > > > space data info
> > > > + * @data_len: Length of current user data structure, i.e.
> > > > sizeof(struct _dst)
> > > > + * @min_len: Initial length of user data structure for backward
> > > > compatibility.
> > > > + *           This should be offsetofend using the last member
> > > > in the user data
> > > > + *           struct that was initially added to
> > > > include/uapi/linux/iommufd.h
> > > > + */
> > > > +static inline int iommu_copy_user_data(void *dst_data,
> > > > +                       const struct iommu_user_data *src_data,
> > > > +                       size_t data_len, size_t min_len)
> > > > +{
> > > > +    if (WARN_ON(!dst_data || !src_data))
> > > > +        return -EINVAL;
> > > > +    if (src_data->len < min_len || data_len < src_data->len)
> > > > +        return -EINVAL;
> > > > +    return copy_struct_from_user(dst_data, data_len,
> > > > +                     src_data->uptr, src_data->len);
> > > > +}
> > > 
> > > I am not sure that I understand the purpose of "min_len" correctly. It
> > > seems like it would always be equal to data_len?
> > 
> > no, it will not be equal to data_len once there is extension in the
> > uAPI structure.
> > 
> > > Or, it means the minimal data length that the iommu driver requires?
> > 
> > it is the minimal data length the uAPI requires. min_len is finalized
> > per the upstream of the first version of the uAPI.
> 
> So, it looks like a constant. Perhaps we should document it in the
> uapi/iommuf.h and avoid using it as a parameter of a helper
> function?

It is per-driver, per-struct, so this is the right way to do it

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21 20:58     ` Nicolin Chen
@ 2023-09-25 13:05       ` Jason Gunthorpe
  2023-09-25 18:17         ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-09-25 13:05 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Baolu Lu, Yi Liu, joro, alex.williamson, kevin.tian,
	robin.murphy, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 01:58:19PM -0700, Nicolin Chen wrote:

> Perhaps we could put all min_len defines in uAPI header, like:
> include/uapi/linux/gfs2_ondisk.h:442:#define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))
> In this way, drivers won't need to deal with that nor have risks
> of breaking ABI by changing a min_len.

I don't think we need constants, just be sure that every call to 
iommu_copy_user_data() has an offsetof() as the last parameter.

Indeed perhaps you should put it in a macro and force this to happen eg:

 #define iommu_copy_user_data(user_data, from, min_size_member) \
  __iommu_copy_user_data(user_data, from, offsetofend(typeof(*from), min_size_member))

 iommu_copy_user_data(user_data, &data, iotlb);

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-25 13:05       ` Jason Gunthorpe
@ 2023-09-25 18:17         ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-09-25 18:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Yi Liu, joro, alex.williamson, kevin.tian,
	robin.murphy, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Sep 25, 2023 at 10:05:06AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 01:58:19PM -0700, Nicolin Chen wrote:
> 
> > Perhaps we could put all min_len defines in uAPI header, like:
> > include/uapi/linux/gfs2_ondisk.h:442:#define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))
> > In this way, drivers won't need to deal with that nor have risks
> > of breaking ABI by changing a min_len.
> 
> I don't think we need constants, just be sure that every call to 
> iommu_copy_user_data() has an offsetof() as the last parameter.
> 
> Indeed perhaps you should put it in a macro and force this to happen eg:
> 
>  #define iommu_copy_user_data(user_data, from, min_size_member) \
>   __iommu_copy_user_data(user_data, from, offsetofend(typeof(*from), min_size_member))
> 
>  iommu_copy_user_data(user_data, &data, iotlb);

OK. And always use sizeof(typeof(*from)) in the mcaro for the
current data_len, I assume?

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-22  9:47       ` Robin Murphy
  2023-09-25  6:17         ` Yi Liu
@ 2023-09-26  6:37         ` Tian, Kevin
  2023-09-26 16:29           ` Jason Gunthorpe
  1 sibling, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  6:37 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe, Baolu Lu
  Cc: Liu, Yi L, joro, alex.williamson, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins, Joao

> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Friday, September 22, 2023 5:48 PM
> 
> I could go on enjoying myself, but basically yeah, "default" can't be a
> type in itself (at best it would be a meta-type which could be
> requested, such that it resolves to some real type to actually
> allocate), so a good name should reflect what the type functionally
> *means* to the user. IIUC the important distinction is that it's an
> abstract kernel-owned pagetable for the user to indirectly control via
> the API, rather than one it owns and writes directly (and thus has to be
> in a specific agreed format).
> 

IOMMU_HWPT_TYPE_KERNEL then?

IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
  2023-09-21 12:10   ` Baolu Lu
  2023-09-21 12:12   ` Baolu Lu
@ 2023-09-26  6:56   ` Tian, Kevin
  2023-10-12  9:12     ` Yi Liu
  2023-10-10 16:58   ` Jason Gunthorpe
  3 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  6:56 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong, Martins, Joao

> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, September 21, 2023 3:51 PM
> +
> +/**
> + * iommu_copy_user_data - Copy iommu driver specific user space data
> + * @dst_data: Pointer to an iommu driver specific user data that is defined
> in
> + *            include/uapi/linux/iommufd.h
> + * @src_data: Pointer to a struct iommu_user_data for user space data info
> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
> + * @min_len: Initial length of user data structure for backward compatibility.
> + *           This should be offsetofend using the last member in the user data
> + *           struct that was initially added to include/uapi/linux/iommufd.h
> + */
> +static inline int iommu_copy_user_data(void *dst_data,
> +				       const struct iommu_user_data *src_data,
> +				       size_t data_len, size_t min_len)

iommu_copy_struct_from_user()?

btw given the confusion raised on how this would be used is it clearer
to move it to the patch together with the 1st user?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-09-21  7:51 ` [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct Yi Liu
@ 2023-09-26  7:46   ` Tian, Kevin
  2023-10-07 10:08   ` Yan Zhao
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  7:46 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, September 21, 2023 3:51 PM
> 
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> The struct iommufd_hw_pagetable has been representing a kernel-managed
> HWPT, yet soon will be reused to represent a user-managed HWPT. These
> two types of HWPTs has the same IOMMUFD object type and an
> iommu_domain
> object, but have quite different attributes/members.
> 
> Add a union in struct iommufd_hw_pagetable and group all the existing
> kernel-managed members. One of the following patches will add another
> struct for user-managed members.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-09-21  7:51 ` [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions Yi Liu
@ 2023-09-26  7:57   ` Tian, Kevin
  2023-10-10 18:49   ` Jason Gunthorpe
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  7:57 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, September 21, 2023 3:51 PM
> 
> -void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
> +static void iommufd_kernel_managed_hwpt_destroy(struct
> iommufd_object *obj)

'_managed_' could be removed. 'kernel_hwpt' and 'user_hwpt' are
clear enough.

ditto for all other related functions.


^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support
  2023-09-21  7:51 ` [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support Yi Liu
@ 2023-09-26  8:14   ` Tian, Kevin
  2023-10-14  0:08     ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  8:14 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, September 21, 2023 3:51 PM
> 
> +static void iommufd_user_managed_hwpt_abort(struct iommufd_object
> *obj)
> +{
> +	struct iommufd_hw_pagetable *hwpt =
> +		container_of(obj, struct iommufd_hw_pagetable, obj);
> +
> +	/* The parent->mutex must be held until finalize is called. */
> +	lockdep_assert_held(&hwpt->parent->mutex);
> +
> +	iommufd_hw_pagetable_destroy(obj);
> +}

Can you elaborate what exactly is protected by parent->mutex?

My gut-feeling that the child just grabs a refcnt on the parent
object. It doesn't change any other data of the parent.

> 
> +/**
> + * iommufd_user_managed_hwpt_alloc() - Get a user-managed
> hw_pagetable
> + * @ictx: iommufd context
> + * @pt_obj: Parent object to an HWPT to associate the domain with
> + * @idev: Device to get an iommu_domain for
> + * @flags: Flags from userspace
> + * @hwpt_type: Requested type of hw_pagetable
> + * @user_data: user_data pointer
> + * @dummy: never used
> + *
> + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and
> return it as
> + * a user-managed hw_pagetable.

Add some text to highlight the requirement being a parent, e.g. not
an auto domain, must be capable of being a parent, etc.

> +	case IOMMUFD_OBJ_HW_PAGETABLE:
> +		parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> obj);
> +		/* No user-managed HWPT on top of an user-managed one
> */
> +		if (parent->user_managed) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}

move to alloc_fn().


^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-09-21  7:51 ` [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains Yi Liu
@ 2023-09-26  8:16   ` Tian, Kevin
  2023-10-14  0:44     ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-09-26  8:16 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, September 21, 2023 3:51 PM
> 
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> things.
> So, they should be only setup on kernel-managed domains. If the attaching
> domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
> it correctly.
> 

No redirection. The parent should already have the configuration done
when it's created. It shouldn't be triggered in the nesting path.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-26  6:37         ` Tian, Kevin
@ 2023-09-26 16:29           ` Jason Gunthorpe
  2023-09-27  1:08             ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-09-26 16:29 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, Baolu Lu, Liu, Yi L, joro, alex.williamson, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
> > From: Robin Murphy <robin.murphy@arm.com>
> > Sent: Friday, September 22, 2023 5:48 PM
> > 
> > I could go on enjoying myself, but basically yeah, "default" can't be a
> > type in itself (at best it would be a meta-type which could be
> > requested, such that it resolves to some real type to actually
> > allocate), so a good name should reflect what the type functionally
> > *means* to the user. IIUC the important distinction is that it's an
> > abstract kernel-owned pagetable for the user to indirectly control via
> > the API, rather than one it owns and writes directly (and thus has to be
> > in a specific agreed format).
> > 
> 
> IOMMU_HWPT_TYPE_KERNEL then?
> 
> IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.

At the end of the day this enum is the type tag for:

 struct iommu_hwpt_alloc {
 	__u32 size;
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };

That pointer.

IOMMU_HWPT_ALLOC_DATA_NONE = 0
IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3

etc?

DATA_NONE requires data_len == 0

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-26 16:29           ` Jason Gunthorpe
@ 2023-09-27  1:08             ` Tian, Kevin
  2023-10-07  9:38               ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-09-27  1:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Baolu Lu, Liu, Yi L, joro, alex.williamson, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, September 27, 2023 12:29 AM
> 
> On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
> > > From: Robin Murphy <robin.murphy@arm.com>
> > > Sent: Friday, September 22, 2023 5:48 PM
> > >
> > > I could go on enjoying myself, but basically yeah, "default" can't be a
> > > type in itself (at best it would be a meta-type which could be
> > > requested, such that it resolves to some real type to actually
> > > allocate), so a good name should reflect what the type functionally
> > > *means* to the user. IIUC the important distinction is that it's an
> > > abstract kernel-owned pagetable for the user to indirectly control via
> > > the API, rather than one it owns and writes directly (and thus has to be
> > > in a specific agreed format).
> > >
> >
> > IOMMU_HWPT_TYPE_KERNEL then?
> >
> > IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
> 
> At the end of the day this enum is the type tag for:
> 
>  struct iommu_hwpt_alloc {
>  	__u32 size;
>  	__u32 pt_id;
>  	__u32 out_hwpt_id;
>  	__u32 __reserved;
> +	__u32 hwpt_type;
> +	__u32 data_len;
> +	__aligned_u64 data_uptr;
>  };
> 
> That pointer.
> 
> IOMMU_HWPT_ALLOC_DATA_NONE = 0
> IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
> IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3
> 
> etc?
> 
> DATA_NONE requires data_len == 0
> 

Looks good. Probably hwpt_type can be also renamed to data_type
to better match this interpretation.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions
  2023-09-21  7:51 ` [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions Yi Liu
@ 2023-10-07  7:20   ` Yan Zhao
  2023-10-07  9:27     ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Yan Zhao @ 2023-10-07  7:20 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

> @@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
>  	}
>  
>  	old_hwpt = igroup->hwpt;
> -	if (hwpt->ioas != old_hwpt->ioas) {
> +	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
>  		list_for_each_entry(cur, &igroup->device_list, group_item) {
> -			rc = iopt_table_enforce_dev_resv_regions(
> -				&hwpt->ioas->iopt, cur->dev, NULL);
> +			rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
>  			if (rc)
>  				goto err_unresv;
>  		}
> @@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
>  	if (rc)
>  		goto err_unresv;
>  
> -	if (hwpt->ioas != old_hwpt->ioas) {
> +	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
>  		list_for_each_entry(cur, &igroup->device_list, group_item)
> -			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
> -						  cur->dev);
> +			iommufd_device_remove_rr(cur, hwpt);
Should be "iommufd_device_remove_rr(cur, old_hwpt);"

>  	}
> -
>  	igroup->hwpt = hwpt;
>  
>  	/*
 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions
  2023-10-07  7:20   ` Yan Zhao
@ 2023-10-07  9:27     ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-07  9:27 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Sat, Oct 07, 2023 at 03:20:41PM +0800, Yan Zhao wrote:
> > @@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> >       }
> >
> >       old_hwpt = igroup->hwpt;
> > -     if (hwpt->ioas != old_hwpt->ioas) {
> > +     if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> >               list_for_each_entry(cur, &igroup->device_list, group_item) {
> > -                     rc = iopt_table_enforce_dev_resv_regions(
> > -                             &hwpt->ioas->iopt, cur->dev, NULL);
> > +                     rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
> >                       if (rc)
> >                               goto err_unresv;
> >               }
> > @@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> >       if (rc)
> >               goto err_unresv;
> >
> > -     if (hwpt->ioas != old_hwpt->ioas) {
> > +     if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> >               list_for_each_entry(cur, &igroup->device_list, group_item)
> > -                     iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
> > -                                               cur->dev);
> > +                     iommufd_device_remove_rr(cur, hwpt);
> Should be "iommufd_device_remove_rr(cur, old_hwpt);"

Ah, right. Should fix this.

Thanks!
Nicolin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-27  1:08             ` Tian, Kevin
@ 2023-10-07  9:38               ` Yi Liu
  0 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-10-07  9:38 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Robin Murphy, Baolu Lu, joro, alex.williamson, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On 2023/9/27 09:08, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Wednesday, September 27, 2023 12:29 AM
>>
>> On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
>>>> From: Robin Murphy <robin.murphy@arm.com>
>>>> Sent: Friday, September 22, 2023 5:48 PM
>>>>
>>>> I could go on enjoying myself, but basically yeah, "default" can't be a
>>>> type in itself (at best it would be a meta-type which could be
>>>> requested, such that it resolves to some real type to actually
>>>> allocate), so a good name should reflect what the type functionally
>>>> *means* to the user. IIUC the important distinction is that it's an
>>>> abstract kernel-owned pagetable for the user to indirectly control via
>>>> the API, rather than one it owns and writes directly (and thus has to be
>>>> in a specific agreed format).
>>>>
>>>
>>> IOMMU_HWPT_TYPE_KERNEL then?
>>>
>>> IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
>>
>> At the end of the day this enum is the type tag for:
>>
>>   struct iommu_hwpt_alloc {
>>   	__u32 size;
>>   	__u32 pt_id;
>>   	__u32 out_hwpt_id;
>>   	__u32 __reserved;
>> +	__u32 hwpt_type;
>> +	__u32 data_len;
>> +	__aligned_u64 data_uptr;
>>   };
>>
>> That pointer.
>>
>> IOMMU_HWPT_ALLOC_DATA_NONE = 0
>> IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
>> IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3
>>
>> etc?
>>
>> DATA_NONE requires data_len == 0
>>
> 
> Looks good. Probably hwpt_type can be also renamed to data_type
> to better match this interpretation.

ack.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-09-21  7:51 ` [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct Yi Liu
  2023-09-26  7:46   ` Tian, Kevin
@ 2023-10-07 10:08   ` Yan Zhao
  2023-10-09  4:13     ` Yi Liu
  1 sibling, 1 reply; 106+ messages in thread
From: Yan Zhao @ 2023-10-07 10:08 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> The struct iommufd_hw_pagetable has been representing a kernel-managed
> HWPT, yet soon will be reused to represent a user-managed HWPT. These
> two types of HWPTs has the same IOMMUFD object type and an iommu_domain
> object, but have quite different attributes/members.
> 
> Add a union in struct iommufd_hw_pagetable and group all the existing
> kernel-managed members. One of the following patches will add another
> struct for user-managed members.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 3064997a0181..947a797536e3 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
>   */
>  struct iommufd_hw_pagetable {
>  	struct iommufd_object obj;
> -	struct iommufd_ioas *ioas;
>  	struct iommu_domain *domain;
> -	bool auto_domain : 1;
> -	bool enforce_cache_coherency : 1;
> -	bool msi_cookie : 1;
> -	/* Head at iommufd_ioas::hwpt_list */
> -	struct list_head hwpt_item;
> +
> +	union {
> +		struct { /* kernel-managed */
> +			struct iommufd_ioas *ioas;
> +			bool auto_domain : 1;
Will iommufd_hw_pagetable_put() also be called on non-kernel-managed domain?
If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put(),
(e.g. as below).
Otherwise, this union will lead to hwpt->ioas and hwpt->auto_domain still being
accessible though invalid.


 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
                                            struct iommufd_hw_pagetable *hwpt)
 {
-       lockdep_assert_not_held(&hwpt->ioas->mutex);
-       if (hwpt->auto_domain)
+       if (!hwpt->user_managed)
+               lockdep_assert_not_held(&hwpt->ioas->mutex);
+
+       if (!hwpt->user_managed && hwpt->auto_domain)
                iommufd_object_deref_user(ictx, &hwpt->obj);
        else
                refcount_dec(&hwpt->obj.users);
}

> +			bool enforce_cache_coherency : 1;
> +			bool msi_cookie : 1;
> +			/* Head at iommufd_ioas::hwpt_list */
> +			struct list_head hwpt_item;
> +		};
> +	};
>  };
>  
>  struct iommufd_hw_pagetable *
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-10-07 10:08   ` Yan Zhao
@ 2023-10-09  4:13     ` Yi Liu
  2023-10-09  5:13       ` Yan Zhao
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-09  4:13 UTC (permalink / raw)
  To: Yan Zhao
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/7 18:08, Yan Zhao wrote:
> On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
>> From: Nicolin Chen <nicolinc@nvidia.com>
>>
>> The struct iommufd_hw_pagetable has been representing a kernel-managed
>> HWPT, yet soon will be reused to represent a user-managed HWPT. These
>> two types of HWPTs has the same IOMMUFD object type and an iommu_domain
>> object, but have quite different attributes/members.
>>
>> Add a union in struct iommufd_hw_pagetable and group all the existing
>> kernel-managed members. One of the following patches will add another
>> struct for user-managed members.
>>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
>>   1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
>> index 3064997a0181..947a797536e3 100644
>> --- a/drivers/iommu/iommufd/iommufd_private.h
>> +++ b/drivers/iommu/iommufd/iommufd_private.h
>> @@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
>>    */
>>   struct iommufd_hw_pagetable {
>>   	struct iommufd_object obj;
>> -	struct iommufd_ioas *ioas;
>>   	struct iommu_domain *domain;
>> -	bool auto_domain : 1;
>> -	bool enforce_cache_coherency : 1;
>> -	bool msi_cookie : 1;
>> -	/* Head at iommufd_ioas::hwpt_list */
>> -	struct list_head hwpt_item;
>> +
>> +	union {
>> +		struct { /* kernel-managed */
>> +			struct iommufd_ioas *ioas;
>> +			bool auto_domain : 1;
> Will iommufd_hw_pagetable_put() also be called on non-kernel-managed domain?

yes.

> If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put(),
> (e.g. as below).
> Otherwise, this union will lead to hwpt->ioas and hwpt->auto_domain still being
> accessible though invalid.

not quite get this sentence.

> 
>   static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
>                                              struct iommufd_hw_pagetable *hwpt)
>   {
> -       lockdep_assert_not_held(&hwpt->ioas->mutex);
> -       if (hwpt->auto_domain)
> +       if (!hwpt->user_managed)
> +               lockdep_assert_not_held(&hwpt->ioas->mutex);

this is true. this assert is not needed when hwpt is not kernel managed domain.

> +
> +       if (!hwpt->user_managed && hwpt->auto_domain)

actually, checking auto_domain is more precise. There is hwpt which is 
neither user managed nor auto.

>                  iommufd_object_deref_user(ictx, &hwpt->obj);
>          else
>                  refcount_dec(&hwpt->obj.users);
> }
> 
>> +			bool enforce_cache_coherency : 1;
>> +			bool msi_cookie : 1;
>> +			/* Head at iommufd_ioas::hwpt_list */
>> +			struct list_head hwpt_item;
>> +		};
>> +	};
>>   };
>>   
>>   struct iommufd_hw_pagetable *
>> -- 
>> 2.34.1
>>

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-10-09  4:13     ` Yi Liu
@ 2023-10-09  5:13       ` Yan Zhao
  2023-10-10  3:26         ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Yan Zhao @ 2023-10-09  5:13 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 09, 2023 at 12:13:52PM +0800, Yi Liu wrote:
> On 2023/10/7 18:08, Yan Zhao wrote:
> > On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > 
> > > The struct iommufd_hw_pagetable has been representing a kernel-managed
> > > HWPT, yet soon will be reused to represent a user-managed HWPT. These
> > > two types of HWPTs has the same IOMMUFD object type and an iommu_domain
> > > object, but have quite different attributes/members.
> > > 
> > > Add a union in struct iommufd_hw_pagetable and group all the existing
> > > kernel-managed members. One of the following patches will add another
> > > struct for user-managed members.
> > > 
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >   drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
> > >   1 file changed, 11 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > > index 3064997a0181..947a797536e3 100644
> > > --- a/drivers/iommu/iommufd/iommufd_private.h
> > > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > > @@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
> > >    */
> > >   struct iommufd_hw_pagetable {
> > >   	struct iommufd_object obj;
> > > -	struct iommufd_ioas *ioas;
> > >   	struct iommu_domain *domain;
> > > -	bool auto_domain : 1;
> > > -	bool enforce_cache_coherency : 1;
> > > -	bool msi_cookie : 1;
> > > -	/* Head at iommufd_ioas::hwpt_list */
> > > -	struct list_head hwpt_item;
> > > +
> > > +	union {
> > > +		struct { /* kernel-managed */
> > > +			struct iommufd_ioas *ioas;
> > > +			bool auto_domain : 1;
> > Will iommufd_hw_pagetable_put() also be called on non-kernel-managed domain?
> 
> yes.
> 
> > If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put(),
> > (e.g. as below).
> > Otherwise, this union will lead to hwpt->ioas and hwpt->auto_domain still being
> > accessible though invalid.
> 
> not quite get this sentence.
I mean with this union, hwpt->auto_domain or hwpt->ioas should only be accessed if and
only if hwpt type is kernel-managed.
So, any unconditional access, as in iommufd_hw_pagetable_put() pasted below, is buggy.

Therefore, do you think it's better to add a filed like "bool kernel_managed : 1",
and access the union fields under  /* kernel-managed */ only when hwpt->kernel_managed
is true.


> 
> > 
> >   static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
> >                                              struct iommufd_hw_pagetable *hwpt)
> >   {
> > -       lockdep_assert_not_held(&hwpt->ioas->mutex);
> > -       if (hwpt->auto_domain)
> > +       if (!hwpt->user_managed)
> > +               lockdep_assert_not_held(&hwpt->ioas->mutex);
> 
> this is true. this assert is not needed when hwpt is not kernel managed domain.
> 
> > +
> > +       if (!hwpt->user_managed && hwpt->auto_domain)
> 
> actually, checking auto_domain is more precise. There is hwpt which is
> neither user managed nor auto.

auto_domain is under union fields marked with kernel-managed only.
Access it without type checking is invalid.

struct iommufd_hw_pagetable {
        struct iommufd_object obj;
        struct iommu_domain *domain;

        void (*abort)(struct iommufd_object *obj);
        void (*destroy)(struct iommufd_object *obj);

        bool user_managed : 1;
        union {
                struct { /* user-managed */
                        struct iommufd_hw_pagetable *parent;
                };
                struct { /* kernel-managed */
                        struct iommufd_ioas *ioas;
                        struct mutex mutex;
                        bool auto_domain : 1;
                        bool enforce_cache_coherency : 1;
                        bool msi_cookie : 1;
                        bool nest_parent : 1;
                        /* Head at iommufd_ioas::hwpt_list */
                        struct list_head hwpt_item;
                };
        };
};

> 
> >                  iommufd_object_deref_user(ictx, &hwpt->obj);
> >          else
> >                  refcount_dec(&hwpt->obj.users);
> > }
> > 
> > > +			bool enforce_cache_coherency : 1;
> > > +			bool msi_cookie : 1;
> > > +			/* Head at iommufd_ioas::hwpt_list */
> > > +			struct list_head hwpt_item;
> > > +		};
> > > +	};
> > >   };
> > >   struct iommufd_hw_pagetable *
> > > -- 
> > > 2.34.1
> > > 
> 
> -- 
> Regards,
> Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct
  2023-10-09  5:13       ` Yan Zhao
@ 2023-10-10  3:26         ` Yi Liu
  0 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-10-10  3:26 UTC (permalink / raw)
  To: Yan Zhao
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/9 13:13, Yan Zhao wrote:
> On Mon, Oct 09, 2023 at 12:13:52PM +0800, Yi Liu wrote:
>> On 2023/10/7 18:08, Yan Zhao wrote:
>>> On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>>
>>>> The struct iommufd_hw_pagetable has been representing a kernel-managed
>>>> HWPT, yet soon will be reused to represent a user-managed HWPT. These
>>>> two types of HWPTs has the same IOMMUFD object type and an iommu_domain
>>>> object, but have quite different attributes/members.
>>>>
>>>> Add a union in struct iommufd_hw_pagetable and group all the existing
>>>> kernel-managed members. One of the following patches will add another
>>>> struct for user-managed members.
>>>>
>>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> ---
>>>>    drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
>>>>    1 file changed, 11 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
>>>> index 3064997a0181..947a797536e3 100644
>>>> --- a/drivers/iommu/iommufd/iommufd_private.h
>>>> +++ b/drivers/iommu/iommufd/iommufd_private.h
>>>> @@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
>>>>     */
>>>>    struct iommufd_hw_pagetable {
>>>>    	struct iommufd_object obj;
>>>> -	struct iommufd_ioas *ioas;
>>>>    	struct iommu_domain *domain;
>>>> -	bool auto_domain : 1;
>>>> -	bool enforce_cache_coherency : 1;
>>>> -	bool msi_cookie : 1;
>>>> -	/* Head at iommufd_ioas::hwpt_list */
>>>> -	struct list_head hwpt_item;
>>>> +
>>>> +	union {
>>>> +		struct { /* kernel-managed */
>>>> +			struct iommufd_ioas *ioas;
>>>> +			bool auto_domain : 1;
>>> Will iommufd_hw_pagetable_put() also be called on non-kernel-managed domain?
>>
>> yes.
>>
>>> If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put(),
>>> (e.g. as below).
>>> Otherwise, this union will lead to hwpt->ioas and hwpt->auto_domain still being
>>> accessible though invalid.
>>
>> not quite get this sentence.
> I mean with this union, hwpt->auto_domain or hwpt->ioas should only be accessed if and
> only if hwpt type is kernel-managed.

ok, I get this part. just not sure about the missing words in your prior 
comment.

> So, any unconditional access, as in iommufd_hw_pagetable_put() pasted below, is buggy.
> 
> Therefore, do you think it's better to add a filed like "bool kernel_managed : 1",
> and access the union fields under  /* kernel-managed */ only when hwpt->kernel_managed
> is true.
> 
> 
>>
>>>
>>>    static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
>>>                                               struct iommufd_hw_pagetable *hwpt)
>>>    {
>>> -       lockdep_assert_not_held(&hwpt->ioas->mutex);
>>> -       if (hwpt->auto_domain)
>>> +       if (!hwpt->user_managed)
>>> +               lockdep_assert_not_held(&hwpt->ioas->mutex);
>>
>> this is true. this assert is not needed when hwpt is not kernel managed domain.
>>
>>> +
>>> +       if (!hwpt->user_managed && hwpt->auto_domain)
>>
>> actually, checking auto_domain is more precise. There is hwpt which is
>> neither user managed nor auto.
> 
> auto_domain is under union fields marked with kernel-managed only.
> Access it without type checking is invalid.

I see. yes, should check user_managed as well. :)

> struct iommufd_hw_pagetable {
>          struct iommufd_object obj;
>          struct iommu_domain *domain;
> 
>          void (*abort)(struct iommufd_object *obj);
>          void (*destroy)(struct iommufd_object *obj);
> 
>          bool user_managed : 1;
>          union {
>                  struct { /* user-managed */
>                          struct iommufd_hw_pagetable *parent;
>                  };
>                  struct { /* kernel-managed */
>                          struct iommufd_ioas *ioas;
>                          struct mutex mutex;
>                          bool auto_domain : 1;
>                          bool enforce_cache_coherency : 1;
>                          bool msi_cookie : 1;
>                          bool nest_parent : 1;
>                          /* Head at iommufd_ioas::hwpt_list */
>                          struct list_head hwpt_item;
>                  };
>          };
> };
> 
>>
>>>                   iommufd_object_deref_user(ictx, &hwpt->obj);
>>>           else
>>>                   refcount_dec(&hwpt->obj.users);
>>> }
>>>
>>>> +			bool enforce_cache_coherency : 1;
>>>> +			bool msi_cookie : 1;
>>>> +			/* Head at iommufd_ioas::hwpt_list */
>>>> +			struct list_head hwpt_item;
>>>> +		};
>>>> +	};
>>>>    };
>>>>    struct iommufd_hw_pagetable *
>>>> -- 
>>>> 2.34.1
>>>>
>>
>> -- 
>> Regards,
>> Yi Liu

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 00/17] iommufd: Add nesting infrastructure
  2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
                   ` (16 preceding siblings ...)
  2023-09-21  7:51 ` [PATCH v4 17/17] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
@ 2023-10-10 16:53 ` Jason Gunthorpe
  2023-10-12  8:45   ` Yi Liu
  17 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 16:53 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
> v4:
>  - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
>    and kernel-managed HWPTs
>  - Rework invalidate uAPI to be a multi-request array-based design
>  - Add a struct iommu_user_data_array and a helper for driver to sanitize
>    and copy the entry data from user space invalidation array
>  - Add a patch fixing TEST_LENGTH() in selftest program
>  - Drop IOMMU_RESV_IOVA_RANGES patches
>  - Update kdoc and inline comments
>  - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
>    this does not change the rule that resv regions should only be added to the
>    kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
>    as it is needed only by SMMU so far.

This doesn't apply to iommufd next so you will have to resend it

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
                     ` (2 preceding siblings ...)
  2023-09-26  6:56   ` Tian, Kevin
@ 2023-10-10 16:58   ` Jason Gunthorpe
  2023-10-12  9:11     ` Yi Liu
  2023-10-13  0:34     ` Nicolin Chen
  3 siblings, 2 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 16:58 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 660dc1931dc9..12e12e5563e6 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -14,6 +14,7 @@
>  #include <linux/err.h>
>  #include <linux/of.h>
>  #include <uapi/linux/iommu.h>
> +#include <uapi/linux/iommufd.h>

Oh we should definately avoid doing that!
  
Maybe this is a good moment to start a new header file exclusively for
iommu drivers and core subsystem to include?

 include/linux/iommu-driver.h

?

Put iommu_copy_user_data() and  struct iommu_user_data in there

Avoid this include in this file.

>  #define IOMMU_READ	(1 << 0)
>  #define IOMMU_WRITE	(1 << 1)
> @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
>  	bool			queued;
>  };
>  
> +/**
> + * struct iommu_user_data - iommu driver specific user space data info
> + * @uptr: Pointer to the user buffer for copy_from_user()
> + * @len: The length of the user buffer in bytes
> + *
> + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
> + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
> + */
> +struct iommu_user_data {
> +	void __user *uptr;
> +	size_t len;
> +};

Put the "hwpt_type" in here and just call it type

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 02/17] iommu: Add nested domain support
  2023-09-21  7:51 ` [PATCH v4 02/17] iommu: Add nested domain support Yi Liu
@ 2023-10-10 17:21   ` Jason Gunthorpe
  2023-10-12  9:13     ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 17:21 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:23AM -0700, Yi Liu wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> Introduce a new domain type for a user I/O page table, which is nested on
> top of another user space address represented by a UNMANAGED
> domain. The

Lets start using the world PAGING whenever you want to type
UNMANAGED. I'm trying to get rid of UNMANAGED.

> @@ -241,6 +245,21 @@ struct iommu_user_data {
>  	size_t len;
>  };
>  
> +/**
> + * struct iommu_user_data_array - iommu driver specific user space data array
> + * @uptr: Pointer to the user buffer array for copy_from_user()
> + * @entry_len: The fixed-width length of a entry in the array, in bytes
> + * @entry_num: The number of total entries in the array
> + *
> + * A array having a @entry_num number of @entry_len sized entries, each entry is
> + * user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
> + */
> +struct iommu_user_data_array {
> +	void __user *uptr;
> +	size_t entry_len;
> +	int entry_num;
> +};

Ditto about iommu-driver.h for most of this stuff

> +
>  /**
>   * iommu_copy_user_data - Copy iommu driver specific user space data
>   * @dst_data: Pointer to an iommu driver specific user data that is defined in
> @@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
>  				     src_data->uptr, src_data->len);
>  }
>  
> +/**
> + * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
> + *                                   from an iommu_user_data_array input
> + * @dst_data: Pointer to an iommu driver specific user data that is defined in
> + *            include/uapi/linux/iommufd.h
> + * @src_data: Pointer to a struct iommu_user_data_array for user space data array
> + * @index: Index to offset the location in the array to copy user data from
> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
> + * @min_len: Initial length of user data structure for backward compatibility.
> + *           This should be offsetofend using the last member in the user data
> + *           struct that was initially added to include/uapi/linux/iommufd.h
> + */
> +static inline int
> +iommu_copy_user_data_from_array(void *dst_data,
> +				const struct iommu_user_data_array *src_array,
> +				int index, size_t data_len, size_t min_len)

Index should be 'unsigned int'

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-09-21  7:51 ` [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions Yi Liu
  2023-09-26  7:57   ` Tian, Kevin
@ 2023-10-10 18:49   ` Jason Gunthorpe
  2023-10-12 19:09     ` Jason Gunthorpe
  1 sibling, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-10 18:49 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:26AM -0700, Yi Liu wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> As one of the previous commits mentioned, a user-managed HWPT will have
> some different attributes/members. It'd be more clear by having separate
> allocators. Since the existing iommufd_hw_pagetable_alloc() serves well
> kernel-managed HWPTs, apply some minimal updates to mark it as a kernel-
> managed HWPT allocator.
> 
> Also, add a pair of function pointers (abort and destroy) in the struct,
> to separate different cleanup routines. Then rename the existing cleanup
> functions to iommufd_kernel_managed_hwpt_destroy/abort() linked to the
> HWPT in the allocator.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/hw_pagetable.c    | 34 ++++++++++++++++++++-----
>  drivers/iommu/iommufd/iommufd_private.h |  3 +++
>  2 files changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 554a9c3d740f..1cc7178121d1 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -8,7 +8,7 @@
>  #include "../iommu-priv.h"
>  #include "iommufd_private.h"
>  
> -void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
> +static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
>  {
>  	struct iommufd_hw_pagetable *hwpt =
>  		container_of(obj, struct iommufd_hw_pagetable, obj);
> @@ -27,7 +27,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
>  	refcount_dec(&hwpt->ioas->obj.users);
>  }
>  
> -void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
> +void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
> +{
> +	container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
> +}
> +
> +static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
>  {
>  	struct iommufd_hw_pagetable *hwpt =
>  		container_of(obj, struct iommufd_hw_pagetable, obj);
> @@ -42,6 +47,11 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
>  	iommufd_hw_pagetable_destroy(obj);
>  }
>  
> +void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
> +{
> +	container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
> +}
> +
>  int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>  {
>  	if (hwpt->enforce_cache_coherency)
> @@ -57,7 +67,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>  }
>  
>  /**
> - * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
> + * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device
>   * @ictx: iommufd context
>   * @ioas: IOAS to associate the domain with
>   * @idev: Device to get an iommu_domain for
> @@ -66,9 +76,9 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>   * @user_data: Optional user_data pointer
>   * @immediate_attach: True if idev should be attached to the hwpt
>   *
> - * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
> - * will be linked to the given ioas and upon return the underlying iommu_domain
> - * is fully popoulated.
> + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_UNMANAGED) and return it as
> + * a kernel-managed hw_pagetable. The HWPT will be linked to the given ioas and
> + * upon return the underlying iommu_domain is fully popoulated.
>   *
>   * The caller must hold the ioas->mutex until after
>   * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
> @@ -103,6 +113,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>  	/* Pairs with iommufd_hw_pagetable_destroy() */
>  	refcount_inc(&ioas->obj.users);
>  	hwpt->ioas = ioas;
> +	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
> +	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;
>  
>  	if (ops->domain_alloc_user) {
>  		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
> @@ -121,6 +133,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>  		}
>  	}
>  
> +	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)) {
> +		rc = -EINVAL;
> +		goto out_abort;
> +	}
> +	/* Driver is buggy by mixing user-managed op in kernel-managed ops */
> +	if (WARN_ON_ONCE(hwpt->domain->ops->cache_invalidate_user)) {
> +		rc = -EINVAL;
> +		goto out_abort;
> +	}
> +
>  	/*
>  	 * Set the coherency mode before we do iopt_table_add_domain() as some
>  	 * iommus have a per-PTE bit that controls it and need to decide before
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 1d3b1a74e854..3e89c3d530f3 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
>  	struct iommufd_object obj;
>  	struct iommu_domain *domain;
>  
> +	void (*abort)(struct iommufd_object *obj);
> +	void (*destroy)(struct iommufd_object *obj);
> +
>  	union {
>  		struct { /* kernel-managed */
>  			struct iommufd_ioas *ioas;

I think if you are doing this you are trying too hard to share the
struct.. Defaintely want to avoid function pointers in general, and
function pointers in a writable struct in particular.

An if of some sort in the two functions would be clearer

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 00/17] iommufd: Add nesting infrastructure
  2023-10-10 16:53 ` [PATCH v4 00/17] iommufd: Add nesting infrastructure Jason Gunthorpe
@ 2023-10-12  8:45   ` Yi Liu
  0 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-10-12  8:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/11 00:53, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
>> v4:
>>   - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
>>     and kernel-managed HWPTs
>>   - Rework invalidate uAPI to be a multi-request array-based design
>>   - Add a struct iommu_user_data_array and a helper for driver to sanitize
>>     and copy the entry data from user space invalidation array
>>   - Add a patch fixing TEST_LENGTH() in selftest program
>>   - Drop IOMMU_RESV_IOVA_RANGES patches
>>   - Update kdoc and inline comments
>>   - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
>>     this does not change the rule that resv regions should only be added to the
>>     kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
>>     as it is needed only by SMMU so far.
> 
> This doesn't apply to iommufd next so you will have to resend it

sure. I'm working with Nic on a new version.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-10 16:58   ` Jason Gunthorpe
@ 2023-10-12  9:11     ` Yi Liu
  2023-10-12 13:39       ` Jason Gunthorpe
  2023-10-13  0:34     ` Nicolin Chen
  1 sibling, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-12  9:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/11 00:58, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 660dc1931dc9..12e12e5563e6 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -14,6 +14,7 @@
>>   #include <linux/err.h>
>>   #include <linux/of.h>
>>   #include <uapi/linux/iommu.h>
>> +#include <uapi/linux/iommufd.h>
> 
> Oh we should definately avoid doing that!
>    
> Maybe this is a good moment to start a new header file exclusively for
> iommu drivers and core subsystem to include?
> 
>   include/linux/iommu-driver.h
> 
> ?
> 
> Put iommu_copy_user_data() and  struct iommu_user_data in there
> 
> Avoid this include in this file.

sure. btw. seems all the user of this API and structure are in the
drivers/iommu directory. can we just putting them in
drivers/iommu/iommu-priv.h?

> 
>>   #define IOMMU_READ	(1 << 0)
>>   #define IOMMU_WRITE	(1 << 1)
>> @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
>>   	bool			queued;
>>   };
>>   
>> +/**
>> + * struct iommu_user_data - iommu driver specific user space data info
>> + * @uptr: Pointer to the user buffer for copy_from_user()
>> + * @len: The length of the user buffer in bytes
>> + *
>> + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
>> + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
>> + */
>> +struct iommu_user_data {
>> +	void __user *uptr;
>> +	size_t len;
>> +};
> 
> Put the "hwpt_type" in here and just call it type

I assume this is a change related to the above comment to avoid
including uapi/linux/iommufd.h. right?

Just one concern. There are other paths (like cache_invalidate of
this series and Nic's set_dev_data) uses this struct as well. I'm
a bit worrying if it is good to put type here as type is meaningful
for the domain_alloc_user path.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-09-26  6:56   ` Tian, Kevin
@ 2023-10-12  9:12     ` Yi Liu
  2023-10-13 11:42       ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-12  9:12 UTC (permalink / raw)
  To: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On 2023/9/26 14:56, Tian, Kevin wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>> Sent: Thursday, September 21, 2023 3:51 PM
>> +
>> +/**
>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>> + * @dst_data: Pointer to an iommu driver specific user data that is defined
>> in
>> + *            include/uapi/linux/iommufd.h
>> + * @src_data: Pointer to a struct iommu_user_data for user space data info
>> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
>> + * @min_len: Initial length of user data structure for backward compatibility.
>> + *           This should be offsetofend using the last member in the user data
>> + *           struct that was initially added to include/uapi/linux/iommufd.h
>> + */
>> +static inline int iommu_copy_user_data(void *dst_data,
>> +				       const struct iommu_user_data *src_data,
>> +				       size_t data_len, size_t min_len)
> 
> iommu_copy_struct_from_user()?
> 
> btw given the confusion raised on how this would be used is it clearer
> to move it to the patch together with the 1st user?

sure. How about your opinion? @Nic.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 02/17] iommu: Add nested domain support
  2023-10-10 17:21   ` Jason Gunthorpe
@ 2023-10-12  9:13     ` Yi Liu
  2023-10-14  0:47       ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-12  9:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/11 01:21, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:23AM -0700, Yi Liu wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> Introduce a new domain type for a user I/O page table, which is nested on
>> top of another user space address represented by a UNMANAGED
>> domain. The
> 
> Lets start using the world PAGING whenever you want to type
> UNMANAGED. I'm trying to get rid of UNMANAGED.

sure.

>> @@ -241,6 +245,21 @@ struct iommu_user_data {
>>   	size_t len;
>>   };
>>   
>> +/**
>> + * struct iommu_user_data_array - iommu driver specific user space data array
>> + * @uptr: Pointer to the user buffer array for copy_from_user()
>> + * @entry_len: The fixed-width length of a entry in the array, in bytes
>> + * @entry_num: The number of total entries in the array
>> + *
>> + * A array having a @entry_num number of @entry_len sized entries, each entry is
>> + * user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
>> + */
>> +struct iommu_user_data_array {
>> +	void __user *uptr;
>> +	size_t entry_len;
>> +	int entry_num;
>> +};
> 
> Ditto about iommu-driver.h for most of this stuff

ack.

>> +
>>   /**
>>    * iommu_copy_user_data - Copy iommu driver specific user space data
>>    * @dst_data: Pointer to an iommu driver specific user data that is defined in
>> @@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
>>   				     src_data->uptr, src_data->len);
>>   }
>>   
>> +/**
>> + * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
>> + *                                   from an iommu_user_data_array input
>> + * @dst_data: Pointer to an iommu driver specific user data that is defined in
>> + *            include/uapi/linux/iommufd.h
>> + * @src_data: Pointer to a struct iommu_user_data_array for user space data array
>> + * @index: Index to offset the location in the array to copy user data from
>> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
>> + * @min_len: Initial length of user data structure for backward compatibility.
>> + *           This should be offsetofend using the last member in the user data
>> + *           struct that was initially added to include/uapi/linux/iommufd.h
>> + */
>> +static inline int
>> +iommu_copy_user_data_from_array(void *dst_data,
>> +				const struct iommu_user_data_array *src_array,
>> +				int index, size_t data_len, size_t min_len)
> 
> Index should be 'unsigned int'

yes.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-12  9:11     ` Yi Liu
@ 2023-10-12 13:39       ` Jason Gunthorpe
  2023-10-13  4:33         ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-12 13:39 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
> On 2023/10/11 00:58, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 660dc1931dc9..12e12e5563e6 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -14,6 +14,7 @@
> > >   #include <linux/err.h>
> > >   #include <linux/of.h>
> > >   #include <uapi/linux/iommu.h>
> > > +#include <uapi/linux/iommufd.h>
> > 
> > Oh we should definately avoid doing that!
> > Maybe this is a good moment to start a new header file exclusively for
> > iommu drivers and core subsystem to include?
> > 
> >   include/linux/iommu-driver.h
> > 
> > ?
> > 
> > Put iommu_copy_user_data() and  struct iommu_user_data in there
> > 
> > Avoid this include in this file.
> 
> sure. btw. seems all the user of this API and structure are in the
> drivers/iommu directory. can we just putting them in
> drivers/iommu/iommu-priv.h?

iommu-priv.h should be private to the core iommu code, and we sort of
extended it to iommufd as well.

iommu-driver.h would be "private" to the core and all the drivers
only.

As include ../.. is often frown on at large scale it is probably
better to be in include/linux

> Just one concern. There are other paths (like cache_invalidate of
> this series and Nic's set_dev_data) uses this struct as well. I'm
> a bit worrying if it is good to put type here as type is meaningful
> for the domain_alloc_user path.

There is always a type though? I haven't got that far in the series
yet to see..

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-10-10 18:49   ` Jason Gunthorpe
@ 2023-10-12 19:09     ` Jason Gunthorpe
  2023-10-13  7:13       ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-12 19:09 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Tue, Oct 10, 2023 at 03:49:32PM -0300, Jason Gunthorpe wrote:
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > index 1d3b1a74e854..3e89c3d530f3 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
> >  	struct iommufd_object obj;
> >  	struct iommu_domain *domain;
> >  
> > +	void (*abort)(struct iommufd_object *obj);
> > +	void (*destroy)(struct iommufd_object *obj);
> > +
> >  	union {
> >  		struct { /* kernel-managed */
> >  			struct iommufd_ioas *ioas;
> 
> I think if you are doing this you are trying too hard to share the
> struct.. Defaintely want to avoid function pointers in general, and
> function pointers in a writable struct in particular.

I looked at this for a while and I do still have the feeling that
probably two structs and even two type IDs is probably a cleaner
design.

Like this:

// Or maybe use obj.type ?
enum iommufd_hw_pagetable_type {
	IOMMUFD_HWPT_PAGING,
	IOMMUFD_HWPT_NESTED,
};

struct iommufd_hw_pagetable_common {
	struct iommufd_object obj;
	struct iommu_domain *domain;
	enum iommufd_hw_pagetable_type type;
};

struct iommufd_hw_pagetable {
	struct iommufd_hw_pagetable_common common;
	struct iommufd_ioas *ioas;
	bool auto_domain : 1;
	bool enforce_cache_coherency : 1;
	bool msi_cookie : 1;
	/* Head at iommufd_ioas::hwpt_list */
	struct list_head hwpt_item;
};

struct iommufd_hw_pagetable_nested {
	struct iommufd_hw_pagetable_common common;
	// ??
};

I poked at it in an editor for a bit and it was looking OK but
requires breaking up a bunch of functions then I ran out of time

Also, we probably should feed enforce_cache_coherency through the
alloc_hwpt uapi and not try to autodetect it..

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-10 16:58   ` Jason Gunthorpe
  2023-10-12  9:11     ` Yi Liu
@ 2023-10-13  0:34     ` Nicolin Chen
  2023-10-13 14:03       ` Jason Gunthorpe
  1 sibling, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-13  0:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 660dc1931dc9..12e12e5563e6 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -14,6 +14,7 @@
> >  #include <linux/err.h>
> >  #include <linux/of.h>
> >  #include <uapi/linux/iommu.h>
> > +#include <uapi/linux/iommufd.h>
> 
> Oh we should definately avoid doing that!
>   
> Maybe this is a good moment to start a new header file exclusively for
> iommu drivers and core subsystem to include?
> 
>  include/linux/iommu-driver.h
> 
> ?
> 
> Put iommu_copy_user_data() and  struct iommu_user_data in there
> 
> Avoid this include in this file.

By looking closer, it seems that we included the uapi header for:
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+						  enum iommu_hwpt_data_type data_type,
+						  struct iommu_domain *parent,
+						  const struct iommu_user_data *user_data);

So we could drop the include, and instead add this next to structs:
+enum iommu_hwpt_data_type;

Then it's not that necessary to have a new header? We could mark a
section of "driver exclusively functions" in iommu.h, I think.

> >  #define IOMMU_READ	(1 << 0)
> >  #define IOMMU_WRITE	(1 << 1)
> > @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
> >  	bool			queued;
> >  };
> >  
> > +/**
> > + * struct iommu_user_data - iommu driver specific user space data info
> > + * @uptr: Pointer to the user buffer for copy_from_user()
> > + * @len: The length of the user buffer in bytes
> > + *
> > + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
> > + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
> > + */
> > +struct iommu_user_data {
> > +	void __user *uptr;
> > +	size_t len;
> > +};
> 
> Put the "hwpt_type" in here and just call it type

Ah, then domain_alloc_user can omit the enum iommu_hwpt_data_type.

Thanks!
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-12 13:39       ` Jason Gunthorpe
@ 2023-10-13  4:33         ` Yi Liu
  2023-10-13 14:04           ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-13  4:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On 2023/10/12 21:39, Jason Gunthorpe wrote:
> On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
>> On 2023/10/11 00:58, Jason Gunthorpe wrote:
>>> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index 660dc1931dc9..12e12e5563e6 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -14,6 +14,7 @@
>>>>    #include <linux/err.h>
>>>>    #include <linux/of.h>
>>>>    #include <uapi/linux/iommu.h>
>>>> +#include <uapi/linux/iommufd.h>
>>>
>>> Oh we should definately avoid doing that!
>>> Maybe this is a good moment to start a new header file exclusively for
>>> iommu drivers and core subsystem to include?
>>>
>>>    include/linux/iommu-driver.h
>>>
>>> ?
>>>
>>> Put iommu_copy_user_data() and  struct iommu_user_data in there
>>>
>>> Avoid this include in this file.
>>
>> sure. btw. seems all the user of this API and structure are in the
>> drivers/iommu directory. can we just putting them in
>> drivers/iommu/iommu-priv.h?
> 
> iommu-priv.h should be private to the core iommu code, and we sort of
> extended it to iommufd as well.
> 
> iommu-driver.h would be "private" to the core and all the drivers
> only.
> 
> As include ../.. is often frown on at large scale it is probably
> better to be in include/linux

got it.

> 
>> Just one concern. There are other paths (like cache_invalidate of
>> this series and Nic's set_dev_data) uses this struct as well. I'm
>> a bit worrying if it is good to put type here as type is meaningful
>> for the domain_alloc_user path.
> 
> There is always a type though? I haven't got that far in the series
> yet to see..

not really. Below the users of the struct iommu_user_data in my current
iommufd_nesting branch. Only the domain_alloc_user op has type as there
can be multiple vendor specific alloc data types. Basically, I'm ok to
make the change you suggested, just not sure if it is good to add type
as it is only needed by one path.

	/* Domain allocation and freeing by the iommu driver */
	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
						  enum iommu_hwpt_type hwpt_type,
						  struct iommu_domain *parent,
						  const struct iommu_user_data *user_data);


	int (*set_dev_user_data)(struct device *dev,
				 const struct iommu_user_data *user_data);
	void (*unset_dev_user_data)(struct device *dev);


	int (*cache_invalidate_user)(struct iommu_domain *domain,
				     struct iommu_user_data_array *array,
				     u32 *error_code);

above code snippet is from below file:

https://github.com/yiliu1765/iommufd/blob/iommufd_nesting/include/linux/iommu.h

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-10-12 19:09     ` Jason Gunthorpe
@ 2023-10-13  7:13       ` Tian, Kevin
  2023-10-13 14:05         ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-13  7:13 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, October 13, 2023 3:10 AM
> 
> Also, we probably should feed enforce_cache_coherency through the
> alloc_hwpt uapi and not try to autodetect it..
> 

In the past we had a long discussion about this with the conclusion
that user opt is the ideal model but it's fine to stay with autodetect
and existing vfio/kvm contract for coherency detection until we
see a real demand for user opt.

Is there anything new which changes your mind to have user opt now?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-12  9:12     ` Yi Liu
@ 2023-10-13 11:42       ` Yi Liu
  2023-10-13 22:22         ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-13 11:42 UTC (permalink / raw)
  To: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On 2023/10/12 17:12, Yi Liu wrote:
> On 2023/9/26 14:56, Tian, Kevin wrote:
>>> From: Yi Liu <yi.l.liu@intel.com>
>>> Sent: Thursday, September 21, 2023 3:51 PM
>>> +
>>> +/**
>>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>>> + * @dst_data: Pointer to an iommu driver specific user data that is 
>>> defined
>>> in
>>> + *            include/uapi/linux/iommufd.h
>>> + * @src_data: Pointer to a struct iommu_user_data for user space data info
>>> + * @data_len: Length of current user data structure, i.e. sizeof(struct 
>>> _dst)
>>> + * @min_len: Initial length of user data structure for backward 
>>> compatibility.
>>> + *           This should be offsetofend using the last member in the 
>>> user data
>>> + *           struct that was initially added to 
>>> include/uapi/linux/iommufd.h
>>> + */
>>> +static inline int iommu_copy_user_data(void *dst_data,
>>> +                       const struct iommu_user_data *src_data,
>>> +                       size_t data_len, size_t min_len)
>>
>> iommu_copy_struct_from_user()?
>>
>> btw given the confusion raised on how this would be used is it clearer
>> to move it to the patch together with the 1st user?
> 
> sure. How about your opinion? @Nic.
> 

after a second thinking, the first user of this helper is the patch to
extend mock iommu driver. Is it suitable to introduce a common API together
with selftest code?

https://lore.kernel.org/linux-iommu/20230921075138.124099-14-yi.l.liu@intel.com/

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-13  0:34     ` Nicolin Chen
@ 2023-10-13 14:03       ` Jason Gunthorpe
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 14:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Oct 12, 2023 at 05:34:58PM -0700, Nicolin Chen wrote:
> On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 660dc1931dc9..12e12e5563e6 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -14,6 +14,7 @@
> > >  #include <linux/err.h>
> > >  #include <linux/of.h>
> > >  #include <uapi/linux/iommu.h>
> > > +#include <uapi/linux/iommufd.h>
> > 
> > Oh we should definately avoid doing that!
> >   
> > Maybe this is a good moment to start a new header file exclusively for
> > iommu drivers and core subsystem to include?
> > 
> >  include/linux/iommu-driver.h
> > 
> > ?
> > 
> > Put iommu_copy_user_data() and  struct iommu_user_data in there
> > 
> > Avoid this include in this file.
> 
> By looking closer, it seems that we included the uapi header for:
> +	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
> +						  enum iommu_hwpt_data_type data_type,
> +						  struct iommu_domain *parent,
> +						  const struct iommu_user_data *user_data);
> 
> So we could drop the include, and instead add this next to structs:
> +enum iommu_hwpt_data_type;
> 
> Then it's not that necessary to have a new header? We could mark a
> section of "driver exclusively functions" in iommu.h, I think.

Yeah, OK, though I still have a desire to split this header..

(though can you really forward declare enums and then pass it by value?)

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-13  4:33         ` Yi Liu
@ 2023-10-13 14:04           ` Jason Gunthorpe
  2023-10-13 17:56             ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 14:04 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:

> not really. Below the users of the struct iommu_user_data in my current
> iommufd_nesting branch. Only the domain_alloc_user op has type as there
> can be multiple vendor specific alloc data types. Basically, I'm ok to
> make the change you suggested, just not sure if it is good to add type
> as it is only needed by one path.

I don't think we should ever have an opaque data blob without a type
tag..

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-10-13  7:13       ` Tian, Kevin
@ 2023-10-13 14:05         ` Jason Gunthorpe
  2023-10-16  8:26           ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 14:05 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, October 13, 2023 3:10 AM
> > 
> > Also, we probably should feed enforce_cache_coherency through the
> > alloc_hwpt uapi and not try to autodetect it..
> > 
> 
> In the past we had a long discussion about this with the conclusion
> that user opt is the ideal model but it's fine to stay with autodetect
> and existing vfio/kvm contract for coherency detection until we
> see a real demand for user opt.
> 
> Is there anything new which changes your mind to have user opt now?

I guess, I was just looking at the complexity it brings to keep that
working.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-09-21  7:51 ` [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data Yi Liu
@ 2023-10-13 15:19   ` Jason Gunthorpe
  2023-10-13 20:58     ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 15:19 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> But it can only allocate a hw_pagetable that associates to a given IOAS,
> i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> 
> IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> translation use cases, that require user data input from the user space.
> 
> Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> type specified user data. Also, update the @pt_id to accept hwpt_id too
> besides an ioas_id. Then, pass them to the downstream alloc_fn().
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
>  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
>  2 files changed, 39 insertions(+), 3 deletions(-)

Can we also come with a small vt-d patch that does implement an op for
this? Or is it too big?

It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
self-contained series and another series for invalidate.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-13 14:04           ` Jason Gunthorpe
@ 2023-10-13 17:56             ` Nicolin Chen
  2023-10-16  3:28               ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-13 17:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> 
> > not really. Below the users of the struct iommu_user_data in my current
> > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > make the change you suggested, just not sure if it is good to add type
> > as it is only needed by one path.
> 
> I don't think we should ever have an opaque data blob without a type
> tag..

I can add those "missing" data types, and then a driver will be
responsible for sanitizing the type along with the data_len.

I notice that the enum iommu_hwpt_data_type in the posted patch
is confined to the alloc_user uAPI. Perhaps we should share it
with invalidate too:

/**
 * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
 * @IOMMU_HWPT_DATA_NONE: no data
 * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
 * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
 */
enum iommu_hwpt_data_type {
	IOMMU_HWPT_DATA_NONE,
	IOMMU_HWPT_DATA_VTD_S1,
	IOMMU_HWPT_DATA_ARM_SMMUV3,
};

Though inevitably we'd have to define a separate data group for
things like set_dev_data that is related to idev v.s. hwpt:

// IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
// passthrough device, so renaming to "_IDEV_" here. And perhaps
// "set_dev_data" could be "set_idev_data" too? Any better name?

/**
 * enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
 * @IOMMU_IDEV_DATA_NONE: no data
 * @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
 */
enum iommu_idev_data_type {
	IOMMU_IDEV_DATA_NONE,
	IOMMU_IDEV_DATA_ARM_SMMUV3,
};

/**
 * struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
 * @sid: The Stream ID that is assigned in the user space
 *
 * The SMMUv3 specific user space data for a device that is behind an SMMU HW.
 * The guest-level user data should be linked to the host-level kernel data,
 * which will be used by user space cache invalidation commands.
 */
struct iommu_idev_data_arm_smmuv3 {
	__u32 sid;
};

/**
 * struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
 * @size: sizeof(struct iommu_set_idev_data)
 * @dev_id: The device to set an iommu specific device data
 * @data_uptr: User pointer of the device user data
 * @data_len: Length of the device user data
 *
 * The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
 * another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
 * unbind'd from the iommufd context.
 */
struct iommu_set_idev_data {
	__u32 size;
	__u32 dev_id;
	__aligned_u64 data_uptr;
	__u32 data_len;
};

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-13 15:19   ` Jason Gunthorpe
@ 2023-10-13 20:58     ` Nicolin Chen
  2023-10-14  0:07       ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-13 20:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > 
> > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > translation use cases, that require user data input from the user space.
> > 
> > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > 
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> >  2 files changed, 39 insertions(+), 3 deletions(-)
> 
> Can we also come with a small vt-d patch that does implement an op for
> this? Or is it too big?
> 
> It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> self-contained series and another series for invalidate.

We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
which won't be supported until the cache_invalidate_user ops is
preset?

/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */

+       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
+       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
+               rc = -EINVAL;
+               goto out_abort;
+       }

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-13 11:42       ` Yi Liu
@ 2023-10-13 22:22         ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-13 22:22 UTC (permalink / raw)
  To: Yi Liu
  Cc: Tian, Kevin, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

On Fri, Oct 13, 2023 at 07:42:50PM +0800, Yi Liu wrote:
> On 2023/10/12 17:12, Yi Liu wrote:
> > On 2023/9/26 14:56, Tian, Kevin wrote:
> > > > From: Yi Liu <yi.l.liu@intel.com>
> > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > > +
> > > > +/**
> > > > + * iommu_copy_user_data - Copy iommu driver specific user space data
> > > > + * @dst_data: Pointer to an iommu driver specific user data that is
> > > > defined
> > > > in
> > > > + *            include/uapi/linux/iommufd.h
> > > > + * @src_data: Pointer to a struct iommu_user_data for user space data info
> > > > + * @data_len: Length of current user data structure, i.e. sizeof(struct
> > > > _dst)
> > > > + * @min_len: Initial length of user data structure for backward
> > > > compatibility.
> > > > + *           This should be offsetofend using the last member in the
> > > > user data
> > > > + *           struct that was initially added to
> > > > include/uapi/linux/iommufd.h
> > > > + */
> > > > +static inline int iommu_copy_user_data(void *dst_data,
> > > > +                       const struct iommu_user_data *src_data,
> > > > +                       size_t data_len, size_t min_len)
> > > 
> > > iommu_copy_struct_from_user()?
> > > 
> > > btw given the confusion raised on how this would be used is it clearer
> > > to move it to the patch together with the 1st user?
> > 
> > sure. How about your opinion? @Nic.
> > 
> 
> after a second thinking, the first user of this helper is the patch to
> extend mock iommu driver. Is it suitable to introduce a common API together
> with selftest code?
> 
> https://lore.kernel.org/linux-iommu/20230921075138.124099-14-yi.l.liu@intel.com/

I feel no...

But I could separate iommu_copy_struct_from_user and its array
drivitive into an additional patch placed before the selftest
changes, so at least it would be closer to the first callers.

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-13 20:58     ` Nicolin Chen
@ 2023-10-14  0:07       ` Jason Gunthorpe
  2023-10-14  0:51         ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-14  0:07 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > 
> > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > translation use cases, that require user data input from the user space.
> > > 
> > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > 
> > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > >  2 files changed, 39 insertions(+), 3 deletions(-)
> > 
> > Can we also come with a small vt-d patch that does implement an op for
> > this? Or is it too big?
> > 
> > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > self-contained series and another series for invalidate.
> 
> We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> which won't be supported until the cache_invalidate_user ops is
> preset?
> 
> /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> 
> +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> +               rc = -EINVAL;
> +               goto out_abort;
> +       }
>

Hm. That hunk could migrate to the invalidate series. 

I'm just leeary of doing the invalidation too considering how
complicated it is

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support
  2023-09-26  8:14   ` Tian, Kevin
@ 2023-10-14  0:08     ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-14  0:08 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, Jason Gunthorpe, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Sep 26, 2023 at 01:14:47AM -0700, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, September 21, 2023 3:51 PM
> >
> > +static void iommufd_user_managed_hwpt_abort(struct iommufd_object
> > *obj)
> > +{
> > +     struct iommufd_hw_pagetable *hwpt =
> > +             container_of(obj, struct iommufd_hw_pagetable, obj);
> > +
> > +     /* The parent->mutex must be held until finalize is called. */
> > +     lockdep_assert_held(&hwpt->parent->mutex);
> > +
> > +     iommufd_hw_pagetable_destroy(obj);
> > +}
> 
> Can you elaborate what exactly is protected by parent->mutex?
> 
> My gut-feeling that the child just grabs a refcnt on the parent
> object. It doesn't change any other data of the parent.

Ah, you are right. It's added here just for symmetry so we wouldn't
end up with something like:
	if (!hwpt->user_managed)
		mutex_lock(&hwpt->mutex);
	alloc_fn();
	if (!hwpt->user_managed)
		mutex_unlock(&hwpt->mutex);

Perhaps I should move the pair of mutex calls to the kernel-managed
hwpt allocator.

> > +/**
> > + * iommufd_user_managed_hwpt_alloc() - Get a user-managed
> > hw_pagetable
> > + * @ictx: iommufd context
> > + * @pt_obj: Parent object to an HWPT to associate the domain with
> > + * @idev: Device to get an iommu_domain for
> > + * @flags: Flags from userspace
> > + * @hwpt_type: Requested type of hw_pagetable
> > + * @user_data: user_data pointer
> > + * @dummy: never used
> > + *
> > + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and
> > return it as
> > + * a user-managed hw_pagetable.
> 
> Add some text to highlight the requirement being a parent, e.g. not
> an auto domain, must be capable of being a parent, etc.

OK.

> > +     case IOMMUFD_OBJ_HW_PAGETABLE:
> > +             parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> > obj);
> > +             /* No user-managed HWPT on top of an user-managed one
> > */
> > +             if (parent->user_managed) {
> > +                     rc = -EINVAL;
> > +                     goto out_put_pt;
> > +             }
> 
> move to alloc_fn().

Though being a bit covert, this is actually to avoid a data buffer
allocation in the common pathway before calling alloc_fn(), which
is added in the following patch. And the reason why it's in the
common function is because we previously support a kernel-managed
hwpt allocation with data too.

But now, I think we can just move this sanity and data allocation
together into the user-managed hwpt allocator.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-09-26  8:16   ` Tian, Kevin
@ 2023-10-14  0:44     ` Nicolin Chen
  2023-10-16  8:48       ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-14  0:44 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, Jason Gunthorpe, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, September 21, 2023 3:51 PM
> >
> > From: Nicolin Chen <nicolinc@nvidia.com>
> >
> > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > things.
> > So, they should be only setup on kernel-managed domains. If the attaching
> > domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
> > it correctly.
> >
> 
> No redirection. The parent should already have the configuration done
> when it's created. It shouldn't be triggered in the nesting path.

iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
but also in hwpt_attach/replace() if cc is not enforced by the
alloc() because the idev that initiates the hwpt_alloc() might
not have idev->enforce_cache_coherency. Only when another idev
that has idev->enforce_cache_coherency attaches to the shared
hwpt, the cc configuration would be done.

In a nesting case encountering the same situation above, the S2
hwpt is allocated without the iommufd_hw_pagetable_enforce_cc().
But the 2nd idev that has idev->enforce_cache_coherency might
attach directly to the S1 domain/hwpt without touching the S2
domain (for the same VM, S2 domain can be shared). In this case,
without a redirection, the iommufd_hw_pagetable_enforce_cc()
would be missed.

Any thought?

Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 02/17] iommu: Add nested domain support
  2023-10-12  9:13     ` Yi Liu
@ 2023-10-14  0:47       ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-14  0:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: Jason Gunthorpe, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Thu, Oct 12, 2023 at 05:13:59PM +0800, Yi Liu wrote:
 
> > > @@ -241,6 +245,21 @@ struct iommu_user_data {
> > >      size_t len;
> > >   };
> > > 
> > > +/**
> > > + * struct iommu_user_data_array - iommu driver specific user space data array
> > > + * @uptr: Pointer to the user buffer array for copy_from_user()
> > > + * @entry_len: The fixed-width length of a entry in the array, in bytes
> > > + * @entry_num: The number of total entries in the array
> > > + *
> > > + * A array having a @entry_num number of @entry_len sized entries, each entry is
> > > + * user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
> > > + */
> > > +struct iommu_user_data_array {
> > > +    void __user *uptr;
> > > +    size_t entry_len;
> > > +    int entry_num;
> > > +};
> > 
> > Ditto about iommu-driver.h for most of this stuff
> 
> ack.

FYI, I aligned with Jason in another conversation that we can
defer this iommu-driver.h thing since there's no more need to 
include the previous uapi header in iommu.h

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-14  0:07       ` Jason Gunthorpe
@ 2023-10-14  0:51         ` Nicolin Chen
  2023-10-16  7:03           ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-14  0:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > > 
> > > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > > translation use cases, that require user data input from the user space.
> > > > 
> > > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > > 
> > > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > > >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > > >  2 files changed, 39 insertions(+), 3 deletions(-)
> > > 
> > > Can we also come with a small vt-d patch that does implement an op for
> > > this? Or is it too big?
> > > 
> > > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > > self-contained series and another series for invalidate.
> > 
> > We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> > which won't be supported until the cache_invalidate_user ops is
> > preset?
> > 
> > /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> > 
> > +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> > +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> > +               rc = -EINVAL;
> > +               goto out_abort;
> > +       }
> >
> 
> Hm. That hunk could migrate to the invalidate series. 
> 
> I'm just leeary of doing the invalidation too considering how
> complicated it is

OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
with the VT-d driver. Then Yi and I can accordingly separate
the allocation part into a smaller series.

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-13 17:56             ` Nicolin Chen
@ 2023-10-16  3:28               ` Yi Liu
  2023-10-16 11:54                 ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-16  3:28 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, zhenzhong.duan,
	joao.m.martins

On 2023/10/14 01:56, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>
>>> not really. Below the users of the struct iommu_user_data in my current
>>> iommufd_nesting branch. Only the domain_alloc_user op has type as there
>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>> make the change you suggested, just not sure if it is good to add type
>>> as it is only needed by one path.
>>
>> I don't think we should ever have an opaque data blob without a type
>> tag..
> 
> I can add those "missing" data types, and then a driver will be
> responsible for sanitizing the type along with the data_len.
> 
> I notice that the enum iommu_hwpt_data_type in the posted patch
> is confined to the alloc_user uAPI. Perhaps we should share it
> with invalidate too:

invalidation path does not need a type field today as the data
type is vendor specific, vendor driver should know the data type
when calls in.

> 
> /**
>   * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
>   * @IOMMU_HWPT_DATA_NONE: no data
>   * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
>   * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
>   */
> enum iommu_hwpt_data_type {
> 	IOMMU_HWPT_DATA_NONE,
> 	IOMMU_HWPT_DATA_VTD_S1,
> 	IOMMU_HWPT_DATA_ARM_SMMUV3,
> };
> 
> Though inevitably we'd have to define a separate data group for
> things like set_dev_data that is related to idev v.s. hwpt:

yes, the type field is in separate data group per path.

> 
> // IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
> // passthrough device, so renaming to "_IDEV_" here. And perhaps
> // "set_dev_data" could be "set_idev_data" too? Any better name?
> 
> /**
>   * enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
>   * @IOMMU_IDEV_DATA_NONE: no data
>   * @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
>   */
> enum iommu_idev_data_type {
> 	IOMMU_IDEV_DATA_NONE,
> 	IOMMU_IDEV_DATA_ARM_SMMUV3,
> };
> 
> /**
>   * struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
>   * @sid: The Stream ID that is assigned in the user space
>   *
>   * The SMMUv3 specific user space data for a device that is behind an SMMU HW.
>   * The guest-level user data should be linked to the host-level kernel data,
>   * which will be used by user space cache invalidation commands.
>   */
> struct iommu_idev_data_arm_smmuv3 {
> 	__u32 sid;
> };
> 
> /**
>   * struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
>   * @size: sizeof(struct iommu_set_idev_data)
>   * @dev_id: The device to set an iommu specific device data
>   * @data_uptr: User pointer of the device user data
>   * @data_len: Length of the device user data
>   *
>   * The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
>   * another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
>   * unbind'd from the iommufd context.
>   */
> struct iommu_set_idev_data {
> 	__u32 size;
> 	__u32 dev_id;
> 	__aligned_u64 data_uptr;
> 	__u32 data_len;
> };
> 
> Thanks
> Nic

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-14  0:51         ` Nicolin Chen
@ 2023-10-16  7:03           ` Yi Liu
  2023-10-16 11:59             ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-16  7:03 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, zhenzhong.duan,
	joao.m.martins

On 2023/10/14 08:51, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
>> On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
>>> On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
>>>> On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
>>>>> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
>>>>> But it can only allocate a hw_pagetable that associates to a given IOAS,
>>>>> i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
>>>>>
>>>>> IOMMU drivers can now support user-managed hw_pagetables, for two-stage
>>>>> translation use cases, that require user data input from the user space.
>>>>>
>>>>> Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
>>>>> type specified user data. Also, update the @pt_id to accept hwpt_id too
>>>>> besides an ioas_id. Then, pass them to the downstream alloc_fn().
>>>>>
>>>>> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
>>>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>> ---
>>>>>   drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
>>>>>   include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
>>>>>   2 files changed, 39 insertions(+), 3 deletions(-)
>>>>
>>>> Can we also come with a small vt-d patch that does implement an op for
>>>> this? Or is it too big?
>>>>
>>>> It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
>>>> self-contained series and another series for invalidate.
>>>
>>> We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
>>> which won't be supported until the cache_invalidate_user ops is
>>> preset?
>>>
>>> /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
>>>
>>> +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
>>> +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
>>> +               rc = -EINVAL;
>>> +               goto out_abort;
>>> +       }
>>>
>>
>> Hm. That hunk could migrate to the invalidate series.
>>
>> I'm just leeary of doing the invalidation too considering how
>> complicated it is
> 
> OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
> with the VT-d driver. Then Yi and I can accordingly separate
> the allocation part into a smaller series.

Current nesting series actually extends HWPT_ALLOC ioctl to accept user
data for allocating domain with vendor specific data. Nested translation
happens to be the usage of it. But nesting requires invalidation. If we
want to do further split, then this new series would be just "extending
HWPT_ALLOC to accept vendor specific data from userspace". But it will
lack of a user if nesting is separated. Is this acceptable? @Jason

BTW. Do we still have unsolved issue on the invalidation path?

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-10-13 14:05         ` Jason Gunthorpe
@ 2023-10-16  8:26           ` Tian, Kevin
  2023-10-16 12:02             ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-16  8:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, October 13, 2023 10:06 PM
> 
> On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, October 13, 2023 3:10 AM
> > >
> > > Also, we probably should feed enforce_cache_coherency through the
> > > alloc_hwpt uapi and not try to autodetect it..
> > >
> >
> > In the past we had a long discussion about this with the conclusion
> > that user opt is the ideal model but it's fine to stay with autodetect
> > and existing vfio/kvm contract for coherency detection until we
> > see a real demand for user opt.
> >
> > Is there anything new which changes your mind to have user opt now?
> 
> I guess, I was just looking at the complexity it brings to keep that
> working.
> 

vfio_file_enforced_coherent() currently just looks at device_iommu_capable()
and can be called before attach, assuming an autodetect model. 

moving away from autodetect would need a new contract which I hesitate
to bother with at this point.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-14  0:44     ` Nicolin Chen
@ 2023-10-16  8:48       ` Tian, Kevin
  2023-10-16 11:57         ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-16  8:48 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, Jason Gunthorpe, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, October 14, 2023 8:45 AM
> 
> On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, September 21, 2023 3:51 PM
> > >
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > >
> > > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > > things.
> > > So, they should be only setup on kernel-managed domains. If the
> attaching
> > > domain is a user-managed domain, redirect the hwpt to hwpt->parent to
> do
> > > it correctly.
> > >
> >
> > No redirection. The parent should already have the configuration done
> > when it's created. It shouldn't be triggered in the nesting path.
> 
> iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> but also in hwpt_attach/replace() if cc is not enforced by the
> alloc() because the idev that initiates the hwpt_alloc() might
> not have idev->enforce_cache_coherency. Only when another idev
> that has idev->enforce_cache_coherency attaches to the shared
> hwpt, the cc configuration would be done.
> 

is this a bug already? If the 1st device doesn't have enforce_cc in its
iommu, setting the snp bit in the hwpt would lead to reserved
bit violation.

another problem is that intel_iommu_enforce_cache_coherency()
doesn't update existing entries. It only sets a domain flag to affect
future mappings. so it means the 2nd idev is also broken.

The simplest option is to follow vfio type1 i.e. don't mix devices
with different enforce_cc in one domain.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-16  3:28               ` Yi Liu
@ 2023-10-16 11:54                 ` Jason Gunthorpe
  2023-10-16 18:17                   ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 11:54 UTC (permalink / raw)
  To: Yi Liu
  Cc: Nicolin Chen, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> On 2023/10/14 01:56, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > 
> > > > not really. Below the users of the struct iommu_user_data in my current
> > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > make the change you suggested, just not sure if it is good to add type
> > > > as it is only needed by one path.
> > > 
> > > I don't think we should ever have an opaque data blob without a type
> > > tag..
> > 
> > I can add those "missing" data types, and then a driver will be
> > responsible for sanitizing the type along with the data_len.
> > 
> > I notice that the enum iommu_hwpt_data_type in the posted patch
> > is confined to the alloc_user uAPI. Perhaps we should share it
> > with invalidate too:
> 
> invalidation path does not need a type field today as the data
> type is vendor specific, vendor driver should know the data type
> when calls in.

I'm not keen on that, what if a driver needs another type in the
future?  You'd want to make the invalidation data format part of the
domain allocation?

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-16  8:48       ` Tian, Kevin
@ 2023-10-16 11:57         ` Jason Gunthorpe
  2023-10-17  8:52           ` Tian, Kevin
  2023-10-17  9:05           ` Tian, Kevin
  0 siblings, 2 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 11:57 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, October 14, 2023 8:45 AM
> > 
> > On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > >
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > >
> > > > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > > > things.
> > > > So, they should be only setup on kernel-managed domains. If the
> > attaching
> > > > domain is a user-managed domain, redirect the hwpt to hwpt->parent to
> > do
> > > > it correctly.
> > > >
> > >
> > > No redirection. The parent should already have the configuration done
> > > when it's created. It shouldn't be triggered in the nesting path.
> > 
> > iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> > but also in hwpt_attach/replace() if cc is not enforced by the
> > alloc() because the idev that initiates the hwpt_alloc() might
> > not have idev->enforce_cache_coherency. Only when another idev
> > that has idev->enforce_cache_coherency attaches to the shared
> > hwpt, the cc configuration would be done.
> 
> is this a bug already? If the 1st device doesn't have enforce_cc in its
> iommu, setting the snp bit in the hwpt would lead to reserved
> bit violation.

I suspect there are technically some gaps in the intel driver, yes..
 
> another problem is that intel_iommu_enforce_cache_coherency()
> doesn't update existing entries. It only sets a domain flag to affect
> future mappings. so it means the 2nd idev is also broken.

This is such a gap, intel driver should not permit that.

> The simplest option is to follow vfio type1 i.e. don't mix devices
> with different enforce_cc in one domain.

This is why I wanted to get rid of this bad mechanism going forward.

Manually created hwpt should have a manual specification of cc and
then we don't have so many problems.

It means userspace needs to compute if they want to use CC or not, but
userspace already needs to figure this out since without autodomains
it must create two hwpts manually anyhow.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-16  7:03           ` Yi Liu
@ 2023-10-16 11:59             ` Jason Gunthorpe
  2023-10-16 18:44               ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 11:59 UTC (permalink / raw)
  To: Yi Liu
  Cc: Nicolin Chen, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> On 2023/10/14 08:51, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> > > > On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > > > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > > > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > > > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > > > > 
> > > > > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > > > > translation use cases, that require user data input from the user space.
> > > > > > 
> > > > > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > > > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > > > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > > > > 
> > > > > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > > ---
> > > > > >   drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > > > > >   include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > > > > >   2 files changed, 39 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Can we also come with a small vt-d patch that does implement an op for
> > > > > this? Or is it too big?
> > > > > 
> > > > > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > > > > self-contained series and another series for invalidate.
> > > > 
> > > > We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> > > > which won't be supported until the cache_invalidate_user ops is
> > > > preset?
> > > > 
> > > > /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> > > > 
> > > > +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> > > > +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> > > > +               rc = -EINVAL;
> > > > +               goto out_abort;
> > > > +       }
> > > > 
> > > 
> > > Hm. That hunk could migrate to the invalidate series.
> > > 
> > > I'm just leeary of doing the invalidation too considering how
> > > complicated it is
> > 
> > OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
> > with the VT-d driver. Then Yi and I can accordingly separate
> > the allocation part into a smaller series.
> 
> Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> data for allocating domain with vendor specific data. Nested translation
> happens to be the usage of it. But nesting requires invalidation. If we
> want to do further split, then this new series would be just "extending
> HWPT_ALLOC to accept vendor specific data from userspace". But it will
> lack of a user if nesting is separated. Is this acceptable? @Jason

I'd still like to include the nesting allocation and attach parts
though, even if they are not usable without invalidation ..

> BTW. Do we still have unsolved issue on the invalidation path?

I'm not sure, there were so many different versions of it we need to
go back over it and check the dirver implementations again

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  2023-10-16  8:26           ` Tian, Kevin
@ 2023-10-16 12:02             ` Jason Gunthorpe
  0 siblings, 0 replies; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 12:02 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Mon, Oct 16, 2023 at 08:26:25AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, October 13, 2023 10:06 PM
> > 
> > On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, October 13, 2023 3:10 AM
> > > >
> > > > Also, we probably should feed enforce_cache_coherency through the
> > > > alloc_hwpt uapi and not try to autodetect it..
> > > >
> > >
> > > In the past we had a long discussion about this with the conclusion
> > > that user opt is the ideal model but it's fine to stay with autodetect
> > > and existing vfio/kvm contract for coherency detection until we
> > > see a real demand for user opt.
> > >
> > > Is there anything new which changes your mind to have user opt now?
> > 
> > I guess, I was just looking at the complexity it brings to keep that
> > working.
> > 
> 
> vfio_file_enforced_coherent() currently just looks at device_iommu_capable()
> and can be called before attach, assuming an autodetect model. 
> 
> moving away from autodetect would need a new contract which I hesitate
> to bother with at this point.

I think that is fine, we are not enabling new behaviors where qemu can
directly control the wbinvd support. If we want to do that it can be
done more directly.

If the vmm sees a coherent device and does not turn on enforced
coherency then it will get a broken wbinvd and be sad. Since this is
all in the userspace vt-d iommu driver, it can just do the right thing
anyhow.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-16 11:54                 ` Jason Gunthorpe
@ 2023-10-16 18:17                   ` Nicolin Chen
  2023-10-17  8:51                     ` Yi Liu
  2023-10-18 16:37                     ` Jason Gunthorpe
  0 siblings, 2 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-16 18:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > > 
> > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > make the change you suggested, just not sure if it is good to add type
> > > > > as it is only needed by one path.
> > > > 
> > > > I don't think we should ever have an opaque data blob without a type
> > > > tag..
> > > 
> > > I can add those "missing" data types, and then a driver will be
> > > responsible for sanitizing the type along with the data_len.
> > > 
> > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > with invalidate too:
> > 
> > invalidation path does not need a type field today as the data
> > type is vendor specific, vendor driver should know the data type
> > when calls in.
> 
> I'm not keen on that, what if a driver needs another type in the
> future?  You'd want to make the invalidation data format part of the
> domain allocation?

The invalidation data has hwpt_id so it's tied to a hwpt and its 
hwpt->domain. Would it be reasonable to have a different type of
invalidation data for the same type of hwpt?

With this being asked, I added it for our next version. At this
moment, it only does a sanity job:

// API
__iommu_copy_struct_from_user(void *dst_data,
			      const struct iommu_user_data *src_data,
			      unsigned int data_type, size_t data_len,
			      size_t min_len)
{
	if (src_data->type != data_type)
		return -EINVAL;

// Caller
	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
	if (rc)
		return ERR_PTR(rc);

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-16 11:59             ` Jason Gunthorpe
@ 2023-10-16 18:44               ` Nicolin Chen
  2023-10-17  8:55                 ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-16 18:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > data for allocating domain with vendor specific data. Nested translation
> > happens to be the usage of it. But nesting requires invalidation. If we
> > want to do further split, then this new series would be just "extending
> > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > lack of a user if nesting is separated. Is this acceptable? @Jason
> 
> I'd still like to include the nesting allocation and attach parts
> though, even if they are not usable without invalidation ..

This is the latest series that I reworked (in bottom-up order):
 iommu: Add a pair of helper to copy struct iommu_user_data{_array}
 iommufd: Add IOMMU_HWPT_INVALIDATE
 iommufd: Add a nested HW pagetable object
 iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
 iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
 iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
 iommufd/device: Add helpers to enforce/remove device reserved regions
 iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
 iommu: Pass in parent domain with user_data to domain_alloc_user op

Perhaps we can have a preparatory series to merge first:
 iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
 iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
 iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
 iommufd/device: Add helpers to enforce/remove device reserved regions

Then next cycle would be basically 4 patches + selftests:
 iommufd: Add IOMMU_HWPT_INVALIDATE
 iommufd: Add a nested HW pagetable object
 iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
 iommu: Pass in parent domain with user_data to domain_alloc_user op

The preparatory series doesn't involve functional changes yet have
a good amount of pieces to simplify the "nested HW pagetable" that
is basically nested_alloc/abort/destroy.

> > BTW. Do we still have unsolved issue on the invalidation path?
> 
> I'm not sure, there were so many different versions of it we need to
> go back over it and check the dirver implementations again

Only this v4 has the latest array-based invalidation design. And
it should be straightforward for drivers to define entry/request
structures. It might be a bit rush to review/finalize it at the
stage of rc6 though.

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-16 18:17                   ` Nicolin Chen
@ 2023-10-17  8:51                     ` Yi Liu
  2023-10-17  9:28                       ` Tian, Kevin
  2023-10-18 16:37                     ` Jason Gunthorpe
  1 sibling, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-17  8:51 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, zhenzhong.duan,
	joao.m.martins

On 2023/10/17 02:17, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
>>> On 2023/10/14 01:56, Nicolin Chen wrote:
>>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>>>>
>>>>>> not really. Below the users of the struct iommu_user_data in my current
>>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as there
>>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>>>>> make the change you suggested, just not sure if it is good to add type
>>>>>> as it is only needed by one path.
>>>>>
>>>>> I don't think we should ever have an opaque data blob without a type
>>>>> tag..
>>>>
>>>> I can add those "missing" data types, and then a driver will be
>>>> responsible for sanitizing the type along with the data_len.
>>>>
>>>> I notice that the enum iommu_hwpt_data_type in the posted patch
>>>> is confined to the alloc_user uAPI. Perhaps we should share it
>>>> with invalidate too:
>>>
>>> invalidation path does not need a type field today as the data
>>> type is vendor specific, vendor driver should know the data type
>>> when calls in.
>>
>> I'm not keen on that, what if a driver needs another type in the
>> future?  You'd want to make the invalidation data format part of the
>> domain allocation?
> 
> The invalidation data has hwpt_id so it's tied to a hwpt and its
> hwpt->domain. Would it be reasonable to have a different type of
> invalidation data for the same type of hwpt?

this seems like what Jason asks. A type of hwpt can have two kinds
of invalidation data types. Is it really possible?

> With this being asked, I added it for our next version. At this
> moment, it only does a sanity job:
> 
> // API
> __iommu_copy_struct_from_user(void *dst_data,
> 			      const struct iommu_user_data *src_data,
> 			      unsigned int data_type, size_t data_len,
> 			      size_t min_len)
> {
> 	if (src_data->type != data_type)
> 		return -EINVAL;
> 
> // Caller
> 	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
> 					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
> 	if (rc)
> 		return ERR_PTR(rc);
> 
> Thanks
> Nic

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-16 11:57         ` Jason Gunthorpe
@ 2023-10-17  8:52           ` Tian, Kevin
  2023-10-17 15:53             ` Jason Gunthorpe
  2023-10-18  2:32             ` Baolu Lu
  2023-10-17  9:05           ` Tian, Kevin
  1 sibling, 2 replies; 106+ messages in thread
From: Tian, Kevin @ 2023-10-17  8:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, October 16, 2023 7:58 PM
> 
> On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, October 14, 2023 8:45 AM
> > >
> > > On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > > >
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > >
> > > > > Now enforce_cache_coherency and msi_cookie are kernel-managed
> hwpt
> > > > > things.
> > > > > So, they should be only setup on kernel-managed domains. If the
> > > attaching
> > > > > domain is a user-managed domain, redirect the hwpt to hwpt->parent
> to
> > > do
> > > > > it correctly.
> > > > >
> > > >
> > > > No redirection. The parent should already have the configuration done
> > > > when it's created. It shouldn't be triggered in the nesting path.
> > >
> > > iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> > > but also in hwpt_attach/replace() if cc is not enforced by the
> > > alloc() because the idev that initiates the hwpt_alloc() might
> > > not have idev->enforce_cache_coherency. Only when another idev
> > > that has idev->enforce_cache_coherency attaches to the shared
> > > hwpt, the cc configuration would be done.
> >
> > is this a bug already? If the 1st device doesn't have enforce_cc in its
> > iommu, setting the snp bit in the hwpt would lead to reserved
> > bit violation.
> 
> I suspect there are technically some gaps in the intel driver, yes..

double checked. intel driver is doing right thing now:

intel_iommu_enforce_cache_coherency()
  domain_support_force_snooping()

static bool domain_support_force_snooping(struct dmar_domain *domain)
{
	struct device_domain_info *info;
	bool support = true;

	assert_spin_locked(&domain->lock);
	list_for_each_entry(info, &domain->devices, link) {
		if (!ecap_sc_support(info->iommu->ecap)) {
			support = false;
			break;
		}
	}

	return support;
}

> 
> > another problem is that intel_iommu_enforce_cache_coherency()
> > doesn't update existing entries. It only sets a domain flag to affect
> > future mappings. so it means the 2nd idev is also broken.
> 
> This is such a gap, intel driver should not permit that.

yes. @Baolu, can you add a fix?

> 
> > The simplest option is to follow vfio type1 i.e. don't mix devices
> > with different enforce_cc in one domain.
> 
> This is why I wanted to get rid of this bad mechanism going forward.
> 
> Manually created hwpt should have a manual specification of cc and
> then we don't have so many problems.
> 
> It means userspace needs to compute if they want to use CC or not, but
> userspace already needs to figure this out since without autodomains
> it must create two hwpts manually anyhow.
> 

Now there is no interface reporting enforce_cc to userspace.

Actually the user doesn't need to know enforce_cc. As long as there is
an attach incompatibility error the user has to create a 2nd hwpt for
the to-be-attached device then enforce_cc will be handled automatically
in hwpt_alloc.

I prefer to removing enforce_cc in attach fn completely then no parent
trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
figure out the attaching compatibility:

  - attaching a idev with matching enforce_cc as hwpt is always allowed.
  - attaching a idev w/o enforce_cc to a hwpt with enforce_cc is rejected.
  - attaching a idev w/ enforce_cc to a hwpt w/o enforce_cc succeeds with
    hwpt continuing to be w/o enforce_cc. No promotion.

above has been guaranteed by intel iommu driver.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-16 18:44               ` Nicolin Chen
@ 2023-10-17  8:55                 ` Yi Liu
  2023-10-17 15:50                   ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Yi Liu @ 2023-10-17  8:55 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, zhenzhong.duan,
	joao.m.martins

On 2023/10/17 02:44, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
>>> Current nesting series actually extends HWPT_ALLOC ioctl to accept user
>>> data for allocating domain with vendor specific data. Nested translation
>>> happens to be the usage of it. But nesting requires invalidation. If we
>>> want to do further split, then this new series would be just "extending
>>> HWPT_ALLOC to accept vendor specific data from userspace". But it will
>>> lack of a user if nesting is separated. Is this acceptable? @Jason
>>
>> I'd still like to include the nesting allocation and attach parts
>> though, even if they are not usable without invalidation ..
> 
> This is the latest series that I reworked (in bottom-up order):
>   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
>   iommufd: Add IOMMU_HWPT_INVALIDATE
>   iommufd: Add a nested HW pagetable object
>   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
>   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
>   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
>   iommufd/device: Add helpers to enforce/remove device reserved regions
>   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
>   iommu: Pass in parent domain with user_data to domain_alloc_user op

following Jason's comment, it looks like we can just split the cache
invalidation path out. Then the above looks good after removing
"iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
Is it? @Jason

> Perhaps we can have a preparatory series to merge first:
>   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
>   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
>   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
>   iommufd/device: Add helpers to enforce/remove device reserved regions
> 
> Then next cycle would be basically 4 patches + selftests:
>   iommufd: Add IOMMU_HWPT_INVALIDATE
>   iommufd: Add a nested HW pagetable object
>   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
>   iommu: Pass in parent domain with user_data to domain_alloc_user op
> 
> The preparatory series doesn't involve functional changes yet have
> a good amount of pieces to simplify the "nested HW pagetable" that
> is basically nested_alloc/abort/destroy.
> >>> BTW. Do we still have unsolved issue on the invalidation path?
>>
>> I'm not sure, there were so many different versions of it we need to
>> go back over it and check the dirver implementations again
> 
> Only this v4 has the latest array-based invalidation design. And
> it should be straightforward for drivers to define entry/request
> structures. It might be a bit rush to review/finalize it at the
> stage of rc6 though.

yes, before v4, the cache invalidation path is simple and vendor
drivers have their own handling.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-16 11:57         ` Jason Gunthorpe
  2023-10-17  8:52           ` Tian, Kevin
@ 2023-10-17  9:05           ` Tian, Kevin
  1 sibling, 0 replies; 106+ messages in thread
From: Tian, Kevin @ 2023-10-17  9:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Tian, Kevin
> Sent: Tuesday, October 17, 2023 4:53 PM
> >
> > This is why I wanted to get rid of this bad mechanism going forward.
> >
> > Manually created hwpt should have a manual specification of cc and
> > then we don't have so many problems.
> >

Actually I don't see much difference between manual specification of
cc vs. indirect specification of cc via specified idev in hwpt_alloc. What
really matters is how we want to deal with the compatibility in the
following attach path.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-17  8:51                     ` Yi Liu
@ 2023-10-17  9:28                       ` Tian, Kevin
  2023-10-18  6:12                         ` Yi Liu
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-17  9:28 UTC (permalink / raw)
  To: Liu, Yi L, Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, October 17, 2023 4:52 PM
> 
> On 2023/10/17 02:17, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> >> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> >>> On 2023/10/14 01:56, Nicolin Chen wrote:
> >>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> >>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> >>>>>
> >>>>>> not really. Below the users of the struct iommu_user_data in my
> current
> >>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as
> there
> >>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
> >>>>>> make the change you suggested, just not sure if it is good to add type
> >>>>>> as it is only needed by one path.
> >>>>>
> >>>>> I don't think we should ever have an opaque data blob without a type
> >>>>> tag..
> >>>>
> >>>> I can add those "missing" data types, and then a driver will be
> >>>> responsible for sanitizing the type along with the data_len.
> >>>>
> >>>> I notice that the enum iommu_hwpt_data_type in the posted patch
> >>>> is confined to the alloc_user uAPI. Perhaps we should share it
> >>>> with invalidate too:
> >>>
> >>> invalidation path does not need a type field today as the data
> >>> type is vendor specific, vendor driver should know the data type
> >>> when calls in.
> >>
> >> I'm not keen on that, what if a driver needs another type in the
> >> future?  You'd want to make the invalidation data format part of the
> >> domain allocation?
> >
> > The invalidation data has hwpt_id so it's tied to a hwpt and its
> > hwpt->domain. Would it be reasonable to have a different type of
> > invalidation data for the same type of hwpt?
> 
> this seems like what Jason asks. A type of hwpt can have two kinds
> of invalidation data types. Is it really possible?
> 

e.g. vhost-iommu may want its own vendor-agnostic format from
previous discussion...

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-17  8:55                 ` Yi Liu
@ 2023-10-17 15:50                   ` Jason Gunthorpe
  2023-10-17 19:32                     ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:50 UTC (permalink / raw)
  To: Yi Liu
  Cc: Nicolin Chen, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
> On 2023/10/17 02:44, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > > > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > > > data for allocating domain with vendor specific data. Nested translation
> > > > happens to be the usage of it. But nesting requires invalidation. If we
> > > > want to do further split, then this new series would be just "extending
> > > > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > > > lack of a user if nesting is separated. Is this acceptable? @Jason
> > > 
> > > I'd still like to include the nesting allocation and attach parts
> > > though, even if they are not usable without invalidation ..
> > 
> > This is the latest series that I reworked (in bottom-up order):
> >   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
> >   iommufd: Add IOMMU_HWPT_INVALIDATE
> >   iommufd: Add a nested HW pagetable object
> >   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
> >   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
> >   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
> >   iommufd/device: Add helpers to enforce/remove device reserved regions
> >   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
> >   iommu: Pass in parent domain with user_data to domain_alloc_user op
> 
> following Jason's comment, it looks like we can just split the cache
> invalidation path out. Then the above looks good after removing
> "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
> callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
> Is it? @Jason

If it can make sense, sure. It would be nice to be finished with the
alloc path

> > Only this v4 has the latest array-based invalidation design. And
> > it should be straightforward for drivers to define entry/request
> > structures. It might be a bit rush to review/finalize it at the
> > stage of rc6 though.
> 
> yes, before v4, the cache invalidation path is simple and vendor
> drivers have their own handling.

Have driver implementations of v4 been done to look at?

Jason 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-17  8:52           ` Tian, Kevin
@ 2023-10-17 15:53             ` Jason Gunthorpe
  2023-10-17 19:58               ` Nicolin Chen
  2023-10-18  2:32             ` Baolu Lu
  1 sibling, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:53 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:

> > This is why I wanted to get rid of this bad mechanism going forward.
> > 
> > Manually created hwpt should have a manual specification of cc and
> > then we don't have so many problems.
> > 
> > It means userspace needs to compute if they want to use CC or not, but
> > userspace already needs to figure this out since without autodomains
> > it must create two hwpts manually anyhow.
> > 
> 
> Now there is no interface reporting enforce_cc to userspace.

Yes..

> Actually the user doesn't need to know enforce_cc. As long as there is
> an attach incompatibility error the user has to create a 2nd hwpt for
> the to-be-attached device then enforce_cc will be handled automatically
> in hwpt_alloc.

Uh, OK we can do that.. It is a bit hacky because we don't know the
order do
 
> I prefer to removing enforce_cc in attach fn completely then no parent
> trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> figure out the attaching compatibility:

You are basically saying to set the cc mode during creation because we
know the idev at that moment and can tell if it should be on/off?

It seems reasonable, but I can't remember why it is in the attach path
at the moment.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  2023-10-17 15:50                   ` Jason Gunthorpe
@ 2023-10-17 19:32                     ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-17 19:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Tue, Oct 17, 2023 at 12:50:11PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
> > On 2023/10/17 02:44, Nicolin Chen wrote:
> > > On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > > > > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > > > > data for allocating domain with vendor specific data. Nested translation
> > > > > happens to be the usage of it. But nesting requires invalidation. If we
> > > > > want to do further split, then this new series would be just "extending
> > > > > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > > > > lack of a user if nesting is separated. Is this acceptable? @Jason
> > > > 
> > > > I'd still like to include the nesting allocation and attach parts
> > > > though, even if they are not usable without invalidation ..
> > > 
> > > This is the latest series that I reworked (in bottom-up order):
> > >   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
> > >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > >   iommufd: Add a nested HW pagetable object
> > >   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
> > >   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
> > >   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
> > >   iommufd/device: Add helpers to enforce/remove device reserved regions
> > >   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
> > >   iommu: Pass in parent domain with user_data to domain_alloc_user op
> > 
> > following Jason's comment, it looks like we can just split the cache
> > invalidation path out. Then the above looks good after removing
> > "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
> > callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
> > Is it? @Jason
> 
> If it can make sense, sure. It would be nice to be finished with the
> alloc path

I can do the split today. Shall we have a domain_alloc_user op
in VT-d driver? Can we accept a core series only? I understood
it's better to have though...

> > > Only this v4 has the latest array-based invalidation design. And
> > > it should be straightforward for drivers to define entry/request
> > > structures. It might be a bit rush to review/finalize it at the
> > > stage of rc6 though.
> > 
> > yes, before v4, the cache invalidation path is simple and vendor
> > drivers have their own handling.
> 
> Have driver implementations of v4 been done to look at?

I think so:
https://lore.kernel.org/linux-iommu/20230921075431.125239-10-yi.l.liu@intel.com/

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-17 15:53             ` Jason Gunthorpe
@ 2023-10-17 19:58               ` Nicolin Chen
  2023-10-18 16:51                 ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-17 19:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > I prefer to removing enforce_cc in attach fn completely then no parent
> > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> > figure out the attaching compatibility:
> 
> You are basically saying to set the cc mode during creation because we
> know the idev at that moment and can tell if it should be on/off?
> 
> It seems reasonable, but I can't remember why it is in the attach path
> at the moment.

This was the commit adding it in the alloc path:
https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-44eeda11b45e@arm.com/

The older code was doing a hwpt "upgrade" from !cc to cc:
-       /*
-        * Try to upgrade the domain we have, it is an iommu driver bug to
-        * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
-        * enforce_cache_coherency when there are no devices attached to the
-        * domain.
-        */
-       if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
-               if (hwpt->domain->ops->enforce_cache_coherency)
-                       hwpt->enforce_cache_coherency =
-                               hwpt->domain->ops->enforce_cache_coherency(
-                                       hwpt->domain);

If we remove the enforce_cc call in the attach path and let the
driver decide whether to enforce or reject in attach_dev calls,
there seems to be no point in tracking an enforce_cache_coherency
flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
extension check in the vfio-compat pathway?

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-17  8:52           ` Tian, Kevin
  2023-10-17 15:53             ` Jason Gunthorpe
@ 2023-10-18  2:32             ` Baolu Lu
  1 sibling, 0 replies; 106+ messages in thread
From: Baolu Lu @ 2023-10-18  2:32 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, Nicolin Chen, Liu, Yi L, joro, alex.williamson,
	robin.murphy, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On 10/17/23 4:52 PM, Tian, Kevin wrote:
>>> another problem is that intel_iommu_enforce_cache_coherency()
>>> doesn't update existing entries. It only sets a domain flag to affect
>>> future mappings. so it means the 2nd idev is also broken.
>> This is such a gap, intel driver should not permit that.
> yes. @Baolu, can you add a fix?

Sure thing!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-17  9:28                       ` Tian, Kevin
@ 2023-10-18  6:12                         ` Yi Liu
  0 siblings, 0 replies; 106+ messages in thread
From: Yi Liu @ 2023-10-18  6:12 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

On 2023/10/17 17:28, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Tuesday, October 17, 2023 4:52 PM
>>
>> On 2023/10/17 02:17, Nicolin Chen wrote:
>>> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
>>>> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
>>>>> On 2023/10/14 01:56, Nicolin Chen wrote:
>>>>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>>>>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>>>>>>
>>>>>>>> not really. Below the users of the struct iommu_user_data in my
>> current
>>>>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as
>> there
>>>>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>>>>>>> make the change you suggested, just not sure if it is good to add type
>>>>>>>> as it is only needed by one path.
>>>>>>>
>>>>>>> I don't think we should ever have an opaque data blob without a type
>>>>>>> tag..
>>>>>>
>>>>>> I can add those "missing" data types, and then a driver will be
>>>>>> responsible for sanitizing the type along with the data_len.
>>>>>>
>>>>>> I notice that the enum iommu_hwpt_data_type in the posted patch
>>>>>> is confined to the alloc_user uAPI. Perhaps we should share it
>>>>>> with invalidate too:
>>>>>
>>>>> invalidation path does not need a type field today as the data
>>>>> type is vendor specific, vendor driver should know the data type
>>>>> when calls in.
>>>>
>>>> I'm not keen on that, what if a driver needs another type in the
>>>> future?  You'd want to make the invalidation data format part of the
>>>> domain allocation?
>>>
>>> The invalidation data has hwpt_id so it's tied to a hwpt and its
>>> hwpt->domain. Would it be reasonable to have a different type of
>>> invalidation data for the same type of hwpt?
>>
>> this seems like what Jason asks. A type of hwpt can have two kinds
>> of invalidation data types. Is it really possible?
>>
> 
> e.g. vhost-iommu may want its own vendor-agnostic format from
> previous discussion...

ok. So in future, if there is vendor-agnostic format cache invalidation
data structure, then each existing hwpt types would have two cache
invalidation types. is it?

So it is required to make the invalidation data format part of the
domain allocation. Perhaps we can add it later?

Regards,
Yi Liu

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-16 18:17                   ` Nicolin Chen
  2023-10-17  8:51                     ` Yi Liu
@ 2023-10-18 16:37                     ` Jason Gunthorpe
  2023-10-18 16:51                       ` Nicolin Chen
  1 sibling, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 16:37 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > > > 
> > > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > > make the change you suggested, just not sure if it is good to add type
> > > > > > as it is only needed by one path.
> > > > > 
> > > > > I don't think we should ever have an opaque data blob without a type
> > > > > tag..
> > > > 
> > > > I can add those "missing" data types, and then a driver will be
> > > > responsible for sanitizing the type along with the data_len.
> > > > 
> > > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > > with invalidate too:
> > > 
> > > invalidation path does not need a type field today as the data
> > > type is vendor specific, vendor driver should know the data type
> > > when calls in.
> > 
> > I'm not keen on that, what if a driver needs another type in the
> > future?  You'd want to make the invalidation data format part of the
> > domain allocation?
> 
> The invalidation data has hwpt_id so it's tied to a hwpt and its 
> hwpt->domain. Would it be reasonable to have a different type of
> invalidation data for the same type of hwpt?

Yeah, maybe? Go down the road 10 years and we might have SMMUv3
invalidation format v1 and v2 or something?

Like we don't know what the HW side will do, they might extend the
command queue to have bigger commands and negotiate with the driver if
the bigger/smaller format is used. We've done that in our HW a couple
of times now.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op
  2023-10-18 16:37                     ` Jason Gunthorpe
@ 2023-10-18 16:51                       ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-18 16:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan, joao.m.martins

On Wed, Oct 18, 2023 at 01:37:20PM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > > > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > > > > 
> > > > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > > > make the change you suggested, just not sure if it is good to add type
> > > > > > > as it is only needed by one path.
> > > > > > 
> > > > > > I don't think we should ever have an opaque data blob without a type
> > > > > > tag..
> > > > > 
> > > > > I can add those "missing" data types, and then a driver will be
> > > > > responsible for sanitizing the type along with the data_len.
> > > > > 
> > > > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > > > with invalidate too:
> > > > 
> > > > invalidation path does not need a type field today as the data
> > > > type is vendor specific, vendor driver should know the data type
> > > > when calls in.
> > > 
> > > I'm not keen on that, what if a driver needs another type in the
> > > future?  You'd want to make the invalidation data format part of the
> > > domain allocation?
> > 
> > The invalidation data has hwpt_id so it's tied to a hwpt and its 
> > hwpt->domain. Would it be reasonable to have a different type of
> > invalidation data for the same type of hwpt?
> 
> Yeah, maybe? Go down the road 10 years and we might have SMMUv3
> invalidation format v1 and v2 or something?
> 
> Like we don't know what the HW side will do, they might extend the
> command queue to have bigger commands and negotiate with the driver if
> the bigger/smaller format is used. We've done that in our HW a couple
> of times now.

I see. We'll have the type. Thanks!

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-17 19:58               ` Nicolin Chen
@ 2023-10-18 16:51                 ` Jason Gunthorpe
  2023-10-19  1:56                   ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 16:51 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
> On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > > I prefer to removing enforce_cc in attach fn completely then no parent
> > > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> > > figure out the attaching compatibility:
> > 
> > You are basically saying to set the cc mode during creation because we
> > know the idev at that moment and can tell if it should be on/off?
> > 
> > It seems reasonable, but I can't remember why it is in the attach path
> > at the moment.
> 
> This was the commit adding it in the alloc path:
> https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-44eeda11b45e@arm.com/
> 
> The older code was doing a hwpt "upgrade" from !cc to cc:
> -       /*
> -        * Try to upgrade the domain we have, it is an iommu driver bug to
> -        * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
> -        * enforce_cache_coherency when there are no devices attached to the
> -        * domain.
> -        */
> -       if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
> -               if (hwpt->domain->ops->enforce_cache_coherency)
> -                       hwpt->enforce_cache_coherency =
> -                               hwpt->domain->ops->enforce_cache_coherency(
> -                                       hwpt->domain);
> 
> If we remove the enforce_cc call in the attach path and let the
> driver decide whether to enforce or reject in attach_dev calls,
> there seems to be no point in tracking an enforce_cache_coherency
> flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
> extension check in the vfio-compat pathway?

I think the purpose of this code is to try to minimize the number of
domains.

Otherwise we have a problem where the order devices are attached to
the domain decides how many domains you get. ie the first device
attached does not want CC (but is compatible with it) so we create a
non-CC domain

Then later we attach a device that does want CC and now we are forced
to create a second iommu domain when upgrading the first domain would
have been fine.

Hotplug is another scenario (though Intel driver does not support it,
and it looks broken)

Really, I hate this CC mechanism. It is only for Intel, can we just
punt it to userspace and have an intel 'want cc flag' for the entire
nesting path and forget about it?

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-18 16:51                 ` Jason Gunthorpe
@ 2023-10-19  1:56                   ` Tian, Kevin
  2023-10-19 23:53                     ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-19  1:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, October 19, 2023 12:51 AM
> 
> On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
> > On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > > > I prefer to removing enforce_cc in attach fn completely then no parent
> > > > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver
> to
> > > > figure out the attaching compatibility:
> > >
> > > You are basically saying to set the cc mode during creation because we
> > > know the idev at that moment and can tell if it should be on/off?
> > >
> > > It seems reasonable, but I can't remember why it is in the attach path
> > > at the moment.
> >
> > This was the commit adding it in the alloc path:
> > https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-
> 44eeda11b45e@arm.com/
> >
> > The older code was doing a hwpt "upgrade" from !cc to cc:
> > -       /*
> > -        * Try to upgrade the domain we have, it is an iommu driver bug to
> > -        * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
> > -        * enforce_cache_coherency when there are no devices attached to
> the
> > -        * domain.
> > -        */
> > -       if (idev->enforce_cache_coherency && !hwpt-
> >enforce_cache_coherency) {
> > -               if (hwpt->domain->ops->enforce_cache_coherency)
> > -                       hwpt->enforce_cache_coherency =
> > -                               hwpt->domain->ops->enforce_cache_coherency(
> > -                                       hwpt->domain);
> >
> > If we remove the enforce_cc call in the attach path and let the
> > driver decide whether to enforce or reject in attach_dev calls,
> > there seems to be no point in tracking an enforce_cache_coherency
> > flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
> > extension check in the vfio-compat pathway?
> 
> I think the purpose of this code is to try to minimize the number of
> domains.
> 
> Otherwise we have a problem where the order devices are attached to
> the domain decides how many domains you get. ie the first device
> attached does not want CC (but is compatible with it) so we create a
> non-CC domain

in autodetect model this won't happen. If the first device is capable
of enforce_cc then the domain will be created with enforce_cc.

there is no "does not want CC" input in autodetect.

> 
> Then later we attach a device that does want CC and now we are forced
> to create a second iommu domain when upgrading the first domain would
> have been fine.

then in this case the 2nd device will reuse the domain.

> 
> Hotplug is another scenario (though Intel driver does not support it,
> and it looks broken)

Can you elaborate how hotplug is broken? If device is hotplugged and
failed to attach to an existing domain, then a new one will be created
for it.

there is indeed a broken case in KVM which cannot handle dynamic
change of coherency due to hotplug. But that one is orthogonal to
what we discussed here about how to decide cc in iommufd.

> 
> Really, I hate this CC mechanism. It is only for Intel, can we just

Intel and AMD.

> punt it to userspace and have an intel 'want cc flag' for the entire
> nesting path and forget about it?
> 

I dislike it too. But still not get your point why adding such a flag
can really simplify things. As explained before the only difference
between autodetect and having a user flag just affects the decision
of cc when creating a hwpt. whether we should upgrade in the
attach path is an orthogonal open which imho is unnecessary and
Nicoline's patches to remove that check then also remove this
patch makes lot of sense to me.

Then the necessity of introducing such a flag (plus adding a new
interface to report cc to user space) is not obvious.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-19  1:56                   ` Tian, Kevin
@ 2023-10-19 23:53                     ` Jason Gunthorpe
  2023-10-20  2:43                       ` Tian, Kevin
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-19 23:53 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:

> > Otherwise we have a problem where the order devices are attached to
> > the domain decides how many domains you get. ie the first device
> > attached does not want CC (but is compatible with it) so we create a
> > non-CC domain
> 
> in autodetect model this won't happen. If the first device is capable
> of enforce_cc then the domain will be created with enforce_cc.
> 
> there is no "does not want CC" input in autodetect.
> > 
> > Then later we attach a device that does want CC and now we are forced
> > to create a second iommu domain when upgrading the first domain would
> > have been fine.
> 
> then in this case the 2nd device will reuse the domain.

Then you have the reverse problem where the domain will not be CC when
it should be.

> > Hotplug is another scenario (though Intel driver does not support it,
> > and it looks broken)
> 
> Can you elaborate how hotplug is broken? If device is hotplugged and
> failed to attach to an existing domain, then a new one will be created
> for it.

A non-cc domain will ask to be upgraded and the driver will let it
happen even though it doesn't/can't fix the existing IOPTEs

> there is indeed a broken case in KVM which cannot handle dynamic
> change of coherency due to hotplug. But that one is orthogonal to
> what we discussed here about how to decide cc in iommufd.

That too
  
> > Really, I hate this CC mechanism. It is only for Intel, can we just
> 
> Intel and AMD.

Nope, AMD just hardwires their IOMMU to always do CC enforcing. All
this mess is only for Intel and their weird IOMMU that can only do the
enforcement for a GPU.

> > punt it to userspace and have an intel 'want cc flag' for the entire
> > nesting path and forget about it?
> 
> I dislike it too. But still not get your point why adding such a flag
> can really simplify things. As explained before the only difference
> between autodetect and having a user flag just affects the decision
> of cc when creating a hwpt. whether we should upgrade in the
> attach path is an orthogonal open which imho is unnecessary and
> Nicoline's patches to remove that check then also remove this
> patch makes lot of sense to me.

I don't think we can remove it, it is supposed to provide consistency
of result regardless of ordering.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-19 23:53                     ` Jason Gunthorpe
@ 2023-10-20  2:43                       ` Tian, Kevin
  2023-10-20 13:55                         ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-20  2:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, October 20, 2023 7:54 AM
> 
> On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:
> 
> > > Otherwise we have a problem where the order devices are attached to
> > > the domain decides how many domains you get. ie the first device
> > > attached does not want CC (but is compatible with it) so we create a
> > > non-CC domain
> >
> > in autodetect model this won't happen. If the first device is capable
> > of enforce_cc then the domain will be created with enforce_cc.
> >
> > there is no "does not want CC" input in autodetect.
> > >
> > > Then later we attach a device that does want CC and now we are forced
> > > to create a second iommu domain when upgrading the first domain
> would
> > > have been fine.
> >
> > then in this case the 2nd device will reuse the domain.
> 
> Then you have the reverse problem where the domain will not be CC when
> it should be.

If the domain has been non-CC it's perfectly fine for the 2nd device with CC
to reuse it. As long as there is one domain with non-CC then KVM has to
have special treatment on wbinvd. In this case there is actually a benefit
to use just one non-CC domain for all devices (either non-CC or CC).

What we want to prevent is attaching a non-CC device to a CC domain
or upgrade a non-CC domain  to CC since in both case the non-CC device
will be broken due to incompatible page table format.

> 
> > > Hotplug is another scenario (though Intel driver does not support it,
> > > and it looks broken)
> >
> > Can you elaborate how hotplug is broken? If device is hotplugged and
> > failed to attach to an existing domain, then a new one will be created
> > for it.
> 
> A non-cc domain will ask to be upgraded and the driver will let it
> happen even though it doesn't/can't fix the existing IOPTEs

iommufd should not ask for upgrade at all. The CC attribute of domain
should be fixed since creation time.

Baolu will fix the intel-iommu driver accordingly.

> 
> > there is indeed a broken case in KVM which cannot handle dynamic
> > change of coherency due to hotplug. But that one is orthogonal to
> > what we discussed here about how to decide cc in iommufd.
> 
> That too
> 
> > > Really, I hate this CC mechanism. It is only for Intel, can we just
> >
> > Intel and AMD.
> 
> Nope, AMD just hardwires their IOMMU to always do CC enforcing. All
> this mess is only for Intel and their weird IOMMU that can only do the
> enforcement for a GPU.
> 
> > > punt it to userspace and have an intel 'want cc flag' for the entire
> > > nesting path and forget about it?
> >
> > I dislike it too. But still not get your point why adding such a flag
> > can really simplify things. As explained before the only difference
> > between autodetect and having a user flag just affects the decision
> > of cc when creating a hwpt. whether we should upgrade in the
> > attach path is an orthogonal open which imho is unnecessary and
> > Nicoline's patches to remove that check then also remove this
> > patch makes lot of sense to me.
> 
> I don't think we can remove it, it is supposed to provide consistency
> of result regardless of ordering.
> 

Who cares about such consistency? sure the result is different due to order:

1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
    dev2 (CC) will succeed;

2) creating hwpt for dev2 (CC) then later attaching hwpt to
    dev1 (non-CC) will fail then the user should create a new hwpt
    for dev1;

But the user shouldn't assume such explicit consistency since it's not
defined in our uAPI. All we defined is that the attaching may
fail due to incompatibility for whatever reason then the user can
always try creating a new hwpt for the to-be-attached device. From
this regard I don't see providing consistency of result is necessary. 😊
    

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-20  2:43                       ` Tian, Kevin
@ 2023-10-20 13:55                         ` Jason Gunthorpe
  2023-10-20 18:59                           ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-20 13:55 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:

> What we want to prevent is attaching a non-CC device to a CC domain
> or upgrade a non-CC domain to CC since in both case the non-CC
> device will be broken due to incompatible page table format.

[..]

> Who cares about such consistency? sure the result is different due to order:
> 
> 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
>     dev2 (CC) will succeed;
> 
> 2) creating hwpt for dev2 (CC) then later attaching hwpt to
>     dev1 (non-CC) will fail then the user should create a new hwpt
>     for dev1;

AH... So really what the Intel driver wants is not upgrade to CC but
*downgrade* from CC.

non-CC is the type that is universally applicable, so if we come
across a non-CC capable device the proper/optimal thing is to degrade
the HWPT and re-use it, not allocate a new HWPT.

So the whole thing is upside down.

As changing the IOPTEs in flight seems hard, and I don't want to see
the Intel driver get slowed down to accomodate this, I think you are
right to say this should be a creation time property only.

I still think userspace should be able to select it so it can minimize
the number of HWPTs required.

> But the user shouldn't assume such explicit consistency since it's not
> defined in our uAPI. All we defined is that the attaching may
> fail due to incompatibility for whatever reason then the user can
> always try creating a new hwpt for the to-be-attached device. From
> this regard I don't see providing consistency of result is
> necessary. 😊

Anyhow, OK, lets add a comment summarizing your points and remove the
cc upgrade at attach time (sorry Nicolin/Yi!)

It is easy to add a HWPT flag for this later if someone wants to
optimize it.

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-20 13:55                         ` Jason Gunthorpe
@ 2023-10-20 18:59                           ` Nicolin Chen
  2023-10-21 16:38                             ` Jason Gunthorpe
  0 siblings, 1 reply; 106+ messages in thread
From: Nicolin Chen @ 2023-10-20 18:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> 
> > What we want to prevent is attaching a non-CC device to a CC domain
> > or upgrade a non-CC domain to CC since in both case the non-CC
> > device will be broken due to incompatible page table format.
> 
> [..]
> 
> > Who cares about such consistency? sure the result is different due to order:
> > 
> > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> >     dev2 (CC) will succeed;
> > 
> > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> >     dev1 (non-CC) will fail then the user should create a new hwpt
> >     for dev1;
> 
> AH... So really what the Intel driver wants is not upgrade to CC but
> *downgrade* from CC.
> 
> non-CC is the type that is universally applicable, so if we come
> across a non-CC capable device the proper/optimal thing is to degrade
> the HWPT and re-use it, not allocate a new HWPT.
> 
> So the whole thing is upside down.
> 
> As changing the IOPTEs in flight seems hard, and I don't want to see
> the Intel driver get slowed down to accomodate this, I think you are
> right to say this should be a creation time property only.
> 
> I still think userspace should be able to select it so it can minimize
> the number of HWPTs required.
> 
> > But the user shouldn't assume such explicit consistency since it's not
> > defined in our uAPI. All we defined is that the attaching may
> > fail due to incompatibility for whatever reason then the user can
> > always try creating a new hwpt for the to-be-attached device. From
> > this regard I don't see providing consistency of result is
> > necessary. 😊
> 
> Anyhow, OK, lets add a comment summarizing your points and remove the
> cc upgrade at attach time (sorry Nicolin/Yi!)

Ack. I will send a small removal series. I assume it should CC
stable tree also? And where should we add this comment? Kdoc of
the alloc uAPI?

Thanks!
Nicolin

> It is easy to add a HWPT flag for this later if someone wants to
> optimize it.
> 
> Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-20 18:59                           ` Nicolin Chen
@ 2023-10-21 16:38                             ` Jason Gunthorpe
  2023-10-23  0:18                               ` Nicolin Chen
  0 siblings, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-21 16:38 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > 
> > > What we want to prevent is attaching a non-CC device to a CC domain
> > > or upgrade a non-CC domain to CC since in both case the non-CC
> > > device will be broken due to incompatible page table format.
> > 
> > [..]
> > 
> > > Who cares about such consistency? sure the result is different due to order:
> > > 
> > > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> > >     dev2 (CC) will succeed;
> > > 
> > > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> > >     dev1 (non-CC) will fail then the user should create a new hwpt
> > >     for dev1;
> > 
> > AH... So really what the Intel driver wants is not upgrade to CC but
> > *downgrade* from CC.
> > 
> > non-CC is the type that is universally applicable, so if we come
> > across a non-CC capable device the proper/optimal thing is to degrade
> > the HWPT and re-use it, not allocate a new HWPT.
> > 
> > So the whole thing is upside down.
> > 
> > As changing the IOPTEs in flight seems hard, and I don't want to see
> > the Intel driver get slowed down to accomodate this, I think you are
> > right to say this should be a creation time property only.
> > 
> > I still think userspace should be able to select it so it can minimize
> > the number of HWPTs required.
> > 
> > > But the user shouldn't assume such explicit consistency since it's not
> > > defined in our uAPI. All we defined is that the attaching may
> > > fail due to incompatibility for whatever reason then the user can
> > > always try creating a new hwpt for the to-be-attached device. From
> > > this regard I don't see providing consistency of result is
> > > necessary. 😊
> > 
> > Anyhow, OK, lets add a comment summarizing your points and remove the
> > cc upgrade at attach time (sorry Nicolin/Yi!)
> 
> Ack. I will send a small removal series. I assume it should CC
> stable tree also? 

No, it seems more like tidying that fixing a functional issue, do I
misunderstand?

> And where should we add this comment? Kdoc of
> the alloc uAPI?

Maybe right in front of the only enforce_cc op callback?

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-21 16:38                             ` Jason Gunthorpe
@ 2023-10-23  0:18                               ` Nicolin Chen
  2023-10-23  2:53                                 ` Tian, Kevin
  2023-10-23 13:59                                 ` Jason Gunthorpe
  0 siblings, 2 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-23  0:18 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > > 
> > > > What we want to prevent is attaching a non-CC device to a CC domain
> > > > or upgrade a non-CC domain to CC since in both case the non-CC
> > > > device will be broken due to incompatible page table format.
> > > 
> > > [..]
> > > 
> > > > Who cares about such consistency? sure the result is different due to order:
> > > > 
> > > > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> > > >     dev2 (CC) will succeed;
> > > > 
> > > > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> > > >     dev1 (non-CC) will fail then the user should create a new hwpt
> > > >     for dev1;
> > > 
> > > AH... So really what the Intel driver wants is not upgrade to CC but
> > > *downgrade* from CC.
> > > 
> > > non-CC is the type that is universally applicable, so if we come
> > > across a non-CC capable device the proper/optimal thing is to degrade
> > > the HWPT and re-use it, not allocate a new HWPT.
> > > 
> > > So the whole thing is upside down.
> > > 
> > > As changing the IOPTEs in flight seems hard, and I don't want to see
> > > the Intel driver get slowed down to accomodate this, I think you are
> > > right to say this should be a creation time property only.
> > > 
> > > I still think userspace should be able to select it so it can minimize
> > > the number of HWPTs required.
> > > 
> > > > But the user shouldn't assume such explicit consistency since it's not
> > > > defined in our uAPI. All we defined is that the attaching may
> > > > fail due to incompatibility for whatever reason then the user can
> > > > always try creating a new hwpt for the to-be-attached device. From
> > > > this regard I don't see providing consistency of result is
> > > > necessary. 😊
> > > 
> > > Anyhow, OK, lets add a comment summarizing your points and remove the
> > > cc upgrade at attach time (sorry Nicolin/Yi!)
> > 
> > Ack. I will send a small removal series. I assume it should CC
> > stable tree also? 
> 
> No, it seems more like tidying that fixing a functional issue, do I
> misunderstand?

Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
it was already a bug and you answered yes:
https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/

If this shouldn't be a bug fix, I could just merge them into a
single tidying patch and add the comments you suggested below.

> > And where should we add this comment? Kdoc of
> > the alloc uAPI?
> 
> Maybe right in front of the only enforce_cc op callback?

OK. How about this? Might be a bit verbose though:
	/*
	 * enforce_cache_coherenc must be determined during the HWPT allocation.
	 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
	 * reused by another device (either non-CC or CC). However, A HWPT (CC)
	 * created for a device (CC) cannot be reused by another device (non-CC)
	 * but only devices (CC). Instead user space in this case would need to
	 * allocate a separate HWPT (non-CC).
	 */

Thanks
Nic

^ permalink raw reply	[flat|nested] 106+ messages in thread

* RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-23  0:18                               ` Nicolin Chen
@ 2023-10-23  2:53                                 ` Tian, Kevin
  2023-10-23 18:42                                   ` Nicolin Chen
  2023-10-23 13:59                                 ` Jason Gunthorpe
  1 sibling, 1 reply; 106+ messages in thread
From: Tian, Kevin @ 2023-10-23  2:53 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong, Martins,
	Joao

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, October 23, 2023 8:18 AM
> 
> On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > > >
> > > > > But the user shouldn't assume such explicit consistency since it's not
> > > > > defined in our uAPI. All we defined is that the attaching may
> > > > > fail due to incompatibility for whatever reason then the user can
> > > > > always try creating a new hwpt for the to-be-attached device. From
> > > > > this regard I don't see providing consistency of result is
> > > > > necessary. 😊
> > > >
> > > > Anyhow, OK, lets add a comment summarizing your points and remove
> the
> > > > cc upgrade at attach time (sorry Nicolin/Yi!)
> > >
> > > Ack. I will send a small removal series. I assume it should CC
> > > stable tree also?
> >
> > No, it seems more like tidying that fixing a functional issue, do I
> > misunderstand?
> 
> Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
> it was already a bug and you answered yes:
> https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/
> 

currently intel-iommu driver already rejects 1) enforcing CC on
a domain which is already attached to non-CC device and
2) attaching a non-CC device to a domain which has enforce_cc.

so there is no explorable bug to fix in stable tree.

Just logically intel-iommu driver doesn't support enforcing Cc
on a domain which is capable of enforce-cc and already has
valid mappings. Then it should add proper check to prevent
it from being attempted by future usages.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-23  0:18                               ` Nicolin Chen
  2023-10-23  2:53                                 ` Tian, Kevin
@ 2023-10-23 13:59                                 ` Jason Gunthorpe
  2023-10-23 18:49                                   ` Nicolin Chen
  1 sibling, 1 reply; 106+ messages in thread
From: Jason Gunthorpe @ 2023-10-23 13:59 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:

> > > And where should we add this comment? Kdoc of
> > > the alloc uAPI?
> > 
> > Maybe right in front of the only enforce_cc op callback?
> 
> OK. How about this? Might be a bit verbose though:
> 	/*
> 	 * enforce_cache_coherenc must be determined during the HWPT allocation.
> 	 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
> 	 * reused by another device (either non-CC or CC). However, A HWPT (CC)
> 	 * created for a device (CC) cannot be reused by another device (non-CC)
> 	 * but only devices (CC). Instead user space in this case would need to
> 	 * allocate a separate HWPT (non-CC).
> 	 */

Ok, fix the spello in enforce_cache_coherenc

Jason

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-23  2:53                                 ` Tian, Kevin
@ 2023-10-23 18:42                                   ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-23 18:42 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Mon, Oct 23, 2023 at 02:53:20AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Monday, October 23, 2023 8:18 AM
> >
> > On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > > > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > > > >
> > > > > > But the user shouldn't assume such explicit consistency since it's not
> > > > > > defined in our uAPI. All we defined is that the attaching may
> > > > > > fail due to incompatibility for whatever reason then the user can
> > > > > > always try creating a new hwpt for the to-be-attached device. From
> > > > > > this regard I don't see providing consistency of result is
> > > > > > necessary. 😊
> > > > >
> > > > > Anyhow, OK, lets add a comment summarizing your points and remove
> > the
> > > > > cc upgrade at attach time (sorry Nicolin/Yi!)
> > > >
> > > > Ack. I will send a small removal series. I assume it should CC
> > > > stable tree also?
> > >
> > > No, it seems more like tidying that fixing a functional issue, do I
> > > misunderstand?
> >
> > Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
> > it was already a bug and you answered yes:
> > https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/
> >
> 
> currently intel-iommu driver already rejects 1) enforcing CC on
> a domain which is already attached to non-CC device and
> 2) attaching a non-CC device to a domain which has enforce_cc.
> 
> so there is no explorable bug to fix in stable tree.

I see. Thanks!

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains
  2023-10-23 13:59                                 ` Jason Gunthorpe
@ 2023-10-23 18:49                                   ` Nicolin Chen
  0 siblings, 0 replies; 106+ messages in thread
From: Nicolin Chen @ 2023-10-23 18:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong, Martins, Joao

On Mon, Oct 23, 2023 at 10:59:35AM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:
> 
> > > > And where should we add this comment? Kdoc of
> > > > the alloc uAPI?
> > > 
> > > Maybe right in front of the only enforce_cc op callback?
> > 
> > OK. How about this? Might be a bit verbose though:
> > 	/*
> > 	 * enforce_cache_coherenc must be determined during the HWPT allocation.
> > 	 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
> > 	 * reused by another device (either non-CC or CC). However, A HWPT (CC)
> > 	 * created for a device (CC) cannot be reused by another device (non-CC)
> > 	 * but only devices (CC). Instead user space in this case would need to
> > 	 * allocate a separate HWPT (non-CC).
> > 	 */
> 
> Ok, fix the spello in enforce_cache_coherenc

Oops.

I also found the existing piece sorta says a similar thing:
         * Set the coherency mode before we do iopt_table_add_domain() as some
         * iommus have a per-PTE bit that controls it and need to decide before
         * doing any maps. 

So, did this and sending v3:
-        * enforce_cache_coherenc must be determined during the HWPT allocation.
+        * The cache coherency mode must be configured here and unchanged later.

^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2023-10-23 18:49 UTC | newest]

Thread overview: 106+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21  7:51 [PATCH v4 00/17] iommufd: Add nesting infrastructure Yi Liu
2023-09-21  7:51 ` [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op Yi Liu
2023-09-21 12:10   ` Baolu Lu
2023-09-21 20:58     ` Nicolin Chen
2023-09-25 13:05       ` Jason Gunthorpe
2023-09-25 18:17         ` Nicolin Chen
2023-09-25  6:22     ` Yi Liu
2023-09-25  8:01       ` Baolu Lu
2023-09-25 13:00         ` Jason Gunthorpe
2023-09-21 12:12   ` Baolu Lu
2023-09-21 16:44     ` Jason Gunthorpe
2023-09-22  9:47       ` Robin Murphy
2023-09-25  6:17         ` Yi Liu
2023-09-25 12:59           ` Jason Gunthorpe
2023-09-26  6:37         ` Tian, Kevin
2023-09-26 16:29           ` Jason Gunthorpe
2023-09-27  1:08             ` Tian, Kevin
2023-10-07  9:38               ` Yi Liu
2023-09-26  6:56   ` Tian, Kevin
2023-10-12  9:12     ` Yi Liu
2023-10-13 11:42       ` Yi Liu
2023-10-13 22:22         ` Nicolin Chen
2023-10-10 16:58   ` Jason Gunthorpe
2023-10-12  9:11     ` Yi Liu
2023-10-12 13:39       ` Jason Gunthorpe
2023-10-13  4:33         ` Yi Liu
2023-10-13 14:04           ` Jason Gunthorpe
2023-10-13 17:56             ` Nicolin Chen
2023-10-16  3:28               ` Yi Liu
2023-10-16 11:54                 ` Jason Gunthorpe
2023-10-16 18:17                   ` Nicolin Chen
2023-10-17  8:51                     ` Yi Liu
2023-10-17  9:28                       ` Tian, Kevin
2023-10-18  6:12                         ` Yi Liu
2023-10-18 16:37                     ` Jason Gunthorpe
2023-10-18 16:51                       ` Nicolin Chen
2023-10-13  0:34     ` Nicolin Chen
2023-10-13 14:03       ` Jason Gunthorpe
2023-09-21  7:51 ` [PATCH v4 02/17] iommu: Add nested domain support Yi Liu
2023-10-10 17:21   ` Jason Gunthorpe
2023-10-12  9:13     ` Yi Liu
2023-10-14  0:47       ` Nicolin Chen
2023-09-21  7:51 ` [PATCH v4 03/17] iommufd: Unite all kernel-managed members into a struct Yi Liu
2023-09-26  7:46   ` Tian, Kevin
2023-10-07 10:08   ` Yan Zhao
2023-10-09  4:13     ` Yi Liu
2023-10-09  5:13       ` Yan Zhao
2023-10-10  3:26         ` Yi Liu
2023-09-21  7:51 ` [PATCH v4 04/17] iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc() Yi Liu
2023-09-21  7:51 ` [PATCH v4 05/17] iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions Yi Liu
2023-09-26  7:57   ` Tian, Kevin
2023-10-10 18:49   ` Jason Gunthorpe
2023-10-12 19:09     ` Jason Gunthorpe
2023-10-13  7:13       ` Tian, Kevin
2023-10-13 14:05         ` Jason Gunthorpe
2023-10-16  8:26           ` Tian, Kevin
2023-10-16 12:02             ` Jason Gunthorpe
2023-09-21  7:51 ` [PATCH v4 06/17] iommufd: Add shared alloc_fn function pointer and mutex pointer Yi Liu
2023-09-21  7:51 ` [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support Yi Liu
2023-09-26  8:14   ` Tian, Kevin
2023-10-14  0:08     ` Nicolin Chen
2023-09-21  7:51 ` [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains Yi Liu
2023-09-26  8:16   ` Tian, Kevin
2023-10-14  0:44     ` Nicolin Chen
2023-10-16  8:48       ` Tian, Kevin
2023-10-16 11:57         ` Jason Gunthorpe
2023-10-17  8:52           ` Tian, Kevin
2023-10-17 15:53             ` Jason Gunthorpe
2023-10-17 19:58               ` Nicolin Chen
2023-10-18 16:51                 ` Jason Gunthorpe
2023-10-19  1:56                   ` Tian, Kevin
2023-10-19 23:53                     ` Jason Gunthorpe
2023-10-20  2:43                       ` Tian, Kevin
2023-10-20 13:55                         ` Jason Gunthorpe
2023-10-20 18:59                           ` Nicolin Chen
2023-10-21 16:38                             ` Jason Gunthorpe
2023-10-23  0:18                               ` Nicolin Chen
2023-10-23  2:53                                 ` Tian, Kevin
2023-10-23 18:42                                   ` Nicolin Chen
2023-10-23 13:59                                 ` Jason Gunthorpe
2023-10-23 18:49                                   ` Nicolin Chen
2023-10-18  2:32             ` Baolu Lu
2023-10-17  9:05           ` Tian, Kevin
2023-09-21  7:51 ` [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions Yi Liu
2023-10-07  7:20   ` Yan Zhao
2023-10-07  9:27     ` Nicolin Chen
2023-09-21  7:51 ` [PATCH v4 10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data Yi Liu
2023-10-13 15:19   ` Jason Gunthorpe
2023-10-13 20:58     ` Nicolin Chen
2023-10-14  0:07       ` Jason Gunthorpe
2023-10-14  0:51         ` Nicolin Chen
2023-10-16  7:03           ` Yi Liu
2023-10-16 11:59             ` Jason Gunthorpe
2023-10-16 18:44               ` Nicolin Chen
2023-10-17  8:55                 ` Yi Liu
2023-10-17 15:50                   ` Jason Gunthorpe
2023-10-17 19:32                     ` Nicolin Chen
2023-09-21  7:51 ` [PATCH v4 11/17] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
2023-09-21  7:51 ` [PATCH v4 12/17] iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly Yi Liu
2023-09-21  7:51 ` [PATCH v4 13/17] iommufd/selftest: Add nested domain allocation for mock domain Yi Liu
2023-09-21  7:51 ` [PATCH v4 14/17] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs Yi Liu
2023-09-21  7:51 ` [PATCH v4 15/17] iommufd/selftest: Add mock_domain_cache_invalidate_user support Yi Liu
2023-09-21  7:51 ` [PATCH v4 16/17] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
2023-09-21  7:51 ` [PATCH v4 17/17] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
2023-10-10 16:53 ` [PATCH v4 00/17] iommufd: Add nesting infrastructure Jason Gunthorpe
2023-10-12  8:45   ` Yi Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).