* [PATCH v2 00/11] iommufd: Add nesting infrastructure
@ 2023-05-11 14:38 Yi Liu
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

Nested translation is a hardware feature supported by many modern IOMMUs.
It uses two stages (stage-1 and stage-2) of address translation to reach
the physical address. The stage-1 translation table is owned by userspace
(e.g. by a guest OS), while the stage-2 table is owned by the kernel.
Changes to the stage-1 translation table must be followed by an IOTLB
invalidation.

Take Intel VT-d as an example: the stage-1 translation table is the I/O
page table. As the diagram below shows, the guest I/O page table pointer
in GPA (guest physical address) is passed to the host and used to perform
the stage-1 address translation. Accordingly, modifications to present
mappings in the guest I/O page table must be followed by an IOTLB
invalidation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>

In IOMMUFD, all translation tables are tracked by hw_pagetable (hwpt),
and each has an iommu_domain allocated from the iommu driver. So in this
series, hw_pagetable and iommu_domain mean the same thing unless noted
otherwise. IOMMUFD already supports allocating a hw_pagetable that is
linked with an IOAS. However, nesting requires IOMMUFD to allow allocating
a hw_pagetable with driver-specific parameters, plus an interface to sync
the stage-1 IOTLB, since the user owns the stage-1 translation table.

This series is based on the iommu hw info reporting series [1]. It first
introduces a new iommu op for allocating domains with user data and an op
for syncing the stage-1 IOTLB, then extends the IOMMUFD internal
infrastructure to accept user_data and a parent hwpt and relay them to
the iommu core to allocate the iommu_domain. After that, it extends the
IOMMU_HWPT_ALLOC ioctl to accept user data and a stage-2 hwpt ID when
allocating a hwpt. Along with it, the IOMMU_HWPT_INVALIDATE ioctl is
added to invalidate the stage-1 IOTLB, which is needed for user-managed
hwpts. Selftests are added as well to cover the new ioctls.
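
To make the flow concrete, below is a minimal userspace sketch against the
extended IOMMU_HWPT_ALLOC uAPI. The stage-1 data structure, its hwpt type
value and the helper name are hypothetical placeholders, not part of this
series; a real vendor driver defines its own structure and type value in
include/uapi/linux/iommufd.h:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Hypothetical driver-specific stage-1 data and hwpt type value */
#define HYPOT_HWPT_TYPE_S1	1
struct hypot_s1_data {
	uint64_t flags;
	uint64_t s1_pgtbl_gpa;	/* guest I/O page table pointer in GPA */
};

static int alloc_nested_hwpts(int iommufd, uint32_t dev_id,
			      uint32_t gpa_ioas_id, struct hypot_s1_data *s1,
			      uint32_t *s2_hwpt_id, uint32_t *s1_hwpt_id)
{
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		.pt_id = gpa_ioas_id,	/* stage-2 hwpt from the GPA IOAS */
		.hwpt_type = IOMMU_HWPT_TYPE_DEFAULT,
	};

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	*s2_hwpt_id = cmd.out_hwpt_id;

	/* stage-1 hwpt: @pt_id is the parent (stage-2) hwpt this time */
	cmd.pt_id = *s2_hwpt_id;
	cmd.hwpt_type = HYPOT_HWPT_TYPE_S1;
	cmd.data_len = sizeof(*s1);
	cmd.data_uptr = (uintptr_t)s1;
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	*s1_hwpt_id = cmd.out_hwpt_id;	/* attach the device to this one */
	return 0;
}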

Complete code can be found in [2]; QEMU code can be found in [3].

Lastly, this is joint work with Nicolin Chen and Lu Baolu. Thanks to them
for the help. ^_^ Looking forward to your feedback.

base-commit: cf905391237ded2331388e75adb5afbabeddc852

[1] https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.com/
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%2Bnesting

Change log:

v2:
 - Add union iommu_domain_user_data to include all user data structures to avoid
   passing void * in kernel APIs.
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1

v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Lu Baolu (2):
  iommu: Add new iommu op to create domains owned by userspace
  iommu: Add nested domain support

Nicolin Chen (5):
  iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  iommufd/selftest: Add domain_alloc_user() support in iommu mock
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd: Pass parent hwpt and user_data to
    iommufd_hw_pagetable_alloc()
  iommufd: IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE

 drivers/iommu/iommufd/device.c                |   2 +-
 drivers/iommu/iommufd/hw_pagetable.c          | 191 +++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h       |  16 +-
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 drivers/iommu/iommufd/main.c                  |   5 +-
 drivers/iommu/iommufd/selftest.c              | 119 ++++++++++-
 include/linux/iommu.h                         |  36 ++++
 include/uapi/linux/iommufd.h                  |  58 +++++-
 tools/testing/selftests/iommu/iommufd.c       | 126 +++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  70 +++++++
 10 files changed, 629 insertions(+), 24 deletions(-)

-- 
2.34.1


* [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through iommufd. These domains have a few properties that differ
from kernel-owned domains:

 - They may be UNMANAGED domains, but created with special parameters.
   For instance, aperture size changes, the number of levels, different
   IOPTE formats, or other things necessary to make a vIOMMU work

 - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
   to make the cgroup sandbox stronger

 - Device-specific domains, such as NESTED domains, can be created by
   iommufd.

The new op makes it clear that the domain is being created by IOMMUFD and
is intended for userspace use, and it provides a way to pass a
driver-specific uAPI structure to customize the created domain to exactly
what the vIOMMU userspace driver requires.

iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional
UNMANAGED domain.

This op chooses to make the special parameters opaque to the core. This
suits the current usage model, where accessing any of the IOMMU device
special parameters requires a userspace driver that matches the kernel
driver. If a need for common parameters, implemented similarly by several
drivers, arises, then there is room in the design to grow a generic
parameter set as well.

For now, this new op is only supposed to be used by iommufd, hence there
is no wrapper for it; iommufd calls the callback directly. For domain
free, iommufd uses iommu_domain_free().

Also, add an op to return the length of the supported user data
structures, which must be added to include/uapi/linux/iommufd.h. This
helps the iommufd core sanitize the input data before forwarding it to
an iommu driver.
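
For illustration only, driver-side wiring could look like the sketch
below; all hypot_* names, the type value, and the uAPI structure are
hypothetical placeholders, not part of this patch:

/* Would live in include/uapi/linux/iommufd.h in a real driver */
struct hypot_s1_data {
	__u64 flags;
	__u64 s1_pgtbl;
};
#define HYPOT_HWPT_TYPE_S1	1	/* hypothetical enum iommu_hwpt_type value */

struct hypot_domain {
	struct iommu_domain domain;
	struct hypot_s1_data s1_cfg;
};

static int hypot_domain_alloc_user_data_len(u32 hwpt_type)
{
	if (hwpt_type != HYPOT_HWPT_TYPE_S1)
		return -EOPNOTSUPP;
	return sizeof(struct hypot_s1_data);
}

static struct iommu_domain *
hypot_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
			const union iommu_domain_user_data *user_data)
{
	struct hypot_domain *dom;

	/* GFP_KERNEL_ACCOUNT per the cgroup requirement above */
	dom = kzalloc(sizeof(*dom), GFP_KERNEL_ACCOUNT);
	if (!dom)
		return NULL;
	/* @parent handling (IOMMU_DOMAIN_NESTED) arrives with the next patch */
	dom->domain.type = IOMMU_DOMAIN_UNMANAGED;
	if (user_data)	/* customize per the driver-specific uAPI data */
		dom->s1_cfg = *(const struct hypot_s1_data *)user_data;
	return &dom->domain;
}

static const struct iommu_ops hypot_ops = {
	.domain_alloc_user	    = hypot_domain_alloc_user,
	.domain_alloc_user_data_len = hypot_domain_alloc_user_data_len,
	/* ... the usual ops ... */
};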

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a748d60206e7..7f2046fa53a3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -220,6 +220,15 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/*
+ * The user data to allocate a specific type of user iommu domain
+ *
+ * This includes the corresponding driver data structures in
+ * include/uapi/linux/iommufd.h.
+ */
+union iommu_domain_user_data {
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
  *           after use. Return the data buffer if success, or ERR_PTR on
  *           failure.
  * @domain_alloc: allocate iommu domain
+ * @domain_alloc_user: allocate user iommu domain
+ * @domain_alloc_user_data_len: return the required length of the user data
+ *                              to allocate a specific type of user iommu
+ *                              domain. @hwpt_type is defined as enum
+ *                              iommu_hwpt_type in include/uapi/linux/iommufd.h.
+ *                              The returned length is the sizeof of the
+ *                              corresponding driver data structure defined in
+ *                              include/uapi/linux/iommufd.h. -EOPNOTSUPP is
+ *                              returned for an unsupported @hwpt_type.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -269,6 +287,10 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
+						  struct iommu_domain *parent,
+						  const union iommu_domain_user_data *user_data);
+	int (*domain_alloc_user_data_len)(u32 hwpt_type);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
-- 
2.34.1


* [PATCH v2 02/11] iommu: Add nested domain support
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new domain type for a user I/O page table, which is nested on
top of another address space represented by an UNMANAGED domain. The
mappings of a nested domain are managed by user space software, so
map/unmap callbacks are unnecessary. However, updates to the PTEs in the
nested domain page table must be propagated to the caches on both the
IOMMU (IOTLB) and the devices (DevTLB).

The nested domain is allocated by the domain_alloc_user op, and attached
to the device through the existing iommu_attach_device/group() interfaces.

A new domain op, cache_invalidate_user, is added for userspace to flush
the hardware caches for a nested domain through iommufd. There is no
wrapper for it, as it is only supposed to be used by iommufd.
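
For illustration only, a driver-side sketch of the new op; hypot_* names
and the invalidation structure are hypothetical placeholders, not part of
this patch:

/* Would live in include/uapi/linux/iommufd.h in a real driver */
struct hypot_inv_data {
	__u64 addr;
	__u64 npages;
};

static int hypot_cache_invalidate_user(struct iommu_domain *domain,
				       void *user_data)
{
	struct hypot_inv_data *inv = user_data;

	/*
	 * Flush the IOTLB and DevTLB entries covering the requested range;
	 * hypot_flush_range() stands in for the driver's flush primitive.
	 */
	return hypot_flush_range(domain, inv->addr, inv->npages);
}

static const struct iommu_domain_ops hypot_nested_ops = {
	.attach_dev			= hypot_attach_dev,	/* placeholder */
	.free				= hypot_domain_free,	/* placeholder */
	.cache_invalidate_user		= hypot_cache_invalidate_user,
	.cache_invalidate_user_data_len	= sizeof(struct hypot_inv_data),
};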

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7f2046fa53a3..c2d0fa3e2e18 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -66,6 +66,9 @@ struct iommu_domain_geometry {
 
 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
 
+#define __IOMMU_DOMAIN_NESTED	(1U << 5)  /* User-managed IOVA nested on
+					      a stage-2 translation        */
+
 /*
  * This are the possible domain-types
  *
@@ -91,6 +94,7 @@ struct iommu_domain_geometry {
 				 __IOMMU_DOMAIN_DMA_API |	\
 				 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_NESTED	(__IOMMU_DOMAIN_NESTED)
 
 struct iommu_domain {
 	unsigned type;
@@ -346,6 +350,10 @@ struct iommu_ops {
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
+ * @cache_invalidate_user: Flush hardware TLBs caching user space IO mappings
+ * @cache_invalidate_user_data_len: Defined length of input user data for the
+ *                                  cache_invalidate_user op, equal to the size
+ *                                  of the structure in include/uapi/linux/iommufd.h
  * @iova_to_phys: translate iova to physical address
  * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
  *                           including no-snoop TLPs on PCIe or other platform
@@ -375,6 +383,9 @@ struct iommu_domain_ops {
 			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
+	int (*cache_invalidate_user)(struct iommu_domain *domain,
+				     void *user_data);
+	const size_t cache_invalidate_user_data_len;
 
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    dma_addr_t iova);
-- 
2.34.1


* [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

This makes IOMMUFD use the domain_alloc_user op for iommu_domain
creation, as IOMMUFD needs to support iommu_domain allocation with
parameters from userspace for nesting support. If the iommu driver
doesn't provide the domain_alloc_user callback, it falls back to
iommu_domain_alloc().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index cf2c1504e20d..b6323ad9c32d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -5,6 +5,7 @@
 #include <linux/iommu.h>
 #include <uapi/linux/iommufd.h>
 
+#include "../iommu-priv.h"
 #include "iommufd_private.h"
 
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
@@ -74,6 +75,7 @@ struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach)
 {
+	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
@@ -88,7 +90,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
 
-	hwpt->domain = iommu_domain_alloc(idev->dev->bus);
+	if (ops->domain_alloc_user)
+		hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL);
+	else
+		hwpt->domain = iommu_domain_alloc(idev->dev->bus);
 	if (!hwpt->domain) {
 		rc = -ENOMEM;
 		goto out_abort;
-- 
2.34.1


* [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

Nested translation has stage-1 and stage-2 page tables. A stage-1 page
table is managed by user space, and it needs to work with a stage-2 page
table, which is a parent hwpt for the stage-1 hwpt.

The iommu core already supports accepting a parent iommu_domain and
user_data when allocating an iommu_domain. This makes
iommufd_hw_pagetable_alloc() accept the parent hwpt and user_data, and
relay them to the iommu core, to prepare for supporting hw_pagetable
allocation with user_data.

Also, add a parent pointer to struct iommufd_hw_pagetable for taking and
releasing its refcount.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          |  2 +-
 drivers/iommu/iommufd/hw_pagetable.c    | 34 ++++++++++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  7 ++++-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4541d785bfd8..58c4deb3cb5d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -584,7 +584,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	}
 
 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-					  immediate_attach);
+					  NULL, NULL, immediate_attach);
 	if (IS_ERR(hwpt)) {
 		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index b6323ad9c32d..ec9b140939ce 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -24,6 +24,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	if (hwpt->domain)
 		iommu_domain_free(hwpt->domain);
 
+	if (hwpt->parent)
+		refcount_dec(&hwpt->parent->obj.users);
 	refcount_dec(&hwpt->ioas->obj.users);
 }
 
@@ -61,6 +63,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
+ * @parent: Optional parent HWPT to associate the domain with
+ * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
@@ -73,14 +77,22 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  */
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach)
+			   struct iommufd_device *idev,
+			   struct iommufd_hw_pagetable *parent,
+			   union iommu_domain_user_data *user_data,
+			   bool immediate_attach)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	struct iommu_domain *parent_domain = NULL;
 	struct iommufd_hw_pagetable *hwpt;
+	bool type_unmanaged, type_nested;
 	int rc;
 
 	lockdep_assert_held(&ioas->mutex);
 
+	if ((user_data || parent) && !ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
 	if (IS_ERR(hwpt))
 		return hwpt;
@@ -89,9 +101,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	if (parent) {
+		hwpt->parent = parent;
+		parent_domain = parent->domain;
+		refcount_inc(&parent->obj.users);
+	}
 
 	if (ops->domain_alloc_user)
-		hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL);
+		hwpt->domain = ops->domain_alloc_user(idev->dev,
+						      parent_domain, user_data);
 	else
 		hwpt->domain = iommu_domain_alloc(idev->dev->bus);
 	if (!hwpt->domain) {
@@ -99,6 +117,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
+	/* It must be either NESTED or UNMANAGED, depending on parent_domain */
+	type_nested = hwpt->domain->type == IOMMU_DOMAIN_NESTED;
+	type_unmanaged = hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED;
+	if (WARN_ON((parent_domain && !type_nested) ||
+		    (!parent_domain && !type_unmanaged))) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+
 	/*
 	 * Set the coherency mode before we do iopt_table_add_domain() as some
 	 * iommus have a per-PTE bit that controls it and need to decide before
@@ -160,7 +187,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	}
 
 	mutex_lock(&ioas->mutex);
-	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
+	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
+					  NULL, NULL, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 69d6bb61d387..9fe807e0aed6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -8,6 +8,7 @@
 #include <linux/xarray.h>
 #include <linux/refcount.h>
 #include <linux/uaccess.h>
+#include <linux/iommu.h>
 
 struct iommu_domain;
 struct iommu_group;
@@ -243,6 +244,7 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
+	struct iommufd_hw_pagetable *parent;
 	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
 	bool auto_domain : 1;
@@ -254,7 +256,10 @@ struct iommufd_hw_pagetable {
 
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach);
+			   struct iommufd_device *idev,
+			   struct iommufd_hw_pagetable *parent,
+			   union iommu_domain_user_data *user_data,
+			   bool immediate_attach);
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
-- 
2.34.1


* [PATCH v2 05/11] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Nicolin Chen <nicolinc@nvidia.com>

A user-managed hw_pagetable does not need to be populated, since it is
managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
calls into a helper that links only parent hwpts, since a hwpt->parent is
already linked to the IOAS by the time the nested hwpt is allocated.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index ec9b140939ce..73d7d9d07726 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -58,6 +58,24 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 	return 0;
 }
 
+static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
+{
+	int rc;
+
+	/*
+	 * Only a parent hwpt needs to be linked to the IOAS. And a hwpt->parent
+	 * must be linked to the IOAS already, when it's being allocated.
+	 */
+	if (hwpt->parent)
+		return 0;
+
+	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
+	if (rc)
+		return rc;
+	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
+	return 0;
+}
+
 /**
  * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
  * @ictx: iommufd context
@@ -151,10 +169,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			goto out_abort;
 	}
 
-	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
+	rc = iommufd_hw_pagetable_link_ioas(hwpt);
 	if (rc)
 		goto out_detach;
-	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
 	return hwpt;
 
 out_detach:
-- 
2.34.1


* [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace,
but it can only allocate hw_pagetables linked with an IOAS. There is a
need to support hw_pagetable allocation with parameters specified by the
user. For example, in nested translation, the user needs to allocate a
hw_pagetable for the stage-1 translation (e.g. a single I/O page table or
a set of I/O page tables) with user data. It also needs to provide a
stage-2 hw_pagetable which is linked to the GPA IOAS.

This extends IOMMU_HWPT_ALLOC to accept a user-specified parameter and a
hwpt ID in the @pt_id field, e.g. for a user-managed stage-1 hwpt, which
requires a parent hwpt pointing to the stage-2 translation.

enum iommu_hwpt_type is defined to differentiate the user parameters used
by different usages. For allocations that don't require a user parameter,
IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other
types will be added by future iommu vendor driver extensions.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 87 +++++++++++++++++++++++++---
 drivers/iommu/iommufd/main.c         |  2 +-
 include/uapi/linux/iommufd.h         | 32 +++++++++-
 3 files changed, 112 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 73d7d9d07726..e84270eb6e49 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -184,11 +184,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
+	struct iommufd_hw_pagetable *hwpt, *parent = NULL;
+	union iommu_domain_user_data *data = NULL;
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
-	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_object *pt_obj;
+	const struct iommu_ops *ops;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
-	int rc;
+	int klen = 0;
+	int rc = 0;
 
 	if (cmd->flags || cmd->__reserved)
 		return -EOPNOTSUPP;
@@ -197,15 +201,81 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 
-	ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id);
-	if (IS_ERR(ioas)) {
-		rc = PTR_ERR(ioas);
+	ops = dev_iommu_ops(idev->dev);
+
+	/*
+	 * All drivers support IOMMU_HWPT_TYPE_DEFAULT, so pass it through.
+	 * For any other hwpt_type, check the ops->domain_alloc_user_data_len
+	 * presence and its result.
+	 */
+	if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
+		if (!ops->domain_alloc_user_data_len) {
+			rc = -EOPNOTSUPP;
+			goto out_put_idev;
+		}
+		klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
+		if (WARN_ON(klen < 0)) {
+			rc = -EINVAL;
+			goto out_put_idev;
+		}
+	}
+
+	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj)) {
+		rc = -EINVAL;
 		goto out_put_idev;
 	}
 
+	switch (pt_obj->type) {
+	case IOMMUFD_OBJ_IOAS:
+		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
+		break;
+	case IOMMUFD_OBJ_HW_PAGETABLE:
+		/* pt_id points to a HWPT only when hwpt_type is not IOMMU_HWPT_TYPE_DEFAULT */
+		if (cmd->hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+
+		parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+		/*
+		 * Cannot allocate user-managed hwpt linking to auto_created
+		 * hwpt. If the parent hwpt is already a user-managed hwpt,
+		 * don't allocate another user-managed hwpt linking to it.
+		 */
+		if (parent->auto_domain || parent->parent) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+		ioas = parent->ioas;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out_put_pt;
+	}
+
+	if (klen) {
+		if (!cmd->data_len) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data) {
+			rc = -ENOMEM;
+			goto out_put_pt;
+		}
+
+		rc = copy_struct_from_user(data, sizeof(*data),
+					   u64_to_user_ptr(cmd->data_uptr),
+					   cmd->data_len);
+		if (rc)
+			goto out_free_data;
+	}
+
 	mutex_lock(&ioas->mutex);
 	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
-					  NULL, NULL, false);
+					  parent, data, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
@@ -222,7 +292,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
 out_unlock:
 	mutex_unlock(&ioas->mutex);
-	iommufd_put_object(&ioas->obj);
+out_free_data:
+	kfree(data);
+out_put_pt:
+	iommufd_put_object(pt_obj);
 out_put_idev:
 	iommufd_put_object(&idev->obj);
 	return rc;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5c24e8971f09..ac81403ba78e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -302,7 +302,7 @@ struct iommufd_ioctl_op {
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
-		 __reserved),
+		 data_uptr),
 	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
 		 struct iommu_hw_info, __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e9d42838dcbd..699e735828db 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -347,6 +347,14 @@ struct iommu_vfio_ioas {
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
+ */
+enum iommu_hwpt_type {
+	IOMMU_HWPT_TYPE_DEFAULT,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
@@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
  * @pt_id: The IOAS to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @hwpt_type: One of enum iommu_hwpt_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
- * A HWPT will be created with the IOVA mappings from the given IOAS.
+ * A kernel-managed HWPT will be created with the mappings from the given IOAS.
+ * The @hwpt_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
+ * another type (listed below) to specialize a kernel-managed HWPT.
+ *
+ * A user-managed HWPT will be created from a given parent HWPT via @pt_id,
+ * where the parent HWPT must have been allocated previously via the same
+ * ioctl from a given IOAS. The @hwpt_type must not be IOMMU_HWPT_TYPE_DEFAULT
+ * but a pre-defined type corresponding to the underlying IOMMU hardware.
+ *
+ * If the @hwpt_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and
+ * the @data_uptr will be ignored. Otherwise, both of them must be given.
+ *
+ * +==============================+=====================================+===========+
+ * | @hwpt_type                   |    Data structure in @data_uptr     |   @pt_id  |
+ * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -369,6 +396,9 @@ struct iommu_hwpt_alloc {
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
-- 
2.34.1


* [PATCH v2 07/11] iommufd: Add IOMMU_HWPT_INVALIDATE
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

In nested translation, the stage-1 page table is user-managed and used by
the IOMMU hardware, so an update to any present entry in the stage-1 page
table must be followed by an IOTLB invalidation.

This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
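
A minimal userspace sketch of the new ioctl; the invalidation structure
here is a hypothetical driver-specific layout, not part of this patch:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

struct hypot_inv_data {		/* hypothetical driver-specific layout */
	uint64_t addr;
	uint64_t npages;
};

static int invalidate_s1(int iommufd, uint32_t s1_hwpt_id,
			 uint64_t addr, uint64_t npages)
{
	struct hypot_inv_data inv = { .addr = addr, .npages = npages };
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = s1_hwpt_id,	/* must be a user-managed hwpt */
		.data_len = sizeof(inv),
		.data_uptr = (uintptr_t)&inv,
	};

	return ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
}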

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 45 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  9 +++++
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 26 ++++++++++++++
 4 files changed, 83 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index e84270eb6e49..8206367b8d83 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -300,3 +300,48 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(&idev->obj);
 	return rc;
 }
+
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	u32 user_data_len, klen;
+	u64 user_ptr;
+	int rc = 0;
+
+	if (!cmd->data_len || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	/* Do not allow any kernel-managed hw_pagetable */
+	if (!hwpt->parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	klen = hwpt->domain->ops->cache_invalidate_user_data_len;
+	if (!hwpt->domain->ops->cache_invalidate_user || !klen) {
+		rc = -EOPNOTSUPP;
+		goto out_put_hwpt;
+	}
+
+	/*
+	 * Copy the needed fields before reusing the ucmd buffer, this
+	 * avoids memory allocation in this path.
+	 */
+	user_ptr = cmd->data_uptr;
+	user_data_len = cmd->data_len;
+
+	rc = copy_struct_from_user(cmd, klen,
+				   u64_to_user_ptr(user_ptr), user_data_len);
+	if (rc)
+		goto out_put_hwpt;
+
+	rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, cmd);
+out_put_hwpt:
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9fe807e0aed6..1f9f48e45e05 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -268,6 +268,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
 
 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 					    struct iommufd_hw_pagetable *hwpt)
@@ -279,6 +280,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 		refcount_dec(&hwpt->obj.users);
 }
 
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_HW_PAGETABLE),
+			    struct iommufd_hw_pagetable, obj);
+}
+
 struct iommufd_group {
 	struct kref ref;
 	struct mutex lock;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index ac81403ba78e..d80c312f4ebe 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -270,6 +270,7 @@ union ucmd_buffer {
 	struct iommu_destroy destroy;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_hw_info info;
+	struct iommu_hwpt_invalidate cache;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
@@ -305,6 +306,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 data_uptr),
 	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
 		 struct iommu_hw_info, __reserved),
+	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
+		 struct iommu_hwpt_invalidate, data_uptr),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 699e735828db..6b82ef6d268b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_DEVICE_GET_HW_INFO,
+	IOMMUFD_CMD_HWPT_INVALIDATE,
 };
 
 /**
@@ -444,4 +445,29 @@ struct iommu_hw_info {
 	__u32 __reserved;
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
+
+/**
+ * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
+ * @size: sizeof(struct iommu_hwpt_invalidate)
+ * @hwpt_id: HWPT ID of target hardware page table for the invalidation
+ * @data_len: Length of the type specific data
+ * @__reserved: Must be 0
+ * @data_uptr: User pointer to the type specific data
+ *
+ * Invalidate the iommu cache for a user-managed page table. Modifications
+ * made to a user-managed page table should be followed by this operation
+ * to sync the IOTLB. The data in @data_uptr differs per hwpt type.
+ *
+ * +==============================+========================================+
+ * | @hwpt_type                   |     Data structure in @data_uptr       |
+ * +------------------------------+----------------------------------------+
+ */
+struct iommu_hwpt_invalidate {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 data_len;
+	__u32 __reserved;
+	__aligned_u64 data_uptr;
+};
+#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
 #endif
-- 
2.34.1


* [PATCH v2 08/11] iommufd/selftest: Add domain_alloc_user() support in iommu mock
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Nicolin Chen <nicolinc@nvidia.com>

Add a mock_domain_alloc_user function and an iommu_hwpt_selftest data
structure so that the user space selftest program can allocate domains
with user data.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |  1 +
 drivers/iommu/iommufd/iommufd_test.h | 16 ++++++
 drivers/iommu/iommufd/selftest.c     | 78 +++++++++++++++++++++++++---
 include/linux/iommu.h                |  3 ++
 4 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8206367b8d83..47ec7ddd5f5d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -7,6 +7,7 @@
 
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
+#include "iommufd_test.h"
 
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 {
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 3f3644375bf1..e77d57dd8e25 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -109,4 +109,20 @@ struct iommu_test_hw_info {
 	__u32 test_reg;
 };
 
+/* Should not be equal to any defined value in enum iommu_hwpt_type */
+#define IOMMU_HWPT_TYPE_SELFTTEST		0xdead
+
+/**
+ * struct iommu_hwpt_selftest
+ *
+ * @flags: page table entry attributes
+ * @test_config: default iotlb setup (value IOMMU_TEST_IOTLB_DEFAULT)
+ */
+struct iommu_hwpt_selftest {
+#define IOMMU_TEST_FLAG_NESTED		(1ULL << 0)
+	__u64 flags;
+#define IOMMU_TEST_IOTLB_DEFAULT	0xbadbeef
+	__u64 test_config;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index af7459e211ad..616525c5d308 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -84,7 +84,9 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
 
 struct mock_iommu_domain {
 	struct iommu_domain domain;
+	struct mock_iommu_domain *parent;
 	struct xarray pfns;
+	u32 iotlb;
 };
 
 enum selftest_obj_type {
@@ -142,26 +144,69 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length)
 	return info;
 }
 
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+static const struct iommu_ops mock_ops;
+static struct iommu_domain_ops domain_nested_ops;
+
+static struct iommu_domain *
+__mock_domain_alloc(unsigned int iommu_domain_type,
+		    struct mock_iommu_domain *mock_parent,
+		    const struct iommu_hwpt_selftest *user_cfg)
 {
 	struct mock_iommu_domain *mock;
 
 	if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED)
 		return &mock_blocking_domain;
 
-	if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_UNMANAGED))
-		return NULL;
-
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
 	if (!mock)
 		return NULL;
-	mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
-	mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
+	mock->parent = mock_parent;
+	mock->domain.type = iommu_domain_type;
 	mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
-	xa_init(&mock->pfns);
+	if (mock_parent) {
+		mock->iotlb = user_cfg->test_config;
+		mock->domain.ops = &domain_nested_ops;
+	} else {
+		mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
+		mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
+		mock->domain.ops = mock_ops.default_domain_ops;
+		xa_init(&mock->pfns);
+	}
 	return &mock->domain;
 }
 
+static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+{
+	if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_BLOCKED &&
+		    iommu_domain_type != IOMMU_DOMAIN_UNMANAGED))
+		return NULL;
+	return __mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED, NULL, NULL);
+}
+
+static struct iommu_domain *mock_domain_alloc_user(struct device *dev,
+						   struct iommu_domain *parent,
+						   const union iommu_domain_user_data *user_data)
+{
+	const struct iommu_hwpt_selftest *user_cfg = (struct iommu_hwpt_selftest *)user_data;
+	unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED;
+	struct mock_iommu_domain *mock_parent = NULL;
+
+	/* Check union iommu_domain_user_data in include/linux/iommu.h */
+	static_assert(sizeof(struct iommu_hwpt_selftest) <= 16);
+
+	if (parent) {
+		if (parent->ops != mock_ops.default_domain_ops)
+			return NULL;
+		if (!user_cfg || !(user_cfg->flags & IOMMU_TEST_FLAG_NESTED))
+			return NULL;
+		iommu_domain_type = IOMMU_DOMAIN_NESTED;
+		mock_parent = container_of(parent,
+					   struct mock_iommu_domain, domain);
+	}
+
+	return __mock_domain_alloc(iommu_domain_type, mock_parent, user_cfg);
+}
+
 static void mock_domain_free(struct iommu_domain *domain)
 {
 	struct mock_iommu_domain *mock =
@@ -290,12 +335,21 @@ static void mock_domain_set_plaform_dma_ops(struct device *dev)
 	 */
 }
 
+static int mock_domain_user_data_len(u32 hwpt_type)
+{
+	if (hwpt_type != IOMMU_HWPT_TYPE_SELFTTEST)
+		return -EOPNOTSUPP;
+	return sizeof(struct iommu_hwpt_selftest);
+};
+
 static const struct iommu_ops mock_ops = {
 	.owner = THIS_MODULE,
 	.pgsize_bitmap = MOCK_IO_PAGE_SIZE,
 	.hw_info_type = IOMMU_HW_INFO_TYPE_SELFTEST,
 	.hw_info = mock_domain_hw_info,
 	.domain_alloc = mock_domain_alloc,
+	.domain_alloc_user = mock_domain_alloc_user,
+	.domain_alloc_user_data_len = mock_domain_user_data_len,
 	.capable = mock_domain_capable,
 	.set_platform_dma_ops = mock_domain_set_plaform_dma_ops,
 	.default_domain_ops =
@@ -308,6 +362,11 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static struct iommu_domain_ops domain_nested_ops = {
+	.free = mock_domain_free,
+	.attach_dev = mock_domain_nop_attach,
+};
+
 static struct iommu_device mock_iommu_device = {
 	.ops = &mock_ops,
 };
@@ -324,7 +383,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 	hwpt = container_of(obj, struct iommufd_hw_pagetable, obj);
-	if (hwpt->domain->ops != mock_ops.default_domain_ops) {
+	if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED &&
+	     hwpt->domain->ops != mock_ops.default_domain_ops) ||
+	    (hwpt->domain->type == IOMMU_DOMAIN_NESTED &&
+	     hwpt->domain->ops != &domain_nested_ops)) {
 		iommufd_put_object(&hwpt->obj);
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c2d0fa3e2e18..dff4fc2b2014 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -231,6 +231,9 @@ struct iommu_iotlb_gather {
  * include/uapi/linux/iommufd.h.
  */
 union iommu_domain_user_data {
+#ifdef CONFIG_IOMMUFD_TEST
+	__u64 test[2];
+#endif
 };
 
 /**
-- 
2.34.1


* [PATCH v2 09/11] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
From: Yi Liu @ 2023-05-11 14:38 UTC
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Nicolin Chen <nicolinc@nvidia.com>

The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a
customized domain. Add coverage for both a regular domain case and a
nested domain case.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 114 +++++++++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  36 ++++++
 2 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index fa2324741ad2..568f80f596ad 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -122,6 +122,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_option, IOMMU_OPTION);
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
 	TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO);
+	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC);
 #undef TEST_LENGTH
 }
 
@@ -194,6 +195,7 @@ FIXTURE_VARIANT(iommufd_ioas)
 {
 	unsigned int mock_domains;
 	unsigned int memory_limit;
+	bool new_hwpt;
 };
 
 FIXTURE_SETUP(iommufd_ioas)
@@ -233,6 +235,12 @@ FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain)
 	.mock_domains = 1,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain_hwpt)
+{
+	.mock_domains = 1,
+	.new_hwpt = true,
+};
+
 FIXTURE_VARIANT_ADD(iommufd_ioas, two_mock_domain)
 {
 	.mock_domains = 2,
@@ -260,6 +268,93 @@ TEST_F(iommufd_ioas, ioas_destroy)
 	}
 }
 
+TEST_F(iommufd_ioas, hwpt_alloc)
+{
+	uint32_t new_hwpt_id = 0;
+
+	if (self->stdev_id && self->device_id) {
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id, &new_hwpt_id);
+		test_cmd_mock_domain_replace(self->stdev_id, new_hwpt_id);
+		/* hw_pagetable cannot be freed if a device is attached to it */
+		EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, new_hwpt_id));
+
+		/* Detach from the new hw_pagetable and try again */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(new_hwpt_id);
+	} else {
+		test_err_cmd_hwpt_alloc(ENOENT, self->device_id,
+					self->ioas_id, &new_hwpt_id);
+		test_err_mock_domain_replace(ENOENT,
+					     self->stdev_id, new_hwpt_id);
+	}
+}
+
+TEST_F(iommufd_ioas, nested_hwpt_alloc)
+{
+	uint32_t nested_hwpt_id[2] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t test_hwpt_id = 0;
+
+	if (self->device_id) {
+		/* Negative tests */
+		test_err_cmd_hwpt_alloc(ENOENT, self->ioas_id, self->device_id,
+					&test_hwpt_id);
+		test_err_cmd_hwpt_alloc(EINVAL, self->device_id,
+					self->device_id, &test_hwpt_id);
+
+		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+				    &parent_hwpt_id);
+
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   &nested_hwpt_id[0]);
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   &nested_hwpt_id[1]);
+
+		/* Negative test: a nested hwpt on top of a nested hwpt */
+		test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id,
+					       nested_hwpt_id[0],
+					       &test_hwpt_id);
+		/* Negative test: parent hwpt now cannot be freed */
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
+
+		/* Attach device to nested_hwpt_id[0] that then will be busy */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[0]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[0]));
+
+		/* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[1]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
+		test_ioctl_destroy(nested_hwpt_id[0]);
+
+		/* Detach from nested_hwpt_id[1] and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+		test_ioctl_destroy(nested_hwpt_id[1]);
+
+		/* Detach from the parent hw_pagetable and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(parent_hwpt_id);
+	} else {
+		test_err_cmd_hwpt_alloc(ENOENT, self->device_id, self->ioas_id,
+					&parent_hwpt_id);
+		test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id,
+					       parent_hwpt_id,
+					       &nested_hwpt_id[0]);
+		test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id,
+					       parent_hwpt_id,
+					       &nested_hwpt_id[1]);
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[0]);
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[1]);
+	}
+}
+
 TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
@@ -663,6 +758,8 @@ TEST_F(iommufd_ioas, access_pin)
 			       MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES);
 
 	for (npages = 1; npages < BUFFER_SIZE / PAGE_SIZE; npages++) {
+		uint32_t new_hwpt_id = 0;
+		uint32_t mock_device_id;
 		uint32_t mock_stdev_id;
 		uint32_t mock_hwpt_id;
 
@@ -696,12 +793,27 @@ TEST_F(iommufd_ioas, access_pin)
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
 				   &access_cmd));
 		test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-				     &mock_hwpt_id, NULL);
+				     &mock_hwpt_id, &mock_device_id);
 		check_map_cmd.id = mock_hwpt_id;
+		if (variant->new_hwpt) {
+			test_cmd_hwpt_alloc(mock_device_id, self->ioas_id,
+					    &new_hwpt_id);
+			test_cmd_mock_domain_replace(mock_stdev_id,
+						     new_hwpt_id);
+			check_map_cmd.id = new_hwpt_id;
+		} else {
+			check_map_cmd.id = mock_hwpt_id;
+		}
 		ASSERT_EQ(0, ioctl(self->fd,
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
 				   &check_map_cmd));
 
+		if (variant->new_hwpt) {
+			/* Detach from the new hwpt for its destroy() */
+			test_cmd_mock_domain_replace(mock_stdev_id,
+						     mock_hwpt_id);
+			test_ioctl_destroy(new_hwpt_id);
+		}
 		test_ioctl_destroy(mock_stdev_id);
 		test_cmd_destroy_access_pages(
 			access_cmd.id,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8dced7ef9118..e2c26168ec89 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -118,6 +118,42 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 
 #define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+#define test_err_cmd_hwpt_alloc(_errno, device_id, pt_id, hwpt_id)     \
+	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \
+						  pt_id, hwpt_id))
+
+static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
+				       __u32 *hwpt_id)
+{
+	struct iommu_hwpt_selftest data = {
+		.flags = IOMMU_TEST_FLAG_NESTED,
+		.test_config = IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	struct iommu_hwpt_alloc cmd = {
+		.size = sizeof(cmd),
+		.dev_id = device_id,
+		.pt_id = parent_id,
+		.hwpt_type = IOMMU_HWPT_TYPE_SELFTTEST,
+		.data_len = sizeof(data),
+		.data_uptr = (uint64_t)&data,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (hwpt_id)
+		*hwpt_id = cmd.out_hwpt_id;
+	return 0;
+}
+
+#define test_cmd_hwpt_alloc_nested(device_id, parent_id, hwpt_id)     \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc_nested(self->fd, device_id, \
+						 parent_id, hwpt_id))
+#define test_err_cmd_hwpt_alloc_nested(_errno, device_id, parent_id, hwpt_id) \
+	EXPECT_ERRNO(_errno,                                                  \
+		     _test_cmd_hwpt_alloc_nested(self->fd, device_id,         \
+						 parent_id, hwpt_id))
 
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
-- 
2.34.1



* [PATCH v2 10/11] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  2023-05-11 14:38 [PATCH v2 00/11] iommufd: Add nesting infrastructure Yi Liu
                   ` (8 preceding siblings ...)
  2023-05-11 14:38 ` [PATCH v2 09/11] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data Yi Liu
@ 2023-05-11 14:38 ` Yi Liu
  2023-05-11 14:38 ` [PATCH v2 11/11] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
  2023-05-19  9:56 ` [PATCH v2 00/11] iommufd: Add nesting infrastructure Tian, Kevin
  11 siblings, 0 replies; 64+ messages in thread
From: Yi Liu @ 2023-05-11 14:38 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Nicolin Chen <nicolinc@nvidia.com>

This allows testing whether the IOTLB has been invalidated.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  4 ++++
 drivers/iommu/iommufd/selftest.c              | 22 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  4 ++++
 tools/testing/selftests/iommu/iommufd_utils.h | 14 ++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e77d57dd8e25..ec904931df69 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -19,6 +19,7 @@ enum {
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
 	IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
+	IOMMU_TEST_OP_MD_CHECK_IOTLB,
 };
 
 enum {
@@ -95,6 +96,9 @@ struct iommu_test_cmd {
 		struct {
 			__u32 ioas_id;
 		} access_replace_ioas;
+		struct {
+			__u32 iotlb;
+		} check_iotlb;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 616525c5d308..57530e7d9524 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -678,6 +678,25 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd,
 	return 0;
 }
 
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd,
+				       u32 mockpt_id, u32 iotlb)
+{
+	struct iommufd_hw_pagetable *hwpt;
+	struct mock_iommu_domain *mock;
+	int rc = 0;
+
+	hwpt = get_md_pagetable(ucmd, mockpt_id, &mock);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	mock = container_of(hwpt->domain, struct mock_iommu_domain, domain);
+
+	if (iotlb != mock->iotlb)
+		rc = -EINVAL;
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
+
 struct selftest_access {
 	struct iommufd_access *access;
 	struct file *file;
@@ -1091,6 +1110,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 		return iommufd_test_md_check_refs(
 			ucmd, u64_to_user_ptr(cmd->check_refs.uptr),
 			cmd->check_refs.length, cmd->check_refs.refs);
+	case IOMMU_TEST_OP_MD_CHECK_IOTLB:
+		return iommufd_test_md_check_iotlb(ucmd, cmd->id,
+						   cmd->check_iotlb.iotlb);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd, cmd->id,
 						  cmd->create_access.flags);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 568f80f596ad..afbe23cecb54 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -310,6 +310,10 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc)
 					   &nested_hwpt_id[0]);
 		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
 					   &nested_hwpt_id[1]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0],
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[1],
+					  IOMMU_TEST_IOTLB_DEFAULT);
 
 		/* Negative test: a nested hwpt on top of a nested hwpt */
 		test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index e2c26168ec89..7d9efe3f7e0d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -122,6 +122,20 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \
 						  pt_id, hwpt_id))
 
+#define test_cmd_hwpt_check_iotlb(hwpt_id, expected)                           \
+	({                                                                     \
+		struct iommu_test_cmd test_cmd = {                             \
+			.size = sizeof(test_cmd),                              \
+			.op = IOMMU_TEST_OP_MD_CHECK_IOTLB,                    \
+			.id = hwpt_id,                                         \
+			.check_iotlb = { .iotlb = expected },                  \
+		};                                                             \
+		ASSERT_EQ(0,                                                   \
+			  ioctl(self->fd,                                      \
+				_IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \
+				&test_cmd));                                   \
+	})
+
 static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
 				       __u32 *hwpt_id)
 {
-- 
2.34.1



* [PATCH v2 11/11] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
  2023-05-11 14:38 [PATCH v2 00/11] iommufd: Add nesting infrastructure Yi Liu
                   ` (9 preceding siblings ...)
  2023-05-11 14:38 ` [PATCH v2 10/11] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
@ 2023-05-11 14:38 ` Yi Liu
  2023-05-19  9:56 ` [PATCH v2 00/11] iommufd: Add nesting infrastructure Tian, Kevin
  11 siblings, 0 replies; 64+ messages in thread
From: Yi Liu @ 2023-05-11 14:38 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, zhenzhong.duan

From: Nicolin Chen <nicolinc@nvidia.com>

Add a mock_domain_cache_invalidate_user() op and a corresponding struct
iommu_hwpt_invalidate_selftest, to allow testing IOMMU_HWPT_INVALIDATE
from user space, using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h          | 10 ++++++++++
 drivers/iommu/iommufd/selftest.c              | 19 ++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  8 ++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 20 +++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index ec904931df69..c5ca69bd07d1 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -129,4 +129,14 @@ struct iommu_hwpt_selftest {
 	__u64 test_config;
 };
 
+/**
+ * struct iommu_hwpt_invalidate_selftest
+ *
+ * @flags: invalidate flags
+ */
+struct iommu_hwpt_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_ALL	(1ULL << 0)
+	__u64 flags;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 57530e7d9524..3c0051442b4d 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -362,9 +362,28 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static int mock_domain_cache_invalidate_user(struct iommu_domain *domain,
+					     void *user_data)
+{
+	struct iommu_hwpt_invalidate_selftest *inv_info = user_data;
+	struct mock_iommu_domain *mock =
+		container_of(domain, struct mock_iommu_domain, domain);
+
+	if (domain->type != IOMMU_DOMAIN_NESTED || !mock->parent)
+		return -EINVAL;
+
+	if (inv_info->flags & IOMMU_TEST_INVALIDATE_ALL)
+		mock->iotlb = 0;
+
+	return 0;
+}
+
 static struct iommu_domain_ops domain_nested_ops = {
 	.free = mock_domain_free,
 	.attach_dev = mock_domain_nop_attach,
+	.cache_invalidate_user = mock_domain_cache_invalidate_user,
+	.cache_invalidate_user_data_len =
+		sizeof(struct iommu_hwpt_invalidate_selftest),
 };
 
 static struct iommu_device mock_iommu_device = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index afbe23cecb54..031e222cebf9 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -123,6 +123,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
 	TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO);
 	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC);
+	TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE);
 #undef TEST_LENGTH
 }
 
@@ -323,6 +324,13 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc)
 		EXPECT_ERRNO(EBUSY,
 			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
 
+		/* hwpt_invalidate only supports a user-managed hwpt (nested) */
+		test_err_cmd_hwpt_invalidate(EINVAL, parent_hwpt_id);
+		test_cmd_hwpt_invalidate(nested_hwpt_id[0]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0);
+		test_cmd_hwpt_invalidate(nested_hwpt_id[1]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[1], 0);
+
 		/* Attach device to nested_hwpt_id[0] that then will be busy */
 		test_cmd_mock_domain_replace(self->stdev_id,
 					     nested_hwpt_id[0]);
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 7d9efe3f7e0d..fef17fb46160 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -169,6 +169,26 @@ static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
 		     _test_cmd_hwpt_alloc_nested(self->fd, device_id,         \
 						 parent_id, hwpt_id))
 
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id)
+{
+	struct iommu_hwpt_invalidate_selftest data = {
+		.flags = IOMMU_TEST_INVALIDATE_ALL,
+	};
+	struct iommu_hwpt_invalidate cmd = {
+		.size = sizeof(cmd),
+		.hwpt_id = hwpt_id,
+		.data_len = sizeof(data),
+		.data_uptr = (uint64_t)&data,
+	};
+
+	return ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd);
+}
+
+#define test_cmd_hwpt_invalidate(hwpt_id) \
+	ASSERT_EQ(0, _test_cmd_hwpt_invalidate(self->fd, hwpt_id))
+#define test_err_cmd_hwpt_invalidate(_errno, hwpt_id) \
+	EXPECT_ERRNO(_errno, _test_cmd_hwpt_invalidate(self->fd, hwpt_id))
+
 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
 					 unsigned int ioas_id)
 {
-- 
2.34.1



* Re: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-11 14:38 ` [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
@ 2023-05-17  3:08   ` Liu, Jingqi
  2023-05-19 19:34     ` Nicolin Chen
  2023-05-19  9:41   ` Tian, Kevin
  1 sibling, 1 reply; 64+ messages in thread
From: Liu, Jingqi @ 2023-05-17  3:08 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On 5/11/2023 10:38 PM, Yi Liu wrote:
> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> But it can only allocate hw_pagetables linked with IOAS. There are needs
> to support hw_pagetable allocation with parameters specified by user. For
> example, in nested translation, user needs to allocate hw_pagetable for
> the stage-1 translation (e.g. a single I/O page table or a set of I/O page
> tables) with user data. It also needs provide a stage-2 hw_pagetable which
s/provide/to provide
> is linked to the GPA IOAS.
>
> This extends IOMMU_HWPT_ALLOC to accept user specified parameter and hwpt
> ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires
> a parent hwpt to point to stage-2 translation.
>
> enum iommu_hwpt_type is defined to differentiate the user parameters use
> by different usages. For the allocations that don't require user parameter,
s/parameter/parameters
> IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other types
> would be added by future iommu vendor driver extensions.
>
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c | 87 +++++++++++++++++++++++++---
>   drivers/iommu/iommufd/main.c         |  2 +-
>   include/uapi/linux/iommufd.h         | 32 +++++++++-
>   3 files changed, 112 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 73d7d9d07726..e84270eb6e49 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -184,11 +184,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>   
>   int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   {
> +	struct iommufd_hw_pagetable *hwpt, *parent = NULL;
> +	union iommu_domain_user_data *data = NULL;
>   	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
> -	struct iommufd_hw_pagetable *hwpt;
> +	struct iommufd_object *pt_obj;
> +	const struct iommu_ops *ops;
>   	struct iommufd_device *idev;
>   	struct iommufd_ioas *ioas;
> -	int rc;
> +	int klen = 0;
> +	int rc = 0;
>   
>   	if (cmd->flags || cmd->__reserved)
>   		return -EOPNOTSUPP;
> @@ -197,15 +201,81 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   	if (IS_ERR(idev))
>   		return PTR_ERR(idev);
>   
> -	ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id);
> -	if (IS_ERR(ioas)) {
> -		rc = PTR_ERR(ioas);
> +	ops = dev_iommu_ops(idev->dev);
> +
> +	/*
> +	 * All drivers support IOMMU_HWPT_TYPE_DEFAULT, so pass it through.
> +	 * For any other hwpt_type, check the ops->domain_alloc_user_data_len
> +	 * presence and its result.
> +	 */
> +	if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
> +		if (!ops->domain_alloc_user_data_len) {
> +			rc = -EOPNOTSUPP;
> +			goto out_put_idev;
> +		}
> +		klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
> +		if (WARN_ON(klen < 0)) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
Would it be better if the later "klen" check is moved here?
    if (klen) {
                [...]
    }
If this check fails here, there's no need to execute the code after it.
If this path is not executed, "klen" is 0, and there's no need to check it.
Do I understand it right?
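
In other words, something like this (just a sketch of the reordering I
mean, with the error labels adjusted since nothing else is held yet):

	if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
		if (!ops->domain_alloc_user_data_len) {
			rc = -EOPNOTSUPP;
			goto out_put_idev;
		}
		klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
		if (WARN_ON(klen < 0)) {
			rc = -EINVAL;
			goto out_put_idev;
		}
		/* allocate @data and do copy_struct_from_user() here,
		 * so the DEFAULT path never needs to look at klen
		 */
	}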

Thanks,
Jingqi
> +	}
> +
> +	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
> +	if (IS_ERR(pt_obj)) {
> +		rc = -EINVAL;
>   		goto out_put_idev;
>   	}
>   
> +	switch (pt_obj->type) {
> +	case IOMMUFD_OBJ_IOAS:
> +		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> +		break;
> +	case IOMMUFD_OBJ_HW_PAGETABLE:
> +		/* pt_id points HWPT only when hwpt_type is !IOMMU_HWPT_TYPE_DEFAULT */
> +		if (cmd->hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +
> +		parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> +		/*
> +		 * Cannot allocate user-managed hwpt linking to auto_created
> +		 * hwpt. If the parent hwpt is already a user-managed hwpt,
> +		 * don't allocate another user-managed hwpt linking to it.
> +		 */
> +		if (parent->auto_domain || parent->parent) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +		ioas = parent->ioas;
> +		break;
> +	default:
> +		rc = -EINVAL;
> +		goto out_put_pt;
> +	}
> +
> +	if (klen) {
> +		if (!cmd->data_len) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +
> +		data = kzalloc(sizeof(*data), GFP_KERNEL);
> +		if (!data) {
> +			rc = -ENOMEM;
> +			goto out_put_pt;
> +		}
> +
> +		rc = copy_struct_from_user(data, sizeof(*data),
> +					   u64_to_user_ptr(cmd->data_uptr),
> +					   cmd->data_len);
> +		if (rc)
> +			goto out_free_data;
> +	}
> +
>   	mutex_lock(&ioas->mutex);
>   	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
> -					  NULL, NULL, false);
> +					  parent, data, false);
>   	if (IS_ERR(hwpt)) {
>   		rc = PTR_ERR(hwpt);
>   		goto out_unlock;
> @@ -222,7 +292,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
>   out_unlock:
>   	mutex_unlock(&ioas->mutex);
> -	iommufd_put_object(&ioas->obj);
> +out_free_data:
> +	kfree(data);
> +out_put_pt:
> +	iommufd_put_object(pt_obj);
>   out_put_idev:
>   	iommufd_put_object(&idev->obj);
>   	return rc;
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 5c24e8971f09..ac81403ba78e 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -302,7 +302,7 @@ struct iommufd_ioctl_op {
>   static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>   	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
>   	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
> -		 __reserved),
> +		 data_uptr),
>   	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
>   		 struct iommu_hw_info, __reserved),
>   	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index e9d42838dcbd..699e735828db 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -347,6 +347,14 @@ struct iommu_vfio_ioas {
>   };
>   #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
>   
> +/**
> + * enum iommu_hwpt_type - IOMMU HWPT Type
> + * @IOMMU_HWPT_TYPE_DEFAULT: default
> + */
> +enum iommu_hwpt_type {
> +	IOMMU_HWPT_TYPE_DEFAULT,
> +};
> +
>   /**
>    * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
>    * @size: sizeof(struct iommu_hwpt_alloc)
> @@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
>    * @pt_id: The IOAS to connect this HWPT to
>    * @out_hwpt_id: The ID of the new HWPT
>    * @__reserved: Must be 0
> + * @hwpt_type: One of enum iommu_hwpt_type
> + * @data_len: Length of the type specific data
> + * @data_uptr: User pointer to the type specific data
>    *
>    * Explicitly allocate a hardware page table object. This is the same object
>    * type that is returned by iommufd_device_attach() and represents the
>    * underlying iommu driver's iommu_domain kernel object.
>    *
> - * A HWPT will be created with the IOVA mappings from the given IOAS.
> + * A kernel-managed HWPT will be created with the mappings from the given IOAS.
> + * The @hwpt_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
> + * another type (being listed below) to specialize a kernel-managed HWPT.
> + *
> + * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
> + * which the parent HWPT must be allocated previously via the same ioctl from a
> + * given IOAS. The @hwpt_type must not be set to IOMMU_HWPT_TYPE_DEFAULT but a
> + * pre-defined type corresponding to the underlying IOMMU hardware.
> + *
> + * If the @hwpt_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and
> + * the @data_uptr will be ignored. Otherwise, both of them must be given.
> + *
> + * +==============================+=====================================+===========+
> + * | @hwpt_type                   |    Data structure in @data_uptr     |   @pt_id  |
> + * +------------------------------+-------------------------------------+-----------+
> + * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
> + * +------------------------------+-------------------------------------+-----------+
>    */
>   struct iommu_hwpt_alloc {
>   	__u32 size;
> @@ -369,6 +396,9 @@ struct iommu_hwpt_alloc {
>   	__u32 pt_id;
>   	__u32 out_hwpt_id;
>   	__u32 __reserved;
> +	__u32 hwpt_type;
> +	__u32 data_len;
> +	__aligned_u64 data_uptr;
>   };
>   #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
>   



* RE: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-11 14:38 ` [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace Yi Liu
@ 2023-05-19  8:47   ` Tian, Kevin
  2023-05-19 18:45     ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  8:47 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> @@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
>   *           after use. Return the data buffer if success, or ERR_PTR on
>   *           failure.
>   * @domain_alloc: allocate iommu domain
> + * @domain_alloc_user: allocate user iommu domain
> + * @domain_alloc_user_data_len: return the required length of the user
> data
> + *                              to allocate a specific type user iommu domain.
> + *                              @hwpt_type is defined as enum iommu_hwpt_type
> + *                              in include/uapi/linux/iommufd.h. The returned
> + *                              length is the corresponding sizeof driver data
> + *                              structures in include/uapi/linux/iommufd.h.
> + *                              -EOPNOTSUPP would be returned if the input
> + *                              @hwpt_type is not supported by the driver.

Can this be merged with the earlier @hw_info callback? That one already
reports a list of supported hwpt types. Is there a problem with further
describing the data length for each type in that interface?
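
For example (purely a sketch to show the idea; the names are made up):

	/* hypothetical: let @hw_info also describe each supported type */
	struct iommu_hw_info_type {
		__u32 hwpt_type;	/* enum iommu_hwpt_type */
		__u32 data_len;		/* sizeof the uapi struct for it */
	};

so a single query reports both the supported types and their data
lengths.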


* RE: [PATCH v2 02/11] iommu: Add nested domain support
  2023-05-11 14:38 ` [PATCH v2 02/11] iommu: Add nested domain support Yi Liu
@ 2023-05-19  8:51   ` Tian, Kevin
  2023-05-19 18:49     ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  8:51 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong

> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> 
> @@ -66,6 +66,9 @@ struct iommu_domain_geometry {
> 
>  #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address
> space */
> 
> +#define __IOMMU_DOMAIN_NESTED	(1U << 5)  /* User-managed IOVA
> nested on
> +					      a stage-2 translation        */

s/IOVA/address space/

> @@ -346,6 +350,10 @@ struct iommu_ops {
>   * @iotlb_sync_map: Sync mappings created recently using @map to the
> hardware
>   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty
> flush
>   *            queue
> + * @cache_invalidate_user: Flush hardware TLBs caching user space IO
> mappings
> + * @cache_invalidate_user_data_len: Defined length of input user data for
> the
> + *                                  cache_invalidate_user op, being sizeof the
> + *                                  structure in include/uapi/linux/iommufd.h

same as comment to last patch, can this be merged with @hw_info?


* RE: [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  2023-05-11 14:38 ` [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
@ 2023-05-19  8:56   ` Tian, Kevin
  2023-05-19 18:57     ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  8:56 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong

> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> 
> 
> @@ -88,7 +90,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> *ictx, struct iommufd_ioas *ioas,
>  	refcount_inc(&ioas->obj.users);
>  	hwpt->ioas = ioas;
> 
> -	hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> +	if (ops->domain_alloc_user)
> +		hwpt->domain = ops->domain_alloc_user(idev->dev, NULL,
> NULL);
> +	else
> +		hwpt->domain = iommu_domain_alloc(idev->dev->bus);

this is a reminder that the comment for @domain_alloc_user() should
clarify that the UNMANAGED domain type is assumed when no user data is
provided, to be compatible with iommu_domain_alloc().


* RE: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-11 14:38 ` [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
@ 2023-05-19  9:06   ` Tian, Kevin
  2023-05-19 19:09     ` Nicolin Chen
  2023-05-19  9:34   ` Tian, Kevin
  1 sibling, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  9:06 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> @@ -61,6 +63,8 @@ int iommufd_hw_pagetable_enforce_cc(struct
> iommufd_hw_pagetable *hwpt)
>   * @ictx: iommufd context
>   * @ioas: IOAS to associate the domain with
>   * @idev: Device to get an iommu_domain for
> + * @parent: Optional parent HWPT to associate with the domain with

remove the trailing "the domain with"

> @@ -73,14 +77,22 @@ int iommufd_hw_pagetable_enforce_cc(struct
> iommufd_hw_pagetable *hwpt)
>   */
>  struct iommufd_hw_pagetable *
>  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
> iommufd_ioas *ioas,
> -			   struct iommufd_device *idev, bool
> immediate_attach)
> +			   struct iommufd_device *idev,
> +			   struct iommufd_hw_pagetable *parent,
> +			   union iommu_domain_user_data *user_data,
> +			   bool immediate_attach)
>  {
>  	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> +	struct iommu_domain *parent_domain = NULL;
>  	struct iommufd_hw_pagetable *hwpt;
> +	bool type_unmanaged, type_nested;
>  	int rc;
> 
>  	lockdep_assert_held(&ioas->mutex);
> 
> +	if ((user_data || parent) && !ops->domain_alloc_user)
> +		return ERR_PTR(-EOPNOTSUPP);

Do we allow specifying parent w/o user_data?

> @@ -99,6 +117,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> *ictx, struct iommufd_ioas *ioas,
>  		goto out_abort;
>  	}
> 
> +	/* It must be either NESTED or UNMANAGED, depending on
> parent_domain */
> +       type_nested = hwpt->domain->type == IOMMU_DOMAIN_NESTED;
> +       type_unmanaged = hwpt->domain->type ==
> IOMMU_DOMAIN_UNMANAGED;

no need of one-time used variables. Just put the conditions directly
in WARN_ON.

> +       if (WARN_ON((parent_domain && !type_nested) ||
> +                   (!parent_domain && !type_unmanaged))) {
> +		rc = -EINVAL;
> +		goto out_abort;
> +	}
> +

probably just WARN_ON_ONCE() to mark that driver has problem?



* RE: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-11 14:38 ` [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
  2023-05-19  9:06   ` Tian, Kevin
@ 2023-05-19  9:34   ` Tian, Kevin
  2023-05-19 19:12     ` Nicolin Chen
  1 sibling, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  9:34 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> @@ -89,9 +101,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> *ictx, struct iommufd_ioas *ioas,
>  	/* Pairs with iommufd_hw_pagetable_destroy() */
>  	refcount_inc(&ioas->obj.users);
>  	hwpt->ioas = ioas;

presumably a user hwpt doesn't need to store the ioas?


* RE: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-11 14:38 ` [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
  2023-05-17  3:08   ` Liu, Jingqi
@ 2023-05-19  9:41   ` Tian, Kevin
  2023-05-19 19:48     ` Nicolin Chen
  1 sibling, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  9:41 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, Liu,
	Yi L, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest, Duan, Zhenzhong

> From: Yi Liu <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> +	if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
> +		if (!ops->domain_alloc_user_data_len) {
> +			rc = -EOPNOTSUPP;
> +			goto out_put_idev;
> +		}
> +		klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
> +		if (WARN_ON(klen < 0)) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +	}

What about passing the user pointer to the iommu driver, which
then does the copy, so we don't need an extra @data_len()
callback for every driver?
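
Something like this (a rough sketch using the mock driver; the
argument order and the __user type are illustrative of the proposal,
not the current code):

	static struct iommu_domain *
	mock_domain_alloc_user(struct device *dev,
			       struct iommu_domain *parent,
			       const void __user *udata)
	{
		struct iommu_hwpt_selftest data;

		if (udata && copy_from_user(&data, udata, sizeof(data)))
			return ERR_PTR(-EFAULT);
		/* validate 'data', then allocate the domain as usual */
		[...]
	}

Then the core just forwards the user pointer without needing to know
its length.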

> 
> +	switch (pt_obj->type) {
> +	case IOMMUFD_OBJ_IOAS:
> +		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> +		break;

this should fail if parent is specified.

> +	case IOMMUFD_OBJ_HW_PAGETABLE:
> +		/* pt_id points HWPT only when hwpt_type
> is !IOMMU_HWPT_TYPE_DEFAULT */
> +		if (cmd->hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +
> +		parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> obj);
> +		/*
> +		 * Cannot allocate user-managed hwpt linking to
> auto_created
> +		 * hwpt. If the parent hwpt is already a user-managed hwpt,
> +		 * don't allocate another user-managed hwpt linking to it.
> +		 */
> +		if (parent->auto_domain || parent->parent) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +		ioas = parent->ioas;

for nesting, why is the ioas required? In concept we can just pass a
NULL ioas to iommufd_hw_pagetable_alloc() for this hwpt. If within that
function there is a need to toggle the ioas for the parent, it can
always be retrieved from the parent hwpt.



* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-05-11 14:38 [PATCH v2 00/11] iommufd: Add nesting infrastructure Yi Liu
                   ` (10 preceding siblings ...)
  2023-05-11 14:38 ` [PATCH v2 11/11] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
@ 2023-05-19  9:56 ` Tian, Kevin
  2023-05-19 11:49   ` Jason Gunthorpe
  11 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-19  9:56 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:39 PM
> 
> Lu Baolu (2):
>   iommu: Add new iommu op to create domains owned by userspace
>   iommu: Add nested domain support
> 
> Nicolin Chen (5):
>   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
>   iommufd/selftest: Add domain_alloc_user() support in iommu mock
>   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
>   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
>   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> 
> Yi Liu (4):
>   iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
>   iommufd: Pass parent hwpt and user_data to
>     iommufd_hw_pagetable_alloc()
>   iommufd: IOMMU_HWPT_ALLOC allocation with user data
>   iommufd: Add IOMMU_HWPT_INVALIDATE
> 

I didn't see any change in iommufd_hw_pagetable_attach() to handle
stage-1 hwpt differently.

In concept, whatever reserved regions exist on a device should be
directly reflected on the hwpt which the device is attached to.

So with nesting, presumably the reserved regions of the device have
been reported to userspace and it's the user's responsibility to avoid
allocating IOVA from those reserved regions in the stage-1 hwpt.

It's not necessary to add reserved regions to the IOAS of the parent
hwpt since the device doesn't access that address space after it's
attached to stage-1. The parent is used only for address translation
on the iommu side.

This series kind of ignores this fact, which is probably the reason why
you store an ioas pointer even in the stage-1 hwpt.

Thanks
Kevin


* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-05-19  9:56 ` [PATCH v2 00/11] iommufd: Add nesting infrastructure Tian, Kevin
@ 2023-05-19 11:49   ` Jason Gunthorpe
  2023-05-24  3:48     ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-05-19 11:49 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> > 
> > Lu Baolu (2):
> >   iommu: Add new iommu op to create domains owned by userspace
> >   iommu: Add nested domain support
> > 
> > Nicolin Chen (5):
> >   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
> >   iommufd/selftest: Add domain_alloc_user() support in iommu mock
> >   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
> >   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
> >   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> > 
> > Yi Liu (4):
> >   iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
> >   iommufd: Pass parent hwpt and user_data to
> >     iommufd_hw_pagetable_alloc()
> >   iommufd: IOMMU_HWPT_ALLOC allocation with user data
> >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > 
> 
> I didn't see any change in iommufd_hw_pagetable_attach() to handle
> stage-1 hwpt differently.
> 
> In concept, whatever reserved regions exist on a device should be
> directly reflected on the hwpt which the device is attached to.
> 
> So with nesting, presumably the reserved regions of the device have
> been reported to userspace and it's the user's responsibility to avoid
> allocating IOVA from those reserved regions in the stage-1 hwpt.

Presumably

> It's not necessary to add reserved regions to the IOAS of the parent
> hwpt since the device doesn't access that address space after it's
> attached to stage-1. The parent is used only for address translation
> on the iommu side.

But if we don't put them in the IOAS of the parent, there is no way for
userspace to learn what they are to forward to the VM?

Since we expect the parent IOAS to be usable in an identity mode I
think they should be added, at least I can't see a reason not to add
them.

Which is definitely complicating some parts of this...

Jason


* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-19  8:47   ` Tian, Kevin
@ 2023-05-19 18:45     ` Nicolin Chen
  2023-05-24  5:02       ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 18:45 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 08:47:45AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> > @@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
> >   *           after use. Return the data buffer if success, or ERR_PTR on
> >   *           failure.
> >   * @domain_alloc: allocate iommu domain
> > + * @domain_alloc_user: allocate user iommu domain
> > + * @domain_alloc_user_data_len: return the required length of the user
> > data
> > + *                              to allocate a specific type user iommu domain.
> > + *                              @hwpt_type is defined as enum iommu_hwpt_type
> > + *                              in include/uapi/linux/iommufd.h. The returned
> > + *                              length is the corresponding sizeof driver data
> > + *                              structures in include/uapi/linux/iommufd.h.
> > + *                              -EOPNOTSUPP would be returned if the input
> > + *                              @hwpt_type is not supported by the driver.
> 
> Can this be merged with the earlier @hw_info callback? That one already
> reports a list of supported hwpt types. Is there a problem with further
> describing the data length for each type in that interface?

Yi and I had a last minute talk before he sent this version
actually... This version of hw_info no longer reports a list
of supported hwpt types. We previously did that in a bitmap,
but we found that a bitmap will not be sufficient eventually
if there are more than 64 hwpt_types.

And this domain_alloc_user_data_len might not be necessary,
because in this version the IOMMUFD core doesn't really care
about the actual data_len since it copies the data into the
ucmd_buffer, i.e. we would probably only need a bool op like
"hwpt_type_is_supported".

Thanks
Nic


* Re: [PATCH v2 02/11] iommu: Add nested domain support
  2023-05-19  8:51   ` Tian, Kevin
@ 2023-05-19 18:49     ` Nicolin Chen
  2023-05-24  5:03       ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 18:49 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 08:51:21AM +0000, Tian, Kevin wrote:
 
> > @@ -346,6 +350,10 @@ struct iommu_ops {
> >   * @iotlb_sync_map: Sync mappings created recently using @map to the
> > hardware
> >   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty
> > flush
> >   *            queue
> > + * @cache_invalidate_user: Flush hardware TLBs caching user space IO
> > mappings
> > + * @cache_invalidate_user_data_len: Defined length of input user data for
> > the
> > + *                                  cache_invalidate_user op, being sizeof the
> > + *                                  structure in include/uapi/linux/iommufd.h
> 
> same as comment to last patch, can this be merged with @hw_info?

I think it's better to keep them separate, since this is added
in struct iommu_domain_ops, given it is domain/hwpt specific,
while the hw_info is in struct iommu_ops?
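
Roughly, the split looks like this (abridged; the hw_info signature is
from memory of the hw info series, the invalidation op is from this
one):

	/* struct iommu_ops: per-device */
	void *(*hw_info)(struct device *dev, u32 *length);

	/* struct iommu_domain_ops: per-domain/hwpt */
	int (*cache_invalidate_user)(struct iommu_domain *domain,
				     void *user_data);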

Thanks
Nic


* Re: [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  2023-05-19  8:56   ` Tian, Kevin
@ 2023-05-19 18:57     ` Nicolin Chen
  2023-05-24  5:04       ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 18:57 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 08:56:25AM +0000, Tian, Kevin wrote:
 
> > From: Yi Liu <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> >
> >
> > @@ -88,7 +90,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> > *ictx, struct iommufd_ioas *ioas,
> >       refcount_inc(&ioas->obj.users);
> >       hwpt->ioas = ioas;
> >
> > -     hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> > +     if (ops->domain_alloc_user)
> > +             hwpt->domain = ops->domain_alloc_user(idev->dev, NULL,
> > NULL);
> > +     else
> > +             hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> 
> this is a reminder that the comment for @domain_alloc_user() should
> clarify that the UNMANAGED domain type is assumed when no user data is
> provided, to be compatible with iommu_domain_alloc().

Yes. Perhaps:

 * @domain_alloc_user: allocate user iommu domain. A valid user_data pointer and
 *                     a parent pointer to a kernel-managed domain are required
 *                     to allocate an IOMMU_DOMAIN_NESTED domain. Otherwise, the
 *                     new domain will be set to an IOMMU_DOMAIN_UNMANAGED type.

Thanks
Nic


* Re: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-19  9:06   ` Tian, Kevin
@ 2023-05-19 19:09     ` Nicolin Chen
  2023-05-24  5:11       ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 19:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 09:06:20AM +0000, Tian, Kevin wrote:

> > @@ -73,14 +77,22 @@ int iommufd_hw_pagetable_enforce_cc(struct
> > iommufd_hw_pagetable *hwpt)
> >   */
> >  struct iommufd_hw_pagetable *
> >  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
> > iommufd_ioas *ioas,
> > -                        struct iommufd_device *idev, bool
> > immediate_attach)
> > +                        struct iommufd_device *idev,
> > +                        struct iommufd_hw_pagetable *parent,
> > +                        union iommu_domain_user_data *user_data,
> > +                        bool immediate_attach)
> >  {
> >       const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > +     struct iommu_domain *parent_domain = NULL;
> >       struct iommufd_hw_pagetable *hwpt;
> > +     bool type_unmanaged, type_nested;
> >       int rc;
> >
> >       lockdep_assert_held(&ioas->mutex);
> >
> > +     if ((user_data || parent) && !ops->domain_alloc_user)
> > +             return ERR_PTR(-EOPNOTSUPP);
> 
> Do we allow specifying parent w/o user_data?

I don't think so. Perhaps we should do a double check:

+	if (!!user_data ^ !!parent)
+		return ERR_PTR(-EINVAL);
+	if (user_data && !ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);

> > @@ -99,6 +117,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> > *ictx, struct iommufd_ioas *ioas,
> >               goto out_abort;
> >       }
> >
> > +     /* It must be either NESTED or UNMANAGED, depending on
> > parent_domain */
> > +       type_nested = hwpt->domain->type == IOMMU_DOMAIN_NESTED;
> > +       type_unmanaged = hwpt->domain->type ==
> > IOMMU_DOMAIN_UNMANAGED;
> 
> no need of one-time used variables. Just put the conditions directly
> in WARN_ON.

It is to improve the readability. Otherwise, we'd have:

	if (WARN_ON((parent_domain &&
		     hwpt->domain->type != IOMMU_DOMAIN_NESTED) ||
		    (!parent_domain &&
		     hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)))

> > +       if (WARN_ON((parent_domain && !type_nested) ||
> > +                   (!parent_domain && !type_unmanaged))) {
> > +             rc = -EINVAL;
> > +             goto out_abort;
> > +     }
> > +
> 
> probably just WARN_ON_ONCE() to mark that driver has problem?

Yea. I think we could do that.

Thanks
Nic


* Re: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-19  9:34   ` Tian, Kevin
@ 2023-05-19 19:12     ` Nicolin Chen
  0 siblings, 0 replies; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 19:12 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 09:34:38AM +0000, Tian, Kevin wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> > @@ -89,9 +101,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> > *ictx, struct iommufd_ioas *ioas,
> >       /* Pairs with iommufd_hw_pagetable_destroy() */
> >       refcount_inc(&ioas->obj.users);
> >       hwpt->ioas = ioas;
> 
> presumably a user hwpt doesn't need to store the ioas?

hwpt->ioas has the refcount and mutex that are used by a user
hwpt too throughout the hwpt functions.

Thanks
Nic


* Re: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-17  3:08   ` Liu, Jingqi
@ 2023-05-19 19:34     ` Nicolin Chen
  0 siblings, 0 replies; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 19:34 UTC (permalink / raw)
  To: Liu, Jingqi
  Cc: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	zhenzhong.duan

On Wed, May 17, 2023 at 11:08:12AM +0800, Liu, Jingqi wrote:

> > +     /*
> > +      * All drivers support IOMMU_HWPT_TYPE_DEFAULT, so pass it through.
> > +      * For any other hwpt_type, check the ops->domain_alloc_user_data_len
> > +      * presence and its result.
> > +      */
> > +     if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
> > +             if (!ops->domain_alloc_user_data_len) {
> > +                     rc = -EOPNOTSUPP;
> > +                     goto out_put_idev;
> > +             }
> > +             klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
> > +             if (WARN_ON(klen < 0)) {
> > +                     rc = -EINVAL;
> > +                     goto out_put_pt;
> > +             }
> Would it be better if the later "klen" check is moved here?
>     if (klen) {
>                 [...]
>     }
> If this check fails here, there's no need to execute the code after it.
> If this path is not executed, "klen" is 0, and there's no need to check it.
> Do I understand it right?

Makes sense. And the klen value isn't really being used. So,
we may likely change it to a bool one. Also, I'm thinking of
forcing a !!cmd->data_len for a non-DEFAULT hwpt_type:

+	if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
+		if (!cmd->data_len) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+		if (!ops->domain_alloc_user_data_len) {
+			rc = -EOPNOTSUPP;
+			goto out_put_pt;
+		}
+		if (!ops->hwpt_type_is_supported(cmd->hwpt_type)) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}

Then, for the latter part, simply:
+	if (cmd->data_len) {
+		// malloc data
+	}

Thanks
Nic


* Re: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-19  9:41   ` Tian, Kevin
@ 2023-05-19 19:48     ` Nicolin Chen
  2023-05-24  5:16       ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-19 19:48 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Fri, May 19, 2023 at 09:41:00AM +0000, Tian, Kevin wrote:
> > From: Yi Liu <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> > +     if (cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) {
> > +             if (!ops->domain_alloc_user_data_len) {
> > +                     rc = -EOPNOTSUPP;
> > +                     goto out_put_idev;
> > +             }
> > +             klen = ops->domain_alloc_user_data_len(cmd->hwpt_type);
> > +             if (WARN_ON(klen < 0)) {
> > +                     rc = -EINVAL;
> > +                     goto out_put_pt;
> > +             }
> > +     }
> 
> What about passing the user pointer to the iommu driver, which
> then does the copy, so we don't need an extra @data_len()
> callback for every driver?

It's doable by letting the driver do copy_from_user(), yet I
recall that Jason suggested keeping it in iommufd. Also,
we are reusing the ucmd_buffer for the user_data, and the klen
isn't really being used for its value here. So, it is likely
enough to have an ops->hwpt_type_is_supported.

> >
> > +     switch (pt_obj->type) {
> > +     case IOMMUFD_OBJ_IOAS:
> > +             ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> > +             break;
> 
> this should fail if parent is specified.

I don't think that's necessary: the parent is NULL by default
and only specified (if IOMMUFD_OBJ_HW_PAGETABLE) by the exact
pt_id/pt_obj here.

> > +     case IOMMUFD_OBJ_HW_PAGETABLE:
> > +             /* pt_id points HWPT only when hwpt_type
> > is !IOMMU_HWPT_TYPE_DEFAULT */
> > +             if (cmd->hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) {
> > +                     rc = -EINVAL;
> > +                     goto out_put_pt;
> > +             }
> > +
> > +             parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> > obj);
> > +             /*
> > +              * Cannot allocate user-managed hwpt linking to
> > auto_created
> > +              * hwpt. If the parent hwpt is already a user-managed hwpt,
> > +              * don't allocate another user-managed hwpt linking to it.
> > +              */
> > +             if (parent->auto_domain || parent->parent) {
> > +                     rc = -EINVAL;
> > +                     goto out_put_pt;
> > +             }
> > +             ioas = parent->ioas;
> 
> for nesting, why is the ioas required? In concept we can just pass a
> NULL ioas to iommufd_hw_pagetable_alloc() for this hwpt. If within that
> function there is a need to toggle the ioas for the parent, it can
> always be retrieved from the parent hwpt.

Jason suggested this for simplicity. As I replied in another
email, a user hwpt still needs the ioas's refcount and mutex, so
it would otherwise have duplicated code at the beginning of
most of the hwpt_ functions:
	if (hwpt->parent)
		ioas = hwpt->parent->ioas;
	else if (hwpt->ioas)
		ioas = hwpt->ioas;
	else
		WARN_ON(1);

Thanks
Nic


* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-05-19 11:49   ` Jason Gunthorpe
@ 2023-05-24  3:48     ` Tian, Kevin
  2023-06-06 14:18       ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  3:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, May 19, 2023 7:50 PM
> 
> On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, May 11, 2023 10:39 PM
> > >
> > > Lu Baolu (2):
> > >   iommu: Add new iommu op to create domains owned by userspace
> > >   iommu: Add nested domain support
> > >
> > > Nicolin Chen (5):
> > >   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
> > >   iommufd/selftest: Add domain_alloc_user() support in iommu mock
> > >   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user
> data
> > >   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
> > >   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> > >
> > > Yi Liu (4):
> > >   iommufd/hw_pagetable: Use domain_alloc_user op for domain
> allocation
> > >   iommufd: Pass parent hwpt and user_data to
> > >     iommufd_hw_pagetable_alloc()
> > >   iommufd: IOMMU_HWPT_ALLOC allocation with user data
> > >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > >
> >
> > I didn't see any change in iommufd_hw_pagetable_attach() to handle
> > stage-1 hwpt differently.
> >
> > In concept, whatever reserved regions exist on a device should be
> > directly reflected on the hwpt which the device is attached to.
> >
> > So with nesting, presumably the reserved regions of the device have
> > been reported to userspace and it's the user's responsibility to avoid
> > allocating IOVA from those reserved regions in the stage-1 hwpt.
> 
> Presumably
> 
> > It's not necessary to add reserved regions to the IOAS of the parent
> > hwpt since the device doesn't access that address space after it's
> > attached to stage-1. The parent is used only for address translation
> > on the iommu side.
> 
> But if we don't put them in the IOAS of the parent, there is no way for
> userspace to learn what they are to forward to the VM?

emmm I wonder whether that is the right interface to report
per-device reserved regions.

e.g. does it imply that all devices will be reported to the guest with
the exact same set of reserved regions merged in the parent IOAS?

it works but looks unclear in concept. By definition the list of
reserved regions on a device should be static/fixed instead of
dynamic depending on which IOAS this device is attached to and
how many other devices are sharing the same IOAS...

IOAS_IOVA_RANGES kind of follows what vfio type1 provides today

IMHO we should probably have DEVICE_IOVA_RANGES in the first
place, instead of doing it via IOAS_IOVA_RANGES, which is then
described as varying with the list of currently attached devices.
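
if we went that way, a purely hypothetical shape for it, mirroring
IOAS_IOVA_RANGES (nothing like this exists in this series; the name
and layout are illustrative only):

	struct iommu_device_iova_ranges {
		__u32 size;
		__u32 dev_id;
		__u32 num_iovas;
		__u32 __reserved;
		__aligned_u64 allowed_iovas;	/* user pointer to range array */
	};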

> 
> Since we expect the parent IOAS to be usable in an identity mode I
> think they should be added, at least I can't see a reason not to add
> them.

this is a good point.

for SMMU this sounds like a must-have, as identity mode is configured
in the CD with nested translation always enabled. That happens outside
of the host's awareness, hence reserved regions must be added to the
parent IOAS.

for VT-d, identity must be configured explicitly and the hardware
doesn't support stage-1 identity in nested mode. It essentially means
not using nested translation: the user just explicitly attaches the
associated RID or {RID, PASID} to the parent IOAS, which already gets
the reserved regions covered.

With that it makes more sense to make it a vendor-specific choice.
Perhaps we could have a flag bit when creating a nested hwpt, marking
that identity mode might be used in this nested configuration, so that
iommufd then adds the device's reserved regions to the parent IOAS?
Something like the sketch below.
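
as a purely illustrative sketch (no such flag exists in this series):

	/*
	 * Hypothetical hwpt allocation flag: the S1 under this nest may
	 * be configured as identity, so iommufd should add the device's
	 * reserved regions to the parent IOAS.
	 */
	#define IOMMU_HWPT_ALLOC_FLAG_IDENTITY_S1	(1 << 0)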

> 
> Which is definately complicating some parts of this..
> 
> Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-19 18:45     ` Nicolin Chen
@ 2023-05-24  5:02       ` Tian, Kevin
  2023-05-24  5:23         ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  5:02 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, May 20, 2023 2:45 AM
> 
> On Fri, May 19, 2023 at 08:47:45AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, May 11, 2023 10:39 PM
> > > @@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
> > >   *           after use. Return the data buffer if success, or ERR_PTR on
> > >   *           failure.
> > >   * @domain_alloc: allocate iommu domain
> > > + * @domain_alloc_user: allocate user iommu domain
> > > + * @domain_alloc_user_data_len: return the required length of the user
> > > data
> > > + *                              to allocate a specific type user iommu domain.
> > > + *                              @hwpt_type is defined as enum iommu_hwpt_type
> > > + *                              in include/uapi/linux/iommufd.h. The returned
> > > + *                              length is the corresponding sizeof driver data
> > > + *                              structures in include/uapi/linux/iommufd.h.
> > > + *                              -EOPNOTSUPP would be returned if the input
> > > + *                              @hwpt_type is not supported by the driver.
> >
> > Can this be merged with earlier @hw_info callback? That will already
> > report a list of supported hwpt types. is there a problem to further
> > describe the data length for each type in that interface?
> 
> Yi and I had a last minute talk before he sent this version
> actually... This version of hw_info no longer reports a list
> of supported hwpt types. We previously did that in a bitmap,
> but we found that a bitmap will not be sufficient eventually
> if there are more than 64 hwpt_types.
> 
> And this domain_alloc_user_data_len might not be necessary,
> because in this version the IOMMUFD core doesn't really care
> about the actual data_len since it copies the data into the
> ucmd_buffer, i.e. we would probably only need a bool op like
> "hwpt_type_is_supported".
> 

Or just pass it to the @domain_alloc_user op, which should fail
if the type is not supported?
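
A driver-side sketch of that idea, assuming the op also receives the
requested hwpt_type (names below are illustrative, not taken from the
posted series):

	static struct iommu_domain *
	mock_domain_alloc_user(struct device *dev,
			       enum iommu_hwpt_type hwpt_type,
			       struct iommu_domain *parent,
			       const union iommu_domain_user_data *user_data)
	{
		/*
		 * Fail the allocation itself for an unsupported type,
		 * instead of advertising per-type support via extra ops.
		 * The core currently turns NULL into -ENOMEM.
		 */
		if (hwpt_type != IOMMU_HWPT_TYPE_DEFAULT)
			return NULL;
		return __mock_domain_alloc(dev, parent, user_data);
	}

where __mock_domain_alloc() stands in for the existing allocation path.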

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 02/11] iommu: Add nested domain support
  2023-05-19 18:49     ` Nicolin Chen
@ 2023-05-24  5:03       ` Tian, Kevin
  2023-05-24  5:28         ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  5:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, May 20, 2023 2:49 AM
> 
> On Fri, May 19, 2023 at 08:51:21AM +0000, Tian, Kevin wrote:
> 
> > > @@ -346,6 +350,10 @@ struct iommu_ops {
> > >   * @iotlb_sync_map: Sync mappings created recently using @map to the
> > > hardware
> > >   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and
> empty
> > > flush
> > >   *            queue
> > > + * @cache_invalidate_user: Flush hardware TLBs caching user space IO
> > > mappings
> > > + * @cache_invalidate_user_data_len: Defined length of input user data
> for
> > > the
> > > + *                                  cache_invalidate_user op, being sizeof the
> > > + *                                  structure in include/uapi/linux/iommufd.h
> >
> > same as comment to last patch, can this be merged with @hw_info?
> 
> I think it's better to keep them separate, since this is added
> in struct iommu_domain_ops, given it is domain/hwpt specific,
> while the hw_info is in struct iommu_ops?
> 

Just curious whether there are real examples in which the data
len might differ depending on the hwpt type...

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  2023-05-19 18:57     ` Nicolin Chen
@ 2023-05-24  5:04       ` Tian, Kevin
  0 siblings, 0 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  5:04 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, May 20, 2023 2:58 AM
> 
> On Fri, May 19, 2023 at 08:56:25AM +0000, Tian, Kevin wrote:
> 
> > > From: Yi Liu <yi.l.liu@intel.com>
> > > Sent: Thursday, May 11, 2023 10:39 PM
> > >
> > >
> > > @@ -88,7 +90,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx
> > > *ictx, struct iommufd_ioas *ioas,
> > >       refcount_inc(&ioas->obj.users);
> > >       hwpt->ioas = ioas;
> > >
> > > -     hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> > > +     if (ops->domain_alloc_user)
> > > +             hwpt->domain = ops->domain_alloc_user(idev->dev, NULL,
> > > NULL);
> > > +     else
> > > +             hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> >
> > this reminds me that the comment for @domain_alloc_user() should clarify
> > that the UNMANAGED domain type is assumed when no user data is
> > provided, to be compatible with iommu_domain_alloc().
> 
> Yes. Perhaps:
> 
>  * @domain_alloc_user: allocate user iommu domain. A valid user_data pointer
>  *                     and a parent pointer to a kernel-managed domain are
>  *                     required to allocate an IOMMU_DOMAIN_NESTED domain.
>  *                     Otherwise, the new domain will be set to an
>  *                     IOMMU_DOMAIN_UNMANAGED type.
> 

yes, this is clearer.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-19 19:09     ` Nicolin Chen
@ 2023-05-24  5:11       ` Tian, Kevin
  2023-05-24  5:31         ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  5:11 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, May 20, 2023 3:09 AM
> 
> On Fri, May 19, 2023 at 09:06:20AM +0000, Tian, Kevin wrote:
> 
> > > @@ -73,14 +77,22 @@ int iommufd_hw_pagetable_enforce_cc(struct
> > > iommufd_hw_pagetable *hwpt)
> > >   */
> > >  struct iommufd_hw_pagetable *
> > >  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
> > > iommufd_ioas *ioas,
> > > -                        struct iommufd_device *idev, bool
> > > immediate_attach)
> > > +                        struct iommufd_device *idev,
> > > +                        struct iommufd_hw_pagetable *parent,
> > > +                        union iommu_domain_user_data *user_data,
> > > +                        bool immediate_attach)
> > >  {
> > >       const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > > +     struct iommu_domain *parent_domain = NULL;
> > >       struct iommufd_hw_pagetable *hwpt;
> > > +     bool type_unmanaged, type_nested;
> > >       int rc;
> > >
> > >       lockdep_assert_held(&ioas->mutex);
> > >
> > > +     if ((user_data || parent) && !ops->domain_alloc_user)
> > > +             return ERR_PTR(-EOPNOTSUPP);
> >
> > Do we allow specifying parent w/o user_data?
> 
> I don't think so. Perhaps we should do a double check:
> 
> +	if (!!user_data ^ !!parent)
> +		return ERR_PTR(-EINVAL);

I think we allow creating an s2 hwpt with user_data, so it
should be:

	if (parent && !user_data)
		return ERR_PTR(-EINVAL);
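
Combined with the existing check, the validation at the top of
iommufd_hw_pagetable_alloc() would then read (sketch):

	/* driver support is required for any driver-specific allocation */
	if ((user_data || parent) && !ops->domain_alloc_user)
		return ERR_PTR(-EOPNOTSUPP);
	/* a parent only makes sense for a nested (user_data) allocation,
	 * while user_data alone may still create an s2 hwpt */
	if (parent && !user_data)
		return ERR_PTR(-EINVAL);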

> > > @@ -99,6 +117,15 @@ iommufd_hw_pagetable_alloc(struct
> iommufd_ctx
> > > *ictx, struct iommufd_ioas *ioas,
> > >               goto out_abort;
> > >       }
> > >
> > > +     /* It must be either NESTED or UNMANAGED, depending on
> > > parent_domain */
> > > +       type_nested = hwpt->domain->type == IOMMU_DOMAIN_NESTED;
> > > +       type_unmanaged = hwpt->domain->type ==
> > > IOMMU_DOMAIN_UNMANAGED;
> >
> > no need of one-time used variables. Just put the conditions directly
> > in WARN_ON.
> 
> It is to improve the readability. Otherwise, we'd have:
> 
> 	if (WARN_ON((parent_domain &&
> 		     hwpt->domain->type != IOMMU_DOMAIN_NESTED) ||
> 		    (!parent_domain &&
> 		     hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)))

IMHO this is already very clear w/o defining additional variables. 😊


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-19 19:48     ` Nicolin Chen
@ 2023-05-24  5:16       ` Tian, Kevin
  2023-05-24  5:40         ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  5:16 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, May 20, 2023 3:48 AM
> 
> 
> > >
> > > +     switch (pt_obj->type) {
> > > +     case IOMMUFD_OBJ_IOAS:
> > > +             ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> > > +             break;
> >
> > this should fail if parent is specified.
> 
> I don't think that's necessary: the parent is NULL by default
> and only specified (if IOMMUFD_OBJ_HW_PAGETABLE) by the exact
> pt_id/pt_obj here.

I didn't get it. The uAPI describes that only a hwpt, not an ioas, can
be specified in the pt_id field as the parent.

If we don't check here it means the user can specify an ioas id as the
parent?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-24  5:02       ` Tian, Kevin
@ 2023-05-24  5:23         ` Nicolin Chen
  2023-05-24  7:48           ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-24  5:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, May 24, 2023 at 05:02:19AM +0000, Tian, Kevin wrote:
 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, May 20, 2023 2:45 AM
> >
> > On Fri, May 19, 2023 at 08:47:45AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, May 11, 2023 10:39 PM
> > > > @@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
> > > >   *           after use. Return the data buffer if success, or ERR_PTR on
> > > >   *           failure.
> > > >   * @domain_alloc: allocate iommu domain
> > > > + * @domain_alloc_user: allocate user iommu domain
> > > > + * @domain_alloc_user_data_len: return the required length of the user
> > > > data
> > > > + *                              to allocate a specific type user iommu domain.
> > > > + *                              @hwpt_type is defined as enum iommu_hwpt_type
> > > > + *                              in include/uapi/linux/iommufd.h. The returned
> > > > + *                              length is the corresponding sizeof driver data
> > > > + *                              structures in include/uapi/linux/iommufd.h.
> > > > + *                              -EOPNOTSUPP would be returned if the input
> > > > + *                              @hwpt_type is not supported by the driver.
> > >
> > > Can this be merged with earlier @hw_info callback? That will already
> > > report a list of supported hwpt types. is there a problem to further
> > > describe the data length for each type in that interface?
> >
> > Yi and I had a last minute talk before he sent this version
> > actually... This version of hw_info no longer reports a list
> > of supported hwpt types. We previously did that in a bitmap,
> > but we found that a bitmap will not be sufficient eventually
> > if there are more than 64 hwpt_types.
> >
> > And this domain_alloc_user_data_len might not be necessary,
> > because in this version the IOMMUFD core doesn't really care
> > about the actual data_len since it copies the data into the
> > ucmd_buffer, i.e. we would probably only need a bool op like
> > "hwpt_type_is_supported".
> >
> 
> Or just pass to the @domain_alloc_user ops which should fail
> if the type is not supported?

The domain_alloc_user op returns NULL, which would then be turned
into an ENOMEM error code. That might be confusing from the user
space perspective. Having an op at least allows user space to
realize that something is wrong with the input structure?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 02/11] iommu: Add nested domain support
  2023-05-24  5:03       ` Tian, Kevin
@ 2023-05-24  5:28         ` Nicolin Chen
  0 siblings, 0 replies; 64+ messages in thread
From: Nicolin Chen @ 2023-05-24  5:28 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, May 24, 2023 at 05:03:37AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, May 20, 2023 2:49 AM
> >
> > On Fri, May 19, 2023 at 08:51:21AM +0000, Tian, Kevin wrote:
> >
> > > > @@ -346,6 +350,10 @@ struct iommu_ops {
> > > >   * @iotlb_sync_map: Sync mappings created recently using @map to the
> > > > hardware
> > > >   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and
> > empty
> > > > flush
> > > >   *            queue
> > > > + * @cache_invalidate_user: Flush hardware TLBs caching user space IO
> > > > mappings
> > > > + * @cache_invalidate_user_data_len: Defined length of input user data
> > for
> > > > the
> > > > + *                                  cache_invalidate_user op, being sizeof the
> > > > + *                                  structure in include/uapi/linux/iommufd.h
> > >
> > > same as comment to last patch, can this be merged with @hw_info?
> >
> > I think it's better to keep them separate, since this is added
> > in struct iommu_domain_ops, given it is domain/hwpt specific,
> > while the hw_info is in struct iommu_ops?
> >
> 
> Just be curious whether there are real examples in which the data
> len might be different upon the hwpt type...

Likely "no" on top of my head. Yet it's a bit odd to see the
data length for cache_invalidate_user being added along with
the hw_info, since they belong to two different structs, and
even two different series?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-05-24  5:11       ` Tian, Kevin
@ 2023-05-24  5:31         ` Nicolin Chen
  0 siblings, 0 replies; 64+ messages in thread
From: Nicolin Chen @ 2023-05-24  5:31 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, May 24, 2023 at 05:11:43AM +0000, Tian, Kevin wrote:
 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, May 20, 2023 3:09 AM
> >
> > On Fri, May 19, 2023 at 09:06:20AM +0000, Tian, Kevin wrote:
> >
> > > > @@ -73,14 +77,22 @@ int iommufd_hw_pagetable_enforce_cc(struct
> > > > iommufd_hw_pagetable *hwpt)
> > > >   */
> > > >  struct iommufd_hw_pagetable *
> > > >  iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct
> > > > iommufd_ioas *ioas,
> > > > -                        struct iommufd_device *idev, bool
> > > > immediate_attach)
> > > > +                        struct iommufd_device *idev,
> > > > +                        struct iommufd_hw_pagetable *parent,
> > > > +                        union iommu_domain_user_data *user_data,
> > > > +                        bool immediate_attach)
> > > >  {
> > > >       const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > > > +     struct iommu_domain *parent_domain = NULL;
> > > >       struct iommufd_hw_pagetable *hwpt;
> > > > +     bool type_unmanaged, type_nested;
> > > >       int rc;
> > > >
> > > >       lockdep_assert_held(&ioas->mutex);
> > > >
> > > > +     if ((user_data || parent) && !ops->domain_alloc_user)
> > > > +             return ERR_PTR(-EOPNOTSUPP);
> > >
> > > Do we allow specifying parent w/o user_data?
> >
> > I don't think so. Perhaps we should do a double check:
> >
> > +     if (!!user_data ^ !!parent)
> > +             return ERR_PTR(-EINVAL);
> 
> I think we allow creating a s2 hwpt with user_data so it
> should be:
> 
>         if (parent && !user_data)
>                 return ERR_PTR(-EINVAL);

Oh, yes. I forgot about that :)

> > > > @@ -99,6 +117,15 @@ iommufd_hw_pagetable_alloc(struct
> > iommufd_ctx
> > > > *ictx, struct iommufd_ioas *ioas,
> > > >               goto out_abort;
> > > >       }
> > > >
> > > > +     /* It must be either NESTED or UNMANAGED, depending on
> > > > parent_domain */
> > > > +       type_nested = hwpt->domain->type == IOMMU_DOMAIN_NESTED;
> > > > +       type_unmanaged = hwpt->domain->type ==
> > > > IOMMU_DOMAIN_UNMANAGED;
> > >
> > > no need of one-time used variables. Just put the conditions directly
> > > in WARN_ON.
> >
> > It is to improve the readability. Otherwise, we'd have:
> >
> >       if (WARN_ON((parent_domain &&
> >                    hwpt->domain->type != IOMMU_DOMAIN_NESTED) ||
> >                   (!parent_domain &&
> >                    hwpt->domain->type !=
> > IOMMU_DOMAIN_UNMANAGED)))
> 
> IMHO this is already very clear w/o defining additional variables. 😊

Okay. I think we can revert this back and drop the two type_*.

Thanks
Nic


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-24  5:16       ` Tian, Kevin
@ 2023-05-24  5:40         ` Nicolin Chen
  2023-05-24  7:55           ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-24  5:40 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, May 24, 2023 at 05:16:35AM +0000, Tian, Kevin wrote:
 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, May 20, 2023 3:48 AM
> >
> >
> > > >
> > > > +     switch (pt_obj->type) {
> > > > +     case IOMMUFD_OBJ_IOAS:
> > > > +             ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> > > > +             break;
> > >
> > > this should fail if parent is specified.
> >
> > I don't think that's necessary: the parent is NULL by default
> > and only specified (if IOMMUFD_OBJ_HW_PAGETABLE) by the exact
> > pt_id/pt_obj here.
> 
> I didn't get. The uAPI describes that only hwpt not ioas can be specified
> in the pt_id field as the parent.
>
> If we don't check here it means the user can specify an ioas id as the
> parent?

I meant that the parent pointer isn't set at this line: the
declaration of the parent is simply NULL, and it isn't touched in
this IOMMUFD_OBJ_IOAS case, as the parent pointer is only set in
the IOMMUFD_OBJ_HW_PAGETABLE case that follows this line.

We could add a sanity check on the parent pointer, but that would
just be a NOP, right?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-24  5:23         ` Nicolin Chen
@ 2023-05-24  7:48           ` Tian, Kevin
  2023-05-25  1:41             ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:48 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, May 24, 2023 1:24 PM
> 
> On Wed, May 24, 2023 at 05:02:19AM +0000, Tian, Kevin wrote:
> 
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, May 20, 2023 2:45 AM
> > >
> > > On Fri, May 19, 2023 at 08:47:45AM +0000, Tian, Kevin wrote:
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Thursday, May 11, 2023 10:39 PM
> > > > > @@ -229,6 +238,15 @@ struct iommu_iotlb_gather {
> > > > >   *           after use. Return the data buffer if success, or ERR_PTR on
> > > > >   *           failure.
> > > > >   * @domain_alloc: allocate iommu domain
> > > > > + * @domain_alloc_user: allocate user iommu domain
> > > > > + * @domain_alloc_user_data_len: return the required length of the
> user
> > > > > data
> > > > > + *                              to allocate a specific type user iommu domain.
> > > > > + *                              @hwpt_type is defined as enum
> iommu_hwpt_type
> > > > > + *                              in include/uapi/linux/iommufd.h. The returned
> > > > > + *                              length is the corresponding sizeof driver data
> > > > > + *                              structures in include/uapi/linux/iommufd.h.
> > > > > + *                              -EOPNOTSUPP would be returned if the input
> > > > > + *                              @hwpt_type is not supported by the driver.
> > > >
> > > > Can this be merged with earlier @hw_info callback? That will already
> > > > report a list of supported hwpt types. is there a problem to further
> > > > describe the data length for each type in that interface?
> > >
> > > Yi and I had a last minute talk before he sent this version
> > > actually... This version of hw_info no longer reports a list
> > > of supported hwpt types. We previously did that in a bitmap,
> > > but we found that a bitmap will not be sufficient eventually
> > > if there are more than 64 hwpt_types.
> > >
> > > And this domain_alloc_user_data_len might not be necessary,
> > > because in this version the IOMMUFD core doesn't really care
> > > about the actual data_len since it copies the data into the
> > > ucmd_buffer, i.e. we would probably only need a bool op like
> > > "hwpt_type_is_supported".
> > >
> >
> > Or just pass to the @domain_alloc_user ops which should fail
> > if the type is not supported?
> 
> The domain_alloc_user returns NULL, which then would be turned
> into an ENOMEM error code. It might be confusing from the user
> space perspective. Having an op at least allows the user space
> to realize that something is wrong with the input structure?
> 

this is a new callback. Any reason why it cannot be defined to
allow returning ERR_PTR?
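
If it were, the iommufd side could forward the precise error instead
of manufacturing -ENOMEM, along the lines of (sketch):

	hwpt->domain = ops->domain_alloc_user(idev->dev, parent_domain,
					      user_data);
	if (IS_ERR(hwpt->domain)) {
		/* e.g. -EOPNOTSUPP for an unsupported hwpt_type reaches
		 * userspace as-is rather than a misleading -ENOMEM */
		rc = PTR_ERR(hwpt->domain);
		hwpt->domain = NULL;
		goto out_abort;
	}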

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-05-24  5:40         ` Nicolin Chen
@ 2023-05-24  7:55           ` Tian, Kevin
  0 siblings, 0 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-05-24  7:55 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, May 24, 2023 1:41 PM
> 
> On Wed, May 24, 2023 at 05:16:35AM +0000, Tian, Kevin wrote:
> 
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, May 20, 2023 3:48 AM
> > >
> > >
> > > > >
> > > > > +     switch (pt_obj->type) {
> > > > > +     case IOMMUFD_OBJ_IOAS:
> > > > > +             ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> > > > > +             break;
> > > >
> > > > this should fail if parent is specified.
> > >
> > > I don't think that's necessary: the parent is NULL by default
> > > and only specified (if IOMMUFD_OBJ_HW_PAGETABLE) by the exact
> > > pt_id/pt_obj here.
> >
> > I didn't get. The uAPI describes that only hwpt not ioas can be specified
> > in the pt_id field as the parent.
> >
> > If we don't check here it means the user can specify an ioas id as the
> > parent?
> 
> I meant that the parent pointer isn't specified at this line:
> the declaration of the parent is simply NULL, and not touched
> in this IOMMUFD_OBJ_IOAS case, as the parent pointer would be
> only specified in the IOMMUFD_OBJ_HW_PAGETABLE case that is
> behind this line.
> 

I see your point. As long as the type is IOAS, the alloc request is
always interpreted as creating an s2 hwpt under that IOAS. Only
when it's the HWPT type is it treated as the parent.

I was under the wrong impression that there is another flag/field to
mark the parent requirement, in which case there could be a wrong
combination of setting that flag/field while using an IOAS pt_id. 😊

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-24  7:48           ` Tian, Kevin
@ 2023-05-25  1:41             ` Nicolin Chen
  2023-06-06 14:08               ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-05-25  1:41 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest, Duan, Zhenzhong

On Wed, May 24, 2023 at 07:48:46AM +0000, Tian, Kevin wrote:

> > > > > >   *           after use. Return the data buffer if success, or ERR_PTR on
> > > > > >   *           failure.
> > > > > >   * @domain_alloc: allocate iommu domain
> > > > > > + * @domain_alloc_user: allocate user iommu domain
> > > > > > + * @domain_alloc_user_data_len: return the required length of the
> > user
> > > > > > data
> > > > > > + *                              to allocate a specific type user iommu domain.
> > > > > > + *                              @hwpt_type is defined as enum
> > iommu_hwpt_type
> > > > > > + *                              in include/uapi/linux/iommufd.h. The returned
> > > > > > + *                              length is the corresponding sizeof driver data
> > > > > > + *                              structures in include/uapi/linux/iommufd.h.
> > > > > > + *                              -EOPNOTSUPP would be returned if the input
> > > > > > + *                              @hwpt_type is not supported by the driver.
> > > > >
> > > > > Can this be merged with earlier @hw_info callback? That will already
> > > > > report a list of supported hwpt types. is there a problem to further
> > > > > describe the data length for each type in that interface?
> > > >
> > > > Yi and I had a last minute talk before he sent this version
> > > > actually... This version of hw_info no longer reports a list
> > > > of supported hwpt types. We previously did that in a bitmap,
> > > > but we found that a bitmap will not be sufficient eventually
> > > > if there are more than 64 hwpt_types.
> > > >
> > > > And this domain_alloc_user_data_len might not be necessary,
> > > > because in this version the IOMMUFD core doesn't really care
> > > > about the actual data_len since it copies the data into the
> > > > ucmd_buffer, i.e. we would probably only need a bool op like
> > > > "hwpt_type_is_supported".
> > > >
> > >
> > > Or just pass to the @domain_alloc_user ops which should fail
> > > if the type is not supported?
> >
> > The domain_alloc_user returns NULL, which then would be turned
> > into an ENOMEM error code. It might be confusing from the user
> > space perspective. Having an op at least allows the user space
> > to realize that something is wrong with the input structure?
> >
> 
> this is a new callback. any reason why it cannot be defined to
> allow returning ERR_PTR?

Upon a quick check, I think we could. Though it'd be slightly
mismatched with the domain_alloc op, it should be fine since
iommufd is likely to be the only caller.

So, I think we can just take the approach of letting user space
try a hwpt_type and see if the ioctl fails with -EINVAL.
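
i.e. from userspace, roughly (sketch; the cmd setup is elided and
field names are abbreviated):

	rc = ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd);
	if (rc < 0 && errno == EINVAL) {
		/* this hwpt_type (or its data layout) is not supported
		 * by the driver; fall back accordingly */
	}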

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-05-25  1:41             ` Nicolin Chen
@ 2023-06-06 14:08               ` Jason Gunthorpe
  2023-06-06 19:43                 ` Nicolin Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-06 14:08 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, May 24, 2023 at 06:41:41PM -0700, Nicolin Chen wrote:

> Upon a quick check, I think we could. Though it'd be slightly
> mismatched with the domain_alloc op, it should be fine since
> iommufd is likely to be the only caller.

Ideally the main op would return ERR_PTR too

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-05-24  3:48     ` Tian, Kevin
@ 2023-06-06 14:18       ` Jason Gunthorpe
  2023-06-16  2:43         ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-06 14:18 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, May 24, 2023 at 03:48:43AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, May 19, 2023 7:50 PM
> > 
> > On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, May 11, 2023 10:39 PM
> > > >
> > > > Lu Baolu (2):
> > > >   iommu: Add new iommu op to create domains owned by userspace
> > > >   iommu: Add nested domain support
> > > >
> > > > Nicolin Chen (5):
> > > >   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
> > > >   iommufd/selftest: Add domain_alloc_user() support in iommu mock
> > > >   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user
> > data
> > > >   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
> > > >   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> > > >
> > > > Yi Liu (4):
> > > >   iommufd/hw_pagetable: Use domain_alloc_user op for domain
> > allocation
> > > >   iommufd: Pass parent hwpt and user_data to
> > > >     iommufd_hw_pagetable_alloc()
> > > >   iommufd: IOMMU_HWPT_ALLOC allocation with user data
> > > >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > > >
> > >
> > > I didn't see any change in iommufd_hw_pagetable_attach() to handle
> > > stage-1 hwpt differently.
> > >
> > > In concept whatever reserved regions existing on a device should be
> > > directly reflected on the hwpt which the device is attached to.
> > >
> > > So with nesting presumably the reserved regions of the device have
> > > been reported to the userspace and it's user's responsibility to avoid
> > > allocating IOVA from those reserved regions in stage-1 hwpt.
> > 
> > Presumably
> > 
> > > It's not necessarily to add reserved regions to the IOAS of the parent
> > > hwpt since the device doesn't access that address space after it's
> > > attached to stage-1. The parent is used only for address translation
> > > in the iommu side.
> > 
> > But if we don't put them in the IOAS of the parent there is no way for
> > userspace to learn what they are to forward to the VM ?
> 
> emmm I wonder whether that is the right interface to report
> per-device reserved regions.

The iommu driver needs to report different reserved regions for the S1
and S2 iommu_domains, and the IOAS should only get the reserved
regions for the S2.

Currently the API has no way to report per-domain reserved regions and
that is possibly OK for now. The S2 really doesn't have reserved
regions beyond the domain aperture.

So an ioctl to directly query the reserved regions for a dev_id makes
sense.

> > Since we expect the parent IOAS to be usable in an identity mode I
> > think they should be added, at least I can't see a reason not to add
> > them.
> 
> this is a good point.

But it mixes things

The S2 doesn't have reserved-range restrictions; we always have some
model of an S1, even for identity mode, that would carry the reserved
ranges.

> With that it makes more sense to make it a vendor specific choice.

It isn't vendor specific: the ranges come from the domain that is
attached to the IOAS, and we simply don't import ranges for an S2
domain.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-06-06 14:08               ` Jason Gunthorpe
@ 2023-06-06 19:43                 ` Nicolin Chen
  2023-06-07  0:14                   ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-06-06 19:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Tue, Jun 06, 2023 at 11:08:44AM -0300, Jason Gunthorpe wrote:
> On Wed, May 24, 2023 at 06:41:41PM -0700, Nicolin Chen wrote:
> 
> > Upon a quick check, I think we could. Though it'd be slightly
> > mismatched with the domain_alloc op, it should be fine since
> > iommufd is likely to be the only caller.
> 
> Ideally the main op would return ERR_PTR too

Yea. It just seems to be a bit painful to change it for that.

Worth a big series?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace
  2023-06-06 19:43                 ` Nicolin Chen
@ 2023-06-07  0:14                   ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-07  0:14 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Tue, Jun 06, 2023 at 12:43:44PM -0700, Nicolin Chen wrote:
> On Tue, Jun 06, 2023 at 11:08:44AM -0300, Jason Gunthorpe wrote:
> > On Wed, May 24, 2023 at 06:41:41PM -0700, Nicolin Chen wrote:
> > 
> > > Upon a quick check, I think we could. Though it'd be slightly
> > > mismatched with the domain_alloc op, it should be fine since
> > > iommufd is likely to be the only caller.
> > 
> > Ideally the main op would return ERR_PTR too
> 
> Yea. It just seems to be a bit painful to change it for that.
> 
> Worth a big series?

Probably not..

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-06 14:18       ` Jason Gunthorpe
@ 2023-06-16  2:43         ` Tian, Kevin
  2023-06-19 12:37           ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-06-16  2:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, June 6, 2023 10:18 PM
> 
> > > > It's not necessarily to add reserved regions to the IOAS of the parent
> > > > hwpt since the device doesn't access that address space after it's
> > > > attached to stage-1. The parent is used only for address translation
> > > > in the iommu side.
> > >
> > > But if we don't put them in the IOAS of the parent there is no way for
> > > userspace to learn what they are to forward to the VM ?
> >
> > emmm I wonder whether that is the right interface to report
> > per-device reserved regions.
> 
> The iommu driver needs to report different reserved regions for the S1
> and S2 iommu_domains, 

I can see the difference between RID and RID+PASID, but I'm not sure
whether it's an actual requirement with regard to the attached domain.

e.g. if we are only talking about RID, then the same set of reserved
regions should be reported for both S1 attach and S2 attach.

> and the IOAS should only get the reserved regions for the S2.
> 
> Currently the API has no way to report per-domain reserved regions and
> that is possibly OK for now. The S2 really doesn't have reserved
> regions beyond the domain aperture.
> 
> So an ioctl to directly query the reserved regions for a dev_id makes
> sense.

Or, more specifically, query the reserved regions for RID-based access.

Ideally for PASID there is no reserved region; otherwise SVA won't work. 😊

> 
> > > Since we expect the parent IOAS to be usable in an identity mode I
> > > think they should be added, at least I can't see a reason not to add
> > > them.
> >
> > this is a good point.
> 
> But it mixes things
> 
> The S2 doesn't have reserved ranges restrictions, we always have some
> model of a S1, even for identity mode, that would carry the reserved
> ranges.
> 
> > With that it makes more sense to make it a vendor specific choice.
> 
> It isn't vendor specific, the ranges come from the domain that is
> attached to the IOAS, and we simply don't import ranges for a S2
> domain.
> 

With the above I think the ranges are static per device.

When talking about RID-based nesting alone, ARM needs to add reserved
regions to the parent IOAS as identity is a valid S1 mode in nesting.

But for Intel, RID nesting excludes identity (which becomes a direct
attach to S2), so the reserved regions apply to the S1 instead of the
parent IOAS.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-16  2:43         ` Tian, Kevin
@ 2023-06-19 12:37           ` Jason Gunthorpe
  2023-06-20  1:43             ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-19 12:37 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Fri, Jun 16, 2023 at 02:43:13AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, June 6, 2023 10:18 PM
> > 
> > > > > It's not necessarily to add reserved regions to the IOAS of the parent
> > > > > hwpt since the device doesn't access that address space after it's
> > > > > attached to stage-1. The parent is used only for address translation
> > > > > in the iommu side.
> > > >
> > > > But if we don't put them in the IOAS of the parent there is no way for
> > > > userspace to learn what they are to forward to the VM ?
> > >
> > > emmm I wonder whether that is the right interface to report
> > > per-device reserved regions.
> > 
> > The iommu driver needs to report different reserved regions for the S1
> > and S2 iommu_domains, 
> 
> I can see the difference between RID and RID+PASID, but not sure whether
> it's an actual requirement with regard to the attached domain.

No, it isn't RID or RID+PASID here

The S2 has a different set of reserved regions than the S1 because
the S2's IOVA does not appear on the bus.

So the S2's reserved regions are entirely an artifact of how the IOMMU
HW itself works when nesting.

We can probably get by with some documented, slightly messy rules that
the reserved_regions only apply to directly RID-attached domains. S2
and PASID attachments always have no reserved spaces.
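
On the attach path that rule could look roughly like the below, where
iommufd_device_enforce_resv_regions() is a hypothetical helper standing
in for the existing reserved-region enforcement (sketch):

	/*
	 * Only a direct RID attach of a kernel-managed domain pulls the
	 * device's reserved regions into the IOAS; a nested S1 attach
	 * (hwpt->parent set) skips it, and the S2 never gains them.
	 */
	if (!hwpt->parent) {
		rc = iommufd_device_enforce_resv_regions(idev, hwpt);
		if (rc)
			goto out_unlock;
	}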

> When talking about RID-based nesting alone, ARM needs to add reserved
> regions to the parent IOAS as identity is a valid S1 mode in nesting.

No, definitely not. The S2 has no reserved regions because it is an
internal IOVA, and we should not abuse that.

Reflecting the requirements for an identity map is something all iommu
HW needs to handle; we should figure out how to do that properly.

> But for Intel RID nesting excludes identity (which becomes a direct
> attach to S2) so the reserved regions apply to S1 instead of the parent IOAS.

IIRC all the HW models will assign their S2s as a RID-attached "S1"
at boot time to emulate "no translation"?

They all need to learn what the allowed identity mapping is so that the
VMM can construct a compatible guest address space, independently of
any IOAS restrictions.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-19 12:37           ` Jason Gunthorpe
@ 2023-06-20  1:43             ` Tian, Kevin
  2023-06-20 12:47               ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-06-20  1:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, June 19, 2023 8:37 PM
> 
> On Fri, Jun 16, 2023 at 02:43:13AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, June 6, 2023 10:18 PM
> > >
> > > > > > It's not necessarily to add reserved regions to the IOAS of the parent
> > > > > > hwpt since the device doesn't access that address space after it's
> > > > > > attached to stage-1. The parent is used only for address translation
> > > > > > in the iommu side.
> > > > >
> > > > > But if we don't put them in the IOAS of the parent there is no way for
> > > > > userspace to learn what they are to forward to the VM ?
> > > >
> > > > emmm I wonder whether that is the right interface to report
> > > > per-device reserved regions.
> > >
> > > The iommu driver needs to report different reserved regions for the S1
> > > and S2 iommu_domains,
> >
> > I can see the difference between RID and RID+PASID, but not sure whether
> > it's a actual requirement regarding to attached domain.
> 
> No, it isn't RID or RID+PASID here
> 
> The S2 has a different set of reserved regions than the S1 because
> the S2's IOVA does not appear on the bus.
> 
> So the S2's reserved regions are entirely an artifact of how the IOMMU
> HW itself works when nesting.
> 
> We can probably get by with some documented slightly messy rules that
> the reserved_regions only applies to directly RID attached domains. S2
> and PASID attachments always have no reserved spaces.
> 
> > When talking about RID-based nesting alone, ARM needs to add reserved
> > regions to the parent IOAS as identity is a valid S1 mode in nesting.
> 
> No, definitely not. The S2 has no reserved regions because it is an
> internal IOVA, and we should not abuse that.
> 
> Reflecting the requirements for an identity map is something all iommu
> HW needs to handle, we should figure out how to do that properly.

I wonder whether we have argued past each other.

This series adds reserved regions to S2. I challenged the necessity as
S2 is not directly accessed by the device.

Then you replied that doing so still made sense to support identity S1.

But now it looks like you also agree that reserved regions should not
be added to the S2, except that supporting identity S1 needs more
thought?

> 
> > But for Intel RID nesting excludes identity (which becomes a direct
> > attach to S2) so the reserved regions apply to S1 instead of the parent IOAS.
> 
> IIRC all the HW models will assign their S2's as a RID attached "S1"
> during boot time to emulate "no translation"?

I'm not sure what it means...

> 
> They all need to learn what the allowed identity mapping is so that the
> VMM can construct a compatible guest address space, independently of
> any IOAS restrictions.
> 

Intel VT-d supports 4 configurations:
  - passthrough (i.e. identity mapped)
  - S1 only
  - S2 only
  - nested

'S2 only' is used when vIOMMU is configured in passthrough.

'nested' is used when vIOMMU is configured in 'S1 only'.

So in any case 'identity' is not part of the nesting business in the VT-d context.

My understanding of ARM SMMU is that from the host p.o.v. the CD is
the S1 in the nested configuration. 'identity' is one configuration in
the CD, so it is in the business of nesting.

My preference was that ALLOC_HWPT lets the vIOMMU choose whether the
reserved regions of dev_id should be added to the IOAS of the parent
S2 hwpt.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-20  1:43             ` Tian, Kevin
@ 2023-06-20 12:47               ` Jason Gunthorpe
  2023-06-21  6:02                 ` Tian, Kevin
  2023-06-21  8:29                 ` Duan, Zhenzhong
  0 siblings, 2 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-20 12:47 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> I wonder whether we have argued past each other.
> 
> This series adds reserved regions to S2. I challenged the necessity as
> S2 is not directly accessed by the device.
> 
> Then you replied that doing so still made sense to support identity
> S1.

I think I said/meant that if we attach the "s2" iommu domain as a
direct attach for identity - eg at boot time - then the IOAS must gain
the reserved regions. This is our normal protocol.

But when we use the "s2" iommu domain as an actual nested S2 then we
don't gain reserved regions.

> Intel VT-d supports 4 configurations:
>   - passthrough (i.e. identity mapped)
>   - S1 only
>   - S2 only
>   - nested
> 
> 'S2 only' is used when vIOMMU is configured in passthrough.

S2 only is modeled as attaching an S2 format iommu domain to the RID,
and when this is done the IOAS should gain the reserved regions
because it is no different behavior than attaching any other iommu
domain to a RID.

When the S2 is replaced with an S1 nest then the IOAS should lose
those reserved regions since it is no longer attached to a RID.

> My understanding of ARM SMMU is that from host p.o.v. the CD is the
> S1 in the nested configuration. 'identity' is one configuration in the CD
> then it's in the business of nesting.

I think it is the same. A CD doesn't come into the picture until the
guest installs a CD pointing STE. Until that time the S2 is being used
as identity.

It sounds like the same basic flow.

> My preference was that ALLOC_HWPT allows vIOMMU to opt whether
> reserved regions of dev_id should be added to the IOAS of the parent
> S2 hwpt.

Having an API to explicitly load reserved regions of a specific device
to an IOAS makes some sense to me.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-20 12:47               ` Jason Gunthorpe
@ 2023-06-21  6:02                 ` Tian, Kevin
  2023-06-21  7:09                   ` Liu, Yi L
                                     ` (2 more replies)
  2023-06-21  8:29                 ` Duan, Zhenzhong
  1 sibling, 3 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-06-21  6:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, June 20, 2023 8:47 PM
> 
> On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> > I wonder whether we have argued past each other.
> >
> > This series adds reserved regions to S2. I challenged the necessity as
> > S2 is not directly accessed by the device.
> >
> > Then you replied that doing so still made sense to support identity
> > S1.
> 
> I think I said/meant if we attach the "s2" iommu domain as a direct
> attach for identity - eg at boot time, then the IOAS must gain the
> reserved regions. This is our normal protocol.
> 
> But when we use the "s2" iommu domain as an actual nested S2 then we
> don't gain reserved regions.

Then we're aligned.

Yi/Nicolin, please update this series to not automatically add reserved
regions to S2 in the nesting configuration.

It also implies that the user cannot rely on IOAS_IOVA_RANGES to
learn reserved regions for arranging addresses in S1.

Then we also need a new ioctl to report reserved regions per dev_id.

> 
> > Intel VT-d supports 4 configurations:
> >   - passthrough (i.e. identity mapped)
> >   - S1 only
> >   - S2 only
> >   - nested
> >
> > 'S2 only' is used when vIOMMU is configured in passthrough.
> 
> S2 only is modeled as attaching an S2 format iommu domain to the RID,
> and when this is done the IOAS should gain the reserved regions
> because it is no different behavior than attaching any other iommu
> domain to a RID.
> 
> When the S2 is replaced with an S1 nest then the IOAS should lose
> those reserved regions since it is no longer attached to a RID.

yes

> 
> > My understanding of ARM SMMU is that from host p.o.v. the CD is the
> > S1 in the nested configuration. 'identity' is one configuration in the CD
> > then it's in the business of nesting.
> 
> I think it is the same. A CD doesn't come into the picture until the
> guest installs a CD pointing STE. Until that time the S2 is being used
> as identity.
> 
> It sounds like the same basic flow.

After a CD table is installed in an STE, I assume the SMMU still allows
configuring an individual CD entry as identity? e.g. while vSVA is
enabled on a device, the guest can continue to keep CD#0 as identity
when the default domain of the device is set to 'passthrough'. In this
case the IOAS still needs to gain the reserved regions even though the
S2 is not directly attached from the host p.o.v.

> 
> > My preference was that ALLOC_HWPT allows vIOMMU to opt whether
> > reserved regions of dev_id should be added to the IOAS of the parent
> > S2 hwpt.
> 
> Having an API to explicitly load reserved regions of a specific device
> to an IOAS makes some sense to me.
> 
> Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21  6:02                 ` Tian, Kevin
@ 2023-06-21  7:09                   ` Liu, Yi L
  2023-06-21 12:04                   ` Jason Gunthorpe
  2023-06-21 17:13                   ` Nicolin Chen
  2 siblings, 0 replies; 64+ messages in thread
From: Liu, Yi L @ 2023-06-21  7:09 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, June 21, 2023 2:02 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, June 20, 2023 8:47 PM
> >
> > On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> > > I wonder whether we have argued past each other.
> > >
> > > This series adds reserved regions to S2. I challenged the necessity as
> > > S2 is not directly accessed by the device.
> > >
> > > Then you replied that doing so still made sense to support identity
> > > S1.
> >
> > I think I said/meant if we attach the "s2" iommu domain as a direct
> > attach for identity - eg at boot time, then the IOAS must gain the
> > reserved regions. This is our normal protocol.
> >
> > But when we use the "s2" iommu domain as an actual nested S2 then we
> > don't gain reserved regions.
> 
> Then we're aligned.
> 
> Yi/Nicolin, please update this series to not automatically add reserved
> regions to S2 in the nesting configuration.

Got it.

> It also implies that the user cannot rely on IOAS_IOVA_RANGES to
> learn reserved regions for arranging addresses in S1.
> 
> Then we also need a new ioctl to report reserved regions per dev_id.

Shall we add it now? I suppose yes.

> >
> > > Intel VT-d supports 4 configurations:
> > >   - passthrough (i.e. identity mapped)
> > >   - S1 only
> > >   - S2 only
> > >   - nested
> > >
> > > 'S2 only' is used when vIOMMU is configured in passthrough.
> >
> > S2 only is modeled as attaching an S2 format iommu domain to the RID,
> > and when this is done the IOAS should gain the reserved regions
> > because it is no different behavior than attaching any other iommu
> > domain to a RID.
> >
> > When the S2 is replaced with an S1 nest then the IOAS should lose
> > those reserved regions since it is no longer attached to a RID.
> 
> yes

Makes sense.

Regards,
Yi Liu

> 
> >
> > > My understanding of ARM SMMU is that from the host p.o.v. the CD is the
> > > S1 in the nested configuration. 'identity' is one configuration in the CD,
> > > so it's in the business of nesting.
> >
> > I think it is the same. A CD doesn't come into the picture until the
> > guest installs an STE pointing to a CD table. Until that time the S2 is
> > being used as identity.
> >
> > It sounds like the same basic flow.
> 
> After a CD table is installed in an STE, I assume the SMMU still allows an
> individual CD entry to be configured as identity? E.g. while vSVA is enabled
> on a device, the guest can continue to keep CD#0 as identity when the
> default domain of the device is set to 'passthrough'. In this case the
> IOAS still needs to gain reserved regions even though the S2 is not directly
> attached from the host p.o.v.
> 
> >
> > > My preference was that ALLOC_HWPT allows vIOMMU to opt whether
> > > reserved regions of dev_id should be added to the IOAS of the parent
> > > S2 hwpt.
> >
> > Having an API to explicitly load reserved regions of a specific device
> > to an IOAS makes some sense to me.
> >
> > Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-20 12:47               ` Jason Gunthorpe
  2023-06-21  6:02                 ` Tian, Kevin
@ 2023-06-21  8:29                 ` Duan, Zhenzhong
  2023-06-21 12:07                   ` Jason Gunthorpe
  1 sibling, 1 reply; 64+ messages in thread
From: Duan, Zhenzhong @ 2023-06-21  8:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

>-----Original Message-----
>From: Jason Gunthorpe <jgg@nvidia.com>
>Sent: Tuesday, June 20, 2023 8:47 PM
>Subject: Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
>
>On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
>> I wonder whether we have argued past each other.
>>
>> This series adds reserved regions to S2. I challenged the necessity as
>> S2 is not directly accessed by the device.
>>
>> Then you replied that doing so still made sense to support identity
>> S1.
>
>I think I said/meant if we attach the "s2" iommu domain as a direct attach for
>identity - e.g. at boot time, then the IOAS must gain the reserved regions. This is
>our normal protocol.
There is code in the intel iommu driver that fails attaching for a device with
RMRR; do we plan to remove the below check for IOMMUFD sooner or later?

static int intel_iommu_attach_device(struct iommu_domain *domain,
                                     struct device *dev)
{
        struct device_domain_info *info = dev_iommu_priv_get(dev);
        int ret;

        if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
            device_is_rmrr_locked(dev)) {
                dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
                return -EPERM;
        }

Thanks
Zhenzhong

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21  6:02                 ` Tian, Kevin
  2023-06-21  7:09                   ` Liu, Yi L
@ 2023-06-21 12:04                   ` Jason Gunthorpe
  2023-06-26  6:32                     ` Tian, Kevin
  2023-06-21 17:13                   ` Nicolin Chen
  2 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-21 12:04 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:
> > > My understanding of ARM SMMU is that from the host p.o.v. the CD is the
> > > S1 in the nested configuration. 'identity' is one configuration in the CD,
> > > so it's in the business of nesting.
> >
> > I think it is the same. A CD doesn't come into the picture until the
> > guest installs an STE pointing to a CD table. Until that time the S2 is
> > being used as identity.
> > 
> > It sounds like the same basic flow.
> 
> After a CD table is installed in an STE, I assume the SMMU still allows an
> individual CD entry to be configured as identity? E.g. while vSVA is enabled
> on a device, the guest can continue to keep CD#0 as identity when the
> default domain of the device is set to 'passthrough'. In this case the
> IOAS still needs to gain reserved regions even though the S2 is not directly
> attached from the host p.o.v.

In any nesting configuration the hypervisor cannot directly restrict
which IOVAs the guest will use. The VM could make a normal nest and try
to use an unusable IOVA. Identity is not really special.

The VMM should construct the guest memory map so that an identity
iommu_domain can meet the reserved requirements - it needs to do this
anyhow for the initial boot part. It should try to forward the
reserved regions to the guest via ACPI/etc.

Being able to explicitly load reserved regions into an IOAS seems like
a useful way to help construct this.
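
e.g. something along these lines, purely to illustrate the flow - the
ioctl name and struct below are invented, nothing here is decided:

struct iommu_ioas_load_dev_resv {	/* hypothetical */
	__u32 size;		/* sizeof(struct iommu_ioas_load_dev_resv) */
	__u32 ioas_id;		/* IOAS to punch the holes into */
	__u32 dev_id;		/* device whose reserved regions to load */
	__u32 __reserved;
};

/* VMM side, before building an identity-capable S2 for the device */
struct iommu_ioas_load_dev_resv cmd = {
	.size = sizeof(cmd), .ioas_id = ioas_id, .dev_id = dev_id,
};
ioctl(iommufd, IOMMU_IOAS_LOAD_DEV_RESV, &cmd);

Afterwards IOMMU_IOAS_IOVA_RANGES would naturally report the new holes,
so the VMM can arrange the guest memory map around them.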

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21  8:29                 ` Duan, Zhenzhong
@ 2023-06-21 12:07                   ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-21 12:07 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Tian, Kevin, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On Wed, Jun 21, 2023 at 08:29:09AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >From: Jason Gunthorpe <jgg@nvidia.com>
> >Sent: Tuesday, June 20, 2023 8:47 PM
> >Subject: Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
> >
> >On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> >> I wonder whether we have argued past each other.
> >>
> >> This series adds reserved regions to S2. I challenged the necessity as
> >> S2 is not directly accessed by the device.
> >>
> >> Then you replied that doing so still made sense to support identity
> >> S1.
> >
> >I think I said/meant if we attach the "s2" iommu domain as a direct attach for
> >identity - e.g. at boot time, then the IOAS must gain the reserved regions. This is
> >our normal protocol.
> There is code in the intel iommu driver that fails attaching for a device with
> RMRR; do we plan to remove the below check for IOMMUFD sooner or later?
> 
> static int intel_iommu_attach_device(struct iommu_domain *domain,
>                                      struct device *dev)
> {
>         struct device_domain_info *info = dev_iommu_priv_get(dev);
>         int ret;
> 
>         if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
>             device_is_rmrr_locked(dev)) {
>                 dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
>                 return -EPERM;
>         }

Not really; systems with RMRR cannot support VFIO at all. Baolu sent a
series lifting this restriction up higher in the stack:

https://lore.kernel.org/all/20230607035145.343698-1-baolu.lu@linux.intel.com/

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21  6:02                 ` Tian, Kevin
  2023-06-21  7:09                   ` Liu, Yi L
  2023-06-21 12:04                   ` Jason Gunthorpe
@ 2023-06-21 17:13                   ` Nicolin Chen
  2023-06-26  6:42                     ` Tian, Kevin
  2 siblings, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-06-21 17:13 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:

> > On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> > > I wonder whether we have argued past each other.
> > >
> > > This series adds reserved regions to S2. I challenged the necessity as
> > > S2 is not directly accessed by the device.
> > >
> > > Then you replied that doing so still made sense to support identity
> > > S1.
> >
> > I think I said/meant if we attach the "s2" iommu domain as a direct
> > attach for identity - e.g. at boot time, then the IOAS must gain the
> > reserved regions. This is our normal protocol.
> >
> > But when we use the "s2" iommu domain as an actual nested S2 then we
> > don't gain reserved regions.
> 
> Then we're aligned.
> 
> Yi/Nicolin, please update this series to not automatically add reserved
> regions to S2 in the nesting configuration.

I'm a bit late to the conversation here. Yet, how about the
IOMMU_RESV_SW_MSI on ARM in the nesting configuration? We'd
still call iommufd_group_setup_msi() on the S2 HWPT, despite
attaching the device to a nested S1 HWPT, right?

> It also implies that the user cannot rely on IOAS_IOVA_RANGES to
> learn reserved regions for arranging addresses in S1.
> 
> Then we also need a new ioctl to report reserved regions per dev_id.

So, in a nesting configuration, QEMU would poll a device's S2
MSI region (i.e. IOMMU_RESV_SW_MSI) to prevent conflicts?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21 12:04                   ` Jason Gunthorpe
@ 2023-06-26  6:32                     ` Tian, Kevin
  0 siblings, 0 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-06-26  6:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 21, 2023 8:05 PM
> 
> On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:
> > > > My understanding of ARM SMMU is that from the host p.o.v. the CD is the
> > > > S1 in the nested configuration. 'identity' is one configuration in the CD,
> > > > so it's in the business of nesting.
> > >
> > > I think it is the same. A CD doesn't come into the picture until the
> > > guest installs an STE pointing to a CD table. Until that time the S2 is
> > > being used as identity.
> > >
> > > It sounds like the same basic flow.
> >
> > After a CD table is installed in an STE, I assume the SMMU still allows an
> > individual CD entry to be configured as identity? E.g. while vSVA is enabled
> > on a device, the guest can continue to keep CD#0 as identity when the
> > default domain of the device is set to 'passthrough'. In this case the
> > IOAS still needs to gain reserved regions even though the S2 is not directly
> > attached from the host p.o.v.
> 
> In any nesting configuration the hypervisor cannot directly restrict
> which IOVAs the guest will use. The VM could make a normal nest and try
> to use an unusable IOVA. Identity is not really special.

Sure. What I was talking about is the end result, e.g. after the user
explicitly requests loading reserved regions into an IOAS.

> 
> The VMM should construct the guest memory map so that an identity
> iommu_domain can meet the reserved requirements - it needs to do this
> anyhow for the initial boot part. It should try to forward the
> reserved regions to the guest via ACPI/etc.

Yes.

> 
> Being able to explicitly load reserved regions into an IOAS seems like
> a useful way to help construct this.
> 

And it's correct in concept because the IOAS is 'implicitly' accessed by
the device when the guest domain is identity in this case.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-21 17:13                   ` Nicolin Chen
@ 2023-06-26  6:42                     ` Tian, Kevin
  2023-06-26 13:05                       ` Jason Gunthorpe
  2023-06-26 17:28                       ` Nicolin Chen
  0 siblings, 2 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-06-26  6:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, June 22, 2023 1:13 AM
> 
> On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:
> 
> > > On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> > > > I wonder whether we have argued past each other.
> > > >
> > > > This series adds reserved regions to S2. I challenged the necessity as
> > > > S2 is not directly accessed by the device.
> > > >
> > > > Then you replied that doing so still made sense to support identity
> > > > S1.
> > >
> > > I think I said/meant if we attach the "s2" iommu domain as a direct
> > > attach for identity - e.g. at boot time, then the IOAS must gain the
> > > reserved regions. This is our normal protocol.
> > >
> > > But when we use the "s2" iommu domain as an actual nested S2 then we
> > > don't gain reserved regions.
> >
> > Then we're aligned.
> >
> > Yi/Nicolin, please update this series to not automatically add reserved
> > regions to S2 in the nesting configuration.
> 
> I'm a bit late to the conversation here. Yet, how about the
> IOMMU_RESV_SW_MSI on ARM in the nesting configuration? We'd
> still call iommufd_group_setup_msi() on the S2 HWPT, despite
> attaching the device to a nested S1 HWPT, right?

Yes, based on the current design of ARM nesting.

But please special-case it instead of pretending that all reserved regions
are added to the IOAS, which is wrong in concept based on the discussion.

> 
> > It also implies that the user cannot rely on IOAS_IOVA_RANGES to
> > learn reserved regions for arranging addresses in S1.
> >
> > Then we also need a new ioctl to report reserved regions per dev_id.
> 
> So, in a nesting configuration, QEMU would poll a device's S2
> MSI region (i.e. IOMMU_RESV_SW_MSI) to prevent conflicts?
> 

QEMU needs to know all the reserved regions of the device and skip
them when arranging the S1 layout.

I'm not sure whether the MSI region needs a special MSI type or
just a general RESV_DIRECT type for 1:1 mapping, though.
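
For reference, what the iommu core tracks internally today is roughly
the below (from include/linux/iommu.h), so the question is how much of
this taxonomy the new ioctl needs to expose:

enum iommu_resv_type {
	/* Memory region which must be mapped 1:1 at all times */
	IOMMU_RESV_DIRECT,
	/* Like DIRECT, but the 1:1 requirement can be relaxed after boot */
	IOMMU_RESV_DIRECT_RELAXABLE,
	/* Region which must simply never be mapped */
	IOMMU_RESV_RESERVED,
	/* Hardware MSI region, never translated by the IOMMU */
	IOMMU_RESV_MSI,
	/* Software-managed MSI translation window */
	IOMMU_RESV_SW_MSI,
};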

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-26  6:42                     ` Tian, Kevin
@ 2023-06-26 13:05                       ` Jason Gunthorpe
  2023-06-26 17:28                       ` Nicolin Chen
  1 sibling, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-26 13:05 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, Jun 26, 2023 at 06:42:58AM +0000, Tian, Kevin wrote:

> I'm not sure whether the MSI region needs a special MSI type or
> just a general RESV_DIRECT type for 1:1 mapping, though.

It probably always needs a special type :(

Jason


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-26  6:42                     ` Tian, Kevin
  2023-06-26 13:05                       ` Jason Gunthorpe
@ 2023-06-26 17:28                       ` Nicolin Chen
  2023-06-27  6:02                         ` Tian, Kevin
  1 sibling, 1 reply; 64+ messages in thread
From: Nicolin Chen @ 2023-06-26 17:28 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Mon, Jun 26, 2023 at 06:42:58AM +0000, Tian, Kevin wrote:

> > > Yi/Nicolin, please update this series to not automatically add reserved
> > > regions to S2 in the nesting configuration.
> >
> > I'm a bit late to the conversation here. Yet, how about the
> > IOMMU_RESV_SW_MSI on ARM in the nesting configuration? We'd
> > still call iommufd_group_setup_msi() on the S2 HWPT, despite
> > attaching the device to a nested S1 HWPT, right?
> 
> Yes, based on the current design of ARM nesting.
> 
> But please special-case it instead of pretending that all reserved regions
> are added to the IOAS, which is wrong in concept based on the discussion.

Ack. Yi made a version of the change dropping it completely along
with the iommufd_group_setup_msi() call for a nested S1 HWPT.
So I thought there was a misalignment. I made another version
preserving the pathway for MSI on ARM, and perhaps we should
go with this one:
https://github.com/nicolinc/iommufd/commit/c63829a12d35f2d7a390f42821a079f8a294cff8

> > > It also implies that the user cannot rely on IOAS_IOVA_RANGES to
> > > learn reserved regions for arranging addresses in S1.
> > >
> > > Then we also need a new ioctl to report reserved regions per dev_id.
> >
> > So, in a nesting configuration, QEMU would poll a device's S2
> > MSI region (i.e. IOMMU_RESV_SW_MSI) to prevent conflicts?
> >
> 
> QEMU needs to know all the reserved regions of the device and skip
> them when arranging the S1 layout.

OK.

> I'm not sure whether the MSI region needs a special MSI type or
> just a general RESV_DIRECT type for 1:1 mapping, though.

I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
and IOMMU_RESV_SW_MSI? Or does it just mean we should report
the iommu_resv_type along with the reserved regions in the new ioctl?

Thanks
Nic

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-26 17:28                       ` Nicolin Chen
@ 2023-06-27  6:02                         ` Tian, Kevin
  2023-06-27 16:01                           ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-06-27  6:02 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, June 27, 2023 1:29 AM
> 
> > I'm not sure whether the MSI region needs a special MSI type or
> > just a general RESV_DIRECT type for 1:1 mapping, though.
> 
> I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
> and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> the iommu_resv_type along with the reserved regions in the new ioctl?
> 

Currently those are iommu-internal types. When defining the new
ioctl we need to think about what is necessary to present to the user.

Probably just a list of reserved regions plus a flag to mark which
ones are SW_MSI? Except for SW_MSI, all other reserved region types
just need the user to reserve them w/o knowing more detail.
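
Conceptually each reported entry could then look like the below (all
names hypothetical, only to make the point concrete):

struct iommu_resv_region_report {	/* hypothetical */
	__aligned_u64 start;
	__aligned_u64 last;	/* inclusive */
/* The user must install a 1:1 S1 mapping here, not just leave a hole */
#define IOMMU_RESV_REGION_SW_MSI (1 << 0)
	__u32 flags;
	__u32 __reserved;
};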

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-27  6:02                         ` Tian, Kevin
@ 2023-06-27 16:01                           ` Jason Gunthorpe
  2023-06-28  2:47                             ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-27 16:01 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Tue, Jun 27, 2023 at 06:02:13AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Tuesday, June 27, 2023 1:29 AM
> > 
> > > I'm not sure whether the MSI region needs a special MSI type or
> > > just a general RESV_DIRECT type for 1:1 mapping, though.
> > 
> > I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
> > and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> > the iommu_resv_type along with the reserved regions in the new ioctl?
> > 
> 
> Currently those are iommu-internal types. When defining the new
> ioctl we need to think about what is necessary to present to the user.
> 
> Probably just a list of reserved regions plus a flag to mark which
> ones are SW_MSI? Except for SW_MSI, all other reserved region types
> just need the user to reserve them w/o knowing more detail.

I think I prefer the idea that we just import the reserved regions from a
dev_id and do not expose any of this detail to userspace.

The kernel can make only the SW_MSI a mandatory cut-out when the S2 is
attached.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-27 16:01                           ` Jason Gunthorpe
@ 2023-06-28  2:47                             ` Tian, Kevin
  2023-06-28 12:36                               ` Jason Gunthorpe
  0 siblings, 1 reply; 64+ messages in thread
From: Tian, Kevin @ 2023-06-28  2:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 28, 2023 12:01 AM
> 
> On Tue, Jun 27, 2023 at 06:02:13AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Tuesday, June 27, 2023 1:29 AM
> > >
> > > > I'm not sure whether the MSI region needs a special MSI type or
> > > > just a general RESV_DIRECT type for 1:1 mapping, though.
> > >
> > > I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
> > > and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> > > the iommu_resv_type along with the reserved regions in the new ioctl?
> > >
> >
> > Currently those are iommu-internal types. When defining the new
> > ioctl we need to think about what is necessary to present to the user.
> >
> > Probably just a list of reserved regions plus a flag to mark which
> > ones are SW_MSI? Except for SW_MSI, all other reserved region types
> > just need the user to reserve them w/o knowing more detail.
> 
> I think I prefer the idea that we just import the reserved regions from a
> dev_id and do not expose any of this detail to userspace.
> 
> The kernel can make only the SW_MSI a mandatory cut-out when the S2 is
> attached.
> 

I'm confused.

The VMM needs to know reserved regions per dev_id and report them
to the guest.

And we have aligned that reserved regions (except SW_MSI) should
not be automatically added to the S2 in the nesting case. Then the VMM
cannot rely on IOAS_IOVA_RANGES to identify the reserved regions.

So there needs to be a new interface for the user to discover reserved
regions per dev_id, within which the SW_MSI region should be marked out
so an identity mapping can be installed properly for it in S1.

Did I misunderstand your point in the previous discussion?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-28  2:47                             ` Tian, Kevin
@ 2023-06-28 12:36                               ` Jason Gunthorpe
  2023-06-29  2:16                                 ` Tian, Kevin
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2023-06-28 12:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

On Wed, Jun 28, 2023 at 02:47:02AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, June 28, 2023 12:01 AM
> > 
> > On Tue, Jun 27, 2023 at 06:02:13AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Tuesday, June 27, 2023 1:29 AM
> > > >
> > > > > I'm not sure whether the MSI region needs a special MSI type or
> > > > > just a general RESV_DIRECT type for 1:1 mapping, though.
> > > >
> > > > I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
> > > > and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> > > > the iommu_resv_type along with the reserved regions in the new ioctl?
> > > >
> > >
> > > Currently those are iommu-internal types. When defining the new
> > > ioctl we need to think about what is necessary to present to the user.
> > >
> > > Probably just a list of reserved regions plus a flag to mark which
> > > ones are SW_MSI? Except for SW_MSI, all other reserved region types
> > > just need the user to reserve them w/o knowing more detail.
> > 
> > I think I prefer the idea that we just import the reserved regions from a
> > dev_id and do not expose any of this detail to userspace.
> > 
> > The kernel can make only the SW_MSI a mandatory cut-out when the S2 is
> > attached.
> > 
> 
> I'm confused.
> 
> The VMM needs to know reserved regions per dev_id and report them
> to the guest.
> 
> And we have aligned that reserved regions (except SW_MSI) should
> not be automatically added to the S2 in the nesting case. Then the VMM
> cannot rely on IOAS_IOVA_RANGES to identify the reserved regions.

We also said we need a way to load the reserved regions to create an
identity-compatible version of the HWPT.

So we have a model where the VMM will want to load in regions beyond
what the currently attached device needs.

> So there needs to be a new interface for the user to discover reserved
> regions per dev_id, within which the SW_MSI region should be marked out
> so an identity mapping can be installed properly for it in S1.
> 
> Did I misunderstand your point in the previous discussion?

This is another discussion; if the VMM needs this then we probably
need a new API to get it.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 00/11] iommufd: Add nesting infrastructure
  2023-06-28 12:36                               ` Jason Gunthorpe
@ 2023-06-29  2:16                                 ` Tian, Kevin
  0 siblings, 0 replies; 64+ messages in thread
From: Tian, Kevin @ 2023-06-29  2:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Liu, Yi L, joro, alex.williamson, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest,
	Duan, Zhenzhong

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 28, 2023 8:36 PM
> 
> On Wed, Jun 28, 2023 at 02:47:02AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, June 28, 2023 12:01 AM
> > >
> > > On Tue, Jun 27, 2023 at 06:02:13AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Tuesday, June 27, 2023 1:29 AM
> > > > >
> > > > > > I'm not sure whether the MSI region needs a special MSI type or
> > > > > > just a general RESV_DIRECT type for 1:1 mapping, though.
> > > > >
> > > > > I don't quite get this part. Doesn't MSI have IOMMU_RESV_MSI
> > > > > and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> > > > > the iommu_resv_type along with the reserved regions in the new ioctl?
> > > > >
> > > >
> > > > Currently those are iommu-internal types. When defining the new
> > > > ioctl we need to think about what is necessary to present to the user.
> > > >
> > > > Probably just a list of reserved regions plus a flag to mark which
> > > > ones are SW_MSI? Except for SW_MSI, all other reserved region types
> > > > just need the user to reserve them w/o knowing more detail.
> > >
> > > I think I prefer the idea that we just import the reserved regions from a
> > > dev_id and do not expose any of this detail to userspace.
> > >
> > > The kernel can make only the SW_MSI a mandatory cut-out when the S2 is
> > > attached.
> > >
> >
> > I'm confused.
> >
> > The VMM needs to know reserved regions per dev_id and report them
> > to the guest.
> >
> > And we have aligned that reserved regions (except SW_MSI) should
> > not be automatically added to the S2 in the nesting case. Then the VMM
> > cannot rely on IOAS_IOVA_RANGES to identify the reserved regions.
> 
> We also said we need a way to load the reserved regions to create an
> identity-compatible version of the HWPT.
> 
> So we have a model where the VMM will want to load in regions beyond
> what the currently attached device needs.

No question on this.

> 
> > So there needs to be a new interface for the user to discover reserved
> > regions per dev_id, within which the SW_MSI region should be marked out
> > so an identity mapping can be installed properly for it in S1.
> >
> > Did I misunderstand your point in the previous discussion?
> 
> This is another discussion; if the VMM needs this then we probably
> need a new API to get it.
> 

Then it's clear. 😊

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2023-06-29  2:16 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11 14:38 [PATCH v2 00/11] iommufd: Add nesting infrastructure Yi Liu
2023-05-11 14:38 ` [PATCH v2 01/11] iommu: Add new iommu op to create domains owned by userspace Yi Liu
2023-05-19  8:47   ` Tian, Kevin
2023-05-19 18:45     ` Nicolin Chen
2023-05-24  5:02       ` Tian, Kevin
2023-05-24  5:23         ` Nicolin Chen
2023-05-24  7:48           ` Tian, Kevin
2023-05-25  1:41             ` Nicolin Chen
2023-06-06 14:08               ` Jason Gunthorpe
2023-06-06 19:43                 ` Nicolin Chen
2023-06-07  0:14                   ` Jason Gunthorpe
2023-05-11 14:38 ` [PATCH v2 02/11] iommu: Add nested domain support Yi Liu
2023-05-19  8:51   ` Tian, Kevin
2023-05-19 18:49     ` Nicolin Chen
2023-05-24  5:03       ` Tian, Kevin
2023-05-24  5:28         ` Nicolin Chen
2023-05-11 14:38 ` [PATCH v2 03/11] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
2023-05-19  8:56   ` Tian, Kevin
2023-05-19 18:57     ` Nicolin Chen
2023-05-24  5:04       ` Tian, Kevin
2023-05-11 14:38 ` [PATCH v2 04/11] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
2023-05-19  9:06   ` Tian, Kevin
2023-05-19 19:09     ` Nicolin Chen
2023-05-24  5:11       ` Tian, Kevin
2023-05-24  5:31         ` Nicolin Chen
2023-05-19  9:34   ` Tian, Kevin
2023-05-19 19:12     ` Nicolin Chen
2023-05-11 14:38 ` [PATCH v2 05/11] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
2023-05-11 14:38 ` [PATCH v2 06/11] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
2023-05-17  3:08   ` Liu, Jingqi
2023-05-19 19:34     ` Nicolin Chen
2023-05-19  9:41   ` Tian, Kevin
2023-05-19 19:48     ` Nicolin Chen
2023-05-24  5:16       ` Tian, Kevin
2023-05-24  5:40         ` Nicolin Chen
2023-05-24  7:55           ` Tian, Kevin
2023-05-11 14:38 ` [PATCH v2 07/11] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
2023-05-11 14:38 ` [PATCH v2 08/11] iommufd/selftest: Add domain_alloc_user() support in iommu mock Yi Liu
2023-05-11 14:38 ` [PATCH v2 09/11] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data Yi Liu
2023-05-11 14:38 ` [PATCH v2 10/11] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
2023-05-11 14:38 ` [PATCH v2 11/11] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
2023-05-19  9:56 ` [PATCH v2 00/11] iommufd: Add nesting infrastructure Tian, Kevin
2023-05-19 11:49   ` Jason Gunthorpe
2023-05-24  3:48     ` Tian, Kevin
2023-06-06 14:18       ` Jason Gunthorpe
2023-06-16  2:43         ` Tian, Kevin
2023-06-19 12:37           ` Jason Gunthorpe
2023-06-20  1:43             ` Tian, Kevin
2023-06-20 12:47               ` Jason Gunthorpe
2023-06-21  6:02                 ` Tian, Kevin
2023-06-21  7:09                   ` Liu, Yi L
2023-06-21 12:04                   ` Jason Gunthorpe
2023-06-26  6:32                     ` Tian, Kevin
2023-06-21 17:13                   ` Nicolin Chen
2023-06-26  6:42                     ` Tian, Kevin
2023-06-26 13:05                       ` Jason Gunthorpe
2023-06-26 17:28                       ` Nicolin Chen
2023-06-27  6:02                         ` Tian, Kevin
2023-06-27 16:01                           ` Jason Gunthorpe
2023-06-28  2:47                             ` Tian, Kevin
2023-06-28 12:36                               ` Jason Gunthorpe
2023-06-29  2:16                                 ` Tian, Kevin
2023-06-21  8:29                 ` Duan, Zhenzhong
2023-06-21 12:07                   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).